EP2451196A1 - Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three - Google Patents
- Publication number
- EP2451196A1 (application EP10306212A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- ambisonics
- order
- value
- coefficients
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/03—Application of parametric coding in stereophonic audio systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used in stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/11—Application of ambisonics in stereophonic audio systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S3/00—Systems employing more than two channels, e.g. quadraphonic
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
Definitions
- This payload header extends the RTP header of Fig. 1 by a 2-octet extended sequence number and a 2-octet extended time stamp. Furthermore, one octet for flags and a reserved field, followed by a 3-octet SMPTE time stamp and a 4-octet offset value, are proposed therein.
- the 32-bit aligned payload data is following the header data.
- a problem to be solved by the invention is to provide a data structure (i.e. a protocol layer) for 3D higher-order Ambisonics sound field description formats, which can be used for real-time transmission over Ethernet.
- This problem is solved by the encoding method disclosed in claim 1 and the decoding method disclosed in claim 3. Apparatuses which utilise these methods are disclosed in claims 2 and 4, respectively.
- the data structures described below facilitate real-time transmission of 3D sound field descriptions over Ethernet. From the content of additional metadata the transmitted 3D sound field can be adapted at receiver side to the available headphones or the number and positions of loudspeakers, for regular as well as for irregular set-ups. No regular loudspeaker set-ups including a large number of loudspeakers are required like in WFS.
- the sound quality level can be adapted to the available sound reproduction system, e.g. by mapping a 3D Ambisonics sound field description onto a 2D loudspeaker set-up.
- the inventive data structure considers single microphones or microphone arrays as well as virtual acoustical sources with different accuracies and sample rates.
- moving sources, i.e. sources with time-dependent spatial positions, are covered by Ambisonics descriptions inherently.
- the Ambisonics header information level is adaptable between a simple and an encoder related mode.
- the latter one enables fast decoder modifications. This is useful especially for real-time applications.
- the proposed data structure is extendable for classical audio scene descriptions, i.e. sound sources and their positions.
- the inventive Ambisonics processing is based on linear operators, i.e. the Ambisonics channels data can be packed and transmitted singly or in an assembled manner as a matrix.
- the inventive encoding method is suited for generating sound field data including Ambisonics sound field data of an order higher than three, said method including the steps:
- the inventive encoder apparatus is suited for generating sound field data including Ambisonics sound field data of an order higher than three, said apparatus including:
- the inventive decoding method is suited for decoding sound field data that were encoded according to the above encoding method using one or two or more of said paths, said method including the steps:
- the inventive decoder apparatus is suited for decoding sound field data that were encoded according to the above encoding method using one or two or more of said paths, said apparatus including:
- in a first step or multiplier 33, all s source signals x(k) at each sample time kT, i.e. virtual single sources as well as microphone array sources, are multiplied with a matrix Ξ defined in Eq.(1).
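As an illustration of this multiplication step, the following sketch encodes s sources with a simplified 2D (circular-harmonic) mode matrix. The patent's actual matrix Ξ of Eq.(1) uses spherical harmonics and is not reproduced in this extract, so the function name and layout below are assumptions, not the patented method.

```python
import math

def encode_sources_2d(samples, azimuths, order):
    """Illustrative 2D Ambisonics encoding: multiply the s source samples
    x(k) with a circular-harmonic mode matrix (a simplified stand-in for
    the matrix Xi of Eq.(1)).

    samples:  s source sample values at one sample time kT
    azimuths: s source azimuth angles in radians
    returns:  2*order + 1 Ambisonics coefficients d(k)
    """
    d = [sum(samples)]  # order-0 (omnidirectional) component
    for m in range(1, order + 1):
        # one cosine and one sine component per circular-harmonic order m
        d.append(sum(x * math.cos(m * a) for x, a in zip(samples, azimuths)))
        d.append(sum(x * math.sin(m * a) for x, a in zip(samples, azimuths)))
    return d
```

A single unit-amplitude source at azimuth 0 yields coefficients [1.0, 1.0, 0.0] for order 1, matching the expected 2·N+1 channel count.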
- Fig. 3 shows a block diagram of an Ambisonics encoder for these four cases at production side. The required functions are represented by corresponding steps or stages in front of the transmission. All processing steps are clocked by a frequency that is made in stage 38 synchronous with the sample frequency 1/T.
- a controller 37 receives a mode selection signal and the value of order N , and controls an optional multiplexer 36 that receives the filter responses and the output signal of multiplier 33, and outputs the inventive data structure frames 39.
- Multiplier 33 represents a directional encoder providing corresponding coefficients and outputs the unfiltered vector data d ( k ), the order N value, and parameter Norm .
- An array response filter 42 ('Filter 1' in Fig. 4 ) only for the microphone sources data can be arranged at decoder side.
- the unfiltered vector data d ( k ), the order N value, and parameter Norm are assembled in a combiner 340 with radii data R S ( t ), and are fed to an optional multiplexer 36.
- Radii data R S ( t ) represent the distances of the audio sources of the S input signals x ( k ), and refer to microphones as well as to artificially generated virtual sound sources.
- the coefficients vector data d ( k ) pass through an array response filter 341 for the microphone sources (filter 2).
- the filtering compensates the microphone-array response and is based on Bessel or Hankel functions. Basically, the signals from the output vectors d ( k ) are filtered.
- the other inputs serve as parameters for the filter, e.g. parameter R is used for the term k * r .
- the filtering is relevant only for microphones that have the individual radius R m . Such radii are taken into consideration in the term k * r of the Bessel or Hankel functions. Normally, the amplitude response of the filter starts with a lowpass characteristic but increases for higher frequencies.
- the filtering is performed in dependency from the Ambisonics order N , the order n and the radii R m values, so as to compensate for non-linear frequency dependency.
- a subsequent normalisation step or stage 351 for spherical waves data provides filtered coefficients A ( k ). It is assumed that there is also a corresponding filter at reproduction side (filter 431 in Fig. 4 ).
- the filtered and normalised coefficients A ( k ), parameter Norm and the order N value are fed to multiplexer 36.
- the coefficients vector data d ( k ) pass through an array response filter 342 for the microphone sources (filter 3).
- the filtering is performed in dependency from said Ambisonics order N , said order n , the radii R m values and a radius R ref value representing the average radius R ref of the loudspeakers at decoder side as described in the below section "Radius R ref (RREF)", so as to compensate for non-linear frequency dependency.
- a filter for spherical waves data is also arranged at reproduction side. Then the average radius R ref of the loudspeakers has to be considered already in filter 342.
- a subsequent normalisation step or stage 352 for spherical waves data provides filtered coefficients A ( k ).
- Step/stage 352 can include a distance coding like that described in connection with Fig. 4 .
- the filtered coefficients A(k) from step/stage 352, parameter Norm, the order N value and radius value R_ref are fed to multiplexer 36.
- the coefficients vector data d ( k ) pass through an array response filter 343 for the microphone sources (filter 4).
- the filtering is performed in dependency from the Ambisonics order N , the radii R m values and a Plane Wave parameter.
- a subsequent normalisation step or stage 353 for plane waves data provides parameter Norm , the order N value and a flag for Plane Wave to multiplexer 36.
- the Ambisonics encoder can code the output signals 361 in any one of these paths, in any two of these paths, or in more than two of these paths.
- the normalisation steps or stages 351 to 353 can use a normalisation or scaling as described below in section "Ambisonics Normalisation/Scaling Format (ANSF)".
- the Ambisonics decoder depicted in Fig. 4 parses the incoming data structures in a parser 41 in order to detect the case type and to provide the data for performing the appropriate functions.
- An example for such parser is disclosed in WO 2009/106637 A1 .
- Unfiltered vector data d(k), order value N, parameter Norm and all radii data R_S(t) are parsed. These values pass through an array response filter 42 (Filter 1), which filters (as described for Fig. 3) the received d(k) data under consideration of all radii R_S(t).
- the resulting filtered coefficients A(k) are distance coded (DC) in a distance coding step or stage 431 for all loudspeaker radii R_LS and order N, and pass thereafter together with loudspeaker direction values Ω_l (representing the directions of the LS loudspeakers 46), value N and parameter Norm through an optional multiplexer 44 to a panning or pseudo inverse step or stage 45.
- Distance coding means taking into account Bessel or Hankel functions with radii parameter in term k * r for plane or spherical waves.
- Filtered coefficients A ( k ), parameter Norm and order value N are parsed.
- the filtered coefficients A(k) are distance coded (DC) in a distance coding step or stage 432 for all loudspeaker radii R_LS and order N, and pass thereafter together with loudspeaker direction values Ω_l, value N and parameter Norm through multiplexer 44 to the panning or pseudo inverse step or stage 45.
- Spherical waves on AE and AD sides are assumed.
- Filtered coefficients A ( k ), order value N , parameter Norm and radius value R ref are parsed.
- the filtered coefficients A(k) are distance coded (DC) in a distance coding step or stage 432 for all loudspeaker radii R_LS and order N under consideration of radius R_ref, and pass thereafter together with loudspeaker direction values Ω_l, value N and parameter Norm through multiplexer 44 to the panning or pseudo inverse step or stage 45.
- Spherical waves on AE and AD sides are assumed.
- Filtered coefficients A ( k ), order value N , parameter Norm and a flag for Plane Waves are parsed.
- the filtered coefficients A(k) together with loudspeaker direction values Ω_l, value N and parameter Norm pass through multiplexer 44 to the panning or pseudo inverse step or stage 45. Plane waves on AE and AD sides are assumed.
- a mode selector 47 selects in multiplexer 44 the corresponding path or paths a) to d) which was or were used at encoder side.
- Decoder 45, which represents a panning or a mode matching operation including a pseudo inverse, inverts the matrix Ξ operation of the Ambisonics encoder in Fig. 3, applies this operation to the filtered coefficients A(k) or the filtered and distance coded coefficients A'(k), respectively, in dependency on the parameter Norm, the order value N and the loudspeaker direction values Ω_l, and provides the l loudspeaker signals for a loudspeaker array 46.
- Parser 41 also provides synchronisation information that is used for re-synchronisation of a clock 48.
- the invention specifies a packet-based streaming format for encapsulating spatial sound field descriptions based on Ambisonics into an extended real-time transport protocol, in particular RTP, for real-time streaming of spatial audio scenes.
- the focus is on a standalone spatial (2D/3D) audio real-time application, e.g. a transmission of a live concert or a live sport event via IP. This requires a specific spatial audio layer including time stamps and possibly synchronisation information.
- the Ambisonics real-time stream can be used together with an RTP layer.
- alternative RTP layers with or without extended headers are described below.
- EASF: Extended Ambisonics Streaming Format
- Ethernet transmissions are performed in data packets with a typical packet length, called 'path MTU', of up to 1500 or 9000 bytes.
- a 'frame' represents a dedicated time interval within which a typical number of packets is transmitted. For example, in 1080p video mode a frame contains 1080 data packets, each of which describes one line of a complete video frame.
- a transmission should be frame based.
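The frame-based packetisation described above can be sketched as follows; the function name and the header-length parameter are illustrative, not part of the patent.

```python
def fragment_payload(payload: bytes, path_mtu: int, header_len: int):
    """Split a frame's payload into packets that each fit within the
    path MTU (e.g. 1500 bytes, or 9000 for 'jumbo frames'), leaving
    room for the per-packet header. Illustrative sketch only."""
    max_data = path_mtu - header_len
    if max_data <= 0:
        raise ValueError("header does not fit into the path MTU")
    # slice the payload into consecutive chunks of at most max_data bytes
    return [payload[i:i + max_data] for i in range(0, len(payload), max_data)]
```

With a 1500-byte MTU and a 40-byte header, a 3000-byte frame payload would be split into three packets (1460, 1460 and 80 bytes of data).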
- Case 1 requires a transmission of each time-dependent radius R_S(t). This is an option if filter processing is to be performed in the decoder. However, in the following the focus is on Cases 2-4, in which the filtered coefficients A(k) are transmitted. This allows a higher bandwidth because the transmission remains independent of all source positions, i.e. it is better suited for Ambisonics.
- for standalone audio transmission, the protocol contains the following header data structure.
- Payload Type 7 bits
- the payload type is defined for an Audio standalone transmission as EASF.
- the film format is chosen, e.g. DPX.
- Sequence Number 16 bits The LSB bits for the sequence number. It increments by one for each RTP data packet sent, and may be used by the receiver for detecting packet loss and for restoring the packet sequence. The initial value of the sequence number is random (i.e. unpredictable) in order to make known-plaintext attacks on encryption more difficult.
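A minimal sketch of handling such 16-bit sequence numbers (wraparound, random initial value, loss detection); the helper names are assumptions, not part of the specification:

```python
import random

def next_expected(seq: int) -> int:
    """16-bit RTP sequence numbers wrap from 65535 back to 0."""
    return (seq + 1) & 0xFFFF

def initial_sequence_number() -> int:
    """Random (unpredictable) initial value, as the text suggests,
    to hinder known-plaintext attacks on encryption."""
    return random.getrandbits(16)

def lost_packets(prev_seq: int, cur_seq: int) -> int:
    """Number of packets missing between two received sequence numbers,
    taking wraparound into account (sketch; no reordering handling)."""
    return (cur_seq - prev_seq - 1) & 0xFFFF
```

For example, receiving sequence number 1 directly after 65534 implies two lost packets (65535 and 0).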
- Timestamp 32 bits
- the timestamp denotes the sampling instant of the frame to which the RTP packet belongs. Packets belonging to the same frame must have the same timestamp.
- RTP payload header extension: According to the invention, the fields of the known RTP header keep their usual meaning, but that header is amended as follows.
- RTP Payload Frame Status (PLFS) - 2 bit: The frame status describes which type of data will follow the extended RTP header in the payload block:

  PLFS code | Payload type
  00 | Ambisonics coefficients
  01 | Frame end (+ Ambisonics coefficients)
  10 | Frame begin (+ Metadata)
  11 | Metadata

  I.e., in the first packet of a frame, instead of audio data, additional metadata can be transmitted. In case of Ambisonics transmission, the metadata contains source and Ambisonics encoder related information (production side information) required for the decoding process.
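Decoding the 2-bit PLFS field might look as follows; placing PLFS in the two most significant bits of its byte is an illustrative assumption, since the text does not fix the bit position here:

```python
# Payload types for the 2-bit PLFS field, as listed in the table above.
PLFS_TYPES = {
    0b00: "Ambisonics coefficients",
    0b01: "Frame end (+ Ambisonics coefficients)",
    0b10: "Frame begin (+ Metadata)",
    0b11: "Metadata",
}

def parse_plfs(first_byte: int) -> str:
    """Decode the 2-bit PLFS field; this sketch assumes PLFS occupies
    the two most significant bits of the first extension byte."""
    return PLFS_TYPES[(first_byte >> 6) & 0b11]
```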
- Time Code/Sync Frequency (TCSF) - 30 bit unsigned integer
- the following SMPTE time code or the synchronisation is based on a specific clock frequency, the Time Code/Sync Frequency TCSF.
- the TCSF is defined as a 30 bit integer field. The value is represented in Hz and leads to a frequency range from 0 to 1073.741824 MHz, wherein a value of 0 Hz is signalling that no time code is available.
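A small sketch of validating the 30-bit TCSF field under the stated convention (0 Hz signals that no time code is available); the function name is illustrative:

```python
def check_tcsf(tcsf: int):
    """TCSF is a 30-bit unsigned integer in Hz; 0 signals that no time
    code is available. The maximum representable value is 2**30 - 1 Hz,
    i.e. roughly the 1073.74 MHz upper end stated in the text."""
    if not 0 <= tcsf < 2 ** 30:
        raise ValueError("TCSF must fit into 30 bits")
    return None if tcsf == 0 else tcsf  # None: no time code available
```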
- the selection in data field AST facilitates not only a separation within Ambisonics (cf. the example provided below in connection with Fig. 9) but also the parallel transmission of differently encoded audio source signals (Ambisonics and/or PCM data + position data), i.e. the inventive protocol can be complemented e.g. for PCM data.
- the below-described SMPTE Time Code/Clock Sync Info (STCSI) facilitates the temporally correct assignment of the audio signal sources.
- the general Ambisonics header is transmitted only in the first data packet of a frame and the individual Ambisonics header is transmitted in all other data packets.
- in an alternative mode, the general Ambisonics header is also available in every data packet in front of the individual Ambisonics header. This mode enables a modification of the parameters in each data packet, i.e. in real-time. It can be useful for real-time applications where no or only small buffers are available. However, this mode decreases the available bandwidth.
- Different sources can generate audio signals at the same time.
- Known protocols are based on a separate transmission of the sound sources, i.e. every data frame refers to a single temporal section in which, depending on the sampling frequency, several samples can be contained. Therefore, in known protocols, different source signals occurring at the same time instant will use the same time stamp and the same frame number. This poses no problem for offline processing, i.e. non-real-time processing.
- the transmitted data are buffered and assembled later on. However, this does not work for real-time processing in which a small latency is demanded.
- the data field XAH facilitates carrying the header along continuously, and the parser 41 in Fig. 4 can switch back and forth block-by-block (or Ethernet packet-by-packet or frame-by-frame) between different audio source types.
- Distinguishing between general header and individual header facilitates a real-time adaptation.
- if STS is cleared, the value in the 24 bit field STCSI (see below) represents the SMPTE time code. If STS is set, field STCSI contains user-specific synchronisation information.
- the packet offset describes the distance in bytes between the first payload octet of the first data packet in a frame relative to the first payload octet in the current data packet.
- PAO(HIGH) represents the 32 MSBs and PAO(LOW) represents the 32 LSBs.
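Splitting and recombining the 64-bit packet offset into its PAO(HIGH)/PAO(LOW) halves can be sketched as (helper names illustrative):

```python
def split_pao(offset: int):
    """Split a 64-bit packet byte offset into PAO(HIGH) (32 MSBs)
    and PAO(LOW) (32 LSBs)."""
    return (offset >> 32) & 0xFFFFFFFF, offset & 0xFFFFFFFF

def join_pao(high: int, low: int) -> int:
    """Recombine the two 32-bit fields into the 64-bit byte offset."""
    return (high << 32) | low
```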
- Ambisonics payload data and Ambisonics header data shall be fragmented such that the resulting RTP data packet is smaller than the 'path MTU' mentioned above.
- the path MTU is a 'jumbo frame' of e.g. 9000 bytes.
- a small individual Ambisonics header is sent in front of each data packet.
- a general header contains source and encoder related information that can be useful for the Ambisonics decoder. It contains information that is valid for all data packets within a frame, and for small frames and/or data packets it can be sent once at the beginning of a frame. Especially for real-time applications where the packet information is changing frequently, it can be advantageous to send the general header with each data packet.
- Table 1:

  AFT code | Format
  00 | B-Format order
  01 | numerical upward
  10 | numerical downward
  11 | Reserved

  Degree n | Order m | Channel
  0 |  0 | W
  1 |  1 | X
  1 | -1 | Y
  1 |  0 | Z
  2 |  0 | R
  2 |  1 | S
  2 | -1 | T
  2 |  2 | U
  2 | -2 | V
  3 |  0 | K
  3 |  1 | L
  3 | -1 | M
  3 |  2 | N
  3 | -2 | O
  3 |  3 | P
  3 | -3 | Q
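The degree/order-to-channel assignment of Table 1 can be captured as a lookup table (illustrative Python, mirroring the table entries; the constant name is an assumption):

```python
# (degree n, order m) -> B-format channel letter, up to degree 3,
# i.e. the 16 channels of the extended B-format.
BFORMAT_CHANNELS = {
    (0, 0): "W",
    (1, 1): "X", (1, -1): "Y", (1, 0): "Z",
    (2, 0): "R", (2, 1): "S", (2, -1): "T", (2, 2): "U", (2, -2): "V",
    (3, 0): "K", (3, 1): "L", (3, -1): "M", (3, 2): "N",
    (3, -2): "O", (3, 3): "P", (3, -3): "Q",
}
```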
- the sequence of each matrix column in Eq.(1) from top to bottom represents a numerical upward order type.
- a degree value always starts with 0 and runs up to Ambisonics Order N .
- the sequence starts with lowest order - N and runs up to order + N .
- the downward type uses for each degree the reversed order.
- the Ambisonics order describes the quality of the Ambisonics encoding and decoding via Ξ.
- An order up to 255 should be sufficient.
- the order is distinguished in horizontal and vertical direction. In case of 2D, only AHO has a value greater than '0'.
- a mixed order can have different AHO and AVO values.
- Ambisonics Normalisation/Scaling Format (ANSF) - 3 bit Identifies different normalisation formats, typically used for Ambisonics.
- the normalisation corresponds to the orthogonality relationship between Y_n^m and (Y_{n'}^{m'})^*.
- additional normalisation principles exist, e.g. Furse-Malham.
- the Furse-Malham formulation facilitates a normalisation of the coefficients to get maximum values of ⁇ 1, which yields an optimal dynamic range.
- the scaling factors are fixed over one frame. The scaling factors will be transmitted only once in front of the Ambisonics coefficients.
- ANSF code | Format
  000 | Orthonormal
  001 | Schmidt semi-normalised
  010 | 4π normalised
  011 | Unnormalised
  100 | Furse-Malham
  101 | Dedicated scaling
  11x | Reserved
- the reference radius R ref value of the loudspeakers in mm is required in case of spherical waves.
- in the term k·r, k = 2πf/c is the wave number for the audible frequencies f, with speed of sound c ≈ 340 m/s.
- This code defines the word length as well as the format (integer/floating point) of the transmitted Ambisonics coefficients A ( k ).
- the sample format enables an adaptation to different value ranges.
- nine sample formats are predefined:

  ASF code | Format
  0000 | Unsigned integer 8 bit
  0001 | Signed integer 8 bit
  0010 | Signed integer 16 bit
  0011 | Signed integer 24 bit
  0100 | Signed integer 32 bit
  0101 | Signed integer 64 bit
  0110 | Float 32 bit (binary single prec.)
  0111 | Float 64 bit (binary double prec.)
  1000 | Float 128 bit (binary quad prec.)
  1001-1111 | Reserved
- AIB: If ASF is specified as an integer format, the number AIB of invalid bits can mask the lowest bits within the ASF integer word. AIB is coded as a 5 bit unsigned integer value, so that up to 31 bits can be marked as invalid. Valid bits start at the MSB. Note that the AIB value must be less than the ASF integer word length.
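Masking the AIB lowest (invalid) bits of an integer sample, as described above, might be sketched as (function name illustrative):

```python
def mask_invalid_bits(sample: int, aib: int, word_length: int) -> int:
    """Clear the AIB lowest (invalid) bits of an unsigned integer sample;
    valid bits start at the MSB. AIB is a 5-bit value (0..31) and must
    be smaller than the ASF integer word length."""
    if not 0 <= aib < word_length or aib > 31:
        raise ValueError("AIB out of range")
    # keep the word_length-bit window, then clear the aib lowest bits
    mask = ((1 << word_length) - 1) & ~((1 << aib) - 1)
    return sample & mask
```

For an 8-bit sample with AIB = 4, the lower nibble is zeroed while the upper nibble is preserved.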
- the rate at which the input data x i ( k ) are sampled is coded as an unsigned integer.
- FSM: If FSM is cleared, the following 31 bits for FS represent the file size in bytes. If FSM is set, FS represents the total number of data packets in the actual frame.
- the frame size number FS is to be interpreted in view of the FSM flag's value. Depending on the application, the frame size can vary from frame to frame.
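A sketch of packing the FSM flag together with the 31-bit FS value into one 32-bit word; placing FSM in the most significant bit is an assumption, since the exact bit layout is not spelled out here:

```python
def pack_fsm_fs(fsm: bool, fs: int) -> int:
    """Pack the FSM flag and the 31-bit FS value into one 32-bit word
    (FSM in the MSB is an assumed layout).

    FSM cleared: FS is the file size in bytes.
    FSM set:     FS is the total number of data packets in the frame."""
    if not 0 <= fs < 2 ** 31:
        raise ValueError("FS must fit into 31 bits")
    return (int(fsm) << 31) | fs

def unpack_fsm_fs(word: int):
    """Recover (FSM, FS) from the packed 32-bit word."""
    return bool(word >> 31), word & 0x7FFFFFFF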
- a 'frame' can contain several equal-length packets, wherein the last packet can have a different length that is described in the individual Ambisonics header. Every packet may use such a header for describing, at the end of a frame, length values that differ from prior packet lengths.
- bits in front of APL are reserved. This enables an extension of the individual header, e.g. by packet related flags, and a 32 bit alignment for the following Ambisonics coefficients.
- the maximum length is 65535.
- the payload data type is defined in the data field PLFS (RTP Payload Frame Status), cf. Fig. 5 .
- 'pure' Ambisonics data or 'pure' metadata can be arranged.
- the transmission processing operates in a sequential manner, i.e. at each transmission clock step (which is totally different from the sampling rate) only 32 or 64 bits of a data packet can be dealt with.
- the number of considered Ambisonics samples in one data packet is related to one concatenated sample time or to a group of concatenated sample times.
- the following examples of payload data show different dimensions, orders, and Ambisonics coefficients based on the encoder/decoder cases 2 to 4 of Fig. 3 .
- the first index x of A( x , y ) describes the sequence number for a specific order, whereas the second index y stands for the sample time k in a data packet.
- SMPTE MXF and XML are pre-defined.
- AMT code | Format
  0x00 | SMPTE MXF
  0x80 | XML
  0x01-0x7F | Reserved
  0x81-0xFF | Reserved
- This data field is followed by specific metadata. If possible the metadata descriptions should be kept simple in order to get only one metadata packet in the 'begin packet' of a frame. However, the packet length in bytes is the same as for Ambisonics coefficients. If the amount of metadata will exceed this packet length, the metadata has to be fragmented into several packets which shall be inserted between packets with Ambisonics coefficients. If the metadata amount in bytes in one packet is less than the regular packet length, the remaining packet bytes are to be padded with '0' or stuffing bits.
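The fragmentation-and-padding rule for metadata can be sketched as follows (function name illustrative; zero padding is used, one of the two options the text allows):

```python
def pad_metadata(metadata: bytes, packet_length: int):
    """Fragment metadata into packets of the regular packet length and
    pad the last packet with '0' bytes, as the text requires."""
    packets = [metadata[i:i + packet_length]
               for i in range(0, len(metadata), packet_length)] or [b""]
    # pad the (possibly only) last packet up to the regular packet length
    packets[-1] = packets[-1].ljust(packet_length, b"\x00")
    return packets
```

Metadata that fits into one packet yields a single zero-padded packet; larger metadata is split into several full packets plus a padded remainder.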
- the encapsulated CRC word at the end of each Ethernet packet should be used.
- the content addressable memories CAM detect all protocol data which will lead to a decision about how the received data are to be processed in the following steps or stages, and the registers REG store information about the length of the payload data.
- the parser evaluates the header data in a hierarchical manner and can be implemented in hardware or software, according to any real-time requirements.
- spherical waves SPW or plane waves PW can occur together in one application, e.g. the worldwide live broadcast of a concert in 3D format, wherein all receiving units are arranged in cinemas.
- the individual signals are to be transmitted separately so that a correct presentation can be facilitated.
- the parser can distinguish this and supply two separate 'distance coding' units with the corresponding data items.
- the inventive Ambisonics decoder depicted in Fig. 4 can process all these signals, whereas in the prior art several decoders would be required. I.e., considering the Ambisonics wave type facilitates the advantages described above.
Abstract
Audio signal datastreams for 2D presentation are channel oriented. Due to 3D video in cinema and broadcasting, spatial or 3D audio becomes attractive. Ambisonics coding/decoding provides a sound field description that is independent from any specific loudspeaker set-up. The inventive system facilitates real-time transmission of Ambisonics of an order higher than '3' as well as single microphone signals. The transmitted 3D sound field can be adapted at receiver side to the available positions of loudspeakers. The Ambisonics header information level is adaptable between a simple and an encoder related mode enabling fast decoder modifications. The Ambisonics processing is based on linear operators, i.e. the Ambisonics channels data can be packed and transmitted singly or in an assembled manner as a matrix.
Description
- The invention relates to a method and to an apparatus for generating and for decoding sound field data including Ambisonics sound field data of an order higher than three, wherein for encoding and for decoding different processing paths can be used.
- Traditional audio data signal transport streams for 2D presentation are channel oriented. 2D presentations include formats like stereo or surround sound, and are based on audio container formats like WAV and BWF (Broadcast Wave Format). The wave format WAV is described in Microsoft, "Multiple Channel Audio Data and WAVE Files", updated March 7, 2007, http://www.microsoft.com/whdc/device/audio/multichaud.mspx , and in http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html , last update 19 June 2006.
- Improved surround systems require an increasing number of loudspeakers or audio channels, which leads to an extension of these audio container formats.
- Due to the upcoming 3D video activities in cinema and broadcasting, spatial or 3D audio becomes more and more attractive. Nevertheless, descriptions of spatial audio scenes are significantly more complex than in existing 2D surround sound systems. Well-known descriptions are based on Wave Field Synthesis (WFS, cf. WO2004/047485 A1 ) as well as on Ambisonics, which was already developed in the early 1970s: http://en.wikipedia.org/wiki/Ambisonics .
- WFS combines a high number of spherical sound sources for emulating plane waves from different directions. Therefore, a lot of loudspeakers or audio channels are required. A description contains a number of source signals as well as their specific positions.
- Ambisonics, however, uses specific coefficients based on spherical harmonics for providing a sound field description that is independent from any specific loudspeaker set-up. This leads to a description which does not require information about loudspeaker positions during sound field recording or generation of synthetic scenes. The reproduction accuracy in an Ambisonics system can be modified by its order N. The 'higher-order Ambisonics' (HOA) description considers an order of more than one, and the focus in this application is on HOA.
- By that order the number of required audio information channels can be determined for a 2D or a 3D system, because this depends on the number of spherical harmonic bases. The number O of channels is for 2D: O = 2·N + 1, and for 3D: O = (N+1)². Besides true 2D or 3D cases, 'mixed orders' have different orders in 2D (x-y plane only) and 3D (additionally z axis).
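The two channel-count formulas can be captured in a small helper (illustrative, not part of the patent text):

```python
def num_channels(order: int, dimension: str) -> int:
    """Number O of Ambisonics channels for a given order N.

    2D: O = 2*N + 1   (circular harmonics)
    3D: O = (N + 1)**2 (spherical harmonics)
    """
    if dimension == "2D":
        return 2 * order + 1
    if dimension == "3D":
        return (order + 1) ** 2
    raise ValueError("dimension must be '2D' or '3D'")
```

This reproduces the figures quoted elsewhere in the text: the first-order B-Format needs 3 channels (2D) or 4 channels (3D), and the extended B-format with order 3 needs at most 16 channels.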
- The first-order B-format uses three channels for 2D and four channels for 3D. The first-order B-format has been extended to the higher-order B-format. Depending on O, a horizontal (2D), a full-sphere (3D), or a mixed sound field type description can be generated. By ignoring the appropriate channels, this B-format is backward compatible, i.e. a 2D Ambisonics receiver is able to decode the 2D components from a 3D Ambisonics sound field. The extended B-format for HOA considers orders up to three only, which corresponds to a maximum of 16 channels.
- The older UHJ-format was introduced to enable mono and stereo compatibility. The G-format was introduced to reproduce sound scenarios in 5.1 environments.
- However, all these existing formats do not consider orders of more than three.
- The WAVE_FORMAT_EXTENSIBLE format is an extension of the above-mentioned WAV format. One application is the use of the Ambisonics B-format in the WAVEX description: "Wave Format Extensible and the .amb suffix or WAVEX and Ambisonics", http://mchapman.com/amb/wavex .
- As mentioned above, known Ambisonics formats do not consider orders of more than three.
- Wave-based audio format descriptions are used in different applications. An environment which is very important today and will become even more important in the future is internet applications based on Ethernet transmission protocols. However, a data structure for Ambisonics transmission that is able to use the above-mentioned B-format as well as additional features like the Ambisonics order and the coefficients' bit lengths in an efficient manner is not yet known to the applicant.
- Another aspect is that in the B-format, plane waves are always assumed for the sound sources. For a higher quality of the acoustic wave field reproduction, a more realistic view should emulate the sound sources as spherical waves. However, spherical waves introduce more complex frequency dependencies than plane waves.
- Furthermore, a transmission of video content is in many cases combined with audio content transmission. Existing streaming data structures, e.g. for cinema applications, consider 2D surround sound only, for example WAV or AIFF (Audio Interchange File Format).
- A combined real-time transmission of video and audio format that is based on an extended 'Real-Time Protocol' (RTP) has been published in H. Schulzrinne et al., "RFC3550-RTP: A Transport Protocol for Real-Time Applications", Columbia University, http://www.faqs.org/rfcs/rfc3550.html, July 2003, in particular sections 5.1 and 5.3.1. The standard RTP header uses 12 data octets (8-bit data fields) in every RTP packet as depicted in
Fig. 1. In EP 1936908 A1 an extension for such an RTP header is proposed, for additionally encapsulating an extended RTP header and DPX (Digital Moving-Picture Exchange) data, AIFF/BWF audio data, or metadata, as depicted in Fig. 2. - This payload header extends the RTP header of
Fig. 1 by a 2-octet extended sequence number and a 2-octet extended time stamp. Furthermore, one octet for flags and a reserved field, followed by a 3-octet SMPTE time stamp and a 4-octet offset value, is proposed therein. The 32-bit aligned payload data follows the header data. - A problem to be solved by the invention is to provide a data structure (i.e. a protocol layer) for 3D higher-order Ambisonics sound field description formats, which can be used for real-time transmission over Ethernet. This problem is solved by the encoding method disclosed in
claim 1 and the decoding method disclosed in claim 3. Apparatuses which utilise these methods are disclosed in further claims. - The data structures described below facilitate real-time transmission of 3D sound field descriptions over Ethernet. From the content of additional metadata the transmitted 3D sound field can be adapted at receiver side to the available headphones or to the number and positions of loudspeakers, for regular as well as for irregular set-ups. Unlike in WFS, no regular loudspeaker set-ups including a large number of loudspeakers are required.
- Advantageously, in the inventive transmission data structure the sound quality level can be adapted to the available sound reproduction system, e.g. by mapping a 3D Ambisonics sound field description onto a 2D loudspeaker set-up. Advantageously, the inventive format enables Ambisonics orders up to N=255, whereas known Ambisonics formats allow orders up to N=3 only.
- Further, the inventive data structure considers single microphones or microphone arrays as well as virtual acoustical sources with different accuracies and sample rates. Advantageously, moving sources (i.e. sources with time-dependent spatial positions) are considered in the Ambisonics descriptions inherently.
- The Ambisonics header information level is adaptable between a simple mode and an encoder-related mode. The latter enables fast decoder modifications, which is useful especially for real-time applications.
- The proposed data structure is extendable for classical audio scene descriptions, i.e. sound sources and their positions.
- Generally, the inventive Ambisonics processing is based on linear operators, i.e. the Ambisonics channel data can be packed and transmitted individually or in an assembled manner as a matrix.
- In principle, the inventive encoding method is suited for generating sound field data including Ambisonics sound field data of an order higher than three, said method including the steps:
- receiving S input signals x (k) from a microphone array including M microphones, and/or from one or more virtual sound sources;
- multiplying said input signals x (k) with a matrix Ψ, so as to provide coefficients vector data d (k), an Ambisonics order value N and a normalisation parameter Norm;
- processing said coefficients vector data d (k), value N and parameter Norm in one or two or more of the following four paths:
- a) combining said coefficients vector data d (k), said value N and said parameter Norm with radii data RS representing the distances of the sources of said S input signals x(k);
- b) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N and radii Rm values, said radii Rm values representing individual microphone radii in a microphone array, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A(k), said parameter Norm and said order N value;
- c) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a radius Rref value, said radius Rref value representing a mean radius of loudspeakers arranged at decoder side, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said radius Rref value;
- d) based on plane waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a Plane Wave parameter, so as to compensate for non-linear frequency dependency, followed by normalising for plane waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said Plane Wave parameter;
- in case a processing took place in two or more of said paths, multiplexing the corresponding data;
- output of data frames including said provided data and values.
- In principle the inventive encoder apparatus is suited for generating sound field data including Ambisonics sound field data of an order higher than three, said apparatus including:
- means being adapted for multiplying S input signals x (k), which are received from a microphone array including M microphones and/or from one or more virtual sound sources, with a matrix Ψ, so as to provide coefficients vector data d (k), an Ambisonics order value N and a normalisation parameter Norm;
- means being adapted for processing said coefficients vector data d (k), value N and parameter Norm in one or two or more of the following four paths:
- a) combining said coefficients vector data d(k), said value N and said parameter Norm with radii data RS representing the distances of the sources of said S input signals x(k);
- b) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N and radii Rm values, said radii Rm values representing individual microphone radii in a microphone array, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm and said order N value;
- c) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a radius Rref value, said radius Rref value representing a mean radius of loudspeakers arranged at decoder side, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said radius Rref value;
- d) based on plane waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a Plane Wave parameter, so as to compensate for non-linear frequency dependency, followed by normalising for plane waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said Plane Wave parameter;
- a multiplexer means for multiplexing the corresponding data in case a processing took place in two or more of said paths, which multiplexer means provide data frames including said provided data and values.
- In principle, the inventive decoding method is suited for decoding sound field data that were encoded according to the above encoding method using one or two or more of said paths, said method including the steps:
- parsing the incoming encoded data, determining the type or types a) to d) of said paths used for said encoding and providing the further data required for a decoding according to the encoding path type or types;
- performing a corresponding decoding processing for one or two or more of the paths a) to d):
- a) based on spherical waves, filtering the received coefficients vector data d (k) in dependency from said radii data RS so as to provide filtered coefficients A (k),
and distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm; - b) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm; - c) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and said radius value Rref for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm; - d) based on plane waves, providing said filtered coefficients A (k), order value N, parameter Norm and a flag for Plane Waves;
- in case a processing took place in two or more of said paths, multiplexing the corresponding data, wherein the selected path or paths are determined based on parameter Norm, order value N and said Plane Waves flag;
- decoding said distance encoded filtered coefficients A'(k) or said filtered coefficients A(k), respectively, in dependency from said parameter Norm, said order value N and said loudspeaker direction values Ω l , so as to provide loudspeaker signals for a loudspeaker array.
- In principle the inventive decoder apparatus is suited for decoding sound field data that were encoded according to the above encoding method using one or two or more of said paths, said apparatus including:
- means being adapted for parsing the incoming encoded data, and for determining the type or types a) to d) of said paths used for said encoding and for providing the further data required for a decoding according to the encoding path type or types;
- means being adapted for performing a corresponding decoding processing for one or two or more of the paths a) to d):
- a) based on spherical waves, filtering the received coefficients vector data d (k) in dependency from said radii data RS so as to provide filtered coefficients A (k),
and distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A'(k) together with loudspeaker direction values Ω l , value N and parameter Norm; - b) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm; - c) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and said radius value Rref for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm; - d) based on plane waves, providing said filtered coefficients A (k), order value N, parameter Norm and a flag for Plane Waves;
- multiplexing means which, in case a processing took place in two or more of said paths, select the corresponding data to be combined, based on parameter Norm, order value N and said Plane Waves flag;
- decoding means which decode said distance encoded filtered coefficients A'(k) or said filtered coefficients A(k), respectively, in dependency from said parameter Norm, said order value N and said loudspeaker direction values Ω l , so as to provide loudspeaker signals for a loudspeaker array.
- Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
- Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
- Fig. 1
- Known RTP header format;
- Fig. 2
- Known extended RTP header format encapsulating DPX data, audio data or metadata;
- Fig. 3
- Ambisonics encoder facilitating different applications at production side before Ambisonics coefficients and metadata are transmitted;
- Fig. 4
- Ambisonics decoder facilitating different applications at reproduction side following reception of Ambisonics coefficients and metadata;
- Fig. 5
- RTP payload header extension for Ambisonics data according to the invention;
- Fig. 6
- General Ambisonics data header;
- Fig. 7
- Individual Ambisonics data header;
- Fig. 8
- Ambisonics metadata;
- Fig. 9
- Ambisonics receiver parser.
- At first, different scenarios for sound recording or production as well as for reproduction are considered in order to derive the inventive Ethernet/IP based streaming data format. The description of these scenarios is based at production side on an Ambisonics encoder (AE) and at reproduction side on an Ambisonics decoder (AD).
- In an Ambisonics encoder as shown in
Fig. 3 there are two different kinds of possible input signals: - a
microphone array 31 including M microphones, i.e. real sound sources; - V
virtual sources 32, i.e. synthetic sounds. - For an HOA description of a source not only the time-dependent source signal s(t) is required but also its position, which may move around and is time-dependent, too. The source position can be described by its spherical coordinates, i.e. the radius rS from the origin to the source and the angles (Θ S , Φ S ) = Ω S , where Θ S denotes the inclination and Φ S denotes the azimuth angle in the x,y plane.
- In a first step or
multiplier 33, all S source signals x (k) at each sample time kT, i.e. virtual single sources as well as microphone array sources, are multiplied with a matrix Ψ defined in Eq.(1). - Matrix Ψ with O rows and S columns performs a direction coding because Ψ contains the spherical harmonics
of order n in the range 0,...,N, with the values of m running from -n to +n.
- Matrix Ψ is used to output a vector d (k) of O Ambisonics signals for every sample time instant k, as defined in Eq. (2) and Eq. (3):
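As an aside, this directional encoding can be sketched for the simplest non-trivial case. The following Python sketch assumes first-order (N=1) real spherical harmonics in ACN channel ordering with SN3D scaling, a convention chosen here purely for illustration (the text does not mandate one), and computes d(k) = Ψ·x(k) for plane-wave source directions:

```python
import math

def sph_harm_first_order(azimuth: float, inclination: float) -> list:
    """Real first-order spherical harmonics, ACN order (W, Y, Z, X) with SN3D
    scaling. This convention is an assumption for illustration only."""
    elevation = math.pi / 2 - inclination  # inclination is measured from the z axis
    return [
        1.0,                                      # n=0, m=0  (W)
        math.sin(azimuth) * math.cos(elevation),  # n=1, m=-1 (Y)
        math.sin(elevation),                      # n=1, m=0  (Z)
        math.cos(azimuth) * math.cos(elevation),  # n=1, m=+1 (X)
    ]

def encode(sources, samples):
    """d(k) = Psi * x(k): matrix Psi has O = (N+1)^2 = 4 rows and S columns,
    one column of spherical harmonics per source direction."""
    psi_columns = [sph_harm_first_order(az, inc) for az, inc in sources]
    return [sum(col[o] * x for col, x in zip(psi_columns, samples))
            for o in range(4)]

# A single unit-amplitude source straight ahead (azimuth 0, inclination pi/2):
d = encode([(0.0, math.pi / 2)], [1.0])
print(d)  # [1.0, 0.0, 0.0, 1.0] -> only the W and X channels are excited
```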
These signals represent the complete sound field description that has to be transmitted to the reproduction side. Vector d (k) contains the directional information only. However, the distances of all sources over a specific frequency range are to be considered, too, and the frequency behaviour or dependency is non-linear. Therefore additional filters 341 to 343 described below are used.
- The pressure of a sound field p(r,Θ,Φ,kω) can be calculated as follows: p(r,Θ,Φ,kω) = Σ(n=0..N) Σ(m=-n..+n) A_n^m(kω)·j_n(kω·r)·Y_n^m(Θ,Φ), wherein j_n denote the spherical Bessel functions of the first kind and Y_n^m the spherical harmonics.
- All these dependencies lead to the following four cases that are to be considered for an extended transmission of Ambisonics coefficients based on RTP.
Fig. 3 shows a block diagram of an Ambisonics encoder for these four cases at production side. The required functions are represented by corresponding steps or stages in front of the transmission. All processing steps are clocked by a frequency that is made synchronous in stage 38 with the sample frequency 1/T. A controller 37 receives a mode selection signal and the value of order N, and controls an optional multiplexer 36 that receives the filter responses and the output signal of multiplier 33, and outputs the inventive data structure frames 39. Multiplier 33 represents a directional encoder providing corresponding coefficients and outputs the unfiltered vector data d (k), the order N value, and parameter Norm. - An array response filter 42 ('Filter 1' in
Fig. 4) for the microphone sources data only can be arranged at decoder side. The unfiltered vector data d (k), the order N value, and parameter Norm are assembled in a combiner 340 with radii data RS (t), and are fed to an optional multiplexer 36. Radii data RS (t) represent the distances of the audio sources of the S input signals x(k), and refer to microphones as well as to artificially generated virtual sound sources. - The coefficients vector data d (k) pass through an
array response filter 341 for the microphone sources (filter 2). The filtering compensates for the microphone-array response and is based on Bessel or Hankel functions. Basically, the signals from the output vectors d (k) are filtered. The other inputs serve as parameters for the filter, e.g. parameter R is used for the term k*r. The filtering is relevant only for microphones that have the individual radius Rm. Such radii are taken into consideration in the term k*r of the Bessel or Hankel functions. Normally, the amplitude response of the filter starts with a lowpass characteristic but increases for higher frequencies. The filtering is performed in dependency from the Ambisonics order N, the order n and the radii Rm values, so as to compensate for non-linear frequency dependency. A subsequent normalisation step or stage 351 for spherical waves data provides filtered coefficients A (k). It is assumed that there is also a corresponding filter at reproduction side (filter 431 in Fig. 4). The filtered and normalised coefficients A (k), parameter Norm and the order N value are fed to multiplexer 36. - The coefficients vector data d (k) pass through an array response filter 342 for the microphone sources (filter 3). The filtering is performed in dependency from said Ambisonics order N, said order n, the radii Rm values and a radius Rref value representing the average radius Rref of the loudspeakers at decoder side as described in the below section "Radius Rref (RREF)", so as to compensate for non-linear frequency dependency. In case microphone signals are used, a filter for spherical waves data is also arranged at reproduction side. Then the average radius Rref of the loudspeakers has to be considered already in filter 342. A subsequent normalisation step or
stage 352 for spherical waves data provides filtered coefficients A (k). Step/stage 352 can include a distance coding like that described in connection with Fig. 4. The filtered coefficients A (k) from step/stage 352, parameter Norm, the order N value and radius value Rref are fed to multiplexer 36. - The coefficients vector data d (k) pass through an
array response filter 343 for the microphone sources (filter 4). The filtering is performed in dependency from the Ambisonics order N, the radii Rm values and a Plane Wave parameter. A subsequent normalisation step or stage 353 for plane waves data provides the filtered coefficients A (k), parameter Norm, the order N value and a flag for Plane Wave to multiplexer 36. - The Ambisonics encoder can code the output signals 361 in any one of these paths, in any two of these paths, or in more than two of these paths.
The normalisation steps or stages 351 to 353 can use a normalisation or scaling as described below in section "Ambisonics Normalisation/Scaling Format (ANSF)". - Following transmission of the values mentioned above, e.g. via an Ethernet connection, at reproduction side the Ambisonics decoder depicted in
Fig. 4 parses the incoming data structures in a parser 41 in order to detect the case type and to provide the data for performing the appropriate functions. An example for such a parser is disclosed in WO 2009/106637 A1. - Unfiltered vector data d (k), order value N, parameter Norm and each radii data RS (t) are parsed. These values pass through an array response filter 42 (Filter 1) for filtering (a filtering as described in
Fig. 3) the received d (k) data under consideration of all radii RS (t). The resulting filtered coefficients A (k) are distance coded (DC) in a distance coding step or stage 431 for all loudspeaker radii RLS and order N, and pass thereafter together with loudspeaker direction values Ωl (representing the directions of the LS loudspeakers 46), value N and parameter Norm through an optional multiplexer 44 to a panning or pseudo inverse step or stage 45. Distance coding means taking into account Bessel or Hankel functions with the radii parameter in the term k*r for plane or spherical waves. Examples of distance coding are published in M.A. Poletti, "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics", J. Audio Eng. Soc., vol. 53, no. 11, November 2005, e.g. in equations (31) and (32), and in J. Daniel, "Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format", AES 23rd Int. Conf., Copenhagen, Denmark, 23-25 May 2003. - Filtered coefficients A (k), parameter Norm and order value N are parsed. The filtered coefficients A (k) are distance coded (DC) in a distance coding step or
stage 432 for all loudspeaker radii RLS and order N, and pass thereafter together with loudspeaker direction values Ω l , value N and parameter Norm through multiplexer 44 to the panning or pseudo inverse step or stage 45. Spherical waves on AE and AD sides are assumed. - Filtered coefficients A(k), order value N, parameter Norm and radius value Rref are parsed. The filtered coefficients A (k) are distance coded (DC) in a distance coding step or
stage 432 for all loudspeaker radii RLS and order N under consideration of radius Rref, and pass thereafter together with loudspeaker direction values Ω l , value N and parameter Norm through multiplexer 44 to the panning or pseudo inverse step or stage 45. Spherical waves on AE and AD sides are assumed. - Filtered coefficients A (k), order value N, parameter Norm and a flag for Plane Waves are parsed. The filtered coefficients A (k) together with loudspeaker direction values Ω l , value N and parameter Norm pass through multiplexer 44 to the panning or pseudo inverse step or
stage 45. Plane waves on AE and AD sides are assumed. - Based on parameter Norm, order value N and the Plane Waves flag, a
mode selector 47 selects in multiplexer 44 the corresponding path or paths a) to d) which was or were used at encoder side. Decoder 45, which represents a panning or a mode matching operation including a pseudo inverse, inverts the matrix Ψ operation of the Ambisonics encoder in Fig. 3, and applies this operation to the filtered coefficients A(k) or the filtered and distance coded coefficients A'(k), respectively, in dependency from the parameter Norm, order value N and the loudspeaker direction values Ω l , and provides the l loudspeaker signals for a loudspeaker array 46. The matrix Ψ operation is inverted for cases 1-3 by w l (k)=D·A'(k), and for case 4 by w l (k)=D·A(k). Parser 41 also provides synchronisation information that is used for re-synchronisation of a clock 48. - The invention specifies a packet-based streaming format for encapsulating spatial sound field descriptions based on Ambisonics into an extended real-time transport protocol, in particular RTP, for real-time streaming of spatial audio scenes. The focus is on a standalone spatial (2D/3D) audio real-time application, e.g. a transmission of a live concert or a live sport event via IP. This requires a specific spatial audio layer including time stamps and possibly synchronisation information. The Ambisonics real-time stream can be used together with an RTP layer. In addition, alternative RTP layers with or without extended headers are described below.
- In general, for a spatial audio transmission a sound field description in Ambisonics can be used in which possible sound source positions are inherently encoded. An alternative is the transmission of the source signals together with their time-dependent or time-independent positions. A switching possibility between these two alternatives is provided, too, but the directly following section will focus on Ambisonics.
- Ethernet transmissions (e.g. via internet) are performed in data packets with a typical packet length called 'path MTU' of up to 1500 or 9000 bytes. In case Ambisonics sound fields are to be transmitted via Ethernet, such relatively small data packets are not large enough. Therefore, several packets can be combined in larger containers named 'frames'. Such a frame represents a dedicated time interval within which a typical number of packets is transmitted. For example in video applications, in 1080p video mode a frame contains 1080 data packets, each of which describes one line of a complete video frame. Especially for real-time applications, even for audio (where low latency and low packet loss are important), a transmission should be frame based.
- Because Ambisonics supports a sound field description independent of positions but with an adaptable quality, different amounts of data per packet or frame are possible. However, the number of octets in a data packet shall always be the same within a frame, except for the last packet. In principle, the RTP sequence number is to be incremented with each packet.
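A sketch of this fragmentation rule (the payload size per packet is a free choice below the path MTU; the concrete numbers used here are illustrative, not mandated by the text):

```python
import math

def fragment(frame_payload: bytes, packet_payload_size: int):
    """Split one frame into equally sized packets; only the last one may be
    shorter. Returns (sequence_number, chunk) pairs with consecutive numbers."""
    count = max(1, math.ceil(len(frame_payload) / packet_payload_size))
    return [(seq, frame_payload[seq * packet_payload_size:(seq + 1) * packet_payload_size])
            for seq in range(count)]

# 1452 is an illustrative payload size below a 1500-byte path MTU.
packets = fragment(bytes(10000), 1452)
print(len(packets), len(packets[0][1]), len(packets[-1][1]))  # 7 1452 1288
```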
- With regard to
Fig. 3 and Fig. 4, Case 1 requires a transmission of the time-dependent radii RS (t). This is an option if the filter processing is to be performed in the decoder. However, in the following section the focus is on Cases 2-4, in which the filtered coefficients A (k) are transmitted. This allows a higher bandwidth because the transmission remains independent from all source positions, i.e. it is better suited for Ambisonics.
A standard RTP header (cf.Fig. 1 ) containing the following bit fields:
Version (V) - 2 bit
RTP Version (default is V=2)
Padding (P) - 1 bit
If set, a data packet will contain several additional padding bytes. These are always located at the end following the payload. The last padding byte contains a count of how many padding bytes are to be ignored.
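This padding rule can be sketched as follows; the 4-octet alignment is an assumed example value, and, as in RFC 3550, the count octet itself is included in the number of padding bytes:

```python
def pad(payload: bytes, align: int = 4) -> bytes:
    """Append padding so the packet length is a multiple of 'align'.
    The last padding octet carries the number of padding octets, itself included."""
    n = -len(payload) % align
    if n == 0:
        n = align                 # at least one octet is needed to hold the count
    return payload + bytes(n - 1) + bytes([n])

def unpad(packet: bytes) -> bytes:
    """Strip the padding indicated by the last octet (used when the P bit is set)."""
    return packet[:-packet[-1]]

padded = pad(b"abcde")
print(len(padded), unpad(padded))  # 8 b'abcde'
```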
Extension (X) - 1 bit
If set, the fixed header is followed by exactly one header extension.
CSRC count (CC) - 4 bit
The number of contributing source (CSRC) identifiers following the fixed header.
Marker (M) - 1 bit
In general, the marker bit can be defined by a profile. Here, it signalises the end of a frame, i.e. it is set for the last data packet. For other packets it must be cleared.
Payload Type (PT) - 7 bits
The payload type is defined as EASF for an audio standalone transmission. For a combined transmission with uncompressed video the film format is chosen, e.g. DPX.
Sequence Number - 16 bits
The least significant bits of the sequence number. It increments by one for each RTP data packet sent, and may be used by the receiver for detecting packet loss and for restoring the packet sequence. The initial value of the sequence number is random (i.e. unpredictable) in order to make known-plaintext attacks on encryption more difficult. The standard 16-bit sequence number is augmented with another 16 bits in the payload header in order to avoid problems due to wrap-around when operating at high data rates.
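A sketch of how a receiver may combine the two 16-bit fields into one 32-bit sequence number (the field placement follows the description above: the standard RTP field carries the low bits, the payload header the high bits):

```python
def full_sequence_number(rtp_seq: int, ext_seq: int) -> int:
    """Combine the 16-bit RTP sequence number (LSBs) with the 16-bit
    extended sequence number from the payload header (MSBs) into 32 bits."""
    return (ext_seq << 16) | (rtp_seq & 0xFFFF)

# The standard field wraps from 0xFFFF to 0x0000 while the extension increments:
print(hex(full_sequence_number(0xFFFF, 0x0001)))  # 0x1ffff
print(hex(full_sequence_number(0x0000, 0x0002)))  # 0x20000
```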
Timestamp - 32 bits
The timestamp denotes the sampling instant of the frame to which the RTP packet belongs. Packets belonging to the same frame must have the same timestamp.
RTP payload header extension
According to the invention, the fields of the known RTP header keep their usual meaning, but that header is amended as follows:
RTP Payload Frame Status (PLFS) - 2 bit
The frame status describes which type of data will follow the extended RTP header in the payload block:

PLFS code   Payload type
00          Ambisonics coefficients
01          Frame end (+ Ambisonics coefficients)
10          Frame begin (+ Metadata)
11          Metadata

- Time Code/Sync Frequency (TCSF) - 30 bit unsigned integer
The following SMPTE time code or the synchronisation is based on a specific clock frequency, the Time Code/Sync Frequency TCSF. In order to support a large range of frequencies, the TCSF is defined as a 30 bit integer field. The value is represented in Hz and leads to a frequency range from 0 to 2^30-1 Hz (about 1073.74 MHz), wherein a value of 0 Hz signals that no time code is available.
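Assuming that the 2-bit PLFS field and the 30-bit TCSF field share one 32-bit word, with PLFS in the two most significant bits (an assumption about the exact bit placement, which the text does not state), the packing can be sketched as:

```python
PLFS_AMBISONICS_COEFFS = 0b00
PLFS_FRAME_END         = 0b01   # frame end (+ Ambisonics coefficients)
PLFS_FRAME_BEGIN       = 0b10   # frame begin (+ metadata)
PLFS_METADATA          = 0b11

def pack_plfs_tcsf(plfs: int, tcsf_hz: int) -> int:
    """Pack PLFS (2 bit) and TCSF (30 bit, clock frequency in Hz) into 32 bits."""
    assert 0 <= plfs <= 0b11 and 0 <= tcsf_hz < (1 << 30)
    return (plfs << 30) | tcsf_hz

def unpack_plfs_tcsf(word: int):
    return word >> 30, word & ((1 << 30) - 1)

# 48 kHz sync clock in a frame-begin packet; TCSF == 0 would mean "no time code".
word = pack_plfs_tcsf(PLFS_FRAME_BEGIN, 48000)
print(unpack_plfs_tcsf(word))  # (2, 48000)
```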
- The transmission of audio content is possible in different modes: in the form of Ambisonics sound field descriptions, or as sampled audio sources including their positions. The following table shows the AST values and their meaning.
AST code   Possible sources
00         Sound field
01         Sound sources + fixed positions
10         Sound sources + time-dependent positions
11         Reserved

- The selection in data field AST facilitates not only a separation within Ambisonics (cf. the example provided below in connection with
Fig. 9) but also the parallel transmission of differently encoded audio source signals (Ambisonics and/or PCM data + position data), i.e. the inventive protocol can be complemented e.g. for PCM data. The below-described SMPTE Time Code/Clock Sync Info (STCSI) facilitates the temporally correct assignment of the audio signal sources. - The dimension in case of existing and extendable formats is described as follows:
ADIM code   Dimension
0           2D
1           3D

- If XAH is cleared, the general Ambisonics header is transmitted only in the first data packet of a frame and the individual Ambisonics header is transmitted in all other data packets.
- If XAH is set, the general Ambisonics header shall also be available in every data packet in front of the individual Ambisonics header. This mode enables a modification of the parameters in each data packet, i.e. in real-time. It can be useful for real-time applications where no or only small buffers are available. However, this mode decreases the available bandwidth.
- Different sources can generate audio signals at the same time. Known protocols are based on a separate transmission of the sound sources, i.e. every data frame refers to a single temporal section in which, depending on the sampling frequency, several samples can be contained. Therefore, in known protocols, different source signals occurring at the same time instant will use the same time stamp and the same frame number. This poses no problem for offline processing, i.e. non-real-time processing: the transmitted data are buffered and assembled later on. However, this does not work for real-time processing in which a small latency is demanded. In the inventive protocol, the data field XAH allows the header to be carried along continuously, and the
parser 41 in Fig. 4 can switch back and forth block-by-block (or Ethernet packet-by-packet or frame-by-frame) between different audio source types. - Distinguishing between general header and individual header facilitates a real-time adaptation.
- If STS is cleared, the value in the 24 bit field STCSI (see below) represents the SMPTE time code. If STS is set, field STCSI contains user-specific synchronisation information.
- Reserved bits for future applications concerning the SMPTE time code or clock synchronisation.
- Identifies the SMPTE time code (hh:mm:ss:frfr = 6:6:6:6 bit), or synchronisation information for the local clocks of each source and sink. That synchronisation information format is user-dependent. It appears that this kind of synchronisation has not been used before for Ambisonics and video synchronisation.
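The 6:6:6:6 bit split of the 24-bit STCSI field stated above can be sketched as a simple pack/unpack pair (an illustrative helper, not part of the specification):

```python
# Sketch of packing an SMPTE time code (hh:mm:ss:frame) into the 24-bit
# STCSI field using the 6:6:6:6 bit split given in the text.

def pack_stcsi(hh: int, mm: int, ss: int, fr: int) -> int:
    for v in (hh, mm, ss, fr):
        if not 0 <= v < 64:              # each sub-field is 6 bits wide
            raise ValueError("sub-field exceeds 6 bits")
    return (hh << 18) | (mm << 12) | (ss << 6) | fr

def unpack_stcsi(word: int):
    return ((word >> 18) & 0x3F, (word >> 12) & 0x3F,
            (word >> 6) & 0x3F, word & 0x3F)

# A time code of 12:34:56, frame 07, survives a round trip:
assert unpack_stcsi(pack_stcsi(12, 34, 56, 7)) == (12, 34, 56, 7)
```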
- In a current frame the packet offset describes the distance in bytes from the first payload octet of the first data packet in the frame to the first payload octet of the current data packet. PAO(HIGH) represents the 32 MSBs and PAO(LOW) represents the 32 LSBs.
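Recombining the 64-bit packet offset from its two 32-bit halves, as described above, can be sketched as follows (illustrative helper names):

```python
# The byte distance is split into PAO(HIGH) (32 MSBs) and PAO(LOW)
# (32 LSBs); these helpers recombine and split a 64-bit offset.

def packet_offset(pao_high: int, pao_low: int) -> int:
    return (pao_high << 32) | (pao_low & 0xFFFFFFFF)

def split_offset(offset: int):
    return (offset >> 32) & 0xFFFFFFFF, offset & 0xFFFFFFFF

# An offset just beyond the 4 GB boundary needs the HIGH half:
offset = packet_offset(0x1, 0x200)
assert offset == 0x100000200
```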
- The above known and extended RTP header data are depicted in
Fig. 5 . PAO(LOW) is followed by the Ambisonics payload data. - Ambisonics payload data and Ambisonics header data shall be fragmented such that the resulting RTP data packet is smaller than the 'path MTU' mentioned above. In case of 10GE transmission the path MTU is a 'jumbo frame' of e.g. 9000 bytes. There are two types of Ambisonics headers. A small individual Ambisonics header is sent in front of each data packet. A general header contains source and encoder related information that can be useful for the Ambisonics decoder. It contains information that is valid for all data packets within a frame, and for small frames and/or data packets it can be sent once at the beginning of a frame. Especially for real-time applications where the packet information is changing frequently, it can be advantageous to send the general header with each data packet.
- The endianness used for the transmitted Ambisonics data.
AE code | Endianness
---|---
0 | Big Endian
1 | Little Endian

- Identifies the length of the complete header in bytes.
- Traditionally, Ambisonics assumes that all audio sources and loudspeakers provide plane waves for modelling the sound field. A typical example is the B-format. However, an extended Ambisonics sound field description with higher quality requires also a modelling with spherical waves. Therefore, the AWT field considers both possibilities.
AWT code | Wave type
---|---
0 | Plane wave
1 | Spherical wave

- Identifies the sequence in which the Ambisonics coefficients are transmitted. Up to 4 order types can be addressed. The different formats depend on the order and indexing in Eq. (1), i.e. how the spherical harmonics are ordered in a column of W. The existing Ambisonics B-format uses a specific sequence of Ambisonics coefficients according to Table 1, wherein K to Z denote known B-Format channels. In case of 3D the coefficients are transmitted from top to bottom in Table 1.
E.g. for degree n=2, the sequence will be WXYZRSTUV.

AFT code | Format
---|---
00 | B-Format order
01 | numerical upward
10 | numerical downward
11 | Reserved

Table 1:

Degree n | Order m | Channel
---|---|---
0 | 0 | W
1 | 1 | X
1 | -1 | Y
1 | 0 | Z
2 | 0 | R
2 | 1 | S
2 | -1 | T
2 | 2 | U
2 | -2 | V
3 | 0 | K
3 | 1 | L
3 | -1 | M
3 | 2 | N
3 | -2 | O
3 | 3 | P
3 | -3 | Q

- As an alternative, the sequence of each matrix column in Eq. (1) from top to bottom represents a numerical upward order type. The degree value always starts with 0 and runs up to the Ambisonics order N. For each degree n, the sequence starts with the lowest order -n and runs up to order +n. The downward type uses the reversed order for each degree.
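The numerical upward and downward order types described above can be sketched as follows (an illustrative helper; the B-format sequence follows its own fixed channel list from Table 1 and is not generated here):

```python
# For each degree n = 0..N, the upward type lists the orders m from
# lowest (-n) to highest (+n); the downward type reverses them per degree.

def numerical_order(N: int, downward: bool = False):
    seq = []
    for n in range(N + 1):
        orders = list(range(-n, n + 1))   # m runs from -n to +n
        if downward:
            orders.reverse()
        seq.extend((n, m) for m in orders)
    return seq

# Order N = 1, upward: degree 0, then degree 1 as m = -1, 0, +1
assert numerical_order(1) == [(0, 0), (1, -1), (1, 0), (1, 1)]
```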
- The Ambisonics order describes the quality of the Ambisonics encoding and decoding via Ψ. An order up to 255 should be sufficient. According to the audio dimension, the order is distinguished in horizontal and vertical direction.
In case of 2D, only AHO has a value greater than '0'. A mixed order can have different AHO and AVO values. - For possible extension of order related issues, these reserved bits are considered in front of AHO and AVO.
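Using the standard Ambisonics component counts (2N+1 coefficients for 2D, (N+1)^2 for full 3D of order N), the ADIM/AHO/AVO fields imply a payload size as sketched below. The helper is illustrative and deliberately leaves mixed-order (AHO ≠ AVO) layouts open, since the document does not specify their coefficient count:

```python
# Number of Ambisonics components implied by the dimension and order
# fields, using the standard counts for 2D and full 3D representations.

def component_count(adim: int, aho: int, avo: int) -> int:
    if adim == 0:                  # 2D: only the horizontal order is used
        return 2 * aho + 1
    if aho == avo:                 # full 3D of order N = AHO = AVO
        return (aho + 1) ** 2
    raise NotImplementedError("mixed-order layouts need a dedicated rule")

assert component_count(0, 3, 0) == 7    # 2D, order 3
assert component_count(1, 3, 3) == 16   # 3D, order 3: (3+1)^2
```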
- Ambisonics Normalisation/Scaling Format (ANSF) - 3 bit Identifies different normalisation formats typically used for Ambisonics. The normalisation corresponds to the orthogonality relationship between the spherical harmonics.
In case of dedicated scaling the scaling factors are fixed over one frame. The scaling factors will be transmitted only once in front of the Ambisonics coefficients.

ANF code | Format
---|---
000 | Orthonormal
001 | Schmidt semi-normalised
010 | 4π normalised
011 | Unnormalised
100 | Furse-Malham
101 | Dedicated scaling
11x | Reserved

- The reference radius Rref value of the loudspeakers in mm is required in case of spherical waves. The maximum radius depends on the acoustic wavelength λ, which can be calculated from the audible frequencies f (fLOW = 20 Hz to fHI = 20 kHz) and the speed of sound c = 340 m/s. Thus for the radius Rref, values from 17000 mm down to 17 mm are required, and a word length of 16 bit is sufficient for that.
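The Rref bounds quoted above follow directly from λ = c / f; a small arithmetic check:

```python
# Acoustic wavelength in millimetres for c = 340 m/s; over the audible
# band 20 Hz .. 20 kHz this spans 17000 mm down to 17 mm, so a 16-bit
# unsigned millimetre field (max 65535) is sufficient for Rref.

C = 340.0                          # speed of sound in m/s

def wavelength_mm(f_hz: float) -> float:
    return C * 1000.0 / f_hz       # metres converted to millimetres

assert wavelength_mm(20.0) == 17000.0
assert wavelength_mm(20000.0) == 17.0
assert wavelength_mm(20.0) <= 65535    # fits the 16-bit field
```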
- This code defines the word length as well as the format (integer/floating point) of the transmitted Ambisonics coefficients A (k). The sample format enables an adaptation to different value ranges. In the following table nine sample formats are predefined:
ASF code | Format
---|---
0000 | Unsigned integer 8 bit
0001 | Signed integer 8 bit
0010 | Signed integer 16 bit
0011 | Signed integer 24 bit
0100 | Signed integer 32 bit
0101 | Signed integer 64 bit
0110 | Float 32 bit (binary single prec.)
0111 | Float 64 bit (binary double prec.)
1000 | Float 128 bit (binary quad prec.)
1001-1111 | Reserved

- If ASF specifies an integer format, the number AIB of invalid bits masks the lowest bits within the ASF integer. AIB is coded as a 5 bit unsigned integer value, so that up to 31 bits can be marked as invalid. Valid bits start at the MSB. Note that the AIB value shall be less than the ASF integer word length.
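The AIB rule above can be sketched as a masking helper (illustrative names; the sample value is treated as an unsigned bit pattern):

```python
# For integer sample formats, the lowest AIB bits of each coefficient
# are invalid and can be masked off; valid bits start at the MSB.

def mask_invalid_bits(sample: int, word_length: int, aib: int) -> int:
    if not 0 <= aib < word_length:   # AIB must stay below the word length
        raise ValueError("invalid AIB for this sample format")
    mask = ((1 << word_length) - 1) & ~((1 << aib) - 1)
    return sample & mask

# 16-bit sample with 2 invalid LSBs: the low two bits are cleared
assert mask_invalid_bits(0x1237, 16, 2) == 0x1234
```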
- The rate at which the input data xi (k) are sampled. The value in Hz is coded as an unsigned integer.
- If FSM is cleared, the following 31 bits for FS represent the file size in bytes. If FSM is set, FS represents the total number of data packets in the actual frame.
- The frame size number FS is to be interpreted in view of the FSM flag's value. Depending on the application, the frame size can vary from frame to frame.
- As mentioned above, a frame represents a unit of several data packets. It is assumed that for uncompressed data all packets except the last one have the same length. Then the frame size in bytes can be calculated as #bytes per frame = (FS-1)*packet size + last packet size.
- Basic Ethernet applications normally use MTU sizes of 1500 bytes. Modern 10 Gigabit Ethernet applications consider larger MTUs (e.g. 'jumbo frames' with 9000 to 16000 bytes). To enable data sets larger than 2^32 bytes (4 GB), the frame size should be specified as a number of data packets. I.e., if a data packet contains 9000 bytes, the maximum frame size would be greater than 35 Tbyte.
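The frame-size arithmetic from the two paragraphs above as a small sketch:

```python
# All packets except the last share one length, and interpreting FS as a
# packet count (FSM set) lifts the 2**32-byte limit of a byte-based field.

def frame_bytes(fs_packets: int, packet_size: int, last_packet_size: int) -> int:
    # #bytes per frame = (FS - 1) * packet size + last packet size
    return (fs_packets - 1) * packet_size + last_packet_size

assert frame_bytes(4, 9000, 1200) == 28200
# With 9000-byte jumbo packets, 2**32 packets address more than 35 Tbyte:
assert (2**32) * 9000 > 35 * 2**40
```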
- The general Ambisonics data header in the Ambisonics payload data is depicted in
Fig. 6 . A 'frame' can contain several equal-length packets, wherein the last packet can have a different length that is described in the individual Ambisonics header. Every packet may use such a header for describing length values that differ from the preceding packet lengths. - The bits in front of APL are reserved. This enables an extension of the individual header, e.g. by packet related flags, and a 32 bit alignment for the following Ambisonics coefficients.
- Defines the MTU length for each individual data packet in bytes. The maximum length is 65535.
- This individual Ambisonics header is depicted in
Fig. 7 . If applied, the two data fields RSRVD and APL will follow data field FS in Fig. 6 . APL contains the length of the following Ethernet packet which contains payload data (Ambisonics components). - As mentioned above, the payload data type is defined in the data field PLFS (RTP Payload Frame Status), cf.
Fig. 5 . Following the general Ambisonics header (if present) and the individual Ambisonics header, 'pure' Ambisonics data or 'pure' metadata can be arranged. - Due to the time dependency of the input samples x(kT)= x(k) and of the directions and radii RS (t), it is important to perform the Ambisonics encoding and decoding with regard to the specific sample time kT, or even simpler at k.
- However, when considering a protocol based transmission, the transmission processing operates in a sequential manner, i.e. at each transmission clock step (which is totally different from the sampling rate) only 32 or 64 bits of a data packet can be dealt with. The number of considered Ambisonics samples in one data packet is related to one concatenated sample time or to a group of concatenated sample times.
- Normally, all Ambisonics coefficients have the same length across all data packets in a frame. However, if the general Ambisonics header is inserted in a normal data packet, the data parameters can be modified within a frame.
- The following examples of payload data show different dimensions, orders, and Ambisonics coefficients based on the encoder/
decoder cases 2 to 4 of Fig. 3 . The first index x of A(x,y) describes the sequence number for a specific order, whereas the second index y stands for the sample time k in a data packet.
Example 1: ADIM=1, AHO=AVO=3, ASF=2
Example 2: ADIM=1, AHO=AVO=2, ASF=3, AIB=2
Example 3: ADIM=1, AHO=AVO=2, ASF=4, AIB=7
Example 4: ADIM=1, AHO=AVO=1, ASF=4
Example 5: ADIM=1, AHO=AVO=1, ASF=7

- If PLFS is set to '10' (binary, i.e. decimal 2), metadata are transmitted instead of Ambisonics coefficients. For metadata, different formats exist, of which some are considered below. Thus, in front of the concrete metadata content, a metadata type defines the specific format as depicted in
Fig. 8 . The first two data fields RSRVD and APL are like in Fig. 7 . - The types SMPTE MXF and XML are pre-defined.
AMT code | Format
---|---
0x00 | SMPTE MXF
0x80 | XML
0x01-0x7F | Reserved
0x81-0xFF | Reserved

- Reserved bits for future applications concerning metadata.
- This data field is followed by the specific metadata. If possible, the metadata descriptions should be kept simple in order to get only one metadata packet in the 'begin packet' of a frame. However, the packet length in bytes is the same as for Ambisonics coefficients. If the amount of metadata exceeds this packet length, the metadata have to be fragmented into several packets which shall be inserted between packets with Ambisonics coefficients. If the metadata amount in bytes in one packet is less than the regular packet length, the remaining packet bytes are to be padded with '0' or stuffing bits.
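The fragmentation and zero-padding rule above can be sketched as follows (illustrative helper names):

```python
# Metadata are cut into chunks of the regular packet length; a short
# final chunk is padded with zero bytes up to that length.

def fragment_metadata(metadata: bytes, packet_len: int):
    packets = []
    for i in range(0, len(metadata), packet_len):
        chunk = metadata[i:i + packet_len]
        packets.append(chunk.ljust(packet_len, b"\x00"))  # '0' stuffing
    return packets

# 10 bytes of metadata at a 4-byte packet length yield three packets,
# the last one padded with two zero bytes.
pkts = fragment_metadata(b"\x01" * 10, 4)
assert len(pkts) == 3
```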
- For channel coding purposes the encapsulated CRC word at the end of each Ethernet packet should be used.
- At the production side as shown in
Fig. 3 , four different Cases are considered by the above-mentioned data structure (i.e. three cases where A (k) data are transmitted and one case where d (k) data are transmitted). The question is how to detect the Ambisonics Encoding/Decoding mode at reproduction or receiver side. The Case chosen at production side can be derived in parser 41 in Fig. 4 from the bit fields RREF and AFT. The following table shows the values for RREF and AFT and their meaning:

Mode | Payload data
---|---
2 | filtered A (k), RREF=0, AFT = Spherical Wave
3 | filtered A (k), RREF≠0, AFT = Spherical Wave
4 | filtered A (k), AFT = B-format/Plane Wave, RREF is obsolete

Complementing figures 3 and 4 , in Fig. 9 the parser 41 of the Ambisonics decoder in Fig. 4 is shown in more detail. For collecting corresponding data items from an Ambisonics data stream ADSTR, the parser can use registers REG and content addressable memories CAM. The content addressable memories CAM detect all protocol data which will lead to a decision about how the received data are to be processed in the following steps or stages, and the registers REG store information about the length and/or the payload data. The parser evaluates the header data in a hierarchical manner and can be implemented in hardware or software, according to any real-time requirements. - Several audio signals are generated and transmitted as spherical waves SPW or plane waves PW, e.g. the worldwide live broadcast of a concert in 3D format, wherein all receiving units are arranged in cinemas. In such a case the individual signals are to be transmitted separately so that a correct presentation can be facilitated. By a corresponding arrangement of the protocol (Ambisonics Wave Type AWT described above) the parser can distinguish this and supply two separate 'distance coding' units with the corresponding data items. The inventive Ambisonics decoder depicted in
Fig. 4 can process all these signals, whereas in the prior art several decoders would be required. I.e., considering the Ambisonics wave type facilitates the advantages described above.
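The mode detection from RREF and AFT described by the table above can be sketched as a small parser dispatch (names and constants here are illustrative, not taken from the document):

```python
# Recovering the encoding Case chosen at production side from the bit
# fields RREF and AFT alone, following the Mode table.

SPHERICAL, PLANE = 1, 0          # assumed wave-type values for this sketch

def detect_mode(rref: int, aft_wave_type: int) -> int:
    if aft_wave_type == PLANE:
        return 4                 # plane-wave payload; RREF is obsolete
    return 2 if rref == 0 else 3 # spherical: RREF distinguishes modes 2 and 3

assert detect_mode(0, SPHERICAL) == 2
assert detect_mode(1700, SPHERICAL) == 3
```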
Claims (8)
- Method for generating sound field data including Ambisonics sound field data of an order higher than three, said method including the steps:- receiving S input signals x (k) from a microphone array (31) including m microphones, and/or from one or more virtual sound sources (32);- multiplying (33) said input signals x (k) with a matrix Ψ,
so as to get coefficients vector data d (k) representing coded directional information of N Ambisonics signals for every sample time instant k;- processing said coefficients vector data d (k), value N and parameter Norm in one or two or more of the following four paths:a) combining (340) said coefficients vector data d (k), said value N and said parameter Norm with radii data RS representing the distances of the sources of said input signals x(k);b) based on spherical waves, array response filtering (341) said coefficients vector data d (k) in dependency from said Ambisonics order N and radii Rm values, said radii Rm values representing individual microphone radii in a microphone array, so as to compensate for non-linear frequency dependency, followed by normalising (351) for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm and said order N value;c) based on spherical waves, array response filtering (342) said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a radius Rref value, said radius Rref value representing a mean radius of loudspeakers arranged at decoder side, so as to compensate for non-linear frequency dependency, followed by normalising (352) for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said radius Rref value;d) based on plane waves, array response filtering (343) said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii Rm values and a Plane Wave parameter, so as to compensate for non-linear frequency dependency, followed by normalising (353) for plane waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said Plane Wave parameter;- in case a processing took place in two or more of said paths, multiplexing (36) the corresponding data;- output (361) of data frames (39) including said provided data and values. 
- Apparatus for generating sound field data including Ambisonics sound field data of an order higher than three, said apparatus including:- means (33) being adapted for multiplying S input signals x (k), which are received from a microphone array (31) including m microphones and/or from one or more virtual sound sources (32), with a matrix Ψ,- means (340,341,351,342,352,343,353) being adapted for processing said coefficients vector data d (k), value N and parameter Norm in one or two or more of the following four paths:a) combining said coefficients vector data d (k), said value N and said parameter Norm with radii data RS representing the distances of the sources of said S input signals x(k);b) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N and radii Rm values, said radii Rm values representing individual microphone radii in a microphone array, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm and said order N value;c) based on spherical waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii RM values and a radius Rref value, said radius Rref value representing a mean radius of loudspeakers arranged at decoder side, so as to compensate for non-linear frequency dependency, followed by normalising for spherical waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, and said radius Rref value;d) based on plane waves, array response filtering said coefficients vector data d (k) in dependency from said Ambisonics order N, said radii RM values and a Plane Wave parameter, so as to compensate for non-linear frequency dependency, followed by normalising for plane waves data, so as to provide filtered coefficients A (k), said parameter Norm, said order N value, 
and said Plane Wave parameter;- a multiplexer means (36) for multiplexing the corresponding data in case a processing took place in two or more of said paths, which multiplexer means provide data frames (39) including said provided data and values.
- Method for decoding sound field data that were encoded according to claim 1 using one or two or more of said paths, said method including the steps:- parsing (41) the incoming encoded data, determining the type or types a) to d) of said paths used for said encoding and providing the further data required for a decoding according to the encoding path type or types;- performing a corresponding decoding processing for one or two or more of the paths a) to d):a) based on spherical waves, filtering (42) the received coefficients vector data d (k) in dependency from said radii data RS so as to provide filtered coefficients A (k),
and distance coding (431) said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;b) based on spherical waves, distance coding (432) said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;c) based on spherical waves, distance coding (433) said filtered coefficients A (k) in dependency from said order value N and said radius value Rref for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;d) based on plane waves, providing said filtered coefficients A (k), order value N, parameter Norm and a flag for Plane Waves;- in case a processing took place in two or more of said paths, multiplexing (44) the corresponding data, wherein the selected (47) path or paths are determined based on parameter Norm, order value N and said Plane Waves flag;- decoding (45) said distance encoded filtered coefficients A'(k) or said filtered coefficients A(k), respectively, in dependency from said parameter Norm, said order value N and said loudspeaker direction values Ω l , so as to provide loudspeaker signals for a loudspeaker array (46). - Apparatus for decoding sound field data that were encoded according to claim 1 using one or two or more of said paths, said apparatus including:- means (41) being adapted for parsing the incoming encoded data, and for determining the type or types a) to d) of said paths used for said encoding and for providing the further data required for a decoding according to the encoding path type or types;- means (42,431,432,433) being adapted for performing a corresponding decoding processing for one or two or more of the paths a) to d) :a) based on spherical waves, filtering the received coefficients vector data d (k) in dependency from said radii data RS so as to provide filtered coefficients A (k), and distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;b) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and for all radii Rl of loudspeakers to be used for a presentation of the decoded signals,
and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;c) based on spherical waves, distance coding said filtered coefficients A (k) in dependency from said order value N and said radius value Rref for all radii Rl of loudspeakers to be used for a presentation of the decoded signals, and providing the distance encoded filtered coefficients A '(k) together with loudspeaker direction values Ω l , value N and parameter Norm;d) based on plane waves, providing said filtered coefficients A(k), order value N, parameter Norm and a flag for Plane Waves;- multiplexing means (44) which, in case a processing took place in two or more of said paths, select the corresponding data to be combined, based on parameter Norm, order value N and said Plane Waves flag;- decoding means (45) which decode said distance encoded filtered coefficients A'(k) or said filtered coefficients A(k), respectively, in dependency from said parameter Norm, said order value N and said loudspeaker direction values Ω l , so as to provide loudspeaker signals for a loudspeaker array (46). - Method according to claim 3, or apparatus according to claim 4, wherein said parser (41) includes registers (REG) and content addressable memories (CAM) for collecting data items from the decoder input data by evaluating header data in a hierarchical manner, and wherein said content addressable memories (CAM) detect all protocol data which will lead to a decision about how the received data are to be processed in the decoding, and said registers (REG) store data item length information and/or information about payload data.
- Method according to claim 5, or apparatus according to claim 5, wherein said parser (41) provides data for two or more individual audio signals by distinguishing Ambisonics plane wave and spherical wave types (AWT).
- Method according to one of claims 1, 3 and 5, or apparatus according to one of claims 2, 4 and 5, wherein said Ambisonics sound field data are transferred using Ethernet or internet or a protocol network.
- Data structure for Ambisonics audio signal data which can be encoded according to claim 1, said data structure including:- a data field determining plane wave and spherical wave Ambisonics;- a data field determining the Ambisonics order types B-Format order, numerical upward order, numerical downward order;- a data field determining the channel in dependency from the degree n and the order m;- a data field determining horizontal or vertical order of the coefficients in the Ambisonics matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP10306212A EP2451196A1 (en) | 2010-11-05 | 2010-11-05 | Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2451196A1 true EP2451196A1 (en) | 2012-05-09 |
Family
ID=43585582
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP10306212A Withdrawn EP2451196A1 (en) | 2010-11-05 | 2010-11-05 | Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP2451196A1 (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013545391A (en) * | 2010-11-05 | 2013-12-19 | トムソン ライセンシング | Data structure for higher-order ambisonics audio data |
WO2014012945A1 (en) * | 2012-07-16 | 2014-01-23 | Thomson Licensing | Method and device for rendering an audio soundfield representation for audio playback |
EP2733963A1 (en) | 2012-11-14 | 2014-05-21 | Thomson Licensing | Method and apparatus for facilitating listening to a sound signal for matrixed sound signals |
WO2014124261A1 (en) * | 2013-02-08 | 2014-08-14 | Qualcomm Incorporated | Signaling audio rendering information in a bitstream |
DE102013223201B3 (en) * | 2013-11-14 | 2015-05-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and device for compressing and decompressing sound field data of a region |
WO2015104166A1 (en) * | 2014-01-08 | 2015-07-16 | Thomson Licensing | Method and apparatus for improving the coding of side information required for coding a higher order ambisonics representation of a sound field |
WO2015130765A1 (en) * | 2014-02-25 | 2015-09-03 | Qualcomm Incorporated | Order format signaling for higher-order ambisonic audio data |
EP3002960A1 (en) * | 2014-10-04 | 2016-04-06 | Patents Factory Ltd. Sp. z o.o. | System and method for generating surround sound |
US9483228B2 (en) | 2013-08-26 | 2016-11-01 | Dolby Laboratories Licensing Corporation | Live engine |
US9609452B2 (en) | 2013-02-08 | 2017-03-28 | Qualcomm Incorporated | Obtaining sparseness information for higher order ambisonic audio renderers |
CN106796794A (en) * | 2014-10-07 | 2017-05-31 | 高通股份有限公司 | The normalization of environment high-order ambiophony voice data |
US9883310B2 (en) | 2013-02-08 | 2018-01-30 | Qualcomm Incorporated | Obtaining symmetry information for higher order ambisonic audio renderers |
CN109448742A (en) * | 2012-12-12 | 2019-03-08 | 杜比国际公司 | The method and apparatus that the high-order ambiophony of sound field is indicated to carry out compression and decompression |
US10334387B2 (en) | 2015-06-25 | 2019-06-25 | Dolby Laboratories Licensing Corporation | Audio panning transformation system and method |
US10356484B2 (en) | 2013-03-15 | 2019-07-16 | Samsung Electronics Co., Ltd. | Data transmitting apparatus, data receiving apparatus, data transceiving system, method for transmitting data, and method for receiving data |
TWI666931B (en) * | 2013-03-15 | 2019-07-21 | 三星電子股份有限公司 | Data transmitting apparatus, data receiving apparatus and data transceiving system |
CN111460883A (en) * | 2020-01-22 | 2020-07-28 | 电子科技大学 | Video behavior automatic description method based on deep reinforcement learning |
CN112216292A (en) * | 2014-06-27 | 2021-01-12 | 杜比国际公司 | Method and apparatus for decoding a compressed HOA sound representation of a sound or sound field |
CN112908349A (en) * | 2014-06-27 | 2021-06-04 | 杜比国际公司 | Method and apparatus for determining a minimum number of integer bits required to represent non-differential gain values for compression of a representation of a HOA data frame |
US11234091B2 (en) | 2012-05-14 | 2022-01-25 | Dolby Laboratories Licensing Corporation | Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation |
US11962990B2 (en) | 2013-05-29 | 2024-04-16 | Qualcomm Incorporated | Reordering of foreground audio objects in the ambisonics domain |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004047485A1 (en) | 2002-11-21 | 2004-06-03 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio playback system and method for playing back an audio signal |
EP1936908A1 (en) | 2006-12-19 | 2008-06-25 | Deutsche Thomson OHG | Method, apparatus and data container for transferring high resolution audio/video data in a high speed IP network |
WO2009106637A1 (en) | 2008-02-28 | 2009-09-03 | Thomson Licensing | Hardware-based parser for packet-oriented protocols |
Non-Patent Citations (5)
Title |
---|
"Spherical harmonics", 28 June 2011 (2011-06-28), XP002646194, Retrieved from the Internet <URL:http://en.wikipedia.org/wiki/Spherical_harmonics> [retrieved on 20110628] * |
J.DANIEL: "Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format", AES 23RD INTERNATIONAL CONFERENCE, vol. 23, 23 May 2003 (2003-05-23), XP002647040 * |
J.DANIEL: "Spatial Sound Encoding Including Near Field Effect: Introducing Distance Coding Filters and a Viable, New Ambisonic Format", AES 23rd Intl. Conf., vol. 23, 25 May 2003 (2003-05-25) |
M.A.POLETTI: "Three-Dimensional Surround Sound Systems Based on Spherical Harmonics", J.AUDIO ENG.SOC., vol. 53, no. 11, November 2005 (2005-11-01) |
MICROSOFT, MULTIPLE CHANNEL AUDIO DATA AND WAVE FILES, 7 March 2007 (2007-03-07) |
Cited By (75)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013545391A (en) * | 2010-11-05 | 2013-12-19 | トムソン ライセンシング | Data structure for higher-order ambisonics audio data |
US9241216B2 (en) | 2010-11-05 | 2016-01-19 | Thomson Licensing | Data structure for higher order ambisonics audio data |
TWI823073B (en) * | 2012-05-14 | 2023-11-21 | 瑞典商杜比國際公司 | Method and apparatus for compressing and decompressing a higher order ambisonics signal representation and non-transitory computer readable medium |
US11234091B2 (en) | 2012-05-14 | 2022-01-25 | Dolby Laboratories Licensing Corporation | Method and apparatus for compressing and decompressing a Higher Order Ambisonics signal representation |
US11792591B2 (en) | 2012-05-14 | 2023-10-17 | Dolby Laboratories Licensing Corporation | Method and apparatus for compressing and decompressing a higher order Ambisonics signal representation |
CN107071685A (en) * | 2012-07-16 | 2017-08-18 | Dolby International AB | Method and apparatus for rendering an audio soundfield representation for audio playback |
CN106658342A (en) * | 2012-07-16 | 2017-05-10 | 杜比国际公司 | Method and device for rendering an audio soundfield representation for audio playback |
EP4284026A3 (en) * | 2012-07-16 | 2024-02-21 | Dolby International AB | Method and device for rendering an audio soundfield representation |
WO2014012945A1 (en) * | 2012-07-16 | 2014-01-23 | Thomson Licensing | Method and device for rendering an audio soundfield representation for audio playback |
US11743669B2 (en) | 2012-07-16 | 2023-08-29 | Dolby Laboratories Licensing Corporation | Method and device for decoding a higher-order ambisonics (HOA) representation of an audio soundfield |
US11451920B2 (en) | 2012-07-16 | 2022-09-20 | Dolby Laboratories Licensing Corporation | Method and device for decoding a higher-order ambisonics (HOA) representation of an audio soundfield |
CN106658343B (en) * | 2012-07-16 | 2018-10-19 | Dolby International AB | Method and apparatus for rendering an audio soundfield representation for audio playback |
EP4013072A1 (en) * | 2012-07-16 | 2022-06-15 | Dolby International AB | Method and device for rendering an audio soundfield representation |
US10075799B2 (en) | 2012-07-16 | 2018-09-11 | Dolby Laboratories Licensing Corporation | Method and device for rendering an audio soundfield representation |
US10939220B2 (en) | 2012-07-16 | 2021-03-02 | Dolby Laboratories Licensing Corporation | Method and device for decoding a higher-order ambisonics (HOA) representation of an audio soundfield |
US10595145B2 (en) | 2012-07-16 | 2020-03-17 | Dolby Laboratories Licensing Corporation | Method and device for decoding a higher-order ambisonics (HOA) representation of an audio soundfield |
CN104584588B (en) * | 2012-07-16 | 2017-03-29 | Dolby International AB | Method and apparatus for rendering an audio soundfield representation for audio playback |
CN106658343A (en) * | 2012-07-16 | 2017-05-10 | Dolby International AB | Method and device for rendering an audio soundfield representation for audio playback |
CN104584588A (en) * | 2012-07-16 | 2015-04-29 | 汤姆逊许可公司 | Method and device for rendering an audio soundfield representation for audio playback |
CN107071685B (en) * | 2012-07-16 | 2020-02-14 | 杜比国际公司 | Method and apparatus for rendering an audio soundfield representation for audio playback |
US9712938B2 (en) | 2012-07-16 | 2017-07-18 | Dolby Laboratories Licensing Corporation | Method and device for rendering an audio soundfield representation for audio playback |
US10306393B2 (en) | 2012-07-16 | 2019-05-28 | Dolby Laboratories Licensing Corporation | Method and device for rendering an audio soundfield representation |
CN107071686A (en) * | 2012-07-16 | 2017-08-18 | Dolby International AB | Method and apparatus for rendering an audio soundfield representation for audio playback |
CN107071687A (en) * | 2012-07-16 | 2017-08-18 | Dolby International AB | Method and apparatus for rendering an audio soundfield representation for audio playback |
CN106658342B (en) * | 2012-07-16 | 2020-02-14 | 杜比国际公司 | Method and apparatus for rendering an audio soundfield representation for audio playback |
CN107071687B (en) * | 2012-07-16 | 2020-02-14 | 杜比国际公司 | Method and apparatus for rendering an audio soundfield representation for audio playback |
CN107071686B (en) * | 2012-07-16 | 2020-02-14 | 杜比国际公司 | Method and apparatus for rendering an audio soundfield representation for audio playback |
US9961470B2 (en) | 2012-07-16 | 2018-05-01 | Dolby Laboratories Licensing Corporation | Method and device for rendering an audio soundfield representation |
US9723424B2 (en) | 2012-11-14 | 2017-08-01 | Dolby Laboratories Licensing Corporation | Making available a sound signal for higher order ambisonics signals |
EP2733963A1 (en) | 2012-11-14 | 2014-05-21 | Thomson Licensing | Method and apparatus for facilitating listening to a sound signal for matrixed sound signals |
WO2014075934A1 (en) | 2012-11-14 | 2014-05-22 | Thomson Licensing | Making available a sound signal for higher order ambisonics signals |
CN109448742A (en) * | 2012-12-12 | 2019-03-08 | Dolby International AB | Method and apparatus for compressing and decompressing a higher order ambisonics representation of a sound field |
CN109448742B (en) * | 2012-12-12 | 2023-09-01 | 杜比国际公司 | Method and apparatus for compressing and decompressing higher order ambisonic representations of a sound field |
CN104981869B (en) * | 2013-02-08 | 2019-04-26 | 高通股份有限公司 | Audio spatial cue is indicated with signal in bit stream |
US9883310B2 (en) | 2013-02-08 | 2018-01-30 | Qualcomm Incorporated | Obtaining symmetry information for higher order ambisonic audio renderers |
US9609452B2 (en) | 2013-02-08 | 2017-03-28 | Qualcomm Incorporated | Obtaining sparseness information for higher order ambisonic audio renderers |
RU2661775C2 (en) * | 2013-02-08 | 2018-07-19 | Квэлкомм Инкорпорейтед | Transmission of audio rendering signal in bitstream |
WO2014124261A1 (en) * | 2013-02-08 | 2014-08-14 | Qualcomm Incorporated | Signaling audio rendering information in a bitstream |
CN104981869A (en) * | 2013-02-08 | 2015-10-14 | 高通股份有限公司 | Signaling audio rendering information in a bitstream |
US9870778B2 (en) | 2013-02-08 | 2018-01-16 | Qualcomm Incorporated | Obtaining sparseness information for higher order ambisonic audio renderers |
US10178489B2 (en) | 2013-02-08 | 2019-01-08 | Qualcomm Incorporated | Signaling audio rendering information in a bitstream |
TWI666931B (en) * | 2013-03-15 | 2019-07-21 | 三星電子股份有限公司 | Data transmitting apparatus, data receiving apparatus and data transceiving system |
US10356484B2 (en) | 2013-03-15 | 2019-07-16 | Samsung Electronics Co., Ltd. | Data transmitting apparatus, data receiving apparatus, data transceiving system, method for transmitting data, and method for receiving data |
US11962990B2 (en) | 2013-05-29 | 2024-04-16 | Qualcomm Incorporated | Reordering of foreground audio objects in the ambisonics domain |
US9483228B2 (en) | 2013-08-26 | 2016-11-01 | Dolby Laboratories Licensing Corporation | Live engine |
WO2015071148A1 (en) | 2013-11-14 | 2015-05-21 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and device for compressing and decompressing sound field data of an area |
DE102013223201B3 (en) * | 2013-11-14 | 2015-05-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and device for compressing and decompressing sound field data of a region |
EP4089675A1 (en) * | 2014-01-08 | 2022-11-16 | Dolby International AB | Method and apparatus for improving the coding of side information required for coding a higher order ambisonics representation of a sound field |
US11211078B2 (en) | 2014-01-08 | 2021-12-28 | Dolby Laboratories Licensing Corporation | Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations |
CN111179951A (en) * | 2014-01-08 | 2020-05-19 | 杜比国际公司 | Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium |
CN111182443A (en) * | 2014-01-08 | 2020-05-19 | 杜比国际公司 | Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium |
CN111179955A (en) * | 2014-01-08 | 2020-05-19 | 杜比国际公司 | Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium |
US10714112B2 (en) | 2014-01-08 | 2020-07-14 | Dolby Laboratories Licensing Corporation | Method and apparatus for decoding a bitstream including encoded higher order Ambisonics representations |
US10147437B2 (en) | 2014-01-08 | 2018-12-04 | Dolby Laboratories Licensing Corporation | Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations |
EP3648102A1 (en) * | 2014-01-08 | 2020-05-06 | Dolby International AB | Method and apparatus for improving the coding of side information required for coding a higher order ambisonics representation of a sound field |
CN111028849A (en) * | 2014-01-08 | 2020-04-17 | 杜比国际公司 | Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium |
CN111179955B (en) * | 2014-01-08 | 2024-04-09 | Dolby International AB | Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium |
CN111182443B (en) * | 2014-01-08 | 2021-10-22 | 杜比国际公司 | Method and apparatus for decoding a bitstream comprising an encoded HOA representation |
US10424312B2 (en) | 2014-01-08 | 2019-09-24 | Dolby Laboratories Licensing Corporation | Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations |
CN105981100A (en) * | 2014-01-08 | 2016-09-28 | 杜比国际公司 | Method and apparatus for improving the coding of side information required for coding a higher order ambisonics representation of a sound field |
CN111028849B (en) * | 2014-01-08 | 2024-03-01 | Dolby International AB | Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium |
CN111179951B (en) * | 2014-01-08 | 2024-03-01 | Dolby International AB | Method and apparatus for decoding a bitstream comprising an encoded HOA representation, and medium |
US11869523B2 (en) | 2014-01-08 | 2024-01-09 | Dolby Laboratories Licensing Corporation | Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations |
US11488614B2 (en) | 2014-01-08 | 2022-11-01 | Dolby Laboratories Licensing Corporation | Method and apparatus for decoding a bitstream including encoded Higher Order Ambisonics representations |
US9990934B2 (en) | 2014-01-08 | 2018-06-05 | Dolby Laboratories Licensing Corporation | Method and apparatus for improving the coding of side information required for coding a Higher Order Ambisonics representation of a sound field |
US10553233B2 (en) | 2014-01-08 | 2020-02-04 | Dolby Laboratories Licensing Corporation | Method and apparatus for decoding a bitstream including encoded higher order ambisonics representations |
WO2015104166A1 (en) * | 2014-01-08 | 2015-07-16 | Thomson Licensing | Method and apparatus for improving the coding of side information required for coding a higher order ambisonics representation of a sound field |
WO2015130765A1 (en) * | 2014-02-25 | 2015-09-03 | Qualcomm Incorporated | Order format signaling for higher-order ambisonic audio data |
CN112216292A (en) * | 2014-06-27 | 2021-01-12 | 杜比国际公司 | Method and apparatus for decoding a compressed HOA sound representation of a sound or sound field |
CN112908349A (en) * | 2014-06-27 | 2021-06-04 | 杜比国际公司 | Method and apparatus for determining a minimum number of integer bits required to represent non-differential gain values for compression of a representation of a HOA data frame |
EP3002960A1 (en) * | 2014-10-04 | 2016-04-06 | Patents Factory Ltd. Sp. z o.o. | System and method for generating surround sound |
CN106796794A (en) * | 2014-10-07 | 2017-05-31 | 高通股份有限公司 | The normalization of environment high-order ambiophony voice data |
US10334387B2 (en) | 2015-06-25 | 2019-06-25 | Dolby Laboratories Licensing Corporation | Audio panning transformation system and method |
CN111460883B (en) * | 2020-01-22 | 2022-05-03 | University of Electronic Science and Technology of China | Automatic video behavior description method based on deep reinforcement learning |
CN111460883A (en) * | 2020-01-22 | 2020-07-28 | University of Electronic Science and Technology of China | Automatic video behavior description method based on deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2451196A1 (en) | Method and apparatus for generating and for decoding sound field data including ambisonics sound field data of an order higher than three | |
EP3175446B1 (en) | Audio processing systems and methods | |
TWI476761B (en) | Audio encoding method and system for generating a unified bitstream decodable by decoders implementing different decoding protocols | |
EP3800898B1 (en) | Data processor and transport of user control data to audio decoders and renderers | |
JP4787442B2 (en) | System and method for providing interactive audio in a multi-channel audio environment | |
CN111837182B (en) | Method and apparatus for generating or decoding a bitstream comprising an immersive audio signal | |
EP1949693B1 (en) | Method and apparatus for processing/transmitting bit-stream, and method and apparatus for receiving/processing bit-stream | |
JP7207447B2 (en) | Receiving device, receiving method, transmitting device and transmitting method | |
JP6908168B2 (en) | Receiver, receiver, transmitter and transmit method | |
JP7310849B2 (en) | Receiving device and receiving method | |
WO2020152394A1 (en) | Audio representation and associated rendering | |
CN106375778B (en) | Method for transmitting three-dimensional audio program code stream conforming to digital movie specification | |
JP6699564B2 (en) | Transmission device, transmission method, reception device, and reception method | |
KR101531510B1 (en) | Receiving system and method of processing audio data | |
CN114448955B (en) | Digital audio network transmission method, device, equipment and storage medium | |
WO2021255327A1 (en) | Managing network jitter for multiple audio streams |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20121110 |