WO2010105695A1 - Codage audio multicanaux - Google Patents

Codage audio multicanaux

Info

Publication number
WO2010105695A1
WO2010105695A1 PCT/EP2009/053331
Authority
WO
WIPO (PCT)
Prior art keywords
audio
dependent
signal
channels
independent
Prior art date
Application number
PCT/EP2009/053331
Other languages
English (en)
Inventor
Pasi Ojala
Jussi Virolainen
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/EP2009/053331 priority Critical patent/WO2010105695A1/fr
Publication of WO2010105695A1 publication Critical patent/WO2010105695A1/fr

Links

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/24Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04SSTEREOPHONIC SYSTEMS 
    • H04S3/00Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008Systems employing more than two channels, e.g. quadraphonic in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/20Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding

Definitions

  • Embodiments of the present invention relate to multi channel audio coding. In particular, they relate to encoding and/or decoding and/or rendering multi channel audio.
  • Multi channel audio coding may be used for coding, for example, speech, music etc.
  • Multi-channel audio coding may be used, for example, for Digital Audio Broadcasting, Digital TV Broadcasting, Music download service, Streaming music service, Internet radio, teleconferencing, and transmission of real time multimedia over a packet switched network (e.g. Voice over IP, Multimedia Broadcast Multicast Service (MBMS) and Packet-switched streaming (PSS)).
  • a method comprising: encoding at least a portion of an intermediate signal, formed by encoding a set of dependent input audio channels, and at least one independent input audio channel to produce an output signal.
  • the method may comprise: encoding at least one set of dependent input audio channels to produce the intermediate signal.
  • the method may further comprise: configuring an encoder based upon a configuration of the input audio channels.
  • the method may further comprise: monitoring a configuration of the input audio channels; and dynamically changing a configuration of the encoder when the configuration of the input audio channels changes.
  • the method may further comprise: obtaining information on the configuration of the input audio channels using a dependency discovery procedure.
  • the method may further comprise: obtaining information on the configuration of the input audio channels by: defining a reference audio channel; correlating audio channels against the reference audio channel; identifying audio channels with a higher correlation as dependent channels; and identifying audio channels with a lower correlation as independent channels.
  • the method may further comprise: outputting the information on the configuration of the input audio channels implicitly and/or explicitly.
  • a configuration of the input audio channels may be defined by the number of sets of dependent input audio channels, the number of input audio channels in each such set and the number of independent input audio channels.
  • the intermediate signal may comprise a downmixed signal and parameters, wherein the portion of the intermediate signal encoded to produce the output signal is the downmixed signal and the parameters are output.
  • the output signal may comprise a downmixed signal and parameters.
  • the output parameters may comprise interchannel differences including level difference and time/phase difference of dependent input audio channels.
  • the output parameters may comprise interchannel differences including level difference of independent input audio channels.
  • the method may further comprise: identifying that the intermediate signal is associated with the same acoustic space as the destination of the output signal; enabling encoding of the at least one independent input audio channel to produce an output signal and preventing encoding of the intermediate signal to produce the output signal.
  • the method may be performed by a computer program loaded into a processor.
  • a computer program product comprising machine readable instructions which when loaded into a processor control the processor to: encode at least a portion of an intermediate signal, formed by encoding a set of dependent input audio channels, and at least one independent input audio channel to produce an output signal.
  • an apparatus comprising: means for encoding at least a portion of an intermediate signal, formed by encoding a set of dependent input audio channels, and at least one independent input audio channel to produce an output signal.
  • the apparatus may further comprise means for encoding at least one set of dependent input audio channels to produce the intermediate signal.
  • an apparatus comprising: a first stage configured to encode at least one set of dependent input audio channels to produce an intermediate signal; and a second stage configured to encode at least a portion of an intermediate signal and at least one independent input audio channel to produce an output signal.
  • a method comprising: identifying an audio object, if any, that has originated from the same acoustic space as the destination of the encoded audio; and preventing encoding, at the second stage, of the identified audio object.
  • Identifying the audio object may comprise identifying a current output audio channel; and identifying an audio object, if any, that has a high correlation with the current output audio channel.
  • According to various, but not necessarily all, embodiments of the invention there is provided a method comprising: identifying a configuration of dependent and independent audio objects within a received audio signal; and controlling processing of the received audio signal by controlling processing of the audio objects.
  • a dependent audio object may comprise a plurality of audio channels that have a high inter-channel correlation, wherein an independent audio object comprises an audio channel, and wherein there is low correlation between audio objects.
  • the processing may be dynamic and may change with a configuration of dependent and independent audio objects.
  • the method may comprise determining a configuration of received dependent and independent audio objects.
  • Controlling processing involves changing a configuration of the decoder to match a configuration of received dependent and independent audio objects.
  • the method may comprise: decoding the received signal to provide at least one decoded independent audio object and at least one encoded dependent audio object; and decoding the encoded dependent audio object to provide a decoded dependent object comprising a set of dependent audio channels
  • the method may comprise: controlling decoding at least in part in response to user control.
  • Decoding may respect integrity of audio objects.
  • Controlling processing may involve rendering each audio object independently.
  • the method may further comprise controlling rendering of audio objects at least in part in response to user control to enable spatial positioning of audio objects.
  • the received signal may comprise a downmix signal and side information.
  • Controlling processing of the received audio signal may comprise controlling creation or use of decoded audio objects.
  • Controlling processing of the received audio signal may comprise preventing an audio object originating from one acoustic space being rendered in that acoustic space.
  • Controlling processing of the received audio signal may comprise selecting of a subset of encoded dependent audio objects for decoding.
  • Controlling processing of the received audio signal may comprise selecting a sub-set of decoded audio objects for rendering.
  • the method may be performed using a computer program loaded into a processor.
  • a computer program product comprising machine readable instructions which when loaded into a processor control the processor to: identify a configuration of dependent and independent audio objects within a received audio signal; and control processing of the received audio signal by controlling processing of the audio objects.
  • an apparatus comprising: means for identifying a configuration of dependent and independent audio objects within a received audio signal; and means for controlling processing of the received audio signal by controlling processing of the audio objects.
  • a decoder apparatus comprising: a first stage configured to decode an input signal to produce at least an intermediate signal and at least one independent input audio channel; and a second stage configured to decode the intermediate signal to produce at least one set of dependent input audio channels.
  • Fig 1 schematically illustrates a system for multi-channel audio coding
  • Fig 2 schematically illustrates an example of the encoder apparatus as functional blocks
  • Fig 3 schematically illustrates components of a coder apparatus that may be used as an encoder apparatus and/or a decoder apparatus;
  • Fig 4 schematically illustrates a parametric encoder apparatus
  • Fig 5 schematically illustrates an encoding method
  • Fig 6 schematically illustrates a parametric encoder apparatus
  • Fig 7 schematically illustrates a distributed implementation of an encoder apparatus
  • Fig 8 schematically illustrates an encoder apparatus and a decoder apparatus which receives input signals from the encoder apparatus
  • Fig 9 schematically illustrates a decoder in which the multi-channel output of a synthesis block is mixed into a plurality of output audio channels
  • Fig 10 schematically illustrates an example of the decoder apparatus as functional blocks
  • Fig 11 schematically illustrates an example of the decoder apparatus as functional blocks
  • Fig 12 schematically illustrates a decoding method.
  • input audio channels are classified into either independent audio objects comprising a single input audio channel or dependent audio objects comprising a set of correlated input audio channels.
  • each set of correlated input audio channels may be encoded to form an encoded dependent audio object.
  • the independent audio objects and the encoded dependent audio objects may be encoded to form an encoder output signal.
  • the encoder output signal may be decoded to provide the independent audio objects and the encoded dependent audio objects.
  • the encoded dependent audio objects may be independently decoded to provide respective dependent audio objects that comprise sets of correlated audio channels.
  • the independent audio objects and the dependent audio objects may be independently rendered.
  • Fig 1 schematically illustrates a system 2 for multi-channel audio coding.
  • Multichannel audio coding may be used, for example, for Digital Audio Broadcasting, Digital TV Broadcasting, Music download service, Streaming music service, Internet radio, conversational applications, teleconferencing etc.
  • spatial audio coding is performed as different audio channels are associated with different microphones. There are two acoustic spaces and three microphones, but it should be appreciated that there may be more acoustic spaces and more or fewer microphones.
  • system 2 is described in the context of teleconferencing, this should not be considered to limit application of the technology described. It may be used in other multi-channel audio coding scenarios.
  • a user 12 produces speech 14 that is captured by a microphone 16 and provided as an input audio signal to an encoder apparatus 4.
  • the signals provided by the microphone 16 represent an input audio channel 31.
  • In a second acoustic space 20, a user 21 produces speech 23 that is captured by microphones 25 and 26 and provided as an input audio signal to the encoder apparatus 4.
  • the signals provided by the microphone 25 represent an input audio channel 33.
  • a user 22 produces speech 24 that is captured by microphones 26 and 25 and provided as an input audio signal to the encoder apparatus 4.
  • the signals provided by the microphone 26 represent an input audio channel 35.
  • the second acoustic space 20 is acoustically isolated from the first acoustic space 10 in that the input audio channel 31 is uncorrelated to the input audio channels 33, 35.
  • the input audio channels 33, 35 of the second acoustic space 20 are correlated.
  • the input audio signal from the first acoustic space 10 could be considered as an independent audio channel 31.
  • the input audio signals from the second acoustic space 20 could be considered as a set of dependent audio channels 33, 35.
  • the set is independent of the audio channel 31
  • the dependent input audio channels 33, 35 and the independent input audio channel 31 collectively form the input audio channels 37.
  • speech 14 or other ambient audio captured by the microphone 16 is not captured by either of the microphones 25, 26, and speech 23, 24 or other ambient audio captured by the microphones 25, 26 is not captured by the microphone 16.
  • speech 23 or other ambient audio captured by the microphone 25 is captured by the microphone 26 and speech 24 or other ambient audio captured by the microphone 26 is captured by the microphone 25.
  • the first acoustic space 10 and the second acoustic space 20 may share the same physical space, such as a room, or may be in different physical spaces such as different rooms, buildings etc
  • Dependent or correlated channels originate from the same shared acoustic space whereas independent or uncorrelated channels originate from a unique (unshared) acoustic space.
  • the acoustic spaces may be dynamic and vary in time.
  • the two users 21, 22 and/or their microphones 25, 26 may move further away from each other, losing dependency and splitting the second acoustic space into two acoustic spaces.
  • the signals captured by the microphones 16, 25, 26 are sent via an uplink communications channel 30, which may be a radio channel, to the encoder apparatus 4.
  • the encoder apparatus 4 encodes the various signals received via the multiple input audio channels and produces an output signal 34.
  • one teleconference client may provide the microphone 16 and one or more teleconference clients may provide the microphones 25, 26.
  • a teleconference bridge or server 32 may host all or part of the encoder apparatus 4 and provide the output signal 34 via a downlink communications channel, which may be a radio channel, to the various teleconference clients.
  • Different output signals 34 may be prepared and downloaded to different teleconference clients. For example, to avoid unnecessary echo, it may be advantageous not to forward the input audio signal received from a particular teleconference client back to the auditory space occupied by that teleconference client and from which the input audio signal originated.
  • the microphones 25, 26 may be microphones in a multi-microphone (MMIC) array (narrowly spaced microphones in fixed positions). In this case, the input audio signals provided by microphones 25, 26 are highly correlated with each other.
  • the microphones 25, 26 may be microphones dedicated to different users. Although the users have individual dedicated microphones to capture their speech, each dedicated microphone may receive audio from some of the other users as well. The input audio signals provided by the microphones 25, 26 are highly correlated with each other.
  • the encoder apparatus 4 may receive pre-recorded audio channels instead of receiving audio channels directly from microphones, like the microphones 16, 25 and 26 in the example of Fig 1.
  • the pre-recorded audio signals may comprise dependent or independent channels.
  • dependent channels may be recorded in the same acoustic space to create inherently dependent audio channels, or dependent channels may comprise signals from independent audio sources to artificially form a set of dependent channels.
  • Fig 2 schematically illustrates an example of the encoder apparatus 4 as functional blocks.
  • the information concerning correlation/dependency is used to control encoding the multi-channel input audio signals.
  • the input audio channels are grouped into correlated/dependent channels and uncorrelated/independent channels.
  • the correlated/dependent channels are grouped into sets where each set is a set of channels that are mutually correlated but are uncorrelated with other sets. That is, there is intra-set correlation/dependency but not inter-set correlation/dependency.
  • the sets of correlated/dependent channels are encoded in a first stage to form encoded dependent audio objects.
  • the encoded dependent audio objects and the uncorrelated/independent channels are encoded in a second stage to form an output signal.
  • the first stage comprises one or more dependent source encoder apparatus 6, which if there is more than one are arranged in parallel.
  • a dependent source encoder apparatus 6 receives a plurality of input audio signals from one set of correlated input audio channels and outputs an intermediate signal 7.
  • the second stage comprises an independent source encoder apparatus 8.
  • the independent source encoder apparatus 8 receives a plurality of input audio signals from the uncorrelated/independent input audio channels and outputs the output signal 34.
  • the second stage is in cascade to the first stage.
  • the cascade arrangement is such that the first stage precedes the second stage in time but not necessarily in space as the same circuitry may be reused for each of the sets in the first stage and also for the second stage.
  • Fig 2 depicts a logical structure, and the different stages may or may not be co-located. Separate stages could be located in different physical elements or the same encoder element could be used for each stage.
  • the configuration of the encoder apparatus 4 depends upon the configuration of the input audio channels: the number of sets of dependent input audio channels, the number of input audio channels in each set and the number of independent input audio channels.
  • the configuration of the encoder apparatus 4 is based upon knowledge of the dependency and independency of the input audio channels.
  • the configuration of the encoder apparatus is dynamically adapted. It changes with the configuration of the input audio channels.
  • Knowledge of the dependency and independency of the input audio channels may be obtained by the encoder apparatus 4 in a number of ways.
  • the encoder apparatus 4 may, for example, use externally provided a priori information on the configuration of the input audio channels.
  • the information may for example, be context information.
  • Location information based for example on cell-id, GPS, Bluetooth neighbourhood, user identity and telephone number or the like, could be used to determine the probability of dependence.
  • the information may, for example, define a fixed configuration.
  • the information may, for example, be provided via a user input.
  • a user may be able to register at the conference bridge 32 knowledge of the presence of other participants in the same auditory space.
  • the encoder apparatus 4 may, for example, detect information on the configuration of the input audio channels.
  • the encoder apparatus 4 may detect automatically the dependency of input audio channels.
  • the encoder apparatus 4 automatically and continuously monitors the dependency between the input audio channels by correlation or by using some other similarity analysis method. This may not add any computational load to the encoder, since the inter-channel correlation may be determined as part of the encoding process anyway.
  • Determining whether a particular audio channel is correlated to a reference audio channel involves correlating the particular audio channel against the reference audio channel. If this cross-correlation is above a defined threshold, the particular audio channel is correlated to the reference channel; if not, it is uncorrelated to the reference channel.
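  • As an illustration of the threshold test just described, the following is a minimal sketch (not taken from the patent; the function name and the 0.5 threshold are assumptions) of classifying input audio channels against a reference channel:

```python
import numpy as np

def classify_channels(channels, ref_index=0, threshold=0.5):
    """Split channels into dependent/independent lists by the peak of the
    normalised cross-correlation against a reference channel.
    The 0.5 threshold is an assumed value, not taken from the patent."""
    ref = channels[ref_index]
    ref_energy = np.sum(ref ** 2)
    dependent, independent = [ref_index], []
    for i, ch in enumerate(channels):
        if i == ref_index:
            continue
        # Peak of the normalised cross-correlation over all lags.
        xcorr = np.correlate(ch, ref, mode="full")
        norm = np.sqrt(np.sum(ch ** 2) * ref_energy)
        peak = np.max(np.abs(xcorr)) / norm if norm > 0 else 0.0
        (dependent if peak > threshold else independent).append(i)
    return dependent, independent
```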
  • the dependency/independency information may be determined only when the corresponding input audio channels are active simultaneously, or when the corresponding signal levels are above a certain predetermined threshold, so that they should contain meaningful content.
  • the teleconference server 32 may perform a dependency discovery procedure.
  • the server may cause the teleconference clients, in a predetermined sequence, to emit a predetermined audio signal via a loudspeaker into an output audio channel. This may be achieved by including the predetermined audio signal within the output signal 34 sent to the teleconferencing client or it may be achieved by sending an indicator from the server to the client that indicates when and/or which one of a plurality of predetermined audio signals should be emitted by the client.
  • the server 32 is then able to detect the predetermined signal within the input audio channels. This allows the server to identify correlation between input audio channels and also to identify correlation between an output audio channel and input audio channels.
  • the predetermined audio signal is preferably but not necessarily inaudible to the human ear.
  • the dependency discovery process may be repeated according to a schedule or in response to a trigger event.
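  • A sketch of the detection step of such a discovery procedure — windowed normalised correlation of an input channel against the known probe signal; the 0.3 detection threshold is an assumption, as the patent does not specify one:

```python
import numpy as np

def probe_present(channel, probe, threshold=0.3):
    """Return True if the known probe sequence appears in the input channel."""
    n = len(probe)
    p = (probe - probe.mean()) / (np.std(probe) * n)
    best = 0.0
    # Slide a half-overlapping window over the channel and take the
    # best normalised (Pearson) correlation with the probe.
    for start in range(0, len(channel) - n + 1, max(1, n // 2)):
        win = channel[start:start + n]
        w = (win - win.mean()) / (np.std(win) + 1e-12)
        best = max(best, abs(np.dot(p, w)))
    return best > threshold
```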
  • Information identifying the configuration of the encoder apparatus 4 or the configuration of the input audio channels is communicated to a client either implicitly or explicitly.
  • An explicit identification may, for example, involve sending the following information: the number N of independent input audio channels, the number S of sets of dependent input audio channels, and the number M_s of input audio channels in each set s.
  • This may, for example, be sent in a terse predetermined format such as [N, S, M_1, M_2, ..., M_S].
  • the explicit signaling may also contain explicit mapping of the audio channels in the sets.
  • the predetermined format may be written in binary bit field.
  • the configuration of dependent channels may also be signaled.
  • For example, the signaled configuration might comprise 6 independent channels and 3 dependent sets.
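  • A minimal sketch of how such a terse [N, S, M_1, ..., M_S] message might be packed into a binary bit field and parsed back; the one-byte-per-field layout is an assumption, since the patent does not fix the binary format:

```python
def pack_config(n_independent, set_sizes):
    """Pack [N, S, M_1, ..., M_S] into a byte string (assumed layout)."""
    return bytes([n_independent, len(set_sizes), *set_sizes])

def parse_config(data):
    """Recover N and the list of set sizes M_s from the packed message."""
    n_independent, n_sets = data[0], data[1]
    return n_independent, list(data[2:2 + n_sets])

# Example: 6 independent channels and 3 dependent sets of 2, 3 and 2 channels.
assert parse_config(pack_config(6, [2, 3, 2])) == (6, [2, 3, 2])
```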
  • N and S may be identified via the use of encoding parameters that have different characteristics for independent input audio channels and sets of dependent input audio channels.
  • Fig 3 schematically illustrates components of a coder apparatus that may be used as an encoder apparatus 4 and/or a decoder apparatus 80.
  • the coder apparatus may be an end-product or a module.
  • 'module' refers to a unit or apparatus that excludes certain parts/components that would be added by an end manufacturer or a user to form an end-product apparatus.
  • Implementation of a coder can be in hardware alone (a circuit, a processor, ...), have certain aspects in software including firmware alone, or can be a combination of hardware and software (including firmware).
  • the coder may be implemented using instructions that enable hardware functionality, for example, by using executable computer program instructions in a general-purpose or special-purpose processor that may be stored on a computer readable storage medium (disk, memory etc) to be executed by such a processor.
  • an encoder apparatus 4 comprises: a processor 40, a memory 42 and an input/output interface 44 such as, for example, a network adapter.
  • the processor 40 is configured to read from and write to the memory 42.
  • the processor 40 may also comprise an output interface via which data and/or commands are output by the processor 40 and an input interface via which data and/or commands are input to the processor 40.
  • the memory 42 stores a computer program 46 comprising computer program instructions that control the operation of the coder apparatus when loaded into the processor 40.
  • the computer program instructions 46 provide the logic and routines that enable the apparatus to perform the methods illustrated in Figs 5 and/or 12.
  • the processor 40 by reading the memory 42 is able to load and execute the computer program 46.
  • the computer program may arrive at the coder apparatus via any suitable delivery mechanism 48.
  • the delivery mechanism 48 may be, for example, a computer-readable storage medium, a computer program product, a memory device, a record medium such as a CD-ROM or DVD, or an article of manufacture that tangibly embodies the computer program 46.
  • the delivery mechanism may be a signal configured to reliably transfer the computer program 46.
  • the coder apparatus may propagate or transmit the computer program 46 as a computer data signal.
  • Although the memory 42 is illustrated as a single component, it may be implemented as one or more separate components, some or all of which may be integrated/removable and/or may provide permanent/semi-permanent/dynamic/cached storage.
  • references to 'computer-readable storage medium', 'computer program product', 'tangibly embodied computer program' etc. or a 'controller', 'computer', 'processor' etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other devices.
  • References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device whether instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device etc.
  • Fig 4 schematically illustrates an encoder apparatus 4 operable as the dependent source encoder apparatus 6 and/or the independent source encoder apparatus 8.
  • the illustrated multichannel audio encoder apparatus 4 is, in this example, a parametric encoder that encodes according to a defined parametric model.
  • the model assumes correlated audio channels.
  • the parametric model is, in this example, a perceptual model that enables lossy compression and reduction of bandwidth.
  • the encoder apparatus 4 performs spatial audio coding using binaural cue coding (BCC) parameterisation.
  • BCC represents the original audio as a downmix signal comprising a reduced number of audio channels formed from the channels of the original signal, for example as a monophonic or a two-channel (stereo) sum signal, along with a bit stream of parameters describing the spatial image.
  • a transformer 50 transforms the input audio signals (two or more input audio channels) into the time-frequency domain using, for example, a Fourier transform or Quadrature Mirror Filter (QMF) filterbank decomposition over discrete time frames.
  • An output from the transformer 50 is provided to an audio scene analyser 54, which produces scene parameters 55.
  • the audio scene is analysed in the transform domain, for example operating with transform domain coefficients, or in the time domain, for example operating with signals in two or more subbands, and the corresponding parameterisation 55 is extracted and processed for transmission or storage for later consumption.
  • the audio scene analyser 54 performs BCC analysis. This may comprise computation of inter-channel level difference (ILD) and inter-channel time difference (ITD) parameters estimated within a transform domain time-frequency slot, i.e. in a frequency sub-band for an input frame.
  • ILD, ITD and inter-channel coherence (ICC) parameters are determined for each time-frequency slot of the input signal, or for a subset of frequency slots representing the perceptually most important frequency components.
  • the ILD and ITD parameters may be determined between an input audio channel and a reference channel, typically between each input audio channel and a reference input audio channel.
  • the ICC is typically determined individually for each channel compared to the reference channel.
  • the representation can be generalized to cover more than two input audio channels and/or a configuration using more than one downmix signal.
  • the inter-channel level difference (ILD) for each subband, ΔL_n, is typically estimated as
    ΔL_n = 10 log10( Σ_k s_L,n(k)² / Σ_k s_R,n(k)² )   (1)
    where s_L,n and s_R,n are the time domain subband signals of the left and right input audio channels.
  • the inter-channel time difference (ITD), i.e. the delay between the two input audio channels, may be determined as
    τ_n = arg max_d { Φ_n(d) }   (2)
  • the normalised correlation
    Φ_n(d) = Σ_k s_L,n(k) s_R,n(k − d) / sqrt( (Σ_k s_L,n(k)²) (Σ_k s_R,n(k − d)²) )   (3)
    of Equation (3) is actually the interchannel coherence (IC) parameter. It may be utilised for capturing the ambient components that are decorrelated with the sound components represented by the phase and magnitude parameters of Equations (1) and (2).
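  • As a concrete illustration of Equations (1)–(3), here is a sketch of time-domain cue extraction for one subband frame; the circular shift used to realise the lags is a simplification, not the patent's exact procedure:

```python
import numpy as np

def time_domain_cues(s_l, s_r, max_lag=32):
    """Estimate ILD (Eq. 1), ITD (Eq. 2) and IC (Eq. 3) for one subband."""
    ild = 10 * np.log10(np.dot(s_l, s_l) / np.dot(s_r, s_r))
    best_lag, best_phi = 0, -1.0
    for d in range(-max_lag, max_lag + 1):
        shifted = np.roll(s_r, d)  # circular shift approximates the lag
        phi = np.dot(s_l, shifted) / np.sqrt(
            np.dot(s_l, s_l) * np.dot(shifted, shifted))
        if phi > best_phi:
            best_lag, best_phi = d, phi
    return ild, best_lag, best_phi  # ITD in samples; IC = peak correlation
```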
  • BCC coefficients may be determined in the Discrete Fourier Transform (DFT) domain, for example using a windowed Short Time Fourier Transform (STFT).
  • S_L^n and S_R^n are the spectral coefficients of the two input audio channels L, R for subband n of the given analysis frame, respectively.
  • the transform domain ILD may be determined as in Equation (1):
    ΔL_n = 10 log10( (S_L^n)* S_L^n / (S_R^n)* S_R^n )   (5)
  • the time difference may be more convenient to handle as an interchannel phase difference (ICPD):
    φ_n = ∠( (S_L^n)* S_R^n )   (6)
  • Interchannel coherence may be computed in the frequency domain using a computation quite similar to the time domain calculation of Equation (3):
    c_n = | (S_L^n)* S_R^n | / sqrt( ((S_L^n)* S_L^n) ((S_R^n)* S_R^n) )   (7)
  • determining Equations (5)–(7) in the DFT domain may require significantly less computation, when the ICPD phase estimation from DFT domain spectral coefficients is used instead of the time domain ITD estimation using the correlation estimate.
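  • The DFT-domain cues of Equations (5)–(7) map directly onto complex inner products; a minimal sketch (function name is illustrative):

```python
import numpy as np

def dft_domain_cues(S_l, S_r):
    """ILD (Eq. 5), ICPD (Eq. 6) and IC (Eq. 7) from the complex spectral
    coefficients of one subband; np.vdot conjugates its first argument."""
    e_l = np.real(np.vdot(S_l, S_l))   # (S_L^n)* . S_L^n
    e_r = np.real(np.vdot(S_r, S_r))
    cross = np.vdot(S_l, S_r)          # (S_L^n)* . S_R^n
    ild = 10 * np.log10(e_l / e_r)
    icpd = np.angle(cross)
    ic = np.abs(cross) / np.sqrt(e_l * e_r)
    return ild, icpd, ic
```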
  • the level and time/phase difference cues represent the dry surround sound components, i.e. they can be considered to model the sound source locations in space.
  • ILD and ITD cues represent surround sound panning coefficients.
  • the coherence cue is dependent upon the relation between coherent and decorrelated sounds.
  • the level of late reverberation of the sound sources, e.g. due to the room effect, and the ambient sound distributed between the input channels may have a significant contribution to the perceived spatial audio sensation.
  • a downmixer 52 creates downmix signals as a combination of channels of the input signals.
  • the parameters describing the audio scene could also be used for additional processing of multi-channel input signal prior to or after the downmixing process, for example to eliminate the time difference between the channels in order to provide time-aligned audio across input channels.
  • the downmix signal is typically created as a linear combination of channels of the input signal in transform domain.
  • the downmix may be created simply by averaging the signals in the left and right channels:
    S^n = ( S_L^n + S_R^n ) / 2   (8)
  • the left and right input channels could be weighted prior to combination in such a manner that the energy of the signal is preserved. This may be useful e.g. when the signal energy on one of the channels is close to zero.
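  • A sketch of one possible energy-preserving weighting; the patent does not mandate a particular scheme, so the scaling below is an assumption for illustration:

```python
import numpy as np

def energy_preserving_downmix(S_l, S_r, eps=1e-12):
    """Scale the plain average (Eq. 8) so the downmix carries the mean
    energy of the two channels; one possible weighting, assumed here."""
    plain = 0.5 * (S_l + S_r)
    target = 0.5 * (np.sum(np.abs(S_l) ** 2) + np.sum(np.abs(S_r) ** 2))
    actual = np.sum(np.abs(plain) ** 2)
    return plain * np.sqrt(target / (actual + eps))
```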
  • the decoder does not need to be aware of the details of the process of generating the downmix signal(s).
  • the downmix method needs to be predetermined or otherwise signalled to the decoder. Otherwise, the conversion from a single ILD parameter to channel gains for left and right channels may not be possible.
  • An inverse transformer 56 produces the downmixed audio signal 57 in the time domain.
  • the encoder may skip one transform.
  • the output of a multi-channel or binaural encoder typically comprises the downmix audio signal or signals 57 and the scene parameters 55, in this case the spatial parameters: inter-channel level difference (ILD), for example the rotation matrix representing the stereo panning coefficients, inter-channel phase difference (ICPD), for example the interchannel time difference (ITD), and the inter-channel correlation (IC).
  • Fig 5 schematically illustrates an encoding method 60 which may be performed by the encoding system illustrated in Fig 6.
  • the dependencies of the input audio channels are determined.
  • the dependency between two input audio channels is determined when both input audio channels are active simultaneously or the corresponding signal levels are above a predetermined level.
  • this block may be performed by the dependency/correlation determinator 70 which produces as output a configuration control signal 72.
  • this block may be performed within the encoder apparatus 4.
  • the encoder is configured based upon the configuration of the input audio channels identified by the configuration control signal 72.
  • the encoding of the sets of dependent input audio channels is performed producing the intermediate signal 7 comprising a downmixed signal 57 and scene parameters 55.
  • the scene parameters include the level and time/phase difference and correlation cues of dependent input audio channels.
  • the encoding of the independent input audio channels, together with the downmixed signals 57 provided by the first stage, is performed producing the downmixed signal 57 and scene parameters 55.
  • the scene parameters include the level and may include time/phase difference and correlation cues of the independent input audio channels.
  • the encoded downmix bit stream 57 output from the second stage is provided for transmission or for storage.
  • the scene parameters 55 from the first stage and the scene parameters from the second stage are combined and provided for transmission or for storage
  • the output may consist of a mono (or stereo) downmix, ILD, ITD and ICC for each set of dependent input audio channels and ILD for the independent audio channels.
  • the configuration of the encoder apparatus 4 may also be provided for transmission or storage. Information on the configuration of the encoder may be provided implicitly or explicitly.
  • the presence of dependent and independent sources/channels may be implicitly indicated in the bit stream with the time/phase difference and correlation cues.
  • when the side information does contain ITD and ICC cues, and these cues do not have pre-defined values indicating independency between the sources, the corresponding input audio channels are dependent on each other.
  • pre-defined values indicating independency between the sources may be certain exact values or values within a predetermined range of values.
  • explicit signalling may be used to indicate the dependency status of the encoded audio sources/channels.
  • the two-stage configuration is presented only as an example of a preferred implementation with two dependent sources and one independent source.
  • the example configuration illustrated in Fig 6 depicts a logical structure, and the different stages may or may not be co-located.
  • these encoders may or may not be co-located.
  • Fig 7 schematically illustrates an implementation of an encoder apparatus 4 in which the first stage and the second stage are performed in distinct and remote network elements.
  • the first stage is performed at a first network element 74 and the second stage is performed at a second, different, network element 76.
  • Fig 8 schematically illustrates a decoder apparatus 80 which receives input signals 57, 55 from the encoder apparatus 4.
  • the decoder apparatus 80 comprises a synthesis block 82 and a parameter processing block 84.
  • BCC synthesis may occur at the synthesis block 82.
  • a frame of downmixed signal(s) 57 consisting of N samples s_0, ..., s_(N−1) is converted to N spectral samples S_0, ..., S_(N−1), e.g. with a DFT transform.
  • BCC interchannel parameters (cues) 85, for example the ILD and ITD described above, are output from the parameter processing block 84 and applied in the synthesis block 82 to create spatial audio signals, in this example binaural audio, in a plurality (N) of output audio channels 83.
  • the left and right output audio channel signals are synthesised for each subband n as follows:
    S_L^n = g_L^n e^(jφ_L^n) S^n,   S_R^n = g_R^n e^(jφ_R^n) S^n   (9)
    where S^n is the spectral coefficient vector of the downmixed signal according to Equation (8), S_L^n and S_R^n are the spectral coefficients of the left and right binaural signal, respectively, and the gains g and phase rotations φ are derived from the ILD and ITD/ICPD cues.
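  • A sketch of the per-subband synthesis of Equation (9); splitting the level and phase differences symmetrically between the two channels is one common convention, assumed here rather than taken from the patent:

```python
import numpy as np

def synthesise_subband(S, ild_db, icpd):
    """Recreate left/right subband spectra from the downmix S (Eq. 8)
    using the ILD and ICPD cues, per Equation (9)."""
    g_l = 10 ** (ild_db / 40.0)    # +ILD/2 in dB on the left channel
    g_r = 10 ** (-ild_db / 40.0)   # -ILD/2 in dB on the right channel
    # Split the phase difference so that angle((S_L)* S_R) equals the ICPD.
    S_l = g_l * np.exp(-1j * icpd / 2.0) * S
    S_r = g_r * np.exp(+1j * icpd / 2.0) * S
    return S_l, S_r
```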
  • the BCC synthesis using frequency dependent level and delay parameters recreates the sound components representing the audio sources.
  • the ambience may still be missing and it may be synthesised using the coherence parameter.
  • a method for synthesis of the ambient component based on the coherence cue consists of decorrelating a signal to create a late reverberation signal.
  • the implementation may consist of filtering each output audio channel using a random phase filter and adding the result into the output. When different filter delays are applied to each channel, a set of decorrelated signals is created.
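  • A sketch of decorrelation by random-phase (all-pass) filtering; applying an independently seeded filter per channel yields the mutually decorrelated set described above:

```python
import numpy as np

def decorrelate(x, seed=0):
    """Return an all-pass, randomly phase-rotated copy of x
    (late-reverberation-like): magnitudes preserved, phases randomised."""
    rng = np.random.default_rng(seed)
    X = np.fft.rfft(x)
    phases = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, X.shape))
    phases[0] = 1.0  # leave the DC component untouched
    return np.fft.irfft(X * phases, n=len(x))
```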
  • Fig 9 schematically illustrates a decoder in which the multi-channel output of the synthesis block 82 is mixed, by a mixer 89, into a plurality (K) of output audio channels 91.
  • the mixer 89 may be responsive to user input 93 identifying the user's loudspeaker setup to change the mixing and the nature and number of the output audio channels 91 .
  • music or conversation recorded with binaural microphones could be played back through a multi-channel loudspeaker setup.
  • Teleconference systems typically host more than two different participants. For intelligibility reasons it may be beneficial to render the sound of each participant in a different direction. That is, the user experience is improved when each sound source is placed in a different location in a 3D audio space.
  • the output of the mixer 89 is a two channel stereo (or binaural) signal.
  • the mixer 89 can render the multi-channel content in different directions. That is, the arbitrary output audio channel configuration of the output of the spatial decoder is driven into predetermined auditory locations for example by the teleconference client and/or the user.
  • the rendering of the multi-channel content in the mixer 89 could simply consist of stereo panning of the sources between the two output channels.
  • the independent audio channels may be rendered using the available level difference cue and user or service determined 3D rendering requirements, such as binauralisation or stereo/multi-channel panning.
  • the dependent audio channels may be decorrelated before rendering based on the correlation cue using random phase filtering
  • Fig 10 schematically illustrates an example of the decoder apparatus 80 as functional blocks. Information concerning correlation/dependency between the audio channels is taken into account when decoding the encoded signals received from the encoder apparatus 4.
  • the configuration of the decoder apparatus 80 depends upon the configuration of the encoder apparatus 4 (Figs 2 and 6), which in turn depends upon the configuration of the input audio channels 37: the number of sets of dependent input audio channels, the number of input audio channels in each set and the number of independent input audio channels.
  • the configuration of the decoder apparatus 80 is based upon knowledge of the dependency and independency of the input audio channels and/or knowledge of the configuration of the encoder apparatus 4.
  • Information identifying the configuration of the encoder apparatus 4 and/or the configuration of the input audio channels 37 is communicated from the encoder apparatus 4 to the decoder apparatus 80 either implicitly or explicitly as described previously.
  • the decoder apparatus 80 may determine the configuration of the encoder apparatus 4 and/or the configuration of the input audio channels 37 from the implicitly or explicitly communicated information.
  • the decoder apparatus 80 has two stages.
  • the first stage 90 reverses the second stage of the encoder apparatus 4.
  • the second stage 92 reverses the first stage of the encoder apparatus 4 and recovers the N input audio channels 37 as N output audio channels 83.
  • the number of output channels could be different to the number of input channels, i.e. N encoded input audio channels may be decoded into M output audio channels.
  • the output audio channels and the input audio channels correspond but are not identical as the encoding/decoding is lossy.
  • the first stage 90 comprises an independent decoder block. This may be the decoder apparatus 80 illustrated in Fig 9.
  • the independent decoder block recovers the independent audio channel(s) 31 using downmix signal 57 and side information 55 and provides the independent audio channel(s) 31 to rendering block 96 for independent rendering.
  • the independent decoder block recovers the intermediate signal(s) 7 using downmix signal 57 and side information 55 and provides intermediate signal(s) 7 to the second stage 92.
  • the second stage 92 comprises one or more dependent decoder blocks. There is a decoder block in parallel for each one of the sets of dependent input audio channels.
  • a single decoder apparatus 80 as illustrated in Fig 9 may operate sequentially as each of the dependent decoder blocks and it may also operate as the independent decoder block. For simplicity, only a single dependent decoder block is described with reference to Fig 10 but it should be understood that there may be additional dependent decoder blocks.
  • the dependent decoder block recovers the dependent audio channel(s) 33, 35 using intermediate downmix signal 7 and the side information 55 and provides the set of dependent audio channels 33, 35 to rendering block 94 for rendering independently of the independent audio channel(s) 31 and other sets of dependent audio channels.
  • the second stage 92 is in cascade to the first stage 90.
  • the cascade arrangement is such that the first stage 90 precedes the second stage 92 in time but not necessarily in space as the same circuitry may be reused for each of the sets in the second stage and also for the first stage.
  • Fig 10 depicts a logical structure, and the different stages may or may not be co-located. Separate stages could be located in different physical elements or the same decoder element could be used for each stage.
  • the stages do not need to have their own time-frequency transforms.
  • audio output is based at least in part on the configuration of the encoder apparatus 4 and/or the configuration of the input audio channels 37 - the number of sets of dependent input audio channels, the number of input audio channels in each set and the number of independent input audio channels.
  • Each set of dependent audio channels is treated as an individual audio object - a dependent audio object.
  • the independent audio channels are treated as individual audio objects: independent audio objects.
  • a particular individual dependent audio channel will not be an individual audio object but will be contained within the dependent audio object representing a set of dependent audio channels that includes that particular individual dependent audio channel.
  • Encoding comprises a first stage 6 for making dependent audio objects from the individual dependent audio channels.
  • Encoding comprises a second stage 8 for making a combined audio object 34 from all the input audio objects including the dependent audio objects created in the first stage and the independent audio objects/channels.
  • the combined audio object may be represented as a downmix signal and side information.
  • the decoding comprises a first stage 90 for converting the combined audio object 34 into its constituent audio objects including the dependent audio objects and the independent audio channels/objects.
  • the decoding comprises a second stage for converting each of the dependent audio objects into a set of dependent audio channels.
  • decoding has a configuration that is dynamic and changes with the dependencies of the input audio channels.
  • the rendering of the audio objects may occur independently.
  • each independent audio channel may be individually rendered and each set of dependent audio channels may be separately rendered.
  • rendering may be dynamic and change with the dependencies of the input audio channels.
  • the user can, for example, control the encoding via a control signal 85 and/or control the rendering via a control signal 98.
  • the user may choose the audio format in which an audio object is rendered and/or choose the 3D position at which an audio object is rendered.
  • Fig 11 represents an example, according to an embodiment of the invention, for detecting whether the output of the spatial decoder is dependent or independent.
  • when the BCC side information does not contain ITD or ICC cues (ITD -, ICC -), it relates to an independent audio object, that is, an independent audio channel.
  • when the BCC side information contains ILD, ITD and ICC, it relates to a dependent audio object, that is, a set of dependent audio channels.
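  • A sketch of this Fig 11 decision as code; the side-information layout (a dict of cue names) is an assumption for illustration:

```python
def classify_decoded_object(side_info):
    """ITD/ICC cues present implies a dependent audio object (a set of
    correlated channels); absent implies an independent audio object."""
    if side_info.get("ITD") is None and side_info.get("ICC") is None:
        return "independent"
    return "dependent"

assert classify_decoded_object({"ILD": 3.2}) == "independent"
assert classify_decoded_object({"ILD": 3.2, "ITD": 0.4, "ICC": 0.9}) == "dependent"
```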
  • there may be a rendering control signal 98 comprising explicit information on the physical arrangement of the microphones producing the input audio channels 37.
  • the input downmix signal 57 is first transformed, e.g. with a DFT transform. After that, the processing chain at each decoder stage may be performed in the same domain without an inverse transform to the time domain.
  • An inverse transform may be conducted only at the rendering stage. For example, stereo panning of two sources/objects could be done in either the time or the transform domain. In this case the number of transforms compared to one-stage decoding is not increased, since there is one transform for the input and an inverse transform for each output, similar to one-stage decoding.
  • the rendering in 3D audio space may be controlled by the server 32 - for example by a teleconference server - with additional information about the positions of independent sources/objects.
  • the position information from the server 32 is provided as side information embedded in the BCC bit stream or as additional control signalling 98.
  • additional enhancement "guidance" regarding e.g. the relative widening and panning of dependent sources/objects may be provided.
  • Positions could be based e.g. on user input 85 through a user interface.
  • the relative positions could be set based on the geographical location of the actual sound sources using e.g. GPS information.
  • the GUI or application will in this case automatically keep dependent audio objects intact and prevent them from being split into channels, so that the relative positions of the channels within the dependent audio object cannot be altered accidentally, which may cause audible distortion.
  • the independent audio channels could be placed more freely within the spatial image.
  • Fig 12 schematically illustrates a decoding method 100.
  • a configuration of dependent and independent audio objects is detected as described above.
  • the detection may involve obtaining implicit and/or explicit information communicated to the decoder, such as, for example, the number S of dependent audio objects, the size M_s of each dependent audio object (the number of constituent audio channels) and the number N of independent audio objects.
  • the configuration of the decoder is set to match the configuration of the dependent and independent audio objects.
  • the first stage 90 is configured to produce an intermediate signal 7 for each of S dependent audio objects and an output audio channel 31 for each of the N independent audio objects.
  • the second stage 92 is configured so that each decoding block s in the second stage 92 receives one of the intermediate signals 7 from the first stage 90 and produces M_s output audio channels.
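  • A sketch of how the two decoder stages might be dimensioned from the detected configuration; the structure and names are illustrative only:

```python
def configure_decoder(n_independent, set_sizes):
    """Plan the two-stage decode: the first stage yields the independent
    channels plus one intermediate signal per dependent set; the second
    stage expands each intermediate signal into its M_s channels."""
    first_stage = {"independent_outputs": n_independent,
                   "intermediate_signals": len(set_sizes)}
    second_stage = [{"block": s, "output_channels": m}
                    for s, m in enumerate(set_sizes)]
    return first_stage, second_stage, n_independent + sum(set_sizes)
```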
  • the received downmix signal is decomposed into a predetermined number (N+S) of audio objects including N independent audio objects (channels) and S dependent audio objects (uncorrelated sets of dependent channels).
  • the S dependent audio objects (uncorrelated sets of dependent channels) are decoded.
  • the audio objects are independently rendered using their associated output audio channels.
  • Each audio object is associated with an acoustic space.
  • This may be achieved using selective encoding, selective decoding or selective rendering. It may be possible to identify which audio object, if any, shares an acoustic space with an output signal 34 by deliberately introducing a test signal into the output signal. The various audio objects may be analyzed to discover which, if any, include the test signal.
  • an audio object is deliberately not encoded at the second stage 8. This may be achieved, for example, by multiplexing the inputs to the second stage 8.
  • the selection of which audio object to deliberately not encode may depend upon the destination of the output signal.
  • the acoustic space associated with the client to which the output signal 34 is to be sent is identified. Any audio objects that originate from the same acoustic space are deliberately not encoded.
  • the steps may include: a) identifying an audio object, if any, that has originated from the same acoustic space as the destination of the encoded audio; and b) preventing encoding at the second stage 8 of the identified audio object.
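  • A sketch of the selection step for such echo avoidance; audio objects are assumed to carry an acoustic-space tag for the example:

```python
def objects_to_encode(audio_objects, destination_space):
    """Exclude, before the second encoding stage, any audio object that
    originated in the destination's own acoustic space."""
    return [obj for obj in audio_objects
            if obj["acoustic_space"] != destination_space]

# Example: the object from space 20 is not sent back to space 20.
objs = [{"id": 31, "acoustic_space": 10}, {"id": 7, "acoustic_space": 20}]
assert [o["id"] for o in objects_to_encode(objs, 20)] == [31]
```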
  • an audio object may deliberately not be decoded at the first stage 90 and/or at the second stage 92. This may be achieved, for example, by multiplexing the outputs of the first stage 90 and/or the second stage 92.
  • the selection of which audio object to deliberately not decode depends upon the acoustic space associated with the respective audio object. Any audio objects that originate from the same acoustic space may deliberately not be decoded.
  • the steps may include: a) identifying an audio object, if any, that has originated from the same acoustic space as the client. b) preventing decoding at the first stage 90 and/or at the second stage 92 of the identified audio object.
  • an audio object may deliberately not be rendered at a client. This may be achieved, for example, by multiplexing the inputs to the rendering block 89. The selection of which audio object to deliberately not render depends upon the acoustic space associated with the rendering block 89. Any audio objects that originate from the same acoustic space may deliberately not be rendered.
  • the steps may include: a) identifying an audio object, if any, that has originated from the same acoustic space as the client; and b) preventing rendering of the identified audio object.
  • the blocks illustrated in the Figs 2 to 12 may represent steps in a method and/or sections of code in the computer program 46.
  • the illustration of a particular order to the blocks does not necessarily imply that there is a required or preferred order for the blocks, and the order and arrangement of the blocks may be varied. Furthermore, it may be possible for some steps to be omitted.

Abstract

The invention concerns a method comprising the steps of: encoding at least a portion of an intermediate signal, formed by encoding a set of dependent input audio channels, and at least one independent input audio channel, in order to produce an output signal; identifying a configuration of dependent and independent audio objects within a received audio signal; and controlling processing of the received audio signal by controlling processing of the audio objects.
PCT/EP2009/053331 2009-03-20 2009-03-20 Codage audio multicanaux WO2010105695A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2009/053331 WO2010105695A1 (fr) 2009-03-20 2009-03-20 Codage audio multicanaux

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2009/053331 WO2010105695A1 (fr) 2009-03-20 2009-03-20 Codage audio multicanaux

Publications (1)

Publication Number Publication Date
WO2010105695A1 true WO2010105695A1 (fr) 2010-09-23

Family

ID=40639578

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2009/053331 WO2010105695A1 (fr) 2009-03-20 2009-03-20 Codage audio multicanaux

Country Status (1)

Country Link
WO (1) WO2010105695A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040049379A1 (en) * 2002-09-04 2004-03-11 Microsoft Corporation Multi-channel audio encoding and decoding
WO2006108462A1 (fr) * 2005-04-15 2006-10-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Multi-channel hierarchical audio coding with compact side information
US20080160977A1 (en) * 2006-12-27 2008-07-03 Nokia Corporation Teleconference group formation using context information
WO2008111773A1 (fr) * 2007-03-09 2008-09-18 Lg Electronics Inc. Method and apparatus for processing an audio signal
WO2008120933A1 (fr) * 2007-03-30 2008-10-09 Electronics And Telecommunications Research Institute Apparatus and method for coding and decoding a multi-object, multi-channel audio signal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BAUMGARTE F ET AL: "Binaural cue coding-part II: schemes and applications", IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 11, no. 6, 1 November 2003 (2003-11-01), pages 520 - 531, XP011104739, ISSN: 1063-6676 *
KYUNGRYEOL KOO ET AL: "Variable Subband Analysis for High Quality Spatial Audio Object Coding", ADVANCED COMMUNICATION TECHNOLOGY, 2008. ICACT 2008. 10TH INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 17 February 2008 (2008-02-17), pages 1205 - 1208, XP031245331, ISBN: 978-89-5519-136-3 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9734833B2 (en) 2012-10-05 2017-08-15 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoder, decoder and methods for backward compatible dynamic adaption of time/frequency resolution spatial-audio-object-coding
US10152978B2 (en) 2012-10-05 2018-12-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoder, decoder and methods for signal-dependent zoom-transform in spatial audio object coding
RU2625939C2 (ru) * 2012-10-05 2017-07-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Encoder, decoder and methods for signal-dependent zoom-transform in spatial audio object coding
US10482888B2 (en) 2013-01-22 2019-11-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for spatial audio object coding employing hidden objects for signal mixture manipulation
US11227616B2 (en) 2013-07-22 2022-01-18 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for audio encoding and decoding for audio channels and audio objects
CN111862997A (zh) * 2013-07-22 2020-10-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Reduction of comb filter artifacts in multi-channel downmix with adaptive phase alignment
US11984131B2 (en) 2013-07-22 2024-05-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Concept for audio encoding and decoding for audio channels and audio objects
US11910176B2 (en) 2013-07-22 2024-02-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
US11463831B2 (en) 2013-07-22 2022-10-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for efficient object metadata coding
US10360918B2 (en) 2013-07-22 2019-07-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Reduction of comb filter artifacts in multi-channel downmix with adaptive phase alignment
EP2838086A1 (fr) * 2013-07-22 2015-02-18 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Reduction of comb filter artifacts in a multi-channel downmix with adaptive phase alignment
US11337019B2 (en) 2013-07-22 2022-05-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
US10659900B2 (en) 2013-07-22 2020-05-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for low delay object metadata coding
US10701504B2 (en) 2013-07-22 2020-06-30 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for realizing a SAOC downmix of 3D audio content
US10715943B2 (en) 2013-07-22 2020-07-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for efficient object metadata coding
CN105518775A (zh) * 2013-07-22 2016-04-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Reduction of comb filter artifacts in multi-channel downmix with adaptive phase alignment
US10937435B2 (en) 2013-07-22 2021-03-02 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Reduction of comb filter artifacts in multi-channel downmix with adaptive phase alignment
US11330386B2 (en) 2013-07-22 2022-05-10 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for realizing a SAOC downmix of 3D audio content
WO2015011057A1 (fr) * 2013-07-22 2015-01-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Reduction of comb filter artifacts in multi-channel downmix with adaptive phase alignment
US11004455B2 (en) 2015-02-02 2021-05-11 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for processing an encoded audio signal
US10529344B2 (en) 2015-02-02 2020-01-07 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for processing an encoded audio signal
RU2678136C1 (ru) * 2015-02-02 2019-01-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for processing an encoded audio signal
CN109219847B (zh) * 2016-06-01 2023-07-25 Dolby International AB Method for converting multi-channel audio content into object-based audio content and method for processing audio content having a spatial position
CN109219847A (zh) * 2016-06-01 2019-01-15 Dolby International AB Method for converting multi-channel audio content into object-based audio content and method for processing audio content having a spatial position
US11443753B2 (en) 2017-11-10 2022-09-13 Nokia Technologies Oy Audio stream dependency information
WO2019091860A1 (fr) * 2017-11-10 2019-05-16 Nokia Technologies Oy Audio stream dependency information

Similar Documents

Publication Publication Date Title
KR101450414B1 (ko) Multi-channel audio processing
EP2898506B1 (fr) Layered approach to spatial audio coding
WO2010105695A1 (fr) Multi-channel audio coding
CN111316354B (zh) Determination of target spatial audio parameters and associated spatial audio playback
TWI508058B (zh) Multi-channel audio processing technique
CN110890101B (zh) Method and apparatus for decoding on the basis of speech enhancement metadata
JP2023126225A (ja) Apparatus, method, and computer program for encoding, decoding, scene processing, and other procedures related to DirAC-based spatial audio coding
EP3748632A1 (fr) Encoding and decoding of audio signals
WO2010125228A1 (fr) Encoding of multi-view audio signals
TWI794911B (zh) Apparatus, method and computer program for encoding an audio signal or decoding an encoded audio scene
TWI747095B (zh) Apparatus, method and computer program for encoding, decoding, scene processing and other procedures related to DirAC-based spatial audio coding using diffuse compensation
US20220383885A1 (en) Apparatus and method for audio encoding
KR102284104B1 (ko) Encoding apparatus for processing an input signal and decoding apparatus for processing an encoded signal
GB2574667A (en) Spatial audio capture, transmission and reproduction
US11430451B2 (en) Layered coding of audio with discrete objects
CN114586381A (zh) Spatial audio representation and rendering
RU2809587C1 (ru) Apparatus, method and computer program for encoding an audio signal or for decoding an encoded audio scene
Kondo et al. DRT Evaluation of Localized Speech Intelligibility in Virtual 3-D Acoustic Space

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 09779193; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 09779193; Country of ref document: EP; Kind code of ref document: A1