WO2020152394A1 - Audio representation and associated rendering - Google Patents

Audio representation and associated rendering

Info

Publication number
WO2020152394A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
mono
metadata
signal
evs
Application number
PCT/FI2020/050014
Other languages
English (en)
Inventor
Lasse Laaksonen
Mikko-Ville Laitinen
Anssi RÄMÖ
Tapani PIHLAJAKUJA
Adriana Vasilache
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of WO2020152394A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/0017 Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/03 Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2420/00 Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S2420/03 Application of parametric coding in stereophonic audio systems

Definitions

  • the present application relates to apparatus and methods for sound-field related audio representation and associated rendering, but not exclusively for audio representation for an audio encoder and decoder.
  • Immersive audio codecs are being implemented supporting a multitude of operating points ranging from a low bit rate operation to transparency.
  • An example of such a codec is the immersive voice and audio services (IVAS) codec which is being designed to be suitable for use over a communications network such as a 3GPP 4G/5G network.
  • Such immersive services include uses for example in immersive voice and audio for applications such as virtual reality (VR), augmented reality (AR) and mixed reality (MR).
  • This audio codec is expected to handle the encoding, decoding and rendering of speech, music and generic audio. It is furthermore expected to support channel-based audio and scene-based audio inputs including spatial information about the sound field and sound sources.
  • the codec is also expected to operate with low latency to enable conversational services as well as support high error robustness under various transmission conditions.
  • parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters, such as directions of the sound in frequency bands, and the ratios between the directional and non-directional parts of the captured sound in frequency bands.
  • These parameters are known to well describe the perceptual spatial properties of the captured sound at the position of the microphone array.
  • These parameters can be utilized in synthesis of the spatial sound accordingly, for headphones binaurally, for loudspeakers, or to other formats, such as Ambisonics.

Summary
  • an apparatus comprising means for: obtaining an input format for generating an encoded mono audio signal and/or a multichannel audio signal, the input format comprising: a mono audio signal and a metadata signal associated with the mono signal, the metadata signal configured to enable the generation of the encoded multichannel audio signal from the mono audio signal.
  • the input format may further comprise a definition configured to control an encoder.
  • the means for may be further for: encoding a mono audio signal based on the mono audio signal; encoding a multichannel audio signal based on the mono audio signal and the metadata signal associated with the mono signal.
  • the encoding a mono audio signal based on the mono audio signal may be based on the definition configured to control an encoder.
  • the input format may further comprise a multichannel audio signal, wherein the means for may be further for: encoding a mono audio signal based on the mono audio signal; and encoding a multichannel audio signal based on the multichannel audio signal.
  • the multichannel audio signal may be a stereo audio signal.
  • the encoded mono audio signal may be an enhanced voice system encoded mono audio signal.
  • the encoded multichannel audio signal may be one of: an enhanced voice system encoded multichannel audio signal; and an Immersive Voice and Audio Services multichannel audio signal.
  • the metadata signal may comprise: two directional parameters for each time-frequency tile, wherein the direction parameters are limited to a single plane of elevation; and direct-to-total energy ratios associated with the two directional parameters, wherein the sum of the direct-to-total energy ratios for the two directions is 1.
  • a method comprising obtaining an input format for generating an encoded mono audio signal and/or a multichannel audio signal, the input format comprising: a mono audio signal and a metadata signal associated with the mono signal, the metadata signal configured to enable the generation of the encoded multichannel audio signal from the mono audio signal.
  • the input format may further comprise a definition configured to control an encoding.
  • the method may further comprise: encoding a mono audio signal based on the mono audio signal; encoding a multichannel audio signal based on the mono audio signal and the metadata signal associated with the mono signal.
  • Encoding a mono audio signal based on the mono audio signal may further comprise encoding based on the definition.
  • the input format may further comprise a multichannel audio signal, wherein the method may further comprise: encoding a mono audio signal based on the mono audio signal; and encoding a multichannel audio signal based on the multichannel audio signal.
  • the multichannel audio signal may be a stereo audio signal.
  • the encoded mono audio signal may be an enhanced voice system encoded mono audio signal.
  • the encoded multichannel audio signal may be one of: an enhanced voice system encoded multichannel audio signal; and an Immersive Voice and Audio Services multichannel audio signal.
  • the metadata signal may comprise: two directional parameters for each time-frequency tile, wherein the direction parameters are limited to a single plane of elevation; and direct-to-total energy ratios associated with the two directional parameters, wherein the sum of the direct-to-total energy ratios for the two directions is 1.
  • an apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to: obtain an input format for generating an encoded mono audio signal and/or a multichannel audio signal, the input format comprising: a mono audio signal and a metadata signal associated with the mono signal, the metadata signal configured to enable the generation of the encoded multichannel audio signal from the mono audio signal.
  • the input format may further comprise a definition configured to control an encoding.
  • the apparatus may further be caused to: encode a mono audio signal based on the mono audio signal; encode a multichannel audio signal based on the mono audio signal and the metadata signal associated with the mono signal.
  • the apparatus caused to encode a mono audio signal based on the mono audio signal may further be caused to encode based on the definition.
  • the input format may further comprise a multichannel audio signal, wherein the apparatus may further be caused to: encode a mono audio signal based on the mono audio signal; and encode a multichannel audio signal based on the multichannel audio signal.
  • the multichannel audio signal may be a stereo audio signal.
  • the encoded mono audio signal may be an enhanced voice system encoded mono audio signal.
  • the encoded multichannel audio signal may be one of: an enhanced voice system encoded multichannel audio signal; and an Immersive Voice and Audio Services multichannel audio signal.
  • the metadata signal may comprise: two directional parameters for each time-frequency tile, wherein the direction parameters are limited to a single plane of elevation; and direct-to-total energy ratios associated with the two directional parameters, wherein the sum of the direct-to-total energy ratios for the two directions is 1.
  • an apparatus comprising obtaining circuitry configured to obtain an input format for generating an encoded mono audio signal and/or a multichannel audio signal, the input format comprising: a mono audio signal and a metadata signal associated with the mono signal, the metadata signal configured to enable the generation of the encoded multichannel audio signal from the mono audio signal.
  • a computer program comprising instructions [or a computer readable medium comprising program instructions] for causing an apparatus to perform at least the following: obtaining an input format for generating an encoded mono audio signal and/or a multichannel audio signal, the input format comprising: a mono audio signal and a metadata signal associated with the mono signal, the metadata signal configured to enable the generation of the encoded multichannel audio signal from the mono audio signal.
  • a non-transitory computer readable medium comprising program instructions for causing an apparatus to perform at least the following: obtaining an input format for generating an encoded mono audio signal and/or a multichannel audio signal, the input format comprising: a mono audio signal and a metadata signal associated with the mono signal, the metadata signal configured to enable the generation of the encoded multichannel audio signal from the mono audio signal.
  • an apparatus comprising: means for obtaining an input format for generating an encoded mono audio signal and/or a multichannel audio signal, the input format comprising: a mono audio signal and a metadata signal associated with the mono signal, the metadata signal configured to enable the generation of the encoded multichannel audio signal from the mono audio signal.
  • a computer readable medium comprising program instructions for causing an apparatus to perform at least the following: obtaining an input format for generating an encoded mono audio signal and/or a multichannel audio signal, the input format comprising: a mono audio signal and a metadata signal associated with the mono signal, the metadata signal configured to enable the generation of the encoded multichannel audio signal from the mono audio signal.
  • Encoding a mono audio signal may be encoding a bit-exact mono audio signal.
  • the encoded mono audio signal may be an encoded bit-exact mono audio signal.
  • An apparatus comprising means for performing the actions of the method as described above.
  • An apparatus configured to perform the actions of the method as described above.
  • a computer program comprising program instructions for causing a computer to perform the method as described above.
  • a computer program product stored on a medium may cause an apparatus to perform the method as described herein.
  • An electronic device may comprise apparatus as described herein.
  • a chipset may comprise apparatus as described herein.
  • Embodiments of the present application aim to address problems associated with the state of the art.
  • Figure 1 shows schematically a system of apparatus suitable for implementing some embodiments;
  • Figure 2 shows schematically a system of apparatus for an IVAS encoder architecture of Figure 1 including a mono signal input;
  • Figure 3 shows schematically a first example IVAS encoder architecture of Figure 1 including a mono signal input according to some embodiments;
  • Figure 4 shows schematically a second example IVAS encoder architecture of Figure 1 including a mono signal input according to some embodiments;
  • Figure 5 shows example bit distributions for bitstream examples based on the first example IVAS encoder architecture shown in Figure 3;
  • Figures 6 to 9 show example voice conference systems employing some embodiments;
  • Figure 10 shows a third example IVAS encoder architecture of Figure 1 including a mono signal input according to some embodiments;
  • Figure 11 shows an example voice conference system employing the third example IVAS encoder architecture as shown in Figure 10 according to some embodiments; and
  • Figure 12 shows an example device suitable for implementing the apparatus shown.
  • a practical stereo encoding may utilize an adaptive, smart downmix, for example to compensate inter-channel time/phase differences, which may maintain quality and produce a mono signal that reproduces the original stereo signal as faithfully as possible.
  • any other codec or coding can implement some embodiments as described herein.
  • an embedded bit-exact stereo or spatial extension can be of great interest for interoperability and conformance reasons (particularly in managed services with quality of service (QoS) requirements).
  • the concept as discussed in further detail hereafter is able in some embodiments to allow an embedded stereo (or spatial) extension to feature bit-exact legacy mono operation in an embedded encoding structure while providing freedom of high-quality stereo-to-mono (or spatial-to-mono) downmix. Additionally, the embodiments can extend Metadata-assisted spatial audio (MASA), which may be understood (at least) as an ingest format intended for the 3GPP IVAS audio codec, for (embedded) stereo encoding in a “spatial” MASA compatible way.
  • the embodiments may thus define an EETU-MASA (Embedded EVS Stereo Using Metadata-Assisted Spatial Audio) method allowing embedded stereo (and by regular MASA extension, of course, immersive/spatial) operation on top of the legacy EVS codec in a way where the mono downmix can be guaranteed to be bit-exact with EVS operation as specified by the standards such as TS 26.445, TS 26.442, and TS 26.444.
  • EETU-MASA can be implemented with stereo as a 1-channel + metadata input:
  • the MASA format is configured with a channel configuration option ‘stereo using mono input + MASA metadata’.
  • This configuration information can be provided for example via a specific metadata field (e.g., called ‘Channel configuration’ or ‘Channel audio format’).
  • An IVAS encoder, on receiving this input, can be configured to select EVS as the core coding of the mono stream (treating the input as mono without metadata), while any MASA metadata is fed into the MASA metadata encoder.
  • the bitstream from the IVAS encoder can in some embodiments include a bit-exact EVS mono and additional MASA metadata providing a stereo extension. This can be transmitted to a suitable IVAS decoder as is.
  • the IVAS recipient receives the IVAS mono stream and decodes stereo (or spatial audio).
  • a suitable network element, such as an MCU, performs a transcoding from IVAS to EVS which is lossless for the mono part.
  • the EVS recipient receives the EVS mono stream and decodes mono.
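  • As a minimal illustrative sketch (the payload layout, names, and sizes below are assumptions, not taken from the IVAS or EVS specifications), the lossless mono "transcoding" at the network element can be pictured as a byte-level split of the embedded payload into the bit-exact EVS frame and the MASA metadata extension:

```python
# Hypothetical payload layout: [EVS frame | MASA metadata].

def split_ivas_payload(payload: bytes, evs_frame_bytes: int):
    """Because the EVS frame is embedded unchanged, the 'transcoding'
    is a byte-level split rather than a decode/re-encode."""
    evs_frame = payload[:evs_frame_bytes]      # bit-exact EVS mono frame
    masa_metadata = payload[evs_frame_bytes:]  # stereo/spatial extension
    return evs_frame, masa_metadata

# Example: a 20 ms frame at 13.2 kbps EVS is 264 bits = 33 bytes,
# with a 3.2 kbps metadata extension of 64 bits = 8 bytes.
payload = bytes(33) + bytes(8)
evs, meta = split_ivas_payload(payload, evs_frame_bytes=33)
assert len(evs) == 33 and len(meta) == 8
```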
  • the stereo mode of the MASA spatial metadata can be achieved using the definition below:
  • Direction index is limited to Left and Right with no elevation
  • All the other parameters can be omitted or set to zero/default.
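  • A minimal sketch of this restricted stereo-mode metadata follows; the class and field names are hypothetical, and only the constraints stated above (left/right directions with no elevation, and energy ratios summing to 1) come from the text:

```python
from dataclasses import dataclass

@dataclass
class StereoModeTile:
    """One time-frequency tile in the restricted stereo mode."""
    azimuths_deg: tuple = (90.0, -90.0)  # Left and Right; elevation fixed to 0
    ratio_left: float = 0.5              # direct-to-total ratio of direction 1

    @property
    def ratio_right(self) -> float:
        # The two ratios sum to 1, so only one needs to be signalled.
        return 1.0 - self.ratio_left

tile = StereoModeTile(ratio_left=0.7)
assert abs(tile.ratio_left + tile.ratio_right - 1.0) < 1e-12
```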
  • the above definition may be particularly useful in situations with independent stereo channels, for example in a multipoint control unit (MCU) stereo audio use case.
  • the capture and audio processing system before the (IVAS) encoder, in other words the part of the end-to-end system that creates the (MASA) input for the encoder, is free to apply any stereo-to-mono downmix that is suitable for the best possible quality.
  • the (EVS/IVAS) encoder in such embodiments is configured to see the single mono downmix, and bit-exactness of the mono signal is therefore maintained.
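  • For illustration, one possible adaptive stereo-to-mono downmix of the kind such a capture system might apply is a delay-compensated sum; this is a sketch under the assumption that inter-channel delay is the dominant impairment, and real downmixes may be considerably more elaborate:

```python
import numpy as np

def adaptive_mono_downmix(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    # Estimate the inter-channel delay from the cross-correlation peak.
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)
    # Align the lagging channel before summing to reduce comb filtering.
    # (np.roll wraps circularly; acceptable for a short-block sketch.)
    right_aligned = np.roll(right, lag)
    return 0.5 * (left + right_aligned)

fs = 48000
t = np.arange(fs // 100) / fs                  # one 10 ms block
left = np.sin(2 * np.pi * 440 * t)
right = np.roll(left, 12)                      # 0.25 ms inter-channel delay
mono = adaptive_mono_downmix(left, right)
```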
  • the methods as described in the embodiments hereafter can furthermore be used for core codecs other than EVS and for extensions/immersive codecs other than IVAS.
  • an additional advantage may be that a bit-exact embedded stereo is allowed also on top of EVS adaptive multi-rate wideband (AMR-WB) interoperable (IO) modes (and therefore not only the EVS primary modes).
  • there may additionally be defined an EETU-MASA with stereo input as a 3-channel + metadata example.
  • the MASA format comprises a channel configuration option of ‘stereo using combination of mono input and stereo input + MASA metadata’.
  • This configuration information can be provided for example via a specific metadata field (e.g., called ‘Channel configuration’ or ‘Channel audio format’).
  • this input can configure the encoder to select EVS as the core coding of the mono stream (treating the input as mono without metadata), while at least the stereo stream with the MASA metadata is fed into the IVAS MASA encoder (including the metadata encoding).
  • This mode of operation is thus a parallel stereo/spatial mono downmix encoding for bit-exact backwards interoperability.
  • the bitstream from the (IVAS) encoder will comprise a bit-exact (EVS) mono and additional (IVAS) bitstream with (MASA) metadata providing a stereo extension.
  • This can be transmitted to an (IVAS) decoder as is (or with the EVS payload dropped).
  • the (IVAS) recipient receives the (IVAS) stream and decodes stereo (or spatial audio).
  • a suitable network element, such as an MCU, can be configured to perform a transcoding from IVAS to EVS which is lossless for the mono part.
  • the EVS recipient can be configured to receive the EVS mono stream and decode the mono signal.
  • the IVAS encoder (or any stereo/spatial encoder) can provide the EVS bitstream (or any mono bitstream) and the IVAS bitstream (or any stereo/spatial bitstream) as separate packets.
  • With respect to Figure 1, an example apparatus and system is shown for implementing the obtaining and encoding of an audio signal (in the form of audio capture in this example) and the rendering (of the encoded audio signals).
  • the system 100 is shown with an ‘analysis’ part 121 and a ‘synthesis’ part 131.
  • The ‘analysis’ part 121 is the part from receiving the multi-channel signals up to an encoding of the metadata and transport signal and the ‘synthesis’ part 131 is the part from a decoding of the encoded metadata and transport signal to the presentation of the re-generated signal (for example in multi-channel loudspeaker form).
  • the input to the system 100 and the ‘analysis’ part 121 is the multi-channel signals 102.
  • a microphone channel signal input is described; however, any suitable input (or synthetic multi-channel) format may be implemented in other embodiments.
  • the spatial analyser and the spatial analysis may be implemented external to the encoder.
  • the spatial metadata associated with the audio signals may be provided to an encoder as a separate bit-stream.
  • the spatial metadata may be provided as a set of spatial (direction) index values.
  • the multi-channel signals are passed to a transport signal generator 103 and to an analysis processor 105.
  • the transport signal generator 103 is configured to receive the multi-channel signals and generate a suitable transport signal comprising a determined number of channels and output the transport signals 104.
  • the transport signal generator 103 may be configured to generate a 2 audio channel downmix of the multi-channel signals.
  • the determined number of channels may be any suitable number of channels.
  • the transport signal generator in some embodiments is configured to otherwise select or combine, for example, by beamforming techniques the input audio signals to the determined number of channels and output these as transport signals.
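  • As a sketch of one such strategy, a static downmix matrix with ITU-R BS.775-style coefficients is shown purely as an example; selection or beamforming would be equally valid, and the coefficient values here are assumptions:

```python
import numpy as np

# Rows: transport L/R. Columns: FL, FR, C, LFE, SL, SR.
DOWNMIX_5_1_TO_2 = np.array([
    [1.0, 0.0, 0.7071, 0.0, 0.7071, 0.0   ],
    [0.0, 1.0, 0.7071, 0.0, 0.0,    0.7071],
])

def transport_from_5_1(x: np.ndarray) -> np.ndarray:
    """x: (6, n_samples) 5.1 input -> (2, n_samples) transport signal."""
    return DOWNMIX_5_1_TO_2 @ x

transport = transport_from_5_1(np.zeros((6, 960)))  # one 20 ms frame at 48 kHz
assert transport.shape == (2, 960)
```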
  • the transport signal generator 103 is optional and the multi-channel signals are passed unprocessed to an encoder 107 in the same manner as the transport signals are in this example.
  • the analysis processor 105 is also configured to receive the multi-channel signals and analyse the signals to produce metadata 106 associated with the multi-channel signals and thus associated with the transport signals 104.
  • the analysis processor 105 may be configured to generate the metadata which may comprise, for each time-frequency analysis interval, a direction parameter 108 and an energy ratio parameter 110 and a coherence parameter 112 (and in some embodiments a diffuseness parameter).
  • the direction, energy ratio and coherence parameters (and diffuseness parameter) may in some embodiments be considered to be spatial audio parameters.
  • the spatial audio parameters comprise parameters which aim to characterize the sound-field created by the multi-channel signals (or two or more playback audio signals in general).
  • the parameters generated may differ from frequency band to frequency band.
  • for example, in band X all of the parameters are generated and transmitted, whereas in band Y only one of the parameters is generated and transmitted, and in band Z no parameters are generated or transmitted.
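  • A highly simplified sketch of such an analysis step is given below: for one frame of a stereo input it derives, per frequency bin, a direction estimate (here just an amplitude-panning azimuth from the level difference) and a direct-to-total energy ratio proxy (here from the normalized cross-spectrum). Real analysers, e.g. for MASA, operate on grouped frequency bands and are considerably more elaborate; everything in this sketch is an assumption for illustration:

```python
import numpy as np

def analyse_tiles(left: np.ndarray, right: np.ndarray, n_fft: int = 512):
    win = np.hanning(n_fft)
    Lf = np.fft.rfft(win * left[:n_fft])
    Rf = np.fft.rfft(win * right[:n_fft])
    eL, eR = np.abs(Lf) ** 2, np.abs(Rf) ** 2
    # Panning-law azimuth in roughly [-90, 90] degrees from level difference.
    azimuth = 2.0 * np.degrees(np.arctan2(np.sqrt(eL) - np.sqrt(eR),
                                          np.sqrt(eL) + np.sqrt(eR)))
    # Normalized cross-spectrum magnitude as a direct-to-total ratio proxy.
    coherence = np.abs(Lf * np.conj(Rf)) / np.maximum(np.sqrt(eL * eR), 1e-12)
    return azimuth, np.clip(coherence, 0.0, 1.0)

azimuth, ratio = analyse_tiles(np.random.randn(512), np.random.randn(512))
```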
  • the transport signals 104 and the metadata 106 may be passed to an encoder 107.
  • the spatial audio parameters may be grouped or separated into directional and non-directional (such as, e.g., diffuse) parameters.
  • the encoder 107 may comprise an audio encoder core 109 which is configured to receive the transport (for example downmix) signals 104 and generate a suitable encoding of these audio signals.
  • the encoder 107 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
  • the encoding may be implemented using any suitable scheme.
  • the encoder 107 may furthermore comprise a metadata encoder/quantizer 111 which is configured to receive the metadata and output an encoded or compressed form of the information.
  • the encoder 107 may further interleave, multiplex to a single data stream or embed the metadata within encoded downmix signals before transmission or storage, shown in Figure 1 by the dashed line.
  • the multiplexing may be implemented using any suitable scheme.
  • the received or retrieved data may be received by a decoder/demultiplexer 133.
  • the decoder/demultiplexer 133 may demultiplex the encoded streams and pass the audio encoded stream to a transport extractor 135 which is configured to decode the audio signals to obtain the transport signals.
  • the decoder/demultiplexer 133 may comprise a metadata extractor 137 which is configured to receive the encoded metadata and generate metadata.
  • the decoder/demultiplexer 133 can in some embodiments be a computer (running suitable software stored on memory and on at least one processor), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
  • the decoded metadata and transport audio signals may be passed to a synthesis processor 139.
  • the system 100 ‘synthesis’ part 131 further shows a synthesis processor 139 configured to receive the transport and the metadata and re-create in any suitable format a synthesized spatial audio in the form of multi-channel signals 110 (these may be multichannel loudspeaker format or in some embodiments any suitable output format such as binaural signals for headphone listening or Ambisonics signals, depending on the use case) based on the transport signals and the metadata.
  • the system (analysis part) is configured to receive multi-channel audio signals.
  • the system (analysis part) is configured to generate a suitable transport audio signal (for example by selecting or downmixing some of the audio signal channels).
  • the system is then configured to encode for storage/transmission the transport signal and the metadata.
  • the system may store/transmit the encoded transport and metadata.
  • the system may retrieve/receive the encoded transport and metadata.
  • the system is configured to extract the transport and metadata from encoded transport and metadata parameters, for example demultiplex and decode the encoded transport and metadata parameters.
  • the system (synthesis part) is configured to synthesize an output multi-channel audio signal based on extracted transport audio signals and metadata.
  • the apparatus and methods can be implemented as part of a MASA format definition, encoder functionality, and bitstream format (including, e.g., RTP header). These embodiments are relevant for the audio codec standard as well as various network functionalities (e.g., MCU operation).
  • With respect to Figure 2, a high-level view is shown of an example IVAS encoder including the various inputs which may, as non-exclusive examples, be expected for the codec.
  • the underlying idea is that mono signals are handled by a bit-exact implementation of the EVS codec, while any stereo, spatial or immersive input is handled by the IVAS core tools complemented in some cases by a metadata encoder.
  • the system as shown in Figure 2 can comprise input format generators 203.
  • the input format generators 203 may be considered in some examples to be the same as the transport signal generator 103 and the analysis processor 105 from Figure 1.
  • the input format generators 203 may be configured to generate suitable audio signals and metadata for capturing the audio and spatial audio qualities of the input signals, which may originate from microphone capture, some other source (such as a file) or a combination thereof.
  • a relevant microphone capture may be a multi-microphone audio capture on a mobile device (such as a smartphone), while a relevant other source may be a channel-based music file (such as a 5.1 music mix file). Any other suitable microphone array capture or source can also be used.
  • the input format generators 203 can comprise a mono audio signal generator 205 configured to generate a suitable mono audio signal.
  • the input format generators 203 can also comprise a multichannel or spatial format generator 207.
  • the multichannel or spatial format generator 207 in some embodiments comprises a metadata-assisted spatial audio generator 209.
  • the metadata-assisted spatial audio generator 209 is configured to generate audio signals (such as the transport audio signals in the form as a stereo-channel audio signal) and metadata associated with the audio signals.
  • the multichannel or spatial format generator 207 in some embodiments comprises a multichannel format generator 211 configured to generate suitable multichannel audio signals (for example stereo channel format audio signals and/or 5.1 channel format audio signals).
  • the multichannel or spatial format generator 207 in some embodiments comprises an ambisonics generator 213 configured to generate a suitable ambisonics format audio signal (which may comprise first order ambisonics and/or higher order ambisonics).
  • the multichannel or spatial format generator 207 in some embodiments can comprise an independent mono streams with metadata generator 215 configured to generate mono audio signals and metadata.
  • the apparatus comprises encoders 221.
  • the encoder(s) is configured to receive the output of the input format generators 203 and encode these into a suitable format for storage and/or transmission.
  • the encoders may be considered to be the same as the encoder 107.
  • the encoders 221 may comprise a bit exact EVS encoder 223.
  • the bit exact EVS encoder 223 may be configured to receive the mono audio signal from the input format generators 203 and generate a bit exact EVS mono audio signal.
  • the encoders 221 may comprise IVAS core encoder 225.
  • the IVAS core encoder 225 may be configured to receive the audio signals generated by the input format generators 203 and encode these according to the IVAS standard.
  • the encoders comprises a metadata encoder 227.
  • the metadata encoder is configured to receive the spatial metadata and encode it or compress it in any suitable manner.
  • the encoders 221 in some embodiments can be configured to combine or multiplex the datastreams generated by the encoders prior to being transmitted and/or stored.
  • the system furthermore comprises a transmitter configured to transmit or store the bitstream 231.
  • With respect to Figure 3, the mono input 301 is passed to the encoder 311 and the bit-exact EVS encoder 317 in the same manner as shown in Figure 2.
  • the stereo and immersive audio input 303 is passed to the encoder 311 and a pre-processor 315.
  • the encoder 311 in some embodiments comprises a pre-processor 315.
  • the pre-processor 315 may be configured to receive the stereo and immersive inputs and pre-process the signal before being passed to the downmixer 313 and to the IVAS core encoder 319.
  • the metadata output of the pre-processor 315 can be passed to the metadata encoder 321.
  • the encoder 311 furthermore comprises a downmixer 313.
  • the downmixer 313 is configured to process the pre-processed audio signal and output a downmixed or mono channel audio signal to the bit-exact EVS encoder 317.
  • the downmixer 313 in some embodiments is further configured to output metadata associated with the downmixed audio signal to the metadata encoder 321.
  • the encoder 311 may comprise a bit-exact EVS encoder 317.
  • the bit-exact EVS encoder 317 may be configured to receive the mono audio signal from the mono input 301 and the downmixer 313 and generate an EVS mono audio signal.
  • the encoder 311 may comprise the IVAS core encoder 319.
  • the IVAS core encoder 319 may be configured to receive the audio signals generated by the pre-processor 315 and encode these according to the IVAS standard.
  • the encoder comprises the metadata encoder 321.
  • the metadata encoder 321 is configured to receive the spatial metadata from the downmixer 313 and pre-processor 315 and encode it or compress it in any suitable manner.
  • With respect to Figure 4, it is shown how an embedded EVS stereo signal can be generated within the system shown in Figure 2 according to a first example embodiment.
  • This example improves over the example shown in Figure 3 in that, although the apparatus in Figure 3 implements an embedded EVS stereo, its output is not bit-exact when compared to a mono downmix of the same stereo signal into a legacy EVS mono encoder. This is because there is a signal delay due to pre-processing (such as any highpass or lowpass filtering) affecting, among other things, the exact framing of the signal encoding. For example, if the input framing in the encoder is changed even by introducing a one-sample delay, the resulting bitstream will be different.
  • the pre-processing itself can change the signal characteristics (such as removal of low-frequency or high-frequency components).
  • Another example is if an active downmix is performed to deal with certain time/phase alignment effects, and this downmix processing differs from the downmix performed outside the codec.
  • While the apparatus in Figure 3 may be modified such that the pre-processing is skipped when the embedded stereo mode is used, this complicates the apparatus and introduces mode switching issues.
  • the embodiments further improve over the apparatus as shown in Figure 3 in that the downmix inside the codec is not limited to a simple downmix to be able to produce the same downmix outside the codec and inside the codec (as could be required for any managed system conformance test, where the requirement to be tested is providing “an embedded bit-exact EVS mono downmix bitstream”).
  • the example shown in Figure 4 features the same mono input 301 and a stereo (and immersive audio) input 303.
  • the mono input 301 is passed to the encoder 311 and the bit-exact EVS encoder 317 in the same manner as shown in Figure 3.
  • the stereo and immersive audio input 303 is passed to the encoder 311 and a pre-processor 315.
  • the encoder 311 in some embodiments comprises the pre-processor 315.
  • the pre-processor 315 may be configured to receive the stereo and immersive inputs and pre-process the signal before being passed to the IVAS core encoder 319.
  • the metadata output of the pre-processor 315 can be passed to the metadata encoder 321.
  • the apparatus differs from the example shown in Figure 3 in that the format generator/inputs include a further input.
  • the further input is designated the Embedded EVS stereo using MASA (EETU-MASA) input 401 .
  • a mono-downmixed parametric stereo representation of the stereo input is thus used, which removes the need for passing the stereo or other multichannel audio signals through the pre-processor and for the inclusion of the downmixer prior to the EVS encoder, and allows the use of the metadata encoding as is.
  • the mono-downmixed parametric stereo representation in some embodiments is an extension of the MASA format.
  • the extension is compatible with the MASA format parameter set. In principle, it is straightforward to allow encoding mode switching with this input; however, in some embodiments the mode is primarily used for the embedded bit-exact EVS stereo operation.
  • the EETU-MASA input can be defined as (or additionally support) the following:
  • one or two direction parameters per time-frequency (TF) tile; a direction index limited to the planar front sector (left-front-right) or any equivalent sector;
  • the stereo-to-mono downmix may be determined based on a capture or device implementation preference.
  • the EETU-MASA input is configured to pass the audio signal 441 to the bit-exact EVS encoder 317 and to pass the metadata 443 to the metadata encoder 321.
  • the encoder 311 may comprise a bit-exact EVS encoder 317.
  • the bit-exact EVS encoder 317 may be configured to receive the mono audio signal from the mono input 301 and the EETU-MASA input 401 and attempt to generate a bit-exact EVS mono audio signal.
  • the encoder 311 may comprise the IVAS core encoder 319.
  • the IVAS core encoder 319 may be configured to receive the audio signals generated by the pre-processor 315 and encode these according to the IVAS standard.
  • the encoder comprises the metadata encoder 321.
  • the metadata encoder 321 is configured to receive the spatial metadata from the EETU-MASA input 401 and pre-processor 315 and encode it or compress it in any suitable manner.
  • the rendering at the decoder is configured to provide a stereo signal. It is understood this stereo is preferably a head-locked stereo (in other words, head-tracking is not needed and should not affect the rendering).
  • a mode selection selects, based on a relevant criterion, one of the two modes for each frame of audio. Typically, fluctuation from one mode to another and back on a frame-to-frame basis would be avoided.
  • the mode selection in this case is part of the front-end processing and is seen on the format level by the audio encoder.
  • the EETU-MASA format comprises a channel configuration parameter which may be defined as a channel configuration specifying ‘stereo input as mono + restricted MASA metadata’.
  • this configuration information when detected by the encoder 411 configures the EVS encoder 317 to automatically trigger EVS mono encoding and configures the metadata encoder 321 to generate a separate metadata (stream) encoding for the stereo extension.
  • Example outputs from the encoder are shown in Figure 5.
  • Figure 5a (the upper block) shows an example where the full IVAS payload is allocated between EVS BE bitstream and stereo (spatial) extension metadata.
  • For example, where the available bitrate is 13.2 kbps, the EVS BE allowance may be 9.6 kbps and the metadata 3.6 kbps; where the available bitrate is 16.4 kbps, the EVS BE allowance may be 13.2 kbps and the metadata 3.2 kbps; where the available bitrate is 24.4 kbps, the EVS BE allowance may be 16.4 kbps and the metadata 8.0 kbps; and where the available bitrate is 32.0 kbps, the EVS BE allowance may be 24.4 kbps and the metadata 7.6 kbps.
  • Figure 5b illustrates an option where the extension bit rate is reduced to allow the first bit in the IVAS payload to indicate the extension usage, as shown by the small 0.05 kbps block preceding the EVS BE blocks.
  • For example, where the available bitrate is 13.2 kbps, the extension usage is 0.05 kbps, the EVS BE allowance may be 9.6 kbps and the metadata 3.55 kbps; where the available bitrate is 16.4 kbps, the extension usage is 0.05 kbps, the EVS BE allowance may be 13.2 kbps and the metadata 3.15 kbps; where the available bitrate is 24.4 kbps, the extension usage is 0.05 kbps, the EVS BE allowance may be 16.4 kbps and the metadata 7.95 kbps; and where the available bitrate is 32.0 kbps, the extension usage is 0.05 kbps, the EVS BE allowance may be 24.4 kbps and the metadata 7.55 kbps.
  • Figure 5c shows a further illustration for a 32-kbps packet, which is similar to the middle block but utilizing the first bit of each embedded stream for increased packet flexibility.
  • the 32 kbps packet can be divided into extension usage of 4x 0.05 kbps, 9.6 kbps EVS BE, 3.55 kbps metadata, 3.15 kbps metadata, 7.95 kbps metadata and 7.55 kbps metadata.
  • the 32 kbps packet can also be divided into extension usage of 3x 0.05 kbps, 13.2 kbps EVS BE, 3.15 kbps metadata, 7.95 kbps metadata and 7.55 kbps metadata.
  • the 32 kbps packet also can be divided into extension usage of 2x 0.05 kbps, 16.4 kbps EVS BE, 7.95 kbps metadata and 7.55 kbps metadata. Additionally it is shown that the 32 kbps packet can be divided into extension usage of 1x 0.05 kbps, 24.4 kbps EVS BE and 7.55 kbps metadata. This illustrates the flexibility of the embedded packetization.
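  • The arithmetic behind these splits can be checked with a small sketch; a 20 ms frame duration is assumed, so 0.05 kbps corresponds to exactly one bit per frame, and the layout below mirrors one of the Figure 5c examples:

```python
FRAME_MS = 20

def frame_bits(kbps: float) -> int:
    return round(kbps * FRAME_MS)  # kbps equals bits per ms, times 20 ms

# Four 1-bit extension-usage flags, the 9.6 kbps EVS BE core, and four
# embedded metadata layers of 3.55 / 3.15 / 7.95 / 7.55 kbps.
layers_kbps = [0.05] * 4 + [9.6, 3.55, 3.15, 7.95, 7.55]
assert sum(frame_bits(r) for r in layers_kbps) == frame_bits(32.0)  # 640 bits
```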
  • part of the bits used for the metadata can in some embodiments be used for residual coding, a differential extra layer on top of the core EVS coded downmix.
  • the difference can be applied on top of sub-blocks of the core codec, for example as algebraic code-excited linear prediction (ACELP) sub-blocks, TCX sub-blocks, etc.
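  • Conceptually, such a residual layer carries a coarsely quantized difference between the original mono downmix and the locally decoded core output, which the decoder adds back; the following is a sketch only, with the quantizer and sub-block handling as illustrative assumptions:

```python
import numpy as np

def residual_layer(original: np.ndarray, core_decoded: np.ndarray,
                   step: float = 1e-3) -> np.ndarray:
    """Uniformly quantized difference signal, computed per core sub-block."""
    residual = original - core_decoded
    return step * np.round(residual / step)  # decoder output: core + residual
```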
  • the EETU-MASA input is a straightforward new extension of the MASA metadata definition providing an additional audio representation/mode based on limitations applied to parameter usage (parameters used and allowed values). It is designed to be fully compatible with the MASA format.
  • the EETU-MASA enables IVAS stereo operation with an embedded bit-exact EVS mono downmix bitstream.
  • the IVAS operation can also be a spatial operation with an embedded bit-exact EVS mono downmix bitstream.
  • the embodiments furthermore allow a switching between stereo and spatial IVAS operation based on the input metadata while providing an embedded bit-exact EVS mono downmix bitstream.
  • Figure 6 presents a first voice conferencing scenario between three participants with a wide range of device capabilities.
  • the system shows a legacy EVS upstream 602 implementation on user equipment 601 with mono capture and playback via earpiece (user A), an IVAS upstream 604 implementation on user equipment 603 with spatial audio capture and playback via headphones (user B), and an IVAS upstream 606 implementation on a conference room setup 605 using stereo audio capture and multi-channel loudspeaker presentation (user C).
  • the common codec that can be negotiated between these users is the EVS codec (either legacy EVS for all or with two users using EVS in IVAS).
  • Figure 7 presents the same scenario as Figure 6 for the downstream.
  • A further scenario is shown in Figure 8, which adds an additional fourth user D using user equipment 807 who is always in a listening-only mode.
  • The fourth user D has joined the audio conference through a separate number or link, allowing user D only to listen in.
  • each user is delivered an audio representation that is relevant to the user equipment with a reduced number of re-encodings and different bitstreams.
  • a transmitting user, for example user equipment 603 or 605, sends an IVAS payload consisting of ‘EVS + stereo metadata’ to the network.
  • the spatial metadata is stripped and legacy EVS is delivered.
  • the MCU may be configured to transmit EVS mono with stereo/spatial metadata stripped out 702 to the user equipment 601 and may be further configured to transmit EVS mono with stereo metadata (with any spatial metadata stripped out) 704 to the user equipment 603.
  • Immersive participants, for example user C operating user equipment 605, may be configured to receive from the MCU 607 an EVS mono and spatial metadata downlink 706.
  • user D operating user equipment 807 may be configured to receive from the MCU 607 an EVS mono and spatial metadata downlink 808.
  • user B 603 could, at least in some embodiments, also receive a bitstream describing EVS mono and spatial metadata instead of EVS mono and stereo metadata. This is because a spatial signal can be presented over headphones, e.g., by means of binauralization.
  • With respect to Figure 9, a further example is shown wherein user B (user equipment 903) uploads or transmits to an MCU node 915 of a network an IVAS payload consisting of EVS mono 906 and stereo metadata 904.
  • the MCU nodes 915, 917 are shown passing the IVAS payload (EVS mono 906 and stereo metadata 904).
  • the receiving user equipment associated with receiver 1 (user equipment 901) is configured to receive from the MCU node 915 signals where the metadata is stripped and legacy EVS 906 is delivered.
  • the MCU node 917 may be configured to transmit EVS mono 906 with stereo/spatial metadata stripped out to the user equipment 905.
  • Immersive participants for example receiver 3 (user equipment 907) may be configured to receive from the MCU 917 an IVAS payload (in the form of an EVS mono 906 and spatial metadata 904) downlink.
  • With respect to Figure 10, a further example of some embodiments is shown.
  • the EETU-MASA input 401 (and a stereo or multichannel signal) is furthermore passed 1045 to the pre-processor 315.
  • backward compatible embedded EVS encoding in the IVAS codec is achieved by representing the stereo input as a combination of a mono downmix and a stereo (or more generally multichannel) representation.
  • the input is thus a 3-channel input.
  • the input can furthermore include full MASA metadata. In preferred embodiments, this can be considered to be a special case of MASA input format.
  • the mono downmix can be generated using any suitable means.
  • the resulting mono signal is utilized as one component of the 3-channel input.
  • the original stereo, or stereo with MASA metadata, is similarly used as one component of the 3-channel input.
  • the 3-channel input for the IVAS encoder can be created for example based on a mixing of at least two audio streams (e.g., an operation on an MCU). At least in some embodiments, any delay incurred by the mono downmix can be taken into account for the stereo signal of the 3-channel input.
  • the mono and stereo audio signals can thus be fully aligned in time.
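  • A sketch of this alignment step follows; the delay value and framing are implementation-specific assumptions:

```python
import numpy as np

def make_3ch_input(left, right, mono_downmix, downmix_delay_samples: int):
    """Delay the stereo pair by the known downmix delay so that all three
    channels of the 3-channel input are time-aligned."""
    pad = np.zeros(downmix_delay_samples)
    left_al = np.concatenate([pad, left])[:len(mono_downmix)]
    right_al = np.concatenate([pad, right])[:len(mono_downmix)]
    return np.stack([mono_downmix, left_al, right_al])  # (3, n_samples)
```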
  • the mono channel of the 3-channel input is fed (without metadata) into a bit-exact EVS encoder.
  • the EVS codec may be instructed to encode the signal at a fixed bit rate.
  • the EVS encoding can be without bit rate switching.
  • the stereo (+ metadata) is encoded using the IVAS core encoding (and metadata encoding).
  • the mono stream may be fed also to the IVAS core encoder (and may in some embodiments be always provided to the EVS encoder).
  • the substantially simultaneously encoded EVS and IVAS frames are packed together in some embodiments into a common package for transmission.
  • the EVS bitstream is part of the IVAS package in a special operation mode.
  • the EVS bitstream can be provided to a legacy EVS user. However, when IVAS is being decoded, the EVS bitstream may be simply discarded.
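  • A minimal sketch of such a common package follows (the 2-byte length prefix is an assumed framing device, not a specified format); the legacy endpoint keeps only the EVS part, while an IVAS decoder may simply drop it:

```python
def pack_common(evs_frame: bytes, ivas_frame: bytes) -> bytes:
    return len(evs_frame).to_bytes(2, "big") + evs_frame + ivas_frame

def unpack_for_legacy(package: bytes) -> bytes:
    n = int.from_bytes(package[:2], "big")
    return package[2:2 + n]        # the embedded bit-exact EVS frame

def unpack_for_ivas(package: bytes) -> bytes:
    n = int.from_bytes(package[:2], "big")
    return package[2 + n:]         # EVS part discarded
```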
  • the first two examples may be implemented in this way.
  • By “embedded scalable” it is understood “regular” embedded operation, for example resembling Figure 5.
  • the third example may be implemented in such a manner.
  • This package in some embodiments includes three separate encodings: EVS at 13.2 kbps, EVS-based IVAS stereo at 16.4 kbps, and a 47.6 kbps IVAS encoding (that may be, for example, a high-quality stereo or a spatial signal).
  • Figure 11 presents a further use example associated with the further example shown in Figure 10.
  • a fixed packet size 1115 may be used to communicate between MCUs such as MCU 1121 and MCU 1111. While it may seem wasteful to deliver several encodings in a single package, there may be systems where a fixed prioritized packet size (e.g., a 64-kbps channel or some other size channel) for voice communication is implemented.
  • The “package-embedded” delivery can in this case be used to provide various levels of service, e.g., to conference call participants with different capabilities.
  • an IVAS mobile device 1101 and user may establish an IVAS connection 1102 (for example with a MASA input) with the MCU 1121.
  • a legacy EVS mobile device 1105 and user may establish an EVS only connection 1106 with the MCU 1121.
  • a further legacy EVS mobile device 1103 and user may establish an EVS only connection 1104 with the MCU 1111.
  • a fixed line device 1107 and user may additionally establish a fixed package size 1108 (for example 64 kbps) connection with the MCU 1111.
  • the device may be any suitable electronics device or apparatus.
  • the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.
  • the device 1400 comprises at least one processor or central processing unit 1407.
  • the processor 1407 can be configured to execute various program codes such as the methods described herein.
  • the device 1400 comprises a memory 1411.
  • the at least one processor 1407 is coupled to the memory 1411.
  • the memory 1411 can be any suitable storage means.
  • the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407.
  • the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.
  • the device 1400 comprises a user interface 1405.
  • the user interface 1405 can be coupled in some embodiments to the processor 1407.
  • the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405.
  • the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad.
  • the user interface 1405 can enable the user to obtain information from the device 1400.
  • the user interface 1405 may comprise a display configured to display information from the device 1400 to the user.
  • the user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400.
  • the user interface 1405 may be the user interface for communicating with the position determiner as described herein.
  • the device 1400 comprises an input/output port 1409.
  • the input/output port 1409 in some embodiments comprises a transceiver.
  • the transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network.
  • the transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
  • the transceiver can communicate with further apparatus by any suitable known communications protocol.
  • the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or infrared data communication pathway (IRDA).
  • the transceiver input/output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code. Furthermore the device may generate a suitable downmix signal and parameter output to be transmitted to the synthesis device.
  • the device 1400 may be employed as at least part of the synthesis device.
  • the input/output port 1409 may be configured to receive the downmix signals and in some embodiments the parameters determined at the capture device or processing device as described herein, and generate a suitable audio signal format output by using the processor 1407 executing suitable code.
  • the input/output port 1409 may be coupled to any suitable audio output, for example to a multichannel speaker system and/or headphones (which may be headtracked or non-tracked headphones) or similar.
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, DVD and the data variants thereof, or CD.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process.
  • Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Stereophonic System (AREA)

Abstract

The present invention concerns an apparatus comprising means for: obtaining an input format for generating an encoded mono audio signal and/or a multichannel audio signal, the input format comprising: a mono audio signal and a metadata signal associated with the mono signal, the metadata signal configured to enable the generation of the encoded multichannel audio signal from the mono audio signal.
PCT/FI2020/050014 2019-01-22 2020-01-09 Audio representation and associated rendering WO2020152394A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1900871.3A GB2580899A (en) 2019-01-22 2019-01-22 Audio representation and associated rendering
GB1900871.3 2019-01-22

Publications (1)

Publication Number Publication Date
WO2020152394A1

Family

ID=65656028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2020/050014 WO2020152394A1 (fr) Audio representation and associated rendering

Country Status (2)

Country Link
GB (1) GB2580899A
WO (1) WO2020152394A1

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021170903A1 (fr) * 2020-02-28 2021-09-02 Nokia Technologies Oy Audio representation and associated rendering
CN116830193A (zh) * 2023-04-11 2023-09-29 Beijing Xiaomi Mobile Software Co., Ltd. Audio code stream signal processing method and apparatus, electronic device, and storage medium
JP7491393B2 (ja) 2020-11-05 2024-05-28 Nippon Telegraph and Telephone Corporation Sound signal refining method, sound signal decoding method, apparatuses therefor, program, and recording medium
JP7491394B2 (ja) 2020-11-05 2024-05-28 Nippon Telegraph and Telephone Corporation Sound signal refining method, sound signal decoding method, apparatuses therefor, program, and recording medium
JP7491395B2 (ja) 2020-11-05 2024-05-28 Nippon Telegraph and Telephone Corporation Sound signal refining method, sound signal decoding method, apparatuses therefor, program, and recording medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017148526A1 (fr) * 2016-03-03 2017-09-08 Nokia Technologies Oy Audio signal encoder, audio signal decoder, encoding method and decoding method
GB2559200A (en) * 2017-01-31 2018-08-01 Nokia Technologies Oy Stereo audio signal encoder
GB2559199A (en) * 2017-01-31 2018-08-01 Nokia Technologies Oy Stereo audio signal encoder
WO2018193163A1 (fr) * 2017-04-20 2018-10-25 Nokia Technologies Oy Enhancing loudspeaker playback using a spatial extent processed audio signal

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SE519985C2 (sv) * 2000-09-15 2003-05-06 Ericsson Telefon Ab L M Coding and decoding of signals from multiple channels
US9454973B2 (en) * 2009-04-07 2016-09-27 Telefonaktiebolaget Lm Ericsson (Publ) Method and arrangement for providing a backwards compatible payload format
US9460729B2 (en) * 2012-09-21 2016-10-04 Dolby Laboratories Licensing Corporation Layered approach to spatial audio coding



Also Published As

Publication number Publication date
GB2580899A (en) 2020-08-05
GB201900871D0 (en) 2019-03-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20744558

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20744558

Country of ref document: EP

Kind code of ref document: A1