EP3766262B1 - Spatial audio parameter smoothing - Google Patents
- Publication number
- EP3766262B1 (application EP19767481A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- audio signal
- parameter
- spatial
- generate
- direction smoothness
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H04S3/008—Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G10L19/02—Speech or audio analysis-synthesis coding using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/26—Pre-filtering or post-filtering
- H04S1/007—Two-channel systems in which the audio signals are in digital form
- H04S7/304—Electronic adaptation of the sound field to listener position or orientation, for headphones
- G10L2019/0001—Codebooks
- G10L2019/0012—Smoothing of parameters of the decoder interpolation
- H04S2400/03—Aspects of down-mixing multi-channel audio to configurations with lower numbers of playback channels, e.g. 7.1 -> 5.1
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Definitions
- The present application relates to apparatus and methods for temporal smoothing of spatial audio parameters. This includes, but is not limited to, sound reproduction systems and methods producing multichannel audio outputs.
- Parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters.
- Such parameters include the directions of the sound in frequency bands, and ratio parameters expressing the relative energies of the directional and non-directional parts of the captured sound in frequency bands.
- These parameters are known to describe well the perceptual spatial properties of the captured sound at the position of the microphone array.
- These parameters can accordingly be utilized in the synthesis of spatial sound: binaurally for headphones, for loudspeakers, or for other formats such as Ambisonics.
- The directions and direct-to-total energy ratios in frequency bands are thus a parameterization that is particularly effective for spatial audio capture.
- A parameter set consisting of a direction parameter in frequency bands and an energy ratio parameter in frequency bands (indicating the proportion of the sound energy that is directional) can also be utilized as the spatial metadata for an audio codec.
- These parameters can be estimated from microphone-array captured audio signals, and, for example, a stereo signal can be generated from the microphone array signals to be conveyed with the spatial metadata.
- The stereo signal could be encoded, for example, with an AAC encoder.
- A decoder can decode the audio signals into PCM signals and process the sound in frequency bands (using the spatial metadata) to obtain the spatial output, for example a binaural output.
- Patent application publication US2009067634 discloses modifying spatial audio parameters associated with one or more audio objects of a stereo or multichannel audio signal to provide remixing capabilities.
- Patent application publication WO2014162171 discloses a spatial audio analyser configured to determine an audio source with a location associated with a visual image element, and an audio processor arranged to change an audio characteristic of the audio source in response to a control input.
- Patent application publication EP2942981 discloses an audio signal processing system for consistent acoustic scene reproduction based on informed spatial filtering.
- Patent application publication US2013329922 discloses using vector base amplitude panning (VBAP) for playing back an object's audio, using the positioning of sound reproduction devices and the object's location information to determine which sound reproduction devices are used for playing back the object's audio.
- Patent application publication JP2015080119 discloses a method of improving the degree of freedom when calculating a panning coefficient for sound image localisation within a three-dimensional space.
- A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
- An electronic device may comprise apparatus as described herein.
- A chipset may comprise apparatus as described herein.
- Embodiments of the present application aim to address problems associated with the state of the art.
- In some embodiments the spatial sound source is a microphone array.
- In other embodiments the spatial sound source may be a 5.1 or other format multi-channel mix, or Ambisonics signals.
- Parametric spatial audio capture refers to adaptive, DSP-driven audio capture methods covering 1) analysing perceptually relevant parameters in frequency bands, for example the directionality of the propagating sound at the recording position, and 2) reproducing spatial sound in a perceptual sense at the rendering side according to the estimated spatial parameters.
- The reproduction can be, for example, for headphones or multichannel loudspeaker setups.
- Parametric spatial audio capture (SPAC) methods may employ these determined parameters, such as the directions of the sound in frequency bands and the ratios between the directional and non-directional parts of the captured sound in frequency bands, to describe the perceptual spatial properties of the captured sound at the position of the microphone array, and may use these parameters in the synthesis of the spatial sound.
- Since the spatial properties are estimated from the sound field, they can fluctuate significantly over time and frequency, e.g., due to reverberation and/or multiple simultaneous sound sources.
- Parametric spatial audio processing methods therefore typically utilize smoothing in the synthesis, in order to avoid possible artefacts caused by rapidly fluctuating parameters (these artefacts are typically referred to as "musical noise").
- A similar parametrization may also be used for the compression of spatial audio, e.g., from 5.1 multichannel signals.
- In that case the parameters are estimated from the input loudspeaker signals. Nevertheless, the parameters typically fluctuate in this case as well; hence, temporal smoothing is also needed with loudspeaker input.
- The spatial parameters are determined in the time-frequency domain, i.e., each parameter value is associated with a certain frequency band and temporal frame.
- Examples of possible spatial parameters include (but are not limited to) direction and direct-to-total energy ratio; direction and diffuseness; and inter-channel level difference, inter-channel phase difference, and inter-channel coherence.
- The spatial audio parametrizations describe how the sound is distributed in space, either generally (e.g., using directions) or relatively (e.g., as level differences between certain channels).
- The audio and the parameters may be processed and/or transmitted/stored between the analysis and the synthesis.
- Parametric spatial audio processing methods are often based on analysing the direction of arrival (and other spatial parameters) in frequency bands. If there were a single sound source in anechoic conditions, the direction would point stably to the sound source at all frequencies. However, in typical acoustic environments the microphones also capture sounds other than the sound source, such as reverberation and ambient sounds. Moreover, there may be multiple simultaneous sources. As a result, the estimated directions typically fluctuate significantly over time, and the estimates differ between frequency bands.
- Parametric spatial audio processing methods, such as those employed in the embodiments described in further detail hereafter, synthesize the spatial sound based on the analysed parameters (such as the aforementioned direction) and related audio signals (e.g., two captured microphone signals).
- Vector base amplitude panning (VBAP) computes gains for a subset of loudspeakers based on the direction; the audio signal is multiplied with these gains and fed to those loudspeakers.
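As an illustration, a minimal two-dimensional VBAP between one loudspeaker pair can be sketched as follows. This is a generic sketch of the technique, not the patent's implementation; the function name and the degree-based interface are illustrative.

```python
import math


def vbap_2d(azimuth_deg, spk1_deg, spk2_deg):
    """Compute energy-normalised 2-D VBAP gains for a loudspeaker pair.

    Solves p = g1*l1 + g2*l2, where p is the unit vector toward the
    source direction and l1, l2 are unit vectors toward the loudspeakers.
    """
    p = (math.cos(math.radians(azimuth_deg)), math.sin(math.radians(azimuth_deg)))
    l1 = (math.cos(math.radians(spk1_deg)), math.sin(math.radians(spk1_deg)))
    l2 = (math.cos(math.radians(spk2_deg)), math.sin(math.radians(spk2_deg)))
    # Solve the 2x2 system by Cramer's rule.
    det = l1[0] * l2[1] - l1[1] * l2[0]
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det
    # Energy-normalise so that g1^2 + g2^2 = 1.
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

For a source at 0° between loudspeakers at ±30°, the gains are equal (about 0.707 each); a source exactly at one loudspeaker gets the full gain on that loudspeaker.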
- The concept discussed hereafter proposes apparatus and methods to adapt the smoothing needed in the synthesis of spatial sound in parametric spatial audio processing, in order to obtain quality audio output with different types of sound scenes.
- The embodiments described hereafter relate to parametric spatial audio processing in which the temporal smoothing needed in the synthesis of spatial audio is improved by adaptively analysing the required amount of smoothing.
- The analysis relates to the stability of the direction-related parameter(s): it produces a measure of directional stability, and the time coefficients of the temporal smoothing are determined based on that measure.
- The direction-related parameter may, as described in further detail in the embodiments hereafter, refer to a direction.
- The amount of smoothing can be analysed using the direct-to-total energy ratio.
- The value of the energy ratio is monitored over time, and where it is constantly high, the time coefficient of the smoothing can be set smaller (less smoothing applied).
- Otherwise, the time coefficient can be set to a default value (more smoothing applied).
- A block diagram of an example system for implementing some embodiments is shown in Figure 1.
- Figure 1 shows an example capture device 101.
- The capture device may be a VR capture device, a mobile phone, or any other suitable electronic apparatus comprising one or more microphone arrays.
- The capture device 101 thus in some embodiments comprises microphones 100₁, 100₂.
- The microphone audio signals 102 captured by the microphones 100₁, 100₂ may be stored and later processed, or processed directly.
- An analysis processor 103 may receive the microphone audio signals 102 from the capture device 101.
- The analysis processor 103 can, for example, be a computer or a mobile phone (running suitable software), or alternatively a specific device utilizing, for example, field programmable gate arrays (FPGAs) or application specific integrated circuits (ASICs).
- In some embodiments the capture device 101 and the analysis processor 103 are implemented on the same apparatus or device.
- Based on the microphone-array signals, the analysis processor creates a data stream 104.
- The data stream may comprise transport audio signals and spatial metadata (e.g., directions and energy ratios in frequency bands).
- The data stream 104 may be transmitted or stored, for example within some storage 105 such as memory, or alternatively processed directly in the same device.
- A synthesis processor 107 may receive the data stream 104.
- The synthesis processor 107 can, for example, be a computer or a mobile phone (running suitable software), or alternatively a specific device utilizing, for example, FPGAs or ASICs.
- The synthesis processor can be configured to produce output audio signals.
- The output signals can be binaural signals 109.
- The output signals can alternatively be multi-channel signals.
- The headphones 111 or other playback apparatus may be configured to receive the output of the synthesis processor 107 and output the audio signals in a format suitable for listening.
- The initial operation is the capture (or otherwise input) of the audio signals, as shown in Figure 2 by step 201.
- Having captured the audio signals, they are analysed to generate the data stream, as shown in Figure 2 by step 203.
- The data stream may then be transmitted and received (or stored and retrieved), as shown in Figure 2 by step 205.
- The output may be synthesized based at least on the data stream, as shown in Figure 2 by step 207.
- The synthesized output audio signals may then be output to a suitable output such as headphones, as shown in Figure 2 by step 209.
- An example analysis processor 103 (as shown in Figure 1) is described in the following.
- The input to the analysis processor 103 is the microphone array signals 102.
- A transport audio signal generator 301 may be configured to receive the microphone array signals 102 and create the transport audio signals.
- The transport audio signals may be selected from the microphone array signals.
- The microphone array signals may alternatively be downmixed to generate the transport audio signals.
- The transport audio signals may also be obtained by otherwise processing the microphone array signals.
- The transport audio signal generator 301 may be configured to generate any suitable number of transport audio signals (or channels); for example, in some embodiments the transport audio signal generator 301 is configured to generate two transport audio signals. In some embodiments the transport audio signal generator 301 is further configured to compress the audio signals, for example using advanced audio coding (AAC) or enhanced voice services (EVS) compression coding.
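A naive sketch of such two-channel transport generation by downmixing (one of the selection/downmix/processing options mentioned above; the left/right split of the array is purely illustrative):

```python
def make_transport_signals(mic_signals):
    """Downmix a microphone array into two transport channels.

    mic_signals: list of per-microphone sample sequences of equal length.
    Here the first half of the array is averaged into the first transport
    channel and the second half into the second (an illustrative choice);
    a real system might instead select microphones or apply other
    processing, and could further compress the result, e.g. with AAC or EVS.
    """
    half = len(mic_signals) // 2
    left = [sum(frame) / half for frame in zip(*mic_signals[:half])]
    right = [sum(frame) / (len(mic_signals) - half)
             for frame in zip(*mic_signals[half:])]
    return left, right
```

With exactly two microphones this reduces to passing each microphone through as one transport channel.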
- The analysis processor 103 comprises a spatial analyser 303.
- The spatial analyser 303 is also configured to receive the microphone array signals 102 and generate metadata 304 based on a spatial analysis of the microphone array signals.
- The spatial analyser 303 may be configured to determine any suitable spatial metadata parameter.
- Spatial metadata parameters determined include (but are not limited to): direction and direct-to-total energy ratio; direction and diffuseness; inter-channel level difference, inter-channel phase difference, and inter-channel coherence. In some embodiments these parameters are determined in the time-frequency domain. It should be noted that parametrizations other than those presented above may also be used.
- The metadata 304 comprises directions 306 and energy ratios 308.
- The metadata may be compressed and/or quantized.
- The analysis processor 103 may furthermore comprise a multiplexer (mux) 305 which is configured to receive the metadata 304 and the transport audio signals 302 and generate a combined data stream 104.
- The combination may be any suitable combination.
- The input to the analysis processor 103 can also be other types of audio signals, such as multichannel loudspeaker signals, audio objects, or Ambisonic signals.
- The exact implementation of the analysis processor may be any suitable implementation (as indicated above, a computer running suitable software, FPGAs, ASICs, etc.) caused to produce the transport audio signals and the spatial metadata in the time-frequency domain.
- The initial operation is receiving the microphone array audio signals, as shown in Figure 4 by step 401.
- Having received the microphone audio signals, they are analysed to generate the transport audio signals (for example by selection, downmixing, or other processing), as shown in Figure 4 by step 403.
- The microphone audio signals are also spatially analysed to generate the metadata, for example the directions and energy ratios, as shown in Figure 4 by step 405.
- The metadata and the transport audio signals may then be combined to generate the data stream, as shown in Figure 4 by step 407.
- In Figure 5 an example synthesis processor 107 (as shown in Figure 1) according to some embodiments is shown.
- A demultiplexer (demux) 501 is configured to receive the data stream 104 and caused to demultiplex the data stream into transport audio signals 502 and metadata 504.
- The demultiplexer is furthermore caused to decode the audio signals.
- The metadata in some embodiments is in the time-frequency domain and comprises parameters such as directions θ(k, n) 506 and direct-to-total energy ratios r(k, n) 508, where k is the frequency band index and n the temporal frame index.
- The demultiplexed data is furthermore decompressed/dequantized to attempt to regenerate the originally determined parameters.
- A spatial synthesizer 503 is configured to receive the transport audio signals 502 and the metadata, and caused to generate the multichannel output signals 510, such as the binaural output signals 109 shown in Figure 1.
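The time-frequency metadata described above can be sketched as one record per frequency band k and temporal frame n. The field names below are illustrative, not taken from the patent:

```python
from dataclasses import dataclass


@dataclass
class SpatialMetadataTile:
    """One spatial-metadata entry per time-frequency tile (k, n)."""
    k: int               # frequency band index
    n: int               # temporal frame index
    azimuth_rad: float   # direction theta(k, n)
    energy_ratio: float  # direct-to-total energy ratio r(k, n), in [0, 1]
```

A decoder would hold one such record per tile alongside the decoded transport audio signals.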
- The initial operation is receiving the data stream, as shown in Figure 6 by step 601.
- The multichannel (binaural or otherwise) output signals may then be synthesized from the transport audio signals and the metadata, as shown in Figure 6 by step 605.
- The multichannel (binaural or otherwise) output signals may then be output, as shown in Figure 6 by step 607.
- The input to the spatial synthesizer 503 is in some embodiments the transport audio signals 502 and the metadata 504 (which may include the energy ratios 508 and the directions 506).
- The transport audio signals 502 are transformed to the time-frequency domain using a suitable transformer.
- A short-time Fourier transformer (STFT) 701 is configured to apply a short-time Fourier transform to the transport audio signals to generate suitable time-frequency domain audio signals S_i(k, n) 700.
- Any suitable time-frequency transformer may be used, for example a quadrature mirror filterbank (QMF).
- A divider 705 may receive the time-frequency domain audio signals S_i(k, n) 700 and the energy ratios 508, and divide the time-frequency domain audio signals into ambient and direct parts using the energy ratio r(k, n) 508.
- A smoothing coefficients determiner 703 may also receive the time-frequency domain audio signals S_i(k, n) 700 and the energy ratios 508 and determine suitable smoothing coefficients 706.
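One common way to realise such a divider is an energy-preserving square-root split by the ratio r(k, n). This is a sketch under that assumption; the patent excerpt does not give the exact split formula:

```python
import math


def split_direct_ambient(s, r):
    """Split one time-frequency sample s into direct and ambient parts
    using the direct-to-total energy ratio r in [0, 1], so that
    |direct|^2 + |ambient|^2 == |s|^2 (energy preserving)."""
    direct = math.sqrt(r) * s
    ambient = math.sqrt(1.0 - r) * s
    return direct, ambient
```

With r = 1 the tile is treated as fully direct; with r = 0 it is fully ambient.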
- Figure 7b differs from the example spatial synthesizer shown in Figure 7a in that the smoothing coefficients determiner in Figure 7b is caused to receive the time-frequency domain audio signals S_i(k, n) 700 and the directions 506.
- The smoothing coefficients determiner 703 may be configured to adaptively determine the smoothing coefficient(s) β(k, n).
- A panning gain determiner 715 may be configured to receive the directions 506 and, based on the output speaker/headphone configuration and the directions, determine suitable panning gains 708.
- The amplitude panning gains may be computed in any suitable manner, for example using vector base amplitude panning (VBAP) based on the received direction θ(k, n).
- Any suitable smoothing may be applied.
- The smoothing 'filter' may therefore be of multiple order, and similarly the smoothing coefficient β(k, n) may be a vector value.
- The actual value(s) of β may depend on the filterbank and typically are frequency-dependent (typical values may be, e.g., 0.1). In general, the larger the value is, the less smoothing is applied.
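With a first-order IIR smoother, the simplest case consistent with the description, the recursion could look like the sketch below; a larger beta makes the smoothed gain track the raw gain more closely, i.e. less smoothing. The recursion form and initialisation are assumptions, not quoted from the patent:

```python
def smooth_panning_gains(gains, beta):
    """First-order IIR smoothing of a per-frame gain sequence:
    out[n] = beta * gains[n] + (1 - beta) * out[n - 1]."""
    out = []
    prev = gains[0]  # initialise with the first raw gain
    for g in gains:
        prev = beta * g + (1.0 - beta) * prev
        out.append(prev)
    return out
```

With beta = 1 no smoothing is applied; smaller beta values make the output approach a new gain value more slowly.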
- A decorrelator 707 is configured to receive the ambient audio signal part 702 and process it so that it is perceived as surrounding, for example by decorrelating and spreading the ambient audio signal part 702 across the audio scene.
- A positioner 709 is configured to receive the direct audio signal part 704 and the smoothed panning gains 710 and position the direct audio signal part 704 using a suitable positioning, for example using the smoothed panning gains and an amplitude panning operation.
- A merger 711 or other suitable combiner is configured to receive the spread ambient signal part from the decorrelator 707 and the positioned direct audio signal part from the positioner 709, and combine or merge these audio signals.
- An inverse short-time Fourier transformer (inverse STFT) 713 is configured to receive the combined audio signals and apply an inverse short-time Fourier transform (or other suitable frequency-to-time domain transform) to generate the multi-channel audio signals 510, which may be passed to a suitable output device such as headphones or a multi-channel loudspeaker setup.
- In the examples above, the panning gains are determined directly from the direction metadata, and the "direct sound" is also positioned with these gains after smoothing.
- In some embodiments the panning gains are not determined directly from the direction metadata, but instead determined indirectly.
- The smoothing of these gains as described above may be applied to any suitably generated gains.
- For example, the directions may be used (together with the energy ratios and transport audio signals) to determine a target energy distribution of the output multichannel signals.
- The target energy distribution may be compared to the energy distribution of the transport audio signals (or to the energy distribution of intermediate signals obtained from the transport audio signals by mixing).
- Panning gains, or any gains that position audio, may be obtained as a ratio of these values, and the "Smoother" 717 may be applied to these gains.
- In other words, the method of generating the panning gains may be one of many optional methods, whose output is then smoothed according to the methods described herein.
- The spatial synthesizer in some embodiments is configured to receive the transport audio signals, as shown in Figures 8a and 8b by step 801.
- The spatial synthesizer in some embodiments is furthermore configured to receive the energy ratios, as shown in Figures 8a and 8b by step 803.
- The spatial synthesizer in some embodiments is also configured to receive the directions, as shown in Figures 8a and 8b by step 805.
- The received transport audio signals are in some embodiments converted into a time-frequency domain form, for example by applying a suitable time-frequency domain transform to the transport audio signals, as shown in Figures 8a and 8b by step 807.
- The time-frequency domain audio signals may then be divided into ambient and direct parts (based on the energy ratios), as shown in Figures 8a and 8b by step 813.
- Smoothing coefficients may be determined based on the energy ratios and the time-frequency domain audio signals, as shown in Figure 8a by step 811.
- Alternatively, smoothing coefficients may be determined based on the directions and the time-frequency domain audio signals, as shown in Figure 8b by step 851.
- Panning gains may be determined based on the received directions, as shown in Figures 8a and 8b by step 809.
- A series of smoothed panning gains may be determined based on the determined panning gains and the smoothing coefficients, as shown in Figures 8a and 8b by step 817.
- The ambient audio signal part may be decorrelated, as shown in Figures 8a and 8b by step 815.
- The positional component of the audio signals may then be determined based on the smoothed panning gains and the direct audio signal part, as shown in Figures 8a and 8b by step 819.
- A positional component of the audio signals, or positioned audio signal, can be a number of audio signals which are combined to produce a virtual sound source positioned in a three-dimensional space.
- The positional component of the audio signals and the decorrelated ambient audio signal may then be combined or merged, as shown in Figures 8a and 8b by step 821.
- The combined audio signals may then be inverse time-frequency domain transformed to generate the multichannel audio signals in a suitable format for output, as shown in Figures 8a and 8b by step 823.
- The smoothing coefficients determiner 703 is configured to generate values which may be used to smooth the panning gains in order to avoid "musical noise" artefacts.
- The inputs to the smoothing coefficients determiner 703 are shown as the time-frequency domain audio signals 700 and the energy ratios 508.
- A direction smoothness estimator 903 is configured to estimate a direction smoothness σ(k, n).
- In some embodiments this direction smoothness may be estimated or determined from the energy ratios r(k, n) 508.
- Alternatively, the direction smoothness value σ(k, n) 904 can be estimated by calculating the fluctuation of the direction values.
- For example, a circular variance of the directions θ(k, n) may be determined and used as the basis of the direction smoothness.
- In general, any suitable analysis of the temporal fluctuation of the directions may be used to determine the direction smoothness estimate.
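A circular-statistics sketch of such a fluctuation measure: the mean resultant length R of the per-frame direction unit vectors is near 1 for a stable direction and near 0 for scattered directions, and the circular variance is 1 - R. The function name and the choice of R as the smoothness value are illustrative:

```python
import cmath


def direction_smoothness(azimuths_rad):
    """Return the mean resultant length R in [0, 1] of a set of direction
    angles (radians); the circular variance of the set is 1 - R."""
    mean = sum(cmath.exp(1j * a) for a in azimuths_rad) / len(azimuths_rad)
    return abs(mean)
```

Identical directions give R = 1 (perfectly stable); directions spread evenly around the circle give R near 0.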
- An average direction smoothness estimator 905 is configured to receive the energy estimates 902 and the direction smoothness estimates 904 and determine an average over time (and in some embodiments over frequency).
- A direction-smoothness-to-smoothing-coefficients converter may receive the averaged direction smoothness estimate σ'(k, n) 906 and generate the smoothing coefficients β(k, n).
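The averaging over time can, for example, be realised as an exponentially forgetting, energy-weighted mean, so that louder frames dominate the smoothness estimate. This is a sketch of one plausible form; the exact averaging is not specified in this excerpt:

```python
def averaged_smoothness(smoothness_seq, energy_seq, forget=0.9):
    """Energy-weighted recursive average of per-frame direction
    smoothness estimates, with exponential forgetting of the past."""
    num = den = 0.0
    for sigma, energy in zip(smoothness_seq, energy_seq):
        num = forget * num + energy * sigma
        den = forget * den + energy
    return num / den if den > 0.0 else 0.0
```

A constant smoothness sequence yields that same constant regardless of the energies, while recent high-energy frames pull the average toward their values.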
- ⁇ fast may, e.g., include 0.4
- ⁇ slow may, e.g., include 0.1.
- fast and slow coefficients may depend on the actual implementation and may be frequency-dependent.
- the smoothing coefficients may be a vector instead of a single value. This for example may occur when the smoothing is other than a first-order IIR smoothing.
- These embodiments may therefore implement "fast settings" and "slow settings" which are interpolated based on the "averaged direction smoothness estimates". In such embodiments these "settings" may depend on the implementation, for example whether each is a single value or a vector of values.
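A minimal sketch of interpolating between the "fast settings" and "slow settings" for the single-value, first-order IIR case; the linear interpolation rule is an assumption here, with the example values 0.4 and 0.1 taken from the text:

```python
def smoothing_coefficient(avg_smoothness, a_fast=0.4, a_slow=0.1):
    """Interpolate between fast and slow first-order IIR coefficients:
    a stable direction (averaged smoothness near 1) selects the fast
    setting, an unstable one the slow setting."""
    s = min(max(avg_smoothness, 0.0), 1.0)  # clamp to [0, 1]
    return s * a_fast + (1.0 - s) * a_slow

def smooth_gain(prev_gain, target_gain, coeff):
    """One step of first-order IIR smoothing of a panning gain."""
    return (1.0 - coeff) * prev_gain + coeff * target_gain
```

With this form a larger coefficient tracks the target panning gain faster, i.e. applies less smoothing.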
- the smoothing coefficients α(k, n) 706 may then be output.
- the time-frequency domain audio signals are received as shown in Figure 10 by step 1001.
- the estimate of the energy (of the audio signals) may be determined based on the time-frequency domain audio signals as shown in Figure 10 by step 1005.
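The energy estimate of this step can be sketched as the squared magnitude of the time-frequency tiles summed over the transport channels; treating the channels as the first array axis is an assumption for illustration:

```python
import numpy as np

def band_energy(S):
    """Energy estimate E(k, n) from time-frequency audio S with shape
    (channels, bands, frames): sum of squared magnitudes over channels."""
    return np.sum(np.abs(np.asarray(S)) ** 2, axis=0)
```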
- Furthermore the estimate of the direction smoothness based on the energy ratios (or based on any other suitable parameter such as an analysis of the directions) is determined as shown in Figure 10 by step 1007.
- the estimate of the average direction smoothness is then determined based on the energy estimate and the direction smoothness estimates as shown in Figure 10 by step 1009.
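One way to realize the average over time, weighted by the energy estimate so that louder frames dominate, is a pair of first-order recursions; the forgetting factor `a` and the exact weighting scheme are assumptions for illustration:

```python
def average_smoothness(energies, smoothness, a=0.9):
    """Energy-weighted running average of per-frame direction smoothness
    values for one band: frames with more energy contribute more."""
    num = 0.0    # running average of energy * smoothness
    den = 1e-12  # running average of energy (guards against divide-by-zero)
    averaged = []
    for e, s in zip(energies, smoothness):
        num = a * num + (1.0 - a) * e * s
        den = a * den + (1.0 - a) * e
        averaged.append(num / den)
    return averaged
```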
- the smoothing coefficients are then output as shown in Figure 10 by step 1013.
- the smoothing coefficients were determined based on the use of the spatial metadata (for example the metadata generated in the analysis processor, such as found within a spatial audio capture (SPAC) system which generates directions and direct-to-total energy ratios).
- the above methods can be modified without inventive skill to be used with any method utilizing similar parameters.
- Some of the advantages of the proposed embodiments are that a significant amount of smoothing can be applied with typical sound scenes, and thus musical noise artefacts are avoided. Furthermore, when the sound scene does not require as much smoothing, the amount of smoothing applied can be reduced and thus the reproduction can react faster to changes in the sound field.
- Figure 11 shows three graph traces showing a reference audio signal, a reproduction using a fixed smoothing coefficient and a reproduction using an adaptive smoothing coefficient.
- the sound scene contains two sources located in different directions in anechoic conditions. The sound was rendered to a multichannel setup.
- the reference graph trace 1101 shows the signal of one output channel of the audio signal and shows the first source 1103 but not the other source.
- the adaptive smoothing example 1121, having analysed that the directions are stable and there is less need for temporal smoothing, is configured to set the smoothing to a faster mode, and the sound source is not reproduced from the wrong channel. In such a manner the reproduction is perceived to react quickly to changes in the direction.
- the implementation can be in software, for example on a mobile phone (or a computer) 1200.
- the software running inside the mobile phone 1200 may be configured to receive an encoded bitstream (it may, e.g., have been transmitted in real time or have been stored on the device).
- the bitstream can also be any other suitable bitstream.
- a demultiplexer 1203 (DEMUX) is configured to demultiplex the bitstream into an audio bitstream 1204 and a spatial metadata bitstream 1206.
- An enhanced voice services (EVS) or other encoded bitstream decoder 1205 (or any decoder that corresponds to the utilized codec) is configured to extract the transport audio signals 1206 from the audio bitstream.
- a metadata decoder 1207 is used to decompress the spatial metadata 1208, for example comprising the directions 1210 and energy ratios 1212.
- the spatial synthesiser 1209 (similar to the spatial synthesizer in the embodiments above) is configured to receive transport audio signals 1206 and the metadata 1208 and output multichannel loudspeaker signals 1211 that may be reproduced using a multichannel loudspeaker setup. In some embodiments the spatial synthesizer 1209 is configured to generate binaural audio signals that may be reproduced using headphones.
- a microphone array 1301 for example part of a mobile phone, is configured to capture audio signals 1302.
- the captured microphone array audio signals 1302 may be processed by software 1300 running inside the mobile phone.
- the software 1300 may be an analysis processor configured to analyse the captured microphone array signals 1302 in a manner such as described with respect to Figures 1 and 3 and is configured to generate spatial metadata 1304 (comprising directions 1306 and energy ratios 1308).
- a synthesis processor 1305 is configured to receive the spatial metadata 1304 from the analysis processor 1303 along with the captured microphone array audio signals 1302 (or alternatively a subset or a processed set of the microphone signals).
- the synthesis processor 1305 may operate in a manner similar to the synthesis processor as described with respect to Figures 1 and 5, 7a, 7b and 9.
- the synthesis processor 1305 may be configured to output a multichannel audio signal (for example a binaural signal or a surround loudspeaker signal or Ambisonic signal).
- the multichannel audio signals 1307 can therefore be listened to directly (when fed to headphones or loudspeakers, or reproduced using an Ambisonic renderer), stored (with any suitable codec) and/or transmitted to a remote device.
- Although a codec use implementation is described above, it is noted that some embodiments may be used with any suitable codec that utilizes smoothing and can provide information on the smoothness of the direction-related parameters.
- the proposed method can also be applied in any kind of spatial audio processing which operates in time-frequency domain.
- the device may be any suitable electronics device or apparatus.
- the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc.
- the device 1400 comprises at least one processor or central processing unit 1407.
- the processor 1407 can be configured to execute various program codes such as the methods described herein.
- the device 1400 comprises a memory 1411.
- the at least one processor 1407 is coupled to the memory 1411.
- the memory 1411 can be any suitable storage means.
- the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407.
- the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling.
- the device 1400 comprises a user interface 1405.
- the user interface 1405 can be coupled in some embodiments to the processor 1407.
- the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405.
- the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad.
- the user interface 1405 can enable the user to obtain information from the device 1400.
- the user interface 1405 may comprise a display configured to display information from the device 1400 to the user.
- the user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400.
- the user interface 1405 may be the user interface for communicating with the position determiner as described herein.
- the device 1400 comprises an input/output port 1409.
- the input/output port 1409 in some embodiments comprises a transceiver.
- the transceiver in such embodiments can be coupled to the processor 1407 and configured to enable a communication with other apparatus or electronic devices, for example via a wireless communications network.
- the transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
- the transceiver can communicate with further apparatus by any suitable known communications protocol.
- the transceiver can use a suitable universal mobile telecommunications system (UMTS) protocol, a wireless local area network (WLAN) protocol such as for example IEEE 802.X, a suitable short-range radio frequency communication protocol such as Bluetooth, or an infrared data communication pathway (IRDA).
- the transceiver input/output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code. Furthermore the device may generate a suitable downmix signal and parameter output to be transmitted to the synthesis device.
- the device 1400 may be employed as at least part of the synthesis device.
- the input/output port 1409 may be configured to receive the downmix signals and in some embodiments the parameters determined at the capture device or processing device as described herein, and generate a suitable audio signal format output by using the processor 1407 executing suitable code.
- the input/output port 1409 may be coupled to any suitable audio output for example to a multichannel speaker system and/or headphones or similar.
- the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
- some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
- While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
- the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
- any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
- the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
- the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
- the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
- Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
- the design of integrated circuits is by and large a highly automated process.
- Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
- Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
- the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
Description
- The present application relates to apparatus and methods for temporal spatial audio parameter smoothing. This includes, but is not exclusive to, sound reproduction systems and sound reproduction methods producing multichannel audio outputs.
- Parametric spatial audio processing is a field of audio signal processing where the spatial aspect of the sound is described using a set of parameters. For example, in parametric spatial audio capture from microphone arrays, it is a typical and an effective choice to estimate from the microphone array signals a set of parameters such as directions of the sound in frequency bands, and the ratio parameters expressing relative energies of the directional and non-directional parts of the captured sound in frequency bands. These parameters are known to well describe the perceptual spatial properties of the captured sound at the position of the microphone array. These parameters can be utilized in synthesis of the spatial sound accordingly, for headphones binaurally, for loudspeakers, or to other formats, such as Ambisonics.
- The directions and direct-to-total energy ratios in frequency bands are thus a parameterization that is particularly effective for spatial audio capture.
- A parameter set consisting of a direction parameter in frequency bands and an energy ratio parameter in frequency bands (indicating the proportion of the sound energy that is directional) can be also utilized as the spatial metadata for an audio codec. For example, these parameters can be estimated from microphone-array captured audio signals, and for example a stereo signal can be generated from the microphone array signals to be conveyed with the spatial metadata. The stereo signal could be encoded, for example, with an AAC encoder. A decoder can decode the audio signals into PCM signals, and process the sound in frequency bands (using the spatial metadata) to obtain the spatial output, for example a binaural output.
- Patent application publication US2009067634 discloses modifying spatial audio parameters associated with one or more audio objects of a stereo or multichannel audio signal to provide remixing capabilities.
- Patent application publication WO2014162171 discloses a spatial audio analyser configured to determine an audio source with a location associated with a visual image element, and an audio processor arranged to change an audio characteristic of the audio source in response to a control input.
- Patent application publication EP2942981 discloses an audio signal processing system for consistent acoustic scene reproduction based on informed spatial filtering.
- Patent application publication US2013329922 discloses using vector base amplitude panning (VBAP) for playing back an object's audio and using the positioning of sound reproduction devices and the object's location information to determine which sound reproduction devices are used for playing back the object's audio.
- Patent application publication JP2015080119.
- There is provided according to a first aspect an apparatus for spatial audio signal processing, as set forth in independent claim 1.
- According to a second aspect there is provided a method for spatial audio signal processing, as set forth in independent claim 7.
- Preferred embodiments are set forth in the dependent claims.
- A computer program product stored on a medium may cause an apparatus to perform the method as described herein.
- An electronic device may comprise apparatus as described herein.
- A chipset may comprise apparatus as described herein.
- Embodiments of the present application aim to address problems associated with the state of the art.
- For a better understanding of the present application, reference will now be made by way of example to the accompanying drawings in which:
- Figure 1 shows schematically an example system utilizing embodiments described hereafter;
- Figure 2 shows a flow diagram of the operation of the example system shown in Figure 1;
- Figure 3 shows schematically an example analysis processor shown in Figure 1 according to some embodiments;
- Figure 4 shows a flow diagram of the operation of the example analysis processor shown in Figure 3;
- Figure 5 shows schematically an example synthesis processor shown in Figure 1 according to some embodiments;
- Figure 6 shows a flow diagram of the operation of the example synthesis processor shown in Figure 5;
- Figures 7a and 7b show schematically example spatial synthesizers shown in Figure 5 according to some embodiments;
- Figures 8a and 8b show flow diagrams of the operation of the spatial synthesizers shown in Figures 7a and 7b;
- Figure 9 shows schematically an example smoothing coefficients determiner shown in Figures 7a and 7b according to some embodiments;
- Figure 10 shows a flow diagram of the operation of the smoothing coefficients determiner shown in Figure 9;
- Figure 11 shows example graphs demonstrating the effect of implementing the embodiments;
- Figure 12 shows an example implementation of the embodiments as shown in Figures 1 to 10;
- Figure 13 shows a further example implementation of the embodiments as shown in Figures 1 to 10; and
- Figure 14 shows schematically an example device suitable for implementing the embodiments shown.
- The following describes in further detail suitable apparatus and possible mechanisms for the provision of adaptive parameter smoothing.
- In the following embodiments and examples the spatial sound source is a microphone array. Alternatively the spatial sound source may be a 5.1 or other format multichannel mix, or Ambisonics signals.
- As described above parametric spatial audio capture methods can be used to enable a perceptually accurate spatial sound reproduction. Parametric spatial audio capture refers to adaptive DSP-driven audio capture methods covering 1) analysing perceptually relevant parameters in frequency bands, for example, the directionality of the propagating sound at the recording position, and 2) reproducing spatial sound in a perceptual sense at the rendering side according to the estimated spatial parameters. The reproduction can be, for example, for headphones or multichannel loudspeaker setups. By estimating and reproducing the perceptually relevant spatial properties (parameters) of the sound field, a spatial perception similar to that which would occur in the original sound field can be reproduced. As the result, the listener can perceive the multitude of sources, their directions and distances, as well as properties of the surrounding physical space, among the other spatial sound features, as if the listener was in the position of the capture device.
- Parametric spatial audio capture methods (SPAC) may employ these determined parameters such as directions of the sound in frequency bands, and the ratios between the directional and non-directional parts of the captured sound in frequency bands to describe the perceptual spatial properties of the captured sound at the position of the microphone array and may use these parameters in synthesis of the spatial sound. As the spatial properties are estimated from the sound field, they can significantly fluctuate over time and frequency, e.g., due to the reverberation and/or multiple simultaneous sound sources. Hence, parametric spatial audio processing methods typically utilize smoothing in the synthesis, in order to avoid possible artefacts caused by rapidly fluctuating parameters (these artefacts are typically referred to as "musical noise").
- Similar parametrization may also be used for the compression of spatial audio, e.g., from 5.1 multichannel signals. In this case, the parameters are estimated from the input loudspeaker signals. Nevertheless, the parameters typically fluctuate also in this case. Hence, the temporal smoothing is needed also with loudspeaker input.
- Typically, the spatial parameters are determined in the time-frequency domain, i.e., each parameter value is associated with a certain frequency band and temporal frame. Examples of possible spatial parameters include (but are not limited to):
- Direction and direct-to-total energy ratio
- Direction and diffuseness
- Inter-channel level difference, inter-channel phase difference, and inter-channel coherence
- These parameters are determined in the time-frequency domain. It should be noted that other parametrizations than those presented above may also be used. In general, the spatial audio parametrizations typically describe how the sound is distributed in space, either generally (e.g., using directions) or relatively (e.g., as level differences between certain channels). Moreover, it should be noted that, in such methods, the audio and the parameters may be processed and/or transmitted/stored in between the analysis and the synthesis.
- The parametric spatial audio processing methods are often based on analysing the direction of arrival (and other spatial parameters) in frequency bands. If there were a single sound source in anechoic conditions, the direction would stably point to the sound source at all frequencies. However, in typical acoustic environments, the microphones also capture sounds other than the sound source, such as reverberation and ambient sounds. Moreover, there may be multiple simultaneous sources. As a result, the estimated directions typically fluctuate significantly over time and the estimates are different at different frequency bands.
- Parametric spatial audio processing methods such as employed in embodiments as described in further detail hereafter synthesize the spatial sound based on the analysed parameters (such as the aforementioned direction) and related audio signals (e.g., 2 captured microphone signals). In the case of loudspeaker rendering, vector base amplitude panning (VBAP) is a common method to position the audio to the analysed direction. VBAP computes gains for a subset of loudspeakers based on the direction, and the audio signal is multiplied with these gains and fed to these loudspeakers.
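For a horizontal-only loudspeaker pair, the VBAP gain computation mentioned above can be sketched as solving a small linear system so that the gain-weighted sum of the loudspeaker unit vectors points toward the target direction; the energy normalization at the end is a common convention and an assumption here:

```python
import numpy as np

def vbap_2d_gains(azimuth_deg, spk1_deg, spk2_deg):
    """Compute amplitude-panning gains for a loudspeaker pair so that
    g1 * l1 + g2 * l2 points toward the target direction (2-D VBAP)."""
    p = np.array([np.cos(np.radians(azimuth_deg)), np.sin(np.radians(azimuth_deg))])
    # Columns of L are the loudspeaker unit vectors.
    L = np.column_stack([
        [np.cos(np.radians(spk1_deg)), np.sin(np.radians(spk1_deg))],
        [np.cos(np.radians(spk2_deg)), np.sin(np.radians(spk2_deg))],
    ])
    g = np.linalg.solve(L, p)      # solve L @ g = p
    return g / np.linalg.norm(g)   # energy normalization: g1^2 + g2^2 = 1
```

A source exactly at one loudspeaker yields that loudspeaker's gain only; a source midway between a symmetric pair yields equal gains.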
- The concept as discussed hereafter proposes apparatus and methods to adapt the smoothing needed in the synthesis of spatial sound in parametric spatial audio processing in order to have quality audio output with different types of sound scenes.
- Furthermore the embodiments as described hereafter relate to parametric spatial audio processing where a solution is provided to improve the temporal smoothing processing needed in the synthesis of spatial audio in the aforementioned parametric spatial audio processing and where the temporal smoothing is improved by analysing the required amount of smoothing adaptively. The analysis being related to the stability of the direction-related parameter(s) and producing a measure of directional stability and determining the time coefficients of the temporal smoothing based on the measure of directional stability.
- The direction-related parameter may, as described in further detail in the embodiments hereafter, refer to a direction.
- In some embodiments the amount of smoothing can be analysed using the direct-to-total energy ratio. The value of the energy ratio is monitored over time, and where it is constantly high, the time coefficient of the smoothing can be set smaller (less smoothing applied). Correspondingly, where the energy ratio is not constantly high, the time coefficient can be set to a default value (more smoothing applied).
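The monitoring described in this paragraph can be sketched as follows; the window length, the "constantly high" threshold, and the coefficient values are illustrative assumptions:

```python
from collections import deque

class RatioMonitor:
    """Monitor the direct-to-total energy ratio over time; when it is
    constantly high the direction is deemed reliable and a faster (less
    smoothing) coefficient is returned, otherwise a default one."""
    def __init__(self, window=20, threshold=0.8, a_fast=0.4, a_default=0.1):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.a_fast = a_fast
        self.a_default = a_default

    def update(self, ratio):
        self.history.append(ratio)
        full = len(self.history) == self.history.maxlen
        if full and min(self.history) >= self.threshold:
            return self.a_fast    # ratio constantly high: smooth less
        return self.a_default     # otherwise: default (more) smoothing
```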
- A block diagram of an example system for implementing some embodiments is shown in Figure 1.
- Figure 1 shows an example capture device 101. The capture device may be a VR capture device, a mobile phone or any other suitable electronic apparatus comprising one or more microphone arrays. The capture device 101 thus in some embodiments comprises microphones configured to generate the microphone audio signals 102. - An
analysis processor 103 may receive the microphone audio signals 102 from the capture device 101. The analysis processor 103 can, for example, be a computer or a mobile phone (running suitable software), or alternatively a specific device utilizing, for example, field programmable gate arrays (FPGAs) or application specific integrated circuits (ASICs). In some embodiments the capture device 101 and the analysis processor 103 are implemented on the same apparatus or device. - Based on the microphone-array signals, the analysis processor creates a
data stream 104. The data stream may comprise transport audio signals and spatial metadata (e.g., directions and energy ratios in frequency bands). The data stream 104 may be transmitted or stored for example within some storage 105 such as memory, or alternatively directly processed in the same device. - A
synthesis processor 107 may receive the data stream 104. The synthesis processor 107 can, for example, be a computer or a mobile phone (running suitable software), or alternatively a specific device utilizing, for example, FPGAs or ASICs. Based on the data stream (the transport audio signals and the metadata), the synthesis processor can be configured to produce output audio signals. For headphone listening, the output signals can be binaural signals 109. For loudspeaker rendering, the output signals can be multi-channel signals. - The
headphones 111 or other playback apparatus may be configured to receive the output of the synthesis processor 107 and output the audio signals in a format suitable for listening. - With respect to Figure 2 is shown an example summary of the operations of the apparatus shown in Figure 1. - The initial operation is the capture (or otherwise input) of the audio signals as shown in Figure 2 by step 201. - Having captured the audio signals they are analysed to generate the data stream as shown in Figure 2 by step 203. - The data stream may then be transmitted and received (or stored and retrieved) as shown in Figure 2 by step 205. - Having received or retrieved the data stream, the output may be synthesized based at least on the data stream as shown in Figure 2 by step 207. - The synthesized audio output signals may then be output to a suitable output such as headphones as shown in Figure 2 by step 209. - With respect to
Figure 3 an example analysis processor 103, such as shown in Figure 1, is presented. The input to the analysis processor 103 are the microphone array signals 102. - A transport audio signal generator 301 may be configured to receive the microphone array signals 102 and create the transport audio signals. In some embodiments the transport audio signals are selected from the microphone array signals. In some embodiments the microphone array signals may be downmixed to generate the transport audio signals. In some embodiments the transport audio signals may be obtained by processing the microphone array signals. - The transport audio signal generator 301 may be configured to generate any suitable number of transport audio signals (or channels), for example in some embodiments the transport audio signal generator 301 is configured to generate two transport audio signals. In some embodiments the transport audio signal generator 301 is further configured to compress the audio signals. For example in some embodiments the audio signals may be compressed using an advanced audio coding (AAC) or enhanced voice services (EVS) compression coding. - Furthermore the analysis processor 103 comprises a spatial analyser 303. The spatial analyser 303 is also configured to receive the microphone array signals 102 and generate metadata 304 based on a spatial analysis of the microphone array signals. The spatial analyser 303 may be configured to determine any suitable spatial metadata parameter. For example spatial metadata parameters determined include (but are not limited to): direction and direct-to-total energy ratio; direction and diffuseness; inter-channel level difference, inter-channel phase difference, and inter-channel coherence. In some embodiments these parameters are determined in the time-frequency domain. It should be noted that other parametrizations than those presented above may also be used. In general, the spatial audio parametrizations typically describe how the sound is distributed in space, either generally (e.g., using directions) or relatively (e.g., as level differences between certain channels). In the example shown in Figure 3 the metadata 304 comprises directions 306 and energy ratios 308. In some embodiments the metadata may be compressed and/or quantized. The analysis processor 103 may furthermore comprise a multiplexer or mux 305 which is configured to receive the metadata 304 and the transport audio signals 302 and generate a combined data stream 104. The combination may be any suitable combination. - It should be noted that in some embodiments the input to the analysis processor 103 can also be other types of audio signals, such as multichannel loudspeaker signals, audio objects, or Ambisonic signals. Furthermore, the exact implementation of the analysis processor may be any suitable implementation (as indicated above, a computer running suitable software, FPGAs or ASICs, etc.) caused to produce the transport audio signals and the spatial metadata in the time-frequency domain. - With respect to
Figure 4 is shown an example summary of the operations of the analysis processor shown in Figure 3. - The initial operation is receiving the microphone array audio signals as shown in Figure 4 by step 401. - Having received the microphone audio signals they are analysed to generate the transport audio signals (for example selection, downmixing or other processing) as shown in Figure 4 by step 403. - Furthermore the microphone audio signals are spatially analysed to generate the metadata, for example the directions and energy ratios as shown in Figure 4 by step 405. - The metadata and the transport audio signals may then be combined to generate the data stream as shown in Figure 4 by step 407. - With respect to
Figure 5 an example synthesis processor 107 (as shown in Figure 1) according to some embodiments is shown. - A demultiplexer, or demux, 501 is configured to receive the data stream 104 and caused to demultiplex the data stream into transport audio signals 502 and metadata 504. In some embodiments, where the transport audio signals were compressed within the analysis processor, the demultiplexer is furthermore caused to decode the audio signals. The metadata in some embodiments is in the time-frequency domain, and comprises parameters such as directions θ(k,n) 506 and direct-to-total energy ratios r(k,n) 508, where k is the frequency band index and n the temporal frame index. In the embodiments where the metadata is compressed/quantized, the demultiplexed data is furthermore decompressed/dequantized to attempt to regenerate the originally determined parameters. - A spatial synthesizer 503 is configured to receive the transport audio signals 502 and the metadata and caused to generate the multichannel output signals 510 such as the binaural output signals 109 shown in Figure 1. - With respect to Figure 6 is shown an example summary of the operations of the synthesis processor shown in Figure 5. - The initial operation is receiving the data stream as shown in Figure 6 by step 601. - Having received the data stream, it is demultiplexed and optionally decoded to generate the transport audio signals and the metadata as shown in Figure 6 by step 603. - The multichannel (binaural or otherwise) output signals may then be synthesized from the transport audio signals and the metadata as shown in Figure 6 by step 605. - The multichannel (binaural or otherwise) output signals may then be output as shown in Figure 6 by step 607. - With respect to
Figures 7a and 7b example spatial synthesizers 503 (as shown in Figure 5) according to some embodiments are shown. - The input to the
spatial synthesizer 503 is in some embodiments the transport audio signals 502 and furthermore the metadata 504 (which may include the energy ratios 508 and the directions 506). - In some embodiments the transport audio signals 502 are transformed to the time-frequency domain using a suitable transformer. For example as shown in
Figures 7a and 7b a short-time Fourier transformer (STFT) 701 is configured to apply a short-time Fourier transform to the transport audio signals to generate suitable time-frequency domain audio signals Si(k, n) 700. In some embodiments any suitable time-frequency transformer may be used, for example a quadrature mirror filterbank (QMF). - A
divider 705 may receive the time-frequency domain audio signals Si(k, n) 700 and the energy ratios 508 and divide the time-frequency domain audio signals Si(k, n) 700 into ambient and direct parts using the energy ratio r(k, n) 508. - With respect to
Figure 7a a smoothing coefficients determiner 703 may also receive the time-frequency domain audio signals Si(k, n) 700 and the energy ratios 508 and determine suitable smoothing coefficients 706. Figure 7b differs from the example spatial synthesizer shown in Figure 7a in that the smoothing coefficients determiner in Figure 7b is caused to receive the time-frequency domain audio signals Si(k, n) 700 and the directions 506. - The smoothing
coefficients determiner 703 may be configured to adaptively determine the smoothing coefficient(s) α(k, n). - A panning
gain determiner 715 may be configured to receive the directions 506 and, based on the output speaker/headphone configuration and the directions, determine suitable panning gains 708. The amplitude panning gains may be computed in any suitable manner, for example by vector base amplitude panning (VBAP) based on the received direction θ(k,n). - In some embodiments a panning gain smoother 717 is configured to receive the panning
gains 708 and the smoothing coefficients 706 and based on these determine suitable smoothed panning gains 710. There are many ways to perform the smoothing. In some embodiments a first-order smoothing may be used. Thus for example the panning gain smoother 717 is configured to receive a current gain g(k, n), smoothing coefficients α(k, n) and also knowledge of the last smoothed gain g'(k, n - 1) and determine a smoothed gain by: g'(k, n) = α(k, n)g(k, n) + (1 - α(k, n))g'(k, n - 1). - In other words the current gain is multiplied with the smoothing coefficient α and the previous smoothed gain is multiplied with (1 - α).
- In other embodiments any suitable smoothing may be applied. The smoothing 'filter' may therefore be of multiple orders and similarly the smoothing coefficient α(k, n) may be a vector value. The actual value(s) of α may depend on the filterbank, and typically are frequency-dependent (values may include, e.g., 0.1). In general, the larger the value is, the less smoothing is applied.
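The first-order smoothing described above can be sketched as follows; this is a minimal illustrative example (the function name and the numeric values are assumptions, not taken from the patent):

```python
# First-order IIR smoothing of a panning gain, as described above:
# the current gain is weighted by alpha and the previous smoothed
# gain by (1 - alpha); a larger alpha means less smoothing.

def smooth_gain(g_current: float, g_prev_smoothed: float, alpha: float) -> float:
    """Return g'(k, n) = alpha * g(k, n) + (1 - alpha) * g'(k, n - 1)."""
    return alpha * g_current + (1.0 - alpha) * g_prev_smoothed

# Example: a gain that jumps from 0 to 1; with alpha = 0.1 the smoothed
# gain approaches 1 only gradually, suppressing abrupt gain changes.
g_smoothed = 0.0
trajectory = []
for g in [0.0, 1.0, 1.0, 1.0]:
    g_smoothed = smooth_gain(g, g_smoothed, alpha=0.1)
    trajectory.append(round(g_smoothed, 3))
# trajectory == [0.0, 0.1, 0.19, 0.271]
```

The gradual convergence is what masks frame-to-frame direction estimation noise that would otherwise be audible as "musical noise".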
- A
decorrelator 707 is configured to receive the ambient audio signal part 702 and process it to make it perceived as surrounding, for example by decorrelating and spreading the ambient audio signal part 702 across the audio scene. - A
positioner 709 is configured to receive the direct audio signal part 704 and the smoothed panning gains 710 and position the direct audio signal part 704 using a suitable positioning, for example using the smoothed panning gains and an amplitude panning operation. - A
merger 711 or other suitable combiner is configured to receive the spread ambient signal part from the decorrelator 707 and the positioned direct audio signal part from the positioner 709 and combine or merge these resulting audio signals. - An inverse short-time Fourier transformer (Inverse STFT) 713 is configured to receive the combined audio signals and apply an inverse short-time Fourier transform (or other suitable frequency-to-time domain transform) to generate the multi-channel audio signals 510, which may be passed to a suitable output device such as headphones or a multi-channel loudspeaker setup.
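The divider/positioner/merger chain described above can be sketched per time-frequency tile as follows; all names are illustrative assumptions, the "decorrelator" is a trivial stand-in, and a real implementation would operate on complex STFT bins:

```python
import math

def synthesize_bin(s, r, smoothed_gains, decorrelate):
    """Split input bin s into direct/ambient parts using energy ratio r,
    position the direct part with per-channel smoothed panning gains,
    spread the ambient part, and merge the two parts per channel."""
    direct = math.sqrt(r) * s          # direct part (energy fraction r)
    ambient = math.sqrt(1.0 - r) * s   # ambient part (energy fraction 1 - r)
    positioned = [g * direct for g in smoothed_gains]
    spread = decorrelate(ambient, len(smoothed_gains))
    return [d + a for d, a in zip(positioned, spread)]

# Trivial stand-in "decorrelator": distribute ambient energy evenly.
def even_spread(ambient, n_channels):
    w = 1.0 / math.sqrt(n_channels)
    return [w * ambient for _ in range(n_channels)]

out = synthesize_bin(s=1.0, r=0.64, smoothed_gains=[0.8, 0.6],
                     decorrelate=even_spread)
```

The square roots keep the energy split consistent with the direct-to-total ratio r, since the parts are summed in the amplitude domain.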
- In the examples and embodiments described in detail herein, for example as described with respect to the examples shown in
Figures 7a and 7b, the panning gains are determined directly from the direction metadata, and the "direct sound" is also positioned with these gains after smoothing. - In some embodiments there may be implementations where the panning gains are not directly determined from the direction metadata, but instead determined indirectly. Thus the smoothing of these gains as described above may be applied to any suitably generated gains.
- Thus for example, in some embodiments the directions may be used (together with the energy ratios and transport audio signals) to determine a target energy distribution of the output multichannel signals. The target energy distribution may be compared to the energy distribution of the transport audio signals (or to the energy distribution of intermediate signals obtained from the transport audio signals by mixing). Panning gains (or any gains that position audio) may be obtained as a ratio of these values and the "Smoother" 717 may be applied to these gains.
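A minimal sketch of this indirect gain determination (gains obtained as a ratio of the target energy distribution to the measured energies); the function name and the square-root form of the amplitude gain are assumptions consistent with the description above:

```python
import math

def gains_from_energies(target_energies, measured_energies, eps=1e-12):
    """Per-channel positioning gain = sqrt(target / measured).

    The square root converts the ratio of energies into an amplitude
    gain; eps guards against division by a silent channel."""
    return [math.sqrt(t / max(m, eps))
            for t, m in zip(target_energies, measured_energies)]

# Channel 1 is boosted and channel 2 attenuated, steering the energy
# of the intermediate signals towards the target distribution.
g = gains_from_energies(target_energies=[0.9, 0.1],
                        measured_energies=[0.5, 0.5])
```

Gains produced this way could then be fed to the same smoother as directly determined panning gains.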
- In summary, the panning gains may be generated by any one of many optional methods, and the resulting gains are then smoothed according to the methods described herein.
- With respect to
Figures 8a and 8b the operations of the spatial synthesizer 503 shown in Figures 7a and 7b according to some embodiments are described in further detail. - The spatial synthesizer in some embodiments is configured to receive the transport audio signals as shown in
Figures 8a and 8b by step 801. - The spatial synthesizer in some embodiments is furthermore configured to receive the energy ratios as shown in
Figures 8a and 8b by step 803. - The spatial synthesizer in some embodiments is also configured to receive the directions as shown in
Figures 8a and 8b by step 805. - The received transport audio signals are in some embodiments converted into a time-frequency domain form, for example by applying a suitable time-frequency domain transform to the transport audio signals as shown in
Figures 8a and 8b by step 807. - The time-frequency domain audio signals may then in some embodiments be divided into ambient and direct parts (based on the energy ratios) as shown in
Figures 8a and 8b by step 813. - Furthermore the smoothing coefficients may be determined based on the energy ratios and the time-frequency domain audio signals as shown in
Figure 8a by step 811. Alternatively the smoothing coefficients may be determined based on the directions and the time-frequency domain audio signals as shown in Figure 8b by step 851. - Panning gains may be determined based on the received directions as shown in
Figures 8a and 8b by step 809. - A series of smoothed panning gains may be determined based on the determined panning gains and the smoothing coefficients as shown in
Figures 8a and 8b by step 817. - The ambient audio signal part may be decorrelated as shown in
Figures 8a and 8b by step 815. - The positional component of the audio signals may then be determined based on the smoothed panning gains and the direct audio signal part as shown in
Figures 8a and 8b by step 819. In such embodiments a positional component of the audio signals, or positioned audio signal, can be a number of audio signals which are combined to produce a virtual sound source positioned in a three-dimensional space. - The positional component of the audio signals and the decorrelated ambient audio signal may then be combined or merged as shown in
Figures 8a and 8b by step 821. - Furthermore the combined audio signals may then be inverse time-frequency domain transformed to generate the multichannel audio signals in a suitable format to be output as shown in
Figures 8a and 8b by step 823. - With respect to
Figure 9 an example smoothing coefficients determiner 703 (such as shown in Figures 7a and 7b) according to some embodiments is shown. The smoothing coefficients determiner 703 is configured to generate values which may be used to smooth the panning gains in order to avoid "musical noise" artefacts. - The inputs to the smoothing
coefficients determiner 703 are shown as the time-frequency domain audio signals 700 and the energy ratios 508.
- An energy estimator is configured to receive the time-frequency domain audio signals 700 and determine estimates of the energy 902 of the audio signals.
direction smoothness estimator 903 is configured to estimate a direction smoothness ξ(k, n). In some embodiments, such as shown in the examples inFigures 7a and8a , this direction smoothness may be estimated or determined from the energy ratios r(k, n) 508. For example the direction smoothness estimator may be configured to calculate the direction smoothness by the following:
In some embodiments, such as shown in the examples in Figures 7b and 8b, the direction smoothness value ξ(k, n) 904 can be estimated by using or calculating the fluctuation of the direction value. In such embodiments a circular variance of the directions θ(k, n) is determined and this is used as the basis of the direction smoothness. In other embodiments any suitable analysis of the temporal fluctuation of the directions may be used to determine the direction smoothness estimate. - An average
direction smoothness estimator 905 is configured to receive the energy 902 and direction smoothness estimates 904 and determine an average over time (and in some embodiments over frequency). The average direction smoothness estimator may therefore be configured to perform a first-order smoothing based on a current estimate ξ(k, n), a previous average value ξ'(k, n - 1) and a smoothing coefficient β to generate an averaged direction smoothness estimate ξ'(k, n) 906, for example by the following: ξ'(k, n) = βξ(k, n) + (1 - β)ξ'(k, n - 1). - A direction smoothness estimates to smoothing coefficients converter may receive the averaged direction smoothness estimate ξ'(k, n) 906 and generate the smoothing coefficients α(k, n). For example in some embodiments the averaged direction smoothness estimates ξ'(k, n) are converted to the actual smoothing coefficients by interpolating between fast and slow coefficients, for example α(k, n) = ξ'(k, n)αfast + (1 - ξ'(k, n))αslow.
- The values of αfast may, e.g., include 0.4, and the values of αslow may, e.g., include 0.1. These fast and slow coefficients may depend on the actual implementation and may be frequency-dependent.
- In some embodiments the smoothing coefficients may be a vector instead of a single value. This for example may occur when the smoothing is other than a first-order IIR smoothing. These embodiments may therefore implement "fast settings" and "slow settings" which are interpolated based on the "averaged direction smoothness estimates". In such embodiments these "settings" may depend on the implementation, for example whether it is a single value or a vector of values.
- The smoothing coefficients α(k, n) 706 may then be output.
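The smoothing coefficients determiner described above can be sketched end-to-end as follows; the exponent c, the fixed averaging parameter beta and the linear fast/slow interpolation are assumptions chosen to match the qualitative behaviour in the text (a higher direct-to-total ratio gives a higher smoothness and hence a larger, faster coefficient), not values from the patent:

```python
ALPHA_FAST = 0.4   # larger coefficient -> less smoothing (reacts fast)
ALPHA_SLOW = 0.1   # smaller coefficient -> more smoothing

def direction_smoothness(energy_ratio, c=2.0):
    """xi(k, n): high when the direct-to-total ratio is high (stable direction)."""
    return energy_ratio ** c

def average_smoothness(xi, xi_prev_avg, beta):
    """xi'(k, n) = beta * xi(k, n) + (1 - beta) * xi'(k, n - 1)."""
    return beta * xi + (1.0 - beta) * xi_prev_avg

def smoothing_coefficient(xi_avg):
    """Interpolate between the fast and slow settings by averaged smoothness."""
    return xi_avg * ALPHA_FAST + (1.0 - xi_avg) * ALPHA_SLOW

# A stable, direct-sound-dominated scene (r close to 1) drives alpha
# towards the fast setting; a diffuse scene towards the slow setting.
xi_avg = average_smoothness(direction_smoothness(0.9), xi_prev_avg=0.5, beta=0.5)
alpha = smoothing_coefficient(xi_avg)
```

In a real determiner beta would itself be derived from the energy estimates, so that loud frames dominate the average.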
- With respect to
Figure 10 an example flow diagram showing the operation of the smoothing coefficients determiner according to some embodiments is shown. - The time-frequency domain audio signals are received as shown in
Figure 10 by step 1001. - Furthermore the energy ratios are received as shown in
Figure 10 by step 1003. - The estimate of the energy (of the audio signals) may be determined based on the time-frequency domain audio signals as shown in
Figure 10 by step 1005. - Furthermore the estimate of the direction smoothness is determined based on the energy ratios (or based on any other suitable parameter such as an analysis of the directions) as shown in
Figure 10 by step 1007. - The estimate of the average direction smoothness is then determined based on the energy estimate and the direction smoothness estimates as shown in
Figure 10 by step 1009. - Then the average direction smoothness estimate is converted to smoothing coefficients as shown in
Figure 10 by step 1011. - The smoothing coefficients are then output as shown in
Figure 10 by step 1013. - In the above examples the smoothing coefficients were determined based on the use of the spatial metadata (for example the metadata generated in the analysis processor, such as found within a spatial audio capture (SPAC) system which generates directions and direct-to-total energy ratios). It should be noted that the above methods can be modified without inventive skill to be used with any method utilizing similar parameters. For example, in the context of Directional Audio Coding (DirAC), the direction smoothness can be determined correspondingly from the DirAC diffuseness parameter ψ(k, n), for example as a function of 1 - ψ(k, n).
- Some of the advantages of the proposed embodiments are that a significant amount of smoothing can be applied with typical sound scenes, and thus musical noise artefacts are avoided. Furthermore, when the sound scene does not require as much smoothing, the amount of smoothing applied can be reduced and thus the reproduction can react faster to changes in the sound field.
- The effect of the proposed embodiments can be seen in
Figure 11, which shows three graph traces: a reference audio signal, a reproduction using a fixed smoothing coefficient and a reproduction using an adaptive smoothing coefficient. In this example the sound scene contains two sources located in different directions in anechoic conditions. The sound was rendered to a multichannel setup. - The
reference graph trace 1101 shows the signal of one output channel of the audio signal and shows the first source 1103 but not the other source.
- On the contrary, the adaptive smoothing example 1121 having analysed that the directions are stable, and there is not as much need for temporal smoothing is configured to set the smoothing to a faster mode, and the sound source is not reproduced from the wrong channel. In such a manner the reproduction is perceived to react fast to changes in the direction.
- With respect to
Figure 12 an example implementation of some further embodiments is shown. In these embodiments the implementation can be in software, for example on a mobile phone (or a computer) 1200. The software running inside the mobile phone 1200 may be configured to receive an encoded bitstream (it may, e.g., have been transmitted in real time or it may have been stored on the device). The bitstream can also be any other suitable bitstream. A demultiplexer 1203 (DEMUX) is configured to demultiplex the bitstream into an audio bitstream 1204 and a spatial metadata bitstream 1206. - An Enhanced Voice Services (EVS) or encoded
bitstream decoder 1205 is configured to extract the transport audio signals 1206 from the audio bitstream (or any decoder that corresponds to the utilized codec). - A
metadata decoder 1207 is used to decompress the spatial metadata 1208, for example comprising the directions 1210 and energy ratios 1212. - The spatial synthesiser 1209 (similar to the spatial synthesizer in the embodiments above) is configured to receive
transport audio signals 1206 and the metadata 1208 and output multichannel loudspeaker signals 1211 that may be reproduced using a multichannel loudspeaker setup. In some embodiments the spatial synthesizer 1209 is configured to generate binaural audio signals that may be reproduced using headphones. - With respect to
Figure 13 a further example implementation is shown according to some further embodiments. In this example implementation a microphone array 1301, for example part of a mobile phone, is configured to capture audio signals 1302. The captured microphone array audio signals 1302 may be processed by software 1300 running inside the mobile phone. The software 1300 may comprise an analysis processor 1303 configured to analyse the captured microphone array signals 1302 in a manner such as described with respect to Figures 1 and 3 and configured to generate spatial metadata 1304 (comprising directions 1306 and energy ratios 1308). Furthermore there may be a synthesis processor 1305 which is configured to receive the spatial metadata 1304 from the analysis processor 1303 along with the captured microphone array audio signals 1302 (or alternatively a subset or a processed set of the microphone signals). The synthesis processor 1305 may operate in a manner similar to the synthesis processor as described with respect to Figures 1, 5, 7a, 7b and 9. Depending on the configuration, the synthesis processor 1305 may be configured to output a multichannel audio signal (for example a binaural signal, a surround loudspeaker signal or an Ambisonic signal). The multichannel audio signals 1307 can therefore be listened to directly (when fed to headphones or loudspeakers, or reproduced using an Ambisonic renderer), stored (with any suitable codec) and/or transmitted to a remote device. - Although a codec-based implementation is described above, it is noted that some embodiments may be used with any suitable codec that utilizes smoothing and can provide information on the smoothness of the direction-related parameters.
- Similarly as depicted in the example implementation in
Figure 13, the proposed method can also be applied in any kind of spatial audio processing which operates in the time-frequency domain. - With respect to
Figure 14 an example electronic device which may be used as the analysis or synthesis processor is shown. The device may be any suitable electronics device or apparatus. For example in some embodiments the device 1400 is a mobile device, user equipment, tablet computer, computer, audio playback apparatus, etc. - In some embodiments the
device 1400 comprises at least one processor or central processing unit 1407. The processor 1407 can be configured to execute various program codes, such as the methods described herein. - In some embodiments the
device 1400 comprises a memory 1411. In some embodiments the at least one processor 1407 is coupled to the memory 1411. The memory 1411 can be any suitable storage means. In some embodiments the memory 1411 comprises a program code section for storing program codes implementable upon the processor 1407. Furthermore in some embodiments the memory 1411 can further comprise a stored data section for storing data, for example data that has been processed or is to be processed in accordance with the embodiments as described herein. The implemented program code stored within the program code section and the data stored within the stored data section can be retrieved by the processor 1407 whenever needed via the memory-processor coupling. - In some embodiments the
device 1400 comprises a user interface 1405. The user interface 1405 can be coupled in some embodiments to the processor 1407. In some embodiments the processor 1407 can control the operation of the user interface 1405 and receive inputs from the user interface 1405. In some embodiments the user interface 1405 can enable a user to input commands to the device 1400, for example via a keypad. In some embodiments the user interface 1405 can enable the user to obtain information from the device 1400. For example the user interface 1405 may comprise a display configured to display information from the device 1400 to the user. The user interface 1405 can in some embodiments comprise a touch screen or touch interface capable of both enabling information to be entered to the device 1400 and further displaying information to the user of the device 1400. In some embodiments the user interface 1405 may be the user interface for communicating with the position determiner as described herein. - In some embodiments the
device 1400 comprises an input/output port 1409. The input/output port 1409 in some embodiments comprises a transceiver. The transceiver in such embodiments can be coupled to the processor 1407 and configured to enable communication with other apparatus or electronic devices, for example via a wireless communications network. The transceiver or any suitable transceiver or transmitter and/or receiver means can in some embodiments be configured to communicate with other electronic devices or apparatus via a wire or wired coupling.
- The transceiver input/
output port 1409 may be configured to receive the signals and in some embodiments determine the parameters as described herein by using the processor 1407 executing suitable code. Furthermore the device may generate a suitable downmix signal and parameter output to be transmitted to the synthesis device. - In some embodiments the
device 1400 may be employed as at least part of the synthesis device. As such the input/output port 1409 may be configured to receive the downmix signals and in some embodiments the parameters determined at the capture device or processing device as described herein, and generate a suitable audio signal format output by using the processor 1407 executing suitable code. The input/output port 1409 may be coupled to any suitable audio output, for example to a multichannel speaker system and/or headphones or similar. - In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
- The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
- The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASIC), gate level circuits and processors based on multi-core processor architecture, as non-limiting examples.
- Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
- Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
Claims (10)
- An apparatus for spatial audio signal processing, the apparatus comprising at least one processor and at least one memory including a computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to:receive at least one audio signal;determine an energy ratio being a spatial parameter associated with the at least one audio signal;determine a direction smoothness parameter by applying an exponent to the energy ratio;convert the direction smoothness parameter to an adaptive smoothing parameter;determine panning gains for applying to a first part of the at least one audio signal;apply the adaptive smoothing parameter to the panning gains to generate associated smoothed panning gains; andapply the smoothed panning gains to the first part of the at least one audio signal to generate a positioned audio signal.
- The apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured to, with the at least one processor, further cause the apparatus to:apply a decorrelation to a second part of the at least one audio signal to generate an ambient audio signal; andcombine the positioned audio signal and the ambient audio signal to generate a multichannel audio signal.
- The apparatus as claimed in any of claims 1 and 2, wherein the at least one memory and the computer program code are configured to, with the at least one processor, further cause the apparatus to:estimate an energy of the at least one audio signal; andaverage the direction smoothness parameter based on the energy of the at least one audio signal, wherein the apparatus caused to convert the direction smoothness parameter to the adaptive smoothing parameter is caused to convert the averaged direction smoothness parameter to the adaptive smoothing parameter.
- The apparatus as claimed in claim 3, wherein the at least one memory and the computer program code are configured to, with the at least one processor, further cause the apparatus to:determine an averaging parameter based on the energy of the at least one audio signal; andapply the averaging parameter to the direction smoothness parameter and unity minus the averaging parameter to a previous averaged direction smoothness parameter to generate the averaged direction smoothness parameter.
- The apparatus as claimed in any of claims 1 to 4, wherein the at least one memory and the computer program code are configured to, with the at least one processor, further cause the apparatus to:receive the at least one audio signal from at least one microphone within a microphone array;determine the at least one audio signal from multichannel loudspeaker audio signals; andreceive the at least one audio signal as part of a data stream comprising the at least one audio signal and metadata comprising the spatial parameter.
- The apparatus as claimed in any of claims 1 to 5, wherein the at least one memory and the computer program code are configured to, with the at least one processor, further cause the apparatus to: analyse the at least one audio signal to determine the spatial parameter; and
receive the spatial parameter as part of a data stream comprising the at least one audio signal and metadata comprising the spatial parameter. - A method for spatial audio signal processing comprising:receiving at least one audio signal;determining an energy ratio being a spatial parameter associated with the at least one audio signal;determining a direction smoothness parameter by applying an exponent to the energy ratio;converting the direction smoothness parameter to an adaptive smoothing parameter;determining panning gains for applying to a first part of the at least one audio signal;applying the adaptive smoothing parameter to the panning gains to generate associated smoothed panning gains; andapplying the smoothed panning gains to the first part of the at least one audio signal to generate a positioned audio signal.
- The method as claimed in Claim 7, further comprising:applying a decorrelation to a second part of the at least one audio signal to generate an ambient audio signal; andcombining the positioned audio signal and the ambient audio signal to generate a multichannel audio signal.
- The method as claimed in any of Claims 7 and 8, further comprising:estimating an energy of the at least one audio signal; andaveraging the direction smoothness parameter based on the energy of the at least one audio signal, wherein converting the direction smoothness parameter to the adaptive smoothing parameter comprises converting the averaged direction smoothness parameter to the adaptive smoothing parameter.
- The method as claimed in Claim 9, wherein averaging the direction smoothness parameter based on the energy of the at least one audio signal comprises:determining an averaging parameter based on the energy of the at least one audio signal; andapplying the averaging parameter to the direction smoothness parameter and unity minus the averaging parameter to a previous averaged direction smoothness parameter to generate the averaged direction smoothness parameter.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1803993.3A GB2571949A (en) | 2018-03-13 | 2018-03-13 | Temporal spatial audio parameter smoothing |
PCT/FI2019/050178 WO2019175472A1 (en) | 2018-03-13 | 2019-03-07 | Temporal spatial audio parameter smoothing |
Publications (3)
Publication Number | Publication Date |
---|---|
EP3766262A1 EP3766262A1 (en) | 2021-01-20 |
EP3766262A4 EP3766262A4 (en) | 2021-11-10 |
EP3766262B1 true EP3766262B1 (en) | 2022-11-23 |
Family
ID=61972940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19767481.5A Active EP3766262B1 (en) | 2018-03-13 | 2019-03-07 | Spatial audio parameter smoothing |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3766262B1 (en) |
GB (1) | GB2571949A (en) |
WO (1) | WO2019175472A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11545166B2 (en) | 2019-07-02 | 2023-01-03 | Dolby International Ab | Using metadata to aggregate signal processing operations |
GB2593419A (en) * | 2019-10-11 | 2021-09-29 | Nokia Technologies Oy | Spatial audio representation and rendering |
TW202123220A (en) * | 2019-10-30 | 2021-06-16 | 美商杜拜研究特許公司 | Multichannel audio encode and decode using directional metadata |
AU2021357364B2 (en) * | 2020-10-09 | 2024-06-27 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method, or computer program for processing an encoded audio scene using a parameter smoothing |
EP4178231A1 (en) * | 2021-11-09 | 2023-05-10 | Nokia Technologies Oy | Spatial audio reproduction by positioning at least part of a sound field |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7983922B2 (en) * | 2005-04-15 | 2011-07-19 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for generating multi-channel synthesizer control signal and apparatus and method for multi-channel synthesizing |
US8295494B2 (en) | 2007-08-13 | 2012-10-23 | Lg Electronics Inc. | Enhancing audio with remixing capability |
JP5798247B2 (en) * | 2011-07-01 | 2015-10-21 | ドルビー ラボラトリーズ ライセンシング コーポレイション | Systems and tools for improved 3D audio creation and presentation |
WO2013181272A2 (en) | 2012-05-31 | 2013-12-05 | Dts Llc | Object-based audio system using vector base amplitude panning |
DE102012017296B4 (en) * | 2012-08-31 | 2014-07-03 | Hamburg Innovation Gmbh | Generation of multichannel sound from stereo audio signals |
US10635383B2 (en) | 2013-04-04 | 2020-04-28 | Nokia Technologies Oy | Visual audio processing apparatus |
JP6187131B2 (en) | 2013-10-17 | 2017-08-30 | ヤマハ株式会社 | Sound image localization device |
EP2942981A1 (en) | 2014-05-05 | 2015-11-11 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | System, apparatus and method for consistent acoustic scene reproduction based on adaptive functions |
CN105336335B (en) * | 2014-07-25 | 2020-12-08 | 杜比实验室特许公司 | Audio object extraction with sub-band object probability estimation |
US10045145B2 (en) * | 2015-12-18 | 2018-08-07 | Qualcomm Incorporated | Temporal offset estimation |
WO2018213159A1 (en) * | 2017-05-15 | 2018-11-22 | Dolby Laboratories Licensing Corporation | Methods, systems and apparatus for conversion of spatial audio format(s) to speaker signals |
- 2018
  - 2018-03-13 GB GB1803993.3A patent/GB2571949A/en not_active Withdrawn
- 2019
  - 2019-03-07 EP EP19767481.5A patent/EP3766262B1/en active Active
  - 2019-03-07 WO PCT/FI2019/050178 patent/WO2019175472A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
GB201803993D0 (en) | 2018-04-25 |
EP3766262A1 (en) | 2021-01-20 |
GB2571949A (en) | 2019-09-18 |
EP3766262A4 (en) | 2021-11-10 |
WO2019175472A1 (en) | 2019-09-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US12114146B2 (en) | Determination of targeted spatial audio parameters and associated spatial audio playback | |
US11343630B2 (en) | Audio signal processing method and apparatus | |
EP3766262B1 (en) | Spatial audio parameter smoothing | |
KR101480258B1 (en) | Apparatus and method for decomposing an input signal using a pre-calculated reference curve | |
US20190394606A1 (en) | Two stage audio focus for spatial audio processing | |
US9313599B2 (en) | Apparatus and method for multi-channel signal playback | |
US20170188174A1 (en) | Audio signal processing method and device | |
US20130195276A1 (en) | Multi-Channel Audio Processing | |
US20160255452A1 (en) | Method and apparatus for compressing and decompressing sound field data of an area | |
US20230071136A1 (en) | Method and apparatus for adaptive control of decorrelation filters | |
US20220369061A1 (en) | Spatial Audio Representation and Rendering | |
US20240089692A1 (en) | Spatial Audio Representation and Rendering | |
CN112567765B (en) | Spatial audio capture, transmission and reproduction | |
US20210099795A1 (en) | Spatial Audio Capture | |
US20210319799A1 (en) | Spatial parameter signalling | |
US11956615B2 (en) | Spatial audio representation and rendering | |
US20240274137A1 (en) | Parametric spatial audio rendering | |
US20240357304A1 (en) | Sound Field Related Rendering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20201013 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAV | Request for validation of the european patent (deleted) | ||
DAX | Request for extension of the european patent (deleted) | ||
A4 | Supplementary search report drawn up and despatched |
Effective date: 20211007 |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 19/008 20130101ALI20211002BHEP
Ipc: H04R 1/32 20060101ALI20211002BHEP
Ipc: H04R 3/12 20060101ALI20211002BHEP
Ipc: H04R 5/04 20060101ALI20211002BHEP
Ipc: H04S 7/00 20060101AFI20211002BHEP
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
INTG | Intention to grant announced |
Effective date: 20220707 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE PATENT HAS BEEN GRANTED |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602019022274 Country of ref document: DE |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: REF Ref document number: 1533887 Country of ref document: AT Kind code of ref document: T Effective date: 20221215 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG9D |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20221123 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 1533887 Country of ref document: AT Kind code of ref document: T Effective date: 20221123 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230323
Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230223
Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230323
Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20230224
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123 |
|
P01 | Opt-out of the competence of the unified patent court (upc) registered |
Effective date: 20230527 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602019022274 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
26N | No opposition filed |
Effective date: 20230824 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123 |
|
REG | Reference to a national code |
Ref country code: BE Ref legal event code: MM Effective date: 20230331 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230307 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230331
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230307
Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230331
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20230331 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20240130 Year of fee payment: 6
Ref country code: GB Payment date: 20240201 Year of fee payment: 6
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20221123 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20240213 Year of fee payment: 6 |