US20180084364A1 - Method for Visualizing the Directional Sound Activity of a Multichannel Audio Signal - Google Patents
- Publication number: US20180084364A1 (application US 15/707,129)
- Authority: US (United States)
- Prior art keywords: time, directional, vector, sound, frequency
- Legal status: Abandoned (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04S7/303—Tracking of listener position or orientation
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- G10L19/02—Speech or audio signal analysis-synthesis using spectral analysis, e.g. transform vocoders or subband vocoders
- H04R29/008—Visual indication of individual signal levels
- H04R5/02—Spatial or constructional arrangements of loudspeakers
- H04R5/04—Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
- H04S3/008—Systems employing more than two channels in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
- H04S3/02—Systems employing more than two channels of the matrix type, i.e. in which input signals are combined algebraically, e.g. after having been phase shifted with respect to each other
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/40—Visual indication of stereophonic sound image
- H04S2400/01—Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
Definitions
- the invention also relates to an apparatus for visualizing directional sound activity of a multichannel audio signal, comprising:
- FIG. 1 shows an example of prescribed positions of loudspeakers with respect to a reference listening point in a prescribed spatial configuration for multichannel audio system
- FIG. 2 is a diagram showing steps of the method
- FIG. 3 is a diagram showing stages of the signal processing in the method
- FIG. 4 shows schematically an example of a relationship between the active directional vector and the reactive directional vector with the locations of virtual sound sources
- FIG. 5 shows schematically an example of a virtual spatial configuration with two virtual sound sources, and the active directional vector and the reactive directional vector, and the cardioids of the two corresponding virtual microphones;
- FIG. 6 shows schematically an example of a virtual spatial configuration with three virtual sound sources and the cardioids of the three corresponding virtual microphones, as well as the active directional vector and the reactive directional vector;
- FIG. 7 illustrates a display layout according to an embodiment of the present invention.
- The method is carried out by a directional sound activity analyzing unit, which may be part of a device comprising a processor, typically a computer, further provided with means for acquiring audio signals and means for displaying a visualization of sound activity data, for example a visual display unit such as a screen or a computer monitor.
- the directional sound activity analyzing unit comprises means for executing the described method, such as a processor or any computing device, and a memory for buffering signals or storing various process parameters.
- the directional sound activity analyzing unit receives an input signal constituted by a multichannel audio signal.
- This multichannel audio signal comprises K time-dependent input audio signals associated with K input audio channels, each time-dependent input audio signal being associated with an input channel.
- Each channel is associated with spatial information.
- Spatial information describes the location of the associated loudspeaker relative to the listener's location, called the reference listening point.
- spatial information can be coordinates or angles and distances used to locate a loudspeaker with respect to the reference listening point, generally a listener's recommended location.
- three values per audio channel are provided to describe this localization.
- Spatial parameters constituting said spatial information may then be represented by a K×3 matrix.
- An input receives the multichannel audio signal comprising time-dependent input audio signals for a plurality of input channels (step S 01 ).
- Each time-dependent input audio signal is associated with an input channel.
- Each input channel corresponds to a prescribed position of an electroacoustic transducer with respect to a reference listening point in a prescribed spatial configuration. For example, in the prescribed spatial configuration shown by FIG. 1 , there are five input channels, one for each loudspeaker LS, L, C, R, RS.
- each of the prescribed positions defines a unitary vector a⃗_i representing the sound direction, originating from the reference listening point and pointing towards the corresponding loudspeaker.
- each input channel i is thus associated with a sound direction a⃗_i defined between the reference listening point and the prescribed position of the loudspeaker associated with said input channel i.
- for example, the location of the loudspeaker C is defined by the sound vector a⃗_C that originates from the reference listening point O and points towards the location of the loudspeaker C on the unitary circle. This sound vector a⃗_C extends in front of the listening point.
- similarly, the location of the loudspeaker L is defined by the sound vector a⃗_L that originates from the reference listening point O and points towards the location of the loudspeaker L on the unitary circle. The directions of the sound vectors a⃗_C and a⃗_L are at an angle of 30°.
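For illustration, here is a minimal Python/NumPy sketch that collects these sound directions a⃗_i into such a K×3 matrix for the five-channel layout of FIG. 1. The ±30°/±110° azimuths are the usual ITU five-channel angles, and the axis convention (x towards loudspeaker C, y to the left) as well as the function name are our assumptions, not from the patent:

```python
import numpy as np

# Azimuths (degrees) of the loudspeakers relative to the listener's front
# direction; positive angles to the left (convention assumed).
AZIMUTHS_DEG = {"C": 0.0, "L": 30.0, "R": -30.0, "LS": 110.0, "RS": -110.0}

def channel_directions(azimuths_deg):
    """Return a K x 3 matrix of unit vectors a_i pointing from the
    reference listening point O towards each loudspeaker (z = 0 for a
    horizontal-only layout)."""
    az = np.radians(list(azimuths_deg.values()))
    # x axis towards the front (loudspeaker C), y axis to the left.
    return np.stack([np.cos(az), np.sin(az), np.zeros_like(az)], axis=1)

A_DIRS = channel_directions(AZIMUTHS_DEG)  # shape (5, 3), one row per channel
```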
- the directional sound activity analyzing unit receives these input audio channels, and then determines directional sound activity levels to be displayed for visualizing the directional sound activity of a multichannel audio signal.
- the directional sound activity analyzing unit is configured to perform the steps of the above-described method.
- the method is performed on an extracted part of the input signal corresponding to a temporal window. For example, a 50 ms duration analysis window can be chosen for analyzing the directional sound activity within said window.
- a frequency band analysis aims at estimating the sound activity level for a predetermined number of frequency sub-bands for each channel of the windowed multichannel audio signal.
- the received time-dependent input audio signals a_i(t) may be analog, but they are preferably digital signals. There are as many input audio signals a_i(t) as input channels i.
- the time-dependent input audio signals a_i(t) are converted into the frequency domain by performing a time-frequency conversion (step S02).
- the time-frequency conversion uses a Fourier-related transform such as a Short-time Fourier transform (STFT), which is used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.
- each time-dependent input audio signal a_i(t) is converted into a plurality of time-frequency representations A_i(k, n) for the input channel i associated with said time-dependent input audio signal.
- Each time-frequency representation A_i(k, n) corresponds to a time-frequency tile defined by a time frame and a frequency sub-band. The conversion is made on a frame-by-frame basis.
- preferably, the frame length is between 5 ms and 80 ms.
- preferably, the width of a frequency sub-band is between 10 Hz and 200 Hz.
- preferably, the inter-frame spacing is between 1/16th and one half of the frame length.
- for example, the frame length may be 1024 samples, with a related frequency sub-band width (or bin width) of 46.875 Hz (i.e. at a 48 kHz sampling rate) and an inter-frame spacing of 512 samples.
- the time-frequency tiles are the same for the different input channels i.
- the frequency sub-bands are subdivisions of the frequency band of the audio signal, which can be divided into sub-bands of equal widths or preferably into sub-bands whose widths are dependent on human hearing sensitivity to the frequencies of said sub-bands.
- the time-frequency representation A_i(k, n) refers to a complex number associated with the k-th frequency sub-band and the n-th frame of the signal of the input channel i.
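As a concrete illustration of this conversion step (S02), the sketch below computes the time-frequency representations A_i(k, n) with a Short-time Fourier transform using the example parameters above (frame length 1024 samples, inter-frame spacing 512 samples). The 48 kHz sampling rate is an assumption, chosen because 48000/1024 gives exactly the 46.875 Hz bin width mentioned; function and variable names are illustrative:

```python
import numpy as np
from scipy.signal import stft

FS = 48_000   # sampling rate (assumed); 48000 / 1024 = 46.875 Hz bins
FRAME = 1024  # frame length in samples
HOP = 512     # inter-frame spacing in samples

def time_frequency_tiles(signals):
    """signals: array of shape (K, T), one time-dependent signal a_i(t)
    per input channel. Returns A[i, k, n]: the complex STFT value of
    channel i for frequency sub-band k and time frame n (identical
    tiles for all channels, as required)."""
    tiles = [stft(x, fs=FS, nperseg=FRAME, noverlap=FRAME - HOP)[2]
             for x in signals]
    return np.stack(tiles)  # shape (K, FRAME // 2 + 1, n_frames)
```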
- the time-frequency representations A_i(k, n) and the sound directions a⃗_i are then used in a time-frequency processing (step S03) wherein the data of a time-frequency tile are processed.
- In a first step S11, a spatial analysis is performed from the time-frequency representations A_i(k, n) and the sound directions a⃗_i of a time-frequency tile. For each time-frequency tile, an active directional vector D⃗_a(k, n) and a reactive directional vector D⃗_r(k, n) are determined (step S31) from the time-frequency representations A_i(k, n) of the different input channels for said time-frequency tile.
- the active directional vector D⃗_a(k, n) of a time-frequency tile is proportional to the active acoustical intensity vector, which is representative of the sound energy flow at the reference listening point for the time frame and the frequency sub-band of said time-frequency tile. More specifically, the active directional vector D⃗_a(k, n) corresponds to the active acoustical intensity vector, normalized by the sum of the acoustic energies E_P(k, n) and E_K(k, n) at the reference listening point O, with an added minus sign in order to have it directed from the reference listening point O towards the unitary circle. It is possible to use a different normalization or to omit the minus sign, in which case the vector would point towards the reference listening point O.
- the reactive directional vector D⃗_r(k, n) is proportional to the reactive acoustical intensity vector, which is representative of acoustic perturbations at the reference listening point with respect to the sound energy flow for the same time-frequency tile. More specifically, the reactive directional vector D⃗_r(k, n) corresponds to the reactive acoustical intensity vector, normalized by the sum of the acoustic energies E_P(k, n) and E_K(k, n) at the reference listening point O. A minus sign is also added but could be omitted. As for the active directional vector, it is possible to use a different normalization.
- the active directional vector D⃗_a(k, n) can be related to the primary directional sound field, whereas the reactive directional vector D⃗_r(k, n) is related to the ambient diffuse sound field.
- the directional information of the reactive directional vector D⃗_r(k, n) enables the handling of the spatial characteristics of this ambient sound field, and thus it can be used to describe not only totally diffuse ambient sound fields but also partially diffuse ones.
- the combination of the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n) may be used to identify the locations of sound sources, as depicted in the example of FIG. 4.
- In this example, the sound distribution is represented by two virtual sound sources VS1 and VS2 arranged on a unitary circle centered on the reference listening point O.
- the active directional vector D⃗_a(k, n) originates from the reference listening point O and is directed along the main acoustical flow.
- here, the two uncorrelated sound sources VS1, VS2 are of equal energy (for that time-frequency tile).
- the perceived acoustical energy flow at the reference listening point O therefore comes from the middle of the two sound sources VS1, VS2, and the active directional vector D⃗_a(k, n) extends between the two sound sources VS1, VS2.
- the reactive directional vector D⃗_r(k, n) is here perpendicular to the active directional vector D⃗_a(k, n), and the location of a sound source VS1, VS2 corresponds to the sum of the active directional vector D⃗_a(k, n) and of the reactive directional vector D⃗_r(k, n), or of the opposite of the reactive directional vector D⃗_r(k, n).
- in practice, however, the sound sources VS1, VS2 may be neither totally uncorrelated nor totally correlated. It has been found that, whatever the exact locations of the two sound sources VS1, VS2, the reactive intensity is maximal when the source signals are totally uncorrelated, and conversely minimal when the source signals are totally correlated. In a similar way, where the sound source signals are totally uncorrelated, the reactive intensity is maximal when the source directions are spatially negatively correlated (i.e. opposite) with respect to the reference listening point O, and conversely minimal when the source directions are spatially correlated (i.e. in the same direction) with respect to the reference listening point O.
- as stated above, each input channel i is associated with a sound direction a⃗_i defined between the reference listening point O and the prescribed position of the loudspeaker associated with said input channel i.
- a sound pressure value P(k, n) for a time-frequency tile, defined as the sum of the time-frequency representations A_i(k, n) of the different input channels for said time-frequency tile, is determined:

  P(k, n) = Σ_i A_i(k, n)

- a sound velocity vector V⃗(k, n) for the time-frequency tile is determined, said sound velocity vector V⃗(k, n) being proportional to the sum of each sound direction a⃗_i weighted by the time-frequency representation A_i(k, n) corresponding to the input channel i associated with said sound direction a⃗_i:

  V⃗(k, n) = (1/(ρc)) Σ_i A_i(k, n) a⃗_i = V_x(k, n) e⃗_x + V_y(k, n) e⃗_y + V_z(k, n) e⃗_z

  with e⃗_x, e⃗_y and e⃗_z the unitary vectors of a coordinate system used as a reference frame for the virtual spatial configuration, ρ the density of air and c the speed of sound.
- for example, the speed of sound in dry air at 20° C. is 343.2 m/s, which may be approximated to 340 m/s.
- air density is approximately 1.225 kg/m³, which may be approximated to 1.2 kg/m³.
- Other values may be used.
- a complex intensity vector I⃗(k, n), resulting from the complex product between the conjugate of the sound pressure value P(k, n) for a time-frequency tile and the sound velocity vector V⃗(k, n) for said time-frequency tile, is determined:

  I⃗(k, n) = P*(k, n) V⃗(k, n)
- the complex intensity vector I⃗(k, n) is used to determine the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n) of said time-frequency tile. More precisely, the active directional vector D⃗_a(k, n) is determined from the real part of the complex product I⃗(k, n), and the reactive directional vector D⃗_r(k, n) is determined from the imaginary part of the complex product I⃗(k, n).
- the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n) may be calculated as follows:

  D⃗_a(k, n) = −Re{I⃗(k, n)} / (c (E_P(k, n) + E_K(k, n)))

  D⃗_r(k, n) = −Im{I⃗(k, n)} / (c (E_P(k, n) + E_K(k, n)))

  with E_P(k, n) and E_K(k, n) the potential and kinetic acoustic energies at the reference listening point O.
- the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n) are here normalized by the energies E_K(k, n) and E_P(k, n), but could be calculated otherwise. It shall be noted that a minus sign is added to the expressions of the active directional vector D⃗_a(k, n) and of the reactive directional vector D⃗_r(k, n) in order to have them directed from the reference listening point O towards the unitary circle. It would be possible to omit the minus sign, in which case the vectors would point towards the reference listening point O.
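The spatial analysis (steps S11/S31) can then be sketched as follows, directly implementing the formulas above for P(k, n), V⃗(k, n), I⃗(k, n), D⃗_a(k, n) and D⃗_r(k, n). The exact constants in the energy terms E_P and E_K are our assumption (the conventional complex-amplitude acoustic energy densities); the patent only states that the vectors are normalized by the sum E_P(k, n) + E_K(k, n):

```python
import numpy as np

RHO = 1.2  # air density (kg/m^3), approximated as in the text
C = 340.0  # speed of sound (m/s), approximated as in the text

def directional_vectors(A, dirs):
    """A: complex array (K, n_bands, n_frames) of tiles A_i(k, n).
    dirs: (K, 3) matrix of sound directions a_i.
    Returns (D_a, D_r), each of shape (n_bands, n_frames, 3)."""
    P = A.sum(axis=0)                                  # sound pressure P(k, n)
    V = np.einsum('ikn,id->knd', A, dirs) / (RHO * C)  # sound velocity V(k, n)
    I = np.conj(P)[..., None] * V                      # complex intensity I(k, n)
    # Acoustic energies at O (conventional definitions, assumed).
    E_p = (np.abs(P) ** 2) / (4 * RHO * C ** 2)
    E_k = RHO * (np.abs(V) ** 2).sum(axis=-1) / 4
    norm = C * (E_p + E_k)[..., None] + 1e-12          # avoid division by zero
    D_a = -np.real(I) / norm   # active directional vector
    D_r = -np.imag(I) / norm   # reactive directional vector
    return D_a, D_r
```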
- once the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n) are available, it is possible to perform the audio source extraction (step S12) for determining the positions and time-frequency signal values of virtual sound sources (step S32).
- the method requires determining the attributes (position and time-frequency signal values) of virtual sound sources that will be used thereafter to determine the signals of the electroacoustic transducers of the actual spatial configuration.
- the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n) are used to determine the positions of the virtual sound sources with respect to the reference listening point in a virtual spatial configuration (step S32).
- the determined positions of the virtual sound sources, the active directional vector D⃗_a(k, n), the reactive directional vector D⃗_r(k, n), the sound pressure value P(k, n) and the sound velocity vector V⃗(k, n) are used to determine virtual first-order directional microphone signals (step S122) corresponding to the sounds that would be acquired by virtual microphones arranged at the reference listening point O and directed towards each virtual sound source. There are as many virtual microphones as virtual sound sources.
- a virtual microphone signal is a function of the sum of the sound pressure value P(k, n) and of the scalar product between the sound velocity vector V⃗(k, n) and a unitary vector in the direction of a sound source, possibly weighted by the density of air ρ and the speed of sound c.
- for example, a virtual cardioid microphone signal M_j(k, n) associated with a virtual sound source arranged in the direction defined by s⃗_j(k, n) can be calculated as follows:

  M_j(k, n) = (P(k, n) + ρc V⃗(k, n)·s⃗_j(k, n)) / 2
- a virtual microphone signal highlights the sound of the corresponding virtual sound source perceived at the reference listening point O, but also contains interference from the other virtual sound sources.
- defining the virtual microphone signals for every virtual sound source makes it possible to identify the virtual sound source signal of each virtual sound source.
- if desired, spatial manipulation may be performed by modifying the positions of the virtual sound sources. This approach is much safer than modifying the input channel data defining the prescribed positions, because the original primary/ambient energy ratio is preserved.
- the details of the source extraction process however change depending on the number of virtual sound sources.
- the audio source extraction process estimates the locations and frequency signal values of virtual sound sources that generate the same sound field characteristics as the sound field defined by the time-dependent input audio signals in the prescribed configuration.
- Source-related sound field models need to be defined, as the audio source extraction process may be highly different from one model to another.
- Two reliable models, whose analysis is based on the exploitation of both the active and reactive components of the acoustical intensity, are described below: a model with two virtual sound sources and a model with three virtual sound sources.
- the “two-source” model handles the diffuseness (and thus makes use of the reactive component) as an indicator of the perceptual width of a sound source or local diffuseness. Two sound sources are sufficient to simulate a wider sound source, their spatial and signal correlation defining the perceived wideness of this composite sound source.
- the “three-source” model handles the diffuseness (and thus makes use of the reactive component) as an indicator of the ambience level within the sound scene or global diffuseness. Two uncorrelated sound sources of opposite directions are suitable to simulate this ambient component, in addition to a first virtual sound source corresponding to the primary component. It is explained below how to proceed with two virtual sound sources or three virtual sound sources.
- In a spatial configuration with a unitary circle centered on the reference listening point O, the virtual sound sources are positioned on the unitary circle. A position of a virtual sound source is therefore at the intersection of the unitary circle with a directional line extending from the reference listening point.
- the position of each virtual sound source can be defined by a unitary source direction vector s⃗_j(k, n) originating from the reference listening point. This is shown in FIG. 5.
- for the two-source model, the first step of the source extraction consists in determining the positions of the two virtual sound sources (step S121).
- each unitary source direction vector s⃗_j(k, n) is defined through the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n). More precisely, a virtual sound source is located at each of the two intersections of the unitary circle with the line passing through the tip of the active directional vector D⃗_a(k, n) and directed along the reactive directional vector D⃗_r(k, n).
- where the analyzed sound field is generated by two uncorrelated sound sources (not necessarily of equal energy), this technique makes it possible to retrieve the exact locations of those two sound sources. If the two sound sources used to generate the sound field tend to be in-phase (respectively opposite-phase), their exact locations can no longer be retrieved: the technique over-estimates (respectively under-estimates) the spatial correlation between the two sound source directions. However, this relationship between signal correlation and spatial correlation is perceptively coherent.
- Determining the locations of the two virtual sound sources VS1, VS2 is equivalent to solving the geometry problem of the intersection of a line with a circle (or a sphere for a three-dimensional sound field). Solving this problem amounts to solving a second-order equation: writing the source direction vectors as s⃗_{1,2}(k, n) = D⃗_a(k, n) + t_{1,2} D⃗_r(k, n), the values t_1 and t_2 are the two real roots of

  ‖D⃗_r(k, n)‖² t² + 2 (D⃗_a(k, n)·D⃗_r(k, n)) t + ‖D⃗_a(k, n)‖² − 1 = 0.
- the two virtual directional microphones may have cardioid directivity patterns VM1, VM2 oriented in the directions of the source direction vectors s⃗_1(k, n), s⃗_2(k, n). The virtual microphone pick-ups in these two directions may then be estimated by virtual microphone signals M_1(k, n), M_2(k, n) defined as follows:

  M_1(k, n) = (P(k, n) + ρc V⃗(k, n)·s⃗_1(k, n)) / 2

  M_2(k, n) = (P(k, n) + ρc V⃗(k, n)·s⃗_2(k, n)) / 2
- each virtual microphone signal highlights the sound signal of the corresponding virtual sound source VS1, VS2 as perceived at the reference listening point O, but also contains interference from the other virtual sound source.
- a last processing step extracts the time-frequency signal values S_1(k, n), S_2(k, n) of each virtual sound source by unmixing the source signals from the virtual microphone signals (step S123).
- at this stage, the positions of the two virtual sound sources VS1, VS2, defined by the source direction vectors s⃗_1(k, n) and s⃗_2(k, n), and their respective time-frequency signal values S_1(k, n) and S_2(k, n) have been determined.
- the two virtual sound sources VS1, VS2 are equivalent, in the sense that they both contain a primary component (through the active directional vector D⃗_a(k, n)) and an ambient component (through the reactive directional vector D⃗_r(k, n)).
- An ambience extraction processing may be performed for implementing additional refinement.
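A sketch of the two-source extraction for a single time-frequency tile follows. The line-circle intersection implements the second-order equation above; the unmixing step assumes the cardioid mixing model implied by the virtual microphone formula (source i is picked up by microphone j with gain (1 + s⃗_j·s⃗_i)/2), which is our reading of the model rather than the patent's exact unmixing equations:

```python
import numpy as np

def two_source_extraction(D_a, D_r, P, V, rho_c=1.2 * 340.0):
    """D_a, D_r: real 3-vectors for one tile; P: complex pressure;
    V: complex 3-vector velocity. Assumes D_r is non-zero and the two
    sources are distinct. Returns directions s1, s2 and signals S1, S2."""
    # Solve ||D_a + t * D_r|| = 1 for t (second-order equation).
    a = D_r @ D_r
    b = 2.0 * (D_a @ D_r)
    c0 = D_a @ D_a - 1.0
    disc = max(b * b - 4 * a * c0, 0.0)  # clamp numerical noise
    t1 = (-b + np.sqrt(disc)) / (2 * a)
    t2 = (-b - np.sqrt(disc)) / (2 * a)
    s1, s2 = D_a + t1 * D_r, D_a + t2 * D_r  # on the unitary circle
    # Virtual cardioid microphone signals M_j = (P + rho*c*V.s_j) / 2.
    M1 = (P + rho_c * (V @ s1)) / 2
    M2 = (P + rho_c * (V @ s2)) / 2
    # Unmix under the cardioid model: M1 = S1 + mu*S2, M2 = mu*S1 + S2,
    # with mu = (1 + s1.s2) / 2 (assumed mixing model).
    mu = (1.0 + s1 @ s2) / 2
    det = 1.0 - mu * mu
    S1 = (M1 - mu * M2) / det
    S2 = (M2 - mu * M1) / det
    return s1, s2, S1, S2
```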
- for the three-source model, the first step of the audio source extraction consists in determining the positions of the three virtual sound sources, through unitary source direction vectors s⃗_j(k, n) defined by the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n).
- as before, the virtual sound sources are positioned on the unitary circle.
- a position of a virtual sound source is therefore at the intersection of the unitary circle with a directional line extending from the reference listening point.
- each virtual sound source can be defined by a unitary source direction vector s⃗_j(k, n) originating from the reference listening point.
- the unitary source direction vector s⃗_j(k, n) is defined through the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n). This is shown in FIG. 6.
- for this model, the active directional vector D⃗_a(k, n) indicates the main perceptual sound event direction, whereas the reactive intensity indicates the "direction of maximum perceptual diffuseness".
- determining the positions of the virtual sound sources VS1, VS2, VS3 is much simpler for the three-source model than for the two-source model, since their source direction vectors s⃗_j(k, n) are directly computed from the active directional vector D⃗_a(k, n) and the reactive directional vector D⃗_r(k, n):

  s⃗_1(k, n) = D⃗_a(k, n) / ‖D⃗_a(k, n)‖

  s⃗_2(k, n) = D⃗_r(k, n) / ‖D⃗_r(k, n)‖

  s⃗_3(k, n) = −D⃗_r(k, n) / ‖D⃗_r(k, n)‖

- these source direction vectors localize the virtual sound sources VS1, VS2, VS3 on the unitary circle centered on the reference listening point O.
- the three virtual directional microphones may have cardioid directivity patterns VM1, VM2, VM3 oriented in the directions of the source direction vectors s⃗_1(k, n), s⃗_2(k, n), s⃗_3(k, n). The virtual microphone pick-ups in these three directions may then be estimated by virtual microphone signals defined as follows:

  M_j(k, n) = (P(k, n) + ρc V⃗(k, n)·s⃗_j(k, n)) / 2, for j = 1, 2, 3
- as previously, each virtual microphone signal M_1(k, n), M_2(k, n), M_3(k, n) highlights the sound of the corresponding virtual sound source VS1, VS2, VS3 as perceived at the reference listening point O, but also contains interference from the other virtual sound sources. More precisely, since the second source direction vector s⃗_2(k, n) and the third source direction vector s⃗_3(k, n) are of opposite direction, the interference between the second virtual sound source VS2 and the third virtual sound source VS3 is negligible, whereas they both interfere with the first virtual sound source VS1.
- a last processing step extracts the time-frequency signal value of each virtual sound source by unmixing the source signals from the virtual microphone signals, as sketched below.
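The corresponding sketch for the three-source model is simpler, since the source directions follow directly from D⃗_a(k, n) and D⃗_r(k, n). The unmixing again assumes the cardioid mixing model (here a 3×3 linear system); as noted above, this is an interpretation, not the patent's exact equations:

```python
import numpy as np

def three_source_extraction(D_a, D_r, P, V, rho_c=1.2 * 340.0):
    """Returns unit directions s1, s2, s3 and signals S1, S2, S3 for one tile."""
    s1 = D_a / (np.linalg.norm(D_a) + 1e-12)  # main perceptual direction
    s2 = D_r / (np.linalg.norm(D_r) + 1e-12)  # direction of max diffuseness
    s3 = -s2                                  # opposite ambient direction
    S_dirs = np.stack([s1, s2, s3])
    # Virtual cardioid microphone signals M_j = (P + rho*c*V.s_j) / 2.
    M = (P + rho_c * (S_dirs @ V)) / 2        # shape (3,)
    # Cardioid mixing matrix: source i reaches microphone j with gain
    # (1 + s_j.s_i) / 2; note the VS2/VS3 cross-gain is exactly 0.
    G = (1.0 + S_dirs @ S_dirs.T) / 2
    S1, S2, S3 = np.linalg.solve(G, M)        # unmix the three sources
    return s1, s2, s3, S1, S2, S3
```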
- This directional sound activity vector represents the predominant sound direction that would be perceived by a listener according to the recommended loudspeaker layout for sounds within the particular frequency sub-band of the time-frequency tile.
- the attributes of the directional sound activity vector are calculated from the positions and time-frequency signal values of the virtual sound sources.
- the energy vectors relative to the sound sources of a time-frequency tile are the source directions weighted by the source energies:

  E⃗_j(k, n) = |S_j(k, n)|² s⃗_j(k, n)
- for the three-source model, the first virtual sound source VS1 is more related to the main perceptual sound event direction, and the two other virtual sound sources VS2, VS3 are more related to the direction of maximum perceptual diffuseness. It may then be relevant to take only the first virtual sound source VS1 into account for the directional sound activity vector. More generally, a weighting of the different virtual sound sources VS1, VS2, VS3 may be used, with a source-weighted directional sound activity vector expressed as:

  G⃗(k, n) = Σ_j α_j E⃗_j(k, n) = Σ_j α_j |S_j(k, n)|² s⃗_j(k, n)

  with weighting factors α_j between 0 and 1.
- in an embodiment, the sum of the weighting factors α_j is 1.
- in an embodiment, none of the weighting factors α_j is 0.
- in an embodiment, the weighting factors α_2 and α_3 are equal.
- in another embodiment, the two weighting factors α_1 and α_2 are equal.
- An optional, yet advantageous, frequency masking step can adapt the directional sound activity vectors according to their respective frequency sub-bands.
- the norms of the directional sound activity vectors can be weighted based on their respective frequency sub-bands. The weighted directional sound activity vector is then:

  G⃗_w[k] = a[k] G⃗[k]

  where a[k] is a weight, for instance between 0 and 1, which depends on the frequency sub-band of each directional sound activity vector.
- such a weighting makes it possible to enhance particular frequency sub-bands of particular interest to the user.
- This feature can be used for discriminating sounds based on their frequencies. For instance, frequencies related to particularly interesting sounds can be enhanced in order to distinguish them from ambient noise.
- the directional sound analyzing unit can be fed with spectral sensitivity parameters which define the weight attributed to each frequency sub-band.
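A small sketch of this frequency masking step follows; the weight profile in the usage comment is a made-up example of spectral sensitivity parameters, not values from the patent:

```python
import numpy as np

def apply_frequency_masking(G, weights):
    """G: (n_bands, 3) directional sound activity vectors G[k] for one frame.
    weights: (n_bands,) spectral sensitivity parameters a[k] in [0, 1].
    Returns the weighted vectors a[k] * G[k]."""
    return weights[:, None] * G

# Usage example (illustrative): emphasize bands around 300 Hz - 3 kHz,
# e.g. to make speech or footsteps stand out from ambient noise.
# freqs = np.arange(n_bands) * 46.875
# weights = np.where((freqs > 300) & (freqs < 3000), 1.0, 0.2)
```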
- FIG. 7 shows an example of such a divided space relative to a 5.1 loudspeaker layout.
- a polar representation of the listener's environment is divided into M similar sub-divisions 6 circularly disposed around the reference listening point in a central position representing the listener's location. Loudspeakers of the recommended layout of FIG. 1 are represented for comparison.
- the dominant sound direction and the sound activity level associated with said direction are now determined and described by the directional sound activity vector, preferably weighted as described above.
- the visualization of such directional information must be very intuitive, so that sound direction information can be conveyed to the user without interfering with other sources of information.
- the beam clustering stage (S14) corresponds to allocating to each of the sub-divisions a part of each frequency sub-band sound activity.
- the contributions of each frequency sub-band sound activity to each sub-division of space are determined on the basis of directivity information.
- a directional sound activity level is determined within said sub-division of space by combining, for instance by summing, the contributions of said frequency sub-band sound activity to said sub-division of space.
- Directivity information is associated with each sub-division 6.
- Such directivity information relates to level modulation as a function of direction in an oriented coordinate system, typically centered on a listener's position.
- This directivity information can be described by a directivity function which associates a weight to space directions in an oriented coordinate system.
- a directivity function exhibits a maximum for a direction associated with the related sub-division.
- for each sub-division 6 of space, the norms of the directional sound activity vectors are weighted on the basis of the directivity information associated with said sub-division 6 of space and the directions of said directional sound activity vectors. These weighted norms can thus represent the contributions of said directional sound activity vectors within said sub-divisions of space.
- a directivity function can, for example, be parameterized by a beam vector v⃗_m and an angular value θ_m corresponding to the angular width of the beam, wherein m identifies a space sub-division.
- the direction associated with a sub-division 6 can be the main direction defined by the beam vector v⃗_m. Accordingly, the angular distance between a beam vector v⃗_m and a directional sound activity vector G⃗[k] can define the clustering weight C_m[k].
- a simple directional weighting function may be 1 if the angular distance between a beam vector v⃗_m and a directional sound activity vector G⃗[k] is less than θ_m/2, and 0 otherwise:

  C_m[k] = 1 if angle(v⃗_m, G⃗[k]) < θ_m/2, and C_m[k] = 0 otherwise

- the beam vector v⃗_m and the angular value θ_m used to define the parameters of the directivity function constitute an example of directivity information by which the contribution of each one of said directional sound activity vectors within the sub-divisions of space can be estimated.
- the directional sound activity within a beam or sub-division of space can then be determined by summing said contributions, such as the weighted norms in this example, of said directional sound activity vectors over the L frequency sub-bands:

  E_m = Σ_{k=1}^{L} C_m[k] ‖G⃗[k]‖
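The beam clustering stage (S14) with the simple 0/1 directivity function above can be sketched as follows; a common angular width for all beams and the function name are simplifying assumptions:

```python
import numpy as np

def beam_activities(G, beam_dirs, beam_width):
    """G: (L, 3) directional sound activity vectors G[k] for the L sub-bands.
    beam_dirs: (M, 3) unit beam vectors v_m. beam_width: angular width
    theta_m in radians (taken identical for all beams, for simplicity).
    Returns the M directional sound activity levels."""
    norms = np.linalg.norm(G, axis=1)                  # ||G[k]||
    units = G / (norms[:, None] + 1e-12)
    cos_ang = np.clip(units @ beam_dirs.T, -1.0, 1.0)  # (L, M)
    C = np.arccos(cos_ang) < (beam_width / 2)          # clustering weights C_m[k]
    return (C * norms[:, None]).sum(axis=0)            # sum over the L sub-bands
```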
- the directional sound activity for each of the M beams can be fed to a visualizing unit, typically a screen associated with the computer which comprises or constitutes the directional sound analyzing unit.
- directional sound activity can then be displayed for visualization (step S 04 ).
- a graphical representation of directional sound activity level within said sub-division of space is displayed, as in FIG. 7 .
- sub-divisions of space are organized according to their respective location within said space, so as to reconstruct the divided space.
- FIG. 7 shows a configuration wherein the directional sound activity is restricted to two different beams, suggesting that sound sources related to different frequencies are located in the directions related to these two beams. It shall be noted that at least one beam 16a shows a directional sound activity without having a direction that corresponds to a recommended loudspeaker orientation. As can be seen, a user can easily and accurately infer sound source directions, and thus can retrieve the sound direction information originally conveyed by the multichannel audio input signal.
- Other graphical representations can be used, such as a radar chart wherein directional sound activity levels are represented on axes starting from the center, lines or curves being drawn between the directional sound activity levels of adjacent axes.
- the lines or curves define a colored geometrical shape containing the center.
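Finally, for the display step (S04), a radar-style rendering of the M directional sound activity levels can be sketched with matplotlib; the patent does not prescribe any particular plotting toolkit, so everything here is illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_directional_activity(levels, ax=None):
    """levels: (M,) directional sound activity levels, one per sub-division,
    ordered by azimuth so that the divided space is reconstructed on screen."""
    M = len(levels)
    theta = np.linspace(0, 2 * np.pi, M, endpoint=False)
    ax = ax or plt.subplot(projection="polar")
    ax.bar(theta, levels, width=2 * np.pi / M, bottom=0.0, alpha=0.7)
    ax.set_theta_zero_location("N")  # front direction (loudspeaker C) at top
    plt.show()
```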
- the invention thus allows sound direction information to be delivered to the user even if said user does not possess the recommended loudspeaker layout, for example with headphones. It can also be very helpful for hearing-impaired people or for users who must identify sound directions quickly and accurately.
- the graphical representation shows several directional sound activity levels for each sub-division, these directional sound activity levels being calculated with different frequency masking parameters.
- At least two sets of spectral sensitivity parameters are chosen to parameterize two frequency masking processes respectively used in two directional sound activity level determination processes.
- the two sets of directional sound activity vectors determined from the same input audio channels are weighted based on their respective frequency sub-bands in accordance with two different sets of weighting parameters. Consequently, for each sub-division, each one of the two directional sound activity levels enhances some particular frequencies in order to distinguish different sound types.
- the two directional sound activities can then be displayed simultaneously within the same sub-divided space, for example with a color code for distinguishing them and a superimposition, for instance based on level differences.
- the method of the present invention as described above can be realized as a program and stored on a non-transitory tangible computer-readable medium, such as a CD-ROM, a ROM or a hard disk, having computer-executable instructions embodied thereon that, when executed by a computer, perform the method according to the invention.
Abstract
A method for visualizing a directional sound activity of a multichannel audio signal, wherein the multichannel audio signal comprises time-dependent input audio signals, comprising: determining a directional sound activity vector from virtual sound sources determined from an active directional vector and a reactive directional vector determined from time-frequency representations of different input audio signals for each one of a plurality of time-frequency tiles; determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-division of space, and a directional sound activity level within said sub-division of space by summing said contributions; and displaying a visualization of the directional sound activity of the multichannel audio signal by a graphical representation of the directional sound activity level within said sub-division of space.
Description
- This application claims priority under 35 U.S.C. § 119(a) to European Patent Application No. 16306190.6 filed on Sep. 19, 2016, which is hereby expressly incorporated by reference into the present application.
- The invention relates to a method and apparatus for visualizing the directional sound activity of a multichannel audio signal.
- Audio is an important medium for conveying any kind of information, especially sound direction information. Indeed, the human auditory system is more effective than the visual system for surveillance tasks. Thanks to the development of multichannel audio formats, spatialization has become a common feature in all domains of audio: movies, video games, virtual reality, music, etc. For instance, when playing a First Person Shooter (FPS) game using a multichannel sound system (5.1 or 7.1 surround sound), it is possible to localize enemies thanks to their sounds.
- Typically, such sounds are mixed onto multiple audio channels, wherein each channel is fed to a dedicated loudspeaker. Distribution of a sound to the different channels is adapted to the configuration of the dedicated playback system (positions of the loudspeakers), so as to reproduce the intended directionality of said sound.
- Multichannel audio streams thus need to be played back over suitable loudspeaker layouts. For instance, each of the channels of a five-channel formatted audio signal is associated with its corresponding loudspeaker within a five-loudspeaker array.
- FIG. 1 shows an example of a five-channel loudspeaker layout recommended by the International Telecommunication Union (ITU), with a left loudspeaker L, right loudspeaker R, center loudspeaker C, surround left loudspeaker LS and surround right loudspeaker RS, arranged around a reference listening point O, which is the recommended listener's position. With this reference listening point O as a center, the relative angular distances between the central directions of the loudspeakers are indicated.
- A multichannel audio signal is thus encoded according to an audio file format dedicated to a prescribed spatial configuration where loudspeakers are arranged at prescribed positions with respect to a reference listening point. Indeed, each time-dependent input audio signal of the multichannel audio signal is associated with a channel, each channel corresponding to a prescribed position of a loudspeaker.
- If multichannel audio is played back over an appropriate sound system, i.e. with the required number of loudspeakers and correct angular distances between them, a normal hearing listener is able to detect the location of the sound sources that compose the multichannel audio mix. However, should the sound system exhibit inappropriate features, such as too few loudspeakers, or an inaccurate angular distance thereof, the directional information of the audio content may not be delivered properly to the listener. This is especially the case when sound is played back over headphones.
- As a consequence, there is in this case a loss of information since the multichannel audio signal conveys sound direction information through the respective sound levels of the channels, but such information cannot be delivered to the user. Accordingly, there is a need for conveying to the user the sound direction information encoded in the multichannel audio signal.
- Some methods have been provided for conveying directional information related to sound through the visual modality. However, these methods were often a mere juxtaposition of volume meters, each dedicated to a particular loudspeaker, and thus unable to render precisely the simultaneous predominant direction of the sounds that compose the multichannel audio mix except in the case of one unique virtual sound source whose direction coincides with a loudspeaker direction. Other methods intended to more precisely display sound locations are so complicated that they reveal themselves inadequate since sound directions cannot be readily derived by a user.
- For example, U.S. patent application US 2009/0182564 describes a method wherein sound power level of each channel is displayed, or alternatively wherein position and power level of elementary sound components are displayed.
- U.S. Pat. No. 9,232,337 B2 describes a method for visualizing a directional sound activity of a multichannel audio signal that displays a visualization of the directional sound activity of the multichannel audio signal through a graphical representation of a directional sound activity level within a sub-division of space. For a channel and for a frequency sub-band, a sound activity vector is formed by associating the sound activity level corresponding to the frequency-domain signal of said channel and said sub-band with the unit vector corresponding to the spatial information associated with said channel. In an embodiment of this patent, the energy vector sum representative of the perceived directional energy is directly calculated using Gerzon's energy vectors, as a mere summation of the sound activity vectors related to the channels for said frequency sub-band. This directional sound activity vector represents the predominant sound direction that would be perceived by a listener according to the recommended loudspeaker layout for sounds within that particular frequency sub-band.
- However, while this method visually renders the main sound direction, it may not always achieve optimal results for a user. Indeed, this method does not exploit diffuse sounds, but focuses on identifying and displaying the main sound directions, regardless of the nature of the sound (directive or diffuse). As a result, when the sound is very diffuse, it may not be possible to correctly extract a useful main sound direction from the noisy environment.
- The method and system according to the invention are intended to provide a simple and clear visualization of sound activity in any direction.
- In accordance with a first aspect of the present invention, this object is achieved by a method for visualizing a directional sound activity of a multichannel audio signal, wherein the multichannel audio signal comprises time-dependent input audio signals, each time-dependent input audio signal being associated with an input channel, spatial information with respect to a reference listening point being associated with each one of said channels, the method comprising:
-
- receiving the time-dependent input audio signals;
- performing a time-frequency conversion of said time-dependent input audio signals for converting each one of the time-dependent input audio signals into a plurality of time-frequency representations for the input channel associated with said time-dependent input audio signal, each time-frequency representation corresponding to a time-frequency tile defined by a time frame and a frequency sub-band, the time-frequency tiles being the same for the different input channels;
- for each time-frequency tile, determining positions of at least two virtual sound sources with respect to the reference listening point and frequency signal values for each virtual sound source from an active directional vector and a reactive directional vector determined from time-frequency representations of different input audio signals for said time-frequency tile, wherein the active directional vector is determined from a real part of a complex intensity vector and the reactive directional vector is determined from an imaginary part of the complex intensity vector;
- for each time-frequency tile, determining a directional sound activity vector from the virtual sound sources,
- determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-divisions of space;
- for each sub-division of space, determining directional sound activity level within said sub-division of space by summing said contributions within said sub-division of space;
- displaying a visualization of the directional sound activity of the multichannel audio signal by a graphical representation of directional sound activity level within said sub-division of space.
- Other preferred, although non-limitative, aspects of the invention are as follows, isolated or in any technically feasible combination:
-
- the active directional vector of a time-frequency tile is representative of the sound energy flow at the reference listening point for the time frame and a frequency sub-band of said time-frequency tile, and wherein the reactive directional vector is representative of acoustic perturbations at the reference listening point with respect to the sound energy flow;
- each input channel is associated with a sound direction defined between the reference listening point and the prescribed position of the speaker associated with said input channel, and a sound velocity vector is determined as a function of a sum of each sound direction weighted by the time-frequency representation corresponding to the input channel associated with said sound direction, said sound velocity vector being used to determine the active directional vector and the reactive directional vector;
-
- a sound pressure value defined by a sum of the time-frequency representations of the different input channels is used to determine the active directional vector and the reactive directional vector;
- the complex intensity vector results from a complex product between a conjugate of a sound pressure value for a time-frequency tile and a sound velocity vector for said time-frequency tile;
- for determining time-frequency signal values of each one of the virtual sound sources, virtual microphone signals are determined, each virtual microphone signal being associated with a virtual sound source and corresponding to the signal that would be acquired by a virtual microphone arranged at the reference listening point and oriented in the direction toward the position of said virtual sound source;
- the time-frequency signal value of a virtual sound source is determined by suppressing, in the virtual microphone signal associated with said virtual sound source, the interferences from other virtual sound sources;
- the virtual sound sources are arranged on a circle centered on the reference listening point and a virtual microphone signal corresponds to the signal that would be acquired by a virtual cardioid microphone having a directivity pattern in the shape of a cardioid tangential to the circle centered on the reference listening point;
- there are three virtual sound sources for each time-frequency tile, each virtual sound source having a position with respect to the reference listening point, wherein:
- a position of a first virtual sound source defines with the reference listening point a direction which is collinear to the direction of the active directional vector from the reference listening point,
- a position of a second virtual sound source defines with the reference listening point a direction which is collinear to the direction of the reactive directional vector with a first orientation,
- a position of a third virtual sound source defines with the reference listening point a direction which is collinear to the direction of the reactive directional vector with a second orientation opposite to the first orientation;
- there are two virtual sound sources for each time-frequency tile, each virtual sound source having a position with respect to the reference listening point, and wherein:
- a position of a first virtual sound source defines with the reference listening point a direction resulting from the sum of the active directional vector and the reactive directional vector weighted by a positive factor, and
- a position of a second virtual sound source defines with the reference listening point a direction resulting from the sum of the active directional vector and the reactive directional vector weighted by a negative factor;
- information used for determining the contribution of a directional sound activity vector within a sub-division of space is an angular distance between a direction associated with said sub-division of space and the direction of said directional sound activity vector;
- the contribution of a directional sound activity vector within a sub-division of space is determined by weighting a norm of said directional sound activity vector on the basis of an angular distance between a direction associated with said sub-division of space and the direction of said directional sound activity vector;
- norms of the directional sound activity vectors are further weighted based on their respective frequency sub-bands;
- at least two sets of directional sound activity vectors determined from the same input audio channels are weighted based on their respective frequency sub-bands in accordance with two different sets of weighting parameters, and the two resulting directional sound activities are displayed on the graphical representation;
- the visualization of the directional sound activity of the multichannel audio signal comprises representations of said sub-division of space, each provided with a representation of the directional sound activity associated with said sub-division.
- The invention also relates to a non-transitory tangible computer-readable medium having computer executable instructions embodied thereon that, when executed by a computer, perform the method according to the invention.
- The invention also relates to an apparatus for visualizing directional sound activity of a multichannel audio signal, comprising:
-
- an input for receiving time-dependent input audio signals for a plurality of input channels,
- a processor and a memory for:
- performing a time-frequency conversion of said time-dependent input audio signals for converting each one of the time-dependent input audio signals into a plurality of time-frequency representations for the input channel associated with said time-dependent input audio signal, each time-frequency representation corresponding to a time-frequency tile defined by a time frame and a frequency sub-band, the time-frequency tiles being the same for the different input channels,
- for each time-frequency tile, determining an active directional vector and a reactive directional vector from time-frequency representations of different input channels for said time-frequency tile, wherein the active directional vector is determined from a real part of a complex intensity vector and the reactive directional vector is determined from an imaginary part of the complex intensity vector,
- for each time-frequency tile, determining positions of virtual sound sources with respect to the reference listening point in a virtual spatial configuration from the active directional vector and the reactive directional vector, and determining time-frequency signal values for each virtual sound source,
- for each time-frequency tile, determining a directional sound activity vector from the virtual sound sources,
- determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-divisions of space,
- for each sub-division of space, determining directional sound activity data within said sub-division of space by summing said contributions within said sub-division of space,
- a visualizing unit for displaying a visualization of the directional sound activity of the multichannel audio signal.
- Other aspects, objects and advantages of the present invention will become better apparent upon reading the following detailed description of preferred embodiments thereof, given as a non-limiting example, and made with reference to the appended drawings wherein:
-
FIG. 1 , already discussed, shows an example of prescribed positions of loudspeakers with respect to a reference listening point in a prescribed spatial configuration for multichannel audio system; -
FIG. 2 is a diagram showing steps of the method; -
FIG. 3 is a diagram showing stages of the signal processing in the method; -
FIG. 4 shows schematically an example of a relationship between the active directional vector and the reactive directional vector with the locations of virtual sound sources; -
FIG. 5 shows schematically an example of a virtual spatial configuration with two virtual sound sources, and the active directional vector and the reactive directional vector, and the cardioids of the two corresponding virtual microphones; -
FIG. 6 shows schematically an example of a virtual spatial configuration with three virtual sound sources and the cardioids of the three corresponding virtual microphones, as well as the active directional vector and the reactive directional vector; -
FIG. 7 illustrates a display layout according to an embodiment of the present invention. - The method is carried out by a directional sound activity analyzing unit, which may be part of a device comprising a processor, typically a computer, further provided with means for acquiring audio signals and means for displaying a visualization of sound activity data, for example a visual display unit such as a screen or a computer monitor. The directional sound activity analyzing unit comprises means for executing the described method, such as a processor or any computing device, and a memory for buffering signals or storing various process parameters.
- The directional sound activity analyzing unit receives an input signal constituted by a multichannel audio signal. This multichannel audio signal comprises K time-dependent input audio signals associated with K input audio channels, each time-dependent input audio signal being associated with an input channel. Each channel is associated with spatial information. Spatial information describes the location of the associated loudspeaker relative to the listener's location, called the reference listening point. For example, spatial information can be coordinates or angles and distances used to locate a loudspeaker with respect to the reference listening point, generally a listener's recommended location. Typically, three values per audio channel are provided to describe this localization. Spatial parameters constituting said spatial information may then be represented by a K×3 matrix.
- An input receives the multichannel audio signal comprising time-dependent input audio signals for a plurality of input channels (step S01). Each time-dependent input audio signal is associated with an input channel. Each input channel corresponds to a prescribed position of an electroacoustic transducer with respect to a reference listening point in a prescribed spatial configuration. For example, in the prescribed spatial configuration shown by
FIG. 1 , there are five input channels, one for each loudspeaker LS, L, C, R, RS. - Under the plane-wave model assumption, the position of a sound source (e.g. the location of each loudspeaker) may be defined solely by the direction of the sound source with respect to the reference listening point. A unitary vector is then sufficient to locate a sound source. Accordingly, each of the prescribed positions defines a unitary vector {right arrow over (ai)} representing the sound direction, originating from the reference listening point and pointing in the direction of each loudspeaker. As a result, each input channel i is associated with a sound direction {right arrow over (ai)} defined between the reference listening point and the prescribed position of the loudspeaker associated with said input channel i. For example, in the prescribed spatial configuration shown in
FIG. 1 , the location of the loudspeaker C is defined by the sound vector {right arrow over (aC)} that originates from the reference listening point O and towards the location of the loudspeaker C on the unitary circle. This sound vector {right arrow over (aC)} extends in the front of the listening point. In a similar way, the location of the loudspeaker L is defined by the sound vector {right arrow over (aL)} that originates from the reference listening point O and towards the location of the loudspeaker L on the unitary circle. In this example, the directions of the sound vector {right arrow over (aC)} and of the sound vector {right arrow over (aL)} are at an angle of 30°. - The directional sound activity analyzing unit receives these input audio channels, and then determines directional sound activity levels to be displayed for visualizing the directional sound activity of a multichannel audio signal. The directional sound activity analyzing unit is configured to perform the steps of the above-described method. The method is performed on an extracted part of the input signal corresponding to a temporal window. For example, a 50 ms duration analysis window can be chosen for analyzing the directional sound activity within said window.
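- By way of illustration only (this sketch is not part of the patent disclosure), the sound direction vectors {right arrow over (ai)} of a 5.1 layout may be tabulated as follows; the azimuth values other than the 30° between C and L are assumptions borrowed from common 5.1 practice:

```python
import numpy as np

# Assumed 5.1 azimuths in degrees (counter-clockwise from the front axis);
# the document itself only fixes the 30 degree angle between C and L.
CHANNEL_AZIMUTHS = {"C": 0.0, "L": 30.0, "R": -30.0, "LS": 110.0, "RS": -110.0}

def channel_direction_vectors(azimuths=CHANNEL_AZIMUTHS):
    """Unit vectors a_i originating from the reference listening point O
    and pointing towards each loudspeaker on the unitary circle."""
    return {name: np.array([np.cos(np.radians(az)), np.sin(np.radians(az))])
            for name, az in azimuths.items()}
```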
- Frequency Analysis
- First, a frequency band analysis aims at estimating the sound activity level for a predetermined number of frequency sub-bands for each channel of the windowed multichannel audio signal.
- The received time-dependent input audio signals ai(t) may be analog, but they preferably are digital signals. There are as many input audio signals ai(t) as input channels i. During the frequency analysis (step S10), the time-dependent input audio signals ai(t) are converted into the frequency domain by performing a time-frequency conversion (step S02). Typically, the time-frequency conversion uses a Fourier-related transform such as a Short-time Fourier transform (STFT), which is used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.
- More precisely, each time-dependent input audio signal ai(t) is converted into a plurality of time-frequency representations Ai(k, n) for the input channel i associated with said time-dependent input audio signal. Each time-frequency representation Ai(k, n) corresponds to a time-frequency tile defined by a time frame and a frequency sub-band. The conversion is made on a frame-by-frame basis.
- Preferably, the frame length is between 5 ms and 80 ms. Preferably, the width of the frequency sub-band is between 10 Hz and 200 Hz. Preferably, the inter-frame spacing is between 1/16th and one half of the frame length. For instance, for a sampling rate of 48 kHz and an FFT-based STFT processing framework, the frame length may be 1024 samples, with a related frequency sub-band width (or bin width) of 46.875 Hz and an inter-frame spacing of 512 samples. The time-frequency tiles are the same for the different input channels i.
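- As a hedged illustration of this framing (assuming a Hann window and the 1024/512 figures quoted above; the helper name is hypothetical):

```python
import numpy as np

def stft_tiles(x, frame_len=1024, hop=512):
    """Convert one time-dependent input signal a_i(t) into its
    time-frequency representations A_i(k, n): one complex value per
    frequency sub-band k and frame n (46.875 Hz bins at 48 kHz)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    A = np.empty((frame_len // 2 + 1, n_frames), dtype=complex)
    for n in range(n_frames):
        frame = x[n * hop : n * hop + frame_len] * window
        A[:, n] = np.fft.rfft(frame)  # frequency index k along axis 0
    return A
```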
- The frequency sub-bands are subdivisions of the frequency band of the audio signal, which can be divided into sub-bands of equal widths or preferably into sub-bands whose widths are dependent on human hearing sensitivity to the frequencies of said sub-bands.
- In the following, k is used as a frequency index of a frequency sub-band and n is a frame index, so that the time-frequency representation Ai(k, n) refers to a complex number associated with the kth frequency sub-band and the nth frame of the signal of the input channel i. The time-frequency representations Ai(k, n) and the sound directions {right arrow over (ai)} are then used in a time-frequency processing (step S03) wherein the data of a time-frequency tile are processed.
- Spatial Analysis
- Spatial analysis (step S11) is performed from the time-frequency representations Ai(k, n) and the sound directions {right arrow over (ai)} of a time-frequency tile. For each time-frequency tile, an active directional vector {right arrow over (Da)}(k, n) and a reactive directional vector {right arrow over (Dr)}(k, n) are determined (step S31) from the time-frequency representations Ai(k, n) of the different input channels for said time-frequency tile.
- The active directional vector {right arrow over (Da)}(k, n) of a time-frequency tile is proportional to the active acoustical intensity vector which is representative of the sound energy flow at the reference listening point for the time frame and the frequency sub-band of said time-frequency tile. More specifically, the active directional vector {right arrow over (Da)}(k,n) corresponds to the active acoustical intensity vector, normalized by the sum of the acoustic energies EP(k, n) and EK(k, n) at the reference listening point O, with an added minus sign in order to have it directed from the reference listening point O towards the unitary circle. It is possible to use a different normalization or to omit the minus sign, in which case the vectors would be pointing towards the reference listening point O.
- The reactive directional vector {right arrow over (Dr)}(k, n) is proportional to the reactive acoustical intensity vector which is representative of acoustic perturbations at the reference listening point with respect to the sound energy flow for the same time-frequency tile. More specifically, the reactive directional vector {right arrow over (Dr)}(k, n) corresponds to the reactive acoustical intensity vector, normalized by the sum of the acoustic energies EP(k, n) and EK(k, n) at the reference listening point O. A minus sign is also added but could be omitted. As for the active directional vector, it is possible to use a different normalization.
- From a perceptual point of view, if the active directional vector {right arrow over (Da)}(k, n) can be related to the primary directional sound field, the reactive directional vector {right arrow over (Dr)}(k, n) is related to the ambient diffuse sound field. Moreover, the directional information of the reactive directional vector {right arrow over (Dr)}(k, n) enables the handling of the spatial characteristics of this ambient sound field, and thus it can be used to describe not only totally diffused ambient sound fields but also partially diffused ones.
- This new approach is by nature more robust as it benefits from the reliability of the active directional vector {right arrow over (Da)}(k,n), which is a true acoustical spatial cue (compared to the Gerzon vectors, which are empirical perceptual cues), but also exploits the diffuseness of sound through the reactive directional vector {right arrow over (Dr)}(k, n).
- It has been found that the combination of the active directional vector {right arrow over (Da)}(k, n) and the reactive directional vector {right arrow over (Dr)}(k, n) may be used to identify the locations of sound sources, as depicted with the example on
FIG. 4 . In this figure, sound distribution is represented by two virtual sound sources VS1 and VS2 arranged on a unitary circle centered on the reference listening point O. The active directional vector {right arrow over (Da)}(k, n) originates from the reference listening point O and is directed along the main acoustical flow. In this example, the two uncorrelated sound sources VS1, VS2 are of equal energy (for that time-frequency tile). As a result, the perceived acoustical energy flow at the reference listening point O comes from the middle of the two sound sources VS1, VS2, and therefore the active directional vector {right arrow over (Da)}(k,n) extends between the two sound sources VS1, VS2. The reactive directional vector {right arrow over (Dr)}(k, n) is here perpendicular to the active directional vector {right arrow over (Da)}(k,n), and the location of a sound source VS1, VS2 corresponds to the sum of the active directional vector {right arrow over (Da)}(k,n) and of the reactive directional vector {right arrow over (Dr)}(k, n) or of the opposite of the reactive directional vector {right arrow over (Dr)}(k, n). - However, most of the time, the sound sources VS1, VS2 are not totally uncorrelated. It has been found that whatever the exact locations of the two sound sources VS1, VS2, the reactive intensity is maximal when the source signals are totally uncorrelated. Conversely, the reactive intensity is minimal when the source signals are totally correlated. In a similar way, when the sound source signals are totally uncorrelated, the reactive intensity is maximal when the source directions are spatially negatively correlated (i.e. opposite) with respect to the reference listening point O. Conversely, the reactive intensity is minimal when the source directions are spatially correlated (i.e. in the same direction) with respect to the reference listening point O.
- For determining the active directional vector {right arrow over (Da)}(k,n) and the reactive directional vector {right arrow over (Dr)}(k, n), the prescribed positions of the loudspeakers with respect to the reference listening point O in a prescribed spatial configuration are used. As indicated above, each input channel i is associated with a sound direction {right arrow over (ai)} defined between the reference listening point O and the prescribed position of the loudspeaker associated with said input channel i.
- A sound pressure value P(k, n) for a time-frequency tile, defined by a sum of the time-frequency representations Ai(k, n) of the different input channels for said time-frequency tile, is determined:
$$P(k,n)=\sum_{i=1}^{K} A_i(k,n)$$
- A sound velocity vector {right arrow over (V)}(k, n) for the time-frequency tile is determined, said sound velocity vector {right arrow over (V)}(k, n) being proportional to a sum of each sound direction {right arrow over (ai)} weighted by the time-frequency representation Ai(k, n) corresponding to the input channel i associated with said sound direction {right arrow over (ai)}:
$$\vec{V}(k,n)=-\frac{1}{\rho c}\sum_{i=1}^{K} A_i(k,n)\,\vec{a_i}=V_x(k,n)\,\vec{e_x}+V_y(k,n)\,\vec{e_y}+V_z(k,n)\,\vec{e_z}$$
- with {right arrow over (ex)}, {right arrow over (ey)} and {right arrow over (ez)} the unitary vectors of a coordinate system used as a reference frame for the virtual spatial configuration, ρ the density of air and c the speed of sound. For example, the speed of sound in dry air at 20° C. is 343.2 meters per second, which may be approximated to 340 m.s−1. At sea level and at 15° C., air density is approximately 1.225 kg/m3, which may be approximated to 1.2 kg/m3. Other values may be used.
- A complex intensity vector {right arrow over (I)}(k, n), resulting from a complex product between a conjugate of the sound pressure value P(k, n) for a time-frequency tile and the sound velocity vector {right arrow over (V)}(k, n) for said time-frequency tile, is determined:
$$\vec{I}(k,n)=P(k,n)^{*}\,\vec{V}(k,n)$$
- and is used to determine the active directional vector {right arrow over (Da)}(k, n) and the reactive directional vector {right arrow over (Dr)}(k, n) of said time-frequency tile. More precisely, the active directional vector {right arrow over (Da)}(k, n) is determined from the real part of the complex product {right arrow over (I)}(k, n) and the reactive directional vector {right arrow over (Dr)}(k, n) is determined from the imaginary part of the complex product {right arrow over (I)}(k, n).
- The active directional vector {right arrow over (Da)}(k, n) and the reactive directional vector {right arrow over (Dr)}(k, n) may be calculated as follows:
$$\vec{D_a}(k,n)=-\frac{\operatorname{Re}\{\vec{I}(k,n)\}}{c\,\big(E_P(k,n)+E_K(k,n)\big)}\qquad \vec{D_r}(k,n)=-\frac{\operatorname{Im}\{\vec{I}(k,n)\}}{c\,\big(E_P(k,n)+E_K(k,n)\big)}$$
with $E_P(k,n)=\frac{|P(k,n)|^{2}}{2\rho c^{2}}$ the potential acoustic energy and $E_K(k,n)=\frac{\rho}{2}\,\big\|\vec{V}(k,n)\big\|^{2}$ the kinetic acoustic energy at the reference listening point O.
- It shall be noted that the active directional vector {right arrow over (Da)}(k, n) and the reactive directional vector {right arrow over (Dr)}(k, n) are here normalized by the energies EK(k, n) and EP(k, n), but could be calculated otherwise. It shall also be noted that a minus sign is added to the expressions of the active directional vector {right arrow over (Da)}(k, n) and reactive directional vector {right arrow over (Dr)}(k, n) in order to have them directed from the reference listening point O towards the unitary circle. It would be possible to omit the minus sign, in which case the vectors would be pointing towards the reference listening point O.
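- A minimal numerical sketch of this step, purely illustrative and assuming the energy expressions given above (the function name, epsilon guard and two-dimensional restriction are this example's choices):

```python
import numpy as np

RHO, C = 1.2, 340.0  # approximate air density and speed of sound

def directional_vectors(A, directions, rho=RHO, c=C):
    """For one time-frequency tile: A holds the K channel coefficients
    A_i(k, n); directions holds the K unit vectors a_i (shape (K, 2)).
    Returns the active and reactive directional vectors plus P and V."""
    P = A.sum()                                            # sound pressure value
    V = -(A[:, None] * directions).sum(axis=0) / (rho * c)  # velocity vector
    I = np.conj(P) * V                                     # complex intensity
    E_p = abs(P) ** 2 / (2.0 * rho * c ** 2)               # potential energy
    E_k = 0.5 * rho * np.linalg.norm(V) ** 2               # kinetic energy
    denom = c * (E_p + E_k) + 1e-12                        # avoid division by 0
    D_a = -I.real / denom                                  # active vector
    D_r = -I.imag / denom                                  # reactive vector
    return D_a, D_r, P, V
```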
- Once the active directional vector {right arrow over (Da)}(k, n), the reactive directional vector {right arrow over (Dr)}(k, n), the sound pressure value P(k, n) and the sound velocity vector {right arrow over (V)}(k, n) (or the equivalents thereof) have been determined, it is possible to perform the audio source extraction (step S12) for determining positions and time-frequency signal values of virtual sound sources (step S32).
- Audio Source Extraction
- The method requires determining the attributes (position and time-frequency signal values) of virtual sound sources that will be used thereafter to determine the directional sound activity vector of each time-frequency tile.
- For each time-frequency tile, the active directional vector {right arrow over (Da)}(k, n) and the reactive directional vector {right arrow over (Dr)}(k, n) are used to determine the positions of the virtual sound sources with respect to the reference listening point in a virtual spatial configuration (step S32).
- The determined positions of the virtual sound sources, the active directional vector {right arrow over (Da)}(k,n), the reactive directional vector {right arrow over (Dr)}(k, n), the sound pressure value P(k, n) and the sound velocity vector {right arrow over (V)}(k,n) are used to determine virtual first-order directional microphone signals (step S122) corresponding to the sounds that would be acquired by virtual microphones arranged at the reference listening point O and directed towards each virtual sound source. There are as many virtual microphones as virtual sound sources.
- A virtual microphone signal is a function of the sum of the sound pressure value P(k, n) and of the scalar product between the sound velocity vector {right arrow over (V)}(k, n) and a unitary vector in the direction of a sound source, possibly weighted by the density of air ρ and the speed of sound c. For example, a virtual cardioid microphone signal Mj(k, n) associated with a virtual sound source arranged in the direction defined by {right arrow over (sj)}(k,n) can be calculated as follows:
$$M_j(k,n)=\frac{P(k,n)+\rho c\,\vec{V}(k,n)\cdot\vec{s_j}(k,n)}{2}$$
- A virtual microphone signal highlights the sound of the corresponding virtual sound source perceived at the reference listening point O, but also contains interferences from the other virtual sound sources. However, defining the virtual microphone signals for every virtual sound source allows identifying the virtual sound source signal of each virtual sound source.
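- For instance, under the same illustrative assumptions as the previous sketch, this cardioid pick-up Mj(k, n) can be written as:

```python
import numpy as np

def virtual_cardioid(P, V, s, rho=1.2, c=340.0):
    """Signal of a virtual cardioid microphone placed at the reference
    listening point O and aimed at the unit direction s: the microphone
    has unit gain towards s and a null in the opposite direction."""
    return 0.5 * (P + rho * c * np.dot(V, s))
```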
- It shall be noted that spatial manipulation may be performed by modifying the positions of the virtual sound sources. This approach is much safer than modifying the input channel data defining the prescribed positions, because the original primary/ambient energy ratio is kept.
- The details of the source extraction process, however, change depending on the number of virtual sound sources. The audio source extraction process estimates the locations and frequency signal values of virtual sound sources that generate the same sound field characteristics as the sound field defined by the time-dependent input audio signals in the prescribed configuration. Source-related sound field models need to be defined, as the audio source extraction process may be highly different from one model to another. Two reliable models, whose analysis is based on the exploitation of both the active and reactive components of the acoustical intensity, are described below: a model with two virtual sound sources and a model with three virtual sound sources.
- The “two-source” model handles the diffuseness (and thus makes use of the reactive component) as an indicator of the perceptual width of a sound source or local diffuseness. Two sound sources are sufficient to simulate a wider sound source, their spatial and signal correlation defining the perceived wideness of this composite sound source. The “three-source” model handles the diffuseness (and thus makes use of the reactive component) as an indicator of the ambience level within the sound scene or global diffuseness. Two uncorrelated sound sources of opposite directions are suitable to simulate this ambient component, in addition to a first virtual sound source corresponding to the primary component. It is explained below how to proceed with two virtual sound sources or three virtual sound sources.
- Source Extraction: Two Virtual Sound Sources
- In a spatial configuration of a unitary circle centered on the reference listening point O, the virtual sound sources are positioned on the unitary circle. A position of a virtual sound source is therefore at the intersection of the unitary circle with a directional line extending from the reference listening point. The position of each virtual sound source can be defined by a unitary source direction vector {right arrow over (sj)}(k, n) originating from the reference listening point. This is shown in
FIG. 5 . - As indicated above, the first step of the source extraction consists in determining the positions of the two virtual sound sources (step S121). As shown in
FIG. 5 , each unitary source direction vector {right arrow over (sj)}(k, n) is defined through the active directional vector {right arrow over (Da)}(k, n) and reactive directional vector {right arrow over (Dr)}(k, n). More precisely, a virtual sound source is located at the intersection of -
- the unitary circle and
- a line collinear with the reactive directional vector {right arrow over (Dr)}(k, n) and passing through the tip of the active directional vector {right arrow over (Da)}(k, n) originating from the reference listening point.
- If the analyzed sound field is generated by two uncorrelated sound sources (not necessarily of equal energy), this technique enables the exact locations of those two sound sources to be retrieved. If the two sound sources used to generate the sound field tend to be in-phase (respectively opposite-phase), their exact locations cannot be retrieved anymore: the technique over-estimates (respectively under-estimates) the spatial correlation between the two sound source directions. However, this relationship between signal correlation and spatial correlation is perceptively coherent.
- Determining the locations of the two virtual sound sources VS1, VS2 is equivalent to solving a geometry problem of the intersection of a line with a circle (or a sphere for a three-dimensional sound field). Solving this problem amounts to solving a second order equation, whose solutions are
$$\vec{s_{1,2}}(k,n)=\vec{D_a}(k,n)+t_{1,2}(k,n)\,\vec{D_r}(k,n),\qquad t_{1,2}(k,n)=\frac{-\vec{D_a}\cdot\vec{D_r}\pm\sqrt{\big(\vec{D_a}\cdot\vec{D_r}\big)^{2}+\big(1-\|\vec{D_a}\|^{2}\big)\,\|\vec{D_r}\|^{2}}}{\|\vec{D_r}\|^{2}}$$
- It shall be noted that:
-
- a position of a first virtual sound source VS1 defines, with the reference listening point O, a direction resulting from the sum of the active directional vector {right arrow over (Da)}(k, n) and the reactive directional vector {right arrow over (Dr)}(k, n) weighted by a positive factor, and
- a position of a second virtual sound source VS2 defines, with the reference listening point O, a direction resulting from the sum of the active directional vector {right arrow over (Da)}(k, n) and the reactive directional vector {right arrow over (Dr)}(k, n) weighted by a negative factor.
- We thus have a source direction vector {right arrow over (s1)}(k, n) of a first virtual sound source VS1, and a source direction vector {right arrow over (s2)}(k, n) of a second virtual sound source VS2. As depicted in
FIG. 5 , these source direction vectors {right arrow over (s1)}(k, n), {right arrow over (s2)}(k, n) localize the virtual sound sources VS1, VS2 on the unitary circle centered on the reference listening point O. - As explained above, after the computation of the directions of the two virtual sound sources VS1, VS2, it is possible, by combining the sound pressure value P(k, n) and the sound velocity vector {right arrow over (V)}(k, n) with the source direction vectors {right arrow over (s1)}(k, n), {right arrow over (s2)}(k, n), to create two virtual directional microphones. As depicted in
FIG. 5 , the two virtual directional microphones may have cardioid directivity patterns VM1, VM2 in the directions of the source direction vectors {right arrow over (s1)}(k, n), {right arrow over (s2)}(k, n). The virtual microphone pick-up in these two directions may then be estimated by the virtual microphone signals M1(k, n), M2(k, n) defined as follows:
$$M_{1}(k,n)=\frac{P(k,n)+\rho c\,\vec{V}(k,n)\cdot\vec{s_1}(k,n)}{2},\qquad M_{2}(k,n)=\frac{P(k,n)+\rho c\,\vec{V}(k,n)\cdot\vec{s_2}(k,n)}{2}$$
- As explained above, each virtual microphone signal highlights the sound signal of the corresponding virtual sound source VS1, VS2 perceived at the reference listening point O, but also contains interferences from the other virtual sound source:
$$M_{1}(k,n)=S_{1}(k,n)+\mu(k,n)\,S_{2}(k,n),\qquad M_{2}(k,n)=\mu(k,n)\,S_{1}(k,n)+S_{2}(k,n),\qquad \mu(k,n)=\frac{1+\vec{s_1}(k,n)\cdot\vec{s_2}(k,n)}{2}$$
- where S1(k, n) is the time-frequency signal value of the first virtual sound source VS1 and S2(k, n) is the time-frequency signal value of the second virtual sound source VS2. A final processing step extracts the time-frequency signal values S1(k, n), S2(k, n) of each virtual sound source by unmixing the source signals from the virtual microphone signals (step S123):
$$S_{1}(k,n)=\frac{M_{1}(k,n)-\mu(k,n)\,M_{2}(k,n)}{1-\mu(k,n)^{2}},\qquad S_{2}(k,n)=\frac{M_{2}(k,n)-\mu(k,n)\,M_{1}(k,n)}{1-\mu(k,n)^{2}}$$
- The positions of the two virtual sound sources VS1, VS2, defined by the source direction vectors {right arrow over (s1)}(k, n) and {right arrow over (s2)}(k, n), and their respective time-frequency signal values S1(k, n) and S2(k, n) have been determined.
- It shall be noted that the two virtual sound sources VS1, VS2 are equivalent, in the sense that they each contain both a primary component (through the active directional vector {right arrow over (Da)}(k, n)) and an ambient component (through the reactive directional vector {right arrow over (Dr)}(k, n)). An ambience extraction processing may be performed for additional refinement.
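- An illustrative end-to-end sketch of this two-source extraction, not part of the patent disclosure (the quadratic solution and the unmixing coefficient follow the equations above; the epsilon guards and names are this example's assumptions):

```python
import numpy as np

def two_source_extraction(D_a, D_r, P, V, rho=1.2, c=340.0):
    """Positions and signals of the two virtual sound sources of a tile."""
    # Intersect the unitary circle with the line through the tip of D_a
    # that is collinear with D_r: solve ||D_a + t * D_r||^2 = 1 for t.
    a = float(np.dot(D_r, D_r)) + 1e-12
    b = 2.0 * float(np.dot(D_a, D_r))
    c0 = float(np.dot(D_a, D_a)) - 1.0
    disc = np.sqrt(max(b * b - 4.0 * a * c0, 0.0))
    s1 = D_a + ((-b + disc) / (2.0 * a)) * D_r   # positive weighting factor
    s2 = D_a + ((-b - disc) / (2.0 * a)) * D_r   # negative weighting factor
    # Virtual cardioid microphone signals aimed at each source.
    m1 = 0.5 * (P + rho * c * np.dot(V, s1))
    m2 = 0.5 * (P + rho * c * np.dot(V, s2))
    # Unmix: each microphone hears the other source with gain mu.
    mu = 0.5 * (1.0 + float(np.dot(s1, s2)))
    det = 1.0 - mu * mu + 1e-12
    S1 = (m1 - mu * m2) / det
    S2 = (m2 - mu * m1) / det
    return (s1, S1), (s2, S2)
```

With D_a, D_r, P and V taken from the earlier sketch, the returned pairs give, for one tile, the source directions on the unitary circle and the corresponding unmixed signals.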
- Audio Source Extraction: Three Virtual Sound Sources
- As explained before, the first step of the audio source extraction consists in determining the positions of the three virtual sound sources, through unitary source direction vectors {right arrow over (sj)}(k, n) defined by the active directional vector {right arrow over (Da)}(k,n) and the reactive directional vector {right arrow over (Dr)}(k, n). In a spatial configuration of a unitary circle centered on the reference listening point O, the virtual sound sources are positioned on the unitary circle. A position of a virtual sound source is therefore at the intersection of the unitary circle with a directional line extending from the reference listening point, and can be defined by a unitary source direction vector {right arrow over (sj)}(k, n) originating from the reference listening point. This is shown in
FIG. 6 . - As already explained, the active directional vector {right arrow over (Da)}(k, n) indicates the main perceptual sound event direction, while the reactive intensity indicates the “direction of maximum perceptual diffuseness”. Using three virtual sound sources VS1, VS2, VS3 thus appears relevant to approximate the sound field properties:
-
- one virtual sound source VS1 is in the direction of the active directional vector {right arrow over (Da)}(k, n) to represent the reconstruction of the main acoustic flow, and
- two virtual sound sources VS2, VS3, negatively spatially correlated, in the direction of the reactive directional vector {right arrow over (Dr)}(k, n) and its opposite direction, respectively, to represent the acoustic perturbations of the acoustic field.
- As a consequence:
-
- a position of a first virtual sound source VS1 defines with the reference listening point O a direction which is collinear to the direction of the active directional vector {right arrow over (Da)}(k, n) from the reference listening point,
- a position of a second virtual sound source VS2 defines with the reference listening point O a direction which is collinear to the direction of the reactive directional vector {right arrow over (Dr)}(k, n) from the reference listening point with a first orientation,
- a position of a third virtual sound source VS3 defines with the reference listening point a direction which is collinear to the direction of the reactive directional vector {right arrow over (Dr)}(k, n) from the reference listening point O with a second orientation opposite to the first orientation.
- Indeed, determining the positions of the virtual sound sources VS1, VS2, VS3 is much simpler for the three-source model than for the two-source model, since their source direction vectors {right arrow over (sj)}(k, n) are directly computed from the active directional vector {right arrow over (Da)}(k, n) and the reactive directional vector {right arrow over (Dr)}(k, n):
$$\vec{s_1}(k,n)=\frac{\vec{D_a}(k,n)}{\|\vec{D_a}(k,n)\|},\qquad \vec{s_2}(k,n)=\frac{\vec{D_r}(k,n)}{\|\vec{D_r}(k,n)\|},\qquad \vec{s_3}(k,n)=-\frac{\vec{D_r}(k,n)}{\|\vec{D_r}(k,n)\|}$$
- with a first source direction vector {right arrow over (s1)}(k, n) of a first virtual sound source VS1, a second source direction vector {right arrow over (s2)}(k, n) of a second virtual sound source VS2, and a third source direction vector {right arrow over (s3)}(k, n) of a third virtual sound source VS3. As depicted in
FIG. 6 , these source direction vectors localize the virtual sound sources VS1, VS2, VS3 on the unitary circle centered on the reference listening point O. - As explained above, after the computation of the directions of the three virtual sound sources VS1, VS2, VS3, it is possible, by combining the sound pressure value P(k, n) and the sound velocity vector {right arrow over (V)}(k, n) with a source direction vector, to create three virtual directional microphones. As depicted in
FIG. 6 , the three virtual directional microphones may have cardioid directivity patterns VM1, VM2, VM3 in the directions of the source direction vectors {right arrow over (s1)}(k, n), {right arrow over (s2)}(k, n), {right arrow over (s3)}(k, n). The virtual microphone pick-ups in these three directions may then be estimated by the virtual microphone signals defined as follows:
$$M_{j}(k,n)=\frac{P(k,n)+\rho c\,\vec{V}(k,n)\cdot\vec{s_j}(k,n)}{2},\qquad j\in\{1,2,3\}$$
- As explained above, each virtual microphone signal M1(k, n), M2(k, n), M3(k, n) highlights the sound of the corresponding virtual sound source VS1, VS2, VS3 perceived at the reference listening point O, but also contains interferences from the other virtual sound sources. More precisely, since the second source direction vector {right arrow over (s2)}(k, n) and the third source direction vector {right arrow over (s3)}(k, n) are of opposite direction, the interference between the second virtual sound source VS2 and the third virtual sound source VS3 is negligible, whereas they both interfere with the first virtual sound source VS1:
$$M_{1}(k,n)=S_{1}(k,n)+\delta\,S_{2}(k,n)+\varepsilon\,S_{3}(k,n),\qquad M_{2}(k,n)=\delta\,S_{1}(k,n)+S_{2}(k,n),\qquad M_{3}(k,n)=\varepsilon\,S_{1}(k,n)+S_{3}(k,n)$$
with $\delta=\frac{1+\vec{s_1}\cdot\vec{s_2}}{2}$ and $\varepsilon=\frac{1+\vec{s_1}\cdot\vec{s_3}}{2}=1-\delta$.
- A final processing step (step S123) extracts the time-frequency signal value of each virtual sound source by unmixing the source time-frequency values:
$$S_{1}(k,n)=\frac{M_{1}(k,n)-\delta\,M_{2}(k,n)-\varepsilon\,M_{3}(k,n)}{1-\delta^{2}-\varepsilon^{2}},\qquad S_{2}(k,n)=M_{2}(k,n)-\delta\,S_{1}(k,n),\qquad S_{3}(k,n)=M_{3}(k,n)-\varepsilon\,S_{1}(k,n)$$
- Contrary to the model with two virtual sound sources, the three virtual sound sources are already decomposed into primary and ambient components:
-
- the first virtual sound source VS1 corresponds to the primary component, and
- the second virtual sound source VS2 and third virtual sound source VS3 correspond to the ambient components.
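- Correspondingly, a purely illustrative sketch of the three-source extraction under the same assumptions as before (the leakage coefficients mirror the unmixing equations above):

```python
import numpy as np

def three_source_extraction(D_a, D_r, P, V, rho=1.2, c=340.0):
    """One primary source along D_a, two opposite ambient sources
    along +/- D_r; signals recovered from three virtual cardioids."""
    s1 = D_a / (np.linalg.norm(D_a) + 1e-12)
    s2 = D_r / (np.linalg.norm(D_r) + 1e-12)
    s3 = -s2
    mic = lambda s: 0.5 * (P + rho * c * np.dot(V, s))
    m1, m2, m3 = mic(s1), mic(s2), mic(s3)
    # Mics 2 and 3 do not hear each other (each has its null towards the
    # opposite direction); both leak the primary source with gains d, e.
    d = 0.5 * (1.0 + float(np.dot(s1, s2)))
    e = 0.5 * (1.0 + float(np.dot(s1, s3)))      # e = 1 - d
    S1 = (m1 - d * m2 - e * m3) / (1.0 - d * d - e * e + 1e-12)
    S2 = m2 - d * S1
    S3 = m3 - e * S1
    return (s1, S1), (s2, S2), (s3, S3)
```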
- Directional Sound Activity Vector
- Once the attributes of the virtual sound sources have been determined (positions and time-frequency signal values), it is possible to determine a directional sound activity vector related to a time-frequency tile from the virtual sound sources. This directional sound activity vector represents the predominant sound direction that would be perceived by a listener according to the recommended loudspeaker layout for sounds within the particular frequency sub-band of the time-frequency tile.
- The attributes of the directional sound activity vector are calculated from the positions and time-frequency signal values of the virtual sound sources.
- With three virtual sound sources, the energy vectors relative to the sound sources of a time-frequency tile are:
$$\vec{E_j}(k,n)=\big|S_j(k,n)\big|^{2}\,\vec{s_j}(k,n),\qquad j\in\{1,2,3\}$$
- The energy vector sum representative of the perceived directional energy is then:
$$\vec{E}(k,n)=\sum_{j=1}^{3}\vec{E_j}(k,n)=\sum_{j=1}^{3}\big|S_j(k,n)\big|^{2}\,\vec{s_j}(k,n)$$
- The first virtual sound source VS1 is more related to the main perceptual sound event direction, and the two other virtual sound sources VS2, VS3 are more related to the direction of the maximum perceptual diffuseness. It may then be relevant to take only the first virtual sound source VS1 for the directional sound activity vector. Generally, a weighting of the different virtual sound sources VS1, VS2, VS3 may be used, with a source-weighted directional sound activity vector expressed as:
$$\vec{E}(k,n)=\sum_{j}\omega_j\,\big|S_j(k,n)\big|^{2}\,\vec{s_j}(k,n)$$
- with ωj weighting factors between 0 and 1. Preferably, the sum of the weighting factors ωj is 1. Preferably, none of the weighting factors ωj is 0. Preferably, the weighting factors ω2 and ω3 are equal. Preferably, ω1>ω2, and ω1>ω3.
- With two virtual sound sources, it is also possible to use a weighted sum of the energy vectors relative to the two sound sources as with the three virtual sound sources. Preferably, the two weighting factors ω1 and ω2 are equal.
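- As a small illustrative helper (the default weights favouring the primary source are an assumption consistent with the preferences stated above):

```python
import numpy as np

def directional_activity_vector(sources, weights=None):
    """Source-weighted directional sound activity vector of one tile:
    sum over sources j of w_j * |S_j|^2 * s_j, with the w_j summing
    to one and the primary weight larger than the ambient ones."""
    if weights is None:
        weights = [0.5, 0.25, 0.25][: len(sources)]
    return sum(w * (abs(S) ** 2) * s for w, (s, S) in zip(weights, sources))
```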
- Frequency Masking
- An optional, though advantageous, frequency masking (step S13) can adapt the directional sound activity vectors according to their respective frequency sub-bands. In order to tune reactivity with respect to sound frequencies, the norms of the directional sound activity vectors can be weighted based on their respective frequency sub-bands. The weighted directional sound activity vector is then
$$\vec{G}[k,n]=\alpha[k]\,\vec{E}[k,n]$$
- where α[k] is a weight, for instance between 0 and 1, which depends on the frequency sub-band of each directional sound activity vector. Such a weighting allows enhancing particular frequency sub-bands of particular interest for the user. This feature can be used for discriminating sounds based on their frequencies. For instance, frequencies related to particularly interesting sounds can be enhanced in order to distinguish them from ambient noise. The directional sound analyzing unit can be fed with spectral sensitivity parameters which define the weight attributed to each frequency sub-band.
- In order to directionally visualize sound activity, space is divided into sub-divisions which are intended to discretely represent the acoustic environment of the listener.
FIG. 7 shows an example of such a divided space relative to a 5.1 loudspeaker layout. A polar representation of the listener's environment is divided into M similar sub-divisions 6 circularly disposed around the reference listening point in a central position representing the listener's location. Loudspeakers of the recommended layout of FIG. 1 are represented for comparison. - For each frequency sub-band, the dominant sound direction and the sound activity level associated with said direction are now determined and described by the directional sound activity vector, preferably weighted as described above. The visualization of such directional information must be very intuitive so that sound direction information can be conveyed to the user without interfering with other sources of information.
- The beam clustering stage (S14) corresponds to allocating to each sub-division a part of each frequency sub-band sound activity.
- To this end, the contributions of each frequency sub-band sound activity to each sub-division of space are determined on the basis of directivity information. For each sub-division of space, a directional sound activity level is determined within said sub-division of space by combining, for instance by summing, the contributions of said frequency sub-band sound activity to said sub-division of space.
- Directivity information is associated with each
sub-division 6. Such directivity information relates to level modulation as a function of direction in an oriented coordinate system, typically centered on a listener's position. This directivity information can be described by a directivity function which associates a weight to space directions in an oriented coordinate system. Typically, such a directivity function exhibits a maximum for a direction associated with the related sub-division. - For each
sub-division 6 of space, norms of directional sound activity vectors are weighted on the basis of the directivity information associated with said sub-division 6 of space and the directions of said directional sound activity vectors. These weighted norms can thus represent the contribution of said directional sound activity vectors within said sub-divisions of space. - For instance, a directivity function can be parameterized by a beam vector {right arrow over (vm)} and an angular value θm corresponding to the angular width of the beam, wherein m identifies a space sub-division. The direction associated with a
sub-division 6 can be the main direction defined by the beam vector {right arrow over (vm)}. Accordingly, the angular distance between a beam vector {right arrow over (vm)} and a directional sound activity vector {right arrow over (G)}[k] can define the clustering weight Cm[k]. For instance, a simple directional weighting function may be 1 if the angular distance between a beam vector {right arrow over (vm)} and a directional sound activity vector {right arrow over (G)}[k] is less than θm/2 and 0 otherwise: -
$$C_m[k]=\begin{cases}\big\|\vec{G}[k]\big\| & \text{if the angular distance between } \vec{v_m}\text{ and }\vec{G}[k]\text{ is at most }\theta_m/2\\[4pt] 0 & \text{otherwise}\end{cases}$$
- The directional sound activity within a beam or sub-division of space can then be determined by summing said contributions, such as weighted norms in this example, of said directional sound activity vectors related to L frequency sub-bands:
-
$$B_m=\sum_{k=1}^{L} C_m[k]$$
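- A hedged sketch combining the frequency masking and beam clustering stages (the hard angular gate implements the simple weighting function above; the names and degree-based interface are illustrative):

```python
import numpy as np

def beam_activities(E_vectors, alpha, beam_dirs, beam_widths_deg):
    """Directional sound activity per beam: weight each per-band vector
    E[k] by alpha[k], then add its norm to every beam m whose axis v_m
    lies within theta_m / 2 of its direction."""
    activity = np.zeros(len(beam_dirs))
    for E, a in zip(E_vectors, alpha):
        G = a * np.asarray(E)                      # masked vector G[k]
        norm_G = np.linalg.norm(G)
        if norm_G == 0.0:
            continue
        for m, (v, theta) in enumerate(zip(beam_dirs, beam_widths_deg)):
            cos_ang = np.dot(v, G) / (np.linalg.norm(v) * norm_G)
            if np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))) <= theta / 2:
                activity[m] += norm_G              # contribution C_m[k]
    return activity
```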
- For every
space sub-division 6, such as the beams illustrated inFIG. 7 , directional sound activity can then be displayed for visualization (step S04). A graphical representation of directional sound activity level within said sub-division of space is displayed, as inFIG. 7 . In the displayed graphical representation, sub-divisions of space are organized according to their respective location within said space, so as to reconstruct the divided space. -
FIG. 7 shows a configuration wherein the directional sound activity is restricted to two different beams, suggesting that sound sources related to different frequencies are located in the directions related to these two beams. It shall be noted that at least one of these beams shows a directional sound activity without having a direction that corresponds to a recommended loudspeaker orientation. As can be seen, a user can easily and accurately infer sound source directions, and thus can retrieve the sound direction information originally conveyed by the multichannel audio input signal. - Other graphical representations can be used, such as a radar chart wherein directional sound activity levels are represented on axes starting from the center, lines or curves being drawn between the directional sound activity levels of adjacent axes. Preferably, the lines or curves define a colored geometrical shape containing the center.
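- Purely for illustration, a minimal polar rendering of the M beam levels in the spirit of FIG. 7 (the matplotlib usage and layout choices are assumptions, not the patented display):

```python
import numpy as np
import matplotlib.pyplot as plt

def show_beams(activity):
    """Display M directional sound activity levels as beams around a
    central listening position, front direction pointing upwards."""
    M = len(activity)
    angles = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
    ax = plt.subplot(projection="polar")
    ax.set_theta_zero_location("N")            # front of the listener up
    ax.bar(angles, activity, width=2.0 * np.pi / M, alpha=0.7)
    ax.set_yticklabels([])                     # levels shown relatively
    plt.show()
```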
- The invention thus allows sound direction information to be delivered to the user even if said user does not possess the recommended loudspeaker layout, for example with headphones. It can also be very helpful for hearing-impaired people or for users who must identify sound directions quickly and accurately.
- Preferably, the graphical representation shows several directional sound activity levels for each sub-division, these directional sound activity levels being calculated with different frequency masking parameters.
- For example, at least two sets of spectral sensitivity parameters are chosen to parameterize two frequency masking processes respectively used in two directional sound activity level determination processes. The two sets of directional sound activity vectors determined from the same input audio channels are weighted based on their respective frequency sub-bands in accordance with two different sets of weighting parameters. Consequently, for each sub-division, each one of the two directional sound activity levels enhances some particular frequencies in order to distinguish different sound types.
- The two directional sound activities can then be displayed simultaneously within the same sub-divided space, for example using a color code to distinguish them and a superimposition, for instance based on level differences.
- The method of the present invention as described above can be realized as a program and stored on a non-transitory tangible computer-readable medium, such as a CD-ROM, a ROM or a hard disk, having computer executable instructions embodied thereon that, when executed by a computer, perform the method according to the invention.
- While the present invention has been described with respect to certain preferred embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the appended claims.
Claims (17)
1. A method for visualizing a directional sound activity of a multichannel audio signal, wherein the multichannel audio signal comprises time-dependent input audio signals, each time-dependent input audio signal being associated with an input channel, spatial information with respect to a reference listening point being associated with each one of said channels, the method comprising:
receiving the time-dependent input audio signals;
performing a time-frequency conversion of said time-dependent input audio signals for converting each one of the time-dependent input audio signals into a plurality of time-frequency representations for the input channel associated with said time-dependent input audio signal, each time-frequency representation corresponding to a time-frequency tile defined by a time frame and a frequency sub-band, the time-frequency tiles being the same for the different input channels;
for each time-frequency tile, determining positions of at least two virtual sound sources with respect to the reference listening point and frequency signal values for each virtual sound source from an active directional vector and a reactive directional vector determined from time-frequency representations of different input audio signals for said time-frequency tile, wherein the active directional vector is determined from a real part of a complex intensity vector and the reactive directional vector is determined from an imaginary part of the complex intensity vector;
for each time-frequency tile, determining a directional sound activity vector from the virtual sound sources,
determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-divisions of space;
for each sub-division of space, determining directional sound activity level within said sub-division of space by summing said contributions within said sub-division of space;
displaying a visualization of the directional sound activity of the multichannel audio signal by a graphical representation of directional sound activity level within said sub-division of space.
2. The method of claim 1 , wherein the active directional vector of a time-frequency tile is representative of the sound energy flow at the reference listening point for the time frame and a frequency sub-band of said time-frequency tile, and wherein the reactive directional vector is representative of acoustic perturbations at the reference listening point with respect to the sound energy flow.
3. The method according to claim 1 , wherein each input channel is associated with a sound direction defined between the reference listening point and the prescribed position of the speaker associated with said input channel, and a sound velocity vector is determined as a function of a sum of each sound direction weighted by the time-frequency representation corresponding to the input channel associated with said sound direction, said sound velocity vector being used to determine the active directional vector and the reactive directional vector.
4. The method according to claim 1 , wherein a sound pressure value defined by a sum of the time-frequency representations of the different input channels is used to determine the active directional vector and the reactive directional vector.
5. The method according to claim 1 wherein the complex intensity vector results from a complex product between a conjugate of a sound pressure value for a time-frequency tile and a sound velocity vector for said time-frequency tile.
6. The method according to claim 1 , wherein for determining time-frequency signal values of each one of the virtual sound sources, virtual microphone signals are determined, each virtual microphone signal being associated with a virtual sound source and corresponding to the signal that would be acquired by a virtual microphone arranged at the reference listening point and oriented in the direction toward the position of said virtual sound source.
7. The method according to claim 6 , wherein the time-frequency signal value of a virtual sound source is determined by suppressing, in the virtual microphone signal associated with said virtual sound source, the interferences from other virtual sound sources.
8. The method according to claim 6 , wherein the virtual sound sources are arranged on a circle centered on the reference listening point and a virtual microphone signal corresponds to the signal that would be acquired by a virtual cardioid microphone having a directivity pattern in the shape of a cardioid tangential to the circle centered on the reference listening point.
9. The method according to claim 1 , wherein there are three virtual sound sources for each time-frequency tile, each virtual sound source having a position with respect to the reference listening point, wherein:
a position of a first virtual sound source defines with the reference listening point a direction which is collinear to the direction of the active directional vector from the reference listening point,
a position of a second virtual sound source defines with the reference listening point a direction which is collinear to the direction of the reactive directional vector with a first orientation,
a position of a third virtual sound source defines with the reference listening point a direction which is collinear to the direction of the reactive directional vector with a second orientation opposite to the first orientation.
10. The method according to claim 1 , wherein there are two virtual sound sources for each time-frequency tile, each virtual sound source having a position with respect to the reference listening point, and wherein:
a position of a first virtual sound source defines with the reference listening point a direction resulting from the sum of the active directional vector and the reactive directional vector weighted by a positive factor, and
a position of a second virtual sound source defines with the reference listening point a direction resulting from the sum of the active directional vector and the reactive directional vector weighted by a negative factor.
11. The method of claim 1, wherein the directional information used for determining the contribution of a directional sound activity vector within a sub-division of space is an angular distance between a direction associated with said sub-division of space and the direction of said directional sound activity vector.
12. The method according to claim 1, wherein the contribution of a directional sound activity vector within a sub-division of space is determined by weighting a norm of said directional sound activity vector on the basis of an angular distance between a direction associated with said sub-division of space and the direction of said directional sound activity vector.
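One weighting scheme, among many consistent with claims 11 and 12, is sketched below; the Gaussian falloff and the width parameter are assumptions of this illustration, not features recited by the claims.

```python
import numpy as np

def sector_contribution(activity_vec, sector_dir, width=np.pi / 8):
    """Contribution of one directional sound activity vector to the
    sub-division of space whose direction is sector_dir (a unit vector)."""
    norm = np.linalg.norm(activity_vec)
    if norm == 0.0:
        return 0.0
    cos_d = float(np.dot(activity_vec / norm, sector_dir))
    angle = np.arccos(np.clip(cos_d, -1.0, 1.0))    # angular distance (claim 11)
    return np.exp(-(angle / width) ** 2) * norm     # weighted norm (claim 12)
```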
13. The method of claim 1, wherein norms of the directional sound activity vectors are further weighted based on their respective frequency sub-bands.
14. The method of claim 13, wherein at least two sets of directional sound activity vectors determined from the same input audio channels are weighted based on their respective frequency sub-bands in accordance with two different sets of weighting parameters, and the two resulting directional sound activities are displayed on the graphical representation.
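Claims 13 and 14 describe a simple per-band reweighting applied once per displayed view; the numeric weights below are placeholders chosen for illustration only.

```python
def weight_by_band(norms_per_band, band_weights):
    """Scales per-frequency-band activity norms by per-band weights (claim 13)."""
    return [n * w for n, w in zip(norms_per_band, band_weights)]

norms = [0.9, 0.5, 0.7, 0.2]                              # assumed per-band activity norms
low_view = weight_by_band(norms, [1.0, 0.8, 0.4, 0.1])    # bass-emphasized display
high_view = weight_by_band(norms, [0.1, 0.4, 0.8, 1.0])   # treble-emphasized display (claim 14)
```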
15. The method of claim 1, wherein the visualization of the directional sound activity of the multichannel audio signal comprises representations of said sub-divisions of space, each provided with a representation of the directional sound activity associated with said sub-division.
16. A non-transitory tangible computer-readable medium having computer-executable instructions embodied thereon that, when executed by a computer, perform the method of any one of claims 1 to 15.
17. An apparatus for visualizing directional sound activity of a multichannel audio signal, comprising:
an input for receiving time-dependent input audio signals for a plurality of input channels,
a processor and a memory for:
performing a time-frequency conversion of said time-dependent input audio signals for converting each one of the time-dependent input audio signals into a plurality of time-frequency representations for the input channel associated with said time-dependent input audio signal, each time-frequency representation corresponding to a time-frequency tile defined by a time frame and a frequency sub-band, the time-frequency tiles being the same for the different input channels,
for each time-frequency tile, determining an active directional vector and a reactive directional vector from time-frequency representations of different input channels for said time-frequency tile, wherein the active directional vector is determined from a real part of a complex intensity vector and the reactive directional vector is determined from an imaginary part of the complex intensity vector,
for each time-frequency tile, determining positions of at least two virtual sound sources with respect to the reference listening point in a virtual spatial configuration from the active directional vector and the reactive directional vector, and determining time-frequency signal values for each virtual sound source,
for each time-frequency tile, determining a directional sound activity vector from the virtual sound sources,
determining a contribution of each one of said directional sound activity vectors within sub-divisions of space on the basis of directivity information related to each sub-division of space,
for each sub-division of space, determining directional sound activity data within said sub-division of space by summing said contributions within said sub-division of space,
a visualizing unit for displaying a visualization of the directional sound activity of the multichannel audio signal.
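For orientation, the processing chain of claim 17 is condensed into one sketch below. It is a deliberately simplified, assumption-laden illustration: it uses scipy's STFT for the time-frequency conversion, skips the virtual-source step by taking the active directional vector itself as the directional sound activity vector, and weights contributions with a clipped cosine; none of these shortcuts are mandated by the claim.

```python
import numpy as np
from scipy.signal import stft

def directional_activity(signals, fs, speaker_dirs, n_sectors=12):
    """signals: (n_channels, n_samples) time-dependent input audio signals;
    returns one summed activity value per angular sub-division of space."""
    _, _, Z = stft(signals, fs=fs, nperseg=1024)          # time-frequency conversion
    angles = np.linspace(0.0, 2.0 * np.pi, n_sectors, endpoint=False)
    sector_dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    activity = np.zeros(n_sectors)
    _, n_freq, n_frames = Z.shape
    for f in range(n_freq):                               # loop over time-frequency tiles
        for t in range(n_frames):
            tile = Z[:, f, t]
            p = np.sum(tile)                              # sound pressure value
            v = speaker_dirs.T @ tile                     # sound velocity vector
            d = np.real(np.conj(p) * v)                   # active directional vector
            norm = np.linalg.norm(d)
            if norm > 0.0:
                w = np.clip(sector_dirs @ (d / norm), 0.0, None)   # directivity weighting
                activity += w * norm                      # sum contributions per sub-division
    return activity
```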
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16306190.6A EP3297298B1 (en) | 2016-09-19 | 2016-09-19 | Method for reproducing spatially distributed sounds |
EP16306190.6 | 2016-09-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20180084364A1 (en) | 2018-03-22 |
Family
ID=57130308
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/707,129 Abandoned US20180084364A1 (en) | 2016-09-19 | 2017-09-18 | Method for Visualizing the Directional Sound Activity of a Multichannel Audio Signal |
US15/708,579 Active US10085108B2 (en) | 2016-09-19 | 2017-09-19 | Method for visualizing the directional sound activity of a multichannel audio signal |
US16/334,333 Active US10536793B2 (en) | 2016-09-19 | 2017-09-19 | Method for reproducing spatially distributed sounds |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/708,579 Active US10085108B2 (en) | 2016-09-19 | 2017-09-19 | Method for visualizing the directional sound activity of a multichannel audio signal |
US16/334,333 Active US10536793B2 (en) | 2016-09-19 | 2017-09-19 | Method for reproducing spatially distributed sounds |
Country Status (5)
Country | Link |
---|---|
US (3) | US20180084364A1 (en) |
EP (1) | EP3297298B1 (en) |
CN (1) | CN110089134B (en) |
TW (1) | TWI770059B (en) |
WO (1) | WO2018050905A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109036456A (en) * | 2018-09-19 | 2018-12-18 | 电子科技大学 | Source component and ambience component extraction method for stereo sound |
CN113965862A (en) * | 2020-07-20 | 2022-01-21 | 西万拓私人有限公司 | Method for operating a hearing system, hearing device |
Families Citing this family (74)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8930005B2 (en) | 2012-08-07 | 2015-01-06 | Sonos, Inc. | Acoustic signatures in a playback system |
US9965247B2 (en) | 2016-02-22 | 2018-05-08 | Sonos, Inc. | Voice controlled media playback system based on user profile |
US10509626B2 (en) | 2016-02-22 | 2019-12-17 | Sonos, Inc | Handling of loss of pairing between networked devices |
US10743101B2 (en) | 2016-02-22 | 2020-08-11 | Sonos, Inc. | Content mixing |
US9947316B2 (en) | 2016-02-22 | 2018-04-17 | Sonos, Inc. | Voice control of a media playback system |
US10264030B2 (en) | 2016-02-22 | 2019-04-16 | Sonos, Inc. | Networked microphone device control |
US10095470B2 (en) | 2016-02-22 | 2018-10-09 | Sonos, Inc. | Audio response playback |
US9978390B2 (en) | 2016-06-09 | 2018-05-22 | Sonos, Inc. | Dynamic player selection for audio signal processing |
US10152969B2 (en) | 2016-07-15 | 2018-12-11 | Sonos, Inc. | Voice detection by multiple devices |
US10134399B2 (en) | 2016-07-15 | 2018-11-20 | Sonos, Inc. | Contextualization of voice inputs |
US10115400B2 (en) | 2016-08-05 | 2018-10-30 | Sonos, Inc. | Multiple voice services |
US9942678B1 (en) | 2016-09-27 | 2018-04-10 | Sonos, Inc. | Audio playback settings for voice interaction |
US9743204B1 (en) | 2016-09-30 | 2017-08-22 | Sonos, Inc. | Multi-orientation playback device microphones |
US10181323B2 (en) | 2016-10-19 | 2019-01-15 | Sonos, Inc. | Arbitration-based voice recognition |
US11183181B2 (en) | 2017-03-27 | 2021-11-23 | Sonos, Inc. | Systems and methods of multiple voice services |
US10475449B2 (en) | 2017-08-07 | 2019-11-12 | Sonos, Inc. | Wake-word detection suppression |
US10048930B1 (en) | 2017-09-08 | 2018-08-14 | Sonos, Inc. | Dynamic computation of system response volume |
US10446165B2 (en) | 2017-09-27 | 2019-10-15 | Sonos, Inc. | Robust short-time fourier transform acoustic echo cancellation during audio playback |
US10051366B1 (en) | 2017-09-28 | 2018-08-14 | Sonos, Inc. | Three-dimensional beam forming with a microphone array |
US10621981B2 (en) | 2017-09-28 | 2020-04-14 | Sonos, Inc. | Tone interference cancellation |
US10482868B2 (en) | 2017-09-28 | 2019-11-19 | Sonos, Inc. | Multi-channel acoustic echo cancellation |
CN111052770B (en) * | 2017-09-29 | 2021-12-03 | 苹果公司 | Method and system for spatial audio down-mixing |
US10466962B2 (en) | 2017-09-29 | 2019-11-05 | Sonos, Inc. | Media playback system with voice assistance |
CN107890673A (en) * | 2017-09-30 | 2018-04-10 | 网易(杭州)网络有限公司 | Visual display method and device, storage medium, the equipment of compensating sound information |
US10880650B2 (en) | 2017-12-10 | 2020-12-29 | Sonos, Inc. | Network microphone devices with automatic do not disturb actuation capabilities |
US10818290B2 (en) | 2017-12-11 | 2020-10-27 | Sonos, Inc. | Home graph |
US11343614B2 (en) | 2018-01-31 | 2022-05-24 | Sonos, Inc. | Device designation of playback and network microphone device arrangements |
GB2572419A (en) * | 2018-03-29 | 2019-10-02 | Nokia Technologies Oy | Spatial sound rendering |
GB2572420A (en) * | 2018-03-29 | 2019-10-02 | Nokia Technologies Oy | Spatial sound rendering |
US11175880B2 (en) | 2018-05-10 | 2021-11-16 | Sonos, Inc. | Systems and methods for voice-assisted media content selection |
US10847178B2 (en) | 2018-05-18 | 2020-11-24 | Sonos, Inc. | Linear filtering for noise-suppressed speech detection |
US10959029B2 (en) | 2018-05-25 | 2021-03-23 | Sonos, Inc. | Determining and adapting to changes in microphone performance of playback devices |
CN108854069B (en) * | 2018-05-29 | 2020-02-07 | 腾讯科技(深圳)有限公司 | Sound source determination method and device, storage medium and electronic device |
US10681460B2 (en) | 2018-06-28 | 2020-06-09 | Sonos, Inc. | Systems and methods for associating playback devices with voice assistant services |
US10461710B1 (en) | 2018-08-28 | 2019-10-29 | Sonos, Inc. | Media playback system with maximum volume setting |
US11076035B2 (en) | 2018-08-28 | 2021-07-27 | Sonos, Inc. | Do not disturb feature for audio notifications |
EP3618464A1 (en) * | 2018-08-30 | 2020-03-04 | Nokia Technologies Oy | Reproduction of parametric spatial audio using a soundbar |
US10878811B2 (en) | 2018-09-14 | 2020-12-29 | Sonos, Inc. | Networked devices, systems, and methods for intelligently deactivating wake-word engines |
US10587430B1 (en) * | 2018-09-14 | 2020-03-10 | Sonos, Inc. | Networked devices, systems, and methods for associating playback devices based on sound codes |
US11024331B2 (en) | 2018-09-21 | 2021-06-01 | Sonos, Inc. | Voice detection optimization using sound metadata |
US10811015B2 (en) | 2018-09-25 | 2020-10-20 | Sonos, Inc. | Voice detection optimization based on selected voice assistant service |
US11100923B2 (en) | 2018-09-28 | 2021-08-24 | Sonos, Inc. | Systems and methods for selective wake word detection using neural network models |
US10692518B2 (en) | 2018-09-29 | 2020-06-23 | Sonos, Inc. | Linear filtering for noise-suppressed speech detection via multiple network microphone devices |
US11899519B2 (en) | 2018-10-23 | 2024-02-13 | Sonos, Inc. | Multiple stage network microphone device with reduced power consumption and processing load |
EP3654249A1 (en) | 2018-11-15 | 2020-05-20 | Snips | Dilated convolutions and gating for efficient keyword spotting |
US11183183B2 (en) | 2018-12-07 | 2021-11-23 | Sonos, Inc. | Systems and methods of operating media playback systems having multiple voice assistant services |
US11132989B2 (en) | 2018-12-13 | 2021-09-28 | Sonos, Inc. | Networked microphone devices, systems, and methods of localized arbitration |
US10602268B1 (en) | 2018-12-20 | 2020-03-24 | Sonos, Inc. | Optimization of network microphone devices using noise classification |
US11315556B2 (en) | 2019-02-08 | 2022-04-26 | Sonos, Inc. | Devices, systems, and methods for distributed voice processing by transmitting sound data associated with a wake word to an appropriate device for identification |
US10867604B2 (en) | 2019-02-08 | 2020-12-15 | Sonos, Inc. | Devices, systems, and methods for distributed voice processing |
WO2020178256A1 (en) | 2019-03-04 | 2020-09-10 | A-Volute | Apparatus and method for audio analysis |
US11120794B2 (en) | 2019-05-03 | 2021-09-14 | Sonos, Inc. | Voice assistant persistence across multiple network microphone devices |
US11361756B2 (en) | 2019-06-12 | 2022-06-14 | Sonos, Inc. | Conditional wake word eventing based on environment |
US10586540B1 (en) | 2019-06-12 | 2020-03-10 | Sonos, Inc. | Network microphone device with command keyword conditioning |
US11200894B2 (en) | 2019-06-12 | 2021-12-14 | Sonos, Inc. | Network microphone device with command keyword eventing |
CN116978387A (en) * | 2019-07-02 | 2023-10-31 | 杜比国际公司 | Method, apparatus and system for representation, encoding and decoding of discrete directional data |
US11138969B2 (en) | 2019-07-31 | 2021-10-05 | Sonos, Inc. | Locally distributed keyword detection |
US10871943B1 (en) | 2019-07-31 | 2020-12-22 | Sonos, Inc. | Noise classification for event detection |
US11138975B2 (en) | 2019-07-31 | 2021-10-05 | Sonos, Inc. | Locally distributed keyword detection |
GB2587196A (en) * | 2019-09-13 | 2021-03-24 | Nokia Technologies Oy | Determination of spatial audio parameter encoding and associated decoding |
US11189286B2 (en) | 2019-10-22 | 2021-11-30 | Sonos, Inc. | VAS toggle based on device orientation |
US12010493B1 (en) * | 2019-11-13 | 2024-06-11 | EmbodyVR, Inc. | Visualizing spatial audio |
US11291911B2 (en) | 2019-11-15 | 2022-04-05 | Microsoft Technology Licensing, Llc | Visualization of sound data extending functionality of applications/services including gaming applications/services |
US11200900B2 (en) | 2019-12-20 | 2021-12-14 | Sonos, Inc. | Offline voice control |
US11562740B2 (en) | 2020-01-07 | 2023-01-24 | Sonos, Inc. | Voice verification for media playback |
US11556307B2 (en) | 2020-01-31 | 2023-01-17 | Sonos, Inc. | Local voice data processing |
US11308958B2 (en) | 2020-02-07 | 2022-04-19 | Sonos, Inc. | Localized wakeword verification |
CN111372167B (en) * | 2020-02-24 | 2021-10-26 | Oppo广东移动通信有限公司 | Sound effect optimization method and device, electronic equipment and storage medium |
US11308962B2 (en) | 2020-05-20 | 2022-04-19 | Sonos, Inc. | Input detection windowing |
US11482224B2 (en) | 2020-05-20 | 2022-10-25 | Sonos, Inc. | Command keywords with input detection windowing |
US11727919B2 (en) | 2020-05-20 | 2023-08-15 | Sonos, Inc. | Memory allocation for keyword spotting engines |
US11698771B2 (en) | 2020-08-25 | 2023-07-11 | Sonos, Inc. | Vocal guidance engines for playback devices |
US11984123B2 (en) | 2020-11-12 | 2024-05-14 | Sonos, Inc. | Network device interaction by range |
US11551700B2 (en) | 2021-01-25 | 2023-01-10 | Sonos, Inc. | Systems and methods for power-efficient keyword detection |
Family Cites Families (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB9417185D0 (en) * | 1994-08-25 | 1994-10-12 | Adaptive Audio Ltd | Sounds recording and reproduction systems |
EP1224037B1 (en) | 1999-09-29 | 2007-10-31 | 1... Limited | Method and apparatus to direct sound using an array of output transducers |
JP2004144912A (en) * | 2002-10-23 | 2004-05-20 | Matsushita Electric Ind Co Ltd | Audio information conversion method, audio information conversion program, and audio information conversion device |
FI118247B (en) * | 2003-02-26 | 2007-08-31 | Fraunhofer Ges Forschung | Method for creating a natural or modified space impression in multi-channel listening |
US7643090B2 (en) | 2003-12-30 | 2010-01-05 | The Nielsen Company (Us), Llc. | Methods and apparatus to distinguish a signal originating from a local device from a broadcast signal |
WO2006006809A1 (en) * | 2004-07-09 | 2006-01-19 | Electronics And Telecommunications Research Institute | Method and apparatus for encoding and decoding multi-channel audio signal using virtual source location information |
EP1761110A1 (en) | 2005-09-02 | 2007-03-07 | Ecole Polytechnique Fédérale de Lausanne | Method to generate multi-channel audio signals from stereo signals |
KR100644715B1 (en) * | 2005-12-19 | 2006-11-10 | 삼성전자주식회사 | Method and apparatus for active audio matrix decoding |
US8560303B2 (en) | 2006-02-03 | 2013-10-15 | Electronics And Telecommunications Research Institute | Apparatus and method for visualization of multichannel audio signals |
JP3949701B1 (en) * | 2006-03-27 | 2007-07-25 | 株式会社コナミデジタルエンタテインメント | Voice processing apparatus, voice processing method, and program |
US8374365B2 (en) | 2006-05-17 | 2013-02-12 | Creative Technology Ltd | Spatial audio analysis and synthesis for binaural reproduction and format conversion |
US8379868B2 (en) * | 2006-05-17 | 2013-02-19 | Creative Technology Ltd | Spatial audio coding based on universal spatial cues |
US9014377B2 (en) * | 2006-05-17 | 2015-04-21 | Creative Technology Ltd | Multichannel surround format conversion and generalized upmix |
US8290167B2 (en) * | 2007-03-21 | 2012-10-16 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Method and apparatus for conversion between multi-channel audio formats |
US8908873B2 (en) | 2007-03-21 | 2014-12-09 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Method and apparatus for conversion between multi-channel audio formats |
US8841535B2 (en) | 2008-12-30 | 2014-09-23 | Karen Collins | Method and system for visual representation of sound |
EP2285139B1 (en) | 2009-06-25 | 2018-08-08 | Harpex Ltd. | Device and method for converting spatial audio signal |
US8208002B2 (en) | 2009-08-27 | 2012-06-26 | Polycom, Inc. | Distance learning via instructor immersion into remote classroom |
US8989401B2 (en) * | 2009-11-30 | 2015-03-24 | Nokia Corporation | Audio zooming process within an audio scene |
KR101081752B1 (en) * | 2009-11-30 | 2011-11-09 | 한국과학기술연구원 | Artificial Ear and Method for Detecting the Direction of a Sound Source Using the Same |
ES2656815T3 (en) | 2010-03-29 | 2018-02-28 | Fraunhofer-Gesellschaft Zur Förderung Der Angewandten Forschung | Spatial audio processor and procedure to provide spatial parameters based on an acoustic input signal |
FR2996094B1 (en) * | 2012-09-27 | 2014-10-17 | Sonic Emotion Labs | METHOD AND SYSTEM FOR RECOVERING AN AUDIO SIGNAL |
EP2733965A1 (en) | 2012-11-15 | 2014-05-21 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for generating a plurality of parametric audio streams and apparatus and method for generating a plurality of loudspeaker signals |
US9232337B2 (en) * | 2012-12-20 | 2016-01-05 | A-Volute | Method for visualizing the directional sound activity of a multichannel audio signal |
JP2014219467A (en) * | 2013-05-02 | 2014-11-20 | ソニー株式会社 | Sound signal processing apparatus, sound signal processing method, and program |
US9852737B2 (en) | 2014-05-16 | 2017-12-26 | Qualcomm Incorporated | Coding vectors decomposed from higher-order ambisonics audio signals |
US20150332682A1 (en) | 2014-05-16 | 2015-11-19 | Qualcomm Incorporated | Spatial relation coding for higher order ambisonic coefficients |
CN105392102B (en) * | 2015-11-30 | 2017-07-25 | 武汉大学 | Three-dimensional sound signal generation method and system for aspherical loudspeaker array |
2016
- 2016-09-19 EP EP16306190.6A patent/EP3297298B1/en active Active
2017
- 2017-09-18 US US15/707,129 patent/US20180084364A1/en not_active Abandoned
- 2017-09-19 WO PCT/EP2017/073565 patent/WO2018050905A1/en active Application Filing
- 2017-09-19 US US15/708,579 patent/US10085108B2/en active Active
- 2017-09-19 US US16/334,333 patent/US10536793B2/en active Active
- 2017-09-19 CN CN201780057585.2A patent/CN110089134B/en active Active
- 2017-09-19 TW TW106132102A patent/TWI770059B/en active
Also Published As
Publication number | Publication date |
---|---|
CN110089134A (en) | 2019-08-02 |
CN110089134B (en) | 2021-06-22 |
TWI770059B (en) | 2022-07-11 |
US10085108B2 (en) | 2018-09-25 |
TW201820898A (en) | 2018-06-01 |
US20190208349A1 (en) | 2019-07-04 |
US10536793B2 (en) | 2020-01-14 |
US20180084367A1 (en) | 2018-03-22 |
WO2018050905A1 (en) | 2018-03-22 |
EP3297298B1 (en) | 2020-05-06 |
EP3297298A1 (en) | 2018-03-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10085108B2 (en) | Method for visualizing the directional sound activity of a multichannel audio signal | |
US10645518B2 (en) | Distributed audio capture and mixing | |
EP3320692B1 (en) | Spatial audio processing apparatus | |
CN106416304B (en) | For the spatial impression of the enhancing of home audio | |
US9232337B2 (en) | Method for visualizing the directional sound activity of a multichannel audio signal | |
US9451379B2 (en) | Sound field analysis system | |
US9918174B2 (en) | Wireless exchange of data between devices in live events | |
US8180062B2 (en) | Spatial sound zooming | |
JP2023078432A (en) | Method and apparatus for decoding ambisonics audio soundfield representation for audio playback using 2d setups | |
US7386133B2 (en) | System for determining the position of a sound source | |
US20220141612A1 (en) | Spatial Audio Processing | |
CN102907120A (en) | System and method for sound processing | |
CN109314832A (en) | Acoustic signal processing method and equipment | |
Romblom et al. | Perceptual thresholds for non-ideal diffuse field reverberation | |
US10869151B2 (en) | Speaker system, audio signal rendering apparatus, and program | |
US10854210B2 (en) | Device and method for capturing and processing a three-dimensional acoustic field | |
Guthrie | Stage acoustics for musicians: A multidimensional approach using 3D ambisonic technology | |
US20190349704A1 (en) | Determining sound locations in multi-channel audio | |
JP6161962B2 (en) | Audio signal reproduction apparatus and method | |
EP4226651B1 (en) | A method of outputting sound and a loudspeaker | |
Tom | Automatic mixing systems for multitrack spatialization based on unmasking properties and directivity patterns | |
CN118741407A (en) | Recording and rendering method and device for outdoor environment field space audio |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- INCOMPLETE APPLICATION (PRE-EXAMINATION) |