WO2015086377A1 - Extraction of reverberant sound using microphone arrays - Google Patents
- Publication number
- WO2015086377A1 (PCT/EP2014/076252)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- microphone
- diffuse sound
- sound
- filter
- signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/027—Spatial or constructional arrangements of microphones, e.g. in dummy heads
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02082—Noise filtering the noise being echo, reverberation of the speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10592—Audio or video recording specifically adapted for recording or reproducing multichannel signals
- G11B2020/10601—Audio or video recording specifically adapted for recording or reproducing multichannel signals surround sound signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Definitions
- the following invention is in the field of acoustic analysis, spatial sound recording, microphone array signal processing, and spatial filtering.
- Some embodiments of the present invention relate to a method that can be used to determine the filter coefficients of a diffuse sound filter, i.e., a filter for extracting diffuse sound (reverberant sound) from the recordings with a microphone array.
- Some embodiments relate to a corresponding computer program.
- Some embodiments relate to an apparatus that can be used to determine the filter coefficients of a diffuse sound filter.
- Sound acquisition with microphone arrays in reverberant environments typically aims at capturing the direct sound of the sound sources while attenuating noise and reverberation. For many applications it would be beneficial to also extract the reverberant sound while suppressing the direct sound and noise. For instance, in spatial sound reproduction [Pulkki2007, Thiergart2013, Kowalczyk2013], the reverberation present at the recording side needs to be reproduced at the reproduction side to recreate the desired spatial impression. Moreover, given an estimate of the reverberant sound, we can compute parameters such as the signal-to-reverberation ratio or the reverberant sound power, which represent crucial information for various other applications.
- the diffuse sound filter has a directional response that is highly omnidirectional, with the exception of directions of arrival of direct sound components. A highly omnidirectional directional response is desired since the diffuse sound arrives from all directions at the microphone array.
- a method comprises defining a linear constraint for filter coefficients of a diffuse sound filter.
- the linear constraint is based on a spatial coherence between a first diffuse sound portion in a first microphone signal and a second diffuse sound portion in a second microphone signal
- the first microphone signal is captured by a first microphone and the second microphone signal is captured by a second microphone spaced apart from the first microphone in a known manner.
- the method also comprises calculating at least one of a direction of arrival of at least one direct sound, signal statistics over the first and second microphone signals, and noise statistics over the first and second microphone signals.
- the method further comprises determining the filter coefficients of the diffuse sound filter by solving an optimization problem concerning at least one of the direction of arrival of the at least one direct sound, the signal statistics, and the noise statistics while considering the linear constraint for the filter coefficients.
- Embodiments provide a computer program that implements the above-described method when executed on a computer or signal processor.
- the apparatus further comprises a filter coefficients calculator configured to determine the filter coefficients of the diffuse sound filter by solving an optimization problem concerning at least one of the direction of arrival of the at least one direct sound, the signal statistics, and the noise statistics while considering the linear constraint for the filter coefficients.
- Embodiments are based on the insight that a diffuse sound filter may be determined while taking into account at least one linear constraint that is related to the diffuse sound portions of the microphone signals.
- Fig. 1 shows a schematic block diagram of an approach for extracting diffuse sound with a single- channel filter
- Fig. 2 shows a schematic block diagram of an approach for extracting diffuse sound with a multichannel filter
- Fig. 3 shows a schematic block diagram of the proposed invention according to a first example for implementation
- Fig. 4 shows a schematic block diagram of the proposed invention according to a second example for implementation
- Fig. 5 shows a schematic block diagram of the proposed invention according to a third example for implementation
- Fig. 6 shows an example of a pick-up pattern resulting from the approach for extracting diffuse sound with a filter according to Fig. 2;
- Fig. 7 shows an example of a pick-up pattern resulting from the approach for extracting diffuse sound with a filter according to Fig. 4.
- Fig. 8 schematically illustrates a microphone array and different sound signals that are acquired by the microphone array.
- Direct sound: sound that arrives at the microphones mainly from a specific prominent direction.
- the direct sound can represent for instance the sound travelling directly from the sound source to the microphone or a distinct room reflection.
- Direct sounds can be for instance plane waves or spherical waves with a specific direction of arrival. When the direction of arrival of a direct sound is known, one can compute the relative transfer function of the direct sound between the microphones given that the microphone geometry is known.
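The computation mentioned above can be sketched as follows: for omnidirectional microphones and a plane wave with known DOA, the relative transfer functions between the microphones are pure phase shifts determined by the array geometry. The function name, array geometry, and sign convention for the propagation direction below are illustrative assumptions, not taken from the document.

```python
import numpy as np

def propagation_vector(doa_unit, mic_positions, freq, c=343.0, ref=0):
    """Relative transfer functions of a plane wave between the microphones.

    doa_unit      : unit-norm propagation direction n_l, shape (3,)
    mic_positions : (M, 3) array of microphone positions d_m in metres
    freq          : frequency in Hz; c is the speed of sound
    ref           : index of the reference microphone
    """
    k = 2.0 * np.pi * freq / c              # wavenumber
    delays = mic_positions @ doa_unit       # projection of positions onto the DOA
    a = np.exp(-1j * k * delays)            # absolute phase terms (sign is a convention)
    return a / a[ref]                       # phases relative to the reference microphone

# Example: 4-microphone uniform linear array, 5 cm spacing, wave from broadside.
# For broadside incidence all relative transfer functions equal 1.
mics = np.array([[0.05 * m, 0.0, 0.0] for m in range(4)])
a = propagation_vector(np.array([0.0, 1.0, 0.0]), mics, freq=2800.0)
```

For omnidirectional sensors every element has unit magnitude; only the phase varies with direction.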
- Diffuse sound: sound that arrives at the microphones from all directions.
- the diffuse sound can represent for instance the later reverberation present in a room.
- no prominent direction-of-arrival can be associated with a diffuse sound (isotropic sound field), i.e., the sound arrives with equal mean power from all directions.
- the relative transfer functions of the diffuse sound between the microphones must be assumed random and unobservable.
- the mean relative transfer functions of the diffuse sound between the microphones are usually known for specific microphone setups and diffuse field assumptions or can be measured.
- M denotes the number of microphones used.
- the microphones capture L plane waves (referred to as direct sound) propagating in a diffuse field.
- the DOA of the l-th plane wave is represented by the unit-norm vector n_l(k, n).
- the signal of the m-th (omnidirectional) microphone can be written as X_m(k, n) = sum_{l=1}^{L} X_l(k, n, d_m) + X_d(k, n, d_m) + X_n(k, n, d_m), where
- X_l(k, n, d_m) is the sound pressure of the l-th plane wave,
- X_d(k, n, d_m) is the diffuse sound,
- X_n(k, n, d_m) is a stationary noise (e.g., self-noise or background noise), and
- d_m is a vector describing the position of the m-th microphone in a given coordinate system.
- the aim of this invention is to estimate the diffuse sound X_d(k, n, d_m) at position d_m.
- An estimate of the diffuse sound is found by multiplying one of the microphone signals, for example the microphone signal of the first microphone X_1(k, n), with a filter H(k, n), e.g., X_d(k, n, d_1) ≈ H(k, n) X_1(k, n).
- the filter H(k, n) is a Wiener filter, which is given by H(k, n) = Φ_d(k, n) / (Φ_d(k, n) + Φ_u(k, n)),
- where Φ_d is the power of the diffuse sound and Φ_u is the combined power of the plane waves and the stationary noise.
- alternatively, the square-root Wiener filter, i.e., the square root of H(k, n), can be used.
- SDR signal-to-diffuse ratio
- H(k, n) can be found by estimating the so-called diffuseness, as described in [Pulkki2007, Thiergart2013].
- Estimating the SDR or diffuseness typically requires more than one microphone. Nevertheless, the diffuse sound is finally obtained by filtering a single microphone signal.
- An example system for extracting the diffuse sound with a single-channel filter is illustrated in Fig. 1.
- the SDR (or alternatively the diffuseness) is estimated from the multiple microphone signals.
- the filter H(k, n) is computed from this information.
- the filter H(k, n) is multiplied with a single microphone signal to obtain the diffuse sound estimate.
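The single-channel approach above can be sketched per time-frequency bin as follows. The function name, the small regularization floor, and the toy power values are illustrative assumptions, not taken from the document.

```python
import numpy as np

def diffuse_wiener_gain(phi_d, phi_u, square_root=False):
    """Single-channel Wiener gain H = phi_d / (phi_d + phi_u).

    phi_d : estimated diffuse sound power for a time-frequency bin
    phi_u : combined power of the plane waves and the stationary noise
    """
    H = phi_d / (phi_d + phi_u + 1e-12)   # tiny floor avoids division by zero
    return np.sqrt(H) if square_root else H

# Toy example: diffuse power 2.0, direct-plus-noise power 6.0.
phi_d, phi_u = 2.0, 6.0
H = diffuse_wiener_gain(phi_d, phi_u)                      # 2 / 8 = 0.25
H_sqrt = diffuse_wiener_gain(phi_d, phi_u, square_root=True)
```

Multiplying H (or its square root) with a single microphone signal in each bin yields the diffuse sound estimate, as in Fig. 1.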
- Multi-channel filters consider M > 1 microphones. Such filters have been used for instance in [Thiergart2013b, Kowalczyk2013]. For the following derivations, let us represent the M microphone signals by a vector x(k, n) = [X_1(k, n), X_2(k, n), ..., X_M(k, n)]^T.
- the diffuse sound at the m-th microphone is estimated via a linear combination of the microphone signals, i.e., X_d(k, n, d_m) ≈ w_m^H(k, n) x(k, n).
- the straightforward way to find an appropriate filter is to compute the weights w_m such that the L plane waves are suppressed while the stationary noise X_n(k, n, d_m), which is contained in the microphone signals, is minimized.
- a_l is the so-called propagation vector. Its elements are the relative transfer functions of the l-th plane wave from the m-th microphone to the other microphones.
- w_m is a column vector of length M (remember: only the diffuse sound at the m-th microphone is estimated by the w_m-weighted linear combination of the M microphone signals; the diffuse sound at the other microphones is substantially redundant, as these signals are related via relative transfer functions from the m-th microphone to the other microphones and could be calculated in this manner, if needed).
- the DOA n_0, to which the vector a_0 corresponds, is found by choosing the direction which has the largest angular distance to all DOAs n_l(k, n) of the plane waves. For instance, if a single plane wave is arriving from 0 degrees, then n_0 would correspond to 180 degrees.
- the DOA n_0 does not guarantee that we obtain a diffuse sound estimate with as little noise as possible.
- the resulting pick-up pattern is not well suited for capturing diffuse sound, since it becomes highly directive at higher frequencies. This is a drawback when aiming at capturing diffuse sound from all directions.
- An example of a resulting pick-up pattern is depicted in Fig. 6.
- two direct sounds arrive from azimuth directions of 51° and 97°.
- the figure shows the resulting pick-up pattern at a frequency of 2.8 kHz when using a uniform linear array of 16 microphones with 5 cm microphone spacing.
- the pick-up pattern possesses nulls exactly at 51° and 97° and a high gain at 180°, which corresponds to the direction n_0.
- the pick-up pattern has multiple other spatial nulls or low gains for almost all other directions. This pick-up pattern is not suitable for capturing diffuse sound that arrives from all directions.
- the direct sound constraints a_l are directly related to the DOAs of the direct sounds.
- A desired pick-up pattern, which cannot be achieved with the spatial filter in this subsection, is depicted in Fig. 7.
- This pick-up pattern has two spatial nulls for the DOAs of the direct sounds, but otherwise it is almost omnidirectional.
- This pick-up pattern is achieved by using the proposed filter which is described below in connection with Fig. 7.
- the filter weights can be computed. Applying these weights to the microphone signals yields the desired estimate of the diffuse sound. It is clear from this description that the obtained filter depends only on the direct sound (i.e., on the DOAs and the corresponding relative transfer functions of the plane waves between the microphones), but not on the diffuse sound. This means that the filter does not consider potentially available information on the diffuse sound, even though it is used to estimate the diffuse sound.
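A minimal sketch of such a linearly constrained filter, using the standard closed-form LCMV solution w = Φ⁻¹ C (Cᴴ Φ⁻¹ C)⁻¹ g known from the array-processing literature. The white-noise PSD matrix and the randomly generated constraint vectors below are illustrative assumptions, not values from the document.

```python
import numpy as np

def lcmv_weights(Phi_n, constraints, responses):
    """Closed-form LCMV: minimise w^H Phi_n w subject to C^H w = g."""
    C = np.column_stack(constraints)             # M x (number of constraints)
    g = np.asarray(responses, dtype=complex)     # desired responses per constraint
    Pinv_C = np.linalg.solve(Phi_n, C)           # Phi_n^{-1} C
    return Pinv_C @ np.linalg.solve(C.conj().T @ Pinv_C, g)

# Toy example: 4 microphones, spatially white noise, one direct sound to null
# (constraint vector a1) and unit gain toward the direction n_0 (vector a0).
M = 4
rng = np.random.default_rng(0)
a1 = np.exp(1j * rng.uniform(0, 2 * np.pi, M))   # hypothetical propagation vector
a0 = np.exp(1j * rng.uniform(0, 2 * np.pi, M))   # hypothetical vector for n_0
w = lcmv_weights(np.eye(M), [a0, a1], [1.0, 0.0])
```

The resulting weights place an exact null on the direct sound while preserving the chosen direction, which reproduces the behaviour described for the state-of-the-art filter.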
- the proposed spatial filter is characterized by a directivity pattern, which tends to an omnidirectional pattern, except for the directions-of-arrival (DOAs) of the direct sound for which it exhibits spatial nulls. This represents a highly desired property for capturing diffuse sound from all directions with low distortion.
- DOAs directions-of-arrival
- the diffuse sound pressure at the m-th microphone is estimated by performing a linear combination of the microphone signals, i.e.,
- X_d(k, n, d_m) ≈ w_m^H(k, n) x(k, n).
- the weight-vector w m which is proposed in the following, minimizes a specific cost function and is linearly constrained similarly to the multi-channel filters described above.
- the proposed novel constraint is not a function of the DOAs of the plane waves or the corresponding relative transfer functions of the plane waves between the microphones, respectively.
- the proposed novel constraint depends on statistical information on the diffuse sound, i.e., the proposed novel constraint depends on the relative transfer functions of the diffuse sound between the microphones.
- the proposed novel constraint is a function of the coherence or correlation of the diffuse sound between the microphones. This coherence corresponds to the mean relative transfer function of the diffuse sound between the microphones.
- the proposed spatial filter is obtained by minimizing a specific cost function while satisfying a distortionless constraint for the diffuse sound. This constraint corresponds to the relative transfer function of the diffuse sound between the microphones.
- the filter is computed as w_m(k, n) = argmin_w J subject to the linear constraint w^H b = 1.
- J is the cost function to be minimized by the filter.
- the cost function can be for instance the stationary noise power at the filter output, the interfering energy at the filter output, or the quadratic error of the estimated diffuse sound. Examples for J will be provided in the embodiments.
- the m'-th element of b is the relative transfer function B_{m',m} of the diffuse sound between microphones m and m'. This relative transfer function is given by B_{m',m}(k, n) = X_d(k, n, d_{m'}) / X_d(k, n, d_m).
- the relative transfer functions in b typically cannot be estimated in practice since they are basically random, i.e., there is a different realization of the transfer function for each k and n.
- instead, B_{m',m} is computed as the mean relative transfer function between microphones m' and m.
- this mean relative transfer function corresponds to the so-called spatial coherence γ_{m',m} of the diffuse sound between microphones m' and m, which is defined as γ_{m',m}(k, n) = E{X_d(k, n, d_{m'}) X_d*(k, n, d_m)} / sqrt(E{|X_d(k, n, d_{m'})|^2} E{|X_d(k, n, d_m)|^2}), where (·)* denotes the complex conjugate.
- This spatial coherence describes the correlation of the diffuse sound between microphone m and m' in the frequency domain. This coherence depends on the specific diffuse sound field. The coherence can be measured in advance for a given room. Alternatively, the coherence is known from theory for specific diffuse sound fields [Elko2001].
- the diffuse sound constraint b is conceptually very different from the direct sound constraints a_l and a_0. Therefore, the novel filter proposed in this section is conceptually very different compared to the multi-channel filters described above.
- A block scheme of the proposed invention is depicted in Fig. 3.
- the M microphone signals are transformed into the time-frequency domain (or another suitable domain for the signal processing) using a filterbank (FB) (101).
- FB filterbank
- the diffuse sound constraint vector is either estimated from the signal, or it corresponds for instance to the theoretical spatial coherence for a specific assumed diffuse field as mentioned before.
- specific statistics e.g., noise statistics
- This information, which is usually represented as a PSD matrix Φ(k, n), is used to generate the cost function J which has to be minimized by the filter.
- the filter-weights that minimize the cost function subject to the diffuse sound constraint are computed in block (103).
- this embodiment uses a spatial filter that minimizes the entire output of the filter subject to the diffuse sound constraint.
- the diffuse sound constraint ensures that the diffuse sound is preserved by the spatial filter while the remaining signal parts (undesired stationary noise and plane waves) are minimized.
- the filter weights w_m are computed as w_m(k, n) = Φ_x^{-1}(k, n) b / (b^H Φ_x^{-1}(k, n) b),
- where Φ_x is the PSD matrix of the microphone signals, which can be computed as Φ_x(k, n) = E{x(k, n) x^H(k, n)}, where x(k, n) is the vector containing the microphone signals.
- the expectation is approximated for instance by a temporal averaging.
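A sketch combining the recursive temporal averaging of Φ_x with the constrained minimum-output-power filter described above. The forgetting factor, the fully coherent toy diffuse constraint b, and the synthetic signals are illustrative assumptions, not values from the document.

```python
import numpy as np

def update_psd(Phi_prev, x, alpha=0.9):
    """Recursive temporal average approximating Phi_x = E{x x^H}."""
    return alpha * Phi_prev + (1.0 - alpha) * np.outer(x, x.conj())

def diffuse_filter(Phi_x, b):
    """Minimise the filter output power subject to the distortionless diffuse
    sound constraint w^H b = 1 (closed form w = Phi_x^{-1} b / (b^H Phi_x^{-1} b))."""
    Pinv_b = np.linalg.solve(Phi_x, b)
    return Pinv_b / np.vdot(b, Pinv_b)

# Toy run: accumulate signal statistics from synthetic complex samples.
M = 4
Phi = np.eye(M, dtype=complex)
rng = np.random.default_rng(1)
for _ in range(50):
    x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    Phi = update_psd(Phi, x)
b = np.ones(M, dtype=complex)        # toy constraint: fully coherent diffuse field
w = diffuse_filter(Phi, b)
```

By construction the weights satisfy the diffuse sound constraint exactly while steering as little energy as possible to everything else captured in Φ_x.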
- the spatial coherence may be either estimated from the microphone signals (during periods where only the diffuse sound is present) or given as a priori information assuming a specific diffuse sound field. In the latter case, we use for instance the spatial coherence for a spherically isotropic diffuse sound field, i.e., γ_{m',m}(k) = sinc(κ r_{m',m}), where κ is the wavenumber and r_{m',m} is the distance between microphones m' and m.
- the sinc function might be replaced by other functions depending on the assumed sound field.
- other coherence functions are known a priori; examples can be found in [Elko2001].
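The spherically isotropic coherence can be sketched as follows. Note that numpy's `sinc` is the normalized sin(πx)/(πx), hence the division by π; the frequencies and spacings used are illustrative.

```python
import numpy as np

def spherical_coherence(freq, distance, c=343.0):
    """Spatial coherence of a spherically isotropic diffuse field between two
    microphones: gamma = sin(kappa * r) / (kappa * r), with kappa the wavenumber."""
    kappa_r = 2.0 * np.pi * freq * distance / c
    return np.sinc(kappa_r / np.pi)       # np.sinc(x) = sin(pi x) / (pi x)

# Coherence is 1 at DC and decays as spacing or frequency grows.
g0 = spherical_coherence(0.0, 0.05)       # co-located in the limit -> 1
g = spherical_coherence(2800.0, 0.05)     # 5 cm spacing at 2.8 kHz
g_far = spherical_coherence(2800.0, 1.0)  # 1 m spacing: nearly incoherent
```

Tabulating this function over frequency for each microphone pair yields the constraint vector b (and the coherence matrix Γ_d) without any measurement, under the spherically isotropic assumption.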
- A block scheme of this embodiment is shown in Fig. 3. After transforming the microphone signals with a filterbank (101), we compute the signal PSD matrix Φ_x in the signal statistics estimation block.
- the filter computed in this embodiment has the following advantages compared to other spatial filters (e.g., the filters described in the background art):
- the filter provides an optimal trade-off between attenuation of the L plane waves and the stationary noise.
- This embodiment represents a combination of the novel approach and the state-of-the-art approach of multi-channel filters described above in connection with Fig. 2. In this embodiment, we define a linearly constrained spatial filter that minimizes the stationary noise at the filter output subject to the diffuse constraint and additional directional constraints.
- the filter minimizes only the stationary noise at the output.
- the undesired plane waves are suppressed with the second linear constraints (as explained above for the multi-channel filters, Fig. 2).
- These additional constraints, compared to the output-power-minimizing filter according to Fig. 3, ensure an even stronger suppression of the interfering plane waves.
- the resulting filter still preserves the diffuse sound due to the first linear constraint.
- the vectors a_l depend on the DOAs of the L plane waves and can be computed as well known from the literature [VanTrees2002].
- the elements of b describe the correlation or coherence of the diffuse sound between the microphones.
- the elements of b are computed as explained in connection with Fig. 3.
- Φ_n is the PSD matrix of the stationary noise. This PSD matrix can be estimated e.g. during speech pauses. If the stationary noise in the different microphones is mutually independent, we can simply replace Φ_n by the identity matrix of size M by M.
- A block scheme of this embodiment is shown in Fig. 4. After transforming the microphone signals with a filterbank (101), we compute the PSD matrix Φ_n of the stationary noise in the noise statistics estimation block (104). Moreover, we compute the linear diffuse sound constraint b in block (102) either from the signal or using a priori information assuming a specific diffuse sound field.
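The noise-minimizing filter with the combined constraint set (diffuse sound preserved, each direct sound nulled) can be sketched with the standard closed-form LCMV solution. The identity noise PSD and the randomly generated direct sound constraint vectors are toy assumptions, not values from the document.

```python
import numpy as np

def constrained_diffuse_filter(Phi_n, b, direct_constraints):
    """Minimise the stationary noise w^H Phi_n w subject to w^H b = 1
    (preserve the diffuse sound) and w^H a_l = 0 (null each direct sound)."""
    C = np.column_stack([b] + list(direct_constraints))  # constraint matrix
    g = np.zeros(C.shape[1], dtype=complex)
    g[0] = 1.0                                           # distortionless for b only
    Pinv_C = np.linalg.solve(Phi_n, C)                   # Phi_n^{-1} C
    return Pinv_C @ np.linalg.solve(C.conj().T @ Pinv_C, g)

# Toy example: 6 microphones, mutually independent noise (identity PSD),
# two direct sounds with hypothetical propagation vectors a1, a2.
M = 6
rng = np.random.default_rng(2)
b = np.ones(M, dtype=complex)                    # toy diffuse sound constraint
a1 = np.exp(1j * rng.uniform(0, 2 * np.pi, M))
a2 = np.exp(1j * rng.uniform(0, 2 * np.pi, M))
w = constrained_diffuse_filter(np.eye(M), b, [a1, a2])
```

The exact nulls on a1 and a2 give the stronger direct sound suppression described above, while the first constraint keeps the diffuse sound undistorted.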
- An example of a resulting pick-up pattern for this filter is depicted in Fig. 7.
- two direct sounds arrive from azimuth directions of 51° and 97°.
- the figure shows the resulting pick-up pattern at a frequency of 2.8 kHz when using a uniform linear array of 16 microphones with 5 cm microphone spacing.
- the pick-up pattern possesses nulls exactly at 51° and 97°. Otherwise it is almost omnidirectional.
- This is a major advantage over the state-of-the-art spatial filter shown in Fig. 2, which yields the pick-up pattern in Fig. 6.
- the filter computed in this embodiment has the following advantages compared to other spatial filters (e.g., the filters described in the background art):
- the spatial filters shown in Figs. 3 and 4 in general provide a good performance in practice. However, they also suffer from specific drawbacks. For instance the filter of Fig. 3 typically does not suppress the direct sound completely. The remaining power of the direct sound can lead to undesired effects during spatial sound reproduction. In contrast, the spatial filter of Fig. 4 provides a comparatively poor robustness against the stationary noise at low frequencies.
- PMWF parametric multi-channel Wiener filter
- MMSE minimum mean square error
- the spatial filter in Fig. 4 is used to estimate specific required quantities.
- α ∈ [0, 1] is a user-defined control parameter.
- with α we can scale between the two spatial filters: a smaller α yields a better noise and interference suppression, while a higher α yields a diffuse sound estimate with less distortion.
- the power of the diffuse sound Φ_d can be computed with the spatial filter proposed in Fig. 4, which provides a very good suppression of the plane waves.
- let w_1 denote the spatial filter in Fig. 4 for estimating the diffuse sound at the first microphone.
- the diffuse sound power at the first microphone can for instance be obtained as described in [Thiergart2013b], i.e., Φ_d(k, n) = (w_1^H Φ_x w_1 − w_1^H Φ_n w_1) / (w_1^H Γ_d w_1),
- where Γ_d is the M × M spatial coherence matrix for the diffuse sound.
- the (m, m')-th element of Γ_d is the spatial coherence γ_{m,m'} between microphones m and m'. This spatial coherence was already defined above.
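Under the assumption that the filter w_1 has already nulled the direct sounds, the filter output power contains only diffuse and noise contributions, so w_1^H Φ_x w_1 = Φ_d (w_1^H Γ_d w_1) + w_1^H Φ_n w_1 can be solved for Φ_d. The toy coherence matrix and power values below are illustrative, not values from the document.

```python
import numpy as np

def diffuse_power(w, Phi_x, Phi_n, Gamma_d):
    """Diffuse sound power estimate at the reference microphone, assuming the
    filter w suppresses the direct sounds completely."""
    num = np.real(np.vdot(w, Phi_x @ w) - np.vdot(w, Phi_n @ w))
    den = np.real(np.vdot(w, Gamma_d @ w))
    return max(num, 0.0) / den            # clip negative estimates to zero

# Synthetic check: build Phi_x from a known diffuse power and noise PSD.
M = 4
Gamma_d = np.eye(M)                        # toy coherence (fully incoherent field)
Phi_n = 0.1 * np.eye(M)                    # hypothetical noise PSD
phi_d_true = 2.0
Phi_x = phi_d_true * Gamma_d + Phi_n       # direct sounds assumed already nulled
w = np.ones(M) / M                         # toy filter weights
est = diffuse_power(w, Phi_x, Phi_n, Gamma_d)
```

On this synthetic input the estimator recovers the true diffuse power exactly, since the model used to build Φ_x matches the one the estimator assumes.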
- a block scheme of this embodiment is shown in Fig. 5.
- Moreover, we compute the linear diffuse sound constraint b in block (102) either from the signal or using a priori information assuming a specific diffuse sound field.
- in block (105) we estimate the DOAs of the L plane waves. From this information, we compute the direct sound constraints a_l in block (106). These constraints are used in (107) together with Φ_n to compute the weights w_1.
- the power of the diffuse sound Φ_d is computed in (108) from w_1 and the signal statistics Φ_x.
- the final weights of the spatial filter w_m can then be computed in (103) using Φ_d, the signal statistics, and the diffuse sound constraint b.
- with the parameter α one can scale the spatial filter between the MMSE filter and the PMWF. Multiplying the weights w_m with the microphone signals yields the desired estimate of the diffuse sound.
- Fig. 8 schematically illustrates a microphone array comprising M microphones.
- the microphone array is exposed to a sound field comprising direct sound portions and diffuse sound portions.
- the propagation direction of each plane wave (at the location of the microphone array) is indicated in Fig. 8 by the vectors n_1 to n_L.
- the direct sound portions are typically a function of the location d_m.
- the diffuse sound X_d(k, n, d_m) can model for instance the reverberation in a room.
- the diffuse sound is assumed to be generated by an infinite sum of plane waves with random phases, random magnitude, and random DOAs. This means, the diffuse sound is generated by an infinite number of sources randomly distributed around the sound scene. These sound sources model the infinite number of room reflections which generate the late reverberation.
- Relative transfer functions B_{1,m}, B_{2,m}, ..., B_{M,m} for the diffuse sound between the other microphones and the m-th microphone are schematically illustrated in Fig. 8.
- the relative transfer function B_{m,m} from the m-th microphone to itself (not depicted in Fig. 8) is typically equal to 1.
- Although aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software.
- the implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.
- Some embodiments according to the invention comprise a non-transitory data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is per-formed.
- embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer.
- the program code may for example be stored on a machine readable carrier.
- other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier.
- an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.
- a further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.
- a further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein.
- the data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.
- a further embodiment comprises a processing means, for example a computer or a programmable logic device, configured to or adapted to perform one of the methods described herein.
- a further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.
- a programmable logic device, for example a field programmable gate array, may be used to perform some or all of the functionalities of the methods described herein; a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein.
- the methods are preferably performed by any hardware apparatus.
- Pulkki2007: V. Pulkki, "Spatial sound reproduction with directional audio coding," J. Audio Eng. Soc., vol. 55, no. 6, pp. 503-516, June 2007.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Otolaryngology (AREA)
- General Health & Medical Sciences (AREA)
- Circuit For Audible Band Transducer (AREA)
- Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
Priority Applications (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP14805624.5A EP3080806B1 (en) | 2013-12-11 | 2014-12-02 | Extraction of reverberant sound using microphone arrays |
| CN201480066907.6A CN105981404B (zh) | 2013-12-11 | 2014-12-02 | 使用麦克风阵列的混响声的提取 |
| BR112016013366-8A BR112016013366B1 (pt) | 2013-12-11 | 2014-12-02 | Extração de som reverberante utilizando redes de microfones |
| JP2016534922A JP6389259B2 (ja) | 2013-12-11 | 2014-12-02 | マイクロホンアレイを使用した残響音の抽出 |
| RU2016127191A RU2640742C1 (ru) | 2013-12-11 | 2014-12-02 | Извлечение реверберирующего звука с использованием микрофонных массивов |
| US15/178,530 US9984702B2 (en) | 2013-12-11 | 2016-06-09 | Extraction of reverberant sound using microphone arrays |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP13196672.3 | 2013-12-11 | ||
| EP13196672 | 2013-12-11 | ||
| EP14156014.4 | 2014-02-20 | ||
| EP14156014.4A EP2884491A1 (en) | 2013-12-11 | 2014-02-20 | Extraction of reverberant sound using microphone arrays |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/178,530 Continuation US9984702B2 (en) | 2013-12-11 | 2016-06-09 | Extraction of reverberant sound using microphone arrays |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2015086377A1 true WO2015086377A1 (en) | 2015-06-18 |
Family
ID=50230835
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2014/076252 Ceased WO2015086377A1 (en) | 2013-12-11 | 2014-12-02 | Extraction of reverberant sound using microphone arrays |
Country Status (7)
| Country | Link |
|---|---|
| US (1) | US9984702B2 (en) |
| EP (2) | EP2884491A1 (en) |
| JP (1) | JP6389259B2 (ja) |
| CN (1) | CN105981404B (zh) |
| BR (1) | BR112016013366B1 (pt) |
| RU (1) | RU2640742C1 (ru) |
| WO (1) | WO2015086377A1 (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105869651A (zh) * | 2016-03-23 | 2016-08-17 | 北京大学深圳研究生院 | 基于噪声混合相干性的双通道波束形成语音增强方法 |
| JPWO2015129760A1 (ja) * | 2014-02-28 | 2017-03-30 | 日本電信電話株式会社 | 信号処理装置、方法及びプログラム |
| US10524072B2 (en) | 2016-03-15 | 2019-12-31 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method or computer program for generating a sound field description |
| CN111149155A (zh) * | 2017-07-14 | 2020-05-12 | 弗劳恩霍夫应用研究促进协会 | 使用多点声场描述生成经增强的声场描述或经修改的声场描述的概念 |
| US10923132B2 (en) | 2016-02-19 | 2021-02-16 | Dolby Laboratories Licensing Corporation | Diffusivity based sound processing method and apparatus |
| CN113257270A (zh) * | 2021-05-10 | 2021-08-13 | 中国科学技术大学 | 一种基于参考麦克风优化的多通道语音增强方法 |
Families Citing this family (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2521649B (en) * | 2013-12-27 | 2018-12-12 | Nokia Technologies Oy | Method, apparatus, computer program code and storage medium for processing audio signals |
| WO2018053050A1 (en) * | 2016-09-13 | 2018-03-22 | VisiSonics Corporation | Audio signal processor and generator |
| US10187740B2 (en) * | 2016-09-23 | 2019-01-22 | Apple Inc. | Producing headphone driver signals in a digital audio signal processing binaural rendering environment |
| US9813833B1 (en) * | 2016-10-14 | 2017-11-07 | Nokia Technologies Oy | Method and apparatus for output signal equalization between microphones |
| US11528556B2 (en) | 2016-10-14 | 2022-12-13 | Nokia Technologies Oy | Method and apparatus for output signal equalization between microphones |
| US10056091B2 (en) * | 2017-01-06 | 2018-08-21 | Bose Corporation | Microphone array beamforming |
| US10219098B2 (en) * | 2017-03-03 | 2019-02-26 | GM Global Technology Operations LLC | Location estimation of active speaker |
| CN106960672B (zh) * | 2017-03-30 | 2020-08-21 | 国家计算机网络与信息安全管理中心 | 一种立体声音频的带宽扩展方法与装置 |
| GB2562518A (en) | 2017-05-18 | 2018-11-21 | Nokia Technologies Oy | Spatial audio processing |
| EP3573058B1 (en) * | 2018-05-23 | 2021-02-24 | Harman Becker Automotive Systems GmbH | Dry sound and ambient sound separation |
| JP7173355B2 (ja) * | 2019-08-08 | 2022-11-16 | 日本電信電話株式会社 | Psd最適化装置、psd最適化方法、プログラム |
| JP7173356B2 (ja) * | 2019-08-08 | 2022-11-16 | 日本電信電話株式会社 | Psd最適化装置、psd最適化方法、プログラム |
| WO2021252912A1 (en) | 2020-06-11 | 2021-12-16 | Dolby Laboratories Licensing Corporation | Separation of panned sources from generalized stereo backgrounds using minimal training |
| CN113963712B (zh) * | 2020-07-21 | 2025-07-25 | 华为技术有限公司 | 滤除回声的方法、电子设备和计算机可读存储介质 |
| CN112017684B (zh) * | 2020-08-27 | 2022-06-24 | 北京计算机技术及应用研究所 | 一种基于麦克风阵列的密闭空间混响消除方法 |
| CN115862665B (zh) * | 2023-02-27 | 2023-06-16 | 广州市迪声音响有限公司 | 一种回声混响效果参数的可视化曲线界面系统 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080069366A1 (en) * | 2006-09-20 | 2008-03-20 | Gilbert Arthur Joseph Soulodre | Method and apparatus for extracting and changing the reverberant content of an input signal |
| US20090252355A1 (en) * | 2008-04-07 | 2009-10-08 | Sony Computer Entertainment Inc. | Targeted sound detection and generation for audio headset |
| US20090299742A1 (en) * | 2008-05-29 | 2009-12-03 | Qualcomm Incorporated | Systems, methods, apparatus, and computer program products for spectral contrast enhancement |
| US20100169103A1 (en) * | 2007-03-21 | 2010-07-01 | Ville Pulkki | Method and apparatus for enhancement of audio reconstruction |
| US20120128160A1 (en) * | 2010-10-25 | 2012-05-24 | Qualcomm Incorporated | Three-dimensional sound capturing and reproducing with multi-microphones |
Family Cites Families (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR100493172B1 (ko) * | 2003-03-06 | 2005-06-02 | 삼성전자주식회사 | 마이크로폰 어레이 구조, 이를 이용한 일정한 지향성을갖는 빔 형성방법 및 장치와 음원방향 추정방법 및 장치 |
| GB0321722D0 (en) * | 2003-09-16 | 2003-10-15 | Mitel Networks Corp | A method for optimal microphone array design under uniform acoustic coupling constraints |
| GB0405455D0 (en) * | 2004-03-11 | 2004-04-21 | Mitel Networks Corp | High precision beamsteerer based on fixed beamforming approach beampatterns |
| JP4177413B2 (ja) * | 2004-07-20 | 2008-11-05 | パイオニア株式会社 | 音響再生装置および音響再生システム |
| RU2343562C1 (ru) * | 2007-04-23 | 2009-01-10 | Федеральное государственное образовательное учреждение высшего профессионального образования "Санкт-Петербургский государственный университет кино и телевидения" "СПбГУКиТ" | Способ и электронное устройство оптимизации времени реверберации при передаче звуковых сигналов |
| WO2011104146A1 (en) * | 2010-02-24 | 2011-09-01 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus for generating an enhanced downmix signal, method for generating an enhanced downmix signal and computer program |
| ES2656815T3 (es) * | 2010-03-29 | 2018-02-28 | Fraunhofer-Gesellschaft Zur Förderung Der Angewandten Forschung | Procesador de audio espacial y procedimiento para proporcionar parámetros espaciales en base a una señal de entrada acústica |
| WO2011129725A1 (en) * | 2010-04-12 | 2011-10-20 | Telefonaktiebolaget L M Ericsson (Publ) | Method and arrangement for noise cancellation in a speech encoder |
| US9100734B2 (en) * | 2010-10-22 | 2015-08-04 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for far-field multi-source tracking and separation |
| EP2444967A1 (en) * | 2010-10-25 | 2012-04-25 | Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. | Echo suppression comprising modeling of late reverberation components |
| EP2708043B1 (en) * | 2011-05-11 | 2020-06-03 | Sennheiser Electronic GmbH & Co. KG | Method for efficient sound field control of a compact loudspeaker array |
| JP5817366B2 (ja) * | 2011-09-12 | 2015-11-18 | 沖電気工業株式会社 | 音声信号処理装置、方法及びプログラム |
| JP5897343B2 (ja) | 2012-02-17 | 2016-03-30 | 株式会社日立製作所 | 残響除去パラメータ推定装置及び方法、残響・エコー除去パラメータ推定装置、残響除去装置、残響・エコー除去装置、並びに、残響除去装置オンライン会議システム |
| JP5738218B2 (ja) * | 2012-02-28 | 2015-06-17 | 日本電信電話株式会社 | 音響信号強調装置、遠近判定装置、それらの方法、及びプログラム |
| US9495591B2 (en) * | 2012-04-13 | 2016-11-15 | Qualcomm Incorporated | Object recognition using multi-modal matching scheme |
- 2014
- 2014-02-20 EP EP14156014.4A patent/EP2884491A1/en not_active Withdrawn
- 2014-12-02 WO PCT/EP2014/076252 patent/WO2015086377A1/en not_active Ceased
- 2014-12-02 EP EP14805624.5A patent/EP3080806B1/en active Active
- 2014-12-02 JP JP2016534922A patent/JP6389259B2/ja active Active
- 2014-12-02 RU RU2016127191A patent/RU2640742C1/ru active
- 2014-12-02 BR BR112016013366-8A patent/BR112016013366B1/pt active IP Right Grant
- 2014-12-02 CN CN201480066907.6A patent/CN105981404B/zh active Active
- 2016
- 2016-06-09 US US15/178,530 patent/US9984702B2/en active Active
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080069366A1 (en) * | 2006-09-20 | 2008-03-20 | Gilbert Arthur Joseph Soulodre | Method and apparatus for extracting and changing the reverberant content of an input signal |
| US20100169103A1 (en) * | 2007-03-21 | 2010-07-01 | Ville Pulkki | Method and apparatus for enhancement of audio reconstruction |
| US20090252355A1 (en) * | 2008-04-07 | 2009-10-08 | Sony Computer Entertainment Inc. | Targeted sound detection and generation for audio headset |
| US20090299742A1 (en) * | 2008-05-29 | 2009-12-03 | Qualcomm Incorporated | Systems, methods, apparatus, and computer program products for spectral contrast enhancement |
| US20120128160A1 (en) * | 2010-10-25 | 2012-05-24 | Qualcomm Incorporated | Three-dimensional sound capturing and reproducing with multi-microphones |
Non-Patent Citations (8)
| Title |
|---|
| G. W. ELKO: "Microphone Arrays: Signal Processing Techniques and Applications", 2001, SPRINGER, article "Spatial coherence functions for differential micro- phones in isotropic noise fields", pages: 61 - 85 |
| H. L. VAN TREES: "Detection, Estimation, and Modulation Theory: Part IV: Array Processing", vol. 1, April 2002, JOHN WILEY & SONS |
| K. KOWALCZYK; O. THIERGART; A. CRACIUN; E. A. P. HABETS: "Sound acquisition in noisy and reverberant environments using virtual microphones", APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2013 IEEE WORKSHOP, October 2013 (2013-10-01) |
| KOWALCZYK KONRAD ET AL: "Sound acquisition in noisy and reverberant environments using virtual microphones", 2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS, IEEE, 20 October 2013 (2013-10-20), pages 1 - 4, XP032540793, ISSN: 1931-1168, [retrieved on 20140102], DOI: 10.1109/WASPAA.2013.6701869 * |
| O. THIERGART; E. A. P. HABETS: "An informed LCMV filter based on multiple instantaneous direction-of-arrival estimates", ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013 IEEE INTERNATIONAL CONFERENCE, 2013, pages 659 - 663, XP032508886, DOI: doi:10.1109/ICASSP.2013.6637730 |
| O. THIERGART; G. D. GALDO; E. A. P. HABETS: "On the spatial coherence in mixed sound fields and its application to signal-to-diffuse ratio estimation", THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, vol. 132, no. 4, 2012, pages 2337 - 2346, XP012163324, DOI: doi:10.1121/1.4750493 |
| O. THIERGART; G. DEL GALDO; M. TASESKA; E. HABETS: "Geometry-based spatial sound acquisition using distributed microphone arrays", AUDIO, SPEECH, AND LANGUAGE PROCESSING, IEEE TRANSACTIONS, vol. 21, no. 12, December 2013 (2013-12-01), pages 2583 - 2594, XP011531023, DOI: doi:10.1109/TASL.2013.2280210 |
| V. PULKKI: "Spatial sound reproduction with directional audio coding", J. AUDIO ENG. SOC, vol. 55, no. 6, June 2007 (2007-06-01), pages 503 - 516 |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPWO2015129760A1 (ja) * | 2014-02-28 | 2017-03-30 | 日本電信電話株式会社 | 信号処理装置、方法及びプログラム |
| US10923132B2 (en) | 2016-02-19 | 2021-02-16 | Dolby Laboratories Licensing Corporation | Diffusivity based sound processing method and apparatus |
| US10524072B2 (en) | 2016-03-15 | 2019-12-31 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method or computer program for generating a sound field description |
| US10694306B2 (en) | 2016-03-15 | 2020-06-23 | Fraunhofer-Gesellschaft Zur Förderung Der Angenwandten Forschung E.V. | Apparatus, method or computer program for generating a sound field description |
| US11272305B2 (en) | 2016-03-15 | 2022-03-08 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e. V. | Apparatus, method or computer program for generating a sound field description |
| CN105869651A (zh) * | 2016-03-23 | 2016-08-17 | 北京大学深圳研究生院 | 基于噪声混合相干性的双通道波束形成语音增强方法 |
| CN105869651B (zh) * | 2016-03-23 | 2019-05-31 | 北京大学深圳研究生院 | 基于噪声混合相干性的双通道波束形成语音增强方法 |
| CN111149155A (zh) * | 2017-07-14 | 2020-05-12 | 弗劳恩霍夫应用研究促进协会 | 使用多点声场描述生成经增强的声场描述或经修改的声场描述的概念 |
| CN111149155B (zh) * | 2017-07-14 | 2023-10-10 | 弗劳恩霍夫应用研究促进协会 | 使用多点声场描述生成经增强的声场描述的装置及方法 |
| CN113257270A (zh) * | 2021-05-10 | 2021-08-13 | 中国科学技术大学 | 一种基于参考麦克风优化的多通道语音增强方法 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN105981404A (zh) | 2016-09-28 |
| US9984702B2 (en) | 2018-05-29 |
| JP6389259B2 (ja) | 2018-09-12 |
| EP3080806A1 (en) | 2016-10-19 |
| EP2884491A1 (en) | 2015-06-17 |
| EP3080806B1 (en) | 2021-07-28 |
| JP2017503388A (ja) | 2017-01-26 |
| BR112016013366A2 (pt) | 2017-08-08 |
| RU2640742C1 (ru) | 2018-01-11 |
| US20160293179A1 (en) | 2016-10-06 |
| CN105981404B (zh) | 2019-06-04 |
| BR112016013366B1 (pt) | 2021-12-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9984702B2 (en) | Extraction of reverberant sound using microphone arrays | |
| CA2857611C (en) | Apparatus and method for microphone positioning based on a spatial power density | |
| EP2647222B1 (en) | Sound acquisition via the extraction of geometrical information from direction of arrival estimates | |
| EP2647221B1 (en) | Apparatus and method for spatially selective sound acquisition by acoustic triangulation | |
| BR112015014380B1 (pt) | Filtro e método para filtragem espacial informada utilizando múltiplas estimativas da direção de chegada instantânea | |
| CN103181190A (zh) | 用于远场多源追踪和分离的系统、方法、设备和计算机可读媒体 | |
| HK1190490B (en) | Sound acquisition via the extraction of geometrical information from direction of arrival estimates | |
| HK1202746B (en) | Apparatus and method for microphone positioning based on a spatial power density | |
| HK1190260B (en) | Apparatus and method for spatially selective sound acquisition by acoustic triangulation | |
| HK1190260A (en) | Apparatus and method for spatially selective sound acquisition by acoustic triangulation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14805624; Country of ref document: EP; Kind code of ref document: A1 |
| | DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | |
| | REEP | Request for entry into the european phase | Ref document number: 2014805624; Country of ref document: EP |
| | WWE | Wipo information: entry into national phase | Ref document number: 2014805624; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2016534922; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | REG | Reference to national code | Ref country code: BR; Ref legal event code: B01A; Ref document number: 112016013366 |
| | ENP | Entry into the national phase | Ref document number: 2016127191; Country of ref document: RU; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 112016013366; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20160609 |