CN104142492A - SRP-PHAT multi-source spatial positioning method - Google Patents

SRP-PHAT multi-source spatial positioning method

Info

Publication number
CN104142492A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410366922.4A
Other languages
Chinese (zh)
Other versions
CN104142492B (en)
Inventor
孙明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan University
Original Assignee
Foshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan University filed Critical Foshan University
Priority to CN201410366922.4A priority Critical patent/CN104142492B/en
Publication of CN104142492A publication Critical patent/CN104142492A/en
Application granted granted Critical
Publication of CN104142492B publication Critical patent/CN104142492B/en
Expired - Fee Related
Anticipated expiration

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20Position of source determined by a plurality of spaced direction-finders

Abstract

The invention provides an SRP-PHAT multi-source spatial positioning method. First, the number and spatial positions of the microphones of a uniform circular array are assumed to remain unchanged during data acquisition; the isotropic microphones are evenly distributed on a circle of radius r lying in the x-y plane, the direction of arrival of a plane wave s is expressed in polar coordinates, and the origin of the coordinate system is placed at the center of the circular array. The multiple sound-source signals are then divided into non-overlapping sets of time-frequency points so that each time-frequency window contains only one active source signal, satisfying the weak W-disjoint orthogonality condition, and a Hamming window is selected. Finally, the SRP-PHAT algorithm computes a steered response power function as the objective function, the beam is steered to scan all possible receiving directions, and the direction with the maximum output power gives the direction of the sound source. As a result, the DOA estimation of multiple sound sources has good separation performance in strongly noisy and moderately reverberant acoustic environments, the true peaks are clearly prominent, and high positioning accuracy is achieved.

Description

An SRP-PHAT multi-source spatial positioning method
Technical field
The present invention relates to a spatial positioning method, and more specifically to an SRP-PHAT multi-source spatial positioning method applied in systems such as video conferencing, speech enhancement, hearing aids, hands-free telephony, and intelligent robots.
Background technology
Sound source localization has a wide range of applications in systems such as video conferencing, speech enhancement, hearing aids, hands-free telephony, and intelligent robots, and has received increasing attention in recent years.
The steered response power with phase transform (SRP-PHAT: Steered Response Power-Phase Transform) sound source localization algorithm has become a mainstream algorithm. It combines the advantages of steered beamforming and GCC-PHAT and is robust under low-SNR conditions. It performs well for single-source localization, but its main drawback is the heavy computational load, which limits its application in real-time systems.
Many researchers have tried to reduce the computational cost of its core steered-response-power search. For example, a quadratically accelerated SRP-PHAT localization algorithm uses a vertically arranged array to convert the two-dimensional spatial search into a one-dimensional search and adopts a hierarchical coarse-to-fine search strategy over the one-dimensional space. Another improved joint SRP-PHAT speech localization algorithm uses orthogonal linear microphone arrays to reduce the two-dimensional search space to two one-dimensional spaces and then performs a hierarchical search in each one-dimensional space to find the SRP maximum and determine the source position.
In practice, the positions of multiple sound sources often need to be estimated. The existing W-disjoint orthogonality assumption based on the sparsity of speech signals is not satisfied for multiple sources, so those methods have low spatial resolution and are easily affected by reverberation; in particular, two sources that are close in direction cannot be distinguished under reverberant and noisy conditions. The multi-source localization problem therefore has important theoretical significance and practical value.
Summary of the invention
The present invention overcomes the shortcomings of the prior art and provides an SRP-PHAT multi-source spatial positioning method that can distinguish several signal sources that are close together in direction under reverberant and noisy conditions, with good positioning performance.
To solve the above technical problems, the present invention is realized through the following technical solution:
An SRP-PHAT multi-source spatial positioning method, characterized in that it comprises the following steps:
1) Establish the spatial coordinates under the assumed conditions. First assume that the number and spatial positions of all microphones of the uniform circular microphone array remain unchanged during data acquisition, that the source-to-microphone distances satisfy the requirements of the sound-field model, and that all microphones have identical physical properties. The isotropic microphones are evenly distributed on a circle of radius r lying in the x-y plane; the direction of arrival of the plane wave s is expressed in polar coordinates, with the origin of the coordinate system at the center of the circular array; the elevation angle of the signal is θ ∈ [0, π/2] and the azimuth is φ ∈ [0, 2π];
2) Divide the multiple sound-source signals into non-overlapping sets of time-frequency points, so that each time-frequency window contains only one active source signal and the weak W-disjoint orthogonality condition is satisfied; a Hamming window is chosen; when $\mathrm{WDO}_M = 1$, exact W-disjoint orthogonality holds;
3) Using the SRP-PHAT algorithm, compute the phase-transform-weighted steered response power function over all microphone pairs to obtain an objective function; the beamformer steers the beam to scan all possible receiving directions, and the direction with the maximum output power gives the direction of the sound source.
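As an illustration of the step-1 coordinate setup, the following Python sketch builds the microphone coordinates of a uniform circular array in the x-y plane and converts an (elevation, azimuth) pair into a unit direction vector. The array radius and the convention that θ is measured up from the x-y plane are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def uca_positions(n_mics, r):
    """Microphone coordinates of a uniform circular array of radius r in the x-y plane,
    with the coordinate origin at the array centre (step 1)."""
    az = 2 * np.pi * np.arange(n_mics) / n_mics
    return np.stack([r * np.cos(az), r * np.sin(az), np.zeros(n_mics)], axis=1)

def doa_unit_vector(theta, phi):
    """Unit direction vector for elevation theta in [0, pi/2] and azimuth phi in [0, 2*pi]
    (assumed convention: theta measured up from the x-y plane)."""
    return np.array([np.cos(theta) * np.cos(phi),
                     np.cos(theta) * np.sin(phi),
                     np.sin(theta)])

mics = uca_positions(8, 0.1)                 # 8 microphones on a 0.1 m radius circle (assumed radius)
u = doa_unit_vector(np.pi / 4, np.pi / 3)    # candidate direction of arrival
```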
Further, said step 2) comprises:
First, two important performance criteria are introduced: (1) to what extent the mask preserves the sound source of interest; (2) to what extent the mask suppresses the interfering sources;
Consider dividing the multiple sound-source signals into non-overlapping sets of time-frequency points, with only one active source signal in each time-frequency window, approximately satisfying
$S_j(t,\omega) S_k(t,\omega) \approx 0, \quad \forall t, \omega$
The time-frequency mask is defined as the indicator function of the support of source j,
$M_j(t,\omega) = \begin{cases} 1, & S_j(t,\omega) \neq 0 \\ 0, & \text{otherwise} \end{cases}$
By estimating the time-frequency mask corresponding to each source, a particular source j can be recovered from the mixture:
$S_j(t,\omega) = M_j(t,\omega) X(t,\omega), \quad \forall t, \omega$
where M_j is the indicator function of the support of source j, and S_j(t,ω) and X(t,ω) are the time-frequency representations of s_j(t) and x(t) respectively;
For a given time-frequency mask M, the preserved-signal ratio PSR_M is defined as:
$\mathrm{PSR}_M = \frac{\| M(t,\omega) S_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2}$
PSR_M measures the percentage of the energy of source S_j that is preserved after masking;
At the same time define
$z_j(t) = \sum_{k=1,\, k \neq j}^{N} s_k(t)$
where z_j(t) is the sum of the sources interfering with source s_j;
The signal-to-interference ratio after applying the time-frequency mask M is defined as:
$\mathrm{SIR}_M = \frac{\| M(t,\omega) S_j(t,\omega) \|^2}{\| M(t,\omega) Z_j(t,\omega) \|^2}$
where SIR_M measures the signal-to-interference ratio of the separated signal after the mask M is applied, Z_j(t,ω) being the time-frequency representation of z_j(t);
From PSR_M and SIR_M the approximate W-disjoint orthogonality WDO_M can be estimated:
$\mathrm{WDO}_M = \frac{\| M(t,\omega) S_j(t,\omega) \|^2 - \| M(t,\omega) Z_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2}$
Because speech signals have sparse time-frequency representations, a small fraction of time-frequency points carries the vast majority of the total power and the magnitude of the product of the time-frequency representations of different sources is usually small, so the weak W-disjoint orthogonality condition is satisfied; in particular, when WDO_M = 1, exact W-disjoint orthogonality holds.
Further, said step 3) applies the SRP-PHAT algorithm to a microphone pair:
For an array with only two microphones m_i and m_j, the delay difference of a signal arriving from azimuth φ and elevation θ at the two microphones is Δτ_ij(θ, φ); the TDOA can be estimated by the generalized cross-correlation (GCC) as:
$\Delta\tau_{ij}(\theta,\varphi) = \arg\max_{\tau} P(\mathbf{r}) = \arg\max_{\tau} R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi))$
where P(r) is the spatial likelihood function of the three-dimensional position vector r, obtained by evaluating all possible θ and φ; the generalized cross-correlation function R_{s_i s_j}(Δτ_ij(θ, φ)) can be expressed in the frequency domain as:
$R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi)) = \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega) S_j^*(\omega)\, e^{j\omega \Delta\tau_{ij}(\theta,\varphi)}\, d\omega$
where Ψ_ij(ω) is a weighting function and S_i(ω)S_j^*(ω) is the cross-spectral density;
The phase transform (PHAT) is a typical weighting of this kind,
with the phase weighting function defined as:
$\Psi_{ij}(\omega) = \frac{1}{\left| S_i(\omega) S_j^*(\omega) \right|}$
By choosing a suitable weighting function, the delay-and-sum steered response power satisfies an optimal signal-to-noise-ratio criterion; the generalized cross-correlation R_{s_i s_j}(Δτ_ij(θ, φ)) then exhibits a single peak within the limited range of τ, corresponding to the TDOA of the propagation to microphones m_i and m_j.
Further, said step 3) applies the SRP-PHAT algorithm to a circular microphone array:
The generalized cross-correlations of all microphone pairs are summed:
$P(\Delta\tau_1, \Delta\tau_2, \ldots, \Delta\tau_N) = \sum_{i=1}^{N} \sum_{j=1}^{N} R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi)) = \sum_{i=1}^{N} \sum_{j=1}^{N} \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega) S_j^*(\omega)\, e^{j\omega(\Delta\tau_i - \Delta\tau_j)}\, d\omega$
where Δτ_1, Δτ_2, …, Δτ_N are the steering delays of the N microphones, Δτ_i = τ_i − τ_0, i = 1, …, N, and τ_0 is the reference delay, taken as the smallest of all the microphone delays.
Further, said step 3) applies the SRP-PHAT algorithm to multiple sound sources with a circular microphone array:
When there are two or more sound sources, the SRP-PHAT peak of one source mixes with the SRP-PHAT peak of another source, spurious peaks appear at some points, and the local maxima become difficult to find;
Using the approximate W-disjoint orthogonality of speech signals, the relative delay of each source signal to the microphone array is estimated in the time-frequency domain, with the short-time Fourier transform used as the approximately W-disjoint orthogonal transform;
Assume the frequency-domain representation of the signal model at the i-th microphone is:
$X_i[\omega, \tau] = S_n(\omega, \tau)\, e^{-j\omega \Delta\tau_{n,i}} + N_i[\omega, \tau]$
Given a window function W, the short-time Fourier transform of s_j is S_j:
$S_j(t,\omega) = F_W(s_j(\cdot))(t,\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W(\tau - t)\, s_j(\tau)\, e^{-i\omega\tau}\, d\tau$
By choosing an appropriate window function and window size, under the approximate W-disjoint orthogonality assumption only one source is active at any time-frequency point, and its cross-spectrum is:
$E\!\left[ X_i[\omega,\tau]\, X_j^*[\omega,\tau] \right] = \left| S_n(\omega,\tau) \right|^2 e^{-j\omega(\Delta\tau_{n,i} - \Delta\tau_{n,j})}$
The relative delay Δτ_{n,i} − Δτ_{n,j} between microphone i and microphone j can therefore be obtained from the cross-power spectrum.
Compared with the prior art, the beneficial effects of the invention are:
Theoretical analysis and simulation experiments show that the SRP-PHAT algorithm based on a circular array combined with approximate W-disjoint orthogonality gives the DOA estimation of multiple sources good separation performance in strongly noisy and moderately reverberant acoustic environments, with clearly prominent true peaks and high positioning accuracy.
1. Most existing work on uniform circular arrays addresses single-source localization, and multi-source localization with circular arrays has received comparatively little attention; the circular array offers higher spatial resolution.
2. Under the approximate W-disjoint orthogonality assumption, the SRP-PHAT algorithm gives multi-source DOA estimation good separation performance in strongly noisy and moderately reverberant environments, with clearly prominent true peaks and high positioning accuracy.
3. The problem of spurious spectral peaks is solved effectively, and three signal sources can be resolved.
4. The method is suitable for localization under moderate reverberation.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present invention and, together with the embodiments, serve to explain the invention; they are not intended to limit it. In the drawings:
Fig. 1 is the geometry of the uniform circular array;
Fig. 2 is the beamforming principle of the uniform circular array;
Fig. 3: WDO ratio (80%) in the three-source case;
Fig. 4: WDO ratio (90%) in the three-source case;
Fig. 5: time-frequency analysis |S_1^W(t, ω)| of source s_1(t);
Fig. 6: time-frequency analysis |S_2^W(t, ω)| of source s_2(t);
Fig. 7: time-frequency analysis |S_1^W(t, ω) S_2^W(t, ω)|;
Fig. 8: block diagram of the implementation of the method;
Fig. 9: uniform circular array;
Fig. 10: two-dimensional imaging of two-source localization at 20 dB signal-to-noise ratio;
Fig. 11: two-dimensional imaging of two-source localization at 30 dB signal-to-noise ratio;
Fig. 12: azimuths of two sources measured by the circular array at 20 dB signal-to-noise ratio;
Fig. 13: azimuths of two sources measured by the circular array at 30 dB signal-to-noise ratio;
Fig. 14: three-dimensional plot of the measured azimuths and elevations of the two sources at 30 dB signal-to-noise ratio;
Fig. 15: two-dimensional imaging of three-source localization at 30 dB signal-to-noise ratio;
Fig. 16: azimuths of three sources measured by the improved circular-array method at 30 dB signal-to-noise ratio;
Fig. 17: azimuths of three sources measured by the conventional circular-array method at 30 dB signal-to-noise ratio;
Fig. 18: signal waveforms received by the 8-element microphone array;
Fig. 19: angular error versus signal-to-noise ratio for different T60 values.
Detailed description of the embodiments
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are only intended to describe and explain the present invention and are not intended to limit it.
First step: the localization model and uniform circular array beamforming.
A uniform circular array defines the spatial coordinate system, as shown in Fig. 1: isotropic microphones are evenly distributed on a circle of radius R lying in the x-y plane. The direction of arrival of the plane wave s is expressed in polar coordinates; the origin of the coordinate system is located at the center of the circular array and serves as the system reference point; the elevation angle of the signal is θ ∈ [0, π/2] and the azimuth is φ ∈ [0, 2π]. Here r is the distance from the source to the center of the circular array and r_i is the distance from the source to microphone m_i.
Assume the source signal is:
$s(r,t) = e^{j\omega_0 t} \qquad (1)$
where ω_0 = 2πf is the angular frequency of the source signal, C is the wave speed, C = 384 m/s, and f is the frequency of the source in Hz.
The signal received by the i-th microphone is
$f_i(r,t) = s(t - \Delta\tau_i) \qquad (2)$
As shown in Fig. 1:
r_i is the distance from the source to the i-th microphone,
r is the distance from the source to the center of the circular microphone array,
R is the radius of the circular array,
θ is the elevation angle of the source,
φ is the azimuth of the source,
and the i-th microphone lies at azimuth 2πi/N on the circle, i = 0, 1, 2, …, N−1.
The delay of each microphone before summation is therefore
$\Delta\tau_i = \frac{r_i - r}{C} \qquad (4)$
where C is the wave speed, C = 384 m/s.
As shown in Fig. 2, the delay-and-sum beamformer sums the delayed signals captured by all the microphones; adding the contribution of the source at a far-field point gives the far-field pattern of the ring array:
$y(t) = \frac{1}{N}\sum_{i=1}^{N} s(r_i, t - \Delta\tau_i) = \frac{1}{N}\sum_{i=1}^{N} e^{j\omega_0 (t - \Delta\tau_i)} = e^{j\omega_0 t}\, \frac{1}{N}\sum_{i=1}^{N} e^{-j\omega_0 \Delta\tau_i} \qquad (5)$
Substituting (4) into (5) yields expression (6), in which the unit wave-number vector of the source appears and T denotes the vector transpose.
Δτ_i = τ_i − τ_0, where τ_0 is the reference delay, taken as the smallest of all the microphone delays.
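The following Python sketch evaluates the delay-and-sum response of equation (5) for a uniform circular array. Because equations (3), (4) and (6) are not fully reproduced in this text, the far-field delay expression Δτ_i ≈ −(R/C)·sinθ·cos(φ − 2πi/N) used here is an assumed standard form, and the radius and source frequency are illustrative values.

```python
import numpy as np

C = 384.0          # wave speed used in the description (m/s)
R = 0.1            # array radius in metres (assumed value for illustration)
N = 8              # number of microphones
f0 = 1000.0        # source frequency in Hz (assumed)
w0 = 2 * np.pi * f0

mic_az = 2 * np.pi * np.arange(N) / N                # microphone azimuths, evenly spaced

def steering_delays(theta, phi):
    """Far-field delay of each microphone relative to the array centre (assumed form)."""
    return -(R / C) * np.sin(theta) * np.cos(phi - mic_az)

def beam_pattern(theta_look, phi_look, theta_src, phi_src):
    """Magnitude of the delay-and-sum output y of equation (5) for a unit-amplitude tone."""
    tau_src = steering_delays(theta_src, phi_src)    # true propagation delays
    tau_look = steering_delays(theta_look, phi_look) # steering delays applied by the beamformer
    # y = (1/N) * sum_i exp(-j*w0*(tau_src_i - tau_look_i)); magnitude 1 when steered at the source
    return np.abs(np.mean(np.exp(-1j * w0 * (tau_src - tau_look))))

print(beam_pattern(np.pi / 4, np.pi / 3, np.pi / 4, np.pi / 3))   # ~1.0 at the source direction
print(beam_pattern(np.pi / 4, 0.0,       np.pi / 4, np.pi / 3))   # smaller away from the source
```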
Second step: the approximate W-disjoint orthogonality assumption.
The masking effect of the human ear is usually divided into frequency masking and temporal masking. Time-frequency masking methods assume that the source signals are sparse and separable, satisfying W-disjoint orthogonality.
Suppose the signal x(t) is composed of N source signals, expressed as
$x(t) = \sum_{j=1}^{N} s_j(t) \qquad (7)$
Suppose there exists a linear transform T, called the mapping from s_j to S_j, with the following properties:
(1) T is invertible, i.e., $T^{-1}(Ts) = T(T^{-1}s) = s$;
(2) for j ≠ k, $\Lambda_j \cap \Lambda_k = \varnothing$, where Λ_j is the support of S_j, $\Lambda_j = \operatorname{supp} S_j := \{\lambda : S_j(\lambda) \neq 0\}$, i.e., the supports Λ_j and Λ_k do not intersect.
If conditions (1) and (2) above are met, all the mixed signals in the set S can be effectively separated.
For a given window function, if
$S_j(t,\omega) S_k(t,\omega) = 0, \quad \forall t, \omega \qquad (8)$
then the two sources S_j and S_k are said to satisfy W-disjoint orthogonality.
However, the W-disjoint orthogonality assumption does not hold for the signals studied here: the product in expression (8) is seldom exactly zero.
For this reason, two important performance criteria are introduced: (1) to what extent the mask preserves the sound source of interest; (2) to what extent the mask suppresses the interfering sources.
Consider dividing the multiple sound-source signals into non-overlapping sets of time-frequency points, with only one active source signal in each time-frequency window, approximately satisfying
$S_j(t,\omega) S_k(t,\omega) \approx 0, \quad \forall t, \omega \qquad (9)$
The time-frequency mask is defined as the indicator function of the support of source j,
$M_j(t,\omega) = \begin{cases} 1, & S_j(t,\omega) \neq 0 \\ 0, & \text{otherwise} \end{cases} \qquad (10)$
By estimating the time-frequency mask corresponding to each source, a particular source j can be recovered from the mixture:
$S_j(t,\omega) = M_j(t,\omega) X(t,\omega), \quad \forall t, \omega \qquad (11)$
where M_j is the indicator function of the support of source j, and S_j(t,ω) and X(t,ω) are the time-frequency representations of s_j(t) and x(t) respectively.
For a given time-frequency mask M, the preserved-signal ratio PSR_M is defined as
$\mathrm{PSR}_M = \frac{\| M(t,\omega) S_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2} \qquad (12)$
PSR_M measures the percentage of the energy of source S_j that is preserved after masking.
At the same time define
$z_j(t) = \sum_{k=1,\, k \neq j}^{N} s_k(t) \qquad (13)$
where z_j(t) is the sum of the sources interfering with source s_j.
The signal-to-interference ratio after applying the time-frequency mask M is defined as
$\mathrm{SIR}_M = \frac{\| M(t,\omega) S_j(t,\omega) \|^2}{\| M(t,\omega) Z_j(t,\omega) \|^2} \qquad (14)$
where SIR_M measures the signal-to-interference ratio of the separated signal after the mask M is applied, Z_j(t,ω) being the time-frequency representation of z_j(t).
From PSR_M and SIR_M the approximate W-disjoint orthogonality WDO_M can be estimated:
$\mathrm{WDO}_M = \frac{\| M(t,\omega) S_j(t,\omega) \|^2 - \| M(t,\omega) Z_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2} = \mathrm{PSR}_M\left(1 - \frac{1}{\mathrm{SIR}_M}\right) \qquad (15)$
Because speech signals have sparse time-frequency representations, a small fraction of time-frequency points carries the vast majority of the total power and the magnitude of the product of the time-frequency representations of different sources is usually small; the weak W-disjoint orthogonality condition is therefore satisfied. The higher the approximate W-disjoint orthogonality, the better the separation. To obtain a good time-frequency masking effect, the choice of window type and window size is crucial to performance. In particular, when WDO_M = 1, exact W-disjoint orthogonality holds.
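As a concrete illustration of equations (12)-(15), the sketch below computes PSR_M, SIR_M and WDO_M using an ideal binary mask built from the short-time Fourier transforms of a target source and the summed interference. The sampling rate, window length and the use of scipy are illustrative assumptions, not parameters fixed by the patent.

```python
import numpy as np
from scipy.signal import stft

def wdo_metrics(s_j, z_j, fs=16000, win_ms=64):
    """PSR_M, SIR_M and WDO_M of equations (12)-(15) for one target source.

    s_j : target source s_j(t) (time domain)
    z_j : sum of the interfering sources z_j(t), equation (13)
    """
    nperseg = int(fs * win_ms / 1000)
    _, _, S = stft(s_j, fs=fs, window='hamming', nperseg=nperseg)
    _, _, Z = stft(z_j, fs=fs, window='hamming', nperseg=nperseg)

    # ideal binary mask: keep the bins where the target dominates the interference
    M = (np.abs(S) > np.abs(Z)).astype(float)

    energy = lambda A: np.sum(np.abs(A) ** 2)          # squared norm ||.||^2
    psr = energy(M * S) / energy(S)                    # eq. (12)
    sir = energy(M * S) / max(energy(M * Z), 1e-12)    # eq. (14)
    wdo = (energy(M * S) - energy(M * Z)) / energy(S)  # eq. (15), equals psr * (1 - 1/sir)
    return psr, sir, wdo
```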
According to the experiments of Scott Rickard (Scott Rickard, Radu Balan and Justinian Rosca, "Real-time time-frequency based blind source separation," Proceedings ICA 2001, pp. 651-656, December 2001), the WDO ratio (%) at 0 dB for different numbers of sources N is as follows:
N     2     3     4     5     6     7     8     9     10
WDO   93.6  88.0  83.4  79.2  75.6  72.3  69.3  66.6  64
As shown in Fig. 3 and Fig. 4, simulation of the three-source case (with the WDO value on the horizontal axis and the number of speech samples on the vertical axis) shows that more than 80% of the signal is orthogonal when there are three sources.
As shown in Fig. 5, Fig. 6 and Fig. 7, the near-orthogonality condition was also verified for two sources: time-frequency analyses were performed on the signals s_1(t) and s_2(t) individually and on their product jointly, with time on the horizontal axis and frequency on the vertical axis. The window function W(t) is a Hamming window with a window length of 64 ms. Figs. 5-7 show that the product contains very few common components, which demonstrates that the source signals satisfy approximate W-disjoint orthogonality.
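The following sketch reproduces this kind of check numerically: it computes 64 ms Hamming-window STFTs of two signals and measures how small the energy of their time-frequency product is relative to the individual energies (a value near zero indicates approximate W-disjoint orthogonality). The two synthetic test signals stand in for the speech recordings used in the experiments.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
t = np.arange(fs) / fs
# synthetic stand-ins for the two speech sources s1(t), s2(t)
s1 = np.sin(2 * np.pi * 440 * t) * (t < 0.5)      # active mostly in the first half
s2 = np.sin(2 * np.pi * 1200 * t) * (t >= 0.5)    # active mostly in the second half

nperseg = int(0.064 * fs)                         # 64 ms Hamming window, as in the description
_, _, S1 = stft(s1, fs=fs, window='hamming', nperseg=nperseg)
_, _, S2 = stft(s2, fs=fs, window='hamming', nperseg=nperseg)

# normalised overlap of the two time-frequency representations (Cauchy-Schwarz bound is 1)
overlap = np.sum(np.abs(S1) * np.abs(S2)) / np.sqrt(np.sum(np.abs(S1)**2) * np.sum(np.abs(S2)**2))
print("normalised TF overlap:", overlap)          # close to 0 => approximately W-disjoint orthogonal
```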
Third step: SRP-PHAT localization of multiple sources with a circular array combined with approximate W-disjoint orthogonality.
The SRP-PHAT algorithm computes the phase-transform-weighted steered response power function over all microphone pairs to obtain an objective function; an optimal beamformer is designed, the beam is steered to scan all possible receiving directions, and the direction with the maximum output power gives the direction of the sound source.
1. The SRP-PHAT algorithm for a microphone pair
For an array with only two microphones m_i and m_j, the delay difference of a signal arriving from azimuth φ and elevation θ at the two microphones is Δτ_ij(θ, φ); the TDOA can be estimated by the generalized cross-correlation (GCC) as:
$\Delta\tau_{ij}(\theta,\varphi) = \arg\max_{\tau} P(\mathbf{r}) = \arg\max_{\tau} R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi)) \qquad (16)$
where P(r) is the spatial likelihood function of the three-dimensional position vector r, obtained by evaluating all possible θ and φ. The generalized cross-correlation function R_{s_i s_j}(Δτ_ij(θ, φ)) can be expressed in the frequency domain as:
$R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi)) = \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega) S_j^*(\omega)\, e^{j\omega \Delta\tau_{ij}(\theta,\varphi)}\, d\omega \qquad (17)$
where Ψ_ij(ω) is a weighting function and S_i(ω)S_j^*(ω) is the cross-spectral density.
The phase transform (PHAT) is a typical weighting of this kind.
The phase weighting function is defined as:
$\Psi_{ij}(\omega) = \frac{1}{\left| S_i(\omega) S_j^*(\omega) \right|} \qquad (18)$
By choosing a suitable weighting function, the delay-and-sum steered response power satisfies an optimal signal-to-noise-ratio criterion; the generalized cross-correlation R_{s_i s_j}(Δτ_ij(θ, φ)) then exhibits a single peak within the limited range of τ, corresponding to the TDOA of the propagation to microphones m_i and m_j. The algorithm offers a degree of noise immunity, reverberation resistance and robustness in sound source localization.
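A minimal discrete-time sketch of the GCC-PHAT estimator of equations (16)-(18) is given below, assuming two already-digitized microphone channels; the FFT length, the small regularization constant and the sample-based lag search are implementation choices, not values specified by the patent.

```python
import numpy as np

def gcc_phat(x_i, x_j, fs, max_tau=None):
    """PHAT-weighted generalized cross-correlation (eqs. (16)-(18)); returns the TDOA in seconds."""
    n = len(x_i) + len(x_j)
    X_i = np.fft.rfft(x_i, n=n)
    X_j = np.fft.rfft(x_j, n=n)
    cross = X_i * np.conj(X_j)                    # cross-spectral density S_i(w) S_j*(w)
    cross /= np.abs(cross) + 1e-12                # PHAT weighting, eq. (18)
    r = np.fft.irfft(cross, n=n)                  # back to the lag domain
    max_shift = n // 2 if max_tau is None else min(int(max_tau * fs), n // 2)
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))
    lag = np.argmax(np.abs(r)) - max_shift        # peak location, eq. (16)
    return lag / fs

# usage: x_i lags x_j by 3 samples, so the estimated TDOA is about +3 samples
rng = np.random.default_rng(0)
fs = 16000
s = rng.standard_normal(fs)
print(gcc_phat(np.roll(s, 3), s, fs) * fs)        # ~ 3.0
```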
2. The circular-array SRP-PHAT algorithm
The generalized cross-correlations of all microphone pairs are summed:
$P(\Delta\tau_1, \Delta\tau_2, \ldots, \Delta\tau_N) = \sum_{i=1}^{N} \sum_{j=1}^{N} R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi)) = \sum_{i=1}^{N} \sum_{j=1}^{N} \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega) S_j^*(\omega)\, e^{j\omega(\Delta\tau_i - \Delta\tau_j)}\, d\omega \qquad (19)$
where Δτ_1, Δτ_2, …, Δτ_N are the steering delays of the N microphones, Δτ_i = τ_i − τ_0, i = 1, …, N, and τ_0 is the reference delay, taken as the smallest of all the microphone delays.
As the number of microphones increases, the two-microphone SRP-PHAT method extends naturally to the circular-array SRP-PHAT method.
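The sketch below evaluates the steered response power of equation (19) on an (azimuth, elevation) grid for one frame of multichannel data. It assumes far-field propagation, per-microphone delays of the form −(m·u)/C for a unit look-direction vector u, a speed of sound of 343 m/s, and summation over distinct pairs only; these are illustrative choices rather than parameters fixed by the patent.

```python
import numpy as np

C = 343.0   # assumed speed of sound for this sketch (m/s)

def srp_phat_map(frames, mic_xyz, fs, az_grid, el_grid):
    """Steered response power of eq. (19), PHAT-weighted, on an (azimuth, elevation) grid.

    frames : (n_mics, n_samples) array with one time frame per microphone
    mic_xyz: (n_mics, 3) microphone coordinates relative to the array centre
    """
    n_mics, n = frames.shape
    X = np.fft.rfft(frames, axis=1)
    omega = 2 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)

    P = np.zeros((len(az_grid), len(el_grid)))
    for a, az in enumerate(az_grid):
        for e, el in enumerate(el_grid):
            # unit vector pointing towards the candidate source direction
            u = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])
            tau = -mic_xyz @ u / C                    # far-field steering delay per microphone
            for i in range(n_mics):
                for j in range(i + 1, n_mics):
                    cross = X[i] * np.conj(X[j])
                    cross /= np.abs(cross) + 1e-12    # PHAT weighting
                    P[a, e] += np.real(np.sum(cross * np.exp(1j * omega * (tau[i] - tau[j]))))
    return P
```

Peak picking on this map to obtain (θ, φ), and a simple way of handling several sources, is sketched after step 4 of Embodiment 1 below.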
3. The circular-array SRP-PHAT algorithm for multiple sound sources
When there are two or more sound sources, the SRP-PHAT peak of one source mixes with the SRP-PHAT peak of another source, spurious peaks appear at some points, and the local maxima become difficult to find.
Using the approximate W-disjoint orthogonality of speech signals described above, the relative delay of each source signal to the microphone array is estimated in the time-frequency domain.
The short-time Fourier transform is used as the approximately W-disjoint orthogonal transform.
Assume the frequency-domain representation of the signal model at the i-th microphone is:
$X_i[\omega, \tau] = S_n(\omega, \tau)\, e^{-j\omega \Delta\tau_{n,i}} + N_i[\omega, \tau] \qquad (20)$
Given a window function W, the short-time Fourier transform of s_j is S_j:
$S_j(t,\omega) = F_W(s_j(\cdot))(t,\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W(\tau - t)\, s_j(\tau)\, e^{-i\omega\tau}\, d\tau \qquad (21)$
By choosing an appropriate window function and window size, under the approximate W-disjoint orthogonality assumption only one source is active at any time-frequency point. Its cross-spectrum is:
$E\!\left[ X_i[\omega,\tau]\, X_j^*[\omega,\tau] \right] = \left| S_n(\omega,\tau) \right|^2 e^{-j\omega(\Delta\tau_{n,i} - \Delta\tau_{n,j})} \qquad (22)$
The relative delay Δτ_{n,i} − Δτ_{n,j} between microphone i and microphone j can be obtained from the cross-power spectrum.
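The following sketch illustrates how equation (22) is used per time-frequency bin: under approximate W-disjoint orthogonality each bin is dominated by a single source, so the phase of the cross-spectrum at that bin gives that source's delay difference between two microphones. The 1024-point Hamming window matches the embodiment; the handling of the DC bin and the restriction |ω·Δτ| < π are implementation notes, not patent requirements.

```python
import numpy as np
from scipy.signal import stft

def per_bin_delays(x_i, x_j, fs, nperseg=1024):
    """Per-bin relative delay between microphones i and j from the cross-spectrum of eq. (22)."""
    f, t, X_i = stft(x_i, fs=fs, window='hamming', nperseg=nperseg)
    _, _, X_j = stft(x_j, fs=fs, window='hamming', nperseg=nperseg)
    cross = X_i * np.conj(X_j)                    # X_i X_j* for every time-frequency bin
    omega = 2 * np.pi * f[:, None]
    with np.errstate(divide='ignore', invalid='ignore'):
        tau = -np.angle(cross) / omega            # estimate of dtau_{n,i} - dtau_{n,j} (s), valid for |omega*dtau| < pi
    tau[0, :] = 0.0                               # the DC bin carries no usable phase
    return f, t, tau

# Bins dominated by different sources cluster around different delay values; this is what
# lets the masked SRP-PHAT of this step keep the true peaks of each source separate.
```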
Embodiment 1: localization of two sound sources
1. Selection of the uniform circular array localization model
The simulation experiments are carried out under different signal-to-noise ratios and reverberation conditions. The uniform circular array is placed in a room of 7 m × 8 m × 3.5 m; the spatial positions of its 8 microphones are [3.25, -1.6, 1.5], [3.25, 1.1, 1.5], [1.87, 3.75, 1.5], [1.0, 3.75, 1.5], [3.25, 1.8, 1.5], [3.25, -1.0, 1.5], [2.2, -3.75, 1.5], [0.6, -3.75, 1.5].
2. Selection of the sound sources
The sources are randomly generated speech signals with signal-to-noise ratios between 0 and 30 dB. The random interference is a Gaussian signal simulating an air-conditioner fan and noise from outside the window; the noise power reaches at most 10 dB, and the corresponding reverberation time is determined by the reflection coefficients of the room walls, floor, and ceiling.
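A small setup sketch for this simulation is given below: the microphone coordinates are taken verbatim from the embodiment, and white Gaussian interference is added to a clean signal at a chosen SNR in the 0-30 dB range. The helper function and random-seed choice are illustrative, not part of the patent.

```python
import numpy as np

# 8-element array positions as listed in the embodiment (metres, [x, y, z])
mic_xyz = np.array([
    [3.25, -1.6, 1.5], [3.25, 1.1, 1.5], [1.87, 3.75, 1.5], [1.0, 3.75, 1.5],
    [3.25, 1.8, 1.5], [3.25, -1.0, 1.5], [2.2, -3.75, 1.5], [0.6, -3.75, 1.5],
])

def add_gaussian_interference(clean, snr_db, seed=0):
    """Add white Gaussian interference so that the result has the requested SNR in dB."""
    rng = np.random.default_rng(seed)
    p_signal = np.mean(clean ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return clean + rng.standard_normal(clean.shape) * np.sqrt(p_noise)
```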
3. Short-time Fourier transform (STFT) of the array signals
Given a window function W, the short-time Fourier transform of s_j is S_j:
$S_j(t,\omega) = F_W(s_j(\cdot))(t,\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W(\tau - t)\, s_j(\tau)\, e^{-i\omega\tau}\, d\tau \qquad (22)$
To obtain a good time-frequency masking effect, the choice of window type and window size is crucial to performance. Here the window function is a Hamming window and the window size is 1024 points.
4. Phase-transform-weighted generalized cross-correlation
By choosing a suitable window function, good separation is obtained and approximate W-disjoint orthogonality is satisfied. On this basis the generalized cross-correlation can be computed.
The generalized cross-correlation function R_{s_i s_j}(Δτ_ij(θ, φ)) can be expressed in the frequency domain as:
$R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi)) = \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega) S_j^*(\omega)\, e^{j\omega \Delta\tau_{ij}(\theta,\varphi)}\, d\omega \qquad (22)$
where Ψ_ij(ω) is the weighting function:
$\Psi_{ij}(\omega) = \frac{1}{\left| S_i(\omega) S_j^*(\omega) \right|} \qquad (23)$
The generalized cross-correlations of all microphone pairs are summed:
$P(\Delta\tau_1, \Delta\tau_2, \ldots, \Delta\tau_N) = \sum_{i=1}^{N} \sum_{j=1}^{N} R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi)) = \sum_{i=1}^{N} \sum_{j=1}^{N} \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega) S_j^*(\omega)\, e^{j\omega(\Delta\tau_i - \Delta\tau_j)}\, d\omega \qquad (24)$
where Δτ_1, Δτ_2, …, Δτ_N are the steering delays of the N microphones, Δτ_i = τ_i − τ_0, i = 1, …, N, and τ_0 is the reference delay, taken as the smallest of all the microphone delays.
After finding the maximum of P(Δτ_1, Δτ_2, …, Δτ_N), the elevation θ and azimuth φ of the source are determined, as sketched below.
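A minimal peak-picking sketch follows: it reads the (θ, φ) of the largest value of an SRP map such as the one produced by the srp_phat_map sketch above and, as a simple illustrative handling of multiple sources, suppresses a small neighbourhood around each detected peak before searching for the next one. The neighbourhood size and the suppression strategy are assumptions, not steps prescribed by the patent.

```python
import numpy as np

def doa_from_srp_map(P, az_grid, el_grid, n_sources=1, neighborhood=2):
    """Pick the n_sources largest local maxima of an SRP map P over (azimuth, elevation)."""
    P = P.copy()
    doas = []
    for _ in range(n_sources):
        a, e = np.unravel_index(np.argmax(P), P.shape)
        doas.append((az_grid[a], el_grid[e]))
        # suppress a small neighbourhood around the detected peak before looking
        # for the next source (simple illustrative handling of multiple sources)
        a0, a1 = max(a - neighborhood, 0), min(a + neighborhood + 1, P.shape[0])
        e0, e1 = max(e - neighborhood, 0), min(e + neighborhood + 1, P.shape[1])
        P[a0:a1, e0:e1] = -np.inf
    return doas
```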
5. Results of the above steps
Figures 10 and 11 show the source wave-field images obtained by the circular array at 20 dB and 30 dB signal-to-noise ratio, respectively. In the figures, the marked positions are the microphones, ○ denotes the estimated sources, and * the positions of the interference.
Figure 10 shows two sources at spatial positions [0.59, 2.08, 1.5] and [0.29, -1.37, 1.5] with a signal-to-noise ratio of 20 dB. The random interference is a Gaussian signal simulating an air-conditioner fan and noise from outside the window, located at [2, -4, 1.5] and [3.5, -3.2, 1.5]; the noise power reaches at most 10 dB, and the corresponding reverberation time is determined by the reflection coefficients of the room walls, floor, and ceiling.
Figure 11 shows two sources at spatial positions [1.5, 2.1, 1.5] and [2.1, 0.8, 1.5] with a signal-to-noise ratio of 30 dB; the interference simulating the air-conditioner fan and the noise from outside the window is far from the two sources.
Orientation is estimated with the SRP-PHAT algorithm combined with approximate W-disjoint orthogonality, using a Hamming window with a window size of 1024 points. Under the same background-noise conditions, the higher the signal-to-noise ratio of the signal, the higher the positioning accuracy.
Figures 12 and 13 show the measured source azimuths. In Fig. 12 the azimuths are φ1 = 74° and φ2 = -78°; although the two signals are close in azimuth and the signal-to-noise ratio is low, the two sources can essentially be resolved: spectral peaks appear at the true bearings, no spurious peaks occur, and the target bearings are still estimated correctly. In Fig. 13 the measured azimuths are φ1 = 17° and φ2 = 52°; although the two signals are fairly close, the two sources are completely resolved because the signal-to-noise ratio is high and the angles differ more. As the signal-to-noise ratio increases, the estimation error decreases and the accuracy improves; the larger the angular difference between the two signals, the more accurate the estimate, and once the angular difference is large enough the accuracy levels off.
Figure 14 shows the measured azimuths and elevations, (φ1 = 74°, θ1 = 46°) and (φ2 = -78°, θ2 = 0°).
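As a quick consistency check of these numbers, the azimuths of the two Fig. 10 source positions can be computed directly from their coordinates, assuming the array centre coincides with the coordinate origin; the result matches the reported 74° and -78°.

```python
import numpy as np

# Source positions of Embodiment 1 (Fig. 10), metres
sources = np.array([[0.59, 2.08, 1.5], [0.29, -1.37, 1.5]])

# Azimuth of each source seen from the coordinate origin
az = np.degrees(np.arctan2(sources[:, 1], sources[:, 0]))
print(az)   # approximately [74.2, -78.0] degrees
```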
Embodiment 2: localization of three sound sources
When the number of sources increases to three, the spurious-spectral-peak problem cannot be solved well at low signal-to-noise ratios; at high signal-to-noise ratios the spurious-peak problem is essentially solved and the multiple sources are well resolved.
The implementation steps are the same as in Embodiment 1 and are omitted here.
Figure 15 shows the two-dimensional imaging of three-source localization at a signal-to-noise ratio of 30 dB.
Figures 16 and 17 show the source azimuths measured under relatively high signal-to-noise ratio by the method proposed here and by the conventional SRP-PHAT method, respectively. The SRP-PHAT method based on approximate W-disjoint orthogonality effectively solves the spurious-peak problem and resolves the three signal sources, whereas the conventional SRP-PHAT method exhibits spurious spectral peaks and cannot distinguish the three useful signals.
Figure 18 shows the source signals received by the 8-element microphone array; the interference source has a larger effect on microphone No. 7, which is closer to it. Figure 19 shows the curves of bearing-angle error versus signal-to-noise ratio for different reverberation times T60, with RT60 set to 300 ms, 450 ms, and 600 ms. As T60 increases, the estimation error grows and the accuracy decreases; under strong reverberation the target bearing is difficult to resolve, so the method is suited to localization under moderate reverberation.
The simulation results show that the SRP-PHAT algorithm with a uniform circular array has good positioning performance, especially when the signal-to-noise ratio is high and the reverberation is moderate.
Since the W-disjoint orthogonality assumption based on the sparsity of speech signals is not satisfied for multiple sources, the present invention introduces two key properties of time-frequency masking of speech signals, namely the preserved-signal ratio and the signal-to-interference ratio after masking, and derives the approximate W-disjoint orthogonality condition. The multiple source signals are divided into non-overlapping sets of time-frequency points, each set containing only the time-frequency components of a single source signal, and the relative delay of each source signal to the microphone array is estimated in the time-frequency domain. A circular array with higher spatial resolution is adopted, so the azimuths and elevations of multiple sources are estimated simultaneously and with high resolution, achieving spatial localization of the source signals and overcoming the inability of existing sound-localization methods to realize three-dimensional localization of several overlapping sources.
Finally, it should be noted that the above are only preferred embodiments of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (5)

1. An SRP-PHAT multi-source spatial positioning method, characterized in that it comprises the following steps:
1) Establish the spatial coordinates under the assumed conditions. First assume that the number and spatial positions of all microphones of the uniform circular microphone array remain unchanged during data acquisition, that the source-to-microphone distances satisfy the requirements of the sound-field model, and that all microphones have identical physical properties. The isotropic microphones are evenly distributed on a circle of radius r lying in the x-y plane; the direction of arrival of the plane wave s is expressed in polar coordinates, with the origin of the coordinate system at the center of the circular array; the elevation angle of the signal is θ ∈ [0, π/2] and the azimuth is φ ∈ [0, 2π];
2) Divide the multiple sound-source signals into non-overlapping sets of time-frequency points, so that each time-frequency window contains only one active source signal and the weak W-disjoint orthogonality condition is satisfied; a Hamming window is chosen; when $\mathrm{WDO}_M = 1$, exact W-disjoint orthogonality holds;
3) Using the SRP-PHAT algorithm, compute the phase-transform-weighted steered response power function over all microphone pairs to obtain an objective function; the beamformer steers the beam to scan all possible receiving directions, and the direction with the maximum output power gives the direction of the sound source.
2. The SRP-PHAT multi-source spatial positioning method according to claim 1, characterized in that said step 2) comprises:
First, two important performance criteria are introduced: (1) to what extent the mask preserves the sound source of interest; (2) to what extent the mask suppresses the interfering sources;
Consider dividing the multiple sound-source signals into non-overlapping sets of time-frequency points, with only one active source signal in each time-frequency window, approximately satisfying
$S_j(t,\omega) S_k(t,\omega) \approx 0, \quad \forall t, \omega$
The time-frequency mask is defined as the indicator function of the support of source j,
$M_j(t,\omega) = \begin{cases} 1, & S_j(t,\omega) \neq 0 \\ 0, & \text{otherwise} \end{cases}$
By estimating the time-frequency mask corresponding to each source, a particular source j can be recovered from the mixture:
$S_j(t,\omega) = M_j(t,\omega) X(t,\omega), \quad \forall t, \omega$
where M_j is the indicator function of the support of source j, and S_j(t,ω) and X(t,ω) are the time-frequency representations of s_j(t) and x(t) respectively;
For a given time-frequency mask M, the preserved-signal ratio PSR_M is defined as:
$\mathrm{PSR}_M = \frac{\| M(t,\omega) S_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2}$
PSR_M measures the percentage of the energy of source S_j that is preserved after masking;
At the same time define
$z_j(t) = \sum_{k=1,\, k \neq j}^{N} s_k(t)$
where z_j(t) is the sum of the sources interfering with source s_j;
The signal-to-interference ratio after applying the time-frequency mask M is defined as:
$\mathrm{SIR}_M = \frac{\| M(t,\omega) S_j(t,\omega) \|^2}{\| M(t,\omega) Z_j(t,\omega) \|^2}$
where SIR_M measures the signal-to-interference ratio of the separated signal after the mask M is applied, Z_j(t,ω) being the time-frequency representation of z_j(t);
From PSR_M and SIR_M the approximate W-disjoint orthogonality WDO_M can be estimated:
$\mathrm{WDO}_M = \frac{\| M(t,\omega) S_j(t,\omega) \|^2 - \| M(t,\omega) Z_j(t,\omega) \|^2}{\| S_j(t,\omega) \|^2}$
Because speech signals have sparse time-frequency representations, a small fraction of time-frequency points carries the vast majority of the total power and the magnitude of the product of the time-frequency representations of different sources is usually small, so the weak W-disjoint orthogonality condition is satisfied; in particular, when WDO_M = 1, exact W-disjoint orthogonality holds.
3. The SRP-PHAT multi-source spatial positioning method according to claim 1, characterized in that said step 3) applies the SRP-PHAT algorithm to a microphone pair:
For an array with only two microphones m_i and m_j, the delay difference of a signal arriving from azimuth φ and elevation θ at the two microphones is Δτ_ij(θ, φ); the TDOA can be estimated by the generalized cross-correlation (GCC) as:
$\Delta\tau_{ij}(\theta,\varphi) = \arg\max_{\tau} P(\mathbf{r}) = \arg\max_{\tau} R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi))$
where P(r) is the spatial likelihood function of the three-dimensional position vector r, obtained by evaluating all possible θ and φ; the generalized cross-correlation function R_{s_i s_j}(Δτ_ij(θ, φ)) can be expressed in the frequency domain as:
$R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi)) = \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega) S_j^*(\omega)\, e^{j\omega \Delta\tau_{ij}(\theta,\varphi)}\, d\omega$
where Ψ_ij(ω) is a weighting function and S_i(ω)S_j^*(ω) is the cross-spectral density;
The phase transform (PHAT) is a typical weighting of this kind,
with the phase weighting function defined as:
$\Psi_{ij}(\omega) = \frac{1}{\left| S_i(\omega) S_j^*(\omega) \right|}$
By choosing a suitable weighting function, the delay-and-sum steered response power satisfies an optimal signal-to-noise-ratio criterion; the generalized cross-correlation R_{s_i s_j}(Δτ_ij(θ, φ)) then exhibits a single peak within the limited range of τ, corresponding to the TDOA of the propagation to microphones m_i and m_j.
4. The SRP-PHAT multi-source spatial positioning method according to claim 1, characterized in that said step 3) applies the SRP-PHAT algorithm to a circular microphone array:
The generalized cross-correlations of all microphone pairs are summed:
$P(\Delta\tau_1, \Delta\tau_2, \ldots, \Delta\tau_N) = \sum_{i=1}^{N} \sum_{j=1}^{N} R_{s_i s_j}(\Delta\tau_{ij}(\theta,\varphi)) = \sum_{i=1}^{N} \sum_{j=1}^{N} \int_{-\pi}^{\pi} \Psi_{ij}(\omega)\, S_i(\omega) S_j^*(\omega)\, e^{j\omega(\Delta\tau_i - \Delta\tau_j)}\, d\omega$
where Δτ_1, Δτ_2, …, Δτ_N are the steering delays of the N microphones, Δτ_i = τ_i − τ_0, i = 1, …, N, and τ_0 is the reference delay, taken as the smallest of all the microphone delays.
5. The SRP-PHAT multi-source spatial positioning method according to claim 1, characterized in that said step 3) applies the SRP-PHAT algorithm to multiple sound sources with a circular microphone array:
When there are two or more sound sources, the SRP-PHAT peak of one source mixes with the SRP-PHAT peak of another source, spurious peaks appear at some points, and the local maxima become difficult to find;
Using the approximate W-disjoint orthogonality of speech signals, the relative delay of each source signal to the microphone array is estimated in the time-frequency domain, with the short-time Fourier transform used as the approximately W-disjoint orthogonal transform;
Assume the frequency-domain representation of the signal model at the i-th microphone is:
$X_i[\omega, \tau] = S_n(\omega, \tau)\, e^{-j\omega \Delta\tau_{n,i}} + N_i[\omega, \tau]$
Given a window function W, the short-time Fourier transform of s_j is S_j:
$S_j(t,\omega) = F_W(s_j(\cdot))(t,\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} W(\tau - t)\, s_j(\tau)\, e^{-i\omega\tau}\, d\tau$
By choosing an appropriate window function and window size, under the approximate W-disjoint orthogonality assumption only one source is active at any time-frequency point, and its cross-spectrum is:
$E\!\left[ X_i[\omega,\tau]\, X_j^*[\omega,\tau] \right] = \left| S_n(\omega,\tau) \right|^2 e^{-j\omega(\Delta\tau_{n,i} - \Delta\tau_{n,j})}$
The relative delay Δτ_{n,i} − Δτ_{n,j} between microphone i and microphone j can be obtained from the cross-power spectrum.
CN201410366922.4A 2014-07-29 2014-07-29 SRP-PHAT multi-source spatial positioning method Expired - Fee Related CN104142492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410366922.4A CN104142492B (en) 2014-07-29 2014-07-29 SRP-PHAT multi-source spatial positioning method

Publications (2)

Publication Number Publication Date
CN104142492A true CN104142492A (en) 2014-11-12
CN104142492B CN104142492B (en) 2017-04-05

Family

ID=51851720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410366922.4A Expired - Fee Related CN104142492B (en) 2014-07-29 2014-07-29 SRP-PHAT multi-source spatial positioning method

Country Status (1)

Country Link
CN (1) CN104142492B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090279714A1 (en) * 2008-05-06 2009-11-12 Samsung Electronics Co., Ltd. Apparatus and method for localizing sound source in robot
CN101762806A (en) * 2010-01-27 2010-06-30 华为终端有限公司 Sound source locating method and apparatus thereof
KR20140015893A (en) * 2012-07-26 2014-02-07 삼성테크윈 주식회사 Apparatus and method for estimating location of sound source

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAVID AYLLO´N ET AL.: "Real-time phase-isolation algorithm for speech separation", 《19TH EUROPEAN SIGNAL PROCESSING CONFERENCE》 *
M. SWARTLING ET AL.: "Source Localization for Multiple Speech Sources Using Low Complexity Non-Parametric Source Separation and Clustering", 《SIGNAL PROCESSING》 *

Also Published As

Publication number Publication date
CN104142492B (en) 2017-04-05

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170405

Termination date: 20200729