CN103995252B - A kind of sound source localization method of three-dimensional space - Google Patents

A kind of sound source localization method of three-dimensional space Download PDF

Info

Publication number
CN103995252B
CN103995252B CN201410202062.0A CN201410202062A CN103995252B CN 103995252 B CN103995252 B CN 103995252B CN 201410202062 A CN201410202062 A CN 201410202062A CN 103995252 B CN103995252 B CN 103995252B
Authority
CN
China
Prior art keywords
mike
sound
formula
sound source
overbar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410202062.0A
Other languages
Chinese (zh)
Other versions
CN103995252A (en
Inventor
郭业才
朱赛男
张宁
黄友锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN201410202062.0A priority Critical patent/CN103995252B/en
Publication of CN103995252A publication Critical patent/CN103995252A/en
Application granted granted Critical
Publication of CN103995252B publication Critical patent/CN103995252B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements

Abstract

The present invention provides the sound source localization method of three-dimensional space of a kind of improvement, belongs to sound localization technical field.The present invention includes: set up double L-shaped microphone array;On the basis of normalization frequency domain least-square methods, by introducing penalty, spectrum energy is modified, thus estimates the impulse response of different array element adaptively and calculation delay is poor;Utilize the relation of delay inequality and the double L-shaped microphone position obtained, determine the position coordinates by sound source.For more traditional localization method, array structure of the present invention is simple, amount of calculation is few, it is effectively improved noise robustness, anti-reverberation ability and positioning precision, it is more suitable for, for indoor three dimensional sound source location, can be widely applied to the every field such as vehicle-carried hands-free telephone, video conferencing system, speech recognition system and intelligent robot.

Description

A kind of sound source localization method of three-dimensional space
Technical field
The invention belongs to sound localization technical field, especially relate to one and utilize microphone array to carry out sound source three-dimensional calmly The method of position.
Background technology
At present, sound localization based on microphone array is a major issue in acoustics signal processing field, compares In traditional Array Signal Processing, the voice signal that array microphone processes does not has carrier wave, it is possible to the range of signal of process is wide, suitable Should be able to power strong, have extensively in fields such as vehicle-carried hands-free telephone, video conferencing system, speech recognition system and intelligent robots Application.Sound localization based on microphone array mainly has localization method based on steerable beam formation, based on arriving delay inequality Localization method and based on high-resolution localization method.Wherein, localization method based on steerable beam formation is to microphone array The voice signal that row receive is filtered, weighted sum, the most directly controls mike sensing and makes wave beam have maximum work output The direction of rate, because its computation complexity height can not be used for real time processing system;Utilize based on high-resolution localization method Solve the correlation matrix between Mike's signal to make deflection, thus make sound source position further, although be applied successfully to The process of some array signals, but locating effect is the best in actual applications, is not only limited by array structure but also at signal Time unstable, amount of calculation can be multiplied;And wherein, be by estimating that sound source arrives based on the sound localization method arriving delay inequality Delay inequality between mike also estimates sound source position according to the position of mike, do not limited by array structure, amount of calculation little.
Two parts are mainly divided to complete based on the sound localization method arriving delay inequality: time delay is estimated and location.Time delay is estimated Method mainly has broad sense cross-correlation method (Generalized Cross Correlation, GCC), minimum mean square error method (Least Mean Square, LMS) and self-adaptive features value decomposition method (Adaptive Engenvalue Decomposition, AED).Wherein, GCC method performance under reverberant ambiance can decline a lot;LMS method performance basic and it Quite;AED method is by estimating that double-channel impulse response suppresses reverberation, but it requires that double-channel is relatively prime, does not contains public Zero point.But, in actual indoor environment, impulse response length is the longest, and the relatively prime probability of double-channel is the least, AED side Method is the most applicable.In order to improve relatively prime probability, AED dual pathways method of estimation is generalized to certainly by Y (Arden) Huang et al. Adapt to multichannel method of estimation (Adaptive Multichannel, AMC), propose to utilize normalization multichannel frequency domain minimum all Fang Fangfa (Normalized Multichannel Frequency-domain Least Mean Square, NMCFLMS) comes Estimate the impulse response of each array element.The basic thought of NMCFLMS method is, observation signal is divided into continuous print block signal, adopts Channel estimation in frequency domain is carried out, though it is low to have multichannel frequency domain least-square methods complexity by frequency domain normalization minimum mean-square method With the feature of Newton method fast convergence rate, but may dissipate in the presence of channel additive noise, it is impossible to effectively carry out channel Estimate.
Localization method mainly has method of least square and geometry location method, and the former calculates complexity, and sensitive to initial value;The latter's profit Determine that bi-curved functional relationship, multiple bi-curved intersection points are and determine sound source by microphone position parameter and time delay estimated value Position, calculates simple, but not exclusive closed solution easily occurs, it is difficult to effectively position.
Summary of the invention
May dissipate in the presence of channel additive noise for sound localization method in prior art, it is impossible to effectively enter Row channel is estimated and is difficult to the defect of effective localization of sound source, and the present invention provides the three dimensions sound localization side of a kind of improvement Method, utilizes the method revised normalization multichannel frequency domain least-square methods (MNMCFLMS) and limit channel impulse response length Carry out channel estimation, be effectively increased the precision of sound localization.
In order to achieve the above object, the present invention provides following technical scheme:
A kind of sound source localization method of three-dimensional space, comprises the steps:
Step A, in three-dimensional cartesian coordinate system, sets up two at grade and the L-type microphone array that is oppositely arranged Row;
Step B, uses the normalization multichannel frequency domain least-square methods revised to estimate that sound-source signal arrives each mike Delay inequality, comprise the steps:
Step B-1, sound-source signal s (n) is through the channel impulse response h of i-th mikei(n) afterwards with channel additive noise vi N () merges, obtain the reception signal x of i-th mikei(n): xi(n)=sT(n)*hi(n)+vi(n);
Wherein, T representing matrix transposition operation;N is time series, is integer;
When disregarding channel additive noise, the reception signal x of i-th mikeiThe reception letter of (n) and jth mike Number xjN the relation between () is:
x i T ( n ) h j ( n ) = x j T ( n ) h i ( n ) ;
Step B-2, uses a length of 2LhRectangular window function w (n) the reception signal x to i-th mikeiN () adds Window processes, and the n-th frame reception signal obtaining i-th mike is:
x i ( n ) 2 L h × 1 = [ x i ( nL h - L h ) , x i ( nL h - L h + 1 ) , ... , x i ( nL h + L h - 1 ) ] T ;
In formula, LhRepresent channel impulse response hiN the length of (), for integer;N=1 ..., int (N/Lh-1);N is Mike Wind receives the sequence total length of signal, for integer;int(N/Lh-1) it is to N/Lh-1 round downwards after the integer that obtains;
Step B-3, utilizes Fourier transformation that the time-domain signal obtained in step B-2 is transformed to frequency-region signal:
X i ( n ) 2 L h × 1 = F 2 L h × 2 L h x i ( n ) 2 L h × 1 ;
In formula,For 2Lh×2LhDimension Fourier transform matrix;
Step B-4, introducing penalty, to spectrum energy correction, estimates channel frequency domain response adaptively
As channel additive noise viN, in the presence of (), error of frequency domain function is defined as
e ‾ i j ( n ) = W L h × 2 L h 01 D i ( n ) W 2 L h × L h 10 h ‾ ^ j ( n ) - W L h × 2 L h 01 D j ( n ) W 2 L h × L h 10 h ‾ ^ i ( n )
In formula
D i = d i a g ( F 2 L h × 2 L h ( x i ( n ) 2 L h × 1 ) ) ;
W L h × 2 L h 01 = F L h × L h 0 L h × L h I L h × L h F 2 L h × 2 L h - 1 ;
W 2 L h × L h 10 = F 2 L h × 2 L h I L h × L h 0 L h × L h T F L h × L h - 1 ;
In formula, i ≠ j;I, j=1,2 ..., M,It is respectively Lh×LhDimension Fourier transform matrix and Fourier become Change inverse matrix;Diag represents diagonal matrix;RepresentDimension unit matrix;Represent Lh×LhDimension null matrix;
Utilize penalty JpN () passes through Lagrange multiplier β (n) the cost function J to NMCFLMSfN () is modified, The cost function obtaining revising is
Jmod(n)=Jf(n)+β(n){-Jp(n)}
In formula
β ( n ) = | ▿ J p H ( n ) ▿ J f ( n ) | || ▿ J p ( n ) || 2
In formula, the cost function gradient revised when the value of Lagrange multiplier β (n) is by stable state is equal to zero, i.e. Jf(n)= β(n)▽JpN obtaining time (), H represents conjugate transpose;
By the cost function J revisedmodN (), obtains channel frequency domain responseMore new formula be
h ^ ‾ i 10 ( n + 1 ) = h ^ ‾ i 10 ( n ) - ▿ J f 01 ( n ) + β ^ ( n ) ▿ J p 10 ( n )
Wherein,
▿ J p 01 ( n ) = F 2 L h × 2 L h 0 L h × L h I L h × L h T K ( n ) F 2 L h × 2 L h 0 L h × L h I L h × L h T h ^ ‾ ( n )
β ^ ( n ) = | ▿ J p 10 ( n ) H ▿ J f 01 ( n ) | || ▿ J p 10 ( n ) || 2
▿ J f 01 ( n ) = μ ( P i ( n ) + δI 2 L h × 2 L h ) - 1 × Σ k = 1 M D k * ( n ) e ‾ k i 01 ( n )
In above formula,
P i ( n ) = λP i ( n - 1 ) + ( 1 - λ ) × Σ k = 1 , k ≠ i M D k * ( n ) D k ( n )
e ‾ k i 01 ( n ) = F 2 L h × 2 L h 0 L h × L h I L h × L h T F L h × L h - 1 e ‾ k i ( n )
In formula, K (n) is that diagonal entry isDiagonal matrix;PiN () is the frequency spectrum of multi-channel output signal Energy, can obtain more stable spectrum energy by forgetting factor λ,Parameter δ is normal number, its energy The noise scale-up problem that effectively solution spectrum energy causes time less;
Step B-5, to channel frequency domain responseCarrying out Fourier inversion, obtaining channel impulse response estimation is
h ^ i ( n ) = I L h × L h 0 L h × L h F 2 L h × 2 L h - 1 h ^ ‾ i 10 ( n )
In formula,For 2Lh×2LhDimension Fourier transformation inverse matrix;
Step B-6, in adaptive process, the channel impulse response of estimationThe peak value correspondence time delay of middle appearance is exactly The reception direct sound wave signal time delay of i-th mike, then sound-source signal arrives between i-th mike and jth mike Delay inequality is
τ ^ i j = ( m a x l = 1 L h | h ^ i l ( n ) | - m a x l = 1 L h | h ^ j l ( n ) | ) / f s
In formula, fsFor the sample frequency of signal, max represents and takes maximum;
Step C, the delay inequality described in step B being multiplied with the velocity of sound obtains sound-source signal and arrives the range difference of each mike, And according to the position relationship of each mike, determine the position of described sound source.
Further, described JpN () is the penalty under the conditions of constraint, constraints is
max J P ( n ) = Σ i = 1 M Σ j = 0 L - 1 ln ( | h ‾ ^ i j ( n ) | 2 ) s . t . Σ i = 1 M Σ j = 0 L n - 1 | h ‾ ^ i j ( n ) | 2 = 1 ML h
In formula, in penalty JpWhen () maximizes n, constraintsSet up;S.t. represent about Bundle condition.
Further, length L of channel impulse response in described step B-2hRestrictive condition is:
L h ≤ 2 f s m a x i , j = 1 M { τ i j , m a x }
In formula, τij,max=dij/ c is that between i-th and jth mike, maximum delay is poor, dijFor i-th and jth Distance between mike, c is the spread speed of sound source.
Further, in described step A two L-type arrays to lay rule as follows: first mike mic1 is placed in X-axis On, its coordinate is (d, 0,0);Second mike mic2 is overlapping with zero, and its coordinate is (0,0,0);3rd Mike Wind mic3 is placed in Y-axis, and its coordinate is (0, d, 0), and d is the spacing of two neighboring microphones, for real number, first, second and third wheat Gram wind constitutes first L-type microphone array;Meanwhile, with x=D/2 as axis of symmetry, set up and first L-type microphone array Second relative L-type microphone array, wherein, the coordinate of the 4th mike mic4 is (D, 0,0), the 5th mike The coordinate of mic5 is (D-d, 0,0), and the coordinate of the 6th mike mic6 is (D, d, 0);D is two relative L-type mikes Distance between array.
Further, in described step C, determine that the step of sound source position is as follows:
Step C-1, position is that (x, y, sound-source signal z) arrives i-th mike and the delay inequality of jth mike With i-th mike and jth microphone space from dijRelation beRelation in detail has following two groups:
First group:With
Second group:With
In formula,For the delay inequality between sound-source signal to second mike and first mike;Believe for sound source Number to second mike and the delay inequality of the 3rd mike,For sound-source signal to the 4th mike and the 5th Mike The delay inequality of wind,For sound-source signal to the 4th mike and the delay inequality of the 6th mike;
Step C-2, according to first group described in step C-1 and second group of relation, obtains sound source projection in xoy plane and sits It is designated as
x ^ = b 2 - b 1 a 1 - a 2
y ^ = a 1 b 2 - a 2 b 1 a 1 - a 2
z ^ = 0
In formula,
Step C-3, substitutes into described in step C-1 respectively by the projection coordinate in xoy plane of the sound source described in step C-2 First group, with four relational expressions of second group, obtains four z-axis coordinate estimated values of sound sourceTake this four z The meansigma methods of coordinate is estimated as the z coordinate of sound sourceFor:
z ^ = z ^ 1 + z ^ 2 + z ^ 3 + z ^ 4 4 .
Compared with prior art, the invention have the advantages that and beneficial effect:
Spectrum energy, on the basis of normalization frequency domain least-square methods, is repaiied by the present invention by introducing penalty Just, the impulse response the calculation delay that thus estimate different array element adaptively are poor, it is to avoid channel estimates occur deteriorating;Utilize In plane, two crossing straight lines uniquely determine sound source projected position planar, it is to avoid going out of not exclusive closed solution problem Existing;Additionally define channel impulse response length, enhance the relatively prime property of double-channel, improve anti-reverberation ability.More traditional For localization method, array structure of the present invention is simple, amount of calculation is few, be effectively improved noise robustness, anti-reverberation ability and Positioning precision, is more suitable for for indoor three dimensional sound source location, can be widely applied to vehicle-carried hands-free telephone, video conferencing system, The every field such as speech recognition system and intelligent robot.
Accompanying drawing explanation
The flow chart of steps of the sound source localization method of three-dimensional space that Fig. 1 provides for the present invention;
Fig. 2 is double L-shaped microphone array setting principle figure;
Fig. 3 is the normalization multichannel frequency domain least-square methods schematic diagram revised;
Fig. 4 be in embodiment microphone array be listed in room put schematic diagram;
Fig. 5 is normalized projection error convergence curve figure.
Detailed description of the invention
The technical scheme provided the present invention below with reference to specific embodiment is described in detail, it should be understood that following specifically Embodiment is merely to illustrate the present invention rather than limits the scope of the present invention.
First normalization multichannel frequency domain least-square methods NMCFLMS is improved by the present invention, increases its performance;With Time limit channel impulse response length, strengthen the relatively prime property of double-channel;And utilize the NMCFLMS of correction to carry out channel estimation.Specifically Ground is said, flow chart is as it is shown in figure 1, comprise the steps:
Step A, sets up three-dimensional coordinate system, at grade, 6 mikes is put into two relative L-type arrays, Their spacing is D, as shown in Figure 2: first mike mic1 is placed in X-axis, and its coordinate is (d, 0,0);Second wheat Gram wind mic2 is overlapping with zero, and its coordinate is (0,0,0);3rd mike mic3 is placed in Y-axis, its coordinate be (0, d, 0), d is the spacing of two neighboring microphones, and for real number, first, second and third mike constitutes first L-type microphone array Row.Meanwhile, with x=D/2 as axis of symmetry, set up second the L-type microphone array relative with first L-type microphone array, Wherein, the coordinate of the 4th mike mic4 is (D, 0,0), and the coordinate of the 5th mike mic5 is (D-d, 0,0), the 6th The coordinate of mike mic6 is (D, d, 0);D is the distance between two relative L-type microphone arrays.The coordinate of sound source (x, y, Z) projection coordinate in xoy plane is (x, y, 0), and the distance between projection coordinate's point and initial point is γ, and both lines and x Axle forward angle is θ.
Step B, uses normalization multichannel frequency domain least-square methods MNMCFLMS revised, and estimates that sound-source signal arrives The delay inequality of each mike, its schematic diagram as shown in Figure 3:
Step B-1, sound-source signal s (n), the channel impulse response h of i-th mikei(n), channel additive noise vi(n) And the reception signal x of i-th mikeiN the relation between () is
xi(n)=sT(n)*hi(n)+vi(n) (1)
In formula, T representing matrix transposition operates;N is time series, is integer.
When disregarding channel additive noise, the reception signal x of i-th mikeiThe reception letter of (n) and jth mike Number xjN the relation between () is
x i T ( n ) h j ( n ) = x j T ( n ) h i ( n ) - - - ( 2 )
In formula, i, j=1,2 ..., M, M are the number of mike, for positive integer;
Step B-2, carries out frame and moves as L the reception signal of i-th mikehWindowing process: use a length of 2LhRectangle Window function w (n) is multiplied by the reception signal of i-th mike, and the n-th frame reception signal obtaining i-th mike is:
x i ( n ) 2 L h × 1 = [ x i ( nL h - L h ) , x i ( nL h - L h + 1 ) , ... , x i ( nL h + L h - 1 ) ] T - - - ( 3 )
In formula, LhRepresent channel impulse response hiLength, for integer;N=1 ..., int (N/Lh-1);N is that mike connects Receive the sequence total length of signal, for integer;int(N/Lh-1) it is to N/Lh-1 round downwards after the integer that obtains;
In actual indoor environment, in order to overcome, channel impulse response length is the longest causes the relatively prime probability of double-channel the least Deficiency, guarantee double-channel relatively prime stronger time, can be further to channel impulse response length LhLimit, restrictive condition For:
L h ≤ 2 f s m a x i , j = 1 M { τ ^ i j , m a x } - - - ( 4 )
In formula,Poor for maximum delay between i-th and jth mike.Ring by limiting channel impulse Answer length, effectively strengthen the relatively prime property of double-channel.
Step B-3, obtains frequency-region signal to formula (3) as Fourier transformation
X i ( n ) 2 L h × 1 = F 2 L h × 2 L h x i ( n ) 2 L h × 1 - - - ( 5 )
In formula,For 2Lh×2LhDimension Fourier transform matrix.
Step B-4, introducing penalty, to spectrum energy correction, estimates adaptively and estimates channel frequency adaptively Domain response
As channel additive noise viN, in the presence of (), error of frequency domain function is defined as
e ‾ i j ( n ) = W L h × 2 L h 01 D i ( n ) W 2 L h × L h 10 h ‾ ^ j ( n ) - W L h × 2 L h 01 D j ( n ) W 2 L h × L h 10 h ‾ ^ i ( n ) - - - ( 6 )
In formula
D i = d i a g ( F 2 L h × 2 L h ( x i ( n ) 2 L h × 1 ) ) ;
W L h × 2 L h 01 = F L h × L h 0 L h × L h I L h × L h F 2 L h × 2 L h - 1 ;
W 2 L h × L h 10 = F 2 L h × 2 L h I L h × L h 0 L h × L h T F L h × L h - 1 ;
In formula, i ≠ j;I, j=1,2 ..., M,It is respectively Lh×LhDimension Fourier transform matrix and Fourier become Change inverse matrix;Diag represents diagonal matrix;Represent Lh×LhDimension unit matrix;Represent Lh×LhDimension null matrix.
Utilize normalization multichannel frequency domain least-square methods (Normalized Multichannel Frequency- Domain Least Mean Square, NMCFLMS), its cost function is
J f ( n ) = Σ i = 1 M - 1 Σ j = i + 1 M e ‾ i j H ( n ) e ‾ i j ( n ) - - - ( 7 )
In the presence of channel additive noise, utilize formula (7) to carry out channel frequency domain response and estimate, may dissipate, make Obtain channel to estimate to lose efficacy.For making NMCFLMS method in the presence of channel additive noise, there is good channel estimating performance, permissible Acoustical signal spectrum energy is utilized to have equally distributed characteristic.In order to make sound spectrum be uniformly distributed, mean value theorem pair is utilized to rush with channel The frequency band energy swashing response corresponding retrains, and constraints is
max J P ( n ) = Σ i = 1 M Σ j = 0 L - 1 ln ( | h ‾ ^ i j ( n ) | 2 ) s . t . Σ i = 1 M Σ j = 0 L n - 1 | h ‾ ^ i j ( n ) | 2 = 1 ML h - - - ( 8 )
In formula, JpN () is the penalty under the conditions of constraint, in penalty JpWhen () maximizes n, constraintsSet up;S.t. constraints is represented.
In order to make full use of the advantage of NMCFLMS and make spectrum energy have equally distributed characteristic, penalty Jp(n) By Lagrange multiplier β (n) the cost function J to NMCFLMSfN () is modified, obtain revising NMCFLMS (MNMCFLMS) cost function is
Jmod(n)=Jf(n)+β(n){-Jp(n)} (9)
In formula
β ( n ) = | ▿ J p H ( n ) ▿ J f ( n ) | || ▿ J p ( n ) || 2 - - - ( 10 )
In formula, the cost function gradient revised when the value of Lagrange multiplier β (n) is by stable state is equal to zero, i.e. Jf(n)= β(n)▽JpN obtaining time (), H represents conjugate transpose;
By the cost function J revisedmodN (), obtains channel frequency domain responseMore new formula be
h ^ ‾ i 10 ( n + 1 ) = h ^ ‾ i 10 ( n ) - ▿ J f 01 ( n ) + β ^ ( n ) ▿ J p 10 ( n ) - - - ( 11 )
Wherein,
▿ J p 01 ( n ) = F 2 L h × 2 L h 0 L h × L h I L h × L h T K ( n ) F 2 L h × 2 L h 0 L h × L h I L h × L h T h ‾ ^ ( n )
β ^ ( n ) = | ▿ J p 10 ( n ) H ▿ J f 01 ( n ) | || ▿ J p 10 ( n ) || 2
▿ J f 01 ( n ) = μ ( P i ( n ) + δI 2 L h × 2 L h ) - 1 × Σ k = 1 M D k * ( n ) e ‾ k i 01 ( n )
In above formula,
P i ( n ) = λP i ( n - 1 ) + ( 1 - λ ) × Σ k = 1 , k ≠ i M D k * ( n ) D k ( n )
e ‾ k i 01 ( n ) = F 2 L h × 2 L h 0 L h × L h I L h × L h T F L h × L h - 1 e ‾ k i ( n )
In formula, K (n) is that diagonal entry isDiagonal matrix;PiN () is the frequency spectrum of multi-channel output signal Energy, can obtain more stable spectrum energy by forgetting factor λ,Parameter δ is normal number, its energy The noise scale-up problem that effectively solution spectrum energy causes time less;
Revise normalization multichannel frequency domain least-square methods, it is possible to estimate the impulse response of different array element adaptively And calculation delay is poor, it is to avoid channel estimates occur deteriorating;
Step B-5, to channel frequency domain responseCarrying out Fourier inversion, obtaining channel impulse response estimation is
h ^ i ( n ) = I L h × L h 0 L h × L h F 2 L h × 2 L h - 1 h ^ ‾ i 10 ( n ) - - - ( 12 )
In formula,For 2Lh×2LhDimension Fourier transformation inverse matrix;
Step B-6, according to the channel impulse response estimatedCarry out time delay estimation.In adaptive process,One peak value of middle appearance, peak value correspondence time delay is exactly the reception direct sound wave signal time delay of i-th mike, then sound source letter Number delay inequality arrived between i-th mike and jth mike is
τ ^ i j = ( m a x l = 1 L h | h ^ i l ( n ) | - m a x l = 1 L h | h ^ j l ( n ) | ) / f s - - - ( 13 )
In formula, fsFor the sample frequency of signal, max represents and takes maximum;
Step C, the delay inequality described in step B being multiplied with the velocity of sound obtains sound-source signal and arrives the range difference of each mike, And according to the position relationship of each mike, determine the position of described sound source.
Use dijRepresenting the distance between i-th and jth mike, c is the spread speed of sound source.When the position of sound source is sat It is designated as that (x, y, time z), sound-source signal is to delay inequality τ between i-th mike and jth mikeijIt is a constant, at this moment
d i j = c . τ ^ i j - - - ( 14 )
Formula (14) includes following concrete equation:
First group:
x 2 + y 2 + z 2 - x 2 + ( y - d ) 2 + z 2 = c . τ ^ 23 - - - ( 16 )
Second group:
( x - D ) 2 + y 2 + z 2 - x 2 + ( y - d ) 2 + z 2 = c . τ ^ 46 - - - ( 18 )
In formula,For sound-source signal to second mike and the delay inequality of first mike,Arrive for sound-source signal Second mike and the delay inequality of the 3rd mike,For sound-source signal to the 4th mike and the 5th mike Delay inequality,For sound-source signal to the 4th mike and the delay inequality of the 6th mike.
By formula (15) and formula (16),
Y=a1x+b1 (19)
In formula
a 1 = τ ^ 23 / τ ^ 21 - - - ( 20 )
b 1 = ( c 2 τ ^ 21 τ ^ 23 + d 2 ) ( τ ^ 21 - τ ^ 23 ) / ( 2 d τ ^ 21 ) - - - ( 21 )
By formula (17) and (18),
Y=a2x+b2 (22)
In formula
a 2 = - τ ^ 46 / τ ^ 45 - - - ( 23 )
b 2 = ( ( c 2 τ ^ 45 τ ^ 46 + d 2 ) ( τ ^ 45 - τ ^ 46 ) + 2 d D τ ^ 45 ) / ( 2 d τ ^ 45 ) - - - ( 24 )
By formula (19) and (22), obtaining sound source in projection coordinate's estimated value of xoy plane is
x ^ = b 2 - b 1 a 1 - a 2 - - - ( 25 )
y ^ = a 1 b 2 - a 2 b 1 a 1 - a 2 - - - ( 26 )
z ^ = 0 - - - ( 27 )
Formula (25)-formula (26) is substituted into formula (15)-formula (18) respectively, obtains four estimated values of sound source Z coordinate, remember respectively ForEstimate, i.e. as the accurate of sound source Z coordinate by the meansigma methods of these four estimated values
z ^ = z ^ 1 + z ^ 2 + z ^ 3 + z ^ 4 4 - - - ( 28 )
The value obtained with (28) by formula (25), (26), it is simply that the coordinate figure of sound localization.
Embodiment one:
In order to verify the performance of the inventive method, we carry out performance detection for the present invention.In the detection, Fig. 4 is room Simulation drawing put by interior mike.The acoustic enviroment of common conference room, room-sized is 6m × 5m × 3.5m.Microphone array frame Being located on the position of the most about 0.5m, 6 mikes are arranged in double L-shaped microphone array as shown in Figure 4, mike from The distance on wall and ground is respectively 0.5m, 0.5m, and the distance between adjacent two mikes is 1m, L-type microphone array it Between distance be 5m.In view of mike from wall apart from little, reverberation is big, actual application value is little, therefore, the moon in Fig. 4 Shadow part is not considered, and sound source is positioned over any position within two L-type mikes.Sound-source signal is one section of prior recording Male voice read aloud, its sample frequency is 25kHz.The position laid according to mike, length L of channel impulse responseh=370, Parameter δ=0.1556 × 10 of MNMCFLMS-5, μ=0.5.
Fig. 5 is SNR=20dB, mixing time delay RT60During=100ms, normalized projection error (Normalized Projection Misalignment, NPM) convergence curve, projection error NPM is
N P M ( n ) = 20 log 10 ( || h ( n ) - h ( n ) T h ^ ( n ) h ^ ( n ) T h ^ ( n ) h ^ ( n ) || / || h ( n ) || ) - - - ( 29 )
In formula, h (n) is the true impulse response of channel,The impulse response obtained is estimated for channel self-adapting.
Fig. 5 shows, under having channel additive noise environment, NMCFLMS method just dissipates when iteration about 450 times.And The inventive method MNMCFLMS with regard to stable convergence, is effectively prevented from the channel that channel additive noise causes when iteration about 450 times Estimate to deteriorate.
Under different reverberant ambiance when SNR=20dB, use the inventive method MNMCFLMS and classical phse conversion Weighting broad sense cross-correlation function method (Generalized Cross Correlation-phase Transform, GCC-PHAT) Carrying out time delay and estimate performance comparison, result is as shown in table 1 below:
Under the low reverberant ambiance of table 1, the inventive method contrasts with the performance indications of GCC-PHAT method
Table 1 uses non-abnormity point percentage ratio (Percentage of Non-abnormal Point, PNP) and root-mean-square Error (Root Mean Square Error, RMSE) is as the index of measure algorithm performance.Wherein, PNP and RMSE is fixed respectively Justice is
P N P = ( 1 - 1 N Σ i = 1 N T P ( τ i - τ 0 ) ) × 100 % - - - ( 30 )
R M S E = 1 N Σ i = 1 N ( τ i - τ 0 ) 2 - - - ( 31 )
In formula
T P ( x ) = { 0 , | x | ≤ 2 / f s 1 , | x | > 2 / f s - - - ( 32 )
In formula, N represents time delay appreciable amt, τ0The actual value that time delay is estimated.Here, more than 2 will be differed with actual value The time delay estimated value of sample point is as an abnormity point, and non-abnormal percentage ratio is the highest, and method performance is the best.Table 1 shows, through too much Secondary experiment, under low reverberant ambiance, the inventive method is suitable with GCC-PHAT method performance;When reverberation is more than 250ms, GCC- The non-dissimilarity percentage ratio of PHAT method significantly declines, root-mean-square error is much larger than sampling precision, it is impossible to effectively carries out time delay and estimates Meter;But, the performance of the inventive method is uninfluenced, in the environment of reverberation is less than 700ms, remains to estimation time delay effectively Value.
Under room model, real sound source is tested, detect the performance of sound localization method of the present invention.Indoor Ambient parameter is as follows: SNR=20dB, RT60=150ms.The result of sound localization, uses absolute errorMake For the judgment criteria of positioning performance, and For the coordinate estimated value of sound source position, right The present invention carries out many experiments checking, and result is as shown in table 2 below:
The test result of table 2 the inventive method
Knowable to upper table 2, the absolute error Stable distritation of this method orientation distance γ within 10cm, horizontal angle θ exhausted To error control within 1 °, highly absolute error Stable distritation is within 10cm.
Technological means disclosed in the present invention program is not limited only to the technological means disclosed in above-mentioned embodiment, also includes The technical scheme being made up of above technical characteristic combination in any.It should be pointed out that, for those skilled in the art For, under the premise without departing from the principles of the invention, it is also possible to make some improvements and modifications, these improvements and modifications are also considered as Protection scope of the present invention.

Claims (5)

1. a sound source localization method of three-dimensional space, it is characterised in that comprise the steps:
Step A, in three-dimensional cartesian coordinate system, sets up two at grade and the L-type microphone array that is oppositely arranged;
Step B, uses normalization multichannel frequency domain least-square methods NMCFLMS revised to estimate that sound-source signal arrives each Mike The delay inequality of wind, comprises the steps:
Step B-1, sound-source signal s (n) is through the channel impulse response h of i-th mikei(n) afterwards with channel additive noise vi(n) Merge, obtain the reception signal x of i-th mikei(n): xi(n)=sT(n)*hi(n)+vi(n);
Wherein, T representing matrix transposition operation;N is time series, is integer;
When disregarding channel additive noise, the reception signal x of i-th mikeiThe reception signal x of (n) and jth mikej N the relation between () is:
x i T ( n ) h j ( n ) = x j T ( n ) h i ( n ) ;
Step B-2, uses a length of 2LhRectangular window function w (n) the reception signal x to i-th mikeiN () is carried out at windowing Reason, the n-th frame reception signal obtaining i-th mike is:
x i ( n ) 2 L h × 1 = [ x i ( nL h - L h ) , x i ( nL h - L h + 1 ) , ... , x i ( nL h + L h - 1 ) ] T ;
In formula, LhRepresent channel impulse response hiN the length of (), for integer;N=1 ..., int (N/Lh-1);N is that mike connects Receive the sequence total length of signal, for integer;int(N/Lh-1) it is to N/Lh-1 round downwards after the integer that obtains;
Step B-3, utilizes Fourier transformation that the time-domain signal obtained in step B-2 is transformed to frequency-region signal:
X i ( n ) 2 L h × 1 = F 2 L h × 2 L h x i ( n ) 2 L h × 1 ;
In formula,For 2Lh×2LhDimension Fourier transform matrix;
Step B-4, introducing penalty, to spectrum energy correction, estimates channel frequency domain response adaptively
As channel additive noise viN, in the presence of (), error of frequency domain function is defined as
e ‾ i j ( n ) = W L h × 2 L h 01 D i ( n ) W 2 L h × L h 10 h ^ ‾ j ( n ) - W L h × 2 L h 01 D j ( n ) W 2 L h × L h 10 h ^ ‾ i ( n )
In formula
D i = d i a g ( F 2 L h × 2 L h ( x i ( n ) 2 L h × 1 ) ) ;
W L h × 2 L h 01 = F L h × L h 0 L h × L h I L h × L h F 2 L h × 2 L h - 1 ;
W 2 L h × L h 10 = F 2 L h × 2 L h I L h × L h 0 L h × L h T F L h × L h - 1 ;
In formula, i ≠ j;I, j=1,2 ..., M,It is respectively Lh×LhDimension Fourier transform matrix and Fourier transformation are inverse Matrix;Diag represents diagonal matrix;Represent Lh×LhDimension unit matrix;Represent Lh×LhDimension null matrix;
Utilize penalty JpN () passes through Lagrange multiplier β (n) the cost function J to NMCFLMSfN () is modified, obtain The cost function revised is
Jmod(n)=Jf(n)+β(n){-Jp(n)}
In formula
β ( n ) = | ▿ J p H ( n ) ▿ J f ( n ) | | | ▿ J p ( n ) | | 2
In formula, the cost function gradient revised when the value of Lagrange multiplier β (n) is by stable state is equal to zero, i.e. Jf(n)=β (n) ▽JpN obtaining time (), H represents conjugate transpose;
By the cost function J revisedmodN (), obtains channel frequency domain responseMore new formula be
h ^ ‾ i 10 ( n + 1 ) = h ^ ‾ i 10 ( n ) - ▿ J f 01 ( n ) + β ^ ( n ) ▿ J p 10 ( n )
Wherein,
▿ J p 01 ( n ) = F 2 L h × 2 L h 0 L h × L h I L h × L h T K ( n ) F 2 L h × 2 L h 0 L h × L h I L h × L h T h ^ ‾ ( n )
β ^ ( n ) = | ▿ J p 10 ( n ) H ▿ J f 01 ( n ) | | | ▿ J p 10 ( n ) | | 2
▿ J f 01 ( n ) = μ ( P i ( n ) + δI 2 L h × 2 L h ) - 1 × Σ k = 1 M D k * ( n ) e ‾ k i 01 ( n )
In above formula,
P i ( n ) = λP i ( n - 1 ) + ( 1 - λ ) × Σ k = 1 , k ≠ i M D k * ( n ) D k ( n )
e ‾ k i 01 ( n ) = F 2 L h × 2 L h 0 L h × L h I L h × L h T F L h × L h - 1 e ‾ k i ( n )
In formula, K (n) is that diagonal entry isDiagonal matrix;PiN () is the spectrum energy of multi-channel output signal, Forgetting factorParameter δ is normal number;
Step B-5, to channel frequency domain responseCarrying out Fourier inversion, obtaining channel impulse response estimation is
h ^ i ( n ) = I L h × L h 0 L h × L h F 2 L h × 2 L h - 1 h ^ ‾ i 10 ( n )
In formula,For 2Lh×2LhDimension Fourier transformation inverse matrix;
Step B-6, in adaptive process, the channel impulse response of estimationThe peak value correspondence time delay of middle appearance is i-th The reception direct sound wave signal time delay of mike, then sound-source signal arrives the delay inequality between i-th mike and jth mike For
τ ^ i j = ( m a x l = 1 L h | h ^ i l ( n ) | - m a x l = 1 L h | h ^ j l ( n ) | ) / f s
In formula, fsFor the sample frequency of signal, max represents and takes maximum;
Step C, the delay inequality described in step B being multiplied with the velocity of sound obtains sound-source signal and arrives the range difference of each mike, and root According to the position relationship of each mike, determine the position of described sound source.
Sound source localization method of three-dimensional space the most according to claim 1, it is characterised in that described JpN () is for constraint under the conditions of Penalty, constraints is:
max J P ( n ) = Σ i = 1 M Σ j = 0 L - 1 ln ( | h ‾ ^ i j ( n ) | 2 ) s . t . Σ i = 1 M Σ j = 0 L h - 1 | h ‾ ^ i j ( n ) | 2 = 1 ML h
In formula, in penalty JpWhen () maximizes n, constraintsSet up;S.t. constraint bar is represented Part.
Sound source localization method of three-dimensional space the most according to claim 1 and 2, it is characterised in that channel in described step B-2 Length L of impulse responsehRestrictive condition is:
L h ≤ 2 f s m a x i , j = 1 M { τ i j , m a x }
In formula, τij,max=dij/ c is that between i-th and jth mike, maximum delay is poor, dijFor i-th and jth mike Between distance, c is the spread speed of sound source.
Sound source localization method of three-dimensional space the most according to claim 1, it is characterised in that two L-type battle arrays in described step A It is as follows that row lay rule: first mike mic1 is placed in X-axis, and its coordinate is (d, 0,0);Second mike mic2 and seat Mark initial point is overlapping, and its coordinate is (0,0,0);3rd mike mic3 is placed in Y-axis, and its coordinate is (0, d, 0), and d is two phases The spacing of adjacent mike, for real number, first, second and third mike constitutes first L-type microphone array;Meanwhile, with x =D/2 is axis of symmetry, sets up second the L-type microphone array relative with first L-type microphone array, wherein, the 4th The coordinate of mike mic4 is (D, 0,0), and the coordinate of the 5th mike mic5 is (D-d, 0,0), the 6th mike mic6 Coordinate be (D, d, 0);D is the distance between two relative L-type microphone arrays.
5. according to the sound source localization method of three-dimensional space described in claim 1 or 4, it is characterised in that in described step C, determine The step of sound source position is as follows:
Step C-1, position is that (x, y, sound-source signal z) arrives i-th mike and the delay inequality of jth mikeWith I mike and jth microphone space are from dijRelation beRelation in detail has following two groups:
First group:With
x 2 + y 2 + z 2 = 1 2 c τ ^ 23 ( c 2 τ ^ 23 2 + 2 d y - d 2 ) ;
Second group:With
( x - D ) 2 + y 2 + z 2 = - 1 2 c τ ^ 46 ( c 2 τ ^ 46 2 + 2 D x + D 2 - 2 d y - d 2 ) ;
In formula,For the delay inequality between sound-source signal to second mike and first mike;Arrive for sound-source signal Second mike and the delay inequality of the 3rd mike,For sound-source signal to the 4th mike and the 5th mike Delay inequality,For sound-source signal to the 4th mike and the delay inequality of the 6th mike;
Step C-2, according to first group described in step C-1 and second group of relation, obtaining sound source projection coordinate in xoy plane is
x ^ = b 2 - b 1 a 1 - a 2
y ^ = a 1 b 2 - a 2 b 1 a 1 - a 2
z ^ = 0
In formula, b 2 = ( ( b 2 τ ^ 45 τ ^ 46 + d 2 ) ( τ ^ 45 - τ ^ 46 ) + 2 dD τ ^ 45 ) / ( 2 d τ ^ 45 ) ;
Step C-3, substitutes into first described in step C-1 respectively by the projection coordinate in xoy plane of the sound source described in step C-2 Group, with four relational expressions of second group, obtains four z-axis coordinate estimated values of sound sourceTake these four z coordinate Meansigma methods obtains the z coordinate of sound source and estimatesFor:
z ^ = z ^ 1 + z ^ 2 + z ^ 3 + z ^ 4 4 .
CN201410202062.0A 2014-05-13 2014-05-13 A kind of sound source localization method of three-dimensional space Expired - Fee Related CN103995252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410202062.0A CN103995252B (en) 2014-05-13 2014-05-13 A kind of sound source localization method of three-dimensional space

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410202062.0A CN103995252B (en) 2014-05-13 2014-05-13 A kind of sound source localization method of three-dimensional space

Publications (2)

Publication Number Publication Date
CN103995252A CN103995252A (en) 2014-08-20
CN103995252B true CN103995252B (en) 2016-08-24

Family

ID=51309468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410202062.0A Expired - Fee Related CN103995252B (en) 2014-05-13 2014-05-13 A kind of sound source localization method of three-dimensional space

Country Status (1)

Country Link
CN (1) CN103995252B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104076331B (en) * 2014-06-18 2016-04-13 南京信息工程大学 A kind of sound localization method of seven yuan of microphone arrays
CN104535965A (en) * 2014-12-29 2015-04-22 江苏科技大学 Parallelized sound source positioning system based on embedded GPU system and method
CN104977564B (en) * 2015-07-09 2018-05-25 百度在线网络技术(北京)有限公司 The microphone array of home-use intelligent robot based on artificial intelligence
CN106950542A (en) * 2016-01-06 2017-07-14 中兴通讯股份有限公司 The localization method of sound source, apparatus and system
CN105681972B (en) * 2016-01-14 2018-05-01 南京信息工程大学 The constant Beamforming Method of sane frequency that linear constraint minimal variance diagonally loads
CN107340498A (en) * 2016-05-03 2017-11-10 深圳光启合众科技有限公司 The determination method and apparatus of robot and sound source position
CN106842111B (en) * 2016-12-28 2019-03-29 西北工业大学 Indoor sound localization method based on microphone mirror image
CN106886010B (en) * 2017-01-17 2019-07-30 南京航空航天大学 A kind of sound bearing recognition methods based on mini microphone array
CN107144820A (en) * 2017-06-21 2017-09-08 歌尔股份有限公司 Sound localization method and device
US10491995B1 (en) 2018-10-11 2019-11-26 Cisco Technology, Inc. Directional audio pickup in collaboration endpoints
CN110068797B (en) * 2019-04-23 2021-02-02 浙江大华技术股份有限公司 Method for calibrating microphone array, sound source positioning method and related equipment
CN110161454B (en) * 2019-06-14 2020-11-13 哈尔滨工业大学 Signal frequency and two-dimensional DOA joint estimation method based on double L-shaped arrays
US11076251B2 (en) 2019-11-01 2021-07-27 Cisco Technology, Inc. Audio signal processing based on microphone arrangement
CN111157951B (en) * 2020-01-13 2022-02-25 东北大学秦皇岛分校 Three-dimensional sound source positioning method based on differential microphone array
CN111856400B (en) * 2020-07-29 2021-04-09 中北大学 Underwater target sound source positioning method and system
CN116299147B (en) * 2023-03-13 2023-11-28 中国科学院声学研究所 One-dimensional structure internal sound source positioning method based on acoustic coherence technology

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102866385A (en) * 2012-09-10 2013-01-09 上海大学 Multi-sound-source locating method based on spherical microphone array

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002131072A (en) * 2000-10-27 2002-05-09 Yamaha Motor Co Ltd Position guide system, position guide simulation system, navigation system and position guide method
JP5435716B2 (en) * 2009-09-14 2014-03-05 国立大学法人 東京大学 Sound source direction detecting device and sound source direction detecting method
CN102033223B (en) * 2010-12-29 2012-10-03 北京信息科技大学 Method for positioning sound source by using microphone array
CN102707262A (en) * 2012-06-20 2012-10-03 太仓博天网络科技有限公司 Sound localization system based on microphone array
CN103076593B (en) * 2012-12-28 2014-09-10 中国科学院声学研究所 Sound source localization method and device

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102866385A (en) * 2012-09-10 2013-01-09 上海大学 Multi-sound-source locating method based on spherical microphone array

Also Published As

Publication number Publication date
CN103995252A (en) 2014-08-20

Similar Documents

Publication Publication Date Title
CN103995252B (en) A kind of sound source localization method of three-dimensional space
CN104142492B (en) A kind of SRP PHAT multi-source space-location methods
CN103439688B (en) Sound source positioning system and method used for distributed microphone arrays
CN109490822B (en) Voice DOA estimation method based on ResNet
CN105388459B (en) The robust sound source space-location method of distributed microphone array network
CN107167770B (en) A kind of microphone array sound source locating device under the conditions of reverberation
CN105204001A (en) Sound source positioning method and system
CN105068048A (en) Distributed microphone array sound source positioning method based on space sparsity
CN109669160B (en) Method for detecting underwater transient acoustic signal
Niwa et al. Post-filter design for speech enhancement in various noisy environments
CN103792513B (en) A kind of thunder navigation system and method
CN103760520B (en) A kind of single language person sound source DOA method of estimation based on AVS and rarefaction representation
CN109541548A (en) A kind of air sonar localization method based on Matched Field
CN106093920B (en) It is a kind of based on the adaptive beam-forming algorithm diagonally loaded
Huang et al. Time delay estimation and source localization
Niwa et al. PSD estimation in beamspace using property of M-matrix
CN109254265A (en) A kind of whistle vehicle positioning method based on microphone array
Svaizer et al. Environment aware estimation of the orientation of acoustic sources using a line array
Zohourian et al. Multi-channel speaker localization and separation using a model-based GSC and an inertial measurement unit
CN101645701B (en) Time delay estimation method based on filter bank and system thereof
Li et al. Robust acoustic source localization with TDOA based RANSAC algorithm
Sugiyama et al. Simultaneous calibration of positions, orientations, and time offsets, among multiple microphone arrays
Hu et al. Acoustic SLAM With Moving Sound Event Based on Auxiliary Microphone Arrays
CN114355292B (en) Wireless earphone and microphone positioning method thereof
CN111157949A (en) Voice recognition and sound source positioning method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160824

Termination date: 20190513