CN105609113A - Bispectrum weighted spatial correlation matrix-based speech sound source localization method - Google Patents
- Publication number
- CN105609113A CN105609113A CN201510937548.3A CN201510937548A CN105609113A CN 105609113 A CN105609113 A CN 105609113A CN 201510937548 A CN201510937548 A CN 201510937548A CN 105609113 A CN105609113 A CN 105609113A
- Authority
- CN
- China
- Prior art keywords
- spectrum
- bispectrum
- signal
- microphone
- road
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 48
- 230000004807 localization Effects 0.000 title claims abstract description 27
- 239000011159 matrix material Substances 0.000 title claims abstract description 14
- 238000001228 spectrum Methods 0.000 claims description 123
- 238000009499 grossing Methods 0.000 claims description 17
- 230000003595 spectral effect Effects 0.000 claims description 12
- 238000012805 post-processing Methods 0.000 claims description 7
- 239000000203 mixture Substances 0.000 claims description 2
- 238000009432 framing Methods 0.000 abstract 1
- 230000006870 function Effects 0.000 description 22
- 238000004364 calculation method Methods 0.000 description 10
- 238000013461 design Methods 0.000 description 3
- 230000004044 response Effects 0.000 description 3
- 230000008901 benefit Effects 0.000 description 2
- 238000000354 decomposition reaction Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000004458 analytical method Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000000205 computational method Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000011109 contamination Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000005562 fading Methods 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
- 238000012421 spiking Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S3/00—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
- G01S3/80—Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
- G01S3/802—Systems for determining direction or deviation from predetermined direction
- G01S3/808—Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems
- G01S3/8083—Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems determining direction of source
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
- G01S5/20—Position of source determined by a plurality of spaced direction-finders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/0308—Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Quality & Reliability (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention relates to a bispectrum weighted spatial correlation matrix-based speech sound source localization method. The objective of the invention is to solve the problem of robust, microphone array-based sound source localization in real, complex noise environments. The method exploits the special mathematical properties, in the bispectrum domain, of the speech and noise signals received by a microphone array. The method includes the following steps: framing and bispectrum estimation are performed on the signals acquired by the microphone array; in the bispectrum domain, the bispectrum phase difference between each microphone and a reference microphone is calculated; per-cell bispectrum weights are estimated from the reference microphone signal; a bispectrum weighted spatial correlation matrix corresponding to a candidate direction is calculated from the bispectrum phase differences and the cell weights; a sound source direction cost function for the current candidate direction is calculated from the eigenvalues of the bispectrum weighted spatial correlation matrix; and the direction of the speech sound source is estimated as the direction corresponding to the maximum of the cost function.
Description
Technical field
The present invention relates to the design of noise-robust speech sound source localization methods based on microphone arrays, and more particularly to a speech sound source localization method based on a bispectrum-weighted spatial correlation matrix.
Background technology
Microphone-array-based speech sound source localization has been widely studied in recent years. From the sound signals collected by microphones at different spatial positions, the source direction can ultimately be determined from the time-difference information of the speech signal together with the geometry of the microphone array. The accuracy of the time-difference estimates has a decisive effect on the performance of a localization algorithm. Noise is the main factor limiting the practical use of speech localization methods; noise in real environments comprises both non-directional diffuse noise and directional interference sources.
Existing microphone-array localization algorithms share the same framework: a set of candidate source directions is defined in advance, a "cost function" score is computed for each candidate direction, and the direction with the highest score is taken as the final estimate. When the non-directional diffuse noise is strong, the time-difference information between the microphone signals is swamped by the noise, and the spatial discrimination of the cost function degrades. When directional interference is present, the cost function tends to peak in the direction of maximum signal energy and cannot reliably distinguish speech from the interference.
Traditional localization methods fall into three main classes: methods based on high-resolution spatial spectrum estimation, methods based on steered response power, and methods based on time-delay estimation.
High-resolution spatial-spectrum methods originated in narrowband incidence-angle estimation for military, communications, sonar and radar applications in the last century. These methods perform a subspace decomposition of the array spatial correlation matrix and exploit the orthogonality between the signal subspace and the noise subspace to construct a cost function that, in theory, has an infinitely sharp peak at the true source direction. Because they are straightforward generalizations of narrowband techniques, and most of the algorithms were not designed specifically for speech sources, the intrinsic properties of speech (harmonic structure, non-stationarity, etc.) are not incorporated into their design. Since the spectral distribution of speech differs from both narrowband signals and wideband stationary signals, many of these algorithms do not transfer well to speech source localization.
Steered-response-power methods first design a beamformer that enhances the target signal from a given direction while suppressing signals from other directions, then sweep the beam over all candidate directions; the signal energy after enhancement in a candidate direction serves as the cost function score for that direction, and the direction with the peak output energy is taken as the current source direction estimate. An important assumption of these methods is that the beamformer output is maximal in the true speech direction. In real environments, particularly when a directional interference source is present, this assumption does not hold; improving performance under noise is the main problem facing this class of algorithms.
Time-delay-estimation methods have two main steps: the delays between channels are first estimated from the observed multichannel signals, and the source direction is then computed from the delay estimates and the array geometry. Compared with high-resolution spatial-spectrum and steered-response-power methods, these methods have lower computational complexity and are simple to implement, so they have received wide attention. At low sampling rates, however, only integer-sample delays between signals can be estimated, so delay-based methods cannot reach high angular resolution. Noise remains one of their main challenges; in particular, when directional noise is present, the delay estimate for the target signal is easily perturbed by the delay of the directional noise.
Summary of the invention
To address the shortcomings of the prior art, the object of the invention is to improve speech source localization performance in environments containing both non-directional and directional noise. To this end, the invention provides a speech sound source localization method based on a bispectrum-weighted spatial correlation matrix. The method comprises the following steps:
Step a: collect the multichannel noisy speech signals with a microphone array, divide each channel of noisy speech into frames, and on each frame compute the cross-bispectrum between each channel's time-domain signal and the first (reference) microphone signal;
Step b: in the bispectrum domain, compute the bispectrum phase difference between each microphone channel and the reference microphone;
Step c: in the bispectrum domain, estimate per-cell bispectrum weights from the cross-bispectrum of the reference microphone signal with itself;
Step d: define the set of candidate directions;
Step e: from the bispectrum phase differences and the bispectrum cell weights, construct the bispectrum-weighted spatial correlation matrix for the current candidate direction;
Step f: from the eigenvalues of the bispectrum-weighted spatial correlation matrix, compute the source-direction cost function for the current candidate direction;
Step g: repeat steps e-f until the candidate direction set has been traversed; the direction corresponding to the maximum of the cost function gives the estimate of the speech source direction.
Step a comprises in particular computing, from the digital observations received by each microphone channel, the original cross-bispectrum between each channel's time-domain signal and the reference microphone signal, using either the direct or the indirect bispectrum estimation technique.
Step a further comprises post-processing the original cross-bispectrum of claim 2 between each channel's time-domain signal and the reference microphone signal with a temporal smoothing strategy:

B̄_m1^(t)(Ω1,Ω2) = α·B̄_m1^(t-1)(Ω1,Ω2) + (1-α)·B_m1^(t)(Ω1,Ω2)

where B̄_m1^(t)(Ω1,Ω2) is the smoothed cross-bispectrum between the m-th channel and the reference channel at frame t, B_m1^(t)(Ω1,Ω2) is the original cross-bispectrum between the m-th channel and the reference channel at frame t, and 0 ≤ α < 1 is a smoothing factor.
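As a concrete illustration, the recursive smoothing above can be sketched in a few lines of NumPy; the function name and the toy grid dimensions are illustrative, not taken from the patent.

```python
import numpy as np

def smooth_bispectrum(prev_smoothed, raw, alpha=0.8):
    """First-order recursive smoothing across frames:
    B_bar(t) = alpha * B_bar(t-1) + (1 - alpha) * B(t)."""
    return alpha * prev_smoothed + (1.0 - alpha) * raw

# Toy check: with a constant per-frame cross-bispectrum, the smoothed
# estimate converges to that constant value.
raw = np.full((4, 4), 2.0 + 1.0j)      # pretend 4x4 grid of (Omega1, Omega2) cells
smoothed = np.zeros((4, 4), dtype=complex)
for _ in range(200):                   # 200 frames
    smoothed = smooth_bispectrum(smoothed, raw, alpha=0.8)
```

Larger α averages over more frames (lower variance, slower tracking of the non-stationary speech), which is the trade-off the smoothing factor controls.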
Step b comprises computing the bispectrum phase difference between each microphone channel and the reference microphone signal with the following equation:

I_m1(Ω1,Ω2) = B̄_m1(Ω1,Ω2)·[B̄_11(Ω1,Ω2)]* / ( |B̄_m1(Ω1,Ω2)|·|B̄_11(Ω1,Ω2)| )

where I_m1(Ω1,Ω2) is the (unit-modulus, complex) bispectrum phase difference between the m-th channel and the reference channel, [·]* is the complex conjugate, and |·| is the modulus.
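Under the reconstruction above, the phase difference is a complex number on the unit circle whose angle carries the phase information. A minimal NumPy sketch (function and variable names are illustrative):

```python
import numpy as np

def bispectrum_phase_difference(B_m1, B_11, eps=1e-12):
    """Normalised product B_m1 * conj(B_11) / (|B_m1| * |B_11|): a complex
    number on the unit circle whose angle is the bispectrum phase difference."""
    num = B_m1 * np.conj(B_11)
    return num / (np.abs(num) + eps)       # eps guards near-empty cells

B_m1 = np.array([1 + 1j, -2j, 3.0 + 0j])   # toy cross-bispectrum values
B_11 = np.array([1 + 0j, 1 + 0j, 1 - 1j])  # toy reference bispectrum values
I = bispectrum_phase_difference(B_m1, B_11)
```

Because only the phase is retained, the result is insensitive to the (direction-independent) magnitudes of the two bispectra.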
In step c, the per-cell bispectrum weights are estimated from the cross-bispectrum of the reference microphone signal with itself. Let B̄_11^(t)(Ω1,Ω2) be the cross-bispectrum of the reference microphone signal with itself at frame t; estimating the bispectrum cell weights then comprises the following steps:
Step c1: from B̄_11^(t)(Ω1,Ω2) and the estimated interference bispectral energy, estimate the bispectral a-priori SNR of the reference signal at frame t;
Step c2: compute the bispectrum cell weight as a function of the bispectral a-priori SNR and B̄_11^(t)(Ω1,Ω2).
The construction in step e of the bispectrum-weighted spatial correlation matrix for the current candidate direction comprises the following steps:
Step e1: from the bispectrum phase differences between each channel and the reference channel, form the complex bispectrum phase-difference vector;
Step e2: from the candidate direction, compute the phase-difference compensation vector;
Step e3: from the phase-difference vector and the compensation vector, compute the compensated phase-difference vector;
Step e4: from the compensated phase-difference vector and the bispectrum cell weights, construct the original bispectrum-weighted spatial correlation matrix for the current candidate direction;
Step e5: post-process the original bispectrum-weighted spatial correlation matrix with a temporal smoothing strategy.
The temporal smoothing of step e5 post-processes the original bispectrum-weighted spatial correlation matrix as:

R^(t)(θ) = β·R^(t-1)(θ) + (1-β)·R̃^(t)(θ)

where R(θ) is the bispectrum-weighted spatial correlation matrix for the current candidate direction θ, R̃(θ) is the original (unsmoothed) matrix, and 0 ≤ β < 1 is a smoothing factor.
The source-direction cost function of step f is computed as follows. Let R(θ) be the bispectrum-weighted spatial correlation matrix for the current candidate direction θ, let the microphone array contain M microphones in total, and let the M complex eigenvalues of R(θ) be ordered by decreasing modulus, |λ1(θ)| ≥ |λ2(θ)| ≥ ... ≥ |λM(θ)|. The source-direction cost function for candidate direction θ is then defined as a function of these M complex eigenvalues.
In step g, the speech source direction is estimated from the direction at which the cost function is maximal. Let all candidate directions form the candidate set Θ, and let J(θ) be the source-direction cost function for candidate direction θ; the speech source direction is then estimated as

θ̂ = argmax_{θ∈Θ} J(θ).
Beneficial effects of the invention. First, traditional localization methods cannot, even in theory, eliminate the influence of non-directional Gaussian noise. The bispectrum is one of the "higher-order statistics" of a signal, and a useful property of higher-order statistics is that those of Gaussian noise are zero; the invention is therefore theoretically robust to Gaussian noise. Second, the bispectrum phase difference carries the speech source direction cue, and it can be shown theoretically that this cue is redundant in the bispectrum domain (for the analysis see Wei Xue, Shan Liang, Wenju Liu, "Weighted Spatial Bispectrum Correlation Matrix for DOA Estimation in the Presence of Interferences," InterSpeech 2014, September 14-18, Singapore, pp. 2228-2232, 2014). Even if the direction cue is contaminated by noise in some bispectrum cells, the same direction information can still be picked up from other cells that are not severely contaminated; this property helps improve performance under directional non-Gaussian noise. Finally, the invention applies bispectrum weighting to extract the direction cues of speech-dominant bispectrum cells while suppressing the negative influence of noise, further improving robustness to noise.
Brief description of the drawings
Further features and advantages of the present invention are described below with reference to the illustrative drawings.
Fig. 1 schematically shows the flow chart of the microphone-array speech source localization method based on the bispectrum-weighted spatial correlation matrix;
Fig. 2 schematically shows the flow chart of the cross-bispectrum estimation unit;
Fig. 3 schematically shows the flow chart of the bispectrum cell weight calculation unit;
Fig. 4 schematically shows the flow chart of the bispectrum-weighted spatial correlation matrix construction;
Fig. 5 schematically shows the flow chart of the candidate-direction cost function calculation.
Detailed description of the invention
It should be understood that the following detailed description of the various examples and drawings is not intended to limit the present invention to particular illustrative embodiments; the embodiments described merely illustrate the steps of the invention, whose scope is defined by the appended claims.
The invention exploits the theoretical robustness of the bispectrum to Gaussian noise, together with the redundancy of the source direction cue in the bispectrum domain, and applies bispectrum weighting to select speech-dominant bispectrum cells, thereby improving speech source localization performance in noisy environments.
Fig. 1 shows the flow chart of the microphone-array speech source localization method based on the bispectrum-weighted spatial correlation matrix.
The system comprises a microphone array of at least two microphones 101. The microphones of the array may have different arrangements; in particular, the microphones 101 may be placed in a row, with a preset distance between adjacent microphones (for example, about 5 centimetres). The array can be installed at a suitable position according to the application environment and technical requirements.
The core computation of the invention comprises five basic units: the cross-bispectrum estimation unit 102, the bispectrum phase-difference calculation unit 103, the bispectrum cell weight calculation unit 104, the bispectrum-weighted spatial correlation matrix construction unit 106, and the candidate-direction cost function calculation unit 107. The method traverses all candidate directions in the candidate set, computes the corresponding cost function for each, and finally locates the cost function maximum 108 to determine the speech source direction.
One. Cross-bispectrum estimation unit
Let the time-domain signal of the m-th channel be x_m(k), which can be decomposed into the sum of a speech component, a directional interference component v_m(k), and a non-directional noise component n_m(k):

x_m(k) = γ_m·s_1(k − τ_m1) + v_m(k) + n_m(k)    (1)

where γ_m is the amplitude attenuation factor from the source to the m-th microphone, and τ_m1 is the time difference between microphone 1 and microphone m, which is directly related to the source direction.
Fig. 2 shows the flow chart of the cross-bispectrum estimation unit, corresponding to step 102 in Fig. 1.
In step 102, the multichannel digital observations are first divided into frames (step 201). Next, on each frame, step 202 computes the original cross-bispectrum between each channel's signal and the reference microphone signal. The original cross-bispectrum can be computed by the "direct method" or the "indirect method"; for the computation see Chrysostomos L. Nikias and Mysore R. Raghuveer, "Bispectrum estimation: A digital signal processing framework," Proceedings of the IEEE, 75(7): 869-891, 1987. Because speech is non-stationary and the frame length is limited, a temporal smoothing post-processing step is needed to reduce the variance of the cross-bispectrum estimate. Let B_m1^(t)(Ω1,Ω2) be the original cross-bispectrum between the m-th channel and the reference channel at frame t; the temporal smoothing of step 203 is

B̄_m1^(t)(Ω1,Ω2) = α1·B̄_m1^(t-1)(Ω1,Ω2) + (1-α1)·B_m1^(t)(Ω1,Ω2)

where B̄_m1^(t)(Ω1,Ω2) is the smoothed cross-bispectrum between the m-th channel and the reference channel at frame t, and 0 ≤ α1 < 1 is a smoothing factor.
The bispectrum is the third-order statistic of a signal, the simplest higher-order statistic. In theory, an excellent property of higher-order statistics is that those of Gaussian noise are zero. Non-directional noise in real environments can usually be modeled as spatially uncorrelated white Gaussian noise, so the invention can in theory eliminate the influence of white Gaussian noise. Assuming mutual independence of the speech signal, the non-directional noise and the directional interference source signal, by formula (1) the smoothed cross-bispectrum can be further expressed as

B̄_m1^(t)(Ω1,Ω2) = B̄_s,m1^(t)(Ω1,Ω2) + B̄_v,m1^(t)(Ω1,Ω2)    (2)

where B̄_s,m1^(t)(Ω1,Ω2) and B̄_v,m1^(t)(Ω1,Ω2) represent the speech and directional-interference components of the cross-bispectrum, respectively.
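A minimal sketch of a direct-method cross-bispectrum for one frame, in the spirit of the Nikias & Raghuveer reference cited above: form the frame DFTs and multiply values at the frequency pair (i, j) with the conjugate at i + j. The placement of the two channels in the triple product is an assumption here, not a formula taken from the patent.

```python
import numpy as np

def cross_bispectrum_direct(x_m, x_1, nfft=64):
    """Direct-method cross-bispectrum of one frame over the first
    nfft//2 x nfft//2 grid of frequency pairs."""
    X_m = np.fft.fft(x_m, nfft)
    X_1 = np.fft.fft(x_1, nfft)
    half = nfft // 2
    B = np.empty((half, half), dtype=complex)
    for i in range(half):
        for j in range(half):
            B[i, j] = X_m[i] * X_1[j] * np.conj(X_1[(i + j) % nfft])
    return B

rng = np.random.default_rng(0)
frame = rng.standard_normal(64)
B_auto = cross_bispectrum_direct(frame, frame)  # auto-bispectrum of one channel
```

For the auto-bispectrum the result is symmetric in its two frequency arguments, which is a quick sanity check on the indexing.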
Two. Bispectrum phase-difference calculation unit
In step 103 of Fig. 1, the bispectrum phase difference between each microphone channel and the reference microphone signal is computed with the following equation:

I_m1(Ω1,Ω2) = B̄_m1(Ω1,Ω2)·[B̄_11(Ω1,Ω2)]* / ( |B̄_m1(Ω1,Ω2)|·|B̄_11(Ω1,Ω2)| )    (3)

where I_m1(Ω1,Ω2) is the (unit-modulus, complex) bispectrum phase difference between the m-th channel and the reference channel, [·]* is the complex conjugate, and |·| is the modulus.
Substituting the expressions for B̄_s,m1^(t)(Ω1,Ω2) and B̄_v,m1^(t)(Ω1,Ω2) from formula (2), I_m1(Ω1,Ω2) can be further expressed as

I_m1(Ω1,Ω2) = k_m(Ω1,Ω2)·e^{-jΩ1·τ_m1}

where the factor e^{-jΩ1·τ_m1} depends on τ_m1, and hence on the source direction; it is therefore called the source direction cue. The factor k_m(Ω1,Ω2) depends on the bispectrum of the non-Gaussian interference source signal. Clearly, in a bispectrum cell containing pure speech there is no interference and k_m(Ω1,Ω2) = 1, so the bispectrum phase difference equals the speech source direction cue.
An important property follows: although the bispectrum phase difference is defined over bispectrum cells indexed by (Ω1,Ω2), the actual speech source direction cue is not a function of Ω2. In other words, as long as two different bispectrum cells share the same Ω1, they contain the same source direction cue in their phase differences regardless of the value of Ω2; the cue is therefore redundant in the bispectrum domain. Because speech and interference noise have different bispectral distributions, the cue may be severely contaminated by noise in some cells, yet the same cue can be rediscovered in uncontaminated cells with the same value of Ω1. This property helps improve the performance of the method in non-Gaussian noise environments.
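The redundancy property can be checked numerically in an idealized, noise-free setting: if channel m is a circular delay of channel 1, the normalised cross-bispectrum phase is the same in every cell sharing the same Ω1, whatever Ω2 is. A sketch (all names and the circular-delay idealization are illustrative):

```python
import numpy as np

n, tau = 64, 3
rng = np.random.default_rng(1)
s = rng.standard_normal(n)
x1 = s
xm = np.roll(s, tau)                 # channel m: circular delay of channel 1
X1, Xm = np.fft.fft(x1), np.fft.fft(xm)

half = n // 2
I = np.empty((half, half), dtype=complex)
for i in range(half):
    for j in range(half):
        B_m1 = Xm[i] * X1[j] * np.conj(X1[(i + j) % n])
        B_11 = X1[i] * X1[j] * np.conj(X1[(i + j) % n])
        r = B_m1 * np.conj(B_11)
        I[i, j] = r / np.abs(r)      # unit-modulus phase difference
```

Every column of I is identical: the cue e^{-j·2π·i·τ/n} depends only on the first frequency index, matching the redundancy claim above.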
Three. Bispectrum cell weight calculation unit
Fig. 3 shows the flow chart of the bispectrum cell weight calculation unit, corresponding to step 104 in Fig. 1.
In step 104, let the cross-bispectrum of the reference microphone signal with itself at frame t be B̄_11^(t)(Ω1,Ω2). Estimating the bispectrum cell weights then comprises the following sub-steps:
Step 401: from B̄_11^(t)(Ω1,Ω2) and the estimated interference bispectral energy, estimate the bispectral a-priori SNR of the reference signal at frame t;
Step 402: compute the bispectrum cell weight as a function of the bispectral a-priori SNR and B̄_11^(t)(Ω1,Ω2).
Step 401 adopts a decision-directed bispectral a-priori SNR estimator. The bispectral a-priori SNR is defined as the ratio of the speech bispectral energy to the noise bispectral energy in a bispectrum cell:

ξ^(t)(Ω1,Ω2) = E_s^(t)(Ω1,Ω2) / Ê_v(Ω1,Ω2)

where Ê_v(Ω1,Ω2) denotes the estimated interference bispectral energy. Since the speech bispectral energy E_s^(t)(Ω1,Ω2) is unknown, the a-priori SNR must itself be estimated. To this end, the bispectral a-posteriori SNR is also defined, as the ratio of the total cell bispectral energy to the noise bispectral energy:

γ^(t)(Ω1,Ω2) = |B̄_11^(t)(Ω1,Ω2)|² / Ê_v(Ω1,Ω2).
By analogy with the time-frequency-domain a-priori SNR computation, ξ^(t)(Ω1,Ω2) is estimated as

ξ^(t)(Ω1,Ω2) = α2·( Ê_s^(t-1)(Ω1,Ω2) / Ê_v(Ω1,Ω2) ) + (1-α2)·P[ γ^(t)(Ω1,Ω2) − 1 ]

where P[·] denotes half-wave rectification, which guarantees the non-negativity of the ξ^(t)(Ω1,Ω2) estimate, and 0.92 ≤ α2 ≤ 0.98 is a smoothing factor.
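The decision-directed update can be sketched for a single bispectrum cell; the function and variable names are illustrative:

```python
def decision_directed_snr(prev_speech_energy, noise_energy, posterior_snr,
                          alpha2=0.95):
    """Decision-directed a-priori SNR: convex mix of the previous frame's
    speech/noise energy ratio and the half-wave-rectified (posterior - 1)."""
    rectified = max(posterior_snr - 1.0, 0.0)   # P[.] half-wave rectifier
    return alpha2 * (prev_speech_energy / noise_energy) \
        + (1.0 - alpha2) * rectified

# With previous speech/noise ratio 2.0 and posterior SNR 3.0:
xi = decision_directed_snr(4.0, 2.0, 3.0, alpha2=0.9)   # 0.9*2 + 0.1*2 = 2.0
```

The rectifier is what keeps the estimate non-negative even when the posterior SNR drops below one in noise-only cells.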
In step 402, the bispectrum cell weight is computed as a function of ξ^(t)(Ω1,Ω2) and B̄_11^(t)(Ω1,Ω2). First, the clean-speech bispectral energy of the current frame, Ê_s^(t)(Ω1,Ω2), is computed from ξ^(t)(Ω1,Ω2) and |B̄_11^(t)(Ω1,Ω2)|². Next, the speech-dominant bispectrum cells are selected according to Ê_s^(t)(Ω1,Ω2), keeping the cells whose estimated speech energy exceeds a threshold controlled by the threshold parameter η. Finally, the cell weight w(Ω1,Ω2) is computed from Ê_s^(t)(Ω1,Ω2) over the selected cells and set to zero elsewhere.
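One way to realise step 402 is sketched below, using the Wiener-type gain ξ/(1+ξ) for the speech-energy estimate and normalising the kept weights to sum to one; both of these specific choices are assumptions consistent with, but not stated by, the text.

```python
import numpy as np

def bispectrum_cell_weights(xi, ref_energy, eta=0.2):
    """Estimate per-cell clean-speech bispectral energy, keep cells above
    eta * max (treated as speech-dominant), normalise to unit sum."""
    E_s = (xi / (1.0 + xi)) * ref_energy        # estimated speech energy
    mask = E_s >= eta * E_s.max()               # speech-dominant cells
    w = np.where(mask, E_s, 0.0)
    return w / w.sum()

xi = np.array([[9.0, 0.01], [1.0, 3.0]])            # per-cell a-priori SNR
ref_energy = np.array([[10.0, 10.0], [2.0, 4.0]])   # |B_11|^2 per cell
w = bispectrum_cell_weights(xi, ref_energy)
```

Cells with low estimated speech energy (here the low-SNR cell and the weak cell) receive zero weight, so only speech-dominant cells contribute to the correlation matrix.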
Four. Bispectrum-weighted spatial correlation matrix construction
Fig. 4 shows the flow chart of the bispectrum-weighted spatial correlation matrix construction, corresponding to step 106 in Fig. 1.
In step 106, constructing the bispectrum-weighted spatial correlation matrix for the current candidate direction comprises the following sub-steps:
Step 601: from the bispectrum phase differences between each channel and the reference channel, form the complex bispectrum phase-difference vector;
Step 602: from the candidate direction, compute the phase-difference compensation vector;
Step 603: from the phase-difference vector and the compensation vector, compute the compensated phase-difference vector;
Step 604: from the compensated phase-difference vector and the bispectrum cell weights, construct the original bispectrum-weighted spatial correlation matrix for the current candidate direction;
Step 605: post-process the original matrix with a temporal smoothing strategy.
In step 601, assume the microphone array contains M microphones in total. The complex bispectrum phase-difference vector is computed as

I(Ω1,Ω2) = [I_11(Ω1,Ω2), I_21(Ω1,Ω2), ..., I_M1(Ω1,Ω2)]^T.    (13)
In step 602, referring to the expression for the speech source direction cue, the phase-difference compensation vector is defined as a function of the candidate direction θ:

C(θ,Ω1) = [1, e^{jΩ1·τ_21(θ)}, ..., e^{jΩ1·τ_M1(θ)}]^T

where τ_m1(θ) (for m = 2, ..., M) converts the candidate direction θ into the time difference between the m-th microphone and the first microphone.
Further, in step 603, the compensated phase-difference vector is computed from the phase-difference vector of step 601 and the compensation vector of step 602:

I_c(θ,Ω1,Ω2) = I(Ω1,Ω2) ⊙ C(θ,Ω1)    (15)

where the symbol "⊙" denotes element-wise multiplication of vectors.
In step 604, the original bispectrum-weighted spatial correlation matrix R̃(θ) for the current candidate direction is constructed from the cell weights computed in step 104 and the compensated phase-difference vectors computed in step 603:

R̃(θ) = Σ_{(Ω1,Ω2)} w(Ω1,Ω2)·I_c(θ,Ω1,Ω2)·I_c(θ,Ω1,Ω2)^H

where (·)^H denotes the conjugate transpose.
To further overcome the influence of noise, step 605 applies a temporal smoothing strategy to post-process R̃(θ):

R^(t)(θ) = β·R^(t-1)(θ) + (1-β)·R̃^(t)(θ)

where R(θ) is the bispectrum-weighted spatial correlation matrix for the current candidate direction θ, R̃(θ) is the original matrix, and 0 ≤ β < 1 is a smoothing factor.
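Steps 601-604 can be condensed into one accumulation; the outer-product form Σ w·I_c·I_c^H follows the reconstruction above and is treated as an assumption. When the compensation matches the cue in every cell, every compensated vector is all-ones and the accumulated matrix has rank one, which is the property the cost function exploits:

```python
import numpy as np

def weighted_spatial_matrix(I_vecs, C_vec, w):
    """Accumulate sum_k w[k] * (I_k ⊙ C)(I_k ⊙ C)^H over bispectrum cells."""
    M = C_vec.shape[0]
    R = np.zeros((M, M), dtype=complex)
    for weight, I in zip(w, I_vecs):
        Ic = I * C_vec                         # compensated phase-difference vector
        R += weight * np.outer(Ic, Ic.conj())
    return R

M, omega1, tau = 3, 0.3, 2.0
cue = np.exp(-1j * omega1 * tau * np.arange(M))  # per-channel direction cue
I_vecs = [cue, cue, cue]                         # same cue in every cell
C_vec = np.conj(cue)                             # compensation matched to the cue
w = np.array([0.5, 0.3, 0.2])
R = weighted_spatial_matrix(I_vecs, C_vec, w)
```

With matched compensation, R equals the all-ones matrix (up to the weight normalisation) and is exactly rank one.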
Five. Candidate-direction cost function calculation
Fig. 5 shows the flowchart of the candidate direction cost function calculation, corresponding to step 107 in Fig. 1.
In step 701, eigenvalue decomposition is applied to R(θ), and its M complex eigenvalues are sorted in descending order of modulus: |λ1(θ)| ≥ |λ2(θ)| ≥ ... ≥ |λM(θ)|.
In step 702, the sound direction cost function J(θ) for the current candidate direction θ is defined as a function of the M complex eigenvalues:
From Wei Xue, Shan Liang, Wenju Liu, "Weighted Spatial Bispectrum Correlation Matrix for DOA Estimation in the Presence of Interferences," InterSpeech 2014, September 14-18, Singapore, pp. 2228-2232, 2014, it is known that when the candidate source direction equals the true source direction, the rank of R(θ) is 1, so that |λ2(θ)| = ... = |λM(θ)| = 0; therefore, J(θ) is in theory positive infinity.
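Steps 701-702 can be sketched as follows. The ratio of the dominant eigenvalue modulus to the sum of the remaining moduli is only one plausible choice of a cost that diverges as R(θ) approaches rank 1; the patent itself only requires J(θ) to be some function of the M complex eigenvalues.

```python
import numpy as np

def direction_cost(R, eps=1e-12):
    """Steps 701-702: eigen-decompose R(theta) and score the candidate.

    Returns a cost that grows without bound as |l2|, ..., |lM| -> 0,
    i.e. as R(theta) approaches rank 1 at the true source direction.
    """
    # Step 701: complex eigenvalues sorted by decreasing modulus.
    lam = np.linalg.eigvals(R)
    mags = np.sort(np.abs(lam))[::-1]  # |l1| >= |l2| >= ... >= |lM|
    # Step 702 (assumed ratio form): dominant over residual eigenvalues.
    return mags[0] / (mags[1:].sum() + eps)
```

For a rank-1 matrix the residual moduli vanish and the cost blows up, matching the "positive infinity in theory" behavior noted above.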
Six, cost function maximum position determination
In step 108 of Fig. 1, the speech source direction is estimated as the candidate direction that maximizes J(θ):
where Θ is the set formed by all candidate directions.
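Step 108 reduces to a grid search over Θ; a minimal sketch, assuming the cost J is available as a callable:

```python
import numpy as np

def estimate_direction(J, candidates):
    """Step 108: pick the candidate direction maximizing the cost J(theta)."""
    costs = [J(theta) for theta in candidates]
    return candidates[int(np.argmax(costs))]

# Usage: a 1-degree grid over the half-plane (grid resolution is a design choice).
# theta_hat = estimate_direction(J, np.arange(0.0, 180.0, 1.0))
```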
From this description, further modifications and variations of the present invention will be apparent to those skilled in the art. Accordingly, this description is to be regarded as illustrative, and its purpose is to teach those skilled in the art the general manner of carrying out the present invention. It should be understood that the forms of the invention shown and described herein are to be taken as the presently preferred embodiments.
Claims (9)
1. A speech sound source localization method based on a bispectrum weighted spatial correlation matrix, comprising the following steps:
Step a: collecting multi-channel noisy speech signals with a microphone array, dividing each channel's noisy speech signal into frames, and computing on each frame the cross-bispectrum between each channel's time-domain signal and the first microphone signal;
Step b: computing, in the bispectrum domain, the bispectrum phase difference between each microphone channel and the first microphone channel;
Step c: estimating, in the bispectrum domain, the bispectrum unit weights using the cross-bispectrum of the first microphone signal with itself;
Step d: defining the candidate direction set;
Step e: constructing the bispectrum weighted spatial correlation matrix for the current candidate direction from the bispectrum phase differences and the bispectrum unit weights;
Step f: computing the sound direction cost function for the current candidate direction from the eigenvalues of the bispectrum weighted spatial correlation matrix;
Step g: repeating steps e to f until all candidate directions in the candidate direction set have been traversed, and obtaining the speech source direction estimate as the candidate direction corresponding to the maximum of the sound direction cost function.
2. The sound source localization method of claim 1, wherein step a comprises computing the original cross-bispectrum between each channel's time-domain signal and the first microphone signal from the digital observation signals received by each microphone channel, using either the direct or the indirect bispectrum estimation technique.
3. The sound source localization method of claim 2, wherein step a comprises post-processing the original cross-bispectra between each channel's time-domain signal and the first microphone signal of claim 2 with a time-smoothing strategy:
where the smoothed quantity is the cross-bispectrum between the m-th microphone signal and the first microphone signal at frame t, obtained from the corresponding original cross-bispectrum, and 0 ≤ α < 1 is the smoothing factor.
4. The sound source localization method of claim 1, wherein step b comprises computing the bispectrum phase difference between each microphone channel and the first microphone channel using the following two equations:
where Im1(Ω1, Ω2) is the bispectrum phase difference between the m-th microphone and the first microphone signal, [·]* is the complex conjugate operator, and |·| is the modulus operator.
5. The sound source localization method of claim 1, wherein estimating the bispectrum unit weights in step c using the cross-bispectrum of the first microphone signal with itself comprises:
denoting the cross-bispectrum of the first microphone signal with itself at frame t, estimating the bispectrum unit weights then comprises the following steps:
Step c1: estimating the bispectrum a priori signal-to-noise ratio of the first microphone signal at frame t;
Step c2: computing the bispectrum unit weight as a function of the bispectrum a priori signal-to-noise ratio.
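Claim 5 leaves the weight function unspecified. A common choice in this family of methods, used here purely as an illustrative assumption and not as the patent's formula, is a Wiener-like gain driven by a decision-directed a priori SNR estimate:

```python
import numpy as np

def bispectrum_unit_weight(self_bispec, noise_bispec, xi_prev, a=0.98):
    """Sketch of steps c1-c2 under assumed decision-directed / Wiener forms.

    self_bispec  : |cross-bispectrum of channel 1 with itself| at frame t.
    noise_bispec : noise bispectrum level estimate for the same bin.
    xi_prev      : a priori SNR estimate from the previous frame.
    a            : decision-directed smoothing constant.
    """
    # Step c1 (assumed decision-directed update): a priori SNR estimate.
    gamma = self_bispec / max(noise_bispec, 1e-12)        # a posteriori SNR
    xi = a * xi_prev + (1.0 - a) * max(gamma - 1.0, 0.0)  # a priori SNR
    # Step c2 (assumed Wiener-like gain): weight in [0, 1).
    return xi / (1.0 + xi)
```

Such a weight de-emphasizes low-SNR bispectrum bins when the correlation matrix is accumulated, which is the stated purpose of the unit weights.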
6. The sound source localization method of claim 1, wherein constructing the bispectrum weighted spatial correlation matrix for the current candidate direction in step e comprises the following steps:
Step e1: computing the complex-domain bispectrum phase difference vector from the bispectrum phase differences between each microphone channel and the first microphone channel;
Step e2: computing the bispectrum phase difference compensation vector from the current candidate direction;
Step e3: computing the compensated bispectrum phase difference vector from the bispectrum phase difference vector and the bispectrum phase difference compensation vector;
Step e4: constructing the original bispectrum weighted spatial correlation matrix for the current candidate direction from the compensated bispectrum phase difference vector and the bispectrum unit weights;
Step e5: post-processing the original bispectrum weighted spatial correlation matrix with a time-smoothing strategy.
7. The sound source localization method of claim 1 or claim 6, wherein post-processing the original bispectrum weighted spatial correlation matrix with a time-smoothing strategy in step e5 comprises:
where R(θ) is the bispectrum weighted spatial correlation matrix for the current candidate direction θ, obtained by smoothing the original bispectrum weighted spatial correlation matrix, and 0 ≤ β < 1 is the smoothing factor.
8. The sound source localization method of claim 1, wherein computing the sound direction cost function for the current candidate direction in step f comprises:
letting the bispectrum weighted spatial correlation matrix for the current candidate direction θ be R(θ), the microphone array comprise M microphones in total, and the M complex eigenvalues of R(θ) be sorted in descending order of modulus, |λ1(θ)| ≥ |λ2(θ)| ≥ ... ≥ |λM(θ)|; then the sound direction cost function for the current candidate direction θ is defined as a function of the M complex eigenvalues.
9. The sound source localization method of claim 1, wherein obtaining the speech source direction estimate as the direction corresponding to the maximum of the sound direction cost function in step g comprises:
letting all candidate directions form the candidate direction set Θ, and letting the sound direction cost function for the current candidate direction θ be J(θ); then the speech source direction is estimated as the candidate direction maximizing J(θ) over Θ.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510937548.3A CN105609113A (en) | 2015-12-15 | 2015-12-15 | Bispectrum weighted spatial correlation matrix-based speech sound source localization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105609113A true CN105609113A (en) | 2016-05-25 |
Family
ID=55988996
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510937548.3A Pending CN105609113A (en) | 2015-12-15 | 2015-12-15 | Bispectrum weighted spatial correlation matrix-based speech sound source localization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105609113A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106289506A (en) * | 2016-09-06 | 2017-01-04 | 大连理工大学 | A kind of method using POD decomposition method to eliminate flow field wall microphone array noise signal |
CN106526541A (en) * | 2016-10-13 | 2017-03-22 | 杭州电子科技大学 | Sound positioning method based on distribution matrix decision |
CN107219512A (en) * | 2017-03-29 | 2017-09-29 | 北京大学 | A kind of sound localization method based on acoustic transfer function |
CN108198562A (en) * | 2018-02-05 | 2018-06-22 | 中国农业大学 | A kind of method and system for abnormal sound in real-time positioning identification animal house |
CN108398664A (en) * | 2017-02-07 | 2018-08-14 | 中国科学院声学研究所 | A kind of analytic expression space for microphone array solves aliasing method |
CN108540898A (en) * | 2017-03-03 | 2018-09-14 | 松下电器(美国)知识产权公司 | Sound source detection device and method, the recording medium for recording sound source locator |
CN109831709A (en) * | 2019-02-15 | 2019-05-31 | 杭州嘉楠耘智信息科技有限公司 | Sound source orientation method and device and computer readable storage medium |
CN110082724A (en) * | 2019-05-31 | 2019-08-02 | 浙江大华技术股份有限公司 | A kind of sound localization method, device and storage medium |
CN110133594A (en) * | 2018-02-09 | 2019-08-16 | 北京搜狗科技发展有限公司 | A kind of sound localization method, device and the device for auditory localization |
CN110580911A (en) * | 2019-09-02 | 2019-12-17 | 青岛科技大学 | beam forming method capable of inhibiting multiple unstable sub-Gaussian interferences |
CN110728988A (en) * | 2019-10-23 | 2020-01-24 | 浪潮金融信息技术有限公司 | Implementation method of voice noise reduction camera for self-service terminal equipment |
CN111352075A (en) * | 2018-12-20 | 2020-06-30 | 中国科学院声学研究所 | Underwater multi-sound-source positioning method and system based on deep learning |
Non-Patent Citations (2)
Title |
---|
WEI XUE等: ""Noise Robust Direction of Arrival Estimation for Speech Source With Weighted Bispectrum Spatial Correlation Matrix"", 《IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING》 * |
WEI XUE等: ""Weighted spatial bispectrum correlation matrix for DOA estimation in the presence of interferences"", 《INTERSPEECH 2014》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105609113A (en) | Bispectrum weighted spatial correlation matrix-based speech sound source localization method | |
US11308974B2 (en) | Target voice detection method and apparatus | |
CN112526451B (en) | Compressed beam forming and system based on microphone array imaging | |
CN108731886B (en) | A kind of more leakage point acoustic fix ranging methods of water supply line based on iteration recursion | |
CN107817465A (en) | The DOA estimation method based on mesh free compressed sensing under super-Gaussian noise background | |
CN109782231B (en) | End-to-end sound source positioning method and system based on multi-task learning | |
CN105068048A (en) | Distributed microphone array sound source positioning method based on space sparsity | |
CN110146846B (en) | Sound source position estimation method, readable storage medium and computer equipment | |
CN107102296A (en) | A kind of sonic location system based on distributed microphone array | |
CN104898086B (en) | Estimate sound source direction method suitable for the sound intensity of mini microphone array | |
CN103854660B (en) | A kind of four Mike's sound enhancement methods based on independent component analysis | |
CN101893698B (en) | Noise source test and analysis method and device | |
CN103559888A (en) | Speech enhancement method based on non-negative low-rank and sparse matrix decomposition principle | |
CN106468770A (en) | Closely optimum radar target detection method under K Distribution Clutter plus noise | |
CN106066468A (en) | A kind of based on acoustic pressure, the vector array port/starboard discrimination method of vibration velocity Mutual spectrum | |
CN108957403B (en) | Gaussian fitting envelope time delay estimation method and system based on generalized cross correlation | |
CN111948598B (en) | Method and device for detecting space domain interference signal | |
CN109407046A (en) | A kind of nested array direction of arrival angle estimation method based on variational Bayesian | |
CN104360305A (en) | Radiation source direction finding positioning method of uniting compressed sensing and signal cycle stationary characteristics | |
CN102915735B (en) | Noise-containing speech signal reconstruction method and noise-containing speech signal device based on compressed sensing | |
Wang | Multi-band multi-centroid clustering based permutation alignment for frequency-domain blind speech separation | |
CN106226729A (en) | Relatively prime array direction of arrival angular estimation method based on fourth-order cumulant | |
CN104665875A (en) | Ultrasonic Doppler envelope and heart rate detection method | |
Lemos et al. | Using matrix norms to estimate the direction of arrival of planar waves on an ULA | |
Li et al. | Noise reduction of ship-radiated noise based on noise-assisted bivariate empirical mode decomposition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20160525 |
RJ01 | Rejection of invention patent application after publication |