CN105609113A - Bispectrum weighted spatial correlation matrix-based speech sound source localization method - Google Patents

Info

Publication number
CN105609113A
Authority
CN
China
Prior art keywords
spectrum
bispectrum
signal
microphone
road
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510937548.3A
Other languages
Chinese (zh)
Inventor
刘文举
雪巍
梁山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201510937548.3A priority Critical patent/CN105609113A/en
Publication of CN105609113A publication Critical patent/CN105609113A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802Systems for determining direction or deviation from predetermined direction
    • G01S3/808Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems
    • G01S3/8083Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems determining direction of source
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20Position of source determined by a plurality of spaced direction-finders
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/0308Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a bispectrum weighted spatial correlation matrix-based speech sound source localization method. The objective of the invention is to solve problems in microphone array-based robust sound source localization in an actual complex noise environment. According to the method, the special mathematical properties of speech signals and noise signals received by a microphone array on a bispectrum domain are utilized. The method includes the following steps that: framing and bispectrum estimation are performed on signals acquired by the microphone array; on the bispectrum domain, the bispectrum phase difference of each microphone and a reference microphone is calculated; the signals of the reference microphone are adopted to estimate bispectrum unit weight; a bispectrum weighted spatial correlation matrix corresponding to a candidate direction is calculated according to the bispectrum phase difference and the bispectrum unit weight; a sound source direction cost function for a current candidate direction is calculated based on the eigenvalue of the bispectrum weighted spatial correlation matrix; and the direction of a speech sound source is estimated according to a direction corresponding to the maximum value of the sound source direction cost function.

Description

Speech sound source localization method based on a bispectrum weighted spatial correlation matrix
Technical field
The present invention relates to the design of noise-robust speech sound source localization methods based on microphone arrays, and more particularly to a speech sound source localization method based on a bispectrum weighted spatial correlation matrix.
Background art
Speech sound source localization based on microphone arrays has been studied extensively in recent years. Microphones at different spatial positions capture the sound signal, and the source direction is then determined from the time-difference information of the speech signal together with the geometry of the microphone array. The estimation of the time-difference information is decisive for the performance of a localization algorithm. Noise is the main factor limiting the practical use of speech sound source localization methods. Noise in real environments includes non-directional diffuse noise as well as directional interference-source noise.
Existing microphone-array localization algorithms all share the same framework: a candidate set of source directions is first defined, a "cost function" score is then computed for each candidate direction, and the direction with the highest score is finally taken as the estimated source direction. When the non-directional diffuse noise is strong, the time-difference information between the microphone signals is swamped by it, so the spatial discrimination of the source-direction cost function degrades. When directional interference noise is present, the peak of the cost function tends to point toward the direction of maximum signal energy, and speech cannot be effectively distinguished from the interfering noise.
Traditional sound source localization methods fall into three main classes: methods based on high-resolution spatial spectrum estimation, methods based on steered response power, and methods based on time-delay estimation.
Methods based on high-resolution spatial spectrum estimation originate mainly from the narrowband angle-of-incidence estimation problem that arose last century in military and communications applications such as sonar and radar. These methods perform a subspace decomposition of the array's spatial correlation matrix and exploit the orthogonality of the signal subspace and the noise subspace to construct a specific source-direction cost function that, in theory, has an infinitely sharp peak at the true source direction. Because these methods are straightforward extensions from narrowband signals, and most of the algorithms were not designed specifically for localizing speech sources, they do not incorporate the characteristics of speech itself (such as harmonicity and non-stationarity) into the algorithm design. Since the spectral distribution of speech differs from that of both narrowband signals and wideband stationary signals, many of these algorithms are not well suited to speech sound source localization.
Methods based on steered response power first design a beamformer that enhances the target signal from a specific direction while suppressing non-target signals from other directions. This beam scan is then applied to all candidate source directions, and the energy of the enhanced signal for each candidate direction is used as the cost function score of that candidate. Finally, the direction with the maximum output signal energy is taken as the current source direction estimate. An important assumption of these methods is that the beamformer output is maximal in the direction of the speech source. In real environments, however, and particularly in the presence of directional interference-source noise, this assumption does not hold. Improving performance in noisy environments is the main problem facing this class of algorithms.
Methods based on time-delay estimation have two main steps. First, the time delays between the channels are estimated from the observed multichannel signals; then the source direction is computed from the delay estimates and the geometry of the microphone array. Compared with methods based on high-resolution spatial spectrum estimation and methods based on steered response power, these methods have lower computational complexity and are simpler to implement, and have therefore received wide attention. When the sampling rate of the signal is low, only delays of an integer number of samples can be estimated between signals, so these methods cannot reach a high angular resolution. Noise remains one of their main challenges. In particular, when directional noise is present, the delay information of the target signal is easily disturbed by the delay of the directional noise.
Summary of the invention
To solve the problems of the prior art, the object of the present invention is to improve speech sound source localization performance in environments that contain both non-directional and directional noise. To this end, the invention provides a speech sound source localization method based on a bispectrum weighted spatial correlation matrix. The specific steps of the method are as follows:
Step a: acquire multichannel noisy speech signals with a microphone array, divide each channel's noisy speech signal into frames, and on each frame compute the cross-bispectrum values between each channel's time-domain signal and the first-channel microphone signal;
Step b: in the bispectrum domain, compute the bispectrum phase difference between each microphone channel and the first microphone channel;
Step c: in the bispectrum domain, estimate the bispectrum unit weights from the cross-bispectrum values of the first-channel microphone signal with itself;
Step d: define the set of candidate directions;
Step e: from the bispectrum phase differences and the bispectrum unit weights, construct the bispectrum weighted spatial correlation matrix for the current candidate direction;
Step f: from the eigenvalues of the bispectrum weighted spatial correlation matrix, compute the source-direction cost function for the current candidate direction;
Step g: repeat steps e and f until the set of candidate directions has been traversed. The direction at which the source-direction cost function is maximal gives the estimate of the speech source direction.
In particular, step a comprises using a direct or an indirect bispectrum estimation method to compute, from the digital observation signals received by each microphone channel, the raw cross-bispectrum values between each channel's time-domain signal and the first-channel microphone signal.
Step a further comprises applying a temporal smoothing strategy to post-process the raw cross-bispectrum values of claim 2 between each channel's time-domain signal and the first-channel microphone signal:
$$B_{x,m}^{(t)}(\Omega_1,\Omega_2) = \alpha B_{x,m}^{(t)}(\Omega_1,\Omega_2) + (1-\alpha)\,\hat{B}_{x,m}^{(t)}(\Omega_1,\Omega_2)$$
where $B_{x,m}^{(t)}(\Omega_1,\Omega_2)$ is the cross-bispectrum value between the m-th channel microphone signal and the first-channel microphone signal in frame t, $\hat{B}_{x,m}^{(t)}(\Omega_1,\Omega_2)$ is the corresponding raw cross-bispectrum value in frame t, and $0 \le \alpha < 1$ is the smoothing factor.
Step b comprises computing the bispectrum phase difference between each microphone channel and the first-channel microphone signal with the following equation:
$$I_{m1}(\Omega_1,\Omega_2) = \frac{B_{x,m}^{(t)}(\Omega_1,\Omega_2)\,\big[B_{x,1}^{(t)}(\Omega_1,\Omega_2)\big]^{*}}{\big|B_{x,m}^{(t)}(\Omega_1,\Omega_2)\big|\,\big|B_{x,1}^{(t)}(\Omega_1,\Omega_2)\big|}$$
where $I_{m1}(\Omega_1,\Omega_2)$ is the bispectrum phase difference between the m-th channel microphone signal and the first-channel microphone signal, $[\cdot]^{*}$ denotes complex conjugation, and $|\cdot|$ denotes the modulus.
In step c, estimating the bispectrum unit weights from the cross-bispectrum values of the first-channel microphone signal with itself comprises the following.
Let the cross-bispectrum value of the first-channel microphone signal with itself in frame t be $B_{x,1}^{(t)}(\Omega_1,\Omega_2)$. Estimating the bispectrum unit weights then comprises the following steps:
Step c1: estimate the bispectral a priori signal-to-noise ratio of the first-channel microphone signal in frame t from $[\ldots](\Omega_1,\Omega_2)$ and $[\ldots](\Omega_1,\Omega_2)$;
Step c2: compute the bispectrum unit weight as a function of the bispectral a priori signal-to-noise ratio and $B_{x,1}^{(t)}(\Omega_1,\Omega_2)$.
In step e, constructing the bispectrum weighted spatial correlation matrix for the current candidate direction comprises the following steps:
Step e1: from the bispectrum phase differences between each microphone channel and the first microphone channel, form the complex-valued bispectrum phase difference vector;
Step e2: from the candidate direction, compute the bispectrum phase difference compensation vector;
Step e3: from the bispectrum phase difference vector and the compensation vector, compute the compensated bispectrum phase difference vector;
Step e4: from the compensated bispectrum phase difference vector and the bispectrum unit weights, construct the raw bispectrum weighted spatial correlation matrix for the current candidate direction;
Step e5: apply a temporal smoothing strategy to post-process the raw bispectrum weighted spatial correlation matrix.
The temporal smoothing of step e5 is:
$$R(\theta) = \beta R(\theta) + (1-\beta)\,\hat{R}(\theta)$$
where $R(\theta)$ is the bispectrum weighted spatial correlation matrix for the current candidate direction $\theta$, $\hat{R}(\theta)$ is the raw bispectrum weighted spatial correlation matrix, and $0 \le \beta < 1$ is the smoothing factor.
In step f, computing the source-direction cost function for the current candidate direction comprises the following.
Let the bispectrum weighted spatial correlation matrix for the current candidate direction $\theta$ be $R(\theta)$, let the microphone array contain M microphones in total, and let the M complex eigenvalues of $R(\theta)$ be sorted by modulus in descending order, $|\lambda_1(\theta)| \ge |\lambda_2(\theta)| \ge \ldots \ge |\lambda_M(\theta)|$. The source-direction cost function for the current candidate direction $\theta$ is then defined as a function of these M complex eigenvalues.
In step g, obtaining the estimate of the speech source direction from the direction at which the cost function is maximal comprises the following.
Let all candidate directions form the candidate set $\Theta$, and let the source-direction cost function for the current candidate direction $\theta$ be $J(\theta)$. The speech source direction $\tilde{\theta}$ is then estimated as:
$$\tilde{\theta} = \arg\max_{\theta \in \Theta} J(\theta)$$
Beneficial effects of the invention: First, traditional sound source localization methods cannot, in theory, eliminate the influence of non-directional Gaussian noise. The bispectrum is one of the higher-order statistics of a signal, and a useful property of higher-order statistics is that those of Gaussian noise are zero. In theory, therefore, the invention is robust to Gaussian noise. Second, the bispectrum phase difference used by the invention contains the speech source direction cue, and it can be shown theoretically that this cue is redundant in the bispectrum domain (for the theoretical analysis see Wei Xue, Shan Liang, Wenju Liu, "Weighted Spatial Bispectrum Correlation Matrix for DOA Estimation in the Presence of Interferences," InterSpeech 2014, September 14-18, Singapore, pp. 2228-2232, 2014). As a result, even if the direction cue is contaminated by noise in some bispectrum units, the same direction information may still be picked up in other bispectrum units that are not severely contaminated. This helps improve performance under non-Gaussian directional noise. Finally, the invention applies bispectrum weighting to extract the direction cues of speech-dominated bispectrum units and to suppress the adverse effect of noise, which further improves robustness to noise.
Brief description of the drawings
Further features and advantages of the invention are described below with reference to the illustrative drawings.
Fig. 1 schematically shows the flow chart of the microphone-array speech sound source localization method based on the bispectrum weighted spatial correlation matrix;
Fig. 2 schematically shows the flow chart of the cross-bispectrum estimation unit;
Fig. 3 schematically shows the flow chart of the bispectrum unit weight calculation unit;
Fig. 4 schematically shows the flow chart of the construction of the bispectrum weighted spatial correlation matrix;
Fig. 5 schematically shows the flow chart of the candidate-direction cost function calculation.
Detailed description
It should be understood that the following detailed description of the examples and drawings is not intended to limit the invention to the particular illustrative embodiments. The illustrative embodiments described merely exemplify the steps of the invention, whose scope is defined by the appended claims.
The invention exploits the theoretical robustness of the bispectrum to Gaussian noise and the redundancy of the source direction cue in the bispectrum domain, and uses bispectrum weighting to select speech-dominated bispectrum units, thereby improving speech sound source localization performance in noisy environments.
Fig. 1 gives the flow chart of the microphone-array speech sound source localization method based on the bispectrum weighted spatial correlation matrix.
The system comprises a microphone array of at least two microphones 101. The microphones of the array may be arranged in different ways; in particular, the microphones 101 may be placed in a row, with each microphone at a preset distance from its neighbor. For example, the distance between two microphones may be about 5 centimetres. Depending on the application environment and the technical requirements, the microphone array can be installed at a suitable position.
The core computation of the invention comprises five basic units: the cross-bispectrum estimation unit 102, the bispectrum phase difference calculation unit 103, the bispectrum unit weight calculation unit 104, the bispectrum weighted spatial correlation matrix construction unit 106, and the candidate-direction cost function calculation unit 107. The method traverses all candidate directions in the candidate set, computes the corresponding cost function for each, and finally determines the position of the cost function maximum 108, which gives the speech source direction.
1. Cross-bispectrum estimation unit
Let the time-domain signal of the m-th channel be $x_m(k)$. It can be decomposed into the sum of a speech component, a directional interference noise component $v_m(k)$ and a non-directional noise component $n_m(k)$:
$$x_m(k) = \gamma_m s_1(k - \tau_{m1}) + v_m(k) + n_m(k) \tag{1}$$
where $\gamma_m$ is the amplitude attenuation factor from the sound source to the m-th microphone and $\tau_{m1}$ is the time difference from microphone 1 to microphone m. This time difference is directly related to the source direction.
Fig. 2 shows the flow chart of the cross-bispectrum estimation unit, corresponding to step 102 in Fig. 1.
In step 102, step 201 first divides each channel of the observed multichannel digital signal into frames. Next, on each frame, step 202 computes the raw cross-bispectrum values between each channel's signal and the first-channel microphone signal. The raw cross-bispectrum values can be computed by the "direct method" or the "indirect method"; for the procedure see Chrysostomos L. Nikias and Mysore R. Raghuveer, "Bispectrum estimation: A digital signal processing framework," Proceedings of the IEEE, 75(7):869-891, 1987. Because speech is non-stationary and the frame length is limited, temporal smoothing is applied as post-processing to reduce the estimation variance of the cross-bispectrum values. Let $\hat{B}_{x,m}^{(t)}(\Omega_1,\Omega_2)$ be the raw cross-bispectrum value between the m-th channel microphone signal and the first-channel microphone signal in frame t. The temporal smoothing of step 203 is:
$$B_{x,m}^{(t)}(\Omega_1,\Omega_2) = \alpha_1 B_{x,m}^{(t)}(\Omega_1,\Omega_2) + (1-\alpha_1)\,\hat{B}_{x,m}^{(t)}(\Omega_1,\Omega_2) \tag{2}$$
where $B_{x,m}^{(t)}(\Omega_1,\Omega_2)$ is the cross-bispectrum value between the m-th channel microphone signal and the first-channel microphone signal in frame t, and $0 \le \alpha_1 < 1$ is the smoothing factor.
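Steps 201 to 203 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the Hann window, the frame length and hop are hypothetical, and the raw estimate uses one possible direct-method definition of the cross-bispectrum, conj(X_m(Ω1)) · conj(X_1(Ω2)) · X_1(Ω1+Ω2), which the text does not spell out. Eq. (2) as printed carries the same frame index t on both sides; the sketch assumes the first term on the right is the smoothed value from the previous frame, as in ordinary recursive averaging.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def raw_cross_bispectrum(frame_m, frame_1):
    """Raw single-frame estimate on the full (k1, k2) DFT grid.

    Assumed definition (not given in the text):
    B[k1, k2] = conj(X_m[k1]) * conj(X_1[k2]) * X_1[(k1 + k2) mod N].
    """
    n = len(frame_1)
    win = np.hanning(n)                      # assumed analysis window
    Xm = np.fft.fft(win * frame_m)
    X1 = np.fft.fft(win * frame_1)
    k1, k2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.conj(Xm[k1]) * np.conj(X1[k2]) * X1[(k1 + k2) % n]

def smoothed_cross_bispectra(x_m, x_1, frame_len=32, hop=16, alpha1=0.8):
    """Recursive temporal smoothing in the spirit of Eq. (2)."""
    frames_m = frame_signal(x_m, frame_len, hop)
    frames_1 = frame_signal(x_1, frame_len, hop)
    B = None
    out = []
    for fm, f1 in zip(frames_m, frames_1):
        B_raw = raw_cross_bispectrum(fm, f1)
        # Assumption: the smoothed value of the previous frame is reused here.
        B = B_raw if B is None else alpha1 * B + (1.0 - alpha1) * B_raw
        out.append(B)
    return np.stack(out)                     # shape (n_frames, N, N)
```

Calling `smoothed_cross_bispectra(x_m, x_1)` on two equally long channels returns one smoothed N-by-N cross-bispectrum per frame; passing the first channel twice gives the quantity used later for the weights.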
The bispectrum is a third-order statistic of a signal and the simplest higher-order statistic. In theory, a valuable property of higher-order statistics is that those of Gaussian noise are zero. Non-directional noise in real environments can usually be modeled as spatially uncorrelated white Gaussian noise, so the invention can in theory eliminate the influence of white Gaussian noise. Assuming that the speech signal, the non-directional noise signal and the directional interference-source signal are mutually independent, Eq. (1) allows $B_{x,m}^{(t)}(\Omega_1,\Omega_2)$ to be further expressed as:
$$B_{x,m}^{(t)}(\Omega_1,\Omega_2) = \gamma_1^{2}\gamma_m B_{s,m}^{(t)}(\Omega_1,\Omega_2) + B_{v,m}^{(t)}(\Omega_1,\Omega_2) \tag{3}$$
where $B_{s,m}^{(t)}(\Omega_1,\Omega_2)$ and $B_{v,m}^{(t)}(\Omega_1,\Omega_2)$ denote the speech component and the directional interference-source component of the cross-bispectrum, respectively.
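The claim that Gaussian noise contributes nothing to the bispectrum can be checked numerically. The sketch below is an illustration only: it uses the ordinary auto-bispectrum X(Ω1) X(Ω2) X*(Ω1+Ω2) averaged over many frames, and an exponentially distributed sequence as a stand-in for a non-Gaussian (skewed) signal. The helper name, frame count and frame length are hypothetical choices.

```python
import numpy as np

def averaged_bispectrum(frames):
    """Average X[k1] * X[k2] * conj(X[k1 + k2]) over the rows of `frames`."""
    n = frames.shape[1]
    X = np.fft.fft(frames, axis=1)
    k1, k2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    triple = X[:, k1] * X[:, k2] * np.conj(X[:, (k1 + k2) % n])
    return triple.mean(axis=0)

rng = np.random.default_rng(0)
n_frames, n = 2000, 16

gauss = rng.standard_normal((n_frames, n))            # Gaussian noise frames
skewed = rng.exponential(1.0, (n_frames, n)) - 1.0    # zero-mean, skewed frames

# Mean bispectrum magnitude over the whole (k1, k2) grid.
level_gauss = np.abs(averaged_bispectrum(gauss)).mean()
level_skewed = np.abs(averaged_bispectrum(skewed)).mean()

print("Gaussian:", level_gauss, "skewed:", level_skewed)
```

With a finite number of frames the Gaussian estimate does not reach exactly zero; it shrinks as more frames are averaged, whereas the skewed signal keeps a clearly non-zero bispectrum.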
2. Bispectrum phase difference calculation unit
In step 103 of Fig. 1, the bispectrum phase difference between each microphone channel and the first-channel microphone signal is computed with the following equation:
$$I_{m1}(\Omega_1,\Omega_2) = \frac{B_{x,m}^{(t)}(\Omega_1,\Omega_2)\,\big[B_{x,1}^{(t)}(\Omega_1,\Omega_2)\big]^{*}}{\big|B_{x,m}^{(t)}(\Omega_1,\Omega_2)\big|\,\big|B_{x,1}^{(t)}(\Omega_1,\Omega_2)\big|} \tag{4}$$
where $I_{m1}(\Omega_1,\Omega_2)$ is the bispectrum phase difference between the m-th channel microphone signal and the first-channel microphone signal, $[\cdot]^{*}$ denotes complex conjugation, and $|\cdot|$ denotes the modulus.
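Eq. (4) is a normalized cross product, so it can be written in a few lines. The sketch below is a hedged illustration: the function name is hypothetical, and the small `eps` guard against division by zero is an added assumption that does not appear in the text.

```python
import numpy as np

def bispectrum_phase_difference(B_m, B_1, eps=1e-12):
    """Unit-modulus phase difference of Eq. (4), element-wise on arrays.

    B_m: cross-bispectrum of channel m with channel 1.
    B_1: cross-bispectrum of channel 1 with itself.
    eps: assumed guard against empty bispectrum units (not in the text).
    """
    numerator = B_m * np.conj(B_1)
    denominator = np.maximum(np.abs(B_m) * np.abs(B_1), eps)
    return numerator / denominator
```

For example, inputs with phases 1.0 rad and 0.3 rad give an output of modulus 1 and phase 0.7 rad, whatever their magnitudes.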
From the expressions for $B_{x,m}^{(t)}(\Omega_1,\Omega_2)$ and $B_{x,1}^{(t)}(\Omega_1,\Omega_2)$ in Eq. (2), $I_{m1}(\Omega_1,\Omega_2)$ can be further expressed as:
$$I_{m1}(\Omega_1,\Omega_2) = e^{j\Omega_1\tau_{m1}}\,\kappa_m(\Omega_1,\Omega_2) \tag{5}$$
Wherein,
It follows that, since $\tau_{m1}$ is related to the source direction, $e^{j\Omega_1\tau_{m1}}$ is called the source direction cue. $\kappa_m(\Omega_1,\Omega_2)$ depends on the bispectrum of the non-Gaussian interference-source signal. Clearly, in pure-speech bispectrum units, where no interference noise is present, $\kappa_m(\Omega_1,\Omega_2)=1$, and the bispectrum phase difference then equals the speech source direction cue.
An important property follows: although the bispectrum phase difference is defined on bispectrum units taking different values of $(\Omega_1,\Omega_2)$, the actual speech source direction cue is not a function of $\Omega_2$. In other words, as long as two different bispectrum units share the same $\Omega_1$, their bispectrum phase differences contain the same speech source direction cue, whatever the value of $\Omega_2$. The speech source direction cue is therefore redundant in the bispectrum domain. Because speech and interference noise have different bispectral distributions, the direction cue may be severely contaminated by noise in some bispectrum units; even so, the same cue may be recovered in other bispectrum units that have the same $\Omega_1$ and are not contaminated by noise. This property helps improve the performance of the method in non-Gaussian noise environments.
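The redundancy property can be verified on a toy signal. The sketch below is illustrative and rests on assumptions: the cross-bispectrum is taken as conj(X_m(Ω1)) · conj(X_1(Ω2)) · X_1(Ω1+Ω2), a definition chosen here so that a pure delay produces the phase e^{jΩ1τ} of Eq. (5) (the text does not state the definition), and the delay is an integer circular shift so that the DFT relation is exact. No interference is present, so κ_m = 1.

```python
import numpy as np

def cross_bispectrum(x_m, x_1):
    """Assumed definition: conj(X_m[k1]) * conj(X_1[k2]) * X_1[(k1 + k2) mod N]."""
    n = len(x_1)
    Xm, X1 = np.fft.fft(x_m), np.fft.fft(x_1)
    k1, k2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.conj(Xm[k1]) * np.conj(X1[k2]) * X1[(k1 + k2) % n]

rng = np.random.default_rng(1)
n, delay = 32, 3
x1 = rng.exponential(1.0, n) - 1.0        # reference channel (non-Gaussian)
xm = 0.7 * np.roll(x1, delay)             # attenuated copy, circular delay of 3 samples

B_m = cross_bispectrum(xm, x1)
B_1 = cross_bispectrum(x1, x1)
I_m1 = B_m * np.conj(B_1) / (np.abs(B_m) * np.abs(B_1))   # Eq. (4)

# Expected cue: exp(j * Omega_1 * delay), constant along the Omega_2 axis.
omega1 = 2 * np.pi * np.arange(n) / n
expected = np.exp(1j * omega1 * delay)[:, None] * np.ones((1, n))

print(np.allclose(I_m1, expected))        # cue matches Eq. (5) with kappa = 1
print(np.allclose(I_m1, I_m1[:, :1]))     # every Omega_2 column carries the same cue
```

Each row of `I_m1` (fixed Ω1) holds the same value in every Ω2 column, which is the redundancy the weighting step later exploits.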
3. Bispectrum unit weight calculation unit
Fig. 3 shows the flow chart of the bispectrum unit weight calculation unit, corresponding to step 104 in Fig. 1.
In step 104, let the cross-bispectrum value of the first-channel microphone signal with itself in frame t be $B_{x,1}^{(t)}(\Omega_1,\Omega_2)$. Estimating the bispectrum unit weights then comprises the following sub-steps:
Step 401: estimate the bispectral a priori signal-to-noise ratio of the first-channel microphone signal in frame t from $[\ldots](\Omega_1,\Omega_2)$ and $[\ldots](\Omega_1,\Omega_2)$;
Step 402: compute the bispectrum unit weight as a function of the bispectral a priori signal-to-noise ratio and $B_{x,1}^{(t)}(\Omega_1,\Omega_2)$.
Step 401 uses a decision-directed method to estimate the bispectral a priori signal-to-noise ratio. The bispectral a priori signal-to-noise ratio is the ratio of the speech bispectral energy to the noise bispectral energy in a bispectrum unit, defined as:
$$\xi^{(t)}(\Omega_1,\Omega_2) = \frac{\big|B_{s,1}^{(t)}(\Omega_1,\Omega_2)\big|^{2}}{\lambda_v(\Omega_1,\Omega_2)} \tag{7}$$
where $\lambda_v(\Omega_1,\Omega_2)$ denotes the estimate of the interference-noise bispectral energy. Since the speech component $B_{s,1}^{(t)}(\Omega_1,\Omega_2)$ is unknown, the bispectral a priori signal-to-noise ratio has to be estimated. To this end, the bispectral a posteriori signal-to-noise ratio is also defined, as the ratio of the total energy of the bispectrum unit to the noise bispectral energy.
By analogy with the computation of the a priori signal-to-noise ratio in the time-frequency domain, $\xi^{(t)}(\Omega_1,\Omega_2)$ is estimated as:
where $P[\cdot]$ denotes half-wave rectification, which guarantees that the estimate of $\xi^{(t)}(\Omega_1,\Omega_2)$ is non-negative, and $0.92 \le \alpha_2 \le 0.98$ is a smoothing factor.
In step 402, the bispectrum unit weight is computed as a function of $\hat{\xi}^{(t)}(\Omega_1,\Omega_2)$ and $B_{x,1}^{(t)}(\Omega_1,\Omega_2)$. First, the bispectral energy of the clean speech in the current frame, $\big|B_{s,1}^{(t)}(\Omega_1,\Omega_2)\big|^{2}$, is computed from $\hat{\xi}^{(t)}(\Omega_1,\Omega_2)$ and $B_{x,1}^{(t)}(\Omega_1,\Omega_2)$:
$$\big|B_{s,1}^{(t)}(\Omega_1,\Omega_2)\big|^{2} = \frac{\hat{\xi}^{(t)}(\Omega_1,\Omega_2)}{\hat{\xi}^{(t)}(\Omega_1,\Omega_2)+1}\,\big|B_{x,1}^{(t)}(\Omega_1,\Omega_2)\big|^{2} \tag{10}$$
Next, the speech-dominated bispectrum units are selected according to $\big|B_{s,1}^{(t)}(\Omega_1,\Omega_2)\big|^{2}$:
where $[\ldots]$ is a threshold parameter. Finally, the bispectrum unit weight $w(\Omega_1,\Omega_2)$ is computed as:
$$w(\Omega_1,\Omega_2) = \big|B_{s,1}^{(t)}(\Omega_1,\Omega_2)\big|^{2}\,G(\Omega_1,\Omega_2) \tag{12}$$
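The weight computation of steps 401 and 402 can be sketched as follows. Only Eqs. (7), (10) and (12) survive in this text, so the rest is assumed and should be read as such: the a posteriori SNR follows its verbal definition, the decision-directed update takes the familiar time-frequency form (previous-frame speech energy mixed with the half-wave-rectified a posteriori SNR minus one), and G is taken to be a 0/1 mask that keeps units whose estimated speech energy exceeds a threshold. The function and parameter names are hypothetical.

```python
import numpy as np

def bispectrum_unit_weights(Bx1_energy, prev_speech_energy, noise_energy,
                            alpha2=0.95, threshold=0.0):
    """Sketch of steps 401-402 on arrays indexed by (Omega_1, Omega_2).

    Bx1_energy:         |B_x,1|^2 of the current frame.
    prev_speech_energy: estimated |B_s,1|^2 of the previous frame.
    noise_energy:       lambda_v, estimated interference bispectral energy.
    """
    # A posteriori SNR: total unit energy over noise energy (verbal definition).
    post_snr = Bx1_energy / noise_energy
    # Assumed decision-directed form; np.maximum(., 0) is the half-wave rectifier P[.].
    prio_snr = (alpha2 * prev_speech_energy / noise_energy
                + (1.0 - alpha2) * np.maximum(post_snr - 1.0, 0.0))
    # Eq. (10): clean-speech bispectral energy of the current frame.
    speech_energy = prio_snr / (prio_snr + 1.0) * Bx1_energy
    # Assumed form of G: 1 for speech-dominated units, 0 otherwise.
    G = (speech_energy > threshold).astype(float)
    # Eq. (12): weight per bispectrum unit.
    return speech_energy * G, speech_energy
```

The second return value is meant to be fed back as `prev_speech_energy` on the next frame, which is what makes the estimate decision-directed.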
4. Construction of the bispectrum weighted spatial correlation matrix
Fig. 4 shows the flow chart of the construction of the bispectrum weighted spatial correlation matrix, corresponding to step 106 in Fig. 1.
In step 106, constructing the bispectrum weighted spatial correlation matrix for the current candidate direction comprises the following sub-steps:
Step 601: from the bispectrum phase differences between each microphone channel and the first microphone channel, form the complex-valued bispectrum phase difference vector;
Step 602: from the candidate direction, compute the bispectrum phase difference compensation vector;
Step 603: from the bispectrum phase difference vector and the compensation vector, compute the compensated bispectrum phase difference vector;
Step 604: from the compensated bispectrum phase difference vector and the bispectrum unit weights, construct the raw bispectrum weighted spatial correlation matrix for the current candidate direction;
Step 605: apply a temporal smoothing strategy to post-process the raw bispectrum weighted spatial correlation matrix.
In step 601, assuming that the microphone array contains M microphones in total, the complex-valued bispectrum phase difference vector is:
$$I(\Omega_1,\Omega_2) = \big[I_{11}(\Omega_1,\Omega_2),\, I_{21}(\Omega_1,\Omega_2),\, \ldots,\, I_{M1}(\Omega_1,\Omega_2)\big]^{T} \tag{13}$$
In step 602, following the expression of the speech source direction cue, the bispectrum phase difference compensation vector is defined as a function of the candidate direction $\theta$:
$$C(\theta,\Omega_1) = \big[1,\, e^{-j\Omega_1\hat{\tau}_{21}(\theta)},\, \ldots,\, e^{-j\Omega_1\hat{\tau}_{M1}(\theta)}\big]^{T} \tag{14}$$
where $\hat{\tau}_{m1}(\theta)$ (for $m = 2, \ldots, M$) is the time difference between the m-th microphone and the first microphone obtained by converting the candidate direction $\theta$.
Further, in step 603, the compensated bispectrum phase difference vector is computed from the bispectrum phase difference vector of step 601 and the compensation vector of step 602:
$$I_c(\theta,\Omega_1,\Omega_2) = I(\Omega_1,\Omega_2) \odot C(\theta,\Omega_1) \tag{15}$$
where the symbol "$\odot$" denotes element-wise multiplication of vectors.
In step 604, the raw bispectrum weighted spatial correlation matrix $\hat{\mathbf{R}}(\theta)$ for the current candidate direction is constructed from the bispectrum unit weights computed in step 104 and the compensated bispectrum phase difference vector computed in step 603:
$$\hat{\mathbf{R}}(\theta) = \sum_{(\Omega_1,\Omega_2)} w(\Omega_1,\Omega_2) \left[ \mathbf{I}_c(\theta,\Omega_1,\Omega_2) \right] \left[ \mathbf{I}_c(\theta,\Omega_1,\Omega_2) \right]^H \qquad (16)$$
To further reduce the influence of noise, step 605 post-processes $\hat{\mathbf{R}}(\theta)$ with a time smoothing strategy:
$$\mathbf{R}(\theta) = \beta\, \mathbf{R}(\theta) + (1-\beta)\, \hat{\mathbf{R}}(\theta) \qquad (17)$$
where $\mathbf{R}(\theta)$ is the bispectrum weighted spatial correlation matrix for the current candidate direction θ (on the right-hand side it is the value carried over from the previous frame), $\hat{\mathbf{R}}(\theta)$ is the raw bispectrum weighted spatial correlation matrix, and 0 ≤ β < 1 is the smoothing factor.
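The accumulation in Eq. (16) and the recursive smoothing in Eq. (17) can be sketched as follows. This is only an illustration: the array layout (one row per bispectrum bin), the function names, the random test data and the value β = 0.8 are assumptions, not taken from the patent.

```python
import numpy as np

def raw_correlation_matrix(Ic, w):
    """Eq. (16): sum over bins k of w[k] * Ic[k] Ic[k]^H.

    Ic: (K, M) compensated phase difference vectors, one row per bin (Omega1, Omega2).
    w:  (K,) real bispectrum unit weights.
    """
    return np.einsum("k,km,kn->mn", w, Ic, Ic.conj())

def smooth(R_prev, R_raw, beta=0.8):
    """Eq. (17): recursive time smoothing with factor 0 <= beta < 1."""
    return beta * R_prev + (1.0 - beta) * R_raw

# Hypothetical data: K = 50 bins, M = 4 microphones, unit-modulus entries.
rng = np.random.default_rng(0)
K, M = 50, 4
Ic = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(K, M)))
w = rng.uniform(0.0, 1.0, size=K)

R_hat = raw_correlation_matrix(Ic, w)
# First frame: the smoothed matrix is assumed to start from zero.
R = smooth(np.zeros((M, M), dtype=complex), R_hat, beta=0.8)
```

Because the weights are real, the matrix produced this way is Hermitian, and smoothing preserves that property.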
Five, candidate direction cost function calculation
Fig. 5 shows the flowchart of the candidate direction cost function calculation, corresponding to step 107 in Fig. 1.
In step 701, eigenvalue decomposition is performed on R(θ), and its M complex eigenvalues are sorted in descending order of modulus: |λ1(θ)| ≥ |λ2(θ)| ≥ ... ≥ |λM(θ)|.
In step 702, the sound source direction cost function J(θ) for the current candidate direction θ is defined as a function of the M complex eigenvalues:
$$J(\theta) = \frac{1}{\sum_{i=2}^{M} |\lambda_i(\theta)|} \qquad (18)$$
According to Wei Xue, Shan Liang, Wenju Liu, "Weighted Spatial Bispectrum Correlation Matrix for DOA Estimation in the Presence of Interferences," InterSpeech 2014, September 14-18, Singapore, pp. 2228-2232, 2014, when the candidate source direction equals the true source direction, the rank of R(θ) is 1, so that |λ2(θ)| = ... = |λM(θ)| = 0 and J(θ) is therefore, in theory, positive infinity.
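A small sketch of Eq. (18) shows how the cost behaves for a rank-1 matrix compared with a full-rank one. The small constant `eps` is an addition for numerical safety and is not part of the patent's formula; the function name and the two test matrices are hypothetical.

```python
import numpy as np

def direction_cost(R, eps=1e-12):
    """Eq. (18): reciprocal of the summed moduli of all but the largest eigenvalue.

    eps is an assumed safeguard against division by zero, not part of Eq. (18).
    """
    lam = np.linalg.eigvals(R)            # M (generally complex) eigenvalues
    mags = np.sort(np.abs(lam))[::-1]     # moduli in descending order
    return 1.0 / (mags[1:].sum() + eps)

M = 4
# Rank-1 case: every compensated vector is all ones, so R is a scaled ones matrix.
R_matched = 3.0 * np.ones((M, M), dtype=complex)
# Full-rank case: identity matrix, all eigenvalues equal to 1.
R_mismatched = np.eye(M, dtype=complex)

J_matched = direction_cost(R_matched)
J_mismatched = direction_cost(R_mismatched)
```

In the rank-1 case the trailing eigenvalues are zero up to rounding, so the cost is limited only by `eps`; in the full-rank case it is close to 1/3.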
Six, cost function maximum position decision
In step 108 of Fig. 1, the speech source direction is estimated as the candidate direction that maximizes J(θ):
$$\tilde{\theta} = \arg\max_{\theta \in \Theta} J(\theta) \qquad (19)$$
where Θ is the candidate direction set formed by all candidate directions.
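The search in Eq. (19) can be sketched end to end on synthetic, noise-free data. Several choices here are assumptions made only for illustration: the far-field linear-array delay model, uniform weights, a single frame with no time smoothing, a 5-degree candidate grid, and all function names. `numpy.linalg.eigvalsh` is used because the matrix is Hermitian when the weights are real.

```python
import numpy as np

C_SOUND = 343.0  # assumed speed of sound in m/s

def delays(theta, mics, fs):
    """Assumed far-field linear-array delays relative to microphone 1, in samples."""
    return (mics - mics[0]) * np.cos(theta) / C_SOUND * fs

def direction_cost(theta, I, omega1, w, mics, fs, eps=1e-12):
    """Eqs. (14)-(16) and (18) for one candidate direction (single frame)."""
    C = np.exp(-1j * omega1[:, None] * delays(theta, mics, fs)[None, :])
    Ic = I * C                                         # Eq. (15)
    R = np.einsum("k,km,kn->mn", w, Ic, Ic.conj())     # Eq. (16)
    mags = np.sort(np.abs(np.linalg.eigvalsh(R)))[::-1]
    return 1.0 / (mags[1:].sum() + eps)                # Eq. (18), eps assumed

def localize(I, omega1, w, mics, fs, candidates):
    """Eq. (19): candidate direction with the largest cost."""
    J = np.array([direction_cost(t, I, omega1, w, mics, fs) for t in candidates])
    return candidates[int(np.argmax(J))], J

# Hypothetical setup: 4 microphones, 5 cm spacing, source at 60 degrees.
mics = np.array([0.0, 0.05, 0.10, 0.15])
fs = 16000.0
theta_true = np.deg2rad(60.0)
omega1 = np.linspace(0.1, 1.5, 40)   # Omega1 value of each bispectrum bin
w = np.ones_like(omega1)             # uniform weights for the sketch

# Idealized phase difference vectors, one row per bin (assumed sign convention).
I = np.exp(1j * omega1[:, None] * delays(theta_true, mics, fs)[None, :])

candidates = np.deg2rad(np.arange(0, 181, 5))
theta_hat, J = localize(I, omega1, w, mics, fs, candidates)
```

On this idealized input the cost peaks sharply at the grid point that coincides with the true direction; with real, noisy bispectra the peak would be finite and the weights and smoothing would matter.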
From this description, further modifications and variations of the present invention will be apparent to those skilled in the art. Accordingly, this description is to be regarded as illustrative, and its purpose is to teach those skilled in the art the general manner of carrying out the invention. It should be understood that the forms of the invention shown and described herein are to be taken as the presently preferred embodiments.

Claims (9)

1. A speech sound source localization method based on a bispectrum weighted spatial correlation matrix, comprising the following steps:
Step a: collecting multi-channel noisy speech signals with a microphone array, dividing the noisy speech signal of each channel into frames, and computing, on each frame, the cross-bispectrum values between the time-domain signal of each channel and the first microphone signal;
Step b: in the bispectrum domain, computing the bispectrum phase difference between each microphone channel and the first microphone channel;
Step c: in the bispectrum domain, estimating the bispectrum unit weights using the cross-bispectrum values of the first microphone signal with itself;
Step d: defining a candidate direction set;
Step e: constructing the bispectrum weighted spatial correlation matrix for the current candidate direction from the bispectrum phase differences and the bispectrum unit weights;
Step f: computing the sound source direction cost function for the current candidate direction based on the eigenvalues of the bispectrum weighted spatial correlation matrix;
Step g: repeating step e to step f until all candidate directions in the candidate direction set have been traversed, and obtaining the estimate of the speech source direction from the direction corresponding to the maximum of the sound source direction cost function.
2. The sound source localization method as claimed in claim 1, wherein step a comprises using a direct bispectrum estimation method or an indirect bispectrum estimation method to compute, from the digital observation signals received by each microphone, the raw cross-bispectrum values between the time-domain signal of each channel and the first microphone signal.
3. The sound source localization method as claimed in claim 2, wherein step a comprises post-processing, with a time smoothing strategy, the raw cross-bispectrum values of claim 2 between the time-domain signal of each channel and the first microphone signal:
where […] is the cross-bispectrum value between the m-th microphone signal and the first microphone signal at frame t, […] is the raw cross-bispectrum value between the m-th microphone signal and the first microphone signal at frame t, and 0 ≤ α < 1 is the smoothing factor.
4. The sound source localization method as claimed in claim 1, wherein step b comprises computing the bispectrum phase difference between each microphone channel and the first microphone signal using the following two equations:
where $I_{m1}(\Omega_1,\Omega_2)$ is the bispectrum phase difference between the m-th microphone and the first microphone signal, $[\cdot]^*$ is the conjugate operator, and $|\cdot|$ is the modulus operator.
5. The sound source localization method as claimed in claim 1, wherein estimating the bispectrum unit weights in step c using the cross-bispectrum values of the first microphone signal with itself comprises:
letting the cross-bispectrum value of the first microphone signal with itself at frame t be […], estimating the bispectrum unit weights comprises the following steps:
Step c1: estimating the bispectrum a priori signal-to-noise ratio of the first microphone signal at frame t from […] and […];
Step c2: computing the bispectrum unit weights as a function of the bispectrum a priori signal-to-noise ratio and […].
6. The sound source localization method as claimed in claim 1, wherein constructing the bispectrum weighted spatial correlation matrix for the current candidate direction in step e comprises the following steps:
Step e1: computing the complex-domain bispectrum phase difference vector from the bispectrum phase differences between each microphone channel and the first microphone channel;
Step e2: computing the bispectrum phase difference compensation vector from the current candidate direction;
Step e3: computing the compensated bispectrum phase difference vector from the bispectrum phase difference vector and the bispectrum phase difference compensation vector;
Step e4: constructing the raw bispectrum weighted spatial correlation matrix for the current candidate direction from the compensated bispectrum phase difference vector and the bispectrum unit weights;
Step e5: post-processing the raw bispectrum weighted spatial correlation matrix using a time smoothing strategy.
7. The sound source localization method as claimed in claim 1 or 6, wherein post-processing the raw bispectrum weighted spatial correlation matrix in step e5 using a time smoothing strategy comprises:
$$\mathbf{R}(\theta) = \beta\, \mathbf{R}(\theta) + (1-\beta)\, \hat{\mathbf{R}}(\theta)$$
where $\mathbf{R}(\theta)$ is the bispectrum weighted spatial correlation matrix for the current candidate direction θ, $\hat{\mathbf{R}}(\theta)$ is the raw bispectrum weighted spatial correlation matrix, and 0 ≤ β < 1 is the smoothing factor.
8. The sound source localization method as claimed in claim 1, wherein computing the sound source direction cost function for the current candidate direction in step f comprises:
letting the bispectrum weighted spatial correlation matrix for the current candidate direction θ be R(θ) and the microphone array contain M microphones in total, with the M complex eigenvalues of R(θ) sorted in descending order of modulus as |λ1(θ)| ≥ |λ2(θ)| ≥ ... ≥ |λM(θ)|, the sound source direction cost function for the current candidate direction θ is defined as a function of the M complex eigenvalues.
9. The sound source localization method as claimed in claim 1, wherein obtaining the estimate of the speech source direction in step g from the direction corresponding to the maximum of the sound source direction cost function comprises:
letting all candidate directions form the candidate direction set Θ and the sound source direction cost function for the current candidate direction θ be J(θ), the speech source direction $\tilde{\theta}$ is estimated as:
$$\tilde{\theta} = \arg\max_{\theta \in \Theta} J(\theta)$$

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510937548.3A CN105609113A (en) 2015-12-15 2015-12-15 Bispectrum weighted spatial correlation matrix-based speech sound source localization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510937548.3A CN105609113A (en) 2015-12-15 2015-12-15 Bispectrum weighted spatial correlation matrix-based speech sound source localization method

Publications (1)

Publication Number Publication Date
CN105609113A true CN105609113A (en) 2016-05-25

Family

ID=55988996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510937548.3A Pending CN105609113A (en) 2015-12-15 2015-12-15 Bispectrum weighted spatial correlation matrix-based speech sound source localization method

Country Status (1)

Country Link
CN (1) CN105609113A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106289506A (en) * 2016-09-06 2017-01-04 大连理工大学 A kind of method using POD decomposition method to eliminate flow field wall microphone array noise signal
CN106526541A (en) * 2016-10-13 2017-03-22 杭州电子科技大学 Sound positioning method based on distribution matrix decision
CN107219512A (en) * 2017-03-29 2017-09-29 北京大学 A kind of sound localization method based on acoustic transfer function
CN108198562A (en) * 2018-02-05 2018-06-22 中国农业大学 A kind of method and system for abnormal sound in real-time positioning identification animal house
CN108398664A (en) * 2017-02-07 2018-08-14 中国科学院声学研究所 A kind of analytic expression space for microphone array solves aliasing method
CN108540898A (en) * 2017-03-03 2018-09-14 松下电器(美国)知识产权公司 Sound source detection device and method, the recording medium for recording sound source locator
CN109831709A (en) * 2019-02-15 2019-05-31 杭州嘉楠耘智信息科技有限公司 Sound source orientation method and device and computer readable storage medium
CN110082724A (en) * 2019-05-31 2019-08-02 浙江大华技术股份有限公司 A kind of sound localization method, device and storage medium
CN110133594A (en) * 2018-02-09 2019-08-16 北京搜狗科技发展有限公司 A kind of sound localization method, device and the device for auditory localization
CN110580911A (en) * 2019-09-02 2019-12-17 青岛科技大学 beam forming method capable of inhibiting multiple unstable sub-Gaussian interferences
CN110728988A (en) * 2019-10-23 2020-01-24 浪潮金融信息技术有限公司 Implementation method of voice noise reduction camera for self-service terminal equipment
CN111352075A (en) * 2018-12-20 2020-06-30 中国科学院声学研究所 Underwater multi-sound-source positioning method and system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI XUE等: ""Noise Robust Direction of Arrival Estimation for Speech Source With Weighted Bispectrum Spatial Correlation Matrix"", 《IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING》 *
WEI XUE等: ""Weighted spatial bispectrum correlation matrix for DOA estimation in the presence of interferences"", 《INTERSPEECH 2014》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106289506B (en) * 2016-09-06 2019-03-05 大连理工大学 A method of flow field wall surface microphone array noise signal is eliminated using POD decomposition method
CN106289506A (en) * 2016-09-06 2017-01-04 大连理工大学 A kind of method using POD decomposition method to eliminate flow field wall microphone array noise signal
CN106526541A (en) * 2016-10-13 2017-03-22 杭州电子科技大学 Sound positioning method based on distribution matrix decision
CN106526541B (en) * 2016-10-13 2019-01-18 杭州电子科技大学 Sound localization method based on distribution matrix decision
CN108398664A (en) * 2017-02-07 2018-08-14 中国科学院声学研究所 A kind of analytic expression space for microphone array solves aliasing method
CN108398664B (en) * 2017-02-07 2020-09-08 中国科学院声学研究所 Analytic spatial de-aliasing method for microphone array
CN108540898B (en) * 2017-03-03 2020-11-24 松下电器(美国)知识产权公司 Sound source detection device and method, and recording medium having sound source detection program recorded thereon
CN108540898A (en) * 2017-03-03 2018-09-14 松下电器(美国)知识产权公司 Sound source detection device and method, the recording medium for recording sound source locator
CN107219512A (en) * 2017-03-29 2017-09-29 北京大学 A kind of sound localization method based on acoustic transfer function
CN107219512B (en) * 2017-03-29 2020-05-22 北京大学 Sound source positioning method based on sound transfer function
CN108198562A (en) * 2018-02-05 2018-06-22 中国农业大学 A kind of method and system for abnormal sound in real-time positioning identification animal house
CN110133594A (en) * 2018-02-09 2019-08-16 北京搜狗科技发展有限公司 A kind of sound localization method, device and the device for auditory localization
CN111352075B (en) * 2018-12-20 2022-01-25 中国科学院声学研究所 Underwater multi-sound-source positioning method and system based on deep learning
CN111352075A (en) * 2018-12-20 2020-06-30 中国科学院声学研究所 Underwater multi-sound-source positioning method and system based on deep learning
CN109831709A (en) * 2019-02-15 2019-05-31 杭州嘉楠耘智信息科技有限公司 Sound source orientation method and device and computer readable storage medium
CN110082724A (en) * 2019-05-31 2019-08-02 浙江大华技术股份有限公司 A kind of sound localization method, device and storage medium
CN110580911B (en) * 2019-09-02 2020-04-21 青岛科技大学 Beam forming method capable of inhibiting multiple unstable sub-Gaussian interferences
CN110580911A (en) * 2019-09-02 2019-12-17 青岛科技大学 beam forming method capable of inhibiting multiple unstable sub-Gaussian interferences
CN110728988A (en) * 2019-10-23 2020-01-24 浪潮金融信息技术有限公司 Implementation method of voice noise reduction camera for self-service terminal equipment

Similar Documents

Publication Publication Date Title
CN105609113A (en) Bispectrum weighted spatial correlation matrix-based speech sound source localization method
US11308974B2 (en) Target voice detection method and apparatus
CN112526451B (en) Compressed beam forming and system based on microphone array imaging
CN108731886B (en) A kind of more leakage point acoustic fix ranging methods of water supply line based on iteration recursion
CN107817465A (en) The DOA estimation method based on mesh free compressed sensing under super-Gaussian noise background
CN109782231B (en) End-to-end sound source positioning method and system based on multi-task learning
CN105068048A (en) Distributed microphone array sound source positioning method based on space sparsity
CN110146846B (en) Sound source position estimation method, readable storage medium and computer equipment
CN107102296A (en) A kind of sonic location system based on distributed microphone array
CN104898086B (en) Estimate sound source direction method suitable for the sound intensity of mini microphone array
CN103854660B (en) A kind of four Mike's sound enhancement methods based on independent component analysis
CN101893698B (en) Noise source test and analysis method and device
CN103559888A (en) Speech enhancement method based on non-negative low-rank and sparse matrix decomposition principle
CN106468770A (en) Closely optimum radar target detection method under K Distribution Clutter plus noise
CN106066468A (en) A kind of based on acoustic pressure, the vector array port/starboard discrimination method of vibration velocity Mutual spectrum
CN108957403B (en) Gaussian fitting envelope time delay estimation method and system based on generalized cross correlation
CN111948598B (en) Method and device for detecting space domain interference signal
CN109407046A (en) A kind of nested array direction of arrival angle estimation method based on variational Bayesian
CN104360305A (en) Radiation source direction finding positioning method of uniting compressed sensing and signal cycle stationary characteristics
CN102915735B (en) Noise-containing speech signal reconstruction method and noise-containing speech signal device based on compressed sensing
Wang Multi-band multi-centroid clustering based permutation alignment for frequency-domain blind speech separation
CN106226729A (en) Relatively prime array direction of arrival angular estimation method based on fourth-order cumulant
CN104665875A (en) Ultrasonic Doppler envelope and heart rate detection method
Lemos et al. Using matrix norms to estimate the direction of arrival of planar waves on an ULA
Li et al. Noise reduction of ship-radiated noise based on noise-assisted bivariate empirical mode decomposition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160525
