CN102854494A

CN102854494A - Sound source locating method and device

Info

Publication number: CN102854494A
Application number: CN2012102810199A
Authority: CN
Inventors: 彭迎标; 邵诗强
Original assignee: TCL Corp
Current assignee: TCL Corp
Priority date: 2012-08-08
Filing date: 2012-08-08
Publication date: 2013-01-02
Anticipated expiration: 2032-08-08
Also published as: CN102854494B

Abstract

The invention belongs to the technical field of sound processing and provides a sound source localization method and device. The method comprises: collecting sound source signals with a microphone array and preprocessing the signals collected by any two of the microphones; determining the cross-power spectral density function of the two preprocessed signals; determining a weighting function that is adjusted as the current signal-to-noise ratio changes; determining the value sequence of the cross-correlation function of the two signals from the cross-power spectral density function and the weighting function; determining the time delay with which the sound source signal arrives at the two microphones from the maximum of the cross-correlation function; and locating the sound source from the arrangement of the microphone array and that time delay. Because the weighting function is adjusted as the current signal-to-noise ratio changes, the time delay can still be obtained accurately in environments where the signal-to-noise ratio of the sound source varies, which improves the accuracy of sound source localization.

Description

Sound source localization method and device

Technical field

The invention belongs to the technical field of sound processing, and in particular relates to a sound source localization method and device.

Background art

In video conferencing, security monitoring and some industrial applications, a sound source often needs to be localized. In some scenarios, however, the external acoustic environment is uncertain and the speech signal is disturbed by external noise, so the signal-to-noise ratio (SNR) varies. In existing sound source localization techniques, a set of audio data is acquired by a microphone array and preprocessed, the time delay is then estimated with the phase transform generalized cross-correlation method (PHAT-GCC), and the position of the sound source is determined through a geometric model from the time delay and the arrangement of the microphones in the array. In the existing PHAT-GCC method, because the SNR of the sound source signal may vary with the environment, the denominator of the frequency-domain weighting function tends to zero when the signal energy is small, so the value of the weighting function becomes very large. The resulting time delay estimate then has a large error, and so does the final sound source position.

Summary of the invention

In view of the above problems, the object of the invention is to provide a sound source localization method, intended to solve the technical problem in existing sound source localization techniques that, when the signal-to-noise ratio (SNR) of the sound source signal changes, the value of the weighting function may become very large, so that the localization result has a large error.

The invention is realized as follows. A sound source localization method comprises the following steps:

A microphone array collects sound source signals, and the signals collected by any two of the microphones are preprocessed;

The cross-power spectral density function of the two preprocessed sound source signals is determined;

A weighting function that is adjusted as the current SNR changes is determined;

The value sequence of the cross-correlation function of the two sound source signals is determined from the cross-power spectral density function and the weighting function, and the time delay with which the sound source signal arrives at the two microphones is determined from the maximum of the cross-correlation function;

The sound source is located from the arrangement of the microphone array and the time delay with which the sound source signal arrives at the two microphones.

A further object of the invention is to provide a sound source localization device, comprising:

a microphone array acquisition and preprocessing unit, configured to collect sound source signals with a microphone array and to preprocess the signals collected by any two of the microphones;

a cross-power spectral density determining unit, configured to determine the cross-power spectral density function of the two preprocessed sound source signals;

a weighting function determining unit, configured to determine a weighting function that is adjusted as the current signal-to-noise ratio changes;

a time delay determining unit, configured to determine the value sequence of the cross-correlation function of the two sound source signals from the cross-power spectral density function and the weighting function, and to determine the time delay with which the sound source signal arrives at the two microphones from the maximum of the cross-correlation function;

a sound source localization unit, configured to locate the sound source from the arrangement of the microphone array and the time delay with which the sound source signal arrives at the two microphones.

The beneficial effect of the invention is as follows. The weighting function used by the method and device is adjusted as the current signal-to-noise ratio changes. Consequently, even in an environment where background noise, reverberation and other factors cause the signal-to-noise ratio of the sound source to vary, the time delay of the speech signal can still be obtained accurately by adjusting the weighting function accordingly, which improves the localization accuracy.

Brief description of the drawings

Fig. 1 is a flowchart of the sound source localization method provided by the first embodiment of the invention;

Fig. 2 is a flowchart of the sound source localization method provided by the second embodiment of the invention;

Fig. 3 is a block diagram of the sound source localization device provided by the third embodiment of the invention;

Fig. 4 is a block diagram of the sound source localization device provided by the fourth embodiment of the invention.

Detailed description

To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

The technical solution of the invention is illustrated below through specific embodiments.

Embodiment one:

Fig. 1 shows the flow of the sound source localization method provided by the first embodiment of the invention. For ease of explanation, only the parts relevant to this embodiment are shown.

The sound source localization method provided by this embodiment comprises:

Step S101: a microphone array collects sound source signals, and the signals collected by any two of the microphones are preprocessed.

A microphone array is a set of microphones arranged in a certain pattern. It is commonly used in sound source localization to collect a group of sound source signals. In this step, the signals collected by any two microphones of the array are preprocessed; the preprocessing includes filtering, framing and the like.

Step S102: determine the cross-power spectral density function of the two preprocessed sound source signals.

Step S103: determine a weighting function that is adjusted as the current signal-to-noise ratio (SNR) changes.

Step S104: determine the value sequence of the cross-correlation function of the two sound source signals from the cross-power spectral density function and the weighting function, and determine the time delay (time difference) with which the sound source signal arrives at the two microphones from the maximum of the cross-correlation function.

Steps S102-S104 describe how the time delay with which the sound source signal arrives at the two microphones is determined. The accuracy of this time delay determines the accuracy of the localization. The usual procedure is: first determine the cross-power spectral density function of the two signals and a weighting function, then apply an inverse Fourier transform to their product to obtain the value sequence of the cross-correlation function of the two signals, and finally determine the time delay from the maximum of the cross-correlation function. The existing weighting function, however, does not follow changes in the current SNR. It cannot withstand strong background noise and reverberation, and when the speech signal energy is small its value becomes very large, which causes a large error in the subsequent time delay determination. In this embodiment, the weighting function determined in step S103 is adjusted as the current SNR changes, so its value does not become very large when the speech signal energy is small, which safeguards the accuracy of the time delay determination.

Step S105: locate the sound source from the arrangement of the microphone array and the time delay with which the sound source signal arrives at the two microphones.

The principle of sound source localization is to determine the time delay with which the sound source signal arrives at two microphones and then, from the known positions of the microphones, to determine the position of the sound source through a geometric model. Because this embodiment determines the time delay with higher accuracy, the sound source can be located accurately by geometric analysis. The specific localization method is the same as in existing sound source localization techniques and is not repeated here.

The key difference between this embodiment and existing sound source localization techniques is that the weighting function provided here is adjusted as the current SNR changes, so the value of the weighting function does not change sharply when the current SNR changes. The accuracy of the final time delay value is thereby safeguarded, which in turn improves the localization accuracy.

Embodiment two:

Fig. 2 shows the flow of the sound source localization method provided by the second embodiment of the invention. For ease of explanation, only the parts relevant to this embodiment are shown.

Step S201: a microphone array collects sound source signals.

Step S202: band-pass filter the sound source signals collected by any two microphones of the microphone array to obtain two band-pass-filtered sound source signals.

Step S203: perform windowed framing on the two band-pass-filtered sound source signals to obtain two short-time stationary signals.

Steps S201-S203 above are a specific preferred implementation of step S101 in Embodiment one.

In step S201, suppose that the sound source signals collected by the two microphones are:

x₁(t) = a₁s₁(t) + n₁(t)    (1)

x₂(t) = a₂s₁(t + D) + n₂(t)    (2)

where a₁ and a₂ are sound attenuation factors; because the sound source is a near-field source, a₁ and a₂ can both be taken as 1. D is the time delay with which the sound source signal arrives at the two microphones, and n₁(t) and n₂(t) are the noise signals received by the two microphones.

In step S202, the signals collected by the microphones are band-pass filtered to remove low-band and high-band noise, providing two band-pass-filtered sound source signals for subsequent processing.

In step S203, as one implementation, a Hamming window function is used to divide the two band-pass-filtered signals into frames, yielding two short-time stationary signals. Windowed framing generally uses overlapping frames. The two short-time stationary signals are:

s₁(λ, n) = x₁(n + d(λ-1)N) w(n)    (3)

s₂(λ, n) = x₂(n + d(λ-1)N) w(n)    (4)

where w(n) is the Hamming window function, N is the length of w(n), d is the shift parameter between adjacent frames, and λ is the frame index.
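As an illustration only, the windowed framing of formulas (3) and (4) might be sketched as follows. The function names `hamming` and `frame_signal`, the default d = 0.5 (50% overlap), and the standard Hamming coefficients 0.54/0.46 are assumptions made for this sketch; the text does not fix them.

```python
import math


def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1 (N >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def frame_signal(x, N, d=0.5):
    """Split x into overlapping windowed frames following formulas (3)/(4).

    Frame lam (1-based) starts at sample d*(lam-1)*N. The default d = 0.5
    is an assumed value, not one taken from the text.
    """
    hop = int(round(d * N))
    if hop <= 0:
        raise ValueError("d * N must be at least one sample")
    w = hamming(N)
    frames = []
    start = 0
    while start + N <= len(x):
        # s(lam, n) = x(n + d*(lam-1)*N) * w(n)
        frames.append([x[start + n] * w[n] for n in range(N)])
        start += hop
    return frames
```

Trailing samples that do not fill a complete frame are dropped in this sketch; padding them instead would be an equally reasonable choice.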

Step S204: judge by endpoint detection whether the two short-time stationary signals are speech signals; if yes, go to step S205; if no, go to step S207.

Step S205: determine the current SNR as SNR(λ) = a·SNR(λ-1) + (1-a)·SNR_0, where SNR(λ-1) is the SNR of the previous frame of the sound source signal, SNR_0 is the a priori SNR obtained from the energy ratio of the current speech frame to the last non-speech frame, and a is a smoothing factor.

Step S206: apply a fast Fourier transform to the two short-time stationary signals, then determine their cross-power spectral density function.

Step S207: discard the two short-time stationary signals and update the SNR as SNR(λ) = SNR(λ-1), where SNR(λ-1) is the SNR of the previous frame of the sound source signal; the flow then ends and processing moves on to the next frame.

Steps S204-S207 above are a specific preferred implementation of step S102 in Embodiment one. In step S204, endpoint detection is used to judge whether the two short-time stationary signals are speech. In this embodiment, the signal collected by a microphone contains both the speech signal of the sound source and the ambient noise signal; when the source is silent, the collected signal is ambient noise only. Specifically, when both the short-time energy (the energy of the audio signal over a short time segment) and the short-time zero-crossing rate (the number of times the signal waveform crosses the horizontal axis, i.e. zero level, per unit time) of the two short-time stationary signals exceed their respective thresholds, the current sound source signal is judged to be a speech signal.
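The endpoint decision described above can be sketched roughly as below. The helper names and both threshold parameters are hypothetical; the text gives no threshold values, and the zero-crossing rate is counted per frame here rather than per second.

```python
def short_time_energy(frame):
    """Energy of one frame: sum of squared samples."""
    return sum(v * v for v in frame)


def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def is_speech(frame, energy_threshold, zcr_threshold):
    """Judge a frame as speech only when BOTH measures exceed their thresholds.

    The threshold values are assumptions to be tuned for the environment.
    """
    return (short_time_energy(frame) > energy_threshold
            and zero_crossing_count(frame) > zcr_threshold)
```

In the two-channel setting of this embodiment, one possible usage is to call `is_speech` on the frame from each microphone and treat the pair as speech only if both calls agree.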

When frame λ of the framed signal is judged to be non-speech, the current SNR is

SNR(λ) = SNR(λ-1)    (8)

When frame λ of the framed signal is judged to be speech, the current SNR is

SNR(λ) = a·SNR(λ-1) + (1-a)·SNR_0    (9)

where SNR(λ-1) is the SNR of the previous frame, SNR_0 is the energy ratio of the current speech frame to the last non-speech frame, and a is a smoothing factor.
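The recursion of formulas (8) and (9) can be sketched as below. Expressing SNR_0 in decibels, the small energy floor `eps`, and the default smoothing factor a = 0.9 are assumptions made for this sketch; the text specifies only the energy ratio and the recursion itself.

```python
import math


def prior_snr_db(speech_frame, noise_frame, eps=1e-12):
    """SNR_0: energy ratio of the current speech frame to the last
    non-speech frame, expressed in dB (the dB scale is an assumption)."""
    e_speech = max(sum(v * v for v in speech_frame), eps)
    e_noise = max(sum(v * v for v in noise_frame), eps)
    return 10.0 * math.log10(e_speech / e_noise)


def update_snr(prev_snr, frame_is_speech, snr_0=None, a=0.9):
    """Return SNR(lam) from SNR(lam-1).

    Non-speech frame: formula (8), SNR(lam) = SNR(lam-1).
    Speech frame:     formula (9), SNR(lam) = a*SNR(lam-1) + (1-a)*SNR_0.
    The default a = 0.9 is a hypothetical value.
    """
    if not frame_is_speech:
        return prev_snr
    return a * prev_snr + (1.0 - a) * snr_0
```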

When the current sound source signal is judged to be a speech signal, a fast Fourier transform is applied to the two short-time stationary signals, and their cross-power spectral density function is then determined. Specifically, applying the fast Fourier transform to the two speech signals of formulas (3) and (4) gives

S_1(\lambda, k) = \sum_{n=0}^{N-1} s_1(\lambda, n) \exp\left(-j \frac{2\pi}{N} nk\right) \quad (5)

S_2(\lambda, k) = \sum_{n=0}^{N-1} s_2(\lambda, n) \exp\left(-j \frac{2\pi}{N} nk\right) \quad (6)

The cross-power spectral density function of the two speech signals is therefore:

R_{12}(\lambda, k) = S_1(\lambda, k) S_2^{*}(\lambda, k) \quad (7)

where s₁(λ, n) and s₂(λ, n) are finite sequences of length N, S₁(λ, k) and S₂(λ, k) are their Fourier transforms, and S₂*(λ, k) is the complex conjugate of S₂(λ, k).
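Formulas (5) to (7) can be checked numerically with a short sketch such as the following. For brevity it uses a direct O(N²) discrete Fourier transform from the standard library rather than a fast Fourier transform; the function names are hypothetical.

```python
import cmath


def dft(s):
    """Direct DFT of formulas (5)/(6): S(k) = sum_n s(n) * exp(-j*2*pi*n*k/N)."""
    N = len(s)
    return [sum(s[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def cross_spectrum(s1, s2):
    """Cross-power spectral density of formula (7): R12(k) = S1(k) * conj(S2(k))."""
    S1, S2 = dft(s1), dft(s2)
    return [a * b.conjugate() for a, b in zip(S1, S2)]
```

For two unit impulses one sample apart, every bin of the result has magnitude 1 and only the phase varies with k, which is the property the later delay estimate relies on.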

When the two current short-time stationary signals are judged to be non-speech, they are discarded. Once the frames are found to be non-speech there is no need for further computation, so step S207 discards them, which reduces the amount of computation to some extent.

Step S208: determine the weighting function from the current SNR, as given by formula (10) or, alternatively, formula (11), where φ₁₂(w) is the cross-power spectral density function of the sound source signals, ρ is an adjustment factor proportional to the current SNR(λ), and formula (11) additionally involves the coherence function, in which φ₁(w) and φ₂(w) are the autocorrelation functions of the two speech signals.

Step S208 above is a specific preferred implementation of step S103 in Embodiment one: the SNR is determined first, and the weighting function is then determined from it.

After the current SNR has been determined, the corresponding weighting function is determined. In step S208, if additive noise in the actual environment is not considered, the weighting function of this embodiment is that of formula (10); if additive noise is considered, it is that of formula (11). In both formulas, φ₁₂(w) is the cross-power spectral density function of the speech signals and ρ is the adjustment factor proportional to the current SNR(λ); the coherence function is expressed in terms of φ₁(w) and φ₂(w), the autocorrelation functions of the two speech signals.

In the prior art, the conventional frequency-domain weighting function is 1/|φ₁₂(w)|. In practical applications this weighting function cannot withstand strong noise and reverberation, and when the speech signal energy is small its denominator approaches zero, producing a large error. In this embodiment, the weighting function of formula (10) or formula (11) is tied to the current SNR through ρ, the adjustment factor proportional to the current SNR(λ). The value of ρ is obtained from repeated experimental tests in the sound source environment and depends on the current SNR(λ): the higher SNR(λ) is, the larger ρ is. As one specific choice of values: when SNR(λ) ≤ 10 dB, 0.3 ≤ ρ ≤ 0.55; when 10 dB < SNR(λ) ≤ 25 dB, 0.55 < ρ ≤ 0.75; when SNR(λ) > 25 dB, 0.75 < ρ ≤ 0.85.
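The ρ ranges listed above translate directly into a lookup. In the sketch below, `rho_range` reproduces the three ranges from the text, while `pick_rho` takes the midpoint of the range, which is purely an assumption: the text says only that the value within each range is found experimentally.

```python
def rho_range(snr_db):
    """Return the (low, high) range of rho for a given SNR in dB, per the text."""
    if snr_db <= 10.0:
        return (0.3, 0.55)
    if snr_db <= 25.0:
        return (0.55, 0.75)
    return (0.75, 0.85)


def pick_rho(snr_db):
    """Pick one rho inside the range; the midpoint rule is a hypothetical choice."""
    low, high = rho_range(snr_db)
    return 0.5 * (low + high)
```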

For formula (10), if the current SNR is low, i.e. the energy of the speech signal is small, φ₁₂(w) is small. Taking ρ = 0.5, for example, the value of the weighting function is much smaller than that of the existing weighting function, which reduces the error to some extent. Formula (11) additionally accounts for additive noise: the denominator of the weighting function also includes the coherence function, whose magnitude is independent of the amplitude of the speech signal. This further ensures that the value of the weighting function does not fluctuate sharply and reduces the error.
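The expressions of formulas (10) and (11) are not reproduced in this text, so the following sketch rests on assumed forms chosen only to match the description: 1/|φ₁₂|^ρ for formula (10), and 1/(|φ₁₂|^ρ + |γ₁₂|²) for formula (11), with the usual magnitude-squared coherence |γ₁₂|² = |φ₁₂|²/(φ₁φ₂). These forms and all function names are assumptions, not the patent's stated formulas.

```python
def phat_weight(phi12):
    """Conventional PHAT weighting 1/|phi12|; grows without bound as |phi12| -> 0."""
    return 1.0 / abs(phi12)


def rho_weight(phi12, rho):
    """ASSUMED form of formula (10): 1 / |phi12|**rho, with 0 < rho < 1."""
    return 1.0 / (abs(phi12) ** rho)


def rho_coherence_weight(phi12, phi1, phi2, rho):
    """ASSUMED form of formula (11): 1 / (|phi12|**rho + |gamma12|**2).

    |gamma12|**2 = |phi12|**2 / (phi1 * phi2) is the standard magnitude-squared
    coherence; it is unchanged when both signals are scaled by the same gain.
    """
    gamma_sq = abs(phi12) ** 2 / (phi1 * phi2)
    return 1.0 / (abs(phi12) ** rho + gamma_sq)
```

With |φ₁₂| = 0.01 and ρ = 0.5, the conventional weight is 100 whereas the assumed formula (10) weight is 10, which matches the qualitative claim that the adjusted weight is much smaller at low signal energy.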

Step S209: apply an inverse Fourier transform to the product of the cross-power spectral density function and the weighting function to obtain the value sequence of the cross-correlation function of the two sound source signals.

Step S210: perform peak detection on the value sequence of the cross-correlation function to obtain the sample point corresponding to the maximum, and determine the time delay with which the sound source signal arrives at the two microphones from the sampling interval.

Steps S209-S210 above are a specific preferred implementation of step S104 in Embodiment one.

In step S209, the product of the cross-power spectral density function R₁₂(λ, k) of formula (7) and the weighting function of formula (10) or formula (11) is inverse Fourier transformed to obtain the cross-correlation function r₁₂(λ, n) of the two speech signals.

In step S210, peak detection is performed on the cross-correlation function r₁₂(λ, n) to find the sample point at which the discrete value is largest; multiplying that sample index by the sampling interval gives the time delay between the two sound source signals.
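Steps S209 and S210 can be sketched end to end as follows, for a single frame pair. The weighting used here, 1/|R₁₂|^ρ, is an assumed stand-in because the weighting formulas are not reproduced in this text; with ρ = 1 it reduces to conventional PHAT. The function names, the `eps` floor, and the wrap-around handling of negative lags are likewise choices made for this sketch. A direct DFT is used instead of an FFT to keep the example dependency-free.

```python
import cmath


def dft(s):
    """Direct DFT: S(k) = sum_n s(n) * exp(-j*2*pi*n*k/N)."""
    N = len(s)
    return [sum(s[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def idft(S):
    """Inverse DFT: s(n) = (1/N) * sum_k S(k) * exp(+j*2*pi*n*k/N)."""
    N = len(S)
    return [sum(S[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]


def gcc_delay(s1, s2, fs, rho=1.0, eps=1e-12):
    """Estimate the delay D (seconds) in the model x2(t) = s(t + D).

    s1, s2: equal-length frames; fs: sampling rate in Hz.
    The weighting 1/|R12|**rho is an ASSUMED form, not the patent's formula.
    """
    N = len(s1)
    S1, S2 = dft(s1), dft(s2)
    R12 = [a * b.conjugate() for a, b in zip(S1, S2)]          # formula (7)
    weighted = [r / max(abs(r) ** rho, eps) for r in R12]      # weight * R12
    r12 = [v.real for v in idft(weighted)]                     # cross-correlation
    peak = max(range(N), key=lambda n: r12[n])                 # peak detection
    lag = peak if peak <= N // 2 else peak - N                 # circular index to signed lag
    return lag / fs                                            # samples * sampling interval
```

A positive result means the second frame leads the first, consistent with the sign convention of formula (2).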

Step S211: locate the sound source from the arrangement of the microphone array and the time delay with which the sound source signal arrives at the two microphones; the flow then ends and processing moves on to the next frame.

Once the time delay value has been obtained, the position of the sound source can be determined from the geometric model of the microphone positions in the array.

Step S211 is a specific preferred implementation of step S105 in Embodiment one.

On the basis of Embodiment one, this embodiment lists specific preferred implementations of its steps and enables accurate sound source localization.

Embodiment three:

Fig. 3 shows the structure of the sound source localization device provided by the third embodiment of the invention. For ease of explanation, only the parts relevant to this embodiment are shown.

The sound source localization device provided by this embodiment comprises:

a microphone array acquisition and preprocessing unit 301, configured to collect sound source signals with a microphone array and to preprocess the signals collected by any two of the microphones;

a cross-power spectral density determining unit 302, configured to determine the cross-power spectral density function of the two preprocessed sound source signals;

a weighting function determining unit 303, configured to determine a weighting function that is adjusted as the current signal-to-noise ratio (SNR) changes;

a time delay determining unit 304, configured to determine the value sequence of the cross-correlation function of the two sound source signals from the cross-power spectral density function and the weighting function, and to determine the time delay with which the sound source signal arrives at the two microphones from the maximum of the cross-correlation function;

a sound source localization unit 305, configured to locate the sound source from the arrangement of the microphone array and the time delay with which the sound source signal arrives at the two microphones.

The functional units 301-305 of this embodiment implement steps S101-S105 of Embodiment one respectively. After the microphone array acquisition and preprocessing unit 301 collects the sound source signals and preprocesses two of them, the cross-power spectral density determining unit 302 and the weighting function determining unit 303 determine the cross-power spectral density function and the weighting function respectively. The weighting function is adjusted as the current SNR changes, so its value does not change sharply. The time delay determining unit 304 determines the time delay with which the sound source signal arrives at the two microphones from the cross-power spectral density function and the weighting function, and the sound source localization unit 305 then locates the sound source from the arrangement of the microphone array and that time delay. Because the weighting function determined by the weighting function determining unit 303 follows changes in the current SNR, the resulting time delay is more accurate, which improves the localization accuracy.

Embodiment four:

Fig. 4 shows the structure of the sound source localization device provided by the fourth embodiment of the invention. For ease of explanation, only the parts relevant to this embodiment are shown.

A microphone array acquisition and preprocessing unit 401, configured to collect sound source signals with a microphone array and to preprocess the signals collected by any two of the microphones;

a cross-power spectral density determining unit 402, configured to determine the cross-power spectral density function of the two preprocessed sound source signals;

a weighting function determining unit 403, configured to determine a weighting function that is adjusted as the current signal-to-noise ratio (SNR) changes;

a time delay determining unit 404, configured to determine the value sequence of the cross-correlation function of the two sound source signals from the cross-power spectral density function and the weighting function, and to determine the time delay with which the sound source signal arrives at the two microphones from the maximum of the cross-correlation function;

a sound source localization unit 405, configured to locate the sound source from the arrangement of the microphone array and the time delay with which the sound source signal arrives at the two microphones.

The microphone array acquisition and preprocessing unit 401 comprises:

a microphone array acquisition module 4011, configured to collect sound source signals with the microphone array;

a band-pass filtering module 4012, configured to band-pass filter the sound source signals collected by any two microphones of the microphone array to obtain two band-pass-filtered sound source signals;

a framing module 4013, configured to perform windowed framing on the two band-pass-filtered sound source signals to obtain two short-time stationary signals.

The cross-power spectral density determining unit 402 comprises:

a speech judgment module 4021, configured to judge by endpoint detection whether the preprocessed current frame of the sound source signal is a speech signal;

a current SNR determination module 4022, configured to determine the current SNR when the judgment is yes, the current SNR being SNR(λ) = a·SNR(λ-1) + (1-a)·SNR_0, where SNR(λ-1) is the SNR of the previous frame of the sound source signal, SNR_0 is the a priori SNR obtained from the energy ratio of the current speech frame to the last non-speech frame, and a is a smoothing factor;

a cross-power spectral density determination module 4023, configured to apply a fast Fourier transform to the two short-time stationary signals and then determine their cross-power spectral density function;

a signal discarding module 4024, configured to discard the two short-time stationary signals when the judgment is no and to update the SNR as SNR(λ) = SNR(λ-1), where SNR(λ-1) is the SNR of the previous frame of the sound source signal.

The weighting function determining unit 403 comprises:

a weighting function determination module 4031, configured to determine the weighting function from the current SNR as one of two alternative expressions, in which φ₁₂(w) is the cross-power spectral density function of the sound source signals, ρ is an adjustment factor proportional to the current SNR(λ-1), and the coherence function is expressed in terms of φ₁(w) and φ₂(w), the autocorrelation functions of the two sound source signals.

The time delay determining unit 404 comprises:

a cross-correlation function acquisition module 4041, configured to apply an inverse Fourier transform to the product of the cross-power spectral density function and the weighting function to obtain the value sequence of the cross-correlation function of the two sound source signals;

a time delay determination module 4042, configured to perform peak detection on the value sequence of the cross-correlation function to obtain the sample point corresponding to the maximum, and to determine the time delay with which the sound source signal arrives at the two microphones from the sampling interval.

The embodiment of the invention is on the basis of example three, provided the wherein concrete preferred structure of functional unit, corresponding each step that realizes among the embodiment two, concrete, after microphone array acquisition module 4011 collects sound-source signal, again by 4013 pairs of bandpass filtering modules block 4012 and minute frame processing modules wherein arbitrarily the two-way sound-source signal carry out pre-service, when phonetic decision module 4021 detects current sound-source signal and is voice signal, current signal to noise ratio (S/N ratio) determination module 4022 is determined current signal to noise ratio (S/N ratio), and by 4023 pairs of described two-way of cross-spectral density determination module in short-term stationary signal carry out Fourier transform, determine again the in short-term cross-spectral density function of stationary signal of described two-way, otherwise give up module 4024 by signal and give up in short-term stationary signal of described two-way, and the renewal signal to noise ratio (S/N ratio), can save unnecessary calculation procedure like this.Current signal to noise ratio (S/N ratio) is determined weighting function by weighting function determination module 4031 according to described current signal to noise ratio (S/N ratio) after determining again, as a kind of implementation, and described weighting function

Perhaps φ wherein ₁₂(w) be the cross-spectral density function of sound-source signal, ρ is the regulatory factor proportional with current signal to noise ratio snr (λ-1), Be coherence function, wherein φ ₁(w) and φ ₂(w) be the autocorrelation function of described two-way sound-source signal, wherein the value of ρ is to draw by the many experiments test at the sound source environment, value can be with reference to embodiment two, and this value relies on current signal to noise ratio snr (λ), different SNR (λ), ρ gets different values, SNR (λ) is higher, and the value of ρ is just larger, as SNR (λ) when diminishing, the ρ value is followed and is diminished, so the functional value of the weighting function in the embodiment of the invention acute variation can not occur.Cross correlation function acquisition module 4041 obtains the product of described cross-spectral density function and weighting function the cross correlation function of described two-way sound-source signal through inverse Fourier transform, the value sequence of 4042 pairs of described cross correlation functions of time delay determination module carries out peak value and detects, obtain sample point corresponding to maximum of points, and described sample point be multiply by the time in sampling interval, can obtain the time delay that described sound-source signal arrives described two microphones, after time delay was determined, auditory localization unit 405 can accurately navigate to the position of sound source accordingly with the arranged distribution of microphone array.

This embodiment implements each step of Embodiment Two and provides a specific way of determining the cross-power spectral density function and the weighting function, so that the sound source can be located accurately.

In summary, compared with existing sound source localization techniques, the sound source localization method and device provided by the embodiments of the invention can improve localization accuracy.

A person of ordinary skill in the art will appreciate that all or part of the steps of the method embodiments above may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disc.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A sound source localization method, characterized in that the method comprises:

2. The method of claim 1, characterized in that the step of collecting sound source signals with the microphone array and preprocessing the sound source signals collected by any two of the microphones specifically comprises:

collecting sound source signals with the microphone array;

band-pass filtering the sound source signals collected by any two microphones in the microphone array to obtain two band-pass-filtered sound source signals;

windowing and framing the two band-pass-filtered sound source signals to obtain two short-time stationary signals.
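The windowing and framing step can be illustrated with a short sketch. It is only an example under assumptions: the text does not name the window type, so a Hamming window is assumed, the band-pass filtering stage is omitted, and `frame_signal`, `frame_len` and `hop` are hypothetical names.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a sequence into overlapping frames, each tapered by a window.

    ASSUMPTION: a Hamming window is used; the source does not specify the window.
    frame_len must be at least 2.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    # Only complete frames are kept; trailing samples that do not fill a frame are dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

Each returned frame is short enough to be treated as a short-time stationary signal; both microphone channels would be framed with the same `frame_len` and `hop` so that their frames stay aligned.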

3. The method of claim 2, characterized in that the step of determining the cross-power spectral density function of the two preprocessed sound source signals specifically comprises:

judging by endpoint detection whether the two short-time stationary signals are speech signals;

when the judgment is yes, determining the current signal-to-noise ratio, the current signal-to-noise ratio being SNR(λ) = a·SNR(λ-1) + (1-a)·SNR_0, where SNR(λ-1) is the signal-to-noise ratio of the previous frame of the sound source signal, SNR_0 is the a priori signal-to-noise ratio obtained from the energy ratio of the current speech frame to the most recent non-speech frame, and a is a smoothing factor;

applying a fast Fourier transform to the two short-time stationary signals, and then determining the cross-power spectral density function of the two short-time stationary signals;

when the judgment is no, discarding the two short-time stationary signals and updating the signal-to-noise ratio as SNR(λ) = SNR(λ-1), where SNR(λ-1) is the signal-to-noise ratio of the previous frame of the sound source signal.
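The recursive signal-to-noise ratio update in this claim can be written out as a small sketch. It assumes that SNR_0 is a linear energy ratio (the text does not say whether decibels are used) and that `a` lies between 0 and 1; the function and parameter names are hypothetical.

```python
def update_snr(prev_snr, is_speech, frame_energy=None, noise_energy=None, a=0.9):
    """Return SNR(lambda) given SNR(lambda-1) and the speech decision for the frame.

    ASSUMPTION: SNR_0 is the linear ratio of the current speech frame energy to
    the energy of the most recent non-speech frame.
    """
    if not is_speech:
        # Non-speech frame: the frame is discarded and the SNR is carried over.
        return prev_snr
    snr0 = frame_energy / noise_energy
    # Smoothed update: SNR(lambda) = a * SNR(lambda-1) + (1 - a) * SNR_0
    return a * prev_snr + (1 - a) * snr0
```

With `a = 0.5`, a previous SNR of 10, a speech frame energy of 40 and a noise frame energy of 2, SNR_0 is 20 and the updated value is 15. A larger `a` makes the estimate follow the previous frame more closely.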

4. The method of claim 3, characterized in that the current sound source signal is judged to be a speech signal when both its short-time energy and its short-time zero-crossing rate are greater than their corresponding thresholds.
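The two-threshold speech decision can be sketched as follows. The definitions used here (sum of squared samples for short-time energy, fraction of adjacent sample pairs that change sign for zero-crossing rate) are common conventions assumed for illustration; the thresholds and all names are hypothetical.

```python
def short_time_energy(frame):
    """Sum of squared samples in the frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_speech(frame, energy_thresh, zcr_thresh):
    """Speech is declared only when BOTH measures exceed their thresholds."""
    return (short_time_energy(frame) > energy_thresh
            and zero_crossing_rate(frame) > zcr_thresh)
```

A frame that is loud but never changes sign, or one that changes sign often but carries little energy, is rejected, which matches the requirement that both conditions hold.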

5. The method of claim 4, characterized in that the step of determining a weighting function that is adjusted as the current signal-to-noise ratio changes specifically comprises:

determining, according to the current signal-to-noise ratio, the weighting function as

[formula image missing]

or

[formula image missing]

where φ₁₂(w) is the cross-power spectral density function of the sound source signals, ρ is an adjustment factor proportional to the current signal-to-noise ratio SNR(λ),

[formula image missing]

is the coherence function, and φ₁(w) and φ₂(w) are the autocorrelation functions of the two sound source signals.

6. The method of claim 5, characterized in that the step of determining the sequence of values of the cross-correlation function of the two sound source signals according to the cross-power spectral density function and the weighting function, and determining, according to the maximum of the cross-correlation function, the time delay with which the sound source signal reaches the two microphones, specifically comprises:

applying an inverse Fourier transform to the product of the cross-power spectral density function and the weighting function to obtain the sequence of values of the cross-correlation function of the two sound source signals;

performing peak detection on the sequence of values of the cross-correlation function to obtain the sample point corresponding to the maximum, and determining, from that sample point and the sampling interval, the time delay with which the sound source signal reaches the two microphones.

7. A sound source localization device, characterized in that the device comprises:

8. The device of claim 7, characterized in that the microphone array acquisition and preprocessing unit comprises:

a microphone array acquisition module, configured to collect sound source signals with the microphone array;

a band-pass filtering module, configured to band-pass filter the sound source signals collected by any two microphones in the microphone array to obtain two band-pass-filtered sound source signals;

a framing module, configured to window and frame the two band-pass-filtered sound source signals to obtain two short-time stationary signals.

9. The device of claim 8, characterized in that the cross-power spectral density determining unit comprises:

a speech decision module, configured to judge by endpoint detection whether the preprocessed current frame of the sound source signal is a speech signal;

a current signal-to-noise ratio determination module, configured to determine the current signal-to-noise ratio when the judgment is yes, the current signal-to-noise ratio being SNR(λ) = a·SNR(λ-1) + (1-a)·SNR_0, where SNR(λ-1) is the signal-to-noise ratio of the previous frame of the sound source signal, SNR_0 is the a priori signal-to-noise ratio obtained from the energy ratio of the current speech frame to the most recent non-speech frame, and a is a smoothing factor;

a cross-power spectral density determination module, configured to apply a fast Fourier transform to the two short-time stationary signals and then determine the cross-power spectral density function of the two short-time stationary signals;

a signal discarding module, configured to discard the two short-time stationary signals when the judgment is no and to update the signal-to-noise ratio as SNR(λ) = SNR(λ-1), where SNR(λ-1) is the signal-to-noise ratio of the previous frame of the sound source signal.

10. The device of claim 9, characterized in that the weighting function determining unit comprises:

a weighting function determination module, configured to determine, according to the current signal-to-noise ratio, the weighting function as

[formula image missing]

or

[formula image missing]

where φ₁₂(w) is the cross-power spectral density function of the sound source signals, and ρ is an adjustment factor proportional to the signal-to-noise ratio SNR(λ-1).

11. The device of claim 10, characterized in that the time delay determining unit comprises:

a cross-correlation function acquisition module, configured to apply an inverse Fourier transform to the product of the cross-power spectral density function and the weighting function to obtain the sequence of values of the cross-correlation function of the two sound source signals;

a time delay determination module, configured to perform peak detection on the cross-correlation function to obtain the sample point corresponding to the maximum, and to determine, from that sample point and the sampling interval, the time delay with which the sound source signal reaches the two microphones.