CN109901114B

CN109901114B - Time delay estimation method suitable for sound source positioning

Info

Publication number: CN109901114B
Application number: CN201910242080.4A
Authority: CN
Inventors: 张承云; 梁龙腾
Original assignee: Guangzhou University
Current assignee: Guangzhou University
Priority date: 2019-03-28
Filing date: 2019-03-28
Publication date: 2020-10-27
Anticipated expiration: 2039-03-28
Also published as: CN109901114A

Abstract

The invention discloses a time delay estimation method suitable for sound source positioning, which comprises the steps of carrying out signal processing on voice signals obtained by two microphones to obtain a complex cepstrum of a minimum phase component; calculating a signal minimum phase component frequency spectrum and an all-pass component frequency spectrum according to the complex cepstrum of the minimum phase component; calculating an improved all-pass component frequency spectrum by utilizing the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, calculating an improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating a cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function; and solving the cross-correlation function of the cross-power spectrum by using an inverse fast Fourier transform method, and calculating to obtain the delay time according to the cross-correlation function. The time delay estimation method provided by the invention can effectively reduce the influence caused by noise and reverberation in the environment of reverberation and noise, thereby improving the adaptability to the noise and the accuracy of time delay estimation.

Description

Time delay estimation method suitable for sound source positioning

Technical Field

The invention relates to the technical field of sound source positioning, in particular to a time delay estimation method suitable for sound source positioning.

Background

In recent years, sound source positioning technology based on microphone arrays has been widely applied in many scenarios, and the time delay and position information it determines provides important input for voice algorithms such as beam forming, voice enhancement, voice recognition and blind signal separation. Sound source positioning based on time delay estimation has the advantages of low computational cost and a small number of required microphones, and is widely used in real-time processing environments. Positioning proceeds in two steps: the first step estimates the time difference between the sound waves propagating from the sound source to the two microphones, and the second step estimates the position of the sound source from that time difference. The accuracy of the time difference estimated in the first step therefore determines the accuracy of the positioning in the second step.

In the prior art, delay estimation methods based on the cross-power spectrum have been widely studied because of their low computational cost, high positioning accuracy and ability to estimate the delay under low reverberation (0 ms to 300 ms). Their estimation performance degrades in highly reverberant environments, however, which reduces the accuracy of the delay estimate. To address this, some researchers have proposed a time delay estimation method based on cepstrum pre-filtering; this method reduces the influence of reverberation on the delay estimate well, but it is susceptible to noise.

Disclosure of Invention

The invention provides a time delay estimation method suitable for sound source positioning, which aims to solve the technical problem that the influence caused by noise and reverberation is difficult to reduce in the prior art.

In order to solve the above technical problem, an embodiment of the present invention provides a time delay estimation method suitable for sound source localization, including:

performing signal processing on voice signals obtained through the two microphones to obtain a complex cepstrum of a minimum phase component;

calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component;

calculating an improved all-pass component frequency spectrum by utilizing the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, calculating an improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating a cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function;

and solving the cross-correlation function of the cross-power spectrum by using an inverse fast Fourier transform method, and calculating to obtain the delay time according to the cross-correlation function.

As a preferred scheme, the calculating an improved all-pass component spectrum by using a modulus of the spectrum of the minimum phase component and the spectrum of the all-pass component signal, and calculating an improved phase weighting function according to the improved all-pass component spectrum, and calculating a cross-power spectrum by combining the improved all-pass component spectrum and the improved phase weighting function specifically includes:

multiplying the frequency spectrum of the all-pass component signal by the modulus of the frequency spectrum of the minimum phase component to obtain the improved all-pass component frequency spectrum;

and calculating the improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating the cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function.

As a preferred scheme, the signal processing is performed on the speech signals obtained by the two microphones to obtain the complex cepstrum of the minimum phase component, specifically:

respectively obtaining a first channel voice signal and a second channel voice signal through two microphones;

performing signal processing on the first channel voice signal and the second channel voice signal to obtain a first channel complex cepstrum and a second channel complex cepstrum;

and homomorphic filtering processing is carried out on the first channel complex cepstrum and the second channel complex cepstrum to obtain a complex cepstrum of the first channel minimum phase component and a complex cepstrum of the second channel minimum phase component.

As a preferred scheme, the signal processing the first channel voice signal and the second channel voice signal specifically includes:

let the first channel speech signal be x₁(t) and the second channel speech signal be x₂(t);

performing voice endpoint detection on the filtered and framed signals x₁(t) and x₂(t), and selecting the speech frames with the same frame index to obtain the corresponding y₁(t) and y₂(t);

performing a discrete Fourier transform on y₁(t) and y₂(t) respectively to obtain the corresponding Y₁(ω) and Y₂(ω);

obtaining the first channel complex cepstrum k₁(n) and the second channel complex cepstrum k₂(n) from Y₁(ω) and Y₂(ω), wherein k₁(n)＝IFFT(ln(|Y₁(ω)|)), k₂(n)＝IFFT(ln(|Y₂(ω)|)), and IFFT is the inverse fast Fourier transform;

performing homomorphic filtering on the first channel complex cepstrum k₁(n) and the second channel complex cepstrum k₂(n) to obtain, respectively, the complex cepstrum k_1min(n) of the first channel minimum phase component and the complex cepstrum k_2min(n) of the second channel minimum phase component;

wherein k_1min(n)＝u*k₁(n), k_2min(n)＝u*k₂(n),

N is the number of points of the Fourier transform.
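As a rough illustration of the cepstrum and homomorphic filtering steps above, the following Python sketch computes k(n) = IFFT(ln|Y(ω)|) for one frame and multiplies it by a window u(n). The definition of u(n) is not reproduced in this text, so the sketch assumes the standard minimum-phase folding window (1 at n = 0 and n = N/2, 2 for 0 < n < N/2, 0 elsewhere, N even). The helper names `dft`, `cepstrum`, `folding_window` and `min_phase_cepstrum`, the `eps` guard and the sample frame are illustrative assumptions, and the naive O(N²) DFT only stands in for an FFT routine such as `numpy.fft.fft` so that the example needs nothing beyond the standard library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT / inverse DFT, a stand-in for an FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def cepstrum(frame, eps=1e-12):
    """k(n) = IFFT(ln|Y(w)|); eps is an assumed guard against log(0)."""
    log_mag = [math.log(abs(v) + eps) for v in dft(frame)]
    return [v.real for v in dft(log_mag, inverse=True)]


def folding_window(n):
    """Assumed u(n): 1 at n=0 and n=N/2, 2 for 0<n<N/2, 0 elsewhere."""
    half = n // 2
    return [1.0 if i in (0, half) else 2.0 if i < half else 0.0 for i in range(n)]


def min_phase_cepstrum(frame):
    """k_min(n) = u(n) * k(n), applied point by point."""
    k = cepstrum(frame)
    return [ui * ki for ui, ki in zip(folding_window(len(k)), k)]


frame = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]  # hypothetical speech frame
k_min = min_phase_cepstrum(frame)
```

With this assumed window, the upper half of the cepstrum is zeroed and the lower half is doubled, which is what makes the reconstructed component minimum phase.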

As a preferred scheme, the calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component specifically includes:

calculating the frequency spectrum Y_1min(ω) of the first channel minimum phase component and the frequency spectrum Y_2min(ω) of the second channel minimum phase component from the complex cepstrum k_1min(n) of the first channel minimum phase component and the complex cepstrum k_2min(n) of the second channel minimum phase component, wherein,

FFT is the fast Fourier transform;

calculating the frequency spectrum Y_1all(ω) of the first channel all-pass component signal and the frequency spectrum Y_2all(ω) of the second channel all-pass component signal from the frequency spectrum Y_1min(ω) of the first channel minimum phase component and the frequency spectrum Y_2min(ω) of the second channel minimum phase component, wherein Y_1all(ω)＝Y₁(ω)/Y_1min(ω), Y_2all(ω)＝Y₂(ω)/Y_2min(ω).
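The decomposition into a minimum phase spectrum and an all-pass spectrum can be sketched as follows. The formula linking Y_min(ω) to k_min(n) is not reproduced in this text; the sketch assumes the usual relation Y_min(ω) = exp(FFT(k_min(n))) together with the standard folding window for u(n). The function `decompose`, the `eps` guard and the sample frame are hypothetical names and values chosen for illustration, and the naive DFT again stands in for an FFT.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT / inverse DFT, a stand-in for an FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def decompose(frame, eps=1e-12):
    """Return (Y, Y_min, Y_all) for one frame under the stated assumptions."""
    n, half = len(frame), len(frame) // 2
    spectrum = dft(frame)
    # k(n) = IFFT(ln|Y(w)|)
    k = [v.real for v in dft([math.log(abs(v) + eps) for v in spectrum], inverse=True)]
    # Assumed folding window u(n)
    u = [1.0 if i in (0, half) else 2.0 if i < half else 0.0 for i in range(n)]
    k_min = [ui * ki for ui, ki in zip(u, k)]
    # Assumed relation: Y_min(w) = exp(FFT(k_min(n)))
    y_min = [cmath.exp(v) for v in dft(k_min)]
    # Y_all(w) = Y(w) / Y_min(w), as stated in the text
    y_all = [y / m for y, m in zip(spectrum, y_min)]
    return spectrum, y_min, y_all


# Hypothetical frame: a short decaying pulse delayed by two samples
spectrum, y_min, y_all = decompose([0.0, 0.0, 1.0, 0.5, 0.25, 0.0, 0.0, 0.0])
```

Under these assumptions the minimum phase spectrum carries the magnitude of the original spectrum, so the all-pass spectrum has unit modulus and keeps only phase (delay) information.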

obtaining the modulus |Y_1min(ω)| of the frequency spectrum of the first channel minimum phase component from Y_1min(ω), and obtaining the modulus |Y_2min(ω)| of the frequency spectrum of the second channel minimum phase component from Y_2min(ω);

multiplying the modulus |Y_1min(ω)| of the frequency spectrum of the first channel minimum phase component by the frequency spectrum Y_1all(ω) of the first channel all-pass component signal to obtain the frequency spectrum Y_1-nall(ω) of the first channel improved all-pass component signal, wherein Y_1-nall(ω)＝Y_1all(ω)*|Y_1min(ω)|;

multiplying the modulus |Y_2min(ω)| of the frequency spectrum of the second channel minimum phase component by the frequency spectrum Y_2all(ω) of the second channel all-pass component signal to obtain the frequency spectrum Y_2-nall(ω) of the second channel improved all-pass component signal, wherein Y_2-nall(ω)＝Y_2all(ω)*|Y_2min(ω)|;
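The two products above can be read as restoring a magnitude to the all-pass spectrum while leaving its phase untouched. A short derivation, using only the definition Y_all(ω) = Y(ω)/Y_min(ω) given earlier and assuming |Y_min(ω)| > 0 (the channel index i and the polar-form notation are introduced here for illustration):

```latex
\begin{aligned}
Y_{i\text{-}nall}(\omega)
  &= Y_{i\,all}(\omega)\,\lvert Y_{i\,min}(\omega)\rvert
   = \frac{Y_i(\omega)}{Y_{i\,min}(\omega)}\,\lvert Y_{i\,min}(\omega)\rvert
   = Y_i(\omega)\,e^{-j\arg Y_{i\,min}(\omega)},\\
\lvert Y_{i\text{-}nall}(\omega)\rvert &= \lvert Y_i(\omega)\rvert,
\qquad
\arg Y_{i\text{-}nall}(\omega) = \arg Y_{i\,all}(\omega),
\qquad i = 1, 2 .
\end{aligned}
```

In other words, the improved all-pass spectrum is the original spectrum with the minimum phase removed, so its magnitude still reflects how strong the signal is in each frequency bin.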

calculating the improved phase weighting function using the frequency spectrum Y_1-nall(ω) of the first channel improved all-pass component signal and the frequency spectrum Y_2-nall(ω) of the second channel improved all-pass component signal,

wherein

α＝0.75;

multiplying the frequency spectrum Y_1-nall(ω) of the first channel improved all-pass component signal and the frequency spectrum Y_2-nall(ω) of the second channel improved all-pass component signal by the improved phase weighting function

to calculate the cross-power spectrum G₁₂(ω), wherein,
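The formulas for the improved phase weighting function and for G₁₂(ω) are not reproduced in this text. Purely as an assumption, the sketch below uses a common exponent-weighted form, in which the weighting function is 1/|Y_1-nall(ω)·Y_2-nall'(ω)|^α with α = 0.75 and the cross-power spectrum is that weight times Y_1-nall(ω)·Y_2-nall'(ω), where ' denotes conjugation. The function name `cross_power_spectrum`, the `eps` guard and the two-bin input are hypothetical; the patent's actual weighting may differ.

```python
import cmath


def cross_power_spectrum(y1_nall, y2_nall, alpha=0.75, eps=1e-12):
    """Assumed form: G12 = Y1 * conj(Y2) / |Y1 * conj(Y2)|**alpha."""
    g12 = []
    for a, b in zip(y1_nall, y2_nall):
        cross = a * b.conjugate()                   # Y_1-nall(w) * Y_2-nall'(w)
        weight = 1.0 / (abs(cross) ** alpha + eps)  # assumed weighting function
        g12.append(weight * cross)
    return g12


# Two hypothetical frequency bins per channel
g12 = cross_power_spectrum([2 + 0j, 1j], [2j, 1 + 0j])
phase0 = cmath.phase(g12[0])  # the phase of the cross term is left unchanged
```

With α = 1 this form would discard the magnitude entirely; with α = 0.75 a bin of cross-magnitude 4 keeps a magnitude of 4^0.25, so stronger bins still count for more.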

As a preferred scheme, the cross-correlation function of the cross-power spectrum is solved by an inverse fast Fourier transform method, and the delay time is calculated according to the cross-correlation function, specifically:

solving the cross-correlation function R₁₂(τ) of the cross-power spectrum G₁₂(ω) by an inverse fast Fourier transform method, and obtaining the delay time after sampling τ_max, wherein R₁₂(τ)＝IFFT(G₁₂(ω)), τ_max＝argmax_τ R₁₂(τ), and IFFT is the inverse fast Fourier transform;

solving for the delay time before sampling delay₁₂ from the delay time after sampling τ_max, wherein delay₁₂＝τ_max*f_s and f_s is the sampling frequency.
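A minimal sketch of the peak search follows. It takes a cross-power spectrum, computes R₁₂(τ) by an inverse transform, and returns the signed lag at the peak. Two points are assumptions of the sketch and not statements of the patent: indices above N/2 are interpreted as negative lags, and the lag is converted to seconds by dividing by f_s, which is the dimensionally consistent reading when τ_max is counted in samples and f_s is in hertz. The function `estimate_delay`, the impulse test signals and the 16 kHz rate are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT / inverse DFT, a stand-in for an FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def estimate_delay(g12, fs):
    """Return (lag in samples, lag in seconds) from a cross-power spectrum."""
    r12 = [v.real for v in dft(g12, inverse=True)]  # R12(tau) = IFFT(G12(w))
    n = len(r12)
    peak = max(range(n), key=lambda i: r12[i])      # tau_max = argmax R12(tau)
    lag = peak - n if peak > n // 2 else peak       # upper half = negative lags
    return lag, lag / fs                            # assumed: samples / f_s


n = 16
x1 = [1.0] + [0.0] * (n - 1)               # impulse at sample 0
x2 = [0.0] * 3 + [1.0] + [0.0] * (n - 4)   # same impulse, 3 samples later
g12 = [a * b.conjugate() for a, b in zip(dft(x1), dft(x2))]
lag, seconds = estimate_delay(g12, fs=16000.0)
```

With G₁₂ formed as Y₁ times the conjugate of Y₂, a negative lag means that channel 2 is a delayed copy of channel 1.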

Compared with the prior art, the embodiment of the invention has the beneficial effects that the embodiment of the invention provides a time delay estimation method suitable for sound source positioning, which comprises the following steps: performing signal processing on voice signals obtained through the two microphones to obtain a complex cepstrum of a minimum phase component; calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component; calculating an improved all-pass component frequency spectrum by utilizing the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, calculating an improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating a cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function; and solving the cross-correlation function of the cross-power spectrum by using an inverse fast Fourier transform method, and calculating to obtain the delay time according to the cross-correlation function. 
On the basis of obtaining the all-pass component signal, an improved all-pass component frequency spectrum is calculated from the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, an improved phase weighting function is calculated from the improved all-pass component frequency spectrum, and a cross-power spectrum is calculated by combining the improved all-pass component frequency spectrum and the improved phase weighting function. In this way the phase of the all-pass component signal remains protected from the influence of reverberation while the contribution of the signal's spectral amplitude is increased, which improves the estimation performance of the time delay estimation method under noise and, in turn, its adaptability to noise and the accuracy of the time delay estimate. The cross-correlation function of the cross-power spectrum is solved by an inverse fast Fourier transform method, and the delay time is calculated from the cross-correlation function. The method can therefore reduce the influence of noise and reverberation in reverberant and noisy environments: its adaptability to noise is improved while the reverberation resistance of the all-pass component signal is retained, so the delay peak is detected more accurately and the performance of the time delay estimation is improved.

Drawings

Fig. 1 is a schematic flowchart of a time delay estimation method suitable for sound source localization according to an embodiment of the present invention;

Fig. 2 is a flowchart of a delay estimation method suitable for sound source localization according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, a preferred embodiment of the present invention provides a time delay estimation method suitable for sound source localization, including:

S1, performing signal processing on the voice signals obtained by the two microphones to obtain a complex cepstrum of the minimum phase component;

In this embodiment, in order to realize sound source localization, voice signals are received by two microphones respectively so that the time difference between the sound waves propagating from the sound source to the two microphones can be estimated, and the sound source position is then estimated from that time difference.

S2, calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component;

S3, calculating an improved all-pass component frequency spectrum by utilizing the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, calculating an improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating a cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function;

In this embodiment, the cepstral pre-filtering (CEP) technique is used to obtain the complex cepstrum of the minimum phase component, from which the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal are calculated. On the basis of the obtained all-pass component signal, an improved all-pass component frequency spectrum is calculated from the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, an improved phase weighting function is calculated from the improved all-pass component frequency spectrum, and a cross-power spectrum is calculated by combining the improved all-pass component frequency spectrum and the improved phase weighting function. This keeps the phase of the all-pass component signal protected from the influence of reverberation while increasing the spectral amplitude of the signal, so the estimation performance of the time delay estimation method under noise is improved, and with it the adaptability to noise and the accuracy of the time delay estimate.

And S4, solving the cross-correlation function of the cross-power spectrum by an inverse fast Fourier transform method, and calculating the delay time according to the cross-correlation function.

In the embodiment of the invention, the cross-correlation function of the cross-power spectrum is solved by an inverse fast Fourier transform method, and the delay time is calculated according to the cross-correlation function, so that the time delay estimation method can effectively reduce the influence caused by noise and reverberation in the reverberation and noise environment, improve the adaptability of the time delay estimation method to the noise, and simultaneously reserve the anti-reverberation capability of an all-pass component signal, thereby enabling the detection of a time delay peak value to be more accurate, and further improving the performance of time delay estimation.

In this embodiment of the present invention, preferably, the calculating an improved all-pass component spectrum by using a modulus of the spectrum of the minimum phase component and the spectrum of the all-pass component signal, and calculating an improved phase weighting function according to the improved all-pass component spectrum, and calculating a cross-power spectrum by combining the improved all-pass component spectrum and the improved phase weighting function specifically includes:

The improved phase weighting function is calculated from the improved all-pass component frequency spectrum, and the cross-power spectrum is calculated by combining the improved all-pass component frequency spectrum and the improved phase weighting function. Multiplying the all-pass component signal by the amplitude of the minimum phase component signal improves the estimation performance of the delay estimation algorithm under noise, and allows the improved phase weighting method to be used for delay estimation in combination with the generalized cross-correlation (GCC) delay estimation technique.

In the embodiment of the present invention, performing signal processing on the speech signals obtained by the two microphones to obtain the complex cepstrum of the minimum phase component specifically includes:

In this embodiment, to realize sound source localization, it is necessary to obtain a first channel speech signal and a second channel speech signal through two microphones respectively, to estimate a time difference between sound waves propagating from a sound source to the two microphones, and then estimate a sound source position according to the time difference.

In this embodiment of the present invention, the performing signal processing on the first channel voice signal and the second channel voice signal specifically includes:

wherein k_1min(n)＝u*k₁(n), k_2min(n)＝u*k₂(n),

N is the number of points of the Fourier transform.

In this embodiment of the present invention, the calculating a frequency spectrum of a minimum phase component of a signal and a frequency spectrum of an all-pass component signal according to the complex cepstrum of the minimum phase component specifically includes:

FFT is fast Fourier transform;

In this embodiment of the present invention, calculating an improved all-pass component spectrum by using the modulus of the spectrum of the minimum phase component and the spectrum of the all-pass component signal, calculating an improved phase weighting function according to the improved all-pass component spectrum, and calculating a cross-power spectrum by combining the improved all-pass component spectrum and the improved phase weighting function specifically includes:

Wherein

α＝0.75；

In this embodiment, the cepstral pre-filtering (CEP) technique is used to obtain the complex cepstrum of the minimum phase component, from which the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal are calculated. On the basis of the obtained all-pass component signal, an improved all-pass component spectrum is calculated from the modulus of the spectrum of the minimum phase component and the spectrum of the all-pass component signal, an improved phase weighting function is calculated from the improved all-pass component spectrum, and a cross-power spectrum is calculated by multiplying the improved all-pass component spectrum by the improved phase weighting function. This keeps the phase of the all-pass component signal protected from the influence of reverberation while increasing the spectral amplitude of the signal, so the estimation performance of the time delay estimation method under noise is improved, and with it the adaptability to noise and the accuracy of the time delay estimate.

In this embodiment of the present invention, the cross-correlation function of the cross-power spectrum is solved by an inverse fast fourier transform method, and the delay time is calculated according to the cross-correlation function, specifically:

solving for the delay time before sampling delay₁₂ from the delay time after sampling τ_max, wherein delay₁₂＝τ_max*f_s and f_s is the sampling frequency.

Referring to fig. 2, a possible specific embodiment of a method for estimating a time delay for sound source localization according to the present invention includes the following steps:

1. Two microphones, microphone mir₁ and microphone mir₂, respectively receive the voice signal; the received voice signals are the first channel voice signal and the second channel voice signal, denoted x₁(t) and x₂(t) respectively;

2. Filter and frame the first channel voice signal x₁(t) and the second channel voice signal x₂(t);

3. Perform voice endpoint detection on the filtered and framed signals x₁(t) and x₂(t) respectively, and select the speech frames with the same frame index to obtain the corresponding y₁(t) and y₂(t);

4. Perform a discrete Fourier transform on y₁(t) and y₂(t) respectively to obtain Y₁(ω) and Y₂(ω);

5. Compute the first channel complex cepstrum k₁(n) and the second channel complex cepstrum k₂(n) corresponding to Y₁(ω) and Y₂(ω) respectively; the calculation process is as follows:

k₁(n)＝IFFT(ln(|Y₁(ω)|)), k₂(n)＝IFFT(ln(|Y₂(ω)|)), wherein IFFT is the inverse fast Fourier transform;

6. Perform homomorphic filtering on the first channel complex cepstrum k₁(n) and the second channel complex cepstrum k₂(n) respectively to obtain the corresponding complex cepstrum k_1min(n) of the first channel minimum phase component and complex cepstrum k_2min(n) of the second channel minimum phase component; the calculation process is as follows:

k_1min(n)＝u*k₁(n), k_2min(n)＝u*k₂(n),

N is the number of points of the Fourier transform;

7. Compute the frequency spectra of the minimum phase components of the two channels, namely the frequency spectrum Y_1min(ω) of the first channel minimum phase component and the frequency spectrum Y_2min(ω) of the second channel minimum phase component; the calculation process is as follows:

wherein FFT is the fast Fourier transform;

8. Compute the frequency spectra of the all-pass component signals of the two channels, namely the frequency spectrum Y_1all(ω) of the first channel all-pass component signal and the frequency spectrum Y_2all(ω) of the second channel all-pass component signal; the calculation process is as follows:

Y_1all(ω)＝Y₁(ω)/Y_1min(ω), Y_2all(ω)＝Y₂(ω)/Y_2min(ω);

9. Multiply the modulus of the spectrum of the minimum phase component by the spectrum of the all-pass component signal: multiply the modulus |Y_1min(ω)| of the frequency spectrum of the first channel minimum phase component by the frequency spectrum Y_1all(ω) of the first channel all-pass component signal to obtain the frequency spectrum Y_1-nall(ω) of the first channel improved all-pass component signal, and multiply the modulus |Y_2min(ω)| of the frequency spectrum of the second channel minimum phase component by the frequency spectrum Y_2all(ω) of the second channel all-pass component signal to obtain the frequency spectrum Y_2-nall(ω) of the second channel improved all-pass component signal; the calculation process is as follows:

Y_1-nall(ω)＝Y_1all(ω)*|Y_1min(ω)|, Y_2-nall(ω)＝Y_2all(ω)*|Y_2min(ω)|;

10. Calculate the improved phase weighting function using the frequency spectrum Y_1-nall(ω) of the first channel improved all-pass component signal and the frequency spectrum Y_2-nall(ω) of the second channel improved all-pass component signal.

The calculation process is as follows:

11. Multiply the frequency spectrum Y_1-nall(ω) of the first channel improved all-pass component signal and the frequency spectrum Y_2-nall(ω) of the second channel improved all-pass component signal by the improved phase weighting function

to calculate the cross-power spectrum G₁₂(ω); the calculation process is as follows:

wherein "'" represents conjugation;

12. Solve the cross-correlation function R₁₂(τ) of the cross-power spectrum G₁₂(ω) by an inverse fast Fourier transform method and locate its peak to obtain the delay time after sampling τ_max; the calculation process is as follows:

R₁₂(τ)＝IFFT(G₁₂(ω)), τ_max＝argmax_τ R₁₂(τ), wherein IFFT is the inverse fast Fourier transform;

13. Solve for the delay time before sampling delay₁₂ from the delay time after sampling τ_max; the calculation process is as follows:

delay₁₂＝τ_max*f_s, wherein f_s is the sampling frequency;
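Steps 4 to 12 can be strung together for a single pair of frames as in the sketch below. Steps 1 to 3 (acquisition, filtering, framing and endpoint detection) are omitted, and every formula that is not reproduced in this text is filled with an explicitly assumed stand-in: the standard folding window for u(n), Y_min(ω) = exp(FFT(k_min(n))), and a weighting of the form 1/|Y_1-nall(ω)·Y_2-nall'(ω)|^α. The function names, the `eps` guard, the seeded random frame and the circular 5-sample shift are all hypothetical, and the naive DFT stands in for an FFT.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) DFT / inverse DFT, a stand-in for an FFT routine."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * math.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out


def improved_allpass(frame, eps=1e-12):
    """Steps 4-9 for one channel: returns Y_nall = Y_all * |Y_min|."""
    n, half = len(frame), len(frame) // 2
    y = dft(frame)                                                  # step 4
    log_mag = [math.log(abs(v) + eps) for v in y]
    k = [v.real for v in dft(log_mag, inverse=True)]                # step 5
    u = [1.0 if i in (0, half) else 2.0 if i < half else 0.0
         for i in range(n)]                                         # assumed u(n)
    k_min = [a * b for a, b in zip(u, k)]                           # step 6
    y_min = [cmath.exp(v) for v in dft(k_min)]                      # step 7 (assumed)
    y_all = [a / b for a, b in zip(y, y_min)]                       # step 8
    return [a * abs(b) for a, b in zip(y_all, y_min)]               # step 9


def estimate_lag(frame1, frame2, alpha=0.75, eps=1e-12):
    """Steps 10-12: weighted cross-power spectrum, IFFT, peak search."""
    y1, y2 = improved_allpass(frame1), improved_allpass(frame2)
    cross = [a * b.conjugate() for a, b in zip(y1, y2)]
    g12 = [c / (abs(c) ** alpha + eps) for c in cross]              # assumed weighting
    r12 = [v.real for v in dft(g12, inverse=True)]
    n = len(r12)
    peak = max(range(n), key=lambda i: r12[i])
    return peak - n if peak > n // 2 else peak                      # signed lag in samples


rng = random.Random(0)
x1 = [rng.uniform(-1.0, 1.0) for _ in range(32)]  # hypothetical frame, channel 1
d = 5
x2 = x1[-d:] + x1[:-d]                            # channel 2: circular delay of d samples
lag = estimate_lag(x1, x2)
```

In this noise-free, circularly shifted test the peak falls at a lag of −5 samples, the negative sign indicating that channel 2 lags channel 1 under the Y₁·conj(Y₂) convention. Real recordings would additionally need the windowing, filtering and endpoint detection of steps 1 to 3.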

Compared with the prior art, the time delay estimation method suitable for sound source positioning provided by the embodiment of the invention has the following beneficial effects:

(1) On the basis of obtaining the all-pass component signal, an improved all-pass component frequency spectrum is calculated from the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, an improved phase weighting function is calculated from the improved all-pass component frequency spectrum, and a cross-power spectrum is calculated by combining the improved all-pass component frequency spectrum and the improved phase weighting function. The contribution of the signal's spectral amplitude is thereby increased while the phase of the all-pass component signal remains protected from the influence of reverberation, so the estimation performance of the time delay estimation method under noise is improved, and with it the adaptability to noise and the accuracy of the time delay estimate.

(2) The cross-correlation function of the cross-power spectrum is solved through an inverse fast Fourier transform method, and the delay time is calculated according to the cross-correlation function, so that the delay estimation method can effectively reduce the influence caused by noise and reverberation in the reverberation and noise environments, the adaptability of the delay estimation method to the noise is improved, the reverberation resistance of all-pass component signals is kept, the detection of the delay peak value is more accurate, and the performance of delay estimation is improved.

(3) Compared with the traditional CEP cepstrum time delay estimation and cross-power spectrum time delay estimation, the invention incorporates the modulus of the minimum phase component signal into the all-pass component signal, which keeps the phase of the all-pass component signal protected from the influence of reverberation while increasing the contribution of the signal's spectral amplitude. Combined with an improved GCC weighting method, the time delay estimation method can therefore still estimate effectively at low signal-to-noise ratio, and the estimation accuracy under low signal-to-noise ratio and reverberation is improved.

(4) Because the modulus of the frequency spectrum of the all-pass component signal obtained by the existing CEP time delay estimation technology is effectively whitened, that technology can only be combined with traditional phase weighting. The invention, by contrast, can be combined effectively with various improved algorithms such as GCC-phase, so that the algorithm has better adaptability to noise while the reverberation resistance of the all-pass component signal is retained.
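The whitening remark in (4) can be made concrete with a short calculation. It assumes that the minimum phase component is built from the log-magnitude cepstrum, so that |Y_min(ω)| = |Y(ω)|, and it uses an exponent-weighted family 1/|·|^α purely as an illustrative form of weighting; neither assumption is a formula quoted from the patent, and the symbol * for conjugation is introduced here.

```latex
\begin{aligned}
\lvert Y_{i\,min}(\omega)\rvert = \lvert Y_i(\omega)\rvert
  \;&\Longrightarrow\;
  \lvert Y_{i\,all}(\omega)\rvert
  = \frac{\lvert Y_i(\omega)\rvert}{\lvert Y_{i\,min}(\omega)\rvert} = 1,\\
\frac{1}{\lvert Y_{1all}(\omega)\,Y_{2all}^{*}(\omega)\rvert^{\alpha}}
  &= 1 \quad \text{for every } \alpha,\\
\frac{1}{\lvert Y_{1\text{-}nall}(\omega)\,Y_{2\text{-}nall}^{*}(\omega)\rvert^{\alpha}}
  &= \frac{1}{\bigl(\lvert Y_1(\omega)\rvert\,\lvert Y_2(\omega)\rvert\bigr)^{\alpha}} .
\end{aligned}
```

Under these assumptions, any magnitude-dependent weighting applied to the plain all-pass spectra collapses to a constant, whereas the improved all-pass spectra retain a magnitude on which the choice of α has an effect.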

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A time delay estimation method suitable for sound source positioning is characterized by comprising the following steps:

solving a cross-correlation function of the cross-power spectrum by an inverse fast Fourier transform method, and calculating the delay time according to the cross-correlation function;

the processing of the speech signals obtained by the two microphones to obtain the complex cepstrum of the minimum phase component specifically includes:

obtaining a first channel voice signal x₁(t) and a second channel voice signal x₂(t) through the two microphones respectively;

performing signal processing on the first channel voice signal x₁(t) and the second channel voice signal x₂(t) to obtain a first channel complex cepstrum k₁(n) and a second channel complex cepstrum k₂(n);

performing homomorphic filtering on the first channel complex cepstrum k₁(n) and the second channel complex cepstrum k₂(n) to obtain the complex cepstrum k_1min(n) of the first channel minimum phase component and the complex cepstrum k_2min(n) of the second channel minimum phase component;

The calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component specifically includes:

calculating the frequency spectrum Y_1min(ω) of the first channel minimum phase component and the frequency spectrum Y_2min(ω) of the second channel minimum phase component from the complex cepstrum k_1min(n) of the first channel minimum phase component and the complex cepstrum k_2min(n) of the second channel minimum phase component;

calculating the frequency spectrum Y_1all(ω) of the first channel all-pass component signal and the frequency spectrum Y_2all(ω) of the second channel all-pass component signal from the frequency spectrum Y_1min(ω) of the first channel minimum phase component and the frequency spectrum Y_2min(ω) of the second channel minimum phase component;

The calculating an improved all-pass component frequency spectrum by using the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, and calculating an improved phase weighting function according to the improved all-pass component frequency spectrum, specifically:

multiplying the modulus |Y_1min(ω)| of the frequency spectrum of the first channel minimum phase component by the frequency spectrum Y_1all(ω) of the first channel all-pass component signal to obtain the frequency spectrum Y_1-nall(ω) of the first channel improved all-pass component signal, wherein Y_1-nall(ω)＝Y_1all(ω)*|Y_1min(ω)|;

multiplying the modulus |Y_2min(ω)| of the frequency spectrum of the second channel minimum phase component by the frequency spectrum Y_2all(ω) of the second channel all-pass component signal to obtain the frequency spectrum Y_2-nall(ω) of the second channel improved all-pass component signal, wherein Y_2-nall(ω)＝Y_2all(ω)*|Y_2min(ω)|;

Wherein

2. The method for estimating time delay suitable for sound source localization according to claim 1, wherein the signal processing is performed on the first channel speech signal and the second channel speech signal, specifically:

wherein k_1min(n)＝u*k₁(n), k_2min(n)＝u*k₂(n),

N is the number of points of the fourier transform.

3. The method for estimating time delay suitable for sound source localization according to claim 2, wherein the calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component specifically comprises:

calculating the frequency spectrum Y_1min(ω) of the first channel minimum phase component and the frequency spectrum Y_2min(ω) of the second channel minimum phase component from the complex cepstrum k_1min(n) of the first channel minimum phase component and the complex cepstrum k_2min(n) of the second channel minimum phase component, wherein,

FFT is fast Fourier transform;

4. The method for time delay estimation suitable for sound source localization according to claim 3, wherein the combining the modified all-pass component spectrum and the modified phase weighting function to calculate a cross-power spectrum comprises:

calculating the cross-power spectrum

wherein,

5. The method for time delay estimation suitable for sound source localization according to claim 4, wherein the cross-correlation function of the cross-power spectrum is solved by an inverse fast Fourier transform method, and the delay time is calculated according to the cross-correlation function, specifically:

solving the cross-correlation function R₁₂(τ) of the cross-power spectrum G₁₂(ω) by an inverse fast Fourier transform method, and obtaining the delay time after sampling τ_max, wherein R₁₂(τ)＝IFFT(G₁₂(ω)), τ_max＝argmax_τ R₁₂(τ), and IFFT is the inverse fast Fourier transform;