CN109901114B - Time delay estimation method suitable for sound source positioning - Google Patents

Publication number
CN109901114B
Authority
CN
China
Prior art keywords
channel
frequency spectrum
minimum phase
signal
phase component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910242080.4A
Other languages
Chinese (zh)
Other versions
CN109901114A (en)
Inventor
张承云
梁龙腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN201910242080.4A
Publication of CN109901114A
Application granted
Publication of CN109901114B

Abstract

The invention discloses a time delay estimation method suitable for sound source positioning. The method comprises: performing signal processing on the voice signals obtained by two microphones to obtain the complex cepstrum of the minimum phase component; calculating the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal from that complex cepstrum; calculating an improved all-pass component frequency spectrum from the modulus of the minimum phase component spectrum and the all-pass component spectrum, calculating an improved phase weighting function from the improved all-pass component spectrum, and calculating a cross-power spectrum by combining the improved all-pass component spectrum and the improved phase weighting function; and solving the cross-correlation function of the cross-power spectrum by the inverse fast Fourier transform and calculating the delay time from the cross-correlation function. In environments with both reverberation and noise, the method reduces the influence of noise and reverberation, thereby improving robustness to noise and the accuracy of time delay estimation.

Description

Time delay estimation method suitable for sound source positioning
Technical Field
The invention relates to the technical field of sound source positioning, in particular to a time delay estimation method suitable for sound source positioning.
Background
In recent years, sound source positioning based on microphone arrays has been widely applied, and the time delay and position information it determines supports speech algorithms such as beamforming, speech enhancement, speech recognition and blind signal separation. Sound source positioning based on time delay estimation has a low computational cost and requires few microphones, so it is widely used in real-time processing. It proceeds in two steps: first, the time difference between the sound waves propagating from the sound source to two microphones is estimated; second, the position of the sound source is estimated from that time difference. The accuracy of the first step determines the accuracy of the second.
In the prior art, delay estimation based on the cross-power spectrum has been widely studied because of its low computational cost, high positioning accuracy and ability to work under low reverberation (0 ms to 300 ms), but its performance degrades under high reverberation, which reduces the accuracy of delay estimation. To address this, some researchers have proposed a delay estimation method based on cepstral prefiltering; this method reduces the influence of reverberation on delay estimation well, but it is susceptible to noise.
Disclosure of Invention
The invention provides a time delay estimation method suitable for sound source positioning, which aims to solve the technical problem that the influence caused by noise and reverberation is difficult to reduce in the prior art.
In order to solve the above technical problem, an embodiment of the present invention provides a time delay estimation method suitable for sound source localization, including:
performing signal processing on voice signals obtained through the two microphones to obtain a complex cepstrum of a minimum phase component;
calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component;
calculating an improved all-pass component frequency spectrum by utilizing the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, calculating an improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating a cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function;
and solving the cross-correlation function of the cross-power spectrum by using an inverse fast Fourier transform method, and calculating to obtain the delay time according to the cross-correlation function.
As a preferred scheme, the calculating an improved all-pass component spectrum by using a modulus of the spectrum of the minimum phase component and the spectrum of the all-pass component signal, and calculating an improved phase weighting function according to the improved all-pass component spectrum, and calculating a cross-power spectrum by combining the improved all-pass component spectrum and the improved phase weighting function specifically includes:
multiplying the frequency spectrum of the all-pass component signal by the modulus of the frequency spectrum of the minimum phase component to obtain the improved all-pass component frequency spectrum;
and calculating the improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating the cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function.
As a preferred scheme, the signal processing is performed on the speech signals obtained by the two microphones to obtain the complex cepstrum of the minimum phase component, specifically:
respectively obtaining a first channel voice signal and a second channel voice signal through two microphones;
performing signal processing on the first channel voice signal and the second channel voice signal to obtain a first channel complex cepstrum and a second channel complex cepstrum;
and homomorphic filtering processing is carried out on the first channel complex cepstrum and the second channel complex cepstrum to obtain a complex cepstrum of the first channel minimum phase component and a complex cepstrum of the second channel minimum phase component.
As a preferred scheme, the signal processing the first channel voice signal and the second channel voice signal specifically includes:
letting the first channel speech signal be $x_1(t)$ and the second channel speech signal be $x_2(t)$;
performing voice endpoint detection on the filtered and framed signals $x_1(t)$ and $x_2(t)$, and selecting the speech frames with the same frame index in both channels to obtain the corresponding $y_1(t)$ and $y_2(t)$;
performing a discrete Fourier transform on $y_1(t)$ and $y_2(t)$ respectively to obtain the corresponding $Y_1(\omega)$ and $Y_2(\omega)$;
obtaining the first channel complex cepstrum $k_1(n)$ and the second channel complex cepstrum $k_2(n)$ from $Y_1(\omega)$ and $Y_2(\omega)$, wherein $k_1(n)=\mathrm{IFFT}(\ln(|Y_1(\omega)|))$, $k_2(n)=\mathrm{IFFT}(\ln(|Y_2(\omega)|))$, and IFFT is the inverse fast Fourier transform;
performing homomorphic filtering on the first channel complex cepstrum $k_1(n)$ and the second channel complex cepstrum $k_2(n)$ to obtain, respectively, the complex cepstrum $k_{1min}(n)$ of the first channel minimum phase component and the complex cepstrum $k_{2min}(n)$ of the second channel minimum phase component;
wherein $k_{1min}(n)=u \cdot k_1(n)$, $k_{2min}(n)=u \cdot k_2(n)$,
[Formula image BDA0002009607750000031 (definition of $u$) is not reproduced in this text]
and $N$ is the number of points of the Fourier transform.
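The cepstrum and homomorphic-filtering steps above can be sketched in NumPy as follows. Because the formula image defining $u$ is not available here, the sketch assumes the standard minimum-phase folding window (1 at $n=0$ and $n=N/2$, 2 for $0<n<N/2$, 0 elsewhere); that choice, the function names and the `eps` guard are assumptions for illustration, not the patent's stated definition.

```python
import numpy as np

def min_phase_window(n_fft: int) -> np.ndarray:
    """ASSUMED folding window u(n) for an even FFT length n_fft."""
    if n_fft % 2:
        raise ValueError("this sketch assumes an even FFT length")
    u = np.zeros(n_fft)
    u[0] = 1.0                # keep the zero-quefrency term once
    u[1:n_fft // 2] = 2.0     # double the positive quefrencies
    u[n_fft // 2] = 1.0       # keep the midpoint term once
    return u                  # negative quefrencies remain zero

def min_phase_cepstrum(frame: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """k_min(n) = u * IFFT(ln|Y(w)|) for one real-valued frame."""
    spectrum = np.fft.fft(frame)
    # eps avoids log(0); it is a numerical guard added by this sketch.
    cepstrum = np.fft.ifft(np.log(np.abs(spectrum) + eps)).real
    return min_phase_window(len(frame)) * cepstrum

rng = np.random.default_rng(0)
y1 = rng.standard_normal(256)     # stand-in for one selected speech frame
u8 = min_phase_window(8)
k1min = min_phase_cepstrum(y1)
```

With this window, every cepstral coefficient above $N/2$ is set to zero, which is what makes the retained part correspond to a minimum phase sequence.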
As a preferred scheme, the calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component specifically includes:
calculating the frequency spectrum $Y_{1min}(\omega)$ of the first channel minimum phase component and the frequency spectrum $Y_{2min}(\omega)$ of the second channel minimum phase component from the complex cepstrum $k_{1min}(n)$ of the first channel minimum phase component and the complex cepstrum $k_{2min}(n)$ of the second channel minimum phase component, wherein
[Formula image BDA0002009607750000032 (expressions for $Y_{1min}(\omega)$ and $Y_{2min}(\omega)$) is not reproduced in this text]
and FFT is the fast Fourier transform;
calculating the frequency spectrum $Y_{1all}(\omega)$ of the first channel all-pass component signal and the frequency spectrum $Y_{2all}(\omega)$ of the second channel all-pass component signal from $Y_{1min}(\omega)$ and $Y_{2min}(\omega)$, wherein $Y_{1all}(\omega)=Y_1(\omega)/Y_{1min}(\omega)$ and $Y_{2all}(\omega)=Y_2(\omega)/Y_{2min}(\omega)$.
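A minimal sketch of this decomposition for a single channel is given below. Since the formula image for $Y_{min}(\omega)$ is not available, the sketch assumes the usual inverse of the cepstral mapping, $Y_{min}(\omega)=\exp(\mathrm{FFT}(k_{min}(n)))$, together with the standard folding window for $u$; both are assumptions, as are the function name and the `eps` guard.

```python
import numpy as np

def decompose(frame: np.ndarray, eps: float = 1e-12):
    """Split the spectrum of a real frame into minimum phase and all-pass parts."""
    n_fft = len(frame)                       # assumed even
    Y = np.fft.fft(frame)
    k = np.fft.ifft(np.log(np.abs(Y) + eps)).real
    u = np.zeros(n_fft)                      # ASSUMED folding window
    u[0] = u[n_fft // 2] = 1.0
    u[1:n_fft // 2] = 2.0
    Y_min = np.exp(np.fft.fft(u * k))        # ASSUMED: Y_min = exp(FFT(k_min))
    Y_all = Y / Y_min                        # Y_all = Y / Y_min, as in the text
    return Y, Y_min, Y_all

rng = np.random.default_rng(0)
Y, Y_min, Y_all = decompose(rng.standard_normal(256))
```

Under these assumptions the all-pass part has unit magnitude at every bin, so it carries only phase information, while the minimum phase part carries the magnitude.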
As a preferred scheme, the calculating an improved all-pass component spectrum by using a modulus of the spectrum of the minimum phase component and the spectrum of the all-pass component signal, and calculating an improved phase weighting function according to the improved all-pass component spectrum, and calculating a cross-power spectrum by combining the improved all-pass component spectrum and the improved phase weighting function specifically includes:
obtaining the modulus $|Y_{1min}(\omega)|$ of the frequency spectrum of the first channel minimum phase component from $Y_{1min}(\omega)$, and obtaining the modulus $|Y_{2min}(\omega)|$ of the frequency spectrum of the second channel minimum phase component from $Y_{2min}(\omega)$;
multiplying the modulus $|Y_{1min}(\omega)|$ by the frequency spectrum $Y_{1all}(\omega)$ of the first channel all-pass component signal to obtain the frequency spectrum $Y_{1-nall}(\omega)$ of the first channel improved all-pass component signal, wherein $Y_{1-nall}(\omega)=Y_{1all}(\omega) \cdot |Y_{1min}(\omega)|$;
multiplying the modulus $|Y_{2min}(\omega)|$ by the frequency spectrum $Y_{2all}(\omega)$ of the second channel all-pass component signal to obtain the frequency spectrum $Y_{2-nall}(\omega)$ of the second channel improved all-pass component signal, wherein $Y_{2-nall}(\omega)=Y_{2all}(\omega) \cdot |Y_{2min}(\omega)|$;
calculating, from $Y_{1-nall}(\omega)$ and $Y_{2-nall}(\omega)$, the improved phase weighting function
[Formula image BDA0002009607750000041 is not reproduced in this text]
wherein
[Formula image BDA0002009607750000042 (definition of the improved phase weighting function) is not reproduced in this text]
and $\alpha=0.75$;
multiplying $Y_{1-nall}(\omega)$ and $Y_{2-nall}(\omega)$ by the improved phase weighting function
[Formula image BDA0002009607750000043 is not reproduced in this text]
to calculate the cross-power spectrum $G_{12}(\omega)$, wherein
[Formula image BDA0002009607750000044 (expression for $G_{12}(\omega)$) is not reproduced in this text]
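The following sketch illustrates one plausible reading of this step. The products $Y_{all}(\omega)\cdot|Y_{min}(\omega)|$ and the value $\alpha=0.75$ come from the text; the form of the weighting function, $1/|Y_{1-nall}(\omega)\,Y_{2-nall}^{*}(\omega)|^{\alpha}$, and the choice of which channel is conjugated are assumptions, because the corresponding formula images are not available. The input arrays are toy values chosen only to make the behaviour visible.

```python
import numpy as np

ALPHA = 0.75  # exponent stated in the text

def improved_cross_power(Y1_all, Y1_min, Y2_all, Y2_min, alpha=ALPHA, eps=1e-12):
    """Weighted cross-power spectrum built from the improved all-pass spectra."""
    Y1_nall = Y1_all * np.abs(Y1_min)          # from the text
    Y2_nall = Y2_all * np.abs(Y2_min)          # from the text
    cross = Y1_nall * np.conj(Y2_nall)         # ASSUMED conjugation convention
    weight = 1.0 / (np.abs(cross) ** alpha + eps)   # ASSUMED weighting form
    return cross * weight

# Toy all-pass spectra (unit magnitude) and minimum phase spectra.
phi = np.array([0.1, -0.4, 1.2])
Y1_all, Y2_all = np.exp(1j * phi), np.exp(-1j * phi)
Y1_min = np.array([2.0, 1.0 + 1.0j, 0.5j])
Y2_min = np.array([1.0, 3.0, 2.0j])
G12 = improved_cross_power(Y1_all, Y1_min, Y2_all, Y2_min)
```

In this reading the phase of $G_{12}(\omega)$ comes entirely from the all-pass parts, while the magnitude is the minimum phase magnitude product raised to the power $1-\alpha$, so strong bins count for more than weak ones without dominating.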
as a preferred scheme, the cross-correlation function of the cross-power spectrum is solved by an inverse fast fourier transform method, and the delay time is calculated according to the cross-correlation function, specifically:
solving the cross-correlation function $R_{12}(\tau)$ of the cross-power spectrum $G_{12}(\omega)$ by the inverse fast Fourier transform, and obtaining the delay time after sampling $\tau_{max}$, wherein $R_{12}(\tau)=\mathrm{IFFT}(G_{12}(\omega))$, $\tau_{max}=\arg\max_{\tau} R_{12}(\tau)$, and IFFT is the inverse fast Fourier transform;
solving the delay time before sampling $delay_{12}$ from the delay time after sampling $\tau_{max}$, wherein $delay_{12}=\tau_{max} \cdot f_s$ and $f_s$ is the sampling frequency.
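The peak search can be sketched as below. Two points are assumptions of this sketch rather than statements from the text: indices in the upper half of the IFFT output are interpreted as negative lags, and the lag in samples is converted to seconds by dividing by $f_s$, which is the dimensionally consistent conversion and differs from the product written above. The unweighted cross-power spectrum in the demo is only a stand-in input.

```python
import numpy as np

def delay_from_cross_power(G12: np.ndarray, fs: float):
    """Return (tau_max in samples, delay in seconds) from a cross-power spectrum."""
    r12 = np.fft.ifft(G12).real          # R12(tau) = IFFT(G12(w))
    idx = int(np.argmax(r12))            # tau_max = argmax_tau R12(tau)
    n = len(r12)
    # The IFFT output is circular: indices above n/2 represent negative lags.
    tau_max = idx - n if idx > n // 2 else idx
    # ASSUMPTION: seconds = samples / fs (see the lead-in).
    return tau_max, tau_max / fs

rng = np.random.default_rng(2)
x1 = rng.standard_normal(64)
x2 = np.roll(x1, 5)                      # channel 2 lags channel 1 by 5 samples
G12 = np.fft.fft(x1) * np.conj(np.fft.fft(x2))   # unweighted, illustration only
tau_max, delay_s = delay_from_cross_power(G12, fs=16000.0)
```

With $G_{12}=Y_1 Y_2^{*}$, a negative lag means the second channel receives the sound later than the first; swapping the conjugate flips the sign.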
Compared with the prior art, the embodiment of the invention has the following beneficial effects. The embodiment provides a time delay estimation method suitable for sound source positioning, comprising: performing signal processing on the voice signals obtained by the two microphones to obtain a complex cepstrum of the minimum phase component; calculating the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal from the complex cepstrum of the minimum phase component; calculating an improved all-pass component frequency spectrum from the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, calculating an improved phase weighting function from the improved all-pass component frequency spectrum, and calculating a cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function; and solving the cross-correlation function of the cross-power spectrum by the inverse fast Fourier transform and calculating the delay time from the cross-correlation function.
Starting from the all-pass component signal, the method obtains an improved all-pass component frequency spectrum from the modulus of the minimum phase component spectrum and the all-pass component spectrum, calculates an improved phase weighting function from it, and combines the two into the cross-power spectrum. The all-pass component signal thus keeps its phase free of the influence of reverberation while the spectral amplitude of the signal is increased, which improves the estimation performance of the method under noise and, in turn, its robustness to noise and the accuracy of time delay estimation. Because the cross-correlation function of the cross-power spectrum is solved by the inverse fast Fourier transform and the delay time is calculated from it, the method reduces the influence of noise and reverberation in reverberant and noisy environments, improves robustness to noise, and retains the anti-reverberation capability of the all-pass component signal, so that the delay peak is detected more accurately and the performance of time delay estimation is improved.
Drawings
Fig. 1 is a schematic flowchart of a time delay estimation method suitable for sound source localization according to an embodiment of the present invention;
Fig. 2 is a flowchart of a delay estimation method suitable for sound source localization according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a preferred embodiment of the present invention provides a time delay estimation method suitable for sound source localization, including:
S1, performing signal processing on the voice signals obtained by the two microphones to obtain a complex cepstrum of the minimum phase component;
In this embodiment, to realize sound source localization, the voice signals are received by two microphones so that the time difference between the sound waves propagating from the sound source to the two microphones can be estimated; the sound source position is then estimated from that time difference.
S2, calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component;
S3, calculating an improved all-pass component frequency spectrum by utilizing the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, calculating an improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating a cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function;
In this embodiment, the cepstral prefiltering (CEP) technique is used: the complex cepstrum of the minimum phase component is obtained, and from it the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal are calculated. On the basis of the all-pass component signal, an improved all-pass component frequency spectrum is calculated from the modulus of the minimum phase component spectrum and the all-pass component spectrum, an improved phase weighting function is calculated from the improved all-pass component spectrum, and the cross-power spectrum is calculated by combining the two. This keeps the phase of the all-pass component signal free of the influence of reverberation while increasing the spectral amplitude of the signal, which improves the estimation performance of the method under noise and, in turn, its robustness to noise and the accuracy of time delay estimation.
S4, solving the cross-correlation function of the cross-power spectrum by an inverse fast Fourier transform method, and calculating the delay time according to the cross-correlation function.
In the embodiment of the invention, the cross-correlation function of the cross-power spectrum is solved by an inverse fast Fourier transform method, and the delay time is calculated according to the cross-correlation function, so that the time delay estimation method can effectively reduce the influence caused by noise and reverberation in the reverberation and noise environment, improve the adaptability of the time delay estimation method to the noise, and simultaneously reserve the anti-reverberation capability of an all-pass component signal, thereby enabling the detection of a time delay peak value to be more accurate, and further improving the performance of time delay estimation.
In this embodiment of the present invention, preferably, the calculating an improved all-pass component spectrum by using a modulus of the spectrum of the minimum phase component and the spectrum of the all-pass component signal, and calculating an improved phase weighting function according to the improved all-pass component spectrum, and calculating a cross-power spectrum by combining the improved all-pass component spectrum and the improved phase weighting function specifically includes:
multiplying the frequency spectrum of the all-pass component signal by the modulus of the frequency spectrum of the minimum phase component to obtain the improved all-pass component frequency spectrum;
calculating the improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating the cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function. Multiplying the all-pass component signal by the amplitude of the minimum phase component signal improves the estimation performance of the delay estimation algorithm under noise; combined with the generalized cross-correlation (GCC) delay estimation technique, delay estimation can then be carried out with the improved phase weighting method.
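For context, the conventional GCC estimator with plain phase-transform (PHAT) weighting, which the improved weighting builds on, can be sketched as follows. This is the textbook baseline rather than the method of this embodiment; the function name, the `eps` guard and the circular-shift test signal are assumptions for illustration.

```python
import numpy as np

def gcc_phat(x1: np.ndarray, x2: np.ndarray, eps: float = 1e-12) -> int:
    """Conventional GCC-PHAT lag estimate (in samples) for two equal-length frames."""
    X1, X2 = np.fft.fft(x1), np.fft.fft(x2)
    cross = X1 * np.conj(X2)
    # PHAT discards the magnitude entirely and keeps only the phase.
    r = np.fft.ifft(cross / (np.abs(cross) + eps)).real
    idx = int(np.argmax(r))
    n = len(r)
    return idx - n if idx > n // 2 else idx   # upper half = negative lags

rng = np.random.default_rng(4)
x1 = rng.standard_normal(128)
x2 = np.roll(x1, 7)            # channel 2 is a circularly delayed copy
lag = gcc_phat(x1, x2)
```

Because no zero padding is used, the correlation here is circular, which is adequate for a frame-level illustration but would need padding for delays close to the frame length.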
In the embodiment of the present invention, the signal processing is performed on the speech signals obtained by the two microphones to obtain the complex cepstrum of the minimum phase component, and specifically, the signal processing is performed by:
respectively obtaining a first channel voice signal and a second channel voice signal through two microphones;
performing signal processing on the first channel voice signal and the second channel voice signal to obtain a first channel complex cepstrum and a second channel complex cepstrum;
and homomorphic filtering processing is carried out on the first channel complex cepstrum and the second channel complex cepstrum to obtain a complex cepstrum of the first channel minimum phase component and a complex cepstrum of the second channel minimum phase component.
In this embodiment, to realize sound source localization, it is necessary to obtain a first channel speech signal and a second channel speech signal through two microphones respectively, to estimate a time difference between sound waves propagating from a sound source to the two microphones, and then estimate a sound source position according to the time difference.
In this embodiment of the present invention, the performing signal processing on the first channel voice signal and the second channel voice signal specifically includes:
letting the first channel speech signal be $x_1(t)$ and the second channel speech signal be $x_2(t)$;
performing voice endpoint detection on the filtered and framed signals $x_1(t)$ and $x_2(t)$, and selecting the speech frames with the same frame index in both channels to obtain the corresponding $y_1(t)$ and $y_2(t)$;
performing a discrete Fourier transform on $y_1(t)$ and $y_2(t)$ respectively to obtain the corresponding $Y_1(\omega)$ and $Y_2(\omega)$;
obtaining the first channel complex cepstrum $k_1(n)$ and the second channel complex cepstrum $k_2(n)$ from $Y_1(\omega)$ and $Y_2(\omega)$, wherein $k_1(n)=\mathrm{IFFT}(\ln(|Y_1(\omega)|))$, $k_2(n)=\mathrm{IFFT}(\ln(|Y_2(\omega)|))$, and IFFT is the inverse fast Fourier transform;
performing homomorphic filtering on the first channel complex cepstrum $k_1(n)$ and the second channel complex cepstrum $k_2(n)$ to obtain, respectively, the complex cepstrum $k_{1min}(n)$ of the first channel minimum phase component and the complex cepstrum $k_{2min}(n)$ of the second channel minimum phase component;
wherein $k_{1min}(n)=u \cdot k_1(n)$, $k_{2min}(n)=u \cdot k_2(n)$,
[Formula image BDA0002009607750000081 (definition of $u$) is not reproduced in this text]
and $N$ is the number of points of the Fourier transform.
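The framing and endpoint-detection step that produces $y_1(t)$ and $y_2(t)$ is not specified in detail, so the sketch below uses a simple relative-energy threshold as a stand-in detector and keeps only frames that are active in both channels. The frame length, hop size, threshold, function names and the omission of the filtering stage are all assumptions made for illustration.

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split x into overlapping frames of frame_len samples, hop samples apart."""
    n_frames = 1 + (len(x) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    return x[starts[:, None] + np.arange(frame_len)[None, :]]

def is_speech(frames: np.ndarray, threshold_db: float) -> np.ndarray:
    """ASSUMED detector: frame energy relative to the loudest frame, in dB."""
    energy = np.mean(frames ** 2, axis=1)
    level_db = 10.0 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    return level_db > threshold_db

def common_speech_frames(x1, x2, frame_len=256, hop=128, threshold_db=-30.0):
    """Keep the frames (same index in both channels) that both channels mark active."""
    f1 = frame_signal(x1, frame_len, hop)
    f2 = frame_signal(x2, frame_len, hop)
    keep = is_speech(f1, threshold_db) & is_speech(f2, threshold_db)
    return f1[keep], f2[keep]

rng = np.random.default_rng(3)
x1 = np.zeros(1024)
x1[256:768] = rng.standard_normal(512)   # "speech" surrounded by silence
x2 = np.roll(x1, 3)                      # delayed copy on the second channel
y1_frames, y2_frames = common_speech_frames(x1, x2)
```

Selecting frames by a shared index keeps the two channels time-aligned, so any lag found later reflects propagation delay rather than a framing offset.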
In this embodiment of the present invention, the calculating a frequency spectrum of a minimum phase component of a signal and a frequency spectrum of an all-pass component signal according to the complex cepstrum of the minimum phase component specifically includes:
calculating the frequency spectrum $Y_{1min}(\omega)$ of the first channel minimum phase component and the frequency spectrum $Y_{2min}(\omega)$ of the second channel minimum phase component from the complex cepstrum $k_{1min}(n)$ of the first channel minimum phase component and the complex cepstrum $k_{2min}(n)$ of the second channel minimum phase component, wherein
[Formula image BDA0002009607750000093 (expressions for $Y_{1min}(\omega)$ and $Y_{2min}(\omega)$) is not reproduced in this text]
and FFT is the fast Fourier transform;
calculating the frequency spectrum $Y_{1all}(\omega)$ of the first channel all-pass component signal and the frequency spectrum $Y_{2all}(\omega)$ of the second channel all-pass component signal from $Y_{1min}(\omega)$ and $Y_{2min}(\omega)$, wherein $Y_{1all}(\omega)=Y_1(\omega)/Y_{1min}(\omega)$ and $Y_{2all}(\omega)=Y_2(\omega)/Y_{2min}(\omega)$.
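The short derivation below indicates why the quotient $Y(\omega)/Y_{min}(\omega)$ behaves as an all-pass term. It rests on two assumptions not confirmed by the text available here: that $u$ is the standard folding window (1 at $n=0$ and $n=N/2$, 2 for $0<n<N/2$, 0 otherwise) and that $Y_{min}(\omega)=\exp(\mathrm{FFT}(k_{min}(n)))$. Channel indices are dropped for brevity.

```latex
\begin{aligned}
k(n) &= \mathrm{IFFT}\bigl(\ln|Y(\omega)|\bigr), \qquad k(n) = k(N-n) \in \mathbb{R}
  \quad\text{(real input frame)},\\
\tfrac{1}{2}\bigl[k_{min}(n) + k_{min}(N-n)\bigr] &= k(n)
  \quad\text{(assumed folding window } u\text{)},\\
\operatorname{Re}\,\mathrm{FFT}\bigl(k_{min}(n)\bigr) &= \mathrm{FFT}\bigl(k(n)\bigr) = \ln|Y(\omega)|
  \;\Longrightarrow\; |Y_{min}(\omega)| = |Y(\omega)|,\\
|Y_{all}(\omega)| &= \frac{|Y(\omega)|}{|Y_{min}(\omega)|} = 1 .
\end{aligned}
```

Under these assumptions $Y_{all}(\omega)$ carries only phase, which is why the later step multiplies it by $|Y_{min}(\omega)|$ to reintroduce a magnitude.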
In this embodiment of the present invention, the calculating, by using the modulus of the spectrum of the minimum phase component and the spectrum of the all-pass component signal, to obtain an improved all-pass component spectrum, and according to the improved all-pass component spectrum, an improved phase weighting function is obtained, and the calculating, by combining the improved all-pass component spectrum and the improved phase weighting function, a cross-power spectrum is specifically:
obtaining the modulus $|Y_{1min}(\omega)|$ of the frequency spectrum of the first channel minimum phase component from $Y_{1min}(\omega)$, and obtaining the modulus $|Y_{2min}(\omega)|$ of the frequency spectrum of the second channel minimum phase component from $Y_{2min}(\omega)$;
multiplying the modulus $|Y_{1min}(\omega)|$ by the frequency spectrum $Y_{1all}(\omega)$ of the first channel all-pass component signal to obtain the frequency spectrum $Y_{1-nall}(\omega)$ of the first channel improved all-pass component signal, wherein $Y_{1-nall}(\omega)=Y_{1all}(\omega) \cdot |Y_{1min}(\omega)|$;
multiplying the modulus $|Y_{2min}(\omega)|$ by the frequency spectrum $Y_{2all}(\omega)$ of the second channel all-pass component signal to obtain the frequency spectrum $Y_{2-nall}(\omega)$ of the second channel improved all-pass component signal, wherein $Y_{2-nall}(\omega)=Y_{2all}(\omega) \cdot |Y_{2min}(\omega)|$;
calculating, from $Y_{1-nall}(\omega)$ and $Y_{2-nall}(\omega)$, the improved phase weighting function
[Formula image BDA0002009607750000091 is not reproduced in this text]
wherein
[Formula image BDA0002009607750000092 (definition of the improved phase weighting function) is not reproduced in this text]
and $\alpha=0.75$;
multiplying $Y_{1-nall}(\omega)$ and $Y_{2-nall}(\omega)$ by the improved phase weighting function
[Formula image BDA0002009607750000101 is not reproduced in this text]
to calculate the cross-power spectrum $G_{12}(\omega)$, wherein
[Formula image BDA0002009607750000102 (expression for $G_{12}(\omega)$) is not reproduced in this text]
In this embodiment, the cepstral prefiltering (CEP) technique is used: the complex cepstrum of the minimum phase component is obtained, and from it the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal are calculated. On the basis of the all-pass component signal, an improved all-pass component frequency spectrum is calculated from the modulus of the minimum phase component spectrum and the all-pass component spectrum, an improved phase weighting function is calculated from the improved all-pass component spectrum, and the cross-power spectrum is calculated by multiplying the improved all-pass component spectrum by the improved phase weighting function. This keeps the phase of the all-pass component signal free of the influence of reverberation while increasing the spectral amplitude of the signal, which improves the estimation performance of the method under noise and, in turn, its robustness to noise and the accuracy of time delay estimation.
In this embodiment of the present invention, the cross-correlation function of the cross-power spectrum is solved by an inverse fast fourier transform method, and the delay time is calculated according to the cross-correlation function, specifically:
solving the cross-correlation function $R_{12}(\tau)$ of the cross-power spectrum $G_{12}(\omega)$ by the inverse fast Fourier transform, and obtaining the delay time after sampling $\tau_{max}$, wherein $R_{12}(\tau)=\mathrm{IFFT}(G_{12}(\omega))$, $\tau_{max}=\arg\max_{\tau} R_{12}(\tau)$, and IFFT is the inverse fast Fourier transform;
solving the delay time before sampling $delay_{12}$ from the delay time after sampling $\tau_{max}$, wherein $delay_{12}=\tau_{max} \cdot f_s$ and $f_s$ is the sampling frequency.
In the embodiment of the invention, the cross-correlation function of the cross-power spectrum is solved by an inverse fast Fourier transform method, and the delay time is calculated according to the cross-correlation function, so that the time delay estimation method can effectively reduce the influence caused by noise and reverberation in the reverberation and noise environment, improve the adaptability of the time delay estimation method to the noise, and simultaneously reserve the anti-reverberation capability of an all-pass component signal, thereby enabling the detection of a time delay peak value to be more accurate, and further improving the performance of time delay estimation.
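Steps S1 to S4 can be put together in a compact, noise-free sketch for a single pair of frames. Everything the formula images would have specified is assumed here: the folding window for $u$, $Y_{min}(\omega)=\exp(\mathrm{FFT}(k_{min}(n)))$, the weighting $1/|Y_{1-nall}(\omega)\,Y_{2-nall}^{*}(\omega)|^{\alpha}$ and the conjugation convention. The test signal is a circularly shifted noise frame, not real speech, so the example shows only that the chain recovers a known lag.

```python
import numpy as np

def min_phase_split(frame: np.ndarray, eps: float = 1e-12):
    """Return (Y_min, Y_all) for one real frame of even length."""
    n = len(frame)
    Y = np.fft.fft(frame)
    k = np.fft.ifft(np.log(np.abs(Y) + eps)).real   # k(n) = IFFT(ln|Y|)
    u = np.zeros(n)                                  # ASSUMED folding window
    u[0] = u[n // 2] = 1.0
    u[1:n // 2] = 2.0
    Y_min = np.exp(np.fft.fft(u * k))                # ASSUMED exp(FFT(k_min))
    return Y_min, Y / Y_min

def estimate_delay(y1, y2, alpha: float = 0.75, eps: float = 1e-12) -> int:
    """Lag in samples at the peak of the weighted cross-correlation."""
    Y1_min, Y1_all = min_phase_split(y1)
    Y2_min, Y2_all = min_phase_split(y2)
    Y1_nall = Y1_all * np.abs(Y1_min)                # improved all-pass spectra
    Y2_nall = Y2_all * np.abs(Y2_min)
    cross = Y1_nall * np.conj(Y2_nall)               # ASSUMED conjugation
    G12 = cross / (np.abs(cross) ** alpha + eps)     # ASSUMED weighting form
    r12 = np.fft.ifft(G12).real                      # R12 = IFFT(G12)
    idx = int(np.argmax(r12))
    n = len(r12)
    return idx - n if idx > n // 2 else idx          # upper half = negative lags

rng = np.random.default_rng(1)
y1 = rng.standard_normal(512)
y2 = np.roll(y1, 9)          # channel 2 lags channel 1 by 9 samples
lag = estimate_delay(y1, y2)
```

With the conjugate placed on the second channel, a lag of -9 indicates that the second microphone receives the signal 9 samples later; how the method behaves under real reverberation and noise is not something this toy example demonstrates.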
Referring to fig. 2, a possible specific embodiment of a method for estimating a time delay for sound source localization according to the present invention includes the following steps:
1. Use two microphones, microphone $mir_1$ and microphone $mir_2$, to receive the voice signals; the received signals are the first channel voice signal and the second channel voice signal, denoted $x_1(t)$ and $x_2(t)$ respectively;
2. Filter and frame the first channel voice signal $x_1(t)$ and the second channel voice signal $x_2(t)$;
3. Perform voice endpoint detection on the filtered and framed signals $x_1(t)$ and $x_2(t)$ respectively, and select the speech frames with the same frame index in both channels to obtain the corresponding $y_1(t)$ and $y_2(t)$;
4. Perform a discrete Fourier transform on $y_1(t)$ and $y_2(t)$ respectively to obtain $Y_1(\omega)$ and $Y_2(\omega)$;
5. Determine the first channel complex cepstrum $k_1(n)$ and the second channel complex cepstrum $k_2(n)$ corresponding to $Y_1(\omega)$ and $Y_2(\omega)$, calculated as follows:
$k_1(n)=\mathrm{IFFT}(\ln(|Y_1(\omega)|))$, $k_2(n)=\mathrm{IFFT}(\ln(|Y_2(\omega)|))$, wherein IFFT is the inverse fast Fourier transform;
6. Perform homomorphic filtering on the first channel complex cepstrum $k_1(n)$ and the second channel complex cepstrum $k_2(n)$ respectively to obtain the complex cepstrum $k_{1min}(n)$ of the first channel minimum phase component and the complex cepstrum $k_{2min}(n)$ of the second channel minimum phase component, calculated as follows:
$k_{1min}(n)=u \cdot k_1(n)$, $k_{2min}(n)=u \cdot k_2(n)$,
[Formula image BDA0002009607750000111 (definition of $u$) is not reproduced in this text]
wherein $N$ is the number of points of the Fourier transform;
7. Compute the frequency spectrum of the minimum phase component for each channel: the frequency spectrum $Y_{1\min}(\omega)$ of the first channel minimum phase component and the frequency spectrum $Y_{2\min}(\omega)$ of the second channel minimum phase component; the calculation is as follows:
[Formula image BDA0002009607750000112 (expressions for $Y_{1\min}(\omega)$ and $Y_{2\min}(\omega)$) not reproduced]
where FFT is the fast Fourier transform;
8. Compute the frequency spectrum of the all-pass component signal for each channel: the frequency spectrum $Y_{1all}(\omega)$ of the first channel all-pass component signal and the frequency spectrum $Y_{2all}(\omega)$ of the second channel all-pass component signal; the calculation is as follows:
$$Y_{1all}(\omega) = Y_1(\omega)/Y_{1\min}(\omega), \quad Y_{2all}(\omega) = Y_2(\omega)/Y_{2\min}(\omega);$$
9. Multiply the modulus of the frequency spectrum of the minimum phase component by the frequency spectrum of the all-pass component signal: use the modulus $|Y_{1\min}(\omega)|$ of the frequency spectrum of the first channel minimum phase component and the frequency spectrum $Y_{1all}(\omega)$ of the first channel all-pass component signal to obtain the frequency spectrum $Y_{1\text{-}nall}(\omega)$ of the first channel improved all-pass component signal, and use the modulus $|Y_{2\min}(\omega)|$ of the frequency spectrum of the second channel minimum phase component and the frequency spectrum $Y_{2all}(\omega)$ of the second channel all-pass component signal to obtain the frequency spectrum $Y_{2\text{-}nall}(\omega)$ of the second channel improved all-pass component signal; the calculation is as follows:
$$Y_{1\text{-}nall}(\omega) = Y_{1all}(\omega) \cdot |Y_{1\min}(\omega)|, \quad Y_{2\text{-}nall}(\omega) = Y_{2all}(\omega) \cdot |Y_{2\min}(\omega)|;$$
10. Use the frequency spectrum $Y_{1\text{-}nall}(\omega)$ of the first channel improved all-pass component signal and the frequency spectrum $Y_{2\text{-}nall}(\omega)$ of the second channel improved all-pass component signal to calculate the improved phase weighting function [symbol image BDA0002009607750000121 not reproduced]; the calculation is as follows:
[Formula image BDA0002009607750000122 (improved phase weighting function) not reproduced]
11. Multiply the frequency spectrum $Y_{1\text{-}nall}(\omega)$ of the first channel improved all-pass component signal and the frequency spectrum $Y_{2\text{-}nall}(\omega)$ of the second channel improved all-pass component signal by the improved phase weighting function [symbol image BDA0002009607750000123 not reproduced] to calculate the cross-power spectrum $G_{12}(\omega)$; the calculation is as follows:
[Formula image BDA0002009607750000124 (cross-power spectrum $G_{12}(\omega)$) not reproduced]
where "'" denotes conjugation;
12. Compute the cross-correlation function $R_{12}(\tau)$ of the cross-power spectrum $G_{12}(\omega)$ by inverse fast Fourier transform, and locate its peak to obtain the delay time after sampling, $\tau_{\max}$; the calculation is as follows:
$$R_{12}(\tau) = \mathrm{IFFT}(G_{12}(\omega)), \quad \tau_{\max} = \arg\max_{\tau} R_{12}(\tau),$$
where IFFT is the inverse fast Fourier transform;
13. From the delay time after sampling, $\tau_{\max}$, compute the delay time before sampling, $delay_{12}$; the calculation is as follows:
$$delay_{12} = \tau_{\max} \cdot f_s,$$
where $f_s$ is the sampling frequency.
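The steps above (5 to 13) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The window u, the expression for the minimum phase spectrum, and the improved phase weighting function appear in this text only as formula images that are not reproduced, so the sketch substitutes the standard cepstral folding window, Y_min = exp(FFT(k_min)), and a PHAT-style weighting with a hypothetical exponent `beta`. It converts the peak lag to seconds by dividing by the sampling frequency, which is an assumption about the intended units (the text writes τmax*fs). A naive DFT replaces the FFT so that only the standard library is needed, and all function names are hypothetical. Filtering, framing and endpoint detection (steps 1 to 3) are omitted; the inputs are assumed to be equal-length frames of even length.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT, used here as a stand-in for FFT/IFFT."""
    n = len(x)
    sign = 1 if inverse else -1
    w = [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n)]
    out = [sum(x[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def folding_window(n):
    """ASSUMED window u: standard minimum-phase folding window (even n)."""
    half = n // 2
    return [1.0 if i in (0, half) else 2.0 if i < half else 0.0
            for i in range(n)]


def improved_allpass_spectrum(frame, eps=1e-12):
    """Steps 4 to 9 for one channel: returns Y_nall = Y_all * |Y_min|."""
    n = len(frame)
    Y = dft(frame)                                              # step 4
    k = dft([math.log(abs(v) + eps) for v in Y], inverse=True)  # step 5
    k_min = [u * c for u, c in zip(folding_window(n), k)]       # step 6
    Y_min = [cmath.exp(v) for v in dft(k_min)]                  # step 7 (assumed form)
    Y_all = [y / m for y, m in zip(Y, Y_min)]                   # step 8
    return [a * abs(m) for a, m in zip(Y_all, Y_min)]           # step 9


def estimate_delay(frame1, frame2, fs, beta=1.0, eps=1e-12):
    """Return (lag in samples, lag in seconds) of frame1 relative to frame2."""
    n = len(frame1)
    Y1 = improved_allpass_spectrum(frame1)
    Y2 = improved_allpass_spectrum(frame2)
    cross = [a * b.conjugate() for a, b in zip(Y1, Y2)]
    # ASSUMED weighting 1/|cross|^beta; the patent's own formula is not reproduced.
    G = [c / (abs(c) ** beta + eps) for c in cross]             # steps 10 and 11
    R = [v.real for v in dft(G, inverse=True)]                  # step 12
    peak = max(range(n), key=lambda i: R[i])
    tau = peak if peak <= n // 2 else peak - n                  # wrap negative lags
    return tau, tau / fs                                        # step 13 (assumed units)
```

For two frames that differ only by a circular shift, the minimum phase terms of the two channels cancel in the cross-power spectrum, so the sketch returns the shift itself. With real room responses the two channels differ and the choice of weighting matters.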
Compared with the prior art, the time delay estimation method suitable for sound source positioning provided by the embodiment of the invention has the following beneficial effects:
(1) Starting from the all-pass component signal, an improved all-pass component frequency spectrum is calculated from the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal; an improved phase weighting function is calculated from the improved all-pass component frequency spectrum; and the cross-power spectrum is calculated by combining the two. The phase of the all-pass component signal thus remains effective against reverberation while the signal spectrum amplitude plays a larger role, which improves the estimation performance of the method under noise and, in turn, its adaptability to noise and the accuracy of time delay estimation.
(2) The cross-correlation function of the cross-power spectrum is computed by inverse fast Fourier transform, and the delay time is calculated from it. In reverberant and noisy environments the method therefore reduces the influence of noise and reverberation, improves its adaptability to noise, and retains the reverberation resistance of the all-pass component signal, so the delay peak is detected more accurately and the performance of delay estimation improves.
(3) Compared with conventional CEP cepstrum time delay estimation and cross-power spectrum time delay estimation, the invention applies the modulus of the minimum phase component signal to the all-pass component signal. This keeps the phase of the all-pass component signal effective against reverberation while increasing the role of the signal spectrum amplitude. Combined with an improved GCC weighting method, the method can still estimate effectively at low signal-to-noise ratio, which improves estimation accuracy under low signal-to-noise ratio and reverberation.
(4) Because the modulus of the frequency spectrum of the all-pass component signal obtained by existing CEP time delay estimation techniques is effectively whitened, those techniques can only be combined with conventional phase weighting. The invention can instead be combined with various improved algorithms such as GCC-phase, which gives the algorithm better adaptability to noise while retaining the reverberation resistance of the all-pass component signal.
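A short worked derivation, using only the two relations stated in the embodiment (the all-pass spectrum is Y divided by Y_min, and the improved all-pass spectrum is the all-pass spectrum multiplied by |Y_min|), may help show why the improved spectrum carries amplitude information while keeping the all-pass phase. The channel index $i \in \{1, 2\}$ and the angle notation $\angle$ are introduced here for illustration only.

```latex
\begin{aligned}
Y_{i\text{-}nall}(\omega)
  &= Y_{iall}(\omega)\,\lvert Y_{i\min}(\omega)\rvert
   = \frac{Y_i(\omega)}{Y_{i\min}(\omega)}\,\lvert Y_{i\min}(\omega)\rvert
   = Y_i(\omega)\,e^{-j\angle Y_{i\min}(\omega)}, \\
\lvert Y_{i\text{-}nall}(\omega)\rvert &= \lvert Y_i(\omega)\rvert, \qquad
\angle Y_{i\text{-}nall}(\omega)
   = \angle Y_i(\omega) - \angle Y_{i\min}(\omega)
   = \angle Y_{iall}(\omega).
\end{aligned}
```

In other words, the improved all-pass spectrum has the magnitude of the original frame spectrum and the phase of the all-pass component, so a magnitude-dependent weighting has something to act on.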
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (5)

1. A time delay estimation method suitable for sound source positioning is characterized by comprising the following steps:
performing signal processing on voice signals obtained through the two microphones to obtain a complex cepstrum of a minimum phase component;
calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component;
calculating an improved all-pass component frequency spectrum by utilizing the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, calculating an improved phase weighting function according to the improved all-pass component frequency spectrum, and calculating a cross-power spectrum by combining the improved all-pass component frequency spectrum and the improved phase weighting function;
solving a cross-correlation function of the cross-power spectrum by a fast Fourier inverse transformation method, and calculating to obtain delay time according to the cross-correlation function;
the processing of the speech signals obtained by the two microphones to obtain the complex cepstrum of the minimum phase component specifically includes:
obtaining a first channel voice signal $x_1(t)$ and a second channel voice signal $x_2(t)$ through the two microphones, respectively;
performing signal processing on the first channel voice signal $x_1(t)$ and the second channel voice signal $x_2(t)$ to obtain a first channel complex cepstrum $k_1(n)$ and a second channel complex cepstrum $k_2(n)$;
performing homomorphic filtering on the first channel complex cepstrum $k_1(n)$ and the second channel complex cepstrum $k_2(n)$ to obtain the complex cepstrum $k_{1\min}(n)$ of the first channel minimum phase component and the complex cepstrum $k_{2\min}(n)$ of the second channel minimum phase component;
the calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component specifically includes:
calculating, from the complex cepstrum $k_{1\min}(n)$ of the first channel minimum phase component and the complex cepstrum $k_{2\min}(n)$ of the second channel minimum phase component, the frequency spectrum $Y_{1\min}(\omega)$ of the first channel minimum phase component and the frequency spectrum $Y_{2\min}(\omega)$ of the second channel minimum phase component;
calculating, from the frequency spectrum $Y_{1\min}(\omega)$ of the first channel minimum phase component and the frequency spectrum $Y_{2\min}(\omega)$ of the second channel minimum phase component, the frequency spectrum $Y_{1all}(\omega)$ of the first channel all-pass component signal and the frequency spectrum $Y_{2all}(\omega)$ of the second channel all-pass component signal;
the calculating an improved all-pass component frequency spectrum by using the modulus of the frequency spectrum of the minimum phase component and the frequency spectrum of the all-pass component signal, and calculating an improved phase weighting function according to the improved all-pass component frequency spectrum, specifically:
obtaining the modulus $|Y_{1\min}(\omega)|$ of the frequency spectrum of the first channel minimum phase component from the frequency spectrum $Y_{1\min}(\omega)$ of the first channel minimum phase component, and obtaining the modulus $|Y_{2\min}(\omega)|$ of the frequency spectrum of the second channel minimum phase component from the frequency spectrum $Y_{2\min}(\omega)$ of the second channel minimum phase component;
using the modulus $|Y_{1\min}(\omega)|$ of the frequency spectrum of the first channel minimum phase component and the frequency spectrum $Y_{1all}(\omega)$ of the first channel all-pass component signal to obtain the frequency spectrum $Y_{1\text{-}nall}(\omega)$ of the first channel improved all-pass component signal; wherein $Y_{1\text{-}nall}(\omega) = Y_{1all}(\omega) \cdot |Y_{1\min}(\omega)|$;
using the modulus $|Y_{2\min}(\omega)|$ of the frequency spectrum of the second channel minimum phase component and the frequency spectrum $Y_{2all}(\omega)$ of the second channel all-pass component signal to obtain the frequency spectrum $Y_{2\text{-}nall}(\omega)$ of the second channel improved all-pass component signal; wherein $Y_{2\text{-}nall}(\omega) = Y_{2all}(\omega) \cdot |Y_{2\min}(\omega)|$;
using the frequency spectrum $Y_{1\text{-}nall}(\omega)$ of the first channel improved all-pass component signal and the frequency spectrum $Y_{2\text{-}nall}(\omega)$ of the second channel improved all-pass component signal to calculate the improved phase weighting function [symbol image FDA0002558253080000021 not reproduced], wherein
[Formula image FDA0002558253080000022 (improved phase weighting function) not reproduced]
2. The method for estimating time delay suitable for sound source localization according to claim 1, wherein the signal processing is performed on the first channel speech signal and the second channel speech signal, specifically:
letting the first channel voice signal be $x_1(t)$ and the second channel voice signal be $x_2(t)$;
performing voice endpoint detection on the filtered and framed signals $x_1(t)$ and $x_2(t)$, and selecting the same voice frame from both channels to obtain the corresponding $y_1(t)$ and $y_2(t)$;
performing a discrete Fourier transform on $y_1(t)$ and $y_2(t)$ to obtain the corresponding $Y_1(\omega)$ and $Y_2(\omega)$;
obtaining the first channel complex cepstrum $k_1(n)$ and the second channel complex cepstrum $k_2(n)$ from $Y_1(\omega)$ and $Y_2(\omega)$; wherein $k_1(n) = \mathrm{IFFT}(\ln|Y_1(\omega)|)$, $k_2(n) = \mathrm{IFFT}(\ln|Y_2(\omega)|)$, and IFFT is the inverse fast Fourier transform;
performing homomorphic filtering on the first channel complex cepstrum $k_1(n)$ and the second channel complex cepstrum $k_2(n)$ to obtain, respectively, the complex cepstrum $k_{1\min}(n)$ of the first channel minimum phase component and the complex cepstrum $k_{2\min}(n)$ of the second channel minimum phase component;
wherein $k_{1\min}(n) = u \cdot k_1(n)$, $k_{2\min}(n) = u \cdot k_2(n)$,
[Formula image FDA0002558253080000031 (definition of $u$) not reproduced]
and $N$ is the number of points of the Fourier transform.
3. The method for estimating time delay suitable for sound source localization according to claim 2, wherein the calculating the frequency spectrum of the minimum phase component of the signal and the frequency spectrum of the all-pass component signal according to the complex cepstrum of the minimum phase component specifically comprises:
calculating, from the complex cepstrum $k_{1\min}(n)$ of the first channel minimum phase component and the complex cepstrum $k_{2\min}(n)$ of the second channel minimum phase component, the frequency spectrum $Y_{1\min}(\omega)$ of the first channel minimum phase component and the frequency spectrum $Y_{2\min}(\omega)$ of the second channel minimum phase component, wherein
[Formula image FDA0002558253080000032 (expressions for $Y_{1\min}(\omega)$ and $Y_{2\min}(\omega)$) not reproduced]
and FFT is the fast Fourier transform;
calculating, from the frequency spectrum $Y_{1\min}(\omega)$ of the first channel minimum phase component and the frequency spectrum $Y_{2\min}(\omega)$ of the second channel minimum phase component, the frequency spectrum $Y_{1all}(\omega)$ of the first channel all-pass component signal and the frequency spectrum $Y_{2all}(\omega)$ of the second channel all-pass component signal, wherein $Y_{1all}(\omega) = Y_1(\omega)/Y_{1\min}(\omega)$ and $Y_{2all}(\omega) = Y_2(\omega)/Y_{2\min}(\omega)$.
4. The method for time delay estimation suitable for sound source localization according to claim 3, wherein the combining the modified all-pass component spectrum and the modified phase weighting function to calculate a cross-power spectrum comprises:
multiplying the frequency spectrum $Y_{1\text{-}nall}(\omega)$ of the first channel improved all-pass component signal and the frequency spectrum $Y_{2\text{-}nall}(\omega)$ of the second channel improved all-pass component signal by the improved phase weighting function [symbol image FDA0002558253080000041 not reproduced] to calculate the cross-power spectrum [symbol image FDA0002558253080000042 not reproduced], wherein
[Formula image FDA0002558253080000043 (cross-power spectrum) not reproduced]
5. The method for time delay estimation suitable for sound source localization according to claim 4, wherein the cross-correlation function of the cross-power spectrum is solved by an inverse fast Fourier transform method, and the delay time is calculated according to the cross-correlation function, specifically:
computing the cross-correlation function $R_{12}(\tau)$ of the cross-power spectrum $G_{12}(\omega)$ by inverse fast Fourier transform, and obtaining the delay time after sampling, $\tau_{\max}$; wherein $R_{12}(\tau) = \mathrm{IFFT}(G_{12}(\omega))$, $\tau_{\max} = \arg\max_{\tau} R_{12}(\tau)$, and IFFT is the inverse fast Fourier transform;
computing, from the delay time after sampling $\tau_{\max}$, the delay time before sampling $delay_{12}$, wherein $delay_{12} = \tau_{\max} \cdot f_s$ and $f_s$ is the sampling frequency.
CN201910242080.4A 2019-03-28 2019-03-28 Time delay estimation method suitable for sound source positioning Active CN109901114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910242080.4A CN109901114B (en) 2019-03-28 2019-03-28 Time delay estimation method suitable for sound source positioning

Publications (2)

Publication Number Publication Date
CN109901114A CN109901114A (en) 2019-06-18
CN109901114B true CN109901114B (en) 2020-10-27

Family

ID=66953085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910242080.4A Active CN109901114B (en) 2019-03-28 2019-03-28 Time delay estimation method suitable for sound source positioning

Country Status (1)

Country Link
CN (1) CN109901114B (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant