CN108269581B - Double-microphone time delay difference estimation method based on frequency domain coherent function - Google Patents
Double-microphone time delay difference estimation method based on frequency domain coherent function
- Publication number
- CN108269581B (application CN201710004194.6A)
- Authority
- CN
- China
- Prior art keywords
- coherent
- function
- frequency domain
- calculating
- signals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
- Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
Abstract
The invention discloses a double-microphone time-delay-difference estimation method based on the frequency-domain coherence function, comprising the following steps: step 1) before any sound-source signal is received by the two microphones, calculating the coherence functions of the double-microphone signals for different azimuth angles, extracting the real and imaginary parts of the coherence function corresponding to each angle, and establishing a database of coherence-function feature quantities; step 2) transforming the signals received by the two microphones into the frequency domain, calculating the coherence function of the two signals in the frequency domain, performing peak smoothing at each frequency bin according to the modulus of the coherence function, then extracting the real and imaginary parts of the coherence function, matching them in the coherence-function feature database to obtain the sound-source azimuth angle, and calculating the time-delay difference. By selecting, frame by frame, the frequency bins at which the modulus of the coherence function is large, the method effectively suppresses the interference of background noise and reverberation and improves the accuracy of the double-microphone time-delay-difference estimation.
Description
Technical Field
The invention relates to the field of speech-signal time-delay estimation, and in particular to a double-microphone time-delay-difference estimation method based on the frequency-domain coherence function.
Background
Time-delay-difference estimation plays an important role in algorithms such as sound-source localization and speech enhancement for small microphone arrays. The time-delay difference is the difference in arrival time of the same source signal at the different microphones, caused by their different propagation distances.
Known methods estimate the delay difference by locating the peak of the generalized cross-correlation function (reference [1]: Knapp C, Carter G. The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1976, 24(4): 320-327; reference [2]: Liu C, Wheeler B C, O'Brien W D Jr, et al. Localization of multiple sound sources with two microphones. Journal of the Acoustical Society of America, 2000, 108(4): 1888-1905; reference [3]: [bibliographic details garbled; Journal of the Acoustical Society of America, 2000, 108: 1888-1905]; reference [4]: [bibliographic details garbled; IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1983: 1148-1151]; reference [5]: Lindemann W. Extension of a binaural cross-correlation model by contralateral inhibition. I. Simulation of lateralization for stationary signals. Journal of the Acoustical Society of America, 1986, 80(6): 1608-1622). In quiet environments such methods give a good estimate of the delay difference, but under background noise and reverberation the accuracy of the delay-difference estimate degrades sharply.
Disclosure of Invention
The invention aims to overcome the shortcomings of prior-art time-delay-difference estimation methods. It exploits the fact that the modulus of the coherence function of direct speech equals 1, whereas the modulus of the coherence function of noise and reverberant segments is generally less than 1. The peak of the coherence function is smoothed in the frequency domain using the modulus of the coherence function, so that after peak smoothing each frequency bin of the coherence function is dominated by the direct sound; in other words, the peak-smoothed coherence function filters out part of the interference from background noise and reverberation. The peak-smoothed coherence function is then matched against the ideal coherence functions to find the azimuth angle whose ideal coherence function is closest, and the time-delay difference is finally computed from that azimuth, thereby remedying the deficiencies of existing time-delay-difference estimation techniques.
In order to achieve the above object, the present invention provides a double-microphone time-delay-difference estimation method based on the frequency-domain coherence function, the method comprising:
step 1) before any sound-source signal is received by the two microphones, calculating the coherence functions of the double-microphone signals for different azimuth angles, extracting the real part and the imaginary part of the coherence function corresponding to each angle, and establishing a database of coherence-function feature quantities;
and step 2) transforming the signals received by the two microphones into the frequency domain, calculating the coherence function of the two signals in the frequency domain, performing peak smoothing at each frequency bin according to the modulus of the coherence function, then extracting the real part and the imaginary part of the coherence function, matching them in the coherence-function feature database to obtain the sound-source azimuth angle, and calculating the time-delay difference.
In the above technical solution, the step 1) specifically includes:
step 1-1) varying the azimuth angle of the sound source from 0° to 180° in increments of 7.5°, and calculating the 25 coherence functions of the double-microphone signals under ideal conditions;
the coherence function is:
Γ(ω, θ) = cos(ω·τ·cos θ) + j·sin(ω·τ·cos θ) = e^(j·ω·τ·cos θ), with τ = d·fs/c,
wherein d is the distance between the two microphones, c = 340 m/s is the speed of sound, θ is the azimuth angle of the sound source, ω is the angular frequency and fs is the sampling rate;
step 1-2) extracting the real part and the imaginary part of each of the 25 coherence functions;
step 1-3) using the K-nearest-neighbour (KNN) classification algorithm to obtain the feature quantity of the coherence function at each angle: the imaginary and real parts are classified, each angle corresponding to one class, 25 classes in total, with the classification labels set to 1-25, thereby establishing the coherence-function feature database.
In the above technical solution, the step 2) specifically includes:
step 2-1) framing and windowing the signals received by the two microphones and then transforming each frame into the frequency domain by FFT; the frequency-domain signals are denoted X1(λ, μ) and X2(λ, μ), wherein λ is the time-frame index and μ is the frequency-bin index;
step 2-2) calculating a coherent function of the two signals on a frequency domain;
the coherence function is calculated as:
ΓX1X2(λ, μ) = PX1X2(λ, μ) / √( PX1X1(λ, μ) · PX2X2(λ, μ) ),
where PX1X1(λ, μ) is the auto-power spectrum of the signal X1(λ, μ), PX2X2(λ, μ) is the auto-power spectrum of the signal X2(λ, μ), and PX1X2(λ, μ) is the cross-power spectrum of the two signals:
PX1X1(λ, μ) = α·PX1X1(λ−1, μ) + (1 − α)·|X1(λ, μ)|²
PX2X2(λ, μ) = α·PX2X2(λ−1, μ) + (1 − α)·|X2(λ, μ)|²
PX1X2(λ, μ) = α·PX1X2(λ−1, μ) + (1 − α)·X1(λ, μ)·X2*(λ, μ)
where α is a smoothing factor and X2*(λ, μ) denotes the complex conjugate of X2(λ, μ);
step 2-3) calculating the modulus |ΓX1X2(λ, μ)| of the coherence function and performing peak smoothing at each frequency bin to obtain the smoothed peak value Peak′(λ, μ);
step 2-4) extracting, in the frequency domain, the real part and the imaginary part of the peak-smoothed coherence function; using these two values, matching against the coherence-function feature database with the K-nearest-neighbour (KNN) classification algorithm, so that each frame of the signal yields a matching result; the matching result is the index of a classification label, from which the corresponding azimuth angle θ0 is obtained; the time-delay difference Time is then:
Time = d·cos(θ0)/c
in the above technical solution, the value of the smoothing factor α in the step 2-2) is 0.68.
In the above technical solution, the specific implementation of step 2-3) is as follows:
computing the modulus |ΓX1X2(λ, μ)| of the coherence function and obtaining the peak value Peak(λ, μ) of each frequency bin; if |ΓX1X2(λ, μ)| is greater than |Peak(λ, μ)|, Peak(λ, μ) is smoothed with the smoothing factor α1 = 0.35 to give the smoothed peak value Peak′(λ, μ);
if |ΓX1X2(λ, μ)| is smaller than |Peak(λ, μ)|, the smoothing factor α2 = 0.95 is used instead; the initial value of Peak(λ, μ) is ΓX1X2(1, μ), i.e. the coherence function of the first speech frame.
The invention has the following advantage: by selecting, frame by frame, the frequency bins at which the modulus of the coherence function is large, the method effectively suppresses the interference of background noise and reverberation and improves the accuracy of the double-microphone time-delay-difference estimation.
Drawings
FIG. 1 is a schematic diagram of the double-microphone time-delay-difference estimation scenario of the present invention;
FIG. 2 is a schematic diagram of the present invention for building an ideal frequency domain coherence function database;
FIG. 3 is a schematic representation of the real and imaginary parts of an ideal coherence function at three different azimuths;
fig. 4 is a flow chart of step 2) of the method of the invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
The time-delay estimation method based on the frequency-domain coherence function estimates the time-delay difference from a coherence function obtained in the frequency domain in real time. The modulus of the coherence function (which equals 1 for the target direct sound and is smaller for background noise and for speech segments strongly affected by reverberation) is used to judge the reliability of each frequency bin; concretely, a peak-smoothing function is used to obtain a reliable coherence function, which filters out the influence of background noise and reverberant sound to a certain extent. The result is then matched against the pre-built database of ideal coherence-function features at different angles; the best match gives the current angle, from which the time-delay difference is obtained. The method is suitable for double-microphone sound-source localization, source separation and similar devices.
FIG. 1 is a schematic diagram of the double-microphone time-delay-difference estimation scenario: the two microphones are separated by a distance d, the sound source lies at an azimuth angle θ relative to the microphone pair, and the time-delay difference between the two microphones is d·cos(θ)/c.
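For example, with d = 0.255 m (the spacing used in FIG. 3) and θ = 60°, the time-delay difference is 0.255 · cos 60° / 340 ≈ 0.375 ms, i.e. about 6 samples at a 16 kHz sampling rate.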
A two-microphone time delay difference estimation method based on a frequency domain coherence function comprises the following steps:
step 1) before any sound-source signal is received by the two microphones, calculating the coherence functions of the double-microphone signals for different azimuth angles, extracting the real part and the imaginary part of the coherence function corresponding to each angle, and establishing a database of coherence-function feature quantities;
as shown in fig. 2, the step 1) specifically includes:
step 1-1) varying the azimuth angle of the sound source from 0° to 180° in increments of 7.5°, and calculating the 25 ideal coherence functions of the double-microphone signals;
the coherence function is:
Γ(ω, θ) = cos(ω·τ·cos θ) + j·sin(ω·τ·cos θ) = e^(j·ω·τ·cos θ), with τ = d·fs/c,
wherein d is the distance between the two microphones, c = 340 m/s is the speed of sound, θ is the azimuth angle of the sound source, ω is the angular frequency, and fs is the sampling rate, which is 16000 Hz in this embodiment.
The azimuth angle of the sound source has 25 values, and 25 different coherence functions can be obtained in total.
Step 1-2) extracting the real part and the imaginary part of each of the 25 coherence functions;
the real part is cos(ω·τ·cos θ);
the imaginary part is sin(ω·τ·cos θ);
from the real-part and imaginary-part expressions it follows that, in an ideal environment (only the target sound source, with no interference from background noise or reverberation), the modulus of the coherence function is √(cos²(ω·τ·cos θ) + sin²(ω·τ·cos θ)) = 1.
step 1-3) utilizing a K nearest KNN (K-nearest neighbor) classification algorithm to obtain the characteristic quantity of the coherence function of each angle: and classifying the imaginary part and the real part, wherein each angle corresponds to one class, namely 25 classes in total, and the classification label is set to be 1-25, so that a coherence function characteristic quantity database is established.
FIG. 3 shows the real and imaginary parts of the ideal coherence function at three different azimuth angles; the microphone distance in this figure is 0.255 m. As can be seen from FIG. 3, the real and imaginary parts of the coherence function differ markedly between azimuth angles, and the invention exploits this property to make the classification decision.
And step 2) transforming the signals received by the two microphones into the frequency domain, calculating the coherence function of the two signals in the frequency domain, performing peak smoothing at each frequency bin according to the modulus of the coherence function, then extracting the real part and the imaginary part of the coherence function, matching them in the coherence-function feature database to obtain the sound-source azimuth angle, and calculating the time-delay difference.
As shown in fig. 4, the step 2) specifically includes:
step 2-1) framing and windowing the signals received by the two microphones and then transforming each frame into the frequency domain by FFT; the frequency-domain signals are denoted X1(λ, μ) and X2(λ, μ), wherein λ is the time-frame index and μ is the frequency-bin index;
In this embodiment the received signals x1 and x2 of the two microphones are framed, windowed and transformed by FFT; at a sampling rate of 16000 Hz each frame contains 512 samples and the frame shift is 128 samples.
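A minimal framing/windowing/FFT sketch matching these parameters (a Hann window is assumed here; the patent does not specify the window type) might look like this:

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=128):
    """Frame, window (Hann assumed) and FFT a single-channel signal.

    Returns X[lambda, mu]: one row per time frame, one column per frequency bin.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    X = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for lam in range(n_frames):
        frame = x[lam * hop : lam * hop + frame_len] * window
        X[lam] = np.fft.rfft(frame)
    return X

# X1 = stft_frames(x1); X2 = stft_frames(x2)   # x1, x2: the two microphone signals
```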
Step 2-2) calculating a coherent function of the two signals on a frequency domain;
the coherence function is calculated as:
ΓX1X2(λ, μ) = PX1X2(λ, μ) / √( PX1X1(λ, μ) · PX2X2(λ, μ) ),
where PX1X1(λ, μ) is the auto-power spectrum of the signal X1(λ, μ), PX2X2(λ, μ) is the auto-power spectrum of the signal X2(λ, μ), and PX1X2(λ, μ) is the cross-power spectrum of the two signals:
PX1X1(λ, μ) = α·PX1X1(λ−1, μ) + (1 − α)·|X1(λ, μ)|²
PX2X2(λ, μ) = α·PX2X2(λ−1, μ) + (1 − α)·|X2(λ, μ)|²
PX1X2(λ, μ) = α·PX1X2(λ−1, μ) + (1 − α)·X1(λ, μ)·X2*(λ, μ)
where α is a smoothing factor, taking the value 0.68, and X2*(λ, μ) denotes the complex conjugate of X2(λ, μ).
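The recursive spectral estimates and the resulting coherence can be sketched as follows (illustrative only; the cross-spectrum update uses the conjugate product X1·X2*, so that the coherence keeps its phase, and a small eps guards against division by zero):

```python
import numpy as np

def coherence_per_frame(X1, X2, alpha=0.68):
    """Recursively smoothed auto-/cross-power spectra and complex coherence.

    X1, X2: (n_frames, n_bins) STFTs of the two microphone signals.
    Returns Gamma[lambda, mu], the coherence function of each frame.
    """
    eps = 1e-12
    P11 = np.abs(X1[0]) ** 2
    P22 = np.abs(X2[0]) ** 2
    P12 = X1[0] * np.conj(X2[0])
    Gamma = np.empty_like(X1)
    Gamma[0] = P12 / np.sqrt(P11 * P22 + eps)
    for lam in range(1, X1.shape[0]):
        P11 = alpha * P11 + (1 - alpha) * np.abs(X1[lam]) ** 2
        P22 = alpha * P22 + (1 - alpha) * np.abs(X2[lam]) ** 2
        P12 = alpha * P12 + (1 - alpha) * X1[lam] * np.conj(X2[lam])
        Gamma[lam] = P12 / np.sqrt(P11 * P22 + eps)
    return Gamma
```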
Step 2-3) calculating the modulus |ΓX1X2(λ, μ)| of the coherence function and performing peak smoothing at each frequency bin: the peak value Peak(λ, μ) of each frequency bin is obtained; if |ΓX1X2(λ, μ)| is larger than |Peak(λ, μ)|, Peak(λ, μ) is smoothed with the smoothing factor α1 = 0.35 to give the smoothed peak value Peak′(λ, μ); if |ΓX1X2(λ, μ)| is smaller than |Peak(λ, μ)|, the smoothing factor α2 = 0.95 is used instead; the initial value of Peak(λ, μ) is ΓX1X2(1, μ), i.e. the coherence function of the first speech frame.
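The exact smoothing update equations are not reproduced in the text above, so the following is only one plausible attack/release smoother consistent with the stated constants: the peak rises quickly (α1 = 0.35) when |Γ| exceeds it, decays slowly (α2 = 0.95) otherwise, and is initialised with the first frame's coherence. This is an assumption, not the patent's verbatim formula:

```python
import numpy as np

def smooth_peaks(Gamma, alpha1=0.35, alpha2=0.95):
    """Attack/release smoothing of the coherence magnitude per frequency bin.

    Gamma: (n_frames, n_bins) complex coherence.  The update rule below is an
    editorial assumption: the peak rises quickly (alpha1) when |Gamma| exceeds
    it and decays slowly (alpha2) otherwise.
    """
    peak = np.abs(Gamma[0]).copy()            # initialised with the first speech frame
    peaks = np.empty(Gamma.shape, dtype=float)
    peaks[0] = peak
    for lam in range(1, Gamma.shape[0]):
        mag = np.abs(Gamma[lam])
        rise = mag > peak
        peak = np.where(rise,
                        alpha1 * peak + (1 - alpha1) * mag,   # fast attack
                        alpha2 * peak + (1 - alpha2) * mag)   # slow release
        peaks[lam] = peak
    return peaks
```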
Step 2-4) extracting, in the frequency domain, the real part and the imaginary part of the peak-smoothed coherence function; using these two values, matching against the coherence-function feature database with the K-nearest-neighbour (KNN) classification algorithm, each frame of the signal obtaining a matching result; the matching result is the index of a classification label, from which the corresponding azimuth angle θ0 is obtained; the time-delay difference is then:
Time = d·cos(θ0)/c
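The on-line matching stage could then be sketched as below (assuming the classifier and azimuth grid from the off-line sketch above; treating the real and imaginary parts of the peak-smoothed coherence as the per-frame feature vector mirrors the off-line stage and is an editorial assumption):

```python
import numpy as np

def estimate_delay(Gamma, knn, thetas, d=0.255, c=340.0):
    """Match each frame's coherence features against the ideal database
    and convert the winning azimuth to a time-delay difference.

    Gamma: (n_frames, n_bins) peak-smoothed coherence (assumption: its real
    and imaginary parts form the feature vector, as in the off-line stage).
    Returns the delay difference in seconds for each frame.
    """
    feats = np.concatenate([Gamma.real, Gamma.imag], axis=1)
    labels = knn.predict(feats)                  # class labels 1..25
    theta0 = thetas[labels - 1]                  # matched azimuth per frame (radians)
    return d * np.cos(theta0) / c                # delay difference in seconds; multiply by fs for samples
```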
the invention fully utilizes the characteristics of a real part and an imaginary part of a frequency domain coherent function to construct a characteristic vector to classify sound sources with different azimuth angles, and the algorithm realizes the azimuth judgment of each frame of voice signals through off-line modeling and on-line prediction, namely delay difference estimation. Meanwhile, the algorithm considers the interference problem under the background noise and reverberation environment, utilizes the module value of the frequency domain coherent function to smooth the peak value of the coherent function, and effectively ensures the effectiveness of the characteristic vector at the online prediction stage. The algorithm is clear in idea, simple and effective. The method is convenient to realize in real time in devices such as microphone arrays and the like.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the invention and are not limiting. Although the invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (1)
1. A two-microphone time delay difference estimation method based on a frequency domain coherence function, the method comprising:
step 1) before any sound-source signal is received by the two microphones, calculating the coherence functions of the two-microphone signals for different azimuth angles, extracting the real part and the imaginary part of the coherence function corresponding to each angle, and establishing a database of coherence-function feature quantities;
step 2) transforming the signals received by the two microphones into the frequency domain, calculating the coherence function of the two signals in the frequency domain, performing peak smoothing at each frequency bin according to the modulus of the coherence function, then extracting the real part and the imaginary part of the coherence function, matching them in the coherence-function feature database to obtain the sound-source azimuth angle, and calculating the time-delay difference;
the step 1) specifically comprises the following steps:
step 1-1) varying the azimuth angle of the sound source from 0° to 180° in increments of 7.5°, and calculating the 25 coherence functions of the two-microphone signals under ideal conditions;
the coherence function is:
Γ(ω, θ) = cos(ω·τ·cos θ) + j·sin(ω·τ·cos θ) = e^(j·ω·τ·cos θ), with τ = d·fs/c,
wherein d is the distance between the two microphones, c = 340 m/s is the speed of sound, θ is the azimuth angle of the sound source, ω is the angular frequency and fs is the sampling rate;
step 1-2) extracting the real part and the imaginary part of each of the 25 coherence functions;
step 1-3) using the K-nearest-neighbour (KNN) classification algorithm to obtain the feature quantity of the coherence function at each angle: the imaginary and real parts are classified, each angle corresponding to one class, 25 classes in total, with the classification labels set to 1-25, thereby establishing the coherence-function feature database;
the step 2) specifically comprises the following steps:
step 2-1) framing and windowing the signals received by the two microphones and transforming each frame into the frequency domain by FFT; the frequency-domain signals are denoted X1(λ, μ) and X2(λ, μ), wherein λ is the time-frame index and μ is the frequency-bin index;
step 2-2) calculating a coherent function of the two signals on a frequency domain;
the coherence function is calculated as:
ΓX1X2(λ, μ) = PX1X2(λ, μ) / √( PX1X1(λ, μ) · PX2X2(λ, μ) ),
wherein PX1X1(λ, μ) is the auto-power spectrum of the signal X1(λ, μ), PX2X2(λ, μ) is the auto-power spectrum of the signal X2(λ, μ), and PX1X2(λ, μ) is the cross-power spectrum of the two signals:
PX1X1(λ, μ) = α·PX1X1(λ−1, μ) + (1 − α)·|X1(λ, μ)|²
PX2X2(λ, μ) = α·PX2X2(λ−1, μ) + (1 − α)·|X2(λ, μ)|²
PX1X2(λ, μ) = α·PX1X2(λ−1, μ) + (1 − α)·X1(λ, μ)·X2*(λ, μ)
wherein α is a smoothing factor and X2*(λ, μ) denotes the complex conjugate of X2(λ, μ);
step 2-3) calculating the modulus |ΓX1X2(λ, μ)| of the coherence function and performing peak smoothing at each frequency bin to obtain the smoothed peak value Peak′(λ, μ);
step 2-4) extracting, in the frequency domain, the real part and the imaginary part of the peak-smoothed coherence function; using these two values, matching against the coherence-function feature database with the K-nearest-neighbour (KNN) classification algorithm, each frame of the signal obtaining a matching result; the matching result is the index of a classification label, from which the corresponding azimuth angle θ0 is obtained; the time-delay difference Time is then:
Time = d·cos(θ0)/c;
the value of the smoothing factor alpha in the step 2-2) is 0.68;
the specific implementation of step 2-3) is as follows:
computing the modulus |ΓX1X2(λ, μ)| of the coherence function and obtaining the peak value Peak(λ, μ) of each frequency bin; if |ΓX1X2(λ, μ)| is greater than |Peak(λ, μ)|, Peak(λ, μ) is smoothed with the smoothing factor α1 = 0.35 to give the smoothed peak value Peak′(λ, μ);
if |ΓX1X2(λ, μ)| is smaller than |Peak(λ, μ)|, the smoothing factor α2 = 0.95 is used instead; wherein the initial value of Peak(λ, μ) is equal to ΓX1X2(1, μ), i.e. the coherence function of the first speech frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710004194.6A CN108269581B (en) | 2017-01-04 | 2017-01-04 | Double-microphone time delay difference estimation method based on frequency domain coherent function |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710004194.6A CN108269581B (en) | 2017-01-04 | 2017-01-04 | Double-microphone time delay difference estimation method based on frequency domain coherent function |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108269581A CN108269581A (en) | 2018-07-10 |
CN108269581B true CN108269581B (en) | 2021-06-08 |
Family
ID=62771665
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710004194.6A Active CN108269581B (en) | 2017-01-04 | 2017-01-04 | Double-microphone time delay difference estimation method based on frequency domain coherent function |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108269581B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112992176B (en) * | 2020-09-30 | 2022-07-19 | 北京海兰信数据科技股份有限公司 | Ship acoustic signal identification method and device |
CN112526452B (en) * | 2020-11-24 | 2024-08-06 | 杭州萤石软件有限公司 | Sound source detection method, pan-tilt camera, intelligent robot and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9354310B2 (en) * | 2011-03-03 | 2016-05-31 | Qualcomm Incorporated | Systems, methods, apparatus, and computer-readable media for source localization using audible sound and ultrasound |
- 2017-01-04: application CN201710004194.6A filed in China (CN); granted as patent CN108269581B, status Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101494522A (en) * | 2008-12-30 | 2009-07-29 | 清华大学 | Method for eliminating wireless signal interference based on network encode |
CN102854494A (en) * | 2012-08-08 | 2013-01-02 | Tcl集团股份有限公司 | Sound source locating method and device |
CN103076593A (en) * | 2012-12-28 | 2013-05-01 | 中国科学院声学研究所 | Sound source localization method and device |
CN104101871A (en) * | 2013-04-15 | 2014-10-15 | 中国科学院声学研究所 | Narrowband interference suppression method and narrowband interference suppression system used for passive synthetic aperture |
JP2016156944A (en) * | 2015-02-24 | 2016-09-01 | 日本電信電話株式会社 | Model estimation device, target sound enhancement device, model estimation method, and model estimation program |
Non-Patent Citations (4)
Title |
---|
A binaural sound source localization model based on time-delay compensation and interaural coherence; Hong Liu, Jie Zhang; ICASSP; 2014-12-31; pp. 1424-1428 *
A binaural speech enhancement algorithm: Application to background and directional noise fields; Yi Fang, Youyuan Chen, Haihong Feng; CISP 2015; 2015-12-31; pp. 1261-1265 *
A binaural speech enhancement algorithm for suppressing directional noise (一种抑制方向性噪声的双耳语音增强算法); 方义, 冯海泓, 陈友元, 胡晓城; Acta Acustica (声学学报); 2016-11-30; vol. 41, no. 6, pp. 897-904 *
A frequency-domain adaptive maximum-likelihood time delay estimation algorithm (一种频域自适应最大似然时延估计算法); 陈华伟, 赵俊渭, 郭业才; Systems Engineering and Electronics (系统工程与电子技术); 2003-11-30; vol. 25, no. 11, pp. 1355-1361 *
Also Published As
Publication number | Publication date |
---|---|
CN108269581A (en) | 2018-07-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106251877B (en) | Voice Sounnd source direction estimation method and device | |
CN108731886B (en) | A kind of more leakage point acoustic fix ranging methods of water supply line based on iteration recursion | |
WO2019080551A1 (en) | Target voice detection method and apparatus | |
CN111429939B (en) | Sound signal separation method of double sound sources and pickup | |
CN110534126B (en) | Sound source positioning and voice enhancement method and system based on fixed beam forming | |
CN102411138A (en) | Method for positioning sound source by robot | |
Pavlidi et al. | Real-time multiple sound source localization using a circular microphone array based on single-source confidence measures | |
CN111239687A (en) | Sound source positioning method and system based on deep neural network | |
CN111044973A (en) | MVDR target sound source directional pickup method for microphone matrix | |
CN109188362A (en) | A kind of microphone array auditory localization signal processing method | |
Ren et al. | A novel multiple sparse source localization using triangular pyramid microphone array | |
CN103901400B (en) | A kind of based on delay compensation and ears conforming binaural sound source of sound localization method | |
CN109212481A (en) | A method of auditory localization is carried out using microphone array | |
CN108269581B (en) | Double-microphone time delay difference estimation method based on frequency domain coherent function | |
Imran et al. | A methodology for sound source localization and tracking: Development of 3D microphone array for near-field and far-field applications | |
Wang et al. | Pseudo-determined blind source separation for ad-hoc microphone networks | |
Hu et al. | Decoupled direction-of-arrival estimations using relative harmonic coefficients | |
WO2022042864A1 (en) | Method and apparatus for measuring directions of arrival of multiple sound sources | |
Hu et al. | Evaluation and comparison of three source direction-of-arrival estimators using relative harmonic coefficients | |
Grondin et al. | A study of the complexity and accuracy of direction of arrival estimation methods based on GCC-PHAT for a pair of close microphones | |
Nikunen et al. | Time-difference of arrival model for spherical microphone arrays and application to direction of arrival estimation | |
Firoozabadi et al. | Combination of nested microphone array and subband processing for multiple simultaneous speaker localization | |
Dang et al. | Multiple sound source localization based on a multi-dimensional assignment model | |
Deleforge et al. | Audio-motor integration for robot audition | |
Sledevič et al. | An evaluation of hardware-software design for sound source localization based on SoC |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |