CN114639398B

CN114639398B - Broadband DOA estimation method based on microphone array

Info

Publication number: CN114639398B
Application number: CN202210240262.XA
Authority: CN
Inventors: 黄际彦; 慕方方; 周杨; 李汉君; 王珍; 马敏
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2022-03-10
Filing date: 2022-03-10
Publication date: 2023-05-26
Anticipated expiration: 2042-03-10
Also published as: CN114639398A

Abstract

The invention belongs to the field of array signal processing, and particularly relates to a broadband DOA estimation method based on a microphone array. The invention first preprocesses the voice signals received by the microphone array with a voice endpoint detection algorithm based on the correlation method, extracts the voice segment, and then processes only that segment, which saves processing time. The preprocessed time-domain signal is then converted to the frequency domain, and frequencies are selected according to a set threshold. At each selected frequency, a narrowband DOA estimation technique is used to calculate the output signal of the microphone array, and the DOA corresponding to the peak of the output signal is found. Finally, the DOAs obtained at all selected frequencies are counted and clustered, and the clustering result is the DOA estimate.

Description

Broadband DOA estimation method based on microphone array

Technical Field

The invention belongs to the field of array signal processing, and particularly relates to a broadband DOA estimation method based on a microphone array.

Background

DOA estimation is an important foundation of microphone array signal processing; it has been studied intensively at home and abroad, with excellent technical results. Narrowband techniques are now mature, but as environments become more complex and signal bandwidths grow, broadband signals carry more complex information than narrowband signals, so research on DOA estimation for broadband signals is necessary. In a complex sound source environment, several sound sources may be present within a short time. When the delay-and-sum algorithm (Bartlett) or the classical MUSIC algorithm is used for DOA estimation on a very short recording in which speech may not be present throughout, not all of the signal sources can be estimated.

Disclosure of Invention

The invention aims to provide a broadband DOA estimation method based on a microphone array that addresses two separate problems: DOA estimation for short-time, dual-source broadband signals, and the variation in clustering results caused by a manually chosen number of clusters.

The invention mainly comprises the following steps. First, the voice signal received by the microphone array is preprocessed with a voice endpoint detection algorithm. Voice endpoint detection detects the speech segment and extracts it, so that only the speech segment is processed and processing time is saved. The invention uses the correlation method to detect the voice endpoints. Next, the preprocessed time-domain signal is converted to the frequency domain, and frequencies are selected according to a set threshold. Then, at each selected frequency, the output signal of the microphone array is obtained using a narrowband DOA estimation technique. Finally, the angle corresponding to the peak of the output signal is found, the DOAs at all frequencies are counted and clustered, and the final clustering result is the DOA estimate.

The specific technical scheme of the invention is as follows:

A wideband DOA estimation method based on a microphone array, characterized in that the frequency information in the wideband signal is used: the angles estimated at the individual frequencies are counted, and the DOA resolution is higher at high frequencies. The method comprises the following steps:

S1, voice signal preprocessing

A piece of audio data may not contain speech at all times; at some moments only noise may be present. Voice endpoint detection is used to address this problem: it detects the speech segment and extracts it, so that only the speech segment is processed and processing time is saved. Common endpoint detection methods are the double-threshold method, the correlation method and the spectral entropy method. The present invention uses the correlation method for endpoint detection.

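The short-time autocorrelation function $R_n(k)$ of the voice signal $x_n(m)$ is calculated as

$$R_n(k) = \sum_{m=0}^{N_f-1-k} x_n(m)\, x_n(m+k), \quad 0 \le k \le K$$

where $N_f$ denotes the number of samples in the frame and $K$ represents the maximum delay point number.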

In order to avoid influence caused by absolute energy in the voice endpoint detection process, normalization processing is performed on the autocorrelation function as follows:

$$R_n(k) = R_n(k) / R_n(0), \quad 0 \le k \le K$$

During voice endpoint detection, the autocorrelation is first computed on the noise segment of the current voice data, and two critical values $T_1$ and $T_2$ are determined from it. The audio data is then divided into frames; a segment is judged to be a speech segment when the autocorrelation maximum of its starting frame rises above one critical value and the autocorrelation maximum of its ending frame falls below the other. The result of voice endpoint detection for a segment of audio data using the autocorrelation function maximum is shown in Fig. 4.
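The endpoint detection described in step S1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the frame length, the exclusion of lag 0 from the maximum (after normalization lag 0 is always 1), and the use of the two critical values as a hysteresis pair (`t_high` to enter speech, `t_low` to leave it) are all assumptions made for the example.

```python
import numpy as np

def normalized_autocorr_max(frame, max_lag):
    """Largest normalized short-time autocorrelation value over lags 1..max_lag."""
    frame = np.asarray(frame, dtype=float)
    r0 = np.dot(frame, frame)  # R_n(0), the frame energy
    if r0 == 0.0:
        return 0.0
    # R_n(k) / R_n(0) for k = 1..max_lag (lag 0 is skipped: it always equals 1)
    r = [np.dot(frame[:-k], frame[k:]) / r0 for k in range(1, max_lag + 1)]
    return float(max(r))

def detect_speech_frames(signal, frame_len, max_lag, t_low, t_high):
    """Flag frames as speech; the hysteresis use of two thresholds is an assumption."""
    n_frames = len(signal) // frame_len
    flags = np.zeros(n_frames, dtype=bool)
    in_speech = False
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        peak = normalized_autocorr_max(frame, max_lag)
        if not in_speech and peak > t_high:
            in_speech = True   # starting frame found
        elif in_speech and peak < t_low:
            in_speech = False  # ending frame found
        flags[i] = in_speech
    return flags

# Synthetic test signal: weak white noise with a 200 Hz tone between 0.4 s and 0.7 s.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
noise = 0.05 * rng.standard_normal(fs)
tone = np.where((t >= 0.4) & (t < 0.7), np.sin(2 * np.pi * 200 * t), 0.0)
flags = detect_speech_frames(noise + tone, frame_len=200, max_lag=80,
                             t_low=0.3, t_high=0.6)
print(np.flatnonzero(flags))  # indices of the frames judged to contain the tone
```

In practice the two thresholds would be derived from the autocorrelation of a known noise segment, as the text describes; the fixed values above are placeholders.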

S2, frequency selection

The frequency range of the voice signal, $f_{\min} = 300$ Hz to $f_{\max} = 3400$ Hz, is divided into N bands. The preprocessed time-domain voice signal is converted to the frequency domain, and the frequencies with higher energy are selected. A threshold is set using the data received by array element 1; the set threshold T is as follows:

where $Y_i$ represents the amplitude in the i-th frequency band received by array element 1, and $\bar{Y}$ represents the average value of Y. Within each frequency band, the frequencies whose amplitude is larger than the threshold T are selected, that is, the frequencies with higher energy.
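The frequency selection of step S2 can be illustrated with a short sketch. Because the threshold formula itself is not reproduced in this text, the threshold below (mean amplitude plus one standard deviation) is only a placeholder assumption, and the sketch tests each FFT bin rather than dividing the range into N bands. The function name `select_frequencies` is hypothetical.

```python
import numpy as np

def select_frequencies(x_ref, fs, f_min=300.0, f_max=3400.0):
    """Return the in-band frequencies of the reference element whose amplitude exceeds a threshold."""
    spectrum = np.abs(np.fft.rfft(x_ref))             # amplitude spectrum of array element 1
    freqs = np.fft.rfftfreq(len(x_ref), d=1.0 / fs)   # frequency of each bin in Hz
    in_band = (freqs >= f_min) & (freqs <= f_max)     # keep only the voice band
    amp = spectrum[in_band]
    # ASSUMPTION: placeholder threshold; the source's formula for T is not available here.
    threshold = amp.mean() + amp.std()
    return freqs[in_band][amp > threshold], threshold

# Synthetic reference signal: two tones in weak noise.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
x_ref = (np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1200 * t)
         + 0.01 * rng.standard_normal(fs))
selected, threshold = select_frequencies(x_ref, fs)
print(selected)  # the two tone frequencies are the only bins above the threshold
```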

S3, DOA estimation

An angular scan is performed at each frequency obtained in step S2, over scan angles from 0 degrees to 180 degrees in steps of 0.5 degrees. The weight vector is:

$$w = e^{-j 2\pi f \tau}$$

where $w$ represents the weight vector; $f$ represents the frequency point selected in step S2; and $\tau$ represents the delay of a microphone array element relative to the reference array element.

The output of the microphone array at each selected frequency bin is obtained as:

$$Y_{out} = |w' Y_{max}|$$

where $Y_{max}$ represents the frequency-domain values of all array elements at the selected frequency point, and $w'$ represents the transpose of the weight vector $w$. The peak of the output signal is then found; the angle at the peak is the DOA at that frequency point. Collecting the DOAs of all frequency points gives the data set $D_\theta$.
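The angular scan of step S3 at a single frequency can be sketched as below. Several things here are assumptions for illustration only: a uniform linear array with the angle measured from the array axis (so that the delay of element m is its position times cos θ divided by c), a speed of sound of 343 m/s, and the use of the conjugate transpose when applying the weights (as MATLAB's `'` operator would do for a complex vector). The function name `scan_doa` is hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air, m/s

def scan_doa(y, f, positions, angles_deg):
    """Array output |w^H y| for every candidate angle at one frequency f."""
    theta = np.deg2rad(angles_deg)
    # tau[i, m]: delay of element m relative to the reference for scan angle i
    tau = np.outer(np.cos(theta), positions) / C
    w = np.exp(-2j * np.pi * f * tau)   # weight vector for each scan angle
    return np.abs(w.conj() @ y)         # conjugate transpose applied to the snapshot

angles = np.arange(0.0, 180.5, 0.5)     # 0 to 180 degrees in 0.5 degree steps
positions = 0.04 * np.arange(8)         # 8 elements, 4 cm spacing (assumed geometry)
f = 2000.0
true_deg = 70.0
# Noise-free frequency-domain snapshot of a single plane wave arriving from 70 degrees
y = np.exp(-2j * np.pi * f * positions * np.cos(np.deg2rad(true_deg)) / C)
out = scan_doa(y, f, positions, angles)
est = angles[np.argmax(out)]
print(est)  # angle at the peak of the output
```

Repeating this scan for every selected frequency and collecting the peak angles would give the data set used in the clustering step.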

S4, clustering

In the clustering algorithm, the DOAs estimated in step S3 are taken as the data set $D_\theta$. First, the number of clusters in the data set is determined: the range of the cluster number is set to $[1, \min(100, \mathrm{length}(D_\theta))]$, where $\mathrm{length}(D_\theta)$ is the size of the data set $D_\theta$, and within this range the cluster number at which the sum of the distances from all points to their cluster centroids decreases fastest is taken as the cluster number L. Next, a threshold is set using the angle measurement accuracy. The angles in $D_\theta$ that exceed the threshold are removed, the remaining angles form a new data set $D_\theta'$, and finally Kmeans clustering is performed on $D_\theta'$.
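One possible reading of this clustering procedure is sketched below with a small one-dimensional k-means written in NumPy. The details are assumptions, not the patented implementation: the cluster number is taken as the k whose step from k-1 to k gives the largest drop in the total point-to-centroid distance, the centroids are initialized from quantiles to keep the example deterministic, and the screening threshold `max_dev_deg` is a hypothetical stand-in for the value derived from the angle measurement accuracy.

```python
import numpy as np

def kmeans_1d(data, k, n_iter=50):
    """Plain 1-D k-means; returns centroids, labels and the total distance to centroids."""
    data = np.asarray(data, dtype=float)
    centroids = np.quantile(data, (np.arange(k) + 0.5) / k)  # deterministic initialization
    for _ in range(n_iter):
        labels = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):          # empty clusters keep their old centroid
                centroids[j] = data[labels == j].mean()
    labels = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)
    cost = np.abs(data - centroids[labels]).sum()
    return centroids, labels, cost

def choose_cluster_count(data, k_max=100):
    """Cluster number with the largest drop in total distance (assumed interpretation)."""
    k_max = min(k_max, len(data))
    costs = [kmeans_1d(data, k)[2] for k in range(1, k_max + 1)]
    if len(costs) == 1:
        return 1
    drops = -np.diff(costs)                  # drops[i] = cost(k=i+1) - cost(k=i+2)
    return int(np.argmax(drops)) + 2

def cluster_doas(data, max_dev_deg):
    """Pick L, drop angles far from their centroid, then re-cluster with L clusters."""
    data = np.asarray(data, dtype=float)
    n_clusters = choose_cluster_count(data)
    centroids, labels, _ = kmeans_1d(data, n_clusters)
    keep = np.abs(data - centroids[labels]) <= max_dev_deg   # hypothetical screening rule
    final_centroids, _, _ = kmeans_1d(data[keep], n_clusters)
    return n_clusters, np.sort(final_centroids)

# Synthetic DOA set: estimates around 70 and 110 degrees plus two outliers.
d_theta = np.concatenate([70 + np.linspace(-2, 2, 21),
                          110 + np.linspace(-2, 2, 21),
                          [20.0, 160.0]])
n_clusters, doas = cluster_doas(d_theta, max_dev_deg=5.0)
print(n_clusters, doas)
```

The outliers pull the first-pass centroids away from the true directions; removing them before the second k-means pass is what moves the final centroids back.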

The angle measurement accuracy is affected by noise, and is:

where $\theta_B$ represents the beam width and SNR is the signal-to-noise ratio. Consider the input signal:

$$X(t) = X_s(t) + X_n(t)$$

where $X_s(t)$ represents the target signal and $X_n(t)$ represents the noise signal.

The output of the beamformer is

$$y(t) = W^H X(t) = W^H X_s(t) + W^H X_n(t)$$

where $W$ represents the weight vector and the superscript "H" denotes the conjugate transpose. The output power is

$$E[|y(t)|^2] = W^H R_s W + W^H R_n W$$

where $W^H R_s W$ is the signal power and $W^H R_n W$ is the noise power, so the signal-to-noise ratio is

$$\mathrm{SNR} = \frac{W^H R_s W}{W^H R_n W}$$
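The ratio of the signal power to the noise power at the beamformer output can be evaluated numerically as in the sketch below. The covariance matrices are assumed values chosen for illustration: a single source of power 2 arriving broadside on a 4-element array, and spatially white noise of power 0.5 per element. The function name `beamformer_snr` is hypothetical.

```python
import numpy as np

def beamformer_snr(w, r_s, r_n):
    """Output SNR of a beamformer: (W^H R_s W) / (W^H R_n W)."""
    signal_power = np.real(w.conj() @ r_s @ w)
    noise_power = np.real(w.conj() @ r_n @ w)
    return signal_power / noise_power

M = 4
a = np.ones(M, dtype=complex)       # steering vector of a broadside source (assumed)
r_s = 2.0 * np.outer(a, a.conj())   # signal covariance, source power 2
r_n = 0.5 * np.eye(M)               # white noise covariance, power 0.5 per element
w = a / M                           # delay-and-sum weights matched to the source
snr = beamformer_snr(w, r_s, r_n)
print(snr)  # M times the per-element input SNR of 2 / 0.5 = 4
```

With matched weights and white noise the output SNR is the input SNR multiplied by the number of elements, which is the familiar array gain.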

Using the obtained cluster number L, Kmeans clustering is performed on $D_\theta'$ to obtain the new clustering result.

Compared with traditional DOA estimation methods, the method uses the information at all selected frequency points and counts the DOA estimated at each of them. As can be seen from Figs. 2, 3 and 5, the invention improves DOA estimation performance and DOA angular resolution, and correctly obtains the incoming wave directions of two sound sources. The algorithm not only determines the number of clusters correctly but also uses the angle measurement accuracy to improve the accuracy of the DOA estimate.

Drawings

FIG. 1 is a schematic illustration of sound source propagation;

FIG. 2 is a graph of the DOA estimation results of the delay-and-sum algorithm;

FIG. 3 is a graph of the DOA estimation results of the MUSIC algorithm;

FIG. 4 is a schematic diagram of voice endpoint detection;

FIG. 5 is a graph of the results after 700 Hz to 710 Hz DOA estimation;

FIG. 6 is a graph of DOA estimation results for wideband signals based on a microphone array;

FIG. 7 is an initial clustering result graph;

FIG. 8 is a graph showing the relationship between the number of clusters and the distance between classes;

FIG. 9 is a graph showing DOA estimation results after angle screening.

Detailed Description

The technical scheme of the invention is described in further detail below.

Establishing a microphone array signal model

A microphone array signal is the signal received by a microphone array, which can capture more information than a single microphone element. For far-field sources, sound reaches the microphone elements as plane waves. The sound source propagation is shown schematically in Fig. 1. Taking array element 1 as the reference array element, the coordinates of array element 2 relative to array element 1 are:

$$r = [x, y, z]^T$$

Assume that the angle between the sound source and the axis is $\theta$ and that sound propagates in air at speed $c$. The propagation direction of the signal is:

If the signal received by array element 1 is $s(t)$, then the signal received by array element 2, relative to that received by array element 1, is:

By analogy, let the number of microphone array elements be $M$ and the position of the $m$-th array element ($m = 1, 2, 3, \ldots, M$) be $r_m$. When the microphone array receives $K$ signals in total, the signal received by the $m$-th array element is:

where $n_m(t)$ represents the interference signal received by the $m$-th array element.

If the $k$-th signal reaches array element 1 at time 0, then, taking array element 1 as the reference, the time delay of the $k$-th signal received by the $m$-th array element is $\tau_{mk}$:

The matrix form of the signals received by the microphone array is:

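Conventional array signal processing generally assumes narrowband signals. With $M$ array elements in the microphone array, the incident signal $s_k(t)$ can be expressed in complex envelope form as:

$$s_k(t) = u_k(t)\, e^{j(\omega t + \varphi_k(t))}$$

$$s_k(t-\tau) = u_k(t-\tau)\, e^{j(\omega (t-\tau) + \varphi_k(t-\tau))}$$

where $\omega$ represents the frequency, $u_k(t)$ represents the amplitude, and $\varphi_k(t)$ represents the phase. Under the narrowband assumption the following also holds:

$$u_k(t-\tau) \approx u_k(t), \quad \varphi_k(t-\tau) \approx \varphi_k(t)$$

Substituting this into the complex envelope form of the incident signal $s_k(t)$ gives:

$$s_k(t-\tau) \approx s_k(t)\, e^{-j\omega\tau}$$

The signal received by the $m$-th array element is:

$$x_m(t) = \sum_{k=1}^{K} \alpha_{mk}\, s_k(t-\tau_{mk}) + n_m(t)$$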

where $\alpha_{mk}$ represents the attenuation from sound source $k$ to the $m$-th array element; $n_m(t)$ represents the noise of the $m$-th array element at that moment; and $\tau_{mk}$ represents the time delay with which the $k$-th sound source reaches the $m$-th microphone. Under far-field conditions the attenuation $\alpha_{mk}$ is taken as 1, and the matrix expression for the signals received by the array elements at time $t$ is:

Writing the above in vector form:

$$X = AS + N$$

where $X$ represents the signals received by the $M$ array elements; $A$ is the $M \times K$ array direction matrix; $S$ represents the sources and is a $K \times 1$ matrix; and $N$ represents the interference signal and is an $M \times 1$ matrix.
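The vector model X = AS + N can be made concrete with a short simulation of one narrowband snapshot. The geometry is an assumption for illustration (a uniform linear array with 4 cm spacing, angles measured from the array axis, speed of sound 343 m/s), and `direction_matrix` is a hypothetical helper name.

```python
import numpy as np

def direction_matrix(freq, positions, angles_deg, c=343.0):
    """M x K array direction matrix with entries exp(-j 2 pi f tau_mk)."""
    # tau[m, k]: delay of source k at element m relative to the reference element
    tau = np.outer(positions, np.cos(np.deg2rad(angles_deg))) / c
    return np.exp(-2j * np.pi * freq * tau)

rng = np.random.default_rng(1)
positions = 0.04 * np.arange(6)                       # M = 6 elements (assumed geometry)
A = direction_matrix(1000.0, positions, [70.0, 110.0])  # K = 2 sources
S = rng.standard_normal((2, 1)) + 1j * rng.standard_normal((2, 1))          # K x 1 sources
N = 0.01 * (rng.standard_normal((6, 1)) + 1j * rng.standard_normal((6, 1)))  # M x 1 noise
X = A @ S + N                                          # M x 1 received snapshot
print(A.shape, S.shape, X.shape)
```

Each column of A is the steering vector of one source; its first entry is 1 because the reference element has zero delay.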

The frequency-domain model of the broadband signal effectively handles the fact that, in the broadband signal model, the array manifold varies with frequency because the signal is not single-frequency, so that the signal subspace varies as well. For a given array, assume that the incident signal is a wideband signal; according to equation (2-7), the signal received by the $m$-th array element is:

$$x_m(t) = \sum_{k=1}^{K} s_k(t-\tau_{mk}) + n_m(t)$$

Taking the Fourier transform of the above gives:

$$X_m(f) = \sum_{k=1}^{K} S_k(f)\, e^{-j 2\pi f \tau_{mk}} + N_m(f)$$

As the above equation shows, the Fourier transform converts the time delay in the time domain into a phase shift in the frequency domain, and the equation can be converted into:

Comparing the narrowband signal model with the wideband signal model, the narrowband model is an expression in the time domain, while the wideband signal model is an expression in the frequency domain.
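The statement that a time delay becomes a phase shift in the frequency domain can be checked numerically. The sketch below is only a verification of the Fourier shift property, under the assumption of a circular delay by a whole number of samples so that the discrete identity holds exactly; the variable names are illustrative.

```python
import numpy as np

fs = 8000
n = 1024
rng = np.random.default_rng(2)
s = rng.standard_normal(n)            # arbitrary wideband test signal
delay_samples = 5
x = np.roll(s, delay_samples)         # x[i] = s[i - 5], a circular delay

freqs = np.fft.fftfreq(n, d=1.0 / fs)  # frequency of each DFT bin in Hz
tau = delay_samples / fs               # the same delay expressed in seconds
# Shift property: the spectrum of the delayed signal is S(f) * exp(-j 2 pi f tau)
predicted = np.fft.fft(s) * np.exp(-2j * np.pi * freqs * tau)
max_err = np.max(np.abs(np.fft.fft(x) - predicted))
print(max_err)  # difference between the two spectra, at rounding-error level
```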

The peak of the autocorrelation function of a speech segment is much larger than that of a noise segment, so endpoint detection of the speech signal can be based on the maximum of the autocorrelation function. During voice endpoint detection, the autocorrelation is first computed on the noise segment of the current voice data, and two critical values $T_1$ and $T_2$ are determined from it. The audio data is then divided into frames; a segment is judged to be a speech segment when the autocorrelation maximum of its starting frame rises above one critical value and the autocorrelation maximum of its ending frame falls below the other.

Examples

Two audio recordings are selected and a two-second signal is taken from each; the two recordings are then combined to form the signal actually received by each array element. The actual incidence angles of the two recordings are 70° and 110°, and their distances from the origin of the array are 8 meters and 5 meters respectively. Gaussian white noise is then added.

For multi-target estimation, the delay-and-sum algorithm is first used to estimate the DOA. The scan angle is set to 0° to 180°, with one calculation every 0.5°. The delay of each array element relative to the reference array element is calculated first; the signal received by each array element is then delayed accordingly, the delayed signals are added, and the power of the sum is computed. The power at each scan angle is then normalized and converted to a logarithmic scale. The DOA estimation result of the delay-and-sum algorithm is shown in Fig. 2.

DOA estimation is then performed with wideband MUSIC. The signals received by the microphone array are first transformed from the time domain to the frequency domain and divided into frequency bands, and the bands with higher energy are selected. The covariance matrix is calculated on the selected bands, and eigenvalue decomposition of the covariance matrix yields the signal subspace and the noise subspace. The steering vector of the microphone array is constructed, and the spatial spectrum is obtained from the spectrum function of the wideband MUSIC algorithm; in this spectrum, the angle corresponding to a peak is the incoming wave direction of a target sound source. The DOA estimation result of the wideband MUSIC algorithm is shown in Fig. 3.

For the present invention, the frequency range 1025 Hz to 1030 Hz is selected and the output is plotted every 0.5°; the result is shown in Fig. 5. After preliminary clustering of the results from the multiple frequencies, the result is shown in Fig. 7, and the final clustering result is shown in Fig. 9.
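The delay-and-sum baseline used in this example (delay each channel, add, compute the power, normalize, take the logarithm) can be sketched as follows. This is an illustration under stated assumptions rather than the experiment itself: a single synthetic white-noise source replaces the two recordings, the array is a uniform linear array with 4 cm spacing, the speed of sound is 343 m/s, and delays are applied circularly in the frequency domain so that fractional-sample delays are possible. The helper names are hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air, m/s

def apply_delays(x, fs, delays):
    """Delay row m of x by delays[m] seconds using a frequency-domain phase shift."""
    n = x.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(x, axis=1)
    shifted = spectra * np.exp(-2j * np.pi * np.outer(delays, freqs))
    return np.fft.irfft(shifted, n=n, axis=1)

def delay_and_sum_db(x, fs, positions, angles_deg):
    """Normalized log power (dB) of the delay-and-sum output for each scan angle."""
    power = np.empty(len(angles_deg))
    for i, ang in enumerate(angles_deg):
        tau = positions * np.cos(np.deg2rad(ang)) / C   # delay expected at each element
        aligned = apply_delays(x, fs, -tau)              # undo that delay
        power[i] = np.mean(aligned.sum(axis=0) ** 2)     # power of the summed output
    return 10.0 * np.log10(power / power.max())

rng = np.random.default_rng(3)
fs, n = 8000, 2048
positions = 0.04 * np.arange(8)
spec = np.fft.rfft(rng.standard_normal(n))
spec[-1] = 0.0   # drop the Nyquist bin so fractional delays round-trip exactly
source = np.fft.irfft(spec, n=n)
tau_true = positions * np.cos(np.deg2rad(70.0)) / C
x = apply_delays(np.tile(source, (8, 1)), fs, tau_true)  # signals seen by the 8 elements

angles = np.arange(0.0, 180.5, 0.5)
spectrum_db = delay_and_sum_db(x, fs, positions, angles)
est = angles[np.argmax(spectrum_db)]
print(est)  # scan angle with the highest output power
```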

The invention can perform correct DOA estimation using audio information from a short time span; the result figures show that it improves DOA estimation performance and correctly obtains the incoming wave directions of the two sound sources. After DOA estimation, the obtained angles are clustered with the Kmeans algorithm, and the clustering result is the incoming wave direction of the target signals; the Kmeans clustering result is 108.6032° and 71.4290°. In Kmeans, however, the number of clusters is normally chosen manually. This manual choice leads to different clustering results, and in some cases the user cannot determine the number of clusters. The invention improves the clustering to address this problem: as Fig. 8 shows, it determines the number of clusters correctly, and the final clustering results, shown in Fig. 9, are 109.8939° and 70.0956°. Combining the angle measurement accuracy therefore improves the accuracy of the DOA estimate.

Claims

1. A wideband DOA estimation method based on a microphone array, comprising the steps of:

S1, preprocessing the voice signal received by the microphone array, specifically extracting the voice segment of the voice signal;

S2, dividing the frequency range of the voice signal into N frequency bands, converting the extracted voice signal from the time domain to the frequency domain, and selecting frequency-domain signals according to energy, specifically: a threshold T is set according to the data received by array element 1:

wherein $Y_i$ represents the amplitude in the $i$-th frequency band received by array element 1, and $\bar{Y}$ represents the average value of all the signals Y received by the array element; within each frequency band, the frequencies whose amplitude is larger than the threshold T are selected;

S3, performing an angle scan from 0 degrees to 180 degrees, once every 0.5 degrees, at the frequencies selected in step S2, with the weight vector:

$$w = e^{-j 2\pi f \tau}$$

wherein $w$ represents the weight vector, $f$ represents the frequency point selected in S2, and $\tau$ represents the time delay of a microphone array element relative to the reference array element;

obtaining the output signal $Y_{out}$ of the microphone array at each selected frequency point as:

$$Y_{out} = |w' Y_{max}|$$

wherein $Y_{max}$ represents the frequency-domain values of all array elements at the selected frequency point and $w'$ represents the transpose of the weight vector $w$; then finding the peak of the output signal $Y_{out}$, the angle at the peak being the DOA at that frequency point, and collecting the DOAs of all frequency points to obtain a data set $D_\theta$;

S4, clustering the obtained data set $D_\theta$ to obtain the DOA estimation result, specifically:

S41, obtaining the number of clusters: the range of the cluster number is set to $[1, \min(100, \mathrm{length}(D_\theta))]$, where $\mathrm{length}(D_\theta)$ is the size of the data set $D_\theta$; within this range, the cluster number at which the sum of the distances from all points to their cluster centroids decreases fastest is obtained, which is the cluster number L; a threshold is set using the angle measurement accuracy, the angles in $D_\theta$ exceeding the threshold are removed, and the remaining angles are taken as a new data set $D'_\theta$; wherein the angle measurement accuracy is:

wherein $\theta_B$ represents the beam width and SNR is the signal-to-noise ratio;

S42, performing Kmeans clustering on the new data set $D'_\theta$ to obtain a new clustering result, which is the DOA estimation result.