CN111679244A

CN111679244A - Direct sound time-frequency point selection method based on plane wave relative density

Info

Publication number: CN111679244A
Application number: CN202010400193.5A
Authority: CN
Inventors: 周晓凤; 黄青华
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2020-05-13
Filing date: 2020-05-13
Publication date: 2020-09-18
Anticipated expiration: 2040-05-13
Also published as: CN111679244B

Abstract

The invention discloses a direct sound time-frequency point selection method based on plane wave relative density, comprising the following steps. First, a spherical array output model in a reverberant environment is established and, to exploit the sparsity of speech signals, transformed into the time-frequency domain by short-time Fourier transform. The spherical harmonic coefficients of the plane wave density of the signal are obtained by plane wave decomposition in the spherical harmonic domain. The inverse spherical Fourier transform is then used to construct the plane wave density of each time-frequency point at any given angle; the higher the density, the more likely the time-frequency point is dominated by a single direct sound. After taking the modulus, all angles are traversed and the ratio of the largest to the second-largest plane wave density is calculated as the plane wave relative density. Finally, the time-frequency points with high plane wave relative density are selected to obtain the final DOA estimate of the direct sound. Because the method works with the plane wave density function in the spherical harmonic domain, it avoids the heavy computation of covariance matrix eigenvalue decomposition, and by selecting time-frequency points with high relative density it improves DOA estimation accuracy.

Description

Direct sound time-frequency point selection method based on plane wave relative density

Technical Field

The invention relates to a direct sound time-frequency point selection method based on plane wave relative density, applicable to technical fields such as speech enhancement, video conferencing and robot audition.

Background

Direction of arrival (DOA) estimation is one of the most important research directions in array signal processing, with applications such as speech enhancement and sound scene analysis. An increasing share of this research addresses DOA estimation in real environments. In a reverberant environment, multipath propagation causes reflected sound to interfere with the direct sound, which makes it difficult to obtain an accurate DOA estimate of the direct sound.

In recent years, a common way to improve robustness against reverberation has been to process only the signal segments containing direct sound and to reject the segments disturbed by reflections. To this end, the array signals are usually transformed into the time-frequency domain by the short-time Fourier transform (STFT), and the time-frequency points dominated by direct sound are then selected. With this approach, robustness to reverberation depends largely on the ability to correctly select the time-frequency points dominated by direct sound.

At present, spherical microphone arrays are widely used for sound field analysis. Because a spherical array has three-dimensional symmetry, it allows the sound field to be analyzed more comprehensively and the DOA of a sound source to be estimated conveniently. Spherical array sound field analysis is based on the spherical Fourier transform, which decomposes the sound field onto the orthogonal basis of spherical harmonics. After spherical harmonic decomposition, the steering matrix separates into independent angle-dependent and frequency-dependent parts. Mohan et al. proposed a coherence test that selects time-frequency points whose effective rank is 1, that is, points in which only a single sound source is active. Because the covariance matrix is time-smoothed, an effective rank of 1 identifies a source with little coherence with other signals, so the method performs well in low-reverberation environments. Under high reverberation, however, the selected time-frequency points contain both direct sound and coherent reverberant signals, which degrades the DOA estimate. Rafaely et al. proposed the direct-path dominance (DPD) test to select time-frequency points dominated by direct sound. The DPD test applies frequency smoothing based on the spherical array, which markedly reduces the influence of coherent signals in reverberant environments, and selects the time-frequency points with a high ratio of the largest to the second-largest eigenvalue of the local covariance matrix. Because the test is formulated for spherical arrays, no focusing matrix needs to be computed during frequency smoothing.
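To make the computational contrast concrete, the sketch below shows the eigenvalue-ratio statistic on which the DPD test relies. It is not taken from this patent or from Rafaely et al.; it assumes that the local snapshots (for example, spherical-harmonic-domain vectors collected from neighbouring time-frequency points by smoothing) are already available, and the name `dpd_ratio` is hypothetical. Note that it requires a covariance estimate and an eigendecomposition at every time-frequency point.

```python
import numpy as np


def dpd_ratio(local_snapshots: np.ndarray) -> float:
    """Largest-to-second-largest eigenvalue ratio of a local covariance matrix.

    local_snapshots: complex array of shape (M, K), i.e. M channels by K
    snapshots collected from neighbouring time-frequency points.
    """
    _, num_snapshots = local_snapshots.shape
    # Sample covariance estimate, an M x M Hermitian matrix
    cov = local_snapshots @ local_snapshots.conj().T / num_snapshots
    # eigvalsh returns the real eigenvalues of a Hermitian matrix in ascending order
    eigvals = np.linalg.eigvalsh(cov)
    # Guard against division by zero for an exactly rank-1 covariance
    return float(eigvals[-1] / max(eigvals[-2], np.finfo(float).tiny))


# Synthetic illustration: one dominant component versus two comparable ones
rng = np.random.default_rng(0)
M, K = 16, 50
v1 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
v2 = rng.standard_normal(M) + 1j * rng.standard_normal(M)
s1 = rng.standard_normal(K) + 1j * rng.standard_normal(K)
s2 = rng.standard_normal(K) + 1j * rng.standard_normal(K)
noise = 1e-3 * (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))
single = np.outer(v1, s1) + noise              # single active component
mixed = np.outer(v1, s1) + np.outer(v2, s2)    # two components of similar power
```

A point dominated by one component yields a much larger ratio than a point where two components overlap, which is what the threshold in the DPD test exploits.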
However, selection by eigenvalue ratio favors time-frequency points at which the largest eigenvalue carries relatively large energy, and the dominant component at such a point may be high-energy reverberation, which degrades the performance of the subsequent Multiple Signal Classification (MUSIC) estimation. Madmoni et al. proposed a time-frequency point selection method based on plane wave similarity within the MUSIC framework. At each time-frequency point, it calculates the similarity between the first eigenvector of the covariance matrix and the plane wave component at each angle; the higher the similarity, the more accurately the time-frequency points are selected. This improves the final DOA estimate of the direct sound, but at the cost of increased computation. In practical applications, limited computing resources and strict real-time requirements inevitably restrict the use of such high-complexity time-frequency point selection methods.
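The plane wave similarity idea can be sketched as follows. The exact similarity measure of Madmoni et al. is not given here, so this example assumes a normalized inner product between the principal eigenvector and each candidate steering vector; the function name and the random steering matrix are illustrative assumptions. The per-point cost combines an eigendecomposition with a scan over all J candidate directions.

```python
import numpy as np


def plane_wave_similarity(cov: np.ndarray, steering: np.ndarray) -> np.ndarray:
    """Normalized similarity between the principal eigenvector of `cov` (M x M)
    and each candidate plane-wave vector in the columns of `steering` (M x J)."""
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues ascending, eigenvectors in columns
    u1 = eigvecs[:, -1]                   # principal (first) eigenvector
    inner = np.abs(steering.conj().T @ u1)
    norms = np.linalg.norm(steering, axis=0) * np.linalg.norm(u1)
    return inner / norms


# Hypothetical example: covariance dominated by candidate direction 3
rng = np.random.default_rng(1)
M, J = 16, 40
steering = rng.standard_normal((M, J)) + 1j * rng.standard_normal((M, J))
a3 = steering[:, 3]
cov = np.outer(a3, a3.conj()) + 1e-3 * np.eye(M)
sim = plane_wave_similarity(cov, steering)
best = int(np.argmax(sim))   # index of the most similar candidate direction
```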

Disclosure of Invention

Objective of the invention: to overcome the shortcomings of the prior art, the invention provides a direct sound time-frequency point selection method based on plane wave relative density, which reduces the computational cost of time-frequency point selection and facilitates DOA estimation of the direct sound.

To achieve the above objective, the idea of the invention is as follows:

First, the signal received by a spherical array in a reverberant environment is converted into the time-frequency domain; then the spherical harmonic coefficients of the plane wave density function are calculated; next, the ratio of the largest to the second-largest plane wave density is calculated; finally, the time-frequency points with a high ratio are selected and the DOA estimate of the direct sound is calculated.

In more detail: a signal model for a spherical array in a reverberant environment is first established, and the signal is converted into the time-frequency domain by short-time Fourier transform to exploit the sparsity of speech; the spherical Fourier transform is then applied to the time-frequency signal, and plane wave decomposition yields the spherical harmonic coefficients of the plane wave density function; the plane wave density at each angle is calculated by the inverse spherical Fourier transform and, after taking the modulus, the ratio of the largest to the second-largest value is calculated, namely the plane wave relative density; finally, the time-frequency points with a high ratio are selected as the points dominated by direct sound, and the DOA estimate of the direct sound is calculated.

According to the above inventive concept, the invention adopts the following technical solution:

A direct sound time-frequency point selection method based on plane wave relative density comprises the following steps:

1) establishing a spherical microphone array output model in a reverberant environment, and converting the received signal into the time-frequency domain by short-time Fourier transform to exploit the sparsity of speech signals;

2) performing the spherical Fourier transform and plane wave decomposition on the time-frequency signals obtained in step 1) to obtain the spherical harmonic coefficients of the plane wave density function;

3) applying the inverse spherical Fourier transform to the spherical harmonic coefficients obtained in step 2) to obtain, at each time-frequency point, the plane wave density at any given angle;

4) taking the modulus of the plane wave density obtained in step 3), traversing all angles, and calculating the ratio of the largest to the second-largest value to obtain the plane wave relative density;

5) selecting, according to the ratio obtained in step 4), the time-frequency points with high plane wave relative density, and calculating the final DOA estimate of the direct sound.

Compared with the prior art, the invention has the following substantive features and advantages:

The method requires neither a covariance matrix nor an eigenvalue decomposition, which reduces computation. It also takes into account that a direct-sound-dominated point is characterized not only by a high plane wave density but also by the maximum of the plane wave density being clearly dominant; selecting the time-frequency points by this ratio therefore improves the accuracy of the final direct sound DOA estimate.

Drawings

Fig. 1 is a flowchart of a direct sound time-frequency point selection method based on plane wave relative density according to the present invention.

Fig. 2 is a schematic diagram of a coordinate system of the spherical microphone array of the present invention.

Fig. 3 is a schematic diagram of selecting the time-frequency points with high plane wave relative density according to the present invention.

Detailed Description

For a better understanding of the technical solution of the present invention, the following detailed description of the preferred embodiments of the present invention is provided in conjunction with the accompanying drawings:

Example one

Referring to fig. 1-3, a direct sound time-frequency point selection method based on plane wave relative density includes the following steps:

Example two

In this embodiment, referring to Fig. 1, the direct sound time-frequency point selection method based on plane wave relative density proceeds as follows. In a reverberant environment, the signal received by the spherical array is transformed into the time-frequency domain by short-time Fourier transform, and the spherical harmonic coefficients of the plane wave density function of the received signal are then obtained by spherical Fourier transform and plane wave decomposition. At this stage the steering matrix contains only source angle information, and the plane wave density at any given angle is calculated at each time-frequency point by the inverse spherical Fourier transform. Because a direct-sound-dominated point is characterized not only by a high plane wave density but also by a clearly dominant density maximum, the plane wave relative density is calculated, the time-frequency points above a threshold are selected, and the DOA estimate of the direct sound is obtained. The specific implementation steps are as follows:

1) As shown in Fig. 2, a spherical microphone array output model in a reverberant environment is established and, to exploit the sparsity of speech signals, the received signal is converted into the time-frequency domain by short-time Fourier transform, so that direct sound DOA estimation becomes the problem of selecting the time-frequency points dominated by direct sound. Specifically:

Assume that D source signals impinge on a uniform spherical array with L elements. The signals received by the spherical array are expressed in matrix form as:

$$\mathbf{p}(t) = \mathbf{V}(k,\Phi)\,\mathbf{s}(t) + \mathbf{n}(t) \qquad (1)$$

where p(t) represents the sound pressure signals received by the spherical array, s(t) = [s_1(t), ..., s_D(t)]^T is the source signal vector, V(k, Φ) is the L × D steering matrix containing the DOA and frequency information, and n(t) = [n_1(t), ..., n_L(t)]^T is zero-mean additive white Gaussian noise.

Equation (1) is transformed into the time-frequency domain by the short-time Fourier transform (STFT):

$$\mathbf{p}(\tau,\omega) = \mathbf{V}(k,\Phi)\,\mathbf{s}(\tau,\omega) + \mathbf{n}(\tau,\omega) \qquad (2)$$

where τ is the time index and ω is the frequency index. Because the steering matrix depends only on the DOA and frequency, it is unchanged by the short-time Fourier transform.
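A minimal sketch of step 1), assuming SciPy's `scipy.signal.stft` and a synthetic, frequency-independent real mixing matrix `V` used only to illustrate linearity. In the model of equation (1), V(k, Φ) varies with frequency, so the per-bin form (2) is an approximation commonly justified when the STFT window is long compared with the propagation delays across the array. All sizes and the sampling rate are arbitrary choices.

```python
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
num_mics, num_src, num_samples = 8, 2, 4096
s = rng.standard_normal((num_src, num_samples))   # source signals s(t)
V = rng.standard_normal((num_mics, num_src))       # hypothetical frequency-flat mixing matrix
p = V @ s                                           # array signals p(t), noise omitted

# Multichannel STFT along the last (time) axis: output shape (channels, freqs, frames)
f, tau, P = stft(p, fs=16000, nperseg=512)
_, _, S = stft(s, fs=16000, nperseg=512)

# The STFT is linear, so the mixing matrix can be applied independently in every (tau, omega) bin
P_from_S = np.einsum("ld,dft->lft", V, S)
```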

2) Performing spherical Fourier transform and plane wave decomposition on the time-frequency domain signals obtained in the step 1) to obtain spherical harmonic coefficients of a plane wave density function, wherein the specific process is as follows:

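For a spherical array, the steering matrix can be decomposed into DOA-dependent and frequency-dependent components, so equation (2) can be written as:

$$\mathbf{p}(\tau,\omega) = \mathbf{Y}(\Omega)\,\mathbf{B}(k)\,\mathbf{Y}^{H}(\Phi)\,\mathbf{s}(\tau,\omega) + \mathbf{n}(\tau,\omega) \qquad (3)$$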

where

$$\mathbf{Y}(\Phi) = \left[\mathbf{y}^{T}(\Phi_1), \ldots, \mathbf{y}^{T}(\Phi_D)\right]^{T} \qquad (4)$$

is the D × (N+1)² matrix containing the angle information of the sources, Φ_d = (θ_d, φ_d). Each element of y(Φ_d) is a spherical harmonic Y_n^m(Φ_d) of order n and degree m, where N is the maximum spherical harmonic order. Similarly, Y(Ω) is the L × (N+1)² matrix containing the angle information of the array elements. The (N+1)² × (N+1)² diagonal matrix B(k) contains the radial functions for a plane wave scattered by a rigid sphere.
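The matrices Y(Ω), Y(Φ) and B(k) can be built numerically as below. This is a sketch under stated assumptions: complex spherical harmonics with the Condon-Shortley phase, columns ordered (n, m) = (0, 0), (1, -1), (1, 0), (1, 1), ..., and the rigid-sphere radial function in the common form b_n(kr) = 4π iⁿ [j_n(kr) − (j_n′(kr)/h_n′(kr)) h_n(kr)], whose sign convention depends on the assumed time dependence. The function names `sh_matrix` and `radial_matrix` are hypothetical.

```python
import numpy as np
from math import factorial, pi
from scipy.special import lpmv, spherical_jn, spherical_yn


def sh_matrix(order, colat, azi):
    """Complex spherical harmonics, one row y^T per direction.

    Columns follow (n, m) = (0, 0), (1, -1), (1, 0), (1, 1), ..., so the
    result has shape (num_dirs, (order + 1) ** 2).
    """
    colat = np.atleast_1d(np.asarray(colat, dtype=float))
    azi = np.atleast_1d(np.asarray(azi, dtype=float))
    cols = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            am = abs(m)
            norm = np.sqrt((2 * n + 1) / (4 * pi) * factorial(n - am) / factorial(n + am))
            # lpmv already includes the Condon-Shortley phase (-1)^m
            y = norm * lpmv(am, n, np.cos(colat)) * np.exp(1j * am * azi)
            if m < 0:
                y = (-1) ** am * np.conj(y)   # Y_n^{-m} = (-1)^m conj(Y_n^m)
            cols.append(y)
    return np.stack(cols, axis=-1)


def radial_matrix(order, kr):
    """Diagonal B(k): rigid-sphere b_n(kr), each repeated 2n + 1 times (one convention)."""
    n = np.arange(order + 1)
    jn = spherical_jn(n, kr)
    jnd = spherical_jn(n, kr, derivative=True)
    hn = jn + 1j * spherical_yn(n, kr)                       # spherical Hankel, first kind
    hnd = jnd + 1j * spherical_yn(n, kr, derivative=True)
    bn = 4 * pi * (1j ** n) * (jn - jnd / hnd * hn)
    return np.diag(np.repeat(bn, 2 * n + 1))


Y = sh_matrix(3, [0.3, 1.2, 2.5], [0.1, 2.0, -1.0])   # three directions, N = 3
B = radial_matrix(3, kr=1.5)
```

As a sanity check, each row of Y satisfies the addition theorem, so the squared magnitudes in a row sum to (N+1)²/(4π).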

The spherical harmonic coefficients of the plane wave density function are then obtained by plane wave decomposition in the spherical harmonic domain, according to the formula:

where a_nm(τ, ω) is the vector of length (N+1)² containing the spherical harmonic coefficients of the plane wave density function.
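Equation (5) is not reproduced in this text. Assuming the microphone sampling allows an exact least-squares spherical Fourier transform (Y⁺(Ω)Y(Ω) = I, with Y⁺ the pseudo-inverse), inverting equation (3) gives one common form of plane wave decomposition, a_nm(τ, ω) = B⁻¹(k) Y⁺(Ω) p(τ, ω) = Y^H(Φ) s(τ, ω) + B⁻¹(k) Y⁺(Ω) n(τ, ω). The sketch below checks this on a noiseless single-source bin with 32 random microphone directions; the array size, kr, source direction and helper names are all assumptions.

```python
import numpy as np
from math import factorial, pi
from scipy.special import lpmv, spherical_jn, spherical_yn


def sh_matrix(order, colat, azi):
    """Rows y^T(direction) of complex spherical harmonics, (n, m) ordering."""
    colat = np.atleast_1d(colat).astype(float)
    azi = np.atleast_1d(azi).astype(float)
    cols = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            am = abs(m)
            norm = np.sqrt((2 * n + 1) / (4 * pi) * factorial(n - am) / factorial(n + am))
            y = norm * lpmv(am, n, np.cos(colat)) * np.exp(1j * am * azi)
            cols.append((-1) ** am * np.conj(y) if m < 0 else y)
    return np.stack(cols, axis=-1)


def rigid_sphere_bn(order, kr):
    """Diagonal of B(k) for a rigid sphere (one common convention)."""
    n = np.arange(order + 1)
    jn, jnd = spherical_jn(n, kr), spherical_jn(n, kr, derivative=True)
    hn = jn + 1j * spherical_yn(n, kr)
    hnd = jnd + 1j * spherical_yn(n, kr, derivative=True)
    return np.repeat(4 * pi * (1j ** n) * (jn - jnd / hnd * hn), 2 * n + 1)


def plane_wave_decomposition(p, y_mics, b_diag):
    """Assumed form of eq. (5): a_nm = B^{-1}(k) Y^+(Omega) p for one (tau, omega) bin."""
    p_nm = np.linalg.pinv(y_mics) @ p   # discrete spherical Fourier transform (least squares)
    return p_nm / b_diag                # radial equalization; B(k) is diagonal


# Synthetic single-source bin following eq. (3) without noise
rng = np.random.default_rng(0)
N, L, kr = 3, 32, 1.5
v = rng.standard_normal((L, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)                           # random mic directions
y_mics = sh_matrix(N, np.arccos(v[:, 2]), np.arctan2(v[:, 1], v[:, 0]))  # Y(Omega), L x (N+1)^2
b_diag = rigid_sphere_bn(N, kr)
y_src = sh_matrix(N, 1.0, 0.5)[0]                 # y(Phi_1) for the source direction
s = 0.7 - 0.2j                                    # source STFT coefficient s(tau, omega)
p = y_mics @ (b_diag * (y_src.conj() * s))        # Y(Omega) B(k) Y^H(Phi) s
a_nm = plane_wave_decomposition(p, y_mics, b_diag)
```

In the noiseless case the recovered a_nm equals Y^H(Φ)s, so it carries only source angle information, as the embodiment states.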

3) Applying the inverse spherical Fourier transform to the spherical harmonic coefficients of the plane wave density function obtained in step 2), the plane wave density at each angle Θ_j (j = 1, ..., J) of each time-frequency point is calculated as follows:

$$a(\tau,\omega,\Theta_j) = \mathbf{y}^{T}(\Theta_j)\,\mathbf{a}_{nm}(\tau,\omega) \qquad (6)$$
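Because every row of the grid matrix is a y^T(Θ_j), equation (6) can be evaluated for all J directions with one matrix-vector product per time-frequency point, with no covariance matrix or eigendecomposition involved. The sketch assumes a 5° grid, N = 3 and ideal coefficients of a single plane wave, a_nm = y^*(Φ) s; the peak of DEN should then fall on the source direction. The helper `sh_matrix` is hypothetical.

```python
import numpy as np
from math import factorial, pi
from scipy.special import lpmv


def sh_matrix(order, colat, azi):
    """Rows y^T(direction) of complex spherical harmonics, (n, m) ordering."""
    colat = np.atleast_1d(colat).astype(float)
    azi = np.atleast_1d(azi).astype(float)
    cols = []
    for n in range(order + 1):
        for m in range(-n, n + 1):
            am = abs(m)
            norm = np.sqrt((2 * n + 1) / (4 * pi) * factorial(n - am) / factorial(n + am))
            y = norm * lpmv(am, n, np.cos(colat)) * np.exp(1j * am * azi)
            cols.append((-1) ** am * np.conj(y) if m < 0 else y)
    return np.stack(cols, axis=-1)


# Candidate grid Theta_j: 5-degree steps in colatitude and azimuth
colat_deg, azi_deg = np.meshgrid(np.arange(0, 181, 5), np.arange(0, 360, 5), indexing="ij")
colat_deg, azi_deg = colat_deg.ravel(), azi_deg.ravel()
y_grid = sh_matrix(3, np.deg2rad(colat_deg), np.deg2rad(azi_deg))   # rows y^T(Theta_j), J x 16

# Hypothetical coefficients of one plane wave from (60 deg, 120 deg): a_nm = y^*(Phi) s
s = 0.7 - 0.2j
a_nm = sh_matrix(3, np.deg2rad(60.0), np.deg2rad(120.0))[0].conj() * s

density = y_grid @ a_nm     # eq. (6) for every Theta_j at once
den = np.abs(density)       # eq. (7)
j_peak = int(np.argmax(den))
```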

4) The modulus of the plane wave density obtained in step 3) is taken, all angles are traversed, and the ratio of the largest to the second-largest value is calculated to obtain the plane wave relative density. The specific process is as follows:

$$\mathrm{DEN}(\tau,\omega,\Theta_j) = \left|a(\tau,\omega,\Theta_j)\right| \qquad (7)$$

where DEN(τ, ω, Θ_j) is the modulus of the plane wave density at (τ, ω) in direction Θ_j. The larger the maximum of DEN(τ, ω, Θ_j), the more likely it is that the direct sound dominates at (τ, ω). However, if (τ, ω) is dominated by reverberation, DEN(τ, ω, Θ_j) also has a maximum, but that maximum is not sufficiently prominent.

To select the time-frequency points at which the maximum is relatively prominent, a plane wave density ratio is defined as:

$$\mathrm{RA}(\tau,\omega) = \frac{\mathrm{DEN}(\tau,\omega,\Theta_{j_1})}{\mathrm{DEN}(\tau,\omega,\Theta_{j_2})} \qquad (8)$$

where RA(τ, ω) is the plane wave density ratio at the time-frequency point (τ, ω), and Θ_j1 and Θ_j2 are the angles at which DEN attains its largest and second-largest value, respectively.
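Equation (8) can be sketched as below. One practical point, which the source does not address: on a fine grid the second-largest value of DEN is usually a neighbour of the main peak, so RA stays close to 1. The optional `exclusion_rad` parameter, a hypothetical extension, skips grid points within a given angular distance of Θ_j1; with its default of 0 the function follows the literal rule. The function name is also hypothetical.

```python
import numpy as np


def relative_density(den, colat, azi, exclusion_rad=0.0):
    """Plane wave relative density RA = DEN(Theta_j1) / DEN(Theta_j2), eq. (8).

    den:          (J,) moduli DEN(tau, omega, Theta_j) on the angular grid
    colat, azi:   (J,) grid directions in radians
    exclusion_rad: hypothetical extension, not in the source; grid points within
        this great-circle distance of Theta_j1 are skipped when searching for
        Theta_j2. By default only numerically coincident directions are skipped.
    Returns (ra, j1, j2).
    """
    den = np.asarray(den, dtype=float)
    colat = np.asarray(colat, dtype=float)
    azi = np.asarray(azi, dtype=float)
    j1 = int(np.argmax(den))
    # Great-circle angle between every grid direction and Theta_j1
    cos_gamma = (np.cos(colat) * np.cos(colat[j1])
                 + np.sin(colat) * np.sin(colat[j1]) * np.cos(azi - azi[j1]))
    gamma = np.arccos(np.clip(cos_gamma, -1.0, 1.0))
    candidates = gamma > max(exclusion_rad, 1e-6)
    candidates[j1] = False                  # Theta_j2 must be a different grid point
    idx = np.flatnonzero(candidates)
    j2 = int(idx[np.argmax(den[idx])])
    return den[j1] / max(den[j2], np.finfo(float).tiny), j1, j2


# Toy grid of four directions on one meridian
den = [1.0, 5.0, 4.5, 2.0]
colat = np.deg2rad([10.0, 50.0, 55.0, 130.0])
azi = np.zeros(4)
ra, j1, j2 = relative_density(den, colat, azi)                               # literal rule
ra_ex, _, j2_ex = relative_density(den, colat, azi, exclusion_rad=np.deg2rad(10.0))
```

With the literal rule the neighbour at 55° is taken as Θ_j2 (RA = 5/4.5); excluding a 10° neighbourhood picks the separate peak at 130° instead (RA = 2.5).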

5) According to the ratio obtained in step 4), the time-frequency points with high plane wave relative density are selected, and the final direct sound DOA estimate is calculated. The specific process is as follows:

$$D^{*} = \{(\tau,\omega) : \mathrm{RA}(\tau,\omega) > \mathrm{TH}\} \qquad (9)$$

where D* is the set of selected time-frequency points and TH is a threshold. The final DOA estimate is then calculated over the selected time-frequency points and is expressed as:
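The selection rule of equation (9) is a simple threshold test on the RA map. Because the final estimation formula is not reproduced here, the aggregation step below is only a hypothetical placeholder: it takes the peak direction Θ_j1 of each selected point and returns the most frequent grid index. The patent may instead apply MUSIC or another estimator to the selected points; both function names are hypothetical.

```python
import numpy as np


def select_bins(ra: np.ndarray, th: float) -> np.ndarray:
    """Boolean mask of D* = {(tau, omega): RA(tau, omega) > TH}, eq. (9)."""
    return ra > th


def vote_doa(peak_idx: np.ndarray, mask: np.ndarray, num_dirs: int) -> int:
    """Hypothetical aggregation: histogram of the per-bin peak grid index
    Theta_j1 over the selected bins; returns the most frequent index."""
    counts = np.bincount(peak_idx[mask], minlength=num_dirs)
    return int(np.argmax(counts))


# Toy RA map (2 frames x 2 frequencies) and the peak grid index at each bin
ra = np.array([[3.0, 1.1], [2.5, 0.9]])
peak_idx = np.array([[7, 2], [7, 2]])
mask = select_bins(ra, th=2.0)
doa_selected = vote_doa(peak_idx, mask, num_dirs=10)                 # selected bins only
doa_all = vote_doa(peak_idx, np.ones_like(mask), num_dirs=10)        # every bin, for comparison
```

In this toy case the unselected bins point elsewhere, so voting over all bins gives a tie that `argmax` breaks toward index 2, while the selected bins point unanimously to index 7.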

In summary, the direct sound time-frequency point selection method based on plane wave relative density comprises the following steps: first, a spherical array output model in a reverberant environment is established and transformed into the time-frequency domain by short-time Fourier transform to exploit the sparsity of speech signals; the spherical harmonic coefficients of the plane wave density of the signal are obtained by plane wave decomposition in the spherical harmonic domain; the inverse spherical Fourier transform is used to construct the plane wave density of each time-frequency point at any given angle, where a higher density indicates that the point is more likely to be dominated by a single direct sound; after taking the modulus, all angles are traversed and the ratio of the largest to the second-largest plane wave density is calculated as the plane wave relative density; and the time-frequency points with high relative density are selected to obtain the final DOA estimate of the direct sound. Working with the plane wave density function in the spherical harmonic domain avoids the heavy computation of covariance matrix eigenvalue decomposition. Because a direct-sound-dominated point is characterized not only by a high plane wave density but also by a clearly dominant density maximum, the method introduces the plane wave relative density, selects the time-frequency points with high relative density, and thereby improves DOA estimation accuracy.

The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to these embodiments. Various changes and modifications can be made according to the purpose of the invention. All changes, modifications, substitutions, combinations or simplifications made according to the spirit and principle of the technical solution of the invention shall be equivalent substitutions and, as long as they meet the purpose of the invention and do not depart from the technical principle and inventive concept of the direct sound time-frequency point selection method based on plane wave relative density, shall fall within the protection scope of the invention.

Claims

1. A direct sound time-frequency point selection method based on plane wave relative density is characterized by comprising the following steps:

2. The method for selecting the direct sound time-frequency point based on the relative density of the plane wave according to claim 1, wherein the spherical microphone array output model in the reverberation environment is established in the step 1), and specifically comprises the following steps:

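$$\mathbf{p}(t) = \mathbf{V}(k,\Phi)\,\mathbf{s}(t) + \mathbf{n}(t) \qquad (1)$$

where p(t) is the sound pressure signal received by the spherical array, s(t) is the source signal vector, V(k, Φ) is the steering matrix containing DOA and frequency information, and n(t) is zero-mean additive white Gaussian noise;

equation (1) is transformed into the time-frequency domain using the short-time Fourier transform, expressed as:

$$\mathbf{p}(\tau,\omega) = \mathbf{V}(k,\Phi)\,\mathbf{s}(\tau,\omega) + \mathbf{n}(\tau,\omega) \qquad (2)$$

where τ is the time index and ω is the frequency index; because the steering matrix depends only on the DOA and frequency, it is unchanged by the short-time Fourier transform.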

3. The method for selecting the direct sound time-frequency point based on the relative density of the plane wave as claimed in claim 1, wherein the spherical harmonic coefficient of the plane wave density function is obtained by performing spherical Fourier transform and plane wave decomposition on the time-frequency domain signal obtained in step 1), and the specific operation steps are as follows:

for a spherical array, the steering matrix is decomposed into DOA-dependent and frequency-dependent components, and equation (2) can be written as:

$$\mathbf{p}(\tau,\omega) = \mathbf{Y}(\Omega)\,\mathbf{B}(k)\,\mathbf{Y}^{H}(\Phi)\,\mathbf{s}(\tau,\omega) + \mathbf{n}(\tau,\omega) \qquad (3)$$

where

$$\mathbf{Y}(\Phi) = \left[\mathbf{y}^{T}(\Phi_1), \ldots, \mathbf{y}^{T}(\Phi_D)\right]^{T} \qquad (4)$$

is the D × (N+1)² matrix containing the angle information of the sources Φ_d = (θ_d, φ_d); each element of y(Φ_d) is a spherical harmonic Y_n^m(Φ_d) of order n and degree m, N being the maximum spherical harmonic order; similarly, Y(Ω) is the L × (N+1)² matrix containing the angle information of the array elements; and the (N+1)² × (N+1)² diagonal matrix B(k) contains the radial functions for a plane wave scattered by a rigid sphere;

4. The direct sound time-frequency point selection method based on plane wave relative density according to claim 1, wherein the inverse spherical Fourier transform is applied to the spherical harmonic coefficients of the plane wave density function obtained in step 2) to calculate the plane wave density at each angle Θ_j (j = 1, ..., J) of each time-frequency point, the specific operation being as follows:

$$a(\tau,\omega,\Theta_j) = \mathbf{y}^{T}(\Theta_j)\,\mathbf{a}_{nm}(\tau,\omega) \qquad (6)$$

5. The direct sound time-frequency point selection method based on plane wave relative density according to claim 1, wherein the plane wave relative density is obtained by taking the modulus of the plane wave density obtained in step 3), traversing all angles, and calculating the ratio of the largest to the second-largest value, the operation steps being as follows:

$$\mathrm{DEN}(\tau,\omega,\Theta_j) = \left|a(\tau,\omega,\Theta_j)\right| \qquad (7)$$

where DEN(τ, ω, Θ_j) is the modulus of the plane wave density at (τ, ω); the larger DEN(τ, ω, Θ_j), the more likely it is that the direct sound dominates at (τ, ω); if (τ, ω) is dominated by reverberation, DEN(τ, ω, Θ_j) also has a maximum, but that maximum is not sufficiently prominent;

$$\mathrm{RA}(\tau,\omega) = \frac{\mathrm{DEN}(\tau,\omega,\Theta_{j_1})}{\mathrm{DEN}(\tau,\omega,\Theta_{j_2})} \qquad (8)$$

where RA(τ, ω) is the plane wave density ratio at the time-frequency point (τ, ω), and Θ_j1 and Θ_j2 are the angles at which DEN attains its largest and second-largest value, respectively.

6. The method for selecting the direct sound time-frequency point based on the relative density of the plane wave according to claim 1, wherein the ratio obtained in the step 4) is used for selecting the time-frequency point corresponding to the relatively high density of the plane wave, so as to calculate and obtain the final direct sound DOA estimation result, and the method comprises the following operation steps:

$$D^{*} = \{(\tau,\omega) : \mathrm{RA}(\tau,\omega) > \mathrm{TH}\} \qquad (9)$$

where D* is the set of selected time-frequency points and TH is a threshold; the final DOA estimate is then calculated over the selected time-frequency points and is expressed as: