CN118250592A - Wind noise reduction method and system based on array microphone - Google Patents

Wind noise reduction method and system based on array microphone

Info

Publication number
CN118250592A
Authority
CN
China
Prior art keywords
wind noise
microphone
scene
sound signal
array
Prior art date
Legal status
Pending
Application number
CN202410359407.7A
Other languages
Chinese (zh)
Inventor
邱志豪
Current Assignee
Xiamen Yealink Network Technology Co Ltd
Original Assignee
Xiamen Yealink Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Yealink Network Technology Co Ltd filed Critical Xiamen Yealink Network Technology Co Ltd
Priority to CN202410359407.7A
Publication of CN118250592A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00Details of transducers, loudspeakers or microphones
    • H04R1/08Mouthpieces; Microphones; Attachments therefor
    • H04R1/083Special constructions of mouthpieces

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a wind noise reduction method and system based on an array microphone, wherein the method comprises the following steps: acquiring sound signals through an array microphone; calculating a complex coherence coefficient corresponding to the array microphone according to the sound signals; determining the scene of the sound signal according to the complex coherence coefficient, wherein the scene of the sound signal comprises a pure wind noise scene, a pure human voice scene or a wind noise and human voice mixed scene; constructing a wind noise removal filter according to the complex coherence coefficient; and performing corresponding wind noise reduction processing by using the wind noise removal filter according to the scene of the sound signal. The wind noise reduction method and system based on the array microphone can effectively detect wind noise, suppress it while preserving voice information, and improve the call quality for users.

Description

Wind noise reduction method and system based on array microphone
Technical Field
The invention relates to the technical field of microphone noise reduction, in particular to a wind noise reduction method and system based on an array microphone.
Background
Wind noise refers to the unsteady noise generated by the friction of airflow over the microphone surface, and therefore it has no apparent directivity. Because wind noise lacks obvious directivity, is highly random, and concentrates its main energy at low frequencies that overlap the pitch range of speech, it can seriously degrade the call quality experienced by users. As the use of headphones increases, so does the chance that users encounter wind noise interference: in an office scene, the blowing of an air conditioner or a fan affects the user experience, and in an outdoor scene strong wind noise may even prevent a user from talking through the headphones at all. However, wind noise can neither be effectively detected nor suppressed by conventional single-channel speech enhancement methods (spectral subtraction, Wiener filtering) or multi-channel speech enhancement methods (spatial filtering such as beam forming). In addition, recently developed deep learning methods require a large amount of computation, which makes real-time operation difficult, and their robustness across different scenes still needs to be improved.
Disclosure of Invention
The invention provides a wind noise reduction method and system based on an array microphone, which can effectively detect wind noise, suppress it while preserving voice information, and improve the call quality of the user.
In a first aspect, an embodiment of the present invention provides a wind noise reduction method based on an array microphone, including:
Acquiring sound signals through an array microphone;
Calculating a complex coherence coefficient corresponding to the array microphone according to the sound signals;
Determining the scene of the sound signal according to the complex coherence coefficient, wherein the scene of the sound signal comprises a pure wind noise scene, a pure human voice scene or a wind noise human voice mixed scene;
constructing a wind noise removal filter according to the complex coherence coefficient;
And according to the scene of the sound signal, performing corresponding wind noise reduction processing by using the wind noise removal filter.
The embodiment of the invention provides a wind noise reduction method based on an array microphone, which calculates the complex coherence coefficient corresponding to the array microphone from the sound signals, judges the scene of the sound signal according to the complex coherence coefficient, and applies different wind noise reduction processing to different scenes, so that wind noise is suppressed, the clarity of the voice is ensured, voice distortion is avoided, and the call quality of the user is improved. Compared with other single-channel wind noise removal algorithms, the wind noise decision is more accurate and reliable, and the wind noise is filtered more stably and cleanly; in addition, compared with AI-based algorithms, the method requires less computation and a smaller performance cost to achieve the same noise reduction effect.
In one possible implementation manner, the calculating the complex coherence coefficient corresponding to the array microphone according to the sound signal specifically includes:
Arbitrarily selecting a first microphone and a second microphone from the array microphone;
Calculating the complex coherence coefficient between the first microphone and the second microphone according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone, and the cross-power spectral density of the first microphone and the second microphone;
And calculating complex coherence coefficients between each microphone and other microphones in the array microphone, and carrying out summation and averaging to obtain the complex coherence coefficient corresponding to the array microphone.
The embodiment of the invention provides a method for calculating the complex coherence coefficient corresponding to the array microphone: the complex coherence coefficients between the individual microphones are calculated first, and then combined into the complex coherence coefficient of the whole array microphone. In the usage scene of an earphone, the human voice can be approximated as a point sound source, so its propagation to the array microphone is spatially coherent, whereas wind noise is a random process; as a result, the inter-microphone complex coherence of wind noise is low, while that of the target voice is relatively high. Therefore, the scene of the sound signal can be judged from the complex coherence coefficients between the microphones, which improves the recognition accuracy of the call scene.
Further, the complex coherence coefficient between the first microphone and the second microphone is calculated according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone and the cross-power spectral density of the first microphone and the second microphone, with the specific formula:
$$\Gamma_{ij}(\omega,\tau)=\frac{\Phi_{x_i x_j}(\omega,\tau)}{\sqrt{\Phi_{x_i x_i}(\omega,\tau)\,\Phi_{x_j x_j}(\omega,\tau)}}$$
wherein $\Gamma_{ij}(\omega,\tau)$ is the complex coherence coefficient between the first microphone and the second microphone, $\Phi_{x_i x_j}(\omega,\tau)$ is the cross-power spectral density of the first microphone and the second microphone, $\Phi_{x_i x_i}(\omega,\tau)$ and $\Phi_{x_j x_j}(\omega,\tau)$ are the self-power spectral densities of the first microphone and the second microphone respectively, $x_i$ and $x_j$ represent the first microphone and the second microphone respectively, $\omega$ represents a frequency domain unit (frequency bin), and $\tau$ represents a time frame;
the cross-power spectral density of the first microphone and the second microphone is calculated by smoothing between time frames:
$$\Phi_{x_i x_j}(\omega,\tau)=\gamma\,\Phi_{x_i x_j}(\omega,\tau-1)+(1-\gamma)\,X_i(\omega,\tau)\,X_j^{H}(\omega,\tau)$$
where $\gamma$ is a smoothing coefficient, the superscript $H$ is the conjugate transpose symbol, and $X_i$ and $X_j$ represent the sound signals acquired by the first microphone and the second microphone, respectively.
Further, the complex coherence coefficients between each microphone and the other microphones in the array microphone are calculated, summed and averaged to obtain the complex coherence coefficient corresponding to the array microphone, with the specific formula:
$$\Gamma(\omega,\tau)=\frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\Gamma_{ij}(\omega,\tau)$$
wherein $\Gamma(\omega,\tau)$ is the complex coherence coefficient corresponding to the array microphone, and $N$ is the number of microphones in the array microphone.
In one possible implementation manner, the determining, according to the complex coherence coefficient, a scene where the sound signal is located specifically is:
According to the frequency distribution of the sound signal, dividing the complex coherence coefficient into a low-frequency complex coherence coefficient and a medium-high-frequency complex coherence coefficient based on a preset frequency value;
When the low-frequency complex coherence coefficient is smaller than a first threshold, wind noise exists in the sound signal; otherwise, no wind noise exists in the sound signal;
When the medium-and-high-frequency complex coherence coefficient is larger than a second threshold, human voice exists in the sound signal; otherwise, no human voice exists in the sound signal;
Determining the scene of the sound signal by combining the judgment of whether wind noise and human voice exist in the sound signal: when wind noise exists in the sound signal and no human voice exists, the scene is a pure wind noise scene; when no wind noise exists in the sound signal and the human voice exists, the scene is a pure human voice scene; when wind noise and human voice exist in the sound signal at the same time, the scene is a wind noise human voice mixed scene.
The embodiment of the invention provides a method for judging the scene of the sound signal according to the complex coherence coefficient: based on the frequency distribution of the sound signal and a preset frequency value, the complex coherence coefficient is divided into a low-frequency complex coherence coefficient and a medium-and-high-frequency complex coherence coefficient, so that wind noise, whose energy is concentrated at low frequencies, and human voice, which occupies the medium and high frequencies, can be detected separately.
In one possible implementation manner, the performing, according to the scene where the sound signal is located, the corresponding wind noise reduction processing by using the wind noise reduction filter specifically includes:
When the scene of the sound signal is the pure wind noise scene, wind noise reduction processing is carried out on the sound signal through the wind noise removal filter;
when the scene of the sound signal is the pure human voice scene, wind noise reduction processing is not performed;
When the scene of the sound signal is the wind noise and human voice mixed scene, firstly, carrying out fixed beam forming processing on the sound signal through a preset beam forming algorithm, and then carrying out wind noise reduction processing on the sound signal subjected to the fixed beam forming processing through the wind noise removal filter.
The embodiment of the invention provides different wind noise processing methods for the three scenes. When the scene of the sound signal is the pure wind noise scene, wind noise reduction processing can be applied directly, so that the user does not hear noise that carries no actual content; when the scene of the sound signal is the pure human voice scene, no wind noise reduction processing is performed, which preserves the fidelity and clarity of the voice and saves system performance; when the scene of the sound signal is the wind noise and human voice mixed scene, the wind noise signal and the human voice signal in the sound signal are separated based on a beam forming algorithm, and the wind noise signal is then subjected to wind noise reduction processing by the preset wind noise removal filter so as to keep the human voice signal. Through these three processing modes, wind noise is suppressed while the voice information is preserved, and the call quality of the user is improved.
In a second aspect, correspondingly, an embodiment of the present invention provides an array microphone-based wind noise reduction system, which includes an acquisition module, a calculation module, a judgment module, a construction module and a noise reduction module;
the acquisition module is used for acquiring sound signals through the array microphone;
The calculation module is used for calculating complex coherence coefficients corresponding to the array microphones according to the sound signals;
the judging module is used for determining the scene of the sound signal according to the complex coherence coefficient, wherein the scene of the sound signal comprises a pure wind noise scene, a pure human voice scene or a wind noise human voice mixed scene;
The construction module is used for constructing a wind noise removal filter according to the complex coherence coefficient;
The noise reduction module is used for carrying out corresponding wind noise reduction processing by using the wind noise removal filter according to the scene where the sound signal is located.
In one possible implementation manner, the calculating module calculates complex coherence coefficients corresponding to the array microphone according to the sound signal, specifically:
Arbitrarily selecting a first microphone and a second microphone from the array microphone;
Calculating the complex coherence coefficient between the first microphone and the second microphone according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone, and the cross-power spectral density of the first microphone and the second microphone;
And calculating complex coherence coefficients between each microphone and other microphones in the array microphone, and carrying out summation and averaging to obtain the complex coherence coefficient corresponding to the array microphone.
Further, the complex coherence coefficient between the first microphone and the second microphone is calculated according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone and the cross-power spectral density of the first microphone and the second microphone, with the specific formula:
$$\Gamma_{ij}(\omega,\tau)=\frac{\Phi_{x_i x_j}(\omega,\tau)}{\sqrt{\Phi_{x_i x_i}(\omega,\tau)\,\Phi_{x_j x_j}(\omega,\tau)}}$$
wherein $\Gamma_{ij}(\omega,\tau)$ is the complex coherence coefficient between the first microphone and the second microphone, $\Phi_{x_i x_j}(\omega,\tau)$ is the cross-power spectral density of the first microphone and the second microphone, $\Phi_{x_i x_i}(\omega,\tau)$ and $\Phi_{x_j x_j}(\omega,\tau)$ are the self-power spectral densities of the first microphone and the second microphone respectively, $x_i$ and $x_j$ represent the first microphone and the second microphone respectively, $\omega$ represents a frequency domain unit (frequency bin), and $\tau$ represents a time frame;
the cross-power spectral density of the first microphone and the second microphone is calculated by smoothing between time frames:
$$\Phi_{x_i x_j}(\omega,\tau)=\gamma\,\Phi_{x_i x_j}(\omega,\tau-1)+(1-\gamma)\,X_i(\omega,\tau)\,X_j^{H}(\omega,\tau)$$
where $\gamma$ is a smoothing coefficient, the superscript $H$ is the conjugate transpose symbol, and $X_i$ and $X_j$ represent the sound signals acquired by the first microphone and the second microphone, respectively.
Further, the complex coherence coefficients between each microphone and the other microphones in the array microphone are calculated, summed and averaged to obtain the complex coherence coefficient corresponding to the array microphone, with the specific formula:
$$\Gamma(\omega,\tau)=\frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\Gamma_{ij}(\omega,\tau)$$
wherein $\Gamma(\omega,\tau)$ is the complex coherence coefficient corresponding to the array microphone, and $N$ is the number of microphones in the array microphone.
In one possible implementation manner, the determining module determines, according to the complex coherence coefficient, a scene where the sound signal is located, specifically:
According to the frequency distribution of the sound signal, dividing the complex coherence coefficient into a low-frequency complex coherence coefficient and a medium-high-frequency complex coherence coefficient based on a preset frequency value;
When the low-frequency complex coherence coefficient is smaller than a first threshold, wind noise exists in the sound signal; otherwise, no wind noise exists in the sound signal;
When the medium-and-high-frequency complex coherence coefficient is larger than a second threshold, human voice exists in the sound signal; otherwise, no human voice exists in the sound signal;
Determining the scene of the sound signal by combining the judgment of whether wind noise and human voice exist in the sound signal: when wind noise exists in the sound signal and no human voice exists, the scene is a pure wind noise scene; when no wind noise exists in the sound signal and the human voice exists, the scene is a pure human voice scene; when wind noise and human voice exist in the sound signal at the same time, the scene is a wind noise human voice mixed scene.
In one possible implementation manner, the noise reduction module uses the wind noise removal filter to perform corresponding wind noise reduction processing according to the scene where the sound signal is located, specifically:
When the scene of the sound signal is the pure wind noise scene, wind noise reduction processing is carried out on the sound signal through the wind noise removal filter;
when the scene of the sound signal is the pure human voice scene, wind noise reduction processing is not performed;
When the scene of the sound signal is the wind noise and human voice mixed scene, firstly, carrying out fixed beam forming processing on the sound signal through a preset beam forming algorithm, and then carrying out wind noise reduction processing on the sound signal subjected to the fixed beam forming processing through the wind noise removal filter.
Drawings
Fig. 1: a schematic flow diagram of an embodiment of the wind noise reduction method based on an array microphone provided by the invention.
Fig. 2: a schematic diagram of the microphone arrangement in the wind noise reduction method based on an array microphone provided by the invention.
Fig. 3: a schematic diagram of a wind noise and human voice mixed scene in the wind noise reduction method based on an array microphone provided by the invention.
Fig. 4: a diagram of the correspondence between the complex coherence coefficient and the wind noise removal filter coefficient in the wind noise reduction method based on an array microphone provided by the invention.
Fig. 5: a schematic noise reduction flow diagram, in a wind noise and human voice mixed scene, of an embodiment of the wind noise reduction method based on an array microphone provided by the invention.
Fig. 6: a structural schematic diagram of an embodiment of the wind noise reduction system based on an array microphone provided by the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, the step numbers herein are only for convenience of explanation of the specific embodiments, and are not used as limiting the order of execution of the steps.
Throughout the specification, the array microphone described in the embodiments of the present invention is also referred to in some literature as a microphone array: a group of omnidirectional microphones placed at different positions in space and arranged according to a certain geometric rule, used to spatially sample sound signals propagating in space, so that the collected signals carry spatial position information. Depending on the distance between the sound source and the array microphone, a near-field model or a far-field model applies. According to its topology, the array microphone can be classified as a linear array, a planar array, a volumetric (three-dimensional) array, and so on.
The beam forming described in the embodiments of the invention applies delay or phase compensation and amplitude weighting to the output of each array element so as to form a beam pointing in a specific direction. Beam forming is mainly applied to multi-microphone arrays: it fuses the data of multiple channels, suppresses noise and interference from other directions, and enhances the signal in the target direction.
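For illustration only (the patent does not specify the beamformer type, the array geometry, or any code), the following Python/NumPy sketch shows a fixed far-field delay-and-sum beamformer for an assumed uniform linear array; the function names, microphone spacing, sampling rate, and steering angle are all assumptions.

```python
import numpy as np

def delay_and_sum_weights(n_mics, spacing_m, steer_deg, freqs_hz, c=343.0):
    """Fixed delay-and-sum weights for an assumed uniform linear array (far-field model)."""
    theta = np.deg2rad(steer_deg)
    positions = np.arange(n_mics) * spacing_m            # microphone positions along the array axis
    delays = positions * np.cos(theta) / c               # per-microphone propagation delays (seconds)
    # One steering vector per frequency bin, scaled so the target direction has unit gain
    return np.exp(-2j * np.pi * freqs_hz[:, None] * delays[None, :]) / n_mics

def fixed_beamform(frame_stft, weights):
    """frame_stft: (freq_bins, n_mics) STFT of one frame; returns the beamformed spectrum."""
    return np.sum(np.conj(weights) * frame_stft, axis=1)

# Example with 4 microphones (as in Fig. 2), 16 kHz sampling and a 512-point FFT (assumed values)
freqs = np.fft.rfftfreq(512, d=1 / 16000)
w = delay_and_sum_weights(n_mics=4, spacing_m=0.02, steer_deg=90, freqs_hz=freqs)
```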
Embodiment one:
As shown in fig. 1, a first embodiment provides a wind noise reduction method based on an array microphone, which includes steps S1 to S5:
S1, acquiring sound signals through an array microphone;
S2, calculating complex coherence coefficients corresponding to the array microphones according to the sound signals;
S3, determining the scene of the sound signal according to the complex coherence coefficient, wherein the scene of the sound signal comprises a pure wind noise scene, a pure human voice scene or a wind noise and human voice mixed scene;
S4, constructing a wind noise removal filter according to the complex coherence coefficient;
and S5, according to the scene of the sound signal, performing corresponding wind noise reduction processing by using the wind noise removal filter.
The embodiment of the invention provides a wind noise reduction method based on an array microphone, which calculates the complex coherence coefficient corresponding to the array microphone from the sound signals, judges the scene of the sound signal according to the complex coherence coefficient, and applies different wind noise reduction processing to different scenes, so that wind noise is suppressed, the clarity of the voice is ensured, voice distortion is avoided, and the call quality of the user is improved. Compared with other single-channel wind noise removal algorithms, the wind noise decision is more accurate and reliable, and the wind noise is filtered more stably and cleanly; in addition, compared with AI-based algorithms, the method requires less computation and a smaller performance cost to achieve the same noise reduction effect.
Preferably, fig. 2 shows the microphone arrangement on an array-microphone earphone: four independent microphones are arranged transversely so as to collect sound signals and distinguish sound sources at different positions. It should be noted that the number of microphones and their layout are not limited to this in practical applications.
In one possible implementation manner, in step S2, the calculating of the complex coherence coefficient corresponding to the array microphone according to the sound signals specifically includes:
Arbitrarily selecting a first microphone and a second microphone from the array microphone;
Calculating the complex coherence coefficient between the first microphone and the second microphone according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone, and the cross-power spectral density of the first microphone and the second microphone;
And calculating complex coherence coefficients between each microphone and other microphones in the array microphone, and carrying out summation and averaging to obtain the complex coherence coefficient corresponding to the array microphone.
The embodiment of the invention provides a method for calculating the complex coherence coefficient corresponding to the array microphone: the complex coherence coefficients between the individual microphones are calculated first, and then combined into the complex coherence coefficient of the whole array microphone. In the usage scene of an earphone, the human voice can be approximated as a point sound source, so its propagation to the array microphone is spatially coherent, whereas wind noise is a random process; as a result, the inter-microphone complex coherence of wind noise is low, while that of the target voice is relatively high. Therefore, the scene of the sound signal can be judged from the complex coherence coefficients between the microphones, which improves the recognition accuracy of the call scene.
Further, the complex coherence coefficient between the first microphone and the second microphone is calculated according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone and the cross-power spectral density of the first microphone and the second microphone, with the specific formula:
$$\Gamma_{ij}(\omega,\tau)=\frac{\Phi_{x_i x_j}(\omega,\tau)}{\sqrt{\Phi_{x_i x_i}(\omega,\tau)\,\Phi_{x_j x_j}(\omega,\tau)}}$$
wherein $\Gamma_{ij}(\omega,\tau)$ is the complex coherence coefficient between the first microphone and the second microphone, $\Phi_{x_i x_j}(\omega,\tau)$ is the cross-power spectral density of the first microphone and the second microphone, $\Phi_{x_i x_i}(\omega,\tau)$ and $\Phi_{x_j x_j}(\omega,\tau)$ are the self-power spectral densities of the first microphone and the second microphone respectively, $x_i$ and $x_j$ represent the first microphone and the second microphone respectively, $\omega$ represents a frequency domain unit (frequency bin), and $\tau$ represents a time frame;
the cross-power spectral density of the first microphone and the second microphone is calculated by smoothing between time frames:
$$\Phi_{x_i x_j}(\omega,\tau)=\gamma\,\Phi_{x_i x_j}(\omega,\tau-1)+(1-\gamma)\,X_i(\omega,\tau)\,X_j^{H}(\omega,\tau)$$
where $\gamma$ is a smoothing coefficient, the superscript $H$ is the conjugate transpose symbol, and $X_i$ and $X_j$ represent the sound signals acquired by the first microphone and the second microphone, respectively.
Further, the complex coherence coefficients between each microphone and the other microphones in the array microphone are calculated, summed and averaged to obtain the complex coherence coefficient corresponding to the array microphone, with the specific formula:
$$\Gamma(\omega,\tau)=\frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\Gamma_{ij}(\omega,\tau)$$
wherein $\Gamma(\omega,\tau)$ is the complex coherence coefficient corresponding to the array microphone, and $N$ is the number of microphones in the array microphone.
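As an illustration, a minimal NumPy sketch of this per-frame coherence computation is given below. The smoothing coefficient value, the averaging over unique microphone pairs, and the use of the coherence magnitude are assumptions; the patent only defines the quantities symbolically.

```python
import numpy as np

def update_coherence(X, Phi, gamma=0.8, eps=1e-12):
    """One frame of the array-averaged complex coherence.

    X   : (n_mics, n_bins) complex STFT of the current frame
    Phi : (n_mics, n_mics, n_bins) running auto-/cross-power spectral densities
    Returns (coherence, Phi): coherence is the averaged coherence magnitude per frequency bin.
    """
    # Recursive smoothing: Phi_ij(w,t) = gamma*Phi_ij(w,t-1) + (1-gamma)*X_i(w,t)*conj(X_j(w,t))
    Phi = gamma * Phi + (1.0 - gamma) * X[:, None, :] * np.conj(X[None, :, :])

    n_mics = X.shape[0]
    auto = np.real(np.einsum('iik->ik', Phi))        # self-power spectral densities Phi_ii
    coh_sum = np.zeros(X.shape[1])
    n_pairs = 0
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            gamma_ij = Phi[i, j] / np.sqrt(auto[i] * auto[j] + eps)
            coh_sum += np.abs(gamma_ij)              # magnitude of the pairwise complex coherence
            n_pairs += 1
    return coh_sum / n_pairs, Phi
```

In practice Phi could be initialized as np.zeros((n_mics, n_mics, n_bins), dtype=complex); the small eps term keeps the division well defined during the first frames.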
In one possible implementation manner, in step S3, the determining, according to the complex coherence coefficient, a scene where the sound signal is located, specifically is:
According to the frequency distribution of the sound signal, dividing the complex coherence coefficient into a low-frequency complex coherence coefficient and a medium-high-frequency complex coherence coefficient based on a preset frequency value;
When the low-frequency complex coherence coefficient is smaller than a first threshold, wind noise exists in the sound signal; otherwise, no wind noise exists in the sound signal;
When the medium-and-high-frequency complex coherence coefficient is larger than a second threshold, human voice exists in the sound signal; otherwise, no human voice exists in the sound signal;
Determining the scene of the sound signal by combining the judgment of whether wind noise and human voice exist in the sound signal: when wind noise exists in the sound signal and no human voice exists, the scene is a pure wind noise scene; when no wind noise exists in the sound signal and the human voice exists, the scene is a pure human voice scene; when wind noise and human voice exist in the sound signal at the same time, the scene is a wind noise human voice mixed scene.
The embodiment of the invention provides a method for judging the scene of the sound signal according to the complex coherence coefficient: based on the frequency distribution of the sound signal and a preset frequency value, the complex coherence coefficient is divided into a low-frequency complex coherence coefficient and a medium-and-high-frequency complex coherence coefficient, so that wind noise, whose energy is concentrated at low frequencies, and human voice, which occupies the medium and high frequencies, can be detected separately.
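As a sketch of this decision logic: the split frequency and the two thresholds below are illustrative placeholders, since the patent only states that a preset frequency value, a first threshold and a second threshold are used.

```python
import numpy as np

def classify_scene(coherence, freqs_hz, split_hz=500.0, wind_thr=0.4, voice_thr=0.6):
    """Classify one frame from the band-averaged array coherence (values in 0..1)."""
    low = coherence[freqs_hz < split_hz].mean()          # low-frequency complex coherence
    mid_high = coherence[freqs_hz >= split_hz].mean()    # medium/high-frequency complex coherence
    has_wind = low < wind_thr                            # below the first threshold -> wind noise present
    has_voice = mid_high > voice_thr                     # above the second threshold -> human voice present
    if has_wind and has_voice:
        return "wind_and_voice_mixed"
    if has_wind:
        return "pure_wind_noise"
    if has_voice:
        return "pure_voice"
    return "silence_or_other"                            # case not named among the three scenes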
Preferably, fig. 3 shows a wind noise and human voice mixed scene in which wind noise and human voice are picked up simultaneously by the array microphone as the sound signal. In this case, the embodiment of the invention can determine the scene from the complex coherence coefficient of the array microphone, further distinguish the wind noise signal from the human voice signal in the sound signal, and perform noise reduction processing on the wind noise signal.
Preferably, in step S4, the magnitude of the complex coherence coefficient of the array microphone ranges from 0 to 1: it is close to 1 for the target voice, and becomes smaller as the wind noise gets stronger. The wind noise removal filter can be constructed according to this characteristic. One way to construct it is to apply a nonlinear mapping to the complex coherence coefficient of the array microphone, so that the filter adjusts its wind noise removal strength according to the complex coherence coefficient, which improves the stability of the wind noise removal, as shown in fig. 4. Fig. 4 shows the correspondence between the complex coherence coefficient and the wind noise removal filter coefficient, where the abscissa represents the complex coherence coefficient of the array microphone and the ordinate represents the wind noise removal filter coefficient.
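Since the exact curve of Fig. 4 is not reproduced here, the following sketch uses one plausible nonlinear mapping (a sigmoid with a gain floor); the knee, slope and floor values are assumptions, not parameters taken from the patent.

```python
import numpy as np

def wind_noise_gain(coherence, knee=0.4, slope=10.0, floor=0.1):
    """Map the array coherence (0..1) per frequency bin to a wind-noise removal filter coefficient."""
    gain = 1.0 / (1.0 + np.exp(-slope * (coherence - knee)))  # low coherence -> strong attenuation
    return np.maximum(gain, floor)                            # a gain floor limits musical-noise artifacts

def apply_wind_noise_filter(spectrum, gain):
    """Apply the per-bin filter coefficients to a (beamformed or single-channel) spectrum."""
    return spectrum * gain
```

The floor keeps some residual signal in heavily attenuated bins, which is a common way to trade suppression depth against artifacts.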
In one possible implementation manner, in step S5, according to the scene where the sound signal is located, the wind noise removal filter is used to perform corresponding wind noise reduction processing, which specifically includes:
When the scene of the sound signal is the pure wind noise scene, wind noise reduction processing is carried out on the sound signal through the wind noise removal filter;
when the scene of the sound signal is the pure human voice scene, wind noise reduction processing is not performed;
When the scene of the sound signal is the wind noise and human voice mixed scene, firstly, carrying out fixed beam forming processing on the sound signal through a preset beam forming algorithm, and then carrying out wind noise reduction processing on the sound signal subjected to the fixed beam forming processing through the wind noise removal filter.
In a preferred embodiment, when the scene of the sound signal is the wind noise and human voice mixed scene, the wind noise reduction flow of the array microphone is shown in fig. 5. In fig. 5, the current sound signal is first acquired by N microphones Mic1 to MicN; then, the complex coherence coefficient corresponding to the array microphone is calculated from the current sound signal, and nonlinear processing is applied to the complex coherence coefficient to construct the wind noise removal filter; after the scene of the sound signal is determined to be a wind noise and human voice mixed scene according to the complex coherence coefficient, fixed beam forming processing is performed on the sound signal through a preset beam forming algorithm, and finally the wind noise removal filter is used to perform noise reduction processing on the wind noise signal, thereby achieving wind noise reduction of the original sound signal.
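Tying the pieces together, a per-frame sketch of the flow of Fig. 5 could look as follows; it reuses the illustrative helpers sketched earlier (update_coherence, classify_scene, delay_and_sum_weights/fixed_beamform, wind_noise_gain), all of which are assumptions rather than the patent's actual implementation.

```python
def process_frame(X, Phi, freqs_hz, weights):
    """X: (n_mics, n_bins) STFT frame; returns the processed spectrum and the updated PSDs."""
    coherence, Phi = update_coherence(X, Phi)        # step S2: array-averaged complex coherence
    scene = classify_scene(coherence, freqs_hz)      # step S3: scene decision
    gain = wind_noise_gain(coherence)                # step S4: wind noise removal filter coefficients
    if scene == "pure_voice":
        out = X[0]                                   # step S5: pass a reference channel through unchanged
    elif scene == "pure_wind_noise":
        out = X[0] * gain                            # suppress the wind noise directly
    else:                                            # mixed (or unclassified): beamform first, then filter
        out = fixed_beamform(X.T, weights) * gain
    return out, Phi
```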
The embodiment of the invention provides different wind noise processing methods for the three scenes. When the scene of the sound signal is the pure wind noise scene, wind noise reduction processing can be applied directly, so that the user does not hear noise that carries no actual content; when the scene of the sound signal is the pure human voice scene, no wind noise reduction processing is performed, which preserves the fidelity and clarity of the voice and saves system performance; when the scene of the sound signal is the wind noise and human voice mixed scene, the wind noise signal and the human voice signal in the sound signal are separated based on a beam forming algorithm, and the wind noise signal is then subjected to wind noise reduction processing by the preset wind noise removal filter so as to keep the human voice signal. Through these three processing modes, wind noise is suppressed while the voice information is preserved, and the call quality of the user is improved.
In a second aspect, as shown in fig. 6, an embodiment of the present invention provides an array microphone-based wind noise reduction system, which includes an obtaining module 10, a calculating module 20, a judging module 30, a constructing module 40, and a noise reduction module 50;
wherein, the acquisition module 10 is used for acquiring sound signals through an array microphone;
The calculating module 20 is configured to calculate the complex coherence coefficient corresponding to the array microphone according to the sound signals;
The judging module 30 is configured to determine a scene where the sound signal is located according to the complex coherence coefficient, where the scene where the sound signal is located includes a pure wind noise scene, a pure human voice scene, or a wind noise human voice mixed scene;
the construction module 40 is configured to construct a wind noise removal filter according to the complex coherence coefficient;
the noise reduction module 50 is configured to perform corresponding wind noise reduction processing by using the wind noise removal filter according to the scene where the sound signal is located.
In one possible implementation manner, the calculating module 20 calculates the complex coherence coefficient corresponding to the array microphone according to the sound signals, specifically:
Arbitrarily selecting a first microphone and a second microphone from the array microphone;
Calculating the complex coherence coefficient between the first microphone and the second microphone according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone, and the cross-power spectral density of the first microphone and the second microphone;
And calculating complex coherence coefficients between each microphone and other microphones in the array microphone, and carrying out summation and averaging to obtain the complex coherence coefficient corresponding to the array microphone.
Further, the complex coherence coefficient between the first microphone and the second microphone is calculated according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone and the cross-power spectral density of the first microphone and the second microphone, with the specific formula:
$$\Gamma_{ij}(\omega,\tau)=\frac{\Phi_{x_i x_j}(\omega,\tau)}{\sqrt{\Phi_{x_i x_i}(\omega,\tau)\,\Phi_{x_j x_j}(\omega,\tau)}}$$
wherein $\Gamma_{ij}(\omega,\tau)$ is the complex coherence coefficient between the first microphone and the second microphone, $\Phi_{x_i x_j}(\omega,\tau)$ is the cross-power spectral density of the first microphone and the second microphone, $\Phi_{x_i x_i}(\omega,\tau)$ and $\Phi_{x_j x_j}(\omega,\tau)$ are the self-power spectral densities of the first microphone and the second microphone respectively, $x_i$ and $x_j$ represent the first microphone and the second microphone respectively, $\omega$ represents a frequency domain unit (frequency bin), and $\tau$ represents a time frame;
the cross-power spectral density of the first microphone and the second microphone is calculated by smoothing between time frames:
$$\Phi_{x_i x_j}(\omega,\tau)=\gamma\,\Phi_{x_i x_j}(\omega,\tau-1)+(1-\gamma)\,X_i(\omega,\tau)\,X_j^{H}(\omega,\tau)$$
where $\gamma$ is a smoothing coefficient, the superscript $H$ is the conjugate transpose symbol, and $X_i$ and $X_j$ represent the sound signals acquired by the first microphone and the second microphone, respectively.
Further, the complex coherence coefficients between each microphone and the other microphones in the array microphone are calculated, summed and averaged to obtain the complex coherence coefficient corresponding to the array microphone, with the specific formula:
$$\Gamma(\omega,\tau)=\frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\Gamma_{ij}(\omega,\tau)$$
wherein $\Gamma(\omega,\tau)$ is the complex coherence coefficient corresponding to the array microphone, and $N$ is the number of microphones in the array microphone.
In one possible implementation manner, the determining module 30 determines, according to the complex coherence coefficient, a scene in which the sound signal is located, specifically:
According to the frequency distribution of the sound signal, dividing the complex coherence coefficient into a low-frequency complex coherence coefficient and a medium-high-frequency complex coherence coefficient based on a preset frequency value;
When the low-frequency complex coherence coefficient is smaller than a first threshold, wind noise exists in the sound signal; otherwise, no wind noise exists in the sound signal;
When the medium-and-high-frequency complex coherence coefficient is larger than a second threshold, human voice exists in the sound signal; otherwise, no human voice exists in the sound signal;
Determining the scene of the sound signal by combining the judgment of whether wind noise and human voice exist in the sound signal: when wind noise exists in the sound signal and no human voice exists, the scene is a pure wind noise scene; when no wind noise exists in the sound signal and the human voice exists, the scene is a pure human voice scene; when wind noise and human voice exist in the sound signal at the same time, the scene is a wind noise human voice mixed scene.
In one possible implementation manner, the noise reduction module 50 uses the wind noise removal filter to perform corresponding wind noise reduction processing according to the scene where the sound signal is located, specifically:
When the scene of the sound signal is the pure wind noise scene, wind noise reduction processing is carried out on the sound signal through the wind noise removal filter;
when the scene of the sound signal is the pure human voice scene, wind noise reduction processing is not performed;
When the scene of the sound signal is the wind noise and human voice mixed scene, firstly, carrying out fixed beam forming processing on the sound signal through a preset beam forming algorithm, and then carrying out wind noise reduction processing on the sound signal subjected to the fixed beam forming processing through the wind noise removal filter.
The embodiment of the invention provides a wind noise reduction system based on an array microphone, which calculates the complex coherence coefficient corresponding to the array microphone from the sound signals, judges the scene of the sound signal according to the complex coherence coefficient, and applies different wind noise reduction processing to different scenes, so that wind noise is suppressed, the clarity of the voice is ensured, voice distortion is avoided, and the call quality of the user is improved. Compared with other single-channel wind noise removal algorithms, the wind noise decision is more accurate and reliable, and the wind noise is filtered more stably and cleanly; in addition, compared with AI-based algorithms, the system requires less computation and a smaller performance cost to achieve the same noise reduction effect.
For a more detailed description of the working principle and processing flow of this embodiment, reference may be made to the description of the first embodiment, without being limited thereto.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.

Claims (10)

1. A wind noise reduction method based on an array microphone, characterized by comprising the following steps:
Acquiring sound signals through an array microphone;
Calculating a complex coherence coefficient corresponding to the array microphone according to the sound signals;
Determining the scene of the sound signal according to the complex coherence coefficient, wherein the scene of the sound signal comprises a pure wind noise scene, a pure human voice scene or a wind noise human voice mixed scene;
constructing a wind noise removal filter according to the complex coherence coefficient;
And according to the scene of the sound signal, performing corresponding wind noise reduction processing by using the wind noise removal filter.
2. The method for reducing wind noise based on array microphones of claim 1, wherein the calculating complex coherence coefficients corresponding to the array microphones according to the sound signals is specifically as follows:
Arbitrarily selecting a first microphone and a second microphone from the array microphone;
Calculating the complex coherence coefficient between the first microphone and the second microphone according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone, and the cross-power spectral density of the first microphone and the second microphone;
And calculating complex coherence coefficients between each microphone and other microphones in the array microphone, and carrying out summation and averaging to obtain the complex coherence coefficient corresponding to the array microphone.
3. The method for reducing wind noise based on array microphones as claimed in claim 2, wherein the complex coherence coefficient between the first microphone and the second microphone is calculated according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone and the cross-power spectral density of the first microphone and the second microphone, with the specific formula:
$$\Gamma_{ij}(\omega,\tau)=\frac{\Phi_{x_i x_j}(\omega,\tau)}{\sqrt{\Phi_{x_i x_i}(\omega,\tau)\,\Phi_{x_j x_j}(\omega,\tau)}}$$
wherein $\Gamma_{ij}(\omega,\tau)$ is the complex coherence coefficient between the first microphone and the second microphone, $\Phi_{x_i x_j}(\omega,\tau)$ is the cross-power spectral density of the first microphone and the second microphone, $\Phi_{x_i x_i}(\omega,\tau)$ and $\Phi_{x_j x_j}(\omega,\tau)$ are the self-power spectral densities of the first microphone and the second microphone respectively, $x_i$ and $x_j$ represent the first microphone and the second microphone respectively, $\omega$ represents a frequency domain unit (frequency bin), and $\tau$ represents a time frame;
the cross-power spectral density of the first microphone and the second microphone is calculated by smoothing between time frames:
$$\Phi_{x_i x_j}(\omega,\tau)=\gamma\,\Phi_{x_i x_j}(\omega,\tau-1)+(1-\gamma)\,X_i(\omega,\tau)\,X_j^{H}(\omega,\tau)$$
where $\gamma$ is a smoothing coefficient, the superscript $H$ is the conjugate transpose symbol, and $X_i$ and $X_j$ represent the sound signals acquired by the first microphone and the second microphone, respectively.
4. The method for reducing wind noise based on array microphones of claim 3, wherein the complex coherence coefficients between each microphone and the other microphones in the array microphone are calculated, summed and averaged to obtain the complex coherence coefficient corresponding to the array microphone, with the specific formula:
$$\Gamma(\omega,\tau)=\frac{2}{N(N-1)}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}\Gamma_{ij}(\omega,\tau)$$
wherein $\Gamma(\omega,\tau)$ is the complex coherence coefficient corresponding to the array microphone, and $N$ is the number of microphones in the array microphone.
5. The method for reducing wind noise based on array microphone as defined in claim 1, wherein the determining the scene of the sound signal according to the complex coherence coefficient is specifically:
According to the frequency distribution of the sound signal, dividing the complex coherence coefficient into a low-frequency complex coherence coefficient and a medium-high-frequency complex coherence coefficient based on a preset frequency value;
When the low-frequency complex coherence coefficient is smaller than a first threshold, wind noise exists in the sound signal; otherwise, no wind noise exists in the sound signal;
When the medium-and-high-frequency complex coherence coefficient is larger than a second threshold, human voice exists in the sound signal; otherwise, no human voice exists in the sound signal;
Determining the scene of the sound signal by combining the judgment of whether wind noise and human voice exist in the sound signal: when wind noise exists in the sound signal and no human voice exists, the scene is a pure wind noise scene; when no wind noise exists in the sound signal and the human voice exists, the scene is a pure human voice scene; when wind noise and human voice exist in the sound signal at the same time, the scene is a wind noise human voice mixed scene.
6. The method for reducing wind noise based on an array microphone as claimed in claim 1, wherein said using said wind noise removal filter to perform corresponding wind noise reduction processing according to the scene of said sound signal comprises:
When the scene of the sound signal is the pure wind noise scene, wind noise reduction processing is carried out on the sound signal through the wind noise removal filter;
when the scene of the sound signal is the pure human voice scene, wind noise reduction processing is not performed;
When the scene of the sound signal is the wind noise and human voice mixed scene, firstly, carrying out fixed beam forming processing on the sound signal through a preset beam forming algorithm, and then carrying out wind noise reduction processing on the sound signal subjected to the fixed beam forming processing through the wind noise removal filter.
7. A wind noise reduction system based on an array microphone, characterized by comprising an acquisition module, a calculation module, a judgment module, a construction module and a noise reduction module;
the acquisition module is used for acquiring sound signals through the array microphone;
The calculation module is used for calculating complex coherence coefficients corresponding to the array microphones according to the sound signals;
the judging module is used for determining the scene of the sound signal according to the complex coherence coefficient, wherein the scene of the sound signal comprises a pure wind noise scene, a pure human voice scene or a wind noise human voice mixed scene;
The construction module is used for constructing a wind noise removal filter according to the complex coherence coefficient;
The noise reduction module is used for carrying out corresponding wind noise reduction processing by using the wind noise removal filter according to the scene where the sound signal is located.
8. The wind noise reduction system based on an array microphone according to claim 7, wherein the calculating module calculates complex coherence coefficients corresponding to the array microphone according to the sound signal, specifically:
Arbitrarily selecting a first microphone and a second microphone from the array microphone;
Calculating the complex coherence coefficient between the first microphone and the second microphone according to the self-power spectral density of the first microphone, the self-power spectral density of the second microphone, and the cross-power spectral density of the first microphone and the second microphone;
And calculating complex coherence coefficients between each microphone and other microphones in the array microphone, and carrying out summation and averaging to obtain the complex coherence coefficient corresponding to the array microphone.
9. The wind noise reduction system based on the array microphone as claimed in claim 7, wherein the determining module determines a scene in which the sound signal is located according to the complex coherence coefficient, specifically:
According to the frequency distribution of the sound signal, dividing the complex coherence coefficient into a low-frequency complex coherence coefficient and a medium-high-frequency complex coherence coefficient based on a preset frequency value;
When the low-frequency complex coherence coefficient is smaller than a first threshold, wind noise exists in the sound signal; otherwise, no wind noise exists in the sound signal;
When the medium-and-high-frequency complex coherence coefficient is larger than a second threshold, human voice exists in the sound signal; otherwise, no human voice exists in the sound signal;
Determining the scene of the sound signal by combining the judgment of whether wind noise and human voice exist in the sound signal: when wind noise exists in the sound signal and no human voice exists, the scene is a pure wind noise scene; when no wind noise exists in the sound signal and the human voice exists, the scene is a pure human voice scene; when wind noise and human voice exist in the sound signal at the same time, the scene is a wind noise human voice mixed scene.
10. The system for reducing wind noise based on array microphones of claim 7, wherein the noise reduction module uses the wind noise removal filter to perform corresponding wind noise reduction according to a scene where the sound signal is located, specifically:
When the scene of the sound signal is the pure wind noise scene, wind noise reduction processing is carried out on the sound signal through the wind noise removal filter;
when the scene of the sound signal is the pure human voice scene, wind noise reduction processing is not performed;
When the scene of the sound signal is the wind noise and human voice mixed scene, firstly, carrying out fixed beam forming processing on the sound signal through a preset beam forming algorithm, and then carrying out wind noise reduction processing on the sound signal subjected to the fixed beam forming processing through the wind noise removal filter.
CN202410359407.7A 2024-03-27 2024-03-27 Wind noise reduction method and system based on array microphone Pending CN118250592A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410359407.7A CN118250592A (en) 2024-03-27 2024-03-27 Wind noise reduction method and system based on array microphone

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410359407.7A CN118250592A (en) 2024-03-27 2024-03-27 Wind noise reduction method and system based on array microphone

Publications (1)

Publication Number Publication Date
CN118250592A true CN118250592A (en) 2024-06-25

Family

ID=91554344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410359407.7A Pending CN118250592A (en) 2024-03-27 2024-03-27 Wind noise reduction method and system based on array microphone

Country Status (1)

Country Link
CN (1) CN118250592A (en)

Similar Documents

Publication Publication Date Title
CN100524465C (en) A method and device for noise elimination
US8238569B2 (en) Method, medium, and apparatus for extracting target sound from mixed sound
US20120082322A1 (en) Sound scene manipulation
KR20040044982A (en) Selective sound enhancement
JP4521549B2 (en) A method for separating a plurality of sound sources in the vertical and horizontal directions, and a system therefor
CN107018470A (en) A kind of voice recording method and system based on annular microphone array
KR20090037845A (en) Method and apparatus for extracting the target sound signal from the mixed sound
WO2023108864A1 (en) Regional pickup method and system for miniature microphone array device
Hosseini et al. Time difference of arrival estimation of sound source using cross correlation and modified maximum likelihood weighting function
CN113257270B (en) Multi-channel voice enhancement method based on reference microphone optimization
CN105957536B (en) Based on channel degree of polymerization frequency domain echo cancel method
CN113223552B (en) Speech enhancement method, device, apparatus, storage medium, and program
CN112802490B (en) Beam forming method and device based on microphone array
CN110310658B (en) Voice separation method based on voice signal processing
Korhonen Wind noise management in hearing aids
CN110858485B (en) Voice enhancement method, device, equipment and storage medium
CN113763984B (en) Parameterized noise elimination system for distributed multi-speaker
CN118250592A (en) Wind noise reduction method and system based on array microphone
CN108735228B (en) Voice beam forming method and system
CN116106826A (en) Sound source positioning method, related device and medium
CN113611319B (en) Wind noise suppression method, device, equipment and system based on voice component
CN114724574A (en) Double-microphone noise reduction method with adjustable expected sound source direction
CN113782046A (en) Microphone array pickup method and system for remote speech recognition
Bagekar et al. Dual channel coherence based speech enhancement with wavelet denoising
Bai et al. Kalman filter-based microphone array signal processing using the equivalent source model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination