CN112309417B - Method, device, system and readable medium for processing audio signal with wind noise suppression - Google Patents


Info

Publication number
CN112309417B
CN112309417B (application CN202011141705.7A)
Authority
CN
China
Prior art keywords
wind noise
domain signal
frequency domain
signal
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011141705.7A
Other languages
Chinese (zh)
Other versions
CN112309417A (en)
Inventor
许云峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lusheng Technology Co ltd
Original Assignee
Lusheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lusheng Technology Co ltd filed Critical Lusheng Technology Co ltd
Priority to CN202011141705.7A priority Critical patent/CN112309417B/en
Publication of CN112309417A publication Critical patent/CN112309417A/en
Application granted granted Critical
Publication of CN112309417B publication Critical patent/CN112309417B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 Noise filtering
    • G10L 21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L 21/0232 Processing in the frequency domain

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The application provides an audio signal processing method, device and system for wind noise suppression, a computer readable medium, and a training method for an artificial-neural-network-based wind noise attenuation prediction model. The audio signal processing method for wind noise suppression comprises the following steps: converting an original time domain signal of the input audio into an initial frequency domain signal; predicting a predicted wind noise attenuation gain from the initial frequency domain signal using an artificial-neural-network-based wind noise attenuation prediction model; determining the final predicted wind noise attenuation gain of the current frequency point signal of the current frame according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame; performing wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain; and converting the wind-noise-suppressed frequency domain signal into a wind-noise-suppressed time domain signal. Because the method eliminates wind noise using an artificial-neural-network-based wind noise attenuation prediction model, the wind noise suppression effect can be greatly improved.

Description

Method, device, system and readable medium for processing audio signal with wind noise suppression
Technical Field
The present application relates generally to the field of signal processing, and in particular, to a method, apparatus, system, and computer readable medium for processing an audio signal with wind noise suppression, and a training method for a wind noise attenuation prediction model based on an artificial neural network.
Background
Wind noise is a complex and difficult-to-predict noise that is produced when wind strikes the microphone: the turbulence formed on and near the microphone surface is picked up by the microphone. Many outdoor sound pickup devices suffer from wind noise, for example hearing aids, communication headsets, mobile phones, and handheld video equipment. Especially when the outdoor wind is strong, wind noise interferes heavily with these devices and results in very poor sound quality.
The spectral characteristics and magnitude of wind noise depend on the wind speed, the microphone position, and the structure of the sound pickup device. In general, the energy of the wind noise spectrum is concentrated in the low-frequency band and shifts toward higher frequencies as the wind speed increases. Because the formation of wind noise is complex and difficult to predict, how to handle wind noise is a problem in the art.
In general, conventional wind noise processing mainly includes single-microphone and multi-microphone approaches. The basic framework of single-microphone wind noise processing is usually as follows: wind noise detection is performed first; if wind noise is detected, the power spectral density (PSD) of the current wind noise is estimated; the attenuation gain for wind noise suppression is then calculated from the power spectral density, and the noisy speech is attenuated to obtain a clean speech signal. There are various single-channel wind noise detection methods, for example methods based on the spectral centroid or on negative slope fitting (NSF). Single-channel power spectral density estimation methods are also varied, for example methods based on template matching or on negative slope fitting.
Compared with the single-microphone scheme, the conventional dual-microphone noise reduction method can effectively exploit the correlation between the two microphones, so its performance is better than that of the single-microphone scheme. The dual-microphone noise reduction scheme uses the fact that the wind noise picked up by the two microphones is only weakly correlated to detect wind noise and estimate its power spectral density, and thereby further suppresses the wind noise. When wind noise is present, the coherence coefficient between the two microphones is very small, whereas it is larger when speech is present.
Dual-microphone wind noise suppression performs better than single-microphone suppression, but a single microphone is cheaper and more convenient to use. In either case, these conventional algorithms still have the following problems: because the wind noise power spectrum is difficult to estimate accurately, wind noise suppression leaves large residues; and wind noise is highly non-stationary in the time domain and therefore hard to track, which in turn may cause significant distortion and even discontinuity in the near-end speech. Therefore, how to better perform wind noise suppression is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The technical problem to be solved by the application is to provide an audio signal processing method, device and system for wind noise suppression, a computer readable medium and a training method for a wind noise attenuation prediction model based on an artificial neural network, which can better perform wind noise suppression.
In order to solve the above technical problems, the present application provides an audio signal processing method for wind noise suppression, including: converting an original time domain signal of the input audio into an initial frequency domain signal; predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network; determining a final predicted wind noise attenuation gain of a current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of a neighboring frequency point signal of the current frame of the initial frequency domain signal; wind noise suppression is carried out on the initial frequency domain signal based on the final prediction wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal, and a frequency domain signal after wind noise suppression is obtained; and converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression.
In an embodiment of the present application, the converting of the original time domain signal of the input audio into the initial frequency domain signal is performed using a subband filtering module based on a weighted overlap-add analysis filter; and the converting of the wind-noise-suppressed frequency domain signal into the wind-noise-suppressed time domain signal is performed using a synthesis processing module based on a weighted overlap-add synthesis filter.
In an embodiment of the present application, the pre-trained wind noise attenuation prediction model based on the artificial neural network is a pre-trained wind noise attenuation prediction model based on a gated recurrent unit (GRU) neural network.
In an embodiment of the present application, the determining, according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal includes: calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; and determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the calculating of the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal is performed in the following manner:
GainHarm(t,k) = IRM′(t,k) / max(IRM′(t,k−2), IRM′(t,k−1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2))
wherein t is the frame index, k is the frequency point index, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and the max(·) term is the maximum of the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal is performed in the following manner:
GainF(t,k)=IRM’(t,k)*GainHarm(t,k)
wherein t is the frame index, k is the frequency point index, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the method further comprises: calculating noise suppression attenuation gain according to the initial frequency domain signal; and taking the minimum value of the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal as a new final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
In order to solve the above technical problem, the present application further provides an audio signal processing method for wind noise suppression, including: converting an original time domain signal of the input audio into an initial frequency domain signal; predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network; detecting whether wind noise protection is needed to be carried out on the current frequency point signal of the current frame of the initial frequency domain signal according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal; if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, determining the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, obtaining a frequency domain signal after wind noise suppression, and converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression; and if the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection, wind noise suppression is not carried out on the current frequency point signal of the current frame of the initial frequency domain signal, and the initial frequency domain signal is converted into a corresponding time domain signal.
In an embodiment of the present application, the detecting whether the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal includes: performing spectrum centroid estimation on the current frequency point signal of the current frame of the initial frequency domain signal to obtain a first centroid; performing spectrum centroid estimation according to the product of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and obtaining a second centroid; calculating a difference between the first centroid and the second centroid, and judging whether the difference is smaller than a first threshold; and if the difference value is smaller than the first threshold, determining that wind noise protection is not needed for the current frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the detecting whether the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal further includes: if the difference value is not smaller than the first threshold, judging whether a predicted wind noise attenuation gain amount is smaller than a second threshold, wherein the predicted wind noise attenuation gain amount is an average value of the predicted wind noise attenuation gains of all frequency point signals of a current frame of the initial frequency domain signal; if the predicted wind noise attenuation gain is smaller than the second threshold, determining that wind noise protection is required for the current frequency point signal of the current frame of the initial frequency domain signal; and if the predicted wind noise attenuation gain is not smaller than the second threshold, determining that wind noise protection is not needed for the current frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the converting of the original time domain signal of the input audio into the initial frequency domain signal is performed using a subband filtering module based on a weighted overlap-add analysis filter; and the converting of the wind-noise-suppressed frequency domain signal into the wind-noise-suppressed time domain signal is performed using a synthesis processing module based on a weighted overlap-add synthesis filter.
In an embodiment of the present application, the pre-trained wind noise attenuation prediction model based on the artificial neural network is a pre-trained wind noise attenuation prediction model based on a gated recurrent unit (GRU) neural network.
In an embodiment of the present application, the determining, according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal includes: calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; and determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the calculating of the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal is performed in the following manner:
GainHarm(t,k) = IRM′(t,k) / max(IRM′(t,k−2), IRM′(t,k−1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2))
wherein t is the frame index, k is the frequency point index, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and the max(·) term is the maximum of the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal is performed in the following manner:
GainF(t,k)=IRM’(t,k)*GainHarm(t,k)
wherein t is the frame index, k is the frequency point index, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the method further comprises: calculating noise suppression attenuation gain according to the initial frequency domain signal; and taking the minimum value of the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal as a new final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
In order to solve the above technical problem, the present application further provides an audio signal processing device for wind noise suppression, including: the first conversion module is used for converting an original time domain signal of input audio into an initial frequency domain signal; the wind noise attenuation prediction module is used for predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network; the attenuation gain fusion module is used for determining the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the prediction wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; the wind noise suppression module is used for performing wind noise suppression on the initial frequency domain signal based on the final prediction wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal and obtaining a frequency domain signal after wind noise suppression; and the second conversion module is used for converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression.
In order to solve the above technical problem, the present application further provides an audio signal processing device for wind noise suppression, including: the first conversion module is used for converting an original time domain signal of input audio into an initial frequency domain signal; the wind noise attenuation prediction module is used for predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network; the wind noise protection detection module is used for detecting whether wind noise protection is needed to be carried out on the current frequency point signal of the current frame of the initial frequency domain signal according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal; the wind noise suppression module is used for determining a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal if wind noise protection is not needed, performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, obtaining a frequency domain signal after wind noise suppression, and converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression; and the wind noise protection module is used for converting the initial frequency domain signal into a corresponding time domain signal without wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal if the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection.
In order to solve the above technical problem, the present application further provides an audio signal processing system for wind noise suppression, including: a memory for storing instructions executable by the processor; and a processor for executing the instructions to implement any of the methods described above.
To solve the above technical problem, the present application also provides a computer readable medium storing computer program code which, when executed by a processor, implements any of the methods described above.
In order to solve the technical problem, the application also provides a training method of the wind noise attenuation prediction model based on the artificial neural network, which comprises the following steps: acquiring voice data and wind noise data; mixing the voice data and the wind noise data to generate a mixed sound signal; calculating a learning target according to the voice data and the wind noise data; and taking the mixed sound signal as input of an artificial neural network-based wind noise attenuation prediction model, taking the learning target as expected output of the artificial neural network-based wind noise attenuation prediction model, training the artificial neural network-based wind noise attenuation prediction model, and obtaining a trained artificial neural network-based wind noise attenuation prediction model.
In an embodiment of the present application, the calculating the learning target according to the voice data and the wind noise data is calculated in the following manner:
IRM(t,k) = (S(t,k)^2 / (S(t,k)^2 + N(t,k)^2))^β
wherein IRM(t,k) is the learning target; t is the frame index; k is the frequency point index; S is the voice data; N is the wind noise data; and β is an exponential factor.
In an embodiment of the present application, the cost function adopted by the wind noise attenuation prediction model based on the artificial neural network is a minimum mean square error.
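For illustration only, the sketch below shows how such a learning target and the minimum-mean-square-error cost could be computed, assuming a PyTorch implementation and the IRM-style target given above; the tensor shapes, the β value and the stand-in GRU predictor are assumptions for this example and are not taken from the application.

```python
# Illustrative sketch only: assumes PyTorch and an IRM-style target; the shapes,
# beta and the stand-in GRU predictor are example choices, not the patent's model.
import torch
import torch.nn.functional as F

def irm_target(S, N, beta=0.5):
    """Learning target IRM(t, k) from clean speech S and wind noise N (magnitude spectrograms)."""
    return (S**2 / (S**2 + N**2 + 1e-12)).pow(beta)

S = torch.rand(1, 100, 257)              # placeholder clean-speech magnitudes (batch, frames, bins)
N = torch.rand(1, 100, 257)              # placeholder wind-noise magnitudes
mix_feature = torch.log(S + N + 1e-7)    # mixed-signal feature (assumed log spectrum)
target = irm_target(S, N)

predictor = torch.nn.GRU(257, 257, batch_first=True)     # stand-in predictor
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)

pred, _ = predictor(mix_feature)
loss = F.mse_loss(torch.sigmoid(pred), target)           # minimum mean square error cost
loss.backward()
optimizer.step()
```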
Compared with the prior art, the audio signal processing method, device, system and computer readable medium for wind noise suppression provided by the application use an artificial-neural-network-based wind noise attenuation prediction model to predict the predicted wind noise attenuation gain of the initial frequency domain signal and then eliminate the wind noise. This yields a better amount of detailed wind noise suppression, reduces wind noise residue, and greatly improves the wind noise suppression effect.
According to the audio signal processing method, device, system and computer readable medium for wind noise suppression that detect whether wind noise protection is needed, when wind noise protection is not needed the wind noise is eliminated using the wind noise attenuation gain predicted by the artificial-neural-network-based wind noise attenuation prediction model, which yields a better amount of detailed wind noise suppression, reduces wind noise residue, and greatly improves the wind noise suppression effect; when wind noise protection is needed, no wind noise suppression is applied to the signal, which greatly reduces the situation in which, when no wind noise is present, the useful signal is seriously damaged because the wind noise attenuation prediction model does not match the signal.
According to the training method of the artificial-neural-network-based wind noise attenuation prediction model provided by the application, such a model can be trained; when the trained model is used for wind-noise-suppressed audio signal processing, the amount of detailed wind noise suppression is better, the wind noise residue is smaller, and the wind noise suppression effect is better.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the accompanying drawings:
fig. 1 is a schematic flow chart of an audio signal processing method of wind noise suppression according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of an audio signal processing method of wind noise suppression according to another embodiment of the present application.
Fig. 3 is a schematic block diagram of an audio signal processing apparatus for wind noise suppression according to an embodiment of the present application.
Fig. 4 is a schematic block diagram of an audio signal processing apparatus for wind noise suppression according to another embodiment of the present application.
Fig. 5 is a schematic flow chart of a training method of an artificial neural network-based wind noise attenuation prediction model according to an embodiment of the present application.
Fig. 6 is a schematic block diagram of an audio signal processing system for wind noise suppression, or of a training system of an artificial neural network-based wind noise attenuation prediction model, according to an embodiment of the present application.
Fig. 7 is a schematic flow chart illustrating a method of implementing step 203 of Fig. 2 according to an embodiment of the present application.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and it is obvious to those skilled in the art that the present application may be applied to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
As used in this application and in the claims, the terms "a," "an," "the," and/or "said" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
The relative arrangement of the components and steps, the numerical expressions and the numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise. Meanwhile, it should be understood that, for convenience of description, the sizes of the respective parts shown in the drawings are not drawn to scale. Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate. In all examples shown and discussed herein, any specific values should be construed as merely illustrative and not limiting; other examples of the exemplary embodiments may therefore have different values. It should be noted that like reference numerals and letters denote like items in the following figures, so once an item is defined in one figure, it need not be discussed further in subsequent figures.
Flowcharts are used in this application to describe the operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed precisely in order. Rather, the various steps may be processed in reverse order or simultaneously, and other operations may be added to or removed from these processes.
The application provides an audio signal processing method for wind noise suppression. Fig. 1 is a flowchart of an audio signal processing method of wind noise suppression shown according to the present embodiment. Referring to fig. 1, the method for processing an audio signal for wind noise suppression of the present embodiment includes:
step 101, converting an original time domain signal of input audio into an initial frequency domain signal;
step 102, predicting a predicted wind noise attenuation gain according to an initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
step 103, determining the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal;
step 104, performing wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal and obtaining a frequency domain signal after wind noise suppression; and
step 105, converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression.
The following describes in detail the respective steps 101 to 105 of the audio signal processing method of wind noise suppression of the present embodiment:
In step 101, the wind noise suppressed audio signal processing system converts an original time domain signal of the input audio into an initial frequency domain signal. In an embodiment of the present application, converting the original time domain signal of the input audio into the original frequency domain signal may be performed using a Weighted Overlap-add (WOLA) based subband filtering module. The WOLA-based sub-band filtering module can perform sub-band filtering on an original time domain signal of input audio to obtain a multi-channel frequency domain signal of a complex domain.
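As a rough illustration of this analysis step and the later synthesis step, the sketch below uses SciPy's STFT with overlap-add reconstruction as a stand-in for the WOLA filter bank named in the text; the sample rate, frame length and hop size are assumed values.

```python
# Illustrative stand-in: an STFT/ISTFT pair with overlap-add instead of the WOLA
# subband filter bank described in the text; FS, FRAME and HOP are assumed values.
import numpy as np
from scipy.signal import stft, istft

FS = 16000        # assumed sample rate
FRAME = 512       # assumed analysis window length
HOP = 256         # assumed hop size (50% overlap)

def analyze(x):
    """Original time domain signal -> complex frequency domain frames, shape (frames, bins)."""
    _, _, spec = stft(x, fs=FS, nperseg=FRAME, noverlap=FRAME - HOP)
    return spec.T

def synthesize(spec):
    """Complex frequency domain frames -> time domain signal."""
    _, x = istft(spec.T, fs=FS, nperseg=FRAME, noverlap=FRAME - HOP)
    return x

x = np.random.randn(FS)        # 1 second of placeholder input audio
spec = analyze(x)              # initial frequency domain signal
y = synthesize(spec)           # back to the time domain
```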
In step 102, the system predicts a predicted wind noise attenuation gain from the initial frequency domain signal using a pre-trained artificial neural network-based wind noise attenuation prediction model. In an embodiment of the present application, the pre-trained artificial neural network-based wind noise attenuation prediction model may be a pre-trained gated recurrent unit (GRU) neural network-based wind noise attenuation prediction model. In an embodiment of the present application, the system may take the logarithm of the magnitude of the initial frequency domain signal to generate a log spectrum, and use the log spectrum of the initial frequency domain signal as the input of the pre-trained artificial neural network-based wind noise attenuation prediction model. The GRU is a type of recurrent neural network (RNN) and has the advantage of a smaller model size and lower computational cost.
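A minimal sketch of such a predictor is shown below, assuming a PyTorch GRU that maps the per-frame log spectrum to per-frequency-point gains in [0, 1]; the number of bins, the hidden size and the depth are illustrative assumptions, not values from the application.

```python
# Minimal sketch of a GRU-based wind noise attenuation predictor (assumed PyTorch
# implementation); n_bins, hidden size and depth are illustrative, not the patent's.
import torch
import torch.nn as nn

class WindGainGRU(nn.Module):
    def __init__(self, n_bins=257, hidden=128, layers=2):
        super().__init__()
        self.gru = nn.GRU(n_bins, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, log_spec):                 # log_spec: (batch, frames, bins)
        h, _ = self.gru(log_spec)
        return torch.sigmoid(self.out(h))        # predicted gains IRM'(t, k) in [0, 1]

model = WindGainGRU()
mag = torch.abs(torch.randn(1, 100, 257))        # placeholder magnitude spectrum
irm_pred = model(torch.log(mag + 1e-7))          # (1, 100, 257) predicted gains
```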
In step 103, the system determines a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, step 103 may include the following steps 1031-1032:
in step 1031, the system calculates a harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal. In an embodiment of the present application, step 1031 may be performed in the following manner:
GainHarm(t,k) = IRM′(t,k) / max(IRM′(t,k−2), IRM′(t,k−1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2))
wherein t is the frame index, k is the frequency point index, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and the max(·) term is the maximum of the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal. In this embodiment the adjacent frequency point signals of the current frame of the initial frequency domain signal comprise the four adjacent frequency points k−2, k−1, k+1 and k+2; the user can determine the number of adjacent frequency points according to actual needs, which is not limited in the present application. The above method of calculating the harmonic enhancement gain may be referred to as a local normalization method. The harmonic enhancement gain further suppresses the noise between the speech harmonics while protecting the speech harmonics of the current frame signal, and is thus able to enhance the speech signal.
In step 1032, the system determines a final predicted wind noise attenuation gain based on the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
In one embodiment of the present application, step 1032 may be performed in the following manner:
GainF(t,k)=IRM’(t,k)*GainHarm(t,k)
where t is the frame index, k is the frequency point index, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal. The final predicted wind noise attenuation gain may be used to suppress the wind noise in the initial frequency domain signal and to enhance the speech.
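The NumPy sketch below illustrates steps 1031 and 1032 for one frame, using the ±2-bin neighbourhood from the example above; the small epsilon guarding the division is an added numerical safeguard and not part of the described method.

```python
# Sketch of the gain fusion (local normalization, then multiplication) for one frame;
# the epsilon in the division is an added numerical safeguard, not from the text.
import numpy as np

def fuse_gains(irm):
    """irm: (bins,) predicted wind noise attenuation gains IRM'(t, k) of the current frame."""
    gain_f = np.empty_like(irm)
    for k in range(len(irm)):
        lo, hi = max(k - 2, 0), min(k + 3, len(irm))
        local_max = np.max(irm[lo:hi])                  # max over bins k-2 .. k+2
        gain_harm = irm[k] / max(local_max, 1e-12)      # GainHarm(t, k)
        gain_f[k] = irm[k] * gain_harm                  # GainF(t, k)
    return gain_f

irm_frame = np.clip(np.random.rand(257), 0.0, 1.0)      # placeholder predicted gains
final_gain = fuse_gains(irm_frame)
```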
In step 104, the system performs wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal, and obtains a frequency domain signal after wind noise suppression. Wind noise suppression multiplies each frequency point signal of each frame of the initial frequency domain signal by its corresponding final predicted wind noise attenuation gain; the resulting product is the wind-noise-suppressed frequency domain signal.
In step 105, the system converts the wind-noise-suppressed frequency domain signal into a wind-noise-suppressed time domain signal. In an embodiment of the present application, step 105 may be performed using a synthesis processing module based on a weighted overlap-add synthesis filter.
In an embodiment of the present application, the method for processing an audio signal with wind noise suppression may further include the following steps 106 to 107, and the steps 106 to 107 may be performed between the step 103 and the step 104:
step 106, the system calculates noise suppression attenuation gain according to the initial frequency domain signal. The system may use a generic noise suppressor to calculate the noise suppression attenuation gain, which is not limited in this application.
In step 107, the system takes the minimum value of the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal as the new final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal. In one example, the final predicted wind noise attenuation gain used in step 107 may be the final predicted wind noise attenuation gain resulting from steps 1031 and 1032 described above, that is, step 107 may be performed in the following manner:
GainF(t,k)=min(IRM’(t,k)*GainHarm(t,k),GainN(t,k))
where t is the frame index, k is the frequency point index, GainF(t,k) is the new final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainN(t,k) is the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
By taking the minimum of the noise suppression attenuation gain and the final predicted wind noise attenuation gain as the new final predicted wind noise attenuation gain in steps 106 to 107, the wind noise suppression effect obtained when wind noise suppression is performed using the final predicted wind noise attenuation gain can be further improved.
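A short sketch of steps 106 to 107, followed by the multiplication of step 104, is given below; the conventional noise suppressor is left unspecified in the text, so GainN is only a placeholder here.

```python
# Sketch of steps 106-107 plus the multiplication of step 104; GainN is a placeholder
# because the text does not fix a particular conventional noise suppressor.
import numpy as np

spec_frame = np.random.randn(257) + 1j * np.random.randn(257)  # one frame of the initial spectrum
gain_wind = np.clip(np.random.rand(257), 0.0, 1.0)             # GainF from the fusion step
gain_noise = np.clip(np.random.rand(257), 0.0, 1.0)            # GainN from a generic noise suppressor

gain_final = np.minimum(gain_wind, gain_noise)                 # new final predicted gain
suppressed_frame = spec_frame * gain_final                     # wind-noise-suppressed frequency domain frame
```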
In summary, the audio signal processing method for wind noise suppression of this embodiment uses the artificial-neural-network-based wind noise attenuation prediction model to predict the predicted wind noise attenuation gain of the initial frequency domain signal and then eliminates the wind noise, which yields a better amount of detailed wind noise suppression, reduces wind noise residue, and greatly improves the wind noise suppression effect.
The application also provides another audio signal processing method for wind noise suppression. Fig. 2 is a flowchart of an audio signal processing method of wind noise suppression shown according to the present embodiment. Referring to fig. 2, the method for processing an audio signal for wind noise suppression of the present embodiment includes:
Step 201, an audio signal processing system with wind noise suppression converts an original time domain signal of input audio into an initial frequency domain signal;
step 202, predicting a predicted wind noise attenuation gain according to an initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
step 203, the system detects whether the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, the step 204 is entered, and if the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection, the step 205 is entered;
step 204, the system determines the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the prediction wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, performs wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, obtains a frequency domain signal after wind noise suppression, and converts the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression; and
In step 205, the system does not perform wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal, and converts the initial frequency domain signal into a corresponding time domain signal.
The following describes in detail the respective steps 201 to 205 of the audio signal processing method of wind noise suppression of the present embodiment:
steps 201 and 202 may refer to steps 101 and 102 in the previous embodiments, and the description will not be repeated here.
In step 203, the system detects, according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, whether the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection. This detects whether the current signal actually contains wind noise, and greatly reduces the situation in which, when no wind noise is present, the signal is strongly suppressed due to a prediction error of the wind noise attenuation prediction model, causing great damage to the useful signal.
In an embodiment of the present application, as shown in fig. 7, detecting whether wind noise protection is required for the current frequency point signal of the current frame of the initial frequency domain signal according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal in step 203 may include the following steps 2031-2035:
Step 2031, the system performs spectrum centroid estimation on the current frequency point signal of the current frame of the initial frequency domain signal and obtains a first centroid;
step 2032, the system performs spectrum centroid estimation according to the product of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and obtains a second centroid;
step 2033, the system calculates a difference between the first centroid and the second centroid;
step 2034, the system determines whether the difference is less than a first threshold; and
in step 2035, if the difference is smaller than the first threshold, the system determines that the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, where the first threshold may be preset by the user according to actual needs, which is not limited in the present application.
In one example, the spectral centroid estimation of steps 2031 and 2032 may be performed in the following manner:
C = Σ_{k=1}^{M} k · |Y(k)| / Σ_{k=1}^{M} |Y(k)|
wherein C is the centroid; M is the number of frequency points; k is the frequency point index; and Y(k) is the spectrum sample at the k-th frequency point.
The centroid measures the centroid of the power spectral density of the current signal. When the current signal contains wind noise, the centroid C is close to the low-frequency part and is therefore smaller; when no wind noise is contained, the centroid C is relatively close to the high-frequency part and is therefore larger. Thus, when the difference is smaller than the first threshold, the wind noise in the current frequency point signal of the current frame of the initial frequency domain signal is small or no wind noise exists.
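A direct NumPy implementation of this centroid is sketched below; weighting by the magnitude of the spectrum samples is an assumption made here, since the text only speaks of spectrum sampling points.

```python
# Spectral centroid of one frame; magnitude weighting is an assumption made here.
import numpy as np

def spectral_centroid(Y):
    """Y: complex (or magnitude) spectrum samples of one frame, bins 1..M."""
    mag = np.abs(Y)
    k = np.arange(1, len(Y) + 1)
    return np.sum(k * mag) / max(np.sum(mag), 1e-12)
```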
In an embodiment of the present application, as shown in fig. 7, in step 203, detecting whether wind noise protection is required for the current frequency point signal of the current frame of the initial frequency domain signal according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal may further include the following steps 2036-2038, and steps 2036-2038 may be performed after the foregoing step 2034:
step 2036, if the difference is not less than the first threshold, the system determines whether the predicted wind noise attenuation gain is less than the second threshold, where the predicted wind noise attenuation gain is an average value of predicted wind noise attenuation gains of all frequency point signals of the current frame of the initial frequency domain signal;
step 2037, if the predicted wind noise attenuation gain is smaller than the second threshold, determining that wind noise protection is required for the current frequency point signal of the current frame of the initial frequency domain signal by the system; and
in step 2038, if the predicted wind noise attenuation gain is not less than the second threshold, the system determines that wind noise protection is not required for the current frequency point signal of the current frame of the initial frequency domain signal.
When the difference is not smaller than the first threshold, the current frequency point signal of the current frame of the initial frequency domain signal contains wind noise or the wind noise is large. In step 2036, the system estimates how strongly the predicted wind noise attenuation gain attenuates the initial frequency domain signal by calculating the predicted wind noise attenuation gain amount; a smaller value indicates that the signal is suppressed more. The second threshold may be preset by the user according to actual needs, which is not limited in this application.
In step 2037, when the predicted wind noise attenuation gain amount is less than the second threshold, the system may indicate that wind noise protection is required for the current frequency point signal of the current frame of the initial frequency domain signal by outputting flag=1.
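Putting steps 2031 to 2038 together, a sketch of the protection test could look as follows; both thresholds are illustrative values (the text leaves them to the user), the centroid helper repeats the previous sketch for self-containment, and the sign convention of the centroid difference is taken literally from the wording "difference between the first centroid and the second centroid".

```python
# Sketch of the wind noise protection test (steps 2031-2038); THRESH_CENTROID and
# THRESH_GAIN are assumed example values, and the difference c1 - c2 follows the
# wording of the text literally rather than a fixed sign convention.
import numpy as np

THRESH_CENTROID = 5.0    # first threshold (assumed)
THRESH_GAIN = 0.3        # second threshold (assumed)

def _centroid(Y):
    mag = np.abs(Y)
    k = np.arange(1, len(Y) + 1)
    return np.sum(k * mag) / max(np.sum(mag), 1e-12)

def needs_wind_protection(spec_frame, irm_frame):
    """spec_frame: complex bins of the current frame; irm_frame: predicted gains IRM'(t, k)."""
    c1 = _centroid(spec_frame)                    # first centroid, raw frame
    c2 = _centroid(spec_frame * irm_frame)        # second centroid, gain-weighted frame
    if c1 - c2 < THRESH_CENTROID:                 # small difference: little or no wind noise
        return False                              # no wind noise protection needed
    mean_gain = np.mean(irm_frame)                # predicted wind noise attenuation gain amount
    return mean_gain < THRESH_GAIN                # heavy overall attenuation -> protect (flag = 1)
```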
Step 204 is performed after the system detects in step 203 that the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection. Because wind noise protection is not required for the current frequency point signal of the current frame of the initial frequency domain signal, the system needs to determine the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal for wind noise suppression. The actions performed by the system in step 204 may be divided into the following steps 2041-2043:
step 2041, determining a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal by the system;
step 2042, performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and obtaining a frequency domain signal after wind noise suppression; and
Step 2043, converting the wind noise suppressed frequency domain signal into a wind noise suppressed time domain signal.
Steps 2041-2043 described above may be referred to correspondingly to steps 103-105 in the previous embodiments and will not be repeated here.
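For concreteness, steps 2041-2042 can be sketched as below, following the harmonic enhancement relation described for steps 1031-1032 and in the claims (GainHarm as the ratio of the current bin's predicted gain to the maximum over bins k−2 to k+2, and GainF = IRM′ · GainHarm); the clipping at the spectrum edges and the epsilon guard are illustrative assumptions.

```python
import numpy as np

def fuse_attenuation_gain(irm_pred):
    """Gain fusion sketch for step 2041 (harmonic enhancement).

    irm_pred : 1-D array of predicted wind noise attenuation gains IRM'(t, k)
               for all frequency bins k of the current frame.
    Returns the final predicted wind noise attenuation gain GainF(t, k).
    """
    n_bins = len(irm_pred)
    gain_final = np.empty(n_bins)
    for k in range(n_bins):
        lo, hi = max(0, k - 2), min(n_bins, k + 3)   # neighbours k-2..k+2, clipped
        gain_harm = irm_pred[k] / (np.max(irm_pred[lo:hi]) + 1e-12)
        gain_final[k] = irm_pred[k] * gain_harm      # GainF = IRM' * GainHarm
    return gain_final
```

Multiplying the frame spectrum by gain_final bin by bin then yields the wind noise suppressed frequency domain signal of step 2042, which the synthesis stage of step 2043 converts back to the time domain.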
Step 205 is performed after the system detects in step 203 that wind noise protection is required for the current frequency point signal of the current frame of the initial frequency domain signal. Because wind noise protection is needed, the system does not perform wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal and converts the initial frequency domain signal into the corresponding time domain signal.
In an embodiment of the present application, the method for processing an audio signal with wind noise suppression may further include steps 206-207, which may be performed between step 2041 and step 2042:
step 206, calculating noise suppression attenuation gain according to the initial frequency domain signal; and
step 207, taking the minimum value of the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal as the new final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
The above steps 206-207 may refer to steps 106-107 in the previous embodiments, and will not be repeated here.
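The combination in steps 206-207 reduces to a per-bin minimum. A minimal sketch follows; how the conventional noise suppression attenuation gain of step 206 is computed is not detailed here, so it is passed in as an argument.

```python
import numpy as np

def combine_with_noise_suppression(final_gain, noise_suppression_gain):
    """Steps 206-207 sketch: the per-bin minimum of the noise suppression
    attenuation gain and the final predicted wind noise attenuation gain
    becomes the new final predicted wind noise attenuation gain."""
    return np.minimum(final_gain, noise_suppression_gain)
```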
In summary, in the audio signal processing method with wind noise suppression of this embodiment, the system detects whether a signal needs wind noise protection. When wind noise protection is not needed, wind noise is eliminated using the wind noise attenuation gain predicted for the signal by the wind noise attenuation prediction model based on the artificial neural network, so that the fine-grained suppression of wind noise is better, the wind noise residual is reduced, and the wind noise suppression effect is greatly improved. When wind noise protection is needed, wind noise suppression is not performed on the signal, which greatly reduces the damage to useful signals that a mismatched wind noise attenuation prediction model would otherwise cause when no wind noise is present.
The application also provides an audio signal processing device for wind noise suppression. Fig. 3 is a block diagram of an audio signal processing apparatus of wind noise suppression according to the present embodiment. As shown in fig. 3, the wind noise suppressed audio signal processing apparatus 300 includes a first conversion module 301, a wind noise attenuation prediction module 302, an attenuation gain fusion module 303, a wind noise suppression module 304, and a second conversion module 305.
The first conversion module 301 is configured to convert an original time domain signal of input audio into an initial frequency domain signal.
The wind noise attenuation prediction module 302 is configured to predict a predicted wind noise attenuation gain according to the initial frequency domain signal using a pre-trained wind noise attenuation prediction model based on an artificial neural network.
The attenuation gain fusion module 303 is configured to determine a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal. In an embodiment of the present application, the attenuation gain fusion module 303 may include a harmonic enhancement sub-module 3031 and a gain fusion module 3032. The harmonic enhancement sub-module 3031 is configured to calculate the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; the gain fusion module 3032 is configured to determine the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal. The steps performed by the modules 3031-3032 may refer to the description of steps 1031-1032 in the previous embodiments and are not repeated here.
The wind noise suppression module 304 is configured to perform wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal, and obtain a frequency domain signal after wind noise suppression.
The second conversion module 305 is configured to convert the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression.
The steps performed by the modules 301-305 described above may be referred to accordingly in the description of steps 101-105 in the previous embodiments, and will not be repeated here.
The application also provides another audio signal processing device for wind noise suppression. Fig. 4 is a block diagram of an audio signal processing apparatus of wind noise suppression according to the present embodiment. As shown in fig. 4, the wind noise suppressed audio signal processing apparatus 400 includes a first conversion module 401, a wind noise attenuation prediction module 402, a wind noise protection detection module 403, a wind noise suppression module 404, and a wind noise protection module 405.
The first conversion module 401 is configured to convert an original time domain signal of input audio into an initial frequency domain signal.
The wind noise attenuation prediction module 402 is configured to predict a predicted wind noise attenuation gain from the initial frequency domain signal using a pre-trained artificial neural network-based wind noise attenuation prediction model.
The wind noise protection detection module 403 is configured to detect whether wind noise protection is required for a current frequency point signal of a current frame of the initial frequency domain signal according to a current frequency point signal of the current frame of the initial frequency domain signal and a predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
The wind noise suppression module 404 is configured to, if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, determine a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, perform wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain, obtain a frequency domain signal after wind noise suppression, and convert the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression.
The wind noise protection module 405 is configured to convert an initial frequency domain signal into a corresponding time domain signal without performing wind noise suppression on a current frequency point signal of a current frame of the initial frequency domain signal if the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection.
The steps performed by the modules 401-405 described above may be referred to accordingly in the description of the steps 201-205 in the previous embodiments, and will not be repeated here.
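For orientation, the per-frame flow of modules 401-405 can be sketched as below, reusing the helper functions sketched earlier; as a simplification the protection decision is applied to the whole frame, whereas the text above phrases it per frequency point, and the analysis/synthesis filter bank stages (modules 401 and the final conversion) are omitted.

```python
def process_frame(frame_spectrum, predicted_gain):
    """Per-frame flow of device 400 (sketch): protect or suppress.

    frame_spectrum : complex bins of one frame from the analysis stage
                     (module 401 output)
    predicted_gain : per-bin predicted wind noise attenuation gain (module 402)
    Returns the frequency-domain frame handed to the synthesis stage.
    """
    if needs_wind_noise_protection(frame_spectrum, predicted_gain):  # module 403
        return frame_spectrum               # module 405: no wind noise suppression
    gain_final = fuse_attenuation_gain(predicted_gain)               # module 404
    return frame_spectrum * gain_final      # wind-noise-suppressed frame
```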
The application also provides a training method of the wind noise attenuation prediction model based on the artificial neural network. Fig. 5 is a flowchart showing the training method of the artificial neural network-based wind noise attenuation prediction model according to the present embodiment. Referring to fig. 5, the training method of the wind noise attenuation prediction model based on the artificial neural network of the present embodiment includes steps 501-504, which may be executed by the training system of the wind noise attenuation prediction model based on the artificial neural network:
step 501, obtaining voice data and wind noise data;
step 502, mixing the voice data and the wind noise data to generate a mixed sound signal;
step 503, calculating a learning target according to the voice data and the wind noise data; and
step 504, taking the mixed sound signal as input of a wind noise attenuation prediction model based on the artificial neural network, taking a learning target as expected output of the wind noise attenuation prediction model based on the artificial neural network, training the wind noise attenuation prediction model based on the artificial neural network, and obtaining a trained wind noise attenuation prediction model based on the artificial neural network.
In an embodiment of the present application, the learning target in step 503 may be calculated in the following manner:
IRM(t,k) = [S^2(t,k) / (S^2(t,k) + N^2(t,k))]^β
wherein IRM is a learning target, t is a frame number, k is a frequency point, S is voice data, N is wind noise data, β is an exponential factor for controlling attenuation gain, β may be an empirical value set by a user, and in one example may take a value of 0.5 or 1.
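Assuming the standard ideal-ratio-mask form reconstructed above for the learning target, step 503 can be sketched as follows; the array shapes and the small epsilon guard are illustrative.

```python
import numpy as np

def compute_irm_target(speech_spec, wind_spec, beta=0.5):
    """Learning target IRM(t, k) for step 503 (sketch).

    speech_spec, wind_spec : complex spectra of the voice data S and the wind
    noise data N, shaped (frames, bins); beta is the exponential factor that
    controls the attenuation gain (e.g. 0.5 or 1).
    """
    s_pow = np.abs(speech_spec) ** 2
    n_pow = np.abs(wind_spec) ** 2
    return (s_pow / (s_pow + n_pow + 1e-12)) ** beta
```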
In an embodiment of the present application, the cost function adopted by the wind noise attenuation prediction model based on the artificial neural network may be a minimum mean square error.
Through the steps 501-504, the training method can train the wind noise attenuation prediction model based on the artificial neural network. When the trained model is used for the audio signal processing with wind noise suppression, the fine-grained suppression of wind noise is better, the wind noise residual is smaller, and the wind noise suppression effect is better.
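As an illustration only, a minimal training sketch is given below using a gated recurrent unit based predictor (a GRU variant is mentioned elsewhere in this application) and the minimum mean square error cost function; the layer count, hidden size, input features, and optimizer are assumptions, not details from this application.

```python
import torch
import torch.nn as nn

class WindGainPredictor(nn.Module):
    """Minimal GRU-based gain predictor; layer sizes are illustrative only."""
    def __init__(self, n_bins=257, hidden=128):
        super().__init__()
        self.gru = nn.GRU(n_bins, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, x):                      # x: (batch, frames, bins)
        h, _ = self.gru(x)
        return torch.sigmoid(self.out(h))      # predicted gains in [0, 1]

def train_step(model, optimizer, mix_features, irm_target):
    """One step: mixed sound features in, the IRM learning target as the
    expected output, minimum mean square error as the cost function."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(mix_features), irm_target)
    loss.backward()
    optimizer.step()
    return loss.item()
```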
The application also provides an audio signal processing system for wind noise suppression, comprising: a memory for storing instructions executable by the processor; and a processor for executing the instructions to implement any of the wind noise suppressed audio signal processing methods or artificial neural network based wind noise attenuation prediction model training methods described above.
Fig. 6 is a system block diagram of the wind noise suppressed audio signal processing system or the training system of the artificial neural network-based wind noise attenuation prediction model according to the present embodiment. The system 600 may include an internal communication bus 601, a processor 602, a read-only memory (ROM) 603, a random access memory (RAM) 604, and a communication port 605. When implemented on a personal computer, the system 600 may also include a hard disk 607. The internal communication bus 601 enables data communication among the components of the system 600. The processor 602 performs determinations and issues prompts. In some embodiments, the processor 602 may consist of one or more processors. The communication port 605 enables the system 600 to exchange data with the outside; in some embodiments, the system 600 may send and receive information and data from a network through the communication port 605. The system 600 may also include various forms of program storage units and data storage units, such as the hard disk 607, the read-only memory (ROM) 603, and the random access memory (RAM) 604, capable of storing various data files for computer processing and/or communication, as well as possible program instructions executed by the processor 602. The processor executes these instructions to implement the main parts of the method; the results processed by the processor are transmitted to the user equipment through the communication port and displayed on the user interface.
The above-mentioned audio signal processing method with wind noise suppression or training method of the artificial neural network-based wind noise attenuation prediction model may be implemented as a computer program, stored in the hard disk 607, and loaded into the processor 602 for execution, so as to implement any of the wind noise suppressed audio signal processing methods or the training method of the artificial neural network-based wind noise attenuation prediction model of the present application.
The present application also provides a computer readable medium storing computer program code which, when executed by a processor, implements any of the wind noise suppressed audio signal processing methods or artificial neural network based wind noise attenuation prediction model training methods as described above.
When the audio signal processing method with wind noise suppression or the training method of the artificial neural network-based wind noise attenuation prediction model is implemented as a computer program, it may also be stored in a computer-readable storage medium as a product. For example, computer-readable storage media may include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips), optical disks (e.g., compact disk (CD), digital versatile disk (DVD)), smart cards, and flash memory devices (e.g., electrically erasable programmable read-only memory (EEPROM), cards, sticks, key drives). Moreover, the various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, wireless channels and various other media (and/or storage media) capable of storing, containing, and/or carrying code and/or instructions and/or data.
It should be understood that the embodiments described above are illustrative only. The embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and/or other electronic units designed to perform the functions described herein, or a combination thereof.
While the basic concepts have been described above, it will be apparent to those skilled in the art that the foregoing disclosure is by way of example only and is not intended to be limiting. Although not explicitly described herein, various modifications, improvements, and adaptations of the present application may occur to those skilled in the art. Such modifications, improvements, and adaptations are suggested by this application and are intended to fall within the spirit and scope of the exemplary embodiments of this application.
Meanwhile, the present application uses specific terms to describe embodiments of the present application. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is included in at least one embodiment of the present application. Thus, it should be emphasized and appreciated that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in various places in this specification are not necessarily referring to the same embodiment. Furthermore, particular features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Some aspects of the present application may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.), or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." The processor may be one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or a combination thereof. Furthermore, aspects of the present application may take the form of a computer product, comprising computer-readable program code, embodied in one or more computer-readable media. For example, computer-readable media can include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, tape), optical disks (e.g., compact disk (CD), digital versatile disk (DVD)), smart cards, and flash memory devices (e.g., card, stick, key drive).
Likewise, it should be noted that, in order to simplify the present disclosure and thereby aid in the understanding of one or more embodiments, various features are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are recited in the claims. Indeed, the claimed subject matter may lie in less than all features of a single disclosed embodiment.
While the present application has been described with reference to the present specific embodiments, those of ordinary skill in the art will recognize that the above embodiments are for illustrative purposes only, and that various equivalent changes or substitutions can be made without departing from the spirit of the present application, and therefore, all changes and modifications to the embodiments described above are intended to be within the scope of the claims of the present application.

Claims (23)

1. An audio signal processing method of wind noise suppression, comprising:
converting an original time domain signal of the input audio into an initial frequency domain signal;
predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
determining a final predicted wind noise attenuation gain of a current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of a neighboring frequency point signal of the current frame of the initial frequency domain signal;
wind noise suppression is carried out on the initial frequency domain signal based on the final prediction wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal, and a frequency domain signal after wind noise suppression is obtained; and
And converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression.
2. The method of claim 1, wherein the converting of the original time domain signal of the input audio into the initial frequency domain signal is performed using a subband filtering module based on a weighted overlap-add analysis filter; and the converting of the frequency domain signal after wind noise suppression into the time domain signal after wind noise suppression is performed using a synthesis processing module based on a weighted overlap-add filter.
3. The method of claim 1, wherein the pre-trained artificial neural network-based wind noise attenuation prediction model is a pre-trained wind noise attenuation prediction model based on a gated recurrent unit (GRU) neural network.
4. The method of claim 1, wherein the determining the final predicted wind noise attenuation gain for the current frequency bin signal of the current frame of the initial frequency domain signal from the predicted wind noise attenuation gain for the current frequency bin signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain for the current frame of adjacent frequency bin signals of the initial frequency domain signal comprises:
calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; and
And determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
5. The method of claim 4, wherein said calculating a harmonic enhancement gain of a current frequency bin signal of a current frame of said initial frequency domain signal from said predicted wind noise attenuation gain of a current frequency bin signal of a current frame of said initial frequency domain signal and said predicted wind noise attenuation gain of a current frame adjacent frequency bin signal of said initial frequency domain signal is performed by:
GainHarm(t,k) = IRM′(t,k) / max( IRM′(t,k−2), IRM′(t,k−1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2) )
wherein t is the frame number, k is the frequency point, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and max( IRM′(t,k−2), IRM′(t,k−1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2) ) is the maximum value among the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal.
6. The method of claim 4 or 5, wherein said determining said final predicted wind noise attenuation gain from said harmonic enhancement gain of a current frequency point signal of a current frame of said initial frequency domain signal and said predicted wind noise attenuation gain of a current frequency point signal of a current frame of said initial frequency domain signal is performed by:
GainF(t,k)=IRM′(t,k)*GainHarm(t,k)
wherein t is the frame number, k is the frequency point, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal.
7. The method as recited in claim 1, further comprising:
calculating noise suppression attenuation gain according to the initial frequency domain signal; and
and taking the minimum value of the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal as a new final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
8. An audio signal processing method of wind noise suppression, comprising:
converting an original time domain signal of the input audio into an initial frequency domain signal;
predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
Detecting whether wind noise protection is needed to be carried out on the current frequency point signal of the current frame of the initial frequency domain signal according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal;
if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, determining the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, obtaining a frequency domain signal after wind noise suppression, and converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression; and
if the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection, wind noise suppression is not carried out on the current frequency point signal of the current frame of the initial frequency domain signal, and the initial frequency domain signal is converted into a corresponding time domain signal.
9. The method of claim 8, wherein the detecting whether wind noise protection is required for the current frequency bin signal of the current frame of the initial frequency domain signal based on the predicted wind noise attenuation gain of the current frequency bin signal of the current frame of the initial frequency domain signal and the current frequency bin signal of the current frame of the initial frequency domain signal comprises:
performing spectrum centroid estimation on the current frequency point signal of the current frame of the initial frequency domain signal to obtain a first centroid;
performing spectrum centroid estimation according to the product of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and obtaining a second centroid;
calculating a difference between the first centroid and the second centroid, and judging whether the difference is smaller than a first threshold; and
if the difference value is smaller than the first threshold, determining that wind noise protection is not needed for the current frequency point signal of the current frame of the initial frequency domain signal.
10. The method of claim 9, wherein said detecting whether wind noise protection is required for a current frequency bin signal of a current frame of the initial frequency domain signal based on the predicted wind noise attenuation gains of the current frequency bin signal of the current frame of the initial frequency domain signal and the current frequency bin signal of the current frame of the initial frequency domain signal further comprises:
If the difference value is not smaller than the first threshold, judging whether a predicted wind noise attenuation gain amount is smaller than a second threshold, wherein the predicted wind noise attenuation gain amount is an average value of the predicted wind noise attenuation gains of all frequency point signals of a current frame of the initial frequency domain signal;
if the predicted wind noise attenuation gain amount is smaller than the second threshold, determining that wind noise protection is required for the current frequency point signal of the current frame of the initial frequency domain signal; and
if the predicted wind noise attenuation gain amount is not smaller than the second threshold, determining that wind noise protection is not needed for the current frequency point signal of the current frame of the initial frequency domain signal.
11. The method of claim 8, wherein the converting of the original time domain signal of the input audio into the initial frequency domain signal is performed using a subband filtering module based on a weighted overlap-add analysis filter; and the converting of the frequency domain signal after wind noise suppression into the time domain signal after wind noise suppression is performed using a synthesis processing module based on a weighted overlap-add filter.
12. The method of claim 8, wherein the pre-trained artificial neural network-based wind noise attenuation prediction model is a pre-trained wind noise attenuation prediction model based on a gated recurrent unit (GRU) neural network.
13. The method of claim 8, wherein the determining the final predicted wind noise attenuation gain for the current frequency bin signal of the current frame of the initial frequency domain signal from the predicted wind noise attenuation gain for the current frequency bin signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain for the current frame of adjacent frequency bin signals of the initial frequency domain signal comprises:
calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; and
and determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
14. The method of claim 13, wherein the calculating the harmonic enhancement gain of the current frequency bin signal of the current frame of the initial frequency domain signal from the predicted wind noise attenuation gain of the current frequency bin signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame of adjacent frequency bin signals of the initial frequency domain signal is performed by:
GainHarm(t,k) = IRM′(t,k) / max( IRM′(t,k−2), IRM′(t,k−1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2) )
wherein t is the frame number, k is the frequency point, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and max( IRM′(t,k−2), IRM′(t,k−1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2) ) is the maximum value among the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal.
15. The method of claim 13 or 14, wherein said determining said final predicted wind noise attenuation gain from said harmonic enhancement gain of a current frequency point signal of a current frame of said initial frequency domain signal and said predicted wind noise attenuation gain of a current frequency point signal of a current frame of said initial frequency domain signal is performed by:
GainF(t,k)=IRM’(t,k)*GainHarm(t,k)
wherein t is the frame number, k is the frequency point, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal.
16. The method as recited in claim 8, further comprising:
calculating noise suppression attenuation gain according to the initial frequency domain signal; and
and taking the minimum value of the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal as a new final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
17. An audio signal processing apparatus of wind noise suppression, comprising:
the first conversion module is used for converting an original time domain signal of input audio into an initial frequency domain signal;
the wind noise attenuation prediction module is used for predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
the attenuation gain fusion module is used for determining the final prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the prediction wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the prediction wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal;
The wind noise suppression module is used for performing wind noise suppression on the initial frequency domain signal based on the final prediction wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal and obtaining a frequency domain signal after wind noise suppression; and
and the second conversion module is used for converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression.
18. An audio signal processing apparatus of wind noise suppression, comprising:
the first conversion module is used for converting an original time domain signal of input audio into an initial frequency domain signal;
the wind noise attenuation prediction module is used for predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
the wind noise protection detection module is used for detecting whether wind noise protection is needed to be carried out on the current frequency point signal of the current frame of the initial frequency domain signal according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal;
the wind noise suppression module is used for determining a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal if wind noise protection is not needed, performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, obtaining a frequency domain signal after wind noise suppression, and converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression; and
And the wind noise protection module is used for converting the initial frequency domain signal into a corresponding time domain signal without wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal if the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection.
19. An audio signal processing system for wind noise suppression, comprising:
a memory for storing instructions executable by the processor; and a processor for executing the instructions to implement the method of any one of claims 1-16.
20. A computer readable medium storing computer program code which, when executed by a processor, implements the method of any of claims 1-16.
21. A training method of an artificial neural network-based wind noise attenuation prediction model, adapted to train the artificial neural network-based wind noise attenuation prediction model according to claim 1, comprising:
acquiring voice data and wind noise data;
mixing the voice data and the wind noise data to generate a mixed sound signal;
calculating a learning target according to the voice data and the wind noise data;
and taking the mixed sound signal as input of an artificial neural network-based wind noise attenuation prediction model, taking the learning target as expected output of the artificial neural network-based wind noise attenuation prediction model, training the artificial neural network-based wind noise attenuation prediction model, and obtaining a trained artificial neural network-based wind noise attenuation prediction model.
22. The method of claim 21, wherein the computing a learning objective from the speech data and the wind noise data is computed by:
IRM(t,k) = [S^2(t,k) / (S^2(t,k) + N^2(t,k))]^β
wherein IRM is the learning target, t is the frame number, k is the frequency point, S is the voice data, N is the wind noise data, and beta is an exponential factor.
23. The method of claim 21, wherein the cost function employed by the artificial neural network-based wind noise attenuation prediction model is a minimum mean square error.
CN202011141705.7A 2020-10-22 2020-10-22 Method, device, system and readable medium for processing audio signal with wind noise suppression Active CN112309417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011141705.7A CN112309417B (en) 2020-10-22 2020-10-22 Method, device, system and readable medium for processing audio signal with wind noise suppression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011141705.7A CN112309417B (en) 2020-10-22 2020-10-22 Method, device, system and readable medium for processing audio signal with wind noise suppression

Publications (2)

Publication Number Publication Date
CN112309417A CN112309417A (en) 2021-02-02
CN112309417B true CN112309417B (en) 2023-07-07

Family

ID=74326993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011141705.7A Active CN112309417B (en) 2020-10-22 2020-10-22 Method, device, system and readable medium for processing audio signal with wind noise suppression

Country Status (1)

Country Link
CN (1) CN112309417B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700789B (en) * 2021-03-24 2021-06-25 深圳市中科蓝讯科技股份有限公司 Noise detection method, nonvolatile readable storage medium and electronic device
CN113611320B (en) * 2021-04-07 2023-07-04 珠海市杰理科技股份有限公司 Wind noise suppression method, device, audio equipment and system
CN114176623B (en) * 2021-12-21 2023-09-12 深圳大学 Sound noise reduction method, system, noise reduction device and computer readable storage medium
WO2023172609A1 (en) * 2022-03-10 2023-09-14 Dolby Laboratories Licensing Corporation Method and audio processing system for wind noise suppression

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976566A (en) * 2010-07-09 2011-02-16 瑞声声学科技(深圳)有限公司 Voice enhancement method and device using same
CN106303837A (en) * 2015-06-24 2017-01-04 联芯科技有限公司 Wind noise detection and suppression method and system for dual microphones
CN108022595A (en) * 2016-10-28 2018-05-11 电信科学技术研究院 A kind of voice signal noise-reduction method and user terminal
CN109065067A (en) * 2018-08-16 2018-12-21 福建星网智慧科技股份有限公司 A kind of conference terminal voice de-noising method based on neural network model
CN109767782A (en) * 2018-12-28 2019-05-17 中国科学院声学研究所 A kind of sound enhancement method improving DNN model generalization performance
CN110335620A (en) * 2019-07-08 2019-10-15 广州欢聊网络科技有限公司 A kind of noise suppressing method, device and mobile terminal
CN110931035A (en) * 2019-12-09 2020-03-27 广州酷狗计算机科技有限公司 Audio processing method, device, equipment and storage medium
CN111210021A (en) * 2020-01-09 2020-05-29 腾讯科技(深圳)有限公司 Audio signal processing method, model training method and related device


Also Published As

Publication number Publication date
CN112309417A (en) 2021-02-02

Similar Documents

Publication Publication Date Title
CN112309417B (en) Method, device, system and readable medium for processing audio signal with wind noise suppression
US10504539B2 (en) Voice activity detection systems and methods
CN111418010B (en) Multi-microphone noise reduction method and device and terminal equipment
US8428946B1 (en) System and method for multi-channel multi-feature speech/noise classification for noise suppression
WO2012158156A1 (en) Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood
CN110853664B (en) Method and device for evaluating performance of speech enhancement algorithm and electronic equipment
KR101737824B1 (en) Method and Apparatus for removing a noise signal from input signal in a noisy environment
CN108696648B (en) Method, device, equipment and storage medium for processing short-time voice signal
WO2020177374A1 (en) Continuous noise tracking-based target speech signal enhancement method and system, and storage medium
CN111292758B (en) Voice activity detection method and device and readable storage medium
JP5752324B2 (en) Single channel suppression of impulsive interference in noisy speech signals.
US20240046947A1 (en) Speech signal enhancement method and apparatus, and electronic device
WO2021007841A1 (en) Noise estimation method, noise estimation apparatus, speech processing chip and electronic device
JP2015143811A (en) Noise suppressing apparatus and noise suppressing method
CN110556125A (en) Feature extraction method and device based on voice signal and computer storage medium
JP4965891B2 (en) Signal processing apparatus and method
JP6265903B2 (en) Signal noise attenuation
RU2616534C2 (en) Noise reduction during audio transmission
WO2024041512A1 (en) Audio noise reduction method and apparatus, and electronic device and readable storage medium
CN113160846A (en) Noise suppression method and electronic device
CN106997768B (en) Method and device for calculating voice occurrence probability and electronic equipment
JP2014194437A (en) Voice processing device, voice processing method and voice processing program
JP4051325B2 (en) Speaker position detection method, apparatus, program, and recording medium
KR101096091B1 (en) Apparatus for Separating Voice and Method for Separating Voice of Single Channel Using the Same
JP2015155982A (en) Voice section detection device, speech recognition device, method thereof, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant