CN112309417A - Wind noise suppression audio signal processing method, device, system and readable medium - Google Patents

Wind noise suppression audio signal processing method, device, system and readable medium

Info

Publication number
CN112309417A
Authority
CN
China
Prior art keywords
wind noise
frequency domain
domain signal
signal
current frame
Prior art date
Legal status
Granted
Application number
CN202011141705.7A
Other languages
Chinese (zh)
Other versions
CN112309417B (en)
Inventor
许云峰
Current Assignee
Lusheng Technology Co ltd
Original Assignee
Lusheng Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Lusheng Technology Co ltd
Priority to CN202011141705.7A
Publication of CN112309417A
Application granted
Publication of CN112309417B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The application provides an audio signal processing method, device, and system for wind noise suppression, a computer readable medium, and a training method for a wind noise attenuation prediction model based on an artificial neural network. The wind noise suppression audio signal processing method comprises the following steps: converting an original time domain signal of an input audio into an initial frequency domain signal; predicting a predicted wind noise attenuation gain from the initial frequency domain signal by using a wind noise attenuation prediction model based on an artificial neural network; determining the final predicted wind noise attenuation gain of the current frequency point signal of the current frame according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame; performing wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain; and converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression. By eliminating wind noise with a wind noise attenuation prediction model based on an artificial neural network, the method can greatly improve the wind noise suppression effect.

Description

Wind noise suppression audio signal processing method, device, system and readable medium
Technical Field
The present application relates generally to the field of signal processing, and more particularly, to a method, an apparatus, a system, and a computer readable medium for processing an audio signal with wind noise suppression, and a training method for a wind noise attenuation prediction model based on an artificial neural network.
Background
Wind noise is a very complex and difficult-to-predict noise that is created when wind hits the surface of a microphone and the turbulence generated near that surface is picked up by the microphone. Many outdoor sound collection devices are affected by wind noise, for example: hearing aids, talking headsets, cell phones, handheld video equipment, etc. Especially when the outdoor wind is very strong, wind noise can severely interfere with these devices, making the sound quality very poor.
The spectral characteristics and magnitude of wind noise depend on the wind speed as well as the microphone position and the structure of the sound pickup. Generally, the energy of the wind noise spectrum is concentrated in the low-frequency part and shifts toward higher frequencies as the wind speed increases. The formation of wind noise is complex and difficult to predict, so how to process wind noise is a difficult problem in the field.
Generally, conventional wind noise processing methods mainly include single-microphone-based and multi-microphone-based wind noise processing. The basic framework of single-microphone wind noise processing is generally to perform wind noise detection, estimate the power spectral density (PSD) of the current wind noise if wind noise is detected, calculate the attenuation gain for wind noise suppression according to the power spectral density, and attenuate the noisy speech to obtain a clean speech signal. There are many common single-channel wind noise detection methods, such as spectral-centroid-based methods, negative slope fitting (NSF), and the like. Single-channel power spectral density estimation methods are also varied, for example template-matching-based methods, negative slope fitting, and the like.
Compared with the single-microphone scheme, the conventional dual-microphone noise reduction method can effectively exploit the correlation between the two microphones, so its performance is better than that of the single-microphone scheme. The dual-microphone noise reduction scheme uses the fact that the wind noise collected by the two microphones is only weakly correlated to detect the wind noise and estimate its power spectral density, thereby further suppressing the wind noise. In the presence of wind noise, the coherence coefficient between the two microphone signals is very small, whereas it is large when speech is present.
The wind noise suppression performance of the dual-microphone approach is better than that of the single microphone, but the single microphone is cheaper and more convenient in many use scenarios. In either case, these conventional algorithms still have the following problems: because the wind noise power spectrum is difficult to estimate accurately, wind noise suppression leaves a large residue; and wind noise is very unstable in the time domain and therefore difficult to track, which may result in large distortion or even choppiness of the near-end speech. Therefore, how to better perform wind noise suppression is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The technical problem to be solved by the application is to provide an audio signal processing method, device, system and computer readable medium for wind noise suppression, and a training method of a wind noise attenuation prediction model based on an artificial neural network, which can better perform wind noise suppression.
In order to solve the above technical problem, the present application provides a method for processing an audio signal with wind noise suppression, including: converting an original time domain signal of an input audio into an initial frequency domain signal; predicting a predicted wind noise attenuation gain from the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network; determining the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; performing wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal and obtaining a frequency domain signal after the wind noise suppression; and converting the frequency domain signal after the wind noise suppression into a time domain signal after the wind noise suppression.
In an embodiment of the present application, the conversion of the original time domain signal of the input audio into the initial frequency domain signal is performed by using a subband filtering module based on a weighted overlap-add (WOLA) analysis filter; and the conversion of the wind noise suppressed frequency domain signal into a wind noise suppressed time domain signal is performed by using a synthesis module based on a weighted overlap-add synthesis filter.
In an embodiment of the application, the pre-trained wind noise attenuation prediction model based on the artificial neural network is a pre-trained wind noise attenuation prediction model based on a gated recurrent unit (GRU) neural network.
In an embodiment of the present application, the determining a final predicted wind noise attenuation gain of a current frequency point signal of a current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of an adjacent frequency point signal of the current frame of the initial frequency domain signal includes: calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; and determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frame frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal.
In an embodiment of the present application, the calculating of the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal is performed in the following manner:
GainHarm(t,k) = IRM′(t,k) / max(IRM′(t,k-2), IRM′(t,k-1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2))
wherein t is the frame index, k is the frequency point index, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and the max(...) term is the maximum of the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frame frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal is performed in the following manner:
GainF(t,k)=IRM’(t,k)*GainHarm(t,k)
wherein t is the frame index, k is the frequency point index, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the method further comprises: calculating noise suppression attenuation gain according to the initial frequency domain signal; and taking the minimum value of the noise suppression attenuation gain of the current frame frequency point signal of the initial frequency domain signal and the final predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal as a new final predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal.
In order to solve the above technical problem, the present application further provides a method for processing an audio signal with wind noise suppression, including: converting an original time domain signal of an input audio into an original frequency domain signal; predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network; detecting whether the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal; if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, determining the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal to obtain a frequency domain signal after wind noise suppression, and converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression; and if the current frequency point signal of the current frame of the initial frequency domain signal needs to be subjected to wind noise protection, not performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal, and converting the initial frequency domain signal into a corresponding time domain signal.
In an embodiment of the present application, the detecting whether the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal includes: carrying out spectrum centroid estimation on the current frequency point signal of the current frame of the initial frequency domain signal to obtain a first centroid; performing spectrum centroid estimation according to the product of the current frame current frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame current frequency point signal of the initial frequency domain signal and obtaining a second centroid; calculating a difference value between the first centroid and the second centroid, and judging whether the difference value is smaller than a first threshold; and if the difference value is smaller than the first threshold, determining that the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection.
In an embodiment of the present application, the detecting whether the current frequency point signal of the current frame of the initial frequency domain signal needs to be wind noise protected according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal further includes: if the difference is not smaller than the first threshold, judging whether the predicted wind noise attenuation gain amount is smaller than a second threshold, wherein the predicted wind noise attenuation gain amount is the average value of the predicted wind noise attenuation gains of all frequency point signals of the current frame of the initial frequency domain signal; if the predicted wind noise attenuation gain amount is smaller than the second threshold, determining that the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection; and if the predicted wind noise attenuation gain quantity is not smaller than the second threshold, determining that the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection.
In an embodiment of the present application, the conversion of the original time domain signal of the input audio into the initial frequency domain signal is performed by using a subband filtering module based on a weighted overlap-add (WOLA) analysis filter; and the conversion of the wind noise suppressed frequency domain signal into a wind noise suppressed time domain signal is performed by using a synthesis module based on a weighted overlap-add synthesis filter.
In an embodiment of the application, the pre-trained wind noise attenuation prediction model based on the artificial neural network is a pre-trained wind noise attenuation prediction model based on a gated recurrent unit (GRU) neural network.
In an embodiment of the present application, the determining a final predicted wind noise attenuation gain of a current frequency point signal of a current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of an adjacent frequency point signal of the current frame of the initial frequency domain signal includes: calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; and determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frame frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal.
In an embodiment of the present application, the calculating of the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal is performed in the following manner:
GainHarm(t,k) = IRM′(t,k) / max(IRM′(t,k-2), IRM′(t,k-1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2))
wherein t is the frame index, k is the frequency point index, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and the max(...) term is the maximum of the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frame frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal is performed in the following manner:
GainF(t,k)=IRM’(t,k)*GainHarm(t,k)
wherein t is the frame index, k is the frequency point index, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the method further comprises: calculating noise suppression attenuation gain according to the initial frequency domain signal; and taking the minimum value of the noise suppression attenuation gain of the current frame frequency point signal of the initial frequency domain signal and the final predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal as a new final predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal.
In order to solve the above technical problem, the present application further provides an audio signal processing apparatus with wind noise suppression, including: the first conversion module is used for converting an original time domain signal of an input audio frequency into an initial frequency domain signal; the wind noise attenuation prediction module is used for predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network; an attenuation gain fusion module, configured to determine a final predicted wind noise attenuation gain of a current frequency point signal of a current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of an adjacent frequency point signal of the current frame of the initial frequency domain signal; the wind noise suppression module is used for performing wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal and obtaining a frequency domain signal after the wind noise suppression; and the second conversion module is used for converting the frequency domain signal subjected to wind noise suppression into a time domain signal subjected to wind noise suppression.
In order to solve the above technical problem, the present application further provides an audio signal processing apparatus with wind noise suppression, including: the first conversion module is used for converting an original time domain signal of an input audio frequency into an initial frequency domain signal; the wind noise attenuation prediction module is used for predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network; a wind noise protection detection module, configured to detect whether a current frequency point signal of the initial frequency domain signal needs wind noise protection according to a current frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the initial frequency domain signal; a wind noise suppression module, configured to determine a final predicted wind noise attenuation gain of a current frequency point signal of a current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of an adjacent frequency point signal of the current frame of the initial frequency domain signal if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, perform wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal to obtain a frequency domain signal after wind noise suppression, and convert the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression; and the wind noise protection module is used for not performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal and converting the initial frequency domain signal into a corresponding time domain signal if the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection.
In order to solve the above technical problem, the present application further provides an audio signal processing system with wind noise suppression, including: a memory for storing instructions executable by the processor; and a processor for executing the instructions to implement any of the methods described above.
To solve the above technical problem, the present application further provides a computer readable medium storing computer program code, which when executed by a processor implements any one of the methods described above.
In order to solve the technical problem, the present application further provides a training method of a wind noise attenuation prediction model based on an artificial neural network, including: acquiring voice data and wind noise data; mixing the voice data and the wind noise data to generate a mixed sound signal; calculating a learning target according to the voice data and the wind noise data; and taking the sound mixing signal as the input of a wind noise attenuation prediction model based on an artificial neural network, taking the learning target as the expected output of the wind noise attenuation prediction model based on the artificial neural network, training the wind noise attenuation prediction model based on the artificial neural network, and obtaining the trained wind noise attenuation prediction model based on the artificial neural network.
In an embodiment of the present application, the calculating a learning objective according to the voice data and the wind noise data is performed in the following manner:
IRM(t,k) = (|S(t,k)|^2 / (|S(t,k)|^2 + |N(t,k)|^2))^β
wherein IRM is the learning objective; t is the frame index; k is the frequency point index; S is the voice data; N is the wind noise data; and β is an exponential factor.
In an embodiment of the application, the cost function adopted by the wind noise attenuation prediction model based on the artificial neural network is a minimum mean square error.
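The following PyTorch sketch shows one possible shape of such a training step; it is illustrative only, and the network size, the β value, the feature dimensions, and identifiers such as WindGainGRU and irm_target are assumptions rather than details taken from the patent:

```python
import torch
import torch.nn as nn

class WindGainGRU(nn.Module):
    """Small GRU that maps a sequence of log-spectrum frames to per-bin gains in [0, 1]."""
    def __init__(self, num_bins: int, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(input_size=num_bins, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_bins)

    def forward(self, log_spec):                       # log_spec: (batch, frames, bins)
        h, _ = self.gru(log_spec)
        return torch.sigmoid(self.out(h))              # predicted attenuation gains in [0, 1]

def irm_target(speech_mag, noise_mag, beta=0.5):
    """Ideal-ratio-mask style learning target computed from speech and wind noise magnitudes."""
    return (speech_mag ** 2 / (speech_mag ** 2 + noise_mag ** 2 + 1e-12)) ** beta

# One illustrative training step on randomly generated placeholder data.
model = WindGainGRU(num_bins=257)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                                 # minimum mean square error cost

speech_mag = torch.rand(8, 100, 257)                   # placeholder speech magnitude spectra
noise_mag = torch.rand(8, 100, 257)                    # placeholder wind noise magnitude spectra
mix_log_spec = torch.log(speech_mag + noise_mag + 1e-12)  # rough stand-in for the mixture log spectrum

loss = loss_fn(model(mix_log_spec), irm_target(speech_mag, noise_mag))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```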
Compared with the prior art, the wind noise suppression audio signal processing method, device, system and computer readable medium of the present application predict the predicted wind noise attenuation gain of the initial frequency domain signal with a wind noise attenuation prediction model based on an artificial neural network and then eliminate the wind noise, which gives finer control over the wind noise suppression amount, reduces residual wind noise, and greatly improves the wind noise suppression effect.
The other wind noise suppression audio signal processing method, device, system and computer readable medium of the present application detect whether the signal needs wind noise protection and, when no protection is needed, eliminate the wind noise using the predicted wind noise attenuation gain produced by the artificial-neural-network-based wind noise attenuation prediction model, which likewise gives finer control over the wind noise suppression amount, reduces residual wind noise, and greatly improves the wind noise suppression effect; when protection is needed, the signal is not wind noise suppressed, which greatly reduces the situation in which the useful signal is severely damaged by a mismatch of the wind noise attenuation prediction model when no wind noise is present.
The training method of the wind noise attenuation prediction model based on the artificial neural network can train such a model; when the trained model is used for wind noise suppression of audio signals, it provides finer control over the wind noise suppression amount, leaves less residual wind noise, and achieves a better wind noise suppression effect.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the principle of the application. In the drawings:
fig. 1 is a schematic flow chart of a wind noise suppressed audio signal processing method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a wind noise suppressed audio signal processing method according to another embodiment of the present application.
Fig. 3 is a schematic block diagram of a wind noise suppressed audio signal processing apparatus according to an embodiment of the present application.
Fig. 4 is a schematic block diagram of a wind noise suppressed audio signal processing apparatus according to another embodiment of the present application.
Fig. 5 is a schematic flow chart of a training method of an artificial neural network-based wind noise attenuation prediction model according to an embodiment of the present application.
Fig. 6 is a schematic block diagram of a wind noise suppressed audio signal processing system or a training system of a wind noise attenuation prediction model based on an artificial neural network according to an embodiment of the present application.
Fig. 7 is a schematic flow chart of a method implemented in step 203 of fig. 2 according to an embodiment of the present application.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only examples or embodiments of the application, and a person skilled in the art can apply the application to other similar scenarios based on these drawings without inventive effort. Unless otherwise apparent from the context, or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present application unless specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description. Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Flow charts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, various steps may be processed in reverse order or simultaneously, and other operations may be added to or removed from these processes.
The application provides an audio signal processing method for wind noise suppression. Fig. 1 is a flowchart of an audio signal processing method of wind noise suppression shown according to the present embodiment. Referring to fig. 1, the audio signal processing method for wind noise suppression of the present embodiment includes:
step 101, converting an original time domain signal of an input audio into an initial frequency domain signal;
step 102, predicting a predicted wind noise attenuation gain according to an initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
step 103, determining a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal;
step 104, performing wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal and obtaining a frequency domain signal after the wind noise suppression; and
step 105, converting the frequency domain signal subjected to wind noise suppression into a time domain signal subjected to wind noise suppression.
Steps 101-105 of the audio signal processing method for wind noise suppression of this embodiment are described in detail below:
in step 101, the wind noise suppressed audio signal processing system converts an original time domain signal of the input audio into an original frequency domain signal. In an embodiment of the present application, the converting of the original time domain signal of the input audio into the initial frequency domain signal may be performed using a Weighted Overlap-add (WOLA) based subband filtering module. The WOLA-based subband filtering module can perform subband filtering on an original time domain signal of input audio to obtain a multi-channel frequency domain signal of a complex domain.
In step 102, the system predicts a predicted wind noise attenuation gain from the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on the artificial neural network. In an embodiment of the application, the pre-trained wind noise attenuation prediction model based on the artificial neural network may be a pre-trained wind noise attenuation prediction model based on a Gated Recurrent Unit (GRU) neural network. In an embodiment of the application, the system may take the logarithm of the magnitude of the initial frequency domain signal to generate a log spectrum, and use the log spectrum of the initial frequency domain signal as the input of the pre-trained wind noise attenuation prediction model based on the artificial neural network. The GRU is a type of recurrent neural network (RNN) and has the advantage of a relatively small model size and computational cost.
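For illustration only, a minimal numpy sketch of such a log-spectrum feature is given below; it assumes the subband output is a complex spectrogram of shape (frames, bins), and the function name, epsilon and shapes are not taken from the patent:

```python
import numpy as np

def log_spectrum(subband_frames: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Log-magnitude spectrum of complex subband frames with shape (frames, bins)."""
    return np.log(np.abs(subband_frames) + eps)

# Example: 100 frames of 257 complex frequency points from the analysis filter bank.
X = np.random.randn(100, 257) + 1j * np.random.randn(100, 257)
features = log_spectrum(X)   # input to the GRU-based gain prediction model
```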
In step 103, the system determines the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, step 103 may include the following steps 1031-1032:
In step 1031, the system calculates the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal. In an embodiment of the present application, step 1031 may be performed in the following manner:
GainHarm(t,k) = IRM′(t,k) / max(IRM′(t,k-2), IRM′(t,k-1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2))
wherein t is the frame index, k is the frequency point index, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and the max(...) term is the maximum of the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal. In this embodiment, the adjacent frequency point signals of the current frame of the initial frequency domain signal are the four adjacent frequency points k-2, k-1, k+1 and k+2; a user can determine the number of adjacent frequency points according to actual needs, which is not limited in this application. The above method of calculating the harmonic enhancement gain may be referred to as local normalization. The harmonic enhancement gain further suppresses noise between voice harmonics and protects the voice harmonics of the current frame signal, so that the voice signal is enhanced.
Step 1032, the system determines the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frame frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal.
In an embodiment of the present application, step 1032 may be performed in the following manner:
GainF(t,k)=IRM’(t,k)*GainHarm(t,k)
wherein t is the frame index, k is the frequency point index, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal. This final wind noise attenuation gain can be applied to the initial frequency domain signal to suppress wind noise and enhance speech.
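The two sub-steps above can be illustrated with a short numpy sketch (illustrative only; the window of plus or minus 2 frequency points follows the formulas above, but the array names, the edge handling, and the epsilon guard are assumptions):

```python
import numpy as np

def final_gain(irm: np.ndarray) -> np.ndarray:
    """Fuse the predicted gains IRM'(t, k) with a local-normalization harmonic gain.

    irm: predicted wind noise attenuation gains, shape (frames, bins), values in [0, 1].
    """
    frames, bins_ = irm.shape
    gain_f = np.empty_like(irm)
    for t in range(frames):
        for k in range(bins_):
            lo, hi = max(0, k - 2), min(bins_, k + 3)   # neighbours k-2 .. k+2, clipped at the spectrum edges
            gain_harm = irm[t, k] / max(irm[t, lo:hi].max(), 1e-12)
            gain_f[t, k] = irm[t, k] * gain_harm        # GainF = IRM' * GainHarm
    return gain_f
```

Applying gain_f element-wise to the complex subband signal is then the wind noise suppression of step 104.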
In step 104, the system performs wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal and obtains a frequency domain signal after wind noise suppression. The wind noise suppression means that each frequency point signal of each frame of the initial frequency domain signal is multiplied by the corresponding final predicted wind noise attenuation gain, and the obtained product is the frequency domain signal after the wind noise suppression.
In step 105, the system converts the wind noise suppressed frequency domain signal into a wind noise suppressed time domain signal. In an embodiment of the present application, step 105 may be performed using a synthesis module based on a weighted overlap-add synthesis filter.
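To make the overall signal path concrete, the sketch below uses a plain windowed FFT with overlap-add as a simplified stand-in for the WOLA analysis and synthesis filter banks; the frame length, hop size, and function names are illustrative assumptions, not parameters from the patent:

```python
import numpy as np

def analyze(x, frame_len=512, hop=256):
    """Windowed FFT analysis: a simplified stand-in for the WOLA analysis filter bank."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([np.fft.rfft(win * x[i * hop:i * hop + frame_len])
                     for i in range(n_frames)])

def synthesize(X, frame_len=512, hop=256):
    """Overlap-add synthesis: a simplified stand-in for the WOLA synthesis filter bank.
    (No synthesis window or scaling correction; a real WOLA bank handles these.)"""
    out = np.zeros((X.shape[0] - 1) * hop + frame_len)
    for i, frame in enumerate(X):
        out[i * hop:i * hop + frame_len] += np.fft.irfft(frame, n=frame_len)
    return out

x = np.random.randn(16000)          # 1 s of audio at 16 kHz, as a placeholder input
X = analyze(x)                      # step 101: time domain -> frequency domain
gain_f = np.ones_like(np.abs(X))    # stands in for the final predicted wind noise attenuation gains
y = synthesize(X * gain_f)          # steps 104-105: apply gains, then back to the time domain
```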
In an embodiment of the present application, the method for processing an audio signal with wind noise suppression may further include the following steps 106-107:
Step 106, the system calculates the noise suppression attenuation gain according to the initial frequency domain signal. The system may use a general noise suppressor to calculate the noise suppression attenuation gain, which is not limited in this application.
Step 107, the system takes the minimum value of the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal as the new final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal. In one example, the final predicted wind noise attenuation gain used in step 107 may be the one obtained from the foregoing steps 1031 and 1032, that is, step 107 may be performed in the following manner:
GainF(t,k) = min(IRM′(t,k) * GainHarm(t,k), GainN(t,k))
wherein t is the frame index, k is the frequency point index, GainF(t,k) is the new final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainN(t,k) is the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal calculated in step 106.
By taking the minimum of the noise suppression attenuation gain and the final predicted wind noise attenuation gain in steps 106-107 as the new final predicted wind noise attenuation gain, the suppression effect obtained when wind noise suppression is performed with the final predicted wind noise attenuation gain can be further improved.
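In code terms this fusion is a per-frequency-point minimum, sketched below with placeholder arrays (the array names, shapes, and values are assumptions for illustration only):

```python
import numpy as np

# gain_f: final predicted wind noise attenuation gains from steps 103-104, shape (frames, bins)
# gain_n: noise suppression attenuation gains from a conventional noise suppressor, same shape
gain_f = np.random.rand(100, 257)        # placeholder values
gain_n = np.random.rand(100, 257)
gain_f = np.minimum(gain_f, gain_n)      # step 107: keep the stronger attenuation per frequency point
```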
To sum up, in steps 101-105, the wind noise suppression audio signal processing method of this embodiment predicts the predicted wind noise attenuation gain of the initial frequency domain signal with the wind noise attenuation prediction model based on the artificial neural network and then eliminates the wind noise, which gives finer control over the wind noise suppression amount, reduces residual wind noise, and greatly improves the wind noise suppression effect.
The application also provides another wind noise suppression audio signal processing method. Fig. 2 is a flowchart of an audio signal processing method of wind noise suppression shown according to the present embodiment. Referring to fig. 2, the audio signal processing method for wind noise suppression of the present embodiment includes:
step 201, an audio signal processing system for wind noise suppression converts an original time domain signal of an input audio into an initial frequency domain signal;
step 202, the system predicts a predicted wind noise attenuation gain according to an initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
step 203, the system detects whether the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal; if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, the method proceeds to step 204, and if it does need wind noise protection, the method proceeds to step 205;
step 204, the system determines a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, performs wind noise suppression on the current frame of the current frequency point signal of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frame of the initial frequency domain signal to obtain a frequency domain signal after the wind noise suppression, and converts the frequency domain signal after the wind noise suppression into a time domain signal after the wind noise suppression; and
in step 205, the system does not perform wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal, and converts the initial frequency domain signal into a corresponding time domain signal.
Steps 201-205 of the audio signal processing method for wind noise suppression of this embodiment are described in detail below:
steps 201 and 202 may refer to steps 101 and 102 in the previous embodiment, and a description thereof is not repeated.
In step 203, the system detects whether the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection, based on that signal and its predicted wind noise attenuation gain. This step detects whether wind noise is actually present in the current signal, and greatly reduces the situation in which, when no wind noise is present, a prediction error of the wind noise attenuation prediction model causes strong suppression that severely damages the useful signal.
In an embodiment of the present application, as shown in fig. 7, the step 203 of detecting whether the current frequency point signal of the current frame of the initial frequency domain signal needs to be wind noise protected according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal may include the following steps 2031-2035:
step 2031, the system performs spectrum centroid estimation on the current frequency point signal of the current frame of the initial frequency domain signal and obtains a first centroid;
step 2032, the system estimates the spectrum centroid according to the product of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and obtains a second centroid;
step 2033, the system calculates the difference between the first centroid and the second centroid;
step 2034, the system judges whether the difference is smaller than the first threshold; and
step 2035, if the difference is smaller than the first threshold, the system determines that the current frequency point signal of the current frame of the initial frequency domain signal does not need to be protected by wind noise, wherein the first threshold may be preset by the user according to the actual needs, which is not limited in the present application.
In one example, the spectral centroid estimation of steps 2031 and 2032 may be performed in the following manner:
C = (Σ_{k=1}^{M} k · |Y(k)|) / (Σ_{k=1}^{M} |Y(k)|)
wherein C is the centroid; M is the number of frequency points; k is the frequency point index (each index corresponds to a sampled frequency); and Y(k) is the spectrum sample at the k-th frequency point.
The centroid measures where the power spectral density of the current signal is concentrated: when the signal contains wind noise, the centroid C lies toward the low-frequency part and is therefore small; when no wind noise is present, the centroid C lies relatively toward the high-frequency part and is larger. Therefore, when the difference value is smaller than the first threshold, the wind noise in the current frequency point signal of the current frame of the initial frequency domain signal is small or absent.
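A short numpy sketch of the two centroid estimates follows (illustrative only; whether the weighting uses the magnitude or the power spectrum, and the function and variable names, are assumptions):

```python
import numpy as np

def spectral_centroid(spectrum: np.ndarray) -> float:
    """Centroid of a one-frame spectrum Y(k), k = 1..M, weighted by magnitude."""
    mag = np.abs(spectrum)
    k = np.arange(1, len(mag) + 1)
    return float((k * mag).sum() / (mag.sum() + 1e-12))

# First centroid: from the current frame of the initial frequency domain signal.
# Second centroid: from the same frame multiplied by the predicted gains IRM'(t, k).
frame = np.random.randn(257) + 1j * np.random.randn(257)   # placeholder subband frame
irm = np.random.rand(257)                                    # placeholder predicted gains
c1 = spectral_centroid(frame)
c2 = spectral_centroid(frame * irm)
diff = c1 - c2   # compared against the first threshold in step 2034
```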
In an embodiment of the present application, as shown in fig. 7, the step 203 of detecting whether the current frequency point signal of the current frame of the initial frequency domain signal needs to be wind noise protected according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal may further include the following steps 2036-2038, which may be executed after step 2034:
step 2036, if the difference is not less than the first threshold, the system judges whether the predicted wind noise attenuation gain amount is less than the second threshold, wherein the predicted wind noise attenuation gain amount is the average value of the predicted wind noise attenuation gains of all frequency point signals of the current frame of the initial frequency domain signal;
step 2037, if the predicted wind noise attenuation gain amount is smaller than the second threshold, the system determines that the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection; and
step 2038, if the predicted wind noise attenuation gain amount is not less than the second threshold, the system determines that the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection.
When the difference value is not smaller than the first threshold, it indicates that the current frequency point signal of the current frame of the initial frequency domain signal contains wind noise or the wind noise is large. In step 2036, the system estimates how strongly the predicted wind noise attenuation gain attenuates the initial frequency domain signal by calculating the predicted wind noise attenuation gain amount; the smaller this amount, the more the signal would be suppressed. The second threshold may be preset by a user according to actual needs, which is not limited in this application.
In step 2037, when the predicted wind noise attenuation gain amount is smaller than the second threshold, the system may indicate that wind noise protection needs to be performed on the current frequency point signal of the current frame of the initial frequency domain signal by outputting a flag equal to 1.
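Putting steps 2031-2038 together, the protection decision could look like the following sketch, re-using the spectral_centroid helper from the previous sketch; the threshold values thr1 and thr2 and the function name are illustrative assumptions, not values from the patent:

```python
import numpy as np

def needs_wind_protection(frame: np.ndarray, irm_frame: np.ndarray,
                          thr1: float = 20.0, thr2: float = 0.3) -> bool:
    """Return True if the current frame should be protected (i.e. not wind-noise suppressed)."""
    c1 = spectral_centroid(frame)               # first centroid (step 2031)
    c2 = spectral_centroid(frame * irm_frame)   # second centroid (step 2032)
    if c1 - c2 < thr1:                          # small centroid shift: little or no wind noise (step 2035)
        return False
    gain_amount = float(irm_frame.mean())       # average predicted gain over all frequency points (step 2036)
    return gain_amount < thr2                   # heavy attenuation predicted -> protect (steps 2037-2038)
```

A frame for which this returns True is passed through without wind noise suppression (step 205); otherwise the suppression of step 204 is applied.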
Step 204 is performed after the system detects in step 203 that the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection. Because the current frequency point signal of the current frame of the initial frequency domain signal does not need to be wind noise protected, the system needs to determine the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal to perform wind noise suppression. The actions performed by the system in step 204 can be divided into the following steps 2041-2043:
Step 2041, the system determines a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal;
Step 2042, the system performs wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and obtains a frequency domain signal after wind noise suppression; and
Step 2043, the system converts the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression.
For the aforementioned steps 2041-2043, reference may be made correspondingly to steps 103-105 of the foregoing embodiment; they are not described again here. A minimal sketch of steps 2042-2043 is given below.
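As an illustration of steps 2042-2043, the sketch below applies the final predicted wind noise attenuation gains to each frame spectrum and returns to the time domain with a simple Hann-window overlap-add. The present application mentions a weighted overlap-add filter bank, so the framing parameters, the window and the synthesis method used here are only assumptions of this sketch.

import numpy as np

def suppress_and_reconstruct(frames_spec, final_gains, frame_len=512, hop=256):
    # frames_spec : (num_frames, frame_len//2 + 1) complex spectra of the initial signal
    # final_gains : final predicted wind noise attenuation gains, same shape
    # Returns the wind noise suppressed time domain signal.
    window = np.hanning(frame_len)
    num_frames = frames_spec.shape[0]
    out = np.zeros(hop * (num_frames - 1) + frame_len)

    for t in range(num_frames):
        suppressed = frames_spec[t] * final_gains[t]            # step 2042: apply gains
        frame = np.fft.irfft(suppressed, n=frame_len) * window  # step 2043: back to time domain
        out[t * hop : t * hop + frame_len] += frame             # overlap-add synthesis
    return out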
Step 205 is executed after the system detects in step 203 that the current frequency point signal of the current frame of the initial frequency domain signal needs to be wind noise protected. Because the current frequency point signal of the current frame of the initial frequency domain signal needs to be wind noise protected, the system does not perform wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal.
In an embodiment of the present application, the method for processing an audio signal with wind noise suppression may further include the following steps 206-207:
step 206, calculating noise suppression attenuation gain according to the initial frequency domain signal; and
step 207, taking the minimum value of the noise suppression attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal as the new final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
For steps 206-207, reference may be made to steps 106-107 in the foregoing embodiment; they are not described again here. The gain combination of step 207 amounts to an element-wise minimum, as sketched below.
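A minimal sketch of the element-wise minimum used in step 207; the array names are illustrative assumptions.

import numpy as np

def combine_gains(noise_suppression_gain, final_wind_noise_gain):
    # Step 207: for each frequency bin of the current frame, keep the smaller of
    # the conventional noise suppression gain and the final predicted wind noise
    # attenuation gain, so the stronger attenuation wins.
    return np.minimum(noise_suppression_gain, final_wind_noise_gain)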
To sum up, through the foregoing steps 201-205, the wind noise suppressed audio signal processing method of this embodiment first detects whether wind noise protection is needed; when wind noise protection is needed, the method does not perform wind noise suppression on the signal, which can greatly reduce the situation in which a useful signal is badly damaged by a mismatch of the wind noise attenuation prediction model when no wind noise exists.
The application also provides an audio signal processing device for suppressing wind noise. Fig. 3 is a block diagram of a wind noise suppressed audio signal processing apparatus according to the present embodiment. As shown in fig. 3, the wind noise suppressed audio signal processing apparatus 300 includes a first conversion module 301, a wind noise attenuation prediction module 302, an attenuation gain fusion module 303, a wind noise suppression module 304, and a second conversion module 305.
The first conversion module 301 is configured to convert an original time domain signal of the input audio into an original frequency domain signal.
The wind noise attenuation prediction module 302 is configured to predict a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network.
The attenuation gain fusion module 303 is configured to determine a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal. In an embodiment of the present application, the attenuation gain fusion module 303 may include a harmonic enhancement sub-module 3031 and a gain fusion sub-module 3032, which together perform this determination. The harmonic enhancement sub-module 3031 is configured to calculate a harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal. The gain fusion sub-module 3032 is configured to determine the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal. For the steps executed by the sub-modules 3031-3032, reference may be made to the descriptions of steps 1031-1032 in the foregoing embodiments; they are not described again here. A minimal sketch of the two computations follows.
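For illustration, a minimal numpy sketch of what the harmonic enhancement sub-module 3031 and the gain fusion sub-module 3032 compute, following the GainHarm(t,k) and GainF(t,k) expressions given in the claims; the edge handling at the lowest and highest frequency bins and the small constant added to the denominator are assumptions of this sketch.

import numpy as np

def harmonic_enhancement_gain(irm, k, radius=2):
    # GainHarm(t,k) = IRM'(t,k) / max(IRM'(t,k-2), ..., IRM'(t,k+2)) for one frame.
    # irm : predicted wind noise attenuation gains IRM'(t, :) of the current frame.
    lo, hi = max(0, k - radius), min(len(irm), k + radius + 1)
    return irm[k] / (np.max(irm[lo:hi]) + 1e-12)

def final_predicted_gain(irm, k):
    # GainF(t,k) = IRM'(t,k) * GainHarm(t,k): the fused final predicted wind noise
    # attenuation gain of the current frequency point signal of the current frame.
    return irm[k] * harmonic_enhancement_gain(irm, k)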
The wind noise suppression module 304 is configured to perform wind noise suppression on the initial frequency domain signal based on a final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal, and obtain a frequency domain signal after the wind noise suppression.
The second conversion module 305 is configured to convert the wind noise suppressed frequency domain signal into a wind noise suppressed time domain signal.
For the steps executed by the modules 301-305, reference may be made to the description of steps 101-105 in the foregoing embodiments; they are not described again here.
The application also provides another wind noise suppression audio signal processing device. Fig. 4 is a block diagram of a wind noise suppressed audio signal processing apparatus according to the present embodiment. As shown in fig. 4, the wind noise suppressed audio signal processing apparatus 400 includes a first conversion module 401, a wind noise attenuation prediction module 402, a wind noise protection detection module 403, a wind noise suppression module 404, and a wind noise protection module 405.
The first conversion module 401 is configured to convert an original time domain signal of the input audio into an original frequency domain signal.
The wind noise attenuation prediction module 402 is configured to predict a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network.
The wind noise protection detection module 403 is configured to detect whether the current frequency point signal of the current frame of the initial frequency domain signal needs to be wind noise protected according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal.
The wind noise suppression module 404 is configured to, if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, determine a final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, perform wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal to obtain a frequency domain signal after wind noise suppression, and convert the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression.
The wind noise protection module 405 is configured to, if the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection, skip wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal and convert the initial frequency domain signal into a corresponding time domain signal.
For the steps executed by the modules 401-405, reference may be made to the description of steps 201-205 in the foregoing embodiments; they are not described again here.
The application also provides a training method of the wind noise attenuation prediction model based on the artificial neural network. Fig. 5 is a flowchart illustrating the training method of the wind noise attenuation prediction model based on the artificial neural network according to the present embodiment. Referring to fig. 5, the training method of the wind noise attenuation prediction model based on the artificial neural network of the present embodiment includes steps 501-504, which may be executed by a training system of the wind noise attenuation prediction model based on the artificial neural network:
step 501, acquiring voice data and wind noise data;
step 502, mixing the voice data and the wind noise data to generate a mixed sound signal;
step 503, calculating a learning target according to the voice data and the wind noise data; and
and step 504, taking the mixed sound signal as the input of the wind noise attenuation prediction model based on the artificial neural network, taking the learning target as the expected output of the wind noise attenuation prediction model based on the artificial neural network, training the wind noise attenuation prediction model based on the artificial neural network, and obtaining the trained wind noise attenuation prediction model based on the artificial neural network.
In an embodiment of the present application, the learning target in step 503 may be calculated as follows:
IRM(t,k) = (|S(t,k)|^2 / (|S(t,k)|^2 + |N(t,k)|^2))^β
wherein IRM is the learning target, t is the frame number, k is the frequency point, S is the voice data, N is the wind noise data, and β is an exponential factor for controlling the attenuation gain; β may be an empirical value set by a user, and may take a value of 0.5 or 1 in one example.
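Under the IRM expression reconstructed above, the learning target for one speech/wind-noise pair could be computed along the following lines; the STFT framing, the Hann window and the helper names are assumptions of this sketch.

import numpy as np

def stft_mag(x, frame_len=512, hop=256):
    # Magnitude spectrogram with a Hann window; shape (num_frames, frame_len//2 + 1).
    win = np.hanning(frame_len)
    frames = [x[i:i + frame_len] * win
              for i in range(0, len(x) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=-1))

def irm_target(speech, wind_noise, beta=0.5):
    # IRM(t,k) = (|S(t,k)|^2 / (|S(t,k)|^2 + |N(t,k)|^2))^beta
    # speech, wind_noise : time domain training signals of equal length.
    # The mixture fed to the network would be speech + wind_noise.
    S = stft_mag(speech) ** 2
    N = stft_mag(wind_noise) ** 2
    return (S / (S + N + 1e-12)) ** beta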
In an embodiment of the present application, the cost function adopted by the wind noise attenuation prediction model based on the artificial neural network may be a minimum mean square error.
Through the above steps 501-504, the training method of this embodiment can train the wind noise attenuation prediction model based on the artificial neural network; when the trained model is used in wind noise suppressed audio signal processing, the suppression of wind noise details is finer, less wind noise residue remains, and the wind noise suppression effect is better.
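As an illustration of training with a gated recurrent unit network and a mean square error cost, the following PyTorch sketch shows one possible shape of such a model and a single training step on stand-in tensors. The network size, the input features and the framework choice are assumptions of this sketch and are not specified by the present application.

import torch
import torch.nn as nn

class WindNoiseGainPredictor(nn.Module):
    # Illustrative GRU-based wind noise attenuation prediction model:
    # input per frame is a magnitude spectrum, output is a gain per bin in [0, 1].
    def __init__(self, num_bins=257, hidden=128):
        super().__init__()
        self.gru = nn.GRU(num_bins, hidden, num_layers=2, batch_first=True)
        self.out = nn.Sequential(nn.Linear(hidden, num_bins), nn.Sigmoid())

    def forward(self, spec):           # spec: (batch, frames, num_bins)
        h, _ = self.gru(spec)
        return self.out(h)             # predicted gains: (batch, frames, num_bins)

model = WindNoiseGainPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()               # minimum mean square error cost

mix_spec = torch.rand(8, 100, 257)     # stand-in batch of mixture spectra
irm = torch.rand(8, 100, 257)          # stand-in learning targets (IRM)
optimizer.zero_grad()
loss = criterion(model(mix_spec), irm)
loss.backward()
optimizer.step()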
The present application further provides an audio signal processing system with wind noise suppression, comprising: a memory for storing instructions executable by the processor; and a processor for executing the instructions to implement any of the wind noise suppressed audio signal processing methods or the artificial neural network based wind noise attenuation prediction model training methods described above.
Fig. 6 is a system block diagram of the wind noise suppressed audio signal processing system or the training system of the wind noise attenuation prediction model based on the artificial neural network according to the present embodiment. System 600 may include an internal communication bus 601, a processor (Processor) 602, a Read Only Memory (ROM) 603, a Random Access Memory (RAM) 604, and a communication port 605. When implemented on a personal computer, the system 600 may also include a hard disk 607. The internal communication bus 601 may enable data communication among the components of the system 600. The processor 602 may make determinations and issue prompts. In some embodiments, the processor 602 may consist of one or more processors. The communication port 605 may enable data communication between the system 600 and the outside. In some embodiments, the system 600 may send and receive information and data from a network through the communication port 605. The system 600 may also include various forms of program storage units and data storage units, such as the hard disk 607, the Read Only Memory (ROM) 603 and the Random Access Memory (RAM) 604, capable of storing various data files for computer processing and/or communication, and possibly program instructions executed by the processor 602. The processor executes these instructions to implement the main parts of the method. The results processed by the processor are transmitted to the user device through the communication port and displayed on the user interface.
The above-mentioned wind noise suppression audio signal processing method or the training method of the wind noise attenuation prediction model based on the artificial neural network may be implemented as a computer program, stored in the hard disk 607, and loaded into the processor 602 for execution, so as to implement any one of the wind noise suppression audio signal processing methods or the training method of the wind noise attenuation prediction model based on the artificial neural network in the present application.
The present application further provides a computer readable medium having stored thereon a computer program code which, when executed by a processor, implements any of the wind noise suppressed audio signal processing methods or the artificial neural network based training method of the wind noise attenuation prediction model as described above.
The wind noise suppression audio signal processing method or the artificial neural network-based wind noise attenuation prediction model training method may be implemented as a computer program, and may be stored in a computer-readable storage medium as an article of manufacture. For example, computer-readable storage media can include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips), optical disks (e.g., Compact Disk (CD), Digital Versatile Disk (DVD)), smart cards, and flash memory devices (e.g., Electrically Erasable Programmable Read Only Memory (EEPROM), card, stick, key drive). In addition, various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, wireless channels and various other media (and/or storage media) capable of storing, containing, and/or carrying code and/or instructions and/or data.
It should be understood that the above-described embodiments are illustrative only. The embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and/or other electronic units designed to perform the functions described herein, or a combination thereof.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing disclosure is by way of example only, and is not intended to limit the present application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.
Also, this application uses specific language to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." The processor may be one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, or a combination thereof. Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media. For example, computer-readable media may include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips…), optical disks (e.g., Compact Disk (CD), Digital Versatile Disk (DVD)…), smart cards, and flash memory devices (e.g., card, stick, key drive…).
Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to require more features than are expressly recited in the claims. Indeed, the embodiments may be characterized as having less than all of the features of a single embodiment disclosed above.
Although the present application has been described with reference to the present specific embodiments, it will be recognized by those skilled in the art that the foregoing embodiments are merely illustrative of the present application and that various changes and substitutions of equivalents may be made without departing from the spirit of the application, and therefore, it is intended that all changes and modifications to the above-described embodiments that come within the spirit of the application fall within the scope of the claims of the application.

Claims (23)

1. A method of wind noise suppressed audio signal processing, comprising:
converting an original time domain signal of an input audio into an initial frequency domain signal;
predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
determining the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal;
performing wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal and obtaining a frequency domain signal after the wind noise suppression; and
and converting the frequency domain signal after the wind noise suppression into a time domain signal after the wind noise suppression.
2. The method of claim 1, wherein the conversion of the original time domain signal of the input audio into the initial frequency domain signal is performed by using a subband filtering module based on a weighted overlap-add analysis filter; and the conversion of the wind noise suppressed frequency domain signal into the wind noise suppressed time domain signal is performed by using a synthesis processing module based on a weighted overlap-add synthesis filter.
3. The method of claim 1, wherein the pre-trained artificial neural network-based wind noise attenuation prediction model is a pre-trained gated recurrent unit (GRU) neural network-based wind noise attenuation prediction model.
4. The method of claim 1, wherein said determining a final predicted wind noise attenuation gain for a current frequency bin signal of a current frame of said initial frequency domain signals according to said predicted wind noise attenuation gain for a current frequency bin signal of a current frame of said initial frequency domain signals and said predicted wind noise attenuation gain for an adjacent frequency bin signal of a current frame of said initial frequency domain signals comprises:
calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; and
and determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frame frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal.
5. The method according to claim 4, wherein said calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal is performed in the following manner:
GainHarm(t,k) = IRM′(t,k) / max(IRM′(t,k-2), IRM′(t,k-1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2))
wherein t is a frame number, k is a frequency point, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and max(IRM′(t,k-2), IRM′(t,k-1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2)) is the maximum value of the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal.
6. The method according to claim 4 or 5, wherein said determining said final predicted wind noise attenuation gain according to said harmonic enhancement gain of a current frame frequency bin signal of said initial frequency domain signal and said predicted wind noise attenuation gain of a current frame frequency bin signal of said initial frequency domain signal is performed by:
GainF(t,k)=IRM′(t,k)*GainHarm(t,k)
wherein t is a frame number, k is a frequency point, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal.
7. The method of claim 1, further comprising:
calculating noise suppression attenuation gain according to the initial frequency domain signal; and
and taking the minimum value of the noise suppression attenuation gain of the current frame frequency point signal of the initial frequency domain signal and the final predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal as a new final predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal.
8. A method of wind noise suppressed audio signal processing, comprising:
converting an original time domain signal of an input audio into an initial frequency domain signal;
predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
detecting whether the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal;
if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, determining the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal, performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal to obtain a frequency domain signal after wind noise suppression, and converting the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression; and
if the current frequency point signal of the current frame of the initial frequency domain signal needs to be wind noise protected, wind noise suppression is not carried out on the current frequency point signal of the current frame of the initial frequency domain signal, and the initial frequency domain signal is converted into a corresponding time domain signal.
9. The method of claim 8, wherein the detecting whether the current frequency point signal of the current frame of the initial frequency domain signals needs wind noise protection according to the current frequency point signal of the current frame of the initial frequency domain signals and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signals comprises:
carrying out spectrum centroid estimation on the current frequency point signal of the current frame of the initial frequency domain signal to obtain a first centroid;
performing spectrum centroid estimation according to the product of the current frame current frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame current frequency point signal of the initial frequency domain signal and obtaining a second centroid;
calculating a difference value between the first centroid and the second centroid, and judging whether the difference value is smaller than a first threshold; and
and if the difference value is smaller than the first threshold, determining that the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection.
10. The method of claim 9, wherein said detecting whether the current frequency point signal of the current frame of the initial frequency domain signal needs to be wind noise protected according to the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal further comprises:
if the difference is not smaller than the first threshold, judging whether the predicted wind noise attenuation gain amount is smaller than a second threshold, wherein the predicted wind noise attenuation gain amount is the average value of the predicted wind noise attenuation gains of all frequency point signals of the current frame of the initial frequency domain signal;
if the predicted wind noise attenuation gain amount is smaller than the second threshold, determining that the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection; and
and if the predicted wind noise attenuation gain quantity is not smaller than the second threshold, determining that the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection.
11. The method of claim 8, wherein the conversion of the original time domain signal of the input audio into the initial frequency domain signal is performed by using a subband filtering module based on a weighted overlap-add analysis filter; and the conversion of the wind noise suppressed frequency domain signal into the wind noise suppressed time domain signal is performed by using a synthesis processing module based on a weighted overlap-add synthesis filter.
12. The method of claim 8, wherein the pre-trained artificial neural network-based wind noise attenuation prediction model is a pre-trained gated recurrent unit (GRU) neural network-based wind noise attenuation prediction model.
13. The method of claim 8, wherein said determining a final predicted wind noise attenuation gain for a current frequency bin signal of a current frame of said initial frequency domain signals according to said predicted wind noise attenuation gain for a current frequency bin signal of a current frame of said initial frequency domain signals and said predicted wind noise attenuation gain for an adjacent frequency bin signal of a current frame of said initial frequency domain signals comprises:
calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal; and
and determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frame frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal.
14. The method according to claim 13, wherein said calculating the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the adjacent frequency point signal of the current frame of the initial frequency domain signal is performed in the following manner:
GainHarm(t,k) = IRM′(t,k) / max(IRM′(t,k-2), IRM′(t,k-1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2))
wherein t is a frame number, k is a frequency point, GainHarm(t,k) is the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and max(IRM′(t,k-2), IRM′(t,k-1), IRM′(t,k), IRM′(t,k+1), IRM′(t,k+2)) is the maximum value of the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gains of the adjacent frequency point signals of the current frame of the initial frequency domain signal.
15. The method according to claim 13 or 14, wherein said determining the final predicted wind noise attenuation gain according to the harmonic enhancement gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal is performed by:
GainF(t,k) = IRM′(t,k) * GainHarm(t,k)
wherein t is a frame number, k is a frequency point, GainF(t,k) is the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, IRM′(t,k) is the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal, and GainHarm(t,k) is the harmonic enhancement gain.
16. The method of claim 8, further comprising:
calculating noise suppression attenuation gain according to the initial frequency domain signal; and
and taking the minimum value of the noise suppression attenuation gain of the current frame frequency point signal of the initial frequency domain signal and the final predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal as a new final predicted wind noise attenuation gain of the current frame frequency point signal of the initial frequency domain signal.
17. A wind noise suppressed audio signal processing apparatus comprising:
the first conversion module is used for converting an original time domain signal of an input audio into an initial frequency domain signal;
the wind noise attenuation prediction module is used for predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
an attenuation gain fusion module, configured to determine a final predicted wind noise attenuation gain of a current frequency point signal of a current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of an adjacent frequency point signal of the current frame of the initial frequency domain signal;
the wind noise suppression module is used for performing wind noise suppression on the initial frequency domain signal based on the final predicted wind noise attenuation gain of each frequency point signal of each frame of the initial frequency domain signal and obtaining a frequency domain signal after the wind noise suppression; and
and the second conversion module is used for converting the frequency domain signal subjected to the wind noise suppression into a time domain signal subjected to the wind noise suppression.
18. A wind noise suppressed audio signal processing apparatus comprising:
the first conversion module is used for converting an original time domain signal of an input audio into an initial frequency domain signal;
the wind noise attenuation prediction module is used for predicting a predicted wind noise attenuation gain according to the initial frequency domain signal by using a pre-trained wind noise attenuation prediction model based on an artificial neural network;
a wind noise protection detection module, configured to detect whether a current frequency point signal of the initial frequency domain signal needs wind noise protection according to a current frequency point signal of the initial frequency domain signal and the predicted wind noise attenuation gain of the current frequency point signal of the initial frequency domain signal;
a wind noise suppression module, configured to determine a final predicted wind noise attenuation gain of a current frequency point signal of a current frame of the initial frequency domain signal according to the predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal and the predicted wind noise attenuation gain of an adjacent frequency point signal of the current frame of the initial frequency domain signal if the current frequency point signal of the current frame of the initial frequency domain signal does not need wind noise protection, perform wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal based on the final predicted wind noise attenuation gain of the current frequency point signal of the current frame of the initial frequency domain signal to obtain a frequency domain signal after wind noise suppression, and convert the frequency domain signal after wind noise suppression into a time domain signal after wind noise suppression; and
and the wind noise protection module is used for not performing wind noise suppression on the current frequency point signal of the current frame of the initial frequency domain signal and converting the initial frequency domain signal into a corresponding time domain signal if the current frequency point signal of the current frame of the initial frequency domain signal needs wind noise protection.
19. A wind noise suppressed audio signal processing system comprising:
a memory for storing instructions executable by the processor; and a processor for executing the instructions to implement the method of any one of claims 1-16.
20. A computer-readable medium having stored thereon computer program code which, when executed by a processor, implements the method of any of claims 1-16.
21. A training method of a wind noise attenuation prediction model based on an artificial neural network comprises the following steps:
acquiring voice data and wind noise data;
mixing the voice data and the wind noise data to generate a mixed sound signal;
calculating a learning target according to the voice data and the wind noise data;
and taking the mixed sound signal as the input of a wind noise attenuation prediction model based on an artificial neural network, taking the learning target as the expected output of the wind noise attenuation prediction model based on the artificial neural network, training the wind noise attenuation prediction model based on the artificial neural network, and obtaining the trained wind noise attenuation prediction model based on the artificial neural network.
22. The method of claim 21, wherein the calculating a learning objective from the speech data and the wind noise data is calculated by:
IRM(t,k) = (|S(t,k)|^2 / (|S(t,k)|^2 + |N(t,k)|^2))^β
wherein IRM is the learning target, t is the frame number, k is the frequency point, S is the voice data, N is the wind noise data, and β is the exponential factor.
23. The method of claim 21, wherein the cost function employed by the artificial neural network-based wind noise attenuation prediction model is a minimum mean square error.
CN202011141705.7A 2020-10-22 2020-10-22 Method, device, system and readable medium for processing audio signal with wind noise suppression Active CN112309417B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011141705.7A CN112309417B (en) 2020-10-22 2020-10-22 Method, device, system and readable medium for processing audio signal with wind noise suppression


Publications (2)

Publication Number Publication Date
CN112309417A true CN112309417A (en) 2021-02-02
CN112309417B CN112309417B (en) 2023-07-07

Family

ID=74326993

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011141705.7A Active CN112309417B (en) 2020-10-22 2020-10-22 Method, device, system and readable medium for processing audio signal with wind noise suppression

Country Status (1)

Country Link
CN (1) CN112309417B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976566A (en) * 2010-07-09 2011-02-16 瑞声声学科技(深圳)有限公司 Voice enhancement method and device using same
CN106303837A (en) * 2015-06-24 2017-01-04 联芯科技有限公司 The wind of dual microphone is made an uproar and is detected and suppressing method, system
CN108022595A (en) * 2016-10-28 2018-05-11 电信科学技术研究院 A kind of voice signal noise-reduction method and user terminal
CN109065067A (en) * 2018-08-16 2018-12-21 福建星网智慧科技股份有限公司 A kind of conference terminal voice de-noising method based on neural network model
CN109767782A (en) * 2018-12-28 2019-05-17 中国科学院声学研究所 A kind of sound enhancement method improving DNN model generalization performance
CN110335620A (en) * 2019-07-08 2019-10-15 广州欢聊网络科技有限公司 A kind of noise suppressing method, device and mobile terminal
CN110931035A (en) * 2019-12-09 2020-03-27 广州酷狗计算机科技有限公司 Audio processing method, device, equipment and storage medium
CN111210021A (en) * 2020-01-09 2020-05-29 腾讯科技(深圳)有限公司 Audio signal processing method, model training method and related device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700789A (en) * 2021-03-24 2021-04-23 深圳市中科蓝讯科技股份有限公司 Noise detection method, nonvolatile readable storage medium and electronic device
CN113611320A (en) * 2021-04-07 2021-11-05 珠海市杰理科技股份有限公司 Wind noise suppression method and device, audio equipment and system
CN113611320B (en) * 2021-04-07 2023-07-04 珠海市杰理科技股份有限公司 Wind noise suppression method, device, audio equipment and system
CN114176623A (en) * 2021-12-21 2022-03-15 深圳大学 Sound noise reduction method, system, noise reduction device and computer readable storage medium
CN114176623B (en) * 2021-12-21 2023-09-12 深圳大学 Sound noise reduction method, system, noise reduction device and computer readable storage medium
WO2023172609A1 (en) * 2022-03-10 2023-09-14 Dolby Laboratories Licensing Corporation Method and audio processing system for wind noise suppression

Also Published As

Publication number Publication date
CN112309417B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN112309417B (en) Method, device, system and readable medium for processing audio signal with wind noise suppression
US10504539B2 (en) Voice activity detection systems and methods
JP4861645B2 (en) Speech noise suppressor, speech noise suppression method, and noise suppression method in speech signal
US8428946B1 (en) System and method for multi-channel multi-feature speech/noise classification for noise suppression
CN111418010A (en) Multi-microphone noise reduction method and device and terminal equipment
WO2012158156A1 (en) Noise supression method and apparatus using multiple feature modeling for speech/noise likelihood
CN108696648B (en) Method, device, equipment and storage medium for processing short-time voice signal
CN110265065B (en) Method for constructing voice endpoint detection model and voice endpoint detection system
CN111292758B (en) Voice activity detection method and device and readable storage medium
JP6339896B2 (en) Noise suppression device and noise suppression method
CN110556125B (en) Feature extraction method and device based on voice signal and computer storage medium
JP2008544328A (en) Multisensory speech enhancement using clean speech prior distribution
KR20110068637A (en) Method and apparatus for removing a noise signal from input signal in a noisy environment
CN105103230B (en) Signal processing device, signal processing method, and signal processing program
CN105489226A (en) Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup
WO2022218254A1 (en) Voice signal enhancement method and apparatus, and electronic device
JP4965891B2 (en) Signal processing apparatus and method
JP6265903B2 (en) Signal noise attenuation
JP6190373B2 (en) Audio signal noise attenuation
WO2024017110A1 (en) Voice noise reduction method, model training method, apparatus, device, medium, and product
WO2024041512A1 (en) Audio noise reduction method and apparatus, and electronic device and readable storage medium
CN112289337A (en) Method and device for filtering residual noise after machine learning voice enhancement
WO2021007841A1 (en) Noise estimation method, noise estimation apparatus, speech processing chip and electronic device
CN103270772B (en) Signal handling equipment, signal processing method
JP2015155982A (en) Voice section detection device, speech recognition device, method thereof, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant