CN113611320A

CN113611320A - Wind noise suppression method and device, audio equipment and system

Info

Publication number: CN113611320A
Application number: CN202110371968.5A
Authority: CN
Inventors: 程子胜; 肖全之; 黄荣均
Original assignee: Zhuhai Jieli Technology Co Ltd
Current assignee: Zhuhai Jieli Technology Co Ltd
Priority date: 2021-04-07
Filing date: 2021-04-07
Publication date: 2021-11-05
Anticipated expiration: 2041-04-07
Also published as: CN113611320B

Abstract

The invention discloses a wind noise suppression method, a wind noise suppression device, audio equipment and a system. The method comprises the following steps: step S100, acquiring an original audio signal in the time domain; step S200, duplicating the original audio signal into a first path of audio signal and a second path of audio signal; step S300, performing a correlation operation on the current sampling point and the signals of the previous and/or subsequent sampling points in the first path of audio signal; step S400, determining peak harmonic points and non-peak harmonic points according to the second path of audio signal; step S500, performing noise estimation on the normalized audio signal separately for the peak harmonic points and the non-peak harmonic points to obtain a noise estimation spectrum; and step S600, performing noise reduction suppression on the normalized audio signal according to the noise estimation spectrum to obtain a noise-reduced audio signal. On the one hand, this avoids excessive computation; on the other hand, searching for the effective speech components is more accurate than searching for non-stationary wind noise directly. The accuracy of wind noise estimation is therefore improved while consuming little computation.

Description

Wind noise suppression method and device, audio equipment and system

Technical Field

The invention relates to the technical field of audio signal processing, in particular to a wind noise suppression method, a wind noise suppression device, audio equipment and a system.

Background

The number of mobile users is growing steadily, and ensuring high call quality for mobile devices in various noisy environments has become a key factor in the competitiveness of audio products. Noise can be roughly classified as stationary or non-stationary. Most current noise reduction algorithms target stationary noise similar to white noise, whereas wind noise is strongly non-stationary. Traditional noise estimation methods, such as tracking the spectral minimum, therefore cannot cope with the non-stationarity of wind noise. In addition, wind noise belongs to a point sound source (simple sound source) and has a passive characteristic, so the noise source cannot be accurately tracked when multiple microphones are used for beam forming, and the noise cannot be effectively estimated by such means.

With the development of neural network technology, a number of neural-network-based wind noise suppression algorithms have been proposed. However, owing to the various limitations of mobile devices, factors such as the high computational complexity of neural networks must be considered in practical applications. Among traditional noise reduction algorithms, none handles both stationary and non-stationary noise suppression, and how to handle both is a new direction for future noise reduction.

Traditional wind noise suppression algorithms fall into two branches. One branch improves a stationary noise suppression algorithm, for example by adjusting the update rate used during noise estimation so that the estimated noise tracks real-time changes. An obvious defect of this approach is that the estimated noise cannot be guaranteed to exclude sudden speech components, so a large part of the speech is lost after noise reduction. The other branch trains noise features in advance and reuses them, through deep learning, non-negative matrix factorization or similar methods.

Therefore, in the process of suppressing wind noise, on the premise of consuming a small amount of computation, improving the accuracy of wind noise estimation becomes an urgent technical problem to be solved.

Disclosure of Invention

Based on the above situation, a primary objective of the present invention is to provide a wind noise suppression method, device, audio device, and system, so as to improve the accuracy of wind noise estimation while consuming a small amount of computation in the process of suppressing wind noise.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

in a first aspect, an embodiment of the present invention discloses a wind noise suppression method, including:

step S100, acquiring an original audio signal in the time domain, wherein the original audio signal comprises at least part of a speech signal and at least part of a wind noise signal;

step S200, duplicating the original audio signal into a first path of audio signal and a second path of audio signal, wherein the first path of audio signal, the second path of audio signal and the original audio signal are the same;

step S300, performing a correlation operation on the signals of the current sampling point and the previous and/or subsequent sampling points in the first path of audio signal, and converting the signal after the correlation operation into a frequency-domain signal to obtain an audio signal that is normalized in the frequency domain, so as to highlight the speech harmonic components;

step S400, determining peak harmonic points and non-peak harmonic points in the frequency domain according to the second path of audio signal;

step S500, performing noise estimation on the normalized audio signal separately for the peak harmonic points and the non-peak harmonic points to obtain a noise estimation spectrum, wherein when the fundamental frequency point in the second path of audio signal is located at a non-peak harmonic point, the normalized audio signal corresponding to the fundamental frequency point is gained by a preset gain factor to obtain the noise estimate corresponding to the fundamental frequency point; and when the fundamental frequency point in the second path of audio signal is located at a peak harmonic point, linear interpolation is carried out over frequency points near the fundamental frequency point to obtain the noise estimate corresponding to the fundamental frequency point;

and step S600, performing noise reduction suppression on the normalized audio signal according to the noise estimation spectrum to obtain a noise-reduced audio signal.

Optionally, step S300 includes: step S310, discrete processing is carried out on the first path of audio signal;

step S320, smoothing the current discrete signal by adopting the signal of the future sampling point so as to correlate the current signal with the signal of the future sampling point; and step S330, carrying out Fourier transform on the smoothed current discrete signal to obtain a current normalized audio signal.

Optionally, in step S320, the current discrete signal is smoothed by using the following formula:

x_a(n)＝x(n+wn²)

wherein n is the number of sampling points of the audio signal, x is the current discrete signal, x_a(n) is the signal obtained after smoothing the current discrete point, and w is a warping coefficient.

Optionally, step S400 includes:

step S410, calculating the point at which each harmonic of the second path of audio signal is located; step S420, searching for a peak point within a preset point number range of each harmonic point; and step S430, taking the peak point as the peak harmonic point.

Optionally, step S430 includes: adding a frequency point position margin to the peak harmonic point; and determining the frequency points obtained after adding the frequency point position margin as peak harmonic points.

Optionally, in step S500, the preset gain factor is 1; and when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, linear interpolation is carried out over no more than 3 frequency points before and after the fundamental frequency point to obtain the noise estimate corresponding to the fundamental frequency point.

In a second aspect, an embodiment of the present invention discloses a wind noise suppression apparatus, including:

an original signal acquisition module, used for acquiring an original audio signal in the time domain, wherein the original audio signal comprises at least part of a speech signal and at least part of a wind noise signal;

the signal replication module is used for replicating the audio signals into a first path of audio signals and a second path of audio signals, wherein the first path of audio signals, the second path of audio signals and the original audio signals are the same;

the relevance operation module is used for performing relevance operation on the signals of the current sampling point and the previous and/or subsequent sampling points in the first path of audio signal, converting the signals after the relevance operation into frequency domain signals, obtaining audio signals which are normalized on a frequency domain, and highlighting the harmonic components of the voice;

the peak/non-peak harmonic point module is used for determining a peak harmonic point and a non-peak harmonic point according to the second path of audio signal in a frequency domain;

the noise estimation module is used for respectively carrying out noise estimation on the normalized audio signals based on the peak harmonic point and the non-peak harmonic point to obtain a noise estimation spectrum, wherein when the fundamental frequency point in the second path of audio signals is located at the non-peak harmonic point, the normalized audio signals corresponding to the fundamental frequency point are subjected to gain through a preset gain factor to obtain noise estimation corresponding to the fundamental frequency point; when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, carrying out linear interpolation according to the frequency point near the fundamental frequency point to obtain the noise estimation corresponding to the fundamental frequency point;

and the noise reduction suppression module is used for performing noise reduction suppression on the normalized audio signal according to the noise estimation spectrum to obtain a noise-reduced audio signal.

Optionally, the association degree operation module includes:

the discrete unit is used for carrying out discrete processing on the first path of audio signal;

the smoothing unit is used for smoothing the current discrete signal by adopting the signal of the future sampling point so as to correlate the current signal with the signal of the future sampling point;

and the Fourier transform unit is used for carrying out Fourier transform on the current discrete signal after the smoothing processing to obtain a current normalized audio signal.

Optionally, in the smoothing unit, the following formula is adopted to perform smoothing processing on the current discrete signal:

x_a(n)＝x(n+wn²)

Optionally, the peak/non-peak harmonic point module comprises:

the harmonic calculation unit is used for calculating the points of the second path of audio signals at which the harmonics are located;

the peak searching unit is used for searching a peak point in a preset point number range of each harmonic point;

and the peak harmonic point determining unit is used for taking the peak point as the peak harmonic point.

Optionally, the peak harmonic point determining unit is specifically configured to:

adding frequency point position allowance to the peak harmonic point;

and determining the frequency point with the increased frequency point position margin as a peak harmonic point.

Optionally, in the noise estimation module:

the preset gain factor is 1;

and when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, carrying out linear interpolation according to the frequency points which are not more than 3 before and after the fundamental frequency point to obtain the noise estimation corresponding to the fundamental frequency point.

In a third aspect, an embodiment of the present invention discloses an audio device, where the audio device has an audio data acquisition function, and includes:

a processor for implementing the method disclosed in the first aspect above.

In a fourth aspect, an embodiment of the present invention discloses an audio signal processing system, including: a first device and a second device;

the first equipment is used for acquiring audio data to obtain an original audio signal and sending the original audio signal to the second equipment;

the second device is configured to implement the method disclosed in the first aspect.

In a fifth aspect, an embodiment of the present invention discloses a computer-readable storage medium on which a computer program is stored, the computer program being executed to implement the method disclosed in the first aspect.

In a sixth aspect, an embodiment of the present invention discloses a chip of an audio device, which has an integrated circuit thereon, wherein the integrated circuit is designed to implement the method disclosed in the first aspect.

Advantageous Effects

According to the wind noise suppression method, the wind noise suppression device and the audio equipment disclosed by the embodiments of the invention, after an original audio signal is acquired, it is duplicated into a first path of audio signal and a second path of audio signal. A correlation operation is performed on the first path of audio signal to obtain a normalized audio signal, in which the speech harmonic components are highlighted. Peak harmonic points and non-peak harmonic points are determined according to the second path of audio signal. When the fundamental frequency point in the second path of audio signal is located at a non-peak harmonic point, the normalized audio signal corresponding to the fundamental frequency point is gained by a preset gain factor to obtain the noise estimate corresponding to the fundamental frequency point; when the fundamental frequency point is located at a peak harmonic point, linear interpolation is carried out over frequency points near the fundamental frequency point to obtain the noise estimate corresponding to the fundamental frequency point. A noise estimation spectrum of the normalized audio signal is thus obtained, and noise reduction suppression of the audio signal can be realized.

Because the correlation operation on the first path of audio signal highlights the speech, the speech signal is easier to detect during noise estimation. Estimating the noise of the normalized audio signal in different ways at peak harmonic points and non-peak harmonic points effectively reduces damage to the speech signal while yielding the noise estimation spectrum. In other words, compared with the prior art, the embodiments of the invention locate the range of the effective speech components by highlighting the speech harmonic components, and thus avoid searching for the non-stationary wind noise directly. On the one hand, this avoids excessive computation; on the other hand, searching for the effective speech components is more accurate than searching for non-stationary wind noise. The accuracy of wind noise estimation is therefore improved while consuming little computation.

Other advantages of the present invention will be described in the detailed description, and those skilled in the art will understand the technical features and technical solutions presented in the description.

Drawings

Embodiments according to the present invention will be described below with reference to the accompanying drawings. In the figure:

fig. 1 is a flow chart of a wind noise suppression method disclosed in this embodiment;

fig. 2 is a schematic signal flow diagram of the wind noise suppression disclosed in this embodiment;

fig. 3 is a flowchart of performing the correlation operation on the first path of audio signal to obtain a frequency-domain signal, as disclosed in this embodiment;

fig. 4 is a schematic diagram comparing the smoothing effects on the current discrete signal disclosed in this embodiment;

fig. 5 is a flowchart of the method for determining a peak harmonic point disclosed in this embodiment;

fig. 6 is a schematic structural diagram of a wind noise suppression device disclosed in this embodiment.

Detailed Description

In order to improve the accuracy of wind noise estimation on the premise of consuming a small amount of computation in the process of suppressing wind noise, the present embodiment discloses a wind noise suppression method, please refer to fig. 1, which is a flow of the wind noise suppression method disclosed in the present embodiment, and the wind noise suppression method includes:

Step S100, an original audio signal is acquired in the time domain. In this embodiment, the original audio signal X₀(n) comprises at least part of a speech signal and at least part of a wind noise signal. Specifically, a given original audio signal frame may contain only a pure speech signal, only a pure wind noise signal, or both a speech signal and a wind noise signal. In a specific implementation, please refer to fig. 2, which is a schematic signal flow diagram of the wind noise suppression disclosed in this embodiment: the original audio signal may be collected through a sound pickup unit, and the sound pickup unit may be built-in or external.

Step S200, the original audio signal is duplicated into a first path of audio signal and a second path of audio signal. In this embodiment, the first path of audio signal, the second path of audio signal and the original audio signal are the same. Referring to fig. 2, the noisy original audio signal X₀(n) can be divided into a plurality of time-domain data frames by a framing and windowing unit, and the noisy original audio signal X₀(n) of each frame is windowed. The original signal X(n) is then divided into two paths for parallel processing: one path is used for the correlation operation on the time-domain discrete signal, and the other path is used for calculating the fundamental frequency information, as described in detail below.
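The framing, windowing and duplication described in step S200 can be sketched as follows. This is a minimal illustration and not the patented implementation: the frame length of 512, the hop of 256, the Hann window and the helper name `frame_and_window` are all assumptions, since the text does not specify them.

```python
import numpy as np

def frame_and_window(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames and apply a Hann window.

    frame_len, hop and the Hann window are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the samples [i*hop, i*hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)[None, :]

# Stand-in for the noisy original signal X0(n): one second at 16 kHz.
x0 = np.random.default_rng(0).standard_normal(16000)
frames = frame_and_window(x0)

# Step S200: two identical copies for parallel processing.
path1 = frames.copy()  # first path: correlation operation and FFT
path2 = frames.copy()  # second path: fundamental frequency calculation
```

Copying the frames, rather than sharing one array, keeps later in-place processing on one path from altering the other.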

Step S300, performing a correlation operation on the signal of the current sampling point and the previous and/or subsequent sampling points in the first path of audio signal, and converting the signal after the correlation operation into a frequency-domain signal. Referring to fig. 2, the first path of audio signal passes through the correlation operation unit and the FFT unit, which apply the correlation operation and a Fast Fourier Transform (FFT) to obtain the audio signal X(λ, μ) normalized in the frequency domain, where λ represents the frame number and μ represents the frequency point number corresponding to each frame, so that the harmonic components of the speech can be highlighted. In a specific implementation, the correlation operation may use the signals before or after the current signal in the first path of audio signal; that is, it may be performed with a signal before the current signal or with a signal after the current signal.

In this embodiment, based on the long-term correlation of the voice signal, the correlation operation is performed on the first path of audio signal, so that there is correlation between signals at different times, and therefore, the influence of sudden change of the signal caused by the external environment is reduced, that is, the correlation of the voice signals at different times is improved.

Taking an example of performing correlation operation on future signals of a current signal, please refer to fig. 3, which is a flowchart for performing correlation operation on a first channel of audio signals to obtain frequency domain signals disclosed in this embodiment, in an alternative embodiment, step S300 includes step S310, step S320, and step S330, which are specifically as follows:

step S310, performing discrete processing on the first audio signal. Specifically, the first audio signal may be subjected to discrete processing in an existing discrete manner.

Step S320, performing smoothing processing on the current discrete signal by using the signal of the future sampling point, so as to correlate the current signal with the signal of the future sampling point. Specifically, the following formula is adopted to smooth the current discrete signal:

x_a(n)＝x(n+wn²) Formula (1)

Wherein n is the number of sampling points of the audio signal, x is the current discrete signal, x_a(n) is the signal obtained after smoothing the current discrete point, and w is a warping coefficient whose value is, for example, 0.00001 to 0.0009, preferably 0.0002.

In this embodiment, the association between the current discrete sampling point and the future time point of the discrete sampling point is smoothed, so as to reduce jitter on the spectrogram caused by abrupt change factors.
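Formula (1) can be sketched in code as follows. The sketch rests on two assumptions that the text does not state: n is treated as the sample index within one frame, and because n + w·n² is generally not an integer, the value of x at that position is obtained by linear interpolation between neighbouring samples. The function name `time_warp` is hypothetical.

```python
import numpy as np

def time_warp(x, w=0.0002):
    """Evaluate x at the warped positions n + w*n**2, as in Formula (1).

    Assumption: fractional positions are resolved by linear interpolation.
    Positions beyond the last sample are clamped to the last sample value,
    which is the default behaviour of np.interp.
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x), dtype=float)
    warped_positions = n + w * n**2
    return np.interp(warped_positions, n, x)

# On a linear ramp the warped value equals the warped position itself.
ramp = np.arange(10.0)
warped = time_warp(ramp, w=0.1)  # sample 2 is read at position 2.4
```

With w = 0 the function returns the input unchanged, which gives a quick sanity check for any alternative interpolation scheme.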

Step S330, performing a Fourier transform on the smoothed current discrete signal to obtain the current normalized audio signal X(λ, μ). Specifically, the normalized signal x_a(n) is transformed to the frequency domain using an FFT to obtain the frequency-domain signal X(λ, μ); the specific formula is as follows:

X(λ,μ)＝FFT(x_a(n),N_f) Formula (2)

Wherein λ represents the frame number, μ represents the frequency point number corresponding to each frame, and N_f is the number of FFT points.
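A minimal sketch of Formula (2) is given below. It assumes real-valued frames and therefore uses NumPy's one-sided `rfft`, which returns N_f/2 + 1 frequency points per frame; the text does not say whether a one-sided or two-sided spectrum is used, and the helper name `normalized_spectrum` is hypothetical.

```python
import numpy as np

def normalized_spectrum(warped_frames, n_fft=512):
    """Return X(lambda, mu): one row of FFT coefficients per frame.

    Assumption: a one-sided spectrum (rfft) is sufficient for real input.
    """
    frames = np.atleast_2d(np.asarray(warped_frames, dtype=float))
    return np.fft.rfft(frames, n=n_fft, axis=-1)

# A tone with exactly 8 cycles per 512-sample frame lands on frequency point 8.
n = np.arange(512)
tone = np.sin(2 * np.pi * 8 * n / 512)
X = normalized_spectrum(tone)
peak_bin = int(np.argmax(np.abs(X[0])))
```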

In this embodiment, by performing smoothing processing on the current discrete signal, the harmonic component of the noisy speech can be separated, so that the speech harmonic component in the noisy speech signal is more prominent in the frequency domain spectrum (the peak value of the frequency domain signal is obvious).

To illustrate the above effect visually, please refer to fig. 4, which is a schematic diagram comparing the smoothing effects on the current discrete signal disclosed in this embodiment. Fig. 4(a) is a waveform diagram of the original audio signal acquired in the time domain, and fig. 4(b) is the result of performing a Fourier transform on the signal of fig. 4(a); fig. 4(c) is a waveform diagram of the original audio signal after smoothing in the time domain, and fig. 4(d) is the result of performing a Fourier transform on the signal of fig. 4(c). The framed signal x(n) (shown in fig. 4(a)) is smoothed in time, that is, normalized in time (time warping), to obtain the normalized signal x_a(n) shown in fig. 4(c); the normalized signal x_a(n) is then Fourier transformed, as shown in fig. 4(d). Comparing fig. 4(d) with fig. 4(b) shows that, after time normalization, the resolution between the harmonic components of the speech signal in the frequency domain is enhanced, that is, the peaks and valleys in the frequency domain are pronounced, which improves the discrimination of the speech harmonic components in the spectrum of the noisy speech signal.

And S400, determining a peak harmonic point and a non-peak harmonic point according to the second path of audio signal in a frequency domain. See, in particular, the description below.

And S500, respectively carrying out noise estimation on the normalized audio signal based on the peak harmonic point and the non-peak harmonic point to obtain a noise estimation spectrum. In this embodiment, when the fundamental frequency point in the second channel of audio signal is located at a non-peak harmonic point, the normalized audio signal corresponding to the fundamental frequency point is gained by a preset gain factor to obtain a noise estimation corresponding to the fundamental frequency point; and when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, carrying out linear interpolation according to the frequency point near the fundamental frequency point to obtain the noise estimation corresponding to the fundamental frequency point.

In a specific implementation, the original audio signal can be gain-weighted by the gain factor G_p(λ, μ) to obtain the corresponding noise spectrum. Since the noise near a peak harmonic point is smaller than the noise at a non-peak harmonic point, in this embodiment the gain near a peak harmonic point is smaller than the gain of the signal at a non-peak harmonic point, which improves the accuracy of the noise spectrum. Referring to fig. 2, the fundamental frequency calculation unit may calculate the fundamental frequency f₀ for each audio frame in the second path of audio signal. The fundamental frequency information (fundamental frequency value) f₀ can be obtained with an existing fundamental frequency calculation algorithm, such as the autocorrelation function method or a wavelet-transform-based fundamental frequency extraction algorithm. Referring to fig. 2, after the fundamental frequency calculation unit calculates the fundamental frequency f₀, the noise estimation unit may obtain the gain factor G_p(λ, μ) based on the fundamental frequency f₀, and then gain-weight the original audio signal by the gain factor G_p(λ, μ) to obtain the corresponding noise spectrum φ_N(λ, μ). As an example:

When f₀ is not equal to 0, the audio signal contains a speech component, and further noise reduction processing is required to extract the clean speech signal. In this case, the corresponding gain factor of the binary gain does not need to be calculated; instead, linear interpolation is carried out over the frequency points near the fundamental frequency point to obtain the noise estimate corresponding to the fundamental frequency point.

When the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, the probability that the noise component is 0 within the range where the harmonic exists is small; that is, a noise component of 0 does not match practical application, which is why the noise estimate there is obtained by linear interpolation. When the fundamental frequency point is located at a non-peak harmonic point, the normalized audio signal corresponding to the fundamental frequency point can be gained by the preset gain factor to obtain the noise estimate corresponding to the fundamental frequency point.
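The text names the autocorrelation function method as one existing way to obtain the fundamental frequency f₀. A minimal sketch of that method follows. The search range of 60 to 500 Hz, the voicing threshold of 0.3 and the function name `estimate_f0_autocorr` are illustrative assumptions, and returning 0 for an unvoiced frame simply mirrors the f₀ ≠ 0 test used above.

```python
import numpy as np

def estimate_f0_autocorr(frame, fs, fmin=60.0, fmax=500.0, threshold=0.3):
    """Estimate f0 in Hz from the strongest autocorrelation peak.

    Returns 0.0 when the frame is silent or the peak is too weak
    (fmin, fmax and threshold are assumed values, not from the text).
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # Keep non-negative lags only: ac[0] is the frame energy.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0.0:
        return 0.0
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    if ac[lag] / ac[0] < threshold:
        return 0.0
    return fs / lag

fs = 16000
t = np.arange(1024) / fs
f0 = estimate_f0_autocorr(np.sin(2 * np.pi * 200.0 * t), fs)
f0_silence = estimate_f0_autocorr(np.zeros(1024), fs)
```

The resolution of this estimate is limited to whole-sample lags, which is one reason the later peak search around each harmonic point is useful.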

Step S600, performing noise reduction suppression on the normalized audio signal X(λ, μ) according to the noise estimation spectrum to obtain a noise-reduced audio signal. Referring to fig. 2, after the noise spectrum φ_N(λ, μ) is obtained, the gain suppression unit obtains a noise suppression coefficient G(λ, μ) from the noise spectrum φ_N(λ, μ); noise reduction suppression can then be performed on the normalized audio signal X(λ, μ) through the noise suppression coefficient G(λ, μ) to obtain a clean speech spectrum.
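The text does not state how the noise suppression coefficient is derived from the noise spectrum. Purely as an illustration, the sketch below uses a common power spectral subtraction rule with a gain floor; the rule itself, the floor `g_min`, and the function name `suppress` are assumptions and should not be read as the method of this embodiment.

```python
import numpy as np

def suppress(X, noise_psd, g_min=0.1, eps=1e-12):
    """Apply a per-frequency-point suppression gain to the spectrum X.

    Assumed rule: gain = max(1 - noise / |X|^2, g_min).
    g_min limits over-suppression; eps avoids division by zero.
    """
    X = np.asarray(X, dtype=complex)
    power = np.abs(X) ** 2
    gain = np.maximum(1.0 - np.asarray(noise_psd, dtype=float) / (power + eps), g_min)
    return gain * X

X_demo = np.array([2.0 + 0.0j, 1.0 + 0.0j])
noise_demo = np.array([1.0, 1.0])
clean = suppress(X_demo, noise_demo)
```

In the first frequency point the noise is a quarter of the signal power, so three quarters of the amplitude is kept; in the second the noise equals the signal power and the gain floor applies.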

Because the correlation operation on the first path of audio signal highlights the speech, the speech signal is easier to detect during noise estimation. Estimating the noise of the normalized audio signal in different ways at peak harmonic points and non-peak harmonic points effectively reduces damage to the speech signal while yielding the noise estimation spectrum. In other words, compared with the prior art, the embodiments of the invention locate the range of the effective speech components by highlighting the speech harmonic components, and thus avoid searching for the non-stationary wind noise directly. On the one hand, this avoids excessive computation; on the other hand, searching for the effective speech components is more accurate than searching for non-stationary wind noise. The accuracy of wind noise estimation is therefore improved while consuming little computation.

Referring to fig. 5, a flowchart of a method for determining a peak harmonic point disclosed in this embodiment is shown, in an alternative embodiment, in step S400, the peak harmonic point is determined as follows:

Step S410, calculating the points at which the harmonics of the second path of audio signal are located. In a specific implementation, the point of each harmonic can be calculated from the fundamental frequency point. Referring to fig. 2, in the noise estimation unit, the frequency resolution of the frequency-domain data, Δf＝f_s/N_f, can be obtained from the number of FFT points N_f and the data sampling rate f_s. Combined with the fundamental frequency f₀, the fundamental frequency point position f_N0＝round(f₀/Δf) can be obtained, where round() represents a rounding operation. Then the formula f_k0＝k·f_N0 is used to find the point where the k-th harmonic is located, where f_k0 is the point where the k-th harmonic is located.
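The arithmetic of step S410 can be checked with a short sketch. The function name `harmonic_bins` and the choice to stop at the Nyquist frequency point are assumptions; the formulas Δf = f_s/N_f, f_N0 = round(f₀/Δf) and f_k0 = k·f_N0 come from the text.

```python
import numpy as np

def harmonic_bins(f0, fs, n_fft):
    """Return the frequency points k * f_N0 for k = 1, 2, ... up to Nyquist."""
    delta_f = fs / n_fft               # frequency resolution
    f_n0 = int(round(f0 / delta_f))    # fundamental frequency point
    if f_n0 <= 0:
        return np.array([], dtype=int)  # f0 == 0: no harmonics to locate
    k_max = (n_fft // 2) // f_n0
    return f_n0 * np.arange(1, k_max + 1)

# fs = 16 kHz, N_f = 512 gives delta_f = 31.25 Hz; f0 = 200 Hz maps to
# round(6.4) = 6, so the harmonics sit at points 6, 12, 18, ...
bins = harmonic_bins(200.0, 16000, 512)
```

In this example the true third harmonic (600 Hz) lies at position 19.2, while 3·f_N0 gives 18. This is the accumulated rounding deviation that the peak search of step S420 is meant to correct.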

Step S420, searching for a peak point within a preset point number range of each harmonic point. A deviation in the estimate of f₀ would accumulate when the k-th harmonics are subsequently located. To avoid this, after k·f_N0 is calculated, a correction of the k-th harmonic is added according to the characteristics of the speech harmonics and the wind noise spectrum: after the point k·f_N0 of each harmonic is calculated from the fundamental frequency, a peak point is searched for within m points on either side of that point (m can be dynamically adjusted according to actual conditions) and taken as the final point where the k-th harmonic is located. Specifically, the peak point is searched for with the following formula:

f_k0＝max(k·f_N0-m,...,k·f_N0,...,k·f_N0+m) Formula (3)

That is, among the m points to the left and right of the harmonic point k·f_N0 calculated from the fundamental frequency f₀, the point with the maximum value is the peak point.
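The peak search of Formula (3) can be sketched as below. The sketch interprets the formula as choosing, within ±m points of each nominal harmonic point, the frequency point whose spectral magnitude is largest; that reading, the default m = 2 and the function name `refine_harmonic_peaks` are assumptions.

```python
import numpy as np

def refine_harmonic_peaks(magnitude, nominal_bins, m=2):
    """Move each nominal harmonic point to the largest magnitude within +/- m."""
    magnitude = np.asarray(magnitude, dtype=float)
    peaks = []
    for b in nominal_bins:
        lo = max(int(b) - m, 0)
        hi = min(int(b) + m, len(magnitude) - 1)
        # argmax is relative to the window start, so add lo back.
        peaks.append(lo + int(np.argmax(magnitude[lo:hi + 1])))
    return np.array(peaks, dtype=int)

# Toy spectrum whose true peaks sit one point above the nominal positions.
mag = np.zeros(64)
mag[7] = 1.0
mag[13] = 2.0
peaks = refine_harmonic_peaks(mag, [6, 12], m=2)
```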

In step S430, the peak point is set as the peak harmonic point.

In this embodiment, finding a peak point near the harmonic point calculated from the fundamental frequency f₀ and taking it as the point where the k-th harmonic is located avoids the accumulation of the f₀ estimation deviation when the k-th harmonics are subsequently located.

In order to widen the range over which speech harmonic components are counted and to reduce the probability of misjudging a signal containing speech components as a noise signal, in an alternative embodiment, step S430 includes: adding a frequency point position margin to the peak harmonic point; and determining the frequency points obtained after the margin is added as peak harmonic points. Specifically, after f_k0 is obtained from formula (3), a frequency point position margin μΔ is added to f_k0, so that f_k0-μΔ ~ f_k0+μΔ are the final peak harmonic points. In a particular embodiment, the range of μΔ is dynamically adjustable and is typically set to 1; the greater the frequency resolution, the smaller μΔ. Therefore, the harmonic set M can be constructed as follows:

M＝{[f_k0-μΔ, ..., f_k0, ..., f_k0+μΔ]}, n＝0,1,2    formula (4)

wherein each harmonic set M represents a set of signals containing speech components.
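As a non-limiting illustration, the construction of the harmonic set with a frequency point position margin can be sketched as follows. The function name `build_harmonic_set` and the representation of M as a Python set of frequency point indices are assumptions made for this sketch.

```python
def build_harmonic_set(peak_bins, margin, num_bins):
    """Widen every peak harmonic point by `margin` points on each side and
    collect the resulting frequency points (a sketch of formula (4)).

    peak_bins: peak harmonic points f_k0 found by the peak search
    margin:    frequency point position margin (typically 1 in the text)
    num_bins:  number of frequency points in the spectrum
    """
    harmonic_bins = set()
    for peak in peak_bins:
        for b in range(peak - margin, peak + margin + 1):
            if 0 <= b < num_bins:  # drop points that fall outside the spectrum
                harmonic_bins.add(b)
    return harmonic_bins


harmonic_set = build_harmonic_set([5, 11], margin=1, num_bins=12)
```

Every frequency point in the returned set is treated as a peak harmonic point; all remaining points are non-peak harmonic points.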

In a specific embodiment, when step S500 is executed, the noise estimation spectrum for the non-peak harmonic points is obtained by applying the preset gain factor G_p(λ, μ) to the normalized audio signal X(λ, μ) at those points. In a preferred embodiment, when performing step S500, the preset gain factor G_p(λ, μ) is 1.

When the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, the probability that the noise component is 0 within the range where the harmonic exists is small; that is, treating the noise component as 0 is not suitable for practical application. Thus, in particular embodiments, the noise estimate for the peak harmonic point may be determined by way of linear interpolation.

Preferably, when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, linear interpolation may be performed using no more than 3 adjacent frequency points on each side of the fundamental frequency point to obtain the noise estimate corresponding to the fundamental frequency point. Specifically, in step S500, when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, the noise estimate corresponding to the fundamental frequency point is the linear interpolation of the noise estimates at its adjacent frequency points. In a specific implementation process, the interpolation can be realized through two frequency points; that is, the noise estimate for the fundamental frequency point N is the linear interpolation of the noise estimates at the two frequency points adjacent to it.

That is, formula (5) is used to obtain the noise estimation spectrum: when the fundamental frequency point in the second path of audio signal is located at a non-peak harmonic point, the noise estimate is the normalized audio signal gained by the preset gain factor; when the fundamental frequency point in the second path of audio signal is located at a peak harmonic point, the noise estimate is the linear interpolation of the noise estimates at the adjacent frequency points.

In the present embodiment, the noise estimate of a peak harmonic point is determined by linear interpolation, so that the noise of the peak harmonic point can be effectively estimated from the noise estimates near the peak harmonic point and can therefore be effectively suppressed during noise suppression. That is, the fundamental frequency is adaptively and dynamically adjustable, and interpolation is performed in combination with the fundamental frequency value of the voice signal to obtain a complete noise spectrum, thereby greatly improving the accuracy and the real-time performance of wind noise estimation.

Preferably, the linear interpolation uses no more than 3 adjacent frequency points on each side of the fundamental frequency point to obtain the noise estimate corresponding to the fundamental frequency point, which improves the noise estimation accuracy and reduces interpolation errors caused by the non-stationarity of wind noise.
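As a non-limiting illustration, the two-branch noise estimation of step S500 can be sketched as follows. The function name `estimate_noise`, the use of magnitude values, and the choice to interpolate between the nearest non-peak harmonic point on each side are assumptions made for this sketch; the embodiment only states that nearby frequency points (no more than 3 on each side) are used.

```python
def estimate_noise(spectrum_mag, harmonic_bins, gain=1.0):
    """Sketch of the noise estimation in step S500.

    Non-peak harmonic points: noise = preset gain factor * normalized spectrum.
    Peak harmonic points: noise = linear interpolation between the nearest
    non-peak harmonic points on each side (assumed neighbour choice).
    """
    n = len(spectrum_mag)
    noise = [None] * n
    for b in range(n):
        if b not in harmonic_bins:
            noise[b] = gain * spectrum_mag[b]
    for b in range(n):
        if noise[b] is not None:
            continue
        left = b - 1
        while left >= 0 and left in harmonic_bins:
            left -= 1
        right = b + 1
        while right < n and right in harmonic_bins:
            right += 1
        if left >= 0 and right < n:
            t = (b - left) / (right - left)  # relative position between neighbours
            noise[b] = (1.0 - t) * noise[left] + t * noise[right]
        elif left >= 0:
            noise[b] = noise[left]   # no neighbour on the right: hold the left value
        elif right < n:
            noise[b] = noise[right]  # no neighbour on the left: hold the right value
        else:
            noise[b] = 0.0           # every point is a peak harmonic point
    return noise


# The speech peak at index 2 is replaced by a value interpolated from its neighbours.
noise_spectrum = estimate_noise([1.0, 2.0, 9.0, 4.0, 5.0], harmonic_bins={2})
```

With a gain factor of 1, the non-peak harmonic points are passed through unchanged, so only the points marked as containing speech are re-estimated.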

In a specific implementation process, when step S600 is executed to perform noise reduction suppression on the audio signal, the audio signal may be suppressed by a suppression coefficient to obtain a clean voice spectrum, and a time-domain discrete signal is then obtained through an inverse Fourier transform (IFFT). Specifically, the method comprises the following steps:

1. Calculating the posterior signal-to-noise ratio:

After the noise estimation spectrum is calculated, the posterior signal-to-noise ratio is obtained from the noise estimation spectrum and the normalized audio signal X(λ, μ).

2. Smoothing the posterior signal-to-noise ratio:

The smoothed posterior signal-to-noise ratio SNR_post0(λ, μ) is obtained by combining the posterior signal-to-noise ratio SNR_post0(λ-1, μ) of the previous frame, the smoothing coefficient σ, and whether the current frame contains a speech component:

SNR_post0(λ，μ)＝σ·SNR_post0(λ-1，μ)·G(λ-1，μ)+(1-σ)·SNR_post0(λ, μ) formula (6)

wherein G(λ-1, μ) is the suppression coefficient of the frame preceding the current frame, and the smoothing coefficient σ is an empirical value that can be dynamically adjusted to increase the tracking speed when noise is present; the larger the smoothing coefficient, the faster the tracking speed, but the more abrupt the result sounds.

In this embodiment, based on the foregoing experience, when the f₀ calculated for the current frame λ is 0, the smoothing coefficient σ takes 0.9, thereby increasing the tracking speed; otherwise, the smoothing coefficient σ takes 0.5, thereby reducing the auditory abruptness. That is, in the present embodiment, adjusting the smoothing coefficient σ both increases the tracking speed and improves the listening experience.
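As a non-limiting illustration, the smoothing of formula (6) together with the selection of σ can be sketched as follows. The function names and the representation of each frame as a list of per-frequency-point values are assumptions made for this sketch; the unsmoothed posterior signal-to-noise ratio of the current frame is taken as an input because its formula is not reproduced here.

```python
def smoothing_coefficient(f0):
    """Select sigma as described in the text: 0.9 when no fundamental
    frequency was found in the current frame (f0 == 0), otherwise 0.5."""
    return 0.9 if f0 == 0 else 0.5


def smooth_posterior_snr(snr_prev, gain_prev, snr_cur, f0):
    """Sketch of formula (6), applied per frequency point.

    snr_prev:  smoothed posterior SNR of the previous frame
    gain_prev: suppression coefficient G of the previous frame
    snr_cur:   unsmoothed posterior SNR of the current frame (assumed input)
    f0:        fundamental frequency estimated for the current frame
    """
    sigma = smoothing_coefficient(f0)
    return [
        sigma * sp * gp + (1.0 - sigma) * sc
        for sp, gp, sc in zip(snr_prev, gain_prev, snr_cur)
    ]


smoothed_voiced = smooth_posterior_snr([2.0], [0.5], [4.0], f0=120.0)
smoothed_unvoiced = smooth_posterior_snr([2.0], [0.5], [4.0], f0=0)
```

In this example the frame with a detected fundamental frequency uses σ = 0.5 and the frame without one uses σ = 0.9, matching the two cases described above.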

3. Calculating the suppression coefficient:

Referring to FIG. 2, from the calculated SNR_post0(λ, μ), the gain suppression unit can calculate the suppression coefficient of the current frame λ as:

where θ represents a suppression level control coefficient, which may be configured as needed.

4. Clean speech spectrum calculation:

Referring to FIG. 2, after the gain suppression unit calculates the suppression coefficient G(λ, μ) of the current frame λ, the denoising unit may suppress the normalized audio signal X(λ, μ) by the suppression coefficient G(λ, μ) to obtain a clean speech spectrum.

5. Inverse transform of the speech spectrum:

Referring to FIG. 2, after the denoising unit obtains the clean speech spectrum, the IFFT transform unit can perform an inverse Fourier transform on the clean speech spectrum, followed by windowing and synthesis, to obtain the noise-reduced time-domain discrete signal.
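As a non-limiting illustration, the suppression of the spectrum and its return to the time domain can be sketched as follows. The suppression coefficients are taken as an input, since their formula is not reproduced here. The naive DFT routines, the periodic Hann synthesis window and the overlap-add helper are assumptions made for this sketch; a practical implementation would use an FFT.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (illustration only)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def suppress_frame(spectrum, gains):
    """Multiply every frequency point by its suppression coefficient G."""
    return [g * x for g, x in zip(gains, spectrum)]


def overlap_add(frames, hop):
    """Window each time-domain frame (periodic Hann, assumed) and sum the
    frames at offsets of `hop` samples."""
    frame_len = len(frames[0])
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_len) for i in range(frame_len)]
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for idx, frame in enumerate(frames):
        for i, (w, s) in enumerate(zip(window, frame)):
            out[idx * hop + i] += w * s
    return out


# One frame: transform, suppress with unit gains (no change), transform back.
frame = [1.0, 2.0, 3.0, 4.0]
restored = idft(suppress_frame(dft(frame), [1.0] * 4))
# Three constant frames with 50% overlap sum back to a constant in the middle.
synthesised = overlap_add([[1.0] * 4] * 3, hop=2)
```

With a hop of half the frame length, the periodic Hann windows of adjacent frames sum to one, so the overlapped region is reconstructed without amplitude ripple.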

Fig. 6 is a schematic structural diagram of a wind noise suppression device disclosed in this embodiment, where the wind noise suppression device includes: an original signal acquisition module 100, a signal copying module 200, a relevance operation module 300, a peak/non-peak harmonic point module 400, a noise estimation module 500, and a noise reduction suppression module 600, wherein:

the original signal acquisition module 100 is configured to acquire an original audio signal in a time domain, where the original audio signal includes at least a part of a speech signal and at least a part of a wind noise signal;

the signal copying module 200 is configured to copy the original audio signal into a first path of audio signal and a second path of audio signal, where the first path of audio signal, the second path of audio signal and the original audio signal are the same;

the relevance operation module 300 is configured to perform a relevance operation on the signals of the current sampling point and the previous and/or subsequent sampling points in the first path of audio signal, and to convert the signal after the relevance operation into a frequency domain signal, so as to obtain an audio signal normalized on the frequency domain and highlight the voice harmonic components;

the peak/non-peak harmonic point module 400 is configured to determine a peak harmonic point and a non-peak harmonic point in the frequency domain according to the second path of audio signal;

the noise estimation module 500 is configured to perform noise estimation on the normalized audio signal based on the peak harmonic point and the non-peak harmonic point, respectively, to obtain a noise estimation spectrum, where when the fundamental frequency point in the second path of audio signal is located at the non-peak harmonic point, the normalized audio signal corresponding to the fundamental frequency point is gained by a preset gain factor to obtain the noise estimate corresponding to the fundamental frequency point; and when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, linear interpolation is performed according to the frequency points near the fundamental frequency point to obtain the noise estimate corresponding to the fundamental frequency point;

the noise reduction suppression module 600 is configured to perform noise reduction suppression on the normalized audio signal according to the noise estimation spectrum, so as to obtain a noise-reduced audio signal.

In an alternative embodiment, the relevance operation module 300 includes: a discrete unit, used for performing discrete processing on the first path of audio signal; a smoothing unit, used for smoothing the current discrete signal by using the signal of a future sampling point, so as to correlate the current signal with the signal of the future sampling point; and a Fourier transform unit, used for performing a Fourier transform on the smoothed current discrete signal to obtain the current normalized audio signal.

In an alternative embodiment, in the smoothing unit, the following formula is adopted to perform smoothing processing on the current discrete signal:

x_a(n)＝x(n+wn²)
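As a non-limiting illustration, the smoothing formula above can be sketched as follows, with n taken as the sampling point index and w as the warping coefficient. Because n + w·n² is generally not an integer, this sketch assumes linear interpolation between neighbouring samples and clamps positions to the end of the frame; neither choice is stated in the embodiment, and the function name `warp_signal` is hypothetical.

```python
def warp_signal(x, w):
    """Sketch of x_a(n) = x(n + w * n^2) for one frame of samples.

    x: current discrete signal (list of samples)
    w: warping coefficient; w > 0 reads from future sampling points
    """
    last = len(x) - 1
    out = []
    for n in range(len(x)):
        pos = n + w * n * n                      # warped read position
        pos = min(max(pos, 0.0), float(last))    # assumption: clamp to the frame
        i = int(pos)
        frac = pos - i
        nxt = min(i + 1, last)
        # assumption: linear interpolation for fractional positions
        out.append((1.0 - frac) * x[i] + frac * x[nxt])
    return out


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
warped = warp_signal(ramp, w=0.5)
```

With w = 0 the signal is returned unchanged; with w > 0 each output sample draws on a sample further ahead, which is how the current signal is associated with the signals of future sampling points.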

In an alternative embodiment, the peak/non-peak harmonic point module 400 includes: the harmonic calculation unit is used for calculating the points of the second path of audio signals at which the harmonics are located; the peak searching unit is used for searching a peak point in a preset point number range of each harmonic point; and the peak harmonic point determining unit is used for taking the peak point as the peak harmonic point.

In an alternative embodiment, the peak harmonic point determination unit is specifically configured to: adding frequency point position allowance to the peak harmonic point; and determining the frequency point with the increased frequency point position margin as a peak harmonic point.

In an alternative embodiment, in the noise estimation module 500: the preset gain factor is 1; and when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, carrying out linear interpolation according to the frequency points which are not more than 3 before and after the fundamental frequency point to obtain the noise estimation corresponding to the fundamental frequency point.

This embodiment also discloses an audio device. The audio device may be an earphone, an audio amplifier or a power amplifier, or may be an electronic device having an audio data acquisition function, such as a mobile terminal. The audio device includes: a processor for implementing the method disclosed in the above embodiment.

The embodiment also discloses an audio signal processing system, including: a first device and a second device; the first equipment is used for acquiring audio data to obtain an original audio signal and sending the original audio signal to the second equipment; the second device is used for realizing the method disclosed by the embodiment. For example, the first device is an earphone, and the second device is a mobile terminal; for another example, the first device is a microphone, and the second device is a power amplifier.

The embodiment also discloses a computer readable storage medium, on which a computer program is stored, characterized in that the computer program stored in the storage medium is used for being executed to realize the method disclosed by the above embodiment.

The embodiment also discloses a chip of the audio device, which is provided with an integrated circuit, and is characterized in that the integrated circuit is designed to realize the method disclosed by the embodiment.

It should be noted that step numbers (letter or number numbers) are used to refer to some specific method steps in the present invention only for the purpose of convenience and brevity of description, and the order of the method steps is not limited by letters or numbers in any way. It will be clear to a person skilled in the art that the order of the steps of the method in question, as determined by the technology itself, should not be unduly limited by the presence of step numbers.

It will be appreciated by those skilled in the art that the above-described preferred embodiments may be freely combined and superimposed, provided that they do not conflict.

It will be understood that the embodiments described above are illustrative only and not restrictive, and that various obvious and equivalent modifications and substitutions for details described herein may be made by those skilled in the art without departing from the basic principles of the invention.

Claims

1. A method of wind noise suppression, comprising:

step S100, acquiring an original audio signal on a time domain, wherein the original audio signal comprises at least a part of voice signal and at least a part of wind noise signal;

step S200, the original audio signals are duplicated into a first path of audio signals and a second path of audio signals, and the first path of audio signals, the second path of audio signals and the original audio signals are the same;

step S300, performing relevance calculation on the signals of the current sampling point and the previous and/or subsequent sampling points in the first path of audio signal, and converting the signals after the relevance calculation into frequency domain signals to obtain audio signals which are normalized on a frequency domain so as to highlight the harmonic components of the voice;

step S400, determining a peak harmonic point and a non-peak harmonic point according to the second path of audio signal in a frequency domain;

step S500, respectively performing noise estimation on the normalized audio signals based on the peak harmonic point and the non-peak harmonic point to obtain a noise estimation spectrum, wherein when a fundamental frequency point in the second path of audio signals is located at the non-peak harmonic point, the normalized audio signals corresponding to the fundamental frequency point are subjected to gain through a preset gain factor to obtain a noise estimation corresponding to the fundamental frequency point; when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, performing linear interpolation according to the frequency point near the fundamental frequency point to obtain the noise estimation corresponding to the fundamental frequency point;

and S600, carrying out noise reduction suppression on the normalized audio signal according to the noise estimation spectrum to obtain a noise-reduced audio signal.

2. The wind noise suppression method according to claim 1, wherein the step S300 includes:

step S310, performing discrete processing on the first path of audio signal;

step S320, smoothing the current discrete signal by adopting the signal of the future sampling point so as to correlate the current signal with the signal of the future sampling point;

and step S330, carrying out Fourier transform on the smoothed current discrete signal to obtain a current normalized audio signal.

3. The method for suppressing wind noise according to claim 2, wherein in step S320, the current discrete signal is smoothed by using the following formula:

x_a(n)＝x(n+wn²)

4. A wind noise suppression method according to any one of claims 1-3, wherein said step S400 comprises:

step S410, calculating the points at which the harmonics of the second path of audio signal are located;

step S420, searching for a peak point within a preset point number range of each harmonic point;

and step S430, taking the peak point as the peak harmonic point.

5. The wind noise suppression method of claim 4, wherein the step S430 comprises:

adding frequency point position allowance to the peak harmonic point;

and determining the frequency point with the increased frequency point position margin as the peak harmonic point.

6. The wind noise suppression method according to claim 4, wherein in the step S500, the preset gain factor is 1;

7. A wind noise suppression device, comprising:

a raw signal acquisition module (100) for acquiring a raw audio signal in the time domain, the raw audio signal comprising at least part of a speech signal and at least part of a wind noise signal;

a signal copying module (200) for copying the audio signal into a first channel of audio signal and a second channel of audio signal, wherein the first channel of audio signal, the second channel of audio signal and the original audio signal are the same;

the relevance operation module (300) is used for performing relevance operation on the signals of the current sampling point and the previous and/or subsequent sampling points in the first path of audio signal, converting the signals after the relevance operation into frequency domain signals, obtaining audio signals which are normalized on a frequency domain, and highlighting the harmonic components of the voice;

a peak/off-peak harmonic point module (400) for determining a peak harmonic point and an off-peak harmonic point in the frequency domain according to the second audio signal;

a noise estimation module (500) configured to perform noise estimation on the normalized audio signal based on the peak harmonic point and the non-peak harmonic point, respectively, to obtain a noise estimation spectrum, where when a fundamental frequency point in the second channel of audio signal is located at the non-peak harmonic point, a preset gain factor is used to gain the normalized audio signal corresponding to the fundamental frequency point, so as to obtain a noise estimation corresponding to the fundamental frequency point; when the fundamental frequency point in the second path of audio signal is located at the peak harmonic point, performing linear interpolation according to the frequency point near the fundamental frequency point to obtain the noise estimation corresponding to the fundamental frequency point;

and the noise reduction suppression module (600) is used for performing noise reduction suppression on the normalized audio signal according to the noise estimation spectrum to obtain a noise-reduced audio signal.

8. A wind noise suppression device according to claim 7, wherein the correlation operation module (300) comprises:

9. The wind noise suppression device according to claim 8, wherein the smoothing unit smoothes the current discrete signal by using the following formula:

x_a(n)＝x(n+wn²)

wherein n is the sampling point number of the audio signal, x is the current discrete signal, x_a(n) is the signal obtained after the current discrete point is smoothed, and w is a warping coefficient.

10. A wind noise suppression device according to any of claims 7-9, wherein the peak/non-peak harmonic point module (400) comprises:

a peak harmonic point determination unit configured to take the peak point as the peak harmonic point.

11. The wind noise suppression device according to claim 10, wherein the peak harmonic point determination unit is specifically configured to:

adding frequency point position allowance to the peak harmonic point;

12. A wind noise suppression device according to claim 10, wherein in the noise estimation module (500):

the preset gain factor is 1;

13. An audio device having an audio data acquisition function, comprising:

a processor for implementing the method of any one of claims 1-6.

14. An audio signal processing system, comprising: a first device and a second device;

the second device is for implementing the method of any one of claims 1-6.

15. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program stored in the storage medium is adapted to be executed to implement the method according to any of claims 1-6.

16. A chip of an audio device having an integrated circuit thereon, characterized in that the integrated circuit is designed for implementing the method as claimed in any one of claims 1 to 6.