CN117765910A

CN117765910A - Single-channel noise reduction method and device

Info

Publication number: CN117765910A
Application number: CN202311488278.3A
Authority: CN
Inventors: 涂晴莹; 董斐
Original assignee: Spreadtrum Communications Shanghai Co Ltd
Current assignee: Spreadtrum Communications Shanghai Co Ltd
Priority date: 2023-11-08
Filing date: 2023-11-08
Publication date: 2024-03-26

Abstract

The application discloses a single-channel noise reduction method and device. The method comprises: receiving a noisy speech signal; determining the a priori signal-to-noise ratio of the noisy speech signal and the power spectrum of the clean speech signal; performing harmonic enhancement on the clean speech power spectrum to obtain an enhanced clean speech signal power spectrum; determining a signal-to-noise ratio from the enhanced clean speech signal power spectrum to obtain a signal-to-noise ratio estimate; determining a final signal-to-noise ratio from the a priori signal-to-noise ratio and the signal-to-noise ratio estimate; calculating a gain from the final signal-to-noise ratio; performing noise reduction on the noisy speech signal using the gain; and outputting the noise-reduced speech signal. The method and device can recover and enhance speech harmonics damaged by noise suppression and improve the intelligibility of conversational speech in noisy environments.

Description

Single-channel noise reduction method and device

Technical Field

The application relates to the technical field of voice signal processing, in particular to a single-channel noise reduction method and device.

Background

The radio frequency power amplifier (RF power amplifier for short) is an important component of the radio frequency front end and enables the terminal to obtain higher radio frequency output power. The efficiency of the RF power amplifier is inversely proportional to its supply voltage: the higher the supply voltage, the lower the efficiency. At the same time, the RF power amplifier requires a minimum supply voltage in order to deliver the required output power. Therefore, to improve the efficiency of the RF power amplifier, its supply voltage needs to be dynamically controlled.

When the signal-to-noise ratio is low, the higher harmonics of speech can be submerged in noise, and common speech enhancement algorithms such as spectral subtraction and Wiener filtering often damage these harmonics while suppressing noise, which reduces speech intelligibility. Recovering speech harmonics is therefore important for improving the quality of conversational speech.

A common scheme for recovering or enhancing speech harmonics is to perform pitch detection on the speech signal and then either reduce the noise estimate of the noisy speech spectrum at multiples of the pitch frequency or raise the a priori signal-to-noise ratio at those multiples, thereby protecting the pitch multiples, i.e. the harmonic components. To this end, the industry has proposed a harmonic enhancement method based on a cosine harmonic model in the frequency domain. This scheme mitigates the problem that pitch detection errors accumulate as the harmonic order increases, and uses the least squares method to estimate the harmonic model parameters and obtain a frequency domain comb filter for recovering speech harmonics. Its drawbacks are that the model has many parameters, that the parameter estimation is not a simple linear estimation, and that the performance of the algorithm depends heavily on the model parameters. The prior art also provides a nonlinear harmonic recovery method that uses an improved a priori signal-to-noise ratio estimator: a nonlinear function is applied to the filtered time domain signal, and the filtered signal and the nonlinearly processed signal are mixed with weights to re-estimate the a priori signal-to-noise ratio and recover the damaged harmonics in the spectrum.

Disclosure of Invention

The embodiments of the application provide a single-channel noise reduction method and device, which can recover and enhance speech harmonics damaged by noise suppression and improve the intelligibility of conversational speech in noisy environments.

In one aspect, the present application provides a single channel noise reduction method, the method comprising:

receiving a noisy speech signal y(n);

determining an a priori signal-to-noise ratio of the noisy speech signal and a clean speech signal power spectrum λ_s(l,k);

performing harmonic enhancement on the clean speech power spectrum λ_s(l,k) to obtain an enhanced clean speech signal power spectrum λ'_s(l,k);

determining a signal-to-noise ratio from the enhanced clean speech signal power spectrum λ'_s(l,k) to obtain a signal-to-noise ratio estimate, and determining a new a priori signal-to-noise ratio from an a priori signal-to-noise ratio minimum estimation threshold and the signal-to-noise ratio estimate;

calculating a gain G_final(l,k) from the new a priori signal-to-noise ratio;

performing noise reduction on the noisy speech signal y(n) using the gain G_final(l,k);

and outputting the noise-reduced speech signal s(n).

Optionally, the method further comprises:

converting the noisy speech time domain signal y(n) into a frequency domain signal Y(l,k);

performing single-channel noise estimation on the frequency domain signal Y(l,k) to obtain a noise power spectrum λ_n(l,k);

wherein determining the a priori signal-to-noise ratio of the noisy speech signal and the clean speech signal power spectrum λ_s(l,k) comprises:

determining the a priori signal-to-noise ratio from the noise power spectrum λ_n(l,k) and the power spectrum |Y(l,k)|² of the frequency domain signal Y(l,k);

multiplying the a priori signal-to-noise ratio by the noise power spectrum λ_n(l,k) to obtain the clean speech power spectrum λ_s(l,k) of the noisy speech time domain signal y(n).

Optionally, performing single-channel noise estimation on the frequency domain signal Y(l,k) comprises performing the estimation with any one of the following methods: minimum statistics, minimum tracking, or recursive averaging.

Optionally, determining the a priori signal-to-noise ratio comprises determining the a priori signal-to-noise ratio using the decision-directed method.

Optionally, performing harmonic enhancement on the clean speech power spectrum λ_s(l,k) to obtain the enhanced clean speech signal power spectrum λ'_s(l,k) comprises:

obtaining a cepstral domain signal from the clean speech power spectrum λ_s(l,k);

performing harmonic enhancement on the cepstral domain signal to obtain the enhanced clean speech signal power spectrum λ'_s(l,k).

Optionally, performing harmonic enhancement on the cepstral domain signal to obtain the enhanced clean speech signal power spectrum λ'_s(l,k) comprises:

performing pitch detection on the cepstral domain signal and determining the pitch frequency;

performing harmonic enhancement on the cepstral domain signal according to the cepstral energy corresponding to the pitch frequency;

transforming the harmonic-enhanced cepstral domain signal to the frequency domain to obtain the enhanced clean speech signal power spectrum λ'_s(l,k).

Optionally, performing harmonic enhancement on the cepstral domain signal according to the cepstral energy corresponding to the pitch frequency comprises: if the cepstral energy corresponding to the pitch frequency is greater than a set threshold thr, increasing the energy within a certain range centered on the quefrency bin corresponding to the pitch frequency.

Optionally, performing harmonic enhancement on the cepstral domain signal according to the cepstral energy corresponding to the pitch frequency further comprises:

if the cepstral energy corresponding to the pitch frequency is greater than the set threshold thr, applying weak smoothing within a certain range centered on the quefrency bin corresponding to the pitch frequency, and applying strong smoothing to the remaining quefrency bins outside the envelope;

otherwise, applying smoothing to all quefrency bins.

Optionally, determining the signal-to-noise ratio from the enhanced clean speech signal power spectrum λ'_s(l,k) to obtain the signal-to-noise ratio estimate comprises: calculating the signal-to-noise ratio from the enhanced clean speech signal power spectrum λ'_s(l,k) and the noise power spectrum λ_n(l,k) to obtain the signal-to-noise ratio estimate.

Optionally, determining the new a priori signal-to-noise ratio from the a priori signal-to-noise ratio minimum estimation threshold and the signal-to-noise ratio estimate comprises: selecting the larger of the signal-to-noise ratio estimate and the a priori signal-to-noise ratio minimum estimation threshold as the new a priori signal-to-noise ratio.

Optionally, performing noise reduction on the noisy speech signal y(n) using the gain G_final(l,k) comprises: multiplying the gain G_final(l,k) by the frequency domain signal Y(l,k) to obtain an enhanced frequency domain signal S(l,k).

Optionally, outputting the noise-reduced speech signal s(n) comprises:

converting the enhanced frequency domain signal S(l,k) into a time domain signal s(l,n);

and outputting the noise-reduced speech signal s(n) after frame synthesis of the time domain signal s(l,n).

In another aspect, an embodiment of the present application further provides a single-channel noise reduction device, where the device includes:

a receiving module, configured to receive the noisy speech signal y(n);

a signal-to-noise ratio calculation module, configured to determine an a priori signal-to-noise ratio of the noisy speech signal;

a power spectrum determining module, configured to determine a clean speech signal power spectrum λ_s(l,k) of the noisy speech signal;

a harmonic enhancement module, configured to perform harmonic enhancement on the clean speech power spectrum λ_s(l,k) to obtain an enhanced clean speech signal power spectrum λ'_s(l,k);

a signal-to-noise ratio updating module, configured to determine a signal-to-noise ratio from the enhanced clean speech signal power spectrum λ'_s(l,k) to obtain a signal-to-noise ratio estimate, and to determine a new a priori signal-to-noise ratio from the a priori signal-to-noise ratio minimum estimation threshold and the signal-to-noise ratio estimate;

a gain calculation module, configured to calculate a gain G_final(l,k) from the new a priori signal-to-noise ratio;

a noise reduction processing module, configured to perform noise reduction on the noisy speech signal y(n) using the gain G_final(l,k);

and an output module, configured to output the noise-reduced speech signal s(n).

Optionally, the apparatus further comprises:

a first signal conversion module, configured to convert the noisy speech time domain signal y(n) into a frequency domain signal Y(l,k);

a noise estimation module, configured to perform single-channel noise estimation on the frequency domain signal Y(l,k) to obtain a noise power spectrum λ_n(l,k);

wherein the signal-to-noise ratio calculation module determines the a priori signal-to-noise ratio from the noise power spectrum λ_n(l,k) and the power spectrum |Y(l,k)|² of the frequency domain signal Y(l,k);

and the power spectrum determining module multiplies the a priori signal-to-noise ratio by the noise power spectrum λ_n(l,k) to obtain the clean speech power spectrum λ_s(l,k) of the noisy speech time domain signal y(n).

Optionally, the harmonic enhancement module includes:

a cepstral domain signal generation unit, configured to obtain a cepstral domain signal from the clean speech power spectrum λ_s(l,k);

a harmonic enhancement unit, configured to perform harmonic enhancement on the cepstral domain signal to obtain the enhanced clean speech signal power spectrum λ'_s(l,k).

Optionally, the harmonic enhancement unit includes:

a pitch detection subunit, configured to perform pitch detection on the cepstral domain signal and determine the pitch frequency;

a harmonic enhancement subunit, configured to perform harmonic enhancement on the cepstral domain signal according to the cepstral energy corresponding to the pitch frequency;

a signal conversion subunit, configured to transform the harmonic-enhanced cepstral domain signal to the frequency domain to obtain the enhanced clean speech signal power spectrum λ'_s(l,k).

Optionally, the output module includes:

a second signal conversion module, configured to convert the enhanced frequency domain signal S(l,k) into a time domain signal s(l,n);

and a frame synthesis module, configured to output the noise-reduced speech signal s(n) after frame synthesis of the time domain signal s(l,n).

In another aspect, embodiments of the present application also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the single channel noise reduction method described above.

In the single-channel noise reduction method and device, the noisy speech signal is pre-filtered, and the noise power spectrum and the a priori signal-to-noise ratio estimated by the pre-filtering are combined to obtain the power spectrum of the clean speech signal. Harmonic enhancement then yields the enhanced clean speech signal power spectrum, and the signal-to-noise ratio is estimated again to raise the a priori signal-to-noise ratio at the harmonics. The gain is calculated from the newly determined a priori signal-to-noise ratio and applied as a filter to obtain the final enhanced speech signal, so that speech harmonics damaged by noise suppression are recovered and enhanced and a high-quality speech signal is obtained.

Further, after the power spectrum of the clean speech signal is converted to the cepstral domain, pitch detection is performed, the harmonic components are raised by a harmonic enhancement factor α (a coefficient greater than 1), and the other parts are smoothed to remove musical noise. The result is then transformed back to the frequency domain, the a priori signal-to-noise ratio is recalculated, and the gain is calculated from the new a priori signal-to-noise ratio and applied to obtain the final enhanced speech signal. Musical noise is thereby effectively removed, and the intelligibility and subjective listening quality of conversational speech in noisy environments are improved.

Drawings

FIG. 1 is a flowchart of a single channel noise reduction method provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of a specific process of a single channel noise reduction method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a process for harmonic enhancement of cepstral domain signals in an embodiment of the present application;

FIG. 4 is a diagram of voiced sound amplitude spectra and cepstrum in an embodiment of the present application;

FIG. 5 is a schematic diagram of the effect of harmonic enhancement factors on the magnitude spectrum of FIG. 4;

FIG. 6 is a comparison diagram of speech enhanced by different methods;

FIG. 7 is a schematic structural diagram of a single channel noise reducer according to an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of another configuration of a single channel noise reducer according to an embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of a harmonic enhancement module in an embodiment of the present application.

Detailed Description

In order to make the above objects, features and advantages of the present application more comprehensible, embodiments accompanied with figures are described in detail below.

Classical speech enhancement algorithms based on short-time spectral estimation often damage speech harmonics while effectively suppressing noise, especially higher harmonics that have a low signal-to-noise ratio and are submerged in noise, which reduces speech intelligibility. The embodiments of the application therefore provide a single-channel noise reduction method and system with speech harmonic enhancement. The noisy speech signal is pre-filtered, the noise power spectrum and the a priori signal-to-noise ratio estimated by the pre-filtering are combined to obtain the power spectrum of the clean speech signal, harmonic enhancement yields the enhanced clean speech signal power spectrum, the signal-to-noise ratio is estimated again to raise the a priori signal-to-noise ratio at the harmonics, and the gain is then calculated from the finally determined signal-to-noise ratio and applied as a filter to obtain the final enhanced speech signal.

As shown in fig. 1, a flowchart of a single-channel noise reduction method according to an embodiment of the present application includes the following steps:

Step 101, a noisy speech signal y(n) is received.

The noisy speech signal y(n) is a time domain signal, which needs to be converted into a frequency domain signal Y(l,k) for the subsequent signal-to-noise ratio estimation. Single-channel noise estimation is performed on the frequency domain signal Y(l,k) to obtain a noise power spectrum λ_n(l,k).

Specifically, the noisy speech time domain signal y(n) is framed and windowed to obtain a signal y(l,n), and an N-point DFT (Discrete Fourier Transform) is applied to this signal to obtain the frequency domain signal Y(l,k), where l is the frame index and k is the frequency bin index.
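The framing, windowing, and N-point DFT described above can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the Hann window, the 50% frame overlap, and the function name `stft_frames` are assumptions chosen for the example; only N = 512 and fs = 16000 Hz appear elsewhere in this description.

```python
import numpy as np

def stft_frames(y, n_fft=512, hop=256):
    """Frame and window y(n) into y(l, n), then return Y(l, k) via an N-point DFT."""
    window = np.hanning(n_fft)  # assumed window type; the text does not name one
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[l * hop : l * hop + n_fft] * window for l in range(n_frames)]
    )
    # rfft keeps the N/2 + 1 non-redundant bins of a real signal's DFT.
    return np.fft.rfft(frames, n=n_fft, axis=1)

# Example: a 1 kHz tone sampled at 16 kHz falls in bin 1000 / (16000 / 512) = 32.
fs = 16000
t = np.arange(fs) / fs
Y = stft_frames(np.sin(2 * np.pi * 1000 * t))
```

The power spectrum |Y(l,k)|² used in the following steps is then `np.abs(Y) ** 2`.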

In a specific implementation, the single-channel noise estimation of the frequency domain signal Y(l,k) may use any of several methods, for example minimum statistics, minimum tracking, or recursive averaging. The detailed calculations can be found in the existing related art and are not described here.
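As one illustration of the recursive averaging family mentioned above, the sketch below updates the noise estimate only in bins whose power stays close to the current estimate. It is a deliberately simplified stand-in, not the estimator used by the application: the update rule, the factor 2.0, the smoothing constant `alpha_n`, and the assumption that the first few frames contain only noise are all choices made for this example.

```python
import numpy as np

def recursive_average_noise(power, alpha_n=0.9, init_frames=5):
    """Estimate lambda_n(l, k) from |Y(l, k)|^2 given as an array (frames, bins)."""
    noise = np.empty_like(power)
    # Assumption: the first frames are speech-free and give the initial estimate.
    est = power[:init_frames].mean(axis=0)
    for l in range(power.shape[0]):
        # Crude speech-absence decision: update only bins near the noise level.
        quiet = power[l] < 2.0 * est
        est = np.where(quiet, alpha_n * est + (1 - alpha_n) * power[l], est)
        noise[l] = est
    return noise
```

A practical system would normally replace the fixed threshold with a speech presence probability or a minimum-tracking rule.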

Step 102, determining an a priori signal-to-noise ratio of the noisy speech signal and a clean speech signal power spectrum λ_s(l,k).

In the embodiments of the application, the noise power spectrum λ_n(l,k) is used to determine the a priori signal-to-noise ratio and the clean speech signal power spectrum λ_s(l,k). Specifically, the a priori signal-to-noise ratio is determined from the noise power spectrum λ_n(l,k) and the power spectrum |Y(l,k)|² of the frequency domain signal Y(l,k); the a priori signal-to-noise ratio is then multiplied by the noise power spectrum λ_n(l,k) to obtain the clean speech power spectrum λ_s(l,k) of the noisy speech time domain signal y(n).

The a priori signal-to-noise ratio may be determined with the decision-directed approach, in which the new a priori signal-to-noise ratio is a weighted average of the historical a priori signal-to-noise ratio and the current a priori signal-to-noise ratio estimate. The quantities involved are as follows.

α_dd is the smoothing coefficient, with a value between 0 and 1. Optionally, α_dd may be a fixed value or an adaptive parameter; a typical fixed value is 0.98.

γ(l,k) is the a posteriori signal-to-noise ratio, and G(l,k) is the noise reduction gain, with a value between 0 and 1, computed with the usual Wiener gain.
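For illustration, a commonly used form of the decision-directed estimator that is consistent with the symbols above is ξ(l,k) = α_dd·G²(l−1,k)·γ(l−1,k) + (1−α_dd)·max(γ(l,k)−1, 0), with γ(l,k) = |Y(l,k)|²/λ_n(l,k) and G = ξ/(1+ξ). The exact expression used by the application may differ; the symbol ξ, the floor `xi_floor`, and the zero initial history are assumptions of this sketch.

```python
import numpy as np

def decision_directed(power, noise, alpha_dd=0.98, xi_floor=1e-3):
    """Return the a priori SNR xi(l, k) and Wiener gain G(l, k), shape (frames, bins)."""
    xi = np.empty_like(power)
    gain = np.empty_like(power)
    # History term G(l-1, k)^2 * gamma(l-1, k); assumed zero before the first frame.
    prev = np.zeros(power.shape[1])
    for l in range(power.shape[0]):
        gamma = power[l] / noise[l]              # a posteriori SNR
        inst = np.maximum(gamma - 1.0, 0.0)      # current a priori SNR estimate
        xi[l] = np.maximum(alpha_dd * prev + (1 - alpha_dd) * inst, xi_floor)
        gain[l] = xi[l] / (1.0 + xi[l])          # Wiener gain, always in [0, 1)
        prev = gain[l] ** 2 * gamma
    return xi, gain
```

With `xi` in hand, the clean speech power spectrum of this step is the element-wise product `xi * noise`.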

The clean speech power spectrum λ_s(l,k) is calculated as the product of the a priori signal-to-noise ratio and the noise power spectrum λ_n(l,k).

Step 103, performing harmonic enhancement on the clean speech power spectrum λ_s(l,k) to obtain an enhanced clean speech signal power spectrum λ'_s(l,k).

First, a cepstral domain signal is obtained from the clean speech power spectrum λ_s(l,k). Specifically, the logarithm of the speech power spectrum is taken and an IDFT is applied to obtain the cepstral domain power spectrum, which is indexed by the frame l and the bin m,

where m is the cepstral frequency (quefrency) bin index.
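The log-then-IDFT step can be sketched as below for a single frame. The function name, the `eps` guard against log(0), and the choice to keep only the first N/2+1 quefrency bins are assumptions of this example.

```python
import numpy as np

def power_spectrum_to_cepstrum(lambda_s, eps=1e-12):
    """Map a one-sided power spectrum (N/2 + 1 bins) to N/2 + 1 cepstral bins."""
    log_spec = np.log(np.maximum(lambda_s, eps))  # eps avoids log(0)
    n_fft = 2 * (len(lambda_s) - 1)
    # The log-spectrum is real and even, so irfft returns a real, even cepstrum;
    # the first N/2 + 1 bins carry all of the information.
    return np.fft.irfft(log_spec, n=n_fft)[: n_fft // 2 + 1]
```

For fs = 16000 Hz and N = 512, a log-spectrum that ripples with a 200 Hz harmonic spacing produces a cepstral peak at m = fs / 200 = 80.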

Then, harmonic enhancement is performed on the cepstral domain signal to obtain the enhanced clean speech signal power spectrum λ'_s(l,k).

Specifically, the pitch frequency can be determined by pitch detection, and harmonic enhancement is performed on the cepstral domain signal according to the cepstral energy corresponding to the pitch frequency; the harmonic-enhanced cepstral domain signal is then transformed to the frequency domain to obtain the enhanced clean speech signal power spectrum λ'_s(l,k).

In addition, during harmonic enhancement, the energy within a certain range around the quefrency bin corresponding to the pitch frequency can be increased to strengthen the speech signal. Furthermore, different smoothing can be applied to different quefrency bins according to the cepstral energy corresponding to the pitch frequency, which removes musical noise and improves the intelligibility and subjective listening quality of conversational speech in noisy environments.

The specific harmonic enhancement process will be described in detail later.

Step 104, determining a signal-to-noise ratio from the enhanced clean speech signal power spectrum λ'_s(l,k) to obtain a signal-to-noise ratio estimate, and determining a new a priori signal-to-noise ratio from the a priori signal-to-noise ratio minimum estimation threshold and the signal-to-noise ratio estimate.

Combined with the noise power spectrum λ_n(l,k) output by the noise estimation, the signal-to-noise ratio estimate is calculated from the enhanced clean speech signal power spectrum λ'_s(l,k) and λ_n(l,k).

The new a priori signal-to-noise ratio is then determined from the a priori signal-to-noise ratio minimum estimation threshold and the signal-to-noise ratio estimate, as the larger of the two values,

where ξ_min is the a priori signal-to-noise ratio minimum estimation threshold, which may be determined from empirical values.
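A minimal sketch of this update follows, assuming the signal-to-noise ratio estimate is the ratio λ'_s(l,k)/λ_n(l,k) and taking the element-wise maximum with ξ_min. The default of −15 dB for `xi_min` is only an illustrative value, not one given in the application.

```python
import numpy as np

def updated_prior_snr(lambda_s_enh, lambda_n, xi_min=10 ** (-15 / 10)):
    """Re-estimate the SNR from the enhanced spectrum and floor it at xi_min."""
    snr_est = lambda_s_enh / lambda_n   # assumed form of the SNR estimate
    return np.maximum(snr_est, xi_min)  # new a priori SNR
```

The floor keeps bins with almost no estimated speech from being attenuated without limit in the gain step.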

Step 105, calculating a gain G_final(l,k) from the new a priori signal-to-noise ratio.

Various gain calculation methods may be used, for example the Wiener gain, MMSE (Minimum Mean Square Error), MMSE-LSA (Minimum Mean-Square Error Log-Spectral Amplitude), or a maximum a posteriori estimator; the embodiments of the application are not limited in this respect.

For example, the Wiener gain may be used, in which case the gain is computed directly from the new a priori signal-to-noise ratio.
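The standard Wiener gain as a function of an a priori signal-to-noise ratio ξ is G = ξ / (1 + ξ). The sketch below applies it to the new a priori signal-to-noise ratio; the optional gain floor `g_min` is an addition for this example and is not part of the description.

```python
import numpy as np

def wiener_gain(xi_new, g_min=0.0):
    """Wiener gain G_final = xi / (1 + xi), optionally floored at g_min."""
    gain = xi_new / (1.0 + xi_new)
    return np.maximum(gain, g_min)
```

For example, a priori signal-to-noise ratios of 0, 1, and 9 give gains of 0, 0.5, and 0.9.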

Step 106, performing noise reduction on the noisy speech signal y(n) using the gain G_final(l,k).

Specifically, the gain G_final(l,k) is multiplied by the frequency domain signal Y(l,k) to obtain an enhanced frequency domain signal S(l,k).

Step 107, outputting the noise-reduced speech signal s(n).

Specifically, the enhanced frequency domain signal S(l,k) is converted into a time domain signal s(l,n) by IDFT (Inverse Discrete Fourier Transform), and the time domain signal s(l,n) is then frame-synthesized to output the noise-reduced speech signal s(n).
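Steps 106 and 107 can be sketched together as gain multiplication, per-frame IDFT, and overlap-add frame synthesis. The sketch assumes a 50% overlap, no synthesis window, and an analysis window whose shifted copies sum to one (for example a periodic Hann window); these details are not specified in the description.

```python
import numpy as np

def apply_gain_and_synthesize(Y, gain, n_fft=512, hop=256):
    """Return s(n) from Y(l, k) and G_final(l, k), both shaped (frames, N/2 + 1)."""
    S = gain * Y                                # S(l, k) = G_final(l, k) * Y(l, k)
    frames = np.fft.irfft(S, n=n_fft, axis=1)   # s(l, n), one row per frame
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for l, frame in enumerate(frames):
        out[l * hop : l * hop + n_fft] += frame  # overlap-add frame synthesis
    return out
```

With a unit gain and such a window, the interior of the output reproduces the input signal, which is a convenient sanity check before applying a real gain.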

Fig. 2 is a schematic diagram illustrating a specific process of the single-channel noise reduction method according to the embodiment of the present application, and the process is described in detail below with reference to fig. 2.

Referring to FIG. 2, the noisy speech time domain signal 1 is framed and windowed to obtain signal 2; an N-point DFT is applied to signal 2 to obtain the frequency domain signal 3.

Single-channel noise estimation is performed on the frequency domain signal 3 to obtain the noise power spectrum 4;

the noise power spectrum 4 is combined with the noisy speech signal power spectrum |Y(l,k)|², and the decision-directed method is used to obtain the a priori signal-to-noise ratio 5;

the preliminarily estimated a priori signal-to-noise ratio 5 is multiplied by the estimated noise power spectrum to obtain the estimated clean speech power spectrum 6;

the logarithm of the speech power spectrum 6 is taken and an IDFT is applied to obtain the cepstral domain power spectrum 7;

harmonic enhancement is applied to the cepstral domain power spectrum 7 to obtain the harmonic-enhanced clean speech signal power spectrum 8, which is combined with the noise power spectrum 4 output by the earlier noise estimation to calculate a new a priori signal-to-noise ratio 9;

the gain 10 is calculated from the a priori signal-to-noise ratio 9;

the gain 10 is multiplied by the frequency domain signal 3 to obtain the enhanced frequency domain signal 11, an IDFT is applied to the enhanced frequency domain signal 11 to obtain the time domain signal 12, and the time domain output signal 13 is obtained through frame synthesis.

Referring to FIG. 3, FIG. 3 is a schematic diagram of the process of performing harmonic enhancement on the cepstral domain signal in step 103 of FIG. 1 described above.

The input signal is the preliminarily estimated cepstral domain power spectrum of the clean speech signal. The cepstral domain representation of speech with a harmonic structure is shown in FIG. 4, where fs = 16000 Hz is the sampling rate and N = 512 is the number of DFT points; the upper part of FIG. 4 is the amplitude spectrum of a voiced sound and the lower part is its cepstrum.

As can be seen from FIG. 4, a voiced sound appears as a distinct peak at the quefrency bin corresponding to the pitch, as indicated by the circle in the figure. A simple search for the maximum in the cepstral domain is therefore used to find the corresponding pitch frequency. In general, the pitch frequency of speech lies within a certain range (f_0,low, f_0,high), so the search can be restricted to that range, and the quefrency bin m_pitch(l) corresponding to the pitch frequency can be preliminarily determined as the bin with the maximum cepstral value within the range.
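The maximum search can be sketched as follows. A pitch frequency f corresponds to the quefrency bin m = fs / f, so the range (f_0,low, f_0,high) maps to bins between fs / f_0,high and fs / f_0,low. The default range of 70 Hz to 400 Hz and the function name are assumptions for this example.

```python
import numpy as np

def find_pitch_quefrency(cepstrum, fs=16000, f0_low=70.0, f0_high=400.0):
    """Return (m_pitch, pitch frequency in Hz) for one frame's cepstrum."""
    m_min = int(np.ceil(fs / f0_high))    # shortest pitch period in samples
    m_max = int(np.floor(fs / f0_low))    # longest pitch period in samples
    m_max = min(m_max, len(cepstrum) - 1)
    # Search only inside the plausible pitch range, so the large low-quefrency
    # envelope coefficients cannot be mistaken for the pitch peak.
    m_pitch = m_min + int(np.argmax(cepstrum[m_min : m_max + 1]))
    return m_pitch, fs / m_pitch
```

The cepstral value at the returned bin is the quantity that is later compared with the threshold thr.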

If the cepstral energy corresponding to the pitch is greater than the threshold thr, the frame is considered a speech frame with a harmonic structure: the energy at m_pitch(l) and its adjacent quefrency bins is increased, weak smoothing is applied to this bin and to the bins within a certain range around it, and strong smoothing is applied to the remaining quefrency bins outside the envelope.

If the cepstral energy corresponding to the pitch is less than or equal to the threshold thr, the frame is considered a noise frame; harmonic enhancement is not performed, and smoothing is applied to all quefrency bins.

Referring to FIG. 3, the input signal is the preliminarily estimated cepstral domain power spectrum of the clean speech signal, and after harmonic enhancement and smoothing the enhanced and smoothed cepstral domain signal is output. The specific process is as follows:

if the cepstral value at m_pitch(l) is greater than the threshold thr, the quefrency bins within δ_p of m_pitch(l) are enhanced by the factor α(l) and smoothed weakly, and the remaining bins outside the envelope are smoothed strongly;

otherwise, all quefrency bins are smoothed without harmonic enhancement. In both cases the smoothing coefficient is updated as:

α_s(l,m) = β·α_s(l−1,m) + (1−β)·α_c(l,m)

where δ_p is the number of adjacent quefrency bins selected around the pitch quefrency bin,

and α(l) is a harmonic enhancement factor greater than 1; optionally, this parameter may be fixed or adaptively time-varying. A value greater than 1 increases the energy of the periodic components of the spectrum, i.e. the energy of the harmonics; the parameter should not be set too large, otherwise more spurious harmonics are generated.

FIG. 5 shows how a given harmonic enhancement factor α changes the amplitude spectrum at the speech harmonics.

As can be seen from FIG. 5, when the harmonic enhancement factor is applied in the cepstral domain and the result is transformed back to the amplitude spectrum, the peak energy of the harmonics becomes stronger and the trough energy becomes weaker.

Here α_s is the cepstral smoothing coefficient, and different quefrencies use different smoothing coefficients. The low-quefrency part represents the envelope of the speech spectrum; to avoid damaging the spectral envelope, the smoothing coefficient of the low quefrency bins is set to a small value α_lo, with the low quefrency range chosen as [1, m_thr1]. Harmonics are also an important feature of speech, so the small smoothing coefficient α_lo is used for them as well. As the quefrency increases from the range [m_thr1+1, …, m_thr2] to the range [m_thr2+1, …, N/2+1], the smoothing coefficient is set larger, rising from α_me to α_hi. This operation effectively removes the musical noise that arises after filtering because the initial a priori signal-to-noise ratio estimate is inaccurate. Typical settings are m_thr1 = 3 and m_thr2 = 20, with smoothing coefficients α_lo = 0.1 to 0.2, α_me = 0.5 to 0.7, and α_hi = 0.96 to 0.99.

Here β is a constant that smooths the cepstral smoothing coefficient α_s, and α_c(l,m) is an array of constant smoothing coefficients that sets the cepstral smoothing coefficient for each quefrency bin.
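The per-frame enhancement and smoothing can be sketched as below. The update of α_s follows the formula given in the text; the remaining details are assumptions made for this example, in particular that enhancement multiplies the bins within `delta_p` of the pitch bin by the factor (named `alpha_h` here to avoid clashing with the smoothing coefficients), that the smoothed cepstrum is a first-order recursion over frames, and the default values of `thr`, `alpha_h`, `delta_p`, and `beta`.

```python
import numpy as np

def enhance_and_smooth(cep, cep_prev, alpha_s_prev, m_pitch, thr=0.1,
                       alpha_h=1.5, delta_p=2, beta=0.96,
                       m_thr1=3, m_thr2=20, a_lo=0.15, a_me=0.6, a_hi=0.97):
    """One frame of cepstral harmonic enhancement and smoothing.

    cep          current cepstrum, N/2 + 1 bins
    cep_prev     smoothed cepstrum of the previous frame
    alpha_s_prev smoothing coefficients alpha_s(l-1, m)
    """
    m = np.arange(len(cep))
    # alpha_c: small on the envelope bins, larger at higher quefrencies.
    alpha_c = np.where(m <= m_thr1, a_lo, np.where(m <= m_thr2, a_me, a_hi))
    cep = cep.copy()
    if cep[m_pitch] > thr:                       # frame has harmonic structure
        lo = max(m_pitch - delta_p, 0)
        hi = min(m_pitch + delta_p, len(cep) - 1)
        cep[lo : hi + 1] *= alpha_h              # raise the pitch peak (factor > 1)
        alpha_c[lo : hi + 1] = a_lo              # weak smoothing around the peak
    # alpha_s(l, m) = beta * alpha_s(l-1, m) + (1 - beta) * alpha_c(l, m)
    alpha_s = beta * alpha_s_prev + (1 - beta) * alpha_c
    # Assumed first-order recursive smoothing of the cepstrum over frames.
    cep_smooth = alpha_s * cep_prev + (1 - alpha_s) * cep
    return cep_smooth, alpha_s
```

The function is called once per frame, feeding the returned `cep_smooth` and `alpha_s` back in as `cep_prev` and `alpha_s_prev` for the next frame.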

The harmonic-enhanced and smoothed signal is transformed by DFT to the logarithmic frequency domain and then exponentiated to obtain the frequency domain signal λ'_s(l,k), i.e. the harmonic-enhanced clean speech signal power spectrum 8 in FIG. 2. A compensation parameter δ(l) is applied in this conversion,

where δ(l) compensates for the decrease in energy of the frequency domain signal after cepstral smoothing. Optionally, this parameter may be increased based on the speech presence probability of the frame in order to raise the energy of the estimated clean speech power spectrum, and it may be adjusted according to the size of the smoothing coefficient.
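The return to the frequency domain can be sketched as a DFT of the symmetric cepstrum followed by exponentiation. How δ(l) enters the computation is not recoverable from the text; this sketch assumes it is an additive offset in the log-spectral domain (equivalently a multiplicative factor exp(δ) on the power spectrum), and the function name is hypothetical.

```python
import numpy as np

def cepstrum_to_power_spectrum(cep_smooth, delta=0.0):
    """Map N/2 + 1 smoothed cepstral bins back to a one-sided power spectrum."""
    n_fft = 2 * (len(cep_smooth) - 1)
    # Rebuild the full even-symmetric cepstrum of length N from its first half.
    full = np.concatenate([cep_smooth, cep_smooth[-2:0:-1]])
    log_spec = np.fft.rfft(full, n=n_fft).real  # DFT of an even sequence is real
    # Assumption: delta acts as a log-domain energy compensation term.
    return np.exp(log_spec + delta)
```

With `delta = 0` and no smoothing, this exactly inverts the log-and-IDFT step that produced the cepstrum.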

According to the single-channel noise reduction method, the noise-containing voice signal is subjected to pre-filtering treatment, the noise power spectrum estimated by the pre-filtering treatment and the prior signal to noise ratio are combined to obtain the power spectrum of the pure voice signal, the enhanced power spectrum of the pure voice signal is obtained through harmonic enhancement, the signal to noise ratio estimation is carried out again to improve the prior signal to noise ratio at the harmonic, the final enhanced voice signal is obtained through calculation and gain filtering according to the newly determined prior signal to noise ratio, and therefore the voice harmonic damaged due to noise suppression can be recovered and enhanced, and the high-quality voice signal is obtained.

The single-channel noise reduction method provided by the embodiment of the application can enhance speech harmonics, eliminate musical noise, and improve subjective listening quality; a comparison of the results is shown in fig. 6.

As can be seen from fig. 6, the higher harmonics of the noisy speech are largely submerged in the noise; after Wiener filtering, the higher harmonics are completely destroyed and obvious musical noise is present. The HRNR (Harmonic Regeneration Noise Reduction) method recovers the higher harmonics through nonlinear processing, but the recovered harmonics cover the whole frequency band and are accompanied by inter-harmonic noise; musical noise is suppressed to some extent, but residues remain. The method provided by the application recovers the higher harmonics without generating false harmonics, and both the inter-harmonic noise and the musical noise are clearly suppressed.

Correspondingly, an embodiment of the application also provides a single-channel noise reduction device. Fig. 7 is a schematic structural diagram of the single-channel noise reduction device provided by the embodiment of the application.

The single-channel noise reduction device comprises the following modules:

a receiving module 701, configured to receive a noise-containing speech signal y (n);

a signal-to-noise ratio calculation module 702, configured to determine an a priori signal-to-noise ratio of the noisy speech signal;

a power spectrum determining module 703, configured to determine a clean speech signal power spectrum λ_s(l, k) of the noisy speech signal;

a harmonic enhancement module 704, configured to perform harmonic enhancement according to the clean speech power spectrum λ_s(l, k) to obtain an enhanced clean speech signal power spectrum λ'_s(l, k);

a signal-to-noise ratio updating module 705, configured to determine a signal-to-noise ratio according to the enhanced clean speech signal power spectrum λ'_s(l, k) to obtain a signal-to-noise ratio estimate, and to determine a final signal-to-noise ratio according to the a priori signal-to-noise ratio and the signal-to-noise ratio estimate;

a gain calculation module 706, configured to calculate a gain G_final(l, k) according to the final signal-to-noise ratio;

a noise reduction processing module 707, configured to perform noise reduction processing on the noisy speech signal y(n) according to the gain G_final(l, k);

an output module 708, configured to output the noise-reduced speech signal s (n).
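The last stages of the module chain (signal-to-noise ratio update, gain calculation, and noise reduction) can be sketched per frequency bin as below. The ratio λ'_s/λ_n as the signal-to-noise ratio estimate, the flooring at a minimum a priori value, and the multiplication of the gain with Y(l, k) follow the method claims; the Wiener rule ξ/(1 + ξ) for the gain, the default floor of 0.03, and the function names are assumptions of this sketch, since the gain formula is not reproduced in this passage.

```python
def final_gain(enhanced_speech_psd, noise_psd, xi_min=0.03):
    """Per-bin gain G_final computed from the harmonic-enhanced clean speech spectrum."""
    gains = []
    for lam_s, lam_n in zip(enhanced_speech_psd, noise_psd):
        xi_est = lam_s / max(lam_n, 1e-12)          # SNR estimate: lambda'_s / lambda_n
        xi_final = max(xi_est, xi_min)              # floor at the minimum a priori SNR
        gains.append(xi_final / (1.0 + xi_final))   # Wiener gain (assumed rule)
    return gains


def apply_gain(gains, noisy_spectrum):
    """Enhanced spectrum S(l, k) = G_final(l, k) * Y(l, k)."""
    return [g * y for g, y in zip(gains, noisy_spectrum)]
```

The floor keeps the gain from collapsing to zero in bins where the enhanced spectrum is very small, which limits the isolated spectral peaks heard as musical noise.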

As shown in fig. 8, in another embodiment of the single-channel noise reduction device of the present application, the device may further include a first signal conversion module 801 and a noise estimation module 802, wherein:

the first signal conversion module 801 is configured to convert the speech time-domain signal y(n) into a frequency-domain signal Y(l, k);

the noise estimation module 802 is configured to perform single-channel noise estimation on the frequency-domain signal Y(l, k) to obtain a noise power spectrum λ_n(l, k).

Accordingly, in this embodiment, the signal-to-noise ratio calculation module 702 determines the a priori signal-to-noise ratio according to the noise power spectrum λ_n(l, k) and the power spectrum |Y(l, k)|² of the frequency-domain signal Y(l, k).

Accordingly, in this embodiment, the power spectrum determination module 703 multiplies the a priori signal-to-noise ratio by the noise power spectrum λ_n(l, k) to obtain the clean speech power spectrum λ_s(l, k) of the noisy speech time-domain signal y(n).
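The two steps above can be illustrated with the decision-directed estimator named in the method claims. The sketch uses the standard decision-directed form; the weighting a = 0.98, the floor xi_min, the use of the previous frame's enhanced speech power as input, and the function names are assumptions rather than values taken from the application.

```python
def decision_directed_snr(noisy_power, noise_psd, prev_speech_power,
                          a=0.98, xi_min=1e-3):
    """A priori SNR per bin from |Y(l,k)|^2, lambda_n(l,k) and |S(l-1,k)|^2."""
    xi = []
    for y2, lam_n, s2_prev in zip(noisy_power, noise_psd, prev_speech_power):
        lam_n = max(lam_n, 1e-12)                  # guard against division by zero
        gamma = y2 / lam_n                         # a posteriori SNR
        est = a * s2_prev / lam_n + (1.0 - a) * max(gamma - 1.0, 0.0)
        xi.append(max(est, xi_min))
    return xi


def clean_speech_psd(xi, noise_psd):
    """lambda_s(l, k) = xi(l, k) * lambda_n(l, k)."""
    return [x * lam_n for x, lam_n in zip(xi, noise_psd)]
```

For example, with |Y|² = 3, λ_n = 1, a previous speech power of 1, and a = 0.5, the a priori estimate is 0.5·1 + 0.5·2 = 1.5.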

The output module 708 may include a second signal conversion module and a frame synthesis module. The second signal conversion module is configured to convert the enhanced frequency-domain signal S(l, k) into a time-domain signal s(l, n); the frame synthesis module is configured to perform frame synthesis on the time-domain signal s(l, n) and output the noise-reduced speech signal s(n).
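Frame synthesis is commonly done by overlap-add, and the sketch below assumes that approach; the application does not state which synthesis scheme is used. The function name and the hop parameter are hypothetical, and any synthesis window is assumed to have been applied to the frames beforehand.

```python
def overlap_add(frames, hop):
    """Sum equal-length time-domain frames s(l, n) at offsets l * hop to form s(n)."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for l, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            out[l * hop + n] += sample
    return out
```

With three all-ones frames of length 4 and a hop of 2, the overlapping regions sum to 2 while the first and last two samples stay at 1.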

Fig. 9 is a schematic structural diagram of a harmonic enhancement module according to an embodiment of the present application.

The harmonic enhancement module 704 includes the following elements:

a cepstral-domain signal generation unit 741, configured to obtain a cepstral-domain signal according to the clean speech power spectrum λ_s(l, k);

a harmonic enhancement unit 742, configured to perform harmonic enhancement on the cepstral-domain signal to obtain the enhanced clean speech signal power spectrum λ'_s(l, k).
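The pitch-dependent rule recited in the method claims (raise the energy around the pitch quefrency when the cepstral energy there exceeds the threshold thr, otherwise smooth every quefrency point) might look like the following sketch. The boost factor, the half-width of the boosted range, the recursive form of the smoothing, the choice to smooth the points outside the boosted range in the voiced case, and the function name are all assumptions.

```python
def enhance_cepstrum(cep, prev_smoothed, alpha_s, pitch_idx, thr,
                     boost=1.5, half_width=1):
    """Boost the pitch peak of a voiced frame; smooth the remaining points over frames."""
    voiced = cep[pitch_idx] > thr                 # cepstral energy at the pitch quefrency
    lo, hi = pitch_idx - half_width, pitch_idx + half_width
    out = []
    for m, (c, p, a) in enumerate(zip(cep, prev_smoothed, alpha_s)):
        if voiced and lo <= m <= hi:
            out.append(boost * c)                 # emphasise the pitch peak
        else:
            out.append(a * p + (1.0 - a) * c)     # recursive smoothing over frames
    return out
```

When the frame is judged unvoiced, no point is boosted and the whole cepstrum is smoothed, which matches the "otherwise" branch of the rule.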

Wherein the harmonic enhancement unit 742 may comprise the following subunits:

Other relevant descriptions of the modules and units in the single-channel noise reduction device can be found in the foregoing embodiments of the single-channel noise reduction method of the present application, and are not repeated here.

With the single-channel noise reduction method and device provided by the embodiments of the application, speech harmonics damaged by noise suppression can be recovered and enhanced, musical noise can be effectively removed, and both the intelligibility and the subjective listening quality of call speech in a noisy environment are improved.

The scheme can be applied to terminal equipment in 2G, 3G, 4G and 5G (5th Generation Mobile Communication Technology) systems.

In a specific implementation, the modules/units included in each apparatus and product described in the above embodiments may be software modules/units, hardware modules/units, or partly software modules/units and partly hardware modules/units. For example, for each device or product applied to or integrated in a chip, each module/unit it includes may be implemented in hardware such as a circuit, or at least some of the modules/units may be implemented as a software program that runs on a processor integrated inside the chip, with the remaining modules/units (if any) implemented in hardware such as a circuit. For each device or product applied to or integrated in a chip module, each module/unit it includes may be implemented in hardware such as a circuit, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the chip module; alternatively, at least some of the modules/units may be implemented as a software program that runs on a processor integrated inside the chip module, with the remaining modules/units (if any) implemented in hardware such as a circuit. For each device or product applied to or integrated in a terminal, each module/unit it includes may be implemented in hardware such as a circuit, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the terminal; alternatively, at least some of the modules/units may be implemented as a software program that runs on a processor integrated inside the terminal, with the remaining modules/units (if any) implemented in hardware such as a circuit.

The embodiment of the application also discloses a storage medium, which is a computer readable storage medium, and a computer program is stored on the storage medium, and when the computer program runs, part or all of the steps of the method shown in fig. 1 can be executed. The storage medium may include Read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), magnetic or optical disks, and the like. The storage medium may also include non-volatile memory (non-volatile) or non-transitory memory (non-transitory) or the like.

It should be understood that the term "and/or" merely describes an association relationship between associated objects and means that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In this context, the character "/" indicates that the associated objects before and after it are in an "or" relationship.

The term "plurality" as used in the embodiments herein refers to two or more.

The terms "first", "second", and the like in the embodiments of the present application are used only to illustrate and distinguish the objects described. They imply no order, do not indicate any particular limitation on the number of devices in the embodiments of the present application, and should not be construed as limiting the embodiments of the present application in any way.

It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.

In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and system may be implemented in other manners. For example, the device embodiments described above are merely illustrative; for example, the division of the units is only one logic function division, and other division modes can be adopted in actual implementation; for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be physically disposed separately, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.

The integrated units implemented in the form of software functional units described above may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform part of the steps of the methods described in the embodiments of the present application.

Although the present application is disclosed above, the present application is not limited thereto. Various changes and modifications may be made by one skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention shall be defined by the appended claims.

Claims

1. A method of single channel noise reduction, the method comprising:

receiving a noisy speech signal y (n);

and outputting the noise-reduced voice signal s (n).

2. The single channel noise reduction method of claim 1, further comprising:

3. The single channel noise reduction method according to claim 2, wherein said single channel noise estimation of said frequency domain signal Y (l, k) comprises:

and carrying out single-channel noise estimation on the frequency domain signal Y (l, k) by adopting any one of the following methods: minimum statistics, minimum tracking, and recursive averaging.

4. The single channel noise reduction method of claim 2, wherein the determining an a priori signal-to-noise ratio comprises:

determining the a priori signal-to-noise ratio using a decision-directed method.

5. The single channel noise reduction method according to claim 2, wherein the performing harmonic enhancement according to the clean speech power spectrum λ_s(l, k) to obtain an enhanced clean speech signal power spectrum λ'_s(l, k) comprises:

6. The single channel noise reduction method of claim 5, wherein the performing harmonic enhancement on the cepstral-domain signal to obtain the enhanced clean speech signal power spectrum λ'_s(l, k) comprises:

7. The single channel noise reduction method according to claim 5, wherein the performing harmonic enhancement on the cepstral-domain signal according to the cepstral energy corresponding to a pitch frequency comprises:

if the cepstral energy corresponding to the pitch frequency is greater than a set threshold thr, increasing the energy within a range centered on the quefrency point corresponding to the pitch frequency.

8. The single channel noise reduction method according to claim 7, wherein the performing harmonic enhancement on the cepstral-domain signal according to the cepstral energy corresponding to a pitch frequency further comprises:

otherwise, smoothing all quefrency points.

9. The single channel noise reduction method according to claim 2, wherein the determining a signal-to-noise ratio according to the enhanced clean speech signal power spectrum λ'_s(l, k) to obtain a signal-to-noise ratio estimate comprises:

calculating the signal-to-noise ratio according to the enhanced clean speech signal power spectrum λ'_s(l, k) and the noise power spectrum λ_n(l, k) to obtain the signal-to-noise ratio estimate.

10. The single channel noise reduction method of claim 9, wherein the determining a new a priori signal-to-noise ratio according to an a priori signal-to-noise ratio minimum estimation threshold and the signal-to-noise ratio estimate comprises:

selecting the larger of the signal-to-noise ratio estimate and the a priori signal-to-noise ratio minimum estimation threshold as the new a priori signal-to-noise ratio.

11. The single channel noise reduction method according to claim 2, wherein the performing noise reduction processing on the noisy speech signal y(n) according to the gain G_final(l, k) comprises:

multiplying the gain G_final(l, k) by the frequency-domain signal Y(l, k) to obtain an enhanced frequency-domain signal S(l, k).

12. The single channel noise reduction method according to claim 11, wherein the outputting the noise reduced speech signal s (n) comprises:

13. A single channel noise reduction device, the device comprising:

a signal-to-noise ratio updating module, configured to determine a signal-to-noise ratio according to the enhanced clean speech signal power spectrum λ'_s(l, k) to obtain a signal-to-noise ratio estimate, and to determine a new a priori signal-to-noise ratio according to an a priori signal-to-noise ratio minimum estimation threshold and the signal-to-noise ratio estimate;

14. The single channel noise reduction device of claim 13, further comprising:

the power spectrum determination module multiplies the a priori signal-to-noise ratio by the noise power spectrum λ_n(l, k) to obtain the clean speech power spectrum λ_s(l, k) of the noisy speech time-domain signal y(n).

15. The single channel noise reduction device of claim 14, wherein the harmonic enhancement module comprises:

16. The single channel noise reduction device of claim 15, wherein the harmonic enhancement unit comprises:

17. The single channel noise reduction device of claim 16, wherein the output module comprises:

18. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the single channel noise reduction method of any of claims 1 to 12.