CN107045874B

CN107045874B - Non-linear voice enhancement method based on correlation

Info

Publication number: CN107045874B
Application number: CN201610079921.0A
Authority: CN
Inventors: 韩翀蛟; 高可攀; 羊开云; 徐晓峰; 李夏宾
Original assignee: GRANDSTREAM NETWORK Inc; SHENZHEN GRANDSTREAM NETWORKS Inc
Current assignee: GRANDSTREAM NETWORK Inc; SHENZHEN GRANDSTREAM NETWORKS Inc
Priority date: 2016-02-05
Filing date: 2016-02-05
Publication date: 2021-03-02
Anticipated expiration: 2036-02-05
Also published as: CN107045874A

Abstract

The invention discloses a correlation-based nonlinear speech enhancement method, comprising: step a: performing a fast Fourier transform on the preprocessed noisy speech data s_w(l) and the estimated noise data n'_w(l) to obtain the spectrum S(k) of the noisy speech frame and the spectrum N'(k) of the estimated noise frame; step b: calculating the signal-to-noise ratio and the attenuation gain to obtain the attenuation gain Gain(k); step c: calculating the correlation between the noisy speech and the noise to obtain the cross-correlation function CohSN(k) of the spectrum S(k) of the noisy speech frame and the spectrum N'(k) of the estimated noise frame; step d: calculating the nonlinear attenuation gain to obtain the nonlinear attenuation gain NlpGain(k); step e: speech enhancement processing, in which the attenuation gain Gain(k) and the nonlinear attenuation gain NlpGain(k) of step d act jointly on the spectrum S(k) of the noisy speech frame to obtain the clean speech signal spectrum S_out(k). The technical scheme provided by the invention removes the noise component of a noisy speech signal more thoroughly, and allows the trade-off between noise removal and speech quality to be adjusted for different application scenarios.

Description

Non-linear voice enhancement method based on correlation

Technical Field

The invention belongs to the technical field of voice communication, and particularly relates to a voice enhancement technology.

Background

During voice communication, the speech sent by the talker is corrupted by noise from the talker's surroundings, such as the sound of an office air conditioner or of the fans in a computer. The speech received at the receiving end is therefore no longer the talker's clean speech but noisy speech contaminated by various noises, which reduces its intelligibility for the listener. In many situations, however, and especially in teleconferencing, speech intelligibility and speech quality must be well guaranteed, so speech enhancement is necessary, and speech enhancement techniques have accordingly developed rapidly.

The first category of existing speech enhancement methods is based on spectral subtraction: the estimated noise spectrum is subtracted from the noisy speech spectrum to obtain the enhanced speech spectrum. Its advantages are low algorithmic complexity and a small amount of computation, but its drawback is that serious residual noise remains in the enhanced speech. The second category is based on adaptive filtering algorithms, which cannot fundamentally resolve the conflict between convergence rate and steady-state error and which perform poorly at low signal-to-noise ratios. The third category is based on matrix decomposition or model learning; it removes non-stationary, sudden noise well, but it involves complex procedures such as matrix decomposition and model training, and its computational cost is much higher than that of the first two categories. In view of the above, the present invention discloses a novel speech enhancement technique to overcome the disadvantages of the prior art.
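The spectral subtraction idea described above can be illustrated with a minimal sketch. The function name, the `floor` parameter and the use of magnitude spectra are assumptions made for illustration; they are not taken from the patent.

```python
def spectral_subtract(noisy_mag, noise_mag, floor=0.0):
    """Subtract an estimated noise magnitude spectrum from a noisy one, bin by bin."""
    if len(noisy_mag) != len(noise_mag):
        raise ValueError("spectra must have the same number of bins")
    # Negative differences are clipped to `floor` (a hypothetical parameter);
    # bins that survive the clipping at random are one source of residual noise.
    return [max(s - n, floor) for s, n in zip(noisy_mag, noise_mag)]


# Three bins: the last one is weaker than the noise estimate and is clipped.
enhanced = spectral_subtract([1.0, 0.5, 0.2], [0.3, 0.3, 0.3])
```

The sketch only shows the baseline that the invention seeks to improve on; it is not part of the claimed method.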

Disclosure of Invention

The invention aims to provide a correlation-based nonlinear speech enhancement method that solves problems such as incomplete noise removal while preserving speech quality, and that achieves a better speech enhancement effect in scenarios with a low signal-to-noise ratio.

To achieve the above object, the technical solution of the invention is a correlation-based nonlinear speech enhancement method that mainly comprises the following steps: step a: performing a fast Fourier transform on the preprocessed noisy speech data s_w(l) and the estimated noise data n'_w(l) to obtain the spectrum S(k) of the noisy speech frame and the spectrum N'(k) of the estimated noise frame; step b: calculating the signal-to-noise ratio and the attenuation gain to obtain the attenuation gain Gain(k); step c: calculating the correlation between the noisy speech and the noise to obtain the cross-correlation function CohSN(k) of the spectrum S(k) of the noisy speech frame and the spectrum N'(k) of the estimated noise frame; step d: calculating the nonlinear attenuation gain to obtain the nonlinear attenuation gain NlpGain(k); step e: speech enhancement processing, in which the attenuation gain Gain(k) and the nonlinear attenuation gain NlpGain(k) of step d act jointly on the spectrum S(k) of the noisy speech frame to obtain the clean speech signal spectrum S_out(k).

Preferably, step e is followed by step f, in which the known inverse fast Fourier transform is applied to the speech signal spectrum S_out(k) to convert the signal from the frequency domain back to the time domain: s_out(l) = IFFT(S_out(k)).

Preferably, step b further comprises the following steps: step b1: calculating the a posteriori signal-to-noise ratio SNR_post(k), SNR_post(k) = S(k)/N'(k) - 1.0; step b2: calculating the signal-to-noise ratio update coefficient γ(k), γ(k) = α + (1 - α) × (S_{n-1}(k)/(S_{n-1}(k) + N'(k)))², where S_{n-1}(k) is the previous frame of noisy speech data and the parameter α can be chosen to suit the specific application scenario; step b3: calculating the a priori signal-to-noise ratio SNR_prior(k) as a weighted sum formed from SNR_post(k) and the update coefficient γ(k); step b4: calculating the a priori signal-to-noise ratio term SNR_ratio(k) from SNR_prior(k); step b5: calculating the optimal attenuation gain Gain_opt(k) using a hypergeometric-distribution-related calculation formula; step b6: calculating the lower attenuation gain limit Gain_floor(k); step b7: calculating the attenuation gain Gain(k).
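Steps b1 to b3 can be sketched as follows, using the formulas as they appear in claim 3. This is an illustration under stated assumptions, not the patented implementation: the inputs are taken to be non-negative per-bin values (magnitudes or powers) rather than complex spectra, the small constant `eps` is a hypothetical guard against division by zero, and the operator inside the previous-frame term of step b3 is not legible in the source text, so a ratio S_{n-1}(k)/N'(k) is assumed.

```python
def snr_estimates(S, N_est, S_prev, alpha=0.25):
    """Per-bin posterior SNR, update coefficient and prior SNR (steps b1-b3).

    S, N_est, S_prev: non-negative per-bin values of the current noisy frame,
    the estimated noise frame and the previous noisy frame (assumed real).
    """
    eps = 1e-12  # hypothetical guard against division by zero
    snr_post, gamma, snr_prior = [], [], []
    for s, n, sp in zip(S, N_est, S_prev):
        post = s / (n + eps) - 1.0                              # step b1
        g = alpha + (1.0 - alpha) * (sp / (sp + n + eps)) ** 2  # step b2
        # Step b3: the previous-frame term is ASSUMED to be S_prev / N'.
        prior = g * post + (1.0 - g) * (sp / (n + eps))
        snr_post.append(post)
        gamma.append(g)
        snr_prior.append(prior)
    return snr_post, gamma, snr_prior


# One bin: current value 4.0, noise estimate 1.0, previous-frame value 1.0.
post, gam, prior = snr_estimates([4.0], [1.0], [1.0], alpha=0.25)
```

With these inputs the update coefficient is 0.25 + 0.75 × 0.5² = 0.4375, so the prior estimate leans more on the previous frame than on the current posterior value.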

Preferably, the parameter α in step b2 commonly takes a value in the range [0.05, 0.30]. The parameter α may be taken to be 0.25.

Preferably, in the optimal attenuation gain Gain_opt(k) of step b5, θ(k) = SNR_ratio(k)·(1.0 + SNR_post(k)), Γ(·) is the known gamma function, exp(·) is the exponential function with the natural constant e as its base, and I₀(·) and I₁(·) are the Bessel functions of order 0 and order 1, respectively.

Preferably, in the lower attenuation gain limit Gain_floor(k) of step b6, exp(·) is the exponential function with the natural constant e as its base.

Preferably, in the attenuation gain Gain(k), λ is a weighting coefficient whose value can be chosen to suit the application scenario; the commonly used range is [0.60, 0.90].

Preferably, the nonlinear attenuation gain NlpGain(k) is calculated as MIN(1.0, 1.35 - CohSN(k)), where MIN(·) is the usual operation of taking the smaller value.

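Preferably, the nonlinear attenuation gain NlpGain(k) is calculated as follows: NlpGain(k) is set to 1.0 when the mean correlation AvgCoh over the frequency band of interest is below the correlation threshold ThreCoh, and to MIN(1.0, 1.35 - CohSN(k)) otherwise.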

The invention provides a correlation-based nonlinear speech enhancement method that overcomes the defects of the prior-art methods at a low computational cost. With the technical scheme of the invention, the noise components of a noisy speech signal can be removed more thoroughly, and the trade-off between noise removal and speech quality can be adjusted for different application scenarios.

Drawings

FIG. 1 is a flow chart of a non-linear speech enhancement method according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The basic principle of the nonlinear speech enhancement method of the invention is as follows. A signal-to-noise ratio is calculated from the frequency-domain information of the noisy speech signal and of a reference noise signal, and an attenuation gain is calculated for each frequency band from this signal-to-noise ratio. The correlation between the noisy speech signal and the reference noise signal is then calculated, and the attenuation gain is adjusted nonlinearly according to this correlation. Finally, the adjusted attenuation gain is multiplied by the noisy speech spectrum to obtain clean speech free of noise interference.

FIG. 1 is a flow chart of a non-linear speech enhancement method according to an embodiment of the present invention. The steps of the method of the present invention are further described below with reference to FIG. 1.

The invention performs speech enhancement on the premise that the noisy speech and the estimated noise are both known; the process of estimating the noise is not described.

Step 1, voice preprocessing:

The noisy speech and the estimated noise are windowed and divided into frames to obtain the noisy speech data s_w(l) to be enhanced and the estimated noise data n'_w(l). A Hamming window is used as the window function in the present embodiment. Windowing and framing is a common and necessary operation in digital signal processing: a digital signal processing unit can read and process only a limited number of samples at a time, so the signal is divided into frames of that size by means of a window function.
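The windowing and framing step can be sketched as below. The frame length of 320 samples and the hop of 160 samples are assumptions chosen to match a 16 kHz signal with 10 ms of new data per frame and overlapping frames; the function names are hypothetical.

```python
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(x, frame_len, hop):
    """Split x into overlapping frames and apply a Hamming window to each."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + i] * w[i] for i in range(frame_len)])
    return frames


# 480 samples of a constant signal give two overlapping 320-sample frames.
frames = windowed_frames([1.0] * 480, frame_len=320, hop=160)
```

The same framing would be applied to the noisy speech and to the estimated noise so that their frames stay aligned.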

Step 2, fast Fourier transform:

The known fast Fourier transform is applied to the windowed noisy speech data s_w(l) and the windowed estimated noise data n'_w(l) to obtain the spectrum S(k) of the noisy speech frame and the spectrum N'(k) of the estimated noise frame: S(k) = FFT(s_w(l)) and N'(k) = FFT(n'_w(l)), where FFT(·) is the known fast Fourier transform.
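For illustration, the transform of one frame can be written directly from the definition of the discrete Fourier transform. A fast Fourier transform computes the same values with far fewer operations; the direct form is used here only because it is short and needs no external library. The function name `dft` is hypothetical.

```python
import cmath


def dft(x):
    """Direct O(n^2) discrete Fourier transform of a real or complex frame."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# A unit impulse has a flat spectrum; a constant frame has energy only in bin 0.
impulse_spectrum = dft([1.0, 0.0, 0.0, 0.0])
constant_spectrum = dft([1.0, 1.0, 1.0, 1.0])
```

In practice the same routine would be applied to each windowed frame of the noisy speech and of the estimated noise.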

Step 3, calculating signal-to-noise ratio and attenuation gain:

In this step, the estimation of the signal-to-noise ratio and of the attenuation gain follows the classic algorithm proposed in Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-32, no. 6, pp. 1109-1121, 1984, with improvements and simplifications. The calculation process is described only briefly here; for details, refer to the original paper:

1) First calculate the a posteriori signal-to-noise ratio SNR_post(k): SNR_post(k) = S(k)/N'(k) - 1.0;

2) Then calculate the signal-to-noise ratio update coefficient γ(k): γ(k) = α + (1 - α) × (S_{n-1}(k)/(S_{n-1}(k) + N'(k)))², where S_{n-1}(k) is the previous frame of noisy speech data and the parameter α can be chosen to suit the specific application scenario; the commonly used range is [0.05, 0.30], and α = 0.25 in this embodiment;

3) Calculate the a priori signal-to-noise ratio SNR_prior(k): the a posteriori signal-to-noise ratio SNR_post(k) from 1) and the update coefficient γ(k) from 2) are combined by weighted summation to obtain the estimated a priori signal-to-noise ratio SNR_prior(k);

4) Use SNR_prior(k) from 3) to calculate the a priori signal-to-noise ratio term SNR_ratio(k);

5) Calculate the optimal attenuation gain Gain_opt(k) using a hypergeometric-distribution-related calculation formula, in which θ(k) = SNR_ratio(k)·(1.0 + SNR_post(k)), Γ(·) is the known gamma function, exp(·) is the exponential function with the natural constant e as its base, and I₀(·) and I₁(·) are Bessel functions; for their calculation, see William J. Lentz, "Generating Bessel functions in Mie scattering calculations using continued fractions";

6) Calculate the lower attenuation gain bound Gain_floor(k), in which exp(·) is, as in 5), the exponential function with the natural constant e as its base. The lower bound Gain_floor(k) is positive and is used to limit the optimal attenuation gain Gain_opt(k): if Gain_opt(k) falls below Gain_floor(k), the optimal attenuation gain is too small and the enhanced speech would exhibit fluctuating "musical noise", so Gain_floor(k) must be used to limit the value of Gain_opt(k); see the operation in 7);

7) Calculate the attenuation gain Gain(k), where MAX(·) is the usual operation of taking the larger value. MAX(·) is used to limit Gain_opt(k) from below by Gain_floor(k), and weighted summation and squaring then yield the attenuation gain Gain(k). Here λ is a weighting coefficient whose value can be chosen to suit the application scenario; the commonly used range is [0.60, 0.90], and λ = 0.75 in this embodiment.
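As background to sub-step 5), the gain of the classic Ephraim-Malah estimator cited above can be sketched as follows. This is the textbook formula from the 1984 paper, not the improved and simplified variant of the patent, whose exact expression is not reproduced in this text. The names `xi` (a priori SNR) and `gamma_post` (a posteriori SNR in the paper's sense) are assumptions; the quantity `v` appears to play the role of θ(k), but that mapping is also an assumption. The Bessel functions are evaluated by their power series, which is adequate only for moderate arguments.

```python
import math


def bessel_i(order, x, terms=200):
    """Modified Bessel function of the first kind I_order(x) via its power series."""
    total = 0.0
    term = (x / 2.0) ** order / math.factorial(order)  # m = 0 term of the series
    for m in range(terms):
        total += term
        # Ratio between consecutive series terms m and m + 1.
        term *= (x / 2.0) ** 2 / ((m + 1) * (m + 1 + order))
    return total


def mmse_stsa_gain(xi, gamma_post):
    """Ephraim-Malah MMSE short-time spectral amplitude gain (xi, gamma_post > 0)."""
    v = xi / (1.0 + xi) * gamma_post
    bracket = (1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0)
    return math.gamma(1.5) * math.sqrt(v) / gamma_post * math.exp(-v / 2.0) * bracket


low_snr_gain = mmse_stsa_gain(0.1, 2.0)       # weak speech: strong attenuation
high_snr_gain = mmse_stsa_gain(100.0, 101.0)  # strong speech: gain close to 1
```

At high signal-to-noise ratios this gain approaches the Wiener gain xi/(1 + xi), while at low ratios it attenuates strongly, which is why a positive lower bound such as Gain_floor(k) is useful against musical noise.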

Step 4, calculating the correlation between the voice with noise and the noise

In this step, the power spectrum Spsd(k) of the noisy speech signal and the power spectrum Npsd(k) of the estimated noise are calculated first. In the formulas of this step, the subscript r denotes the real part of a complex quantity and the subscript i denotes its imaginary part.

The cross-power spectrum SNpsd(k) of the noisy speech signal and the estimated noise is then calculated.

Finally, the cross-correlation function CohSN(k) of the noisy speech signal and the estimated noise is calculated from Spsd(k), Npsd(k) and SNpsd(k).

The invention uses the correlation between the noisy speech signal and the estimated noise to improve the effect of speech enhancement. In this step, the cross-correlation function CohSN(k) of the noisy speech signal and the estimated noise is calculated in the frequency domain from the noisy speech power spectrum Spsd(k), the estimated noise power spectrum Npsd(k) and their cross-power spectrum SNpsd(k). In speech processing, this cross-correlation function represents the degree of correlation between the noisy speech and the estimated noise in each frequency band. A large value indicates a strong correlation: the noisy speech contains little or no speech and consists mostly of noise. A small value indicates a weak correlation: the noisy speech contains more speech components, which is why it correlates only weakly with the estimated noise.
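The following sketch shows one common way to obtain a per-band correlation measure from power and cross-power spectra. Because the patent's own formula for CohSN(k) is not reproduced in this text, the sketch substitutes the standard magnitude-squared coherence with recursive averaging across frames; the class name, the smoothing factor `beta` and the averaging itself are assumptions, not the patented formula.

```python
class CoherenceTracker:
    """Recursively smoothed power and cross-power spectra and their coherence."""

    def __init__(self, num_bins, beta=0.9):
        self.beta = beta                  # hypothetical smoothing factor
        self.spsd = [0.0] * num_bins      # smoothed S_r(k)^2 + S_i(k)^2
        self.npsd = [0.0] * num_bins      # smoothed N'_r(k)^2 + N'_i(k)^2
        self.snpsd = [0j] * num_bins      # smoothed S(k) * conj(N'(k))

    def update(self, S, N_est):
        """Feed one frame of complex spectra; return the per-bin coherence."""
        b, coh = self.beta, []
        for k, (s, n) in enumerate(zip(S, N_est)):
            self.spsd[k] = b * self.spsd[k] + (1 - b) * (s.real ** 2 + s.imag ** 2)
            self.npsd[k] = b * self.npsd[k] + (1 - b) * (n.real ** 2 + n.imag ** 2)
            self.snpsd[k] = b * self.snpsd[k] + (1 - b) * (s * n.conjugate())
            denom = self.spsd[k] * self.npsd[k]
            coh.append(abs(self.snpsd[k]) ** 2 / denom if denom > 0.0 else 0.0)
        return coh


# Identical spectra stay fully coherent; a phase that flips every frame does not.
same, flip = CoherenceTracker(1), CoherenceTracker(1)
for i in range(50):
    coh_same = same.update([1 + 1j], [1 + 1j])
    coh_flip = flip.update([1 + 0j], [(-1) ** i + 0j])
```

Some averaging over frames is needed in a measure of this kind, because the coherence computed from a single frame alone is always 1.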

Step 5 nonlinear attenuation gain calculation

Calculate the mean AvgCoh of the cross-correlation function CohSN(k) over the frequency band of interest. The upper index of the averaging range is an integer not exceeding the upper limit of k and can be chosen differently for different application scenarios: where the noise is concentrated at low frequencies, a smaller value may be chosen, and where the noise characteristics are unknown, it may be set equal to the upper limit of k. For example, with a sampling rate of 16 kHz and a frame length of 10 ms in the windowing preprocessing, one frame contains 160 data points; overlapping frames are used for the fast Fourier transform and for the cross-correlation function CohSN(k), so that k takes values in [0, 159]. If the noise is known to be concentrated in the low band from 0 Hz to 4 kHz, the upper index can be chosen as 79 to obtain AvgCoh.
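The choice of the averaging range in the example can be checked with a short sketch. It assumes that the 160 bins k = 0..159 evenly span 0 Hz to half the sampling rate and that AvgCoh is the plain arithmetic mean over bins 0 to the upper index inclusive; both the function names and these conventions are assumptions.

```python
def band_limit_index(f_max_hz, sample_rate_hz, num_bins):
    """Index of the last bin below f_max_hz when num_bins bins span 0..sample_rate/2."""
    bin_width = (sample_rate_hz / 2.0) / num_bins
    return min(num_bins, int(f_max_hz / bin_width)) - 1


def mean_coherence(coh, last_bin):
    """Arithmetic mean of coh[0..last_bin], inclusive."""
    band = coh[: last_bin + 1]
    return sum(band) / len(band)


# 16 kHz sampling, 160 bins, noise concentrated below 4 kHz.
upper_index = band_limit_index(4000.0, 16000.0, 160)
```

With these numbers each bin is 50 Hz wide, so the band from 0 Hz to 4 kHz covers bins 0 to 79, in agreement with the value 79 given in the text.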

Whether the nonlinear attenuation gain is applied to the current frame is decided from the mean correlation AvgCoh of the frequency band of interest, by comparing AvgCoh with the correlation threshold ThreCoh. If AvgCoh is below ThreCoh, the correlation between the current speech frame and the estimated noise is small in the band of interest and speech is the main component; to avoid damaging speech quality, no nonlinear attenuation is applied and the nonlinear attenuation gain NlpGain(k) is set to 1.0. Otherwise, the correlation between the current speech frame and the estimated noise is large in the band of interest and noise is the main component; to enhance the speech more effectively, a nonlinear attenuation gain is applied to remove more of the noise, and NlpGain(k) is calculated as MIN(1.0, 1.35 - CohSN(k)), where MIN(·) is the usual operation of taking the smaller value. MIN(·) ensures that NlpGain(k) does not exceed 1.0, so that the nonlinear attenuation gain attenuates the noisy speech rather than amplifying it.

In summary, the nonlinear attenuation gain NlpGain(k) is calculated as follows: NlpGain(k) is 1.0 when AvgCoh is below ThreCoh, and MIN(1.0, 1.35 - CohSN(k)) otherwise. The correlation threshold ThreCoh can be chosen to suit the specific application scenario and can be regarded as a trade-off between removing noise interference and preserving speech quality. If a larger ThreCoh is chosen, NlpGain(k) is more likely to be set to 1.0 and the effect of the nonlinear attenuation gain is weakened: speech quality is protected, but noise remains. If a smaller ThreCoh is chosen, NlpGain(k) is less likely to be set to 1.0 and the effect of the nonlinear attenuation gain is strengthened, so that noise interference is removed better; if ThreCoh is chosen too small, however, the nonlinear attenuation becomes excessive and may damage speech quality. ThreCoh therefore needs to be chosen to suit the specific application scenario; the commonly used range is [0.70, 0.80], and ThreCoh = 0.735 in this embodiment.
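The decision rule for the nonlinear attenuation gain can be sketched as below, using MIN(1.0, 1.35 - CohSN(k)) and the threshold value 0.735 from the text. The function name is hypothetical, and treating the comparison as a strict "less than" is an assumption, since the inequality sign is not legible in the source.

```python
def nlp_gain(coh, avg_coh, thre_coh=0.735):
    """Per-bin nonlinear attenuation gain for one frame."""
    if avg_coh < thre_coh:  # strictness of the comparison is an assumption
        # Speech dominates the band of interest: leave the frame untouched.
        return [1.0] * len(coh)
    # Noise dominates: attenuate more where the correlation with the noise is higher.
    return [min(1.0, 1.35 - c) for c in coh]


speech_frame = nlp_gain([0.9, 0.2], avg_coh=0.5)  # below threshold: no attenuation
noise_frame = nlp_gain([0.9, 0.2], avg_coh=0.8)   # above threshold: per-bin gain
```

In the noise-dominated frame, the bin with correlation 0.9 receives a gain of about 0.45, while the bin with correlation 0.2 keeps a gain of 1.0 because MIN(·) caps the gain.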

Step 6 speech enhancement processing

The attenuation gain Gain(k) calculated in step 3 and the nonlinear attenuation gain NlpGain(k) calculated in step 5 act jointly on the noisy speech spectrum S(k) to realize the speech enhancement processing. On top of the attenuation gain Gain(k) calculated from the signal-to-noise ratio, the noisy speech spectrum S(k) is further processed by the nonlinear attenuation gain, which removes the noise more effectively and yields the cleaner speech spectrum S_out(k).
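Applying both gains to the spectrum can be sketched as a per-bin product, in line with the statement of the basic principle that the adjusted attenuation gain is multiplied by the noisy speech spectrum. Writing the combination as Gain(k) × NlpGain(k) × S(k) is an assumption, since the formula of this step is not reproduced in the text; the function name is hypothetical.

```python
def enhance_spectrum(S, gain, nlp):
    """Scale each complex bin of S by both real-valued gains."""
    return [g * h * s for s, g, h in zip(S, gain, nlp)]


# The first bin is attenuated by both gains; the second bin passes unchanged.
S_out = enhance_spectrum([2 + 2j, 1 - 1j], gain=[0.5, 1.0], nlp=[0.5, 1.0])
```

Because both gains are real and at most 1.0 in the noise-dominated case, the phase of each bin is preserved and only its magnitude is reduced.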

Step 7 inverse fast fourier transform

The known inverse fast Fourier transform is applied to the enhanced speech signal spectrum S_out(k) to obtain the enhanced time-domain speech signal s_out(l): s_out(l) = IFFT(S_out(k)), where IFFT(·) is the known inverse fast Fourier transform.
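The return to the time domain can be illustrated with the direct inverse discrete Fourier transform, which an inverse fast Fourier transform computes more efficiently. The function names are hypothetical, and the recombination of overlapping frames, which the text does not describe, is left out.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform (used here only for the round trip)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Direct inverse discrete Fourier transform."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


# Transforming a frame and inverting it recovers the original samples.
frame = [0.0, 1.0, 0.0, -1.0]
restored = [v.real for v in idft(dft(frame))]
```

With unit gains the round trip reproduces the frame up to rounding error; with the gains of step 6 applied in between, the result is the enhanced frame.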

The present invention is not limited to the above-described preferred embodiments, but rather, the present invention is intended to cover all modifications, equivalents, and improvements falling within the spirit and scope of the present invention.

Claims

1. A non-linear speech enhancement method based on correlation is characterized in that the method mainly comprises the following steps:

step a: performing a fast Fourier transform on the preprocessed noisy speech data s_w(l) and the estimated noise data n'_w(l) to obtain a spectrum S(k) of a noisy speech frame and a spectrum N'(k) of an estimated noise frame;

step b: calculating the signal-to-noise ratio and the attenuation gain to obtain the attenuation gain Gain(k), where Gain_opt(k) is the optimal attenuation gain, Gain_floor(k) is the lower attenuation gain limit, MAX(·) is the usual operation of taking the larger value, and λ is a weighting coefficient;

step c: calculating the correlation between the noisy speech and the noise to obtain the cross-correlation function CohSN(k) of the spectrum S(k) of the noisy speech frame and the spectrum N'(k) of the estimated noise frame, where Spsd(k) is the power spectrum of the noisy speech, Npsd(k) is the power spectrum of the estimated noise, SNpsd(k) is the cross-power spectrum of the spectrum S(k) of the noisy speech frame and the spectrum N'(k) of the estimated noise frame, the subscript r denotes the real part of a complex quantity, and the subscript i denotes the imaginary part of a complex quantity;

step d: calculating the nonlinear attenuation gain to obtain the nonlinear attenuation gain NlpGain(k);

step e: speech enhancement processing, in which the attenuation gain Gain(k) and the nonlinear attenuation gain NlpGain(k) of step d act jointly on the spectrum S(k) of the noisy speech frame to realize the speech enhancement processing and obtain a clean speech signal spectrum S_out(k).

2. The method of claim 1, wherein step e is further followed by a step f of performing a known inverse fast Fourier transform on the speech signal spectrum S_out(k) to convert the signal from the frequency domain back to the time domain: s_out(l) = IFFT(S_out(k)).

3. The method of claim 2, wherein step b further comprises the steps of: step b1: calculating the a posteriori signal-to-noise ratio SNR_post(k), SNR_post(k) = S(k)/N'(k) - 1.0; step b2: calculating the signal-to-noise ratio update coefficient γ(k), γ(k) = α + (1 - α) × (S_{n-1}(k)/(S_{n-1}(k) + N'(k)))², where S_{n-1}(k) is the previous frame of noisy speech data and the parameter α can be chosen to suit the specific application scenario; step b3: calculating the a priori signal-to-noise ratio SNR_prior(k), SNR_prior(k) = γ(k)·SNR_post(k) + (1 - γ(k))·(S_{n-1}(k) N'(k)); step b4: calculating the a priori signal-to-noise ratio term SNR_ratio(k); step b5: calculating the optimal attenuation gain Gain_opt(k) using a hypergeometric-distribution-related calculation formula; step b6: calculating the lower attenuation gain limit Gain_floor(k); step b7: calculating the attenuation gain Gain(k).

4. The method of claim 3, wherein the value of the parameter α in step b2 lies in the range [0.05, 0.30].

5. The method according to claim 4, characterized in that the parameter α takes 0.25.

6. The method according to claim 4 or 5, wherein, in the optimal attenuation gain Gain_opt(k) of step b5, θ(k) = SNR_ratio(k)·(1.0 + SNR_post(k)), Γ(·) is the known gamma function, exp(·) is the exponential function with the natural constant e as its base, and I₀(·) and I₁(·) are the Bessel functions of order 0 and order 1, respectively.

7. The method of claim 6, wherein, in the lower attenuation gain limit Gain_floor(k) of step b6, exp(·) is the exponential function with the natural constant e as its base.

8. The method of claim 7, wherein, in the attenuation gain Gain(k), λ is a weighting coefficient whose value lies in the range [0.60, 0.90].

9. The method of claim 8, wherein the nonlinear attenuation gain NlpGain(k) is calculated as MIN(1.0, 1.35 - CohSN(k)), where MIN(·) is the usual operation of taking the smaller value.

10. The method of claim 9, wherein the nonlinear attenuation gain NlpGain(k) is calculated from AvgCoh and ThreCoh, where AvgCoh is the mean value of the correlation over the frequency band and ThreCoh is the correlation threshold.