CN106997766B

CN106997766B - Homomorphic filtering speech enhancement method based on broadband noise

Info

Publication number: CN106997766B
Application number: CN201710176799.3A
Authority: CN
Inventors: 马英
Original assignee: Qinghai Nationalities University
Current assignee: Qinghai Nationalities University
Priority date: 2017-03-16
Filing date: 2017-03-16
Publication date: 2020-05-15
Anticipated expiration: 2037-03-16
Also published as: CN106997766A

Abstract

The invention discloses a homomorphic filtering speech enhancement method based on broadband noise, and belongs to the technical field of speech processing methods. It addresses two problems of the prior art: errors are still easily introduced when speech signal features are extracted, and the speech signal still contains considerable residual noise after noise reduction. The method specifically comprises: obtaining a short-time stationary speech signal; performing autocorrelation analysis on the speech signal to obtain autocorrelation coefficients; performing homomorphic filtering analysis on the autocorrelation coefficients, during which the cepstrum of the noisy speech signal is computed and its noise component is removed to obtain the cepstrum of the enhanced speech; obtaining the noise-reduced feature parameters through spectrum analysis; and synthesizing the noise-reduced speech signal. The improved homomorphic filtering anti-noise method provided by the invention enhances the robustness of the speech cepstrum features to environmental noise, obtains the feature information of speech more accurately, and better achieves the purpose of speech enhancement.

Description

Homomorphic filtering speech enhancement method based on broadband noise

Technical Field

The invention relates in particular to a homomorphic filtering speech enhancement method based on broadband noise, and belongs to the technical field of speech processing methods.

Background

In real environments, speech signal processing covers many aspects, and speech enhancement is one of the effective ways to deal with noise-contaminated speech. Its research goal is for the system output to extract a signal as close as possible to the clean speech from the noisy speech, so as to improve transmission quality. Traditional speech enhancement techniques include spectral subtraction, center filtering, homomorphic filtering anti-noise processing, nonlinear processing, adaptive analysis, wavelet analysis, etc. In noise processing, a suitable speech enhancement method is selected according to the characteristics of the speech, the perceptual characteristics of the human ear, and the properties of the noise.

In the traditional homomorphic filtering anti-noise method, the pitch peaks in the cepstrum often become unclear or even disappear during homomorphic processing. When the speech signal is separated from the noise signal, phase ambiguity (that is, phase wrapping) easily arises, so the error of the extracted feature parameters becomes too large, the original speech signal is difficult to recover, and the real-time requirement of the system cannot be met. The traditional autocorrelation analysis method easily produces second harmonics during autocorrelation processing; it is difficult to extract features directly from the autocorrelation coefficients of a noisy speech signal to achieve speech enhancement; the method requires the signal to be strongly periodic; and denoising is difficult because the autocorrelation function resembles the high-frequency waveform of the noise. The traditional center filtering method can damage speech quality during enhancement: its threshold selection is critical, information related to the speech signal is easily lost, and the speech signal can only be analyzed in the frequency domain. In the traditional spectral subtraction method, because speech energy is concentrated in certain frequency bands, a large amount of residual noise remains after enhancement, and pure-tone noise is still easily produced in the speech signal if high-power noise components cannot be eliminated. To address these problems, many new speech enhancement methods for different background noise environments have been studied: speech enhancement based on anisotropic filtering under white noise [J]. The University of Jia Hous (Natural Science Edition), 2015, 06: 902-; speech enhancement with an improved wavelet threshold function [J]. Signal Processing, 2016, 02: 203-213, which uses an improved wavelet threshold function to effectively improve the intelligibility and overall quality of the speech signal; speech enhancement based on cepstrum preprocessing [J]. Science Technology and Engineering, 2013, 21: 6111-; and speech enhancement under low signal-to-noise ratio [J]. Chinese New Communication, 2015, 15: 73-74, which, to address the musical noise and low intelligibility of spectral subtraction at low signal-to-noise ratio, proposes a speech enhancement algorithm based on cepstral distance and spectral subtraction.

Short-time autocorrelation analysis is a common method for the time-domain analysis of speech signals. For a speech signal s_n(m), the short-time autocorrelation function R_n(k) is defined as follows:

where L is the maximum number of delay points.
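For reference, the definition above can be written in the standard textbook form below, assumed here with the frame occupying [0, N-1] and lags up to L (the normalization used in the original may differ):

$$R_n(k) = \sum_{m=0}^{N-1-k} s_n(m)\, s_n(m+k), \qquad 0 \le k \le L$$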

The autocorrelation anti-noise method reduces the harmonic components after filtering and the curve becomes smoother, but the second harmonic remains, the peaks are still not sharp, and errors are still easily introduced when speech signal features are extracted.

In the traditional homomorphic filtering anti-noise method, the first peak of the speech cepstrum is still not obvious, and there are many high-frequency components between the origin and the second peak, which reduces the accuracy of the observed feature values. The noise-reduced speech signal still contains considerable residual noise.

Disclosure of Invention

Therefore, the invention provides a homomorphic filtering speech enhancement method based on broadband noise, addressing the problems in the prior art that errors are still easily introduced when speech signal features are extracted and that the noise-reduced speech signal still contains considerable residual noise.

The method specifically comprises the following steps:

obtaining a short-time stationary speech signal, performing autocorrelation analysis on the speech signal to obtain autocorrelation coefficients, and performing homomorphic filtering analysis on the autocorrelation coefficients, during which the cepstrum of the noisy speech signal is computed and its noise component is removed to obtain the cepstrum of the enhanced speech; then obtaining the noise-reduced feature parameters through spectrum analysis and synthesizing the noise-reduced speech signal.

Further, performing autocorrelation analysis on the speech signal to obtain the autocorrelation coefficients specifically comprises:

assuming that the speech signal s_n(m) occupies the interval [0, N-1], the short-time autocorrelation of the speech signal is:

where L is the maximum number of delay points.
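A minimal NumPy sketch of this step, assuming the unnormalized form R_n(k) = Σ s_n(m) s_n(m+k) for k = 0..L (the exact expression is not reproduced in this text). The function name `short_time_autocorr` is hypothetical.

```python
import numpy as np


def short_time_autocorr(frame, max_lag):
    """Return R_n(k) for k = 0..max_lag of one frame s_n(m), m in [0, N-1].

    Assumes the unnormalized form R_n(k) = sum_{m=0}^{N-1-k} s_n(m) * s_n(m + k).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag (L) must satisfy 0 <= L < N")
    # Each lag k multiplies the frame by a copy of itself shifted by k samples.
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(max_lag + 1)])
```

For example, `short_time_autocorr([1.0, 2.0, 3.0], 2)` gives the lags 0, 1, 2 of that toy frame: 14, 8 and 3. The same values are the non-negative half of `np.correlate(x, x, "full")`, which is a convenient cross-check.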

Further, the homomorphic filtering analysis processing of the autocorrelation coefficients in the method specifically comprises:

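applying an FFT to the autocorrelation coefficients to obtain R_n(e^{jω});

taking the logarithm of the real part of R_n(e^{jω}):

$$\ln \operatorname{Re}\left[R_n(e^{j\omega})\right]$$

applying an inverse FFT to the result to obtain the improved cepstrum c_n(m):

$$c_n(m) = \mathrm{IFFT}\left\{\ln \operatorname{Re}\left[R_n(e^{j\omega})\right]\right\}$$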

Beneficial effects of the invention: the homomorphic filtering speech enhancement method based on broadband noise improves on the traditional method and outperforms it, with a smaller average relative error and higher accuracy, which is more favorable for speech recognition and speech synthesis. The improved homomorphic filtering anti-noise method enhances the robustness of the speech cepstrum features to environmental noise, obtains the feature information of speech accurately, and better achieves the purpose of speech enhancement.

Drawings

FIG. 1 is a flowchart of the autocorrelation anti-noise method according to an embodiment;

FIG. 2 is a waveform of the original speech signal "hello" used for the autocorrelation anti-noise method in the embodiment;

FIG. 3 shows the autocorrelation analysis of one frame of the speech signal of FIG. 2;

FIG. 4 shows the simulated speech enhancement result of the conventional autocorrelation anti-noise method;

FIG. 5 is a flow chart of a conventional homomorphic filtering anti-noise method;

FIG. 6 is a diagram illustrating the effect of a conventional homomorphic filtering anti-noise method;

FIG. 7 is a diagram of feature extraction with conventional homomorphic filtering;

FIG. 8 is a flow chart of an improved homomorphic filtering anti-noise method;

FIG. 9 shows the simulated speech enhancement result of the improved homomorphic filtering anti-noise method;

FIG. 10 is a simulation diagram of the speech feature parameters extracted from the output of the improved homomorphic filtering.

Detailed Description

The embodiments of the present invention are described below with reference to the accompanying drawings:

the invention improves on the traditional autocorrelation anti-noise method and the traditional homomorphic filtering anti-noise method, yielding an improved homomorphic filtering anti-noise method.

Traditional autocorrelation anti-noise method

The autocorrelation anti-noise method performs autocorrelation analysis on the noisy speech signal to obtain an autocorrelation sequence that approximates that of the noise-free speech signal. Because the speech signal is uncorrelated with the noise, the cross terms vanish, and for broadband noise the noise's own autocorrelation is concentrated near zero lag; the autocorrelation of the noisy speech can therefore be approximated by that of the clean speech, and anti-noise processing can be achieved by using the autocorrelation coefficients as the feature values of the speech processing system.
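The approximation can be checked numerically. The sketch below assumes a synthetic 100 Hz sinusoid standing in for voiced speech and white Gaussian noise standing in for broadband noise (neither is the recording used in this document; sampling rate, length and noise level are arbitrary demo values). It shows that adding the noise mainly raises lag 0 of the autocorrelation.

```python
import numpy as np


def autocorr(x, max_lag):
    # Biased short-time autocorrelation (divided by N) so lag values are comparable.
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)]) / n


rng = np.random.default_rng(0)        # fixed seed so the demo is repeatable
fs, n, max_lag = 8000, 4000, 200       # assumed demo parameters, not from the patent
t = np.arange(n) / fs
clean = np.sin(2 * np.pi * 100 * t)    # stand-in "voiced" signal with a 10 ms period
noise = 0.5 * rng.standard_normal(n)   # broadband (white) noise with variance 0.25
noisy = clean + noise

diff = autocorr(noisy, max_lag) - autocorr(clean, max_lag)
print(f"change at lag 0: {diff[0]:.3f}")                      # close to the noise variance
print(f"largest change at lags >= 1: {np.abs(diff[1:]).max():.3f}")
```

The lag-0 change is close to the noise variance, while the changes at nonzero lags are much smaller. This is the sense in which the noisy autocorrelation approximates the clean one for broadband noise.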

The processing flow of the autocorrelation anti-noise method is shown in fig. 1.

In an ordinary indoor environment, Cooledit was used to record a noisy speech signal of a girl saying "hello", sampled at 22 kHz in mono, as shown in FIG. 2.

One frame of the speech signal is extracted for autocorrelation analysis, as shown in FIG. 3.

As can be seen from FIG. 3, the original noisy speech frame has a certain periodicity but contains many harmonic components, and its peaks are not sharp, which may introduce errors when the feature parameters are extracted.

FIG. 4 shows the simulated speech enhancement result of applying the conventional autocorrelation anti-noise method to the original noisy speech signal.

The simulation in FIG. 4 shows that the harmonic components are reduced after filtering and the curve becomes smoother, but the second harmonic remains, the peaks are still not sharp, and errors are still easily introduced when speech signal features are extracted.

Traditional homomorphic filtering anti-noise method

Speech is not an additive signal but a convolutional one. To process it with a linear system, a homomorphic system for convolution can be used.
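Under the standard source-filter assumption (not stated in the original) that voiced speech s(n) is an excitation e(n) convolved with a vocal-tract impulse response h(n), the homomorphic (cepstral) transform turns that convolution into a sum:

$$s(n) = e(n) * h(n) \;\Rightarrow\; S(e^{j\omega}) = E(e^{j\omega})\, H(e^{j\omega})$$

$$\ln\lvert S(e^{j\omega})\rvert = \ln\lvert E(e^{j\omega})\rvert + \ln\lvert H(e^{j\omega})\rvert$$

$$c_s(n) = \mathcal{F}^{-1}\left[\ln\lvert S(e^{j\omega})\rvert\right] = c_e(n) + c_h(n)$$

Because h(n) is short, c_h(n) concentrates at low quefrencies, whereas a periodic excitation contributes peaks at multiples of the pitch period. This separation is what allows the two components to be isolated by simple windowing (liftering) in the cepstral domain.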

Assume the processed speech signal s(n) occupies the interval [0, N-1]; the length N used here may be chosen larger than the actual signal length. The value of N determines whether the cepstrum c(n) is aliased, that is, whether the discrete spectrum has sufficient resolution. When N exceeds the actual length of s(n), zeros are appended after s(n) to reach the required length; this is called "zero padding". If s(n) can be recovered through homomorphic filtering, the cepstrum has no aliasing distortion, and the cepstrum c(n) of s(n) can be computed directly.

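Let

$$S(e^{j\omega}) = \mathcal{F}[s(n)] = \sum_{n=0}^{N-1} s(n)\, e^{-j\omega n}$$

then taking its logarithm gives:

$$\ln S(e^{j\omega}) = \ln\lvert S(e^{j\omega})\rvert + j\arg\left[S(e^{j\omega})\right]$$

The logarithm of a complex number is still a complex number, with a real part and an imaginary part. The imaginary part, arg[S(e^{jω})], is the phase of S(e^{jω}); because the phase is defined only modulo 2π, it produces inconsistencies (phase wrapping). If only the real part ln|S(e^{jω})| is retained, then:

$$c(n) = \mathcal{F}^{-1}\left[\ln\lvert S(e^{j\omega})\rvert\right] \qquad \text{(formula C)}$$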

c(n), the inverse Fourier transform of the log-magnitude spectrum of the speech signal s(n), is called the (real) "cepstrum".
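Formula C can be evaluated on a discrete frequency grid with an FFT. The sketch below is a minimal NumPy version; the function name, the `nfft` zero-padding parameter and the small `eps` floor (to avoid log(0) at spectral nulls) are assumptions of this illustration.

```python
from typing import Optional

import numpy as np


def real_cepstrum(s, nfft: Optional[int] = None, eps: float = 1e-12) -> np.ndarray:
    """c(n) = IFFT(ln|FFT(s)|): formula C evaluated on an nfft-point grid."""
    s = np.asarray(s, dtype=float)
    nfft = len(s) if nfft is None else nfft
    if nfft < len(s):
        raise ValueError("nfft must be >= len(s); pad with zeros, do not truncate")
    spectrum = np.fft.fft(s, n=nfft)          # np.fft.fft zero-pads s up to nfft points
    log_mag = np.log(np.abs(spectrum) + eps)  # eps avoids log(0) at spectral nulls
    return np.fft.ifft(log_mag).real          # log|S| is even for real s, so c(n) is real
```

Choosing `nfft` larger than the frame length is the "zero padding" described above: it samples the log spectrum more densely and reduces time aliasing of c(n). As a sanity check, a unit impulse has a flat unit spectrum, so its cepstrum is (numerically) zero.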

The flow of the conventional homomorphic filtering anti-noise method is shown in FIG. 5.

Wherein, A is a short-time voice signal; b is a short-time frequency spectrum; c is a logarithmic spectrum; d is a cepstrum coefficient; e is a logarithmic spectrum envelope; f is the fundamental period.
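The last stage of this flow (F, the fundamental period) is commonly read from the largest cepstral peak in the plausible pitch range. The sketch below assumes a Hamming window and a 60 to 400 Hz search range; the function name and these values are illustrative choices, not taken from the patent.

```python
import numpy as np


def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0, eps=1e-12):
    """Estimate F0 (Hz) from the largest real-cepstrum peak in the pitch quefrency range.

    fmin/fmax bound the search and are assumed values, not taken from the patent.
    """
    frame = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(frame)) + eps)).real
    q_lo = int(fs / fmax)                      # shortest candidate period, in samples
    q_hi = min(int(fs / fmin), len(c) // 2)    # longest candidate period, in samples
    q_peak = q_lo + int(np.argmax(c[q_lo:q_hi + 1]))
    return fs / q_peak                         # fundamental frequency = 1 / period


# Synthetic check: a pulse train with an 80-sample period at 8 kHz (100 Hz)
# passed through a short decaying filter standing in for the vocal tract.
fs = 8000
excitation = np.zeros(512)
excitation[::80] = 1.0
frame = np.convolve(excitation, 0.9 ** np.arange(64))[:512]
f0 = cepstral_pitch(frame, fs)
```

The peak lands near quefrency 80 samples, so `f0` comes out close to 100 Hz. The vocal-tract part of the cepstrum stays at low quefrencies, below the search range.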

FIG. 6 shows the simulated speech enhancement result of applying the conventional homomorphic filtering anti-noise method to the original noisy speech signal.

The simulation in FIG. 6 shows that, with the conventional homomorphic filtering anti-noise method, the first cepstral peak of the speech signal is still not obvious, and there are many high-frequency components between the origin and the second peak, which reduces the accuracy of the observed feature values.

FIG. 7 shows the feature parameters of the speech signal extracted from the conventional homomorphic filtering output.

As can be seen from FIG. 7, in the conventional homomorphic filtering output the first peak near the origin is at (12, 0.12). This peak is relatively flat, which causes large errors in feature extraction, and the noise-reduced speech signal still contains considerable residual noise.

Method of the invention

In the present invention, the improved homomorphic filtering anti-noise method applies the same preprocessing to the speech signal as the traditional homomorphic filtering anti-noise method: the speech signal is digitized and preprocessed to obtain short-time stationary speech signals. These are then subjected to autocorrelation analysis to obtain the autocorrelation coefficients, which in turn undergo homomorphic filtering analysis. During homomorphic processing, the cepstrum of the noisy speech signal is computed and its noise components are removed to obtain the cepstrum of the enhanced speech; the noise-reduced feature parameters are obtained through spectrum analysis, and the noise-reduced speech signal is synthesized to achieve speech enhancement. The specific algorithm flow is shown in FIG. 8.

Wherein, A is a short-time voice signal; b is an autocorrelation coefficient; c is an autocorrelation short-time spectrum; d is an autocorrelation logarithmic spectrum; e is the improved cepstral coefficient; f is a logarithmic spectrum envelope; g is the fundamental period.
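The document does not specify how the signal is preprocessed into short-time stationary segments (A in FIG. 8). A common choice, assumed here, is pre-emphasis followed by overlapping Hamming-windowed frames. The coefficient 0.97, the 20 ms frame length and the 10 ms hop are typical defaults, not values from the patent, and the function name is hypothetical.

```python
import numpy as np


def preprocess(x, fs, frame_ms=20.0, hop_ms=10.0, alpha=0.97):
    """Split a digitized signal into short-time, approximately stationary frames.

    Pre-emphasis (alpha), frame length and hop are assumed common defaults;
    the patent does not specify its preprocessing parameters.
    """
    x = np.asarray(x, dtype=float)
    x = np.append(x[0], x[1:] - alpha * x[:-1])      # pre-emphasis y(n) = x(n) - a*x(n-1)
    frame_len = int(round(fs * frame_ms / 1000))
    hop = int(round(fs * hop_ms / 1000))
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))       # zero-pad inputs shorter than one frame
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples [i*hop, i*hop + frame_len) of the pre-emphasized signal.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx] * np.hamming(frame_len)            # one windowed frame per row
```

Each row of the result is one short-time frame that can be passed to the autocorrelation step.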

1) Autocorrelation of the short-time speech signal:

assuming that the speech signal s_n(m) occupies the interval [0, N-1], the short-time autocorrelation of the speech signal is:

2) homomorphically processing the autocorrelation coefficients:

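a) apply an FFT to the autocorrelation coefficients to obtain R_n(e^{jω});

b) take the logarithm of the real part of R_n(e^{jω}):

$$\ln \operatorname{Re}\left[R_n(e^{j\omega})\right]$$

c) apply an inverse FFT to the result to obtain the improved cepstrum c_n(m):

$$c_n(m) = \mathrm{IFFT}\left\{\ln \operatorname{Re}\left[R_n(e^{j\omega})\right]\right\}$$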
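Steps 1) and 2a) to 2c) can be chained as in the NumPy sketch below. It rests on several assumptions not fixed by the text. The autocorrelation is taken in unnormalized form. The FFT length defaults to twice the number of lags (zero padding). Because the real part of the FFT of a one-sided lag sequence can be zero or negative, the sketch takes its absolute value plus a small floor before the logarithm. The function name is hypothetical.

```python
import numpy as np


def improved_cepstrum(frame, max_lag, nfft=None, eps=1e-12):
    """Sketch of steps 1) and 2a)-2c): autocorrelation, FFT, log of real part, IFFT."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag (L) must satisfy 0 <= L < N")
    # 1) Short-time autocorrelation R_n(k), k = 0..L (assumed unnormalized form).
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(max_lag + 1)])
    # Assumed FFT length: twice the number of lags, i.e. zero padding.
    nfft = 2 * (max_lag + 1) if nfft is None else nfft
    if nfft < len(r):
        raise ValueError("nfft must be >= max_lag + 1")
    # a) FFT of the autocorrelation coefficients: R_n(e^{jw}) on nfft frequency bins.
    spectrum = np.fft.fft(r, n=nfft)
    # b) Logarithm of the real part. Re{R} can be zero or negative for a one-sided
    #    lag range, so abs() plus eps is an assumption of this sketch.
    log_re = np.log(np.abs(spectrum.real) + eps)
    # c) Inverse FFT; log_re is even in frequency, so the result is real.
    return np.fft.ifft(log_re).real
```

The returned index m plays the role of quefrency in samples. A pitch estimate could then be taken from the largest peak of c_n(m) in the expected pitch range, analogous to the conventional flow.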

FIG. 9 shows the simulated speech enhancement result of applying the improved homomorphic filtering anti-noise method to the original noisy speech signal.

The simulation in FIG. 9 shows that, with the improved homomorphic filtering anti-noise method, the harmonic components are markedly reduced and the periodicity becomes clearer, which improves the accuracy of extracting the feature parameters of the noisy speech signal.

FIG. 10 shows the simulation of the speech feature parameters extracted from the improved homomorphic filtering output.

As can be seen from FIG. 10, in the improved homomorphic filtering output the first peak near the origin is at (11, 0.34). This peak is relatively sharp and closer to the origin, which improves the accuracy of feature extraction and allows the noise-reduced speech signal to be synthesized better, thereby achieving speech enhancement.

Comparative analysis of the method of the present invention and the conventional homomorphic filtering anti-noise method

To further compare the performance of the conventional homomorphic filtering anti-noise method with that of the method of the present invention, 20 simulation experiments were performed on the noisy speech recording: the two algorithms were compared 10 times on speech signals with a high signal-to-noise ratio (SNR) and 10 times on speech signals with a low SNR, and the average relative error (in percent) was calculated.

The average relative errors for the high-SNR speech signals are shown in Table 1, and those for the low-SNR speech signals in Table 2.

TABLE 1

Detection algorithm	Traditional homomorphic filtering	Correlation-homomorphic filtering
Average relative error (%)	0.55	0.46

TABLE 2

Detection algorithm	Traditional homomorphic filtering	Correlation-homomorphic filtering
Average relative error (%)	20.3	15.9

The SNR comparison on the noisy speech signals shows that the correlation-homomorphic filtering method is superior to the traditional homomorphic filtering method: its average relative error is smaller and its accuracy higher, which is more favorable for speech recognition and speech synthesis.
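Tables 1 and 2 report an average relative error in percent, but the text does not give the formula or say which quantity (for example, the pitch period) is compared with its reference. Assuming the usual definition, the mean of |estimate - reference| / |reference| times 100, a minimal sketch is shown below. The function name is hypothetical.

```python
import numpy as np


def average_relative_error(estimates, references):
    """Mean of |estimate - reference| / |reference| over all trials, in percent."""
    est = np.asarray(estimates, dtype=float)
    ref = np.asarray(references, dtype=float)
    if est.shape != ref.shape or np.any(ref == 0):
        raise ValueError("need equal shapes and non-zero reference values")
    return 100.0 * float(np.mean(np.abs(est - ref) / np.abs(ref)))
```

For instance, estimates of 110 and 95 against a reference of 100 in both trials give (10% + 5%) / 2 = 7.5%.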

The noise may be additive or non-additive. The broadband noise caused by breathing during speech is non-additive and must be converted into additive noise by homomorphic filtering, so that the clean speech signal can be separated from the noise signal. The above analysis shows that the improved homomorphic filtering anti-noise method enhances the robustness of the speech cepstrum features to environmental noise, obtains the feature information of speech accurately, and better achieves the purpose of speech enhancement.

The foregoing is a preferred embodiment of the present invention, and it should be noted that it would be apparent to those skilled in the art that various modifications and enhancements can be made without departing from the principles of the invention, and such modifications and enhancements are also considered to be within the scope of the invention.

Claims

1. A homomorphic filtering speech enhancement method based on broadband noise is characterized in that the method specifically comprises the following steps:

obtaining a short-time stationary speech signal, performing autocorrelation analysis on the speech signal to obtain autocorrelation coefficients, and performing homomorphic filtering analysis on the autocorrelation coefficients, wherein during the homomorphic filtering analysis the cepstrum of the noisy speech signal is computed and its noise component is removed to obtain the cepstrum of the enhanced speech; obtaining the noise-reduced feature parameters through spectrum analysis; and synthesizing the noise-reduced speech signal;

wherein performing autocorrelation analysis on the speech signal to obtain the autocorrelation coefficients specifically comprises:

assuming that the speech signal s_n(m) occupies the interval [0, N-1], the short-time autocorrelation of the speech signal is:

where L is the maximum number of delay points;

the homomorphic filtering analysis processing of the autocorrelation coefficient specifically comprises the following steps:

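applying an FFT to the autocorrelation coefficients to obtain R_n(e^{jω});

taking the logarithm of the real part of R_n(e^{jω}):

$$\ln \operatorname{Re}\left[R_n(e^{j\omega})\right]$$

applying an inverse FFT to the result to obtain the improved c_n(m):

$$c_n(m) = \mathrm{IFFT}\left\{\ln \operatorname{Re}\left[R_n(e^{j\omega})\right]\right\}$$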