CN105869649B - Perceptual filtering method and perceptual filter - Google Patents
Perceptual filtering method and perceptual filter
- Publication number
- CN105869649B (application CN201510031872.9A)
- Authority
- CN
- China
- Prior art keywords
- frequency domain
- noise
- voice
- gain
- power
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Telephone Function (AREA)
Abstract
The invention provides a perceptual filtering method comprising the following steps: acquiring noisy speech and calculating the noise power from the noisy speech according to a noise estimation algorithm; calculating a frequency domain masking threshold from the noisy speech according to a masking model; converting the noisy speech into the frequency domain to obtain frequency domain noisy speech, which comprises frequency domain clean speech and frequency domain background noise; based on a speech estimation error algorithm, expressing the speech signal distortion as an expression in the frequency domain clean speech and the perceptual filter gain, and expressing the filtered background noise as an expression in the frequency domain background noise and the perceptual filter gain; constructing an equation for the perceptual filter gain based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency domain masking threshold; solving the equation to obtain the perceptual filter gain; and filtering the noisy speech according to the perceptual filter gain to obtain enhanced speech, so that the subjective perceptual quality of the enhanced speech is improved. A perceptual filter is also provided.
Description
Technical Field
The present invention relates to the field of speech signal processing, and in particular, to a perceptual filtering method and a perceptual filter.
Background
In real life, speech signals are inevitably contaminated by background noise. As a signal-processing approach, speech enhancement is an effective way to deal with this noise and has long been a research hotspot in the field of speech signal processing. The purpose of speech enhancement is to remove as much background noise as possible and improve the subjective auditory quality of the speech while preserving its intelligibility.
Conventional speech enhancement algorithms include spectral subtraction, Wiener filtering, minimum mean square error (MMSE) estimation, log-spectral amplitude MMSE, Discrete Cosine Transform (DCT)-based enhancement methods, and the like. Most of these methods rely on statistical models of the speech and noise components in the frequency domain, combined with various estimation theories, to design targeted noise-removal techniques. However, because the assumed models deviate from the actual situation, conventional speech enhancement algorithms still leave considerable speech distortion and residual noise in the enhanced signal, which degrades the enhancement result.
Disclosure of Invention
In view of the above, it is desirable to provide a perceptual filtering method and a perceptual filter that reduce the noise level below the auditory masking threshold of the human ear to improve the subjective perceptual quality of enhanced speech.
A method of perceptual filtering, the method comprising:
acquiring a voice with noise, and calculating the voice with noise according to a noise estimation algorithm to obtain noise power;
calculating the voice with noise according to a masking model to obtain a frequency domain masking threshold;
converting the voice with noise into a frequency domain to obtain the voice with noise of the frequency domain, wherein the voice with noise of the frequency domain comprises pure voice of the frequency domain and background noise of the frequency domain;
based on a speech estimation error algorithm, representing the speech signal distortion as a relational expression about the frequency domain pure speech and the gain of a perception filter, and representing the filtering background noise as a relational expression about the frequency domain background noise and the gain of the perception filter;
according to the voice signal distortion and the filtering background noise, constructing an equation related to the gain of the perception filter based on the relation that the sum of the voice signal distortion power and the filtering background noise power is less than or equal to the frequency domain masking threshold;
solving the equation to obtain the gain of the perceptual filter;
and according to the gain of the perception filter, carrying out filtering processing on the voice with noise to obtain enhanced voice.
In one embodiment, the step of constructing an equation regarding the perceptual filter gain based on the relationship that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency domain masking threshold according to the speech signal distortion and the filtered background noise is as follows:
obtaining the distortion power of the voice signal according to the distortion of the voice signal;
obtaining the filtering background noise power according to the filtering background noise;
obtaining the equation (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) based on the relation that the sum of the distortion power of the voice signal and the power of the filtering background noise is equal to the masking threshold of the frequency domain, where G(k) is the perceptual filter gain, k is the spectrum serial number, P_S(k) is the frequency domain pure speech power, P_Z(k) is the frequency domain background noise power, and T(k) is the frequency domain masking threshold.
In one embodiment, the step of solving the equation to obtain the perceptual filter gain comprises:
calculating to obtain the frequency domain background noise power by adopting an approximate algorithm according to the noise power;
calculating to obtain a posterior signal-to-noise ratio according to the frequency domain background noise power;
calculating to obtain a prior signal-to-noise ratio based on a direct decision algorithm according to the posterior signal-to-noise ratio;
and solving the equation according to the frequency domain background noise power, the prior signal-to-noise ratio and the frequency domain masking threshold to obtain the gain of the perception filter.
In one embodiment, before the step of converting the noisy speech into the frequency domain to obtain a frequency-domain noisy speech, where the frequency-domain noisy speech includes frequency-domain clean speech and frequency-domain background noise, the method further includes:
the method comprises the following steps of enhancing the voice with noise by adopting a short-time amplitude spectrum estimation method to obtain enhanced voice with noise, converting the voice with noise into a frequency domain to convert the enhanced voice with noise into the frequency domain, filtering the voice with noise to obtain enhanced voice, filtering the enhanced voice with noise to obtain enhanced voice, and calculating the background noise power of the frequency domain by adopting an approximate algorithm according to the noise power:
obtaining the frequency domain gain function G_H(k) based on the short-time amplitude spectrum estimation method, wherein k is a spectrum serial number;
obtaining the frequency domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k))|Y(k)|^2, wherein λ_d(k) is the noise power and Y(k) is the frequency domain noisy speech.
In one embodiment, the step of calculating the prior snr based on a direct decision algorithm according to the a posteriori snr comprises:
obtaining the posterior signal-to-noise ratios of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, wherein k is a spectrum serial number, l is a frame serial number, and the current frame is frame l;
acquiring a previous frame perceptual filter gain G (k, l-1), wherein if the previous frame is a first frame, the previous frame perceptual filter gain is a preset value;
obtaining the prior signal-to-noise ratio of the current frame from the posterior signal-to-noise ratios and the perceptual filter gain according to the formula ξ'(k, l) = η·G^2(k, l-1)·γ'(k, l-1) + (1 - η)·max{γ'(k, l) - 1, 0}, wherein η is a smoothing factor and 0 < η < 1.
A perceptual filter, the perceptual filter comprising:
the acquisition module is used for acquiring the voice with noise;
the noise power calculation module is used for calculating the voice with noise according to a noise estimation algorithm to obtain noise power;
the masking threshold calculation module is used for calculating the voice with the noise according to a masking model to obtain a frequency domain masking threshold;
the frequency domain conversion module is used for converting the voice with noise into a frequency domain to obtain the voice with noise in the frequency domain, wherein the voice with noise in the frequency domain comprises pure voice in the frequency domain and background noise in the frequency domain;
an equation constructing module, configured to represent, based on a speech estimation error algorithm, speech signal distortion as a relational expression about the frequency-domain clean speech and the gain of the perceptual filter, represent filtering background noise as a relational expression about the frequency-domain background noise and the gain of the perceptual filter, and construct, according to the speech signal distortion and the filtering background noise, an equation about the gain of the perceptual filter based on a relational expression that a sum of speech signal distortion power and filtering background noise power is less than or equal to a frequency-domain masking threshold;
the gain solving module is used for solving the equation to obtain the gain of the perception filter;
and the filtering processing module is used for filtering the voice with noise according to the gain of the perception filter to obtain enhanced voice.
In one embodiment, the equation constructing module constructs an equation about the perceptual filter gain according to the speech signal distortion and the filtering background noise, based on a relationship that a sum of the speech signal distortion power and the filtering background noise power is less than or equal to a frequency domain masking threshold, specifically:
obtaining the distortion power of the voice signal according to the distortion of the voice signal;
obtaining the filtering background noise power according to the filtering background noise;
obtaining the equation (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) based on the relation that the sum of the distortion power of the voice signal and the power of the filtering background noise is equal to the masking threshold of the frequency domain, where G(k) is the perceptual filter gain, k is the spectrum serial number, P_S(k) is the frequency domain pure speech power, P_Z(k) is the frequency domain background noise power, and T(k) is the frequency domain masking threshold.
In one embodiment, the gain solving module comprises:
the solving preparation unit is used for calculating to obtain the frequency domain background noise power by adopting an approximation algorithm according to the noise power, calculating to obtain a posterior signal-to-noise ratio according to the frequency domain background noise power, and calculating to obtain a prior signal-to-noise ratio based on a direct decision algorithm according to the posterior signal-to-noise ratio;
and the solving unit is used for solving the equation according to the frequency domain background noise power, the prior signal-to-noise ratio and the frequency domain masking threshold value to obtain the gain of the perceptual filter.
In one embodiment, the perceptual filter further comprises:
the enhancement module is used for enhancing the voice with noise by adopting a short-time amplitude spectrum estimation method to obtain the enhanced voice with noise;
the frequency domain conversion module converts the noisy speech into a frequency domain to convert the enhanced noisy speech into the frequency domain;
the filtering processing module is used for filtering the voice with noise to obtain enhanced voice, and filtering the enhanced voice with noise to obtain enhanced voice;
the calculation preparation unit calculates the frequency domain background noise power by adopting an approximation algorithm according to the noise power, and specifically comprises the following steps:
obtaining the frequency domain gain function G based on the short-time amplitude spectrum estimation methodH(k) Wherein k is a frequency spectrum serial number;
according to PZ(k)=λd(k)-(1-GH(k))|Y(k)|2Obtaining the frequency domain background noise power PZ(k) Wherein λ isd(k) Y (k) is the noise power, and Y (k) is the frequency domain noisy speech.
In one embodiment, the prior snr obtained by the solution preparation unit based on the direct decision algorithm according to the a posteriori snr is specifically:
obtaining the posterior signal-to-noise ratios of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, wherein k is a spectrum serial number, l is a frame serial number, and the current frame is frame l;
acquiring a previous frame perceptual filter gain G (k, l-1), wherein if the previous frame is a first frame, the previous frame perceptual filter gain is a preset value;
obtaining the prior signal-to-noise ratio of the current frame from the posterior signal-to-noise ratios and the perceptual filter gain according to the formula ξ'(k, l) = η·G^2(k, l-1)·γ'(k, l-1) + (1 - η)·max{γ'(k, l) - 1, 0}, wherein η is a smoothing factor and 0 < η < 1.
According to the perceptual filtering method and the perceptual filter described above, noisy speech is acquired; the noise power is calculated from the noisy speech according to a noise estimation algorithm; a frequency domain masking threshold is calculated from the noisy speech according to a masking model; the noisy speech is converted into the frequency domain to obtain frequency domain noisy speech, which comprises frequency domain clean speech and frequency domain background noise; based on a speech estimation error algorithm, the speech signal distortion is expressed in terms of the frequency domain clean speech and the perceptual filter gain, and the filtered background noise is expressed in terms of the frequency domain background noise and the perceptual filter gain; from the speech signal distortion and the filtered background noise, an equation for the perceptual filter gain is constructed based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the perceptual filter gain to obtain enhanced speech. Because the sum of the speech signal distortion power and the filtered background noise power is kept at or below the frequency domain masking threshold, the speech signal distortion power is kept small while the noise level stays below the auditory masking threshold of the human ear and cannot be heard, thereby improving the subjective perceptual quality of the enhanced speech.
Drawings
FIG. 1 is a flow diagram of a method of perceptual filtering in one embodiment;
FIG. 2 is a flow diagram for constructing equations relating perceptual filter gains in one embodiment;
FIG. 3 is a flow diagram of solving equations to obtain perceptual filter gains in one embodiment;
FIG. 4 is a block diagram of the architecture of the speech enhancement system in one embodiment;
FIG. 5 is a block diagram of the structure of a perceptual filter in one embodiment;
FIG. 6 is a block diagram of a gain solver module in one embodiment;
fig. 7 is a block diagram of the structure of a perceptual filter in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, there is provided a perceptual filtering method comprising the steps of:
step S110, obtaining the voice with noise, and calculating the voice with noise according to a noise estimation algorithm to obtain noise power.
In this embodiment, the obtained noisy speech is represented in the time domain as y(n) = s(n) + z(n), where s(n) is the clean speech signal and z(n) is the background noise in the original noisy speech. The noise estimation algorithm may be an existing algorithm; the frequency domain noise power λ_d(k) is calculated from the noisy speech y(n) = s(n) + z(n) according to the noise estimation algorithm, where k is the spectrum serial number.
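The patent leaves the choice of noise estimation algorithm open ("may adopt an existing algorithm"). Purely for illustration, the sketch below estimates λ_d(k) by averaging the power spectra of an assumed noise-only lead-in segment; the frame length, hop size, number of noise-only frames, and the noise-only assumption itself are illustrative choices, not part of the patent.

```python
import numpy as np

def estimate_noise_power(y, frame_len=512, hop=256, noise_frames=10):
    """Rough estimate of the frequency domain noise power lambda_d(k).

    Illustrative assumption: the first `noise_frames` frames of the noisy
    speech y(n) contain background noise only, so their power spectra are
    averaged. Any existing noise estimator could be substituted here.
    """
    win = np.hanning(frame_len)
    spectra = []
    for i in range(noise_frames):
        frame = y[i * hop: i * hop + frame_len]
        if len(frame) < frame_len:
            break
        spectra.append(np.abs(np.fft.rfft(frame * win)) ** 2)
    return np.mean(spectra, axis=0)  # lambda_d(k), one value per frequency bin k
```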
And step S120, calculating the voice with noise according to the masking model to obtain a frequency domain masking threshold.
In this embodiment, the masking model may be an existing masking model, such as a psychoacoustic model, and the frequency domain masking threshold T(k) of the frequency domain noisy speech Y(k) is calculated according to the masking model.
Step S130, converting the speech with noise into a frequency domain to obtain a frequency domain speech with noise, including a frequency domain pure speech and a frequency domain background noise.
In this embodiment, the noisy speech y(n) = s(n) + z(n) is transformed into the frequency domain through an FFT to obtain the frequency domain noisy speech Y(k), written as Y(k) = S(k) + Z(k), where S(k) is the frequency domain pure speech, Z(k) is the frequency domain background noise, and k is the spectrum serial number. It will be appreciated that the noisy speech may already have been processed by a speech enhancement algorithm, such as a speech enhancement method based on short-time spectral amplitude estimation; in that case z(n) is the residual noise remaining after that method.
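For concreteness, a minimal sketch of the frequency domain conversion in step S130 using a per-frame windowed FFT. The Hann window, frame length, and hop size are illustrative assumptions; the patent only requires that y(n) = s(n) + z(n) be transformed so that Y(k) = S(k) + Z(k).

```python
import numpy as np

def to_frequency_domain(y, frame_len=512, hop=256):
    """Split the time domain noisy speech y(n) into overlapping windowed
    frames and return the per-frame spectra Y(k, l) (bin k, frame l)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(y) - frame_len) // hop
    Y = np.empty((frame_len // 2 + 1, n_frames), dtype=complex)
    for l in range(n_frames):
        Y[:, l] = np.fft.rfft(y[l * hop: l * hop + frame_len] * win)
    return Y
```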
Step S140, based on the speech estimation error algorithm, representing the speech signal distortion as a relational expression about the frequency domain pure speech and the gain of the perception filter, and representing the filtering background noise as a relational expression about the frequency domain background noise and the gain of the perception filter.
In this embodiment, the frequency domain enhanced speech after denoising by the perceptual filter is Ŝ(k) = G(k)Y(k). The speech estimation error is defined as E(k) = S(k) - Ŝ(k), giving E(k) = S(k) - G(k)Y(k), where E(k) is the speech estimation error, S(k) is the frequency domain pure speech, G(k) is the perceptual filter gain, and Y(k) is the frequency domain noisy speech. Substituting Y(k) = S(k) + Z(k) gives E(k) = S(k) - G(k)(S(k) + Z(k)), where Z(k) is the frequency domain background noise. Rewriting the speech estimation error as E(k) = (1 - G(k))S(k) - G(k)Z(k) yields the speech signal distortion ε_S(k) = |(1 - G(k))S(k)| and the filtered background noise ε_Z(k) = |-G(k)Z(k)| = |G(k)Z(k)|.
And step S150, constructing an equation related to the gain of the perception filter according to the distortion of the voice signal and the filtering background noise and based on the relation that the sum of the distortion power of the voice signal and the power of the filtering background noise is less than or equal to the masking threshold of the frequency domain.
In this embodiment, the speech signal distortion power is E_S(k) = E{ε_S^T(k) ε_S(k)} and the filtered background noise power is E_Z(k) = E{ε_Z^T(k) ε_Z(k)}, where E{·} denotes expectation and T denotes the matrix transpose. Taking the masking effect of the human ear into account, the optimal gain function G(k) should make the speech distortion as small as possible while keeping the background noise below the masking threshold of the human ear; if the speech distortion is too large, audible distortion appears and the subjective perceptual quality suffers. This embodiment therefore requires that the sum of the speech signal distortion power E_S(k) and the filtered background noise power E_Z(k) be less than or equal to the frequency domain masking threshold T(k), i.e. E_S(k) + E_Z(k) ≤ T(k). An equation for G(k) can be constructed by choosing a customized relationship between E_S(k) + E_Z(k) and T(k) that satisfies E_S(k) + E_Z(k) ≤ T(k), e.g. E_S(k) + E_Z(k) = T(k)/2.
In one embodiment, as shown in fig. 2, the equation for G(k) is constructed from the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency domain masking threshold, and step S150 includes the following steps:
and step S151, obtaining the distortion power of the voice signal according to the distortion of the voice signal.
Specifically, substituting the speech signal distortion ε_S(k) = |(1 - G(k))S(k)| into E_S(k) = E{ε_S^T(k) ε_S(k)} gives the speech signal distortion power E_S(k) = (G(k)-1)^2 P_S(k), where P_S(k) = E{S^T(k) S(k)} is the frequency domain pure speech power.
And S152, obtaining the filtering background noise power according to the filtering background noise.
Specifically, substituting the filtered background noise ε_Z(k) = |G(k)Z(k)| into E_Z(k) = E{ε_Z^T(k) ε_Z(k)} gives the filtered background noise power E_Z(k) = (G(k))^2 P_Z(k), where P_Z(k) = E{Z^T(k) Z(k)} is the frequency domain background noise power.
Step S153, based on the relation that the sum of the distortion power of the voice signal and the power of the filtering background noise is equal to the masking threshold of the frequency domain, obtaining the equation (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectrum serial number, P_S(k) is the frequency domain pure speech power, P_Z(k) is the frequency domain background noise power, and T(k) is the frequency domain masking threshold.
Specifically, substituting the speech signal distortion power E_S(k) = (G(k)-1)^2 P_S(k) and the filtered background noise power E_Z(k) = (G(k))^2 P_Z(k) into E_S(k) + E_Z(k) = T(k) yields (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectrum serial number, P_S(k) is the frequency domain pure speech power, P_Z(k) is the frequency domain background noise power, and T(k) is the frequency domain masking threshold.
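Gathering the formulas of steps S151 to S153 in one place, in standard notation:

```latex
\begin{aligned}
E_S(k) &= E\{\varepsilon_S^{T}(k)\,\varepsilon_S(k)\} = (G(k)-1)^2 P_S(k),
  &P_S(k) &= E\{S^{T}(k)S(k)\},\\
E_Z(k) &= E\{\varepsilon_Z^{T}(k)\,\varepsilon_Z(k)\} = (G(k))^2 P_Z(k),
  &P_Z(k) &= E\{Z^{T}(k)Z(k)\},\\
E_S(k) + E_Z(k) &= T(k)
  \;\Longrightarrow\; (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k).
\end{aligned}
```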
And step S160, solving the equation to obtain the perceptual filter gain.
In this embodiment, (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) is a quadratic equation in the single unknown G(k). It may be solved by first calculating P_S(k) and P_Z(k) and then applying the quadratic root formula, or the equation may first be transformed and then solved. Since the quadratic equation may have no real solution, G(k) can be set to a customized value in that case.
In one embodiment, as shown in fig. 3, step S160 includes the following steps:
and step S161, calculating to obtain frequency domain background noise power by adopting an approximation algorithm according to the noise power.
Specifically, the noise power λ_d(k) is approximately equal to the frequency domain background noise power P_Z(k), so the frequency domain background noise power is obtained as P_Z(k) = λ_d(k). It will be appreciated that if the acquired noisy speech has been processed by a speech enhancement algorithm, the approximation may differ and a customized approximation algorithm can be used.
And step S162, calculating according to the frequency domain background noise power to obtain a posterior signal-to-noise ratio, and calculating according to the posterior signal-to-noise ratio based on a direct decision algorithm to obtain a prior signal-to-noise ratio.
In the present embodiment, the posterior signal-to-noise ratio γ'(k) is defined as γ'(k) = |Y(k)|^2 / P_Z(k), where Y(k) is the frequency domain noisy speech, |Y(k)| is its spectral amplitude, and P_Z(k) is the frequency domain background noise power. An existing direct-decision (decision-directed) algorithm can then be used to calculate the prior signal-to-noise ratio ξ'(k).
And step S163, solving an equation according to the frequency domain background noise power, the prior signal-to-noise ratio and the frequency domain masking threshold to obtain the gain of the perception filter.
In this embodiment, suppose the equation constructed from E_S(k) + E_Z(k) = T(k) is (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k). Taking the solution of this equation as an example, dividing both sides by P_Z(k) converts it into (G(k)-1)^2 ξ'(k) + (G(k))^2 = C(k), where ξ'(k) = P_S(k)/P_Z(k) is the prior signal-to-noise ratio and C(k) = T(k)/P_Z(k). This is a quadratic equation in the single unknown G(k) in which ξ'(k) and C(k) are known; applying the quadratic root formula gives G(k) = (ξ'(k) ± sqrt(C(k)(1 + ξ'(k)) - ξ'(k))) / (1 + ξ'(k)), which has a real solution under the condition C(k)(1 + ξ'(k)) ≥ ξ'(k). The root formula is not used when this condition is not satisfied or when C(k) ≥ 1. If the condition is not satisfied, G(k) is set to a customized value. If C(k) ≥ 1, the frequency domain background noise power P_Z(k) is already below the frequency domain masking threshold T(k); the noise level is then smaller than the auditory masking threshold of the human ear, no filtering of the frequency domain noisy speech Y(k) is needed to achieve a good subjective auditory effect, and G(k) = 1 is defined. Combining the above analysis, the perceptual filter gain G(k) is the real root of the quadratic equation when it exists and C(k) < 1, a customized value when no real root exists, and 1 when C(k) ≥ 1.
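A minimal per-bin sketch of the gain computation described above, working on the normalized equation (G(k)-1)^2 ξ'(k) + (G(k))^2 = C(k). Two details are assumptions made here for illustration rather than statements of the patented method: the larger root of the quadratic is taken, and a fixed floor value `g_floor` stands in for the customized gain used when no real root exists.

```python
import numpy as np

def perceptual_gain(xi, C, g_floor=0.1):
    """Solve (G-1)^2*xi + G^2 = C per frequency bin for the gain G(k).

    xi : prior SNR xi'(k) = P_S(k) / P_Z(k)        (array over bins k)
    C  : C(k) = T(k) / P_Z(k), masking threshold over noise power
    """
    xi = np.asarray(xi, dtype=float)
    C = np.asarray(C, dtype=float)
    disc = C * (1.0 + xi) - xi             # (discriminant / 4) of the quadratic
    G = np.full_like(xi, g_floor)          # assumed fallback when no real root exists
    real = disc >= 0.0
    # larger root of (1 + xi) G^2 - 2 xi G + (xi - C) = 0  (root choice is an assumption)
    G[real] = (xi[real] + np.sqrt(disc[real])) / (1.0 + xi[real])
    G[C >= 1.0] = 1.0                      # noise already below the masking threshold
    return G
```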
it is understood that the present embodiment is only to solve the equation (G (k) -1)2PS(k)+(G(k))2PZ(k) Given as an example t (k), the equation may be according to ES(k)+EZ(k) Other equations of construction ≦ T (k).
And step S170, filtering the noisy speech according to the perceptual filter gain to obtain the enhanced speech.
In this embodiment, according to the perceptual filter gain G(k), the enhanced frequency domain speech is obtained from the frequency domain noisy speech as Ŝ(k) = G(k)Y(k) and is then converted back to the time domain to obtain the enhanced speech ŝ(n). Alternatively, the perceptual filter gain G(k) is first converted to the time domain to obtain g(n), and the enhanced speech is obtained as ŝ(n) = g(n) * y(n), where y(n) is the time domain noisy speech and * denotes convolution.
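A sketch of this filtering step: the gain is applied per bin and the result is brought back to the time domain by inverse FFT with overlap-add. The overlap-add reconstruction details are illustrative assumptions; the patent only specifies Ŝ(k) = G(k)Y(k) followed by the inverse transform (or, equivalently, time domain convolution with g(n)).

```python
import numpy as np

def apply_gain_and_reconstruct(Y, G, frame_len=512, hop=256):
    """Apply per-frame gains G(k, l) to the noisy spectra Y(k, l) and
    rebuild the enhanced time domain speech by inverse FFT and overlap-add.
    The Hann analysis window at 50% overlap sums approximately to one,
    so plain overlap-add is used here without a synthesis window."""
    n_bins, n_frames = Y.shape
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for l in range(n_frames):
        s_hat = np.fft.irfft(G[:, l] * Y[:, l], n=frame_len)
        out[l * hop: l * hop + frame_len] += s_hat
    return out
```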
In this embodiment, noisy speech is acquired; the noise power is calculated from the noisy speech according to a noise estimation algorithm; a frequency domain masking threshold is calculated from the noisy speech according to a masking model; the noisy speech is converted into the frequency domain to obtain frequency domain noisy speech, which comprises frequency domain pure speech and frequency domain background noise; based on a speech estimation error algorithm, the speech signal distortion is expressed in terms of the frequency domain pure speech and the perceptual filter gain, and the filtered background noise is expressed in terms of the frequency domain background noise and the perceptual filter gain; from the speech signal distortion and the filtered background noise, an equation for the perceptual filter gain is constructed based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency domain masking threshold; the equation is solved to obtain the perceptual filter gain; and the noisy speech is filtered according to the perceptual filter gain to obtain enhanced speech. Because the sum of the speech signal distortion power and the filtered background noise power is kept at or below the frequency domain masking threshold, the speech signal distortion power remains small while the noise level stays below the auditory masking threshold of the human ear and cannot be heard, thereby improving the subjective perceptual quality of the enhanced speech.
In one embodiment, before step S130 the method further includes: enhancing the noisy speech by a short-time amplitude spectrum estimation method to obtain enhanced noisy speech. In this case, converting the noisy speech into the frequency domain in step S130 means converting the enhanced noisy speech into the frequency domain, and filtering the noisy speech in step S170 to obtain enhanced speech means filtering the enhanced noisy speech to obtain the enhanced speech. Step S161 then includes:
obtaining the frequency domain gain function G_H(k) of the short-time amplitude spectrum estimation method, where k is the spectrum serial number; and obtaining the frequency domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k))|Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency domain noisy speech.
In this embodiment, because residual noise remains in speech enhanced by the short-time amplitude spectrum estimation method, the perceptual filtering method of this embodiment can further improve the enhancement result. In this case, to calculate the frequency domain background noise power P_Z(k) from the noise power λ_d(k) with an approximation algorithm, the frequency domain gain function G_H(k) of the short-time amplitude spectrum estimation method is obtained first, and the frequency domain background noise power is then approximated as P_Z(k) = λ_d(k) - (1 - G_H(k))|Y(k)|^2, where Y(k) is the frequency domain noisy speech and |Y(k)| is its spectral amplitude.
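A one-line sketch of this approximation, assuming the input was pre-enhanced by a short-time amplitude spectrum method with gain G_H(k); the small positive floor is an added safeguard against negative values and is not part of the patent.

```python
import numpy as np

def residual_noise_power(lambda_d, G_H, Y):
    """Approximate the frequency domain residual noise power for
    pre-enhanced input: P_Z(k) = lambda_d(k) - (1 - G_H(k)) * |Y(k)|^2."""
    return np.maximum(lambda_d - (1.0 - G_H) * np.abs(Y) ** 2, 1e-12)  # floor is an added safeguard
```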
In one embodiment, the step of calculating the prior signal-to-noise ratio from the posterior signal-to-noise ratio with the direct-decision algorithm includes: obtaining the posterior signal-to-noise ratios of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectrum serial number, l is the frame serial number, and the current frame is frame l; obtaining the previous-frame perceptual filter gain G(k, l-1), where if the previous frame is the first frame, the previous-frame perceptual filter gain is a preset value; and obtaining the prior signal-to-noise ratio of the current frame from the posterior signal-to-noise ratios and the perceptual filter gain according to the formula ξ'(k, l) = η·G^2(k, l-1)·γ'(k, l-1) + (1 - η)·max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
In this embodiment, the first-frame perceptual filter gain G(k, 1) is set to a preset value, preferably 1. The posterior signal-to-noise ratios of the second frame and the first frame, γ'(k, 2) and γ'(k, 1), are obtained, and the prior signal-to-noise ratio of the second frame is computed as ξ'(k, 2) = η·G^2(k, 1)·γ'(k, 1) + (1 - η)·max{γ'(k, 2) - 1, 0}, where η is a smoothing factor that can be any number between 0 and 1, preferably η = 0.92. After the prior signal-to-noise ratio of the second frame is obtained, the second-frame perceptual filter gain G(k, 2) can be obtained from the equation solved in the subsequent steps, and so on for later frames.
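A sketch of this direct-decision (decision-directed) recursion. The combining formula is the reconstructed form stated above, which is an interpretation of the text rather than a reproduction of the original expression.

```python
import numpy as np

def prior_snr(gamma_curr, gamma_prev, G_prev, eta=0.92):
    """Decision-directed prior SNR xi'(k, l) from the posterior SNRs of the
    current and previous frames and the previous-frame gain G(k, l-1).

    Assumed form: xi' = eta * G_prev**2 * gamma_prev
                        + (1 - eta) * max(gamma_curr - 1, 0)."""
    return eta * (G_prev ** 2) * gamma_prev + (1.0 - eta) * np.maximum(gamma_curr - 1.0, 0.0)
```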
The above embodiment can be applied to the speech enhancement system shown in fig. 4: the noisy speech is input, the frequency domain masking threshold T(k) is obtained by the masking threshold estimation 164, and the noise power λ_d(k) is obtained by the noise estimation 166; T(k) and λ_d(k) are fed into the perceptual enhancement filter 165, which constructs and solves the equation to obtain the perceptual filter gain and performs the filtering to obtain the enhanced speech.
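Tying the pieces together, the sketch below shows one possible per-frame arrangement of the system in FIG. 4. The helper functions (to_frequency_domain, estimate_noise_power, prior_snr, perceptual_gain, apply_gain_and_reconstruct) are the hypothetical sketches given earlier in this description, and masking_threshold is assumed to wrap an existing psychoacoustic model; none of these names or defaults come from the patent itself.

```python
import numpy as np

def perceptual_filtering(y, masking_threshold, frame_len=512, hop=256, eta=0.92):
    """Illustrative end-to-end pass: time domain noisy speech in, enhanced speech out."""
    Y = to_frequency_domain(y, frame_len, hop)           # Y(k, l)
    lam_d = estimate_noise_power(y, frame_len, hop)      # lambda_d(k)
    P_Z = lam_d                                          # approximation P_Z(k) ~ lambda_d(k)
    n_bins, n_frames = Y.shape
    G = np.ones((n_bins, n_frames))                      # first-frame gain preset to 1
    gamma = np.abs(Y) ** 2 / P_Z[:, None]                # posterior SNR gamma'(k, l)
    for l in range(1, n_frames):
        T = masking_threshold(Y[:, l])                   # frequency domain masking threshold T(k)
        xi = prior_snr(gamma[:, l], gamma[:, l - 1], G[:, l - 1], eta)
        G[:, l] = perceptual_gain(xi, T / P_Z)
    return apply_gain_and_reconstruct(Y, G, frame_len, hop)
```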
In one embodiment, as shown in fig. 5, there is provided a perceptual filter comprising:
the obtaining module 210 is configured to obtain a noisy speech.
And the noise power calculation module 220 is configured to calculate the noise power of the voice with noise according to a noise estimation algorithm.
And the masking threshold calculation module 230 is configured to calculate a frequency domain masking threshold for the voice with noise according to the masking model.
The frequency domain converting module 240 is configured to convert the voice with noise into a frequency domain to obtain a frequency domain voice with noise, where the frequency domain voice with noise includes a frequency domain pure voice and a frequency domain background noise.
And the equation constructing module 250 is used for expressing the distortion of the voice signal as a relational expression about the pure voice of the frequency domain and the gain of the perception filter based on a voice estimation error algorithm, expressing the filtering background noise as a relational expression about the background noise of the frequency domain and the gain of the perception filter, and constructing an equation about the gain of the perception filter based on a relational expression that the sum of the distortion power of the voice signal and the power of the filtering background noise is less than or equal to a masking threshold of the frequency domain according to the distortion of the voice signal and the filtering background noise.
The gain solving module 260 is used for solving an equation to obtain the gain of the perception filter;
and a filtering processing module 270, configured to perform filtering processing on the noisy speech according to the gain of the perceptual filter to obtain an enhanced speech.
In one embodiment, the equation constructing module 250 constructs the equation for the perceptual filter gain, from the speech signal distortion and the filtered background noise, based on the relation that the sum of the speech signal distortion power and the filtered background noise power is less than or equal to the frequency domain masking threshold, specifically by: obtaining the speech signal distortion power from the speech signal distortion; obtaining the filtered background noise power from the filtered background noise; and obtaining the equation (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) based on the relation that the sum of the speech signal distortion power and the filtered background noise power equals the frequency domain masking threshold, where G(k) is the perceptual filter gain, k is the spectrum serial number, P_S(k) is the frequency domain pure speech power, P_Z(k) is the frequency domain background noise power, and T(k) is the frequency domain masking threshold.
In one embodiment, as shown in FIG. 6, the gain solving module 260 includes:
the solution preparation unit 261 is configured to calculate a frequency domain background noise power by using an approximation algorithm according to the noise power, calculate a posterior signal-to-noise ratio according to the frequency domain background noise power, and calculate a prior signal-to-noise ratio based on a direct decision algorithm according to the posterior signal-to-noise ratio.
And the solving unit 262 is configured to solve the equation according to the frequency domain background noise power, the prior signal-to-noise ratio, and the frequency domain masking threshold to obtain the gain of the perceptual filter.
In one embodiment, as shown in fig. 7, on the basis of the above embodiment, the perceptual filter further includes:
and the enhancing module 280 is configured to enhance the noisy speech by using a short-time amplitude spectrum estimation method to obtain an enhanced noisy speech.
The frequency domain converting module 240 converts the noisy speech into a frequency domain to convert the enhanced noisy speech into a frequency domain, and the filtering processing module 270 performs filtering processing on the noisy speech to obtain an enhanced speech, or performs filtering processing on the enhanced noisy speech to obtain an enhanced speech.
The solution preparation unit 261 calculates the frequency domain background noise power from the noise power with an approximation algorithm, specifically by: obtaining the frequency domain gain function G_H(k) of the short-time amplitude spectrum estimation method, where k is the spectrum serial number, and obtaining the frequency domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k))|Y(k)|^2, where λ_d(k) is the noise power and Y(k) is the frequency domain noisy speech.
In one embodiment, the solution preparation unit 261 calculates the prior signal-to-noise ratio from the posterior signal-to-noise ratio with the direct-decision algorithm, specifically by: obtaining the posterior signal-to-noise ratios of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, where k is the spectrum serial number, l is the frame serial number, and the current frame is frame l; obtaining the previous-frame perceptual filter gain G(k, l-1), where if the previous frame is the first frame, the previous-frame perceptual filter gain is a preset value; and obtaining the prior signal-to-noise ratio of the current frame from the posterior signal-to-noise ratios and the perceptual filter gain according to the formula ξ'(k, l) = η·G^2(k, l-1)·γ'(k, l-1) + (1 - η)·max{γ'(k, l) - 1, 0}, where η is a smoothing factor and 0 < η < 1.
The embodiments described above express only several embodiments of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, and such variations and modifications fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (6)
1. A method of perceptual filtering, the method comprising:
acquiring a voice with noise, and calculating the voice with noise according to a noise estimation algorithm to obtain noise power;
calculating the voice with noise according to a masking model to obtain a frequency domain masking threshold;
converting the voice with noise into a frequency domain to obtain the voice with noise of the frequency domain, wherein the voice with noise of the frequency domain comprises pure voice of the frequency domain and background noise of the frequency domain;
based on a speech estimation error algorithm, representing the speech signal distortion as a relational expression about the frequency domain pure speech and the gain of a perception filter, and representing the filtering background noise as a relational expression about the frequency domain background noise and the gain of the perception filter, wherein the filtering background noise is ε_Z(k) = |G(k)Z(k)|, G(k) is the perceptual filter gain, and Z(k) is the frequency domain background noise;
obtaining the distortion power of the voice signal according to the distortion of the voice signal;
obtaining the filtering background noise power according to the filtering background noise;
based on the relation that the sum of the distortion power of the speech signal and the power of the filtering background noise is equal to the masking threshold of the frequency domain, obtaining the equation (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k), where G(k) is the perceptual filter gain, k is the spectrum serial number, P_S(k) is the frequency domain pure speech power, P_Z(k) is the frequency domain background noise power, and T(k) is the frequency domain masking threshold;
according to the voice signal distortion and the filtering background noise, constructing an equation related to the gain of the perception filter based on the relation that the sum of the voice signal distortion power and the filtering background noise power is less than or equal to the frequency domain masking threshold;
obtaining posterior signal-to-noise ratios of a current frame and a previous frame, γ'(k, l) and γ'(k, l-1) respectively, wherein k is a spectrum serial number, l is a frame serial number, and the current frame is frame l; acquiring a previous frame perceptual filter gain G(k, l-1), wherein if the previous frame is a first frame, the previous frame perceptual filter gain is a preset value; obtaining the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratios and the gain of the perception filter by the formula ξ'(k, l) = η·G^2(k, l-1)·γ'(k, l-1) + (1 - η)·max{γ'(k, l) - 1, 0}, wherein η is a smoothing factor and 0 < η < 1;
solving the equation (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) to obtain the perceptual filter gain;
and according to the gain of the perception filter, carrying out filtering processing on the voice with noise to obtain enhanced voice.
2. The method of claim 1, wherein the step of solving the equation to obtain the perceptual filter gain comprises:
calculating to obtain the frequency domain background noise power by adopting an approximate algorithm according to the noise power;
calculating to obtain a posterior signal-to-noise ratio according to the frequency domain background noise power;
calculating to obtain a prior signal-to-noise ratio based on a direct decision algorithm according to the posterior signal-to-noise ratio;
and solving the equation according to the frequency domain background noise power, the prior signal-to-noise ratio and the frequency domain masking threshold to obtain the gain of the perception filter.
3. The method according to claim 2, wherein before the step of converting the noisy speech into the frequency domain to obtain frequency-domain noisy speech, the frequency-domain noisy speech comprising frequency-domain clean speech and frequency-domain background noise, further comprising:
the method comprises the following steps of enhancing the voice with noise by adopting a short-time amplitude spectrum estimation method to obtain enhanced voice with noise, converting the voice with noise into a frequency domain to convert the enhanced voice with noise into the frequency domain, filtering the voice with noise to obtain enhanced voice, filtering the enhanced voice with noise to obtain enhanced voice, and calculating the background noise power of the frequency domain by adopting an approximate algorithm according to the noise power:
obtaining the frequency domain gain function G_H(k) based on the short-time amplitude spectrum estimation method, wherein k is a spectrum serial number;
obtaining the frequency domain background noise power P_Z(k) according to P_Z(k) = λ_d(k) - (1 - G_H(k))|Y(k)|^2, wherein λ_d(k) is the noise power and Y(k) is the frequency domain noisy speech.
4. A perceptual filter, the perceptual filter comprising:
the acquisition module is used for acquiring the voice with noise;
the noise power calculation module is used for calculating the voice with noise according to a noise estimation algorithm to obtain noise power;
the masking threshold calculation module is used for calculating the voice with the noise according to a masking model to obtain a frequency domain masking threshold;
the frequency domain conversion module is used for converting the voice with noise into a frequency domain to obtain the voice with noise in the frequency domain, wherein the voice with noise in the frequency domain comprises pure voice in the frequency domain and background noise in the frequency domain;
an equation constructing module for expressing the speech signal distortion as a relational expression about the frequency domain pure speech and the gain of the perception filter and expressing the filtering background noise as a relational expression about the frequency domain background noise and the gain of the perception filter based on the speech estimation error algorithm, wherein the filtering background noise is ε_Z(k) = |G(k)Z(k)|, G(k) is the perceptual filter gain, and Z(k) is the frequency domain background noise; obtaining the distortion power of the speech signal according to the distortion of the speech signal; obtaining the filtering background noise power according to the filtering background noise; obtaining the equation (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) based on the relation that the sum of the distortion power of the speech signal and the power of the filtering background noise is equal to the masking threshold of the frequency domain, where G(k) is the perceptual filter gain, k is the spectrum serial number, P_S(k) is the frequency domain pure speech power, P_Z(k) is the frequency domain background noise power, and T(k) is the frequency domain masking threshold; and constructing, according to the speech signal distortion and the filtering background noise, an equation related to the gain of the perception filter based on the relation that the sum of the speech signal distortion power and the filtering background noise power is less than or equal to the frequency domain masking threshold;
the gain solving module is used for acquiring the posterior signal-to-noise ratios of the current frame and the previous frame, γ'(k, l) and γ'(k, l-1) respectively, wherein k is a spectrum serial number, l is a frame serial number, and the current frame is frame l; acquiring a previous frame perceptual filter gain G(k, l-1), wherein if the previous frame is a first frame, the previous frame perceptual filter gain is a preset value; obtaining the prior signal-to-noise ratio of the current frame according to the posterior signal-to-noise ratios and the gain of the perception filter by the formula ξ'(k, l) = η·G^2(k, l-1)·γ'(k, l-1) + (1 - η)·max{γ'(k, l) - 1, 0}, wherein η is a smoothing factor and 0 < η < 1; and solving the equation (G(k)-1)^2 P_S(k) + (G(k))^2 P_Z(k) = T(k) to obtain the perceptual filter gain;
and the filtering processing module is used for filtering the voice with noise according to the gain of the perception filter to obtain enhanced voice.
5. The perceptual filter of claim 4, wherein the gain solving module comprises:
the solving preparation unit is used for calculating to obtain the frequency domain background noise power by adopting an approximation algorithm according to the noise power, calculating to obtain a posterior signal-to-noise ratio according to the frequency domain background noise power, and calculating to obtain a prior signal-to-noise ratio based on a direct decision algorithm according to the posterior signal-to-noise ratio;
and the solving unit is used for solving the equation according to the frequency domain background noise power, the prior signal-to-noise ratio and the frequency domain masking threshold value to obtain the gain of the perceptual filter.
6. The perceptual filter of claim 4, wherein the perceptual filter further comprises:
the enhancement module is used for enhancing the voice with noise by adopting a short-time amplitude spectrum estimation method to obtain the enhanced voice with noise;
the frequency domain conversion module converts the noisy speech into a frequency domain to convert the enhanced noisy speech into the frequency domain;
the filtering processing module is used for filtering the voice with noise to obtain enhanced voice, and filtering the enhanced voice with noise to obtain enhanced voice;
the calculation preparation unit calculates the frequency domain background noise power by adopting an approximation algorithm according to the noise power, and specifically comprises the following steps:
obtaining the frequency domain gain function G based on the short-time amplitude spectrum estimation methodH(k) Wherein k is a frequency spectrum serial number;
according to PZ(k)=λd(k)-(1-GH(k))|Y(k)|2Obtaining the frequency domain background noise power PZ(k) Wherein λ isd(k) Y (k) is the noise power, and Y (k) is the frequency domain noisy speech.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510031872.9A CN105869649B (en) | 2015-01-21 | 2015-01-21 | Perceptual filtering method and perceptual filter |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510031872.9A CN105869649B (en) | 2015-01-21 | 2015-01-21 | Perceptual filtering method and perceptual filter |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105869649A CN105869649A (en) | 2016-08-17 |
CN105869649B true CN105869649B (en) | 2020-02-21 |
Family
ID=56623456
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510031872.9A Active CN105869649B (en) | 2015-01-21 | 2015-01-21 | Perceptual filtering method and perceptual filter |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105869649B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106448696A (en) * | 2016-12-20 | 2017-02-22 | 成都启英泰伦科技有限公司 | Adaptive high-pass filtering speech noise reduction method based on background noise estimation |
CN109979478A (en) * | 2019-04-08 | 2019-07-05 | 网易(杭州)网络有限公司 | Voice de-noising method and device, storage medium and electronic equipment |
JP7449803B2 (en) * | 2020-07-22 | 2024-03-14 | 三菱重工業株式会社 | Abnormality factor estimation method, abnormality factor estimation device, and program |
CN112951262B (en) * | 2021-02-24 | 2023-03-10 | 北京小米松果电子有限公司 | Audio recording method and device, electronic equipment and storage medium |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003514264A (en) * | 1999-11-15 | 2003-04-15 | ノキア コーポレイション | Noise suppression device |
CN1684143A (en) * | 2004-04-14 | 2005-10-19 | 华为技术有限公司 | Method for strengthening sound |
JP2014232331A (en) * | 2007-07-06 | 2014-12-11 | オーディエンス,インコーポレイテッド | System and method for adaptive intelligent noise suppression |
CN103824562A (en) * | 2014-02-10 | 2014-05-28 | 太原理工大学 | Psychological acoustic model-based voice post-perception filter |
Non-Patent Citations (1)
Title |
---|
Two-stage speech enhancement algorithm combining human auditory perception; Zhang Yong et al.; Journal of Signal Processing (《信号处理》); 2014-04-30; pp. 1-11 *
Also Published As
Publication number | Publication date |
---|---|
CN105869649A (en) | 2016-08-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105741849B (en) | The sound enhancement method of phase estimation and human hearing characteristic is merged in digital deaf-aid | |
CN102903368B (en) | Method and equipment for separating convoluted blind sources | |
CN105869649B (en) | Perceptual filtering method and perceptual filter | |
CN103778920B (en) | Speech enhan-cement and compensating for frequency response phase fusion method in digital deaf-aid | |
CN101976566B (en) | Voice enhancement method and device using same | |
CN108735225A (en) | It is a kind of based on human ear masking effect and Bayesian Estimation improvement spectrum subtract method | |
CN104810024A (en) | Double-path microphone speech noise reduction treatment method and system | |
CN105679330B (en) | Based on the digital deaf-aid noise-reduction method for improving subband signal-to-noise ratio (SNR) estimation | |
CN103761974B (en) | Cochlear implant | |
CN105489226A (en) | Wiener filtering speech enhancement method for multi-taper spectrum estimation of pickup | |
CN106328155A (en) | Speech enhancement method of correcting priori signal-to-noise ratio overestimation | |
CN108665054A (en) | Based on the Mallat algorithms of genetic algorithm optimization threshold value cardiechema signals noise reduction application | |
Min et al. | Mask estimate through Itakura-Saito nonnegative RPCA for speech enhancement | |
WO2019024621A1 (en) | Acoustic echo canceller output voice signal post-processing method and apparatus | |
Zou et al. | Speech signal enhancement based on MAP algorithm in the ICA space | |
CN111543984A (en) | Method for removing ocular artifacts of electroencephalogram signals based on SSDA (steady state data acquisition) | |
CN105869652B (en) | Psychoacoustic model calculation method and device | |
CN107731242A (en) | A kind of gain function sound enhancement method of the spectral amplitude estimation of broad sense maximum a posteriori | |
CN110602621B (en) | Noise reduction method and system for digital hearing aid and special DSP | |
CN108962275A (en) | A kind of music noise suppressing method and device | |
WO2013067714A1 (en) | Method for reducing burst noise | |
KR20130109793A (en) | Audio encoding method and apparatus for noise reduction | |
Inoue et al. | Theoretical analysis of musical noise in generalized spectral subtraction: why should not use power/amplitude subtraction? | |
CN102945674A (en) | Method for realizing noise reduction processing on speech signal by using digital noise reduction algorithm | |
Sun et al. | An RNN-based speech enhancement method for a binaural hearing aid system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |