CN103295580A - Method and device for suppressing noise of voice signals - Google Patents

Info

Publication number
CN103295580A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201310175549
Other languages
Chinese (zh)
Inventor
宋辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN 201310175549
Publication of CN103295580A

Abstract

The invention discloses a method and a device for suppressing noise in speech signals. The method comprises: performing Mel-domain decomposition on a target speech signal $X(f)$ to obtain K Mel-domain sub-band signals $X_i(f)$, where $i = 1, 2, \ldots, K$; using a preset time window to determine the local minimum energy value of each sub-band in the time domain and the signal corresponding to that minimum; estimating the noise signal of each sub-band from these signals and splicing the estimates into a full-band noise estimate $N(f)$; and generating a filter with $N(f)$ as the removal target to suppress the noise in $X(f)$. The scheme preserves the integrity of speech features as far as possible during noise reduction, and thereby improves speech recognition accuracy.

Description

Method and device for suppressing noise in speech signals
Technical field
The present invention relates to the field of audio signal processing, and in particular to a method and device for suppressing noise in speech signals.
Background
With the development of the Internet, the objects users search for online are no longer limited to text; pictures, audio, and video are now all supported by search engines. Voice-based search in particular has become a popular form of application. Unlike traditional text search, voice input inevitably picks up noise, so effective noise reduction has become the key to improving voice search performance.
At present, noise reduction approaches for voice search fall into two classes. The first adds noise to the clean speech feature signals in the feature library and trains on them to obtain noisy speech feature data, which raises the degree of match between the feature library and the user's actual recordings. This approach has two drawbacks: the training cost is high, and the added noise can hardly cover every noise type found in practice. If the user's surroundings during a voice search contain a new kind of noise, the training data provides no noise robustness at all.
The second class applies an existing speech enhancement algorithm to the noisy speech signal. Existing speech enhancement algorithms, however, were designed to improve the subjective listening quality of speech; they damage the spectral structure of the speech and compromise the integrity of the speech features. Applying existing speech enhancement directly in a speech recognition system therefore does not necessarily bring a positive gain, and in many cases lowers the recognition rate of the system.
Summary of the invention
To solve the above technical problems, embodiments of the invention provide a method and device for suppressing noise in speech signals, so as to preserve the integrity of speech features as far as possible during noise reduction and improve speech recognition accuracy. The technical solution is as follows:
An embodiment of the invention provides a method for suppressing noise in speech signals, the method comprising:
performing Mel-domain decomposition on a target speech signal $X(f)$ to obtain K Mel-domain sub-band signals $X_i(f)$, $i = 1, 2, \ldots, K$;
using a preset time window, determining the local minimum energy value $E_{N,i}^{\min}$ of each sub-band in the time domain and the signal $X_i^{\min}(f)$ corresponding to that minimum energy value;
using the $X_i^{\min}(f)$ of each sub-band to estimate the noise signal of the corresponding sub-band, and obtaining the full-band noise estimate $N(f)$ by splicing the estimation results;
generating a filter with $N(f)$ as the removal target, and performing noise suppression on the target speech signal $X(f)$.
According to one embodiment of the invention, using the $X_i^{\min}(f)$ of each sub-band to estimate the noise signal of the corresponding sub-band comprises:
taking the $X_i^{\min}(f)$ of each sub-band as the noise estimate of the corresponding sub-band.
According to one embodiment of the invention, using the $X_i^{\min}(f)$ of each sub-band to estimate the noise signal of the corresponding sub-band comprises:
taking the local minimum energy value $E_{N,i}^{\min}$ of each sub-band as the noise energy estimate of that sub-band;
computing, from the noise energy estimate of each sub-band, the average sub-band signal-to-noise ratio (SNR) of each time unit of the target speech signal;
determining time units whose average sub-band SNR is below a preset threshold to be non-speech time units;
using the non-speech time unit signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$ of each sub-band to estimate the noise signal of the corresponding sub-band.
According to one embodiment of the invention, computing the average sub-band SNR of each time unit of the target speech signal from the noise energy estimate of each sub-band comprises:
computing the average sub-band SNR of each time unit of the target speech signal from the noise energy estimates of the sub-bands in preset mid- and high-frequency ranges only.
According to one embodiment of the invention, using the non-speech time unit signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$ of each sub-band to estimate the noise signal of the corresponding sub-band comprises:
obtaining the noise estimate of each sub-band from $X_i^{\text{update}}(f) = \lambda_i X_i^{\min}(f) + (1 - \lambda_i) X_i^{\text{non-speech}}(f)$, where $\lambda_i$ is a preset weight and $0 \le \lambda_i \le 1$.
According to one embodiment of the invention, different values of $\lambda_i$ are preset for different sub-bands.
According to one embodiment of the invention, presetting different values of $\lambda_i$ for different sub-bands comprises:
presetting a smaller $\lambda_i$ for sub-bands in lower frequency ranges and a larger $\lambda_i$ for sub-bands in higher frequency ranges.
According to one embodiment of the invention, the time unit is one audio frame.
An embodiment of the invention also provides a device for suppressing noise in speech signals, the device comprising:
a signal decomposition module, configured to perform Mel-domain decomposition on a target speech signal $X(f)$ to obtain K Mel-domain sub-band signals $X_i(f)$, $i = 1, 2, \ldots, K$;
a minimum energy determination module, configured to use a preset time window to determine the local minimum energy value $E_{N,i}^{\min}$ of each sub-band in the time domain and the signal $X_i^{\min}(f)$ corresponding to that minimum energy value;
a noise estimation module, configured to use the $X_i^{\min}(f)$ of each sub-band to estimate the noise signal of the corresponding sub-band, and to obtain the full-band noise estimate $N(f)$ by splicing the estimation results;
a noise suppression module, configured to generate a filter with $N(f)$ as the removal target and perform noise suppression on the target speech signal $X(f)$.
According to one embodiment of the invention, the noise estimation module is specifically configured to:
take the $X_i^{\min}(f)$ of each sub-band as the noise estimate of the corresponding sub-band.
According to one embodiment of the invention, the noise estimation module comprises:
a time unit type decision submodule, configured to take the local minimum energy value $E_{N,i}^{\min}$ of each sub-band as the noise energy estimate of that sub-band; to compute, from the noise energy estimate of each sub-band, the average sub-band SNR of each time unit of the target speech signal; and to determine time units whose average sub-band SNR is below a preset threshold to be non-speech time units;
a noise estimation submodule, configured to use the non-speech time unit signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$ of each sub-band to estimate the noise signal of the corresponding sub-band.
According to one embodiment of the invention, the time unit type decision submodule is specifically configured to:
compute the average sub-band SNR of each time unit of the target speech signal from the noise energy estimates of the sub-bands in preset mid- and high-frequency ranges only.
According to one embodiment of the invention, the noise estimation submodule is specifically configured to:
obtain the noise estimate of each sub-band from $X_i^{\text{update}}(f) = \lambda_i X_i^{\min}(f) + (1 - \lambda_i) X_i^{\text{non-speech}}(f)$, where $\lambda_i$ is a preset weight and $0 \le \lambda_i \le 1$.
According to one embodiment of the invention, different values of $\lambda_i$ are preset for different sub-bands.
According to one embodiment of the invention, a smaller $\lambda_i$ is preset for sub-bands in lower frequency ranges and a larger $\lambda_i$ for sub-bands in higher frequency ranges.
According to one embodiment of the invention, the time unit is one audio frame.
The technical solution provided by the embodiments of the invention uses minimum energy tracking in the Mel domain: the minimum energy of each Mel band is tracked, and the signal $X_i^{\min}(f)$ corresponding to the minimum energy of each Mel sub-band is stored. The difference from traditional noise suppression methods is that these signals $X_i^{\min}(f)$ may not belong to the same time frame, yet the noise estimate of every Mel band is guaranteed to be the one with minimum energy. This makes the noise estimate reliable and keeps the speech components from being damaged during noise suppression.
Further, the technical solution can use the minimum energy tracking result together with speech/non-speech decision information to correct the noise estimate in each Mel band, and can choose different noise correction step sizes to suit the different characteristics of the noise in different Mel bands. This keeps the noise estimate reliable, so that no speech component is introduced into it, while allowing the noise in different bands to be estimated dynamically, which further improves noise suppression for speech signals with non-stationary noise.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some of the embodiments recorded in the invention; a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a structural diagram of a speech recognition system;
Fig. 2 is a flowchart of a method for suppressing noise in speech signals according to an embodiment of the invention;
Fig. 3 is a structural diagram of a device for suppressing noise in speech signals according to an embodiment of the invention.
Detailed description
Fig. 1 shows a typical prior-art speech recognition system, made up of a noise suppression module, a noise-added training module, a recognition module, and other parts. In the training stage, noisy speech passes through the noise suppression device to give an enhanced speech signal; the noise-added training module uses this signal to train a noise-added model, giving the noise-added acoustic model. In the recognition stage, the noisy speech signal likewise passes through the noise suppression module; the resulting enhanced speech is sent to the recognition module, which performs recognition using the pre-trained noise-added model and language model to give the final speech recognition result.
The noise suppression module is thus an important component of the whole speech recognition system. The existing approach to noise reduction is to apply an existing speech enhancement algorithm to the noisy speech signal. Although speech enhancement algorithms can improve the subjective listening quality of speech, they damage its spectral structure and compromise the integrity of the speech features. Applying existing speech enhancement directly in a speech recognition system therefore does not necessarily bring a positive gain, and in many cases lowers the recognition rate of the system.
To address these problems, embodiments of the invention provide a scheme for suppressing noise in speech signals. Its design goal is to suppress background noise as far as possible without damaging the speech signal, so that the residual noise in the final enhanced speech is as small in amplitude as possible. The basic working principle of the noise suppression system is as follows:
First, the noisy input speech signal is passed through a Mel triangular filter bank (Mel Filter Bank, MFB) identical to the one used for speech feature extraction, which performs Mel-domain sub-band decomposition. Local minimum energy tracking is applied to each Mel sub-band signal: the local minimum energy of each Mel sub-band in the time domain is recorded in real time, and the Mel-domain signal corresponding to each minimum energy is determined. (The Mel sub-band minimum energy serves two purposes: as the basis for detecting the start and end points of speech in the input signal, and as the basis for estimating the noise signal in the noise suppression algorithm.) The minimum-energy signal of each sub-band is then used to estimate the noise signal of that sub-band, and the full-band noise estimate is obtained by splicing the estimation results. Finally, a filter is designed with the full-band noise estimate as the removal target, and noise suppression is applied to the original noisy input speech signal.
In this scheme the whole noise estimation process takes place in the Mel domain (rather than in the frequency domain, as in traditional speech enhancement algorithms), the same transform domain in which speech features are extracted, and minimum energy tracking is applied to each Mel sub-band. The noise in each Mel sub-band can therefore be suppressed reliably. What is removed is necessarily noise, and no speech component is lost, so the speech content of each Mel sub-band is effectively protected. Compared with ordinary speech enhancement, this can significantly improve the accuracy of a speech recognition system.
To help those skilled in the art better understand the technical solutions of the invention, the technical solutions in the embodiments are described in detail below with reference to the drawings. The described embodiments are evidently only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments fall within the scope of protection of the invention.
Fig. 2 is a flowchart of a noise suppression method according to the invention. The method may include the following steps:
S101: perform Mel-domain decomposition on the target speech signal to obtain K Mel-domain sub-band signals.
The Mel domain is partitioned by a bank of triangular filters defined in the frequency domain; this filter bank is also called the Mel Filter Bank (MFB). Because the MFB models the critical-bandwidth effect of the human auditory system well, speech features can be extracted well in the Mel domain in a speech recognition system. In the scheme of this embodiment, sub-band decomposition follows the Mel frequency scale, and noise suppression is also performed in the Mel domain.
Suppose the target speech signal to be processed is $X(f)$. Mel-domain sub-band decomposition divides the full-band speech signal, according to the bandwidths of the Mel triangular filter bank, into K Mel-domain sub-band signals $X_1(f), \ldots, X_K(f)$.
Each sub-band signal can in turn be expressed as the sum of a speech sub-band component and a noise sub-band component:
$$X_i(f) = S_i(f) + N_i(f), \quad i = 1, \ldots, K$$
If the noise component $N_i(f)$ of each sub-band can be estimated, the full-band noise signal $N(f)$ can be estimated; a filter is then designed with the full-band noise signal as the removal target to complete the noise suppression. How $N(f)$ is estimated is described in detail later in this embodiment.
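The decomposition in S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the Mel conversion formula (the common 2595·log10(1 + f/700) convention), the function names, and the choice of equally spaced Mel edges are all assumptions, since the text only says that a Mel triangular filter bank is used.

```python
import math

def hz_to_mel(hz):
    # Common Mel-scale convention; the patent does not fix a formula (assumption).
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_triangular_weights(num_bands, num_bins, sample_rate):
    """Return num_bands lists of per-bin triangular weights in [0, 1]."""
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # num_bands + 2 edge frequencies, equally spaced on the Mel scale.
    edges_hz = [mel_to_hz(mel_max * j / (num_bands + 1))
                for j in range(num_bands + 2)]
    bin_hz = [nyquist * b / (num_bins - 1) for b in range(num_bins)]
    bank = []
    for i in range(1, num_bands + 1):
        lo, mid, hi = edges_hz[i - 1], edges_hz[i], edges_hz[i + 1]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank

def decompose(spectrum, bank):
    """Split a full-band spectrum X(f) into K sub-band signals X_i(f)."""
    return [[w * x for w, x in zip(weights, spectrum)] for weights in bank]
```

Each returned sub-band keeps the full bin length, with zeros outside the band, which makes the later per-band operations and the splicing step easy to express bin by bin.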
S102: using a preset time window, determine the local minimum energy value $E_{N,i}^{\min}$ of each sub-band in the time domain and the signal $X_i^{\min}(f)$ corresponding to that minimum energy value.
The scheme of this embodiment uses the local minimum energy of each Mel sub-band to estimate the noise component $N_i(f)$ of that sub-band. The minimum energy of each sub-band is tracked first, that is, the local minimum energy value of each sub-band in the time domain is determined. Since real noise signals are mostly non-stationary, a long time window can be introduced here to improve adaptation to non-stationary noise.
For example, choose a time window of 300 frames and track the local minimum energy of each Mel sub-band within it; that is, for each Mel sub-band separately, find a run of 300 consecutive frames that has the least energy within that sub-band.
Suppose the tracked minimum energies of the Mel sub-bands are:
$$E_{N,1}^{\min}, \ldots, E_{N,K}^{\min}$$
The signal of each Mel sub-band corresponding to its minimum energy is recorded at the same time:
$$X_1^{\min}(f), \ldots, X_K^{\min}(f)$$
These sub-band signals may not belong to the same signal frame in the time domain, but each is guaranteed to have the minimum energy in its Mel band, so they are a very reliable basis for noise estimation. If this part is taken as the component to be filtered out during noise suppression, everything that is removed is guaranteed to be the noise component of each Mel band, and the speech components are retained intact. In other words, the scheme takes a conservative approach: it filters only signals that are noise with high certainty, and prefers to leave unfiltered any signal whose status as noise is uncertain, so as not to damage the speech features of the signal.
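One reading of the tracking step in S102 is sketched below: within the most recent window of frames, each sub-band independently keeps the frame in which its energy was lowest. The function name, the data layout, and the per-frame interpretation of "local minimum" are assumptions made for illustration; the 300-frame default is the example value from the text.

```python
def track_minimum_energy(subband_frames, window=300):
    """subband_frames[t][i] is the list of spectral values of sub-band i in frame t.

    Returns (min_energies, min_signals): for each sub-band, the lowest frame
    energy within the last `window` frames and the sub-band signal of that frame.
    """
    recent = subband_frames[-window:]
    num_bands = len(recent[0])
    min_energies, min_signals = [], []
    for i in range(num_bands):
        # Energy of sub-band i in every frame of the window.
        energies = [sum(abs(v) ** 2 for v in frame[i]) for frame in recent]
        t_min = min(range(len(recent)), key=energies.__getitem__)
        min_energies.append(energies[t_min])
        min_signals.append(list(recent[t_min][i]))
    return min_energies, min_signals
```

Because the minimum is taken per sub-band, the selected signals can come from different frames, which is the property the text emphasizes.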
S103: use the $X_i^{\min}(f)$ of each sub-band to estimate the noise signal of the corresponding sub-band, and obtain the full-band noise estimate $N(f)$ by splicing the estimation results.
1) Estimating the noise signal of each sub-band:
In one embodiment of the invention, the minimum-energy signal $X_i^{\min}(f)$ of each sub-band can be taken directly as the noise estimate of the corresponding sub-band. As explained above, this estimation method guarantees that the noise estimate contains no speech component.
In another embodiment of the invention, the minimum-energy signal $X_i^{\min}(f)$ of each sub-band can be corrected further to give a more accurate noise estimate. The steps are as follows:
S103a: take the local minimum energy value $E_{N,i}^{\min}$ of each sub-band as the noise energy estimate of that sub-band.
S103b: from the noise energy estimate of each sub-band, compute the average sub-band signal-to-noise ratio (SNR) of each time unit of the target speech signal.
An audio signal can be divided in the time domain into a number of time units, each serving as a basic processing unit. In audio processing, the typical basic processing unit is an audio frame. The embodiments of the invention are likewise described with the audio frame as the basic processing unit, but this should not be construed as limiting the scheme of the invention.
For each frame, compute the average sub-band SNR of that frame:
$$\mathrm{SNR}(f) = \sum_{i=1}^{K} \frac{E_i}{E_i^{\min}}$$
where $E_i$ is the total energy of the i-th sub-band and $E_i^{\min}$ is the local minimum energy of the i-th sub-band (the $E_{N,i}^{\min}$ tracked in S102), which serves here as the noise energy estimate of the i-th sub-band. Summing the SNR values of all sub-bands of a frame gives the average sub-band SNR of that frame, $\mathrm{SNR}(f)$. The larger this value, the stronger the speech signal; the smaller it is, the weaker the speech signal, to the point where the frame may contain no speech at all and consist entirely of noise.
Studies of a large number of real speech signals show that noise is usually concentrated in the low-frequency range, so the SNR of the mid- and high-frequency ranges discriminates better between speech and noise. On this basis, the average sub-band SNR of each frame can be computed as:
$$\mathrm{SNR}(f) = \sum_{i=K/3}^{K} \frac{E_i}{E_i^{\min}}$$
Note that the SNR here is not summed over all sub-bands but only over the mid- and high-frequency sub-bands, from K/3 to K, so as to drop the low band and keep the mid and high bands. Because noise is mostly concentrated in the low band, the input signal suffers more interference at low frequencies and less at mid and high frequencies, where the computed SNR is therefore more reliable. K/3 is an empirical value and should not be construed as limiting the scheme of the invention.
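The two SNR formulas above can be sketched in one function. The names, the zero-based reading of the lower summation bound (`K // 3`), and the small `floor` that guards against division by zero are assumptions for illustration; the text gives only the summation itself.

```python
def average_subband_snr(band_energies, noise_energies, first_band=None, floor=1e-12):
    """Sum of per-band E_i / E_i^min over the selected sub-bands of one frame.

    band_energies[i]  : total energy E_i of sub-band i in this frame
    noise_energies[i] : tracked minimum energy of sub-band i (noise estimate)
    first_band        : zero-based index of the first band included;
                        defaults to K // 3 (mid and high bands only),
                        pass 0 for the full-band variant.
    """
    k = len(band_energies)
    if first_band is None:
        first_band = k // 3
    return sum(
        e / max(n, floor)  # floor avoids dividing by a zero noise estimate (assumption)
        for e, n in zip(band_energies[first_band:], noise_energies[first_band:])
    )
```

Despite the name "average" used in the text, the quantity is a sum, so a threshold tuned for one value of K does not carry over unchanged to another.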
S103c: determine time units whose average sub-band SNR is below a preset threshold to be non-speech time units.
The $\mathrm{SNR}(f)$ of each frame is compared with the preset threshold $\mathrm{SNR}_{\text{Thr}}$. If $\mathrm{SNR}(f) > \mathrm{SNR}_{\text{Thr}}$, the frame is classified as a speech frame; otherwise it is classified as a non-speech frame.
For non-speech frames, further determine the corresponding signal $X_i^{\text{non-speech}}(f)$ in each sub-band, which denotes the signal feature value of a frame in the i-th sub-band. If a sub-band contains several non-speech frames, $X_i^{\text{non-speech}}(f)$ can be obtained by taking their equal-weight average.
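The decision in S103c and the equal-weight averaging of non-speech frames can be sketched as below. The function name, the data layout, and the choice to return `None` when no frame qualifies are assumptions; the threshold rule (a frame is speech only if its SNR exceeds the threshold) follows the text.

```python
def non_speech_signals(subband_frames, frame_snrs, snr_threshold):
    """Average, per sub-band, the frames classified as non-speech.

    subband_frames[t][i] : spectral values of sub-band i in frame t
    frame_snrs[t]        : average sub-band SNR of frame t
    Returns one averaged signal per sub-band, or None if every frame is speech.
    """
    # A frame is speech only if SNR > threshold; everything else is non-speech.
    quiet = [frame for frame, snr in zip(subband_frames, frame_snrs)
             if snr <= snr_threshold]
    if not quiet:
        return None
    averaged = []
    for i in range(len(quiet[0])):
        num_bins = len(quiet[0][i])
        averaged.append([
            sum(frame[i][b] for frame in quiet) / len(quiet)
            for b in range(num_bins)
        ])
    return averaged
```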
S103d: use the non-speech time unit signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$ of each sub-band to estimate the noise signal of the corresponding sub-band.
Using the result of the speech/non-speech decision, the minimum Mel sub-band energy tracking results
$$X_1^{\min}(f), \ldots, X_K^{\min}(f)$$
are corrected to give a more reliable noise estimate for each Mel sub-band:
$$X_1^{\text{update}}(f), \ldots, X_K^{\text{update}}(f)$$
The noise correction for each sub-band can be carried out as follows:
$$X_i^{\text{update}}(f) = \lambda_i X_i^{\min}(f) + (1 - \lambda_i) X_i^{\text{non-speech}}(f)$$
As the formula shows, the corrected noise estimate is a weighted combination of the non-speech frame signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$.
Here $\lambda_i$ is a preset weight in the range [0, 1] that controls the correction step size.
The two extreme cases are:
$\lambda_i = 0$: the noise estimate is replaced by the non-speech frame signal. In effect, as soon as a non-speech signal is found, it is used as the noise estimate.
$\lambda_i = 1$: the minimum energy tracking result is always used as the noise estimate, which amounts to no correction.
In practice, $\lambda_i$ can be adjusted to suit the specific environment and noise characteristics. Different step sizes $\lambda_i$ can also be chosen for different Mel bands. For example, in low bands the noise tends to be stationary, so $\lambda_i$ can be chosen smaller, which is the more conservative choice; in high bands the noise tends to be non-stationary, so $\lambda_i$ can be chosen larger, which allows changes in the noise to be tracked dynamically.
The ranges meant by "low frequency" and "high frequency" here are usually empirical and have no absolute limits. For example, if the speech signal occupies 0 to 8 kHz, 0 to 2 kHz is generally regarded as low frequency and everything above 2 kHz as high frequency. In repeated experiments, $\lambda_i$ is generally set to 0.1 to 0.2 for the low band and 0.5 to 0.6 for the high band. These are only the best values found experimentally and should not be construed as limiting the scheme of the invention.
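The weighted correction and the per-band choice of $\lambda_i$ can be sketched as follows. The function names are hypothetical, and the default values 0.15 and 0.55 are simply picked from inside the experimental ranges quoted above (0.1 to 0.2 and 0.5 to 0.6), with the 2 kHz split taken from the same example.

```python
def update_noise_estimate(min_signals, non_speech, lambdas):
    """X_i^update(f) = lambda_i * X_i^min(f) + (1 - lambda_i) * X_i^non-speech(f)."""
    return [
        [lam * m + (1.0 - lam) * q for m, q in zip(x_min, x_quiet)]
        for x_min, x_quiet, lam in zip(min_signals, non_speech, lambdas)
    ]

def lambdas_by_band(centre_freqs_hz, split_hz=2000.0, low=0.15, high=0.55):
    """Pick a smaller lambda below split_hz and a larger one above it.

    The defaults are illustrative values inside the ranges the text reports.
    """
    return [low if f < split_hz else high for f in centre_freqs_hz]
```

With `lambdas=[0.0]` the result is the non-speech signal, and with `lambdas=[1.0]` it is the unmodified minimum-energy signal, matching the two extreme cases described above.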
Two methods of estimating the noise signal of each sub-band have been given above. The advantage of the first is its simplicity: in simple environments, such as stationary or quasi-stationary noise, the minimum energy tracking result can be used directly as the noise estimate. The second adds dynamic noise tracking: it uses the signal of the non-speech frames (that is, the real-time noise signal) to correct the minimum Mel energy tracking result, and it gives the designer considerable freedom to choose correction step factors that suit the actual noise environment. More importantly, different step sizes can be chosen to match the noise characteristics of different Mel bands, so that the noise in each band is updated in a targeted way, which further improves the accuracy of the estimate for non-stationary noise.
2) Splicing the noise signals of the sub-bands:
The processing in step 1) gives the noise estimate of each sub-band:
$$X_1^{\min}(f), \ldots, X_K^{\min}(f) \quad \text{or} \quad X_1^{\text{update}}(f), \ldots, X_K^{\text{update}}(f)$$
To design a filter for noise suppression, the noise signals of the sub-bands must be spliced together to give the full-band noise estimate $N(f)$ or $N^{\text{update}}(f)$.
Because the K sub-band signals obtained by Mel-domain decomposition overlap in the ordinary frequency domain, $N(f)$ cannot be obtained by simply adding them. Specifically, a "segmented splicing method" can be used. As an illustration, for two adjacent Mel bands $X_i^{\text{update}}(f)$ and $X_{i+1}^{\text{update}}(f)$ whose Mel band ranges are $[f_i^{\text{lower}}, f_i^{\text{upper}}]$ and $[f_{i+1}^{\text{lower}}, f_{i+1}^{\text{upper}}]$ respectively, the splicing is:
$$N^{\text{update}}(f) = \begin{cases} X_i^{\text{update}}(f), & f_i^{\text{lower}/3} < f < f_i^{\text{upper}/3} \\ X_{i+1}^{\text{update}}(f), & f_{i+1}^{\text{lower}/3} < f < f_{i+1}^{\text{upper}/3} \\ \left[ X_i^{\text{update}}(f) + X_{i+1}^{\text{update}}(f) \right] / 2, & f_i^{\text{upper}/3} < f < f_{i+1}^{\text{lower}/3} \end{cases}$$
Take the i-th sub-band as an example. It overlaps both its preceding and its following sub-band, so it can be divided into three parts: 1) the part that overlaps sub-band i-1; 2) the independent part; 3) the part that overlaps sub-band i+1. For ease of computation, the overlapping parts are taken to be as long as the independent part, so each part is one third of the whole sub-band; this is why the interval endpoints are divided by 3.
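A sketch of the segmented splicing rule follows. Rather than hard-coding the one-third boundaries, this illustration takes the single band's value wherever only one band covers a bin and averages wherever two bands overlap; when adjacent bands overlap by one third of their width, this reduces to the three cases above. The function name, the use of explicit band ranges, and the zero fill for uncovered bins are assumptions.

```python
def splice_full_band(band_signals, band_ranges, bin_freqs):
    """Combine per-band noise estimates into one full-band estimate.

    band_signals[i][b] : noise estimate of sub-band i at bin b (full-length lists)
    band_ranges[i]     : (lower_hz, upper_hz) of sub-band i
    bin_freqs[b]       : centre frequency of bin b in Hz
    """
    full = []
    for b, f in enumerate(bin_freqs):
        # Indices of all sub-bands whose range contains this bin.
        covering = [i for i, (lo, hi) in enumerate(band_ranges) if lo <= f <= hi]
        if not covering:
            full.append(0.0)  # no band covers this bin (assumption: treat as no noise)
        else:
            # One band: its value; overlapping bands: their equal-weight average.
            full.append(sum(band_signals[i][b] for i in covering) / len(covering))
    return full
```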
Besides segmented splicing, a "Mel-domain alternate splicing method" can also be used, in which the odd-numbered and the even-numbered Mel sub-bands are spliced separately. Because the odd-numbered filters of the Mel triangular filter bank, and likewise the even-numbered filters, can each make up a full-band signal without overlap, each set can be spliced directly.
The above describes two typical splicing methods. Given the characteristics of Mel-domain signal decomposition, those skilled in the art may also splice the decomposed sub-band signals by other methods; the embodiments of the invention do not limit the splicing method.
S104: generate a filter with $N(f)$ as the removal target, and perform noise suppression on the target speech signal $X(f)$.
Once the full-band noise estimate $N(f)$ or $N^{\text{update}}(f)$ has been obtained, noise suppression can be carried out by designing a filter.
For example, to every frame signals with noise of input, with N Update(f) carry out filtering for eliminating target, finish noise N Update(f) inhibition is handled.Suppose that present frame is X (f), according to minimum mean square error criterion, generate optimal filter and satisfy:
$$
H(f) = 1 - \frac{1}{\mathrm{SNR}(f)} = 1 - \frac{N^{\text{update}}(f)}{X(f)}
$$
Filtering the input signal with H(f) yields the denoised voice signal. On the surface this filtering is still performed in the frequency domain, but because both the noise estimation and the noise update are carried out in the Mel domain, it is in effect an optimal Wiener filter in the Mel domain. Because the noise estimate of each Mel band contains no speech components, the reliability of the noise estimate is effectively ensured, and the speech components are not damaged during filtering.
It will be understood that, given the full-band noise estimate N(f) or $N^{\text{update}}(f)$, those skilled in the art may also design the filter in other ways; the embodiments of the invention do not limit the specific filter design method.
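As a rough illustration of the gain formula H(f) = 1 - N(f)/X(f), the following Python sketch applies it to the magnitude spectrum of one frame. The function name `suppression_gain`, the clipping of the gain to [floor, 1], the small `eps` guard against division by zero, and the numeric values are assumptions added for the example; the source text does not specify how negative gains are handled.

```python
import numpy as np

def suppression_gain(noise_mag, frame_mag, floor=0.0, eps=1e-12):
    """Gain H(f) = 1 - N(f) / X(f), clipped to [floor, 1].

    The clipping and the eps guard are assumptions of this sketch.
    """
    gain = 1.0 - noise_mag / np.maximum(frame_mag, eps)
    return np.clip(gain, floor, 1.0)

frame_mag = np.array([1.0, 2.0, 0.5, 4.0])  # |X(f)| of one frame (made-up values)
noise_mag = np.array([0.5, 0.5, 1.0, 1.0])  # full-band noise estimate (made-up values)
H = suppression_gain(noise_mag, frame_mag)
denoised = H * frame_mag
```

Where the noise estimate exceeds the frame magnitude (third bin), the unclipped gain would be negative, so the sketch sets it to the floor value instead.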
Corresponding to the method embodiment above, the present invention also provides a voice signal noise suppression device. Referring to Fig. 3, the device may include: a signal decomposition module 110, a minimum energy determination module 120, a noise estimation module 130, and a noise suppression module 140. The specific function of each module and the cooperation between the modules are described below:
The signal decomposition module 110 is configured to perform Mel-domain decomposition on the target voice signal X(f) to obtain K Mel-domain subband signals $X_i(f)$ ($i = 1, 2, \ldots, K$);
The Mel domain is partitioned by a bank of triangular filters defined in the frequency domain; this filter bank is called the Mel filter bank (MFB). Because the MFB models the critical-band effect of the human auditory system well, speech features can be extracted effectively in the Mel domain in speech recognition systems. In the scheme of the embodiments of the invention, subband decomposition follows the Mel frequency scale, and noise suppression is also performed in the Mel domain.
Suppose the target voice signal to be processed is X(f). Mel-domain subband decomposition divides the full-band voice signal according to the bandwidths of the Mel triangular filter bank into K Mel-domain subband signals $X_1(f), \ldots, X_K(f)$.
Each subband signal can in turn be expressed as the sum of a speech subband component and a noise subband component:
$$
X_i(f) = S_i(f) + N_i(f), \quad i = 1, \ldots, K
$$
If the noise component $N_i(f)$ of each subband can be estimated, the full-band noise signal N(f) can be estimated; a filter is then designed with the full-band noise signal as the removal target to complete the noise suppression. How the noise signal N(f) is estimated is described in detail later in this embodiment.
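To make the subband layout concrete, the sketch below computes the lower, centre and upper edge frequencies of K triangular Mel filters. It is an illustration under stated assumptions: the source does not give a Mel formula, so the common convention mel = 2595 log10(1 + f/700) is assumed, and the names `hz_to_mel`, `mel_to_hz`, `mel_band_edges` as well as K = 24 and the 8 kHz upper limit are hypothetical choices.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common Mel-scale convention (an assumption; not stated in the source).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(n_bands, f_max_hz):
    """Return (lower, centre, upper) edge frequencies in Hz of n_bands
    triangular filters spaced uniformly on the Mel scale."""
    pts = mel_to_hz(np.linspace(0.0, hz_to_mel(f_max_hz), n_bands + 2))
    return pts[:-2], pts[1:-1], pts[2:]

lower, centre, upper = mel_band_edges(n_bands=24, f_max_hz=8000.0)
```

With this layout the lower edge of each filter coincides with the centre of the previous one, which is the overlap between adjacent subbands that the splicing step later has to account for.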
The minimum energy determination module 120 is configured to use a preset time window to determine, for each subband, the local minimum energy value $E_{N,i}^{\min}$ in the time domain and the signal $X_i^{\min}(f)$ corresponding to that minimum energy value;
The scheme of the embodiments of the invention uses the local minimum energy of each Mel subband to estimate the noise component $N_i(f)$ of that subband. First, the minimum energy of each subband is tracked, that is, the local minimum energy value of each subband in the time domain is determined. Since actual noise signals are mostly non-stationary, a long time window can be introduced here to improve adaptability to non-stationary noise.
For example, with a time window of 300 frames, local minimum energy tracking is performed on each Mel subband within that window; that is, for each Mel subband, the 300 consecutive frames that have the least energy in that subband are found.
Suppose the tracked minimum energies of the Mel subbands are:
$$
E_{N,1}^{\min}, \ldots, E_{N,K}^{\min}.
$$
The signal of each Mel subband corresponding to the minimum energy is recorded at the same time:
$$
X_1^{\min}(f), \ldots, X_K^{\min}(f).
$$
These subband signals may not belong to the same signal frame in the time domain, but each is guaranteed to have the minimum energy in its Mel band, so using them as the basis for noise estimation is very reliable. During noise suppression filtering, if these signals are taken as the components to be filtered out, what is removed is entirely the noise component of each Mel band, while the speech components are kept intact. In other words, the scheme takes a conservative ("safe") approach: only signals that are noise with high certainty are filtered, and signals that may or may not be noise are left unfiltered, to avoid damaging the speech features of the signal.
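The tracking step can be sketched as follows. This is one possible reading, stated as an assumption: within the most recent `window` frames, the single frame with the least energy is selected independently for each subband, and its energy and spectrum are recorded. The function name `track_min_energy`, the array layout and the synthetic data are hypothetical.

```python
import numpy as np

def track_min_energy(sub_spectra, window=300):
    """sub_spectra: list of K arrays, each of shape (T, n_bins_i), holding the
    magnitude spectrum of subband i over T frames.
    Returns the per-subband minimum energies and the corresponding spectra,
    searched over the most recent `window` frames."""
    e_min, x_min = [], []
    for spec in sub_spectra:
        recent = spec[-window:]
        energy = np.sum(recent ** 2, axis=1)  # per-frame energy in this subband
        t = int(np.argmin(energy))            # frame with the least energy
        e_min.append(float(energy[t]))
        x_min.append(recent[t].copy())
    return np.array(e_min), x_min

rng = np.random.default_rng(0)
sub_spectra = [rng.random((500, 4)) + 1.0 for _ in range(3)]
sub_spectra[1][450] = 0.01  # plant a quiet frame in subband 1
e_min, x_min = track_min_energy(sub_spectra, window=300)
```

Because the minimum is searched per subband, the frames selected for different subbands need not coincide in time, which matches the remark above that the recorded subband signals may come from different frames.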
The noise estimation module 130 is configured to use the $X_i^{\min}(f)$ of each subband to estimate the noise signal of the corresponding subband, and to obtain the full-band noise estimate N(f) by splicing the estimation results;
1) Estimating the noise signal of each subband:
In one embodiment of the invention, the signal $X_i^{\min}(f)$ corresponding to the minimum energy of each subband can be used directly as the noise estimate of that subband. As explained above, this estimation method ensures that the noise estimate contains no speech components.
In another embodiment of the invention, the signal $X_i^{\min}(f)$ corresponding to the minimum energy of each subband can be further corrected to obtain a more accurate noise estimate. The specific design is as follows:
A time unit type judging submodule, configured to take the local minimum energy value $E_{N,i}^{\min}$ of each subband as the noise energy estimate of that subband; to calculate, from the noise energy estimates of the subbands, the average subband signal-to-noise ratio (SNR) of each time unit of the target voice signal; and to classify any time unit whose average subband SNR is below a preset threshold as a non-speech time unit;
An audio signal can be divided in the time domain into several time units, each serving as a basic processing unit. In audio processing the typical basic processing unit is an audio frame, and the embodiments of the invention are also described with the audio frame as the basic processing unit, but this should not be construed as limiting the scheme of the invention.
For each frame, the average subband SNR of the frame is calculated:
$$
\mathrm{SNR}(f) = \sum_{i=1}^{K} \frac{E_i}{E_i^{\min}}
$$
where $E_i$ is the total energy of the i-th subband, and $E_i^{\min}$ is the local minimum energy of the i-th subband (written $E_{N,i}^{\min}$ in the tracking step), used here as the noise energy estimate of the i-th subband in the SNR calculation. Summing the SNR values of all subbands of a frame gives the average subband SNR, SNR(f), of that frame. The larger this value, the stronger the voice signal; conversely, a smaller value means the voice signal is weaker, or may even be absent so that the frame is entirely noise.
Studies of a large number of real voice signals show that noise is usually concentrated in the low-frequency range, so the SNR of the mid- and high-frequency range discriminates better between speech and noise. On this basis, the average subband SNR of each frame can be calculated as:
$$
\mathrm{SNR}(f) = \sum_{i=K/3}^{K} \frac{E_i}{E_i^{\min}}
$$
Note that the SNR here is not summed over all subbands but only over the mid- and high-frequency subbands: the summation runs from K/3 to K, so as to drop the low band and keep the mid and high bands. Because noise is mostly concentrated in the low band, the input signal suffers more interference at low frequencies and less at mid and high frequencies, so the SNR computed there is more reliable. K/3 is an empirical value and should not be construed as limiting the scheme of the invention.
The SNR(f) of each frame is compared with a preset threshold $\mathrm{SNR}_{\text{thr}}$. If $\mathrm{SNR}(f) > \mathrm{SNR}_{\text{thr}}$, the frame is classified as a speech frame; otherwise it is classified as a non-speech frame.
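The frame classification can be sketched in Python as below. The function name `classify_frames`, the (T, K) array layout, the threshold value and the toy energies are assumptions made for illustration; the start index uses 0-based indexing, so it only approximates the "K/3 to K" range of the formula.

```python
import numpy as np

def classify_frames(sub_energy, e_min, snr_thr, start_band=None):
    """sub_energy: (T, K) array, energy E_i of each subband in each frame.
    e_min:      (K,) array, local minimum energy of each subband.
    Returns (snr, is_speech), one entry per frame."""
    n_bands = sub_energy.shape[1]
    if start_band is None:
        start_band = n_bands // 3  # empirical: skip roughly the lowest third
    ratios = sub_energy[:, start_band:] / e_min[start_band:]
    snr = ratios.sum(axis=1)       # summed subband SNR of each frame
    return snr, snr > snr_thr

e_min = np.ones(6)
sub_energy = np.array([np.full(6, 1.0),    # frame at the noise floor
                       np.full(6, 10.0)])  # frame well above the noise floor
snr, is_speech = classify_frames(sub_energy, e_min, snr_thr=20.0)
```

Here only bands 2 to 5 contribute, so the first frame scores 4 and is marked non-speech, while the second scores 40 and is marked speech.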
For a non-speech frame, its corresponding signal in each subband, $X_i^{\text{non-speech}}(f)$, is further determined. $X_i^{\text{non-speech}}(f)$ denotes the signal feature value of a frame in the i-th subband; if a subband contains several non-speech frames, $X_i^{\text{non-speech}}(f)$ can be obtained by equal-weight averaging.
A noise estimation submodule, configured to use the non-speech time unit signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$ of each subband to estimate the noise signal of the corresponding subband.
The result of the speech/non-speech decision is used to correct the minimum Mel subband energy tracking result:
$$
X_1^{\min}(f), \ldots, X_K^{\min}(f).
$$
This yields a "more reliable" noise estimate for each Mel subband:
$$
X_1^{\text{update}}(f), \ldots, X_K^{\text{update}}(f).
$$
The noise correction for each subband can be carried out as follows:
$$
X_i^{\text{update}}(f) = \lambda_i X_i^{\min}(f) + (1 - \lambda_i) X_i^{\text{non-speech}}(f)
$$
As the equation shows, the corrected noise estimate is a weighted combination of the non-speech frame signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$.
Here $\lambda_i$ is a preset weight in the range [0, 1] that controls the correction step size.
The two extreme cases are:
$\lambda_i = 0$: the noise estimate is replaced by the "non-speech frame" signal. This is equivalent to using the non-speech signal as the noise estimate whenever a non-speech signal is found.
$\lambda_i = 1$: the minimum energy tracking result is always used as the noise estimate, which is equivalent to no correction.
In practice, different values of $\lambda_i$ can be set according to the specific environment and noise characteristics. In addition, different step sizes $\lambda_i$ can be chosen for different Mel bands. For example, in the low band the noise tends to be stationary, so $\lambda_i$ can be chosen somewhat smaller, which is "safer"; in the high band the noise tends to be non-stationary, so $\lambda_i$ can be chosen larger, so that changes in the noise can be tracked dynamically.
The ranges of "low frequency" and "high frequency" mentioned here are usually empirical and have no absolute limits. For example, if the voice signal band is 0 to 8 kHz, 0 to 2 kHz is generally regarded as low frequency and above 2 kHz as high frequency. Based on repeated experiments, $\lambda_i$ is typically set to 0.1 to 0.2 for the low band and 0.5 to 0.6 for the high band; these are only the best values found experimentally and should not be construed as limiting the scheme of the invention.
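The weighted correction can be written out as a short Python sketch. The helper names `update_noise_estimate` and `band_lambda`, the 2 kHz split, and the specific weights 0.15 and 0.55 (picked from inside the ranges quoted above) are illustrative assumptions, as are the toy spectra.

```python
import numpy as np

def update_noise_estimate(x_min, x_non_speech, lam):
    """X_update = lam * X_min + (1 - lam) * X_non_speech for one subband."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * np.asarray(x_min) + (1.0 - lam) * np.asarray(x_non_speech)

def band_lambda(centre_hz, split_hz=2000.0, lam_low=0.15, lam_high=0.55):
    # Hypothetical per-band weight: one value below the split, another above it.
    return lam_low if centre_hz < split_hz else lam_high

x_min = np.array([1.0, 1.0])                        # minimum-energy spectrum (toy values)
non_speech_frames = np.array([[2.0, 2.0], [4.0, 4.0]])
x_ns = non_speech_frames.mean(axis=0)               # equal-weight average of non-speech frames
x_upd = update_noise_estimate(x_min, x_ns, band_lambda(500.0))
```

Setting `lam` to 1 returns the minimum-energy spectrum unchanged, and setting it to 0 returns the averaged non-speech spectrum, which are the two extreme cases listed above.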
Two schemes for estimating the noise signal of each subband have been given above. The advantage of the first is its simplicity: in simple environments, such as stationary or quasi-stationary noise, the minimum energy tracking result can be used directly as the noise estimate. The second further adds the ability to track noise dynamically: it uses the signal of the "non-speech frames" (that is, the real-time noise signal) to correct the minimum Mel energy tracking result, and it gives the designer considerable freedom to choose different correction step factors for the actual noise environment. More importantly, different step sizes can be chosen according to the noise characteristics of different Mel bands, so that the noise in each band is updated in a targeted way, which further improves the accuracy of estimating non-stationary noise.
2) Splicing the noise signals of the subbands:
The processing in 1) yields the noise estimate of each subband:
$X_1^{\min}(f), \ldots, X_K^{\min}(f)$ or $X_1^{\text{update}}(f), \ldots, X_K^{\text{update}}(f)$.
To design a filter for noise suppression, the noise signals of the subbands need to be spliced to obtain the full-band noise estimate N(f) or $N^{\text{update}}(f)$.
Because the K subband signals obtained by Mel-domain decomposition overlap in the ordinary frequency domain, N(f) cannot be obtained by direct addition. Specifically, "segmented splicing" can be used. As an illustration, consider two adjacent Mel bands $X_i^{\text{update}}(f)$ and $X_{i+1}^{\text{update}}(f)$, whose Mel band ranges are $[f_i^{\text{lower}}, f_i^{\text{upper}}]$ and $[f_{i+1}^{\text{lower}}, f_{i+1}^{\text{upper}}]$ respectively. They are spliced as follows:
$$
N^{\text{update}}(f) =
\begin{cases}
X_i^{\text{update}}(f), & f_i^{\text{lower}}/3 < f < f_i^{\text{upper}}/3 \\
X_{i+1}^{\text{update}}(f), & f_{i+1}^{\text{lower}}/3 < f < f_{i+1}^{\text{upper}}/3 \\
\left[ X_i^{\text{update}}(f) + X_{i+1}^{\text{update}}(f) \right] / 2, & f_i^{\text{upper}}/3 < f < f_{i+1}^{\text{lower}}/3
\end{cases}
$$
Take the i-th subband as an example. It overlaps both the preceding and the following subband, so it can be divided into three parts: 1) the part overlapping subband i-1; 2) the independent part; 3) the part overlapping subband i+1. For computational convenience, the overlapping parts and the independent part are assumed here to have the same length, so each part is one third of the whole subband length, which is why the interval endpoints are divided by 3.
Besides segmented splicing, "Mel-domain interleaved splicing" can also be used: the odd-numbered and the even-numbered Mel subbands are spliced separately. Because the odd-numbered filters and the even-numbered filters of the Mel triangular filter bank can each form a full-band signal without overlap, each set can be spliced directly.
The above introduces two typical splicing methods. Depending on the characteristics of the Mel-domain signal decomposition, those skilled in the art may also splice the decomposed subband signals in other ways; the embodiments of the invention do not limit the specific splicing method.
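The interleaved variant can be illustrated with the Python sketch below. It assumes a triangular layout in which subband i spans the bins from `edges[i]` up to `edges[i + 2]`, so that every second subband tiles the spectrum without overlap; the function name `splice_interleaved`, this bin-edge representation and the toy values are assumptions. The source does not say how the two spliced results are used afterwards, so the sketch simply returns each one.

```python
import numpy as np

def splice_interleaved(sub_estimates, edges, parity):
    """edges: K + 2 increasing bin indices; subband i covers the bins
    edges[i] .. edges[i + 2] - 1 (0-based subband index).
    parity: 0 selects subbands 0, 2, 4, ...; 1 selects subbands 1, 3, 5, ...
    Bins not covered by the selected subbands are left at 0."""
    full = np.zeros(edges[-1])
    for i in range(parity, len(sub_estimates), 2):
        lo, hi = edges[i], edges[i + 2]
        full[lo:hi] = sub_estimates[i]
    return full

edges = [0, 2, 4, 6, 8, 10]                        # K = 4 subbands, 4 bins each
ests = [np.full(4, i + 1.0) for i in range(4)]     # toy per-subband noise estimates
even_splice = splice_interleaved(ests, edges, parity=0)
odd_splice = splice_interleaved(ests, edges, parity=1)
```

In this toy layout each selected set leaves one half-band uncovered at an end of the spectrum (the last two bins for one set, the first two for the other), so "full band" holds only approximately at the edges.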
The noise suppression module 140 is configured to generate a filter with N(f) as the removal target and to perform noise suppression on the target voice signal X(f).
Once the full-band noise estimate N(f) or $N^{\text{update}}(f)$ has been obtained, noise suppression can be achieved by designing a filter.
For example, each noisy input frame is filtered with $N^{\text{update}}(f)$ as the removal target, which suppresses the noise $N^{\text{update}}(f)$. Suppose the current frame is X(f). According to the minimum mean square error criterion, the optimal filter satisfies:
$$
H(f) = 1 - \frac{1}{\mathrm{SNR}(f)} = 1 - \frac{N^{\text{update}}(f)}{X(f)}
$$
Filtering the input signal with H(f) yields the denoised voice signal. On the surface this filtering is still performed in the frequency domain, but because both the noise estimation and the noise update are carried out in the Mel domain, it is in effect an optimal Wiener filter in the Mel domain. Because the noise estimate of each Mel band contains no speech components, the reliability of the noise estimate is effectively ensured, and the speech components are not damaged during filtering.
It will be understood that, given the full-band noise estimate N(f) or $N^{\text{update}}(f)$, those skilled in the art may also design the filter in other ways; the embodiments of the invention do not limit the specific filter design method.
For convenience of description, the device above is described in terms of separate functional modules. When implementing the invention, the functions of the modules may of course be realized in one or more pieces of software and/or hardware. Modules described as separate components may or may not be physically separate, and components shown as units may or may not be physical modules; they may be located in one place or distributed over several network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the description of the embodiments above, those skilled in the art will clearly understand that the invention can be implemented by software together with the necessary general-purpose hardware platform. On this understanding, the essence of the technical scheme of the invention, or the part that contributes over the prior art, can be embodied as a software product. This computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the invention or in parts of the embodiments.
The above is only a specific embodiment of the invention. It should be noted that those skilled in the art may make improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also fall within the protection scope of the invention.

Claims (16)

1. A voice signal noise suppression method, characterized in that the method comprises:
performing Mel-domain decomposition on a target voice signal X(f) to obtain K Mel-domain subband signals $X_i(f)$ ($i = 1, 2, \ldots, K$);
using a preset time window to determine, for each subband, the local minimum energy value $E_{N,i}^{\min}$ in the time domain and the signal $X_i^{\min}(f)$ corresponding to that minimum energy value;
using the $X_i^{\min}(f)$ of each subband to estimate the noise signal of the corresponding subband, and obtaining a full-band noise estimate N(f) by splicing the estimation results;
generating a filter with N(f) as the removal target, and performing noise suppression on the target voice signal X(f).
2. The method according to claim 1, characterized in that said using the $X_i^{\min}(f)$ of each subband to estimate the noise signal of the corresponding subband comprises:
using the $X_i^{\min}(f)$ of each subband as the noise estimate of the corresponding subband.
3. The method according to claim 1, characterized in that said using the $X_i^{\min}(f)$ of each subband to estimate the noise signal of the corresponding subband comprises:
taking the local minimum energy value $E_{N,i}^{\min}$ of each subband as the noise energy estimate of that subband;
calculating, from the noise energy estimates of the subbands, the average subband signal-to-noise ratio of each time unit of the target voice signal;
classifying any time unit whose average subband signal-to-noise ratio is below a preset threshold as a non-speech time unit;
using the non-speech time unit signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$ of each subband to estimate the noise signal of the corresponding subband.
4. The method according to claim 3, characterized in that said calculating, from the noise energy estimates of the subbands, the average subband signal-to-noise ratio of each time unit of the target voice signal comprises:
calculating the average subband signal-to-noise ratio of each time unit of the target voice signal from the noise energy estimates of the subbands only in a preset mid-frequency range and high-frequency range.
5. The method according to claim 3, characterized in that said using the non-speech time unit signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$ of each subband to estimate the noise signal of the corresponding subband comprises:
using $X_i^{\text{update}}(f) = \lambda_i X_i^{\min}(f) + (1 - \lambda_i) X_i^{\text{non-speech}}(f)$ to obtain the noise estimate of each subband, where $\lambda_i$ is a preset weight, $0 \le \lambda_i \le 1$.
6. The method according to claim 5, characterized in that different values of $\lambda_i$ are preset for different subbands.
7. The method according to claim 6, characterized in that said presetting different values of $\lambda_i$ for different subbands comprises:
presetting a smaller value of $\lambda_i$ for subbands in the lower frequency range, and a larger value of $\lambda_i$ for subbands in the higher frequency range.
8. The method according to any one of claims 3 to 7, characterized in that the time unit is an audio frame.
9. A voice signal noise suppression device, characterized in that the device comprises:
a signal decomposition module, configured to perform Mel-domain decomposition on a target voice signal X(f) to obtain K Mel-domain subband signals $X_i(f)$ ($i = 1, 2, \ldots, K$);
a minimum energy determination module, configured to use a preset time window to determine, for each subband, the local minimum energy value $E_{N,i}^{\min}$ in the time domain and the signal $X_i^{\min}(f)$ corresponding to that minimum energy value;
a noise estimation module, configured to use the $X_i^{\min}(f)$ of each subband to estimate the noise signal of the corresponding subband, and to obtain a full-band noise estimate N(f) by splicing the estimation results;
a noise suppression module, configured to generate a filter with N(f) as the removal target and to perform noise suppression on the target voice signal X(f).
10. The device according to claim 9, characterized in that the noise estimation module is specifically configured to:
use the $X_i^{\min}(f)$ of each subband as the noise estimate of the corresponding subband.
11. The device according to claim 9, characterized in that the noise estimation module comprises:
a time unit type judging submodule, configured to take the local minimum energy value $E_{N,i}^{\min}$ of each subband as the noise energy estimate of that subband; to calculate, from the noise energy estimates of the subbands, the average subband signal-to-noise ratio of each time unit of the target voice signal; and to classify any time unit whose average subband signal-to-noise ratio is below a preset threshold as a non-speech time unit;
a noise estimation submodule, configured to use the non-speech time unit signal $X_i^{\text{non-speech}}(f)$ and the minimum-energy signal $X_i^{\min}(f)$ of each subband to estimate the noise signal of the corresponding subband.
12. The device according to claim 11, characterized in that the time unit type judging submodule is specifically configured to:
calculate the average subband signal-to-noise ratio of each time unit of the target voice signal from the noise energy estimates of the subbands only in a preset mid-frequency range and high-frequency range.
13. The device according to claim 11, characterized in that the noise estimation submodule is specifically configured to:
use $X_i^{\text{update}}(f) = \lambda_i X_i^{\min}(f) + (1 - \lambda_i) X_i^{\text{non-speech}}(f)$ to obtain the noise estimate of each subband, where $\lambda_i$ is a preset weight, $0 \le \lambda_i \le 1$.
14. The device according to claim 13, characterized in that different values of $\lambda_i$ are preset for different subbands.
15. The device according to claim 14, characterized in that a smaller value of $\lambda_i$ is preset for subbands in the lower frequency range, and a larger value of $\lambda_i$ is preset for subbands in the higher frequency range.
16. The device according to any one of claims 11 to 15, characterized in that the time unit is an audio frame.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201310175549 CN103295580A (en) 2013-05-13 2013-05-13 Method and device for suppressing noise of voice signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201310175549 CN103295580A (en) 2013-05-13 2013-05-13 Method and device for suppressing noise of voice signals

Publications (1)

Publication Number Publication Date
CN103295580A true CN103295580A (en) 2013-09-11

Family

ID=49096336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201310175549 Pending CN103295580A (en) 2013-05-13 2013-05-13 Method and device for suppressing noise of voice signals

Country Status (1)

Country Link
CN (1) CN103295580A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20130911
