CN106486131A - Method and device for speech denoising - Google Patents
Method and device for speech denoising
- Publication number
- CN106486131A CN106486131A CN201610898662.4A CN201610898662A CN106486131A CN 106486131 A CN106486131 A CN 106486131A CN 201610898662 A CN201610898662 A CN 201610898662A CN 106486131 A CN106486131 A CN 106486131A
- Authority
- CN
- China
- Prior art keywords
- noise
- speech
- estimate
- power spectrum
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Abstract
The embodiment of the invention discloses a method and device for speech denoising. The method includes: performing speech detection on a noisy speech signal to distinguish speech frames from non-speech frames; performing noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate, where the noise power spectrum fusion estimate is a fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate; and denoising the noisy speech signal according to the noise power spectrum fusion estimate. By adopting the above technical scheme, the embodiment of the present invention performs noise estimation on both speech frames and non-speech frames and denoises the noisy speech signal using both estimation results, which can effectively improve the denoising effect of existing speech denoising schemes and improve voice quality.
Description
Technical field
The embodiments of the present invention relate to speech processing technology, and in particular to a method and device for speech denoising.
Background technology
During real-time voice communication, various noise interference problems are encountered; for mobile devices such as mobile phones, the voice noise problem is especially prominent. In addition, when sound is played through a loudspeaker, an echo problem arises, so for far-field recording the voice quality is highly susceptible to external environmental noise and nonlinear residual echo.
To improve voice communication quality, the speech must be denoised to improve its clarity. Traditional speech denoising algorithms usually assume that the noise is additive and stationary, and use voice activity detection (VAD) to divide the noisy speech into a speech part and a non-speech part (i.e., silent segments). The non-speech part mainly exhibits the noise characteristics, so by processing it with certain statistical methods an approximate estimate of the background noise characteristics can be obtained. However, the noise in the speech part may differ from that in the non-speech part, especially when the signal is affected by residual echo (a multiplicative noise), so denoising the whole speech signal based only on the noise estimated from the non-speech part gives a poor result.
Content of the invention
The embodiments of the present invention provide a method and device for speech denoising, so as to improve the denoising effect of existing speech denoising schemes.
In a first aspect, an embodiment of the present invention provides a method of speech denoising, the method including:
performing speech detection on a noisy speech signal to distinguish speech frames from non-speech frames;
performing noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate, where the noise power spectrum fusion estimate is a fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate; and
denoising the noisy speech signal according to the noise power spectrum fusion estimate.
In a second aspect, an embodiment of the present invention further provides a device for speech denoising, the device including:
a speech detection module, configured to perform speech detection on a noisy speech signal to distinguish speech frames from non-speech frames;
a noise estimation module, configured to perform noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate, where the noise power spectrum fusion estimate is a fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate; and
a denoising module, configured to denoise the noisy speech signal according to the noise power spectrum fusion estimate.
The embodiments of the present invention provide a method and device for speech denoising: speech detection is performed on a noisy speech signal to distinguish speech frames from non-speech frames, noise estimation is performed on both, a noise power spectrum fusion estimate is obtained, and the noisy speech signal is denoised according to the noise power spectrum fusion estimate. By adopting the above technical scheme, noise estimation is carried out on both speech frames and non-speech frames, and the two noise estimation results are combined to denoise the noisy speech signal, which can effectively improve the denoising effect of existing speech denoising schemes and improve voice quality.
Description of the drawings
Fig. 1 is a flowchart of the speech denoising method provided in Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the speech denoising method provided in Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the speech denoising method provided in Embodiment 3 of the present invention;
Fig. 4 is a flowchart of the speech denoising method provided in Embodiment 4 of the present invention;
Fig. 5a is the spectrogram of the original noisy speech signal provided in Embodiment 4 of the present invention;
Fig. 5b is the spectrogram of the speech signal after denoising provided in Embodiment 4 of the present invention;
Fig. 6 is a structural schematic diagram of the speech denoising device provided in Embodiment 5 of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it. It should also be noted that, for convenience of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of the speech denoising method provided in Embodiment 1 of the present invention. This embodiment is applicable to speech denoising. The method can be executed by a speech denoising device, which can be implemented in software and/or hardware and integrated in any intelligent terminal that provides a speech denoising function. In practice, such intelligent terminals may include mobile terminals such as tablet computers, mobile phones, and e-readers; these terminals are only examples, not an exhaustive list.
Referring to Fig. 1, the speech denoising method includes:
S110: perform speech detection on the noisy speech signal to distinguish speech frames from non-speech frames.
The speech signal received by the intelligent terminal is a non-stationary, time-varying noisy speech signal formed under environmental interference. After receiving this signal, the intelligent terminal first samples the time-domain noisy speech signal, converting the analog signal into a digital signal. Typically the sampling frequency is 44100 Hz, i.e., 44100 samples are obtained per second. The sampled time-domain noisy speech signal is then windowed and divided into frames, so that each frame can be treated as stationary. Window functions commonly used in speech processing include the rectangular window, Hanning window, and Hamming window. The windowed time-domain frames are converted into frequency-domain noisy speech frames by the Fourier transform. Sampling, framing, and the Fourier transform are common technical means for those skilled in the art and, for brevity, are not described here.
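The windowing, framing, and Fourier transform steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the frame length of 1024 samples and hop of 512 are assumed values, and a Hann window is chosen from the window functions the text lists.

```python
import numpy as np

def stft_frames(x, frame_len=1024, hop=512):
    """Split a time-domain signal into windowed frames and return their spectra.
    A Hann window is used here; the patent also mentions rectangular and
    Hamming windows as common choices."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    spectra = np.empty((n_frames, frame_len // 2 + 1), dtype=complex)
    for n in range(n_frames):
        frame = x[n * hop : n * hop + frame_len] * window
        spectra[n] = np.fft.rfft(frame)  # frequency-domain noisy speech frame
    return spectra
```

Each row of the result is one frequency-domain noisy speech frame X(n, k), the quantity all later formulas operate on.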
Speech detection is performed frame by frame on the frequency-domain noisy speech frames to distinguish speech frames from non-speech frames. Speech detection can be regarded as feature extraction based on speech characteristic parameters; such parameters effectively represent speech features and have good discriminability, so speech and non-speech can be efficiently distinguished from them. In this embodiment, VAD technology can be used for speech detection. Generally, speech frames and non-speech frames can be distinguished by extracting the mel-frequency cepstrum coefficients (MFCC) among the frequency-domain characteristic parameters of the speech signal.
S120: perform noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate.
For example, after speech detection (VAD), noise estimation can be carried out frame by frame. If the current frame is determined to be a speech frame, noise estimation is performed on it according to the noise estimation mode for speech frames, obtaining a speech-frame noise power spectrum estimate. For example, a minimum-tracking algorithm or a quantile noise estimation method can be used. Preferably, the quantile method is used: over a period of time, the quantile of the noisy speech signal in a given narrow frequency band is taken as the noise power estimate (which can also be understood as the noise energy) of that band. Specifically, the speech-frame noise power spectrum estimate can be obtained according to the following formula:
λd(n, k) = Quantiles(|X(n, k)|²),  n = 0, 1, 2, ..., M
where M represents the number of frames, X(n, k) represents the speech spectrum component of the k-th frequency of the n-th frame, and Quantiles(·) takes the quantile, typically 0.25 or 0.5. Note that M is the total number of frames obtained by framing the noisy speech signal; in the above formula n represents the frame index of a speech frame, and its concrete values are determined by the speech detection results.
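The quantile estimate above can be computed directly per frequency bin. A minimal sketch, assuming the power spectra of a block of frames are stacked into one array:

```python
import numpy as np

def quantile_noise_estimate(power_spec, q=0.5):
    """Estimate the noise power in each frequency bin as the q-quantile of
    the power spectrum over a block of frames (q is typically 0.25 or 0.5).
    power_spec: array of shape (n_frames, n_bins) holding |X(n, k)|**2."""
    return np.quantile(power_spec, q, axis=0)
```

The intuition is that even inside speech, each narrow band is dominated by noise a large fraction of the time, so a low quantile tracks the noise floor.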
If the current frame is determined to be a non-speech frame, noise estimation is performed on it according to the noise estimation mode for non-speech frames, obtaining a non-speech-frame noise power spectrum estimate according to the following formula:
λd(n, k) = a·|X(n, k)|² + (1 − a)·Quantiles(|X(n, k)|²),  n = 0, 1, 2, ..., M
Note that in this formula n represents the frame index of a non-speech frame, and its concrete values are determined by the speech detection results.
The noise power spectrum fusion estimate is the fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate, and can be obtained according to the following formula:
where L represents the length of the noise smoothing interval, for example 9 frames; a represents the weight coefficient, preferably 0.8; λd(n, k) represents the speech-frame or non-speech-frame noise power spectrum estimate, determined by the value of n; and λlast(n, k) represents the noise power spectrum fusion estimate.
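The exact fusion formula is not reproduced in the text, so the sketch below is an assumption built only from the stated ingredients: per-frame estimates (quantile on speech frames, a weighted blend with a = 0.8 on non-speech frames) smoothed over an L = 9 frame interval. The moving-average smoothing is the assumed part.

```python
import numpy as np

def fuse_noise_estimates(power_spec, is_speech, a=0.8, q=0.5, L=9):
    """Per-frame noise estimate: quantile tracking on speech frames, a blend
    of instantaneous power and the quantile on non-speech frames, then a
    moving average over L frames as the fusion/smoothing step.  The smoothing
    form is an assumption; the patent only states L (9) and a (0.8)."""
    quant = np.quantile(power_spec, q, axis=0)          # per-bin quantile
    lam = np.where(is_speech[:, None], quant,
                   a * power_spec + (1.0 - a) * quant)  # per-frame estimate
    kernel = np.ones(L) / L
    # smooth each frequency bin across frames
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, lam)
```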
S130: denoise the noisy speech signal according to the noise power spectrum fusion estimate.
For example, after the noise power spectrum fusion estimate is obtained, the a-priori SNR is estimated using the decision-directed method. The method for estimating the a-priori SNR is not limited to the decision-directed method; other suitable algorithms, such as causal or non-causal algorithms, can also be used. In this embodiment, the a-priori SNR can be calculated from the noise power spectrum fusion estimate, and the Wiener filter gain function can be obtained correspondingly from the a-priori SNR. After the Wiener filter gain function is obtained, Wiener filtering is applied to the frequency-domain noisy speech signal, yielding the frequency-domain denoised speech signal. The estimation of the a-priori SNR and the calculation of the Wiener filter gain function can follow existing calculation methods and are not described here.
Further, the inverse Fourier transform can be applied to the frequency-domain denoised speech signal, and the final output speech can be synthesized by the overlap-add method, completing the whole speech denoising process. The method for synthesizing the final output speech is not limited to overlap-add; alternatives are not enumerated here.
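The Wiener filtering step can be illustrated per frame as below. This is a simplified sketch: the a-priori SNR is approximated directly from the posterior SNR rather than by the decision-directed smoothing the text refers to, and the gain form xi/(1+xi) is the textbook Wiener gain.

```python
import numpy as np

def wiener_denoise_frame(noisy_spec, noise_psd, floor=1e-10):
    """Apply a basic Wiener gain to one noisy frame spectrum given a noise
    PSD estimate (the fusion estimate in the patent's pipeline)."""
    post_snr = (np.abs(noisy_spec) ** 2) / np.maximum(noise_psd, floor)
    xi = np.maximum(post_snr - 1.0, 0.0)  # crude a-priori SNR estimate
    gain = xi / (1.0 + xi)                # Wiener gain per frequency bin
    return gain * noisy_spec
```

The cleaned frames would then be inverse-FFT'd and recombined by overlap-add to produce the output waveform.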
In the technical scheme provided by this embodiment, speech detection is performed on a noisy speech signal to distinguish speech frames from non-speech frames, noise estimation is performed on both, a noise power spectrum fusion estimate is obtained, and the noisy speech signal is denoised according to this estimate. With this scheme, the noise can be estimated not only in non-speech frames, but the noise component estimate can also be updated in speech frames, and both noise estimation results are combined to denoise the noisy speech signal, which can effectively improve the denoising effect of existing speech denoising schemes and improve voice quality.
The speech denoising method of the embodiment of the present invention is applicable to denoising speech signals during real-time voice network communication; in application scenarios such as real-time voice chat and online tutoring, its denoising effect is especially prominent compared with existing speech denoising schemes. Compared with landline voice calls and social-network calls, speech denoising during online tutoring is more difficult: a student watching a phone or computer screen to take notes in real time is unwilling to hold the phone against the ear as in a traditional call, and students seldom wear earphones during online tutoring. In this situation sound is played through the loudspeaker, the echo problem is more prominent, and with relatively far-field recording the speech quality is highly susceptible to external environmental noise and nonlinear residual echo. For speech denoising in application scenarios such as real-time voice chat and online tutoring, the method of the embodiment of the present invention can suppress not only the noise in the non-speech part but also the noise in the speech part, and its suppression of residual echo is especially obvious, which can effectively improve the denoising effect of existing speech denoising schemes and improve voice quality.
Embodiment two
Fig. 2 is a flowchart of the speech denoising method provided in Embodiment 2 of the present invention. On the basis of the above embodiment, this embodiment proposes an effective combination of speech features for speech detection on the noisy speech signal, which can distinguish noisy speech frames from non-speech frames more accurately.
Referring to Fig. 2, the speech denoising method includes:
S210: extract the speech features of the noisy speech signal.
The extracted noisy speech features include the mel cepstrum coefficients (MFCC), the linear predictive coding residual, and the spectral centroid (Centroid). Humans perceive different speech frequencies differently: below 1 kHz, perception is linear with frequency; above 1 kHz, it becomes logarithmic with frequency, and the higher the frequency, the poorer the perception. In applications, usually only the low-frequency MFCCs are used and the mid-to-high-frequency MFCCs are discarded. In the Mel frequency domain, human perception of pitch is linear: if the Mel frequencies of two speech segments differ by a factor of two, they are also perceived as differing by a factor of two. The Mel frequency scale simulates the human ear's perception of different speech frequencies; to simulate this perceptual characteristic, the ordinary frequency scale is converted to the Mel frequency scale according to the following formula:
fmel = 2595·log10(1 + f/700)
where f is the actual linear frequency and fmel is the Mel frequency.
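The Hz-to-Mel conversion is a one-liner:

```python
import math

def hz_to_mel(f):
    """Standard Mel-scale mapping: approximately linear below 1 kHz and
    logarithmic above, matching the perceptual behaviour described above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)
```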
After the windowed time-domain noisy speech frames are Fourier-transformed into frequency-domain noisy speech frames, the power spectrum of the speech signal is filtered by a bank of M triangular bandpass filters distributed linearly on the Mel frequency scale. The range covered by each triangular bandpass filter approximates a critical bandwidth of the human ear, thereby simulating the ear's masking effect. The response of the triangular bandpass filter bank distributed linearly on the Mel frequency scale is given by the following formula:
where Hm(k) represents the response of the m-th triangular bandpass filter at the k-th frequency bin, 1 ≤ m ≤ M, M typically takes 40, and f(m) represents the center frequency of the m-th triangular filter.
Taking the logarithm of the outputs P(m) of the triangular bandpass filter bank according to the following formula gives a result similar to a homomorphic transform:
X(m) = log(P(m))
An N-th order discrete cosine transform (DCT) is applied to X(m) to remove the correlation between the dimensions of the speech signal and map it into a lower-dimensional space, giving the standard MFCC parameters according to the following formula:
where N typically takes 13 and Xk represents the standard MFCC parameters.
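The log-plus-DCT step can be sketched as below. The DCT-II convention used here is one common choice; the patent's exact DCT formula is not reproduced in the text above.

```python
import numpy as np

def mfcc_from_filterbank(P, n_coeffs=13):
    """Turn triangular-filterbank outputs P(m), m = 1..M, into MFCCs:
    take the log (an approximate homomorphic transform), then a DCT to
    decorrelate the dimensions and map to a lower-dimensional space."""
    X = np.log(np.maximum(P, 1e-12))  # log filterbank energies
    M = len(X)
    m = np.arange(M)
    return np.array([np.sum(X * np.cos(np.pi * k * (m + 0.5) / M))
                     for k in range(n_coeffs)])
```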
The linear predictive coding (LPC) residual belongs to the excitation source information and reflects the periodicity of vocal cord vibration, a characteristic that noise lacks. The speech sample X(n) at the n-th sampling point can be approximated by a linear combination of the previous p sample points, according to the following formula:
X(n) ≈ a1·X(n−1) + a2·X(n−2) + ... + ap·X(n−p)
where a1, a2, ..., ap are the p-th order LPC coefficients, found by minimizing the sum of squared prediction errors of the model; the difference between the original frame data and the prediction X(n) is the LPC residual.
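The LPC residual can be computed with a plain least-squares fit. A minimal sketch (order p = 12 is an assumed value; real systems often solve this via Levinson-Durbin on the autocorrelation instead):

```python
import numpy as np

def lpc_residual(x, p=12):
    """Fit p-th order linear prediction coefficients by least squares and
    return the residual x(n) - sum_i a_i * x(n-i).  For voiced speech the
    residual keeps the periodic glottal excitation; for noise it does not."""
    # Each row of A holds the p previous samples x(n-1), ..., x(n-p).
    A = np.column_stack([x[p - i - 1 : len(x) - i - 1] for i in range(p)])
    target = x[p:]
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coeffs
```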
The spectral centroid (Centroid) reflects the center of mass of the spectrum; the centroid of speech regions lies closer to 800–4000 Hz, while the centroid distribution of noise is significantly different. The spectral centroid can be obtained according to the following formula:
Preferably, the speech features also include at least one of the spectral flatness (Flatness), the spectral roll-off (Rolloff), and the spectral perturbation degree (Zcr).
The spectral flatness (Flatness) can be obtained according to the following formula:
The spectral roll-off (Rolloff) is a measure of spectral shape describing the frequency below which 85% of the spectral energy lies. It can be obtained according to the following formula:
The spectral perturbation degree (Zcr) reflects the degree of perturbation and disorder of the speech spectrum, and can be obtained according to the following formula:
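The spectral features above can be computed per frame as below. The formulas follow the usual definitions (centroid as magnitude-weighted mean frequency, flatness as geometric over arithmetic mean, roll-off at 85% of cumulative energy); the patent's own formulas are not reproduced in the text above, so treat this as a sketch.

```python
import numpy as np

def spectral_features(mag, freqs):
    """Centroid, flatness, and 85% roll-off of one magnitude spectrum.
    mag: magnitude spectrum of a frame; freqs: frequency of each bin (Hz)."""
    power = mag ** 2
    centroid = np.sum(freqs * mag) / np.sum(mag)
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (np.mean(mag) + 1e-12)
    cumulative = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]
    return centroid, flatness, rolloff
```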
S220: train a classifier with the extracted speech features to generate a speech model and a noise model, so as to distinguish speech frames from non-speech frames.
The speech features extracted above are assembled into a speech feature vector, and a classifier is trained on the speech to generate a speech model and a noise model. The classifier can use a Gaussian mixture model (GMM) or a support vector machine (SVM); in this embodiment a Gaussian mixture model is used to train on the speech and generate the speech model and the noise model. Each speech frame is judged according to the maximum output probability of the speech model and the noise model, so as to distinguish speech frames from non-speech frames. When training the speech model and the noise model, the expectation-maximization method can be used, with the initial parameters of each Gaussian estimated by the k-means algorithm, iterating until the speech model and the noise model converge.
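The EM training loop can be sketched in miniature. This is a 1-D two-component toy (quantile initialisation stands in for the k-means start mentioned above); the patent's models would be multivariate mixtures over the full feature vector.

```python
import numpy as np

def fit_gmm_1d(x, k=2, iters=100):
    """Minimal EM for a 1-D Gaussian mixture, as a stand-in sketch for the
    speech/noise GMMs described in the text."""
    means = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread-out init
    vars_ = np.full(k, np.var(x) + 1e-6)
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        dens = weights / np.sqrt(2 * np.pi * vars_) \
               * np.exp(-0.5 * (x[:, None] - means) ** 2 / vars_)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances
        nk = resp.sum(axis=0)
        weights, means = nk / len(x), (resp * x[:, None]).sum(axis=0) / nk
        vars_ = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, vars_
```

Classification then assigns each frame to whichever trained model (speech or noise) gives the larger likelihood.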
S230: perform noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate.
S240: denoise the noisy speech signal according to the noise power spectrum fusion estimate.
In the speech denoising method provided by this embodiment, the speech features of the noisy speech signal are extracted during speech detection to form an effective speech feature combination, which can distinguish noisy speech frames from non-speech frames more accurately, helping to further improve the denoising effect of the speech denoising scheme and improve voice quality.
Embodiment three
Fig. 3 is a flowchart of the speech denoising method provided in Embodiment 3 of the present invention. On the basis of the above embodiments, this embodiment performs stationary noise suppression, non-speech noise suppression, and non-stationary noise suppression on the noisy speech signal.
Referring to Fig. 3, the speech denoising method includes:
S310: perform speech detection on the noisy speech signal to distinguish speech frames from non-speech frames.
S320: perform noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate.
S330: perform stationary noise suppression, non-speech noise suppression, and non-stationary noise suppression on the noisy speech signal according to the noise power spectrum fusion estimate.
For example, after the noise power spectrum fusion estimate is obtained, the frequency-domain noisy speech signal can be denoised according to the calculated Wiener filter gain function. The above process accomplishes the suppression of stationary noise, non-speech noise, and non-stationary noise. The three kinds of noise can be suppressed step by step: first the stationary noise of the speech signal is suppressed, then the non-speech noise, and finally the non-stationary noise is suppressed using a non-stationary noise suppression factor; the speech signal after all three suppressions is output as the final denoised speech signal. Alternatively, the characteristics of the three kinds of noise can be considered together for one-pass noise suppression, outputting the denoised speech signal.
The speech denoising method provided by this embodiment performs stationary noise suppression, non-speech noise suppression, and non-stationary noise suppression on the noisy speech signal, further improving the denoising effect of the speech denoising scheme: it can suppress not only stationary noise but also non-stationary noise and residual echo, improving voice quality.
Example IV
Fig. 4 is a flowchart of the speech denoising method provided in Embodiment 4 of the present invention. On the basis of the above embodiments, this embodiment generates a fusion noise suppression factor for stationary noise, non-speech noise, and non-stationary noise from the noise power spectrum fusion estimate, and performs stationary noise suppression, non-speech noise suppression, and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
Referring to Fig. 4, the speech denoising method includes:
S410: perform speech detection on the noisy speech signal to distinguish speech frames from non-speech frames.
S420: perform noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate.
S430: generate the fusion noise suppression factor for stationary noise, non-speech noise, and non-stationary noise according to the noise power spectrum fusion estimate.
The fusion noise suppression factor for stationary noise, non-speech noise, and non-stationary noise considers all the noise in the speech signal comprehensively; according to this factor, all the noise in the speech signal can be suppressed in one pass without suppressing the three kinds of noise step by step, improving the processing efficiency of speech denoising. The fusion noise suppression factor can be obtained according to the following formula:
where X(n, k) represents the speech spectrum component of the k-th frequency of the n-th frame, g(n, k) represents the Wiener suppression factor, coeff(n, k) represents the non-speech suppression factor, and θ represents the non-stationary noise suppression factor. θ is time-invariant, generally a constant between 0.001 and 0.1.
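The exact combination formula is not reproduced in the text above, so the sketch below is one plausible reading, stated as an assumption: multiply the Wiener factor g(n, k) and the non-speech factor coeff(n, k), and use θ as a spectral floor so the gain never falls below it.

```python
import numpy as np

def apply_fused_suppression(X, g, coeff, theta=0.01):
    """Combine the Wiener factor g(n, k), the non-speech factor coeff(n, k),
    and the non-stationary constant theta into one suppression gain.
    ASSUMPTION: product of the two gains, floored at theta; the patent's
    exact formula is not available in this text."""
    gain = np.maximum(g * coeff, theta)
    return gain * X
```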
The Wiener suppression factor can be obtained by the following formula:
where γ(n, k) represents the a-posteriori SNR, ξ(n, k) represents the a-priori SNR, max(·) takes the maximum, λlast(n, k) represents the noise power spectrum fusion estimate, and β(n) represents a time-varying factor. In traditional algorithms β is a fixed factor, usually 0.9–0.98; using a time-varying β(n) in this algorithm helps react quickly to changes in the speech signal.
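The role of β can be illustrated with the standard decision-directed a-priori SNR update, which is the usual place such a smoothing factor appears. This is the textbook form, shown as a sketch; the patent's exact formula is not reproduced above.

```python
import numpy as np

def decision_directed_snr(post_snr, prev_gain, prev_post_snr, beta):
    """Decision-directed a-priori SNR update with a (possibly time-varying)
    smoothing factor beta:
    xi = beta * G_prev**2 * gamma_prev + (1 - beta) * max(gamma - 1, 0)."""
    return beta * (prev_gain ** 2) * prev_post_snr \
        + (1.0 - beta) * np.maximum(post_snr - 1.0, 0.0)
```

A larger β gives smoother, more musical-noise-free estimates; letting β(n) drop when the signal changes quickly is what "reacting fast to speech signal changes" amounts to.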
Because of residual echo, the VAD detection may misjudge, so further spectral analysis is needed. Since speech is concentrated mainly in the low-frequency band of 0–4000 Hz and, even under the interference of residual echo, occupies at least 50% of the energy there, the non-speech part (including noise and residual echo) needs to be suppressed.
In this step, the non-speech suppression factor can be obtained by the following formula:
when Ratio ≥ threshold;
otherwise, coeff(n, k) = 1.0
where Ratio is the ratio of the two quantities defined above; low is a preset minimum, an empirical value; and threshold is a set threshold, generally taken as 1.0.
Optionally, the fusion noise suppression factor can also be:
where X(n, k) represents the speech spectrum component of the k-th frequency of the n-th frame, g(n, k) represents the Wiener suppression factor, coeff(n, k) represents the non-speech suppression factor, and θ represents the non-stationary noise suppression factor.
S440: perform stationary noise suppression, non-speech noise suppression, and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
After the fusion noise suppression factor is obtained, the noisy speech signal can be suppressed to obtain the denoised speech signal, according to the above formula (1) or formula (2). Fig. 5a shows the spectrogram of the original noisy speech signal; Fig. 5b shows the spectrogram of the speech signal after denoising by the speech denoising method provided in this embodiment.
Here, output(n, k) represents the denoised speech spectrum component of the k-th frequency of the n-th frame.
Comparison diagram 5a and Fig. 5 b can be seen that the method for the present embodiment can effectively be suppressed to various noise component(s)s, right
The inhibition of nonstationary noise is especially prominent, and after denoising, speech components are effectively maintained.
In the speech denoising method provided by this embodiment, the fusion noise suppression factor for stationary noise, non-speech noise, and non-stationary noise is generated from the noise power spectrum fusion estimate, and the noisy speech signal is denoised according to this factor, so that the speech denoising can be completed in one pass, improving the processing efficiency. The denoising effect of the speech denoising scheme is effectively improved: not only can stationary noise be suppressed, but the non-speech components are analyzed and estimated so that non-speech noise is suppressed; meanwhile, when suppressing non-stationary noise, weak components are attenuated and strong components are amplified, achieving dynamically smoothed non-stationary noise reduction and improving voice quality.
Embodiment five
Fig. 6 is a structural schematic diagram of the speech denoising device provided in Embodiment 5 of the present invention. The speech denoising device includes a speech detection module 610, a noise estimation module 620, and a denoising module 630, each of which is described in detail below.
The speech detection module 610 is configured to perform speech detection on a noisy speech signal to distinguish speech frames from non-speech frames.
The noise estimation module 620 is configured to perform noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate, where the noise power spectrum fusion estimate is a fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate.
The denoising module 630 is configured to denoise the noisy speech signal according to the noise power spectrum fusion estimate.
In the technical solution provided by this embodiment, speech detection is performed on the noisy speech signal to distinguish speech frames from non-speech frames, noise estimation is performed on both kinds of frames to obtain a noise power spectrum fusion estimate, and the noisy speech signal is denoised according to the noise power spectrum fusion estimate. By adopting this solution, the noise value can be estimated not only in non-speech frames, but the noise-component estimate can also be updated in speech frames, and both noise estimation results are combined to denoise the noisy speech signal, which can effectively improve the denoising effect of existing speech denoising schemes and improve speech quality.
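As an illustrative sketch (not part of the patent text itself), the three-module flow above can be wired together at the signal level: frame the noisy signal, compute a per-bin gain from a given noise power spectrum, and resynthesize by overlap-add. The frame length, hop size, Hann window and Wiener-style gain below are all assumed choices, not taken from the patent.

```python
import numpy as np

def denoise_signal(x, noise_psd, frame_len=512, hop=256):
    """Plumbing sketch of 'denoise the noisy speech signal according to
    the fused noise estimate': windowed FFT per frame, a Wiener-style
    gain per frequency bin derived from the supplied noise power
    spectrum, inverse FFT and overlap-add.  All parameters are
    illustrative assumptions."""
    win = np.hanning(frame_len)
    out = np.zeros(len(x))
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * win
        spec = np.fft.rfft(frame)
        psd = np.abs(spec) ** 2
        # a posteriori SNR minus one, floored at zero, then Wiener gain
        snr = np.maximum(psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
        gain = snr / (1.0 + snr)
        out[start:start + frame_len] += np.fft.irfft(gain * spec) * win
    return out
```

Feeding pure noise whose true per-bin power matches `noise_psd` should yield an output with far less energy than the input, since most bins fall at or below the noise floor and receive near-zero gain.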
Optionally, the speech detection module 610 includes:
a speech feature extraction unit, configured to extract speech features of the noisy speech signal, wherein the speech features include Mel-frequency cepstral coefficients (MFCC), the linear predictive coding residual and the spectral centroid (Centroid);
a classification unit, configured to generate a speech model and a noise model by classifier training according to the extracted speech features, so as to distinguish speech frames from non-speech frames.
Optionally, the speech features further include at least one of spectral flatness (Flatness), spectral rolloff (Rolloff) and zero-crossing rate (Zcr).
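For illustration only, the spectral features named above (centroid, flatness, rolloff) plus the zero-crossing rate can be computed for a single frame with a few lines of NumPy. The formulas below are the standard textbook definitions, assumed here rather than quoted from the patent; the MFCC and LPC-residual features are omitted for brevity.

```python
import numpy as np

def frame_features(frame, sr=16000, rolloff_pct=0.85):
    """Standard per-frame spectral features (assumed definitions)."""
    mag = np.abs(np.fft.rfft(frame))                 # magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)
    power = mag ** 2
    total = power.sum() + 1e-12
    centroid = (freqs * power).sum() / total         # spectral centroid (Hz)
    # flatness: geometric mean over arithmetic mean of the magnitudes
    flatness = np.exp(np.mean(np.log(mag + 1e-12))) / (mag.mean() + 1e-12)
    # rolloff: frequency below which rolloff_pct of the power lies
    cum = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cum, rolloff_pct * cum[-1])]
    # zero-crossing rate: fraction of sample pairs with a sign change
    zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    return centroid, flatness, rolloff, zcr
```

A pure 1 kHz tone at a 16 kHz sampling rate should give a centroid and rolloff near 1000 Hz, very low flatness, and a zero-crossing rate near 2·1000/16000 = 0.125.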
Optionally, the noise estimation module 620 includes:
a speech-frame noise estimation module, configured to derive the speech-frame noise power spectrum estimate according to the following formula:
λd(n, k) = Quantiles(X(n, k)²), n = 0, 1, 2, …, M
a non-speech-frame noise estimation module, configured to derive the non-speech-frame noise power spectrum estimate according to the following formula:
λd(n, k) = a·X(n, k)² + (1 − a)·Quantiles(X(n, k)²), n = 0, 1, 2, …, M
a fusion noise estimation module, configured to derive the noise power spectrum fusion estimate according to the following formula:
where M denotes the number of frames, X(n, k) denotes the speech spectral component of the k-th frequency bin of the n-th frame, Quantiles(·) denotes taking a quantile, L denotes the noise smoothing interval length, a denotes a weight coefficient, λd(n, k) denotes the speech-frame noise power spectrum estimate or the non-speech-frame noise power spectrum estimate, and λlast(n, k) denotes the noise power spectrum fusion estimate.
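The two per-frame formulas above can be sketched directly in NumPy. Note two loudly-flagged assumptions: Quantiles() is interpreted here as a running quantile over the frames seen so far, and the final fusion step (a moving average of λd over an L-frame window) is a guess, since the patent's fusion formula is not reproduced in this text.

```python
import numpy as np

def noise_estimates(X_pow, is_speech, a=0.9, q=0.5, L=5):
    """Sketch of the speech-frame and non-speech-frame noise estimates.
    X_pow is X(n, k)^2 with shape (frames, bins).  The fusion rule
    (mean of lambda_d over the last L frames) is an assumption standing
    in for the patent's unreproduced formula."""
    M, K = X_pow.shape
    lam_d = np.empty_like(X_pow)
    for n in range(M):
        quant = np.quantile(X_pow[:n + 1], q, axis=0)   # running quantile
        if is_speech[n]:
            lam_d[n] = quant                             # speech-frame formula
        else:
            lam_d[n] = a * X_pow[n] + (1 - a) * quant    # non-speech-frame formula
    # assumed fusion: smooth lambda_d over an L-frame window
    lam_last = np.empty_like(lam_d)
    for n in range(M):
        lam_last[n] = lam_d[max(0, n - L + 1):n + 1].mean(axis=0)
    return lam_d, lam_last
```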
Optionally, the denoising module 630 includes:
a denoising unit, configured to perform stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the noise power spectrum fusion estimate.
Optionally, the denoising unit is specifically configured to:
generate a fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise according to the noise power spectrum fusion estimate; and
perform stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
Optionally, the fusion noise suppression factor is:
where X(n, k) denotes the speech spectral component of the k-th frequency bin of the n-th frame, g(n, k) denotes the Wiener suppression factor, coeff(n, k) denotes the non-speech suppression factor, and θ denotes the non-stationary noise suppression factor.
Optionally, the fusion noise suppression factor is:
where X(n, k) denotes the speech spectral component of the k-th frequency bin of the n-th frame, g(n, k) denotes the Wiener suppression factor, coeff(n, k) denotes the non-speech suppression factor, and θ denotes the non-stationary noise suppression factor.
Optionally, the Wiener suppression factor is derived by the following formula:
where γ(n, k) denotes the a posteriori SNR, the quantity shown denotes the a priori SNR, max(·) denotes taking the maximum, λlast(n, k) denotes the noise power spectrum fusion estimate, and β(n) denotes a time-varying factor.
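Since the Wiener formula image is not reproduced in this text, the sketch below substitutes the classical decision-directed form as a labeled assumption: γ is the a posteriori SNR X(n, k)²/λlast(n, k), an a priori SNR ξ is smoothed recursively with a factor standing in for β(n), and the gain is ξ/(1+ξ) with a floor. This is a plausible stand-in, not the patent's actual formula.

```python
import numpy as np

def wiener_gain(X_pow, lam_last, beta=0.98, g_floor=0.1):
    """Assumed Wiener suppression factor g(n, k) in decision-directed
    style (a simplified variant; the patent's exact formula is not
    available in this text)."""
    M, K = X_pow.shape
    g = np.empty((M, K))
    xi_prev = np.ones(K)
    for n in range(M):
        gamma = X_pow[n] / np.maximum(lam_last[n], 1e-12)    # a posteriori SNR
        # recursive a priori SNR estimate; beta stands in for beta(n)
        xi = beta * xi_prev + (1 - beta) * np.maximum(gamma - 1.0, 0.0)
        g[n] = np.maximum(xi / (1.0 + xi), g_floor)          # floored Wiener gain
        xi_prev = xi
    return g
```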
Optionally, the non-speech suppression factor is derived by the following formula:
when Ratio >= threshold,
otherwise, coeff(n, k) = 1.0,
where Ratio denotes the ratio of the two quantities shown, low denotes a preset minimum value, and threshold denotes a set threshold value.
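The piecewise rule above can be sketched as follows. Two pieces are assumptions: which two quantities Ratio compares is not reproduced in this text, and the value taken in the Ratio >= threshold branch (here the preset minimum `low`) is a guess consistent with `low` being described as a preset minimum.

```python
import numpy as np

def nonspeech_factor(ratio, low=0.1, threshold=2.0):
    """Assumed sketch of the non-speech suppression factor coeff(n, k):
    attenuate to the preset minimum when the ratio exceeds the
    threshold, otherwise pass the bin through unchanged."""
    ratio = np.asarray(ratio, dtype=float)
    return np.where(ratio >= threshold, low, 1.0)
```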
The speech denoising device provided by the embodiment of the present invention can perform the speech denoising method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for performing the method.
Obviously, those skilled in the art will understand that each module or each step of the present invention described above may be implemented by the above-described segmentation server and management server. Optionally, the embodiments of the present invention may be realized by a program executable by a computer apparatus, so that the program can be stored in a storage device and executed by a processor; the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. Alternatively, the modules may each be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments and may also include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (15)
1. A method of speech denoising, characterized by comprising:
performing speech detection on a noisy speech signal to distinguish speech frames from non-speech frames;
performing noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate, wherein the noise power spectrum fusion estimate is a fusion of a speech-frame noise power spectrum estimate and a non-speech-frame noise power spectrum estimate; and
performing denoising on the noisy speech signal according to the noise power spectrum fusion estimate.
2. The method according to claim 1, wherein performing speech detection on the noisy speech signal to distinguish speech frames from non-speech frames comprises:
extracting speech features of the noisy speech signal, wherein the speech features include Mel-frequency cepstral coefficients (MFCC), a linear predictive coding residual and a spectral centroid (Centroid); and
generating a speech model and a noise model by classifier training according to the extracted speech features, so as to distinguish speech frames from non-speech frames.
3. The method according to claim 2, wherein the speech features further include at least one of spectral flatness (Flatness), spectral rolloff (Rolloff) and zero-crossing rate (Zcr).
4. The method according to claim 1, wherein performing noise estimation on the speech frames and the non-speech frames respectively to obtain the noise power spectrum fusion estimate comprises:
deriving the speech-frame noise power spectrum estimate according to the following formula:
λd(n, k) = Quantiles(X(n, k)²), n = 0, 1, 2, …, M
deriving the non-speech-frame noise power spectrum estimate according to the following formula:
λd(n, k) = a·X(n, k)² + (1 − a)·Quantiles(X(n, k)²), n = 0, 1, 2, …, M
deriving the noise power spectrum fusion estimate according to the following formula:
where M denotes the number of frames, X(n, k) denotes the speech spectral component of the k-th frequency bin of the n-th frame, Quantiles(·) denotes taking a quantile, L denotes the noise smoothing interval length, a denotes a weight coefficient, λd(n, k) denotes the speech-frame noise power spectrum estimate or the non-speech-frame noise power spectrum estimate, and λlast(n, k) denotes the noise power spectrum fusion estimate.
5. The method according to claim 1, wherein performing denoising on the noisy speech signal according to the noise power spectrum fusion estimate comprises:
performing stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the noise power spectrum fusion estimate.
6. The method according to claim 5, wherein performing stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the noise power spectrum fusion estimate comprises:
generating a fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise according to the noise power spectrum fusion estimate; and
performing stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
7. The method according to claim 6, wherein the fusion noise suppression factor is:
where X(n, k) denotes the speech spectral component of the k-th frequency bin of the n-th frame, g(n, k) denotes the Wiener suppression factor, coeff(n, k) denotes the non-speech suppression factor, and θ denotes the non-stationary noise suppression factor.
8. The method according to claim 6, wherein the fusion noise suppression factor is:
where X(n, k) denotes the speech spectral component of the k-th frequency bin of the n-th frame, g(n, k) denotes the Wiener suppression factor, coeff(n, k) denotes the non-speech suppression factor, and θ denotes the non-stationary noise suppression factor.
9. The method according to claim 7 or 8, wherein the Wiener suppression factor is derived by the following formula:
where γ(n, k) denotes the a posteriori SNR, the quantity shown denotes the a priori SNR, max(·) denotes taking the maximum, λlast(n, k) denotes the noise power spectrum fusion estimate, and β(n) denotes a time-varying factor.
10. The method according to claim 7 or 8, wherein the non-speech suppression factor is derived by the following formula:
when Ratio >= threshold,
otherwise, coeff(n, k) = 1.0,
where Ratio denotes the ratio of the two quantities shown, low denotes a preset minimum value, and threshold denotes a set threshold value.
11. A speech denoising device, characterized by comprising:
a speech detection module, configured to perform speech detection on a noisy speech signal to distinguish speech frames from non-speech frames;
a noise estimation module, configured to perform noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate, wherein the noise power spectrum fusion estimate is a fusion of a speech-frame noise power spectrum estimate and a non-speech-frame noise power spectrum estimate; and
a denoising module, configured to perform denoising on the noisy speech signal according to the noise power spectrum fusion estimate.
12. The device according to claim 11, wherein the denoising module comprises:
a denoising unit, configured to perform stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the noise power spectrum fusion estimate.
13. The device according to claim 12, wherein the denoising unit is specifically configured to:
generate a fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise according to the noise power spectrum fusion estimate; and
perform stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
14. The device according to claim 13, wherein the fusion noise suppression factor is:
where X(n, k) denotes the speech spectral component of the k-th frequency bin of the n-th frame, g(n, k) denotes the Wiener suppression factor, coeff(n, k) denotes the non-speech suppression factor, and θ denotes the non-stationary noise suppression factor.
15. The device according to claim 13, wherein the fusion noise suppression factor is:
where X(n, k) denotes the speech spectral component of the k-th frequency bin of the n-th frame, g(n, k) denotes the Wiener suppression factor, coeff(n, k) denotes the non-speech suppression factor, and θ denotes the non-stationary noise suppression factor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610898662.4A CN106486131B (en) | 2016-10-14 | 2016-10-14 | A kind of method and device of speech de-noising |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106486131A true CN106486131A (en) | 2017-03-08 |
CN106486131B CN106486131B (en) | 2019-10-11 |
Family
ID=58269971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610898662.4A Active CN106486131B (en) | 2016-10-14 | 2016-10-14 | A kind of method and device of speech de-noising |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106486131B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5706395A (en) * | 1995-04-19 | 1998-01-06 | Texas Instruments Incorporated | Adaptive weiner filtering using a dynamic suppression factor |
CN103295580A (en) * | 2013-05-13 | 2013-09-11 | 北京百度网讯科技有限公司 | Method and device for suppressing noise of voice signals |
CN103730126A (en) * | 2012-10-16 | 2014-04-16 | 联芯科技有限公司 | Noise suppression method and noise suppressor |
CN104867497A (en) * | 2014-02-26 | 2015-08-26 | 北京信威通信技术股份有限公司 | Voice noise-reducing method |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108122561A (en) * | 2017-12-19 | 2018-06-05 | 广东小天才科技有限公司 | A kind of spoken voice assessment method and electronic equipment based on electronic equipment |
CN109979476B (en) * | 2017-12-28 | 2021-05-14 | 电信科学技术研究院 | Method and device for removing reverberation of voice |
CN109979476A (en) * | 2017-12-28 | 2019-07-05 | 电信科学技术研究院 | A kind of method and device of speech dereverbcration |
CN108648765A (en) * | 2018-04-27 | 2018-10-12 | 海信集团有限公司 | A kind of method, apparatus and terminal of voice abnormality detection |
CN108648765B (en) * | 2018-04-27 | 2020-09-25 | 海信集团有限公司 | Method, device and terminal for detecting abnormal voice |
CN108847249A (en) * | 2018-05-30 | 2018-11-20 | 苏州思必驰信息科技有限公司 | Sound converts optimization method and system |
CN108847249B (en) * | 2018-05-30 | 2020-06-05 | 苏州思必驰信息科技有限公司 | Sound conversion optimization method and system |
CN108735229A (en) * | 2018-06-12 | 2018-11-02 | 华南理工大学 | A kind of amplitude based on noise Ratio Weighted and phase combining compensation anti-noise sound enhancement method and realization device |
CN108922556B (en) * | 2018-07-16 | 2019-08-27 | 百度在线网络技术(北京)有限公司 | Sound processing method, device and equipment |
CN108922556A (en) * | 2018-07-16 | 2018-11-30 | 百度在线网络技术(北京)有限公司 | sound processing method, device and equipment |
CN109189975A (en) * | 2018-09-06 | 2019-01-11 | 深圳市三宝创新智能有限公司 | A kind of method for playing music, device, computer equipment and readable storage medium storing program for executing |
CN109189975B (en) * | 2018-09-06 | 2021-12-24 | 深圳市三宝创新智能有限公司 | Music playing method and device, computer equipment and readable storage medium |
CN109616133A (en) * | 2018-09-28 | 2019-04-12 | 广州智伴人工智能科技有限公司 | A kind of environmental noise removal system |
CN109616133B (en) * | 2018-09-28 | 2021-11-30 | 广州智伴人工智能科技有限公司 | Environmental noise removing system |
CN109087657A (en) * | 2018-10-17 | 2018-12-25 | 成都天奥信息科技有限公司 | A kind of sound enhancement method applied to ultrashort wave radio set |
CN111261183A (en) * | 2018-12-03 | 2020-06-09 | 珠海格力电器股份有限公司 | Method and device for denoising voice |
CN111261183B (en) * | 2018-12-03 | 2022-11-22 | 珠海格力电器股份有限公司 | Method and device for denoising voice |
CN109829035A (en) * | 2018-12-19 | 2019-05-31 | 平安国际融资租赁有限公司 | Process searching method, device, computer equipment and storage medium |
CN110277087A (en) * | 2019-07-03 | 2019-09-24 | 四川大学 | A kind of broadcast singal anticipation preprocess method |
CN110277087B (en) * | 2019-07-03 | 2021-04-23 | 四川大学 | Pre-judging preprocessing method for broadcast signals |
CN110910906A (en) * | 2019-11-12 | 2020-03-24 | 国网山东省电力公司临沂供电公司 | Audio endpoint detection and noise reduction method based on power intranet |
CN111429930A (en) * | 2020-03-16 | 2020-07-17 | 云知声智能科技股份有限公司 | Noise reduction model processing method and system based on adaptive sampling rate |
CN111429930B (en) * | 2020-03-16 | 2023-02-28 | 云知声智能科技股份有限公司 | Noise reduction model processing method and system based on adaptive sampling rate |
CN111429932A (en) * | 2020-06-10 | 2020-07-17 | 浙江远传信息技术股份有限公司 | Voice noise reduction method, device, equipment and medium |
CN111986686A (en) * | 2020-07-09 | 2020-11-24 | 厦门快商通科技股份有限公司 | Short-time speech signal-to-noise ratio estimation method, device, equipment and storage medium |
CN111986686B (en) * | 2020-07-09 | 2023-01-03 | 厦门快商通科技股份有限公司 | Short-time speech signal-to-noise ratio estimation method, device, equipment and storage medium |
CN112002339B (en) * | 2020-07-22 | 2024-01-26 | 海尔优家智能科技(北京)有限公司 | Speech noise reduction method and device, computer-readable storage medium and electronic device |
CN112002339A (en) * | 2020-07-22 | 2020-11-27 | 海尔优家智能科技(北京)有限公司 | Voice noise reduction method and device, computer-readable storage medium and electronic device |
CN111986691B (en) * | 2020-09-04 | 2024-02-02 | 腾讯科技(深圳)有限公司 | Audio processing method, device, computer equipment and storage medium |
CN111986691A (en) * | 2020-09-04 | 2020-11-24 | 腾讯科技(深圳)有限公司 | Audio processing method and device, computer equipment and storage medium |
CN112053702B (en) * | 2020-09-30 | 2024-03-19 | 北京大米科技有限公司 | Voice processing method and device and electronic equipment |
CN112053702A (en) * | 2020-09-30 | 2020-12-08 | 北京大米科技有限公司 | Voice processing method and device and electronic equipment |
WO2022160715A1 (en) * | 2021-01-29 | 2022-08-04 | 北京达佳互联信息技术有限公司 | Voice signal processing method and electronic device |
CN112992190A (en) * | 2021-02-02 | 2021-06-18 | 北京字跳网络技术有限公司 | Audio signal processing method and device, electronic equipment and storage medium |
CN113724720B (en) * | 2021-07-19 | 2023-07-11 | 电信科学技术第五研究所有限公司 | Non-human voice filtering method based on neural network and MFCC (multiple frequency component carrier) in noisy environment |
CN113724720A (en) * | 2021-07-19 | 2021-11-30 | 电信科学技术第五研究所有限公司 | Non-human voice filtering method in noisy environment based on neural network and MFCC |
CN113593599A (en) * | 2021-09-02 | 2021-11-02 | 北京云蝶智学科技有限公司 | Method for removing noise signal in voice signal |
CN113808608B (en) * | 2021-09-17 | 2023-07-25 | 随锐科技集团股份有限公司 | Method and device for suppressing mono noise based on time-frequency masking smoothing strategy |
CN113808608A (en) * | 2021-09-17 | 2021-12-17 | 随锐科技集团股份有限公司 | Single sound channel noise suppression method and device based on time-frequency masking smoothing strategy |
Also Published As
Publication number | Publication date |
---|---|
CN106486131B (en) | 2019-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106486131B (en) | A kind of method and device of speech de-noising | |
Srinivasan et al. | Binary and ratio time-frequency masks for robust speech recognition | |
US7133826B2 (en) | Method and apparatus using spectral addition for speaker recognition | |
Cui et al. | Noise robust speech recognition using feature compensation based on polynomial regression of utterance SNR | |
CN109256138B (en) | Identity verification method, terminal device and computer readable storage medium | |
Yadav et al. | Addressing noise and pitch sensitivity of speech recognition system through variational mode decomposition based spectral smoothing | |
US20210193149A1 (en) | Method, apparatus and device for voiceprint recognition, and medium | |
Talmon et al. | Single-channel transient interference suppression with diffusion maps | |
Hansen et al. | Speech enhancement based on generalized minimum mean square error estimators and masking properties of the auditory system | |
US20110218803A1 (en) | Method and system for assessing intelligibility of speech represented by a speech signal | |
González et al. | MMSE-based missing-feature reconstruction with temporal modeling for robust speech recognition | |
Venturini et al. | On speech features fusion, α-integration Gaussian modeling and multi-style training for noise robust speaker classification | |
Nathwani et al. | An extended experimental investigation of DNN uncertainty propagation for noise robust ASR | |
Schmidt et al. | Reduction of non-stationary noise using a non-negative latent variable decomposition | |
Wang et al. | IRM estimation based on data field of cochleagram for speech enhancement | |
Do et al. | A novel framework for noise robust ASR using cochlear implant-like spectrally reduced speech | |
Tian et al. | Spoofing detection under noisy conditions: a preliminary investigation and an initial database | |
Han et al. | Reverberation and noise robust feature compensation based on IMM | |
Herrera-Camacho et al. | Design and testing of a corpus for forensic speaker recognition using MFCC, GMM and MLE | |
Ou et al. | Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis | |
Abka et al. | Speech recognition features: Comparison studies on robustness against environmental distortions | |
CN114302301A (en) | Frequency response correction method and related product | |
Chen et al. | Robust speech recognition using spatial–temporal feature distribution characteristics | |
Shome et al. | Non-negative frequency-weighted energy-based speech quality estimation for different modes and quality of speech | |
Kammee et al. | Sound Identification using MFCC with Machine Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20200820 Address after: No.259 Nanjing West Road, Tangqiao town, Zhangjiagang City, Suzhou City, Jiangsu Province Patentee after: Suzhou Qianwen wandaba Education Technology Co., Ltd Address before: Yangpu District State Road 200433 Shanghai City No. 200 Building 5 room 2002 Patentee before: SHANGHAI QIANWENWANDABA CLOUD TECH. Co.,Ltd. |