CN106486131A - A method and device for speech denoising - Google Patents


Info

Publication number
CN106486131A
CN106486131A (application CN201610898662.4A)
Authority
CN
China
Prior art keywords
noise
speech
estimate
power spectrum
frame
Prior art date
Legal status
Granted
Application number
CN201610898662.4A
Other languages
Chinese (zh)
Other versions
CN106486131B (en)
Inventor
吴威麒
张凯磊
Current Assignee
Suzhou Qianwen wandaba Education Technology Co., Ltd
Original Assignee
Shanghai Qian Wan Answer Cloud Computing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Qian Wan Answer Cloud Computing Technology Co Ltd
Priority to CN201610898662.4A
Publication of CN106486131A
Application granted
Publication of CN106486131B
Legal status: Active


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals

Abstract

The embodiments of the present invention disclose a method and device for speech denoising. The method includes: performing speech detection on a noisy speech signal to distinguish speech frames from non-speech frames; performing noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate, where the noise power spectrum fusion estimate is a fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate; and denoising the noisy speech signal according to the noise power spectrum fusion estimate. By adopting the above technical scheme, the embodiments of the present invention perform noise estimation on both speech frames and non-speech frames, and denoise the noisy speech signal by combining both noise estimation results, which can effectively improve the denoising effect of existing speech denoising schemes and improve speech quality.

Description

A method and device for speech denoising
Technical field
The embodiments of the present invention relate to speech processing technology, and in particular to a method and device for speech denoising.
Background art
During real-time speech communication, various noise interference problems arise, and the speech noise problem is especially prominent for mobile devices such as mobile phones. In addition, when sound is played through a loudspeaker, an echo problem arises; the recording is then effectively far-field, and in this case the speech quality is highly susceptible to external environmental noise and nonlinear residual echo.
To improve voice communication quality, the speech needs to be denoised to improve its clarity. Traditional speech denoising algorithms usually assume that the noise is additive and stationary, and use voice activity detection (VAD) technology to divide the noisy speech into a speech part and a non-speech part (i.e., silent segments). The non-speech part mainly exhibits the noise characteristics, and an approximate estimate of the background noise characteristics can be obtained by processing the non-speech part with certain statistical methods. However, the noise in the speech part may differ from the noise in the non-speech part, especially when the signal is affected by residual echo (a multiplicative noise); denoising the whole speech signal based only on the noise estimate from the non-speech part gives poor results.
Summary of the invention
The embodiments of the present invention provide a method and device for speech denoising, so as to improve the denoising effect of existing speech denoising schemes.
In a first aspect, an embodiment of the present invention provides a method for speech denoising, the method including:
performing speech detection on a noisy speech signal to distinguish speech frames from non-speech frames;
performing noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate, where the noise power spectrum fusion estimate is a fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate; and
denoising the noisy speech signal according to the noise power spectrum fusion estimate.
In a second aspect, an embodiment of the present invention further provides a device for speech denoising, the device including:
a speech detection module, configured to perform speech detection on a noisy speech signal to distinguish speech frames from non-speech frames;
a noise estimation module, configured to perform noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate, where the noise power spectrum fusion estimate is a fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate; and
a denoising module, configured to denoise the noisy speech signal according to the noise power spectrum fusion estimate.
The embodiments of the present invention provide a method and device for speech denoising: speech detection is performed on a noisy speech signal to distinguish speech frames from non-speech frames; noise estimation is performed on both to obtain a noise power spectrum fusion estimate; and the noisy speech signal is denoised according to the fusion estimate. By adopting this scheme, noise estimation is performed on both speech frames and non-speech frames, and both estimation results are combined to denoise the noisy speech signal, which can effectively improve the denoising effect of existing speech denoising schemes and improve speech quality.
Description of the drawings
Fig. 1 is a flowchart of the speech denoising method provided in Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the speech denoising method provided in Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the speech denoising method provided in Embodiment 3 of the present invention;
Fig. 4 is a flowchart of the speech denoising method provided in Embodiment 4 of the present invention;
Fig. 5a is a spectrogram of the original noisy speech signal provided in Embodiment 4 of the present invention;
Fig. 5b is a spectrogram of the denoised speech signal provided in Embodiment 4 of the present invention;
Fig. 6 is a structural diagram of the speech denoising device provided in Embodiment 5 of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of the speech denoising method provided in Embodiment 1 of the present invention. This embodiment is applicable to speech denoising. The method can be executed by a speech denoising device, which can be implemented in software and/or hardware and integrated into any intelligent terminal that provides a speech denoising function. In a specific implementation, the intelligent terminal may include mobile terminals such as tablet computers, mobile phones and e-readers; the above terminals are only examples, not exhaustive.
Referring to Fig. 1, the speech denoising method includes:
S110: performing speech detection on a noisy speech signal to distinguish speech frames from non-speech frames.
The speech signal received by the intelligent terminal is a non-stationary, time-varying noisy speech signal formed after environmental interference. After receiving the non-stationary, time-varying speech signal, the intelligent terminal first samples the time-domain noisy speech signal, converting the analog signal into a digital signal. Typically, the sampling frequency of the time-domain noisy speech signal is 44100 Hz, i.e., 44100 samples are obtained per second. The sampled time-domain noisy speech signal is then windowed and divided into frames, so that each frame of the time-domain noisy speech signal is stationary. Preferably, commonly used window functions in speech processing include the rectangular window, the Hanning window and the Hamming window. The windowed and framed time-domain noisy speech signal is Fourier-transformed into frequency-domain noisy speech frames. Sampling, framing and the Fourier transform are common technical means for those skilled in the art and, for brevity, are not described here.
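A minimal sketch of the sampling-to-spectrum front end described above, in Python with NumPy. The 1024-sample frame length, 512-sample hop and Hanning window are illustrative assumptions; the text fixes only the 44100 Hz sampling rate.

```python
import numpy as np

def frame_and_fft(x, frame_len=1024, hop=512):
    """Window the sampled time-domain signal into overlapping frames
    (Hanning window here) and Fourier-transform each frame into a
    frequency-domain noisy-speech frame."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)   # shape (n_frames, frame_len // 2 + 1)

fs = 44100                                # sampling rate given in the text
t = np.arange(fs) / fs                    # one second of signal
x = np.sin(2 * np.pi * 440.0 * t)         # a 440 Hz tone as stand-in "speech"
X = frame_and_fft(x)                      # frequency-domain frames
```

Each row of `X` is one frequency-domain noisy-speech frame on which the detection and noise-estimation steps below operate.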
Speech detection is then performed frame by frame on the frequency-domain noisy speech frames to distinguish speech frames from non-speech frames. Speech detection can be regarded as feature extraction based on speech characteristic parameters, where the speech characteristic parameters effectively represent speech features; with good discriminability, speech and non-speech can be distinguished efficiently from these features. In this embodiment, VAD technology can be used for speech detection. Generally, the Mel-frequency cepstrum coefficients (MFCC) among the extracted frequency-domain characteristic parameters of the speech signal can be used to distinguish speech frames from non-speech frames.
S120: performing noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate.
Exemplarily, after speech detection (VAD), noise estimation can be performed on each frame one by one. If the current frame is determined to be a speech frame, noise estimation is performed on it according to the noise estimation mode for speech frames to obtain the speech-frame noise power estimate. For example, noise estimation can be performed using a minimum-tracking algorithm or a quantile noise estimation method. Preferably, quantile noise estimation is used: over a period of time, the quantile of the noisy speech signal in a narrow frequency band is regarded as the noise power estimate (which can also be understood as the noise energy) of the current band. Specifically, the speech-frame noise power spectrum estimate can be obtained according to the following formula:
λd(n, k) = Quantiles(X(n, k)²), n = 0, 1, 2, ..., M
where M denotes the number of frames; X(n, k) denotes the speech spectrum component of the k-th frequency bin of the n-th frame; and Quantiles(·) denotes taking the quantile, typically 0.25 or 0.5. Note that M denotes the total number of frames obtained after framing the noisy speech signal; in the above formula n denotes the frame index of a speech frame, and the specific values of n are determined by the speech detection results.
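The quantile estimate above can be sketched directly: for each frequency bin, take the quantile of the noisy power spectrum along the frame axis.

```python
import numpy as np

def quantile_noise_psd(power_spec, q=0.25):
    """Per-bin quantile noise estimate: for each frequency bin, the
    q-quantile of the noisy power spectrum over a stretch of frames is
    taken as the noise power of that bin (q = 0.25 or 0.5 in the text).
    power_spec has shape (frames, bins)."""
    return np.quantile(power_spec, q, axis=0)

rng = np.random.default_rng(0)
power_spec = rng.normal(size=(200, 8)) ** 2   # synthetic noisy power spectrum
lam_d = quantile_noise_psd(power_spec, q=0.5)
```

Because low quantiles are dominated by frames where speech is weak or absent in a given bin, the estimate tracks the noise floor even inside speech frames, which is what motivates its use here.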
If the current frame is determined to be a non-speech frame, noise estimation is performed on it according to the noise estimation mode for non-speech frames to obtain the non-speech-frame noise power estimate. The non-speech-frame noise power spectrum estimate can be obtained according to the following formula:
λd(n, k) = a·X(n, k)² + (1 - a)·Quantiles(X(n, k)²), n = 0, 1, 2, ..., M
Note that in this formula n denotes the frame index of a non-speech frame, and the specific values of n are determined by the speech detection results.
The noise power spectrum fusion estimate is the fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate; the fusion estimate λlast(n, k) is obtained by fusing λd(n, k) over a noise smoothing interval, where L denotes the length of the noise smoothing interval, for which 9 frames can be used; a denotes the weight coefficient, preferably 0.8; λd(n, k) denotes the speech-frame or non-speech-frame noise power spectrum estimate, as determined by the value of n; and λlast(n, k) denotes the noise power spectrum fusion estimate.
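The exact fusion formula is not reproduced in this text, so the sketch below is a hypothetical stand-in under stated assumptions: the per-frame estimates λd (speech-frame or non-speech-frame, depending on n) are combined with weight a on the current frame and (1 - a) on the mean of the last L frames.

```python
import numpy as np

def fuse_noise_psd(lam_d, L=9, a=0.8):
    """Hypothetical fusion step (the text's own formula is not available):
    weight a on the current per-frame estimate, (1 - a) on the mean of the
    last L frames. lam_d has shape (frames, bins)."""
    lam_last = np.empty_like(lam_d)
    for n in range(lam_d.shape[0]):
        window = lam_d[max(0, n - L + 1):n + 1]   # last L frames, inclusive
        lam_last[n] = a * lam_d[n] + (1 - a) * window.mean(axis=0)
    return lam_last

lam_last = fuse_noise_psd(np.ones((20, 4)))
```

Any smoothing of this shape gives a per-frame, per-bin noise track that updates in both speech and non-speech frames, which is the property the method relies on.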
S130: denoising the noisy speech signal according to the noise power spectrum fusion estimate.
Exemplarily, after the noise power spectrum fusion estimate is obtained, the a priori signal-to-noise ratio (SNR) is estimated using the decision-directed method. The method for estimating the a priori SNR is not limited to the decision-directed method; other applicable algorithms, such as causal or non-causal algorithms, can also be used. In this embodiment, the a priori SNR can be calculated from the noise power spectrum fusion estimate, and the Wiener filtering gain function can be obtained accordingly from the a priori SNR. After the Wiener filtering gain function is obtained, Wiener filtering is applied to the frequency-domain noisy speech signal to obtain the frequency-domain denoised speech signal. The estimation of the a priori SNR and the calculation of the Wiener filtering gain function can follow existing calculation methods and are not described here.
Further, an inverse Fourier transform can be applied to the frequency-domain denoised speech signal, and the final output speech can be synthesized by the overlap-add method, completing the whole speech denoising process. The method for synthesizing the final output speech is not limited to overlap-add; alternatives are not enumerated here.
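The gain-then-resynthesize step can be sketched as follows. The per-bin gain g = max(1 - noise/|X|², gmin) used here is a simple stand-in for the gain derived from the a priori SNR; the hop size and gain floor gmin are assumptions.

```python
import numpy as np

def overlap_add(frames_td, hop=512):
    """Overlap-add synthesis of time-domain frames into one output signal."""
    n_frames, frame_len = frames_td.shape
    out = np.zeros((n_frames - 1) * hop + frame_len)
    for i, f in enumerate(frames_td):
        out[i * hop:i * hop + frame_len] += f
    return out

def wiener_denoise(spec, noise_psd, hop=512, gmin=0.1):
    """Apply a per-bin Wiener-type gain, inverse-transform each frame,
    and resynthesize by overlap-add."""
    power = np.abs(spec) ** 2 + 1e-12
    gain = np.maximum(1.0 - noise_psd / power, gmin)
    frames_td = np.fft.irfft(gain * spec, axis=1)
    return overlap_add(frames_td, hop)

frames_td = np.arange(24, dtype=float).reshape(3, 8)
spec = np.fft.rfft(frames_td, axis=1)
y = wiener_denoise(spec, np.zeros(5), hop=4)   # zero noise: pass-through
```

With a zero noise estimate the gain is 1 everywhere, so the output is just the overlap-added frames; with a real noise track the gain attenuates bins where the noise power approaches the observed power.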
In the technical scheme provided by this embodiment, speech detection is performed on the noisy speech signal to distinguish speech frames from non-speech frames; noise estimation is performed on both to obtain a noise power spectrum fusion estimate; and the noisy speech signal is denoised according to the fusion estimate. With this scheme, the noise value can be estimated not only in non-speech frames, but the estimate of the noise component can also be updated in speech frames, and both noise estimation results are combined to denoise the noisy speech signal, which can effectively improve the denoising effect of existing speech denoising schemes and improve speech quality.
The speech denoising method of the embodiments of the present invention is applicable to denoising the speech signal during real-time voice network communication; compared with existing speech denoising schemes, the effect is especially prominent in application scenarios such as real-time voice chat and online tutoring. Compared with conventional telephone calls and social-network calls, speech denoising during online tutoring is more difficult, because students, wanting to watch the mobile phone or computer screen in real time, are unwilling to hold the phone against the ear as in a traditional call, and seldom wear earphones during online tutoring. In this situation the sound is played through a loudspeaker, the echo problem is more prominent, and the recording is effectively far-field, so speech quality is highly susceptible to external environmental noise and nonlinear residual echo. For speech denoising in application scenarios such as real-time voice chat and online tutoring, the method of the embodiments of the present invention can suppress not only the noise in the non-speech part but also the noise in the speech part; the suppression of residual echo is especially obvious, which can effectively improve the denoising effect of existing speech denoising schemes and improve speech quality.
Embodiment 2
Fig. 2 is a flowchart of the speech denoising method provided in Embodiment 2 of the present invention. On the basis of the above embodiment, this embodiment proposes an effective speech feature combination for the speech detection of the noisy speech signal, which can distinguish noisy speech frames from non-speech frames more accurately.
Referring to Fig. 2, the speech denoising method includes:
S210: extracting speech features of the noisy speech signal.
The extracted noisy speech features include the Mel cepstrum coefficients (MFCC), the linear predictive coding residual, and the spectral centroid (Centroid). Humans perceive speech at different frequencies differently: below 1 kHz, perception is linear in frequency; above 1 kHz, it becomes logarithmic in frequency. The higher the frequency, the poorer the perception. In applications, usually only the low-frequency MFCCs are used and the mid-to-high-frequency MFCCs are discarded. In the Mel frequency domain, human perception of pitch is linear: if the Mel frequencies of two speech segments differ by a factor of two, they are also perceived as differing by a factor of two. The Mel frequency domain simulates the human ear's perceptual ability at different frequencies; to simulate this characteristic of the human ear, the ordinary frequency scale is transformed into the Mel frequency scale according to the following formula:
fmel = 2595·lg(1 + f/700)
where f is the actual linear frequency and fmel is the Mel frequency.
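The Mel conversion and its inverse can be written directly:

```python
import math

def hz_to_mel(f):
    """Standard Mel mapping: roughly linear below 1 kHz, logarithmic above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

For example, 1000 Hz maps to roughly 1000 Mel, while 8000 Hz maps to only about 2840 Mel, reflecting the compressed perception of high frequencies.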
After the windowed and framed time-domain noisy speech signal has been Fourier-transformed into frequency-domain noisy speech frames, the power spectrum of the speech signal is filtered by a group of M triangular bandpass filters distributed linearly on the Mel frequency scale. The range covered by each triangular bandpass filter approximates a critical bandwidth of the human ear, thereby simulating the masking effect of the human ear. The response of the triangular bandpass filter bank distributed linearly on the Mel frequency scale is:
Hm(k) = 0, for k < f(m-1)
Hm(k) = (k - f(m-1)) / (f(m) - f(m-1)), for f(m-1) <= k <= f(m)
Hm(k) = (f(m+1) - k) / (f(m+1) - f(m)), for f(m) < k <= f(m+1)
Hm(k) = 0, for k > f(m+1)
where Hm(k) denotes the triangular bandpass filter bank distributed linearly on the Mel frequency scale, k denotes the k-th frequency bin, 1 <= m <= M, M typically takes 40, and f(m) denotes the center frequency of the m-th triangular filter.
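A sketch of building such a filter bank; mapping the band edges from 0 to fs/2 on the Mel scale is an assumption, as is the exact bin rounding.

```python
import numpy as np

def mel_filterbank(n_filters=40, n_fft=1024, fs=44100):
    """Build M triangular bandpass filters whose center frequencies are
    equally spaced on the Mel scale. Returns an
    (n_filters, n_fft // 2 + 1) weight matrix."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, ctr, hi = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        for k in range(lo, ctr):                     # rising edge
            fb[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):                     # falling edge
            fb[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    return fb

fb = mel_filterbank()
```

Multiplying this matrix by a frame's power spectrum gives the M filter-bank outputs P(m) used in the next step.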
The logarithm of the outputs P(m) of the triangular bandpass filter bank is taken according to the following formula, giving a result similar to a homomorphic transformation:
X(m) = log(P(m))
An N-th order discrete cosine transform (Discrete Cosine Transformation, DCT) is applied to X(m) to remove the correlation between the dimensions of the speech signal and map it to a lower-dimensional space, yielding the standard MFCC parameters according to the following formula:
Xk = Σ X(m)·cos(π·k·(m - 0.5)/M), m = 1, 2, ..., M; k = 1, 2, ..., N
where N typically takes 13 and Xk denotes the standard MFCC parameters.
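The DCT step above can be sketched as an explicit sum (the DCT-II form is assumed here):

```python
import numpy as np

def mfcc_from_log_energies(X_m, n_coeffs=13):
    """DCT of the log filter-bank energies X(m), m = 1..M, yielding the
    first N cepstral coefficients."""
    M = len(X_m)
    m = np.arange(M)
    return np.array([np.sum(X_m * np.cos(np.pi * k * (m + 0.5) / M))
                     for k in range(1, n_coeffs + 1)])

coeffs = mfcc_from_log_energies(np.ones(40))
```

A flat log-energy vector has no spectral shape, so all coefficients with k >= 1 come out zero; shaped spectra produce nonzero, largely decorrelated coefficients.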
The linear predictive coding (Linear Prediction Coding, LPC) residual belongs to the excitation source information and reflects the periodicity of vocal-cord vibration, a characteristic that noise does not have. The speech value X(n) at the n-th sample point can be approximated by a linear combination of the preceding p sample points, according to the following formula:
X(n) ≈ a1·X(n-1) + a2·X(n-2) + ... + ap·X(n-p)
where a1, a2, ..., ap are the p-th order LPC coefficients, obtained by minimizing the sum of squared errors of the model; the difference between the original frame speech data and the prediction X(n) gives the LPC residual.
The spectral centroid (Centroid) reflects the center of mass of the spectrum; the centroid of speech regions lies closer to 800-4000 Hz, while the centroid distribution of noise is significantly different. The spectral centroid can be obtained according to the following formula:
Centroid = Σk f(k)·|X(k)| / Σk |X(k)|
where f(k) denotes the frequency of the k-th bin and |X(k)| its spectral magnitude.
Preferably, the speech features further include at least one of the spectral flatness (Flatness), the spectral rolloff (Rolloff) and the spectral disturbance degree (Zcr).
The spectral flatness can be obtained as the ratio of the geometric mean to the arithmetic mean of the power spectrum:
Flatness = (Πk |X(k)|²)^(1/K) / ((1/K)·Σk |X(k)|²)
where K is the number of frequency bins. The spectral rolloff (Rolloff) is a measure of spectral shape; it describes the position below which 85% of the spectral energy lies:
Rolloff = min{ K0 : Σ(k<=K0) |X(k)|² >= 0.85·Σk |X(k)|² }
The spectral disturbance degree (Zcr) reflects the degree of fluctuation and disorder of the speech spectrum.
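The spectral-shape features above can be sketched for one frame; the standard centroid, flatness and rolloff forms are assumed to correspond to the named features.

```python
import numpy as np

def spectral_features(mag, freqs, rolloff_frac=0.85):
    """Centroid, flatness and rolloff of one magnitude spectrum.
    mag and freqs are aligned arrays of magnitudes and bin frequencies."""
    power = mag ** 2 + 1e-12
    centroid = np.sum(freqs * mag) / np.sum(mag)        # spectral center of mass
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    cum = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cum, rolloff_frac * cum[-1])]
    return centroid, flatness, rolloff

mag = np.ones(100)                  # perfectly flat spectrum
freqs = np.arange(100, dtype=float)
centroid, flatness, rolloff = spectral_features(mag, freqs)
```

A flat spectrum (noise-like) gives flatness near 1 and a centroid at mid-band; voiced speech gives low flatness and a centroid pulled toward the formant region, which is what makes these features discriminative.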
S220: training a classifier on the extracted speech features to generate a speech model and a noise model, so as to distinguish speech frames from non-speech frames.
The extracted speech features are assembled into a speech feature vector, and a classifier is trained on the speech to generate a speech model and a noise model. The classifier can use a Gaussian mixture model (Gaussian Mixture Model, GMM) or a support vector machine (Support Vector Machine, SVM); in this embodiment a Gaussian mixture model is trained on the speech to generate the speech model and the noise model. Each speech frame is judged according to the maximum output probability under the speech model and the noise model, so as to distinguish speech frames from non-speech frames. When training the speech model and the noise model, the expectation-maximization method can be used: the initial parameters of each Gaussian are estimated by the k-means algorithm, and after several iterations the speech model and the noise model converge.
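A minimal stand-in for the two-model classification idea: instead of a full GMM trained with EM, fit one diagonal Gaussian per class and label each frame by whichever model gives the larger log-likelihood. All names and the toy feature clusters below are illustrative.

```python
import numpy as np

def fit_gaussian(features):
    """Fit one diagonal Gaussian to (frames, dims) feature vectors: a
    single-component stand-in for the GMMs trained with EM in the text."""
    return features.mean(axis=0), features.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def classify_frame(x, speech_model, noise_model):
    """Label a frame by whichever model gives the larger likelihood."""
    return ('speech' if log_likelihood(x, *speech_model)
            >= log_likelihood(x, *noise_model) else 'noise')

rng = np.random.default_rng(1)
speech_model = fit_gaussian(rng.normal(5.0, 1.0, size=(500, 3)))  # toy clusters
noise_model = fit_gaussian(rng.normal(0.0, 1.0, size=(500, 3)))
```

A real GMM replaces each single Gaussian with a weighted mixture, but the maximum-likelihood decision between the two trained models is the same.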
S230: performing noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate.
S240: denoising the noisy speech signal according to the noise power spectrum fusion estimate.
In the speech denoising method provided by this embodiment, the speech features of the noisy speech signal are extracted during speech detection to form an effective speech feature combination, which can distinguish noisy speech frames from non-speech frames more accurately, helps to further improve the denoising effect of the speech denoising scheme, and improves speech quality.
Embodiment 3
Fig. 3 is a flowchart of the speech denoising method provided in Embodiment 3 of the present invention. On the basis of the above embodiments, this embodiment performs stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal.
Referring to Fig. 3, the speech denoising method includes:
S310: performing speech detection on a noisy speech signal to distinguish speech frames from non-speech frames.
S320: performing noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate.
S330: performing stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the noise power spectrum fusion estimate.
Exemplarily, after the noise power spectrum fusion estimate is obtained, the frequency-domain noisy speech signal can be denoised according to the calculated Wiener filtering gain function. The suppression of stationary noise, non-speech noise and non-stationary noise can be completed by the above process. The three kinds of noise can be suppressed step by step: the stationary noise of the speech signal is suppressed first, then the non-speech noise, and finally the non-stationary noise is suppressed by a non-stationary noise suppression factor; the speech signal after the above three kinds of noise suppression is output as the final denoised speech signal. Alternatively, the characteristics of the three kinds of noise can be combined to perform one-shot noise suppression and output the denoised speech signal.
In the speech denoising method provided by this embodiment, stationary noise suppression, non-speech noise suppression and non-stationary noise suppression are performed on the noisy speech signal, which further improves the denoising effect of the speech denoising scheme: not only stationary noise but also non-stationary noise and residual echo can be suppressed, improving speech quality.
Embodiment 4
Fig. 4 is a flowchart of the speech denoising method provided in Embodiment 4 of the present invention. On the basis of the above embodiments, this embodiment generates a fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise according to the noise power spectrum fusion estimate, and performs stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
Referring to Fig. 4, the speech denoising method includes:
S410: performing speech detection on a noisy speech signal to distinguish speech frames from non-speech frames.
S420: performing noise estimation on the speech frames and the non-speech frames respectively to obtain a noise power spectrum fusion estimate.
S430: generating a fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise according to the noise power spectrum fusion estimate.
The fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise considers all the noise in the speech signal comprehensively; according to the fusion noise suppression factor, all the noise in the speech signal can be suppressed in one pass, without suppressing the three kinds of noise step by step, which improves the processing efficiency of speech denoising. The fusion noise suppression factor is built from the Wiener suppression factor g(n, k), the non-speech suppression factor coeff(n, k) and the non-stationary noise suppression factor θ, where X(n, k) denotes the speech spectrum component of the k-th frequency bin of the n-th frame. θ is an invariant, generally a constant between 0.001 and 0.1.
The Wiener suppression factor can be obtained by the decision-directed formulas:
γ(n, k) = X(n, k)² / λlast(n, k)
ξ̂(n, k) = β(n)·g(n-1, k)²·γ(n-1, k) + (1 - β(n))·max(γ(n, k) - 1, 0)
g(n, k) = ξ̂(n, k) / (1 + ξ̂(n, k))
where γ(n, k) denotes the a posteriori SNR, ξ̂(n, k) denotes the a priori SNR, max(·) takes the larger value, λlast(n, k) denotes the noise power spectrum fusion estimate, and β(n) denotes the time-varying factor. In traditional algorithms β is a fixed factor, generally taking 0.9 to 0.98; adopting a time-varying β(n) in this algorithm helps the filter react quickly to changes in the speech signal.
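A sketch of the decision-directed a priori SNR estimate and the resulting Wiener gain; the fixed beta and the SNR floor xi_min are simplifying assumptions (the description above uses a time-varying beta(n)).

```python
import numpy as np

def decision_directed_gains(power, lam_last, beta=0.95, xi_min=1e-3):
    """Decision-directed a priori SNR estimate and Wiener gain.
    power and lam_last have shape (frames, bins)."""
    gamma = power / (lam_last + 1e-12)            # a posteriori SNR
    gains = np.empty_like(power)
    xi_prev = np.maximum(gamma[0] - 1.0, xi_min)
    for n in range(power.shape[0]):
        xi = beta * xi_prev + (1 - beta) * np.maximum(gamma[n] - 1.0, 0.0)
        xi = np.maximum(xi, xi_min)               # a priori SNR estimate
        gains[n] = xi / (1.0 + xi)                # Wiener gain g(n, k)
        xi_prev = gains[n] ** 2 * gamma[n]        # feeds the next frame
    return gains

g_hi = decision_directed_gains(np.full((50, 4), 100.0), np.ones((50, 4)))
g_lo = decision_directed_gains(np.ones((50, 4)), np.ones((50, 4)))
```

Bins whose power sits far above the fused noise estimate keep a gain near 1, while noise-only bins are driven toward the floor, which is the suppression behavior the fusion factor builds on.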
Because of the residual echo problem, the VAD detection may produce misjudgments, so further spectrum analysis is needed. Since speech is concentrated mainly in the low-frequency range 0-4000 Hz and occupies at least 50% of the energy even under the interference of residual echo, the non-speech part (including noise and residual echo) needs to be suppressed.
In this step, the non-speech suppression factor is determined as follows:
when Ratio >= threshold, coeff(n, k) takes a suppressed value bounded below by Low;
otherwise, coeff(n, k) = 1.0,
where Ratio denotes the energy ratio used for the decision; Low is a preset minimum, an empirical value; and threshold is a set threshold, generally taken as 1.0.
Optionally, the fusion noise suppression factor can also take an alternative form combining the same quantities, where X(n, k) denotes the speech spectrum component of the k-th frequency bin of the n-th frame, g(n, k) denotes the Wiener suppression factor, coeff(n, k) denotes the non-speech suppression factor, and θ denotes the non-stationary noise suppression factor.
S440: performing stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
After the fusion noise suppression factor is obtained, the noisy speech signal can be suppressed to obtain the denoised speech signal; the final denoised speech signal can be obtained according to formula (1) or formula (2). Fig. 5a shows the spectrogram of the original noisy speech signal; Fig. 5b shows the spectrogram of the speech signal denoised by the speech denoising method provided in this embodiment.
Here output(n, k) denotes the denoised speech spectrum component of the k-th frequency bin of the n-th frame.
Comparing Fig. 5a and Fig. 5b shows that the method of this embodiment can effectively suppress various noise components; the suppression of non-stationary noise is especially prominent, and the speech components are effectively preserved after denoising.
In the speech denoising method provided by this embodiment, a fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise is generated according to the noise power spectrum fusion estimate, and the noisy speech signal is denoised according to the fusion noise suppression factor, so that speech denoising can be completed in one pass, improving the processing efficiency of speech denoising. The denoising effect of the speech denoising scheme is effectively improved: not only can stationary noise be suppressed, but the non-speech components are also analyzed and estimated so that non-speech noise is suppressed; meanwhile, when suppressing non-stationary noise, weak components are reduced and strong components are amplified, achieving dynamically smoothed non-stationary noise reduction and improving speech quality.
Embodiment five
Fig. 5 is the structural representation of the device of the speech de-noising that the embodiment of the present invention five is provided, the dress of the speech de-noising Put including speech detection module 610, noise estimation module 620 and denoising module 630, below each module is carried out specifically Bright.
The speech detection module 610 is configured to perform speech detection on the noisy speech signal so as to distinguish speech frames from non-speech frames.
The noise estimation module 620 is configured to perform noise estimation on the speech frames and the non-speech frames respectively to obtain a fused noise power spectrum estimate, where the fused noise power spectrum estimate is a fusion of the speech-frame noise power spectrum estimate and the non-speech-frame noise power spectrum estimate.
The denoising module 630 is configured to denoise the noisy speech signal according to the fused noise power spectrum estimate.
In the technical scheme provided by this embodiment, speech detection is performed on the noisy speech signal to distinguish speech frames from non-speech frames, noise estimation is performed on both to obtain the fused noise power spectrum estimate, and the noisy speech signal is denoised according to that estimate. With this scheme the noise can be estimated not only in non-speech frames: the noise component estimate is also updated during speech frames, and the two noise estimates are combined when denoising the noisy speech signal, which effectively improves the denoising performance of existing speech denoising schemes and raises speech quality.
Optionally, the speech detection module 610 includes:
a speech feature extraction unit, configured to extract speech features of the noisy speech signal, where the speech features include Mel-frequency cepstral coefficients (MFCC), the linear predictive coding residual and the spectral centroid (Centroid); and
a classification unit, configured to train a classifier on the extracted speech features to generate a speech model and a noise model, so as to distinguish speech frames from non-speech frames.
Optionally, the speech features further include at least one of spectral flatness (Flatness), spectral rolloff (Rolloff) and zero-crossing rate (Zcr).
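The frame-level features named above can be sketched as follows. This is a minimal illustration assuming the standard definitions of centroid, flatness, rolloff and zero-crossing rate; the 16 kHz sample rate, Hann window and 85% rolloff energy threshold are assumptions, and MFCC and the LPC residual are omitted for brevity:

```python
import numpy as np

def frame_features(frame, sr=16000):
    # Compute Centroid, Flatness, Rolloff and Zcr for one analysis frame.
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    power = mag ** 2 + 1e-12                    # floor to keep log finite

    # Spectral centroid: magnitude-weighted mean frequency.
    centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
    # Spectral flatness: geometric / arithmetic mean of the power spectrum.
    flatness = np.exp(np.mean(np.log(power))) / np.mean(power)
    # Rolloff: frequency below which 85% of the spectral energy lies.
    cum = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cum, 0.85 * cum[-1])]
    # Zero-crossing rate: fraction of sample pairs with a sign change.
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    return centroid, flatness, rolloff, zcr
```

These per-frame features would then be fed to a trained classifier (for example a GMM or SVM) to label each frame as speech or non-speech.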
Optionally, the noise estimation module 620 includes:
a speech-frame noise estimation module, configured to obtain the speech-frame noise power spectrum estimate according to:
λd(n, k) = Quantiles(X(n, k)²), n = 0, 1, 2, …, M
a non-speech-frame noise estimation module, configured to obtain the non-speech-frame noise power spectrum estimate according to:
λd(n, k) = a·X(n, k)² + (1 − a)·Quantiles(X(n, k)²), n = 0, 1, 2, …, M
and a fusion noise estimation module, configured to obtain the fused noise power spectrum estimate according to:
λlast(n, k) = (1/L) · Σ_{n=0}^{L−1} λd(n, k)
where M is the number of frames, X(n, k) is the speech spectrum component at the k-th frequency bin of the n-th frame, Quantiles(·) takes a quantile, L is the length of the noise smoothing interval, a is a weighting coefficient, λd(n, k) is the speech-frame or non-speech-frame noise power spectrum estimate, and λlast(n, k) is the fused noise power spectrum estimate.
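A sketch of the three-formula noise estimate, under stated assumptions: the text does not fix the quantile order, the weighting coefficient a, or how the quantile window is formed, so q = 0.5 (median over the last L frames), a = 0.7 and L = 8 are illustrative choices:

```python
import numpy as np

def fuse_noise_estimate(power, is_speech, a=0.7, q=0.5, L=8):
    # power     : |X(n, k)|^2, shape (frames, bins)
    # is_speech : per-frame VAD labels (True = speech frame)
    # Speech frames take lambda_d(n, k) as a quantile of recent power
    # values; non-speech frames additionally track the current power
    # with weight a.  lambda_last is the mean of lambda_d over the
    # last L frames (the noise smoothing interval).
    frames, bins_ = power.shape
    lam_d = np.zeros_like(power)
    for n in range(frames):
        lo = max(0, n - L + 1)
        quant = np.quantile(power[lo:n + 1], q, axis=0)
        if is_speech[n]:
            lam_d[n] = quant
        else:
            lam_d[n] = a * power[n] + (1 - a) * quant
    lam_last = np.zeros_like(power)
    for n in range(frames):
        lo = max(0, n - L + 1)
        lam_last[n] = lam_d[lo:n + 1].mean(axis=0)
    return lam_last
```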
Optionally, the denoising module 630 includes:
a denoising unit, configured to perform stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fused noise power spectrum estimate.
Optionally, the denoising unit is specifically configured to:
generate a fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise from the fused noise power spectrum estimate; and
perform stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
Optionally, the fusion noise suppression factor is:
Q(n, k) = |X(n, k)|·g(n, k)·coeff(n, k) / (|X(n, k)|·g(n, k)·coeff(n, k) + θ)
where X(n, k) is the speech spectrum component at the k-th frequency bin of the n-th frame, g(n, k) is the Wiener suppression factor, coeff(n, k) is the non-speech suppression factor, and θ is the non-stationary noise suppression factor.
Optionally, the fusion noise suppression factor is:
Q(n, k) = (|X(n, k)|·g(n, k)·coeff(n, k))² / ((|X(n, k)|·g(n, k)·coeff(n, k))² + θ)
where X(n, k) is the speech spectrum component at the k-th frequency bin of the n-th frame, g(n, k) is the Wiener suppression factor, coeff(n, k) is the non-speech suppression factor, and θ is the non-stationary noise suppression factor.
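Both forms of the fusion suppression factor can be sketched together. θ is left as a small constant here because the text does not fix its value; its default below is illustrative:

```python
import numpy as np

def fusion_suppression(X_mag, g, coeff, theta=1e-3, squared=False):
    # X_mag : |X(n, k)|; g : Wiener factor; coeff : non-speech factor;
    # theta : non-stationary noise suppression factor (assumed constant).
    # squared=False gives the magnitude form, squared=True the squared form.
    s = X_mag * g * coeff
    if squared:
        s = s ** 2
    return s / (s + theta)
```

The denoised spectrum is then output(n, k) = Q(n, k)·X(n, k); larger suppressed magnitudes push Q toward 1, smaller ones toward 0, which is what damps weak (noise-dominated) components.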
Optionally, the Wiener suppression factor is obtained according to:
γ(n, k) = X(n, k)² / λlast(n, k)
ε̂(n, k) = β(n)·g(n − 1, k)·γ(n − 1, k) / λlast(n, k) + (1 − β(n))·max[γ(n, k) − 1, 0]
β(n) = 1 / (1 + |(max[γ(n, k) − 1, 0] − ε̂(n − 1, k)) / (max[γ(n, k) − 1, 0] + 1)|)
g(n, k) = ε̂(n, k) / (ε̂(n, k) + 1)
where γ(n, k) is the a posteriori signal-to-noise ratio, ε̂(n, k) is the a priori signal-to-noise ratio estimate, max(·) takes the maximum, λlast(n, k) is the fused noise power spectrum estimate, and β(n) is the time-varying factor.
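A sketch of the Wiener-factor recursion. The extracted ε̂ formula is partly garbled, so the smoothing term below uses the classical decision-directed form ε̂ = β·g(n−1, k)²·γ(n−1, k) + (1−β)·max[γ−1, 0], which matches the recoverable terms; the initial prior eps0 and the first-frame handling are assumptions of this sketch:

```python
import numpy as np

def wiener_gains(power, lam_last, eps0=1.0):
    # power    : |X(n, k)|^2, shape (frames, bins)
    # lam_last : fused noise power spectrum estimate, same shape
    # gamma = X^2 / lambda_last is the a-posteriori SNR; the a-priori SNR
    # estimate eps is smoothed with the time-varying beta, and the gain
    # is g = eps / (eps + 1), as in the text.
    frames, bins_ = power.shape
    gamma = power / lam_last
    g = np.zeros((frames, bins_))
    eps_prev = np.full(bins_, eps0)
    g_prev = np.full(bins_, eps0 / (eps0 + 1.0))
    gamma_prev = np.ones(bins_)
    for n in range(frames):
        ml = np.maximum(gamma[n] - 1.0, 0.0)
        beta = 1.0 / (1.0 + np.abs((ml - eps_prev) / (ml + 1.0)))
        eps = beta * g_prev ** 2 * gamma_prev + (1.0 - beta) * ml
        g[n] = eps / (eps + 1.0)
        eps_prev, g_prev, gamma_prev = eps, g[n], gamma[n]
    return g
```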
Optionally, the non-speech suppression factor is obtained according to:
Ratio = Σ_{k=low+1}^{N−1} |X(n, k)| / Σ_{k=1}^{low} |X(n, k)|
When Ratio ≥ threshold:
coeff(n, k) = 0.95, for k = 1, …, low
coeff(n, k) = 0.95 − ((0.95 − 0.5/Ratio)/low)·(k − low − 1), for k = low + 1, …, N − 1
Otherwise, coeff(n, k) = 1.0
where Ratio is the ratio of the high-band magnitude sum Σ_{k=low+1}^{N−1}|X(n, k)| to the low-band magnitude sum Σ_{k=1}^{low}|X(n, k)|, low is a preset minimum frequency index, and threshold is a preset threshold.
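A per-frame sketch of the non-speech suppression factor. The preset values low = 16 and threshold = 2.0 are illustrative, and the final clip of the linear ramp to [0, 1] is a safeguard added in this sketch, not stated in the text:

```python
import numpy as np

def nonspeech_coeff(mag, low=16, threshold=2.0):
    # mag : |X(n, k)| for one frame, length N.  Ratio compares the
    # high-band magnitude sum (k = low+1 .. N-1) with the low-band sum
    # (k = 1 .. low); a large Ratio marks the frame as non-speech-like,
    # and the factor then damps the low band to 0.95 and ramps down
    # linearly over the high band.  Otherwise the factor is 1.
    N = len(mag)
    coeff = np.ones(N)
    low_sum = np.sum(mag[1:low + 1]) + 1e-12
    ratio = np.sum(mag[low + 1:N]) / low_sum
    if ratio >= threshold:
        coeff[1:low + 1] = 0.95
        k = np.arange(low + 1, N)
        coeff[low + 1:] = 0.95 - (0.95 - 0.5 / ratio) / low * (k - low - 1)
    return np.clip(coeff, 0.0, 1.0)   # safeguard: keep the factor in [0, 1]
```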
The speech denoising device provided by this embodiment of the present invention can perform the speech denoising method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects.
Obviously, those skilled in the art will understand that each module or step of the present invention described above may be implemented by the service device and management server described above. Optionally, the embodiments of the present invention may be realized as programs executable by a computer device, stored in a storage device and executed by a processor; the programs may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Alternatively, the modules or steps may each be fabricated as individual integrated circuit modules, or multiple of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here; various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims (15)

1. A speech denoising method, characterized by comprising:
performing speech detection on a noisy speech signal to distinguish speech frames from non-speech frames;
performing noise estimation on the speech frames and the non-speech frames respectively to obtain a fused noise power spectrum estimate, wherein the fused noise power spectrum estimate is a fusion of a speech-frame noise power spectrum estimate and a non-speech-frame noise power spectrum estimate; and
denoising the noisy speech signal according to the fused noise power spectrum estimate.
2. The method according to claim 1, characterized in that performing speech detection on the noisy speech signal to distinguish speech frames from non-speech frames comprises:
extracting speech features of the noisy speech signal, wherein the speech features include Mel-frequency cepstral coefficients (MFCC), a linear predictive coding residual and a spectral centroid (Centroid); and
training a classifier on the extracted speech features to generate a speech model and a noise model, so as to distinguish speech frames from non-speech frames.
3. The method according to claim 2, characterized in that the speech features further include at least one of spectral flatness (Flatness), spectral rolloff (Rolloff) and zero-crossing rate (Zcr).
4. The method according to claim 1, characterized in that performing noise estimation on the speech frames and the non-speech frames respectively to obtain the fused noise power spectrum estimate comprises:
obtaining the speech-frame noise power spectrum estimate according to:
λd(n, k) = Quantiles(X(n, k)²), n = 0, 1, 2, …, M
obtaining the non-speech-frame noise power spectrum estimate according to:
λd(n, k) = a·X(n, k)² + (1 − a)·Quantiles(X(n, k)²), n = 0, 1, 2, …, M
and obtaining the fused noise power spectrum estimate according to:
λlast(n, k) = (1/L) · Σ_{n=0}^{L−1} λd(n, k)
where M is the number of frames, X(n, k) is the speech spectrum component at the k-th frequency bin of the n-th frame, Quantiles(·) takes a quantile, L is the length of the noise smoothing interval, a is a weighting coefficient, λd(n, k) is the speech-frame or non-speech-frame noise power spectrum estimate, and λlast(n, k) is the fused noise power spectrum estimate.
5. The method according to claim 1, characterized in that denoising the noisy speech signal according to the fused noise power spectrum estimate comprises:
performing stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fused noise power spectrum estimate.
6. The method according to claim 5, characterized in that performing stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fused noise power spectrum estimate comprises:
generating a fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise from the fused noise power spectrum estimate; and
performing stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
7. The method according to claim 6, characterized in that the fusion noise suppression factor is:
Q(n, k) = |X(n, k)|·g(n, k)·coeff(n, k) / (|X(n, k)|·g(n, k)·coeff(n, k) + θ)
where X(n, k) is the speech spectrum component at the k-th frequency bin of the n-th frame, g(n, k) is the Wiener suppression factor, coeff(n, k) is the non-speech suppression factor, and θ is the non-stationary noise suppression factor.
8. The method according to claim 6, characterized in that the fusion noise suppression factor is:
Q(n, k) = (|X(n, k)|·g(n, k)·coeff(n, k))² / ((|X(n, k)|·g(n, k)·coeff(n, k))² + θ)
where X(n, k) is the speech spectrum component at the k-th frequency bin of the n-th frame, g(n, k) is the Wiener suppression factor, coeff(n, k) is the non-speech suppression factor, and θ is the non-stationary noise suppression factor.
9. The method according to claim 7 or 8, characterized in that the Wiener suppression factor is obtained according to:
γ(n, k) = X(n, k)² / λlast(n, k)
ε̂(n, k) = β(n)·g(n − 1, k)·γ(n − 1, k) / λlast(n, k) + (1 − β(n))·max[γ(n, k) − 1, 0]
β(n) = 1 / (1 + |(max[γ(n, k) − 1, 0] − ε̂(n − 1, k)) / (max[γ(n, k) − 1, 0] + 1)|)
g(n, k) = ε̂(n, k) / (ε̂(n, k) + 1)
where γ(n, k) is the a posteriori signal-to-noise ratio, ε̂(n, k) is the a priori signal-to-noise ratio estimate, max(·) takes the maximum, λlast(n, k) is the fused noise power spectrum estimate, and β(n) is the time-varying factor.
10. The method according to claim 7 or 8, characterized in that the non-speech suppression factor is obtained according to:
Ratio = Σ_{k=low+1}^{N−1} |X(n, k)| / Σ_{k=1}^{low} |X(n, k)|
When Ratio ≥ threshold:
coeff(n, k) = 0.95, for k = 1, …, low
coeff(n, k) = 0.95 − ((0.95 − 0.5/Ratio)/low)·(k − low − 1), for k = low + 1, …, N − 1
Otherwise, coeff(n, k) = 1.0
where Ratio is the ratio of the high-band magnitude sum Σ_{k=low+1}^{N−1}|X(n, k)| to the low-band magnitude sum Σ_{k=1}^{low}|X(n, k)|, low is a preset minimum frequency index, and threshold is a preset threshold.
11. A speech denoising device, characterized by comprising:
a speech detection module, configured to perform speech detection on a noisy speech signal to distinguish speech frames from non-speech frames;
a noise estimation module, configured to perform noise estimation on the speech frames and the non-speech frames respectively to obtain a fused noise power spectrum estimate, wherein the fused noise power spectrum estimate is a fusion of a speech-frame noise power spectrum estimate and a non-speech-frame noise power spectrum estimate; and
a denoising module, configured to denoise the noisy speech signal according to the fused noise power spectrum estimate.
12. The device according to claim 11, characterized in that the denoising module comprises:
a denoising unit, configured to perform stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fused noise power spectrum estimate.
13. The device according to claim 12, characterized in that the denoising unit is specifically configured to:
generate a fusion noise suppression factor for stationary noise, non-speech noise and non-stationary noise from the fused noise power spectrum estimate; and
perform stationary noise suppression, non-speech noise suppression and non-stationary noise suppression on the noisy speech signal according to the fusion noise suppression factor.
14. The device according to claim 13, characterized in that the fusion noise suppression factor is:
Q(n, k) = |X(n, k)|·g(n, k)·coeff(n, k) / (|X(n, k)|·g(n, k)·coeff(n, k) + θ)
where X(n, k) is the speech spectrum component at the k-th frequency bin of the n-th frame, g(n, k) is the Wiener suppression factor, coeff(n, k) is the non-speech suppression factor, and θ is the non-stationary noise suppression factor.
15. The device according to claim 13, characterized in that the fusion noise suppression factor is:
Q(n, k) = (|X(n, k)|·g(n, k)·coeff(n, k))² / ((|X(n, k)|·g(n, k)·coeff(n, k))² + θ)
where X(n, k) is the speech spectrum component at the k-th frequency bin of the n-th frame, g(n, k) is the Wiener suppression factor, coeff(n, k) is the non-speech suppression factor, and θ is the non-stationary noise suppression factor.
CN201610898662.4A 2016-10-14 2016-10-14 A kind of method and device of speech de-noising Active CN106486131B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610898662.4A CN106486131B (en) 2016-10-14 2016-10-14 A kind of method and device of speech de-noising

Publications (2)

Publication Number Publication Date
CN106486131A true CN106486131A (en) 2017-03-08
CN106486131B CN106486131B (en) 2019-10-11

Family

ID=58269971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610898662.4A Active CN106486131B (en) 2016-10-14 2016-10-14 A kind of method and device of speech de-noising

Country Status (1)

Country Link
CN (1) CN106486131B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5706395A (en) * 1995-04-19 1998-01-06 Texas Instruments Incorporated Adaptive weiner filtering using a dynamic suppression factor
CN103295580A (en) * 2013-05-13 2013-09-11 北京百度网讯科技有限公司 Method and device for suppressing noise of voice signals
CN103730126A (en) * 2012-10-16 2014-04-16 联芯科技有限公司 Noise suppression method and noise suppressor
CN104867497A (en) * 2014-02-26 2015-08-26 北京信威通信技术股份有限公司 Voice noise-reducing method


Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108122561A (en) * 2017-12-19 2018-06-05 广东小天才科技有限公司 A kind of spoken voice assessment method and electronic equipment based on electronic equipment
CN109979476B (en) * 2017-12-28 2021-05-14 电信科学技术研究院 Method and device for removing reverberation of voice
CN109979476A (en) * 2017-12-28 2019-07-05 电信科学技术研究院 A kind of method and device of speech dereverbcration
CN108648765A (en) * 2018-04-27 2018-10-12 海信集团有限公司 A kind of method, apparatus and terminal of voice abnormality detection
CN108648765B (en) * 2018-04-27 2020-09-25 海信集团有限公司 Method, device and terminal for detecting abnormal voice
CN108847249A (en) * 2018-05-30 2018-11-20 苏州思必驰信息科技有限公司 Sound converts optimization method and system
CN108847249B (en) * 2018-05-30 2020-06-05 苏州思必驰信息科技有限公司 Sound conversion optimization method and system
CN108735229A (en) * 2018-06-12 2018-11-02 华南理工大学 A kind of amplitude based on noise Ratio Weighted and phase combining compensation anti-noise sound enhancement method and realization device
CN108922556B (en) * 2018-07-16 2019-08-27 百度在线网络技术(北京)有限公司 Sound processing method, device and equipment
CN108922556A (en) * 2018-07-16 2018-11-30 百度在线网络技术(北京)有限公司 sound processing method, device and equipment
CN109189975A (en) * 2018-09-06 2019-01-11 深圳市三宝创新智能有限公司 A kind of method for playing music, device, computer equipment and readable storage medium storing program for executing
CN109189975B (en) * 2018-09-06 2021-12-24 深圳市三宝创新智能有限公司 Music playing method and device, computer equipment and readable storage medium
CN109616133A (en) * 2018-09-28 2019-04-12 广州智伴人工智能科技有限公司 A kind of environmental noise removal system
CN109616133B (en) * 2018-09-28 2021-11-30 广州智伴人工智能科技有限公司 Environmental noise removing system
CN109087657A (en) * 2018-10-17 2018-12-25 成都天奥信息科技有限公司 A kind of sound enhancement method applied to ultrashort wave radio set
CN111261183A (en) * 2018-12-03 2020-06-09 珠海格力电器股份有限公司 Method and device for denoising voice
CN111261183B (en) * 2018-12-03 2022-11-22 珠海格力电器股份有限公司 Method and device for denoising voice
CN109829035A (en) * 2018-12-19 2019-05-31 平安国际融资租赁有限公司 Process searching method, device, computer equipment and storage medium
CN110277087A (en) * 2019-07-03 2019-09-24 四川大学 A kind of broadcast singal anticipation preprocess method
CN110277087B (en) * 2019-07-03 2021-04-23 四川大学 Pre-judging preprocessing method for broadcast signals
CN110910906A (en) * 2019-11-12 2020-03-24 国网山东省电力公司临沂供电公司 Audio endpoint detection and noise reduction method based on power intranet
CN111429930A (en) * 2020-03-16 2020-07-17 云知声智能科技股份有限公司 Noise reduction model processing method and system based on adaptive sampling rate
CN111429930B (en) * 2020-03-16 2023-02-28 云知声智能科技股份有限公司 Noise reduction model processing method and system based on adaptive sampling rate
CN111429932A (en) * 2020-06-10 2020-07-17 浙江远传信息技术股份有限公司 Voice noise reduction method, device, equipment and medium
CN111986686A (en) * 2020-07-09 2020-11-24 厦门快商通科技股份有限公司 Short-time speech signal-to-noise ratio estimation method, device, equipment and storage medium
CN111986686B (en) * 2020-07-09 2023-01-03 厦门快商通科技股份有限公司 Short-time speech signal-to-noise ratio estimation method, device, equipment and storage medium
CN112002339B (en) * 2020-07-22 2024-01-26 海尔优家智能科技(北京)有限公司 Speech noise reduction method and device, computer-readable storage medium and electronic device
CN112002339A (en) * 2020-07-22 2020-11-27 海尔优家智能科技(北京)有限公司 Voice noise reduction method and device, computer-readable storage medium and electronic device
CN111986691B (en) * 2020-09-04 2024-02-02 腾讯科技(深圳)有限公司 Audio processing method, device, computer equipment and storage medium
CN111986691A (en) * 2020-09-04 2020-11-24 腾讯科技(深圳)有限公司 Audio processing method and device, computer equipment and storage medium
CN112053702B (en) * 2020-09-30 2024-03-19 北京大米科技有限公司 Voice processing method and device and electronic equipment
CN112053702A (en) * 2020-09-30 2020-12-08 北京大米科技有限公司 Voice processing method and device and electronic equipment
WO2022160715A1 (en) * 2021-01-29 2022-08-04 北京达佳互联信息技术有限公司 Voice signal processing method and electronic device
CN112992190A (en) * 2021-02-02 2021-06-18 北京字跳网络技术有限公司 Audio signal processing method and device, electronic equipment and storage medium
CN113724720B (en) * 2021-07-19 2023-07-11 电信科学技术第五研究所有限公司 Non-human voice filtering method based on neural network and MFCC (multiple frequency component carrier) in noisy environment
CN113724720A (en) * 2021-07-19 2021-11-30 电信科学技术第五研究所有限公司 Non-human voice filtering method in noisy environment based on neural network and MFCC
CN113593599A (en) * 2021-09-02 2021-11-02 北京云蝶智学科技有限公司 Method for removing noise signal in voice signal
CN113808608B (en) * 2021-09-17 2023-07-25 随锐科技集团股份有限公司 Method and device for suppressing mono noise based on time-frequency masking smoothing strategy
CN113808608A (en) * 2021-09-17 2021-12-17 随锐科技集团股份有限公司 Single sound channel noise suppression method and device based on time-frequency masking smoothing strategy

Also Published As

Publication number Publication date
CN106486131B (en) 2019-10-11


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200820

Address after: No.259 Nanjing West Road, Tangqiao town, Zhangjiagang City, Suzhou City, Jiangsu Province

Patentee after: Suzhou Qianwen wandaba Education Technology Co., Ltd

Address before: Yangpu District State Road 200433 Shanghai City No. 200 Building 5 room 2002

Patentee before: SHANGHAI QIANWENWANDABA CLOUD TECH. Co.,Ltd.