CN104658544A - Method for inhibiting transient noise in voice - Google Patents

Publication number
CN104658544A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310590841.8A
Other languages
Chinese (zh)
Inventor
盖丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian You Jia Software Science And Technology Ltd
Original Assignee
Dalian You Jia Software Science And Technology Ltd

Abstract

The invention discloses a method for suppressing transient noise in speech and belongs to the technical field of signal processing. The method employs three modules: a Gammatone frequency cepstral coefficient (GFCC) extraction module, a transient noise detection module, and a speech signal reconstruction module. The input of the GFCC extraction module receives the noisy speech signal, and its output is connected to the input of the transient noise detection module. The output of the transient noise detection module is connected to the input of the speech signal reconstruction module, which also receives the noisy speech signal directly and outputs the denoised speech.

Description

A method for suppressing transient noise in speech
Technical field
The present invention relates to a method for suppressing transient noise in speech and belongs to the field of signal processing.
Background
Transient noise occurs in many applications, for example in voice communication terminals such as hearing aids, hands-free kits, mobile phones, and video conferencing equipment. It seriously degrades speech quality, reduces clarity and intelligibility, and causes listening fatigue. Transient noise in speech is usually additive. In the time domain it is abrupt and impulsive, with its energy concentrated in a short interval, while in the frequency domain it is spread over a wide band. A typical transient consists of an initial peak followed by a short decaying oscillation lasting about 10 to 50 ms; door knocks, mouse clicks, metronome ticks, keyboard strokes, and hammer blows are all examples. In most cases transient noise is hard to remove, because it overlaps the speech signal completely in the time-frequency domain and is discontinuous. Most current speech noise suppression algorithms, such as spectral subtraction, adaptive filtering, and Wiener filtering, target stationary and continuous noise and perform poorly on transient noise. A speech noise suppression technique for transient noise conditions is therefore needed.
Because the final measure of noise suppression is the listener's subjective impression, the auditory perception characteristics of the human ear must be taken into account. The basilar membrane plays an important role in auditory perception: it has good frequency selectivity and frequency resolution. Based on this property, the frequency-division behaviour of the basilar membrane can be reproduced with a band-pass filter bank, called an auditory filter bank. In 1972 Johannesma proposed the Gammatone (GT) filter model, which is based on the basilar membrane model in the auditory model and was originally used to describe the physiological impulse response of the auditory nerve of the cat. This filter simulates the frequency response of the human auditory system and matches the perceptual characteristics of the human ear. The time-domain expression of its impulse response is
$$g(t) = B^{n} t^{n-1} e^{-2\pi B t} \cos(2\pi f_i t + \varphi)\, u(t)$$
$$B = b_1 \cdot \mathrm{ERB}(f_i)$$
where $B^{n}$ is the filter gain; $n$ is the filter order (a Gammatone filter with $n = 4$ already models the filtering characteristics of the basilar membrane well); $\varphi$ is the initial phase; $u(t)$ is the unit step function; $f_i$ is the center frequency; and $\mathrm{ERB}(f_i)$ is the equivalent rectangular bandwidth of the Gammatone filter, related to the center frequency $f_i$ by
$$\mathrm{ERB}(f_i) = 24.7 + 0.108 f_i$$
The center frequency of a Gammatone filter determines its equivalent bandwidth, frequency response, and related characteristics. In line with auditory perception, the center frequencies of the Gammatone filters are uniformly spaced on a logarithmic scale and are given by
$$f_i = (f_H + 228.7) \exp(i \times v) - 228.7 = (f_H + 228.7) \exp\left(i \times \frac{1}{CH} \ln\frac{f_L + 228.7}{f_H + 228.7}\right) - 228.7, \quad 1 \le i \le CH$$
$$v = \frac{1}{CH} \ln\frac{f_L + 228.7}{f_H + 228.7}$$
where $v$ is the overlap factor, which expresses the degree of overlap between adjacent filters; $f_L$ and $f_H$ are the cutoff frequencies of the filter bank; and $CH$ is the number of channels of the Gammatone filter bank. Taking the Laplace transform of the Gammatone impulse response gives the continuous-domain transfer function of the 4th-order Gammatone filter:
$$G(s) = \frac{[s + b + (\sqrt{2}-1)\omega_i]\,[s + b - (\sqrt{2}-1)\omega_i]\,[s + b + (\sqrt{2}+1)\omega_i]\,[s + b - (\sqrt{2}+1)\omega_i]}{[(s + b)^2 + \omega_i^2]^4}$$
where $\omega_i = 2\pi f_i$ is the center angular frequency of each filter. Using the impulse invariance method, $G(s)$ is transformed to the Z domain:
$$G_i(z) = \frac{T_s - T_s a_3 [a_1 + (\sqrt{2}-1) a_2] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}} \times \frac{T_s - T_s a_3 [a_1 - (\sqrt{2}-1) a_2] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}}$$
$$\times \frac{T_s - T_s a_3 [a_1 + (\sqrt{2}+1) a_2] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}} \times \frac{T_s - T_s a_3 [a_1 - (\sqrt{2}+1) a_2] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}}$$
$$= G_{1,i}(z) \cdot G_{2,i}(z) \cdot G_{3,i}(z) \cdot G_{4,i}(z)$$
where $T_s$ is the sampling period, $a_1 = \cos(\omega_i T_s)$, $a_2 = \sin(\omega_i T_s)$, and $a_3 = e^{-bT_s}$.
It follows that a 4th-order Gammatone filter can be realized as a cascade of four second-order transfer functions. Inverse-transforming each of the four gives the impulse responses of four second-order filters, $g_{1,i}(n)$, $g_{2,i}(n)$, $g_{3,i}(n)$, $g_{4,i}(n)$. Convolving the speech signal with each impulse response in turn gives the Gammatone filter output. The amplitude-frequency response of the 64-channel Gammatone filter bank at a 48 kHz sampling rate is shown in Fig. 1 of the specification.
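The center-frequency formula and the cascade of four second-order sections can be illustrated with a short Python sketch. It is not the patent's implementation: the function names are hypothetical, the 24 kHz upper cutoff in the usage lines is only an example value, and the relation b = 2π·b_1·ERB(f_i) with b_1 = 1.019 is an assumption, because the text gives neither b_1 nor the link between b and B.

```python
import math

def erb(f_hz):
    """Equivalent rectangular bandwidth: ERB(f) = 24.7 + 0.108 f."""
    return 24.7 + 0.108 * f_hz

def center_frequencies(f_low, f_high, channels):
    """f_i = (f_H + 228.7) * exp(i * v) - 228.7 for i = 1..CH."""
    v = math.log((f_low + 228.7) / (f_high + 228.7)) / channels
    return [(f_high + 228.7) * math.exp(i * v) - 228.7
            for i in range(1, channels + 1)]

def gammatone_sections(fc, fs, b1=1.019):
    """Return four second-order sections ((n0, n1), (d1, d2)) for one channel.

    ASSUMPTION: b = 2*pi*b1*ERB(fc) with b1 = 1.019 (not stated in the text).
    """
    ts = 1.0 / fs
    b = 2.0 * math.pi * b1 * erb(fc)
    w = 2.0 * math.pi * fc
    a1, a2, a3 = math.cos(w * ts), math.sin(w * ts), math.exp(-b * ts)
    r2 = math.sqrt(2.0)
    sections = []
    for k in (r2 - 1.0, -(r2 - 1.0), r2 + 1.0, -(r2 + 1.0)):
        numerator = (ts, -ts * a3 * (a1 + k * a2))   # n0 + n1 z^-1
        denominator = (-2.0 * a1 * a3, a3 * a3)      # 1 + d1 z^-1 + d2 z^-2
        sections.append((numerator, denominator))
    return sections

def filter_cascade(x, sections):
    """Run the signal through each second-order section in turn."""
    y = list(x)
    for (n0, n1), (d1, d2) in sections:
        out, x1, y1, y2 = [], 0.0, 0.0, 0.0
        for xn in y:
            yn = n0 * xn + n1 * x1 - d1 * y1 - d2 * y2
            out.append(yn)
            x1, y2, y1 = xn, y1, yn
        y = out
    return y

# Example: 64 channels between 50 Hz and an assumed 24 kHz upper cutoff,
# and the impulse response of one channel centred at 1 kHz.
cfs = center_frequencies(50.0, 24000.0, 64)
h = filter_cascade([1.0] + [0.0] * 999, gammatone_sections(1000.0, 48000.0))
```

With this indexing the first channel has the highest center frequency and channel CH lands on f_L, because v is negative whenever f_L < f_H.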
(2) Packet loss concealment
VoIP transmits speech as data packets over an IP network and is attracting growing attention as an alternative to the traditional public switched telephone network (PSTN). Network congestion or delay jitter during transmission can cause packet loss, that is, some packets do not arrive at the receiver in time. A well-designed technique for handling packet loss can greatly improve speech transmission quality. Such techniques fall into sender-based packet loss recovery (PLR) and receiver-based packet loss concealment (PLC). Loss recovery techniques include forward error correction (FEC) and interleaving. In general, sender-based recovery performs better than receiver-based concealment, but it is more complex and increases both network bandwidth and transmission delay. For real-time reasons, many practical VoIP systems use packet loss concealment. Common PLC algorithms include silence substitution, packet repetition, pattern matching, pitch waveform replication, and linear prediction. The present invention suppresses transient noise with a packet loss concealment method based on bidirectional linear prediction (BLP).
Vaseghi et al. proposed an impulsive noise detection, interpolation, and suppression algorithm based on a linear prediction model. The algorithm has two parts, impulsive noise detection and signal interpolation. The detection part comprises linear prediction analysis based on an AR model, an inverse filter, and a threshold detector. The detector output is a binary switch that controls the interpolator: if impulsive noise is detected, the interpolator is activated and replaces the contaminated samples. The functional block diagram of this method is shown in Fig. 2 of the specification.
In the monograph "Advanced Digital Signal Processing and Noise Reduction" (3rd ed. New York: Wiley, 2006), S. V. Vaseghi gives a detection and suppression method for impulsive noise. Its main drawbacks are: (a) an accurate model of many one-dimensional signals (such as speech) is hard to obtain, so harmonic distortion is easily introduced; (b) pulses of small amplitude cannot be detected.
In the patent "Repetitive transient noise removal" (US patent: 2006116873, 2003), Phillip A. Hetherington and Shreyas A. Paranjpe propose modelling the noise according to its characteristics and then using the correlation coefficient between the modelled signal and the signal under test to decide whether the data contain noise. If noise is present, the noise component is removed from the signal under test according to the modelled signal. The flow chart of this method is shown in Fig. 3 of the specification.
This technique is suited to removing repetitive noise. Transient noise, however, comes in many types, and when several different types occur within a short time the modelling becomes inaccurate and the denoising performance suffers.
Summary of the invention
The present invention is proposed in view of the above problems and provides a method for suppressing transient noise in speech. Following the detect-and-repair approach, it uses Gammatone frequency cepstral coefficients (GFCC) to improve the detection accuracy of transient noise and a speech signal reconstruction method to repair the detected frames, thereby improving the quality of the speech signal.
The technical solution adopted by the present invention is as follows:
A method for suppressing transient noise in speech, comprising three modules: a GFCC extraction module, a transient noise detection module, and a speech signal reconstruction module.
The input of the GFCC extraction module receives the noisy speech signal, and its output is connected to the input of the transient noise detection module. The output of the transient noise detection module is connected to the input of the speech signal reconstruction module. The input of the speech signal reconstruction module receives the external noisy speech signal in addition to being connected to the output of the transient noise detection module, and the reconstruction module outputs the denoised speech. The GFCC extraction module extracts the GFCC from the input noisy speech signal. The transient noise detection module decides, from the difference between the GFCC of adjacent frames, whether the current speech frame contains transient noise. If it does, the speech signal reconstruction module rebuilds the current frame, and the rebuilt frame replaces the current frame at the output. If it does not, the current frame is output directly without processing.
Principle and beneficial effects of the invention: as Fig. 12 shows, the conventional detection-and-repair technique misses some transient noise, and the repaired speech is not smooth and tends to contain new frequency components. As Fig. 13 shows, the present invention detects the noise and rebuilds the speech signal effectively, and less noise remains after reconstruction than with the conventional algorithm.
Brief description of the drawings
Fig. 1 is the amplitude-frequency response of the 64-channel GT filter bank.
Fig. 2 is the block diagram of the impulsive noise detection and suppression method given in the monograph "Advanced Digital Signal Processing and Noise Reduction" (3rd ed. New York: Wiley, 2006).
Fig. 3 is the flow chart of the method of the patent "Repetitive transient noise removal" (US patent: 2006116873, 2003).
Fig. 4 is the functional block diagram of the present invention.
Fig. 5 is the functional block diagram of Gammatone frequency cepstral coefficient (GFCC) extraction.
Fig. 6 is the functional block diagram of the speech signal reconstruction method based on bidirectional linear prediction.
Fig. 7 is the functional block diagram of the forward pitch period detection method.
Fig. 8 shows the transient noise suppression performance (evaluated with SNRseg).
Fig. 9 shows the transient noise suppression performance (evaluated with the segmental log-spectral distortion LSDseg).
Fig. 10 is an example spectrogram of speech without transient noise.
Fig. 11 is the spectrogram after noise is added to the speech of Fig. 10.
Fig. 12 is the spectrogram of the result of processing the speech of Fig. 11 with the impulsive noise detection and suppression method given by S. V. Vaseghi in the monograph "Advanced Digital Signal Processing and Noise Reduction" (3rd ed. New York: Wiley, 2006).
Fig. 13 is the spectrogram of the result of processing the speech of Fig. 11 with the present invention.
Embodiments
The present invention is further described below with reference to the drawings. Grayscale figures are provided because they illustrate the technical effect of the invention and make it easier for the examiner to assess it. The grayscale figures are Figs. 10 to 13.
The scheme mainly comprises a Gammatone frequency cepstral coefficient (GFCC) extraction module, a transient noise detection module, and a speech signal reconstruction module, as shown in Fig. 4. In the detection stage, a Gammatone filter bank that simulates the cochlear auditory model of the human ear is used to extract the GFCC, and transient noise is detected from the GFCC difference between adjacent frames. If a frame is detected as noisy, its waveform is rebuilt from the information in neighbouring frames with a receiver-side packet loss concealment (PLC) algorithm based on bidirectional linear prediction, exploiting the correlation and short-term stationarity of speech. If a frame is detected as noise-free, it is output directly without further processing.
The input is a mono speech signal with sampling rate $f_s = 48$ kHz. The noisy input speech signal $x(n)$ is expressed as
$$x(n) = s(n) + d(n),$$
where $s(n)$ is the clean speech signal and $d(n)$ is the transient noise. The technical solution is described in detail below.
Gammatone frequency cepstral coefficient (GFCC) extraction
The functional block diagram of GFCC extraction is shown in Fig. 5. The steps are as follows:
(a) Apply pre-emphasis to the original noisy speech $x(n)$ to boost the high-frequency components,
$$x_e(n) = x(n) - \alpha x(n-1),$$
where $\alpha$ is the pre-emphasis factor, $0 < \alpha < 1$; the suggested value is $\alpha = 0.97$.
(b) Filter the signal with the Gammatone filter bank defined by
$$G_i(z) = \frac{T_s - T_s a_3 [a_1 + (\sqrt{2}-1) a_2] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}} \times \frac{T_s - T_s a_3 [a_1 - (\sqrt{2}-1) a_2] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}}$$
$$\times \frac{T_s - T_s a_3 [a_1 + (\sqrt{2}+1) a_2] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}} \times \frac{T_s - T_s a_3 [a_1 - (\sqrt{2}+1) a_2] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}}, \quad 1 \le i \le CH$$
$$= G_{1,i}(z) \cdot G_{2,i}(z) \cdot G_{3,i}(z) \cdot G_{4,i}(z)$$
$$a_1 = \cos(\omega_i T_s), \quad 1 \le i \le CH$$
$$a_2 = \sin(\omega_i T_s), \quad 1 \le i \le CH$$
$$a_3 = e^{-bT_s}$$
$$\omega_i = 2\pi f_i, \quad 1 \le i \le CH$$
$$f_i = (f_H + 228.7) \exp(i \times v) - 228.7 = (f_H + 228.7) \exp\left(i \times \frac{1}{CH} \ln\frac{f_L + 228.7}{f_H + 228.7}\right) - 228.7, \quad 1 \le i \le CH$$
$$v = \frac{1}{CH} \ln\frac{f_L + 228.7}{f_H + 228.7}$$
where $T_s$ is the sampling period and $CH$ is the number of channels of the Gammatone filter bank; the suggested value is $CH = 64$. The parameter $v$ is the overlap factor, which expresses the degree of overlap between adjacent filters. $f_H$ is the cutoff frequency of the filter bank, whose value is the sampling rate of the input signal, and $f_L$ lies in the range 10 to 100 Hz, with 50 Hz suggested. Passing $x_w(n)$ through each of the 64 filters gives the filtered outputs $y_i(n)$:
$$y_i(n) = x_w(n) * g_{1,i}(n) * g_{2,i}(n) * g_{3,i}(n) * g_{4,i}(n), \quad i = 0, 1, \ldots, 63$$
where '*' denotes convolution as used in digital signal processing.
(c) Compute the energy of each channel. The GT filter bank outputs are divided into frames of length $N$, with $240 \le N \le 960$; the suggested value is $N = 480$ (10 ms at a 48 kHz sampling rate). For the current frame, compute the logarithm of the energy of each channel output:
$$E(m) = \ln \sum_{n=\mathrm{start}}^{\mathrm{start}+N-1} [y_m(n)]^2, \quad m = 0, 1, \ldots, 63$$
where start is the starting position of the current frame in the signal $x(n)$ and $m$ indexes the channel.
(d) Compress the per-channel log energies with the discrete cosine transform to obtain the Gammatone frequency cepstral coefficients (GFCC):
$$C^{(p)}(0) = \frac{2}{L} \sum_{m=0}^{CH-1} E(m), \quad l = 0$$
$$C^{(p)}(l) = \frac{2}{L} \sum_{m=0}^{CH-1} E(m) \cos\left[\frac{\pi l (2m+1)}{2\,CH}\right], \quad 1 \le l < L$$
where $L$ is the order of the GFCC, $16 \le L \le 64$; the suggested value is $L = 32$.
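Steps (a), (c), and (d) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the Gammatone filtering of step (b) is assumed to have produced `channel_outputs` already, the scale factor 2/L is taken as read from the formulas above, and the small `floor` that guards against log(0) is an added assumption.

```python
import math

def pre_emphasis(x, alpha=0.97):
    """Step (a): x_e(n) = x(n) - alpha * x(n-1), taking x(-1) = 0."""
    return [x[n] - alpha * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def frame_log_energy(channel_outputs, start, frame_len=480, floor=1e-12):
    """Step (c): E(m) = ln(sum of squares) of each channel over one frame.

    ASSUMPTION: `floor` avoids log(0) on silent frames; it is not in the text.
    """
    return [math.log(max(sum(v * v for v in y[start:start + frame_len]), floor))
            for y in channel_outputs]

def gfcc(log_energy, order=32):
    """Step (d): DCT of the per-channel log energies, L = `order` coefficients."""
    ch = len(log_energy)
    scale = 2.0 / order  # assumed normalisation 2/L
    return [scale * sum(e * math.cos(math.pi * l * (2 * m + 1) / (2 * ch))
                        for m, e in enumerate(log_energy))
            for l in range(order)]

# Example: 64 channels with identical log energy give a single non-zero
# coefficient at l = 0, since all higher DCT terms cancel.
coeffs = gfcc([1.0] * 64, order=32)
```

The l = 0 case needs no special handling here because the cosine term equals 1 when l = 0.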
Noise detection
Speech frames and noise frames are distinguished by comparing the GFCC difference between adjacent frames.
The procedure is as follows:
Compute the Euclidean distance Dis between the GFCC vector $C^{(p)}(l)$ of the current frame (frame $p$) and the smoothed GFCC vector $C_{aver}^{p-1}(l)$ of the previous frame (frame $p-1$):
$$Dis = \sum_{l=0}^{L-1} \left[ C^{(p)}(l) - C_{aver}^{p-1}(l) \right]^2$$
The smoothed GFCC vector $C_{aver}^{p-1}(l)$ is updated as
$$C_{aver}^{p-1}(l) = \beta \cdot C_{aver}^{p-2}(l) + (1 - \beta) \cdot C^{(p-1)}(l)$$
where $0 < \beta < 1$ is the smoothing factor of the GFCC vector; the suggested value is $\beta = 0.6$.
Noise frames are detected with a soft-threshold decision based on noise energy. First compute the input signal energies $E(p)$ and $E(p-1)$ of the current and previous frames, then set the threshold $thres = q\,[E(p) + E(p-1)]/2$, where $0.01 \le q \le 100$; the suggested value is $q = 0.25$. When the GFCC distance Dis exceeds thres, the current frame is judged to contain transient noise.
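The detection rule can be sketched as follows. The function name is hypothetical, and two choices are assumptions the text does not settle: the smoothed vector is initialised with the first frame's GFCC, and it is updated on every frame, including frames flagged as noisy.

```python
def detect_transients(gfcc_frames, frame_energies, beta=0.6, q=0.25):
    """Return one boolean per frame: True if the frame is judged noisy.

    gfcc_frames[p]    -- GFCC vector C^(p) of frame p
    frame_energies[p] -- input signal energy E(p) of frame p
    """
    flags = [False]                 # frame 0 has no predecessor to compare with
    c_aver = list(gfcc_frames[0])   # ASSUMPTION: initial smoothed vector
    for p in range(1, len(gfcc_frames)):
        # Dis: sum of squared differences to the smoothed vector of frame p-1
        dis = sum((c - a) ** 2 for c, a in zip(gfcc_frames[p], c_aver))
        thres = q * (frame_energies[p] + frame_energies[p - 1]) / 2.0
        flags.append(dis > thres)
        # C_aver^p = beta * C_aver^(p-1) + (1 - beta) * C^(p)
        c_aver = [beta * a + (1.0 - beta) * c
                  for a, c in zip(c_aver, gfcc_frames[p])]
    return flags

# Example: a sudden jump in the GFCC vector at frame 2.
frames = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
flags = detect_transients(frames, [1.0, 1.0, 1.0, 1.0])
# flags == [False, False, True, True]
```

In this example the frame after the jump is flagged as well, because the smoothed vector has absorbed part of the transient. An implementation might skip the update on noisy frames to avoid this; the text does not say whether it does.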
Speech signal reconstruction based on bidirectional linear prediction
The speech frame polluted by noise is regenerated by an interpolation reconstruction algorithm from the waveform of the adjacent speech. First, bidirectional linear prediction is applied to the frames before and after the noisy frame, an inverse filter is designed from the linear prediction coefficients, and the residual signal is computed. Next, the pitch period is estimated from the residual signal by a pitch detection algorithm. The excitation signal of the current noisy frame is then generated from the residual signals and pitch periods of the adjacent frames, and the current frame is rebuilt from the excitation signal and the linear prediction coefficients of the previous frame. Finally, the rebuilt frame is smoothed against the adjacent frames by fading in and out, which suppresses the transient noise in the speech. The functional block diagram of the speech signal reconstruction method based on bidirectional linear prediction is shown in Fig. 6.
Let $D$ denote the delay of the output signal, used for boundary fusion between the current frame and its neighbours, with $16 \le D \le 48$. Let Lev denote the order of the linear prediction filter, with $10 \le Lev \le 30$.
The reconstruction method generates an estimate of the current frame from the samples of the adjacent frames. The $B$ samples nearest before the current frame are therefore stored as history data, denoted $buf_f(n)$, and used to estimate the forward linear prediction coefficients and the forward excitation signal. Correspondingly, the $B$ samples after the current frame, denoted $buf_b(n)$, are used to estimate the backward linear prediction coefficients and the backward excitation signal. The suggested values of $D$, Lev, and $B$ are 24, 20, and $1.5N$ respectively. For brevity, all symbols in the remainder of this description refer to the current frame, and the frame index $p$ is omitted.
The method is as follows:
(1) Based on the result of the transient noise detection module, when the current frame is detected as noisy and the previous frame as non-noisy, linear prediction analysis is performed on the history data in the buffer. First window $x_e(n)$ to obtain the windowed signal $x_w(n) = x_e(n)\,w(n)$, where the window function is the Hamming window $w(n) = 0.54 - 0.46\cos[(2n+1)\pi/N]$, $n = 0, 1, \ldots, N-1$. The autocorrelation function of $x_w(n)$, computed over the buffered data, is
$$r_{corr}(m) = \sum_{n=0}^{B/2-1-m} buf_f(n + B/2) \cdot buf_f(n + B/2 + m), \quad 0 \le m < Lev$$
The forward linear prediction coefficients $a_f(i)$ are then computed with the Levinson-Durbin algorithm.
(2) Design an inverse filter from the forward linear prediction coefficients and filter $buf_f(n)$ with it to obtain the residual signal $e_f(n)$:
$$e_f(n) = buf_f(n + Lev) - \sum_{i=1}^{Lev} a_f(i)\, buf_f(n + Lev - i), \quad 0 \le n < B - Lev$$
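Steps (1) and (2) amount to autocorrelation-method linear prediction followed by inverse filtering. The Python sketch below illustrates them with hypothetical function names. It omits the windowing, and it computes lags 0 to Lev inclusive, since an order-Lev Levinson-Durbin recursion needs r(Lev) although the stated range is 0 ≤ m < Lev; both points are assumptions of this sketch.

```python
def autocorrelation(buf, max_lag):
    """r(m) over the second half of the buffer, for m = 0..max_lag."""
    half = len(buf) // 2
    return [sum(buf[n + half] * buf[n + half + m] for n in range(half - m))
            for m in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a(1..order), x^(n) = sum a(i) x(n-i)."""
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:        # degenerate (silent) input: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def lp_residual(buf, a):
    """e(n) = buf(n + Lev) - sum_i a(i) buf(n + Lev - i), 0 <= n < B - Lev."""
    lev = len(a)
    return [buf[n + lev] - sum(a[i - 1] * buf[n + lev - i]
                               for i in range(1, lev + 1))
            for n in range(len(buf) - lev)]

# Example: a first-order decaying signal is predicted almost perfectly.
buf = [0.9 ** n for n in range(200)]
coeffs = levinson_durbin(autocorrelation(buf, 2), 2)
residual = lp_residual(buf, coeffs)
```

For this test signal the first coefficient comes out close to 0.9 and the residual is close to zero, as expected for a signal that follows x(n) = 0.9·x(n-1).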
(3) Forward pitch detection
Pitch detection is performed on the residual signal. The functional block diagram of forward pitch detection is shown in Fig. 7. The steps are as follows:
(a) Low-pass filtering
Pitch detection results are often affected by formant frequencies. To remove this influence as far as possible, the residual signal is first low-pass filtered to remove high-frequency formants. Across speakers, the pitch period generally lies between 2 and 12 ms. The passband cutoff frequency $f_p$ of the low-pass filter lies in the range 0.8 kHz < $f_p$ < 1.2 kHz; the present invention sets $f_p$ to 0.9 kHz.
(b) Center clipping
The pitch information of a speech signal lies mainly in the envelope, whereas formant information is concentrated in the low-amplitude portion. To reduce the influence of formants, the low-pass filtered residual signal is processed nonlinearly with a center clipping function, defined as
$$e_{fc}(n) = \begin{cases} e_f(n) - T_c, & e_f(n) > T_c \\ 0, & |e_f(n)| \le T_c \\ e_f(n) + T_c, & e_f(n) < -T_c \end{cases}$$
where the threshold $T_c$ is the clipping level, $T_c = \gamma \max\{e_f(n),\ 0 \le n < B - Lev\}$; the suggested value is $\gamma = 0.4$.
(c) Pitch period estimation
Compute the normalized autocorrelation of $e_{fc}(n)$ and search for the position of its maximum in the range $(P_{MIN}, P_{MAX})$; this position is the pitch period estimate $P_f$:
$$r_{fc}(m) = \frac{\sum_{n=B-Lev-C}^{B-Lev-1} e_{fc}(n-m)\, e_{fc}(n)}{\sum_{n=B-Lev-C}^{B-Lev-1} e_{fc}(n-m)\, e_{fc}(n-m)}, \quad P_{MIN} \le m \le P_{MAX}$$
$$P_f = \arg\max_{P_{MIN} \le m \le P_{MAX}} r_{fc}(m)$$
where $C$ is the averaging length of the autocorrelation, and $P_{MIN}$ and $P_{MAX}$ are the minimum and maximum of the pitch period search range. The suggested value of $C$ is 150, and $P_{MIN}$ and $P_{MAX}$ are 96 and 576 samples, corresponding to 2 ms and 12 ms.
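Sub-steps (b) and (c) can be sketched as below; the low-pass filter of sub-step (a) is left out for brevity. The function names are hypothetical. Two choices are assumptions: the clipping level is taken from the largest absolute value of the input, and lags that would index before the start of the signal are excluded. The second guard matters because with B = 1.5N = 720, Lev = 20, and C = 150, the largest lag of 576 would otherwise require samples at negative indices.

```python
def center_clip(e, gamma=0.4):
    """Center clipping with level T_c = gamma * max|e(n)| (assumed form)."""
    tc = gamma * max(abs(v) for v in e)
    out = []
    for v in e:
        if v > tc:
            out.append(v - tc)
        elif v < -tc:
            out.append(v + tc)
        else:
            out.append(0.0)
    return out

def estimate_pitch(e_fc, p_min=96, p_max=576, avg_len=150):
    """Lag in [p_min, p_max] that maximises the normalised autocorrelation,
    evaluated over the last `avg_len` samples of the clipped residual."""
    end = len(e_fc)
    hi = min(p_max, end - avg_len)  # keep n - m >= 0 (guard added here)
    best_lag, best_val = p_min, float("-inf")
    for m in range(p_min, hi + 1):
        window = range(end - avg_len, end)
        num = sum(e_fc[n - m] * e_fc[n] for n in window)
        den = sum(e_fc[n - m] * e_fc[n - m] for n in window)
        val = num / den if den > 0.0 else 0.0
        if val > best_val:          # strict '>' keeps the smallest best lag
            best_lag, best_val = m, val
    return best_lag

# Example: an impulse train with a period of 100 samples.
train = [1.0 if n % 100 == 0 else 0.0 for n in range(1000)]
period = estimate_pitch(center_clip(train))
```

Keeping the smallest lag on ties is a simple way to prefer the fundamental period over its multiples, which score equally on a perfectly periodic signal.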
(4) Backward pitch detection
If the current frame is detected as non-noisy, the backward pitch period is detected with a procedure similar to the forward pitch detection in step (3). First, linear prediction analysis is performed on the data in the backward buffer $buf_b(n)$ to obtain the backward linear prediction coefficients $a_b(i)$. A backward inverse filter is designed from these coefficients and applied to $buf_b(n)$ to obtain the backward residual signal $e_b(n)$. Pitch detection on the backward residual signal then gives the backward pitch period estimate $P_b$.
(5) Pitch period correction
To prevent inaccurate pitch estimates caused by pitch doubling, the forward pitch period $P_f$ and the backward pitch period $P_b$ are smoothed, and the pitch period $P_c$ of the current frame is estimated from the smoothed value:
$$P_c = P_f + \frac{P_b - P_f}{2},$$
where $\delta$ is the decision threshold on the pitch period difference, determined by the difference between the pitch periods of adjacent speech frames; the suggested value is $\delta = 10$.
(6) Generation of the current-frame excitation signal
The excitation signal of the current noisy frame is estimated from the residual signals of the adjacent frames and the estimated pitch period. $e_f(n)$ and $e_b(n)$ are each extended periodically with period $P_c$ to obtain the forward excitation signal $\tilde{e}_f(n)$ and the backward excitation signal $\tilde{e}_b(n)$ of the current frame:
$$\tilde{e}_f(n) = \begin{cases} e_f(B - Lev - P_c - D + n), & 0 \le n < D + P_c \\ \tilde{e}_f(n - P_c), & D + P_c \le n < N + 2D \end{cases}$$
$$\tilde{e}_b(n) = \begin{cases} e_b(n - (N + D - P_c)), & N + D - P_c \le n < N + 2D \\ \tilde{e}_b(n + P_c), & 0 \le n < N + D - P_c \end{cases}$$
So that the reconstructed signal can be overlap-added with the adjacent frames to smooth the boundaries, the excitation length is set to $N + 2D$, where $D$ is the length of the overlap region.
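The two periodic extensions can be sketched directly from the piecewise definitions. The function names are hypothetical, and the sketch assumes the residuals are at least D + P_c samples long.

```python
def forward_excitation(e_f, pitch, n_frame, d):
    """Copy the last (d + pitch) residual samples, then repeat with period
    `pitch` until the excitation has n_frame + 2*d samples."""
    total = n_frame + 2 * d
    base = len(e_f) - pitch - d
    out = []
    for n in range(total):
        if n < d + pitch:
            out.append(e_f[base + n])
        else:
            out.append(out[n - pitch])
    return out

def backward_excitation(e_b, pitch, n_frame, d):
    """Place the first (d + pitch) backward-residual samples at the end,
    then extend towards n = 0 with period `pitch`."""
    total = n_frame + 2 * d
    start = n_frame + d - pitch
    out = [0.0] * total
    for n in range(total - 1, -1, -1):
        if n >= start:
            out[n] = e_b[n - start]
        else:
            out[n] = out[n + pitch]
    return out

# Example with tiny sizes so the pattern is easy to read.
e = [float(i) for i in range(20)]
fwd = forward_excitation(e, pitch=4, n_frame=8, d=2)
bwd = backward_excitation(e, pitch=4, n_frame=8, d=2)
```

The forward signal is anchored at its start, next to the history buffer, while the backward signal is anchored at its end, next to the samples that follow the frame.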
(7) Waveform reconstruction
The current frame is rebuilt from the generated excitation signals and the corresponding linear prediction coefficients:
$$\tilde{x}_f(n) = \sum_{i=1}^{Lev} a_f(i)\, \tilde{x}_f(n-i) + \tilde{e}_f(n), \quad 0 \le n < N + 2D$$
$$\tilde{x}_b(n) = \sum_{i=1}^{Lev} a_b(i)\, \tilde{x}_b(n-i) + \tilde{e}_b(n), \quad N + 2D > n \ge 0$$
The Lev samples in the buffers nearest to the current frame are assigned to $\tilde{x}_f(n)$ and $\tilde{x}_b(n)$ respectively as the initial state of the reconstructed signals:
$$\tilde{x}_f(n) = buf_f(B - D + n), \quad -Lev \le n \le -1$$
$$\tilde{x}_b(n) = buf_b(n - (N + 2D)), \quad N + 2D \le n < N + 2D + Lev$$
The middle $N$ samples of the reconstructed signal replace the current noisy frame. The $D$ samples at each end are overlap-added with the preceding and following frames in a fade-in and fade-out operation, which smooths the reconstructed signal and keeps it continuous with the data on both sides.
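The synthesis step is an all-pole filter driven by the excitation and started from known samples. In the sketch below the function names are hypothetical. The backward recursion is interpreted as running from the samples after the frame towards earlier samples, and is implemented by time-reversing the forward routine; that reading is an assumption about the second formula above.

```python
def synthesize_forward(excitation, a, history):
    """x(n) = sum_i a(i) x(n-i) + e(n), initialised with the last Lev
    samples of `history` (the samples just before n = 0)."""
    lev = len(a)
    state = list(history[-lev:])
    out = []
    for e in excitation:
        x = e + sum(a[i] * state[-1 - i] for i in range(lev))
        out.append(x)
        state.append(x)
    return out

def synthesize_backward(excitation, a, future):
    """Same recursion run backwards in time, initialised with the first Lev
    samples of `future` (the samples just after the segment).
    ASSUMPTION: the backward predictor uses later samples, x(n+i)."""
    lev = len(a)
    reversed_state = list(reversed(future[:lev]))
    rev = synthesize_forward(excitation[::-1], a, reversed_state)
    return rev[::-1]

# Example: zero excitation and a(1) = 0.5 decay away from the known sample.
fwd = synthesize_forward([0.0, 0.0, 0.0], [0.5], [2.0])       # [1.0, 0.5, 0.25]
bwd = synthesize_backward([0.0, 0.0, 0.0], [0.5], [2.0, 9.0])  # [0.25, 0.5, 1.0]
```

Each output is most reliable next to its own anchor samples and drifts further away, which is the motivation the text gives for weighting the two predictions linearly in step (8).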
(8) Boundary fusion of the reconstructed signal with the adjacent frames
To smooth the boundary between the reconstructed signal and the adjacent frames, the last $D$ samples of the forward buffer $buf_f(n)$ and the first $D$ samples of the forward reconstructed signal $\tilde{x}_f(n)$ are overlap-added with a triangular window. The result is used to update the last $D$ samples of the previous frame and the buffer $buf_f(n)$:
$$x^{(p-1)}(n) = \frac{n+1}{D+1}\, \tilde{x}_f(n) + \frac{D-n}{D+1}\, buf_f(B - D + n), \quad 0 \le n < D$$
$$buf_f(n) = \begin{cases} x^{(p-1)}(n - B + 2N), & 0 \le n < B - N \\ \tilde{x}_f(n - (B - N)), & B - N \le n < B \end{cases}$$
(a) Linear weighting
The forward linear prediction signal predicts the first half of the current frame more accurately, and the backward linear prediction signal predicts the second half more accurately. The two prediction signals are therefore combined by linear weighting to give the final reconstructed signal. When the backward prediction is unavailable, the noisy speech frame is replaced by the forward prediction signal alone:
$$\tilde{x}^{(p)}(n) = \frac{N-n}{N+1}\, \tilde{x}_f(n + D) + \frac{n+1}{N+1}\, \tilde{x}_b(n + D), \quad 0 \le n < N$$
(b) Several consecutive noisy frames
In this case the LP coefficients and excitation signal of the previous frame are not recomputed. The corresponding estimates from the previous frame are reused, and the synthesis filtering, boundary fusion, and linear weighting of steps (6) to (8) are then carried out.
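The cross-fade at the frame boundary and the linear weighting of the two predictions can be sketched as follows. The function names are hypothetical, and the cross-fade is assumed to pair sample n of the reconstruction with sample n of the last D history samples.

```python
def crossfade_with_history(x_f, history, d):
    """Triangular-window overlap-add of the first d reconstructed samples
    with the last d history samples; the weights (n+1)/(d+1) and
    (d-n)/(d+1) sum to one."""
    tail = history[-d:]
    return [((n + 1) * x_f[n] + (d - n) * tail[n]) / (d + 1) for n in range(d)]

def blend_frame(x_f, x_b, n_frame, d):
    """Linear weighting of forward and backward reconstructions over the
    middle n_frame samples. If no backward signal is available (x_b is
    None), the forward reconstruction is used alone."""
    if x_b is None:
        return list(x_f[d:d + n_frame])
    return [((n_frame - n) * x_f[n + d] + (n + 1) * x_b[n + d]) / (n_frame + 1)
            for n in range(n_frame)]

# Example: blending all-zeros (forward) with all-ones (backward) shows the
# backward weight rising linearly across the frame.
ramp = blend_frame([0.0] * 12, [1.0] * 12, n_frame=8, d=2)
# ramp[n] == (n + 1) / 9
```

Both operations are convex combinations, so two signals that already agree are left unchanged.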
Beneficial effects of the technical solution of the invention
The segmental signal-to-noise ratio $\mathrm{SNR}_{seg}$ and the segmental log-spectral distortion $\mathrm{LSD}_{seg}$ are used to evaluate the transient noise suppression results. They are defined, respectively, as
$$\mathrm{SNR}_{seg} = \frac{1}{N_t} \sum_{k=1}^{N_t} 10 \cdot \log_{10} \frac{\sum_{n \in \mathrm{frm}_k} |x(n)|^2}{\sum_{n \in \mathrm{frm}_k} |\hat{x}(n) - x(n)|^2},$$
$$\mathrm{LSD}_{seg} = \frac{1}{N_t} \sum_{l=0}^{N_t-1} \left\{ \frac{2}{N} \sum_{k=0}^{N/2-1} \left[ 10 \cdot \log_{10} TX(k,l) - 10 \cdot \log_{10} T\hat{X}(k,l) \right]^2 \right\}^{\frac{1}{2}},$$
where $X$ is the short-time Fourier transform of the original speech, $\hat{X}$ is the short-time Fourier transform of the speech under test, $N_t$ is the number of frames of the speech under test, and $TX$ is defined as follows:
$$TX(k,l) = \max\{|X(k,l)|^2, \delta\},$$
$$\delta = 10^{-50/10} \max_{k,l} |X(k,l)|^2,$$
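The two evaluation measures can be sketched as follows. This is an illustrative sketch only: the function names, the small `eps` floor that guards against division by zero, and the choice to clip the test spectrum with the same $\delta$ as the reference spectrum are assumptions not stated in the text. The LSD function takes power spectra that have already been computed, one list of $N/2$ bins per frame.

```python
import math

def segmental_snr(clean, processed, frame_len, eps=1e-12):
    """Average per-frame SNR in dB between clean and processed signals."""
    n_frames = len(clean) // frame_len
    total = 0.0
    for k in range(n_frames):
        idx = range(k * frame_len, (k + 1) * frame_len)
        signal = sum(clean[n] ** 2 for n in idx)
        error = sum((processed[n] - clean[n]) ** 2 for n in idx)
        # eps is an assumed guard; the source formula has no such term.
        total += 10.0 * math.log10((signal + eps) / (error + eps))
    return total / n_frames

def segmental_lsd(ref_power, test_power):
    """ref_power[l][k] holds |X(k,l)|^2 for frame l and bin k (N/2 bins)."""
    peak = max(max(frame) for frame in ref_power)
    delta = 10.0 ** (-50.0 / 10.0) * peak  # floor 50 dB below the peak
    total = 0.0
    for ref_frame, test_frame in zip(ref_power, test_power):
        sq = 0.0
        for p_ref, p_test in zip(ref_frame, test_frame):
            diff = (10.0 * math.log10(max(p_ref, delta))
                    - 10.0 * math.log10(max(p_test, delta)))
            sq += diff * diff
        # (2/N) * sum over N/2 bins equals the mean over the bins supplied.
        total += math.sqrt(sq / len(ref_frame))
    return total / len(ref_power)
```

A higher segmental SNR and a lower segmental LSD both indicate that the processed speech is closer to the original.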
Beneficial effects obtainable with the invention
The technical solution of the invention is compared here with an impulsive-noise detection and suppression method given by S. V. Vaseghi in the monograph "Advanced Digital Signal Processing and Noise Reduction" (3rd ed., New York: Wiley, 2006). The segmental SNR and segmental spectral distortion results are shown in Figure 8 and Figure 9. Figure 8 shows that, at three different input SNRs, the improvement in segmental SNR achieved by the invention is higher than that of the conventional detection-and-restoration technique. Figure 9 shows that the segmental spectral distortion of the invention is lower than that of the conventional detection-and-restoration scheme, which indicates better performance in terms of frequency-domain distortion. A gap relative to the original speech remains, mainly because the reconstructed signal still exhibits spectral distortion even when all transient noise frames are correctly detected.
Figures 10 to 13 are spectrograms showing, respectively: an example of the original speech; the speech of Figure 10 after noise is added; the result of processing the speech of Figure 11 with the impulsive-noise detection and suppression method from S. V. Vaseghi's monograph "Advanced Digital Signal Processing and Noise Reduction" (3rd ed., New York: Wiley, 2006); and the result of processing the speech of Figure 11 with the invention.
Figure 12 shows that the conventional detection-and-restoration technique misses some transient noise, and that the restored speech is not smooth and tends to contain new frequency components. Figure 13 shows that the invention detects the noise and reconstructs the speech signal effectively, leaving less residual noise after reconstruction than the conventional algorithm.
The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any equivalent replacement or change made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, such as changing the sampling frequency from 48 kHz to 44.1 kHz, 32 kHz, 16 kHz, or 8 kHz, shall fall within the scope of protection of the invention.
The abbreviations and key terms used in the invention are defined as follows:
AR model: AutoRegressive model.
BLP: Bidirectional Linear Prediction.
DCT: Discrete Cosine Transform.
FEC: Forward Error Correction.
GFCC: Gammatone Frequency Cepstral Coefficient.
LPF: Low-Pass Filter.
LSD: Log-Spectral Distortion.
PLC: Packet Loss Concealment.
PLR: Packet Loss Recovery.
PSTN: Public Switched Telephone Network.
PWR: Pitch Waveform Replication.
SNR: Signal-to-Noise Ratio.
VoIP: Voice over IP, speech carried over an IP network.

Claims (4)

1. A method for suppressing transient noise in speech, characterized in that it comprises three modules: a Gammatone frequency cepstral coefficient (GFCC) extraction module, a transient noise detection module, and a speech signal reconstruction module. The input of the GFCC extraction module receives the noisy speech signal, and its output is connected to the input of the transient noise detection module. The output of the transient noise detection module is connected to the input of the speech signal reconstruction module. The input of the speech signal reconstruction module, in addition to receiving the external noisy speech signal, is also connected to the output of the transient noise detection module, and the speech signal reconstruction module outputs the denoised speech. The GFCC extraction module extracts Gammatone frequency cepstral coefficients from the input noisy speech signal. The transient noise detection module decides, from the difference between the GFCCs of adjacent frames, whether the current speech frame contains transient noise. If it does, the speech signal reconstruction module reconstructs the current speech frame, and the reconstructed frame replaces the current speech frame and is output; if it does not, the current speech frame is output directly without processing.
2. The method for suppressing transient noise in speech according to claim 1, characterized in that the processing steps of the GFCC extraction module are as follows:
(a) Pre-emphasize the original noisy speech $x(n)$ to boost its high-frequency components. With the original noisy speech signal denoted $x(n)$ and the pre-emphasized speech signal denoted $x_e(n)$,
$$x_e(n) = x(n) - \alpha\, x(n-1),$$
where $\alpha$ is the pre-emphasis coefficient, with $\alpha = 0.97$;
(b) Gammatone filter bank filtering, using the following Gammatone filter bank:
$$\begin{aligned} G_i(z) ={}& \frac{T_s - T_s a_3 \left[a_1 + (\sqrt{2}-1)\,a_2\right] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}} \times \frac{T_s - T_s a_3 \left[a_1 - (\sqrt{2}-1)\,a_2\right] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}} \\ &\times \frac{T_s - T_s a_3 \left[a_1 + (\sqrt{2}+1)\,a_2\right] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}} \times \frac{T_s - T_s a_3 \left[a_1 - (\sqrt{2}+1)\,a_2\right] z^{-1}}{1 - 2 a_1 a_3 z^{-1} + a_3^2 z^{-2}} \\ ={}& G_{1,i}(z) \cdot G_{2,i}(z) \cdot G_{3,i}(z) \cdot G_{4,i}(z), \quad 1 \le i \le CH, \end{aligned}$$
$$a_1 = \cos(\omega_i T_s), \quad 1 \le i \le CH,$$
$$a_2 = \sin(\omega_i T_s), \quad 1 \le i \le CH,$$
$$a_3 = e^{-bT_s},$$
$$\omega_i = 2\pi f_i, \quad 1 \le i \le CH,$$
$$f_i = (f_H + 228.7)\exp(i \times v) - 228.7 = (f_H + 228.7)\exp\!\left(i \times \frac{\ln\frac{f_L + 228.7}{f_H + 228.7}}{CH}\right) - 228.7, \quad 1 \le i \le CH,$$
$$v = \frac{\ln\frac{f_L + 228.7}{f_H + 228.7}}{CH},$$
where $T_s$ is the sampling period; $CH$ is the number of channels of the Gammatone filter bank, $CH = 64$; the parameter $v$ is the overlap factor between filters, indicating how much adjacent filters overlap; $f_H$ is the cutoff frequency of the filter bank, whose value is the sampling rate of the input signal; and $f_L$ ranges from 10 to 100 Hz and is taken as 50 Hz. Passing $x_w(n)$ through each of the 64 filters gives the filtered outputs $y_i(n)$:
$$y_i(n) = x_w(n) * g_{1,i}(n) * g_{2,i}(n) * g_{3,i}(n) * g_{4,i}(n), \quad i = 0, 1, \dots, 63;$$
where '*' denotes the convolution operation of digital signal processing;
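The centre-frequency formula in step (b) can be checked numerically with a short sketch. The function name is hypothetical, and the value passed for `f_high` in the usage line below is only an illustrative choice, not a value taken from the claims.

```python
import math

def gammatone_center_frequencies(f_low, f_high, channels):
    """Centre frequencies f_i for i = 1..channels, in Hz."""
    # Overlap factor v from the lower and upper cutoff frequencies.
    v = math.log((f_low + 228.7) / (f_high + 228.7)) / channels
    return [(f_high + 228.7) * math.exp(i * v) - 228.7
            for i in range(1, channels + 1)]

# Illustrative call: 64 channels between 50 Hz and an assumed 24 kHz.
freqs = gammatone_center_frequencies(50.0, 24000.0, 64)
```

Because $v$ is negative, the centre frequencies decrease with the channel index, and the last channel ($i = CH$) lands exactly on $f_L$.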
(c) Compute the energy of each channel. The Gammatone (GT) filter bank output signals are divided into frames of length $N$, where $240 \le N \le 960$ and $N$ is taken as 480; at a sampling frequency of 48 kHz this corresponds to 10 ms. The logarithmic energy of each channel's filter output within the current frame is computed:
$$E(m) = \log_e \sum_{n=\mathrm{start}}^{\mathrm{start}+N-1} \left[y_m(n)\right]^2, \quad m = 0, 1, \dots, 63,$$
where start denotes the starting position of the current frame in the signal $x(n)$;
(d) Compress the channel log energies with the discrete cosine transform to obtain the Gammatone frequency cepstral coefficients:
$$C^{(p)}(0) = \frac{2}{L} \sum_{m=0}^{CH-1} E(m), \quad l = 0,$$
$$C^{(p)}(l) = \frac{2}{L} \sum_{m=0}^{CH-1} E(m) \cos\!\left[\frac{\pi l (2m+1)}{2\,CH}\right], \quad 1 \le l < L,$$
where $L$ is the order of the Gammatone frequency cepstral coefficients, with $16 \le L \le 64$, and $L$ is taken as 32.
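Steps (c) and (d) can be sketched as below, assuming the filter bank outputs for the current frame are already available as one list of samples per channel. The function names, the `floor` guard against taking the logarithm of zero, and the reading of the scale factor as $2/L$ are assumptions made for this illustration.

```python
import math

def log_energies(channel_frames, floor=1e-12):
    """channel_frames[m] holds the N samples of channel m in the current frame."""
    # floor is an assumed guard for silent frames; it is not in the source.
    return [math.log(max(sum(s * s for s in frame), floor))
            for frame in channel_frames]

def gfcc(energies, order):
    """DCT of the channel log energies, keeping `order` coefficients."""
    ch = len(energies)
    scale = 2.0 / order  # assumed reading of the garbled scale factor
    coeffs = [scale * sum(energies)]  # l = 0 term
    for l in range(1, order):
        acc = sum(e * math.cos(math.pi * l * (2 * m + 1) / (2 * ch))
                  for m, e in enumerate(energies))
        coeffs.append(scale * acc)
    return coeffs
```

With a flat energy profile across channels, every coefficient except the first comes out as zero, which is the expected behaviour of a cosine transform.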
3. The method for suppressing transient noise in speech according to claim 1, characterized in that the detection process of the transient noise detection module is as follows:
Compute the Euclidean distance Dis between the GFCC vector $C^{(p)}(l)$ of the current frame (frame $p$) and the smoothed GFCC vector $C_{aver}^{(p-1)}(l)$ of the previous frame (frame $p-1$):
$$\mathrm{Dis} = \sum_{l=0}^{L-1} \left[ C^{(p)}(l) - C_{aver}^{(p-1)}(l) \right]^2;$$
The smoothed GFCC vector $C_{aver}^{(p-1)}(l)$ is updated as
$$C_{aver}^{(p-1)}(l) = \beta \cdot C_{aver}^{(p-2)}(l) + (1-\beta) \cdot C^{(p-1)}(l);$$
where $\beta$ is the smoothing factor of the Gammatone frequency cepstral coefficients, $\beta = 0.6$. A soft-threshold decision method based on noise energy is used to detect noisy frames. First, the input signal energies $E(p)$ and $E(p-1)$ of the current and previous frames are computed, and the threshold is set from the signal energy as $\mathrm{thres} = q\,[E(p) + E(p-1)]/2$, with $q = 0.25$. When the GFCC vector distance Dis is greater than the threshold thres, the current frame is judged to contain transient noise.
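The detection rule of claim 3 can be sketched as two small functions. The names are hypothetical, and the distance is computed as the plain sum of squared differences, exactly as the formula reads; whether a square root was intended is not clear from the text.

```python
def update_smoothed(prev_smoothed, coeffs, beta=0.6):
    """Recursive smoothing of the GFCC vector with factor beta."""
    return [beta * s + (1.0 - beta) * c
            for s, c in zip(prev_smoothed, coeffs)]

def is_transient(coeffs, smoothed_prev, energy_cur, energy_prev, q=0.25):
    """True when the GFCC distance exceeds the energy-based threshold."""
    dis = sum((c - s) ** 2 for c, s in zip(coeffs, smoothed_prev))
    thres = q * (energy_cur + energy_prev) / 2.0
    return dis > thres
```

Tying the threshold to the frame energies means a louder passage needs a larger cepstral jump before a frame is flagged.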
4. The method for suppressing transient noise in speech according to claim 1, characterized in that the processing method of the speech signal reconstruction module is as follows:
According to the waveform of the adjacent speech, an interpolation reconstruction algorithm generates a replacement for the speech frame contaminated by noise. First, bidirectional linear prediction is applied to the frames before and after the noisy speech frame, an inverse filter is designed from the linear prediction coefficients, and the residual signal is computed. The pitch period is then computed from the residual signal with a pitch detection algorithm. From the residual signal and pitch period of the adjacent frames, the excitation signal of the current noisy frame is produced. From the excitation signal and the linear prediction coefficients of the previous frame, the speech signal of the current frame is reconstructed and smoothed against the adjacent frames by fade-in and fade-out, thereby suppressing the transient noise in the speech. Let $D$ denote the delay of the output signal, used for boundary merging between the current frame and the adjacent frames, with $16 \le D \le 48$; let Lev denote the order of the linear prediction filter, with $10 \le \mathrm{Lev} \le 30$. The speech reconstruction method based on bidirectional linear prediction generates the estimate of the current frame from the samples of adjacent frames, so the $B$ samples nearest before the current frame must be stored as history data for estimating the forward linear prediction coefficients and the forward excitation signal; these are denoted $\mathrm{buf}_f(n)$. Correspondingly, the $B$ samples after the current frame, used for estimating the backward linear prediction coefficients and the backward excitation signal, are denoted $\mathrm{buf}_b(n)$. The values of $D$, Lev, and $B$ are 24, 20, and $1.5N$, respectively. For ease of description and writing, all symbols in the remainder of the claims refer to the current frame, and the frame number $p$ is no longer marked.
The specific method is as follows:
(1) Based on the result of the transient noise detection module, when the current frame is detected as a noisy frame and the previous frame is a non-noisy frame, linear prediction analysis is performed on the history data in the buffer. First, $x_e(n)$ is windowed to obtain the windowed signal $x_w(n) = x_e(n)\,w(n)$, where the window function is the Hamming window $w(n) = 0.54 - 0.46\cos[(2n+1)\pi/N]$, $n = 0, 1, \dots, N-1$. The autocorrelation function of $x_w(n)$ is
$$r_{corr}(m) = \sum_{n=0}^{B/2-1-m} \mathrm{buf}_f(n + B/2) \cdot \mathrm{buf}_f(n + B/2 + m), \quad 0 \le m < \mathrm{Lev};$$
The forward linear prediction coefficients $a_f(i)$ are then computed with the Levinson-Durbin algorithm.
(2) An inverse filter is designed from the forward linear prediction coefficients and applied to $\mathrm{buf}_f(n)$ to obtain the residual signal $e_f(n)$:
$$e_f(n) = \mathrm{buf}_f(n + \mathrm{Lev}) - \sum_{i=1}^{\mathrm{Lev}} a_f(i)\, \mathrm{buf}_f(n + \mathrm{Lev} - i), \quad 0 \le n < B - \mathrm{Lev};$$
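Steps (1) and (2) can be sketched as follows. This is a textbook Levinson-Durbin recursion, not code taken from the patent. The function names are hypothetical, the sketch computes lags 0 through Lev because an order-Lev recursion needs them, and it assumes the zero-lag autocorrelation is positive.

```python
def autocorrelation(buf, order):
    """Autocorrelation of the second half of the buffer, lags 0..order."""
    half = len(buf) // 2
    return [sum(buf[n + half] * buf[n + half + m] for n in range(half - m))
            for m in range(order + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order]; a[0] is unused padding."""
    a = [0.0] * (order + 1)
    err = r[0]  # prediction error energy, assumed > 0
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a

def residual(buf, a, order):
    """Inverse-filter the buffer: e(n) = buf(n+order) - sum a(i) buf(n+order-i)."""
    return [buf[n + order] - sum(a[i] * buf[n + order - i]
                                 for i in range(1, order + 1))
            for n in range(len(buf) - order)]
```

For a signal that follows a first-order recursion exactly, the recursion recovers that single coefficient and the residual is zero.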
(3) Forward pitch detection
The method uses the residual signal for pitch detection, as follows:
(a) Low-pass filtering
Pitch detection results are often affected by the formant frequencies. To remove the influence of the formants as far as possible, the residual signal is first low-pass filtered to eliminate the high-frequency formants. For different speakers the pitch period generally lies between 2 and 12 ms, so the passband cutoff frequency $f_p$ of the low-pass filter ranges over $0.8\ \mathrm{kHz} < f_p < 1.2\ \mathrm{kHz}$, and $f_p$ is taken as 0.9 kHz.
(b) Center clipping
The pitch information of a speech signal is contained mainly in the envelope, while a large amount of formant information is present in the low-amplitude part. To reduce the influence of the formants, a center clipping function is applied as a nonlinear operation to the low-pass filtered residual signal. The center clipping function is defined as follows:
$$e_{fc}(n) = \begin{cases} e_f(n) - T_c, & e_f(n) > T_c \\ 0, & |e_f(n)| \le T_c \\ e_f(n) + T_c, & e_f(n) < -T_c \end{cases}$$
where the threshold $T_c$ is the clipping level, $T_c = \gamma \cdot \max\{e_{fc}(n),\ 0 \le n < B - \mathrm{Lev}\}$, and $\gamma$ is 0.4;
(c) Pitch period estimation
Compute the normalized autocorrelation of $e_{fc}(n)$ and search the range $(P_{MIN}, P_{MAX})$ for the position of the autocorrelation maximum, which is taken as the pitch period estimate $P_f$:
$$r_{fc}(m) = \frac{\sum_{n=B-\mathrm{Lev}-C}^{B-\mathrm{Lev}-1} e_{fc}(n-m)\, e_{fc}(n)}{\sum_{n=B-\mathrm{Lev}-C}^{B-\mathrm{Lev}-1} e_{fc}(n-m)\, e_{fc}(n-m)}, \quad P_{MIN} \le m \le P_{MAX};$$
$$P_f = \arg\max_{P_{MIN} \le m \le P_{MAX}} r_{fc}(m),$$
where $C$ is the averaging length of the autocorrelation, with $120 < C < 240$, and $C$ is taken as 150. $P_{MIN}$ and $P_{MAX}$ are the minimum and maximum values of the pitch period search, taken as 96 and 576, the numbers of samples corresponding to 2 ms and 12 ms, respectively;
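Sub-steps (b) and (c) can be sketched as below. The function names are hypothetical. The sketch assumes the clipping level is taken from the peak absolute amplitude of the input residual, and it stops the lag search early when a lag would index before the start of the signal, a boundary case the formula does not address.

```python
def center_clip(e, gamma=0.4):
    """Zero the low-amplitude part and shift the rest toward zero."""
    t_c = gamma * max(abs(v) for v in e)  # assumed definition of the level
    out = []
    for v in e:
        if v > t_c:
            out.append(v - t_c)
        elif v < -t_c:
            out.append(v + t_c)
        else:
            out.append(0.0)
    return out

def estimate_pitch(e_fc, p_min, p_max, c):
    """Lag in [p_min, p_max] maximizing the normalized autocorrelation."""
    end = len(e_fc)   # corresponds to B - Lev
    start = end - c   # corresponds to B - Lev - C
    best_lag, best_val = p_min, float("-inf")
    for m in range(p_min, p_max + 1):
        if start - m < 0:
            break  # assumed guard: not enough history for this lag
        num = sum(e_fc[n - m] * e_fc[n] for n in range(start, end))
        den = sum(e_fc[n - m] * e_fc[n - m] for n in range(start, end))
        val = num / den if den > 0.0 else 0.0
        if val > best_val:
            best_lag, best_val = m, val
    return best_lag
```

Because the comparison is strict, the smallest lag wins when several lags tie, which favours the fundamental period over its multiples for a clean periodic input.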
(4) Backward pitch detection
If the current frame is detected as a non-noisy frame, the backward pitch period is detected with a method and steps similar to the forward pitch detection of step (3). First, linear prediction analysis is performed on the data in the backward buffer $\mathrm{buf}_b(n)$ to obtain the backward linear prediction coefficients $a_b(i)$. A backward inverse filter is designed from the backward linear prediction coefficients and applied to $\mathrm{buf}_b(n)$ to obtain the backward residual signal $e_b(n)$. Pitch detection is then performed on the backward residual signal to obtain the backward pitch period estimate $P_b$;
(5) Pitch period correction
To prevent inaccurate pitch period estimates caused by pitch doubling, the forward pitch period $P_f$ and the backward pitch period $P_b$ are smoothed, and the pitch period $P_c$ of the current frame is estimated from the smoothed pitch periods. The detailed process is
$$P_c = P_f + \frac{P_b - P_f}{2},$$
where $\delta$ is the decision threshold for the pitch period difference, determined by the difference between the pitch periods of adjacent speech frames; $\delta$ is taken as 10;
(6) Generation of the excitation signal of the current frame
The residual signals of the adjacent frames and the estimated pitch period are used to estimate the excitation signal of the current noisy frame. $e_f(n)$ and $e_b(n)$ are each extended periodically with period $P_c$ to obtain the forward excitation signal $\tilde{e}_f(n)$ and the backward excitation signal $\tilde{e}_b(n)$ of the current frame:
$$\tilde{e}_f(n) = \begin{cases} e_f(B - \mathrm{Lev} - P_c - D + n), & 0 \le n < D + P_c \\ \tilde{e}_f(n - P_c), & D + P_c \le n < N + 2D \end{cases}$$
$$\tilde{e}_b(n) = \begin{cases} e_b\big(n - (N + D - P_c)\big), & N + D - P_c \le n < N + 2D \\ \tilde{e}_b(n + P_c), & 0 \le n < N + D - P_c \end{cases}$$
So that the reconstructed signal can be overlap-added with the adjacent frames to smooth the boundaries, the excitation signal length is set to $N + 2D$, where $D$ is the length of the overlap region.
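The periodic extension of step (6) can be sketched as below. The function names are hypothetical, and the sketch assumes the pitch period has already been rounded to an integer number of samples and that each residual holds at least `pitch + d` samples.

```python
def extend_forward(e_f, pitch, n_frame, d):
    """Repeat the tail of the forward residual with period `pitch`."""
    total = n_frame + 2 * d
    tail = len(e_f) - pitch - d  # corresponds to B - Lev - P_c - D
    out = []
    for n in range(total):
        if n < d + pitch:
            out.append(e_f[tail + n])    # copy the last pitch + d samples
        else:
            out.append(out[n - pitch])   # continue periodically
    return out

def extend_backward(e_b, pitch, n_frame, d):
    """Repeat the head of the backward residual, filling right to left."""
    total = n_frame + 2 * d
    split = n_frame + d - pitch
    out = [0.0] * total
    for n in range(total - 1, -1, -1):
        if n >= split:
            out[n] = e_b[n - split]      # copy the first pitch + d samples
        else:
            out[n] = out[n + pitch]      # continue periodically backwards
    return out
```

Both outputs have length $N + 2D$, so the $D$ samples at each end are available for the overlap-add with the neighbouring frames.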
(7) Waveform reconstruction
The speech signal of the current frame is reconstructed from the generated excitation signals and the corresponding linear prediction model coefficients:
$$\begin{cases} \tilde{x}_f(n) = \sum_{i=1}^{\mathrm{Lev}} a_f(i)\, \tilde{x}_f(n-i) + \tilde{e}_f(n), & 0 \le n < N + 2D \\ \tilde{x}_b(n) = \sum_{i=1}^{\mathrm{Lev}} a_b(i)\, \tilde{x}_b(n-i) + \tilde{e}_b(n), & N + 2D > n \ge 0 \end{cases}$$
The Lev samples in each buffer nearest to the current frame are assigned to $\tilde{x}_f(n)$ and $\tilde{x}_b(n)$, respectively, as the initial state of the reconstructed signal:
$$\begin{cases} \tilde{x}_f(n) = \mathrm{buf}_f(B - D + n), & -\mathrm{Lev} \le n \le -1 \\ \tilde{x}_b(n) = \mathrm{buf}_b\big(n - (N + 2D)\big), & N + 2D \le n < N + 2D + \mathrm{Lev} \end{cases}$$
The middle $N$ samples of the reconstructed signal replace the current noisy frame. The $D$ samples at each end are overlap-added with the preceding and following frames in a fade-in, fade-out operation that smooths the reconstructed signal and keeps it continuous with the data on both sides.
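The forward half of step (7) is an all-pole synthesis filter started from known history. A minimal sketch follows, with a hypothetical function name and the assumption that `history` holds at least Lev samples, oldest first; the backward half would mirror it in reverse time.

```python
def synthesize_forward(excitation, a, history):
    """Run the LP synthesis filter over the excitation.

    a: predictor coefficients a[1..order], with a[0] unused.
    history: samples preceding n = 0, oldest first (at least `order` long).
    """
    order = len(a) - 1
    state = list(history[-order:])  # initial filter state
    out = []
    for e in excitation:
        # state[-i] is the sample i steps in the past.
        sample = e + sum(a[i] * state[-i] for i in range(1, order + 1))
        out.append(sample)
        state.append(sample)
    return out
```

With a zero excitation the output simply continues the decay dictated by the predictor, which makes the role of the initial state easy to see.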
(8) Boundary merging of the reconstructed signal with the preceding and following frames
To smooth the boundary between the reconstructed signal and the adjacent frame, the last D samples of the forward buffer and the first D samples of the forward reconstructed signal $\tilde{x}_f(n)$ are overlap-added using a triangular window. The result is used to update the last D samples of the previous frame and the buffer $\mathrm{buf}_f(n)$:
$$x^{(p-1)}(n) = \frac{n+1}{D+1}\,\tilde{x}_f(n) + \frac{D-n}{D+1}\,\mathrm{buf}_f(B-D-n), \quad 0 \le n < D,$$
$$\mathrm{buf}_f(n) = \begin{cases} x^{(p-1)}(n-B+2N), & 0 \le n < B-N \\ \tilde{x}_f\big(n-(B-N)\big), & B-N \le n < B \end{cases}$$
(a) Linear weighting
The forward linear prediction signal predicts the first half of the current frame more accurately, while the backward linear prediction signal predicts the second half more accurately. The two prediction signals are therefore linearly weighted to obtain the final reconstructed signal. When the backward linear prediction is unavailable, the noisy speech frame is replaced using the forward linear prediction signal alone:
$$\tilde{x}^{(p)}(n) = \frac{N-n}{N+1}\,\tilde{x}_f(n+D) + \frac{n+1}{N+1}\,\tilde{x}_b(n+D), \quad 0 \le n < N,$$
(b) Consecutive noisy frames
In this case, the LP coefficients and excitation signal of the previous frame are no longer computed. The corresponding estimates from the previous frame are used instead, followed by the synthesis filtering, boundary merging, and linear weighting of steps (6) to (8).
CN201310590841.8A 2013-11-20 2013-11-20 Method for inhibiting transient noise in voice Pending CN104658544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310590841.8A CN104658544A (en) 2013-11-20 2013-11-20 Method for inhibiting transient noise in voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310590841.8A CN104658544A (en) 2013-11-20 2013-11-20 Method for inhibiting transient noise in voice

Publications (1)

Publication Number Publication Date
CN104658544A true CN104658544A (en) 2015-05-27

Family

ID=53249584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310590841.8A Pending CN104658544A (en) 2013-11-20 2013-11-20 Method for inhibiting transient noise in voice

Country Status (1)

Country Link
CN (1) CN104658544A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105656448B (en) * 2015-12-28 2018-12-25 广东工业大学 A kind of tone filter
CN105656448A (en) * 2015-12-28 2016-06-08 广东工业大学 Audio filter
CN107924684A (en) * 2015-12-30 2018-04-17 谷歌有限责任公司 Use the acoustics keystroke transient state arrester of the communication terminal of half-blindness sef-adapting filter model
CN107924684B (en) * 2015-12-30 2022-01-11 谷歌有限责任公司 Acoustic keystroke transient canceller for communication terminals using semi-blind adaptive filter models
CN106651500A (en) * 2016-10-12 2017-05-10 大连文森特软件科技有限公司 Online shopping system based on space feedback characteristic-based video image identification technology and virtual reality technology
CN106651500B (en) * 2016-10-12 2021-03-30 大连知行天下网络科技有限公司 Online shopping system based on video image recognition technology and virtual reality technology of spatial feedback characteristics
CN110400573B (en) * 2018-04-25 2022-02-01 华为技术有限公司 Data processing method and device
CN110400573A (en) * 2018-04-25 2019-11-01 华为技术有限公司 A kind of method and device of data processing
CN109413038A (en) * 2018-09-19 2019-03-01 华中科技大学 A kind of single channel encryption transmission method
CN110503973A (en) * 2019-08-28 2019-11-26 浙江大华技术股份有限公司 Audio signal Transient Noise suppressing method, system and storage medium
CN110503973B (en) * 2019-08-28 2022-03-22 浙江大华技术股份有限公司 Audio signal transient noise suppression method, system and storage medium
CN113238206A (en) * 2021-04-21 2021-08-10 中国科学院声学研究所 Signal detection method and system based on decision statistic design
CN113238206B (en) * 2021-04-21 2022-02-22 中国科学院声学研究所 Signal detection method and system based on decision statistic design
CN113345469A (en) * 2021-05-24 2021-09-03 北京小米移动软件有限公司 Voice signal processing method and device, electronic equipment and storage medium
CN116312545A (en) * 2023-05-26 2023-06-23 北京道大丰长科技有限公司 Speech recognition system and method in a multi-noise environment
CN116312545B (en) * 2023-05-26 2023-07-21 北京道大丰长科技有限公司 Speech recognition system and method in a multi-noise environment

Similar Documents

Publication Publication Date Title
CN103440871B (en) A kind of method that in voice, transient noise suppresses
CN104658544A (en) Method for inhibiting transient noise in voice
CN103440872B (en) The denoising method of transient state noise
KR101461774B1 (en) A bandwidth extender
Seneff Real-time harmonic pitch detector
US10832701B2 (en) Pitch detection algorithm based on PWVT of Teager energy operator
US20050288923A1 (en) Speech enhancement by noise masking
CN103440869B (en) Audio-reverberation inhibiting device and inhibiting method thereof
KR100549133B1 (en) Noise reduction method and device
Krishnamoorthy et al. Enhancement of noisy speech by temporal and spectral processing
CN101083640A (en) Low complexity noise reduction method
CN106340292A (en) Voice enhancement method based on continuous noise estimation
Drugman Residual excitation skewness for automatic speech polarity detection
CN110767244A (en) Speech enhancement method
Kim et al. End-to-end multi-task denoising for joint SDR and PESQ optimization
Daqrouq et al. An investigation of speech enhancement using wavelet filtering method
CN104658543A (en) Method for eliminating indoor reverberation
CN110349598A (en) A kind of end-point detecting method under low signal-to-noise ratio environment
Zhang et al. A novel fast nonstationary noise tracking approach based on MMSE spectral power estimator
US7890319B2 (en) Signal processing apparatus and method thereof
Chen et al. Online monaural speech enhancement based on periodicity analysis and a priori SNR estimation
Yegnanarayana et al. Study of robustness of zero frequency resonator method for extraction of fundamental frequency
EP0336685A2 (en) Impulse noise detection and supression
Yuan et al. Noise estimation based on time–frequency correlation for speech enhancement
Unoki et al. MTF-based power envelope restoration in noisy reverberant environments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150527