CN103440872A - Transient state noise removing method - Google Patents

Info

Publication number
CN103440872A
Authority
CN
China
Prior art keywords
frame
buf
pitch
data
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103572116A
Other languages
Chinese (zh)
Other versions
CN103440872B (en
Inventor
陈喆
殷福亮
周文颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology filed Critical Dalian University of Technology
Priority to CN201310357211.6A priority Critical patent/CN103440872B/en
Publication of CN103440872A publication Critical patent/CN103440872A/en
Application granted granted Critical
Publication of CN103440872B publication Critical patent/CN103440872B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The invention discloses a method for removing transient noise and belongs to the technical field of signal processing. The method comprises the steps of calculating the Mel-frequency cepstral coefficients of the current frame signal and predicting the pitch period of the current frame signal; using the Mel-frequency cepstral coefficients to detect whether noise exists in the current frame signal; and, if noise exists, using the predicted pitch period to reconstruct the current frame signal.

Description

Method for removing transient noise
Technical field
The present invention relates to a method for removing transient noise and belongs to the technical field of signal processing.
Background art
Transient additive noise in an audio signal is also called transient noise or impulsive noise. In the time domain, transient noise is usually discontinuous, intermittent, and pulse-like. Its energy is concentrated in short time intervals, within which it is much larger than the energy of the clean signal. Typical transient noises include knocks on a desk, door slams, brouhaha, keyboard keystrokes, mouse clicks, and hammer blows. They occur in many applications, such as hearing aids, mobile phones, and video conferencing equipment. Transient noise seriously degrades audio quality, so it must be suppressed to enhance the audio. Most current noise suppression algorithms target stationary and continuous noise. Speech is usually enhanced with the methods described in the document "Research on speech enhancement and related techniques", such as spectral subtraction and adaptive filtering, but these algorithms have essentially no suppressing effect on the transient noise described above.
Summary of the invention
In view of the above problems, the present invention provides a method for removing transient noise.
The technical scheme adopted by the present invention is as follows: first, the Mel-frequency cepstral coefficients (MFCC) of the current frame signal are calculated and, at the same time, the pitch period of the frame is predicted; then the MFCC are used to detect whether the frame contains noise; if noise is present, the waveform is reconstructed using the predicted pitch period.
Beneficial effects of the present invention: The method was tested with 20 clean speech recordings (male, female, and child voices) and 4 types of noise: mouse clicks, knocks, metronome ticks, and keyboard keystrokes. The noise durations are 10 ms for mouse clicks, 20 ms for knocks and metronome ticks, and 30 ms for keyboard keystrokes. Each of the 4 noises was added to each clean recording, giving 80 noisy recordings. Each recording contains 30 noise events at equal spacing. The sampling rate of all audio is f_s = 48 kHz and the frame length is N = 480. In the MFCC calculation stage, an NFFT = 1024 point FFT is used, the Mel filter bank has M = 24 filters, and L = 12 MFCC are computed. In the transient noise detection stage, the adaptive threshold is set to Thres = const·ener; so that the threshold suits all noise types, the constant const is set to 10, ener is the energy of each input frame, and the minimum value is set to 60.0. In the update step, the forgetting factor b is set to 0.4. In the pitch period estimation stage, the pitch period is searched within (2 ms, 12 ms), corresponding to (96, 576) samples. In the waveform reconstruction stage, the fade-in and fade-out lengths N_1 and N_2 are both 32, and the length of the buffer buf(n) is 2240. After noisy speech is denoised with the present invention, speech intelligibility is greatly improved and listener fatigue is reduced. The denoising effect was evaluated with two metrics, the segmental signal-to-noise ratio SNR_seg and PEAQ; the results are shown in Fig. 12 and Fig. 13 of the drawings.
Brief description of the drawings
Fig. 1: Relationship between Mel frequency and linear frequency.
Fig. 2: Flow chart of the technical scheme of prior art one.
Fig. 3: Flow chart of the technical scheme of prior art two.
Fig. 4: Block diagram of the present technical scheme.
Fig. 5: Block diagram of MFCC feature extraction.
Fig. 6: Mel frequency filter bank.
Fig. 7: Block diagram of pitch period estimation.
Fig. 8: Linear interpolation between two points.
Fig. 9(a): Signal before the current frame is repaired.
Fig. 9(b): The new pitch-cycle waveform pw^(p)(n).
Fig. 9(c): Signal after the current frame is repaired.
Fig. 10(a): Signal before the current frame is repaired.
Fig. 10(b): Current frame signal.
Fig. 10(c): Signal after repair.
Fig. 11(a): Signal before denoising.
Fig. 11(b): Signal after denoising.
Fig. 12: Evaluation of the denoising effect (SNR).
Fig. 13: Evaluation of the denoising effect (PEAQ).
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings.
Mel-frequency cepstral coefficients:
Research on the human hearing mechanism shows that the human ear has different sensitivity to sound waves of different frequencies; speech signals between 200 Hz and 5 kHz have the greatest influence on speech clarity. In addition, the human ear exhibits masking: a sound with high energy masks weaker sounds to some extent. In general, a lower-frequency sound masks a higher-frequency sound easily, while the reverse is harder; that is, the critical bandwidth of masking is smaller at low frequencies than at high frequencies. Accordingly, a bank of band-pass filters is arranged from dense to sparse according to the critical bandwidth, and the input signal is filtered by it. The energy of each band-pass filter output is taken as a basic feature of the signal; after further processing, it can be used as a speech feature, namely the Mel-frequency cepstral coefficients (MFCC). This feature does not depend on the nature of the signal and makes no assumptions about or restrictions on the input signal, while exploiting the perceptual properties of human hearing. Compared with linear prediction cepstral coefficients (LPCC), which are based on a vocal tract model, MFCC are therefore more robust and retain good speech recognition performance at low signal-to-noise ratios.
MFCC are cepstral parameters extracted in the Mel-scale frequency domain. The Mel scale describes the nonlinear frequency perception of the human ear, and its relationship to frequency can be approximated by
$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f_{linear}}{700}\right) \tag{18}$$
where f_linear is the frequency in Hz. Fig. 1 shows the relationship between Mel frequency and linear frequency: as f_linear grows linearly, f_mel grows logarithmically.
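As a small illustration of formula (18), the conversion and its inverse can be written as below. The function names `hz_to_mel` and `mel_to_hz` are chosen for this sketch and do not appear in the patent; the inverse is simply formula (18) solved for the linear frequency.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Formula (18): linear frequency in Hz -> Mel frequency."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel: float) -> float:
    """Formula (18) solved for the linear frequency in Hz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)
```

The inverse is what the filter design needs later, when division points that are equally spaced on the Mel scale are mapped back to linear frequency.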
Packet loss concealment:
In voice communication systems based on the IP protocol, that is, voice over IP (VoIP), network congestion or delay jitter during transmission can cause packet loss: some packets do not arrive at the receiver on time, which seriously degrades the voice quality at the receiver. The receiver must therefore take measures to reduce the speech distortion caused by packet loss. Such measures are usually called packet loss concealment (PLC) algorithms.
PLC algorithms fall into two classes: sender-based and receiver-based. Sender-based PLC algorithms involve both the sending and the receiving end. Receiver-based PLC algorithms recover the original speech as far as possible using only the packets received normally, the number of lost packets, and the coding scheme known in advance. Because receiver-based PLC does not need additional data from the sender, it increases neither network traffic nor delay. Common receiver-based PLC methods include silence substitution, repetition of the previous packet, template matching, pitch waveform replication, and linear prediction.
The pitch waveform replication (PWR) method used herein is a receiver-based PLC method.
Prior art one related to the present invention
The technical scheme of prior art one
He Zhiyong et al., in the paper "Speech enhancement based on Kalman filtering in impulsive noise environments", proposed a speech enhancement method for transient noise environments. The flow chart of the method is shown in Fig. 2. The method first finds the frequency band in which the ratio of transient noise sample energy to noisy signal sample energy is largest, and then uses the energy distribution in this band to decide, frame by frame, whether the speech signal is disturbed by transient noise. On this basis, the method applies a Kalman filtering algorithm to denoise the speech frames disturbed by transient noise. In addition, the method improves the parameter estimation of the autoregressive (AR) model.
Shortcomings of prior art one
(1) For noise with a long tail, the tail may not be detected.
(2) The Kalman filtering used for denoising is suited to stationary noise and not to non-stationary transient noise, so the denoising effect is limited and considerable residual noise remains, which degrades speech quality.
Prior art two related to the present invention
Technical scheme of prior art two
Hetherington et al., in the patent "Repetitive transient noise removal", proposed a transient noise suppression method. The flow chart of the Hetherington method is shown in Fig. 3. The method first builds a model according to the noise characteristics, then uses the correlation coefficient between the modeled signal and the signal under test to decide whether the data under test contain noise; if noise is present, the noise component is removed from the signal under test according to the modeled signal.
Shortcomings of prior art two
The Hetherington method can effectively remove repetitive noise. However, transient noise is highly varied; when several different types of transient noise occur within a short time, the modeling becomes inaccurate and the denoising performance of the Hetherington method is poor.
Detailed description of the technical solution of the present invention
Technical problem to be solved by the present invention
To perform speech enhancement on audio disturbed by transient noise, suppressing the transient noise and improving speech quality and intelligibility.
Complete technical solution provided by the present invention:
The block diagram of the technical solution is shown in Fig. 4. MFCC parameters are extracted from the input audio signal and used to detect whether the audio signal contains noise. If noise is detected, the noisy frame data are replaced using the PWR method, that is, the waveform is reconstructed. If no noise is detected, the audio signal is output unchanged.
Implementation steps of the technical solution:
The input is a mono audio signal with sampling rate f_s = 48 kHz. The noisy input audio signal x(n) can be expressed as x(n) = s(n) + d(n), where s(n) is the clean speech signal and d(n) is the transient noise signal.
(1) MFCC feature extraction of the audio signal
The MFCC extraction process is shown in Fig. 5, which is provided in grayscale to illustrate the technical effect of the invention more clearly for the examiner's reference. First, a time-frequency transform is applied to the time-domain audio signal and its energy spectrum is calculated. The energy spectrum is then multiplied by a bank of Mel-scale triangular filters, and a discrete cosine transform (DCT) is applied to the logarithmic energies of the products. The first L coefficients obtained in this way are called the MFCC. The specific steps are:
1) Framing. The input signal is divided into frames of 10 ms; since the sampling frequency is f_s = 48 kHz, one frame contains N = 480 samples. The data are then normalized: if the signal is quantized with 16 bits, the data are divided by 2^15, which reduces the data range to (-1, 1). If the current frame is the p-th frame, then
$$x^{(p)}(n) = x[p \cdot (N-1) + n], \quad n = 0, 1, \ldots, N-1 \tag{19}$$
2) Preprocessing. Pre-emphasis and windowing are applied to the current frame,
$$y^{(p)}(n) = x^{(p)}(n) - \beta x^{(p)}(n-1) \tag{20}$$
$$y_w^{(p)}(n) = y^{(p)}(n)\, w(n) \tag{21}$$
where the pre-emphasis factor is β = 0.938 and w(n) is a Hamming window, w(n) = 0.54 - 0.46cos(nπ/N).
3) An NFFT = 1024 point FFT is applied to the preprocessed signal to obtain the frequency-domain signal Y^(p)(k).
4) The energy spectrum |Y^(p)(k)|^2 of the frequency-domain signal is calculated.
5) The energy spectrum is passed through a bank H of Mel-scale triangular filters for frequency-domain filtering.
The filter bank contains M filters, each triangular and overlapping with its neighbors, as shown in Fig. 6. The center frequency of each filter is f(m), m = 1, 2, ..., M; the present invention uses M = 24. Filter design: the highest frequency of the input signal, f_s/2 = 24 kHz, is transformed to the Mel-scale frequency domain by formula (18), giving F_smel. The interval (0, F_smel) is divided into 25 equal parts; excluding the two end points 0 and F_smel, the remaining 24 division points serve as the center frequencies of the 24 filters. The division points f(m) are thus uniformly distributed on the Mel scale and are then transformed back to the linear frequency scale by formula (18). After the transformation, the spacing between the f(m) narrows as m decreases and widens as m increases.
From the division points f(m), the frequency response of the triangular filter bank H(m, k) is
$$H(m,k) = \begin{cases} 0, & f(k) < f(m-1) \\ \dfrac{2\,[f(k) - f(m-1)]}{[f(m+1) - f(m-1)]\,[f(m) - f(m-1)]}, & f(m-1) \le f(k) < f(m) \\ \dfrac{2\,[f(m+1) - f(k)]}{[f(m+1) - f(m-1)]\,[f(m+1) - f(m)]}, & f(m) \le f(k) \le f(m+1) \\ 0, & f(k) > f(m+1) \end{cases} \tag{22}$$
6) The logarithm of the energy of each filter output H(m, k) is calculated, giving E(m),
$$E(m) = \log_{10}\left[\sum_k H(m,k)\,|Y^{(p)}(k)|^2\right], \quad m = 1, 2, \ldots, M \tag{23}$$
A discrete cosine transform of E(m) gives the L = 12 order MFCC, denoted C(l),
$$C^{(p)}(0) = \frac{2}{L}\sum_{m=0}^{M-1} E(m), \quad l = 0$$
$$C^{(p)}(l) = \frac{2}{L}\sum_{m=0}^{M-1} E(m)\cos\left(\frac{\pi l (2m+1)}{2M}\right), \quad 1 \le l \le L-1 \tag{24}$$
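The extraction steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation. The function names (`fft`, `mel_filterbank`, `mfcc`), the `prev_sample` argument that supplies x(n-1) at n = 0, the zero-padding of the 480-sample frame to 1024 points, the 2/L scale factor of the DCT, and the small floor inside the logarithm are all introduced for the example. The window is applied exactly as printed in the text.

```python
import cmath
import math

FS, N, NFFT, M, L, BETA = 48000, 480, 1024, 24, 12, 0.938


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])


def mel_filterbank():
    """Triangular filters H(m, k) of formula (22) on the NFFT/2 + 1 bin grid."""
    top = hz_to_mel(FS / 2)
    # f[0] and f[M + 1] are the interval end points; f[1..M] are the centres.
    f = [mel_to_hz(top * i / (M + 1)) for i in range(M + 2)]
    bins = [k * FS / NFFT for k in range(NFFT // 2 + 1)]
    bank = []
    for m in range(1, M + 1):
        lo, c, hi = f[m - 1], f[m], f[m + 1]
        row = []
        for fk in bins:
            if lo <= fk < c:
                row.append(2 * (fk - lo) / ((hi - lo) * (c - lo)))
            elif c <= fk <= hi:
                row.append(2 * (hi - fk) / ((hi - lo) * (hi - c)))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, prev_sample=0.0, H=None):
    """Return L MFCCs of one N-sample frame already normalised to (-1, 1)."""
    H = H or mel_filterbank()
    # Formula (20): pre-emphasis; formula (21): window w(n) as printed.
    y = [frame[n] - BETA * (frame[n - 1] if n else prev_sample) for n in range(N)]
    yw = [y[n] * (0.54 - 0.46 * math.cos(n * math.pi / N)) for n in range(N)]
    spec = fft(yw + [0.0] * (NFFT - N))          # zero-padded NFFT-point FFT
    power = [abs(v) ** 2 for v in spec[: NFFT // 2 + 1]]
    # Formula (23); the 1e-12 floor (an addition) avoids log10(0) on silence.
    E = [math.log10(max(sum(h * p for h, p in zip(row, power)), 1e-12))
         for row in H]
    # Formula (24), with the 2/L scale factor taken as an assumption.
    return [2.0 / L * sum(E[m] * math.cos(math.pi * l * (2 * m + 1) / (2 * M))
                          for m in range(M))
            for l in range(L)]
```

Building the filter bank once and passing it in through `H` avoids recomputing it for every frame.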
(2) Noise detection:
The Euclidean distance dist between the MFCC of the current frame and the MFCC of the previous frame is calculated,
$$dist = \sqrt{\sum_{l=0}^{L}\left[C^{(p)}(l) - C^{(p-1)}(l)\right]^2} \tag{25}$$
Whether the current frame contains noise is decided from the distance and the threshold Thres. The threshold Thres is determined adaptively by
$$Thres = 10 \cdot ener \tag{26}$$
where ener is the energy of each frame after normalization, and its minimum value is set to 60.0.
After detection, the MFCC feature of the current frame is updated,
$$C^{(p)}(l) = b \cdot C^{(p-1)}(l) + (1-b) \cdot C^{(p)}(l) \tag{27}$$
where the forgetting factor is b = 0.4. When the frame following a noise frame is a speech frame, this update prevents false detection.
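The detection step can be sketched as follows, with several points treated as assumptions rather than as the patent's definition: the distance is taken with a square root, as the term "Euclidean distance" suggests; the minimum of 60.0 is read as a floor on ener; a frame is flagged as noisy when the distance exceeds the threshold; and the update blends the stored coefficients with those of the current frame using the forgetting factor b. The function name `detect_and_update` is hypothetical.

```python
import math

CONST, ENER_MIN, B = 10.0, 60.0, 0.4


def detect_and_update(c_cur, c_ref, ener):
    """Return (is_noise, new_ref) for one frame.

    c_cur: MFCC of the current frame
    c_ref: stored MFCC of the previous frame
    ener:  energy of the current frame
    """
    # Distance between the two MFCC vectors (square root is an assumption).
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(c_cur, c_ref)))
    # Adaptive threshold; the 60.0 minimum is applied to ener (assumption).
    thres = CONST * max(ener, ENER_MIN)
    is_noise = dist > thres
    # Recursive update with forgetting factor b (assumed reading of the update).
    new_ref = [B * r + (1.0 - B) * c for r, c in zip(c_ref, c_cur)]
    return is_noise, new_ref
```

The returned `new_ref` would be kept as the reference vector for the comparison in the next frame.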
(3) Pitch period prediction:
A pitch period is estimated for every frame of the speech signal. If the current frame is a noise frame, its pitch period is predicted from the pitch periods of the previous two frames. The block diagram of pitch period estimation is shown in Fig. 7. For different speakers the pitch period generally lies within 2-12 ms, so the pitch period is searched within 2-12 ms. Let PMAX be the number of samples corresponding to 12 ms, that is, PMAX = 576, and let PMIN be the number of samples corresponding to 2 ms, that is, PMIN = 96. A buffer buf(n) of length 3PMAX + N = 2208 is used to estimate the pitch period; buf(n) stores data that have already been output.
The pitch period is estimated as follows:
1) Low-pass filter buf(n) to obtain buf_d(n). The cut-off frequency of the low-pass filter (LPF) is 900 Hz.
2) Apply center clipping to buf_d(n) to obtain buf_c(n),
$$buf_c(n) = \begin{cases} buf_d(n) - C_L, & buf_d(n) > C_L \\ buf_d(n) + C_L, & buf_d(n) < -C_L \\ 0, & |buf_d(n)| \le C_L \end{cases} \tag{28}$$
where C_L is the clipping level, usually set to 68% of the maximum value of the normalized data.
3) Compute the autocorrelation of buf_c(n) and search for the position of its maximum in the range (96, 576); this position is taken as the pitch period estimate Pitch,
$$r_{buf_c}(n) = \sum_{m=0}^{2PMAX-1} buf_c(m)\, buf_c(m+n), \quad PMIN \le n \le PMAX \tag{29}$$
$$Pitch = \arg\max_{PMIN \le n \le PMAX} r_{buf_c}(n) \tag{30}$$
4) To prevent pitch doubling, the pitch period estimates of the previous two frames, Pitch^(p-1) and Pitch^(p-2), are smoothed using formula (31),
Figure BDA0000367351410000064
The pitch period of the current frame, Pitch^(p), is predicted from the two smoothed pitch periods,
$$Pitch^{(p)} = Pitch^{(p-1)} + \left(Pitch^{(p-1)} - Pitch^{(p-2)}\right) \tag{32}$$
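The center clipping, autocorrelation search, and prediction steps can be sketched as below. This is an illustration only: the 900 Hz low-pass filter and the smoothing step are omitted because the text does not give their details, and the function names are chosen for the example. The input to `estimate_pitch` must hold at least 3·PMAX samples, since the autocorrelation sum reaches index 2·PMAX - 1 + PMAX.

```python
def center_clip(x, ratio=0.68):
    """Formula (28): zero samples within +/-C_L, shift the rest toward zero."""
    c_l = ratio * max(abs(v) for v in x)
    return [v - c_l if v > c_l else v + c_l if v < -c_l else 0.0 for v in x]


def estimate_pitch(buf_c, pmin=96, pmax=576):
    """Formulas (29)-(30): lag of the autocorrelation maximum in [pmin, pmax]."""
    span = 2 * pmax
    best_lag, best_r = pmin, float("-inf")
    for lag in range(pmin, pmax + 1):
        r = sum(buf_c[m] * buf_c[m + lag] for m in range(span))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


def predict_pitch(pitch_prev1, pitch_prev2):
    """Formula (32): linear extrapolation from the two previous pitch periods."""
    return pitch_prev1 + (pitch_prev1 - pitch_prev2)
```

For example, with previous pitch periods of 300 and 290 samples, `predict_pitch(300, 290)` continues the trend and returns 310.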
(4) Waveform reconstruction:
The last pitch-cycle waveform of the previous frame is extracted and linearly interpolated to obtain a new pitch-cycle waveform.
1) Because buf(n) stores the output frame data, the pitch-cycle waveform of the previous frame, that is, the last Pitch^(p-1) samples of the previous frame's output signal, can be extracted from buf(n); this waveform is denoted pw^(p-1)(n). Linear interpolation of pw^(p-1)(n) gives a new waveform of length Pitch^(p), denoted pw^(p)(n). The linear interpolation between two points is shown in Fig. 8, and the interpolation formula is
$$pw^{(p)}(n') = \left(\frac{Pitch^{(p-1)}}{Pitch^{(p)}} \cdot n' - n + 1\right) \cdot \left[pw^{(p-1)}(n) - pw^{(p-1)}(n-1)\right] + pw^{(p-1)}(n-1), \quad n-1 \le \frac{Pitch^{(p-1)}}{Pitch^{(p)}} \cdot n' < n \tag{33}$$
2) The new waveform is used for periodic waveform replication:
a. The principle of periodic waveform replication is shown in Fig. 9(a) to Fig. 9(c). If the current frame is a noise frame (regardless of whether the previous frame is a noise frame or a clean speech frame), the processing is as follows: according to formula (34), the AB segment and the CD segment in buf(n) are overlap-added with fade-in and fade-out to ensure that the data on both sides of point D are continuous,
$$buf_{CD}(n) = \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{AB}(n) = \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{CD}(n - Pitch), \quad 0 \le n < N_1 \tag{34}$$
$$\alpha = \frac{N_1 - i}{N_1}, \quad i = 0, 1, \ldots, N_1 - 1 \tag{35}$$
where α is the attenuation factor, decaying linearly from 1 to 0, and the length of the AB and CD segments is N_1 = 32.
b. The new waveform pw^(p)(n) is copied repeatedly into the DF region with period Pitch^(p). The DE segment is the repaired current frame. The EF segment has length N_2 = 32; when the next frame is a speech frame, it is used for fade-in and fade-out so that the data on both sides of point E, that is, between frames, are continuous.
c. One frame of data starting at point C in buf(n) is output. The output of the method is delayed by the length of the CD segment. All data in buf(n) are then shifted forward by N points (one frame length).
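The reconstruction of a noise frame can be sketched as follows. The sketch assumes that `d` is the buffer index of point D, that the CD segment is the N_1 samples ending at D, and that the AB segment lies exactly one pitch period earlier; the function names and the clamping of the last interpolation point are also assumptions made for the example. Output and buffer shifting (the final step above) are left out.

```python
def resample_pitch_wave(pw_prev, new_len):
    """Formula (33): linearly interpolate one pitch cycle to new_len samples."""
    ratio = len(pw_prev) / new_len
    last = len(pw_prev) - 1
    out = []
    for k in range(new_len):
        t = ratio * k                 # position in the old waveform
        i = min(int(t), last)         # left neighbour, n - 1 in formula (33)
        j = min(i + 1, last)          # right neighbour, clamped at the end
        out.append((t - i) * (pw_prev[j] - pw_prev[i]) + pw_prev[i])
    return out


def crossfade(fade_out, fade_in):
    """Formulas (34)-(35): alpha decays linearly from 1 over len(fade_out)."""
    n1 = len(fade_out)
    return [((n1 - i) / n1) * fade_out[i] + (i / n1) * fade_in[i]
            for i in range(n1)]


def replicate(pw, length):
    """Fill `length` samples by repeating the pitch-cycle waveform pw."""
    return [pw[k % len(pw)] for k in range(length)]


def reconstruct_frame(buf, d, pitch_prev, pitch_cur, n=480, n1=32, n2=32):
    """Repair the frame starting at index d of buf in place."""
    # Last pitch cycle before D, stretched or compressed to the new period.
    pw_new = resample_pitch_wave(buf[d - pitch_prev:d], pitch_cur)
    # Blend CD with AB, the segment one pitch period earlier.
    ab = buf[d - n1 - pitch_prev:d - pitch_prev]
    buf[d - n1:d] = crossfade(buf[d - n1:d], ab)
    # Fill the DF region: the repaired frame plus n2 samples for the next fade.
    buf[d:d + n + n2] = replicate(pw_new, n + n2)
    return buf
```

On a perfectly periodic signal whose period matches both pitch values, this procedure regenerates the missing frame exactly, which is a convenient sanity check.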
Fig. 10(a) to Fig. 10(c) illustrate the repair of a frame: Fig. 10(a) shows the signal with the current frame, which is to be repaired, discarded; Fig. 10(b) shows the current frame signal reconstructed by the method of this patent; Fig. 10(c) shows the signal after repair. If the current frame is a clean speech frame and the previous frame is a noise frame, the processing is as follows:
a. The DG segment in buf(n) now holds the EF segment data of the previous frame. The DG segment and the first N_2 data points of the current input frame are fused (calculated as in formula (34)) and stored in DG.
b. The remaining data points of the current frame are copied unchanged to the positions after point G in buf(n).
c. The data of one frame length starting at point C are output, and all data in buf(n) are then shifted forward by the length of one frame, that is, N points.
If the current frame and the previous frame are both clean speech frames, the current input frame is copied unchanged into the region to be repaired in buf(n), that is, the DE region in Fig. 9(a), and the data of one frame length starting at point C are output.
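The case of a clean frame that follows a noise frame can be sketched as below. The text only says that the fusion is calculated in a way similar to the overlap-add formula, so the direction of the fade is an assumption here: the replicated data already in the buffer fade out while the new input fades in. The name `merge_clean_after_noise` and the index `d` for point D are hypothetical.

```python
def merge_clean_after_noise(buf, d, frame, n2=32):
    """Write a clean frame at index d, fading over the first n2 samples."""
    for i in range(n2):
        alpha = (n2 - i) / n2          # weight of the old, replicated data
        buf[d + i] = alpha * buf[d + i] + (1.0 - alpha) * frame[i]
    # The rest of the frame is copied unchanged after point G.
    buf[d + n2:d + len(frame)] = frame[n2:]
    return buf
```

When both the current and the previous frame are clean, no fade is needed and the frame is simply copied into the same region.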
Beneficial effects of the technical solution of the present invention:
The method was tested with 20 clean speech recordings (male, female, and child voices) and 4 types of noise: mouse clicks, knocks, metronome ticks, and keyboard keystrokes. The noise durations are 10 ms for mouse clicks, 20 ms for knocks and metronome ticks, and 30 ms for keyboard keystrokes. Each of the 4 noises was added to each clean recording, giving 80 noisy recordings. Each recording contains 30 noise events at equal spacing.
The sampling rate of all audio is f_s = 48 kHz and the frame length is N = 480. In the MFCC calculation stage, an NFFT = 1024 point FFT is used, the Mel filter bank has M = 24 filters, and L = 12 MFCC are computed. In the transient noise detection stage, the adaptive threshold is set to Thres = const·ener; so that the threshold suits all noise types, the constant const is set to 10, ener is the energy of each input frame, and the minimum value is set to 60.0. In the update step, the forgetting factor b is set to 0.4. In the pitch period estimation stage, the pitch period is searched within (2 ms, 12 ms), corresponding to (96, 576) samples. In the waveform reconstruction stage, the fade-in and fade-out lengths N_1 and N_2 are both 32, and the length of the buffer buf(n) is 2240.
After noisy speech is denoised with the present invention, speech intelligibility is greatly improved and listener fatigue is reduced. The denoising effect was evaluated with two metrics, the segmental signal-to-noise ratio SNR_seg and PEAQ. The segmental signal-to-noise ratio is calculated as
$$SNR_{seg}^{in} = \frac{1}{R}\sum_{i=1}^{R} 10\log_{10}\frac{\sum_{n \in frame_i}|s(n)|^2}{\sum_{n \in frame_i}|x(n) - s(n)|^2} \tag{36}$$
$$SNR_{seg}^{out} = \frac{1}{R}\sum_{i=1}^{R} 10\log_{10}\frac{\sum_{n \in frame_i}|\hat{s}(n)|^2 \cdot \frac{|s(n)|^2}{|\hat{s}(n)|^2}}{\sum_{n \in frame_i}|\hat{s}(n) - s(n)|^2} \tag{37}$$
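The two segmental SNR values share one computation: the clean signal s(n) is compared either with the noisy input x(n) or with the denoised output. A minimal sketch follows, assuming non-overlapping frames of 480 samples and skipping frames in which the signal energy or the error energy is zero; the skipping rule and the function name `segmental_snr` are additions for the example, not part of the formulas.

```python
import math


def segmental_snr(clean, test, frame_len=480):
    """Mean per-frame SNR in dB of `test` measured against `clean`."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        idx = range(start, start + frame_len)
        sig = sum(clean[n] ** 2 for n in idx)
        err = sum((test[n] - clean[n]) ** 2 for n in idx)
        if sig > 0.0 and err > 0.0:    # skip frames with an undefined ratio
            snrs.append(10.0 * math.log10(sig / err))
    return sum(snrs) / len(snrs) if snrs else float("nan")
```

Calling it once with the noisy signal and once with the denoised signal gives the input and output values that are compared in the evaluation.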
The denoising effect was evaluated with both metrics; the results are shown in Fig. 12 and Fig. 13. Fig. 12 compares the objective audio quality of the noisy signal before and after denoising in terms of signal-to-noise ratio; Fig. 13 compares it in terms of PEAQ.
The spectrograms of the noisy signal and of the signal denoised by this scheme are shown in Fig. 11(a) and Fig. 11(b); grayscale figures are provided to illustrate the technical effect of the invention more clearly for the examiner's reference. Fig. 11(a) is the spectrogram of audio contaminated by mouse click noise; Fig. 11(b) is the spectrogram of the noisy audio of Fig. 11(a) after denoising.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any equivalent replacement or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.
Abbreviations and key terms used in the present invention
AR: Autoregressive model.
DCT: Discrete Cosine Transform.
FFT: Fast Fourier Transform.
LPF: Low-Pass Filter.
LPCC: Linear Prediction Cepstral Coefficients.
MFCC: Mel-Frequency Cepstral Coefficients.
VoIP: Voice over IP.
PLC: Packet Loss Concealment.
PWR: Pitch Waveform Replication.
SNR: Signal-to-Noise Ratio.
PEAQ: Perceptual Evaluation of Audio Quality, an objective method for assessing perceived audio quality specified in ITU-R Recommendation BS.1387.

Claims (5)

1. A method for removing transient noise, characterized in that: first, the Mel-frequency cepstral coefficients of the current frame signal are calculated and, at the same time, the pitch period of the frame is predicted; then the Mel-frequency cepstral coefficients are used to detect whether the frame contains noise; if noise is present, the waveform is reconstructed using the predicted pitch period.
2. The method for removing transient noise according to claim 1, characterized in that the Mel-frequency cepstral coefficients are calculated as follows:
1) Framing: the input signal is divided into frames of length N = 480, that is, 10 ms of data, and the data are normalized. If the current frame is the p-th frame, then
$$x^{(p)}(n) = x[p \cdot (N-1) + n], \quad n = 0, 1, \ldots, N-1; \tag{1}$$
2) Preprocessing: pre-emphasis and windowing are applied to the current frame,
$$y^{(p)}(n) = x^{(p)}(n) - \beta x^{(p)}(n-1); \tag{2}$$
$$y_w^{(p)}(n) = y^{(p)}(n)\, w(n); \tag{3}$$
where the pre-emphasis factor is β = 0.938 and w(n) is a Hamming window, w(n) = 0.54 - 0.46cos(nπ/N);
3) an NFFT = 1024 point FFT is applied to the preprocessed signal to obtain the frequency-domain signal Y^(p)(k);
4) the energy spectrum |Y^(p)(k)|^2 of the frequency-domain signal is calculated;
5) the energy spectrum is passed through a bank H of Mel-scale triangular filters for frequency-domain filtering;
the filter bank contains M filters, each triangular and overlapping with its neighbors; the center frequency of each filter is f(m), m = 1, 2, ..., M, with M = 24;
filter design: the highest frequency of the input signal, f_s/2 = 24 kHz, is transformed by the formula
$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f_{linear}}{700}\right)$$
where f_linear is the frequency in Hz, to the Mel-scale frequency domain, giving F_smel; the interval (0, F_smel) is divided into 25 equal parts; excluding the two end points 0 and F_smel, the remaining 24 division points serve as the center frequencies of the 24 filters; the division points f(m) are uniformly distributed on the Mel scale and are then transformed back to the linear frequency scale by the same formula; after the transformation, the spacing between the f(m) narrows as m decreases and widens as m increases; from the division points f(m), the frequency response of the triangular filter bank H(m, k) is
$$H(m,k) = \begin{cases} 0, & f(k) < f(m-1) \\ \dfrac{2\,[f(k) - f(m-1)]}{[f(m+1) - f(m-1)]\,[f(m) - f(m-1)]}, & f(m-1) \le f(k) < f(m) \\ \dfrac{2\,[f(m+1) - f(k)]}{[f(m+1) - f(m-1)]\,[f(m+1) - f(m)]}, & f(m) \le f(k) \le f(m+1) \\ 0, & f(k) > f(m+1) \end{cases} \tag{4}$$
6) the logarithm of the energy of each filter output H(m, k) is calculated, giving E(m),
$$E(m) = \log_{10}\left[\sum_k H(m,k)\,|Y^{(p)}(k)|^2\right], \quad m = 1, 2, \ldots, M \tag{5}$$
a discrete cosine transform of E(m) gives the L = 12 order MFCC, denoted C(l),
$$C^{(p)}(0) = \frac{2}{L}\sum_{m=0}^{M-1} E(m), \quad l = 0$$
$$C^{(p)}(l) = \frac{2}{L}\sum_{m=0}^{M-1} E(m)\cos\left(\frac{\pi l (2m+1)}{2M}\right), \quad 1 \le l \le L-1 \tag{6}$$
3. The method for removing transient noise according to claim 1, characterized in that the noise detection process is as follows:
the Euclidean distance dist between the MFCC of the current frame and the MFCC of the previous frame is calculated,
$$dist = \sqrt{\sum_{l=0}^{L}\left[C^{(p)}(l) - C^{(p-1)}(l)\right]^2} \tag{7}$$
whether the current frame contains noise is decided from the distance and the threshold Thres; the threshold Thres is determined adaptively by
$$Thres = 10 \cdot ener \tag{8}$$
where ener is the energy of each frame after normalization, and its minimum value is set to 60.0; after detection, the MFCC feature of the current frame is updated,
$$C^{(p)}(l) = b \cdot C^{(p-1)}(l) + (1-b) \cdot C^{(p)}(l) \tag{9}$$
where the forgetting factor is b = 0.4; when the frame following a noise frame is a speech frame, this update prevents false detection.
4. the denoising method of transient noise according to claim 1 is characterized in that: the method for pitch period prediction is as follows:
1) buf (n) is carried out to low-pass filtering, obtain buf d(n); Wherein the cutoff frequency of low-pass filter (LPF) is 900Hz;
2) to buf d(n) carry out center clipping, obtain buf c(n),
Figure FDA0000367351400000031
where C_l is the clipping level, usually set to 68% of the maximum value of the normalized data;
3) Compute the autocorrelation of buf_c(n) and search for the position of the autocorrelation maximum within the range (96, 576); take it as the pitch period estimate Pitch;
Figure FDA0000367351400000032
Figure FDA0000367351400000033
4) To prevent frequency-doubling errors, smooth the pitch period predictions of the two preceding frames, Pitch^(p-1) and Pitch^(p-2), using formula (13),
Figure FDA0000367351400000034
Predict the pitch period of the current frame, Pitch^(p), from the two smoothed pitch periods,
Pitch^(p) = Pitch^(p-1) + (Pitch^(p-1) - Pitch^(p-2)). (14)
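Steps 2) to 4) can be sketched as follows. The three-branch clipping rule is the conventional center clipper and is assumed here, since the claim's clipping formula is not reproduced in the text. The search range is treated as inclusive, which is also an assumption. The low-pass filter of step 1) and the smoothing of formula (13) are left out, and all function names are hypothetical.

```python
def center_clip(x, ratio=0.68):
    """Conventional center clipping (assumed form): samples inside
    [-C_l, C_l] become 0, the rest are pulled toward 0 by C_l."""
    c_l = ratio * max(abs(v) for v in x)
    out = []
    for v in x:
        if v > c_l:
            out.append(v - c_l)
        elif v < -c_l:
            out.append(v + c_l)
        else:
            out.append(0.0)
    return out

def estimate_pitch(x, lag_min=96, lag_max=576):
    """Return the lag in [lag_min, lag_max] with the largest autocorrelation.
    Ties keep the smallest lag."""
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, min(lag_max, len(x) - 1) + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag

def predict_pitch(pitch_prev1, pitch_prev2):
    # Formula (14): linear extrapolation from the two smoothed periods.
    return pitch_prev1 + (pitch_prev1 - pitch_prev2)
```

Formula (14) simply continues the trend of the last two periods, so smoothed periods of 190 and then 200 samples predict 210 for the current frame.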
5. The transient noise removing method according to claim 1, characterized in that the waveform is reconstructed as follows:
1) Because buf(n) stores the output frame data, the pitch-cycle waveform of the previous frame, i.e. the last Pitch^(p-1) samples of the previous frame's output signal, can be extracted from buf(n); this waveform is denoted pw^(p-1)(n). Linear interpolation is applied to pw^(p-1)(n) to obtain a new waveform of length Pitch^(p), denoted pw^(p)(n). The interpolation formula is
Figure FDA0000367351400000035
(15)
Figure FDA0000367351400000036
2) Use the new waveform for periodic waveform replication, as follows:
If the current frame is a noise frame, regardless of whether the previous frame is a noise frame or a clean speech frame, the processing is as follows: a. According to formula (15), the AB-segment data and the CD-segment data in buf(n) are overlap-added with a fade-in/fade-out so that the data on both sides of D are continuous,
buf_CD(n) = α·buf_CD(n) + (1-α)·buf_AB(n) (16)
          = α·buf_CD(n) + (1-α)·buf_CD(n-Pitch), 0 ≤ n < N_1
Figure FDA0000367351400000042
where α is the attenuation factor, decaying linearly from 1 to 0; the AB and CD segments each have data length N_1 = 32;
b. With period Pitch^(p), the new waveform pw^(p)(n) is copied repeatedly into the DF region, where the DE segment is the repaired current frame; the EF segment has data length N_2 = 32 and is used for fade-in/fade-out when the next frame is a speech frame, ensuring continuity on both sides of E, i.e. between frames;
c. Output one frame of data starting at point C in buf(n); the output of this method is delayed, the delay being the length of the CD segment; then shift all data in buf(n) forward by N points;
If the current frame is a clean speech frame and the previous frame is a noise frame, the processing is as follows:
a. At this point the DG-segment data in buf(n) are the EF-segment data of the previous frame; the DG segment is blended with the first N_2 data points of the current input frame, computed similarly to formula (15), and the result is stored in DG;
b. The remaining data points of the current frame are copied directly into buf(n) after point G;
c. Output one frame length of data starting at point C, then shift all data in buf(n) forward by N points, i.e. one frame length. If the current frame and the previous frame are both clean speech frames, the input data of the current frame are copied directly into the region to be repaired in buf(n), and one frame length of data starting at point C is output.
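The three building blocks of the reconstruction (stretching one pitch cycle, cross-fading two segments, and periodic copying) can be sketched as below. The exact interpolation formula and the definition of α are not reproduced in the text, so the endpoint-to-endpoint interpolation and the ramp that reaches 0 at the last sample are assumptions. Buffer management around points A to G is omitted, and the function names are hypothetical.

```python
def resample_linear(pw_prev, new_len):
    """Stretch or shrink one pitch cycle to new_len samples by linear
    interpolation (endpoint-to-endpoint mapping is an assumption)."""
    old_len = len(pw_prev)
    if new_len == 1 or old_len == 1:
        return [pw_prev[0]] * new_len
    out = []
    for n in range(new_len):
        pos = n * (old_len - 1) / (new_len - 1)
        i = int(pos)
        frac = pos - i
        nxt = pw_prev[min(i + 1, old_len - 1)]
        out.append((1.0 - frac) * pw_prev[i] + frac * nxt)
    return out

def cross_fade(cd, ab):
    """Formula (16): alpha * CD + (1 - alpha) * AB, with alpha falling
    linearly from 1 to 0 over the segment (assumed ramp shape)."""
    n1 = len(cd)
    out = []
    for n in range(n1):
        alpha = 1.0 - n / (n1 - 1) if n1 > 1 else 1.0
        out.append(alpha * cd[n] + (1.0 - alpha) * ab[n])
    return out

def periodic_copy(pw, length):
    """Fill a region of `length` samples by repeating the cycle pw."""
    return [pw[i % len(pw)] for i in range(length)]
```

For a noise frame, a sketch of the flow would be: extract the last Pitch^(p-1) samples, call `resample_linear` to get pw^(p)(n), cross-fade the 32-sample CD segment against AB, then fill the DF region with `periodic_copy`.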
CN201310357211.6A 2013-08-15 2013-08-15 The denoising method of transient state noise Expired - Fee Related CN103440872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310357211.6A CN103440872B (en) 2013-08-15 2013-08-15 The denoising method of transient state noise

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310357211.6A CN103440872B (en) 2013-08-15 2013-08-15 The denoising method of transient state noise

Publications (2)

Publication Number Publication Date
CN103440872A true CN103440872A (en) 2013-12-11
CN103440872B CN103440872B (en) 2016-06-01

Family

ID=49694563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310357211.6A Expired - Fee Related CN103440872B (en) 2013-08-15 2013-08-15 The denoising method of transient state noise

Country Status (1)

Country Link
CN (1) CN103440872B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006243644A (en) * 2005-03-07 2006-09-14 Nippon Telegr & Teleph Corp <Ntt> Method for reducing noise, device, program, and recording medium
CN1956058A (en) * 2005-10-17 2007-05-02 哈曼贝克自动系统-威美科公司 Minimization of transient noises in a voice signal
US20080183466A1 (en) * 2007-01-30 2008-07-31 Rajeev Nongpiur Transient noise removal system using wavelets

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张兆伟: "语音中瞬态噪声抑制算法研究", 《CNKI中国知网》 *
张兆伟: "语音中瞬态噪声抑制算法研究", 《CNKI中国知网》, 1 May 2013 (2013-05-01) *
韩丨等: "基于MFCC的语音情感识别", 《重庆邮电大学学报(自然科学版)》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103745729B (en) * 2013-12-16 2017-01-04 深圳百科信息技术有限公司 A kind of audio frequency denoising method and system
CN103745729A (en) * 2013-12-16 2014-04-23 深圳百科信息技术有限公司 Audio de-noising method and audio de-noising system
CN103778914A (en) * 2014-01-27 2014-05-07 华南理工大学 Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching
CN103778914B (en) * 2014-01-27 2017-02-15 华南理工大学 Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching
CN105830152A (en) * 2014-01-28 2016-08-03 三菱电机株式会社 Sound collecting device, input signal correction method for sound collecting device, and mobile apparatus information system
CN104157295A (en) * 2014-08-22 2014-11-19 中国科学院上海高等研究院 Method used for detecting and suppressing transient noise
CN104157295B (en) * 2014-08-22 2018-03-09 中国科学院上海高等研究院 For detection and the method for transient suppression noise
CN106652624B (en) * 2016-10-12 2019-05-24 快创科技(大连)有限公司 A kind of medical operating analogue system based on VR technology and transient noise noise-removed technology
CN106652624A (en) * 2016-10-12 2017-05-10 大连文森特软件科技有限公司 Medical surgery simulation system based on VR technology and transient noise removal technology
CN108182953B (en) * 2017-12-27 2021-03-16 上海传英信息技术有限公司 Audio tail POP sound processing method and device
CN108182953A (en) * 2017-12-27 2018-06-19 上海传英信息技术有限公司 Audio tail portion POP voice handling methods and device
CN108899043A (en) * 2018-06-15 2018-11-27 深圳市康健助力科技有限公司 The research and realization of digital deaf-aid instantaneous noise restrainable algorithms
CN109346105A (en) * 2018-07-27 2019-02-15 南京理工大学 Directly display the pitch period spectrogram method of pitch period track
CN109346105B (en) * 2018-07-27 2022-04-15 南京理工大学 Pitch period spectrogram method for directly displaying pitch period track
CN111081269A (en) * 2018-10-19 2020-04-28 中国移动通信集团浙江有限公司 Noise detection method and system in call process
CN110010145A (en) * 2019-02-28 2019-07-12 广东工业大学 A method of eliminating electronic auscultation device grating
CN110010145B (en) * 2019-02-28 2021-05-11 广东工业大学 Method for eliminating friction sound of electronic stethoscope
CN110703144A (en) * 2019-09-08 2020-01-17 广东石油化工学院 Transformer operation state detection method and system based on discrete cosine transform
CN110703144B (en) * 2019-09-08 2021-07-09 广东石油化工学院 Transformer operation state detection method and system based on discrete cosine transform
CN114333880A (en) * 2022-03-04 2022-04-12 南京大鱼半导体有限公司 Signal processing method, device, equipment and storage medium
CN115063895A (en) * 2022-06-10 2022-09-16 深圳市智远联科技有限公司 Ticket selling method and system based on voice recognition

Also Published As

Publication number Publication date
CN103440872B (en) 2016-06-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160601