CN103440872A - Transient state noise removing method - Google Patents
- Publication number: CN103440872A
- Application number: CN2013103572116A
- Authority: CN (China)
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a transient noise removal method and belongs to the technical field of signal processing. The method computes the Mel-frequency cepstrum coefficients of the current frame and predicts the pitch period of the current frame. It then uses the Mel-frequency cepstrum coefficients to detect whether the current frame contains noise and, if noise is present, reconstructs the current frame using the predicted pitch period.
Description
Technical field
The present invention relates to a method for removing transient noise and belongs to the field of signal processing technology.
Background art
Transient additive noise in an audio signal is also called transient noise or impulsive noise. In the time domain, transient noise is discontinuous, intermittent, and pulse-like. Its energy is concentrated in short time intervals, within which it is much larger than the energy of the clean signal. Typical transient noises include desk knocks, door slams, applause, keyboard keystrokes, mouse clicks, and hammer blows. They occur in many applications, such as hearing aids, mobile phones, and video conferencing equipment. Transient noise seriously degrades audio quality, so it must be suppressed to improve the audio. Most current noise suppression algorithms target stationary and continuous noise and typically use the methods described in the document "Research on speech enhancement and related techniques", such as spectral subtraction and adaptive filtering. These algorithms have essentially no suppression effect on the transient noise described above.
Summary of the invention
The present invention is proposed in view of the above problem and provides a method for removing transient noise.
The technical solution adopted by the invention is as follows: first, compute the Mel cepstrum coefficients of the current frame and, at the same time, predict its pitch period; then use the Mel cepstrum coefficients to detect whether the frame contains noise; if noise is present, reconstruct the waveform using the predicted pitch period.
Beneficial effects of the invention: the method was tested with 20 clean speech recordings (male, female, and child speakers) and four types of noise: mouse clicks, knocks, metronome ticks, and keyboard keystrokes. The noise durations are 10 ms for mouse clicks, 20 ms for knocks and metronome ticks, and 30 ms for keyboard keystrokes. Each clean recording was mixed with each of the four noises, giving 80 noisy recordings. Each recording contains 30 noise events at equal intervals. The sampling rate of all audio is $f_s$ = 48 kHz and the frame length is N = 480. In the MFCC calculation stage, an NFFT = 1024-point FFT is used, the Mel filter bank has M = 24 filters, and L = 12 MFCC dimensions are computed. In the transient noise detection stage, the adaptive threshold is Thres = const·ener; so that the threshold suits all noise types, the constant const is set to 10, ener is the energy of each input frame, and the minimum value is set to 60.0. When the threshold is updated, the forgetting factor b is set to 0.4. In the pitch period estimation stage, the pitch period is searched in (2 ms, 12 ms), corresponding to (96, 576) samples. In the waveform reconstruction stage, the fade-in and fade-out lengths $N_1$ and $N_2$ are both 32, and the length of the buffer buf(n) is 2240. After noisy speech is denoised with the invention, speech intelligibility improves substantially and listener fatigue is reduced. The denoising effect was evaluated with two metrics, segmental signal-to-noise ratio ($SNR_{seg}$) and PEAQ; the results are shown in Figure 12 and Figure 13 of the drawings.
Description of the drawings
Fig. 1: Relation between Mel frequency and linear frequency.
Fig. 2: Flow chart of the technical solution of prior art one.
Fig. 3: Flow chart of the technical solution of prior art two.
Fig. 4: Block diagram of the present technical solution.
Fig. 5: Block diagram of MFCC feature extraction.
Fig. 6: Mel frequency filter bank.
Fig. 7: Block diagram of pitch period estimation.
Fig. 8: Linear interpolation between two points.
Fig. 9(a): Signal when the current frame is not repaired.
Fig. 9(b): The new pitch cycle waveform $pw^{(p)}(n)$.
Fig. 9(c): Signal after the current frame is repaired.
Fig. 10(a): Signal when the current frame is not repaired.
Fig. 10(b): Current frame signal.
Fig. 10(c): Signal after repair.
Fig. 11(a): Signal before denoising.
Fig. 11(b): Signal after denoising.
Fig. 12: Denoising effect evaluation (SNR).
Fig. 13: Denoising effect evaluation (PEAQ).
Detailed description of the embodiments
The invention is further described below with reference to the drawings:
Mel cepstrum coefficients:
Studies of human hearing show that the ear has different sensitivity to sound waves of different frequencies, and that speech components between 200 Hz and 5 kHz have the greatest influence on speech clarity. The ear also exhibits masking: a high-energy sound masks weaker sounds to some extent. In general, lower-frequency sounds mask higher-frequency sounds easily, while the reverse is harder; that is, the critical bandwidth for masking is smaller at low frequencies than at high frequencies. Accordingly, a bank of band-pass filters is arranged from dense to sparse according to the critical bandwidth, and the input signal is filtered by it. The energy of each band-pass filter's output is taken as a basic feature of the signal; after further processing, this feature serves as a speech feature, namely the Mel cepstrum coefficients (MFCC). This feature does not depend on the nature of the signal and makes no assumptions or restrictions about the input, while exploiting the perceptual properties of the ear. Compared with linear prediction cepstrum coefficients (LPCC), which are based on a vocal tract model, MFCC are therefore more robust and retain good speech recognition performance at low signal-to-noise ratios.
MFCC are cepstrum parameters extracted in the Mel-scale frequency domain. The Mel scale describes the nonlinear frequency perception of the human ear, and its relation to linear frequency is approximated by formula (1), in which f is the frequency in Hz. Fig. 1 shows the relation between Mel frequency and linear frequency: as $f_{linear}$ grows linearly, $f_{mel}$ grows logarithmically.
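The image of formula (1) is not reproduced in this text. As a minimal sketch, assuming the widely used form $f_{mel} = 2595 \log_{10}(1 + f/700)$ (the constants 2595 and 700 are an assumption, not taken from the patent text), the conversion and its inverse could look like this; the function names are hypothetical:

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a linear frequency in Hz to the Mel scale (assumed common form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel: float) -> float:
    """Inverse mapping from the Mel scale back to Hz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)
```

With this form, equal steps in Hz produce progressively smaller steps in Mel, which matches the logarithmic growth described for Fig. 1.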
Packet loss concealment:
In voice communication systems based on the IP protocol, i.e. voice over IP (VoIP), network congestion or transmission delay jitter can cause packet loss: some packets do not arrive at the receiver on time, which seriously degrades the received voice quality. The receiver must therefore take measures to reduce the speech distortion caused by packet loss. Such measures are generally called packet loss concealment (PLC) algorithms.
PLC algorithms fall into two classes: sender-based and receiver-based. Sender-based PLC involves both the sending and the receiving end. Receiver-based PLC recovers the original speech as far as possible using only the packets received normally, the sequence numbers of the lost packets, and the coding scheme known in advance. Because receiver-based PLC needs no additional data from the sender, it does not increase network traffic or delay. Common receiver-based PLC methods include silence substitution, last-packet repetition, template matching, pitch waveform replication, and linear prediction.
The pitch cycle waveform replication (PWR) method used here is a receiver-based PLC method.
Prior art one related to the present invention
Technical solution of prior art one
He Zhiyong et al., in the paper "Speech enhancement based on Kalman filtering in an impulsive noise environment", proposed a speech enhancement method for transient noise environments. The flow chart of the method is shown in Fig. 2. The method first finds the frequency band in which the ratio of transient noise sample energy to noisy signal sample energy is largest, then uses the energy distribution in that band to decide, frame by frame, whether the speech signal is disturbed by transient noise. For speech frames disturbed by transient noise, the method applies a Kalman filtering algorithm for denoising. In addition, the method improves the estimation of the autoregressive (AR) model parameters.
Shortcomings of prior art one
(1) For noise with a long tail, the tail may go undetected.
(2) The Kalman filter used for denoising is suited to stationary noise, not to non-stationary transient noise, so the denoising effect is limited and considerable residual noise remains, which degrades voice quality.
Prior art two related to the present invention
Technical solution of prior art two
Hetherington et al., in the invention patent "Repetitive transient noise removal", proposed a transient noise suppression method. The flow chart of the Hetherington method is shown in Fig. 3. The method first builds a model from the noise characteristics, then uses the correlation coefficient between the modeled signal and the signal under test to decide whether the data under test contains noise. If noise is present, the noise component is removed from the signal under test according to the modeled signal.
Shortcomings of prior art two
The Hetherington method can effectively remove repetitive noise. However, transient noise is highly varied: when several different types of transient noise occur within a short time, the modeling becomes inaccurate and the denoising effect of the Hetherington method is poor.
Detailed description of the technical solution of the invention
Technical problem to be solved by the invention
To enhance audio disturbed by transient noise: suppress the transient noise, improve voice quality, and improve audio intelligibility.
Complete technical solution provided by the invention:
The block diagram of the technical solution is shown in Fig. 4. MFCC parameters are extracted from the input audio signal and are then used to detect whether the audio signal contains noise. If noise is detected, the noisy frame data is replaced using the PWR method to reconstruct the waveform. If no noise is detected, the audio signal is output unchanged.
Implementation steps of the technical solution:
The input is a mono audio signal with sampling rate $f_s$ = 48 kHz. The noisy input signal x(n) can be expressed as x(n) = s(n) + d(n), where s(n) is the clean speech signal and d(n) is the transient noise signal.
(1) MFCC feature extraction of the audio signal
The MFCC extraction process is shown in Fig. 5, which is provided as a grayscale figure to illustrate the technical effect of the invention more clearly. First, the time-domain audio signal is transformed to the frequency domain and its energy spectrum is computed. The energy spectrum is then multiplied by a bank of Mel-scale triangular filters, and the discrete cosine transform (DCT) is applied to the logarithmic energies of the products. The first L dimensions of the result are called the MFCC. The specific steps for computing the MFCC are:
1) Framing. The input signal is divided into frames of 10 ms. Because the sampling frequency is $f_s$ = 48 kHz, one frame contains N = 480 samples. The data is then normalized: if the signal is quantized with 16 bits, the data is divided by $2^{15}$, which reduces the data range to (-1, 1). If the current frame is the p-th frame, then
$$x^{(p)}(n) = x[p \cdot (N-1) + n], \quad n = 0, 1, \ldots, N-1 \tag{19}$$
2) Preprocessing. Pre-emphasis and windowing are applied to the current frame,
$$y^{(p)}(n) = x^{(p)}(n) - \beta x^{(p)}(n-1) \tag{20}$$
where the pre-emphasis factor is β = 0.938, and w(n) is the Hamming window, i.e. w(n) = 0.54 - 0.46cos(nπ/N).
3) An NFFT = 1024-point FFT is applied to the preprocessed signal to obtain the frequency-domain signal $Y^{(p)}(k)$.
4) The energy spectrum $|Y^{(p)}(k)|^2$ of the frequency-domain signal is computed.
5) The energy spectrum of the frequency-domain signal is passed through a bank H of Mel-scale triangular filters to perform frequency-domain filtering.
The filter bank contains M filters. Each filter is triangular, and adjacent filters overlap, as shown in Fig. 6. The center frequency of each filter is f(m), m = 1, 2, ..., M; the invention uses M = 24. Filter design method: the upper frequency limit of the input signal, $f_s/2$ = 24 kHz, is transformed to the Mel-scale frequency domain by formula (1), giving $F_{smel}$. The interval (0, $F_{smel}$) is divided into 25 parts; excluding the two end points 0 and $F_{smel}$, the remaining 24 division points serve as the center frequencies of the 24 filters. The division points f(m) are evenly distributed on the Mel scale and are then transformed back to the linear frequency scale by formula (1). After conversion, the spacing between the f(m) narrows as m decreases and widens as m increases.
From the division points f(m), the frequency response of the triangular filter bank H(m, k) is obtained.
6) The logarithm of the energy output by each filter H(m, k) is computed, giving E(m).
Applying the discrete cosine transform to E(m) gives the L = 12 order MFCC, denoted C(l).
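Steps 2) to 6) can be sketched end to end as follows. This is a minimal illustration, not the patent's implementation, and it rests on several assumptions because the corresponding formula images are not reproduced in this text: the Mel mapping uses the common constants 2595 and 700; the window is the textbook Hamming window with argument 2πn/(N-1), which differs from the printed cos(nπ/N); the triangular response rises linearly from f(m-1) to f(m) and falls linearly to f(m+1); the logarithm is natural; and the DCT is the usual DCT-II without scaling. A direct DFT stands in for the FFT (same values, slower), and the input frame is assumed to be already normalized. All function names are hypothetical.

```python
import cmath
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


def preprocess(frame, beta=0.938):
    """Pre-emphasis y(n) = x(n) - beta * x(n-1), then a Hamming window."""
    n_pts = len(frame)
    out = []
    prev = 0.0  # assumption: the sample before the frame start is taken as 0
    for n, s in enumerate(frame):
        w = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_pts - 1))
        out.append((s - beta * prev) * w)
        prev = s
    return out


def power_spectrum(frame, nfft):
    """|Y(k)|^2 for k = 0..nfft/2, via a direct DFT of the zero-padded frame."""
    spec = []
    for k in range(nfft // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * n / nfft)
                  for n, s in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec


def mel_filterbank(fs=48000.0, nfft=1024, num_filters=24):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    step = hz_to_mel(fs / 2.0) / (num_filters + 1)
    pts = [mel_to_hz(i * step) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = pts[m - 1], pts[m], pts[m + 1]
        row = []
        for k in range(nfft // 2 + 1):
            f = k * fs / nfft  # centre frequency of DFT bin k in Hz
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, fs=48000.0, nfft=1024, num_filters=24, num_coeffs=12):
    """Return C(1)..C(L) for one frame."""
    spec = power_spectrum(preprocess(frame), nfft)
    log_e = []
    for row in mel_filterbank(fs, nfft, num_filters):
        energy = sum(p * h for p, h in zip(spec, row))
        log_e.append(math.log(max(energy, 1e-12)))  # floor avoids log(0)
    return [sum(e * math.cos(math.pi * l * (m + 0.5) / num_filters)
                for m, e in enumerate(log_e))
            for l in range(1, num_coeffs + 1)]
```

With the default parameters, `mel_filterbank()` yields 24 rows of 513 weights each, and the filter centres spread out toward high frequencies, consistent with the spacing behaviour described for f(m).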
(2) Noise detection:
The Euclidean distance dist between the MFCC of the current frame and the MFCC of the previous frame is computed.
Whether the current frame contains noise is decided from the distance value and the threshold Thres. The threshold Thres is determined adaptively by
$$Thres = 10 \cdot ener, \tag{26}$$
where ener is the energy of each frame after normalization, and its minimum value is set to 60.0.
After detection, the MFCC feature of the current frame is updated,
$$C^{(p)}(l) = b \cdot C^{(p-1)}(l) + (1-b) \cdot C^{(p-1)}(l), \tag{27}$$
where the forgetting factor is b = 0.4. When the frame following a noise frame is a speech frame, this update prevents false detection.
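The detection step can be sketched as below. Three points are assumptions rather than statements from the text: a frame is flagged as noise when dist exceeds Thres; the minimum of 60.0 is applied to the threshold itself; and, because the right-hand side of formula (27) as printed reduces to $C^{(p-1)}(l)$, the sketch blends the stored reference with the newly computed MFCC of the current frame. The function names are hypothetical.

```python
import math


def mfcc_distance(c_cur, c_ref):
    """Euclidean distance between two MFCC vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_cur, c_ref)))


def adaptive_threshold(frame, const=10.0, floor=60.0):
    """Thres = const * ener, never below the floor (assumed placement of 60.0)."""
    ener = sum(s * s for s in frame)
    return max(const * ener, floor)


def is_noise_frame(c_cur, c_ref, frame):
    """Assumed decision rule: noise if the MFCC jump exceeds the threshold."""
    return mfcc_distance(c_cur, c_ref) > adaptive_threshold(frame)


def update_reference(c_ref, c_cur, b=0.4):
    """Recursive smoothing with forgetting factor b (assumed reading of (27))."""
    return [b * r + (1.0 - b) * c for r, c in zip(c_ref, c_cur)]
```

Smoothing the reference rather than overwriting it means that a single noisy frame shifts the stored MFCC only partially, which is one way the update could limit false detections on the speech frame that follows.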
(3) Pitch period prediction:
The pitch period is estimated for each speech frame. If the current frame is a noise frame, its pitch period is predicted from the pitch periods of the two preceding frames. The block diagram of pitch period estimation is shown in Fig. 7. For different speakers the pitch period generally lies within 2 to 12 ms, so the pitch period is searched within 2 to 12 ms. Let PMAX be the number of samples corresponding to 12 ms, i.e. PMAX = 576, and PMIN the number of samples corresponding to 2 ms, i.e. PMIN = 96. A buffer buf(n) of length 3PMAX + N = 2208 is used to estimate the pitch period; buf(n) stores data that has already been output.
The pitch period is estimated as follows:
1) buf(n) is low-pass filtered to obtain $buf_d(n)$. The cutoff frequency of the low-pass filter (LPF) is 900 Hz.
2) $buf_d(n)$ is center-clipped to obtain $buf_c(n)$, where $C_L$ is the clipping level, usually set to 68% of the maximum of the normalized data.
3) The autocorrelation of $buf_c(n)$ is computed, and the position of the autocorrelation maximum is searched in the range (96, 576) and taken as the pitch period estimate Pitch.
4) To prevent frequency-doubling errors, formula (13) is used to smooth the pitch period estimates $Pitch^{(p-1)}$ and $Pitch^{(p-2)}$ of the two preceding frames.
The pitch period $Pitch^{(p)}$ of the current frame is predicted from the two smoothed pitch periods,
$$Pitch^{(p)} = Pitch^{(p-1)} + (Pitch^{(p-1)} - Pitch^{(p-2)}). \tag{32}$$
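Steps 2) and 3) and the prediction of formula (32) can be sketched as follows. The clipping formula is not reproduced in this text, so the common center clipper is assumed (samples within ±$C_L$ become 0, the rest are shifted toward 0 by $C_L$). The 900 Hz low-pass filter and the smoothing of formula (13) are omitted, and clamping the prediction to the search range is an added safeguard that the text does not state. Function names are hypothetical.

```python
import math


def center_clip(buf, ratio=0.68):
    """Assumed common center clipper with level C_L = ratio * max|buf|."""
    level = ratio * max(abs(s) for s in buf)
    out = []
    for s in buf:
        if s > level:
            out.append(s - level)
        elif s < -level:
            out.append(s + level)
        else:
            out.append(0.0)
    return out


def estimate_pitch(buf, pmin=96, pmax=576):
    """Lag in [pmin, pmax] that maximizes the autocorrelation of the clipped data."""
    clipped = center_clip(buf)
    best_lag, best_val = pmin, float("-inf")
    for lag in range(pmin, pmax + 1):
        r = sum(clipped[n] * clipped[n - lag] for n in range(lag, len(clipped)))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag


def predict_pitch(pitch_prev1, pitch_prev2, pmin=96, pmax=576):
    """Linear extrapolation per formula (32), clamped to the search range."""
    predicted = pitch_prev1 + (pitch_prev1 - pitch_prev2)
    return min(max(predicted, pmin), pmax)
```

For example, a buffer holding a sinusoid with a period of 200 samples yields an estimate of 200, and `predict_pitch(200, 190)` extrapolates the upward trend to 210.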
(4) Waveform reconstruction:
The last pitch cycle waveform of the previous frame is extracted and linearly interpolated to obtain a new pitch cycle waveform.
1) Because buf(n) stores the output frame data, the pitch cycle waveform of the previous frame can be extracted from buf(n), namely the last $Pitch^{(p-1)}$ samples of the previous frame's output signal. This waveform is denoted $pw^{(p-1)}(n)$. $pw^{(p-1)}(n)$ is linearly interpolated to obtain a new waveform of length $Pitch^{(p)}$, denoted $pw^{(p)}(n)$. The interpolation is linear interpolation between two points, as shown in Fig. 8.
2) The new waveform is used for periodic waveform replication:
a. The principle of periodic waveform replication is shown in Fig. 9(a) to Fig. 9(c). If the current frame is a noise frame (regardless of whether the previous frame is a noise frame or a clean speech frame), the procedure is as follows. According to formula (15), the AB segment and the CD segment in buf(n) are overlap-added with fade-in and fade-out, so that the data on both sides of D is continuous,
$$buf_{CD}(n) = \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{AB}(n) = \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{CD}(n - Pitch), \quad 0 \le n < N_1 \tag{34}$$
where α is the decay factor, decreasing linearly from 1 to 0, and the AB and CD segments have length $N_1$ = 32.
b. With period $Pitch^{(p)}$, the new waveform $pw^{(p)}(n)$ is copied repeatedly into the DF region. The DE segment is the repaired current frame. The EF segment has length $N_2$ = 32; when the next frame is a speech frame, it is used for fade-in and fade-out so that the data on both sides of E, i.e. between frames, is continuous.
c. One frame of data starting at point C in buf(n) is output. The output of the method is delayed by the length of the CD segment. All data in buf(n) is then shifted forward by N samples (one frame length).
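The interpolation of step 1) and steps a and b of the replication can be sketched as follows. The interpolation formula image is not reproduced in this text, so the sketch assumes that the first and last samples of the old waveform map to the first and last samples of the new one. The buffer layout (a plain list with D given as an index) and the function names are hypothetical.

```python
def resample_pitch_waveform(pw_prev, new_len):
    """Stretch or shrink one pitch cycle to new_len samples by linear interpolation."""
    old_len = len(pw_prev)
    if new_len == 1 or old_len == 1:
        return [pw_prev[0]] * new_len
    scale = (old_len - 1) / (new_len - 1)
    out = []
    for n in range(new_len):
        pos = n * scale
        i = min(int(pos), old_len - 2)  # left neighbour index
        frac = pos - i
        out.append((1.0 - frac) * pw_prev[i] + frac * pw_prev[i + 1])
    return out


def crossfade_tail(buf, d_index, pitch, n1=32):
    """Formula (34): blend segment CD with the data one pitch period earlier (AB)."""
    c_index = d_index - n1
    for n in range(n1):
        alpha = 1.0 - n / (n1 - 1)  # decays linearly from 1 to 0
        buf[c_index + n] = (alpha * buf[c_index + n]
                            + (1.0 - alpha) * buf[c_index + n - pitch])


def replicate_waveform(buf, d_index, pw_new, count):
    """Fill count samples from D onward by cycling through the new pitch waveform."""
    for n in range(count):
        buf[d_index + n] = pw_new[n % len(pw_new)]
```

In the setting described above, `count` would be N + $N_2$ = 512 samples so that both the DE and EF segments are filled. The in-place cross-fade reads only unmodified samples as long as the pitch period is at least $N_1$, which holds for the 96 to 576 sample search range.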
Figures 10(a) to 10(c) illustrate the repair: Figure 10(a) shows the signal when the current frame is not repaired and is discarded; Figure 10(b) shows the current frame reconstructed with the method of this patent; Figure 10(c) shows the signal after repair. If the current frame is a clean speech frame and the previous frame is a noise frame, the procedure is as follows:
a. At this point the DG segment in buf(n) holds the EF segment data of the previous frame. The DG segment is fused with the first $N_2$ data points of the current input frame (computed similarly to formula (15)), and the result is stored in DG.
b. The remaining data points of the current frame are copied unchanged into buf(n) after point G.
c. One frame of data starting at point C is output, and all data in buf(n) is then shifted forward by one frame length, i.e. N samples.
If both the current frame and the previous frame are clean speech frames, the current input frame is copied unchanged into the region to be repaired in buf(n), i.e. the DE region in Fig. 8, and one frame of data starting at point C is output.
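The transition from a repaired frame back to clean speech can be sketched as a short cross-fade. The text only says the fusion is computed similarly to the overlap-add formula, so the direction of the fade (replicated tail fading out, incoming clean frame fading in) is an assumption, as is the hypothetical function name.

```python
def fuse_transition(tail, frame, n2=32):
    """Blend the stored replicated tail (segment DG) into the start of a clean frame.

    Returns the fused first n2 samples followed by the untouched rest of the frame.
    """
    if len(tail) != n2 or len(frame) < n2:
        raise ValueError("tail must hold n2 samples and frame at least n2")
    fused = []
    for n in range(n2):
        alpha = 1.0 - n / (n2 - 1)  # weight of the replicated tail, 1 -> 0
        fused.append(alpha * tail[n] + (1.0 - alpha) * frame[n])
    return fused + list(frame[n2:])
```

The returned list has the same length as the input frame, so it can be written into buf(n) starting at point D in place of a plain copy.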
Beneficial effects of the technical solution of the invention:
The method was tested with 20 clean speech recordings (male, female, and child speakers) and four types of noise: mouse clicks, knocks, metronome ticks, and keyboard keystrokes. The noise durations are 10 ms for mouse clicks, 20 ms for knocks and metronome ticks, and 30 ms for keyboard keystrokes. Each clean recording was mixed with each of the four noises, giving 80 noisy recordings. Each recording contains 30 noise events at equal intervals.
The sampling rate of all audio is $f_s$ = 48 kHz and the frame length is N = 480. In the MFCC calculation stage, an NFFT = 1024-point FFT is used, the Mel filter bank has M = 24 filters, and L = 12 MFCC dimensions are computed. In the transient noise detection stage, the adaptive threshold is Thres = const·ener; so that the threshold suits all noise types, the constant const is set to 10, ener is the energy of each input frame, and the minimum value is set to 60.0. When the threshold is updated, the forgetting factor b is set to 0.4. In the pitch period estimation stage, the pitch period is searched in (2 ms, 12 ms), corresponding to (96, 576) samples. In the waveform reconstruction stage, the fade-in and fade-out lengths $N_1$ and $N_2$ are both 32, and the length of the buffer buf(n) is 2240.
After noisy speech is denoised with the invention, speech intelligibility improves substantially and listener fatigue is reduced. The denoising effect was evaluated with two metrics, segmental signal-to-noise ratio ($SNR_{seg}$) and PEAQ.
The results are shown in Figure 12 and Figure 13. Figure 12 compares the objective audio quality of the noisy signals before and after denoising using signal-to-noise ratio; Figure 13 makes the same comparison using PEAQ.
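The formula used for the segmental signal-to-noise ratio is not reproduced in this text. A minimal sketch, assuming the common definition (the mean over frames of the per-frame SNR in dB between the clean reference and the processed signal), is given below; the frame length default, the small `eps` guard, and the function name are illustrative assumptions.

```python
import math


def segmental_snr(clean, processed, frame_len=480, eps=1e-12):
    """Mean per-frame SNR in dB between a clean reference and a processed signal."""
    num_frames = min(len(clean), len(processed)) // frame_len
    if num_frames == 0:
        raise ValueError("signals are shorter than one frame")
    total = 0.0
    for k in range(num_frames):
        start = k * frame_len
        sig = 0.0
        err = 0.0
        for n in range(start, start + frame_len):
            sig += clean[n] ** 2
            err += (clean[n] - processed[n]) ** 2
        # eps keeps the ratio finite for silent or perfectly matching frames
        total += 10.0 * math.log10((sig + eps) / (err + eps))
    return total / num_frames
```

Because each frame contributes equally, short bursts of residual noise lower this score more than they would lower a global SNR, which makes the metric a reasonable fit for transient noise.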
The spectrograms of the noisy signal and of the signal denoised with this solution are shown in Figure 11(a) and Figure 11(b), which are provided as grayscale figures to illustrate the technical effect of the invention more clearly. Figure 11(a) is the spectrogram of audio contaminated by mouse clicks; Figure 11(b) is the spectrogram of the noisy audio of Figure 11(a) after denoising.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any equivalent replacement or modification made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.
Abbreviations and key terms used in the invention
AR: Autoregressive Model.
DCT: Discrete Cosine Transform.
FFT: Fast Fourier Transform.
LPF: Low Pass Filter.
LPCC: Linear Prediction Cepstrum Coefficient.
MFCC: Mel Frequency Cepstrum Coefficient.
VoIP: Voice over IP.
PLC: Packet Loss Concealment.
PWR: Pitch Waveform Replication.
SNR: Signal-to-Noise Ratio.
PEAQ: Perceptual Evaluation of Audio Quality, an objective standard for assessing perceived audio quality, specified in Recommendation ITU-R BS.1387.
Claims (5)
1. A method for removing transient noise, characterized in that: the Mel cepstrum coefficients of the current frame are first computed and, at the same time, the pitch period of the current frame is predicted; the Mel cepstrum coefficients are then used to detect whether the current frame contains noise; if noise is present, the waveform is reconstructed using the predicted pitch period.
2. The method for removing transient noise according to claim 1, characterized in that the Mel cepstrum coefficients are computed as follows:
1) the input signal is divided into frames of length N = 480, i.e. 10 ms of data, and the data is normalized; if the current frame is the p-th frame, then
$$x^{(p)}(n) = x[p \cdot (N-1) + n], \quad n = 0, 1, \ldots, N-1; \tag{1}$$
2) preprocessing: pre-emphasis and windowing are applied to the current frame,
$$y^{(p)}(n) = x^{(p)}(n) - \beta x^{(p)}(n-1); \tag{2}$$
where the pre-emphasis factor is β = 0.938, and w(n) is the Hamming window, i.e. w(n) = 0.54 - 0.46cos(nπ/N);
3) an NFFT = 1024-point FFT is applied to the preprocessed signal to obtain the frequency-domain signal $Y^{(p)}(k)$;
4) the energy spectrum $|Y^{(p)}(k)|^2$ of the frequency-domain signal is computed;
5) the energy spectrum of the frequency-domain signal is passed through a bank H of Mel-scale triangular filters to perform frequency-domain filtering;
the filter bank contains M filters, each filter is triangular, and adjacent filters overlap; the center frequency of each filter is f(m), m = 1, 2, ..., M, with M = 24;
filter design method: the upper frequency limit of the input signal, $f_s/2$ = 24 kHz, is transformed to the Mel-scale frequency domain by the Mel-frequency conversion formula, in which f is the frequency in Hz, giving $F_{smel}$; the interval (0, $F_{smel}$) is divided into 25 parts; excluding the two end points 0 and $F_{smel}$, the remaining 24 division points serve as the center frequencies of the 24 filters; the division points f(m) are evenly distributed on the Mel scale and are then transformed back to the linear frequency scale by formula (1); after conversion, the spacing between the f(m) narrows as m decreases and widens as m increases; from the division points f(m), the frequency response of the triangular filter bank H(m, k) is obtained;
6) the logarithm of the energy output by each filter H(m, k) is computed, giving E(m);
applying the discrete cosine transform to E(m) gives the L = 12 order MFCC, denoted C(l), according to formula (6).
3. The method for removing transient noise according to claim 1, characterized in that noise is detected as follows:
the Euclidean distance dist between the MFCC of the current frame and the MFCC of the previous frame is computed;
whether the current frame contains noise is decided from the distance value and the threshold Thres; the threshold Thres is determined adaptively by
$$Thres = 10 \cdot ener, \tag{8}$$
where ener is the energy of each frame after normalization, and its minimum value is set to 60.0; after detection, the MFCC feature of the current frame is updated,
$$C^{(p)}(l) = b \cdot C^{(p-1)}(l) + (1-b) \cdot C^{(p-1)}(l), \tag{9}$$
where the forgetting factor is b = 0.4; when the frame following a noise frame is a speech frame, this update prevents false detection.
4. the denoising method of transient noise according to claim 1 is characterized in that: the method for pitch period prediction is as follows:
1) buf (n) is carried out to low-pass filtering, obtain buf
d(n); Wherein the cutoff frequency of low-pass filter (LPF) is 900Hz;
2) to buf
d(n) carry out center clipping, obtain buf
c(n),
C wherein
lfor clipping lever, usually be made as normalization data peaked 68%;
3) to buf
c(n) carry out auto-correlation computation, the autocorrelative maximum value position of search in (96,576) scope, using it as pitch period estimated value Pitch;
4) To prevent frequency-doubling errors, use formula (13) to smooth the pitch period predictions of the two preceding frames, Pitch^(p-1) and Pitch^(p-2),
Predict the pitch period of the current frame, Pitch^(p), from the two smoothed pitch periods,
Pitch^(p) = Pitch^(p-1) + (Pitch^(p-1) - Pitch^(p-2)). (14)
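Steps 2) to 4) can be sketched in Python as follows. This is a hedged illustration: the clipping function uses the conventional center-clipping form (subtract the level above it, add it below its negative, zero in between), assumed because the claim's clipping formula is missing from the text; the peak is taken as the maximum absolute sample; the lag bounds are treated as inclusive; the 900 Hz low-pass filter and the smoothing of formula (13) are omitted because their details are not given here. Function names are hypothetical.

```python
def center_clip(x, ratio=0.68):
    """Center-clip x with clipping level C_L = ratio * max(|x|)."""
    c_l = ratio * max(abs(v) for v in x)
    out = []
    for v in x:
        if v > c_l:
            out.append(v - c_l)
        elif v < -c_l:
            out.append(v + c_l)
        else:
            out.append(0.0)
    return out

def autocorr_pitch(x, lag_min=96, lag_max=576):
    """Return the lag in [lag_min, lag_max] with the largest autocorrelation."""
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, min(lag_max, len(x) - 1) + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag

def predict_pitch(pitch_prev1, pitch_prev2):
    """Formula (14): linear extrapolation from the two previous pitch periods."""
    return pitch_prev1 + (pitch_prev1 - pitch_prev2)
```

For example, `predict_pitch(200, 190)` extrapolates the 10-sample increase between the two previous frames and returns 210.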
5. The transient noise denoising method according to claim 1, characterized in that the waveform reconstruction method is:
1) Because buf(n) stores the output frame data, the pitch-period waveform of the previous frame can be extracted from buf(n), namely the last Pitch^(p-1) samples of the previous frame's output signal; this waveform data is denoted pw^(p-1)(n). Linearly interpolate pw^(p-1)(n) to obtain a new waveform of length Pitch^(p), denoted pw^(p)(n); the interpolation formula is
(15)
2) Use the new waveform for periodic waveform replication, as follows:
If the current frame is a noise frame, regardless of whether the previous frame is a noise frame or a clean speech frame, the processing is as follows:
a. According to formula (15), overlap-add the AB segment and the CD segment of buf(n) with a cross-fade (fade-in and fade-out), so that the data on both sides of D are continuous,
buf_CD(n) = α·buf_CD(n) + (1-α)·buf_AB(n) = α·buf_CD(n) + (1-α)·buf_CD(n-Pitch), 0 ≤ n < N_1 (16)
where α is the attenuation factor, decaying linearly from 1 to 0, and the AB and CD segments both have length N_1 = 32;
b. According to the period Pitch^(p), copy the new waveform pw^(p)(n) repeatedly into the DF region. The DE segment is the repaired current frame; the EF segment has length N_2 = 32 and is used for cross-fading the data when the next frame is a speech frame, ensuring continuity on both sides of E, that is, between frames;
c. Output one frame of data starting at point C in buf(n). The output of this method is delayed, the delay being the duration of the CD segment. Then shift all data in buf(n) forward by N points;
If the current frame is a clean speech frame and the previous frame is a noise frame, the processing is as follows:
a. At this point the DG segment of buf(n) holds the EF segment data of the previous frame. Fuse the DG segment with the first N_2 data points of the current input frame, the calculation being similar to formula (15), and store the result in DG;
b. Copy the remaining data points of the current frame directly after point G in buf(n);
c. Output one frame length of data starting at point C, then shift all data in buf(n) forward by N points, that is, one frame length.
If the current frame and the previous frame are both clean speech frames, copy the input data of the current frame directly into the region to be repaired in the middle of buf(n), and output one frame length of data starting at point C.
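The three building blocks of the reconstruction (stretching the previous pitch waveform, cross-fading, and periodic copying) can be sketched in Python as follows. This is an illustration under assumptions, not the claimed implementation: the interpolation maps the first and last samples of the old waveform onto the first and last samples of the new one, assumed because formula (15) is missing from the text; α is taken as 1 - n/(N_1 - 1) so that it reaches exactly 0 at the last sample; buffer management around points A to G is left out. All function names are hypothetical.

```python
def resample_linear(pw_prev, new_len):
    """Linearly interpolate pw_prev to new_len samples (endpoints preserved)."""
    old_len = len(pw_prev)
    if new_len == 1 or old_len == 1:
        return [pw_prev[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (old_len - 1) / (new_len - 1)
        j = int(pos)
        frac = pos - j
        j2 = min(j + 1, old_len - 1)
        out.append((1.0 - frac) * pw_prev[j] + frac * pw_prev[j2])
    return out

def crossfade(cd, ab):
    """Formula (16): alpha * cd + (1 - alpha) * ab, alpha falling from 1 to 0."""
    n1 = len(cd)
    out = []
    for n in range(n1):
        # ASSUMPTION: alpha hits 0 exactly at the last sample of the segment.
        alpha = 1.0 - n / (n1 - 1) if n1 > 1 else 1.0
        out.append(alpha * cd[n] + (1.0 - alpha) * ab[n])
    return out

def periodic_fill(pw, length):
    """Copy the pitch waveform pw repeatedly until `length` samples are filled."""
    return [pw[i % len(pw)] for i in range(length)]
```

A repaired frame would then be built by calling `resample_linear` on the last Pitch^(p-1) output samples and passing the result to `periodic_fill` with the length of the DF region, while `crossfade` covers both the AB/CD overlap and the data fusion at the start of the next speech frame.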
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310357211.6A CN103440872B (en) | 2013-08-15 | 2013-08-15 | The denoising method of transient state noise |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103440872A true CN103440872A (en) | 2013-12-11 |
CN103440872B CN103440872B (en) | 2016-06-01 |
Family
ID=49694563
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310357211.6A Expired - Fee Related CN103440872B (en) | 2013-08-15 | 2013-08-15 | The denoising method of transient state noise |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103440872B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103745729A (en) * | 2013-12-16 | 2014-04-23 | 深圳百科信息技术有限公司 | Audio de-noising method and audio de-noising system |
CN103778914A (en) * | 2014-01-27 | 2014-05-07 | 华南理工大学 | Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching |
CN104157295A (en) * | 2014-08-22 | 2014-11-19 | 中国科学院上海高等研究院 | Method used for detecting and suppressing transient noise |
CN105830152A (en) * | 2014-01-28 | 2016-08-03 | 三菱电机株式会社 | Sound collecting device, input signal correction method for sound collecting device, and mobile apparatus information system |
CN106652624A (en) * | 2016-10-12 | 2017-05-10 | 大连文森特软件科技有限公司 | Medical surgery simulation system based on VR technology and transient noise removal technology |
CN108182953A (en) * | 2017-12-27 | 2018-06-19 | 上海传英信息技术有限公司 | Audio tail portion POP voice handling methods and device |
CN108899043A (en) * | 2018-06-15 | 2018-11-27 | 深圳市康健助力科技有限公司 | The research and realization of digital deaf-aid instantaneous noise restrainable algorithms |
CN109346105A (en) * | 2018-07-27 | 2019-02-15 | 南京理工大学 | Directly display the pitch period spectrogram method of pitch period track |
CN110010145A (en) * | 2019-02-28 | 2019-07-12 | 广东工业大学 | A method of eliminating electronic auscultation device grating |
CN110703144A (en) * | 2019-09-08 | 2020-01-17 | 广东石油化工学院 | Transformer operation state detection method and system based on discrete cosine transform |
CN111081269A (en) * | 2018-10-19 | 2020-04-28 | 中国移动通信集团浙江有限公司 | Noise detection method and system in call process |
CN114333880A (en) * | 2022-03-04 | 2022-04-12 | 南京大鱼半导体有限公司 | Signal processing method, device, equipment and storage medium |
CN115063895A (en) * | 2022-06-10 | 2022-09-16 | 深圳市智远联科技有限公司 | Ticket selling method and system based on voice recognition |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006243644A (en) * | 2005-03-07 | 2006-09-14 | Nippon Telegr & Teleph Corp <Ntt> | Method for reducing noise, device, program, and recording medium |
CN1956058A (en) * | 2005-10-17 | 2007-05-02 | 哈曼贝克自动系统-威美科公司 | Minimization of transient noises in a voice signal |
US20080183466A1 (en) * | 2007-01-30 | 2008-07-31 | Rajeev Nongpiur | Transient noise removal system using wavelets |
Non-Patent Citations (3)
Title |
---|
张兆伟: "语音中瞬态噪声抑制算法研究", 《CNKI中国知网》 * |
张兆伟: "语音中瞬态噪声抑制算法研究", 《CNKI中国知网》, 1 May 2013 (2013-05-01) * |
韩丨等: "基于MFCC的语音情感识别", 《重庆邮电大学学报(自然科学版)》 * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103745729B (en) * | 2013-12-16 | 2017-01-04 | 深圳百科信息技术有限公司 | A kind of audio frequency denoising method and system |
CN103745729A (en) * | 2013-12-16 | 2014-04-23 | 深圳百科信息技术有限公司 | Audio de-noising method and audio de-noising system |
CN103778914A (en) * | 2014-01-27 | 2014-05-07 | 华南理工大学 | Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching |
CN103778914B (en) * | 2014-01-27 | 2017-02-15 | 华南理工大学 | Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching |
CN105830152A (en) * | 2014-01-28 | 2016-08-03 | 三菱电机株式会社 | Sound collecting device, input signal correction method for sound collecting device, and mobile apparatus information system |
CN104157295A (en) * | 2014-08-22 | 2014-11-19 | 中国科学院上海高等研究院 | Method used for detecting and suppressing transient noise |
CN104157295B (en) * | 2014-08-22 | 2018-03-09 | 中国科学院上海高等研究院 | For detection and the method for transient suppression noise |
CN106652624B (en) * | 2016-10-12 | 2019-05-24 | 快创科技(大连)有限公司 | A kind of medical operating analogue system based on VR technology and transient noise noise-removed technology |
CN106652624A (en) * | 2016-10-12 | 2017-05-10 | 大连文森特软件科技有限公司 | Medical surgery simulation system based on VR technology and transient noise removal technology |
CN108182953B (en) * | 2017-12-27 | 2021-03-16 | 上海传英信息技术有限公司 | Audio tail POP sound processing method and device |
CN108182953A (en) * | 2017-12-27 | 2018-06-19 | 上海传英信息技术有限公司 | Audio tail portion POP voice handling methods and device |
CN108899043A (en) * | 2018-06-15 | 2018-11-27 | 深圳市康健助力科技有限公司 | The research and realization of digital deaf-aid instantaneous noise restrainable algorithms |
CN109346105A (en) * | 2018-07-27 | 2019-02-15 | 南京理工大学 | Directly display the pitch period spectrogram method of pitch period track |
CN109346105B (en) * | 2018-07-27 | 2022-04-15 | 南京理工大学 | Pitch period spectrogram method for directly displaying pitch period track |
CN111081269A (en) * | 2018-10-19 | 2020-04-28 | 中国移动通信集团浙江有限公司 | Noise detection method and system in call process |
CN110010145A (en) * | 2019-02-28 | 2019-07-12 | 广东工业大学 | A method of eliminating electronic auscultation device grating |
CN110010145B (en) * | 2019-02-28 | 2021-05-11 | 广东工业大学 | Method for eliminating friction sound of electronic stethoscope |
CN110703144A (en) * | 2019-09-08 | 2020-01-17 | 广东石油化工学院 | Transformer operation state detection method and system based on discrete cosine transform |
CN110703144B (en) * | 2019-09-08 | 2021-07-09 | 广东石油化工学院 | Transformer operation state detection method and system based on discrete cosine transform |
CN114333880A (en) * | 2022-03-04 | 2022-04-12 | 南京大鱼半导体有限公司 | Signal processing method, device, equipment and storage medium |
CN115063895A (en) * | 2022-06-10 | 2022-09-16 | 深圳市智远联科技有限公司 | Ticket selling method and system based on voice recognition |
Also Published As
Publication number | Publication date |
---|---|
CN103440872B (en) | 2016-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103440872B (en) | The denoising method of transient state noise | |
CN103440871B (en) | A kind of method that in voice, transient noise suppresses | |
CN103854662B (en) | Adaptive voice detection method based on multiple domain Combined estimator | |
KR100330230B1 (en) | Noise suppression for low bitrate speech coder | |
EP2151822A1 (en) | Apparatus and method for processing and audio signal for speech enhancement using a feature extraction | |
CN105023572A (en) | Noised voice end point robustness detection method | |
EP3411876B1 (en) | Babble noise suppression | |
CN106885971B (en) | Intelligent background noise reduction method for cable fault detection pointing instrument | |
CN106340292A (en) | Voice enhancement method based on continuous noise estimation | |
CN104658544A (en) | Method for inhibiting transient noise in voice | |
CN110047470A (en) | A kind of sound end detecting method | |
CN103109320A (en) | Noise suppression device | |
Verteletskaya et al. | Noise reduction based on modified spectral subtraction method | |
CN103474074B (en) | Pitch estimation method and apparatus | |
Hu et al. | A cepstrum-based preprocessing and postprocessing for speech enhancement in adverse environments | |
US20160365099A1 (en) | Method and system for consonant-vowel ratio modification for improving speech perception | |
CN103578477A (en) | Denoising method and device based on noise estimation | |
EP2362390A1 (en) | Noise suppression | |
CN106297795A (en) | Audio recognition method and device | |
Shome et al. | Reference free speech quality estimation for diverse data condition | |
Kamble et al. | Teager energy subband filtered features for near and far-field automatic speech recognition | |
Jebara | A perceptual approach to reduce musical noise phenomenon with wiener denoising technique | |
Flynn et al. | Combined speech enhancement and auditory modelling for robust distributed speech recognition | |
Yuan et al. | Noise estimation based on time–frequency correlation for speech enhancement | |
Lu | Reduction of musical residual noise using block-and-directional-median filter adapted by harmonic properties |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160601 |