CN103440872B - Method for removing transient noise - Google Patents
- Publication number
- CN103440872B (CN201310357211.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a method for removing transient noise and belongs to the field of signal processing. The method first computes the Mel-frequency cepstral coefficients (MFCC) of the current frame and, at the same time, predicts the pitch period of that frame. It then uses the MFCC to detect whether the frame contains noise and, if it does, reconstructs the frame using the predicted pitch period.
Description
Technical field
The present invention relates to a method for removing transient noise and belongs to the field of signal processing.
Background
Transient additive noise in an audio signal is also called transient noise or impulse noise. Transient noise is typically discontinuous, intermittent, and pulse-like in the time domain. Its energy is concentrated in short time intervals, and within such an interval it is much larger than the energy of the clean signal. Typical examples are knocking on a desk, a door closing, clamor, keyboard keystrokes, mouse clicks, and hammer blows. Such noise occurs in many applications, for example hearing aids, mobile phones, and video-conferencing equipment. Transient noise severely degrades audio quality, so it must be suppressed to improve that quality. Most current noise-suppression algorithms target stationary and continuous noise. Speech enhancement is usually performed with the methods described in the document "Research on speech enhancement and related techniques", such as spectral subtraction and adaptive filtering, but these algorithms have essentially no suppressing effect on the transient noise described above.
Summary of the invention
The present invention was developed to address the above problem and provides a method for removing transient noise.
The technical solution adopted by the invention is as follows: first compute the Mel-frequency cepstral coefficients (MFCC) of the current frame and, at the same time, predict its pitch period; then use the MFCC to detect whether the frame contains noise (noise detection); if noise is present, use the predicted pitch period to reconstruct the waveform.
Beneficial effects of the invention: the method was tested with 20 clean speech recordings (adult male, adult female, and child speakers) and four types of noise: mouse clicks, knocking, metronome ticks, and keyboard keystrokes. The noise durations are 10 ms for mouse clicks, 20 ms for knocking and metronome ticks, and 30 ms for keyboard keystrokes. Each clean recording was mixed with each of the four noise types, giving 80 noisy recordings. Each recording contains 30 noise events, equally spaced. The sampling rate of all audio is fs = 48 kHz and the frame length is N = 480. In the MFCC stage, an NFFT = 1024-point FFT is used, the Mel filter bank has M = 24 filters, and L = 12 MFCC are computed. In the transient noise detection stage, the adaptive threshold is Thres = const · ener; so that the threshold suits all noise types, the constant const is set to 10, ener is the energy of each input frame, and the minimum value is set to 60.0. When the threshold is updated, the forgetting factor b is 0.4. In the pitch period estimation stage, the pitch period is searched in (2 ms, 12 ms), which corresponds to (96, 576) samples. In the waveform reconstruction stage, the fade lengths N1 and N2 are both 32, and the length of the buffer buf(n) is 2240. Denoising noisy speech with the invention substantially improves speech intelligibility and reduces listener fatigue. The denoising performance was evaluated with two measures, segmental signal-to-noise ratio (SNRseg) and PEAQ; the results are shown in Fig. 12 and Fig. 13 of the drawings.
Brief description of the drawings
Fig. 1: Relation between Mel frequency and linear frequency.
Fig. 2: Flow chart of the technical solution of prior art 1.
Fig. 3: Flow chart of the technical solution of prior art 2.
Fig. 4: Block diagram of the present technical solution.
Fig. 5: Block diagram of MFCC feature extraction.
Fig. 6: Mel-frequency filter bank.
Fig. 7: Block diagram of pitch period estimation.
Fig. 8: Linear interpolation between two points.
Fig. 9(a): Signal before the current frame is repaired.
Fig. 9(b): New pitch-cycle waveform $pw^{(p)}(n)$.
Fig. 9(c): Signal after the current frame is repaired.
Fig. 10(a): Signal before the current frame is repaired.
Fig. 10(b): Current frame signal.
Fig. 10(c): Signal after repair.
Fig. 11(a): Signal before denoising.
Fig. 11(b): Signal after denoising.
Fig. 12: Evaluation of the denoising performance (SNR).
Fig. 13: Evaluation of the denoising performance (PEAQ).
Embodiments
The invention is further described below with reference to the drawings.
Mel-frequency cepstral coefficients:
Research on the human auditory mechanism shows that the ear is not equally sensitive to sound waves of different frequencies; speech components between 200 Hz and 5 kHz affect intelligibility the most. In addition, the ear exhibits masking: a high-energy speech signal masks a weaker one to some extent. In general, a lower-frequency sound masks a higher-frequency sound easily, whereas the reverse is harder; that is, the critical bandwidth of masking is smaller at low frequencies than at high frequencies. Accordingly, a bank of band-pass filters is arranged from dense to sparse according to the critical bandwidth, and the input signal is filtered with it. The energy of each band-pass filter's output is taken as a basic feature of the signal; after further processing, this feature can serve as a speech feature, namely the Mel-frequency cepstral coefficients (MFCC). This feature does not depend on the nature of the signal, that is, it makes no assumption about and places no restriction on the input signal, and it also exploits the perceptual properties of the ear. Compared with linear prediction cepstral coefficients (LPCC), which are based on a vocal tract model, it is therefore more robust and retains good speech recognition performance at low signal-to-noise ratios.
MFCC are cepstral parameters extracted in the Mel-scale frequency domain. The Mel scale describes the nonlinear frequency perception of the ear, and its relation to frequency can be approximated as
$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \tag{1}$$
where f is the frequency in Hz. Fig. 1 shows the relation between Mel frequency and linear frequency: as $f_{linear}$ increases linearly, $f_{mel}$ increases logarithmically.
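The mapping between linear frequency and the Mel scale can be sketched as below. This is a minimal illustration that assumes the widely used approximation 2595 · log10(1 + f/700); the function names `hz_to_mel` and `mel_to_hz` are illustrative and do not come from the patent.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the Mel scale (assumed common approximation)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel: float) -> float:
    """Inverse mapping, from Mel back to Hz."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)


if __name__ == "__main__":
    # Equal steps in Hz give ever smaller steps in Mel as the frequency grows.
    for f in (200.0, 1000.0, 5000.0, 24000.0):
        print(f"{f:8.0f} Hz -> {hz_to_mel(f):8.1f} mel")
```

With this approximation 1000 Hz maps to roughly 1000 mel, and the curve flattens towards high frequencies, which is the logarithmic growth described for Fig. 1.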
Packet loss concealment:
In voice communication systems based on the IP protocol, such as voice over IP (VoIP), network congestion or delay jitter during transmission can cause packet loss: some packets do not arrive at the receiver on time, which severely degrades the voice quality at the receiver. The receiver must therefore take measures to reduce the speech distortion caused by packet loss. Measures of this kind are generally called packet loss concealment (PLC) algorithms.
PLC algorithms fall into two main classes: sender-based and receiver-based. Sender-based PLC involves both the sending and the receiving end. Receiver-based PLC recovers the original speech as far as possible using only the packets received correctly, the sequence numbers of the lost packets, and the coding scheme known in advance. Because receiver-based PLC needs no additional data from the sender, it increases neither network traffic nor delay. Common receiver-based PLC methods include silence substitution, repetition of the previous packet, template matching, pitch waveform replication, and linear prediction.
The pitch waveform replication (PWR) method proposed here is a receiver-based PLC method.
Prior art 1 related to the invention
Technical solution of prior art 1
He Zhiyong et al., in the paper "Speech enhancement based on Kalman filtering in impulse noise environments", proposed a speech enhancement method for transient noise conditions. The flow chart of the method is shown in Fig. 2. The method first finds the frequency band in which the ratio of transient noise sample energy to noisy signal sample energy is largest, and then uses the energy distribution in this band to decide, frame by frame, whether the speech signal is disturbed by transient noise. On this basis, the method applies a Kalman filtering algorithm to denoise the speech frames disturbed by transient noise. The method also improves the estimation of the autoregressive (AR) model parameters.
Shortcomings of prior art 1
(1) For noise with a long tail, the tail portion may not be detected.
(2) The Kalman filtering used for denoising is suited to stationary noise, not to non-stationary transient noise. The denoising effect is therefore limited, and considerable residual noise remains and degrades speech quality.
Prior art 2 related to the invention
Technical solution of prior art 2
Hetherington et al., in the patent "Repetitive transient noise removal", proposed a transient noise suppression method. The flow chart of the Hetherington method is shown in Fig. 3. The method first builds a model from the characteristics of the noise, then uses the correlation coefficient between the modeled signal and the signal under test to decide whether the data under test contain noise. If noise is present, the noise component is removed from the signal under test according to the modeled signal.
Shortcomings of prior art 2
The Hetherington method denoises repetitive noise effectively. Transient noise comes in many varieties, however, and when several different types of transient noise occur within a short time the model becomes inaccurate and the denoising performance of the Hetherington method is poor.
Detailed description of the technical solution of the invention
Technical problem to be solved by the invention
Perform speech enhancement on audio disturbed by transient noise: suppress the transient noise, improve speech quality, and improve audio intelligibility.
Complete technical solution provided by the invention:
The block diagram of the technical solution is shown in Fig. 4. MFCC parameters are extracted from the input audio signal and then used to detect whether the signal contains noise. If noise is detected, the PWR method replaces the noisy frame data and reconstructs the waveform; if no noise is detected, the audio signal is output unchanged.
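The control flow of Fig. 4 can be sketched as a per-frame loop. This is only an outline under stated assumptions: `is_noisy` and `rebuild` are hypothetical callables standing in for the MFCC-based detector and the PWR reconstruction, and the way they receive their inputs is chosen for illustration.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def denoise_stream(frames: Sequence[Frame],
                   is_noisy: Callable[[Frame], bool],
                   rebuild: Callable[[List[Frame]], Frame]) -> List[Frame]:
    """Pass clean frames through unchanged and replace noisy frames.

    is_noisy: hypothetical detector (MFCC distance against a threshold).
    rebuild:  hypothetical reconstruction from frames that were already output.
    """
    out: List[Frame] = []
    for frame in frames:
        if is_noisy(frame):
            out.append(rebuild(out))   # reconstruct from past output only
        else:
            out.append(list(frame))    # output as-is
    return out
```

Reconstruction works from the output history rather than from the input, which matches the later statement that buf(n) stores output data.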
Implementation steps of the technical solution:
The input is a mono audio signal with sampling rate fs = 48 kHz. The noisy input signal x(n) can be written as x(n) = s(n) + d(n), where s(n) is the clean speech signal and d(n) is the transient noise signal.
(1) Extracting the MFCC features of the audio signal
The MFCC extraction process is shown in Fig. 5; a grayscale version of the figure is provided for the examiner's reference so that the technical effect of the invention can be understood more clearly. The time-domain audio signal is first transformed to the frequency domain and its energy spectrum is computed. The energy spectrum is then multiplied by a bank of Mel-scale triangular filters, and a discrete cosine transform (DCT) is applied to the logarithm of the filter output energies. The first L components obtained in this way are the MFCC. The detailed steps are:
1) Framing. The frame length is 10 ms; since the sampling frequency is fs = 48 kHz, one frame contains N = 480 samples. The data are then normalized: if the signal is quantized with 16 bits, the data are divided by $2^{15}$, which reduces their range to (-1, 1). If the current frame is the p-th frame, then
$$x^{(p)}(n) = x[p \cdot (N-1) + n], \quad n = 0, 1, \ldots, N-1 \tag{19}$$
2) Preprocessing. Pre-emphasis and windowing are applied to the current frame:
$$y^{(p)}(n) = x^{(p)}(n) - \beta x^{(p)}(n-1) \tag{20}$$
where the pre-emphasis factor is β = 0.938 and w(n) is the Hamming window, $w(n) = 0.54 - 0.46\cos(n\pi/N)$.
3) An NFFT = 1024-point FFT is applied to the preprocessed signal to obtain the frequency-domain signal $Y^{(p)}(k)$.
4) The energy spectrum $|Y^{(p)}(k)|^2$ of the frequency-domain signal $Y^{(p)}(k)$ is computed.
5) The energy spectrum is passed through a bank H of Mel-scale triangular filters, which filters it in the frequency domain.
The filter bank contains M triangular filters that overlap one another, as shown in Fig. 6. The center frequency of each filter is f(m), m = 1, 2, ..., M; the invention uses M = 24. Filter design: the upper band edge of the input signal, fs/2 = 24 kHz, is transformed to the Mel-scale frequency domain by formula (1), giving $Fs_{mel}$. The interval (0, $Fs_{mel}$) is divided into 25 equal parts; after the two end points 0 and $Fs_{mel}$ are removed, the remaining 24 division points serve as the center frequencies of the 24 filters. The division points f(m) are thus uniformly spaced on the Mel scale and are then transformed back to the linear frequency scale by formula (1). After the transformation, the spacing between the f(m) narrows as m decreases and widens as m increases.
The frequency response H(m, k) of the triangular filter bank is then obtained from the division points f(m).
6) The logarithm of the energy output by each filter H(m, k) is computed, giving E(m).
Applying a discrete cosine transform to E(m) yields the MFCC of order L = 12, denoted C(l).
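Steps 1) to 6) can be sketched in plain Python as follows. Several details are assumptions because the corresponding formulas are not reproduced in the text: the Mel mapping 2595 · log10(1 + f/700), the standard Hamming definition, unnormalized triangular filters, the natural logarithm, and an unscaled DCT-II. All function names are illustrative.

```python
import cmath
import math

FS, N, NFFT, M, L = 48000, 480, 1024, 24, 12   # values quoted in the text
BETA = 0.938                                    # pre-emphasis factor


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)   # assumed Mel approximation


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def mel_filterbank():
    """M triangular filters centred on the interior points of M + 1 equal Mel intervals."""
    top = hz_to_mel(FS / 2)
    edges_hz = [mel_to_hz(top * i / (M + 1)) for i in range(M + 2)]
    bins = [e * NFFT / FS for e in edges_hz]       # fractional FFT-bin positions
    bank = []
    for m in range(1, M + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(NFFT // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))  # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, bank):
    """frame: N samples already normalised to (-1, 1). Returns L coefficients."""
    # Pre-emphasis y(n) = x(n) - beta * x(n-1); x(-1) is taken as 0 here.
    y = [frame[0]] + [frame[n] - BETA * frame[n - 1] for n in range(1, N)]
    # Standard Hamming window (assumed form), then zero-padding to NFFT points.
    y = [v * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))) for n, v in enumerate(y)]
    spec = fft(y + [0.0] * (NFFT - N))
    power = [abs(c) ** 2 for c in spec[:NFFT // 2 + 1]]
    # Log filter-bank energies E(m); a small floor avoids log(0).
    e = [math.log(max(sum(p * h for p, h in zip(power, row)), 1e-12)) for row in bank]
    # DCT-II of E(m), keeping coefficients 1..L.
    return [sum(e[m] * math.cos(math.pi * l * (m + 0.5) / M) for m in range(M))
            for l in range(1, L + 1)]
```

The filter bank depends only on fs, NFFT, and M, so it can be built once with `mel_filterbank()` and reused for every frame.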
(2) Noise detection:
Compute the Euclidean distance dist between the MFCC of the current frame and the MFCC of the previous frame:
$$dist = \sqrt{\sum_{l}\left(C^{(p)}(l) - C^{(p-1)}(l)\right)^2}$$
Whether the current frame contains noise is decided from this distance and the threshold Thres, which is determined adaptively by
$$Thres = 10 \cdot ener \tag{26}$$
where ener is the energy of each frame after normalization, with its minimum value set to 60.0.
After detection, the MFCC feature of the current frame is updated:
$$C^{(p)}(l) = b \cdot C^{(p-1)}(l) + (1-b) \cdot C^{(p)}(l) \tag{27}$$
where the forgetting factor is b = 0.4. This update prevents false detection when the frame following a noise frame is a speech frame.
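The detection step can be sketched as below. Three points are assumptions rather than statements of the patent: a frame is flagged when the distance exceeds the threshold, the floor of 60.0 is applied to ener before multiplying by 10, and the update blends the stored feature with the current one using weights b and 1 - b. The function names are illustrative.

```python
import math

CONST, ENER_MIN, B = 10.0, 60.0, 0.4   # values quoted in the text


def euclidean(c_a, c_b):
    """Euclidean distance between two MFCC vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c_a, c_b)))


def detect_and_update(c_cur, c_ref, ener):
    """Return (is_noise, new_ref).

    c_cur: MFCC of the current frame.
    c_ref: stored MFCC representing the previous frame.
    ener:  energy of the normalised current frame.
    """
    thres = CONST * max(ener, ENER_MIN)            # assumed reading of "minimum value 60.0"
    is_noise = euclidean(c_cur, c_ref) > thres     # assumed comparison direction
    # Exponential smoothing with forgetting factor b.
    new_ref = [B * r + (1.0 - B) * c for r, c in zip(c_ref, c_cur)]
    return is_noise, new_ref
```

Because the stored feature moves only part of the way towards the current frame, a single noisy frame does not fully replace the reference against which the next frame is compared.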
(3) Pitch period prediction:
The pitch period is estimated for every speech frame. If the current frame is a noise frame, its pitch period is predicted from the pitch periods of the two preceding frames. The block diagram of pitch period estimation is shown in Fig. 7. Across speakers the pitch period generally lies within 2-12 ms, so the pitch period is searched within 2-12 ms here. Let PMAX be the number of samples corresponding to 12 ms, PMAX = 576, and PMIN the number of samples corresponding to 2 ms, PMIN = 96. A buffer buf(n) of length 3·PMAX + N = 2208 is used to estimate the pitch period; buf(n) stores the output data.
The pitch period is estimated as follows:
1) Low-pass filter buf(n) to obtain $buf_d(n)$. The cut-off frequency of the low-pass filter (LPF) is 900 Hz.
2) Apply center clipping to $buf_d(n)$ to obtain $buf_c(n)$, where $C_L$ is the clipping level, usually set to 68% of the maximum of the normalized data.
3) Compute the autocorrelation of $buf_c(n)$ and search for the position of its maximum in the range (96, 576); this position is taken as the pitch period estimate Pitch.
4) To prevent pitch doubling, smooth the pitch period values of the two preceding frames, $Pitch^{(p-1)}$ and $Pitch^{(p-2)}$, according to formula (13).
The pitch period of the current frame, $Pitch^{(p)}$, is then predicted from the two smoothed pitch periods:
$$Pitch^{(p)} = Pitch^{(p-1)} + \left(Pitch^{(p-1)} - Pitch^{(p-2)}\right) \tag{32}$$
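The estimation steps can be sketched as follows. The low-pass filter here is a simple one-pole filter chosen for brevity (the text gives only the 900 Hz cut-off), the center clipper uses the common three-region form, the smoothing of formula (13) is omitted because it is not reproduced in the text, and the clamping in `predict_pitch` is an added safeguard. All names are illustrative.

```python
import math

FS, PMIN, PMAX = 48000, 96, 576   # 2 ms and 12 ms at 48 kHz


def lowpass(x, fc=900.0):
    """One-pole low-pass filter (assumed stand-in for the unspecified LPF)."""
    a = math.exp(-2.0 * math.pi * fc / FS)
    out, y = [], 0.0
    for v in x:
        y = (1.0 - a) * v + a * y
        out.append(y)
    return out


def center_clip(x, ratio=0.68):
    """Zero everything within +/- C_L and shift the remainder towards zero."""
    cl = ratio * max(abs(v) for v in x)
    return [v - cl if v > cl else v + cl if v < -cl else 0.0 for v in x]


def estimate_pitch(buf):
    """Lag in [PMIN, PMAX] that maximises the autocorrelation of the clipped signal."""
    c = center_clip(lowpass(buf))
    best_lag, best_r = PMIN, float("-inf")
    for lag in range(PMIN, PMAX + 1):
        r = sum(c[n] * c[n - lag] for n in range(lag, len(c)))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag


def predict_pitch(p1, p2):
    """Linear extrapolation from the two previous periods, clamped to the search range."""
    return min(max(p1 + (p1 - p2), PMIN), PMAX)
```

For a 200 Hz tone sampled at 48 kHz the true period is 240 samples, which is the lag the search returns for a clean sinusoid.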
(4) Waveform reconstruction:
The last pitch-cycle waveform of the previous frame is extracted and linearly interpolated to obtain a new pitch-cycle waveform.
1) Because buf(n) stores the output frame data, the pitch-cycle waveform of the previous frame can be taken from buf(n), namely the last $Pitch^{(p-1)}$ samples of the previous frame's output; this waveform is denoted $pw^{(p-1)}(n)$. Linear interpolation of $pw^{(p-1)}(n)$ gives a new waveform of length $Pitch^{(p)}$, denoted $pw^{(p)}(n)$. The interpolation is linear interpolation between two points, as shown in Fig. 8.
2) The new waveform is used for periodic waveform replication:
a. The principle of periodic waveform replication is shown in Fig. 9(a) to Fig. 9(c). If the current frame is a noise frame (whether the previous frame is a noise frame or a clean speech frame), the processing is as follows. According to formula (15), segment AB of buf(n) is overlap-added with segment CD with a fade-in/fade-out, so that the data on both sides of point D are continuous:
$$\begin{aligned} buf_{CD}(n) &= \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{AB}(n) \\ &= \alpha \cdot buf_{CD}(n) + (1-\alpha) \cdot buf_{CD}(n - Pitch), \quad 0 \le n < N_1 \end{aligned} \tag{34}$$
where α is the attenuation factor, which decreases linearly from 1 to 0, and segments AB and CD both have length $N_1 = 32$.
b. With period $Pitch^{(p)}$, the new waveform $pw^{(p)}(n)$ is copied repeatedly into region DF. Segment DE is the repaired current frame. Segment EF has length $N_2 = 32$; when the next frame is a speech frame, it is used for the fade-in/fade-out that keeps the signal continuous on both sides of E and between frames.
c. One frame of data starting at point C of buf(n) is output. The output of the method is therefore delayed, and the delay equals the length of segment CD. All data in buf(n) are then shifted forward by N samples (one frame length).
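The repair of a noise frame can be sketched as below. The index `d` of point D inside the buffer, the end-point-aligned interpolation grid, and the function names are assumptions made for illustration; the text describes the operations but does not fix these details.

```python
def resample_linear(pw, new_len):
    """Stretch or shrink one pitch cycle to new_len samples by two-point linear interpolation."""
    if new_len == 1 or len(pw) == 1:
        return [pw[0]] * new_len
    scale = (len(pw) - 1) / (new_len - 1)
    out = []
    for n in range(new_len):
        t = n * scale
        i = min(int(t), len(pw) - 2)
        frac = t - i
        out.append((1.0 - frac) * pw[i] + frac * pw[i + 1])
    return out


def crossfade(old, new):
    """Weight `old` by alpha falling linearly from 1 towards 0 and `new` by 1 - alpha."""
    n1 = len(old)
    return [(1 - n / n1) * o + (n / n1) * w for n, (o, w) in enumerate(zip(old, new))]


def repair_frame(buf, d, pitch_prev, pitch_new, n, n1=32, n2=32):
    """Overwrite buf[d : d + n + n2] in place with replicated pitch cycles.

    d is the (assumed) index of point D; buf must hold at least d + n + n2 samples
    and at least pitch_prev + n1 samples before d.
    """
    pw_prev = buf[d - pitch_prev:d]                  # last cycle of the previous output
    ab = buf[d - n1 - pitch_prev:d - pitch_prev]     # segment AB, one period before CD
    buf[d - n1:d] = crossfade(buf[d - n1:d], ab)     # overlap-add on segment CD
    pw = resample_linear(pw_prev, pitch_new)         # new cycle of length pitch_new
    for i in range(n + n2):                          # fill DE (frame) and EF (fade tail)
        buf[d + i] = pw[i % pitch_new]
    return buf
```

On a perfectly periodic signal with an unchanged period, the repaired region is simply the continuation of the waveform, which is a convenient sanity check for the indexing.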
Fig. 10(a) to Fig. 10(c) illustrate the following case: Fig. 10(a) shows the signal with the current frame, which needs repair, discarded; Fig. 10(b) shows the current frame reconstructed by the method of this patent; Fig. 10(c) shows the repaired signal. If the current frame is a clean speech frame and the previous frame is a noise frame, the processing is as follows:
a. Segment DG of buf(n) now holds the EF data of the previous frame. Segment DG is merged with the first $N_2$ samples of the current input frame (computed in the same way as formula (15)), and the result is stored in DG.
b. The remaining samples of the current frame are copied unchanged to the positions after point G in buf(n).
c. One frame of data starting at point C is output, and all data in buf(n) are then shifted forward by the length of one frame, i.e. N samples.
If both the current frame and the previous frame are clean speech frames, the current input frame is copied unchanged into the region to be repaired in buf(n), i.e. region DE in Fig. 8, and one frame of data starting at point C is output.
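The transition from a repaired frame back to clean input can be sketched as a short cross-fade. The fade direction (replicated tail fading out, new input fading in) is an assumption consistent with the linear attenuation factor described for the overlap-add; the function name is illustrative.

```python
def merge_clean_after_noise(tail, frame, n2=32):
    """Blend the replicated tail with the start of a clean frame.

    tail:  the N2 replicated samples kept from the repaired frame (segment EF, now DG).
    frame: the new clean input frame.
    Returns the frame with its first n2 samples cross-faded; the rest is copied unchanged.
    """
    head = [(1 - n / n2) * t + (n / n2) * f
            for n, (t, f) in enumerate(zip(tail, frame[:n2]))]
    return head + list(frame[n2:])
```

The first output sample equals the replicated tail and the weight shifts linearly to the input over N2 samples, so there is no jump at point D.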
Beneficial effects of the technical solution of the invention:
The method was tested with 20 clean speech recordings (adult male, adult female, and child speakers) and four types of noise: mouse clicks, knocking, metronome ticks, and keyboard keystrokes. The noise durations are 10 ms for mouse clicks, 20 ms for knocking and metronome ticks, and 30 ms for keyboard keystrokes. Each clean recording was mixed with each of the four noise types, giving 80 noisy recordings. Each recording contains 30 noise events, equally spaced.
The sampling rate of all audio is fs = 48 kHz and the frame length is N = 480. In the MFCC stage, an NFFT = 1024-point FFT is used, the Mel filter bank has M = 24 filters, and L = 12 MFCC are computed. In the transient noise detection stage, the adaptive threshold is Thres = const · ener; so that the threshold suits all noise types, the constant const is set to 10, ener is the energy of each input frame, and the minimum value is set to 60.0. When the threshold is updated, the forgetting factor b is 0.4. In the pitch period estimation stage, the pitch period is searched in (2 ms, 12 ms), which corresponds to (96, 576) samples. In the waveform reconstruction stage, the fade lengths N1 and N2 are both 32, and the length of the buffer buf(n) is 2240.
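The sample counts above follow directly from the sampling rate, as the short calculation below shows. The helper `ms_to_samples` is illustrative. The last line is an observation only: 2240 happens to equal 3·PMAX + N + N2, but the text does not state how the buffer length of 2240 was derived.

```python
FS = 48_000  # sampling rate in Hz


def ms_to_samples(ms: float) -> int:
    """Number of samples in a duration given in milliseconds."""
    return round(FS * ms / 1000)


N = ms_to_samples(10)      # frame length
PMIN = ms_to_samples(2)    # shortest pitch period searched
PMAX = ms_to_samples(12)   # longest pitch period searched
N1 = N2 = 32               # fade lengths

print(N, PMIN, PMAX)        # 480 96 576
print(3 * PMAX + N)         # 2208
print(3 * PMAX + N + N2)    # 2240
```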
Denoising noisy speech with the invention substantially improves speech intelligibility and reduces listener fatigue. The denoising performance of the method is evaluated with two measures, segmental signal-to-noise ratio ($SNR_{seg}$) and PEAQ, where the segmental signal-to-noise ratio is computed as follows:
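A common form of segmental SNR averages the per-frame SNR in dB, as sketched below. The patent's exact formula is not reproduced in the text, so the frame length of 480, the clamping range of -10 dB to 35 dB, and the skipping of silent frames are assumptions based on usual practice, and the function name is illustrative.

```python
import math


def segmental_snr(clean, processed, frame_len=480, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB; each frame's value is clamped to [lo, hi]."""
    vals = []
    usable = min(len(clean), len(processed))
    for start in range(0, usable - frame_len + 1, frame_len):
        s = clean[start:start + frame_len]
        e = [a - b for a, b in zip(s, processed[start:start + frame_len])]
        sig = sum(v * v for v in s)
        err = sum(v * v for v in e)
        if sig == 0.0:
            continue                      # skip silent reference frames
        snr = hi if err == 0.0 else 10.0 * math.log10(sig / err)
        vals.append(min(max(snr, lo), hi))
    return sum(vals) / len(vals) if vals else float("nan")
```

Averaging in the dB domain keeps a few loud frames from dominating the score, which matters for short, high-energy transients.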
The results of evaluating the denoising performance with these two measures are shown in Fig. 12 and Fig. 13. Fig. 12 compares the objective audio quality of the noisy signal before and after denoising in terms of signal-to-noise ratio; Fig. 13 makes the same comparison in terms of PEAQ.
The spectrograms of the noisy signal and of the signal denoised with this scheme are shown in Fig. 11(a) and Fig. 11(b); grayscale versions of these figures are provided for the examiner's reference so that the technical effect of the invention can be understood more clearly. Fig. 11(a) is the spectrogram of audio contaminated by mouse clicks; Fig. 11(b) is the spectrogram after denoising the noisy audio shown in Fig. 11(a).
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any equivalent replacement or modification made by a person skilled in the art, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.
Abbreviations and definitions of key terms used in the invention
AR: Autoregressive Model.
DCT: Discrete Cosine Transform.
FFT: Fast Fourier Transform.
LPF: Low-Pass Filter.
LPCC: Linear Prediction Cepstrum Coefficient.
MFCC: Mel Frequency Cepstrum Coefficient.
VoIP: Voice over IP, voice transmitted over IP networks.
PLC: Packet Loss Concealment.
PWR: Pitch Waveform Replication.
SNR: Signal-to-Noise Ratio.
PEAQ: Perceptual Evaluation of Audio Quality, an objective evaluation standard for perceived audio quality defined in ITU-R Recommendation BS.1387.
Claims (3)
1. A method for removing transient noise, characterized in that: the Mel-frequency cepstral coefficients of the current frame are first computed and, at the same time, the pitch period of the frame is predicted; the Mel-frequency cepstral coefficients are then used to detect whether the frame contains noise (noise detection); if noise is present, the predicted pitch period is used to reconstruct the waveform;
The pitch period is predicted as follows:
1) low-pass filter buf(n) to obtain $buf_d(n)$; the cut-off frequency of the low-pass filter (LPF) is 900 Hz;
2) apply center clipping to $buf_d(n)$ to obtain $buf_c(n)$, where $C_L$ is the clipping level, usually set to 68% of the maximum of the normalized data;
3) compute the autocorrelation of $buf_c(n)$ and search for the position of its maximum in the range (96, 576); this position is taken as the pitch period estimate Pitch;
4) to prevent pitch doubling, smooth the pitch period values of the two preceding frames, $Pitch^{(p-1)}$ and $Pitch^{(p-2)}$, according to formula (4);
the pitch period of the current frame, $Pitch^{(p)}$, is predicted from the two smoothed pitch periods:
$$Pitch^{(p)} = Pitch^{(p-1)} + \left(Pitch^{(p-1)} - Pitch^{(p-2)}\right) \tag{5}$$
The waveform is reconstructed as follows:
1) because buf(n) stores the output frame data, the pitch-cycle waveform of the previous frame can be taken from buf(n), namely the last $Pitch^{(p-1)}$ samples of the previous frame's output; this waveform is denoted $pw^{(p-1)}(n)$; linear interpolation of $pw^{(p-1)}(n)$ gives a new waveform of length $Pitch^{(p)}$, denoted $pw^{(p)}(n)$;
2) the new waveform is used for periodic waveform replication, as follows:
a. if the current frame is a noise frame, whether the previous frame is a noise frame or a clean speech frame, the processing is: according to formula (7), segment AB of buf(n) is overlap-added with segment CD with a fade-in/fade-out, so that the data on both sides of point D are continuous, where α is the attenuation factor, which decreases linearly from 1 to 0, and segments AB and CD both have length $N_1 = 32$;
b. with period $Pitch^{(p)}$, the new waveform $pw^{(p)}(n)$ is copied repeatedly into region DF; segment DE is the repaired current frame; segment EF has length $N_2 = 32$ and, when the next frame is a speech frame, is used for the fade-in/fade-out that keeps the signal continuous on both sides of E and between frames;
c. one frame of data starting at point C of buf(n) is output; the output of the method is delayed, and the delay equals the length of segment CD; all data in buf(n) are then shifted forward by N samples;
if the current frame is a clean speech frame and the previous frame is a noise frame, the processing is as follows:
a. segment DG of buf(n) now holds the EF data of the previous frame; segment DG is merged with the first $N_2$ samples of the current input frame, computed in the same way as the overlap-add of segments AB and CD in formula (7), and the result is stored in DG;
b. the remaining samples of the current frame are copied unchanged to the positions after point G in buf(n);
c. one frame of data starting at point C is output, and all data in buf(n) are then shifted forward by N samples, i.e. one frame length; if both the current frame and the previous frame are clean speech frames, the current input frame is copied unchanged into the region to be repaired in buf(n), and one frame of data starting at point C is output.
2. The denoising method for transient noise according to claim 1, characterized in that the Mel-frequency cepstral coefficients (MFCC) are calculated as follows:
1) The input signal is divided into frames with frame length N=480, i.e. 10 ms of data, and the data are normalized. If the current frame signal is the p-th frame signal, then
$x^{(p)}(n) = x[p(N-1)+n],\ n = 0, 1, \ldots, N-1$; (9)
2) Preprocessing: pre-emphasis and windowing are applied to the current frame signal, namely
$y^{(p)}(n) = x^{(p)}(n) - \beta\, x^{(p)}(n-1)$; (10)
where the pre-emphasis factor is $\beta = 0.938$ and w(n) is the Hamming window, i.e. $w(n) = 0.54 - 0.46\cos(n\pi/N)$;
3) An N=1024-point FFT is applied to the preprocessed signal to obtain the frequency-domain signal $Y^{(p)}(k)$;
4) The energy spectrum $|Y^{(p)}(k)|^2$ of the frequency-domain signal $Y^{(p)}(k)$ is calculated;
5) The energy spectrum of the frequency-domain signal is passed through a bank of Mel-scale triangular filters H for frequency-domain filtering;
The filter bank contains M filters. Each filter is a triangular filter, adjacent filters overlap, and the center frequency of each filter is f(m), m=1,2,...,M, with M=24;
Filter design method: the upper cutoff frequency of the input signal, $f_s/2$, i.e. 24 kHz, is transformed to the Mel-scale frequency domain by the formula
where f is the frequency in Hz, giving $Fs_{mel}$. The interval $(0, Fs_{mel})$ is divided into 25 parts; discarding the two endpoints 0 and $Fs_{mel}$, the remaining 24 division points serve as the center frequencies of the 24 filters. The division points f(m) are evenly spaced on the Mel frequency scale and are then transformed to the linear frequency scale by formula (12). After conversion, the spacing between adjacent f(m) narrows as m decreases and widens as m increases. From the division points f(m), the frequency response of the triangular filter bank H(m, k) is obtained as
6) The energy output by each filter H(m, k) is calculated and its logarithm taken, giving E(m), namely
Applying the discrete cosine transform to E(m) yields the MFCCs of order L=12, denoted C(l)
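The steps above can be sketched in NumPy as follows. This is a hedged illustration rather than the claimed implementation. Several details are assumptions because the corresponding formulas are not reproduced in this text: the Mel mapping uses the conventional 2595·log10(1 + f/700) form, the window is NumPy's standard `np.hamming` (which differs slightly from the window expression printed above), the DCT is an unnormalized DCT-II keeping orders 1 to L, and all function and constant names are illustrative.

```python
import numpy as np

FS = 48000      # sampling rate implied by fs/2 = 24 kHz
N_FRAME = 480   # 10 ms frame
N_FFT = 1024    # FFT length
M = 24          # number of triangular filters
L = 12          # MFCC order
BETA = 0.938    # pre-emphasis factor

def hz_to_mel(f):
    # Assumed conventional Mel mapping; the source formula is not shown.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank():
    # 26 equally spaced Mel points: 0, the 24 center frequencies, Fs_mel.
    mel_pts = np.linspace(0.0, hz_to_mel(FS / 2), M + 2)
    bins = np.rint(mel_to_hz(mel_pts) / FS * N_FFT).astype(int)
    H = np.zeros((M, N_FFT // 2 + 1))
    for m in range(1, M + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        # Rising edge from lo up to (but excluding) the center bin.
        H[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        # Falling edge from the center bin (value 1) down to hi (value 0).
        H[m - 1, mid:hi + 1] = (hi - np.arange(mid, hi + 1)) / max(hi - mid, 1)
    return H

def mfcc(frame):
    x = np.asarray(frame, dtype=float)
    # Pre-emphasis as in formula (10); the first sample has no predecessor.
    y = np.append(x[0], x[1:] - BETA * x[:-1])
    y = y * np.hamming(len(y))                    # windowing (assumed form)
    power = np.abs(np.fft.rfft(y, N_FFT)) ** 2    # energy spectrum
    E = np.log(mel_filterbank() @ power + 1e-12)  # log filter-bank energies
    # Unnormalized DCT-II of E(m), keeping orders l = 1..L (an assumption).
    m = np.arange(M) + 0.5
    l = np.arange(1, L + 1)[:, None]
    return np.cos(np.pi * l * m / M) @ E
```

Calling `mfcc(frame)` on a 480-sample frame returns a vector of 12 coefficients. The small constant added before the logarithm only guards against taking the log of zero.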
3. The denoising method for transient noise according to claim 1, characterized in that noise detection proceeds as follows:
The Euclidean distance dist between the MFCC of the current frame signal and the MFCC of the previous frame signal is calculated
Whether the current frame contains noise is judged from the distance value and the threshold Thres. The threshold Thres is determined adaptively by
$\mathrm{Thres} = 10\,\mathrm{ener}$, (17)
where ener is the normalized energy of each frame signal, with its minimum value set to 60.0. After detection, the MFCC feature of the current frame is updated, namely
$C^{(p)}(l) = b\,C^{(p-1)}(l) + (1-b)\,C^{(p-1)}(l)$, (18)
where the forgetting factor is b=0.4. When the frame following a noise frame is a speech frame, this update method prevents false detection.
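A minimal sketch of the detection test is shown below. It rests on stated assumptions: formula (17) is read as the product 10 × ener, the 60.0 floor is applied to ener before the threshold is formed, and a frame is flagged as noise when the distance exceeds the threshold (the text does not state the direction of the comparison). The function name `is_noise_frame` is illustrative.

```python
import numpy as np

def is_noise_frame(mfcc_cur, mfcc_prev, ener, min_ener=60.0):
    """Return (flag, dist, thres) for one frame.

    Assumptions: Thres = 10 * ener, ener is floored at min_ener,
    and dist > Thres marks the frame as containing noise.
    """
    ener = max(float(ener), min_ener)
    thres = 10.0 * ener
    diff = np.asarray(mfcc_cur, dtype=float) - np.asarray(mfcc_prev, dtype=float)
    dist = float(np.linalg.norm(diff))  # Euclidean distance between MFCC vectors
    return dist > thres, dist, thres
```

With identical MFCC vectors the distance is zero and the frame is not flagged, while a large jump between consecutive MFCC vectors pushes the distance above the adaptive threshold.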
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310357211.6A CN103440872B (en) | 2013-08-15 | 2013-08-15 | The denoising method of transient state noise |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103440872A CN103440872A (en) | 2013-12-11 |
CN103440872B true CN103440872B (en) | 2016-06-01 |
Family
ID=49694563
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310357211.6A Expired - Fee Related CN103440872B (en) | 2013-08-15 | 2013-08-15 | The denoising method of transient state noise |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103440872B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103745729B (en) * | 2013-12-16 | 2017-01-04 | 深圳百科信息技术有限公司 | A kind of audio frequency denoising method and system |
CN103778914B (en) * | 2014-01-27 | 2017-02-15 | 华南理工大学 | Anti-noise voice identification method and device based on signal-to-noise ratio weighing template characteristic matching |
CN105830152B (en) * | 2014-01-28 | 2019-09-06 | 三菱电机株式会社 | The input signal bearing calibration and mobile device information system of audio collecting device, audio collecting device |
CN104157295B (en) * | 2014-08-22 | 2018-03-09 | 中国科学院上海高等研究院 | For detection and the method for transient suppression noise |
CN106652624B (en) * | 2016-10-12 | 2019-05-24 | 快创科技(大连)有限公司 | A kind of medical operating analogue system based on VR technology and transient noise noise-removed technology |
CN108182953B (en) * | 2017-12-27 | 2021-03-16 | 上海传英信息技术有限公司 | Audio tail POP sound processing method and device |
CN108899043A (en) * | 2018-06-15 | 2018-11-27 | 深圳市康健助力科技有限公司 | The research and realization of digital deaf-aid instantaneous noise restrainable algorithms |
CN109346105B (en) * | 2018-07-27 | 2022-04-15 | 南京理工大学 | Pitch period spectrogram method for directly displaying pitch period track |
CN111081269B (en) * | 2018-10-19 | 2022-06-14 | 中国移动通信集团浙江有限公司 | Noise detection method and system in call process |
CN110010145B (en) * | 2019-02-28 | 2021-05-11 | 广东工业大学 | Method for eliminating friction sound of electronic stethoscope |
CN110703144B (en) * | 2019-09-08 | 2021-07-09 | 广东石油化工学院 | Transformer operation state detection method and system based on discrete cosine transform |
CN114333880B (en) * | 2022-03-04 | 2022-06-14 | 南京大鱼半导体有限公司 | Signal processing method, device, equipment and storage medium |
CN115063895A (en) * | 2022-06-10 | 2022-09-16 | 深圳市智远联科技有限公司 | Ticket selling method and system based on voice recognition |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1956058A (en) * | 2005-10-17 | 2007-05-02 | 哈曼贝克自动系统-威美科公司 | Minimization of transient noises in a voice signal |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4448464B2 (en) * | 2005-03-07 | 2010-04-07 | 日本電信電話株式会社 | Noise reduction method, apparatus, program, and recording medium |
US7869994B2 (en) * | 2007-01-30 | 2011-01-11 | Qnx Software Systems Co. | Transient noise removal system using wavelets |
Non-Patent Citations (2)
Title |
---|
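MFCC-based speech emotion recognition; Han et al.; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); 2008-10-31; Vol. 20 (No. 5); 597-602 * |
Research on transient noise suppression algorithms in speech; Zhang Zhaowei; CNKI (China National Knowledge Infrastructure); 2013-05-01; 14-19, 31-37 * |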
Also Published As
Publication number | Publication date |
---|---|
CN103440872A (en) | 2013-12-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103440872B (en) | The denoising method of transient state noise | |
CN103440871B (en) | A kind of method that in voice, transient noise suppresses | |
CN103854662B (en) | Adaptive voice detection method based on multiple domain Combined estimator | |
Ghanbari et al. | A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets | |
KR100330230B1 (en) | Noise suppression for low bitrate speech coder | |
JP5666444B2 (en) | Apparatus and method for processing an audio signal for speech enhancement using feature extraction | |
EP1891624B1 (en) | Multi-sensory speech enhancement using a speech-state model | |
CN105023572A (en) | Noised voice end point robustness detection method | |
CN106885971B (en) | Intelligent background noise reduction method for cable fault detection pointing instrument | |
CN106340292A (en) | Voice enhancement method based on continuous noise estimation | |
CN104658544A (en) | Method for inhibiting transient noise in voice | |
Shahnaz et al. | Pitch estimation based on a harmonic sinusoidal autocorrelation model and a time-domain matching scheme | |
CN110047470A (en) | A kind of sound end detecting method | |
Verteletskaya et al. | Noise reduction based on modified spectral subtraction method | |
CN101271686A (en) | Method and apparatus for estimating noise by using harmonics of voice signal | |
CN110349598A (en) | A kind of end-point detecting method under low signal-to-noise ratio environment | |
Jain et al. | Marginal energy density over the low frequency range as a feature for voiced/non-voiced detection in noisy speech signals | |
Katsir et al. | Evaluation of a speech bandwidth extension algorithm based on vocal tract shape estimation | |
Patil et al. | Effectiveness of Teager energy operator for epoch detection from speech signals | |
Wenlu et al. | Modified Wiener filtering speech enhancement algorithm with phase spectrum compensation | |
Unoki et al. | MTF-based power envelope restoration in noisy reverberant environments | |
Gbadamosi et al. | Development of non-parametric noise reduction algorithm for GSM voice signal | |
EP4254409A1 (en) | Voice detection method | |
Cui | Pitch extraction based on weighted autocorrelation function in speech signal processing | |
Gao et al. | Improved endpoint detection of multi-parameter fusion under noise reduction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20160601 |