CN105957537A - Voice denoising method and system based on L1/2 sparse constraint convolution non-negative matrix decomposition - Google Patents
- Publication number: CN105957537A (application CN201610452012.7A)
- Authority: CN (China)
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
Abstract
The invention discloses a voice denoising method and system based on convolutive non-negative matrix factorization (CNMF) with an L1/2 sparsity constraint. For single-channel speech enhancement, the noisy speech signal v(i) is assumed to be the additive mixture of an uncorrelated noise signal n(i) and speech signal s(i), i.e. v(i) = n(i) + s(i). Noise-basis information is first obtained by training on a specific noise with the CNMF method; then, taking the noise basis as prior information, a speech basis is obtained by decomposing the noisy speech with the CNMF_L1/2 method, and the denoised speech is finally synthesized. The method describes the inter-frame correlation of speech better, and by imposing a strong sparsity constraint on the speech-basis coefficient matrix through the L1/2 regularization term, the separated speech contains less residual noise. Compared with conventional methods such as spectral subtraction, Wiener filtering and minimum mean-square error log-spectral estimation, the enhanced speech is easier to understand.
Description
Technical field
The invention belongs to the field of acoustic signal processing, and specifically relates to a speech denoising method and system based on convolutive non-negative matrix factorization with an L1/2 sparsity constraint.
Background art
Speech is an important carrier of daily communication, but in real environments it is often corrupted by various kinds of noise, making the content hard to hear clearly. Speech enhancement suppresses or removes such interference and extracts speech that is as clean as possible from the contaminated signal, so that intelligible speech is obtained. Speech enhancement techniques are widely used in speech recognition, speech coding and intelligent communication.
Speech enhancement based on non-negative matrix factorization (NMF) is a parts-based representation: decomposing the speech signal yields basis vectors and a coefficient matrix that characterize speech features. At present, NMF is a focus of attention for many researchers. Its basic principle is to compute, according to a cost function, the basis matrix and the corresponding coefficient matrix of each source component, thereby achieving signal separation. According to the available prior knowledge of the signals, NMF can be divided into blind models (no basis matrix of any source component is known), supervised models (the basis matrices of all mixed components are known) and semi-blind models (only the basis matrices of some mixed components are known). The cost function is mainly chosen from two classes: similarity measures between the signals before and after separation, and additional constraints imposed according to the characteristics of the processed signal.
Chinese patent CN201220541700.2 discloses an audio separation method based on NMF, comprising an auxiliary music/speech discrimination module and an NMF module. By introducing NMF and exploiting the differing audio characteristics of speech and music, the method separates speech audio from music audio in a mixture fairly well, obtaining relatively clear music and speech audio, and classifies audio by combining the NMF method with machine-learning methods.
Convolutive non-negative matrix factorization (CNMF) represents the temporally continuous information in a signal as a sum of column-shifted non-negative factorizations, and this decomposition can therefore describe the time-varying characteristics of speech signals well.
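As a concrete illustration of the convolutive model (a sketch of ours, not taken from the patent's figures — function and variable names are our own), the magnitude spectrogram is approximated by a sum of basis matrices applied to column-shifted copies of one coefficient matrix:

```python
import numpy as np

def shift_right(H, t):
    """Shift the columns of H right by t steps, filling the vacated
    left-hand columns with zeros (the shift operator described in the text)."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out

def cnmf_approx(W, H):
    """Convolutive reconstruction: Lambda = sum_t W(t) @ shift_right(H, t).
    W has shape (T, n, r) -- T shifted basis matrices; H has shape (r, m)."""
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))
```

With T = 1 the model reduces to plain NMF, V ≈ W(0)H, which is why CNMF is an extended form of NMF.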
Summary of the invention
It is an object of the invention to provide a speech denoising method and system in which, when CNMF is used to decompose the speech signal, an Lq sparsity constraint is added, so as to obtain denoised speech of better quality and higher intelligibility.
To achieve this object, the invention adopts the following technical solution: a method for denoising noisy speech based on L1/2 sparsity-constrained convolutive non-negative matrix factorization, characterized in that the noisy speech signal v(i) is assumed to be the additive mixture of an uncorrelated noise signal n(i) and speech signal s(i), i.e. v(i) = n(i) + s(i), and the method comprises the following steps:
Step 1: train on the specific noise with the CNMF method to obtain noise-basis information;
Step 2: taking the noise basis as prior information, decompose the noisy speech with the CNMF_L1/2 method to obtain a speech basis, and finally synthesize the denoised speech.
Said step 1 specifically comprises the following steps:
Step 1.1: apply the short-time Fourier transform to the noise to obtain its magnitude spectrum N;
Step 1.2: apply a CNMF decomposition to the noise magnitude spectrum to obtain the noise basis Wn and the corresponding coefficient matrix Hn; the objective function of the decomposition is

$$\min_{W,H\ge 0}\ F(W,H)=\tfrac12\lVert V-\Lambda\rVert_F^2 \qquad (1)$$

where V is the noise magnitude spectrum matrix to be decomposed and Λ is the convolutive estimate of V:

$$\Lambda=\sum_{t=0}^{T-1} W(t)\,\overset{t\rightarrow}{H} \qquad (2)$$

In formula (2), W(t) and H denote the t-th basis matrix and the coefficient matrix respectively, and $\overset{t\rightarrow}{H}$ denotes H with its columns shifted right by t steps, the vacated columns on the left being filled with zeros.
In said step 1.2, since objective function (1) is convex in W with H fixed and convex in H with W fixed, W and H can be updated alternately, and their update equations are obtained by the gradient descent method.
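The update equations themselves appear only in the patent's figures. The sketch below uses the standard multiplicative-update form for the Frobenius cost, which is one common way to realize the alternating gradient-descent scheme described above; all function and variable names are our own assumptions.

```python
import numpy as np

def shift_right(M, t):
    """Columns of M shifted right by t, zero-filled on the left."""
    if t == 0:
        return M.copy()
    out = np.zeros_like(M)
    out[:, t:] = M[:, :-t]
    return out

def shift_left(M, t):
    """Columns of M shifted left by t, zero-filled on the right (the adjoint
    of shift_right, needed for the gradient with respect to H)."""
    if t == 0:
        return M.copy()
    out = np.zeros_like(M)
    out[:, :-t] = M[:, t:]
    return out

def cnmf_cost(V, W, H):
    """Frobenius cost as in (1): 0.5 * ||V - Lambda||_F^2."""
    Lam = sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))
    return 0.5 * float(np.sum((V - Lam) ** 2))

def train_noise_cnmf(V, r, T, n_iter=100, eps=1e-9):
    """Step 1.2 sketch: alternately update each W(t) and H with multiplicative
    rules for the Frobenius cost. Illustrative only; not the patent's exact
    update equations."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((T, n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        Lam = sum(W[t] @ shift_right(H, t) for t in range(T))
        for t in range(T):               # update each shifted basis matrix
            Ht = shift_right(H, t)
            W[t] *= (V @ Ht.T) / (Lam @ Ht.T + eps)
        Lam = sum(W[t] @ shift_right(H, t) for t in range(T))
        num = sum(W[t].T @ shift_left(V, t) for t in range(T))
        den = sum(W[t].T @ shift_left(Lam, t) for t in range(T)) + eps
        H *= num / den                   # coefficient-matrix update
    return W, H
```

Multiplicative updates keep W and H non-negative automatically, which is why they are a popular realization of gradient descent for NMF-type objectives.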
Said step 2 specifically comprises the following steps:
Step 2.1: apply the STFT to the noisy speech; in the time-frequency domain the following non-negative matrix sum is obtained:

V = S + N (5)

where V, S and N are the magnitude spectrum matrices of the noisy speech, the clean speech and the noise respectively; the phase information of the noisy-speech spectrum is obtained at the same time.
Applying convolutive non-negative matrix factorization to the right-hand side of formula (5) gives

$$V \approx \sum_{t=0}^{T-1} W_s(t)\,\overset{t\rightarrow}{H_s} + \sum_{t=0}^{T-1} W_n(t)\,\overset{t\rightarrow}{H_n} \qquad(6)$$

where Ws and Hs denote the speech basis and its coefficient matrix, and Wn and Hn denote the noise basis and its coefficient matrix.
Step 2.2: combining the noise basis Wn obtained in step 1.2, apply a CNMF_L1/2 decomposition to the magnitude spectrum matrix of the noisy speech to obtain the speech basis Ws, the speech coefficient matrix Hs and a new noise coefficient matrix $\overline{H}_n$; the objective function of the CNMF_L1/2 decomposition is

$$\min_{W_s,H_s,\overline{H}_n\ge 0}\ \tfrac12\Bigl\lVert V-\sum_{t=0}^{T-1}\bigl(W_s(t)\,\overset{t\rightarrow}{H_s}+W_n(t)\,\overset{t\rightarrow}{\overline{H}_n}\bigr)\Bigr\rVert_F^2+\lambda\sum_{i,j}(H_s)_{ij}^{1/2} \qquad(7)$$

Step 2.3: use the speech basis Ws and coefficient matrix Hs obtained in step 2.2, together with the phase information, to synthesize the magnitude spectrum S of the denoised speech.
Step 2.4: apply the inverse STFT to the magnitude spectrum S of the denoised speech to obtain the enhanced speech.
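A plausible realization of the synthesis step (how the patent's synthesis formula combines Ws, Hs and the phase is our assumption — it is given only in the figures): take the speech part of the convolutive model as the denoised magnitude and attach the noisy-speech phase, so that an inverse STFT can then produce the waveform.

```python
import numpy as np

def shift_right(H, t):
    """Columns shifted right by t, zero-filled on the left."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out

def synthesize_denoised(Ws, Hs, noisy_phase):
    """Steps 2.3-2.4 sketch: the denoised magnitude S is the speech part of
    the convolutive model, sum_t Ws(t) @ shift(Hs, t); the complex spectrum
    for the inverse STFT reuses the phase of the noisy speech."""
    S = sum(Ws[t] @ shift_right(Hs, t) for t in range(Ws.shape[0]))
    return S, S * np.exp(1j * noisy_phase)
```

Reusing the noisy phase is standard practice in magnitude-domain enhancement, since human listeners are relatively insensitive to phase errors.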
In said step 2.2, objective function (7) is solved by alternating updates, namely:
Step 1: fix Wn, $\overline{H}_n$ and Hs, and update Ws;
Step 2: fix Ws, Wn and $\overline{H}_n$, and update Hs;
Step 3: fix Ws, Hs and Wn, and update $\overline{H}_n$.
Since formula (7) is convex in the variable updated at each of the above steps, the gradient descent method can be used to obtain the update rules.
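The update rules themselves appear only in the patent's figures. The sketch below realizes the three-step alternating scheme with multiplicative updates under a Frobenius cost, adding the gradient of the L1/2 penalty, λ/(2√Hs), to the denominator of the Hs update; this is an illustrative assumption of ours, not the patent's exact rules.

```python
import numpy as np

def shift_right(M, t):
    if t == 0:
        return M.copy()
    out = np.zeros_like(M)
    out[:, t:] = M[:, :-t]
    return out

def shift_left(M, t):
    if t == 0:
        return M.copy()
    out = np.zeros_like(M)
    out[:, :-t] = M[:, t:]
    return out

def conv_sum(W, H):
    """Convolutive product sum_t W(t) @ shift_right(H, t)."""
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))

def objective(V, Ws, Hs, Wn, Hn, lam):
    """Penalized cost as in (7): 0.5*||V - Lambda||_F^2 + lam*sum(sqrt(Hs))."""
    Lam = conv_sum(Ws, Hs) + conv_sum(Wn, Hn)
    return 0.5 * float(np.sum((V - Lam) ** 2)) + lam * float(np.sum(np.sqrt(Hs)))

def cnmf_l12_decompose(V, Wn, rs, lam=0.01, n_iter=100, eps=1e-9):
    """Step 2.2 sketch with the trained noise basis Wn (shape (T, n, rn))
    held fixed: alternately update Ws, Hs and the new noise coefficients Hn."""
    T, n, rn = Wn.shape
    m = V.shape[1]
    rng = np.random.default_rng(0)
    Ws = rng.random((T, n, rs)) + eps
    Hs = rng.random((rs, m)) + eps
    Hn = rng.random((rn, m)) + eps
    for _ in range(n_iter):
        Lam = conv_sum(Ws, Hs) + conv_sum(Wn, Hn)
        for t in range(T):                          # step 1: update Ws
            Ht = shift_right(Hs, t)
            Ws[t] *= (V @ Ht.T) / (Lam @ Ht.T + eps)
        Lam = conv_sum(Ws, Hs) + conv_sum(Wn, Hn)   # step 2: update Hs
        num = sum(Ws[t].T @ shift_left(V, t) for t in range(T))
        den = (sum(Ws[t].T @ shift_left(Lam, t) for t in range(T))
               + 0.5 * lam / np.sqrt(Hs + eps) + eps)
        Hs *= num / den
        Lam = conv_sum(Ws, Hs) + conv_sum(Wn, Hn)   # step 3: update Hn
        num = sum(Wn[t].T @ shift_left(V, t) for t in range(T))
        den = sum(Wn[t].T @ shift_left(Lam, t) for t in range(T)) + eps
        Hn *= num / den
    return Ws, Hs, Hn
```

Note that the penalty term enters only the Hs update, matching the text: only the speech coefficients are sparsity-constrained, while the noise coefficients are refit freely against the fixed noise basis.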
A noisy-speech denoising system based on L1/2 sparsity-constrained convolutive non-negative matrix factorization comprises:
an STFT module, for applying the short-time Fourier transform to the specific noise and to the noisy speech to obtain their magnitude spectra;
a noise training module, for training on the specific noise with the CNMF method to obtain noise-basis information;
a speech decomposition module, for taking the trained noise basis as prior information and decomposing the noisy speech with the CNMF_L1/2 method to obtain a speech basis;
a speech synthesis module, for synthesizing the magnitude spectrum of the denoised speech from the obtained speech basis and the phase information; and
a spectrum conversion module, for applying the inverse STFT to the magnitude spectrum of the denoised speech to obtain the enhanced speech.
Non-negative matrix factorization (NMF) is a special basis decomposition that requires all matrix elements to be non-negative: a given non-negative matrix V ∈ R≥0,n×m is decomposed into two non-negative matrices W ∈ R≥0,n×r and H ∈ R≥0,r×m such that V ≈ WH. NMF generally treats the columns of V as mutually independent and does not account for the time-varying characteristics of signals such as speech, in which adjacent columns (frames) are often correlated. To describe this inter-frame relation, the temporally continuous information in the signal is represented as a sum of shifted NMF terms, which gives rise to convolutive non-negative matrix factorization (CNMF), an extended form of NMF.
The bases produced when NMF and CNMF decompose the magnitude spectrum matrix V of a speech signal tend to be sparse. Adding an explicit sparsity constraint on W or H not only yields sparser bases but also allows the degree of sparsity to be traded off against the reconstruction error. Since each row of the coefficient matrix H corresponds to a column of the basis matrix W, imposing a sparsity constraint on H correspondingly produces a sparser basis W; according to sparse signal representation theory, the original signal can then be represented with fewer bases from the dictionary.
The objective function of CNMF with a sparsity constraint on H can be expressed as

$$\min_{W,H\ge 0}\ \tfrac12\lVert V-\Lambda\rVert_F^2+\lambda\lVert H\rVert_q^q \qquad(12)$$

where λ ∈ R≥0 is a regularization parameter that balances the degree of sparsity against the reconstruction error, and q = 0, 1/2, 1, 2 gives L0, L1/2, L1 and L2 regularization respectively.
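To make the effect of the exponent q concrete, a small numerical illustration of ours (not from the patent): for two coefficient vectors with the same L1 mass, the L1/2 penalty is strictly smaller for the sparser one, which is why it promotes sparsity more strongly.

```python
import numpy as np

def lq_penalty(h, q):
    """||h||_q^q as used in the regularization term of (12): the number of
    non-zero entries for q = 0, and sum(|h_i|^q) otherwise."""
    a = np.abs(np.asarray(h, dtype=float))
    if q == 0:
        return int(np.count_nonzero(a))
    return float(np.sum(a ** q))

# Two coefficient vectors with the same L1 mass: one sparse, one dense.
h_sparse = [1.0, 0.0, 0.0, 0.0]   # penalty 1 under L1, 1 under L1/2
h_dense = [0.25, 0.25, 0.25, 0.25]  # penalty 1 under L1, 2 under L1/2
```

The L1 penalty cannot distinguish the two vectors, while the L1/2 penalty is twice as large for the dense one, so minimizing it drives the coefficients toward the sparse solution.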
The constraint on H in formula (12) with q = 0 asks that as many elements of H as possible be exactly 0, and is called L0 regularization. However, L0 regularization is a non-convex, NP-hard problem that is difficult to solve directly. To overcome this, L1 regularization was proposed as an approximate substitute for L0. CNMF with an L1 sparsity constraint (hereinafter "L1_CNMF") is expressed as

$$\min_{W,H\ge 0}\ \tfrac12\lVert V-\Lambda\rVert_F^2+\lambda\lVert H\rVert_1 \qquad(13)$$

Although L1 and L2 regularization turn the problem into a convex optimization, their solutions are not necessarily sufficiently sparse. In particular, when 0 < q < 1, Lq regularization produces sparser solutions than L1; when 1/2 ≤ q < 1, the smaller q is, the sparser the Lq solution; and when 0 < q ≤ 1/2, the sparsity of the Lq solutions does not differ greatly, so L1/2 is representative.
In speech denoising, if the original speech signal can be represented with a sparser basis, less noise is carried in the denoising stage, so the denoised speech becomes clearer and easier to understand. The sparser the coefficient matrix H of the speech basis, the better the original speech signal can be reconstructed from fewer speech bases. The invention therefore imposes the sparser L1/2 regularization constraint on H, and objective function (12) can be rewritten as

$$\min_{W,H\ge 0}\ \tfrac12\lVert V-\Lambda\rVert_F^2+\lambda\sum_{i,j}H_{ij}^{1/2}$$

By adding the Lq sparsity constraint when the speech signal is decomposed with CNMF, a sparser speech basis is obtained to represent the original speech. When the enhanced speech is synthesized from the obtained speech basis, fewer noise bases are carried along, and speech of better quality and higher intelligibility is obtained.
The method of the invention describes the inter-frame correlation of speech better; the L1/2 regularization term imposes a strong sparsity constraint on the speech-basis coefficient matrix, so the separated speech contains less residual noise. Compared with traditional methods such as spectral subtraction, Wiener filtering and minimum mean-square error log-spectral estimation, the intelligibility of the enhanced speech is improved.
Brief description of the drawings
Fig. 1 is the flow chart of the denoising of a noisy speech signal according to the invention.
Fig. 2 is the flow chart of the noise training process of step 1.
Fig. 3 is the flow chart of the speech enhancement performed on the noisy speech in step 2.
Fig. 4 is the convergence curve of the L1/2 sparsity-constrained CNMF denoising method on speech enhancement.
Fig. 5 shows the coefficient matrix H and speech basis W of the speech signal in the enhancement stage; (a) and (b) are H and W when the speech signal is decomposed by CNMF_L1 and CNMF_L1/2 respectively.
Fig. 6 shows the STOI values of six different methods under different noise environments.
Fig. 7 shows the SegSNR improvement values of the six methods under different noise environments.
Detailed description of the invention
The invention is further described below with reference to specific embodiments and the accompanying drawings.
The invention is a speech denoising method based on L1/2 sparsity-constrained convolutive non-negative matrix factorization (hereinafter "CNMF_L1/2"); Fig. 1 is the overall flow chart of the denoising. The overall input is a noise signal of a particular type and speech mixed with that noise, where the noise may be of different kinds (e.g. stationary or non-stationary); the output is the denoised speech.
Fig. 2 is the flow chart of the noise training process of step 1.
Step 1.1: apply the short-time Fourier transform (STFT) to the noise to obtain its magnitude spectrum N.
Step 1.2: apply a CNMF decomposition to the noise magnitude spectrum to obtain the noise basis Wn and the corresponding coefficient matrix Hn, with the decomposition objective of formula (1), where V is the noise magnitude spectrum matrix to be decomposed and Λ is the convolutive estimate of V given by formula (2); there W(t) and H denote the t-th basis matrix and the coefficient matrix respectively, and the shift operator moves the columns of H right by t steps, filling the vacated left columns with zeros.
Since formula (1) is convex in W with H fixed and in H with W fixed, W and H can be updated alternately, and the update equations are obtained by the gradient descent method.
Fig. 3 is the flow chart of the denoising of the noisy speech in step 2. Single-channel speech enhancement usually assumes that the noisy speech signal v(i) is the additive mixture of an uncorrelated noise signal n(i) and speech signal s(i), i.e. v(i) = n(i) + s(i).
Step 2.1: apply the STFT to the noisy speech; in the time-frequency domain the non-negative matrix sum V = S + N of formula (5) is obtained, where V, S and N are the magnitude spectrum matrices of the noisy speech, the clean speech and the noise respectively. The phase information of the noisy-speech spectrum is obtained at the same time.
Applying convolutive non-negative matrix factorization to the right-hand side of formula (5) gives formula (6), where Ws and Hs denote the speech basis and its coefficients, and Wn and Hn denote the noise basis and its coefficients.
Step 2.2: combining the noise basis Wn obtained by the training module, apply a CNMF_L1/2 decomposition to the magnitude spectrum matrix of the noisy speech to obtain the speech basis Ws, the speech coefficient matrix Hs and a new noise coefficient matrix $\overline{H}_n$, with the objective function of formula (7).
Formula (7) requires solving for Ws, Hs and the new noise coefficient matrix $\overline{H}_n$, which is done by alternating updates, namely:
Step 1: fix Wn, $\overline{H}_n$ and Hs, and update Ws;
Step 2: fix Ws, Wn and $\overline{H}_n$, and update Hs;
Step 3: fix Ws, Hs and Wn, and update $\overline{H}_n$.
Since formula (7) is convex in the variable updated at each of the above steps, the gradient descent method can be used to obtain the update rules.
Step 2.3: use the speech basis Ws and coefficient matrix Hs obtained in step 2.2, together with the phase information, to synthesize the magnitude spectrum S of the denoised speech.
Step 2.4: apply the inverse STFT to the magnitude spectrum S of the denoised speech to obtain the enhanced speech.
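Step 2.4 can be realized with a standard weighted overlap-add inverse STFT. The sketch below is our own illustration, matching the 512-point, 50%-overlap framing used in the experiments; the Hann window and the normalization scheme are assumptions, not stated in the patent.

```python
import numpy as np

def istft_mag_phase(mag, phase, frame_len=512, hop=256):
    """Weighted overlap-add inverse STFT: each frame is rebuilt from
    magnitude and phase, Hann-windowed again on synthesis, overlap-added,
    and normalized by the summed squared window."""
    w = np.hanning(frame_len)
    n_frames = mag.shape[1]
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        frame = np.fft.irfft(mag[:, i] * np.exp(1j * phase[:, i]), n=frame_len)
        out[i * hop:i * hop + frame_len] += w * frame
        norm[i * hop:i * hop + frame_len] += w ** 2
    return out / np.maximum(norm, 1e-12)
```

With a matching Hann-windowed forward STFT, this normalization gives (near-)perfect reconstruction away from the signal edges.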
A noisy-speech denoising system based on L1/2 sparsity-constrained convolutive non-negative matrix factorization is characterized by comprising: an STFT module, for applying the short-time Fourier transform to the specific noise and to the noisy speech to obtain their magnitude spectra; a noise training module, for training on the specific noise with the CNMF method to obtain noise-basis information; a speech decomposition module, for taking the trained noise basis as prior information and decomposing the noisy speech with the CNMF_L1/2 method to obtain a speech basis; a speech synthesis module, for synthesizing the magnitude spectrum of the denoised speech from the obtained speech basis and the phase information; and a spectrum conversion module, for applying the inverse STFT to the magnitude spectrum of the denoised speech to obtain the enhanced speech.
The beneficial effects of the noisy-speech denoising method of the invention are analyzed by simulation experiments.
The experimental hardware is a Core i5 at 3.2 GHz with 4 GB of memory, and the simulation software is Matlab 2013a. To verify the effectiveness of the proposed method, utterances from the TIMIT speech corpus were chosen as clean speech: 25 male and 25 female sentences of about 3 s each. Noises from the NOISEX-92 noise database were chosen as experimental data, comprising both stationary and non-stationary noise: the four noise types Babble, F16, White and M109. Clean speech and noise both use a sampling rate of 8 kHz and 16-bit precision. In the experiments, clean speech and noise were mixed at five different signal-to-noise ratios (including SNRs of -5 dB, 0 dB, 5 dB and 10 dB). When computing the noise spectrum and the noisy-speech spectrum, all signals were divided into frames of 512 samples with 50% overlap between frames, and a 512-point discrete Fourier transform was applied to each frame.
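The framing just described can be sketched as follows (our own illustration; the Hann analysis window is an assumption, since the patent does not name the window):

```python
import numpy as np

def stft_mag_phase(x, frame_len=512, hop=256):
    """Framing as in the experiments: 512-sample frames (64 ms at 8 kHz),
    50% overlap, 512-point DFT per frame. Returns the magnitude spectrum
    and the phase, as used in steps 1.1 and 2.1."""
    w = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    X = np.stack([np.fft.rfft(w * x[i * hop:i * hop + frame_len])
                  for i in range(n_frames)], axis=1)
    return np.abs(X), np.angle(X)
```

Each column of the returned matrices is one frame, so the magnitude matrix is exactly the non-negative spectrogram V fed to the CNMF decompositions.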
To demonstrate the advantage of the invention, three similar methods are compared: CNMF without a sparsity constraint, CNMF with an L1 sparsity constraint, and the proposed CNMF with an L1/2 sparsity constraint (hereinafter CNMF, CNMF_L1 and CNMF_L1/2 respectively). To verify the practicality of the proposed speech enhancement method, three conventional speech enhancement methods are also included for comparison: spectral subtraction, Wiener filtering based on the a priori SNR, and minimum mean-square error log-spectral amplitude estimation (hereinafter PS, Wiener and logMMSE respectively), and the differences in enhancement performance among them are compared.
Three parameters mainly affect the performance of the method: the number R of time-frequency bases, the number of iterations Iter, and the sparsity coefficient λ. The time-frequency bases represent the features of the speech signal, and in the experiments the number of bases is chosen empirically to correspond to the number of phonemes in the speech. Since every clean utterance used in this experiment contains about 12 phonemes, R is set to 12. Regarding the number of iterations, Fig. 4 shows the convergence curve of the proposed L1/2 sparsity-constrained CNMF method on speech enhancement; the value of the objective function is essentially steady after 200 iterations, which confirms experimentally that the method converges, so Iter is set to 200 in the subsequent experiments. The sparsity coefficient λ is the key parameter balancing the cost function against the degree of sparsity; by adjusting λ, the enhancement stage can trade off removing residual noise against reducing speech distortion. A speech basis that is either insufficiently or excessively sparse degrades the enhanced speech, and the empirical value λ = 0.01 was obtained from speech-intelligibility experiments under different values of λ.
Experimental results
Two evaluation criteria are used to objectively evaluate the performance of the invention on speech enhancement. The first evaluates speech intelligibility using the short-time objective intelligibility (STOI) score of the denoised speech. Intelligibility evaluation mainly measures the degree to which the information carried by the speech can be understood. STOI is an intelligibility measure used to assess the intelligibility performance of speech enhancement methods; its value lies in the range (0, 1), and the larger the value, the higher the intelligibility of the enhanced speech.
The second evaluates speech quality using the segmental signal-to-noise ratio (SegSNR) of the denoised speech. Speech quality assessment mainly measures the listening comfort, naturalness and pleasantness of the speech. SegSNR is a common and effective objective evaluation for speech enhancement methods, mainly used to measure the waveform distortion of the enhanced speech relative to the clean speech.
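SegSNR averages the per-frame SNR rather than computing one global SNR, so it weights quiet and loud speech segments more evenly. A minimal sketch (the per-frame clamp to [-10, 35] dB is common practice but our assumption, not a value stated in the patent):

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=512, hop=256, lo=-10.0, hi=35.0):
    """SegSNR sketch: frame-wise SNR of the enhanced waveform against the
    clean one, in dB, clamped per frame and averaged over frames."""
    snrs = []
    for i in range(0, len(clean) - frame_len + 1, hop):
        c = clean[i:i + frame_len]
        e = enhanced[i:i + frame_len]
        num = np.sum(c ** 2)
        den = np.sum((c - e) ** 2) + 1e-12
        snrs.append(np.clip(10.0 * np.log10(num / den + 1e-12), lo, hi))
    return float(np.mean(snrs))
```

The improvement values of Fig. 7 would then correspond to the SegSNR of the enhanced speech minus the SegSNR of the unprocessed noisy speech.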
The invention uses the L1/2 regularization constraint. Compared with the L1 constraint, imposing the L1/2 sparsity constraint on the coefficient matrix H in the objective function when decomposing noisy speech produces a sparser H, and thereby a sparser speech basis. To verify the effect of CNMF_L1/2 on sparsity, the coefficient matrix H and speech basis W of the speech signal in the enhancement stage are shown in Fig. 5: Figs. 5(a) and (b) are H and W when the speech signal is decomposed by CNMF_L1 and CNMF_L1/2 respectively; the upper half of each subfigure is the coefficient matrix H, and the lower half shows the 12 speech bases. It can be seen from Fig. 5 that the H produced by CNMF_L1/2 is sparser than that produced by CNMF_L1.
Fig. 6 shows the STOI values of six different methods under different noise environments, where UN denotes the original noisy speech. It can be seen from Fig. 6 that, compared with the two similar methods CNMF and CNMF_L1, the CNMF_L1/2 method of the invention has an advantage in STOI, indicating that the sparser speech-basis representation of clean speech carries fewer noise bases during denoising, so the intelligibility of the enhanced speech is higher. Meanwhile, compared with the classical speech enhancement methods PS, Wiener and logMMSE, the STOI values of the inventive method are clearly improved, with notably higher intelligibility below 0 dB. This is because the inventive method first obtains a noise dictionary through supervised dictionary learning; when decomposing noisy speech in the enhancement stage it is sensitive to the features of the noise rather than to its energy, so more noise bases can be separated out and the intelligibility of the resulting enhanced speech is better.
Fig. 7 shows the SegSNR improvement values of the six methods under different noise environments (computed by subtracting the SegSNR of the speech before enhancement from that of the speech after enhancement); the larger the improvement, the better the speech quality. It can be seen from Fig. 7 that, in terms of the quality of the enhanced speech, the CNMF_L1/2 method is better than the similar CNMF and CNMF_L1 methods. Compared with the classical methods, it has a slight advantage at low SNR, while at high SNR its effect is slightly inferior to the classical methods. This is probably because at high SNR the noisy speech contains fewer noise bases; even with the trained noise dictionary, some speech bases may be pulled away when the noise is separated, degrading the quality of the enhanced speech.
Performance analysis
The CNMF_L1/2 speech enhancement method needs to train on the noise in advance to obtain a noise dictionary as prior information; during noise training, one iteration takes about 0.1 s. In the enhancement stage, the noisy speech is decomposed by the L1/2 sparsity-constrained CNMF, whose solution uses multiplicative update rules; one iteration takes about 0.15 s.
In addition, since CNMF_L1/2 is a dictionary-learning-based method that first obtains a noise dictionary, it is sensitive to the characteristic information of the noise and can separate the noise bases more effectively when decomposing noisy speech. At low SNR, more noise bases can be separated out; at high SNR, some loss of speech bases may occur.
Conclusion
The invention proposes a single-channel speech enhancement method based on L1/2 sparsity-constrained CNMF. The CNMF_L1/2 method takes full advantage of the ability of the convolutive non-negative matrix model to describe the temporal characteristics of speech, and applies the L1/2 regularization term as a sparsity constraint on the coefficient matrix H. By constraining the decomposition with the L1/2 term, whose sparsifying performance is better, a sparser speech basis is obtained to represent the clean speech; and because a supervised enhancement method is used, the noise basis obtained in the noise training stage helps separate the speech signal from the noise signal in the enhancement stage more effectively. Simulation experiments under four different noise types and five SNRs verify that the CNMF_L1/2 method performs well in speech enhancement. The experimental section shows that the method works well in low-SNR environments; future work will further study the applicability of CNMF_L1/2 under low-SNR conditions, and will also investigate the empirically chosen sparsity coefficient λ in more depth.
The above is only a preferred embodiment of the present invention and does not limit the present invention in any form. Any simple modification, equivalent variation or alteration made to the above embodiment according to the technical and methodological essence of the present invention still falls within the scope of the technical and methodological solution of the present invention.
Claims (6)
1. A method for denoising noisy speech based on L1/2 sparse-constraint convolutive non-negative matrix factorization, characterized in that: assuming the noisy speech signal v(i) is the additive sum of a noise signal n(i) and a speech signal s(i) that are uncorrelated, i.e. v(i) = n(i) + s(i), the noisy-speech denoising method comprises the following steps:
Step 1: training on a specific noise with the CNMF method to obtain noise basis information;
Step 2: using the noise bases as prior information, decomposing the noisy speech with the CNMF_L1/2 method to obtain speech bases, and finally synthesizing the denoised speech.
2. The method for denoising noisy speech according to claim 1, characterized in that said step 1 specifically comprises the following steps:
Step 1.1: applying the short-time Fourier transform (STFT) to the noise to obtain its magnitude spectrum N;
Step 1.2: performing a CNMF decomposition of the noise magnitude spectrum to obtain the noise bases Wn and the corresponding coefficient matrix Hn; the objective function of the decomposition is as follows:
where V is the magnitude-spectrum matrix to be decomposed, and Λ is the convolutive estimate of V:
In formula (2), W(t) and H denote the t-th basis matrix and the coefficient matrix respectively, and the shift operator denotes moving the matrix right by t columns, filling the vacated columns on the left with zeros.
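Formulas (1) and (2) are rendered as images in the original and do not survive extraction. As a minimal sketch, assuming the standard convolutive-NMF form (a KL-divergence objective with Λ = Σ_t W(t)·[H→t], Smaragdis-style), the column-shift operator and the convolutive estimate of step 1.2 can be written as follows; the names `shift_right` and `conv_estimate` and the (T, F, K) storage of W are illustrative, not from the patent:

```python
import numpy as np

def shift_right(M, t):
    """The [M -> t] operator of formula (2): shift M right by t columns,
    filling the vacated columns on the left with zeros."""
    out = np.zeros_like(M)
    if t == 0:
        out[:] = M
    else:
        out[:, t:] = M[:, :-t]
    return out

def conv_estimate(W, H):
    """Lambda = sum_t W(t) @ [H -> t], the convolutive estimate of the
    magnitude spectrum V; W is stored as a (T, F, K) array of T bases."""
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))
```

With T = 1 the estimate degenerates to the ordinary NMF product W·H; larger T lets each basis span several consecutive frames, which is what gives CNMF its grip on the temporal structure of speech.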
3. The method for denoising noisy speech according to claim 2, characterized in that: since objective function (1) in said step 1.2 is a convex function of W and of H taken separately, W and H can be updated alternately, and the gradient-descent method yields their update equations:
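The update equations (3) and (4) are images in the original. A sketch of one alternating pass, assuming the standard multiplicative updates for KL-divergence CNMF (which is the usual gradient-derived form and matches the multiplicative update rule mentioned in the description), might look like this; the function names and the (T, F, K) storage of W are assumptions:

```python
import numpy as np

def shift(M, t):
    """Column shift with zero fill: t > 0 shifts right, t < 0 shifts left."""
    out = np.zeros_like(M)
    n = M.shape[1]
    if t >= 0:
        out[:, t:] = M[:, :n - t]
    else:
        out[:, :n + t] = M[:, -t:]
    return out

def kl_div(V, Lam, eps=1e-12):
    """Generalized KL divergence D(V || Lam), the assumed objective (1)."""
    return float(np.sum(V * np.log((V + eps) / (Lam + eps)) - V + Lam))

def cnmf_step(V, W, H, eps=1e-12):
    """One alternating pass: multiplicative H update, then W(t) updates
    (assumed standard form of the image-only formulas (3) and (4))."""
    T, ones = W.shape[0], np.ones_like(V)
    Lam = sum(W[t] @ shift(H, t) for t in range(T)) + eps
    R = V / Lam
    # H update: accumulate the per-t multiplicative factors
    H *= sum(W[t].T @ shift(R, -t) for t in range(T)) / \
        (sum(W[t].T @ ones for t in range(T)) + eps)
    # refresh the reconstruction, then update each W(t)
    Lam = sum(W[t] @ shift(H, t) for t in range(T)) + eps
    R = V / Lam
    for t in range(T):
        Ht = shift(H, t)
        W[t] *= (R @ Ht.T) / (ones @ Ht.T + eps)
    return W, H
```

On random non-negative data, repeating `cnmf_step` drives the KL divergence down while keeping W and H non-negative, mirroring the alternating updates of the claim.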
4. The method for denoising noisy speech according to claim 1, characterized in that said step 2 specifically comprises the following steps:
Step 2.1: applying the STFT to the noisy speech to obtain, in the time-frequency domain, the following non-negative matrix sum:
V = S + N (5)
where V, S, and N are the magnitude-spectrum matrices of the noisy speech, the clean speech, and the noise respectively, and the phase information of the speech spectrum is obtained at the same time;
performing convolutive non-negative matrix factorization on the right-hand side of formula (5) yields:
where Ws and Hs denote the speech bases and the corresponding coefficient matrix, and Wn and Hn denote the noise bases and the corresponding coefficient matrix;
Step 2.2: combining the noise bases Wn obtained in step 1.2, performing a CNMF_L1/2 decomposition of the magnitude-spectrum matrix of the noisy speech to obtain the speech bases Ws, the speech coefficient matrix Hs, and a new noise coefficient matrix H̄n; the objective function of the CNMF_L1/2 decomposition is as follows:
Step 2.3: synthesizing the speech bases Ws and coefficient matrix Hs obtained in step 2.2 with the phase information to obtain the magnitude spectrum S of the denoised speech, as follows:
Step 2.4: applying the inverse STFT to the magnitude spectrum S of the denoised speech to obtain the enhanced speech signal.
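The synthesis formula following step 2.3 is an image in the original. A minimal numpy sketch, assuming a Wiener-type soft mask built from the speech and noise reconstructions Λs and Λn (a common choice for this recombination, not confirmed by the patent text), with the noisy phase reattached so the result can be fed to the inverse STFT of step 2.4:

```python
import numpy as np

def synthesize(V, phase, Lam_s, Lam_n, eps=1e-12):
    """Step 2.3 sketch: V is the noisy magnitude spectrum, phase its phase,
    Lam_s / Lam_n the speech / noise convolutive reconstructions.
    The Wiener-type mask is an assumed form of the synthesis formula."""
    mask = Lam_s / (Lam_s + Lam_n + eps)  # soft mask, values in [0, 1]
    S = mask * V                          # denoised magnitude spectrum S
    return S, S * np.exp(1j * phase)      # S, plus the complex spectrum for the inverse STFT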
5. The method for denoising noisy speech according to claim 4, characterized in that objective function (7) in said step 2.2 is solved by alternating updates, namely:
Step 1: fix Wn, H̄n, and Hs, and update Ws;
Step 2: fix Ws, Wn, and H̄n, and update Hs;
Step 3: fix Ws, Hs, and Wn, and update H̄n.
Since formula (7) is convex in the variable updated at each of the above steps, the gradient-descent method can be used to obtain the update rules:
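The update rules referred to after formula (7) are images in the original. For the Hs step of the alternation, a sketch assuming the usual sparse-NMF derivation — the gradient of the penalty λ·Σ√Hs, namely (λ/2)·Hs^(−1/2), joins the denominator of the otherwise-unchanged multiplicative update — is shown below; the storage conventions and names are assumptions:

```python
import numpy as np

def shift(M, t):
    """Column shift with zero fill: t > 0 shifts right, t < 0 shifts left."""
    out = np.zeros_like(M)
    n = M.shape[1]
    if t >= 0:
        out[:, t:] = M[:, :n - t]
    else:
        out[:, :n + t] = M[:, -t:]
    return out

def update_Hs(V, Ws, Wn, Hs, Hn, lam=0.1, eps=1e-12):
    """Step 2 of the alternation: fix Ws, Wn, and the noise coefficients,
    update Hs under the L1/2 penalty (assumed multiplicative form; the
    patent's formulas are images). (lam/2)*Hs^(-1/2) enters the denominator."""
    T = Ws.shape[0]
    Lam = sum(Ws[t] @ shift(Hs, t) + Wn[t] @ shift(Hn, t) for t in range(T)) + eps
    R = V / Lam
    num = sum(Ws[t].T @ shift(R, -t) for t in range(T))
    den = sum(Ws[t].T @ np.ones_like(V) for t in range(T)) \
        + 0.5 * lam / np.sqrt(Hs + eps)
    return Hs * num / den
```

Because the penalty term only enlarges the denominator, raising λ shrinks every entry of Hs toward zero, which is exactly the sparsifying effect the L1/2 constraint is introduced for.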
6. A system for denoising noisy speech based on L1/2 sparse-constraint convolutive non-negative matrix factorization, characterized by comprising:
an STFT conversion module, for applying the short-time Fourier transform to the specific noise and to the noisy speech to obtain their magnitude spectra;
a noise training module, for training on the specific noise with the CNMF method to obtain noise basis information;
a speech decomposition module, for using the noise bases obtained by training as prior information and decomposing the noisy speech with the CNMF_L1/2 method to obtain speech bases;
a speech synthesis module, for synthesizing the obtained speech bases with the phase information to obtain the magnitude spectrum of the denoised speech;
a spectrum conversion module, for applying the inverse STFT to the magnitude spectrum of the denoised speech to obtain the enhanced speech signal.
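The claim-6 system decomposes into five modules. A thin wiring sketch (the constructor interface and the callable signatures are assumptions, not from the patent) shows how the modules hand data to one another:

```python
import numpy as np

class SpeechDenoiser:
    """Claim-6 sketch: the five modules wired in order. Each module is an
    injected callable; the interfaces below are assumed, not specified."""

    def __init__(self, stft_mod, train_mod, decomp_mod, synth_mod, istft_mod):
        self.stft_mod, self.train_mod = stft_mod, train_mod
        self.decomp_mod, self.synth_mod, self.istft_mod = decomp_mod, synth_mod, istft_mod

    def fit_noise(self, noise):
        N, _ = self.stft_mod(noise)           # STFT conversion module: magnitude spectrum
        self.Wn = self.train_mod(N)           # noise training module (CNMF)
        return self.Wn

    def denoise(self, noisy):
        V, phase = self.stft_mod(noisy)       # STFT conversion module
        Ws, Hs = self.decomp_mod(V, self.Wn)  # speech decomposition module (CNMF_L1/2)
        S = self.synth_mod(Ws, Hs, phase)     # speech synthesis module
        return self.istft_mod(S, phase)       # spectrum conversion module (inverse STFT)
```

Injecting trivial callables makes the data flow visible end to end without committing to any particular STFT or CNMF implementation.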
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610452012.7A CN105957537B (en) | 2016-06-20 | 2016-06-20 | One kind being based on L1/2The speech de-noising method and system of sparse constraint convolution Non-negative Matrix Factorization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105957537A true CN105957537A (en) | 2016-09-21 |
CN105957537B CN105957537B (en) | 2019-10-08 |
Family
ID=56907052
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610452012.7A Active CN105957537B (en) | 2016-06-20 | 2016-06-20 | One kind being based on L1/2The speech de-noising method and system of sparse constraint convolution Non-negative Matrix Factorization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105957537B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050222840A1 (en) * | 2004-03-12 | 2005-10-06 | Paris Smaragdis | Method and system for separating multiple sound sources from monophonic input with non-negative matrix factor deconvolution |
CN101441872A (en) * | 2007-11-19 | 2009-05-27 | 三菱电机株式会社 | Denoising acoustic signals using constrained non-negative matrix factorization |
US20140229168A1 (en) * | 2013-02-08 | 2014-08-14 | Asustek Computer Inc. | Method and apparatus for audio signal enhancement in reverberant environment |
Non-Patent Citations (2)
Title |
---|
SUN Jian et al.: "Voice conversion method based on convolutive non-negative matrix factorization", Journal of Data Acquisition and Processing (《数据采集与处理》) *
ZHANG Liwei et al.: "Speech enhancement algorithm based on sparse convolutive non-negative matrix factorization", Journal of Data Acquisition and Processing (《数据采集与处理》) *
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106597439A (en) * | 2016-12-12 | 2017-04-26 | 电子科技大学 | Synthetic aperture radar target identification method based on incremental learning |
CN108574911B (en) * | 2017-03-09 | 2019-10-22 | 中国科学院声学研究所 | The unsupervised single microphone voice de-noising method of one kind and system |
CN108574911A (en) * | 2017-03-09 | 2018-09-25 | 中国科学院声学研究所 | The unsupervised single microphone voice de-noising method of one kind and system |
CN108573711A (en) * | 2017-03-09 | 2018-09-25 | 中国科学院声学研究所 | A kind of single microphone speech separating method based on NMF algorithms |
CN108573698A (en) * | 2017-03-09 | 2018-09-25 | 中国科学院声学研究所 | A kind of voice de-noising method based on gender fuse information |
CN108573698B (en) * | 2017-03-09 | 2021-06-08 | 中国科学院声学研究所 | Voice noise reduction method based on gender fusion information |
CN107301382A (en) * | 2017-06-06 | 2017-10-27 | 西安电子科技大学 | The Activity recognition method of lower depth Non-negative Matrix Factorization is constrained based on Time Dependent |
CN107301382B (en) * | 2017-06-06 | 2020-05-19 | 西安电子科技大学 | Behavior identification method based on deep nonnegative matrix factorization under time dependence constraint |
CN108962229A (en) * | 2018-07-26 | 2018-12-07 | 汕头大学 | A kind of target speaker's voice extraction method based on single channel, unsupervised formula |
CN108962229B (en) * | 2018-07-26 | 2020-11-13 | 汕头大学 | Single-channel and unsupervised target speaker voice extraction method |
CN109448749A (en) * | 2018-12-19 | 2019-03-08 | 中国科学院自动化研究所 | Voice extraction method, the system, device paid attention to based on the supervised learning sense of hearing |
CN109448749B (en) * | 2018-12-19 | 2022-02-15 | 中国科学院自动化研究所 | Voice extraction method, system and device based on supervised learning auditory attention |
CN110189761A (en) * | 2019-05-21 | 2019-08-30 | 哈尔滨工程大学 | A kind of single channel speech dereverberation method based on greedy depth dictionary learning |
CN110189761B (en) * | 2019-05-21 | 2021-03-30 | 哈尔滨工程大学 | Single-channel speech dereverberation method based on greedy depth dictionary learning |
CN110060699A (en) * | 2019-05-21 | 2019-07-26 | 哈尔滨工程大学 | A kind of single channel speech separating method based on the sparse expansion of depth |
CN110222781A (en) * | 2019-06-12 | 2019-09-10 | 成都嗨翻屋科技有限公司 | Audio denoising method, device, user terminal and storage medium |
CN110428848B (en) * | 2019-06-20 | 2021-10-29 | 西安电子科技大学 | Speech enhancement method based on public space speech model prediction |
CN110428848A (en) * | 2019-06-20 | 2019-11-08 | 西安电子科技大学 | A kind of sound enhancement method based on the prediction of public space speech model |
CN111276154A (en) * | 2020-02-26 | 2020-06-12 | 中国电子科技集团公司第三研究所 | Wind noise suppression method and system and shot sound detection method and system |
CN111276154B (en) * | 2020-02-26 | 2022-12-09 | 中国电子科技集团公司第三研究所 | Wind noise suppression method and system and shot sound detection method and system |
CN111710343A (en) * | 2020-06-03 | 2020-09-25 | 中国科学技术大学 | Single-channel voice separation method on double transform domains |
CN111710343B (en) * | 2020-06-03 | 2022-09-30 | 中国科学技术大学 | Single-channel voice separation method on double transform domains |
CN111726723A (en) * | 2020-06-23 | 2020-09-29 | 声耕智能科技(西安)研究院有限公司 | Wireless talkback noise reduction earmuff and noise reduction method based on bone conduction technology |
CN112558757A (en) * | 2020-11-20 | 2021-03-26 | 中国科学院宁波材料技术与工程研究所慈溪生物医学工程研究所 | Muscle collaborative extraction method based on smooth constraint non-negative matrix factorization |
CN112558757B (en) * | 2020-11-20 | 2022-08-23 | 中国科学院宁波材料技术与工程研究所慈溪生物医学工程研究所 | Muscle collaborative extraction method based on smooth constraint non-negative matrix factorization |
Also Published As
Publication number | Publication date |
---|---|
CN105957537B (en) | 2019-10-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105957537A (en) | Voice denoising method and system based on L1/2 sparse constraint convolution non-negative matrix decomposition | |
CN111223493B (en) | Voice signal noise reduction processing method, microphone and electronic equipment | |
CN108447495B (en) | Deep learning voice enhancement method based on comprehensive feature set | |
CN105023580B (en) | Unsupervised noise estimation based on separable depth automatic coding and sound enhancement method | |
CN111081268A (en) | Phase-correlated shared deep convolutional neural network speech enhancement method | |
CN105488466B (en) | A kind of deep-neural-network and Acoustic Object vocal print feature extracting method | |
CN107845389A (en) | A kind of sound enhancement method based on multiresolution sense of hearing cepstrum coefficient and depth convolutional neural networks | |
CN109215674A (en) | Real-time voice Enhancement Method | |
CN111128209B (en) | Speech enhancement method based on mixed masking learning target | |
Srinivasan et al. | Transforming binary uncertainties for robust speech recognition | |
CN107967920A (en) | A kind of improved own coding neutral net voice enhancement algorithm | |
CN111899757B (en) | Single-channel voice separation method and system for target speaker extraction | |
CN106653056A (en) | Fundamental frequency extraction model based on LSTM recurrent neural network and training method thereof | |
CN103117059A (en) | Voice signal characteristics extracting method based on tensor decomposition | |
CN111192598A (en) | Voice enhancement method for jump connection deep neural network | |
CN112259119B (en) | Music source separation method based on stacked hourglass network | |
CN111724806B (en) | Double-visual-angle single-channel voice separation method based on deep neural network | |
CN110189766A (en) | A kind of voice style transfer method neural network based | |
CN112885368A (en) | Multi-band spectral subtraction vibration signal denoising method based on improved capsule network | |
CN106356058A (en) | Robust speech recognition method based on multi-band characteristic compensation | |
Wiem et al. | Unsupervised single channel speech separation based on optimized subspace separation | |
Saleem et al. | On improvement of speech intelligibility and quality: A survey of unsupervised single channel speech enhancement algorithms | |
CN103345920A (en) | Self-adaptation interpolation weighted spectrum model voice conversion and reconstructing method based on Mel-KSVD sparse representation | |
Fan et al. | A regression approach to binaural speech segregation via deep neural network | |
CN103886859A (en) | Voice conversion method based on one-to-many codebook mapping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||