CN105957537A - Voice denoising method and system based on L1/2 sparse constraint convolution non-negative matrix decomposition - Google Patents


Info

Publication number
CN105957537A
Authority
CN
China
Prior art keywords
voice
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610452012.7A
Other languages
Chinese (zh)
Other versions
CN105957537B (en)
Inventor
周健
路成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University
Priority to CN201610452012.7A
Publication of CN105957537A
Application granted
Publication of CN105957537B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a speech denoising method and system based on convolutive non-negative matrix factorization (CNMF) with an L1/2 sparsity constraint. For single-channel speech enhancement, the noisy speech signal v(i) is assumed to be the additive mixture of an uncorrelated noise signal n(i) and speech signal s(i), i.e. v(i) = n(i) + s(i). Noise-basis information is first obtained by training on a specific noise with the CNMF method; then, taking the noise basis as prior information, a speech basis is obtained by decomposing the noisy speech with the CNMF_L1/2 method, and the denoised speech is finally synthesized. The method better captures the inter-frame correlation of speech, and the L1/2 regularization term imposes a strong sparsity constraint on the speech-basis coefficient matrix, so the separated speech contains less residual noise. Compared with conventional methods such as spectral subtraction, Wiener filtering and minimum mean-square error log-spectral estimation, the enhanced speech is more intelligible.

Description

A speech denoising method and system based on L1/2 sparse-constrained convolutive non-negative matrix factorization
Technical field
The invention belongs to the field of acoustic signal processing, and specifically relates to a speech denoising method and system based on L1/2 sparse-constrained convolutive non-negative matrix factorization.
Background art
Speech is an important carrier of daily communication, but in real environments it is often corrupted by various noises, making the content hard to understand. Speech enhancement suppresses or removes this noise interference, extracting speech that is as clean as possible from the contaminated signal so that intelligible speech is obtained. Speech enhancement techniques are widely used in speech recognition, speech coding and intelligent communication.
Speech enhancement based on non-negative matrix factorization (NMF) is a parts-based representation: by decomposing the speech signal, a set of basis vectors and a coefficient matrix that represent speech characteristics are obtained. NMF is currently a focus of many researchers. Its basic principle is to compute, according to a cost function, the basis matrix and the corresponding coefficient matrix of each source component, thereby separating the signals. According to the available prior knowledge of the audio signals, NMF can be divided into blind, supervised and semi-blind models, in which, respectively, no source basis matrix is known in advance, the basis matrices of all mixed components are known, or only the basis matrices of some components are known. The cost function mainly comprises two parts: the similarity between the signals before and after separation, and restrictive conditions added according to the characteristics of the processed signal.
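As a concrete illustration of the basic factorization V ≈ WH mentioned above, the following sketch implements the classical Lee-Seung multiplicative updates for the Euclidean cost on a toy matrix. It is an illustrative aid under our own naming, not the patent's implementation.

```python
import numpy as np

def nmf_euclidean(V, r, n_iter=500, seed=0):
    """Plain NMF, V (n x m) ~= W (n x r) @ H (r x m), Euclidean cost,
    Lee-Seung multiplicative updates. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 1e-9
    H = rng.random((r, m)) + 1e-9
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # coefficient update
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # basis update
    return W, H

# toy check: an exactly rank-2 non-negative matrix is reconstructed closely
V = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])
W, H = nmf_euclidean(V, r=2)
err = np.linalg.norm(V - W @ H)
```

The multiplicative form keeps W and H non-negative automatically, which is why it is the standard workhorse for NMF-style decompositions.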
Chinese patent CN201220541700.2 discloses an audio separation method based on NMF, comprising an auxiliary music/speech discrimination module and an NMF decomposition module. By introducing NMF and exploiting the different audio characteristics of speech and music, the method separates speech audio from music audio within a mixture, obtaining comparatively clean music and speech audio; combining NMF with machine-learning methods, the audio is then classified.
Convolutive non-negative matrix factorization (CNMF) uses a sum of NMF terms with successive shifts to represent the temporally continuous information in a signal; this decomposition describes the time-varying characteristics of speech signals well.
Summary of the invention
The object of the present invention is to provide a speech denoising method and system in which an Lq sparsity constraint is added when the speech signal is decomposed by CNMF, so as to obtain denoised speech of better quality and higher intelligibility.
To achieve this object, the present invention adopts the following technical solution: a method for denoising noisy speech based on L1/2 sparse-constrained convolutive non-negative matrix factorization, characterized in that the noisy speech signal v(i) is assumed to be the additive mixture of a noise signal n(i) and a speech signal s(i) that are mutually uncorrelated, i.e. v(i) = n(i) + s(i); the method comprises the following steps:
Step 1: train on a specific noise with the CNMF method to obtain noise-basis information.
Step 2: taking the noise basis as prior information, decompose the noisy speech with the CNMF_L1/2 method to obtain a speech basis, and finally synthesize the denoised speech.
Said step 1 specifically comprises the following steps:
Step 1.1: apply the short-time Fourier transform (STFT) to the noise to obtain its magnitude spectrum N;
Step 1.2: apply CNMF to the noise magnitude spectrum to obtain the noise basis $W_n$ and the corresponding coefficient matrix $H_n$; the objective function of the decomposition is

$$ D(V\,|\,\Lambda) = \frac{1}{2}\,\lVert V - \Lambda \rVert^{2} \qquad (1) $$

where V is the noise magnitude-spectrum matrix to be decomposed and Λ is the convolutive estimate of V:

$$ \Lambda = \sum_{t=0}^{T_0-1} W_n(t)\,\overset{t\rightarrow}{H_n} \qquad (2) $$

In formula (2), $W(t)$ and $H$ denote the basis matrix at shift t and the coefficient matrix, respectively, and $\overset{t\rightarrow}{H}$ denotes shifting the columns of the matrix t steps to the right, filling the vacated columns on the left with zeros.
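The column-shift operator of formula (2) can be stated concretely. The sketch below (the function name is ours) shifts the columns of H right by t steps, zero-filling the vacated left-hand columns; it assumes 0 ≤ t ≤ the number of columns.

```python
import numpy as np

def shift_right(H, t):
    """The 't->' operator of formula (2): move the columns of H right by
    t steps and fill the vacated left-hand columns with zeros."""
    out = np.zeros_like(H)
    if t == 0:
        out[:] = H
    else:
        out[:, t:] = H[:, :H.shape[1] - t]
    return out

H = np.array([[1, 2, 3],
              [4, 5, 6]])
# shift_right(H, 1) moves each column one step right and zero-fills the left
```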
In said step 1.2, since objective function (1) is convex in W with H fixed and convex in H with W fixed, W and H can be updated alternately; gradient descent yields the update equations:

$$ W^{n}_{ik}(t) \leftarrow W^{n}_{ik}(t)\cdot \frac{\sum_{j=1}^{T}\bigl(V_{ij} + \overline{W}^{n}_{ik}(t)\,\overline{W}^{n}_{ki}(t)\,\Lambda_{ij}\bigr)\,\overset{t\rightarrow}{H}{}^{n}_{jk}}{\sum_{j=1}^{T}\bigl(\Lambda_{ij} + \overline{W}^{n}_{ik}(t)\,\overline{W}^{n}_{ki}(t)\,V_{ij}\bigr)\,\overset{t\rightarrow}{H}{}^{n}_{jk}} \qquad (3) $$

$$ H^{n}_{kj} \leftarrow H^{n}_{kj}\cdot \frac{\sum_{i=1}^{M} \overline{W}^{n}_{ki}(t)\,V_{ij}}{\sum_{i=1}^{M} \overline{W}^{n}_{ki}(t)\,\overset{\leftarrow t}{\Lambda}_{ij}} \qquad (4) $$
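The alternating scheme of step 1.2 can be sketched as follows. This uses the standard Euclidean multiplicative update rules for convolutive NMF (Smaragdis-style), which follow the same alternating W(t)/H scheme but are not guaranteed to match equations (3)-(4) term for term; all names and shapes are illustrative.

```python
import numpy as np

def shift_cols(M, t):
    """Shift columns right (t > 0) or left (t < 0), zero-filling."""
    out = np.zeros_like(M)
    m = M.shape[1]
    if t == 0:
        out[:] = M
    elif t > 0:
        out[:, t:] = M[:, :m - t]
    else:
        out[:, :m + t] = M[:, -t:]
    return out

def cnmf_train(V, r, T0, n_iter=150, seed=0):
    """Convolutive NMF, V ~= sum_t W[t] @ shift_cols(H, t), fitted with
    standard Euclidean multiplicative updates (a hedged sketch)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((T0, n, r)) + 1e-9
    H = rng.random((r, m)) + 1e-9
    eps = 1e-9
    for _ in range(n_iter):
        Lam = sum(W[t] @ shift_cols(H, t) for t in range(T0))
        for t in range(T0):                      # update each basis slice W(t)
            Ht = shift_cols(H, t)
            W[t] *= (V @ Ht.T) / (Lam @ Ht.T + eps)
        Lam = sum(W[t] @ shift_cols(H, t) for t in range(T0))
        num = sum(W[t].T @ shift_cols(V, -t) for t in range(T0))
        den = sum(W[t].T @ shift_cols(Lam, -t) for t in range(T0))
        H *= num / (den + eps)                   # one shared coefficient matrix
    return W, H

# toy check: fit a small exact two-shift convolutive model
rng = np.random.default_rng(2)
W0 = rng.random((2, 5, 3))
H0 = rng.random((3, 12))
V = sum(W0[t] @ shift_cols(H0, t) for t in range(2))
W, H = cnmf_train(V, r=3, T0=2)
Lam = sum(W[t] @ shift_cols(H, t) for t in range(2))
rel_err = np.linalg.norm(V - Lam) / np.linalg.norm(V)
```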
Said step 2 specifically comprises the following steps:

Step 2.1: apply the STFT to the noisy speech; in the time-frequency domain the following non-negative matrix sum is obtained:

$$ V = S + N \qquad (5) $$

where V, S and N are the magnitude-spectrum matrices of the noisy speech, the clean speech and the noise, respectively; the phase information of the speech spectrum is obtained at the same time.

Applying convolutive non-negative matrix factorization to the right-hand side of (5) gives

$$ V = \sum_{t=0}^{T-1} \begin{bmatrix} W^{n}_{t} & W^{s}_{t} \end{bmatrix} \begin{bmatrix} \overset{t\rightarrow}{H^{n}} \\ \overset{t\rightarrow}{H^{s}} \end{bmatrix} = \sum_{t=0}^{T-1}\sum_{k=1}^{R} W^{n}_{ik}(t)\,\overset{t\rightarrow}{H}{}^{n}_{kj} + \sum_{t=0}^{T-1}\sum_{k=1}^{R} W^{s}_{ik}(t)\,\overset{t\rightarrow}{H}{}^{s}_{kj} \qquad (6) $$

where $W_s$ and $H_s$ denote the speech basis and its corresponding coefficient matrix, and $W_n$ and $H_n$ the noise basis and its corresponding coefficient matrix.

Step 2.2: combining the noise basis $W_n$ obtained in step 1.2, apply CNMF_L1/2 to the magnitude-spectrum matrix of the noisy speech to obtain the speech basis $W_s$, the speech coefficient matrix $H_s$ and a new noise-basis coefficient matrix $\overline{H}_n$; the objective function of the CNMF_L1/2 decomposition is

$$ D(V\,|\,W_n, W_s, H_n, H_s) = \frac{1}{2}\,\Bigl\lVert V - \sum_{t=0}^{T-1} W_n(t)\,\overset{t\rightarrow}{H_n} - \sum_{t=0}^{T-1} W_s(t)\,\overset{t\rightarrow}{H_s} \Bigr\rVert^{2} + \lambda\,\lVert H_s \rVert_{1/2}^{1/2} \qquad (7) $$

Step 2.3: synthesize the magnitude spectrum S of the denoised speech from the speech basis $W_s$ and coefficient matrix $H_s$ obtained in step 2.2 together with the phase information, as

$$ S = \sum_{t=0}^{T-1} W_s(t)\,\overset{t\rightarrow}{H_s} \qquad (8) $$

Step 2.4: apply the inverse STFT to the magnitude spectrum S of the denoised speech (with the phase information) to obtain the enhanced speech signal.
In said step 2.2, objective function (7) is solved by alternating updates, namely:

1st step: fix $W_n$ and $H_s$, and update $W_s$;
2nd step: fix $W_s$, $W_n$ and $\overline{H}_n$, and update $H_s$;
3rd step: fix $W_s$, $H_s$ and $W_n$, and update $\overline{H}_n$.

Since (7) is convex in the variable updated at each of these steps, gradient descent yields the update rules:

$$ W^{s}_{ik}(t) \leftarrow W^{s}_{ik}(t)\cdot \frac{\sum_{j=1}^{T}\bigl(V_{ij} + \overline{W}^{s}_{ik}(t)\,\overline{W}^{s}_{ki}(t)\,\Lambda_{ij}\bigr)\,\overset{t\rightarrow}{H}{}^{s}_{jk}}{\sum_{j=1}^{T}\bigl(\Lambda_{ij} + \overline{W}^{s}_{ik}(t)\,\overline{W}^{s}_{ki}(t)\,V_{ij}\bigr)\,\overset{t\rightarrow}{H}{}^{s}_{jk}} \qquad (9) $$

$$ H^{s}_{kj} \leftarrow H^{s}_{kj}\cdot \frac{\sum_{i=1}^{M} \overline{W}^{s}_{ki}(t)\,V_{ij}}{\sum_{i=1}^{M} \Bigl( \overline{W}^{s}_{ki}(t)\,\overset{\leftarrow t}{\Lambda}_{ij} + \frac{\lambda}{2}\,(H^{s}_{kj})^{-1/2} \Bigr)} \qquad (10) $$

$$ \overline{H}^{n}_{kj} \leftarrow \overline{H}^{n}_{kj}\cdot \frac{\sum_{i=1}^{M} \overline{W}^{n}_{ik}(t)\,V_{ij}}{\sum_{i=1}^{M} \overline{W}^{n}_{ik}(t)\,\overset{\leftarrow t}{\Lambda}_{ij}} \qquad (11) $$
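The effect of the L1/2 term in the $H_s$ update can be seen in a single step: relative to the unpenalized rule, the denominator gains an extra (λ/2)·H_s^(-1/2) term, so every coefficient is shrunk at least as hard. The data and shapes below are made up for illustration only.

```python
import numpy as np

# One multiplicative update of the speech coefficients H_s, with and without
# the L1/2 penalty term in the denominator (cf. the penalized update above).
rng = np.random.default_rng(1)
M, R, N = 6, 3, 8
V   = rng.random((M, N)) + 0.1   # "noisy speech" magnitude (toy)
Ws  = rng.random((M, R)) + 0.1   # speech basis (toy)
Hs  = rng.random((R, N)) + 0.1   # current speech coefficients (toy)
Lam = Ws @ Hs                    # current model estimate

def update_Hs(Hs, lam):
    num = Ws.T @ V
    den = Ws.T @ Lam + (lam / 2.0) * Hs ** (-0.5)   # L1/2 penalty term
    return Hs * num / den

H_plain  = update_Hs(Hs, lam=0.0)   # unpenalized multiplicative step
H_sparse = update_Hs(Hs, lam=0.5)   # L1/2-penalized step
```

Because the penalty only enlarges the denominator, the penalized step is elementwise smaller than the plain one, which is exactly the shrinkage that drives $H_s$ toward sparsity over many iterations.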
A noisy-speech denoising system based on L1/2 sparse-constrained convolutive non-negative matrix factorization, comprising:

an STFT module, for applying the short-time Fourier transform to the specific noise and to the noisy speech to obtain their magnitude spectra;

a noise training module, for training on the specific noise with the CNMF method to obtain noise-basis information;

a speech decomposition module, for decomposing the noisy speech with the CNMF_L1/2 method, taking the trained noise basis as prior information, to obtain a speech basis;

a speech synthesis module, for synthesizing the magnitude spectrum of the denoised speech from the obtained speech basis and the phase information;

a spectrum conversion module, for applying the inverse STFT to the magnitude spectrum of the denoised speech to obtain the enhanced speech signal.
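The module decomposition above can be sketched structurally. Only the two transform modules are fleshed out here, with a trivial non-overlapping frame-wise DFT standing in for the STFT; the training, decomposition and synthesis modules correspond to the update rules given earlier. All function names are our own.

```python
import numpy as np

def stft_module(x, frame=8):
    """STFT module: frame the signal and return magnitude and phase
    (a toy frame-wise DFT stands in for a real overlapped STFT)."""
    n = len(x) // frame
    frames = x[:n * frame].reshape(n, frame)
    spec = np.fft.rfft(frames, axis=1).T       # bins x frames
    return np.abs(spec), np.angle(spec)

def spectrum_conversion_module(mag, phase, frame=8):
    """Spectrum conversion module: magnitude + phase back to a waveform."""
    spec = (mag * np.exp(1j * phase)).T
    return np.fft.irfft(spec, n=frame, axis=1).ravel()

# round trip: with an unmodified magnitude spectrum the waveform is recovered,
# mirroring how the system reuses the noisy phase at synthesis time
x = np.sin(np.linspace(0.0, 20.0, 64))
mag, phase = stft_module(x)
y = spectrum_conversion_module(mag, phase)
```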
Non-negative matrix factorization (NMF) is a special basis decomposition in which all matrix elements are required to be non-negative: a given non-negative matrix $V \in \mathbb{R}_{\ge 0}^{n\times m}$ is decomposed by NMF into two non-negative matrices $W \in \mathbb{R}_{\ge 0}^{n\times r}$ and $H \in \mathbb{R}_{\ge 0}^{r\times m}$ such that $V \approx WH$. NMF generally treats the columns of V as mutually independent, without taking into account the time-varying characteristics of signals such as speech, in which adjacent columns (frames) are often correlated. To describe this inter-frame relation, a sum of NMF terms with successive shifts is used to represent the temporally continuous information in the signal, which gives rise to convolutive non-negative matrix factorization (CNMF), an extension of NMF.

The bases produced by NMF and CNMF when decomposing the speech magnitude-spectrum matrix V tend to be sparse. Adding a sparsity constraint on W or H not only yields sparser bases but also allows the degree of sparsity to be traded off against the reconstruction error. Since each row of the coefficient matrix H corresponds to a column of the basis matrix W, imposing a sparsity constraint on H correspondingly produces a sparser basis W; according to sparse-representation theory, the original signal can then be represented with fewer bases from the dictionary.
The objective function of CNMF with a sparsity constraint on H can be expressed as

$$ L(V\,\|\,\Lambda) = \frac{1}{2}\,\lVert V - \Lambda \rVert^{2} + \lambda\,\lVert H \rVert_{q}^{q} \qquad (12) $$

where $\lambda \in \mathbb{R}_{\ge 0}$ is a regularization parameter balancing the degree of sparsity against the reconstruction error, and q = 0, 1/2, 1, 2 corresponds to $L_0$, $L_{1/2}$, $L_1$ and $L_2$ regularization, respectively.
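The behaviour of the $\lVert H \rVert_q^q$ penalty for different q can be checked numerically: for two coefficient vectors with the same L1 mass, the L1 penalty cannot tell them apart, L2 actually prefers the dense one, and L1/2 favours the sparse one. A small sketch:

```python
import numpy as np

def lq_penalty(h, q):
    """The sparsity penalty ||h||_q^q = sum_i |h_i|^q used in (12)."""
    return float(np.sum(np.abs(h) ** q))

dense  = np.array([0.5, 0.5, 0.5, 0.5])  # mass spread over four entries
sparse = np.array([2.0, 0.0, 0.0, 0.0])  # the same L1 mass on one entry

l1_dense, l1_sparse = lq_penalty(dense, 1.0), lq_penalty(sparse, 1.0)
lh_dense, lh_sparse = lq_penalty(dense, 0.5), lq_penalty(sparse, 0.5)
l2_dense, l2_sparse = lq_penalty(dense, 2.0), lq_penalty(sparse, 2.0)
# L1 ties the two vectors; L1/2 penalizes the dense one more heavily,
# while L2 penalizes the sparse one more heavily
```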
The constraint on H in (12) that requires as many elements of H as possible to be 0 is called $L_0$ regularization. However, $L_0$ regularization is a non-convex NP-hard problem whose optimum cannot be found directly. To address this, $L_1$ regularization has been proposed as an approximation to $L_0$ regularization. CNMF with an $L_1$ sparsity constraint (hereinafter "$L_1$_CNMF") is expressed as

$$ L(V\,\|\,\Lambda) = \frac{1}{2}\,\lVert V - \Lambda \rVert^{2} + \lambda\,\lVert H \rVert_{1} \qquad (13) $$

Although $L_1$ and $L_2$ regularization can be turned into convex optimization problems, their solutions are not necessarily sufficiently sparse. In particular, for 0 < q < 1, $L_q$ regularization produces sparser solutions than $L_1$; for 1/2 ≤ q < 1, the smaller q is, the sparser the solution, while for 0 < q ≤ 1/2 the degree of sparsity differs little from that of $L_{1/2}$.

In the denoising process, if the original speech signal can be represented with a sparser basis, less noise is carried in the denoising stage, so the denoised speech is clearer and easier to understand. The sparser the coefficient matrix H of the speech basis, the fewer speech bases are needed to reconstruct the original speech signal. The present invention therefore applies the sparser $L_{1/2}$ regularization constraint to H, and objective function (12) is rewritten as

$$ L(V\,\|\,\Lambda) = \frac{1}{2}\,\lVert V - \Lambda \rVert^{2} + \lambda\,\lVert H \rVert_{1/2}^{1/2} \qquad (14) $$

By adding the $L_q$ sparsity constraint when the speech signal is decomposed by CNMF, a sparser speech-basis representation of the original speech is obtained. When the enhanced speech is synthesized from the obtained speech basis, fewer noise bases are carried, yielding speech of better quality and higher intelligibility.

The method of the invention better captures the inter-frame correlation of speech, and the $L_{1/2}$ regularization term imposes a strong sparsity constraint on the speech-basis coefficient matrix, so the separated speech contains less residual noise. Compared with conventional methods such as spectral subtraction, Wiener filtering and minimum mean-square error log-spectral amplitude estimation, the intelligibility of the enhanced speech is improved.
Brief description of the drawings
Fig. 1 is the flow chart of denoising a noisy speech signal according to the invention.
Fig. 2 is the flow chart of the noise training process of step 1.
Fig. 3 is the flow chart of the speech enhancement performed on the noisy speech in step 2.
Fig. 4 is the convergence curve of the L1/2 sparse-constrained CNMF denoising method on speech enhancement.
Fig. 5 shows the coefficient matrix H and speech basis W of the speech signal in the enhancement stage; (a) and (b) show H and W when the speech signal is decomposed by CNMF_L1 and CNMF_L1/2, respectively.
Fig. 6 shows the STOI values of six different methods under different noise environments.
Fig. 7 shows the SegSNR improvements of the six methods under different noise environments.
Detailed description of the invention
The present invention is further described below with reference to specific embodiments and the accompanying drawings.
The present invention is a speech denoising method based on L1/2 sparse-constrained convolutive non-negative matrix factorization (hereinafter "CNMF_L1/2"). Fig. 1 is the overall flow chart of the denoising of the invention. The overall input is a particular type of noise and the speech mixed with that noise, where the noise may be of different types (e.g. stationary or non-stationary); the output is the denoised speech.
Fig. 2 is the flow chart of the noise training process of step 1.
Step 1.1: apply the short-time Fourier transform (STFT) to the noise to obtain its magnitude spectrum N.

Step 1.2: apply CNMF to the noise magnitude spectrum to obtain the noise basis $W_n$ and the corresponding coefficient matrix $H_n$; the objective function of the decomposition is

$$ D(V\,|\,\Lambda) = \frac{1}{2}\,\lVert V - \Lambda \rVert^{2} \qquad (1) $$

where V is the noise magnitude-spectrum matrix to be decomposed and Λ is the convolutive estimate of V:

$$ \Lambda = \sum_{t=0}^{T_0-1} W_n(t)\,\overset{t\rightarrow}{H_n} \qquad (2) $$

In formula (2), $W(t)$ and $H$ denote the basis matrix at shift t and the coefficient matrix, respectively, and $\overset{t\rightarrow}{H}$ denotes shifting the columns of the matrix t steps to the right, filling the vacated columns on the left with zeros.

Since (1) is convex in W with H fixed and convex in H with W fixed, W and H can be updated alternately; gradient descent yields the update equations:

$$ W^{n}_{ik}(t) \leftarrow W^{n}_{ik}(t)\cdot \frac{\sum_{j=1}^{T}\bigl(V_{ij} + \overline{W}^{n}_{ik}(t)\,\overline{W}^{n}_{ki}(t)\,\Lambda_{ij}\bigr)\,\overset{t\rightarrow}{H}{}^{n}_{jk}}{\sum_{j=1}^{T}\bigl(\Lambda_{ij} + \overline{W}^{n}_{ik}(t)\,\overline{W}^{n}_{ki}(t)\,V_{ij}\bigr)\,\overset{t\rightarrow}{H}{}^{n}_{jk}} \qquad (3) $$

$$ H^{n}_{kj} \leftarrow H^{n}_{kj}\cdot \frac{\sum_{i=1}^{M} \overline{W}^{n}_{ki}(t)\,V_{ij}}{\sum_{i=1}^{M} \overline{W}^{n}_{ki}(t)\,\overset{\leftarrow t}{\Lambda}_{ij}} \qquad (4) $$
Fig. 3 is the flow chart of obtaining the denoised speech from the noisy speech in step 2. Single-channel speech enhancement usually assumes that the noisy speech signal v(i) is the additive mixture of a noise signal n(i) and a speech signal s(i) that are mutually uncorrelated, i.e. v(i) = n(i) + s(i).

Step 2.1: apply the STFT to the noisy speech; in the time-frequency domain the following non-negative matrix sum is obtained:

$$ V = S + N \qquad (5) $$

where V, S and N are the magnitude-spectrum matrices of the noisy speech, the clean speech and the noise, respectively. The phase information of the speech spectrum is obtained at the same time.

Applying convolutive non-negative matrix factorization to the right-hand side of (5) gives

$$ V = \sum_{t=0}^{T-1} \begin{bmatrix} W^{n}_{t} & W^{s}_{t} \end{bmatrix} \begin{bmatrix} \overset{t\rightarrow}{H^{n}} \\ \overset{t\rightarrow}{H^{s}} \end{bmatrix} = \sum_{t=0}^{T-1}\sum_{k=1}^{R} W^{n}_{ik}(t)\,\overset{t\rightarrow}{H}{}^{n}_{kj} + \sum_{t=0}^{T-1}\sum_{k=1}^{R} W^{s}_{ik}(t)\,\overset{t\rightarrow}{H}{}^{s}_{kj} \qquad (6) $$

where $W_s$ and $H_s$ denote the speech basis and its corresponding coefficients, and $W_n$ and $H_n$ the noise basis and its corresponding coefficients.
Step 2.2: combining the noise basis $W_n$ obtained by the training module, apply CNMF_L1/2 to the magnitude-spectrum matrix of the noisy speech to obtain the speech basis $W_s$, the speech coefficient matrix $H_s$ and a new noise-basis coefficient matrix $\overline{H}_n$; the objective function of the CNMF_L1/2 decomposition is

$$ D(V\,|\,W_n, W_s, H_n, H_s) = \frac{1}{2}\,\Bigl\lVert V - \sum_{t=0}^{T-1} W_n(t)\,\overset{t\rightarrow}{H_n} - \sum_{t=0}^{T-1} W_s(t)\,\overset{t\rightarrow}{H_s} \Bigr\rVert^{2} + \lambda\,\lVert H_s \rVert_{1/2}^{1/2} \qquad (7) $$

Formula (7) requires solving for $W_s$, $H_s$ and the new noise-basis coefficient matrix $\overline{H}_n$, which is done by alternating updates, namely:

1st step: fix $W_n$ and $H_s$, and update $W_s$;
2nd step: fix $W_s$, $W_n$ and $\overline{H}_n$, and update $H_s$;
3rd step: fix $W_s$, $H_s$ and $W_n$, and update $\overline{H}_n$.

Since (7) is convex in the variable updated at each of these steps, gradient descent yields the update rules:

$$ W^{s}_{ik}(t) \leftarrow W^{s}_{ik}(t)\cdot \frac{\sum_{j=1}^{T}\bigl(V_{ij} + \overline{W}^{s}_{ik}(t)\,\overline{W}^{s}_{ki}(t)\,\Lambda_{ij}\bigr)\,\overset{t\rightarrow}{H}{}^{s}_{jk}}{\sum_{j=1}^{T}\bigl(\Lambda_{ij} + \overline{W}^{s}_{ik}(t)\,\overline{W}^{s}_{ki}(t)\,V_{ij}\bigr)\,\overset{t\rightarrow}{H}{}^{s}_{jk}} \qquad (8) $$

$$ H^{s}_{kj} \leftarrow H^{s}_{kj}\cdot \frac{\sum_{i=1}^{M} \overline{W}^{s}_{ki}(t)\,V_{ij}}{\sum_{i=1}^{M} \Bigl( \overline{W}^{s}_{ki}(t)\,\overset{\leftarrow t}{\Lambda}_{ij} + \frac{\lambda}{2}\,(H^{s}_{kj})^{-1/2} \Bigr)} \qquad (9) $$

$$ \overline{H}^{n}_{kj} \leftarrow \overline{H}^{n}_{kj}\cdot \frac{\sum_{i=1}^{M} \overline{W}^{n}_{ik}(t)\,V_{ij}}{\sum_{i=1}^{M} \overline{W}^{n}_{ik}(t)\,\overset{\leftarrow t}{\Lambda}_{ij}} \qquad (10) $$

Step 2.3: synthesize the magnitude spectrum S of the denoised speech from the speech basis $W_s$ and coefficient matrix $H_s$ obtained in step 2.2 together with the phase information, as

$$ S = \sum_{t=0}^{T-1} W_s(t)\,\overset{t\rightarrow}{H_s} \qquad (11) $$

Step 2.4: apply the inverse STFT to the magnitude spectrum S of the denoised speech to obtain the enhanced speech signal.
A noisy-speech denoising system based on L1/2 sparse-constrained convolutive non-negative matrix factorization, characterized by comprising: an STFT module, for applying the short-time Fourier transform to the specific noise and the noisy speech to obtain their magnitude spectra; a noise training module, for training on the specific noise with the CNMF method to obtain noise-basis information; a speech decomposition module, for decomposing the noisy speech with the CNMF_L1/2 method, taking the trained noise basis as prior information, to obtain a speech basis; a speech synthesis module, for synthesizing the magnitude spectrum of the denoised speech from the obtained speech basis and the phase information; and a spectrum conversion module, for applying the inverse STFT to the magnitude spectrum of the denoised speech to obtain the enhanced speech signal.
The beneficial effects of the denoising method of the invention on noisy speech are analyzed by simulation experiments.

The experimental hardware is a Core i5 at 3.2 GHz with 4 GB of memory; the simulation software is Matlab 2013a. To verify the effectiveness of the proposed method, utterances from the TIMIT corpus are chosen as clean speech: 25 male and 25 female sentences, each about 3 s long. Noises from the NOISEX-92 database are chosen as experimental data, including both stationary and non-stationary noises: Babble, F16, White and M109. Both the clean speech and the noise are sampled at 8 kHz with 16-bit precision. In the experiments, clean speech and noise are mixed at several signal-to-noise ratios (SNR = -5 dB, 0 dB, 5 dB, 10 dB). When computing the noise spectrum and the noisy-speech spectrum, all signals are divided into frames of 512 samples with 50% overlap between frames, and a 512-point discrete Fourier transform is applied to each frame.
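The framing described above (512-sample frames, 50% overlap, 512-point DFT, 8 kHz input) can be sketched as follows; the windowing choice (Hann) is our assumption, as the text does not state the analysis window.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split x into 512-sample frames with 50% overlap (hop = 256)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def magnitude_spectrogram(x, frame_len=512, hop=256):
    """Windowed 512-point DFT per frame; returns bins x frames magnitudes."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    return np.abs(np.fft.rfft(frames, n=frame_len, axis=1)).T

fs = 8000                                              # 8 kHz, as stated
x = np.random.default_rng(0).standard_normal(3 * fs)   # ~3 s test signal
V = magnitude_spectrogram(x)                           # 257 bins x frames
```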
To demonstrate the advantages of the invention, three related methods are compared: CNMF without sparsity constraint, CNMF with an L1 sparsity constraint, and the proposed CNMF with an L1/2 sparsity constraint (hereinafter CNMF, CNMF_L1 and CNMF_L1/2, respectively). To verify the practicality of the proposed speech enhancement method, three conventional speech enhancement methods are also included for comparison: spectral subtraction, Wiener filtering based on the a priori SNR, and minimum mean-square error log-spectral amplitude estimation (hereinafter PS, Wiener and logMMSE), and the differences in their speech-enhancement performance are compared.

Three parameters mainly affect the performance of the method: the number R of time-frequency bases, the number of iterations Iter, and the sparsity coefficient λ. The time-frequency bases represent the features of the speech signal, and R is chosen empirically so that the number of bases corresponds to the phonemes in the speech. Since each clean utterance used in this experiment contains about 12 phonemes, R is set to 12. As for the number of iterations, Fig. 4 shows the convergence curve of the proposed L1/2 sparse-constrained CNMF method on speech enhancement; the objective value is essentially stable after 200 iterations, experimentally confirming convergence, so Iter is set to 200 in subsequent experiments. The sparsity coefficient λ is an important parameter balancing the cost function against the degree of sparsity; by adjusting λ, the enhancement stage can trade off between removing residual noise and reducing speech distortion. A speech basis that is either under-sparse or over-sparse degrades the enhanced speech; experiments on speech intelligibility under different values of λ yield the empirical value 0.01.
Experimental results
Two evaluation criteria are used to objectively evaluate the speech-enhancement performance of the invention. The first concerns speech intelligibility: the short-time objective intelligibility (STOI) score of the denoised speech. Intelligibility evaluation mainly measures the degree to which the information carried by the speech can be understood. STOI is an intelligibility metric used to measure the intelligibility performance of speech enhancement methods; its value lies in (0, 1), and the larger the value, the higher the intelligibility of the enhanced speech.

The second concerns speech quality: the segmental signal-to-noise ratio (SegSNR) of the denoised speech. Quality evaluation mainly measures the listening comfort, naturalness and pleasantness of the speech. SegSNR is a common objective evaluation for speech enhancement methods, mainly used to measure the waveform distortion of the enhanced speech relative to the clean speech.
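A minimal SegSNR computation along these lines is sketched below; the frame length and the conventional [-10, 35] dB clipping range are our assumptions, not taken from the text.

```python
import numpy as np

def seg_snr(clean, enhanced, frame=256, lo=-10.0, hi=35.0):
    """Segmental SNR in dB: frame-wise SNR of the clean signal against the
    enhancement error, clipped to [lo, hi] dB and averaged over frames."""
    n = min(len(clean), len(enhanced)) // frame * frame
    c = clean[:n].reshape(-1, frame)
    e = enhanced[:n].reshape(-1, frame)
    num = np.sum(c ** 2, axis=1)
    den = np.sum((c - e) ** 2, axis=1) + 1e-12   # guard against zero error
    snr = 10.0 * np.log10(num / den)
    return float(np.mean(np.clip(snr, lo, hi)))

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 100.0, 4096))
noisy = clean + 0.1 * rng.standard_normal(4096)
score = seg_snr(clean, noisy)   # a lightly corrupted signal scores high
```

The per-frame clipping keeps silent frames (near-zero clean energy) and perfect frames from dominating the average, which is why SegSNR is preferred over a single global SNR for speech.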
The invention uses an L1/2 regularization constraint. Compared with the L1 constraint, imposing the L1/2 sparsity restriction on the coefficient matrix H in the objective function when decomposing the noisy speech produces a sparser H, and hence a sparser speech basis. To verify the effect of CNMF_L1/2 on sparsity, the coefficient matrix H and speech basis W of the speech signal in the enhancement stage are shown in Fig. 5; Fig. 5(a) and (b) show H and W when the speech signal is decomposed by CNMF_L1 and CNMF_L1/2, respectively, with the coefficient matrix H in the upper half of each figure and the 12 speech bases in the lower half. Fig. 5 shows that, compared with CNMF_L1, the H of CNMF_L1/2 is sparser.

Fig. 6 shows the STOI values of the six different methods under different noise environments, where UN denotes the original noisy speech. Fig. 6 shows that, compared with the two similar methods CNMF and CNMF_L1, the CNMF_L1/2 method of the invention has an advantage in STOI, indicating that using a sparser speech basis to represent clean speech carries fewer noise bases during denoising, so the enhanced speech is more intelligible. Meanwhile, compared with the classical speech enhancement methods PS, Wiener and logMMSE, the STOI of the proposed method improves markedly, with notably higher intelligibility below 0 dB. This is because the proposed method first obtains a noise dictionary through supervised dictionary learning; when decomposing the noisy speech in the enhancement stage it is more sensitive to the characteristics of the noise and insensitive to the noise energy, so more noise bases can be separated out and the intelligibility of the enhanced speech is better.

Fig. 7 shows the SegSNR improvement of the six methods under different noise environments (computed by subtracting the SegSNR of the speech before enhancement from the SegSNR after enhancement); the larger the improvement, the better the speech quality. Fig. 7 shows that, in terms of the quality of the enhanced speech, the CNMF_L1/2 method outperforms the similar CNMF and CNMF_L1 methods. Compared with the classical methods it has a slight advantage at low SNR, while at high SNR its effect is slightly inferior. This is probably because at high SNR there are fewer noise bases in the noisy speech; although the trained noise dictionary is used, some speech bases are dragged away when the noise is separated, degrading the quality of the enhanced speech.
Performance evaluation
The CNMF_L1/2 speech enhancement method needs to train on the noise in advance to obtain the noise dictionary as prior information; during noise training one iteration takes about 0.1 s. In the enhancement stage, the noisy speech is decomposed by CNMF with the L1/2 sparsity constraint; the solution uses multiplicative update rules, and one iteration takes about 0.15 s.

In addition, since CNMF_L1/2 is a dictionary-learning-based method that first obtains a noise dictionary for speech enhancement, it is more sensitive to the characteristic information of the noise and can separate the noise bases more effectively when decomposing the noisy speech. At low SNR, more noise bases can be separated out; at high SNR, some speech bases may be lost.
Conclusion
The present invention proposes a single-channel speech enhancement method based on L1/2 sparse-constrained CNMF. The CNMF_L1/2 method makes full use of the advantage of the convolutive non-negative matrix model in describing the temporal characteristics of speech signals, and applies an L1/2 regularization term as a sparsity constraint on the coefficient matrix H. Using the better-performing L1/2 sparsity constraint in the decomposition yields a sparser speech basis to represent clean speech; moreover, the supervised enhancement approach used here, with the noise basis obtained in the noise training stage, helps separate the speech signal from the noise signal more effectively in the enhancement stage. Simulation experiments under four different noise types and several SNRs demonstrate that the CNMF_L1/2 method performs well in speech enhancement. The experiments show that the method works well in low-SNR environments; future work will further study the applicability of CNMF_L1/2 under low-SNR conditions, and the empirically set sparsity coefficient λ will be studied in more depth.
The above are only preferred embodiments of the present invention and do not limit the invention in any form. Any simple modification, equivalent variation, or alteration made to the above embodiments according to the technical and methodological essence of the present invention still falls within the scope of the technical solution of the present invention.

Claims (6)

1. A method for denoising noisy speech based on L1/2 sparse-constrained convolutive non-negative matrix factorization, characterized in that: assuming the noisy speech signal v(i) is the sum of a noise signal n(i) and a speech signal s(i) that are additive and uncorrelated, i.e. v(i) = n(i) + s(i), the method for denoising noisy speech comprises the following steps:
Step 1: train on a specific noise using the CNMF method to obtain noise-basis information;
Step 2: using the noise bases as prior information, decompose the noisy speech with the CNMF_L1/2 method to obtain speech bases, and finally synthesize the denoised speech.
2. The method for denoising noisy speech according to claim 1, characterized in that step 1 specifically comprises the following steps:
Step 1.1: apply the short-time Fourier transform (STFT) to the noise to obtain its magnitude spectrum N;
Step 1.2: decompose the noise magnitude spectrum by CNMF to obtain the noise bases $W_n$ and the corresponding coefficient matrix $H_n$; the objective function of the decomposition is as follows:
$$D(V \mid \Lambda) = \frac{1}{2} \lVert V - \Lambda \rVert^{2} \qquad (1)$$
where V is the noise magnitude-spectrum matrix to be decomposed, and $\Lambda$ is the convolutive estimate of V:
$$\Lambda = \sum_{t=0}^{T_{0}-1} W_{n}(t)\, \overset{t\rightarrow}{H_{n}} \qquad (2)$$
In formula (2), $W(t)$ and $H$ denote the basis matrix for shift $t$ and the coefficient matrix, respectively, and $\overset{t\rightarrow}{(\cdot)}$ shifts the columns of a matrix $t$ steps to the right, filling the vacated columns on the left with zeros.
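The column-shift operator and the convolutive estimate of formula (2) can be sketched in a few lines of NumPy; this is an illustrative helper, not code from the patent (the function names `shift_right` and `conv_estimate` are our own):

```python
import numpy as np

def shift_right(H, t):
    """The t-> operator of formula (2): shift the columns of H t steps
    to the right, filling the vacated columns on the left with zeros."""
    out = np.zeros_like(H)
    out[:, t:] = H[:, :H.shape[1] - t]
    return out

def conv_estimate(W, H):
    """Convolutive estimate Lambda = sum_t W(t) @ (H shifted right by t),
    formula (2). W has shape (T0, M, R); H has shape (R, N)."""
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))
```

For t = 0 the shift is the identity, so a one-frame model (T0 = 1) reduces to ordinary NMF, V ≈ W(0)H.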
3. The method for denoising noisy speech according to claim 2, characterized in that: since objective function (1) in step 1.2 is convex with respect to W and to H separately, W and H can be updated alternately, and gradient descent yields the update equations:
$$W^{n}_{ik}(t) \leftarrow W^{n}_{ik}(t)\cdot\frac{\sum_{j=1}^{T}\left(V_{ij}+\bar{W}^{n}_{ik}(t)\,\bar{W}^{n}_{ki}(t)\,\Lambda_{ij}\right)\overset{t\rightarrow}{H}{}^{n}_{jk}}{\sum_{j=1}^{T}\left(\Lambda_{ij}+\bar{W}^{n}_{ik}(t)\,\bar{W}^{n}_{ki}(t)\,V_{ij}\right)\overset{t\rightarrow}{H}{}^{n}_{jk}} \qquad (3)$$

$$H^{n}_{kj} \leftarrow H^{n}_{kj}\cdot\frac{\sum_{i=1}^{M}\bar{W}^{n}_{ki}(t)\,V_{ij}}{\sum_{i=1}^{M}\bar{W}^{n}_{ki}(t)\,\overset{\leftarrow t}{\Lambda}_{ij}} \qquad (4)$$
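A minimal noise-training loop in the spirit of step 1.2 can be sketched as follows. For simplicity it uses the standard multiplicative CNMF updates rather than the patent's exact rules (3)-(4); `R`, `T0`, and `n_iter` are illustrative parameters:

```python
import numpy as np

def shift_right(H, t):
    """Shift columns of H t steps right, zero-filling on the left."""
    out = np.zeros_like(H)
    out[:, t:] = H[:, :H.shape[1] - t]
    return out

def shift_left(X, t):
    """Shift columns of X t steps left, zero-filling on the right."""
    out = np.zeros_like(X)
    out[:, :X.shape[1] - t] = X[:, t:]
    return out

def train_noise_cnmf(V, R, T0, n_iter=100, eps=1e-9):
    """Factor the noise magnitude spectrum V (M x N) into T0 basis
    matrices W[t] (M x R) and a shared coefficient matrix H (R x N)
    under the squared-error objective (1), alternating W(t) and H
    as in claim 3 (standard multiplicative CNMF updates)."""
    M, N = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((T0, M, R)) + eps
    H = rng.random((R, N)) + eps
    for _ in range(n_iter):
        Lam = sum(W[t] @ shift_right(H, t) for t in range(T0)) + eps
        for t in range(T0):                       # update each W(t), H fixed
            Ht = shift_right(H, t)
            W[t] *= (V @ Ht.T) / (Lam @ Ht.T + eps)
        Lam = sum(W[t] @ shift_right(H, t) for t in range(T0)) + eps
        num = sum(W[t].T @ shift_left(V, t) for t in range(T0))
        den = sum(W[t].T @ shift_left(Lam, t) for t in range(T0)) + eps
        H *= num / den                            # update H, all W(t) fixed
    return W, H
```

The multiplicative form keeps W and H non-negative throughout, which is why no projection step is needed.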
4. The method for denoising noisy speech according to claim 1, characterized in that step 2 specifically comprises the following steps:
Step 2.1: apply the STFT to the noisy speech to obtain, in the time-frequency domain, the following sum of non-negative matrices:

V = S + N    (5)

where V, S, and N are the magnitude-spectrum matrices of the noisy speech, the clean speech, and the noise, respectively; the phase information of the speech spectrum is obtained at the same time;
Applying convolutive non-negative matrix factorization to the right-hand side of formula (5) gives:
$$V = \sum_{t=0}^{T-1} \begin{bmatrix} W^{n}_{t} & W^{s}_{t} \end{bmatrix} \begin{bmatrix} \overset{t\rightarrow}{H^{n}} \\ \overset{t\rightarrow}{H^{s}} \end{bmatrix} = \sum_{t=0}^{T-1} \sum_{k=1}^{R} W^{n}_{ik}(t)\, \overset{t\rightarrow}{H}{}^{n}_{kj} + \sum_{t=0}^{T-1} \sum_{k=1}^{R} W^{s}_{ik}(t)\, \overset{t\rightarrow}{H}{}^{s}_{kj} \qquad (6)$$
where $W_s$ and $H_s$ denote the speech bases and the corresponding coefficient matrix, and $W_n$ and $H_n$ denote the noise bases and the corresponding coefficient matrix;
Step 2.2: with the noise bases $W_n$ obtained in step 1.2 held fixed, apply the CNMF_L1/2 decomposition to the magnitude-spectrum matrix of the noisy speech to obtain the speech bases $W_s$, the speech coefficient matrix $H_s$, and a new noise-basis coefficient matrix $\bar{H}_n$; the objective function of the CNMF_L1/2 decomposition is as follows:
$$D(V \mid W^{n}, W^{s}, H^{n}, H^{s}) = \frac{1}{2} \left\lVert V - \sum_{t=0}^{T-1} W^{n}(t)\, \overset{t\rightarrow}{H^{n}} - \sum_{t=0}^{T-1} W^{s}(t)\, \overset{t\rightarrow}{H^{s}} \right\rVert^{2} + \lambda \lVert H^{s} \rVert_{1/2}^{1/2} \qquad (7)$$
Step 2.3: synthesize the denoised speech magnitude spectrum S from the speech bases $W_s$, the speech coefficient matrix $H_s$, and the phase information obtained in step 2.2, as follows:
$$S = \sum_{t=0}^{T-1} W^{s}(t)\, \overset{t\rightarrow}{H^{s}} \qquad (8)$$
Step 2.4: apply the inverse STFT to the denoised speech magnitude spectrum S, combined with the phase information, to obtain the enhanced speech signal.
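Steps 2.3-2.4 (synthesis per formula (8), then inverse STFT using the noisy phase) can be sketched as follows; `fs` and `nperseg` are illustrative values, and `scipy.signal.istft` stands in for whatever inverse-STFT routine an implementation uses:

```python
import numpy as np
from scipy.signal import istft

def shift_right(H, t):
    """Shift columns of H t steps right, zero-filling on the left."""
    out = np.zeros_like(H)
    out[:, t:] = H[:, :H.shape[1] - t]
    return out

def synthesize_denoised(Ws, Hs, phase, fs=16000, nperseg=512):
    """Rebuild the denoised magnitude spectrum S per formula (8) from the
    speech bases Ws (shape (T, M, R)) and coefficients Hs (shape (R, N)),
    attach the noisy phase (shape (M, N)), and invert with the inverse
    STFT to get a time-domain signal."""
    T = Ws.shape[0]
    S = sum(Ws[t] @ shift_right(Hs, t) for t in range(T))
    _, x_hat = istft(S * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return S, x_hat
```

Reusing the noisy-speech phase is the usual shortcut in magnitude-domain enhancement; only the magnitude is denoised.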
5. The method for denoising noisy speech according to claim 4, characterized in that: objective function (7) in step 2.2 is solved by an alternating update scheme, namely:
Step 1: fix $W_n$ and $H_s$, and update $W_s$;
Step 2: fix $W_s$, $W_n$, and $\bar{H}_n$, and update $H_s$;
Step 3: fix $W_s$, $H_s$, and $W_n$, and update $\bar{H}_n$;
Since formula (7) is convex with respect to the variable updated in each of the above steps, gradient descent yields the update rules:
$$W^{s}_{ik}(t) \leftarrow W^{s}_{ik}(t)\cdot\frac{\sum_{j=1}^{T}\left(V_{ij}+\bar{W}^{s}_{ik}(t)\,\bar{W}^{s}_{ki}(t)\,\Lambda_{ij}\right)\overset{t\rightarrow}{H}{}^{s}_{jk}}{\sum_{j=1}^{T}\left(\Lambda_{ij}+\bar{W}^{s}_{ik}(t)\,\bar{W}^{s}_{ki}(t)\,V_{ij}\right)\overset{t\rightarrow}{H}{}^{s}_{jk}} \qquad (9)$$

$$H^{s}_{kj} \leftarrow H^{s}_{kj}\cdot\frac{\sum_{i=1}^{M}\bar{W}^{s}_{ki}(t)\,V_{ij}}{\sum_{i=1}^{M}\bar{W}^{s}_{ki}(t)\,\overset{\leftarrow t}{\Lambda}_{ij} + \frac{\lambda}{2}\left(H^{s}_{kj}\right)^{-1/2}} \qquad (10)$$

$$\bar{H}^{n}_{kj} \leftarrow \bar{H}^{n}_{kj}\cdot\frac{\sum_{i=1}^{M}\bar{W}^{n}_{ik}(t)\,V_{ij}}{\sum_{i=1}^{M}\bar{W}^{n}_{ik}(t)\,\overset{\leftarrow t}{\Lambda}_{ij}} \qquad (11)$$
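The effect of the L1/2 penalty in rule (10) — it enters only the denominator, as (λ/2)·(H^s)^(−1/2) — can be illustrated with a simplified single-shift (t = 0) update; this is a sketch of the rule's structure, not the patent's full convolutive update:

```python
import numpy as np

def update_Hs(Hs, Ws, V, Lam, lam=0.1, eps=1e-9):
    """Single multiplicative update of the speech coefficient matrix Hs,
    mimicking rule (10) with only the t = 0 basis: the data term
    W0^T V over W0^T Lam, with the L1/2 penalty (lam/2) * Hs^(-1/2)
    added to the denominator so that a larger lam shrinks Hs harder."""
    W0 = Ws[0]                                  # (M, R) basis at shift t = 0
    num = W0.T @ V                              # (R, N) numerator
    den = W0.T @ Lam + 0.5 * lam * np.power(Hs + eps, -0.5)
    return Hs * num / (den + eps)
```

Because the penalty only inflates the denominator, the update stays multiplicative and non-negative while pushing small coefficients toward zero, which is the sparsity mechanism the claim relies on.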
6. A system for denoising noisy speech based on L1/2 sparse-constrained convolutive non-negative matrix factorization, characterized by comprising:
an STFT module, for applying the short-time Fourier transform to the specific noise and the noisy speech to obtain their magnitude spectra;
a noise training module, for training on the specific noise with the CNMF method to obtain noise-basis information;
a speech decomposition module, for decomposing the noisy speech with the CNMF_L1/2 method, using the trained noise bases as prior information, to obtain speech bases;
a speech synthesis module, for synthesizing the denoised speech magnitude spectrum from the obtained speech bases and the phase information;
a spectrum conversion module, for applying the inverse STFT to the denoised speech magnitude spectrum to obtain the enhanced speech.
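Under strong simplifying assumptions (convolutive depth T = 1, so both factorizations reduce to plain multiplicative NMF, and no L1/2 penalty), the five claimed modules can be wired into a minimal end-to-end sketch; all parameter values here are illustrative, not values from the patent:

```python
import numpy as np
from scipy.signal import stft, istft

def denoise_pipeline(noise, noisy, fs=8000, nperseg=256, R=8, n_iter=60):
    """End-to-end sketch of the claimed system: STFT module -> noise
    training module -> speech decomposition module (noise bases fixed)
    -> speech synthesis module -> spectrum conversion module."""
    eps = 1e-9
    rng = np.random.default_rng(1)

    # STFT module: magnitude spectra, plus the noisy phase for resynthesis
    _, _, Zn = stft(noise, fs=fs, nperseg=nperseg)
    _, _, Zv = stft(noisy, fs=fs, nperseg=nperseg)
    Nmag, V = np.abs(Zn), np.abs(Zv)
    phase = np.angle(Zv)

    # noise training module: Nmag ~ Wn @ Hn (multiplicative NMF)
    M = Nmag.shape[0]
    Wn = rng.random((M, R)) + eps
    Hn = rng.random((R, Nmag.shape[1])) + eps
    for _ in range(n_iter):
        Wn *= (Nmag @ Hn.T) / (Wn @ Hn @ Hn.T + eps)
        Hn *= (Wn.T @ Nmag) / (Wn.T @ Wn @ Hn + eps)

    # speech decomposition module: V ~ Wn @ Hn2 + Ws @ Hs, with Wn fixed
    F = V.shape[1]
    Ws = rng.random((M, R)) + eps
    Hs = rng.random((R, F)) + eps
    Hn2 = rng.random((R, F)) + eps
    for _ in range(n_iter):
        Lam = Wn @ Hn2 + Ws @ Hs + eps
        Ws *= (V @ Hs.T) / (Lam @ Hs.T + eps)
        Lam = Wn @ Hn2 + Ws @ Hs + eps
        Hs *= (Ws.T @ V) / (Ws.T @ Lam + eps)
        Lam = Wn @ Hn2 + Ws @ Hs + eps
        Hn2 *= (Wn.T @ V) / (Wn.T @ Lam + eps)

    # speech synthesis + spectrum conversion modules
    S = Ws @ Hs
    _, x_hat = istft(S * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return x_hat
```

Keeping Wn fixed during the decomposition stage is what makes the scheme supervised: the speech bases Ws must explain whatever the pre-trained noise bases cannot.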
CN201610452012.7A (filed 2016-06-20, granted as CN105957537B): Speech denoising method and system based on L1/2 sparse-constrained convolutive non-negative matrix factorization — Active

Publications (2)

Publication Number | Publication Date
CN105957537A | 2016-09-21
CN105957537B | 2019-10-08

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant