CN106340304A - Online speech enhancement method for non-stationary noise environment - Google Patents

Online speech enhancement method for non-stationary noise environment

Info

Publication number
CN106340304A
Authority
CN
China
Prior art keywords
noise
theta
estimation
voice signal
parameter
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610843483.0A
Other languages
Chinese (zh)
Other versions
CN106340304B (en)
Inventor
冯宝
张绍荣
孙山林
郑伟
张国宁
武博
韦周耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Aerospace Technology
Original Assignee
Guilin University of Aerospace Technology
Application filed by Guilin University of Aerospace Technology
Priority to CN201610843483.0A
Publication of CN106340304A
Application granted
Publication of CN106340304B
Current legal status: Expired - Fee Related

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0264: Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention provides an online speech enhancement method for non-stationary noise environments. The method comprises the steps of (1) establishing a system model for a non-stationary noise environment, (2) framing and windowing, (3) initializing the system, (4) estimating the AR parameters, and (5) estimating the speech signal state sequence. To address the problem that the AR parameters of the speech model cannot be updated in real time as the noise changes, the invention proposes a dual Kalman filtering framework: two Kalman filters run in parallel, the speech signal state estimate and the AR parameter estimate update each other, and the state estimation and parameter estimation processes alternate, so that parameter estimation adapts to the changing noise. This improves the accuracy of the system model and hence the speech enhancement performance. To address the problem that the traditional Kalman filtering algorithm cannot handle non-stationary noise, the invention proposes an improved Kalman filtering framework based on convex optimization, with which both Gaussian noise and non-stationary noise can be accurately estimated, improving the accuracy of speech enhancement.

Description

An online speech enhancement method for non-stationary noise environments
Technical field
The present invention relates to the field of speech enhancement, and in particular to an online speech enhancement method for non-stationary noise environments.
Background
In speech recognition front-end processing, the speech signal is always disturbed and masked by various kinds of noise. Because the interference is random, signal processing can only improve speech quality as far as possible. The main purpose of speech enhancement is to extract clean original speech from noisy speech.
Common speech enhancement algorithms fall mainly into the following categories:
1. Noise cancellation: the noise component is subtracted directly from the noisy speech in the time domain or the frequency domain. The distinguishing feature of this method is that it needs the background signal as a reference signal, and the accuracy of the reference signal directly determines the performance of the method.
2. Harmonic enhancement: voiced speech is clearly periodic, and in the frequency domain this periodicity appears as a series of peaks corresponding to the fundamental frequency (pitch) and its harmonics. These frequency components carry most of the energy of the speech. The periodicity can be exploited for speech enhancement by using a comb filter to extract the pitch and its harmonic components while suppressing other periodic noise and aperiodic broadband noise.
3. Enhancement based on a speech production model: speech production can be modelled as a linear time-varying filter, with different excitation sources for different types of speech. The most widely used speech production model is the all-pole model. A family of speech enhancement algorithms, such as time-varying parameter Wiener filtering and Kalman filtering, can be derived from the speech production model.
4. Enhancement based on the short-time spectrum: there are many algorithms of this kind, such as spectral subtraction, Wiener filtering and the least mean square error (LMSE) method. These methods work over a wide range of signal-to-noise ratios, are simple, and are easy to run in real time.
5. Enhancement based on wavelet decomposition: this approach developed alongside wavelet decomposition as a mathematical analysis tool, and it also incorporates some basic principles of spectral subtraction.
6. Enhancement based on auditory masking: this approach exploits the auditory characteristics of the human ear.
Speech enhancement based on Kalman filtering belongs to the third category above. When used for speech enhancement, traditional Kalman filtering makes two important assumptions: the process noise and the measurement noise both follow a Gaussian distribution. In practical speech enhancement, traditional Kalman filtering shows two limitations. (1) The AR parameters must be estimated accurately. In a real recording environment, however, the noise changes continually, so the AR parameters of the speech model should be estimated in real time and the estimation should take the influence of the various noises into account; otherwise speech enhancement performance degrades. (2) The traditional Kalman filtering algorithm considers only Gaussian noise, which does not match practical applications. During acquisition, the speech signal can be contaminated by a non-stationary noise that is sparse and follows a Laplacian distribution. Such noise is not common, but it does occur and has a large effect on speech quality. If non-stationary noise is treated as Gaussian noise during speech enhancement, enhancement quality is seriously reduced, which hinders subsequent semantic recognition of the speech.
For these reasons, an online speech enhancement technique that can handle Gaussian noise and non-stationary noise occurring together, in real time, is much needed.
Summary of the invention
The technical problem addressed by the invention is that existing Kalman filtering methods cannot update the AR parameters of the speech model in real time and cannot handle non-stationary noise in the measurement process. Drawing on convex optimization, the invention provides an online speech enhancement method for non-stationary noise environments that estimates both the AR parameters and the non-stationary noise online.
To achieve this, the technical solution provided by the invention is an online speech enhancement method for non-stationary noise environments, comprising the following steps:
1) Establish the system model for a non-stationary noise environment
1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise are both present
Speech production is an autoregressive process in which white noise excites an all-pole linear system: the current output equals the excitation at the current time plus a weighted sum of the outputs at the previous $p$ times. This autoregressive (AR) model is expressed as:
$$ s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1} $$
where $u(k)$ is the white Gaussian excitation at time $k$; $s(k-i)$ is the speech signal at time $k-i$; $s(k)$ is the speech signal at time $k$; $a_i$ is the $i$-th linear prediction coefficient, also called an AR model parameter; and $p$ is the order of the AR model;
A speech signal model that matches the actual measurement process is established. The measurement process is described as:
$$ y(k) = s(k) + n(k) + v(k) \tag{2} $$
where $y(k)$ is the speech measurement at time $k$; $s(k)$ is the speech signal at time $k$; $n(k)$ is white Gaussian noise at time $k$; and $v(k)$ is non-stationary noise at time $k$, which follows a Laplacian distribution and is sparse;
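The two models above can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not part of the patent: the function name `simulate_noisy_speech`, the AR(2) coefficients, the noise levels and the spike probability are all hypothetical, and the sparse noise is modelled as an occasional Laplace-distributed spike.

```python
import random

def simulate_noisy_speech(a, num_samples, sigma_u=1.0, sigma_n=0.1,
                          spike_prob=0.02, spike_scale=2.0, seed=0):
    """Generate s(k) from AR model (1) and y(k) from measurement model (2)."""
    rng = random.Random(seed)
    p = len(a)
    s = [0.0] * p                      # zero initial conditions s(-p), ..., s(-1)
    y, v_all = [], []
    for _ in range(num_samples):
        u = rng.gauss(0.0, sigma_u)    # white Gaussian excitation u(k)
        # a[i - 1] multiplies s(k - i); s[-i] is the sample i steps back
        s_k = sum(a[i - 1] * s[-i] for i in range(1, p + 1)) + u
        n = rng.gauss(0.0, sigma_n)    # Gaussian measurement noise n(k)
        v = 0.0
        if rng.random() < spike_prob:  # sparse: v(k) is non-zero only occasionally
            # the difference of two exponentials is Laplace-distributed
            v = (rng.expovariate(1.0 / spike_scale)
                 - rng.expovariate(1.0 / spike_scale))
        s.append(s_k)
        y.append(s_k + n + v)
        v_all.append(v)
    return s[p:], y, v_all

# Hypothetical stable AR(2) model: s(k) = 1.2 s(k-1) - 0.5 s(k-2) + u(k)
clean, noisy, sparse = simulate_noisy_speech([1.2, -0.5], 1000)
```

Most entries of `sparse` are exactly zero, which is the sparsity property that the later optimization steps rely on.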
1.2) Establish the speech signal state-space model
Formulas (1) and (2) are converted into a state-space model, described as follows:
$$ x(k) = F\,x(k-1) + p(k) \tag{3} $$
$$ y(k) = C\,x(k) + n(k) + v(k) \tag{4} $$
where
$$ F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5} $$
$$ C = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1\,] \tag{6} $$
$$ x(k) = [\,s(k-p+1) \;\; \cdots \;\; s(k)\,]^T \tag{7} $$
In the speech signal state equation (3) and measurement equation (4), $x(k)$ is the speech signal state estimation sequence at time $k$, i.e. the optimal state estimate of the speech signal; $x(k-1)$ is the speech signal state estimation sequence at time $k-1$; $y(k)$ is the speech measurement at time $k$; $F$ is the state-transition matrix formed from the linear prediction coefficients, and its last row $[a_p(k) \; \cdots \; a_1(k)]$ is called the AR parameters; $C = [0 \; 0 \; \cdots \; 0 \; 1]$ is the measurement matrix; $p(k)$ is the state noise at time $k$, which is Gaussian; $n(k)$ is the measurement noise at time $k$, which is Gaussian; $v(k)$ is the non-stationary noise at time $k$, which follows a Laplacian distribution;
The statistical properties of the state noise $p(k)$ and the measurement noise $n(k)$ of the speech signal are:
$$ \begin{aligned} E(p(k)) &= q, & E(n(k)) &= r \\ E(p(k)\,p(j)^T) &= Q\,\delta_{kj}, & E(n(k)\,n(j)^T) &= R\,\delta_{kj} \end{aligned} \tag{8} $$
where $q$ and $r$ are the means of the noises $p(k)$ and $n(k)$; $Q$ and $R$ are the covariances of $p(k)$ and $n(k)$; and $\delta_{kj}$ is the Kronecker delta. The speech enhancement problem is to estimate the optimal speech signal $x(k)$ given the measured speech signal $y(k)$;
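The relationship between the companion matrix of formula (5) and the AR recursion of formula (1) can be checked numerically. The sketch below uses hypothetical coefficient and sample values and hypothetical helper names (`companion_matrix`, `matvec`); it shows only that the last entry of $F\,x(k-1)$ is the AR prediction of $s(k)$.

```python
def companion_matrix(a):
    """Build F of formula (5) from AR coefficients a = [a_1, ..., a_p]."""
    p = len(a)
    F = [[0.0] * p for _ in range(p)]
    for row in range(p - 1):
        F[row][row + 1] = 1.0          # shift rows: each sample moves up one slot
    F[p - 1] = list(reversed(a))       # last row is [a_p, ..., a_1]
    return F

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

a = [0.5, -0.3, 0.1]                   # hypothetical a_1, a_2, a_3
x_prev = [1.0, 2.0, 3.0]               # x(k-1) = [s(k-3), s(k-2), s(k-1)]
x_next = matvec(companion_matrix(a), x_prev)

C = [0.0, 0.0, 1.0]                    # measurement matrix of formula (6)
s_pred = sum(c * x for c, x in zip(C, x_next))

# Direct evaluation of the sum in formula (1), without the excitation u(k)
s_direct = sum(a[i - 1] * x_prev[-i] for i in range(1, len(a) + 1))
```

Here `s_pred` and `s_direct` agree up to rounding, and the first entries of `x_next` are the shifted samples $s(k-2)$ and $s(k-1)$.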
2) Framing and windowing
The speech signal is short-term stationary: it can be regarded as unchanged over 10 to 30 ms. The signal can therefore be divided into short segments for processing, which is called framing. Framing is implemented by weighting the signal with a movable window of finite length. There are usually 33 to 100 frames per second. Frames are taken by overlapping segmentation; the overlapping part of one frame and the next is called the frame shift, and the ratio of frame shift to frame length is 0 to 0.5;
3) System initialization
3.1) Initialize the parameters of the improved Kalman filter
Initialize the speech signal state estimate $\hat{x}(0|0)$ and the covariance matrix $P(0|0)$, ensuring that the covariance matrix is positive definite;
3.2) Initialize the AR parameters
Initialize the AR parameter state estimate $\hat{\theta}(0|0)$;
4) Estimate the AR parameters
The AR parameters are the last row $[a_p(k) \; \cdots \; a_1(k)]$ of the state-transition matrix $F$ in formula (3). They mainly describe the speech production process, and their accuracy directly affects the speech enhancement result. The estimation of the AR parameters jointly considers the speech signal state estimate $x(k-1)$, the state noise $q(k)$, the measurement noise $n(k)$ and the non-stationary noise $v(k)$, and sets up a new state-space model for AR parameter estimation, achieving online robust estimation of the AR parameters. The real-time estimation procedure is as follows:
4.1) Establish the parameter estimation model for the AR parameters
The AR parameter model in an environment with mixed Gaussian and non-stationary noise is described as follows:
$$ \begin{aligned} \theta(k) &= \theta(k-1) + q(k) \\ y(k) &= A\,\theta(k) + r(k) + w(k) \end{aligned} \tag{9} $$
where $\theta(k) = [a_p(k) \; \cdots \; a_1(k)]^T$ is the AR parameter state sequence at time $k$; $q(k)$ is the state noise at time $k$, which is Gaussian with covariance matrix $D(k)$; $r(k)$ is the measurement noise at time $k$, which is Gaussian with covariance matrix $L(k)$; $w(k)$ is the non-stationary noise at time $k$, which is sparse and follows a Laplacian distribution; $A = x(k-1)^T = [s(k-p) \; \cdots \; s(k-1)]$ is the measurement matrix; and $y(k)$ is the speech measurement at time $k$. The statistical properties of the state noise $q(k)$ and the measurement noise $r(k)$ are:
$$ \begin{aligned} E(q(k)) &= d, & E(r(k)) &= l \\ E(q(k)\,q(j)^T) &= D\,\delta_{kj}, & E(r(k)\,r(j)^T) &= L\,\delta_{kj} \end{aligned} \tag{10} $$
where $d$ and $l$ are the means of the noises $q(k)$ and $r(k)$; $D$ and $L$ are the covariances of $q(k)$ and $r(k)$; and $\delta_{kj}$ is the Kronecker delta;
4.2) Reformulate the traditional Kalman filtering problem from the convex optimization viewpoint
To make the sparse noise easy to estimate, the Kalman filtering problem must be reformulated from the viewpoint of convex optimization. The state-space model of traditional Kalman filtering, without the non-stationary noise $w(k)$, is:
$$ \begin{aligned} \theta(k) &= \theta(k-1) + q(k) \\ y(k) &= A\,\theta(k) + r(k) \end{aligned} \tag{11} $$
According to Bayes' rule, the AR parameter estimation problem is to estimate the optimal AR parameter sequence $\theta(k)$ given the measurement data $y(k)$, that is:
$$ p(\theta(k)\,|\,y(k)) = \frac{p(y(k)\,|\,\theta(k))\; p(\theta(k))}{p(y(k))} \tag{12} $$
According to maximum likelihood estimation theory, the likelihood functions of $p(y(k)\,|\,\theta(k))$ and $p(\theta(k))$ are set up:
$$ L_1(y(k),\theta(k)) = \frac{p(\theta(k))\,p(r(k))}{p(\theta(k))} = p(r(k)) = \frac{1}{(2\pi)^{m/2}\,|L|^{1/2}} \exp\!\left(-\tfrac{1}{2}\, r^T(k)\, L^{-1}\, r(k)\right) \tag{13} $$
$$ L_2(\theta(k)) = p(\theta(k)) = \frac{1}{(2\pi)^{n/2}\,|\Psi(k)|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(\theta(k) - \hat{\theta}(k|k-1))^T\, \Psi(k)^{-1}\, (\theta(k) - \hat{\theta}(k|k-1))\right) \tag{14} $$
where $\Psi(k)$ is the covariance matrix of the conditional probability $p(\theta(k)\,|\,y(k))$, with $\Psi(k) = P_\theta(k|k) + D(k)$, where $P_\theta(k|k)$ is the updated covariance. When the likelihood functions $L_1(y(k),\theta(k))$ and $L_2(\theta(k))$ attain their maxima, the conditional probability $p(\theta(k)\,|\,y(k))$ attains its optimal estimate. Inspecting formulas (13) and (14) shows that maximizing the likelihood functions $L_1(y(k),\theta(k))$ and $L_2(\theta(k))$ is equivalent to minimizing the exponents $r^T(k)\,L^{-1}\,r(k)$ and $(\theta(k) - \hat{\theta}(k|k-1))^T\,\Psi(k)^{-1}\,(\theta(k) - \hat{\theta}(k|k-1))$ in the likelihood functions, which gives the following optimization problem:
$$ \begin{aligned} \text{minimize} \quad & r^T(k)\, L^{-1}\, r(k) + (\theta(k) - \hat{\theta}(k|k-1))^T\, \Psi(k)^{-1}\, (\theta(k) - \hat{\theta}(k|k-1)) \\ \text{subject to} \quad & y(k) = A\,\theta(k) + r(k) \end{aligned} \tag{15} $$
where $\theta(k)$ and $r(k)$ are the optimization variables and $\Psi(k) = P_\theta(k|k) + D(k)$ is the covariance matrix of the Gaussian noise. The estimate of $\theta(k)$ is $\hat{\theta}(k|k)$, and $r(k)$ is the estimate of the Gaussian noise. $P_\theta(k|k)$ is the covariance update matrix:
$$ P_\theta(k|k) = (I - K_\theta(k)\,A)\,P_\theta(k|k-1) \tag{16} $$
$P_\theta(k|k-1)$ is the covariance prediction matrix:
$$ P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1) \tag{17} $$
$K_\theta(k)$ is the Kalman gain:
$$ K_\theta(k) = P_\theta(k|k-1)\,A^T \left(A\,P_\theta(k|k-1)\,A^T + L(k-1)\right)^{-1} \tag{18} $$
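Formulas (16) to (18) are the covariance recursion of a conventional Kalman filter whose state-transition matrix is the identity. The following is a minimal sketch of one such step for a scalar measurement, assuming the standard Kalman state update $\hat{\theta}(k|k) = \hat{\theta}(k|k-1) + K_\theta(k)\,(y(k) - A\,\hat{\theta}(k|k-1))$, which the text does not write out. The function name `ar_kalman_step` and all numeric values are hypothetical.

```python
def ar_kalman_step(theta, P, A, y, D, L):
    """One conventional Kalman step for the random-walk AR model (11).

    theta: previous estimate; P: previous covariance P_theta(k-1|k-1);
    A: row [s(k-p), ..., s(k-1)]; y: scalar measurement;
    D: state-noise covariance (p x p); L: scalar measurement-noise variance.
    """
    p = len(theta)
    # (17) prediction: the transition matrix is the identity
    P_pred = [[P[i][j] + D[i][j] for j in range(p)] for i in range(p)]
    PAt = [sum(P_pred[i][j] * A[j] for j in range(p)) for i in range(p)]
    S = sum(A[i] * PAt[i] for i in range(p)) + L      # innovation variance
    K = [v / S for v in PAt]                          # (18) gain
    innovation = y - sum(A[i] * theta[i] for i in range(p))
    theta_new = [theta[i] + K[i] * innovation for i in range(p)]
    # (16): (I - K A) P_pred, using A P_pred = PAt^T for a symmetric P_pred
    P_new = [[P_pred[i][j] - K[i] * PAt[j] for j in range(p)] for i in range(p)]
    return theta_new, P_new

# Two noise-free measurements that each expose one coefficient
theta, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
D = [[0.0, 0.0], [0.0, 0.0]]
for A, y in [([1.0, 0.0], 0.3), ([0.0, 1.0], -0.7)]:
    theta, P = ar_kalman_step(theta, P, A, y, D, L=1e-6)
# theta is now close to [0.3, -0.7]
```

With a large initial covariance and a small measurement variance, each measurement pins down the coefficient it observes, which is the behaviour expected of the recursion before any sparse-noise term is added.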
4.3) Build the optimization problem for non-stationary noise estimation from the convex optimization viewpoint
Non-stationary noise follows a Laplacian distribution and is sparse. The core idea of non-stationary noise estimation is to exploit this sparsity. Once step 4.2) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the non-stationary noise $w(k)$ can be added to the optimization to estimate the sparse noise. The new optimization problem is:
$$ \begin{aligned} \text{minimize} \quad & r^T(k)\, L^{-1}\, r(k) + (\theta(k) - \hat{\theta}(k|k-1))^T\, \Psi(k)^{-1}\, (\theta(k) - \hat{\theta}(k|k-1)) + \lambda\,\|w(k)\|_1 \\ \text{subject to} \quad & y(k) = A\,\theta(k) + r(k) + w(k) \end{aligned} \tag{19} $$
where $w(k)$ is the sparse noise and $\lambda$ weights the sparsity term. Solving this optimization problem gives the optimal estimate $\theta(k)$ of the AR parameters, that is, $\hat{\theta}(k|k)$. The optimization problem in formula (19) is a convex optimization problem and can be solved with the interior-point methods used in engineering;
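The text solves problem (19) with an interior-point method. For the scalar measurement used here, the same minimizer can also be written in closed form: eliminating $\theta(k)$ leaves a one-dimensional problem in $w(k)$ whose solution is a soft threshold of the innovation. The sketch below uses that substitution rather than an interior-point solver. The function names and numeric values are hypothetical, and the argument `Psi` is whatever weighting matrix is chosen for $\Psi(k)$.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of t * |.|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def robust_ar_update(theta_pred, Psi, A, y, L, lam):
    """Solve problem (19) for a scalar measurement y.

    Returns (theta, r, w): AR estimate, Gaussian residual, sparse noise.
    """
    p = len(theta_pred)
    PsiAt = [sum(Psi[i][j] * A[j] for j in range(p)) for i in range(p)]
    S = sum(A[i] * PsiAt[i] for i in range(p)) + L    # innovation variance
    innovation = y - sum(A[i] * theta_pred[i] for i in range(p))
    w = soft_threshold(innovation, lam * S / 2.0)     # sparse-noise estimate
    K = [v / S for v in PsiAt]                        # Kalman-type gain
    theta = [theta_pred[i] + K[i] * (innovation - w) for i in range(p)]
    r = y - sum(A[i] * theta[i] for i in range(p)) - w
    return theta, r, w

# Small innovation: no sparse noise is detected, plain Kalman update
small = robust_ar_update([0.0], [[1.0]], [1.0], 0.5, 1.0, 1.0)
# Large innovation: most of it is assigned to the sparse term w
large = robust_ar_update([0.0], [[1.0]], [1.0], 10.0, 1.0, 1.0)
```

In the second call the estimate moves only as far as it would for an innovation equal to the threshold, so a single outlier cannot drag the AR parameters far from their prediction.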
5) Estimate the speech signal state sequence
5.1) Reformulate the traditional Kalman filtering problem from the convex optimization viewpoint
To make the sparse noise easy to estimate, the Kalman filtering problem must be reformulated from the viewpoint of convex optimization. The state-space model of traditional Kalman filtering is:
$$ x(k) = F\,x(k-1) + p(k) \tag{20} $$
$$ y(k) = C\,x(k) + n(k) \tag{21} $$
According to Bayes' rule, the Kalman filtering problem is to estimate the optimal speech state sequence $x(k)$ given the measurement data $y(k)$, that is:
$$ p(x(k)\,|\,y(k)) = \frac{p(y(k)\,|\,x(k))\; p(x(k))}{p(y(k))} \tag{22} $$
According to maximum likelihood estimation theory, the likelihood functions of $p(y(k)\,|\,x(k))$ and $p(x(k))$ are set up:
$$ L_1(y(k),x(k)) = \frac{p(x(k))\,p(n(k))}{p(x(k))} = p(n(k)) = \frac{1}{(2\pi)^{m/2}\,|R|^{1/2}} \exp\!\left(-\tfrac{1}{2}\, n^T(k)\, R^{-1}\, n(k)\right) \tag{23} $$
$$ L_2(x(k)) = p(x(k)) = \frac{1}{(2\pi)^{n/2}\,|\Theta|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(x(k) - \hat{x}(k|k-1))^T\, \Theta^{-1}\, (x(k) - \hat{x}(k|k-1))\right) \tag{24} $$
where $\Theta$ is the covariance matrix of the conditional probability $p(x(k)\,|\,y(k-1))$, with $\Theta = F\,P(k-1|k-1)\,F^T + Q(k-1)$, where $P(k-1|k-1)$ is the updated covariance. When the likelihood functions $L_1(y(k),x(k))$ and $L_2(x(k))$ attain their maxima, the conditional probability $p(x(k)\,|\,y(k))$ attains its optimal estimate. Inspecting formulas (23) and (24) shows that maximizing the likelihood functions $L_1(y(k),x(k))$ and $L_2(x(k))$ is equivalent to minimizing the exponents $n^T(k)\,R^{-1}\,n(k)$ and $(x(k) - \hat{x}(k|k-1))^T\,\Theta^{-1}\,(x(k) - \hat{x}(k|k-1))$ in the likelihood functions, which gives the following optimization problem:
$$ \begin{aligned} \text{minimize} \quad & n^T(k)\, R^{-1}\, n(k) + (x(k) - \hat{x}(k|k-1))^T\, \Theta^{-1}\, (x(k) - \hat{x}(k|k-1)) \\ \text{subject to} \quad & y(k) = C\,x(k) + n(k) \end{aligned} \tag{25} $$
where $x(k)$ and $n(k)$ are the optimization variables and $\Theta$ is the covariance matrix of the Gaussian noise. The estimate of $x(k)$ is $\hat{x}(k|k)$, and $n(k)$ is the estimate of the Gaussian noise;
$P(k|k)$ is the covariance update matrix:
$$ P(k|k) = (I - K(k)\,C)\,P(k|k-1) \tag{26} $$
$P(k|k-1)$ is the covariance prediction matrix:
$$ P(k|k-1) = F(k-1)\,P(k-1|k-1)\,F(k-1)^T + Q(k-1) \tag{27} $$
$K(k)$ is the Kalman gain:
$$ K(k) = P(k|k-1)\,C^T \left(C\,P(k|k-1)\,C^T + R(k-1)\right)^{-1} \tag{28} $$
5.2) Build the estimation problem for the sparse noise from the convex optimization viewpoint
The core idea of sparse noise estimation is to exploit the sparsity of the noise. Once step 5.1) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the sparse noise $v(k)$ can be added to the optimization to estimate the sparse noise. The new optimization problem is:
$$ \begin{aligned} \text{minimize} \quad & n^T(k)\, R^{-1}\, n(k) + (x(k) - \hat{x}(k|k-1))^T\, \Theta^{-1}\, (x(k) - \hat{x}(k|k-1)) + \lambda\,\|v(k)\|_1 \\ \text{subject to} \quad & y(k) = C\,x(k) + n(k) + v(k) \end{aligned} \tag{29} $$
where $v(k)$ is the sparse noise. Solving this optimization problem gives the optimal estimate $x(k)$ of the speech signal state; $x(k)$ corresponds to the optimal state estimate $\hat{x}(k|k)$ of traditional Kalman filtering. The optimization problem in formula (29) is a convex optimization problem and can be solved with the interior-point methods used in engineering;
5.3) After the speech signal at time $k$ has been enhanced, the enhancement result $\hat{x}(k|k)$ is fed back to step 4) to update the AR parameters $\theta(k+1)$ at time $k+1$. Speech enhancement at time $k+1$ then follows, estimating $x(k+1)$, and so on until the whole speech signal has been processed.
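The alternation between steps 4) and 5) can be sketched as a single loop. This is an illustrative sketch under several assumptions that are not in the text: the scalar-measurement problems (19) and (29) are solved in closed form by soft-thresholding the innovation instead of by an interior-point method, the predicted covariances of formulas (17) and (27) are used as the weighting matrices, the state noise enters only the last state component, and one $\lambda$ is shared by both filters. The names `robust_update` and `dual_filter` are hypothetical.

```python
def robust_update(z_pred, P_pred, H, y, R, lam):
    """Closed-form solution of (19) or (29) for a scalar measurement y."""
    n = len(z_pred)
    PHt = [sum(P_pred[i][j] * H[j] for j in range(n)) for i in range(n)]
    S = sum(H[i] * PHt[i] for i in range(n)) + R       # innovation variance
    innov = y - sum(H[i] * z_pred[i] for i in range(n))
    t = lam * S / 2.0                                  # soft-threshold level
    sparse = innov - t if innov > t else innov + t if innov < -t else 0.0
    K = [v / S for v in PHt]                           # gain, as in (18)/(28)
    z = [z_pred[i] + K[i] * (innov - sparse) for i in range(n)]
    P = [[P_pred[i][j] - K[i] * PHt[j] for j in range(n)] for i in range(n)]
    return z, P, sparse


def dual_filter(y, p, Q, R, D, L, lam):
    """Alternate AR-parameter estimation (step 4) and state estimation (step 5)."""
    def eye():
        return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    x, P = [0.0] * p, eye()               # x(0|0) and positive definite P(0|0)
    theta, P_th = [0.0] * p, eye()        # theta(0|0) = [a_p, ..., a_1]
    C = [0.0] * (p - 1) + [1.0]
    enhanced = []
    for y_k in y:
        # Step 4: random-walk prediction (17), then update with A = x(k-1)^T
        P_th_pred = [[P_th[i][j] + (D if i == j else 0.0) for j in range(p)]
                     for i in range(p)]
        theta, P_th, _ = robust_update(theta, P_th_pred, x, y_k, L, lam)
        # Step 5: rebuild F from the fresh theta, predict (27), then update
        F = [[1.0 if j == i + 1 else 0.0 for j in range(p)]
             for i in range(p - 1)] + [list(theta)]
        x_pred = [sum(F[i][j] * x[j] for j in range(p)) for i in range(p)]
        FP = [[sum(F[i][m] * P[m][j] for m in range(p)) for j in range(p)]
              for i in range(p)]
        P_pred = [[sum(FP[i][m] * F[j][m] for m in range(p))
                   + (Q if i == j == p - 1 else 0.0) for j in range(p)]
                  for i in range(p)]
        x, P, _ = robust_update(x_pred, P_pred, C, y_k, R, lam)
        enhanced.append(x[-1])            # x(k|k) ends with the estimate of s(k)
    return enhanced


noisy = [0.0] * 50
noisy[25] = 100.0                         # a single isolated spike
enhanced = dual_filter(noisy, p=2, Q=1.0, R=1.0, D=1e-4, L=1.0, lam=1.0)
```

In this toy input the isolated spike is almost entirely assigned to the sparse term, so the enhanced sample at index 25 is 0.5 rather than 100.0 and the AR parameters stay at their initial values. A real signal would need Q, R, D, L and the threshold weight tuned to the data.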
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. To address the problem that the AR parameters of the speech model (in particular the autoregressive AR model) cannot be updated in real time as the noise changes, the invention proposes a dual Kalman filtering framework. Two Kalman filters run in parallel, the speech signal state estimate and the AR parameter estimate update each other, and the state estimation and parameter estimation processes alternate, so that parameter estimation adapts to the changing noise. This improves the accuracy of the system model and in turn the speech enhancement performance.
2. To address the problem that the traditional Kalman filtering algorithm cannot handle non-stationary noise, the invention proposes an improved Kalman filtering framework based on convex optimization. The new algorithm introduces both a Gaussian noise term and a non-stationary noise term into the measurement process of the speech enhancement model and builds a suitable optimization model using convex optimization, so that Gaussian noise and non-stationary noise can both be accurately estimated, improving the accuracy of speech enhancement.
Brief description of the drawings
Fig. 1 is a flow chart of the speech enhancement method under non-stationary noise.
Fig. 2a is a schematic diagram of the original speech signal.
Fig. 2b is a schematic diagram of the speech signal with white Gaussian noise.
Fig. 2c is a schematic diagram of the speech signal with white Gaussian noise and non-stationary noise.
Fig. 3 is a flow chart of the speech enhancement algorithm based on dual improved Kalman filtering.
Fig. 4a is the original speech signal.
Fig. 4b is a schematic diagram of the speech enhancement result.
Detailed description of the embodiments
The invention is further described below with reference to a specific embodiment.
As shown in Fig. 1, the online speech enhancement method for non-stationary noise environments described in this embodiment comprises the following steps:
1) Establish the system model for a non-stationary noise environment
1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise are both present
Speech production can be described as an autoregressive process in which white noise excites an all-pole linear system: the current output equals the excitation at the current time plus a weighted sum of the outputs at the previous $p$ times. This autoregressive (AR) model is expressed as:
$$ s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1} $$
where $u(k)$ is the white Gaussian excitation at time $k$; $s(k-i)$ is the speech signal at time $k-i$; $s(k)$ is the speech signal at time $k$; $a_i$ is the $i$-th linear prediction coefficient, also called an AR model parameter; and $p$ is the order of the AR model.
As shown in Figs. 2a, 2b and 2c, a speech signal observed in a real environment is contaminated by various noises, non-stationary noise in particular. The invention therefore considers Gaussian noise and non-stationary noise together in the speech measurement process, giving a speech signal model that better matches the actual measurement process. The measurement process in the invention can be described as:
$$ y(k) = s(k) + n(k) + v(k) \tag{2} $$
where $y(k)$ is the speech measurement at time $k$; $s(k)$ is the speech signal at time $k$; $n(k)$ is white Gaussian noise at time $k$; and $v(k)$ is non-stationary noise at time $k$, which follows a Laplacian distribution and is sparse.
1.2) Establish the speech signal state-space model
Formulas (1) and (2) are converted into a state-space model, which can be described as follows:
$$ x(k) = F\,x(k-1) + p(k) \tag{3} $$
$$ y(k) = C\,x(k) + n(k) + v(k) \tag{4} $$
where
$$ F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5} $$
$$ C = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1\,] \tag{6} $$
$$ x(k) = [\,s(k-p+1) \;\; \cdots \;\; s(k)\,]^T \tag{7} $$
In the speech signal state equation (3) and measurement equation (4), $x(k)$ is the speech signal state estimation sequence at time $k$, i.e. the optimal state estimate of the speech signal; $x(k-1)$ is the speech signal state estimation sequence at time $k-1$; $y(k)$ is the speech measurement at time $k$; $F$ is the state-transition matrix formed from the linear prediction coefficients, and its last row $[a_p(k) \; \cdots \; a_1(k)]$ is called the AR parameters; $C = [0 \; 0 \; \cdots \; 0 \; 1]$ is the measurement matrix; $p(k)$ is the state noise at time $k$, which is Gaussian; $n(k)$ is the measurement noise at time $k$, which is Gaussian; $v(k)$ is the non-stationary noise at time $k$, which follows a Laplacian distribution.
The statistical properties of the state noise $p(k)$ and the measurement noise $n(k)$ of the speech signal are:
$$ \begin{aligned} E(p(k)) &= q, & E(n(k)) &= r \\ E(p(k)\,p(j)^T) &= Q\,\delta_{kj}, & E(n(k)\,n(j)^T) &= R\,\delta_{kj} \end{aligned} \tag{8} $$
where $q$ and $r$ are the means of the noises $p(k)$ and $n(k)$; $Q$ and $R$ are the covariances of $p(k)$ and $n(k)$; and $\delta_{kj}$ is the Kronecker delta. The speech enhancement problem is to estimate the optimal speech signal $x(k)$ given the measured speech signal $y(k)$.
2) Framing and windowing
The speech signal is short-term stationary (it is generally regarded as approximately unchanged over 10 to 30 ms). The signal can therefore be divided into short segments for processing, which is called framing. Framing is implemented by weighting the signal with a movable window of finite length. There are typically about 33 to 100 frames per second. Frames are generally taken by overlapping segmentation; the overlapping part of one frame and the next is called the frame shift, and the ratio of frame shift to frame length is generally 0 to 0.5. In the invention, the frame length is 25 ms and the frame shift is 10 ms.
3) System initialization
3.1) Initialize the parameters of the improved Kalman filter
Initialize the speech signal state estimate $\hat{x}(0|0)$ and the covariance matrix $P(0|0)$, ensuring that the covariance matrix is positive definite.
3.2) Initialize the AR parameters
Initialize the AR parameter state estimate $\hat{\theta}(0|0)$. In the invention, the order of the AR parameters is 13 (set empirically).
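The framing step with the 25 ms frame length and 10 ms frame shift given above can be sketched as follows. Several choices here are assumptions: the text does not name a window type, so a Hamming window is used; the sample rate of 16 kHz is hypothetical; and the frame shift is taken as the step between the starts of successive frames, which is the common convention, whereas the text describes it as the overlapping part.

```python
import math

def frame_signal(signal, sample_rate, frame_ms=25.0, shift_ms=10.0):
    """Split a signal into overlapping Hamming-windowed frames."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = int(round(sample_rate * shift_ms / 1000.0))
    # Hamming window; any finite-length taper could be substituted
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    # Only full frames are kept; trailing samples that do not fill a frame are dropped
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([w * x for w, x in zip(window, chunk)])
    return frames

# One second of a constant signal at an assumed 16 kHz sample rate
frames = frame_signal([1.0] * 16000, sample_rate=16000)
```

At 16 kHz this gives frames of 400 samples starting every 160 samples, so one second of audio yields 98 full frames, close to the upper end of the 33 to 100 frames per second quoted in the text.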
4) Estimate the AR parameters
The AR parameters are the last row $[a_p(k) \; \cdots \; a_1(k)]$ of the state-transition matrix $F$ in formula (3). They mainly describe the speech production process, and their accuracy directly affects the speech enhancement result. In practical applications, AR parameter estimation is strongly affected by the speech signal itself and by the various noises. The invention therefore jointly considers the speech signal state estimate $x(k-1)$, the state noise $q(k)$, the measurement noise $n(k)$, the non-stationary noise $v(k)$ and so on in the estimation of the AR parameters, and sets up a new state-space model for AR parameter estimation, achieving online robust estimation of the AR parameters. This is one core point of the invention. As shown in Fig. 3, the real-time estimation procedure for the AR parameters is as follows:
4.1) Establish the parameter estimation model for the AR parameters
The AR parameter model in an environment with mixed Gaussian and non-stationary noise is described as follows:
$$ \begin{aligned} \theta(k) &= \theta(k-1) + q(k) \\ y(k) &= A\,\theta(k) + r(k) + w(k) \end{aligned} \tag{9} $$
where $\theta(k) = [a_p(k) \; \cdots \; a_1(k)]^T$ is the AR parameter state sequence at time $k$; $q(k)$ is the state noise at time $k$, which is Gaussian with covariance matrix $D(k)$; $r(k)$ is the measurement noise at time $k$, which is Gaussian with covariance matrix $L(k)$; $w(k)$ is the non-stationary noise at time $k$, which is sparse and follows a Laplacian distribution; $A = x(k-1)^T = [s(k-p) \; \cdots \; s(k-1)]$ is the measurement matrix; and $y(k)$ is the speech measurement at time $k$. The statistical properties of the state noise $q(k)$ and the measurement noise $r(k)$ are:
$$ \begin{aligned} E(q(k)) &= d, & E(r(k)) &= l \\ E(q(k)\,q(j)^T) &= D\,\delta_{kj}, & E(r(k)\,r(j)^T) &= L\,\delta_{kj} \end{aligned} \tag{10} $$
where $d$ and $l$ are the means of the noises $q(k)$ and $r(k)$; $D$ and $L$ are the covariances of $q(k)$ and $r(k)$; and $\delta_{kj}$ is the Kronecker delta.
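The measurement equation in (9) can be related to formulas (1) and (2) as follows. This is one reading, offered as an assumption rather than a statement from the text: it treats the past samples $s(k-i)$ as known (in practice they come from the previous state estimate) and identifies the noise terms by comparison.

$$
\begin{aligned}
y(k) &= s(k) + n(k) + v(k) \\
     &= \sum_{i=1}^{p} a_i(k)\, s(k-i) + u(k) + n(k) + v(k) \\
     &= A\,\theta(k) + \underbrace{u(k) + n(k)}_{r(k)} + \underbrace{v(k)}_{w(k)}
\end{aligned}
$$

Under this reading, the Gaussian term $r(k)$ collects the excitation and the Gaussian measurement noise, and $w(k)$ is the sparse non-stationary noise.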
4.2) Reformulate the traditional Kalman filtering problem from the convex optimization viewpoint
To make the sparse noise easy to estimate, the Kalman filtering problem must be reformulated from the viewpoint of convex optimization. The state-space model of traditional Kalman filtering (without the non-stationary noise $w(k)$) is:
$$ \begin{aligned} \theta(k) &= \theta(k-1) + q(k) \\ y(k) &= A\,\theta(k) + r(k) \end{aligned} \tag{11} $$
According to Bayes' rule, the AR parameter estimation problem can be expressed as estimating the optimal AR parameter sequence $\theta(k)$ given the measurement data $y(k)$, that is:
$$ p(\theta(k)\,|\,y(k)) = \frac{p(y(k)\,|\,\theta(k))\; p(\theta(k))}{p(y(k))} \tag{12} $$
According to maximum likelihood estimation theory, the likelihood functions of $p(y(k)\,|\,\theta(k))$ and $p(\theta(k))$ are set up:
$$ L_1(y(k),\theta(k)) = \frac{p(\theta(k))\,p(r(k))}{p(\theta(k))} = p(r(k)) = \frac{1}{(2\pi)^{m/2}\,|L|^{1/2}} \exp\!\left(-\tfrac{1}{2}\, r^T(k)\, L^{-1}\, r(k)\right) \tag{13} $$
$$ L_2(\theta(k)) = p(\theta(k)) = \frac{1}{(2\pi)^{n/2}\,|\Psi(k)|^{1/2}} \exp\!\left(-\tfrac{1}{2}\,(\theta(k) - \hat{\theta}(k|k-1))^T\, \Psi(k)^{-1}\, (\theta(k) - \hat{\theta}(k|k-1))\right) \tag{14} $$
where $\Psi(k)$ is the covariance matrix of the conditional probability $p(\theta(k) \mid y(k))$, $\Psi(k) = P_\theta(k|k) + D(k)$ (with $P_\theta(k|k)$ the updated covariance). When the likelihood functions $L_1(y(k),\theta(k))$ and $L_2(\theta(k))$ reach their maxima, the conditional probability $p(\theta(k) \mid y(k))$ attains its optimal estimate. Inspecting (13) and (14) shows that maximizing $L_1(y(k),\theta(k))$ and $L_2(\theta(k))$ is equivalent to minimizing the quadratic forms in their exponents, $r^T(k)\,L^{-1}\,r(k)$ and $(\theta(k)-\hat{\theta}(k|k-1))^T\,\Psi(k)^{-1}\,(\theta(k)-\hat{\theta}(k|k-1))$. This gives the following optimization problem:
$$\begin{aligned} \text{minimize}\quad & r^T(k)\,L^{-1}\,r(k) + (\theta(k)-\hat{\theta}(k|k-1))^T\,\Psi(k)^{-1}\,(\theta(k)-\hat{\theta}(k|k-1)) \\ \text{subject to}\quad & y(k) = A\,\theta(k) + r(k) \end{aligned} \tag{15}$$
where $\theta(k)$ and $r(k)$ are the optimization variables and $\Psi(k) = P_\theta(k|k) + D(k)$ is the covariance matrix of the Gaussian noise. The optimal $\theta(k)$ is the estimate of the AR parameters, and the optimal $r(k)$ is the estimate of the Gaussian noise. $P_\theta(k|k)$ is the covariance update matrix:
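$$P_\theta(k|k) = (I - K_\theta(k)\,A(k))\,P_\theta(k|k-1) \tag{16}$$
$P_\theta(k|k-1)$ is the covariance prediction matrix:
$$P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1) \tag{17}$$
$K_\theta(k)$ is the gain matrix:
$$K_\theta(k) = P_\theta(k|k-1)\,A^T \left(A\,P_\theta(k|k-1)\,A^T + L(k-1)\right)^{-1} \tag{18}$$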
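The recursion in (16) to (18) can be sketched in plain Python for the case of a scalar measurement y(k), so that A is a row vector and L(k-1) is a scalar variance. This is an illustrative sketch only: the function names `ar_predict_cov` and `ar_kalman_update` and the list-of-lists matrix layout are hypothetical, and the correction step theta(k|k) = theta(k|k-1) + K (y - A theta(k|k-1)) is the standard Kalman correction, which is assumed here because the text above does not write it out.

```python
def ar_predict_cov(P_prev, D):
    """Covariance prediction (17): P(k|k-1) = P(k-1|k-1) + D(k-1)."""
    p = len(P_prev)
    return [[P_prev[i][j] + D[i][j] for j in range(p)] for i in range(p)]


def ar_kalman_update(theta_pred, P_pred, a_row, y, L):
    """One measurement update for the AR parameters, following (16) and (18).

    theta_pred : predicted AR parameters, list of length p
    P_pred     : predicted covariance P_theta(k|k-1), p x p list of lists
    a_row      : measurement row A = [s(k-p), ..., s(k-1)]
    y          : scalar measurement y(k)
    L          : scalar variance of the Gaussian measurement noise r(k)
    """
    p = len(theta_pred)
    # Column vector P A^T and the scalar innovation variance A P A^T + L
    PAt = [sum(P_pred[i][j] * a_row[j] for j in range(p)) for i in range(p)]
    S = sum(a_row[i] * PAt[i] for i in range(p)) + L
    # Gain (18): K = P A^T (A P A^T + L)^-1
    K = [v / S for v in PAt]
    # Assumed standard correction with the innovation y - A theta_pred
    innov = y - sum(a_row[i] * theta_pred[i] for i in range(p))
    theta_new = [theta_pred[i] + K[i] * innov for i in range(p)]
    # Covariance update (16): P(k|k) = (I - K A) P(k|k-1)
    AP = [sum(a_row[m] * P_pred[m][j] for m in range(p)) for j in range(p)]
    P_new = [[P_pred[i][j] - K[i] * AP[j] for j in range(p)] for i in range(p)]
    return theta_new, P_new, K
```

With p = 1, a prediction of 0, unit covariance, A = [2], y = 2 and L = 1, the gain is 2/5, so the estimate moves to 0.8 and the covariance shrinks to 0.2.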
4.3) Formulate the optimization problem for non-stationary noise estimation from a convex-optimization perspective
Non-stationary noise follows a Laplacian distribution and is therefore sparse; the core idea of estimating it is to exploit this sparsity. Once step 4.2) has converted the conventional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the non-stationary noise $w(k)$ can be added to the optimization to estimate the sparse noise. The new optimization problem is:
$$\begin{aligned} \text{minimize}\quad & r^T(k)\,L^{-1}\,r(k) + (\theta(k)-\hat{\theta}(k|k-1))^T\,\Psi(k)^{-1}\,(\theta(k)-\hat{\theta}(k|k-1)) + \lambda\,\|w(k)\|_1 \\ \text{subject to}\quad & y(k) = A\,\theta(k) + r(k) + w(k) \end{aligned} \tag{19}$$
where $w(k)$ is the sparse noise. Solving this problem yields the optimal estimate $\theta(k)$ of the AR parameters. The problem in (19) is convex and can be solved with the interior-point method, which is mature in engineering practice.
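For a scalar measurement, problem (19) has a closed-form solution that makes the role of the l1 term easy to see. The sketch below is not the interior-point method named in the text; it is a swapped-in shortcut that holds only when y(k) is a scalar. Eliminating theta(k) and r(k) leaves the one-dimensional problem (e - w)^2 / S + lambda |w|, with e = y(k) - A theta(k|k-1) and S = A Psi A^T + L, whose minimizer is a soft-thresholding of e at lambda S / 2. The function names and argument layout are hypothetical, and lambda is treated as a user-chosen weight.

```python
def soft_threshold(value, tau):
    """Shrink value toward zero by tau (proximal operator of tau * |.|)."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def robust_ar_update(theta_pred, Psi, a_row, y, L, lam):
    """Solve (19) for a scalar measurement; returns (theta, r, w).

    theta_pred : prior estimate theta(k|k-1), list of length p
    Psi        : p x p covariance used in the quadratic prior term
    a_row      : measurement row A
    y, L, lam  : scalar measurement, Gaussian noise variance, l1 weight
    """
    p = len(theta_pred)
    PsiAt = [sum(Psi[i][j] * a_row[j] for j in range(p)) for i in range(p)]
    S = sum(a_row[i] * PsiAt[i] for i in range(p)) + L      # A Psi A^T + L
    e0 = y - sum(a_row[i] * theta_pred[i] for i in range(p))
    # With theta and r eliminated, (19) reduces to (e0 - w)^2 / S + lam * |w|
    w = soft_threshold(e0, lam * S / 2.0)
    # The remaining residual is split between theta and r as in a Kalman update
    resid = e0 - w
    theta = [theta_pred[i] + PsiAt[i] / S * resid for i in range(p)]
    r = y - sum(a_row[i] * theta[i] for i in range(p)) - w
    return theta, r, w
```

A small innovation passes through untouched (w = 0, ordinary Kalman behaviour), while a large one is mostly absorbed by w, so an impulsive disturbance barely moves the AR parameters.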
5) Estimate the speech signal state sequence.
During speech acquisition, non-stationary noise strongly degrades speech quality. To improve quality, a speech enhancement algorithm must cope with Gaussian noise and non-stationary noise occurring together. Non-stationary noise generally follows a Laplacian distribution and is sparse, and its estimation mainly exploits this sparsity. To introduce the noise sparsity constraint conveniently, the conventional Kalman filtering problem is first recast as a convex optimization problem; a sparsity constraint on the sparse noise is then added to the newly formed problem, which completes the speech enhancement task. This is another core point of the present invention.
5.1) Reformulate the conventional Kalman filtering problem from a convex-optimization perspective
To make the sparse noise easy to estimate, the Kalman filtering problem must first be reformulated as a convex optimization problem. The state-space model of the conventional Kalman filter is:
$$x(k) = F\,x(k-1) + p(k) \tag{20}$$
$$y(k) = C\,x(k) + n(k) \tag{21}$$
According to Bayes' rule, the Kalman filtering problem can be stated as estimating the optimal speech state sequence $x(k)$ given the measurement data $y(k)$, that is:
$$p(x(k) \mid y(k)) = \frac{p(y(k) \mid x(k))\,p(x(k))}{p(y(k))} \tag{22}$$
Following maximum likelihood estimation theory, likelihood functions are set up for $p(y(k) \mid x(k))$ and $p(x(k))$:
$$L_1(y(k),x(k)) = \frac{p(x(k))\,p(n(k))}{p(x(k))} = p(n(k)) = \frac{1}{(2\pi)^{m}|R|^{1/2}} \exp\left(-\frac{1}{2}\,n^T(k)\,R^{-1}\,n(k)\right) \tag{23}$$
$$L_2(x(k)) = p(x(k)) = \frac{1}{(2\pi)^{n}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}\,(x(k)-\hat{x}(k|k-1))^T\,\Theta^{-1}\,(x(k)-\hat{x}(k|k-1))\right) \tag{24}$$
where $\Theta$ is the covariance matrix of the conditional probability $p(x(k) \mid y(k-1))$, $\Theta = F\,P(k-1|k-1)\,F^T + Q(k-1)$ (with $P(k-1|k-1)$ the updated covariance). When the likelihood functions $L_1(y(k),x(k))$ and $L_2(x(k))$ reach their maxima, the conditional probability $p(x(k) \mid y(k))$ attains its optimal estimate. Inspecting (23) and (24) shows that maximizing $L_1(y(k),x(k))$ and $L_2(x(k))$ is equivalent to minimizing the quadratic forms in their exponents, $n^T(k)\,R^{-1}\,n(k)$ and $(x(k)-\hat{x}(k|k-1))^T\,\Theta^{-1}\,(x(k)-\hat{x}(k|k-1))$. This gives the following optimization problem:
$$\begin{aligned} \text{minimize}\quad & n^T(k)\,R^{-1}\,n(k) + (x(k)-\hat{x}(k|k-1))^T\,\Theta^{-1}\,(x(k)-\hat{x}(k|k-1)) \\ \text{subject to}\quad & y(k) = C\,x(k) + n(k) \end{aligned} \tag{25}$$
where $x(k)$ and $n(k)$ are the optimization variables and $\Theta$ is the covariance matrix of the Gaussian noise. The optimal $x(k)$ is the estimate of the speech state, and the optimal $n(k)$ is the estimate of the Gaussian noise.
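The step from the likelihoods (23) and (24) to problem (25) can be checked by taking the negative logarithm of their product. The following is a sketch of that step, under the assumption that the two likelihoods are maximized jointly through their product (which, by (22), is proportional to the posterior). Here $n(k)$ denotes the Gaussian measurement noise of (21), and $c$ collects the normalization terms, which do not depend on the optimization variables:

```latex
\begin{aligned}
-2\ln\bigl(L_1 L_2\bigr)
  &= n^T(k)\,R^{-1}\,n(k)
   + \bigl(x(k)-\hat{x}(k|k-1)\bigr)^T \Theta^{-1} \bigl(x(k)-\hat{x}(k|k-1)\bigr) + c, \\
c &= 2\ln\bigl((2\pi)^{m}|R|^{1/2}\bigr) + 2\ln\bigl((2\pi)^{n}|\Sigma|^{1/2}\bigr).
\end{aligned}
```

Because the logarithm is monotonic, maximizing $L_1 L_2$ is the same as minimizing the sum of the two quadratic forms, which is the objective of (25).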
$P(k|k)$ is the covariance update matrix:
$$P(k|k) = (I - K(k)\,C(k))\,P(k|k-1) \tag{26}$$
$P(k|k-1)$ is the covariance prediction matrix:
$$P(k|k-1) = F(k-1)\,P(k-1|k-1)\,F(k-1)^T + Q(k-1) \tag{27}$$
$K(k)$ is the gain matrix:
$$K(k) = P(k|k-1)\,C^T \left(C\,P(k|k-1)\,C^T + R(k-1)\right)^{-1} \tag{28}$$
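Equations (26) to (28), together with the model (20) and (21), amount to one predict/update cycle of a conventional Kalman filter on the speech state. A minimal plain-Python sketch follows, assuming a scalar measurement and C = [0 ... 0 1], so that C simply selects the last state entry. The helper names (`companion`, `matmul`, `transpose`, `speech_kalman_step`) are hypothetical, and the state correction with the innovation y(k) - C x(k|k-1) is the standard Kalman step, assumed here because the text gives only the covariance and gain formulas.

```python
def companion(ar_coeffs):
    """Build F from [a_1, ..., a_p]: shift rows, then last row [a_p, ..., a_1]."""
    p = len(ar_coeffs)
    F = [[1.0 if j == i + 1 else 0.0 for j in range(p)] for i in range(p - 1)]
    F.append([float(a) for a in reversed(ar_coeffs)])
    return F


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def speech_kalman_step(x_prev, P_prev, ar_coeffs, y, Q, R):
    """One predict/update cycle for x(k) = [s(k-p+1), ..., s(k)]^T.

    Q is the p x p state-noise covariance and R the scalar variance of n(k).
    """
    p = len(x_prev)
    F = companion(ar_coeffs)
    # Prediction: x(k|k-1) = F x(k-1|k-1), P(k|k-1) = F P F^T + Q   (27)
    x_pred = [sum(F[i][j] * x_prev[j] for j in range(p)) for i in range(p)]
    FPFt = matmul(matmul(F, P_prev), transpose(F))
    P_pred = [[FPFt[i][j] + Q[i][j] for j in range(p)] for i in range(p)]
    # Gain (28): with C = [0 ... 0 1], P C^T is the last column of P(k|k-1)
    S = P_pred[p - 1][p - 1] + R
    K = [P_pred[i][p - 1] / S for i in range(p)]
    # Assumed standard correction with the innovation y - C x(k|k-1)
    innov = y - x_pred[p - 1]
    x_new = [x_pred[i] + K[i] * innov for i in range(p)]
    # Covariance update (26): P(k|k) = (I - K C) P(k|k-1)
    P_new = [[P_pred[i][j] - K[i] * P_pred[p - 1][j] for j in range(p)]
             for i in range(p)]
    return x_new, P_new
```

The last entry of the returned state is the enhanced sample for time k; the earlier entries are the shifted past samples.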
5.2) Formulate the sparse-noise estimation problem from a convex-optimization perspective
The core idea of sparse-noise estimation is to exploit the sparsity of the noise. Once step 5.1) has converted the conventional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the sparse noise $v(k)$ can be added to the optimization to estimate the sparse noise. The new optimization problem is:
$$\begin{aligned} \text{minimize}\quad & n^T(k)\,R^{-1}\,n(k) + (x(k)-\hat{x}(k|k-1))^T\,\Theta^{-1}\,(x(k)-\hat{x}(k|k-1)) + \lambda\,\|v(k)\|_1 \\ \text{subject to}\quad & y(k) = C\,x(k) + n(k) + v(k) \end{aligned} \tag{29}$$
where $v(k)$ is the sparse noise. Solving this problem yields the optimal estimate $x(k)$ of the speech signal state (note: $x(k)$ corresponds to the optimal state estimate in conventional Kalman filtering). The problem in (29) is convex and can be solved with the interior-point method, which is mature in engineering practice.
5.3) After the speech signal at time k has been enhanced, the enhancement result is fed back to step 4) and used to update the AR parameters $\theta(k+1)$ for time k+1. Speech enhancement for time k+1 then proceeds by estimating $x(k+1)$, and so on until the entire speech signal has been processed.
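The alternation between step 4) and step 5) can be sketched as a sample-by-sample loop. The code below is a simplified illustration, not the patented implementation. It assumes a scalar measurement, solves both l1-regularized problems in closed form by soft-thresholding instead of the interior-point method, uses the predicted covariance of (17) in the AR prior term, uses diagonal noise covariances with the state noise entering only the newest sample, and starts from zero estimates with identity covariances. All names and parameters (`enhance`, `robust_update`, `d_var`, `l_var`, `q_var`, `r_var`, `lam`) are hypothetical.

```python
def soft_threshold(value, tau):
    """Shrink value toward zero by tau."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def robust_update(z_pred, cov, h, y, noise_var, lam):
    """Closed-form solve of the l1-regularized problem for a scalar y.

    Model: y = h . z + gaussian + sparse, with cov symmetric.
    Returns the corrected vector, the sparse-noise estimate and the
    covariance after the (I - K h) update.
    """
    n = len(z_pred)
    ch = [sum(cov[i][j] * h[j] for j in range(n)) for i in range(n)]
    S = sum(h[i] * ch[i] for i in range(n)) + noise_var
    e0 = y - sum(h[i] * z_pred[i] for i in range(n))
    sparse = soft_threshold(e0, lam * S / 2.0)
    gain = [c / S for c in ch]
    z = [z_pred[i] + gain[i] * (e0 - sparse) for i in range(n)]
    cov_new = [[cov[i][j] - gain[i] * ch[j] for j in range(n)] for i in range(n)]
    return z, sparse, cov_new


def enhance(y_seq, p, d_var, l_var, q_var, r_var, lam):
    """Alternate step 4) and step 5) for every sample; returns s_hat(k)."""
    eye = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    theta, P_theta = [0.0] * p, [row[:] for row in eye]   # [a_p, ..., a_1]
    x, P_x = [0.0] * p, [row[:] for row in eye]           # [s(k-p+1), ..., s(k)]
    enhanced = []
    for y in y_seq:
        # Step 4: random-walk prediction (17), robust AR update with A = x(k-1)^T
        P_theta_pred = [[P_theta[i][j] + (d_var if i == j else 0.0)
                         for j in range(p)] for i in range(p)]
        theta, _, P_theta = robust_update(theta, P_theta_pred, x, y, l_var, lam)
        # Step 5: rebuild F from theta, predict the state and its covariance (27)
        F = [[1.0 if j == i + 1 else 0.0 for j in range(p)] for i in range(p - 1)]
        F.append(theta[:])
        x_pred = [sum(F[i][j] * x[j] for j in range(p)) for i in range(p)]
        FP = [[sum(F[i][m] * P_x[m][j] for m in range(p)) for j in range(p)]
              for i in range(p)]
        P_pred = [[sum(FP[i][m] * F[j][m] for m in range(p)) for j in range(p)]
                  for i in range(p)]
        P_pred[p - 1][p - 1] += q_var        # excitation enters s(k) only
        C = [0.0] * (p - 1) + [1.0]
        x, _, P_x = robust_update(x_pred, P_pred, C, y, r_var, lam)
        enhanced.append(x[-1])
    return enhanced
```

For example, with p = 2, unit noise variances and lam = 1, a single impulse of height 100 in an otherwise silent input is reduced to 0.5 in the output, whereas a very large lam (which disables the sparse term) lets half of the impulse through.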
As shown in Figs. 4a and 4b, the method proposed by the present invention filters out Gaussian noise and non-stationary noise fairly accurately and enhances the original speech signal.
With the present invention, white noise and non-stationary noise can be accurately estimated and filtered out, achieving speech enhancement under mixed white and non-stationary noise while providing a cleaner speech signal estimate, which gives front-end support for improving speech recognition accuracy.
Because the present invention establishes two robust Kalman filter models and mathematically models the generation process of the speech signal, it specifically takes both the temporal characteristics and the time-varying characteristics of speech into account. The AR parameter estimate is updated iteratively in real time, which meets the requirement of time-varying parameters, and the state estimation processes the speech signal frame by frame to exploit its short-term stationarity. As a result, the filtering performance is better than that of conventional Kalman filtering, and the method is worth popularizing.
The embodiment described above is only a preferred embodiment of the present invention and does not limit its scope of implementation; all changes made according to the shape and principle of the present invention shall therefore fall within the scope of protection of the present invention.

Claims (1)

1. An online speech enhancement method suitable for non-stationary noise environments, characterized by comprising the following steps:
1) Establish the system model under a non-stationary noise environment
1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise coexist
The generation of a speech signal is an autoregressive process in which white-noise excitation is passed through an all-pole linear system; that is, the current output equals the weighted sum of the current excitation and the past p outputs. This is an autoregressive (AR) model, expressed as:
$$s(k) = \sum_{i=1}^{p} a_i\,s(k-i) + u(k) \tag{1}$$
where $u(k)$ is the white Gaussian excitation at time k; $s(k-i)$ is the speech signal at time (k-i); $s(k)$ is the speech signal at time k; $a_i$ is the i-th linear prediction coefficient, also called an AR model parameter; and p is the order of the AR model;
A speech signal model that matches the actual measurement process is established; the measurement process is described as:
$$y(k) = s(k) + n(k) + v(k) \tag{2}$$
where $y(k)$ is the speech measurement sequence at time k; $s(k)$ is the speech signal at time k; $n(k)$ is the white Gaussian noise at time k; and $v(k)$ is the non-stationary noise at time k, which follows a Laplacian distribution and is sparse;
1.2) Establish the speech signal state-space model
Equations (1) and (2) are converted into a state-space model, described as:
$$x(k) = F\,x(k-1) + p(k) \tag{3}$$
$$y(k) = C\,x(k) + n(k) + v(k) \tag{4}$$
where
$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5}$$
$$C = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1\,] \tag{6}$$
$$x(k) = [\,s(k-p+1) \;\; \cdots \;\; s(k)\,]^T \tag{7}$$
In the speech signal state equation (3) and the speech signal measurement equation (4), $x(k)$ is the speech signal state estimate sequence at time k, i.e. the optimal state estimate of the speech signal; $x(k-1)$ is the speech signal state estimate sequence at time (k-1); $y(k)$ is the speech measurement sequence at time k; $F$ is the state transition matrix formed from the linear prediction coefficients, whose last row $[a_p(k) \; \cdots \; a_1(k)]$ is called the AR parameters; $C = [0 \; 0 \; \cdots \; 0 \; 1]$ is the measurement matrix; $p(k)$ is the state noise at time k, which is Gaussian; $n(k)$ is the measurement noise at time k, which is Gaussian; and $v(k)$ is the non-stationary noise at time k, which follows a Laplacian distribution;
The statistical properties of the state noise $p(k)$ and the measurement noise $n(k)$ of the speech signal are:
$$E(p(k)) = q, \qquad E(n(k)) = r$$
$$E(p(k)\,p(j)^T) = Q\,\delta_{kj}, \qquad E(n(k)\,n(j)^T) = R\,\delta_{kj} \tag{8}$$
where $q$ and $r$ are the means of the noises $p(k)$ and $n(k)$, respectively; $Q$ and $R$ are the covariances of $p(k)$ and $n(k)$, respectively; and $\delta_{kj}$ is the Kronecker delta. The speech enhancement problem is to estimate the optimal speech signal $x(k)$ given the measured speech signal $y(k)$;
2) Framing and windowing
A speech signal is short-term stationary: it is regarded as unchanged over 10 to 30 ms, so it can be divided into short segments for processing. This is framing, which is implemented by weighting the signal with a movable window of finite length. There are typically 33 to 100 frames per second. Framing uses overlapping segmentation; the overlapping part of one frame and the next is called the frame shift, and the ratio of frame shift to frame length is 0 to 0.5;
3) System initialization
3.1) Initialize the parameters of the improved Kalman filter
Initialize the speech signal state estimate sequence $x(0|0)$ and the covariance matrix $P(0|0)$, ensuring that the covariance matrix is positive definite;
3.2) Initialize the AR parameters
Initialize the AR parameter state estimate sequence $\theta(0|0)$;
4) Estimate the AR parameters
The AR parameters are the last row $[a_p(k) \; \cdots \; a_1(k)]$ of the state transition matrix $F$ in equation (3). They mainly describe the speech production process, and their accuracy directly affects the speech enhancement result. In estimating the AR parameters, the speech signal state estimate sequence $x(k-1)$, the state noise $q(k)$, the measurement noise $n(k)$ and the non-stationary noise $v(k)$ are all taken into account, and a new AR parameter estimation state-space model is established to achieve online robust estimation of the AR parameters. The real-time estimation of the AR parameters proceeds as follows:
4.1) Establish the parameter estimation model for the AR parameters
The AR parameter model in an environment of mixed Gaussian and non-stationary noise is described as:
$$\theta(k) = \theta(k-1) + q(k)$$
$$y(k) = A\,\theta(k) + r(k) + w(k) \tag{9}$$
where $\theta(k) = [a_p(k) \; \cdots \; a_1(k)]^T$ is the AR parameter state sequence at time k; $q(k)$ is the state noise at time k, which is Gaussian with covariance matrix $Q(k)$; $r(k)$ is the measurement noise at time k, which is Gaussian with covariance matrix $R(k)$; $w(k)$ is the non-stationary noise at time k, which follows a Laplacian distribution and is sparse; $A = x(k-1)^T = [s(k-p) \; \cdots \; s(k-1)]$ is the measurement matrix; and $y(k)$ is the speech measurement sequence at time k. The statistical properties of the state noise $q(k)$ and the measurement noise $r(k)$ are:
$$E(q(k)) = d, \qquad E(r(k)) = l$$
$$E(q(k)\,q(j)^T) = D\,\delta_{kj}, \qquad E(r(k)\,r(j)^T) = L\,\delta_{kj} \tag{10}$$
where $d$ and $l$ are the means of the noises $q(k)$ and $r(k)$, respectively; $D$ and $L$ are the covariances of $q(k)$ and $r(k)$, respectively; and $\delta_{kj}$ is the Kronecker delta;
4.2) Reformulate the conventional Kalman filtering problem from a convex-optimization perspective
To make the sparse noise easy to estimate, the Kalman filtering problem must be reformulated as a convex optimization problem. The state-space model of the conventional Kalman filter, without the non-stationary noise $w(k)$, is:
$$\theta(k) = \theta(k-1) + q(k)$$
$$y(k) = A\,\theta(k) + r(k) \tag{11}$$
According to Bayes' rule, the AR parameter estimation problem is stated as estimating the optimal AR parameter sequence $\theta(k)$ given the measurement data $y(k)$, that is:
$$p(\theta(k) \mid y(k)) = \frac{p(y(k) \mid \theta(k))\,p(\theta(k))}{p(y(k))} \tag{12}$$
Following maximum likelihood estimation theory, likelihood functions are set up for $p(y(k) \mid \theta(k))$ and $p(\theta(k))$:
$$L_1(y(k),\theta(k)) = \frac{p(\theta(k))\,p(r(k))}{p(\theta(k))} = p(r(k)) = \frac{1}{(2\pi)^{m}|L|^{1/2}} \exp\left(-\frac{1}{2}\,r^T(k)\,L^{-1}\,r(k)\right) \tag{13}$$
$$L_2(\theta(k)) = p(\theta(k)) = \frac{1}{(2\pi)^{n}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}\,(\theta(k)-\hat{\theta}(k|k-1))^T\,\Psi(k)^{-1}\,(\theta(k)-\hat{\theta}(k|k-1))\right) \tag{14}$$
where $\Psi(k)$ is the covariance matrix of the conditional probability $p(\theta(k) \mid y(k))$, $\Psi(k) = P_\theta(k|k) + D(k)$, with $P_\theta(k|k)$ the updated covariance; when the likelihood functions $L_1(y(k),\theta(k))$ and $L_2(\theta(k))$ reach their maxima, the conditional probability $p(\theta(k) \mid y(k))$ attains its optimal estimate; inspecting (13) and (14) shows that maximizing $L_1(y(k),\theta(k))$ and $L_2(\theta(k))$ is equivalent to minimizing the quadratic forms in their exponents, $r^T(k)\,L^{-1}\,r(k)$ and $(\theta(k)-\hat{\theta}(k|k-1))^T\,\Psi(k)^{-1}\,(\theta(k)-\hat{\theta}(k|k-1))$, which gives the following optimization problem:
$$\begin{aligned} \text{minimize}\quad & r^T(k)\,L^{-1}\,r(k) + (\theta(k)-\hat{\theta}(k|k-1))^T\,\Psi(k)^{-1}\,(\theta(k)-\hat{\theta}(k|k-1)) \\ \text{subject to}\quad & y(k) = A\,\theta(k) + r(k) \end{aligned} \tag{15}$$
where $\theta(k)$ and $r(k)$ are the optimization variables and $\Psi(k) = P_\theta(k|k) + D(k)$ is the covariance matrix of the Gaussian noise; the optimal $\theta(k)$ is the estimate of the AR parameters, and the optimal $r(k)$ is the estimate of the Gaussian noise; $P_\theta(k|k)$ is the covariance update matrix:
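$$P_\theta(k|k) = (I - K_\theta(k)\,A(k))\,P_\theta(k|k-1) \tag{16}$$
$P_\theta(k|k-1)$ is the covariance prediction matrix:
$$P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1) \tag{17}$$
$K_\theta(k)$ is the gain matrix:
$$K_\theta(k) = P_\theta(k|k-1)\,A^T \left(A\,P_\theta(k|k-1)\,A^T + L(k-1)\right)^{-1} \tag{18}$$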
4.3) Formulate the optimization problem for non-stationary noise estimation from a convex-optimization perspective
Non-stationary noise follows a Laplacian distribution and is therefore sparse; the core idea of estimating it is to exploit this sparsity. Once step 4.2) has converted the conventional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the non-stationary noise $w(k)$ can be added to the optimization to estimate the sparse noise. The new optimization problem is:
$$\begin{aligned} \text{minimize}\quad & r^T(k)\,L^{-1}\,r(k) + (\theta(k)-\hat{\theta}(k|k-1))^T\,\Psi(k)^{-1}\,(\theta(k)-\hat{\theta}(k|k-1)) + \lambda\,\|w(k)\|_1 \\ \text{subject to}\quad & y(k) = A\,\theta(k) + r(k) + w(k) \end{aligned} \tag{19}$$
where $w(k)$ is the sparse noise; solving this problem yields the optimal estimate $\theta(k)$ of the AR parameters; the problem in (19) is convex and can be solved with the interior-point method used in engineering;
5) Estimate the speech signal state sequence
5.1) Reformulate the conventional Kalman filtering problem from a convex-optimization perspective
To make the sparse noise easy to estimate, the Kalman filtering problem must be reformulated as a convex optimization problem. The state-space model of the conventional Kalman filter is:
$$x(k) = F\,x(k-1) + p(k) \tag{20}$$
$$y(k) = C\,x(k) + n(k) \tag{21}$$
According to Bayes' rule, the Kalman filtering problem is stated as estimating the optimal speech state sequence $x(k)$ given the measurement data $y(k)$, that is:
$$p(x(k) \mid y(k)) = \frac{p(y(k) \mid x(k))\,p(x(k))}{p(y(k))} \tag{22}$$
Following maximum likelihood estimation theory, likelihood functions are set up for $p(y(k) \mid x(k))$ and $p(x(k))$:
$$L_1(y(k),x(k)) = \frac{p(x(k))\,p(n(k))}{p(x(k))} = p(n(k)) = \frac{1}{(2\pi)^{m}|R|^{1/2}} \exp\left(-\frac{1}{2}\,n^T(k)\,R^{-1}\,n(k)\right) \tag{23}$$
$$L_2(x(k)) = p(x(k)) = \frac{1}{(2\pi)^{n}|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}\,(x(k)-\hat{x}(k|k-1))^T\,\Theta^{-1}\,(x(k)-\hat{x}(k|k-1))\right) \tag{24}$$
where $\Theta$ is the covariance matrix of the conditional probability $p(x(k) \mid y(k-1))$, $\Theta = F\,P(k-1|k-1)\,F^T + Q(k-1)$, with $P(k-1|k-1)$ the updated covariance; when the likelihood functions $L_1(y(k),x(k))$ and $L_2(x(k))$ reach their maxima, the conditional probability $p(x(k) \mid y(k))$ attains its optimal estimate; inspecting (23) and (24) shows that maximizing $L_1(y(k),x(k))$ and $L_2(x(k))$ is equivalent to minimizing the quadratic forms in their exponents, $n^T(k)\,R^{-1}\,n(k)$ and $(x(k)-\hat{x}(k|k-1))^T\,\Theta^{-1}\,(x(k)-\hat{x}(k|k-1))$, which gives the following optimization problem:
$$\begin{aligned} \text{minimize}\quad & n^T(k)\,R^{-1}\,n(k) + (x(k)-\hat{x}(k|k-1))^T\,\Theta^{-1}\,(x(k)-\hat{x}(k|k-1)) \\ \text{subject to}\quad & y(k) = C\,x(k) + n(k) \end{aligned} \tag{25}$$
where $x(k)$ and $n(k)$ are the optimization variables and $\Theta$ is the covariance matrix of the Gaussian noise; the optimal $x(k)$ is the estimate of the speech state, and the optimal $n(k)$ is the estimate of the Gaussian noise;
$P(k|k)$ is the covariance update matrix:
$$P(k|k) = (I - K(k)\,C(k))\,P(k|k-1) \tag{26}$$
$P(k|k-1)$ is the covariance prediction matrix:
$$P(k|k-1) = F(k-1)\,P(k-1|k-1)\,F(k-1)^T + Q(k-1) \tag{27}$$
$K(k)$ is the gain matrix:
$$K(k) = P(k|k-1)\,C^T \left(C\,P(k|k-1)\,C^T + R(k-1)\right)^{-1} \tag{28}$$
5.2) Formulate the sparse-noise estimation problem from a convex-optimization perspective
The core idea of sparse-noise estimation is to exploit the sparsity of the noise. Once step 5.1) has converted the conventional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the sparse noise $v(k)$ can be added to the optimization to estimate the sparse noise. The new optimization problem is:
$$\begin{aligned} \text{minimize}\quad & n^T(k)\,R^{-1}\,n(k) + (x(k)-\hat{x}(k|k-1))^T\,\Theta^{-1}\,(x(k)-\hat{x}(k|k-1)) + \lambda\,\|v(k)\|_1 \\ \text{subject to}\quad & y(k) = C\,x(k) + n(k) + v(k) \end{aligned} \tag{29}$$
where $v(k)$ is the sparse noise; solving this problem yields the optimal estimate $x(k)$ of the speech signal state, where $x(k)$ corresponds to the optimal state estimate in conventional Kalman filtering; the problem in (29) is convex and can be solved with the interior-point method used in engineering;
5.3) After the speech signal at time k has been enhanced, the enhancement result is fed back to step 4) and used to update the AR parameters $\theta(k+1)$ for time k+1; speech enhancement for time k+1 then proceeds by estimating $x(k+1)$, and so on until the entire speech signal has been processed.
CN201610843483.0A 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise Expired - Fee Related CN106340304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610843483.0A CN106340304B (en) 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise

Publications (2)

Publication Number Publication Date
CN106340304A true CN106340304A (en) 2017-01-18
CN106340304B CN106340304B (en) 2019-09-06

Family

ID=57840174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610843483.0A Expired - Fee Related CN106340304B (en) 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise

Country Status (1)

Country Link
CN (1) CN106340304B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110305345A1 (en) * 2009-02-03 2011-12-15 University Of Ottawa Method and system for a multi-microphone noise reduction
CN102890935A (en) * 2012-10-22 2013-01-23 北京工业大学 Robust speech enhancement method based on fast Kalman filtering
CN103323815A (en) * 2013-03-05 2013-09-25 上海交通大学 Underwater acoustic locating method based on equivalent sound velocity
CN103903630A (en) * 2014-03-18 2014-07-02 北京捷通华声语音技术有限公司 Method and device used for eliminating sparse noise

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
冯宝: "基于凸优化技术的改进型卡尔曼滤波算法", 《自动化与信息工程》 *
吴飞: "一种具有在线参数调整功能的Kalman滤波及其应用", 《计算机工程与科学》 *
吴飞: "鲁棒卡尔曼算法及其应用研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110248212A (en) * 2019-05-27 2019-09-17 上海交通大学 360 degree of video stream server end code rate adaptive transmission methods of multi-user and system
CN110248212B (en) * 2019-05-27 2020-06-02 上海交通大学 Multi-user 360-degree video stream server-side code rate self-adaptive transmission method and system
CN110648680A (en) * 2019-09-23 2020-01-03 腾讯科技(深圳)有限公司 Voice data processing method and device, electronic equipment and readable storage medium
CN110648680B (en) * 2019-09-23 2024-05-14 腾讯科技(深圳)有限公司 Voice data processing method and device, electronic equipment and readable storage medium
CN112557925A (en) * 2020-11-11 2021-03-26 国联汽车动力电池研究院有限责任公司 Lithium ion battery SOC estimation method and device
CN112557925B (en) * 2020-11-11 2023-05-05 国联汽车动力电池研究院有限责任公司 Lithium ion battery SOC estimation method and device

Also Published As

Publication number Publication date
CN106340304B (en) 2019-09-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190906