CN106340304B - An online speech enhancement method suitable for non-stationary noise environments

Info

Publication number: CN106340304B
Application number: CN201610843483.0A
Authority: CN
Inventors: 冯宝; 张绍荣; 孙山林; 郑伟; 张国宁; 武博; 韦周耀
Original assignee: Guilin University of Aerospace Technology
Current assignee: Guilin University of Aerospace Technology
Priority date: 2016-09-23
Filing date: 2016-09-23
Publication date: 2019-09-06
Anticipated expiration: 2036-09-23
Also published as: CN106340304A

Abstract

The invention discloses an online speech enhancement method suitable for non-stationary noise environments, comprising the steps of: 1) establishing the system model under a non-stationary noise environment; 2) framing and windowing; 3) system initialization; 4) estimating the AR parameters; 5) estimating the speech signal state sequence. To address the problem that the AR parameters of the speech model cannot be updated in real time as the noise changes, the invention proposes a dual Kalman filtering framework in which two Kalman filters run in parallel: speech signal state estimation and AR parameter estimation update each other, and the state estimation and parameter estimation processes alternate, so that parameter estimation can follow changes in the noise. This improves the accuracy of the system model and, in turn, the performance of speech enhancement. To address the problem that the traditional Kalman filtering algorithm cannot handle non-stationary noise, the invention combines Kalman filtering with convex optimization and proposes an improved Kalman filtering framework that can accurately estimate both Gaussian noise and non-stationary noise, improving the accuracy of speech enhancement.

Description

An online speech enhancement method suitable for non-stationary noise environments

Technical field

The present invention relates to the field of speech enhancement, and in particular to an online speech enhancement method suitable for non-stationary noise environments.

Background art

In the front-end processing for speech recognition, speech signals are always corrupted and masked by various kinds of noise. Because the interference is random, signal processing techniques can only improve speech quality as far as possible. The main purpose of speech enhancement is to extract clean original speech from noisy speech.

Common speech enhancement algorithms mainly fall into the following types:

1. Noise cancellation: this method directly subtracts the noise component from the noisy speech, in either the time domain or the frequency domain. Its defining feature is that it needs the background signal as a reference signal; whether the reference signal is accurate directly determines the performance of the method.

2. Harmonic enhancement: voiced speech is clearly periodic, and in the frequency domain this periodicity appears as a series of peaks corresponding to the fundamental frequency (pitch) and its harmonics. These frequency components carry most of the speech energy, so the periodicity can be used for speech enhancement: a comb filter extracts the pitch and its harmonic components and suppresses other periodic noise and aperiodic broadband noise.

3. Enhancement based on a speech production model: the speech production process can be modeled as a linear time-varying filter, with different excitation sources for different types of speech. The most widely used speech production model is the all-pole model. A series of speech enhancement algorithms can be derived from speech production models, such as time-varying-parameter Wiener filtering and Kalman filtering.

4. Enhancement based on the short-time spectrum: there are many algorithms of this type, such as spectral subtraction, Wiener filtering, and the minimum mean square error method. These methods adapt to a wide range of signal-to-noise ratios, are simple, and are easy to run in real time.

5. Enhancement based on wavelet decomposition: this approach grew out of the development of wavelet decomposition as a mathematical analysis tool, and it also incorporates some basic principles of spectral subtraction.

6. Enhancement based on auditory masking: this approach exploits the auditory characteristics of the human ear.

Speech enhancement based on Kalman filtering belongs to the third type above. Traditional Kalman filtering makes two important assumptions for speech enhancement: the process noise and the measurement noise both follow Gaussian distributions. In practical speech enhancement, traditional Kalman filtering shows two limitations. (1) The AR parameter estimates must be accurate. In a real recording environment, however, the noise changes continuously, so AR parameter estimation in the speech model must run in real time and must account for the influence of the various noise sources; otherwise speech enhancement performance degrades. (2) The traditional Kalman filtering algorithm considers only Gaussian noise, which does not match practical conditions. During speech acquisition the signal can be contaminated by a non-stationary noise that is sparse and follows a Laplacian distribution. Such noise is uncommon, but it does occur and strongly affects speech quality. If non-stationary noise is treated as Gaussian noise during enhancement, the enhancement quality drops severely, which hinders subsequent recognition of the speech content.

In view of the above problems, an online speech enhancement technique that can handle, in real time, the simultaneous presence of Gaussian noise and non-stationary noise is highly desirable.

Summary of the invention

The technical problem to be solved by the invention is that existing Kalman filtering methods can neither update the AR parameters of the speech model in real time nor handle non-stationary noise in the measurement process. Combining Kalman filtering with convex optimization, the invention provides an online speech enhancement method suitable for non-stationary noise environments that can estimate the AR parameters and the non-stationary noise online.

To achieve the above object, the technical solution provided by the invention is an online speech enhancement method suitable for non-stationary noise environments, comprising the following steps:

1) Establish the system model under a non-stationary noise environment

1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise coexist

Speech production is an autoregressive process: white noise excites an all-pole linear system, and the current output equals the excitation at the current time plus a weighted sum of the previous p outputs. This is an autoregressive (AR) model, expressed as follows:

$$s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1}$$

where u(k) is the white Gaussian noise excitation at time k; s(k-i) is the speech signal at time (k-i); s(k) is the speech signal at time k; a_i is the i-th linear prediction coefficient, also called an AR model parameter; and p is the order of the AR model;

A speech signal model that matches the actual measurement process is established. The speech signal measurement process is described as follows:

Y(k) = s(k) + n(k) + v(k)    (2)

where Y(k) is the speech signal measurement sequence at time k; s(k) is the speech signal at time k; n(k) is the white Gaussian noise at time k; and v(k) is the non-stationary noise at time k, which follows a Laplacian distribution and is sparse;
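To make the roles of u(k), n(k), and v(k) concrete, the following is a minimal sketch that simulates formulas (1) and (2) with the Python standard library. The function name, the noise levels, and the Bernoulli gate used to make v(k) sparse are assumptions of this sketch, not values given in the source.

```python
import random

def simulate_measurement(a, n_samples, sigma_u=1.0, sigma_n=0.1,
                         spike_prob=0.02, spike_scale=2.0, seed=0):
    """Generate clean AR speech s(k) and the measurement Y(k) = s(k) + n(k) + v(k).

    a = [a_1, ..., a_p] are the AR coefficients of formula (1).
    spike_prob and spike_scale are illustrative assumptions.
    """
    rng = random.Random(seed)
    p = len(a)
    s = [0.0] * p                       # zero initial history s(-p) ... s(-1)
    y = []
    for _ in range(n_samples):
        u = rng.gauss(0.0, sigma_u)     # white Gaussian excitation u(k)
        s_k = sum(a[i] * s[-1 - i] for i in range(p)) + u    # formula (1)
        s.append(s_k)
        n_k = rng.gauss(0.0, sigma_n)   # Gaussian measurement noise n(k)
        v_k = 0.0
        if rng.random() < spike_prob:   # v(k) is zero most of the time (sparse)
            # Laplacian sample: exponential magnitude with a random sign
            v_k = rng.expovariate(1.0 / spike_scale) * rng.choice((-1.0, 1.0))
        y.append(s_k + n_k + v_k)       # formula (2)
    return s[p:], y

# Hypothetical stable AR(2) coefficients a_1 = 0.5, a_2 = -0.3
clean, noisy = simulate_measurement([0.5, -0.3], n_samples=200)
```

Fixing the seed makes the run repeatable, which is convenient when comparing enhancement results across parameter settings.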

1.2) Establish the speech signal state-space model

Formulas (1) and (2) are converted into a state-space model, described as follows:

X(k) = F X(k-1) + p(k)    (3)

Y(k) = C X(k) + n(k) + v(k)    (4)

where

$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5}$$

C = [0 0 ... 0 1]    (6)

X(k) = [s(k-p+1) ... s(k)]^T    (7)

In the speech signal state equation (3) and the speech signal measurement equation (4), X(k) is the speech signal state estimate sequence at time k, i.e. the optimal state estimate of the speech signal; X(k-1) is the speech signal state estimate sequence at time (k-1); Y(k) is the speech signal measurement sequence at time k; F is the state transition matrix built from the linear prediction coefficients, and its last row [a_p(k) … a₁(k)] is called the AR parameters; C = [0 0 ... 0 1] is the measurement matrix; p(k) is the state noise at time k, which follows a Gaussian distribution; n(k) is the measurement noise at time k, which follows a Gaussian distribution; v(k) is the non-stationary noise at time k, which follows a Laplacian distribution;

The statistical properties of the state noise p(k) and the measurement noise n(k) of the speech signal are:

E(p(k)) = q, E(n(k)) = r

E(p(k)p(j)^T) = Q δ_kj, E(n(k)n(j)^T) = R δ_kj    (8)

where q and r are the means of the noises p(k) and n(k), respectively; Q and R are the covariances of p(k) and n(k), respectively; δ_kj is the Kronecker delta; the speech enhancement problem is to estimate the optimal speech signal X(k) given the measured speech signal Y(k);
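As an illustration of how formulas (3), (6), and (7) fit together, the sketch below builds a companion-form transition matrix whose last row is [a_p … a₁] and checks that one noise-free state update reproduces the AR recursion. The helper names and the AR(2) coefficients are hypothetical.

```python
def companion_matrix(a):
    """Build the p x p state transition matrix F from a = [a_1, ..., a_p].

    Rows 1..p-1 shift the state up by one sample; the last row is
    [a_p, ..., a_1], matching X(k) = [s(k-p+1), ..., s(k)]^T.
    """
    p = len(a)
    F = [[1.0 if j == i + 1 else 0.0 for j in range(p)] for i in range(p - 1)]
    F.append([float(c) for c in reversed(a)])
    return F

def mat_vec(M, x):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

a = [0.5, -0.25]              # hypothetical a_1, a_2
F = companion_matrix(a)       # [[0, 1], [-0.25, 0.5]]
C = [0.0, 1.0]                # measurement row of formula (6)
X_prev = [1.0, 2.0]           # [s(k-2), s(k-1)]
X_next = mat_vec(F, X_prev)   # noise-free [s(k-1), s(k)]
s_k = sum(c * x for c, x in zip(C, X_next))   # C X(k) picks out s(k)
```

Here s(k) = a₁·s(k-1) + a₂·s(k-2) = 0.5·2 - 0.25·1 = 0.75, which is what the last state component holds.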

2) Framing and windowing

Speech is short-term stationary: within 10 to 30 ms the speech signal is regarded as unchanged, so it can be divided into short segments for processing. This is framing, and it is implemented by weighting the signal with a movable finite-length window. There are usually 33 to 100 frames per second. Framing uses overlapping segmentation; the overlapping part of the previous frame and the next frame is called the frame shift, and the ratio of frame shift to frame length is 0 to 0.5;
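The framing step can be sketched as follows. This is a minimal illustration, not the patented implementation: the Hamming window is an assumed choice (the source only specifies a movable finite-length window), and `frame_shift` is taken here as the step between consecutive frame starts, which is the common convention.

```python
import math

def frame_signal(x, frame_len, frame_shift):
    """Split x into overlapping frames and weight each with a Hamming window."""
    if frame_len < 2 or frame_shift <= 0:
        raise ValueError("frame_len must be >= 2 and frame_shift positive")
    # Hamming window coefficients (assumed window type)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    start = 0
    while start + frame_len <= len(x):          # drop the incomplete tail
        frames.append([x[start + n] * window[n] for n in range(frame_len)])
        start += frame_shift
    return frames

# Ten constant samples, frames of 4 samples starting every 2 samples
frames = frame_signal([1.0] * 10, frame_len=4, frame_shift=2)
```

With these toy values the frames start at samples 0, 2, 4, and 6, so four frames are produced.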

3) System initialization

3.1) Parameter initialization of the improved Kalman filter

Initialize the speech signal state estimate sequence X(0|0) and the covariance matrix P(0|0), ensuring that the covariance matrix is positive definite;

3.2) AR parameter initialization

Initialize the AR parameter state estimate sequence θ(0|0);

4) Estimate the AR parameters

The AR parameters are the last row [a_p(k) … a₁(k)] of the state transition matrix F in formula (3). They describe the speech production process, and their accuracy directly affects the speech enhancement result. The AR parameter estimation jointly considers the speech signal state estimate sequence X(k-1), the state noise q(k), the measurement noise n(k), and the non-stationary noise v(k), and establishes a new state-space model for AR parameter estimation that achieves online robust estimation of the AR parameters. The real-time estimation of the AR parameters proceeds as follows:

4.1) Establish the parameter estimation model of the AR parameters

The AR parameter model in an environment with mixed Gaussian noise and non-stationary noise is described as follows:

θ(k) = θ(k-1) + q(k)

Y(k) = A θ(k) + r(k) + v(k)    (9)

where θ(k) = [a_p(k) … a₁(k)]^T is the AR parameter state sequence at time k; q(k) is the state noise at time k, which follows a Gaussian distribution with covariance matrix Q(k); r(k) is the measurement noise at time k, which follows a Gaussian distribution with covariance matrix R(k); v(k) is the non-stationary noise at time k, which follows a Laplacian distribution, is sparse, and has covariance matrix W(k); A = X(k-1)^T = [s(k-p) ... s(k-1)] is the measurement matrix; Y(k) is the speech signal measurement sequence at time k; the statistical properties of the state noise q(k) and the measurement noise r(k) are:

E(q(k)) = d, E(r(k)) = l

E(q(k)q(j)^T) = D δ_kj, E(r(k)r(j)^T) = L δ_kj    (10)

where d and l are the means of the noises q(k) and r(k), respectively; D and L are the covariances of q(k) and r(k), respectively; δ_kj is the Kronecker delta;

4.2) Reformulate the traditional Kalman filtering problem from the convex optimization perspective

To make the sparse noise easy to estimate, the Kalman filtering problem is reformulated from the perspective of convex optimization. The state-space model of traditional Kalman filtering does not contain the non-stationary noise v(k):

θ(k) = θ(k-1) + q(k)

Y(k) = A θ(k) + r(k)    (11)

According to the Bayesian principle, the AR parameter estimation problem is expressed as estimating the optimal AR parameter sequence θ(k) given the measurement data Y(k), namely:

According to maximum likelihood estimation theory, the likelihood functions of p(Y(k) | θ(k)) and p(θ(k)) are established:

where Ψ is the covariance matrix of the conditional probability p(θ(k) | Y(k)) with the previous estimate known, Ψ(k) = P_θ(k|k) + D(k), and P_θ(k|k) is the updated covariance. When the likelihood functions L₁(Y(k), θ(k)) and L₂(θ(k)) reach their maximum, the conditional probability p(θ(k) | Y(k)) attains its optimal estimate. Inspection of formulas (13) and (14) shows that maximizing L₁(Y(k), θ(k)) and L₂(θ(k)) is equivalent to minimizing the quadratic forms in the exponents of the two likelihood functions, which gives the following optimization form:

subject to Y(k) = A θ(k) + r(k)    (15)

where θ(k) and r(k) are the variables and Ψ(k) = P_θ(k|k) + D(k) is the covariance matrix of the Gaussian noise; the solution for θ(k) is the estimate of the AR parameters, and r(k) is the estimate of the Gaussian noise; P_θ(k|k) is the covariance update matrix:

P_θ(k|k) = (I - K_θ(k) A(k)) P_θ(k|k-1)    (16)

P_θ(k|k-1) is the covariance prediction matrix:

P_θ(k|k-1) = P_θ(k-1|k-1) + D(k-1)    (17)

K_θ(k) is the Kalman gain:

K_θ(k) = P_θ(k|k-1) A^T (A P_θ(k|k-1) A^T + L(k-1))^-1    (18)
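Formulas (16) to (18) can be sketched as one step of the conventional (Gaussian-only) filter for model (11). Because Y(k) is a scalar, the matrix inverse in (18) reduces to a division. The function name is hypothetical, and the parameter update line uses the standard Kalman correction, which the text above does not write out explicitly.

```python
def ar_kalman_step(theta, P, A, y, D, L):
    """One conventional Kalman step for the AR parameters (scalar measurement).

    theta: previous estimate, list of length p
    P, D:  p x p covariance matrices as lists of lists
    A:     measurement row [s(k-p), ..., s(k-1)]
    y:     scalar measurement Y(k);  L: scalar measurement noise variance
    """
    p = len(theta)
    # Formula (17): P(k|k-1) = P(k-1|k-1) + D
    P_pred = [[P[i][j] + D[i][j] for j in range(p)] for i in range(p)]
    # P(k|k-1) A^T and the innovation variance A P A^T + L
    PAt = [sum(P_pred[i][j] * A[j] for j in range(p)) for i in range(p)]
    S = sum(A[i] * PAt[i] for i in range(p)) + L
    K = [g / S for g in PAt]                          # formula (18)
    # Standard correction (assumed): theta + K (Y - A theta)
    innovation = y - sum(A[i] * theta[i] for i in range(p))
    theta_new = [theta[i] + K[i] * innovation for i in range(p)]
    # Formula (16): P(k|k) = (I - K A) P(k|k-1) = P_pred - K (A P_pred)
    AP = [sum(A[i] * P_pred[i][j] for i in range(p)) for j in range(p)]
    P_new = [[P_pred[i][j] - K[i] * AP[j] for j in range(p)] for i in range(p)]
    return theta_new, P_new

# Toy first-order example with hypothetical numbers
theta1, P1 = ar_kalman_step([0.0], [[1.0]], [2.0], 2.0, [[0.0]], 1.0)
```

In the toy call the gain is 2/5 = 0.4, so the estimate moves from 0 to 0.8 and the variance drops from 1 to 0.2.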

4.3) Construct the optimization problem for estimating the non-stationary noise from the convex optimization perspective

Non-stationary noise follows a Laplacian distribution and is sparse, and the core idea of estimating it is to exploit this sparsity. After step 4.2) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the non-stationary noise v(k) can be added to the optimization to complete the estimation of the sparse noise. The new optimization form is:

where v(k) is the sparse noise. Solving the above optimization problem yields the optimal estimate θ(k) of the AR parameters. The optimization problem expressed by formula (19) is a convex optimization problem and can be solved with the interior-point method used in engineering;
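The objective of the sparse-noise problem is not reproduced in this text, so the sketch below rests on an assumption: that it has the usual robust Kalman form, r(k)²/L + (θ - θ_pred)ᵀ Ψ⁻¹ (θ - θ_pred) + λ|v(k)|, subject to Y(k) = Aθ(k) + r(k) + v(k), where the weight λ and the name θ_pred (the predicted parameter vector) are introduced here for illustration. Under that assumption, and because Y(k) is scalar, the problem has a closed-form solution by soft-thresholding the innovation, which this sketch uses in place of the interior-point solver mentioned above.

```python
import math

def robust_ar_update(theta_pred, Psi, A, y, L, lam):
    """Closed-form solution of the assumed l1-penalized update (scalar Y).

    Returns (theta, r, v): parameter estimate, Gaussian noise estimate,
    and sparse noise estimate. lam is an assumed sparsity weight.
    """
    p = len(theta_pred)
    PsiAt = [sum(Psi[i][j] * A[j] for j in range(p)) for i in range(p)]
    S = sum(A[i] * PsiAt[i] for i in range(p)) + L     # innovation variance
    e = y - sum(A[i] * theta_pred[i] for i in range(p))  # innovation
    # Minimizing (e - v)^2 / S + lam * |v| over v is a soft threshold
    v = math.copysign(max(abs(e) - lam * S / 2.0, 0.0), e)
    K = [g / S for g in PsiAt]
    theta = [theta_pred[i] + K[i] * (e - v) for i in range(p)]
    r = (e - v) * L / S          # what remains for the Gaussian term
    return theta, r, v

# A large outlier is absorbed mostly by v, not by theta
theta_r, r_r, v_r = robust_ar_update([0.5], [[1.0]], [1.0], 10.5, 1.0, 1.0)
```

In this toy call the innovation is 10 and the threshold is λS/2 = 1, so v takes 9 of it and θ moves only from 0.5 to 1.0; a plain Kalman step would have moved θ to 5.5. For vector-valued measurements or additional constraints a general convex solver would be needed instead.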

5) Estimate the speech signal state sequence

5.1) Reformulate the traditional Kalman filtering problem from the convex optimization perspective

To make the sparse noise easy to estimate, the Kalman filtering problem is reformulated from the perspective of convex optimization. The state-space model of traditional Kalman filtering is as follows:

X(k) = F X(k-1) + p(k)    (20)

Y(k) = C X(k) + n(k)    (21)

According to the Bayesian principle, the Kalman filtering problem is expressed as estimating the optimal speech state sequence X(k) given the measurement data Y(k), namely:

According to maximum likelihood estimation theory, the likelihood functions of p(Y(k) | X(k)) and p(X(k)) are established:

where Θ is the covariance matrix of the conditional probability p(X(k) | Y(k-1)) with the previous estimate known, Θ = F P(k-1|k-1) F^T + Q(k-1), and P(k-1|k-1) is the updated covariance. When the likelihood functions L₁(Y(k), X(k)) and L₂(X(k)) reach their maximum, the conditional probability p(X(k) | Y(k)) attains its optimal estimate. Inspection of formulas (23) and (24) shows that maximizing L₁(Y(k), X(k)) and L₂(X(k)) is equivalent to minimizing the quadratic forms in the exponents of the two likelihood functions, which gives the following optimization form:

subject to Y(k) = C X(k) + n(k)    (25)

where X(k) and n(k) are the variables and Θ is the covariance matrix of the Gaussian noise; the solution for X(k) is the estimate of the speech state, and n(k) is the estimate of the Gaussian noise;
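Formulas (22) to (24) and the objective of (25) are not reproduced in this text. As a hedged reconstruction only, the following shows the standard Gaussian form that is consistent with the definitions of R and Θ above; the predicted state symbol \hat{X}(k|k-1) = F\hat{X}(k-1|k-1) is notation assumed here, and normalizing constants are omitted. The parameter problem of step 4.2) would follow the same pattern with θ, A, L, and Ψ in place of X, C, R, and Θ.

```latex
\begin{align*}
\hat{X}(k) &= \arg\max_{X(k)} \; p\big(X(k) \mid Y(k)\big),
  \qquad p\big(X(k) \mid Y(k)\big) \propto p\big(Y(k) \mid X(k)\big)\, p\big(X(k)\big) \\[4pt]
L_1\big(Y(k), X(k)\big) &\propto
  \exp\!\Big(-\tfrac{1}{2}\, n(k)^{T} R^{-1} n(k)\Big),
  \qquad n(k) = Y(k) - C X(k) \\[4pt]
L_2\big(X(k)\big) &\propto
  \exp\!\Big(-\tfrac{1}{2}\, \big(X(k) - \hat{X}(k|k-1)\big)^{T} \Theta^{-1}
  \big(X(k) - \hat{X}(k|k-1)\big)\Big) \\[4pt]
\min_{X(k),\, n(k)} \;& n(k)^{T} R^{-1} n(k)
  + \big(X(k) - \hat{X}(k|k-1)\big)^{T} \Theta^{-1} \big(X(k) - \hat{X}(k|k-1)\big)
  \quad \text{subject to } Y(k) = C X(k) + n(k)
\end{align*}
```

Taking the negative logarithm of L₁·L₂ turns the product of exponentials into the sum of the two quadratic forms, which is why maximizing the likelihoods and minimizing this objective select the same X(k).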

P(k|k) is the covariance update matrix:

P(k|k) = (I - K(k) C(k)) P(k|k-1)    (26)

P(k|k-1) is the covariance prediction matrix:

P(k|k-1) = F(k-1) P(k-1|k-1) F(k-1)^T + Q(k-1)    (27)

K(k) is the Kalman gain:

K(k) = P(k|k-1) C^T (C P(k|k-1) C^T + R(k-1))^-1    (28)
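A minimal sketch of formulas (26) to (28) for the speech state follows. It relies on C = [0 ... 0 1], so C X is the last state component and P Cᵀ is the last column of P. The function name and the numbers in the example are hypothetical, and the state correction line is the standard Kalman update rather than something stated in the text.

```python
def speech_kalman_step(X, P, F, y, Q, R):
    """One conventional Kalman step for the speech state, with C = [0 ... 0 1]."""
    p = len(X)
    X_pred = [sum(F[i][j] * X[j] for j in range(p)) for i in range(p)]
    # Formula (27): P(k|k-1) = F P F^T + Q
    FP = [[sum(F[i][m] * P[m][j] for m in range(p)) for j in range(p)]
          for i in range(p)]
    P_pred = [[sum(FP[i][m] * F[j][m] for m in range(p)) + Q[i][j]
               for j in range(p)] for i in range(p)]
    # Formula (28): C P C^T is the bottom-right entry, P C^T the last column
    S = P_pred[-1][-1] + R
    K = [P_pred[i][-1] / S for i in range(p)]
    # Standard correction (assumed): X_pred + K (Y - C X_pred)
    innovation = y - X_pred[-1]
    X_new = [X_pred[i] + K[i] * innovation for i in range(p)]
    # Formula (26): P(k|k) = (I - K C) P(k|k-1); C P is the last row of P
    P_new = [[P_pred[i][j] - K[i] * P_pred[-1][j] for j in range(p)]
             for i in range(p)]
    return X_new, P_new

F2 = [[0.0, 1.0], [-0.25, 0.5]]           # hypothetical AR(2) transition matrix
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
X_new, P_new = speech_kalman_step([1.0, 2.0], I2, F2, 1.75, Z2, 1.0)
```

For this example the predicted sample is 0.75, the innovation is 1.0, and the gain on the last component is 0.3125/1.3125, so the enhanced sample is about 0.988.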

5.2) Construct the estimation problem for the sparse noise from the convex optimization perspective

The core idea of estimating the sparse noise is to exploit its sparsity. After step 5.1) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the sparse noise v(k) can be added to the optimization to complete the estimation of the sparse noise. The new optimization form is:

subject to Y(k) = C X(k) + n(k) + v(k)    (29)

where v(k) is the sparse noise. Solving the above optimization problem yields the optimal estimate X(k) of the speech signal state; X(k) is the optimal estimate of the state value in traditional Kalman filtering. The optimization problem expressed by formula (29) is a convex optimization problem and can be solved with the interior-point method used in engineering;

5.3) After the speech signal at time k has been enhanced, the enhancement result is fed back to step 4) and used to update the AR parameters θ(k+1) at time k+1. Speech enhancement at time k+1 then continues and X(k+1) is estimated, until all of the speech signal has been processed.
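The alternation between step 4) and step 5) can be sketched as a single loop. This is a simplified illustration under several assumptions: both filters use the closed-form soft-threshold update that is valid for a scalar measurement with an l1 penalty of weight `lam` (instead of an interior-point solver), Q and D are taken as scaled identity matrices, and all default variances are placeholder values, not settings from the source.

```python
import math

def mat_vec(M, x):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def robust_update(x_pred, P_pred, H, y, R, lam):
    """Scalar-measurement update with an l1-penalized sparse noise term."""
    n = len(x_pred)
    PHt = mat_vec(P_pred, H)
    S = sum(h * g for h, g in zip(H, PHt)) + R            # innovation variance
    e = y - sum(h * xi for h, xi in zip(H, x_pred))       # innovation
    v = math.copysign(max(abs(e) - lam * S / 2.0, 0.0), e)  # sparse noise
    K = [g / S for g in PHt]                              # gain
    x = [xi + k * (e - v) for xi, k in zip(x_pred, K)]
    HP = [sum(H[i] * P_pred[i][j] for i in range(n)) for j in range(n)]
    P = [[P_pred[i][j] - K[i] * HP[j] for j in range(n)] for i in range(n)]
    return x, P

def dual_enhance(y, p, q_var=1e-2, r_var=1e-2, d_var=1e-4, l_var=1e-2, lam=5.0):
    """Alternate AR parameter estimation and speech state estimation."""
    eye = [[float(i == j) for j in range(p)] for i in range(p)]
    X, P = [0.0] * p, [row[:] for row in eye]              # X(0|0), P(0|0)
    theta, P_theta = [0.0] * p, [row[:] for row in eye]    # theta(0|0)
    C = [0.0] * (p - 1) + [1.0]
    enhanced = []
    for y_k in y:
        # Step 4: AR parameters; the measurement row is A = X(k-1)^T
        P_theta_pred = [[P_theta[i][j] + d_var * eye[i][j] for j in range(p)]
                        for i in range(p)]
        theta, P_theta = robust_update(theta, P_theta_pred, X, y_k, l_var, lam)
        # Step 5: speech state; F is rebuilt with last row theta = [a_p ... a_1]
        F = [[1.0 if j == i + 1 else 0.0 for j in range(p)] for i in range(p - 1)]
        F.append(list(theta))
        X_pred = mat_vec(F, X)
        FP = [[sum(F[i][m] * P[m][j] for m in range(p)) for j in range(p)]
              for i in range(p)]
        P_pred = [[sum(FP[i][m] * F[j][m] for m in range(p)) + q_var * eye[i][j]
                   for j in range(p)] for i in range(p)]
        X, P = robust_update(X_pred, P_pred, C, y_k, r_var, lam)
        enhanced.append(X[-1])          # enhanced sample for time k
    return enhanced

# Toy input: a sinusoid with one large spike
y_demo = [math.sin(0.3 * k) for k in range(40)]
y_demo[20] += 5.0
out = dual_enhance(y_demo, p=2)
```

The loop mirrors the feedback described above: the state estimate from time k becomes the measurement row for the parameter update at time k+1. Useful performance depends on tuning the variances and `lam` to the data.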

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. To address the problem that the AR parameters of the speech model (in particular the autoregressive AR model) cannot be updated in real time as the noise changes, the invention proposes a dual Kalman filtering framework. Two Kalman filters run in parallel: speech signal state estimation and AR parameter estimation update each other, and the state estimation and parameter estimation processes alternate, so that parameter estimation can follow changes in the noise. This improves the accuracy of the system model and, in turn, the performance of speech enhancement.

2. To address the problem that the traditional Kalman filtering algorithm cannot handle non-stationary noise, the invention combines Kalman filtering with convex optimization and proposes an improved Kalman filtering framework. The new algorithm adds both a Gaussian noise term and a non-stationary noise term to the measurement process of the speech enhancement model. By building a suitable optimization model and applying convex optimization, it can accurately estimate the Gaussian noise and the non-stationary noise, improving the accuracy of speech enhancement.

Brief description of the drawings

Fig. 1 is a flow chart of the speech enhancement method under non-stationary noise.

Fig. 2a is a schematic diagram of the original speech signal.

Fig. 2b is a schematic diagram of the speech signal with white Gaussian noise.

Fig. 2c is a schematic diagram of the speech signal with white Gaussian noise and non-stationary noise.

Fig. 3 is a flow chart of the speech enhancement algorithm based on the dual improved Kalman filter.

Fig. 4a is the original speech signal.

Fig. 4b is a schematic diagram of the speech enhancement result.

Specific embodiment

The invention is further described below with reference to a specific embodiment.

As shown in Fig. 1, the online speech enhancement method suitable for non-stationary noise environments described in this embodiment comprises the following steps:

1) Establish the system model under a non-stationary noise environment

Speech production can be described as an autoregressive process: white noise excites an all-pole linear system, and the current output equals the excitation at the current time plus a weighted sum of the previous p outputs. This is an autoregressive (AR) model, expressed as follows:

$$s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1}$$

where u(k) is the white Gaussian noise excitation at time k; s(k-i) is the speech signal at time (k-i); s(k) is the speech signal at time k; a_i is the i-th linear prediction coefficient, also called an AR model parameter; and p is the order of the AR model.

As shown in Figs. 2a, 2b, and 2c, speech observed in a real environment is contaminated by various kinds of noise, non-stationary noise in particular. The invention therefore considers Gaussian noise and non-stationary noise together in the speech measurement process and establishes a speech signal model that better matches the actual measurement process. The speech signal measurement process in the invention can be described as follows:

Y(k) = s(k) + n(k) + v(k)    (2)

where Y(k) is the speech signal measurement sequence at time k; s(k) is the speech signal at time k; n(k) is the white Gaussian noise at time k; and v(k) is the non-stationary noise at time k, which follows a Laplacian distribution and is sparse.

1.2) Establish the speech signal state-space model

Formulas (1) and (2) are converted into a state-space model, which can be described as follows:

X(k) = F X(k-1) + p(k)    (3)

Y(k) = C X(k) + n(k) + v(k)    (4)

where

$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5}$$

C = [0 0 ... 0 1]    (6)

X(k) = [s(k-p+1) ... s(k)]^T    (7)

In the speech signal state equation (3) and the speech signal measurement equation (4), X(k) is the speech signal state estimate sequence at time k, i.e. the optimal state estimate of the speech signal; X(k-1) is the speech signal state estimate sequence at time (k-1); Y(k) is the speech signal measurement sequence at time k; F is the state transition matrix built from the linear prediction coefficients, and its last row [a_p(k) … a₁(k)] is called the AR parameters; C = [0 0 ... 0 1] is the measurement matrix; p(k) is the state noise at time k, which follows a Gaussian distribution; n(k) is the measurement noise at time k, which follows a Gaussian distribution; v(k) is the non-stationary noise at time k, which follows a Laplacian distribution.

E(p(k)) = q, E(n(k)) = r

E(p(k)p(j)^T) = Q δ_kj, E(n(k)n(j)^T) = R δ_kj    (8)

where q and r are the means of the noises p(k) and n(k), respectively; Q and R are the covariances of p(k) and n(k), respectively; δ_kj is the Kronecker delta. The speech enhancement problem is to estimate the optimal speech signal X(k) given the measured speech signal Y(k).

2) Framing and windowing

Speech is short-term stationary (within 10 to 30 ms the speech signal can be regarded as approximately unchanged), so it can be divided into short segments for processing. This is framing, and it is implemented by weighting the signal with a movable finite-length window. There are generally about 33 to 100 frames per second. Framing generally uses overlapping segmentation; the overlapping part of the previous frame and the next frame is called the frame shift, and the ratio of frame shift to frame length is generally 0 to 0.5. In the invention the frame length is 25 ms and the frame shift is 10 ms.
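As a quick check of these settings, the sketch below converts the 25 ms frame length and 10 ms frame shift into sample counts. The 16 kHz sample rate is an assumption made only for this example (the source does not state one), and the frame shift is treated as the step between frame starts.

```python
def frame_params(sample_rate_hz, frame_ms=25, shift_ms=10):
    """Convert frame length and frame shift from milliseconds to samples."""
    frame_len = sample_rate_hz * frame_ms // 1000
    frame_shift = sample_rate_hz * shift_ms // 1000
    return frame_len, frame_shift

def count_frames(n_samples, frame_len, frame_shift):
    """Number of complete frames that fit in n_samples."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_shift

frame_len, frame_shift = frame_params(16000)          # assumed 16 kHz audio
n_frames = count_frames(16000, frame_len, frame_shift)  # one second of audio
```

At 16 kHz this gives 400-sample frames advanced by 160 samples, a shift-to-length ratio of 0.4, and 98 complete frames in one second of audio.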

3) System initialization

3.1) Parameter initialization of the improved Kalman filter

Initialize the speech signal state estimate sequence X(0|0) and the covariance matrix P(0|0), ensuring that the covariance matrix is positive definite.

3.2) AR parameter initialization

Initialize the AR parameter state estimate sequence θ(0|0). The order of the AR model is set to 13 in the invention (chosen empirically).

4) Estimate the AR parameters

The AR parameters are the last row [a_p(k) … a₁(k)] of the state transition matrix F in formula (3). They describe the speech production process, and their accuracy directly affects the speech enhancement result. In practice, AR parameter estimation is strongly affected by the speech signal itself and by the various noise sources. The invention therefore jointly considers the speech signal state estimate sequence X(k-1), the state noise q(k), the measurement noise n(k), the non-stationary noise v(k), and so on, and establishes a new state-space model for AR parameter estimation that achieves online robust estimation of the AR parameters. This is one core point of the invention. As shown in Fig. 3, the real-time estimation of the AR parameters proceeds as follows:

4.1) Establish the parameter estimation model of the AR parameters

θ(k) = θ(k-1) + q(k)

Y(k) = A θ(k) + r(k) + v(k)    (9)

where θ(k) = [a_p(k) … a₁(k)]^T is the AR parameter state sequence at time k; q(k) is the state noise at time k, which follows a Gaussian distribution with covariance matrix Q(k); r(k) is the measurement noise at time k, which follows a Gaussian distribution with covariance matrix R(k); v(k) is the non-stationary noise at time k, which follows a Laplacian distribution, is sparse, and has covariance matrix W(k); A = X(k-1)^T = [s(k-p) ... s(k-1)] is the measurement matrix; Y(k) is the speech signal measurement sequence at time k. The statistical properties of the state noise q(k) and the measurement noise r(k) are:

E(q(k)) = d, E(r(k)) = l

E(q(k)q(j)^T) = D δ_kj, E(r(k)r(j)^T) = L δ_kj    (10)

where d and l are the means of the noises q(k) and r(k), respectively; D and L are the covariances of q(k) and r(k), respectively; δ_kj is the Kronecker delta.

To make the sparse noise easy to estimate, the Kalman filtering problem is reformulated from the perspective of convex optimization. The state-space model of traditional Kalman filtering (without the non-stationary noise v(k)) is as follows:

θ(k) = θ(k-1) + q(k)

Y(k) = A θ(k) + r(k)    (11)

According to the Bayesian principle, the AR parameter estimation problem can be expressed as estimating the optimal AR parameter sequence θ(k) given the measurement data Y(k), namely:

where Ψ is the covariance matrix of the conditional probability p(θ(k) | Y(k)) with the previous estimate known, Ψ(k) = P_θ(k|k) + D(k), and P_θ(k|k) is the updated covariance. When the likelihood functions L₁(Y(k), θ(k)) and L₂(θ(k)) reach their maximum, the conditional probability p(θ(k) | Y(k)) attains its optimal estimate. Inspection of formulas (13) and (14) shows that maximizing L₁(Y(k), θ(k)) and L₂(θ(k)) is equivalent to minimizing the quadratic forms in the exponents of the two likelihood functions, which gives the following optimization form:

subject to Y(k) = A θ(k) + r(k)    (15)

where θ(k) and r(k) are the variables and Ψ(k) = P_θ(k|k) + D(k) is the covariance matrix of the Gaussian noise. The solution for θ(k) is the estimate of the AR parameters, and r(k) is the estimate of the Gaussian noise. P_θ(k|k) is the covariance update matrix:

P_θ(k|k) = (I - K_θ(k) A(k)) P_θ(k|k-1)    (16)

P_θ(k|k-1) is the covariance prediction matrix:

P_θ(k|k-1) = P_θ(k-1|k-1) + D(k-1)    (17)

K_θ(k) is the Kalman gain:

K_θ(k) = P_θ(k|k-1) A^T (A P_θ(k|k-1) A^T + L(k-1))^-1    (18)

subject to Y(k) = A θ(k) + r(k) + v(k)

where v(k) is the sparse noise. Solving the above optimization problem yields the optimal estimate θ(k) of the AR parameters. The optimization problem expressed by formula (19) is a convex optimization problem and can be solved with the interior-point method, which is well established in engineering.

5) Estimate the speech signal state sequence

During speech acquisition, non-stationary noise strongly affects speech quality. To improve speech quality, a speech enhancement algorithm must cope with mixed Gaussian noise and non-stationary noise at the same time. Non-stationary noise generally follows a Laplacian distribution and is sparse, and its estimation mainly exploits this sparsity. To make it easy to introduce a noise sparsity constraint into the optimization problem, the traditional Kalman filtering problem is first reformulated as a convex optimization problem; a sparsity constraint on the sparse noise is then introduced into the newly constructed optimization, which completes the speech enhancement task. This is another core point of the invention.

To make the sparse noise easy to estimate, the Kalman filtering problem is reformulated from the perspective of convex optimization. The state-space model of traditional Kalman filtering is as follows:

X(k) = F X(k-1) + p(k)    (20)

Y(k) = C X(k) + n(k)    (21)

According to the Bayesian principle, the Kalman filtering problem can be expressed as estimating the optimal speech state sequence X(k) given the measurement data Y(k), namely:

where Θ is the covariance matrix of the conditional probability p(X(k) | Y(k-1)) with the previous estimate known, Θ = F P(k-1|k-1) F^T + Q(k-1), and P(k-1|k-1) is the updated covariance. When the likelihood functions L₁(Y(k), X(k)) and L₂(X(k)) reach their maximum, the conditional probability p(X(k) | Y(k)) attains its optimal estimate. Inspection of formulas (23) and (24) shows that maximizing L₁(Y(k), X(k)) and L₂(X(k)) is equivalent to minimizing the quadratic forms in the exponents of the two likelihood functions, which gives the following optimization form:

subject to Y(k) = C X(k) + n(k)    (25)

where X(k) and n(k) are the variables and Θ is the covariance matrix of the Gaussian noise. The solution for X(k) is the estimate of the speech state, and n(k) is the estimate of the Gaussian noise.

P(k|k) is the covariance update matrix:

P(k|k) = (I - K(k) C(k)) P(k|k-1)    (26)

P(k|k-1) is the covariance prediction matrix:

P(k|k-1) = F(k-1) P(k-1|k-1) F(k-1)^T + Q(k-1)    (27)

K(k) is the Kalman gain:

K(k) = P(k|k-1) C^T (C P(k|k-1) C^T + R(k-1))^-1    (28)

subject to Y(k) = C X(k) + n(k) + v(k)    (29)

where v(k) is the sparse noise. Solving the above optimization problem yields the optimal estimate X(k) of the speech signal state (note: X(k) is the optimal estimate of the state value in traditional Kalman filtering). The optimization problem expressed by formula (29) is a convex optimization problem and can be solved with the interior-point method, which is well established in engineering.

As shown in Figs. 4a and 4b, the method proposed by the invention can filter out Gaussian noise and non-stationary noise fairly accurately and enhance the original speech signal.

With the invention, white noise and non-stationary noise can be accurately estimated and filtered out, achieving speech enhancement under mixed white noise and non-stationary noise. The resulting cleaner speech estimate provides front-end support for improving speech recognition accuracy.

Because the invention builds two robust Kalman filter models and mathematically models the speech production process, it specifically accounts for both the temporal and the time-varying characteristics of speech. AR parameter estimation is updated dynamically and iteratively in real time, which satisfies the time-varying nature of the parameters, while state estimation recovers the speech signal frame by frame and exploits the short-term stationarity of speech. As a result the filtering performance is better than that of traditional Kalman filtering, and the method is worth promoting.

The embodiment described above is only a preferred embodiment of the invention and does not limit its scope of implementation; all changes made according to the shape and principle of the invention shall fall within its scope of protection.

Claims

1. An online speech enhancement method suitable for nonstationary noise environments, comprising the following steps:

1) Establish the system model under a nonstationary noise environment

The generation of a speech signal is a recursive process in which white noise excites an all-pole linear system and the output feeds back on itself; that is, the current output equals the excitation signal at the present moment plus a weighted sum of the outputs at the past p moments. This is an autoregressive (AR) model, expressed as follows:

$$s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1}$$

where u(k) is the white Gaussian noise excitation at time k; s(k-i) is the speech signal at time (k-i); s(k) is the speech signal at time k; a_i is the i-th linear prediction coefficient, also called an AR model parameter; p is the order of the AR model;
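The autoregressive recursion described above, s(k) = a_1 s(k-1) + ... + a_p s(k-p) + u(k), can be illustrated with a short simulation. The sketch below is only an illustration and is not part of the claimed method: the function name `synthesize_ar`, the zero initial conditions, the fixed seed, and the example coefficients are all assumptions.

```python
import random

def synthesize_ar(coeffs, n_samples, noise_std=1.0, seed=0):
    """Generate s(k) = sum_i a_i * s(k-i) + u(k), with u(k) white Gaussian.

    coeffs is [a_1, ..., a_p]; initial conditions are assumed to be zero.
    """
    rng = random.Random(seed)
    p = len(coeffs)
    s = [0.0] * p  # p zero samples standing in for s(-p), ..., s(-1)
    for _ in range(n_samples):
        u = rng.gauss(0.0, noise_std)  # excitation u(k)
        # coeffs[i] is a_(i+1), which multiplies s(k-(i+1)), i.e. s[-1 - i]
        past = sum(coeffs[i] * s[-1 - i] for i in range(p))
        s.append(past + u)
    return s[p:]

# Hypothetical second-order example
samples = synthesize_ar([0.5, -0.2], 50)
```

With `noise_std=0.0` and zero initial conditions the output stays at zero, which is a quick sanity check that the excitation is the only source of energy in the model.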

$$Y(k) = s(k) + n(k) + v(k) \tag{2}$$

where Y(k) is the speech signal measurement sequence at time k; s(k) is the speech signal at time k; n(k) is the white Gaussian noise at time k; v(k) is the nonstationary noise at time k, which follows a Laplacian distribution and is sparse;

1.2) Establish the speech signal state-space model

$$X(k) = FX(k-1) + p(k) \tag{3}$$

$$Y(k) = CX(k) + n(k) + v(k) \tag{4}$$

where

$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5}$$

$$C = [0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1] \tag{6}$$

$$X(k) = [s(k-p+1) \;\; \cdots \;\; s(k)]^T \tag{7}$$

In the speech signal state equation (3) and the speech signal measurement equation (4), X(k) is the speech signal state estimation sequence at time k, i.e. the optimal state estimate of the speech signal; X(k-1) is the speech signal state estimation sequence at time (k-1); Y(k) is the speech signal measurement sequence at time k; F is the state transition matrix formed from the linear prediction coefficients, and the last row of F, [a_p(k) … a_1(k)], is called the AR parameters; C = [0 0 ... 0 1] is the measurement transfer matrix; p(k) is the state noise at time k, which is Gaussian distributed; n(k) is the measurement noise at time k, which is Gaussian distributed; v(k) is the nonstationary noise at time k, which follows a Laplacian distribution;

$$E(p(k)) = q, \quad E(n(k)) = r$$

$$E(p(k)p(j)^T) = Q\delta_{kj}, \quad E(n(k)n(j)^T) = R\delta_{kj} \tag{8}$$

where q and r are the means of the noises p(k) and n(k), respectively; Q and R are the covariances of the noises p(k) and n(k), respectively; δ_kj is the Kronecker delta function; the speech enhancement problem is to estimate the optimal speech signal X(k) given the measured speech signal Y(k);
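As a sketch of how the state-space form relates to the AR recursion, the snippet below builds a companion-form transition matrix whose last row is [a_p, ..., a_1] and a measurement row that selects the newest sample, then checks that one noise-free step F X(k-1) shifts the state and appends the AR prediction. The helper names `companion_matrix` and `measurement_row` and the numeric values are assumptions made for illustration.

```python
import numpy as np

def companion_matrix(ar_coeffs):
    """Build F for the state X(k) = [s(k-p+1), ..., s(k)]^T.

    ar_coeffs is [a_1, ..., a_p]; the last row of F is [a_p, ..., a_1].
    """
    a = np.asarray(ar_coeffs, dtype=float)
    p = a.size
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)  # upper rows shift the state by one sample
    F[-1, :] = a[::-1]          # last row applies the AR weights
    return F

def measurement_row(p):
    """Build C = [0 0 ... 0 1], which picks s(k) out of X(k)."""
    C = np.zeros((1, p))
    C[0, -1] = 1.0
    return C

# Hypothetical third-order example: X(k-1) = [s(k-3), s(k-2), s(k-1)]
F = companion_matrix([0.5, -0.3, 0.1])
C = measurement_row(3)
X_prev = np.array([1.0, 2.0, 3.0])
X_next = F @ X_prev        # noise-free prediction of X(k)
s_k = (C @ X_next)[0]      # 0.5*3 - 0.3*2 + 0.1*1
```

In this example the first two entries of `X_next` are the shifted samples 2 and 3, and the last entry is the AR prediction of s(k).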

2) Framing and windowing

The speech signal is short-term stationary: it is regarded as unchanged within 10 to 30 ms, so it can be divided into short segments for processing, which is called framing; framing of the speech signal is implemented by weighting it with a movable window of finite length; there are usually 33 to 100 frames per second; framing uses overlapping segmentation, the overlapping part of the previous frame and the next frame is called the frame shift, and the ratio of frame shift to frame length is 0 to 0.5;
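A minimal sketch of overlapping framing with a finite-length window follows. The text does not name a window type or sampling rate, so the Hamming window, the 8 kHz rate implied by the example numbers (160 samples = 20 ms), and the function name `frame_signal` are assumptions.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames and weight each with a Hamming window.

    hop is the step between frame starts; frame_len - hop samples overlap.
    """
    if frame_len < 2 or not 0 < hop <= frame_len:
        raise ValueError("need frame_len >= 2 and 0 < hop <= frame_len")
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames

# Hypothetical example: 20 ms frames with 50% overlap at an assumed 8 kHz rate
frames = frame_signal([1.0] * 400, frame_len=160, hop=80)
```

With these numbers there are 100 frame starts per second and half of each frame overlaps its neighbour, which sits at the upper end of the ranges quoted in the text.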

3) System initialization

3.1) Parameter initialization of the improved Kalman filter

Initialize the speech signal state estimation sequence X(0|0) and the covariance matrix P(0|0), ensuring that the covariance matrix is positive definite;

3.2) AR parameter initialization

Initialize the AR parameter state estimation sequence θ(0|0);

4) Estimate the AR parameters

The AR parameters are the last row [a_p(k) … a_1(k)] of the state transition matrix F in formula (3). They mainly describe the speech generation process, and their accuracy directly affects the result of speech enhancement; for AR parameter estimation, the speech signal state estimation sequence X(k-1), the state noise q(k), the measurement noise n(k), and the nonstationary noise v(k) are considered together to establish a new AR parameter estimation state-space model, which achieves online robust estimation of the AR parameters; the real-time estimation process for the AR parameters is as follows:

4.1) Establish the parameter estimation model of the AR parameters

$$\theta(k) = \theta(k-1) + q(k)$$

$$Y(k) = A\theta(k) + r(k) + v(k) \tag{9}$$

where θ(k) = [a_p(k) ... a_1(k)]^T is the AR parameter state sequence at time k; q(k) is the state noise at time k, which is Gaussian distributed with covariance matrix Q(k); r(k) is the measurement noise at time k, which is Gaussian distributed with covariance matrix R(k); v(k) is the nonstationary noise at time k, which follows a Laplacian distribution, is sparse, and has covariance matrix W(k); A = X(k-1)^T = [s(k-p) ... s(k-1)] is the measurement matrix; Y(k) is the speech signal measurement sequence at time k; the statistical properties of the state and measurement noises q(k) and r(k) are as follows:

$$E(q(k)) = d, \quad E(r(k)) = l$$

$$E(q(k)q(j)^T) = D\delta_{kj}, \quad E(r(k)r(j)^T) = L\delta_{kj} \tag{10}$$

where d and l are the means of the noises q(k) and r(k), respectively; D and L are the covariances of the noises q(k) and r(k), respectively; δ_kj is the Kronecker delta function;

To make the sparse noise easy to estimate, the Kalman filtering problem needs to be reformulated from the viewpoint of convex optimization; the state-space model of traditional Kalman filtering does not contain the nonstationary noise v(k), as follows:

$$\theta(k) = \theta(k-1) + q(k)$$

$$Y(k) = A\theta(k) + r(k) \tag{11}$$

According to the Bayesian principle, the AR parameter estimation problem is expressed as estimating the optimal AR parameter sequence θ(k) given the measurement data Y(k), that is:

where Ψ is the covariance matrix of the conditional probability p(θ(k) | Y(k)) for the given conditioning data, Ψ(k) = P_θ(k|k) + D(k), in which P_θ(k|k) is the updated covariance; when the likelihood functions L₁(Y(k), θ(k)) and L₂(θ(k)) reach their maximum, the conditional probability p(Y(k) | θ(k)) yields the optimal estimate; inspecting formulas (13) and (14) shows that maximizing the likelihood functions L₁(Y(k), θ(k)) and L₂(θ(k)) is equivalent to minimizing the exponents of the exponential terms in the likelihood functions, which gives the following optimization form:

$$\text{subject to } Y(k) = A\theta(k) + r(k) \tag{15}$$

where θ(k) and r(k) are the variables and Ψ(k) = P_θ(k|k) + D(k) is the covariance matrix of the Gaussian noise; the estimate of θ(k) is $\hat{\theta}(k)$, and r(k) is the estimate of the Gaussian noise; P_θ(k|k) is the covariance update matrix:

$$P_\theta(k|k) = (I - K_\theta(k)A(k))P_\theta(k|k-1) \tag{16}$$

P_θ(k|k-1) is the covariance prediction matrix:

$$P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1) \tag{17}$$

K_θ(k) is the Kalman gain:

$$K_\theta(k) = P_\theta(k|k-1)A^T\left(AP_\theta(k|k-1)A^T + L(k-1)\right)^{-1} \tag{18}$$
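As a sketch of the reasoning that leads from the Bayesian formulation to a quadratic problem with constraint (15), the block below writes out the usual Gaussian forms. It is a hedged illustration rather than the original formulas: the symbol $\hat{\theta}(k|k-1)$ for the predicted AR parameter estimate and the use of proportionality instead of explicit normalizing constants are assumptions.

```latex
\begin{align*}
\hat{\theta}(k) &= \arg\max_{\theta(k)} \; p\big(Y(k)\mid\theta(k)\big)\, p\big(\theta(k)\mid Y(k-1)\big) \\
L_1\big(Y(k),\theta(k)\big) &\propto \exp\!\Big(-\tfrac{1}{2}\big(Y(k)-A\theta(k)\big)^T L^{-1}\big(Y(k)-A\theta(k)\big)\Big) \\
L_2\big(\theta(k)\big) &\propto \exp\!\Big(-\tfrac{1}{2}\big(\theta(k)-\hat{\theta}(k|k-1)\big)^T \Psi(k)^{-1}\big(\theta(k)-\hat{\theta}(k|k-1)\big)\Big) \\
\min_{\theta(k),\,r(k)} \;& r(k)^T L^{-1} r(k) + \big(\theta(k)-\hat{\theta}(k|k-1)\big)^T \Psi(k)^{-1}\big(\theta(k)-\hat{\theta}(k|k-1)\big) \\
\text{subject to}\;& Y(k) = A\theta(k) + r(k)
\end{align*}
```

Under these assumptions, maximizing the product of the two likelihoods is the same as minimizing the sum of the two quadratic exponents, and the minimizer coincides with the ordinary Kalman update given by the gain and covariance recursions.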

The nonstationary noise follows a Laplacian distribution and is sparse. The core idea of nonstationary noise estimation is to exploit this sparsity: after step 4.2) converts the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the nonstationary noise v(k) can be added to the optimization to complete the estimation of the sparse noise; the new optimization form is:

where v(k) is the sparse noise; solving the above optimization problem yields the optimal estimate of the AR parameters θ(k); the optimization problem expressed by formula (19) is a convex optimization problem and can be solved with the interior point method used in engineering;
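To make the sparsity-augmented update concrete, the sketch below solves one plausible form of the problem for a scalar measurement: minimize r²/L + (θ - θ_pred)ᵀ Ψ⁻¹ (θ - θ_pred) + λ|v| subject to y = Aθ + r + v. The l1 penalty, its weight `lam`, and the function names are assumptions, since the text only says that a sparsity constraint is added. Instead of the interior point method named in the text, this sketch uses a closed-form soft-thresholding solution, which is available in the scalar-measurement case because eliminating θ leaves a one-dimensional problem in v.

```python
import numpy as np

def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become exactly zero."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def robust_ar_update(y, a_row, theta_pred, Psi, L, lam):
    """One robust update of the AR parameters for a scalar measurement y.

    Minimizes r^2/L + (theta-theta_pred)^T Psi^{-1} (theta-theta_pred) + lam*|v|
    subject to y = a_row @ theta + r + v. The penalty weight lam is hypothetical.
    """
    a_row = np.asarray(a_row, dtype=float)
    theta_pred = np.asarray(theta_pred, dtype=float)
    Psi = np.asarray(Psi, dtype=float)
    innovation = y - a_row @ theta_pred
    S = a_row @ Psi @ a_row + L                  # innovation variance
    # With theta eliminated, the cost in v is (innovation - v)^2 / S + lam*|v|.
    v_hat = soft_threshold(innovation, lam * S / 2.0)
    gain = Psi @ a_row / S                       # same form as a Kalman gain
    theta_hat = theta_pred + gain * (innovation - v_hat)
    r_hat = y - a_row @ theta_hat - v_hat        # Gaussian noise estimate
    return theta_hat, r_hat, v_hat

# Hypothetical values: an ordinary sample, then an impulsive outlier
print(robust_ar_update(1.0, [1.0, 0.0], [0.0, 0.0], np.eye(2), 1.0, 2.0))
print(robust_ar_update(10.0, [1.0, 0.0], [0.0, 0.0], np.eye(2), 1.0, 2.0))
```

For the ordinary sample the sparse term stays at zero and the result equals a standard Kalman update; for the outlier most of the innovation is absorbed by `v_hat`, so the parameters move only slightly. A general-purpose convex solver would be needed for vector measurements.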

5) Estimate the speech signal state sequence

To make the sparse noise easy to estimate, the Kalman filtering problem needs to be reformulated from the viewpoint of convex optimization; the state-space model of traditional Kalman filtering is as follows:

$$X(k) = FX(k-1) + p(k) \tag{20}$$

$$Y(k) = CX(k) + n(k) \tag{21}$$

According to the Bayesian principle, the Kalman filtering problem is expressed as estimating the optimal speech state sequence X(k) given the measurement data Y(k), that is:

where Θ is the covariance matrix of the conditional probability p(X(k) | Y(k-1)) for the given conditioning data, Θ = FP(k-1|k-1)F^T + Q(k-1), in which P(k-1|k-1) is the updated covariance; when the likelihood functions L₁(Y(k), X(k)) and L₂(X(k)) reach their maximum, the conditional probability p(X(k) | Y(k)) yields the optimal estimate; inspecting formulas (23) and (24) shows that maximizing the likelihood functions L₁(Y(k), X(k)) and L₂(X(k)) is equivalent to minimizing the exponents of the exponential terms in the likelihood functions, which gives the following optimization form:

$$\text{subject to } Y(k) = CX(k) + n(k) \tag{25}$$

where X(k) and n(k) are the variables and Θ is the covariance matrix of the Gaussian noise; the estimate of X(k) is $\hat{X}(k)$, and n(k) is the estimate of the Gaussian noise;

P(k|k) is the covariance update matrix:

$$P(k|k) = (I - K(k)C(k))P(k|k-1) \tag{26}$$

P(k|k-1) is the covariance prediction matrix:

$$P(k|k-1) = F(k-1)P(k-1|k-1)F(k-1)^T + Q(k-1) \tag{27}$$

K(k) is the Kalman gain:

$$K(k) = P(k|k-1)C^T\left(CP(k|k-1)C^T + R(k-1)\right)^{-1} \tag{28}$$

The core idea of sparse noise estimation is to exploit the sparsity of the noise: after step 5.1) converts the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the sparse noise v(k) can be added to the optimization to complete the estimation of the sparse noise; the new optimization form is:

$$\text{subject to } Y(k) = CX(k) + n(k) + v(k) \tag{29}$$

where v(k) is the sparse noise; solving the above optimization problem yields the optimal estimate X(k) of the speech signal state, where X(k) corresponds to the optimal state estimate $\hat{X}(k)$ in traditional Kalman filtering; the optimization problem expressed by formula (29) is a convex optimization problem and can be solved with the interior point method used in engineering;

5.3) After the speech signal at time k has been enhanced, the enhancement result $\hat{X}(k)$ is returned to step 4) and used to update the AR parameters θ(k+1) at time k+1; speech enhancement at time k+1 then continues by estimating X(k+1), and so on until all speech signals have been processed.
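The alternation between the two filters can be sketched as a single loop in which the previous state estimate supplies the measurement row for the AR parameter filter, and the refreshed AR parameters then form the last row of F for the state filter. To keep the structure visible, this sketch uses ordinary Kalman updates with the gain and covariance recursions given above; the sparsity-augmented convex step would take the place of `kalman_update`. The function names, the scalar noise variances, the choice to drive only the newest state component with excitation noise, and all default values are untuned assumptions.

```python
import numpy as np

def kalman_update(y, h, x_pred, P_pred, noise_var):
    """Scalar-measurement Kalman update returning the corrected mean and covariance."""
    S = h @ P_pred @ h + noise_var                   # innovation variance
    K = P_pred @ h / S                               # gain
    x = x_pred + K * (y - h @ x_pred)
    P = (np.eye(len(h)) - np.outer(K, h)) @ P_pred   # (I - K h) P_pred
    return x, P

def dual_kalman_enhance(y, p=4, Q=1.0, R=0.1, D=1e-4, L=0.1):
    """Alternate AR parameter estimation and speech state estimation sample by sample."""
    X, P = np.zeros(p), np.eye(p)                # state X(0|0) and covariance P(0|0)
    theta, P_theta = np.zeros(p), np.eye(p)      # theta(0|0) = [a_p, ..., a_1]
    C = np.zeros(p)
    C[-1] = 1.0                                  # C = [0 ... 0 1]
    Q_mat = np.zeros((p, p))
    Q_mat[-1, -1] = Q                            # assumption: noise enters the newest sample only
    enhanced = np.empty(len(y))
    for k, yk in enumerate(y):
        # Step 4: update AR parameters with A = X(k-1)^T as measurement row
        A = X.copy()
        theta, P_theta = kalman_update(yk, A, theta, P_theta + D * np.eye(p), L)
        # Step 5: rebuild F from the refreshed parameters, then update the state
        F = np.zeros((p, p))
        F[:-1, 1:] = np.eye(p - 1)
        F[-1, :] = theta
        X, P = kalman_update(yk, C, F @ X, F @ P @ F.T + Q_mat, R)
        enhanced[k] = X[-1]                      # newest sample is the enhanced output
    return enhanced

# Hypothetical usage on a short noisy sequence
out = dual_kalman_enhance(np.array([1.0, 0.5, -0.2]))
```

In practice the loop would run inside each windowed frame, and the noise variances would have to be estimated or tuned for the recording conditions.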