CN106340304B - Online speech enhancement method suitable for non-stationary noise environments
- Publication number: CN106340304B
- Application number: CN201610843483.0A
- Authority: CN (China)
- Prior art keywords: noise, estimation, voice signal, parameter, moment
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
Abstract
The invention discloses an online speech enhancement method suitable for non-stationary noise environments, comprising the steps of: 1) establishing the system model under a non-stationary noise environment; 2) framing and windowing; 3) system initialization; 4) estimating the AR parameters; 5) estimating the speech signal state sequence. To address the problem that the AR parameters of the speech model cannot be updated in real time as the noise changes, the invention proposes a dual Kalman filtering framework in which two Kalman filters run in parallel: speech signal state estimation and AR parameter estimation update each other and alternate, so that parameter estimation can follow changes in the noise, which improves the accuracy of the system model and in turn the speech enhancement performance. To address the problem that the traditional Kalman filtering algorithm cannot handle non-stationary noise, the invention combines it with convex optimization and proposes an improved Kalman filtering framework that can accurately estimate both Gaussian noise and non-stationary noise, improving the accuracy of speech enhancement.
Description
Technical field
The present invention relates to the field of speech enhancement, and in particular to an online speech enhancement method suitable for non-stationary noise environments.
Background technique
In speech recognition front-end processing, the speech signal is always disturbed and masked by various kinds of noise. Because the interference is random, signal processing can only improve speech quality as far as possible. The main purpose of speech enhancement is to extract the clean original speech from noisy speech.
Common speech enhancement algorithms mainly fall into the following categories:
1. Noise cancellation: the noise component is subtracted directly from the noisy speech, in the time domain or in the frequency domain. Its main characteristic is that it needs the background noise as a reference signal, and the accuracy of that reference directly determines the performance of the method.
2. Harmonic enhancement: voiced speech is clearly periodic. In the frequency domain this periodicity appears as a series of peaks corresponding to the fundamental frequency (pitch) and its harmonics, and these components carry most of the energy of the speech. The periodicity can be used for speech enhancement: a comb filter extracts the pitch and its harmonic components and suppresses other periodic noise and aperiodic broadband noise.
3. Enhancement based on a speech production model: speech production can be modeled as a linear time-varying filter, with different excitation sources for different types of speech. The most widely used speech production model is the all-pole model. A series of speech enhancement algorithms can be derived from the speech production model, such as time-varying-parameter Wiener filtering and Kalman filtering.
4. Enhancement based on the short-time spectrum: there are many algorithms of this type, such as spectral subtraction, Wiener filtering, and minimum mean-square error estimation. These methods work over a wide range of signal-to-noise ratios, are simple, and are easy to run in real time.
5. Enhancement based on wavelet decomposition: this approach developed along with wavelet decomposition as a tool of mathematical analysis, and it also incorporates some basic principles of spectral subtraction.
6. Enhancement based on auditory masking: this approach exploits the auditory characteristics of the human ear.
Speech enhancement based on Kalman filtering belongs to the third category above. Traditional Kalman filtering makes two important assumptions when used for speech enhancement: the process noise and the measurement noise both follow a Gaussian distribution. In practical speech enhancement it shows two limitations. (1) The AR parameters must be estimated accurately. In a real acquisition environment, however, the noise changes continuously, so the AR parameters of the speech model must be estimated in real time and the estimation must account for the various noises; otherwise speech enhancement performance degrades. (2) The traditional Kalman filtering algorithm only considers Gaussian noise, which does not match practical applications. During speech acquisition the signal can be contaminated by a non-stationary noise (sparse, Laplace-distributed) that is uncommon but does occur and strongly affects speech quality. If non-stationary noise is treated as Gaussian noise during enhancement, the enhancement quality drops severely, which hampers subsequent recognition of the speech content.
Given the above problems, it is important to provide an online speech enhancement technique that can handle, in real time, the case where Gaussian noise and non-stationary noise are present simultaneously.
Summary of the invention
The technical problem to be solved by the present invention is that existing Kalman filtering methods cannot update the AR parameters of the speech model in real time and cannot handle non-stationary noise in the measurement process. Combining Kalman filtering with convex optimization, the invention provides an online speech enhancement method suitable for non-stationary noise environments that can estimate the AR parameters and the non-stationary noise online.
To achieve the above object, the technical solution provided by the present invention is an online speech enhancement method suitable for non-stationary noise environments, comprising the following steps:
1) Establish the system model under a non-stationary noise environment
1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise coexist
Speech production is an autoregressive process: white-noise excitation passed through an all-pole linear system. The current output equals the excitation at the current time plus a weighted sum of the outputs at the past p times. This autoregressive (AR) model is expressed as:
$$s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1}$$
where u(k) is the white Gaussian excitation at time k; s(k-i) is the speech signal at time (k-i); s(k) is the speech signal at time k; $a_i$ is the i-th linear prediction coefficient, also called an AR model parameter; and p is the order of the AR model.
A speech signal model that matches the actual measurement process is established. The measurement process is described by:
$$Y(k) = s(k) + n(k) + v(k) \tag{2}$$
where Y(k) is the speech measurement at time k; s(k) is the speech signal at time k; n(k) is white Gaussian noise at time k; and v(k) is the non-stationary noise at time k, which follows a Laplace distribution and is sparse.
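The AR model and the measurement model above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function name `simulate_noisy_ar`, the noise levels, and the way sparsity is produced (Laplace-distributed spikes that occur with a small probability `spike_prob`) are illustrative choices, not values taken from the patent.

```python
import numpy as np

def simulate_noisy_ar(a, n_samples, sigma_u=1.0, sigma_n=0.1,
                      spike_prob=0.02, spike_scale=2.0, seed=0):
    """Generate s(k) from the AR model and Y(k) = s(k) + n(k) + v(k).

    a = [a_1, ..., a_p] are the AR coefficients (hypothetical values).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    p = a.size
    s = np.zeros(n_samples + p)                 # p leading zeros as initial conditions
    u = rng.normal(0.0, sigma_u, n_samples)     # white Gaussian excitation u(k)
    for k in range(n_samples):
        past = s[k:k + p][::-1]                 # s(k-1), ..., s(k-p)
        s[k + p] = np.dot(a, past) + u[k]
    s = s[p:]
    n = rng.normal(0.0, sigma_n, n_samples)     # Gaussian measurement noise n(k)
    # Sparse noise v(k): zero most of the time, Laplace spikes otherwise (assumption)
    mask = rng.random(n_samples) < spike_prob
    v = np.where(mask, rng.laplace(0.0, spike_scale, n_samples), 0.0)
    return s, s + n + v, v
```

Calling `simulate_noisy_ar([0.5, -0.2], 200)` returns the clean signal, the noisy measurement, and the sparse noise component, which is convenient for checking an enhancement routine against a known ground truth.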
1.2) Establish the speech signal state-space model
Equations (1) and (2) are converted into a state-space model:
$$X(k) = F X(k-1) + p(k) \tag{3}$$
$$Y(k) = C X(k) + n(k) + v(k) \tag{4}$$
where
$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5}$$
$$C = [0\ \ 0\ \ \cdots\ \ 0\ \ 1] \tag{6}$$
$$X(k) = [s(k-p+1)\ \ \cdots\ \ s(k)]^T \tag{7}$$
In the state equation (3) and the measurement equation (4), X(k) is the speech signal state at time k, i.e. the optimal state estimate of the speech signal; X(k-1) is the speech signal state at time (k-1); Y(k) is the speech measurement at time k; F is the state-transition matrix built from the linear prediction coefficients, whose last row $[a_p(k)\ \cdots\ a_1(k)]$ is called the AR parameters; C = [0 0 … 0 1] is the measurement matrix; p(k) is the state noise at time k, which is Gaussian; n(k) is the measurement noise at time k, which is Gaussian; and v(k) is the non-stationary noise at time k, which follows a Laplace distribution.
The statistics of the state noise p(k) and the measurement noise n(k) are:
$$E(p(k)) = q,\quad E(n(k)) = r$$
$$E(p(k)\,p(j)^T) = Q\,\delta_{kj},\quad E(n(k)\,n(j)^T) = R\,\delta_{kj} \tag{8}$$
where q and r are the means of p(k) and n(k), Q and R are their covariances, and $\delta_{kj}$ is the Kronecker delta. The speech enhancement problem is to estimate the optimal speech signal X(k) given the measured speech signal Y(k).
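As a small illustration of the state-space form, the sketch below builds F and C from a coefficient vector. It assumes F has the usual companion layout (a shifted identity above a last row of AR coefficients), which matches the text's statement that the last row of F is [a_p … a_1]; the helper name `companion_matrices` is hypothetical.

```python
import numpy as np

def companion_matrices(a):
    """Build the state-transition matrix F and measurement matrix C.

    a = [a_1, ..., a_p]; the state is X(k) = [s(k-p+1), ..., s(k)]^T.
    """
    a = np.asarray(a, dtype=float)
    p = a.size
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)   # shift: entry i of X(k) is entry i+1 of X(k-1)
    F[-1, :] = a[::-1]           # last row [a_p, ..., a_1]
    C = np.zeros((1, p))
    C[0, -1] = 1.0               # selects s(k), the newest sample in the state
    return F, C
```

With `a = [0.5, -0.2, 0.1]` and a previous state `[1, 2, 3]`, the product `F @ X` shifts the state and appends `0.5*3 - 0.2*2 + 0.1*1 = 1.2`, which is the noise-free AR prediction.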
2) Framing and windowing
Speech is short-time stationary: it can be regarded as unchanged over 10 to 30 ms. The speech signal can therefore be divided into short segments for processing, which is called framing. Framing is implemented by weighting the signal with a movable window of finite length. There are usually 33 to 100 frames per second. Frames are taken by overlapping segmentation; the overlapping part of two consecutive frames is called the frame shift, and the ratio of frame shift to frame length is 0 to 0.5.
3) System initialization
3.1) Initialize the improved Kalman filter parameters
Initialize the speech signal state estimate X(0|0) and the covariance matrix P(0|0), ensuring that the covariance matrix is positive definite.
3.2) Initialize the AR parameters
Initialize the AR parameter state estimate θ(0|0).
4) Estimate the AR parameters
The AR parameters are the last row $[a_p(k)\ \cdots\ a_1(k)]$ of the state-transition matrix F in equation (3). They describe the speech production process, and their accuracy directly affects the enhancement result. The AR parameter estimation jointly considers the speech signal state estimate X(k-1), the state noise q(k), the measurement noise n(k), and the non-stationary noise v(k). A new state-space model for AR parameter estimation is established to achieve online robust estimation of the AR parameters. The real-time estimation of the AR parameters proceeds as follows:
4.1) Establish the estimation model for the AR parameters
The AR parameter model under mixed Gaussian and non-stationary noise is:
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\,\theta(k) + r(k) + v(k) \tag{9}$$
where $\theta(k) = [a_p(k)\ \cdots\ a_1(k)]^T$ is the AR parameter state at time k; q(k) is the state noise at time k, Gaussian with covariance matrix Q(k); r(k) is the measurement noise at time k, Gaussian with covariance matrix R(k); v(k) is the non-stationary noise at time k, Laplace-distributed and sparse, with covariance matrix W(k); $A = X(k-1)^T = [s(k-p)\ \cdots\ s(k-1)]$ is the measurement matrix; and Y(k) is the speech measurement at time k. The statistics of the state noise q(k) and the measurement noise r(k) are:
$$E(q(k)) = d,\quad E(r(k)) = l$$
$$E(q(k)\,q(j)^T) = D\,\delta_{kj},\quad E(r(k)\,r(j)^T) = L\,\delta_{kj} \tag{10}$$
where d and l are the means of q(k) and r(k), D and L are their covariances, and $\delta_{kj}$ is the Kronecker delta.
4.2) Reformulate the traditional Kalman filtering problem from the convex optimization viewpoint
To make the sparse noise easy to estimate, the Kalman filtering problem is first reformulated as a convex optimization problem. The state-space model of traditional Kalman filtering does not contain the non-stationary noise v(k):
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\,\theta(k) + r(k) \tag{11}$$
According to Bayes' rule, the AR parameter estimation problem is to estimate the optimal AR parameter sequence θ(k) given the measurement data Y(k), namely:
[Equation (12) is not reproduced in the source text.]
According to maximum likelihood estimation theory, the likelihood functions of p(Y(k) | θ(k)) and p(θ(k)) are established:
[Equations (13) and (14) are not reproduced in the source text.]
where Ψ is the covariance matrix of the conditional probability p(θ(k) | Y(k)) given a known quantity whose symbol is not reproduced in the source text, $\Psi(k) = P_\theta(k|k) + D(k)$, and $P_\theta(k|k)$ is the updated covariance. When the likelihood functions $L_1(Y(k), \theta(k))$ and $L_2(\theta(k))$ reach their maxima, the conditional probability p(θ(k) | Y(k)) attains its optimal estimate. Inspecting equations (13) and (14) shows that maximizing $L_1(Y(k), \theta(k))$ and $L_2(\theta(k))$ is equivalent to minimizing the exponents of the exponential terms in the likelihood functions (the two expressions are not reproduced in the source text), which gives the following optimization problem:
[The objective of equation (15) is not reproduced in the source text.]
$$\text{subject to}\quad Y(k) = A\,\theta(k) + r(k) \tag{15}$$
where θ(k) and r(k) are the optimization variables and $\Psi(k) = P_\theta(k|k) + D(k)$ is the covariance matrix of the Gaussian noise. The solution for θ(k) is its estimate, and r(k) is the estimate of the Gaussian noise. $P_\theta(k|k)$ is the updated covariance matrix:
$$P_\theta(k|k) = (I - K_\theta(k)\,A(k))\,P_\theta(k|k-1) \tag{16}$$
$P_\theta(k|k-1)$ is the predicted covariance matrix:
$$P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1) \tag{17}$$
$K_\theta(k)$ is the Kalman gain:
$$K_\theta(k) = P_\theta(k|k-1)\,A^T\,(A\,P_\theta(k|k-1)\,A^T + L(k-1))^{-1} \tag{18}$$
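Equations (16) to (18) are the covariance and gain recursions of a Kalman filter for the random-walk parameter model (11). A minimal sketch of one predict/update cycle follows; the state update `theta + K * innovation` is the standard Kalman form and is an assumption here, since the text lists only the covariance and gain equations. The function name and argument layout are hypothetical.

```python
import numpy as np

def ar_param_kalman_step(theta, P, A, y, D, L):
    """One cycle for theta(k) = theta(k-1) + q(k), Y(k) = A theta(k) + r(k).

    theta: (p,) previous estimate; P: (p, p) previous covariance;
    A: (p,) measurement row X(k-1)^T; y: scalar measurement Y(k);
    D: (p, p) state-noise covariance; L: scalar measurement-noise variance.
    """
    A = np.atleast_2d(A)                            # 1 x p
    P_pred = P + D                                  # equation (17)
    S = A @ P_pred @ A.T + L                        # innovation variance, 1 x 1
    K = P_pred @ A.T / S                            # equation (18), p x 1
    theta_new = theta + (K * (y - A @ theta)).ravel()   # standard state update (assumed)
    P_new = (np.eye(len(theta)) - K @ A) @ P_pred   # equation (16)
    return theta_new, P_new, K
```

For example, with `theta = 0`, `P = I`, `A = [1, 0]`, `y = 2`, `D = 0` and `L = 1`, the gain is `[0.5, 0]`, so only the first coefficient moves (to 1.0) and its variance halves.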
4.3) Construct the optimization problem for estimating the non-stationary noise from the convex optimization viewpoint
Non-stationary noise follows a Laplace distribution and is sparse, and the core idea of its estimation is to exploit this sparsity. After step 4.2) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the non-stationary noise v(k) can be added to the optimization to estimate the sparse noise. The new optimization problem is:
[Equation (19) is not reproduced in the source text.]
where v(k) is the sparse noise. Solving this optimization problem yields the optimal estimate θ(k) of the AR parameters. The optimization problem in equation (19) is convex and can be solved with the interior point method used in engineering practice.
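The objective of this optimization is not reproduced in the text, so the following is only a sketch under an assumed form: minimize r²/L + (θ − θ_pred)ᵀ P_pred⁻¹ (θ − θ_pred) + λ|v| subject to Y = Aθ + r + v, where the ℓ1 weight `lam` is a hypothetical tuning parameter. For a scalar measurement this assumed problem has a closed-form solution (soft-thresholding of the innovation), which is used here in place of the interior point method mentioned in the text.

```python
import numpy as np

def robust_ar_update(theta_pred, P_pred, A, y, L, lam):
    """Jointly estimate theta(k), Gaussian noise r(k) and sparse noise v(k).

    Assumed objective: r^2/L + (theta - theta_pred)^T P_pred^{-1} (theta - theta_pred)
    + lam * |v|, subject to y = A theta + r + v (scalar measurement).
    """
    A = np.atleast_2d(A)
    S = (A @ P_pred @ A.T).item() + L       # innovation variance
    e = y - (A @ theta_pred).item()         # innovation
    # Minimising over theta and r leaves (e - v)^2 / S + lam * |v|,
    # whose minimiser is soft-thresholding of e at lam * S / 2.
    v = float(np.sign(e) * max(abs(e) - lam * S / 2.0, 0.0))
    K = (P_pred @ A.T).ravel() / S          # Kalman gain
    theta = theta_pred + K * (e - v)        # update with the spike removed
    r = y - v - (A @ theta).item()          # remaining Gaussian residual
    return theta, r, v
```

Small innovations pass through unchanged (v = 0, an ordinary Kalman update), while a large innovation is mostly attributed to v, so a single outlier moves the AR parameters only by a bounded amount.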
5) Estimate the speech signal state sequence
5.1) Reformulate the traditional Kalman filtering problem from the convex optimization viewpoint
To make the sparse noise easy to estimate, the Kalman filtering problem is first reformulated as a convex optimization problem. The state-space model of traditional Kalman filtering is:
$$X(k) = F X(k-1) + p(k) \tag{20}$$
$$Y(k) = C X(k) + n(k) \tag{21}$$
According to Bayes' rule, the Kalman filtering problem is to estimate the optimal speech state sequence X(k) given the measurement data Y(k), namely:
[Equation (22) is not reproduced in the source text.]
According to maximum likelihood estimation theory, the likelihood functions of p(Y(k) | X(k)) and p(X(k)) are established:
[Equations (23) and (24) are not reproduced in the source text.]
where Θ is the covariance matrix of the conditional probability p(X(k) | Y(k-1)) given a known quantity whose symbol is not reproduced in the source text, $\Theta = F P(k-1|k-1) F^T + Q(k-1)$, and P(k-1|k-1) is the updated covariance. When the likelihood functions $L_1(Y(k), X(k))$ and $L_2(X(k))$ reach their maxima, the conditional probability p(X(k) | Y(k)) attains its optimal estimate. Inspecting equations (23) and (24) shows that maximizing $L_1(Y(k), X(k))$ and $L_2(X(k))$ is equivalent to minimizing the exponents of the exponential terms in the likelihood functions (the two expressions are not reproduced in the source text), which gives the following optimization problem:
[The objective of equation (25) is not reproduced in the source text.]
$$\text{subject to}\quad Y(k) = C X(k) + n(k) \tag{25}$$
where X(k) and n(k) are the optimization variables and Θ is the covariance matrix of the Gaussian noise. The solution for X(k) is its estimate, and n(k) is the estimate of the Gaussian noise.
P(k|k) is the updated covariance matrix:
$$P(k|k) = (I - K(k)\,C(k))\,P(k|k-1) \tag{26}$$
P(k|k-1) is the predicted covariance matrix:
$$P(k|k-1) = F(k-1)\,P(k-1|k-1)\,F(k-1)^T + Q(k-1) \tag{27}$$
K(k) is the Kalman gain:
$$K(k) = P(k|k-1)\,C^T\,(C\,P(k|k-1)\,C^T + R(k-1))^{-1} \tag{28}$$
5.2) Construct the estimation problem for the sparse noise from the convex optimization viewpoint
The core idea of sparse noise estimation is to exploit the sparsity of the noise. After step 5.1) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the sparse noise v(k) can be added to the optimization to estimate the sparse noise. The new optimization problem is:
[The objective of equation (29) is not reproduced in the source text.]
$$\text{subject to}\quad Y(k) = C X(k) + n(k) + v(k) \tag{29}$$
where v(k) is the sparse noise. Solving this optimization problem yields the optimal estimate X(k) of the speech signal state; X(k) corresponds to the optimal state estimate in traditional Kalman filtering. The optimization problem in equation (29) is convex and can be solved with the interior point method used in engineering practice.
5.3) After the speech signal at time k has been enhanced, the enhancement result is fed back to step 4) and used to update the AR parameters θ(k+1) at time k+1. Speech enhancement then continues at time k+1, estimating X(k+1), until all of the speech signal has been processed.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. To address the problem that the AR parameters of the speech model (specifically the autoregressive AR model) cannot be updated in real time as the noise changes, the invention proposes a dual Kalman filtering framework. Two Kalman filters run in parallel; speech signal state estimation and AR parameter estimation update each other and alternate, so that parameter estimation can follow changes in the noise. This improves the accuracy of the system model and in turn the speech enhancement performance.
2. To address the problem that the traditional Kalman filtering algorithm cannot handle non-stationary noise, the invention combines it with convex optimization and proposes an improved Kalman filtering framework. The new algorithm includes both a Gaussian noise term and a non-stationary noise term in the measurement process of the speech enhancement model. By establishing a suitable optimization model and solving it with convex optimization, both Gaussian noise and non-stationary noise can be accurately estimated, improving the accuracy of speech enhancement.
Brief description of the drawings
Fig. 1 is the flow chart of the speech enhancement method under non-stationary noise.
Fig. 2a is a schematic diagram of the original speech signal.
Fig. 2b is a schematic diagram of the speech signal with white Gaussian noise.
Fig. 2c is a schematic diagram of the speech signal with white Gaussian noise and non-stationary noise.
Fig. 3 is the flow chart of the speech enhancement algorithm based on dual improved Kalman filtering.
Fig. 4a is the original speech signal.
Fig. 4b is a schematic diagram of the speech enhancement result.
Detailed description of the embodiments
The present invention is further explained below with reference to a specific embodiment.
As shown in Fig. 1, the online speech enhancement method suitable for non-stationary noise environments described in this embodiment comprises the following steps:
1) Establish the system model under a non-stationary noise environment
1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise coexist
Speech production can be described as an autoregressive process: white-noise excitation passed through an all-pole linear system. The current output equals the excitation at the current time plus a weighted sum of the outputs at the past p times. This autoregressive (AR) model is expressed as:
$$s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1}$$
where u(k) is the white Gaussian excitation at time k; s(k-i) is the speech signal at time (k-i); s(k) is the speech signal at time k; $a_i$ is the i-th linear prediction coefficient, also called an AR model parameter; and p is the order of the AR model.
As shown in Figs. 2a, 2b and 2c, a speech signal observed in a real environment is contaminated by various noises, non-stationary noise in particular. The present invention therefore considers Gaussian noise and non-stationary noise together in the speech measurement process, which gives a speech signal model that better matches the actual measurement process. The measurement process in the present invention is described by:
$$Y(k) = s(k) + n(k) + v(k) \tag{2}$$
where Y(k) is the speech measurement at time k; s(k) is the speech signal at time k; n(k) is white Gaussian noise at time k; and v(k) is the non-stationary noise at time k, which follows a Laplace distribution and is sparse.
1.2) Establish the speech signal state-space model
Equations (1) and (2) are converted into a state-space model, which can be described as:
$$X(k) = F X(k-1) + p(k) \tag{3}$$
$$Y(k) = C X(k) + n(k) + v(k) \tag{4}$$
where
$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5}$$
$$C = [0\ \ 0\ \ \cdots\ \ 0\ \ 1] \tag{6}$$
$$X(k) = [s(k-p+1)\ \ \cdots\ \ s(k)]^T \tag{7}$$
In the state equation (3) and the measurement equation (4), X(k) is the speech signal state at time k, i.e. the optimal state estimate of the speech signal; X(k-1) is the speech signal state at time (k-1); Y(k) is the speech measurement at time k; F is the state-transition matrix built from the linear prediction coefficients, whose last row $[a_p(k)\ \cdots\ a_1(k)]$ is called the AR parameters; C = [0 0 … 0 1] is the measurement matrix; p(k) is the state noise at time k, which is Gaussian; n(k) is the measurement noise at time k, which is Gaussian; and v(k) is the non-stationary noise at time k, which follows a Laplace distribution.
The statistics of the state noise p(k) and the measurement noise n(k) are:
$$E(p(k)) = q,\quad E(n(k)) = r$$
$$E(p(k)\,p(j)^T) = Q\,\delta_{kj},\quad E(n(k)\,n(j)^T) = R\,\delta_{kj} \tag{8}$$
where q and r are the means of p(k) and n(k), Q and R are their covariances, and $\delta_{kj}$ is the Kronecker delta. The speech enhancement problem is to estimate the optimal speech signal X(k) given the measured speech signal Y(k).
2) Framing and windowing
Speech is short-time stationary (it can be regarded as approximately unchanged over 10 to 30 ms), so the signal can be divided into short segments for processing, which is called framing. Framing is implemented by weighting the signal with a movable window of finite length. There are generally about 33 to 100 frames per second. Frames are generally taken by overlapping segmentation; the overlapping part of two consecutive frames is called the frame shift, and the ratio of frame shift to frame length is generally 0 to 0.5. In the present invention the frame length is 25 ms and the frame shift is 10 ms.
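A framing and windowing routine with these values might look like the sketch below. Two points are assumptions rather than statements from the patent: the 10 ms frame shift is interpreted as the hop between the starts of consecutive frames, and a Hamming window is used because the text does not name a window type. The function name `frame_signal` is hypothetical.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=25.0, shift_ms=10.0):
    """Split x into overlapping frames and apply a window to each frame.

    fs is the sampling rate in Hz; returns an array of shape (n_frames, frame_len).
    """
    x = np.asarray(x, dtype=float)
    frame_len = int(round(fs * frame_ms / 1000.0))   # samples per frame
    hop = int(round(fs * shift_ms / 1000.0))         # samples between frame starts
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)                   # window type is an assumption
    # Row i holds the sample indices of frame i
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * window
```

At a sampling rate of 16 kHz (an illustrative value), this gives 400-sample frames every 160 samples, so one second of audio yields 98 complete frames.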
3) System initialization
3.1) Initialize the improved Kalman filter parameters
Initialize the speech signal state estimate X(0|0) and the covariance matrix P(0|0), ensuring that the covariance matrix is positive definite.
3.2) Initialize the AR parameters
Initialize the AR parameter state estimate θ(0|0). The order of the AR parameters is 13 in the present invention (set empirically).
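One simple way to satisfy these initialization requirements is sketched below. The zero initial estimates and the scaled identity covariance are assumptions chosen for illustration; the text only requires that P(0|0) be positive definite and gives the order 13. The Cholesky factorization is used as a positive-definiteness check.

```python
import numpy as np

def init_filters(p=13, p0=1.0):
    """Return X(0|0), P(0|0) and theta(0|0) for AR order p."""
    x0 = np.zeros(p)            # initial speech state estimate X(0|0) (assumed zero)
    P0 = p0 * np.eye(p)         # initial covariance P(0|0), scaled identity (assumed)
    theta0 = np.zeros(p)        # initial AR parameter estimate theta(0|0) (assumed zero)
    np.linalg.cholesky(P0)      # raises LinAlgError if P0 is not positive definite
    return x0, P0, theta0
```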
4) Estimate the AR parameters
The AR parameters are the last row $[a_p(k)\ \cdots\ a_1(k)]$ of the state-transition matrix F in equation (3). They describe the speech production process, and their accuracy directly affects the enhancement result. In practice, AR parameter estimation is strongly affected by the speech signal itself and by the various noises. The present invention therefore jointly considers the speech signal state estimate X(k-1), the state noise q(k), the measurement noise n(k), the non-stationary noise v(k) and so on when estimating the AR parameters, and establishes a new state-space model for AR parameter estimation to achieve online robust estimation of the AR parameters. This is one core point of the invention. As shown in Fig. 3, the real-time estimation of the AR parameters proceeds as follows:
4.1) Establish the estimation model for the AR parameters
The AR parameter model under mixed Gaussian and non-stationary noise is:
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\,\theta(k) + r(k) + v(k) \tag{9}$$
where $\theta(k) = [a_p(k)\ \cdots\ a_1(k)]^T$ is the AR parameter state at time k; q(k) is the state noise at time k, Gaussian with covariance matrix Q(k); r(k) is the measurement noise at time k, Gaussian with covariance matrix R(k); v(k) is the non-stationary noise at time k, Laplace-distributed and sparse, with covariance matrix W(k); $A = X(k-1)^T = [s(k-p)\ \cdots\ s(k-1)]$ is the measurement matrix; and Y(k) is the speech measurement at time k. The statistics of the state noise q(k) and the measurement noise r(k) are:
$$E(q(k)) = d,\quad E(r(k)) = l$$
$$E(q(k)\,q(j)^T) = D\,\delta_{kj},\quad E(r(k)\,r(j)^T) = L\,\delta_{kj} \tag{10}$$
where d and l are the means of q(k) and r(k), D and L are their covariances, and $\delta_{kj}$ is the Kronecker delta.
4.2) Reformulate the traditional Kalman filtering problem from the convex optimization viewpoint
To make the sparse noise easy to estimate, the Kalman filtering problem is first reformulated as a convex optimization problem. The state-space model of traditional Kalman filtering (without the non-stationary noise v(k)) is:
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\,\theta(k) + r(k) \tag{11}$$
According to Bayes' rule, the AR parameter estimation problem can be expressed as estimating the optimal AR parameter sequence θ(k) given the measurement data Y(k), namely:
[Equation (12) is not reproduced in the source text.]
According to maximum likelihood estimation theory, the likelihood functions of p(Y(k) | θ(k)) and p(θ(k)) are established:
[Equations (13) and (14) are not reproduced in the source text.]
where Ψ is the covariance matrix of the conditional probability p(θ(k) | Y(k)) given a known quantity whose symbol is not reproduced in the source text, $\Psi(k) = P_\theta(k|k) + D(k)$, and $P_\theta(k|k)$ is the updated covariance. When the likelihood functions $L_1(Y(k), \theta(k))$ and $L_2(\theta(k))$ reach their maxima, the conditional probability p(θ(k) | Y(k)) attains its optimal estimate. Inspecting equations (13) and (14) shows that maximizing $L_1(Y(k), \theta(k))$ and $L_2(\theta(k))$ is equivalent to minimizing the exponents of the exponential terms in the likelihood functions (the two expressions are not reproduced in the source text), which gives the following optimization problem:
[The objective of equation (15) is not reproduced in the source text.]
$$\text{subject to}\quad Y(k) = A\,\theta(k) + r(k) \tag{15}$$
where θ(k) and r(k) are the optimization variables and $\Psi(k) = P_\theta(k|k) + D(k)$ is the covariance matrix of the Gaussian noise. The solution for θ(k) is its estimate, and r(k) is the estimate of the Gaussian noise. $P_\theta(k|k)$ is the updated covariance matrix:
$$P_\theta(k|k) = (I - K_\theta(k)\,A(k))\,P_\theta(k|k-1) \tag{16}$$
$P_\theta(k|k-1)$ is the predicted covariance matrix:
$$P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1) \tag{17}$$
$K_\theta(k)$ is the Kalman gain:
$$K_\theta(k) = P_\theta(k|k-1)\,A^T\,(A\,P_\theta(k|k-1)\,A^T + L(k-1))^{-1} \tag{18}$$
4.3) Construct the optimization problem for estimating the non-stationary noise from the convex optimization viewpoint
Non-stationary noise follows a Laplace distribution and is sparse, and the core idea of its estimation is to exploit this sparsity. After step 4.2) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the non-stationary noise v(k) can be added to the optimization to estimate the sparse noise. The new optimization problem is:
[The objective of equation (19) is not reproduced in the source text.]
$$\text{subject to}\quad Y(k) = A\,\theta(k) + r(k) + v(k) \tag{19}$$
where v(k) is the sparse noise. Solving this optimization problem yields the optimal estimate θ(k) of the AR parameters (the accompanying note is not reproduced in the source text). The optimization problem in equation (19) is convex and can be solved with the interior point method, which is mature in engineering practice.
5) Estimate the speech signal state sequence
During speech acquisition, non-stationary noise strongly affects speech quality. To improve speech quality, the enhancement algorithm must cope with Gaussian noise and non-stationary noise at the same time. Non-stationary noise generally follows a Laplace distribution and is sparse, and its estimation mainly exploits this sparsity. To make it easy to introduce a sparsity constraint on the noise, the traditional Kalman filtering problem is first reformulated as a convex optimization problem; a sparsity constraint on the sparse noise is then added to the newly constructed optimization, which completes the speech enhancement task. This is the other core point of the invention.
5.1) traditional Kalman filtering problem is reconstructed from convex optimization angle
In order to easily estimate sparse noise, needs to reconstruct Kalman filtering from the angle of convex optimization and ask
Topic.The state-space model of traditional Kalman filtering is as follows:
X (k)=FX (k-1)+p (k) (20)
Y (k)=CX (k)+n (k) (21)
According to bayesian principle, Kalman filtering problem can be expressed as estimating under the premise of metric data Y (k) is known
Count optimal voice status sequence X (k), it may be assumed that
According to maximal possibility estimation theory, establish p (Y (k) | X (k)) and p (likelihood function of X (k):
Wherein, Θ beThe covariance square of conditional probability p in known situation (X (k) | Y (k-1))
Battle array Θ=FP (k-1 | k-1) FT+ Q (k-1) (wherein P (k-1 | k-1) is covariance updated value).As likelihood function condition L1(Y
(k), X (k)) and L2When (X (k)) obtains maximum, and conditional probability p (X (k) | Y (k)) obtain optimal estimation value.Observation type (23)
With formula (24) it can be found that maximizing likelihood function condition L1(Y (k), X (k)) and L2(X (k)) is equivalent to minimum likelihood function
The exponential part of middle power exponentWith
Therefore available following optimization form:
Subjiect to Y (k)=CX (k)+n (k) (25)
Wherein, X (k) and n (k) is variable, and Θ is the covariance matrix of Gaussian noise.The estimated value of X (k) isN (k) is exactly the estimation to Gaussian noise.
$P(k|k)$ is the covariance update matrix:
$$P(k|k) = (I - K(k)C(k))\,P(k|k-1) \tag{26}$$
$P(k|k-1)$ is the covariance prediction matrix:
$$P(k|k-1) = F(k-1)\,P(k-1|k-1)\,F(k-1)^T + Q(k-1) \tag{27}$$
$K(k)$ is the gain matrix:
$$K(k) = P(k|k-1)\,C^T \big(C\,P(k|k-1)\,C^T + R(k-1)\big)^{-1} \tag{28}$$
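The recursion in formulas (26) to (28) can be sketched in plain Python for the scalar-measurement case used here, where C = [0 ... 0 1]. The names `kalman_step`, `matmul` and `transpose` are illustrative, and the state prediction and state update lines are the standard Kalman forms, which this passage does not spell out, so they should be read as assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def kalman_step(x_prev, p_prev, y, f, q, r):
    """One predict/update cycle for X(k) = F X(k-1) + p(k), Y(k) = C X(k) + n(k).

    Assumes C = [0 ... 0 1], a scalar measurement y and a scalar
    measurement-noise variance r.
    """
    n = len(x_prev)
    # Standard state prediction F X(k-1|k-1) (assumed, not stated in the text)
    x_pred = [sum(f[i][j] * x_prev[j] for j in range(n)) for i in range(n)]
    # Formula (27): P(k|k-1) = F P(k-1|k-1) F^T + Q
    fpft = matmul(matmul(f, p_prev), transpose(f))
    p_pred = [[fpft[i][j] + q[i][j] for j in range(n)] for i in range(n)]
    # With C = [0 ... 0 1], C P C^T is the bottom-right entry of P
    s = p_pred[n - 1][n - 1] + r
    # Formula (28): K = P C^T (C P C^T + R)^-1, where P C^T is the last column
    gain = [p_pred[i][n - 1] / s for i in range(n)]
    innovation = y - x_pred[n - 1]
    x_new = [x_pred[i] + gain[i] * innovation for i in range(n)]
    # Formula (26): P(k|k) = (I - K C) P(k|k-1)
    p_new = [[p_pred[i][j] - gain[i] * p_pred[n - 1][j] for j in range(n)]
             for i in range(n)]
    return x_new, p_new
```

Because the measurement is scalar, the matrix inverse in formula (28) reduces to a division, which keeps the per-sample cost low for online use.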
5.2) Construct the estimation problem for the sparse noise from the convex optimization perspective
The core idea of sparse noise estimation is to exploit the sparsity of the noise. After step 5.1) has converted the traditional Kalman filtering problem into a convex optimization problem, the sparsity constraint on the sparse noise $v(k)$ can be added to the optimization to complete the estimation of the sparse noise. The new optimization form adds the sparsity constraint on $v(k)$ to the problem of formula (25), and its measurement constraint becomes
$$Y(k) = CX(k) + n(k) + v(k) \tag{29}$$
where $v(k)$ is the sparse noise. Solving this optimization problem yields the optimal estimate of the speech signal state $X(k)$ (note: this corresponds to the optimal state estimate in traditional Kalman filtering). The optimization problem expressed by formula (29) is a convex optimization problem and can be solved with the interior point method, which is mature in engineering practice.
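A minimal sketch of this robust update follows, assuming the objective takes the form n²/R + (X − X_pred)ᵀ Θ⁻¹ (X − X_pred) + λ|v| subject to formula (29), with a scalar measurement and C = [0 ... 0 1]. The weight `lam` (λ), the ℓ1 penalty and the function names are assumptions introduced for illustration. For this scalar case the problem has a closed-form solution by soft thresholding of the innovation, which is used here in place of the interior point method named in the text.

```python
def soft_threshold(e, t):
    """Shrink e toward zero by t; returns 0.0 when |e| <= t."""
    if e > t:
        return e - t
    if e < -t:
        return e + t
    return 0.0


def robust_update(x_pred, p_pred, y, r, lam):
    """Solve the assumed sparse-noise problem for one scalar measurement.

    x_pred : predicted state F X(k-1|k-1)
    p_pred : its covariance (Theta in the text)
    y      : measurement Y(k)
    r      : variance of the Gaussian measurement noise n(k)
    lam    : assumed weight of the l1 penalty on v(k)
    Returns (state estimate, Gaussian noise estimate, sparse noise estimate).
    """
    n = len(x_pred)
    s = p_pred[n - 1][n - 1] + r          # innovation variance C Theta C^T + R
    e = y - x_pred[n - 1]                 # innovation Y(k) - C X_pred
    v = soft_threshold(e, lam * s / 2.0)  # sparse part of the innovation
    gain = [p_pred[i][n - 1] / s for i in range(n)]
    x_new = [x_pred[i] + gain[i] * (e - v) for i in range(n)]
    noise = (e - v) * r / s               # n(k) = Y(k) - v(k) - C X(k)
    return x_new, noise, v
```

Small innovations leave `v` at zero and the update reduces to the ordinary Kalman update, while a large innovation is mostly assigned to `v`, so an impulsive disturbance moves the state estimate only slightly.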
5.3) After the enhancement of the speech signal at time $k$ is completed, the enhancement result is returned to step 4) and used to update the AR parameter $\theta(k+1)$ at time $k+1$. Speech enhancement at time $k+1$ then continues by estimating $X(k+1)$, until all speech signals have been processed.
As shown in Figs. 4a and 4b, the method proposed by the invention filters out Gaussian noise and nonstationary noise relatively accurately and enhances the original speech signal.
With the present invention, white noise and nonstationary noise can be accurately estimated and filtered out, achieving speech enhancement under mixed white and nonstationary noise while providing a cleaner estimate of the speech signal, which offers front-end support for improving speech recognition accuracy.
Because the invention establishes two robust Kalman filter models and mathematically models the generation process of the speech signal, both the temporal characteristics and the time-varying characteristics of speech are specifically taken into account. AR parameter estimation is updated iteratively, dynamically and in real time, which satisfies the time-varying nature of the parameters, while the state estimation estimates the speech signal frame by frame and thereby exploits the short-term stationarity of speech. As a result, the filtering effect is better than that of traditional Kalman filtering, and the method is worth popularizing.
The embodiment described above is only a preferred embodiment of the invention and does not limit its scope of implementation; all changes made according to the shape and principle of the invention shall therefore fall within the protection scope of the invention.
Claims (1)
1. An online speech enhancement method suitable for nonstationary noise environments, characterized by comprising the following steps:
1) Establish the system model under a nonstationary noise environment
1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise coexist
The generation of a speech signal is an autoregressive process, namely the output of an all-pole linear system excited by white noise; that is, the current output equals the sum of the excitation signal at the current time and a weighted sum of the outputs at the past $p$ times; this is an autoregressive (AR) model, expressed as follows:
$$s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1}$$
where $u(k)$ is the white Gaussian noise excitation at time $k$; $s(k-i)$ is the speech signal at time $(k-i)$; $s(k)$ is the speech signal at time $k$; $a_i$ is the $i$-th linear prediction coefficient, also called an AR model parameter; $p$ is the order of the AR model;
A speech signal model that matches the actual measurement process is established, and the speech signal measurement process is described as follows:
$$Y(k) = s(k) + n(k) + v(k) \tag{2}$$
where $Y(k)$ is the speech signal measurement sequence at time $k$; $s(k)$ is the speech signal at time $k$; $n(k)$ is the white Gaussian noise at time $k$; $v(k)$ is the nonstationary noise at time $k$, which obeys the Laplacian distribution and is sparse;
1.2) Establish the speech signal state-space model
Formulas (1) and (2) are converted into a state-space model, described as follows:
$$X(k) = FX(k-1) + p(k) \tag{3}$$
$$Y(k) = CX(k) + n(k) + v(k) \tag{4}$$
where
$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5}$$
$$C = [0\ \ 0\ \cdots\ 0\ \ 1] \tag{6}$$
$$X(k) = [s(k-p+1)\ \cdots\ s(k)]^T \tag{7}$$
In the speech signal state equation (3) and the speech signal measurement equation (4), $X(k)$ is the speech signal state estimation sequence at time $k$, i.e. the optimal state estimate of the speech signal; $X(k-1)$ is the speech signal state estimation sequence at time $(k-1)$; $Y(k)$ is the speech signal measurement sequence at time $k$; $F$ is the state-transition matrix formed from the linear prediction coefficients, whose last row $[a_p(k)\ \cdots\ a_1(k)]$ is called the AR parameter; $C = [0\ \ 0\ \cdots\ 0\ \ 1]$ is the measurement matrix; $p(k)$ is the state noise at time $k$ and obeys a Gaussian distribution; $n(k)$ is the measurement noise at time $k$ and obeys a Gaussian distribution; $v(k)$ is the nonstationary noise at time $k$ and obeys the Laplacian distribution;
The statistical properties of the state noise $p(k)$ and the measurement noise $n(k)$ of the speech signal are as follows:
$$E(p(k)) = q, \quad E(n(k)) = r$$
$$E(p(k)p(j)^T) = Q\delta_{kj}, \quad E(n(k)n(j)^T) = R\delta_{kj} \tag{8}$$
where $q$ and $r$ are the means of the noises $p(k)$ and $n(k)$ respectively; $Q$ and $R$ are the covariances of the noises $p(k)$ and $n(k)$ respectively; $\delta_{kj}$ is the Kronecker delta function; the speech enhancement problem is to estimate the optimal speech signal $X(k)$ given the measured speech signal $Y(k)$;
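The model of step 1) can be illustrated with a short simulation. The sketch below builds the companion-form matrix F from the AR coefficients and generates a clean signal and its measurement according to formula (2). All function and parameter names are hypothetical, and modelling v(k) as occasional Laplacian-distributed spikes (zero otherwise) is an assumption made here to obtain a visibly sparse disturbance.

```python
import random


def companion_matrix(ar):
    """Build the state-transition matrix F from ar = [a_1, ..., a_p]."""
    p = len(ar)
    # Upper rows shift the state: row i has a 1 in column i + 1
    f = [[1.0 if j == i + 1 else 0.0 for j in range(p)] for i in range(p - 1)]
    # Last row holds the AR parameters in the order [a_p ... a_1]
    f.append([float(a) for a in reversed(ar)])
    return f


def simulate(ar, n_samples, sigma_u=1.0, sigma_n=0.1,
             spike_prob=0.02, spike_scale=2.0, seed=0):
    """Return (clean, noisy) sequences following Y(k) = s(k) + n(k) + v(k)."""
    rng = random.Random(seed)
    p = len(ar)
    f = companion_matrix(ar)
    x = [0.0] * p                       # X(k) = [s(k-p+1) ... s(k)]^T
    clean, noisy = [], []
    for _ in range(n_samples):
        x = [sum(f[i][j] * x[j] for j in range(p)) for i in range(p)]
        x[-1] += rng.gauss(0.0, sigma_u)   # excitation u(k) enters the last state
        s = x[-1]
        n = rng.gauss(0.0, sigma_n)        # Gaussian measurement noise n(k)
        v = 0.0                            # sparse noise v(k), zero most of the time
        if rng.random() < spike_prob:
            # Laplacian sample: random sign times an exponential magnitude
            v = rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / spike_scale)
        clean.append(s)
        noisy.append(s + n + v)
    return clean, noisy
```

For example, `simulate([1.5, -0.7], 1000)` produces a stable second-order AR signal with its noisy measurement, which is convenient for exercising the estimation steps that follow.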
2) Framing and windowing
Speech signals are short-term stationary: the signal is regarded as unchanged within 10 to 30 ms, so it can be divided into short segments for processing, which is called framing; framing is implemented by weighting the speech signal with a movable window of finite length; there are usually 33 to 100 frames per second; framing uses overlapping segmentation, the overlapping part of one frame and the next is called the frame shift, and the ratio of the frame shift to the frame length is 0 to 0.5;
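Framing with overlap can be sketched as follows. The Hamming window and the names `frame_signal` and `overlap_ratio` are assumptions, since the text does not name a specific window; `overlap_ratio` follows the definition above, that is, the overlapping part of adjacent frames divided by the frame length.

```python
import math


def frame_signal(signal, frame_len, overlap_ratio=0.5):
    """Split a signal into overlapping, Hamming-windowed frames.

    frame_len must be at least 2; overlap_ratio must lie in [0, 0.5].
    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    if not 0.0 <= overlap_ratio <= 0.5:
        raise ValueError("overlap_ratio must be between 0 and 0.5")
    overlap = int(frame_len * overlap_ratio)
    hop = frame_len - overlap            # distance between frame starts
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames
```

As an illustration, at an assumed sampling rate of 8 kHz a 20 ms frame is 160 samples, and an overlap ratio of 0.5 gives 100 frames per second, which is within the range stated above.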
3) System initialization
3.1) Initialize the parameters of the improved Kalman filter
Initialize the speech signal state estimation sequence $X(0|0)$ and the covariance matrix $P(0|0)$, ensuring that the covariance matrix is positive definite;
3.2) Initialize the AR parameters
Initialize the AR parameter state estimation sequence $\theta(0|0)$;
4) Estimate the AR parameters
The AR parameters are the last row $[a_p(k)\ \cdots\ a_1(k)]$ of the state-transition matrix $F$ in formula (3); they mainly describe the speech generation process, and their accuracy directly affects the speech enhancement result; for the estimation of the AR parameters, it is proposed to jointly consider the speech signal state estimation sequence $X(k-1)$, the state noise $q(k)$, the measurement noise $n(k)$ and the nonstationary noise $v(k)$, and to establish a new state-space model for AR parameter estimation that achieves online robust estimation of the AR parameters; the real-time estimation process for the AR parameters is as follows:
4.1) Establish the parameter estimation model for the AR parameters
The AR parameter model in an environment of mixed Gaussian noise and nonstationary noise is described as follows:
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\theta(k) + r(k) + v(k) \tag{9}$$
where $\theta(k) = [a_p(k)\ \cdots\ a_1(k)]^T$ is the AR parameter state sequence at time $k$; $q(k)$ is the state noise at time $k$, which obeys a Gaussian distribution with covariance matrix $Q(k)$; $r(k)$ is the measurement noise at time $k$, which obeys a Gaussian distribution with covariance matrix $R(k)$; $v(k)$ is the nonstationary noise at time $k$, which obeys the Laplacian distribution, is sparse, and has covariance matrix $W(k)$; $A = X(k-1)^T = [s(k-p)\ \cdots\ s(k-1)]$ is the measurement matrix; $Y(k)$ is the speech signal measurement sequence at time $k$; the statistical properties of the state noise $q(k)$ and the measurement noise $r(k)$ are as follows:
$$E(q(k)) = d, \quad E(r(k)) = l$$
$$E(q(k)q(j)^T) = D\delta_{kj}, \quad E(r(k)r(j)^T) = L\delta_{kj} \tag{10}$$
where $d$ and $l$ are the means of the noises $q(k)$ and $r(k)$ respectively; $D$ and $L$ are the covariances of the noises $q(k)$ and $r(k)$ respectively; $\delta_{kj}$ is the Kronecker delta function;
4.2) Reformulate the traditional Kalman filtering problem from the convex optimization perspective
To estimate the sparse noise conveniently, the Kalman filtering problem needs to be reformulated from the perspective of convex optimization; the state-space model of traditional Kalman filtering does not contain the nonstationary noise $v(k)$ and is as follows:
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\theta(k) + r(k) \tag{11}$$
According to the Bayesian principle, the AR parameter estimation problem is expressed as estimating the optimal AR parameter sequence $\theta(k)$ given the measurement data $Y(k)$; this is formula (12);
According to maximum likelihood estimation theory, the likelihood functions $L_1(Y(k), \theta(k))$ and $L_2(\theta(k))$ of $p(Y(k) \mid \theta(k))$ and $p(\theta(k))$ are established as formulas (13) and (14);
where $\Psi$ is the covariance matrix of the conditional probability $p(\theta(k) \mid Y(k))$ when the previous parameter estimate is known, $\Psi(k) = P_\theta(k|k) + D(k)$, and $P_\theta(k|k)$ is the updated covariance; when the likelihood functions $L_1(Y(k), \theta(k))$ and $L_2(\theta(k))$ attain their maxima, the conditional probability $p(Y(k) \mid \theta(k))$ yields the optimal estimate; inspecting formulas (13) and (14) shows that maximizing $L_1(Y(k), \theta(k))$ and $L_2(\theta(k))$ is equivalent to minimizing the exponents of the exponential terms in the two likelihood functions; the following optimization form is therefore obtained: minimize the sum of these two exponent terms subject to
$$Y(k) = A\theta(k) + r(k) \tag{15}$$
where $\theta(k)$ and $r(k)$ are the optimization variables and $\Psi(k) = P_\theta(k|k) + D(k)$ is the covariance matrix of the Gaussian noise; the solution for $\theta(k)$ is the AR parameter estimate, and the solution for $r(k)$ is the estimate of the Gaussian noise; $P_\theta(k|k)$ is the covariance update matrix:
$$P_\theta(k|k) = (I - K_\theta(k)A(k))\,P_\theta(k|k-1) \tag{16}$$
$P_\theta(k|k-1)$ is the covariance prediction matrix:
$$P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1) \tag{17}$$
$K_\theta(k)$ is the gain matrix:
$$K_\theta(k) = P_\theta(k|k-1)\,A^T \big(A\,P_\theta(k|k-1)\,A^T + L(k-1)\big)^{-1} \tag{18}$$
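The recursion of formulas (16) to (18) for the random-walk parameter model of formula (11) can be sketched as follows. The names `ar_param_step` and `demo_estimate` are hypothetical, D is taken as d·I with a scalar d, and the parameter update line is the standard Kalman form, assumed here because the passage does not state it. The demonstration uses a clean simulated AR(2) signal, so the sparse noise v(k) plays no role in it.

```python
import random


def ar_param_step(theta_prev, p_prev, a_row, y, d, l):
    """One update of theta(k) = theta(k-1) + q(k), Y(k) = A theta(k) + r(k)."""
    p = len(theta_prev)
    # Formula (17): P(k|k-1) = P(k-1|k-1) + D, with D = d * I (assumption)
    p_pred = [[p_prev[i][j] + (d if i == j else 0.0) for j in range(p)]
              for i in range(p)]
    # P A^T (a column) and the scalar A P A^T + L
    pa = [sum(p_pred[i][j] * a_row[j] for j in range(p)) for i in range(p)]
    s = sum(a_row[i] * pa[i] for i in range(p)) + l
    gain = [pa[i] / s for i in range(p)]                  # formula (18)
    innovation = y - sum(a_row[i] * theta_prev[i] for i in range(p))
    theta = [theta_prev[i] + gain[i] * innovation for i in range(p)]
    # Formula (16): (I - K A) P(k|k-1); A P equals pa because P is symmetric
    p_new = [[p_pred[i][j] - gain[i] * pa[j] for j in range(p)]
             for i in range(p)]
    return theta, p_new


def demo_estimate(a1=1.5, a2=-0.7, n_samples=2000, seed=0):
    """Estimate [a_2, a_1] from a simulated AR(2) signal."""
    rng = random.Random(seed)
    s = [0.0, 0.0]
    for _ in range(n_samples):
        s.append(a1 * s[-1] + a2 * s[-2] + rng.gauss(0.0, 1.0))
    theta = [0.0, 0.0]
    cov = [[10.0, 0.0], [0.0, 10.0]]      # positive definite initial covariance
    for k in range(2, len(s)):
        a_row = [s[k - 2], s[k - 1]]      # A = [s(k-p) ... s(k-1)]
        theta, cov = ar_param_step(theta, cov, a_row, s[k], d=1e-8, l=1.0)
    return theta
```

Calling `demo_estimate()` returns a two-element list ordered as [a_2, a_1], matching the ordering of θ(k) in the text; with the default settings it should lie close to [-0.7, 1.5].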
4.3) Construct the optimization problem for estimating the nonstationary noise from the convex optimization perspective
Nonstationary noise obeys the Laplacian distribution and is sparse, and the core idea of nonstationary noise estimation is to exploit this sparsity; after step 4.2) has converted the traditional Kalman filtering problem into a convex optimization problem, the sparsity constraint on the nonstationary noise $v(k)$ can be added to the optimization to complete the estimation of the sparse noise; the new optimization form, formula (19), adds this sparsity constraint to the problem of formula (15);
where $v(k)$ is the sparse noise; solving the above optimization problem yields the optimal estimate of the AR parameters $\theta(k)$; the optimization problem expressed by formula (19) is a convex optimization problem and can be solved with the interior point method used in engineering;
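One form of the optimization problems of steps 4.2) and 4.3) that is consistent with the definitions above is sketched below: the first problem corresponds to formula (15), the second to the sparse-noise extension. The symbol $\hat{\theta}(k-1)$ for the previous parameter estimate, the weight $\lambda$ and the use of the $\ell_1$ norm as the sparsity term are assumptions introduced here for illustration, not notation confirmed by the text.

```latex
\begin{aligned}
&\min_{\theta(k),\, r(k)} \; r(k)^T L^{-1} r(k)
   + \big(\theta(k) - \hat{\theta}(k-1)\big)^T \Psi^{-1} \big(\theta(k) - \hat{\theta}(k-1)\big) \\
&\quad \text{subject to} \;\; Y(k) = A\theta(k) + r(k) \\[6pt]
&\min_{\theta(k),\, r(k),\, v(k)} \; r(k)^T L^{-1} r(k)
   + \big(\theta(k) - \hat{\theta}(k-1)\big)^T \Psi^{-1} \big(\theta(k) - \hat{\theta}(k-1)\big)
   + \lambda \,\lVert v(k) \rVert_1 \\
&\quad \text{subject to} \;\; Y(k) = A\theta(k) + r(k) + v(k)
\end{aligned}
```

Both problems are convex under these assumptions, since each objective is a sum of convex quadratic terms and a norm and each constraint is linear.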
5) Estimate the speech signal state sequence
5.1) Reformulate the traditional Kalman filtering problem from the convex optimization perspective
To estimate the sparse noise conveniently, the Kalman filtering problem needs to be reformulated from the perspective of convex optimization; the state-space model of traditional Kalman filtering is as follows:
$$X(k) = FX(k-1) + p(k) \tag{20}$$
$$Y(k) = CX(k) + n(k) \tag{21}$$
According to the Bayesian principle, the Kalman filtering problem is expressed as estimating the optimal speech state sequence $X(k)$ given the measurement data $Y(k)$; this is formula (22);
According to maximum likelihood estimation theory, the likelihood functions $L_1(Y(k), X(k))$ and $L_2(X(k))$ of $p(Y(k) \mid X(k))$ and $p(X(k))$ are established as formulas (23) and (24);
where $\Theta$ is the covariance matrix of the conditional probability $p(X(k) \mid Y(k-1))$ when the previous state estimate is known, $\Theta = F P(k-1|k-1) F^T + Q(k-1)$, and $P(k-1|k-1)$ is the updated covariance; when the likelihood functions $L_1(Y(k), X(k))$ and $L_2(X(k))$ attain their maxima, the conditional probability $p(X(k) \mid Y(k))$ yields the optimal estimate; inspecting formulas (23) and (24) shows that maximizing $L_1(Y(k), X(k))$ and $L_2(X(k))$ is equivalent to minimizing the exponents of the exponential terms in the two likelihood functions; the following optimization form is therefore obtained: minimize the sum of these two exponent terms subject to
$$Y(k) = CX(k) + n(k) \tag{25}$$
where $X(k)$ and $n(k)$ are the optimization variables and $\Theta$ is the covariance matrix of the Gaussian noise; the solution for $X(k)$ is the state estimate, and the solution for $n(k)$ is the estimate of the Gaussian noise;
$P(k|k)$ is the covariance update matrix:
$$P(k|k) = (I - K(k)C(k))\,P(k|k-1) \tag{26}$$
$P(k|k-1)$ is the covariance prediction matrix:
$$P(k|k-1) = F(k-1)\,P(k-1|k-1)\,F(k-1)^T + Q(k-1) \tag{27}$$
$K(k)$ is the gain matrix:
$$K(k) = P(k|k-1)\,C^T \big(C\,P(k|k-1)\,C^T + R(k-1)\big)^{-1} \tag{28}$$
5.2) Construct the estimation problem for the sparse noise from the convex optimization perspective
The core idea of sparse noise estimation is to exploit the sparsity of the noise; after step 5.1) has converted the traditional Kalman filtering problem into a convex optimization problem, the sparsity constraint on the sparse noise $v(k)$ can be added to the optimization to complete the estimation of the sparse noise; the new optimization form adds the sparsity constraint on $v(k)$ to the problem of formula (25), and its measurement constraint becomes
$$Y(k) = CX(k) + n(k) + v(k) \tag{29}$$
where $v(k)$ is the sparse noise; solving the above optimization problem yields the optimal estimate of the speech signal state $X(k)$, which corresponds to the optimal state estimate in traditional Kalman filtering; the optimization problem expressed by formula (29) is a convex optimization problem and can be solved with the interior point method used in engineering;
5.3) After the enhancement of the speech signal at time $k$ is completed, the enhancement result is returned to step 4) and used to update the AR parameter $\theta(k+1)$ at time $k+1$; speech enhancement at time $k+1$ then continues by estimating $X(k+1)$, until all speech signals have been processed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610843483.0A CN106340304B (en) | 2016-09-23 | 2016-09-23 | A kind of online sound enhancement method under the environment suitable for nonstationary noise |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106340304A CN106340304A (en) | 2017-01-18 |
CN106340304B true CN106340304B (en) | 2019-09-06 |
Family
ID=57840174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610843483.0A Expired - Fee Related CN106340304B (en) | 2016-09-23 | 2016-09-23 | A kind of online sound enhancement method under the environment suitable for nonstationary noise |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106340304B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110248212B (en) * | 2019-05-27 | 2020-06-02 | 上海交通大学 | Multi-user 360-degree video stream server-side code rate self-adaptive transmission method and system |
CN110648680A (en) * | 2019-09-23 | 2020-01-03 | 腾讯科技(深圳)有限公司 | Voice data processing method and device, electronic equipment and readable storage medium |
CN112557925B (en) * | 2020-11-11 | 2023-05-05 | 国联汽车动力电池研究院有限责任公司 | Lithium ion battery SOC estimation method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102890935A (en) * | 2012-10-22 | 2013-01-23 | 北京工业大学 | Robust speech enhancement method based on fast Kalman filtering |
CN103323815A (en) * | 2013-03-05 | 2013-09-25 | 上海交通大学 | Underwater acoustic locating method based on equivalent sound velocity |
CN103903630A (en) * | 2014-03-18 | 2014-07-02 | 北京捷通华声语音技术有限公司 | Method and device used for eliminating sparse noise |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010091077A1 (en) * | 2009-02-03 | 2010-08-12 | University Of Ottawa | Method and system for a multi-microphone noise reduction |
Non-Patent Citations (3)
Title |
---|
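A Kalman filter with online parameter adjustment and its application; 吴飞 (Wu Fei); 《计算机工程与科学》 (Computer Engineering & Science); 20120615 (No. 6); full text |
An improved Kalman filtering algorithm based on convex optimization techniques; 冯宝 (Feng Bao); 《自动化与信息工程》 (Automation & Information Engineering); 20141015; Vol. 35 (No. 5); full text |
Research on the robust Kalman algorithm and its applications; 吴飞 (Wu Fei); 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology); 20130115 (No. 1); full text |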
Also Published As
Publication number | Publication date |
---|---|
CN106340304A (en) | 2017-01-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20190906 |
|
CF01 | Termination of patent right due to non-payment of annual fee |