CN106340304B - Online speech enhancement method suitable for non-stationary noise environments - Google Patents


Info

Publication number
CN106340304B
Authority
CN
China
Prior art keywords
noise
estimation
voice signal
parameter
moment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610843483.0A
Other languages
Chinese (zh)
Other versions
CN106340304A (en)
Inventor
冯宝
张绍荣
孙山林
郑伟
张国宁
武博
韦周耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Aerospace Technology
Original Assignee
Guilin University of Aerospace Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Aerospace Technology
Priority to CN201610843483.0A
Publication of CN106340304A
Application granted
Publication of CN106340304B
Expired - Fee Related
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Abstract

The invention discloses an online speech enhancement method suitable for non-stationary noise environments, comprising the steps of: 1) establishing a system model for the non-stationary noise environment; 2) framing and windowing; 3) system initialization; 4) estimating the AR parameters; 5) estimating the speech signal state sequence. In existing speech models the AR parameters cannot be updated in real time as the noise changes. To address this, the invention proposes a dual Kalman filtering framework in which two Kalman filters operate together. Speech state estimation and AR parameter estimation update each other, and the two steps alternate, so that parameter estimation can track changes in the noise. This improves the accuracy of the system model and, in turn, the speech enhancement performance. The traditional Kalman filter also cannot handle non-stationary noise. To address this, the invention combines convex optimization techniques into an improved Kalman filtering framework that can accurately estimate both Gaussian noise and non-stationary noise, improving the accuracy of speech enhancement.

Description

Online speech enhancement method suitable for non-stationary noise environments
Technical field
The present invention relates to the field of speech enhancement, and in particular to an online speech enhancement method suitable for non-stationary noise environments.
Background art
In the front-end processing of speech recognition, the speech signal is always corrupted, and sometimes swamped, by various kinds of noise. Because the interference is random, signal processing can only improve speech quality as far as possible. The main purpose of speech enhancement is to extract the clean original speech from noisy speech.
Common speech enhancement algorithms fall mainly into the following types:
1. Noise cancellation: the noise component is subtracted directly from the noisy speech, in either the time domain or the frequency domain. The defining feature of this method is that it needs the background noise as a reference signal, and the accuracy of that reference directly determines its performance.
2. Harmonic enhancement: voiced speech is clearly periodic. In the frequency domain this periodicity appears as a series of peaks at the fundamental frequency (pitch) and its harmonics, which carry most of the speech energy. A comb filter can exploit this by extracting the pitch and its harmonic components while suppressing other periodic noise and aperiodic broadband noise.
3. Enhancement based on a speech production model: speech production can be modeled as a linear time-varying filter, driven by different excitation sources for different types of speech. The all-pole model is the most widely used production model. A family of enhancement algorithms can be derived from it, such as time-varying-parameter Wiener filtering and Kalman filtering.
4. Enhancement based on the short-time spectrum: there are many algorithms of this type, such as spectral subtraction, Wiener filtering and minimum mean square error (MMSE) methods. Their advantages include a wide usable SNR range, simplicity, and suitability for real-time processing.
5. Enhancement based on wavelet decomposition: this approach developed alongside wavelet decomposition as a mathematical analysis tool, and it also incorporates some of the basic principles of spectral subtraction.
6. Enhancement based on auditory masking: this approach exploits the auditory properties of the human ear.
Kalman-filter-based speech enhancement belongs to the third type above. Traditional Kalman filtering for speech enhancement rests on two important assumptions: the process noise and the measurement noise both follow Gaussian distributions. In practice, traditional Kalman filtering is limited in two respects.
(1) The AR parameters must be estimated accurately. In real speech acquisition environments, however, the noise changes continually. The AR parameters of the speech model must therefore be estimated in real time, and that estimation must account for the various noises; otherwise speech enhancement performance degrades.
(2) The traditional Kalman filter considers only Gaussian noise, which does not match practical applications. During acquisition, speech can be corrupted by a kind of non-stationary noise that is sparse and follows a Laplacian distribution. This noise is uncommon, but it does occur and it strongly affects speech quality. If it is treated as Gaussian noise during enhancement, the quality of the enhanced speech degrades severely, which hinders subsequent recognition of the speech content.
In view of these problems, there is a strong need for an online speech enhancement technique that can handle, in real time, the case where Gaussian noise and non-stationary noise are present together.
Summary of the invention
Existing Kalman filtering methods cannot update the AR parameters of the speech model in real time, and they cannot handle non-stationary noise in the measurement process. The technical problem solved by the present invention is to overcome both limitations. Combining convex optimization techniques, the invention provides an online speech enhancement method suitable for non-stationary noise environments that can estimate the AR parameters and the non-stationary noise online.
To achieve the above object, the technical solution provided by the present invention is an online speech enhancement method suitable for non-stationary noise environments, comprising the following steps:
1) Establish the system model for the non-stationary noise environment
1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise coexist
Speech production is an autoregressive process in which white noise excites an all-pole linear system. The current output equals the excitation at the current instant plus a weighted sum of the outputs at the previous p instants. This autoregressive (AR) model is expressed as follows:
$$s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1}$$
where u(k) is the white Gaussian noise excitation at instant k; s(k−i) is the speech signal at instant k−i; s(k) is the speech signal at instant k; a_i is the i-th linear prediction coefficient, also called an AR model parameter; and p is the order of the AR model.
A speech signal model that matches the actual measurement process is then established. The measurement process is described as follows:
$$Y(k) = s(k) + n(k) + v(k) \tag{2}$$
where Y(k) is the measured speech sequence at instant k; s(k) is the speech signal at instant k; n(k) is white Gaussian noise at instant k; and v(k) is the non-stationary noise at instant k, which follows a Laplacian distribution and is sparse.
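The following Python/NumPy sketch draws a synthetic AR signal from (1) and corrupts it according to (2), with Gaussian noise n(k) and sparse Laplacian noise v(k). The function name and the noise levels are assumptions made for illustration. The way sparsity is imposed (Laplacian amplitudes switched on at a small random fraction of samples) is also an assumption; the source only states that v(k) is Laplacian and sparse.

```python
import numpy as np

def simulate_noisy_ar(a, n_samples, sigma_u=1.0, sigma_n=0.3,
                      spike_prob=0.05, laplace_scale=4.0, seed=0):
    """Draw s(k) from (1) and Y(k) = s(k) + n(k) + v(k) from (2).

    a[i - 1] holds a_i. The noise levels, and modelling the sparsity of v(k)
    as Laplacian amplitudes at a random fraction of samples, are assumptions.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    p = a.size
    s = np.zeros(n_samples + p)                      # p leading zeros = initial conditions
    u = rng.normal(0.0, sigma_u, n_samples + p)      # excitation u(k)
    for k in range(p, n_samples + p):
        s[k] = a @ s[k - p:k][::-1] + u[k]           # a_1 s(k-1) + ... + a_p s(k-p) + u(k)
    s = s[p:]
    n = rng.normal(0.0, sigma_n, n_samples)          # Gaussian measurement noise n(k)
    active = rng.random(n_samples) < spike_prob      # support of the sparse noise
    v = np.where(active, rng.laplace(0.0, laplace_scale, n_samples), 0.0)
    return s, n, v, s + n + v
```

For example, `simulate_noisy_ar([1.3, -0.6], 8000)` produces a stable second-order process. The coefficients are arbitrary and serve only to exercise the model.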
1.2) Establish the state-space model of the speech signal
Formulas (1) and (2) are rewritten as a state-space model:
$$X(k) = F X(k-1) + p(k) \tag{3}$$
$$Y(k) = C X(k) + n(k) + v(k) \tag{4}$$
where
$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5}$$
$$C = [0\ \ 0\ \ \cdots\ \ 0\ \ 1] \tag{6}$$
$$X(k) = [s(k-p+1)\ \ \cdots\ \ s(k)]^T \tag{7}$$
In the state equation (3) and the measurement equation (4):
- X(k) is the speech state sequence at instant k, whose optimal estimate is sought.
- X(k−1) is the speech state sequence at instant k−1.
- Y(k) is the measured speech sequence at instant k.
- F is the state-transition matrix formed by the linear prediction coefficients; its last row [a_p(k) … a_1(k)] is called the AR parameter vector.
- C = [0 0 … 0 1] is the measurement matrix.
- p(k) is the state noise at instant k and is Gaussian.
- n(k) is the measurement noise at instant k and is Gaussian.
- v(k) is the non-stationary noise at instant k and follows a Laplacian distribution.
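The companion structure of the transition matrix F and the measurement row C in the state-space model (3)–(4) can be checked numerically. The hypothetical Python/NumPy helper below builds F and C from a coefficient list [a_1, …, a_p], using the state ordering of (7). Applying the last row of F to X(k−1) reproduces the AR prediction of (1) without the excitation term.

```python
import numpy as np

def companion_matrices(a):
    """Return F and C for AR coefficients a = [a_1, ..., a_p].

    State ordering follows (7): X(k) = [s(k-p+1), ..., s(k)]^T, so the last
    row of F is [a_p, ..., a_1] and the rows above it shift the state by one.
    """
    a = np.asarray(a, dtype=float)
    p = a.size
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)   # s(k-p+1), ..., s(k-1) each move up one slot
    F[-1, :] = a[::-1]           # last row [a_p, ..., a_1]
    C = np.zeros((1, p))
    C[0, -1] = 1.0               # C = [0 ... 0 1] picks s(k) out of X(k)
    return F, C
```

Take a = [1.3, −0.6] and X(k−1) = [0.5, 2.0]^T. Then F X(k−1) = [2.0, 2.3]^T: the state shifts by one sample, and the new last entry is 1.3·2.0 − 0.6·0.5 = 2.3.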
The statistical properties of the state noise p(k) and the measurement noise n(k) are:
$$E[p(k)] = q, \qquad E[n(k)] = r$$
$$E[p(k)p(j)^T] = Q\,\delta_{kj}, \qquad E[n(k)n(j)^T] = R\,\delta_{kj} \tag{8}$$
where q and r are the means of p(k) and n(k), respectively; Q and R are their covariances; and δ_kj is the Kronecker delta. The speech enhancement problem is to estimate the optimal speech state X(k) given the measured speech signal Y(k).
2) Framing and windowing
Speech is short-time stationary: it can be regarded as approximately constant over 10–30 ms. This allows the signal to be divided into short segments for processing, which is called framing. Framing is implemented by weighting the signal with a movable window of finite length. There are typically 33–100 frames per second. Framing uses overlapping segments. The overlap between consecutive frames is called the frame shift, and the ratio of frame shift to frame length is 0–0.5.
3) System initialization
3.1) Initialize the parameters of the improved Kalman filter
Initialize the speech state estimate X(0|0) and the covariance matrix P(0|0), ensuring that the covariance matrix is positive definite.
3.2) Initialize the AR parameters
Initialize the AR parameter state estimate θ(0|0).
4) Estimate the AR parameters
The AR parameters are the last row [a_p(k) … a_1(k)] of the state-transition matrix F in formula (3). They describe the speech production process, and their accuracy directly affects the speech enhancement result. In AR parameter estimation, the invention jointly considers the speech state estimate X(k−1), the state noise q(k), the measurement noise r(k) and the non-stationary noise v(k). From these it builds a new state-space model for AR parameter estimation that yields robust online estimates. The real-time estimation of the AR parameters proceeds as follows:
4.1) Establish the parameter estimation model for the AR parameters
The AR parameter model for an environment with mixed Gaussian and non-stationary noise is:
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\,\theta(k) + r(k) + v(k) \tag{9}$$
where:
- θ(k) = [a_p(k) … a_1(k)]^T is the AR parameter state at instant k.
- q(k) is the state noise at instant k; it is Gaussian with covariance D(k).
- r(k) is the measurement noise at instant k; it is Gaussian with covariance L(k).
- v(k) is the non-stationary noise at instant k; it is Laplacian and sparse, with covariance W(k).
- A = X(k−1)^T = [s(k−p) … s(k−1)] is the measurement matrix.
- Y(k) is the measured speech sequence at instant k.
The statistical properties of the state noise q(k) and the measurement noise r(k) are:
$$E[q(k)] = d, \qquad E[r(k)] = l$$
$$E[q(k)q(j)^T] = D\,\delta_{kj}, \qquad E[r(k)r(j)^T] = L\,\delta_{kj} \tag{10}$$
where d and l are the means of q(k) and r(k), respectively; D and L are their covariances; and δ_kj is the Kronecker delta.
4.2) Reformulate the traditional Kalman filtering problem from a convex optimization perspective
To make the sparse noise easy to estimate, the Kalman filtering problem is first reformulated as a convex optimization problem. The state-space model of the traditional Kalman filter, which contains no non-stationary noise v(k), is:
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\,\theta(k) + r(k) \tag{11}$$
By Bayes' rule, AR parameter estimation amounts to estimating the optimal AR parameter sequence θ(k) given the measured data Y(k), that is:
$$\hat\theta(k) = \arg\max_{\theta(k)} p(\theta(k) \mid Y(k)) = \arg\max_{\theta(k)} p(Y(k) \mid \theta(k))\, p(\theta(k)) \tag{12}$$
Following maximum likelihood estimation theory, the likelihood functions for p(Y(k)|θ(k)) and p(θ(k)) are:
$$L_1(Y(k),\theta(k)) \propto \exp\left(-\tfrac{1}{2}\,(Y(k) - A\theta(k))^T L^{-1} (Y(k) - A\theta(k))\right) \tag{13}$$
$$L_2(\theta(k)) \propto \exp\left(-\tfrac{1}{2}\,(\theta(k) - \hat\theta(k-1))^T \Psi(k)^{-1} (\theta(k) - \hat\theta(k-1))\right) \tag{14}$$
Here Ψ is the covariance matrix of the conditional probability p(θ(k)|Y(k)) when θ̂(k−1) is known: Ψ(k) = Pθ(k|k) + D(k), where Pθ(k|k) is the updated covariance.

When L1(Y(k), θ(k)) and L2(θ(k)) reach their maximum, p(θ(k)|Y(k)) attains its optimal estimate. Inspecting (13) and (14) shows that maximizing L1 and L2 is equivalent to minimizing the quadratic forms in their exponents. These are r(k)^T L^{-1} r(k), with r(k) = Y(k) − Aθ(k), and (θ(k) − θ̂(k−1))^T Ψ(k)^{-1} (θ(k) − θ̂(k−1)). This gives the optimization problem:
$$\min_{\theta(k),\, r(k)}\ r(k)^T L^{-1} r(k) + (\theta(k) - \hat\theta(k-1))^T \Psi(k)^{-1} (\theta(k) - \hat\theta(k-1)) \quad \text{subject to} \quad Y(k) = A\theta(k) + r(k) \tag{15}$$
In (15), θ(k) and r(k) are the variables, and Ψ(k) = Pθ(k|k) + D(k) is the covariance matrix of the Gaussian noise. The minimizing θ(k) is the estimate θ̂(k), and the minimizing r(k) is the estimate of the Gaussian noise. Pθ(k|k) is the updated covariance matrix:
$$P_\theta(k|k) = (I - K_\theta(k) A(k))\, P_\theta(k|k-1) \tag{16}$$
Pθ(k|k−1) is the predicted covariance matrix:
$$P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1) \tag{17}$$
Kθ(k) is the Kalman gain:
$$K_\theta(k) = P_\theta(k|k-1) A^T \left(A P_\theta(k|k-1) A^T + L(k-1)\right)^{-1} \tag{18}$$
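Formulas (16)–(18), together with the usual mean update θ̂(k) = θ̂(k−1) + Kθ(k)(Y(k) − Aθ̂(k−1)), make up one step of a random-walk Kalman filter for θ(k). The Python/NumPy sketch below implements that step for a scalar measurement Y(k) with measurement row A = [s(k−p) … s(k−1)]. It uses Pθ(k|k−1) from (17) as the prior covariance. The function and argument names are hypothetical, and the noise covariances D and L are supplied by the caller.

```python
import numpy as np

def ar_kalman_step(theta, P, a_row, y, D, L):
    """One Kalman update of the AR parameters theta = [a_p, ..., a_1].

    theta, P : previous estimate theta(k-1|k-1) and covariance P_theta(k-1|k-1)
    a_row    : measurement row A = [s(k-p), ..., s(k-1)]
    y        : scalar measurement Y(k); D, L : covariances of q(k) and r(k)
    """
    A = np.asarray(a_row, dtype=float)
    P_pred = P + D                                 # (17) random-walk prediction
    S = A @ P_pred @ A + L                         # A P A^T + L, a scalar here
    K = P_pred @ A / S                             # (18)
    theta_new = theta + K * (y - A @ theta)        # mean update with the innovation
    P_new = P_pred - np.outer(K, A) @ P_pred       # (16): (I - K A) P(k|k-1)
    return theta_new, P_new
```

Feeding clean samples of a known AR(2) process through this step drives θ toward [a_2, a_1]. Note the reversed ordering, which follows θ(k) = [a_p(k) … a_1(k)]^T.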
4.3) Construct the optimization problem for estimating the non-stationary noise from a convex optimization perspective
The non-stationary noise follows a Laplacian distribution and is sparse. The key idea of non-stationary noise estimation is to exploit this sparsity. Once step 4.2) has turned the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the non-stationary noise v(k) can be added to the optimization to estimate the sparse noise. The new optimization problem is:
where v(k) is the sparse noise. Solving this optimization problem yields the optimal estimate θ̂(k) of the AR parameters. The optimization problem in formula (19) is convex and can be solved with the interior-point methods commonly used in engineering.
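The objective of the sparse problem in step 4.3) did not survive in this copy. One plausible form, assumed here rather than taken from the source, adds an ℓ1 penalty with an unspecified weight λ > 0 to the quadratic objective of (15):
$$\min_{\theta(k),\, r(k),\, v(k)}\ r(k)^T L^{-1} r(k) + (\theta(k) - \hat\theta(k-1))^T \Psi^{-1} (\theta(k) - \hat\theta(k-1)) + \lambda \lVert v(k) \rVert_1 \ \ \text{s.t.}\ \ Y(k) = A\theta(k) + r(k) + v(k)$$
For a scalar Y(k), eliminating θ(k) and r(k) leaves (e − v)²/S + λ|v|, where e is the innovation and S = AΨA^T + L. The minimizing v is therefore the innovation soft-thresholded at λS/2.

The Python/NumPy sketch below uses this closed form in place of the interior-point solver named in the text. The closed form is derived here and is not stated in the source. The sketch takes Ψ to be Pθ(k|k−1) from (17), and the function name is hypothetical. A very large λ recovers the ordinary update of (16)–(18).

```python
import numpy as np

def robust_ar_step(theta, P, a_row, y, D, L, lam):
    """AR-parameter update with an assumed l1-penalised sparse-noise term.

    Returns theta(k|k), P_theta(k|k) and the sparse-noise estimate v(k).
    """
    A = np.asarray(a_row, dtype=float)
    P_pred = P + D                                   # (17)
    S = A @ P_pred @ A + L                           # innovation variance
    K = P_pred @ A / S                               # (18)
    e = y - A @ theta                                # innovation
    # argmin_v (e - v)^2 / S + lam*|v| is e soft-thresholded at lam*S/2
    v_hat = float(np.sign(e)) * max(abs(float(e)) - lam * S / 2.0, 0.0)
    theta_new = theta + K * (e - v_hat)              # Kalman update on Y(k) - v(k)
    P_new = P_pred - np.outer(K, A) @ P_pred         # (16), kept as in the Gaussian case
    return theta_new, P_new, v_hat
```

With a moderate λ, a large spike in Y(k) is mostly absorbed into v̂(k) instead of being pushed into θ̂(k).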
5) Estimate the speech signal state sequence
5.1) Reformulate the traditional Kalman filtering problem from a convex optimization perspective
To make the sparse noise easy to estimate, the Kalman filtering problem is first reformulated as a convex optimization problem. The state-space model of the traditional Kalman filter is:
$$X(k) = F X(k-1) + p(k) \tag{20}$$
$$Y(k) = C X(k) + n(k) \tag{21}$$
By Bayes' rule, Kalman filtering amounts to estimating the optimal speech state sequence X(k) given the measured data Y(k), that is:
$$\hat X(k) = \arg\max_{X(k)} p(X(k) \mid Y(k)) = \arg\max_{X(k)} p(Y(k) \mid X(k))\, p(X(k)) \tag{22}$$
Following maximum likelihood estimation theory, the likelihood functions for p(Y(k)|X(k)) and p(X(k)) are:
$$L_1(Y(k), X(k)) \propto \exp\left(-\tfrac{1}{2}\,(Y(k) - C X(k))^T R^{-1} (Y(k) - C X(k))\right) \tag{23}$$
$$L_2(X(k)) \propto \exp\left(-\tfrac{1}{2}\,(X(k) - F\hat X(k-1))^T \Theta^{-1} (X(k) - F\hat X(k-1))\right) \tag{24}$$
Here Θ is the covariance matrix of the conditional probability p(X(k)|Y(k−1)) when X̂(k−1) is known: Θ = F P(k−1|k−1) F^T + Q(k−1), where P(k−1|k−1) is the updated covariance.

When L1(Y(k), X(k)) and L2(X(k)) reach their maximum, p(X(k)|Y(k)) attains its optimal estimate. Inspecting (23) and (24) shows that maximizing L1 and L2 is equivalent to minimizing the quadratic forms in their exponents. These are n(k)^T R^{-1} n(k), with n(k) = Y(k) − CX(k), and (X(k) − FX̂(k−1))^T Θ^{-1} (X(k) − FX̂(k−1)). This gives the optimization problem:
$$\min_{X(k),\, n(k)}\ n(k)^T R^{-1} n(k) + (X(k) - F\hat X(k-1))^T \Theta^{-1} (X(k) - F\hat X(k-1)) \quad \text{subject to} \quad Y(k) = C X(k) + n(k) \tag{25}$$
In (25), X(k) and n(k) are the variables, and Θ is the covariance matrix of the Gaussian noise. The minimizing X(k) is the estimate X̂(k), and the minimizing n(k) is the estimate of the Gaussian noise.
P(k|k) is the updated covariance matrix:
$$P(k|k) = (I - K(k) C(k))\, P(k|k-1) \tag{26}$$
P(k|k−1) is the predicted covariance matrix:
$$P(k|k-1) = F(k-1) P(k-1|k-1) F(k-1)^T + Q(k-1) \tag{27}$$
K(k) is the Kalman gain:
$$K(k) = P(k|k-1) C^T \left(C P(k|k-1) C^T + R(k-1)\right)^{-1} \tag{28}$$
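Formulas (26)–(28) are the classical Kalman covariance recursions. The resulting mean update X̂(k) = FX̂(k−1) + K(k)(Y(k) − CFX̂(k−1)) is also the minimizer of the quadratic objective n(k)^T R^{-1} n(k) + (X(k) − FX̂(k−1))^T Θ^{-1} (X(k) − FX̂(k−1)) under Y(k) = CX(k) + n(k). That objective is assumed here to be the form of (25). The Python/NumPy sketch below performs one predict/update step for a scalar measurement; the helper name is hypothetical.

```python
import numpy as np

def kalman_state_step(x, P, F, C, y, Q, R):
    """Classical Kalman predict/update for the speech state.

    x, P : X(k-1|k-1) and P(k-1|k-1); F : transition matrix; C : shape (1, p)
    y    : scalar measurement Y(k); Q, R : covariances of p(k) and n(k)
    """
    c = np.asarray(C, dtype=float).ravel()
    x_pred = F @ x                                   # prior mean F X(k-1)
    P_pred = F @ P @ F.T + Q                         # (27); equals Theta
    S = c @ P_pred @ c + R                           # innovation variance
    K = P_pred @ c / S                               # (28) for a scalar measurement
    x_new = x_pred + K * (y - c @ x_pred)            # mean update
    P_new = P_pred - np.outer(K, c) @ P_pred         # (26): (I - K C) P(k|k-1)
    return x_new, P_new
```

At x_new the gradient of that quadratic objective vanishes. The constrained form therefore reproduces the classical filter exactly when no sparse term is present.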
5.2) Construct the sparse noise estimation problem from a convex optimization perspective
The key idea of sparse noise estimation is to exploit the sparsity of the noise. Once step 5.1) has turned the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the sparse noise v(k) can be added to the optimization to estimate it. The new optimization problem is:
$$\text{subject to}\quad Y(k) = C X(k) + n(k) + v(k) \tag{29}$$
where v(k) is the sparse noise. Solving this optimization problem yields the optimal estimate of the speech state X(k), which plays the role of the optimal state estimate X̂(k) of the traditional Kalman filter. The optimization problem in formula (29) is convex and can be solved with the interior-point methods commonly used in engineering.
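Only the constraint (29) of the sparse state problem in step 5.2) survives in this copy. Assume, as an illustration and not as the source's formula, that the objective is the quadratic form n(k)^T R^{-1} n(k) + (X(k) − FX̂(k−1))^T Θ^{-1} (X(k) − FX̂(k−1)) plus λ‖v(k)‖₁ for some weight λ > 0. For a scalar Y(k) the joint minimizer then has a closed form: v̂(k) is the innovation soft-thresholded at λS/2, with S = CΘC^T + R, and X̂(k) is the Kalman update applied to Y(k) − v̂(k).

The Python/NumPy sketch below uses that closed form instead of an interior-point solver. It updates the covariance with (26) unchanged, which is an approximation once v̂(k) ≠ 0. The function name is hypothetical.

```python
import numpy as np

def robust_state_step(x, P, F, C, y, Q, R, lam):
    """One speech-state update with an assumed l1-penalised sparse-noise term.

    x, P : X(k-1|k-1) and P(k-1|k-1); F : transition matrix; C : shape (1, p)
    y    : scalar Y(k); Q, R : Gaussian noise covariances; lam : l1 weight
    Returns X(k|k), P(k|k) and the sparse-noise estimate v(k).
    """
    c = np.asarray(C, dtype=float).ravel()
    x_pred = F @ x                                   # prior mean F X(k-1)
    P_pred = F @ P @ F.T + Q                         # (27); equals Theta
    S = c @ P_pred @ c + R                           # innovation variance
    K = P_pred @ c / S                               # (28) for a scalar measurement
    e = y - c @ x_pred                               # innovation
    # Minimizer of (e - v)^2 / S + lam*|v|: soft-threshold e at lam*S/2
    v_hat = float(np.sign(e)) * max(abs(float(e)) - lam * S / 2.0, 0.0)
    x_new = x_pred + K * (e - v_hat)                 # Kalman update on Y(k) - v(k)
    P_new = P_pred - np.outer(K, c) @ P_pred         # (26), kept as in the Gaussian case
    return x_new, P_new, v_hat
```

Innovations smaller than λS/2 are treated as purely Gaussian (v̂(k) = 0). Only the excess of larger innovations is attributed to the sparse noise.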
5.3) After the speech signal at instant k has been enhanced, the enhanced result X̂(k) is fed back to step 4) to update the AR parameters θ(k+1) for instant k+1. Speech enhancement then proceeds at instant k+1 to estimate X(k+1), and so on until the entire speech signal has been processed.
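Steps 4) and 5) alternate sample by sample, as step 5.3) describes. The Python/NumPy sketch below combines them into one hypothetical routine. At each instant it first updates θ using A = X̂(k−1)^T and Y(k), then rebuilds F from the new θ and updates the speech state.

Both updates use the soft-thresholding closed form for an assumed ℓ1 sparse-noise penalty, in place of the interior-point solver named in the text. Every default value is illustrative and not taken from the source: the noise variances (r_var for R, ar_meas_var for L, ar_drift for D) and the ℓ1 weights. Framing is omitted, and the routine processes a single sample stream.

```python
import numpy as np

def _soft(x, tau):
    # Soft-thresholding: argmin_v (x - v)^2 / S + lam*|v| when tau = lam*S/2
    return float(np.sign(x)) * max(abs(float(x)) - tau, 0.0)

def dual_kalman_enhance(y, p=2, q_var=1.0, r_var=0.1, ar_meas_var=1.0,
                        ar_drift=1e-4, lam_ar=5.0, lam_x=5.0):
    """Hypothetical dual (AR-parameter / speech-state) filter over one stream.

    All defaults are illustrative assumptions, not values from the patent.
    Returns the filtered estimate of s(k) for every sample of y.
    """
    y = np.asarray(y, dtype=float)
    x, P = np.zeros(p), np.eye(p)            # X(0|0), P(0|0) (positive definite)
    theta, Pt = np.zeros(p), np.eye(p)       # theta(0|0) = [a_p, ..., a_1] and its covariance
    Q = np.zeros((p, p))
    Q[-1, -1] = q_var                        # excitation enters only the newest sample
    c = np.zeros(p)
    c[-1] = 1.0                              # C = [0 ... 0 1]
    s_hat = np.empty_like(y)
    for k in range(y.size):
        # Step 4: robust AR update with measurement row A = X(k-1)^T
        A = x.copy()
        Pt_pred = Pt + ar_drift * np.eye(p)                  # (17)
        S_t = A @ Pt_pred @ A + ar_meas_var
        K_t = Pt_pred @ A / S_t                              # (18)
        e_t = y[k] - A @ theta
        theta = theta + K_t * (e_t - _soft(e_t, lam_ar * S_t / 2.0))
        Pt = Pt_pred - np.outer(K_t, A) @ Pt_pred            # (16)
        # Step 5: robust state update with F rebuilt from the new theta
        F = np.zeros((p, p))
        F[:-1, 1:] = np.eye(p - 1)
        F[-1, :] = theta
        x_pred = F @ x
        P_pred = F @ P @ F.T + Q                             # (27)
        S = c @ P_pred @ c + r_var
        K = P_pred @ c / S                                   # (28)
        e = y[k] - c @ x_pred
        x = x_pred + K * (e - _soft(e, lam_x * S / 2.0))
        P = P_pred - np.outer(K, c) @ P_pred                 # (26)
        s_hat[k] = x[-1]                                     # enhanced sample s(k)
    return s_hat
```

Because θ is updated before X(k), the AR step at instant k uses the enhanced state from instant k−1. This is the feedback loop described in step 5.3).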
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. In existing speech models (specifically the autoregressive AR model), the AR parameters cannot be updated in real time as the noise changes. To address this, the invention proposes a dual Kalman filtering framework in which two Kalman filters operate together. Speech state estimation and AR parameter estimation update each other, and the two steps alternate, so that parameter estimation can track changes in the noise. This improves the accuracy of the system model and, in turn, the speech enhancement performance.
2. The traditional Kalman filter cannot handle non-stationary noise. To address this, the invention combines convex optimization techniques into an improved Kalman filtering framework. The new algorithm adds both a Gaussian noise term and a non-stationary noise term to the measurement process of the speech enhancement model. By building suitable optimization models with convex optimization techniques, it can estimate both kinds of noise accurately, improving the accuracy of speech enhancement.
Description of the drawings
Fig. 1 is a flowchart of the speech enhancement method under non-stationary noise.
Fig. 2a is a schematic of the original speech signal.
Fig. 2b is a schematic of the speech signal with white Gaussian noise.
Fig. 2c is a schematic of the speech signal with white Gaussian noise and non-stationary noise.
Fig. 3 is a flowchart of the speech enhancement algorithm based on dual improved Kalman filters.
Fig. 4a shows the original speech signal.
Fig. 4b is a schematic of the speech enhancement result.
Detailed description of embodiments
The present invention is further explained below with reference to a specific embodiment.
As shown in Fig. 1, the online speech enhancement method suitable for non-stationary noise environments described in this embodiment comprises the following steps:
1) Establish the system model for the non-stationary noise environment
1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise coexist
Speech production can be described as an autoregressive process in which white noise excites an all-pole linear system. The current output equals the excitation at the current instant plus a weighted sum of the outputs at the previous p instants. This autoregressive (AR) model is expressed as follows:
$$s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k) \tag{1}$$
where u(k) is the white Gaussian noise excitation at instant k; s(k−i) is the speech signal at instant k−i; s(k) is the speech signal at instant k; a_i is the i-th linear prediction coefficient, also called an AR model parameter; and p is the order of the AR model.
As shown in Figs. 2a, 2b and 2c, speech observed in real environments is polluted by various kinds of noise, particularly non-stationary noise. The present invention therefore considers Gaussian noise and non-stationary noise together in the measurement process. This yields a speech signal model that better matches the actual measurement process, described as follows:
$$Y(k) = s(k) + n(k) + v(k) \tag{2}$$
where Y(k) is the measured speech sequence at instant k; s(k) is the speech signal at instant k; n(k) is white Gaussian noise at instant k; and v(k) is the non-stationary noise at instant k, which follows a Laplacian distribution and is sparse.
1.2) Establish the state-space model of the speech signal
Formulas (1) and (2) are rewritten as a state-space model:
$$X(k) = F X(k-1) + p(k) \tag{3}$$
$$Y(k) = C X(k) + n(k) + v(k) \tag{4}$$
where
$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix} \tag{5}$$
$$C = [0\ \ 0\ \ \cdots\ \ 0\ \ 1] \tag{6}$$
$$X(k) = [s(k-p+1)\ \ \cdots\ \ s(k)]^T \tag{7}$$
In the state equation (3) and the measurement equation (4):
- X(k) is the speech state sequence at instant k, whose optimal estimate is sought.
- X(k−1) is the speech state sequence at instant k−1.
- Y(k) is the measured speech sequence at instant k.
- F is the state-transition matrix formed by the linear prediction coefficients; its last row [a_p(k) … a_1(k)] is called the AR parameter vector.
- C = [0 0 … 0 1] is the measurement matrix.
- p(k) is the state noise at instant k and is Gaussian.
- n(k) is the measurement noise at instant k and is Gaussian.
- v(k) is the non-stationary noise at instant k and follows a Laplacian distribution.
The statistical properties of the state noise p(k) and the measurement noise n(k) are:
$$E[p(k)] = q, \qquad E[n(k)] = r$$
$$E[p(k)p(j)^T] = Q\,\delta_{kj}, \qquad E[n(k)n(j)^T] = R\,\delta_{kj} \tag{8}$$
where q and r are the means of p(k) and n(k), respectively; Q and R are their covariances; and δ_kj is the Kronecker delta. The speech enhancement problem is to estimate the optimal speech state X(k) given the measured speech signal Y(k).
2) Framing and windowing
Speech is short-time stationary (it can be regarded as approximately constant over 10–30 ms). This allows the signal to be divided into short segments for processing, which is called framing. Framing is implemented by weighting the signal with a movable window of finite length. There are generally about 33–100 frames per second. Framing generally uses overlapping segments. The overlap between consecutive frames is called the frame shift, and the ratio of frame shift to frame length is generally 0–0.5. In the present invention the frame length is 25 ms and the frame shift is 10 ms.
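The framing parameters of this embodiment (25 ms frames, 10 ms frame shift) can be sketched with NumPy as below. The source does not name a window, so the Hamming window is an assumption. The function name and the sampling rate in the usage note are also assumptions. The sketch treats the 10 ms frame shift as the hop between successive frame starts, which is the common convention. If the shift is meant as the overlap, as the description above defines it, pass hop_ms=15.0 instead.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=25.0, hop_ms=10.0, window="hamming"):
    """Split x into overlapping windowed frames of shape (n_frames, frame_len).

    hop_ms is the distance between consecutive frame starts (assumed meaning
    of the 10 ms frame shift). Any window other than "hamming" is rectangular.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    win = np.hamming(frame_len) if window == "hamming" else np.ones(frame_len)
    # Row i holds the sample indices i*hop, ..., i*hop + frame_len - 1
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return np.asarray(x, dtype=float)[idx] * win
```

At an assumed 8 kHz sampling rate this gives 200-sample frames spaced 80 samples apart. That is 100 frames per second, the upper end of the 33–100 range.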
3) System initialization
3.1) Initialize the parameters of the improved Kalman filter
Initialize the speech state estimate X(0|0) and the covariance matrix P(0|0), ensuring that the covariance matrix is positive definite.
3.2) Initialize the AR parameters
Initialize the AR parameter state estimate θ(0|0). In the present invention the AR order is set to 13, chosen empirically.
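Steps 3.1) and 3.2), with the order-13 AR model of this embodiment, can be sketched as follows in Python/NumPy. The zero initial means, the scaled identity covariances and the function name are assumptions. The source only requires the covariance to be positive definite. The sketch checks this with a Cholesky factorization, which raises numpy.linalg.LinAlgError when the matrix is not positive definite.

```python
import numpy as np

def init_filters(order=13, p0_scale=1.0):
    """Initial values for the state filter (step 3.1) and the AR filter (step 3.2).

    Zero means and scaled identity covariances are illustrative choices.
    """
    x0 = np.zeros(order)                 # X(0|0)
    P0 = p0_scale * np.eye(order)        # P(0|0)
    theta0 = np.zeros(order)             # theta(0|0) = [a_13, ..., a_1]
    Ptheta0 = p0_scale * np.eye(order)   # covariance of theta(0|0)
    for M in (P0, Ptheta0):
        np.linalg.cholesky(M)            # raises LinAlgError if M is not positive definite
    return x0, P0, theta0, Ptheta0
```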
4) Estimate the AR parameters
The AR parameters are the last row [a_p(k) … a_1(k)] of the state-transition matrix F in formula (3). They describe the speech production process, and their accuracy directly affects the speech enhancement result. In practice, AR parameter estimation is strongly affected by the speech signal itself and by the various noises. In AR parameter estimation, the present invention therefore jointly considers the speech state estimate X(k−1), the state noise q(k), the measurement noise r(k), the non-stationary noise v(k) and related quantities. From these it builds a new state-space model for AR parameter estimation that yields robust online estimates; this is one core point of the invention. As shown in Fig. 3, the real-time estimation of the AR parameters proceeds as follows:
4.1) Establish the parameter estimation model for the AR parameters
The AR parameter model for an environment with mixed Gaussian and non-stationary noise is:
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\,\theta(k) + r(k) + v(k) \tag{9}$$
where:
- θ(k) = [a_p(k) … a_1(k)]^T is the AR parameter state at instant k.
- q(k) is the state noise at instant k; it is Gaussian with covariance D(k).
- r(k) is the measurement noise at instant k; it is Gaussian with covariance L(k).
- v(k) is the non-stationary noise at instant k; it is Laplacian and sparse, with covariance W(k).
- A = X(k−1)^T = [s(k−p) … s(k−1)] is the measurement matrix.
- Y(k) is the measured speech sequence at instant k.
The statistical properties of the state noise q(k) and the measurement noise r(k) are:
$$E[q(k)] = d, \qquad E[r(k)] = l$$
$$E[q(k)q(j)^T] = D\,\delta_{kj}, \qquad E[r(k)r(j)^T] = L\,\delta_{kj} \tag{10}$$
where d and l are the means of q(k) and r(k), respectively; D and L are their covariances; and δ_kj is the Kronecker delta.
4.2) Reformulate the traditional Kalman filtering problem from a convex optimization perspective
To make the sparse noise easy to estimate, the Kalman filtering problem is first reformulated as a convex optimization problem. The state-space model of the traditional Kalman filter (which contains no non-stationary noise v(k)) is:
$$\theta(k) = \theta(k-1) + q(k)$$
$$Y(k) = A\,\theta(k) + r(k) \tag{11}$$
By Bayes' rule, AR parameter estimation can be expressed as estimating the optimal AR parameter sequence θ(k) given the measured data Y(k), that is:
$$\hat\theta(k) = \arg\max_{\theta(k)} p(\theta(k) \mid Y(k)) = \arg\max_{\theta(k)} p(Y(k) \mid \theta(k))\, p(\theta(k)) \tag{12}$$
Following maximum likelihood estimation theory, the likelihood functions for p(Y(k)|θ(k)) and p(θ(k)) are:
$$L_1(Y(k),\theta(k)) \propto \exp\left(-\tfrac{1}{2}\,(Y(k) - A\theta(k))^T L^{-1} (Y(k) - A\theta(k))\right) \tag{13}$$
$$L_2(\theta(k)) \propto \exp\left(-\tfrac{1}{2}\,(\theta(k) - \hat\theta(k-1))^T \Psi(k)^{-1} (\theta(k) - \hat\theta(k-1))\right) \tag{14}$$
Here Ψ is the covariance matrix of the conditional probability p(θ(k)|Y(k)) when θ̂(k−1) is known: Ψ(k) = Pθ(k|k) + D(k), where Pθ(k|k) is the updated covariance.

When L1(Y(k), θ(k)) and L2(θ(k)) reach their maximum, p(θ(k)|Y(k)) attains its optimal estimate. Inspecting (13) and (14) shows that maximizing L1 and L2 is equivalent to minimizing the quadratic forms in their exponents. These are r(k)^T L^{-1} r(k), with r(k) = Y(k) − Aθ(k), and (θ(k) − θ̂(k−1))^T Ψ(k)^{-1} (θ(k) − θ̂(k−1)). This gives the optimization problem:
$$\min_{\theta(k),\, r(k)}\ r(k)^T L^{-1} r(k) + (\theta(k) - \hat\theta(k-1))^T \Psi(k)^{-1} (\theta(k) - \hat\theta(k-1)) \quad \text{subject to} \quad Y(k) = A\theta(k) + r(k) \tag{15}$$
In (15), θ(k) and r(k) are the variables, and Ψ(k) = Pθ(k|k) + D(k) is the covariance matrix of the Gaussian noise. The minimizing θ(k) is the estimate θ̂(k), and the minimizing r(k) is the estimate of the Gaussian noise. Pθ(k|k) is the updated covariance matrix:
$$P_\theta(k|k) = (I - K_\theta(k) A(k))\, P_\theta(k|k-1) \tag{16}$$
Pθ(k|k−1) is the predicted covariance matrix:
$$P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1) \tag{17}$$
Kθ(k) is the Kalman gain:
$$K_\theta(k) = P_\theta(k|k-1) A^T \left(A P_\theta(k|k-1) A^T + L(k-1)\right)^{-1} \tag{18}$$
4.3) Construct the optimization problem for estimating the non-stationary noise from a convex optimization perspective
The non-stationary noise follows a Laplacian distribution and is sparse. The key idea of non-stationary noise estimation is to exploit this sparsity. Once step 4.2) has turned the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the non-stationary noise v(k) can be added to the optimization to estimate the sparse noise. The new optimization problem is:
$$\text{subject to}\quad Y(k) = A\theta(k) + r(k) + v(k) \tag{19}$$
where v(k) is the sparse noise. Solving this optimization problem yields the optimal estimate θ̂(k) of the AR parameters. The optimization problem in formula (19) is convex and can be solved with the mature interior-point methods available in engineering practice.
5) Estimate the speech signal state sequence
During speech acquisition, non-stationary noise strongly affects speech quality. To improve speech quality, a speech enhancement algorithm must cope with Gaussian noise and non-stationary noise mixed together. Non-stationary noise generally follows a Laplacian distribution and is sparse, so its estimation relies mainly on this sparsity. To make it easy to introduce a sparsity constraint into the optimization, convex optimization is first used to reformulate the traditional Kalman filtering problem as a convex optimization problem. A sparsity constraint on the sparse noise is then added to the newly constructed problem, which completes the speech enhancement task. This is the other core point of the invention.
5.1) traditional Kalman filtering problem is reconstructed from convex optimization angle
In order to easily estimate sparse noise, needs to reconstruct Kalman filtering from the angle of convex optimization and ask Topic.The state-space model of traditional Kalman filtering is as follows:
$X(k) = F X(k-1) + p(k)$ (20)
$Y(k) = C X(k) + n(k)$ (21)
According to Bayes' rule, the Kalman filtering problem can be expressed as estimating the optimal speech state sequence X(k) given the measured data Y(k), i.e.:
$\hat{X}(k) = \arg\max_{X(k)} p(X(k)\mid Y(k)) = \arg\max_{X(k)} p(Y(k)\mid X(k))\,p(X(k))$ (22)
According to maximum likelihood estimation theory, the likelihood functions of p(Y(k)|X(k)) and p(X(k)) are established:
$L_1(Y(k),X(k)) \propto \exp\left\{-\tfrac{1}{2}\,(Y(k)-CX(k))^T R^{-1}(Y(k)-CX(k))\right\}$ (23)
$L_2(X(k)) \propto \exp\left\{-\tfrac{1}{2}\,(X(k)-\hat{X}(k|k-1))^T \Theta^{-1}(X(k)-\hat{X}(k|k-1))\right\}$ (24)
Here $\hat{X}(k|k-1)$ is the one-step state prediction, R is the covariance of n(k), and Θ is the covariance matrix of the conditional probability p(X(k)|Y(k-1)), $\Theta = F P(k-1|k-1) F^T + Q(k-1)$, where P(k-1|k-1) is the updated covariance. When the likelihood functions L1(Y(k),X(k)) and L2(X(k)) reach their maximum, the conditional probability p(X(k)|Y(k)) attains its optimal estimate. Inspection of (23) and (24) shows that maximizing L1 and L2 is equivalent to minimizing their exponents, which gives the following optimization problem:
$\min_{X(k),\,n(k)}\ n(k)^T R^{-1} n(k) + (X(k)-\hat{X}(k|k-1))^T \Theta^{-1} (X(k)-\hat{X}(k|k-1))$
subject to $Y(k) = CX(k) + n(k)$ (25)
where X(k) and n(k) are the optimization variables. The solution X(k) is the state estimate, and n(k) is the estimate of the Gaussian noise.
P(k|k) is the updated covariance matrix:
$P(k|k) = (I - K(k)C(k))\,P(k|k-1)$ (26)
P(k|k-1) is the predicted covariance matrix:
$P(k|k-1) = F(k-1)\,P(k-1|k-1)\,F(k-1)^T + Q(k-1)$ (27)
K(k) is the Kalman gain:
$K(k) = P(k|k-1)\,C^T\left(C\,P(k|k-1)\,C^T + R(k-1)\right)^{-1}$ (28)
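The standard filter equations (26) to (28), together with the state update that the text leaves implicit, can be sketched in plain Python for a scalar sample. With the quadratic objective of (25) in its usual form, its minimizer coincides with this closed-form update, so the convex reformulation loses nothing in the purely Gaussian case. The function names, the AR(2) values, and the choice of Q (excitation variance on the last state component only) are illustrative assumptions.

```python
def mat_mul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kalman_step(x, P, F, Q, C, y, R):
    """One predict/update cycle of eqs. (27), (28) and (26) for a scalar sample y."""
    p = len(x)
    x_pred = [sum(F[i][j] * x[j] for j in range(p)) for i in range(p)]      # F X(k-1|k-1)
    FPFt = mat_mul(mat_mul(F, P), [list(col) for col in zip(*F)])
    P_pred = [[FPFt[i][j] + Q[i][j] for j in range(p)] for i in range(p)]    # eq. (27)
    PC = [sum(P_pred[i][j] * C[j] for j in range(p)) for i in range(p)]      # P(k|k-1) C^T
    S = sum(C[i] * PC[i] for i in range(p)) + R                              # C P C^T + R
    K = [PC[i] / S for i in range(p)]                                        # eq. (28)
    e = y - sum(C[i] * x_pred[i] for i in range(p))                          # innovation
    x_new = [x_pred[i] + K[i] * e for i in range(p)]                         # X(k|k)
    P_new = [[P_pred[i][j] - K[i] * PC[j] for j in range(p)] for i in range(p)]  # eq. (26)
    return x_new, P_new


# Hypothetical AR(2) example: a_1 = 1.3, a_2 = -0.6, so the last row of F is [a_2, a_1].
F = [[0.0, 1.0], [-0.6, 1.3]]
Q = [[0.0, 0.0], [0.0, 1.0]]   # assumption: the excitation only drives S(k)
C = [0.0, 1.0]                 # eq. (6)
x, P = kalman_step([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], F, Q, C, y=1.0, R=0.1)
# x is approximately [1.3 / 3.15, 3.05 / 3.15], i.e. about [0.413, 0.968]
```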
5.2) Construct the sparse-noise estimation problem from the convex-optimization perspective
The core idea of sparse-noise estimation is to exploit the sparsity of the noise. Once step 5.1) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the sparse noise v(k) is added to the optimization to estimate the sparse noise. The new optimization problem is:
$\min_{X(k),\,n(k),\,v(k)}\ n(k)^T R^{-1} n(k) + (X(k)-\hat{X}(k|k-1))^T \Theta^{-1} (X(k)-\hat{X}(k|k-1)) + \lambda\|v(k)\|_1$
subject to $Y(k) = CX(k) + n(k) + v(k)$ (29)
where v(k) is the sparse noise, $\hat{X}(k|k-1)$ is the one-step state prediction, and λ > 0 weights the ℓ1 sparsity term. Solving this problem yields the optimal estimate X(k) of the speech signal state (X(k) plays the role of the optimal state estimate in traditional Kalman filtering). The problem expressed by formula (29) is convex and can be solved with the interior-point method, which is well established in engineering practice.
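For the speech state, problem (29) has the same structure as (19), and with C = [0 … 0 1] the scalar closed form only needs the last row and column of P(k|k-1). The sketch below (hypothetical function robust_speech_step; objective assumed to be the quadratic terms of (25) plus λ|v(k)| with an illustrative weight λ) predicts with F, soft-thresholds the innovation to obtain v(k), and then updates on the cleaned sample. It stands in for the interior-point solver only in this scalar special case.

```python
def robust_speech_step(x, P, F, Q, y, R, lam):
    """Predict with eqs. (20)/(27), then solve problem (29) for one sample y.

    Closed form for C = [0, ..., 0, 1] under the assumed objective with weight lam on |v|:
    soft-threshold the innovation to get v(k), then Kalman-update on y - v(k).
    """
    p = len(x)
    x_pred = [sum(F[i][j] * x[j] for j in range(p)) for i in range(p)]
    FP = [[sum(F[i][k] * P[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
    P_pred = [[sum(FP[i][k] * F[j][k] for k in range(p)) + Q[i][j] for j in range(p)]
              for i in range(p)]                              # eq. (27)
    PC = [P_pred[i][p - 1] for i in range(p)]                 # P(k|k-1) C^T: last column
    S = P_pred[p - 1][p - 1] + R                              # C P C^T + R
    e = y - x_pred[p - 1]                                     # innovation Y(k) - C X_pred
    t = lam * S / 2.0
    v = e - t if e > t else (e + t if e < -t else 0.0)        # sparse-noise estimate v(k)
    K = [PC[i] / S for i in range(p)]                         # eq. (28)
    x_new = [x_pred[i] + K[i] * (e - v) for i in range(p)]
    P_new = [[P_pred[i][j] - K[i] * PC[j] for j in range(p)] for i in range(p)]  # eq. (26)
    return x_new, P_new, v


F = [[0.0, 1.0], [-0.6, 1.3]]        # hypothetical AR(2): a_1 = 1.3, a_2 = -0.6
Q = [[0.0, 0.0], [0.0, 0.1]]
P = [[0.01, 0.0], [0.0, 0.01]]
x = [1.0, 2.0]                       # X(k-1) = [S(k-2), S(k-1)]; predicted S(k) = 2.0
x_new, P_new, v = robust_speech_step(x, P, F, Q, y=12.0, R=0.01, lam=2.0)
# v is about 9.87 and absorbs the +10 impulse, so S(k) moves only to about 2.12.
# With a huge lam (v = 0) the same step is a plain Kalman update and S(k) jumps to about 11.23.
x_plain, _, v_plain = robust_speech_step(x, P, F, Q, y=12.0, R=0.01, lam=1e9)
```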
5.3) After the speech signal at time k has been enhanced, the enhanced result $\hat{X}(k)$ is returned to step 4), where it is used to update the AR parameters θ(k+1) for time k+1. Speech enhancement at time k+1 then proceeds to estimate X(k+1), and so on until all of the speech signal has been processed.
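The online loop of step 5.3 can be sketched by alternating the two robust updates sample by sample: the AR parameters are updated with A = X(k-1)^T, a companion matrix is rebuilt from the new θ, and the speech state is then updated, with X(k) feeding the next AR update. This is a minimal sketch under several assumptions: scalar samples, the closed-form ℓ1 update in place of an interior-point solver, zero-mean noises, identity initial covariances, and the hypothetical defaults D, L, Q, R and λ. The function names are illustrative, and framing (step 2) is omitted for brevity.

```python
def robust_update(x_pred, P_pred, h, y, R, lam):
    """Scalar-sample l1-robust Kalman update (closed form used instead of an interior-point
    solver): soft-threshold the innovation to get v, then update on y - v."""
    n = len(x_pred)
    Ph = [sum(P_pred[i][j] * h[j] for j in range(n)) for i in range(n)]
    S = sum(h[i] * Ph[i] for i in range(n)) + R
    e = y - sum(h[i] * x_pred[i] for i in range(n))
    t = lam * S / 2.0
    v = e - t if e > t else (e + t if e < -t else 0.0)
    K = [Ph[i] / S for i in range(n)]
    x = [x_pred[i] + K[i] * (e - v) for i in range(n)]
    P = [[P_pred[i][j] - K[i] * Ph[j] for j in range(n)] for i in range(n)]
    return x, P


def identity(p, scale=1.0):
    return [[scale if i == j else 0.0 for j in range(p)] for i in range(p)]


def enhance(y, p=2, D=1e-5, L=1.0, Q=1.0, R=0.1, lam=1.0):
    """Hypothetical online loop: step 4 (AR parameters), then step 5 (speech state), per sample."""
    theta, Pth = [0.0] * p, identity(p)      # step 3.2: theta(0|0) = [a_p, ..., a_1]
    X, P = [0.0] * p, identity(p)            # step 3.1: X(0|0), positive-definite P(0|0)
    C = [0.0] * (p - 1) + [1.0]              # eq. (6)
    out = []
    for yk in y:
        # Step 4: eq. (17) prediction, then robust update with A = X(k-1)^T
        Pth_pred = [[Pth[i][j] + (D if i == j else 0.0) for j in range(p)] for i in range(p)]
        theta, Pth = robust_update(theta, Pth_pred, X, yk, L, lam)
        # Step 5: companion matrix from the new theta, prediction via eqs. (20) and (27)
        F = [[1.0 if j == i + 1 else 0.0 for j in range(p)] for i in range(p - 1)] + [theta]
        X_pred = [sum(F[i][j] * X[j] for j in range(p)) for i in range(p)]
        FP = [[sum(F[i][k] * P[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
        P_pred = [[sum(FP[i][k] * F[j][k] for k in range(p)) + (Q if i == j == p - 1 else 0.0)
                   for j in range(p)] for i in range(p)]
        X, P = robust_update(X_pred, P_pred, C, yk, R, lam)
        out.append(X[-1])                    # enhanced sample; X(k) feeds step 4 at k + 1
    return out


enhanced = enhance([0.0] * 10 + [50.0] + [0.0] * 10)
# The isolated impulse at k = 10 is mostly attributed to v(k): enhanced[10] is about 0.5,
# whereas a plain Kalman update with the same Q and R would give about 50 / 1.1, i.e. 45.5.
```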
As shown in Figs. 4a and 4b, the proposed method filters out both Gaussian noise and nonstationary noise fairly accurately and enhances the original speech signal.
With the present invention, white noise and nonstationary noise can be accurately estimated and filtered out, achieving speech enhancement under mixed white and nonstationary noise. At the same time, a cleaner estimated speech signal is provided, which supports the front end in improving speech recognition accuracy.
Because the invention establishes two robust Kalman filter models, mathematically models the speech generation process, and specifically accounts for the temporal and time-varying characteristics of speech, the AR parameter estimation is updated iteratively in real time, meeting the requirement of time-varying parameters. The state estimation estimates the speech signal frame by frame, exploiting the short-term stationarity of speech, so that the filtering result is better than that of traditional Kalman filtering. The method is therefore worth popularizing.
The embodiments described above are only preferred embodiments of the invention and do not limit its scope of implementation; therefore, all changes made according to the shape and principle of the invention shall fall within the protection scope of the invention.

Claims (1)

1. An online speech enhancement method suitable for a nonstationary-noise environment, characterized by comprising the following steps:
1) Establish the system model under a nonstationary-noise environment
1.1) Establish the autoregressive (AR) model for the case where Gaussian noise and sparse noise coexist
The speech signal is generated by an autoregressive process: white noise excites an all-pole linear system whose output is the speech. That is, the current output equals the excitation at the current moment plus a weighted sum of the outputs at the past p moments. This autoregressive (AR) model is expressed as:
$s(k) = \sum_{i=1}^{p} a_i\, s(k-i) + u(k)$ (1)
where u(k) is the white Gaussian noise excitation at time k; s(k-i) is the speech signal at time (k-i); s(k) is the speech signal at time k; a_i is the i-th linear prediction coefficient, also called an AR model parameter; and p is the order of the AR model;
A speech signal model consistent with the actual measurement process is established; the measurement process is described as:
$Y(k) = s(k) + n(k) + v(k)$ (2)
where Y(k) is the speech measurement sequence at time k; s(k) is the speech signal at time k; n(k) is the white Gaussian noise at time k; and v(k) is the nonstationary noise at time k, which obeys a Laplacian distribution and is sparse;
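To experiment with models (1) and (2), one can simulate an AR process and corrupt it with white Gaussian noise plus sparse Laplacian bursts. The sketch below draws each Laplacian sample as the difference of two exponential variates and makes v(k) sparse by switching it on with a small probability. This Bernoulli-Laplace mixture, the coefficient and noise values, and the function name simulate are assumptions for illustration; the text specifies only that v(k) is Laplacian and sparse.

```python
import random


def simulate(a, n, sigma_u=1.0, sigma_n=0.1, p_sparse=0.05, b_sparse=5.0, seed=0):
    """Simulate eq. (1) and eq. (2): return clean s, Gaussian n, sparse v and measured Y.

    a = [a_1, ..., a_p]; samples before k = 0 are taken as zero.
    Assumption: v(k) is Laplacian with scale b_sparse when active (probability p_sparse)
    and exactly zero otherwise.
    """
    rng = random.Random(seed)
    p = len(a)
    s = []
    for k in range(n):
        ar = sum(a[i] * s[k - 1 - i] for i in range(p) if k - 1 - i >= 0)  # sum_i a_i s(k-i)
        s.append(ar + rng.gauss(0.0, sigma_u))                              # + u(k)
    gauss_noise = [rng.gauss(0.0, sigma_n) for _ in range(n)]
    sparse_noise = []
    for _ in range(n):
        if rng.random() < p_sparse:
            # the difference of two exponentials with mean b_sparse is Laplace(0, b_sparse)
            sparse_noise.append(rng.expovariate(1.0 / b_sparse) - rng.expovariate(1.0 / b_sparse))
        else:
            sparse_noise.append(0.0)
    y = [s[k] + gauss_noise[k] + sparse_noise[k] for k in range(n)]           # eq. (2)
    return s, gauss_noise, sparse_noise, y


s, n_k, v_k, y = simulate([1.3, -0.6], 1000)   # hypothetical AR(2) coefficients
```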
1.2) Establish the state-space model of the speech signal
Formulas (1) and (2) are converted into the following state-space model:
$X(k) = F X(k-1) + p(k)$ (3)
$Y(k) = C X(k) + n(k) + v(k)$ (4)
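where
$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix}$ (5)
$C = [0\ 0\ \cdots\ 0\ 1]$ (6)
$X(k) = [S(k-p+1)\ \cdots\ S(k)]^T$ (7)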
In the speech state equation (3) and the speech measurement equation (4), X(k) is the speech state estimation sequence at time k, i.e. the optimal state estimate of the speech signal; X(k-1) is the speech state estimation sequence at time (k-1); Y(k) is the speech measurement sequence at time k; F is the state transition matrix formed by the linear prediction coefficients, whose last row [a_p(k) … a_1(k)] is called the AR parameter vector; C = [0 0 … 0 1] is the measurement matrix; p(k) is the state noise at time k and obeys a Gaussian distribution; n(k) is the measurement noise at time k and obeys a Gaussian distribution; and v(k) is the nonstationary noise at time k and obeys a Laplacian distribution;
The statistical properties of the state noise p(k) and the measurement noise n(k) are:
$E[p(k)] = q,\quad E[n(k)] = r$
$E[p(k)p(j)^T] = Q\,\delta_{kj},\quad E[n(k)n(j)^T] = R\,\delta_{kj}$ (8)
where q and r are the means of p(k) and n(k), respectively; Q and R are their covariances; and $\delta_{kj}$ is the Kronecker delta. The speech enhancement problem is to estimate the optimal speech signal X(k) given the measured speech signal Y(k);
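The companion structure of F can be checked numerically: multiplying X(k-1) by F shifts the state window by one sample and places the AR prediction in the last entry, so that C X(k) picks out S(k). The helper names companion and matvec and the coefficient values below are illustrative.

```python
def companion(a):
    """State-transition matrix F of eq. (5) from a = [a_1, ..., a_p]."""
    p = len(a)
    F = [[0.0] * p for _ in range(p)]
    for i in range(p - 1):
        F[i][i + 1] = 1.0            # row i: X(k)[i] = X(k-1)[i+1], a one-sample shift
    F[p - 1] = list(reversed(a))     # last row [a_p, ..., a_1]
    return F


def matvec(M, x):
    """Plain-list matrix-vector product."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


a = [1.3, -0.6]                      # hypothetical a_1, a_2 (p = 2)
F = companion(a)
C = [0.0] * (len(a) - 1) + [1.0]     # eq. (6)
X_prev = [0.5, 2.0]                  # X(k-1) = [S(k-2), S(k-1)]
X_pred = matvec(F, X_prev)           # noise-free part of eq. (3)
S_k = sum(c * x for c, x in zip(C, X_pred))   # C X(k) = S(k)
# X_pred is approximately [2.0, 2.3]: S(k-1) shifts up, and 1.3*2.0 - 0.6*0.5 = 2.3 fills the last slot
```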
2) Framing and windowing
A speech signal is short-term stationary: within 10–30 ms it can be regarded as approximately unchanged, which allows it to be divided into short segments for processing. This is called framing, and it is implemented by weighting the signal with a movable window of finite length. The frame rate is usually 33 to 100 frames per second. Framing uses overlapping segmentation: the overlap between the previous frame and the next frame is called the frame shift, and the ratio of the frame shift to the frame length is 0 to 0.5;
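A minimal framing sketch, assuming a Hamming window (the text only requires a movable finite-length window) and parameterizing by the hop between frame starts, so that consecutive frames overlap by frame_len − hop samples. For example, 25 ms frames at an assumed 8 kHz sampling rate with 50% overlap give 80 frames per second, inside the 33 to 100 range. The function name frame_signal is hypothetical.

```python
import math


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames and weight each with a Hamming window.

    Consecutive frames start hop samples apart, so they overlap by frame_len - hop
    samples; trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len < 2 or not 0 < hop <= frame_len:
        raise ValueError("need frame_len >= 2 and 0 < hop <= frame_len")
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]


fs = 8000                        # assumed sampling rate (Hz)
frame_len = round(0.025 * fs)    # 25 ms -> 200 samples
hop = frame_len // 2             # 50% overlap -> 80 frames per second
frames = frame_signal([1.0] * fs, frame_len, hop)   # one second of a constant signal
# len(frames) == 79: frame starts 0, 100, ..., 7800
```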
3) System initialization
3.1) Initialize the parameters of the improved Kalman filter
Initialize the speech state estimation sequence X(0|0) and the covariance matrix P(0|0), ensuring that the covariance matrix is positive definite;
3.2) Initialize the AR parameters
Initialize the AR parameter state estimation sequence θ(0|0);
4) Estimate the AR parameters
The AR parameters are the last row [a_p(k) … a_1(k)] of the state transition matrix F in formula (3). They describe the speech generation process, and their accuracy directly affects the result of speech enhancement. The proposed AR parameter estimation jointly considers the speech state estimation sequence X(k-1), the state noise q(k), the measurement noise n(k), and the nonstationary noise v(k); it establishes a new state-space model for AR parameter estimation and realizes online robust estimation of the AR parameters. The real-time estimation process is as follows:
4.1) Establish the parameter estimation model of the AR parameters
The AR parameter model under mixed Gaussian and nonstationary noise is described as follows:
$\theta(k) = \theta(k-1) + q(k)$
$Y(k) = A\theta(k) + r(k) + v(k)$ (9)
where $\theta(k) = [a_p(k)\ \cdots\ a_1(k)]^T$ is the AR parameter state sequence at time k; q(k) is the state noise at time k, which obeys a Gaussian distribution with covariance matrix Q(k); r(k) is the measurement noise at time k, which obeys a Gaussian distribution with covariance matrix R(k); v(k) is the nonstationary noise at time k, which obeys a Laplacian distribution, is sparse, and has covariance matrix W(k); $A = X(k-1)^T = [S(k-p)\ \cdots\ S(k-1)]$ is the measurement matrix; and Y(k) is the speech measurement sequence at time k. The statistical properties of the state noise q(k) and the measurement noise r(k) are:
$E[q(k)] = d,\quad E[r(k)] = l$
$E[q(k)q(j)^T] = D\,\delta_{kj},\quad E[r(k)r(j)^T] = L\,\delta_{kj}$ (10)
where d and l are the means of q(k) and r(k), respectively; D and L are their covariances; and $\delta_{kj}$ is the Kronecker delta;
4.2) Reformulate the traditional Kalman filtering problem from the convex-optimization perspective
To estimate the sparse noise conveniently, the Kalman filtering problem must be reformulated from the convex-optimization perspective. The state-space model of traditional Kalman filtering, which does not contain the nonstationary noise v(k), is:
$\theta(k) = \theta(k-1) + q(k)$
$Y(k) = A\theta(k) + r(k)$ (11)
According to Bayes' rule, the AR parameter estimation problem is expressed as estimating the optimal AR parameter sequence θ(k) given the measured data Y(k), i.e.:
$\hat{\theta}(k) = \arg\max_{\theta(k)} p(\theta(k)\mid Y(k)) = \arg\max_{\theta(k)} p(Y(k)\mid\theta(k))\,p(\theta(k))$ (12)
According to maximum likelihood estimation theory, the likelihood functions of p(Y(k)|θ(k)) and p(θ(k)) are established:
$L_1(Y(k),\theta(k)) \propto \exp\left\{-\tfrac{1}{2}\,(Y(k)-A\theta(k))^T L^{-1}(Y(k)-A\theta(k))\right\}$ (13)
$L_2(\theta(k)) \propto \exp\left\{-\tfrac{1}{2}\,(\theta(k)-\hat{\theta}(k|k-1))^T \Psi^{-1}(\theta(k)-\hat{\theta}(k|k-1))\right\}$ (14)
where $\hat{\theta}(k|k-1)$ is the one-step prediction of the AR parameters and Ψ is the covariance matrix of the conditional probability p(θ(k)|Y(k)), $\Psi(k) = P_\theta(k|k) + D(k)$, with $P_\theta(k|k)$ the updated covariance. When the likelihood functions L1(Y(k),θ(k)) and L2(θ(k)) reach their maximum, the conditional probability p(θ(k)|Y(k)) attains its optimal estimate. Inspection of (13) and (14) shows that maximizing L1 and L2 is equivalent to minimizing their exponents, which gives the following optimization problem:
$\min_{\theta(k),\,r(k)}\ r(k)^T L^{-1} r(k) + (\theta(k)-\hat{\theta}(k|k-1))^T \Psi^{-1} (\theta(k)-\hat{\theta}(k|k-1))$
subject to $Y(k) = A\theta(k) + r(k)$ (15)
where θ(k) and r(k) are the optimization variables and $\Psi(k) = P_\theta(k|k) + D(k)$. The solution θ(k) is the AR parameter estimate, and r(k) is the estimate of the Gaussian noise. $P_\theta(k|k)$ is the updated covariance matrix:
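$P_\theta(k|k) = (I - K_\theta(k)A(k))\,P_\theta(k|k-1)$ (16)
$P_\theta(k|k-1)$ is the predicted covariance matrix:
$P_\theta(k|k-1) = P_\theta(k-1|k-1) + D(k-1)$ (17)
$K_\theta(k)$ is the Kalman gain:
$K_\theta(k) = P_\theta(k|k-1)\,A^T\left(A\,P_\theta(k|k-1)\,A^T + L(k-1)\right)^{-1}$ (18)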
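Setting the sparse term aside, eqs. (16) to (18) describe a Kalman filter for the random-walk model (11), which behaves like recursive least squares with a slow forgetting effect controlled by D. In the sketch below (illustrative names and values; D and L are taken as scalars with D(k-1) = D·I, and the noise means d and l are assumed to be zero) the filter is run on a simulated AR(2) sequence, and its estimate approaches the true coefficients.

```python
import random


def ar_kalman_step(theta, P, A, y, D, L):
    """One step of eqs. (17), (18) and (16) for the random-walk model (11).

    theta: theta(k-1|k-1) = [a_p, ..., a_1]; P: P_theta(k-1|k-1) as a list of lists;
    A: row [S(k-p), ..., S(k-1)]; D: scalar so that D(k-1) = D * I; L: scalar variance.
    """
    p = len(theta)
    Pp = [[P[i][j] + (D if i == j else 0.0) for j in range(p)] for i in range(p)]  # eq. (17)
    PA = [sum(Pp[i][j] * A[j] for j in range(p)) for i in range(p)]               # P A^T
    S = sum(A[i] * PA[i] for i in range(p)) + L                                    # A P A^T + L
    K = [PA[i] / S for i in range(p)]                                              # eq. (18)
    e = y - sum(A[i] * theta[i] for i in range(p))                                 # innovation
    theta_new = [theta[i] + K[i] * e for i in range(p)]
    P_new = [[Pp[i][j] - K[i] * PA[j] for j in range(p)] for i in range(p)]       # eq. (16)
    return theta_new, P_new


rng = random.Random(1)
a1, a2 = 1.3, -0.6                   # hypothetical true coefficients
s = [0.0, 0.0]
for _ in range(3000):
    s.append(a1 * s[-1] + a2 * s[-2] + rng.gauss(0.0, 1.0))

theta = [0.0, 0.0]                   # theta(0|0) = [a_2, a_1]
P = [[1.0, 0.0], [0.0, 1.0]]         # positive-definite P_theta(0|0)
for k in range(2, len(s)):
    theta, P = ar_kalman_step(theta, P, [s[k - 2], s[k - 1]], s[k], D=1e-8, L=1.0)
# theta should now be close to [a_2, a_1] = [-0.6, 1.3]
```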
4.3) Construct the optimization problem for estimating the nonstationary noise from the convex-optimization perspective
The nonstationary noise obeys a Laplacian distribution and is sparse. The core idea of nonstationary-noise estimation is to exploit this sparsity. Once step 4.2) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the nonstationary noise v(k) is added to the optimization to estimate the sparse noise. The new optimization problem is:
$\min_{\theta(k),\,r(k),\,v(k)}\ r(k)^T L^{-1} r(k) + (\theta(k)-\hat{\theta}(k|k-1))^T \Psi^{-1} (\theta(k)-\hat{\theta}(k|k-1)) + \lambda\|v(k)\|_1$
subject to $Y(k) = A\theta(k) + r(k) + v(k)$ (19)
where v(k) is the sparse noise, $\hat{\theta}(k|k-1)$ is the predicted AR parameter vector, and λ > 0 weights the ℓ1 sparsity term, which corresponds to the Laplacian distribution of v(k). Solving this problem yields the optimal estimate θ(k) of the AR parameters. The problem expressed by formula (19) is convex and can be solved with the interior-point method used in engineering practice;
5) Estimate the speech signal state sequence
5.1) Reformulate the traditional Kalman filtering problem from the convex-optimization perspective
To estimate the sparse noise conveniently, the Kalman filtering problem must be reformulated from the convex-optimization perspective. The state-space model of traditional Kalman filtering is:
$X(k) = F X(k-1) + p(k)$ (20)
$Y(k) = C X(k) + n(k)$ (21)
According to Bayes' rule, the Kalman filtering problem is expressed as estimating the optimal speech state sequence X(k) given the measured data Y(k), i.e.:
$\hat{X}(k) = \arg\max_{X(k)} p(X(k)\mid Y(k)) = \arg\max_{X(k)} p(Y(k)\mid X(k))\,p(X(k))$ (22)
According to maximum likelihood estimation theory, the likelihood functions of p(Y(k)|X(k)) and p(X(k)) are established:
$L_1(Y(k),X(k)) \propto \exp\left\{-\tfrac{1}{2}\,(Y(k)-CX(k))^T R^{-1}(Y(k)-CX(k))\right\}$ (23)
$L_2(X(k)) \propto \exp\left\{-\tfrac{1}{2}\,(X(k)-\hat{X}(k|k-1))^T \Theta^{-1}(X(k)-\hat{X}(k|k-1))\right\}$ (24)
where $\hat{X}(k|k-1)$ is the one-step state prediction, R is the covariance of n(k), and Θ is the covariance matrix of the conditional probability p(X(k)|Y(k-1)), $\Theta = F P(k-1|k-1) F^T + Q(k-1)$, where P(k-1|k-1) is the updated covariance. When the likelihood functions L1(Y(k),X(k)) and L2(X(k)) reach their maximum, the conditional probability p(X(k)|Y(k)) attains its optimal estimate. Inspection of (23) and (24) shows that maximizing L1 and L2 is equivalent to minimizing their exponents, which gives the following optimization problem:
$\min_{X(k),\,n(k)}\ n(k)^T R^{-1} n(k) + (X(k)-\hat{X}(k|k-1))^T \Theta^{-1} (X(k)-\hat{X}(k|k-1))$
subject to $Y(k) = CX(k) + n(k)$ (25)
where X(k) and n(k) are the optimization variables. The solution X(k) is the state estimate, and n(k) is the estimate of the Gaussian noise;
P(k|k) is the updated covariance matrix:
$P(k|k) = (I - K(k)C(k))\,P(k|k-1)$ (26)
P(k|k-1) is the predicted covariance matrix:
$P(k|k-1) = F(k-1)\,P(k-1|k-1)\,F(k-1)^T + Q(k-1)$ (27)
K(k) is the Kalman gain:
$K(k) = P(k|k-1)\,C^T\left(C\,P(k|k-1)\,C^T + R(k-1)\right)^{-1}$ (28)
5.2) Construct the sparse-noise estimation problem from the convex-optimization perspective
The core idea of sparse-noise estimation is to exploit the sparsity of the noise. Once step 5.1) has converted the traditional Kalman filtering problem into a convex optimization problem, a sparsity constraint on the sparse noise v(k) is added to the optimization to estimate the sparse noise. The new optimization problem is:
$\min_{X(k),\,n(k),\,v(k)}\ n(k)^T R^{-1} n(k) + (X(k)-\hat{X}(k|k-1))^T \Theta^{-1} (X(k)-\hat{X}(k|k-1)) + \lambda\|v(k)\|_1$
subject to $Y(k) = CX(k) + n(k) + v(k)$ (29)
where v(k) is the sparse noise, $\hat{X}(k|k-1)$ is the one-step state prediction, and λ > 0 weights the ℓ1 sparsity term. Solving this problem yields the optimal estimate X(k) of the speech signal state (X(k) plays the role of the optimal state estimate in traditional Kalman filtering). The problem expressed by formula (29) is convex and can be solved with the interior-point method used in engineering practice;
5.3) After the speech signal at time k has been enhanced, the enhanced result $\hat{X}(k)$ is returned to step 4), where it is used to update the AR parameters θ(k+1) for time k+1. Speech enhancement at time k+1 then proceeds to estimate X(k+1), and so on until all of the speech signal has been processed.
CN201610843483.0A 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise Expired - Fee Related CN106340304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610843483.0A CN106340304B (en) 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610843483.0A CN106340304B (en) 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise

Publications (2)

Publication Number Publication Date
CN106340304A CN106340304A (en) 2017-01-18
CN106340304B true CN106340304B (en) 2019-09-06

Family

ID=57840174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610843483.0A Expired - Fee Related CN106340304B (en) 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise

Country Status (1)

Country Link
CN (1) CN106340304B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110248212B (en) * 2019-05-27 2020-06-02 上海交通大学 Multi-user 360-degree video stream server-side code rate self-adaptive transmission method and system
CN110648680A (en) * 2019-09-23 2020-01-03 腾讯科技(深圳)有限公司 Voice data processing method and device, electronic equipment and readable storage medium
CN112557925B (en) * 2020-11-11 2023-05-05 国联汽车动力电池研究院有限责任公司 Lithium ion battery SOC estimation method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102890935A (en) * 2012-10-22 2013-01-23 北京工业大学 Robust speech enhancement method based on fast Kalman filtering
CN103323815A (en) * 2013-03-05 2013-09-25 上海交通大学 Underwater acoustic locating method based on equivalent sound velocity
CN103903630A (en) * 2014-03-18 2014-07-02 北京捷通华声语音技术有限公司 Method and device used for eliminating sparse noise

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010091077A1 (en) * 2009-02-03 2010-08-12 University Of Ottawa Method and system for a multi-microphone noise reduction

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
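A Kalman filter with online parameter adjustment and its application; Wu Fei; Computer Engineering and Science (《计算机工程与科学》); 2012-06-15 (No. 6); full text
An improved Kalman filtering algorithm based on convex optimization; Feng Bao; Automation & Information Engineering (《自动化与信息工程》); 2014-10-15; Vol. 35 (No. 5); full text
Research on robust Kalman algorithms and their application; Wu Fei; China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 2013-01-15 (No. 1); full text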

Also Published As

Publication number Publication date
CN106340304A (en) 2017-01-18

Similar Documents

Publication Publication Date Title
CN107845389B (en) Speech enhancement method based on multi-resolution auditory cepstrum coefficient and deep convolutional neural network
CN109524020B (en) Speech enhancement processing method
CN106971741B (en) Method and system for voice noise reduction for separating voice in real time
WO2020107269A1 (en) Self-adaptive speech enhancement method, and electronic device
Khorram et al. Capturing long-term temporal dependencies with convolutional networks for continuous emotion recognition
CN107393550A (en) Method of speech processing and device
CN111261183B (en) Method and device for denoising voice
CN112735456B (en) Speech enhancement method based on DNN-CLSTM network
CN106340304B (en) A kind of online sound enhancement method under the environment suitable for nonstationary noise
CN103559888A (en) Speech enhancement method based on non-negative low-rank and sparse matrix decomposition principle
CN106971740A (en) Probability and the sound enhancement method of phase estimation are had based on voice
CN109192200A (en) A kind of audio recognition method
CN110797033A (en) Artificial intelligence-based voice recognition method and related equipment thereof
CN114242098A (en) Voice enhancement method, device, equipment and storage medium
CN115171712A (en) Speech enhancement method suitable for transient noise suppression
CN113241089B (en) Voice signal enhancement method and device and electronic equipment
CN103903624B (en) Periodical pitch detection method under a kind of gauss heat source model environment
Sun et al. Wavelet denoising method based on improved threshold function
CN113066483B (en) Sparse continuous constraint-based method for generating countermeasure network voice enhancement
Deng et al. Sparse HMM-based speech enhancement method for stationary and non-stationary noise environments
CN113793615A (en) Speaker recognition method, model training method, device, equipment and storage medium
Ding et al. Suppression of additive noise using a power spectral density MMSE estimator
CN102256201A (en) Automatic environmental identification method used for hearing aid
CN107993666B (en) Speech recognition method, speech recognition device, computer equipment and readable storage medium
Chinaev et al. A generalized log-spectral amplitude estimator for single-channel speech enhancement

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190906

CF01 Termination of patent right due to non-payment of annual fee