CN106340304A - Online speech enhancement method for non-stationary noise environment - Google Patents

Online speech enhancement method for non-stationary noise environment

Info

Publication number
CN106340304A
CN106340304A (application CN201610843483.0A)
Authority
CN
China
Prior art keywords
noise
theta
estimation
parameter
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610843483.0A
Other languages
Chinese (zh)
Other versions
CN106340304B (en)
Inventor
冯宝
张绍荣
孙山林
郑伟
张国宁
武博
韦周耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Aerospace Technology
Original Assignee
Guilin University of Aerospace Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Aerospace Technology filed Critical Guilin University of Aerospace Technology
Priority to CN201610843483.0A priority Critical patent/CN106340304B/en
Publication of CN106340304A publication Critical patent/CN106340304A/en
Application granted granted Critical
Publication of CN106340304B publication Critical patent/CN106340304B/en
Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0264: Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention provides an online speech enhancement method for a non-stationary noise environment. The method comprises the steps of (1) establishing a system model in a non-stationary noise environment, (2) framing and windowing, (3) carrying out system initialization, (4) estimating the AR parameters, and (5) estimating the speech signal state sequence. To address the problem that the AR parameters of the speech model cannot be updated in real time as the noise changes, the invention puts forward a dual Kalman filtering framework: two Kalman filters run in parallel, the speech signal state estimate and the AR parameter estimate update each other, and the data estimation and parameter estimation processes alternate, so that the parameter estimation can adapt to the changing noise, the accuracy of the system model is improved, and the speech enhancement performance is enhanced. To address the problem that the traditional Kalman filtering algorithm cannot handle non-stationary noise, an improved Kalman filtering framework is put forward in combination with a convex optimization technique, so that Gaussian noise and non-stationary noise can be estimated accurately and the accuracy of speech enhancement is improved.

Description

Online voice enhancement method suitable for non-stationary noise environment
Technical Field
The invention relates to the field of voice enhancement, in particular to an online voice enhancement method suitable for a non-stationary noise environment.
Background
In speech recognition front-end processing, speech signals are constantly interfered with and submerged by various noises, and because this interference is random, only signal processing techniques can enhance the speech quality as far as possible. The main purpose of speech enhancement is to extract the clean original speech from noisy speech.
The commonly used speech enhancement algorithms are mainly the following:
1. Noise cancellation. The noise component is subtracted directly from the noisy speech in the time or frequency domain. The main characteristic of this method is that a background signal is required as a reference signal, and whether this reference signal is accurate directly determines the performance of the method.
2. Harmonic enhancement. Voiced speech has obvious periodicity, which appears in the frequency domain as a series of peaks corresponding to the fundamental frequency (pitch) and its harmonics; these components carry most of the energy of speech. This periodicity can be exploited for speech enhancement: a comb filter extracts the pitch and its harmonic components, thereby suppressing other periodic noise and aperiodic broadband noise.
3. Enhancement algorithms based on a speech generation model. The speech production process can be modeled as a linear time-varying filter, with different excitation sources for different types of speech. Among the generative models of speech, the all-pole model is the most widely used. Based on the speech generation model, a series of speech enhancement algorithms can be derived, such as time-varying Wiener filtering and Kalman filtering methods.
4. Enhancement algorithms based on short-time spectral estimation. These come in many variants, such as spectral subtraction, Wiener filtering and minimum mean square error methods. Their advantages include a wide usable range of signal-to-noise ratios, simplicity and ease of real-time processing.
5. Wavelet decomposition methods, which developed along with wavelet decomposition as a mathematical analysis tool and combine some basic principles of spectral subtraction.
6. Auditory masking methods, which are enhancement algorithms exploiting the auditory properties of the human ear.
Speech enhancement algorithms based on Kalman filtering belong to the third category above. Conventional Kalman filtering relies on two important assumptions when performing speech enhancement: both the process noise and the measurement noise follow Gaussian distributions. Traditional Kalman filtering therefore has the following limitations in practical speech enhancement. First, the estimation of the AR parameters must be accurate; in a real speech acquisition environment, however, the noise changes constantly, so the AR parameters of the speech model must be estimated in real time and the various noises must be taken into account during that estimation, otherwise the speech enhancement performance degrades. Second, the traditional Kalman filtering algorithm considers only Gaussian noise and is therefore ill-suited to practical applications: the speech acquisition process can be contaminated by non-stationary noise (sparse, following a Laplacian distribution), which is not common but does exist and strongly affects speech quality. If this non-stationary noise is treated as Gaussian noise during speech enhancement, the enhancement quality drops severely, which is detrimental to subsequent speech semantic recognition.
Based on the above problems, it is necessary to provide an online speech enhancement technique that can handle both Gaussian noise and non-stationary noise in real time.
Disclosure of Invention
The technical problem the invention aims to solve is that existing Kalman filtering methods can neither update the AR parameters of the speech model in real time nor handle the non-stationary noise present in the measurement process. By combining a convex optimization technique, the invention provides an online speech enhancement method suitable for a non-stationary noise environment, so that both the AR parameters and the non-stationary noise can be estimated online.
To achieve this purpose, the technical scheme provided by the invention is as follows. An online speech enhancement method suitable for a non-stationary noise environment comprises the following steps:
1) establishing a system model in a non-stationary noise environment
1.1) establishing an autoregressive AR model under the condition that Gaussian noise and sparse noise coexist
The generation process of the speech signal is an autoregressive process excited by white noise and output by an all-pole linear system, namely the current output is equal to the weighted sum of the excitation signal at the current moment and the outputs at p past moments, which is an autoregressive AR model and is expressed as follows:
s(k) = \sum_{i=1}^{p} a_i s(k-i) + u(k)    (1)
where u(k) is the Gaussian white noise excitation at time k; s(k-i) is the speech signal at time (k-i); s(k) is the speech signal at time k; a_i is the i-th linear prediction coefficient, also called an AR model parameter; p is the order of the AR model;
establishing a voice signal model conforming to an actual measurement process, wherein the voice signal measurement process is described as follows:
Y(k)=s(k)+n(k)+v(k) (2)
wherein Y (k) is a measurement sequence of the voice signal at time k; s (k) is a speech signal at time k; n (k) is white Gaussian noise at time k; v (k) is non-stationary noise at the moment k, obeys Laplace distribution and has sparsity;
1.2) establishing a speech signal state space model
Converting equations (1) and (2) into a state space model, described as follows:
X(k)=FX(k-1)+p(k) (3)
Y(k)=CX(k)+n(k)+v(k) (4)
wherein,
F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix}    (5)
C = [0 0 ... 0 1]    (6)
X(k) = [s(k-p+1) ... s(k)]^T    (7)
in the speech signal state equation (3) and the speech signal measurement equation (4), X(k) is the speech signal state estimation sequence at time k, that is, the optimal state estimate of the speech signal; X(k-1) is the speech signal state estimation sequence at time (k-1); Y(k) is the measurement sequence of the speech signal at time k; F is the state transition matrix formed by the linear prediction coefficients, and the last row of F, [a_p(k) ... a_1(k)], is referred to as the AR parameters; C = [0 0 ... 0 1] is the measurement transfer matrix; p(k) is the state noise at time k, which follows a Gaussian distribution; n(k) is the measurement noise at time k, which follows a Gaussian distribution; v(k) is the non-stationary noise at time k, which follows a Laplacian distribution;
the statistical properties of the state noise p(k) and the measurement noise n(k) are:
E(p(k)) = q,  E(n(k)) = r
E(p(k)p(j)^T) = Q δ_{kj},  E(n(k)n(j)^T) = R δ_{kj}    (8)
where q and r are the means of the noises p(k) and n(k), respectively; Q and R are the covariances of p(k) and n(k), respectively; δ_{kj} is the Kronecker delta; the speech enhancement problem is to estimate the optimal speech signal X(k) on the premise that the measured speech signal Y(k) is known;
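For illustration, the companion-form matrices in equations (3)-(7) can be assembled directly from a set of linear prediction coefficients. The following is a minimal sketch assuming NumPy; the function name build_state_space and its argument are illustrative and not part of the claimed method.

```python
import numpy as np

def build_state_space(ar_coeffs):
    """Assemble F (eq. 5) and C (eq. 6) from AR coefficients [a_1, ..., a_p]."""
    p = len(ar_coeffs)
    F = np.zeros((p, p))
    F[:-1, 1:] = np.eye(p - 1)       # shifted identity: rows 1..p-1 of eq. (5)
    F[-1, :] = ar_coeffs[::-1]       # last row: [a_p, a_{p-1}, ..., a_1]
    C = np.zeros((1, p))
    C[0, -1] = 1.0                   # measurement picks out the newest sample s(k)
    return F, C

# The state X(k) of eq. (7) stacks the last p samples: [s(k-p+1), ..., s(k)]^T.
```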
2) framing and windowing
A speech signal is short-time stationary and can be considered unchanged within 10-30 ms, so it can be divided into short segments for processing, i.e. frames; framing is realized by weighting the signal with a finite-length moving window; the number of frames per second is usually 33-100, the framing method is overlapped segmentation, the overlapping part of consecutive frames is called the frame shift, and the ratio of frame shift to frame length is 0-0.5;
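As an illustration of step 2), framing and windowing can be realized as below. This is a minimal sketch assuming NumPy and a Hamming window, with the 25 ms frame length and 10 ms frame shift used later in the embodiment; the function name and arguments are illustrative.

```python
import numpy as np

def frame_signal(x, fs, frame_ms=25, shift_ms=10):
    """Split signal x (sampled at fs Hz) into overlapping, windowed frames.

    Assumes len(x) is at least one frame long; frame/shift values follow the
    embodiment (25 ms frames, 10 ms shift, i.e. a shift/length ratio of 0.4).
    """
    frame_len = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    window = np.hamming(frame_len)               # finite-length moving window
    n_frames = 1 + (len(x) - frame_len) // shift
    return np.stack([x[i * shift:i * shift + frame_len] * window
                     for i in range(n_frames)])
```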
3) system initialization
3.1) improved Kalman Filter parameter initialization
Initializing a speech signal state estimation sequence X (0/0) and a covariance matrix P (0/0), and ensuring that the covariance matrix is positive definite;
3.2) AR parameter initialization
Initializing an AR parameter state estimation sequence θ (0/0);
4) estimating AR parameters
The AR parameters are the last row [a_p(k) ... a_1(k)] of the state transition matrix F in equation (3); they mainly describe the speech generation process, and their accuracy has a direct influence on the speech enhancement result; the method proposes that the speech signal state estimation sequence X(k-1), the state noise q(k), the measurement noise n(k) and the non-stationary noise v(k) are all taken into account in the estimation of the AR parameters, and a new AR parameter estimation state space model is established to realize online robust estimation of the AR parameters; the real-time estimation process of the AR parameters is as follows:
4.1) establishing a parameter estimation model of the AR parameters
The AR parameter model under the environment mixed by Gaussian noise and non-stationary noise is described as follows:
θ(k)=θ(k-1)+q(k)
Y(k)=Aθ(k)+r(k)+w(k) (9)
where θ(k) = [a_p(k) ... a_1(k)]^T is the AR parameter state sequence at time k; q(k) is the state noise at time k, which follows a Gaussian distribution with covariance matrix D(k); r(k) is the measurement noise at time k, which follows a Gaussian distribution with covariance matrix L(k); w(k) is the non-stationary measurement noise at time k, which follows a Laplacian distribution and is sparse; A = X(k-1)^T = [s(k-p) ... s(k-1)] is the measurement matrix; Y(k) is the measurement sequence of the speech signal at time k; the statistical properties of the state noise q(k) and the measurement noise r(k) are:
E(q(k)) = d,  E(r(k)) = l
E(q(k)q(j)^T) = D δ_{kj},  E(r(k)r(j)^T) = L δ_{kj}    (10)
where d and l are the means of the noises q(k) and r(k), respectively; D and L are the covariances of q(k) and r(k), respectively; δ_{kj} is the Kronecker delta;
4.2) reconstructing the conventional Kalman filtering problem from a convex optimization perspective
In order to conveniently estimate sparse noise, the kalman filtering problem needs to be reconstructed from the perspective of convex optimization, and a state space model of the conventional kalman filtering does not contain non-stationary noise w (k), as follows:
θ(k)=θ(k-1)+q(k)
Y(k) = Aθ(k) + r(k)    (11)
according to the bayesian principle, the AR parameter estimation problem is expressed as estimating an optimal AR parameter sequence θ (k) on the premise that the measured data y (k) is known, that is:
p(\theta(k) | Y(k)) = \frac{p(Y(k) | \theta(k))\, p(\theta(k))}{p(Y(k))}    (12)
establishing a likelihood function of p (Y (k) | theta (k)) and p (theta (k)) according to the maximum likelihood estimation theory:
L_1(Y(k), \theta(k)) = p(Y(k) | \theta(k)) = p(r(k)) = \frac{1}{\sqrt{(2\pi)^m |L|}} \exp\left(-\frac{1}{2} r^T(k) L^{-1} r(k)\right)    (13)
L_2(\theta(k)) = p(\theta(k)) = \frac{1}{\sqrt{(2\pi)^n |\Psi(k)|}} \exp\left(-\frac{1}{2} (\theta(k) - \hat{\theta}(k|k-1))^T \Psi(k)^{-1} (\theta(k) - \hat{\theta}(k|k-1))\right)    (14)
where Ψ(k) = P_θ(k|k) + D(k) is the covariance matrix of the conditional probability p(θ(k) | Y(k)), and P_θ(k|k) is the covariance update value; when the likelihood functions L_1(Y(k), θ(k)) and L_2(θ(k)) attain their maxima, the conditional probability p(θ(k) | Y(k)) yields the optimal estimate; inspection of equations (13) and (14) shows that maximizing L_1(Y(k), θ(k)) and L_2(θ(k)) is equivalent to minimizing the quadratic exponents r^T(k) L^{-1} r(k) and (θ(k) - \hat{\theta}(k|k-1))^T Ψ(k)^{-1} (θ(k) - \hat{\theta}(k|k-1)), which gives the following optimization form:
minimize    r^T(k) L^{-1} r(k) + (\theta(k) - \hat{\theta}(k|k-1))^T \Psi(k)^{-1} (\theta(k) - \hat{\theta}(k|k-1))
subject to  Y(k) = A\theta(k) + r(k)    (15)
where θ(k) and r(k) are the optimization variables and Ψ(k) = P_θ(k|k) + D(k) is the covariance matrix of the Gaussian noise; the value of θ(k) that solves this problem is the updated estimate of the AR parameters, and the value of r(k) is the estimate of the Gaussian noise; P_θ(k|k) is the covariance update matrix:
P_θ(k|k) = (I - K_θ(k)A(k)) P_θ(k|k-1)    (16)
P_θ(k|k-1) is the covariance prediction matrix:
P_θ(k|k-1) = P_θ(k-1|k-1) + D(k-1)    (17)
K_θ(k) is the filter gain:
K_θ(k) = P_θ(k|k-1) A^T (A P_θ(k|k-1) A^T + L(k-1))^{-1}    (18)
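A minimal NumPy sketch of the covariance recursion (16)-(18) of the AR parameter filter follows; the argument names (P_prev, D, L_cov) are assumptions, and A denotes the 1 x p measurement row X(k-1)^T.

```python
import numpy as np

def ar_covariance_step(P_prev, A, D, L_cov):
    """One pass of eqs. (16)-(18): predict, compute gain, update covariance."""
    P_pred = P_prev + D                                     # eq. (17)
    S = A @ P_pred @ A.T + L_cov                            # innovation covariance
    K = P_pred @ A.T @ np.linalg.inv(S)                     # eq. (18)
    P_upd = (np.eye(P_prev.shape[0]) - K @ A) @ P_pred      # eq. (16)
    return P_pred, P_upd, K
```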
4.3) constructing an optimization problem for non-stationary noise estimation from a convex optimization perspective
The non-stationary noise follows a Laplacian distribution and is sparse. The core idea of non-stationary noise estimation is to exploit this sparsity: after the traditional Kalman filtering problem has been converted into a convex optimization problem in step 4.2), the estimation of the sparse noise is completed by adding a sparsity constraint on the non-stationary noise w(k) to the optimization, giving the new optimization form:
minimize    r^T(k) L^{-1} r(k) + (\theta(k) - \hat{\theta}(k|k-1))^T \Psi(k)^{-1} (\theta(k) - \hat{\theta}(k|k-1)) + \lambda \|w(k)\|_1
subject to  Y(k) = A\theta(k) + r(k) + w(k)    (19)
where w(k) is the sparse noise; solving this optimization problem yields the optimal estimate of the AR parameters θ(k); the optimization problem in (19) is convex and can be solved in practice with an interior-point method;
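The l1-regularized problem (19) can be posed directly in an off-the-shelf convex solver. The sketch below uses CVXPY as one possible tool (the text only requires an interior-point method); the function name, argument names and the default weight lam are illustrative assumptions, with Psi and L_cov standing for the Gaussian covariances Ψ(k) and L passed as arrays.

```python
import numpy as np
import cvxpy as cp

def estimate_ar_parameters(y_k, A, theta_pred, Psi, L_cov, lam=1.0):
    """Solve optimization (19) for theta(k), the Gaussian noise r(k) and the
    sparse noise w(k), given the scalar measurement y_k and A = X(k-1)^T."""
    p = theta_pred.shape[0]
    theta = cp.Variable(p)
    r = cp.Variable(1)                      # Gaussian measurement noise
    w = cp.Variable(1)                      # sparse (Laplacian) noise
    L_inv = np.linalg.inv(L_cov); L_inv = 0.5 * (L_inv + L_inv.T)       # symmetrize
    Psi_inv = np.linalg.inv(Psi); Psi_inv = 0.5 * (Psi_inv + Psi_inv.T)
    cost = (cp.quad_form(r, L_inv)
            + cp.quad_form(theta - theta_pred, Psi_inv)
            + lam * cp.norm1(w))
    constraints = [y_k == A @ theta + r + w]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return theta.value, r.value, w.value
```

In such a setup the returned w estimate is nonzero only at the instants hit by non-stationary bursts, which is exactly the behaviour the l1 term encourages.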
5) estimating a speech signal state sequence
5.1) reconstructing the conventional Kalman filtering problem from a convex optimization perspective
In order to conveniently estimate sparse noise, the kalman filtering problem needs to be reconstructed from the perspective of convex optimization, and a state space model of the conventional kalman filtering is as follows:
X(k)=FX(k-1)+p(k) (20)
Y(k)=CX(k)+n(k) (21)
according to the bayesian principle, the kalman filtering problem is expressed as estimating an optimal speech state sequence x (k) on the premise that the measured data y (k) is known, that is:
p(X(k) | Y(k)) = \frac{p(Y(k) | X(k))\, p(X(k))}{p(Y(k))}    (22)
establishing a likelihood function of p (Y (k) | X (k)) and p (X (k)) according to the maximum likelihood estimation theory:
L_1(Y(k), X(k)) = p(Y(k) | X(k)) = p(n(k)) = \frac{1}{\sqrt{(2\pi)^m |R|}} \exp\left(-\frac{1}{2} n^T(k) R^{-1} n(k)\right)    (23)
L_2(X(k)) = p(X(k)) = \frac{1}{\sqrt{(2\pi)^n |\Theta|}} \exp\left(-\frac{1}{2} (X(k) - \hat{X}(k|k-1))^T \Theta^{-1} (X(k) - \hat{X}(k|k-1))\right)    (24)
where Θ = F P(k-1|k-1) F^T + Q(k-1) is the covariance matrix of the conditional probability p(X(k) | Y(k-1)), and P(k-1|k-1) is the covariance update value; when the likelihood functions L_1(Y(k), X(k)) and L_2(X(k)) attain their maxima, the conditional probability p(X(k) | Y(k)) yields the optimal estimate; inspection of equations (23) and (24) shows that maximizing L_1(Y(k), X(k)) and L_2(X(k)) is equivalent to minimizing the quadratic exponents n^T(k) R^{-1} n(k) and (X(k) - \hat{X}(k|k-1))^T Θ^{-1} (X(k) - \hat{X}(k|k-1)), which gives the following optimization form:
minimize    n^T(k) R^{-1} n(k) + (X(k) - \hat{X}(k|k-1))^T \Theta^{-1} (X(k) - \hat{X}(k|k-1))
subject to  Y(k) = CX(k) + n(k)    (25)
where X(k) and n(k) are the optimization variables and Θ is the covariance matrix of the Gaussian noise; the value of X(k) that solves this problem is the updated state estimate, and the value of n(k) is the estimate of the Gaussian noise;
P(k|k) is the covariance update matrix:
P(k|k) = (I - K(k)C) P(k|k-1)    (26)
P(k|k-1) is the covariance prediction matrix:
P(k|k-1) = F(k-1) P(k-1|k-1) F(k-1)^T + Q(k-1)    (27)
K(k) is the filter gain:
K(k) = P(k|k-1) C^T (C P(k|k-1) C^T + R(k-1))^{-1}    (28)
5.2) constructing the estimation problem of sparse noise from the convex optimization angle
The core idea of sparse noise estimation is to exploit the sparsity of the noise: after the traditional Kalman filtering problem has been converted into a convex optimization problem in step 5.1), the estimation of the sparse noise is completed by adding a sparsity constraint on the sparse noise v(k) to the optimization:
minimize    n^T(k) R^{-1} n(k) + (X(k) - \hat{X}(k|k-1))^T \Theta^{-1} (X(k) - \hat{X}(k|k-1)) + \lambda \|v(k)\|_1
subject to  Y(k) = CX(k) + n(k) + v(k)    (29)
where v(k) is the sparse noise; solving this optimization problem yields the optimal estimate of the speech signal state X(k), which corresponds to the optimal state estimate of traditional Kalman filtering; the optimization problem in (29) is convex and can be solved in practice with an interior-point method;
5.3) after the enhancement of the speech signal at time k is finished, the enhanced state estimate X(k) is returned to step 4) to update the AR parameter θ(k+1) at time k+1, and speech enhancement then continues at time k+1 to estimate X(k+1), until all speech signals have been processed.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Aiming at the problem that the AR parameters in the speech model (specifically an autoregressive AR model) cannot be updated in real time as the noise changes, the invention provides a dual Kalman filtering framework: two Kalman filters operate in parallel, the speech signal state estimate and the AR parameter estimate update each other, and the state estimation and parameter estimation processes alternate, so that the parameter estimation can adapt to the changing noise, the accuracy of the system model is improved, and the speech enhancement performance is improved.
2. Aiming at the problem that the traditional Kalman filtering algorithm cannot handle non-stationary noise, the invention provides an improved Kalman filtering framework combined with a convex optimization technique. The new algorithm adds Gaussian noise and non-stationary noise terms to the measurement process of the speech enhancement model, and by establishing a reasonable optimization model with the convex optimization technique, the Gaussian noise and the non-stationary noise can be estimated accurately, improving the accuracy of speech enhancement.
Drawings
FIG. 1 is a flow chart of a method of speech enhancement under non-stationary noise.
FIG. 2a is a diagram of an original speech signal.
FIG. 2b is a diagram of a speech signal with white Gaussian noise.
FIG. 2c is a diagram of a speech signal with white Gaussian noise and non-stationary noise.
FIG. 3 is a flow chart of a speech enhancement algorithm based on dual modified Kalman filtering.
Fig. 4a is an original speech signal.
FIG. 4b is a diagram illustrating the speech enhancement result.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in fig. 1, the online speech enhancement method applicable to a non-stationary noise environment according to this embodiment includes the following steps:
1) establishing a system model in a non-stationary noise environment
1.1) establishing an autoregressive AR model under the condition that Gaussian noise and sparse noise coexist
The generation process of a speech signal can be described as an autoregressive process excited by white noise and output through an all-pole linear system, i.e. the current output equals the weighted sum of the excitation signal at the current moment and the outputs at the past p moments; this is the autoregressive AR model, expressed as follows:
s(k) = \sum_{i=1}^{p} a_i s(k-i) + u(k)    (1)
where u(k) is the Gaussian white noise excitation at time k; s(k-i) is the speech signal at time (k-i); s(k) is the speech signal at time k; a_i is the i-th linear prediction coefficient, also called an AR model parameter; p is the order of the AR model.
As shown in fig. 2a, 2b, and 2c, a speech signal observed in a real environment is polluted by various noises, especially non-stationary noises. The speech signal measurement process of the present invention can be described as follows:
Y(k)=s(k)+n(k)+v(k) (2)
wherein Y (k) is a measurement sequence of the voice signal at time k; s (k) is a speech signal at time k; n (k) is white Gaussian noise at time k; v (k) is non-stationary noise at the time k, follows Laplace distribution, and has sparsity.
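To make the measurement model (2) concrete, the snippet below synthesizes an AR-generated signal as in equation (1) and contaminates it with Gaussian noise plus sparse, Laplacian-distributed bursts, in the spirit of figs. 2a-2c. The AR coefficients, noise scales and burst count are arbitrary illustrative values, not data from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 400, 2
a = np.array([1.3, -0.6])                 # illustrative stable AR(2) coefficients [a_1, a_2]
s = np.zeros(N)
for k in range(p, N):                     # eq. (1): s(k) = sum_i a_i s(k-i) + u(k)
    s[k] = a @ s[k - p:k][::-1] + rng.normal(scale=0.1)

n = rng.normal(scale=0.05, size=N)        # Gaussian measurement noise n(k)
v = np.zeros(N)                           # sparse non-stationary noise v(k)
burst = rng.choice(N, size=8, replace=False)
v[burst] = rng.laplace(scale=0.5, size=8)
Y = s + n + v                             # eq. (2): the observed noisy speech
```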
1.2) establishing a speech signal state space model
Converting equations (1) and (2) into a state space model, the following can be described:
X(k)=FX(k-1)+p(k) (3)
Y(k)=CX(k)+n(k)+v(k) (4)
wherein
F = 0 1 0 ... 0 0 0 1 ... 0 ... ... ... ... ... 0 0 0 ... 1 a p ( k ) a p - 1 ( k ) a p - 2 ( k ) a 1 ( k ) - - - ( 5 )
C=[0 0 … 0 1](6)
X(k)=[S(k-p+1) … S(k)]T(7)
In the speech signal state equation (3) and the speech signal measurement equation (4), x (k) is a speech signal state estimation sequence at the time k, that is, an optimal state estimation of a speech signal; x (k-1) is a speech signal state estimation sequence at the (k-1) moment; y (k) is a measurement sequence of the speech signal at time k; f is a state transition matrix formed by linear prediction coefficients, and the last row [ a ] in Fp(k)… a1(k)]Referred to as AR parameters. (ii) a C ═ 00 … 01]Is a measurement transfer matrix; p (k) is state noise at time k, obeying Gaussian distribution; n (k) is the measurement noise at time k, and follows Gaussian distribution; v (k) is the non-stationary noise at time k, obeying the laplacian distribution.
The state of the speech signal and the statistical properties of the measured noise p (k) and n (k) are:
E(p(k))=q,E(n(k))=r
E(p(k)p(j)T)=Qkj,E(n(k)n(j)T)=Rkj(8)
wherein q and r are mean values of noise p (k) and n (k), respectively; q and R are the covariance of the noise p (k) and n (k), respectively.kjAs a function of Kronecker. The speech enhancement problem is to estimate the optimal speech signal x (k) given the measured speech signal y (k).
2) Framing and windowing
The voice signal has short-time stationarity (the voice signal can be considered to be approximately unchanged within 10-30 ms), so that the voice signal can be divided into a plurality of short sections for processing, namely framing, and framing of the voice signal is realized by adopting a movable window with limited length for weighting. The number of frames per second is generally about 33 to 100 frames. A common framing method is an overlapping segmentation method, the overlapping part of a previous frame and a next frame is called frame shift, and the ratio of the frame shift to the frame length is generally 0-0.5. In the invention, the frame length is 25ms, and the frame shift is 10 ms.
3) System initialization
3.1) improved Kalman Filter parameter initialization
The speech signal state estimation sequence X(0/0) and the covariance matrix P(0/0) are initialized, ensuring that the covariance matrix is positive definite.
3.2) AR parameter initialization
The AR parameter state estimation sequence θ(0/0) is initialized; in the invention the order of the AR model is 13 (set empirically).
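A possible initialization for step 3), assuming the AR order p = 13 of the embodiment; scaled identity matrices are just one convenient way to guarantee positive definite covariances.

```python
import numpy as np

p = 13                        # AR order used in the embodiment
X0 = np.zeros(p)              # X(0/0): initial speech state estimate
theta0 = np.zeros(p)          # theta(0/0): initial AR parameter estimate
P_x0 = 1e2 * np.eye(p)        # P(0/0): positive definite state covariance
P_theta0 = 1e2 * np.eye(p)    # P_theta(0/0): positive definite parameter covariance
```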
4) Estimating AR parameters
The AR parameters are the last row [a_p(k) ... a_1(k)] of the state transition matrix F in equation (3); they mainly describe the speech generation process, and their accuracy has a direct influence on the speech enhancement result. In practical applications, AR parameter estimation is strongly affected by the speech signal and by the various noises, so the invention proposes to take the speech signal state estimation sequence X(k-1), the state noise q(k), the measurement noise n(k) and the non-stationary noise v(k) into account together when estimating the AR parameters, and to establish a new AR parameter estimation state space model; this is one core point of the invention. As shown in fig. 3, the real-time estimation process of the AR parameters is as follows:
4.1) establishing a parameter estimation model of the AR parameters
The AR parameter model under the environment mixed by Gaussian noise and non-stationary noise is described as follows:
θ(k)=θ(k-1)+q(k)
Y(k)=Aθ(k)+r(k)+w(k) (9)
where θ(k) = [a_p(k) ... a_1(k)]^T is the AR parameter state sequence at time k; q(k) is the state noise at time k, which follows a Gaussian distribution with covariance matrix D(k); r(k) is the measurement noise at time k, which follows a Gaussian distribution with covariance matrix L(k); w(k) is the non-stationary measurement noise at time k, which follows a Laplacian distribution and is sparse; A = X(k-1)^T = [s(k-p) ... s(k-1)] is the measurement matrix; Y(k) is the measurement sequence of the speech signal at time k. The statistical properties of the state noise q(k) and the measurement noise r(k) are:
E(q(k)) = d,  E(r(k)) = l
E(q(k)q(j)^T) = D δ_{kj},  E(r(k)r(j)^T) = L δ_{kj}    (10)
where d and l are the means of the noises q(k) and r(k), respectively; D and L are the covariances of q(k) and r(k), respectively. δ_{kj} is the Kronecker delta.
4.2) reconstructing the conventional Kalman filtering problem from a convex optimization perspective
In order to be able to estimate the sparse noise conveniently, the kalman filtering problem needs to be reconstructed from the perspective of convex optimization. The state space model of conventional kalman filtering (without non-stationary noise w (k)) is as follows:
θ(k)=θ(k-1)+q(k)
Y(k)=Aθ(k)+r(k) (11)
according to the bayesian principle, the AR parameter estimation problem can be expressed as estimating an optimal AR parameter sequence θ (k) on the premise that the measured data y (k) is known, that is:
p(\theta(k) | Y(k)) = \frac{p(Y(k) | \theta(k))\, p(\theta(k))}{p(Y(k))}    (12)
establishing a likelihood function of p (Y (k) | theta (k)) and p (theta (k)) according to the maximum likelihood estimation theory:
L_1(Y(k), \theta(k)) = p(Y(k) | \theta(k)) = p(r(k)) = \frac{1}{\sqrt{(2\pi)^m |L|}} \exp\left(-\frac{1}{2} r^T(k) L^{-1} r(k)\right)    (13)
L_2(\theta(k)) = p(\theta(k)) = \frac{1}{\sqrt{(2\pi)^n |\Psi(k)|}} \exp\left(-\frac{1}{2} (\theta(k) - \hat{\theta}(k|k-1))^T \Psi(k)^{-1} (\theta(k) - \hat{\theta}(k|k-1))\right)    (14)
where Ψ(k) = P_θ(k|k) + D(k) is the covariance matrix of the conditional probability p(θ(k) | Y(k)), and P_θ(k|k) is the covariance update value. When the likelihood functions L_1(Y(k), θ(k)) and L_2(θ(k)) attain their maxima, the conditional probability p(θ(k) | Y(k)) yields the optimal estimate. Inspection of equations (13) and (14) shows that maximizing L_1(Y(k), θ(k)) and L_2(θ(k)) is equivalent to minimizing the quadratic exponents r^T(k) L^{-1} r(k) and (θ(k) - \hat{\theta}(k|k-1))^T Ψ(k)^{-1} (θ(k) - \hat{\theta}(k|k-1)), which gives the following optimization form:
minimize    r^T(k) L^{-1} r(k) + (\theta(k) - \hat{\theta}(k|k-1))^T \Psi(k)^{-1} (\theta(k) - \hat{\theta}(k|k-1))
subject to  Y(k) = A\theta(k) + r(k)    (15)
where θ(k) and r(k) are the optimization variables and Ψ(k) = P_θ(k|k) + D(k) is the covariance matrix of the Gaussian noise. The value of θ(k) that solves this problem is the updated estimate of the AR parameters, and the value of r(k) is the estimate of the Gaussian noise. P_θ(k|k) is the covariance update matrix:
P_θ(k|k) = (I - K_θ(k)A(k)) P_θ(k|k-1)    (16)
P_θ(k|k-1) is the covariance prediction matrix:
P_θ(k|k-1) = P_θ(k-1|k-1) + D(k-1)    (17)
K_θ(k) is the filter gain:
K_θ(k) = P_θ(k|k-1) A^T (A P_θ(k|k-1) A^T + L(k-1))^{-1}    (18)
4.3) constructing an optimization problem for non-stationary noise estimation from a convex optimization perspective
The non-stationary noise follows a Laplacian distribution and is sparse. The core idea of non-stationary noise estimation is to exploit this sparsity: after the traditional Kalman filtering problem has been converted into a convex optimization problem in step 4.2), the estimation of the sparse noise is completed by adding a sparsity constraint on the non-stationary noise w(k) to the optimization, giving the new optimization form:
minimize    r^T(k) L^{-1} r(k) + (\theta(k) - \hat{\theta}(k|k-1))^T \Psi(k)^{-1} (\theta(k) - \hat{\theta}(k|k-1)) + \lambda \|w(k)\|_1
subject to  Y(k) = A\theta(k) + r(k) + w(k)    (19)
where w(k) is the sparse noise. Solving the above optimization problem yields the optimal estimate of the AR parameters θ(k). The optimization problem in (19) is convex and can be solved in practice with a mature interior-point method.
5) A sequence of speech signal states is estimated.
In the voice signal acquisition process, non-stationary noise has a large influence on the voice quality. In order to be able to improve speech quality, the speech enhancement algorithm must be able to cope with both gaussian and non-stationary noise mixing. The non-stationary noise generally obeys Laplace distribution and has a sparse characteristic, and the estimation of the non-stationary noise mainly utilizes the sparse characteristic of the noise. In order to introduce noise sparsity constraint in the optimization problem, firstly, the traditional Kalman filtering problem is reconstructed into a convex optimization problem by adopting a convex optimization technology, then sparsity constraint on sparse noise is introduced in newly constructed optimization, and finally, a voice enhancement task is completed, which is another core point of the invention.
5.1) reconstructing the conventional Kalman filtering problem from a convex optimization perspective
In order to be able to estimate the sparse noise conveniently, the kalman filtering problem needs to be reconstructed from the perspective of convex optimization. The state space model of the conventional kalman filter is as follows:
X(k)=FX(k-1)+p(k) (20)
Y(k)=CX(k)+n(k) (21)
according to the bayesian principle, the kalman filtering problem can be expressed as estimating an optimal speech state sequence x (k) on the premise that the measured data y (k) is known, that is:
p(X(k) | Y(k)) = \frac{p(Y(k) | X(k))\, p(X(k))}{p(Y(k))}    (22)
establishing a likelihood function of p (Y (k) | X (k)) and p (X (k)) according to the maximum likelihood estimation theory:
L_1(Y(k), X(k)) = p(Y(k) | X(k)) = p(n(k)) = \frac{1}{\sqrt{(2\pi)^m |R|}} \exp\left(-\frac{1}{2} n^T(k) R^{-1} n(k)\right)    (23)
L_2(X(k)) = p(X(k)) = \frac{1}{\sqrt{(2\pi)^n |\Theta|}} \exp\left(-\frac{1}{2} (X(k) - \hat{X}(k|k-1))^T \Theta^{-1} (X(k) - \hat{X}(k|k-1))\right)    (24)
where Θ = F P(k-1|k-1) F^T + Q(k-1) is the covariance matrix of the conditional probability p(X(k) | Y(k-1)), and P(k-1|k-1) is the covariance update value. When the likelihood functions L_1(Y(k), X(k)) and L_2(X(k)) attain their maxima, the conditional probability p(X(k) | Y(k)) yields the optimal estimate. Inspection of equations (23) and (24) shows that maximizing L_1(Y(k), X(k)) and L_2(X(k)) is equivalent to minimizing the quadratic exponents n^T(k) R^{-1} n(k) and (X(k) - \hat{X}(k|k-1))^T Θ^{-1} (X(k) - \hat{X}(k|k-1)), which gives the following optimization form:
minimize    n^T(k) R^{-1} n(k) + (X(k) - \hat{X}(k|k-1))^T \Theta^{-1} (X(k) - \hat{X}(k|k-1))
subject to  Y(k) = CX(k) + n(k)    (25)
where X(k) and n(k) are the optimization variables and Θ is the covariance matrix of the Gaussian noise. The value of X(k) that solves this problem is the updated state estimate, and the value of n(k) is the estimate of the Gaussian noise.
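The step from the likelihoods (23)-(24) to the quadratic objective in (25) follows from taking the negative logarithm of their product; collecting the normalization terms into a constant gives
-\ln\bigl(L_1(Y(k),X(k))\, L_2(X(k))\bigr) = \tfrac{1}{2}\, n^{T}(k) R^{-1} n(k) + \tfrac{1}{2}\,\bigl(X(k)-\hat{X}(k|k-1)\bigr)^{T} \Theta^{-1} \bigl(X(k)-\hat{X}(k|k-1)\bigr) + \text{const},
so maximizing L_1 L_2 is equivalent to minimizing the two quadratic terms, which is exactly the objective of (25).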
P (k | k) is the covariance update matrix:
P(k|k)=(I-K(k)C(k))P(k|k-1) (26)
p (k | k-1) is the covariance prediction matrix:
P(k|k-1)=F(k-1)P(k-1|k-1)F(k-1)T+Q(k-1) (27)
K(k) is the filter gain:
K(k)=P(k|k-1)CT(CP(k|k-1)CT+R(k-1))-1(28)
5.2) constructing the estimation problem of sparse noise from the convex optimization angle
The core idea of sparse noise estimation is to exploit the sparsity of the noise: after the traditional Kalman filtering problem has been converted into a convex optimization problem in step 5.1), the estimation of the sparse noise is completed by adding a sparsity constraint on the sparse noise v(k) to the optimization:
minimize    n^T(k) R^{-1} n(k) + (X(k) - \hat{X}(k|k-1))^T \Theta^{-1} (X(k) - \hat{X}(k|k-1)) + \lambda \|v(k)\|_1
subject to  Y(k) = CX(k) + n(k) + v(k)    (29)
where v(k) is the sparse noise. Solving this optimization problem yields the optimal estimate of the speech signal state X(k), which corresponds to the optimal state estimate of traditional Kalman filtering. The optimization problem in (29) is convex and can be solved in practice with a mature interior-point method.
5.3) After the enhancement of the speech signal at time k is finished, the enhanced state estimate X(k) is returned to step 4) to update the AR parameter θ(k+1) at time k+1, and speech enhancement then continues at time k+1 to estimate X(k+1), until all speech signals have been processed.
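Putting steps 4) and 5) together, the alternation of the two filters over one frame can be organized as in the sketch below. It is a simplified, self-contained illustration under several assumptions: a single generic robust update is reused for both convex problems (19) and (29), CVXPY is used as the solver, Ψ(k) is taken to be the prediction covariance, and the covariances and λ are arbitrary defaults; it is not the patented implementation.

```python
import numpy as np
import cvxpy as cp

def robust_update(y, H, x_pred, Sigma, R, lam=1.0):
    """Shared robust update for (19) and (29): two Gaussian quadratic terms
    plus an l1 penalty on the sparse noise, subject to the measurement."""
    x = cp.Variable(x_pred.shape[0])
    g = cp.Variable(1)                      # Gaussian measurement noise
    sp = cp.Variable(1)                     # sparse (Laplacian) noise
    R_inv = np.linalg.inv(R); R_inv = 0.5 * (R_inv + R_inv.T)
    S_inv = np.linalg.inv(Sigma); S_inv = 0.5 * (S_inv + S_inv.T)
    cost = (cp.quad_form(g, R_inv) + cp.quad_form(x - x_pred, S_inv)
            + lam * cp.norm1(sp))
    cp.Problem(cp.Minimize(cost), [y == H @ x + g + sp]).solve()
    return x.value

def dual_kalman_enhance(Y, p=13, lam=1.0):
    """Enhance one frame Y by alternating AR-parameter and state estimation."""
    D = 1e-4 * np.eye(p); L = np.array([[1e-2]])     # AR-filter covariances (assumed)
    Q = 1e-3 * np.eye(p); R = np.array([[1e-2]])     # state-filter covariances (assumed)
    theta = np.zeros(p); P_t = np.eye(p)             # theta(0|0), P_theta(0|0)
    X = np.zeros(p); P_x = np.eye(p)                 # X(0|0), P(0|0)
    C = np.zeros((1, p)); C[0, -1] = 1.0
    out = []
    for k in range(len(Y)):
        # step 4): AR-parameter filter with A = X(k-1)^T = [s(k-p) ... s(k-1)]
        A = X[np.newaxis, :]
        P_t_pred = P_t + D                           # eq. (17)
        theta = robust_update(Y[k], A, theta, P_t_pred, L, lam)
        K_t = P_t_pred @ A.T @ np.linalg.inv(A @ P_t_pred @ A.T + L)
        P_t = (np.eye(p) - K_t @ A) @ P_t_pred       # eq. (16)
        # step 5): speech-state filter with F rebuilt from the new theta
        F = np.zeros((p, p)); F[:-1, 1:] = np.eye(p - 1); F[-1, :] = theta
        X_pred = F @ X
        Theta_cov = F @ P_x @ F.T + Q                # prediction covariance
        X = robust_update(Y[k], C, X_pred, Theta_cov, R, lam)
        K = Theta_cov @ C.T @ np.linalg.inv(C @ Theta_cov @ C.T + R)
        P_x = (np.eye(p) - K @ C) @ Theta_cov        # eq. (26)
        out.append(X[-1])                            # enhanced sample s(k)
    return np.array(out)
```

For a full utterance this routine would be applied to each frame produced in step 2), with the enhanced frames recombined by overlap-add.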
As shown in fig. 4a and 4b, the method provided by the present invention can accurately filter gaussian noise and non-stationary noise, and enhance the original speech signal.
The invention can accurately estimate and filter white noise and non-stationary noise, realize voice enhancement under the mixing of the white noise and the non-stationary noise, and simultaneously provide a purer estimated voice signal and provide front-end support for improving the accuracy of voice recognition.
Because two robust Kalman filtering models are established, the generation process of the speech signal is modeled mathematically and the short-time and time-varying characteristics of speech are taken into account in a targeted manner: the AR parameter estimation is updated and iterated dynamically in real time, meeting the time-varying nature of the parameters, while the state estimation estimates the speech signal frame by frame, exploiting the short-time stationarity of speech. The filtering effect is therefore superior to that of traditional Kalman filtering, and the method is worth popularizing.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that any changes made in the shape and principle of the present invention should be covered within the protection scope of the present invention.

Claims (1)

1. An online voice enhancement method suitable for a non-stationary noise environment, comprising the steps of:
1) establishing a system model in a non-stationary noise environment
1.1) establishing an autoregressive AR model under the condition that Gaussian noise and sparse noise coexist
The generation process of the speech signal is an autoregressive process excited by white noise and output by an all-pole linear system, namely the current output is equal to the weighted sum of the excitation signal at the current moment and the outputs at p past moments, which is an autoregressive AR model and is expressed as follows:
s(k) = \sum_{i=1}^{p} a_i s(k-i) + u(k)    (1)
where u(k) is the Gaussian white noise excitation at time k; s(k-i) is the speech signal at time (k-i); s(k) is the speech signal at time k; a_i is the i-th linear prediction coefficient, also called an AR model parameter; p is the order of the AR model;
establishing a voice signal model conforming to an actual measurement process, wherein the voice signal measurement process is described as follows:
Y(k)=s(k)+n(k)+v(k) (2)
wherein Y (k) is a measurement sequence of the voice signal at time k; s (k) is a speech signal at time k; n (k) is white Gaussian noise at time k; v (k) is non-stationary noise at the moment k, obeys Laplace distribution and has sparsity;
1.2) establishing a speech signal state space model
Converting equations (1) and (2) into a state space model, described as follows:
X(k)=FX(k-1)+p(k) (3)
Y(k)=CX(k)+n(k)+v(k) (4)
wherein,
F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_p(k) & a_{p-1}(k) & a_{p-2}(k) & \cdots & a_1(k) \end{bmatrix}    (5)
C = [0 0 ... 0 1]    (6)
X(k) = [s(k-p+1) ... s(k)]^T    (7)
in the speech signal state equation (3) and the speech signal measurement equation (4), X(k) is the speech signal state estimation sequence at time k, that is, the optimal state estimate of the speech signal; X(k-1) is the speech signal state estimation sequence at time (k-1); Y(k) is the measurement sequence of the speech signal at time k; F is the state transition matrix formed by the linear prediction coefficients, and the last row of F, [a_p(k) ... a_1(k)], is referred to as the AR parameters; C = [0 0 ... 0 1] is the measurement transfer matrix; p(k) is the state noise at time k, which follows a Gaussian distribution; n(k) is the measurement noise at time k, which follows a Gaussian distribution; v(k) is the non-stationary noise at time k, which follows a Laplacian distribution;
the statistical properties of the state noise p(k) and the measurement noise n(k) are:
E(p(k)) = q,  E(n(k)) = r
E(p(k)p(j)^T) = Q δ_{kj},  E(n(k)n(j)^T) = R δ_{kj}    (8)
where q and r are the means of the noises p(k) and n(k), respectively; Q and R are the covariances of p(k) and n(k), respectively; δ_{kj} is the Kronecker delta; the speech enhancement problem is to estimate the optimal speech signal X(k) on the premise that the measured speech signal Y(k) is known;
2) framing and windowing
The voice signal has short-time stationarity, and the voice signal is considered to be unchanged within 10-30 ms, so that the voice signal can be divided into a plurality of short sections for processing, namely framing, and the framing of the voice signal is realized by adopting a movable window with limited length for weighting; the number of frames per second is usually 33-100 frames, the framing method is an overlapped segmentation method, the overlapped part of a previous frame and a next frame is called frame shift, and the ratio of the frame shift to the frame length is 0-0.5;
3) system initialization
3.1) improved Kalman Filter parameter initialization
Initializing a speech signal state estimation sequence X (0/0) and a covariance matrix P (0/0), and ensuring that the covariance matrix is positive definite;
3.2) AR parameter initialization
Initializing an AR parameter state estimation sequence θ (0/0);
4) estimating AR parameters
The AR parameters are the last row [a_p(k) ... a_1(k)] of the state transition matrix F in equation (3); they mainly describe the speech generation process, and their accuracy has a direct influence on the speech enhancement result; the method proposes that the speech signal state estimation sequence X(k-1), the state noise q(k), the measurement noise n(k) and the non-stationary noise v(k) are all taken into account in the estimation of the AR parameters, and a new AR parameter estimation state space model is established to realize online robust estimation of the AR parameters; the real-time estimation process of the AR parameters is as follows:
4.1) establishing a parameter estimation model of the AR parameters
The AR parameter model under the environment mixed by Gaussian noise and non-stationary noise is described as follows:
θ(k)=θ(k-1)+q(k)
Y(k)=Aθ(k)+r(k)+w(k) (9)
where θ(k) = [a_p(k) ... a_1(k)]^T is the AR parameter state sequence at time k; q(k) is the state noise at time k, which follows a Gaussian distribution with covariance matrix D(k); r(k) is the measurement noise at time k, which follows a Gaussian distribution with covariance matrix L(k); w(k) is the non-stationary measurement noise at time k, which follows a Laplacian distribution and is sparse; A = X(k-1)^T = [s(k-p) ... s(k-1)] is the measurement matrix; Y(k) is the measurement sequence of the speech signal at time k; the statistical properties of the state noise q(k) and the measurement noise r(k) are:
E(q(k)) = d,  E(r(k)) = l
E(q(k)q(j)^T) = D δ_{kj},  E(r(k)r(j)^T) = L δ_{kj}    (10)
where d and l are the means of the noises q(k) and r(k), respectively; D and L are the covariances of q(k) and r(k), respectively; δ_{kj} is the Kronecker delta;
4.2) reconstructing the conventional Kalman filtering problem from a convex optimization perspective
In order to conveniently estimate sparse noise, the kalman filtering problem needs to be reconstructed from the perspective of convex optimization, and a state space model of the conventional kalman filtering does not contain non-stationary noise w (k), as follows:
θ(k)=θ(k-1)+q(k)
Y(k)=Aθ(k)+r(k) (11)
according to the bayesian principle, the AR parameter estimation problem is expressed as estimating an optimal AR parameter sequence θ (k) on the premise that the measured data y (k) is known, that is:
p(\theta(k) | Y(k)) = \frac{p(Y(k) | \theta(k))\, p(\theta(k))}{p(Y(k))}    (12)
establishing a likelihood function of p (Y (k) | theta (k)) and p (theta (k)) according to the maximum likelihood estimation theory:
L_1(Y(k), \theta(k)) = p(Y(k) | \theta(k)) = p(r(k)) = \frac{1}{\sqrt{(2\pi)^m |L|}} \exp\left(-\frac{1}{2} r^T(k) L^{-1} r(k)\right)    (13)
L_2(\theta(k)) = p(\theta(k)) = \frac{1}{\sqrt{(2\pi)^n |\Psi(k)|}} \exp\left(-\frac{1}{2} (\theta(k) - \hat{\theta}(k|k-1))^T \Psi(k)^{-1} (\theta(k) - \hat{\theta}(k|k-1))\right)    (14)
where Ψ(k) = P_θ(k|k) + D(k) is the covariance matrix of the conditional probability p(θ(k) | Y(k)), and P_θ(k|k) is the covariance update value; when the likelihood functions L_1(Y(k), θ(k)) and L_2(θ(k)) attain their maxima, the conditional probability p(θ(k) | Y(k)) yields the optimal estimate; inspection of equations (13) and (14) shows that maximizing L_1(Y(k), θ(k)) and L_2(θ(k)) is equivalent to minimizing the quadratic exponents r^T(k) L^{-1} r(k) and (θ(k) - \hat{\theta}(k|k-1))^T Ψ(k)^{-1} (θ(k) - \hat{\theta}(k|k-1)), which gives the following optimization form:
minimize    r^T(k) L^{-1} r(k) + (\theta(k) - \hat{\theta}(k|k-1))^T \Psi(k)^{-1} (\theta(k) - \hat{\theta}(k|k-1))
subject to  Y(k) = A\theta(k) + r(k)    (15)
where θ(k) and r(k) are the optimization variables and Ψ(k) = P_θ(k|k) + D(k) is the covariance matrix of the Gaussian noise; the value of θ(k) that solves this problem is the updated estimate of the AR parameters, and the value of r(k) is the estimate of the Gaussian noise; P_θ(k|k) is the covariance update matrix:
P_θ(k|k) = (I - K_θ(k)A(k)) P_θ(k|k-1)    (16)
P_θ(k|k-1) is the covariance prediction matrix:
P_θ(k|k-1) = P_θ(k-1|k-1) + D(k-1)    (17)
K_θ(k) is the filter gain:
K_θ(k) = P_θ(k|k-1) A^T (A P_θ(k|k-1) A^T + L(k-1))^{-1}    (18)
4.3) constructing an optimization problem for non-stationary noise estimation from a convex optimization perspective
The non-stationary noise follows a Laplacian distribution and is sparse. The core idea of non-stationary noise estimation is to exploit this sparsity: after the traditional Kalman filtering problem has been converted into a convex optimization problem in step 4.2), the estimation of the sparse noise is completed by adding a sparsity constraint on the non-stationary noise w(k) to the optimization, giving the new optimization form:
minimize    r^T(k) L^{-1} r(k) + (\theta(k) - \hat{\theta}(k|k-1))^T \Psi(k)^{-1} (\theta(k) - \hat{\theta}(k|k-1)) + \lambda \|w(k)\|_1
subject to  Y(k) = A\theta(k) + r(k) + w(k)    (19)
where w(k) is the sparse noise; solving this optimization problem yields the optimal estimate of the AR parameters θ(k); the optimization problem in (19) is convex and can be solved in practice with an interior-point method;
5) estimating a speech signal state sequence
5.1) reconstructing the conventional Kalman filtering problem from a convex optimization perspective
In order to conveniently estimate sparse noise, the kalman filtering problem needs to be reconstructed from the perspective of convex optimization, and a state space model of the conventional kalman filtering is as follows:
X(k)=FX(k-1)+p(k) (20)
Y(k)=CX(k)+n(k) (21)
according to the bayesian principle, the kalman filtering problem is expressed as estimating an optimal speech state sequence x (k) on the premise that the measured data y (k) is known, that is:
p(X(k) | Y(k)) = \frac{p(Y(k) | X(k))\, p(X(k))}{p(Y(k))}    (22)
establishing a likelihood function of p (Y (k) | X (k)) and p (X (k)) according to the maximum likelihood estimation theory:
L_1(Y(k), X(k)) = p(Y(k) | X(k)) = p(n(k)) = \frac{1}{\sqrt{(2\pi)^m |R|}} \exp\left(-\frac{1}{2} n^T(k) R^{-1} n(k)\right)    (23)
L_2(X(k)) = p(X(k)) = \frac{1}{\sqrt{(2\pi)^n |\Theta|}} \exp\left(-\frac{1}{2} (X(k) - \hat{X}(k|k-1))^T \Theta^{-1} (X(k) - \hat{X}(k|k-1))\right)    (24)
where Θ = F P(k-1|k-1) F^T + Q(k-1) is the covariance matrix of the conditional probability p(X(k) | Y(k-1)), and P(k-1|k-1) is the covariance update value; when the likelihood functions L_1(Y(k), X(k)) and L_2(X(k)) attain their maxima, the conditional probability p(X(k) | Y(k)) yields the optimal estimate; inspection of equations (23) and (24) shows that maximizing L_1(Y(k), X(k)) and L_2(X(k)) is equivalent to minimizing the quadratic exponents n^T(k) R^{-1} n(k) and (X(k) - \hat{X}(k|k-1))^T Θ^{-1} (X(k) - \hat{X}(k|k-1)), which gives the following optimization form:
minimize    n^T(k) R^{-1} n(k) + (X(k) - \hat{X}(k|k-1))^T \Theta^{-1} (X(k) - \hat{X}(k|k-1))
subject to  Y(k) = CX(k) + n(k)    (25)
where X(k) and n(k) are the optimization variables and Θ is the covariance matrix of the Gaussian noise; the value of X(k) that solves this problem is the updated state estimate, and the value of n(k) is the estimate of the Gaussian noise;
p (k | k) is the covariance update matrix:
P(k|k)=(I-K(k)C(k))P(k|k-1) (26)
p (k | k-1) is the covariance prediction matrix:
P(k|k-1)=F(k-1)P(k-1|k-1)F(k-1)T+Q(k-1) (27)
K(k) is the filter gain:
K(k)=P(k|k-1)CT(CP(k|k-1)CT+R(k-1))-1(28)
5.2) constructing the estimation problem of sparse noise from the convex optimization angle
The core idea of sparse noise estimation is to exploit the sparsity of the noise: after the traditional Kalman filtering problem has been converted into a convex optimization problem in step 5.1), the estimation of the sparse noise is completed by adding a sparsity constraint on the sparse noise v(k) to the optimization:
minimize    n^T(k) R^{-1} n(k) + (X(k) - \hat{X}(k|k-1))^T \Theta^{-1} (X(k) - \hat{X}(k|k-1)) + \lambda \|v(k)\|_1
subject to  Y(k) = CX(k) + n(k) + v(k)    (29)
where v(k) is the sparse noise; solving this optimization problem yields the optimal estimate of the speech signal state X(k), which corresponds to the optimal state estimate of traditional Kalman filtering; the optimization problem in (29) is convex and can be solved in practice with an interior-point method;
5.3) after the enhancement of the speech signal at time k is finished, the enhanced state estimate X(k) is returned to step 4) to update the AR parameter θ(k+1) at time k+1, and speech enhancement then continues at time k+1 to estimate X(k+1), until all speech signals have been processed.
CN201610843483.0A 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise Expired - Fee Related CN106340304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610843483.0A CN106340304B (en) 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610843483.0A CN106340304B (en) 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise

Publications (2)

Publication Number Publication Date
CN106340304A true CN106340304A (en) 2017-01-18
CN106340304B CN106340304B (en) 2019-09-06

Family

ID=57840174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610843483.0A Expired - Fee Related CN106340304B (en) 2016-09-23 2016-09-23 A kind of online sound enhancement method under the environment suitable for nonstationary noise

Country Status (1)

Country Link
CN (1) CN106340304B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110248212A (en) * 2019-05-27 2019-09-17 上海交通大学 360 degree of video stream server end code rate adaptive transmission methods of multi-user and system
CN110648680A (en) * 2019-09-23 2020-01-03 腾讯科技(深圳)有限公司 Voice data processing method and device, electronic equipment and readable storage medium
CN112557925A (en) * 2020-11-11 2021-03-26 国联汽车动力电池研究院有限责任公司 Lithium ion battery SOC estimation method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110305345A1 (en) * 2009-02-03 2011-12-15 University Of Ottawa Method and system for a multi-microphone noise reduction
CN102890935A (en) * 2012-10-22 2013-01-23 北京工业大学 Robust speech enhancement method based on fast Kalman filtering
CN103323815A (en) * 2013-03-05 2013-09-25 上海交通大学 Underwater acoustic locating method based on equivalent sound velocity
CN103903630A (en) * 2014-03-18 2014-07-02 北京捷通华声语音技术有限公司 Method and device used for eliminating sparse noise

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110305345A1 (en) * 2009-02-03 2011-12-15 University Of Ottawa Method and system for a multi-microphone noise reduction
CN102890935A (en) * 2012-10-22 2013-01-23 北京工业大学 Robust speech enhancement method based on fast Kalman filtering
CN103323815A (en) * 2013-03-05 2013-09-25 上海交通大学 Underwater acoustic locating method based on equivalent sound velocity
CN103903630A (en) * 2014-03-18 2014-07-02 北京捷通华声语音技术有限公司 Method and device used for eliminating sparse noise

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FENG BAO: "Improved Kalman Filtering Algorithm Based on Convex Optimization", Automation & Information Engineering *
WU FEI: "A Kalman Filter with Online Parameter Adjustment and Its Application", Computer Engineering & Science *
WU FEI: "Research on Robust Kalman Algorithms and Their Applications", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110248212A (en) * 2019-05-27 2019-09-17 上海交通大学 360 degree of video stream server end code rate adaptive transmission methods of multi-user and system
CN110248212B (en) * 2019-05-27 2020-06-02 上海交通大学 Multi-user 360-degree video stream server-side code rate self-adaptive transmission method and system
CN110648680A (en) * 2019-09-23 2020-01-03 腾讯科技(深圳)有限公司 Voice data processing method and device, electronic equipment and readable storage medium
CN110648680B (en) * 2019-09-23 2024-05-14 腾讯科技(深圳)有限公司 Voice data processing method and device, electronic equipment and readable storage medium
CN112557925A (en) * 2020-11-11 2021-03-26 国联汽车动力电池研究院有限责任公司 Lithium ion battery SOC estimation method and device
CN112557925B (en) * 2020-11-11 2023-05-05 国联汽车动力电池研究院有限责任公司 Lithium ion battery SOC estimation method and device

Also Published As

Publication number Publication date
CN106340304B (en) 2019-09-06

Similar Documents

Publication Publication Date Title
CN108682418B (en) Speech recognition method based on pre-training and bidirectional LSTM
CN111261183B (en) Method and device for denoising voice
CN110634502B (en) Single-channel voice separation algorithm based on deep neural network
Hu et al. A generalized subspace approach for enhancing speech corrupted by colored noise
CN111860273B (en) Magnetic resonance underground water detection noise suppression method based on convolutional neural network
CN105957537B (en) One kind being based on L1/2The speech de-noising method and system of sparse constraint convolution Non-negative Matrix Factorization
US8296135B2 (en) Noise cancellation system and method
Saleem et al. Deepresgru: residual gated recurrent neural network-augmented kalman filtering for speech enhancement and recognition
CN106340304B (en) A kind of online sound enhancement method under the environment suitable for nonstationary noise
CN111816200B (en) Multi-channel speech enhancement method based on time-frequency domain binary mask
CN110808057A (en) Voice enhancement method for generating confrontation network based on constraint naive
CN115223583A (en) Voice enhancement method, device, equipment and medium
CN112086100A (en) Quantization error entropy based urban noise identification method of multilayer random neural network
CN115171712A (en) Speech enhancement method suitable for transient noise suppression
Talmon et al. Clustering and suppression of transient noise in speech signals using diffusion maps
Li et al. A Convolutional Neural Network with Non-Local Module for Speech Enhancement.
CN112580451A (en) Data noise reduction method based on improved EMD and MED
CN103903630A (en) Method and device used for eliminating sparse noise
CN108573698B (en) Voice noise reduction method based on gender fusion information
Meutzner et al. A generative-discriminative hybrid approach to multi-channel noise reduction for robust automatic speech recognition
Bu et al. A Probability Weighted Beamformer for Noise Robust ASR.
CN112652321B (en) Deep learning phase-based more friendly voice noise reduction system and method
CN113066483B (en) Sparse continuous constraint-based method for generating countermeasure network voice enhancement
Khalil et al. Enhancement of speech signals using multiple statistical models
Deng et al. Recursive noise estimation using iterative stochastic approximation for stereo-based robust speech recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190906