CN100498935C - Variation Bayesian voice strengthening method based on voice generating model - Google Patents

Variation Bayesian voice strengthening method based on voice generating model

Info

Publication number
CN100498935C
Authority
CN
China
Prior art keywords
model
distribution
speech
production model
exponent number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2006100283311A
Other languages
Chinese (zh)
Other versions
CN1870136A (en)
Inventor
黄青华
杨杰
薛云峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CNB2006100283311A priority Critical patent/CN100498935C/en
Publication of CN1870136A publication Critical patent/CN1870136A/en
Application granted granted Critical
Publication of CN100498935C publication Critical patent/CN100498935C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

A variational Bayesian speech enhancement method based on a speech production model. A noisy speech model and the state-space equations of the speech production model are established, and the probability distributions of the noise corruption process and the speech production process are expressed. Following the variational Bayesian method, approximate posterior distributions are used to approximate the probability distributions of the speech production model parameters and of the clean speech. Update equations for the parameters of these approximate posterior distributions are then obtained and iterated until the algorithm converges.

Description

Variational Bayesian speech enhancement method based on a speech production model
Technical field
The present invention relates to a variational Bayesian speech enhancement method based on a speech production model. It can be widely applied in speech communication, speech recognition and similar areas, and belongs to the field of speech signal processing.
Background art
In practice, speech acquisition devices and recording environments cannot deliver clean speech: the speech is contaminated by various kinds of background noise. In applications such as speech communication and speech recognition it is therefore important to enhance the speech as a preprocessing step, since enhanced speech better guarantees the accuracy of subsequent speech processing.
To improve speech quality, existing speech enhancement methods mainly fall into the following categories:
The first method is thresholding. Its basic principle assumes that the parts of the signal with small absolute amplitude are mainly noise, and it compresses those parts further with a linear or nonlinear compression function in order to enhance the speech. The main drawback of this algorithm is that a great deal of useful speech information is compressed along with the noise.
The second method is spectral subtraction. It assumes that the noise is additive and either stationary or slowly varying, and that the speech signal and the noise are mutually independent. Under these assumptions the power spectrum of the noise is subtracted from the power spectrum of the noisy speech to obtain a cleaner speech spectrum. A well-known drawback of this method is that the enhanced speech signal contains unnatural residual tones known as "musical" noise, which listeners find unpleasant.
The third method comprises enhancement algorithms based on a speech production model. Because the parameters of the "clean" speech model cannot be estimated directly, these algorithms have to estimate the model parameters from the noisy signal, and if the model is estimated inaccurately the intelligibility of the enhanced speech deteriorates. Accurately estimating the model parameters and the model order from noisy speech is therefore the key to this method. Gannot et al. (S. Gannot, D. Burshtein and E. Weinstein, Iterative and Sequential Kalman Filter-Based Speech Enhancement Algorithms, IEEE Trans. Speech and Audio Processing, vol. 6, no. 4, July 1998, pp. 373-385) proposed an enhancement algorithm based on Kalman filtering that estimates the speech production model parameters by maximum likelihood. That method cannot estimate the model order, which has to be determined by other methods or from prior knowledge, and its results depend strongly on the initial parameter estimates. Vermaak et al. (J. Vermaak, C. Andrieu, A. Doucet and S. J. Godsill, Particle Methods for Bayesian Modeling and Enhancement of Speech Signals, IEEE Trans. Speech and Audio Processing, vol. 10, no. 3, 2002, pp. 173-185) proposed estimating the speech production model parameters with a Markov chain Monte Carlo method and estimating the clean speech signal with a Kalman filter. That method also cannot estimate the model order, and its computational cost is so high that it is unsuitable for many applications.
Summary of the invention
The object of the present invention is to address the deficiencies of the prior art by proposing a variational Bayesian speech enhancement method based on a speech production model. The method selects the order of the speech production model automatically and avoids overfitting during parameter estimation, so that the model is estimated more accurately and the speech is enhanced more effectively.
To achieve this object, the technical solution of the invention rests on the following consideration. The variational Bayesian method is a Bayesian approximation method developed in recent years. Its principle is to approximate the true posterior distribution of the unknown variables and parameters with an approximate posterior distribution, which makes Bayesian inference analytically tractable and allows both the model structure and the model parameters to be learned. The invention therefore exploits the ability of the variational Bayesian method to avoid overfitting during parameter learning and to perform model selection, so as to estimate the parameters and the order of the speech production model accurately and thereby enhance the speech more effectively. The invention first establishes the noisy speech model and the state-space equations of the speech production model, and then expresses the probability distributions of the noise corruption process and the speech production process. Following the variational Bayesian method, approximate posterior distributions are used to approximate the probability distributions of the speech production model parameters and of the clean speech signal. Finally, update equations for the parameters of these approximate posterior distributions are obtained and iterated until the algorithm converges. Automatic model selection treats the order of the speech production model as the argument of the variational Bayesian cost function: the order with the smallest cost function value is the optimal model order, and the speech signal computed with this optimal order is the optimal result.
The variational Bayesian speech enhancement method based on a speech production model of the present invention mainly comprises the following steps:
1. Express the noisy speech signal as the sum of a clean speech signal and noise to establish the noisy speech model, represent the speech production model by an autoregressive process, and establish the state-space equations corresponding to the noisy speech model and the speech production model.
2. Choose a Gaussian distribution for the noise of the noisy speech model and a Gaussian distribution for the driving noise of the speech production model. From these two Gaussian distributions and the state-space equations corresponding to the noisy speech model and the speech production model, derive the probability distributions of the state vector and of the observation vector. Determine from prior knowledge the prior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions.
3. From the cost function of the variational Bayesian method, the probability distributions of the state vector and the observation vector, and the prior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions, obtain with the variational expectation-maximization algorithm the approximate posterior distributions of the state vector, of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions.
4. Estimate the update equations for the parameters of the approximate posterior distribution of the state vector with the variational Kalman smoothing algorithm, and derive the update equations for the parameters of the approximate posterior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions from the variational maximization step of the variational expectation-maximization algorithm.
5. Within a predetermined range of speech production model orders, select an initial order. Substitute the noisy speech signal and the initial order into the parameter update equations derived in step 4 and compute the cost function iteratively until the absolute value of its change from one iteration to the next is no greater than a predetermined threshold. Save the cost function value at that point together with the corresponding approximate posterior distribution parameters of the state vector.
6. Within the predetermined range of speech production model orders, change the model order in turn, replacing the initial order of step 5 with each new order and repeating step 5, to obtain a set of cost function values and approximate posterior distribution parameters of the state vector, one for each model order.
7. Among all the cost function values obtained, the order corresponding to the smallest value is the optimal model order, and the speech signal computed from the approximate posterior distribution parameters of the state vector for this optimal model order is the optimal result.
The present invention exploits the ability of variational Bayesian learning to estimate both model parameters and model structure, estimates the parameters and the order of the speech production model more accurately, and thereby improves the speech enhancement result.
The variational Bayesian speech enhancement method based on a speech production model proposed by the present invention can be widely applied in speech communication, speech recognition and similar areas, and is of considerable practical value.
Embodiment
For a better understanding of the technical solution of the present invention, it is described in further detail below.
1. The noisy speech signal $x_t$ is expressed as the sum of the clean speech signal $s_t$ and the noise $n_t$, which gives the following noisy speech model:
$$x_t = s_t + n_t \qquad (1)$$
The subscript $t$ denotes time. The speech production model is represented by an autoregressive process:
$$s_t = \vec{w}^{T} \vec{s}_t^{(p)} + e_t \qquad (2)$$
Here $\vec{w} = [w_1, w_2, \cdots, w_p]^{T}$ are the weight coefficients of the autoregressive model, $\vec{s}_t^{(p)} = [s_{t-1}, \cdots, s_{t-p}]^{T}$ holds the $p$ past values on which the speech value at time $t$ depends, $p$ is the order of the model, and $e_t$ is the driving noise of the autoregressive model. From the noisy speech model (1) and the speech production model (2), the state-space equations are established as follows:
$$\vec{s}_t = A \vec{s}_{t-1} + B e_t \qquad (3)$$
$$x_t = C \vec{s}_t + n_t \qquad (4)$$
Here $\vec{s}_t \triangleq [s_t, s_{t-1}, \cdots, s_{t-p+1}]^{T}$ is the $p$-dimensional state vector, the noisy speech signal $x_t$ is the observation vector, $A \triangleq \begin{bmatrix} \vec{w}^{T} \\ I_{[p-1]} \;\; 0_{(p-1)\times 1} \end{bmatrix}$ is the $p \times p$ state transition matrix, $B = C^{T} \triangleq [1, 0, \cdots, 0]^{T}$, and $I_{[p-1]}$ is the $(p-1)\times(p-1)$ identity matrix.
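The state-space form (3)-(4) can be illustrated with a short NumPy sketch. It is not part of the patent text: the function names `state_space_matrices` and `simulate_noisy_speech`, the zero initial state and the seeded random generator are assumptions made for illustration, and `beta` and `gamma` are treated as the inverse variances of the driving noise and of the observation noise.

```python
import numpy as np

def state_space_matrices(w):
    """Build A, B, C of equations (3)-(4) from the AR weights w (hypothetical helper)."""
    w = np.asarray(w, dtype=float)
    p = w.size
    A = np.zeros((p, p))
    A[0, :] = w                    # first row is w^T
    A[1:, :-1] = np.eye(p - 1)     # [I_{p-1}, 0] shifts the past samples down
    B = np.zeros((p, 1))
    B[0, 0] = 1.0                  # driving noise enters the newest sample only
    C = B.T.copy()                 # the observation picks out s_t
    return A, B, C

def simulate_noisy_speech(w, beta, gamma, T, seed=0):
    """Draw T samples of clean speech s_t and noisy speech x_t from (1)-(4)."""
    rng = np.random.default_rng(seed)
    A, B, C = state_space_matrices(w)
    state = np.zeros((len(w), 1))  # assumed zero initial state
    s = np.empty(T)
    x = np.empty(T)
    for t in range(T):
        e_t = rng.normal(0.0, 1.0 / np.sqrt(beta))    # driving noise, precision beta
        n_t = rng.normal(0.0, 1.0 / np.sqrt(gamma))   # observation noise, precision gamma
        state = A @ state + B * e_t                   # equation (3)
        s[t] = (C @ state).item()
        x[t] = s[t] + n_t                             # equations (1) and (4)
    return s, x
```

For instance, `simulate_noisy_speech([1.5, -0.8], beta=1.0, gamma=1.0, T=400)` would produce a second-order autoregressive signal and its noisy observation.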
2. The noise $n_t$ is chosen to be Gaussian, expressed as $n_t \sim N(n_t \mid 0, \gamma)$. The driving noise $e_t$ of the autoregressive model is also chosen to be Gaussian, expressed as $e_t \sim N(e_t \mid 0, \beta)$. Here $N(y \mid a, b)$ denotes that the random variable $y$ follows a Gaussian distribution with mean $a$ and inverse variance $b$. According to (3), the probability distribution of the state vector $\vec{s}_t$ is:
$$p(\vec{s}_t \mid \vec{s}_{t-1}, \vec{w}, \beta) = N(s_t \mid \vec{w}^{T} \vec{s}_t^{(p)}, \beta) \qquad (5)$$
According to (4), the probability distribution of the observation vector can be written as
$$p(x_t \mid \vec{s}_t, \gamma) = N(x_t \mid C \vec{s}_t, \gamma) \qquad (6)$$
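The weight coefficients of the autoregressive model follow a zero-mean Gaussian prior distribution:
$$p(\vec{w} \mid \alpha) = N(\vec{w} \mid \vec{0}, \alpha I_{[p]}) \qquad (7)$$
The inverse variances of all the Gaussian distributions follow Gamma prior distributions:
$$p(\alpha) = \mathrm{Gamma}(\alpha \mid b^{(\alpha)}, c^{(\alpha)}) \qquad (8)$$
$$p(\beta) = \mathrm{Gamma}(\beta \mid b^{(\beta)}, c^{(\beta)}) \qquad (9)$$
$$p(\gamma) = \mathrm{Gamma}(\gamma \mid b^{(\gamma)}, c^{(\gamma)}) \qquad (10)$$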
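The inverse-variance parameterization used for the Gaussian and Gamma distributions can be made concrete with a small sketch. The helper names are hypothetical. Reading the first Gamma parameter $b$ as a rate and the second parameter $c$ as a shape is an assumption inferred from the form of the update equations (27)-(32), not something the text states explicitly.

```python
import numpy as np

def gaussian_logpdf_precision(y, mean, precision):
    """log N(y | mean, precision), where precision is the inverse variance."""
    return (0.5 * (np.log(precision) - np.log(2.0 * np.pi))
            - 0.5 * precision * (y - mean) ** 2)

def gamma_mean(b, c):
    """Mean of Gamma(. | b, c), assuming b is a rate and c is a shape parameter."""
    return c / b
```

Under this reading, an expected inverse variance such as $\langle\gamma\rangle_Q$ would be obtained as `gamma_mean(b, c)` from the corresponding posterior parameters.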
3. Let $X$ denote the set of observation vectors $\{x_1, x_2, \ldots, x_T\}$, let $S$ denote the set of state vectors $\{\vec{s}_1, \vec{s}_2, \ldots, \vec{s}_T\}$, and let $\theta$ denote the set comprising the weight coefficients of the speech production model and the inverse variances of all the Gaussian distributions, $\theta = \{\vec{w}, \alpha, \beta, \gamma\}$. The principle of the variational Bayesian method is to use an approximate posterior distribution $Q(S, \theta)$ to approximate $p(S, \theta \mid X)$. The cost function used in practice is
$$C_{KL} = \left\langle \log \frac{Q(S, \theta)}{p(X, S, \theta)} \right\rangle_Q = \left\langle \log \frac{Q(S)\,Q(\theta)}{p(X, S, \theta)} \right\rangle_Q \qquad (11)$$
where $\langle \cdot \rangle_Q$ denotes the expectation under the probability distribution $Q(\cdot)$. From the cost function (11) of the variational Bayesian method, the probability distributions (5)-(6) of the state vector and the observation vector, and the prior distributions (7)-(10) of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions, the variational expectation-maximization algorithm yields the following approximate posterior distributions of the state vector, of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions:
$$Q(\vec{s}_t) = N(\vec{s}_t \mid \vec{m}_t^{(s)}, V_t^{(s)}) \qquad (12)$$
$$Q(\vec{w}) = N(\vec{w} \mid \vec{\mu}^{(w)}, \Sigma^{(w)}) \qquad (13)$$
$$Q(\alpha) = \mathrm{Gamma}(\alpha \mid \bar{b}^{(\alpha)}, \bar{c}^{(\alpha)}) \qquad (14)$$
$$Q(\beta) = \mathrm{Gamma}(\beta \mid \bar{b}^{(\beta)}, \bar{c}^{(\beta)}) \qquad (15)$$
$$Q(\gamma) = \mathrm{Gamma}(\gamma \mid \bar{b}^{(\gamma)}, \bar{c}^{(\gamma)}) \qquad (16)$$
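Why minimizing the cost function (11) brings $Q(S, \theta)$ close to the true posterior can be seen from a standard identity, added here as an explanatory note rather than as part of the original derivation. Since $p(X, S, \theta) = p(S, \theta \mid X)\,p(X)$:

```latex
C_{KL}
  = \left\langle \log \frac{Q(S,\theta)}{p(S,\theta \mid X)} \right\rangle_Q - \log p(X)
  = \mathrm{KL}\!\left( Q(S,\theta) \,\middle\|\, p(S,\theta \mid X) \right) - \log p(X)
  \;\ge\; -\log p(X)
```

Because $\log p(X)$ does not depend on $Q$, lowering $C_{KL}$ lowers the Kullback-Leibler divergence to the true posterior, and $-C_{KL}$ is a lower bound on the log evidence. This is what allows the cost function values obtained for different model orders to be compared.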
4. The parameters of the approximate posterior distribution (12) of the state vector are obtained with the variational Kalman smoothing algorithm. Let the sequence $\{x_1, \ldots, x_\tau\}$ be denoted by $\{x\}_1^{\tau}$. First define the conditional expectation $\vec{m}_{t|\tau} = E(\vec{s}_t \mid \{x\}_1^{\tau})$ and the conditional covariance matrix $V_{t|\tau} = \mathrm{Var}(\vec{s}_t \mid \{x\}_1^{\tau})$, with initial values $\vec{m}_{0|0} = \vec{m}_0$ and $V_{0|0} = V_0$. For $t = 1, \ldots, T$, the forward recursion of the Kalman filter is:
$$\vec{m}_{t|t-1} = \bar{A}\, \vec{m}_{t-1|t-1} \qquad (17)$$
$$V_{t|t-1} = \bar{A}\, V_{t-1|t-1} \bar{A}^{T} + P \qquad (18)$$
$$K_t = V_{t|t-1} C^{T} \left( C V_{t|t-1} C^{T} + (\langle \gamma \rangle_Q)^{-1} \right)^{-1} \qquad (19)$$
$$\vec{m}_{t|t} = \vec{m}_{t|t-1} + K_t \left( x_t - C \vec{m}_{t|t-1} \right) \qquad (20)$$
$$V_{t|t} = V_{t|t-1} - K_t C V_{t|t-1} \qquad (21)$$
Here $\bar{A} \triangleq \begin{bmatrix} \langle \vec{w} \rangle_Q^{T} \\ I_{[p-1]} \;\; 0_{(p-1)\times 1} \end{bmatrix}$, $P = \begin{bmatrix} \bar{\beta} \;\; 0_{1\times(p-1)} \\ 0_{(p-1)\times p} \end{bmatrix}$, $\bar{\beta} = (\langle \beta \rangle_Q)^{-1}$, and $\vec{m}_{t|t}$ and $V_{t|t}$ are the mean and covariance of the Kalman filtering distribution of the state vector $\vec{s}_t$. The Kalman smoothing algorithm then continues: $\vec{m}_{T|T}$ and $V_{T|T}$ are initialized with the corresponding Kalman filtering values, and for $t = T-1, \ldots, 0$ the backward recursion is carried out as follows:
$$Q_t = V_{t|t} \bar{A}^{T} V_{t+1|t}^{-1} \qquad (22)$$
$$\vec{m}_{t|T} = \vec{m}_{t|t} + Q_t \left( \vec{m}_{t+1|T} - \vec{m}_{t+1|t} \right) \qquad (23)$$
$$V_{t|T} = V_{t|t} + Q_t \left( V_{t+1|T} - V_{t+1|t} \right) Q_t^{T} \qquad (24)$$
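The forward and backward recursions (17)-(24) can be sketched in NumPy as follows. This is an illustrative sketch rather than the patented implementation: the function name `vb_kalman_smoother`, the array layout (index 0 holds the initial state, index $t$ holds time $t$) and the use of `np.linalg.inv` are assumptions, and the expectations $\langle \vec{w} \rangle_Q$, $\langle \beta \rangle_Q$ and $\langle \gamma \rangle_Q$ are simply passed in as arguments.

```python
import numpy as np

def vb_kalman_smoother(x, w_mean, beta_mean, gamma_mean, m0, V0):
    """Forward recursion (17)-(21) followed by backward recursion (22)-(24).

    x          : observations x_1..x_T, shape (T,)
    w_mean     : <w>_Q, shape (p,)
    beta_mean  : <beta>_Q, expected inverse variance of the driving noise
    gamma_mean : <gamma>_Q, expected inverse variance of the observation noise
    m0, V0     : initial mean, shape (p,), and covariance, shape (p, p)
    """
    x = np.asarray(x, dtype=float)
    T, p = x.size, len(w_mean)
    A = np.zeros((p, p))
    A[0] = w_mean
    A[1:, :-1] = np.eye(p - 1)                     # A-bar built from <w>_Q
    C = np.zeros((1, p))
    C[0, 0] = 1.0
    P = np.zeros((p, p))
    P[0, 0] = 1.0 / beta_mean                      # beta-bar = 1 / <beta>_Q
    R = 1.0 / gamma_mean                           # observation noise variance

    m_pred = np.zeros((T + 1, p))
    V_pred = np.zeros((T + 1, p, p))
    m_filt = np.zeros((T + 1, p))
    V_filt = np.zeros((T + 1, p, p))
    m_filt[0], V_filt[0] = m0, V0
    for t in range(1, T + 1):
        m_pred[t] = A @ m_filt[t - 1]                                  # (17)
        V_pred[t] = A @ V_filt[t - 1] @ A.T + P                        # (18)
        K = V_pred[t] @ C.T / (C @ V_pred[t] @ C.T + R)                # (19)
        m_filt[t] = m_pred[t] + (K * (x[t - 1] - C @ m_pred[t])).ravel()  # (20)
        V_filt[t] = V_pred[t] - K @ C @ V_pred[t]                      # (21)

    m_smooth, V_smooth = m_filt.copy(), V_filt.copy()
    for t in range(T - 1, -1, -1):
        Q_t = V_filt[t] @ A.T @ np.linalg.inv(V_pred[t + 1])           # (22)
        m_smooth[t] = m_filt[t] + Q_t @ (m_smooth[t + 1] - m_pred[t + 1])  # (23)
        V_smooth[t] = (V_filt[t]
                       + Q_t @ (V_smooth[t + 1] - V_pred[t + 1]) @ Q_t.T)  # (24)
    return m_smooth, V_smooth
```

With this layout, the smoothed estimate of the clean sample $s_t$ is `m_smooth[t, 0]` for $t = 1, \ldots, T$.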
This yields the update equations for the parameters of the approximate posterior distribution (12) of the state vector: $\vec{m}_t^{(s)} = \vec{m}_{t|T}$ and $V_t^{(s)} = [V_{t|T}]^{-1}$. The update equations for the parameters of the approximate posterior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions, derived from the variational maximization step of the variational expectation-maximization algorithm, are as follows:
$$\Sigma^{(w)} = \langle \alpha I_{[p]} \rangle_Q + \sum_{t=1}^{T} \left\langle \beta\, \vec{s}_t^{(p)} \vec{s}_t^{(p)T} \right\rangle_Q \qquad (25)$$
$$\vec{\mu}^{(w)} = [\Sigma^{(w)}]^{-1} \left[ \sum_{t=1}^{T} \left\langle \beta\, s_t\, \vec{s}_t^{(p)} \right\rangle_Q \right] \qquad (26)$$
$$\bar{c}^{(\alpha)} = c^{(\alpha)} + \frac{p}{2} \qquad (27)$$
$$\bar{b}^{(\alpha)} = b^{(\alpha)} + \frac{1}{2} \left\langle \vec{w}^{T} \vec{w} \right\rangle_Q \qquad (28)$$
$$\bar{c}^{(\beta)} = c^{(\beta)} + \frac{T}{2} \qquad (29)$$
$$\bar{b}^{(\beta)} = b^{(\beta)} + \frac{1}{2} \sum_{t=1}^{T} \left\langle \left( s_t - \vec{w}^{T} \vec{s}_t^{(p)} \right)^2 \right\rangle_Q \qquad (30)$$
$$\bar{c}^{(\gamma)} = c^{(\gamma)} + \frac{T}{2} \qquad (31)$$
$$\bar{b}^{(\gamma)} = b^{(\gamma)} + \frac{1}{2} \sum_{t=1}^{T} \left\langle (x_t - s_t)^2 \right\rangle_Q \qquad (32)$$
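A minimal sketch of the updates (25)-(32) is given below, under several stated assumptions that go beyond the text: the expectations over $Q$ are taken to factorize between the state sequence and the parameters (as the product $Q(S)Q(\theta)$ in (11) suggests), $\Sigma^{(w)}$ is treated as an inverse covariance in line with the $N(y \mid a, b)$ convention, and the smoothed second moments are assumed to have been accumulated beforehand into the hypothetical inputs `S_pp`, `S_1p`, `S_11` and `resid_x`. The function name and the `prior` dictionary keys are likewise illustrative.

```python
import numpy as np

def update_parameters(S_pp, S_1p, S_11, resid_x, T, alpha_mean, beta_mean, prior):
    """Updates (25)-(32) from smoothed sufficient statistics (hypothetical helper).

    S_pp    : sum_t <s_t^(p) s_t^(p)T>_Q, shape (p, p)
    S_1p    : sum_t <s_t s_t^(p)>_Q, shape (p,)
    S_11    : sum_t <s_t^2>_Q
    resid_x : sum_t <(x_t - s_t)^2>_Q
    prior   : dict of Gamma prior parameters b_alpha, c_alpha, b_beta, c_beta,
              b_gamma, c_gamma
    """
    p = S_pp.shape[0]
    Sigma_w = alpha_mean * np.eye(p) + beta_mean * S_pp       # (25)
    mu_w = np.linalg.solve(Sigma_w, beta_mean * S_1p)         # (26)
    ww = np.outer(mu_w, mu_w) + np.linalg.inv(Sigma_w)        # <w w^T>_Q

    c_alpha = prior["c_alpha"] + p / 2.0                      # (27)
    b_alpha = prior["b_alpha"] + 0.5 * np.trace(ww)           # (28)
    c_beta = prior["c_beta"] + T / 2.0                        # (29)
    # sum_t <(s_t - w^T s_t^(p))^2>_Q expanded into second moments
    resid_s = S_11 - 2.0 * (mu_w @ S_1p) + np.trace(ww @ S_pp)
    b_beta = prior["b_beta"] + 0.5 * resid_s                  # (30)
    c_gamma = prior["c_gamma"] + T / 2.0                      # (31)
    b_gamma = prior["b_gamma"] + 0.5 * resid_x                # (32)
    return {
        "Sigma_w": Sigma_w, "mu_w": mu_w,
        "b_alpha": b_alpha, "c_alpha": c_alpha,
        "b_beta": b_beta, "c_beta": c_beta,
        "b_gamma": b_gamma, "c_gamma": c_gamma,
    }
```

If the first Gamma parameter is read as a rate and the second as a shape, the expected inverse variances needed by the next smoothing pass would be the ratios `c_alpha / b_alpha`, `c_beta / b_beta` and `c_gamma / b_gamma`.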
5. Within a predetermined range of speech production model orders, select an initial order $p_1$. Substitute the actual noisy signal $x_t$ and the initial order $p_1$ into the parameter update equations (17)-(32) derived in step 4 and iteratively compute the cost function of equation (11), stopping when the absolute value of the change in the cost function from one iteration to the next is no greater than a predetermined threshold. Save the cost function value at that point together with the corresponding approximate posterior distribution parameters $\vec{m}_t^{(s)}$ of the state vector;
6. Within the predetermined range of speech production model orders, change the model order in turn, replacing the initial order $p_1$ of step 5 with each new order $p$ and repeating step 5, to obtain a set of cost function values and approximate posterior distribution parameters of the state vector, one for each model order;
7. Among all the cost function values obtained, the value of $p$ corresponding to the smallest cost function value is the optimal model order, and the speech signal $\hat{s}_t = C \vec{m}_t^{(s)}$ computed from the approximate posterior distribution parameters $\vec{m}_t^{(s)}$ of the state vector for this optimal model order is the best result.
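The stopping rule of step 5 and the order search of steps 6 and 7 can be sketched generically. The helpers below are hypothetical: `step` stands for one pass through the update equations (17)-(32) that returns the current value of the cost function (11), and `fit` stands for a complete run of step 5 at a given order that returns the converged cost together with the enhanced signal. The `max_iter` safeguard is an added assumption.

```python
def run_until_converged(step, tol=1e-6, max_iter=500):
    """Call step() until the cost changes by at most tol between iterations."""
    previous = step()
    for _ in range(max_iter):
        current = step()
        if abs(current - previous) <= tol:
            return current
        previous = current
    return previous  # iteration cap reached without meeting the threshold

def select_order(fit, orders):
    """Run fit(p) for every candidate order and keep the one with minimum cost."""
    results = {p: fit(p) for p in orders}              # p -> (cost, estimate)
    best_p = min(results, key=lambda p: results[p][0])
    return best_p, results[best_p][1], results
```

A call such as `select_order(fit, range(2, 21))` would then return the chosen order, the corresponding enhanced signal, and the full table of costs for inspection. The range 2 to 20 is only an example, since the text leaves the predetermined order range unspecified.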

Claims (1)

1. A variational Bayesian speech enhancement method based on a speech production model, characterized by comprising the following steps:
1) expressing the noisy speech signal as the sum of a clean speech signal and noise to establish the noisy speech model, representing the speech production model by an autoregressive process, and establishing the state-space equations corresponding to the noisy speech model and the speech production model;
2) choosing a Gaussian distribution for the noise of the noisy speech model and a Gaussian distribution for the driving noise of the speech production model; deriving, from these two Gaussian distributions and the state-space equations corresponding to the noisy speech model and the speech production model, the probability distributions of the state vector and of the observation vector; and determining from prior knowledge the prior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions;
3) obtaining with the variational expectation-maximization algorithm, from the cost function of the variational Bayesian method, the probability distribution of the state vector, the probability distribution of the observation vector, and the prior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions, the approximate posterior distribution of the state vector, the approximate posterior distribution of the weight coefficients of the speech production model, and the approximate posterior distributions of the inverse variances of all the Gaussian distributions;
4) estimating the update equations for the parameters of the approximate posterior distribution of the state vector with the variational Kalman smoothing algorithm, and deriving, from the variational maximization step of the variational expectation-maximization algorithm, the update equations for the parameters of the approximate posterior distribution of the weight coefficients of the speech production model and the update equations for the parameters of the approximate posterior distributions of the inverse variances of all the Gaussian distributions;
5) within a predetermined range of speech production model orders, selecting an initial order, substituting the noisy speech signal and the initial order into the parameter update equations derived in step 4), computing the cost function iteratively until the absolute value of its change from one iteration to the next is no greater than a predetermined threshold, and saving the cost function value at that point together with the corresponding approximate posterior distribution parameters of the state vector;
6) within the predetermined range of speech production model orders, changing the model order in turn, replacing the initial order of step 5) with each new order and repeating step 5), to obtain a set of cost function values and approximate posterior distribution parameters of the state vector, one for each model order;
7) among all the cost function values obtained, taking the order corresponding to the smallest value as the optimal model order, the speech signal computed from the approximate posterior distribution parameters of the state vector for this optimal model order being the optimal result.
CNB2006100283311A 2006-06-29 2006-06-29 Variation Bayesian voice strengthening method based on voice generating model Expired - Fee Related CN100498935C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100283311A CN100498935C (en) 2006-06-29 2006-06-29 Variation Bayesian voice strengthening method based on voice generating model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100283311A CN100498935C (en) 2006-06-29 2006-06-29 Variation Bayesian voice strengthening method based on voice generating model

Publications (2)

Publication Number Publication Date
CN1870136A CN1870136A (en) 2006-11-29
CN100498935C true CN100498935C (en) 2009-06-10

Family

ID=37443781

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100283311A Expired - Fee Related CN100498935C (en) 2006-06-29 2006-06-29 Variation Bayesian voice strengthening method based on voice generating model

Country Status (1)

Country Link
CN (1) CN100498935C (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254552B (en) * 2011-07-14 2012-10-03 杭州电子科技大学 Semantic enhanced transport vehicle acoustic information fusion method
CN102637438B (en) * 2012-03-23 2013-07-17 同济大学 Voice filtering method
US20140114650A1 (en) * 2012-10-22 2014-04-24 Mitsubishi Electric Research Labs, Inc. Method for Transforming Non-Stationary Signals Using a Dynamic Model
CN108206024B (en) * 2017-12-29 2021-06-25 河海大学常州校区 Voice data processing method based on variational Gaussian regression process
CN113421545B (en) * 2021-06-30 2023-09-29 平安科技(深圳)有限公司 Multi-mode voice synthesis method, device, equipment and storage medium
CN117540173B (en) * 2024-01-09 2024-04-19 长江水利委员会水文局 Flood simulation uncertainty analysis method based on Bayesian joint probability model

Also Published As

Publication number Publication date
CN1870136A (en) 2006-11-29

Similar Documents

Publication Publication Date Title
CN100498935C (en) Variation Bayesian voice strengthening method based on voice generating model
CN107886967B (en) A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network
CN112735456B (en) Speech enhancement method based on DNN-CLSTM network
CN111261183B (en) Method and device for denoising voice
CN104067340B (en) For the method for voice strengthened in mixed signal
US20150032445A1 (en) Noise estimation apparatus, noise estimation method, noise estimation program, and recording medium
CN111985523A (en) Knowledge distillation training-based 2-exponential power deep neural network quantification method
CN101853661B (en) Noise spectrum estimation and voice mobility detection method based on unsupervised learning
CN104900232A (en) Isolation word identification method based on double-layer GMM structure and VTS feature compensation
CN101477801A (en) Method for detecting and eliminating pulse noise in digital audio signal
CN104485103A (en) Vector Taylor series-based multi-environment model isolated word identifying method
CN102945670A (en) Multi-environment characteristic compensation method for voice recognition system
CN109192200A (en) A kind of audio recognition method
CN108010536A (en) Echo cancel method, device, system and storage medium
CN110998723B (en) Signal processing device using neural network, signal processing method, and recording medium
CN105513614A (en) Voice activation detection method based on noise power spectrum density Gamma distribution statistical model
CN110808057A (en) Voice enhancement method for generating confrontation network based on constraint naive
CN1909064B (en) Time-domain blind separating method for in-line natural voice convolution mixing signal
Cheng et al. A new unscented particle filter
CN112086100A (en) Quantization error entropy based urban noise identification method of multilayer random neural network
CN109102818B (en) Denoising audio sampling algorithm based on signal frequency probability density function distribution
CN104035332A (en) M-estimation impulsive noise active control method
CN104240717A (en) Voice enhancement method based on combination of sparse code and ideal binary system mask
Rennie et al. Dynamic noise adaptation
Richter et al. Speech signal improvement using causal generative diffusion models

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090610

Termination date: 20120629