CN1870136A - Variation Bayesian voice strengthening method based on voice generating model

Info

Publication number: CN1870136A
Authority: CN (China)
Prior art keywords: model, distribution, speech, production model, order
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number: CNA2006100283311A
Other languages: Chinese (zh)
Other versions: CN100498935C (en)
Inventors: 黄青华, 杨杰, 薛云峰
Current assignee: Shanghai Jiaotong University (the listed assignees may be inaccurate)
Original assignee: Shanghai Jiaotong University
Priority date: 2006-06-29 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2006-06-29
Publication date: 2006-11-29
Application filed by: Shanghai Jiaotong University
Publication of grant: CN100498935C
Current legal status: Expired - Fee Related

Abstract

A variational Bayesian speech enhancement method based on a speech production model. The method sets up a noisy-speech model and the state-space equations of the speech production model, and expresses the probability distributions of the noise-corruption process and of the speech production process. Following the variational Bayesian method, approximate posterior distributions are used to approximate the distributions of the speech production model parameters and of the clean speech. Update equations for the parameters of these approximate posteriors are then derived and iterated cyclically until the algorithm converges.

Description

Variational Bayesian speech enhancement method based on a speech production model
Technical field
The present invention relates to a variational Bayesian speech enhancement method based on a speech production model. It can be widely applied in areas such as speech communication and speech recognition, and belongs to the field of speech signal processing.
Background art
Real speech acquisition devices and recording environments cannot deliver clean speech: the speech is contaminated by various kinds of background noise. In applications such as speech communication and speech recognition, speech enhancement is therefore an important pre-processing stage, and enhanced speech better guarantees the accuracy of subsequent speech processing.
To improve speech quality, existing speech enhancement methods fall mainly into the following categories:
The first is the threshold method. Its basic principle assumes that the parts of the signal with small absolute amplitude are mainly noise, and it further compresses those parts with a linear or nonlinear compression function to achieve speech enhancement. The main drawback of this algorithm is that, while compressing noise, it also compresses much useful speech information.
The second is spectral subtraction. Assuming that the noise is additive and stationary or slowly varying, and that the speech signal and the noise are mutually independent, the noise power spectrum is subtracted from the noisy-speech power spectrum to obtain a comparatively clean speech spectrum. A well-known drawback of this method is that the enhanced speech contains unnatural residual tones known as "musical noise", which are subjectively unpleasant to the listener.
The third is enhancement based on a speech production model. Because the parameters of the "clean" speech model cannot be obtained directly, they must be estimated from the noisy signal; if the model is estimated inaccurately, the intelligibility of the enhanced speech degrades. Accurately estimating the model parameters and the model order from noisy speech is therefore the key to this approach. Gannot et al. (S. Gannot, D. Burshtein and E. Weinstein, "Iterative and Sequential Kalman Filter-Based Speech Enhancement Algorithms", IEEE Trans. Speech and Audio Processing, vol. 6, no. 4, July 1998, pp. 373-385) proposed an enhancement algorithm based on Kalman filtering that estimates the speech production model parameters by maximum likelihood. That method cannot estimate the model order, which has to be determined by other methods or from prior knowledge, and its result depends strongly on the initial parameter estimates. Vermaak et al. (J. Vermaak, C. Andrieu, A. Doucet and S. J. Godsill, "Particle Methods for Bayesian Modeling and Enhancement of Speech Signals", IEEE Trans. Speech and Audio Processing, vol. 10, no. 3, 2002, pp. 173-185) proposed estimating the speech production model parameters with Markov chain Monte Carlo methods and estimating the clean speech signal with a Kalman filter. That method also cannot estimate the model order, and its computational load is very high, which makes it unsuitable for many applications.
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by proposing a variational Bayesian speech enhancement method based on a speech production model. The method selects the order of the speech production model automatically and avoids overfitting during parameter estimation, so that the model is estimated more accurately and the speech is enhanced more effectively.
To this end, the technical solution of the invention rests on the following consideration. The variational Bayesian method is a Bayesian approximation technique developed in recent years. Its principle is to approximate the true posterior distribution of the unknown variables and parameters with an approximate posterior distribution, which makes Bayesian inference analytically tractable and allows both the model structure and the model parameters to be learned. The invention therefore exploits two strengths of the variational Bayesian method, its resistance to overfitting during parameter learning and its capacity for model selection, to estimate the parameters and the order of the speech production model accurately and thus enhance the speech more effectively. The invention first sets up the state-space equations of the noisy-speech model and the speech production model, then expresses the probability distributions of the noise-corruption process and of the speech production process. Following the variational Bayesian method, approximate posterior distributions are used to approximate the distributions of the speech production model parameters and of the clean speech signal. Finally, update equations for the parameters of these approximate posteriors are derived and iterated cyclically until the algorithm converges. Automatic model selection treats the order of the speech production model as the independent variable of the variational Bayesian cost function: the order that yields the smallest cost is the optimal model order, and the speech signal computed with that order is the optimal result.
The variational Bayesian speech enhancement method based on a speech production model mainly comprises the following steps:
1. The noisy speech signal is expressed as the sum of a clean speech signal and noise, which gives the noisy-speech model. The speech production model is represented by an autoregressive process, and the state-space equations corresponding to the noisy-speech model and the speech production model are set up.
2. The noise of the noisy-speech model is chosen to be Gaussian, and the driving noise of the speech production model is also chosen to be Gaussian. From these two Gaussian distributions and the state-space equations of the noisy-speech model and the speech production model, the probability distributions of the state vector and the observation vector are obtained. The prior distributions of the weight coefficients of the speech production model and of the inverse variances (precisions) of all Gaussian distributions are determined from prior knowledge.
3. From the cost function of the variational Bayesian method, the probability distributions of the state vector and the observation vector, and the prior distributions of the weight coefficients and of the precisions of all Gaussian distributions, the approximate posterior distributions of the state vector, the weight coefficients and the precisions are obtained with the variational expectation-maximization algorithm.
4. The update equations for the parameters of the approximate posterior of the state vector are estimated with the variational Kalman smoothing algorithm. The update equations for the approximate posterior parameters of the weight coefficients and of the precisions of all Gaussian distributions are derived from the variational maximization step of the variational expectation-maximization algorithm.
5. Within a predetermined range of speech production model orders, an initial order is selected. The noisy speech signal and the initial order are substituted into the parameter update equations derived in step 4, and the cost function is computed iteratively until the absolute change of the cost function from one iteration to the next no longer exceeds a preset threshold. The cost at that point and the corresponding approximate posterior parameters of the state vector are saved.
6. Within the predetermined range, the model order is changed in turn. Each new order replaces the initial order of step 5 and step 5 is repeated, which yields one cost value and one set of approximate posterior parameters of the state vector for every model order.
7. Among all the cost values obtained, the order with the smallest cost is the optimal model order, and the speech signal computed from the approximate posterior parameters of the state vector for that order is the optimal result.
The invention makes full use of the ability of variational Bayesian learning to estimate both model parameters and model structure, estimates the parameters and the order of the speech production model more accurately, and thereby improves the speech enhancement result.
The proposed variational Bayesian speech enhancement method based on a speech production model can be widely applied in areas such as speech communication and speech recognition and has considerable practical value.
Embodiment
For a better understanding of the technical solution of the invention, it is described in further detail below.
1. The noisy speech signal $x_t$ is expressed as the sum of the clean speech signal $s_t$ and the noise $n_t$, which gives the noisy-speech model:
$$x_t = s_t + n_t \tag{1}$$
The subscript $t$ denotes time. The speech production model is represented by an autoregressive process:
$$s_t = \mathbf{w}^{T}\mathbf{s}_t^{(p)} + e_t \tag{2}$$
Here $\mathbf{w} = [w_1, w_2, \ldots, w_p]^{T}$ are the weight coefficients of the autoregressive model, $\mathbf{s}_t^{(p)} = [s_{t-1}, \ldots, s_{t-p}]^{T}$ contains the $p$ past values on which the speech value at time $t$ depends, $p$ is the model order, and $e_t$ is the driving noise of the autoregressive model. From the noisy-speech model (1) and the speech production model (2), the state-space equations are:
$$\mathbf{s}_t = A\,\mathbf{s}_{t-1} + B\,e_t \tag{3}$$
$$x_t = C\,\mathbf{s}_t + n_t \tag{4}$$
where $\mathbf{s}_t \triangleq [s_t, s_{t-1}, \ldots, s_{t-p+1}]^{T}$ is the $p$-dimensional state vector, the noisy speech signal $x_t$ is the observation, $A \triangleq \begin{bmatrix} \mathbf{w}^{T} \\ I_{p-1} \;\; \mathbf{0}_{(p-1)\times 1} \end{bmatrix}$ is the $p \times p$ state-transition matrix, $B = C^{T} \triangleq [1, 0, \ldots, 0]^{T}$, and $I_{p-1}$ is the $(p-1)\times(p-1)$ identity matrix.
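As an illustration of the state-space form in equations (3)-(4), the following is a minimal sketch in plain Python. The function names (`companion_matrices`, `state_step`, `simulate_noisy_speech`), the zero initial state, and the example parameter values are assumptions made for this example and do not come from the patent.

```python
import random

def companion_matrices(w):
    """Build A, B, C of the state-space form for AR weights w (length p)."""
    p = len(w)
    # First row holds the AR weights; the rows below shift the state down by one.
    A = [list(w)] + [[1.0 if j == i else 0.0 for j in range(p)] for i in range(p - 1)]
    B = [1.0] + [0.0] * (p - 1)
    C = [1.0] + [0.0] * (p - 1)
    return A, B, C

def state_step(A, B, state, e):
    """One step of s_t = A s_{t-1} + B e_t for a state stored as a list."""
    return [sum(a * s for a, s in zip(row, state)) + b * e for row, b in zip(A, B)]

def simulate_noisy_speech(w, beta, gamma, T, seed=0):
    """Draw clean samples s_t and noisy samples x_t = C s_t + n_t.

    beta and gamma are precisions (inverse variances) of e_t and n_t.
    The all-zero initial state is an assumption of this sketch.
    """
    rng = random.Random(seed)
    A, B, C = companion_matrices(w)
    state = [0.0] * len(w)
    clean, noisy = [], []
    for _ in range(T):
        e = rng.gauss(0.0, beta ** -0.5)   # driving noise with precision beta
        n = rng.gauss(0.0, gamma ** -0.5)  # observation noise with precision gamma
        state = state_step(A, B, state, e)
        s = sum(c * v for c, v in zip(C, state))
        clean.append(s)
        noisy.append(s + n)
    return clean, noisy
```

Because the first row of `A` carries the weights and the remaining rows only shift the state, the first state component follows the scalar recursion of equation (2), and `C` simply reads it out.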
2. The noise $n_t$ is chosen to be Gaussian, $p(n_t) = G(n_t \mid 0, \gamma)$. The driving noise $e_t$ of the autoregressive model is also chosen to be Gaussian, $p(e_t) = G(e_t \mid 0, \beta)$. Here $G(y \mid a, b)$ denotes that the random variable $y$ follows a Gaussian distribution with mean $a$ and inverse variance (precision) $b$. From (3), the probability distribution of the state vector $\mathbf{s}_t$ is:
$$p(\mathbf{s}_t \mid \mathbf{s}_{t-1}, \mathbf{w}, \beta) = G(\mathbf{s}_t \mid A\,\mathbf{s}_{t-1}, \beta) \tag{5}$$
From (4), the probability distribution of the observation can be written as:
$$p(x_t \mid \mathbf{s}_t, \gamma) = G(x_t \mid s_t, \gamma) \tag{6}$$
The weight coefficients of the autoregressive model follow a zero-mean Gaussian prior:
$$p(\mathbf{w} \mid \alpha) = G(\mathbf{w} \mid \mathbf{0}, \alpha I_p) \tag{7}$$
The precisions of all Gaussian distributions follow Gamma priors:
$$p(\alpha \mid H) = \mathrm{Gamma}(\alpha \mid b^{(\alpha)}, c^{(\alpha)}) \tag{8}$$
$$p(\beta \mid H) = \mathrm{Gamma}(\beta \mid b^{(\beta)}, c^{(\beta)}) \tag{9}$$
$$p(\gamma \mid H) = \mathrm{Gamma}(\gamma \mid b^{(\gamma)}, c^{(\gamma)}) \tag{10}$$
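Since the distributions above are parameterized by precision rather than variance, a small numerical sketch may help keep the convention straight. The function names are hypothetical, and the Gamma density assumes that `c` is a shape parameter and `b` a rate parameter; the patent does not state this convention explicitly, so treat it as an assumption.

```python
import math

def log_gaussian(y, mean, precision):
    """log G(y | mean, precision), where precision is the inverse variance."""
    return (0.5 * (math.log(precision) - math.log(2.0 * math.pi))
            - 0.5 * precision * (y - mean) ** 2)

def log_gamma_density(x, b, c):
    """log Gamma(x | b, c), ASSUMING c is the shape and b the rate."""
    return c * math.log(b) - math.lgamma(c) + (c - 1.0) * math.log(x) - b * x
```

For instance, a noise variance of 0.25 corresponds to a precision of 4, so `log_gaussian(y, 0.0, 4.0)` evaluates the log-density of a zero-mean Gaussian with standard deviation 0.5.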
3. Let $X = \{x_1, x_2, \ldots, x_T\}$ denote the set of observations, $S = \{\mathbf{s}_t\}$ the set of state vectors, and $\theta = \{\mathbf{w}, \alpha, \beta, \gamma\}$ the set formed by the weight coefficients of the speech production model and the precisions of all Gaussian distributions. The principle of the variational Bayesian method is to approximate $p(S, \theta \mid X)$ with an approximate posterior distribution $Q(S, \theta)$. The cost function used in practice is:
$$C_{KL} = \left\langle \log\frac{Q(S,\theta)}{p(X,S,\theta)} \right\rangle_Q = \left\langle \log\frac{Q(S)\,Q(\theta)}{p(X,S,\theta)} \right\rangle_Q \tag{11}$$
where $\langle\cdot\rangle_Q$ denotes the expectation under the distribution $Q(\cdot)$. From the cost function (11), the probability distributions (5)-(6) of the state vector and the observation, and the prior distributions (7)-(10) of the weight coefficients and of the precisions, the variational expectation-maximization algorithm yields the following approximate posterior distributions of the state vector, the weight coefficients and the precisions:
$$Q(\mathbf{s}_t) = G(\mathbf{s}_t \mid \mathbf{m}_t^{(s)}, V_t^{(s)}) \tag{12}$$
$$Q(\mathbf{w}) = G(\mathbf{w} \mid \boldsymbol{\mu}^{(w)}, \Sigma^{(w)}) \tag{13}$$
$$Q(\alpha) = \mathrm{Gamma}(\alpha \mid \bar{b}^{(\alpha)}, \bar{c}^{(\alpha)}) \tag{14}$$
$$Q(\beta) = \mathrm{Gamma}(\beta \mid \bar{b}^{(\beta)}, \bar{c}^{(\beta)}) \tag{15}$$
$$Q(\gamma) = \mathrm{Gamma}(\gamma \mid \bar{b}^{(\gamma)}, \bar{c}^{(\gamma)}) \tag{16}$$
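As a brief aside on why the cost function (11) serves both inference and model selection, the standard decomposition of the variational cost is sketched below. It follows from the identity $p(X,S,\theta) = p(S,\theta \mid X)\,p(X)$ and is not spelled out in the patent text, so it is offered here only as supporting background.

```latex
\begin{aligned}
C_{KL} &= \Big\langle \log \frac{Q(S,\theta)}{p(X,S,\theta)} \Big\rangle_Q
        = \Big\langle \log \frac{Q(S,\theta)}{p(S,\theta \mid X)} \Big\rangle_Q - \log p(X) \\
       &= \mathrm{KL}\big(Q(S,\theta)\,\|\,p(S,\theta \mid X)\big) - \log p(X)
        \;\ge\; -\log p(X).
\end{aligned}
```

Minimizing $C_{KL}$ over $Q$ therefore pulls $Q$ toward the true posterior, and $-C_{KL}$ is a lower bound on the log evidence $\log p(X)$ of the model with a given order. Comparing the converged cost across orders is thus an approximate comparison of model evidence.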
4. The parameters of the approximate posterior (12) of the state vector are obtained with the variational Kalman smoothing algorithm. A sequence $\{x_{t_0}, x_{t_0+1}, \ldots, x_{t_1}\}$ is written $\{x\}_{t_0}^{t_1}$. First define the conditional expectation $\mathbf{m}_{t|\tau} = E(\mathbf{s}_t \mid \{x\}_1^{\tau})$ and the conditional covariance matrix $V_{t|\tau} = \mathrm{Var}(\mathbf{s}_t \mid \{x\}_1^{\tau})$, with initial values $\mathbf{m}_{0|0} = \mathbf{m}_0$ and $V_{0|0} = V_0$. For $t = 1, \ldots, T$, the forward recursion of the Kalman filter is:
$$\mathbf{m}_{t|t-1} = \bar{A}\,\mathbf{m}_{t-1|t-1} \tag{17}$$
$$V_{t|t-1} = \bar{A}\,V_{t-1|t-1}\,\bar{A}^{T} + P \tag{18}$$
$$K_t = V_{t|t-1} C^{T} \left( C V_{t|t-1} C^{T} + (\langle\gamma\rangle_Q)^{-1} \right)^{-1} \tag{19}$$
$$\mathbf{m}_{t|t} = \mathbf{m}_{t|t-1} + K_t \left( x_t - C\,\mathbf{m}_{t|t-1} \right) \tag{20}$$
$$V_{t|t} = V_{t|t-1} - K_t C V_{t|t-1} \tag{21}$$
Here $\bar{A} \triangleq \begin{bmatrix} \langle\mathbf{w}\rangle_Q^{T} \\ I_{p-1} \;\; \mathbf{0}_{(p-1)\times 1} \end{bmatrix}$, $P = \begin{bmatrix} \bar{\beta} \;\; \mathbf{0}_{1\times(p-1)} \\ \mathbf{0}_{(p-1)\times p} \end{bmatrix}$ and $\bar{\beta} = (\langle\beta\rangle_Q)^{-1}$. The distribution $p(\mathbf{s}_t \mid \{x\}_1^{t}) = G(\mathbf{s}_t \mid \mathbf{m}_{t|t}, [V_{t|t}]^{-1})$ is the Kalman filtering distribution of the state vector. The Kalman smoothing algorithm then continues: it is initialized with the corresponding filtered values $\mathbf{m}_{T|T}$ and $V_{T|T}$, and for $t = T-1, \ldots, 0$ the backward recursion is:
$$Q_t = V_{t|t}\,\bar{A}^{T}\,V_{t+1|t}^{-1} \tag{22}$$
$$\mathbf{m}_{t|T} = \mathbf{m}_{t|t} + Q_t \left( \mathbf{m}_{t+1|T} - \mathbf{m}_{t+1|t} \right) \tag{23}$$
$$V_{t|T} = V_{t|t} + Q_t \left( V_{t+1|T} - V_{t+1|t} \right) Q_t^{T} \tag{24}$$
This gives the update equations for the parameters of $Q(\mathbf{s}_t) = G(\mathbf{s}_t \mid \mathbf{m}_t^{(s)}, V_t^{(s)})$: $\mathbf{m}_t^{(s)} = \mathbf{m}_{t|T}$ and $V_t^{(s)} = [V_{t|T}]^{-1}$.
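The forward and backward recursions (17)-(24) can be sketched compactly for the special case $p = 1$, where every matrix reduces to a scalar and no matrix inversion is needed. This is a simplified illustration under that assumption, not the general $p$-dimensional procedure of the patent; the function name and the default initial values `m0` and `v0` are hypothetical.

```python
def vb_kalman_smoother_ar1(x, w_mean, beta_mean, gamma_mean, m0=0.0, v0=1.0):
    """Forward filter (17)-(21) and backward smoother (22)-(24) for p = 1.

    With p = 1 the matrices reduce to scalars: A_bar = <w>, P = 1/<beta>, C = 1.
    x[t - 1] holds the observation at time t. Returns the smoothed means
    m[t|T] and variances V[t|T] for t = 0..T.
    """
    T = len(x)
    a, p_noise, r = w_mean, 1.0 / beta_mean, 1.0 / gamma_mean
    m_f, v_f = [m0], [v0]          # filtered moments, index t = 0..T
    m_p, v_p = [None], [None]      # predicted moments, defined for t = 1..T
    for t in range(1, T + 1):
        m_pred = a * m_f[t - 1]                        # (17)
        v_pred = a * v_f[t - 1] * a + p_noise          # (18)
        k = v_pred / (v_pred + r)                      # (19)
        m_f.append(m_pred + k * (x[t - 1] - m_pred))   # (20)
        v_f.append(v_pred - k * v_pred)                # (21)
        m_p.append(m_pred)
        v_p.append(v_pred)
    m_s, v_s = m_f[:], v_f[:]      # the smoother starts from the filtered values at t = T
    for t in range(T - 1, -1, -1):
        q = v_f[t] * a / v_p[t + 1]                            # (22)
        m_s[t] = m_f[t] + q * (m_s[t + 1] - m_p[t + 1])        # (23)
        v_s[t] = v_f[t] + q * (v_s[t + 1] - v_p[t + 1]) * q    # (24)
    return m_s, v_s
```

In this scalar case the enhanced sample at time $t$ is simply `m_s[t]`, and the posterior precision of equation (12) is `1.0 / v_s[t]`. For $p > 1$ the same steps apply with matrix products and a matrix inverse in (22).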
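The update equations for the approximate posterior parameters of the weight coefficients of the speech production model and of the precisions of all Gaussian distributions, derived from the variational maximization step of the variational expectation-maximization algorithm, are:
$$\Sigma^{(w)} = \langle \alpha I_p \rangle_Q + \sum_{t=1}^{T} \left\langle \beta\,\mathbf{s}_t^{(p)} \mathbf{s}_t^{(p)T} \right\rangle_Q \tag{25}$$
$$\boldsymbol{\mu}^{(w)} = \left[ \Sigma^{(w)} \right]^{-1} \left[ \sum_{t=1}^{T} \left\langle \beta\, s_t\, \mathbf{s}_t^{(p)} \right\rangle_Q \right] \tag{26}$$
$$\bar{c}^{(\alpha)} = c^{(\alpha)} + \frac{p}{2} \tag{27}$$
$$\bar{b}^{(\alpha)} = b^{(\alpha)} + \frac{1}{2} \left\langle \mathbf{w}^{T}\mathbf{w} \right\rangle_Q \tag{28}$$
$$\bar{c}^{(\beta)} = c^{(\beta)} + \frac{T}{2} \tag{29}$$
$$\bar{b}^{(\beta)} = b^{(\beta)} + \frac{1}{2} \sum_{t=1}^{T} \left\langle \left( s_t - \mathbf{w}^{T}\mathbf{s}_t^{(p)} \right)^{2} \right\rangle_Q \tag{30}$$
$$\bar{c}^{(\gamma)} = c^{(\gamma)} + \frac{T}{2} \tag{31}$$
$$\bar{b}^{(\gamma)} = b^{(\gamma)} + \frac{1}{2} \sum_{t=1}^{T} \left\langle (x_t - s_t)^{2} \right\rangle_Q \tag{32}$$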
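The parameter updates (25)-(32) share a simple pattern, which the sketch below shows for the scalar case $p = 1$. Several things here are assumptions rather than statements from the patent: the function names, the idea that the expected sums are passed in as precomputed inputs from the smoothing step, the factorization of $\langle \beta\, s\, s \rangle_Q$ into $\langle\beta\rangle_Q \langle s\, s \rangle_Q$ (which relies on $Q$ treating $\beta$ and the states as independent), and the reading of `c` as a shape and `b` as a rate, which gives the posterior mean `c / b`.

```python
def update_ar1_weight(alpha_mean, beta_mean, sum_prev_sq, sum_cross):
    """Scalar (p = 1) form of (25)-(26).

    sum_prev_sq = sum_t <s_{t-1}^2>_Q and sum_cross = sum_t <s_t s_{t-1}>_Q are
    hypothetical inputs, assumed to be supplied by the smoothing step.
    """
    precision_w = alpha_mean + beta_mean * sum_prev_sq   # (25)
    mean_w = beta_mean * sum_cross / precision_w         # (26)
    return mean_w, precision_w

def update_gamma(b_prior, c_prior, count, expected_sq_sum):
    """Shared form of (27)-(32): c grows by count/2, b by half the expected squares."""
    c_post = c_prior + 0.5 * count
    b_post = b_prior + 0.5 * expected_sq_sum
    # ASSUMPTION: c is a shape and b a rate, so the posterior mean is c / b.
    return b_post, c_post, c_post / b_post
```

With `alpha_mean` set to zero the weight update reduces to the least-squares ratio `sum_cross / sum_prev_sq`; a positive `alpha_mean` shrinks the weight toward zero, which is the mechanism that discourages overfitting.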
5. Within the predetermined range of speech production model orders, select an initial order $p_1$. Substitute the actual noisy signal $x_t$ and the initial order $p_1$ into the parameter update equations (17)-(32) derived in step 4 and compute the cost function (11) iteratively, stopping when the absolute change of the cost function from one iteration to the next no longer exceeds a preset threshold. Save the cost at that point together with the corresponding approximate posterior parameters $\mathbf{m}_t^{(s)}$ of the state vector.
6. Within the predetermined range, change the model order in turn. Replace the initial order $p_1$ of step 5 with each new order $p$ and repeat step 5, which yields one cost value and one set of approximate posterior parameters of the state vector for every model order.
7. Among all the cost values obtained, the order $p$ with the smallest cost is the optimal model order. The speech signal $\hat{s}_t = C\,\mathbf{m}_t^{(s)}$ computed from the approximate posterior parameters $\mathbf{m}_t^{(s)}$ of the state vector for that optimal order is the optimal result.
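The outer procedure of steps 5 to 7 (iterate to convergence for each candidate order, then keep the order with the smallest cost) can be sketched generically as follows. The names `run_until_converged`, `select_order` and `make_update_step` are hypothetical, as are the default tolerance and iteration cap; `make_update_step(p)` stands in for one pass of the updates (17)-(32) followed by an evaluation of the cost (11), which this sketch does not implement.

```python
def run_until_converged(update_step, state, tol=1e-6, max_iter=1000):
    """Repeat update_step until the cost changes by no more than tol (step 5)."""
    previous_cost = None
    cost = None
    for _ in range(max_iter):
        state, cost = update_step(state)
        if previous_cost is not None and abs(cost - previous_cost) <= tol:
            break
        previous_cost = cost
    return state, cost

def select_order(orders, make_update_step, initial_state):
    """Fit every candidate order (step 6) and keep the lowest cost (step 7)."""
    fits = {}
    for p in orders:
        fits[p] = run_until_converged(make_update_step(p), initial_state)
    best = min(fits, key=lambda p: fits[p][1])
    return best, fits[best][0], fits
```

The `max_iter` cap is a safeguard added for this sketch; the patent only specifies the threshold test on the change in cost.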

Claims (1)

1. A variational Bayesian speech enhancement method based on a speech production model, characterized by comprising the following steps:
1) the noisy speech signal is expressed as the sum of a clean speech signal and noise to set up the noisy-speech model, the speech production model is represented by an autoregressive process, and the state-space equations corresponding to the noisy-speech model and the speech production model are set up;
2) the noise of the noisy-speech model is chosen to be Gaussian and the driving noise of the speech production model is also chosen to be Gaussian; from these two Gaussian distributions and the state-space equations corresponding to the noisy-speech model and the speech production model, the probability distributions of the state vector and the observation vector are obtained, and the prior distributions of the weight coefficients of the speech production model and of the inverse variances of all Gaussian distributions are determined from prior knowledge;
3) from the cost function of the variational Bayesian method, the probability distributions of the state vector and the observation vector, and the prior distributions of the weight coefficients of the speech production model and of the inverse variances of all Gaussian distributions, the approximate posterior distributions of the state vector, the weight coefficients of the speech production model and the inverse variances of all Gaussian distributions are obtained with the variational expectation-maximization algorithm;
4) the update equations for the approximate posterior parameters of the state vector are estimated with the variational Kalman smoothing algorithm, and the update equations for the approximate posterior parameters of the weight coefficients of the speech production model and of the inverse variances of all Gaussian distributions are derived from the variational maximization step of the variational expectation-maximization algorithm;
5) within a predetermined range of speech production model orders, an initial order is selected, the noisy speech signal and the initial order are substituted into the parameter update equations derived in step 4), and the cost function is computed iteratively until the absolute change of the cost function from one iteration to the next no longer exceeds a preset threshold, whereupon the cost and the corresponding approximate posterior parameters of the state vector are saved;
6) within the predetermined range of speech production model orders, the model order is changed in turn, each new order replaces the initial order of step 5), and step 5) is repeated to obtain one cost value and one set of approximate posterior parameters of the state vector for every model order;
7) among all the cost values obtained, the order with the smallest cost is the optimal model order, and the speech signal computed from the approximate posterior parameters of the state vector corresponding to that optimal model order is the optimal result.
Application CNB2006100283311A (priority date 2006-06-29, filing date 2006-06-29): Variation Bayesian voice strengthening method based on voice generating model. Status: Expired - Fee Related. Publication: CN100498935C (en).


Publications (2)

Publication Number | Publication Date
CN1870136A (en) | 2006-11-29
CN100498935C (en) | 2009-06-10

Family ID: 37443781


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254552A (en) * 2011-07-14 2011-11-23 杭州电子科技大学 Semantic enhanced transport vehicle acoustic information fusion method
CN102254552B (en) * 2011-07-14 2012-10-03 杭州电子科技大学 Semantic enhanced transport vehicle acoustic information fusion method
CN102637438A (en) * 2012-03-23 2012-08-15 同济大学 Voice filtering method
CN102637438B (en) * 2012-03-23 2013-07-17 同济大学 Voice filtering method
CN104737229A (en) * 2012-10-22 2015-06-24 三菱电机株式会社 Method for transforming input signal
CN108206024A (en) * 2017-12-29 2018-06-26 河海大学常州校区 A kind of voice data processing method based on variation Gauss regression process
CN113421545A (en) * 2021-06-30 2021-09-21 平安科技(深圳)有限公司 Multi-modal speech synthesis method, device, equipment and storage medium
CN113421545B (en) * 2021-06-30 2023-09-29 平安科技(深圳)有限公司 Multi-mode voice synthesis method, device, equipment and storage medium
CN117540173A (en) * 2024-01-09 2024-02-09 长江水利委员会水文局 Flood simulation uncertainty analysis method based on Bayesian joint probability model
CN117540173B (en) * 2024-01-09 2024-04-19 长江水利委员会水文局 Flood simulation uncertainty analysis method based on Bayesian joint probability model

Similar Documents

Publication Publication Date Title
CN109859767B (en) Environment self-adaptive neural network noise reduction method, system and storage medium for digital hearing aid
CN109841226B (en) Single-channel real-time noise reduction method based on convolution recurrent neural network
CN110619885B (en) Method for generating confrontation network voice enhancement based on deep complete convolution neural network
CN1870136A (en) Variation Bayesian voice strengthening method based on voice generating model
CN101976566B (en) Voice enhancement method and device using same
CN107272066A (en) A kind of noisy seismic signal first-arrival traveltime pick-up method and device
CN110045419A (en) A kind of perceptron residual error autoencoder network seismic data denoising method
CN111985523A (en) Knowledge distillation training-based 2-exponential power deep neural network quantification method
JP2013534651A5 (en)
CN110490816B (en) Underwater heterogeneous information data noise reduction method
CN104067340B (en) For the method for voice strengthened in mixed signal
CN109192200A (en) A kind of audio recognition method
CN115618204A (en) Electric energy data denoising method based on optimal wavelet basis and improved wavelet threshold function
CN112861740A (en) Wavelet threshold denoising parameter selection method based on composite evaluation index and wavelet entropy
CN102930863B (en) Voice conversion and reconstruction method based on simplified self-adaptive interpolation weighting spectrum model
CN1909064A (en) Time-domain blind separating method for in-line natural voice convolution mixing signal
CN1805011A (en) Adaptive filter method and apparatus for improving speech quality of mobile communication apparatus
CN101923716B (en) Method for improving particle filter tracking effect
CN102184530A (en) Image denoising method based on gray relation threshold value
CN115440240A (en) Training method for voice noise reduction, voice noise reduction system and voice noise reduction method
CN105185385A (en) Voice fundamental tone frequency estimation method based on gender anticipation and multi-frequency-band parameter mapping
CN1924850A (en) Audio fast search method
CN114141266A (en) Speech enhancement method for estimating prior signal-to-noise ratio based on PESQ driven reinforcement learning
CN1845640A (en) Wireless channel blind estimation method based on wavelet shrinkage and HMM
CN108573698B (en) Voice noise reduction method based on gender fusion information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090610

Termination date: 20120629