CN100498935C - Variation Bayesian voice strengthening method based on voice generating model - Google Patents
- Publication number
- CN100498935C (application CNB2006100283311A, CN200610028331A)
- Authority
- CN
- China
- Prior art keywords
- model
- distribution
- speech
- production model
- model order
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
Abstract
A variational Bayesian speech enhancement method based on a speech production model. The method sets up a noisy speech model and the state-space equations of the speech production model, and expresses the noise-corruption process and the speech production process as probability distributions. Following the variational Bayesian method, approximate posterior distributions are used to approximate the distributions of the speech production model parameters and of the clean speech. Update equations for the parameters of these approximate posterior distributions are derived and iterated cyclically until the algorithm converges.
Description
Technical field
The present invention relates to a variational Bayesian speech enhancement method based on a speech production model. It is broadly applicable to speech communication, speech recognition and related areas, and belongs to the field of speech signal processing.
Background technology
Real speech capture devices and recording environments cannot deliver clean speech: the speech is contaminated by various kinds of background noise. In applications such as speech communication and speech recognition, enhancing the speech as a preprocessing step is therefore very important, because enhanced speech better safeguards the accuracy of subsequent speech processing.
To improve speech quality, existing speech enhancement methods fall mainly into the following categories:
The first is the thresholding method. Its basic premise is that the parts of the signal with small absolute amplitude are mainly noise, so it compresses those parts further with a linear or nonlinear compression function to enhance the speech. Its main drawback is that, while compressing the noise, it also compresses a great deal of useful speech information.
The second is spectral subtraction. It assumes that the noise is stationary or slowly varying additive noise and that the speech signal and the noise are mutually independent. Under these conditions the noise power spectrum is subtracted from the power spectrum of the noisy speech to obtain a comparatively clean speech spectrum. A well-known drawback is that the enhanced signal contains unnatural tones known as "musical" noise, which listeners find subjectively unpleasant.
The third category comprises enhancement algorithms based on a speech production model. Because the parameters of the "clean" speech model cannot be estimated directly, the model parameters have to be estimated from the noisy signal; if the model estimate is inaccurate, the intelligibility of the enhanced speech deteriorates. Accurately estimating the model parameters and the model order from the noisy speech is therefore the key to this approach. Gannot et al. (S. Gannot, D. Burshtein and E. Weinstein, Iterative and Sequential Kalman Filter-Based Speech Enhancement Algorithms, IEEE Trans. Speech and Audio Processing, vol. 6, no. 4, July 1998, pp. 373-385) proposed an enhancement algorithm based on Kalman filtering that estimates the speech production model parameters by maximum likelihood. That method cannot estimate the model order, which must be determined by other methods or from prior knowledge, and its results depend strongly on the initial parameter estimates. Vermaak et al. (J. Vermaak, C. Andrieu, A. Doucet and S. J. Godsill, Particle Methods for Bayesian Modeling and Enhancement of Speech Signals, IEEE Trans. Speech and Audio Processing, vol. 10, no. 3, 2002, pp. 173-185) proposed estimating the speech production model parameters with a Markov chain Monte Carlo method and estimating the clean speech signal with a Kalman filter. That method likewise cannot estimate the model order, and its computational cost is very high, which makes it unsuitable for many applications.
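For comparison with the model-based approach, the spectral subtraction idea described above can be sketched as follows. This is a minimal illustration and not part of the invention: it assumes non-overlapping rectangular frames, a noise power spectrum `noise_psd` estimated beforehand from a speech-free segment, and a hypothetical spectral floor parameter `floor`.

```python
import numpy as np

def spectral_subtraction(noisy, noise_psd, frame_len=256, floor=0.01):
    """Power spectral subtraction on non-overlapping frames (illustrative only).

    noise_psd has frame_len // 2 + 1 entries, in the same units as
    |rfft(frame)|**2. Samples beyond the last full frame are dropped.
    """
    n_frames = len(noisy) // frame_len
    out = np.zeros(n_frames * frame_len)
    for k in range(n_frames):
        frame = noisy[k * frame_len:(k + 1) * frame_len]
        spec = np.fft.rfft(frame)
        power = np.abs(spec) ** 2
        # Subtract the noise power estimate; clamp to a small fraction of the
        # noisy power so the estimated clean power never becomes negative.
        clean_power = np.maximum(power - noise_psd, floor * power)
        # Apply the resulting gain to the noisy spectrum, keeping its phase.
        gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
        out[k * frame_len:(k + 1) * frame_len] = np.fft.irfft(gain * spec, n=frame_len)
    return out
```

The isolated spectral peaks that survive the subtraction in noise-only bins are the usual source of the "musical" noise mentioned above.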
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by proposing a variational Bayesian speech enhancement method based on a speech production model. The method selects the order of the speech production model automatically and avoids overfitting during parameter estimation, so that the model is estimated more accurately and the speech is enhanced more effectively.
To achieve this object, the technical solution adopted by the invention rests on the following considerations. The variational Bayesian method is a Bayesian approximation technique developed in recent years. Its principle is to approximate the true distributions of the unknown variables and parameters with approximate posterior distributions, which makes Bayesian inference analytically tractable and allows both the model structure and the model parameters to be learned. The invention therefore exploits the variational Bayesian method's resistance to overfitting during parameter learning and its capacity for model selection to estimate the parameters and the order of the speech production model accurately, and so to enhance speech more effectively. The invention first sets up the state-space equations of the noisy speech model and the speech production model, and then expresses the noise-corruption process and the speech production process as probability distributions. Following the variational Bayesian method, approximate posterior distributions are used to approximate the distributions of the speech production model parameters and of the clean speech signal. Finally, update equations for the parameters of these approximate posterior distributions are obtained and iterated until the algorithm converges. Automatic model selection treats the order of the speech production model as the independent variable of the variational Bayesian cost function: the order that yields the minimum cost is the optimal model order, and the speech signal computed with that order is the optimal result.
The variational Bayesian speech enhancement method based on a speech production model according to the invention mainly comprises the following steps:
1. Express the noisy speech signal as the sum of a clean speech signal and noise to set up the noisy speech model, represent the speech production model by an autoregressive process, and set up the state-space equations corresponding to the noisy speech model and the speech production model.
2. Choose a Gaussian distribution for the noise of the noisy speech model and a Gaussian distribution for the driving noise of the speech production model. From these two Gaussian distributions and the state-space equations of the two models, derive the probability distributions of the state vector and the observation vector, and determine from prior knowledge the prior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions.
3. From the cost function of the variational Bayesian method, the probability distributions of the state vector and the observation vector, and the prior distributions of the weight coefficients and of the inverse variances, obtain with the variational expectation-maximization algorithm the approximate posterior distributions of the state vector, of the weight coefficients of the speech production model, and of the inverse variances of all the Gaussian distributions.
4. Estimate the update equations for the parameters of the approximate posterior distribution of the state vector with the variational Kalman smoothing algorithm, and derive the update equations for the parameters of the approximate posterior distributions of the weight coefficients and of the inverse variances from the variational maximization step of the variational expectation-maximization algorithm.
5. Within a predetermined range of speech production model orders, select an initial order. Substitute the noisy speech signal and the initial order into the parameter update equations derived in step 4 and compute the cost function iteratively until the absolute value of its change from one step to the next is no greater than a predetermined threshold. Save the cost function value at that point together with the corresponding parameters of the approximate posterior distribution of the state vector.
6. Within the predetermined range of orders, change the model order in turn, replacing the initial order in step 5 with each new order and repeating step 5. This yields one cost function value and one set of approximate posterior distribution parameters of the state vector for each model order.
7. Among all the cost function values obtained, the order corresponding to the minimum is the optimal model order, and the speech signal computed from the approximate posterior distribution parameters of the state vector for that order is the optimal result.
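The stopping rule of step 5 can be sketched as a small driver loop. The sketch assumes a hypothetical callable `update_step` that performs one pass over all the update equations and returns the current cost; the name, the default threshold and the iteration cap `max_iter` are illustrative and do not come from the patent.

```python
def iterate_until_converged(update_step, threshold=1e-6, max_iter=500):
    """Repeat update_step() until the cost changes by at most `threshold`.

    update_step is a hypothetical callable that applies the update equations
    once and returns the current value of the cost function.
    Returns (final_cost, number_of_calls_made).
    """
    previous = update_step()
    for iteration in range(1, max_iter):
        current = update_step()
        # Step 5: stop when the absolute change in cost is within the threshold.
        if abs(current - previous) <= threshold:
            return current, iteration + 1
        previous = current
    # Iteration cap reached without meeting the threshold.
    return previous, max_iter
```

The iteration cap is a practical safeguard added here; the text itself specifies only the threshold test.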
By exploiting the ability of variational Bayesian learning to estimate both model parameters and model structure, the invention estimates the parameters and the order of the speech production model more accurately and thereby improves the speech enhancement result.
The proposed variational Bayesian speech enhancement method based on a speech production model is broadly applicable to speech communication, speech recognition and related areas, and has considerable practical value.
Embodiment
To make the technical solution of the invention easier to understand, it is described in further detail below.
1. The noisy speech signal $x_t$ is expressed as the sum of the clean speech signal $s_t$ and the noise $n_t$, which gives the noisy speech model:

$$x_t = s_t + n_t \qquad (1)$$

The subscript $t$ denotes time. The speech production model is represented by an autoregressive process, equation (2), with the corresponding state equation (3) and observation equation (4).
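A standard way to write an autoregressive speech production model and the corresponding state-space equations is sketched below. It is an assumed reconstruction, chosen to agree with the matrices $A$ and $C$ used in the Kalman recursions of step 4. The order $p$, the weight coefficients $a_1, \dots, a_p$ and the companion-matrix layout of $A$ are illustrative notation, not the patent's exact typesetting.

```latex
\begin{align}
s_t &= \sum_{i=1}^{p} a_i\, s_{t-i} + e_t \tag{2} \\
\mathbf{s}_t &= A\,\mathbf{s}_{t-1} + \mathbf{e}_t,
  \qquad \mathbf{s}_t = [\,s_t,\ s_{t-1},\ \dots,\ s_{t-p+1}\,]^{T} \tag{3} \\
x_t &= C\,\mathbf{s}_t + n_t, \qquad C = [\,1,\ 0,\ \dots,\ 0\,] \tag{4}
\end{align}
% Assumed companion form: the first row of A holds the AR weights, the rows
% below shift the previous samples down, and e_t drives only the first entry.
\[
A = \begin{bmatrix}
a_1 & a_2 & \cdots & a_{p-1} & a_p \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{bmatrix},
\qquad \mathbf{e}_t = [\,e_t,\ 0,\ \dots,\ 0\,]^{T}
\]
```

With this layout, the first component of the state vector is the current clean speech sample, so the observation equation (4) reduces to the noisy speech model (1).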
2. The noise $n_t$ is chosen to be Gaussian, and the driving noise $e_t$ of the autoregressive model is also chosen to be Gaussian. Here $\mathcal{N}(y \mid a, b)$ denotes that the random variable $y$ follows a Gaussian distribution with mean $a$ and inverse variance $b$. From (3), the probability distribution of the state vector is given by (5); from (4), the probability distribution of the observation vector is given by (6). The weight coefficients of the autoregressive model follow a zero-mean Gaussian prior distribution (7), and the inverse variances of all the Gaussian distributions follow Gamma prior distributions (8)-(10).
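The generative model of steps 1 and 2 can be simulated in a few lines, which is a convenient way to produce test signals whose clean speech is known. The sketch below assumes a companion-form state transition matrix and uses hypothetical names (`companion_matrix`, `simulate_noisy_speech`). It also assumes that `beta` and `gamma` are the inverse variances of the driving noise and the observation noise respectively, following the inverse-variance convention of the text.

```python
import numpy as np

def companion_matrix(a):
    """State-transition matrix A for AR weights a = [a_1, ..., a_p]."""
    p = len(a)
    A = np.zeros((p, p))
    A[0, :] = a                 # first row applies the AR weights
    A[1:, :-1] = np.eye(p - 1)  # remaining rows shift past samples down
    return A

def simulate_noisy_speech(a, beta, gamma, T, seed=0):
    """Draw clean s_t and noisy x_t; beta and gamma are inverse variances."""
    rng = np.random.default_rng(seed)
    p = len(a)
    A = companion_matrix(a)
    C = np.zeros(p)
    C[0] = 1.0                  # observe only the newest sample
    state = np.zeros(p)
    s = np.empty(T)
    x = np.empty(T)
    for t in range(T):
        e = np.zeros(p)
        e[0] = rng.normal(0.0, 1.0 / np.sqrt(beta))           # driving noise e_t
        state = A @ state + e                                 # state equation
        s[t] = C @ state
        x[t] = s[t] + rng.normal(0.0, 1.0 / np.sqrt(gamma))   # x_t = s_t + n_t
    return s, x
```

For example, `simulate_noisy_speech([1.5, -0.7], beta=1.0, gamma=4.0, T=1000)` draws a stable second-order process observed in noise of variance 0.25.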
3. Let $X$ denote the set of observations $\{x_1, x_2, \dots, x_T\}$, $S$ the set of state vectors, and $\theta$ the set consisting of the weight coefficients of the speech production model and the inverse variances of all the Gaussian distributions. The principle of the variational Bayesian method is to use an approximate posterior distribution $Q(S, \theta)$ to approximate $p(S, \theta \mid X)$. The cost function used in practice is (11), in which $\langle \cdot \rangle_Q$ denotes the expectation under the probability distribution $Q(\cdot)$. From the cost function (11) of the variational Bayesian method, the probability distributions (5)-(6) of the state vector and the observation vector, and the prior distributions (7)-(10) of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions, the variational expectation-maximization algorithm yields the approximate posterior distributions of the state vector (12), of the weight coefficients (13), and of the inverse variances:

$$Q(\alpha) = \mathrm{Gamma}(\alpha \mid b_{(\alpha)}, c_{(\alpha)}) \qquad (14)$$

$$Q(\beta) = \mathrm{Gamma}(\beta \mid b_{(\beta)}, c_{(\beta)}) \qquad (15)$$

$$Q(\gamma) = \mathrm{Gamma}(\gamma \mid b_{(\gamma)}, c_{(\gamma)}) \qquad (16)$$
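In the usual variational Bayesian formulation, a cost function of this kind takes the form below. This is a sketch of the standard construction rather than a transcription of equation (11). The factorization over $S$, the weight vector (written $\mathbf{a}$ here, an assumed symbol) and the inverse variances $\alpha$, $\beta$, $\gamma$ is inferred from the list of approximate posterior distributions (12)-(16).

```latex
\begin{align}
\mathcal{C}(Q)
  &= \left\langle \ln \frac{Q(S,\theta)}{p(X,S,\theta)} \right\rangle_{Q}
   = \mathrm{KL}\!\left( Q(S,\theta) \,\middle\|\, p(S,\theta \mid X) \right) - \ln p(X), \\
Q(S,\theta) &= Q(S)\, Q(\mathbf{a})\, Q(\alpha)\, Q(\beta)\, Q(\gamma).
\end{align}
```

Because the Kullback-Leibler term is non-negative, $\mathcal{C}(Q)$ is an upper bound on $-\ln p(X)$. A smaller converged cost therefore indicates stronger evidence for the model, which is what allows costs to be compared across model orders in steps 5 to 7.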
4. The parameters of the approximate posterior distribution (12) of the state vector are obtained with the variational Kalman smoothing algorithm. Sequences are denoted collectively as sets. First the initial values of the conditional expectation and of the conditional covariance matrix are set, the latter as $V_{0|0} = V_0$. For $t = 1, \dots, T$, the forward recursion of the Kalman filter consists of equations (17)-(21), among them:

$$V_{t|t-1} = A V_{t-1|t-1} A^{T} + P \qquad (18)$$

$$K_t = V_{t|t-1} C^{T} \left( C V_{t|t-1} C^{T} + (\langle \gamma \rangle_Q)^{-1} \right)^{-1} \qquad (19)$$

$$V_{t|t} = V_{t|t-1} - K_t C V_{t|t-1} \qquad (21)$$

Here the variance associated with $\beta$ is $(\langle \beta \rangle_Q)^{-1}$, and the forward recursion yields the Kalman filtering distribution of the state vector. The Kalman smoothing algorithm then continues: the smoothed conditional expectation and the covariance $V_{T|T}$ are initialized with the corresponding Kalman filter values, and for $t = T-1, \dots, 0$ the backward recursion is carried out. This gives the update equations for the parameters of the approximate posterior distribution of the state vector.

The update equations for the parameters of the approximate posterior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions are derived from the variational maximization step of the variational expectation-maximization algorithm.
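The forward recursion (18), (19), (21) and a backward smoothing pass can be sketched in code as below. This is a standard Kalman filter with a Rauch-Tung-Striebel smoother that treats $A$, $P$ and the observation-noise variance as fixed point values. It is a simplified stand-in: the variational Kalman smoother of the method would also account for the uncertainty in the weight coefficients. The function name, the argument names and the use of a pseudo-inverse are illustrative choices, the mean prediction and update lines are the textbook forms, and the backward pass here stops at the first observation rather than at the initial state.

```python
import numpy as np

def kalman_smoother(x, A, C, P, r, mu0, V0):
    """Forward Kalman filter and backward RTS smoother, scalar observations.

    x: observations (length T); A: state transition (p x p); C: observation
    row vector (length p); P: process-noise covariance (p x p); r: observation
    noise variance, i.e. the inverse of <gamma>_Q; mu0, V0: initial mean and
    covariance. Returns smoothed means (T x p) and covariances (T x p x p).
    """
    T, p = len(x), len(mu0)
    mu_f = np.zeros((T, p))      # filtered means
    V_f = np.zeros((T, p, p))    # filtered covariances V_{t|t}
    mu_p = np.zeros((T, p))      # predicted means
    V_p = np.zeros((T, p, p))    # predicted covariances V_{t|t-1}
    mu_prev, V_prev = mu0, V0
    for t in range(T):
        mu_p[t] = A @ mu_prev                             # textbook mean prediction
        V_p[t] = A @ V_prev @ A.T + P                     # cf. (18)
        K = V_p[t] @ C / (C @ V_p[t] @ C + r)             # cf. (19)
        mu_f[t] = mu_p[t] + K * (x[t] - C @ mu_p[t])      # textbook mean update
        V_f[t] = V_p[t] - np.outer(K, C) @ V_p[t]         # cf. (21)
        mu_prev, V_prev = mu_f[t], V_f[t]
    # Backward pass, initialised with the last filtered values.
    mu_s, V_s = mu_f.copy(), V_f.copy()
    for t in range(T - 2, -1, -1):
        J = V_f[t] @ A.T @ np.linalg.pinv(V_p[t + 1])
        mu_s[t] = mu_f[t] + J @ (mu_s[t + 1] - mu_p[t + 1])
        V_s[t] = V_f[t] + J @ (V_s[t + 1] - V_p[t + 1]) @ J.T
    return mu_s, V_s
```

With a companion-form state vector, the first column of the smoothed means, `mu_s[:, 0]`, is the estimate of the clean speech samples.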
5. Within a predetermined range of speech production model orders, select an initial order $p_1$. Substitute the actual noisy signal $x_t$ and the initial order $p_1$ into the parameter update equations (17)-(32) derived in step 4, and compute the cost function (11) iteratively, stopping when the absolute value of its change from one step to the next is no greater than a predetermined threshold. Save the cost function value at that point together with the corresponding parameters of the approximate posterior distribution of the state vector;
6. Within the predetermined range of speech production model orders, change the model order in turn, replacing the initial order $p_1$ in step 5 with each new order $p$ and repeating step 5. This yields one cost function value and one set of approximate posterior distribution parameters of the state vector for each model order;
7. Among all the cost function values obtained, the value of $p$ corresponding to the minimum is the optimal model order, and the speech signal computed from the approximate posterior distribution parameters of the state vector for that optimal order is the best result.
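Steps 6 and 7 amount to a scan over candidate orders followed by picking the minimum cost. The sketch below assumes a hypothetical callable `fit_for_order` that runs step 5 to convergence for a given order and returns the final cost together with the state posterior; both names are illustrative and do not appear in the patent.

```python
def select_model_order(fit_for_order, orders):
    """Return (best_order, best_cost, best_state) with the minimum converged cost.

    fit_for_order is a hypothetical callable: given a model order p, it runs
    the iterative updates of step 5 and returns (cost, state_posterior).
    """
    # Step 6: one converged fit per candidate order.
    results = {p: fit_for_order(p) for p in orders}
    # Step 7: the order with the smallest cost is taken as optimal.
    best_order = min(results, key=lambda p: results[p][0])
    best_cost, best_state = results[best_order]
    return best_order, best_cost, best_state
```

A call such as `select_model_order(fit_for_order, range(1, 21))` would scan orders 1 to 20; the range itself is an arbitrary example, since the text only requires that it be predetermined.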
Claims (1)
1. A variational Bayesian speech enhancement method based on a speech production model, characterized by comprising the following steps:
1) expressing the noisy speech signal as the sum of a clean speech signal and noise to set up a noisy speech model, representing the speech production model by an autoregressive process, and setting up the state-space equations corresponding to the noisy speech model and the speech production model;
2) choosing a Gaussian distribution for the noise of the noisy speech model and a Gaussian distribution for the driving noise of the speech production model; deriving, from these two Gaussian distributions and the state-space equations corresponding to the noisy speech model and the speech production model, the probability distributions of the state vector and the observation vector; and determining from prior knowledge the prior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions;
3) obtaining, with the variational expectation-maximization algorithm and from the cost function of the variational Bayesian method, the probability distribution of the state vector, the probability distribution of the observation vector, and the prior distributions of the weight coefficients of the speech production model and of the inverse variances of all the Gaussian distributions, the approximate posterior distribution of the state vector, the approximate posterior distribution of the weight coefficients of the speech production model, and the approximate posterior distributions of the inverse variances of all the Gaussian distributions;
4) estimating the update equations for the parameters of the approximate posterior distribution of the state vector with the variational Kalman smoothing algorithm, and deriving, from the variational maximization step of the variational expectation-maximization algorithm, the update equations for the parameters of the approximate posterior distribution of the weight coefficients of the speech production model and of the approximate posterior distributions of the inverse variances of all the Gaussian distributions;
5) selecting an initial order within a predetermined range of speech production model orders, substituting the noisy speech signal and the initial order into the parameter update equations derived in step 4), computing the cost function iteratively until the absolute value of its change from one step to the next is no greater than a predetermined threshold, and saving the cost function value at that point together with the corresponding parameters of the approximate posterior distribution of the state vector;
6) changing the model order in turn within the predetermined range of speech production model orders, replacing the initial order in step 5) with each new order and repeating step 5), so as to obtain one cost function value and one set of approximate posterior distribution parameters of the state vector for each model order;
7) among all the cost function values obtained, taking the order corresponding to the minimum as the optimal model order, the speech signal computed from the approximate posterior distribution parameters of the state vector for that optimal order being the optimal result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006100283311A CN100498935C (en) | 2006-06-29 | 2006-06-29 | Variation Bayesian voice strengthening method based on voice generating model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006100283311A CN100498935C (en) | 2006-06-29 | 2006-06-29 | Variation Bayesian voice strengthening method based on voice generating model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1870136A (en) | 2006-11-29 |
CN100498935C (en) | 2009-06-10 |
Family
ID=37443781
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006100283311A Expired - Fee Related CN100498935C (en) | 2006-06-29 | 2006-06-29 | Variation Bayesian voice strengthening method based on voice generating model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100498935C (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102254552B (en) * | 2011-07-14 | 2012-10-03 | Hangzhou Dianzi University | Semantic enhanced transport vehicle acoustic information fusion method |
CN102637438B (en) * | 2012-03-23 | 2013-07-17 | Tongji University | Voice filtering method |
US20140114650A1 (en) * | 2012-10-22 | 2014-04-24 | Mitsubishi Electric Research Labs, Inc. | Method for Transforming Non-Stationary Signals Using a Dynamic Model |
CN108206024B (en) * | 2017-12-29 | 2021-06-25 | Hohai University, Changzhou Campus | Voice data processing method based on variational Gaussian regression process |
CN113421545B (en) * | 2021-06-30 | 2023-09-29 | Ping An Technology (Shenzhen) Co., Ltd. | Multi-mode voice synthesis method, device, equipment and storage medium |
CN117540173B (en) * | 2024-01-09 | 2024-04-19 | Bureau of Hydrology, Changjiang Water Resources Commission | Flood simulation uncertainty analysis method based on Bayesian joint probability model |
Also Published As
Publication number | Publication date |
---|---|
CN1870136A (en) | 2006-11-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100498935C (en) | Variation Bayesian voice strengthening method based on voice generating model | |
CN107886967B (en) | A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network | |
CN112735456B (en) | Speech enhancement method based on DNN-CLSTM network | |
CN111261183B (en) | Method and device for denoising voice | |
CN104067340B (en) | For the method for voice strengthened in mixed signal | |
US20150032445A1 (en) | Noise estimation apparatus, noise estimation method, noise estimation program, and recording medium | |
CN111985523A (en) | Knowledge distillation training-based 2-exponential power deep neural network quantification method | |
CN101853661B (en) | Noise spectrum estimation and voice mobility detection method based on unsupervised learning | |
CN104900232A (en) | Isolation word identification method based on double-layer GMM structure and VTS feature compensation | |
CN101477801A (en) | Method for detecting and eliminating pulse noise in digital audio signal | |
CN104485103A (en) | Vector Taylor series-based multi-environment model isolated word identifying method | |
CN102945670A (en) | Multi-environment characteristic compensation method for voice recognition system | |
CN109192200A (en) | A kind of audio recognition method | |
CN108010536A (en) | Echo cancel method, device, system and storage medium | |
CN110998723B (en) | Signal processing device using neural network, signal processing method, and recording medium | |
CN105513614A (en) | Voice activation detection method based on noise power spectrum density Gamma distribution statistical model | |
CN110808057A (en) | Voice enhancement method for generating confrontation network based on constraint naive | |
CN1909064B (en) | Time-domain blind separating method for in-line natural voice convolution mixing signal | |
Cheng et al. | A new unscented particle filter | |
CN112086100A (en) | Quantization error entropy based urban noise identification method of multilayer random neural network | |
CN109102818B (en) | Denoising audio sampling algorithm based on signal frequency probability density function distribution | |
CN104035332A (en) | M-estimation impulsive noise active control method | |
CN104240717A (en) | Voice enhancement method based on combination of sparse code and ideal binary system mask | |
Rennie et al. | Dynamic noise adaptation | |
Richter et al. | Speech signal improvement using causal generative diffusion models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20090610; Termination date: 20120629 |