CN101853661A - Noise spectrum estimation and voice activity detection method based on unsupervised learning - Google Patents


Publication number
CN101853661A
CN101853661A CN201010178166A CN201010178166A CN101853661A CN 101853661 A CN101853661 A CN 101853661A CN 201010178166 A CN201010178166 A CN 201010178166A CN 201010178166 A CN201010178166 A CN 201010178166A CN 101853661 A CN101853661 A CN 101853661A
Authority
CN
China
Prior art keywords
lambda
frame
voice
alpha
kappa
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201010178166A
Other languages
Chinese (zh)
Other versions
CN101853661B (en)
Inventor
应冬文
颜永红
付强
潘接林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
Original Assignee
Institute of Acoustics CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS
Priority to CN2010101781664A
Publication of CN101853661A
Application granted
Publication of CN101853661B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to a method for noise power spectrum estimation and voice activity detection based on unsupervised learning, comprising the following steps: 1) for the log-magnitude feature of the speech signal at each frequency bin, a GMM is established; 2) for a segment of speech data, an M-frame buffer is set up, the first M frames of the input signal are stored in the buffer, and the log-magnitude spectra of the M buffered frames are extracted and used to initialize the GMM of step 1), giving the initialized model $\lambda_{0,k}$; 3) after the initialized model $\lambda_{0,k}$ is obtained, starting from frame M+1, the GMM is updated frame by frame by incremental learning, the models $\lambda_{i,k}$ are obtained successively by recursion, and from them the noise estimate and the probability that speech is present at the k-th frequency bin of the i-th frame. The invention couples noise power spectrum estimation and voice activity detection tightly, which can improve the adaptability of speech application systems to noisy environments. It does not depend on the "noise-first" assumption, and it also describes voice activity in the two-dimensional time-frequency plane.

Description

Noise spectrum estimation and voice activity detection method based on unsupervised learning
Technical field
The present invention relates to the field of speech processing, and specifically to a method for noise power spectrum estimation and voice activity detection based on unsupervised learning. Voice activity detection is an algorithm that decides whether speech is present along the time dimension; it can report presence as a "yes" or "no" decision, or describe it with a speech presence probability.
Background art
Most speech application systems have to cope with interference from ambient noise. Many methods have been proposed to remove the effect of noise on speech systems, and nearly all of them depend on voice activity detection and noise power spectrum estimation. These two modules are closely related, and their accuracy directly affects the overall noise robustness of the system. Traditional solutions have the following problems:
1. In typical noise-robust algorithms, voice activity detection and noise power spectrum estimation are loosely coupled in a cascade: voice activity is computed first, and the noise power spectrum is then estimated from it. The sensitivity of the voice activity detector to the speech signal directly affects the accuracy of the noise power spectrum estimate. A detector that is too sensitive tends to underestimate the noise power spectrum; one that is too insensitive tends to overestimate it. Traditional schemes therefore often require the detector's sensitivity to be tuned to the noise environment, which limits the system's adaptability to different noise environments.
2. Traditional solutions rely on semi-supervised learning. In the initial stage, a system generally has to make the "noise-first" assumption, namely that every utterance begins with a segment of non-speech signal. This non-speech segment can be regarded as manually labeled background noise samples, from which the initial noise model is built; this is a supervised learning method. Its drawback is that the assumption is hard to satisfy in some applications: when an utterance begins with speech, initialization of the noise model fails, and both speech detection and noise power spectrum estimation then become inaccurate. In the subsequent stage, after the initial noise model has been built, traditional solutions mostly update the model using the detection and estimation results. This decision-directed learning is a form of unsupervised learning: the output of the estimator/detector is fed back to update the model. However, incorrect results are easily fed back into the model and reduce its accuracy, and the degraded model in turn reduces the accuracy of estimation and detection. Errors thus accumulate over time, and system performance gradually declines. Supervised learning in the initial stage combined with unsupervised learning in the subsequent stage forms a semi-supervised learning process, and the problems in both stages are caused by this semi-supervised approach.
3. Most earlier voice activity detectors describe voice activity only along the time dimension and not along the frequency dimension, so the noise cannot be processed at a finer granularity.
Summary of the invention
To address the shortcomings of earlier voice activity detectors and noise power spectrum estimators, the present invention proposes a tightly coupled solution in which voice activity detection and noise power spectrum estimation are unified under a single unsupervised learning framework, thereby improving the adaptability of speech application systems to noisy environments. In addition, the invention does not depend on the "noise-first" assumption and is therefore more practical than traditional methods. It also describes voice activity in the time-frequency plane, which allows the noise to be processed at a finer granularity.
To achieve the above object, the invention provides a method for noise power spectrum estimation and voice activity detection based on unsupervised learning which, as shown in Fig. 2, comprises the following steps:
1) For the log-magnitude feature of the speech signal at each frequency bin, a GMM is established, expressed as:
$$p(x_{i,k} \mid \lambda_{i,k}) = w_{i,k}^{(0)}\, p(x_{i,k} \mid h=0, \lambda_{i,k}) + w_{i,k}^{(1)}\, p(x_{i,k} \mid h=1, \lambda_{i,k});$$
where each Gaussian component is:
$$p(x_{i,k} \mid h, \lambda_{i,k}) = \frac{1}{\sqrt{2\pi \kappa_{i,k}^{(h)}}} \exp\left\{ -\frac{\left(x_{i,k} - \mu_{i,k}^{(h)}\right)^2}{2\kappa_{i,k}^{(h)}} \right\};$$
Here $x_{i,k}$ is the log-magnitude spectrum at the k-th frequency bin of the i-th frame; $h \in \{0,1\}$ indexes the Gaussian component; $w_{i,k}^{(h)}$ is the weight coefficient of the GMM; $\mu_{i,k}^{(h)}$ and $\kappa_{i,k}^{(h)}$ are the mean and variance respectively, where h=1 denotes the speech component and h=0 the noise component; and $\lambda_{i,k} = \{w_{i,k}^{(h)}, \mu_{i,k}^{(h)}, \kappa_{i,k}^{(h)}\}$ is the parameter set of the Gaussian mixture model;
2) For a segment of speech data, an M-frame buffer is set up, the first M frames of the input signal are stored in the buffer, and the log-magnitude spectra of the M buffered frames are extracted and used to initialize the GMM of step 1), giving the initialized model $\lambda_{0,k}$. Initialization uses a constrained EM algorithm;
3) After the initialized model $\lambda_{0,k}$ is obtained, starting from frame M+1, the GMM is updated frame by frame by incremental learning, successively obtaining $\lambda_{i,k}$ by recursion, and from it the noise estimate $\mu_{i,k}^{(0)}$ and the probability that speech is present at the k-th frequency bin of the i-th frame:
$$p(h=1 \mid x_{i,k}, \lambda_{i,k}) = \frac{w_{i,k}^{(1)}\, p(x_{i,k} \mid h=1, \lambda_{i,k})}{w_{i,k}^{(0)}\, p(x_{i,k} \mid h=0, \lambda_{i,k}) + w_{i,k}^{(1)}\, p(x_{i,k} \mid h=1, \lambda_{i,k})},$$
where i = 1, 2, 3, ...
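The incremental learning of the GMM consists of recursive weight coefficients, recursive means, and recursive variances.
The recursive weight coefficients are:
$$w_{i+1,k}^{(h)} = \alpha\, w_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k})$$
The recursive means are:
$$\mu_{i+1,k}^{(h)} = \frac{\alpha\, w_{i,k}^{(h)} \mu_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k})\, x_{i+1,k}}{w_{i+1,k}^{(h)}}$$
or
$$\mu_{i+1,k}^{(h)} = \alpha_{\mu}\, \mu_{i,k}^{(h)} + (1-\alpha_{\mu})\, p(h \mid x_{i+1,k}, \lambda_{i,k})\, x_{i+1,k}$$
The recursive variances are:
$$\kappa_{i+1,k}^{(h)} = \frac{\alpha\, w_{i,k}^{(h)} \kappa_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k}) \left(x_{i+1,k} - \mu_{i+1,k}^{(h)}\right)^2}{w_{i+1,k}^{(h)}}$$
or
$$\kappa_{i+1,k}^{(h)} = \alpha_{\kappa}\, \kappa_{i,k}^{(h)} + (1-\alpha_{\kappa})\, p(h \mid x_{i+1,k}, \lambda_{i,k}) \left(x_{i+1,k} - \mu_{i+1,k}^{(h)}\right)^2$$
or
$$\kappa_{i+1,k}^{(h)} = \alpha_{\kappa}\, \kappa_{i,k}^{(h)} + (1-\alpha_{\kappa})\, p(h \mid x_{i+1,k}, \lambda_{i,k}) \left(x_{i+1,k} - \mu_{i,k}^{(h)}\right)^2$$
where $\alpha$, $\alpha_{\mu}$ and $\alpha_{\kappa}$ are smoothing factors.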
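One property of the weight recursion $w_{i+1,k}^{(h)} = \alpha\, w_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k})$ can be checked in a line. Assuming the previous weights sum to one and the posteriors sum to one over $h$, the updated weights stay normalized:

$$\sum_{h \in \{0,1\}} w_{i+1,k}^{(h)} = \alpha \sum_{h} w_{i,k}^{(h)} + (1-\alpha) \sum_{h} p(h \mid x_{i+1,k}, \lambda_{i,k}) = \alpha + (1-\alpha) = 1.$$

For a smoothing factor such as $\alpha = 0.99$, each new frame contributes 1% to the weights, which corresponds to an effective memory of roughly $1/(1-\alpha) = 100$ frames. This is a general property of exponential smoothing rather than something stated in the patent text.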
Compared with the prior art, the present invention has the following technical effects:
The invention tightly couples voice activity detection and noise power spectrum estimation, which can improve the adaptability of speech application systems to noisy environments. It does not depend on the "noise-first" assumption and is therefore more practical. It also describes voice activity in the two-dimensional time-frequency plane, which allows the noise to be processed at a finer granularity.
Description of drawings
Fig. 1 shows the time-domain waveform and spectrogram of a segment of noisy speech;
Part (a) is a spectrogram of speech corrupted by white noise at a signal-to-noise ratio of 0 dB; part (b) is the speech presence probability map, in which the gray level represents the probability that the speech signal is present. Comparing (a) and (b) shows that the presence probability output by the method accurately describes the structure of the spectrogram.
Fig. 2 is a flowchart of the noise power spectrum estimation and voice activity detection method based on unsupervised learning according to the invention.
Embodiment
The present invention proposes a method for noise power spectrum estimation and voice activity detection based on an unsupervised learning framework. The defining feature of the framework is that the models of noise and speech are built without supervision: neither model initialization nor model updating relies on manually labeled information. Specifically, it has the following characteristics:
● Initialization does not rely on the "noise-first" assumption, so the invention applies more widely than conventional solutions.
● Updating requires no feedback of decisions, so the error accumulation problem is alleviated to some extent.
● Voice activity and noise power spectrum information are produced together and are tightly coupled, so the system can be tuned with only a few parameters. In a loosely coupled system, the voice activity module and the noise estimation module each have their own tuning parameters; with more parameters, the system is harder to tune.
● Voice activity is two-dimensional "time-frequency" information, whereas other voice activity detection algorithms describe the presence of speech only along the time dimension.
In one embodiment, the unsupervised learning framework is realized by a two-component Gaussian mixture model (GMM). One component represents the distribution of speech energy and the other the distribution of noise energy. The frequency band is divided into 8 subbands on the Mel scale, the energy envelope is extracted on each subband, and a corresponding GMM is established. The GMM is first initialized with the EM algorithm and then updated progressively by incremental learning. From the GMM, the voice activity on the subband and the power spectrum of the noise are derived.
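The Mel-scale subband split can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not say which Mel formula or frequency range is used, so the common mapping mel = 2595·log10(1 + f/700) and an 8000 Hz upper edge (16 kHz sampling) are assumptions, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Map frequency in Hz to the Mel scale (assumed common formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_bands=8, f_min=0.0, f_max=8000.0):
    """Return n_bands + 1 band edges in Hz, equally spaced on the Mel scale.

    f_max = 8000 Hz assumes 16 kHz sampling; this is an illustrative choice.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    return [mel_to_hz(lo + (hi - lo) * b / n_bands) for b in range(n_bands + 1)]


if __name__ == "__main__":
    for b, (f_lo, f_hi) in enumerate(zip(mel_band_edges(), mel_band_edges()[1:])):
        print(f"subband {b}: {f_lo:8.1f} Hz to {f_hi:8.1f} Hz")
```

Equal spacing on the Mel scale yields narrow subbands at low frequencies and wide ones at high frequencies; one GMM would then be attached to each of the 8 subbands.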
The present invention fits the spectral envelope of speech with a GMM subject to constraints.
During fitting, the means, weights, and variances of the GMM are each constrained. In both the EM algorithm and the incremental learning process, the following are required:
Figure GSA00000122082900041
Figure GSA00000122082900042
Figure GSA00000122082900043
And
Figure GSA00000122082900044
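The incremental learning of the GMM consists of recursive computation of the weight coefficients, means, and variances.
1) Recursive weight coefficients:
$$w_{i+1,k}^{(h)} = \alpha\, w_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k})$$
where $\alpha$ is a smoothing factor less than 1 but close to 1, for example $\alpha = 0.99$.
2) Recursive means:
$$\mu_{i+1,k}^{(h)} = \frac{\alpha\, w_{i,k}^{(h)} \mu_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k})\, x_{i+1,k}}{w_{i+1,k}^{(h)}}$$
or
$$\mu_{i+1,k}^{(h)} = \alpha_{\mu}\, \mu_{i,k}^{(h)} + (1-\alpha_{\mu})\, p(h \mid x_{i+1,k}, \lambda_{i,k})\, x_{i+1,k}$$
where $\alpha_{\mu}$ is a smoothing factor less than 1 but close to 1, for example $\alpha_{\mu} = 0.99$.
3) Recursive variances:
$$\kappa_{i+1,k}^{(h)} = \frac{\alpha\, w_{i,k}^{(h)} \kappa_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k}) \left(x_{i+1,k} - \mu_{i+1,k}^{(h)}\right)^2}{w_{i+1,k}^{(h)}}$$
or
$$\kappa_{i+1,k}^{(h)} = \alpha_{\kappa}\, \kappa_{i,k}^{(h)} + (1-\alpha_{\kappa})\, p(h \mid x_{i+1,k}, \lambda_{i,k}) \left(x_{i+1,k} - \mu_{i+1,k}^{(h)}\right)^2$$
or
$$\kappa_{i+1,k}^{(h)} = \alpha_{\kappa}\, \kappa_{i,k}^{(h)} + (1-\alpha_{\kappa})\, p(h \mid x_{i+1,k}, \lambda_{i,k}) \left(x_{i+1,k} - \mu_{i,k}^{(h)}\right)^2$$
where $\alpha_{\kappa}$ is a smoothing factor less than 1 but close to 1, for example $\alpha_{\kappa} = 0.99$.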
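A single incremental update at one frequency bin can be sketched as below. It is a minimal illustration under stated assumptions: it uses the weight-normalized forms of the mean and variance recursions (the first variant of each, as given in claim 2), it omits the constraint steps because their expressions are not recoverable from the text, and the function names and numeric values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def incremental_update(x_new, w, mu, kappa, alpha=0.99):
    """One recursive GMM update for a new log-magnitude observation x_new.

    Index 0 is the noise component, index 1 the speech component.
    Returns the new weights, means, variances, and the posteriors used.
    """
    # Posterior p(h | x_new, lambda_i) under the current model.
    lik = [w[h] * gaussian_pdf(x_new, mu[h], kappa[h]) for h in (0, 1)]
    total = lik[0] + lik[1]
    post = [lik[0] / total, lik[1] / total]

    # Recursive weights, then weight-normalized means and variances.
    w_new = [alpha * w[h] + (1.0 - alpha) * post[h] for h in (0, 1)]
    mu_new = [
        (alpha * w[h] * mu[h] + (1.0 - alpha) * post[h] * x_new) / w_new[h]
        for h in (0, 1)
    ]
    kappa_new = [
        (alpha * w[h] * kappa[h]
         + (1.0 - alpha) * post[h] * (x_new - mu_new[h]) ** 2) / w_new[h]
        for h in (0, 1)
    ]
    return w_new, mu_new, kappa_new, post


if __name__ == "__main__":
    # Hypothetical model for one bin, values in dB.
    w, mu, kappa = [0.7, 0.3], [25.0, 50.0], [25.0, 100.0]
    w, mu, kappa, post = incremental_update(22.0, w, mu, kappa)
    print("speech presence probability:", post[1])
    print("updated noise mean:", mu[0])
```

An observation close to the noise mean is attributed almost entirely to the noise component, so it nudges the noise mean slightly toward the observation while leaving the speech component nearly unchanged.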
The invention is described further below with reference to a preferred embodiment.
The principle of the invention is as follows:
For the log-magnitude feature of the speech signal at each frequency bin, a Gaussian mixture model (GMM) is established; the model changes over time and with the input signal. The model is expressed as:
$$p(x_{i,k} \mid \lambda_{i,k}) = w_{i,k}^{(0)}\, p(x_{i,k} \mid h=0, \lambda_{i,k}) + w_{i,k}^{(1)}\, p(x_{i,k} \mid h=1, \lambda_{i,k});$$
where each Gaussian component is:
$$p(x_{i,k} \mid h, \lambda_{i,k}) = \frac{1}{\sqrt{2\pi \kappa_{i,k}^{(h)}}} \exp\left\{ -\frac{\left(x_{i,k} - \mu_{i,k}^{(h)}\right)^2}{2\kappa_{i,k}^{(h)}} \right\};$$
Here $x_{i,k}$ is the log-magnitude spectrum at the k-th frequency bin of the i-th frame; $h \in \{0,1\}$ indexes the Gaussian component; $w_{i,k}^{(h)}$ is the weight coefficient of the GMM; and $\mu_{i,k}^{(h)}$ and $\kappa_{i,k}^{(h)}$ are the mean and variance respectively, where h=1 denotes the speech component and h=0 the noise component. $\lambda_{i,k} = \{w_{i,k}^{(h)}, \mu_{i,k}^{(h)}, \kappa_{i,k}^{(h)}\}$ is the parameter set of the Gaussian mixture model.
In this model, $\mu_{i,k}^{(0)}$ is the noise to be estimated. The probability that speech is present at the k-th frequency bin of the i-th frame can also be derived:
$$p(h=1 \mid x_{i,k}, \lambda_{i,k}) = \frac{w_{i,k}^{(1)}\, p(x_{i,k} \mid h=1, \lambda_{i,k})}{w_{i,k}^{(0)}\, p(x_{i,k} \mid h=0, \lambda_{i,k}) + w_{i,k}^{(1)}\, p(x_{i,k} \mid h=1, \lambda_{i,k})}$$
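The speech presence probability is the posterior of the speech component of a two-component mixture. A minimal sketch, assuming a standard Gaussian density with variance $\kappa$; the parameter values and function names are hypothetical and only illustrate the formula:

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def speech_presence_probability(x, weights, means, variances):
    """Posterior p(h=1 | x, lambda) for a two-component GMM.

    Index 0 is the noise component, index 1 the speech component.
    """
    noise = weights[0] * gaussian_pdf(x, means[0], variances[0])
    speech = weights[1] * gaussian_pdf(x, means[1], variances[1])
    return speech / (noise + speech)


if __name__ == "__main__":
    # Hypothetical parameters for one frequency bin (log-magnitude in dB).
    weights, means, variances = (0.7, 0.3), (20.0, 50.0), (25.0, 100.0)
    for x in (20.0, 35.0, 55.0):
        p = speech_presence_probability(x, weights, means, variances)
        print(f"x = {x:5.1f} dB -> p(speech) = {p:.4f}")
```

Evaluating this at every bin of every frame gives the time-frequency presence map described for Fig. 1(b).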
Based on the above principle, according to one embodiment of the invention and as shown in Fig. 2, the noise power spectrum estimation and voice activity detection method comprises the following steps:
Step 100: An M-frame buffer is set up, the first M frames of the input signal are stored in it, and the magnitude spectra of the M buffered frames are extracted. The magnitude spectrum of a frame is extracted as follows:
The digitized audio signal of the frame is first preprocessed (depending on the system, this may include windowing, pre-emphasis, and so on). With a frame length of F samples, the frame is zero-padded to N points (where $N \ge F$, $N = 2^j$, and j is an integer with $j \ge 8$), and an N-point discrete Fourier transform is applied to obtain the discrete spectrum
$$Y_{i,k} = \sum_{n=0}^{N-1} y_{i,n}\, e^{-2\pi\sqrt{-1}\, nk/N},$$
where $y_{i,n}$ is the n-th sample of the i-th frame in the buffer and $Y_{i,k}$ is the k-th Fourier coefficient of the i-th frame in the buffer ($k = 0, 1, \ldots, N-1$). Its magnitude is then $|Y_{i,k}|$.
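The per-frame feature extraction of Step 100 can be sketched as follows. This is a minimal illustration, assuming the log-magnitude is taken as 20·log10|Y| (as used in Step 300) and using a naive O(N²) DFT from the standard library to stay self-contained; a real system would use an FFT routine. The function name and the small magnitude floor that avoids log10(0) are assumptions, and preprocessing (windowing, pre-emphasis) is left out.

```python
import cmath
import math


def log_magnitude_spectrum(frame, n_fft=256, floor=1e-12):
    """Zero-pad a frame to n_fft points, take the DFT, return 20*log10|Y_k|.

    n_fft = 256 corresponds to N = 2**j with j = 8, the smallest size
    allowed by the text. `floor` guards against log10(0).
    """
    if len(frame) > n_fft:
        raise ValueError("frame is longer than the transform size")
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spectrum = []
    for k in range(n_fft):
        # Direct evaluation of the N-point DFT at bin k.
        y_k = sum(
            padded[n] * cmath.exp(-2j * math.pi * n * k / n_fft)
            for n in range(n_fft)
        )
        spectrum.append(20.0 * math.log10(max(abs(y_k), floor)))
    return spectrum


if __name__ == "__main__":
    # A 200-sample frame holding a sinusoid centered on bin 16 of a 256-point DFT.
    frame = [math.cos(2.0 * math.pi * 16 * n / 256) for n in range(200)]
    spec = log_magnitude_spectrum(frame)
    peak = max(range(128), key=lambda k: spec[k])
    print("peak bin:", peak)
```

Each of the N values returned for a frame is one observation $x_{i,k}$ for the GMM attached to bin k.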
Step 200: Initialization of the GMM. A two-component Gaussian mixture model $\lambda_{i,k}$ is initialized at each frequency bin k, where the subscript i denotes time and $\lambda_{i=0,k}$ denotes the initialized model. Initialization uses a constrained EM algorithm. At a given frequency bin k, the initialization steps are as follows:
Step 201: A clustering method (for example unsupervised LBG clustering or fuzzy clustering) divides the M+1 samples into two classes:
Figure GSA00000122082900064
where $M_0 + M_1 - 1 = M$. The class with the larger mean is denoted by the superscript (1) and the other class by the superscript (0). The means of the two classes are
Figure GSA00000122082900065
The mean of the class with the lower energy is
Figure GSA00000122082900066
where
Figure GSA00000122082900067
The variances of the two classes are:
Figure GSA00000122082900068
Figure GSA00000122082900069
The initial weight coefficients of the two classes and the likelihood of the new model are then computed. In the iteration that follows, the old model parameter set is denoted $\lambda'_{0,k}$, and the new model parameters are:
Figure GSA000001220829000612
Before the iteration begins,
Figure GSA000001220829000613
and $L'_k$ is set to a large negative number, for example $L'_k = -10000$. The iteration then begins.
Step 202: Compute the probabilities that noise and speech are present:
$$p(h \mid x_{i,k}, \lambda'_{0,k}) = \frac{w_{0,k}^{(h)}\, p(x_{i,k} \mid h, \lambda'_{0,k})}{\sum_{h} w_{0,k}^{(h)}\, p(x_{i,k} \mid h, \lambda'_{0,k})}, \quad h \in \{0,1\};$$
Step 203: Compute the new weight coefficients:
Figure GSA00000122082900071
Step 204: If
Figure GSA00000122082900072
the iteration stops and $\lambda_{0,k} = \lambda'_{0,k}$, where υ is a number greater than 0 and close to 0, for example υ = 0.05.
Step 205: Compute the new means:
Figure GSA00000122082900073
Step 206: Constrain the new means:
Figure GSA00000122082900074
where δ is a constant in the range 1 to 10.
Step 207: Compute the new variances.
Step 208: Constrain the new variances:
Figure GSA00000122082900076
Step 209: Compute the likelihood of the new model.
Step 210: If the condition
Figure GSA00000122082900078
Figure GSA00000122082900079
is satisfied, the iteration terminates, where ε is a small number, for example ε = 0.1. If
Figure GSA000001220829000710
Figure GSA000001220829000712
the iteration returns to Step 202.
Step 300: Progressive update of the GMM. After the initialized model $\lambda_{0,k}$ has been established, the GMM is updated frame by frame from frame M+1 onward using incremental learning. The iteration can be stated as follows: at each frequency bin k, given $\lambda_{i,k}$ and the current observation $x_{i+1,k}$, infer $\lambda_{i+1,k}$. A Fourier transform of frame i+1 gives $Y_{i+1,k}$, where $0 \le k < N$, and the log-magnitude spectrum $x_{i+1,k} = 20\log_{10}|Y_{i+1,k}|$ is computed at each frequency bin k. For the k-th frequency bin, the iteration steps are as follows:
Step 301: Compute the probabilities that noise and speech are present,
Figure GSA000001220829000713
$h \in \{0,1\}$.
Step 302: Compute the new weight coefficients:
Figure GSA000001220829000714
where α is a smoothing factor less than 1 but close to 1, for example α = 0.99.
Step 303: Constrain the new weight coefficients:
Figure GSA000001220829000716
Step 304: Compute the new means,
Figure GSA000001220829000717
Step 305: Constrain the new means.
Step 306: Compute the new variances,
Figure GSA00000122082900082
Step 307: Constrain the new variances,
Figure GSA00000122082900083
These sub-steps yield all the parameters of $\lambda_{i+1,k}$, and with them the corresponding speech presence probability $p(h \mid x_{i+1,k}, \lambda_{i,k})$ and the power spectrum estimate of the noise signal
Figure GSA00000122082900084
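Looking back at Step 200, the buffered initialization that precedes these frame-by-frame updates can be sketched for one frequency bin as below. Several things here are assumptions rather than the patent's method: a plain 1-D 2-means split stands in for LBG clustering, the initial weights are taken as class proportions, the EM loop is the standard unconstrained one (the constraint expressions of Steps 204, 206 and 208 are not recoverable from the text, so only a variance floor is applied), and convergence is judged by the change in log-likelihood against ε. All function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cluster_init(samples, n_iter=20, var_floor=1e-3):
    """Split 1-D samples into a low (0) and a high (1) class with 2-means."""
    centers = [min(samples), max(samples)]
    if centers[0] == centers[1]:
        raise ValueError("samples must not all be identical")
    for _ in range(n_iter):
        groups = ([], [])
        for x in samples:
            nearer_high = abs(x - centers[0]) > abs(x - centers[1])
            groups[int(nearer_high)].append(x)
        centers = [sum(g) / len(g) for g in groups]
    weights = [len(g) / len(samples) for g in groups]
    variances = [
        max(sum((x - m) ** 2 for x in g) / len(g), var_floor)
        for g, m in zip(groups, centers)
    ]
    return weights, centers, variances


def em_fit(samples, w, mu, kappa, eps=0.1, max_iter=100, var_floor=1e-3):
    """Standard EM for a two-component 1-D GMM, stopped on log-likelihood change."""
    w, mu, kappa = list(w), list(mu), list(kappa)
    prev_ll = -1e4  # large negative starting value, like L'_k in the text
    for _ in range(max_iter):
        # E-step: component posteriors and total log-likelihood.
        post, ll = [], 0.0
        for x in samples:
            lik = [w[h] * gaussian_pdf(x, mu[h], kappa[h]) for h in (0, 1)]
            total = max(lik[0] + lik[1], 1e-300)
            post.append((lik[0] / total, lik[1] / total))
            ll += math.log(total)
        if abs(ll - prev_ll) < eps:
            break
        prev_ll = ll
        # M-step: re-estimate weights, means, and variances.
        for h in (0, 1):
            n_h = sum(p[h] for p in post)
            w[h] = n_h / len(samples)
            mu[h] = sum(p[h] * x for p, x in zip(post, samples)) / n_h
            var = sum(p[h] * (x - mu[h]) ** 2 for p, x in zip(post, samples)) / n_h
            kappa[h] = max(var, var_floor)
    return w, mu, kappa


if __name__ == "__main__":
    buffered = [9.0, 10.0, 11.0, 10.0, 12.0, 8.0, 49.0, 50.0, 51.0, 52.0, 48.0]
    w, mu, kappa = em_fit(buffered, *cluster_init(buffered))
    print("weights:", w)
    print("means (noise, speech):", mu)
```

The sketch does not guard against a component collapsing to zero weight; the constraints in the patent presumably serve that purpose, among others.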
The noise power spectrum estimation performance of the algorithm of the above embodiment was evaluated. Speech from the TIMIT database, 8 sentences each from male and female speakers, was mixed with white Gaussian noise, F16 cockpit noise, and babble noise from the NOISEX-92 noise database at signal-to-noise ratios such as 0, 5, and 10 dB. The evaluation metric is the segmental error, defined as:
$$\mathrm{SegError} = \frac{1}{M} \sum_{l=1}^{M} \left\{ 10 \log_{10} \frac{\sum_{k=0}^{N-1} D^2(k,l)}{\sum_{k=0}^{N-1} \left[ D(k,l) - \hat{D}(k,l) \right]^2} \right\}$$
where $D(k,l)$ is the actual noise magnitude spectrum and $\hat{D}(k,l)$ is the estimated noise magnitude spectrum. Note that the smaller the SegError value, the closer the estimate is to the true value and the more accurate the estimation. The algorithm is compared with three mainstream noise power spectrum estimation algorithms: MS denotes the Minimum Statistics algorithm, MCRA the Minima Controlled Recursive Averaging algorithm, IMCRA the Improved Minima Controlled Recursive Averaging algorithm, and TV-GMM the algorithm of the present invention. Table 1 lists the SegError results.
Table 1
Figure GSA00000122082900087
As the table shows, the algorithm proposed by the invention has a clear advantage over the three current mainstream algorithms.
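For reference, the SegError formula can be evaluated as in the sketch below. It follows the formula exactly as printed in the definition above (frame-averaged 10·log10 of true noise energy over squared estimation error); the data layout (one list of per-bin magnitudes per frame), the function name, and the toy values are assumptions, and nothing here reproduces the values of Table 1.

```python
import math


def seg_error(true_spec, est_spec):
    """Frame-averaged SegError following the printed formula.

    true_spec, est_spec: lists of frames, each frame a list of noise
    magnitudes D(k, l) and estimates D_hat(k, l) over the frequency bins k.
    A frame with a perfect estimate has zero error energy and is undefined.
    """
    if len(true_spec) != len(est_spec) or not true_spec:
        raise ValueError("need the same non-zero number of frames")
    total = 0.0
    for d, d_hat in zip(true_spec, est_spec):
        energy = sum(v * v for v in d)
        error = sum((v - e) ** 2 for v, e in zip(d, d_hat))
        total += 10.0 * math.log10(energy / error)
    return total / len(true_spec)


if __name__ == "__main__":
    # Toy example: two frames, two bins, estimates 10% below the true values.
    true_spec = [[1.0, 1.0], [2.0, 2.0]]
    est_spec = [[0.9, 0.9], [1.8, 1.8]]
    print(seg_error(true_spec, est_spec))
```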

Claims (2)

1. A method for noise power spectrum estimation and voice activity detection based on unsupervised learning, comprising the following steps:
1) for the log-magnitude feature of the speech signal at each frequency bin, establishing a GMM, expressed as:
$$p(x_{i,k} \mid \lambda_{i,k}) = w_{i,k}^{(0)}\, p(x_{i,k} \mid h=0, \lambda_{i,k}) + w_{i,k}^{(1)}\, p(x_{i,k} \mid h=1, \lambda_{i,k});$$
where each Gaussian component is:
$$p(x_{i,k} \mid h, \lambda_{i,k}) = \frac{1}{\sqrt{2\pi \kappa_{i,k}^{(h)}}} \exp\left\{ -\frac{\left(x_{i,k} - \mu_{i,k}^{(h)}\right)^2}{2\kappa_{i,k}^{(h)}} \right\},$$
where $x_{i,k}$ is the log-magnitude spectrum at the k-th frequency bin of the i-th frame; $h \in \{0,1\}$ indexes the Gaussian component; $w_{i,k}^{(h)}$ is the weight coefficient of the GMM; $\mu_{i,k}^{(h)}$ and $\kappa_{i,k}^{(h)}$ are the mean and variance respectively, where h=1 denotes the speech component and h=0 the noise component; and $\lambda_{i,k} = \{w_{i,k}^{(h)}, \mu_{i,k}^{(h)}, \kappa_{i,k}^{(h)}\}$ is the parameter set of the Gaussian mixture model;
2) for a segment of speech data, setting up an M-frame buffer, storing the first M frames of the input signal in the buffer, extracting the log-magnitude spectra of the M buffered frames, and using them to initialize the GMM of step 1), giving the initialized model $\lambda_{0,k}$, wherein initialization uses a constrained EM algorithm;
3) after the initialized model $\lambda_{0,k}$ is obtained, starting from frame M+1, updating the GMM of each frequency band frame by frame by incremental learning, successively obtaining $\lambda_{i,k}$ by recursion, and from it the noise estimate $\mu_{i,k}^{(0)}$ and the probability that speech is present at the k-th frequency bin of the i-th frame:
$$p(h=1 \mid x_{i,k}, \lambda_{i,k}) = \frac{w_{i,k}^{(1)}\, p(x_{i,k} \mid h=1, \lambda_{i,k})}{w_{i,k}^{(0)}\, p(x_{i,k} \mid h=0, \lambda_{i,k}) + w_{i,k}^{(1)}\, p(x_{i,k} \mid h=1, \lambda_{i,k})},$$
where i = 1, 2, 3, ...
2. The method for noise power spectrum estimation and voice activity detection according to claim 1, characterized in that the incremental learning of the GMM comprises recursive weight coefficients, recursive means, and recursive variances;
The recursive weight coefficients are:
$$w_{i+1,k}^{(h)} = \alpha\, w_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k});$$
The recursive means are:
$$\mu_{i+1,k}^{(h)} = \frac{\alpha\, w_{i,k}^{(h)} \mu_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k})\, x_{i+1,k}}{w_{i+1,k}^{(h)}};$$
or
$$\mu_{i+1,k}^{(h)} = \alpha_{\mu}\, \mu_{i,k}^{(h)} + (1-\alpha_{\mu})\, p(h \mid x_{i+1,k}, \lambda_{i,k})\, x_{i+1,k};$$
The recursive variances are:
$$\kappa_{i+1,k}^{(h)} = \frac{\alpha\, w_{i,k}^{(h)} \kappa_{i,k}^{(h)} + (1-\alpha)\, p(h \mid x_{i+1,k}, \lambda_{i,k}) \left(x_{i+1,k} - \mu_{i+1,k}^{(h)}\right)^2}{w_{i+1,k}^{(h)}};$$
or
$$\kappa_{i+1,k}^{(h)} = \alpha_{\kappa}\, \kappa_{i,k}^{(h)} + (1-\alpha_{\kappa})\, p(h \mid x_{i+1,k}, \lambda_{i,k}) \left(x_{i+1,k} - \mu_{i+1,k}^{(h)}\right)^2;$$
or
$$\kappa_{i+1,k}^{(h)} = \alpha_{\kappa}\, \kappa_{i,k}^{(h)} + (1-\alpha_{\kappa})\, p(h \mid x_{i+1,k}, \lambda_{i,k}) \left(x_{i+1,k} - \mu_{i,k}^{(h)}\right)^2;$$
where $\alpha$, $\alpha_{\mu}$ and $\alpha_{\kappa}$ are smoothing factors.
CN2010101781664A 2010-05-14 2010-05-14 Noise spectrum estimation and voice activity detection method based on unsupervised learning Expired - Fee Related CN101853661B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101781664A CN101853661B (en) 2010-05-14 2010-05-14 Noise spectrum estimation and voice activity detection method based on unsupervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010101781664A CN101853661B (en) 2010-05-14 2010-05-14 Noise spectrum estimation and voice activity detection method based on unsupervised learning

Publications (2)

Publication Number Publication Date
CN101853661A (en) 2010-10-06
CN101853661B (en) 2012-05-30

Family

ID=42805116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101781664A Expired - Fee Related CN101853661B (en) 2010-05-14 2010-05-14 Noise spectrum estimation and voice activity detection method based on unsupervised learning

Country Status (1)

Country Link
CN (1) CN101853661B (en)

Also Published As

Publication number Publication date
CN101853661B (en) 2012-05-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120530
