CN101807397B - Voice detection method of noise robustness based on hidden semi-Markov model - Google Patents
- Publication number
- CN101807397B
- Authority
- CN
- China
- Prior art keywords
- sigma
- noise
- markov model
- parameter
- likelihood ratio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a noise-robust voice detection method based on a hidden semi-Markov model, comprising the following steps: (1) building the hidden semi-Markov model λ = (A, B, π, τ); (2) initializing the parameters π and τ of the model λ; (3) applying the DCT to each non-empty input signal; (4) estimating the parameters of B from the first several frames of the input signal and the likelihood ratio test threshold from the likelihood ratio, then performing the likelihood ratio test to complete the voice detection; and (5) dynamically adjusting the parameters of B and the likelihood ratio test threshold. The method dynamically adjusts the model parameters and the test threshold according to the temporal duration characteristics of speech and noise, and uses the likelihood ratio test to achieve real-time, noise-robust voice detection.
Description
Field of the Invention
The present invention belongs to the field of speech signal processing in noisy environments and relates to a noise-robust voice detection method based on a hidden semi-Markov model.
Background of the Invention
Voice detection separates the speech portions of a signal from the noise portions and is widely used in speech coding, transmission, speech enhancement, and speech recognition. Methods based on statistical models already achieve fairly good detection results, but their performance fluctuates considerably across noise types and signal-to-noise ratios (SNRs). In practical applications noise is varied and unavoidable, so noise robustness has become a focus of current voice detection research. A robust voice detection algorithm that adapts to different noise environments is therefore valuable for speech coding, enhancement, recognition, and related applications.
Summary of the Invention
Technical problem to be solved: conventional voice detection lacks robustness in noisy environments. The invention provides a noise-robust voice detection method, based on a hidden semi-Markov model, that works across different SNRs and noise environments.
Technical solution: a noise-robust voice detection method based on a hidden semi-Markov model, characterized by the following steps:
(1) Build a hidden semi-Markov model $\lambda = (A, B, \pi, \tau)$ with two states $Q = \{q_0, q_1\}$, speech and non-speech, where:
$q_0$ is non-speech and $q_1$ is speech;
$A = \{a_{ij}\}$, $i, j = 0, 1$, is the transition probability from state $q_i$ to state $q_j$;
$B = \{b_i(O_t)\}$, $i = 0, 1$, $t > 0$, is the conditional probability $b_i(O_t) = P(O_t \mid q_i)$ of the input-signal DCT coefficients $O_t = \{o_1, o_2, \ldots, o_K\}$, $K > 0$, given state $q_i$, where $o_1, o_2, \ldots, o_K$ are mutually independent;
$\pi = \{\pi_i\}$, $i = 0, 1$, $\pi_i > 0$, is the prior probability of state $q_i$;
$\tau = \{P(d \mid q_i)\}$, $i = 0, 1$, $d > 0$, is the probability that state $q_i$ lasts for duration $d$;
(2) Using training-set statistics, initialize the prior state probabilities $\pi = \{\pi_i\}$ of the hidden semi-Markov model and the parameters $(k_i, \omega_i)$ of the Weibull state-duration distribution; set the frame index $t = 0$;
(3) If the input speech signal $S$ is empty, stop; otherwise apply the DCT to $S$ and set $t = t + 1$;
(4) If $t < P$, judge the current frame to be noise, $\mathrm{VAD} = 0$, and go to (3). If $t = P$, estimate the Gaussian parameters $(\mu_i^G, \sigma_i)$ and the Laplace parameters $(\mu_i^L, l_i)$ of the distribution of the DCT coefficients $O_t$ given the state, compute the likelihood ratio $\mathrm{LRT}_t$ of the first $P$ frames, initialize the likelihood ratio test threshold $\eta$, judge the current frame to be noise, $\mathrm{VAD} = 0$, and go to (3). If $t > P$, compute the likelihood ratio $\mathrm{LRT}_t$; if $\mathrm{LRT}_t \ge \eta$, judge the current frame to be speech, $\mathrm{VAD} = 1$; if $\mathrm{LRT}_t < \eta$, judge it to be noise, $\mathrm{VAD} = 0$; then go to (5);
(5) Adjust the Gaussian parameters $(\mu_i^G, \sigma_i)$ and the Laplace parameters $(\mu_i^L, l_i)$ of the distribution of the DCT coefficients $O_t$ given the state, update the likelihood ratio test threshold $\eta$, and go to (3).
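The control flow of steps (3) to (5) can be sketched as follows. This is a minimal illustration, not the patented implementation: `lrt` and `init_eta` are hypothetical callables standing in for the likelihood-ratio and threshold-initialization formulas, which this text does not reproduce; the frames are assumed to be DCT coefficient vectors already; and the adaptation of the Gaussian and Laplace parameters in step (5) is left out.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]

def run_vad(frames: Sequence[Frame],
            P: int,
            lrt: Callable[[Frame], float],
            init_eta: Callable[[Sequence[Frame]], float],
            rho0: float = 0.99,
            rho1: float = 0.79) -> List[int]:
    """Return one VAD decision (0 = noise, 1 = speech) per frame.

    lrt and init_eta are placeholders (assumptions) for the likelihood-ratio
    and threshold-initialization formulas that are not given in the text.
    """
    decisions: List[int] = []
    eta = None
    for t, frame in enumerate(frames, start=1):   # step (3): t = t + 1
        if t < P:
            decisions.append(0)                    # warm-up frames count as noise
        elif t == P:
            eta = init_eta(frames[:P])             # initialize the threshold
            decisions.append(0)
        else:
            score = lrt(frame)                     # step (4): likelihood ratio test
            vad = 1 if score >= eta else 0
            decisions.append(vad)
            rho = rho1 if vad else rho0            # step (5): threshold update
            eta = rho * eta + (1.0 - rho) * score
    return decisions
```

The first P frames are always labeled noise, so the sketch, like the method itself, assumes that the signal starts with a noise-only segment.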
According to a further aspect of the invention, step (1) further comprises:
determining from training-set statistics that:
(1) $a_{00} = a_{11} = 0$, $a_{10} = a_{01} = 1$;
(2) for $q_0$, $b_0(o_i^t)$ is a Gaussian distribution [equation missing in source];
(3) for $q_1$, $b_1(o_i^t)$ is a Laplace distribution [equation missing in source];
(4) for $q_0$ and $q_1$, $P(d \mid q_i)$ is a Weibull distribution [equation missing in source].
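The three distribution families named above can be written out in a short sketch. The parameterizations are assumptions, since the source equations are missing: `sigma` is taken as the Gaussian standard deviation, `scale` as the Laplace scale (the role of $l_i$), and `k` and `omega` as the Weibull shape and scale. Because the coefficients $o_1, \ldots, o_K$ are stated to be independent, the log of $b_i(O_t)$ is a sum over coefficients.

```python
import math
from typing import Callable, Sequence, Tuple

def gauss_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log-density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma * sigma) - (x - mu) ** 2 / (2.0 * sigma * sigma)

def laplace_logpdf(x: float, mu: float, scale: float) -> float:
    """Log-density of a Laplace distribution with location mu and scale `scale`."""
    return -math.log(2.0 * scale) - abs(x - mu) / scale

def weibull_pdf(d: float, k: float, omega: float) -> float:
    """Density of a Weibull distribution with shape k and scale omega (assumed roles)."""
    if d <= 0:
        return 0.0
    z = d / omega
    return (k / omega) * z ** (k - 1.0) * math.exp(-(z ** k))

def frame_loglik(coeffs: Sequence[float],
                 logpdf: Callable[[float, float, float], float],
                 params: Sequence[Tuple[float, float]]) -> float:
    """log b_i(O_t): sum of per-coefficient log-densities (independence assumption)."""
    return sum(logpdf(o, p0, p1) for o, (p0, p1) in zip(coeffs, params))
```

Working in the log domain avoids the numerical underflow that a product of K small densities would cause.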
According to a further aspect of the invention, step (2) further comprises:
(a) from the labeled data in the training set, computing the noise-duration frequencies $F_0$ and the speech-duration frequencies $F_1$;
(b) from $F_i$, obtaining the maximum likelihood estimate of the parameters $(k_i, \omega_i)$ of $W(d; k_i, \omega_i)$;
(c) computing the prior probabilities of the states in the hidden semi-Markov model [equation missing in source].
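Step (b) can be illustrated with a standard maximum likelihood fit of a two-parameter Weibull distribution. This is a sketch under assumptions: it works on a list of raw durations rather than on the frequency tables $F_i$, it treats $k$ as the shape and $\omega$ as the scale, and it solves the usual likelihood equation for $k$ by bisection. The function name `fit_weibull` is illustrative and does not come from the source.

```python
import math
from typing import Sequence, Tuple

def fit_weibull(durations: Sequence[float], tol: float = 1e-9) -> Tuple[float, float]:
    """Maximum likelihood estimate (k, omega) for positive, non-identical durations."""
    top = max(durations)
    y = [d / top for d in durations]           # rescale to (0, 1] so y**k cannot overflow
    mean_log = sum(math.log(v) for v in y) / len(y)

    def g(k: float) -> float:
        # Likelihood equation for the shape: the root of g is the MLE of k.
        w = [v ** k for v in y]
        return sum(wi * math.log(v) for wi, v in zip(w, y)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 100.0                       # g is increasing in k; bracket the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    omega = top * (sum(v ** k for v in y) / len(y)) ** (1.0 / k)
    return k, omega
```

Calling `fit_weibull` once on the noise durations and once on the speech durations would give $(k_0, \omega_0)$ and $(k_1, \omega_1)$.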
According to a further aspect of the invention, step (4) further comprises:
(a) computing the forward variables $\alpha_i^t$, $i = 0, 1$:
if $t = 1$, [equation missing in source];
if $t > 1$, [equation missing in source];
(b) computing the likelihood ratio [equation missing in source];
(c) when $t = P$, estimating from the DCT coefficients $O_t$ of the first $P$ frames of the input signal, where $P > 0$ and $1 \le t \le P$, the parameters $(\mu_i^G, \sigma_i)$ and $(\mu_i^L, l_i)$ of the distributions $B$ as:
[equations missing in source]
where $P$ and $R$ are constants;
(d) when $t = P$, estimating from the DCT coefficients $O_t$ of the first $P$ frames of the input signal, where $P > 0$ and $1 \le t \le P$, the likelihood ratio test threshold as [equation missing in source].
According to a further aspect of the invention, step (5) further comprises:
(a) if the current frame is judged to be noise, adjusting the parameters $(\mu_i^G, \sigma_i)$ and the threshold $\eta$:
$\eta = \rho_0 \eta + (1 - \rho_0)\,\mathrm{LRT}_t$
otherwise adjusting the parameters $(\mu_i^L, l_i)$ and the threshold $\eta$:
$\eta = \rho_1 \eta + (1 - \rho_1)\,\mathrm{LRT}_t$
where $0 < \rho_0, \rho_1 < 1$ are update constants.
Brief Description of the Drawings
Fig. 1 is the basic flowchart of the method of the invention.
Detailed Description of the Embodiments
Embodiments of the invention are described in detail below with reference to the accompanying drawing.
The principle of the invention is described first.
Human speech is produced when the vocal cords vibrate under an external force and the resulting sound is shaped by a series of resonating organs. The whole voicing process can therefore be regarded as a life cycle. Constrained by the characteristics of the human organs, this life cycle can be assumed to follow a certain statistical law. That law is generally noise-robust: human voicing can be regarded as unaffected by noise in the environment. Describing this statistical law accurately therefore makes speech-activity modeling in noisy environments more realistic and improves the noise robustness of voice detection. In engineering, the Birnbaum-Saunders and Weibull distributions are commonly used to describe life cycles.
Specifically, the basic flow of the proposed method is shown in Fig. 1.
The core ideas of the invention are: build a hidden semi-Markov model of the input speech signal; use a training set to verify the types of distributions involved in the model, and estimate the model parameters from that set and from the first several frames of the input signal; perform voice detection by a likelihood ratio test; and then dynamically update the model parameters and the likelihood ratio test threshold.
The algorithm of the invention is described as follows:
1. Build a hidden semi-Markov model $\lambda = (A, B, \pi, \tau)$ with two states $Q = \{q_0, q_1\}$, speech and non-speech, where: $q_0$ is non-speech and $q_1$ is speech;
$A = \{a_{ij}\}$, $i, j = 0, 1$, is the transition probability from state $q_i$ to state $q_j$;
$B = \{b_i(O_t)\}$, $i = 0, 1$, $t > 0$, is the conditional probability $b_i(O_t) = P(O_t \mid q_i)$ of the input-signal DCT coefficients $O_t = \{o_1, o_2, \ldots, o_K\}$ given state $q_i$, where $o_1, o_2, \ldots, o_K$ are mutually independent;
$\pi = \{\pi_i\}$, $i = 0, 1$, $\pi_i > 0$, is the prior probability of state $q_i$;
$\tau = \{P(d \mid q_i)\}$, $i = 0, 1$, $d > 0$, is the probability that state $q_i$ lasts for duration $d$;
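The observation vector $O_t$ defined above is the DCT of one signal frame. The text does not say which DCT variant or normalization is used, so the sketch below assumes an unnormalized DCT-II, computed directly from its definition, purely to show what the $K$ coefficients are.

```python
import math
from typing import List, Sequence

def dct2(frame: Sequence[float]) -> List[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k).

    The DCT type and scaling are assumptions; the source only says "DCT".
    """
    n = len(frame)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(frame))
        for k in range(n)
    ]
```

This direct form costs O(N^2) per frame, which is fine for illustration; a real-time detector would normally use a fast transform from a signal-processing library.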
Statistics on the TIMIT training set show that the distributions involved in the model are as follows:
(1) $a_{00} = a_{11} = 0$, $a_{10} = a_{01} = 1$;
(2) for $q_0$, $b_0(o_i^t)$ is a Gaussian distribution [equation missing in source];
(3) for $q_1$, $b_1(o_i^t)$ is a Laplace distribution [equation missing in source];
(4) for $q_0$ and $q_1$, $P(d \mid q_i)$ is a Weibull distribution [equation missing in source].
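With $a_{00} = a_{11} = 0$ and $a_{01} = a_{10} = 1$, the model never re-enters the state it is leaving: each state lasts for a Weibull-distributed duration and then hands over to the other. A small generative sketch makes this concrete. The function name, the rounding of durations up to whole frames, and the reading of $k$ as shape and $\omega$ as scale are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

def sample_states(n_frames: int,
                  weibull: Dict[int, Tuple[float, float]],
                  pi0: float = 0.5,
                  seed: Optional[int] = None) -> List[int]:
    """Draw a frame-level state sequence (0 = non-speech, 1 = speech).

    weibull maps each state to (k, omega), read here as (shape, scale).
    """
    rng = random.Random(seed)
    state = 0 if rng.random() < pi0 else 1     # initial state from the prior pi
    states: List[int] = []
    while len(states) < n_frames:
        k, omega = weibull[state]
        # random.weibullvariate(alpha, beta) takes alpha = scale, beta = shape.
        d = max(1, math.ceil(rng.weibullvariate(omega, k)))
        states.extend([state] * d)
        state = 1 - state                      # a01 = a10 = 1: always switch state
    return states[:n_frames]
```

For example, `sample_states(500, {0: (1.5, 20.0), 1: (2.0, 30.0)}, seed=1)` yields alternating runs of noise and speech frames whose lengths follow the two duration distributions.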
2. Using training-set statistics, initialize the prior state probabilities $\pi = \{\pi_i\}$ of the hidden semi-Markov model and the parameters $(k_i, \omega_i)$ of the state-duration distribution; set the frame index $t = 0$. The method is as follows:
(a) from the labeled data in the training set, compute the noise-duration frequencies $F_0$ and the speech-duration frequencies $F_1$;
(b) from $F_i$, obtain the maximum likelihood estimate of the parameters $(k_i, \omega_i)$ of $W(d; k_i, \omega_i)$;
4. If $t < P$, judge the current frame to be noise, $\mathrm{VAD} = 0$, and go to (3). If $t = P$, estimate the parameters $(\mu_i^G, \sigma_i)$ and $(\mu_i^L, l_i)$ of the distribution of the DCT coefficients $O_t$ given the state, compute the likelihood ratio $\mathrm{LRT}_t$ of the first $P$ frames, initialize the likelihood ratio test threshold $\eta$, judge the current frame to be noise, $\mathrm{VAD} = 0$, and go to (3). If $t > P$, compute the likelihood ratio $\mathrm{LRT}_t$; if $\mathrm{LRT}_t \ge \eta$, judge the current frame to be speech, $\mathrm{VAD} = 1$; if $\mathrm{LRT}_t < \eta$, judge it to be noise, $\mathrm{VAD} = 0$; then go to (5). The method is as follows:
(a) compute the forward variables $\alpha_i^t$, $i = 0, 1$:
if $t = 1$, [equation missing in source];
if $t > 1$, [equation missing in source];
(b) compute the likelihood ratio [equation missing in source];
(c) when $t = P$, estimate from the DCT coefficients $O_t$ of the first $P$ frames of the input signal, where $P > 0$ and $1 \le t \le P$, the parameters $(\mu_i^G, \sigma_i)$ and $(\mu_i^L, l_i)$ of the distributions $B$ as:
[equations missing in source]
where $P$ and $R$ are constants;
(d) when $t = P$, estimate from the DCT coefficients $O_t$ of the first $P$ frames of the input signal, where $P > 0$ and $1 \le t \le P$, the likelihood ratio test threshold as [equation missing in source].
5. Adjust the parameters $(\mu_i^G, \sigma_i)$ and $(\mu_i^L, l_i)$ of the distribution of the DCT coefficients $O_t$ given the state, update the likelihood ratio test threshold $\eta$, and go to (3). The method is as follows:
(a) if the current frame is judged to be noise, adjust the parameters $(\mu_i^G, \sigma_i)$ and the threshold $\eta$:
$\eta = \rho_0 \eta + (1 - \rho_0)\,\mathrm{LRT}_t$
otherwise adjust the parameters $(\mu_i^L, l_i)$ and the threshold $\eta$:
$\eta = \rho_1 \eta + (1 - \rho_1)\,\mathrm{LRT}_t$
where $\rho_0$ and $\rho_1$ are constants;
In the voice detection experiment on the NOIZEUS data set, the constants are $P = 15$, $R = 20$, $\rho_0 = 0.99$, $\rho_1 = 0.79$;
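The two update constants imply very different adaptation speeds. Unrolling $\eta \leftarrow \rho\eta + (1 - \rho)\,\mathrm{LRT}_t$ shows that after $n$ updates the old threshold keeps a weight of $\rho^n$. The sketch below computes how many consecutive updates it takes for that weight to fall to one half or less; the helper name `half_life` is illustrative and not part of the source.

```python
import math

def half_life(rho: float) -> int:
    """Smallest n with rho**n <= 0.5, for an update constant 0 < rho < 1."""
    return math.ceil(math.log(0.5) / math.log(rho))

noise_frames = half_life(0.99)    # threshold memory after noise decisions
speech_frames = half_life(0.79)   # threshold memory after speech decisions
```

With the experimental constants this gives 69 frames for $\rho_0 = 0.99$ and 3 frames for $\rho_1 = 0.79$, so the threshold drifts slowly while frames are classified as noise and moves quickly once frames are classified as speech.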
The experimental data are shown in the table below:
[table missing in source]
It can be seen that the invention achieves nearly consistent results across multiple noise environments and in most cases outperforms the international standards G.729B and AMR2.
In summary, the above method detects the speech frames and noise frames of an input signal in a noisy environment.
Other advantages and modifications will readily occur to those of ordinary skill in the art. The invention in its broader aspects is therefore not limited to the specific details and exemplary embodiments shown and described here. Accordingly, various modifications may be made without departing from the spirit and scope of the general inventive concept as defined by the appended claims and their equivalents.
Claims (5)
1. A noise-robust voice detection method based on a hidden semi-Markov model, characterized by the following steps:
(1) building a hidden semi-Markov model $\lambda = (A, B, \pi, \tau)$ with two states $Q = \{q_0, q_1\}$, speech and non-speech, where:
$q_0$ is non-speech and $q_1$ is speech;
$A = \{a_{ij}\}$, where $a_{ij}$ is the transition probability from state $q_i$ to state $q_j$; $i = 0, 1$; $j = 0, 1$;
$B = \{b_i(O_t)\}$, $i = 0, 1$, $t > 0$, is the conditional probability $b_i(O_t) = P(O_t \mid q_i)$ of the input-signal DCT coefficients $O_t = \{o_1, o_2, \ldots, o_K\}$, $K > 0$, given state $q_i$, where $o_1, o_2, \ldots, o_K$ are mutually independent;
$\pi = \{\pi_i\}$, $i = 0, 1$, $\pi_i > 0$, is the prior probability of state $q_i$;
$\tau = \{P(d \mid q_i)\}$, $i = 0, 1$, $d > 0$, is the probability that state $q_i$ lasts for duration $d$;
(2) using training-set statistics, initializing the prior state probabilities $\pi = \{\pi_i\}$ of the hidden semi-Markov model and the parameters $(k_i, \omega_i)$ of the Weibull state-duration distribution, and setting the frame index $t = 0$;
(4) if $t < P$, judging the current frame to be noise, $\mathrm{VAD} = 0$, and going to (3); if $t = P$, estimating the Gaussian parameters $(\mu_i^G, \sigma_i)$ and the Laplace parameters $(\mu_i^L, l_i)$ of the distribution of the DCT coefficients $O_t$ given the state, computing the likelihood ratio $\mathrm{LRT}_t$ of the first $P$ frames, initializing the likelihood ratio test threshold $\eta$, judging the current frame to be noise, $\mathrm{VAD} = 0$, and going to (3); if $t > P$, computing the likelihood ratio $\mathrm{LRT}_t$; if $\mathrm{LRT}_t \ge \eta$, judging the current frame to be speech, $\mathrm{VAD} = 1$; if $\mathrm{LRT}_t < \eta$, judging it to be noise, $\mathrm{VAD} = 0$; then going to (5);
2. The noise-robust voice detection method based on a hidden semi-Markov model according to claim 1, characterized in that step (1) further comprises:
determining from training-set statistics that:
(1.1) $a_{00} = a_{11} = 0$, $a_{10} = a_{01} = 1$;
(1.3) for $q_1$, $b_1(o_i^t)$ is a Laplace distribution [equation missing in source];
(1.4) for $q_0$ and $q_1$, $P(d \mid q_i)$ is a Weibull distribution [equation missing in source].
3. The noise-robust voice detection method based on a hidden semi-Markov model according to claim 1, characterized in that step (2) further comprises:
(2.1) from the labeled data in the training set, computing the noise-duration frequencies $F_0$ and the speech-duration frequencies $F_1$;
(2.2) from $F_i$, obtaining the maximum likelihood estimate of the parameters $(k_i, \omega_i)$ of $W(d; k_i, \omega_i)$;
4. The noise-robust voice detection method based on a hidden semi-Markov model according to claim 1, characterized in that step (4) further comprises:
if $t = 1$, [equation missing in source];
if $t > 1$, [equation missing in source];
(4.2) computing the likelihood ratio [equation missing in source];
(4.3) when $t = P$, estimating from the DCT coefficients $O_t$ of the first $P$ frames of the input signal, where $P > 0$ and $1 \le t \le P$, the parameters $(\mu_i^G, \sigma_i)$ and $(\mu_i^L, l_i)$ of the distributions $B$ as:
[equations missing in source]
where $P$ and $R$ are constants;
5. The noise-robust voice detection method based on a hidden semi-Markov model according to claim 1, characterized in that step (5) further comprises:
(5.1) if the current frame is judged to be noise, adjusting the parameters $(\mu_i^G, \sigma_i)$ and the threshold $\eta$:
$\eta = \rho_0 \eta + (1 - \rho_0)\,\mathrm{LRT}_t$
otherwise adjusting the parameters $(\mu_i^L, l_i)$ and the threshold $\eta$:
$\eta = \rho_1 \eta + (1 - \rho_1)\,\mathrm{LRT}_t$
where $0 < \rho_0, \rho_1 < 1$ are update constants.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010101175378A CN101807397B (en) | 2010-03-03 | 2010-03-03 | Voice detection method of noise robustness based on hidden semi-Markov model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010101175378A CN101807397B (en) | 2010-03-03 | 2010-03-03 | Voice detection method of noise robustness based on hidden semi-Markov model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101807397A CN101807397A (en) | 2010-08-18 |
CN101807397B true CN101807397B (en) | 2011-11-16 |
Family
ID=42609166
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010101175378A Expired - Fee Related CN101807397B (en) | 2010-03-03 | 2010-03-03 | Voice detection method of noise robustness based on hidden semi-Markov model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101807397B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103021405A (en) * | 2012-12-05 | 2013-04-03 | 渤海大学 | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter |
CN103730124A (en) * | 2013-12-31 | 2014-04-16 | 上海交通大学无锡研究院 | Noise robustness endpoint detection method based on likelihood ratio test |
CN106599920A (en) * | 2016-12-14 | 2017-04-26 | 中国航空工业集团公司上海航空测控技术研究所 | Aircraft bearing fault diagnosis method based on coupled hidden semi-Markov model |
CN109856977A (en) * | 2019-03-13 | 2019-06-07 | 济南大学 | A kind of Design of Feedback Controller method of the Markov jump system with noise and time lag |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3888543B2 (en) * | 2000-07-13 | 2007-03-07 | 旭化成株式会社 | Speech recognition apparatus and speech recognition method |
CN1320372C (en) * | 2004-11-25 | 2007-06-06 | 上海交通大学 | Method for testing and identifying underwater sound noise based on small wave area |
JP4241771B2 (en) * | 2006-07-04 | 2009-03-18 | 株式会社東芝 | Speech recognition apparatus and method |
CN101030369B (en) * | 2007-03-30 | 2011-06-29 | 清华大学 | Built-in speech discriminating method based on sub-word hidden Markov model |
Also Published As
Publication number | Publication date |
---|---|
CN101807397A (en) | 2010-08-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107564513B (en) | Voice recognition method and device | |
CN102238190B (en) | Identity authentication method and system | |
WO2020181824A1 (en) | Voiceprint recognition method, apparatus and device, and computer-readable storage medium | |
CN110706692B (en) | Training method and system of child voice recognition model | |
JP2020524308A (en) | Method, apparatus, computer device, program and storage medium for constructing voiceprint model | |
JP4765461B2 (en) | Noise suppression system, method and program | |
CN108538293B (en) | Voice awakening method and device and intelligent device | |
JP6464650B2 (en) | Audio processing apparatus, audio processing method, and program | |
CN106486131A (en) | A kind of method and device of speech de-noising | |
CN105161092A (en) | Speech recognition method and device | |
CN105304080A (en) | Speech synthesis device and speech synthesis method | |
JP5842056B2 (en) | Noise estimation device, noise estimation method, noise estimation program, and recording medium | |
CN101807397B (en) | Voice detection method of noise robustness based on hidden semi-Markov model | |
WO2018051945A1 (en) | Speech processing device, speech processing method, and recording medium | |
CN109616139A (en) | Pronunciation signal noise power spectral density estimation method and device | |
CN106611598A (en) | VAD dynamic parameter adjusting method and device | |
CN109616105A (en) | A kind of noisy speech recognition methods based on transfer learning | |
CN105023570A (en) | method and system of transforming speech | |
CN105023574A (en) | Method and system of enhancing TTS | |
Takamichi et al. | Sampling-based speech parameter generation using moment-matching networks | |
CN105895104B (en) | Speaker adaptation recognition methods and system | |
CN107274892A (en) | Method for distinguishing speek person and device | |
CN110268471A (en) | The method and apparatus of ASR with embedded noise reduction | |
CN103366737B (en) | The apparatus and method of tone feature are applied in automatic speech recognition | |
Sharma et al. | Automatic speech recognition systems: challenges and recent implementation trends |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | | Granted publication date: 20111116; Termination date: 20170303 |