CN100573663C - Mute detection method based on speech feature judgment - Google Patents
Mute detection method based on speech feature judgment Download PDF Info
- Publication number
- CN100573663C CN100573663C CNB2006100396964A CN200610039696A CN100573663C CN 100573663 C CN100573663 C CN 100573663C CN B2006100396964 A CNB2006100396964 A CN B2006100396964A CN 200610039696 A CN200610039696 A CN 200610039696A CN 100573663 C CN100573663 C CN 100573663C
- Authority
- CN
- China
- Prior art keywords
- silence
- zero-crossing rate
- frame
- audio data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a mute (silence) detection method based on speech feature judgment. First, the multi-threshold zero-crossing rate of a frame of audio data is extracted, and a weighted multi-threshold zero-crossing rate is used to pre-judge silence, identifying obviously silent frames. Next, a composite feature of the frame is extracted, comprising the zero-crossing rate, the short-time energy, and Mel-scale cepstrum coefficients based on a multi-resolution spectrum. Finally, a two-class support vector machine discriminates the composite audio feature into two classes: normal speech and silence. The invention improves the silence-detection success rate and can also recognize certain specific human sounds (such as breathing noise) as silence. It is widely applicable to network voice conversation, particularly voice chat and video conferencing, and has broad market prospects.
Description
One, technical field
The present invention relates to audio processing methods, and specifically to a mute detection method based on speech feature judgment for use in network voice conversation.
Two, background technology
During human speech, the sound can be divided into two parts, silence and speech, and on average about 60% of the time is silent. Moreover, when several people converse, essentially only one person speaks at any moment while the others are silent. Silence, together with noise introduced by the voice-capture device (including breathing noise), is transmitted over the network just like speech data, degrading voice quality. Applying silence-suppression techniques to eliminate the silent portions can save more than 50% of the transmission bandwidth and reduce network congestion.
Existing mute detection methods extract feature values from the audio signal and compare them with preset thresholds to judge silence. Traditional methods use parameters such as the short-time zero-crossing rate, short-time energy, and autocorrelation coefficients. However, speech signals and some ambient noise signals are non-stationary, so the recognition rate of such systems is poor; furthermore, because the thresholds are fixed, these methods adapt badly to different noises, and their detection rates are therefore not high.
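The traditional per-frame parameters mentioned above are simple to compute. A minimal sketch (the frame values below are invented for illustration):

```python
# Sketch of two traditional per-frame features used by conventional
# silence detectors: short-time zero-crossing rate and short-time energy.

def short_time_zcr(frame):
    """Number of sign changes between consecutive samples of one frame."""
    return sum(
        1 for a, b in zip(frame, frame[1:])
        if (a >= 0) != (b >= 0)
    )

def short_time_energy(frame):
    """Sum of squared sample amplitudes of one frame."""
    return sum(x * x for x in frame)

# Example: a near-silent frame crosses zero often but carries little energy.
silence = [0.001, -0.001, 0.002, -0.001]
speech = [0.5, 0.6, 0.4, -0.3]
print(short_time_zcr(silence))                                    # 3
print(short_time_energy(speech) > short_time_energy(silence))     # True
```

A fixed threshold on either value is exactly what the text criticizes: it cannot follow non-stationary noise, which motivates the multi-threshold and SVM-based scheme below.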
In addition, with the popularity of network voice conversation, most applications run on the PC platform. For convenience, speakers generally wear a headset, which places the microphone very close to the nose and mouth, so the airflow produced by normal breathing enters the microphone and appears in the audio stream. Although this signal is weak, it is still a kind of human sound, and some common silence detection methods (for example G.729B and G.723.1A) identify part of this breathing noise as normal speech, further reducing the detection rate.
Three, summary of the invention
The purpose of this invention is to provide a mute detection method based on speech feature judgment that improves the silence-detection success rate and can recognize certain specific human sounds (such as breathing noise) as silence.
This purpose is achieved through the following technical solution:
A mute detection method based on speech feature judgment, characterized in that it comprises the following steps:
(1) Extract the multi-threshold zero-crossing rate of a frame of audio data and sum the rates with preferred weights. The multi-threshold zero-crossing-rate detection method sets three thresholds of different heights T_1, T_2, T_3 with T_1 < T_2 < T_3, and for each frame computes by formula (1) the three threshold zero-crossing rates Z_1, Z_2 and Z_3 corresponding to T_1, T_2 and T_3:

Z_i = Σ_n { |sgn[x(n) - T_i] - sgn[x(n-1) - T_i]| + |sgn[x(n) + T_i] - sgn[x(n-1) + T_i]| }   (1)

The total zero-crossing rate Z is expressed by:

Z = W_1·Z_1 + W_2·Z_2 + W_3·Z_3

where W_1, W_2, W_3 are the zero-crossing-rate weights, and Z_0 is defined as the total zero-crossing-rate cut-off value.
(2) Pre-judge silence with the weighted sum of multi-threshold zero-crossing rates: if the total zero-crossing rate Z of a frame of audio data is less than the set threshold Z_0, the frame is judged silent; otherwise the frame is passed to step (3).
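The pre-judgment of steps (1)-(2) can be sketched as follows. This is an illustrative sketch only: the thresholds, weights, cut-off value, and frames below are invented; the patent determines the optimal values by training.

```python
# Sketch of the multi-threshold zero-crossing-rate pre-judgment.

def sgn(v):
    """Sign function as used in formula (1)."""
    return 1 if v >= 0 else -1

def threshold_zcr(frame, t):
    """Formula (1): zero-crossing rate of the frame against threshold t."""
    z = 0
    for prev, cur in zip(frame, frame[1:]):
        z += abs(sgn(cur - t) - sgn(prev - t)) + abs(sgn(cur + t) - sgn(prev + t))
    return z

def is_silent(frame, thresholds, weights, z0):
    """Step (2): total zero-crossing rate Z = sum W_i * Z_i, silent if Z < Z_0."""
    z = sum(w * threshold_zcr(frame, t) for w, t in zip(weights, thresholds))
    return z < z0

# A quiet frame stays inside all three threshold bands, so Z = 0;
# a loud oscillating frame crosses every band and yields a large Z.
quiet = [0.01, -0.01, 0.01, -0.01]
loud = [0.5, -0.5, 0.5, -0.5]
print(is_silent(quiet, (0.05, 0.1, 0.2), (1, 1, 1), 4))   # True
print(is_silent(loud, (0.05, 0.1, 0.2), (1, 1, 1), 4))    # False
```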
(3) Extract the composite feature of a frame of audio data; the composite feature comprises the zero-crossing rate, the short-time energy, and Mel-scale cepstrum coefficients (C_MFCC) based on a multi-resolution spectrum. The computation of the Mel-scale cepstrum coefficients comprises wavelet decomposition and reconstruction, Fourier transform, and a Mel-scale cepstrum extraction module. In the C_MFCC computation, o(l), c(l) and h(l) are respectively the lower, center and upper limiting frequency of the l-th triangular filter.
(4) Discriminate the composite audio feature with a two-class support vector machine, obtaining two classes of results, normal speech and silence. Normal speech is compressed and sent to the receiver; for silence, adaptive noise is added only in some frames, compressed, and sent to the receiver.
The present invention detects speech in stages by extracting multiple speech parameters, effectively pre-judging silence. Audio frames not identified in step (2) are examined by the subsequent steps: in step (3), to obtain the overall spectral feature of the signal, the frame is first subjected to wavelet decomposition, reconstruction, and Fourier transform to form a multi-resolution spectrum, and the Mel-scale cepstrum of this spectrum is extracted as the final audio feature. In step (4) a support vector machine discriminates the composite feature and produces the final decision. Compared with the prior art, the invention classifies audio features with a support vector machine, which rests on a stricter theoretical foundation than traditional classification methods; applied in fields such as text classification and image recognition, this method has achieved better classification results than traditional machine-learning methods, with high classification accuracy and good robustness.
Four, description of drawings
Fig. 1 is a flow diagram of the method of the invention;
Fig. 2 is a schematic diagram of the extraction of the composite audio feature in the invention;
Fig. 3 is the wavelet-decomposition tree structure diagram in the invention.
Five, embodiment
Below in conjunction with accompanying drawing the present invention is elaborated.
A mute detection method based on speech feature judgment according to the invention is shown in Fig. 1. In the specific detection process, a sampling frequency of 8 kHz is adopted, with 80 sample points per frame, i.e. each frame is 10 milliseconds. The method comprises the following steps:
(1) Extract the multi-threshold zero-crossing rate of a frame of audio data and sum the rates with preferred weights. Step (1) uses the total zero-crossing-rate cut-off value Z_0 and the optimal weight vector (W_1, W_2, W_3); their values must be set before silence detection. To determine them, at least 2000 frames of audio data from different environments are collected, half of them silence and half speech. Taking as objective function the silence false-detection rate produced by the multi-threshold zero-crossing-rate detector, every weight vector and threshold in the value range is traversed, and the weight vector and threshold that yield the minimum false-detection rate are selected; these are the optimal weight vector and threshold Z_0.
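The traversal described above amounts to an exhaustive grid search. A sketch under invented assumptions: the candidate grids, the toy zero-crossing function, and the two labeled frames are illustrative placeholders for the patent's 2000-frame training set.

```python
import itertools

def toy_zcr(frame, t):
    """Illustrative threshold zero-crossing count (stand-in for formula (1))."""
    return sum(1 for a, b in zip(frame, frame[1:])
               if (a - t) * (b - t) < 0 or (a + t) * (b + t) < 0)

def false_rate(frames, labels, thresholds, weights, z0, zcr_fn):
    """Fraction of frames whose silence pre-judgment disagrees with its label."""
    errors = 0
    for frame, is_sil in zip(frames, labels):
        z = sum(w * zcr_fn(frame, t) for w, t in zip(weights, thresholds))
        if (z < z0) != is_sil:
            errors += 1
    return errors / len(frames)

def search_optimal(frames, labels, thresholds, weight_grid, z0_grid, zcr_fn):
    """Traverse every weight vector and cut-off value; keep the minimiser."""
    best = None
    for weights in itertools.product(weight_grid, repeat=3):
        for z0 in z0_grid:
            r = false_rate(frames, labels, thresholds, weights, z0, zcr_fn)
            if best is None or r < best[0]:
                best = (r, weights, z0)
    return best  # (min false rate, optimal weights, optimal Z_0)

frames = [[0.01, -0.01, 0.01, -0.01], [0.5, -0.5, 0.5, -0.5]]
labels = [True, False]  # silent, speech
best = search_optimal(frames, labels, (0.05, 0.1, 0.2), (0, 1, 2), (1, 5, 20), toy_zcr)
```

On a realistic training set the grid would be finer and the search correspondingly slower, but the procedure is run only once, offline, before detection starts.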
(2) Pre-judge silence with the weighted sum of multi-threshold zero-crossing rates: if the total zero-crossing rate Z of a frame of audio data is less than the set threshold Z_0, the frame is judged silent; otherwise the frame is passed to step (3).
(3) Extract the composite feature of a frame of audio data; the composite feature comprises the zero-crossing rate, the short-time energy, and Mel-scale cepstrum (MFCC) coefficients based on the multi-resolution spectrum, extracted as shown in Fig. 2. The Daubechies-4 wavelet packet transform decomposes the windowed time-domain speech signal into the coefficients of 6 subbands, and the coefficients of the first wavelet decomposition are reconstructed in each subband, as shown in Fig. 3. Each subband's coefficients are normalized, an FFT is applied to them, and the subband coefficients are summed to form the multi-resolution spectrum, which is finally delivered to the MFCC extraction module. The MFCC feature dimension is L = 12. The inner-product (kernel) function of the support vector machine is a radial basis function (σ² = 0.3); the support vector machine may be trained by, for example, the SMO method, to which the invention is not restricted.
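The RBF inner-product function with σ² = 0.3 and the resulting two-class decision can be sketched as follows. The support vectors, coefficients, bias, and label convention below are invented placeholders; in the patent they would come from SMO training on real audio features.

```python
import math

SIGMA_SQ = 0.3  # kernel width used in the embodiment

def rbf_kernel(x, z, sigma_sq=SIGMA_SQ):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-d2 / (2.0 * sigma_sq))

def svm_decision(x, support_vectors, alphas_y, bias):
    """Two-class SVM decision: sign of sum_i alpha_i*y_i*K(sv_i, x) + b.
    Returns True for 'normal speech', False for 'silence'
    (label convention assumed for this sketch)."""
    s = sum(ay * rbf_kernel(sv, x) for sv, ay in zip(support_vectors, alphas_y))
    return s + bias > 0

# Toy model with two support vectors: one per class.
svs = [[0.0, 0.0], [1.0, 1.0]]   # 'silence' prototype, 'speech' prototype
ay = [-1.0, 1.0]                 # alpha_i * y_i, invented values
print(svm_decision([1.0, 1.0], svs, ay, 0.0))   # True  (near speech prototype)
print(svm_decision([0.0, 0.0], svs, ay, 0.0))   # False (near silence prototype)
```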
(4) Discriminate the composite audio feature with the two-class support vector machine, obtaining two classes of results: one class is normal speech, the other is silence (including breathing noise). Normal speech is compressed with a speech-coding method such as G.729 or G.723 and sent to the network receiver.
In the present invention, for the frames judged silent in steps (2) and (4): in actual use, if no sound at all is transmitted during silence, the listener feels uncomfortable, so some noise must be added artificially so that the listener feels the conversation is not interrupted. The added noise must keep the noise power consistent between sender and receiver. It is not necessary to transmit noise in every silent frame; transmitting only the first frame of a continuous silent segment suffices. How the noise is transmitted is not restricted by the invention.
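A minimal sketch of the power-matching idea for the comfort noise, assuming white Gaussian noise generated at the receiver (the function names and the seeded generator are invented for illustration; the patent does not prescribe the noise type):

```python
import math
import random

def frame_power(frame):
    """Mean squared amplitude of a frame."""
    return sum(x * x for x in frame) / len(frame)

def comfort_noise(power, length, rng=None):
    """Generate white noise whose mean power matches that of the sender's
    silent frame, so sender and receiver noise levels stay consistent."""
    rng = rng or random.Random(0)
    raw = [rng.gauss(0.0, 1.0) for _ in range(length)]
    scale = math.sqrt(power / frame_power(raw))
    return [s * scale for s in raw]

# The receiver would generate this once, from the power of the first frame
# of a silent run, and replay or refresh it for the following silent frames.
silent_frame = [0.01, -0.02, 0.015, -0.01] * 20   # 80 samples = one 10 ms frame
noise = comfort_noise(frame_power(silent_frame), 80)
```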
Claims (3)
1. A mute detection method based on speech feature judgment, characterized in that it comprises the following steps:
(1) extracting the multi-threshold zero-crossing rate of a frame of audio data, and summing the rates with weights to obtain the total zero-crossing rate Z;
(2) pre-judging silence with the weighted sum of multi-threshold zero-crossing rates: if the total zero-crossing rate Z of a frame of audio data is less than the set threshold Z_0, the frame is judged silent; otherwise the frame is passed to step (3);
(3) extracting the composite feature of a frame of audio data, the composite feature comprising the zero-crossing rate, the short-time energy, and Mel-scale cepstrum coefficients based on a multi-resolution spectrum;
(4) discriminating the composite audio feature with a two-class support vector machine to obtain two classes of results, normal speech and silence; normal speech being compressed and sent to the receiver, and silence being sent to the receiver after adding adaptive noise only in some frames and compressing.
2. The mute detection method based on speech feature judgment according to claim 1, characterized in that: in step (1), three multi-threshold zero-crossing rates of the audio data are extracted and summed with weights.
3. The mute detection method based on speech feature judgment according to claim 1, characterized in that: in step (4), said silence includes breathing noise.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2006100396964A CN100573663C (en) | 2006-04-20 | 2006-04-20 | Mute detection method based on speech characteristic to jude |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1835073A CN1835073A (en) | 2006-09-20 |
CN100573663C true CN100573663C (en) | 2009-12-23 |
Family
ID=37002788
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2006100396964A Expired - Fee Related CN100573663C (en) | 2006-04-20 | 2006-04-20 | Mute detection method based on speech characteristic to jude |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100573663C (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10311874B2 (en) | 2017-09-01 | 2019-06-04 | 4Q Catalyst, LLC | Methods and systems for voice-based programming of a voice-controlled device |
Families Citing this family (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE602007005833D1 (en) * | 2006-11-16 | 2010-05-20 | Ibm | LANGUAGE ACTIVITY DETECTION SYSTEM AND METHOD |
CN101393744B (en) * | 2007-09-19 | 2011-09-14 | 华为技术有限公司 | Method for regulating threshold of sound activation and device |
CN101764882A (en) * | 2009-12-31 | 2010-06-30 | 深圳市戴文科技有限公司 | PTT conversation device and method for realizing PTT conversation |
CN101895870A (en) * | 2010-03-23 | 2010-11-24 | 中兴通讯股份有限公司 | Silence recognition device for mobile phone, mobile phone de-noising method and system thereof in silence mode |
CN101895642B (en) * | 2010-06-30 | 2015-07-22 | 中兴通讯股份有限公司 | Method and device for detecting telephone channel faults |
EP2405634B1 (en) * | 2010-07-09 | 2014-09-03 | Google, Inc. | Method of indicating presence of transient noise in a call and apparatus thereof |
CN102332269A (en) * | 2011-06-03 | 2012-01-25 | 陈威 | Method for reducing breathing noises in breathing mask |
CN103456301B (en) * | 2012-05-28 | 2019-02-12 | 中兴通讯股份有限公司 | A kind of scene recognition method and device and mobile terminal based on ambient sound |
CN104112446B (en) * | 2013-04-19 | 2018-03-09 | 华为技术有限公司 | Breathing detection method and device |
CN103325388B (en) * | 2013-05-24 | 2016-05-25 | 广州海格通信集团股份有限公司 | Based on the mute detection method of least energy wavelet frame |
US9653094B2 (en) | 2015-04-24 | 2017-05-16 | Cyber Resonance Corporation | Methods and systems for performing signal analysis to identify content types |
CN105976831A (en) * | 2016-05-13 | 2016-09-28 | 中国人民解放军国防科学技术大学 | Lost child detection method based on cry recognition |
CN108242241B (en) * | 2016-12-23 | 2021-10-26 | 中国农业大学 | Pure voice rapid screening method and device thereof |
CN109859744B (en) * | 2017-11-29 | 2021-01-19 | 宁波方太厨具有限公司 | Voice endpoint detection method applied to range hood |
CN108447505B (en) * | 2018-05-25 | 2019-11-05 | 百度在线网络技术(北京)有限公司 | Audio signal zero-crossing rate processing method, device and speech recognition apparatus |
CN110910905B (en) * | 2018-09-18 | 2023-05-02 | 京东科技控股股份有限公司 | Mute point detection method and device, storage medium and electronic equipment |
CN110310668A (en) * | 2019-05-21 | 2019-10-08 | 深圳壹账通智能科技有限公司 | Mute detection method, system, equipment and computer readable storage medium |
CN113225592B (en) * | 2020-01-21 | 2022-08-09 | 华为技术有限公司 | Screen projection method and device based on Wi-Fi P2P |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1204766A (en) * | 1997-03-25 | 1999-01-13 | 皇家菲利浦电子有限公司 | Method and device for detecting voice activity |
CN1266312A (en) * | 1998-07-31 | 2000-09-13 | 摩托罗拉公司 | Method and apparatus for provding speaking telephone operation in portable communication equipment |
CN1290094A (en) * | 2000-11-03 | 2001-04-04 | 国家数字交换系统工程技术研究中心 | Multi-channel 64Kbps squelch compression method for packet switch network |
CN1398126A (en) * | 2001-07-18 | 2003-02-19 | 华为技术有限公司 | Method for implementing multi-language coding-decoding in universal mobile communication system |
CN1622193A (en) * | 2004-12-24 | 2005-06-01 | 北京中星微电子有限公司 | Voice signal detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100573663C (en) | Mute detection method based on speech characteristic to jude | |
Hermansky et al. | TRAPS-classifiers of temporal patterns. | |
US7684982B2 (en) | Noise reduction and audio-visual speech activity detection | |
KR100636317B1 (en) | Distributed Speech Recognition System and method | |
Kingsbury et al. | Recognizing reverberant speech with RASTA-PLP | |
US6804643B1 (en) | Speech recognition | |
CN111508498B (en) | Conversational speech recognition method, conversational speech recognition system, electronic device, and storage medium | |
CN109788400A (en) | A kind of neural network chauvent's criterion method, system and storage medium for digital deaf-aid | |
CN110120227A (en) | A kind of depth stacks the speech separating method of residual error network | |
KR20080064557A (en) | Apparatus and method for improving speech intelligibility | |
EP1250699A2 (en) | Speech recognition | |
Sharma et al. | Study of robust feature extraction techniques for speech recognition system | |
Khan et al. | Speaker separation using visually-derived binary masks | |
Couvreur et al. | Automatic noise recognition in urban environments based on artificial neural networks and hidden markov models | |
CN110197657B (en) | Dynamic sound feature extraction method based on cosine similarity | |
Singh et al. | Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition | |
Paliwal | On the use of filter-bank energies as features for robust speech recognition | |
CN111341351A (en) | Voice activity detection method and device based on self-attention mechanism and storage medium | |
CN114023352B (en) | Voice enhancement method and device based on energy spectrum depth modulation | |
Li et al. | An auditory system-based feature for robust speech recognition | |
CN112992131A (en) | Method for extracting ping-pong command of target voice in complex scene | |
CN114189781A (en) | Noise reduction method and system for double-microphone neural network noise reduction earphone | |
Malewadi et al. | Development of Speech recognition technique for Marathi numerals using MFCC & LFZI algorithm | |
Malik et al. | Wavelet transform based automatic speaker recognition | |
Pasad et al. | Voice activity detection for children's read speech recognition in noisy conditions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20091223 Termination date: 20160420 |