CN100573663C - Mute detection method based on speech characteristic judgment - Google Patents

Mute detection method based on speech characteristic judgment

Info

Publication number
CN100573663C
CN100573663C CNB2006100396964A CN200610039696A
Authority
CN
China
Prior art keywords
silence
zero-crossing rate
frame
audio data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2006100396964A
Other languages
Chinese (zh)
Other versions
CN1835073A (en)
Inventor
都思丹
薛卫
周余
孔令红
叶迎宪
赵康涟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CNB2006100396964A
Publication of CN1835073A
Application granted
Publication of CN100573663C
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a mute (silence) detection method based on speech feature recognition. First, the multi-threshold zero-crossing rate of a frame of audio data is extracted, and the weighted multi-threshold zero-crossing rate is used to pre-judge silence, identifying obviously silent frames. Then the composite feature of the frame is extracted, comprising the zero-crossing rate, the short-time energy, and Mel-scale cepstral coefficients based on a multi-resolution spectrum. A two-class support vector machine classifies the composite audio feature into one of two classes: normal speech or silence. The invention improves the silence detection success rate and can also recognize certain specific human sounds. It is widely applicable to network voice conversation, in particular voice chat and video conferencing, and has broad market prospects.

Description

Mute detection method based on speech characteristic judgment
1. Technical Field
The present invention relates to audio processing methods, and in particular to a mute detection method based on speech characteristic judgment for use in network voice conversation.
2. Background Art
When a person speaks, the sound can be divided into silence and speech, and on average silence accounts for about 60% of the time. When several people converse, essentially only one person speaks at any moment while the others are silent. Silence and the noise introduced by the voice capture device (including breathing noise) are transmitted over the network just like speech data, lowering voice quality. Using silence suppression, the silent portions can be eliminated, saving more than 50% of the transmission bandwidth and reducing network congestion.
Existing mute detection methods extract feature values from the audio signal and compare them with preset thresholds to judge silence. The parameters used by traditional methods include the short-time zero-crossing rate, short-time energy and autocorrelation coefficients. However, speech signals and some ambient noise signals are non-stationary, so the recognition rate of such systems is poor; moreover, because the thresholds are fixed, these systems cannot adapt well to different noises, and their recognition rate is therefore not high.
In addition, with the spread of network voice conversation, most applications run on the PC platform. For convenience, the parties usually wear headsets, which places the microphone very close to the nose and mouth, so the airflow produced by normal breathing enters the microphone and generates an audio stream. Although this breathing sound is weak, it is still a human sound, and some commonly used mute detection methods (for example G.729B and G.723.1A) identify part of the breathing noise as normal speech, further lowering the recognition rate of the detection system.
3. Summary of the Invention
The object of the present invention is to provide a mute detection method based on speech feature recognition that improves the silence detection success rate and can also recognize certain specific human sounds.
The object of the invention is achieved through the following technical solution:
A mute detection method based on speech characteristic judgment, characterized in that it comprises the following steps:
(1) Extract the multi-threshold zero-crossing rate of a frame of audio data and sum it with preferred weights. The multi-threshold zero-crossing rate detection method sets three thresholds of different heights T_1, T_2, T_3 with T_1 < T_2 < T_3, and for each frame computes the zero-crossing rates Z_1, Z_2, Z_3 corresponding to T_1, T_2, T_3 with formula (1):

Z_i = \sum_m \{ |\operatorname{sgn}[x(m) - T_i] - \operatorname{sgn}[x(m-1) - T_i]| + |\operatorname{sgn}[x(m) + T_i] - \operatorname{sgn}[x(m-1) + T_i]| \}\, w(n - m) \qquad (1)

where w(n - m) is the analysis window. The total zero-crossing rate Z is given by:

Z = W_1 Z_1 + W_2 Z_2 + W_3 Z_3

where W_1, W_2, W_3 are the zero-crossing rate weights, and Z_0 is defined as the total zero-crossing rate cut-off value.
(2) Pre-judge silence with the weighted sum of the multi-threshold zero-crossing rates: if the total zero-crossing rate Z of a frame of audio data is less than the preset threshold Z_0, the frame is judged to be silent; otherwise the frame is passed to step (3) for processing.
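As an illustration of steps (1) and (2), the short Python sketch below computes the per-threshold zero-crossing counts of formula (1) (with an implicit rectangular window), combines them with the weights W_1, W_2, W_3, and compares the total against the cut-off Z_0. The numeric thresholds, weights and cut-off used as defaults are placeholders for illustration only; the patent determines the actual values by the training procedure described in the embodiment.

```python
import numpy as np

def multi_threshold_zcr(frame, thresholds=(0.01, 0.02, 0.04), weights=(0.5, 0.3, 0.2)):
    """Weighted multi-threshold zero-crossing rate of one audio frame:
    per-threshold counts as in formula (1), combined as Z = W1*Z1 + W2*Z2 + W3*Z3."""
    frame = np.asarray(frame, dtype=float)
    z_total = 0.0
    for t, w in zip(thresholds, weights):
        upper = np.abs(np.diff(np.sign(frame - t)))  # crossings of the level +T_i
        lower = np.abs(np.diff(np.sign(frame + t)))  # crossings of the level -T_i
        z_total += w * np.sum(upper + lower)
    return z_total

def prejudge_silence(frame, z_cutoff=4.0, **zcr_kwargs):
    """Step (2): a frame whose total zero-crossing rate is below the cut-off Z_0
    is declared silent without further processing."""
    return multi_threshold_zcr(frame, **zcr_kwargs) < z_cutoff
```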
(3) Extract the composite feature of a frame of audio data. The composite feature comprises the zero-crossing rate, the short-time energy, and Mel-scale cepstral coefficients based on a multi-resolution spectrum. Computing the Mel-scale cepstral coefficients from the multi-resolution spectrum involves wavelet decomposition and reconstruction, a Fourier transform, and a Mel-scale cepstrum extraction module. The Mel-scale cepstral coefficients c_{MFCC} are computed as follows:

c_{MFCC}(i) = \sqrt{\frac{2}{L}} \sum_{l=1}^{L} \log m(l) \, \cos\!\left[ \left(l - \frac{1}{2}\right) \frac{i\pi}{L} \right] \qquad (2)

where

m(l) = \sum_{k=o(l)}^{h(l)} W_l(k)\, |X_n(k)|, \quad l = 1, 2, \ldots, L \qquad (3)

W_l(k) = \begin{cases} \dfrac{k - o(l)}{c(l) - o(l)}, & o(l) \le k \le c(l) \\ \dfrac{h(l) - k}{h(l) - c(l)}, & c(l) \le k \le h(l) \end{cases} \qquad (4)

In the formulas, o(l), c(l) and h(l) are respectively the lower-limit, centre and upper-limit frequencies of the l-th triangular filter, and |X_n(k)| is the magnitude of the multi-resolution spectrum.
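A minimal sketch of formulas (2)-(4) follows, assuming the multi-resolution spectrum is available as an array of magnitudes |X_n(k)| and that each triangular filter is given by its lower, centre and upper FFT-bin indices (o, c, h); the Mel-scale placement of the filters is not reproduced here.

```python
import numpy as np

def mel_cepstrum(spectrum, filter_edges, num_coeffs=12):
    """Mel-scale cepstral coefficients of one magnitude spectrum (formulas (2)-(4)).

    spectrum     -- |X_n(k)|, the multi-resolution magnitude spectrum of the frame
    filter_edges -- sequence of (o, c, h) bin indices of the L triangular filters
    """
    L = len(filter_edges)
    m = np.zeros(L)
    for idx, (o, c, h) in enumerate(filter_edges):
        k = np.arange(o, h + 1)
        rising = (k - o) / max(c - o, 1)     # ramp up from o(l) to c(l), formula (4)
        falling = (h - k) / max(h - c, 1)    # ramp down from c(l) to h(l)
        w = np.where(k <= c, rising, falling)
        m[idx] = np.sum(w * spectrum[o:h + 1])       # filter-bank output, formula (3)
    i = np.arange(1, num_coeffs + 1)[:, None]
    l = np.arange(1, L + 1)[None, :]
    basis = np.cos((l - 0.5) * i * np.pi / L)        # cosine basis of formula (2)
    return np.sqrt(2.0 / L) * (basis @ np.log(m + 1e-12))
```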
(4) Classify the composite audio feature with a two-class support vector machine, obtaining two classes of result: normal speech and silence. Normal speech is compressed and sent to the receiver; for silence, adaptive noise is added only in some frames before compression and transmission to the receiver.
By extracting several speech parameters, the present invention detects speech in stages and pre-judges silence effectively. Audio data that cannot be identified in step (2) is detected by the subsequent steps: in step (3), to obtain the overall spectral feature of the signal, the frame of audio data is first subjected to wavelet decomposition, reconstruction and Fourier transform to form a multi-resolution spectrum, and the Mel-scale cepstrum of this spectrum is extracted as the final audio feature. In step (4), the composite feature of the audio data is classified with a support vector machine to obtain the final decision. Compared with the prior art, the present invention uses a support vector machine for audio feature classification, which rests on a stricter theoretical foundation than traditional classification methods; applied in fields such as text classification and image recognition, it has achieved better classification results than traditional machine learning methods, with high classification accuracy and good robustness.
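One way to realize the two-class classification of step (4) is sketched below with scikit-learn's SVC, whose RBF kernel and SMO-based solver match the kernel and training method named in the embodiment; the 14-dimensional feature layout, the label convention, and the mapping from σ² = 0.3 to the gamma parameter are assumptions made only for this illustration.

```python
import numpy as np
from sklearn.svm import SVC

def train_silence_classifier(X_train, y_train, sigma_sq=0.3):
    """Two-class SVM over composite features
    [zero-crossing rate, short-time energy, 12 Mel-scale cepstral coefficients].
    y_train: 1 = normal speech, 0 = silence (breathing noise labelled as silence).
    RBF kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma_sq))
    clf.fit(X_train, y_train)
    return clf

def classify_frame(clf, feature_vector):
    """Step (4): returns 1 for normal speech, 0 for silence."""
    return int(clf.predict(np.asarray(feature_vector).reshape(1, -1))[0])
```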
4. Brief Description of the Drawings
Fig. 1 is a schematic flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the extraction of the audio composite feature in the invention;
Fig. 3 is a wavelet decomposition tree structure diagram in the invention.
5. Detailed Description of the Embodiments
The present invention is described in detail below in conjunction with the accompanying drawings.
A mute detection method based on speech characteristic judgment according to the present invention is shown in Fig. 1. In the concrete detection process, a sampling frequency of 8 kHz is used, and the signal is processed in frames of 80 samples (10 milliseconds each). The method comprises the following steps:
(1) Extract the multi-threshold zero-crossing rate of a frame of audio data and sum it with preferred weights. Step (1) uses the total zero-crossing rate cut-off value Z_0 and the optimal weight vector (W_1, W_2, W_3), whose values must be set before silence detection begins. To determine them, at least 2000 frames of audio data from different environments are collected, half of them silence and half speech. Taking the silence misjudgment rate produced by multi-threshold zero-crossing rate detection as the objective function, every weight vector and threshold in their value ranges is traversed, and the weight vector and threshold that yield the minimum misjudgment rate are taken as the optimal weight vector and threshold Z_0.
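The exhaustive search described above could look like the sketch below, which reuses the multi_threshold_zcr helper from the earlier sketch; the grids of candidate weights and cut-off values, and the frame labels, are assumed inputs rather than values given by the patent.

```python
import itertools
import numpy as np

def tune_weights_and_cutoff(frames, labels, weight_grid, cutoff_grid,
                            thresholds=(0.01, 0.02, 0.04)):
    """Traverse candidate weight vectors (W1, W2, W3) and cut-off values Z_0,
    keeping the combination with the lowest misjudgment rate on labelled frames.
    labels: True for silent frames, False for speech frames.
    Uses multi_threshold_zcr() from the sketch following step (2)."""
    labels = np.asarray(labels, dtype=bool)
    best_weights, best_cutoff, best_error = None, None, 1.0
    for weights in itertools.product(weight_grid, repeat=3):
        z = np.array([multi_threshold_zcr(f, thresholds, weights) for f in frames])
        for z0 in cutoff_grid:
            error = np.mean((z < z0) != labels)   # silence misjudgment rate
            if error < best_error:
                best_weights, best_cutoff, best_error = weights, z0, error
    return best_weights, best_cutoff, best_error
```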
(2) Pre-judge silence with the weighted sum of the multi-threshold zero-crossing rates: if the total zero-crossing rate Z of a frame of audio data is less than the preset threshold Z_0, the frame is judged to be silent; otherwise the frame is passed to step (3) for processing.
(3) Extract the composite feature of a frame of audio data; the composite feature comprises the zero-crossing rate, the short-time energy, and Mel-scale cepstral coefficients (MFCC) based on a multi-resolution spectrum. The extraction of the MFCC based on the multi-resolution spectrum is shown in Fig. 2. A Daubechies-4 wavelet packet transform decomposes the windowed time-domain speech signal into the coefficients of 6 subbands, and the coefficients of each subband are reconstructed to the size of the first-level wavelet decomposition, as shown in Fig. 3. The coefficients of each subband are then normalized, an FFT is applied to the coefficients, and the subband spectra are summed to form the multi-resolution spectrum, which is finally delivered to the MFCC extraction module. The MFCC feature dimension is L = 12; the inner-product (kernel) function of the support vector machine is a radial basis function (σ² = 0.3); the support vector machine can be trained with the SMO method, although the invention is not restricted to this.
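A rough sketch of the multi-resolution spectrum construction using PyWavelets is given below. It follows the description (Daubechies-4 wavelet packet decomposition, per-subband reconstruction and normalization, FFT, summation), but the exact six-subband tree of Fig. 3 is not spelled out in the text, so a uniform-depth packet tree and max-amplitude normalization are assumptions made purely for illustration.

```python
import numpy as np
import pywt

def multiresolution_spectrum(frame, wavelet="db4", level=3):
    """Wavelet-packet based multi-resolution spectrum of one frame:
    decompose, reconstruct each subband separately, normalize it, take its FFT
    magnitude, and sum the subband spectra."""
    frame = np.asarray(frame, dtype=float)
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, mode="symmetric",
                            maxlevel=level)
    spectrum = None
    for node in wp.get_level(level, order="freq"):
        # reconstruct the time-domain contribution of this subband alone
        single = pywt.WaveletPacket(data=None, wavelet=wavelet, mode="symmetric")
        single[node.path] = node.data
        rec = single.reconstruct(update=False)[:len(frame)]
        rec = rec / (np.max(np.abs(rec)) + 1e-12)        # per-subband normalization
        mag = np.abs(np.fft.rfft(rec))                   # FFT of the subband signal
        spectrum = mag if spectrum is None else spectrum + mag
    return spectrum
```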
(4) Classify the composite audio feature with the two-class support vector machine, obtaining two classes of result: one class is normal speech, the other is silence (including breathing noise). For normal speech, the system can compress the frame with a speech compression method such as G.729 or G.723 and send it to the network receiver.
In the present invention, for the frames judged silent in step (2) or step (4): in actual use, if nothing at all is transmitted during silence, the listener feels uncomfortable, so some noise must be added artificially so that the listener feels the communication has not been interrupted. The added noise must keep the noise power consistent between sender and receiver, but noise does not have to be transmitted for every silent frame; transmitting only the first frame of a continuous silent period is sufficient. How the noise is transmitted is not restricted by the present invention.
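The comfort-noise idea in the preceding paragraph — add noise whose power matches the sender-side background so the listener does not hear a dead line — might be realized as in the sketch below. How the noise parameters are actually transmitted is left open by the patent, so generating receiver-side white noise from a power estimate is only one assumed possibility.

```python
import numpy as np

def comfort_noise(reference_silent_frame, length=80):
    """Generate one comfort-noise frame (80 samples = 10 ms at 8 kHz) whose
    power matches that of a reference silent frame from the sender side."""
    ref = np.asarray(reference_silent_frame, dtype=float)
    power = np.mean(ref ** 2)
    return np.sqrt(power) * np.random.randn(length)
```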

Claims (3)

1. A mute detection method based on speech characteristic judgment, characterized in that it comprises the following steps:
(1) extracting the multi-threshold zero-crossing rates of a frame of audio data and summing them with weights to obtain the total zero-crossing rate Z;
(2) pre-judging silence with the weighted sum of the multi-threshold zero-crossing rates: if the total zero-crossing rate Z of the frame of audio data is less than the preset threshold Z_0, the frame is judged to be silent; otherwise the frame is passed to step (3) for processing;
(3) extracting the composite feature of the frame of audio data, the composite feature comprising the zero-crossing rate, the short-time energy, and Mel-scale cepstral coefficients based on a multi-resolution spectrum;
(4) classifying the composite audio feature with a two-class support vector machine to obtain two classes of result, normal speech and silence; normal speech is compressed and sent to the receiver, and for silence, adaptive noise is added only in some frames before compression and transmission to the receiver.
2. The mute detection method based on speech characteristic judgment according to claim 1, characterized in that in step (1), three multi-threshold zero-crossing rates of the audio data are extracted and summed with weights.
3. The mute detection method based on speech characteristic judgment according to claim 1, characterized in that in step (4), the silence includes breathing noise.
CNB2006100396964A 2006-04-20 2006-04-20 Mute detection method based on speech characteristic judgment Expired - Fee Related CN100573663C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100396964A CN100573663C (en) 2006-04-20 2006-04-20 Mute detection method based on speech characteristic judgment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100396964A CN100573663C (en) 2006-04-20 2006-04-20 Mute detection method based on speech characteristic judgment

Publications (2)

Publication Number Publication Date
CN1835073A CN1835073A (en) 2006-09-20
CN100573663C true CN100573663C (en) 2009-12-23

Family

ID=37002788

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100396964A Expired - Fee Related CN100573663C (en) 2006-04-20 2006-04-20 Mute detection method based on speech characteristic to jude

Country Status (1)

Country Link
CN (1) CN100573663C (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE602007005833D1 (en) * 2006-11-16 2010-05-20 Ibm LANGUAGE ACTIVITY DETECTION SYSTEM AND METHOD
CN101393744B (en) * 2007-09-19 2011-09-14 华为技术有限公司 Method for regulating threshold of sound activation and device
CN101764882A (en) * 2009-12-31 2010-06-30 深圳市戴文科技有限公司 PTT conversation device and method for realizing PTT conversation
CN101895870A (en) * 2010-03-23 2010-11-24 中兴通讯股份有限公司 Silence recognition device for mobile phone, mobile phone de-noising method and system thereof in silence mode
CN101895642B (en) * 2010-06-30 2015-07-22 中兴通讯股份有限公司 Method and device for detecting telephone channel faults
EP2405634B1 (en) * 2010-07-09 2014-09-03 Google, Inc. Method of indicating presence of transient noise in a call and apparatus thereof
CN102332269A (en) * 2011-06-03 2012-01-25 陈威 Method for reducing breathing noises in breathing mask
CN103456301B (en) * 2012-05-28 2019-02-12 中兴通讯股份有限公司 A kind of scene recognition method and device and mobile terminal based on ambient sound
CN104112446B (en) * 2013-04-19 2018-03-09 华为技术有限公司 Breathing detection method and device
CN103325388B (en) * 2013-05-24 2016-05-25 广州海格通信集团股份有限公司 Based on the mute detection method of least energy wavelet frame
US9653094B2 (en) 2015-04-24 2017-05-16 Cyber Resonance Corporation Methods and systems for performing signal analysis to identify content types
CN105976831A (en) * 2016-05-13 2016-09-28 中国人民解放军国防科学技术大学 Lost child detection method based on cry recognition
CN108242241B (en) * 2016-12-23 2021-10-26 中国农业大学 Pure voice rapid screening method and device thereof
CN109859744B (en) * 2017-11-29 2021-01-19 宁波方太厨具有限公司 Voice endpoint detection method applied to range hood
CN108447505B (en) * 2018-05-25 2019-11-05 百度在线网络技术(北京)有限公司 Audio signal zero-crossing rate processing method, device and speech recognition apparatus
CN110910905B (en) * 2018-09-18 2023-05-02 京东科技控股股份有限公司 Mute point detection method and device, storage medium and electronic equipment
CN110310668A (en) * 2019-05-21 2019-10-08 深圳壹账通智能科技有限公司 Mute detection method, system, equipment and computer readable storage medium
CN113225592B (en) * 2020-01-21 2022-08-09 华为技术有限公司 Screen projection method and device based on Wi-Fi P2P

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1204766A (en) * 1997-03-25 1999-01-13 皇家菲利浦电子有限公司 Method and device for detecting voice activity
CN1266312A (en) * 1998-07-31 2000-09-13 摩托罗拉公司 Method and apparatus for providing speaking telephone operation in portable communication equipment
CN1290094A (en) * 2000-11-03 2001-04-04 国家数字交换系统工程技术研究中心 Multi-channel 64Kbps squelch compression method for packet switch network
CN1398126A (en) * 2001-07-18 2003-02-19 华为技术有限公司 Method for implementing multi-language coding-decoding in universal mobile communication system
CN1622193A (en) * 2004-12-24 2005-06-01 北京中星微电子有限公司 Voice signal detection method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10311874B2 (en) 2017-09-01 2019-06-04 4Q Catalyst, LLC Methods and systems for voice-based programming of a voice-controlled device

Also Published As

Publication number Publication date
CN1835073A (en) 2006-09-20

Similar Documents

Publication Publication Date Title
CN100573663C (en) Mute detection method based on speech characteristic judgment
Hermansky et al. TRAPS-classifiers of temporal patterns.
US7684982B2 (en) Noise reduction and audio-visual speech activity detection
KR100636317B1 (en) Distributed Speech Recognition System and method
Kingsbury et al. Recognizing reverberant speech with RASTA-PLP
US6804643B1 (en) Speech recognition
CN111508498B (en) Conversational speech recognition method, conversational speech recognition system, electronic device, and storage medium
CN109788400A (en) A kind of neural network chauvent's criterion method, system and storage medium for digital deaf-aid
CN110120227A (en) A kind of depth stacks the speech separating method of residual error network
KR20080064557A (en) Apparatus and method for improving speech intelligibility
EP1250699A2 (en) Speech recognition
Sharma et al. Study of robust feature extraction techniques for speech recognition system
Khan et al. Speaker separation using visually-derived binary masks
Couvreur et al. Automatic noise recognition in urban environments based on artificial neural networks and hidden markov models
CN110197657B (en) Dynamic sound feature extraction method based on cosine similarity
Singh et al. Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition
Paliwal On the use of filter-bank energies as features for robust speech recognition
CN111341351A (en) Voice activity detection method and device based on self-attention mechanism and storage medium
CN114023352B (en) Voice enhancement method and device based on energy spectrum depth modulation
Li et al. An auditory system-based feature for robust speech recognition
CN112992131A (en) Method for extracting ping-pong command of target voice in complex scene
CN114189781A (en) Noise reduction method and system for double-microphone neural network noise reduction earphone
Malewadi et al. Development of Speech recognition technique for Marathi numerals using MFCC & LFZI algorithm
Malik et al. Wavelet transform based automatic speaker recognition
Pasad et al. Voice activity detection for children's read speech recognition in noisy conditions

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091223

Termination date: 20160420
