CN103489454B - Voice endpoint detection method based on waveform morphological feature clustering - Google Patents

Voice endpoint detection method based on waveform morphological feature clustering

Info

Publication number
CN103489454B
CN103489454B (application CN201310432146.9A; publication of application CN103489454A)
Authority
CN
China
Prior art keywords
sound
subsegment
signal
cluster
speech signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310432146.9A
Other languages
Chinese (zh)
Other versions
CN103489454A (en)
Inventor
杨莹春
赵启明
吴朝晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201310432146.9A priority Critical patent/CN103489454B/en
Publication of CN103489454A publication Critical patent/CN103489454A/en
Application granted granted Critical
Publication of CN103489454B publication Critical patent/CN103489454B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a voice endpoint detection method based on waveform morphological feature clustering, comprising the steps of: S01, obtaining a clean speech signal from the original speech signal; S02, obtaining the envelope signal of the clean speech signal and dividing the envelope signal into sound subsegments; S03, clustering the sound subsegments according to the waveform morphological feature of each subsegment and removing the non-speech subsegments; S04, processing all sound subsegments retained in step S03 to obtain the voice endpoints. Using a comparatively simple unsupervised clustering method and a single feature, the invention obtains good results quickly and accurately.

Description

Voice endpoint detection method based on waveform morphological feature clustering
Technical field
The present invention relates to the field of voice endpoint detection, and in particular to a voice endpoint detection method based on waveform morphological feature clustering.
Background technology
Voiceprint recognition technology has now reached a fairly high level, and voice endpoint detection is a necessary step in speech analysis, speech synthesis and speaker recognition. Endpoint detection has achieved reasonable results in voice response and speech recognition systems, and many endpoint detection techniques now exist. The main features used include short-time energy, zero-crossing rate, information entropy, subband energy, pitch, time-domain parameters, frequency-domain parameters and cepstral parameters; the modeling methods are equally varied, chiefly double thresholds, neural networks, wavelet models and hidden Markov models. However, because of problems such as background noise in real environments, the results still fall short of expectations.
Patent document CN102148030A discloses an endpoint detection method for speech recognition, which comprises: collecting background noise and the noisy speech signal; analyzing the characteristics of the background noise and the noisy speech signal; extracting the parameters of a linear prediction model of the background noise, i.e. its LPC (linear predictive coding) coefficients, as a background-noise linear prediction template; and determining the endpoints of the noisy speech signal by comparing the linear prediction coefficients of each frame of noisy speech with the parameters of the background-noise template, the comparison being processed into an eigenvalue. When the change of this eigenvalue exceeds a set value, it is taken as the mark of a detected voice endpoint; the background-noise linear prediction model serving as the noise template can also be revised according to changes in the background noise. The method performs endpoint detection of human speech well in environments with background noise.
The defect of that method is that multiple characteristic parameters must be obtained and the computation is complex. How to perform reasonably accurate voice endpoint detection with a single characteristic parameter in the presence of background noise is therefore a problem to be solved. At the same time, when the background noise is low, faster endpoint detection is desired.
Summary of the invention
In order to perform good voice endpoint detection in the presence of background noise using a single waveform morphological feature, the invention provides a voice endpoint detection method based on waveform morphological feature clustering, comprising the steps of:
S01, obtaining a clean speech signal from the original speech signal;
S02, obtaining the envelope signal of the clean speech signal and dividing the envelope signal into sound subsegments;
S03, clustering the sound subsegments according to the waveform morphological feature of each subsegment, removing the non-speech subsegments from the clustering result, and retaining the remainder;
S04, processing all sound subsegments retained in step S03 to obtain the voice endpoints.
Here, non-speech subsegments are the silent portions of the speech signal outside the endpoint information together with the background-noise portions. The core techniques of the invention are obtaining clean speech with a speech-enhancement technique and a threshold, obtaining the envelope signal of the speech by filtering, and clustering the sound subsegments with an unsupervised clustering method; in the presence of background noise the invention filters out the noise well and obtains the voice endpoints from the original speech signal.
The step of obtaining the clean speech signal in step S01 is: perform speech enhancement on the original speech signal to obtain a contrast signal, and compute the signal-to-noise ratio (SNR) from the contrast signal and the original speech signal; if the SNR is greater than a set threshold, take the original speech signal as the clean speech signal; if the SNR is less than the set threshold, take the contrast signal as the clean speech signal.
When background noise is present, the subsequent clustering of sound subsegments degrades and the endpoint detection performance drops significantly, so it is necessary to pre-check whether the current speech signal is a clean speech signal.
The speech enhancement applied to the original speech signal is one of the following methods: maximum a posteriori estimation, Kalman filtering, comb filtering, Wiener filtering, spectral subtraction, minimum mean-square error estimation of the short-time spectral amplitude, adaptive filtering, hidden Markov model methods, wavelet transform, neural networks, auditory masking and fractal theory. All of these are prior art, and a person skilled in the art can use any of them to enhance the original speech signal.
The envelope signal of the clean speech signal in step S02 is obtained by IIR filtering, the Hilbert transform or the analytic wavelet transform; a person skilled in the art can obtain the envelope signal of the clean speech signal with any of these methods.
In step S02 the maxima and minima of the envelope signal are found, and the envelope signal is divided into sound subsegments at the minima: the span between two adjacent minima positions is one sound subsegment, and the maximum position within it is the crest of that subsegment. Finding the maxima and minima of the envelope thus yields each sound subsegment and its crest amplitude, the waveform morphological feature of the subsegment.
In step S03 the crest amplitude of each sound subsegment is selected as the waveform morphological feature; the sound subsegments are sorted by crest amplitude in descending order and then clustered, the last part of the clustering result is removed as non-speech subsegments, and the remainder is retained. Waveform morphological features include the crest amplitude, the mean amplitude of the band, the band area and the crest factor; preferably, the crest amplitude is used as the feature for clustering. Because the subsegments are sorted by descending crest amplitude before clustering, the parts of the clustering result are likewise ordered by descending crest amplitude, so the last part is removed as non-speech subsegments and the preceding parts are retained.
The clustering algorithm for the sound subsegments is one of: hierarchical clustering, the K-means algorithm, the K-modes algorithm, fuzzy clustering, graph-theoretic algorithms, grid- and density-based clustering algorithms, and the ACODF algorithm. A person skilled in the art can cluster the sound subsegments with any of these methods.
In step S04 the processing is: sort all sound subsegments retained in step S03 into time order, and connect adjacent subsegments whose time interval is below a threshold, obtaining the voice endpoints. Because clustering the subsegments by waveform morphological feature changes their time order within the parts of the result, after clustering all retained subsegments are re-sorted into chronological order and then connected according to the threshold on the interval between adjacent subsegments; each resulting independent sound segment gives the voice endpoints.
In step S04 the threshold for the interval between adjacent subsegments ranges from 0.08 s to 0.3 s. The threshold is determined from an ordinary person's speaking rate: taking 200 words per minute as the upper limit, a single word lasts 0.3 s, which sets the upper bound of the threshold.
Using a comparatively simple unsupervised clustering method and a single feature, the invention obtains good results quickly and accurately.
Accompanying drawing explanation
Fig. 1 is the flow chart of the endpoint detection of one embodiment of the invention;
Fig. 2 shows the detailed steps of the sound-subsegment clustering of this embodiment;
Fig. 3 is the original speech signal of this embodiment;
Fig. 4 shows the endpoint information displayed on the envelope signal of this embodiment; only the data within the endpoints are shown;
Fig. 5 shows the endpoint information displayed on the original speech signal of this embodiment; only the data within the endpoints are shown.
Embodiment
The endpoint detection of the invention based on waveform morphological feature clustering, using five-way clustering, is described in detail below with reference to the drawings and an embodiment.
The experimental data of this example are the telephone data in the male train and test portions of the NIST Speaker Recognition Evaluation (SRE) 2004, 2006 and 2008. The 2004 train telephone set contains 248 utterances and the 2004 test telephone set 1606 utterances; the 2006 train telephone set contains 354 utterances; and the 2008 train telephone set contains 648 utterances. NIST provides correct endpoint annotations for all 2004 and 2006 speech data, which can therefore be used to measure the error rate of the invention. Below, male_train_telephone denotes the telephone data in the male train portion and male_test_telephone the telephone data in the male test portion. The speech data format is 8000 Hz sampling rate, 16-bit quantization, single-channel WAV. The experimental environment is MATLAB 2012.
Fig. 1 shows an embodiment of the voice endpoint detection method based on waveform morphological feature clustering; the steps are as follows:
Step S01, obtain the clean speech signal.
When background noise is present, the subsequent clustering of sound subsegments degrades and the endpoint detection performance drops significantly, so it is necessary to pre-check whether the current speech signal is clean. The original speech signal is enhanced to produce the contrast signal; the enhancement is one of the following methods: maximum a posteriori estimation, Kalman filtering, comb filtering, Wiener filtering, spectral subtraction, minimum mean-square error estimation of the short-time spectral amplitude, adaptive filtering, hidden Markov model methods, wavelet transform, neural networks, auditory masking and fractal theory. This embodiment enhances the original speech signal with Wiener filtering: when the original speech signal is fed into the Wiener filter, the clean speech signal free of background noise is rendered as accurately as possible, so Wiener filtering is preferred for the enhancement. The SNR of the sound is then computed: if the SNR is at or above a preferred threshold, the current signal is considered clean; otherwise the filtered data are taken as the clean speech signal. The SNR threshold was determined by computing and observing the SNR of audio files over a large number of call scenarios: an SNR above 9.2 indicates a noise-free (clean) signal, and below 9.2 a noisy one.
1-1. Compute the SNR of each utterance in the NIST 2004, 2006 and 2008 male_train/test_telephone data; the concrete computation is as follows:
1. Apply Wiener filtering to the initial original speech signal x_i (i = 1, 2, ..., M) to obtain the filtered (contrast) signal sy_i (i = 1, 2, ..., m), where M is the length of the original speech signal, m is the length of the Wiener-filtered signal, and the subscript i denotes the i-th sample.
The transfer function of the Wiener filter is

G(k) = \left( \frac{E\{|S(k)|^2\}}{E\{|S(k)|^2\} + \beta\, E\{|W(k)|^2\}} \right)^{\alpha}

where S(k) is the Fourier transform of the clean speech signal, W(k) is the Fourier transform of the additive noise signal, k denotes the k-th frequency bin, and E{·} denotes mathematical expectation. When Wiener filtering is applied to the noisy speech signal, a better filtering effect is obtained by adjusting the values of α and β. The Wiener filter parameters are determined by the minimum mean-square error criterion.
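As an illustration only, the following MATLAB sketch applies a Wiener-type gain of the above form to a single 256-sample frame. The assumptions are not specified by the patent: the first 0.25 s of the recording x are taken to be noise-only, E{|S(k)|^2} is crudely estimated by power subtraction, and framing/overlap-add are omitted.

% Hedged single-frame illustration of the parametric Wiener gain G(k).
% Assumptions: x is the original signal (column vector) and its first
% 0.25 s contain only noise; one 256-sample frame is processed.
fs    = 8000;                           % sampling rate of the test data
nLead = round(0.25 * fs);
W2    = abs(fft(x(1:256))).^2;          % noise power spectrum, E{|W(k)|^2}
X     = fft(x(nLead+1 : nLead+256));    % spectrum of one noisy frame
S2    = max(abs(X).^2 - W2, 0);         % crude estimate of E{|S(k)|^2}
alpha = 1; beta = 1;                    % tuning parameters of the gain
G     = (S2 ./ (S2 + beta*W2 + eps)).^alpha;
syFrame = real(ifft(G .* X));           % enhanced frame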
2. Truncate the initial original speech signal x_i (i = 1, 2, ..., M) to the length of sy_i (i = 1, 2, ..., m): take the amplitudes of its first m samples as the original-signal amplitudes sx_i (i = 1, 2, ..., m); in the following steps sx serves as the original speech signal,

sx_i = x_i, \qquad i = 1, 2, \ldots, m
3. Compute the SNR of the speech data:

\mathrm{SNR} = 10 \log_{10}\!\left( \sum_{i=1}^{m} sx_i^2 \Big/ \sum_{i=1}^{m} (sx_i - sy_i)^2 \right)
1-2. When the SNR is above 9.2 the signal is assured to be clean speech, but an SNR below 9.2 does not necessarily mean the speech is noisy.

sz_i = \begin{cases} sy_i, & \mathrm{SNR} < 9.2 \\ sx_i, & \mathrm{SNR} \ge 9.2 \end{cases} \qquad i = 1, 2, \ldots, m

where sz_i (i = 1, 2, ..., m) is the clean speech signal obtained.
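The whole pre-check of step S01 can be summarized by the following minimal MATLAB sketch; wienerEnhance is a hypothetical placeholder for the chosen speech-enhancement front end (for example a framed version of the gain above), not a built-in function:

% Minimal sketch of step S01, assuming an enhancement routine is given.
sy = wienerEnhance(x);                  % contrast signal, length m
m  = length(sy);
sx = x(1:m);                            % truncate original signal to length m
snrDb = 10 * log10(sum(sx.^2) / sum((sx - sy).^2));
if snrDb >= 9.2
    sz = sx;                            % signal already clean: keep original
else
    sz = sy;                            % noisy: use the enhanced signal
end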
Step S02, obtain the sound subsegments.
The envelope signal is computed from the clean speech signal obtained; it may be obtained by IIR filtering, the Hilbert transform or the analytic wavelet transform. IIR filtering is fast, and the envelope it extracts reflects the overall trend of the original speech signal accurately, so this embodiment uses IIR filtering: first take the absolute value of the speech-signal amplitudes sz_i (i = 1, 2, ..., m) to obtain the rectified signal, then construct an IIR filter and filter the rectified signal to obtain the envelope signal. The maxima and minima of the envelope signal are then found and the envelope is divided into sound subsegments: the span between two adjacent minima positions is one sound subsegment, and the maximum position within it is the crest of that subsegment.
2-1. Compute the envelope signal of the clean speech signal obtained in step 1-1.
1. Take the absolute value of the clean speech signal:

sw_i = \begin{cases} sz_i, & sz_i \ge 0 \\ -sz_i, & sz_i < 0 \end{cases} \qquad i = 1, 2, \ldots, m

where sz_i (i = 1, 2, ..., m) is the clean speech signal from step 1-1 and sw_i (i = 1, 2, ..., m) is the data obtained by taking its absolute value.
2. Use the filter-design function butter to construct the parameters of the filter function filter: from the filter order n and cutoff frequency W_n, compute the Butterworth numerator coefficients b_i (i = 1, 2, ..., n+1) and denominator coefficients a_i (i = 1, 2, ..., n+1). Preferably n = 3 and W_n = 10/f_n with f_n = f_s/2, where f_n is the Nyquist frequency, i.e. half the data sampling frequency f_s;
3. Use the filter function filter to filter the rectified signal sw_i (i = 1, 2, ..., m) and obtain the envelope signal so_i (i = 1, 2, ..., m), given by the difference equation

a_1\, so_j = b_1\, sw_j + b_2\, sw_{j-1} + \cdots + b_{nb+1}\, sw_{j-nb} - a_2\, so_{j-1} - \cdots - a_{na+1}\, so_{j-na}, \qquad j = 1, 2, \ldots, m

where na and nb both equal the filter order n, {a_1, a_2, ..., a_{na+1}} are the coefficients of the output so_i (i = 1, 2, ..., m), and {b_1, b_2, ..., b_{nb+1}} are the coefficients of the input; if a_1 is not 1, the function filter normalizes it to 1.
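Steps 2-1(1)-(3) map directly onto MATLAB's butter and filter functions; a minimal sketch with the preferred parameters (n = 3, cutoff 10 Hz at f_s = 8000 Hz) follows:

% Minimal sketch of the envelope extraction of step 2-1.
fs = 8000;                       % sampling frequency of the test data
sw = abs(sz);                    % step 1: rectify the clean speech signal
[b, a] = butter(3, 10/(fs/2));   % step 2: 3rd-order Butterworth, Wn = 10/fn
so = filter(b, a, sw);           % step 3: IIR filtering gives the envelope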
2-2. Obtain the sound subsegments and the amplitudes at their crests.
1. Take the sign of the first difference of the voice envelope signal so_i (i = 1, 2, ..., m):

f1_i = \operatorname{sign}(so_{i+1} - so_i), \qquad i = 1, 2, \ldots, m-1

2. Take the first difference of f1_i (i = 1, 2, ..., m-1):

f2_j = f1_{j+1} - f1_j, \qquad j = 1, 2, \ldots, m-2

3. Pad f2_j (j = 1, 2, ..., m-2) with a zero at each end to obtain f3_k (k = 1, 2, ..., m):

f3_k = \begin{cases} 0, & k = 1 \\ f2_{k-1}, & k = 2, 3, \ldots, m-1 \\ 0, & k = m \end{cases}

4. The positions where f3_k (k = 1, 2, ..., m) equals -2 are the positions of the maxima of the envelope signal, and the positions where it equals 2 are the positions of the minima.

5. Two adjacent minima give the start and end positions of a sound subsegment, and the maximum between them is the crest of that subsegment.
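A minimal MATLAB sketch of steps 2-2(1)-(5) follows; note that the ±2 test works because the first difference is reduced to its sign. The variable names crestAmp and segLen are introduced here for illustration:

% Minimal sketch of the subsegment/crest extraction of step 2-2.
so = so(:);                            % work with a column vector
f1 = sign(diff(so));                   % step 1: +1 rising, -1 falling
f3 = [0; diff(f1); 0];                 % steps 2-3: second difference, zero-padded
maxPos = find(f3 == -2);               % step 4: envelope maxima
minPos = find(f3 ==  2);               %          envelope minima
nSeg = numel(minPos) - 1;              % step 5: subsegments between minima
crestAmp = zeros(nSeg, 1);
segLen   = zeros(nSeg, 1);
for k = 1:nSeg
    seg = minPos(k):minPos(k+1);       % one sound subsegment
    crestAmp(k) = max(so(seg));        % crest amplitude of the subsegment
    segLen(k)   = numel(seg);          % subsegment length in samples
end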
Step S03, cluster the sound subsegments by crest amplitude, remove the non-speech subsegments from the clustering result, and retain the remainder.
As shown in Fig. 2, the concrete steps are as follows:
1. Cluster the sound subsegments by waveform morphological feature. Preferably, the crest amplitude is selected as the waveform morphological feature of each subsegment; the subsegments are first sorted by descending crest amplitude, giving the samples to be clustered.
2. The unsupervised clustering of the samples may use hierarchical clustering, the K-means algorithm, the K-modes algorithm, fuzzy clustering, graph-theoretic algorithms, grid- and density-based clustering algorithms, the ACODF algorithm, etc. The K-means algorithm is simple and fast, which speeds up processing when applied to voice endpoint detection, so K-means is preferred here. The samples are clustered into five classes (five-way clustering); in the order of descending crest amplitude the five classes are class1, class2, class3, class4 and class5, and num_class1, num_class2, num_class3 and num_class4 are the numbers of sound subsegments in class1 through class4 respectively. Their sum is total_num = num_class1 + num_class2 + num_class3 + num_class4.
The first four classes of the five-way clustering, i.e. the first total_num samples, are retained; the total length of these sound subsegments gives the result of the first five-way clustering, time_K-means_five_interval_1:

time\_K\text{-}means\_five\_interval\_1 = \sum_{i=1}^{num\_class1} L_{interval}(i) + \sum_{i=1}^{num\_class2} L_{interval}(i) + \sum_{i=1}^{num\_class3} L_{interval}(i) + \sum_{i=1}^{num\_class4} L_{interval}(i)

where L_interval(i) denotes the length of each sound subsegment in the first four classes of the five-way clustering result.
The purpose of the five-way clustering is to reject non-speech subsegments. Combined with the observed behavior of subsegment crest amplitudes (the class with the lowest crest amplitudes is very likely non-speech), the higher-amplitude part of the classification result should be retained. Because the subsegments were sorted by descending crest amplitude before clustering, the parts of the clustering result are likewise in descending order of crest amplitude, with the lowest-amplitude part last. Experiments show that retaining the first four classes of the clustering works best. Likewise, four-way and three-way clustering retain only the first three and the first two classes respectively.
If time_K-means_five_interval_1 is less than a certain proportion of the total length of all sound subsegments (in this embodiment, preferably 60%), the result of the first five-way clustering is taken as the final clustering result. Unsupervised clustering is somewhat unstable; in our experiments about 95% of the utterances obtained their final clustering result from the first five-way clustering.
3. If time_K-means_five_interval_1 exceeds this proportion of the total subsegment length (preferably 60% in this embodiment), five-way clustering is performed again in the same manner, giving a retained total length time_K-means_five_interval_2; if time_K-means_five_interval_2 is below the 60% proportion, this second five-way clustering is taken as the clustering result. Otherwise the smaller of the two five-way results, denoted time_K-means_five_interval, is taken as the five-way clustering result and processing continues with step 4.
4. In the same manner, cluster the sound subsegments into four classes and retain the first three, of total length time_K-means_four_interval. If (time_K-means_five_interval - time_K-means_four_interval) < time_K-means_five_interval × 0.15, the five-way clustering result of step 3 is used. Otherwise cluster the subsegments into three classes, the first two of which have total subsegment length time_K-means_three_interval; if (time_K-means_four_interval - time_K-means_three_interval) < time_K-means_five_interval × 0.2, the four-way clustering is taken as the clustering result and its first three classes are retained; otherwise the three-way clustering is taken as the clustering result and its first two classes are retained.
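Assuming the crestAmp and segLen vectors from the sketch in step 2-2, the first five-way clustering and its retention rule might look like this in MATLAB (kmeans is in the Statistics Toolbox; the cluster labels are re-ranked by mean amplitude because kmeans assigns them arbitrarily):

% Minimal sketch of the first five-way K-means clustering of step S03.
[amp, order] = sort(crestAmp, 'descend');       % sort by crest amplitude
idx = kmeans(amp, 5);                           % five-way clustering
classMean = accumarray(idx, amp, [5 1], @mean); % mean amplitude per class
[~, byAmp] = sort(classMean, 'descend');        % classes, largest mean first
keep = ismember(idx, byAmp(1:4));               % retain the first four classes
keptSeg = order(keep);                          % retained subsegment indices
time_five_1 = sum(segLen(keptSeg));             % retained total length
if time_five_1 < 0.6 * sum(segLen)
    % accept this five-way clustering as the final clustering result
end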
Step S04, process the sound subsegments retained in step S03 to obtain the voice endpoints.
After the subsegments are sorted by crest amplitude and clustered, their time order changes; therefore, after clustering, all subsegments in the clustering result are re-sorted into chronological order, and the sounds are then connected according to the time interval between adjacent subsegments. The threshold is determined by an ordinary person's speaking rate: taking 200 words per minute as the upper limit, a single word lasts 0.3 s, which bounds the threshold from above. For continuity of the speech, this embodiment sets the threshold to 0.1 s; connecting adjacent subsegments whose interval is below 0.1 s yields the final endpoint information.
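Given start and end sample indices segStart and segEnd of the retained subsegments (an assumed representation; these are straightforward to carry along from step 2-2), step S04 reduces to a single pass in time order. A minimal MATLAB sketch:

% Minimal sketch of step S04: restore time order, then connect
% subsegments whose gap is below the 0.1 s threshold.
fs = 8000; gap = 0.1 * fs;                      % threshold in samples
[segStart, ord] = sort(segStart); segEnd = segEnd(ord);
endpoints = [segStart(1), segEnd(1)];           % first segment
for k = 2:numel(segStart)
    if segStart(k) - endpoints(end, 2) < gap
        endpoints(end, 2) = segEnd(k);          % bridge the small gap
    else
        endpoints(end+1, :) = [segStart(k), segEnd(k)]; %#ok<AGROW>
    end
end
% each row of endpoints is one detected [start, end] voice segment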
The running time of the method depends strongly on the length of the original speech data; the data in the two data sets tested here are each about 3 minutes long. With the waveform crest amplitude as the feature, the measured running times and results are given in Table 1, where the MINE column corresponds to the voice endpoint detection results of the method of the invention. Other features tested were the band amplitude mean, the band area, the crest factor, and the combination of these five features; preferably, the waveform crest amplitude is chosen as the waveform morphological feature. For comparison, the VQVAD method proposed by Tomi Kinnunen and Padmanabhan Rajan in the paper "A Practical, Self-Adaptive Voice Activity Detector for Speaker Verification with Noisy Telephone and Microphone Data", and the energy-based detection of the open-source platform ALIZE, were also run.
The computing platform of the experiment is a PC with a Core i3-2130 3.3 GHz processor and 8 GB of DDR3 memory. Of the three steps, speech enhancement occupies more than 90% of the processing time; when the speech whose endpoints are sought can be determined to be clean, this step can be skipped, in which case the processing time per utterance is within 0.5 s.
As can be seen from the table, the voice endpoint detection method of the invention has a faster processing speed and a reduced error rate.
Using a comparatively simple unsupervised clustering method and a single feature, the invention obtains good results quickly and accurately.

Claims (7)

1. A voice endpoint detection method based on waveform morphological feature clustering, characterized in that it comprises the steps of:
S01, obtaining a clean speech signal from the original speech signal;
S02, obtaining the envelope signal of the clean speech signal, finding the maxima and minima of the envelope signal, and dividing the envelope signal into sound subsegments at the minima, the span between two adjacent minima positions being one sound subsegment and the maximum position within it being the crest of the subsegment;
S03, clustering the sound subsegments according to the waveform morphological feature of each subsegment, removing the non-speech subsegments from the clustering result, and retaining the remainder;
S04, sorting all sound subsegments retained in step S03 into time order and connecting adjacent subsegments whose time interval is below a threshold, obtaining the voice endpoints.
2. The voice endpoint detection method based on waveform morphological feature clustering according to claim 1, characterized in that the step of obtaining the clean speech signal in step S01 is: performing speech enhancement on the original speech signal to obtain a contrast signal, and computing the signal-to-noise ratio (SNR) from the contrast signal and the original speech signal; if the SNR is greater than a set threshold, taking the original speech signal as the clean speech signal; if the SNR is less than the set threshold, taking the contrast signal as the clean speech signal.
3. The voice endpoint detection method based on waveform morphological feature clustering according to claim 2, characterized in that the speech enhancement applied to the original speech signal is one of the following methods: maximum a posteriori estimation, Kalman filtering, comb filtering, Wiener filtering, spectral subtraction, minimum mean-square error estimation of the short-time spectral amplitude, adaptive filtering, hidden Markov model methods, wavelet transform, neural networks, auditory masking and fractal theory.
4. The voice endpoint detection method based on waveform morphological feature clustering according to claim 1, characterized in that the envelope signal of the clean speech signal in step S02 is obtained by IIR filtering, the Hilbert transform or the analytic wavelet transform.
5. The voice endpoint detection method based on waveform morphological feature clustering according to claim 1, characterized in that in step S03 the crest amplitude of each sound subsegment is selected as the waveform morphological feature; the sound subsegments are sorted by descending crest amplitude and then clustered, the last part of the clustering result is removed as non-speech subsegments, and the remainder is retained.
6. The voice endpoint detection method based on waveform morphological feature clustering according to claim 1, characterized in that in step S03 the clustering algorithm for the sound subsegments is one of the following: hierarchical clustering, the K-means algorithm, the K-modes algorithm, fuzzy clustering, graph-theoretic algorithms, grid- and density-based clustering algorithms, and the ACODF algorithm.
7. The voice endpoint detection method based on waveform morphological feature clustering according to claim 1, characterized in that in step S04 the threshold for the interval between adjacent subsegments ranges from 0.08 s to 0.3 s.
CN201310432146.9A 2013-09-22 2013-09-22 Voice endpoint detection method based on waveform morphological feature clustering Active CN103489454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310432146.9A CN103489454B (en) 2013-09-22 2013-09-22 Voice endpoint detection method based on waveform morphological feature clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310432146.9A CN103489454B (en) 2013-09-22 2013-09-22 Voice endpoint detection method based on waveform morphological feature clustering

Publications (2)

Publication Number Publication Date
CN103489454A CN103489454A (en) 2014-01-01
CN103489454B true CN103489454B (en) 2016-01-20

Family

ID=49829633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310432146.9A Active CN103489454B (en) 2013-09-22 2013-09-22 Voice endpoint detection method based on waveform morphological feature clustering

Country Status (1)

Country Link
CN (1) CN103489454B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105091208B (en) * 2014-05-23 2017-11-14 美的集团股份有限公司 Air conditioner wind speed control method and system
US20160111107A1 (en) * 2014-10-21 2016-04-21 Mitsubishi Electric Research Laboratories, Inc. Method for Enhancing Noisy Speech using Features from an Automatic Speech Recognition System
CN104867493B (en) * 2015-04-10 2018-08-03 武汉工程大学 Multifractal Dimension end-point detecting method based on wavelet transformation
CN106971725B (en) * 2016-01-14 2021-06-15 芋头科技(杭州)有限公司 Voiceprint recognition method and system with priority
CN105825871B (en) * 2016-03-16 2019-07-30 大连理工大学 A kind of end-point detecting method without leading mute section of voice
CN107561376A (en) * 2016-06-30 2018-01-09 中兴通讯股份有限公司 A kind of method and device of power supply noise measurement
CN106205624B (en) * 2016-07-15 2019-10-15 河海大学 A kind of method for recognizing sound-groove based on DBSCAN algorithm
CN106611598B (en) * 2016-12-28 2019-08-02 上海智臻智能网络科技股份有限公司 A kind of VAD dynamic parameter adjustment method and device
CN107045870B (en) * 2017-05-23 2020-06-26 南京理工大学 Speech signal endpoint detection method based on characteristic value coding
CN107393558B (en) * 2017-07-14 2020-09-11 深圳永顺智信息科技有限公司 Voice activity detection method and device
CN107799126B (en) * 2017-10-16 2020-10-16 苏州狗尾草智能科技有限公司 Voice endpoint detection method and device based on supervised machine learning
CN108172219B (en) * 2017-11-14 2021-02-26 珠海格力电器股份有限公司 Method and device for recognizing voice
CN108198547B (en) * 2018-01-18 2020-10-23 深圳市北科瑞声科技股份有限公司 Voice endpoint detection method and device, computer equipment and storage medium
CN108257607B (en) * 2018-01-24 2021-05-18 成都创信特电子技术有限公司 Multi-channel voice signal processing method
CN108281154B (en) * 2018-01-24 2021-05-18 成都创信特电子技术有限公司 Noise reduction method for voice signal
CN108133711B (en) * 2018-01-24 2021-05-18 成都创信特电子技术有限公司 Digital signal monitoring device with noise reduction module
CN108962283B (en) * 2018-01-29 2020-11-06 北京猎户星空科技有限公司 Method and device for determining question end mute time and electronic equipment
CN108492347B (en) * 2018-04-11 2022-02-15 广东数相智能科技有限公司 Image generation method, device and computer readable storage medium
CN109410920B (en) * 2018-10-15 2020-08-18 百度在线网络技术(北京)有限公司 Method and device for acquiring information
CN111199741A (en) * 2018-11-20 2020-05-26 阿里巴巴集团控股有限公司 Voiceprint identification method, voiceprint verification method, voiceprint identification device, computing device and medium
CN112001431B (en) * 2020-08-11 2022-06-28 天津大学 Efficient image classification method based on comb convolution
CN112802489A (en) * 2021-04-09 2021-05-14 广州健抿科技有限公司 Automatic call voice adjusting system and method
CN113192507B (en) * 2021-05-13 2022-04-29 北京泽桥传媒科技股份有限公司 Information retrieval method and system based on voice recognition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1758331A (en) * 2005-10-31 2006-04-12 浙江大学 Quick audio-frequency separating method based on tonic frequency
CN102148030A (en) * 2011-03-23 2011-08-10 同济大学 Endpoint detecting method for voice recognition
CN102800322A (en) * 2011-05-27 2012-11-28 中国科学院声学研究所 Method for estimating noise power spectrum and voice activity
CN102971789A (en) * 2010-12-24 2013-03-13 华为技术有限公司 A method and an apparatus for performing a voice activity detection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5991718A (en) * 1998-02-27 1999-11-23 At&T Corp. System and method for noise threshold adaptation for voice activity detection in nonstationary noise environments

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1758331A (en) * 2005-10-31 2006-04-12 浙江大学 Quick audio-frequency separating method based on tonic frequency
CN102971789A (en) * 2010-12-24 2013-03-13 华为技术有限公司 A method and an apparatus for performing a voice activity detection
CN102148030A (en) * 2011-03-23 2011-08-10 同济大学 Endpoint detecting method for voice recognition
CN102800322A (en) * 2011-05-27 2012-11-28 中国科学院声学研究所 Method for estimating noise power spectrum and voice activity

Also Published As

Publication number Publication date
CN103489454A (en) 2014-01-01

Similar Documents

Publication Publication Date Title
CN103489454B (en) Voice endpoint detection method based on waveform morphological feature clustering
Pandey et al. Densely connected neural network with dilated convolutions for real-time speech enhancement in the time domain
Xiao et al. Normalization of the speech modulation spectra for robust speech recognition
CN111292762A (en) Single-channel voice separation method based on deep learning
CN104078039A (en) Voice recognition system of domestic service robot on basis of hidden Markov model
Baby et al. Coupled dictionaries for exemplar-based speech enhancement and automatic speech recognition
CN106328123B (en) Method for recognizing middle ear voice in normal voice stream under condition of small database
Dua et al. Performance evaluation of Hindi speech recognition system using optimized filterbanks
Paliwal et al. Usefulness of phase in speech processing
Roy et al. DeepLPC: A deep learning approach to augmented Kalman filter-based single-channel speech enhancement
Hao et al. Time-domain neural network approach for speech bandwidth extension
He et al. Stress detection using speech spectrograms and sigma-pi neuron units
Hasan et al. Preprocessing of continuous bengali speech for feature extraction
Zouhir et al. Feature Extraction Method for Improving Speech Recognition in Noisy Environments.
Hagen Robust speech recognition based on multi-stream processing
Chu et al. A noise-robust FFT-based auditory spectrum with application in audio classification
Krishnan et al. Features of wavelet packet decomposition and discrete wavelet transform for malayalam speech recognition
Adam et al. Wavelet cesptral coefficients for isolated speech recognition
Chavan et al. Speech recognition in noisy environment, issues and challenges: A review
Gref et al. Improving robust speech recognition for German oral history interviews using multi-condition training
Gerazov et al. Kernel power flow orientation coefficients for noise-robust speech recognition
Ravindran et al. Improving the noise-robustness of mel-frequency cepstral coefficients for speech processing
Prasanna Kumar et al. Single-channel speech separation using empirical mode decomposition and multi pitch information with estimation of number of speakers
MY An improved feature extraction method for Malay vowel recognition based on spectrum delta
Pour et al. Gammatonegram based speaker identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant