CN104810018B - Speech endpoint detection method based on dynamic cumulant estimation


Info

Publication number
CN104810018B
Authority
CN
China
Prior art keywords
sliding window
kurtosis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510222045.8A
Other languages
Chinese (zh)
Other versions
CN104810018A (en)
Inventor
吴小培
吕钊
罗雅琴
张超
周蚌艳
张磊
郭晓静
高湘萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Application filed by Anhui University
Priority to CN201510222045.8A
Publication of CN104810018A
Application granted
Publication of CN104810018B

Abstract

The invention discloses a speech endpoint detection method based on dynamic cumulant estimation, comprising recursive calculation of higher-order cumulants over a sliding window and endpoint detection based on the sliding-window kurtosis. In the recursive calculation, a rectangular window is applied to the raw sample data, the cumulants of the data inside the window are estimated, and the window contents are updated each time the window slides by one sample point, which yields a dynamic estimate of the cumulants. In the endpoint detection, the recursive method is used to compute the sliding-window kurtosis and an energy feature, and the endpoints of the speech signal are detected from these two features. Compared with the prior art, the invention has the following advantages: detection uses a double threshold on sliding-window kurtosis and energy; the sliding-window kurtosis is highly sensitive to the start of a speech segment and resists noise interference well, so the method is robust in noisy environments.

Description

Speech endpoint detection method based on dynamic cumulant estimation
Technical field
The present invention relates to statistical data analysis and signal processing, and in particular to a speech endpoint detection method based on dynamic cumulant estimation.
Background
With the continuing development of human-machine interfaces, speech recognition has become a research focus in artificial intelligence and pattern recognition. Speech is the most important and most convenient way for humans to convey information, and one of the most direct routes to human-machine interaction. Enabling a machine to recognize voice commands accurately and carry out the corresponding operations is of great practical significance, and the related research has broad application prospects in fields such as medicine, the military and industry. As the front-end processing stage of speech recognition, speech endpoint detection aims to separate the voiced segments of a speech signal from the silent segments. Efficient and accurate endpoint detection can greatly reduce the load on a speech recognition system, shorten its response time and improve its robustness. The fourth-order cumulant, kurtosis, is commonly used to measure how non-Gaussian a signal is. In speech signal processing the noise is usually assumed to be approximately Gaussian, so its higher-order cumulants are relatively small (the higher-order cumulants of an ideal Gaussian distribution are zero). Speech processing methods based on higher-order cumulants therefore tend to resist interference better. However, higher-order cumulants such as kurtosis are expensive to compute and their numerical stability is poor, which limits their practical use.
Classical cumulant estimation algorithms are batch algorithms. Their computation and storage requirements are both large, they are unsuitable for online processing of dynamic data, and they are sensitive to outliers in the observed data. To address these problems, online cumulant estimation algorithms were proposed, which effectively improve dynamic estimation performance. Existing online algorithms, however, are built on the entire history of the data, whereas in practice the statistics of the most recent data segment are usually the more informative. Moreover, because the data are non-stationary, early historical data generally correlate only weakly with recent data. Statistical analysis over the entire history therefore not only fails to improve estimation accuracy, but may even mask the true statistics of the data. In addition, when data are acquired in a real environment, large outliers that occur at random can introduce large errors into the statistical results. Because conventional online algorithms depend on all of the signal data, errors caused by outliers propagate strongly.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a speech signal endpoint detection method based on dynamic cumulant estimation.
The present invention is achieved through the following technical solution: a speech signal endpoint detection method based on dynamic cumulant estimation, comprising the following steps:
(1) Recursive estimation of higher-order cumulants over a sliding window: a window is applied to the raw sample data, the cumulants of the data inside the window are estimated, and the window contents are updated each time the window slides by one sample point, yielding a dynamic estimate of the cumulants;
(2) Endpoint detection based on sliding-window kurtosis: using the recursive estimation of step (1), the sliding-window kurtosis and an energy feature are estimated, and the endpoints of the speech signal are detected from the sliding-window kurtosis and the energy feature.
As a further refinement of the above scheme, the sliding-window recursive estimation of higher-order cumulants in step (1) comprises the following steps:
(11) Sliding window: a window is applied to the raw sample data so that the higher-order cumulants of all sample points in the window can be estimated recursively;
(12) Recursive estimation: the recursive formulas for higher-order cumulants derived for the sliding window of step (11) are used to estimate recursively the higher-order cumulants of all sample points in the sliding window.
As a further refinement of the above scheme, the sliding window of step (11) intercepts the raw sample data with a rectangular window; each time the window slides by one sample point the data in the rectangular window are updated, so that the higher-order cumulants of all sample points in the window are estimated recursively.
As a further refinement of the above scheme, the recursive formulas for higher-order cumulants in step (12) are as follows:
Let x_a and x_b be two data sets with n_a and n_b samples respectively. The combined data set x = {x_a, x_b} has sample length n = n_a + n_b, mean μ and k-th order sum of squares S_k;
The sliding-window recursive formulas for the 2nd- to 4th-order cumulants of the data set x = {x_a, x_b} are formulas (6a)-(6c),
where L is the length of the sliding window, and the mean and k-th order sums of squares of the L data points inside the sliding window at time n are μ^(w)(n) and S_k^(w)(n); let n_L = n - L,
where μ(n) and S_k(n) are the mean and k-th order sums of squares of all historical data up to time n, and μ(n_L) and S_k(n_L) are the mean and k-th order sums of squares of all historical data up to time n_L.
As a further refinement of the above scheme, the update-and-estimate procedure for the recursive formulas for higher-order cumulants is as follows:
(51) Allocating the dynamic storage unit
For the data set x = {x_a, x_b} with mean μ and k-th order sum of squares S_k, compute the mean μ and k-th order sums of squares S_k corresponding to sample points x(1) to x(L), and allocate a storage unit of size 4L to hold these values;
where
Len denotes the sample length, x_i a sample point, and μ_x the mean of sample x.
(52) Initial value calculation
Using the sample value x(n) at the current time n together with the mean μ(n-1) and k-th order sums of squares S_k(n-1) of time n-1 held in the storage unit, compute the initial values for the current time n, namely the mean μ(n) and the k-th order sums of squares S_k(n):
S_2(n) = S_2(n-1) + [x(n) - μ(n-1)][x(n) - μ(n)]   (7b)
Following the online cumulant estimation method of formulas (7a)-(7d), obtain the mean μ(n) and k-th order sums of squares S_k(n) at the current time n;
(53) Sliding-window dynamic estimation
(531) Calculation of the sliding-window mean μ^(w)(n) and k-th order sums of squares S_k^(w)(n)
Using the mean μ(n) and k-th order sums of squares S_k(n) of the current time n obtained by online estimation, together with the mean μ(n_L) and k-th order sums of squares S_k(n_L) of time n_L = n - L held in the storage unit, obtain the sliding-window mean μ^(w)(n) and k-th order sums of squares S_k^(w)(n) at the current time n according to the sliding-window dynamic cumulant estimation method of formulas (4a)-(4d);
(532) Calculation of the sliding-window 2nd- to 4th-order cumulants
Using the sliding-window mean μ^(w)(n) and k-th order sums of squares S_k^(w)(n) of the current time n obtained in step (531), obtain the sliding-window cumulants of orders k = 2 to 4 at the current time n according to the sliding-window cumulant recursive formulas (6a)-(6c);
(54) Dynamically updating the storage unit
After the sliding-window mean μ^(w)(n) and k-th order sums of squares S_k^(w)(n) of the current time n have been calculated, and before the sliding window moves, update the contents of the dynamic storage unit with the values μ(n) and S_k(n) obtained by the online estimation of step (52), so that after the window slides by one sample point the dynamic storage unit still holds the means and k-th order sums of squares of the L time instants preceding the current time.
As a further refinement of the above scheme, the control algorithm for the endpoint detection based on sliding-window kurtosis in step (2) is as follows:
(201) Estimate the kurtosis value Kurtosis(n) of the speech signal with the sliding-window recursive algorithm, and record the intermediate value S_2(n), where n = 1:length(x), x is the speech signal to be processed and length(x) is its length;
(202) From the kurtosis values Kurtosis(n) and intermediate values S_2(n) estimated in step (201), set the kurtosis threshold kurt and the energy threshold amp, then go to step (203);
(203) Initialize the parameters: set the minimum number of sample points allowed for a speech segment and the maximum number of sample points allowed for a blank segment, and set the speech-segment sample count Speechcount = 0 and the blank-segment sample count Nonspeechcount = 0; then go to step (204);
(204) Traverse the kurtosis values Kurtosis(n) and judge whether Kurtosis(n) is greater than the kurtosis threshold kurt;
(205) Mark the point at which Kurtosis(n) exceeds the kurtosis threshold kurt as the starting point Start, search onward from Start, and go to step (207);
(206) If Kurtosis(n) is not greater than the kurtosis threshold kurt, advance to the next kurtosis value by setting n = n+1 and return to step (205);
(207) Compare the intermediate value S_2(n) of the next point with the energy threshold amp;
(208) If the intermediate value S_2(n) of the next point is greater than the energy threshold amp, the point lies in a speech segment; increment the speech-segment sample count Speechcount by 1 and return to step (207);
(209) If the intermediate value S_2(n) of the next point is not greater than the energy threshold amp, the point lies in a blank segment; increment the blank-segment sample count Nonspeechcount by 1 and go to step (210);
(210) Judge whether the current Nonspeechcount is greater than the maximum number of sample points allowed for a blank segment, Maxnonspeechcount;
(211) If the current Nonspeechcount is greater than Maxnonspeechcount, judge whether the current Speechcount is greater than the minimum number of sample points allowed for a speech segment, Minspeechcount, and go to step (213);
(212) If the current Speechcount is not greater than Minspeechcount, the point is still in the speech segment; advance to the intermediate value S_2(n) of the next point by setting n = n+1 and return to step (207);
(213) Judge whether the current Speechcount is greater than Minspeechcount;
(214) If the current Speechcount is not greater than Minspeechcount, return to step (203), reset Speechcount and Nonspeechcount to 0, and search for a starting point again;
(215) If the current Speechcount is greater than Minspeechcount, the segment is retained as a speech segment, namely the interval [Start, Start+speechcount+nonspeechcount-1].
Compared with the prior art, the present invention has the following advantages. The speech signal endpoint detection method based on dynamic cumulant estimation is an endpoint detection algorithm with a double threshold on sliding-window kurtosis and energy. The sliding-window kurtosis is highly sensitive to the start of a speech segment and resists noise interference well. In speech signal processing the noise is usually assumed to be approximately Gaussian, so its higher-order cumulants are relatively small (those of an ideal Gaussian distribution are zero); speech processing methods based on higher-order cumulants therefore tend to resist interference better, and the present invention is more robust in noisy environments.
The cumulant values computed by the sliding-window recursive method provided in the invention differ very little from those computed by the conventional direct method. Test results on a speech signal show that the errors between the two algorithms for the 2nd-, 3rd- and 4th-order cumulants are on the order of 10^-15, 10^-10 and 10^-7 respectively, which can be neglected.
In practical applications, because the data are non-stationary, early historical data generally correlate only weakly with recent data, and the statistics of the most recent data segment are usually the more informative; moreover, during data acquisition in a real environment, large outliers that occur at random can introduce large errors into the statistical results. The sliding-window cumulant recursion provided in the invention is based only on the sample data inside the sliding window, which effectively avoids these problems and gives the method strong practical value in real environments. The invention can also be applied to the dynamic analysis of bioelectrical signals (electroencephalogram EEG, electrooculogram EOG, electrocardiogram ECG) and of speech signals.
Brief description of the drawings
Fig. 1 is a schematic diagram of the sliding window of the present invention.
Fig. 2 is a block diagram of the recursive algorithm for sliding-window cumulant estimation of the present invention.
Fig. 3 compares the computation time of the sliding-window recursive calculation of the present invention with that of direct calculation.
Figs. 4-A to 4-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window variance estimate of a speech signal obtained by recursive calculation and by direct calculation, and the corresponding error curve.
Figs. 5-A to 5-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window skewness estimate of a speech signal obtained by recursive calculation and by direct calculation, and the corresponding error curve.
Figs. 6-A to 6-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window kurtosis estimate of a speech signal obtained by recursive calculation and by direct calculation, and the corresponding error curve.
Fig. 7 is a flowchart of the control algorithm for endpoint detection based on sliding-window kurtosis.
Figs. 8-A to 8-C show the waveform of a measured speech signal in a preferred embodiment of the present invention and the corresponding kurtosis and energy curves.
Figs. 9-A to 9-F-1 show isolated-word speech under different noise environments (SNR = 5 dB) and the corresponding kurtosis curves.
Figs. 10-A to 10-F-1 show continuous speech under different noise environments (SNR = 5 dB) and the corresponding kurtosis curves.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below. The embodiments are implemented on the basis of the technical solution of the invention, and detailed implementations and specific operating procedures are given, but the scope of protection of the invention is not limited to the following embodiments.
Fig. 1 is a schematic diagram of the sliding window of the present invention. In this embodiment the sliding window intercepts the raw sample data with a rectangular window, so that a block of data of the same length is available for processing at every time instant. The cumulants are estimated from the L sample points inside the sliding window; each time the window slides by one sample point the data in the window are updated and the cumulants are estimated again, which yields a dynamic estimate of the cumulants.
For any two data sets x_a and x_b with n_a and n_b samples respectively, the combined data set x = {x_a, x_b} has sample length n = n_a + n_b, and its mean μ and k-th order sums of squares S_k are given by formulas (1a)-(1d),
where μ and S_k denote the mean and k-th order sum of squares of sample x, and the superscripts (a) and (b) indicate that a statistic is computed on the data sample sets x_a and x_b respectively. μ and S_k are defined by formulas (2a)-(2b),
where Len denotes the sample length, x_i a sample point, and μ_x the mean of sample x. For convenience, S_k is referred to below as the k-th order sum of squares.
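Formulas (1a)-(1d) and (2a)-(2b) are referenced here without their bodies. As a point of orientation only, the definitions below are the ones implied by the symbol descriptions and by formula (7b), and the combination rules are the widely used pairwise update formulas for centered power sums. That they coincide term for term with the patent's own formulas is an assumption, and δ is an auxiliary symbol introduced only for this sketch.

```latex
\begin{aligned}
\mu &= \frac{1}{Len}\sum_{i=1}^{Len} x_i, \qquad
S_k = \sum_{i=1}^{Len} (x_i - \mu_x)^k, \quad k = 2, 3, 4 \\[4pt]
\mu &= \frac{n_a\,\mu^{(a)} + n_b\,\mu^{(b)}}{n}, \qquad
\delta = \mu^{(b)} - \mu^{(a)}, \qquad n = n_a + n_b \\[4pt]
S_2 &= S_2^{(a)} + S_2^{(b)} + \delta^2\,\frac{n_a n_b}{n} \\[4pt]
S_3 &= S_3^{(a)} + S_3^{(b)} + \delta^3\,\frac{n_a n_b (n_a - n_b)}{n^2}
      + 3\delta\,\frac{n_a S_2^{(b)} - n_b S_2^{(a)}}{n} \\[4pt]
S_4 &= S_4^{(a)} + S_4^{(b)}
      + \delta^4\,\frac{n_a n_b\,(n_a^2 - n_a n_b + n_b^2)}{n^3}
      + 6\delta^2\,\frac{n_a^2 S_2^{(b)} + n_b^2 S_2^{(a)}}{n^2}
      + 4\delta\,\frac{n_a S_3^{(b)} - n_b S_3^{(a)}}{n}
\end{aligned}
```

Solving these relations for the statistics of x_b, given those of x and x_a, is the step that a sliding-window estimate needs.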
Following the sliding-window idea, a window is applied to the raw sample data and the cumulants are estimated from the sample data inside the window. With reference to formulas (1a)-(1d), sliding-window estimation can be regarded as cumulant estimation on the sample set x_b. Formulas (1a)-(1d) are accordingly rewritten as formulas (3a)-(3d).
Let the sliding window have length L, and let the mean and k-th order sums of squares of the L data points in the sliding window at time n be μ^(w)(n) and S_k^(w)(n). For convenience, let n_L = n - L. The data inside the sliding window then correspond to the data sample set x_b, and all historical data before the sliding window correspond to the data sample set x_a, so formulas (3a)-(3d) are rewritten as formulas (4a)-(4d),
where μ(n) and S_k(n) are the mean and k-th order sums of squares of all historical data up to time n, and μ(n_L) and S_k(n_L) are those of all historical data up to time n_L.
μ and S_k are related to the 2nd- to 4th-order cumulants by formulas (5a)-(5c),
where σ^2, C_3 and C_4 denote the variance, skewness and kurtosis of the sample respectively.
From the relations between the 2nd- to 4th-order cumulants and S_k given by formulas (5a)-(5c), the recursive formulas (6a)-(6c) for the sliding-window cumulants are obtained,
which give the 2nd- to 4th-order cumulants based on the sliding window.
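The bodies of formulas (5a)-(5c) are not shown in this text. The sketch below illustrates one common way to turn the sums S_2, S_3 and S_4 into variance, skewness and kurtosis. The normalization by n (rather than n-1), the standardized skewness and the excess-kurtosis convention (subtracting 3 so that a Gaussian gives zero) are assumptions, and `cumulants_from_sums` is a hypothetical name.

```python
def cumulants_from_sums(n, s2, s3, s4):
    """Variance, skewness and excess kurtosis from centered power sums.

    Assumed conventions (not confirmed by the source text):
    population normalization by n, and kurtosis reported as excess kurtosis.
    """
    variance = s2 / n
    skewness = (s3 / n) / variance ** 1.5
    kurtosis = (s4 / n) / variance ** 2 - 3.0
    return variance, skewness, kurtosis


# Example: the sample [1, 2, 3, 4, 5] has mean 3, S2 = 10, S3 = 0, S4 = 34.
variance, skewness, kurtosis = cumulants_from_sums(5, 10.0, 0.0, 34.0)
```

With the window statistics substituted for the whole-sample ones, the same function would give the sliding-window quantities.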
Fig. 2 is a block diagram of the recursive algorithm for sliding-window cumulant estimation of the present invention. In this embodiment the recursive algorithm is carried out in the following steps:
Step 1: using the direct calculation of formulas (2a)-(2b), compute the mean μ and k-th order sums of squares S_k for the first through L-th sample points, that is, the sample data in the window at its starting position, and store them.
Step 2, initial value calculation: the sliding window slides by one sample point.
Using the sample value x(n) at the current time n together with the mean μ(n-1) and k-th order sums of squares S_k(n-1) of time n-1 held in the storage unit, compute the initial values for the current time n, namely the mean μ(n) and the k-th order sums of squares S_k(n):
S_2(n) = S_2(n-1) + [x(n) - μ(n-1)][x(n) - μ(n)]   (7b)
Following the online cumulant estimation method of formulas (7a)-(7d), obtain the mean μ(n) and k-th order sums of squares S_k(n) at the current time n.
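The online update of Step 2 can be sketched for the mean and S_2 as follows. Formula (7b) is taken from the text; the mean update used for (7a) is the usual running-mean recursion and is an assumption, as are the function names. Formulas (7c)-(7d) for S_3 and S_4 are not reproduced in the text and are left out of this sketch.

```python
def online_update(n, mean_prev, s2_prev, x_n):
    """One online step: statistics of x(1..n) from those of x(1..n-1)."""
    # Assumed form of (7a): running-mean recursion.
    mean_n = mean_prev + (x_n - mean_prev) / n
    # Formula (7b) as given in the text.
    s2_n = s2_prev + (x_n - mean_prev) * (x_n - mean_n)
    return mean_n, s2_n


def running_mean_s2(samples):
    """Apply the update sample by sample, starting from an empty history."""
    mean, s2 = 0.0, 0.0
    for n, x_n in enumerate(samples, start=1):
        mean, s2 = online_update(n, mean, s2, x_n)
    return mean, s2


mean, s2 = running_mean_s2([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Each step costs a fixed number of operations, independent of how many samples have been seen.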
Step 3, sliding-window dynamic estimation: following the sliding-window cumulant recursion of formulas (4a)-(4d), use the mean μ(n) and k-th order sums of squares S_k(n) of the current time n obtained in step 2, together with the stored mean μ(n_L) and k-th order sums of squares S_k(n_L) of time n_L = n - L, to obtain the sliding-window mean μ^(w)(n) and sliding-window k-th order sums of squares S_k^(w)(n) at the current time n. The values μ(n) and S_k(n) are saved and the data in the storage unit are updated, so that after the window slides by one sample point the dynamic storage unit still holds the means and k-th order sums of squares of the L time instants preceding the current time. The 2nd- to 4th-order cumulants (k = 2 to 4) are then obtained from formulas (6a)-(6c).
Step 4, updating the storage unit: the mean μ(n) and k-th order sums of squares S_k(n) of the current time n computed in the online estimation module are used to update the contents of the storage unit, so that after the window slides by one sample point the storage unit still holds the means and k-th order sums of squares for the L sample points preceding the current time.
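Steps 1 to 4 can be sketched in Python as below. This is a minimal illustration, not the patent's own implementation: the single-sample update and the "remove the history x_a from the full set x" step use the standard pairwise formulas for centered power sums, which are assumed to correspond to formulas (7a)-(7d) and (4a)-(4d). The names `push`, `remove_prefix` and `sliding_moments` are hypothetical, and the buffer stores one tuple per time instant instead of a flat 4L array.

```python
from collections import deque


def push(stats, x):
    """Add one sample to the running tuple (n, mean, S2, S3, S4)."""
    n1, mean, s2, s3, s4 = stats
    n = n1 + 1
    delta = x - mean
    delta_n = delta / n
    delta_n2 = delta_n * delta_n
    term1 = delta * delta_n * n1
    # Higher orders first, because they use the old S2 and S3.
    s4 += term1 * delta_n2 * (n * n - 3 * n + 3) + 6 * delta_n2 * s2 - 4 * delta_n * s3
    s3 += term1 * delta_n * (n - 2) - 3 * delta_n * s2
    s2 += term1
    return (n, mean + delta_n, s2, s3, s4)


def remove_prefix(total, prefix):
    """Statistics of x_b, given those of x = {x_a, x_b} and of x_a."""
    n, mu, s2, s3, s4 = total
    na, mua, s2a, s3a, s4a = prefix
    if na == 0:
        return total  # no history before the window yet
    nb = n - na
    mub = (n * mu - na * mua) / nb
    d = mub - mua
    s2b = s2 - s2a - d * d * na * nb / n
    s3b = (s3 - s3a - d ** 3 * na * nb * (na - nb) / n ** 2
           - 3 * d * (na * s2b - nb * s2a) / n)
    s4b = (s4 - s4a
           - d ** 4 * na * nb * (na * na - na * nb + nb * nb) / n ** 3
           - 6 * d * d * (na * na * s2b + nb * nb * s2a) / n ** 2
           - 4 * d * (na * s3b - nb * s3a) / n)
    return (nb, mub, s2b, s3b, s4b)


def sliding_moments(x, L):
    """Window statistics (L, mean, S2, S3, S4) for every full window of length L."""
    empty = (0, 0.0, 0.0, 0.0, 0.0)
    # Holds the running statistics at times n-L .. n (the storage unit).
    history = deque([empty], maxlen=L + 1)
    total = empty
    out = []
    for sample in x:
        total = push(total, sample)      # online estimate at time n
        history.append(total)            # update storage, dropping time n-L-1
        if total[0] >= L:
            out.append(remove_prefix(total, history[0]))  # history[0] is time n-L
    return out
```

Because the window statistics are obtained as a difference between two running totals, rounding error can grow on very long signals; the error figures quoted in this document refer to the authors' own tests, not to this sketch.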
The sliding-window cumulant recursion of the present invention is computationally efficient. Because the cumulants are obtained dynamically by a recursive algorithm, the amount of computation is greatly reduced and the efficiency is markedly better than that of conventional direct calculation. Table 1 gives the numbers of additions and multiplications needed to estimate μ^(w)(n) and S_k^(w)(n) by recursive calculation based on formulas (5a)-(5c) and by direct calculation based on formulas (2a)-(2b).
Table 1. Comparison of computational complexity
Table 1 shows that the number of operations for direct calculation is proportional to L, whereas that for recursive calculation is independent of the sliding-window length. When measured data are analyzed statistically L is usually large, so the recursive algorithm has a clear speed advantage over direct calculation.
Fig. 3 compares the computation time of the sliding-window recursive calculation of the present invention with that of direct calculation. The figure shows that, for the same sample data, recursive calculation takes far less time than direct calculation. The computation time of direct calculation rises steeply as the amount of sample data grows, lying in the range of 1 to 10 and approaching 40 seconds when the number of sample points is 100000, whereas the computation time of the recursive algorithm rises only within the range of 10^-2 to 10^-1 and is far smaller than that of direct calculation.
Figs. 4-A to 4-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window variance estimate of a speech signal obtained by recursive calculation and by direct calculation, and the corresponding error curve, plotted against amplitude on the vertical axis. Fig. 4-A is the waveform of a measured speech signal, Fig. 4-B is the waveform of the variance obtained by recursive calculation, Fig. 4-C is the waveform of the variance obtained by direct calculation, and Fig. 4-D is the waveform of the error between the two methods. The figures show that the error between the two is on the order of 10^-15 and can be neglected.
Figs. 5-A to 5-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window skewness estimate of a speech signal obtained by recursive calculation and by direct calculation, and the corresponding error curve, plotted against amplitude on the vertical axis. Fig. 5-A is the waveform of a measured speech signal, Fig. 5-B is the waveform of the skewness obtained by recursive calculation, Fig. 5-C is the waveform of the skewness obtained by direct calculation, and Fig. 5-D is the waveform of the error between the two methods. The figures show that the error between the two is on the order of 10^-10 and can be neglected.
Figs. 6-A to 6-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window kurtosis estimate of a speech signal obtained by recursive calculation and by direct calculation, and the corresponding error curve, plotted against amplitude on the vertical axis. Fig. 6-A is the waveform of a measured speech signal, Fig. 6-B is the waveform of the kurtosis obtained by recursive calculation, Fig. 6-C is the waveform of the kurtosis obtained by direct calculation, and Fig. 6-D is the waveform of the error between the two methods. The figures show that the error between the two is on the order of 10^-7 and can be neglected.
Fig. 7 is a flowchart of the control algorithm for endpoint detection based on sliding-window kurtosis in a preferred embodiment of the present invention. Formula (4d) shows that the intermediate value S_2 produced during the recursive calculation of the sliding-window kurtosis can serve as the energy feature of the signal in the sliding window, so the energy feature needed for detection requires no extra computation. The parameters shown in the figure are defined as follows: Speechcount is the number of speech-segment sample points; Nonspeechcount is the number of blank-segment sample points; Minspeechcount is the minimum number of sample points allowed for a speech segment; Maxnonspeechcount is the maximum number of sample points allowed for a blank segment. Minspeechcount and Maxnonspeechcount are set to 256 and 1024 respectively. For a given speech signal the kurtosis threshold is set empirically; in the experiments the kurtosis threshold kurt is calculated with the formula threshold = max(Kurtosis)/10.
The control algolithm of the end-point detection based on sliding window kurtosis of the present invention, comprises the following steps:
(201) the kurtosis value Kurtosis (n) of sliding window recursive algorithm estimated speech signal is used, and records median S2 (n);
Wherein n=1:Length (x), x are pending voice signals, and length (x) is the length of pending voice signal Degree;
(202) the kurtosis value Kurtosis (n) and median S obtained according to step (201) estimation2(n) kurtosis door is set Kurt and energy threshold amp is limited, into step (203);
(203) assignment initiation parameter, afterwards into step (204)
Voice segments sample points Speechcount=0;Clear band sample points Nonspeechcount=0;
The smallest sample points Minspeechcount that voice segments are allowed and the maximum sample points that clear band is allowed Maxnonspeechcount is respectively set to 256 and 1024;
(204) kurtosis value Kurtosis (n) is traveled through, judges whether kurtosis value Kurtosis (n) is more than kurtosis thresholding kurt;
(205) If the kurtosis value Kurtosis(n) exceeds the kurtosis threshold kurt, mark the corresponding point as the starting point Start, search forward from Start, and go to step (207);
(206) If the kurtosis value Kurtosis(n) does not exceed the kurtosis threshold kurt, advance the index, n = n + 1, and return to step (205);
(207) Compare the intermediate value S2(n) of the next point with the energy threshold amp;
(208) If the intermediate value S2(n) of the next point exceeds the energy threshold amp, the point lies in a speech segment: increment the speech-segment sample count Speechcount by 1 and return to step (207);
(209) If the intermediate value S2(n) of the next point does not exceed the energy threshold amp, the point lies in a blank (non-speech) segment: increment the blank-segment sample count Nonspeechcount by 1 and go to step (210);
(210) Judge whether the current Nonspeechcount exceeds Maxnonspeechcount, the maximum number of samples allowed in a blank segment;
(211) If the current Nonspeechcount exceeds Maxnonspeechcount, judge whether the current speech-segment sample count Speechcount exceeds Minspeechcount, the minimum number of samples allowed in a speech segment, and go to step (213);
(212) If the current speech-segment sample count Speechcount does not exceed Minspeechcount, the point is regarded as still in the speech segment: advance to the intermediate value S2(n) of the next point, n = n + 1, and return to step (207);
(213) Judge whether the current speech-segment sample count Speechcount exceeds Minspeechcount;
(214) If the current Speechcount does not exceed Minspeechcount, return to step (203), reset Speechcount and Nonspeechcount to 0, and search for a starting point again;
(215) If the current Speechcount exceeds Minspeechcount, the speech segment is confirmed, and its interval is [Start, Start + Speechcount + Nonspeechcount - 1].
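The control flow of steps (203) to (215) can be sketched as a small state machine. This is a minimal illustration, not the patented implementation: the function name `detect_segments` and its parameter names are hypothetical, `energy` stands for the recorded intermediate value S2(n), and two behaviours the text leaves open are assumptions of the sketch (the search resumes after the span just examined, and reaching the end of the signal closes the current candidate segment). The defaults 256 and 1024 are the values the claims give for Minspeechcount and Maxnonspeechcount.

```python
def detect_segments(kurtosis, energy, kurt_thr, amp_thr,
                    min_speech=256, max_nonspeech=1024):
    """Return a list of (start, end) sample indices, end inclusive."""
    segments = []
    total = len(kurtosis)
    n = 0
    while n < total:
        # Steps (204)-(206): advance until the kurtosis exceeds its threshold.
        if kurtosis[n] <= kurt_thr:
            n += 1
            continue
        start = n                  # step (205): candidate starting point
        speech = nonspeech = 0     # step (203): reset both counters
        m = start
        # Steps (207)-(212): classify the following points by energy until
        # the blank-segment count exceeds its limit (or the signal ends).
        while m < total and nonspeech <= max_nonspeech:
            if energy[m] > amp_thr:
                speech += 1        # step (208)
            else:
                nonspeech += 1     # step (209)
            m += 1
        # Steps (213)-(215): keep the candidate only if it has enough speech.
        if speech > min_speech:
            segments.append((start, start + speech + nonspeech - 1))
        n = m                      # assumption: resume after the examined span
    return segments


# Synthetic check: one kurtosis spike followed by 1000 high-energy samples.
kurtosis = [0.0] * 5000
energy = [0.0] * 5000
kurtosis[1000] = 10.0
for i in range(1000, 2000):
    energy[i] = 1.0
segments = detect_segments(kurtosis, energy, kurt_thr=5.0, amp_thr=0.5,
                           max_nonspeech=100)
```

Note that, as in the step list, the reported interval includes the trailing blank samples counted before the segment was closed.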
Referring to Figs. 8-A to 8-C, which show the waveform of a measured speech signal in a preferred embodiment of the present invention together with the corresponding kurtosis and energy curves. Fig. 8-A is the waveform of the original measured speech signal. Fig. 8-B is the waveform of the kurtosis values corresponding to the measured speech; it shows that the kurtosis value changes markedly on entering a speech segment, so it can serve as the criterion for detecting the starting point of a speech segment. Fig. 8-C shows the energy variation corresponding to the measured speech; the energy waveform reflects the overall change in energy between speech and non-speech segments. Intuitively, combining the two helps improve the detection accuracy for speech segments.
Referring to Figs. 9-A to 9-F-1, which show an isolated-word utterance under different noise conditions (SNR = 5 dB) and the corresponding kurtosis. This example shows the signal waveforms of an isolated-word utterance, clean and with white, pink, m109, f16, and babble noise added at 5 dB, together with the corresponding sliding-window kurtosis curves, with amplitude on the vertical axis. Fig. 9-A is the waveform of the original isolated-word speech signal, and Fig. 9-A-1 is the waveform of the kurtosis values of that original speech. Fig. 9-B is the waveform of the utterance with white noise added, and Fig. 9-B-1 is the kurtosis waveform of the utterance with 5 dB white noise. Fig. 9-C is the waveform of the utterance with pink noise added, and Fig. 9-C-1 is the kurtosis waveform of the utterance with 5 dB pink noise. Fig. 9-D is the waveform of the utterance with m109 noise added, and Fig. 9-D-1 is the kurtosis waveform of the utterance with 5 dB m109 noise. Fig. 9-E is the waveform of the utterance with f16 noise added, and Fig. 9-E-1 is the kurtosis waveform of the utterance with 5 dB f16 noise. Fig. 9-F is the waveform of the utterance with babble noise added, and Fig. 9-F-1 is the kurtosis waveform of the utterance with 5 dB babble noise. It can be seen that, for isolated-word speech signals mixed with different noises, the sliding-window kurtosis has a waveform similar to that of the clean speech signal, and the kurtosis value changes markedly on entering a speech segment.
Referring to Figs. 10-A to 10-F-1, which show continuous speech under different noise conditions (SNR = 5 dB) and the corresponding kurtosis variation. This example shows the signal waveforms of a passage of continuous speech, clean and with white, pink, m109, f16, and babble noise added at 5 dB, together with the corresponding sliding-window kurtosis curves, with amplitude on the vertical axis. Fig. 10-A is the waveform of the original continuous speech signal, and Fig. 10-A-1 is the waveform of the kurtosis values of that original speech. Fig. 10-B is the waveform of the continuous speech with white noise added, and Fig. 10-B-1 is the kurtosis waveform of the speech with 5 dB white noise. Fig. 10-C is the waveform of the continuous speech with pink noise added, and Fig. 10-C-1 is the kurtosis waveform of the speech with 5 dB pink noise. Fig. 10-D is the waveform of the continuous speech with m109 noise added, and Fig. 10-D-1 is the kurtosis waveform of the speech with 5 dB m109 noise. Fig. 10-E is the waveform of the continuous speech with f16 noise added, and Fig. 10-E-1 is the kurtosis waveform of the speech with 5 dB f16 noise. Fig. 10-F is the waveform of the continuous speech with babble noise added, and Fig. 10-F-1 is the kurtosis waveform of the speech with 5 dB babble noise. It can be seen that, for continuous speech signals mixed with different noises, the sliding-window kurtosis has a waveform similar to that of the clean speech signal, and the kurtosis value changes markedly on entering a speech segment.
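The text does not say how the 5 dB noisy signals were produced. One common way, shown here purely as an assumption, is to scale the noise so that the ratio of speech power to noise power equals the target SNR, using SNR(dB) = 10·log10(P_speech / P_noise). The helper names `power` and `mix_at_snr` are hypothetical, and a sine wave stands in for the speech signal.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    scaled = [gain * v for v in noise]
    noisy = [s + v for s, v in zip(speech, scaled)]
    return noisy, scaled


rng = random.Random(0)
speech = [math.sin(0.05 * i) for i in range(4000)]   # stand-in for a speech signal
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]   # stand-in for white noise
noisy, scaled = mix_at_snr(speech, noise, 5.0)
achieved_snr = 10 * math.log10(power(speech) / power(scaled))
```

Coloured or recorded noises (pink, m109, f16, babble) would be mixed the same way once a noise recording of matching length is available.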
The speech endpoint detection method based on dynamic cumulant estimation in the present invention is an endpoint detection algorithm based on the dual thresholds of sliding-window kurtosis and energy. The sliding-window kurtosis parameter is highly sensitive to the starting point of a speech segment and resists noise well. In speech signal processing, noise is usually assumed to be approximately Gaussian; its higher-order cumulants are then relatively small, and those of an ideal Gaussian distribution are zero. Signal processing methods based on higher-order cumulants therefore tend to have better anti-interference performance, and the present invention is more robust in noisy environments.
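The Gaussian argument can be checked numerically. The sketch below estimates the normalized fourth-order cumulant (excess kurtosis) of Gaussian samples and of Laplacian samples; the Laplacian is used only as an assumed stand-in for a heavy-tailed, speech-like amplitude distribution, and the function name `excess_kurtosis` is hypothetical.

```python
import random


def excess_kurtosis(x):
    """n * S4 / S2**2 - 3, with S_k the k-th central power sum of x."""
    n = len(x)
    mu = sum(x) / n
    s2 = sum((v - mu) ** 2 for v in x)
    s4 = sum((v - mu) ** 4 for v in x)
    return n * s4 / (s2 * s2) - 3.0


rng = random.Random(1)
gauss = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# The difference of two unit-rate exponentials follows a Laplace(0, 1) distribution.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)]
k_gauss = excess_kurtosis(gauss)      # close to 0 for Gaussian data
k_laplace = excess_kurtosis(laplace)  # theoretical value for Laplace is 3
```

Because the Gaussian value stays near zero while the heavy-tailed value does not, a kurtosis threshold separates the two cases even when their energies are comparable.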
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A speech endpoint detection method based on dynamic cumulant estimation, characterized by comprising the following steps:
(1) Sliding-window recursive estimation of higher-order cumulants: apply a window to the raw sample data, estimate the cumulants of the data in the window, and update the data in the window each time it slides by one sample point, thereby achieving dynamic estimation of the cumulants;
(2) Endpoint detection based on sliding-window kurtosis: using the sliding-window recursive estimation of higher-order cumulants of step (1), estimate the sliding-window kurtosis and the energy feature, and perform endpoint detection of the speech signal based on the sliding-window kurtosis and the energy feature.
2. The speech endpoint detection method based on dynamic cumulant estimation according to claim 1, characterized in that the sliding-window recursive estimation of higher-order cumulants of step (1) comprises the following steps:
(11) Sliding window: apply a window to the raw sample data so that the higher-order cumulants of all sample points in the window can be estimated recursively;
(12) Recursive estimation: using the higher-order cumulant recursive formulas derived for the sliding window of step (11), perform recursive estimation of the higher-order cumulants of all sample points in the sliding window.
3. The speech endpoint detection method based on dynamic cumulant estimation according to claim 2, characterized in that the sliding window of step (11) intercepts the raw sample data with a rectangular window and updates the data in the rectangular window each time it slides by one sample point, thereby achieving recursive estimation of the higher-order cumulants of all sample points in the window.
4. The speech endpoint detection method based on dynamic cumulant estimation according to claim 2, characterized in that the higher-order cumulant recursive formulas of step (12) are as follows:
Let data sets $x_a$ and $x_b$ contain $n_a$ and $n_b$ samples respectively. The combined data set $x=\{x_a,x_b\}$ has sample length $n=n_a+n_b$, mean $\mu$, and $k$-th order power sum $S_k$.
The sliding-window recursive formulas for the 2nd- to 4th-order cumulants $C_k^{(w)}(n)$ of the data set $x=\{x_a,x_b\}$ are as follows:
$$C_2^{(w)}(n)=\frac{S_2^{(w)}(n)}{L} \tag{6a}$$
$$C_3^{(w)}(n)=\frac{\sqrt{L}\,S_3^{(w)}(n)}{\left(\sqrt{S_2^{(w)}(n)}\right)^{3}} \tag{6b}$$
$$C_4^{(w)}(n)=\frac{L\,S_4^{(w)}(n)}{\left[S_2^{(w)}(n)\right]^{2}}-3 \tag{6c}$$
where the sliding window has length $L$; at time $n$ the $L$ data points in the sliding window have mean $\mu^{(w)}(n)$ and $k$-th order power sum $S_k^{(w)}(n)$; and, with $n_L=n-L$,
$$\mu^{(w)}(n)=\frac{n}{L}\left[\mu(n)-\mu(n_L)\right]+\mu(n_L) \tag{4a}$$
$$S_2^{(w)}(n)=S_2(n)-S_2(n_L)-\frac{n_L L}{n}\left[\mu^{(w)}(n)-\mu(n_L)\right]^{2} \tag{4b}$$
$$\begin{aligned} S_3^{(w)}(n)={}&S_3(n)-S_3(n_L)-L\,n_L\,(n-2L)\,\frac{\left[\mu^{(w)}(n)-\mu(n_L)\right]^{3}}{n^{2}}\\ &-3\left[n_L S_2^{(w)}(n)-L\,S_2(n_L)\right]\frac{\mu^{(w)}(n)-\mu(n_L)}{n} \end{aligned} \tag{4c}$$
$$\begin{aligned} S_4^{(w)}(n)={}&S_4(n)-S_4(n_L)-L\,n_L\left(n^{2}-3nL+3L^{2}\right)\frac{\left[\mu^{(w)}(n)-\mu(n_L)\right]^{4}}{n^{3}}\\ &-6\left[L^{2}S_2(n_L)+n_L^{2}S_2^{(w)}(n)\right]\frac{\left[\mu^{(w)}(n)-\mu(n_L)\right]^{2}}{n^{2}}\\ &-4\left[n_L S_3^{(w)}(n)-L\,S_3(n_L)\right]\frac{\mu^{(w)}(n)-\mu(n_L)}{n} \end{aligned} \tag{4d}$$
where $\mu(n)$ and $S_k(n)$ are respectively the mean and the $k$-th order power sum of all historical data up to time $n$, and $\mu(n_L)$ and $S_k(n_L)$ are respectively the mean and the $k$-th order power sum of all historical data up to time $n_L$.
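As an illustration of formulas (4a) to (4d) and (6a) to (6c), the sketch below recovers the window statistics from the cumulative statistics at times n and n_L = n - L and compares them with a direct computation over the window. The function names `central_sums`, `window_stats`, and `cumulants` are hypothetical, and the cumulative states are computed directly here only to keep the example short. One point is an assumption of the sketch: the S_2 cross term of the fourth-order line is divided by n², the form that reduces to the one-sample update (7d) when L = 1.

```python
def central_sums(x):
    """Mean and central power sums S2, S3, S4 computed directly from a list."""
    n = len(x)
    mu = sum(x) / n
    return (mu,) + tuple(sum((v - mu) ** k for v in x) for k in (2, 3, 4))


def window_stats(n, L, cum_n, cum_nl):
    """Window mean and S2..S4 from cumulative stats at times n and n_L = n - L."""
    nl = n - L
    mu, s2, s3, s4 = cum_n
    mu_l, s2_l, s3_l, s4_l = cum_nl
    mu_w = n / L * (mu - mu_l) + mu_l                              # (4a)
    d = mu_w - mu_l
    s2_w = s2 - s2_l - nl * L / n * d ** 2                         # (4b)
    s3_w = (s3 - s3_l - L * nl * (n - 2 * L) * d ** 3 / n ** 2
            - 3 * (nl * s2_w - L * s2_l) * d / n)                  # (4c)
    s4_w = (s4 - s4_l
            - L * nl * (n * n - 3 * n * L + 3 * L * L) * d ** 4 / n ** 3
            - 6 * (L * L * s2_l + nl * nl * s2_w) * d ** 2 / n ** 2
            - 4 * (nl * s3_w - L * s3_l) * d / n)                  # (4d)
    return mu_w, s2_w, s3_w, s4_w


def cumulants(L, s2_w, s3_w, s4_w):
    """Window cumulants of order 2 to 4, following (6a)-(6c)."""
    c2 = s2_w / L
    c3 = L ** 0.5 * s3_w / s2_w ** 1.5
    c4 = L * s4_w / s2_w ** 2 - 3
    return c2, c3, c4


x = [((7 * i) % 11) / 3.0 for i in range(50)]      # arbitrary test data
n, L = 40, 10
recursive = window_stats(n, L, central_sums(x[:n]), central_sums(x[:n - L]))
direct = central_sums(x[n - L:n])                  # same window, computed directly
c2, c3, c4 = cumulants(L, *recursive[1:])
```

The fourth-order value `c4` is the sliding-window kurtosis that the endpoint detector thresholds.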
5. The speech endpoint detection method based on dynamic cumulant estimation according to claim 4, characterized in that the update procedure for the higher-order cumulant recursive formulas is as follows:
(51) Allocating a dynamic storage unit:
For the data set $x=\{x_a,x_b\}$ with mean $\mu$ and $k$-th order power sum $S_k$, compute the mean $\mu$ and the $k$-th order power sums $S_k$ corresponding to sample points $x(1)$ to $x(L)$, and allocate a storage unit of size $4L$ to hold these values;
where
$$S_k=\sum_{i=1}^{Len}\left(x_i-\mu_x\right)^{k} \tag{2b}$$
$Len$ denotes the sample length, $x_i$ denotes a sample point in the sample, and $\mu_x$ denotes the mean of the sample $x$;
(52) Initial value calculation:
Using the sample value $x(n)$ at the current time $n$ and the mean $\mu(n-1)$ and $k$-th order power sums $S_k(n-1)$ for time $n-1$ held in the storage unit, compute the mean $\mu(n)$ and the $k$-th order power sums $S_k(n)$ for the current time $n$:
$$\mu(n)=\mu(n-1)+\frac{1}{n}\left[x(n)-\mu(n-1)\right] \tag{7a}$$
$$S_2(n)=S_2(n-1)+\left[x(n)-\mu(n-1)\right]\left[x(n)-\mu(n)\right] \tag{7b}$$
$$S_3(n)=S_3(n-1)+\frac{3\left[\mu(n-1)-x(n)\right]S_2(n-1)}{n}+\frac{\left[x(n)-\mu(n-1)\right]^{3}(n-1)(n-2)}{n^{2}} \tag{7c}$$
$$\begin{aligned} S_4(n)={}&S_4(n-1)+\frac{\left[x(n)-\mu(n-1)\right]^{4}(n-1)\left(n^{2}-3n+3\right)}{n^{3}}\\ &+\frac{6\left[x(n)-\mu(n-1)\right]^{2}S_2(n-1)}{n^{2}}-\frac{4\left[x(n)-\mu(n-1)\right]S_3(n-1)}{n} \end{aligned} \tag{7d}$$
Using the online cumulant estimation of formulas (7a) to (7d), obtain the mean $\mu(n)$ and the $k$-th order power sums $S_k(n)$ for the current time $n$;
(53) Sliding-window dynamic estimation:
(531) Calculation of the sliding-window mean $\mu^{(w)}(n)$ and $k$-th order power sums $S_k^{(w)}(n)$:
Using the mean $\mu(n)$ and $k$-th order power sums $S_k(n)$ for the current time $n$ obtained by the online estimation, and the mean $\mu(n_L)$ and $k$-th order power sums $S_k(n_L)$ for time $n_L=n-L$ held in the storage unit, compute the sliding-window mean $\mu^{(w)}(n)$ and $k$-th order power sums $S_k^{(w)}(n)$ for the current time $n$ according to the sliding-window dynamic cumulant estimation of formulas (4a) to (4d);
(532) Calculation of the 2nd- to 4th-order sliding-window cumulants:
Using the sliding-window mean $\mu^{(w)}(n)$ and $k$-th order power sums $S_k^{(w)}(n)$ for the current time $n$ obtained in step (531), compute the 2nd- to 4th-order sliding-window cumulants $C_k^{(w)}(n)$ ($k=2$ to $4$) for the current time $n$ according to the sliding-window cumulant recursive formulas (6a) to (6c);
(54) Dynamic update of the storage unit:
After the sliding-window mean $\mu^{(w)}(n)$ and $k$-th order power sums $S_k^{(w)}(n)$ for the current time $n$ have been computed, and before the sliding window moves, update the contents of the dynamic storage unit with the values $\mu(n)$ and $S_k(n)$ obtained by the online estimation of step (52), so that after the window slides by one sample point the storage unit still holds the $L$ means and $k$-th order power sums preceding the current time.
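Steps (51), (52), and (54) can be sketched as a one-sample update plus a fixed-length buffer. The sketch below applies (7a) to (7d) sample by sample and keeps the last L cumulative states (four numbers each, hence 4L values), so that the state at time n - L is available when the window statistics are needed. The names `update`, `direct`, and `run` are hypothetical, and using `collections.deque` for the storage unit is a design choice of this sketch rather than something the claim specifies.

```python
from collections import deque


def update(n, x_n, prev):
    """Advance (mean, S2, S3, S4) from time n-1 to time n, for n >= 1."""
    mu_p, s2_p, s3_p, s4_p = prev
    d = x_n - mu_p
    mu = mu_p + d / n                                               # (7a)
    s2 = s2_p + d * (x_n - mu)                                      # (7b)
    s3 = s3_p - 3 * d * s2_p / n + d ** 3 * (n - 1) * (n - 2) / n ** 2   # (7c)
    s4 = (s4_p + d ** 4 * (n - 1) * (n * n - 3 * n + 3) / n ** 3
          + 6 * d ** 2 * s2_p / n ** 2 - 4 * d * s3_p / n)          # (7d)
    return mu, s2, s3, s4


def direct(x):
    """Reference values computed directly, used only for checking."""
    n = len(x)
    mu = sum(x) / n
    return (mu,) + tuple(sum((v - mu) ** k for v in x) for k in (2, 3, 4))


def run(x, L):
    """Feed samples one by one, pairing each state with the state L samples earlier."""
    history = deque(maxlen=L)            # storage unit: L entries of 4 values
    state = (0.0, 0.0, 0.0, 0.0)
    lagged = []                          # (n, state at n, state at n - L)
    for n, x_n in enumerate(x, start=1):
        state = update(n, x_n, state)
        if len(history) == L:
            lagged.append((n, state, history[0]))   # history[0] is time n - L
        history.append(state)            # step (54): refresh before the window moves
    return state, lagged


x = [((7 * i) % 11) / 3.0 for i in range(30)]       # arbitrary test data
final_state, lagged = run(x, L=8)
```

Each `(state at n, state at n - L)` pair is exactly the input that formulas (4a) to (4d) need, so the per-sample cost does not grow with the window length.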
6. The speech endpoint detection method based on dynamic cumulant estimation according to claim 1, characterized in that the control algorithm of the endpoint detection based on sliding-window kurtosis of step (2) is as follows:
(201) Estimate the kurtosis value Kurtosis(n) of the speech signal with the sliding-window recursive algorithm, and record the intermediate value S2(n), where n = 1:length(x), x is the speech signal to be processed, and length(x) is the length of the speech signal to be processed;
(202) Set the kurtosis threshold kurt and the energy threshold amp according to the kurtosis value Kurtosis(n) and the intermediate value S2(n) estimated in step (201), then go to step (203);
(203) Initialize the parameters: speech-segment sample count Speechcount = 0 and blank-segment sample count Nonspeechcount = 0; then go to step (204);
(204) Traverse the kurtosis values Kurtosis(n) and judge whether Kurtosis(n) exceeds the kurtosis threshold kurt;
(205) If the kurtosis value Kurtosis(n) exceeds the kurtosis threshold kurt, mark the corresponding point as the starting point Start, search forward from Start, and go to step (207);
(206) If the kurtosis value Kurtosis(n) does not exceed the kurtosis threshold kurt, advance the index, n = n + 1, and return to step (205);
(207) Compare the intermediate value S2(n) of the next point with the energy threshold amp;
(208) If the intermediate value S2(n) of the next point exceeds the energy threshold amp, the point lies in a speech segment: increment the speech-segment sample count Speechcount by 1 and return to step (207);
(209) If the intermediate value S2(n) of the next point does not exceed the energy threshold amp, the point lies in a blank (non-speech) segment: increment the blank-segment sample count Nonspeechcount by 1 and go to step (210);
(210) Judge whether the current Nonspeechcount exceeds Maxnonspeechcount, the maximum number of samples allowed in a blank segment;
(211) If the current Nonspeechcount exceeds Maxnonspeechcount, judge whether the current speech-segment sample count Speechcount exceeds Minspeechcount, the minimum number of samples allowed in a speech segment, and go to step (213);
(212) If the current speech-segment sample count Speechcount does not exceed Minspeechcount, the point is regarded as still in the speech segment: advance to the intermediate value S2(n) of the next point, n = n + 1, and return to step (207);
(213) Judge whether the current speech-segment sample count Speechcount exceeds Minspeechcount;
(214) If the current Speechcount does not exceed Minspeechcount, return to step (203), reset Speechcount and Nonspeechcount to 0, and search for a starting point again;
(215) If the current Speechcount exceeds Minspeechcount, the speech segment is confirmed, and its interval is [Start, Start + Speechcount + Nonspeechcount - 1].
7. The speech endpoint detection method based on dynamic cumulant estimation according to claim 6, characterized in that the minimum number of samples allowed in a speech segment, Minspeechcount, and the maximum number of samples allowed in a blank segment, Maxnonspeechcount, take the values 256 and 1024 respectively.
CN201510222045.8A 2015-04-30 2015-04-30 The Method of Speech Endpoint Detection based on the estimation of dynamic accumulative amount Active CN104810018B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510222045.8A CN104810018B (en) 2015-04-30 2015-04-30 The Method of Speech Endpoint Detection based on the estimation of dynamic accumulative amount

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510222045.8A CN104810018B (en) 2015-04-30 2015-04-30 The Method of Speech Endpoint Detection based on the estimation of dynamic accumulative amount

Publications (2)

Publication Number Publication Date
CN104810018A CN104810018A (en) 2015-07-29
CN104810018B true CN104810018B (en) 2017-12-12

Family

ID=53694806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510222045.8A Active CN104810018B (en) 2015-04-30 2015-04-30 The Method of Speech Endpoint Detection based on the estimation of dynamic accumulative amount

Country Status (1)

Country Link
CN (1) CN104810018B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105681017B (en) * 2016-01-14 2018-11-16 西安电子科技大学 Timing Synchronization loop circuit state detection method based on Higher Order Cumulants
CN105825871B (en) * 2016-03-16 2019-07-30 大连理工大学 A kind of end-point detecting method without leading mute section of voice
CN105869627A (en) * 2016-04-28 2016-08-17 成都之达科技有限公司 Vehicle-networking-based speech processing method
CN110266429A (en) * 2019-04-18 2019-09-20 四川大学 A kind of signal frame structure detection method based on Higher Order Cumulants
CN112017480B (en) * 2020-08-20 2021-07-13 南京航空航天大学 Dynamic memory planning method for green cruise track of aircraft
CN115376548B (en) * 2022-07-06 2023-06-20 华南理工大学 Audio signal voiced segment endpoint detection method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779508A (en) * 2012-03-31 2012-11-14 安徽科大讯飞信息科技股份有限公司 Speech corpus generating device and method, speech synthesizing system and method
CN103093758A (en) * 2011-11-04 2013-05-08 宏达国际电子股份有限公司 Electrical apparatus and voice signals receiving method thereof

Non-Patent Citations (5)

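* Cited by examiner, † Cited by third party
Title
"Research on an optimized speech endpoint detection algorithm based on sub-band energy features"; 陈振标; 《声学学报》; 2005-12-31; Vol. 30, No. 2; full text *
"Research on a speech endpoint detection algorithm based on sub-band energy"; 朱明明, 吴晓培, 罗雅琴; 《工业控制计算机》; 2013-12-31; Vol. 26, No. 9; full text *
"Moving object detection algorithm based on improved Gaussian mixture modeling and short-time stability"; 张超; 《电子与信息学报》; 2012-10-31; Vol. 34, No. 10; full text *
"Moving object detection method based on a sliding-window Gaussian mixture model"; 周建英, 吴小培, 张超, 吕钊; 《电子与信息学报》; 2013-07-31; Vol. 35, No. 7; full text *
"Speech endpoint detection based on empirical mode decomposition and Teager kurtosis"; 张德祥, 吴小培, 吕钊, 郭晓静; 《仪器仪表学报》; 2010-03-31; Vol. 31, No. 3; full text *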

Also Published As

Publication number Publication date
CN104810018A (en) 2015-07-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant