CN104810018B - Speech endpoint detection method based on dynamic cumulant estimation - Google Patents
- Publication number
- CN104810018B (application CN201510222045.8A)
- Authority
- CN
- China
- Prior art keywords
- sliding window
- kurtosis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a speech endpoint detection method based on dynamic cumulant estimation, comprising sliding-window recursive computation of higher-order cumulants and endpoint detection based on sliding-window kurtosis. The sliding-window recursive computation applies a rectangular window to the raw sample data and estimates the cumulants of the data inside the window; the window advances one sample point at a time and its contents are updated, which yields a dynamic estimate of the cumulants. The endpoint detection uses the recursive computation to obtain the sliding-window kurtosis and an energy feature, and detects the endpoints of the speech signal from these two quantities. Compared with the prior art, the invention has the following advantages: the method is a double-threshold endpoint detection based on sliding-window kurtosis and energy; the sliding-window kurtosis is highly sensitive to the start of a speech segment and resists noise interference well, so the method is robust in noisy environments.
Description
Technical field
The present invention relates to the fields of statistical data analysis and signal processing, and in particular to a speech endpoint detection method based on dynamic cumulant estimation.
Background art
With the continuing development of human-machine interfaces, speech recognition has become a focus of research in artificial intelligence and pattern recognition. Speech is the most important and most convenient way for humans to convey information, and one of the most direct routes to human-computer interaction. Enabling machines to recognize voice commands accurately and carry out the corresponding operations is of great practical significance, and the related research has broad application prospects in fields such as medicine, the military, and industry. As the front end of speech recognition, speech endpoint detection aims to separate the speech segments of a signal from the silent segments. Efficient and accurate endpoint detection can substantially reduce the load on a speech recognition system, shorten its response time, and improve its robustness. The fourth-order cumulant, that is, kurtosis, is commonly used to measure the non-Gaussianity of a signal. In speech signal processing, noise is usually assumed to be approximately Gaussian, so its higher-order cumulants are relatively small (the higher-order cumulants of an ideal Gaussian distribution are zero). Speech processing methods based on higher-order cumulants therefore tend to resist interference better. However, higher-order cumulants such as kurtosis are expensive to compute and numerically less stable, which limits their use in practice.
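The claim that Gaussian noise has near-zero higher-order cumulants can be checked numerically. The following is a minimal sketch, not part of the patent: it assumes normalized excess kurtosis as the measure, and it uses a Laplacian sequence as a stand-in for a heavy-tailed, speech-like signal (both are assumptions of this sketch, as are the function and variable names).

```python
import numpy as np


def excess_kurtosis(x):
    """Normalized fourth cumulant: m4 / m2^2 - 3 (zero for an ideal Gaussian)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    m4 = np.mean(d ** 4)
    return m4 / m2 ** 2 - 3.0


rng = np.random.default_rng(0)
gaussian_noise = rng.normal(size=200_000)    # models the noise assumption
heavy_tailed = rng.laplace(size=200_000)     # hypothetical speech-like stand-in

k_gauss = excess_kurtosis(gaussian_noise)
k_heavy = excess_kurtosis(heavy_tailed)
print(f"Gaussian: {k_gauss:.3f}, heavy-tailed: {k_heavy:.3f}")
```

The Gaussian sequence gives a value close to zero while the heavy-tailed one does not, which is the property the kurtosis-based detector relies on.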
Classical cumulant estimation algorithms are batch algorithms. Their computation and storage requirements are both large, they are unsuited to online processing of dynamic data, and they are sensitive to outliers in the observed data. To address these problems, online cumulant estimation algorithms were proposed, which effectively improve dynamic estimation performance. Existing online algorithms, however, are built on the entire data history, whereas in practice the statistics of the most recent data segment are usually the more informative. Because the data are non-stationary, early historical data and recent data are generally only weakly correlated. Statistical analysis over the entire history therefore not only fails to improve estimation accuracy but may even mask the true statistics of the data. In addition, during data acquisition in real environments, large outliers that occur at random can introduce large errors into the statistical results. Because traditional online algorithms depend on all of the signal data, the error caused by an outlier propagates strongly to later estimates.
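The propagation of outlier error can be illustrated with a small experiment. This is a sketch under assumed values (a single outlier of amplitude 500 in unit-variance Gaussian data, window length 256); none of these numbers come from the patent. It compares an all-history variance estimate with a windowed one.

```python
import numpy as np


def cumulative_variance(x):
    """Variance of x[0..n] for every n (estimate built on the entire history)."""
    n = np.arange(1, len(x) + 1)
    mean = np.cumsum(x) / n
    mean_sq = np.cumsum(x ** 2) / n
    return mean_sq - mean ** 2


def windowed_variance(x, L):
    """Variance of the last L samples; NaN until a full window is available."""
    out = np.full(len(x), np.nan)
    for n in range(L - 1, len(x)):
        out[n] = np.var(x[n - L + 1 : n + 1])
    return out


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 5000)
x[1000] = 500.0                      # one large outlier (hypothetical value)

cum = cumulative_variance(x)
win = windowed_variance(x, L=256)
print("all-history variance at the end:", cum[-1])
print("windowed variance at the end:   ", win[-1])
```

The all-history estimate is still inflated thousands of samples after the outlier, whereas the windowed estimate returns to about 1 once the outlier has left the window.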
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a speech signal endpoint detection method based on dynamic cumulant estimation.
The invention is achieved by the following technical solution: a speech signal endpoint detection method based on dynamic cumulant estimation, comprising the following steps:
(1) Sliding-window recursive estimation of higher-order cumulants: a window is applied to the raw sample data and the cumulants of the data in the window are estimated; the window slides one sample point at a time and its contents are updated, giving a dynamic estimate of the cumulants;
(2) Endpoint detection based on sliding-window kurtosis: using the sliding-window recursive estimation of higher-order cumulants of step (1), the sliding-window kurtosis and an energy feature are estimated, and the endpoints of the speech signal are detected from the sliding-window kurtosis and the energy feature.
As a further refinement of the above scheme, the sliding-window recursive estimation of higher-order cumulants of step (1) comprises the following steps:
(11) Sliding window: a window is applied to the raw sample data so that the higher-order cumulants of all sample points in the window can be estimated recursively;
(12) Recursive estimation: the recursive formulas for higher-order cumulants derived for the sliding window of step (11) are used to estimate recursively the higher-order cumulants of all sample points in the sliding window.
As a further refinement of the above scheme, the sliding window of step (11) truncates the raw sample data with a rectangular window; each time the window slides by one sample point, the data in the rectangular window are updated, so that the higher-order cumulants of all sample points in the window are estimated recursively.
As a further refinement of the above scheme, the recursive formulas for higher-order cumulants of step (12) are as follows:
Given two data sets x_a and x_b with n_a and n_b samples respectively, the combined data set x = {x_a, x_b} has n = n_a + n_b samples, mean μ, and k-th order central sums S_k (the sums of the k-th powers of the deviations from the mean);
The sliding-window recursive formulas for the 2nd- to 4th-order cumulants of the data set x = {x_a, x_b} are formulas (6a)-(6c) of the description.
Here the sliding window has length L, and μ^(w)(n) and S_k^(w)(n) denote the mean and the k-th order central sum of the L samples in the window at time n. With n_L = n-L, they are obtained from formulas (4a)-(4d) of the description,
where μ(n) and S_k(n) are the mean and the k-th order central sum of all historical data up to time n, and μ(n_L) and S_k(n_L) are those of all historical data up to time n_L.
As a further refinement of the above scheme, the update procedure for the recursive higher-order cumulant formulas is as follows:
(51) Allocate dynamic storage
For the data set x = {x_a, x_b} with mean μ and k-th order central sums S_k, compute the mean μ and the sums S_k corresponding to sample points x(1) to x(L), and allocate a storage unit of size 4L to hold these values;
where μ and S_k are defined by formulas (2a)-(2b) of the description, in which Len denotes the sample length, x_i a sample point, and μ_x the mean of the sample x.
(52) Initial-value calculation
Using the sample value x(n) at the current time n together with the mean μ(n-1) and the sums S_k(n-1) of time n-1 held in the storage unit, compute the initial values, namely the mean μ(n) and the sums S_k(n) at the current time n:
S_2(n) = S_2(n-1) + [x(n) - μ(n-1)][x(n) - μ(n)]    (7b)
That is, μ(n) and S_k(n) at the current time n are obtained by the online cumulant estimation method of formulas (7a)-(7d);
(53) Sliding-window dynamic estimation
(531) Calculation of the sliding-window mean μ^(w)(n) and sums S_k^(w)(n)
Using μ(n) and S_k(n) of the current time n obtained by online estimation, and μ(n_L) and S_k(n_L) of time n_L = n-L held in the storage unit, compute the sliding-window mean μ^(w)(n) and sums S_k^(w)(n) at the current time n by the sliding-window dynamic cumulant estimation method of formulas (4a)-(4d);
(532) Calculation of the sliding-window 2nd- to 4th-order cumulants
Using μ^(w)(n) and S_k^(w)(n) of the current time n obtained in step (531), compute the sliding-window 2nd- to 4th-order cumulants (k = 2 to 4) at the current time n by the sliding-window recursive cumulant formulas (6a)-(6c);
(54) Dynamically update the storage unit
After μ^(w)(n) and S_k^(w)(n) of the current time n have been computed and before the sliding window moves, the contents of the dynamic storage unit are updated with the values μ(n) and S_k(n) obtained by the online estimation of step (52), so that after the window slides by one sample point the storage unit still holds the means and sums S_k of the L time instants preceding the current time.
As a further refinement of the above scheme, the control algorithm of the endpoint detection based on sliding-window kurtosis of step (2) is as follows:
(201) Estimate the kurtosis Kurtosis(n) of the speech signal with the sliding-window recursive algorithm and record the intermediate value S_2(n), where n = 1:length(x), x is the speech signal to be processed, and length(x) is its length;
(202) From the kurtosis Kurtosis(n) and the intermediate value S_2(n) estimated in step (201), set the kurtosis threshold kurt and the energy threshold amp, then go to step (203);
(203) Initialize the parameters: set the minimum number of sample points allowed for a speech segment and the maximum number of sample points allowed for a non-speech segment, and set the speech-segment sample count Speechcount = 0 and the non-speech-segment sample count Nonspeechcount = 0; then go to step (204);
(204) Traverse the kurtosis values Kurtosis(n) and test whether Kurtosis(n) exceeds the kurtosis threshold kurt;
(205) Mark the point at which Kurtosis(n) exceeds the kurtosis threshold kurt as the start point Start, search onward from Start, and go to step (207);
(206) If Kurtosis(n) does not exceed the kurtosis threshold kurt, advance n = n+1 and return to step (205);
(207) Compare the intermediate value S_2(n) of the next point with the energy threshold amp;
(208) If S_2(n) of the next point exceeds the energy threshold amp, the point lies in a speech segment; increment Speechcount by 1 and return to step (207);
(209) If S_2(n) of the next point does not exceed the energy threshold amp, the point lies in a non-speech segment; increment Nonspeechcount by 1 and go to step (210);
(210) Test whether the current Nonspeechcount exceeds the maximum number of sample points allowed for a non-speech segment, Maxnonspeechcount;
(211) If the current Nonspeechcount exceeds Maxnonspeechcount, test whether the current Speechcount exceeds the minimum number of sample points allowed for a speech segment, Minspeechcount, and go to step (213);
(212) If the current Speechcount does not exceed Minspeechcount, the point is still regarded as within the speech segment; advance n = n+1 for the intermediate value S_2(n) of the next point and return to step (207);
(213) Test whether the current Speechcount exceeds Minspeechcount;
(214) If the current Speechcount does not exceed Minspeechcount, return to step (203), reset Speechcount and Nonspeechcount to 0, and search for a start point again;
(215) If the current Speechcount exceeds Minspeechcount, the speech segment is confirmed, and its interval is [Start, Start+speechcount+nonspeechcount-1].
Compared with the prior art, the invention has the following advantages. The speech signal endpoint detection method based on dynamic cumulant estimation is a double-threshold endpoint detection algorithm based on sliding-window kurtosis and energy. The sliding-window kurtosis is highly sensitive to the start of a speech segment and resists noise interference well. In speech signal processing, noise is usually assumed to be approximately Gaussian, so its higher-order cumulants are relatively small (those of an ideal Gaussian distribution are zero); speech processing methods based on higher-order cumulants therefore tend to resist interference better, and the invention is more robust in noisy environments.
In the method of the invention, the cumulants computed by the sliding-window recursive method differ very little from those computed by traditional direct computation. Tests on speech signals show that the errors between the two algorithms for the 2nd- to 4th-order cumulants are of the order of 10^-15, 10^-10 and 10^-7 respectively, and can be neglected.
In practice, because the data are non-stationary, early historical data and recent data are generally only weakly correlated, and the statistics of the most recent data segment are usually the more informative; moreover, during data acquisition in real environments, large outliers that occur at random can introduce large errors into the statistical results. The sliding-window recursive cumulant computation of the invention relies only on the sample data in the sliding window, which effectively avoids these problems and gives it greater practical value in real environments. The invention can also be applied to the dynamic analysis of bioelectrical signals (electroencephalogram EEG, electrooculogram EOG, electrocardiogram ECG) as well as speech signals.
Brief description of the drawings
Fig. 1 is a schematic diagram of the sliding window of the invention.
Fig. 2 is a block diagram of the recursive algorithm for sliding-window cumulant estimation of the invention.
Fig. 3 compares the running time of the sliding-window recursive computation of the invention with that of direct computation.
Fig. 4-A to Fig. 4-D show, for a preferred embodiment, the dynamic waveforms of the sliding-window variance estimate of a speech signal obtained by recursive and by direct computation, and the corresponding error curve.
Fig. 5-A to Fig. 5-D show, for a preferred embodiment, the dynamic waveforms of the sliding-window skewness estimate of a speech signal obtained by recursive and by direct computation, and the corresponding error curve.
Fig. 6-A to Fig. 6-D show, for a preferred embodiment, the dynamic waveforms of the sliding-window kurtosis estimate of a speech signal obtained by recursive and by direct computation, and the corresponding error curve.
Fig. 7 is a flow chart of the control algorithm for endpoint detection based on sliding-window kurtosis.
Fig. 8-A to Fig. 8-C show the waveform of a measured speech signal of the preferred embodiment and the corresponding changes in kurtosis and energy.
Fig. 9-A to Fig. 9-F-1 show isolated-word speech under different noise environments (SNR = 5 dB) and the corresponding changes in kurtosis.
Fig. 10-A to Fig. 10-F-1 show continuous speech under different noise environments (SNR = 5 dB) and the corresponding changes in kurtosis.
Detailed description of the embodiments
Embodiments of the invention are described in detail below. The embodiments are implemented on the basis of the technical solution of the invention, and detailed implementations and specific operating procedures are given, but the scope of protection of the invention is not limited to the following embodiments.
Fig. 1 is a schematic diagram of the sliding window of the invention. In this embodiment the sliding window truncates the raw sample data with a rectangular window, so that data of the same length are available for processing at every time instant. The cumulants are estimated from the L samples in the sliding window; each time the window slides by one sample point, the data in the window are updated and the cumulants are estimated again, which gives a dynamic estimate of the cumulants.
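As a point of reference, the windowed statistics can be computed directly by re-evaluating every window from scratch. The sketch below is this baseline, not the recursive method of the invention; the function name is hypothetical, and the quantities S_2 to S_4 are taken to be sums of powers of deviations from the window mean, as formula (7b) implies.

```python
import numpy as np


def direct_window_sums(x, L):
    """For each full window of length L, return (mean, S2, S3, S4).

    Every window is recomputed from its L samples, so the cost per
    output sample grows in proportion to L.
    """
    x = np.asarray(x, dtype=float)
    rows = []
    for start in range(len(x) - L + 1):      # slide one sample point at a time
        w = x[start:start + L]               # rectangular window
        mu = w.mean()
        d = w - mu
        rows.append((mu, np.sum(d ** 2), np.sum(d ** 3), np.sum(d ** 4)))
    return np.array(rows)


stats = direct_window_sums([1, 2, 3, 4, 5], L=3)
print(stats)
```

Each row corresponds to one window position; a recursive scheme aims to produce the same rows without revisiting all L samples each time.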
For any two data sets x_a and x_b with n_a and n_b samples respectively, the combined data set x = {x_a, x_b} has sample length n = n_a + n_b, and its mean μ and k-th order central sums S_k are given by formulas (1a)-(1d),
where μ and S_k denote the mean and the k-th order central sum of the sample x, and the labels (a) and (b) indicate statistics computed on the data sets x_a and x_b respectively. μ and S_k are defined by formulas (2a)-(2b),
where Len denotes the sample length, x_i a sample point, and μ_x the mean of the sample x. Here μ denotes the mean; for convenience, S_k (the sum of the k-th powers of the deviations from the mean) is hereafter called the k-th order central sum.
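The bodies of the formulas referred to above are not legible in this text. As a hedged reconstruction, the definitions and the standard pairwise combination identities for central sums read as follows. That the patent's formulas (1a)-(1d) and (2a)-(2b) take exactly this form is an assumption, and the symbol δ is introduced here for brevity.

```latex
\begin{align*}
\mu &= \frac{1}{Len}\sum_{i=1}^{Len} x_i, \qquad
S_k = \sum_{i=1}^{Len} (x_i - \mu_x)^k, \quad k = 2, 3, 4 \\[4pt]
\delta &= \mu^{(b)} - \mu^{(a)}, \qquad n = n_a + n_b \\[4pt]
\mu &= \mu^{(a)} + \frac{n_b}{n}\,\delta \\
S_2 &= S_2^{(a)} + S_2^{(b)} + \frac{n_a n_b}{n}\,\delta^2 \\
S_3 &= S_3^{(a)} + S_3^{(b)} + \frac{n_a n_b (n_a - n_b)}{n^2}\,\delta^3
      + \frac{3\delta}{n}\left(n_a S_2^{(b)} - n_b S_2^{(a)}\right) \\
S_4 &= S_4^{(a)} + S_4^{(b)}
      + \frac{n_a n_b \left(n_a^2 - n_a n_b + n_b^2\right)}{n^3}\,\delta^4
      + \frac{6\delta^2}{n^2}\left(n_a^2 S_2^{(b)} + n_b^2 S_2^{(a)}\right)
      + \frac{4\delta}{n}\left(n_a S_3^{(b)} - n_b S_3^{(a)}\right)
\end{align*}
```

A quick check with x_a = {0} and x_b = {1, 2}: δ = 1.5 and n = 3, so S_2 = 0 + 0.5 + (2/3)(2.25) = 2, which matches the direct value for {0, 1, 2}.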
Following the sliding-window idea, a window is applied to the raw sample data and the cumulants are estimated from the sample data in the window. With reference to formulas (1a)-(1d), sliding-window estimation can be regarded as cumulant estimation on the data of the sample set x_b. Formulas (1a)-(1d) are accordingly rewritten in the form of formulas (3a)-(3d).
Let the sliding window have length L, and let μ^(w)(n) and S_k^(w)(n) denote the mean and the k-th order central sum of the L samples in the window at time n. For convenience, let n_L = n-L. The data in the sliding window then correspond to the data set x_b, all historical data before the sliding window correspond to the data set x_a, and formulas (3a)-(3d) are rewritten as formulas (4a)-(4d),
where μ(n) and S_k(n) are the mean and the k-th order central sum of all historical data up to time n, and μ(n_L) and S_k(n_L) are those of all historical data up to time n_L.
μ and S_k are related to the 2nd- to 4th-order cumulants by formulas (5a)-(5c),
where σ^2, C_3 and C_4 denote the variance, skewness and kurtosis of the sample respectively.
From the relations between the 2nd- to 4th-order cumulants and S_k given by formulas (5a)-(5c), the sliding-window recursive cumulant formulas (6a)-(6c) are obtained,
whose left-hand sides are the 2nd- to 4th-order cumulants based on the sliding window.
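The relation between the central sums and the cumulants is not legible in this text. The following minimal sketch assumes the unnormalized cumulants of a window of L samples, that is κ_2 = m_2, κ_3 = m_3 and κ_4 = m_4 - 3·m_2^2 with m_k = S_k/L. The patent may instead use normalized skewness and kurtosis, so both the function name and the normalization are assumptions.

```python
def cumulants_from_sums(L, S2, S3, S4):
    """Return (c2, c3, c4) from the central sums of a window of L samples.

    Assumes unnormalized cumulants: c2 = m2, c3 = m3, c4 = m4 - 3*m2**2,
    where m_k = S_k / L are the central moments of the window.
    """
    m2 = S2 / L
    m3 = S3 / L
    m4 = S4 / L
    return m2, m3, m4 - 3.0 * m2 ** 2


# Window {1, 2, 3}: deviations -1, 0, 1 give S2 = 2, S3 = 0, S4 = 2.
c2, c3, c4 = cumulants_from_sums(3, 2.0, 0.0, 2.0)
print(c2, c3, c4)
```

Dividing c3 by c2^1.5 and c4 by c2^2 would give the normalized skewness and excess kurtosis if that convention is preferred.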
Fig. 2 is a block diagram of the recursive algorithm for sliding-window cumulant estimation of the invention. In this embodiment the recursive algorithm is implemented by the following steps:
Step 1: Using the direct computation of formulas (2a)-(2b), compute the mean μ and the k-th order central sums S_k for the first to the L-th sample points, that is, for the sample data in the window at its starting position, and store them.
Step 2, initial-value calculation: the sliding window advances by one sample point.
Using the sample value x(n) at the current time n together with the mean μ(n-1) and the sums S_k(n-1) of time n-1 held in the storage unit, compute the initial values, namely the mean μ(n) and the sums S_k(n) at the current time n:
S_2(n) = S_2(n-1) + [x(n) - μ(n-1)][x(n) - μ(n)]    (7b)
That is, μ(n) and S_k(n) at the current time n are obtained by the online cumulant estimation method of formulas (7a)-(7d);
Step 3, sliding-window dynamic estimation: following the sliding-window recursive cumulant algorithm of formulas (4a)-(4d), use the mean μ(n) and sums S_k(n) of the current time n obtained in step 2 and the stored mean μ(n_L) and sums S_k(n_L) of time n_L = n-L to obtain the sliding-window mean μ^(w)(n) and the sliding-window sums S_k^(w)(n) at the current time n. The values μ(n) and S_k(n) are stored and the data in the storage unit are updated, so that after the window slides by one sample point the dynamic storage unit still holds the means and sums S_k of the L time instants preceding the current time. The 2nd- to 4th-order cumulants (k = 2 to 4) are then obtained from formulas (6a)-(6c).
Step 4, update the contents of the storage unit: the mean μ(n) and sums S_k(n) of the current time n computed in the online estimation module are used to update the contents of the storage unit, so that after the window slides by one sample point the storage unit still holds the means and sums S_k of the L time instants preceding the current time.
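The four steps above can be sketched as a small streaming estimator. This is an illustration under stated assumptions, not the patent's implementation: apart from (7b), the single-sample update uses the standard one-pass formulas for central sums, the window extraction uses the standard pairwise identities solved for the window, and all names are hypothetical. The buffer holds L+1 states (including the empty state at time 0) rather than exactly L.

```python
from collections import deque


def online_update(state, x):
    """Add one sample x to the all-history statistics (n, mu, S2, S3, S4)."""
    n0, mu, S2, S3, S4 = state
    n = n0 + 1
    delta = x - mu
    dn = delta / n
    term = delta * dn * n0        # equals [x - mu(n-1)] * [x - mu(n)], as in (7b)
    # Higher orders are updated first because they need the old lower-order sums.
    S4 += term * dn * dn * (n * n - 3 * n + 3) + 6 * dn * dn * S2 - 4 * dn * S3
    S3 += term * dn * (n - 2) - 3 * dn * S2
    S2 += term
    return (n, mu + dn, S2, S3, S4)


def remove_prefix(total, prefix):
    """Statistics of the samples after the prefix, from two all-history states."""
    n, mu, S2, S3, S4 = total
    na, mua, S2a, S3a, S4a = prefix
    nb = n - na
    mub = (n * mu - na * mua) / nb
    d = mub - mua
    S2b = S2 - S2a - na * nb * d ** 2 / n
    S3b = (S3 - S3a - na * nb * (na - nb) * d ** 3 / n ** 2
           - 3 * d * (na * S2b - nb * S2a) / n)
    S4b = (S4 - S4a - na * nb * (na ** 2 - na * nb + nb ** 2) * d ** 4 / n ** 3
           - 6 * d ** 2 * (na ** 2 * S2b + nb ** 2 * S2a) / n ** 2
           - 4 * d * (na * S3b - nb * S3a) / n)
    return mub, S2b, S3b, S4b


class SlidingWindowMoments:
    """Streaming (mean, S2, S3, S4) of the last L samples."""

    def __init__(self, L):
        self.L = L
        self.state = (0, 0.0, 0.0, 0.0, 0.0)
        # All-history states at times n-L .. n; the oldest is dropped automatically.
        self.history = deque([self.state], maxlen=L + 1)

    def push(self, x):
        self.state = online_update(self.state, float(x))
        self.history.append(self.state)
        if self.state[0] < self.L:
            return None                       # window not yet full
        return remove_prefix(self.state, self.history[0])


est = SlidingWindowMoments(3)
outputs = [est.push(v) for v in [1, 2, 4, 8, 16, 3, 5]]
print(outputs[-1])                            # statistics of the window [16, 3, 5]
```

Each call to `push` does a fixed amount of arithmetic regardless of L. Because the window statistics are differences of accumulated sums, long-running streams may need periodic re-initialization to limit rounding error.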
The sliding-window recursive cumulant computation of the invention is computationally efficient. Because the cumulants are obtained dynamically by recursion, the number of operations is greatly reduced and the efficiency is clearly better than that of traditional direct computation. Table 1 gives the numbers of additions and multiplications required to estimate μ^(w)(n) and S_k^(w)(n) by recursive computation based on formulas (5a)-(5c) and by direct computation based on formulas (2a)-(2b).
Table 1. Comparison of computational complexity
As Table 1 shows, the number of operations for direct computation is proportional to L, whereas that for recursive computation is independent of the sliding-window length. In statistical analysis of measured data L is usually large, so the recursive algorithm has a clear speed advantage over direct computation.
Fig. 3 compares the running time of the sliding-window recursive computation of the invention with that of direct computation. The figure shows that, for the same sample data, the running time of recursive computation is far smaller than that of direct computation. The running time of direct computation rises steeply as the number of samples increases, within the range of 1 to 10; at 100000 sample points it is close to 40 seconds. The running time of the recursive algorithm rises only within the range of 10^-2 to 10^-1 and remains far below that of direct computation.
Fig. 4-A to Fig. 4-D show, for a preferred embodiment, the dynamic waveforms (amplitude on the vertical axis) of the sliding-window variance estimate of a speech signal obtained by recursive and by direct computation, together with the corresponding error curve. Fig. 4-A is the waveform of a measured speech signal, Fig. 4-B is the waveform of the variance obtained by recursive computation, Fig. 4-C is the waveform of the variance obtained by direct computation, and Fig. 4-D is the waveform of the error between the two methods. The figures show that the error is of the order of 10^-15 and can be neglected.
Fig. 5-A to Fig. 5-D show, for a preferred embodiment, the dynamic waveforms (amplitude on the vertical axis) of the sliding-window skewness estimate of a speech signal obtained by recursive and by direct computation, together with the corresponding error curve. Fig. 5-A is the waveform of a measured speech signal, Fig. 5-B is the waveform of the skewness obtained by recursive computation, Fig. 5-C is the waveform of the skewness obtained by direct computation, and Fig. 5-D is the waveform of the error between the two methods. The figures show that the error is of the order of 10^-10 and can be neglected.
Fig. 6-A to Fig. 6-D show, for a preferred embodiment, the dynamic waveforms (amplitude on the vertical axis) of the sliding-window kurtosis estimate of a speech signal obtained by recursive and by direct computation, together with the corresponding error curve. Fig. 6-A is the waveform of a measured speech signal, Fig. 6-B is the waveform of the kurtosis obtained by recursive computation, Fig. 6-C is the waveform of the kurtosis obtained by direct computation, and Fig. 6-D is the waveform of the error between the two methods. The figures show that the error is of the order of 10^-7 and can be neglected.
Fig. 7 is a flow chart of the control algorithm for endpoint detection based on sliding-window kurtosis in the preferred embodiment of the invention. Formula (4d) shows that the intermediate value S_2 produced during the recursive computation of the sliding-window kurtosis can serve as the energy feature of the signal in the sliding window, so the energy feature needed for detection requires no extra computation. The parameters shown in the figure are defined as follows: Speechcount is the number of sample points in the speech segment; Nonspeechcount is the number of sample points in the non-speech segment; Minspeechcount is the minimum number of sample points allowed for a speech segment; Maxnonspeechcount is the maximum number of sample points allowed for a non-speech segment. Minspeechcount and Maxnonspeechcount are set to 256 and 1024 respectively. For a given speech signal the kurtosis threshold is set empirically; in the experiments the kurtosis threshold kurt is computed as threshold = max(Kurtosis)/10.
The control algorithm of the endpoint detection based on sliding-window kurtosis of the invention comprises the following steps:
(201) Estimate the kurtosis Kurtosis(n) of the speech signal with the sliding-window recursive algorithm and record the intermediate value S_2(n);
where n = 1:length(x), x is the speech signal to be processed, and length(x) is its length;
(202) From the kurtosis Kurtosis(n) and the intermediate value S_2(n) estimated in step (201), set the kurtosis threshold kurt and the energy threshold amp, then go to step (203);
(203) Initialize the parameters, then go to step (204):
speech-segment sample count Speechcount = 0; non-speech-segment sample count Nonspeechcount = 0;
the minimum number of sample points allowed for a speech segment, Minspeechcount, and the maximum number of sample points allowed for a non-speech segment, Maxnonspeechcount, are set to 256 and 1024 respectively;
(204) Traverse the kurtosis values Kurtosis(n) and test whether Kurtosis(n) exceeds the kurtosis threshold kurt;
(205) Mark the point at which Kurtosis(n) exceeds the kurtosis threshold kurt as the start point Start, search onward from Start, and go to step (207);
(206) If Kurtosis(n) does not exceed the kurtosis threshold kurt, advance n = n+1 and return to step (205);
(207) Compare the intermediate value S_2(n) of the next point with the energy threshold amp;
(208) If S_2(n) of the next point exceeds the energy threshold amp, the point lies in a speech segment; increment Speechcount by 1 and return to step (207);
(209) If S_2(n) of the next point does not exceed the energy threshold amp, the point lies in a non-speech segment; increment Nonspeechcount by 1 and go to step (210);
(210) Test whether the current Nonspeechcount exceeds the maximum number of sample points allowed for a non-speech segment, Maxnonspeechcount;
(211) If the current Nonspeechcount exceeds Maxnonspeechcount, test whether the current Speechcount exceeds the minimum number of sample points allowed for a speech segment, Minspeechcount, and go to step (213);
(212) If the current Speechcount does not exceed Minspeechcount, the point is still regarded as within the speech segment; advance n = n+1 for the intermediate value S_2(n) of the next point and return to step (207);
(213) Test whether the current Speechcount exceeds Minspeechcount;
(214) If the current Speechcount does not exceed Minspeechcount, return to step (203), reset Speechcount and Nonspeechcount to 0, and search for a start point again;
(215) If the current Speechcount exceeds Minspeechcount, the speech segment is confirmed, and its interval is [Start, Start+speechcount+nonspeechcount-1].
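One possible reading of steps (201)-(215) can be sketched as follows. This is an interpretation, not the patent's reference implementation: the kurtosis and energy sequences are assumed to be precomputed, indices are 0-based rather than 1-based, the search continues after the end of each accepted segment, and the function name and the synthetic test signal are hypothetical.

```python
def detect_endpoints(kurtosis, energy, kurt_thr, amp_thr,
                     min_speech=256, max_nonspeech=1024):
    """Return a list of (start, end) index pairs for detected speech segments."""
    segments = []
    n, N = 0, len(kurtosis)
    while n < N:
        if kurtosis[n] <= kurt_thr:          # steps (204)/(206): keep scanning
            n += 1
            continue
        start = n                            # step (205): candidate start point
        speech = nonspeech = 0               # step (203): reset the counters
        m = start + 1
        while m < N:                         # steps (207)-(212)
            if energy[m] > amp_thr:
                speech += 1                  # step (208)
            else:
                nonspeech += 1               # step (209)
                if nonspeech > max_nonspeech:
                    break                    # steps (210)/(211)
            m += 1
        if speech > min_speech:              # steps (213)/(215): accept
            end = start + speech + nonspeech - 1
            segments.append((start, end))
            n = end + 1
        else:                                # step (214): reject and search again
            n = start + 1
    return segments


# Synthetic example: one kurtosis spike at index 1000 followed by
# 1000 high-energy samples (values chosen only for illustration).
N = 5000
kurtosis = [0.0] * N
kurtosis[1000] = 10.0
energy = [1.0 if 1000 <= i < 2000 else 0.0 for i in range(N)]

kurt_thr = max(kurtosis) / 10                # threshold rule quoted in the text
segments = detect_endpoints(kurtosis, energy, kurt_thr, amp_thr=0.5)
print(segments)
```

As in the step list, the non-speech counter accumulates over the whole search and is not reset when speech resumes, so the reported interval includes the trailing non-speech samples counted before the search stopped.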
Fig. 8-A to Fig. 8-C show the waveform of a measured speech signal of the preferred embodiment and the corresponding changes in kurtosis and energy. Fig. 8-A is the waveform of the original measured speech signal. Fig. 8-B is the waveform of the corresponding kurtosis; it shows that the kurtosis changes markedly on entering a speech segment, which can therefore serve as the criterion for detecting the start of a speech segment. Fig. 8-C shows the corresponding energy; the energy waveform reflects the overall change in energy between speech and non-speech segments. Intuitively, combining the two helps to improve the accuracy of speech-segment detection.
Referring to Fig. 9-A to Fig. 9-F-1, isolated-word speech under different noise conditions (SNR = 5 dB) is shown together with its
corresponding kurtosis. This example presents a segment of isolated-word speech, the same speech with white, pink, m109, f16 and
babble noise added at 5 dB, and the corresponding sliding-window kurtosis curves, with amplitude on the vertical axis. Fig. 9-A is the
waveform of the original isolated-word speech signal, and Fig. 9-A-1 is the waveform of the corresponding kurtosis values. Fig. 9-B is
the waveform of the isolated-word speech with white noise added, and Fig. 9-B-1 is the waveform of the kurtosis values of the speech
with 5 dB white noise. Fig. 9-C is the waveform of the isolated-word speech with pink noise added, and Fig. 9-C-1 is the waveform of
the kurtosis values of the speech with 5 dB pink noise. Fig. 9-D is the waveform of the isolated-word speech with m109 noise added,
and Fig. 9-D-1 is the waveform of the kurtosis values of the speech with 5 dB m109 noise. Fig. 9-E is the waveform of the
isolated-word speech with f16 noise added, and Fig. 9-E-1 is the waveform of the kurtosis values of the speech with 5 dB f16 noise.
Fig. 9-F is the waveform of the isolated-word speech with babble noise added, and Fig. 9-F-1 is the waveform of the kurtosis values
of the speech with 5 dB babble noise. It can be seen that, for isolated-word speech mixed with different noises, the sliding-window
kurtosis has a waveform similar to that of the clean speech signal, and the kurtosis changes markedly on entering a speech segment.
Referring to Fig. 10-A to Fig. 10-F-1, continuous speech under different noise conditions (SNR = 5 dB) is shown together with its
corresponding kurtosis. This example presents a segment of continuous speech, the same speech with white, pink, m109, f16 and babble
noise added at 5 dB, and the corresponding sliding-window kurtosis curves, with amplitude on the vertical axis. Fig. 10-A is the
waveform of the original continuous speech signal, and Fig. 10-A-1 is the waveform of the corresponding kurtosis values. Fig. 10-B is
the waveform of the continuous speech with white noise added, and Fig. 10-B-1 is the waveform of the kurtosis values of the speech
with 5 dB white noise. Fig. 10-C is the waveform of the continuous speech with pink noise added, and Fig. 10-C-1 is the waveform of
the kurtosis values of the speech with 5 dB pink noise. Fig. 10-D is the waveform of the continuous speech with m109 noise added, and
Fig. 10-D-1 is the waveform of the kurtosis values of the speech with 5 dB m109 noise. Fig. 10-E is the waveform of the continuous
speech with f16 noise added, and Fig. 10-E-1 is the waveform of the kurtosis values of the speech with 5 dB f16 noise. Fig. 10-F is
the waveform of the continuous speech with babble noise added, and Fig. 10-F-1 is the waveform of the kurtosis values of the speech
with 5 dB babble noise. It can be seen that, for continuous speech mixed with different noises, the sliding-window kurtosis has a
waveform similar to that of the clean speech signal, and the kurtosis changes markedly on entering a speech segment.
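The text does not say how the 5 dB noisy signals were prepared. A common way to build such a test signal is to scale the noise so that the ratio of speech power to noise power matches the target SNR before adding it. The sketch below illustrates that approach; the helper names `mean_power` and `mix_at_snr`, the sine wave standing in for clean speech, and the Gaussian noise source are all assumptions made for illustration.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples)."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling it to the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    # Solve 10*log10(P_speech / (gain**2 * P_noise)) = snr_db for gain.
    gain = math.sqrt(mean_power(speech)
                     / (mean_power(noise) * 10.0 ** (snr_db / 10.0)))
    return [s + gain * v for s, v in zip(speech, noise)]


rng = random.Random(0)
# A sine wave stands in for a clean utterance (illustrative only).
speech = [math.sin(2.0 * math.pi * 0.01 * i) for i in range(2000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
noisy = mix_at_snr(speech, noise, 5.0)

# Recover the scaled noise and check the SNR that was actually achieved.
residual = [y - s for y, s in zip(noisy, speech)]
achieved = 10.0 * math.log10(mean_power(speech) / mean_power(residual))
```

Measuring the power over the whole signal, as done here, is only one convention; measuring it over the active speech portion alone would give a different noise gain for the same nominal SNR.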
The speech endpoint detection method based on dynamic cumulant estimation of the present invention is an endpoint detection
algorithm that uses dual thresholds on sliding-window kurtosis and energy. The sliding-window kurtosis parameter is highly sensitive
to the starting point of a speech segment and is more resistant to noise interference. In speech signal processing, noise is usually
assumed to be approximately Gaussian, so its higher-order cumulants are relatively small (the higher-order cumulants of an ideal
Gaussian distribution are zero), and signal processing methods based on higher-order cumulants therefore tend to have better
interference immunity. Accordingly, the present invention has better robustness in noisy environments.
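The Gaussian argument above can be checked numerically. The sketch below estimates the excess kurtosis of Gaussian samples and of Laplace-distributed samples. Using a Laplace distribution as a stand-in for heavy-tailed speech amplitudes is an assumption made for illustration, and the function name `excess_kurtosis` is hypothetical.

```python
import random


def excess_kurtosis(samples):
    """Sample excess kurtosis n*S4/S2**2 - 3, which is 0 for an ideal Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    s2 = sum((v - mean) ** 2 for v in samples)
    s4 = sum((v - mean) ** 4 for v in samples)
    return n * s4 / (s2 * s2) - 3.0


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# The difference of two independent exponentials is Laplace distributed.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(20000)]

k_gauss = excess_kurtosis(gaussian)
k_laplace = excess_kurtosis(laplace)
print(f"Gaussian: {k_gauss:.3f}  Laplace: {k_laplace:.3f}")
```

The Gaussian estimate should sit close to zero while the heavy-tailed one is clearly positive (the theoretical excess kurtosis of a Laplace distribution is 3), which is the gap a kurtosis threshold exploits.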
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification,
equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (7)
1. A speech endpoint detection method based on dynamic cumulant estimation, characterized by comprising the following steps:
(1) sliding-window-based recursive estimation of higher-order cumulants: a windowing operation is applied to the raw sample data, the
cumulants of the data in the window are estimated, and the data in the window are updated each time the window slides by one sample
point, thereby achieving dynamic estimation of the cumulants;
(2) endpoint detection based on sliding-window kurtosis: using the sliding-window-based recursive estimation of higher-order cumulants
of step (1), the sliding-window kurtosis and the energy feature are estimated, and endpoint detection of the speech signal is performed based on the sliding-window kurtosis and the energy feature.
2. The speech endpoint detection method based on dynamic cumulant estimation according to claim 1, characterized in that
the sliding-window-based recursive estimation of higher-order cumulants of step (1) comprises the following steps:
(11) sliding window: the raw sample data are windowed so that the higher-order cumulants can be recursively estimated over all
sample points in the window;
(12) recursive estimation: using the derived recursive formulas for higher-order cumulants, the higher-order cumulants of all
sample points in the sliding window of step (11) are estimated recursively.
3. The speech endpoint detection method based on dynamic cumulant estimation according to claim 2, characterized in that
the sliding window of step (11) intercepts the raw sample data with a rectangular window, and the data in the rectangular window are
updated each time the window slides by one sample point, so that the higher-order cumulants of all sample points in the window are estimated recursively.
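For reference, the rectangular sliding window of claims 1 to 3 can be illustrated without any recursion by recomputing the kurtosis of every window from scratch. The sketch below is such a brute-force baseline, not the recursive method claimed here. The function name `sliding_kurtosis_direct`, the use of the usual excess-kurtosis definition, and the convention of returning 0.0 for a window with zero variance are assumptions.

```python
def sliding_kurtosis_direct(x, L):
    """Excess kurtosis of every length-L rectangular window of x.

    Entry i describes the window x[i:i + L]; the window advances by one
    sample per entry. Each window is recomputed from scratch.
    """
    out = []
    for i in range(len(x) - L + 1):
        window = x[i:i + L]
        mean = sum(window) / L
        s2 = sum((v - mean) ** 2 for v in window)
        s4 = sum((v - mean) ** 4 for v in window)
        # Zero-variance windows have no defined kurtosis; report 0.0 instead.
        out.append(L * s4 / (s2 * s2) - 3.0 if s2 > 0 else 0.0)
    return out
```

This baseline costs on the order of L operations per output sample, which is the cost that recursive updating is meant to avoid; it is still useful as a reference when testing a recursive implementation.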
4. The speech endpoint detection method based on dynamic cumulant estimation according to claim 2, characterized in that
the recursive formulas for higher-order cumulants of step (12) are as follows:
Given data sets $x_a$ and $x_b$ with sample counts $n_a$ and $n_b$ respectively, the combined data set $x = \{x_a, x_b\}$ has
sample length $n = n_a + n_b$, mean $\mu$ and k-th power sum $S_k$;
the sliding-window-based recursive formulas for the 2nd- to 4th-order cumulants of the data set $x = \{x_a, x_b\}$ are expressed
as follows:
$$C_2^{(w)}(n) = \frac{S_2^{(w)}(n)}{L} \tag{6a}$$

$$C_3^{(w)}(n) = \frac{\sqrt{L}\, S_3^{(w)}(n)}{\left(\sqrt{S_2^{(w)}(n)}\right)^{3}} \tag{6b}$$

$$C_4^{(w)}(n) = \frac{L\, S_4^{(w)}(n)}{\left[S_2^{(w)}(n)\right]^{2}} - 3 \tag{6c}$$
where the sliding window has length L; at time n, the mean of the L samples in the sliding window is $\mu^{(w)}(n)$ and their k-th power sum is $S_k^{(w)}(n)$; taking $n_L = n - L$,
$$\mu^{(w)}(n) = \frac{n}{L}\left[\mu(n) - \mu(n_L)\right] + \mu(n_L) \tag{4a}$$

$$S_2^{(w)}(n) = S_2(n) - S_2(n_L) - \frac{n_L L}{n}\left[\mu^{(w)}(n) - \mu(n_L)\right]^{2} \tag{4b}$$

$$
\begin{aligned}
S_3^{(w)}(n) ={}& S_3(n) - S_3(n_L) - L\, n_L\, (n - 2L)\, \frac{\left[\mu^{(w)}(n) - \mu(n_L)\right]^{3}}{n^{2}} \\
&- 3\left[n_L S_2^{(w)}(n) - L\, S_2(n_L)\right] \frac{\left[\mu^{(w)}(n) - \mu(n_L)\right]}{n}
\end{aligned} \tag{4c}
$$
$$
\begin{aligned}
S_4^{(w)}(n) ={}& S_4(n) - S_4(n_L) - L\, n_L \left(n^{2} - 3nL + 3L^{2}\right) \frac{\left[\mu^{(w)}(n) - \mu(n_L)\right]^{4}}{n^{3}} \\
&- 6\left[L^{2} S_2(n_L) + n_L^{2}\, S_2^{(w)}(n)\right] \frac{\left[\mu^{(w)}(n) - \mu(n_L)\right]^{2}}{n^{2}} \\
&- 4\left[n_L S_3^{(w)}(n) - L\, S_3(n_L)\right] \frac{\left[\mu^{(w)}(n) - \mu(n_L)\right]}{n}
\end{aligned} \tag{4d}
$$
where $\mu(n)$ and $S_k(n)$ are respectively the mean and the k-th power sum of all historical data up to time n, and $\mu(n_L)$ and $S_k(n_L)$ are
respectively the mean and the k-th power sum of all historical data up to time $n_L$.
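The window formulas of this claim can be sketched as follows. The code derives the window statistics from the whole-history statistics at times n and n_L = n - L, then forms the 2nd- to 4th-order cumulants. It is an illustrative sketch under stated assumptions: the function names are hypothetical, n must be greater than L, the window must have non-zero variance, and the second-order term of S4 is divided by n squared, as in the standard formula for combining two data sets.

```python
import math


def central_sums(x):
    """Mean and 2nd- to 4th-order central power sums, computed directly."""
    mean = sum(x) / len(x)
    return (mean,) + tuple(sum((v - mean) ** k for v in x) for k in (2, 3, 4))


def window_from_prefixes(n, L, stats_n, stats_nl):
    """Window (mean, S2, S3, S4) from history statistics at times n and n - L."""
    nl = n - L
    mu, s2, s3, s4 = stats_n
    mu_l, s2_l, s3_l, s4_l = stats_nl
    mu_w = n / L * (mu - mu_l) + mu_l                          # (4a)
    d = mu_w - mu_l
    s2_w = s2 - s2_l - nl * L / n * d ** 2                     # (4b)
    s3_w = (s3 - s3_l
            - L * nl * (n - 2 * L) * d ** 3 / n ** 2
            - 3.0 * (nl * s2_w - L * s2_l) * d / n)            # (4c)
    s4_w = (s4 - s4_l
            - L * nl * (n * n - 3 * n * L + 3 * L * L) * d ** 4 / n ** 3
            - 6.0 * (L * L * s2_l + nl * nl * s2_w) * d ** 2 / n ** 2
            - 4.0 * (nl * s3_w - L * s3_l) * d / n)            # (4d)
    return mu_w, s2_w, s3_w, s4_w


def cumulants(L, s2_w, s3_w, s4_w):
    """Sliding-window variance, skewness and excess kurtosis."""
    c2 = s2_w / L                                # (6a)
    c3 = math.sqrt(L) * s3_w / s2_w ** 1.5       # (6b)
    c4 = L * s4_w / s2_w ** 2 - 3.0              # (6c)
    return c2, c3, c4
```

In a streaming setting the two history tuples would come from an online update and a stored history rather than from `central_sums`, which is used here only so the sketch can be checked against a direct computation over the window.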
5. The speech endpoint detection method based on dynamic cumulant estimation according to claim 4, characterized in that
the update procedure for the recursive formulas for higher-order cumulants is as follows:
(51) allocating a dynamic storage unit:
for the data set $x = \{x_a, x_b\}$ with mean $\mu$ and k-th power sum $S_k$, the mean $\mu$ and the k-th power sums $S_k$
corresponding to the sample points x(1) to x(L) are computed, and a storage unit of size 4L is allocated to hold this set of values;
where

$$S_k = \sum_{i=1}^{Len} \left(x_i - \mu_x\right)^{k} \tag{2b}$$

Len denotes the sample length, $x_i$ denotes a sample point in the sample, and $\mu_x$ denotes the mean of the sample x;
(52) initial value calculation:
using the sample value x(n) at the current time n and the mean and k-th power sums at time n-1 held in the storage unit, $\mu(n-1)$ and
$S_k(n-1)$, the initial values at the current time n, namely the mean $\mu(n)$ and the k-th power sums $S_k(n)$, are calculated:
$$\mu(n) = \mu(n-1) + \frac{1}{n}\left[x(n) - \mu(n-1)\right] \tag{7a}$$

$$S_2(n) = S_2(n-1) + \left[x(n) - \mu(n-1)\right]\left[x(n) - \mu(n)\right] \tag{7b}$$

$$
\begin{aligned}
S_3(n) ={}& S_3(n-1) + \frac{3\left[\mu(n-1) - x(n)\right] S_2(n-1)}{n} \\
&+ \frac{\left[x(n) - \mu(n-1)\right]^{3} (n-1)(n-2)}{n^{2}}
\end{aligned} \tag{7c}
$$

$$
\begin{aligned}
S_4(n) ={}& S_4(n-1) + \frac{\left[x(n) - \mu(n-1)\right]^{4} (n-1)\left(n^{2} - 3n + 3\right)}{n^{3}} \\
&+ \frac{6\left[x(n) - \mu(n-1)\right]^{2} S_2(n-1)}{n^{2}} - \frac{4\left[x(n) - \mu(n-1)\right] S_3(n-1)}{n}
\end{aligned} \tag{7d}
$$
according to the online cumulant estimation method of formulas (7a) to (7d), the mean and k-th power sums at the current time n, $\mu(n)$ and
$S_k(n)$, are obtained;
(53) sliding-window dynamic estimation:
(531) calculation of the sliding-window mean and k-th power sums $\mu^{(w)}(n)$, $S_k^{(w)}(n)$:
using the mean and k-th power sums at the current time n obtained by online estimation, $\mu(n)$ and $S_k(n)$, together with the mean
and k-th power sums at time $n_L = n - L$ held in the storage unit, $\mu(n_L)$ and $S_k(n_L)$, the sliding-window mean and k-th power sums
at the current time n, $\mu^{(w)}(n)$ and $S_k^{(w)}(n)$, are obtained according to the sliding-window dynamic cumulant estimation method of formulas (4a) to (4d);
(532) calculation of the 2nd- to 4th-order sliding-window cumulants:
using the sliding-window mean and k-th power sums at the current time n obtained in step (531), $\mu^{(w)}(n)$ and $S_k^{(w)}(n)$, the
2nd- to 4th-order sliding-window cumulants at the current time n, $C_k^{(w)}(n)$ (k = 2 to 4), are obtained according to the
sliding-window recursive cumulant formulas (6a) to (6c);
(54) dynamically updating the storage unit:
after the sliding-window mean and k-th power sums at the current time n, $\mu^{(w)}(n)$ and $S_k^{(w)}(n)$, have been calculated and before
the sliding window moves, the contents of the dynamic storage unit are updated with the values $\mu(n)$ and $S_k(n)$ obtained by the online
estimation of step (52), so that after the sliding window slides by one sample point, the dynamic storage unit still holds the means and
k-th power sums for the L time instants preceding the current time.
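The storage scheme of steps (51) to (54) can be sketched with a fixed-length queue. The code below applies the online update of formulas (7a) to (7d) to each new sample and keeps the statistics of the L most recent time instants, so that the pair needed by formulas (4a) to (4d), the statistics at times n and n - L, is always available. The names `online_update` and `stream_prefix_pairs` and the use of `collections.deque` as the storage unit are assumptions made for illustration.

```python
from collections import deque


def online_update(n, x_n, stats):
    """Fold the n-th sample (1-based) into (mean, S2, S3, S4) from time n-1."""
    mean, s2, s3, s4 = stats
    d = x_n - mean                                   # x(n) - mu(n-1)
    new_mean = mean + d / n                          # (7a)
    new_s2 = s2 + d * (x_n - new_mean)               # (7b)
    new_s3 = (s3 - 3.0 * d * s2 / n
              + d ** 3 * (n - 1) * (n - 2) / n ** 2)  # (7c)
    new_s4 = (s4 + d ** 4 * (n - 1) * (n * n - 3 * n + 3) / n ** 3
              + 6.0 * d * d * s2 / n ** 2
              - 4.0 * d * s3 / n)                    # (7d)
    return new_mean, new_s2, new_s3, new_s4


def stream_prefix_pairs(samples, L):
    """Yield (n, stats at time n, stats at time n - L) for every n > L."""
    history = deque(maxlen=L)        # statistics of the L most recent instants
    stats = (0.0, 0.0, 0.0, 0.0)
    for n, x_n in enumerate(samples, start=1):
        stats = online_update(n, x_n, stats)         # step (52)
        if len(history) == L:
            # history[0] is the oldest stored entry, i.e. time n - L.
            yield n, stats, history[0]               # inputs for step (53)
        history.append(stats)                        # step (54): oldest entry drops out
```

Each stored entry holds four numbers, so the queue occupies 4L values, matching the storage size named in step (51). The work per sample is constant and does not grow with the window length.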
6. The speech endpoint detection method based on dynamic cumulant estimation according to claim 1, characterized in that
the control algorithm of the endpoint detection based on sliding-window kurtosis of step (2) is as follows:
(201) the kurtosis value Kurtosis(n) of the speech signal is estimated with the sliding-window recursive algorithm, and the intermediate value S2(n) is recorded;
where n = 1:length(x), x is the speech signal to be processed, and length(x) is the length of the speech signal to be processed;
(202) based on the kurtosis values Kurtosis(n) and intermediate values S2(n) estimated in step (201), set the kurtosis threshold kurt
and the energy threshold amp, and go to step (203);
(203) initialize the parameters: speech-segment sample count Speechcount = 0 and non-speech-segment sample count
Nonspeechcount = 0, then go to step (204);
(204) traverse the kurtosis values Kurtosis(n) and determine whether Kurtosis(n) is greater than the kurtosis threshold kurt;
(205) if Kurtosis(n) is greater than the kurtosis threshold kurt, mark the corresponding point as the starting point
Start, search forward in time from Start, and go to step (207);
(206) if Kurtosis(n) is not greater than the kurtosis threshold kurt, advance to the next kurtosis value by setting
n = n+1, and return to step (204);
(207) compare the intermediate value S2(n) of the next point with the energy threshold amp;
(208) if the intermediate value S2(n) of the next point is greater than the energy threshold amp, the point lies in a speech segment; the
speech-segment sample count Speechcount is incremented by 1, and the procedure returns to step (207);
(209) if the intermediate value S2(n) of the next point is not greater than the energy threshold amp, the point lies in a non-speech segment;
the non-speech-segment sample count Nonspeechcount is incremented by 1, and the procedure goes to step (210);
(210) determine whether the current Nonspeechcount exceeds the maximum number of samples allowed for a non-speech segment,
Maxnonspeechcount;
(211) if the current Nonspeechcount exceeds the maximum number of samples allowed for a non-speech segment,
Maxnonspeechcount, determine whether the current speech-segment sample count Speechcount exceeds the minimum number of samples
allowed for a speech segment, Minspeechcount, and go to step (213);
(212) if the current speech-segment sample count Speechcount does not exceed the minimum number of samples allowed for a speech segment,
Minspeechcount, the point is regarded as still within the speech segment; advance to the intermediate value S2(n) of the next point by setting
n = n+1, and return to step (207);
(213) determine whether the current speech-segment sample count Speechcount exceeds the minimum number of samples allowed for a speech segment,
Minspeechcount;
(214) if the current speech-segment sample count Speechcount does not exceed the minimum number of samples allowed for a speech segment,
Minspeechcount, return to step (203), reset the speech-segment sample count Speechcount and the non-speech-segment sample count
Nonspeechcount to 0, and search for the starting point again;
(215) if the current speech-segment sample count Speechcount exceeds the minimum number of samples allowed for a speech segment,
Minspeechcount,
the point is confirmed as belonging to a speech segment, and the speech segment is the interval
[Start, Start+speechcount+nonspeechcount-1].
7. The speech endpoint detection method based on dynamic cumulant estimation according to claim 6, characterized in that
the minimum number of samples allowed for a speech segment, Minspeechcount, and the maximum number of samples allowed for a
non-speech segment, Maxnonspeechcount, are 256 and 1024 respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510222045.8A CN104810018B (en) | 2015-04-30 | 2015-04-30 | The Method of Speech Endpoint Detection based on the estimation of dynamic accumulative amount |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104810018A (en) | 2015-07-29 |
CN104810018B (en) | 2017-12-12 |
Family
ID=53694806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510222045.8A Active CN104810018B (en) | 2015-04-30 | 2015-04-30 | The Method of Speech Endpoint Detection based on the estimation of dynamic accumulative amount |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104810018B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||