CN104810018B

CN104810018B - Speech endpoint detection method based on dynamic cumulant estimation

Info

Publication number: CN104810018B
Application number: CN201510222045.8A
Authority: CN
Inventors: 吴小培; 吕钊; 罗雅琴; 张超; 周蚌艳; 张磊; 郭晓静; 高湘萍
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2015-04-30
Filing date: 2015-04-30
Publication date: 2017-12-12
Anticipated expiration: 2035-04-30
Also published as: CN104810018A

Abstract

The invention discloses a speech endpoint detection method based on dynamic cumulant estimation, comprising recursive computation of higher-order cumulants over a sliding window and endpoint detection based on sliding-window kurtosis. In the recursive computation, a rectangular window is applied to the raw sample data, the cumulants of the data inside the window are estimated, and the window contents are updated each time the window slides by one sample point, which yields a dynamic estimate of the cumulants. The endpoint detection uses the recursive computation to obtain the sliding-window kurtosis and an energy feature, and detects the endpoints of the speech signal from both. Compared with the prior art, the invention has the following advantages: the method is a double-threshold endpoint detection based on sliding-window kurtosis and energy; the sliding-window kurtosis is highly sensitive to the starting point of a speech segment and more resistant to noise interference, so the method is more robust in noisy environments.

Description

Speech endpoint detection method based on dynamic cumulant estimation

Technical field

The present invention relates to the fields of statistical data analysis and signal processing, and more particularly to a speech endpoint detection method based on dynamic cumulant estimation.

Background technology

With the continuing development of human-machine interfaces, speech recognition has become a research focus in artificial intelligence and pattern recognition. Speech is the most important and most convenient way for humans to convey information, and one of the most direct routes to human-computer interaction. Enabling a machine to recognize voice commands accurately and carry out the corresponding operations is of great practical significance, and the related research has broad application prospects in fields such as medicine, the military and industry. As the front-end processing stage of speech recognition, speech endpoint detection aims to separate the voiced segments of a speech signal from the silent segments. Efficient and accurate endpoint detection can greatly reduce the load on a speech recognition system, shorten its response time and improve its robustness. The fourth-order cumulant, i.e. kurtosis, is commonly used to measure the non-Gaussianity of a signal. In speech signal processing, noise is usually assumed to be approximately Gaussian, so its higher-order cumulants are relatively small (the higher-order cumulants of an ideal Gaussian distribution are zero). Speech processing methods based on higher-order cumulants therefore tend to have better noise immunity. However, higher-order cumulants such as kurtosis are computationally expensive and numerically less stable, which limits their use in practice.

Classical cumulant estimation algorithms are batch algorithms: both their computational load and their storage requirements are large, they are unsuitable for online processing of dynamic data, and they are sensitive to outliers in the observed data. To address these problems, online cumulant estimation algorithms have been proposed, which effectively improve dynamic estimation performance. Existing online algorithms, however, are built on the entire data history, whereas in practice the statistics of the most recent data segment are usually more informative. Moreover, because the data are non-stationary, early historical data and recent data are generally only weakly correlated. Statistical analysis over the entire history therefore not only fails to improve estimation accuracy but may even mask the true statistics of the data. In addition, during data acquisition in a real environment, randomly occurring large-amplitude outliers introduce large errors into the statistical results. Because conventional online algorithms depend on all of the signal data, the error caused by an outlier propagates strongly to later estimates.

Summary of the invention

The object of the present invention is to overcome the deficiencies of the prior art by providing a speech endpoint detection method based on dynamic cumulant estimation.

The present invention is achieved through the following technical solution: a speech endpoint detection method based on dynamic cumulant estimation, comprising the following steps:

(1) Recursive estimation of higher-order cumulants over a sliding window: a window is applied to the raw sample data, the cumulants of the data inside the window are estimated, and the window contents are updated each time the window slides by one sample point, yielding a dynamic estimate of the cumulants;

(2) Endpoint detection based on sliding-window kurtosis: using the sliding-window recursive estimation of higher-order cumulants from step (1), the sliding-window kurtosis and an energy feature are estimated, and the endpoints of the speech signal are detected from the sliding-window kurtosis and the energy feature.

As a further refinement of the above solution, the sliding-window recursive estimation of higher-order cumulants in step (1) comprises the following steps:

(11) Sliding window: a window is applied to the raw sample data so that the higher-order cumulants of all sample points inside the window can be estimated recursively;

(12) Recursive estimation: the recursive formulas for the higher-order cumulants, derived for the sliding window of step (11), are used to estimate the higher-order cumulants of all sample points inside the sliding window recursively.

As a further refinement of the above solution, the sliding window of step (11) truncates the raw sample data with a rectangular window; each time the window slides by one sample point the data inside the rectangular window are updated, so that the higher-order cumulants of all sample points inside the window are estimated recursively.

As a further refinement of the above solution, the recursive formulas for the higher-order cumulants in step (12) are as follows:

Let x_a and x_b be two data sets with n_a and n_b samples respectively. The combined data set x = {x_a, x_b} has sample length n = n_a + n_b, mean μ and k-th power sums S_k;

The sliding-window recursive formulas for the cumulants of orders 2 to 4 of the data set x = {x_a, x_b} are expressed as follows:

where L is the length of the sliding window, μ^(w)(n) and S_k^(w)(n) are the mean and the k-th power sums of the L data points inside the sliding window at time n, and n_L = n - L,

where μ(n) and S_k(n) are the mean and the k-th power sums of all historical data up to time n, and μ(n_L) and S_k(n_L) are the mean and the k-th power sums of all historical data up to time n_L.

As a further refinement of the above solution, the recursive formulas for the higher-order cumulants are evaluated with the following update procedure:

(51) Allocate the dynamic storage unit

For the data set x = {x_a, x_b} with mean μ and k-th power sums S_k, compute the mean μ and the k-th power sums S_k corresponding to the sample points x(1) to x(L), and allocate a storage unit of size 4L to hold these values;

where

Len denotes the sample length, x_i a sample point of the sample, and μ_x the mean of the sample x.

(52) Initial value calculation

Using the sample value x(n) at the current time n together with the mean μ(n-1) and k-th power sums S_k(n-1) of time n-1 held in the storage unit, compute the initial values, i.e. the mean μ(n) and the k-th power sums S_k(n) of the current time n:

S₂(n)=S₂(n-1)+[x(n)-μ(n-1)][x(n)-μ(n)] (7b)

The mean μ(n) and k-th power sums S_k(n) of the current time n are obtained with the online cumulant estimation method of formulas (7a)-(7d);

(53) Sliding-window dynamic estimation

(531) Calculation of the sliding-window mean μ^(w)(n) and k-th power sums S_k^(w)(n)

Using the mean μ(n) and k-th power sums S_k(n) of the current time n obtained by online estimation, together with the mean μ(n_L) and k-th power sums S_k(n_L) of time n_L = n - L held in the storage unit, obtain the sliding-window mean μ^(w)(n) and k-th power sums S_k^(w)(n) of the current time n with the sliding-window dynamic cumulant estimation method of formulas (4a)-(4d);

(532) Calculation of the sliding-window cumulants of orders 2 to 4

Using the sliding-window mean μ^(w)(n) and k-th power sums S_k^(w)(n) of the current time n obtained in step (531), obtain the sliding-window cumulants of orders 2 to 4 (k = 2 to 4) of the current time n with the sliding-window recursive cumulant formulas (6a)-(6c);

(54) Dynamically update the storage unit

After the sliding-window mean μ^(w)(n) and k-th power sums S_k^(w)(n) of the current time n have been calculated, and before the sliding window moves, the contents of the dynamic storage unit are updated with the values μ(n) and S_k(n) obtained by the online estimation of step (52), so that after the window has slid by one sample point the storage unit still holds the means and k-th power sums of the L time instants preceding the current time.
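Of the online update formulas (7a)-(7d) used in step (52), only (7b) is legible in this text. As a hedged reconstruction, assuming S_k is the sum of k-th powers of deviations from the mean, the widely used single-pass recurrences would read as follows. The auxiliary symbol δ is introduced here for brevity; the S_2 line is algebraically identical to (7b), while the lines for μ, S_3 and S_4 are an assumption about what (7a), (7c) and (7d) contain.

```latex
\begin{align*}
\delta &= x(n) - \mu(n-1) \\
\mu(n) &= \mu(n-1) + \frac{\delta}{n} \\
S_2(n) &= S_2(n-1) + \frac{n-1}{n}\,\delta^2
        = S_2(n-1) + [x(n)-\mu(n-1)]\,[x(n)-\mu(n)] \\
S_3(n) &= S_3(n-1) + \frac{(n-1)(n-2)}{n^2}\,\delta^3
        - \frac{3\,\delta}{n}\,S_2(n-1) \\
S_4(n) &= S_4(n-1) + \frac{(n-1)(n^2-3n+3)}{n^3}\,\delta^4
        + \frac{6\,\delta^2}{n^2}\,S_2(n-1) - \frac{4\,\delta}{n}\,S_3(n-1)
\end{align*}
```

The equality in the S_2 line follows from x(n) - μ(n) = δ - δ/n = δ(n-1)/n.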

As a further refinement of the above solution, the control algorithm of the endpoint detection based on sliding-window kurtosis in step (2) is as follows:

(201) Estimate the kurtosis value Kurtosis(n) of the speech signal with the sliding-window recursive algorithm and record the intermediate value S₂(n), where n = 1:length(x), x is the speech signal to be processed and length(x) is its length;

(202) From the kurtosis value Kurtosis(n) and intermediate value S₂(n) estimated in step (201), set the kurtosis threshold kurt and the energy threshold amp, then go to step (203);

(203) Initialize the parameters: set the minimum number of sample points allowed for a speech segment and the maximum number of sample points allowed for a blank segment, set the speech-segment sample count Speechcount = 0 and the blank-segment sample count Nonspeechcount = 0, then go to step (204);

(204) Traverse the kurtosis values Kurtosis(n) and determine whether Kurtosis(n) is greater than the kurtosis threshold kurt;

(205) Mark the point at which Kurtosis(n) exceeds the kurtosis threshold kurt as the starting point Start, search onward from Start, and go to step (207);

(206) If Kurtosis(n) is not greater than the kurtosis threshold kurt, advance the index, n = n + 1, and return to step (205);

(207) Compare the intermediate value S₂(n) of the next point with the energy threshold amp;

(208) If the intermediate value S₂(n) of the next point is greater than the energy threshold amp, the point lies in a speech segment: increment Speechcount by 1 and return to step (207);

(209) If the intermediate value S₂(n) of the next point is not greater than the energy threshold amp, the point lies in a blank segment: increment Nonspeechcount by 1 and go to step (210);

(210) Determine whether the current Nonspeechcount is greater than the maximum number of sample points allowed for a blank segment, Maxnonspeechcount;

(211) If the current Nonspeechcount is greater than Maxnonspeechcount, go to step (213) to determine whether the current Speechcount is greater than the minimum number of sample points allowed for a speech segment, Minspeechcount;

(212) If the current Speechcount does not exceed Minspeechcount, the point is regarded as still within the speech segment: advance to the intermediate value S₂(n) of the next point, n = n + 1, and return to step (207);

(213) Determine whether the current Speechcount is greater than Minspeechcount;

(214) If the current Speechcount is not greater than Minspeechcount, return to step (203), reset Speechcount and Nonspeechcount to 0, and search for a starting point again;

(215) If the current Speechcount is greater than Minspeechcount, the points are confirmed as a speech segment, whose interval is [Start, Start + Speechcount + Nonspeechcount - 1].
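The control flow of steps (203)-(215) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `detect_endpoints` is hypothetical, indices are 0-based, Nonspeechcount is accumulated over the whole search as the step list suggests, the search continues while Nonspeechcount does not exceed Maxnonspeechcount (one reading of steps (210)-(212)), scanning resumes after the examined span, and a search that reaches the end of the signal is closed there. The thresholds are passed in by the caller.

```python
def detect_endpoints(kurtosis, energy, kurt_thr, amp_thr,
                     min_speech=256, max_nonspeech=1024):
    """Double-threshold endpoint search over per-sample kurtosis and S2 energy.

    Returns a list of (start, end) index pairs, both inclusive.
    """
    segments = []
    n, total = 0, len(kurtosis)
    while n < total:
        if kurtosis[n] <= kurt_thr:        # steps (204)/(206): keep scanning
            n += 1
            continue
        start = n                          # step (205): candidate starting point
        speech = nonspeech = 0             # step (203): reset the counters
        m = start
        # steps (207)-(210): classify each following point by its energy
        while m < total and nonspeech <= max_nonspeech:
            if energy[m] > amp_thr:
                speech += 1                # step (208)
            else:
                nonspeech += 1             # step (209)
            m += 1
        if speech > min_speech:            # steps (213)/(215): accept the segment
            segments.append((start, start + speech + nonspeech - 1))
        n = m                              # assumption: resume after the span
    return segments
```

With the embodiment's settings one would call it with `min_speech=256`, `max_nonspeech=1024` and `kurt_thr = max(kurtosis) / 10`; the text does not state how the energy threshold amp is chosen, so that value is left to the caller.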

Compared with the prior art, the present invention has the following advantages. The speech endpoint detection method based on dynamic cumulant estimation is a double-threshold endpoint detection algorithm using sliding-window kurtosis and energy. The sliding-window kurtosis is highly sensitive to the starting point of a speech segment and more resistant to noise interference. In speech signal processing, noise is usually assumed to be approximately Gaussian, so its higher-order cumulants are relatively small (those of an ideal Gaussian distribution are zero); speech processing methods based on higher-order cumulants therefore tend to have better noise immunity, and the present invention is more robust in noisy environments.

In the method of the present invention, the cumulant values obtained with the sliding-window recursive calculation differ very little from those obtained with the conventional direct calculation. Tests on speech signals show that the errors between the two algorithms for the cumulants of orders 2 to 4 are on the order of 10^-15, 10^-10 and 10^-7 respectively, and can be neglected.

In practical applications, because the data are non-stationary, early historical data and recent data are generally only weakly correlated, and the statistics of the most recent data segment are usually more informative; moreover, during data acquisition in a real environment, randomly occurring large-amplitude outliers introduce large errors into the statistical results. The sliding-window recursive cumulant calculation of the present invention relies only on the sample data inside the sliding window, which effectively avoids these problems and gives the method greater practical value in real environments. The present invention can also be applied to the dynamic analysis of bioelectric signals (electroencephalogram EEG, electrooculogram EOG, electrocardiogram ECG) as well as speech signals.

Brief description of the drawings

Fig. 1 is a schematic diagram of the structure of the sliding window of the present invention.

Fig. 2 is a block diagram of the recursive algorithm for sliding-window cumulant estimation of the present invention.

Fig. 3 compares the computation time of the sliding-window recursive calculation of the present invention with that of direct calculation.

Fig. 4-A to Fig. 4-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window variance estimate of a speech signal obtained by recursive calculation and by direct calculation, together with the corresponding error curve.

Fig. 5-A to Fig. 5-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window skewness estimate of a speech signal obtained by recursive calculation and by direct calculation, together with the corresponding error curve.

Fig. 6-A to Fig. 6-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window kurtosis estimate of a speech signal obtained by recursive calculation and by direct calculation, together with the corresponding error curve.

Fig. 7 is a flow chart of the control algorithm of the endpoint detection based on sliding-window kurtosis.

Fig. 8-A to Fig. 8-C show the waveform of a measured speech signal of a preferred embodiment of the present invention and the corresponding variation of kurtosis and energy.

Fig. 9-A to Fig. 9-F-1 show isolated-word speech under different noise environments (SNR = 5 dB) and the variation of the corresponding kurtosis.

Fig. 10-A to Fig. 10-F-1 show continuous speech under different noise environments (SNR = 5 dB) and the variation of the corresponding kurtosis.

Embodiment

The embodiments of the present invention are described in detail below. The embodiments are implemented on the basis of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the scope of protection of the present invention is not limited to the following embodiments.

Fig. 1 is a schematic diagram of the structure of the sliding window of the present invention. In this embodiment, the sliding window truncates the raw sample data with a rectangular window so that data of the same length are available for processing at every time instant. The cumulants are estimated from the L sample points inside the sliding window; each time the window slides by one sample point the data inside it are updated and the cumulants are estimated again, which yields a dynamic estimate of the cumulants.
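As a point of reference for the rectangular sliding window just described, the sketch below recomputes the window statistics from scratch at every position, which is the direct calculation that the recursive method is later compared against. It is a minimal illustration: the function name `direct_window_stats` is hypothetical, and S_2 to S_4 are taken to be sums of powers of deviations from the window mean.

```python
def direct_window_stats(x, L):
    """Directly compute (mean, S2, S3, S4) for every length-L rectangular window.

    The i-th entry of the result describes the window x[i : i + L]; the cost
    per window grows in proportion to L.
    """
    out = []
    for start in range(len(x) - L + 1):
        w = x[start:start + L]                  # data currently inside the window
        mu = sum(w) / L
        s2 = sum((v - mu) ** 2 for v in w)
        s3 = sum((v - mu) ** 3 for v in w)
        s4 = sum((v - mu) ** 4 for v in w)
        out.append((mu, s2, s3, s4))
    return out


# Example: three window positions for a five-sample signal and L = 3.
stats = direct_window_stats([1.0, 2.0, 4.0, 7.0, 11.0], L=3)
```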

For any two data sets x_a and x_b with n_a and n_b samples respectively, the combined data set x = {x_a, x_b} has sample length n = n_a + n_b, and its mean μ and k-th power sums S_k are:

where μ and S_k denote the mean and the k-th power sums of the sample x, and the superscripts (a) and (b) indicate that a statistic is computed from the data sets x_a and x_b respectively. μ and S_k are defined as follows:

where Len denotes the sample length, x_i a sample point of the sample, and μ_x the mean of the sample x. Here μ denotes the mean; for convenience, S_k is referred to below as the k-th power sum.
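The formula images for (1a)-(1d) and (2a)-(2b) are not reproduced in this text. As a hedged reconstruction, assuming S_k is the sum of k-th powers of deviations from the mean (which is what formula (7b) implies for k = 2), the definitions and the standard pairwise combination identities would read as follows. The auxiliary symbol δ is introduced here for brevity and is not taken from the original, whose formulas may be arranged differently.

```latex
\begin{align*}
\mu &= \frac{1}{Len}\sum_{i=1}^{Len} x_i, \qquad
S_k = \sum_{i=1}^{Len} (x_i - \mu_x)^k, \quad k = 2, 3, 4 \\[6pt]
\delta &= \mu^{(b)} - \mu^{(a)}, \qquad n = n_a + n_b \\
\mu &= \mu^{(a)} + \frac{n_b}{n}\,\delta \\
S_2 &= S_2^{(a)} + S_2^{(b)} + \frac{n_a n_b}{n}\,\delta^2 \\
S_3 &= S_3^{(a)} + S_3^{(b)} + \frac{n_a n_b (n_a - n_b)}{n^2}\,\delta^3
      + \frac{3\,\delta}{n}\left(n_a S_2^{(b)} - n_b S_2^{(a)}\right) \\
S_4 &= S_4^{(a)} + S_4^{(b)}
      + \frac{n_a n_b \left(n_a^2 - n_a n_b + n_b^2\right)}{n^3}\,\delta^4
      + \frac{6\,\delta^2}{n^2}\left(n_a^2 S_2^{(b)} + n_b^2 S_2^{(a)}\right)
      + \frac{4\,\delta}{n}\left(n_a S_3^{(b)} - n_b S_3^{(a)}\right)
\end{align*}
```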

Following the sliding-window idea, a window is applied to the raw sample data and the cumulants are estimated from the sample data inside the sliding window. In view of formulas (1a)-(1d), sliding-window estimation can be regarded as cumulant estimation over the data of the sample set x_b. Formulas (1a)-(1d) are rewritten in the following form:

Let L be the length of the sliding window, and let μ^(w)(n) and S_k^(w)(n) be the mean and the k-th power sums of the L data points inside the sliding window at time n. For convenience, let n_L = n - L. The data inside the sliding window then correspond to the sample set x_b, and all historical data before the sliding window correspond to the sample set x_a, so formulas (3a)-(3d) are rewritten as follows:

where μ(n) and S_k(n) are the mean and the k-th power sums of all historical data up to time n, and μ(n_L) and S_k(n_L) are those of all historical data up to time n_L.
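The step from formulas (1a)-(1d) to (4a)-(4d) amounts to solving the combination relations for the statistics of x_b, the data inside the window. A minimal Python sketch follows, assuming S_k is the sum of k-th powers of deviations from the mean and using the standard pairwise combination identities rearranged for the trailing segment. The function names are hypothetical, and the exact form of (4a)-(4d) in the original may differ.

```python
def central_sums(data):
    """Direct computation of (count, mean, S2, S3, S4), used here as a reference."""
    n = len(data)
    mu = sum(data) / n
    return (n, mu) + tuple(sum((v - mu) ** k for v in data) for k in (2, 3, 4))


def window_from_prefixes(total, head):
    """Statistics of the trailing segment x_b, given those of x = {x_a, x_b} and of x_a.

    total = (n, mu(n), S2(n), S3(n), S4(n)) for all data up to time n;
    head  = the same five quantities for all data up to time n_L = n - L.
    """
    n, mu, s2, s3, s4 = total
    na, mu_a, s2a, s3a, s4a = head
    nb = n - na                                   # window length L
    mu_b = (n * mu - na * mu_a) / nb              # window mean
    d = mu_b - mu_a
    s2b = s2 - s2a - na * nb * d**2 / n
    s3b = (s3 - s3a - na * nb * (na - nb) * d**3 / n**2
           - 3 * d * (na * s2b - nb * s2a) / n)
    s4b = (s4 - s4a - na * nb * (na**2 - na * nb + nb**2) * d**4 / n**3
           - 6 * d**2 * (na**2 * s2b + nb**2 * s2a) / n**2
           - 4 * d * (na * s3b - nb * s3a) / n)
    return nb, mu_b, s2b, s3b, s4b


# Example: window of the last 3 samples, recovered from two prefix summaries.
x = [0.5, -1.2, 3.3, 0.1, 2.4, -0.7, 1.9, 0.0]
window = window_from_prefixes(central_sums(x), central_sums(x[:5]))
```

Each call uses a fixed number of arithmetic operations regardless of the window length. Because the window values are obtained as differences of running totals, rounding error can grow on very long signals; that is a general property of this kind of subtraction rather than something stated in the text.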

μ and S_k are related to the cumulants of orders 2 to 4 as follows:

where σ², C₃ and C₄ denote the variance, skewness and kurtosis of the sample respectively.

From the relations between the cumulants of orders 2 to 4 and S_k given by formulas (5a)-(5c), the sliding-window recursive cumulant formulas are obtained as follows:

These formulas give the sliding-window cumulants of orders 2 to 4.
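Formulas (5a)-(5c) are not legible in this text, so the exact normalisation they use is unknown. The sketch below shows one common convention for turning the power sums into variance, skewness and kurtosis: population normalisation (division by the sample count) and excess kurtosis, which is zero for a Gaussian. Both choices are assumptions, as is the function name `cumulants_from_sums`.

```python
def cumulants_from_sums(n, s2, s3, s4):
    """Variance, skewness and excess kurtosis from the power sums S2, S3, S4.

    Assumed convention: moments are divided by n (not n - 1), and 3 is
    subtracted from the normalised fourth moment. Requires s2 > 0.
    """
    var = s2 / n
    skew = (s3 / n) / var ** 1.5
    kurt = (s4 / n) / var ** 2 - 3.0
    return var, skew, kurt


# Example: the symmetric sample [-1, 1, -1, 1] has S2 = 4, S3 = 0, S4 = 4.
var, skew, kurt = cumulants_from_sums(4, 4.0, 0.0, 4.0)
```

Applied to the window quantities S_k^(w)(n) with n replaced by the window length L, the same expressions would give the sliding-window values.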

Fig. 2 is a block diagram of the recursive algorithm for sliding-window cumulant estimation of the present invention. In this embodiment the recursive algorithm is carried out in the following steps:

Step 1: using direct calculation with formulas (2a)-(2b), compute the mean μ and the k-th power sums S_k for the first to the L-th sample points, i.e. the sample data inside the window at its starting position, and store them.

Step 2, initial value calculation: the sliding window slides by one sample point,

S₂(n)=S₂(n-1)+[x(n)-μ(n-1)][x(n)-μ(n)] (7b)

Step 3, sliding-window dynamic estimation: following the sliding-window recursive cumulant algorithm of formulas (4a)-(4d), use the mean μ(n) and k-th power sums S_k(n) of the current time n solved in Step 2, together with the stored mean μ(n_L) and k-th power sums S_k(n_L) of time n_L = n - L, to solve for the sliding-window mean μ^(w)(n) and sliding-window k-th power sums S_k^(w)(n) of the current time n. The values μ(n) and S_k(n) are then saved to update the data in the storage unit, so that after the window has slid by one sample point the dynamic storage unit still holds the means and k-th power sums of the L time instants preceding the current time. The cumulants of orders 2 to 4 (k = 2 to 4) are then obtained from formulas (6a)-(6c).

Step 4, update of the storage unit contents: the mean μ(n) and k-th power sums S_k(n) of the current time n calculated in the online estimation module are used to update the contents of the storage unit, so that after the window has slid by one sample point the storage unit still holds the means and k-th power sums for the L sample points preceding the current time.
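Steps 1 to 4 can be put together in a short Python sketch. It is an illustration under assumptions rather than the patented implementation: the online update uses the common single-pass recurrences (its S_2 line matches (7b)), the window statistics come from the standard pairwise identities, kurtosis is taken as excess kurtosis with population normalisation, and the first window is obtained from an empty prefix instead of the direct calculation of Step 1. All function names are hypothetical.

```python
from collections import deque


def online_update(state, x):
    """Fold sample x into (n, mean, S2, S3, S4) with single-pass recurrences."""
    n0, mu, s2, s3, s4 = state
    n = n0 + 1
    d = x - mu
    dn = d / n
    t = d * dn * n0        # equals [x - mu(n-1)] * [x - mu(n)], as in (7b)
    s4 += t * dn * dn * (n * n - 3 * n + 3) + 6 * dn * dn * s2 - 4 * dn * s3
    s3 += t * dn * (n - 2) - 3 * dn * s2
    s2 += t
    return n, mu + dn, s2, s3, s4


def window_sums(total, head):
    """Mean and power sums of the last L samples from prefix statistics."""
    n, mu, s2, s3, s4 = total
    na, mu_a, s2a, s3a, s4a = head
    nb = n - na
    mu_b = (n * mu - na * mu_a) / nb
    d = mu_b - mu_a
    s2b = s2 - s2a - na * nb * d**2 / n
    s3b = (s3 - s3a - na * nb * (na - nb) * d**3 / n**2
           - 3 * d * (na * s2b - nb * s2a) / n)
    s4b = (s4 - s4a - na * nb * (na**2 - na * nb + nb**2) * d**4 / n**3
           - 6 * d**2 * (na**2 * s2b + nb**2 * s2a) / n**2
           - 4 * d * (na * s3b - nb * s3a) / n)
    return mu_b, s2b, s3b, s4b


def sliding_kurtosis(x, L):
    """Return (kurtosis, S2) lists, one entry per window ending at samples L..len(x)."""
    state = (0, 0.0, 0.0, 0.0, 0.0)
    store = deque([state], maxlen=L)    # prefix statistics of the last L instants
    kurt, energy = [], []
    for sample in x:
        state = online_update(state, sample)               # Step 2
        if state[0] >= L:
            _, s2w, _, s4w = window_sums(state, store[0])  # Step 3: store[0] is time n - L
            var = s2w / L
            # A constant window has no defined kurtosis; 0.0 is a placeholder choice.
            kurt.append((s4w / L) / var**2 - 3.0 if var > 0 else 0.0)
            energy.append(s2w)          # S2 doubles as the energy feature
        store.append(state)             # Step 4: overwrite the oldest entry
    return kurt, energy
```

The bounded `deque` plays the role of the dynamic storage unit: it keeps L tuples of prefix statistics and drops the oldest one whenever a new one is appended, so the work per sample does not depend on L.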

The sliding-window recursive cumulant calculation of the present invention is computationally efficient. Because the cumulants are solved dynamically with a recursive algorithm, the number of operations is greatly reduced and efficiency is markedly better than conventional direct calculation. Table 1 lists the numbers of additions and multiplications needed to estimate μ^(w)(n) and S_k^(w)(n) by recursive calculation based on formulas (5a)-(5c) and by direct calculation based on formulas (2a)-(2b).

Table 1 Comparison of computational complexity

As Table 1 shows, the number of operations of direct calculation is proportional to L, whereas that of recursive calculation is independent of the sliding-window length. When measured data are analysed statistically, L is usually large, so the recursive algorithm has a clear speed advantage over direct calculation.

Fig. 3 compares the computation time of the sliding-window recursive calculation of the present invention with that of direct calculation. The figure shows that, for the same sample data, recursive calculation takes far less time than direct calculation. The computation time of direct calculation rises steeply with the amount of sample data, within the range of 1 to 10; when the number of sample points is 100000 it is close to 40 seconds. The computation time of the recursive algorithm stays within the range of 10^-2 to 10^-1, far below that of direct calculation.

Fig. 4-A to Fig. 4-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window variance estimate of a speech signal obtained by recursive calculation and by direct calculation, together with the corresponding error curve, each plotted as amplitude on the vertical axis. Fig. 4-A is the waveform of a measured speech signal, Fig. 4-B is the waveform of the variance obtained by recursive calculation, Fig. 4-C is the waveform of the variance obtained by direct calculation, and Fig. 4-D is the waveform of the error between the two calculation methods. The figures show that the error between the two is on the order of 10^-15 and can be neglected.

Fig. 5-A to Fig. 5-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window skewness estimate of a speech signal obtained by recursive calculation and by direct calculation, together with the corresponding error curve, each plotted as amplitude on the vertical axis. Fig. 5-A is the waveform of a measured speech signal, Fig. 5-B is the waveform of the skewness obtained by recursive calculation, Fig. 5-C is the waveform of the skewness obtained by direct calculation, and Fig. 5-D is the waveform of the error between the two calculation methods. The figures show that the error between the two is on the order of 10^-10 and can be neglected.

Fig. 6-A to Fig. 6-D show, for a preferred embodiment of the present invention, the dynamic waveforms of the sliding-window kurtosis estimate of a speech signal obtained by recursive calculation and by direct calculation, together with the corresponding error curve, each plotted as amplitude on the vertical axis. Fig. 6-A is the waveform of a measured speech signal, Fig. 6-B is the waveform of the kurtosis obtained by recursive calculation, Fig. 6-C is the waveform of the kurtosis obtained by direct calculation, and Fig. 6-D is the waveform of the error between the two calculation methods. The figures show that the error between the two is on the order of 10^-7 and can be neglected.

Fig. 7 is a flow chart of the control algorithm of the endpoint detection based on sliding-window kurtosis in a preferred embodiment of the present invention. As formula (4d) shows, the intermediate value S₂ produced during the recursive calculation of sliding-window kurtosis can serve as the energy feature of the signal inside the sliding window, so the energy feature needed for detection requires no extra computation. The parameters shown in the figure are defined as follows: Speechcount is the speech-segment sample count; Nonspeechcount is the blank-segment sample count; Minspeechcount is the minimum number of sample points allowed for a speech segment; Maxnonspeechcount is the maximum number of sample points allowed for a blank segment. Minspeechcount and Maxnonspeechcount are set to 256 and 1024 respectively. For a given speech signal the kurtosis threshold is set empirically; in the experiments the kurtosis threshold kurt was calculated with the formula threshold = max(Kurtosis)/10.

The control algorithm of the endpoint detection based on sliding-window kurtosis of the present invention comprises the following steps:

(201) Estimate the kurtosis value Kurtosis(n) of the speech signal with the sliding-window recursive algorithm and record the intermediate value S₂(n);

where n = 1:length(x), x is the speech signal to be processed and length(x) is its length;

(203) Initialize the parameters, then go to step (204):

speech-segment sample count Speechcount = 0; blank-segment sample count Nonspeechcount = 0;

the minimum number of sample points allowed for a speech segment, Minspeechcount, and the maximum number of sample points allowed for a blank segment, Maxnonspeechcount, are set to 256 and 1024 respectively;

(205) Mark the point at which Kurtosis(n) exceeds the kurtosis threshold kurt as the starting point Start, search onward from Start, and go to step (207);

Fig. 8-A to Fig. 8-C show the waveform of a measured speech signal of a preferred embodiment of the present invention and the corresponding variation of kurtosis and energy. Fig. 8-A is the waveform of the original measured speech signal. Fig. 8-B is the waveform of the corresponding kurtosis; it shows that the kurtosis changes markedly on entering a speech segment, which can therefore serve as the criterion for detecting the starting point of a speech segment. Fig. 8-C shows the corresponding variation of energy; the energy waveform reflects the overall change in energy between speech and non-speech segments. Intuitively, combining the two helps to improve the accuracy of speech-segment detection.

Fig. 9-A to Fig. 9-F-1 show isolated-word speech under different noise environments (SNR = 5 dB) and the corresponding kurtosis. This example gives the waveform of an isolated-word utterance, its waveforms under white, pink, m109, f16 and babble noise at 5 dB, and the corresponding sliding-window kurtosis, each plotted as amplitude on the vertical axis. Fig. 9-A is the waveform of the original isolated-word speech signal, and Fig. 9-A-1 is the waveform of the corresponding kurtosis. Fig. 9-B is the waveform of the isolated-word speech with white noise added, and Fig. 9-B-1 is the waveform of the kurtosis of the speech with 5 dB white noise. Fig. 9-C is the waveform of the isolated-word speech with pink noise added, and Fig. 9-C-1 is the waveform of the kurtosis of the speech with 5 dB pink noise. Fig. 9-D is the waveform of the isolated-word speech with m109 noise added, and Fig. 9-D-1 is the waveform of the kurtosis of the speech with 5 dB m109 noise. Fig. 9-E is the waveform of the isolated-word speech with f16 noise added, and Fig. 9-E-1 is the waveform of the kurtosis of the speech with 5 dB f16 noise. Fig. 9-F is the waveform of the isolated-word speech with babble noise added, and Fig. 9-F-1 is the waveform of the kurtosis of the speech with 5 dB babble noise. The figures show that, for isolated-word speech mixed with different noises, the sliding-window kurtosis has a waveform similar to that of the clean speech signal, and the kurtosis changes markedly on entering a speech segment.

Fig. 10-A to Fig. 10-F-1 show continuous speech under different noise environments (SNR = 5 dB) and the variation of the corresponding kurtosis. This example gives the waveform of a continuous-speech utterance, its waveforms under white, pink, m109, f16 and babble noise at 5 dB, and the corresponding sliding-window kurtosis, each plotted as amplitude on the vertical axis. Fig. 10-A is the waveform of the original continuous speech signal, and Fig. 10-A-1 is the waveform of the corresponding kurtosis. Fig. 10-B is the waveform of the continuous speech with white noise added, and Fig. 10-B-1 is the waveform of the kurtosis of the speech with 5 dB white noise. Fig. 10-C is the waveform of the continuous speech with pink noise added, and Fig. 10-C-1 is the waveform of the kurtosis of the speech with 5 dB pink noise. Fig. 10-D is the waveform of the continuous speech with m109 noise added, and Fig. 10-D-1 is the waveform of the kurtosis of the speech with 5 dB m109 noise. Fig. 10-E is the waveform of the continuous speech with f16 noise added, and Fig. 10-E-1 is the waveform of the kurtosis of the speech with 5 dB f16 noise. Fig. 10-F is the waveform of the continuous speech with babble noise added, and Fig. 10-F-1 is the waveform of the kurtosis of the speech with 5 dB babble noise. The figures show that, for continuous speech mixed with different noises, the sliding-window kurtosis has a waveform similar to that of the clean speech signal, and the kurtosis changes markedly on entering a speech segment.

The speech endpoint detection method based on dynamic cumulant estimation in the present invention is an endpoint detection algorithm using dual thresholds on sliding-window kurtosis and energy. The sliding-window kurtosis parameter is highly sensitive to the starting point of a speech segment and has better immunity to noise. In speech signal processing, noise is usually assumed to approximately follow a Gaussian distribution, whose higher-order cumulants are relatively small (for an ideal Gaussian distribution, all cumulants of order higher than two are zero), so signal processing methods based on higher-order cumulants often have better anti-interference performance. The present invention therefore has better robustness in noisy environments.
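As an illustration of the Gaussian assumption above, the following minimal Python sketch compares the sample excess kurtosis (the fourth-order cumulant normalized by the squared second-order cumulant) of Gaussian samples with that of Laplacian samples. The Laplacian distribution is only an assumed stand-in for a heavy-tailed, speech-like amplitude distribution; the function name `excess_kurtosis`, the seed, and the sample size are hypothetical choices and are not taken from the patent.

```python
import random


def excess_kurtosis(samples):
    """Sample excess kurtosis m4 / m2**2 - 3, which is near 0 for Gaussian data."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((v - mean) ** 2 for v in samples) / n
    m4 = sum((v - mean) ** 4 for v in samples) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(0)

# Gaussian samples model the noise assumption stated in the text.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(50000)]

# Laplacian samples (exponential magnitude with a random sign) are an
# assumed stand-in for a heavy-tailed, speech-like signal.
laplacian = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(50000)]

print("Gaussian excess kurtosis :", excess_kurtosis(gaussian))
print("Laplacian excess kurtosis:", excess_kurtosis(laplacian))
```

Under these assumptions the Gaussian estimate should lie close to 0 and the Laplacian estimate close to its theoretical value of 3, which is the kind of gap a kurtosis threshold can exploit.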

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims

1. A speech endpoint detection method based on dynamic cumulant estimation, characterized in that it comprises the following steps:

(1) Recursive estimation of higher-order cumulants based on a sliding window: a windowing operation is applied to the raw sample data, the cumulants of the data in the window are estimated, and the data in the window are updated each time the window slides by one sample point, realizing dynamic estimation of the cumulants;

(2) Endpoint detection based on sliding-window kurtosis: using the sliding-window recursive estimation of higher-order cumulants of step (1), the sliding-window kurtosis and the energy feature are estimated, and endpoint detection of the speech signal is performed based on the sliding-window kurtosis and the energy feature.
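The two features named in claim 1 can be sketched, for reference, by recomputing each window from scratch. This is not the recursive estimator of the claim, only a brute-force baseline against which a recursive implementation could be checked. It assumes that kurtosis means excess kurtosis and that the energy feature is the window's second-order sum of squared deviations; the function name `sliding_features` is hypothetical.

```python
def sliding_features(x, L):
    """Return (kurtosis, energy) lists with one entry per full window x[n-L:n]."""
    kurtosis, energy = [], []
    for n in range(L, len(x) + 1):
        window = x[n - L:n]              # rectangular window of length L
        mean = sum(window) / L
        s2 = sum((v - mean) ** 2 for v in window)
        s4 = sum((v - mean) ** 4 for v in window)
        # Excess kurtosis (s4 / L) / (s2 / L)**2 - 3; a constant window has
        # no spread, so 0 is reported there to avoid dividing by zero.
        kurtosis.append(L * s4 / (s2 * s2) - 3.0 if s2 > 0 else 0.0)
        energy.append(s2)
    return kurtosis, energy


k, e = sliding_features([0.0, 0.0, 0.0, 0.0, 3.0, -3.0, 0.5, -0.5], 4)
print(k)
print(e)
```

Each slide of one sample costs O(L) operations here, which is the cost the recursive estimation of step (1) is meant to avoid.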

2. The speech endpoint detection method based on dynamic cumulant estimation according to claim 1, characterized in that the sliding-window recursive estimation of higher-order cumulants of step (1) comprises the following steps:

(11) Sliding window: the raw sample data are windowed to realize recursive estimation of the higher-order cumulants for all sample points in the window;

3. The speech endpoint detection method based on dynamic cumulant estimation according to claim 2, characterized in that the sliding window of step (11) intercepts the raw sample data with a rectangular window and updates the data in the rectangular window each time it slides by one sample point, realizing recursive estimation of the higher-order cumulants for all sample points in the window.

4. The speech endpoint detection method based on dynamic cumulant estimation according to claim 2, characterized in that the recursive formulas for the higher-order cumulants of step (12) are as follows:

Given data sets x_a and x_b with sample counts n_a and n_b respectively, the combined data set x = {x_a, x_b} has sample length n = n_a + n_b, mean μ, and k-th power sum S_k;

The sliding-window recursive formulas for the 2nd- to 4th-order cumulants of the data set x = {x_a, x_b} are expressed as follows:

Wherein the sliding window has length L; at time n, the mean of the L data in the sliding window is μ^(w)(n) and the k-th power sum is S_k^(w)(n); let n-L = n_L,

$$
\begin{aligned}
S_3^{(w)}(n) ={}& S_3(n) - S_3(n_L) - L\,n_L\,(n-2L)\,\frac{\left[\mu^{(w)}(n)-\mu(n_L)\right]^3}{n^2} \\
& - 3\left[n_L S_2^{(w)} - L\,S_2(n_L)\right]\frac{\left[\mu^{(w)}(n)-\mu(n_L)\right]}{n}
\end{aligned}
\qquad (4c)
$$

Wherein μ(n) and S_k(n) are respectively the mean and k-th power sum of all historical data up to time n, and μ(n_L) and S_k(n_L) are respectively the mean and k-th power sum of all historical data up to time n_L.
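Formula (4c) can be checked numerically with a short Python sketch. The helper below obtains the statistics of the first n and first n_L samples directly, recovers the window mean and second-order sum, and then applies (4c). The expressions used for the window mean and for S_2^(w) are not quoted in this claim; they are assumed here from the standard formula for merging the statistics of two data sets. The names `central_sums` and `window_s3` are hypothetical.

```python
def central_sums(data):
    """Return (mean, S2, S3) with S_k = sum((x - mean)**k), as in formula (2b)."""
    n = len(data)
    mean = sum(data) / n
    s2 = sum((v - mean) ** 2 for v in data)
    s3 = sum((v - mean) ** 3 for v in data)
    return mean, s2, s3


def window_s3(x, n, L):
    """S3 of the window x[n-L:n] from the statistics of x[:n] and x[:n-L].

    Requires n - L >= 1 so that the history before the window is not empty.
    """
    n_L = n - L
    mu_n, s2_n, s3_n = central_sums(x[:n])
    mu_nL, s2_nL, s3_nL = central_sums(x[:n_L])

    # Assumed companion formulas for the window mean and S2 (not quoted in
    # the claim), derived from the two-set merge formula.
    mu_w = (n * mu_n - n_L * mu_nL) / L
    delta = mu_w - mu_nL
    s2_w = s2_n - s2_nL - L * n_L * delta ** 2 / n

    # Formula (4c).
    return (s3_n - s3_nL
            - L * n_L * (n - 2 * L) * delta ** 3 / n ** 2
            - 3 * (n_L * s2_w - L * s2_nL) * delta / n)


x = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 9.0]
print(window_s3(x, 8, 3), central_sums(x[5:8])[2])
```

The two printed values should agree up to floating-point rounding, since (4c) is the two-set merge formula for the third-order sum solved for the window term.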

5. The speech endpoint detection method based on dynamic cumulant estimation according to claim 4, characterized in that the update procedure for the recursive higher-order cumulant formulas is as follows:

(51) Allocating a dynamic storage unit:

With the mean μ and k-th power sum S_k defined for the data set x = {x_a, x_b}, compute the mean μ and k-th power sums S_k corresponding to sample points x(1) to x(L), and allocate a storage unit of size 4L to hold these values;

Wherein,

$$S_k = \sum_{i=1}^{Len} \left(x_i - \mu_x\right)^k \qquad (2b)$$

Len denotes the sample length, x_i denotes a sample point in the sample, and μ_x denotes the mean of the sample x;

(52) Initial value calculation:

Using the sample value x(n) at the current time n and the mean μ(n-1) and k-th power sums S_k(n-1) of time n-1 held in the storage unit, compute the initial values, namely the mean μ(n) and k-th power sums S_k(n) at the current time n:

S₂(n)=S₂(n-1)+[x(n)-μ(n-1)][x(n)-μ(n)] (7b)

According to the online cumulant estimation method of formulas (7a)-(7d), obtain the mean μ(n) and k-th power sums S_k(n) at the current time n;

(53) Sliding-window dynamic estimation:

(531) Calculation of the sliding-window mean μ^(w)(n) and k-th power sums S_k^(w)(n):

Using the mean μ(n) and k-th power sums S_k(n) at the current time n obtained in the online estimation, and the mean μ(n_L) and k-th power sums S_k(n_L) at time n_L = n-L held in the storage unit, obtain the sliding-window mean μ^(w)(n) and k-th power sums S_k^(w)(n) at the current time n according to the sliding-window dynamic cumulant estimation method of formulas (4a)-(4d);

(532) Calculation of the sliding-window 2nd- to 4th-order cumulants:

Using the sliding-window mean μ^(w)(n) and k-th power sums S_k^(w)(n) at the current time n obtained in step (531), obtain the sliding-window cumulants of order k = 2 to 4 at the current time n according to the sliding-window recursive cumulant formulas (6a)-(6c);

(54) Dynamic update of the storage unit:

After the sliding-window mean μ^(w)(n) and k-th power sums S_k^(w)(n) at the current time n have been calculated, and before the sliding window moves, the contents of the dynamic storage unit are updated with the values μ(n), S_k(n) obtained in the online estimation of step (52), so that after the sliding window slides by one sample point, the dynamic storage unit still holds the L most recent mean and k-th power sum values preceding the current time.
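The update cycle of steps (52), (531), and (54) can be sketched in Python as follows. For brevity the sketch tracks only the mean and S₂, so its store holds 2L values rather than the 4L of the claim; the third- and fourth-order sums would be carried in the same way. The running-mean update and the window formulas for the mean and S₂ are assumed from the standard online and two-set merge formulas, since (7a), (4a), and (4b) are not quoted in the claims. The name `sliding_mean_s2` and the use of `collections.deque` as the storage unit are hypothetical choices.

```python
from collections import deque


def sliding_mean_s2(x, L):
    """Yield (n, window_mean, window_S2) for every n > L.

    The store keeps the running (mean, S2) of x[:m] for the L most recent m,
    so its oldest entry always belongs to time n_L = n - L.
    """
    store = deque(maxlen=L)
    mean, s2 = 0.0, 0.0
    for n, value in enumerate(x, start=1):
        # Step (52): online update of the running statistics.
        new_mean = mean + (value - mean) / n            # assumed form of (7a)
        s2 = s2 + (value - mean) * (value - new_mean)   # formula (7b)
        mean = new_mean

        if len(store) == L:
            # Step (531): window statistics from times n and n_L = n - L.
            mean_nL, s2_nL = store[0]
            n_L = n - L
            mean_w = (n * mean - n_L * mean_nL) / L
            delta = mean_w - mean_nL
            s2_w = s2 - s2_nL - L * n_L * delta ** 2 / n
            yield n, mean_w, s2_w

        # Step (54): store the newest statistics; deque drops the oldest entry.
        store.append((mean, s2))


signal = [0.5, -1.0, 2.0, 0.0, 3.5, -2.5, 1.0, 4.0, -0.5]
for n, mean_w, s2_w in sliding_mean_s2(signal, 4):
    print(n, mean_w, s2_w)
```

Each new sample then costs a constant number of operations regardless of L, at the price of keeping L past statistics in memory.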

6. The speech endpoint detection method based on dynamic cumulant estimation according to claim 1, characterized in that the control algorithm of the endpoint detection based on sliding-window kurtosis of step (2) is as follows:

(201) Estimate the kurtosis value Kurtosis(n) of the speech signal with the sliding-window recursive algorithm, and record the intermediate value S₂(n), where n = 1:length(x), x is the speech signal to be processed, and length(x) is the length of the speech signal to be processed;

(202) Set the kurtosis threshold kurt and the energy threshold amp according to the kurtosis value Kurtosis(n) and intermediate value S₂(n) estimated in step (201), then go to step (203);

(203) Initialize the parameters: speech-segment sample count Speechcount = 0 and silent-segment sample count Nonspeechcount = 0, then go to step (204);

(206) If the kurtosis value Kurtosis(n) is not greater than the kurtosis threshold kurt, advance the loop index, n = n+1, and return to step (25);

(208) If the intermediate value S₂(n) of the next point is greater than the energy threshold amp, the point lies in a speech segment; the speech-segment sample count Speechcount is incremented by 1, and the procedure returns to step (207);

(209) If the intermediate value S₂(n) of the next point is not greater than the energy threshold amp, the point lies in a silent segment; the silent-segment sample count Nonspeechcount is incremented by 1, and the procedure goes to step (300);

(213) Judge whether the current speech-segment sample count Speechcount is greater than the minimum number of sample points Minspeechcount allowed for a speech segment;

(215) If the current speech-segment sample count Speechcount is greater than the minimum number of sample points Minspeechcount allowed for a speech segment, the point is retained as part of the speech segment, where the speech segment interval is

[Start, Start+speechcount+nonspeechcount-1].

7. The speech endpoint detection method based on dynamic cumulant estimation according to claim 6, characterized in that the minimum number of sample points Minspeechcount allowed for a speech segment and the maximum number of sample points Maxnonspeechcount allowed for a silent segment are 256 and 1024 respectively.
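The dual-threshold logic of claims 6 and 7 can be sketched in Python as follows. Several steps of claim 6 are not reproduced in the text above, so the control flow here is an assumption: a segment is taken to start where the kurtosis first exceeds kurt, both counters are taken to accumulate over the whole segment, and the segment is taken to close once Nonspeechcount exceeds Maxnonspeechcount. The function name `detect_segments` and the toy input are hypothetical; only the threshold names, the default values 256 and 1024, and the interval formula come from the claims.

```python
def detect_segments(kurtosis, energy, kurt, amp,
                    min_speech=256, max_nonspeech=1024):
    """Return a list of (start, end) sample indices, both inclusive."""
    segments = []
    n = 0
    total = len(kurtosis)
    while n < total:
        if kurtosis[n] <= kurt:
            n += 1                       # step (206): no onset yet, move on
            continue
        # Assumed onset rule: the first point whose kurtosis exceeds kurt.
        start, speech, nonspeech = n, 0, 0
        while n < total and nonspeech <= max_nonspeech:
            if energy[n] > amp:
                speech += 1              # step (208): point in a speech segment
            else:
                nonspeech += 1           # step (209): point in a silent segment
            n += 1
        if speech > min_speech:          # steps (213) and (215)
            segments.append((start, start + speech + nonspeech - 1))
    return segments


# Toy input with small counts so the behaviour is easy to trace by hand.
toy_kurtosis = [0.0] * 5 + [10.0] + [0.0] * 14
toy_energy = [0.0] * 5 + [5.0] * 6 + [0.0] * 9
print(detect_segments(toy_kurtosis, toy_energy, kurt=3.0, amp=1.0,
                      min_speech=3, max_nonspeech=4))
```

In this sketch the kurtosis threshold only triggers the onset, while the energy threshold decides how long the segment continues, which reflects the stated sensitivity of the sliding-window kurtosis to the starting point of a speech segment.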