CN102760444A - Support vector machine based classification method of base-band time-domain voice-frequency signal - Google Patents


Info

Publication number
CN102760444A
CN102760444A (application CN201210125085.7A)
Authority
CN
China
Prior art keywords
signal
band time
subsequence
zero
mean
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101250857A
Other languages
Chinese (zh)
Other versions
CN102760444B (en)
Inventor
刘一民
李元新
孟华东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201210125085.7A priority Critical patent/CN102760444B/en
Publication of CN102760444A publication Critical patent/CN102760444A/en
Application granted granted Critical
Publication of CN102760444B publication Critical patent/CN102760444B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to a support vector machine based classification method for baseband time-domain audio signals, comprising the following steps: first, segmenting a baseband time-domain audio signal sequence to obtain initial segmented subsequences; then subtracting from each initial segmented subsequence its own mean to obtain zero-mean segmented subsequences; windowing each zero-mean segmented subsequence, performing a Fourier transform on each windowed result to obtain the spectrum amplitude of each zero-mean segmented subsequence, and computing the standard deviation of each spectrum amplitude to obtain one characteristic quantity; concatenating the zero-mean segmented subsequences in order into one long sequence; then computing the normalized autocorrelation matrix of the long sequence and performing a singular value decomposition on it to obtain the separation point of the subspaces; computing from this the other characteristic quantity, a signal-to-noise ratio parameter; and finally feeding the input vector composed of the two characteristic quantities into a trained SVM (Support Vector Machine) classifier to identify the class of the baseband time-domain audio signal and distinguish speech signals from noise signals.

Description

Support vector machine based classification method of baseband time-domain audio signals
Technical field
The invention belongs to the field of signal processing and specifically relates to a support vector machine (SVM) based classification method for baseband time-domain audio signals.
Background technology
The present invention is applied in radio detection systems. The processed signal is the baseband time-domain audio signal after demodulation; it may be a speech signal contaminated by noise to varying degrees, or a pure noise signal, where the noise is dominated by white noise mixed with a small amount of colored noise. The principle of the SVM is used to construct a classifier that performs a simple and effective discrimination and classification of the signal type.
The following articles and patent documents essentially cover the main background technology in this field. To illustrate the evolution of the technology, they are arranged in chronological order and their main contributions are introduced one by one.
1. S. Gokhun Tanyer, Hamza Ozer, "Voice Activity Detection in Nonstationary Gaussian Noise", Proceedings of ICSP, pp. 1620-1623, 1998.
Voice activity detection (VAD) refers to the process of screening speech out of noise. The article proposes the energy-threshold method, the zero-crossing-rate method, the least-squares periodicity estimator, and an adaptive energy threshold. The energy-threshold and zero-crossing-rate methods are applicable when the signal-to-noise ratio (SNR) is relatively high but give a high false-alarm rate at low SNR, and the least-squares periodicity estimator can fail to detect when the nonstationary noise contains periodic components. The article also proposes strategies for combining several methods for speech signal detection.
2. C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121-167, 1998.
This article introduces the basic principles of the SVM in detail and derives its main conclusions. The SVM method originates from the optimal separating hyperplane in the linearly separable case; its basic idea can be summarized as first transforming the input space into a higher-dimensional space through a nonlinear mapping and then seeking the maximum-margin separating hyperplane in this new space. "Maximum margin" and "projecting the data into a higher-dimensional space" are its key concepts, and in the ordinary sense the SVM constitutes a two-class classifier. However, the article is mostly devoted to deriving and proving the basic SVM equations and gives no hints or guidance on its use for speech signal detection.
3. S. Gokhun Tanyer, Hamza Ozer, "Voice Activity Detection in Nonstationary Noise", IEEE Trans. Speech Audio Process., vol. 8, no. 4, pp. 478-481, Jul. 2001.
This article proposes a voice activity detection method with an adaptive energy threshold and gives an implementation strategy, in which a geometric method is applied to estimate the signal SNR, reducing the dependence on prior information about the noise. However, this SNR estimation method is affected by the cumulative distribution of the signal and cannot fully learn the noise characteristics, its parameters are relatively difficult to select and adjust, and the SNR estimate is biased when the noise is nonstationary.
4. Quanwei Cai, Ping Wei, Xianci Xiao, "A Digital Modulation Recognition Method", Proceedings of ICASSP, 2004, pp. 863-866.
This article proposes a principle and method for estimating the signal SNR based on singular value decomposition (SVD). The method is simple, but its performance is not investigated and no method for choosing the computation parameters is given.
5. Cheol-Sun Park, Won Jang, Sun-Phil Nah and Dae Young Kim, "Automatic Modulation Recognition using Support Vector Machine in Software Radio Applications", in Proc. 9th IEEE ICACT, Feb. 2007, pp. 9-12.
This article proposes an SVM-based method for recognizing the signal modulation mode. The maximum of the power spectral density of the normalized centered instantaneous amplitude γ_max, the standard deviation of the absolute value of the centered nonlinear component of the instantaneous phase in the strong-signal segments σ_ap, the standard deviation of the centered nonlinear component of the instantaneous phase in the strong-signal segments σ_dp, the standard deviation of the absolute value of the normalized centered instantaneous amplitude σ_aa, and the standard deviation of the absolute value of the normalized instantaneous frequency in the strong-signal segments σ_af are used as input characteristic quantities, and accurate classification results are obtained even at low SNR.
Summary of the invention
To overcome the deficiencies of the above prior art, the object of the present invention is to provide a support vector machine based classification method for baseband time-domain audio signals, which processes the baseband time-domain audio signal and extracts characteristic quantities as classifier inputs to obtain a discrimination of the signal type, thereby classifying speech signals and noise signals.
To achieve this goal, the technical scheme adopted by the present invention is:
The SVM-based baseband time-domain audio signal classification method comprises the following steps:
Step 1: divide the baseband time-domain audio signal sequence s = {s(1), s(2), …, s(N)} of total length N into K segments of length L, obtaining the initial segmented subsequences s_i = {s_i(1), s_i(2), …, s_i(L)} for i = 1, 2, …, K, where s_i(m) = s((i-1)L + m) (i = 1, 2, …, K; m = 1, 2, …, L); then subtract from each initial segmented subsequence its own mean to obtain the zero-mean segmented subsequences x_i = {x_i(1), x_i(2), …, x_i(L)}, where x_i(m) = s_i(m) - (1/L) Σ_{j=1}^{L} s_i(j);
Step 2: apply a window to each zero-mean segmented subsequence, obtaining x_i' = x_i w^T for i = 1, 2, …, K, where w is a Hanning window;
Step 3: perform a Fourier transform on each windowed result, obtaining the spectrum amplitude sequences of the windowed zero-mean segmented subsequences f_i = |FFT(x_i')| = {f_i(1), f_i(2), …, f_i(M)} for i = 1, 2, …, K, where M is the length of the spectrum amplitude sequence;
Step 4: compute the standard deviations d = {d(1), d(2), …, d(K)} of the spectrum amplitude sequences, where d(i) = sqrt( (1/(M-1)) Σ_{m=1}^{M} (f_i(m) - f̄_i)² ) and f̄_i = (1/M) Σ_{m=1}^{M} f_i(m); then take the mean of all the standard deviations to obtain one characteristic quantity of this baseband time-domain audio signal sequence, the spectrum amplitude standard deviation D = (1/K) Σ_{i=1}^{K} d(i);
Step 5: concatenate the zero-mean segmented subsequences x_1, x_2, …, x_K in order into one long sequence x, i.e. x = {x_1, x_2, …, x_K} = {x(1), x(2), …, x(N)}, then compute the normalized autocorrelation matrix R of this sequence (the defining formula is given as an equation image in the original), where Q is the dimension of the autocorrelation matrix and its value range is [50, 90];
Step 6: perform a singular value decomposition on the autocorrelation matrix R, obtaining R = V Λ V^H, where Λ = diag(λ_1, λ_2, …, λ_Q)_{Q×Q} = diag(γ_1 + σ², …, γ_p + σ², σ², …, σ²)_{Q×Q} and γ_1 ≥ γ_2 ≥ … ≥ γ_p, thereby obtaining the separation point p of the subspaces;
Step 7: compute the other characteristic quantity of this baseband time-domain audio signal sequence from the decomposition result and denote it the signal-to-noise ratio parameter (the defining formula is given as an equation image in the original);
Step 8: form the input vector from the two characteristic quantities of this baseband time-domain audio signal sequence, namely the spectrum amplitude standard deviation D and the signal-to-noise ratio parameter, and feed it into the trained SVM classifier, thereby identifying the class of this baseband time-domain audio signal and distinguishing speech signals from noise signals.
The above subspace separation point p can be obtained by the following method: compute the mean E_λ of the last T+1 eigenvalues λ_{Q-T}, λ_{Q-T+1}, …, λ_Q, where T is obtained by rounding down from the autocorrelation matrix dimension Q (the exact expression is given as an equation image in the original); then take as p the largest index among all eigenvalues greater than 1.5E_λ, i.e. p = {i | λ_i > 1.5E_λ, λ_{i+1} < 1.5E_λ}.
In the above method, when the baseband time-domain audio signal sequence s = {s(1), s(2), …, s(N)} of total length N is divided into K segments, the period of time corresponding to each segment should be no more than 20 ms.
Compared with the prior art, the present invention obtains prior information about the signals to be classified through training and chooses suitable input characteristic quantities, so that classification results can be obtained quickly and effectively. To reflect the difference between speech signals and noise signals, the signal SNR parameter and the spectrum amplitude standard deviation are selected as the classifier's input characteristic quantities; they are convenient to compute and allow good discrimination and classification of the signals. The present invention can effectively detect and discriminate speech signals and noise signals; the two chosen input characteristic quantities, the signal-to-noise ratio parameter and the spectrum amplitude standard deviation, are simple to compute yet effectively reflect the difference between the two kinds of signals, and a high classification accuracy is maintained even at relatively low SNR. The present invention is suitable for real-time signal processing, is easy to implement, and can be applied well in radio applications.
Description of drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is the probability density distribution of the input characteristic quantity when it is the signal-to-noise ratio parameter.
Fig. 3 is the probability density distribution of the input characteristic quantity when it is the spectrum amplitude standard deviation.
Fig. 4 is a schematic diagram of the working result of the SVM classifier.
Embodiment
The present invention is explained in further detail below in conjunction with the accompanying drawings and an embodiment.
The present invention designs a classifier based on the SVM principle: characteristic quantities are extracted by processing the baseband time-domain audio signal sequence and are fed as inputs into the trained classifier, thereby identifying the type of the audio signal and correctly classifying speech signals and noise signals.
As shown in Fig. 1, the implementation steps are as follows:
Step 1: since the signal to be processed is a demodulated baseband time-domain audio signal sequence, the signal is first preprocessed so that characteristic quantities fully reflecting the signal characteristics can be extracted.
The baseband time-domain audio signal sequence s = {s(1), s(2), …, s(N)} of total length N is divided evenly into K segments of length L, where the period of time corresponding to each segment should be no more than 20 ms.
This gives the initial segmented subsequences s_i = {s_i(1), s_i(2), …, s_i(L)} for i = 1, 2, …, K, where s_i(m) = s((i-1)L + m) (i = 1, 2, …, K; m = 1, 2, …, L). Each initial segmented subsequence then has its own mean subtracted to remove the DC component, giving the zero-mean segmented subsequences x_i = {x_i(1), x_i(2), …, x_i(L)}, where x_i(m) = s_i(m) - (1/L) Σ_{j=1}^{L} s_i(j).
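The following Python sketch illustrates this step. The 8 kHz sampling rate, which makes L = 160 samples correspond to a 20 ms segment, and the random test signal are assumptions for illustration only and are not specified by the invention.

```python
import numpy as np

def segment_and_demean(s, L):
    """Split s into K = len(s)//L segments of length L and subtract each
    segment's own mean to remove its DC component (step 1)."""
    K = len(s) // L
    segments = np.reshape(s[:K * L], (K, L))                 # s_i(m) = s((i-1)L + m)
    return segments - segments.mean(axis=1, keepdims=True)   # zero-mean subsequences x_i

# Example: 1 s of a stand-in signal at an assumed 8 kHz rate, 20 ms segments (L = 160).
fs = 8000
s = np.random.randn(fs)
x = segment_and_demean(s, L=160)
print(x.shape)   # (50, 160)
```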
Step 2: to reduce the influence of sidelobes on the result when the segmented subsequences are processed in the frequency domain, a Hanning window is used to window each zero-mean segmented subsequence. The windowed result is x_i' = x_i w^T for i = 1, 2, …, K, where w is the Hanning window sequence.
Step 3: a Fourier transform is performed on each windowed result, giving the spectrum amplitude sequences of the windowed zero-mean segmented subsequences f_i = |FFT(x_i')| = {f_i(1), f_i(2), …, f_i(M)} for i = 1, 2, …, K, where the number of FFT points should be a power of two 2^a greater than 2 to 4 times the subsequence length, and M is the length of the spectrum amplitude sequence.
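A minimal sketch of steps 2 and 3, assuming the segment matrix x from the previous sketch. Choosing the FFT length as the smallest power of two of at least twice the segment length is one reading of the "2 to 4 times the subsequence length" guideline, and the one-sided spectrum (rfft) is likewise an assumption.

```python
import numpy as np

def spectrum_amplitudes(x):
    """Window each zero-mean segment with a Hanning window (step 2) and take
    the FFT magnitude (step 3)."""
    K, L = x.shape
    w = np.hanning(L)                               # Hanning window w
    xw = x * w                                      # x_i' = x_i weighted by w
    nfft = 2 ** int(np.ceil(np.log2(2 * L)))        # assumed power-of-two FFT length
    return np.abs(np.fft.rfft(xw, n=nfft, axis=1))  # spectrum amplitudes f_i, length M

f = spectrum_amplitudes(np.random.randn(50, 160))
print(f.shape)   # (50, M) with M = nfft // 2 + 1
```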
Step 4: using the unbiased estimate of the standard deviation, d(i) = sqrt( (1/(M-1)) Σ_{m=1}^{M} (f_i(m) - f̄_i)² ) with f̄_i = (1/M) Σ_{m=1}^{M} f_i(m), the standard deviations d = {d(1), d(2), …, d(K)} of the spectrum amplitudes of the segmented subsequences are obtained; the mean of all the standard deviations then gives one characteristic quantity of this time-domain audio signal sequence, the spectrum amplitude standard deviation D = (1/K) Σ_{i=1}^{K} d(i).
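A short sketch of step 4; np.std with ddof=1 implements the unbiased standard deviation estimate named above, and the random input is only a stand-in for the spectrum amplitude sequences f_i.

```python
import numpy as np

def spectral_std_feature(f):
    """Unbiased standard deviation of each spectrum amplitude sequence and
    their mean, the feature D of step 4."""
    d = np.std(f, axis=1, ddof=1)   # d(i), unbiased estimate (divides by M - 1)
    return d.mean()                 # D = (1/K) * sum_i d(i)

# Stand-in spectrum amplitudes; in practice f comes from the previous sketch.
f = np.abs(np.fft.rfft(np.random.randn(50, 160), n=512, axis=1))
print(spectral_std_feature(f))
```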
The probability density of the signal-to-noise ratio parameter is shown in Fig. 2, where the abscissa is the value range of the signal-to-noise ratio parameter and the ordinate is the probability density; the probability density of the spectrum amplitude standard deviation is shown in Fig. 3, where the abscissa is the value range of the spectrum amplitude standard deviation and the ordinate is the probability density. It can be seen from the figures that the characteristic quantities of the noise signal are relatively concentrated, so a single characteristic quantity can reflect the difference between speech and noise signals to a certain extent but cannot completely and effectively separate the two classes of signals; the two quantities must therefore be combined as the classifier input to achieve correct signal classification, so the following steps are carried out.
Step 5: the audio signal sequence is then processed to obtain the other characteristic quantity. First, the zero-mean segmented subsequences x_1, x_2, …, x_K are concatenated in order into one long signal sequence x, i.e. x = {x_1, x_2, …, x_K} = {x(1), x(2), …, x(N)}; then the normalized autocorrelation matrix R of this sequence is computed (the defining formula is given as an equation image in the original), where Q is the dimension of the autocorrelation matrix, its value range is [50, 90], and the value 70 is used in the present invention.
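Because the defining formula of R appears only as an equation image, the sketch below assumes one common construction: a Q x Q Toeplitz matrix of lag correlations normalized by the zero-lag value. It illustrates a normalized autocorrelation matrix, not necessarily the exact matrix used by the invention.

```python
import numpy as np
from scipy.linalg import toeplitz

def normalized_autocorrelation_matrix(x, Q=70):
    """Assumed construction of the Q x Q normalized autocorrelation matrix R:
    biased lag correlations r(tau) normalized by r(0), arranged as a Toeplitz
    matrix. Q = 70 is the value used in the embodiment."""
    x = np.ravel(x)                 # concatenate the segments into one long sequence
    N = len(x)
    r = np.array([np.dot(x[:N - tau], x[tau:]) / N for tau in range(Q)])
    return toeplitz(r / r[0])       # symmetric Toeplitz matrix with R[0, 0] = 1

R = normalized_autocorrelation_matrix(np.random.randn(8000), Q=70)
print(R.shape)   # (70, 70)
```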
Step 6: an SVD is performed on the autocorrelation matrix R, giving R = V Λ V^H. Assuming that the speech signal and the noise signal are mutually independent, R = R_x + R_n = V(Λ_x + Λ_n)V^H = V Λ V^H, where R_x and R_n are the autocorrelation matrices of the speech signal and the noise signal respectively.
From the SVD it follows that Λ_x = diag(γ_1, γ_2, …, γ_p, 0, …, 0)_{Q×Q} with γ_1 ≥ γ_2 ≥ … ≥ γ_p,
Λ_n = diag(σ², σ², …, σ²)_{Q×Q},
Λ = diag(λ_1, λ_2, …, λ_Q)_{Q×Q} = diag(γ_1 + σ², …, γ_p + σ², σ², …, σ²)_{Q×Q}.
The mean E_λ of the last T+1 eigenvalues λ_{Q-T}, λ_{Q-T+1}, …, λ_Q is computed, where T is obtained by rounding down from the autocorrelation matrix dimension Q (the exact expression is given as an equation image in the original); then all eigenvalues greater than 1.5E_λ are searched and the largest index among them is taken as the separation point p, i.e. p = {i | λ_i > 1.5E_λ, λ_{i+1} < 1.5E_λ}.
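A sketch of step 6. The 1.5 E_λ rule follows the text above; the choice T = Q // 3 is an assumption, since the exact rounding expression for T appears only as an equation image in the original.

```python
import numpy as np

def subspace_separation(R, ratio=1.5):
    """SVD of R and the separation point p: the largest index whose singular
    value exceeds ratio * E_lambda, where E_lambda is the mean of the last
    T + 1 values. T = Q // 3 is an assumed rounding rule."""
    Q = R.shape[0]
    lam = np.linalg.svd(R, compute_uv=False)   # lambda_1 >= ... >= lambda_Q
    T = Q // 3                                 # assumption (formula only an image in the original)
    E_lambda = lam[Q - T - 1:].mean()          # mean of the last T + 1 values
    above = np.flatnonzero(lam > ratio * E_lambda)
    p = int(above[-1]) + 1 if above.size else 0
    return p, lam, E_lambda

# Stand-in correlation matrix with one dominant (signal-like) direction.
R = 0.2 * np.eye(70) + 0.8 * np.ones((70, 70)) / 70
p, lam, E_lambda = subspace_separation(R)
print(p)   # 1 for this stand-in matrix
```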
Step 7: the other characteristic quantity of this baseband time-domain audio signal sequence, the signal-to-noise ratio parameter, is computed from the separation point p and the eigenvalues (the defining formula is given as an equation image in the original); it reflects the signal-to-noise condition of the signal to a certain extent.
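Since the formula for the signal-to-noise ratio parameter appears only as an equation image, the sketch below shows one common SVD-based SNR estimate built from the same quantities (the separation point p, the eigenvalues, and a noise-power estimate from the noise subspace). It stands in for, and is not, the patented expression.

```python
import numpy as np

def snr_parameter(lam, p):
    """Assumed SVD-based SNR estimate (in dB): noise power from the mean of the
    noise-subspace values, signal power from the p largest values with the
    noise power removed. Not the patent's exact formula."""
    sigma2 = lam[p:].mean()                              # noise power estimate
    signal_power = np.maximum(lam[:p] - sigma2, 0.0).sum()
    return 10.0 * np.log10(max(signal_power, 1e-12) / (len(lam) * sigma2))

lam = np.concatenate([np.full(5, 4.0), np.full(65, 0.2)])   # stand-in eigenvalues
print(snr_parameter(lam, p=5))
```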
Step 8: the two characteristic quantities of this baseband time-domain audio signal sequence, namely the spectrum amplitude standard deviation D and the signal-to-noise ratio parameter, form the input vector, which is fed into the trained SVM classifier; the classification result of this baseband time-domain audio signal is thus obtained, distinguishing speech signals from noise signals.
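A sketch of step 8 using scikit-learn's SVC as the SVM classifier. The RBF kernel, the synthetic training features, and the class labels (1 = speech, 0 = noise) are assumptions, since the patent does not specify the training procedure.

```python
import numpy as np
from sklearn.svm import SVC

# Assumed training data: rows are [D, SNR-parameter] vectors, label 1 = speech, 0 = noise.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal([3.0, 8.0], 1.0, size=(200, 2)),   # speech-like features
                     rng.normal([1.0, 1.0], 0.5, size=(200, 2))])  # noise-like features
y_train = np.concatenate([np.ones(200), np.zeros(200)])

clf = SVC(kernel="rbf").fit(X_train, y_train)   # stand-in for "the trained SVM classifier"

# Classify one sequence from its two feature quantities (D, SNR parameter).
print("speech" if clf.predict([[2.8, 7.5]])[0] == 1 else "noise")
```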
The working result of the classifier at this step is shown in Fig. 4, where "+" marks the characteristic quantities of speech signals and "*" marks the characteristic quantities of noise signals. The two classes of characteristic quantities are correctly separated in the feature space, which confirms that this SVM-based baseband time-domain audio signal classifier can effectively and correctly distinguish and classify the signal types.

Claims (3)

1. A support vector machine based classification method for baseband time-domain audio signals, characterized in that it comprises the following steps:
Step 1: divide the baseband time-domain audio signal sequence s = {s(1), s(2), …, s(N)} of total length N into K segments of length L, obtaining the initial segmented subsequences s_i = {s_i(1), s_i(2), …, s_i(L)} for i = 1, 2, …, K, where s_i(m) = s((i-1)L + m) (i = 1, 2, …, K; m = 1, 2, …, L); then subtract from each initial segmented subsequence its own mean to obtain the zero-mean segmented subsequences x_i = {x_i(1), x_i(2), …, x_i(L)}, where
x_i(m) = s_i(m) - (1/L) Σ_{j=1}^{L} s_i(j);
Step 2: apply a window to each zero-mean segmented subsequence, obtaining x_i' = x_i w^T for i = 1, 2, …, K, where w is a Hanning window;
Step 3: perform a Fourier transform on each windowed result, obtaining the spectrum amplitude sequences of the windowed zero-mean segmented subsequences f_i = |FFT(x_i')| = {f_i(1), f_i(2), …, f_i(M)} for i = 1, 2, …, K, where M is the length of the spectrum amplitude sequence;
Step 4: compute the standard deviations d = {d(1), d(2), …, d(K)} of the spectrum amplitude sequences, where d(i) = sqrt( (1/(M-1)) Σ_{m=1}^{M} (f_i(m) - f̄_i)² ) and f̄_i = (1/M) Σ_{m=1}^{M} f_i(m); then take the mean of all the standard deviations to obtain one characteristic quantity of this baseband time-domain audio signal sequence, the spectrum amplitude standard deviation D = (1/K) Σ_{i=1}^{K} d(i);
Step 5: concatenate the zero-mean segmented subsequences x_1, x_2, …, x_K in order into one long sequence x, i.e. x = {x_1, x_2, …, x_K} = {x(1), x(2), …, x(N)}, then compute the normalized autocorrelation matrix R of this sequence (the defining formula is given as an equation image in the original), where Q is the dimension of the autocorrelation matrix and its value range is [50, 90];
Step 6: perform a singular value decomposition on the autocorrelation matrix R, obtaining R = V Λ V^H, where Λ = diag(λ_1, λ_2, …, λ_Q)_{Q×Q} = diag(γ_1 + σ², …, γ_p + σ², σ², …, σ²)_{Q×Q} and γ_1 ≥ γ_2 ≥ … ≥ γ_p, thereby obtaining the separation point p of the subspaces;
Step 7: compute the other characteristic quantity of this baseband time-domain audio signal sequence from the decomposition result and denote it the signal-to-noise ratio parameter (the defining formula is given as an equation image in the original);
Step 8: form the input vector from the two characteristic quantities of this baseband time-domain audio signal sequence, namely the spectrum amplitude standard deviation D and the signal-to-noise ratio parameter, and feed it into the trained SVM classifier, thereby identifying the class of this baseband time-domain audio signal and distinguishing speech signals from noise signals.
2. The signal classification method according to claim 1, characterized in that the subspace separation point p can be obtained by the following method: compute the mean E_λ of the last T+1 eigenvalues λ_{Q-T}, λ_{Q-T+1}, …, λ_Q, where T is obtained by rounding down from the autocorrelation matrix dimension Q (the exact expression is given as an equation image in the original); then take as p the largest index among all eigenvalues greater than 1.5E_λ, i.e. p = {i | λ_i > 1.5E_λ, λ_{i+1} < 1.5E_λ}.
3. The signal classification method according to claim 1, characterized in that, in said step 1, the sequence is divided into K segments and the period of time corresponding to each segment is no more than 20 ms.
CN201210125085.7A 2012-04-25 2012-04-25 Support vector machine based classification method of base-band time-domain voice-frequency signal Expired - Fee Related CN102760444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210125085.7A CN102760444B (en) 2012-04-25 2012-04-25 Support vector machine based classification method of base-band time-domain voice-frequency signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210125085.7A CN102760444B (en) 2012-04-25 2012-04-25 Support vector machine based classification method of base-band time-domain voice-frequency signal

Publications (2)

Publication Number Publication Date
CN102760444A true CN102760444A (en) 2012-10-31
CN102760444B CN102760444B (en) 2014-06-11

Family

ID=47054885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210125085.7A Expired - Fee Related CN102760444B (en) 2012-04-25 2012-04-25 Support vector machine based classification method of base-band time-domain voice-frequency signal

Country Status (1)

Country Link
CN (1) CN102760444B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104240697A (en) * 2013-06-24 2014-12-24 浙江大华技术股份有限公司 Audio data feature extraction method and device
CN104409073A (en) * 2014-11-04 2015-03-11 贵阳供电局 Substation equipment sound and voice identification method
CN104732970A (en) * 2013-12-20 2015-06-24 中国科学院声学研究所 Ship radiation noise recognition method based on comprehensive features
CN104751856A (en) * 2013-12-31 2015-07-01 中国移动通信集团公司 Voice sentence recognizing method and device
CN105743756A (en) * 2016-01-20 2016-07-06 灵芯微电子科技(苏州)有限公司 Frame detection method based on Adaboost algorithm in Wi-Fi system
CN105976822A (en) * 2016-07-12 2016-09-28 西北工业大学 Audio signal extraction method and apparatus based on parameterization supergain beam former
CN106789764A (en) * 2016-11-18 2017-05-31 杭州电子科技大学 The transform domain quadratic estimate method of the denoising of joint Weighted Threshold and balanced judgement
CN107682109A (en) * 2017-10-11 2018-02-09 北京航空航天大学 A kind of interference signal classifying identification method suitable for UAV Communication system
CN108877783A (en) * 2018-07-05 2018-11-23 腾讯音乐娱乐科技(深圳)有限公司 The method and apparatus for determining the audio types of audio data
CN109448389A (en) * 2018-11-23 2019-03-08 西安联丰迅声信息科技有限责任公司 A kind of vehicle whistle intelligent detecting method
CN112466322A (en) * 2020-11-27 2021-03-09 华侨大学 Electromechanical device noise signal feature extraction method
CN113759208A (en) * 2021-06-02 2021-12-07 青岛鼎信通讯股份有限公司 Abnormal waveform identification method based on fault indicator
CN117150224A (en) * 2023-10-30 2023-12-01 宜兴启明星物联技术有限公司 User behavior data storage analysis method based on Internet of things

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059156A1 (en) * 2006-08-30 2008-03-06 International Business Machines Corporation Method and apparatus for processing speech data
US7505902B2 (en) * 2004-07-28 2009-03-17 University Of Maryland Discrimination of components of audio signals based on multiscale spectro-temporal modulations
CN101529929A (en) * 2006-09-05 2009-09-09 Gn瑞声达A/S A hearing aid with histogram based sound environment classification
US20100008641A1 (en) * 2008-06-24 2010-01-14 Sony Corporation Electronic apparatus, video content editing method, and program
JP2011034342A (en) * 2009-07-31 2011-02-17 Fujifilm Corp Image processing apparatus and method, data processing device and method, and program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7505902B2 (en) * 2004-07-28 2009-03-17 University Of Maryland Discrimination of components of audio signals based on multiscale spectro-temporal modulations
US20080059156A1 (en) * 2006-08-30 2008-03-06 International Business Machines Corporation Method and apparatus for processing speech data
CN101529929A (en) * 2006-09-05 2009-09-09 Gn瑞声达A/S A hearing aid with histogram based sound environment classification
US20100008641A1 (en) * 2008-06-24 2010-01-14 Sony Corporation Electronic apparatus, video content editing method, and program
JP2011034342A (en) * 2009-07-31 2011-02-17 Fujifilm Corp Image processing apparatus and method, data processing device and method, and program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEOL-SUN PARK ET AL: "Automatic Modulation Recognition using Support Vector Machine in Software Radio Applications", 《THE 9TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY》 *
JUN-HO CHOI ET AL: "Automatic Modulation Recognition of Digital Signals using Wavelet Features and SVM", 《INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY, 2008. ICACT 2008. 10TH》 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104240697A (en) * 2013-06-24 2014-12-24 浙江大华技术股份有限公司 Audio data feature extraction method and device
CN104732970B (en) * 2013-12-20 2018-12-04 中国科学院声学研究所 A kind of ship-radiated noise recognition methods based on comprehensive characteristics
CN104732970A (en) * 2013-12-20 2015-06-24 中国科学院声学研究所 Ship radiation noise recognition method based on comprehensive features
CN104751856A (en) * 2013-12-31 2015-07-01 中国移动通信集团公司 Voice sentence recognizing method and device
CN104751856B (en) * 2013-12-31 2017-12-22 中国移动通信集团公司 A kind of speech sentences recognition methods and device
CN104409073A (en) * 2014-11-04 2015-03-11 贵阳供电局 Substation equipment sound and voice identification method
CN105743756A (en) * 2016-01-20 2016-07-06 灵芯微电子科技(苏州)有限公司 Frame detection method based on Adaboost algorithm in Wi-Fi system
CN105743756B (en) * 2016-01-20 2019-03-12 中科威发半导体(苏州)有限公司 Frame detection method based on adaboost algorithm in WiFi system
CN105976822A (en) * 2016-07-12 2016-09-28 西北工业大学 Audio signal extraction method and apparatus based on parameterization supergain beam former
CN105976822B (en) * 2016-07-12 2019-12-03 西北工业大学 Audio signal extracting method and device based on parametrization supergain beamforming device
CN106789764A (en) * 2016-11-18 2017-05-31 杭州电子科技大学 The transform domain quadratic estimate method of the denoising of joint Weighted Threshold and balanced judgement
CN106789764B (en) * 2016-11-18 2019-07-16 杭州电子科技大学 Joint Weighted Threshold denoises and the transform domain quadratic estimate method of balanced judgement
CN107682109A (en) * 2017-10-11 2018-02-09 北京航空航天大学 A kind of interference signal classifying identification method suitable for UAV Communication system
CN108877783A (en) * 2018-07-05 2018-11-23 腾讯音乐娱乐科技(深圳)有限公司 The method and apparatus for determining the audio types of audio data
CN109448389A (en) * 2018-11-23 2019-03-08 西安联丰迅声信息科技有限责任公司 A kind of vehicle whistle intelligent detecting method
CN112466322A (en) * 2020-11-27 2021-03-09 华侨大学 Electromechanical device noise signal feature extraction method
CN112466322B (en) * 2020-11-27 2023-06-20 华侨大学 Noise signal feature extraction method for electromechanical equipment
CN113759208A (en) * 2021-06-02 2021-12-07 青岛鼎信通讯股份有限公司 Abnormal waveform identification method based on fault indicator
CN117150224A (en) * 2023-10-30 2023-12-01 宜兴启明星物联技术有限公司 User behavior data storage analysis method based on Internet of things
CN117150224B (en) * 2023-10-30 2024-01-26 宜兴启明星物联技术有限公司 User behavior data storage analysis method based on Internet of things

Also Published As

Publication number Publication date
CN102760444B (en) 2014-06-11

Similar Documents

Publication Publication Date Title
CN102760444B (en) Support vector machine based classification method of base-band time-domain voice-frequency signal
CN108600135B (en) Method for identifying signal modulation mode
CN106330385B (en) A kind of interference type recognition methods
CN108231067A (en) Sound scenery recognition methods based on convolutional neural networks and random forest classification
CN102610227A (en) Sound signal processing apparatus, sound signal processing method, and program
CN105261367B (en) A kind of method for distinguishing speek person
Miao et al. Underwater acoustic signal classification based on sparse time–frequency representation and deep learning
WO2021008000A1 (en) Voice wakeup method and apparatus, electronic device and storage medium
CN105393305A (en) Method for processing acoustic signal
CN104135327A (en) Spectrum sensing method based on support vector machine
CN101764786A (en) MQAM signal recognition method based on clustering algorithm
CN106357575A (en) Multi-parameter jointly-estimated interference type identification method
CN106098079A (en) Method and device for extracting audio signal
CN111832462A (en) Frequency hopping signal detection and parameter estimation method based on deep neural network
Kong et al. Radar waveform recognition using Fourier-based synchrosqueezing transform and CNN
Yu et al. Discriminative training for multiple observation likelihood ratio based voice activity detection
CN113472390A (en) Frequency hopping signal parameter estimation method based on deep learning
CN112394324A (en) Microphone array-based remote sound source positioning method and system
Wang et al. Binary neural networks for wireless interference identification
CN111325143A (en) Underwater target identification method under unbalanced data set condition
Tran et al. GAN-based data augmentation for UWB NLOS identification using machine learning
CN105743756B (en) Frame detection method based on adaboost algorithm in WiFi system
Yin et al. Co-channel multi-signal modulation classification based on convolution neural network
Wang et al. A new method of automatic modulation recognition based on dimension reduction
CN104467995B (en) Blind primary user detection based on HMM and transmission level recognition methods

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140611

Termination date: 20210425

CF01 Termination of patent right due to non-payment of annual fee