CN103021406B - Robust speech emotion recognition method based on compressive sensing

Publication number: CN103021406B (grant); CN103021406A (application publication)
Application number: CN201210551585.7A, filed 2012-12-18
Inventors: 赵小明 (Zhao Xiaoming), 张石清 (Zhang Shiqing)
Applicant/Assignee: Taizhou University
Other languages: Chinese (zh)
Legal status: Expired - Fee Related

Abstract

The invention discloses a robust speech emotion recognition method based on compressive sensing. The method comprises generating noisy emotional speech samples, establishing an acoustic feature extraction module, constructing a sparse representation classifier model, and outputting the speech emotion recognition result. Its advantages are that it fully considers the effect of noise on emotional speech in natural environments and provides a robust speech emotion recognition method for noisy backgrounds; it fully considers the validity of different types of feature parameters, extending feature extraction from prosodic and voice-quality features to the Mel-frequency cepstral coefficients (MFCC), which further improves the noise robustness of the feature parameters; and it exploits the discriminative power of sparse representation in compressive sensing theory to provide a high-performance robust speech emotion recognition method.

Description

Robust speech emotion recognition method based on compressive sensing
Technical field
The present invention relates to the fields of speech processing and pattern recognition, and in particular to a robust speech emotion recognition method based on compressive sensing.
Background art
Human speech carries not only textual and symbolic information but also information about the speaker's emotions and mood. How to let a computer automatically analyze a speech signal and judge the speaker's affective state, i.e. so-called "speech emotion recognition", has become a research focus in fields such as speech processing and pattern recognition. The ultimate goal of this research is to endow computers with emotional intelligence, so that they can interact with people as naturally, warmly and vividly as humans do. The research has important application value in artificial intelligence, robotics, and natural human-computer interaction.
At present, research on speech emotion recognition essentially uses emotional corpora recorded in quiet environments as the object of analysis. Emotional speech in natural environments, however, is usually corrupted by noise of varying degrees. Research on robust speech emotion recognition under noisy backgrounds is therefore closer to reality and of greater practical value, yet the literature on this topic is still very scarce.
Automatic speech emotion recognition mainly involves two problems. The first is emotional feature extraction: which effective speech feature parameters should be extracted for emotion recognition. The second is the emotion recognition method: which effective pattern recognition method should be used to classify an utterance containing a certain emotion into its emotion category (see the patent: Zou Cairong, a speech emotion recognition method based on support vector machines, application/patent number 2006100973016).
Regarding emotional feature extraction, the feature parameters commonly used in speech emotion recognition are prosodic features and voice-quality features: the former include fundamental frequency, amplitude and utterance duration, and the latter include formants, band energy distribution, harmonics-to-noise ratio and short-term perturbation parameters. The noise robustness of these parameters is, however, very limited, so it is difficult to obtain good speech emotion recognition performance under noisy backgrounds using prosodic and voice-quality features alone. To improve noise robustness it is necessary to extract another type of feature parameter, namely spectral features, and fuse them with the prosodic and voice-quality features. A representative spectral feature is the Mel-frequency cepstral coefficient (MFCC), which reflects the characteristics of human hearing.
Regarding the recognition method, the methods that have been successfully applied to speech emotion recognition mainly include the linear discriminant classifier (LDC), k-nearest neighbors (KNN), artificial neural networks (ANN) and support vector machines (SVM). These methods are, however, rather sensitive to noise, and it is difficult for them to achieve good robust speech emotion recognition performance. It is therefore necessary to develop a new, high-performance, robust speech emotion recognition method.
Compressed sensing (CS) technology is introduced next.
Compressed sensing (CS) (see: E. J. Candes, M. B. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine, 2008, 25(2): 21-30) is a new theory of signal processing and sampling. Its core idea is that, as long as a signal is compressible, or sparse in some transform domain, it can be projected onto a low-dimensional space using an observation matrix that is incoherent with the transform basis, and the original signal can then be reconstructed with high probability from this small number of projections by solving an optimization problem. Under this framework the sampling rate is no longer determined by the bandwidth of the signal but by the structure and content of the information in the signal.
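As a point of reference (not part of the patent text), the core CS idea can be illustrated with a small numerical sketch: a sparse vector is recovered from far fewer random, incoherent projections than its length by L1 minimization. The example below assumes numpy and cvxpy; the dimensions are arbitrary illustrative choices.

```python
import numpy as np
import cvxpy as cp

# Toy compressed-sensing demo: a length-200 vector with only 8 nonzeros is
# recovered from 60 random Gaussian projections by L1 minimization.
rng = np.random.default_rng(0)
n, m, k = 200, 60, 8                                  # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
Phi = rng.normal(size=(m, n)) / np.sqrt(m)            # measurement matrix, incoherent with the identity basis
y = Phi @ x_true                                      # m << n linear measurements

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(x, 1)), [Phi @ x == y]).solve()
print("max reconstruction error:", np.max(np.abs(x.value - x_true)))
```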
The original motivation of compressed sensing (CS) research was signal compression and representation, but the sparsest representation also has good discriminative power and can be used to build classifiers (see: Guha T, Ward R K. Learning sparse representations for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 34(8): 1576-1588). To date, no published work on speech emotion recognition has used the discriminative power of sparse representation in compressive sensing theory as a robust recognition method for speech emotion. The present invention uses the discriminative power of sparse representation in compressive sensing theory to realize robust speech emotion recognition under noisy backgrounds.
Summary of the invention
The object of the present invention is to overcome the shortcomings of the existing emotion recognition techniques described above and to provide a robust speech emotion recognition method based on compressed sensing, for realizing robust speech emotion recognition under noisy backgrounds.
The technical solution adopted in the present invention is:
A robust speech emotion recognition method based on compressed sensing, comprising the following steps:
generating noisy emotional speech samples, establishing an acoustic feature extraction module, constructing a sparse representation classifier model, and outputting the speech emotion recognition result;
(1) Generating the noisy emotional speech samples, comprising:
dividing all speech samples of the emotional speech sample library into a training set and a test set, and then adding white Gaussian noise to each training and test sample to produce the noisy emotional speech samples;
(2) Establishing the acoustic feature extraction module, comprising:
performing acoustic feature extraction on the noisy emotional speech samples, the module comprising three parts: prosodic feature extraction, voice-quality feature extraction, and Mel-frequency cepstral coefficient (MFCC) extraction;
(2-1) prosodic feature extraction, comprising: fundamental frequency, amplitude and utterance duration;
(2-2) voice-quality feature extraction, comprising: formants, band energy distribution, harmonics-to-noise ratio and short-term perturbation parameters;
(2-3) MFCC extraction, comprising: extracting the 13-dimensional MFCC features together with their first- and second-order derivatives, and then computing their mean and standard deviation;
(3) Constructing the sparse representation classifier model, comprising:
obtaining, through the acoustic feature extraction module, one feature vector per emotional speech sample, formed from the extracted acoustic feature parameters; the feature vectors of all emotional speech samples are input to the sparse representation classifier to build the classifier model;
the sparse representation classifier is built as follows: using the method of sparse decomposition, the test sample is sparsely represented over the training samples, i.e. the training samples are regarded as a set of bases, the sparse representation coefficients of the test sample are obtained by solving an L1-norm minimization problem, and classification is finally performed according to the residual between the test sample and its sparse reconstruction;
(4) Outputting the speech emotion recognition result, comprising:
training and testing the sparse representation classifier and outputting the speech emotion recognition result; in the emotion recognition tests a 10-fold cross-validation scheme is adopted: all utterances are divided equally into 10 parts, 9 parts are used for training and the remaining part for testing, the experiment is repeated 10 times accordingly, and the mean of the 10 runs is taken as the recognition result.
The fundamental frequency: a correlation method is used to extract the pitch contour of the emotional speech, and 10 statistical parameters of the contour are then computed, comprising the maximum, minimum, range, upper quartile, median, lower quartile, interquartile range, mean, standard deviation, and mean absolute slope;
The amplitude: obtained by a sum-of-squares method; 9 amplitude-related statistical parameters are extracted, comprising the mean, standard deviation, maximum, minimum, range, upper quartile, median, lower quartile, and interquartile range;
The utterance duration: the utterance duration characterizes the differences in the temporal structure of speech under different emotions; 6 duration-related parameters are extracted, comprising the total utterance time, the voiced duration, the unvoiced duration, the ratio of voiced to unvoiced time, the ratio of voiced time to total utterance time, and the ratio of unvoiced time to total utterance time.
The formants: the Burg method is used to compute the 14th-order linear prediction coefficients (LPC) of the emotional speech, and a peak-picking method then computes the mean, standard deviation and median of the first, second and third formants F1, F2, F3, together with the median bandwidth of each of these three formants, giving 12 formant-related feature parameters in total;
The band energy distribution: the energy distribution parameters SED of 5 frequency bands are extracted, namely the mean band energy SED_500 of 0-500 Hz, SED_1000 of 500-1000 Hz, SED_2500 of 1000-2500 Hz, SED_4000 of 2500-4000 Hz, and SED_5000 of 4000-5000 Hz;
The harmonics-to-noise ratio: the mean, standard deviation, minimum, maximum and range of the harmonics-to-noise ratio (HNR) are extracted; the HNR is computed as:

$\mathrm{HNR} = 10\log_{10}\!\left[\sum_{i=1}^{N} h(i)^2 \Big/ \sum_{i=1}^{N} n(i)^2\right]$ (formula 1)

where h(i) and n(i) denote the harmonic and noise components, respectively;
The short-term perturbation parameters: these comprise jitter and shimmer, which describe the small cycle-to-cycle variations of the fundamental frequency and of the amplitude respectively, and can be obtained from the slope variations of the pitch contour and the amplitude contour;
The jitter is defined as:

$\mathrm{Jitter}(\%) = \sum_{i=2}^{N-1}\left(2T_i - T_{i-1} - T_{i+1}\right) \Big/ \sum_{i=2}^{N-1} T_i$ (formula 2)

where T_i denotes the i-th pitch period and N is the number of pitch periods;
The shimmer is defined as:

$\mathrm{Shimmer}(\%) = \sum_{i=2}^{N-1}\left(2E_i - E_{i-1} - E_{i+1}\right) \Big/ \sum_{i=2}^{N-1} E_i$ (formula 3)

where E_i denotes the i-th peak-to-peak energy (peak amplitude).
The specific steps for constructing the sparse representation classifier are as follows:
Given the training samples of a certain class, a test sample is regarded as a linear combination of the training samples of the same class,

$y_{k,\mathrm{test}} = \alpha_{k,1} y_{k,1} + \alpha_{k,2} y_{k,2} + \cdots + \alpha_{k,n_k} y_{k,n_k} + \epsilon_k = \sum_{i=1}^{n_k} \alpha_{k,i} y_{k,i} + \epsilon_k$ (formula 1)

where y_k,test denotes a test sample of the k-th class, y_k,i denotes the i-th training sample of the k-th class, α_k,i denotes the weight of the corresponding training sample, and ε_k denotes the error;
Over the training samples of all target classes, (formula 1) can be written as:

$y_{k,\mathrm{test}} = \alpha_{1,1} y_{1,1} + \cdots + \alpha_{k,1} y_{k,1} + \cdots + \alpha_{k,n_k} y_{k,n_k} + \cdots + \alpha_{c,n_c} y_{c,n_c} + \epsilon = \sum_{i=1}^{n_1} \alpha_{1,i} y_{1,i} + \cdots + \sum_{i=1}^{n_k} \alpha_{k,i} y_{k,i} + \cdots + \sum_{i=1}^{n_c} \alpha_{c,i} y_{c,i} + \epsilon$ (formula 2)

where c denotes the total number of classes among all training samples;
Written in matrix form, (formula 2) becomes

$y_{k,\mathrm{test}} = A\alpha + \epsilon$ (formula 3)

where

$A = \left[\, y_{1,1} \,|\, \cdots \,|\, y_{1,n_1} \,|\, \cdots \,|\, y_{k,1} \,|\, \cdots \,|\, y_{k,n_k} \,|\, \cdots \,|\, y_{c,1} \,|\, \cdots \,|\, y_{c,n_c} \right],\quad \alpha = \left[\alpha_{1,1} \cdots \alpha_{1,n_1} \cdots \alpha_{k,1} \cdots \alpha_{k,n_k} \cdots \alpha_{c,1} \cdots \alpha_{c,n_c}\right]^{\mathsf T}$ (formula 4)
In the sparse representation classifier, all elements of the weight vector α except those associated with the k-th class should ideally be zero. To obtain the weight vector α, the following optimization problem in the L0-norm sense needs to be solved:

$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.}\quad \|y_{k,\mathrm{test}} - A\alpha\|_2 \le \epsilon$ (formula 5)

To solve (formula 5), the L0-norm optimization problem is relaxed into an L1-norm optimization problem:

$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.}\quad \|y_{k,\mathrm{test}} - A\alpha\|_2 \le \epsilon$ (formula 6)

This is a convex optimization problem and can be converted into a linear programming problem and solved;
To further improve the noise robustness of the sparse representation, a weighted L1-norm optimization problem is designed, and (formula 6) becomes

$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.}\quad \|W(y_{k,\mathrm{test}} - A\alpha)\|_2 \le \epsilon$ (formula 7)

where the weighting factor W can be expressed as

$W_i = \exp\!\left(-\frac{\|y - y_{\mathrm{recons}}(i)\|^2}{2\sigma^2}\right)$ (formula 8)

In the formula, σ is a constant (set to 1) and y_recons(i) = Aα_i denotes a sample reconstructed from the weight vector α_i. For data with a higher noise level the residual ||y − y_recons(i)||_2 is larger, so the corresponding weighting factor is smaller, which attenuates the influence of noise;
Given a new test sample y_test, the weight vector α is first obtained by solving (formula 7); if the largest of the nonzero coefficients of α corresponds to the k-th class, y_test is assigned to that class, i.e. y_test is assigned to the class corresponding to the largest coefficient in the weight vector α.
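The L1-norm problems of (formula 6) and (formula 7) are standard convex programs. The sketch below, which assumes the cvxpy modeling library and treats the weighting matrix W as a given diagonal matrix (the patent does not spell out its exact construction beyond formula 8), shows one way the sparse weight vector α of a test feature vector could be obtained in practice.

```python
import numpy as np
import cvxpy as cp

def sparse_coefficients(A, y_test, W=None, eps=0.1):
    """Solve (formula 6), or its weighted form (formula 7) when W is given.
    A: (d, n) dictionary whose columns are training feature vectors grouped by
    emotion class; y_test: d-dimensional test feature vector; eps: error tolerance."""
    d, n = A.shape
    if W is None:
        W = np.eye(d)                        # unweighted case reduces to (formula 6)
    alpha = cp.Variable(n)
    residual = W @ (y_test - A @ alpha)
    problem = cp.Problem(cp.Minimize(cp.norm(alpha, 1)),
                         [cp.norm(residual, 2) <= eps])
    problem.solve()                          # convex problem, handled by the default solver
    return alpha.value
```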
The training and testing of the sparse representation classifier comprises the following steps:
(4-1) Using the feature vectors of the training samples, each emotion-class test sample is sparsely represented: given a test sample of a certain emotion class, its weight vector α is obtained by solving the weighted L1-norm optimization problem of (formula 7);
(4-2) For each emotion class i (i = 1, 2, ..., 7), a new sample is first approximately reconstructed for the test sample y_test, denoted y_recons(i) = Aα_i, and the residual between this reconstruction and y_test is computed, i.e. r(y_test, i) = ||y_test − y_recons(i)||_2;
(4-3) The class index i with the minimum residual is taken as the emotion class of the test sample y_test, i.e. identity(y_test) = arg min_i r(y_test, i), and the recognition results for the different emotion classes are output.
Seven kinds of emotional speech samples are chosen from the emotional speech sample library: anger, happiness, sadness, fear, disgust, boredom and neutral (no emotion).
The beneficial effects of the present invention are:
1. It fully considers that emotional speech in natural environments is usually affected by noise, and provides a robust speech emotion recognition method for noisy backgrounds.
2. It fully considers the validity of different types of feature parameters, extending feature extraction from the two aspects of prosodic and voice-quality features to the Mel-frequency cepstral coefficients (MFCC), which further improves the noise robustness of the feature parameters.
3. It exploits the discriminative power of sparse representation in compressive sensing theory and provides a high-performance robust speech emotion recognition method based on that theory.
Description of the drawings
Fig. 1 — Block diagram of the speech emotion recognition system.
Fig. 2 — Statistics of the emotional acoustic feature parameters.
Fig. 3 — Comparison of the speech emotion recognition performance (%) obtained by different methods under different signal-to-noise ratios (SNR).
Fig. 4 — Correct recognition rates (%) of the different emotion classes obtained when the proposed method performs best.
Embodiment
Fig. 1 is the block diagram of the system, which mainly comprises two parts: acoustic feature extraction, and the training and testing of the sparse representation classifier.
One. Acoustic feature extraction
From the Berlin database of German emotional speech (see: Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., Weiss, B. A database of German emotional speech. In: Proceedings of Interspeech 2005, Lisbon, Portugal, 2005, pp. 1-4), seven kinds of emotional speech samples — anger, happiness, sadness, fear, disgust, boredom and neutral (no emotion) — are chosen, 535 utterances in total. White Gaussian noise is added to each chosen sample, which is then pre-processed by pre-emphasis, framing and windowing, with a frame length of 10 ms. The acoustic feature parameters of three aspects are then extracted: prosodic features, voice-quality features and Mel-frequency cepstral coefficients (MFCC). Fig. 2 gives the statistics of the extracted emotional acoustic feature parameters of these three aspects, 204 parameters in total. The extraction of these feature parameters is detailed as follows:
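As a minimal sketch of this preparation step (using numpy and librosa, neither of which is mentioned in the patent; the file name is a hypothetical placeholder), white Gaussian noise at a chosen SNR can be added and the signal pre-emphasized, framed into 10 ms frames and windowed as follows.

```python
import numpy as np
import librosa

def add_white_gaussian_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so the noisy signal has the requested SNR in dB."""
    rng = rng or np.random.default_rng(0)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

y, sr = librosa.load("berlin_sample.wav", sr=None)           # hypothetical Berlin-database utterance
y_noisy = add_white_gaussian_noise(y, snr_db=10)

y_pre = librosa.effects.preemphasis(y_noisy, coef=0.97)      # pre-emphasis
frame_len = int(0.010 * sr)                                  # 10 ms frames, as in the text
frames = librosa.util.frame(y_pre, frame_length=frame_len,
                            hop_length=frame_len // 2).T     # (n_frames, frame_len)
frames = frames * np.hamming(frame_len)                      # Hamming windowing
```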
1. Prosodic feature extraction: fundamental frequency, amplitude and utterance duration.
(1-1) Fundamental frequency: a correlation method is used to extract the pitch contour of the emotional speech, and 10 statistical parameters of the contour are computed, comprising the maximum, minimum, range, upper quartile, median, lower quartile, interquartile range, mean, standard deviation, and mean absolute slope (a minimal extraction sketch follows this list).
(1-2) Amplitude: obtained by a sum-of-squares method; 9 amplitude-related statistical parameters are extracted, comprising the mean, standard deviation, maximum, minimum, range, upper quartile, median, lower quartile, and interquartile range.
(1-3) Utterance duration: the utterance duration characterizes the differences in the temporal structure of speech under different emotions; 6 duration-related parameters are extracted, comprising the total utterance time, the voiced duration, the unvoiced duration, the ratio of voiced to unvoiced time, the ratio of voiced time to total utterance time, and the ratio of unvoiced time to total utterance time.
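A rough sketch of step (1-1), assuming librosa: its YIN implementation, an autocorrelation-type estimator, stands in for the patent's unspecified correlation method; "interquartile range" is this sketch's reading of the inner-quartile extreme value, and the pitch search limits are illustrative assumptions.

```python
import numpy as np
import librosa

def pitch_statistics(y, sr, fmin=60.0, fmax=400.0):
    """10 pitch statistics for one utterance: max, min, range, upper quartile,
    median, lower quartile, interquartile range, mean, std, mean absolute slope."""
    f0 = librosa.yin(y, fmin=fmin, fmax=fmax, sr=sr)         # frame-level F0 contour (Hz)
    q1, med, q3 = np.percentile(f0, [25, 50, 75])
    return {
        "max": float(f0.max()), "min": float(f0.min()),
        "range": float(f0.max() - f0.min()),
        "upper_quartile": float(q3), "median": float(med), "lower_quartile": float(q1),
        "interquartile_range": float(q3 - q1),
        "mean": float(f0.mean()), "std": float(f0.std()),
        "mean_abs_slope": float(np.abs(np.diff(f0)).mean()),
    }
```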
2. Voice-quality feature extraction: formants, band energy distribution, harmonics-to-noise ratio, and short-term perturbation parameters.
(2-1) Formants: the Burg method is used to compute the 14th-order linear prediction coefficients (LPC) of the emotional speech, and a peak-picking method then computes the mean, standard deviation and median of the first, second and third formants F1, F2, F3, together with the median bandwidth of each of these three formants, 12 formant-related feature parameters in total. The approximation criterion of the Burg method is to minimize the sum of the forward and backward prediction mean-square errors of the lattice filter (see: Erkelens J S, Broersen P M T. Bias propagation in the autocorrelation method of linear prediction. IEEE Transactions on Speech and Audio Processing, 1997, 5(2): 116-119).
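A minimal per-frame sketch of step (2-1), assuming librosa (whose lpc routine implements Burg's method): the 14th-order LPC polynomial is computed and its complex roots are converted into candidate formant frequencies and bandwidths. The 90 Hz floor is an illustrative assumption, and the utterance-level statistics (mean, standard deviation and median of F1-F3, plus median bandwidths) would then be pooled over frames.

```python
import numpy as np
import librosa

def frame_formants(frame, sr, order=14, n_formants=3):
    """Candidate formant (frequency, bandwidth) pairs for one windowed frame."""
    a = librosa.lpc(frame.astype(np.float64), order=order)   # Burg-method LPC coefficients
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]                        # keep one root per conjugate pair
    freqs = np.angle(roots) * sr / (2.0 * np.pi)             # pole angle -> frequency (Hz)
    bws = -np.log(np.abs(roots)) * sr / np.pi                # pole radius -> bandwidth (Hz)
    candidates = sorted((f, b) for f, b in zip(freqs, bws) if f > 90.0)
    return candidates[:n_formants]                           # approx. (F1, B1), (F2, B2), (F3, B3)
```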
(2-2) Band energy distribution: the energy distribution parameters SED of 5 frequency bands are extracted, namely the mean band energy SED_500 of 0-500 Hz, SED_1000 of 500-1000 Hz, SED_2500 of 1000-2500 Hz, SED_4000 of 2500-4000 Hz, and SED_5000 of 4000-5000 Hz.
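A sketch of step (2-2), assuming numpy and the windowed frames from the pre-processing sketch above: the per-frame power spectrum is averaged inside each of the five bands to give SED_500 ... SED_5000.

```python
import numpy as np

def band_energy_means(frames, sr, edges=(0, 500, 1000, 2500, 4000, 5000)):
    """Mean spectral energy in the bands 0-500, 500-1000, 1000-2500,
    2500-4000 and 4000-5000 Hz, averaged over frames and frequency bins."""
    spectrum = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (n_frames, n_bins) power spectrum
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    names = ["SED_500", "SED_1000", "SED_2500", "SED_4000", "SED_5000"]
    sed = {}
    for name, lo, hi in zip(names, edges[:-1], edges[1:]):
        band = (freqs >= lo) & (freqs < hi)
        sed[name] = float(spectrum[:, band].mean())
    return sed
```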
(2-3) Harmonics-to-noise ratio: the mean, standard deviation, minimum, maximum and range of the harmonics-to-noise ratio (HNR) are extracted; the HNR is computed as:

$\mathrm{HNR} = 10\log_{10}\!\left[\sum_{i=1}^{N} h(i)^2 \Big/ \sum_{i=1}^{N} n(i)^2\right]$ (formula 1)

where h(i) and n(i) denote the harmonic and noise components, respectively.
(2-4) Short-term perturbation parameters: these comprise jitter and shimmer, which describe the small cycle-to-cycle variations of the fundamental frequency and of the amplitude respectively, and can be obtained from the slope variations of the pitch contour and the amplitude contour.
The jitter is defined as:

$\mathrm{Jitter}(\%) = \sum_{i=2}^{N-1}\left(2T_i - T_{i-1} - T_{i+1}\right) \Big/ \sum_{i=2}^{N-1} T_i$ (formula 2)

where T_i denotes the i-th pitch period and N is the number of pitch periods.
The shimmer is defined as:

$\mathrm{Shimmer}(\%) = \sum_{i=2}^{N-1}\left(2E_i - E_{i-1} - E_{i+1}\right) \Big/ \sum_{i=2}^{N-1} E_i$ (formula 3)

where E_i denotes the i-th peak-to-peak energy (peak amplitude).
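A direct numpy transcription of (formula 2) and (formula 3), assuming the sequences of pitch periods T_i and peak amplitudes E_i have already been measured from the pitch and amplitude contours; note that many practical jitter/shimmer definitions take the absolute value of the second difference, which the formulas as written do not.

```python
import numpy as np

def jitter_shimmer(periods, amplitudes):
    """Jitter (formula 2) and shimmer (formula 3) in percent, from the pitch
    periods T_i and peak amplitudes E_i of one utterance (both of length N >= 3)."""
    T = np.asarray(periods, dtype=float)
    E = np.asarray(amplitudes, dtype=float)
    jitter = 100.0 * np.sum(2 * T[1:-1] - T[:-2] - T[2:]) / np.sum(T[1:-1])
    shimmer = 100.0 * np.sum(2 * E[1:-1] - E[:-2] - E[2:]) / np.sum(E[1:-1])
    return jitter, shimmer
```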
3. Mel-frequency cepstral coefficients (MFCC): the 13-dimensional MFCC features together with their first- and second-order derivatives are extracted, and their means and standard deviations are then computed.
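A minimal sketch of the MFCC branch, assuming librosa (the patent does not name an implementation): 13 MFCCs and their first- and second-order deltas are computed per frame, and the per-utterance mean and standard deviation of each coefficient are concatenated into one summary vector.

```python
import numpy as np
import librosa

def mfcc_statistics(y, sr):
    """13 MFCCs + delta + delta-delta, summarized by their per-utterance mean and std."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, n_frames)
    delta1 = librosa.feature.delta(mfcc)                       # first-order derivative
    delta2 = librosa.feature.delta(mfcc, order=2)              # second-order derivative
    feats = np.vstack([mfcc, delta1, delta2])                  # (39, n_frames)
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1)])   # 78-dim summary vector
```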
Two. Training and testing of the sparse representation classifier
The training and testing of the sparse representation classifier comprises the following steps:
1. Using the feature vectors of the training samples, each emotion-class test sample is sparsely represented: given a test sample of a certain emotion class, its weight vector α is obtained by solving the weighted L1-norm optimization problem of (formula 7).
2. For each emotion class i (i = 1, 2, ..., 7), a new sample is first approximately reconstructed for the test sample y_test, denoted y_recons(i) = Aα_i, and the residual between this reconstruction and y_test is computed, i.e. r(y_test, i) = ||y_test − y_recons(i)||_2.
3. The class index i with the minimum residual is taken as the emotion class of the test sample y_test, i.e. identity(y_test) = arg min_i r(y_test, i), and the recognition results for the different emotion classes are output.
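A sketch of steps 2 and 3 (the per-class reconstruction and the arg-min residual decision), assuming numpy; the coefficient vector alpha would come from an L1 solve such as the one sketched in the classifier-construction section, and labels gives the emotion class of each dictionary column.

```python
import numpy as np

def classify_by_residual(A, labels, y_test, alpha):
    """Assign y_test to the class whose coefficients reconstruct it with minimum residual."""
    labels = np.asarray(labels)
    residuals = {}
    for cls in np.unique(labels):
        alpha_cls = np.where(labels == cls, alpha, 0.0)              # keep only this class's coefficients
        y_recons = A @ alpha_cls                                     # y_recons(i) = A * alpha_i
        residuals[cls] = float(np.linalg.norm(y_test - y_recons))    # r(y_test, i)
    predicted = min(residuals, key=residuals.get)                    # arg min_i r(y_test, i)
    return predicted, residuals
```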
Three. Evaluation of the recognition system
To improve the reliability of the test results, a 10-fold cross-validation scheme is adopted in the emotion recognition tests.
Fig. 3 compares the speech emotion recognition performance (%) obtained by the proposed method with that of four other recognition methods — the linear discriminant classifier (LDC), k-nearest neighbors (KNN), artificial neural networks (ANN) and support vector machines (SVM) — under different signal-to-noise ratios (SNR). The SNR values start from the noise-free case (acoustic feature data extracted directly from the emotional utterances of the Berlin database) and then decrease from 30 dB in steps of 5 dB down to −10 dB. The results show that the speech emotion recognition performance obtained by the proposed method is clearly better than that of the other four methods under all SNR conditions, i.e. the proposed method achieves excellent robust speech emotion recognition performance. In addition, the proposed method also obtains the best recognition performance in the noise-free case. Fig. 4 gives the correct recognition rates (%) of the different emotion classes when the proposed method performs best, i.e. in the noise-free case; the bold diagonal entries in Fig. 4 are the correct recognition rates of the individual emotion classes.
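A sketch of the evaluation protocol, assuming scikit-learn and a classify_fn that maps (dictionary, labels, test vector) to a predicted label (both are illustrative stand-ins; the patent only specifies the 10-fold split and the averaging of the 10 runs).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def ten_fold_accuracy(X, y, classify_fn):
    """Mean accuracy over a 10-fold split: 9/10 of the utterances form the
    dictionary, the remaining 1/10 is tested, repeated for each fold."""
    X, y = np.asarray(X), np.asarray(y)
    accuracies = []
    for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True,
                                               random_state=0).split(X, y):
        A, labels = X[train_idx].T, y[train_idx]               # dictionary columns = training feature vectors
        preds = [classify_fn(A, labels, x) for x in X[test_idx]]
        accuracies.append(np.mean(np.asarray(preds) == y[test_idx]))
    return float(np.mean(accuracies))
```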

Claims (5)

1. A robust speech emotion recognition method based on compressed sensing, characterized in that the method comprises the following steps:
generating noisy emotional speech samples, establishing an acoustic feature extraction module, constructing a sparse representation classifier model, and outputting the speech emotion recognition result;
(1) generating the noisy emotional speech samples, comprising:
dividing all speech samples of the emotional speech sample library into a training set and a test set, and then adding white Gaussian noise to each training and test sample to produce the noisy emotional speech samples;
(2) establishing the acoustic feature extraction module, comprising:
performing acoustic feature extraction on the noisy emotional speech samples, the module comprising three parts: prosodic feature extraction, voice-quality feature extraction, and Mel-frequency cepstral coefficient (MFCC) extraction;
(2-1) prosodic feature extraction, comprising: fundamental frequency, amplitude and utterance duration;
(2-2) voice-quality feature extraction, comprising: formants, band energy distribution, harmonics-to-noise ratio and short-term perturbation parameters;
(2-3) MFCC extraction, comprising: extracting the 13-dimensional MFCC features together with their first- and second-order derivatives, and then computing their mean and standard deviation;
(3) constructing the sparse representation classifier model, comprising:
obtaining, through the acoustic feature extraction module, one feature vector per emotional speech sample, formed from the extracted acoustic feature parameters; the feature vectors of all emotional speech samples are input to the sparse representation classifier to build the classifier model;
the sparse representation classifier is built as follows: using the method of sparse decomposition, the test sample is sparsely represented over the training samples, i.e. the training samples are regarded as a set of bases, the sparse representation coefficients of the test sample are obtained by solving an L1-norm minimization problem, and classification is finally performed according to the residual between the test sample and its sparse reconstruction;
the specific steps for constructing the sparse representation classifier are as follows:
given the training samples of a certain class, a test sample is regarded as a linear combination of the training samples of the same class,

$y_{k,\mathrm{test}} = \alpha_{k,1} y_{k,1} + \alpha_{k,2} y_{k,2} + \cdots + \alpha_{k,n_k} y_{k,n_k} + \epsilon_k = \sum_{i=1}^{n_k} \alpha_{k,i} y_{k,i} + \epsilon_k$ (formula 1)

where y_k,test denotes a test sample of the k-th class, y_k,i denotes the i-th training sample of the k-th class, α_k,i denotes the weight of the corresponding training sample, and ε_k denotes the error;
over the training samples of all target classes, (formula 1) can be written as:

$y_{k,\mathrm{test}} = \alpha_{1,1} y_{1,1} + \cdots + \alpha_{k,1} y_{k,1} + \cdots + \alpha_{k,n_k} y_{k,n_k} + \cdots + \alpha_{c,n_c} y_{c,n_c} + \epsilon = \sum_{i=1}^{n_1} \alpha_{1,i} y_{1,i} + \cdots + \sum_{i=1}^{n_k} \alpha_{k,i} y_{k,i} + \cdots + \sum_{i=1}^{n_c} \alpha_{c,i} y_{c,i} + \epsilon$ (formula 2)

where c denotes the total number of classes among all training samples;
written in matrix form, (formula 2) becomes

$y_{k,\mathrm{test}} = A\alpha + \epsilon$ (formula 3)

where

$A = \left[\, y_{1,1} \,|\, \cdots \,|\, y_{1,n_1} \,|\, \cdots \,|\, y_{k,1} \,|\, \cdots \,|\, y_{k,n_k} \,|\, \cdots \,|\, y_{c,1} \,|\, \cdots \,|\, y_{c,n_c} \right],\quad \alpha = \left[\alpha_{1,1} \cdots \alpha_{1,n_1} \cdots \alpha_{k,1} \cdots \alpha_{k,n_k} \cdots \alpha_{c,1} \cdots \alpha_{c,n_c}\right]^{\mathsf T}$ (formula 4)

in theory, in the sparse representation classifier all elements of the weight vector α except those associated with the k-th class should be zero; to obtain the weight vector α, the following optimization problem in the L0-norm sense needs to be solved:

$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.}\quad \|y_{k,\mathrm{test}} - A\alpha\|_2 \le \epsilon$ (formula 5)

to solve (formula 5), the L0-norm optimization problem is relaxed into an L1-norm optimization problem:

$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.}\quad \|y_{k,\mathrm{test}} - A\alpha\|_2 \le \epsilon$ (formula 6)

this is a convex optimization problem and can be converted into a linear programming problem and solved;
to further improve the noise robustness of the sparse representation, a weighted L1-norm optimization problem is designed, and (formula 6) becomes:

$\min_{\alpha} \|\alpha\|_1 \quad \text{s.t.}\quad \|W(y_{k,\mathrm{test}} - A\alpha)\|_2 \le \epsilon$ (formula 7)

where the weighting factor W can be expressed as:

$W_i = \exp\!\left(-\frac{\|y - y_{\mathrm{recons}}(i)\|^2}{2\sigma^2}\right)$ (formula 8)

in the formula, σ is a constant (set to 1) and y_recons(i) = Aα_i denotes a sample reconstructed from the weight vector α_i; for data with a higher noise level the residual ||y − y_recons(i)||_2 is larger, so the corresponding weighting factor is smaller;
given a new test sample y_test, the weight vector α is first obtained by solving (formula 7); if the largest of the nonzero coefficients of α corresponds to the k-th class, y_test is assigned to that class, i.e. y_test is assigned to the class corresponding to the largest coefficient in the weight vector α;
(4) outputting the speech emotion recognition result, comprising:
training and testing the sparse representation classifier and outputting the speech emotion recognition result; in the emotion recognition tests a 10-fold cross-validation scheme is adopted: all utterances are divided equally into 10 parts, 9 parts are used for training and the remaining part for testing, the experiment is repeated 10 times accordingly, and the mean of the 10 runs is taken as the recognition result.
2. The robust speech emotion recognition method based on compressed sensing according to claim 1, characterized in that:
the fundamental frequency is obtained by using a correlation method to extract the pitch contour of the emotional speech and then computing 10 statistical parameters of the contour, comprising the maximum, minimum, range, upper quartile, median, lower quartile, interquartile range, mean, standard deviation, and mean absolute slope;
the amplitude is obtained by a sum-of-squares method, and 9 amplitude-related statistical parameters are extracted, comprising the mean, standard deviation, maximum, minimum, range, upper quartile, median, lower quartile, and interquartile range;
the utterance duration characterizes the differences in the temporal structure of speech under different emotions, and 6 duration-related parameters are extracted, comprising the total utterance time, the voiced duration, the unvoiced duration, the ratio of voiced to unvoiced time, the ratio of voiced time to total utterance time, and the ratio of unvoiced time to total utterance time.
3. The robust speech emotion recognition method based on compressed sensing according to claim 1, characterized in that:
the formants: the Burg method is used to compute the 14th-order linear prediction coefficients (LPC) of the emotional speech, and a peak-picking method then computes the mean, standard deviation and median of the first, second and third formants F1, F2, F3, together with the median bandwidth of each of these three formants, 12 formant-related feature parameters in total;
the band energy distribution: the energy distribution parameters SED of 5 frequency bands are extracted, namely the mean band energy SED_500 of 0-500 Hz, SED_1000 of 500-1000 Hz, SED_2500 of 1000-2500 Hz, SED_4000 of 2500-4000 Hz, and SED_5000 of 4000-5000 Hz;
the harmonics-to-noise ratio: the mean, standard deviation, minimum, maximum and range of the harmonics-to-noise ratio (HNR) are extracted, computed as:

$\mathrm{HNR} = 10\log_{10}\!\left[\sum_{i=1}^{N} h(i)^2 \Big/ \sum_{i=1}^{N} n(i)^2\right]$ (formula 1)

the short-term perturbation parameters: these comprise jitter and shimmer, which describe the small cycle-to-cycle variations of the fundamental frequency and of the amplitude respectively, and can be obtained from the slope variations of the pitch contour and the amplitude contour;
the jitter is defined as:

$\mathrm{Jitter}(\%) = \sum_{i=2}^{N-1}\left(2T_i - T_{i-1} - T_{i+1}\right) \Big/ \sum_{i=2}^{N-1} T_i$ (formula 2)

where T_i denotes the i-th pitch period and N is the number of pitch periods;
the shimmer is defined as:

$\mathrm{Shimmer}(\%) = \sum_{i=2}^{N-1}\left(2E_i - E_{i-1} - E_{i+1}\right) \Big/ \sum_{i=2}^{N-1} E_i$ (formula 3)

where E_i denotes the i-th peak-to-peak energy (peak amplitude).
4. The robust speech emotion recognition method based on compressed sensing according to claim 1, characterized in that:
the training and testing of the sparse representation classifier comprises the following steps:
(4-1) using the feature vectors of the training samples, each emotion-class test sample is sparsely represented: given a test sample of a certain emotion class, its weight vector α is obtained by solving the weighted L1-norm optimization problem of (formula 7);
(4-2) for each emotion class i (i = 1, 2, ..., 7), a new sample is first approximately reconstructed for the test sample y_test, denoted y_recons(i) = Aα_i, and the residual between this reconstruction and y_test is computed, i.e. r(y_test, i) = ||y_test − y_recons(i)||_2;
(4-3) the class index i with the minimum residual is taken as the emotion class of the test sample y_test, i.e. identity(y_test) = arg min_i r(y_test, i), and the recognition results for the different emotion classes are output.
5. The robust speech emotion recognition method based on compressed sensing according to any one of claims 1-4, characterized in that seven kinds of emotional speech samples are chosen from the emotional speech sample library: anger, happiness, sadness, fear, disgust, boredom and neutral (no emotion).

Legal events

C06 / PB01 — Publication (application CN103021406A published 2013-04-03)
C10 / SE01 — Entry into substantive examination
C14 / GR01 — Grant of patent (granted publication date: 2014-10-22)
CF01 — Termination of patent right due to non-payment of annual fee (termination date: 2016-12-18)