CN102592593B - Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech
Abstract
The invention discloses an emotional-feature extraction method for speech that considers the sparsity of multilinear groups. The method comprises the following steps: taking into account the multiple factors contained in a speech signal, such as time, frequency, scale, and direction information; performing feature extraction with a multilinear group sparse decomposition method; building a multilinear representation of the speech signal's energy spectrum through Gabor functions with different scales and directions; solving for the feature projection matrices with a group sparse tensor decomposition method; computing the feature projection along the frequency mode; decorrelating the features through a discrete cosine transform; and finally computing the first-order and second-order difference coefficients to obtain the emotional features of the speech. Because the invention takes the time, frequency, scale, and direction factors in the speech signal into account for emotional-feature extraction and performs the feature projection with a group sparse tensor decomposition method, it finally improves the accuracy of multi-class speech emotion recognition.
Description
Technical field
The present invention relates to a speech emotion feature extraction method for improving speech emotion recognition performance, and belongs to the field of speech signal processing.
Background technology
Speech is one of the most convenient ways for people to communicate in daily life, which has led researchers to explore how speech can serve as a tool for communication between humans and machines. Beyond traditional interaction modes such as speech recognition, the speaker's emotion is also an important kind of interactive information, and a machine's ability to automatically understand the speaker's emotion is one of the hallmarks of intelligent human-computer interaction.
Speech emotion recognition has important value in signal processing and intelligent human-computer interaction, with many potential applications. In human-computer interaction, recognizing the speaker's emotion allows a computer to make the system friendlier and more accurate; for example, a distance-education system can adjust its course in time by recognizing a student's emotion, thereby improving teaching effectiveness. In telephone call centers and mobile communication, the user's emotional state can be obtained promptly to improve service quality. An in-vehicle system can use emotion recognition to detect whether the driver is concentrating and issue an appropriate auxiliary warning. In medicine, speech-based emotion recognition can serve as a tool that helps doctors diagnose a patient's condition.
For speech emotion recognition, an important problem is how to extract effective features to represent different emotions. In traditional feature extraction, a speech segment is usually divided into multiple frames in order to obtain approximately stationary signals. The features obtained from each frame are called local features, such as pitch and energy; their advantage is that existing classifiers can use them to estimate the parameters of different emotional states rather accurately, while their drawback is that the feature dimensionality and sample count are large, which affects the speed of feature extraction and classification. Features obtained by computing statistics over a whole utterance are called global features; their advantage is better classification accuracy and speed, but they lose the temporal information of the speech signal and easily suffer from a shortage of training samples. In general, the features commonly used in speech emotion recognition fall into a few classes: continuous acoustic features, spectral features, features based on the Teager energy operator, and so on.
According to research in psychology and prosody, the most intuitive cues to a speaker's emotion in speech are the continuous prosodic features, such as pitch, energy, and speaking rate. The corresponding global features include the mean, median, standard deviation, maximum, and minimum of pitch or energy, as well as the first and second formants.
Spectral features provide the useful frequency information in a speech signal and are also an important feature extraction approach in speech emotion recognition. Commonly used spectral features include linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP).
Speech is produced by a nonlinear airflow in the vocal system. The Teager energy operator (TEO), proposed by Teager et al., is an operation that can rapidly track changes in signal energy within a glottal cycle and is used to analyze the fine structure of speech. Under different emotional states, muscle tension affects the airflow in the vocal system; according to the findings of Bou-Ghazale et al., TEO-based features can be used to detect stress in speech.
Numerous experimental evaluations show that, for speech emotion recognition, a suitable feature representation should be chosen for each classification task: features based on the Teager energy are suited to detecting stress in a speech signal; continuous acoustic features are suited to distinguishing high-arousal emotions from low-arousal emotions; and for multi-class emotion classification, spectral features are the most suitable speech representation. Combining spectral features with continuous acoustic features, or considering a joint analysis of multiple factors, can also improve classification accuracy.
Another important stage after feature extraction and selection is classification. In pattern recognition, various classifiers are currently used to classify speech emotion features, including hidden Markov models (HMM), Gaussian mixture models (GMM), support vector machines (SVM), linear discriminant analysis (LDA), and ensemble classifiers. The HMM is one of the most widely used recognition algorithms in speech emotion recognition; this benefits from its general applicability to speech signals and its particular suitability for data with temporal structure, and current results show that emotion recognition systems based on HMMs can provide relatively high classification accuracy. A GMM can be regarded as an HMM with only one state and is well suited to modeling multivariate distributions; Breazeal et al. applied a GMM as the classifier on the KISMET speech database to recognize five classes of emotions. The SVM has been widely used in pattern recognition; its basic principle is to project features into a higher-dimensional space through a kernel function so that they become linearly separable. Compared with HMM and GMM, it has the advantages of globally optimal training and a data-dependent generalization bound, and many studies have obtained good classification results using SVMs as the classifier for speech emotion recognition.
As shown in Figure 1, a traditional speech emotion recognition method based on spectral features usually adopts the following steps:
1) preprocess the input speech signal, including windowing, filtering, and pre-emphasis;
2) apply the short-time Fourier transform to the signal, filter with a Mel-scale filter bank, and then take the logarithmic spectrum (log);
3) compute the cepstrum with the discrete cosine transform, then apply weighting, cepstral mean subtraction, and difference computation;
4) train a Gaussian mixture model (GMM) to obtain a model for each emotion;
5) use the trained emotion models to recognize the test data and obtain the recognition accuracy.
At present, relatively good classification accuracy has been reached for two-class emotion classification, such as negative versus neutral emotion. For multi-class emotion classification, however, reasons such as data imbalance and the consideration of only a single factor (frequency or time) leave the features poorly discriminative and the classification accuracy relatively low, which limits the application of speech-based emotion recognition systems.
Summary of the invention
To address the problem that feature extraction in traditional speech emotion recognition considers only a single factor, such as frequency or time, and therefore produces poorly discriminative features, the present invention proposes a speech emotion feature extraction method that considers the multilinear group sparsity in speech; applied to speech emotion recognition, it can improve multi-class emotion recognition accuracy.
The emotional-feature extraction method of the present invention, which considers multilinear group sparsity in speech, is as follows:
Considering that a speech signal contains multiple factors (time, frequency, scale, and direction information), feature extraction is carried out with a multilinear group sparse decomposition method: the energy spectrum of the speech signal is given a multilinear representation through Gabor functions with different scales and directions; the feature projection matrices are solved with a group sparse tensor decomposition method; the feature projection along the frequency mode is computed; the features are decorrelated through a discrete cosine transform; and the first-order and second-order difference coefficients of the features are obtained. The method specifically comprises the following steps:
(1) Acquire a speech signal s(t) (collected by a device such as a microphone), and use the short-time Fourier transform to transform s(t) to the time-frequency domain, obtaining the signal's time-frequency representation S(f, t) and energy spectrum P(f, t);
(2) Use two-dimensional Gabor functions with different scales and directions to convolve and filter the energy spectrum. The Gabor function is defined as follows:

G_{u,v}(f, t) = (k_v^2 / σ^2) exp(-k_v^2 (f^2 + t^2) / (2σ^2)) [exp(j k_v (f cos φ + t sin φ)) - exp(-σ^2 / 2)]

where: P(f, t) is the element of the energy spectrum at frame t and frequency f; (k_v, φ) is the vector controlling the scale and direction of the function; j denotes the imaginary unit; k_v = 2^(-(v+2)/2) π; φ = u(π/K); u indexes the direction of the function and v its scale; K denotes the total number of directions; and σ is a constant determining the function's envelope, set to 2π.

The result of convolving the energy spectrum P(f, t) with the Gabor functions is the multilinear representation of the speech signal: a 5th-order tensor whose modes respectively represent time, frequency, direction, scale, and class. Its frequency mode is then filtered with a Mel-scale filter bank to obtain a new 5th-order tensor P of size N_1 × N_2 × N_3 × N_4 × N_5, where the length of mode i is N_i, i = 1, …, 5;
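Step (2) can be sketched as follows. This is an illustrative reading, not the patented implementation: the kernel follows the standard 2-D Gabor form with the parameter values given in the text (σ = 2π, k_v = 2^(-(v+2)/2)π, φ = uπ/K), while the kernel support size and the 4 × 4 scale/direction grid are assumptions taken from the experiment section.

```python
# Hedged sketch: build a bank of 2-D Gabor filters over several scales and
# directions and convolve them with a spectrogram, yielding the
# time x frequency x direction x scale stack described in step (2).
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(u, v, K=4, size=11, sigma=2 * np.pi):
    """2-D Gabor kernel for direction u and scale v (assumed support size)."""
    k_v = 2.0 ** (-(v + 2) / 2.0) * np.pi
    phi = u * np.pi / K
    half = size // 2
    f, t = np.meshgrid(np.arange(-half, half + 1),
                       np.arange(-half, half + 1), indexing='ij')
    envelope = (k_v ** 2 / sigma ** 2) * np.exp(
        -k_v ** 2 * (f ** 2 + t ** 2) / (2 * sigma ** 2))
    carrier = (np.exp(1j * k_v * (f * np.cos(phi) + t * np.sin(phi)))
               - np.exp(-sigma ** 2 / 2))
    return envelope * carrier

def gabor_tensor(P, n_dirs=4, n_scales=4):
    """P: energy spectrum (freq x time) -> tensor (freq x time x dir x scale),
    taking the magnitude of each complex filter response."""
    out = np.empty(P.shape + (n_dirs, n_scales))
    for u in range(n_dirs):
        for v in range(n_scales):
            g = gabor_kernel(u, v, K=n_dirs)
            out[..., u, v] = np.abs(convolve2d(P, g, mode='same'))
    return out
```

Stacking the responses of all utterances along a fifth (class) axis would then give the 5th-order tensor of the text.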
(3) Apply the group sparse tensor decomposition to the obtained multilinear representation P and compute the projection matrices U^(i), i = 1, …, 5, on the different factors, in order to carry out the feature projection. The following decomposition model is established:

P ≈ Λ ×_1 U^(1) ×_2 U^(2) ×_3 U^(3) ×_4 U^(4) ×_5 U^(5)

where U^(i) is the projection matrix of size N_i × K obtained from the decomposition; Λ is the 5th-order tensor whose diagonal elements are 1, of size K × K × K × K × K; and ×_i denotes the mode-i product of a tensor with a matrix, defined as follows:

(X ×_i A)(n_1, …, n_{i-1}, k, n_{i+1}, …, n_M) = Σ_{n_i=1}^{N_i} x(n_1, …, n_i, …, n_M) a(n_i, k)

where X denotes an M-th-order tensor of size N_1 × … × N_M, A is a matrix of size N_i × K, x(n_1, …, n_M) is an element of the tensor X, and a(n_i, k) is an element of the matrix A;
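The mode-i tensor-matrix product used in this decomposition model can be sketched with NumPy. This is a generic sketch of the standard mode product (0-based indexing), not code from the patent.

```python
# Minimal sketch of the mode-i tensor-matrix product:
# (X x_i A)[n1,...,k,...,nM] = sum_{ni} X[n1,...,ni,...,nM] * A[ni, k].
import numpy as np

def mode_product(X, A, i):
    """Multiply tensor X along mode i (0-based) by matrix A of shape (N_i, K);
    the result has mode i's length N_i replaced by K."""
    # tensordot sums X's axis i against A's axis 0 and appends the K axis
    # last; moveaxis puts it back into position i.
    return np.moveaxis(np.tensordot(X, A, axes=([i], [0])), -1, i)
```

For example, a tensor of shape (2, 3, 4) multiplied along mode 1 by a (3, 5) matrix yields a tensor of shape (2, 5, 4).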
The specific decomposition procedure for computing the projection matrices U^(i), i = 1, …, I, is as follows (here i denotes the mode index, corresponding to the different factors, and I = 5):

1. Initialize U^(i) >= 0, i = 1, …, I, by alternating least squares or randomly;

2. Normalize each column vector u_k^(i) of the projection matrices U^(i), i = 1, …, I, k = 1, …, K;

3. While the error objective function

E = ||P - Σ_{k=1}^{K} u_k^(1) ∘ u_k^(2) ∘ … ∘ u_k^(I)||_F^2 + Σ_{k=1}^{K} λ_k Σ_{i=1}^{I} ||u_k^(i)||^{q_i}

is greater than a given threshold, repeat the following operations:

● for i = 1 to I in turn, update

u_k^(i) ← [P^(k)_(i) (u_k^(I) ⊙ … ⊙ u_k^(i+1) ⊙ u_k^(i-1) ⊙ … ⊙ u_k^(1)) - λ_k q_i]_+ , k = 1, …, K,

where || · ||_F denotes the Frobenius norm, P^(k) = P - Σ_{m≠k} u_m^(1) ∘ … ∘ u_m^(I) is the k-th residual tensor and P^(k)_(i) is its mode-i matricization, ⊙ is the Khatri-Rao product of matrices, ∘ denotes the vector outer product, and λ_k and q_i are weight coefficients that regulate the sparsity of the objective function's components, taking values between 0 and 1;

● if i ≠ 5, normalize the updated column, u_k^(i) ← u_k^(i) / (u_k^(i)^T u_k^(i))^{1/2}, where ^T denotes transposition; if i = 5, u_k^(5) is left unnormalized;

4. When the objective function E falls below the threshold, the loop ends and the projection matrices U^(i), i = 1, …, I, have been computed;
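The alternating loop above can be sketched with a simplified, column-wise (HALS-style) nonnegative CP decomposition carrying an l1 sparsity penalty. This is an illustrative reading under assumptions, not the patent's exact update rule: a single penalty weight `lam` stands in for the λ_k and q_i coefficients, and the normalization and stopping details are simplified to a fixed iteration count.

```python
# Hedged sketch of a group-sparse nonnegative CP decomposition via
# column-wise alternating updates with soft thresholding.
import numpy as np

def unfold(X, i):
    """Mode-i matricization (C-order)."""
    return np.moveaxis(X, i, 0).reshape(X.shape[i], -1)

def khatri_rao(mats):
    """Column-wise Kronecker product of a list of matrices."""
    K = mats[0].shape[1]
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum('ik,jk->ijk', out, M).reshape(-1, K)
    return out

def sparse_cp(X, K, lam=0.1, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    U = [rng.random((n, K)) for n in X.shape]   # nonnegative init
    for _ in range(n_iter):
        for i in range(X.ndim):
            others = [U[m] for m in range(X.ndim) if m != i]
            Z = khatri_rao(others)              # (prod of other lengths) x K
            G = Z.T @ Z                         # K x K Gram matrix
            V = unfold(X, i) @ Z                # N_i x K
            for k in range(K):                  # column-wise HALS update
                num = V[:, k] - U[i] @ G[:, k] + U[i][:, k] * G[k, k]
                # soft-threshold by lam, clip to stay nonnegative
                U[i][:, k] = np.maximum(num - lam, 0) / (G[k, k] + 1e-12)
    return U
```

With `lam = 0` this reduces to plain nonnegative CP; increasing `lam` drives the factor columns toward sparsity, which is the role the λ_k weights play in the objective above.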
(4) Use the obtained projection matrix U^(2), corresponding to the frequency domain, to project the multilinear representation P of the speech signal:

S = P ×_2 [U^(2)+]_+

where [Y]_+ = max(0, Y) denotes the matrix formed from the non-negative elements of a matrix Y (elements less than 0 are set to 0), [U^(2)+]_+ is the matrix formed from the non-negative elements of the pseudoinverse U^(2)+ of the projection matrix U^(2), and ×_2 denotes the mode-2 product of that matrix with the tensor P;
(5) With the time mode fixed, apply the tensor unfolding operation to the obtained multilinear sparse representation S, obtaining a feature matrix S^(f) of size N_1 × (K · N_3 · N_4 · N_5);

(6) Use the discrete cosine transform to decorrelate S^(f), obtaining the speech emotion feature F; the first-order and second-order difference coefficients of the feature are then computed to obtain the final emotional features.
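Steps (5)-(6) can be sketched as follows. The simple frame-to-frame difference used for the delta coefficients is an assumption; the text does not specify the regression window, and the number of retained DCT coefficients is likewise illustrative.

```python
# Hedged sketch of steps (5)-(6): unfold along the time mode, decorrelate
# each frame with a DCT, and append first/second-order difference (delta)
# coefficients.
import numpy as np
from scipy.fftpack import dct

def finalize_features(S, n_keep=13):
    # (5) fix the time mode (axis 0) and flatten all remaining modes
    S_f = S.reshape(S.shape[0], -1)
    # (6) DCT decorrelation per frame, keeping the leading coefficients
    F = dct(S_f, type=2, axis=1, norm='ortho')[:, :n_keep]
    d1 = np.gradient(F, axis=0)        # first-order differences
    d2 = np.gradient(d1, axis=0)       # second-order differences
    return np.hstack([F, d1, d2])      # static + delta + delta-delta
```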
The present invention takes the factors of time, frequency, scale, and direction in the speech signal into account for emotional-feature extraction and performs the feature projection with a group sparse tensor decomposition method, finally improving the accuracy of multi-class speech emotion recognition.
Brief description of the drawings
Fig. 1 is a schematic block diagram of a traditional speech emotion recognition process;
Fig. 2 is a schematic diagram of the feature extraction method of the present invention;
Fig. 3 is a schematic block diagram of a speech emotion recognition process adopting the present invention;
Fig. 4 is a comparison chart of experimental results on four-class speech emotion recognition.
Embodiment
As shown in Fig. 2, the speech emotion recognition method based on multilinear group sparse features of the present invention specifically comprises the following steps:
(1) Collect a speech signal s(t) with a device such as a microphone, and use the short-time Fourier transform to transform s(t) to the time-frequency domain, obtaining the signal's time-frequency representation S(f, t) and energy spectrum P(f, t);
(2) Use two-dimensional Gabor functions with different scales and directions to convolve and filter the energy spectrum, obtaining the multilinear representation of the speech signal; then filter its frequency mode with a Mel-scale filter bank to obtain the representation P;
The Gabor function is defined as follows:

G_{u,v}(f, t) = (k_v^2 / σ^2) exp(-k_v^2 (f^2 + t^2) / (2σ^2)) [exp(j k_v (f cos φ + t sin φ)) - exp(-σ^2 / 2)]

where: P(f, t) is the element of the energy spectrum at frame t and frequency f; (k_v, φ) is the vector controlling the scale and direction of the function; j denotes the imaginary unit; k_v = 2^(-(v+2)/2) π; φ = u(π/K); u indexes the direction of the function and v its scale; K denotes the total number of directions; and σ is a constant determining the function's envelope, set to 2π.

The result of convolving the energy spectrum P(f, t) with the Gabor functions is the multilinear representation of the speech signal: a 5th-order tensor whose modes respectively represent time, frequency, direction, scale, and class. Its frequency mode is then filtered with a Mel-scale filter bank to obtain a new 5th-order tensor P of size N_1 × N_2 × N_3 × N_4 × N_5, where the length of mode i is N_i, i = 1, …, 5;
(3) Apply the group sparse tensor decomposition to the representation P and compute the projection matrices U^(i), i = 1, …, 5, on the different factors, in order to carry out the feature projection. The following decomposition model is established:

P ≈ Λ ×_1 U^(1) ×_2 U^(2) ×_3 U^(3) ×_4 U^(4) ×_5 U^(5)

where U^(i) is the projection matrix of size N_i × K obtained from the decomposition; Λ is the 5th-order tensor whose diagonal elements are 1, of size K × K × K × K × K; and ×_i denotes the mode-i product of a tensor with a matrix, defined as follows:

(X ×_i A)(n_1, …, n_{i-1}, k, n_{i+1}, …, n_M) = Σ_{n_i=1}^{N_i} x(n_1, …, n_i, …, n_M) a(n_i, k)

where X denotes an M-th-order tensor of size N_1 × … × N_M, A is a matrix of size N_i × K, x(n_1, …, n_M) is an element of the tensor X, and a(n_i, k) is an element of the matrix A.
To compute the projection matrices U^(i), i = 1, …, I (here I = 5), the specific decomposition procedure is as follows:

A) Initialize U^(i) >= 0, i = 1, …, I, by alternating least squares or randomly;

B) Normalize each column vector u_k^(i) of the projection matrices U^(i), i = 1, …, I, k = 1, …, K;

C) While the error objective function

E = ||P - Σ_{k=1}^{K} u_k^(1) ∘ u_k^(2) ∘ … ∘ u_k^(I)||_F^2 + Σ_{k=1}^{K} λ_k Σ_{i=1}^{I} ||u_k^(i)||^{q_i}

is greater than a given threshold, repeat the following:

● for i = 1 to I in turn, update

u_k^(i) ← [P^(k)_(i) (u_k^(I) ⊙ … ⊙ u_k^(i+1) ⊙ u_k^(i-1) ⊙ … ⊙ u_k^(1)) - λ_k q_i]_+ , k = 1, …, K,

where || · ||_F denotes the Frobenius norm, P^(k) = P - Σ_{m≠k} u_m^(1) ∘ … ∘ u_m^(I) is the k-th residual tensor and P^(k)_(i) is its mode-i matricization, ⊙ is the Khatri-Rao product of matrices, ∘ denotes the vector outer product, and λ_k and q_i are weight coefficients that regulate the sparsity of the objective function's components, taking values between 0 and 1;

D) When the objective function E falls below the threshold, the loop ends and the projection matrices U^(i), i = 1, …, I, have been computed;
(4) Use the obtained projection matrix U^(2), corresponding to the frequency domain, to project the multilinear representation P of the speech signal:

S = P ×_2 [U^(2)+]_+

where [Y]_+ = max(0, Y) denotes the matrix formed from the non-negative elements of a matrix Y (elements less than 0 are set to 0), [U^(2)+]_+ is the matrix formed from the non-negative elements of the pseudoinverse U^(2)+ of the projection matrix U^(2), and ×_2 denotes the mode-2 product of that matrix with the tensor P;

(5) With the time mode fixed, apply the tensor unfolding operation to the obtained multilinear sparse representation S, obtaining a feature matrix S^(f) of size N_1 × (K · N_3 · N_4 · N_5);

(6) Use the discrete cosine transform to decorrelate S^(f), obtaining the speech emotion feature F; the first-order and second-order difference coefficients of the feature are then computed to obtain the final emotional features.
As shown in Fig. 3, the process of speech emotion recognition adopting the above feature extraction method comprises the following steps:
1) Obtain speech signal data s_l(t), l = 1, …, L, with different emotion labels, for L emotion classes in total;
2) Use the feature extraction method shown in Fig. 2 to extract the features F of the different emotions;
3) Use Gaussian mixture models (GMM) to model the different emotional features and, through training, obtain the emotion model M_l corresponding to the l-th emotion class;
4) When a speech signal of unknown emotion class is tested, use the GMM emotion models M_l, l = 1, …, L, to compute the maximum a posteriori probability for each in turn; the emotion class with the maximum probability is the emotion recognition result for that speech signal.
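The recognition stage above can be sketched with scikit-learn: one GMM per emotion class, with classification by maximum per-class log-likelihood. The number of mixture components and the diagonal covariance are assumptions; the text does not specify them.

```python
# Hedged sketch of the Fig. 3 recognition stage: train one GaussianMixture
# per emotion label, classify an utterance by the model with the highest
# average log-likelihood over its frames.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(features_by_class, n_components=8, seed=0):
    """features_by_class: dict label -> (n_frames, n_dims) feature matrix."""
    return {label: GaussianMixture(n_components, covariance_type='diag',
                                   random_state=seed).fit(F)
            for label, F in features_by_class.items()}

def classify(models, F):
    """Pick the class whose model gives the highest average log-likelihood."""
    return max(models, key=lambda label: models[label].score(F))
```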
The effect of the present invention can be further illustrated by experiment.
The experiments tested the recognition performance of the proposed feature extraction method on the FAU Aibo dataset, recognizing 4 emotion classes (Anger, Emphatic, Neutral, Rest). The sampling rate of the speech signals was 8 kHz; a Hamming window was used for windowing, with a 23 ms window length and a 10 ms window shift; the short-time Fourier transform was used to compute the energy spectrum of the signal; Gabor functions with 4 different scales and 4 different directions performed the time-frequency convolutional filtering of the energy spectrum; a Mel filter bank of size 36 was used to compute the Mel power spectrum; the projection matrix carried out the feature projection on the frequency mode; and the DCT was used to decorrelate the features.
Fig. 4 compares the recognition performance of the proposed method with existing feature extraction techniques (MFCC and LFPC features). In terms of final recognition accuracy, the present invention effectively improves multi-class speech emotion recognition, with an improvement of 6.1% over the traditional MFCC method and 5.8% over the LFPC method.
Claims (2)
1. A speech emotion feature extraction method considering multilinear group sparse features in speech, characterized in that:
Considering that a speech signal contains multiple factors (time, frequency, scale, and direction information), feature extraction is carried out with a multilinear group sparse decomposition method: the energy spectrum of the speech signal is given a multilinear representation through Gabor functions with different scales and directions, the feature projection matrices are solved with a group sparse tensor decomposition method, the feature projection along the frequency mode is computed, the features are decorrelated through a discrete cosine transform, and the first-order and second-order difference coefficients of the features are computed; the method specifically comprises the following steps:
(1) Acquire a speech signal s(t), and use the short-time Fourier transform to transform s(t) to the time-frequency domain, obtaining the signal's time-frequency representation S(f, t) and energy spectrum P(f, t);
(2) Use two-dimensional Gabor functions with different scales and directions to convolve and filter the energy spectrum. The Gabor function is defined as follows:

G_{u,v}(f, t) = (k_v^2 / σ^2) exp(-k_v^2 (f^2 + t^2) / (2σ^2)) [exp(j k_v (f cos φ + t sin φ)) - exp(-σ^2 / 2)]

where: P(f, t) is the element of the energy spectrum at frame t and frequency f; (k_v, φ) is the vector controlling the scale and direction of the function; j denotes the imaginary unit; k_v = 2^(-(v+2)/2) π; φ = u(π/K); u indexes the direction of the function and v its scale; K denotes the total number of directions; and σ is a constant determining the function's envelope, set to 2π;

The result of convolving the energy spectrum P(f, t) with the Gabor functions is the multilinear representation of the speech signal: a 5th-order tensor whose modes respectively represent time, frequency, direction, scale, and class. Its frequency mode is then filtered with a Mel-scale filter bank to obtain a new 5th-order tensor P of size N_1 × N_2 × N_3 × N_4 × N_5, where the length of mode i is N_i, i = 1, …, 5;
(3) Apply the group sparse tensor decomposition to the obtained multilinear representation P and compute the projection matrices U^(i), i = 1, …, 5, on the different factors, in order to carry out the feature projection. The following decomposition model is established:

P ≈ Λ ×_1 U^(1) ×_2 U^(2) ×_3 U^(3) ×_4 U^(4) ×_5 U^(5),

where U^(i) is the projection matrix of size N_i × K obtained from the decomposition, Λ is the 5th-order tensor whose diagonal elements are 1, of size K × K × K × K × K, and ×_i denotes the mode-i product of a tensor with a matrix, defined as follows:

(X ×_i A)(n_1, …, n_{i-1}, k, n_{i+1}, …, n_M) = Σ_{n_i=1}^{N_i} x(n_1, …, n_i, …, n_M) a(n_i, k)

where X denotes an M-th-order tensor of size N_1 × … × N_M, A is a matrix of size N_i × K, x(n_1, …, n_M) is an element of the tensor X, and a(n_i, k) is an element of the matrix A;
(4) Use the obtained projection matrix U^(2), corresponding to the frequency domain, to project the multilinear representation P of the speech signal:

S = P ×_2 [U^(2)+]_+

where [Y]_+ = max(0, Y) denotes the matrix formed from the non-negative elements of a matrix Y (elements less than 0 are set to 0), [U^(2)+]_+ is the matrix formed from the non-negative elements of the pseudoinverse U^(2)+ of the projection matrix U^(2), and ×_2 denotes the mode-2 product of that matrix with the tensor P;

(5) With the time mode fixed, apply the tensor unfolding operation to the obtained multilinear sparse representation S, obtaining a feature matrix S^(f) of size N_1 × (K · N_3 · N_4 · N_5);

(6) Use the discrete cosine transform to decorrelate S^(f), obtaining the speech emotion feature F; the first-order and second-order difference coefficients of the feature are then computed to obtain the final emotional features.
2. The speech emotion feature extraction method considering multilinear group sparse features in speech according to claim 1, characterized in that the specific decomposition procedure for computing the projection matrices U^(i), i = 1, …, I, is as follows (here i denotes the mode index and I = 5):

1. Initialize U^(i) >= 0, i = 1, …, I, by alternating least squares or randomly;

2. Normalize each column vector u_k^(i) of the projection matrices U^(i), i = 1, …, I, k = 1, …, K;

3. While the error objective function

E = ||P - Σ_{k=1}^{K} u_k^(1) ∘ u_k^(2) ∘ … ∘ u_k^(I)||_F^2 + Σ_{k=1}^{K} λ_k Σ_{i=1}^{I} ||u_k^(i)||^{q_i}

is greater than a given threshold, repeat the following:

● for i = 1 to I in turn, update

u_k^(i) ← [P^(k)_(i) (u_k^(I) ⊙ … ⊙ u_k^(i+1) ⊙ u_k^(i-1) ⊙ … ⊙ u_k^(1)) - λ_k q_i]_+ , k = 1, …, K,

where || · ||_F denotes the Frobenius norm, P^(k)_(i) is the mode-i matricization of the k-th residual tensor P^(k), ⊙ is the Khatri-Rao product of matrices, ∘ denotes the vector outer product, and λ_k and q_i are weight coefficients that regulate the sparsity of the objective function's components, taking values between 0 and 1;

4. When the objective function E falls below the threshold, the loop ends and the projection matrices U^(i), i = 1, …, I, have been computed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210091525.1A CN102592593B (en) | 2012-03-31 | 2012-03-31 | Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102592593A CN102592593A (en) | 2012-07-18 |
CN102592593B true CN102592593B (en) | 2014-01-01 |
Legal Events

- C06 / PB01: Publication
- C10 / SE01: Entry into force of request for substantive examination
- C14 / GR01: Patent grant
- CF01: Termination of patent right due to non-payment of annual fee (granted publication date: 20140101; termination date: 20170331)