CN108682431A - Speech emotion recognition method in PAD three-dimensional emotional space - Google Patents
Speech emotion recognition method in PAD three-dimensional emotional space
- Publication number
- CN108682431A CN108682431A CN201810438464.9A CN201810438464A CN108682431A CN 108682431 A CN108682431 A CN 108682431A CN 201810438464 A CN201810438464 A CN 201810438464A CN 108682431 A CN108682431 A CN 108682431A
- Authority
- CN
- China
- Prior art keywords
- emotional
- speech
- pad
- value
- dimensionals
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 230000002996 emotional effect Effects 0.000 title claims abstract description 60
- 238000000034 method Methods 0.000 title claims abstract description 24
- 230000008451 emotion Effects 0.000 claims abstract description 22
- 230000000694 effects Effects 0.000 claims abstract description 9
- 230000004927 fusion Effects 0.000 claims abstract description 8
- 230000008909 emotion recognition Effects 0.000 claims description 12
- 210000002569 neuron Anatomy 0.000 claims description 9
- 239000000284 extract Substances 0.000 claims description 7
- 230000003321 amplification Effects 0.000 claims description 6
- 238000000605 extraction Methods 0.000 claims description 6
- 238000001228 spectrum Methods 0.000 claims description 6
- 230000008901 benefit Effects 0.000 claims description 4
- 230000008569 process Effects 0.000 claims description 4
- 238000010220 Pearson correlation analysis Methods 0.000 claims description 3
- 230000004913 activation Effects 0.000 claims description 3
- 238000004364 calculation method Methods 0.000 claims description 3
- 230000006870 function Effects 0.000 claims description 3
- 239000011159 matrix material Substances 0.000 claims description 3
- 238000013016 damping Methods 0.000 claims 1
- 230000004044 response Effects 0.000 claims 1
- 238000002474 experimental method Methods 0.000 abstract description 7
- 239000004615 ingredient Substances 0.000 abstract description 3
- 238000010219 correlation analysis Methods 0.000 abstract description 2
- 239000013598 vector Substances 0.000 description 4
- 239000000203 mixture Substances 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 238000000205 computational method Methods 0.000 description 2
- 238000011156 evaluation Methods 0.000 description 2
- 238000010304 firing Methods 0.000 description 2
- 238000009432 framing Methods 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 230000009471 action Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000003709 image segmentation Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000015654 memory Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Child & Adolescent Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
The invention discloses a speech emotion recognition method in the PAD three-dimensional emotional space. The PAD three-dimensional emotion model based on dimensional emotion theory is adopted as the form in which recognition results are presented. Mel-frequency cepstral coefficients, the time firing sequence, and the firing location information are each used individually to predict the PAD values of the speech emotion; correlation analysis is then carried out in the three dimensions P (pleasure), A (arousal), and D (dominance), the weight coefficients of the three features are computed, and weighted fusion yields the final predicted value of the speech emotion in the PAD three-dimensional emotional space. Experiments show that the method can locate the affective state of speech more precisely in the emotional space, pays more attention to the expression and embodiment of the emotional components, and reflects the polarity and intensity of emotional expression more appropriately, so that the mixed affective content in emotional speech can be displayed.
Description
Technical field
The invention belongs to the field of speech emotion recognition and relates to a speech emotion recognition method, in particular to a speech emotion recognition method in the PAD three-dimensional emotional space.
Background technology
In the field of speech emotion recognition, the commonly used cepstral feature is the Mel-frequency cepstral coefficient (MFCC). MFCC is an Hz-domain spectral feature computed through the nonlinear correspondence between the Mel frequency scale and the Hz frequency scale; the Mel scale emphasizes the low-frequency details of the speech signal, so MFCC can highlight the useful information in the speech signal, reduce the interference of background noise, and recognize speech emotion effectively. However, the non-stationarity of emotional speech signals is particularly pronounced, and a direct FFT of the signal cannot reflect this non-stationarity, so using MFCC alone for speech emotion recognition leads to a relatively high false recognition rate. The spectrogram is favored by speech researchers because it intuitively presents the frequency distribution of the speech signal in each time segment; feeding the spectrogram into a pulse coupled neural network (Pulse Coupled Neural Network, PCNN) to extract the time firing sequence and the entropy sequence for speech emotion recognition has been shown experimentally to be effective for the two emotions neutral and happy.
The above results of recognizing speech emotion with MFCC and of distinguishing emotion with the speech emotion features obtained by processing the spectrogram with a PCNN show that the two approaches emphasize different aspects when identifying emotion types, and that the recognition results classify emotion into a few categories such as happy, sad, or angry. In practice, however, expressing emotion as numerical values in a multidimensional space is more convenient for human-computer interaction, because computers are better at processing numbers.
Invention content
In order to solve the above technical problem, the present invention proposes a method that fuses the PAD value predictions (P (pleasure), A (arousal), D (dominance)) obtained from the MFCC, the neuron time firing sequence, and the neuron firing location information.
The technical solution adopted in the present invention is a speech emotion recognition method in the PAD three-dimensional emotional space, characterized by comprising the following steps:
Step 1: extract the features of the emotional speech data, including the Mel-frequency cepstral coefficients MFCC, the time firing sequence, and the firing location information;
Step 2: apply the SVR algorithm to the Mel-frequency cepstral coefficients MFCC, the time firing sequence, and the firing location information individually to build speech emotion recognition models, and predict the pleasure value P, the arousal value A, and the dominance value D of the emotional speech;
Step 3: use Pearson correlation analysis to compute the correlation coefficients, in the three dimensions P, A, and D, of the predictions obtained from the three features, and determine the feature weights;
Step 4: according to the weights of the different features, obtain by weighted fusion the final PAD value of the emotional speech in the three-dimensional emotional space.
The present invention first uses the three speech emotion features, MFCC (Mel-frequency cepstral coefficients), the time firing sequence, and the firing location information, individually to predict the PAD values of the speech emotion, then carries out correlation analysis of the prediction results in the three dimensions P (pleasure), A (arousal), and D (dominance), computes the weight coefficients of the three features, and fuses them to obtain the final predicted value of the speech emotion in the PAD three-dimensional emotional space.
As a new attempt at speech emotion recognition, the present invention provides a reference for future research in this field. The final recognition result is not expressed with the discrete word labels used in previous studies (happy, sad, neutral, etc.); instead, the emotion prediction is a set of coordinate values mapped into the PAD three-dimensional emotional space. By computing its distance from the PAD values of the basic emotions, the constituent elements and their proportions in the affective state of the emotional speech can be further analyzed, so that mixed emotion types such as "mild and slightly sad" or "mild and rather happy" can be identified. This breaks through the limitation of describing emotion types with discrete adjective labels, reflects the polarity and intensity of emotional expression more appropriately, and is more convenient when handling emotion along a single dimension. Experimental results show that, in addition to a good discrimination effect for the basic speech emotion types, the proposed method focuses more on the expression and embodiment of the emotional components; its computation time is short, and with parallel machines or a hardware implementation it would suit application scenarios requiring real-time processing.
Description of the drawings
Fig. 1 is the flow chart of the embodiment of the present invention;
Fig. 2 is a schematic diagram of the PAD three-dimensional emotional space distribution of the embodiment of the present invention.
Specific implementation mode
To make it easy for those of ordinary skill in the art to understand and implement the present invention, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the implementation examples described here are only intended to illustrate and explain the present invention, not to limit it.
The CASIA Chinese emotional corpus selected for this embodiment is a discrete speech emotion database developed by the Institute of Automation, Chinese Academy of Sciences. In a completed research project, the Institute of Psychology of the Chinese Academy of Sciences engaged 346 university students to evaluate the PAD values of 14 specific emotion categories using a revised simplified Chinese version of the PAD emotion scale, obtaining the values of these 14 emotions on P (pleasure), A (arousal), and D (dominance). Six of these emotion types are contained in the CASIA corpus, so speech emotion data in the CASIA corpus whose P, A, and D values are known can serve as the dimensional emotional speech data needed for the experiments, which verify the classification ability of each of the three speech emotion features during speech emotion recognition and the effectiveness after fusion.
The running environment of this embodiment is Matlab (R2014b), the system environment is Win10, and the computer configuration is an Intel Core i3-3217U CPU (1.8 GHz) with 8 GB of memory.
Referring to Fig. 1, the speech emotion recognition method in the PAD three-dimensional emotional space provided by the invention comprises the following steps:
Step 1: extract the features of the emotional speech data, including the Mel-frequency cepstral coefficients MFCC, the time firing sequence, and the firing location information.
Step 1.1: extract the Mel-frequency cepstral coefficients MFCC.
To extract the MFCC parameters, the speech signal is first preprocessed. Preprocessing includes pre-emphasis, windowing, and framing; the frame length is 256 samples, the frame shift is 128 samples, and the window function is a Hamming window. Pre-emphasis compensates the attenuation of the high-frequency components of the speech signal caused by lip and nostril radiation. The original signal is framed to obtain the speech sequence s(n) of each frame, a fast Fourier transform (FFT) is applied to s(n) to obtain the spectrum S(n) of each frame, and the energy spectrum of the speech signal |S(n)|² is obtained by taking the squared modulus. |S(n)|² is then passed through the Mel filter bank Hm(k), and the output parameters Pm (m = 0, 1, 2, ..., M-1) are computed as:
Pm = Σk |S(k)|² Hm(k), m = 0, 1, 2, ..., M-1,
where M is the number of filters (26 in this embodiment) and fm denotes the center frequency of the m-th triangular filter Hm(k).
Finally, the logarithm of the parameters Pm is taken and a discrete cosine transform (DCT) is applied, transforming into the cepstral domain to obtain the Mel-frequency cepstral coefficients Cmel(k):
Lm = ln(Pm), m = 0, 1, 2, ..., M-1,
Cmel(n) = Σ (m = 0..M-1) Lm cos(πn(m + 0.5)/M), n = 1, 2, ..., N,
where N is the order of the Mel cepstral coefficients; this embodiment extracts 12th-order first-order-difference MFCC characteristic parameters.
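As an illustration of step 1.1, a minimal sketch of this MFCC extraction (frame length 256, frame shift 128, Hamming window, 26 Mel filters, 12 cepstral coefficients plus first-order differences) is given below. The embodiment itself runs in Matlab; the use of Python with numpy, scipy, and librosa, the pre-emphasis coefficient 0.97, and the helper name extract_mfcc are assumptions made for the sketch.

```python
# Minimal sketch (assumptions noted above), not the Matlab embodiment itself.
import numpy as np
import librosa
from scipy.fft import dct

def extract_mfcc(signal, sr, n_fft=256, hop=128, n_mels=26, n_ceps=12):
    # Pre-emphasis: compensate high-frequency attenuation from lip/nostril radiation
    # (0.97 is an assumed coefficient; the patent does not specify one).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # Framing (frame length 256, frame shift 128) and Hamming windowing
    n_frames = 1 + (len(emphasized) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(n_fft)

    # FFT of each frame and energy spectrum |S(n)|^2
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2           # (n_frames, n_fft//2 + 1)

    # Mel filter bank Hm(k) and log filter-bank energies Lm = ln(Pm)
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    log_energy = np.log(power @ mel_fb.T + 1e-10)             # (n_frames, n_mels)

    # DCT to the cepstral domain; keep the first 12 coefficients
    ceps = dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]

    # First-order difference (delta) parameters
    delta = librosa.feature.delta(ceps, axis=0)
    return np.hstack([ceps, delta])                            # (n_frames, 2 * n_ceps)
```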
Step 1.2: extract the time firing sequence.
The spectrogram is obtained first: the speech signal is divided into several overlapping frames and each frame is windowed (a Hamming window is selected in this embodiment); the short-time spectrum of each frame is estimated by FFT; with time n as the abscissa and frequency w as the ordinate, the strength of any given frequency component at a given time is represented by the gray level of the corresponding point, and the spectrogram is thus formed. This embodiment uses a simplified PCNN neuron model, with the parameter settings and values listed in Table 1 (the values are those recited in claim 3):
Table 1. Parameter settings and values of the simplified PCNN neuron model
Parameter | αF | αL | αθ | VF | VL | Vθ | β
---|---|---|---|---|---|---|---
Value | 0.1 | 1.0 | 1.0 | 0.5 | 0.2 | 20 | 0.1
Here αF, αL, and αθ denote the decay time constants of the feedback input Fij, the linking input Lij, and the dynamic threshold θij respectively; VF, VL, and Vθ denote the feedback, linking, and threshold amplification coefficients of the PCNN respectively; and β is the linking strength coefficient. These parameters are set according to empirical values. The initial values of the linking input L, the internal activity U, and the pulse output Y are set to 0; the input is the normalized gray value, lying in [0, 1]. The linking field radius is r = 1.5, and the internal linking matrix W is a 3 × 3 matrix whose elements are the reciprocals (r⁻²) of the squared Euclidean distances from the central pixel to each surrounding pixel.
The spectrogram is fed into a PCNN whose number of neurons equals the number of spectrogram pixels and which is iterated 50 times. The total number of firings in each iteration equals the number of neurons that emit a pulse in that iteration; the time firing sequence feature is extracted through the image segmentation ability of the PCNN.
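As an illustration of step 1.2, a minimal sketch of the simplified PCNN follows: the normalized spectrogram serves as the external stimulus, and the number of neurons firing in each of the 50 iterations forms the time firing sequence. The parameter values come from Table 1; the exact update equations of the simplified model, the initial threshold value, and the Python/numpy/scipy implementation are assumptions made for the sketch.

```python
# Minimal sketch of a simplified PCNN (assumed update equations), not the embodiment itself.
import numpy as np
from scipy.signal import convolve2d

def pcnn_firing_sequence(spectrogram, iterations=50,
                         aF=0.1, aL=1.0, aT=1.0,   # decay time constants of F, L, theta (Table 1)
                         VF=0.5, VL=0.2, VT=20.0,  # feedback / linking / threshold amplification
                         beta=0.1):                # linking strength coefficient
    # Normalized gray values in [0, 1] serve as the external stimulus S
    S = (spectrogram - spectrogram.min()) / (spectrogram.max() - spectrogram.min() + 1e-12)
    F = np.zeros_like(S); L = np.zeros_like(S); Y = np.zeros_like(S)
    theta = np.ones_like(S)                        # initial threshold value is an assumption
    # 3x3 internal linking matrix W: reciprocal squared Euclidean distance to the centre pixel
    W = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])
    firing_counts, firing_maps = [], []
    for _ in range(iterations):
        nb = convolve2d(Y, W, mode='same')         # weighted pulses of neighbouring neurons
        F = np.exp(-aF) * F + VF * nb + S          # feedback input Fij
        L = np.exp(-aL) * L + VL * nb              # linking input Lij
        U = F * (1.0 + beta * L)                   # internal activity Uij
        Y = (U > theta).astype(float)              # pulse output Yij
        theta = np.exp(-aT) * theta + VT * Y       # dynamic threshold thetaij
        firing_counts.append(int(Y.sum()))         # total firings in this iteration
        firing_maps.append(Y.copy())               # firing location map, reused in step 1.3
    return np.array(firing_counts), firing_maps    # time firing sequence + per-iteration maps
```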
Step 1.3: extract the neuron firing location information.
The firing-neuron location map obtained at each iteration is projected onto the time axis and the frequency axis respectively, and the two projected vectors are then merged into one vector. Finally, the vectors obtained from the firing location distribution maps of all iterations are arranged in time order as columns, and the resulting matrix is the speech emotion recognition feature matrix.
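A minimal sketch of the firing-location feature in step 1.3, reusing the hypothetical firing_maps output of the PCNN sketch above: each iteration's binary firing map is projected onto the time axis and the frequency axis, the two projections are concatenated, and the per-iteration vectors are stacked as columns of the feature matrix.

```python
import numpy as np

def firing_location_matrix(firing_maps):
    # firing_maps: list of binary arrays of shape (n_freq, n_time), one per PCNN iteration
    columns = []
    for Y in firing_maps:
        time_proj = Y.sum(axis=0)                     # projection onto the time axis
        freq_proj = Y.sum(axis=1)                     # projection onto the frequency axis
        columns.append(np.concatenate([time_proj, freq_proj]))
    # Columns ordered by iteration time form the speech emotion recognition feature matrix
    return np.stack(columns, axis=1)                  # shape (n_time + n_freq, iterations)
```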
Step 2: apply the SVR (support vector regression) algorithm to the Mel-frequency cepstral coefficients MFCC, the time firing sequence, and the firing location information individually to build speech emotion recognition models, and predict the pleasure value P, the arousal value A, and the dominance value D of the emotional speech.
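A minimal sketch of step 2: for each of the three feature types, one SVR model per PAD dimension is trained, so each feature independently yields a (P, A, D) prediction. The patent names only the SVR algorithm; scikit-learn, the RBF kernel, and the assumption that the variable-length frame features have already been pooled into fixed-length vectors are choices made for the sketch.

```python
import numpy as np
from sklearn.svm import SVR

def train_pad_svr(X_train, pad_train):
    # X_train: (n_samples, n_dims) fixed-length feature vectors for one feature type
    # pad_train: (n_samples, 3) reference P, A, D values of the training utterances
    return [SVR(kernel='rbf').fit(X_train, pad_train[:, d]) for d in range(3)]

def predict_pad(models, X):
    # Returns an (n_samples, 3) array of predicted P, A, D values
    return np.column_stack([m.predict(X) for m in models])
```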
Step 3: use Pearson correlation analysis to compute the correlation coefficients, in the three dimensions P, A, and D, of the predictions obtained from the three features, and determine the feature weights.
The correlation coefficient is computed as:
ρX,Y = E[(X − μX)(Y − μY)] / (σX σY),
where, in each computation, X represents the P, A, or D values predicted from one of the features, Y represents the corresponding P, A, or D values of the emotional speech in the emotion scale, μX and μY denote the mean values of the variables X and Y, and σX and σY denote their standard deviations. ρX,Y serves as the feature weight.
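A minimal sketch of step 3: the Pearson correlation coefficient between each feature's predictions and the PAD scale reference values is computed per dimension and then normalized over the three features so that the weights sum to 1, matching λ1 + λ2 + λ3 = 1 in step 4. numpy and the helper name are assumptions made for the sketch.

```python
import numpy as np

def pearson_weights(predictions, reference):
    # predictions: list of three (n_samples, 3) arrays, one per feature type
    #              (MFCC, time firing sequence, firing location information)
    # reference:   (n_samples, 3) P, A, D values from the emotion scale
    rho = np.array([[np.corrcoef(pred[:, d], reference[:, d])[0, 1]   # rho_{X,Y} per dimension
                     for d in range(3)]
                    for pred in predictions])                          # shape (3 features, 3 dims)
    # Normalize per dimension so the three weights sum to 1 (assumes positive correlations)
    return rho / rho.sum(axis=0, keepdims=True)
```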
Step 4: according to the weights of the different features, obtain by weighted fusion the final PAD value of the emotional speech in the three-dimensional emotional space.
The final PAD value of the emotional speech in the three-dimensional emotional space is:
P = P1λ1 + P2λ2 + P3λ3,
where P1, P2, and P3 denote, in order, the predicted values of the speech in the P (pleasure) dimension obtained from the Mel-frequency cepstral coefficients, the time firing sequence, and the firing location information, and λ1, λ2, and λ3 denote, in order, the normalized correlation coefficients of these three speech emotion features for this speech emotion type in the P (pleasure) dimension, satisfying λ1 + λ2 + λ3 = 1; the A and D values are fused in the same way.
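A minimal sketch of step 4, applying the weighted fusion above to all three dimensions at once with the normalized weights from the previous sketch; the helper names are assumptions.

```python
import numpy as np

def fuse_pad(predictions, weights):
    # predictions: list of three (n_samples, 3) arrays, one per feature type
    # weights:     (3 features, 3 dims) normalized correlation coefficients (lambdas)
    stacked = np.stack(predictions, axis=0)             # (3, n_samples, 3)
    # P = P1*l1 + P2*l2 + P3*l3, and likewise for A and D
    return np.einsum('fd,fnd->nd', weights, stacked)    # (n_samples, 3) final PAD values
```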
This embodiment uses the root-mean-square error (RMSE) as the evaluation index for the recognition of basic emotion types, computed as:
RMSE = sqrt((1/n) Σi (Xobs,i − Xmodel,i)²),
where Xobs,i denotes the experimental predicted value and Xmodel,i denotes the PAD scale reference value.
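A minimal numpy sketch of this RMSE evaluation, computed per dimension over the test utterances; the helper name is an assumption.

```python
import numpy as np

def rmse(predicted, reference):
    # predicted, reference: (n_samples, 3) arrays of PAD values
    return np.sqrt(np.mean((predicted - reference) ** 2, axis=0))   # one RMSE per P, A, D
```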
The RMSE of the P, A, and D values predicted by the present invention is computed and compared with that of the P, A, and D values predicted with MFCC alone; the results are shown in Table 2, where the RMSE of each dimension is normalized to between 0 and 1, and a smaller RMSE indicates better performance of the corresponding method.
Table 2. Comparison of the RMSE of the PAD values predicted by the present method and by MFCC alone
The experiments show that, within each sample set of the same emotion type, the test samples are distributed near the coordinate point of the PAD reference value, whereas samples of different emotion types are distributed more dispersedly; together with the RMSE comparison in Table 2, this proves that the present invention can effectively classify speech emotion types. The PAD reference values of the six kinds of emotional speech signals used in the test are shown in Table 3, and Table 4 lists, for one of the speech samples in the recognition process of this method, the normalized correlation coefficients of the three features in each dimension, which are used in the subsequent weighted fusion to obtain the final PAD prediction.
Table 3. PAD scale reference values of the 6 emotion types
Table 4. Correlation coefficient results of the three speech emotion features
Fig. 2 shows the distribution of the experimental samples mapped into the PAD three-dimensional emotional space after recognition by this method, and Table 5 gives the statistics of the numerical ranges corresponding to this distribution.
Table 5. Distribution ranges of the final PAD predicted values of the 6 emotion types
Since the recognition result of the present invention is not expressed with conventional discrete emotion label words, its advantage lies in that, by computing the distance of the result from the PAD values of the basic emotions in the emotion coordinate system, the constituent elements and their proportions in the affective state of the emotional speech can be further analyzed, so that mixed emotion types such as "mild and slightly sad" or "mild and rather happy" can be identified.
The present invention displays the recognition results in the PAD three-dimensional emotional space based on the continuous dimensional theory. Through the intuitive mapping, the differences and connections between various affective states can be clearly presented, psychological activity mixing several basic emotion types can be described, and the subtle and changeable affective states of human beings can be reflected. Experiments show that the present invention can display, through precise emotion coordinate values, the mixed affective content in emotional speech, accomplishing the speech emotion recognition task more finely.
It should be understood that the parts not elaborated in this specification belong to the prior art.
It should be understood that the above description of the preferred embodiment is relatively detailed and therefore should not be regarded as limiting the scope of patent protection of the present invention. Those skilled in the art, under the inspiration of the present invention and without departing from the scope protected by the claims of the present invention, may also make substitutions or variations, all of which fall within the protection scope of the present invention; the claimed scope of the present invention is determined by the appended claims.
Claims (5)
1. A speech emotion recognition method in the PAD three-dimensional emotional space, characterized by comprising the following steps:
Step 1: extracting the features of the emotional speech data, including the Mel-frequency cepstral coefficients MFCC, the time firing sequence, and the firing location information;
Step 2: applying the SVR algorithm to the Mel-frequency cepstral coefficients MFCC, the time firing sequence, and the firing location information individually to build speech emotion recognition models, and predicting the pleasure value P, the arousal value A, and the dominance value D of the emotional speech;
Step 3: using Pearson correlation analysis to compute the correlation coefficients, in the three dimensions P, A, and D, of the predictions obtained from the three features, and determining the feature weights;
Step 4: according to the weights of the different features, obtaining by weighted fusion the final PAD value of the emotional speech in the three-dimensional emotional space.
2. The speech emotion recognition method in the PAD three-dimensional emotional space according to claim 1, characterized in that: in step 1, during the extraction of the Mel-frequency cepstral coefficients MFCC, the preprocessing uses a frame length of 256, a frame shift of 128, and a Hamming window; in the calculation, the preprocessed speech data is subjected to a fast Fourier transform and the squared modulus is taken to obtain its energy spectrum, which is passed through the Mel filter bank to output the parameters Pm (m = 0, 1, 2, ..., M-1), computed as:
Pm = Σk |S(k)|² Hm(k), m = 0, 1, 2, ..., M-1,
wherein the number of filters M is 26, fm denotes the center frequency of the triangular filter, Hm(k) denotes the frequency response of the triangular filter, and S(n) denotes the result of the fast Fourier transform of the speech signal; 12th-order first-order-difference MFCC characteristic parameters are finally extracted.
3. The speech emotion recognition method in the PAD three-dimensional emotional space according to claim 1, characterized in that: in step 1, the time firing sequence and the firing location information are extracted using a simplified PCNN neuron model whose parameters αF, αL, αθ, VF, VL, Vθ, and β take the values 0.1, 1.0, 1.0, 0.5, 0.2, 20, and 0.1 respectively; αF, αL, and αθ denote the decay time constants of the feedback input Fij, the linking input Lij, and the dynamic threshold θij, VF, VL, and Vθ denote the feedback amplification coefficient, linking amplification coefficient, and threshold amplification coefficient of the PCNN respectively, and β is the linking strength coefficient; the initial values of the linking input L, the internal activity U, and the pulse output Y are set to 0, and the input is the normalized gray value, lying in [0, 1]; the linking field radius is r = 1.5, and the internal linking matrix W is a 3 × 3 matrix whose elements are the reciprocals r⁻² of the squared Euclidean distances from the central pixel to each surrounding pixel.
4. The speech emotion recognition method in the PAD three-dimensional emotional space according to claim 1, characterized in that: in step 3, the correlation coefficients of the predictions obtained from the three features in the P, A, and D dimensions are respectively:
ρX,Y = E[(X − μX)(Y − μY)] / (σX σY),
wherein, in each computation, X represents the P, A, or D values predicted from one of the features, Y represents the corresponding P, A, or D values of the emotional speech in the emotion scale, μX and μY denote the mean values of the variables X and Y, and σX and σY denote their standard deviations; the value of ρX,Y is the computed feature weight.
5. The speech emotion recognition method in the PAD three-dimensional emotional space according to any one of claims 1 to 4, characterized in that: in step 4, the final PAD value of the emotional speech in the three-dimensional emotional space is:
P = P1λ1 + P2λ2 + P3λ3,
wherein P1, P2, and P3 respectively represent the predicted values of the speech in the P dimension obtained from the Mel-frequency cepstral coefficients, the time firing sequence, and the firing location information, and λ1, λ2, and λ3 denote, in order, the normalized correlation coefficients of these three speech emotion features for this speech emotion type in the P dimension, satisfying λ1 + λ2 + λ3 = 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810438464.9A CN108682431B (en) | 2018-05-09 | 2018-05-09 | Voice emotion recognition method in PAD three-dimensional emotion space |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810438464.9A CN108682431B (en) | 2018-05-09 | 2018-05-09 | Voice emotion recognition method in PAD three-dimensional emotion space |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108682431A true CN108682431A (en) | 2018-10-19 |
CN108682431B CN108682431B (en) | 2021-08-03 |
Family
ID=63805990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810438464.9A Expired - Fee Related CN108682431B (en) | 2018-05-09 | 2018-05-09 | Voice emotion recognition method in PAD three-dimensional emotion space |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108682431B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409296A (en) * | 2018-10-30 | 2019-03-01 | 河北工业大学 | Video emotion recognition method fusing facial expression recognition and speech emotion recognition
CN110555084A (en) * | 2019-08-26 | 2019-12-10 | 电子科技大学 | remote supervision relation classification method based on PCNN and multi-layer attention |
CN111402928A (en) * | 2020-03-04 | 2020-07-10 | 华南理工大学 | Attention-based speech emotion state evaluation method, device, medium and equipment |
CN113749656A (en) * | 2021-08-20 | 2021-12-07 | 杭州回车电子科技有限公司 | Emotion identification method and device based on multi-dimensional physiological signals |
CN117933269A (en) * | 2024-03-22 | 2024-04-26 | 合肥工业大学 | Multi-mode depth model construction method and system based on emotion distribution |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012168740A1 (en) * | 2011-06-10 | 2012-12-13 | X-System Limited | Method and system for analysing sound |
CN107633851A (en) * | 2017-07-31 | 2018-01-26 | 中国科学院自动化研究所 | Discrete voice emotion identification method, apparatus and system based on the prediction of emotion dimension |
-
2018
- 2018-05-09 CN CN201810438464.9A patent/CN108682431B/en not_active Expired - Fee Related
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012168740A1 (en) * | 2011-06-10 | 2012-12-13 | X-System Limited | Method and system for analysing sound |
CN107633851A (en) * | 2017-07-31 | 2018-01-26 | 中国科学院自动化研究所 | Discrete voice emotion identification method, apparatus and system based on the prediction of emotion dimension |
Non-Patent Citations (3)
Title |
---|
周慧 (Zhou Hui): "Emotional speech conversion and recognition based on the PAD three-dimensional emotion model", China Master's Theses Full-text Database, Information Science and Technology *
宋静 (Song Jing): "Research on the application of the PAD emotion model in emotional speech recognition", China Master's Theses Full-text Database, Information Science and Technology *
梁泽, 马义德, 张恩溯, 朱望飞, 汤书森 (Liang Ze, Ma Yide, Zhang Ensu, Zhu Wangfei, Tang Shusen): "A new speech emotion recognition method based on pulse coupled neural network", Journal of Computer Applications *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109409296A (en) * | 2018-10-30 | 2019-03-01 | 河北工业大学 | Video emotion recognition method fusing facial expression recognition and speech emotion recognition
CN109409296B (en) * | 2018-10-30 | 2020-12-01 | 河北工业大学 | Video emotion recognition method integrating facial expression recognition and voice emotion recognition |
CN110555084A (en) * | 2019-08-26 | 2019-12-10 | 电子科技大学 | remote supervision relation classification method based on PCNN and multi-layer attention |
CN110555084B (en) * | 2019-08-26 | 2023-01-24 | 电子科技大学 | Remote supervision relation classification method based on PCNN and multi-layer attention |
CN111402928A (en) * | 2020-03-04 | 2020-07-10 | 华南理工大学 | Attention-based speech emotion state evaluation method, device, medium and equipment |
CN113749656A (en) * | 2021-08-20 | 2021-12-07 | 杭州回车电子科技有限公司 | Emotion identification method and device based on multi-dimensional physiological signals |
CN113749656B (en) * | 2021-08-20 | 2023-12-26 | 杭州回车电子科技有限公司 | Emotion recognition method and device based on multidimensional physiological signals |
CN117933269A (en) * | 2024-03-22 | 2024-04-26 | 合肥工业大学 | Multi-mode depth model construction method and system based on emotion distribution |
Also Published As
Publication number | Publication date |
---|---|
CN108682431B (en) | 2021-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108682431A (en) | A kind of speech-emotion recognition method in PAD three-dimensionals emotional space | |
Badshah et al. | Deep features-based speech emotion recognition for smart affective services | |
Demircan et al. | Application of fuzzy C-means clustering algorithm to spectral features for emotion classification from speech | |
Hao et al. | Visual-audio emotion recognition based on multi-task and ensemble learning with multiple features | |
Kaya et al. | Robust acoustic emotion recognition based on cascaded normalization and extreme learning machines | |
Zhou et al. | Deception detecting from speech signal using relevance vector machine and non-linear dynamics features | |
Xiao et al. | Multi-sensor data fusion for sign language recognition based on dynamic Bayesian network and convolutional neural network | |
Ye et al. | GM-TCNet: Gated multi-scale temporal convolutional network using emotion causality for speech emotion recognition | |
Wu et al. | Environmental sound classification via time–frequency attention and framewise self-attention-based deep neural networks | |
Huang et al. | Emotional speech feature normalization and recognition based on speaker-sensitive feature clustering | |
Dang et al. | Dynamic multi-rater gaussian mixture regression incorporating temporal dependencies of emotion uncertainty using kalman filters | |
Sharma et al. | Framework for gender recognition using voice | |
Karthikeyan | Adaptive boosted random forest-support vector machine based classification scheme for speaker identification | |
Zhao et al. | A survey on automatic emotion recognition using audio big data and deep learning architectures | |
Xue et al. | Physiological-physical feature fusion for automatic voice spoofing detection | |
Gorrostieta et al. | Attention-based Sequence Classification for Affect Detection. | |
Taran | A nonlinear feature extraction approach for speech emotion recognition using VMD and TKEO | |
US20130262097A1 (en) | Systems and methods for automated speech and speaker characterization | |
Fonnegra et al. | Speech emotion recognition based on a recurrent neural network classification model | |
Nelus et al. | Privacy-preserving audio classification using variational information feature extraction | |
Liu et al. | Hierarchical component-attention based speaker turn embedding for emotion recognition | |
Mallikarjunan et al. | Text-independent speaker recognition in clean and noisy backgrounds using modified VQ-LBG algorithm | |
Sukvichai et al. | Automatic speech recognition for Thai sentence based on MFCC and CNNs | |
Lei et al. | Robust scream sound detection via sound event partitioning | |
Trivikram et al. | EVALUATION OF HYBRID FACE AND VOICE RECOGNITION SYSTEMS FOR BIOMETRIC IDENTIFICATION IN AREAS REQUIRING HIGH SECURITY. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20210803 |