CN110047516A - Speech emotion recognition method based on gender perception - Google Patents
Speech emotion recognition method based on gender perception
- Publication number
- CN110047516A CN110047516A CN201910186313.3A CN201910186313A CN110047516A CN 110047516 A CN110047516 A CN 110047516A CN 201910186313 A CN201910186313 A CN 201910186313A CN 110047516 A CN110047516 A CN 110047516A
- Authority
- CN
- China
- Prior art keywords
- gender
- feature
- perception
- layer
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The present invention discloses a speech emotion recognition method based on gender perception, which exploits two gender-aware features derived from gender information: distributed gender features and gender-driven features. The gender-aware features are fused with the spectrogram into combined features, and a CNN-BLSTM network learns high-level deep features from the combined features for emotion classification. The main steps are voice segmentation, feature preparation, feature fusion, feature extraction, and classification. Compared with existing features, the gender-aware features of the invention exploit gender information effectively, and the gender-aware speech emotion recognition method effectively improves the accuracy of speech emotion recognition.
Description
Technical field
The present invention belongs to the field of speech emotion recognition, and specifically relates to the use of gender features and their fusion with speech emotion recognition features.
Background
Human-computer interaction is now common in many forms, notably dialogue systems and intelligent voice assistants. Emotion carries important semantic information, and speech emotion recognition is believed to help machines understand user intent effectively. Accurately recognizing the user's mood enables good interactivity and improves the user experience. However, natural communication between humans and machines still faces many difficulties, and true human-computer interaction has not yet been achieved; speech emotion recognition remains a significant challenge.
Many studies have found that gender differences affect emotional expression, which suggests that gender information can help speech emotion recognition. Research has shown that building separate gender-dependent speech emotion recognition systems outperforms simply incorporating gender information into a single recognizer, and gender information has been widely used in speech emotion recognition tasks. However, simple encodings such as one-hot coding cannot exploit gender information effectively; adding gender information in this way improves emotion recognition accuracy only slightly.
To address this problem, we propose two new kinds of gender-aware features: distributed gender features and gender-driven features. The distributed gender feature describes the male/female distribution and individual differences; the gender-driven feature is extracted from the acoustic signal by a DNN. Each of these gender-aware features is fused with the spectrogram, and a CNN-BLSTM model performs the final classification.
1) Traditional speech emotion recognition does not exploit gender information efficiently; the proposed gender-aware features use it effectively.
2) The new gender-aware features not only encode male/female identity but also reflect individual differences and part of the speaker's acoustic characteristics, improving the utilization of gender information.
3) Fusing the gender-aware features with the spectrogram and classifying with a CNN-BLSTM model effectively improves the accuracy of emotion recognition.
Summary of the invention
The technical problem solved by the present invention is to propose gender-aware features that exploit gender information effectively: distributed gender features and gender-driven features. The gender-aware features are fused with the spectrogram into combined features, and a CNN-BLSTM network learns high-level deep features from the combined features for emotion classification. The specific technical solution is as follows:
Step 1, voice segmentation: the utterance-level speech signal is divided into fixed-length speech segments.
Step 2, feature preparation
1) Spectrogram extraction: a short-time Fourier transform is applied to each segment to obtain the raw spectrogram S of size a × b;
2) Gender-aware feature extraction: distributed gender features and gender-driven features;
2-1) Distributed gender feature extraction: fixed-dimension random vectors are first generated as the male and female templates. To reflect individual differences, a random variable is added to the fixed gender template. The resulting male distributed gender feature DGF_M varies in the range m–k, and the female feature DGF_F varies in the range k–z;
2-2) Gender-driven feature extraction: an x-dimensional acoustic feature vector is first extracted from each segment. To make the features gender-discriminative, a deep neural network (DNN) extracts a y-dimensional bottleneck feature from the acoustic features as the gender-driven feature GDF.
Step 3, feature fusion
The raw spectrogram S from step 2 1) and the DGF from 2-1) are fused into the combined feature F1. The combined feature vector of the j-th segment in the i-th utterance can be written as:
F1_ij = [S_ij, DGF_ij]   (1)
The raw spectrogram S from step 2 1) and the GDF from 2-2) are fused into the combined feature F2. The combined feature vector of the j-th segment in the i-th utterance can be written as:
F2_ij = [S_ij, GDF_ij]   (2)
Step 4, feature extraction. A CNN extracts high-level features from each combined feature.
Step 5, classification. The high-level features from step 4 are arranged chronologically into sentence-level sequences and fed into a BLSTM network to learn contextual temporal dependencies, completing utterance-level emotion classification. The seven emotions are: neutral, sad, fearful, happy, angry, bored, and disgusted.
The gender-driven feature in step 2 is obtained with a DNN, constructed as follows:
1) the input of the DNN is the x-dimensional acoustic feature;
2) three hidden layers h1, h2, h3 are used, where h2 has fewer hidden units than h1 and h3; h2 is called the bottleneck layer;
3) the true gender label serves as the teacher signal for training the DNN, which is trained by back-propagating the derivative of the cost function; the cost function measures the cross-entropy between the target output and the actual output on each training example.
After the DNN is trained, the output of hidden layer h2 is the bottleneck feature, i.e. the gender-driven feature.
The gender-aware speech emotion recognition method of the invention is based on a CNN-BLSTM model, configured as follows:
The CNN has two convolutional layers and two max-pooling layers. The first convolutional layer has n1 kernels of size k1 × k1; the first pooling layer has pool size p1 × p1. The second convolutional layer has n2 kernels of size k2 × k2; the second pooling layer has pool size p2 × p2. A flattening layer turns the two-dimensional feature maps into a one-dimensional vector. After the flattening layer, a fully connected layer with s hidden units maps the features to s dimensions. The BLSTM has two hidden layers, each with u hidden units.
Beneficial effects
Compared with existing features, the gender-aware features of the invention exploit gender information effectively. The gender-aware speech emotion recognition method effectively improves the accuracy of speech emotion recognition.
Brief description of the drawings
Fig. 1 is the framework of the gender-aware speech emotion recognition model based on the gender-driven feature;
Fig. 2 is the structure of the DNN model used to extract the gender-driven feature.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings.
To validate the invention, we evaluate on the Emo-DB database. Emo-DB contains 535 sentences covering seven emotions: sad, happy, fearful, neutral, angry, bored, and disgusted.
Fig. 1 shows the framework of the gender-aware speech emotion recognition model based on the gender-driven feature. As shown in Fig. 1, the method mainly comprises the following five steps.
Step 1, voice segmentation. The utterance-level speech signal is divided into fixed-length segments of 265 ms. Each segment contains 25 frames, with a frame length of 25 ms and a frame shift of 10 ms. Segmenting the Emo-DB corpus in this way yields about 50,000 speech segments for the experiments; the longest sentence is divided into 349 segments.
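A minimal NumPy sketch of this fixed-length segmentation; the shift between successive 265 ms segments is not given in the text, so the 25 ms value below is only an illustrative assumption:

```python
import numpy as np

sr = 16000                           # sampling rate (assumed)
seg_len = int(0.265 * sr)            # 265 ms segment -> 4240 samples
seg_hop = int(0.025 * sr)            # segment shift: an assumption, not stated in the text
wav = np.zeros(3 * sr)               # stand-in for a 3 s utterance waveform
starts = range(0, len(wav) - seg_len + 1, seg_hop)
segments = np.stack([wav[s:s + seg_len] for s in starts])
# every row of `segments` is one fixed-length 265 ms segment
```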
Step 2, feature preparation.
1) Spectrogram extraction: a short-time Fourier transform is applied to each segment, giving a raw spectrogram S of size 25 × 129;
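One way to obtain a 25 × 129 spectrogram from a 265 ms segment. The 16 kHz sampling rate, Hamming window, and 256-point FFT are assumptions made here to match the stated 25 × 129 size; the text only specifies the 25 ms frame length and 10 ms shift:

```python
import numpy as np

sr = 16000
frame_len, hop = int(0.025 * sr), int(0.010 * sr)   # 400-sample frames, 160-sample hop
n_frames, n_fft = 25, 256
seg = np.zeros(24 * hop + frame_len)                 # one 265 ms segment (stand-in)
window = np.hamming(frame_len)
S = np.stack([
    # rfft with n=256 crops each windowed frame to 256 samples,
    # yielding 129 frequency bins per frame (an assumption)
    np.abs(np.fft.rfft(seg[i * hop: i * hop + frame_len] * window, n=n_fft))
    for i in range(n_frames)
])                                                   # raw spectrogram, shape (25, 129)
```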
2) Gender-aware feature extraction: distributed gender features and gender-driven features.
2-1) Distributed gender feature extraction: 32-dimensional random vectors are first generated as the male and female templates. To reflect individual differences, a random variable is added to the fixed gender template. The resulting male distributed gender feature DGF_M varies in the range 0–0.5, and the female feature DGF_F varies in the range 0.5–1.
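A minimal sketch of the distributed gender feature. The uniform template distributions and the small clipped Gaussian perturbation are assumptions; the text specifies only the 32-dimensional templates and the per-gender value ranges:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32
male_template = rng.uniform(0.0, 0.5, DIM)      # fixed male template in [0, 0.5)
female_template = rng.uniform(0.5, 1.0, DIM)    # fixed female template in [0.5, 1)

def distributed_gender_feature(template, lo, hi, scale=0.02):
    """Add a random variable to the fixed template to reflect individual
    differences, clipping the result back into the gender's range."""
    return np.clip(template + rng.normal(0.0, scale, template.shape), lo, hi)

dgf_m = distributed_gender_feature(male_template, 0.0, 0.5)    # DGF_M in [0, 0.5]
dgf_f = distributed_gender_feature(female_template, 0.5, 1.0)  # DGF_F in [0.5, 1]
```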
2-2) Gender-driven feature extraction: 384-dimensional acoustic features are first extracted from each segment with the openSMILE toolkit. This 384-dimensional set is the INTERSPEECH 2009 Emotion Challenge feature set, consisting of 32 low-level descriptors (LLDs) and their statistical functionals. The 32 LLDs include zero-crossing rate, root-mean-square energy, fundamental frequency, the autocorrelation-based harmonics-to-noise ratio, and Mel-frequency cepstral coefficients.
To make the features gender-discriminative, a deep neural network (DNN) is used to extract a 32-dimensional bottleneck feature from the acoustic features as the gender-driven feature. Fig. 2 shows the structure of the DNN used to extract the bottleneck feature. The DNN input is the 384-dimensional acoustic feature, and the three hidden layers h1, h2, h3 have 1024, 32, and 1024 hidden units respectively. The output layer uses the true gender label as the teacher signal for training. After the DNN is trained, the output of hidden layer h2 is the 32-dimensional gender-driven feature.
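The 384–1024–32–1024 bottleneck DNN can be sketched as a plain NumPy forward pass. Training by cross-entropy back-propagation is omitted here, and the random weights merely stand in for trained ones:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)

# layer sizes: 384-dim acoustic input, h1=1024, h2=32 (bottleneck), h3=1024, 2 gender outputs
W1 = rng.normal(0, 0.05, (384, 1024))
W2 = rng.normal(0, 0.05, (1024, 32))
W3 = rng.normal(0, 0.05, (32, 1024))
Wo = rng.normal(0, 0.05, (1024, 2))   # male/female softmax layer used only during training

def gender_driven_feature(x):
    """Return the 32-dim activation of bottleneck layer h2 as the GDF."""
    h1 = relu(x @ W1)
    return relu(h1 @ W2)

x = rng.normal(size=384)               # one segment's IS09 acoustic feature vector
gdf = gender_driven_feature(x)         # 32-dim gender-driven feature
```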
Step 3, feature fusion.
The DGF from step 2 2-1) is repeated 25 times along the time axis to form a segment-level DGF of size 25 × 32. The 25 × 129 raw spectrogram from step 2 1) and the segment-level DGF are fused into the segment-level combined feature F1, of size 25 × 161.
Similarly, the GDF from step 2 2-2) is repeated 25 times along the time axis into a segment-level GDF of size 25 × 32, which is fused with the 25 × 129 raw spectrogram from step 2 1) into the segment-level combined feature F2, also of size 25 × 161.
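The fusion of step 3 is a simple tile-and-concatenate, shown here with placeholder arrays standing in for a real spectrogram and gender-aware feature:

```python
import numpy as np

S = np.zeros((25, 129))                    # raw spectrogram of one segment
g = np.zeros(32)                           # 32-dim gender-aware feature (DGF or GDF)
g_seg = np.tile(g, (25, 1))                # repeat along the time axis -> (25, 32)
F = np.concatenate([S, g_seg], axis=1)     # segment-level combined feature, (25, 161)
```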
Step 4, feature extraction. A CNN extracts high-level features from F1 and F2 respectively. The CNN has two convolutional layers and two max-pooling layers. The first convolutional layer has 32 kernels of size 5 × 5 with ReLU activation; the first pooling layer has pool size 2 × 2. The second convolutional layer has 64 kernels of size 5 × 5 with ReLU activation; the second pooling layer has pool size 2 × 2. After the flattening layer, a fully connected layer with 1024 hidden units maps the learned features to 1024 dimensions. The output layer has 7 units with softmax activation. The features taken after the fully connected layer are used for the subsequent classification.
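Assuming 'same' padding for the 5 × 5 convolutions (the text does not state the padding scheme), the feature-map sizes through the two convolution/pooling stages can be checked arithmetically:

```python
def cnn_output_shape(h, w, n_stages=2):
    """5x5 'same' convolutions keep H x W; each 2x2 max-pool floors the halves."""
    for _ in range(n_stages):
        h, w = h // 2, w // 2
    return h, w

h, w = cnn_output_shape(25, 161)   # spatial size after two conv/pool stages
flat_units = h * w * 64            # flattened inputs to the 1024-unit dense layer
```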
Step 5, classification. The features obtained in step 4 are arranged chronologically into sentence-level sequences and fed into a BLSTM network to learn contextual temporal dependencies, completing utterance-level classification into the seven emotions: neutral, sad, fearful, happy, angry, bored, and disgusted. The BLSTM has two hidden layers, each with 1024 hidden units.
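A single-layer bidirectional LSTM pass over a sequence of segment-level features can be sketched in NumPy. Random weights stand in for trained ones, the hidden size is shrunk from the patent's 1024 units to keep the sketch light, and only one of the two BLSTM layers is shown:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def lstm_last_hidden(x, Wx, Wh, b):
    """Run an LSTM over x of shape (T, d); return the final hidden state (u,)."""
    u = Wh.shape[0]
    h, c = np.zeros(u), np.zeros(u)
    for xt in x:
        z = xt @ Wx + h @ Wh + b                  # stacked gate pre-activations: i, f, o, g
        i, f, o = sigmoid(z[:3 * u]).reshape(3, u)
        g = np.tanh(z[3 * u:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

T, d, u = 6, 1024, 128      # 6 segment-level CNN features per utterance (illustrative)
x = rng.normal(size=(T, d))
p = lambda: (rng.normal(0, 0.05, (d, 4 * u)), rng.normal(0, 0.05, (u, 4 * u)), np.zeros(4 * u))
h_fw = lstm_last_hidden(x, *p())            # forward direction
h_bw = lstm_last_hidden(x[::-1], *p())      # backward direction
feat = np.concatenate([h_fw, h_bw])         # utterance-level representation
logits = feat @ rng.normal(0, 0.05, (2 * u, 7))
probs = np.exp(logits - logits.max()); probs /= probs.sum()   # 7 emotion posteriors
```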
Table 1. Results of fusing gender-aware features with the spectrogram on the Emo-DB database
ID | Feature | Size | Weighted accuracy | Unweighted accuracy |
---|---|---|---|---|
1 | Spectrogram | 25×129 | 86.73% | 86.40% |
2 | Spectrogram + one-hot gender feature | 25×131 | 86.92% | 86.24% |
3 | Spectrogram + distributed gender feature | 25×161 | 88.97% | 88.31% |
4 | Spectrogram + gender-driven feature | 25×161 | 92.71% | 92.62% |
Table 1 shows the weighted and unweighted accuracy of the gender-aware speech emotion recognition model when classifying with different features. From Table 1 we draw three conclusions. 1) Fusing a one-hot gender feature (male 01, female 10) with the spectrogram does not yield good classification results: the one-hot feature has only 2 dimensions, is essentially negligible, and the CNN cannot learn the information in it. 2) Adding the distributed gender feature and the gender-driven feature to the gender-aware system reduces the relative error in unweighted accuracy by 14.04% and 45.74% respectively, compared with using the spectrogram alone. 3) The gender-driven feature achieves better classification accuracy than the distributed gender feature, because it not only encodes gender but also reflects the speaker's real individual differences and acoustic information, whereas the distributed gender feature reflects only the speaker's gender. The results demonstrate that the gender-aware speech emotion recognition method improves the accuracy of speech emotion classification and that the invention is effective.
Claims (4)
1. A speech emotion recognition method based on gender perception, characterized in that: first, gender-aware features are derived from gender information, namely distributed gender features and gender-driven features; then, each gender-aware feature is fused with the spectrogram into a combined feature, and a CNN-BLSTM network learns high-level deep features from the combined features for emotion classification.
2. The speech emotion recognition method based on gender perception according to claim 1, characterized in that the specific steps are as follows:
Step 1, voice segmentation: the utterance-level speech signal is divided into fixed-length speech segments;
Step 2, feature preparation
1) Spectrogram extraction: a short-time Fourier transform is applied to each segment to obtain the raw spectrogram S of size a × b;
2) Gender-aware feature extraction: distributed gender features and gender-driven features;
2-1) Distributed gender feature extraction: fixed-dimension random vectors are first generated as the male and female templates; a random variable is added to the fixed gender template; the resulting male distributed gender feature DGF_M varies in the range m–k, and the female feature DGF_F varies in the range k–z;
2-2) Gender-driven feature extraction: an x-dimensional acoustic feature vector is first extracted from each segment; a deep neural network (DNN) extracts a y-dimensional bottleneck feature from the acoustic features as the gender-driven feature GDF;
Step 3, feature fusion
The raw spectrogram S from step 2 1) and the DGF from 2-1) are fused into the combined feature F1; the combined feature vector of the j-th segment in the i-th utterance can be written as:
F1_ij = [S_ij, DGF_ij]   (1)
The raw spectrogram S from step 2 1) and the GDF from 2-2) are fused into the combined feature F2; the combined feature vector of the j-th segment in the i-th utterance can be written as:
F2_ij = [S_ij, GDF_ij]   (2)
Step 4, feature extraction
A CNN extracts high-level features from each combined feature;
Step 5, classification
The high-level features from step 4 are arranged chronologically into sentence-level sequences and fed into a BLSTM network to learn contextual temporal dependencies, completing utterance-level emotion classification.
3. The speech emotion recognition method based on gender perception according to claim 1, characterized in that the gender-driven feature in step 2 is constructed as follows:
1) the input of the DNN is the x-dimensional acoustic feature;
2) three hidden layers h1, h2, h3 are used, where h2 has fewer hidden units than h1 and h3; h2 is called the bottleneck layer;
3) the true gender label serves as the teacher signal for training the DNN, which is trained by back-propagating the derivative of the cost function; the cost function measures the cross-entropy between the target output and the actual output on each training example;
after the DNN is trained, the output of hidden layer h2 is the bottleneck feature, i.e. the gender-driven feature.
4. The speech emotion recognition method based on gender perception according to claim 1, characterized in that the CNN-BLSTM model used in steps 4 and 5 is configured as follows:
the CNN has two convolutional layers and two max-pooling layers:
the first convolutional layer has n1 kernels of size k1 × k1;
the first pooling layer has pool size p1 × p1; the second convolutional layer has n2 kernels of size k2 × k2;
the second pooling layer has pool size p2 × p2;
after the flattening layer, a fully connected layer with s hidden units maps the features to s dimensions;
the BLSTM has two hidden layers, each with u hidden units.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910186313.3A CN110047516A (en) | 2019-03-12 | 2019-03-12 | A kind of speech-emotion recognition method based on gender perception |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110047516A true CN110047516A (en) | 2019-07-23 |
Family
ID=67274783
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910186313.3A Pending CN110047516A (en) | 2019-03-12 | 2019-03-12 | A kind of speech-emotion recognition method based on gender perception |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110047516A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555379A (en) * | 2019-07-30 | 2019-12-10 | 华南理工大学 | human face pleasure degree estimation method capable of dynamically adjusting features according to gender |
CN110619889A (en) * | 2019-09-19 | 2019-12-27 | Oppo广东移动通信有限公司 | Sign data identification method and device, electronic equipment and storage medium |
CN110675893A (en) * | 2019-09-19 | 2020-01-10 | 腾讯音乐娱乐科技(深圳)有限公司 | Song identification method and device, storage medium and electronic equipment |
CN110728997A (en) * | 2019-11-29 | 2020-01-24 | 中国科学院深圳先进技术研究院 | Multi-modal depression detection method and system based on context awareness |
CN111402927A (en) * | 2019-08-23 | 2020-07-10 | 南京邮电大学 | Speech emotion recognition method based on segmented spectrogram and dual-Attention |
CN111899766A (en) * | 2020-08-24 | 2020-11-06 | 南京邮电大学 | Speech emotion recognition method based on optimization fusion of depth features and acoustic features |
CN112712824A (en) * | 2021-03-26 | 2021-04-27 | 之江实验室 | Crowd information fused speech emotion recognition method and system |
CN112927723A (en) * | 2021-04-20 | 2021-06-08 | 东南大学 | High-performance anti-noise speech emotion recognition method based on deep neural network |
CN113593526A (en) * | 2021-07-27 | 2021-11-02 | 哈尔滨理工大学 | Speech emotion recognition method based on deep learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013120467A (en) * | 2011-12-07 | 2013-06-17 | National Institute Of Advanced Industrial & Technology | Device and method for extracting signal features |
CN108010514A (en) * | 2017-11-20 | 2018-05-08 | 四川大学 | A kind of method of speech classification based on deep neural network |
CN109272993A (en) * | 2018-08-21 | 2019-01-25 | 中国平安人寿保险股份有限公司 | Recognition methods, device, computer equipment and the storage medium of voice class |
CN109389992A (en) * | 2018-10-18 | 2019-02-26 | 天津大学 | A kind of speech-emotion recognition method based on amplitude and phase information |
- 2019-03-12: CN application CN201910186313.3A filed; published as CN110047516A; status Pending
Non-Patent Citations (1)
LINJUAN ZHANG ET AL.: "Gender-Aware CNN-BLSTM for Speech Emotion Recognition", ICANN 2018: Artificial Neural Networks and Machine Learning.
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190723 |