CN103531206A - Voice affective characteristic extraction method capable of combining local information and global information - Google Patents

Voice affective characteristic extraction method capable of combining local information and global information

Info

Publication number
CN103531206A
Authority
CN
China
Prior art keywords
frame
feature
extraction method
characteristic extraction
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310460191.5A
Other languages
Chinese (zh)
Other versions
CN103531206B (en)
Inventor
文贵华
孙亚新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201310460191.5A priority Critical patent/CN103531206B/en
Publication of CN103531206A publication Critical patent/CN103531206A/en
Application granted granted Critical
Publication of CN103531206B publication Critical patent/CN103531206B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a speech emotion feature extraction method that combines local and global information and extracts three kinds of features; it belongs to the technical fields of speech signal processing and pattern recognition. The method comprises the following steps: (1) dividing the speech signal into frames; (2) applying a Fourier transform to each frame; (3) filtering the Fourier transform result with a Mel filter bank, computing the energy of each filter output, and taking its logarithm; (4) applying a local Hu operation to the logarithmic result to obtain the first feature; (5) applying a discrete cosine transform to each frame after the local Hu operation to obtain the second feature; (6) applying a difference operation to the logarithmic result of step (3) and then a discrete cosine transform to each frame of the difference result to obtain the third feature. The method can quickly and effectively represent speech in every emotion, and its applications include speech retrieval, speech recognition, and affective computing.

Description

A speech emotion feature extraction method combining local and global information
Technical field
The present invention relates to speech signal processing and pattern recognition technology, and in particular to a speech emotion feature extraction method combining local and global information.
Background art
With the development of information technology, society places ever higher demands on affective computing. In human-computer interaction, for example, a computer with emotional capability can perceive, classify, recognize, and respond to human emotions, helping users feel efficient and warmly attended to, effectively relieving the frustration of using computers, and even helping people understand their own and others' emotional worlds. Such technology can, for instance, detect whether a driver is concentrating or under stress and respond accordingly. Affective computing can also be applied in related industries such as robotics, intelligent toys, games, and e-commerce to build more personalized styles and more realistic scenes. Emotion also reflects people's mental health; applications of affective computing can help people avoid unhealthy emotions and stay psychologically healthy and cheerful.
Facial expressions, speech, physiological indicators, and the like all reflect human emotion to some extent. The present invention concerns the speech feature extraction problem in speech-based emotion recognition. Many features are currently used in speech emotion recognition, the most widely used being the MFCC features. However, MFCC ignores the energy distribution information inside each Mel filter and the local distribution information between the different filter outputs of each frame, and it is sensitive to noise. The present invention therefore proposes a speech emotion feature extraction method that considers both kinds of information simultaneously.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a speech emotion feature extraction method combining local and global information. The method is simple and easy to implement.
The object of the present invention is achieved through the following technical solution: a speech emotion feature extraction method combining local and global information, comprising the following steps:
[1] dividing the speech signal into frames;
[2] applying a Fourier transform to each frame;
[3] filtering the Fourier transform result with a Mel filter bank and taking the logarithm of the filtering result;
[4] applying a local Hu operation to the logarithmic result to obtain the first class of features, called the HuLFPC features;
[5] applying a discrete cosine transform to each frame after the local Hu operation to obtain the second class of features, called the HuMFCC features;
[6] applying a difference operation to the logarithmic result of step [3], then applying a discrete cosine transform to each frame of the difference result to obtain the third class of features, called the DMFCC features.
In said step [4], the logarithmic result computed in step [3] is subjected to a local Hu operation to obtain the first class of features, called the HuLFPC features.
In said step [5], a discrete cosine transform is applied to each frame after the local Hu operation to obtain the second class of features, called the HuMFCC features.
In said step [6], the logarithmic result computed in step [3] is subjected to a difference operation within a window, and then a discrete cosine transform is applied to each frame of the difference result to obtain the third class of features, called the DMFCC features.
The present invention extracts the following three classes of features:
The first class of features, called the HuLFPC features, extracts the energy distribution information inside each Mel filter. The speech signal is first divided into frames and each frame is Fourier transformed; the Fourier transform result is then filtered with the Mel filter bank, the energy of each filter output is computed, and its logarithm is taken; finally, the Hu moment of the logarithmic result is computed within local windows, giving the HuLFPC features.
The second class of features, called the HuMFCC features, also characterizes the energy distribution inside each Mel filter. After the HuLFPC features are obtained, a one-dimensional DCT is applied to the HuLFPC coefficients of each frame, giving the HuMFCC features.
The third class of features, called the DMFCC features, extracts the local distribution information between the different filter outputs of each frame. The speech signal is first divided into frames and each frame is Fourier transformed; the Fourier transform result is then filtered with the Mel filter bank, the energy of each filter output is computed, and its logarithm is taken; the difference of the logarithmic result is computed within local windows; finally, a one-dimensional DCT is applied to the difference coefficients of each frame, giving the DMFCC features.
Working principle of the present invention: when the emotion of speech changes, articulation clarity, the degree of pitch variation, vocal intensity, and speaking rate all change accordingly, and these changes alter the energy intensity of the spectrogram; for example, when pronunciation is clearer and vocal intensity is higher, the spectrogram energy is more concentrated. The first Hu moment measures how concentrated the energy of the data is around its centroid, so it captures well the changes in spectrogram energy concentration caused by changes in speech emotion. In addition, most current research applies derivatives only along the time axis of the spectrogram to measure how the energy changes; but when the emotion changes, the frequency distribution of the speech signal also changes, producing changes along the frequency axis of the spectrogram, so the present invention uses derivatives along the frequency axis to capture these changes.
Compared with the prior art, the present invention has the following advantages and effects:
1. The method is simple: the whole feature extraction framework is simple and easy to implement.
2. The algorithmic complexity is low; none of its formulas has higher computational complexity than those used in existing feature extraction methods.
3. HuLFPC has local rotation and translation invariance, highlights the formants and the overall energy distribution of unvoiced sounds, and partially overcomes various kinds of noise.
4. HuMFCC transforms the HuLFPC coefficients of each frame from the time domain to the frequency domain; in addition to the third effect above, compared with MFCC it weakens the influence of the overall energy offset caused by pitch variation.
5. DMFCC highlights the places where the speech energy changes sharply, reduces the coefficient offset caused by global energy changes, and at the same time makes the energy trend of the spectrogram more prominent.
6. As can be seen from Figs. 2, 3, 6, and 7, HuLFPC differs considerably from the existing MLFPC feature; as can be seen from Figs. 4, 5, 8, 9, 10, and 11, DMFCC and HuMFCC also differ considerably from the existing MFCC. The three newly proposed classes of speech features therefore complement traditional speech features such as MFCC and MLFPC well and are effective in application.
Brief description of the drawings
Fig. 1 is the flow chart of extracting the three classes of features in the speech emotion feature extraction method of the present invention.
Fig. 2 shows the MLFPC feature visualization of the utterance "I will go even if it rains".
Fig. 3 shows the MLFPC feature visualization of the utterance "The office worker has finished the work".
Fig. 4 shows the MFCC feature visualization of the utterance "I will go even if it rains".
Fig. 5 shows the MFCC feature visualization of the utterance "The office worker has finished the work".
Fig. 6 shows the HuLFPC feature visualization of the utterance "I will go even if it rains".
Fig. 7 shows the HuLFPC feature visualization of the utterance "The office worker has finished the work".
Fig. 8 shows the HuMFCC feature visualization of the utterance "I will go even if it rains".
Fig. 9 shows the HuMFCC feature visualization of the utterance "The office worker has finished the work".
Fig. 10 shows the DMFCC feature visualization of the utterance "I will go even if it rains".
Fig. 11 shows the DMFCC feature visualization of the utterance "The office worker has finished the work".
Fig. 12 is the structural diagram of the speech emotion recognition system.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiment and the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Embodiment
As shown in Fig. 1, a speech emotion feature extraction method combining local and global information comprises the following steps:
The first step: divide the speech signal into frames and apply a window to obtain S_k(N). Framing uses the following two formulas, where N is the frame length, inc is the number of samples by which the next frame is shifted, fix(x) is the integer nearest to x, fs is the sampling rate of the speech signal (taken from the speech data), bw is the frequency resolution of the spectrogram (set to 60 Hz in the present invention), and k denotes the k-th frame. The window function is a Hamming window.
N=fix(1.81*fs/bw), (1)
inc=1.81/(4*bw), (2);
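A minimal NumPy sketch of this framing step is given below. It is an illustrative reading rather than the patent's reference implementation: formula (2) as printed evaluates to a sub-sample value, so the sketch assumes the hop is a quarter frame (N/4 samples), consistent with formula (1); the function and parameter names are the author's own.

import numpy as np

def frame_signal(x, fs, bw=60.0):
    """Split a speech signal into overlapping Hamming-windowed frames S_k(N).
    Assumes len(x) >= N."""
    N = int(round(1.81 * fs / bw))   # frame length, formula (1); fix(x) = nearest integer
    inc = max(1, N // 4)             # frame shift in samples; assumed quarter-frame hop
    win = np.hamming(N)
    n_frames = 1 + (len(x) - N) // inc
    frames = np.empty((n_frames, N))
    for k in range(n_frames):
        frames[k] = x[k * inc: k * inc + N] * win
    return frames, N, inc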
The second step: apply a short-time Fourier transform to S_k(N) to obtain F_k(N), and use formula (3) to map F_k(N) to the Mel frequency scale, obtaining G_k(N).
Mel(f)=2595*lg(1+f/700), (3);
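Continuing the sketch above (numpy already imported as np), the Mel mapping of formula (3), its inverse, and the per-frame spectra can be written as follows; the inverse mapping is not given explicitly in the text and is added here only because it is needed to place the filter boundaries in the next step.

def hz_to_mel(f):
    # formula (3): Mel(f) = 2595 * lg(1 + f / 700), lg being log base 10
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(mel):
    # inverse of formula (3)
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def frame_spectra(frames):
    # short-time Fourier transform of each windowed frame (one row per frame)
    return np.fft.rfft(frames, axis=1)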
The third step: first use formula (4) to define a filter bank of M triangular filters; M is set to 160 when computing HuLFPC and HuMFCC, and to 40 when computing DMFCC. Then use formula (5) to compute the energy E_k(m) of the k-th frame after filtering by the m-th filter. The resulting E is a K × M matrix, where K is the number of frames of the speech segment.

H_m(k) = 0 for k < f(m-1);
H_m(k) = 2(k - f(m-1)) / [(f(m+1) - f(m-1))(f(m) - f(m-1))] for f(m-1) ≤ k ≤ f(m);
H_m(k) = 2(f(m+1) - k) / [(f(m+1) - f(m-1))(f(m+1) - f(m))] for f(m) ≤ k ≤ f(m+1);
H_m(k) = 0 for k ≥ f(m+1),   (4)

where f(m) denotes the boundary frequencies of the filters, and the difference between f(m) and f(m-1) increases as m increases;

E_k(m) = ln( Σ_{n=0}^{N-1} |G_k(n)|^2 H_m(n) ),  0 ≤ m < M,  0 ≤ k < K.   (5)
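A sketch of the triangular filter bank of formula (4) and the log filter energies of formula (5) follows, reusing hz_to_mel / mel_to_hz from the previous sketch. The band edges f_low and f_high and the Mel-uniform spacing of the boundary frequencies f(m) are assumptions; the patent only states that the spacing f(m) - f(m-1) grows with m, which Mel-uniform spacing satisfies.

def mel_filterbank(M, n_fft, fs, f_low=0.0, f_high=None):
    """M triangular filters H_m(k) of formula (4) on an rfft grid of n_fft points."""
    if f_high is None:
        f_high = fs / 2.0
    mel_pts = np.linspace(hz_to_mel(f_low), hz_to_mel(f_high), M + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((M, n_fft // 2 + 1))
    for m in range(1, M + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = 2.0 * (k - left) / ((right - left) * (center - left))
        for k in range(center, right):
            fb[m - 1, k] = 2.0 * (right - k) / ((right - left) * (right - center))
    return fb

def log_filter_energies(spectra, fb):
    # formula (5): E_k(m) = ln( sum_n |G_k(n)|^2 * H_m(n) )
    power = np.abs(spectra) ** 2
    return np.log(power @ fb.T + 1e-12)   # small epsilon guards against log(0)

For example, with M = 160 (HuLFPC/HuMFCC) or M = 40 (DMFCC) as stated above, E = log_filter_energies(frame_spectra(frames), mel_filterbank(M, N, fs)) yields the K × M matrix E used in the following steps.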
The fourth step: calculate the first class of features, HuLFPC; the feature visualization results are shown in Fig. 6 and Fig. 7. E is first divided into non-overlapping windows, each window being a 3×3 matrix E(r, c); the Hu feature is then computed for every E(r, c) to obtain HuLFPC, whose dimension is (K-2) × (M-2). The Hu feature is computed as follows: for the two-dimensional data E(r, c), formulas (6), (7), and (8) are used to compute the geometric moment m_pq of order p+q, the central moment μ_pq of order p+q, and the normalized central moment η_pq of order p+q, where f(k, m) denotes an element of the two-dimensional data E(r, c); formula (10) then gives the Hu feature θ of the window:

m_pq = Σ_{k=1}^{K} Σ_{m=1}^{M} k^p m^q f(k, m),  p, q = 0, 1, 2, ...,   (6)

μ_pq = Σ_{k=1}^{K} Σ_{m=1}^{M} (k - k̄)^p (m - m̄)^q f(k, m),  p, q = 0, 1, 2, ...,   (7)

η_pq = μ_pq / μ_00^ρ,  ρ = (p + q)/2 + 1,   (8)

where k̄ and m̄ denote the centroid of the data:

k̄ = m_10 / m_00,  m̄ = m_01 / m_00,   (9)

θ = η_20 + η_02.   (10)
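A sketch of the local Hu operation follows. One reading choice should be stated loudly: the text calls the 3×3 windows non-overlapping, yet the stated output size (K-2) × (M-2) matches a window sliding by one element, so the sketch assumes the sliding window; only the moments actually needed for θ = η_20 + η_02 are computed.

def hu_first_moment(win):
    """theta = eta_20 + eta_02 of a small 2-D window, per formulas (6)-(10)."""
    K_, M_ = win.shape
    k_idx = np.arange(1, K_ + 1)[:, None]
    m_idx = np.arange(1, M_ + 1)[None, :]
    m00 = win.sum()                              # geometric moment m_00, formula (6)
    if m00 == 0:
        return 0.0
    k_bar = (k_idx * win).sum() / m00            # centroid, formula (9)
    m_bar = (m_idx * win).sum() / m00
    mu20 = ((k_idx - k_bar) ** 2 * win).sum()    # central moments, formula (7)
    mu02 = ((m_idx - m_bar) ** 2 * win).sum()
    rho = 2.0                                    # rho = (p + q)/2 + 1 with p + q = 2
    return mu20 / m00 ** rho + mu02 / m00 ** rho # formulas (8) and (10)

def hulfpc(E):
    """HuLFPC: Hu feature of every 3x3 window of E (stride 1 assumed), shape (K-2) x (M-2)."""
    K, M = E.shape
    out = np.empty((K - 2, M - 2))
    for r in range(K - 2):
        for c in range(M - 2):
            out[r, c] = hu_first_moment(E[r:r + 3, c:c + 3])
    return out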
The fifth step: calculate the second class of features, HuMFCC; the feature visualization results are shown in Fig. 8 and Fig. 9. Apply a DCT to the HuLFPC coefficients of each frame and keep the second through the last coefficients, forming the HuMFCC features of dimension (K-2) × (M-3).
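A short sketch of this step, using SciPy's DCT; the type-II transform with orthonormal scaling is an assumption, since the patent does not specify the DCT variant.

from scipy.fftpack import dct

def humfcc(hulfpc_feat):
    """HuMFCC: per-frame DCT of HuLFPC, keeping the 2nd through last coefficients."""
    coeffs = dct(hulfpc_feat, type=2, norm='ortho', axis=1)
    return coeffs[:, 1:]                 # shape (K-2) x (M-3)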
The sixth step: calculate the third class of features, DMFCC; the feature visualization results are shown in Fig. 10 and Fig. 11. E is first divided into overlapping 3×3 windows, each window sliding by one element relative to the previous one; formula (11) is applied to all windows to compute the difference result, DLFPC. A DCT is then applied to the DLFPC coefficients of each frame, and the second through the last coefficients are kept, forming the DMFCC features of dimension (K-2) × (M-3):

D_k(m) = Σ_{i=-1}^{1} Σ_{j=-1}^{1} f(k + i, m + j) h(i, j),   (11)

where h(i, j) is given by

H = | -1  0  1 |
    | -1  0  1 |
    | -1  0  1 |.   (12)
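The corresponding sketch for DLFPC/DMFCC applies the kernel H of formula (12) to every sliding 3×3 window of E exactly as written in formula (11), then reuses the DCT convention assumed above (dct as imported in the HuMFCC sketch).

def dmfcc(E):
    """DMFCC: difference of formula (11) with kernel H, then per-frame DCT."""
    H = np.array([[-1, 0, 1],
                  [-1, 0, 1],
                  [-1, 0, 1]], dtype=float)
    K, M = E.shape
    dlfpc = np.empty((K - 2, M - 2))
    for k in range(1, K - 1):
        for m in range(1, M - 1):
            # formula (11): 3x3 neighbourhood of E weighted by H
            dlfpc[k - 1, m - 1] = np.sum(E[k - 1:k + 2, m - 1:m + 2] * H)
    coeffs = dct(dlfpc, type=2, norm='ortho', axis=1)
    return coeffs[:, 1:]                 # shape (K-2) x (M-3)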
The effectiveness of the proposed speech feature extraction method is verified below in a speech emotion recognition system. The experimental system, shown in Fig. 12, consists of two main processes: training and classification. Both processes include feature extraction from the speech samples, covering the three classes of features of the present invention combined with the existing MFCC, MLFPC, and LPCC features. Feature statistics then convert the features of each utterance into a feature vector; the statistics are the mean and variance of each feature and the mean and variance of its first-order difference. Sequential forward feature selection (SFS) is used for feature selection, and a support vector machine (SVM) is used as the classifier.
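The statistics step described above maps a per-frame feature matrix to a fixed-length vector; a minimal sketch is given below (the concatenation order is an assumption, and np is numpy as imported earlier).

def stats_vector(feat):
    """Mean and variance of each coefficient and of its first-order difference."""
    delta = np.diff(feat, axis=0)
    return np.concatenate([feat.mean(axis=0), feat.var(axis=0),
                           delta.mean(axis=0), delta.var(axis=0)])

Applying stats_vector to each of HuLFPC, HuMFCC, DMFCC, MFCC, MLFPC, and LPCC and concatenating the results gives the utterance-level vector that feeds the SFS feature selection and the SVM.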
The training process comprises the following steps:
1) Prepare the emotional speech training database Dw, containing the audio files and the corresponding emotion class labels: anger, fear, boredom, disgust, happiness, neutral, and sadness.
2) Using the method of the present invention, obtain the three classes of features HuMFCC, HuLFPC, and DMFCC, and also compute the traditional features MFCC, MLFPC, and LPCC; convert every training speech sample into a two-dimensional feature matrix, one dimension of which is the frame index of the sample.
3) Convert each two-dimensional feature matrix into a feature vector using the mean and variance of each feature, and the mean and variance of its first-order difference, computed along the frame dimension.
4) Apply the SFS feature selection method to the feature vectors to obtain a subset that is effective for emotion classification, yielding the feature vector V after feature selection.
5) Train an SVM emotion classifier on V, selecting the optimal SVM parameters by 5-fold cross-validation, and obtain the corresponding classification model (a minimal training sketch follows this list).
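A minimal training sketch, assuming scikit-learn: the patent names only SVM and 5-fold cross-validation, so the library, kernel, scaling step, and parameter grid are illustrative, and the SFS step is assumed to have been applied beforehand to produce X.

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_emotion_classifier(X, y):
    """Select SVM parameters by 5-fold cross-validation and fit the final model.
    X: one statistics vector per training utterance (after SFS); y: emotion labels."""
    pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
    grid = {'svc__C': [1, 10, 100], 'svc__gamma': ['scale', 0.01, 0.001]}
    search = GridSearchCV(pipe, grid, cv=5)
    search.fit(X, y)
    return search.best_estimator_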
The structure of the speech emotion recognition system of the present invention is shown in Fig. 12; the recognition process comprises the following steps:
1) Using the method of the present invention, obtain the three classes of features HuMFCC, HuLFPC, and DMFCC, and also compute the traditional features MFCC, MLFPC, and LPCC; convert every speech sample to be recognized into a two-dimensional feature matrix, one dimension of which is the frame index of the sample.
2) Convert each two-dimensional feature matrix into a feature vector using the mean and variance of each feature, and the mean and variance of its first-order difference, computed along the frame dimension.
3) Using the effective feature subset obtained in the training process, obtain the feature vector V of the sample after feature selection.
4) Use the trained SVM classification model to classify V into one of the emotion categories (see the sketch after this list).
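Classification of a new utterance then reduces to one call on the trained model, sketched below under the same scikit-learn assumptions as the training sketch; stats_vector comes from the earlier statistics sketch, and selected_idx (the index set kept by SFS during training) is a hypothetical name.

def classify_utterance(model, feature_matrices, selected_idx):
    """Classify one utterance: statistics vector -> stored SFS subset -> SVM label."""
    v = np.concatenate([stats_vector(f) for f in feature_matrices])
    v = v[selected_idx]                       # apply the feature subset chosen in training
    return model.predict(v.reshape(1, -1))[0]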
The corpus used to evaluate the emotion recognition performance of the present invention is the German EMO-DB speech emotion database, a standard database in the field of speech emotion recognition. The training process is completed first, followed by recognition tests. Testing is carried out with 5-fold cross-validation. Seven emotions can be recognized: anger, fear, boredom, disgust, happiness, neutral, and sadness. In the speaker-dependent case the average classification accuracy is 91.67%; apart from happiness being relatively easy to confuse with anger, the other emotions are discriminated well. In the speaker-independent case the average classification accuracy is 82.20%; here happiness, anger, and disgust are relatively easy to confuse with one another, while the other emotions are discriminated well. As shown in Fig. 2, Fig. 3, Fig. 6, and Fig. 7, HuLFPC differs considerably from the existing MLFPC feature; as shown in Fig. 4, Fig. 5, Fig. 8, Fig. 9, Fig. 10, and Fig. 11, DMFCC and HuMFCC also differ considerably from the existing MFCC.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited to it; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be an equivalent replacement and shall be included within the protection scope of the present invention.

Claims (7)

1. A speech emotion feature extraction method combining local and global information, characterized in that it comprises the following steps:
[1] dividing the speech signal into frames;
[2] applying a Fourier transform to each frame;
[3] filtering the Fourier transform result with a Mel filter bank and taking the logarithm of the filtering result;
[4] applying a local Hu operation to the logarithmic result to obtain the first class of features;
[5] applying a discrete cosine transform to each frame after the local Hu operation to obtain the second class of features;
[6] applying a difference operation to the logarithmic result computed in step [3], then applying a discrete cosine transform to each frame of the difference result to obtain the third class of features.
2. The speech emotion feature extraction method combining local and global information according to claim 1, characterized in that said step [4] comprises the following steps:
1. dividing E into non-overlapping windows, each window being a matrix E(r, c) of size 3 × 3;
2. computing the Hu feature for every E(r, c) to obtain HuLFPC, whose dimension is (K-2) × (M-2),
wherein the Hu feature is computed as follows:
first, for the two-dimensional data E(r, c), the following formulas (6), (7), and (8) are used to compute the geometric moment m_pq of order p+q, the central moment μ_pq of order p+q, and the normalized central moment η_pq of order p+q:

m_pq = Σ_{k=1}^{K} Σ_{m=1}^{M} k^p m^q f(k, m),  p, q = 0, 1, 2, ...,   (6)

μ_pq = Σ_{k=1}^{K} Σ_{m=1}^{M} (k - k̄)^p (m - m̄)^q f(k, m),  p, q = 0, 1, 2, ...,   (7)

η_pq = μ_pq / μ_00^ρ,  ρ = (p + q)/2 + 1,   (8)

wherein f(k, m) is an element of the two-dimensional data E(r, c);
then, the Hu feature θ of the window is obtained:

k̄ = m_10 / m_00,  m̄ = m_01 / m_00,   (9)

θ = η_20 + η_02,   (10)

wherein k̄ and m̄ denote the centroid of the data.
3. The speech emotion feature extraction method combining local and global information according to claim 1, characterized in that in step [5], a DCT is applied to the HuLFPC coefficients of each frame, and the second through the last coefficients are kept to form the HuMFCC features of dimension (K-2) × (M-3).
4. The speech emotion feature extraction method combining local and global information according to claim 1, characterized in that said step [6] comprises the following steps:
I. dividing E into overlapping 3 × 3 windows, each window sliding by one element relative to the previous one, and applying formula (11) to all windows to compute the difference result DLFPC:

D_k(m) = Σ_{i=-1}^{1} Σ_{j=-1}^{1} f(k + i, m + j) h(i, j),   (11)

wherein h(i, j) is defined as

H = | -1  0  1 |
    | -1  0  1 |
    | -1  0  1 | ;

II. applying a DCT to the DLFPC coefficients of each frame and keeping the second through the last coefficients to form the DMFCC features of dimension (K-2) × (M-3).
5. The speech emotion feature extraction method combining local and global information according to claim 1, characterized in that in said step [1], frames are formed according to formulas (1) and (2):
N=fix(1.81*fs/bw), (1)
inc=1.81/(4*bw), (2)
wherein N is the frame length, inc is the number of samples by which the next frame is shifted, fix(x) is the integer nearest to x, fs is the sampling rate of the speech signal, taken from the speech data, bw is the frequency resolution of the spectrogram and is set to 60 Hz, and the window function is a Hamming window.
6. The speech emotion feature extraction method combining local and global information according to claim 1, characterized in that in said step [2], a short-time Fourier transform is applied to S_k(N) to obtain F_k(N), and formula (3) is used to map F_k(N) to the Mel frequency G_k(N):
Mel(f)=2595*lg(1+f/700), (3)
wherein k denotes the k-th frame.
7. The speech emotion feature extraction method combining local and global information according to claim 1, characterized in that said step [3] comprises the following steps:
(I) defining a filter bank of M filters, each filter being a triangular filter given by formula (4);
(II) using formula (5) to compute the energy E_k(m) of the k-th frame after filtering by the m-th filter, the resulting E being a K × M matrix, wherein K is the number of frames of the speech segment:

H_m(k) = 0 for k < f(m-1);
H_m(k) = 2(k - f(m-1)) / [(f(m+1) - f(m-1))(f(m) - f(m-1))] for f(m-1) ≤ k ≤ f(m);
H_m(k) = 2(f(m+1) - k) / [(f(m+1) - f(m-1))(f(m+1) - f(m))] for f(m) ≤ k ≤ f(m+1);
H_m(k) = 0 for k ≥ f(m+1),   (4)

wherein the difference between f(m) and f(m-1) increases as m increases;

E_k(m) = ln( Σ_{n=0}^{N-1} |G_k(n)|^2 H_m(n) ),  0 ≤ m < M,  0 ≤ k < K,   (5)

wherein M is set to 160 when computing HuLFPC and HuMFCC, and to 40 when computing DMFCC.
CN201310460191.5A 2013-09-30 2013-09-30 A speech emotion feature extraction method combining local and global information Active CN103531206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310460191.5A CN103531206B (en) 2013-09-30 2013-09-30 A speech emotion feature extraction method combining local and global information


Publications (2)

Publication Number Publication Date
CN103531206A true CN103531206A (en) 2014-01-22
CN103531206B CN103531206B (en) 2017-09-29

Family

ID=49933158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310460191.5A Active CN103531206B (en) A speech emotion feature extraction method combining local and global information

Country Status (1)

Country Link
CN (1) CN103531206B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130166291A1 (en) * 2010-07-06 2013-06-27 Rmit University Emotional and/or psychiatric state detection
CN101894550A (en) * 2010-07-19 2010-11-24 东南大学 Speech emotion classifying method for emotion-based characteristic optimization
CN101930733A (en) * 2010-09-03 2010-12-29 中国科学院声学研究所 Speech emotional characteristic extraction method for speech emotion recognition
CN102592593A (en) * 2012-03-31 2012-07-18 山东大学 Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech
CN103021406A (en) * 2012-12-18 2013-04-03 台州学院 Robust speech emotion recognition method based on compressive sensing

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104637497A (en) * 2015-01-16 2015-05-20 南京工程学院 Speech spectrum characteristic extracting method facing speech emotion identification
CN107358946A (en) * 2017-06-08 2017-11-17 南京邮电大学 Speech-emotion recognition method based on section convolution
CN107358946B (en) * 2017-06-08 2020-11-13 南京邮电大学 Voice emotion recognition method based on slice convolution
CN107564543B (en) * 2017-09-13 2020-06-26 苏州大学 Voice feature extraction method with high emotion distinguishing degree
CN107564543A (en) * 2017-09-13 2018-01-09 苏州大学 A kind of Speech Feature Extraction of high touch discrimination
CN109920450A (en) * 2017-12-13 2019-06-21 北京回龙观医院 Information processing unit and information processing method
CN109087628A (en) * 2018-08-21 2018-12-25 广东工业大学 A kind of speech-emotion recognition method of trajectory-based time-space spectral signature
CN110246518A (en) * 2019-06-10 2019-09-17 深圳航天科技创新研究院 Speech-emotion recognition method, device, system and storage medium based on more granularity sound state fusion features
CN112786016A (en) * 2019-11-11 2021-05-11 北京声智科技有限公司 Voice recognition method, device, medium and equipment
CN112786016B (en) * 2019-11-11 2022-07-19 北京声智科技有限公司 Voice recognition method, device, medium and equipment
CN112489689A (en) * 2020-11-30 2021-03-12 东南大学 Cross-database voice emotion recognition method and device based on multi-scale difference confrontation
CN116434787A (en) * 2023-06-14 2023-07-14 之江实验室 Voice emotion recognition method and device, storage medium and electronic equipment
CN116434787B (en) * 2023-06-14 2023-09-08 之江实验室 Voice emotion recognition method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN103531206B (en) 2017-09-29


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant