CN109767790A - Speech emotion recognition method and system - Google Patents
Speech emotion recognition method and system
- Publication number: CN109767790A
- Application number: CN201910173689.0A
- Authority
- CN
- China
- Prior art keywords
- speech signal
- obtaining
- preprocessing
- spectrogram
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The present invention discloses a speech emotion recognition method and system. The recognition method comprises: acquiring a speech signal; preprocessing the speech signal to obtain a preprocessed speech signal; computing the spectrogram corresponding to the preprocessed speech signal; computing the emotion recognition rate of the preprocessed speech signal for multiple different segment lengths, and determining the segment length with the highest emotion recognition rate as the optimal segment length; extracting the acoustic features of the speech signal from the spectrogram corresponding to the optimal segment length; and classifying and identifying the emotion of the speech signal from the acoustic features with a convolutional neural network. This speech emotion recognition method, based on spectrograms and convolutional neural networks, improves the speech emotion recognition rate.
Description
Technical field
The present invention relates to the field of speech recognition, and more particularly to a speech emotion recognition method and system.
Background technique
Speech emotion recognition is an emerging field at the intersection of artificial intelligence, psychology, computational science, and other disciplines. Since the beginning of the 21st century, with the rapid development of artificial intelligence, the demand for speech emotion recognition has kept growing, so analyzing and studying the affective features contained in speech, and judging the speaker's mood (happiness, anger, grief, joy), has become very important.
Traditional research on speech emotion recognition has focused on analyzing the acoustic statistical features of speech, using emotional speech databases that contain relatively few entries with relatively simple semantics. In the prior art, the acoustic features used for emotion recognition fall into prosodic features, spectrum-based features, and voice quality features. For the extraction of affective features, the earliest heuristic algorithms include sequential backward selection and sequential forward selection; algorithms for extracting linear feature parameters, including principal component analysis and Fisher linear discriminant analysis, have also been applied. Because the accuracy of these prior-art analysis methods is low, a method was proposed that automatically extracts features with a deep belief network and classifies them with linear discriminant analysis, the k-nearest-neighbor method, and support vector machines; using the three classifiers maximum-likelihood Bayes, kernel regression, and k-nearest neighbors achieved recognition rates of 60% to 65%.
The speech emotion recognition rate achieved with the classification and analysis methods used in the prior art remains low.
Summary of the invention
The object of the present invention is to provide a speech emotion recognition method and system that can improve the recognition rate of speech emotion.
To achieve the above object, the present invention provides the following schemes:
A speech emotion recognition method, the recognition method comprising:
acquiring a speech signal;
preprocessing the speech signal to obtain a preprocessed speech signal;
computing the spectrogram corresponding to the preprocessed speech signal;
computing the emotion recognition rate of the preprocessed speech signal for multiple different segment lengths, and determining the segment length with the highest emotion recognition rate as the optimal segment length;
extracting the acoustic features of the speech signal from the spectrogram corresponding to the optimal segment length;
classifying and identifying the emotion of the speech signal from the acoustic features with a convolutional neural network.
Optionally, preprocessing the speech signal to obtain a preprocessed speech signal specifically comprises:
digitizing the speech signal to obtain a pulse speech signal;
sampling the pulse speech signal to obtain a pulse speech signal that is discrete in time and continuous in amplitude;
quantizing the discrete-time, continuous-amplitude pulse speech signal to obtain a pulse speech signal that is discrete in both time and amplitude;
applying pre-emphasis to the discrete-time, discrete-amplitude pulse speech signal to obtain a pre-emphasized speech signal;
framing and windowing the pre-emphasized speech signal to obtain the preprocessed speech signal.
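The preprocessing chain above (pre-emphasis, then framing and windowing) can be sketched in Python as follows; the frame length, hop size, and pre-emphasis coefficient are illustrative values, not taken from the patent:

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis followed by framing and windowing, sketching the
    preprocessing chain; frame_len, hop and alpha are illustrative."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Split the signal into overlapping frames.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Window each frame, as in S'_i = S_i x hanning(N_new).
    return frames * np.hanning(frame_len)

# 1 s of a 440 Hz tone sampled at 16 kHz as a stand-in speech signal.
frames = preprocess(np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0))
```

With these values, one second of 16 kHz audio yields 98 windowed frames of 400 samples each.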
Optionally, computing the spectrogram corresponding to the preprocessed speech signal specifically comprises:
obtaining the sampling frequency F_s, the sample data sequence S_g, and the segment length of the preprocessed speech signal;
dividing the preprocessed speech signal into N segments according to the segment length and the window length N_new of the window function, obtaining N speech segments;
computing the frame shift N_sfgtft from the segment length and the N speech segments;
windowing the i-th speech frame S_i to obtain the windowed speech signal S'_i, where S'_i = S_i × hanning(N_new) and i takes the values 1, 2, …, N;
applying the Fourier transform to the windowed speech signal S'_i to obtain the Fourier-transformed speech signal Z_i;
computing, from the phase θ_i of the Fourier-transformed speech signal Z_i, the energy density function |Z_i|^2 of the i-th frame S_i; shifting the window function by N_sfgtft to obtain the energy density function |Z_{i+1}|^2 of the (i+1)-th frame S_{i+1};
assembling a matrix R with [N_new/2]+1 rows and N columns;
mapping the matrix R to a grayscale image to obtain the spectrogram corresponding to the preprocessed speech signal.
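The spectrogram (sound spectrograph) construction described above, windowed frames, Fourier transform, energy density |Z_i|^2, matrix R, grayscale mapping, can be sketched as follows; the logarithmic scaling used for the grayscale mapping is an assumption, since the patent does not specify the mapping:

```python
import numpy as np

def spectrogram(frames):
    """Build the matrix R: column i holds the energy density |Z_i|^2 of
    windowed frame i, giving [N_new/2]+1 frequency rows and N columns,
    then map R to a grayscale image (log scaling is an assumption)."""
    Z = np.fft.rfft(frames, axis=1)        # Fourier transform of each frame
    R = (np.abs(Z) ** 2).T                 # [N_new/2]+1 rows, N columns
    gray = np.rint(255 * np.log1p(R) / np.log1p(R).max())
    return gray.astype(np.uint8)

# N = 98 windowed frames of length N_new = 400 as a stand-in input.
rng = np.random.default_rng(0)
gray = spectrogram(rng.standard_normal((98, 400)) * np.hanning(400))
```

For N_new = 400, the resulting image has [400/2] + 1 = 201 rows and one column per frame.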
Optionally, classifying and identifying the emotion of the speech signal from the acoustic features with the convolutional neural network specifically comprises:
processing the spectrogram with the convolutional layer of the convolutional neural network to convert the three-dimensional spectrogram into N two-dimensional features, y_i = x_i * k_ij + b_j, where b_j is a trainable bias term, k_ij is the convolution kernel, x_i denotes the i-th input spectrogram segment, and y_i denotes the two-dimensional feature output for the i-th spectrogram segment;
passing the output two-dimensional feature y_i of the i-th spectrogram segment through the pooling layer to obtain the low-resolution acoustic feature y'_i;
a fully connected layer, which contains an activation function, is arranged between the convolutional layer and the pooling layer and carries the data transfer between them.
A speech emotion recognition system, the recognition system comprising:
a speech signal acquisition module for acquiring a speech signal;
a preprocessing module for preprocessing the speech signal to obtain a preprocessed speech signal;
a spectrogram computation module for computing the spectrogram corresponding to the preprocessed speech signal;
an optimal segment length determination module for computing the emotion recognition rate of the preprocessed speech signal for multiple different segment lengths and determining the segment length with the highest emotion recognition rate as the optimal segment length;
an acoustic feature extraction module for extracting the acoustic features of the speech signal from the spectrogram corresponding to the optimal segment length;
a convolutional neural network module for classifying and identifying the emotion of the speech signal from the acoustic features with a convolutional neural network.
Optionally, the preprocessing module specifically comprises:
a digitizing unit for digitizing the speech signal to obtain a pulse speech signal;
a sampling unit for sampling the pulse speech signal to obtain a discrete-time, continuous-amplitude pulse speech signal;
a quantizing unit for quantizing the discrete-time, continuous-amplitude pulse speech signal to obtain a discrete-time, discrete-amplitude pulse speech signal;
a pre-emphasis unit for applying pre-emphasis to the discrete-time, discrete-amplitude pulse speech signal to obtain a pre-emphasized speech signal;
a framing-and-windowing unit for framing and windowing the pre-emphasized speech signal to obtain the preprocessed speech signal.
Optionally, the spectrogram computation module specifically comprises:
a preprocessed speech signal information acquisition unit for obtaining the sampling frequency F_s, the sample data sequence S_g, and the segment length of the preprocessed speech signal;
a preprocessed speech signal segmentation unit for dividing the preprocessed speech signal into N segments according to the segment length and the window length N_new of the window function, obtaining N speech segments;
a frame shift computation unit for computing the frame shift N_sfgtft from the segment length and the N speech segments;
a windowing unit for windowing the i-th speech frame S_i to obtain the windowed speech signal S'_i, where S'_i = S_i × hanning(N_new) and i takes the values 1, 2, …, N;
a Fourier transform unit for applying the Fourier transform to the windowed speech signal S'_i to obtain the Fourier-transformed speech signal Z_i;
a spectrogram acquisition unit for computing, from the phase θ_i of the Fourier-transformed speech signal Z_i, the energy density function |Z_i|^2 of the i-th frame S_i, shifting the window function by N_sfgtft to obtain the energy density function |Z_{i+1}|^2 of the (i+1)-th frame S_{i+1}, assembling a matrix R with [N_new/2]+1 rows and N columns, and mapping the matrix R to a grayscale image to obtain the spectrogram corresponding to the preprocessed speech signal.
Optionally, the convolutional neural network module specifically comprises:
a convolutional layer unit for processing the spectrogram with the convolutional layer of the convolutional neural network, converting the three-dimensional spectrogram into N two-dimensional features;
a pooling layer unit for passing the output two-dimensional feature y_i of the i-th spectrogram segment through the pooling layer to obtain the low-resolution acoustic feature y'_i;
a fully connected layer unit, arranged between the convolutional layer and the pooling layer, containing an activation function and carrying the data transfer between the convolutional layer and the pooling layer.
According to the specific embodiments provided by the present invention, the invention discloses the following technical effects: the invention discloses a speech emotion recognition method and system. The recognition method acquires a speech signal; preprocesses the speech signal to obtain a preprocessed speech signal; computes the spectrogram corresponding to the preprocessed speech signal; computes the emotion recognition rate of the preprocessed speech signal for multiple different segment lengths and determines the segment length with the highest emotion recognition rate as the optimal segment length; extracts the acoustic features of the speech signal from the spectrogram corresponding to the optimal segment length; and classifies and identifies the emotion of the speech signal from the acoustic features with a convolutional neural network. The speech emotion recognition method based on spectrograms and convolutional neural networks improves the speech emotion recognition rate, and combining the spectrogram features at the optimal segment length with convolutional neural network recognition further improves the recognition rate of speech emotion.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without any creative labor.
Fig. 1 is a flowchart of the speech emotion recognition method provided by the present invention;
Fig. 2 is a block diagram of the speech emotion recognition system provided by the present invention.
Specific embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The object of the present invention is to provide a speech emotion recognition method and system that can improve the recognition rate of speech emotion.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, a speech emotion recognition method comprises:
Step 100: acquire a speech signal;
Step 200: preprocess the speech signal to obtain a preprocessed speech signal;
Step 300: compute the spectrogram corresponding to the preprocessed speech signal;
Step 400: compute the emotion recognition rate of the preprocessed speech signal for multiple different segment lengths, and determine the segment length with the highest emotion recognition rate as the optimal segment length;
Step 500: extract the acoustic features of the speech signal from the spectrogram corresponding to the optimal segment length;
Step 600: classify and identify the emotion of the speech signal from the acoustic features with a convolutional neural network.
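Step 400's search for the best segment (paragraph) length reduces to an argmax over candidate lengths. In this sketch, `evaluate` is a hypothetical stand-in for training and scoring the spectrogram-plus-CNN pipeline at a given segment length, a procedure the patent does not spell out:

```python
def select_best_segment_length(candidates, evaluate):
    """Step 400 as an argmax: evaluate(length) returns the emotion
    recognition rate obtained with spectrograms built at that segment
    length; the length with the highest rate is the optimal one.
    evaluate is a hypothetical stand-in for the full train/score loop."""
    rates = {length: evaluate(length) for length in candidates}
    return max(rates, key=rates.get)
```

For example, with a toy scoring function that peaks at 200, the search returns 200.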
Step 200, preprocessing the speech signal to obtain a preprocessed speech signal, specifically comprises:
digitizing the speech signal to obtain a pulse speech signal;
sampling the pulse speech signal to obtain a pulse speech signal that is discrete in time and continuous in amplitude;
quantizing the discrete-time, continuous-amplitude pulse speech signal to obtain a pulse speech signal that is discrete in both time and amplitude;
applying pre-emphasis to the discrete-time, discrete-amplitude pulse speech signal to obtain a pre-emphasized speech signal;
framing and windowing the pre-emphasized speech signal to obtain the preprocessed speech signal.
Step 300, computing the spectrogram corresponding to the preprocessed speech signal, specifically comprises:
obtaining the sampling frequency F_s, the sample data sequence S_g, and the segment length of the preprocessed speech signal;
dividing the preprocessed speech signal into N segments according to the segment length and the window length N_new of the window function, obtaining N speech segments;
computing the frame shift N_sfgtft from the segment length and the N speech segments;
windowing the i-th speech frame S_i to obtain the windowed speech signal S'_i, where S'_i = S_i × hanning(N_new) and i takes the values 1, 2, …, N;
applying the Fourier transform to the windowed speech signal S'_i to obtain the Fourier-transformed speech signal Z_i;
computing, from the phase θ_i of the Fourier-transformed speech signal Z_i, the energy density function |Z_i|^2 of the i-th frame S_i; shifting the window function by N_sfgtft to obtain the energy density function |Z_{i+1}|^2 of the (i+1)-th frame S_{i+1};
assembling a matrix R with [N_new/2]+1 rows and N columns;
mapping the matrix R to a grayscale image to obtain the spectrogram corresponding to the preprocessed speech signal. Sharing filter weights reduces the number of coefficients that need to be trained.
Step 600, classifying and identifying the emotion of the speech signal from the acoustic features with the convolutional neural network, specifically comprises:
processing the spectrogram with the convolutional layer of the convolutional neural network to convert the three-dimensional spectrogram into N two-dimensional features, y_i = x_i * k_ij + b_j, where b_j is a trainable bias term, k_ij is the convolution kernel, x_i denotes the i-th input spectrogram segment, and y_i denotes the two-dimensional feature output for the i-th spectrogram segment;
passing the output two-dimensional feature y_i of the i-th spectrogram segment through the pooling layer to obtain the low-resolution acoustic feature y'_i;
a fully connected layer, which contains an activation function, is arranged between the convolutional layer and the pooling layer and carries the data transfer between them.
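The convolutional and pooling operations of Step 600 can be sketched directly. The "valid" correlation and 2×2 max pooling below are common choices assumed for illustration; the patent fixes neither the kernel size nor the pooling type:

```python
import numpy as np

def conv2d_valid(x, k, b):
    """One convolutional-layer feature map, y_i = x_i * k_ij + b_j, as a
    'valid' correlation; kernel size and the absence of an activation
    here are assumptions."""
    H, W = x.shape
    h, w = k.shape
    y = np.empty((H - h + 1, W - w + 1))
    for r in range(y.shape[0]):
        for c in range(y.shape[1]):
            # Slide the kernel over the input and add the bias.
            y[r, c] = np.sum(x[r:r + h, c:c + w] * k) + b
    return y

def max_pool(y, size=2):
    """Pooling layer: keep the maximum of each size x size block,
    yielding the low-resolution feature y'_i."""
    H = y.shape[0] // size * size
    W = y.shape[1] // size * size
    blocks = y[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))
```

For instance, convolving a 3×3 all-ones input with a 2×2 all-ones kernel and bias 1 gives a 2×2 map of fives, and 2×2 max pooling halves each spatial dimension.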
As shown in Fig. 2, a speech emotion recognition system comprises:
a speech signal acquisition module 1 for acquiring a speech signal;
a preprocessing module 2 for preprocessing the speech signal to obtain a preprocessed speech signal;
a spectrogram computation module 3 for computing the spectrogram corresponding to the preprocessed speech signal;
an optimal segment length determination module 4 for computing the emotion recognition rate of the preprocessed speech signal for multiple different segment lengths and determining the segment length with the highest emotion recognition rate as the optimal segment length;
an acoustic feature extraction module 5 for extracting the acoustic features of the speech signal from the spectrogram corresponding to the optimal segment length;
a convolutional neural network module 6 for classifying and identifying the emotion of the speech signal from the acoustic features with a convolutional neural network.
The preprocessing module 2 specifically comprises:
a digitizing unit for digitizing the speech signal to obtain a pulse speech signal;
a sampling unit for sampling the pulse speech signal to obtain a discrete-time, continuous-amplitude pulse speech signal;
a quantizing unit for quantizing the discrete-time, continuous-amplitude pulse speech signal to obtain a discrete-time, discrete-amplitude pulse speech signal;
a pre-emphasis unit for applying pre-emphasis to the discrete-time, discrete-amplitude pulse speech signal to obtain a pre-emphasized speech signal;
a framing-and-windowing unit for framing and windowing the pre-emphasized speech signal to obtain the preprocessed speech signal.
The spectrogram computation module 3 specifically comprises:
a preprocessed speech signal information acquisition unit for obtaining the sampling frequency F_s, the sample data sequence S_g, and the segment length of the preprocessed speech signal;
a preprocessed speech signal segmentation unit for dividing the preprocessed speech signal into N segments according to the segment length and the window length N_new of the window function, obtaining N speech segments;
a frame shift computation unit for computing the frame shift N_sfgtft from the segment length and the N speech segments;
a windowing unit for windowing the i-th speech frame S_i to obtain the windowed speech signal S'_i, where S'_i = S_i × hanning(N_new) and i takes the values 1, 2, …, N;
a Fourier transform unit for applying the Fourier transform to the windowed speech signal S'_i to obtain the Fourier-transformed speech signal Z_i;
a spectrogram acquisition unit for computing, from the phase θ_i of the Fourier-transformed speech signal Z_i, the energy density function |Z_i|^2 of the i-th frame S_i, shifting the window function by N_sfgtft to obtain the energy density function |Z_{i+1}|^2 of the (i+1)-th frame S_{i+1}, assembling a matrix R with [N_new/2]+1 rows and N columns, and mapping the matrix R to a grayscale image to obtain the spectrogram corresponding to the preprocessed speech signal.
The convolutional neural network module 6 specifically comprises:
a convolutional layer unit for processing the spectrogram with the convolutional layer of the convolutional neural network, converting the three-dimensional spectrogram into N two-dimensional features;
a pooling layer unit for passing the output two-dimensional feature y_i of the i-th spectrogram segment through the pooling layer to obtain the low-resolution acoustic feature y'_i;
a fully connected layer unit, arranged between the convolutional layer and the pooling layer, containing an activation function and carrying the data transfer between the convolutional layer and the pooling layer.
Each embodiment in this specification is described in a progressive manner; each embodiment highlights its differences from the others, and identical or similar parts of the embodiments may refer to each other. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Specific examples are used herein to illustrate the principle and implementation of the present invention; the description of the above embodiments is only meant to help understand the method of the invention and its core concept. At the same time, for those skilled in the art, there will be changes in the specific implementation and scope of application according to the idea of the invention. In conclusion, the content of this specification should not be construed as limiting the present invention.
Claims (8)
1. A speech emotion recognition method, characterized in that the recognition method comprises:
acquiring a speech signal;
preprocessing the speech signal to obtain a preprocessed speech signal;
computing the spectrogram corresponding to the preprocessed speech signal;
computing the emotion recognition rate of the preprocessed speech signal for multiple different segment lengths, and determining the segment length with the highest emotion recognition rate as the optimal segment length;
extracting the acoustic features of the speech signal from the spectrogram corresponding to the optimal segment length;
classifying and identifying the emotion of the speech signal from the acoustic features with a convolutional neural network.
2. The speech emotion recognition method according to claim 1, characterized in that preprocessing the speech signal to obtain a preprocessed speech signal specifically comprises:
digitizing the speech signal to obtain a pulse speech signal;
sampling the pulse speech signal to obtain a discrete-time, continuous-amplitude pulse speech signal;
quantizing the discrete-time, continuous-amplitude pulse speech signal to obtain a discrete-time, discrete-amplitude pulse speech signal;
applying pre-emphasis to the discrete-time, discrete-amplitude pulse speech signal to obtain a pre-emphasized speech signal;
framing and windowing the pre-emphasized speech signal to obtain the preprocessed speech signal.
3. The speech emotion recognition method according to claim 1, characterized in that computing the spectrogram corresponding to the preprocessed speech signal specifically comprises:
obtaining the sampling frequency F_s, the sample data sequence S_g, and the segment length of the preprocessed speech signal;
dividing the preprocessed speech signal into N segments according to the segment length and the window length N_new of the window function, obtaining N speech segments;
computing the frame shift N_sfgtft from the segment length and the N speech segments;
windowing the i-th speech frame S_i to obtain the windowed speech signal S'_i, where S'_i = S_i × hanning(N_new) and i takes the values 1, 2, …, N;
applying the Fourier transform to the windowed speech signal S'_i to obtain the Fourier-transformed speech signal Z_i;
computing, from the phase θ_i of the Fourier-transformed speech signal Z_i, the energy density function |Z_i|^2 of the i-th frame S_i; shifting the window function by N_sfgtft to obtain the energy density function |Z_{i+1}|^2 of the (i+1)-th frame S_{i+1};
assembling a matrix R with [N_new/2]+1 rows and N columns;
mapping the matrix R to a grayscale image to obtain the spectrogram corresponding to the preprocessed speech signal.
4. The speech emotion recognition method according to claim 1, characterized in that classifying and identifying the emotion of the speech signal from the acoustic features with the convolutional neural network specifically comprises:
processing the spectrogram with the convolutional layer of the convolutional neural network to convert the three-dimensional spectrogram into N two-dimensional features, y_i = x_i * k_ij + b_j, where b_j is a trainable bias term, k_ij is the convolution kernel, x_i denotes the i-th input spectrogram segment, and y_i denotes the two-dimensional feature output for the i-th spectrogram segment;
passing the output two-dimensional feature y_i of the i-th spectrogram segment through the pooling layer to obtain the low-resolution acoustic feature y'_i;
a fully connected layer, which contains an activation function, is arranged between the convolutional layer and the pooling layer and carries the data transfer between them.
5. A speech emotion recognition system, characterized in that the recognition system comprises:
a speech signal acquisition module for acquiring a speech signal;
a preprocessing module for preprocessing the speech signal to obtain a preprocessed speech signal;
a spectrogram computation module for computing the spectrogram corresponding to the preprocessed speech signal;
an optimal segment length determination module for computing the emotion recognition rate of the preprocessed speech signal for multiple different segment lengths and determining the segment length with the highest emotion recognition rate as the optimal segment length;
an acoustic feature extraction module for extracting the acoustic features of the speech signal from the spectrogram corresponding to the optimal segment length;
a convolutional neural network module for classifying and identifying the emotion of the speech signal from the acoustic features with a convolutional neural network.
6. The speech emotion recognition system according to claim 5, characterized in that the preprocessing module specifically comprises:
a digitizing unit for digitizing the speech signal to obtain a pulse speech signal;
a sampling unit for sampling the pulse speech signal to obtain a discrete-time, continuous-amplitude pulse speech signal;
a quantizing unit for quantizing the discrete-time, continuous-amplitude pulse speech signal to obtain a discrete-time, discrete-amplitude pulse speech signal;
a pre-emphasis unit for applying pre-emphasis to the discrete-time, discrete-amplitude pulse speech signal to obtain a pre-emphasized speech signal;
a framing-and-windowing unit for framing and windowing the pre-emphasized speech signal to obtain the preprocessed speech signal.
7. The speech emotion recognition system according to claim 5, characterized in that the spectrogram computation module specifically comprises:
a preprocessed speech signal information acquisition unit for obtaining the sampling frequency F_s, the sample data sequence S_g, and the segment length of the preprocessed speech signal;
a preprocessed speech signal segmentation unit for dividing the preprocessed speech signal into N segments according to the segment length and the window length N_new of the window function, obtaining N speech segments;
a frame shift computation unit for computing the frame shift N_sfgtft from the segment length and the N speech segments;
a windowing unit for windowing the i-th speech frame S_i to obtain the windowed speech signal S'_i, where S'_i = S_i × hanning(N_new) and i takes the values 1, 2, …, N;
a Fourier transform unit for applying the Fourier transform to the windowed speech signal S'_i to obtain the Fourier-transformed speech signal Z_i;
a spectrogram acquisition unit for computing, from the phase θ_i of the Fourier-transformed speech signal Z_i, the energy density function |Z_i|^2 of the i-th frame S_i, shifting the window function by N_sfgtft to obtain the energy density function |Z_{i+1}|^2 of the (i+1)-th frame S_{i+1}, assembling a matrix R with [N_new/2]+1 rows and N columns, and mapping the matrix R to a grayscale image to obtain the spectrogram corresponding to the preprocessed speech signal.
8. The speech emotion recognition method according to claim 1, wherein the convolutional neural network module specifically includes:
a convolutional layer unit, for processing the sound spectrograph with the convolutional layer of the convolutional neural network, converting the three-dimensional sound spectrograph into N two-dimensional features;
a pooling layer unit, for processing the two-dimensional feature y_i corresponding to the i-th segment of the output sound spectrograph through the pooling layer to obtain the low-resolution acoustic feature y_i';
a fully connected layer unit, wherein a fully connected layer with an activation function is provided between the convolutional layer and the pooling layer, the fully connected layer being used for data transmission between the convolutional layer and the pooling layer.
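A toy NumPy sketch of the convolution, activation, and pooling flow in the claim above. A single random kernel stands in for the trained network, the sizes are arbitrary, and the fully connected layer the claim places between convolution and pooling is omitted for brevity:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution of a grayscale spectrogram with one kernel."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the strongest response per block."""
    H, W = x.shape
    H, W = H - H % size, W - W % size
    blocks = x[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

# One forward pass: convolution, ReLU activation, then pooling
spec = np.random.rand(65, 64)                # toy grayscale spectrogram
kernel = np.random.randn(3, 3)               # one untrained filter
feat = np.maximum(conv2d(spec, kernel), 0)   # convolution + activation
low_res = max_pool(feat)                     # low-resolution acoustic feature y_i'
```

Pooling halves each spatial dimension, which is what the claim means by a "low-resolution" acoustic feature.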
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910173689.0A CN109767790A (en) | 2019-02-28 | 2019-02-28 | A kind of speech-emotion recognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109767790A true CN109767790A (en) | 2019-05-17 |
Family
ID=66457882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910173689.0A Pending CN109767790A (en) | 2019-02-28 | 2019-02-28 | A kind of speech-emotion recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109767790A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090063202A (en) * | 2009-05-29 | 2009-06-17 | POSTECH Academy-Industry Foundation | Method for apparatus for providing emotion speech recognition |
US20130297297A1 (en) * | 2012-05-07 | 2013-11-07 | Erhan Guven | System and method for classification of emotion in human speech |
CN104021373A (en) * | 2014-05-27 | 2014-09-03 | 江苏大学 | Semi-supervised speech feature variable factor decomposition method |
CN107705806A (en) * | 2017-08-22 | 2018-02-16 | 北京联合大学 | A kind of method for carrying out speech emotion recognition using spectrogram and deep convolutional neural networks |
CN108010514A (en) * | 2017-11-20 | 2018-05-08 | 四川大学 | A kind of method of speech classification based on deep neural network |
CN108597539A (en) * | 2018-02-09 | 2018-09-28 | 桂林电子科技大学 | Speech-emotion recognition method based on parameter migration and sound spectrograph |
CN108899049A (en) * | 2018-05-31 | 2018-11-27 | 中国地质大学(武汉) | A kind of speech-emotion recognition method and system based on convolutional neural networks |
CN109036465A (en) * | 2018-06-28 | 2018-12-18 | 南京邮电大学 | Speech-emotion recognition method |
Non-Patent Citations (5)
Title |
---|
SATHIT PRASOMPHAN: "Improvement of speech emotion recognition with neural network classifier by using speech spectrogram", 《2015 INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING (IWSSIP)》 * |
Zhang Ruofan et al.: "Speech emotion recognition method for the elderly based on sound spectrographs", 《软件导刊》 (Software Guide) * |
Wang Jianwei: "Research and design of an emotion perception system based on deep learning", 《中国优秀硕士论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology) * |
Tian Xiyan et al.: "Speech emotion recognition based on sound spectrographs and convolutional neural networks", 《河南科技学院学报》 (Journal of Henan Institute of Science and Technology) * |
Huang Chenchen et al.: "Research on speech emotion recognition based on deep belief networks", 《计算机研究与发展》 (Journal of Computer Research and Development) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110415728A (en) * | 2019-07-29 | 2019-11-05 | 内蒙古工业大学 | A kind of method and apparatus identifying emotional speech |
CN110415728B (en) * | 2019-07-29 | 2022-04-01 | 内蒙古工业大学 | Method and device for recognizing emotion voice |
CN110534133A (en) * | 2019-08-28 | 2019-12-03 | 珠海亿智电子科技有限公司 | A kind of speech emotion recognition system and speech-emotion recognition method |
CN110534133B (en) * | 2019-08-28 | 2022-03-25 | 珠海亿智电子科技有限公司 | Voice emotion recognition system and voice emotion recognition method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Alim et al. | Some commonly used speech feature extraction algorithms | |
Stanton et al. | Predicting expressive speaking style from text in end-to-end speech synthesis | |
US8676574B2 (en) | Method for tone/intonation recognition using auditory attention cues | |
CN108922513A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN111724770B (en) | Audio keyword identification method for generating confrontation network based on deep convolution | |
CN115641543B (en) | Multi-modal depression emotion recognition method and device | |
CN109767756A (en) | A kind of speech feature extraction algorithm based on dynamic partition inverse discrete cosine transform cepstrum coefficient | |
CN108986798B (en) | Processing method, device and the equipment of voice data | |
CN109036470B (en) | Voice distinguishing method, device, computer equipment and storage medium | |
Rammo et al. | Detecting the speaker language using CNN deep learning algorithm | |
CN109065073A (en) | Speech-emotion recognition method based on depth S VM network model | |
CN112053694A (en) | Voiceprint recognition method based on CNN and GRU network fusion | |
CN114722812A (en) | Method and system for analyzing vulnerability of multi-mode deep learning model | |
Jie et al. | Speech emotion recognition of teachers in classroom teaching | |
CN109767790A (en) | A kind of speech-emotion recognition method and system | |
CN116612541A (en) | Multi-mode emotion recognition method, device and storage medium | |
CN111724809A (en) | Vocoder implementation method and device based on variational self-encoder | |
CN113571095B (en) | Speech emotion recognition method and system based on nested deep neural network | |
CN114898779A (en) | Multi-mode fused speech emotion recognition method and system | |
CN113111786B (en) | Underwater target identification method based on small sample training diagram convolutional network | |
Wang et al. | Speech signal feature parameters extraction algorithm based on PCNN for isolated word recognition | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
CN112712789A (en) | Cross-language audio conversion method and device, computer equipment and storage medium | |
Peng et al. | Multi-scale model for mandarin tone recognition | |
CN116312617A (en) | Voice conversion method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-05-17