CN107945790A - Emotion recognition method and emotion recognition system - Google Patents
Emotion recognition method and emotion recognition system
- Publication number
- CN107945790A CN107945790A CN201810007403.7A CN201810007403A CN107945790A CN 107945790 A CN107945790 A CN 107945790A CN 201810007403 A CN201810007403 A CN 201810007403A CN 107945790 A CN107945790 A CN 107945790A
- Authority
- CN
- China
- Prior art keywords
- feature
- text
- acoustic feature
- speech signal
- current speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The embodiments of the invention disclose an emotion recognition method and an emotion recognition system. The method includes: obtaining a current speech signal; extracting speech features of the current speech signal, the speech features including acoustic features and text features; and identifying, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative. The technical scheme can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and the service level improved.
Description
Technical field
The embodiments of the present invention relate to the field of communication technology, and in particular to an emotion recognition method and an emotion recognition system.
Background technology
In interpersonal communication, language is one of the most natural and important means. The emotion carried in a speaker's speech can strongly influence the mood of the people around them; emotions here include positive and negative. This is especially true for service personnel. In public settings such as buses, nursing homes, or hospitals, if a service person behaves badly, speaks in an arrogant tone, or uses vulgar language (that is, if the emotion is negative), it will adversely affect the people being served and work against social harmony and the improvement of the happiness index.
Through study, the inventors found that there is currently no effective technical means to judge the corresponding emotion of service personnel from their speech, so that they can be supervised and the service level improved.
Summary of the invention
To solve the above technical problem, the embodiments of the present invention provide an emotion recognition method and an emotion recognition system that can identify the corresponding emotion from a speech signal.
In one aspect, an embodiment of the present invention provides an emotion recognition method, including:
obtaining a current speech signal;
extracting speech features of the current speech signal, the speech features including acoustic features and text features;
identifying, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative.
Optionally, before extracting the speech features of the current speech signal, the method further includes: pre-processing the current speech signal.
Optionally, after identifying the emotion type corresponding to the current speech signal, the method further includes: according to the emotion type, activating corresponding preset countermeasures.
Optionally, the acoustic features include: fundamental frequency, duration, energy, and spectrum.
Optionally, identifying the emotion type corresponding to the current speech signal according to the speech features and the predetermined deep model includes:
according to the acoustic features and text features, obtaining acoustic feature information and text feature information for emotion recognition;
according to the acoustic feature information, obtaining K acoustic feature vectors;
according to the K acoustic feature vectors and the text feature information, obtaining K text feature vectors;
according to the K acoustic feature vectors, the K text feature vectors, and the predetermined deep model, identifying the emotion type of the current speech signal.
Optionally, obtaining the acoustic feature information and text feature information for emotion recognition according to the acoustic features and text features includes:
converting the acoustic features and the text features into corresponding vectors;
inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain the acoustic feature information and text feature information for emotion recognition.
Optionally, obtaining K acoustic feature vectors according to the acoustic feature information includes: pooling the acoustic feature information to obtain K acoustic feature vectors.
Obtaining K text feature vectors according to the K acoustic feature vectors and the text feature information includes: focusing the text feature information with a focusing mechanism according to the mean of the K acoustic feature vectors, and pooling the focused text feature information to obtain K text feature vectors.
On the other hand, an embodiment of the present invention also provides an emotion recognition system, including:
a speech acquisition module, configured to obtain a current speech signal;
a feature extraction module, configured to extract speech features of the current speech signal, the speech features including acoustic features and text features;
an emotion recognition module, configured to identify, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative.
Optionally, the system further includes a signal pre-processing module and an activation module.
The signal pre-processing module is configured to pre-process the current speech signal.
The activation module is configured to activate corresponding preset countermeasures according to the emotion type.
Optionally, the emotion recognition module includes:
a first obtaining unit, configured to obtain acoustic feature information and text feature information for emotion recognition according to the acoustic features and text features, specifically: converting the acoustic features and text features into corresponding vectors, and inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain the acoustic feature information and text feature information for emotion recognition; the acoustic features include fundamental frequency, duration, energy, and spectrum;
a second obtaining unit, configured to pool the acoustic feature information to obtain K acoustic feature vectors, and further configured to focus the text feature information with the focusing mechanism according to the mean of the K acoustic feature vectors and pool the focused text feature information to obtain K text feature vectors;
an emotion recognition unit, configured to identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the predetermined deep model.
The embodiments of the present invention provide an emotion recognition method and an emotion recognition system. The method includes: obtaining a current speech signal; extracting speech features of the current speech signal, the speech features including acoustic features and text features; identifying, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative. The technical scheme can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and the service level improved.
Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time. Other features and advantages of the present invention will be set forth in the subsequent embodiments of the specification, will partly become apparent from them, or can be understood by implementing the present invention. The objects and other advantages of the embodiments of the present invention can be realized and obtained by the structures particularly pointed out in the specification, claims, and drawings.
Brief description of the drawings
The drawings are provided for a further understanding of the technical solution of the present invention and constitute a part of the specification; together with the embodiments of this application they serve to explain the technical solution of the present invention, and they do not limit it.
Fig. 1 is a flow chart of the emotion recognition method provided by an embodiment of the present invention;
Fig. 2 is another flow chart of the emotion recognition method provided by an embodiment of the present invention;
Fig. 3 is a flow chart of step 300 provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the emotion recognition system provided by an embodiment of the present invention;
Fig. 5 is another structural diagram of the emotion recognition system provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of the emotion recognition module provided by an embodiment of the present invention.
Embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the drawings. It should be noted that, where there is no conflict, the embodiments in this application and the features in the embodiments may be combined with each other.
To illustrate the technical solutions described in the embodiments of the present invention, specific embodiments are described below.
Embodiment one
Fig. 1 is a flow chart of the emotion recognition method provided by an embodiment of the present invention. As shown in Fig. 1, the method specifically includes the following steps:
Step 100: obtain a current speech signal.
Specifically, in step 100 the speech signal is obtained through a microphone or a microphone array.
Step 200: extract speech features of the current speech signal.
The speech features include acoustic features and text features.
Optionally, the acoustic features include fundamental frequency, duration, energy, and spectrum. The fundamental frequency determines pitch and can be extracted by an autocorrelation algorithm. Duration is related to speaking rate, and the unvoiced (silence) information in the current speech signal is also valuable for emotion recognition; duration features can be extracted by Visual Speech tools. Energy is related to amplitude, and energy and spectral features can be extracted by existing techniques.
Optionally, the text features are the text information in the current speech signal, extracted by a speech recognition technology such as iFLYTEK automatic speech recognition.
Step 300: identify, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal.
The emotion type is one of positive, neutral, and negative. It should be noted that a positive emotion type makes the person being served feel pleasant, a neutral emotion type does not affect their mood, and a negative emotion type makes them feel uncomfortable. The same sentence, for example "you are a fool", might be friendly banter between friends or might be mocking an opponent; the emotion may be positive or may be negative.
It should be noted that the predetermined deep model has been trained extensively on a sample database, so that the accuracy of the identified emotion type is high.
Optionally, the emotion recognition method provided by the embodiment of the present invention can be applied in public settings such as buses, nursing homes, and hospitals.
The emotion recognition method provided by the embodiment of the present invention includes: obtaining a current speech signal; extracting speech features of the current speech signal, the speech features including acoustic features and text features; identifying, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative. The technical scheme can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and the service level improved.
Optionally, Fig. 2 is another flow chart of the emotion recognition method provided by an embodiment of the present invention. As shown in Fig. 2, before step 200 the method further includes:
Step 400: pre-process the current speech signal.
Specifically, the pre-processing in step 400 includes eliminating background noise, enhancing the useful signal, or segmenting the current speech signal. It should be noted that segmenting the current speech signal can be realized by windowing the signal into frames, for example with a 25 ms Hamming window and a 10 ms window shift (i.e., each speech frame lasts 25 ms and the window moves in 10 ms steps).
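The framing described above can be sketched as follows. The helper name `split_frames` and the 16 kHz sample rate (which makes the 25 ms window 400 samples and the 10 ms hop 160 samples) are assumptions for illustration.

```python
import numpy as np

def split_frames(signal, sr=16000, win_ms=25, hop_ms=10):
    """Split a speech signal into overlapping Hamming-windowed frames
    (25 ms window, 10 ms hop, as described in the embodiment)."""
    win = int(sr * win_ms / 1000)    # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)    # 160 samples at 16 kHz
    w = np.hamming(win)
    n = 1 + max(0, (len(signal) - win) // hop)
    return np.stack([signal[i * hop : i * hop + win] * w for i in range(n)])
```

One second of 16 kHz audio yields 1 + (16000 - 400) // 160 = 98 frames of 400 samples each.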
Optionally, after step 300, the method further includes:
Step 500: according to the emotion type, activate corresponding preset countermeasures.
Specifically, step 500 includes: when the emotion type is positive or neutral, encouraging the service person to keep it up; when the emotion type is negative, activating preset countermeasures. The countermeasures include but are not limited to the following: (1) warning prompts, reminding the service person to mind their attitude; optionally, the warning includes a text display, a buzzer, a voice announcement, and so on; (2) collecting the current speech signals corresponding to negative emotions in the cloud, so that the service organization can assess and improve service quality; (3) timed message pushes, pushing the service person's daily service quality information to their mobile phone after work, so that they can fully understand their service performance for the day and further improve their service level.
Optionally, Fig. 3 is a flow chart of step 300 provided by an embodiment of the present invention. As shown in Fig. 3, step 300 includes:
Step 301: according to the acoustic features and text features, obtain acoustic feature information and text feature information for emotion recognition.
Specifically, step 301 includes: converting the acoustic features and text features into corresponding vectors; inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain the acoustic feature information and text feature information for emotion recognition.
Step 302: according to the acoustic feature information, obtain K acoustic feature vectors.
Specifically, step 302 includes: pooling the acoustic feature information to obtain K acoustic feature vectors.
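The patent does not specify the pooling operator used in step 302; one common choice, sketched below under that assumption, is to split the frame-level feature sequence into K equal segments and max-pool each segment. The helper name `pool_to_k` is hypothetical.

```python
import numpy as np

def pool_to_k(features, k):
    """Pool a (T, d) sequence of frame-level feature vectors into k
    fixed vectors: chunk the time axis into k segments, max-pool each."""
    t = features.shape[0]
    bounds = np.linspace(0, t, k + 1).astype(int)   # segment boundaries
    return np.stack([features[bounds[i]:bounds[i + 1]].max(axis=0)
                     for i in range(k)])
```

For example, pooling a 6-frame sequence with k = 3 max-pools frames {0,1}, {2,3}, and {4,5}.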
Step 303: according to the K acoustic feature vectors and the text feature information, obtain K text feature vectors.
Specifically, step 303 includes: focusing the text feature information with the focusing mechanism according to the mean of the K acoustic feature vectors; pooling the focused text feature information to obtain K text feature vectors.
It should be noted that the focusing mechanism assigns different weights to different text, for example assigning higher weights to rude words, which influences the emotion judgment. Put simply: if the features output by the convolutional network indicate that the current speaker's attitude is very rude, the focusing mechanism of the convolutional neural network assigns higher weights to rude words (such as "wretch" or "fool"); if the features indicate that the current speaker's attitude is gentle, the focusing mechanism does not assign higher weights to such words.
Specifically, the focusing mechanism over the text feature information works as follows: weights are assigned to the text feature information, where the weights are determined according to the K acoustic feature vectors.
In particular, at time t let the text feature information be h_a(t) and the acoustic feature information be O_q. Under the action of the focusing mechanism, each text feature is transformed as
m_{a,q}(t) = tanh(W_am h_a(t) + W_qm O_q)
s_{a,q}(t) = exp(W_ms m_{a,q}(t)) / Σ_τ exp(W_ms m_{a,q}(τ))
h̃_a = Σ_t s_{a,q}(t) h_a(t)
where W_am, W_qm, and W_ms are the focusing parameters, s_{a,q}(t) is the weight, and h̃_a is the text feature information after focusing.
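A minimal sketch of this focusing computation: each text feature h_a(t) is scored against the acoustic summary O_q through the parameters W_am, W_qm, and W_ms, and the normalized weights s(t) form a weighted sum of the text features. The function name `focus_text`, the parameter shapes, and the softmax normalization of the weights are assumptions; the patent names the weights s_{a,q}(t) but does not spell out their normalization.

```python
import numpy as np

def focus_text(h_a, o_q, W_am, W_qm, w_ms):
    """Focusing (attention) over text features h_a of shape (T, d),
    guided by a mean acoustic vector o_q of shape (da,)."""
    m = np.tanh(h_a @ W_am.T + o_q @ W_qm.T)   # (T, dm): m(t) = tanh(...)
    scores = m @ w_ms                           # (T,): W_ms . m(t)
    s = np.exp(scores - scores.max())
    s /= s.sum()                                # softmax attention weights
    return s @ h_a                              # focused text feature, (d,)
```

When the scoring vector w_ms is zero, the weights are uniform and the result reduces to the mean of the text features, a useful sanity check.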
Step 304: according to the K acoustic feature vectors, the K text feature vectors, and the predetermined deep model, identify the emotion type of the current speech signal.
Specifically, step 304 includes: applying logistic regression to the K acoustic feature vectors and the K text feature vectors, and identifying the emotion type of the current speech signal according to the K acoustic feature vectors and K text feature vectors after logistic regression and the deep model.
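The final classification step can be sketched as a multinomial logistic regression (softmax) over the concatenated feature vectors. This is only an illustration of the shape of such a classifier, not the patent's deep model: the helper name `classify` is hypothetical, and the weights W and b are assumed to come from training.

```python
import numpy as np

EMOTIONS = ["positive", "neutral", "negative"]

def classify(acoustic_vecs, text_vecs, W, b):
    """Concatenate the K acoustic and K text feature vectors and apply a
    softmax (multinomial logistic regression) layer over the three
    emotion types. W has shape (3, n_features); b has shape (3,)."""
    x = np.concatenate([np.ravel(acoustic_vecs), np.ravel(text_vecs)])
    logits = W @ x + b
    p = np.exp(logits - logits.max())
    p /= p.sum()                       # class probabilities
    return EMOTIONS[int(np.argmax(p))], p
```

In a real system W and b would be learned jointly with the CNN and focusing layers rather than set by hand.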
The operating principle of the embodiment of the present invention is illustrated below: obtain a current speech signal through a microphone or microphone array; pre-process the current speech signal; extract the acoustic features of the current speech signal and extract its text features through speech recognition technology; convert the acoustic features and text features into corresponding vectors; input the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain acoustic feature information and text feature information for emotion recognition; pool the acoustic feature information to obtain K acoustic feature vectors; focus the text feature information with the focusing mechanism according to the mean of the K acoustic feature vectors; pool the focused text feature information to obtain K text feature vectors; apply logistic regression to the K acoustic feature vectors and K text feature vectors, and identify the emotion type of the current speech signal according to the vectors after logistic regression and the deep model; according to the emotion type, activate corresponding preset countermeasures.
Embodiment two
Based on the inventive concept of the above embodiment, Fig. 4 is a structural diagram of the emotion recognition system provided by an embodiment of the present invention. As shown in Fig. 4, the system includes: a speech acquisition module 10, a feature extraction module 20, and an emotion recognition module 30.
In this embodiment, the speech acquisition module 10 is configured to obtain a current speech signal; the feature extraction module 20 is configured to extract speech features of the current speech signal; and the emotion recognition module 30 is configured to identify, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal.
Optionally, the acoustic features include fundamental frequency, duration, energy, and spectrum. The fundamental frequency determines pitch and can be extracted by an autocorrelation algorithm. Duration is related to speaking rate, and the unvoiced information in the current speech signal is also valuable for emotion recognition; duration features can be extracted by Visual Speech tools. Energy is related to amplitude, and energy and spectral features can be extracted by existing techniques.
Optionally, the text features are the text information in the current speech signal, extracted by a speech recognition technology such as iFLYTEK automatic speech recognition.
The emotion type is one of positive, neutral, and negative. It should be noted that a positive emotion type makes the person being served feel pleasant, a neutral emotion type does not affect their mood, and a negative emotion type makes them feel uncomfortable. The same sentence, for example "you are a fool", might be friendly banter between friends or might be mocking an opponent; the emotion may be positive or may be negative.
Optionally, the emotion recognition system provided by the embodiment of the present invention can be applied in public settings such as buses, nursing homes, and hospitals.
The emotion recognition system provided by the embodiment of the present invention includes: a speech acquisition module, configured to obtain a current speech signal; a feature extraction module, configured to extract speech features of the current speech signal, the speech features including acoustic features and text features; and an emotion recognition module, configured to identify, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative. The technical scheme can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and the service level improved.
Optionally, Fig. 5 is another structural diagram of the emotion recognition system provided by an embodiment of the present invention. As shown in Fig. 5, the system further includes: a signal pre-processing module 40 and an activation module 50.
The signal pre-processing module 40 is configured to pre-process the current speech signal. Specifically, the pre-processing includes eliminating background noise, enhancing the useful signal, or segmenting the current speech signal. It should be noted that segmenting the current speech signal can be realized by windowing the signal into frames, for example with a 25 ms Hamming window and a 10 ms window shift (i.e., each speech frame lasts 25 ms and the window moves in 10 ms steps).
The activation module 50 is configured to activate corresponding preset countermeasures according to the emotion type. Specifically, when the emotion type is positive or neutral, the activation module 50 encourages the service person to keep it up; when the emotion type is negative, it activates preset countermeasures. The countermeasures include but are not limited to the following: (1) warning prompts, reminding the service person to mind their attitude; optionally, the warning includes a text display, a buzzer, a voice announcement, and so on; (2) collecting the current speech signals corresponding to negative emotions in the cloud, so that the service organization can assess and improve service quality; (3) timed message pushes, pushing the service person's daily service quality information to their mobile phone after work, so that they can fully understand their service performance for the day and further improve their service level.
Optionally, Fig. 6 is a structural diagram of the emotion recognition module provided by an embodiment of the present invention. As shown in Fig. 6, the emotion recognition module includes: a first obtaining unit 31, a second obtaining unit 32, and an emotion recognition unit 33.
The first obtaining unit 31 is configured to obtain acoustic feature information and text feature information for emotion recognition according to the acoustic features and text features, specifically: converting the acoustic features and text features into corresponding vectors, and inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain the acoustic feature information and text feature information for emotion recognition; the acoustic features include fundamental frequency, duration, energy, and spectrum.
The second obtaining unit 32 is configured to pool the acoustic feature information to obtain K acoustic feature vectors, and is further configured to focus the text feature information with the focusing mechanism according to the mean of the K acoustic feature vectors and pool the focused text feature information to obtain K text feature vectors.
The emotion recognition unit 33 is configured to identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the predetermined deep model.
Those skilled in the art will appreciate that the modules or units included in embodiment two above are merely divided according to functional logic, and the division is not limited thereto as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for convenience of mutual distinction and are not intended to limit the protection scope of the present invention.
Those of ordinary skill in the art will further appreciate that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, the storage medium including ROM/RAM, magnetic disks, optical discs, and so on.
Although the embodiments are disclosed above, the content is only an embodiment adopted for ease of understanding the present invention and is not intended to limit it. Any person skilled in the art of the present invention may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the patent protection scope of the present invention shall still be subject to the scope defined by the appended claims.
Claims (10)
- A kind of 1. emotion identification method, it is characterised in that including:Obtain current speech signal;The phonetic feature of current speech signal is extracted, the phonetic feature includes:Acoustic feature and text feature;According to the phonetic feature and predetermined depth model, the corresponding affective style of the current speech signal, the feelings are identified Sense type includes:It is positive, neutral and negative.
- 2. according to the method described in claim 1, it is characterized in that, it is described extraction current speech signal phonetic feature before, The method further includes:The current speech signal is pre-processed.
- 3. method according to claim 1 or 2, it is characterised in that described to identify the corresponding feelings of the current speech signal After feeling type, the method further includes:According to the affective style, corresponding default counte-rplan are activated.
- 4. according to the method described in claim 1, it is characterized in that, the acoustic feature includes:Fundamental frequency, duration, energy and frequency Spectrum.
- 5. according to the method described in claim 1, it is characterized in that, described according to the phonetic feature and predetermined depth model, Identify that the corresponding affective style of the current speech signal includes:According to acoustic feature and text feature, the acoustic feature information and text feature information for emotion recognition are obtained;According to the acoustic feature information, K acoustic feature vector is obtained;According to K acoustic feature vector sum text feature information, K Text eigenvector is obtained;According to K acoustic feature vector, K Text eigenvector and predetermined depth model, the emotion of current speech signal is identified Type.
- 6. according to the method described in claim 5, it is characterized in that, described according to acoustic feature and text feature, it is used for The acoustic feature information and text feature information of emotion recognition include:Acoustic feature and text feature are separately converted to corresponding vector;The corresponding vector of the corresponding vector sum text feature of acoustic feature is inputted into convolutional neural networks respectively, acquisition is used for emotion The acoustic feature information and text feature information of identification.
- 7. the method according to claim 5 or 6, it is characterised in that it is described according to the acoustic feature information, obtain K Acoustic feature vector includes:By the acoustic feature information pool, K acoustic feature vector is obtained;It is described to be included according to K acoustic feature vector sum text feature information, K Text eigenvector of acquisition:Text feature information is focused on using focus mechanism according to the average of K acoustic feature vector;By the text feature information pool after focusing, K Text eigenvector is obtained.
- 8. An emotion recognition system, comprising: a voice acquisition module configured to obtain a current speech signal; a feature extraction module configured to extract speech features of the current speech signal, the speech features comprising acoustic features and text features; and an emotion recognition module configured to identify the emotion type corresponding to the current speech signal according to the speech features and a predetermined depth model, the emotion type comprising: positive, neutral, and negative.
- 9. The system according to claim 8, further comprising a signal pre-processing module and an activation module; the signal pre-processing module is configured to pre-process the current speech signal; and the activation module is configured to activate corresponding preset countermeasures according to the emotion type.
- 10. The system according to claim 8, wherein the emotion recognition module comprises: a first obtaining unit configured to obtain acoustic feature information and text feature information for emotion recognition according to the acoustic features and the text features, specifically: converting the acoustic features and the text features into corresponding vectors respectively, and inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively to obtain the acoustic feature information and text feature information for emotion recognition, the acoustic features comprising: fundamental frequency, duration, energy, and spectrum; a second obtaining unit configured to obtain K acoustic feature vectors according to the acoustic feature information, specifically: pooling the acoustic feature information to obtain the K acoustic feature vectors; and further configured to obtain K text feature vectors according to the K acoustic feature vectors and the text feature information, specifically: focusing the text feature information with an attention mechanism according to the average of the K acoustic feature vectors, and pooling the focused text feature information to obtain the K text feature vectors; and an emotion recognition unit configured to identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the predetermined depth model.
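The pipeline recited in claims 5–7 (pool acoustic features into K vectors, use their average to focus the text features with an attention mechanism, pool again, then classify) can be sketched as follows. This is a minimal NumPy illustration, not the patent's actual model: the chunked mean pooling, the dot-product softmax "focus", the linear classifier, and all shapes and names (`pool_to_k`, `focus`, `recognize`, K = 4, D = 8) are illustrative assumptions.

```python
import numpy as np

def pool_to_k(feats: np.ndarray, k: int) -> np.ndarray:
    """Pool a (T, D) feature matrix into K vectors by chunked mean pooling."""
    chunks = np.array_split(feats, k, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])   # (K, D)

def focus(text_feats: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Reweight text features with softmax attention driven by `query`
    (here, the average of the K acoustic feature vectors)."""
    scores = text_feats @ query                         # (T_t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return text_feats * weights[:, None]                # (T_t, D)

def recognize(acoustic: np.ndarray, text: np.ndarray, k: int,
              w: np.ndarray, b: np.ndarray) -> int:
    """Return a class index 0/1/2 standing in for positive/neutral/negative."""
    a_vecs = pool_to_k(acoustic, k)                     # K acoustic vectors
    query = a_vecs.mean(axis=0)                         # average of the K vectors
    t_vecs = pool_to_k(focus(text, query), k)           # K text vectors
    fused = np.concatenate([a_vecs, t_vecs]).ravel()    # (2*K*D,)
    logits = w @ fused + b                              # (3,) class scores
    return int(np.argmax(logits))

# Toy inputs standing in for CNN outputs over frame-level acoustic features
# (F0, duration, energy, spectrum) and word-level text features.
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(50, 8))
text = rng.normal(size=(12, 8))
k, d = 4, 8
w = rng.normal(size=(3, 2 * k * d))   # stand-in for the trained depth model
b = np.zeros(3)
label = recognize(acoustic, text, k, w, b)
```

In the patent, the final step is a predetermined depth model rather than the single linear layer used here; the sketch only shows how the K acoustic vectors gate which parts of the text feature sequence survive pooling.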
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810007403.7A CN107945790B (en) | 2018-01-03 | 2018-01-03 | Emotion recognition method and emotion recognition system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810007403.7A CN107945790B (en) | 2018-01-03 | 2018-01-03 | Emotion recognition method and emotion recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107945790A true CN107945790A (en) | 2018-04-20 |
CN107945790B CN107945790B (en) | 2021-01-26 |
Family
ID=61938328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810007403.7A Active CN107945790B (en) | 2018-01-03 | 2018-01-03 | Emotion recognition method and emotion recognition system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107945790B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108833722A (en) * | 2018-05-29 | 2018-11-16 | 平安科技(深圳)有限公司 | Speech recognition method, device, computer equipment and storage medium |
CN109192225A (en) * | 2018-09-28 | 2019-01-11 | 清华大学 | Method and device for speech emotion recognition and annotation |
CN109243490A (en) * | 2018-10-11 | 2019-01-18 | 平安科技(深圳)有限公司 | Driver's Emotion identification method and terminal device |
CN109410986A (en) * | 2018-11-21 | 2019-03-01 | 咪咕数字传媒有限公司 | A kind of Emotion identification method, apparatus and storage medium |
CN109741732A (en) * | 2018-08-30 | 2019-05-10 | 京东方科技集团股份有限公司 | Named entity recognition method, named entity recognition device, equipment and medium |
CN110047517A (en) * | 2019-04-24 | 2019-07-23 | 京东方科技集团股份有限公司 | Speech-emotion recognition method, answering method and computer equipment |
CN110473571A (en) * | 2019-07-26 | 2019-11-19 | 北京影谱科技股份有限公司 | Emotion identification method and device based on short video speech |
CN110600033A (en) * | 2019-08-26 | 2019-12-20 | 北京大米科技有限公司 | Learning condition evaluation method and device, storage medium and electronic equipment |
CN110660412A (en) * | 2018-06-28 | 2020-01-07 | Tcl集团股份有限公司 | Emotion guiding method and device and terminal equipment |
CN110728983A (en) * | 2018-07-16 | 2020-01-24 | 科大讯飞股份有限公司 | Information display method, device, equipment and readable storage medium |
CN111128189A (en) * | 2019-12-30 | 2020-05-08 | 秒针信息技术有限公司 | Warning information prompting method and device |
CN111354361A (en) * | 2018-12-21 | 2020-06-30 | 深圳市优必选科技有限公司 | Emotion communication method and system and robot |
US11810596B2 (en) | 2021-08-16 | 2023-11-07 | Hong Kong Applied Science and Technology Research Institute Company Limited | Apparatus and method for speech-emotion recognition with quantified emotional states |
CN110728983B (en) * | 2018-07-16 | 2024-04-30 | 科大讯飞股份有限公司 | Information display method, device, equipment and readable storage medium |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1391876A1 (en) * | 2002-08-14 | 2004-02-25 | Sony International (Europe) GmbH | Method of determining phonemes in spoken utterances suitable for recognizing emotions using voice quality features |
EP1429314A1 (en) * | 2002-12-13 | 2004-06-16 | Sony International (Europe) GmbH | Correction of energy as input feature for speech processing |
JP2005283647A (en) * | 2004-03-26 | 2005-10-13 | Matsushita Electric Ind Co Ltd | Feeling recognition device |
CN101894550A (en) * | 2010-07-19 | 2010-11-24 | 东南大学 | Speech emotion classifying method for emotion-based characteristic optimization |
US20130080169A1 (en) * | 2011-09-27 | 2013-03-28 | Fuji Xerox Co., Ltd. | Audio analysis system, audio analysis apparatus, audio analysis terminal |
CN104050965A (en) * | 2013-09-02 | 2014-09-17 | 广东外语外贸大学 | English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof |
KR20160116586A (en) * | 2015-03-30 | 2016-10-10 | 한국전자통신연구원 | Method and apparatus for emotion recognition |
CN106297826A (en) * | 2016-08-18 | 2017-01-04 | 竹间智能科技(上海)有限公司 | Speech emotional identification system and method |
WO2017048730A1 (en) * | 2015-09-14 | 2017-03-23 | Cogito Corporation | Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing devices |
US20170140757A1 (en) * | 2011-04-22 | 2017-05-18 | Angel A. Penilla | Methods and vehicles for processing voice commands and moderating vehicle response |
CN106782615A (en) * | 2016-12-20 | 2017-05-31 | 科大讯飞股份有限公司 | Speech data emotion detection method and apparatus and system |
CN107112006A (en) * | 2014-10-02 | 2017-08-29 | 微软技术许可有限责任公司 | Neural network based speech processing |
JP6213476B2 (en) * | 2012-10-31 | 2017-10-18 | 日本電気株式会社 | Dissatisfied conversation determination device and dissatisfied conversation determination method |
CN107516511A (en) * | 2016-06-13 | 2017-12-26 | 微软技术许可有限责任公司 | Text-to-speech learning system for intention recognition and emotion |
- 2018
- 2018-01-03 CN CN201810007403.7A patent/CN107945790B/en active Active
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1391876A1 (en) * | 2002-08-14 | 2004-02-25 | Sony International (Europe) GmbH | Method of determining phonemes in spoken utterances suitable for recognizing emotions using voice quality features |
EP1429314A1 (en) * | 2002-12-13 | 2004-06-16 | Sony International (Europe) GmbH | Correction of energy as input feature for speech processing |
JP2005283647A (en) * | 2004-03-26 | 2005-10-13 | Matsushita Electric Ind Co Ltd | Feeling recognition device |
CN101894550A (en) * | 2010-07-19 | 2010-11-24 | 东南大学 | Speech emotion classifying method for emotion-based characteristic optimization |
US20170140757A1 (en) * | 2011-04-22 | 2017-05-18 | Angel A. Penilla | Methods and vehicles for processing voice commands and moderating vehicle response |
US20130080169A1 (en) * | 2011-09-27 | 2013-03-28 | Fuji Xerox Co., Ltd. | Audio analysis system, audio analysis apparatus, audio analysis terminal |
JP6213476B2 (en) * | 2012-10-31 | 2017-10-18 | 日本電気株式会社 | Dissatisfied conversation determination device and dissatisfied conversation determination method |
CN104050965A (en) * | 2013-09-02 | 2014-09-17 | 广东外语外贸大学 | English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof |
CN107112006A (en) * | 2014-10-02 | 2017-08-29 | 微软技术许可有限责任公司 | Neural network based speech processing |
KR20160116586A (en) * | 2015-03-30 | 2016-10-10 | 한국전자통신연구원 | Method and apparatus for emotion recognition |
WO2017048730A1 (en) * | 2015-09-14 | 2017-03-23 | Cogito Corporation | Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing devices |
CN107516511A (en) * | 2016-06-13 | 2017-12-26 | 微软技术许可有限责任公司 | Text-to-speech learning system for intention recognition and emotion |
CN106297826A (en) * | 2016-08-18 | 2017-01-04 | 竹间智能科技(上海)有限公司 | Speech emotional identification system and method |
CN106782615A (en) * | 2016-12-20 | 2017-05-31 | 科大讯飞股份有限公司 | Speech data emotion detection method and apparatus and system |
Non-Patent Citations (3)
Title |
---|
DAVID GRIOL: "Combining speech-based and linguistic classifiers to recognize emotion in user spoken utterances", Neurocomputing *
ZHU CONGXIAN: "Research on speech emotion recognition methods based on deep learning", China Master's Theses Full-text Database, Information Science and Technology Series *
LI CHENGCHENG: "Research on text-speech coupled emotion recognition methods based on deep learning", China Master's Theses Full-text Database *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108833722B (en) * | 2018-05-29 | 2021-05-11 | 平安科技(深圳)有限公司 | Speech recognition method, speech recognition device, computer equipment and storage medium |
CN108833722A (en) * | 2018-05-29 | 2018-11-16 | 平安科技(深圳)有限公司 | Speech recognition method, device, computer equipment and storage medium |
CN110660412A (en) * | 2018-06-28 | 2020-01-07 | Tcl集团股份有限公司 | Emotion guiding method and device and terminal equipment |
CN110728983B (en) * | 2018-07-16 | 2024-04-30 | 科大讯飞股份有限公司 | Information display method, device, equipment and readable storage medium |
CN110728983A (en) * | 2018-07-16 | 2020-01-24 | 科大讯飞股份有限公司 | Information display method, device, equipment and readable storage medium |
CN109741732A (en) * | 2018-08-30 | 2019-05-10 | 京东方科技集团股份有限公司 | Named entity recognition method, named entity recognition device, equipment and medium |
CN109741732B (en) * | 2018-08-30 | 2022-06-21 | 京东方科技集团股份有限公司 | Named entity recognition method, named entity recognition device, equipment and medium |
WO2020043123A1 (en) * | 2018-08-30 | 2020-03-05 | 京东方科技集团股份有限公司 | Named-entity recognition method, named-entity recognition apparatus and device, and medium |
CN109192225A (en) * | 2018-09-28 | 2019-01-11 | 清华大学 | Method and device for speech emotion recognition and annotation |
CN109243490A (en) * | 2018-10-11 | 2019-01-18 | 平安科技(深圳)有限公司 | Driver's Emotion identification method and terminal device |
CN109410986A (en) * | 2018-11-21 | 2019-03-01 | 咪咕数字传媒有限公司 | A kind of Emotion identification method, apparatus and storage medium |
CN109410986B (en) * | 2018-11-21 | 2021-08-06 | 咪咕数字传媒有限公司 | Emotion recognition method and device and storage medium |
CN111354361A (en) * | 2018-12-21 | 2020-06-30 | 深圳市优必选科技有限公司 | Emotion communication method and system and robot |
CN110047517A (en) * | 2019-04-24 | 2019-07-23 | 京东方科技集团股份有限公司 | Speech-emotion recognition method, answering method and computer equipment |
CN110473571A (en) * | 2019-07-26 | 2019-11-19 | 北京影谱科技股份有限公司 | Emotion identification method and device based on short video speech |
CN110600033B (en) * | 2019-08-26 | 2022-04-05 | 北京大米科技有限公司 | Learning condition evaluation method and device, storage medium and electronic equipment |
CN110600033A (en) * | 2019-08-26 | 2019-12-20 | 北京大米科技有限公司 | Learning condition evaluation method and device, storage medium and electronic equipment |
CN111128189A (en) * | 2019-12-30 | 2020-05-08 | 秒针信息技术有限公司 | Warning information prompting method and device |
US11810596B2 (en) | 2021-08-16 | 2023-11-07 | Hong Kong Applied Science and Technology Research Institute Company Limited | Apparatus and method for speech-emotion recognition with quantified emotional states |
Also Published As
Publication number | Publication date |
---|---|
CN107945790B (en) | 2021-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107945790A (en) | A kind of emotion identification method and emotion recognition system | |
CN108564942B (en) | Voice emotion recognition method and system based on adjustable sensitivity | |
CN105096941B (en) | Audio recognition method and device | |
CN109256136B (en) | Voice recognition method and device | |
Nwe et al. | Speech based emotion classification | |
CN102723078B (en) | Emotion speech recognition method based on natural language comprehension | |
CN106504768B (en) | Phone testing audio frequency classification method and device based on artificial intelligence | |
CN108806667A (en) | Method for synchronous recognition of speech and emotion based on neural network |
CN110265040A (en) | Training method, device, storage medium and the electronic equipment of sound-groove model | |
CN107492382A (en) | Voiceprint extraction method and device based on neural network |
CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
CN104538043A (en) | Real-time emotion reminder for call | |
CN111260761B (en) | Method and device for generating mouth shape of animation character | |
Fan et al. | End-to-end post-filter for speech separation with deep attention fusion features | |
Samantaray et al. | A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages | |
CN111144367B (en) | Auxiliary semantic recognition method based on gesture recognition | |
CN108711429A (en) | Electronic equipment and apparatus control method | |
EP1280137B1 (en) | Method for speaker identification | |
CN109599094A (en) | The method of sound beauty and emotion modification | |
CN106653002A (en) | Literal live broadcasting method and platform | |
CN106297769B (en) | A kind of distinctive feature extracting method applied to languages identification | |
Sinha et al. | Acoustic-phonetic feature based dialect identification in Hindi Speech | |
CN110246518A (en) | Speech-emotion recognition method, device, system and storage medium based on more granularity sound state fusion features | |
CN114283820A (en) | Multi-character voice interaction method, electronic equipment and storage medium | |
Luong et al. | LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||