CN107945790A - Emotion recognition method and emotion recognition system - Google Patents

Emotion recognition method and emotion recognition system

Info

Publication number
CN107945790A
Authority
CN
China
Prior art keywords
feature, text, acoustic feature, speech signal, current speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810007403.7A
Other languages
Chinese (zh)
Other versions
CN107945790B (en)
Inventor
王雪云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BOE Technology Group Co Ltd
Original Assignee
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co Ltd
Priority to CN201810007403.7A
Publication of CN107945790A
Application granted
Publication of CN107945790B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

Embodiments of the present invention disclose an emotion recognition method and an emotion recognition system. The method includes: obtaining a current speech signal; extracting speech features of the current speech signal, the speech features including acoustic features and text features; and identifying, according to the speech features and a pre-set deep model, the emotion type corresponding to the current speech signal, the emotion type being positive, neutral, or negative. The technical solution of the present invention can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and service quality improved.

Description

Emotion recognition method and emotion recognition system
Technical field
Embodiments of the present invention relate to the field of communication technology, and in particular to an emotion recognition method and an emotion recognition system.
Background technology
In interpersonal communication, language is one of the most natural and important means. The emotion carried in a speaker's speech can strongly influence the mood of the people around them; emotions include positive and negative. This matters especially for service personnel: in public settings such as buses, nursing homes, and hospitals, if a service worker behaves badly, speaks arrogantly, and uses vulgar language, that is, if the emotion is negative, it will upset the people being served, harm social harmony, and lower the happiness index.
Through research, the inventor has found that there is currently no effective technical means of judging the corresponding emotion of service personnel from their speech, so as to supervise them and improve service quality.
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide an emotion recognition method and an emotion recognition system that can identify the corresponding emotion from a speech signal.
In one aspect, an embodiment of the present invention provides an emotion recognition method, including:
obtaining a current speech signal;
extracting speech features of the current speech signal, the speech features including: acoustic features and text features;
identifying, according to the speech features and a pre-set deep model, the emotion type corresponding to the current speech signal, the emotion type including: positive, neutral, and negative.
Optionally, before extracting the speech features of the current speech signal, the method further includes:
pre-processing the current speech signal.
Optionally, after identifying the emotion type corresponding to the current speech signal, the method further includes:
activating, according to the emotion type, the corresponding pre-set countermeasures.
Optionally, the acoustic features include: fundamental frequency, duration, energy, and spectrum.
Optionally, identifying, according to the speech features and the pre-set deep model, the emotion type corresponding to the current speech signal includes:
obtaining, according to the acoustic features and the text features, acoustic feature information and text feature information for emotion recognition;
obtaining K acoustic feature vectors according to the acoustic feature information;
obtaining K text feature vectors according to the K acoustic feature vectors and the text feature information;
identifying the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the pre-set deep model.
Optionally, obtaining, according to the acoustic features and the text features, the acoustic feature information and text feature information for emotion recognition includes:
converting the acoustic features and the text features into corresponding vectors, respectively;
inputting the vectors corresponding to the acoustic features and to the text features into convolutional neural networks, respectively, to obtain the acoustic feature information and text feature information for emotion recognition.
Optionally, obtaining K acoustic feature vectors according to the acoustic feature information includes:
pooling the acoustic feature information to obtain the K acoustic feature vectors;
and obtaining K text feature vectors according to the K acoustic feature vectors and the text feature information includes:
focusing on the text feature information with a focus mechanism according to the mean of the K acoustic feature vectors;
pooling the focused text feature information to obtain the K text feature vectors.
In another aspect, an embodiment of the present invention further provides an emotion recognition system, including:
a speech acquisition module, configured to obtain a current speech signal;
a feature extraction module, configured to extract speech features of the current speech signal, the speech features including: acoustic features and text features;
an emotion recognition module, configured to identify, according to the speech features and a pre-set deep model, the emotion type corresponding to the current speech signal, the emotion type including: positive, neutral, and negative.
Optionally, the system further includes: a signal pre-processing module and an activation module;
the signal pre-processing module, configured to pre-process the current speech signal;
the activation module, configured to activate, according to the emotion type, the corresponding pre-set countermeasures.
Optionally, the emotion recognition module includes:
a first obtaining unit, configured to obtain, according to the acoustic features and the text features, acoustic feature information and text feature information for emotion recognition, specifically: converting the acoustic features and the text features into corresponding vectors, respectively; inputting the vectors corresponding to the acoustic features and to the text features into convolutional neural networks, respectively, to obtain the acoustic feature information and text feature information for emotion recognition; the acoustic features including: fundamental frequency, duration, energy, and spectrum;
a second obtaining unit, configured to obtain K acoustic feature vectors according to the acoustic feature information, specifically: pooling the acoustic feature information to obtain the K acoustic feature vectors; and further configured to obtain K text feature vectors according to the K acoustic feature vectors and the text feature information, specifically: focusing on the text feature information with a focus mechanism according to the mean of the K acoustic feature vectors; pooling the focused text feature information to obtain the K text feature vectors;
an emotion recognition unit, configured to identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the pre-set deep model.
Embodiments of the present invention provide an emotion recognition method and an emotion recognition system. The method includes: obtaining a current speech signal; extracting speech features of the current speech signal, the speech features including: acoustic features and text features; and identifying, according to the speech features and a pre-set deep model, the emotion type corresponding to the current speech signal, the emotion type including: positive, neutral, and negative. The technical solution of the present invention can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and service quality improved.
Of course, implementing any product or method of the present invention does not necessarily require all of the above advantages to be achieved at the same time. Other features and advantages of the present invention will be set forth in the subsequent embodiments of the specification, and will in part become apparent from the specification or be understood by practicing the present invention. The objectives and other advantages of the embodiments of the present invention can be realized and obtained by the structures particularly pointed out in the specification, the claims, and the drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the technical solution of the present invention and constitute a part of the specification. Together with the embodiments of this application, they serve to explain the technical solution of the present invention and do not limit it.
Fig. 1 is a flowchart of the emotion recognition method provided by an embodiment of the present invention;
Fig. 2 is another flowchart of the emotion recognition method provided by an embodiment of the present invention;
Fig. 3 is a flowchart of step 300 provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the emotion recognition system provided by an embodiment of the present invention;
Fig. 5 is another structural diagram of the emotion recognition system provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of the emotion recognition module provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments can be combined with one another.
To illustrate the technical solutions described in the embodiments of the present invention, specific embodiments are described below.
Embodiment one
Fig. 1 is a flowchart of the emotion recognition method provided by an embodiment of the present invention. As shown in Fig. 1, the emotion recognition method provided by the embodiment of the present invention specifically includes the following steps:
Step 100: obtain a current speech signal.
Specifically, in step 100 the speech signal is obtained through a microphone or a microphone array.
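As a concrete illustration of step 100, the following is a minimal sketch of signal acquisition, assuming the third-party Python library sounddevice and a 16 kHz sampling rate (neither is specified by the patent):

```python
import sounddevice as sd  # assumed capture library; the patent names no specific API

SAMPLE_RATE = 16000  # 16 kHz is a common rate for speech; the patent does not fix one

def acquire_speech(seconds: float = 3.0):
    """Step 100: record a mono current speech signal from the default microphone."""
    frames = int(seconds * SAMPLE_RATE)
    signal = sd.rec(frames, samplerate=SAMPLE_RATE, channels=1, dtype="float32")
    sd.wait()  # block until the recording is finished
    return signal[:, 0]  # flatten (frames, 1) to a 1-D array
```

For a microphone array, the same call with channels greater than 1 would return one column per microphone.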
Step 200: extract the speech features of the current speech signal.
The speech features include: acoustic features and text features.
Optionally, the acoustic features include: fundamental frequency, duration, energy, and spectrum. The fundamental frequency determines the pitch, and the fundamental frequency feature is extracted by an autocorrelation algorithm. Duration is related to speech rate, and the silence information in the current speech signal is also valuable for emotion recognition; the duration features are extracted with Visual Speech tools. Energy is related to amplitude, and the energy feature and the spectrum feature can be extracted by existing techniques.
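To make these acoustic features concrete, below is a minimal NumPy sketch of per-frame energy, autocorrelation-based fundamental frequency, and magnitude-spectrum extraction. The 16 kHz rate and the 50-400 Hz pitch search range are assumptions, not values from the patent:

```python
import numpy as np

def frame_energy(frame: np.ndarray) -> float:
    """Short-time energy: the sum of squared samples (energy relates to amplitude)."""
    return float(np.sum(frame ** 2))

def fundamental_frequency(frame: np.ndarray, fs: int = 16000,
                          fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Estimate F0 of a voiced frame by the autocorrelation method: the
    autocorrelation peak inside the plausible pitch-lag range gives the
    pitch period, and F0 is its reciprocal."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return fs / lag

def magnitude_spectrum(frame: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of one frame via the real FFT."""
    return np.abs(np.fft.rfft(frame))
```

Note that the frame must be longer than the maximum pitch lag (320 samples at 16 kHz for a 50 Hz floor); the 25 ms frames described in the pre-processing step below (400 samples at 16 kHz) satisfy this.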
Optionally, the text features are the text information in the current speech signal; the text features are extracted by a speech recognition technique, such as iFLYTEK's automatic speech recognition.
Step 300: identify, according to the speech features and the pre-set deep model, the emotion type corresponding to the current speech signal.
The emotion types include: positive, neutral, and negative. It should be noted that a positive emotion type pleases the person being served, a neutral emotion type does not affect the mood of the person being served, and a negative emotion type makes the person being served feel uncomfortable. The same sentence, for example "you are a fool", may be one person bantering with friends or may be ridicule of an opponent; the emotion may be positive or may be negative.
It should be noted that the pre-set deep model has been trained extensively on a sample database, so that the accuracy of the identified emotion type is high.
Optionally, the emotion recognition method provided by the embodiment of the present invention can be applied to public settings such as buses, nursing homes, and hospitals.
The emotion recognition method provided by the embodiment of the present invention includes: obtaining a current speech signal; extracting speech features of the current speech signal, the speech features including: acoustic features and text features; and identifying, according to the speech features and a pre-set deep model, the emotion type corresponding to the current speech signal, the emotion type including: positive, neutral, and negative. The technical solution of the present invention can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and service quality improved.
Optionally, Fig. 2 is another flowchart of the emotion recognition method provided by an embodiment of the present invention. As shown in Fig. 2, before step 200 the emotion recognition method provided by the embodiment of the present invention further includes:
Step 400: pre-process the current speech signal.
Specifically, the pre-processing in step 400 includes: eliminating background noise, enhancing the useful signal, splitting the current speech signal, and so on. It should be noted that splitting the current speech signal can be done by windowing and framing the signal, for example with a Hamming window of 25 ms window length and 10 ms window shift (i.e., each speech frame lasts 25 ms and the window moves in steps of 10 ms).
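A minimal NumPy sketch of the windowing and framing just described (25 ms Hamming window, 10 ms shift); the 16 kHz sampling rate is an assumption:

```python
import numpy as np

def frame_signal(signal: np.ndarray, fs: int = 16000,
                 win_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Split a speech signal into overlapping Hamming-windowed frames
    (step 400): 25 ms window length, 10 ms window shift."""
    win = int(fs * win_ms / 1000)   # 400 samples at 16 kHz
    hop = int(fs * hop_ms / 1000)   # 160 samples at 16 kHz
    if len(signal) < win:           # pad short signals up to one full frame
        signal = np.pad(signal, (0, win - len(signal)))
    n_frames = 1 + (len(signal) - win) // hop
    window = np.hamming(win)
    return np.stack([signal[i * hop: i * hop + win] * window
                     for i in range(n_frames)])  # shape: (n_frames, win)
```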
Optionally, after step 300, the emotion recognition method provided by the embodiment of the present invention further includes:
Step 500: activate, according to the emotion type, the corresponding pre-set countermeasures.
Specifically, step 500 includes: when the emotion type is positive or neutral, encouraging the service worker to keep it up; when the emotion type is negative, activating the pre-set countermeasures. The countermeasures include, but are not limited to, the following: (1) prompts and alarms that remind the service worker to mind their attitude; optionally, the alarm includes a text display, a buzzer, a voice announcement, and the like; (2) collecting in the cloud the current speech signals corresponding to negative emotions, for the service organization to assess and improve service quality; (3) scheduled message pushes, which push the service worker's service quality information to their mobile phone after work every day, so that the worker gets a complete picture of that day's service and can further improve service quality.
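The dispatch in step 500 might look like the sketch below; the message strings and the quality_log list standing in for the cloud store are illustrative assumptions:

```python
def activate_countermeasure(emotion: str, utterance_id: str, quality_log: list) -> None:
    """Step 500: dispatch the pre-set countermeasures by emotion type."""
    if emotion in ("positive", "neutral"):
        print("Keep it up!")  # encourage the worker to continue as they are
    elif emotion == "negative":
        # (1) prompt/alarm: text display, buzzer, or voice announcement
        print("ALERT: please mind your service attitude")
        # (2) collect the negative utterance in the cloud for quality assessment
        quality_log.append(utterance_id)
        # (3) the daily after-work push of quality_log to the worker's phone
        #     would run elsewhere, e.g. from a scheduled job
```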
Optionally, Fig. 3 is a flowchart of step 300 provided by an embodiment of the present invention. As shown in Fig. 3, step 300 includes:
Step 301: obtain, according to the acoustic features and the text features, the acoustic feature information and text feature information for emotion recognition.
Specifically, step 301 includes: converting the acoustic features and the text features into corresponding vectors, respectively; inputting the vectors corresponding to the acoustic features and to the text features into convolutional neural networks, respectively, to obtain the acoustic feature information and text feature information for emotion recognition.
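A minimal sketch of step 301, assuming PyTorch and illustrative feature dimensions (the patent names neither a framework nor layer sizes): each modality's vector sequence passes through its own 1-D convolutional network.

```python
import torch
import torch.nn as nn

class FeatureEncoder(nn.Module):
    """Step 301: turn a sequence of feature vectors into feature information
    for emotion recognition with a 1-D convolutional network."""

    def __init__(self, in_dim: int, out_dim: int = 128, kernel: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, out_dim, kernel, padding=kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim); Conv1d expects (batch, channels, time)
        h = torch.relu(self.conv(x.transpose(1, 2)))
        return h.transpose(1, 2)  # (batch, time, out_dim)

# one encoder per modality, since each vector sequence gets its own CNN
acoustic_encoder = FeatureEncoder(in_dim=4)   # e.g. F0, duration, energy, spectral summary
text_encoder = FeatureEncoder(in_dim=300)     # e.g. 300-d word embeddings (assumed)
```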
Step 302: obtain K acoustic feature vectors according to the acoustic feature information.
Specifically, step 302 includes: pooling the acoustic feature information to obtain the K acoustic feature vectors.
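One plausible reading of "pooling to obtain K vectors" is adaptive average pooling over time, sketched below in PyTorch; the patent does not fix the pooling type, so this is an assumption:

```python
import torch
import torch.nn as nn

def pool_to_k(features: torch.Tensor, k: int) -> torch.Tensor:
    """Steps 302/303: pool a (batch, T, dim) feature sequence down to K vectors."""
    pooled = nn.AdaptiveAvgPool1d(k)(features.transpose(1, 2))  # (batch, dim, K)
    return pooled.transpose(1, 2)                               # (batch, K, dim)
```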
Step 303: obtain K text feature vectors according to the K acoustic feature vectors and the text feature information.
Specifically, step 303 includes: focusing on the text feature information with a focus mechanism according to the mean of the K acoustic feature vectors; pooling the focused text feature information to obtain the K text feature vectors.
It should be noted that the focus mechanism assigns different weights to different texts, for example assigning higher weights to rude words, which influences the emotion judgment. Put plainly: if the features output by the convolutional network indicate that the current speaker's attitude is very rude, the focus mechanism of the convolutional neural network assigns higher weights to rude words (such as "wretch" or "fool"); if the features output by the convolutional network indicate that the current speaker's attitude is gentle, the focus mechanism does not assign higher weights to such rude words.
Specifically, the focus mechanism over the text feature information works as follows: weights are assigned to the text feature information, where the weights are determined according to the K acoustic feature vectors.
In particular, if at time t the text feature information is $h_a(t)$ and the acoustic feature information is $O_q$, each piece of text feature information becomes, after the focus mechanism acts on it,
$$m_{a,q}(t) = \tanh\left(W_{am}\, h_a(t) + W_{qm}\, O_q\right)$$
where $W_{am}$, $W_{qm}$, and $W_{ms}$ are focus parameters and $s_{a,q}(t)$ is the weight; in the standard attention formulation, $s_{a,q}(t) = \operatorname{softmax}_t\!\left(W_{ms}\, m_{a,q}(t)\right)$, and the focused text feature information is $\hat{h}_a(t) = s_{a,q}(t)\, h_a(t)$.
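The focus mechanism above can be sketched in PyTorch as follows. The $m_{a,q}(t)$ projection follows the equation given; the softmax normalization of $s_{a,q}(t)$ and the elementwise re-weighting of $h_a(t)$ are the standard attention reading and therefore assumptions here:

```python
import torch
import torch.nn as nn

class FocusMechanism(nn.Module):
    """Weight each text feature h_a(t) by its relevance to the acoustic summary O_q."""

    def __init__(self, text_dim: int, acoustic_dim: int, attn_dim: int = 64):
        super().__init__()
        self.W_am = nn.Linear(text_dim, attn_dim, bias=False)      # W_am
        self.W_qm = nn.Linear(acoustic_dim, attn_dim, bias=False)  # W_qm
        self.w_ms = nn.Linear(attn_dim, 1, bias=False)             # W_ms

    def forward(self, h_a: torch.Tensor, o_q: torch.Tensor) -> torch.Tensor:
        # h_a: (batch, T, text_dim); o_q: (batch, acoustic_dim), e.g. the mean
        # of the K acoustic feature vectors
        m = torch.tanh(self.W_am(h_a) + self.W_qm(o_q).unsqueeze(1))  # m_{a,q}(t)
        s = torch.softmax(self.w_ms(m), dim=1)                        # weights s_{a,q}(t)
        return s * h_a  # focused text features, ready to be pooled into K vectors
```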
Step 304: identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the pre-set deep model.
Specifically, step 304 includes: performing logistic regression on the K acoustic feature vectors and the K text feature vectors, and identifying the emotion type of the current speech signal according to the K acoustic feature vectors and the K text feature vectors after logistic regression, together with the deep model.
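A minimal PyTorch sketch of step 304 under assumed dimensions; the exact form of the logistic-regression step and of the pre-set deep model (here a small fully connected network) is not disclosed by the patent, so both are assumptions:

```python
import torch
import torch.nn as nn

class EmotionClassifier(nn.Module):
    """Step 304: fuse the K acoustic and K text feature vectors and map them
    to the three emotion types (positive / neutral / negative)."""

    def __init__(self, k: int, dim: int, hidden: int = 64):
        super().__init__()
        self.logistic = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())  # logistic step
        self.deep = nn.Sequential(               # stand-in for the pre-set deep model
            nn.Linear(2 * k * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, acoustic_k: torch.Tensor, text_k: torch.Tensor) -> torch.Tensor:
        # acoustic_k, text_k: (batch, K, dim)
        fused = torch.cat([self.logistic(acoustic_k),
                           self.logistic(text_k)], dim=1).flatten(1)
        return self.deep(fused)  # logits over {positive, neutral, negative}
```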
The working principle of the embodiment of the present invention is illustrated below: a current speech signal is obtained through a microphone or a microphone array; the current speech signal is pre-processed; the acoustic features of the current speech signal are extracted, and the text features of the current speech signal are extracted by a speech recognition technique; the acoustic features and the text features are converted into corresponding vectors, respectively; the vectors corresponding to the acoustic features and to the text features are input into convolutional neural networks, respectively, to obtain the acoustic feature information and text feature information for emotion recognition; the acoustic feature information is pooled to obtain K acoustic feature vectors; the text feature information is focused with a focus mechanism according to the mean of the K acoustic feature vectors; the focused text feature information is pooled to obtain K text feature vectors; logistic regression is performed on the K acoustic feature vectors and the K text feature vectors, and the emotion type of the current speech signal is identified according to the K acoustic feature vectors and K text feature vectors after logistic regression, together with the deep model; and the corresponding pre-set countermeasures are activated according to the emotion type.
Embodiment two
Based on the inventive concept of the above embodiment, Fig. 4 is a structural diagram of the emotion recognition system provided by an embodiment of the present invention. As shown in Fig. 4, the emotion recognition system provided by the embodiment of the present invention includes: a speech acquisition module 10, a feature extraction module 20, and an emotion recognition module 30.
In this embodiment, the speech acquisition module 10 is configured to obtain a current speech signal; the feature extraction module 20 is configured to extract the speech features of the current speech signal; and the emotion recognition module 30 is configured to identify, according to the speech features and a pre-set deep model, the emotion type corresponding to the current speech signal.
Optionally, the acoustic features include: fundamental frequency, duration, energy, and spectrum. The fundamental frequency determines the pitch, and the fundamental frequency feature is extracted by an autocorrelation algorithm. Duration is related to speech rate, and the silence information in the current speech signal is also valuable for emotion recognition; the duration features are extracted with Visual Speech tools. Energy is related to amplitude, and the energy feature and the spectrum feature can be extracted by existing techniques.
Optionally, the text features are the text information in the current speech signal; the text features are extracted by a speech recognition technique, such as iFLYTEK's automatic speech recognition.
The emotion types include: positive, neutral, and negative. It should be noted that a positive emotion type pleases the person being served, a neutral emotion type does not affect the mood of the person being served, and a negative emotion type makes the person being served feel uncomfortable. The same sentence, for example "you are a fool", may be one person bantering with friends or may be ridicule of an opponent; the emotion may be positive or may be negative.
Optionally, the emotion recognition system provided by the embodiment of the present invention can be applied to public settings such as buses, nursing homes, and hospitals.
The emotion recognition system provided by the embodiment of the present invention includes: a speech acquisition module, configured to obtain a current speech signal; a feature extraction module, configured to extract speech features of the current speech signal, the speech features including: acoustic features and text features; and an emotion recognition module, configured to identify, according to the speech features and a pre-set deep model, the emotion type corresponding to the current speech signal, the emotion type including: positive, neutral, and negative. The technical solution of the present invention can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and service quality improved.
Optionally, Fig. 5 is another structural diagram of the emotion recognition system provided by an embodiment of the present invention. As shown in Fig. 5, the system provided by the embodiment of the present invention further includes: a signal pre-processing module 40 and an activation module 50.
The signal pre-processing module 40 is configured to pre-process the current speech signal.
Specifically, the pre-processing includes: eliminating background noise, enhancing the useful signal, splitting the current speech signal, and so on. It should be noted that splitting the current speech signal can be done by windowing and framing the signal, for example with a Hamming window of 25 ms window length and 10 ms window shift (i.e., each speech frame lasts 25 ms and the window moves in steps of 10 ms).
The activation module 50 is configured to activate, according to the emotion type, the corresponding pre-set countermeasures.
Specifically, when the emotion type is positive or neutral, the activation module 50 encourages the service worker to keep it up; when the emotion type is negative, it activates the pre-set countermeasures. The countermeasures include, but are not limited to, the following: (1) prompts and alarms that remind the service worker to mind their attitude; optionally, the alarm includes a text display, a buzzer, a voice announcement, and the like; (2) collecting in the cloud the current speech signals corresponding to negative emotions, for the service organization to assess and improve service quality; (3) scheduled message pushes, which push the service worker's service quality information to their mobile phone after work every day, so that the worker gets a complete picture of that day's service and can further improve service quality.
Optionally, Fig. 6 is a structural diagram of the emotion recognition module provided by an embodiment of the present invention. As shown in Fig. 6, the emotion recognition module includes: a first obtaining unit 31, a second obtaining unit 32, and an emotion recognition unit 33.
The first obtaining unit 31 is configured to obtain, according to the acoustic features and the text features, the acoustic feature information and text feature information for emotion recognition, specifically: converting the acoustic features and the text features into corresponding vectors, respectively; inputting the vectors corresponding to the acoustic features and to the text features into convolutional neural networks, respectively, to obtain the acoustic feature information and text feature information for emotion recognition. The acoustic features include: fundamental frequency, duration, energy, and spectrum.
The second obtaining unit 32 is configured to obtain K acoustic feature vectors according to the acoustic feature information, specifically: pooling the acoustic feature information to obtain the K acoustic feature vectors; and is further configured to obtain K text feature vectors according to the K acoustic feature vectors and the text feature information, specifically: focusing on the text feature information with a focus mechanism according to the mean of the K acoustic feature vectors; pooling the focused text feature information to obtain the K text feature vectors.
The emotion recognition unit 33 is configured to identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the pre-set deep model.
Those skilled in the art will appreciate that the modules and units included in embodiment two above are divided merely according to functional logic, but the division is not limited thereto, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only used to distinguish them from one another and are not intended to limit the protection scope of the present invention.
Those of ordinary skill in the art will further appreciate that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, the storage medium including: ROM/RAM, magnetic disks, optical disks, and the like.
Although the embodiments disclosed herein are as described above, their content is merely embodiments adopted to facilitate understanding of the present invention, and they are not intended to limit the present invention. Any person skilled in the field of the present invention may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the patent protection scope of the present invention shall still be subject to the scope defined by the appended claims.

Claims (10)

  1. An emotion recognition method, characterized by comprising:
    obtaining a current speech signal;
    extracting speech features of the current speech signal, the speech features comprising: acoustic features and text features;
    identifying, according to the speech features and a pre-set deep model, the emotion type corresponding to the current speech signal, the emotion type comprising: positive, neutral, and negative.
  2. The method according to claim 1, characterized in that before extracting the speech features of the current speech signal, the method further comprises:
    pre-processing the current speech signal.
  3. The method according to claim 1 or 2, characterized in that after identifying the emotion type corresponding to the current speech signal, the method further comprises:
    activating, according to the emotion type, the corresponding pre-set countermeasures.
  4. The method according to claim 1, characterized in that the acoustic features comprise: fundamental frequency, duration, energy, and spectrum.
  5. The method according to claim 1, characterized in that identifying, according to the speech features and the pre-set deep model, the emotion type corresponding to the current speech signal comprises:
    obtaining, according to the acoustic features and the text features, acoustic feature information and text feature information for emotion recognition;
    obtaining K acoustic feature vectors according to the acoustic feature information;
    obtaining K text feature vectors according to the K acoustic feature vectors and the text feature information;
    identifying the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the pre-set deep model.
  6. The method according to claim 5, characterized in that obtaining, according to the acoustic features and the text features, the acoustic feature information and text feature information for emotion recognition comprises:
    converting the acoustic features and the text features into corresponding vectors, respectively;
    inputting the vectors corresponding to the acoustic features and to the text features into convolutional neural networks, respectively, to obtain the acoustic feature information and text feature information for emotion recognition.
  7. The method according to claim 5 or 6, characterized in that obtaining K acoustic feature vectors according to the acoustic feature information comprises:
    pooling the acoustic feature information to obtain the K acoustic feature vectors;
    and obtaining K text feature vectors according to the K acoustic feature vectors and the text feature information comprises:
    focusing on the text feature information with a focus mechanism according to the mean of the K acoustic feature vectors;
    pooling the focused text feature information to obtain the K text feature vectors.
  8. An emotion recognition system, characterized by comprising:
    a speech acquisition module, configured to obtain a current speech signal;
    a feature extraction module, configured to extract speech features of the current speech signal, the speech features comprising: acoustic features and text features;
    an emotion recognition module, configured to identify, according to the speech features and a pre-set deep model, the emotion type corresponding to the current speech signal, the emotion type comprising: positive, neutral, and negative.
  9. The system according to claim 8, characterized in that the system further comprises: a signal pre-processing module and an activation module;
    the signal pre-processing module being configured to pre-process the current speech signal;
    the activation module being configured to activate, according to the emotion type, the corresponding pre-set countermeasures.
  10. The system according to claim 8, characterized in that the emotion recognition module comprises:
    a first obtaining unit, configured to obtain, according to the acoustic features and the text features, acoustic feature information and text feature information for emotion recognition, specifically: converting the acoustic features and the text features into corresponding vectors, respectively; inputting the vectors corresponding to the acoustic features and to the text features into convolutional neural networks, respectively, to obtain the acoustic feature information and text feature information for emotion recognition; the acoustic features comprising: fundamental frequency, duration, energy, and spectrum;
    a second obtaining unit, configured to obtain K acoustic feature vectors according to the acoustic feature information, specifically: pooling the acoustic feature information to obtain the K acoustic feature vectors; and further configured to obtain K text feature vectors according to the K acoustic feature vectors and the text feature information, specifically: focusing on the text feature information with a focus mechanism according to the mean of the K acoustic feature vectors; pooling the focused text feature information to obtain the K text feature vectors;
    an emotion recognition unit, configured to identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the pre-set deep model.
CN201810007403.7A 2018-01-03 2018-01-03 Emotion recognition method and emotion recognition system Active CN107945790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810007403.7A CN107945790B (en) 2018-01-03 2018-01-03 Emotion recognition method and emotion recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810007403.7A CN107945790B (en) 2018-01-03 2018-01-03 Emotion recognition method and emotion recognition system

Publications (2)

Publication Number Publication Date
CN107945790A (en) 2018-04-20
CN107945790B CN107945790B (en) 2021-01-26

Family

ID=61938328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810007403.7A Active CN107945790B (en) 2018-01-03 2018-01-03 Emotion recognition method and emotion recognition system

Country Status (1)

Country Link
CN (1) CN107945790B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108833722A (en) * 2018-05-29 2018-11-16 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN109192225A (en) * 2018-09-28 2019-01-11 清华大学 The method and device of speech emotion recognition and mark
CN109243490A (en) * 2018-10-11 2019-01-18 平安科技(深圳)有限公司 Driver's Emotion identification method and terminal device
CN109410986A (en) * 2018-11-21 2019-03-01 咪咕数字传媒有限公司 A kind of Emotion identification method, apparatus and storage medium
CN109741732A (en) * 2018-08-30 2019-05-10 京东方科技集团股份有限公司 Name entity recognition method, name entity recognition device, equipment and medium
CN110047517A (en) * 2019-04-24 2019-07-23 京东方科技集团股份有限公司 Speech-emotion recognition method, answering method and computer equipment
CN110473571A (en) * 2019-07-26 2019-11-19 北京影谱科技股份有限公司 Emotion identification method and device based on short video speech
CN110600033A (en) * 2019-08-26 2019-12-20 北京大米科技有限公司 Learning condition evaluation method and device, storage medium and electronic equipment
CN110660412A (en) * 2018-06-28 2020-01-07 Tcl集团股份有限公司 Emotion guiding method and device and terminal equipment
CN110728983A (en) * 2018-07-16 2020-01-24 科大讯飞股份有限公司 Information display method, device, equipment and readable storage medium
CN111128189A (en) * 2019-12-30 2020-05-08 秒针信息技术有限公司 Warning information prompting method and device
CN111354361A (en) * 2018-12-21 2020-06-30 深圳市优必选科技有限公司 Emotion communication method and system and robot
US11810596B2 (en) 2021-08-16 2023-11-07 Hong Kong Applied Science and Technology Research Institute Company Limited Apparatus and method for speech-emotion recognition with quantified emotional states
CN110728983B (en) * 2018-07-16 2024-04-30 科大讯飞股份有限公司 Information display method, device, equipment and readable storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1391876A1 (en) * 2002-08-14 2004-02-25 Sony International (Europe) GmbH Method of determining phonemes in spoken utterances suitable for recognizing emotions using voice quality features
EP1429314A1 (en) * 2002-12-13 2004-06-16 Sony International (Europe) GmbH Correction of energy as input feature for speech processing
JP2005283647A (en) * 2004-03-26 2005-10-13 Matsushita Electric Ind Co Ltd Feeling recognition device
CN101894550A (en) * 2010-07-19 2010-11-24 东南大学 Speech emotion classifying method for emotion-based characteristic optimization
US20130080169A1 (en) * 2011-09-27 2013-03-28 Fuji Xerox Co., Ltd. Audio analysis system, audio analysis apparatus, audio analysis terminal
CN104050965A (en) * 2013-09-02 2014-09-17 广东外语外贸大学 English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
KR20160116586A (en) * 2015-03-30 2016-10-10 한국전자통신연구원 Method and apparatus for emotion recognition
CN106297826A (en) * 2016-08-18 2017-01-04 竹间智能科技(上海)有限公司 Speech emotional identification system and method
WO2017048730A1 (en) * 2015-09-14 2017-03-23 Cogito Corporation Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing devices
US20170140757A1 (en) * 2011-04-22 2017-05-18 Angel A. Penilla Methods and vehicles for processing voice commands and moderating vehicle response
CN106782615A (en) * 2016-12-20 2017-05-31 科大讯飞股份有限公司 Speech data emotion detection method and apparatus and system
CN107112006A (en) * 2014-10-02 2017-08-29 微软技术许可有限责任公司 Speech processes based on neutral net
JP6213476B2 (en) * 2012-10-31 2017-10-18 日本電気株式会社 Dissatisfied conversation determination device and dissatisfied conversation determination method
CN107516511A (en) * 2016-06-13 2017-12-26 微软技术许可有限责任公司 The Text To Speech learning system of intention assessment and mood

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1391876A1 (en) * 2002-08-14 2004-02-25 Sony International (Europe) GmbH Method of determining phonemes in spoken utterances suitable for recognizing emotions using voice quality features
EP1429314A1 (en) * 2002-12-13 2004-06-16 Sony International (Europe) GmbH Correction of energy as input feature for speech processing
JP2005283647A (en) * 2004-03-26 2005-10-13 Matsushita Electric Ind Co Ltd Feeling recognition device
CN101894550A (en) * 2010-07-19 2010-11-24 东南大学 Speech emotion classifying method for emotion-based characteristic optimization
US20170140757A1 (en) * 2011-04-22 2017-05-18 Angel A. Penilla Methods and vehicles for processing voice commands and moderating vehicle response
US20130080169A1 (en) * 2011-09-27 2013-03-28 Fuji Xerox Co., Ltd. Audio analysis system, audio analysis apparatus, audio analysis terminal
JP6213476B2 (en) * 2012-10-31 2017-10-18 日本電気株式会社 Dissatisfied conversation determination device and dissatisfied conversation determination method
CN104050965A (en) * 2013-09-02 2014-09-17 广东外语外贸大学 English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
CN107112006A (en) * 2014-10-02 2017-08-29 微软技术许可有限责任公司 Speech processes based on neutral net
KR20160116586A (en) * 2015-03-30 2016-10-10 한국전자통신연구원 Method and apparatus for emotion recognition
WO2017048730A1 (en) * 2015-09-14 2017-03-23 Cogito Corporation Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing devices
CN107516511A (en) * 2016-06-13 2017-12-26 微软技术许可有限责任公司 The Text To Speech learning system of intention assessment and mood
CN106297826A (en) * 2016-08-18 2017-01-04 竹间智能科技(上海)有限公司 Speech emotional identification system and method
CN106782615A (en) * 2016-12-20 2017-05-31 科大讯飞股份有限公司 Speech data emotion detection method and apparatus and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DAVID GRIOL: "Combining speech-based and linguistic classifiers to recognize emotion in user spoken utterances", NEUROCOMPUTING *
朱从贤: "Research on speech emotion recognition methods based on deep learning" (基于深度学习的语音情感识别方法的研究), China Master's Theses Full-text Database, Information Science and Technology *
李承程: "Research on text-speech coupled emotion recognition methods based on deep learning" (基于深度学习的文本语音耦合情感识别方法研究), China Master's Theses Full-text Database *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108833722B (en) * 2018-05-29 2021-05-11 平安科技(深圳)有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium
CN108833722A (en) * 2018-05-29 2018-11-16 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN110660412A (en) * 2018-06-28 2020-01-07 Tcl集团股份有限公司 Emotion guiding method and device and terminal equipment
CN110728983B (en) * 2018-07-16 2024-04-30 科大讯飞股份有限公司 Information display method, device, equipment and readable storage medium
CN110728983A (en) * 2018-07-16 2020-01-24 科大讯飞股份有限公司 Information display method, device, equipment and readable storage medium
CN109741732A (en) * 2018-08-30 2019-05-10 京东方科技集团股份有限公司 Name entity recognition method, name entity recognition device, equipment and medium
CN109741732B (en) * 2018-08-30 2022-06-21 京东方科技集团股份有限公司 Named entity recognition method, named entity recognition device, equipment and medium
WO2020043123A1 (en) * 2018-08-30 2020-03-05 京东方科技集团股份有限公司 Named-entity recognition method, named-entity recognition apparatus and device, and medium
CN109192225A (en) * 2018-09-28 2019-01-11 清华大学 The method and device of speech emotion recognition and mark
CN109243490A (en) * 2018-10-11 2019-01-18 平安科技(深圳)有限公司 Driver's Emotion identification method and terminal device
CN109410986A (en) * 2018-11-21 2019-03-01 咪咕数字传媒有限公司 A kind of Emotion identification method, apparatus and storage medium
CN109410986B (en) * 2018-11-21 2021-08-06 咪咕数字传媒有限公司 Emotion recognition method and device and storage medium
CN111354361A (en) * 2018-12-21 2020-06-30 深圳市优必选科技有限公司 Emotion communication method and system and robot
CN110047517A (en) * 2019-04-24 2019-07-23 京东方科技集团股份有限公司 Speech-emotion recognition method, answering method and computer equipment
CN110473571A (en) * 2019-07-26 2019-11-19 北京影谱科技股份有限公司 Emotion identification method and device based on short video speech
CN110600033B (en) * 2019-08-26 2022-04-05 北京大米科技有限公司 Learning condition evaluation method and device, storage medium and electronic equipment
CN110600033A (en) * 2019-08-26 2019-12-20 北京大米科技有限公司 Learning condition evaluation method and device, storage medium and electronic equipment
CN111128189A (en) * 2019-12-30 2020-05-08 秒针信息技术有限公司 Warning information prompting method and device
US11810596B2 (en) 2021-08-16 2023-11-07 Hong Kong Applied Science and Technology Research Institute Company Limited Apparatus and method for speech-emotion recognition with quantified emotional states

Also Published As

Publication number Publication date
CN107945790B (en) 2021-01-26

Similar Documents

Publication Publication Date Title
CN107945790A (en) A kind of emotion identification method and emotion recognition system
CN108564942B (en) Voice emotion recognition method and system based on adjustable sensitivity
CN105096941B (en) Audio recognition method and device
CN109256136B (en) Voice recognition method and device
Nwe et al. Speech based emotion classification
CN102723078B (en) Emotion speech recognition method based on natural language comprehension
CN106504768B (en) Phone testing audio frequency classification method and device based on artificial intelligence
CN108806667A (en) The method for synchronously recognizing of voice and mood based on neural network
CN110265040A (en) Training method, device, storage medium and the electronic equipment of sound-groove model
CN107492382A (en) Voiceprint extracting method and device based on neutral net
CN103996155A (en) Intelligent interaction and psychological comfort robot service system
CN104538043A (en) Real-time emotion reminder for call
CN111260761B (en) Method and device for generating mouth shape of animation character
Fan et al. End-to-end post-filter for speech separation with deep attention fusion features
Samantaray et al. A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages
CN111144367B (en) Auxiliary semantic recognition method based on gesture recognition
CN108711429A (en) Electronic equipment and apparatus control method
EP1280137B1 (en) Method for speaker identification
CN109599094A (en) The method of sound beauty and emotion modification
CN106653002A (en) Literal live broadcasting method and platform
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
Sinha et al. Acoustic-phonetic feature based dialect identification in Hindi Speech
CN110246518A (en) Speech-emotion recognition method, device, system and storage medium based on more granularity sound state fusion features
CN114283820A (en) Multi-character voice interaction method, electronic equipment and storage medium
Luong et al. LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant