CN107945790A - Emotion recognition method and emotion recognition system - Google Patents
Emotion recognition method and emotion recognition system
- Publication number
- CN107945790A CN107945790A CN201810007403.7A CN201810007403A CN107945790A CN 107945790 A CN107945790 A CN 107945790A CN 201810007403 A CN201810007403 A CN 201810007403A CN 107945790 A CN107945790 A CN 107945790A
- Authority
- CN
- China
- Prior art keywords
- feature
- text
- acoustic feature
- speech signal
- current speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The embodiments of the invention disclose an emotion recognition method and an emotion recognition system. The method includes: obtaining a current speech signal; extracting speech features of the current speech signal, the speech features including acoustic features and text features; and identifying, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative. The technical scheme can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and the service level improved.
Description
Technical field
The embodiments of the present invention relate to the field of communication technology, and in particular to an emotion recognition method and an emotion recognition system.
Background technology
In interpersonal communication, language is one of the most natural and important means. The emotion carried in a speaker's speech can strongly influence the mood of the people around them; emotions here include positive and negative. This is especially true for service personnel. In public settings such as buses, nursing homes, or hospitals, if a service person behaves badly, speaks in an arrogant tone, or uses vulgar language (that is, if the emotion is negative), it will adversely affect the people being served and work against social harmony and the improvement of the happiness index.
Through study, the inventors found that there is currently no effective technical means to judge the corresponding emotion of service personnel from their speech, so that they can be supervised and the service level improved.
Summary of the invention
To solve the above technical problem, the embodiments of the present invention provide an emotion recognition method and an emotion recognition system that can identify the corresponding emotion from a speech signal.
In one aspect, an embodiment of the present invention provides an emotion recognition method, including:
obtaining a current speech signal;
extracting speech features of the current speech signal, the speech features including acoustic features and text features;
identifying, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative.
Optionally, before extracting the speech features of the current speech signal, the method further includes: pre-processing the current speech signal.
Optionally, after identifying the emotion type corresponding to the current speech signal, the method further includes: according to the emotion type, activating corresponding preset countermeasures.
Optionally, the acoustic features include: fundamental frequency, duration, energy, and spectrum.
Optionally, identifying the emotion type corresponding to the current speech signal according to the speech features and the predetermined deep model includes:
according to the acoustic features and text features, obtaining acoustic feature information and text feature information for emotion recognition;
according to the acoustic feature information, obtaining K acoustic feature vectors;
according to the K acoustic feature vectors and the text feature information, obtaining K text feature vectors;
according to the K acoustic feature vectors, the K text feature vectors, and the predetermined deep model, identifying the emotion type of the current speech signal.
Optionally, obtaining the acoustic feature information and text feature information for emotion recognition according to the acoustic features and text features includes:
converting the acoustic features and the text features into corresponding vectors;
inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain the acoustic feature information and text feature information for emotion recognition.
Optionally, obtaining K acoustic feature vectors according to the acoustic feature information includes: pooling the acoustic feature information to obtain K acoustic feature vectors.
Obtaining K text feature vectors according to the K acoustic feature vectors and the text feature information includes: focusing the text feature information with a focusing mechanism according to the mean of the K acoustic feature vectors, and pooling the focused text feature information to obtain K text feature vectors.
On the other hand, an embodiment of the present invention also provides an emotion recognition system, including:
a speech acquisition module, configured to obtain a current speech signal;
a feature extraction module, configured to extract speech features of the current speech signal, the speech features including acoustic features and text features;
an emotion recognition module, configured to identify, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative.
Optionally, the system further includes a signal pre-processing module and an activation module.
The signal pre-processing module is configured to pre-process the current speech signal.
The activation module is configured to activate corresponding preset countermeasures according to the emotion type.
Optionally, the emotion recognition module includes:
a first obtaining unit, configured to obtain acoustic feature information and text feature information for emotion recognition according to the acoustic features and text features, specifically: converting the acoustic features and text features into corresponding vectors, and inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain the acoustic feature information and text feature information for emotion recognition; the acoustic features include fundamental frequency, duration, energy, and spectrum;
a second obtaining unit, configured to pool the acoustic feature information to obtain K acoustic feature vectors, and further configured to focus the text feature information with the focusing mechanism according to the mean of the K acoustic feature vectors and pool the focused text feature information to obtain K text feature vectors;
an emotion recognition unit, configured to identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the predetermined deep model.
The embodiments of the present invention provide an emotion recognition method and an emotion recognition system. The method includes: obtaining a current speech signal; extracting speech features of the current speech signal, the speech features including acoustic features and text features; identifying, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative. The technical scheme can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and the service level improved.
Of course, a product or method implementing the present invention does not necessarily need to achieve all of the above advantages at the same time. Other features and advantages of the present invention will be set forth in the subsequent embodiments of the specification, will partly become apparent from them, or can be understood by implementing the present invention. The objects and other advantages of the embodiments of the present invention can be realized and obtained by the structures particularly pointed out in the specification, claims, and drawings.
Brief description of the drawings
The drawings are provided for a further understanding of the technical solution of the present invention and constitute a part of the specification; together with the embodiments of this application they serve to explain the technical solution of the present invention, and they do not limit it.
Fig. 1 is a flow chart of the emotion recognition method provided by an embodiment of the present invention;
Fig. 2 is another flow chart of the emotion recognition method provided by an embodiment of the present invention;
Fig. 3 is a flow chart of step 300 provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the emotion recognition system provided by an embodiment of the present invention;
Fig. 5 is another structural diagram of the emotion recognition system provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of the emotion recognition module provided by an embodiment of the present invention.
Embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the drawings. It should be noted that, where there is no conflict, the embodiments in this application and the features in the embodiments may be combined with each other.
To illustrate the technical solutions described in the embodiments of the present invention, specific embodiments are described below.
Embodiment one
Fig. 1 is a flow chart of the emotion recognition method provided by an embodiment of the present invention. As shown in Fig. 1, the method specifically includes the following steps:
Step 100: obtain a current speech signal.
Specifically, in step 100 the speech signal is obtained through a microphone or a microphone array.
Step 200: extract speech features of the current speech signal.
The speech features include acoustic features and text features.
Optionally, the acoustic features include fundamental frequency, duration, energy, and spectrum. The fundamental frequency determines pitch and can be extracted by an autocorrelation algorithm. Duration is related to speaking rate, and the unvoiced (silence) information in the current speech signal is also valuable for emotion recognition; duration features can be extracted by Visual Speech tools. Energy is related to amplitude, and energy and spectral features can be extracted by existing techniques.
Optionally, the text features are the text information in the current speech signal, extracted by a speech recognition technology such as iFLYTEK automatic speech recognition.
Step 300: identify, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal.
The emotion type is one of positive, neutral, and negative. It should be noted that a positive emotion type makes the person being served feel pleasant, a neutral emotion type does not affect their mood, and a negative emotion type makes them feel uncomfortable. The same sentence, for example "you are a fool", might be friendly banter between friends or might be mocking an opponent; the emotion may be positive or may be negative.
It should be noted that the predetermined deep model has been trained extensively on a sample database, so that the accuracy of the identified emotion type is high.
Optionally, the emotion recognition method provided by the embodiment of the present invention can be applied in public settings such as buses, nursing homes, and hospitals.
The emotion recognition method provided by the embodiment of the present invention includes: obtaining a current speech signal; extracting speech features of the current speech signal, the speech features including acoustic features and text features; identifying, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative. The technical scheme can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and the service level improved.
Optionally, Fig. 2 is another flow chart of the emotion recognition method provided by an embodiment of the present invention. As shown in Fig. 2, before step 200 the method further includes:
Step 400: pre-process the current speech signal.
Specifically, the pre-processing in step 400 includes eliminating background noise, enhancing the useful signal, or segmenting the current speech signal. It should be noted that segmenting the current speech signal can be realized by windowing the signal into frames, for example with a 25 ms Hamming window and a 10 ms window shift (i.e., each speech frame lasts 25 ms and the window moves in 10 ms steps).
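The framing described above can be sketched as follows. The helper name `split_frames` and the 16 kHz sample rate (which makes the 25 ms window 400 samples and the 10 ms hop 160 samples) are assumptions for illustration.

```python
import numpy as np

def split_frames(signal, sr=16000, win_ms=25, hop_ms=10):
    """Split a speech signal into overlapping Hamming-windowed frames
    (25 ms window, 10 ms hop, as described in the embodiment)."""
    win = int(sr * win_ms / 1000)    # 400 samples at 16 kHz
    hop = int(sr * hop_ms / 1000)    # 160 samples at 16 kHz
    w = np.hamming(win)
    n = 1 + max(0, (len(signal) - win) // hop)
    return np.stack([signal[i * hop : i * hop + win] * w for i in range(n)])
```

One second of 16 kHz audio yields 1 + (16000 - 400) // 160 = 98 frames of 400 samples each.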
Optionally, after step 300, the method further includes:
Step 500: according to the emotion type, activate corresponding preset countermeasures.
Specifically, step 500 includes: when the emotion type is positive or neutral, encouraging the service person to keep it up; when the emotion type is negative, activating preset countermeasures. The countermeasures include but are not limited to the following: (1) warning prompts, reminding the service person to mind their attitude; optionally, the warning includes a text display, a buzzer, a voice announcement, and so on; (2) collecting the current speech signals corresponding to negative emotions in the cloud, so that the service organization can assess and improve service quality; (3) timed message pushes, pushing the service person's daily service quality information to their mobile phone after work, so that they can fully understand their service performance for the day and further improve their service level.
Optionally, Fig. 3 is a flow chart of step 300 provided by an embodiment of the present invention. As shown in Fig. 3, step 300 includes:
Step 301: according to the acoustic features and text features, obtain acoustic feature information and text feature information for emotion recognition.
Specifically, step 301 includes: converting the acoustic features and text features into corresponding vectors; inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain the acoustic feature information and text feature information for emotion recognition.
Step 302: according to the acoustic feature information, obtain K acoustic feature vectors.
Specifically, step 302 includes: pooling the acoustic feature information to obtain K acoustic feature vectors.
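The patent does not specify the pooling operator used in step 302; one common choice, sketched below under that assumption, is to split the frame-level feature sequence into K equal segments and max-pool each segment. The helper name `pool_to_k` is hypothetical.

```python
import numpy as np

def pool_to_k(features, k):
    """Pool a (T, d) sequence of frame-level feature vectors into k
    fixed vectors: chunk the time axis into k segments, max-pool each."""
    t = features.shape[0]
    bounds = np.linspace(0, t, k + 1).astype(int)   # segment boundaries
    return np.stack([features[bounds[i]:bounds[i + 1]].max(axis=0)
                     for i in range(k)])
```

For example, pooling a 6-frame sequence with k = 3 max-pools frames {0,1}, {2,3}, and {4,5}.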
Step 303: according to the K acoustic feature vectors and the text feature information, obtain K text feature vectors.
Specifically, step 303 includes: focusing the text feature information with the focusing mechanism according to the mean of the K acoustic feature vectors; pooling the focused text feature information to obtain K text feature vectors.
It should be noted that the focusing mechanism assigns different weights to different text, for example assigning higher weights to rude words, which influences the emotion judgment. Put simply: if the features output by the convolutional network indicate that the current speaker's attitude is very rude, the focusing mechanism of the convolutional neural network assigns higher weights to rude words (such as "wretch" or "fool"); if the features indicate that the current speaker's attitude is gentle, the focusing mechanism does not assign higher weights to such words.
Specifically, the focusing mechanism over the text feature information works as follows: weights are assigned to the text feature information, where the weights are determined according to the K acoustic feature vectors.
In particular, at time t let the text feature information be h_a(t) and the acoustic feature information be O_q. Under the action of the focusing mechanism, each text feature is transformed as
m_{a,q}(t) = tanh(W_am h_a(t) + W_qm O_q)
s_{a,q}(t) = exp(W_ms m_{a,q}(t)) / Σ_τ exp(W_ms m_{a,q}(τ))
h̃_a = Σ_t s_{a,q}(t) h_a(t)
where W_am, W_qm, and W_ms are the focusing parameters, s_{a,q}(t) is the weight, and h̃_a is the text feature information after focusing.
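A minimal sketch of this focusing computation: each text feature h_a(t) is scored against the acoustic summary O_q through the parameters W_am, W_qm, and W_ms, and the normalized weights s(t) form a weighted sum of the text features. The function name `focus_text`, the parameter shapes, and the softmax normalization of the weights are assumptions; the patent names the weights s_{a,q}(t) but does not spell out their normalization.

```python
import numpy as np

def focus_text(h_a, o_q, W_am, W_qm, w_ms):
    """Focusing (attention) over text features h_a of shape (T, d),
    guided by a mean acoustic vector o_q of shape (da,)."""
    m = np.tanh(h_a @ W_am.T + o_q @ W_qm.T)   # (T, dm): m(t) = tanh(...)
    scores = m @ w_ms                           # (T,): W_ms . m(t)
    s = np.exp(scores - scores.max())
    s /= s.sum()                                # softmax attention weights
    return s @ h_a                              # focused text feature, (d,)
```

When the scoring vector w_ms is zero, the weights are uniform and the result reduces to the mean of the text features, a useful sanity check.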
Step 304: according to the K acoustic feature vectors, the K text feature vectors, and the predetermined deep model, identify the emotion type of the current speech signal.
Specifically, step 304 includes: applying logistic regression to the K acoustic feature vectors and the K text feature vectors, and identifying the emotion type of the current speech signal according to the K acoustic feature vectors and K text feature vectors after logistic regression and the deep model.
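The final classification step can be sketched as a multinomial logistic regression (softmax) over the concatenated feature vectors. This is only an illustration of the shape of such a classifier, not the patent's deep model: the helper name `classify` is hypothetical, and the weights W and b are assumed to come from training.

```python
import numpy as np

EMOTIONS = ["positive", "neutral", "negative"]

def classify(acoustic_vecs, text_vecs, W, b):
    """Concatenate the K acoustic and K text feature vectors and apply a
    softmax (multinomial logistic regression) layer over the three
    emotion types. W has shape (3, n_features); b has shape (3,)."""
    x = np.concatenate([np.ravel(acoustic_vecs), np.ravel(text_vecs)])
    logits = W @ x + b
    p = np.exp(logits - logits.max())
    p /= p.sum()                       # class probabilities
    return EMOTIONS[int(np.argmax(p))], p
```

In a real system W and b would be learned jointly with the CNN and focusing layers rather than set by hand.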
The operating principle of the embodiment of the present invention is illustrated below: obtain a current speech signal through a microphone or microphone array; pre-process the current speech signal; extract the acoustic features of the current speech signal and extract its text features through speech recognition technology; convert the acoustic features and text features into corresponding vectors; input the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain acoustic feature information and text feature information for emotion recognition; pool the acoustic feature information to obtain K acoustic feature vectors; focus the text feature information with the focusing mechanism according to the mean of the K acoustic feature vectors; pool the focused text feature information to obtain K text feature vectors; apply logistic regression to the K acoustic feature vectors and K text feature vectors, and identify the emotion type of the current speech signal according to the vectors after logistic regression and the deep model; according to the emotion type, activate corresponding preset countermeasures.
Embodiment two
Based on the inventive concept of the above embodiment, Fig. 4 is a structural diagram of the emotion recognition system provided by an embodiment of the present invention. As shown in Fig. 4, the system includes: a speech acquisition module 10, a feature extraction module 20, and an emotion recognition module 30.
In this embodiment, the speech acquisition module 10 is configured to obtain a current speech signal; the feature extraction module 20 is configured to extract speech features of the current speech signal; and the emotion recognition module 30 is configured to identify, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal.
Optionally, the acoustic features include fundamental frequency, duration, energy, and spectrum. The fundamental frequency determines pitch and can be extracted by an autocorrelation algorithm. Duration is related to speaking rate, and the unvoiced information in the current speech signal is also valuable for emotion recognition; duration features can be extracted by Visual Speech tools. Energy is related to amplitude, and energy and spectral features can be extracted by existing techniques.
Optionally, the text features are the text information in the current speech signal, extracted by a speech recognition technology such as iFLYTEK automatic speech recognition.
The emotion type is one of positive, neutral, and negative. It should be noted that a positive emotion type makes the person being served feel pleasant, a neutral emotion type does not affect their mood, and a negative emotion type makes them feel uncomfortable. The same sentence, for example "you are a fool", might be friendly banter between friends or might be mocking an opponent; the emotion may be positive or may be negative.
Optionally, the emotion recognition system provided by the embodiment of the present invention can be applied in public settings such as buses, nursing homes, and hospitals.
The emotion recognition system provided by the embodiment of the present invention includes: a speech acquisition module, configured to obtain a current speech signal; a feature extraction module, configured to extract speech features of the current speech signal, the speech features including acoustic features and text features; and an emotion recognition module, configured to identify, according to the speech features and a predetermined deep model, the emotion type corresponding to the current speech signal, the emotion type being one of positive, neutral, and negative. The technical scheme can identify the corresponding emotion type from a speech signal, so that service personnel can be supervised and the service level improved.
Optionally, Fig. 5 is another structural diagram of the emotion recognition system provided by an embodiment of the present invention. As shown in Fig. 5, the system further includes: a signal pre-processing module 40 and an activation module 50.
The signal pre-processing module 40 is configured to pre-process the current speech signal. Specifically, the pre-processing includes eliminating background noise, enhancing the useful signal, or segmenting the current speech signal. It should be noted that segmenting the current speech signal can be realized by windowing the signal into frames, for example with a 25 ms Hamming window and a 10 ms window shift (i.e., each speech frame lasts 25 ms and the window moves in 10 ms steps).
The activation module 50 is configured to activate corresponding preset countermeasures according to the emotion type. Specifically, when the emotion type is positive or neutral, the activation module 50 encourages the service person to keep it up; when the emotion type is negative, it activates preset countermeasures. The countermeasures include but are not limited to the following: (1) warning prompts, reminding the service person to mind their attitude; optionally, the warning includes a text display, a buzzer, a voice announcement, and so on; (2) collecting the current speech signals corresponding to negative emotions in the cloud, so that the service organization can assess and improve service quality; (3) timed message pushes, pushing the service person's daily service quality information to their mobile phone after work, so that they can fully understand their service performance for the day and further improve their service level.
Optionally, Fig. 6 is a structural diagram of the emotion recognition module provided by an embodiment of the present invention. As shown in Fig. 6, the emotion recognition module includes: a first obtaining unit 31, a second obtaining unit 32, and an emotion recognition unit 33.
The first obtaining unit 31 is configured to obtain acoustic feature information and text feature information for emotion recognition according to the acoustic features and text features, specifically: converting the acoustic features and text features into corresponding vectors, and inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively, to obtain the acoustic feature information and text feature information for emotion recognition; the acoustic features include fundamental frequency, duration, energy, and spectrum.
The second obtaining unit 32 is configured to pool the acoustic feature information to obtain K acoustic feature vectors, and is further configured to focus the text feature information with the focusing mechanism according to the mean of the K acoustic feature vectors and pool the focused text feature information to obtain K text feature vectors.
The emotion recognition unit 33 is configured to identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the predetermined deep model.
Those skilled in the art will appreciate that the modules or units included in embodiment two above are merely divided according to functional logic, and the division is not limited thereto as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for convenience of mutual distinction and are not intended to limit the protection scope of the present invention.
Those of ordinary skill in the art will further appreciate that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, the storage medium including ROM/RAM, magnetic disks, optical discs, and so on.
Although the embodiments are disclosed above, the content is only an embodiment adopted for ease of understanding the present invention and is not intended to limit it. Any person skilled in the art of the present invention may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the patent protection scope of the present invention shall still be subject to the scope defined by the appended claims.
Claims (10)
- A kind of 1. emotion identification method, it is characterised in that including:Obtain current speech signal;The phonetic feature of current speech signal is extracted, the phonetic feature includes:Acoustic feature and text feature;According to the phonetic feature and predetermined depth model, the corresponding affective style of the current speech signal, the feelings are identified Sense type includes:It is positive, neutral and negative.
- 2. according to the method described in claim 1, it is characterized in that, it is described extraction current speech signal phonetic feature before, The method further includes:The current speech signal is pre-processed.
- 3. method according to claim 1 or 2, it is characterised in that described to identify the corresponding feelings of the current speech signal After feeling type, the method further includes:According to the affective style, corresponding default counte-rplan are activated.
- 4. according to the method described in claim 1, it is characterized in that, the acoustic feature includes:Fundamental frequency, duration, energy and frequency Spectrum.
- 5. according to the method described in claim 1, it is characterized in that, described according to the phonetic feature and predetermined depth model, Identify that the corresponding affective style of the current speech signal includes:According to acoustic feature and text feature, the acoustic feature information and text feature information for emotion recognition are obtained;According to the acoustic feature information, K acoustic feature vector is obtained;According to K acoustic feature vector sum text feature information, K Text eigenvector is obtained;According to K acoustic feature vector, K Text eigenvector and predetermined depth model, the emotion of current speech signal is identified Type.
- 6. according to the method described in claim 5, it is characterized in that, described according to acoustic feature and text feature, it is used for The acoustic feature information and text feature information of emotion recognition include:Acoustic feature and text feature are separately converted to corresponding vector;The corresponding vector of the corresponding vector sum text feature of acoustic feature is inputted into convolutional neural networks respectively, acquisition is used for emotion The acoustic feature information and text feature information of identification.
- 7. the method according to claim 5 or 6, it is characterised in that it is described according to the acoustic feature information, obtain K Acoustic feature vector includes:By the acoustic feature information pool, K acoustic feature vector is obtained;It is described to be included according to K acoustic feature vector sum text feature information, K Text eigenvector of acquisition:Text feature information is focused on using focus mechanism according to the average of K acoustic feature vector;By the text feature information pool after focusing, K Text eigenvector is obtained.
- 8. An emotion recognition system, comprising: a voice acquisition module configured to obtain a current speech signal; a feature extraction module configured to extract speech features of the current speech signal, the speech features comprising acoustic features and text features; and an emotion recognition module configured to identify the emotion type corresponding to the current speech signal according to the speech features and a predetermined depth model, the emotion type comprising: positive, neutral, and negative.
- 9. The system according to claim 8, further comprising a signal pre-processing module and an activation module; the signal pre-processing module is configured to pre-process the current speech signal; and the activation module is configured to activate corresponding preset countermeasures according to the emotion type.
- 10. The system according to claim 8, wherein the emotion recognition module comprises: a first obtaining unit configured to obtain acoustic feature information and text feature information for emotion recognition according to the acoustic features and the text features, specifically: converting the acoustic features and the text features into corresponding vectors respectively, and inputting the vector corresponding to the acoustic features and the vector corresponding to the text features into convolutional neural networks respectively to obtain the acoustic feature information and text feature information for emotion recognition, the acoustic features comprising: fundamental frequency, duration, energy, and spectrum; a second obtaining unit configured to obtain K acoustic feature vectors according to the acoustic feature information, specifically: pooling the acoustic feature information to obtain the K acoustic feature vectors; and further configured to obtain K text feature vectors according to the K acoustic feature vectors and the text feature information, specifically: focusing the text feature information with an attention mechanism according to the average of the K acoustic feature vectors, and pooling the focused text feature information to obtain the K text feature vectors; and an emotion recognition unit configured to identify the emotion type of the current speech signal according to the K acoustic feature vectors, the K text feature vectors, and the predetermined depth model.
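The pipeline recited in claims 5–7 (pool acoustic features into K vectors, use their average to focus the text features with an attention mechanism, pool again, then classify) can be sketched as follows. This is a minimal NumPy illustration, not the patent's actual model: the chunked mean pooling, the dot-product softmax "focus", the linear classifier, and all shapes and names (`pool_to_k`, `focus`, `recognize`, K = 4, D = 8) are illustrative assumptions.

```python
import numpy as np

def pool_to_k(feats: np.ndarray, k: int) -> np.ndarray:
    """Pool a (T, D) feature matrix into K vectors by chunked mean pooling."""
    chunks = np.array_split(feats, k, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])   # (K, D)

def focus(text_feats: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Reweight text features with softmax attention driven by `query`
    (here, the average of the K acoustic feature vectors)."""
    scores = text_feats @ query                         # (T_t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return text_feats * weights[:, None]                # (T_t, D)

def recognize(acoustic: np.ndarray, text: np.ndarray, k: int,
              w: np.ndarray, b: np.ndarray) -> int:
    """Return a class index 0/1/2 standing in for positive/neutral/negative."""
    a_vecs = pool_to_k(acoustic, k)                     # K acoustic vectors
    query = a_vecs.mean(axis=0)                         # average of the K vectors
    t_vecs = pool_to_k(focus(text, query), k)           # K text vectors
    fused = np.concatenate([a_vecs, t_vecs]).ravel()    # (2*K*D,)
    logits = w @ fused + b                              # (3,) class scores
    return int(np.argmax(logits))

# Toy inputs standing in for CNN outputs over frame-level acoustic features
# (F0, duration, energy, spectrum) and word-level text features.
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(50, 8))
text = rng.normal(size=(12, 8))
k, d = 4, 8
w = rng.normal(size=(3, 2 * k * d))   # stand-in for the trained depth model
b = np.zeros(3)
label = recognize(acoustic, text, k, w, b)
```

In the patent, the final step is a predetermined depth model rather than the single linear layer used here; the sketch only shows how the K acoustic vectors gate which parts of the text feature sequence survive pooling.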
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810007403.7A CN107945790B (en) | 2018-01-03 | 2018-01-03 | Emotion recognition method and emotion recognition system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810007403.7A CN107945790B (en) | 2018-01-03 | 2018-01-03 | Emotion recognition method and emotion recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107945790A true CN107945790A (en) | 2018-04-20 |
CN107945790B CN107945790B (en) | 2021-01-26 |
Family
ID=61938328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810007403.7A Active CN107945790B (en) | 2018-01-03 | 2018-01-03 | Emotion recognition method and emotion recognition system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107945790B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108833722A (en) * | 2018-05-29 | 2018-11-16 | 平安科技(深圳)有限公司 | Speech recognition method, device, computer equipment and storage medium |
CN109192225A (en) * | 2018-09-28 | 2019-01-11 | 清华大学 | Method and device for speech emotion recognition and annotation |
CN109243490A (en) * | 2018-10-11 | 2019-01-18 | 平安科技(深圳)有限公司 | Driver's Emotion identification method and terminal device |
CN109410986A (en) * | 2018-11-21 | 2019-03-01 | 咪咕数字传媒有限公司 | A kind of Emotion identification method, apparatus and storage medium |
CN109741732A (en) * | 2018-08-30 | 2019-05-10 | 京东方科技集团股份有限公司 | Named entity recognition method, named entity recognition device, equipment and medium |
CN110047517A (en) * | 2019-04-24 | 2019-07-23 | 京东方科技集团股份有限公司 | Speech-emotion recognition method, answering method and computer equipment |
CN110473571A (en) * | 2019-07-26 | 2019-11-19 | 北京影谱科技股份有限公司 | Emotion identification method and device based on short video speech |
CN110600033A (en) * | 2019-08-26 | 2019-12-20 | 北京大米科技有限公司 | Learning condition evaluation method and device, storage medium and electronic equipment |
CN110660412A (en) * | 2018-06-28 | 2020-01-07 | Tcl集团股份有限公司 | Emotion guiding method and device and terminal equipment |
CN110728983A (en) * | 2018-07-16 | 2020-01-24 | 科大讯飞股份有限公司 | Information display method, device, equipment and readable storage medium |
CN111128189A (en) * | 2019-12-30 | 2020-05-08 | 秒针信息技术有限公司 | Warning information prompting method and device |
CN111354361A (en) * | 2018-12-21 | 2020-06-30 | 深圳市优必选科技有限公司 | Emotion communication method and system and robot |
US11810596B2 (en) | 2021-08-16 | 2023-11-07 | Hong Kong Applied Science and Technology Research Institute Company Limited | Apparatus and method for speech-emotion recognition with quantified emotional states |
CN110728983B (en) * | 2018-07-16 | 2024-04-30 | 科大讯飞股份有限公司 | Information display method, device, equipment and readable storage medium |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1391876A1 (en) * | 2002-08-14 | 2004-02-25 | Sony International (Europe) GmbH | Method of determining phonemes in spoken utterances suitable for recognizing emotions using voice quality features |
EP1429314A1 (en) * | 2002-12-13 | 2004-06-16 | Sony International (Europe) GmbH | Correction of energy as input feature for speech processing |
JP2005283647A (en) * | 2004-03-26 | 2005-10-13 | Matsushita Electric Ind Co Ltd | Feeling recognition device |
CN101894550A (en) * | 2010-07-19 | 2010-11-24 | 东南大学 | Speech emotion classifying method for emotion-based characteristic optimization |
US20130080169A1 (en) * | 2011-09-27 | 2013-03-28 | Fuji Xerox Co., Ltd. | Audio analysis system, audio analysis apparatus, audio analysis terminal |
CN104050965A (en) * | 2013-09-02 | 2014-09-17 | 广东外语外贸大学 | English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof |
KR20160116586A (en) * | 2015-03-30 | 2016-10-10 | 한국전자통신연구원 | Method and apparatus for emotion recognition |
CN106297826A (en) * | 2016-08-18 | 2017-01-04 | 竹间智能科技(上海)有限公司 | Speech emotional identification system and method |
WO2017048730A1 (en) * | 2015-09-14 | 2017-03-23 | Cogito Corporation | Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing devices |
US20170140757A1 (en) * | 2011-04-22 | 2017-05-18 | Angel A. Penilla | Methods and vehicles for processing voice commands and moderating vehicle response |
CN106782615A (en) * | 2016-12-20 | 2017-05-31 | 科大讯飞股份有限公司 | Speech data emotion detection method and apparatus and system |
CN107112006A (en) * | 2014-10-02 | 2017-08-29 | 微软技术许可有限责任公司 | Neural network based speech processing |
JP6213476B2 (en) * | 2012-10-31 | 2017-10-18 | 日本電気株式会社 | Dissatisfied conversation determination device and dissatisfied conversation determination method |
CN107516511A (en) * | 2016-06-13 | 2017-12-26 | 微软技术许可有限责任公司 | Text-to-speech learning system for intention recognition and emotion |
- 2018
- 2018-01-03 CN CN201810007403.7A patent/CN107945790B/en active Active
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1391876A1 (en) * | 2002-08-14 | 2004-02-25 | Sony International (Europe) GmbH | Method of determining phonemes in spoken utterances suitable for recognizing emotions using voice quality features |
EP1429314A1 (en) * | 2002-12-13 | 2004-06-16 | Sony International (Europe) GmbH | Correction of energy as input feature for speech processing |
JP2005283647A (en) * | 2004-03-26 | 2005-10-13 | Matsushita Electric Ind Co Ltd | Feeling recognition device |
CN101894550A (en) * | 2010-07-19 | 2010-11-24 | 东南大学 | Speech emotion classifying method for emotion-based characteristic optimization |
US20170140757A1 (en) * | 2011-04-22 | 2017-05-18 | Angel A. Penilla | Methods and vehicles for processing voice commands and moderating vehicle response |
US20130080169A1 (en) * | 2011-09-27 | 2013-03-28 | Fuji Xerox Co., Ltd. | Audio analysis system, audio analysis apparatus, audio analysis terminal |
JP6213476B2 (en) * | 2012-10-31 | 2017-10-18 | 日本電気株式会社 | Dissatisfied conversation determination device and dissatisfied conversation determination method |
CN104050965A (en) * | 2013-09-02 | 2014-09-17 | 广东外语外贸大学 | English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof |
CN107112006A (en) * | 2014-10-02 | 2017-08-29 | 微软技术许可有限责任公司 | Neural network based speech processing |
KR20160116586A (en) * | 2015-03-30 | 2016-10-10 | 한국전자통신연구원 | Method and apparatus for emotion recognition |
WO2017048730A1 (en) * | 2015-09-14 | 2017-03-23 | Cogito Corporation | Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing devices |
CN107516511A (en) * | 2016-06-13 | 2017-12-26 | 微软技术许可有限责任公司 | Text-to-speech learning system for intention recognition and emotion |
CN106297826A (en) * | 2016-08-18 | 2017-01-04 | 竹间智能科技(上海)有限公司 | Speech emotional identification system and method |
CN106782615A (en) * | 2016-12-20 | 2017-05-31 | 科大讯飞股份有限公司 | Speech data emotion detection method and apparatus and system |
Non-Patent Citations (3)
Title |
---|
DAVID GRIOL: "Combining speech-based and linguistic classifiers to recognize emotion in user spoken utterances", Neurocomputing *
ZHU CONGXIAN: "Research on speech emotion recognition methods based on deep learning", China Master's Theses Full-text Database, Information Science and Technology Series *
LI CHENGCHENG: "Research on text-speech coupled emotion recognition methods based on deep learning", China Master's Theses Full-text Database *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108833722B (en) * | 2018-05-29 | 2021-05-11 | 平安科技(深圳)有限公司 | Speech recognition method, speech recognition device, computer equipment and storage medium |
CN108833722A (en) * | 2018-05-29 | 2018-11-16 | 平安科技(深圳)有限公司 | Speech recognition method, device, computer equipment and storage medium |
CN110660412A (en) * | 2018-06-28 | 2020-01-07 | Tcl集团股份有限公司 | Emotion guiding method and device and terminal equipment |
CN110728983B (en) * | 2018-07-16 | 2024-04-30 | 科大讯飞股份有限公司 | Information display method, device, equipment and readable storage medium |
CN110728983A (en) * | 2018-07-16 | 2020-01-24 | 科大讯飞股份有限公司 | Information display method, device, equipment and readable storage medium |
CN109741732A (en) * | 2018-08-30 | 2019-05-10 | 京东方科技集团股份有限公司 | Named entity recognition method, named entity recognition device, equipment and medium |
CN109741732B (en) * | 2018-08-30 | 2022-06-21 | 京东方科技集团股份有限公司 | Named entity recognition method, named entity recognition device, equipment and medium |
WO2020043123A1 (en) * | 2018-08-30 | 2020-03-05 | 京东方科技集团股份有限公司 | Named-entity recognition method, named-entity recognition apparatus and device, and medium |
CN109192225A (en) * | 2018-09-28 | 2019-01-11 | 清华大学 | Method and device for speech emotion recognition and annotation |
CN109243490A (en) * | 2018-10-11 | 2019-01-18 | 平安科技(深圳)有限公司 | Driver's Emotion identification method and terminal device |
CN109410986A (en) * | 2018-11-21 | 2019-03-01 | 咪咕数字传媒有限公司 | A kind of Emotion identification method, apparatus and storage medium |
CN109410986B (en) * | 2018-11-21 | 2021-08-06 | 咪咕数字传媒有限公司 | Emotion recognition method and device and storage medium |
CN111354361A (en) * | 2018-12-21 | 2020-06-30 | 深圳市优必选科技有限公司 | Emotion communication method and system and robot |
CN110047517A (en) * | 2019-04-24 | 2019-07-23 | 京东方科技集团股份有限公司 | Speech-emotion recognition method, answering method and computer equipment |
CN110473571A (en) * | 2019-07-26 | 2019-11-19 | 北京影谱科技股份有限公司 | Emotion identification method and device based on short video speech |
CN110600033B (en) * | 2019-08-26 | 2022-04-05 | 北京大米科技有限公司 | Learning condition evaluation method and device, storage medium and electronic equipment |
CN110600033A (en) * | 2019-08-26 | 2019-12-20 | 北京大米科技有限公司 | Learning condition evaluation method and device, storage medium and electronic equipment |
CN111128189A (en) * | 2019-12-30 | 2020-05-08 | 秒针信息技术有限公司 | Warning information prompting method and device |
US11810596B2 (en) | 2021-08-16 | 2023-11-07 | Hong Kong Applied Science and Technology Research Institute Company Limited | Apparatus and method for speech-emotion recognition with quantified emotional states |
Also Published As
Publication number | Publication date |
---|---|
CN107945790B (en) | 2021-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107945790A (en) | A kind of emotion identification method and emotion recognition system | |
CN108564942B (en) | Voice emotion recognition method and system based on adjustable sensitivity | |
CN105096941B (en) | Audio recognition method and device | |
CN109256136B (en) | Voice recognition method and device | |
Nwe et al. | Speech based emotion classification | |
CN102723078B (en) | Emotion speech recognition method based on natural language comprehension | |
CN106504768B (en) | Phone testing audio frequency classification method and device based on artificial intelligence | |
CN108806667A (en) | Method for synchronous recognition of speech and emotion based on neural network |
CN110265040A (en) | Training method, device, storage medium and the electronic equipment of sound-groove model | |
CN107492382A (en) | Voiceprint extraction method and device based on neural network |
CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
CN104538043A (en) | Real-time emotion reminder for call | |
CN111260761B (en) | Method and device for generating mouth shape of animation character | |
Fan et al. | End-to-end post-filter for speech separation with deep attention fusion features | |
Samantaray et al. | A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages | |
CN111144367B (en) | Auxiliary semantic recognition method based on gesture recognition | |
CN108711429A (en) | Electronic equipment and apparatus control method | |
EP1280137B1 (en) | Method for speaker identification | |
CN109599094A (en) | The method of sound beauty and emotion modification | |
CN106653002A (en) | Literal live broadcasting method and platform | |
CN106297769B (en) | A kind of distinctive feature extracting method applied to languages identification | |
Sinha et al. | Acoustic-phonetic feature based dialect identification in Hindi Speech | |
CN110246518A (en) | Speech-emotion recognition method, device, system and storage medium based on more granularity sound state fusion features | |
CN114283820A (en) | Multi-character voice interaction method, electronic equipment and storage medium | |
Luong et al. | LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||