CN117153151A - Emotion recognition method based on user intonation - Google Patents
Emotion recognition method based on user intonation
- Publication number
- CN117153151A (application CN202311295316.3A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- user
- intonation
- current
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/081—Search algorithms, e.g. Baum-Welch or Viterbi
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application provides an emotion recognition method based on user intonation, which comprises the following steps: processing pre-acquired voice data to obtain text information; performing text emotion analysis on the text information according to a first text emotion recognition method to obtain a first user emotion; performing voice feature extraction on the voice data according to a preset voice feature extraction method to obtain voice feature information and intonation amplitude change state feature information; performing voice emotion analysis on the voice feature information according to a second voice emotion recognition method to obtain a second user emotion; and analyzing the first user emotion and the second user emotion, according to emotion processing logic and based on the intonation amplitude change state feature information and the voice feature information, to determine the current emotion of the user.
Description
Technical Field
The application relates to the technical field of emotion recognition, in particular to an emotion recognition method based on user intonation.
Background
With the rapid development of internet technology, artificial intelligence and human-computer interaction agents, such as voice assistants and intelligent customer service systems, have appeared in many fields of work, where they replace or simplify routine human tasks or provide more convenient human-computer interaction. Currently, speech emotion recognition generally proceeds as follows: speech content is first converted into text using speech recognition technology (Automatic Speech Recognition, ASR), and the speech emotion is then recognized by combining this text with text emotion analysis from natural language processing (NLP) technology to drive the human-machine interaction. In the prior art, emotion analysis is thus performed only on the text; however, besides the text content, voice feature information also plays an important role in emotion recognition. Analyzing only the text while ignoring the other voice feature information reduces the accuracy of speech emotion recognition. An emotion recognition method based on user intonation is therefore needed that recognizes emotion by combining text feature information with voice feature information.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides an emotion recognition method based on user intonation to solve the above problems.
An emotion recognition method based on user intonation comprises: processing pre-acquired voice data to obtain text information; performing text emotion analysis on the text information according to a first text emotion recognition method to obtain a first user emotion; performing voice feature extraction on the voice data according to a preset voice feature extraction method to obtain voice feature information and intonation amplitude change state feature information; performing voice emotion analysis on the voice feature information according to a second voice emotion recognition method to obtain a second user emotion; and analyzing the first user emotion and the second user emotion, according to emotion processing logic and based on the intonation amplitude change state feature information and the voice feature information, to determine the current emotion of the user.
As one embodiment of the present application, the voice feature information includes speech rate feature information, sound ray feature information, and intonation feature information.
As an embodiment of the present application, processing the pre-acquired voice data to obtain text information includes: recognizing the voice data as text data according to a pre-trained speech recognition model, and extracting text keywords from the text data to obtain a plurality of text keywords; and generating text information from the plurality of text keywords based on preset text auto-generation logic and continuity rules.
As an embodiment of the present application, performing voice emotion analysis on the voice feature information according to the second voice emotion recognition method to obtain the second user emotion includes: acquiring the speech rate feature information, and grading the speech rate of the current user based on a preset speech rate grading rule to generate a speech rate grade, wherein each speech rate grade corresponds to one emotional state of the user; acquiring the sound ray feature information, and recognizing the sound ray feature information based on a pre-trained sound ray recognition model to obtain a sound ray emotion state; acquiring the intonation feature information, and recognizing the intonation feature information based on a pre-trained intonation recognition model to obtain an intonation emotion state; and performing emotion matching in a predetermined emotion database according to the speech rate grade, the sound ray emotion state and the intonation emotion state to determine the second user emotion; wherein the emotion database comprises a plurality of emotions, each emotion corresponds to one speech rate grade, one sound ray emotion state and one intonation emotion state, and the combination of speech rate grade, sound ray emotion state and intonation emotion state for each emotion is unique.
According to one embodiment of the present application, according to emotion processing logic, analyzing and processing a first user emotion and a second user emotion based on intonation amplitude variation state feature information and voice feature information, determining a current emotion of a user includes: judging whether the user intonation amplitude value change state characteristic information changes within a preset time, if so, acquiring changed user second intonation characteristic information, judging whether the second intonation characteristic information is negative intonation characteristic information, if so, taking the second user emotion as a main emotion, combining the first user emotion as a secondary emotion, and determining the current emotion of the user based on emotion combination analysis logic; if the intonation amplitude change state characteristic information of the user is not changed within the preset time, judging whether the intonation characteristic information of the user is negative intonation characteristic information, if so, determining the current emotion of the user by taking the emotion of the second user as a main emotion and combining the emotion of the first user as a secondary emotion based on preset emotion combination analysis logic.
As an embodiment of the present application, the emotion recognition method based on user intonation further includes: if the user intonation feature information or the second intonation feature information is judged not to be negative intonation feature information, taking the first user emotion as the main emotion and the second user emotion as the secondary emotion, and determining the current emotion of the user based on a second preset emotion combination analysis logic.
As an embodiment of the present application, taking the second user emotion as the main emotion and the first user emotion as the secondary emotion, and determining the current emotion of the user based on the preset emotion combination analysis logic, includes: taking the second user emotion as the main emotion combined with the first user emotion as the secondary emotion to construct an emotion link relation; and selecting, from a preset emotion combination database, the emotion data corresponding to the emotion link relation as the current emotion of the user; wherein the preset emotion combination database comprises a plurality of different emotion link relations and emotion data corresponding one-to-one to each emotion link relation.
As an embodiment of the present application, the emotion recognition method based on user intonation further includes: executing a man-machine switching operation in real time, based on a preset intelligent telephone information recommendation method, according to the current emotion of the user.
As an embodiment of the present application, the emotion recognition method based on user intonation further includes: obtaining the current emotion of the user, constructing an emotion diffusion simulation scene, calculating the emotion diffusion matching degree between each currently idle operator and the current emotion of the user, and selecting the idle operator with the highest emotion diffusion matching degree to connect with the current user.
As an embodiment of the application, obtaining the current emotion of a user, constructing an emotion diffusion simulation scene, calculating the emotion diffusion matching degree between each currently idle operator and the current emotion of the user, and selecting the idle operator with the highest emotion diffusion matching degree to be connected with the current user, wherein the method comprises the following steps: carrying out emotion infection interaction data test on operators at preset intervals to obtain emotion infection interaction data of each operator when facing emotion of different users; acquiring the current emotion of a user to be simulated, and determining emotion infection interaction data matched with the current emotion of the user by a current idle operator based on the current emotion of the user; constructing an emotion diffusion simulation scene according to all the emotion infection interaction data, and calculating the emotion diffusion matching degree between each currently idle operator and the current emotion of the user based on the emotion diffusion simulation scene, wherein the higher the infection rejection capability of the emotion infection interaction data to the current emotion of the user is, the higher the corresponding emotion diffusion matching degree is; and selecting the idle operator with the highest emotion diffusion matching degree to connect with the current user.
The beneficial effects of the application are as follows:
1. The application comprehensively judges and identifies the current emotion of the user by collecting the user's speech text, intonation and intonation changes from the voice information and analyzing the sound ray, speech rate, intonation amplitude change and the like, and performs real-time intelligent telephone information recommendation according to the current emotion of the user; compared with judging from the text content alone, this achieves higher accuracy of speech emotion recognition.
2. By switching between machine and human interaction in real time according to the current emotion of the user, calls connected to the robot are redistributed to supervisory personnel for manual handling, which improves customer service satisfaction and preserves more business opportunities.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical scheme of the application is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and together with the embodiments of the application, serve to explain the application. In the drawings:
FIG. 1 is a flowchart of the emotion recognition method based on user intonation in an embodiment of the present application;
FIG. 2 is a flowchart of determining the second user emotion in the emotion recognition method based on user intonation according to an embodiment of the present application;
FIG. 3 is a second flowchart of the emotion recognition method based on user intonation according to an embodiment of the present application.
Detailed Description
The preferred embodiments of the present application will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present application only, and are not intended to limit the present application.
Referring to FIG. 1, an emotion recognition method based on user intonation includes: S101, processing pre-acquired voice data to obtain text information; S102, performing text emotion analysis on the text information according to a first text emotion recognition method to obtain a first user emotion; S103, performing voice feature extraction on the voice data according to a preset voice feature extraction method to obtain voice feature information and intonation amplitude change state feature information; S104, performing voice emotion analysis on the voice feature information according to a second voice emotion recognition method to obtain a second user emotion; S105, analyzing the first user emotion and the second user emotion, according to emotion processing logic and based on the intonation amplitude change state feature information and the voice feature information, to determine the current emotion of the user;
the working principle of the technical scheme is as follows: the application provides an emotion recognition method based on user intonation, which is preferably applied to customer service call scenarios. First, real-time sample information of a user voice call is acquired and processed, for example by denoising, to obtain the voice data; the pre-acquired voice data are then processed in real time with a text recognition method to obtain text information. After the text information is obtained, text emotion analysis is performed on it according to the first text emotion recognition method to obtain and store the first user emotion. Next, real-time voice feature extraction is performed on the voice data according to the preset voice feature extraction method to obtain the voice feature information and the intonation amplitude change state feature information, where the preset voice feature extraction method preferably performs feature extraction based on the mel spectrum. Voice emotion analysis is then performed on the voice feature information according to the second voice emotion recognition method to obtain and store the second user emotion. Finally, according to the emotion processing logic, the first user emotion and the second user emotion are analyzed based on the intonation amplitude change state feature information and the voice feature information to determine the current emotion of the user;
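A minimal sketch of the mel-spectrum feature extraction step described above, assuming the librosa library; the 64-band mel setting, the pitch-range heuristic for the intonation amplitude change state, and the helper name `extract_voice_features` are illustrative assumptions rather than part of the patent.

```python
import numpy as np
import librosa


def extract_voice_features(samples: np.ndarray, sr: int) -> dict:
    """One possible realisation of the preset voice feature extraction method."""
    # mel spectrum of the utterance (voice feature information)
    mel = librosa.feature.melspectrogram(y=samples, sr=sr, n_mels=64)
    # rough fundamental-frequency track used as an intonation proxy
    f0 = librosa.yin(samples, fmin=60, fmax=400, sr=sr)
    quarter = max(len(f0) // 4, 1)
    # crude "intonation amplitude change state": compare the pitch range at the
    # end of the call segment with the range at its beginning
    change = float(np.ptp(f0[-quarter:]) - np.ptp(f0[:quarter]))
    return {"mel": mel, "pitch_track": f0, "intonation_amplitude_change": change}
```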
the beneficial effects of the technical scheme are as follows: the application comprehensively judges and identifies the current emotion of the user by collecting the user's speech text, intonation and intonation changes from the voice information, using the sound ray, speech rate, intonation amplitude change and the like, and, by switching between machine and human interaction in real time according to the current emotion of the user, redistributes calls connected to the robot to supervisory personnel for manual handling.
In one embodiment, the voice feature information includes speech rate feature information, sound ray feature information, and intonation feature information;
the working principle and beneficial effects of the technical scheme are as follows: the voice feature information participates in analyzing the emotion of the user. A large amount of data shows that under emotions such as agitation and impatience, a user's speech rate shifts from its initial pace to a faster pace and the intonation feature information changes as well. Meanwhile, different voice feature information can, to a certain extent, represent the current emotion of the user: a sharp sound ray generally feels harsh and unpleasant and tends to express dissatisfaction, anger and similar emotions; a pleasant sound ray generally feels agreeable and comfortable and suits occasions that call for a sense of beauty, such as singing or recitation; a blurred sound ray generally feels ambiguous and unclear and suits the expression of confusion or uncertainty; a sad sound ray generally conveys sorrow and complaint and suits the expression of sadness, dejection and similar emotions. Many other sound ray characteristics exist and are not repeated here. Through this detailed definition of the voice feature information, the technical scheme improves the accuracy of emotion recognition on the voice data.
In one embodiment, processing pre-acquired voice data to obtain text information includes: recognizing the voice data as text data according to a pre-trained speech recognition model, and extracting text keywords from the text data to obtain a plurality of text keywords; and generating text information from the plurality of text keywords based on preset text auto-generation logic and continuity rules;
the working principle and beneficial effects of the technical scheme are as follows: the voice data are recognized as text data according to the pre-trained speech recognition model, and text keywords are extracted from the text data to obtain a plurality of text keywords, where a keyword is preferably any word that is not a meaningless filler word; text information is then generated from the plurality of text keywords based on the preset text auto-generation logic and continuity rules. It is worth noting that if the customer repeats certain words, these words are replaced according to a preset emotion-increment vocabulary based on the number of repetitions, which keeps the text information coherent, strengthens how well the text information expresses the user's emotion, and improves the efficiency of text information extraction.
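A minimal sketch of this keyword-to-text step, assuming a simple stop-word filter; the stop-word list, the `EMOTION_INCREMENT` table and the repetition threshold are hypothetical placeholders, not values given by the patent.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "um", "uh", "well"}                   # assumed filler words
EMOTION_INCREMENT = {"slow": "extremely slow", "bad": "really bad"}   # hypothetical vocabulary


def build_text_information(text_data: str) -> str:
    """Keep meaningful keywords and strengthen words the caller keeps repeating."""
    tokens = [t for t in text_data.lower().split() if t not in STOP_WORDS]
    counts = Counter(tokens)
    keywords = [
        EMOTION_INCREMENT.get(tok, tok) if counts[tok] > 2 else tok
        for tok in tokens
    ]
    # de-duplicate while preserving order so the generated text stays coherent
    return " ".join(dict.fromkeys(keywords))
```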
Referring to FIG. 2, in one embodiment, performing voice emotion analysis on the voice feature information according to the second voice emotion recognition method to obtain the second user emotion includes: S201, acquiring the speech rate feature information, and grading the speech rate of the current user based on a preset speech rate grading rule to generate a speech rate grade, wherein each speech rate grade corresponds to one emotional state of the user; S202, acquiring the sound ray feature information, and recognizing the sound ray feature information based on a pre-trained sound ray recognition model to obtain a sound ray emotion state; S203, acquiring the intonation feature information, and recognizing the intonation feature information based on a pre-trained intonation recognition model to obtain an intonation emotion state; S204, performing emotion matching in a predetermined emotion database according to the speech rate grade, the sound ray emotion state and the intonation emotion state to determine the second user emotion; wherein the emotion database comprises a plurality of emotions, each emotion corresponds to one speech rate grade, one sound ray emotion state and one intonation emotion state, and the combination of speech rate grade, sound ray emotion state and intonation emotion state for each emotion is unique;
the working principle of the technical scheme is as follows: the speech rate feature information of the user is acquired, and the speech rate grading rule is constructed from the speech rate feature information of a plurality of pre-sampled users in different states, where each speech rate grade in the preset grading rule corresponds to one emotional state of the user and is assigned a speech rate range; the speech rate of the current user is then graded according to the preset speech rate grading rule and the user's speech rate feature information to generate a speech rate grade. Furthermore, the collected speech rate feature information may instead be speech rate change information: the user's habitual speech rate is first determined, and the change information is derived from the difference between this habitual rate and the subsequent rate; on this basis, the preset grading rule is built from the sampled speech rate change information of a plurality of users in different states, and the subsequent operations proceed as before. Compared with using the raw speech rate feature information directly, this obtains each user's actual speech rate change relative to that user's own initial rate and determines the speech rate grade from that change, which improves the data accuracy of the speech rate grade. The sound ray feature information is acquired and recognized based on the pre-trained sound ray recognition model to obtain the sound ray emotion state; the emotional states corresponding to different sound ray characteristics have been described above and are not repeated here. The intonation feature information is acquired and recognized based on the pre-trained intonation recognition model to obtain the intonation emotion state; the sound ray recognition model and the intonation recognition model are preferably trained as deep learning models. Finally, emotion matching is performed in the predetermined emotion database according to the speech rate grade, the sound ray emotion state and the intonation emotion state to determine the second user emotion; the emotion database comprises a plurality of emotions, each emotion corresponds to one speech rate grade, one sound ray emotion state and one intonation emotion state, and the combination of these three for each emotion is unique;
the beneficial effects of the technical scheme are as follows: the technical scheme determines the user's speech rate grade, sound ray emotion state and intonation emotion state from the user's voice feature information and determines the second user emotion from these data, which improves how accurately the second user emotion expresses the user's actual emotion.
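A minimal sketch of the S204 lookup, treating the emotion database as a dictionary keyed by the unique (speech rate grade, sound ray emotion state, intonation emotion state) triple; the concrete grades, states and emotions in `EMOTION_DATABASE` are illustrative assumptions.

```python
# Each triple maps to exactly one emotion, so a plain dictionary lookup suffices.
EMOTION_DATABASE = {
    ("fast", "sharp", "agitated"): "angry",
    ("fast", "sad", "agitated"): "anxious",
    ("normal", "pleasant", "calm"): "content",
    ("slow", "sad", "calm"): "dejected",
}


def match_second_emotion(speech_rate_grade: str,
                         sound_ray_state: str,
                         intonation_state: str) -> str:
    key = (speech_rate_grade, sound_ray_state, intonation_state)
    return EMOTION_DATABASE.get(key, "neutral")   # fall back when no entry matches
```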
In one embodiment, according to emotion processing logic, analyzing the first user emotion and the second user emotion based on the intonation amplitude variation state feature information and the voice feature information, determining the current emotion of the user includes: judging whether the user intonation amplitude value change state characteristic information changes within a preset time, if so, acquiring changed user second intonation characteristic information, judging whether the second intonation characteristic information is negative intonation characteristic information, if so, taking the second user emotion as a main emotion, combining the first user emotion as a secondary emotion, and determining the current emotion of the user based on emotion combination analysis logic; if the intonation amplitude change state characteristic information of the user is not changed within the preset time, judging whether the intonation characteristic information of the user is negative intonation characteristic information, if so, determining the current emotion of the user by taking the emotion of the second user as a main emotion and combining the emotion of the first user as a secondary emotion based on preset emotion combination analysis logic;
the working principle of the technical scheme is as follows: analysis of a large number of collected samples shows that, in normal circumstances, the user's emotional state changes when the intonation changes suddenly. Based on this, the first user emotion and the second user emotion are analyzed, according to the emotion processing logic, using the intonation amplitude change state feature information and the voice feature information, to determine the current emotion of the user. Specifically, it is judged whether the user's intonation amplitude change state feature information changes within a preset time; if it does, the changed second intonation feature information of the user is acquired and it is judged whether this second intonation feature information is negative intonation feature information, where negative intonation feature information includes intonation such as sadness, vigilance and mania; if it is, the second user emotion is taken as the main emotion combined with the first user emotion as the secondary emotion, and the current emotion of the user is determined based on the emotion combination analysis logic. If the user's intonation amplitude change state feature information does not change within the preset time, it is judged whether the user's intonation feature information is negative intonation feature information; if it is, the second user emotion is taken as the main emotion combined with the first user emotion as the secondary emotion, and the current emotion of the user is determined based on the preset emotion combination analysis logic;
the beneficial effects of the technical scheme are as follows: through the technical scheme, the current emotion state of the user is comprehensively judged by combining the emotion of the first user and the emotion of the second user, so that the emotion judgment accuracy of the user is improved.
In one embodiment, the emotion recognition method based on user intonation further includes: if the user intonation feature information or the second intonation feature information is judged not to be negative intonation feature information, taking the first user emotion as the main emotion and the second user emotion as the secondary emotion, and determining the current emotion of the user based on a second preset emotion combination analysis logic;
the beneficial effects of the technical scheme are as follows: by the technical scheme, the judgment path of the emotion state of the user is expanded, and the emotion judgment accuracy of the user is improved.
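A minimal sketch combining this branch with the preceding embodiment, assuming a hypothetical set of negative intonation labels; only which emotion becomes the main one is decided here, and the combination lookup happens afterwards.

```python
NEGATIVE_INTONATIONS = {"sad", "vigilant", "manic"}   # illustrative negative intonation labels


def choose_main_and_secondary(first_user_emotion: str,
                              second_user_emotion: str,
                              intonation_state: str,
                              changed_intonation_state=None):
    """Return (main_emotion, secondary_emotion) per the decision logic above."""
    if changed_intonation_state is not None:
        # intonation amplitude changed within the preset time window
        is_negative = changed_intonation_state in NEGATIVE_INTONATIONS
    else:
        is_negative = intonation_state in NEGATIVE_INTONATIONS
    if is_negative:
        return second_user_emotion, first_user_emotion   # voice emotion dominates
    return first_user_emotion, second_user_emotion       # text emotion dominates
```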
In one embodiment, with the second user emotion as a primary emotion and the first user emotion as a secondary emotion, determining the current emotion of the user based on the preset emotion and the analysis logic includes: taking the emotion of the second user as a main emotion and combining the emotion of the first user as a secondary emotion to construct an emotion link relation; selecting emotion data corresponding to the emotion link relation in a preset emotion combination database as the current emotion of the user; the preset emotion combination database comprises a plurality of different emotion link relations and emotion data corresponding to each emotion link relation one by one;
the working principle of the technical scheme is as follows: the second user emotion is taken as the main emotion combined with the first user emotion as the secondary emotion to construct an emotion link relation; for example, if A is the main emotion and B is the secondary emotion, the emotion link relation is A-B. The emotion data corresponding to the emotion link relation is then selected from the preset emotion combination database as the current emotion of the user, where the preset emotion combination database comprises a plurality of different emotion link relations and emotion data corresponding one-to-one to each emotion link relation. For example, if the user's emotional state is C, each entry in the preset emotion combination database is logically expressed as A-B-C, so once A-B is determined, the C matching A-B can be looked up in the database. It is worth noting that the second preset emotion combination analysis logic is similar to the preset emotion combination analysis logic and shares the same preset emotion combination database;
the beneficial effects of the technical scheme are as follows: through the technical scheme, the current emotion state of the user is comprehensively judged by combining the emotion of the first user and the emotion of the second user, so that the emotion judgment accuracy of the user is improved.
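A minimal sketch of the A-B to C lookup described above; the emotion names in `EMOTION_COMBINATION_DB` are illustrative assumptions.

```python
EMOTION_COMBINATION_DB = {
    ("angry", "dissatisfied"): "furious",
    ("anxious", "confused"): "worried",
    ("content", "satisfied"): "pleased",
}


def current_emotion_from_link(main_emotion: str, secondary_emotion: str) -> str:
    link = (main_emotion, secondary_emotion)                # the A-B emotion link relation
    return EMOTION_COMBINATION_DB.get(link, main_emotion)   # fall back to the main emotion
```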
In one embodiment, the emotion recognition method based on user intonation further includes: executing a man-machine switching operation in real time, based on a preset intelligent telephone information recommendation method, according to the current emotion of the user;
the working principle and beneficial effects of the technical scheme are as follows: this embodiment is preferably suited to telephone customer service; when the intelligent customer service cannot solve the user's problem, the man-machine switching operation is executed in real time, based on the preset intelligent telephone information recommendation method, according to the current emotion of the user, which improves the user's service experience.
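A minimal sketch of the real-time switching trigger, assuming a hypothetical set of handover emotions and a telephony callback `transfer_to_human`; the patent does not specify either.

```python
HANDOVER_EMOTIONS = {"furious", "worried", "dejected"}   # assumed emotions that trigger handover


def maybe_switch_to_human(current_emotion: str, transfer_to_human) -> bool:
    """Hand the call to a human operator when the recognised emotion warrants it."""
    if current_emotion in HANDOVER_EMOTIONS:
        transfer_to_human()   # callback into the telephony system (assumed interface)
        return True
    return False
```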
In one embodiment, the emotion recognition method based on user intonation further includes: obtaining the current emotion of the user, constructing an emotion diffusion simulation scene, calculating the emotion diffusion matching degree between each currently idle operator and the current emotion of the user, and selecting the idle operator with the highest emotion diffusion matching degree to connect with the current user;
the beneficial effects of the technical scheme are as follows: by selecting a suitable operator in this way, the operator is prevented from being infected by the user's emotion and then acting unreasonably, which improves user satisfaction.
Referring to fig. 3, in one embodiment, obtaining a current emotion of a user, constructing an emotion diffusion simulation scene, calculating an emotion diffusion matching degree between each currently idle operator and the current emotion of the user, and selecting an idle operator with the highest emotion diffusion matching degree to connect with the current user, including: s301, carrying out emotion infection interaction data test on operators at preset time intervals in advance to obtain emotion infection interaction data of each operator when facing emotion of different users; s302, acquiring a current emotion of a user to be simulated, and determining emotion infection interaction data of a current idle operator matched with the current emotion of the user based on the current emotion of the user; s303, constructing an emotion diffusion simulation scene according to all emotion infection interaction data, and calculating the emotion diffusion matching degree between each currently idle operator and the current emotion of the user based on the emotion diffusion simulation scene, wherein the higher the infection rejection capability of the emotion infection interaction data to the current emotion of the user is, the higher the corresponding emotion diffusion matching degree is; s304, selecting an idle operator with highest emotion diffusion matching degree to connect with the current user;
the working principle of the technical scheme is as follows: before the emotion diffusion simulation scene is constructed, emotion infection interaction data tests are carried out on all operators in advance at preset time intervals. The test predicts the emotion infection data each operator experiences when facing users who voice different emotions, and finally yields the emotion infection interaction data of each operator for each type of user emotion. The preset interval is preferably once per quarter; it is worth noting that a new operator takes the test once upon joining and then follows the next unified test, which keeps the test data uniform and improves data management efficiency. When the system adapted to the application detects that the user's emotion meets the condition for switching to manual customer service, the current emotion of the user to be simulated is acquired, namely the user emotion that meets the switching condition. The emotion infection interaction data of the currently idle operators matching the current user emotion are then determined. Specifically, the emotion infection interaction data of all currently idle operators are determined based on the current user emotion; the emotion infection interaction data contain a blacklist of each operator's unqualified data, where unqualified data preferably refer to data showing that the operator is easily affected by a certain user emotion and is therefore prone to unreasonable operations. For example, if an operator's service attitude often fluctuates, and fluctuates widely, when facing emotion D, then emotion D is added to the blacklist of that operator's emotion infection interaction data, so users who subsequently present emotion D are not assigned to that operator, i.e., the matching is unsuccessful. An emotion diffusion simulation scene is then constructed from all the successfully matched emotion infection interaction data. When the simulation scene is constructed, a virtual two-person dialogue scene is preferably built: in this virtual scene, a preset user emotion word stock is matched based on the user emotion and the user's speech is simulated with an AI voice interaction mode, a voice interaction model trained with a neural network is generated from each operator's emotion infection interaction data, and the simulated user speech interacts with the voice interaction model to complete the construction of the virtual scene; many other ways of constructing the simulation scene exist and are not repeated here. The emotion diffusion matching degree between each currently idle operator and the current user emotion is calculated based on the emotion diffusion simulation scene: the higher the infection rejection capability of the emotion infection interaction data against the current user emotion, the higher the corresponding emotion diffusion matching degree; that is, the more stable the text emotion state of the interaction text produced by the operator's voice interaction model remains throughout the simulation, or the more it tends toward a preset range of good customer-service emotional attitude, the higher the infection rejection capability against the current user emotion. Finally, the idle operator with the highest emotion diffusion matching degree is selected to connect with the current user;
the beneficial effects of the technical scheme are as follows: by selecting a suitable operator in this way, the technical scheme effectively reduces the probability of verbal conflict between operators and customers; reducing the number of such conflicts in turn reduces the probability of operator burnout, prevents operators from being infected by the user's emotion and acting unreasonably, effectively prevents a drop in user satisfaction, and improves the level of telephone customer service.
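A minimal sketch of the operator selection step, assuming each idle operator record carries a per-emotion rejection score derived from the simulation and a blacklist of emotions it must not be matched with; the data layout is an assumption for illustration.

```python
def select_operator(user_emotion: str, idle_operators: list):
    """Pick the idle operator with the highest emotion diffusion matching degree."""
    best, best_score = None, float("-inf")
    for op in idle_operators:
        if user_emotion in op.get("blacklist", set()):
            continue                                    # matching deemed unsuccessful
        # matching degree grows with the operator's infection rejection capability
        score = op["rejection_scores"].get(user_emotion, 0.0)
        if score > best_score:
            best, best_score = op, score
    return best
```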
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (10)
1. An emotion recognition method based on user intonation, comprising: processing pre-acquired voice data to obtain text information; performing text emotion analysis on the text information according to a first text emotion recognition method to obtain a first user emotion; performing voice feature extraction on the voice data according to a preset voice feature extraction method to obtain voice feature information and intonation amplitude change state feature information; performing voice emotion analysis on the voice feature information according to a second voice emotion recognition method to obtain a second user emotion; and analyzing the first user emotion and the second user emotion, according to emotion processing logic and based on the intonation amplitude change state feature information and the voice feature information, to determine the current emotion of the user.
2. The emotion recognition method based on user intonation according to claim 1, wherein the voice feature information includes speech rate feature information, sound ray feature information, and intonation feature information.
3. The emotion recognition method based on user intonation according to claim 1, wherein processing the pre-acquired voice data to obtain text information comprises: recognizing the voice data as text data according to a pre-trained speech recognition model, and extracting text keywords from the text data to obtain a plurality of text keywords; and generating text information from the plurality of text keywords based on preset text auto-generation logic and continuity rules.
4. The emotion recognition method based on user intonation according to claim 1, wherein performing voice emotion analysis on the voice feature information according to the second voice emotion recognition method to obtain the second user emotion comprises: acquiring the speech rate feature information, and grading the speech rate of the current user based on a preset speech rate grading rule to generate a speech rate grade, wherein each speech rate grade corresponds to one emotional state of the user; acquiring the sound ray feature information, and recognizing the sound ray feature information based on a pre-trained sound ray recognition model to obtain a sound ray emotion state; acquiring the intonation feature information, and recognizing the intonation feature information based on a pre-trained intonation recognition model to obtain an intonation emotion state; and performing emotion matching in a predetermined emotion database according to the speech rate grade, the sound ray emotion state and the intonation emotion state to determine the second user emotion; wherein the emotion database comprises a plurality of emotions, each emotion corresponds to one speech rate grade, one sound ray emotion state and one intonation emotion state, and the combination of speech rate grade, sound ray emotion state and intonation emotion state for each emotion is unique.
5. The method of claim 1, wherein the analyzing the first user emotion and the second user emotion based on the intonation magnitude change state feature information and the voice feature information according to the emotion processing logic, and determining the current emotion of the user comprises: judging whether the user intonation amplitude value change state characteristic information changes within a preset time, if so, acquiring changed user second intonation characteristic information, judging whether the second intonation characteristic information is negative intonation characteristic information, if so, taking the second user emotion as a main emotion, combining the first user emotion as a secondary emotion, and determining the current emotion of the user based on emotion combination analysis logic; if the intonation amplitude change state characteristic information of the user is not changed within the preset time, judging whether the intonation characteristic information of the user is negative intonation characteristic information, if so, determining the current emotion of the user by taking the emotion of the second user as a main emotion and combining the emotion of the first user as a secondary emotion based on preset emotion combination analysis logic.
6. The user intonation-based emotion recognition method of claim 5, further comprising: if the intonation feature information or the second intonation feature information of the user is judged to be not the negative intonation feature information, the first user emotion is taken as a main emotion, the second user emotion is taken as a secondary emotion, and the current emotion of the user is determined based on a second preset emotion combination analysis logic.
7. The method of claim 5, wherein determining the current emotion of the user based on the preset emotion-combined analysis logic by using the second emotion as the primary emotion and the first emotion as the secondary emotion comprises: taking the emotion of the second user as a main emotion and combining the emotion of the first user as a secondary emotion to construct an emotion link relation; selecting emotion data corresponding to the emotion link relation in a preset emotion combination database as the current emotion of the user; the preset emotion combination database comprises a plurality of different emotion link relations and emotion data corresponding to each emotion link relation one by one.
8. The user intonation-based emotion recognition method of claim 1, further comprising: and executing man-machine switching operation in real time based on a preset intelligent telephone information recommendation method according to the current emotion of the user.
9. The user intonation-based emotion recognition method of claim 8, further comprising: the current emotion of the user is obtained, an emotion diffusion simulation scene is constructed, the emotion diffusion matching degree between each currently idle operator and the current emotion of the user is calculated, and the idle operator with the highest emotion diffusion matching degree is selected to be connected with the current user.
10. The emotion recognition method based on user intonation according to claim 9, wherein obtaining the current emotion of the user, constructing an emotion diffusion simulation scene, calculating the emotion diffusion matching degree between each currently free operator and the current emotion of the user, and selecting the free operator with the highest emotion diffusion matching degree to connect with the current user, comprising: carrying out emotion infection interaction data test on operators at preset intervals to obtain emotion infection interaction data of each operator when facing emotion of different users; acquiring the current emotion of a user to be simulated, and determining emotion infection interaction data matched with the current emotion of the user by a current idle operator based on the current emotion of the user; constructing an emotion diffusion simulation scene according to all the emotion infection interaction data, and calculating the emotion diffusion matching degree between each currently idle operator and the current emotion of the user based on the emotion diffusion simulation scene, wherein the higher the infection rejection capability of the emotion infection interaction data to the current emotion of the user is, the higher the corresponding emotion diffusion matching degree is; and selecting the idle operator with the highest emotion diffusion matching degree to connect with the current user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311295316.3A CN117153151B (en) | 2023-10-09 | 2023-10-09 | Emotion recognition method based on user intonation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311295316.3A CN117153151B (en) | 2023-10-09 | 2023-10-09 | Emotion recognition method based on user intonation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117153151A true CN117153151A (en) | 2023-12-01 |
CN117153151B CN117153151B (en) | 2024-05-07 |
Family
ID=88902701
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311295316.3A Active CN117153151B (en) | 2023-10-09 | 2023-10-09 | Emotion recognition method based on user intonation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117153151B (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA1163480A (en) * | 1981-10-15 | 1984-03-13 | Michael G. Lee | Control system for audio-visual projector |
CA2107075A1 (en) * | 1991-04-01 | 1992-10-02 | Amos E. Holt | Audible techniques for the perception of non-destructive evaluation information |
EP1286329A1 (en) * | 2001-08-23 | 2003-02-26 | Culturecom Technology (Macau) Ltd. | Method and system for phonetic recognition |
CN101685634A (en) * | 2008-09-27 | 2010-03-31 | 上海盛淘智能科技有限公司 | Children speech emotion recognition method |
US8897437B1 (en) * | 2013-01-08 | 2014-11-25 | Prosodica, LLC | Method and system for improving call-participant behavior through game mechanics |
CN109129467A (en) * | 2018-07-27 | 2019-01-04 | 南京阿凡达机器人科技有限公司 | A kind of robot interactive method and system based on cognition |
WO2020135194A1 (en) * | 2018-12-26 | 2020-07-02 | 深圳Tcl新技术有限公司 | Emotion engine technology-based voice interaction method, smart terminal, and storage medium |
WO2021008025A1 (en) * | 2019-07-18 | 2021-01-21 | 平安科技(深圳)有限公司 | Speech recognition-based information analysis method and apparatus, and computer device |
CN110246519A (en) * | 2019-07-25 | 2019-09-17 | 深圳智慧林网络科技有限公司 | Emotion identification method, equipment and computer readable storage medium |
US20210050033A1 (en) * | 2019-08-16 | 2021-02-18 | Adobe Inc. | Utilizing bi-directional recurrent encoders with multi-hop attention for speech emotion recognition |
US20230048098A1 (en) * | 2021-08-16 | 2023-02-16 | Hong Kong Applied Science and Technology Research Institute Company Limited | Apparatus and method for speech-emotion recognition with quantified emotional states |
Non-Patent Citations (2)
Title |
---|
王大伟;: "基于语音分析技术的电力客户服务质量检测与分析探究", 电子测试, no. 05, 25 March 2014 (2014-03-25) * |
高慧, 苏广川, 陈善广: "不同情绪状态下汉语语音的声学特征分析", 航天医学与医学工程, no. 05, 15 October 2005 (2005-10-15) * |
Also Published As
Publication number | Publication date |
---|---|
CN117153151B (en) | 2024-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220044679A1 (en) | Speech communication system and method with human-machine coordination | |
CN105723362B (en) | Naturally processing method, processing and response method, equipment and system are expressed | |
CN109543020B (en) | Query processing method and system | |
CN107886949A (en) | A kind of content recommendation method and device | |
CN110942229A (en) | Service quality evaluation method and device, electronic equipment and storage medium | |
CN113239147B (en) | Intelligent session method, system and medium based on graph neural network | |
CN109313892A (en) | Steady language identification method and system | |
CN110610705A (en) | Voice interaction prompter based on artificial intelligence | |
US20060259294A1 (en) | Voice recognition system and method | |
CN105808721A (en) | Data mining based customer service content analysis method and system | |
CN111899140A (en) | Customer service training method and system based on dialect level improvement | |
CN109545202A (en) | Method and system for adjusting corpus with semantic logic confusion | |
CN110674276B (en) | Robot self-learning method, robot terminal, device and readable storage medium | |
CN113505606B (en) | Training information acquisition method and device, electronic equipment and storage medium | |
CN117153151B (en) | Emotion recognition method based on user intonation | |
EP3908941A1 (en) | Artificial intelligence system for business processes | |
US20220324460A1 (en) | Information output system, server device, and information output method | |
CN110765242A (en) | Method, device and system for providing customer service information | |
US11196864B1 (en) | Analyzing voice response to telephone call to assign appropriate agent | |
CN114862420A (en) | Identity recognition method, device, program product, medium and equipment | |
CN114239565A (en) | Deep learning-based emotion reason identification method and system | |
CN114446325A (en) | Information pushing method and device based on emotion recognition, computer equipment and medium | |
CN114372476A (en) | Semantic truncation detection method, device and equipment and computer readable storage medium | |
CN112908296A (en) | Dialect identification method | |
CN111221950B (en) | Method and device for analyzing weak emotion of user |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||