CN110751943A - Voice emotion recognition method and device and related equipment - Google Patents
- Publication number
- CN110751943A CN110751943A CN201911082413.8A CN201911082413A CN110751943A CN 110751943 A CN110751943 A CN 110751943A CN 201911082413 A CN201911082413 A CN 201911082413A CN 110751943 A CN110751943 A CN 110751943A
- Authority
- CN
- China
- Prior art keywords
- information
- emotion
- speech
- voice
- incoming call
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services or time announcements
- H04M3/493—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4936—Speech interaction details
Abstract
The application discloses a speech emotion recognition method. The method comprises: conducting an intelligent dialogue with a call request terminal according to a received call request to obtain incoming call voice information; performing text conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion features; performing feature extraction on the incoming call voice information to obtain tone features; integrating the emotion features and the tone features to obtain combined feature words; matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature words; and outputting the emotion category. The speech emotion recognition method can acquire more effective data information from an incoming call, makes it easier for a user to handle the call according to the feedback information, meets the user's demand for intelligent telephone communication services, and thereby improves the user experience. The application also discloses a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium, all of which have the above beneficial effects.
Description
Technical Field
The present application relates to the field of communications technologies, and in particular, to a speech emotion recognition method, a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium.
Background
Communication technology develops rapidly, and smart phones are continually updated with ever more functions. Meanwhile, as users' pace of life accelerates, their expectations of communication services grow: they want more professional, personalized, and intelligent communication services. However, existing telephone services provide only simple functions such as caller ID display and harassment-call interception; they cannot obtain other, more useful data that would effectively help a user decide, based on the actual situation, whether to answer or hang up an incoming call.
Therefore, how to acquire more effective data information from an incoming call, so that the user can handle the call according to the feedback information and the user's demand for intelligent telephone communication services is met, thereby improving the user experience, is a problem to be urgently solved by those skilled in the art.
Disclosure of Invention
An object of the present application is to provide a speech emotion recognition method that can acquire more effective data information from an incoming call, makes it easier for a user to handle the call according to the feedback information, meets the user's demand for intelligent telephone communication services, and thereby improves the user experience. Another object of the present application is to provide a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium, which have the same advantages.
In order to solve the technical problem, the present application provides a speech emotion recognition method, including:
carrying out intelligent conversation with a call request terminal according to a received call request to obtain incoming call voice information;
performing character conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion characteristics;
extracting the characteristics of the incoming call voice information to obtain tone characteristics;
integrating the emotion features and the tone features to obtain combined feature words;
matching in a preset emotion information base to obtain an emotion category corresponding to the combined feature word;
outputting the emotion classification.
Preferably, the performing an intelligent dialogue with the call request terminal according to the received call request, and obtaining the incoming call voice information includes:
obtaining voice information according to the call request;
performing character conversion on the voice information to obtain text conversion information;
performing feature extraction on the text conversion information to obtain text features;
matching and obtaining problem information corresponding to the text features in a preset problem knowledge base;
matching and obtaining answer information corresponding to the question information in a preset answer knowledge base;
carrying out voice conversion on the answer information to obtain voice reply information;
feeding back the voice reply information to the call request terminal to realize the intelligent conversation;
and counting all voice information in the intelligent conversation process to obtain the incoming call voice information.
Preferably, the performing feature extraction on the text conversion information to obtain text features includes:
performing word segmentation processing and labeling processing on the text conversion information to obtain processed text conversion information;
and extracting the characteristics of the processed text conversion information by using a preset language model to obtain the text characteristics.
Preferably, the speech emotion recognition method further includes:
recording the text conversion information, the question information and the answer information to generate a question-answer record;
and updating the preset question knowledge base and the preset answer knowledge base according to the question and answer records.
Preferably, the speech emotion recognition method further includes:
and when no emotion category corresponding to the combined feature words can be matched in the preset emotion information base, creating a new emotion category according to the combined feature words and outputting the new emotion category.
Preferably, the speech emotion recognition method further includes:
and adding the new emotion type into the preset emotion information base to update the preset emotion information base.
Preferably, the speech emotion recognition method further includes:
generating suggestion information for answering the call according to the emotion category;
and outputting the suggestion information.
Preferably, the speech emotion recognition method further includes:
and carrying out voice playing on the suggestion information through a broadcaster.
Preferably, the speech emotion recognition method further includes:
and adjusting the tone mode of the intelligent conversation according to the emotion category.
In order to solve the above technical problem, the present application further provides a speech emotion recognition apparatus, which includes:
the intelligent dialogue module is used for carrying out intelligent dialogue with a call request terminal according to the received call request to obtain incoming call voice information;
the first feature extraction module is used for performing character conversion on the incoming call voice information to obtain text information and performing emotion analysis on the text information to obtain emotion features;
the second feature extraction module is used for extracting features of the incoming call voice information to obtain tone features;
the characteristic integration module is used for integrating the emotional characteristics and the tone characteristics to obtain combined characteristic words;
the semantic matching module is used for matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature word;
and the information output module is used for outputting the emotion categories.
In order to solve the above technical problem, the present application further provides an electronic device for speech emotion recognition, where the electronic device for speech emotion recognition includes:
a memory for storing a computer program;
a processor for implementing the steps of any one of the above speech emotion recognition methods when executing the computer program.
Preferably, the electronic device for speech emotion recognition further includes:
and the display is used for displaying the identity category of the calling request terminal.
In order to solve the above technical problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above speech emotion recognition methods.
The speech emotion recognition method comprises: conducting an intelligent dialogue with a call request terminal according to a received call request to obtain incoming call voice information; performing text conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion features; performing feature extraction on the incoming call voice information to obtain tone features; integrating the emotion features and the tone features to obtain combined feature words; matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature words; and outputting the emotion category.
It can be seen that, on receiving a call request, the speech emotion recognition method provided by the present application first conducts an intelligent dialogue with the call request terminal to obtain the incoming call voice information. It then converts the incoming call voice information into text information and performs emotion analysis on the text to obtain emotion features, while also performing feature extraction on the incoming call voice information to obtain tone features; that is, the speech and the text are processed separately to obtain their respective feature information. The features are then combined into combined feature words, and finally the emotion category of the calling party is determined by matching in a preset emotion information base and output as a call reminder. In this way, more effective data information can be obtained from the incoming call, the calling party's emotional state can be determined, and the user can more conveniently handle the call according to the feedback information, which largely meets the user's demand for intelligent telephone communication services and further improves the user experience.
The speech emotion recognition device, the electronic device and the computer-readable storage medium provided by the application all have the beneficial effects, and are not repeated herein.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a speech emotion recognition method provided in the present application;
FIG. 2 is a block diagram of a speech emotion recognition system provided in the present application;
FIG. 3 is a block diagram of an intelligent question-answering module in the speech emotion recognition system provided by the present application;
fig. 4 is a schematic structural diagram of a speech emotion recognition apparatus provided in the present application;
fig. 5 is a schematic structural diagram of an electronic device for speech emotion recognition provided by the present application.
Detailed Description
The core of the application is to provide a voice emotion recognition method, the voice emotion recognition method can acquire more effective data information according to the incoming call information, is more convenient for a user to process the incoming call according to feedback information, meets the intelligent requirement of the user on telephone communication service, and further improves the user experience; another core of the present application is to provide a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium, which also have the above beneficial effects.
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. It is obvious that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from the given embodiments without creative effort fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flowchart of a speech emotion recognition method provided in the present application, where the speech emotion recognition method may include:
s101: carrying out intelligent conversation with a call request terminal according to a received call request to obtain incoming call voice information;
This step aims to acquire the incoming call voice information. Specifically, a caller can initiate a call request to a called terminal via the call request terminal, and the called terminal conducts an intelligent dialogue with the call request terminal according to the call request, thereby obtaining all the voice information sent by the call request terminal during the dialogue, i.e., the incoming call voice information.
Preferably, the performing an intelligent dialogue with the call request terminal according to the received call request, and obtaining the incoming call voice information may include: acquiring voice information according to the call request; performing character conversion on the voice information to obtain text conversion information; performing feature extraction on the text conversion information to obtain text features; matching and obtaining problem information corresponding to the text features in a preset problem knowledge base; matching and obtaining answer information corresponding to the question information in a preset answer knowledge base; carrying out voice conversion on the answer information to obtain voice reply information; the voice reply information is fed back to the call request end to realize intelligent conversation; and counting all voice information in the intelligent conversation process to obtain the incoming call voice information.
For the above intelligent-dialogue process, this preferred embodiment provides a more specific implementation. The preset question-answer knowledge base may comprise a preset question knowledge base and a preset answer knowledge base. The preset question knowledge base stores a plurality of standard question entries configured at the factory or by the user; the preset answer knowledge base stores a plurality of standard answer entries configured at the factory or by the user. It can be understood that the answer information in the preset answer knowledge base corresponds one-to-one with the question information in the preset question knowledge base. The specific implementation process is as follows:
Firstly, the call is answered according to the call request, voice information is obtained, and text conversion is performed on the voice information to obtain text conversion information; this can be realized by a corresponding speech recognition model, such as PyTorch-Kaldi (the PyTorch-Kaldi Speech Recognition Toolkit). Secondly, feature extraction is performed on the text conversion information using a pre-established language model to obtain text features; the preset language model may specifically be XLNet (Generalized Autoregressive Pretraining for Language Understanding). Then, semantic matching of the text features is performed in the preset question knowledge base through a preset semantic matching model, such as DSSM (Deep Structured Semantic Model), to retrieve the standard question information corresponding to the voice information sent by the call request terminal, and semantic matching of the question information is performed in the preset answer knowledge base to retrieve the standard answer information corresponding to the question information. Further, speech conversion is performed on the answer information through a preset speech synthesis model to obtain the voice reply information; the speech synthesis model may specifically be a WaveNet (A Generative Model for Raw Audio) model, a ClariNet (Parallel Wave Generation in End-to-End Text-to-Speech) model, or the like. Finally, the voice reply information is fed back to the call request terminal to realize the intelligent dialogue. The voice information sent by the call request terminal during the whole dialogue can thus be aggregated to obtain the incoming call voice information.
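The question-answer loop described above can be sketched as follows. This is a minimal illustration of the control flow only: the function names and the in-memory knowledge bases are assumptions, standing in as stubs for the ASR, semantic-matching, and TTS models named in the text.

```python
# Minimal sketch of the intelligent-dialogue loop (S101). A real system
# would back these stubs with an ASR model (e.g. PyTorch-Kaldi), a
# semantic matcher (e.g. DSSM), and a TTS model (e.g. WaveNet); here
# simple dicts stand in so the control flow is visible.

PRESET_QUESTIONS = {            # text feature -> standard question (assumed)
    "package balance": "What is my remaining data balance?",
}
PRESET_ANSWERS = {              # standard question -> standard answer (assumed)
    "What is my remaining data balance?": "You have 2 GB remaining.",
}

def speech_to_text(audio):
    return audio["transcript"]            # stub: pretend ASR output

def extract_text_features(text):
    return "package balance" if "balance" in text else text

def match_question(features):
    return PRESET_QUESTIONS.get(features)

def match_answer(question):
    return PRESET_ANSWERS.get(question)

def text_to_speech(text):
    return {"waveform": None, "transcript": text}   # stub TTS

def dialogue_turn(incoming_audio, call_log):
    """One turn of the dialogue; every caller utterance is logged so all
    incoming speech can later be aggregated into the call voice info."""
    text = speech_to_text(incoming_audio)
    call_log.append(text)
    question = match_question(extract_text_features(text))
    answer = match_answer(question) if question else "Could you rephrase that?"
    return text_to_speech(answer)

log = []
reply = dialogue_turn({"transcript": "what is my balance"}, log)
```

The log accumulated across turns is what the later steps (S102/S103) consume as the incoming call voice information.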
It should be understood that each of the above-mentioned language processing models is only one specific implementation form provided in the embodiments of the present application, and is not unique, and a developer may select the language processing model according to actual needs.
Preferably, the extracting the feature of the text conversion information to obtain the text feature may include: performing word segmentation processing and labeling processing on the text conversion information to obtain processed text conversion information; and performing feature extraction on the processed text conversion information by using a preset language model to obtain text features.
This preferred embodiment provides a more specific text-feature extraction method. Specifically, before feature extraction, word segmentation and labeling may be performed on the text conversion information, for example with a segmentation-and-labeling tool such as THULAC (THU Lexical Analyzer for Chinese); feature extraction is then performed on the segmented and labeled text conversion information using XLNet to obtain the text features.
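The segment-label-extract pipeline above can be sketched as follows; a whitespace tokenizer and a keyword-based tagger are assumed stand-ins for THULAC, and a simple filter stands in for the XLNet feature extractor.

```python
# Sketch of the word-segmentation + labeling step before feature
# extraction. THULAC and XLNet are replaced by toy stand-ins so the
# three-stage shape (segment -> label -> extract) is visible.

EMOTION_WORDS = {"angry", "urgent", "happy"}   # assumed label lexicon

def segment(text):
    return text.lower().split()

def label(tokens):
    # Tag each token as emotion-bearing or a plain content token.
    return [(t, "EMO" if t in EMOTION_WORDS else "TOK") for t in tokens]

def extract_features(labeled):
    # A real system would feed the labeled sequence into a language
    # model; here we simply keep the emotion-bearing tokens.
    return [t for t, tag in labeled if tag == "EMO"]

feats = extract_features(label(segment("I am Angry about the urgent bill")))
```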
Preferably, the speech emotion recognition method may further include: recording the text conversion information, the question information and the answer information to generate a question-answer record; and updating the preset question knowledge base and the preset answer knowledge base according to the question and answer records.
Specifically, during the intelligent dialogue, the text conversion information, question information, and answer information can be recorded in real time to generate a question-answer record, which is used to update the preset question knowledge base and the preset answer knowledge base; that is, newly generated question and answer information, or updated and more standard question and answer information, is added to the corresponding knowledge bases, so that speech emotion recognition can be performed quickly when a call request of the same type is received again.
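The record-then-update flow above can be sketched as follows; the record layout and the dict-based knowledge bases are assumptions for illustration.

```python
# Sketch of recording a question-answer round and folding it back into
# the preset knowledge bases. The data structures are assumed.

def record_round(qa_log, text, question, answer):
    qa_log.append({"text": text, "question": question, "answer": answer})

def update_knowledge_bases(qa_log, question_kb, answer_kb):
    # Merge new pairs from the log into the bases so similar future
    # calls can be handled without re-deriving them.
    for round_ in qa_log:
        question_kb.setdefault(round_["text"], round_["question"])
        answer_kb.setdefault(round_["question"], round_["answer"])

log, qkb, akb = [], {}, {}
record_round(log, "balance pls", "What is my balance?", "2 GB left.")
update_knowledge_bases(log, qkb, akb)
```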
S102: performing character conversion on incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion characteristics;
This step obtains emotion feature information through text analysis; the emotion features may specifically be feature words representing emotion. Specifically, text conversion is first performed on the incoming call voice information to obtain text information; the text-conversion process may refer to the text-conversion method in the intelligent-dialogue process described above and is not repeated here. Then, emotion analysis is performed on the text information to obtain the emotion features, which may be implemented with a preset emotion analysis model, such as XLNet or BERT (Bidirectional Encoder Representations from Transformers).
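The emotion-analysis step (S102) can be sketched as follows; a lexicon lookup is an assumed stand-in for the XLNet/BERT classifier, and the lexicon contents are illustrative.

```python
# Sketch of text emotion analysis (S102): the output is the set of
# emotion-bearing feature words, as described in the text. The lexicon
# stands in for a trained emotion analysis model.

EMOTION_LEXICON = {
    "furious": "anger", "angry": "anger",
    "worried": "anxiety", "urgent": "anxiety",
    "glad": "joy",
}

def text_emotion_features(text):
    """Return the emotion-bearing feature words found in the text."""
    return [w for w in text.lower().split() if w in EMOTION_LEXICON]

features = text_emotion_features("I am worried this is urgent")
```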
S103: carrying out feature extraction on incoming call voice information to obtain tone features;
This step obtains the tone features by performing feature extraction on the incoming call voice information; the tone features may likewise be feature words representing emotion. This feature-extraction process may also be implemented by a preset model, for example a WFST (weighted finite-state transducer) model in the present application.
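As a rough illustration of mapping acoustics to a tone feature word, the sketch below thresholds pitch and energy statistics. This is not the WFST model the text names: the per-frame inputs and thresholds are assumptions chosen only to show the shape of audio-to-feature-word extraction.

```python
# Sketch of tone-feature extraction (S103): crude pitch/energy
# statistics over assumed per-frame values are thresholded into a
# single tone feature word.

def tone_features(pitches, energies):
    """Map mean pitch (Hz) and mean energy to a coarse tone label.
    The thresholds are illustrative assumptions, not tuned values."""
    mean_pitch = sum(pitches) / len(pitches)
    mean_energy = sum(energies) / len(energies)
    if mean_pitch > 220 and mean_energy > 0.7:
        return "agitated"
    if mean_pitch < 140:
        return "subdued"
    return "neutral"

tone = tone_features([250, 260, 240], [0.8, 0.9, 0.75])
```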
It can be understood that the execution sequence of S102 and S103 does not affect the implementation of the present technical solution, and the two may be executed simultaneously or sequentially, for example, as shown in fig. 1, in order to ensure the execution efficiency, an implementation manner of synchronous execution of the two is adopted.
S104: integrating the emotional characteristics and the tone characteristics to obtain combined characteristic words;
Specifically, the emotion features and the tone features can be integrated to obtain the combined feature words, so that emotion analysis is performed on both the speech and the text, effectively guaranteeing the accuracy of the speech emotion recognition result.
S105: matching in a preset emotion information base to obtain emotion categories corresponding to the combined characteristic words;
the method comprises the steps of determining the emotion type of a call request terminal through semantic matching, specifically, establishing an emotion information base in advance for storing various emotion information set by factory or user, and performing semantic matching on combined feature words in the preset emotion information base to inquire and obtain the emotion type of the call request terminal. The semantic matching process can also be implemented based on a corresponding semantic matching model, and reference may be specifically made to the above description, which is not repeated herein.
S106: outputting the emotion classification.
This step outputs the emotion category information, i.e., feeds the determined emotion category back to a display interface. The display form of the emotion category on the display interface can be customized by the user, for example: "calling number: ***; emotion of the caller: ***; incoming call time: ***".
In summary, on receiving a call request, the speech emotion recognition method provided by the present application first conducts an intelligent dialogue with the call request terminal to obtain the incoming call voice information. It then converts the incoming call voice information into text information and performs emotion analysis on the text to obtain emotion features, while also performing feature extraction on the incoming call voice information to obtain tone features; that is, the speech and the text are processed separately to obtain their respective feature information. The features are then combined into combined feature words, and finally the emotion category of the calling party is determined by matching in a preset emotion information base and output as a call reminder. In this way, more effective data information can be obtained from the incoming call, the calling party's emotional state can be determined, and the user can more conveniently handle the call according to the feedback information, which largely meets the user's demand for intelligent telephone communication services and further improves the user experience.
On the basis of the above-described embodiment:
As a preferred embodiment, the speech emotion recognition method may further include: when no emotion category corresponding to the combined feature words can be matched in the preset emotion information base, creating a new emotion category according to the combined feature words and outputting the new emotion category.
Specifically, the emotion categories stored in the preset emotion information base are not necessarily complete and comprehensive, so the emotion category corresponding to the combined feature words may fail to match in the preset emotion information base. To solve this problem, a new emotion category can be created for the call request terminal according to the combined feature words and then output.
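The fallback described above can be sketched as follows; naming the new category after its feature words is an illustrative assumption, and the in-place base update anticipates the next preferred embodiment.

```python
# Sketch of the no-match fallback: when no preset category overlaps the
# combined feature words, a new category is created from them (and, as
# the following embodiment describes, added to the base).

def classify(combined, emotion_base):
    for category, words in emotion_base.items():
        if words & combined:
            return category
    # No match: create a new category keyed by the combined words.
    new_category = "+".join(sorted(combined))
    emotion_base[new_category] = set(combined)   # also updates the base
    return new_category

base = {"anger": {"angry"}}
result = classify({"wistful", "subdued"}, base)
```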
As a preferred embodiment, the speech emotion recognition method may further include: and adding the new emotion type into the preset emotion information base to update the preset emotion information base.
This step updates the preset emotion information base: the newly created emotion category and its corresponding feature keywords are added to the base, so that speech emotion recognition can be performed quickly when a call request of the same type is received again.
As a preferred embodiment, the speech emotion recognition method may further include: generating suggestion information for answering the call according to the emotion category; and outputting the suggestion information.
This embodiment aims to generate corresponding suggestion information according to the determined emotion category of the call request terminal, so that the called party can act on it. For example, when the emotion category of the call request terminal is determined to be "anxious", the suggestion information "urgent call back" can be generated and output to the display interface, so that the called party can call back the call request terminal in time.
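A minimal sketch of mapping a determined emotion category to suggestion information might look like the following; the categories and suggestion texts are illustrative assumptions:

```python
# Illustrative emotion-category -> suggestion mapping (assumed contents).
SUGGESTIONS = {
    "anxious": "urgent call back",
    "angry": "call back when prepared",
    "neutral": "call back at leisure",
}

def generate_suggestion(emotion_category):
    """Return answering advice for the emotion category, falling back
    to generic advice for categories without a preset entry."""
    return SUGGESTIONS.get(emotion_category, "call back at leisure")
```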
As a preferred embodiment, the speech emotion recognition method may further include: and carrying out voice playing on the suggestion information through the broadcaster.
This preferred embodiment provides a more specific output mode for the suggestion information, namely a voice reminder: the suggestion information can be played through a broadcaster so that the called party is reminded in time.
As a preferred embodiment, the speech emotion recognition method may further include: adjusting the tone mode of the intelligent dialogue according to the emotion category.
This preferred embodiment adjusts the tone mode during the intelligent dialogue, that is, the system adjusts its own speech mode according to the emotion category of the call request terminal, thereby providing a more humanized experience for users and improving user friendliness.
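One possible way to realize such a tone-mode adjustment is to map the emotion category to speech-synthesis parameters; the parameter names and values below are assumptions for illustration, not part of the described system:

```python
def tone_mode(emotion_category):
    """Return speaking-rate and pitch settings for the intelligent
    dialogue, softened for agitated callers (values are illustrative)."""
    if emotion_category in {"anxious", "angry"}:
        return {"rate": 0.9, "pitch": -2}   # slower, lower: calming
    if emotion_category == "happy":
        return {"rate": 1.1, "pitch": 1}    # brighter, livelier
    return {"rate": 1.0, "pitch": 0}        # neutral default
```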
On the basis of the foregoing embodiments, please refer to fig. 2 and fig. 3, fig. 2 is a frame diagram of a speech emotion recognition system provided in the present application, fig. 3 is a frame diagram of an intelligent question and answer module in the speech emotion recognition system provided in the present application, and the following describes the speech emotion recognition method provided in the present application in more detail.
The incoming call information analysis process of the speech emotion recognition system comprises the following steps:
(1) when a call comes in, the speech emotion recognition system receives the incoming call request (call request) 101 and first transfers the call to the intelligent question-and-answer module 102, which conducts an intelligent dialogue with the caller according to the existing call topic, the question-and-answer knowledge base (the preset question knowledge base and the preset answer knowledge base), and the like, and performs speech recognition 103;
(2) speech recognition 103 then proceeds along two parallel flows: on the one hand, the incoming call speech information is transcribed into text 106, and the obtained text information is input to the emotion analysis module 107 for emotion analysis to obtain the emotion analysis result, namely the emotion feature words 108 (the output may take forms such as happy, neutral, or sad, and may be given as proportions); on the other hand, voice analysis 104 is performed directly on the incoming call speech information to extract the tone feature words 105 (including but not limited to volume, whether the speech is hurried, etc.); finally, the results of the two parallel flows are combined 109 to obtain the combined feature words;
(3) matching 1010 the combined feature words 109 against an emotion topic library (covering moods, personalities, and the like) 1011; if the matching succeeds, outputting the speech emotion category (i.e., the mood, personality, and the like of the calling party) 1013; if the matching fails, selecting whether to create a new emotion category 1012 and add it to the emotion topic library 1011 to update the database, while outputting the newly created emotion category to the display interface to inform the user of the caller's emotion category determination result 1013, in a format such as "incoming number: ..., incoming call mood: ..., incoming call personality: ..., incoming call time: ...". Of course, if the calling number belongs to the address book, the displayed information may also include the caller's name and other information.
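The two parallel analysis flows and the feature combination of step (2) above can be sketched as follows; the keyword rules and thresholds are toy stand-ins (assumptions) for the emotion analysis module 107 and the voice analysis 104:

```python
def analyze_text(text):
    """Emotion feature words from the transcribed text
    (toy keyword rule standing in for the emotion analysis module)."""
    return {"anxious"} if "hurry" in text else {"neutral"}

def analyze_audio(volume_db, speech_rate):
    """Tone feature words from the raw speech signal
    (toy thresholds standing in for the voice analysis step)."""
    features = set()
    if volume_db > 70:
        features.add("loud")
    if speech_rate > 5.0:   # assumed unit: syllables per second
        features.add("urgent")
    return features

def combined_features(text, volume_db, speech_rate):
    # Run both flows and merge their feature words (combination 109).
    return analyze_text(text) | analyze_audio(volume_db, speech_rate)
```

The merged set is what step (3) then matches against the emotion topic library.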
Second, the intelligent dialogue process of the speech emotion recognition system:
(1) a speech input (speech information) 201 is recognized and transcribed into speech text (text information) 203 using a speech recognition model 202;
(2) performing word segmentation and labeling 204 on the speech text 203, and analyzing it with a language model to extract features and obtain the question information (text features) 205;
(3) performing semantic matching between the question information 205 and the entries stored in the question base (the preset question knowledge base) 207 through a semantic matching model, and obtaining the question 206 with the highest matching similarity;
(4) performing semantic matching between the question 206 and the entries stored in the answer base (the preset answer knowledge base) 209 through a semantic matching model, and obtaining the answer 208 with the highest matching similarity;
(5) synthesizing the answer information 208 into speech with a speech synthesis model and outputting the speech 2011; in addition, full-duplex voice interaction technology 2012 is combined with the speech input 201 to generate responses in real time, control the dialogue rhythm, and recover from dialogue interruptions in real time;
(6) updating the question-and-answer knowledge base 2010 according to the question-and-answer records.
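Steps (2) to (4) above can be sketched with a simple retrieval loop in which token overlap stands in for the semantic matching model; the knowledge-base contents and helper names are invented for the example:

```python
def tokens(sentence):
    """Toy word segmentation: lowercase whitespace tokens."""
    return set(sentence.lower().split())

def most_similar(query, candidates):
    """Return the candidate with the highest Jaccard similarity
    (a stand-in for the semantic matching model)."""
    q = tokens(query)
    def score(c):
        t = tokens(c)
        return len(q & t) / len(q | t) if q | t else 0.0
    return max(candidates, key=score)

# Illustrative question base and answer base (assumed contents).
question_base = ["what is your name", "why are you calling", "who is calling"]
answer_base = {
    "why are you calling": "please briefly state the reason for your call",
    "who is calling": "may I ask who is calling",
}

def reply(user_text):
    # Match the most similar stored question, then its stored answer.
    question = most_similar(user_text, question_base)
    return answer_base.get(question, "sorry, could you repeat that")
```

In the described system the matched answer would then be passed to the speech synthesis model of step (5).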
It can be seen that the speech emotion recognition method provided by the embodiment of the application, upon receiving a call request, conducts an intelligent dialogue with the call request terminal to obtain incoming call speech information, converts the incoming call speech information into text information and performs emotion analysis on it to obtain emotion features, performs feature extraction on the incoming call speech information to obtain tone features (i.e., processes the speech and the text separately to obtain their respective feature information), combines the features to obtain combined feature words, and finally matches the combined feature words in a preset emotion information base to determine the emotion category of the calling party and outputs it to realize a call reminder. This implementation can obtain more effective data information from the incoming call information sent by the call request terminal, determine the emotion information of the calling party, and make it more convenient for the user to handle the incoming call according to the feedback information, which largely meets the user's demand for intelligent telephone communication services and further improves the user experience.
To solve the above problem, please refer to fig. 4, fig. 4 is a schematic structural diagram of a speech emotion recognition apparatus provided in the present application, where the speech emotion recognition apparatus may include:
the intelligent dialogue module 10 is used for performing intelligent dialogue with a call request terminal according to the received call request to obtain incoming call voice information;
the first feature extraction module 20 is configured to perform character conversion on the incoming call voice information to obtain text information, and perform emotion analysis on the text information to obtain emotion features;
the second feature extraction module 30 is configured to perform feature extraction on the incoming call voice information to obtain a mood feature;
the feature integration module 40 is used for integrating the emotional features and the tone features to obtain combined feature words;
the semantic matching module 50 is used for matching in a preset emotion information base to obtain emotion categories corresponding to the combined feature words;
and an information output module 60 for outputting the emotion classification.
For the introduction of the apparatus provided in the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above problem, please refer to fig. 5, fig. 5 is a schematic structural diagram of an electronic device for speech emotion recognition provided in the present application, where the electronic device for speech emotion recognition may include:
a memory 1 for storing a computer program;
and the processor 2 is used for implementing the steps of any one of the above speech emotion recognition methods when executing the computer program.
As a preferred embodiment, the electronic device for speech emotion recognition may further include a display for displaying the identity category of the call requesting terminal.
For the introduction of the electronic device provided by the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, can implement the steps of any one of the speech emotion recognition methods.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
For the introduction of the computer-readable storage medium provided in the present application, please refer to the above method embodiments, which are not described herein again.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The speech emotion recognition method, the speech emotion recognition apparatus, the electronic device, and the computer-readable storage medium provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, several improvements and modifications can be made to the present application without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present application.
Claims (13)
1. A speech emotion recognition method, comprising:
carrying out intelligent conversation with a call request terminal according to a received call request to obtain incoming call voice information;
performing character conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion characteristics;
extracting the characteristics of the incoming call voice information to obtain tone characteristics;
integrating the emotion characteristics and the tone characteristics to obtain combined characteristic words;
matching in a preset emotion information base to obtain an emotion category corresponding to the combined feature word;
outputting the emotion category.
2. The speech emotion recognition method of claim 1, wherein the obtaining of the incoming call speech information by conducting an intelligent dialogue with the call request terminal according to the received call request comprises:
obtaining voice information according to the call request;
performing character conversion on the voice information to obtain text conversion information;
performing feature extraction on the text conversion information to obtain text features;
matching and obtaining problem information corresponding to the text features in a preset problem knowledge base;
matching and obtaining answer information corresponding to the question information in a preset answer knowledge base;
carrying out voice conversion on the answer information to obtain voice reply information;
feeding back the voice reply information to the call request terminal to realize the intelligent conversation;
and counting all voice information in the intelligent conversation process to obtain the incoming call voice information.
3. The speech emotion recognition method of claim 2, wherein the performing feature extraction on the text conversion information to obtain text features comprises:
performing word segmentation processing and labeling processing on the text conversion information to obtain processed text conversion information;
and extracting the characteristics of the processed text conversion information by using a preset language model to obtain the text characteristics.
4. The speech emotion recognition method of claim 2, further comprising:
recording the text conversion information, the question information and the answer information to generate a question-answer record;
and updating the preset question knowledge base and the preset answer knowledge base according to the question and answer records.
5. The speech emotion recognition method of any one of claims 1 to 4, further comprising:
and when the emotion category corresponding to the combined feature word cannot be obtained in the preset emotion information base in a matching manner, creating a new emotion category according to the combined feature word, and outputting the new emotion category.
6. The speech emotion recognition method of claim 5, further comprising:
and adding the new emotion type into the preset emotion information base to update the preset emotion information base.
7. The speech emotion recognition method of claim 1, further comprising:
generating suggestion information of answering a call according to the emotion type;
and outputting the suggestion information.
8. The speech emotion recognition method of claim 7, wherein the outputting the recommendation information includes:
and carrying out voice playing on the suggestion information through a broadcaster.
9. The speech emotion recognition method of claim 8, further comprising:
and adjusting the tone mode of the intelligent conversation according to the emotion category.
10. A speech emotion recognition apparatus, characterized by comprising:
the intelligent dialogue module is used for carrying out intelligent dialogue with a call request terminal according to the received call request to obtain incoming call voice information;
the first feature extraction module is used for performing character conversion on the incoming call voice information to obtain text information and performing emotion analysis on the text information to obtain emotion features;
the second feature extraction module is used for extracting features of the incoming call voice information to obtain tone features;
the characteristic integration module is used for integrating the emotional characteristics and the tone characteristics to obtain combined characteristic words;
the semantic matching module is used for matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature word;
and the information output module is used for outputting the emotion categories.
11. An electronic device for speech emotion recognition, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the speech emotion recognition method as claimed in any one of claims 1 to 9 when executing the computer program.
12. The speech emotion recognition electronic device of claim 11, further comprising:
and the display is used for displaying the identity category of the calling request terminal.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the speech emotion recognition method as claimed in any one of claims 1 to 9.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911082413.8A CN110751943A (en) | 2019-11-07 | 2019-11-07 | Voice emotion recognition method and device and related equipment |
US16/889,823 US11019207B1 (en) | 2019-11-07 | 2020-06-02 | Systems and methods for smart dialogue communication |
US17/238,161 US11323566B2 (en) | 2019-11-07 | 2021-04-22 | Systems and methods for smart dialogue communication |
US17/660,207 US11758047B2 (en) | 2019-11-07 | 2022-04-21 | Systems and methods for smart dialogue communication |
US18/359,075 US20230370549A1 (en) | 2019-11-07 | 2023-07-26 | Systems and methods for smart dialogue communication |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911082413.8A CN110751943A (en) | 2019-11-07 | 2019-11-07 | Voice emotion recognition method and device and related equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110751943A true CN110751943A (en) | 2020-02-04 |
Family
ID=69282579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911082413.8A Pending CN110751943A (en) | 2019-11-07 | 2019-11-07 | Voice emotion recognition method and device and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110751943A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107944008A (en) * | 2017-12-08 | 2018-04-20 | 神思电子技术股份有限公司 | A kind of method that Emotion identification is carried out for natural language |
CN109003624A (en) * | 2018-06-29 | 2018-12-14 | 北京百度网讯科技有限公司 | Emotion identification method, apparatus, computer equipment and storage medium |
CN109949071A (en) * | 2019-01-31 | 2019-06-28 | 平安科技(深圳)有限公司 | Products Show method, apparatus, equipment and medium based on voice mood analysis |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111324713A (en) * | 2020-02-18 | 2020-06-23 | 腾讯科技(深圳)有限公司 | Automatic replying method and device for conversation, storage medium and computer equipment |
CN111694938A (en) * | 2020-04-27 | 2020-09-22 | 平安科技(深圳)有限公司 | Emotion recognition-based answering method and device, computer equipment and storage medium |
CN111694938B (en) * | 2020-04-27 | 2024-05-14 | 平安科技(深圳)有限公司 | Emotion recognition-based reply method and device, computer equipment and storage medium |
CN111666380A (en) * | 2020-06-12 | 2020-09-15 | 北京百度网讯科技有限公司 | Intelligent calling method, device, equipment and medium |
CN111696556A (en) * | 2020-07-13 | 2020-09-22 | 上海茂声智能科技有限公司 | Method, system, equipment and storage medium for analyzing user conversation emotion |
CN111696556B (en) * | 2020-07-13 | 2023-05-16 | 上海茂声智能科技有限公司 | Method, system, equipment and storage medium for analyzing user dialogue emotion |
WO2022016580A1 (en) * | 2020-07-21 | 2022-01-27 | 南京智金科技创新服务中心 | Intelligent voice recognition method and device |
CN112148846A (en) * | 2020-08-25 | 2020-12-29 | 北京来也网络科技有限公司 | Reply voice determination method, device, equipment and storage medium combining RPA and AI |
CN112002348B (en) * | 2020-09-07 | 2021-12-28 | 复旦大学 | Method and system for recognizing speech anger emotion of patient |
CN112002348A (en) * | 2020-09-07 | 2020-11-27 | 复旦大学 | Method and system for recognizing speech anger emotion of patient |
CN112818841A (en) * | 2021-01-29 | 2021-05-18 | 北京搜狗科技发展有限公司 | Method and related device for recognizing user emotion |
CN112908314A (en) * | 2021-01-29 | 2021-06-04 | 深圳通联金融网络科技服务有限公司 | Intelligent voice interaction method and device based on tone recognition |
CN112995422A (en) * | 2021-02-07 | 2021-06-18 | 成都薯片科技有限公司 | Call control method and device, electronic equipment and storage medium |
CN113055523A (en) * | 2021-03-08 | 2021-06-29 | 北京百度网讯科技有限公司 | Crank call interception method and device, electronic equipment and storage medium |
CN113055523B (en) * | 2021-03-08 | 2022-12-30 | 北京百度网讯科技有限公司 | Crank call interception method and device, electronic equipment and storage medium |
CN113157966A (en) * | 2021-03-15 | 2021-07-23 | 维沃移动通信有限公司 | Display method and device and electronic equipment |
CN113157966B (en) * | 2021-03-15 | 2023-10-31 | 维沃移动通信有限公司 | Display method and device and electronic equipment |
US20220375468A1 (en) * | 2021-05-21 | 2022-11-24 | Cogito Corporation | System method and apparatus for combining words and behaviors |
CN113535903A (en) * | 2021-07-19 | 2021-10-22 | 安徽淘云科技股份有限公司 | Emotion guiding method, emotion guiding robot, storage medium and electronic device |
CN113535903B (en) * | 2021-07-19 | 2024-03-19 | 安徽淘云科技股份有限公司 | Emotion guiding method, emotion guiding robot, storage medium and electronic device |
CN114093389A (en) * | 2021-11-26 | 2022-02-25 | 重庆凡骄网络科技有限公司 | Speech emotion recognition method and device, electronic equipment and computer readable medium |
CN114222302A (en) * | 2021-12-13 | 2022-03-22 | 北京声智科技有限公司 | Calling method and device for abnormal call, electronic equipment and storage medium |
CN114298019A (en) * | 2021-12-29 | 2022-04-08 | 中国建设银行股份有限公司 | Emotion recognition method, emotion recognition apparatus, emotion recognition device, storage medium, and program product |
CN114758676A (en) * | 2022-04-18 | 2022-07-15 | 哈尔滨理工大学 | Multi-modal emotion recognition method based on deep residual shrinkage network |
CN116389644A (en) * | 2022-11-10 | 2023-07-04 | 八度云计算(安徽)有限公司 | Outbound system based on big data analysis |
CN116389644B (en) * | 2022-11-10 | 2023-11-03 | 八度云计算(安徽)有限公司 | Outbound system based on big data analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200204 |