CN110751943A - Voice emotion recognition method and device and related equipment - Google Patents
- Publication number
- CN110751943A CN110751943A CN201911082413.8A CN201911082413A CN110751943A CN 110751943 A CN110751943 A CN 110751943A CN 201911082413 A CN201911082413 A CN 201911082413A CN 110751943 A CN110751943 A CN 110751943A
- Authority
- CN
- China
- Prior art keywords
- information
- emotion
- speech
- voice
- incoming call
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services or time announcements
- H04M3/493—Interactive information services, e.g. directory enquiries ; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4936—Speech interaction details
Abstract
The application discloses a speech emotion recognition method. The method comprises: conducting an intelligent dialogue with a call request terminal according to a received call request to obtain incoming call voice information; performing text conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion features; performing feature extraction on the incoming call voice information to obtain tone features; integrating the emotion features and the tone features to obtain combined feature words; matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature words; and outputting the emotion category. The speech emotion recognition method can acquire more effective data information from an incoming call, makes it easier for a user to handle the call according to the feedback information, meets the user's demand for intelligent telephone communication services, and thereby improves the user experience. The application also discloses a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium, all of which have the above beneficial effects.
Description
Technical Field
The present application relates to the field of communications technologies, and in particular, to a speech emotion recognition method, a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium.
Background
Communication technology develops rapidly, and smart phones are continually updated with ever more functions. Meanwhile, as users' pace of life accelerates, their expectations of communication services grow: they want more professional, personalized, and intelligent communication services. However, existing telephone services provide only simple functions such as caller ID display and harassment-call interception; they cannot obtain other, more useful data that would effectively help a user decide, based on the actual situation, whether to answer or hang up an incoming call.
Therefore, how to acquire more effective data information from an incoming call, so that the user can handle the call according to the feedback information and the user's demand for intelligent telephone communication services is met, thereby improving the user experience, is a problem to be urgently solved by those skilled in the art.
Disclosure of Invention
An object of the present application is to provide a speech emotion recognition method that can acquire more effective data information from an incoming call, makes it easier for a user to handle the call according to the feedback information, meets the user's demand for intelligent telephone communication services, and thereby improves the user experience. Another object of the present application is to provide a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium, which have the same advantages.
In order to solve the technical problem, the present application provides a speech emotion recognition method, including:
carrying out intelligent conversation with a call request terminal according to a received call request to obtain incoming call voice information;
performing character conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion characteristics;
extracting the characteristics of the incoming call voice information to obtain tone characteristics;
integrating the emotion features and the tone features to obtain combined feature words;
matching in a preset emotion information base to obtain an emotion category corresponding to the combined feature word;
outputting the emotion classification.
Preferably, the performing an intelligent dialogue with the call request terminal according to the received call request, and obtaining the incoming call voice information includes:
obtaining voice information according to the call request;
performing character conversion on the voice information to obtain text conversion information;
performing feature extraction on the text conversion information to obtain text features;
matching and obtaining problem information corresponding to the text features in a preset problem knowledge base;
matching and obtaining answer information corresponding to the question information in a preset answer knowledge base;
carrying out voice conversion on the answer information to obtain voice reply information;
feeding back the voice reply information to the call request terminal to realize the intelligent conversation;
and counting all voice information in the intelligent conversation process to obtain the incoming call voice information.
Preferably, the performing feature extraction on the text conversion information to obtain text features includes:
performing word segmentation processing and labeling processing on the text conversion information to obtain processed text conversion information;
and extracting the characteristics of the processed text conversion information by using a preset language model to obtain the text characteristics.
Preferably, the speech emotion recognition method further includes:
recording the text conversion information, the question information and the answer information to generate a question-answer record;
and updating the preset question knowledge base and the preset answer knowledge base according to the question and answer records.
Preferably, the speech emotion recognition method further includes:
and when no emotion category corresponding to the combined feature words can be matched in the preset emotion information base, creating a new emotion category according to the combined feature words and outputting the new emotion category.
Preferably, the speech emotion recognition method further includes:
and adding the new emotion type into the preset emotion information base to update the preset emotion information base.
Preferably, the speech emotion recognition method further includes:
generating suggestion information for answering the call according to the emotion category;
and outputting the suggestion information.
Preferably, the speech emotion recognition method further includes:
and carrying out voice playing on the suggestion information through a broadcaster.
Preferably, the speech emotion recognition method further includes:
and adjusting the tone mode of the intelligent conversation according to the emotion category.
In order to solve the above technical problem, the present application further provides a speech emotion recognition apparatus, which includes:
the intelligent dialogue module is used for carrying out intelligent dialogue with a call request terminal according to the received call request to obtain incoming call voice information;
the first feature extraction module is used for performing character conversion on the incoming call voice information to obtain text information and performing emotion analysis on the text information to obtain emotion features;
the second feature extraction module is used for extracting features of the incoming call voice information to obtain tone features;
the characteristic integration module is used for integrating the emotional characteristics and the tone characteristics to obtain combined characteristic words;
the semantic matching module is used for matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature word;
and the information output module is used for outputting the emotion categories.
In order to solve the above technical problem, the present application further provides an electronic device for speech emotion recognition, where the electronic device for speech emotion recognition includes:
a memory for storing a computer program;
a processor for implementing the steps of any one of the above speech emotion recognition methods when executing the computer program.
Preferably, the electronic device for speech emotion recognition further includes:
and the display is used for displaying the identity category of the calling request terminal.
In order to solve the above technical problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above speech emotion recognition methods.
The speech emotion recognition method comprises: conducting an intelligent dialogue with a call request terminal according to a received call request to obtain incoming call voice information; performing text conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion features; performing feature extraction on the incoming call voice information to obtain tone features; integrating the emotion features and the tone features to obtain combined feature words; matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature words; and outputting the emotion category.
It can be seen that, on receiving a call request, the speech emotion recognition method provided by the present application first conducts an intelligent dialogue with the call request terminal to obtain the incoming call voice information. It then converts the incoming call voice information into text information and performs emotion analysis on the text to obtain emotion features, while also performing feature extraction on the incoming call voice information to obtain tone features; that is, the speech and the text are processed separately to obtain their respective feature information. The features are then combined into combined feature words, and finally the emotion category of the calling party is determined by matching in a preset emotion information base and output as a call reminder. In this way, more effective data information can be obtained from the incoming call, the calling party's emotional state can be determined, and the user can more conveniently handle the call according to the feedback information, which largely meets the user's demand for intelligent telephone communication services and further improves the user experience.
The speech emotion recognition device, the electronic device and the computer-readable storage medium provided by the application all have the beneficial effects, and are not repeated herein.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings described below show only embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a speech emotion recognition method provided in the present application;
FIG. 2 is a block diagram of a speech emotion recognition system provided in the present application;
FIG. 3 is a block diagram of an intelligent question-answering module in the speech emotion recognition system provided by the present application;
fig. 4 is a schematic structural diagram of a speech emotion recognition apparatus provided in the present application;
fig. 5 is a schematic structural diagram of an electronic device for speech emotion recognition provided by the present application.
Detailed Description
The core of the application is to provide a voice emotion recognition method, the voice emotion recognition method can acquire more effective data information according to the incoming call information, is more convenient for a user to process the incoming call according to feedback information, meets the intelligent requirement of the user on telephone communication service, and further improves the user experience; another core of the present application is to provide a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium, which also have the above beneficial effects.
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. It is obvious that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from the given embodiments without creative effort fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flowchart of a speech emotion recognition method provided in the present application, where the speech emotion recognition method may include:
s101: carrying out intelligent conversation with a call request terminal according to a received call request to obtain incoming call voice information;
This step aims to acquire the incoming call voice information. Specifically, a caller can initiate a call request to a called terminal via the call request terminal, and the called terminal conducts an intelligent dialogue with the call request terminal according to the call request, thereby obtaining all the voice information sent by the call request terminal during the dialogue, i.e., the incoming call voice information.
Preferably, the performing an intelligent dialogue with the call request terminal according to the received call request, and obtaining the incoming call voice information may include: acquiring voice information according to the call request; performing character conversion on the voice information to obtain text conversion information; performing feature extraction on the text conversion information to obtain text features; matching and obtaining problem information corresponding to the text features in a preset problem knowledge base; matching and obtaining answer information corresponding to the question information in a preset answer knowledge base; carrying out voice conversion on the answer information to obtain voice reply information; the voice reply information is fed back to the call request end to realize intelligent conversation; and counting all voice information in the intelligent conversation process to obtain the incoming call voice information.
For the above intelligent-dialogue process, this preferred embodiment provides a more specific implementation. The preset question-answer knowledge base may comprise a preset question knowledge base and a preset answer knowledge base. The preset question knowledge base stores a plurality of standard question entries configured at the factory or by the user; the preset answer knowledge base stores a plurality of standard answer entries configured at the factory or by the user. It can be understood that the answer information in the preset answer knowledge base corresponds one-to-one with the question information in the preset question knowledge base. The specific implementation process is as follows:
Firstly, the call is answered according to the call request, voice information is obtained, and text conversion is performed on the voice information to obtain text conversion information; this can be realized by a corresponding speech recognition model, such as PyTorch-Kaldi (the PyTorch-Kaldi Speech Recognition Toolkit). Secondly, feature extraction is performed on the text conversion information using a pre-established language model to obtain text features; the preset language model may specifically be XLNet (Generalized Autoregressive Pretraining for Language Understanding). Then, semantic matching of the text features is performed in the preset question knowledge base through a preset semantic matching model, such as DSSM (Deep Structured Semantic Model), to retrieve the standard question information corresponding to the voice information sent by the call request terminal, and semantic matching of the question information is performed in the preset answer knowledge base to retrieve the standard answer information corresponding to the question information. Further, speech conversion is performed on the answer information through a preset speech synthesis model to obtain the voice reply information; the speech synthesis model may specifically be a WaveNet (A Generative Model for Raw Audio) model, a ClariNet (Parallel Wave Generation in End-to-End Text-to-Speech) model, or the like. Finally, the voice reply information is fed back to the call request terminal to realize the intelligent dialogue. The voice information sent by the call request terminal during the whole dialogue can thus be aggregated to obtain the incoming call voice information.
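The question-answer loop described above can be sketched as follows. This is a minimal illustration of the control flow only: the function names and the in-memory knowledge bases are assumptions, standing in as stubs for the ASR, semantic-matching, and TTS models named in the text.

```python
# Minimal sketch of the intelligent-dialogue loop (S101). A real system
# would back these stubs with an ASR model (e.g. PyTorch-Kaldi), a
# semantic matcher (e.g. DSSM), and a TTS model (e.g. WaveNet); here
# simple dicts stand in so the control flow is visible.

PRESET_QUESTIONS = {            # text feature -> standard question (assumed)
    "package balance": "What is my remaining data balance?",
}
PRESET_ANSWERS = {              # standard question -> standard answer (assumed)
    "What is my remaining data balance?": "You have 2 GB remaining.",
}

def speech_to_text(audio):
    return audio["transcript"]            # stub: pretend ASR output

def extract_text_features(text):
    return "package balance" if "balance" in text else text

def match_question(features):
    return PRESET_QUESTIONS.get(features)

def match_answer(question):
    return PRESET_ANSWERS.get(question)

def text_to_speech(text):
    return {"waveform": None, "transcript": text}   # stub TTS

def dialogue_turn(incoming_audio, call_log):
    """One turn of the dialogue; every caller utterance is logged so all
    incoming speech can later be aggregated into the call voice info."""
    text = speech_to_text(incoming_audio)
    call_log.append(text)
    question = match_question(extract_text_features(text))
    answer = match_answer(question) if question else "Could you rephrase that?"
    return text_to_speech(answer)

log = []
reply = dialogue_turn({"transcript": "what is my balance"}, log)
```

The log accumulated across turns is what the later steps (S102/S103) consume as the incoming call voice information.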
It should be understood that each of the above-mentioned language processing models is only one specific implementation form provided in the embodiments of the present application, and is not unique, and a developer may select the language processing model according to actual needs.
Preferably, the extracting the feature of the text conversion information to obtain the text feature may include: performing word segmentation processing and labeling processing on the text conversion information to obtain processed text conversion information; and performing feature extraction on the processed text conversion information by using a preset language model to obtain text features.
This preferred embodiment provides a more specific text-feature extraction method. Specifically, before feature extraction, word segmentation and labeling may be performed on the text conversion information, for example with a segmentation-and-labeling tool such as THULAC (THU Lexical Analyzer for Chinese); feature extraction is then performed on the segmented and labeled text conversion information using XLNet to obtain the text features.
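The segment-label-extract pipeline above can be sketched as follows; a whitespace tokenizer and a keyword-based tagger are assumed stand-ins for THULAC, and a simple filter stands in for the XLNet feature extractor.

```python
# Sketch of the word-segmentation + labeling step before feature
# extraction. THULAC and XLNet are replaced by toy stand-ins so the
# three-stage shape (segment -> label -> extract) is visible.

EMOTION_WORDS = {"angry", "urgent", "happy"}   # assumed label lexicon

def segment(text):
    return text.lower().split()

def label(tokens):
    # Tag each token as emotion-bearing or a plain content token.
    return [(t, "EMO" if t in EMOTION_WORDS else "TOK") for t in tokens]

def extract_features(labeled):
    # A real system would feed the labeled sequence into a language
    # model; here we simply keep the emotion-bearing tokens.
    return [t for t, tag in labeled if tag == "EMO"]

feats = extract_features(label(segment("I am Angry about the urgent bill")))
```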
Preferably, the speech emotion recognition method may further include: recording the text conversion information, the question information and the answer information to generate a question-answer record; and updating the preset question knowledge base and the preset answer knowledge base according to the question and answer records.
Specifically, during the intelligent dialogue, the text conversion information, question information, and answer information can be recorded in real time to generate a question-answer record, which is used to update the preset question knowledge base and the preset answer knowledge base; that is, newly generated question and answer information, or updated and more standard question and answer information, is added to the corresponding knowledge bases, so that speech emotion recognition can be performed quickly when a call request of the same type is received again.
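The record-then-update flow above can be sketched as follows; the record layout and the dict-based knowledge bases are assumptions for illustration.

```python
# Sketch of recording a question-answer round and folding it back into
# the preset knowledge bases. The data structures are assumed.

def record_round(qa_log, text, question, answer):
    qa_log.append({"text": text, "question": question, "answer": answer})

def update_knowledge_bases(qa_log, question_kb, answer_kb):
    # Merge new pairs from the log into the bases so similar future
    # calls can be handled without re-deriving them.
    for round_ in qa_log:
        question_kb.setdefault(round_["text"], round_["question"])
        answer_kb.setdefault(round_["question"], round_["answer"])

log, qkb, akb = [], {}, {}
record_round(log, "balance pls", "What is my balance?", "2 GB left.")
update_knowledge_bases(log, qkb, akb)
```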
S102: performing character conversion on incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion characteristics;
This step obtains emotion feature information through text analysis; the emotion features may specifically be feature words representing emotion. Specifically, text conversion is first performed on the incoming call voice information to obtain text information; the text-conversion process may refer to the text-conversion method in the intelligent-dialogue process described above and is not repeated here. Then, emotion analysis is performed on the text information to obtain the emotion features, which may be implemented with a preset emotion analysis model, such as XLNet or BERT (Bidirectional Encoder Representations from Transformers).
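The emotion-analysis step (S102) can be sketched as follows; a lexicon lookup is an assumed stand-in for the XLNet/BERT classifier, and the lexicon contents are illustrative.

```python
# Sketch of text emotion analysis (S102): the output is the set of
# emotion-bearing feature words, as described in the text. The lexicon
# stands in for a trained emotion analysis model.

EMOTION_LEXICON = {
    "furious": "anger", "angry": "anger",
    "worried": "anxiety", "urgent": "anxiety",
    "glad": "joy",
}

def text_emotion_features(text):
    """Return the emotion-bearing feature words found in the text."""
    return [w for w in text.lower().split() if w in EMOTION_LEXICON]

features = text_emotion_features("I am worried this is urgent")
```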
S103: carrying out feature extraction on incoming call voice information to obtain tone features;
This step obtains the tone features by performing feature extraction on the incoming call voice information; the tone features may likewise be feature words representing emotion. This feature-extraction process may also be implemented by a preset model, for example a WFST (weighted finite-state transducer) model in the present application.
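As a rough illustration of mapping acoustics to a tone feature word, the sketch below thresholds pitch and energy statistics. This is not the WFST model the text names: the per-frame inputs and thresholds are assumptions chosen only to show the shape of audio-to-feature-word extraction.

```python
# Sketch of tone-feature extraction (S103): crude pitch/energy
# statistics over assumed per-frame values are thresholded into a
# single tone feature word.

def tone_features(pitches, energies):
    """Map mean pitch (Hz) and mean energy to a coarse tone label.
    The thresholds are illustrative assumptions, not tuned values."""
    mean_pitch = sum(pitches) / len(pitches)
    mean_energy = sum(energies) / len(energies)
    if mean_pitch > 220 and mean_energy > 0.7:
        return "agitated"
    if mean_pitch < 140:
        return "subdued"
    return "neutral"

tone = tone_features([250, 260, 240], [0.8, 0.9, 0.75])
```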
It can be understood that the execution sequence of S102 and S103 does not affect the implementation of the present technical solution, and the two may be executed simultaneously or sequentially, for example, as shown in fig. 1, in order to ensure the execution efficiency, an implementation manner of synchronous execution of the two is adopted.
S104: integrating the emotional characteristics and the tone characteristics to obtain combined characteristic words;
Specifically, the emotion features and the tone features can be integrated to obtain the combined feature words, so that emotion analysis is performed on both the speech and the text, effectively guaranteeing the accuracy of the speech emotion recognition result.
S105: matching in a preset emotion information base to obtain emotion categories corresponding to the combined characteristic words;
the method comprises the steps of determining the emotion type of a call request terminal through semantic matching, specifically, establishing an emotion information base in advance for storing various emotion information set by factory or user, and performing semantic matching on combined feature words in the preset emotion information base to inquire and obtain the emotion type of the call request terminal. The semantic matching process can also be implemented based on a corresponding semantic matching model, and reference may be specifically made to the above description, which is not repeated herein.
S106: outputting the emotion classification.
This step outputs the emotion category information, i.e., feeds the determined emotion category back to a display interface. The display form of the emotion category on the display interface can be customized by the user, for example: "calling number: ***; emotion of the caller: ***; incoming call time: ***".
In summary, on receiving a call request, the speech emotion recognition method provided by the present application first conducts an intelligent dialogue with the call request terminal to obtain the incoming call voice information. It then converts the incoming call voice information into text information and performs emotion analysis on the text to obtain emotion features, while also performing feature extraction on the incoming call voice information to obtain tone features; that is, the speech and the text are processed separately to obtain their respective feature information. The features are then combined into combined feature words, and finally the emotion category of the calling party is determined by matching in a preset emotion information base and output as a call reminder. In this way, more effective data information can be obtained from the incoming call, the calling party's emotional state can be determined, and the user can more conveniently handle the call according to the feedback information, which largely meets the user's demand for intelligent telephone communication services and further improves the user experience.
On the basis of the above-described embodiment:
As a preferred embodiment, the speech emotion recognition method may further include: when no emotion category corresponding to the combined feature words can be matched in the preset emotion information base, creating a new emotion category according to the combined feature words and outputting the new emotion category.
Specifically, the emotion categories stored in the preset emotion information base are not necessarily complete and comprehensive, so the emotion category corresponding to the combined feature words may fail to match in the preset emotion information base. To solve this problem, a new emotion category can be created for the call request terminal according to the combined feature words and then output.
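The fallback described above can be sketched as follows; naming the new category after its feature words is an illustrative assumption, and the in-place base update anticipates the next preferred embodiment.

```python
# Sketch of the no-match fallback: when no preset category overlaps the
# combined feature words, a new category is created from them (and, as
# the following embodiment describes, added to the base).

def classify(combined, emotion_base):
    for category, words in emotion_base.items():
        if words & combined:
            return category
    # No match: create a new category keyed by the combined words.
    new_category = "+".join(sorted(combined))
    emotion_base[new_category] = set(combined)   # also updates the base
    return new_category

base = {"anger": {"angry"}}
result = classify({"wistful", "subdued"}, base)
```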
As a preferred embodiment, the speech emotion recognition method may further include: and adding the new emotion type into the preset emotion information base to update the preset emotion information base.
This step updates the preset emotion information base: the newly created emotion category and its corresponding feature keywords are added to the base, so that speech emotion recognition can be performed quickly when a call request of the same type is received again.
As a preferred embodiment, the speech emotion recognition method may further include: generating suggestion information for answering the call according to the emotion category; and outputting the suggestion information.
This embodiment aims to generate corresponding suggestion information according to the determined emotion category of the call request terminal, so that the called party can act on it. For example, when the emotion category of the call request terminal is determined to be "anxious", the suggestion information "urgent call back" can be generated and output to the display interface, so that the called party can call back the call request terminal in time.
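A minimal sketch of mapping a determined emotion category to suggestion information might look like the following; the categories and suggestion texts are illustrative assumptions:

```python
# Illustrative emotion-category -> suggestion mapping (assumed contents).
SUGGESTIONS = {
    "anxious": "urgent call back",
    "angry": "call back when prepared",
    "neutral": "call back at leisure",
}

def generate_suggestion(emotion_category):
    """Return answering advice for the emotion category, falling back
    to generic advice for categories without a preset entry."""
    return SUGGESTIONS.get(emotion_category, "call back at leisure")
```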
As a preferred embodiment, the speech emotion recognition method may further include: and carrying out voice playing on the suggestion information through the broadcaster.
This preferred embodiment provides a more specific output mode for the suggestion information, namely a voice reminder: the suggestion information can be played through a broadcaster so that the called party is reminded in time.
As a preferred embodiment, the speech emotion recognition method may further include: adjusting the tone mode of the intelligent dialogue according to the emotion category.
This preferred embodiment adjusts the tone mode during the intelligent dialogue, that is, the system adjusts its own speech mode according to the emotion category of the call request terminal, thereby providing a more humanized experience for users and improving user friendliness.
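One possible way to realize such a tone-mode adjustment is to map the emotion category to speech-synthesis parameters; the parameter names and values below are assumptions for illustration, not part of the described system:

```python
def tone_mode(emotion_category):
    """Return speaking-rate and pitch settings for the intelligent
    dialogue, softened for agitated callers (values are illustrative)."""
    if emotion_category in {"anxious", "angry"}:
        return {"rate": 0.9, "pitch": -2}   # slower, lower: calming
    if emotion_category == "happy":
        return {"rate": 1.1, "pitch": 1}    # brighter, livelier
    return {"rate": 1.0, "pitch": 0}        # neutral default
```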
On the basis of the foregoing embodiments, please refer to fig. 2 and fig. 3, fig. 2 is a frame diagram of a speech emotion recognition system provided in the present application, fig. 3 is a frame diagram of an intelligent question and answer module in the speech emotion recognition system provided in the present application, and the following describes the speech emotion recognition method provided in the present application in more detail.
The incoming call information analysis process of the speech emotion recognition system comprises the following steps:
(1) when a call comes in, the speech emotion recognition system receives the incoming call request (call request) 101 and first transfers the call to the intelligent question-and-answer module 102, which conducts an intelligent dialogue with the caller according to the existing call topic, the question-and-answer knowledge base (the preset question knowledge base and the preset answer knowledge base), and the like, and performs speech recognition 103;
(2) speech recognition 103 then proceeds along two parallel flows: on the one hand, the incoming call speech information is transcribed into text 106, and the obtained text information is input to the emotion analysis module 107 for emotion analysis to obtain the emotion analysis result, namely the emotion feature words 108 (the output may take forms such as happy, neutral, or sad, and may be given as proportions); on the other hand, voice analysis 104 is performed directly on the incoming call speech information to extract the tone feature words 105 (including but not limited to volume, whether the speech is hurried, etc.); finally, the results of the two parallel flows are combined 109 to obtain the combined feature words;
(3) matching 1010 the combined feature words 109 against an emotion topic library (covering moods, personalities, and the like) 1011; if the matching succeeds, outputting the speech emotion category (i.e., the mood, personality, and the like of the calling party) 1013; if the matching fails, selecting whether to create a new emotion category 1012 and add it to the emotion topic library 1011 to update the database, while outputting the newly created emotion category to the display interface to inform the user of the caller's emotion category determination result 1013, in a format such as "incoming number: ..., incoming call mood: ..., incoming call personality: ..., incoming call time: ...". Of course, if the calling number belongs to the address book, the displayed information may also include the caller's name and other information.
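The two parallel analysis flows and the feature combination of step (2) above can be sketched as follows; the keyword rules and thresholds are toy stand-ins (assumptions) for the emotion analysis module 107 and the voice analysis 104:

```python
def analyze_text(text):
    """Emotion feature words from the transcribed text
    (toy keyword rule standing in for the emotion analysis module)."""
    return {"anxious"} if "hurry" in text else {"neutral"}

def analyze_audio(volume_db, speech_rate):
    """Tone feature words from the raw speech signal
    (toy thresholds standing in for the voice analysis step)."""
    features = set()
    if volume_db > 70:
        features.add("loud")
    if speech_rate > 5.0:   # assumed unit: syllables per second
        features.add("urgent")
    return features

def combined_features(text, volume_db, speech_rate):
    # Run both flows and merge their feature words (combination 109).
    return analyze_text(text) | analyze_audio(volume_db, speech_rate)
```

The merged set is what step (3) then matches against the emotion topic library.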
Second, the intelligent dialogue process of the speech emotion recognition system:
(1) a speech input (speech information) 201 is recognized and transcribed into speech text (text information) 203 using a speech recognition model 202;
(2) performing word segmentation and labeling 204 on the speech text 203, and analyzing it with a language model to extract features and obtain the question information (text features) 205;
(3) performing semantic matching between the question information 205 and the entries stored in the question base (the preset question knowledge base) 207 through a semantic matching model, and obtaining the question 206 with the highest matching similarity;
(4) performing semantic matching between the question 206 and the entries stored in the answer base (the preset answer knowledge base) 209 through a semantic matching model, and obtaining the answer 208 with the highest matching similarity;
(5) synthesizing the answer information 208 into speech with a speech synthesis model and outputting the speech 2011; in addition, full-duplex voice interaction technology 2012 is combined with the speech input 201 to generate responses in real time, control the dialogue rhythm, and recover from dialogue interruptions in real time;
(6) updating the question-and-answer knowledge base 2010 according to the question-and-answer records.
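Steps (2) to (4) above can be sketched with a simple retrieval loop in which token overlap stands in for the semantic matching model; the knowledge-base contents and helper names are invented for the example:

```python
def tokens(sentence):
    """Toy word segmentation: lowercase whitespace tokens."""
    return set(sentence.lower().split())

def most_similar(query, candidates):
    """Return the candidate with the highest Jaccard similarity
    (a stand-in for the semantic matching model)."""
    q = tokens(query)
    def score(c):
        t = tokens(c)
        return len(q & t) / len(q | t) if q | t else 0.0
    return max(candidates, key=score)

# Illustrative question base and answer base (assumed contents).
question_base = ["what is your name", "why are you calling", "who is calling"]
answer_base = {
    "why are you calling": "please briefly state the reason for your call",
    "who is calling": "may I ask who is calling",
}

def reply(user_text):
    # Match the most similar stored question, then its stored answer.
    question = most_similar(user_text, question_base)
    return answer_base.get(question, "sorry, could you repeat that")
```

In the described system the matched answer would then be passed to the speech synthesis model of step (5).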
It can be seen that the speech emotion recognition method provided by the embodiment of the application, upon receiving a call request, conducts an intelligent dialogue with the call request terminal to obtain incoming call speech information, converts the incoming call speech information into text information and performs emotion analysis on it to obtain emotion features, performs feature extraction on the incoming call speech information to obtain tone features (i.e., processes the speech and the text separately to obtain their respective feature information), combines the features to obtain combined feature words, and finally matches the combined feature words in a preset emotion information base to determine the emotion category of the calling party and outputs it to realize a call reminder. This implementation can obtain more effective data information from the incoming call information sent by the call request terminal, determine the emotion information of the calling party, and make it more convenient for the user to handle the incoming call according to the feedback information, which largely meets the user's demand for intelligent telephone communication services and further improves the user experience.
To solve the above problem, please refer to fig. 4, fig. 4 is a schematic structural diagram of a speech emotion recognition apparatus provided in the present application, where the speech emotion recognition apparatus may include:
the intelligent dialogue module 10 is used for performing intelligent dialogue with a call request terminal according to the received call request to obtain incoming call voice information;
the first feature extraction module 20 is configured to perform character conversion on the incoming call voice information to obtain text information, and perform emotion analysis on the text information to obtain emotion features;
the second feature extraction module 30 is configured to perform feature extraction on the incoming call voice information to obtain a mood feature;
the feature integration module 40 is used for integrating the emotional features and the tone features to obtain combined feature words;
the semantic matching module 50 is used for matching in a preset emotion information base to obtain emotion categories corresponding to the combined feature words;
and an information output module 60 for outputting the emotion classification.
For the introduction of the apparatus provided in the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above problem, please refer to fig. 5, fig. 5 is a schematic structural diagram of an electronic device for speech emotion recognition provided in the present application, where the electronic device for speech emotion recognition may include:
a memory 1 for storing a computer program;
and the processor 2 is used for implementing the steps of any one of the above speech emotion recognition methods when executing the computer program.
As a preferred embodiment, the electronic device for speech emotion recognition may further include a display for displaying the identity category of the call requesting terminal.
For the introduction of the electronic device provided by the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, can implement the steps of any one of the speech emotion recognition methods.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
For the introduction of the computer-readable storage medium provided in the present application, please refer to the above method embodiments, which are not described herein again.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The speech emotion recognition method, the speech emotion recognition apparatus, the electronic device, and the computer-readable storage medium provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, several improvements and modifications can be made to the present application without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present application.
Claims (13)
1. A speech emotion recognition method, comprising:
carrying out intelligent conversation with a call request terminal according to a received call request to obtain incoming call voice information;
performing character conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion characteristics;
extracting the characteristics of the incoming call voice information to obtain tone characteristics;
integrating the emotion characteristics and the tone characteristics to obtain combined characteristic words;
matching in a preset emotion information base to obtain an emotion category corresponding to the combined feature word;
outputting the emotion category.
2. The speech emotion recognition method of claim 1, wherein the obtaining of the incoming call speech information by conducting an intelligent dialogue with the call request terminal according to the received call request comprises:
obtaining voice information according to the call request;
performing character conversion on the voice information to obtain text conversion information;
performing feature extraction on the text conversion information to obtain text features;
matching and obtaining problem information corresponding to the text features in a preset problem knowledge base;
matching and obtaining answer information corresponding to the question information in a preset answer knowledge base;
carrying out voice conversion on the answer information to obtain voice reply information;
feeding back the voice reply information to the call request terminal to realize the intelligent conversation;
and counting all voice information in the intelligent conversation process to obtain the incoming call voice information.
3. The speech emotion recognition method of claim 2, wherein the performing feature extraction on the text conversion information to obtain text features comprises:
performing word segmentation processing and labeling processing on the text conversion information to obtain processed text conversion information;
and extracting the characteristics of the processed text conversion information by using a preset language model to obtain the text characteristics.
4. The speech emotion recognition method of claim 2, further comprising:
recording the text conversion information, the question information and the answer information to generate a question-answer record;
and updating the preset question knowledge base and the preset answer knowledge base according to the question and answer records.
5. The speech emotion recognition method of any one of claims 1 to 4, further comprising:
and when the emotion category corresponding to the combined feature word cannot be obtained in the preset emotion information base in a matching manner, creating a new emotion category according to the combined feature word, and outputting the new emotion category.
6. The speech emotion recognition method of claim 5, further comprising:
and adding the new emotion type into the preset emotion information base to update the preset emotion information base.
7. The speech emotion recognition method of claim 1, further comprising:
generating suggestion information of answering a call according to the emotion type;
and outputting the suggestion information.
8. The speech emotion recognition method of claim 7, wherein the outputting the recommendation information includes:
and carrying out voice playing on the suggestion information through a broadcaster.
9. The speech emotion recognition method of claim 8, further comprising:
and adjusting the tone mode of the intelligent conversation according to the emotion category.
10. A speech emotion recognition apparatus, characterized by comprising:
the intelligent dialogue module is used for carrying out intelligent dialogue with a call request terminal according to the received call request to obtain incoming call voice information;
the first feature extraction module is used for performing character conversion on the incoming call voice information to obtain text information and performing emotion analysis on the text information to obtain emotion features;
the second feature extraction module is used for extracting features of the incoming call voice information to obtain tone features;
the characteristic integration module is used for integrating the emotional characteristics and the tone characteristics to obtain combined characteristic words;
the semantic matching module is used for matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature word;
and the information output module is used for outputting the emotion categories.
11. An electronic device for speech emotion recognition, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the speech emotion recognition method as claimed in any one of claims 1 to 9 when executing the computer program.
12. The speech emotion recognition electronic device of claim 11, further comprising:
and the display is used for displaying the identity category of the calling request terminal.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the speech emotion recognition method as claimed in any one of claims 1 to 9.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911082413.8A CN110751943A (en) | 2019-11-07 | 2019-11-07 | Voice emotion recognition method and device and related equipment |
US16/889,823 US11019207B1 (en) | 2019-11-07 | 2020-06-02 | Systems and methods for smart dialogue communication |
US17/238,161 US11323566B2 (en) | 2019-11-07 | 2021-04-22 | Systems and methods for smart dialogue communication |
US17/660,207 US11758047B2 (en) | 2019-11-07 | 2022-04-21 | Systems and methods for smart dialogue communication |
US18/359,075 US20230370549A1 (en) | 2019-11-07 | 2023-07-26 | Systems and methods for smart dialogue communication |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911082413.8A CN110751943A (en) | 2019-11-07 | 2019-11-07 | Voice emotion recognition method and device and related equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110751943A true CN110751943A (en) | 2020-02-04 |
Family
ID=69282579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911082413.8A Pending CN110751943A (en) | 2019-11-07 | 2019-11-07 | Voice emotion recognition method and device and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110751943A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107944008A (en) * | 2017-12-08 | 2018-04-20 | 神思电子技术股份有限公司 | A kind of method that Emotion identification is carried out for natural language |
CN109003624A (en) * | 2018-06-29 | 2018-12-14 | 北京百度网讯科技有限公司 | Emotion identification method, apparatus, computer equipment and storage medium |
CN109949071A (en) * | 2019-01-31 | 2019-06-28 | 平安科技(深圳)有限公司 | Products Show method, apparatus, equipment and medium based on voice mood analysis |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111324713A (en) * | 2020-02-18 | 2020-06-23 | 腾讯科技(深圳)有限公司 | Automatic replying method and device for conversation, storage medium and computer equipment |
CN111694938A (en) * | 2020-04-27 | 2020-09-22 | 平安科技(深圳)有限公司 | Emotion recognition-based answering method and device, computer equipment and storage medium |
CN111694938B (en) * | 2020-04-27 | 2024-05-14 | 平安科技(深圳)有限公司 | Emotion recognition-based reply method and device, computer equipment and storage medium |
CN111666380A (en) * | 2020-06-12 | 2020-09-15 | 北京百度网讯科技有限公司 | Intelligent calling method, device, equipment and medium |
CN111696556A (en) * | 2020-07-13 | 2020-09-22 | 上海茂声智能科技有限公司 | Method, system, equipment and storage medium for analyzing user conversation emotion |
CN111696556B (en) * | 2020-07-13 | 2023-05-16 | 上海茂声智能科技有限公司 | Method, system, equipment and storage medium for analyzing user dialogue emotion |
WO2022016580A1 (en) * | 2020-07-21 | 2022-01-27 | 南京智金科技创新服务中心 | Intelligent voice recognition method and device |
CN112148846A (en) * | 2020-08-25 | 2020-12-29 | 北京来也网络科技有限公司 | Reply voice determination method, device, equipment and storage medium combining RPA and AI |
CN112002348B (en) * | 2020-09-07 | 2021-12-28 | 复旦大学 | Method and system for recognizing speech anger emotion of patient |
CN112002348A (en) * | 2020-09-07 | 2020-11-27 | 复旦大学 | Method and system for recognizing speech anger emotion of patient |
CN112818841A (en) * | 2021-01-29 | 2021-05-18 | 北京搜狗科技发展有限公司 | Method and related device for recognizing user emotion |
CN112908314A (en) * | 2021-01-29 | 2021-06-04 | 深圳通联金融网络科技服务有限公司 | Intelligent voice interaction method and device based on tone recognition |
CN112995422A (en) * | 2021-02-07 | 2021-06-18 | 成都薯片科技有限公司 | Call control method and device, electronic equipment and storage medium |
CN113055523A (en) * | 2021-03-08 | 2021-06-29 | 北京百度网讯科技有限公司 | Crank call interception method and device, electronic equipment and storage medium |
CN113055523B (en) * | 2021-03-08 | 2022-12-30 | 北京百度网讯科技有限公司 | Crank call interception method and device, electronic equipment and storage medium |
CN113157966A (en) * | 2021-03-15 | 2021-07-23 | 维沃移动通信有限公司 | Display method and device and electronic equipment |
CN113157966B (en) * | 2021-03-15 | 2023-10-31 | 维沃移动通信有限公司 | Display method and device and electronic equipment |
US20220375468A1 (en) * | 2021-05-21 | 2022-11-24 | Cogito Corporation | System method and apparatus for combining words and behaviors |
CN113535903A (en) * | 2021-07-19 | 2021-10-22 | 安徽淘云科技股份有限公司 | Emotion guiding method, emotion guiding robot, storage medium and electronic device |
CN113535903B (en) * | 2021-07-19 | 2024-03-19 | 安徽淘云科技股份有限公司 | Emotion guiding method, emotion guiding robot, storage medium and electronic device |
CN114093389A (en) * | 2021-11-26 | 2022-02-25 | 重庆凡骄网络科技有限公司 | Speech emotion recognition method and device, electronic equipment and computer readable medium |
CN114222302A (en) * | 2021-12-13 | 2022-03-22 | 北京声智科技有限公司 | Calling method and device for abnormal call, electronic equipment and storage medium |
CN114298019A (en) * | 2021-12-29 | 2022-04-08 | 中国建设银行股份有限公司 | Emotion recognition method, emotion recognition apparatus, emotion recognition device, storage medium, and program product |
CN114758676A (en) * | 2022-04-18 | 2022-07-15 | 哈尔滨理工大学 | Multi-modal emotion recognition method based on deep residual shrinkage network |
CN116389644A (en) * | 2022-11-10 | 2023-07-04 | 八度云计算(安徽)有限公司 | Outbound system based on big data analysis |
CN116389644B (en) * | 2022-11-10 | 2023-11-03 | 八度云计算(安徽)有限公司 | Outbound system based on big data analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200204 |