CN110751943A - Voice emotion recognition method and device and related equipment - Google Patents


Info

Publication number
CN110751943A
CN110751943A (application CN201911082413.8A)
Authority
CN
China
Prior art keywords
information
emotion
speech
voice
incoming call
Prior art date
Legal status
Pending
Application number
CN201911082413.8A
Other languages
Chinese (zh)
Inventor
谌明
Current Assignee
Zhejiang Tonghuashun Intelligent Technology Co Ltd
Original Assignee
Zhejiang Tonghuashun Intelligent Technology Co Ltd
Priority date
Application filed by Zhejiang Tonghuashun Intelligent Technology Co Ltd
Priority to CN201911082413.8A
Publication of CN110751943A
Priority to US16/889,823 (published as US11019207B1)
Priority to US17/238,161 (published as US11323566B2)
Priority to US17/660,207 (published as US11758047B2)
Priority to US18/359,075 (published as US20230370549A1)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques for estimating an emotional state
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4936 Speech interaction details

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Psychiatry (AREA)
  • Hospice & Palliative Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

The application discloses a voice emotion recognition method, which comprises: conducting an intelligent dialogue with a call request terminal according to a received call request to obtain incoming call voice information; converting the incoming call voice information into text information and performing emotion analysis on the text information to obtain emotion features; performing feature extraction on the incoming call voice information to obtain tone features; integrating the emotion features and the tone features to obtain combined feature words; matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature words; and outputting the emotion category. The method can obtain more effective data information from incoming call information, making it easier for a user to handle the call according to the feedback, meeting the user's demand for intelligent telephone communication services and thereby improving the user experience. The application also discloses a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium, all of which share these beneficial effects.

Description

Voice emotion recognition method and device and related equipment
Technical Field
The present application relates to the field of communications technologies, and in particular, to a speech emotion recognition method, a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium.
Background
Communication technology develops rapidly, and smartphones are continuously updated with more and more functions. Meanwhile, as the pace of users' lives accelerates, their expectations for communication services rise: they hope to obtain more professional, personalized, and intelligent services. However, existing telephone communication services can only provide simple functions such as caller ID display and harassment-call interception; they cannot obtain other, more useful data that would effectively help a user decide whether to answer or hang up an incoming call according to the actual situation.
Therefore, how to obtain more effective data information from incoming call information, so that the user can handle the call according to the feedback, satisfy the user's demand for intelligent telephone communication services, and improve the user experience, is a problem to be urgently solved by those skilled in the art.
Disclosure of Invention
An object of the present application is to provide a voice emotion recognition method that can obtain more effective data information from incoming call information, making it easier for a user to handle an incoming call according to the feedback, meeting the user's demand for intelligent telephone communication services, and thereby improving the user experience. Another object of the present application is to provide a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium, which share the above advantages.
In order to solve the technical problem, the present application provides a speech emotion recognition method, including:
carrying out intelligent conversation with a call request terminal according to a received call request to obtain incoming call voice information;
performing character conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion characteristics;
extracting the characteristics of the incoming call voice information to obtain tone characteristics;
integrating the emotion characteristics and the tone characteristics to obtain combined characteristic words;
matching in a preset emotion information base to obtain an emotion category corresponding to the combined feature word;
outputting the emotion classification.
Preferably, the performing an intelligent dialogue with the call request terminal according to the received call request, and obtaining the incoming call voice information includes:
obtaining voice information according to the call request;
performing character conversion on the voice information to obtain text conversion information;
performing feature extraction on the text conversion information to obtain text features;
matching and obtaining problem information corresponding to the text features in a preset problem knowledge base;
matching and obtaining answer information corresponding to the question information in a preset answer knowledge base;
carrying out voice conversion on the answer information to obtain voice reply information;
feeding back the voice reply information to the call request terminal to realize the intelligent conversation;
and counting all voice information in the intelligent conversation process to obtain the incoming call voice information.
Preferably, the performing feature extraction on the text conversion information to obtain text features includes:
performing word segmentation processing and labeling processing on the text conversion information to obtain processed text conversion information;
and extracting the characteristics of the processed text conversion information by using a preset language model to obtain the text characteristics.
Preferably, the speech emotion recognition method further includes:
recording the text conversion information, the question information and the answer information to generate a question-answer record;
and updating the preset question knowledge base and the preset answer knowledge base according to the question and answer records.
Preferably, the speech emotion recognition method further includes:
and when the emotion category corresponding to the combined feature word cannot be obtained in the preset emotion information base in a matching manner, creating a new emotion category according to the combined feature word, and outputting the new emotion category.
Preferably, the speech emotion recognition method further includes:
and adding the new emotion type into the preset emotion information base to update the preset emotion information base.
Preferably, the speech emotion recognition method further includes:
generating suggestion information of answering a call according to the emotion type;
and outputting the recommendation information.
Preferably, the speech emotion recognition method further includes:
and performing voice playing of the suggestion information through a speaker.
Preferably, the speech emotion recognition method further includes:
and adjusting the tone mode of the intelligent conversation according to the emotion category.
To solve the above technical problem, the present application further provides a speech emotion recognition apparatus, including:
the intelligent dialogue module is used for carrying out intelligent dialogue with a call request terminal according to the received call request to obtain incoming call voice information;
the first feature extraction module is used for performing character conversion on the incoming call voice information to obtain text information and performing emotion analysis on the text information to obtain emotion features;
the second feature extraction module is used for extracting features of the incoming call voice information to obtain tone features;
the characteristic integration module is used for integrating the emotional characteristics and the tone characteristics to obtain combined characteristic words;
the semantic matching module is used for matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature word;
and the information output module is used for outputting the emotion categories.
In order to solve the above technical problem, the present application further provides an electronic device for speech emotion recognition, where the electronic device for speech emotion recognition includes:
a memory for storing a computer program;
a processor for implementing the steps of any one of the above speech emotion recognition methods when executing the computer program.
Preferably, the electronic device for speech emotion recognition further includes:
and the display is used for displaying the identity category of the calling request terminal.
In order to solve the above technical problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above speech emotion recognition methods.
The speech emotion recognition method comprises: conducting an intelligent dialogue with a call request terminal according to a received call request to obtain incoming call voice information; converting the incoming call voice information into text information and performing emotion analysis on the text information to obtain emotion characteristics; performing feature extraction on the incoming call voice information to obtain tone characteristics; integrating the emotion characteristics and the tone characteristics to obtain combined characteristic words; matching in a preset emotion information base to obtain the emotion category corresponding to the combined characteristic words; and outputting the emotion category.
It can be seen that, upon receiving a call request, the speech emotion recognition method provided by the present application first conducts an intelligent dialogue with the call request terminal to obtain incoming call voice information. The incoming call voice information is then converted into text information, and emotion analysis is performed on the text to obtain emotion characteristics; at the same time, feature extraction is performed on the incoming call voice information to obtain tone characteristics. That is, the voice and the text are processed separately to obtain their respective feature information, which is then combined into combined characteristic words. Finally, the emotion category of the calling party is determined by matching in a preset emotion information base and output to provide a call reminder. In this way, more effective data information can be obtained from the incoming call information sent by the call request terminal, the calling party's emotional state can be determined, and the user can more conveniently handle the call according to the feedback, which greatly satisfies the user's demand for intelligent telephone communication services and further improves the user experience.
The speech emotion recognition device, the electronic device and the computer-readable storage medium provided by the application all have the beneficial effects, and are not repeated herein.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for the description are briefly introduced below. Obviously, the drawings described below show only embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a speech emotion recognition method provided in the present application;
FIG. 2 is a block diagram of a speech emotion recognition system provided in the present application;
FIG. 3 is a block diagram of an intelligent question-answering module in the speech emotion recognition system provided by the present application;
fig. 4 is a schematic structural diagram of a speech emotion recognition apparatus provided in the present application;
fig. 5 is a schematic structural diagram of an electronic device for speech emotion recognition provided by the present application.
Detailed Description
The core of the present application is to provide a voice emotion recognition method that can obtain more effective data information from incoming call information, making it easier for a user to handle an incoming call according to the feedback, meeting the user's demand for intelligent telephone communication services, and thereby improving the user experience. Another core of the present application is to provide a speech emotion recognition apparatus, an electronic device, and a computer-readable storage medium, which share the above beneficial effects.
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flowchart of a speech emotion recognition method provided in the present application, where the speech emotion recognition method may include:
s101: carrying out intelligent conversation with a call request terminal according to a received call request to obtain incoming call voice information;
the step aims to realize the acquisition of the incoming call voice information, specifically, a calling party can initiate a calling request to a called terminal based on a calling request terminal, and the called terminal can carry out intelligent conversation, namely intelligent conversation, with the calling request terminal according to the calling request, so that all voice information sent by the calling request terminal in the intelligent conversation process, namely the incoming call voice information, is obtained.
Preferably, the conducting an intelligent dialogue with the call request terminal according to the received call request to obtain the incoming call voice information may include: obtaining voice information according to the call request; converting the voice information into text conversion information; performing feature extraction on the text conversion information to obtain text features; matching in a preset question knowledge base to obtain question information corresponding to the text features; matching in a preset answer knowledge base to obtain answer information corresponding to the question information; converting the answer information into voice reply information; feeding back the voice reply information to the call request terminal to realize the intelligent dialogue; and collecting all the voice information during the intelligent dialogue to obtain the incoming call voice information.
For the above intelligent dialogue process, this preferred embodiment provides a more specific implementation. The preset question-answer knowledge base may include a preset question knowledge base and a preset answer knowledge base. The preset question knowledge base stores a number of standard question entries configured at the factory or by the user; the preset answer knowledge base stores the corresponding standard answers, and it can be understood that the answer information in the preset answer knowledge base corresponds one-to-one with the question information in the preset question knowledge base. The specific implementation process is as follows:
Firstly, the call is answered according to the call request to obtain voice information, and the voice information is converted into text conversion information; this can be implemented by a corresponding speech recognition model, such as PyTorch-Kaldi (the PyTorch-Kaldi Speech Recognition Toolkit). Secondly, feature extraction is performed on the text conversion information using a pre-established language model to obtain text features; the preset language model can specifically be XLNet (a generalized autoregressive pretraining model for language understanding). Then, semantic matching is performed on the text features in the preset question knowledge base through a preset semantic matching model, such as a DSSM (Deep Structured Semantic Model), to query and obtain the standard question information corresponding to the voice information sent by the call request terminal, and semantic matching is performed on the question information in the preset answer knowledge base to query and obtain the standard answer information corresponding to the question information. Further, the answer information is converted into voice through a preset speech synthesis model to obtain the voice reply information; the speech synthesis model can specifically be WaveNet (a generative model for raw audio), ClariNet (a fully parallel end-to-end text-to-speech model with parallel wave generation), or the like. Finally, the voice reply information is fed back to the call request terminal to realize the intelligent dialogue. Thus, all the voice information sent by the call request terminal during the whole intelligent dialogue can be collected to obtain the incoming call voice information.
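The dialogue loop described above can be sketched minimally as follows. This is an illustrative outline only: the ASR and TTS stages are omitted (the input string stands for transcribed speech, and the returned string would be synthesized back into voice), the knowledge-base contents are invented examples, and the XLNet/DSSM models are replaced by a toy bag-of-words Jaccard similarity.

```python
def text_features(text):
    # Stand-in for the preset language model (e.g. XLNet): a bag of words.
    return set(text.lower().split())

def best_match(features, candidates):
    # Stand-in for DSSM-style semantic matching: Jaccard word overlap.
    def score(candidate):
        cand = text_features(candidate)
        union = features | cand
        return len(features & cand) / len(union) if union else 0.0
    return max(candidates, key=score)

# Toy question/answer knowledge bases (contents are invented examples).
QUESTION_KB = ["what are your business hours", "how do i reset my password"]
ANSWER_KB = {
    "what are your business hours": "We are open 9 to 5, Monday to Friday.",
    "how do i reset my password": "Use the forgot-password link to reset it.",
}

def dialogue_turn(caller_text):
    # One turn: transcribed caller speech in, standard answer text out.
    feats = text_features(caller_text)
    question = best_match(feats, QUESTION_KB)   # preset question knowledge base
    return ANSWER_KB[question]                  # preset answer knowledge base
```

In a real system each stub would be replaced by the corresponding model; the control flow (features, question match, answer lookup, reply) is what the paragraph describes.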
It should be understood that each of the above language processing models is only one specific implementation form provided in the embodiments of the present application and is not the only choice; developers may select other models according to actual needs.
Preferably, the extracting the feature of the text conversion information to obtain the text feature may include: performing word segmentation processing and labeling processing on the text conversion information to obtain processed text conversion information; and performing feature extraction on the processed text conversion information by using a preset language model to obtain text features.
This preferred embodiment provides a more specific text feature extraction method. Specifically, before feature extraction, word segmentation and labeling may be performed on the text conversion information, which can be implemented with a segmentation and tagging tool such as THULAC (THU Lexical Analyzer for Chinese); the segmented and labeled text conversion information is then fed to XLNet for feature extraction to obtain the text features.
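The segmentation-and-labeling step can be illustrated with a toy forward-maximum-matching segmenter. The lexicon and tag names below are invented placeholders for what a real tool such as THULAC would produce:

```python
# Invented mini-lexicon mapping words to coarse part-of-speech tags.
LEXICON = {"你好": "interj", "来电": "n", "请": "adv", "帮忙": "v"}

def segment_and_tag(text, lexicon=LEXICON, max_len=4):
    i, tokens = 0, []
    while i < len(text):
        # Greedy forward maximum matching: try the longest word at i first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if word in lexicon or length == 1:
                tokens.append((word, lexicon.get(word, "unk")))
                i += length
                break
    return tokens
```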
Preferably, the speech emotion recognition method may further include: recording the text conversion information, the question information and the answer information to generate a question-answer record; and updating the preset question knowledge base and the preset answer knowledge base according to the question and answer records.
Specifically, in the intelligent conversation process, text conversion information, question information and answer information can be recorded in real time, and finally a question-answer record is generated and used for updating a preset question knowledge base and a preset answer knowledge base, namely, new generated question information and answer information or updated more standard question information and answer information are respectively added into corresponding knowledge bases, so that voice emotion recognition can be rapidly realized when a call request of the same type is received again later.
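The record-and-update flow above can be sketched as follows. The data structures are illustrative assumptions (a list of dicts for the question-answer record, a list and a dict for the two knowledge bases):

```python
def record_turn(qa_log, text_info, question, answer):
    # Append one dialogue turn to the question-answer record.
    qa_log.append({"text": text_info, "question": question, "answer": answer})

def update_knowledge_bases(qa_log, question_kb, answer_kb):
    # Merge newly seen question/answer pairs back into the preset bases,
    # so a later call of the same type can be handled quickly.
    for entry in qa_log:
        if entry["question"] not in question_kb:
            question_kb.append(entry["question"])
        answer_kb.setdefault(entry["question"], entry["answer"])
```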
S102: performing character conversion on incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion characteristics;
the method comprises the following steps of obtaining emotional characteristic information through text analysis, wherein the emotional characteristic can be specifically a characteristic word representing emotion, specifically, firstly, text conversion is carried out on incoming call voice information to obtain text information, the text conversion process can specifically refer to a text conversion method in the intelligent conversation process, and the text conversion process is not repeated herein; further, emotion analysis is performed on the text information to obtain emotion characteristics, and a specific implementation process of the emotion characteristics can be implemented based on a preset emotion analysis model, such as XLNet, BERT (Bidirectional Encoder retrieval from transforms), and the like.
S103: carrying out feature extraction on incoming call voice information to obtain tone features;
This step obtains the tone features by performing feature extraction on the incoming call voice information; the tone features may likewise be feature words representing emotion. The feature extraction process can also be implemented by a preset model, for example a WFST (weighted finite-state transducer) model in this application.
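As a crude acoustic sketch of this step (not the WFST model the text names), two simple tone cues can be computed directly from raw audio samples: overall volume via RMS energy, and an "urgency" proxy via the fraction of high-energy frames. The thresholds and labels are arbitrary assumptions:

```python
import math

def tone_features(samples, frame=160, loud_rms=0.3):
    # Overall volume: RMS energy of the whole utterance.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Urgency proxy: fraction of frames whose local RMS exceeds the threshold.
    frames = [samples[i:i + frame] for i in range(0, len(samples), frame)]
    loud = sum(1 for f in frames
               if math.sqrt(sum(s * s for s in f) / len(f)) > loud_rms)
    return {"volume": "loud" if rms > loud_rms else "soft",
            "urgent": loud / len(frames) > 0.5}
```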
It can be understood that the execution order of S102 and S103 does not affect the technical solution; the two steps may be executed simultaneously or sequentially. As shown in fig. 1, to ensure execution efficiency, this embodiment executes them synchronously.
S104: integrating the emotional characteristics and the tone characteristics to obtain combined characteristic words;
specifically, the emotion characteristics and the mood characteristics can be integrated to obtain the combined characteristic words, so that the emotion analysis is performed on the voice and the text, and the accuracy of the voice emotion recognition result is effectively guaranteed.
S105: matching in a preset emotion information base to obtain emotion categories corresponding to the combined characteristic words;
the method comprises the steps of determining the emotion type of a call request terminal through semantic matching, specifically, establishing an emotion information base in advance for storing various emotion information set by factory or user, and performing semantic matching on combined feature words in the preset emotion information base to inquire and obtain the emotion type of the call request terminal. The semantic matching process can also be implemented based on a corresponding semantic matching model, and reference may be specifically made to the above description, which is not repeated herein.
S106: outputting the emotion classification.
This step realizes the output of the emotion category information, i.e., feeding the determined emotion category back to a display interface. The display form of the emotion category can be customized by the user, for example: "calling number: ***; caller emotion: ***; incoming call time: ***".
Upon receiving a call request, the speech emotion recognition method provided by the present application first conducts an intelligent dialogue with the call request terminal to obtain incoming call voice information. The incoming call voice information is then converted into text information, and emotion analysis is performed on the text to obtain emotion features; at the same time, feature extraction is performed on the incoming call voice information to obtain tone features. That is, the voice and the text are processed separately to obtain their respective feature information, which is then combined into combined feature words. Finally, the emotion category of the calling party is determined by matching in a preset emotion information base and output to provide a call reminder. In this way, more effective data information can be obtained from the incoming call information sent by the call request terminal, the calling party's emotional state can be determined, and the user can more conveniently handle the call according to the feedback, which greatly satisfies the user's demand for intelligent telephone communication services and further improves the user experience.
On the basis of the above-described embodiment:
as a preferred embodiment, the speech emotion recognition method may further include: and when the emotion types corresponding to the combined characteristic words cannot be matched and obtained in the preset emotion information base, creating a new emotion type according to the combined characteristic words, and outputting the new emotion type.
Specifically, the emotion categories stored in the preset emotion information base are not necessarily complete and comprehensive, so the emotion category corresponding to the combined feature words may fail to match in the preset emotion information base. To solve this problem, a new emotion category can be created for the call request terminal according to the combined feature words, and the new emotion category is then output.
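This fallback, together with the base update described in the following embodiment, can be sketched as follows; the category naming scheme ("custom:" plus the sorted feature words) is an illustrative assumption:

```python
def classify_or_create(combined_words, base):
    # First try to match an existing category by feature-word overlap.
    for cat, words in base.items():
        if set(combined_words) & words:
            return cat
    # No match: create a new category keyed by the combined feature words
    # and add it to the base so later calls of the same type match quickly.
    new_cat = "custom:" + "+".join(sorted(combined_words))
    base[new_cat] = set(combined_words)
    return new_cat
```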
As a preferred embodiment, the speech emotion recognition method may further include: and adding the new emotion type into the preset emotion information base to update the preset emotion information base.
The method aims to update the preset emotion information base, namely, the newly created emotion types and the corresponding feature keywords are correspondingly added into the preset emotion information base, so that voice emotion recognition can be rapidly realized when the call requests of the same type are received again in the following step.
As a preferred embodiment, the speech emotion recognition method may further include: generating suggestion information of answering the call according to the emotion type; and outputting the recommendation information.
The embodiment aims to generate corresponding advice information according to the determined emotion category of the call request terminal, so that the requested party performs corresponding operation according to the advice information, for example, when the emotion category of the call request terminal is determined to be "anxious", the advice information of "emergency call back" can be generated and output to the display interface, so that the callee can call back to the call request terminal in time.
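The suggestion-generation step amounts to a mapping from emotion category to answering advice. In the sketch below only the "anxious" entry mirrors the urgent call-back example given in the text; the other categories and advice strings are invented:

```python
# Category-to-advice mapping; only "anxious" follows the text's example.
SUGGESTIONS = {
    "anxious": "Urgent: call back as soon as possible.",
    "angry": "Caller sounds upset; answer calmly.",
    "happy": "Routine call; answer when convenient.",
}

def suggest(emotion_category):
    # Return the answering advice to show on the display interface.
    return SUGGESTIONS.get(emotion_category, "No suggestion available.")
```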
As a preferred embodiment, the speech emotion recognition method may further include: playing the suggestion information by voice through a speaker.
This preferred embodiment provides a more specific way to output the suggestion information, namely a voice reminder: the suggestion information can be played through a speaker to remind the called party in time.
As a preferred embodiment, the speech emotion recognition method may further include: and adjusting the tone mode of the intelligent conversation according to the emotion classification.
This preferred embodiment adjusts the tone mode during the intelligent dialogue, that is, the system adjusts its own speaking style according to the emotion category of the call request terminal, thereby providing a more humanized experience for users and improving user friendliness.
On the basis of the foregoing embodiments, please refer to fig. 2 and fig. 3, fig. 2 is a frame diagram of a speech emotion recognition system provided in the present application, fig. 3 is a frame diagram of an intelligent question and answer module in the speech emotion recognition system provided in the present application, and the following describes the speech emotion recognition method provided in the present application in more detail.
The incoming call information analysis process of the speech emotion recognition system comprises the following steps:
(1) When an incoming call arrives, the speech emotion recognition system receives the incoming call request (call request) 101. The call is first transferred to the intelligent question-and-answer module 102, which conducts an intelligent dialogue with the caller according to the existing call topic, the question-and-answer knowledge base (the preset question knowledge base and the preset answer knowledge base), and the like, and performs speech recognition 103;
(2) After speech recognition 103, two parallel processing flows follow. On one hand, the incoming call speech information is transcribed into text 106, and the resulting text information is input to the emotion analysis module 107 for emotion analysis, yielding emotion feature words 108 (the output may take forms such as happy, neutral, or sad, and may be given as proportions). On the other hand, speech analysis 104 is performed directly on the incoming call speech information to extract tone feature words 105 (including but not limited to volume, whether the speech is hurried, and the like). Finally, the results of the two parallel branches are combined 109 to obtain the combined feature words;
(3) The combined feature words 109 are matched 1010 against an emotion topic database (covering moods, personalities, and the like) 1011. If the matching succeeds, the speech emotion category (that is, the caller's mood, personality, etc.) is output 1013. If the matching fails, the system chooses whether to create a new emotion category 1012 and add it to the emotion topic library 1011 to update the database; at the same time, the newly created emotion category is output to the display interface to inform the user of the caller's emotion category determination result 1013, in a format such as "incoming number: ..., incoming call mood: ..., caller personality: ..., incoming call time: ...". Of course, if the calling number is in the address book, the displayed information may also include the caller's name and other details.
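The two parallel branches of step (2) and their combination can be sketched as follows. This is a toy illustration of the data flow only: the emotion lexicon, the loudness threshold, and the speech-rate threshold are invented placeholders standing in for the real speech recognition and acoustic analysis models.

```python
# Invented word-to-emotion lexicon for the text branch (illustrative only).
EMOTION_LEXICON = {"thanks": "happy", "sorry": "sad", "hurry": "anxious"}

def text_emotion_features(transcript):
    """Emotion feature words from the transcribed text (text branch, 106-108)."""
    return {EMOTION_LEXICON[w] for w in transcript.lower().split()
            if w in EMOTION_LEXICON}

def tone_features(volume_db, words_per_second):
    """Tone feature words from the raw audio (acoustic branch, 104-105)."""
    features = set()
    if volume_db > 70:          # invented loudness threshold
        features.add("loud")
    if words_per_second > 3.5:  # invented speech-rate threshold
        features.add("fast")
    return features

def combine(transcript, volume_db, words_per_second):
    """Merge both branches into the combined feature words (109)."""
    return text_emotion_features(transcript) | tone_features(volume_db,
                                                             words_per_second)

print(sorted(combine("please hurry it is urgent", 78, 4.0)))
# -> ['anxious', 'fast', 'loud']
```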
Secondly, the intelligent dialogue process of the speech emotion recognition system:
(1) for a speech input (speech information) 201, it is recognized and transcribed into a speech text (text information) 203 using a speech recognition model 202;
(2) performing word segmentation and labeling processing 204 on the speech text 203, and analyzing it and extracting features with a language model to obtain question information (text features) 205;
(3) searching the stored information in a question bank (a preset question knowledge bank) 207 and the question information 205 through a semantic matching model to perform semantic matching, and obtaining a question 206 with the highest matching similarity;
(4) searching the stored information in an answer base (a preset answer knowledge base) 209 through the semantic matching model for a match with the question 206, and obtaining the answer 208 with the highest matching similarity;
(5) synthesizing the answer information 208 into speech with a speech synthesis model and performing voice output 2011; in addition, full-duplex voice interaction technology 2012 is combined with the speech input 201 to generate responses in real time, control the dialogue rhythm, and recover from dialogue interruptions in real time;
(6) updating the question-answer knowledge base 2010 according to the question-and-answer records.
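The retrieval flow of steps (3) through (6) can be sketched as follows. Simple Jaccard word overlap stands in for the semantic matching model, and the knowledge-base contents are invented examples; a real system would use trained matching models as described above.

```python
def similarity(a, b):
    """Jaccard word overlap between two sentences (stand-in for semantic matching)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

# Invented question bank (207) and answer bank (209) for illustration.
QUESTION_BANK = ["what are your opening hours", "how do I reset my password"]
ANSWER_BANK = {
    "what are your opening hours": "We are open 9 to 5 on weekdays.",
    "how do I reset my password": "Use the reset link on the login page.",
}
qa_log = []  # question-answer records used to update the knowledge base (step 6)

def answer_call(utterance):
    # Step (3): best-matching stored question; step (4): its stored answer.
    best_q = max(QUESTION_BANK, key=lambda q: similarity(utterance, q))
    reply = ANSWER_BANK[best_q]
    qa_log.append((utterance, best_q, reply))  # step (6): record for updates
    return reply

print(answer_call("when are your opening hours"))
# -> We are open 9 to 5 on weekdays.
```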
It can be seen that, upon receiving a call request, the speech emotion recognition method provided by the embodiments of the present application conducts an intelligent dialogue with the call request terminal to obtain incoming call speech information. The incoming call speech information is converted into text information, and emotion analysis is performed on the text information to obtain emotion features; feature extraction is also performed on the incoming call speech information to obtain tone features. That is, the speech and the text are processed separately to obtain their respective feature information, which is then combined into combined feature words. Finally, the emotion category of the calling party is determined by matching in the preset emotion information base and output to realize a call reminder. In this way, more effective data information can be obtained from the incoming call information sent by the call request terminal, the emotion information of the calling party can be determined, and the user can more conveniently handle an incoming call according to the feedback information, which greatly meets the user's demand for intelligent telephone communication services and further improves the user experience.
To solve the above problem, please refer to fig. 4, fig. 4 is a schematic structural diagram of a speech emotion recognition apparatus provided in the present application, where the speech emotion recognition apparatus may include:
the intelligent dialogue module 10 is used for performing intelligent dialogue with a call request terminal according to the received call request to obtain incoming call voice information;
the first feature extraction module 20 is configured to perform character conversion on the incoming call voice information to obtain text information, and perform emotion analysis on the text information to obtain emotion features;
the second feature extraction module 30 is configured to perform feature extraction on the incoming call voice information to obtain a mood feature;
the feature integration module 40 is used for integrating the emotional features and the tone features to obtain combined feature words;
the semantic matching module 50 is used for matching in a preset emotion information base to obtain emotion categories corresponding to the combined feature words;
and an information output module 60 for outputting the emotion classification.
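As a purely structural sketch (not the patent's actual implementation), the six modules of fig. 4 could be wired into a single pipeline as below, with each module reduced to a hypothetical callable; all names and the trivial stand-in functions are assumptions for illustration.

```python
class SpeechEmotionRecognizer:
    def __init__(self, dialogue, to_text, text_emotion, tone, matcher, output):
        self.dialogue = dialogue          # intelligent dialogue module 10
        self.to_text = to_text            # character conversion (module 20)
        self.text_emotion = text_emotion  # emotion analysis (module 20)
        self.tone = tone                  # tone feature extraction (module 30)
        self.matcher = matcher            # semantic matching (module 50)
        self.output = output              # information output (module 60)

    def handle_call(self, call_request):
        speech = self.dialogue(call_request)
        emotion_features = self.text_emotion(self.to_text(speech))
        combined = emotion_features | self.tone(speech)  # feature integration 40
        category = self.matcher(combined)
        self.output(category)
        return category

# Wiring with trivial stand-in callables to show the data flow:
recognizer = SpeechEmotionRecognizer(
    dialogue=lambda req: "audio:hurry",
    to_text=lambda s: s.split(":")[1],
    text_emotion=lambda t: {"anxious"} if "hurry" in t else set(),
    tone=lambda s: {"fast"},
    matcher=lambda feats: "anxious" if "anxious" in feats else "neutral",
    output=print,
)
recognizer.handle_call("incoming")  # prints: anxious
```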
For the introduction of the apparatus provided in the present application, please refer to the above method embodiments, which are not described herein again.
To solve the above problem, please refer to fig. 5, fig. 5 is a schematic structural diagram of an electronic device for speech emotion recognition provided in the present application, where the electronic device for speech emotion recognition may include:
a memory 1 for storing a computer program;
and the processor 2 is used for realizing any one of the steps of speech emotion recognition when executing the computer program.
As a preferred embodiment, the electronic device for speech emotion recognition may further include a display for displaying the identity category of the call requesting terminal.
For the introduction of the system provided by the present application, please refer to the above method embodiment, which is not described herein again.
To solve the above problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, can implement the steps of any one of the speech emotion recognition methods.
The computer-readable storage medium may include various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
For the introduction of the computer-readable storage medium provided in the present application, please refer to the above method embodiments, which are not described herein again.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The speech emotion recognition method, the speech emotion recognition apparatus, the electronic device, and the computer-readable storage medium provided by the present application are described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that those skilled in the art may make several improvements and modifications to the present application without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present application.

Claims (13)

1. A speech emotion recognition method, comprising:
carrying out intelligent conversation with a call request terminal according to a received call request to obtain incoming call voice information;
performing character conversion on the incoming call voice information to obtain text information, and performing emotion analysis on the text information to obtain emotion characteristics;
extracting the characteristics of the incoming call voice information to obtain tone characteristics;
integrating the emotion characteristics and the mood characteristics to obtain combined characteristic words;
matching in a preset emotion information base to obtain an emotion category corresponding to the combined feature word;
outputting the emotion classification.
2. The speech emotion recognition method of claim 1, wherein the obtaining of the incoming call speech information by conducting an intelligent dialogue with the call request terminal according to the received call request comprises:
obtaining voice information according to the call request;
performing character conversion on the voice information to obtain text conversion information;
performing feature extraction on the text conversion information to obtain text features;
matching and obtaining problem information corresponding to the text features in a preset problem knowledge base;
matching and obtaining answer information corresponding to the question information in a preset answer knowledge base;
carrying out voice conversion on the answer information to obtain voice reply information;
feeding back the voice reply information to the call request terminal to realize the intelligent conversation;
and counting all voice information in the intelligent conversation process to obtain the incoming call voice information.
3. The speech emotion recognition method of claim 2, wherein the performing feature extraction on the text conversion information to obtain text features comprises:
performing word segmentation processing and labeling processing on the text conversion information to obtain processed text conversion information;
and extracting the characteristics of the processed text conversion information by using a preset language model to obtain the text characteristics.
4. The speech emotion recognition method of claim 2, further comprising:
recording the text conversion information, the question information and the answer information to generate a question-answer record;
and updating the preset question knowledge base and the preset answer knowledge base according to the question and answer records.
5. The speech emotion recognition method of any one of claims 1 to 4, further comprising:
and when the emotion category corresponding to the combined feature word cannot be obtained in the preset emotion information base in a matching manner, creating a new emotion category according to the combined feature word, and outputting the new emotion category.
6. The speech emotion recognition method of claim 5, further comprising:
and adding the new emotion type into the preset emotion information base to update the preset emotion information base.
7. The speech emotion recognition method of claim 1, further comprising:
generating suggestion information of answering a call according to the emotion type;
and outputting the recommendation information.
8. The speech emotion recognition method of claim 7, wherein the outputting the recommendation information includes:
and carrying out voice playing on the suggestion information through a broadcaster.
9. The speech emotion recognition method of claim 8, further comprising:
and adjusting the tone mode of the intelligent conversation according to the emotion category.
10. A speech emotion recognition apparatus, characterized by comprising:
the intelligent dialogue module is used for carrying out intelligent dialogue with a call request terminal according to the received call request to obtain incoming call voice information;
the first feature extraction module is used for performing character conversion on the incoming call voice information to obtain text information and performing emotion analysis on the text information to obtain emotion features;
the second feature extraction module is used for extracting features of the incoming call voice information to obtain tone features;
the characteristic integration module is used for integrating the emotional characteristics and the tone characteristics to obtain combined characteristic words;
the semantic matching module is used for matching in a preset emotion information base to obtain the emotion category corresponding to the combined feature word;
and the information output module is used for outputting the emotion categories.
11. An electronic device for speech emotion recognition, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the speech emotion recognition method as claimed in any one of claims 1 to 9 when executing the computer program.
12. The speech emotion recognition electronic device of claim 11, further comprising:
and the display is used for displaying the identity category of the calling request terminal.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the speech emotion recognition method as claimed in any one of claims 1 to 9.
CN201911082413.8A 2019-11-07 2019-11-07 Voice emotion recognition method and device and related equipment Pending CN110751943A (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201911082413.8A CN110751943A (en) 2019-11-07 2019-11-07 Voice emotion recognition method and device and related equipment
US16/889,823 US11019207B1 (en) 2019-11-07 2020-06-02 Systems and methods for smart dialogue communication
US17/238,161 US11323566B2 (en) 2019-11-07 2021-04-22 Systems and methods for smart dialogue communication
US17/660,207 US11758047B2 (en) 2019-11-07 2022-04-21 Systems and methods for smart dialogue communication
US18/359,075 US20230370549A1 (en) 2019-11-07 2023-07-26 Systems and methods for smart dialogue communication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911082413.8A CN110751943A (en) 2019-11-07 2019-11-07 Voice emotion recognition method and device and related equipment

Publications (1)

Publication Number Publication Date
CN110751943A true CN110751943A (en) 2020-02-04

Family

ID=69282579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911082413.8A Pending CN110751943A (en) 2019-11-07 2019-11-07 Voice emotion recognition method and device and related equipment

Country Status (1)

Country Link
CN (1) CN110751943A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944008A (en) * 2017-12-08 2018-04-20 神思电子技术股份有限公司 A kind of method that Emotion identification is carried out for natural language
CN109003624A (en) * 2018-06-29 2018-12-14 北京百度网讯科技有限公司 Emotion identification method, apparatus, computer equipment and storage medium
CN109949071A (en) * 2019-01-31 2019-06-28 平安科技(深圳)有限公司 Products Show method, apparatus, equipment and medium based on voice mood analysis

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111324713A (en) * 2020-02-18 2020-06-23 腾讯科技(深圳)有限公司 Automatic replying method and device for conversation, storage medium and computer equipment
CN111694938A (en) * 2020-04-27 2020-09-22 平安科技(深圳)有限公司 Emotion recognition-based answering method and device, computer equipment and storage medium
CN111694938B (en) * 2020-04-27 2024-05-14 平安科技(深圳)有限公司 Emotion recognition-based reply method and device, computer equipment and storage medium
CN111666380A (en) * 2020-06-12 2020-09-15 北京百度网讯科技有限公司 Intelligent calling method, device, equipment and medium
CN111696556A (en) * 2020-07-13 2020-09-22 上海茂声智能科技有限公司 Method, system, equipment and storage medium for analyzing user conversation emotion
CN111696556B (en) * 2020-07-13 2023-05-16 上海茂声智能科技有限公司 Method, system, equipment and storage medium for analyzing user dialogue emotion
WO2022016580A1 (en) * 2020-07-21 2022-01-27 南京智金科技创新服务中心 Intelligent voice recognition method and device
CN112148846A (en) * 2020-08-25 2020-12-29 北京来也网络科技有限公司 Reply voice determination method, device, equipment and storage medium combining RPA and AI
CN112002348B (en) * 2020-09-07 2021-12-28 复旦大学 Method and system for recognizing speech anger emotion of patient
CN112002348A (en) * 2020-09-07 2020-11-27 复旦大学 Method and system for recognizing speech anger emotion of patient
CN112818841A (en) * 2021-01-29 2021-05-18 北京搜狗科技发展有限公司 Method and related device for recognizing user emotion
CN112908314A (en) * 2021-01-29 2021-06-04 深圳通联金融网络科技服务有限公司 Intelligent voice interaction method and device based on tone recognition
CN112995422A (en) * 2021-02-07 2021-06-18 成都薯片科技有限公司 Call control method and device, electronic equipment and storage medium
CN113055523A (en) * 2021-03-08 2021-06-29 北京百度网讯科技有限公司 Crank call interception method and device, electronic equipment and storage medium
CN113055523B (en) * 2021-03-08 2022-12-30 北京百度网讯科技有限公司 Crank call interception method and device, electronic equipment and storage medium
CN113157966A (en) * 2021-03-15 2021-07-23 维沃移动通信有限公司 Display method and device and electronic equipment
CN113157966B (en) * 2021-03-15 2023-10-31 维沃移动通信有限公司 Display method and device and electronic equipment
US20220375468A1 (en) * 2021-05-21 2022-11-24 Cogito Corporation System method and apparatus for combining words and behaviors
CN113535903A (en) * 2021-07-19 2021-10-22 安徽淘云科技股份有限公司 Emotion guiding method, emotion guiding robot, storage medium and electronic device
CN113535903B (en) * 2021-07-19 2024-03-19 安徽淘云科技股份有限公司 Emotion guiding method, emotion guiding robot, storage medium and electronic device
CN114093389A (en) * 2021-11-26 2022-02-25 重庆凡骄网络科技有限公司 Speech emotion recognition method and device, electronic equipment and computer readable medium
CN114222302A (en) * 2021-12-13 2022-03-22 北京声智科技有限公司 Calling method and device for abnormal call, electronic equipment and storage medium
CN114298019A (en) * 2021-12-29 2022-04-08 中国建设银行股份有限公司 Emotion recognition method, emotion recognition apparatus, emotion recognition device, storage medium, and program product
CN114758676A (en) * 2022-04-18 2022-07-15 哈尔滨理工大学 Multi-modal emotion recognition method based on deep residual shrinkage network
CN116389644A (en) * 2022-11-10 2023-07-04 八度云计算(安徽)有限公司 Outbound system based on big data analysis
CN116389644B (en) * 2022-11-10 2023-11-03 八度云计算(安徽)有限公司 Outbound system based on big data analysis

Similar Documents

Publication Publication Date Title
CN110751943A (en) Voice emotion recognition method and device and related equipment
US8457964B2 (en) Detecting and communicating biometrics of recorded voice during transcription process
US7987244B1 (en) Network repository for voice fonts
US6895257B2 (en) Personalized agent for portable devices and cellular phone
CN107818798A (en) Customer service quality evaluating method, device, equipment and storage medium
US7877261B1 (en) Call flow object model in a speech recognition system
CN110798578A (en) Incoming call transaction management method and device and related equipment
CN111683175B (en) Method, device, equipment and storage medium for automatically answering incoming call
WO2008084476A2 (en) Vowel recognition system and method in speech to text applications
CN106713111B (en) Processing method for adding friends, terminal and server
CN110428811B (en) Data processing method and device and electronic equipment
KR20150017662A (en) Method, apparatus and storing medium for text to speech conversion
CN112131359A (en) Intention identification method based on graphical arrangement intelligent strategy and electronic equipment
CN110600032A (en) Voice recognition method and device
KR102312993B1 (en) Method and apparatus for implementing interactive message using artificial neural network
CN114818649A (en) Service consultation processing method and device based on intelligent voice interaction technology
JP2020071676A (en) Speech summary generation apparatus, speech summary generation method, and program
CN112420015A (en) Audio synthesis method, device, equipment and computer readable storage medium
EP1317749A1 (en) Method of and system for improving accuracy in a speech recognition system
TW200304638A (en) Network-accessible speaker-dependent voice models of multiple persons
KR20200145776A (en) Method, apparatus and program of voice correcting synthesis
CN108364638A (en) A kind of voice data processing method, device, electronic equipment and storage medium
CN114328867A (en) Intelligent interruption method and device in man-machine conversation
CN112349266A (en) Voice editing method and related equipment
EP3113175A1 (en) Method for converting text to individual speech, and apparatus for converting text to individual speech

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200204