CN114595692A - Emotion recognition method, system and terminal equipment - Google Patents

Emotion recognition method, system and terminal equipment

Info

Publication number
CN114595692A
CN114595692A (application CN202011427900.6A)
Authority
CN
China
Prior art keywords
information
emotion
voice
emotion recognition
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011427900.6A
Other languages
Chinese (zh)
Inventor
曲道奎
梁亮
张悦
杜振军
王海鹏
杜威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Siasun Industrial Software Research Institute Co Ltd
Original Assignee
Shandong Siasun Industrial Software Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Siasun Industrial Software Research Institute Co Ltd filed Critical Shandong Siasun Industrial Software Research Institute Co Ltd
Priority to CN202011427900.6A priority Critical patent/CN114595692A/en
Publication of CN114595692A publication Critical patent/CN114595692A/en
Pending legal-status Critical Current

Classifications

    • G06F40/30 Semantic analysis (under G06F40/00 Handling natural language data)
    • G06N3/044 Recurrent networks, e.g. Hopfield networks (under G06N3/04 Neural network architecture)
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs (under G06N3/04 Neural network architecture)
    • G10L15/26 Speech-to-text systems (under G10L15/00 Speech recognition)
    • G10L25/63 Speech or voice analysis specially adapted for estimating an emotional state (under G10L25/00 Speech or voice analysis techniques)

Abstract

The invention belongs to the technical field of intelligent recognition and provides an emotion recognition method, an emotion recognition system and terminal equipment. The emotion recognition method comprises the following steps: receiving voice uttered by a user, and selecting voice information corresponding to the voice from a pre-established emotion database; extracting text information from the voice information, and storing the text information as a text file; fusing the feature items of the voice information with the feature items of the text information to obtain fused information; and recognizing the emotion of the user according to a pre-trained emotion recognition model and the fused information. By starting from both the language text and the voice intonation, the method analyzes human emotion from multiple angles, extracts features, and achieves accurate and practical emotion recognition.

Description

Emotion recognition method, system and terminal equipment
Technical Field
The invention relates to the field of intelligent recognition, in particular to an emotion recognition method, system, terminal equipment and computer readable storage medium.
Background
With the development of modern psychology, neurology and computer science, emotion recognition technology has achieved remarkable results. Emotion recognition combines the fields of speech processing and natural language processing with psychological questions from ethology and cognition. Human emotion has three outward manifestations: subjective feeling, physiological arousal and behavioral expression. Current emotion recognition methods typically draw conclusions from a separate analysis of facial images, speech or semantics. Although some results have been obtained, humans express themselves through a combination of facial expression, language, voice intonation and body movement, so taking only one mode of expression as the basis for judging human emotion is one-sided and subjective.
Early emotion recognition research was based on feature modeling: depending on the behavior, features are extracted by different methods for analysis. For example, facial-expression emotion recognition extracts features from collected images using image processing and transformation techniques, while emotion recognition from spoken language uses vocabulary-based methods, building models on lexical resources and mining large volumes of emotional text and keywords. These methods are time-consuming, complex and of low accuracy. With the rapid rise of deep learning, neural network models have gradually been applied across research fields, and processing and analyzing data with deep learning can yield more accurate results. However, deep learning crucially requires a large number of features with strong generalization capability, so emotion recognition from a single behavioral feature alone is not well suited to deep learning methods.
Therefore, a new technical solution is needed to solve the above technical problems.
Disclosure of Invention
In view of this, embodiments of the present invention provide an emotion recognition method, system and terminal device, by which human emotion can be accurately recognized.
A first aspect of an embodiment of the present invention provides an emotion recognition method, where the emotion recognition method includes:
receiving voice sent by a user, and selecting voice information corresponding to the voice from a pre-established emotion database;
extracting character information of the voice information, and storing the character information in a text file form;
fusing the feature items of the voice information and the feature items of the character information to obtain fused information;
and recognizing the emotion of the user according to a pre-trained emotion recognition model and the fusion information.
Optionally, in another embodiment provided by the present application, the fusing the feature item of the voice information and the feature item of the text information to obtain fused information includes:
the voice information and the text information are used as inputs of a convolutional neural network, and a feature vector of the voice information and a feature vector of the text information are respectively extracted;
and fusing the feature vector of the voice information and the feature vector of the text information to obtain a fused vector, and taking the fused vector as the fusion information.
Optionally, in another embodiment provided by the present application, the extracting the feature vector of the speech information and the feature vector of the text information respectively includes:
and extracting energy and sound waves in the voice information and extracting keywords and semantic features in the text information.
Optionally, in another embodiment provided by the present application, the pre-established emotion databases include the CASIA Chinese emotion database and the ACCorpus series Chinese emotion database.
Optionally, in another embodiment provided by the present application, the emotion recognition model is obtained through long short-term memory (LSTM) network training.
A second aspect of an embodiment of the present invention provides an emotion recognition system, including:
the receiving module is used for receiving voice sent by a user and selecting voice information corresponding to the voice from a pre-established emotion database;
the extraction module is used for extracting the character information of the voice information and storing the character information in a text file form;
the fusion module is used for fusing the feature items of the voice information and the feature items of the text information to obtain fusion information;
and the recognition module is used for recognizing the emotion of the user according to a pre-trained emotion recognition model and the fusion information.
Optionally, in another embodiment provided by the present application, the fusion module is specifically configured to:
the voice information and the text information are used as inputs of a convolutional neural network, and a feature vector of the voice information and a feature vector of the text information are respectively extracted;
and fusing the feature vector of the voice information and the feature vector of the text information to obtain a fused vector, and taking the fused vector as the fusion information.
Optionally, in another embodiment provided by the present application, the separately extracting the feature vector of the speech information and the feature vector of the text information includes:
and extracting energy and sound waves in the voice information and extracting keywords and semantic features in the text information.
A third aspect of embodiments of the present invention provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method of any one of the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method mentioned in any one of the above first aspects.
Compared with the prior art, the embodiments of the invention have the following beneficial effects: the invention provides a semantics-based emotion recognition method that fuses the ways humans express emotion. Since human speech is an important behavioral signal reflecting human emotion, emotion recognition research based on spoken language best matches human habits of emotional expression. Therefore, the invention starts from both the language text and the voice intonation, analyzes human emotion from multiple angles, extracts features, and achieves accurate and practical emotion recognition.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic flowchart of an emotion recognition method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an emotion recognition system according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of an emotion recognition system provided by an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a terminal device according to a third embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
In order to illustrate the technical means of the present invention, the following description is given by way of specific examples.
Example one
Fig. 1 is a schematic flow chart of an emotion recognition method provided in an embodiment of the present invention, where the method may include the following steps:
s101: receiving voice sent by a user, and selecting voice information corresponding to the voice from a pre-established emotion database.
The pre-established emotion databases comprise the CASIA Chinese emotion database and the ACCorpus series Chinese emotion database.
In this step, a speech emotion database is collected in advance; the CASIA Chinese emotion database and the ACCorpus series Chinese emotion database are currently popular in China. The CASIA Chinese emotion database consists of 500 different texts recorded by two men and two women, covering six emotion classes: happy, sad, angry, surprised, neutral and fearful. Compared with the CASIA database, the ACCorpus series Chinese emotion database is richer and more representative: its speech sub-database is performed by 25 men and 25 women across 5 emotion classes (neutral, happy, angry, fearful and sad), so the emotions are fuller and easier to distinguish.
S102: and extracting the character information of the voice information, and storing the character information in a text file form.
S103: and fusing the characteristic items of the voice information and the characteristic items of the character information to obtain fused information.
The extracting the feature vector of the voice information and the feature vector of the text information respectively includes:
and extracting energy and sound waves in the voice information and extracting keywords and semantic features in the text information.
The fusing the feature items of the voice information and the feature items of the text information to obtain fused information, including:
the voice information and the text information are used as inputs of a convolutional neural network, and a feature vector of the voice information and a feature vector of the text information are respectively extracted;
and fusing the feature vector of the voice information and the feature vector of the text information to obtain a fused vector, and taking the fused vector as the fusion information.
Taking one voice utterance as an example, in order to fuse the language text with the voice intonation and obtain final fused semantic features, the voice is first converted into text and stored as a text file. Many speech-to-text tools can perform this conversion, for example iFlytek, WeChat, Google Cloud Speech-to-Text or IBM Watson Speech to Text; the tool giving the most accurate conversion results is chosen. To realize feature fusion, this application uses the voice file and the text file respectively as inputs of a convolutional neural network (CNN), extracting audio features such as energy and sound waves from the voice and semantic features such as keywords and contextual dependencies from the text, and then fuses the two groups of features. This provides more complete speech-semantic features to represent the different classes of emotion, making emotion recognition and classification more accurate and better grounded. The feature fusion method is canonical correlation analysis (CCA). CCA uses the correlated features between two groups of feature vectors as an effective form of discrimination; it not only fuses the information but also eliminates redundant information in the features. The method establishes a correlation function between the two groups of feature vectors and extracts their correlated features as effective discrimination vectors, which serve as the fused vector.
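The patent names CCA as the fusion step but gives no formulas. The sketch below is a minimal NumPy implementation of CCA-style fusion via whitening and an SVD of the cross-covariance; the function name `cca_fuse`, the ridge term for numerical stability, and the choice to concatenate the projected components are our illustrative assumptions, not details from the patent.

```python
import numpy as np

def cca_fuse(X, Y, k):
    """Fuse two feature matrices (n samples x d features each) by projecting
    both onto their k most correlated directions and concatenating."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Covariance blocks, with a small ridge so Cholesky never fails
    Sxx = Xc.T @ Xc / n + 1e-6 * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / n + 1e-6 * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / n
    # Whitening transforms: Wx @ Sxx @ Wx.T == I
    Wx = np.linalg.inv(np.linalg.cholesky(Sxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Syy))
    # Singular vectors of the whitened cross-covariance give the canonical directions
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy.T)
    A = Wx.T @ U[:, :k]   # projection for X
    B = Wy.T @ Vt[:k].T   # projection for Y
    # Fused vector: the two sets of correlated components side by side
    return np.hstack([Xc @ A, Yc @ B])
```

In this formulation the singular values `s` are the canonical correlations, so keeping only the top `k` components discards the weakly correlated (redundant) directions, matching the redundancy-elimination property the description attributes to CCA.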
S104: and recognizing the emotion of the user according to a pre-trained emotion recognition model and the fusion information. The emotion recognition model is obtained through long and short memory network training.
In the step, after the CCA extracts the fusion features, the Long-Short Time Memory Network (LSTM) is used for training a emotion recognition model. LSTM networks are able to handle very well long term dependencies across features, with the network defaults to remembering longer history information, implemented with "memory gates" in the network. Meanwhile, a 'forgetting gate' is arranged in the network to determine the information to be discarded from each memory unit (also called 'cell state'), and only the key information needs to be remembered to prevent the redundancy and overfitting of network parameters caused by excessive information. Finally, the emotion is identified using the softmax function.
The following description is given with reference to specific examples:
collecting a speech emotion database: the CASIA Chinese emotion database and the ACCorpus series Chinese emotion database are used as training and verification data.
Voice to text. Using speech-to-text software such as iFlytek, WeChat, Google Cloud Speech-to-Text or IBM Watson Speech to Text, convert the speech data into text data and save it.
Data preprocessing. Preprocess the text data: remove stop words, special symbols and numbers, and perform word stemming and word vectorization. Preprocess the voice data: denoise, pre-emphasize, frame, window and perform endpoint detection, then extract Mel-frequency cepstral coefficient (MFCC) feature parameters as the input of the convolutional neural network.
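The speech-side preprocessing steps named above (pre-emphasis, framing, windowing) can be sketched in a few lines of NumPy. The pre-emphasis coefficient 0.97 and the 25 ms frame / 10 ms hop are common defaults in MFCC pipelines, not values the patent specifies.

```python
import numpy as np

def pre_emphasis(x, alpha=0.97):
    # Boost high frequencies: y[t] = x[t] - alpha * x[t-1]
    return np.append(x[0], x[1:] - alpha * x[:-1])

def frame_and_window(x, sr, frame_ms=25, hop_ms=10):
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + (len(x) - frame_len) // hop        # number of full frames
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])
    return frames * np.hamming(frame_len)      # window tapers frame edges to reduce leakage
```

Each windowed frame would then be passed to an MFCC extractor (e.g. an FFT, Mel filterbank, log, and DCT, or a library routine) to produce the per-frame feature parameters the description mentions.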
And (5) extracting and fusing the features. And respectively extracting audio features of the voice data and semantic features of the text data by using a Convolutional Neural Network (CNN) model, and performing feature fusion on the two extracted features by using a typical correlation analysis algorithm to obtain strongly correlated fusion features.
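The patent uses a CNN for both modalities without giving its architecture. As a toy illustration of how convolutional filters turn a variable-length frame sequence into a fixed-size feature vector, the sketch below applies a bank of 1-D filters with ReLU and global max pooling; the filter count, widths, and function name are our assumptions.

```python
import numpy as np

def conv_features(seq, kernels):
    """seq: (T, d) per-frame features (e.g. MFCCs or word vectors);
    kernels: list of (w, d) filters. Each filter slides over time,
    passes through ReLU, then global max pooling yields one scalar,
    so the output length equals the number of filters regardless of T."""
    feats = []
    for K in kernels:
        w = K.shape[0]
        resp = np.array([np.maximum(np.sum(seq[t:t + w] * K), 0.0)
                         for t in range(seq.shape[0] - w + 1)])
        feats.append(resp.max())
    return np.array(feats)
```

Running this once on the voice frames and once on the text vectors yields the two fixed-size feature vectors that the canonical correlation analysis step then fuses.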
Modeling and recognition. Carry out emotion recognition training using a long short-term memory (LSTM) network model, where the input of the network is the fused features; the model classifies with a softmax function to recognize the different emotions.
The invention provides a semantics-based emotion recognition method that fuses the ways humans express emotion. Since human speech is an important behavioral signal reflecting human emotion, emotion recognition research based on spoken language best matches human habits of emotional expression. Therefore, the invention starts from both the language text and the voice intonation, analyzes human emotion from multiple angles, extracts features, and achieves accurate and practical emotion recognition.
Example two
Fig. 2 is a schematic structural diagram of an emotion recognition system according to a second embodiment of the present invention, and only a part related to the second embodiment of the present invention is shown for convenience of description. Fig. 3 shows a flow diagram of an emotion recognition system provided by the present application.
The emotion recognition system can be a software unit, a hardware unit or a combined software-and-hardware unit built into a robot, or can be integrated into a computer or another terminal as an independent component.
The emotion recognition system includes:
the receiving module 21 is configured to receive a voice sent by a user, and select voice information corresponding to the voice from a pre-established emotion database;
the extraction module 22 is configured to extract text information of the voice information, and store the text information in a form of a text file;
the fusion module 23 is configured to fuse the feature items of the voice information and the feature items of the text information to obtain fusion information;
and the recognition module 24 is used for recognizing the emotion of the user according to the pre-trained emotion recognition model and the fusion information.
Optionally, in another embodiment provided by the present application, the fusing the feature item of the voice information and the feature item of the text information to obtain fused information includes:
the voice information and the text information are used as inputs of a convolutional neural network, and a feature vector of the voice information and a feature vector of the text information are respectively extracted;
and fusing the feature vector of the voice information and the feature vector of the text information to obtain a fused vector, and taking the fused vector as the fusion information.
Optionally, in another embodiment provided by the present application, the separately extracting the feature vector of the speech information and the feature vector of the text information includes:
and extracting energy and sound waves in the voice information and extracting keywords and semantic features in the text information.
Optionally, in another embodiment provided by the present application, the pre-established emotion databases include the CASIA Chinese emotion database and the ACCorpus series Chinese emotion database.
Optionally, in another embodiment provided by the present application, the emotion recognition model is obtained through long short-term memory (LSTM) network training.
For the working process of the emotion recognition system, refer to the implementation process of the emotion recognition method, which is not described again here.
EXAMPLE III
Fig. 4 is a schematic structural diagram of a terminal device according to a third embodiment of the present invention. As shown in fig. 4, the terminal device 4 of this embodiment includes: a processor 40, a memory 41, and a computer program 42, such as an emotion recognition program, stored in the memory 41 and executable on the processor 40. When executing the computer program 42, the processor 40 implements the steps of the method of the first embodiment, such as steps S101 to S104 shown in fig. 1, and the functions of the modules/units in the above system embodiment, such as the functions of modules 21 to 24 shown in fig. 2.
Illustratively, the computer program 42 may be partitioned into one or more modules/units that are stored in the memory 41 and executed by the processor 40 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program 42 in the terminal device 4. For example, the computer program 42 may be divided into different modules, and the specific functions of each module are as follows:
the system comprises a setting module, a fault detection module and a fault detection module, wherein the setting module is used for setting fault detection contents of the robot, and the fault detection contents comprise an object to be detected, a detection period and a fault condition;
the detection module is used for detecting whether the object to be detected reaches a fault condition according to the detection period to obtain a detection result;
and the recording module is used for recording the running state of the object to be detected in an xml form according to the detection result.
The terminal device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The terminal device may include, but is not limited to, a processor 40, a memory 41. Those skilled in the art will appreciate that fig. 4 is merely an example of a terminal device 4 and does not constitute a limitation of terminal device 4 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 40 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or a memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 4. Further, the memory 41 may also include both an internal storage unit and an external storage device of the terminal device 4. The memory 41 is used for storing the computer program and other programs and data required by the terminal device. The memory 41 may also be used to temporarily store data that has been output or is to be output.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art would appreciate that the modules, elements, and/or method steps of the various embodiments described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
The above-mentioned embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An emotion recognition method, characterized in that the emotion recognition method includes:
receiving voice sent by a user, and selecting voice information corresponding to the voice from a pre-established emotion database;
extracting text information from the voice information, and storing the text information in the form of a text file;
fusing feature items of the voice information with feature items of the text information to obtain fused information;
and recognizing the emotion of the user according to a pre-trained emotion recognition model and the fused information.
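The four steps of claim 1 can be sketched as a simple pipeline. Everything below is a hypothetical illustration: the function bodies, the database contents, and the stand-in model are invented placeholders, since the claim does not specify how matching, transcription, fusion, or classification are implemented.

```python
# Hypothetical sketch of the four-step method of claim 1; every function
# body is a placeholder, since the claim does not fix an implementation.

def select_voice_info(audio, emotion_db):
    # Step 1: receive the user's speech and select matching voice
    # information from a pre-established emotion database.
    return {"samples": audio, "db_entry": emotion_db["neutral"]}

def extract_text(voice_info):
    # Step 2: transcribe the speech; the claim stores the result as a
    # text file, which is elided here.
    return "transcribed utterance"

def fuse_features(voice_info, text):
    # Step 3: fuse feature items of the speech with feature items of
    # the text (here, trivially, the two lengths).
    return [len(voice_info["samples"]), len(text)]

def recognize_emotion(model, fused):
    # Step 4: apply a pre-trained emotion recognition model to the
    # fused information.
    return model(fused)

emotion_db = {"neutral": [0.0, 0.0]}                       # stand-in database
model = lambda v: "positive" if v[1] > 10 else "neutral"   # stand-in model

voice_info = select_voice_info([0.1, -0.2, 0.3], emotion_db)
text = extract_text(voice_info)
fused = fuse_features(voice_info, text)
label = recognize_emotion(model, fused)
print(label)  # "positive"
```

The key structural point is that the speech signal and its transcription travel through the pipeline together and are only combined at the fusion step, before a single classifier sees them.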
2. The emotion recognition method of claim 1, wherein the fusing the feature items of the voice information and the feature items of the text information to obtain fused information comprises:
taking the voice information and the text information as inputs to a convolutional neural network, and extracting a feature vector of the voice information and a feature vector of the text information respectively;
and fusing the feature vector of the voice information and the feature vector of the text information to obtain a fused vector, and taking the fused vector as the fused information.
3. The emotion recognition method of claim 2, wherein the extracting the feature vector of the voice information and the feature vector of the text information respectively comprises:
extracting energy and sound wave features from the voice information, and extracting keywords and semantic features from the text information.
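A minimal, dependency-free sketch of claims 2 and 3: a 1-D convolution stands in for the convolutional network that extracts each modality's feature vector, and concatenation stands in for the fusion step. The kernels and input values are invented for illustration only.

```python
def conv1d(signal, kernel):
    # Valid-mode 1-D cross-correlation: the basic operation a
    # convolutional network applies to extract features.
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Invented inputs: a short energy/sound-wave track for the speech, and
# keyword/semantic scores for the transcribed text.
speech_track = [0.2, 0.8, 0.5, 0.1]
text_scores = [1.0, 0.0, 0.6]

speech_vec = conv1d(speech_track, [0.5, 0.5])  # smoothing kernel -> 3 values
text_vec = conv1d(text_scores, [1.0, -1.0])    # difference kernel -> 2 values

# Feature-level fusion by concatenation: the fused vector carries
# both modalities and is what the classifier would consume.
fused = speech_vec + text_vec
print(len(fused))  # 5
```

Concatenation is only one possible fusion; the claims leave the fusion operator open, so weighted sums or learned projections would fit the same structure.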
4. The emotion recognition method of claim 1, wherein the pre-established emotion database comprises a CASIA Chinese emotion database and an ACCorpus-series Chinese emotion database.
5. The emotion recognition method according to any one of claims 1 to 4, wherein the emotion recognition model is an emotion recognition model obtained by training a long short-term memory (LSTM) network.
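The long short-term memory network of claim 5 can be illustrated with a single scalar LSTM cell stepped over a toy sequence. The weights and inputs below are arbitrary illustrative values, not the trained model of the patent; they only show the gate structure an LSTM applies to a sequence of fused features.

```python
import math

def lstm_step(x, h, c, W):
    # One step of a scalar LSTM cell: input (i), forget (f) and output
    # (o) gates plus candidate cell state (g). W holds hypothetical
    # weights; a real model learns them during training.
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    i = sig(W["wi"] * x + W["ui"] * h + W["bi"])
    f = sig(W["wf"] * x + W["uf"] * h + W["bf"])
    o = sig(W["wo"] * x + W["uo"] * h + W["bo"])
    g = math.tanh(W["wg"] * x + W["ug"] * h + W["bg"])
    c_new = f * c + i * g          # gated update of the cell state
    h_new = o * math.tanh(c_new)   # gated emission of the hidden state
    return h_new, c_new

# All weights set to an arbitrary 0.5 for the sketch.
W = {k: 0.5 for k in
     ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"]}

h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.9]:   # a toy sequence of fused-feature values
    h, c = lstm_step(x, h, c, W)
print(round(h, 3))
```

The final hidden state `h` is what a classification layer would map to an emotion label; the forget gate is what lets the cell carry emotional cues across the whole utterance rather than reacting to one frame at a time.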
6. An emotion recognition system, characterized in that the emotion recognition system includes:
the receiving module is used for receiving voice sent by a user and selecting voice information corresponding to the voice from a pre-established emotion database;
the extraction module is used for extracting text information from the voice information and storing the text information in the form of a text file;
the fusion module is used for fusing feature items of the voice information with feature items of the text information to obtain fused information;
and the recognition module is used for recognizing the emotion of the user according to a pre-trained emotion recognition model and the fused information.
7. The emotion recognition system of claim 6, wherein the fusion module is specifically configured to:
take the voice information and the text information as inputs to a convolutional neural network, and extract a feature vector of the voice information and a feature vector of the text information respectively;
and fuse the feature vector of the voice information and the feature vector of the text information to obtain a fused vector, the fused vector serving as the fused information.
8. The emotion recognition system of claim 7, wherein the extracting the feature vector of the voice information and the feature vector of the text information respectively comprises:
extracting energy and sound wave features from the voice information, and extracting keywords and semantic features from the text information.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.
CN202011427900.6A 2020-12-07 2020-12-07 Emotion recognition method, system and terminal equipment Pending CN114595692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011427900.6A CN114595692A (en) 2020-12-07 2020-12-07 Emotion recognition method, system and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011427900.6A CN114595692A (en) 2020-12-07 2020-12-07 Emotion recognition method, system and terminal equipment

Publications (1)

Publication Number Publication Date
CN114595692A true CN114595692A (en) 2022-06-07

Family

ID=81802563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011427900.6A Pending CN114595692A (en) 2020-12-07 2020-12-07 Emotion recognition method, system and terminal equipment

Country Status (1)

Country Link
CN (1) CN114595692A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083434A (en) * 2022-07-22 2022-09-20 平安银行股份有限公司 Emotion recognition method and device, computer equipment and storage medium
CN115083434B (en) * 2022-07-22 2022-11-25 平安银行股份有限公司 Emotion recognition method and device, computer equipment and storage medium
CN117371338A (en) * 2023-12-07 2024-01-09 浙江宇宙奇点科技有限公司 AI digital person modeling method and system based on user portrait
CN117371338B (en) * 2023-12-07 2024-03-22 浙江宇宙奇点科技有限公司 AI digital person modeling method and system based on user portrait

Similar Documents

Publication Publication Date Title
CN107680582B (en) Acoustic model training method, voice recognition method, device, equipment and medium
WO2021000497A1 (en) Retrieval method and apparatus, and computer device and storage medium
US10157619B2 (en) Method and device for searching according to speech based on artificial intelligence
CN112233698B (en) Character emotion recognition method, device, terminal equipment and storage medium
CN113241096B (en) Emotion monitoring device and method
EP4198807A1 (en) Audio processing method and device
CN113243918B (en) Risk detection method and device based on multi-mode hidden information test
CN114595692A (en) Emotion recognition method, system and terminal equipment
CN112632248A (en) Question answering method, device, computer equipment and storage medium
CN106710588B (en) Speech data sentence recognition method, device and system
CN116580706B (en) Speech recognition method based on artificial intelligence
CN116775873A (en) Multi-mode dialogue emotion recognition method
Chinmayi et al. Emotion Classification Using Deep Learning
CN111680514A (en) Information processing and model training method, device, equipment and storage medium
CN114974310A (en) Emotion recognition method and device based on artificial intelligence, computer equipment and medium
CN114881668A (en) Multi-mode-based deception detection method
CN114170997A (en) Pronunciation skill detection method, pronunciation skill detection device, storage medium and electronic equipment
CN114141271A (en) Psychological state detection method and system
CN113763934A (en) Training method and device of audio recognition model, storage medium and electronic equipment
CN112712792A (en) Dialect recognition model training method, readable storage medium and terminal device
Bertero et al. Towards Universal End-to-End Affect Recognition from Multilingual Speech by ConvNets
WO2021139737A1 (en) Method and system for man-machine interaction
Tomar et al. CNN-MFCC Model for Speaker Recognition using Emotive Speech
CN116959421B (en) Method and device for processing audio data, audio data processing equipment and medium
Adesola et al. Implementation of Multi-modal Speech Emotion Recognition Using Text Data and Audio Signals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination