CN117851543A - Training method of text emotion recognition model, emotion recognition method and device - Google Patents
Info
- Publication number
- CN117851543A (application number CN202410027220.7A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- text
- category
- training
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Creation or modification of classes or clusters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2431—Multiple classes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Machine Translation (AREA)
Abstract
The application provides a training method for a text emotion recognition model, an emotion recognition method, and a device. The method comprises the following steps: acquiring a first training set, wherein each training sample in the first training set comprises a text sample and an emotion label of the text sample, and the emotion label comprises the emotion category and emotion intensity of the text sample; the emotion label of at least one training sample in the first training set includes a plurality of emotion categories; vectorizing each training sample to obtain a text vector of each training sample; executing a first training task of the text emotion recognition model based on the text sample vector and the emotion category vector of the text sample; and executing a second training task of the text emotion recognition model based on the text sample vector and the emotion intensity vector of the text sample. The method and device can accurately identify multiple emotion categories compounded in a text and the emotion intensity of each emotion category.
Description
Technical Field
The present disclosure relates to natural language processing technologies, and in particular, to a training method for a text emotion recognition model, an emotion recognition method and a device.
Background
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Natural language processing concerns natural language, i.e., the language people use in daily life, and is therefore closely related to linguistics; it also draws on model training, an important technology in computer science, mathematics, and artificial intelligence. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graph technology, and the like.
Text emotion recognition is a typical application of natural language processing. The related art trains a machine learning model to learn text classification in a supervised manner, but such models cannot recognize compound emotion categories and emotion intensities in the text to be recognized, so their recognition accuracy is low.
Disclosure of Invention
The embodiments of the application provide a training method for a text emotion recognition model, an emotion recognition method, an emotion recognition device, an electronic device, a computer program product, and a computer-readable storage medium, which can accurately recognize multiple compound emotion categories in a text and the emotion intensity corresponding to each emotion category.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a training method of a text emotion recognition model, which comprises the following steps:
acquiring a first training set, wherein training samples in the first training set comprise text samples and emotion labels of the text samples, and the emotion labels comprise emotion categories and emotion intensities of the text samples; the emotional tags of at least one of the training samples in the first training set include a plurality of emotional categories;
carrying out vectorization processing on each training sample to obtain a text vector of each training sample, wherein the text vector of each training sample comprises a text sample vector of each text sample, an emotion category vector of each text sample and an emotion intensity vector of each text sample;
executing a first training task of the text emotion recognition model based on the text sample vector and the emotion category vector of the text sample, wherein the first training task is used for training the text emotion recognition model to recognize the emotion category to which the text sample belongs;
and executing a second training task of the text emotion recognition model based on the text sample vector and the emotion intensity vector of the text sample, wherein the second training task is used for training the text emotion recognition model to recognize the emotion intensity of the text sample, and the trained text emotion recognition model is used for recognizing the emotion category and emotion intensity of the text to be processed.
The embodiment of the application provides an emotion recognition method based on a text emotion recognition model, wherein the text emotion recognition model is trained by the training method of the text emotion recognition model, and the method comprises the following steps:
acquiring a text to be identified;
extracting a text vector of the text to be recognized;
and calling the text emotion recognition model based on the text vector of the text to be recognized to obtain an emotion tag of the text to be recognized, wherein the emotion tag of the text to be recognized comprises an emotion category and emotion intensity to which the text to be recognized belongs.
The embodiment of the application provides a training device of a text emotion recognition model, which comprises:
the first acquisition module is used for acquiring a first training set, wherein training samples in the first training set comprise text samples and emotion labels of the text samples, and the emotion labels comprise emotion categories and emotion intensities of the text samples; the emotional tags of at least one of the training samples in the first training set include a plurality of emotional categories;
the data mapping module is used for carrying out vectorization processing on each training sample to obtain a text vector of each training sample, wherein the text vector of each training sample comprises a text sample vector of each text sample, an emotion category vector of each text sample and an emotion intensity vector of each text sample;
The first training module is used for executing a first training task of the text emotion recognition model based on the text sample vector and the emotion category vector of the text sample, wherein the first training task is used for training the text emotion recognition model to recognize the emotion category to which the text sample belongs;
and the second training module is used for executing a second training task of the text emotion recognition model based on the text sample vector and the emotion intensity vector of the text sample, wherein the second training task is used for training the text emotion recognition model to recognize the emotion intensity of the text sample, and the trained text emotion recognition model is used for recognizing the emotion type and emotion intensity of the text to be processed.
The embodiment of the application provides an emotion recognition device based on a text emotion recognition model, wherein the text emotion recognition model is obtained by training through the training method of the text emotion recognition model; the device comprises:
the second acquisition module is used for acquiring the text to be identified;
the vector extraction module is used for extracting the text vector of the text to be identified;
And the emotion recognition module is used for calling the text emotion recognition model based on the text vector of the text to be recognized to obtain an emotion label of the text to be recognized, wherein the emotion label of the text to be recognized comprises an emotion category and emotion intensity of the text to be recognized.
An embodiment of the present application provides an electronic device, including:
a memory for storing computer executable instructions or computer programs;
and the processor is used for realizing the training method of the text emotion recognition model or the emotion recognition method based on the text emotion recognition model when executing the computer executable instructions or the computer programs stored in the memory.
The embodiment of the application provides a computer readable storage medium, which stores a computer program or computer executable instructions for implementing the training method of the text emotion recognition model or the emotion recognition method based on the text emotion recognition model provided by the embodiment of the application when being executed by a processor.
The embodiment of the application provides a computer program product, which comprises a computer program or computer executable instructions, wherein the computer program or the computer executable instructions realize the training method of the text emotion recognition model or the emotion recognition method based on the text emotion recognition model when being executed by a processor.
The embodiment of the application has the following beneficial effects:
by introducing into the training set both training samples with a single emotion label and training samples with multiple emotion labels, when the first training task of the text emotion recognition model is executed based on the text sample vector and the emotion category vector of the text sample, the model learns not only to distinguish which samples have a single emotion category but also to recognize which samples have multiple emotion categories. All emotions contained in a text sample can thus be recognized accurately, unclear recognition caused by confusing emotions with subtle differences is avoided, and the accuracy and comprehensiveness of emotion recognition of the text emotion recognition model are improved;
the training tasks of emotion category recognition and emotion intensity recognition are decoupled: only after the model has been trained to recognize the correct emotion categories is it trained, based on the text sample vector and the emotion intensity vector of the text sample, to recognize emotion intensity, so that the training of the text emotion recognition model is deepened step by step.
Drawings
FIG. 1 is a schematic architecture diagram of a training system 100 for a text emotion recognition model provided in an embodiment of the present application;
fig. 2A is a schematic structural diagram of a server 200-1 according to an embodiment of the present application;
FIG. 2B is a schematic diagram of a server 200-2 according to an embodiment of the present disclosure;
FIG. 3A is a first flow chart of a training method of a text emotion recognition model according to an embodiment of the present application;
FIG. 3B is a second flow chart of a training method of a text emotion recognition model according to an embodiment of the present application;
FIG. 3C is a third flow chart of a training method of a text emotion recognition model according to an embodiment of the present application;
fig. 3D is a fourth flowchart of a training method of a text emotion recognition model according to an embodiment of the present application;
fig. 3E is a fifth flowchart of a training method of a text emotion recognition model according to an embodiment of the present application;
fig. 3F is a sixth flowchart of a training method of a text emotion recognition model according to an embodiment of the present application;
fig. 3G is a seventh flowchart of a training method of a text emotion recognition model according to an embodiment of the present application;
fig. 3H is an eighth flowchart of a training method of a text emotion recognition model according to an embodiment of the present application;
Fig. 3I is a ninth flowchart of a training method of a text emotion recognition model according to an embodiment of the present application;
fig. 3J is a tenth flowchart of a training method of a text emotion recognition model according to an embodiment of the present application;
fig. 3K is an eleventh flowchart of a training method of a text emotion recognition model according to an embodiment of the present application;
fig. 4 is a flowchart of an emotion recognition method based on a text emotion recognition model according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a first training task provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a second training task provided by an embodiment of the present application;
FIG. 7 is a training frame diagram of a training method for a text emotion recognition model provided in an embodiment of the present application;
FIG. 8 is an application schematic diagram of a training method of a text emotion recognition model provided in an embodiment of the present application;
FIG. 9 is a diagram of a general fine tuning paradigm of BERT provided by embodiments of the present application;
FIG. 10 is a schematic diagram of the structure of BERT provided in an embodiment of the present application;
- FIG. 11 is a schematic diagram of a Transformer module according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a multi-headed attention mechanism provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of a general word embedding principle of the BERT model provided in the embodiment of the present application;
FIG. 14 is a schematic diagram of word embedding principles for emotion recognition using a text emotion recognition model provided in an embodiment of the present application;
FIG. 15 is a schematic diagram of output of a text emotion recognition model provided by an embodiment of the present application;
fig. 16 is a schematic structural diagram of an emotion tag provided in an embodiment of the present application;
FIG. 17 is a schematic diagram of a classification head according to an embodiment of the present application;
FIG. 18 is a schematic diagram of an application of text emotion recognition provided by an embodiment of the present application;
FIG. 19 is a graph of emotional profiles for a single session provided by an embodiment of the present application;
FIG. 20 is an emotion trend graph of a single episode of a television show provided by an embodiment of the present application;
fig. 21 is an emotion trend chart of an entire television play provided in an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings, and the described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
Where descriptions such as "first/second" appear in this application, the terms "first/second/third" merely distinguish similar objects and do not denote a specific ordering. It should be understood that "first/second/third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
In the present embodiment, the term "module" or "unit" refers to a computer program or a part of a computer program having a predetermined function, and works together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of the module or unit.
Unless defined otherwise, all technical and scientific terms used in the embodiments of the present application have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the embodiments of the application is for the purpose of describing the embodiments of the application only and is not intended to be limiting of the application.
Before further describing embodiments of the present application in detail, the terms and expressions that are referred to in the embodiments of the present application are described, and are suitable for the following explanation.
1) Multi-label classification is a machine learning task in which each sample can be assigned multiple labels. Unlike conventional single-tag classification tasks, the goal of multi-tag classification tasks is to predict multiple relevant tags for each sample.
2) Emotion intensity: the strength or degree of an emotion, i.e., its level, such as the mild, moderate, and severe grades of an emotion, for example pleasant mood, happiness, and mania.
3) Text multi-label classification, where the input is a piece of text and the output is a set of possible labels. For example, in emotion analysis tasks, text may be classified as positive, negative, or neutral.
There are two main emotion classification methods in the related art. The first is label-flattening classification based on a large model. For example, define a multi-label emotion classification with 6 categories: love, happiness, surprise, anger, fear, and fun, where each emotion takes a value in {0, 1, 2, 3}: 0 represents no emotion, 1 weak emotion intensity, 2 medium emotion intensity, and 3 strong emotion intensity. A label such as (1,0,2,0,0,3) is flattened so that each category is represented by four 0/1 values covering the four intensity levels, yielding (0,1,0,0, 1,0,0,0, 0,0,1,0, 1,0,0,0, 1,0,0,0, 0,0,0,1), and the model learns via multi-label binary classification. Because a large model can learn the target task without massive data, this method is very friendly to data and training efficiency. The second is label regression prediction based on a large model: with the same 6 categories and the same value range, each value in a label such as (1,0,2,0,0,3) is divided by 3 to obtain a score within 0-1, and the specific emotion label and intensity are obtained by mapping the different scores.
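As an illustration, a minimal sketch of these two related-art label encodings, assuming hypothetical category names and following the numeric example above:

```python
# Sketch of the two related-art label encodings (category names are illustrative).
EMOTIONS = ["love", "happiness", "surprise", "anger", "fear", "fun"]
NUM_INTENSITIES = 4  # 0 = none, 1 = weak, 2 = medium, 3 = strong

def flatten_labels(intensities):
    """First scheme: each category becomes a one-hot block of 4 bits,
    so (1,0,2,0,0,3) -> a 24-dim 0/1 multi-label target."""
    flat = []
    for level in intensities:
        block = [0] * NUM_INTENSITIES
        block[level] = 1
        flat.extend(block)
    return flat

def regression_targets(intensities):
    """Second scheme: divide each intensity by 3 to get a score in [0, 1]."""
    return [v / 3 for v in intensities]

assert flatten_labels((1, 0, 2, 0, 0, 3)) == [
    0, 1, 0, 0,  1, 0, 0, 0,  0, 0, 1, 0,
    1, 0, 0, 0,  1, 0, 0, 0,  0, 0, 0, 1,
]
```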
Both methods force the model to learn a score, which makes it easy to overfit the training data; moreover, they cannot solve the problem that correlation and confusion among emotion labels make emotion intensity recognition inaccurate and emotions hard to distinguish. Emotion intensity is an abstract concept, and some emotions are in practice mixtures, such as surprise and shock. The related art pulls such fused emotions apart and has the model learn the intensity value directly, so the model knows only the result (the score) and not the process (the corresponding fused emotions); errors in scoring emotion intensity therefore cause frequent errors in service applications.
Based on the above analysis, the applicant found that the training methods for text emotion recognition models in the related art cannot accurately recognize the multiple emotions of a text and the emotion intensity of each emotion.
The embodiments of the application provide a training method, an emotion recognition method, a device, an electronic device, a computer-readable storage medium, and a computer program product for accurately recognizing all emotion categories in a text and the emotion intensity of each emotion category. Exemplary applications of the electronic device provided by the embodiments of the application are described below, taking as an example an electronic device implemented as a server.
Referring to fig. 1, fig. 1 is a schematic architecture diagram of a training system 100 for a text emotion recognition model according to an embodiment of the present application, in order to implement a training application for supporting a text emotion recognition model, a terminal 400 is connected to a server 200 through a network 300, where the network 300 may be a wide area network or a local area network, or a combination of the two.
The server 200 is used for training the text emotion recognition model. It receives the text to be recognized sent by the terminal 400 and recognizes it through the trained text emotion recognition model; the recognition result comprises at least one emotion category to which the text to be recognized belongs and the corresponding emotion intensity. In addition, for each emotion category to which the text to be recognized belongs, the recognition result may further include the emotion sub-category under that category. The server 200 transmits the recognition result to the terminal 400, and the terminal 400 receives the recognition result and displays it on the graphical interface 410.
In some embodiments, the server 200 may be a stand-alone physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application.
Referring to fig. 2A, fig. 2A is a schematic structural diagram of a server 200-1 provided in the embodiment of the present application, where the server 200-1 is an implementation manner of the server 200 for training a text emotion recognition model, and the server 200-1 shown in fig. 2A includes: at least one processor 210, a memory 230, and at least one network interface 220. The various components in server 200 are coupled together by bus system 240. It is understood that the bus system 240 is used to enable connected communications between these components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled as bus system 240 in fig. 2A.
The processor 210 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor (e.g., a microprocessor or any conventional processor), a digital signal processor (DSP), another programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
Memory 230 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 230 optionally includes one or more storage devices that are physically remote from processor 210.
Memory 230 includes volatile memory or non-volatile memory, and may include both. The non-volatile memory may be a read-only memory (ROM), and the volatile memory may be a random access memory (RAM). The memory 230 described in embodiments of the present application is intended to comprise any suitable type of memory.
In some embodiments, memory 230 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 231, including system programs such as a framework layer, a core library layer, and a driver layer, for handling various basic system services and performing hardware-related tasks;
a network communication module 232 for reaching other electronic devices via one or more (wired or wireless) network interfaces 220; exemplary network interfaces 220 include Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like;
in some embodiments, the training device for a text emotion recognition model provided in the embodiments of the present application may be implemented in a software manner, and fig. 2A shows a training device 233 for a text emotion recognition model stored in a memory 230, which may be software in the form of a program and a plug-in, and includes the following software modules: the first acquisition module 2331, the data mapping module 2332, the first training module 2333 and the second training module 2334 are logical, and thus may be arbitrarily combined or further split according to the implemented functions. The functions of the respective modules will be described hereinafter.
Referring to fig. 2B, fig. 2B is a schematic structural diagram of a server 200-2 provided in the embodiment of the present application, where the server 200-2 is one implementation of text emotion recognition based on a trained text emotion recognition model by the server 200, and the server 200-2 shown in fig. 2B includes: at least one processor 250, a memory 270, and at least one network interface 260. The various components in server 200-2 are coupled together by bus system 280.
In some embodiments, the emotion recognition device based on the text emotion recognition model provided in the embodiments of the present application may be implemented in a software manner, and fig. 2B shows the emotion recognition device 273 of the text emotion recognition model stored in the memory 270, which may be software in the form of a program and a plug-in, and includes the following software modules: a second acquisition module 2731, a vector extraction module 2732 and an emotion recognition module 2733, which are logical, and thus may be arbitrarily combined or further split depending on the implemented functions. The functions of the respective modules will be described hereinafter.
In some embodiments, the server may implement the training method of the text emotion recognition model or the emotion recognition method based on the text emotion recognition model provided in the embodiments of the present application by running various computer-executable instructions or computer programs. For example, the computer-executable instructions may be micro-program-level commands, machine instructions, or software instructions. The computer program may be a native program or a software module in an operating system, or a native application (APP). In general, the computer-executable instructions may be instructions of any form, and the computer program may be an application, module, or plug-in of any form.
The training method of the text emotion recognition model provided by the embodiment of the application will be described with reference to the exemplary application and implementation of the electronic device provided by the embodiment of the application, wherein the electronic device can be implemented as a server or a terminal.
Referring to fig. 3A, fig. 3A is a schematic flow chart of a training method of a text emotion recognition model according to an embodiment of the present application, where the method may be executed by an electronic device, and the electronic device may be the server 200-1 or the terminal 400 described above, and will be described with reference to the steps shown in fig. 3A.
In step 101, a first training set is obtained, wherein training samples in the first training set comprise text samples and emotion tags of the text samples, and the emotion tags comprise emotion categories and emotion intensities of the text samples; the emotion tags of at least one training sample in the first training set comprise a plurality of emotion categories.
In some embodiments, the first training set may be a mixture of training samples of a single emotion category and training samples of multiple emotion categories, or may all be training samples containing multiple emotion categories.
In some embodiments, referring to fig. 3B, fig. 3B is a second flow chart of a training method of a text emotion recognition model provided in an embodiment of the present application. Step 101 "acquire first training set" of fig. 3A may be implemented through steps 1011 to 1014 of fig. 3B, which will be described in detail below.
In step 1011, a pre-collected second training set is obtained, wherein the second training set comprises a text sample and an emotional category and an emotional intensity of the text sample.
In step 1012, an emotion sub-category of the text sample under the emotion category is obtained.
In some embodiments, a preset emotion mapping table is obtained, wherein the emotion mapping table comprises mapping relations among different emotion categories, different emotion intensities and different emotion sub-categories, and the mapping relations are queried through the emotion categories and the emotion intensities of the text samples to obtain the emotion sub-categories of the text samples under the emotion categories.
By way of example, as shown in Table 1, Table 1 embodies the mapping relationship among emotion categories, emotion intensities, and emotion sub-categories. Four emotion categories are shown, namely emotion category 1 to emotion category 4. Each emotion category comprises sub-categories at different emotion intensities; taking emotion category 1 as an example, it includes sub-category 1 at emotion intensity 1, sub-category 2 at emotion intensity 2, and sub-category 3 at emotion intensity 3. Every emotion category maps to sub-category 13 when there is no emotion.
| Label | Emotion intensity 1 | Emotion intensity 2 | Emotion intensity 3 | No emotion |
| --- | --- | --- | --- | --- |
| Emotion category 1 | Sub-category 1 | Sub-category 2 | Sub-category 3 | Sub-category 13 |
| Emotion category 2 | Sub-category 4 | Sub-category 5 | Sub-category 6 | Sub-category 13 |
| Emotion category 3 | Sub-category 7 | Sub-category 8 | Sub-category 9 | Sub-category 13 |
| Emotion category 4 | Sub-category 10 | Sub-category 11 | Sub-category 12 | Sub-category 13 |

Table 1
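A sketch of the lookup in step 1012, assuming a dictionary encoding of Table 1 (the dictionary layout and all names are hypothetical):

```python
# Hypothetical dictionary form of Table 1:
# (emotion category, emotion intensity) -> emotion sub-category.
EMOTION_MAP = {
    ("category 1", 1): "sub-category 1", ("category 1", 2): "sub-category 2",
    ("category 1", 3): "sub-category 3",
    ("category 2", 1): "sub-category 4", ("category 2", 2): "sub-category 5",
    ("category 2", 3): "sub-category 6",
    # categories 3 and 4 follow the same pattern (sub-categories 7..12)
}

def sub_category(category: str, intensity: int) -> str:
    """Query the mapping with the sample's emotion category and intensity."""
    if intensity == 0:
        return "sub-category 13"  # shared no-emotion sub-category
    return EMOTION_MAP[(category, intensity)]
```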
In some embodiments, emotion sub-categories of the text sample are identified from a plurality of preset emotion sub-categories by a pre-trained dialog model.
For example, in addition to using the emotion mapping table, the emotion sub-category of a text sample may also be identified by a pre-trained dialogue model. For example: acquire a plurality of texts containing emotion, annotate each text with its emotion sub-category as a label, and train a dialogue model (e.g., ChatGPT, Chat Generative Pre-trained Transformer) on the annotated texts so that it can identify the emotion sub-category of a text sample.
In step 1013, an emotion tag of the text sample is constituted by the emotion category of the text sample, the emotion intensity of the text sample, and the emotion sub-category of the text sample under the emotion category.
By way of example, a training sample may be in the format <text sample, emotion tag>, where the data structure of the emotion tag is expressed as <emotion category, emotion intensity, emotion sub-category>.
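One possible data structure for this sample format, as a sketch (the field names and example values are assumptions, not from the patent):

```python
from dataclasses import dataclass

@dataclass
class EmotionTag:  # <emotion category, emotion intensity, emotion sub-category>
    category: str
    intensity: int
    sub_category: str

@dataclass
class TrainingSample:  # <text sample, emotion tag>
    text: str
    tags: list  # several EmotionTag entries when emotions are compounded

sample = TrainingSample(
    text="I can't believe we actually won!",
    tags=[EmotionTag("happiness", 3, "sub-category 3"),
          EmotionTag("surprise", 2, "sub-category 8")],
)
```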
In step 1014, training samples are composed based on the text samples and the emotion tags, and a first training set is composed based on the plurality of training samples.
In some embodiments, emotion sub-categories are thus added to enrich the emotion tags on the basis of the originally collected second training set, yielding the first training set.
In the embodiments of the application, adding emotion sub-categories to enrich the emotion tags makes the abstract emotion intensity concrete, whereas the related art can only identify the level or intensity of an emotion. This helps in understanding the more specific sub-emotions of a text and improves the semantic interpretability of the output result.
With continued reference to fig. 3A, in step 102, vectorization processing is performed on each training sample to obtain a text vector of each training sample, where the text vector of the training sample includes a text sample vector of the text sample, an emotion category vector of the text sample, and an emotion intensity vector of the text sample.
In some embodiments, referring to fig. 3C, fig. 3C is a third flow chart of a training method of a text emotion recognition model provided in an embodiment of the present application. Step 102 "vectorizing each training sample to obtain a text vector of each training sample" in fig. 3A may be implemented by performing steps 1021 through 1025 in fig. 3C for each training sample, which is described in detail below.
In step 1021, the training sample is subjected to word segmentation to obtain a plurality of words.
For example, the training samples may be segmented using a word segmentation algorithm, which splits a training sample into individual words; examples include the forward maximum matching method, the backward maximum matching method, segmentation based on string matching, and segmentation algorithms based on hidden Markov models, as sketched below.
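A minimal sketch of the forward maximum matching method named above (the dictionary, maximum word length, and example strings are assumptions):

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation: at each position take the
    longest dictionary word that matches, falling back to one character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                words.append(piece)
                i += size
                break
    return words

# e.g. forward_max_match("emotionrecognition", {"emotion", "recognition"}, max_len=11)
# -> ["emotion", "recognition"]
```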
In step 1022, the emotion tags in the training samples are replaced with mask tags.
In some embodiments, because an emotion tag obviously reflects the emotional state of a training sample, it easily interferes with the text emotion recognition model during emotion recognition, preventing the model from learning other feature information. To avoid this, in the model training phase, part of the emotion tag information in each training sample is masked with mask codes, so that the text emotion recognition model must infer the masked portion.
In this way, the text emotion recognition model does not rely on emotion tag features when recognizing emotion but focuses on learning the semantic features of the training samples, which eliminates the interference of emotion tag features on the model and improves its generality.
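A sketch of the masking in step 1022, assuming the emotion-tag positions are known and masking is applied with a configurable probability (the patent only states that tag information is partially masked):

```python
import random

MASK = "[MASK]"

def mask_emotion_tags(tokens, tag_positions, mask_prob=0.8):
    """Replace words at emotion-tag positions with [MASK] so the model
    cannot shortcut through explicit emotion wording."""
    out = list(tokens)
    for pos in tag_positions:
        if random.random() < mask_prob:
            out[pos] = MASK
    return out
```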
In step 1023, each of the plurality of words is used as a normal tag, and the mask tag and the normal tags corresponding to the plurality of words are connected in their order in the training sample to obtain a tag sequence of the training sample, with a start tag inserted at the head of the tag sequence.
In some embodiments, when the contents of multiple training samples are concatenated, a tag sequence may be constructed with a start tag (CLS) inserted at its head and a separator tag (SEP) inserted between the contents of adjacent training samples.
In step 1024, word embedding processing is performed on the tag sequence, so as to obtain a word embedded vector sequence and a position embedded vector sequence corresponding to the training samples respectively.
In some embodiments, the tag sequence includes the normal tags of the words, the mask tag, and the inserted start tag (CLS), and a Word2Vec word vector model is invoked to perform word embedding on the tags in the tag sequence to construct the word embedding vectors.

For example, for any tag in the tag sequence, a word embedding vector is generated after processing by the Word2Vec model, where the dimension of the vector depends on the number of tags in the tag sequence. If the tag sequence has N tags, the sequence length is N, and an N-dimensional word embedding vector corresponding to each tag is generated based on the tag sequence in which it is located, where the word embedding vector is produced by the trained Word2Vec model.

For example, if the third tag of the tag sequence "CLS I like apple" is "like", the word embedding vector generated by this tag is E_like, which can be described as [0, 0, E_like, 0].
In some embodiments, each tag has a fixed position in the tag sequence, and the word embedding vectors generated by the tags are connected according to those positions to obtain the word embedding vector sequence of the tag sequence corresponding to the training sample.

For example, if the tag sequence has N tags and the sequence length is N, an N-dimensional word embedding vector is generated for each tag based on the tag sequence in which it is located. Connecting, in order of their fixed positions, the N word vectors generated by the N tags produces an N×N word embedding matrix for the whole tag sequence. For example, if the tag sequence is "CLS I like apple", its 4 tags together generate the word embedding vector sequence [E_CLS, E_I, E_like, E_apple].
In some embodiments, since each tag in the tag sequence (including the normal tags, the mask tag, and the inserted start tag) has a fixed position, a sequence number can be assigned to each tag based on its location; the sequence number of each tag characterizes its position in the whole tag sequence. Word embedding is then performed on the position of each tag in the tag sequence to obtain the position embedding vector of each tag, i.e., position embedding vectors for the normal tags, the mask tag, and the inserted start tag.

For example, for each tag, the tag's position number is also processed using the Word2Vec model to obtain a position embedding vector. For instance, in the tag sequence "CLS I like apple", the second tag "I" has position number 2, and the position embedding vector of tag "I" obtained by the word embedding process is [0, E2, 0, 0].
In some embodiments, since each tag has a corresponding position in the tag sequence, the position embedding vectors generated by the tags are connected according to those positions, thereby obtaining the position embedding vector sequence of the tag sequence corresponding to the training sample.

For example, for each tag, the tag's position number is processed by the Word2Vec model to obtain its position embedding vector, with the values at the other tags' slots recorded as 0. In the tag sequence "CLS I like apple", the second tag "I" has position number 2, and the position embedding vector E2 obtained by word embedding is [0, E2, 0, 0]. After the 4 tags in the tag sequence are word-embedded by position, the 4 position embedding vectors are connected according to their sequence numbers, so that the position embedding vector sequence of the tag sequence corresponding to the training sample is [E1, E2, E3, E4].
In step 1025, fusion processing is performed based on the word embedded vector sequence and the position embedded vector sequence to obtain a text vector of the training sample.
In some embodiments, the word embedding vector sequence and the position embedding vector sequence each cover every tag, each tag has a fixed position in both sequences, and the sequence length is fixed, so the generated word embedding vector and position embedding vector have the same dimensions and can be added directly; that is, the word embedding vector and the position embedding vector corresponding to each tag are added, and the sum is taken as the embedding vector of that tag.

The word embedding vector and the position embedding vector of each tag are both generated in order from the tag's position in the whole tag sequence. Since the tag sequence has a fixed length and the embedding vectors generated at corresponding positions have the same dimension, the embedding representations of the tags can be connected in order to obtain the text vector of the training sample corresponding to the tag sequence.

For example, for the tag sequence "CLS I like apple", word embedding yields the word embedding vector sequence [E_CLS, E_I, E_like, E_apple] and the position embedding vector sequence [E1, E2, E3, E4]; adding them directly gives the text vector of the training sample corresponding to the tag sequence: [E_CLS+E1, E_I+E2, E_like+E3, E_apple+E4].
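A sketch of steps 1024-1025: look up a word embedding and a position embedding for each tag and add them element-wise. The use of NumPy and random lookup tables is an assumption for illustration; in the patent both embeddings come from the Word2Vec processing described above.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding dimension (illustrative)

tokens = ["CLS", "I", "like", "apple"]
word_table = {t: rng.standard_normal(DIM) for t in tokens}  # stands in for E_CLS, E_I, ...
pos_table = rng.standard_normal((len(tokens), DIM))         # stands in for E1, E2, E3, E4

word_seq = np.stack([word_table[t] for t in tokens])  # [seq_len, DIM]
text_vector = word_seq + pos_table                    # [E_CLS+E1, E_I+E2, E_like+E3, E_apple+E4]
```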
In the embodiments of the application, randomly masking the emotion labels of the training samples strengthens the contextual relations among the segmented words, so that the text emotion recognition model is more inclined to learn the semantic features among them, improving the recognition accuracy of the model.
With continued reference to fig. 3A, in step 103, a first training task of the text emotion recognition model is performed based on the text sample vector and the emotion category vector of the text sample, wherein the first training task is used to train the text emotion recognition model to recognize the emotion category to which the text sample belongs.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a first training task provided in an embodiment of the present application, and a text emotion recognition model includes: a language understanding model and a first classifier. Referring to fig. 3D, fig. 3D is a fourth flowchart of a training method of a text emotion recognition model according to an embodiment of the present application. Step 103 of fig. 3A, "performing a first training task of a text emotion recognition model based on a text sample vector and an emotion category vector of a text sample" may be implemented by steps 1031 to 1034 of fig. 3D, which are described in detail below.
In step 1031, a language understanding model is called for encoding processing based on the text sample vector and the emotion type vector of the text sample, and a first fusion vector is obtained.
In some embodiments, the first fusion vector is obtained by invoking the encoder of the language understanding model. The encoder mainly maps a natural language sequence into a hidden-layer representation, i.e., a mathematical representation of the natural language sequence.
In step 1032, a first classifier is invoked to classify based on the first fusion vector, and a prediction result of emotion classification is obtained.
In some embodiments, the first fusion vector is input into the first classifier, and invoking the first classifier yields the emotion category prediction result of the text sample. The emotion category prediction result refers to the binary classification prediction for each of the plurality of emotion categories, that is, the predicted probability of belonging to each category, for example 1 if the text belongs to the emotion category and 0 if it does not.
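One way to realize such a per-category binary classifier is a linear layer with an independent sigmoid per emotion category; a sketch (the hidden size and category count are assumptions):

```python
import torch
import torch.nn as nn

class FirstClassifier(nn.Module):
    """Multi-label head: one sigmoid output per emotion category,
    so several categories can be predicted for the same text."""
    def __init__(self, hidden_size=768, num_categories=6):
        super().__init__()
        self.linear = nn.Linear(hidden_size, num_categories)

    def forward(self, first_fusion_vector):  # [batch, hidden_size]
        # [batch, num_categories], each entry the probability of one category
        return torch.sigmoid(self.linear(first_fusion_vector))
```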
In step 1033, a first penalty between the emotion category prediction result and the emotion category of the text sample is determined, and parameters of the language understanding model and parameters of the first classifier are updated according to the first penalty.
In some embodiments, referring to fig. 3E, fig. 3E is a fifth flowchart of a training method of a text emotion recognition model provided in an embodiment of the present application. Step 1033 "determining a first penalty between emotion category prediction results and emotion categories of the text sample" of fig. 3D may be implemented by performing steps 10331 to 10332 of fig. 3E for each of a plurality of emotion categories, as described in detail below.
In step 10331, a classification penalty between the classification prediction result of the emotion classification of the text sample and the actual classification result of the emotion classification is determined.
The binary classification prediction result for an emotion category of the text sample is the predicted probability of belonging to that category; the actual classification result for the emotion category is the true probability of belonging to it, for example 1 if the text sample belongs to the emotion category and 0 if it does not.
As an example of step 10331, the training samples in the training set may be divided into a plurality of batches, and the following is performed for the text sample of each training sample in each batch: determine a first logarithm, i.e., the logarithm of the binary classification prediction result for the emotion category; determine a first product of the first logarithm and the actual classification result of the text sample; and determine a first ratio between the first product and the number of training samples in the batch, taking the first ratio as the binary classification loss between the prediction result and the actual classification result for that emotion category.
In step 10332, a sum of the classification penalty for each emotion category is determined as a first penalty between the predicted emotion category and the emotion category of the text sample.
In some embodiments, if there are multiple emotion categories, after calculating the categorization penalty for each emotion category, the categorization penalty for all emotion categories needs to be summed, with the summed result as the first penalty.
For example, if there are three emotion categories, namely emotion category 1, emotion category 2, and emotion category 3, and through the above calculation the binary classification loss of emotion category 1 is a1, that of emotion category 2 is a2, and that of emotion category 3 is a3, then the first loss between the emotion category prediction result and the emotion categories of the text sample is a1+a2+a3.
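Read this way, steps 10331-10332 amount to a binary cross-entropy-style objective summed over categories; a hedged sketch (the negative sign and the complementary term for the negative class are assumptions of the standard BCE form, since the text only spells out the positive-class term):

```python
import torch

def first_loss(preds, targets, eps=1e-7):
    """preds, targets: [batch, num_categories] tensors. Per-category binary
    loss averaged over the batch (the "first ratio"), then summed over
    categories (a1 + a2 + a3 in the example above)."""
    per_category = -(targets * (preds + eps).log()
                     + (1 - targets) * (1 - preds + eps).log()).mean(dim=0)
    return per_category.sum()
```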
With continued reference to fig. 3D, in step 1034, the first training task is stopped in response to the language understanding model converging and the first classifier converging.
In some embodiments, when the first loss tends to stabilize and the magnitude of the change is always within the preset range, then both the language understanding model and the first classifier are considered to converge, at which point the first training task is stopped.
For example, the preset range is 0 to 0.1, and if the difference between the current calculation and the last calculation of the first loss is smaller than 0.1, for example, 0.05, the language understanding model and the first classifier are considered to be converged.
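The convergence test can be as simple as comparing successive loss values against the preset range; a minimal sketch:

```python
def has_converged(prev_loss, curr_loss, tolerance=0.1):
    """Treat the language understanding model and the first classifier as
    converged when the change in the first loss stays within the preset
    range (0.1 in the example above)."""
    return abs(curr_loss - prev_loss) < tolerance
```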
In the embodiments of the application, by recognizing each emotion category of the text sample, all emotions contained in a text sample that carries multiple emotions can be recognized accurately; unclear recognition caused by confusing multiple emotions is avoided, and the accuracy and comprehensiveness of emotion recognition are improved.
With continued reference to fig. 3A, in step 104, a second training task of the text emotion recognition model is performed based on the text sample vector and the emotion intensity vector of the text sample, wherein the second training task is used for training the text emotion recognition model to recognize the emotion intensity of the text sample, and the trained text emotion recognition model is used for recognizing the emotion category and emotion intensity of the text to be processed.
As shown in fig. 5, the text emotion recognition model also includes an encoder model and a second classifier. Referring to fig. 3F, fig. 3F is a sixth flowchart of a training method of a text emotion recognition model according to an embodiment of the present application. Step 104 "performing a second training task of the text emotion recognition model based on the text sample vector and the emotion intensity vector of the text sample" of fig. 3A may be implemented by steps 1041 to 1043 of fig. 3F, which is described in detail below.
In step 1041, an encoder model is called for encoding based on the text sample vector and the emotion intensity vector of the text sample, so as to obtain a second fusion vector.
The encoding process of the encoder is the same as the process of obtaining the first fusion vector described in the above embodiment, and a description thereof will not be repeated here.
In step 1042, a second classifier is invoked to classify based on the second fusion vector to obtain an emotion intensity prediction result.
The classification process of the second classifier is the same as that of the first classifier described in the above embodiment, and a description thereof will not be repeated here.
In step 1043, a second penalty between the emotional intensity prediction result and the emotional intensity of the text sample is determined, and parameters of the encoder model and parameters of the second classifier are updated according to the second penalty.
In some embodiments, the second loss is back-propagated to calculate the gradients of the parameters of the encoder model and the parameters of the second classifier, and the parameters are updated based on these gradients (i.e., new parameter = original parameter − learning rate × gradient). The gradients may be computed and applied with various gradient descent algorithms, for example the adaptive moment estimation algorithm (Adam, Adaptive Moment Estimation).
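A minimal sketch of this update step using PyTorch's built-in Adam optimizer (the module and loss names are placeholders; in practice the optimizer would be created once and reused across steps):

    import itertools
    import torch
    from torch import nn

    def adam_step(encoder: nn.Module, second_classifier: nn.Module,
                  second_loss: torch.Tensor, lr: float = 1e-4) -> None:
        # one optimizer over the parameters of both the encoder model and the second classifier
        optimizer = torch.optim.Adam(
            itertools.chain(encoder.parameters(), second_classifier.parameters()), lr=lr)
        optimizer.zero_grad()
        second_loss.backward()  # back-propagate the second loss to obtain gradients
        optimizer.step()        # parameter <- original parameter - learning rate * (Adam-scaled) gradient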
For example, as shown in fig. 5, the training samples are vectorized to obtain text vectors, the text vectors are input into a language understanding model, a part of the text vectors are used for executing a first training task, namely, the language understanding model is called for coding processing based on the text sample vectors and emotion type vectors to obtain first fusion vectors, and then a first classifier is called for classifying processing based on the first fusion vectors to obtain emotion type prediction results; and the other part of the text vector is used for executing a second training task, namely, calling an encoder model to carry out encoding processing based on the text sample vector and the emotion intensity vector to obtain a second fusion vector, and calling a second classifier to carry out classification processing based on the second fusion vector to obtain an emotion intensity prediction result.
In some embodiments, the emotion tag of the text sample further comprises emotion sub-categories of the text sample under emotion categories, each emotion category comprising a plurality of emotion sub-categories; the text vector of the training sample further includes an emotion sub-category vector of an emotion sub-category; accordingly, step 104 may also be implemented by: and executing a second training task of the text emotion recognition model based on the text sample vector, the emotion intensity vector and the emotion sub-category vector of the text sample, wherein the second training task is further used for recognizing emotion sub-categories of the sample under the emotion category when recognizing the emotion category of the text sample.
As an example, referring to fig. 5, the text emotion recognition model further includes an encoder model, a second classifier, and a third classifier; referring to fig. 3G, fig. 3G is a seventh flowchart of a training method of a text emotion recognition model according to an embodiment of the present application. In the above embodiment, "the second training task of performing the text emotion recognition model based on the text sample vector, the emotion intensity vector, and the emotion sub-category vector" may be implemented through steps 201 to 206 of fig. 3G, which will be described in detail below.
In step 201, an encoder model is called for encoding processing based on a text sample vector, an emotion intensity vector and an emotion sub-category vector of the text sample, so as to obtain a third fusion vector.
The encoding process is the same as the process of calling the language understanding model to perform the encoding process to obtain the first fusion vector.
In step 202, a second classifier is invoked to perform classification processing based on the third fusion vector, so as to obtain an emotion intensity prediction result.
The procedure of calling the second classifier to perform the classification processing is the same as the classification procedure of step 1042 described above.
In step 203, a third classifier is invoked to perform classification processing based on the third fusion vector, so as to obtain an emotion sub-category prediction result.
The procedure of calling the third classifier to perform the classification processing is the same as the classification processing procedure of the second classifier described in the above embodiment.
In step 204, a second penalty is determined between the emotional intensity prediction result and the emotional intensity of the text sample, and a third penalty is determined between the emotional sub-category prediction result and the emotional sub-category of the text sample.
In some embodiments, the emotional intensity prediction result includes predicted emotional intensities of the text sample corresponding to the plurality of emotional categories, and the emotional intensity of the text sample includes actual emotional intensities of the text sample corresponding to the plurality of emotional categories.
In some embodiments, referring to fig. 3H, fig. 3H is an eighth flowchart of a training method of a text emotion recognition model provided in an embodiment of the present application. Step 204 "determining a second loss between the emotional intensity prediction result and the emotional intensity of the text sample" of fig. 3G may be implemented by performing steps 2041A through 2042A of fig. 3H for each of a plurality of emotional categories, as described in detail below.
In step 2041A, a loss of emotional intensity between the predicted emotional intensity of the text sample corresponding to the emotional category and the actual emotional intensity of the corresponding emotional category is determined.
In some embodiments, for each emotion category of the text sample, the emotional intensity loss between the predicted and actual emotion intensities of that emotion category needs to be calculated.
In some embodiments, referring to fig. 3I, fig. 3I is a ninth flowchart of a training method of a text emotion recognition model provided in an embodiment of the present application. Step 2041A of fig. 3H may be implemented by steps 2041A1 to 2041A3 of fig. 3I, as described in detail below.
In step 2041A1, training samples in a training set are divided into a plurality of batches.
For example, if there are N training samples in the training set, every bs training samples form one batch, giving N/bs batches; completing all N/bs batches corresponds to completing one round (epoch) of iteration.
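For illustration, splitting N training samples into N/bs batches might look like this (hypothetical names):

    def make_batches(samples, bs):
        """Every bs training samples form one batch; one pass over all batches is one epoch."""
        return [samples[i:i + bs] for i in range(0, len(samples), bs)]

    batches = make_batches(list(range(128)), bs=32)  # 128/32 = 4 batches per epoch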
Steps 2041A2 to 2041A3 are performed for the text samples in each training sample in each batch.
In step 2041A2, a difference between the predicted emotional intensity of the text sample for the emotional category and the actual emotional intensity of the corresponding emotional category is determined.
By way of example, p[i] represents the predicted emotional intensity of the emotion category to which the text sample corresponds, and s[i] represents the actual emotional intensity of that emotion category; the difference between them is p[i]−s[i].
In step 2041A3, a second ratio between the square of the difference and the number of training samples comprised by the batch is determined, the second ratio being taken as the loss of emotional intensity between the predicted emotional intensity and the actual emotional intensity of the text sample corresponding to the emotional category.
For example, p[i] represents the predicted emotional intensity of the emotion category corresponding to the text sample and s[i] represents the actual emotional intensity of that emotion category, so the square of their difference is (p[i]−s[i])². Each batch includes bs training samples, and for each training sample within a batch, the second ratio of (p[i]−s[i])² to bs is computed; this second ratio is taken as the emotional intensity loss between the predicted emotional intensity and the actual emotional intensity of the emotion category corresponding to the text sample.
With continued reference to fig. 3H, in step 2042A, the sum of the emotional intensity losses for each emotional category is determined as a second loss between the emotional intensity prediction result and the emotional intensity of the text sample.
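Steps 2041A to 2042A amount to a per-category mean squared error summed over categories; a minimal sketch (tensor names assumed):

    import torch

    def second_loss(pred_intensity, actual_intensity):
        """Sum over emotion categories of the emotional intensity loss.

        pred_intensity (p), actual_intensity (s): (bs, num_categories) tensors.
        """
        bs = pred_intensity.shape[0]
        per_category = ((pred_intensity - actual_intensity) ** 2).sum(dim=0) / bs  # (p[i]-s[i])^2 / bs
        return per_category.sum()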
In some embodiments, the emotion sub-category prediction results include a classification prediction result in which the text sample corresponds to a plurality of emotion sub-categories.
In some embodiments, referring to fig. 3J, fig. 3J is a tenth flowchart of a training method of a text emotion recognition model provided in an embodiment of the present application. Step 204 "determining third penalty between emotion sub-category prediction result and emotion sub-category of text sample" of fig. 3G may be implemented by performing steps 2041B-2042B of fig. 3J for each of a plurality of emotion sub-categories, as described in detail below.
In step 2041B, a sub-category two-classification penalty between the two-classification prediction result of the emotion sub-category and the actual classification result of the emotion sub-category of the text sample is determined.
In some embodiments, for each emotion sub-category of the text sample, a sub-category two-category loss between the two-category prediction result and the actual two-category result for that emotion sub-category needs to be calculated.
In some embodiments, the two-classification prediction result for an emotion sub-category of the text sample is the predicted probability of belonging to that emotion sub-category, a value between 0 and 1; the actual two-classification result of the emotion sub-category is the actual membership, which is 1 when the sample belongs to the emotion sub-category and 0 when it does not.
In some embodiments, referring to fig. 3K, fig. 3K is an eleventh flowchart of a training method of a text emotion recognition model provided in an embodiment of the present application. Step 2041B of fig. 3J may be implemented by steps 2041B1 to 2041B4 of fig. 3K, as described in detail below.
In step 2041B1, training samples in a training set are divided into a plurality of batches.
For example, if there are N training samples in the training set, every bs training samples form one batch, giving N/bs batches; completing all N/bs batches corresponds to completing one round (epoch) of iteration.
Steps 2041B2 to 2041B4 are performed for the text samples in each training sample in each batch.
In step 2041B2, a second logarithm of the categorized prediction result of the emotion sub-category is determined, and a second product of the second logarithm and the actual categorized result of the emotion sub-category is determined.
For example, if p[i] is the two-classification prediction result of the emotion sub-category, the second logarithm is log(p[i]); and if y[i] represents the actual two-classification result of the emotion sub-category, the second product may be expressed as y[i]*log(p[i]).
In step 2041B3, a third ratio between the second product and the number of training samples comprised by the batch is determined.
For example, if each batch includes bs training samples, the third ratio is the quotient of y[i]*log(p[i]) divided by bs.
In step 2041B4, the opposite number of the third ratio is determined as a sub-category two-classification penalty between the two-classification prediction result of the emotion sub-category and the actual two-classification result of the emotion sub-category of the text sample.
For example, if the third ratio is m, its negative −m is the sub-category two-classification loss between the two-classification prediction result of the emotion sub-category and the actual two-classification result of the emotion sub-category of the text sample.
With continued reference to fig. 3J, in step 2042B, a sum of the sub-category two classification loss for each emotion sub-category is determined as a third loss between the emotion sub-category prediction result and the emotion sub-category of the text sample.
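Steps 2041B to 2042B can be sketched as follows (a minimal illustration following the formulas described above; p is the predicted probability, y the actual 0/1 result):

    import torch

    def third_loss(pred_prob, actual):
        """Sum over emotion sub-categories of the sub-category two-classification loss.

        pred_prob (p), actual (y): (bs, num_subcategories) tensors.
        """
        bs = pred_prob.shape[0]
        # -(y[i] * log(p[i])) / bs accumulated over the batch, one loss per sub-category
        per_subcategory = -(actual * torch.log(pred_prob.clamp(min=1e-12))).sum(dim=0) / bs
        return per_subcategory.sum()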
For example, referring to fig. 6, fig. 6 is a schematic structural diagram of a second training task provided in an embodiment of the present application. In fig. 6, the training samples are vectorized to obtain text vectors, the text vectors include emotion sub-category vectors in addition to the text sample vectors, emotion category vectors and emotion intensity vectors, the text vectors are input into a language understanding model, a part of the text vectors are used for executing a first training task, namely, the language understanding model is called for coding processing based on the text sample vectors and the emotion category vectors to obtain first fusion vectors, and then a first classifier is called for classifying processing based on the first fusion vectors to obtain emotion category prediction results; and the other part of the text vector is used for executing a second training task, namely, the encoder model is called for encoding processing based on the text sample vector, the emotion intensity vector and the emotion sub-category vector to obtain a third fusion vector, the second classifier is called for classifying processing based on the third fusion vector to obtain an emotion intensity prediction result, and the third classifier is called for classifying processing based on the third fusion vector to obtain an emotion sub-category prediction result.
With continued reference to fig. 3G, in step 205, the first, second, and third losses are weighted and summed to obtain a total loss.
For example, if L_tag represents the first loss, L_reg represents the second loss, and L_class represents the third loss, the total loss L_total can be expressed as L_total = w_1*L_tag + w_2*L_reg + w_3*L_class, where w_1 is the weight of the first loss, w_2 is the weight of the second loss, and w_3 is the weight of the third loss.
In step 206, parameters of the encoder model, parameters of the second classifier, and parameters of the third classifier are updated based on the total loss.
In some embodiments, the total loss is back-propagated through the network to calculate the gradients of the parameters of the encoder model, the parameters of the second classifier and the parameters of the third classifier respectively, and each parameter is updated according to its gradient, wherein the gradients may be calculated using various gradient descent algorithms and the parameters updated based on the gradients.
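As an illustrative sketch of steps 205 to 206, assuming example weights and learning rate (hypothetical names; an actual training run would create the optimizer once):

    import itertools
    import torch
    from torch import nn

    def update_from_total_loss(l_tag, l_reg, l_class,
                               encoder: nn.Module, clf2: nn.Module, clf3: nn.Module,
                               w1=0.2, w2=0.3, w3=0.5, lr=0.005):
        """L_total = w1*L_tag + w2*L_reg + w3*L_class, then one SGD step on all three modules."""
        total = w1 * l_tag + w2 * l_reg + w3 * l_class
        params = itertools.chain(encoder.parameters(), clf2.parameters(), clf3.parameters())
        optimizer = torch.optim.SGD(params, lr=lr)
        optimizer.zero_grad()
        total.backward()   # gradients of the encoder, second classifier and third classifier
        optimizer.step()
        return float(total.detach())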
In the embodiments of the present application, the losses between the predicted and actual values of the emotion category, the emotion intensity and the emotion sub-category are calculated separately, given different weights, and combined into a total loss, which is used to update the parameters of the model; the model is thus optimized from multiple dimensions, improving the accuracy of emotion recognition by the text emotion recognition model.
Referring to fig. 4, fig. 4 is a flowchart of a text emotion recognition model-based emotion recognition method according to an embodiment of the present application, which may be executed by an electronic device, where the electronic device may be the server 200-2 or the terminal 400 described above, and will be described with reference to the steps shown in fig. 4.
In some embodiments, the text emotion recognition model is trained by the training method of the text emotion recognition model described in the above embodiments.
In step 401, text to be recognized is acquired.
In some embodiments, the text to be identified may contain one or more emotions.
In step 402, a text vector of text to be recognized is extracted.
In some embodiments, each word in the text to be recognized can be represented as a fixed-length vector based on a word embedding model, the vector representation of each word being learned by training a neural network; the vectors of all words are concatenated to obtain the text vector of the text to be recognized.
In step 403, a text emotion recognition model is called based on the text vector of the text to be recognized to obtain an emotion tag of the text to be recognized, wherein the emotion tag of the text to be recognized includes an emotion category and an emotion intensity to which the text to be recognized belongs.
In some embodiments, the emotion tag of the text to be recognized further comprises: emotion subcategories of the text to be identified under the category of emotion to which it belongs.
For example, referring to fig. 6, the emotion classification of the text to be recognized may be obtained through the language understanding model and the first classifier, the emotion intensity of the text to be recognized may be obtained through the encoder and the second classifier, and the emotion sub-classification of the text to be recognized under the emotion classification may be obtained through the encoder and the third classifier.
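A sketch of inference with the trained model (the model and tokenizer interfaces are assumptions for illustration only; the 0.5 threshold follows the usual two-classification convention):

    import torch

    def recognize(model, tokenizer, text):
        """Return the emotion tag of one text: categories present, intensities, sub-category."""
        with torch.no_grad():
            vec = tokenizer(text)                              # text vector of the text to be recognized
            category_probs, intensities, sub_logits = model(vec)
        return {
            "categories": (category_probs > 0.5).nonzero().flatten().tolist(),
            "intensities": intensities.tolist(),               # one concentration score per category
            "sub_category": int(sub_logits.argmax()),          # higher-level emotion index
        }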
In the following, an exemplary application of the embodiments of the present application in a practical application scenario will be described.
In the scenario of emotion recognition for film and television scripts, it is necessary to know the emotions of the male and female leads' lines and the emotion concentration (i.e., emotion intensity) of each line. The emotion concentration of the leads' lines can assist scenario understanding; for example, when a scene has the highest concentration of the lead's sad emotion, a conflict or setback event has occurred, so the scene is an important one that needs to be understood, or it can be used to generate highlight clips such as drama previews.
The emotion recognition task for script sentences needs to perform emotion classification on the dialogue sentences in a script, but emotions of the same type can sometimes be confused. For example, joy and happiness both express being happy, yet one is a persistent mood or emotion while the other is a short-lived mood or emotion, so the two are easily confused, and when both appear in the emotion classification at the same time, emotion interleaving is even more likely. Likewise, trust and firmness both contain confidence but point in different directions, and may also be confused.
Moreover, the same text may contain multiple emotions, and the ability to discriminate emotion categories, which a language model does not possess by itself, needs to be learned by exploiting the strengths of a language understanding model. Based on this, for the problem of recognizing the concentration of script dialogues with easily confused emotions, the embodiment of the present application designs a training method for a text emotion recognition model: an emotion classification system based on hierarchical labels is built; hierarchical multi-label ranking learning is performed on this label system, with different labels split for layered learning; and intra-emotion concentration ranking learning is then performed at the level of a specific emotion. In addition, to improve the degree of distinction between different emotions, cross-level label contrastive learning is introduced, improving the label classification effect and solving the problems of recognizing and applying mixed emotion concentrations.
Referring to fig. 7, fig. 7 is a training framework diagram of the training method of a text emotion recognition model provided in an embodiment of the present application. First, a training set is extracted from script samples; text vectors corresponding to the training samples in the training set are obtained through mapping; and the text vectors are input into the basic module. The whole training process is divided into two stages: in the first stage, the basic module only needs to identify emotion categories, obtaining the basic emotion multi-labels (i.e., the emotion categories of the text sample) without grading capability; in the second stage, the newly added concentration module calculates the concentration of the basic emotions, obtaining the basic emotion concentrations (i.e., the emotion intensity of the text sample) and the higher-level emotion multi-label (i.e., the emotion sub-category). In the process of obtaining the basic emotion multi-label from the basic module, the input does not need to contain the higher-level emotion.
Referring to fig. 8, fig. 8 is an application schematic diagram of the training method of a text emotion recognition model provided in an embodiment of the present application. In fig. 8, the concentration of each scene is identified to obtain the emotion concentration result of the whole script. The text emotion recognition model obtained through the training method provided by the embodiment of the present application recognizes the emotion concentration of the dialogue in each scene and aggregates the total emotion concentration of each scene. Since a series has multiple episodes and each episode is advanced by multiple scenes of dialogue, the basic concentrations of the scenes are summarized into the concentration of the whole episode.
For a sentence from a script dialogue, a basic emotion is generated through the basic module (i.e., the language understanding model), and a higher-level emotion and the emotion concentration are generated through the concentration module (i.e., the encoder). The overall scheme of the training method of the text emotion recognition model provided by the embodiment of the present application includes: 1) data collection, including higher-level emotion generation; 2) the overall model structure; 3) the overall training method.
As shown in table 2, each emotion category has a plurality of emotion concentrations, and the emotions are divided into positive and negative: the concentration score of a positive emotion such as "happy" is positive, while that of negative emotions such as "complaint" and "anger" is negative.
Label | Level 1 | Level 2 | Level 3 | No emotion
Happy | 1 | 2 | 3 | 0
Surprise | 1 | 2 | 3 | 0
Complaint | 1 | 2 | 3 | 0
Anger | 1 | 2 | 3 | 0

Table 2
As shown in table 3, the concentrations of the basic emotions correspond one-to-one to the higher-level emotions, and the higher-level emotions serve to optimize the concentration prediction of the basic emotions. Learning the basic emotion concentration and learning the higher-level emotion are two equivalent tasks: when the basic emotion concentration is known, the higher-level emotion can be obtained; conversely, learning the higher-level emotion well also helps predict the basic emotion concentration.
Label | Level 1 | Level 2 | Level 3 | No emotion
Happy | Pleasant | Delighted | Ecstatic | 0
Surprise | Puzzled | Surprised | Shocked | 0
Complaint | Dissatisfied | Complaining | Resentful | 0
Anger | Depressed | Angry | Furious | 0

Table 3
When the positive and negative emotion concentrations of a text are the same, the sentence needs to be judged after data cleaning: before data input, a language model (GPT, Generative Pre-trained Transformer) is used to judge what the higher-level emotion of the text is, and multiple higher-level emotions may be output.
For data collection, some script sentences are collected in advance as training samples and labeled; for example, "Is it travel to or leave for the school? Isn't your trunk display too exaggerated?" is labeled as surprise 1 according to table 2. Confusing emotions easily appear in emotion concentration, as in sentence 1: "The math homework has not been done yet, and you are in the middle of learning this new language", and sentence 2: "You only talked about Spanish yesterday, and you are already learning this language". In terms of emotion concentration, sentence 1 expresses anger 3 + surprise 3 and sentence 2 expresses happy 3 + surprise 2. The basic emotions and emotion concentrations are mapped to higher-level emotions according to table 3; for the above two sentences, from the higher-level emotion perspective, sentence 1 expresses anger and sentence 2 expresses surprise, and both anger and surprise can be mapped back to a basic emotion and concentration. After the mapping is generated, the corresponding higher-level emotion needs to be found for each sentence in the training samples according to its basic emotion and concentration. To speed up labeling, ChatGPT can be introduced to generate the higher-level emotion with the following prompt: "The following is a line a character speaks in a film or television scenario (i.e., 'He only spoke Spanish yesterday, and he is naturally learning this language'). Please determine what emotion the speaker is expressing in this sentence, and select from the following 13 emotions (%s)", where %s is the list of higher-level emotions in table 3 plus "no emotion". After the higher-level emotion judgment is obtained from the ChatGPT output, the results are checked and wrong emotions are corrected.
For the overall model structure, the model is divided into five modules: the basic emotion module, the concentration module, the basic emotion multi-label module, the higher-level emotion classification module and the basic emotion multi-concentration prediction module. The basic emotion module adopts a Chinese pre-training model (Chinese-BERT-wwm). BERT is an open-source language model, and based on BERT a text-learning task paradigm has formed of pre-training on large-scale language data and then fine-tuning on the target task. Referring to fig. 9, fig. 9 is a general fine-tuning paradigm diagram of BERT provided in the embodiment of the present application, where CLS is the sentence class, TOK 1~N represent the description of the question (e.g., "please give the emotion of the speaker according to the following sentence"), E is the word embedding (Embedding) obtained from the dictionary mapping, SEP is used to separate the task description from the specific question, TOK 1~M are the sentence for which the question is to be answered, and the output C is the target class.
Referring to fig. 10, fig. 10 is a schematic diagram of BERT provided in an embodiment of the present application: an embedding layer maps the text input to embeddings E (E_1~E_N); the model core is composed of multiple Transformer encoder layers (Transformer Encoder Layer), each of which consists of several Transformer blocks (Transformer Block, also called Trm); and an output layer T (T_1~T_N) provides the outputs for the target-task classification. As an example, fig. 10 shows 2 encoder layers; the actual BERT(BASE) has L=12 layers, hidden-layer dimension H=768, A=12 heads of multi-head self-attention, and about 110 million parameters in total.
Referring to fig. 11, fig. 11 is a schematic structural diagram of the Transformer module provided in the embodiment of the present application, where module 701 is the encoder and module 702 is the decoder of the Transformer module. A general Transformer module may be any combination of encoder and decoder; since the training method of the text emotion recognition model provided in the embodiment of the present application only uses the encoder, the Transformer module here refers only to the encoder. The multi-head attention mechanism (Multi-head Attention) is the self-attention module, the feedforward neural network (FeedForward) is the intermediate-layer module, and Add & Norm denotes residual connection and layer normalization. The Multi-head Attention module is integrated into BERT as a general tool and can be invoked with a single statement, e.g., via the PyTorch library: nn.MultiheadAttention(embed_dim=config.hidden_size, num_heads=config.num_attention_heads, dropout=config.attention_probs_dropout_prob). The FeedForward module consists of a fully connected layer plus an activation layer (e.g., tanh activation). The Add & Norm layer sums a sub-layer's output (e.g., of the FeedForward or Multi-head Attention layer) with that sub-layer's input and applies layer normalization to the sum, e.g., hidden_states = self.LayerNorm(hidden_states + input_tensor).
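A minimal PyTorch sketch of the two sub-layers just described (an illustration under common BERT dimensions, not the patent's exact code):

    import torch
    from torch import nn

    class FeedForward(nn.Module):
        """Fully connected layer + activation layer, the intermediate-layer module."""
        def __init__(self, hidden=768, inner=3072):
            super().__init__()
            self.dense_in = nn.Linear(hidden, inner)
            self.act = nn.Tanh()                     # e.g. tanh activation, as above
            self.dense_out = nn.Linear(inner, hidden)

        def forward(self, x):
            return self.dense_out(self.act(self.dense_in(x)))

    class AddNorm(nn.Module):
        """Residual connection followed by layer normalization."""
        def __init__(self, hidden=768):
            super().__init__()
            self.LayerNorm = nn.LayerNorm(hidden)

        def forward(self, sublayer_out, residual):
            return self.LayerNorm(sublayer_out + residual)  # hidden_states = LayerNorm(out + input)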
For example, referring to fig. 12, fig. 12 is a schematic structural diagram of the multi-head attention mechanism provided in an embodiment of the present application. A self-attention model can be regarded as establishing interaction relationships between different forms of input vectors in a linear projection space; multi-head attention builds different projection information in several different projection spaces: the input matrices are projected in different ways to obtain several output matrices, which are concatenated together. It can be seen from fig. 12 that the values, keys and queries are each a single input, while there are 3 linear layers at the bottom and 3 scaled dot-product attention blocks, i.e., 3 heads; finally all single-head outputs are concatenated together, and the top linear layer converts them into an output value shaped like a single head's, similar to ensembling. The difference between multi-head and single-head attention is that multiple single heads are replicated, but their weight coefficients are certainly not identical; this resembles training one neural network model versus several, whose weights differ because of different initialization, with the results finally integrated.
Referring to fig. 13, fig. 13 is a schematic diagram of a general word Embedding principle of the BERT model provided in the embodiment of the present application, where CLS is a category tag, and input includes a category tag and a text sample, and word Embedding (Embedding) includes symbol Embedding (Token Embedding), segment Embedding (Segmentation Embedding), and position Embedding (Position Embedding).
Referring to fig. 14, fig. 14 is a schematic diagram of the word embedding principle for emotion recognition using the text emotion recognition model according to an embodiment of the present application. The original BERT is an English model, while a Chinese model is adopted here. When emotion recognition is performed with the text emotion recognition model, the segment embedding is set to zero, and only one Chinese dictionary is needed: the input Chinese characters are converted into embeddings through a one-to-one embedding mapping table. The plus sign indicates summation at the corresponding positions of the vectors; the embedding mapping structure of this part adopts an open-source pre-training result and is not trained or updated in the model.
By way of example, the process of mapping an input sentence to embeddings is: first, each word is mapped to its corresponding dictionary id (called Token id) using the dictionary vocab.txt; second, to normalize model inputs (so that sentences of different lengths can be handled), token=0 is used as padding up to a specified number of tokens (e.g., 77); then, the dictionary ids are mapped to word embeddings. The pre-trained Chinese BERT comes with a corresponding Chinese dictionary and can be used directly.
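A sketch of this mapping (the vocabulary loading shown in the comment is an assumption about the vocab.txt format, one token per line):

    def text_to_token_ids(text, vocab, max_len=77):
        """Map each character to its dictionary id (Token id) and pad with token=0 to a fixed length."""
        ids = [vocab.get(ch, vocab.get("[UNK]", 0)) for ch in text[:max_len]]
        ids += [0] * (max_len - len(ids))  # pad so sentences of different lengths share one shape
        return ids

    # vocab = {tok: i for i, tok in enumerate(open("vocab.txt", encoding="utf-8").read().splitlines())}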
Referring to fig. 15, fig. 15 is a schematic diagram of the output of the text emotion recognition model provided in an embodiment of the present application, where the output includes the text sample and Classes, and Classes represent the emotion tags. Referring to fig. 16, fig. 16 is a schematic structural diagram of an emotion tag provided in an embodiment of the present application, where Classes is represented by 9 category tasks (9 output values): cls1~cls4 represent the basic emotion multi-label classification tasks, i.e., a two-classification for each of the four emotions (happy, surprise, complaint, anger), indicating whether that emotion is present; score1~score4 represent the score (i.e., emotion intensity) of each of the 4 emotions; and cls5 represents the class of higher-level emotion (i.e., emotion sub-category), 4×3+1=13 higher-level emotions in total.
The emotion concentration module is a module newly added on top of the conventional BERT model and consists of 2 Transformer encoder layers. The BERT model is used to generate the basic emotion multi-label classification result, so the basic emotion information can be considered as already captured in the BERT base model (including the information required for a higher-level emotion activated by several labels simultaneously; for example, for a "pleasant surprise" emotion the basic emotion output is 1100, where the four labels respectively indicate the presence of happy, surprise, complaint and anger, and "pleasant surprise" includes both surprise and happiness). Therefore, the emotion concentration module, placed after the base module, further learns the higher-level emotion and the basic emotion concentration from the basic emotion information. The module has three outputs: cls1~4 (value 0 or 1) are the 4 basic-emotion two-classification predictions (probabilities over 2 categories); score1~4 are the 4 basic emotion concentrations (i.e., a floating-point number between 0 and 1, divided into 0 (no emotion), 0~0.33 (level 1), 0.33~0.66 (level 2) and 0.66~1 (level 3), respectively representing no emotion and level-1, level-2 and level-3 emotion); and cls5 is the 13-class higher-level emotion (i.e., output probabilities over 13 classes). The three outputs require 9 classification heads, each of a fully connected layer structure.
For example, referring to fig. 17, fig. 17 is a schematic structural diagram of a classification head according to an embodiment of the present application. As shown in fig. 17, T1 is one of T1~TN in the uppermost layer of fig. 10: cls1~4 use the T1 of the last layer of the basic module, while score1~4 and cls5 use the T1 of the last layer of the emotion concentration module. Module 703 is a classification head, which includes the basic classification prediction heads for predicting the basic emotion multi-label output in fig. 7, the basic concentration prediction heads for predicting the basic emotion concentration output in fig. 7, and the higher-level classification prediction head for predicting the higher-level emotion multi-label output in fig. 7.
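A sketch of the 9 classification heads as fully connected layers (the use of softmax/sigmoid to produce probabilities and 0~1 scores is an assumption consistent with the outputs described above):

    import torch
    from torch import nn

    class EmotionHeads(nn.Module):
        """4 basic-emotion two-class heads, 4 concentration heads, 1 higher-level emotion head."""
        def __init__(self, hidden=768, n_basic=4, n_higher=13):
            super().__init__()
            self.cls = nn.ModuleList(nn.Linear(hidden, 2) for _ in range(n_basic))    # cls1~cls4
            self.score = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_basic))  # score1~score4
            self.cls5 = nn.Linear(hidden, n_higher)                                   # 13 higher emotions

        def forward(self, t1_base, t1_conc):
            cls_out = [torch.softmax(h(t1_base), dim=-1) for h in self.cls]   # 2-class probabilities
            scores = [torch.sigmoid(h(t1_conc)) for h in self.score]          # floats in [0, 1]
            higher = torch.softmax(self.cls5(t1_conc), dim=-1)                # 13-class probabilities
            return cls_out, scores, higher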
For the overall training method: first, the parameters are initialized; the base module adopts the Chinese-BERT-wwm pre-trained model, while the new module (i.e., the concentration module) and the classification heads are initialized with a 0-1 Gaussian normal distribution. Next, the learning parameters are set: the basic classification is trained with the stochastic optimization method Adam at a learning rate lr=0.0005; the concentration module and the classification heads are trained with stochastic gradient descent (SGD, Stochastic Gradient Descent) at a learning rate of 0.005, decayed to 0.1 times the original every 10 epochs, where one epoch equals one pass of training over all samples in the training set — when the complete data set passes through the neural network once, forward propagation and backward propagation each performed once, the process is called one epoch. Then, training is performed on all modules and classification heads, with all parameters of the base module and the basic classification prediction heads fine-tuned. Because the whole learning task is transferred from the original Chinese-BERT-wwm model to scenario emotion classification, slow training is required so that the model does not fall too quickly into a local rather than global optimum, hence the low-learning-rate Adam method. Finally, when the base module converges (no decrease in average loss over n epochs of training indicates convergence), the model parameters are already well suited for the emotion concentration task; at this point the last 2 layers of the base module are trained (the parameters of the first 10 of the 12 BERT layers are no longer updated), together with the newly added concentration module and the three prediction heads. To prevent the concentration details of emotion from destroying the information required by the already-learned scenario emotion classification, the trained parameters of the base model must not be disturbed too much, so the network parameters of the earlier layers are kept unchanged.
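A sketch of this staged optimization setup (assuming a HuggingFace-style BertModel exposing encoder.layer; all names are placeholders):

    import torch

    def build_optimizers(base_module, concentration_module, heads):
        """Stage one: Adam at lr=0.0005; stage two: SGD at lr=0.005, decayed x0.1 every 10 epochs."""
        adam = torch.optim.Adam(base_module.parameters(), lr=0.0005)
        sgd = torch.optim.SGD(
            list(concentration_module.parameters()) + list(heads.parameters()), lr=0.005)
        scheduler = torch.optim.lr_scheduler.StepLR(sgd, step_size=10, gamma=0.1)
        return adam, sgd, scheduler

    def freeze_early_layers(bert, n_trainable=2):
        """Stage two: keep the first 10 of 12 BERT layers fixed, train only the last 2."""
        for layer in bert.encoder.layer[:-n_trainable]:
            for p in layer.parameters():
                p.requires_grad = False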
The specific parameter fine-tuning process is: for N pieces of full data, every bs pieces form a batch, giving N/bs batches; completing N/bs batches represents completing one round (epoch) of iteration, and the full set of samples is processed once per iteration until the average epoch loss no longer drops. The whole fine-tuning process is divided into three parts. Model forward: each batch of training data is input into the corresponding model modules according to the process of fig. 7 to obtain the outputs of 3 tasks, namely the basic emotion multi-label output, the basic emotion concentration output and the higher-level emotion multi-label output (in stage two; stage one obtains the output of 1 task, the basic emotion multi-label output), and the corresponding total loss is then calculated. Model backward: the total loss is back-propagated through the network to compute the gradient of each network parameter. Model parameter update: each parameter is updated according to its gradient.
By way of example, the total losses of the two stages are, respectively: as shown in formula (1), stage one computes the loss of all 4 basic-emotion two-class recognitions, each basic emotion using a multi-label loss L_tag; as shown in formula (2), stage two is a weighted sum of three losses, where w_1 (the 4 basic-emotion two-classification predictions) can be set to 0.2, w_3 (the concentrations of the 4 basic emotions) to 0.3, and w_2 to 0.5 (the 13-class higher-level emotion recognition converges relatively slowly).

L_total1 = L_tag1 + L_tag2 + L_tag3 + L_tag4   (1)

L_total2 = w_1*(L_tag1 + L_tag2 + L_tag3 + L_tag4) + w_2*L_class5 + w_3*(L_reg1 + L_reg2 + L_reg3 + L_reg4)   (2)
Multi-label loss L_tag: for the basic emotion multi-label classification, the multi-label loss (bce loss) between the probability vector output by the classification layer and the multi-label annotation (or the corrected multi-label supervision information) is calculated, as shown in formula (3). Formula (3) is the batch multi-label loss formed by the average of the bce loss of each sample under a batch of data: for a sample i, the true-value label vector t[i] is a 1×nclass 0/1 vector over the 4 basic emotion categories, expressed as [x1, x2, x3, x4] where x1~x4 take the value 0 or 1, and the predicted value o[i] is the predicted probability for each of the 1×nclass labels of sample i. The bce loss is calculated according to the following formula:

L_tag = -(1/b) * Σ_i Σ_j ( t[i][j]*log(o[i][j]) + (1 - t[i][j])*log(1 - o[i][j]) )   (3)

where b represents the number of samples per batch.

When the true value of a label bit is 1, the left term inside the parentheses takes effect, namely:

-log(o[i][j])   (4)

and when the true value is 0, the right term takes effect, namely:

-log(1 - o[i][j])   (5)

so that the supervision information of a sample under all labels can be learned.
Multi-class loss L_class: the multi-class cross-entropy loss, where y denotes the sample label (1 at the emotion the sample carries, 0 for negative samples; negative classes without the emotion do not contribute to the loss) and p is the predicted probability of the positive label, as shown in formula (6):

L_class = -(1/b) * Σ_i Σ_c y[i][c]*log(p[i][c])   (6)
Prediction loss L_reg: the MSE loss for score prediction, calculating the difference between the predicted and annotated scores, as shown in formula (7):

L_reg = (1/b) * Σ_i (p[i] - s[i])^2   (7)
Referring to fig. 18, fig. 18 is a schematic diagram of an application of text emotion recognition provided in an embodiment of the present application. In application, a plurality of scenes are obtained from the whole script, and the dialogue part of each scene is extracted; in fig. 18, 5 dialogue segments are obtained in total. Each dialogue segment is input into the text emotion recognition model to obtain the final emotion concentration score output, and the score of that emotion is taken as the dialogue's emotion concentration; the 5 scores are averaged to obtain the average emotion concentration of the scene, and all scenes are accumulated to obtain the total and average emotion concentration of the whole series. The emotion concentrations of the 5 dialogue segments are, respectively: none (no target emotion, highest score 0); surprise 0.30 (level-1 surprise); anger 0.93 (level-3 anger); anger 0.72 (level-2 anger); and anger 0.21 (level-1 anger). The score average is then calculated as the average emotion concentration of the scene.
Description of the emotion trend within a scene: for a sentence, when there is only 1 emotion score (e.g., anger 0.26), the emotion direction level of that emotion is selected (here negative level 1); when there are two emotions of the same polarity (e.g., both positive), the sum of the two emotion scores is calculated (e.g., 0.1 + 0.2 = 0.3) and mapped to an emotion direction level (here positive level 1); when there are multiple emotions of different polarities (+x1, +x2, −x3, e.g., 0.23, 0.34, −0.3), the sum of all positive and negative emotions is calculated (e.g., x1 + x2 − x3 = 0.27) and the emotion level corresponding to the sum is found (positive level 1); if the sum is negative, the final result is a negative emotion.
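A sketch of this mapping from emotion scores to a signed direction level (the level boundaries follow the 0/0.33/0.66/1 division above; names are hypothetical):

    def direction_level(scores):
        """Map a sentence's signed emotion scores to an emotion direction level.

        scores: list of floats, positive for positive emotions, negative for negative ones.
        """
        total = sum(scores)                     # sum of all positive and negative concentrations
        mag = abs(total)
        level = 0 if mag == 0 else 1 if mag <= 0.33 else 2 if mag <= 0.66 else 3
        return level if total >= 0 else -level

    # e.g. direction_level([0.23, 0.34, -0.3]) -> 1, i.e. positive level 1 (sum 0.27)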
The emotion trend of a scene can be obtained in the above manner. For example, the 5 consecutive sentences of a script in fig. 18 each carry a single emotion (happy 0.23, surprise 0.56, anger 0.90, anger 0.51 and anger 0.43), giving happy 1, surprise 2, anger 3, anger 2 and anger 1; table 4 shows the emotion classification and concentration calculation.
Sentence | Emotion | Concentration score | Level
1 | Happy | 0.23 | 1
2 | Surprise | 0.56 | 2
3 | Anger | 0.90 | 3
4 | Anger | 0.51 | 2
5 | Anger | 0.43 | 1

Table 4
Referring to fig. 19, fig. 19 is an emotion trend graph for a single session provided in an embodiment of the present application. In fig. 19, the horizontal axis is the order of dialogue, the vertical axis is emotion direction level, and the broken line represents the trend of emotion direction level in 5 sentences.
Referring to fig. 20, fig. 20 is an emotion trend chart of a single episode of a TV series provided in the embodiment of the present application, where the horizontal axis is the order of the scenes included in the single episode, the vertical axis is the emotion direction level, and the broken line represents the trend of the emotion direction level over 10 scenes. The average of the sentence emotion scores within a scene in fig. 18 is taken as the scene emotion score; for example, the scene score in fig. 19 is −0.434, i.e., negative level 1. If the episode has 10 scenes, the protagonist initially encounters setbacks or difficulties, the mood turns happy when the difficulties are solved, and setbacks are encountered again (foreshadowing the next episode) until the episode ends; from this one can check whether the episode's turning points are sufficient and properly arranged (e.g., from scene 4 to 6 the trend starts to fall straight into the negative range and stays frustrated from scene 7 to 10; if pleasant viewing is desired, a denser alternation of positive and negative moods across scenes could be considered).
Referring to fig. 21, fig. 21 is an emotion trend chart of an entire tv show provided in the embodiment of the present application, where the horizontal axis is the order of episodes included in the tv show, the vertical axis is emotion direction level, and the broken line represents the trend of emotion direction level in each episode of tv show. And similarly, the emotion scores of all sentences of a certain set are averaged to obtain the emotion score of the set, and the emotion score of all the sentences of a certain television play is plotted to obtain the emotion trend of the television play.
Through the above emotion concentration calculation, on the one hand, the scenes with the strongest positive emotion intensity in a script can be selected as high-sweetness, high-excitement scenes, and the corresponding TV-series clips used as a quick preview for an episode, improving the video recommendation click-through rate. On the other hand, the emotion concentration across consecutive scenes of a creator's own work can be displayed so as to check whether the plot arrangement is reasonable (e.g., whether the plot lacks rises and falls, such as a consistently low concentration).
The embodiments of the present application convert the abstract concept of emotion intensity into easily understood emotion sub-categories to describe emotion, jointly model emotion intensity and emotion sub-category, and avoid emotion confusion among subtle emotional differences. Tasks of different difficulty are processed hierarchically, allowing the text emotion recognition model to advance step by step from simple tasks to complex ones, improving the accuracy of the model's judgments. When optimizing the model and collecting emotion-intensity annotation data, a language model replaces manual labeling, greatly improving data annotation efficiency, further optimizing the model, and improving the semantic interpretability of the output results.
Continuing with the description below of an exemplary structure implemented as a software module of the training apparatus 233 of the text emotion recognition model provided in an embodiment of the present application, in some embodiments, as shown in fig. 2A, the software module stored in the training apparatus 233 of the text emotion recognition model of the memory 230 may include:
a first obtaining module 2331, configured to obtain a first training set, where training samples in the first training set include a text sample and an emotion tag of the text sample, and the emotion tag includes an emotion category and an emotion intensity of the text sample; the emotion tags of at least one training sample in the first training set comprise a plurality of emotion categories.
The data mapping module 2332 is configured to perform vectorization processing on each training sample to obtain a text vector of each training sample, where the text vector of the training sample includes a text sample vector of the text sample, an emotion category vector of the text sample, and an emotion intensity vector of the text sample.
The first training module 2333 is configured to perform a first training task of the text emotion recognition model based on the text sample vector and the emotion category vector of the text sample, where the first training task is used to train the text emotion recognition model to recognize an emotion category to which the text sample belongs.
And a second training module 2334, configured to perform a second training task of the text emotion recognition model based on the text sample vector and the emotion intensity vector of the text sample, where the second training task is configured to train the text emotion recognition model to recognize the emotion intensity of the text sample, and the trained text emotion recognition model is configured to recognize the emotion type and emotion intensity of the text to be processed.
In some embodiments, the first obtaining module 2331 is further configured to obtain a pre-collected second training set, wherein the second training set includes a text sample and the emotion category and emotion intensity of the text sample; acquire the emotion sub-category of the text sample under the emotion category; form an emotion tag of the text sample from the emotion category of the text sample, the emotion intensity of the text sample and the emotion sub-category of the text sample under the emotion category; and compose training samples based on the text samples and the emotion tags, and compose the first training set based on the plurality of training samples.
In some embodiments, for each training sample, the data mapping module 2332 is further configured to perform word segmentation on the training sample to obtain a plurality of words; replacing emotion labels in the training samples with mask marks; taking each word in the plurality of words as a conventional mark, and sequentially connecting the mask mark and the conventional marks corresponding to the plurality of words according to the sequence in the training sample to obtain a mark sequence of the training sample, wherein the head of the mark sequence is inserted with a start mark; word embedding processing is carried out on the mark sequence to obtain a word embedding vector sequence and a position embedding vector sequence which correspond to the training samples respectively; and carrying out fusion processing based on the word embedded vector sequence and the position embedded vector sequence to obtain a text vector of the training sample.
In some embodiments, the first training module 2333 is further configured to invoke a language understanding model to perform encoding processing based on the text sample vector and the emotion class vector of the text sample, so as to obtain a first fusion vector; calling a first classifier based on the first fusion vector to perform classification processing to obtain an emotion category prediction result; determining a first loss between the emotion category prediction result and the emotion category of the text sample, and updating parameters of the language understanding model and parameters of the first classifier according to the first loss; the first training task is stopped in response to the language understanding model converging and the first classifier converging.
In some embodiments, for each of the plurality of emotional categories, the first training module 2333 is further to determine a classification penalty between a classification prediction result of the emotional category of the text sample and an actual classification result of the emotional category; a sum of the classification penalty for each emotion category is determined as a first penalty between the predicted emotion category and the emotion category of the text sample.
In some embodiments, the first training module 2333 is further configured to divide the training samples in the training set into a plurality of batches; for the text samples in each training sample in each batch, the following is performed: determining a first logarithm of a classification prediction result of the emotion classification of the text sample; determining a first product of the first logarithm and an actual classification result of the text sample; a first ratio between the first product and the number of training samples comprised by the batch is determined, the first ratio being taken as a categorization penalty between a categorization prediction result and an actual categorization result of the emotion category of the text sample.
In some embodiments, the second training module 2334 is further configured to perform a second training task of the text emotion recognition model based on the text sample vector, the emotion intensity vector, and the emotion sub-category vector of the text sample, wherein the second training task is further configured to identify an emotion sub-category of the text sample under the emotion category when identifying the emotion category of the text sample.
In some embodiments, the second training module 2334 is further configured to invoke the encoder model to perform encoding processing based on the text sample vector, the emotion intensity vector, and the emotion sub-category vector of the text sample, to obtain a third fusion vector; calling a second classifier to classify based on the third fusion vector to obtain an emotion intensity prediction result; calling a third classifier to classify based on the third fusion vector to obtain an emotion sub-category prediction result; determining a second penalty between the emotional intensity prediction result and the emotional intensity of the text sample, and determining a third penalty between the emotional sub-category prediction result and the emotional sub-category of the text sample; the first loss, the second loss and the third loss are weighted and summed to obtain total loss; the parameters of the encoder model, the parameters of the second classifier, and the parameters of the third classifier are updated based on the total loss.
In some embodiments, the second training module 2334 is further configured to determine, for each of the plurality of emotional categories, a loss of emotional intensity between the predicted emotional intensity of the text sample for the corresponding emotional category and the actual emotional intensity of the corresponding emotional category; a sum of the emotional intensity losses for each emotional category is determined as a second loss between the emotional intensity prediction result and the emotional intensity of the text sample.
In some embodiments, the second training module 2334 is further configured to divide the training samples in the training set into a plurality of batches; for the text samples in each training sample in each batch, the following is performed: determining a difference value between the predicted emotion intensity of the text sample corresponding to the emotion type and the actual emotion intensity of the corresponding emotion type; a second ratio between the square of the difference and the number of training samples comprised by the batch is determined, the second ratio being taken as the loss of emotional intensity between the predicted emotional intensity and the actual emotional intensity of the text sample corresponding to the emotional category.
In some embodiments, the second training module 2334 is further configured to determine, for each of the plurality of emotion sub-categories, a sub-category two-classification penalty between the two-classification prediction result for the emotion sub-category and the actual classification result for the emotion sub-category of the text sample; a sum of the sub-category two classification loss for each emotion sub-category is determined as a third loss between the emotion sub-category prediction result and the emotion sub-category of the text sample.
In some embodiments, the second training module 2334 is further configured to divide the training samples in the training set into a plurality of batches; for the text samples in each training sample in each batch, the following is performed: determining a second logarithm of the classification prediction result of the emotion sub-category, and determining a second product of the second logarithm and the actual classification result of the emotion sub-category; determining a third ratio between the second product and the number of training samples comprised by the batch; the opposite number of the third ratio is determined as a sub-category two-classification penalty between the two-classification prediction result of the emotion sub-category and the actual two-classification result of the emotion sub-category of the text sample.
In some embodiments, the second training module 2334 is further configured to invoke the encoder model to perform encoding based on the text sample vector and the emotion intensity vector of the text sample, obtaining a second fusion vector; invoke the second classifier to classify based on the second fusion vector, obtaining an emotion intensity prediction result; and determine a second loss between the emotion intensity prediction result and the emotion intensity of the text sample, updating the parameters of the encoder model and of the second classifier according to the second loss.
Continuing with the description of an exemplary structure in which the emotion recognition device 273 based on the text emotion recognition model, provided by an embodiment of the present application, is implemented as software modules: in some embodiments, as shown in fig. 2B, the software modules of the emotion recognition device 273 stored in the memory 270 may include:
a second obtaining module 2731, configured to obtain the text to be recognized.
The vector extraction module 2732 is configured to extract a text vector of the text to be recognized.
The emotion recognition module 2733 is configured to invoke the text emotion recognition model based on the text vector of the text to be recognized to obtain an emotion tag of the text to be recognized, where the emotion tag includes the emotion category to which the text to be recognized belongs and its emotion intensity; a minimal inference sketch follows.
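Purely for illustration, the inference flow of the device can be sketched as follows. Both `extract_vector` and a model returning a (category, intensity) pair are hypothetical stand-ins for the vector extraction module 2732 and the trained text emotion recognition model; the actual interfaces are not fixed by this application.

```python
# Minimal inference sketch for the emotion recognition device.
import torch

def recognize_emotion(text: str, extract_vector, model) -> dict:
    text_vector = extract_vector(text)  # text vector of the text to be recognized
    with torch.no_grad():
        category, intensity = model(text_vector)  # invoke the trained model
    # Emotion tag: the emotion category the text belongs to and its intensity.
    return {"emotion_category": category, "emotion_intensity": intensity}
```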
Embodiments of the present application provide a computer program product comprising a computer program or computer-executable instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer-executable instructions from the computer-readable storage medium, and the processor executes the computer-executable instructions, so that the electronic device performs the training method of the text emotion recognition model or the emotion recognition method based on the text emotion recognition model according to the embodiment of the application.
Embodiments of the present application provide a computer-readable storage medium storing computer-executable instructions or a computer program which, when executed by a processor, cause the processor to perform the training method of the text emotion recognition model or the emotion recognition method based on the text emotion recognition model provided by the embodiments of the present application, for example, the training method of the text emotion recognition model shown in fig. 3A or the emotion recognition method based on the text emotion recognition model shown in fig. 4.
In some embodiments, the computer-readable storage medium may be a memory such as RAM, ROM, flash memory, magnetic surface memory, an optical disc, or a CD-ROM; it may also be any of various devices including one of, or any combination of, the above memories.
In some embodiments, computer-executable instructions may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, in the form of programs, software modules, scripts, or code, and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, computer-executable instructions may, but need not, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).
As an example, computer-executable instructions may be deployed to be executed on one electronic device or on multiple electronic devices located at one site or, alternatively, on multiple electronic devices distributed across multiple sites and interconnected by a communication network.
In summary, according to the embodiments of the present application, the abstract concept of emotion intensity is expressed through easy-to-understand emotion sub-categories, and emotion intensity and emotion sub-categories are modeled jointly, avoiding confusion between subtly different emotions. Tasks of different difficulty are handled hierarchically, allowing the text emotion recognition model to progress step by step from simple tasks to complex ones, which improves the accuracy of the model's judgments. When optimizing the model and collecting emotion intensity annotation data, a language model is used in place of manual annotation, which greatly improves annotation efficiency, further optimizes the model, and improves the semantic interpretability of the output results.
The foregoing describes merely exemplary embodiments of the present application and is not intended to limit the scope of protection of the present application. Any modifications, equivalent substitutions, and improvements made within the spirit and scope of the present application shall fall within the scope of protection of the present application.
Claims (19)
1. A method for training a text emotion recognition model, the method comprising:
acquiring a first training set, wherein training samples in the first training set comprise text samples and emotion tags of the text samples, and the emotion tags comprise emotion categories and emotion intensities of the text samples; the emotion tag of at least one of the training samples in the first training set comprises a plurality of emotion categories;
carrying out vectorization processing on each training sample to obtain a text vector of each training sample, wherein the text vector of each training sample comprises a text sample vector of each text sample, an emotion category vector of each text sample and an emotion intensity vector of each text sample;
executing a first training task of the text emotion recognition model based on the text sample vector and the emotion category vector of the text sample, wherein the first training task is used for training the text emotion recognition model to recognize the emotion category to which the text sample belongs;
and executing a second training task of the text emotion recognition model based on the text sample vector and the emotion intensity vector of the text sample, wherein the second training task is used for training the text emotion recognition model to recognize the emotion intensity of the text sample, and the trained text emotion recognition model is used for recognizing the emotion category and emotion intensity of the text to be processed.
2. The method according to claim 1, wherein:
the emotion tag of the text sample further comprises emotion sub-categories of the text sample under the emotion categories, each emotion category comprising a plurality of emotion sub-categories;
the text vector of the training sample further comprises an emotion sub-category vector of the emotion sub-category;
the performing a second training task of the text emotion recognition model based on the text sample vector and the emotion intensity vector of the text sample, comprising:
and executing a second training task of the text emotion recognition model based on the text sample vector, the emotion intensity vector and the emotion sub-category vector of the text sample, wherein the second training task is further used for recognizing the emotion sub-category of the text sample under the emotion category when recognizing the emotion intensity of the text sample.
3. A method according to claim 1 or 2, characterized in that,
the text emotion recognition model includes: a language understanding model and a first classifier;
the performing a first training task of the text emotion recognition model based on the text sample vector and the emotion category vector of the text sample, comprising:
calling the language understanding model to carry out encoding processing based on the text sample vector of the text sample and the emotion category vector to obtain a first fusion vector;
invoking the first classifier based on the first fusion vector to perform classification processing to obtain an emotion category prediction result;
determining a first loss between the emotion category prediction result and the emotion category of the text sample, and updating parameters of the language understanding model and parameters of the first classifier according to the first loss;
responsive to the language understanding model converging and the first classifier converging, stopping the first training task.
4. The method according to claim 3, wherein:
the text emotion recognition model further comprises an encoder model and a second classifier;
the performing a second training task of the text emotion recognition model based on the text sample vector and the emotion intensity vector of the text sample, comprising:
invoking the encoder model to perform encoding processing based on the text sample vector and the emotion intensity vector of the text sample to obtain a second fusion vector;
invoking the second classifier based on the second fusion vector to perform classification processing to obtain an emotion intensity prediction result;
determining a second loss between the emotion intensity prediction result and the emotion intensity of the text sample, and updating parameters of the encoder model and parameters of the second classifier according to the second loss.
5. The method according to claim 3, wherein:
the emotion category prediction result comprises classification prediction results of the text sample corresponding to a plurality of emotion categories;
the determining a first loss between the emotion category prediction result and the emotion category of the text sample comprises:
performing the following processing for each of a plurality of the emotion categories: determining a classification loss between a classification prediction result of the emotion category of the text sample and an actual classification result of the emotion category;
determining a sum of the classification losses for each of the emotion categories as the first loss between the emotion category prediction result and the emotion category of the text sample.
6. The method of claim 5, wherein the determining a classification loss between the classification prediction result of the emotion category of the text sample and the actual classification result of the emotion category comprises:
dividing the training samples in the training set into a plurality of batches;
for the text sample of each of the training samples in each of the batches, performing the following: determining a first logarithm of the classification prediction result of the emotion category of the text sample, and determining a first product of the first logarithm and the actual classification result of the text sample;
determining a first ratio between the first product and the number of training samples included in the batch, the first ratio being taken as the classification loss between the classification prediction result and the actual classification result of the emotion category of the text sample.
7. The method according to claim 3, wherein:
the text emotion recognition model further comprises an encoder model, a second classifier and a third classifier;
the performing a second training task of the text emotion recognition model based on the text sample vector, the emotion intensity vector, and the emotion sub-category vector of the text sample, comprising:
invoking the encoder model to perform encoding processing based on the text sample vector, the emotion intensity vector and the emotion sub-category vector of the text sample to obtain a third fusion vector;
invoking the second classifier based on the third fusion vector to perform classification processing to obtain an emotion intensity prediction result;
invoking the third classifier based on the third fusion vector to perform classification processing to obtain an emotion sub-category prediction result;
determining a second loss between the emotion intensity prediction result and the emotion intensity of the text sample, and determining a third loss between the emotion sub-category prediction result and the emotion sub-category of the text sample;
carrying out weighted summation on the first loss, the second loss and the third loss to obtain a total loss;
updating parameters of the encoder model, parameters of the second classifier, and parameters of the third classifier based on the total loss.
8. The method according to claim 7, wherein:
the emotion intensity prediction result comprises predicted emotion intensities of the text sample corresponding to a plurality of emotion categories, and the emotion intensities of the text sample comprise actual emotion intensities of the text sample corresponding to the plurality of emotion categories;
the determining a second loss between the emotion intensity prediction result and the emotion intensity of the text sample comprises:
performing the following processing for each of the plurality of emotion categories:
determining an emotion intensity loss between the predicted emotion intensity of the text sample corresponding to the emotion category and the actual emotion intensity of that emotion category;
determining a sum of the emotion intensity losses for each of the emotion categories as the second loss between the emotion intensity prediction result and the emotion intensity of the text sample.
9. The method of claim 8, wherein the determining an emotion intensity loss between the predicted emotion intensity and the actual emotion intensity for the text sample corresponding to the emotion category comprises:
dividing the training samples in the training set into a plurality of batches;
for the text sample of each of the training samples in each of the batches, performing the following: determining a difference between the predicted emotion intensity of the text sample corresponding to the emotion category and the actual emotion intensity of that emotion category;
determining a second ratio between the square of the difference and the number of training samples included in the batch, the second ratio being taken as the emotion intensity loss between the predicted emotion intensity and the actual emotion intensity of the text sample corresponding to the emotion category.
10. The method according to claim 7, wherein:
the emotion sub-category prediction result comprises binary classification prediction results of the text sample corresponding to a plurality of emotion sub-categories;
the determining a third loss between the emotion sub-category prediction result and the emotion sub-category of the text sample comprises:
performing the following for each of the plurality of emotion sub-categories: determining a sub-category binary classification loss between the binary classification prediction result of the emotion sub-category and the actual binary classification result of the emotion sub-category of the text sample;
determining a sum of the sub-category binary classification losses for each of the emotion sub-categories as the third loss between the emotion sub-category prediction result and the emotion sub-category of the text sample.
11. The method of claim 10, wherein the determining a sub-category binary classification loss between the binary classification prediction result of the emotion sub-category and the actual binary classification result of the emotion sub-category of the text sample comprises:
dividing the training samples in the training set into a plurality of batches;
for the text sample of each of the training samples in each of the batches, performing the following:
determining a second logarithm of the binary classification prediction result of the emotion sub-category, and determining a second product of the second logarithm and the actual binary classification result of the emotion sub-category;
determining a third ratio between the second product and the number of training samples included in the batch;
determining the negative of the third ratio, the negative being taken as the sub-category binary classification loss between the binary classification prediction result of the emotion sub-category and the actual binary classification result of the emotion sub-category of the text sample.
12. The method according to any one of claims 1 to 11, wherein said vectorizing each of said training samples to obtain a text vector for each of said training samples comprises:
the following is performed for each of the training samples:
performing word segmentation processing on the training sample to obtain a plurality of words;
replacing the emotion tags in the training sample with mask marks;
taking each of the plurality of words as a conventional mark, and sequentially concatenating the mask mark and the conventional marks corresponding to the plurality of words according to their order in the training sample to obtain a mark sequence of the training sample, wherein a start mark is inserted at the head of the mark sequence;
performing word embedding processing on the mark sequence to obtain a word embedding vector sequence and a position embedding vector sequence corresponding to the training sample;
performing fusion processing based on the word embedding vector sequence and the position embedding vector sequence to obtain the text vector of the training sample.
13. The method according to any one of claims 1 to 11, wherein the acquiring a first training set comprises:
acquiring a pre-collected second training set, wherein the second training set comprises the text sample and emotion categories and emotion intensities of the text sample;
acquiring the emotion sub-category of the text sample under the emotion category;
forming an emotion label of the text sample through the emotion category of the text sample, the emotion intensity of the text sample and the emotion sub-category of the text sample under the emotion category;
a training sample is composed based on the text sample and the emotion tags, and the first training set is composed based on the plurality of training samples.
14. The method of claim 13, wherein the obtaining the emotion sub-category of the text sample under the emotion category comprises:
obtaining a preset emotion mapping table, wherein the emotion mapping table comprises mapping relations among different emotion categories, different emotion intensities and different emotion sub-categories; and querying the mapping relations using the emotion category and the emotion intensity of the text sample to obtain the emotion sub-category of the text sample under the emotion category; or,
and identifying the emotion sub-category of the text sample from a plurality of preset emotion sub-categories through a pre-trained dialogue model.
15. A text emotion recognition model-based emotion recognition method, characterized in that the text emotion recognition model is trained by the training method of the text emotion recognition model according to any one of claims 1 to 14;
the emotion recognition method comprises the following steps:
acquiring a text to be identified;
extracting a text vector of the text to be recognized;
and calling the text emotion recognition model based on the text vector of the text to be recognized to obtain an emotion tag of the text to be recognized, wherein the emotion tag of the text to be recognized comprises an emotion category and emotion intensity to which the text to be recognized belongs.
16. A training device for a text emotion recognition model, the device comprising:
the first acquisition module is used for acquiring a first training set, wherein training samples in the first training set comprise text samples and emotion tags of the text samples, and the emotion tags comprise emotion categories and emotion intensities of the text samples; the emotion tag of at least one of the training samples in the first training set comprises a plurality of emotion categories;
the data mapping module is used for carrying out vectorization processing on each training sample to obtain a text vector of each training sample, wherein the text vector of each training sample comprises a text sample vector of each text sample, an emotion category vector of each text sample and an emotion intensity vector of each text sample;
the first training module is used for executing a first training task of the text emotion recognition model based on the text sample vector and the emotion category vector of the text sample, wherein the first training task is used for training the text emotion recognition model to recognize the emotion category to which the text sample belongs;
and the second training module is used for executing a second training task of the text emotion recognition model based on the text sample vector and the emotion intensity vector of the text sample, wherein the second training task is used for training the text emotion recognition model to recognize the emotion intensity of the text sample, and the trained text emotion recognition model is used for recognizing the emotion category and emotion intensity of the text to be processed.
17. An emotion recognition device based on a text emotion recognition model, characterized in that the text emotion recognition model is trained by the training method of the text emotion recognition model according to any one of claims 1 to 14; the device comprises:
the second acquisition module is used for acquiring the text to be identified;
the vector extraction module is used for extracting the text vector of the text to be identified;
and the emotion recognition module is used for calling the text emotion recognition model based on the text vector of the text to be recognized to obtain an emotion label of the text to be recognized, wherein the emotion label of the text to be recognized comprises an emotion category and emotion intensity of the text to be recognized.
18. An electronic device, the electronic device comprising:
a memory for storing computer-executable instructions or a computer program;
a processor for implementing the training method of a text emotion recognition model according to any one of claims 1 to 14 or the emotion recognition method based on a text emotion recognition model according to claim 15 when executing the computer executable instructions or computer program stored in the memory.
19. A computer-readable storage medium storing computer-executable instructions or a computer program, which when executed by a processor, implements the method for training a text emotion recognition model according to any one of claims 1 to 14 or the method for emotion recognition based on a text emotion recognition model according to claim 15.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410027220.7A CN117851543A (en) | 2024-01-03 | 2024-01-03 | Training method of text emotion recognition model, emotion recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117851543A true CN117851543A (en) | 2024-04-09 |
Family
ID=90537890
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410027220.7A Pending CN117851543A (en) | 2024-01-03 | 2024-01-03 | Training method of text emotion recognition model, emotion recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117851543A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118885611A (en) * | 2024-07-05 | 2024-11-01 | 北京伯仲汇智科技有限公司 | Enterprise business management method and system based on big data |
CN119623556A (en) * | 2025-02-14 | 2025-03-14 | 西南林业大学 | Model training method, method for mental health detection, terminal and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||