CN118093878A - PeM-Bert-based knowledge embedded social media emotion detection system

Info

Publication number: CN118093878A
Application number: CN202410350939.4A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: emotion, PeM, BERT, trained, original
Legal status: Pending (the status listed is an assumption, not a legal conclusion)
Inventors: 刘美玲 (Liu Meiling), 杨传龙 (Yang Chuanlong)
Current assignee: Northeast Forestry University
Original assignee: Northeast Forestry University
Application filed by Northeast Forestry University on 2024-03-26 (priority date 2024-03-26)
Published as CN118093878A on 2024-05-28

Abstract

The PeM-Bert-based knowledge embedded social media emotion detection system relates to the field of text detection. The invention aims to solve the problems that existing social media text emotion recognition methods are weakly targeted at text emotion recognition and recognize emotion classes only at coarse granularity. The invention comprises the following steps: acquiring an original emotion text sequence X, setting an emotion label for the original emotion text sequence X, forming an original data set from the labeled original emotion texts, and dividing the original data set into a first training set and a first test set; training the attention mechanism model with the first training set to obtain a trained attention mechanism model; testing the trained attention mechanism model with the first test set, and taking the trained attention mechanism model that passes the test as the emotion detection network; and inputting the text sequence of the emotion to be detected into the emotion detection network to obtain an emotion detection result. The method and the device are used for emotion detection of social media texts.

Description

PeM-Bert-based knowledge embedded social media emotion detection system
Technical Field
The invention relates to the field of text detection, in particular to a PeM-Bert-based knowledge embedded social media emotion detection system.
Background
With the rapid development of online social network platforms (such as Twitter, Facebook, etc.), social networks have become part of people's daily lives, and many websites have seized the opportunity to perform emotion detection and analysis on massive data in order to launch new functions and attract new customers. Online Social Media (OSM) platforms provide opportunities to express, communicate, and share people's opinions, ideas, views, and attitudes towards local and international questions, matters, and topics through text, image, audio, and video posts. Posts on these social media are public and emotionally rich, so analyzing and studying them can reveal emotional states and the reasons behind those emotions, and understanding them is important for systematic improvement. All of these social platforms therefore play a leading role in investigating social trends: analyzing people's emotions and feelings, tracking customer feedback, informing and shaping business strategies, helping customers and consumers make decisions, and so on. Identifying and analyzing emotions in social media is therefore of great importance.
Traditional social media text emotion recognition includes keyword-based, corpus-based, and rule-based emotion recognition methods, but these methods have difficulty handling long-range dependencies in social media text. With the continuous development of deep learning methods, a new type of deep architecture based on neural attention, called the Transformer, was proposed. Transformers are good at handling long-range dependencies in social media text, and since the introduction of the initial Transformer model, various language models have been proposed: the BERT, XLNet, RoBERTa, and ALBERT models all achieve good results in emotion detection on social media texts. However, these models do not take the downstream task of the application into account, so their emotion recognition is weakly targeted at the text and their recognition accuracy is low. In addition, current social media text emotion recognition methods focus on recognizing six "basic" emotions (happiness, sadness, anger, disgust, surprise, and fear), but fail to capture the broad range of emotions that people experience and express in daily life, such as grief, excitement, optimism, and hopelessness; these finer-grained emotions are missed, resulting in coarse classification of the recognized emotions.
Disclosure of Invention
The invention aims to solve the problems that existing social media text emotion recognition methods are weakly targeted at text emotion recognition and recognize emotion classes only at coarse granularity, and provides a PeM-Bert-based knowledge embedded social media emotion detection system.
The PeM-Bert based knowledge embedded social media emotion detection system comprises: the system comprises a sequence acquisition module, an emotion detection module and a result output module;
The sequence acquisition module is used for acquiring a text sequence of the emotion to be detected and sending the text sequence of the emotion to be detected to the emotion detection module;
The emotion detection module is used for inputting the text sequence of the emotion to be detected into the emotion detection network to obtain an emotion detection result, and inputting the emotion detection result into the result output module;
The result output module is used for outputting emotion detection results;
The emotion detection network is obtained by:
Step one, acquiring an original emotion text sequence X, setting an emotion label for the original emotion text sequence X, forming an original data set from the labeled original emotion texts, and dividing the original data set into a first training set and a first test set;
Training the attention mechanism model by using a first training set to obtain a trained attention mechanism model;
And step three, testing the trained attention mechanism model by using the first test set; if the accuracy of the trained attention mechanism model is greater than or equal to a preset first threshold, taking the current trained attention mechanism model as the emotion detection network, and if the accuracy of the trained attention mechanism model is less than the preset first threshold, returning to step one.
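By way of illustration only, the following Python sketch shows the threshold-gated acceptance loop of steps two and three; the callables, the 0.9 threshold, and the retry cap are assumptions for the sketch, not values fixed by the invention:

    from typing import Any, Callable

    def build_emotion_detection_network(
        train: Callable[[], Any],                   # step two: trains and returns a model
        accuracy_on_test: Callable[[Any], float],   # step three: accuracy on the first test set
        first_threshold: float = 0.9,               # assumed value of the preset first threshold
        max_rounds: int = 10,                       # assumed cap on retries
    ) -> Any:
        # Train, test, and accept the model only when its accuracy reaches the
        # preset first threshold; otherwise return to step one and retry.
        for _ in range(max_rounds):
            model = train()
            if accuracy_on_test(model) >= first_threshold:
                return model                        # the accepted emotion detection network
        raise RuntimeError("no model reached the preset first threshold")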
Further, the emotion tags include: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disgust, joy, disappointment, disapproval, embarrassment, excitement, fear, gratitude, grief, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, and neutral.
Further, the attention mechanism model includes: a knowledge base, a representation matrix acquisition layer, a dense layer, an attention layer, and a two-layer dense network;
The knowledge base is used for converting an original emotion text sequence in the first training set into a feature vector $X_e$ and sending $X_e$ to the dense layer;
The representation matrix acquisition layer obtains the representation matrix $T_c = \{t_0, \dots, t_N\}$ corresponding to the hidden states of the last layer of the pre-trained PeM-Bert model by using the original emotion text sequence X and the pre-trained PeM-Bert model, and sends $T_c$ to the attention layer;
wherein $T_c \in \mathbb{R}^{N \times l_c}$, $t_b$ is a contextual representation of the original emotion sequence, $b$ takes 0 to $N$, $N$ is the total number of time steps, and $l_c$ is the size of the output representation generated by the pre-trained PeM-Bert model;
The dense layer is used for projecting the feature vector $X_e$ to obtain the sentence-level emotion encoding $H_e \in \mathbb{R}^{l_e \times l_c}$ and sending $H_e$ to the attention layer;
wherein $l_e$ is the dimension of $X_e$;
The attention layer obtains an attention score $s$ using $T_c$ and $H_e$, thereby obtaining the final text representation using $s$, and sends the final text representation to the two-layer dense network;
The two-layer dense network performs emotion classification based on the final text representation to obtain the final emotion category.
Further, the pre-trained PeM-Bert model is obtained by:
A1, preprocessing the original emotion text sequence X to obtain a preprocessed original emotion text sequence S;
A2, taking S with the emotion label as a pre-training data set, and dividing the pre-training data set into a second training set and a second test set;
A3, pre-training the PeM-Bert neural network model by using the second training set to obtain a pre-trained PeM-Bert neural network model;
A4, testing the pre-trained PeM-Bert neural network model by using the second test set; if the accuracy of the pre-trained PeM-Bert neural network model is greater than or equal to a preset second threshold, taking it as the pre-trained PeM-Bert model, and if the accuracy is smaller than the preset second threshold, returning to A1.
Further, preprocessing the original emotion text sequence X in A1 to obtain the preprocessed original emotion text sequence S specifically includes:
A11, multiplying the length of the original emotion text sequence by a random number a to obtain the number of punctuation insertions;
wherein a is chosen so that the number of insertions lies between 1 and one third of the sequence length;
A12, randomly selecting punctuation marks from the punctuation set and randomly inserting them into the original emotion text sequence according to the number of insertions of A11, to obtain the preprocessed original emotion text sequence $S = [P_0], \dots, [P_i], [P_{i+1}], \dots, [P_m]$;
the punctuation set is { ".", ";", "?", ":", "!", "," };
wherein $[P_i]$ is the i-th emotion text block obtained after inserting punctuation marks into the original emotion text sequence, and m is the number of emotion text blocks.
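A minimal sketch of steps A11-A12, assuming whitespace tokenization; the function name and the joined-string return value are illustrative choices:

    import random

    PUNCTUATION_SET = [".", ";", "?", ":", "!", ","]

    def insert_random_punctuation(text: str) -> str:
        words = text.split()
        # A11: the number of insertions lies between 1 and one third of the length.
        n_insert = random.randint(1, max(1, len(words) // 3))
        # A12: insert randomly chosen punctuation marks at random positions,
        # yielding the preprocessed sequence S = [P_0], ..., [P_m].
        out = list(words)
        for _ in range(n_insert):
            position = random.randint(0, len(out))
            out.insert(position, random.choice(PUNCTUATION_SET))
        return " ".join(out)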
Further, the PeM-Bert neural network model includes an encoder and a BERT model;
the encoder adopts the P-tuning method to convert the preprocessed original emotion text sequence S into a template T:
$T = \{h_0, \dots, h_i, z(x), h_{i+1}, \dots, h_m, z(y)\}$
wherein $h_i$ is a trainable vector, $z(x)$ denotes the vector representation of the emotion contained in the preprocessed original emotion text sequence, and $z(y)$ denotes the vector representation of the target emotion class;
the BERT model optimizes the output vectors $h_i$ of the encoder using the eMLM method and the template T.
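The detailed description below notes that P-tuning computes the template vectors $h_0, \dots, h_m$ with a small learnable LSTM. The following PyTorch sketch shows such a prompt encoder; the bidirectional LSTM with an MLP head follows the published P-tuning design, and all layer sizes are assumptions:

    import torch
    import torch.nn as nn

    class PromptEncoder(nn.Module):
        # Produces the trainable template vectors h_0, ..., h_m of template T.
        def __init__(self, num_template_tokens: int = 8, hidden_dim: int = 768):
            super().__init__()
            self.embedding = nn.Embedding(num_template_tokens, hidden_dim)
            self.lstm = nn.LSTM(hidden_dim, hidden_dim // 2, num_layers=2,
                                bidirectional=True, batch_first=True)
            self.mlp = nn.Sequential(nn.Linear(hidden_dim, hidden_dim),
                                     nn.ReLU(),
                                     nn.Linear(hidden_dim, hidden_dim))

        def forward(self) -> torch.Tensor:
            ids = torch.arange(self.embedding.num_embeddings)
            hidden, _ = self.lstm(self.embedding(ids).unsqueeze(0))  # (1, m+1, dim)
            return self.mlp(hidden.squeeze(0))                       # h_0 ... h_m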
Further, optimizing the output vectors $h_i$ of the encoder using the eMLM method and the template T is specifically:
$\hat{h}_0, \dots, \hat{h}_i, \hat{h}_{i+1}, \dots, \hat{h}_m = \underset{h}{\arg\min}\; \mathcal{L}\big(M(x, y)\big)$
where $M(x, y)$ is the masking matrix, $\mathcal{L}$ is the downstream task loss function, and $\hat{h}_0, \dots, \hat{h}_m$ are the estimates of $\{h_0, \dots, h_i, h_{i+1}, \dots, h_m\}$;
the positions of the masking matrix $M(x, y)$ corresponding to the template T are 1, and the remaining positions are 0.
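A small sketch of how such a masking matrix can be built, assuming the positions occupied by the template are known as indices into the input sequence (the helper name is hypothetical):

    import torch

    def build_template_mask(seq_len: int, template_positions: list[int]) -> torch.Tensor:
        # M(x, y): 1 at the positions of template T, 0 elsewhere, so that only
        # the template embeddings receive gradient updates.
        mask = torch.zeros(seq_len)
        mask[template_positions] = 1.0
        return mask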
Further, the eMLM method masks words of S belonging to the available dictionary $\mathcal{D}$ with probability $k$, and masks a word $w_n$ of S not belonging to $\mathcal{D}$ with the probability $P(w_n)$ as follows:
$P(w_n) = \dfrac{0.15\,|S| - k\,|E|}{|S| - |E|}$
wherein $E$ is the set of words of S belonging to the available dictionary $\mathcal{D}$, $k$ is the probability of masking a word of the available dictionary $\mathcal{D}$, $|\cdot|$ denotes the size of a set, and $w_n$ is a word of S not belonging to the available dictionary $\mathcal{D}$.
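A sketch of the probability assignment, under the reconstruction above: dictionary words are masked with probability k, and the remaining words with whatever probability keeps the overall masking rate at 15%; the value k = 0.30 is an assumed hyper-parameter:

    def emlm_masking_probs(tokens: list[str],
                           emotion_dictionary: set[str],
                           k: float = 0.30,
                           overall_rate: float = 0.15) -> list[float]:
        # E: the words of S that belong to the available dictionary.
        in_dict = [t in emotion_dictionary for t in tokens]
        n, e = len(tokens), sum(in_dict)
        # P(w_n) for the remaining words, chosen so that on average
        # overall_rate * |S| tokens are masked in total.
        p_rest = 0.0 if n == e else max(0.0, (overall_rate * n - k * e) / (n - e))
        return [k if flag else p_rest for flag in in_dict]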
Further, the attention layer obtains the attention score $s$ using $T_c$ and $H_e$, and then obtains the final text representation using $s$, specifically:
First, $T_c$ and $H_e$ are concatenated to obtain the concatenated representation $K \in \mathbb{R}^{(N + l_e) \times l_c}$:
$K = \mathrm{concat}(T_c, H_e)$
Then, the attention score $s$ is obtained using $K$ and $t_0$:
$s = \mathrm{softmax}(K \cdot t_0)$
Finally, the final text representation $t_l$ is obtained using the attention score $s$:
$t_l = s^{\mathsf{T}} \cdot K$.
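A minimal PyTorch sketch of these three equations; the tensor shapes follow the definitions of $T_c$ and $H_e$, and the use of the first row $t_0$ as the query is taken from the detailed description:

    import torch
    import torch.nn.functional as F

    def knowledge_embedded_attention(T_c: torch.Tensor, H_e: torch.Tensor) -> torch.Tensor:
        # T_c: (N+1, l_c) contextual representations, with T_c[0] = t_0 ([CLS]).
        # H_e: (l_e, l_c) sentence-level emotion encoding.
        K = torch.cat([T_c, H_e], dim=0)   # K = concat(T_c, H_e), shape (N+1+l_e, l_c)
        s = F.softmax(K @ T_c[0], dim=0)   # attention score s, with t_0 as the query
        t_l = s @ K                        # t_l = s^T . K, the final text representation
        return t_l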
Further, the two-layer dense network is trained using the following loss function:
$\mathcal{L} = \log\Big(e^{r_0} + \sum_{i' \in \Omega_{neg}} e^{r_{i'}}\Big) + \log\Big(e^{-r_0} + \sum_{j' \in \Omega_{pos}} e^{-r_{j'}}\Big)$
wherein $\Omega_{neg}$ is the set of non-target-class samples, $\Omega_{pos}$ is the set of target-class samples, $r_0$ is the preset class-0 score, $i'$ is a non-target-class sample index, $j'$ is a target-class sample index, $r_{i'}$ is a non-target-class sample score, and $r_{j'}$ is a target-class sample score.
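A sketch of this loss in PyTorch, assuming batched class scores and multi-hot labels, with the class-0 score $r_0$ treated as a fixed scalar:

    import torch

    def multilabel_softmax_ce(scores: torch.Tensor,
                              labels: torch.Tensor,
                              r0: float = 0.0) -> torch.Tensor:
        # scores: (batch, classes) emotion scores r; labels: (batch, classes) multi-hot.
        is_pos = labels.bool()
        neg = scores.masked_fill(is_pos, float("-inf"))      # keep r_i' of non-target classes
        pos = (-scores).masked_fill(~is_pos, float("-inf"))  # keep -r_j' of target classes
        anchor = torch.full_like(scores[:, :1], r0)
        # log(e^{r0} + sum_neg e^{r_i'}) + log(e^{-r0} + sum_pos e^{-r_j'})
        loss_neg = torch.logsumexp(torch.cat([anchor, neg], dim=-1), dim=-1)
        loss_pos = torch.logsumexp(torch.cat([-anchor, pos], dim=-1), dim=-1)
        return (loss_neg + loss_pos).mean()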
The beneficial effects of the invention are as follows:
The invention introduces emotion masking language modeling (eMLM) at the pre-training stage and at the same time uses the P-tuning method, so that the features adapt better to downstream tasks. The invention provides a data augmentation technique at the data preprocessing stage, then uses the emotion encoding of a sentence-level knowledge embedded attention mechanism to enrich the context representation, providing a richer input text representation, and then concatenates the context representation generated by the pre-trained PeM-Bert language model with the emotion encoding. The method is strongly targeted at text emotion recognition, the recognized emotions are finer, and the granularity of emotion classification is improved.
Drawings
FIG. 1 is the structure of the PeM-Bert neural network model;
FIG. 2 is the structure of the knowledge embedded attention mechanism model.
Detailed Description
The first embodiment is as follows: as shown in fig. 2, the PeM-Bert-based knowledge embedded social media emotion detection system of this embodiment includes: a sequence acquisition module, an emotion detection module and a result output module;
The sequence acquisition module is used for acquiring a text sequence of the emotion to be detected and sending the text sequence of the emotion to be detected to the emotion detection module;
The emotion detection module is used for inputting the text sequence of the emotion to be detected into the emotion detection network to obtain an emotion detection result, and inputting the emotion detection result into the result output module;
The result output module is used for outputting emotion detection results;
The emotion detection network is obtained by:
Step one, acquiring an original emotion text sequence X, setting an emotion label for the original emotion text sequence X, forming an original data set from the emotion labels and the original emotion texts X, and dividing the original data set into a first training set and a first test set;
The emotion tags include: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disgust, joy, disappointment, disapproval, embarrassment, excitement, fear, gratitude, grief, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, and neutral.
Neutral denotes an emotion other than the foregoing emotions.
Training the attention mechanism model by using a first training set to obtain a trained attention mechanism model;
The attention mechanism model includes: a knowledge base, a representation matrix acquisition layer, a dense layer, an attention layer, and a two-layer dense network;
the knowledge base is used for converting the original emotion text sequence X in the first training set into a feature vector $X_e$;
the representation matrix acquisition layer obtains the representation matrix $T_c = \{t_0, \dots, t_N\}$ corresponding to the hidden states of the last layer of the pre-trained PeM-Bert model by using the original emotion text sequence X in the first training set and the pre-trained PeM-Bert model;
wherein $T_c \in \mathbb{R}^{N \times l_c}$, $t_b$ ($b$ from 0 to $N$) is a contextual representation of the original emotion sequence, $t_0$ is the representation corresponding to the [CLS] token, and $N$ is the total number of time steps;
the dense layer is used for projecting the feature vector $X_e$ to obtain the sentence-level emotion encoding $H_e \in \mathbb{R}^{l_e \times l_c}$;
where $l_e$ is the dimension of $X_e$, and $l_c$ is the size of the output representation generated by the pre-trained PeM-Bert model, i.e., the length of the output representation vector, which is 768.
The attention layer obtains the attention score $s$ using $T_c$ and $H_e$, and obtains the final text representation using $s$, specifically:
first, $T_c$ and $H_e$ are concatenated to obtain the concatenated representation $K \in \mathbb{R}^{(N + l_e) \times l_c}$:
$K = \mathrm{concat}(T_c, H_e)$
then, using $t_0$, the contextual representation of the original emotion sequence, as the query, the attention score $s$ is obtained from $K$ and $t_0$:
$s = \mathrm{softmax}(K \cdot t_0)$
finally, the final text representation $t_l$ is obtained using the attention score $s$:
$t_l = s^{\mathsf{T}} \cdot K$
And the two-layer dense network classifies the emotion by using the final text representation to obtain a final emotion category.
And step three, testing the trained attention mechanism model by using the first test set; if the accuracy of the trained attention mechanism model is greater than or equal to a preset first threshold, taking the current trained attention mechanism model as the emotion detection network, and if the accuracy of the trained attention mechanism model is less than the preset first threshold, returning to step one.
In the present embodiment, the representations corresponding to the hidden states of the last layer of the PeM-Bert model constitute the matrix $T_c = \{t_0, \dots, t_N\}$, and the representation $t_0$ corresponding to the [CLS] token serves as the contextual representation of the entire input text. The emotion encoding is obtained by converting the input X into a feature vector $X_e$ through the knowledge base; its dimension depends on the dictionary data used, and we denote it by $l_e$. We then project the converted input vector through the dense layer to form the sentence-level emotion encoding $H_e$, where $H_e \in \mathbb{R}^{l_e \times l_c}$. The emotion encoding $H_e$ and the context representation $T_c$ are then concatenated, following the self-attention technique, to form $K$, where $K \in \mathbb{R}^{(N + l_e) \times l_c}$, and $t_0$ is used as the query to obtain the softmax attention score $s$. The final representation $t_l$ is obtained by weighting the matrix $K$ with $s$. Including $H_e$ together with $T_c$ in $K$ helps preserve the context information learned by the encoder as well as the added emotion knowledge while re-weighting $t_0$.
The second embodiment is as follows: as shown in fig. 1, the pre-trained PeM-Bert model is obtained by:
A1, preprocessing the original emotion text sequence X to obtain a preprocessed original emotion text sequence S, which specifically comprises the following steps:
A11, multiplying the length of the original emotion text sequence by a random number a to obtain the number of punctuation insertions;
wherein a is chosen so that the number of insertions lies between 1 and one third of the sequence length;
A12, randomly selecting punctuation marks from the punctuation set and randomly inserting them into the original emotion text sequence according to the number of insertions of A11, to obtain the preprocessed original emotion text sequence $S = [P_0], \dots, [P_i], [P_{i+1}], \dots, [P_m]$;
the punctuation set is { ".", ";", "?", ":", "!", "," }; $[P_i]$ is the i-th emotion text block obtained after inserting punctuation marks into the original emotion text sequence, and m is the number of emotion text blocks;
A2, forming a pre-training data set from the preprocessed original emotion text sequence S and its labels, and dividing the pre-training data set into a second training set and a second test set;
A3, pre-training the PeM-Bert neural network model by using the second training set to obtain a pre-trained PeM-Bert neural network model;
The PeM-Bert neural network model includes an encoder and a BERT model;
the encoder adopts the P-tuning method to convert the preprocessed original emotion text sequence S into a template T, specifically:
$T = \{h_0, \dots, h_i, z(x), h_{i+1}, \dots, h_m, z(y)\}$
where $h_i$ ($0 \le i < m$) is a trainable vector obtained by encoding the input $S = [P_0], \dots, [P_i], [P_{i+1}], \dots, [P_m]$, $z(x)$ denotes the vector representation of the relevant emotion of the input text, and $z(y)$ denotes the vector representation of the target or reference emotion class.
The BERT model optimizes the trainable vectors $h_i$ with the eMLM method using the template T, according to the following formula:
$\hat{h}_0, \dots, \hat{h}_i, \hat{h}_{i+1}, \dots, \hat{h}_m = \underset{h}{\arg\min}\; \mathcal{L}\big(M(x, y)\big)$
where $M(x, y)$ is the masking matrix, $\mathcal{L}$ is the downstream task loss function, and $\hat{h}_0, \dots, \hat{h}_m$ are the estimates of $\{h_0, \dots, h_i, h_{i+1}, \dots, h_m\}$;
the positions of the masking matrix $M(x, y)$ corresponding to the template T are 1, and the remaining positions are 0;
The eMLM method masks words of S belonging to the available dictionary $\mathcal{D}$ with probability $k$, and masks a word $w_n$ of S not belonging to $\mathcal{D}$ with the probability $P(w_n)$ defined as:
$P(w_n) = \dfrac{0.15\,|S| - k\,|E|}{|S| - |E|}$
wherein $E$ is the set of words of S belonging to the available dictionary $\mathcal{D}$, $k$ is the probability of masking a word of the available dictionary, $|\cdot|$ denotes the size of a set, and $w_n$ is a word of S not belonging to the available dictionary;
And A4, testing the pre-trained PeM-Bert neural network model by using the second test set; if the accuracy of the pre-trained PeM-Bert neural network model is greater than or equal to a preset second threshold, taking it as the pre-trained PeM-Bert model, and if the accuracy is smaller than the preset second threshold, returning to step A1.
Unlike conventional prompting, P-tuning uses pseudo tokens $[P_i]$ to replace discrete tokens and builds the target, i.e., the template T, from their continuous vectors. Instead of randomly initializing several new tokens and training them directly, P-tuning computes these embeddings with a small LSTM model and makes this LSTM learnable. However, this adds an extra LSTM, which is somewhat cumbersome to implement. Therefore, during data construction, masks are built on the training set according to the proportion used by the eMLM pre-training task, while no new masks are built for the validation and test sets. During training, the invention freezes all gradients of the pre-trained PeM-Bert model except the embedding layer; for the back propagation of the embedding layer, the invention designs a mask matrix in which only the positions corresponding to the template are 1 and all other positions are 0, and through register_hook only the gradients at the template positions are retained at each gradient propagation and update, so that only the embeddings of the template part are updated. Unlike BERT, which masks the tokens in the input sentence with a uniform probability (15%), we assign a higher probability to the emotion-rich words from the available dictionary. We denote this probability by $k$, which is a hyper-parameter of the eMLM method. The masking process can be summarized as follows: given an input sentence S, extract the words belonging to the dictionary and denote them by $E$; set the masking probability of these words to $k$; and choose the masking probability of the remaining words so that 15% of the words are masked in total. The invention uses P-tuning to fine-tune the entire model, including the eMLM embedding layer and the other layers. This combination makes full use of the prior knowledge extracted by eMLM and of the fine-tuning capability of P-tuning, thereby improving the performance of the model in emotion detection.
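A sketch of the gradient-freezing scheme just described, assuming a HuggingFace-style BERT whose word-embedding rows for the template pseudo-tokens are gated by a 0/1 mask; the attribute path and helper name are assumptions:

    import torch

    def freeze_all_but_template(model: torch.nn.Module,
                                template_token_ids: list[int]) -> None:
        # Freeze every parameter except the word-embedding layer.
        for name, param in model.named_parameters():
            param.requires_grad = name.endswith("word_embeddings.weight")
        weight = model.bert.embeddings.word_embeddings.weight
        row_mask = torch.zeros(weight.size(0), 1)
        row_mask[template_token_ids] = 1.0
        # register_hook: on every backward pass keep only the gradients of the
        # rows belonging to the template pseudo-tokens [P_i].
        weight.register_hook(lambda grad: grad * row_mask.to(grad.device))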
A third specific embodiment: the loss function adopted by the two-layer dense network is obtained as follows:
Finally, at classification time, $t_l$ is input into the two-layer dense network to obtain the emotion output probabilities for the corresponding input. In the final classification, we need to select $f$ target classes from $n'$ candidate classes. The common practice is to apply a Sigmoid activation, turning the task into $n'$ binary classification problems, and to use the sum of the binary cross entropies as the loss. Obviously, when $n' \gg f$, this approach suffers from a serious class imbalance problem.
The cross entropy of single-label classification is:
$-\log \dfrac{e^{s_t}}{\sum_{c=1}^{n'} e^{s_c}} = \log\Big(1 + \sum_{c=1, c \ne t}^{n'} e^{s_c - s_t}\Big)$
wherein $\{s_1, \dots, s_{t-1}, s_{t+1}, \dots, s_{n'}\}$ are the non-target-class scores, $s_t$ is the target-class score, and $s_c$ is a non-target-class score with $c \in \{1, \dots, t-1, t+1, \dots, n'\}$;
It is known that logsumexp is in fact a smooth approximation of max, so the following holds:
$\log\Big(1 + \sum_{c \ne t} e^{s_c - s_t}\Big) \approx \max\Big(0, \max_{c \ne t}(s_c - s_t)\Big)$
The characteristic of this loss is that all non-target-class scores $\{s_1, \dots, s_{t-1}, s_{t+1}, \dots, s_{n'}\}$ are compared pairwise with the target-class score $s_t$, and the maximum of the differences is pushed to be as small as possible (towards zero), achieving the effect that 'the target-class score is larger than every non-target-class score'.
Therefore, in the multi-label classification scenario with multiple target classes, the invention expects that 'each target-class score is not smaller than each non-target-class score', so the following loss function is obtained:
$\log\Big(1 + \sum_{i \in \Omega_{neg}} \sum_{j \in \Omega_{pos}} e^{s_i - s_j}\Big)$
where $\Omega_{pos}$ and $\Omega_{neg}$ are the sets of positive and negative classes of the sample, respectively.
For multi-label classification where $f$ is not fixed, a threshold is required to determine which classes to output. To this end, the invention introduces an additional class 0, requiring the scores of the target classes to all be larger than $r_0$ and the scores of the non-target classes to all be smaller than $r_0$; the corresponding comparison terms for $r_{i'} < r_0$ and $r_0 < r_{j'}$ are added inside the log.
Thus, the final loss of the 'softmax + cross entropy' method is obtained, ultimately expressed by the following formula:
$\mathcal{L} = \log\Big(e^{r_0} + \sum_{i' \in \Omega_{neg}} e^{r_{i'}}\Big) + \log\Big(e^{-r_0} + \sum_{j' \in \Omega_{pos}} e^{-r_{j'}}\Big)$
wherein $\Omega_{neg}$ is the negative set, i.e., the non-target-class samples; $\Omega_{pos}$ is the positive set, i.e., the target-class samples; $r_0$ is the preset class-0 score; $i'$ indexes non-target-class samples; $j'$ indexes target-class samples; $r_{i'}$ is a non-target-class sample score and $r_{j'}$ is a target-class sample score.
The invention does not suffer from class imbalance because it does not turn multi-label classification into multiple binary classification problems; instead, it turns it into pairwise comparisons between target-class and non-target-class scores, and thanks to the good properties of logsumexp, the weight of each term is balanced automatically. Therefore, in the final classification, the invention adopts the 'softmax + cross entropy' scheme to obtain the emotion output probabilities for the corresponding input.

Claims (10)

1. A PeM-Bert-based knowledge embedded social media emotion detection system, characterized in that the system comprises: a sequence acquisition module, an emotion detection module and a result output module;
The sequence acquisition module is used for acquiring a text sequence of the emotion to be detected and sending the text sequence of the emotion to be detected to the emotion detection module;
The emotion detection module is used for inputting the text sequence of the emotion to be detected into the emotion detection network to obtain an emotion detection result, and inputting the emotion detection result into the result output module;
The result output module is used for outputting emotion detection results;
The emotion detection network is obtained by:
Step one, acquiring an original emotion text sequence X, setting an emotion label for the original emotion text sequence X, forming an original data set from the labeled original emotion texts, and dividing the original data set into a first training set and a first test set;
Training the attention mechanism model by using a first training set to obtain a trained attention mechanism model;
And step three, testing the trained attention mechanism model by using the first test set; if the accuracy of the trained attention mechanism model is greater than or equal to a preset first threshold, taking the current trained attention mechanism model as the emotion detection network, and if the accuracy of the trained attention mechanism model is less than the preset first threshold, returning to step one.
2. The PeM-Bert-based knowledge embedded social media emotion detection system of claim 1, wherein the emotion tags include: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disgust, joy, disappointment, disapproval, embarrassment, excitement, fear, gratitude, grief, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, and neutral.
3. The PeM-Bert-based knowledge embedded social media emotion detection system of claim 2, wherein the attention mechanism model includes: a knowledge base, a representation matrix acquisition layer, a dense layer, an attention layer, and a two-layer dense network;
The knowledge base is used for converting an original emotion text sequence in the first training set into a feature vector $X_e$ and sending $X_e$ to the dense layer;
The representation matrix acquisition layer obtains the representation matrix $T_c = \{t_0, \dots, t_N\}$ corresponding to the hidden states of the last layer of the pre-trained PeM-Bert model by using the original emotion text sequence X and the pre-trained PeM-Bert model, and sends $T_c$ to the attention layer;
wherein $T_c \in \mathbb{R}^{N \times l_c}$, $t_b$ is a contextual representation of the original emotion sequence, $b$ takes 0 to $N$, $N$ is the total number of time steps, and $l_c$ is the size of the output representation generated by the pre-trained PeM-Bert model;
The dense layer is used for projecting the feature vector $X_e$ to obtain the sentence-level emotion encoding $H_e \in \mathbb{R}^{l_e \times l_c}$ and sending $H_e$ to the attention layer;
wherein $l_e$ is the dimension of $X_e$;
The attention layer obtains an attention score $s$ using $T_c$ and $H_e$, thereby obtaining the final text representation using $s$, and sends the final text representation to the two-layer dense network;
The two-layer dense network performs emotion classification based on the final text representation to obtain the final emotion category.
4. The PeM-Bert-based knowledge embedded social media emotion detection system of claim 3, wherein the pre-trained PeM-Bert model is obtained by:
A1, preprocessing the original emotion text sequence X to obtain a preprocessed original emotion text sequence S;
A2, taking S with the emotion label as a pre-training data set, and dividing the pre-training data set into a second training set and a second test set;
A3, pre-training the PeM-Bert neural network model by using the second training set to obtain a pre-trained PeM-Bert neural network model;
A4, testing the pre-trained PeM-Bert neural network model by using the second test set; if the accuracy of the pre-trained PeM-Bert neural network model is greater than or equal to a preset second threshold, taking it as the pre-trained PeM-Bert model, and if the accuracy is smaller than the preset second threshold, returning to A1.
5. The PeM-Bert-based knowledge embedded social media emotion detection system of claim 4, wherein preprocessing the original emotion text sequence X in A1 to obtain the preprocessed original emotion text sequence S specifically includes:
A11, multiplying the length of the original emotion text sequence by a random number a to obtain the number of punctuation insertions;
wherein a is chosen so that the number of insertions lies between 1 and one third of the sequence length;
A12, randomly selecting punctuation marks from the punctuation set and randomly inserting them into the original emotion text sequence according to the number of insertions of A11, to obtain the preprocessed original emotion text sequence $S = [P_0], \dots, [P_i], [P_{i+1}], \dots, [P_m]$;
the punctuation set is { ".", ";", "?", ":", "!", "," };
wherein $[P_i]$ is the i-th emotion text block obtained after inserting punctuation marks into the original emotion text sequence, and m is the number of emotion text blocks.
6. The PeM-Bert-based knowledge embedded social media emotion detection system of claim 5, wherein the PeM-Bert neural network model includes an encoder and a BERT model;
the encoder adopts the P-tuning method to convert the preprocessed original emotion text sequence S into a template T:
$T = \{h_0, \dots, h_i, z(x), h_{i+1}, \dots, h_m, z(y)\}$
wherein $h_i$ is a trainable vector, $z(x)$ denotes the vector representation of the emotion contained in the preprocessed original emotion text sequence, and $z(y)$ denotes the vector representation of the target emotion class;
the BERT model optimizes the output vectors $h_i$ of the encoder using the eMLM method and the template T.
7. The PeM-Bert-based knowledge embedded social media emotion detection system of claim 6, wherein optimizing the output vectors $h_i$ of the encoder using the eMLM method and the template T is specifically:
$\hat{h}_0, \dots, \hat{h}_i, \hat{h}_{i+1}, \dots, \hat{h}_m = \underset{h}{\arg\min}\; \mathcal{L}\big(M(x, y)\big)$
where $M(x, y)$ is the masking matrix, $\mathcal{L}$ is the downstream task loss function, and $\hat{h}_0, \dots, \hat{h}_m$ are the estimates of $\{h_0, \dots, h_i, h_{i+1}, \dots, h_m\}$;
the positions of the masking matrix $M(x, y)$ corresponding to the template T are 1, and the remaining positions are 0.
8. The PeM-Bert-based knowledge embedded social media emotion detection system of claim 7, wherein the eMLM method masks words of S belonging to the available dictionary $\mathcal{D}$ with probability $k$, and masks a word $w_n$ of S not belonging to $\mathcal{D}$ with the probability $P(w_n)$ as follows:
$P(w_n) = \dfrac{0.15\,|S| - k\,|E|}{|S| - |E|}$
wherein $E$ is the set of words of S belonging to the available dictionary $\mathcal{D}$, $k$ is the probability of masking a word of the available dictionary $\mathcal{D}$, $|\cdot|$ denotes the size of a set, and $w_n$ is a word of S not belonging to the available dictionary $\mathcal{D}$.
9. The PeM-Bert-based knowledge embedded social media emotion detection system of claim 8, wherein the attention layer obtains the attention score $s$ using $T_c$ and $H_e$, and then obtains the final text representation using $s$, specifically:
First, $T_c$ and $H_e$ are concatenated to obtain the concatenated representation $K \in \mathbb{R}^{(N + l_e) \times l_c}$:
$K = \mathrm{concat}(T_c, H_e)$
Then, the attention score $s$ is obtained using $K$ and $t_0$:
$s = \mathrm{softmax}(K \cdot t_0)$
Finally, the final text representation $t_l$ is obtained using the attention score $s$:
$t_l = s^{\mathsf{T}} \cdot K$.
10. The PeM-Bert-based knowledge embedded social media emotion detection system of claim 9, wherein the two-layer dense network is trained using the following loss function:
$\mathcal{L} = \log\Big(e^{r_0} + \sum_{i' \in \Omega_{neg}} e^{r_{i'}}\Big) + \log\Big(e^{-r_0} + \sum_{j' \in \Omega_{pos}} e^{-r_{j'}}\Big)$
wherein $\Omega_{neg}$ is the set of non-target-class samples, $\Omega_{pos}$ is the set of target-class samples, $r_0$ is the preset class-0 score, $i'$ is a non-target-class sample index, $j'$ is a target-class sample index, $r_{i'}$ is a non-target-class sample score, and $r_{j'}$ is a target-class sample score.
Priority application: CN202410350939.4A, filed 2024-03-26 (priority date 2024-03-26)
Publication: CN118093878A, published 2024-05-28, status pending
Family ID: 91159009


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination