CN114417814B - Word distributed expression learning system based on emotion knowledge enhancement - Google Patents


Info

Publication number
CN114417814B
CN114417814B (application CN202111531641.6A)
Authority
CN
China
Prior art keywords
knowledge
emotion
query
expectation
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111531641.6A
Other languages
Chinese (zh)
Other versions
CN114417814A (en)
Inventor
李优
林志舟
常亮
林煜明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202111531641.6A priority Critical patent/CN114417814B/en
Publication of CN114417814A publication Critical patent/CN114417814A/en
Application granted granted Critical
Publication of CN114417814B publication Critical patent/CN114417814B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of emotion detection and emotion analysis, and in particular to a word distributed expression learning system based on emotion knowledge enhancement, comprising an emotion knowledge integration framework and a weakly supervised knowledge generation framework. The emotion knowledge integration framework comprises a knowledge query module, a knowledge integration module and a word representation generation module. The weakly supervised knowledge generation framework generates a domain emotion dictionary (DSD), which integrates three kinds of resources: unlabeled text of the target domain, a domain-independent emotion dictionary, and labels of the target-domain text. The invention integrates emotion knowledge more effectively.

Description

Word distributed expression learning system based on emotion knowledge enhancement
Technical Field
The invention relates to the technical field of emotion detection and emotion analysis, in particular to a word distributed expression learning system based on emotion knowledge enhancement.
Background
Emotion analysis is an important task in natural language processing that can help consumers, companies and expert systems make more rational decisions. Existing research often uses word vectors as features for many tasks, including sentiment analysis. However, existing word vector learning techniques do not take into account the target dependency of emotion information when dealing with emotion analysis. For example, in the review sentence S1, "The newly purchased computer runs fast, but its power also drains fast," for the same evaluation word "fast," existing models cannot recognize that "running fast" is favorable for the computer while "draining fast" is unfavorable for the battery. This lack of emotion-target dependency degrades a model's sentiment analysis effectiveness.
A knowledge graph is a knowledge base in the form of a semantic network, usually presented as triples: Head, Relation and Tail. In a general knowledge graph, the head and tail are entity nouns, and the relation expresses how the head and tail are connected in the real world. An emotion knowledge graph is a further extension of the semantic network: its head is an evaluation target, its relation is the evaluation content, and its tail is the emotional tendency. For example, in the review sentence S1, the computer or the battery can serve as the evaluation target, and "fast" can serve as the evaluation content.
An emotion knowledge graph is an aggregate of external knowledge containing abundant dependency information, and integrating it can alleviate the emotion-target dependency problem to a certain extent. However, emotion knowledge graphs are very rare in existing research, and their construction is completed manually, requiring substantial human resources. Therefore, the emotion-target dependency problem cannot be completely solved simply by using existing emotion knowledge graphs.
An emotion dictionary contains abundant emotion information: given a word, it can provide the word's emotion polarity. Such dictionaries work well in many open domains.
Therefore, some researchers seek to improve emotion analysis by integrating emotion dictionaries. However, the emotion knowledge such dictionaries provide is very limited, and they lack the emotion dependency information of professional domains, so they cannot solve the emotion-target dependency problem in professional domains well.
Disclosure of Invention
It is an object of the present invention to provide a distributed representation learning system for words based on emotional knowledge enhancement that overcomes some or all of the deficiencies of the prior art.
The word distributed expression learning system based on emotion knowledge enhancement according to the invention comprises an emotion knowledge integration framework and a weakly supervised knowledge generation framework. The emotion knowledge integration framework comprises a knowledge query module, a knowledge integration module and a word representation generation module. The weakly supervised knowledge generation framework generates a domain emotion dictionary (DSD), which integrates three kinds of resources: unlabeled text of the target domain, a domain-independent emotion dictionary, and labels of the target-domain text.
Preferably, in the knowledge query module, given a comment sentence S, the module helps this sentence find the knowledge most likely to assist in analyzing S. To this end, the input sentence is segmented into words, and each word is used as a query object against the domain emotion dictionary DSD. The queried knowledge is filtered with filters built on knowledge expectation and a knowledge global attention mechanism, and passes through three states: the original knowledge set o_set, the expected knowledge set e_set, and the candidate knowledge set c_set. The knowledge set obtained by the query request, i.e., the original knowledge set, is given by (1):

o_set = Knowledge_Query(T, DSD)   (1)

where T is a query word and Knowledge_Query is the knowledge query function. The content of o_set is shown in (2):

o_set = [(T, op_0, judge_0, fr_0, conflict_0, p_num_0, n_num_0, lexicon_po_0), ..., (T, op_i, judge_i, fr_i, conflict_i, p_num_i, n_num_i, lexicon_po_i)]   (2)

The knowledge in o_set is raw and unprocessed, where op_i is the viewpoint word matched by the query word T, judge_i is the emotion polarity assigned once T matches the viewpoint word op_i, fr_i is the number of co-occurrences of T and op_i in the knowledge source corpus, conflict_i indicates whether the knowledge is conflicting in the knowledge source, p_num and n_num are the numbers of positive and negative cognitions in conflicting cognition, respectively, and lexicon_po_i is the emotional tendency value of the knowledge in the external emotion dictionary. To better screen out knowledge with conflicting cognition, a knowledge expectation filter is introduced, and potentially conflicting knowledge is filtered by (3):

e_set = E_Filter(o_set, expectation_gate)   (3)

In (3), E_Filter is the knowledge expectation filtering function, e_set is a subset of o_set, and expectation_gate is a hyperparameter for filtering conflicting knowledge. However, knowledge expectation cannot judge whether the queried knowledge truly helps emotion analysis, so a knowledge global attention mechanism is introduced, and the knowledge in e_set is filtered through the attention filter of (4):

c_set = K_Attention(e_set, input_0)   (4)

c_set in (4) is a set of triples, whose content is shown in (5):

c_set = [(T, op_0, judge_0), ..., (T, op_s, judge_s)]   (5)

where op is a viewpoint word matched with the query word in the knowledge base, and judge is the emotion polarity when the query word T matches the viewpoint word op. The knowledge in c_set will be integrated into the text.
Preferably, the knowledge expectation is calculated and potentially conflicting knowledge is filtered by equations (6) and (7):

Em_op = (p_num/fr - n_num/fr)   (6)

[Equation (7), rendered only as an image in the original, computes the knowledge expectation "expectation" from Em_op.]

For the emotion classification task, p_num and n_num are the numbers of positive and negative labels assigned by users to the query word and viewpoint word in the dataset. For the emotion detection task, emotions are divided into two categories, emotions of positive orientation and emotions of negative orientation, and the numbers of sub-labels under these two categories are taken as the values of p_num and n_num. Knowledge with a higher probability of conflict has a smaller expectation, so potentially conflicting knowledge can be filtered effectively by setting expectation_gate. In equation (7), expectation is the calculated knowledge expectation, Em_op is the intermediate result obtained from equation (6), and Em_i denotes the expected value of knowledge whose occurrence frequency is i; summing over Em_i achieves the goal of normalizing the knowledge expectation.
Preferably, in the knowledge global attention mechanism, the knowledge that best matches the text is selected by formulas (8), (9) and (10):

[Equation (8), rendered only as an image in the original, computes the knowledge weight w by balancing the similarity information, the distance information and the knowledge expectation with the balance factor C.]

simi = sim(op1, op2) = cos(vec(op1, op2))   (9)

dis = |argmax(S) - idx(T)|, where S = sim(op_j, input_0[i])   (10)

Equation (8) comprises two steps:
1) First, the similarity information and distance information in e_set are calculated. The similarity information simi and the distance information dis are obtained by comparing the viewpoint words in the knowledge with the viewpoint words in the input text. As shown in formula (9), simi is calculated as the cosine similarity of the vectorized op1 and op2, where op1 is a viewpoint word appearing in the knowledge and op2 is a viewpoint word appearing in the input text. The distance information dis represents the degree of matching between the viewpoint word and the query word T; s_l is the number of words in the input text. In formula (10), the input text is first traversed to obtain the similarity array S of the viewpoint word and the text, and then the number of words separating the viewpoint word with the maximum similarity from the query word T is found.
2) After the similarity information and position information are calculated, the information-balance problem is considered. Knowledge to be integrated has a lower expectation because, in equation (3), a lower expectation threshold is set so that e_set retains more knowledge. Therefore, formula (8) reuses the knowledge expectation calculated by formula (7) and balances it against the result of step 1), with the hyperparameter C in equation (8) serving as the balance factor. Finally, the weight w of the knowledge is obtained through formulas (8)-(10), and sorting the knowledge by w selects the most effective knowledge for the input text.
Preferably, the knowledge integration module integrates the knowledge output by the knowledge query module into the input text. For the input text input_0, the finally integrated knowledge is K1 and K2. Integrating K1 and K2 can help the system make more reasonable inferences; however, directly splicing K1 and K2 into the input text would distort the meaning of the input text itself.
Preferably, the word representation generation module converts the knowledge-enhanced text input_1 into knowledge-enhanced word representations. The knowledge-enhanced text input_1 is first converted into the sum of three encodings: sequence encoding, segment encoding and position encoding; the encoded sum is then passed as input to the system.
The invention provides a strategy for automatically generating emotional knowledge and designs a general emotional knowledge integration framework that helps a model generate word vectors with enhanced emotional semantics and emotion dependency information. Considering that automatically generated emotional knowledge may contain noise, the injected knowledge is filtered with a strict knowledge filtering strategy.
The proposed strategy for automatically generating emotional knowledge can extract emotional knowledge directly from text data. Considering the complexity and conflict of human emotions, the proposed knowledge expectation uses statistical information about the knowledge to filter potentially conflicting knowledge, preventing such conflicting knowledge from misleading the model.
The proposed general emotion knowledge integration framework selects the best-matching knowledge for the text through the designed filters. Considering that the filtering strategy cannot remove all noise, a knowledge noise optimization objective is added to the system, so that the model can better generate word vectors containing emotion-target dependency information.
Drawings
FIG. 1 is a schematic diagram of the architecture of the word distributed expression learning system based on emotion knowledge enhancement in embodiment 1;
FIG. 2 is a schematic diagram of the dependency analysis of a target-domain text in embodiment 1;
FIG. 3 is a graph showing the effect of the constraint variable g on the experimental results in embodiment 1;
FIG. 4 is a graph showing the effect of the constraint variable λ on the experimental results in embodiment 1.
Detailed Description
For a further understanding of the present invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings and examples. It is to be understood that the examples are illustrative of the invention and not limiting.
Example 1
As shown in FIG. 1, this embodiment 1 provides a word distributed expression learning system based on emotion knowledge enhancement, which comprises an emotion knowledge integration framework and a weakly supervised knowledge generation framework. The emotion knowledge integration framework comprises a knowledge query module, a knowledge integration module and a word representation generation module. The weakly supervised knowledge generation framework generates a domain emotion dictionary (DSD), which integrates three kinds of resources: unlabeled text of the target domain, a domain-independent emotion dictionary, and labels of the target-domain text.
In FIG. 1, the English terms are glossed as follows:
input0: the original input text
input1: the knowledge-enhanced text
Knowledge Attention: the knowledge global attention mechanism
BERT: a model pre-trained on text data in natural language processing, characterized by bidirectional Transformer encoding, used to generate word vectors
Domain texts: unlabeled domain texts, used to extract viewpoint word pairs
Text labels: the labels corresponding to the domain texts
Lexicons: emotion dictionaries that assign emotional tendencies to the extracted viewpoint word pairs
Source Incorporation: resource integration, which merges the domain texts, text labels and emotion dictionaries into a domain emotion dictionary
Filters: knowledge filters, which remove invalid knowledge in the query module
BERT loss: the loss functions of the BERT model, comprising the MLM and NSP loss functions
Noise loss: the knowledge noise loss function, optimized jointly with the BERT loss
SVD: the singular value decomposition algorithm
[CLS]: BERT's special token for classification tasks, representing the global context semantics
[SEP]: BERT's sentence-separation token; if there is only one input sentence, it marks the end of that sentence
Down-Stream tasks: downstream tasks, in this framework mainly emotion classification and emotion detection
Knowledge Query: the knowledge query module
Knowledge Incorporation: the knowledge integration module
Word Representation Generation: the word representation generation module.
Knowledge query module
Given a comment sentence S, the function of the knowledge query module is to help this sentence find the knowledge most likely to assist in analyzing S. To this end, the input sentence is segmented into words, and each word is used as a query object against the domain emotion dictionary DSD. The queried knowledge is filtered with filters built on knowledge expectation and a knowledge global attention mechanism, and passes through three states: the original knowledge set o_set, the expected knowledge set e_set, and the candidate knowledge set c_set. The knowledge set obtained by the query request, i.e., the original knowledge set, is given by (1):

o_set = Knowledge_Query(T, DSD)   (1)

where T is a query word and Knowledge_Query is the knowledge query function. The content of o_set is shown in (2):

o_set = [(T, op_0, judge_0, fr_0, conflict_0, p_num_0, n_num_0, lexicon_po_0), ..., (T, op_i, judge_i, fr_i, conflict_i, p_num_i, n_num_i, lexicon_po_i)]   (2)

The subscripts 0 and i index the knowledge entries. The knowledge in o_set is raw and unprocessed, where op_i is the viewpoint word matched by the query word T, judge_i is the emotion polarity assigned once T matches op_i, fr_i is the number of co-occurrences of T and op_i in the knowledge source corpus, conflict_i indicates whether the knowledge is conflicting in the knowledge source, p_num and n_num are the numbers of positive and negative cognitions in conflicting cognition, respectively, and lexicon_po_i is the emotional tendency value of the knowledge in the external emotion dictionary. To better screen out knowledge with conflicting cognition, a knowledge expectation filter is introduced to filter potentially conflicting knowledge by (3):

e_set = E_Filter(o_set, expectation_gate)   (3)

In (3), E_Filter is the knowledge expectation filtering function, e_set is a subset of o_set, and expectation_gate is a hyperparameter for filtering conflicting knowledge. However, knowledge expectation cannot judge whether the queried knowledge truly helps emotion analysis, so a knowledge global attention mechanism is introduced, and the knowledge in e_set is filtered through the attention filter of (4):

c_set = K_Attention(e_set, input_0)   (4)

c_set in (4) is a set of triples, whose content is shown in (5):

c_set = [(T, op_0, judge_0), ..., (T, op_s, judge_s)]   (5)

The subscript s indicates that there are s pieces of knowledge in total; op is a viewpoint word matched with the query word in the knowledge base, and judge is the emotion polarity when the query word T matches the viewpoint word op. The knowledge in c_set will be integrated into the text.
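For illustration, the following minimal Python sketch mirrors this query-and-filter pipeline. The KnowledgeEntry layout and the function names knowledge_query and e_filter are assumptions chosen to match equations (1)-(3), not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class KnowledgeEntry:
    op: str            # viewpoint word matched with the query word T
    judge: int         # emotion polarity of the (T, op) pair (+1 / -1)
    fr: int            # co-occurrences of T and op in the source corpus
    conflict: bool     # whether the knowledge source holds conflicting views
    p_num: int         # positive cognitions among the conflicting votes
    n_num: int         # negative cognitions among the conflicting votes
    lexicon_po: float  # emotional tendency value from the external lexicon

def knowledge_query(T: str, dsd: Dict[str, List[KnowledgeEntry]]) -> List[Tuple[str, KnowledgeEntry]]:
    """Equation (1): retrieve the raw knowledge set o_set for query word T."""
    return [(T, entry) for entry in dsd.get(T, [])]

def e_filter(o_set, expectation: Callable[[KnowledgeEntry], float], gate: float):
    """Equation (3): keep entries whose knowledge expectation clears the gate."""
    return [(T, e) for (T, e) in o_set if expectation(e) >= gate]

# Example: a one-entry DSD for the query word "battery".
dsd = {"battery": [KnowledgeEntry("fast", -1, 9, True, 4, 5, -0.4)]}
o_set = knowledge_query("battery", dsd)
```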
Knowledge expectation
Different users may hold different opinions about the same pair of query word and viewpoint word. Taking the query word "movie" and the viewpoint word "new" as an example, summarizing a published movie review dataset shows that among 9 queried user reviews, 5 users like new movies and 4 do not. We regard such knowledge as potentially conflicting, since whatever emotion polarity we assign to "movie" and "new" may mislead the model's understanding of new movies. Potentially conflicting knowledge is therefore filtered by equations (6) and (7):

Em_op = (p_num/fr - n_num/fr)   (6)

[Equation (7), rendered only as an image in the original, computes the knowledge expectation "expectation" from Em_op.]

For the emotion classification task, p_num and n_num are the numbers of positive and negative labels assigned by users to the query word and the viewpoint word in the dataset. For the emotion detection task, emotions are divided into two categories, emotions of positive orientation and emotions of negative orientation, and the numbers of sub-labels under these two categories are taken as the values of p_num and n_num. Knowledge with a higher probability of conflict has a smaller expectation, so potentially conflicting knowledge can be filtered effectively by setting expectation_gate. In equation (7), expectation is the calculated knowledge expectation, Em_op is the intermediate result obtained from equation (6), and Em_i denotes the expected value of knowledge whose occurrence frequency is i; summing over Em_i normalizes the knowledge expectation.
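A short sketch of this filter follows. Equation (6) is implemented as written; equation (7) survives only as an image in the original, so the normalization used here is a labeled assumption.

```python
def em_op(p_num: int, n_num: int, fr: int) -> float:
    """Equation (6): Em_op = p_num/fr - n_num/fr."""
    return p_num / fr - n_num / fr

def knowledge_expectation(p_num: int, n_num: int, fr: int,
                          em_by_freq: dict) -> float:
    """Assumed reading of equation (7): normalize |Em_op| by the sum of the
    per-frequency expectations Em_i (em_by_freq maps frequency i -> Em_i)."""
    total = sum(abs(v) for v in em_by_freq.values()) or 1.0
    return abs(em_op(p_num, n_num, fr)) / total

# The "new movie" example: 5 positive vs 4 negative votes over fr=9 reviews
# gives Em_op = 1/9, a small value, so an expectation_gate of 0.6 filters
# this potentially conflicting knowledge out.
print(em_op(5, 4, 9))  # 0.111...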
Knowledge global attention mechanism
To avoid integrating knowledge that is irrelevant to the text, we design a knowledge global attention mechanism that selects the knowledge best matching the text through equations (8), (9) and (10):

[Equation (8), rendered only as an image in the original, computes the knowledge weight w by balancing the similarity information, the distance information and the knowledge expectation with the balance factor C.]

simi = sim(op1, op2) = cos(vec(op1, op2))   (9)

dis = |argmax(S) - idx(T)|, where S = sim(op_j, input_0[i])   (10)

Equation (8) comprises two steps:
1) First, the similarity information and distance information in e_set are calculated. The similarity information simi and the distance information dis are obtained by comparing the viewpoint words in the knowledge with the viewpoint words in the input text. As shown in formula (9), simi is calculated as the cosine similarity of the vectorized op1 and op2, where op1 is a viewpoint word appearing in the knowledge and op2 is a viewpoint word appearing in the input text. The distance information dis represents the degree of matching between the viewpoint word and the query word T; s_l is the number of words in the input text. In formula (10), the input text is first traversed to obtain the similarity array S of the viewpoint word and the text, and then the number of words separating the viewpoint word with the maximum similarity from the query word T is found.
2) After the similarity information and position information are calculated, the information-balance problem is considered. Knowledge to be integrated has a lower expectation because, in equation (3), a lower expectation threshold is set so that e_set retains more knowledge. Therefore, formula (8) reuses the knowledge expectation calculated by formula (7) and balances it against the result of step 1), with the hyperparameter C in equation (8) serving as the balance factor. Finally, the weight w of the knowledge is obtained through formulas (8)-(10), and sorting the knowledge by w selects the most effective knowledge for the input text.
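The sketch below implements equations (9) and (10) as stated; equation (8) exists only as an image in the original, so the particular weighted combination in knowledge_weight is an assumption used for illustration.

```python
import numpy as np

def simi(vec1: np.ndarray, vec2: np.ndarray) -> float:
    """Equation (9): cosine similarity of two vectorized viewpoint words."""
    return float(vec1 @ vec2 / (np.linalg.norm(vec1) * np.linalg.norm(vec2)))

def dis(op_vec: np.ndarray, text_vecs: list, t_idx: int) -> int:
    """Equation (10): traverse the text, build the similarity array S, and
    count the words separating the most similar text word from T at t_idx."""
    S = [simi(op_vec, w) for w in text_vecs]
    return abs(int(np.argmax(S)) - t_idx)

def knowledge_weight(sim_val: float, dis_val: int, s_l: int,
                     expectation: float, C: float = 0.5) -> float:
    """Assumed form of equation (8): balance the matching quality (similarity
    discounted by normalized distance) against the knowledge expectation,
    using the balance factor C."""
    match = sim_val * (1.0 - dis_val / s_l)
    return C * match + (1.0 - C) * expectation
```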
Knowledge integration module
The knowledge integration module integrates the knowledge output by the knowledge query module into the input text. As shown in FIG. 1, for the input text input_0, the finally integrated knowledge is K1 and K2. Integrating K1 and K2 can help the system (the BERT model) make more reasonable inferences; however, directly splicing K1 and K2 into the input text would distort the meaning of the input text itself.
Word representation generation module
The word representation generation module converts the knowledge-enhanced text input_1 into knowledge-enhanced word representations. The knowledge-enhanced text input_1 is first converted into the sum of three encodings: sequence encoding, segment encoding and position encoding; the encoded sum is then passed as input to the system (the BERT model). The BERT model here contains 12 layers with 12 multi-head attention blocks, and the dimension of the finally output word vectors is 768. Like other pre-trained language models, the BERT model here comprises two phases: pre-training and fine-tuning. The word representations obtained by pre-training contain general knowledge; the fine-tuning stage is first initialized with the pre-trained word vectors and then learns the word representations jointly with the selected knowledge. Besides its own training task, the word representation learning module is constrained by the knowledge noise module.
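As a concrete illustration of the three-encoding sum, the following sketch uses the Hugging Face bert-base-uncased checkpoint as a stand-in for the patent's 12-layer, 768-dimensional model; the knowledge-enhanced sentence input_1 shown is a made-up example.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# A hypothetical knowledge-enhanced input: original text plus spliced knowledge.
input_1 = "the battery drains fast [SEP] fast negative"
enc = tokenizer(input_1, return_tensors="pt")

emb = model.embeddings  # holds the three embedding tables used by BERT
token_e = emb.word_embeddings(enc["input_ids"])           # sequence encoding
seg_e = emb.token_type_embeddings(enc["token_type_ids"])  # segment encoding
pos_ids = torch.arange(enc["input_ids"].size(1)).unsqueeze(0)
pos_e = emb.position_embeddings(pos_ids)                  # position encoding
summed = token_e + seg_e + pos_e  # the encoded sum passed into the encoder
```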
Training an objective function
The objective function consists of three parts: 1) the masked language model loss (MLM), which helps the model capture the semantics of individual words within a sentence; 2) the next-sentence prediction loss (NSP), which helps the model capture the relationships between sentences; and 3) the knowledge noise constraint loss (KNC), which denoises the text with integrated knowledge. Parts 1) and 2) are consistent with the BERT model, while 3) is the loss function we designed.

As shown in equation (11), for the KNC loss, singular value decomposition (SVD) is applied to the [CLS] representation, i.e., the aforementioned token that carries the meaning of the whole text:

[CLS] = UΣV^T   (11)

In equation (11), U and Σ are, respectively, the singular vectors and the corresponding singular values computed by the SVD algorithm. After singular value decomposition, the elements on the main diagonal of the Σ matrix are in descending order. Extracting the main diagonal elements yields the singular value arrangement shown in (12):

SVT = [σ_1, σ_2, σ_3, ..., σ_b]   (12)

For the elements of SVT, the g tail singular values are constrained by equation (13).

[Equation (13), rendered only as an image in the original, defines the KNC loss L_KNC over the g tail singular values of SVT.]

The objective function of the whole framework is shown in (14), where λ is a hyperparameter balancing the noise constraint loss against BERT's own loss functions from 1) and 2):

L_total = L_MLM + L_NSP + λ·L_KNC   (14)
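A sketch of the KNC loss follows. Equation (13) is an image in the original, so penalizing the squared g tail singular values is an assumed reading; equation (14) is implemented as written. SVD is taken over a batch of [CLS] vectors, since a single vector has only one singular value.

```python
import torch

def knc_loss(cls_repr: torch.Tensor, g: int) -> torch.Tensor:
    """Equations (11)-(13), assumed form: decompose a (batch, hidden) matrix
    of [CLS] vectors and constrain its g tail singular values."""
    sigma = torch.linalg.svdvals(cls_repr)  # descending singular values, eq. (12)
    if g <= 0:
        return cls_repr.new_zeros(())
    return (sigma[-g:] ** 2).sum()

def total_loss(l_mlm: torch.Tensor, l_nsp: torch.Tensor,
               l_knc: torch.Tensor, lam: float) -> torch.Tensor:
    """Equation (14): L_total = L_MLM + L_NSP + lambda * L_KNC."""
    return l_mlm + l_nsp + lam * l_knc
```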
Weak supervision knowledge generation framework
For the text data of each domain, the weakly supervised knowledge generation framework automatically generates a domain emotion dictionary (DSD). The generated emotion dictionary integrates three kinds of resources, namely:
a) unlabeled text of the target domain;
b) a domain-independent emotion dictionary;
c) labels of the target-domain text.
Here a) and b) are essential resources and c) is optional: if the target-domain text has labels, higher-quality knowledge can be generated, but knowledge can still be generated without labels.
a) Unlabeled text of target domain
Using the text of the target domain, viewpoint word pairs can be extracted (a pair formed by combining a viewpoint word and a query word is simply called a viewpoint word pair; for example, "movie" and "new" form a viewpoint word pair). A syntactic dependency parse tree and the corresponding amod and nsubj rules are used to extract viewpoint word pairs.
As shown in FIG. 2, for the example text "Delicious soup, though the noodles were just underseasoned", the syntactic dependency tree shows that the viewpoint word pair extracted by the amod rule is "delicious soup", and the pair extracted by the nsubj rule is "underseasoned noodles".
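A sketch of the amod/nsubj extraction is shown below, using spaCy as an illustrative dependency parser (the parser named in the original text is garbled, so the choice of spaCy is an assumption).

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def extract_pairs(text: str) -> list:
    """Extract (query word, viewpoint word) pairs via amod and nsubj arcs."""
    pairs = []
    for tok in nlp(text):
        if tok.dep_ == "amod":                        # e.g. "delicious soup"
            pairs.append((tok.head.text, tok.text))
        elif tok.dep_ == "nsubj":
            head = tok.head
            if head.pos_ == "ADJ":                    # adjective as head
                pairs.append((tok.text, head.text))
            else:                                     # copula: adjective as acomp
                for child in head.children:
                    if child.dep_ == "acomp":
                        pairs.append((tok.text, child.text))
    return pairs

print(extract_pairs("Delicious soup, though the noodles were underseasoned."))
```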
b) Domain independent emotion dictionary
The domain-independent emotion dictionary assigns an appropriate emotion polarity to each extracted viewpoint word pair. Taking SentiWordNet 3.0 as the emotion dictionary, the viewpoint word in each viewpoint word pair is first looked up in the dictionary, and its emotion polarity there is calculated. Some viewpoint words may not be found in the emotion dictionary; for viewpoint word pairs containing such words, polarity assignment depends on the label of the text from which the pair was extracted. If the text has a label, the text label is converted into the emotion polarity of the viewpoint word pair; if the text has no label, the pair is discarded directly to avoid introducing knowledge noise. After integrating the target-domain text and the emotion dictionary information, a knowledge triple set (viewpoint word pair + emotional tendency) for the specific domain can be generated.
c) Label for target domain text
Besides assigning emotion labels to viewpoint word pairs, the text labels of the target domain also support the calculation of knowledge expectation. After integrating resources a) and b), we obtain a triple set (viewpoint word pair + emotional tendency) and define the emotional tendencies in the triple set as voting labels. Over the text of the whole domain, the same viewpoint word pair may receive multiple voting labels; for example, the pair ("movie", "new") mentioned above receives 9 voting labels in total, of which 5 are positive and 4 are negative. These label counts are the p_num and n_num in equation (6). With the help of the voting labels, the knowledge expectation can be calculated, and potentially conflicting knowledge can be filtered out through it, as sketched below.
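The counting of voting labels can be sketched as follows; the function name and the triple layout are illustrative assumptions.

```python
from collections import Counter

def vote_counts(triples) -> Counter:
    """triples: iterable of ((query_word, viewpoint_word), polarity) pairs
    gathered from labeled domain texts, with polarity in {"pos", "neg"}."""
    votes = Counter()
    for pair, polarity in triples:
        votes[(pair, polarity)] += 1
    return votes

# The "new movie" example: 5 positive and 4 negative voting labels.
votes = vote_counts([(("movie", "new"), "pos")] * 5 +
                    [(("movie", "new"), "neg")] * 4)
p_num = votes[(("movie", "new"), "pos")]  # 5, the p_num of equation (6)
n_num = votes[(("movie", "new"), "neg")]  # 4, the n_num of equation (6)
```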
Data set
We verify the effectiveness of our model on emotion classification and emotion detection datasets. The details of each dataset are shown in Table 1; for each dataset we generated corresponding emotional knowledge. To better evaluate the model on these different-domain datasets, we split each dataset 7:1:2 into training (Train), validation (Dev) and test (Test) sets.
TABLE 1: Multi-domain dataset information
[Table 1, rendered only as an image in the original patent.]
where S.C. indicates that the task type is emotion classification and E.D. indicates that the task type is emotion detection.
SST datasets: the Stanford Sentiment Treebank (SST) is derived from movie reviews. The corresponding task is sentence-level emotion classification; analyzing the raw data yields the probability that a sentence's emotional tendency is positive. SST-3 converts the raw data into three emotion polarity labels according to this probability: positive, neutral and negative. SST-5 converts the raw data into five emotion labels: very positive, positive, neutral, negative and very negative.
MR dataset: the Movie Review dataset (MR) classifies the collected movie review data into positive and negative according to emotion polarity. The corresponding task is sentence-level emotion classification, a binary classification task.
Alm dataset: a fairy tale dataset (affect data, distributed by Cecilia Ovesdotter Alm). The Alm dataset originates from fairy tale books and contains five categories of emotions: anger-disgust, fear (fearful), joy (happy), sadness (sad) and surprise (surprised). It is a sentence-level emotion detection task.
Aman dataset: a blog dataset (Emotion-Annotated Dataset, distributed by Saima Aman). The Aman dataset comprises a large amount of informal blog data, and the corresponding task is emotion detection. The dataset contains 1290 labeled sentences, with labels of joy (happy), sadness (sad), disgust (disgust), fear (fearful) and surprise (surprised).
Baseline models
Our knowledge-enhanced model is compared with large-scale corpus pre-training models, emotion-knowledge-enhanced pre-trained language models, and general emotion word representation learning models without pre-training. These three types of models, together with our emotional knowledge enhancement model, are introduced as follows:
Large-scale corpus pre-training models: we used BERT and BERT-PT as baselines for large-scale pre-trained language models. BERT is pre-trained on Wikipedia and book corpora, while BERT-PT is further pre-trained on five-star-review Amazon data and the Yelp dataset. Both models achieve excellent performance on various natural language processing tasks.
Emotion-knowledge-enhanced pre-trained language models: we used SentiBERT and K-BERT as baselines for emotion-knowledge-enhanced pre-trained language models. SentiBERT improves multi-class emotion analysis by integrating sentiment composition knowledge. K-BERT is a knowledge-enabled language model that improves sentiment analysis by integrating a knowledge graph. Since K-BERT is not specifically designed for English text, we improved the K-BERT code and let it integrate our own generated knowledge in an unconstrained way.
General emotion word representation learning models without pre-training: we used SGlove and Emo2Vec as baselines. Before the appearance of pre-trained language models, both achieved very competitive performance on emotion analysis.
Our emotional knowledge enhancement models: our models are SKG-BERT and SKG-BERT-PT, obtained by applying the automatically generated knowledge and the corresponding knowledge constraint strategy to the BERT and BERT-PT models. These two models further explore whether our framework and knowledge constraint strategy remain effective when pre-training has already learned relevant knowledge.
Experimental setup
Experimental data for the three major types of baseline models were obtained by reproducing them. To ensure fair comparison, the settings of the large-scale corpus pre-training models and the emotion-knowledge-enhanced pre-trained language models among the three types of baselines were kept consistent with the BERT-Base model. For the general emotion word representation learning models without pre-training, we concatenated their word vectors with GloVe and used a logistic regression classifier for the emotion analysis studies.
Our emotion knowledge enhancement models are SKG-BERT and SKG-BERT-PT, whose pre-training parameters are converted directly from BERT and BERT-PT. Knowledge enhancement and fusion are performed during fine-tuning. Our experiments were run on an AMAX compute server (one Tesla V100 GPU). The best parameters were searched with grid search.
TABLE 2: Hyperparameter search table
[Table 2, rendered only as an image in the original patent.]
"choice" indicates that each parameter listed in the option is to be tested.
As shown in Table 2, when the domain emotion dictionary DSD is generated by integrating resources, an entity filter is set to filter out high-frequency words; knowledge expectation is applied in the knowledge query module, and, based on data analysis and observation of the knowledge, its threshold is set to 0.6, so all knowledge with expectation below 0.6 is filtered out. We constrain the knowledge through g and λ, where g is the number of features to constrain in equation (13) and λ is the hyperparameter balancing the loss functions in equation (14). C is the hyperparameter in equation (8) that balances distance information and similarity information, and similarity_gate is the hyperparameter used in the knowledge global attention mechanism: knowledge below the similarity threshold is not integrated into the text. The optimal hyperparameters are found by grid search; for emotion classification tasks, accuracy is the reference standard during tuning, and for emotion detection, the macro F1 score is the reference standard. To keep our model consistent with real-world conditions, our knowledge generation strategy generates knowledge only for the training set, because test data are usually unknown in the real world.
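For illustration, the grid search over these hyperparameters can be sketched as follows; the candidate value lists are hypothetical, since Table 2 survives only as an image.

```python
from itertools import product

# Hypothetical grid over the Table 2 hyperparameters.
grid = {
    "expectation_gate": [0.6],
    "g": [0, 2, 4, 8],
    "lambda": [0.0, 0.1, 0.5, 1.0],
    "C": [0.3, 0.5, 0.7],
    "similarity_gate": [0.5, 0.7],
}

def grid_search(evaluate):
    """evaluate: maps a config dict to accuracy (for classification) or
    macro F1 (for detection), per the tuning criteria in the text."""
    best_cfg, best_score = None, float("-inf")
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```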
Results of the experiment
Emotion analysis
The evaluation results on SST-3, SST-5 and the MR dataset are shown in Table 3. Although the datasets come from different domains, SKG-BERT-PT still achieves the best results compared with all other baseline models, reflecting the effectiveness of word representations enhanced with external emotional knowledge. Compared with the large-scale corpus pre-training models BERT and BERT-PT without emotion knowledge enhancement, SKG-BERT and SKG-BERT-PT, after applying our proposed framework and the automatically generated knowledge, respectively achieve notable improvements in emotion analysis. This demonstrates the effectiveness of our general framework, the generated emotional knowledge and the corresponding knowledge constraint strategy.
TABLE 3: Emotion classification accuracy (%) of the models
[Table 3, rendered only as an image in the original patent.]
Emotion detection
We further verified the effects of SKG-BERT and SKG-BERT-PT on the Alm and Aman datasets; model performance is shown in Table 4. We evaluated the models using macro F1, and also recorded each model's macro precision and recall. Table 4 presents the overall effect of our models: compared with all other baselines, our emotion knowledge enhancement model SKG-BERT performs best on the Alm dataset and SKG-BERT-PT performs best on the Aman dataset. This further demonstrates the superiority of our models on the finer-grained emotion detection task. In addition, because BERT is pre-trained on books and Wikipedia while BERT-PT is pre-trained on review data, and the Alm data come from fairy tale books that may overlap with BERT's pre-training corpus, BERT outperforms BERT-PT on the Alm dataset and SKG-BERT outperforms SKG-BERT-PT there. The Aman dataset comes from blogs, whose mostly informal text resembles BERT-PT's pre-training corpus, so on the Aman dataset BERT-PT outperforms BERT and SKG-BERT-PT outperforms SKG-BERT.
TABLE 4: Precision (P.), recall (R.) and macro F1 of the models on emotion detection
[Table 4, rendered only as an image in the original patent.]
Analysis of Experimental Effect
To better demonstrate the effectiveness of our proposed framework, the generated knowledge containing emotion-target dependency, and the knowledge constraints, we performed further experiments. Under different knowledge constraint strategies, we computed the macro F1 values on the Alm dataset; the results are shown in FIG. 3 and FIG. 4.
Influence of the constraint variables g and λ
The variable g is the number of constrained features in equation (13), and λ is the loss-balancing variable in equation (14); both play very important roles in the knowledge constraint process. FIG. 3 and FIG. 4 analyze their effects experimentally. We found the following:
1) Even without knowledge constraints (when g is 0 or λ is 0, our model applies no knowledge constraint), our model still improves the performance of the baseline models on the Alm dataset. This demonstrates that the quality of our generated emotional knowledge is high.
2) Appropriate values of g and λ help the model perform best. Setting g too large harms downstream tasks, because constraining too many features changes the meaning of the text itself. Setting λ too large also reduces performance, because the model is then optimized with too much attention to noise reduction and too little attention to the emotion detection task itself.
3) Applying the knowledge constraint strategy alone, without knowledge enhancement, can also improve model performance, since the input text itself may be noisy.
Effect of the entity filter
For entity words that appear frequently in pre-training, the relevant knowledge has likely already been learned during pre-training, and injecting it into the model brings little help. We therefore use entity filters to filter out words that may occur frequently in the pre-training corpus, ensuring the high quality of the generated knowledge. From FIG. 3 and FIG. 4, we have the following findings:
1) Entity filters of different lengths yield knowledge of different quality. The overall trend is that the longer the entity rejection limit, the higher the quality of the generated knowledge (in FIG. 3 and FIG. 4, this can be observed when g and λ are set to 0, respectively).
2) An appropriate entity filter length improves the model most; longer is not always better, because the total amount of generated knowledge decreases as the filter length increases. Balancing knowledge quality against knowledge quantity is important.
Our analytical experiments support the conclusion that the generated knowledge, the corresponding knowledge constraint strategy and the proposed general emotion knowledge integration framework are effective, and that they address the emotion-target dependency problem raised at the outset.
The invention and its embodiments have been described above schematically, and the description is not limiting; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, without departing from the spirit of the invention, structures and embodiments similar to the technical solution devised without creative effort by a person of ordinary skill in the art shall fall within the protection scope of the invention.

Claims (1)

1. A word distributed expression learning system based on emotion knowledge enhancement, characterized in that: the system comprises an emotion knowledge integration framework and a weakly supervised knowledge generation framework; the emotion knowledge integration framework comprises a knowledge query module, a knowledge integration module and a word representation generation module; the weakly supervised knowledge generation framework is used for generating a domain emotion dictionary DSD, and the DSD integrates the resources of unlabeled text of a target domain, a domain-independent emotion dictionary and labels of the target-domain text;
in the knowledge query module, given a comment sentence S, the function of the knowledge query module is to help the sentence S find the knowledge most likely to assist in analyzing the sentence S; to this end, the input sentence is segmented into words, and each word is used as a query object to query the domain emotion dictionary DSD; the queried knowledge is filtered with filters built on knowledge expectation and a knowledge global attention mechanism, and the filtered knowledge passes through three states: an original knowledge set o_set, an expected knowledge set e_set and a candidate knowledge set c_set; the knowledge set obtained by the knowledge query request, i.e. the original knowledge set, is obtained by (1):
o_set = Knowledge_Query(T, DSD)   (1)
wherein T is a query word, Knowledge_Query is the knowledge query function, and the content of o_set is as shown in (2):
o_set = [(T, op_0, judge_0, fr_0, conflict_0, p_num_0, n_num_0, lexicon_po_0), ..., (T, op_i, judge_i, fr_i, conflict_i, p_num_i, n_num_i, lexicon_po_i)]   (2)
the knowledge in o_set is raw and unprocessed, wherein op_i is the viewpoint word matched by the query word T, judge_i is the emotion polarity assigned after the query word T matches the viewpoint word op_i, fr_i is the number of co-occurrences of the query word T and the viewpoint word op_i in the knowledge source corpus, conflict_i indicates whether the knowledge is conflicting in the knowledge source, p_num and n_num represent the numbers of positive and negative cognitions in conflicting cognition, respectively, and lexicon_po_i represents the emotional tendency value of the knowledge in an external emotion dictionary;
a knowledge expectation filter is then introduced to filter potentially conflicting knowledge by (3):
e_set = E_Filter(o_set, expectation_gate)   (3)
in (3), E_Filter is the knowledge expectation filtering function, e_set is a subset of o_set, and expectation_gate is a hyperparameter for filtering conflicting knowledge;
a knowledge global attention mechanism is then introduced, and the knowledge in e_set is filtered through the attention filter of (4):
c_set = K_Attention(e_set, input_0)   (4)
c_set in (4) is a set of triples, whose content is as shown in (5):
c_set = [(T, op_0, judge_0), ..., (T, op_s, judge_s)]   (5)
wherein op is a viewpoint word matched with the query word in the knowledge base, and judge is the emotion polarity when the query word T matches the viewpoint word op; the knowledge in c_set will be integrated into the text;
the knowledge expectation is calculated and potentially conflicting knowledge is filtered by equations (6) and (7):
Em_op = (p_num/fr - n_num/fr)   (6)
[Equation (7), rendered only as an image in the original, computes the knowledge expectation from Em_op.]
for the emotion classification task, p_num and n_num are the numbers of positive and negative labels assigned by users to the query word and the viewpoint word in the dataset; for the emotion detection task, emotions are divided into two categories, namely emotions of positive orientation and emotions of negative orientation, and the numbers of sub-labels under the two categories of labels are taken as the values of p_num and n_num; knowledge with a higher probability of conflict has a smaller expectation, so potentially conflicting knowledge can be effectively filtered by setting expectation_gate; in equation (7), expectation is the calculated knowledge expectation, Em_op is the intermediate result obtained from equation (6), and Em_i denotes the expected value of knowledge whose occurrence frequency is i; summing over Em_i achieves the purpose of normalizing the knowledge expectation;
in the knowledge global attention mechanism, the knowledge that best matches the text is selected by formulas (8), (9), (10):
[Equation (8), rendered only as an image in the original, computes the knowledge weight w by balancing the similarity information, the distance information and the knowledge expectation with the balance factor C.]
simi = sim(op1, op2) = cos(vec(op1, op2))   (9)
dis = |argmax(S) - idx(T)|, where S = sim(op_j, input_0[i])   (10)
equation (8) comprises two steps:
1) first, the similarity information and distance information in e_set are calculated; the similarity information simi and the distance information dis are obtained by comparing the viewpoint words in the knowledge with the viewpoint words in the input text; as shown in formula (9), simi is calculated as the cosine similarity of the vectorized op1 and op2, wherein op1 is a viewpoint word appearing in the knowledge and op2 is a viewpoint word appearing in the input text; the distance information dis represents the degree of matching between the viewpoint word and the query word T; s_l is the number of words in the input text; in formula (10), the input text is first traversed to obtain the similarity array S of the viewpoint word and the text, and then the number of words separating the viewpoint word with the maximum similarity from the query word T is found;
2) after the similarity information and position information are calculated, the information-balance problem is considered; knowledge to be integrated has a lower expectation because, in equation (3), a lower expectation threshold is set so that e_set contains more knowledge; therefore, in formula (8), the knowledge expectation calculated by formula (7) is reused and balanced against the calculation result of step 1), wherein the hyperparameter C in equation (8) is the balance factor; finally, the weight w of the knowledge is obtained through formulas (8)-(10), and sorting the knowledge by w selects the most effective knowledge for the input text;
the knowledge integration module is used for integrating the knowledge output by the knowledge query module into the input text; for the input text input_0, the finally integrated knowledge is K1 and K2; integrating K1 and K2 can help the system make more reasonable inferences, however directly splicing K1 and K2 into the input text would distort the meaning of the input text itself;
the word representation generation module is used for converting the knowledge-enhanced text input_1 into knowledge-enhanced word representations; the knowledge-enhanced text input_1 is first converted into the sum of three encodings: sequence encoding, segment encoding and position encoding; the encoded sum is then passed as input to the system.
CN202111531641.6A 2021-12-14 2021-12-14 Word distributed expression learning system based on emotion knowledge enhancement Active CN114417814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111531641.6A CN114417814B (en) 2021-12-14 2021-12-14 Word distributed expression learning system based on emotion knowledge enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111531641.6A CN114417814B (en) 2021-12-14 2021-12-14 Word distributed expression learning system based on emotion knowledge enhancement

Publications (2)

Publication Number Publication Date
CN114417814A CN114417814A (en) 2022-04-29
CN114417814B true CN114417814B (en) 2022-11-15

Family

ID=81268153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111531641.6A Active CN114417814B (en) 2021-12-14 2021-12-14 Word distributed expression learning system based on emotion knowledge enhancement

Country Status (1)

Country Link
CN (1) CN114417814B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750648A (en) * 2019-10-21 2020-02-04 南京大学 Text emotion classification method based on deep learning and feature fusion
CN113535889A (en) * 2020-04-20 2021-10-22 阿里巴巴集团控股有限公司 Comment analysis method and device
CN113535957A (en) * 2021-07-27 2021-10-22 哈尔滨工业大学 Conversation emotion recognition network model based on dual knowledge interaction and multitask learning, construction method, electronic device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933664B (en) * 2019-03-12 2021-09-07 中南大学 Fine-grained emotion analysis improvement method based on emotion word embedding

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750648A (en) * 2019-10-21 2020-02-04 南京大学 Text emotion classification method based on deep learning and feature fusion
CN113535889A (en) * 2020-04-20 2021-10-22 阿里巴巴集团控股有限公司 Comment analysis method and device
CN113535957A (en) * 2021-07-27 2021-10-22 哈尔滨工业大学 Conversation emotion recognition network model based on dual knowledge interaction and multitask learning, construction method, electronic device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-Head Self-Attention Transformation Networks for Aspect-Based Sentiment Analysis; Yuming Lin et al.; IEEE Access; IEEE; 2021-01-05; vol. 9; pp. 8762-8770 *
Collaborative Extraction of Opinion Word Pairs Based on Dependency Relation Analysis (基于依赖联系分析的观点词对协同抽取); Zhao Wei et al.; Computer Science (计算机科学); 2020-08-15; vol. 47, no. 8; pp. 164-170 *
Research on Fine-Grained Text Sentiment Analysis Based on Attention Mechanism (基于注意力机制的细粒度文本情感分析研究); Sun Ling; China Master's Theses Full-text Database (Information Science and Technology); 2021-02-15; I138-2683 *

Also Published As

Publication number Publication date
CN114417814A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
Devika et al. Sentiment analysis: a comparative study on different approaches
Liu et al. Learning to spot and refactor inconsistent method names
Chouikhi et al. Arabic sentiment analysis using BERT model
Zhao et al. ZYJ123@ DravidianLangTech-EACL2021: Offensive language identification based on XLM-RoBERTa with DPCNN
Ge et al. BACO: A background knowledge-and content-based framework for citing sentence generation
Kaur Incorporating sentimental analysis into development of a hybrid classification model: A comprehensive study
Al-Rajebah et al. Extracting ontologies from Arabic Wikipedia: A linguistic approach
CN112183059A (en) Chinese structured event extraction method
Drozdov et al. Unsupervised labeled parsing with deep inside-outside recursive autoencoders
CN115952292A (en) Multi-label classification method, device and computer readable medium
Samih et al. Enhanced sentiment analysis based on improved word embeddings and XGboost.
Phan et al. Exploring zero-shot cross-lingual aspect-based sentiment analysis using pre-trained multilingual language models
Mahto et al. Emotion prediction for textual data using GloVe based HeBi-CuDNNLSTM model
Li et al. Learning sentiment-enhanced word representations by fusing external hybrid sentiment knowledge
CN112182183A (en) Patent harmful effect knowledge mining method, device, equipment and storage medium
CN114417814B (en) Word distributed expression learning system based on emotion knowledge enhancement
Yang et al. Zero-training sentence embedding via orthogonal basis
CN113869049B (en) Fact extraction method and device with legal attribute based on legal consultation problem
Ahmad et al. Aspect Based Sentiment Analysis and Opinion Mining on Twitter Data Set Using Linguistic Rules
Zhao et al. Multi-modal sarcasm generation: dataset and solution
Nayab et al. Aspect-context level information extraction via transformer based interactive attention mechanism for sentiment classification
Kothuri et al. MALO-LSTM: Multimodal Sentiment Analysis Using Modified Ant Lion Optimization with Long Short Term Memory Network
Oshadi et al. AppGuider: Feature Comparison System using Neural Network with FastText and Aspect-based Sentiment Analysis on Play Store User Reviews
Wang Cross-lingual Transfer Learning for Low-Resource Natural Language Processing Tasks
Jin et al. Representation and Extraction of Diesel Engine Maintenance Knowledge Graph with Bidirectional Relations Based on BERT and the Bi-LSTM-CRF Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20220429

Assignee: Guilin Zhongchen Information Technology Co.,Ltd.

Assignor: GUILIN University OF ELECTRONIC TECHNOLOGY

Contract record no.: X2022450000215

Denomination of invention: A Word Distributed Representation Learning System Based on Emotional Knowledge Enhancement

Granted publication date: 20221115

License type: Common License

Record date: 20221206