CN114417814B - Word distributed expression learning system based on emotion knowledge enhancement - Google Patents


Info

Publication number
CN114417814B
CN114417814B (application CN202111531641.6A)
Authority
CN
China
Prior art keywords
knowledge
emotion
query
expectation
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111531641.6A
Other languages
Chinese (zh)
Other versions
CN114417814A (en)
Inventor
李优
林志舟
常亮
林煜明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202111531641.6A priority Critical patent/CN114417814B/en
Publication of CN114417814A publication Critical patent/CN114417814A/en
Application granted granted Critical
Publication of CN114417814B publication Critical patent/CN114417814B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/247Thesauruses; Synonyms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of emotion detection and emotion analysis, and in particular to a word distributed expression learning system based on emotion knowledge enhancement, comprising an emotion knowledge integration framework and a weakly supervised knowledge generation framework. The emotion knowledge integration framework comprises a knowledge query module, a knowledge integration module and a word representation generation module. The weakly supervised knowledge generation framework generates a domain emotion dictionary (DSD), which integrates three kinds of resources: unlabeled text of the target domain, a domain-independent emotion dictionary, and labels of the target-domain text. The invention integrates emotion knowledge more effectively.

Description

Word distributed expression learning system based on emotion knowledge enhancement
Technical Field
The invention relates to the technical field of emotion detection and emotion analysis, in particular to a word distributed expression learning system based on emotion knowledge enhancement.
Background
Emotion analysis is an important task in natural language processing that can help consumers, companies and expert systems make more rational decisions. Existing research often uses word vectors as features for many tasks, including sentiment analysis. However, existing word vector learning techniques do not take into account the target dependency of emotion information when dealing with emotion analysis. For example, in the review sentence S1, "The newly purchased computer runs fast, but its power also drains fast," for the same evaluation word "fast," existing models cannot recognize that "running fast" is favorable for the computer while "draining fast" is unfavorable for the battery. This lack of emotion-target dependency degrades a model's sentiment analysis effectiveness.
A knowledge graph is a knowledge base in the form of a semantic network, usually presented as triples: Head, Relation and Tail. In a general knowledge graph, the head and tail are entity nouns, and the relation expresses how the head and tail are connected in the real world. An emotion knowledge graph is a further extension of the semantic network: its head is an evaluation target, its relation is the evaluation content, and its tail is the emotional tendency. For example, in the review sentence S1, the computer or the battery can serve as the evaluation target, and "fast" can serve as the evaluation content.
An emotion knowledge graph is an aggregate of external knowledge containing abundant dependency information, and integrating it can alleviate the emotion-target dependency problem to a certain extent. However, emotion knowledge graphs are very rare in existing research, and their construction is completed manually, requiring substantial human resources. Therefore, the emotion-target dependency problem cannot be completely solved simply by using existing emotion knowledge graphs.
An emotion dictionary contains abundant emotion information: given a word, it can provide the word's emotion polarity. Such dictionaries work well in many open domains.
Therefore, some researchers seek to improve emotion analysis by integrating emotion dictionaries. However, the emotion knowledge such dictionaries provide is very limited, and they lack the emotion dependency information of professional domains, so they cannot solve the emotion-target dependency problem in professional domains well.
Disclosure of Invention
It is an object of the present invention to provide a distributed representation learning system for words based on emotional knowledge enhancement that overcomes some or all of the deficiencies of the prior art.
The word distributed expression learning system based on emotion knowledge enhancement according to the invention comprises an emotion knowledge integration framework and a weakly supervised knowledge generation framework. The emotion knowledge integration framework comprises a knowledge query module, a knowledge integration module and a word representation generation module. The weakly supervised knowledge generation framework generates a domain emotion dictionary (DSD), which integrates three kinds of resources: unlabeled text of the target domain, a domain-independent emotion dictionary, and labels of the target-domain text.
Preferably, in the knowledge query module, given a comment sentence S, the module helps this sentence find the knowledge most likely to assist in analyzing S. To this end, the input sentence is segmented into words, and each word is used as a query object against the domain emotion dictionary DSD. The queried knowledge is filtered with filters built on knowledge expectation and a knowledge global attention mechanism, and passes through three states: the original knowledge set o_set, the expected knowledge set e_set, and the candidate knowledge set c_set. The knowledge set obtained by the query request, i.e., the original knowledge set, is given by (1):

o_set = Knowledge_Query(T, DSD)   (1)

where T is a query word and Knowledge_Query is the knowledge query function. The content of o_set is shown in (2):

o_set = [(T, op_0, judge_0, fr_0, conflict_0, p_num_0, n_num_0, lexicon_po_0), ..., (T, op_i, judge_i, fr_i, conflict_i, p_num_i, n_num_i, lexicon_po_i)]   (2)

The knowledge in o_set is raw and unprocessed, where op_i is the viewpoint word matched by the query word T, judge_i is the emotion polarity assigned once T matches the viewpoint word op_i, fr_i is the number of co-occurrences of T and op_i in the knowledge source corpus, conflict_i indicates whether the knowledge is conflicting in the knowledge source, p_num and n_num are the numbers of positive and negative cognitions in conflicting cognition, respectively, and lexicon_po_i is the emotional tendency value of the knowledge in the external emotion dictionary. To better screen out knowledge with conflicting cognition, a knowledge expectation filter is introduced, and potentially conflicting knowledge is filtered by (3):

e_set = E_Filter(o_set, expectation_gate)   (3)

In (3), E_Filter is the knowledge expectation filtering function, e_set is a subset of o_set, and expectation_gate is a hyperparameter for filtering conflicting knowledge. However, knowledge expectation cannot judge whether the queried knowledge truly helps emotion analysis, so a knowledge global attention mechanism is introduced, and the knowledge in e_set is filtered through the attention filter of (4):

c_set = K_Attention(e_set, input_0)   (4)

c_set in (4) is a set of triples, whose content is shown in (5):

c_set = [(T, op_0, judge_0), ..., (T, op_s, judge_s)]   (5)

where op is a viewpoint word matched with the query word in the knowledge base, and judge is the emotion polarity when the query word T matches the viewpoint word op. The knowledge in c_set will be integrated into the text.
Preferably, the knowledge expectation is calculated and potentially conflicting knowledge is filtered by equations (6) and (7):

Em_op = (p_num/fr - n_num/fr)   (6)

[Equation (7), rendered only as an image in the original, computes the knowledge expectation "expectation" from Em_op.]

For the emotion classification task, p_num and n_num are the numbers of positive and negative labels assigned by users to the query word and viewpoint word in the dataset. For the emotion detection task, emotions are divided into two categories, emotions of positive orientation and emotions of negative orientation, and the numbers of sub-labels under these two categories are taken as the values of p_num and n_num. Knowledge with a higher probability of conflict has a smaller expectation, so potentially conflicting knowledge can be filtered effectively by setting expectation_gate. In equation (7), expectation is the calculated knowledge expectation, Em_op is the intermediate result obtained from equation (6), and Em_i denotes the expected value of knowledge whose occurrence frequency is i; summing over Em_i achieves the goal of normalizing the knowledge expectation.
Preferably, in the knowledge global attention mechanism, the knowledge that best matches the text is selected by formulas (8), (9) and (10):

[Equation (8), rendered only as an image in the original, computes the knowledge weight w by balancing the similarity information, the distance information and the knowledge expectation with the balance factor C.]

simi = sim(op1, op2) = cos(vec(op1, op2))   (9)

dis = |argmax(S) - idx(T)|, where S = sim(op_j, input_0[i])   (10)

Equation (8) comprises two steps:
1) First, the similarity information and distance information in e_set are calculated. The similarity information simi and the distance information dis are obtained by comparing the viewpoint words in the knowledge with the viewpoint words in the input text. As shown in formula (9), simi is calculated as the cosine similarity of the vectorized op1 and op2, where op1 is a viewpoint word appearing in the knowledge and op2 is a viewpoint word appearing in the input text. The distance information dis represents the degree of matching between the viewpoint word and the query word T; s_l is the number of words in the input text. In formula (10), the input text is first traversed to obtain the similarity array S of the viewpoint word and the text, and then the number of words separating the viewpoint word with the maximum similarity from the query word T is found.
2) After the similarity information and position information are calculated, the information-balance problem is considered. Knowledge to be integrated has a lower expectation because, in equation (3), a lower expectation threshold is set so that e_set retains more knowledge. Therefore, formula (8) reuses the knowledge expectation calculated by formula (7) and balances it against the result of step 1), with the hyperparameter C in equation (8) serving as the balance factor. Finally, the weight w of the knowledge is obtained through formulas (8)-(10), and sorting the knowledge by w selects the most effective knowledge for the input text.
Preferably, the knowledge integration module integrates the knowledge output by the knowledge query module into the input text. For the input text input_0, the finally integrated knowledge is K1 and K2. Integrating K1 and K2 can help the system make more reasonable inferences; however, directly splicing K1 and K2 into the input text would distort the meaning of the input text itself.
Preferably, the word representation generation module converts the knowledge-enhanced text input_1 into knowledge-enhanced word representations. The knowledge-enhanced text input_1 is first converted into the sum of three encodings: sequence encoding, segment encoding and position encoding; the encoded sum is then passed as input to the system.
The invention provides a strategy for automatically generating emotional knowledge and designs a general emotional knowledge integration framework that helps a model generate word vectors with enhanced emotional semantics and emotion dependency information. Considering that automatically generated emotional knowledge may contain noise, the injected knowledge is filtered with a strict knowledge filtering strategy.
The proposed strategy for automatically generating emotional knowledge can extract emotional knowledge directly from text data. Considering the complexity and conflict of human emotions, the proposed knowledge expectation uses statistical information about the knowledge to filter potentially conflicting knowledge, preventing such conflicting knowledge from misleading the model.
The proposed general emotion knowledge integration framework selects the best-matching knowledge for the text through the designed filters. Considering that the filtering strategy cannot remove all noise, a knowledge noise optimization objective is added to the system, so that the model can better generate word vectors containing emotion-target dependency information.
Drawings
FIG. 1 is a schematic diagram of the architecture of the word distributed expression learning system based on emotion knowledge enhancement in embodiment 1;
FIG. 2 is a schematic diagram of the dependency analysis of a target-domain text in embodiment 1;
FIG. 3 is a graph showing the effect of the constraint variable g on the experimental results in embodiment 1;
FIG. 4 is a graph showing the effect of the constraint variable λ on the experimental results in embodiment 1.
Detailed Description
For a further understanding of the present invention, reference is made to the following detailed description taken in conjunction with the accompanying drawings and examples. It is to be understood that the examples are illustrative of the invention and not limiting.
Example 1
As shown in FIG. 1, this embodiment 1 provides a word distributed expression learning system based on emotion knowledge enhancement, which comprises an emotion knowledge integration framework and a weakly supervised knowledge generation framework. The emotion knowledge integration framework comprises a knowledge query module, a knowledge integration module and a word representation generation module. The weakly supervised knowledge generation framework generates a domain emotion dictionary (DSD), which integrates three kinds of resources: unlabeled text of the target domain, a domain-independent emotion dictionary, and labels of the target-domain text.
In FIG. 1, the English terms are glossed as follows:
input0: the original input text
input1: the knowledge-enhanced text
Knowledge Attention: the knowledge global attention mechanism
BERT: a model pre-trained on text data in natural language processing, characterized by bidirectional Transformer encoding, used to generate word vectors
Domain texts: unlabeled domain texts, used to extract viewpoint word pairs
Text labels: the labels corresponding to the domain texts
Lexicons: emotion dictionaries that assign emotional tendencies to the extracted viewpoint word pairs
Source Incorporation: resource integration, which merges the domain texts, text labels and emotion dictionaries into a domain emotion dictionary
Filters: knowledge filters, which remove invalid knowledge in the query module
BERT loss: the loss functions of the BERT model, comprising the MLM and NSP loss functions
Noise loss: the knowledge noise loss function, optimized jointly with the BERT loss
SVD: the singular value decomposition algorithm
[CLS]: BERT's special token for classification tasks, representing the global context semantics
[SEP]: BERT's sentence-separation token; if there is only one input sentence, it marks the end of that sentence
Down-Stream tasks: downstream tasks, in this framework mainly emotion classification and emotion detection
Knowledge Query: the knowledge query module
Knowledge Incorporation: the knowledge integration module
Word Representation Generation: the word representation generation module.
Knowledge query module
Given a comment sentence S, the function of the knowledge query module is to help this sentence find the knowledge most likely to assist in analyzing S. To this end, the input sentence is segmented into words, and each word is used as a query object against the domain emotion dictionary DSD. The queried knowledge is filtered with filters built on knowledge expectation and a knowledge global attention mechanism, and passes through three states: the original knowledge set o_set, the expected knowledge set e_set, and the candidate knowledge set c_set. The knowledge set obtained by the query request, i.e., the original knowledge set, is given by (1):

o_set = Knowledge_Query(T, DSD)   (1)

where T is a query word and Knowledge_Query is the knowledge query function. The content of o_set is shown in (2):

o_set = [(T, op_0, judge_0, fr_0, conflict_0, p_num_0, n_num_0, lexicon_po_0), ..., (T, op_i, judge_i, fr_i, conflict_i, p_num_i, n_num_i, lexicon_po_i)]   (2)

The subscripts 0 and i index the knowledge entries. The knowledge in o_set is raw and unprocessed, where op_i is the viewpoint word matched by the query word T, judge_i is the emotion polarity assigned once T matches op_i, fr_i is the number of co-occurrences of T and op_i in the knowledge source corpus, conflict_i indicates whether the knowledge is conflicting in the knowledge source, p_num and n_num are the numbers of positive and negative cognitions in conflicting cognition, respectively, and lexicon_po_i is the emotional tendency value of the knowledge in the external emotion dictionary. To better screen out knowledge with conflicting cognition, a knowledge expectation filter is introduced to filter potentially conflicting knowledge by (3):

e_set = E_Filter(o_set, expectation_gate)   (3)

In (3), E_Filter is the knowledge expectation filtering function, e_set is a subset of o_set, and expectation_gate is a hyperparameter for filtering conflicting knowledge. However, knowledge expectation cannot judge whether the queried knowledge truly helps emotion analysis, so a knowledge global attention mechanism is introduced, and the knowledge in e_set is filtered through the attention filter of (4):

c_set = K_Attention(e_set, input_0)   (4)

c_set in (4) is a set of triples, whose content is shown in (5):

c_set = [(T, op_0, judge_0), ..., (T, op_s, judge_s)]   (5)

The subscript s indicates that there are s pieces of knowledge in total; op is a viewpoint word matched with the query word in the knowledge base, and judge is the emotion polarity when the query word T matches the viewpoint word op. The knowledge in c_set will be integrated into the text.
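For illustration, the following minimal Python sketch mirrors this query-and-filter pipeline. The KnowledgeEntry layout and the function names knowledge_query and e_filter are assumptions chosen to match equations (1)-(3), not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class KnowledgeEntry:
    op: str            # viewpoint word matched with the query word T
    judge: int         # emotion polarity of the (T, op) pair (+1 / -1)
    fr: int            # co-occurrences of T and op in the source corpus
    conflict: bool     # whether the knowledge source holds conflicting views
    p_num: int         # positive cognitions among the conflicting votes
    n_num: int         # negative cognitions among the conflicting votes
    lexicon_po: float  # emotional tendency value from the external lexicon

def knowledge_query(T: str, dsd: Dict[str, List[KnowledgeEntry]]) -> List[Tuple[str, KnowledgeEntry]]:
    """Equation (1): retrieve the raw knowledge set o_set for query word T."""
    return [(T, entry) for entry in dsd.get(T, [])]

def e_filter(o_set, expectation: Callable[[KnowledgeEntry], float], gate: float):
    """Equation (3): keep entries whose knowledge expectation clears the gate."""
    return [(T, e) for (T, e) in o_set if expectation(e) >= gate]

# Example: a one-entry DSD for the query word "battery".
dsd = {"battery": [KnowledgeEntry("fast", -1, 9, True, 4, 5, -0.4)]}
o_set = knowledge_query("battery", dsd)
```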
Knowledge expectation
Different users may hold different opinions about the same pair of query word and viewpoint word. Taking the query word "movie" and the viewpoint word "new" as an example, summarizing a published movie review dataset shows that among 9 queried user reviews, 5 users like new movies and 4 do not. We regard such knowledge as potentially conflicting, since whatever emotion polarity we assign to "movie" and "new" may mislead the model's understanding of new movies. Potentially conflicting knowledge is therefore filtered by equations (6) and (7):

Em_op = (p_num/fr - n_num/fr)   (6)

[Equation (7), rendered only as an image in the original, computes the knowledge expectation "expectation" from Em_op.]

For the emotion classification task, p_num and n_num are the numbers of positive and negative labels assigned by users to the query word and the viewpoint word in the dataset. For the emotion detection task, emotions are divided into two categories, emotions of positive orientation and emotions of negative orientation, and the numbers of sub-labels under these two categories are taken as the values of p_num and n_num. Knowledge with a higher probability of conflict has a smaller expectation, so potentially conflicting knowledge can be filtered effectively by setting expectation_gate. In equation (7), expectation is the calculated knowledge expectation, Em_op is the intermediate result obtained from equation (6), and Em_i denotes the expected value of knowledge whose occurrence frequency is i; summing over Em_i normalizes the knowledge expectation.
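A short sketch of this filter follows. Equation (6) is implemented as written; equation (7) survives only as an image in the original, so the normalization used here is a labeled assumption.

```python
def em_op(p_num: int, n_num: int, fr: int) -> float:
    """Equation (6): Em_op = p_num/fr - n_num/fr."""
    return p_num / fr - n_num / fr

def knowledge_expectation(p_num: int, n_num: int, fr: int,
                          em_by_freq: dict) -> float:
    """Assumed reading of equation (7): normalize |Em_op| by the sum of the
    per-frequency expectations Em_i (em_by_freq maps frequency i -> Em_i)."""
    total = sum(abs(v) for v in em_by_freq.values()) or 1.0
    return abs(em_op(p_num, n_num, fr)) / total

# The "new movie" example: 5 positive vs 4 negative votes over fr=9 reviews
# gives Em_op = 1/9, a small value, so an expectation_gate of 0.6 filters
# this potentially conflicting knowledge out.
print(em_op(5, 4, 9))  # 0.111...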
Knowledge global attention mechanism
To avoid integrating knowledge that is irrelevant to the text, we design a knowledge global attention mechanism that selects the knowledge best matching the text through equations (8), (9) and (10):

[Equation (8), rendered only as an image in the original, computes the knowledge weight w by balancing the similarity information, the distance information and the knowledge expectation with the balance factor C.]

simi = sim(op1, op2) = cos(vec(op1, op2))   (9)

dis = |argmax(S) - idx(T)|, where S = sim(op_j, input_0[i])   (10)

Equation (8) comprises two steps:
1) First, the similarity information and distance information in e_set are calculated. The similarity information simi and the distance information dis are obtained by comparing the viewpoint words in the knowledge with the viewpoint words in the input text. As shown in formula (9), simi is calculated as the cosine similarity of the vectorized op1 and op2, where op1 is a viewpoint word appearing in the knowledge and op2 is a viewpoint word appearing in the input text. The distance information dis represents the degree of matching between the viewpoint word and the query word T; s_l is the number of words in the input text. In formula (10), the input text is first traversed to obtain the similarity array S of the viewpoint word and the text, and then the number of words separating the viewpoint word with the maximum similarity from the query word T is found.
2) After the similarity information and position information are calculated, the information-balance problem is considered. Knowledge to be integrated has a lower expectation because, in equation (3), a lower expectation threshold is set so that e_set retains more knowledge. Therefore, formula (8) reuses the knowledge expectation calculated by formula (7) and balances it against the result of step 1), with the hyperparameter C in equation (8) serving as the balance factor. Finally, the weight w of the knowledge is obtained through formulas (8)-(10), and sorting the knowledge by w selects the most effective knowledge for the input text.
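The sketch below implements equations (9) and (10) as stated; equation (8) exists only as an image in the original, so the particular weighted combination in knowledge_weight is an assumption used for illustration.

```python
import numpy as np

def simi(vec1: np.ndarray, vec2: np.ndarray) -> float:
    """Equation (9): cosine similarity of two vectorized viewpoint words."""
    return float(vec1 @ vec2 / (np.linalg.norm(vec1) * np.linalg.norm(vec2)))

def dis(op_vec: np.ndarray, text_vecs: list, t_idx: int) -> int:
    """Equation (10): traverse the text, build the similarity array S, and
    count the words separating the most similar text word from T at t_idx."""
    S = [simi(op_vec, w) for w in text_vecs]
    return abs(int(np.argmax(S)) - t_idx)

def knowledge_weight(sim_val: float, dis_val: int, s_l: int,
                     expectation: float, C: float = 0.5) -> float:
    """Assumed form of equation (8): balance the matching quality (similarity
    discounted by normalized distance) against the knowledge expectation,
    using the balance factor C."""
    match = sim_val * (1.0 - dis_val / s_l)
    return C * match + (1.0 - C) * expectation
```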
Knowledge integration module
The knowledge integration module integrates the knowledge output by the knowledge query module into the input text. As shown in FIG. 1, for the input text input_0, the finally integrated knowledge is K1 and K2. Integrating K1 and K2 can help the system (the BERT model) make more reasonable inferences; however, directly splicing K1 and K2 into the input text would distort the meaning of the input text itself.
Word representation generation module
The word representation generation module converts the knowledge-enhanced text input_1 into knowledge-enhanced word representations. The knowledge-enhanced text input_1 is first converted into the sum of three encodings: sequence encoding, segment encoding and position encoding; the encoded sum is then passed as input to the system (the BERT model). The BERT model here contains 12 layers with 12 multi-head attention blocks, and the dimension of the finally output word vectors is 768. Like other pre-trained language models, the BERT model here comprises two phases: pre-training and fine-tuning. The word representations obtained by pre-training contain general knowledge; the fine-tuning stage is first initialized with the pre-trained word vectors and then learns the word representations jointly with the selected knowledge. Besides its own training task, the word representation learning module is constrained by the knowledge noise module.
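As a concrete illustration of the three-encoding sum, the following sketch uses the Hugging Face bert-base-uncased checkpoint as a stand-in for the patent's 12-layer, 768-dimensional model; the knowledge-enhanced sentence input_1 shown is a made-up example.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# A hypothetical knowledge-enhanced input: original text plus spliced knowledge.
input_1 = "the battery drains fast [SEP] fast negative"
enc = tokenizer(input_1, return_tensors="pt")

emb = model.embeddings  # holds the three embedding tables used by BERT
token_e = emb.word_embeddings(enc["input_ids"])           # sequence encoding
seg_e = emb.token_type_embeddings(enc["token_type_ids"])  # segment encoding
pos_ids = torch.arange(enc["input_ids"].size(1)).unsqueeze(0)
pos_e = emb.position_embeddings(pos_ids)                  # position encoding
summed = token_e + seg_e + pos_e  # the encoded sum passed into the encoder
```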
Training an objective function
The objective function consists of three parts: 1) the masked language model loss (MLM), which helps the model capture the semantics of individual words within a sentence; 2) the next-sentence prediction loss (NSP), which helps the model capture the relationships between sentences; and 3) the knowledge noise constraint loss (KNC), which denoises the text with integrated knowledge. Parts 1) and 2) are consistent with the BERT model, while 3) is the loss function we designed.

As shown in equation (11), for the KNC loss, singular value decomposition (SVD) is applied to the [CLS] representation, i.e., the aforementioned token that carries the meaning of the whole text:

[CLS] = UΣV^T   (11)

In equation (11), U and Σ are, respectively, the singular vectors and the corresponding singular values computed by the SVD algorithm. After singular value decomposition, the elements on the main diagonal of the Σ matrix are in descending order. Extracting the main diagonal elements yields the singular value arrangement shown in (12):

SVT = [σ_1, σ_2, σ_3, ..., σ_b]   (12)

For the elements of SVT, the g tail singular values are constrained by equation (13).

[Equation (13), rendered only as an image in the original, defines the KNC loss L_KNC over the g tail singular values of SVT.]

The objective function of the whole framework is shown in (14), where λ is a hyperparameter balancing the noise constraint loss against BERT's own loss functions from 1) and 2):

L_total = L_MLM + L_NSP + λ·L_KNC   (14)
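A sketch of the KNC loss follows. Equation (13) is an image in the original, so penalizing the squared g tail singular values is an assumed reading; equation (14) is implemented as written. SVD is taken over a batch of [CLS] vectors, since a single vector has only one singular value.

```python
import torch

def knc_loss(cls_repr: torch.Tensor, g: int) -> torch.Tensor:
    """Equations (11)-(13), assumed form: decompose a (batch, hidden) matrix
    of [CLS] vectors and constrain its g tail singular values."""
    sigma = torch.linalg.svdvals(cls_repr)  # descending singular values, eq. (12)
    if g <= 0:
        return cls_repr.new_zeros(())
    return (sigma[-g:] ** 2).sum()

def total_loss(l_mlm: torch.Tensor, l_nsp: torch.Tensor,
               l_knc: torch.Tensor, lam: float) -> torch.Tensor:
    """Equation (14): L_total = L_MLM + L_NSP + lambda * L_KNC."""
    return l_mlm + l_nsp + lam * l_knc
```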
Weak supervision knowledge generation framework
For the text data of each domain, the weakly supervised knowledge generation framework automatically generates a domain emotion dictionary (DSD). The generated emotion dictionary integrates three kinds of resources, namely:
a) unlabeled text of the target domain;
b) a domain-independent emotion dictionary;
c) labels of the target-domain text.
Here a) and b) are essential resources and c) is optional: if the target-domain text has labels, higher-quality knowledge can be generated, but knowledge can still be generated without labels.
a) Unlabeled text of target domain
Using the text of the target domain, viewpoint word pairs can be extracted (a pair formed by combining a viewpoint word and a query word is simply called a viewpoint word pair; for example, "movie" and "new" form a viewpoint word pair). A syntactic dependency parse tree and the corresponding amod and nsubj rules are used to extract viewpoint word pairs.
As shown in FIG. 2, for the example text "Delicious soup, though the noodles were just underseasoned", the syntactic dependency tree shows that the viewpoint word pair extracted by the amod rule is "delicious soup", and the pair extracted by the nsubj rule is "underseasoned noodles".
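A sketch of the amod/nsubj extraction is shown below, using spaCy as an illustrative dependency parser (the parser named in the original text is garbled, so the choice of spaCy is an assumption).

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def extract_pairs(text: str) -> list:
    """Extract (query word, viewpoint word) pairs via amod and nsubj arcs."""
    pairs = []
    for tok in nlp(text):
        if tok.dep_ == "amod":                        # e.g. "delicious soup"
            pairs.append((tok.head.text, tok.text))
        elif tok.dep_ == "nsubj":
            head = tok.head
            if head.pos_ == "ADJ":                    # adjective as head
                pairs.append((tok.text, head.text))
            else:                                     # copula: adjective as acomp
                for child in head.children:
                    if child.dep_ == "acomp":
                        pairs.append((tok.text, child.text))
    return pairs

print(extract_pairs("Delicious soup, though the noodles were underseasoned."))
```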
b) Domain independent emotion dictionary
The domain-independent emotion dictionary assigns an appropriate emotion polarity to each extracted viewpoint word pair. Taking SentiWordNet 3.0 as the emotion dictionary, the viewpoint word in each viewpoint word pair is first looked up in the dictionary, and its emotion polarity there is calculated. Some viewpoint words may not be found in the emotion dictionary; for viewpoint word pairs containing such words, polarity assignment depends on the label of the text from which the pair was extracted. If the text has a label, the text label is converted into the emotion polarity of the viewpoint word pair; if the text has no label, the pair is discarded directly to avoid introducing knowledge noise. After integrating the target-domain text and the emotion dictionary information, a knowledge triple set (viewpoint word pair + emotional tendency) for the specific domain can be generated.
c) Label for target domain text
Besides assigning emotion labels to viewpoint word pairs, the text labels of the target domain also support the calculation of knowledge expectation. After integrating resources a) and b), we obtain a triple set (viewpoint word pair + emotional tendency) and define the emotional tendencies in the triple set as voting labels. Over the text of the whole domain, the same viewpoint word pair may receive multiple voting labels; for example, the pair ("movie", "new") mentioned above receives 9 voting labels in total, of which 5 are positive and 4 are negative. These label counts are the p_num and n_num in equation (6). With the help of the voting labels, the knowledge expectation can be calculated, and potentially conflicting knowledge can be filtered out through it, as sketched below.
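The counting of voting labels can be sketched as follows; the function name and the triple layout are illustrative assumptions.

```python
from collections import Counter

def vote_counts(triples) -> Counter:
    """triples: iterable of ((query_word, viewpoint_word), polarity) pairs
    gathered from labeled domain texts, with polarity in {"pos", "neg"}."""
    votes = Counter()
    for pair, polarity in triples:
        votes[(pair, polarity)] += 1
    return votes

# The "new movie" example: 5 positive and 4 negative voting labels.
votes = vote_counts([(("movie", "new"), "pos")] * 5 +
                    [(("movie", "new"), "neg")] * 4)
p_num = votes[(("movie", "new"), "pos")]  # 5, the p_num of equation (6)
n_num = votes[(("movie", "new"), "neg")]  # 4, the n_num of equation (6)
```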
Data set
We verify the effectiveness of our model on emotion classification and emotion detection datasets. The details of each dataset are shown in Table 1; for each dataset we generated corresponding emotional knowledge. To better evaluate the model on these different-domain datasets, we split each dataset 7:1:2 into training (Train), validation (Dev) and test (Test) sets.
TABLE 1: Multi-domain dataset information
[Table 1, rendered only as an image in the original patent.]
where S.C. indicates that the task type is emotion classification and E.D. indicates that the task type is emotion detection.
SST datasets: the Stanford Sentiment Treebank (SST) is derived from movie reviews. The corresponding task is sentence-level emotion classification; analyzing the raw data yields the probability that a sentence's emotional tendency is positive. SST-3 converts the raw data into three emotion polarity labels according to this probability: positive, neutral and negative. SST-5 converts the raw data into five emotion labels: very positive, positive, neutral, negative and very negative.
MR dataset: the Movie Review dataset (MR) classifies the collected movie review data into positive and negative according to emotion polarity. The corresponding task is sentence-level emotion classification, a binary classification task.
Alm dataset: a fairy tale dataset (affect data, distributed by Cecilia Ovesdotter Alm). The Alm dataset originates from fairy tale books and contains five categories of emotions: anger-disgust, fear (fearful), joy (happy), sadness (sad) and surprise (surprised). It is a sentence-level emotion detection task.
Aman dataset: a blog dataset (Emotion-Annotated Dataset, distributed by Saima Aman). The Aman dataset comprises a large amount of informal blog data, and the corresponding task is emotion detection. The dataset contains 1290 labeled sentences, with labels of joy (happy), sadness (sad), disgust (disgust), fear (fearful) and surprise (surprised).
Baseline models
Our knowledge-enhanced model is compared with large-scale corpus pre-training models, emotion-knowledge-enhanced pre-trained language models, and general emotion word representation learning models without pre-training. These three types of models, together with our emotional knowledge enhancement model, are introduced as follows:
Large-scale corpus pre-training models: we used BERT and BERT-PT as baselines for large-scale pre-trained language models. BERT is pre-trained on Wikipedia and book corpora, while BERT-PT is further pre-trained on five-star-review Amazon data and the Yelp dataset. Both models achieve excellent performance on various natural language processing tasks.
Emotion-knowledge-enhanced pre-trained language models: we used SentiBERT and K-BERT as baselines for emotion-knowledge-enhanced pre-trained language models. SentiBERT improves multi-class emotion analysis by integrating sentiment composition knowledge. K-BERT is a knowledge-enabled language model that improves sentiment analysis by integrating a knowledge graph. Since K-BERT is not specifically designed for English text, we improved the K-BERT code and let it integrate our own generated knowledge in an unconstrained way.
General emotion word representation learning models without pre-training: we used SGlove and Emo2Vec as baselines. Before the appearance of pre-trained language models, both achieved very competitive performance on emotion analysis.
Our emotional knowledge enhancement models: our models are SKG-BERT and SKG-BERT-PT, obtained by applying the automatically generated knowledge and the corresponding knowledge constraint strategy to the BERT and BERT-PT models. These two models further explore whether our framework and knowledge constraint strategy remain effective when pre-training has already learned relevant knowledge.
Experimental setup
Experimental data for the three major types of baseline models were obtained by reproducing them. To ensure fair comparison, the settings of the large-scale corpus pre-training models and the emotion-knowledge-enhanced pre-trained language models among the three types of baselines were kept consistent with the BERT-Base model. For the general emotion word representation learning models without pre-training, we concatenated their word vectors with GloVe and used a logistic regression classifier for the emotion analysis studies.
Our emotion knowledge enhancement models are SKG-BERT and SKG-BERT-PT, whose pre-training parameters are converted directly from BERT and BERT-PT. Knowledge enhancement and fusion are performed during fine-tuning. Our experiments were run on an AMAX compute server (one Tesla V100 GPU). The best parameters were searched with grid search.
TABLE 2: Hyperparameter search table
[Table 2, rendered only as an image in the original patent.]
"choice" indicates that each parameter listed in the option is to be tested.
As shown in Table 2, when the domain emotion dictionary DSD is generated by integrating resources, an entity filter is set to filter out high-frequency words; knowledge expectation is applied in the knowledge query module, and, based on data analysis and observation of the knowledge, its threshold is set to 0.6, so all knowledge with expectation below 0.6 is filtered out. We constrain the knowledge through g and λ, where g is the number of features to constrain in equation (13) and λ is the hyperparameter balancing the loss functions in equation (14). C is the hyperparameter in equation (8) that balances distance information and similarity information, and similarity_gate is the hyperparameter used in the knowledge global attention mechanism: knowledge below the similarity threshold is not integrated into the text. The optimal hyperparameters are found by grid search; for emotion classification tasks, accuracy is the reference standard during tuning, and for emotion detection, the macro F1 score is the reference standard. To keep our model consistent with real-world conditions, our knowledge generation strategy generates knowledge only for the training set, because test data are usually unknown in the real world.
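For illustration, the grid search over these hyperparameters can be sketched as follows; the candidate value lists are hypothetical, since Table 2 survives only as an image.

```python
from itertools import product

# Hypothetical grid over the Table 2 hyperparameters.
grid = {
    "expectation_gate": [0.6],
    "g": [0, 2, 4, 8],
    "lambda": [0.0, 0.1, 0.5, 1.0],
    "C": [0.3, 0.5, 0.7],
    "similarity_gate": [0.5, 0.7],
}

def grid_search(evaluate):
    """evaluate: maps a config dict to accuracy (for classification) or
    macro F1 (for detection), per the tuning criteria in the text."""
    best_cfg, best_score = None, float("-inf")
    for values in product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```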
Results of the experiment
Emotion analysis
The evaluation results on SST-3, SST-5 and the MR dataset are shown in Table 3. Although the datasets come from different domains, SKG-BERT-PT still achieves the best results compared with all other baseline models, reflecting the effectiveness of word representations enhanced with external emotional knowledge. Compared with the large-scale corpus pre-training models BERT and BERT-PT without emotion knowledge enhancement, SKG-BERT and SKG-BERT-PT, after applying our proposed framework and the automatically generated knowledge, respectively achieve notable improvements in emotion analysis. This demonstrates the effectiveness of our general framework, the generated emotional knowledge and the corresponding knowledge constraint strategy.
TABLE 3: Emotion classification accuracy (%) of the models
[Table 3, rendered only as an image in the original patent.]
Emotion detection
We further verified the effects of SKG-BERT and SKG-BERT-PT on the Alm and Aman datasets; model performance is shown in Table 4. We evaluated the models using macro F1, and also recorded each model's macro precision and recall. Table 4 presents the overall effect of our models: compared with all other baselines, our emotion knowledge enhancement model SKG-BERT performs best on the Alm dataset and SKG-BERT-PT performs best on the Aman dataset. This further demonstrates the superiority of our models on the finer-grained emotion detection task. In addition, because BERT is pre-trained on books and Wikipedia while BERT-PT is pre-trained on review data, and the Alm data come from fairy tale books that may overlap with BERT's pre-training corpus, BERT outperforms BERT-PT on the Alm dataset and SKG-BERT outperforms SKG-BERT-PT there. The Aman dataset comes from blogs, whose mostly informal text resembles BERT-PT's pre-training corpus, so on the Aman dataset BERT-PT outperforms BERT and SKG-BERT-PT outperforms SKG-BERT.
TABLE 4: Precision (P.), recall (R.) and macro F1 of the models on emotion detection
[Table 4, rendered only as an image in the original patent.]
Analysis of Experimental Effect
To better demonstrate the effectiveness of our proposed framework, the generated knowledge containing emotion-target dependency, and the knowledge constraints, we performed further experiments. Under different knowledge constraint strategies, we computed the macro F1 values on the Alm dataset; the results are shown in FIG. 3 and FIG. 4.
Influence of the constraint variables g and λ
The variable g is the number of constrained features in equation (13), and λ is the loss-balancing variable in equation (14); both play very important roles in the knowledge constraint process. FIG. 3 and FIG. 4 analyze their effects experimentally. We found the following:
1) Even without knowledge constraints (when g is 0 or λ is 0, our model applies no knowledge constraint), our model still improves the performance of the baseline models on the Alm dataset. This demonstrates that the quality of our generated emotional knowledge is high.
2) Appropriate values of g and λ help the model perform best. Setting g too large harms downstream tasks, because constraining too many features changes the meaning of the text itself. Setting λ too large also reduces performance, because the model is then optimized with too much attention to noise reduction and too little attention to the emotion detection task itself.
3) Applying the knowledge constraint strategy alone, without knowledge enhancement, can also improve model performance, since the input text itself may be noisy.
Effect of the entity filter
For entity words that appear frequently in pre-training, the relevant knowledge has likely already been learned during pre-training, and injecting it into the model brings little help. We therefore use entity filters to filter out words that may occur frequently in the pre-training corpus, ensuring the high quality of the generated knowledge. From FIG. 3 and FIG. 4, we have the following findings:
1) Entity filters of different lengths yield knowledge of different quality. The overall trend is that the longer the entity rejection limit, the higher the quality of the generated knowledge (in FIG. 3 and FIG. 4, this can be observed when g and λ are set to 0, respectively).
2) An appropriate entity filter length improves the model most; longer is not always better, because the total amount of generated knowledge decreases as the filter length increases. Balancing knowledge quality against knowledge quantity is important.
Our analytical experiments support the conclusion that the generated knowledge, the corresponding knowledge constraint strategy and the proposed general emotion knowledge integration framework are effective, and that they address the emotion-target dependency problem raised at the outset.
The invention and its embodiments have been described above schematically, and the description is not limiting; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, without departing from the spirit of the invention, structures and embodiments similar to the technical solution devised without creative effort by a person of ordinary skill in the art shall fall within the protection scope of the invention.

Claims (1)

1. A word distributed expression learning system based on emotion knowledge enhancement, characterized in that: the system comprises an emotion knowledge integration framework and a weakly supervised knowledge generation framework; the emotion knowledge integration framework comprises a knowledge query module, a knowledge integration module and a word representation generation module; the weakly supervised knowledge generation framework is used for generating a domain emotion dictionary DSD, and the DSD integrates the resources of unlabeled text of a target domain, a domain-independent emotion dictionary and labels of the target-domain text;
in the knowledge query module, given a comment sentence S, the function of the knowledge query module is to help the sentence S find the knowledge most likely to assist in analyzing the sentence S; to this end, the input sentence is segmented into words, and each word is used as a query object to query the domain emotion dictionary DSD; the queried knowledge is filtered with filters built on knowledge expectation and a knowledge global attention mechanism, and the filtered knowledge passes through three states: an original knowledge set o_set, an expected knowledge set e_set and a candidate knowledge set c_set; the knowledge set obtained by the knowledge query request, i.e. the original knowledge set, is obtained by (1):
o_set = Knowledge_Query(T, DSD)   (1)
wherein T is a query word, Knowledge_Query is the knowledge query function, and the content of o_set is as shown in (2):
o_set = [(T, op_0, judge_0, fr_0, conflict_0, p_num_0, n_num_0, lexicon_po_0), ..., (T, op_i, judge_i, fr_i, conflict_i, p_num_i, n_num_i, lexicon_po_i)]   (2)
the knowledge in o_set is raw and unprocessed, wherein op_i is the viewpoint word matched by the query word T, judge_i is the emotion polarity assigned after the query word T matches the viewpoint word op_i, fr_i is the number of co-occurrences of the query word T and the viewpoint word op_i in the knowledge source corpus, conflict_i indicates whether the knowledge is conflicting in the knowledge source, p_num and n_num represent the numbers of positive and negative cognitions in conflicting cognition, respectively, and lexicon_po_i represents the emotional tendency value of the knowledge in an external emotion dictionary;
a knowledge expectation filter is then introduced to filter potentially conflicting knowledge by (3):
e_set = E_Filter(o_set, expectation_gate)   (3)
in (3), E_Filter is the knowledge expectation filtering function, e_set is a subset of o_set, and expectation_gate is a hyperparameter for filtering conflicting knowledge;
a knowledge global attention mechanism is then introduced, and the knowledge in e_set is filtered through the attention filter of (4):
c_set = K_Attention(e_set, input_0)   (4)
c_set in (4) is a set of triples, whose content is as shown in (5):
c_set = [(T, op_0, judge_0), ..., (T, op_s, judge_s)]   (5)
wherein op is a viewpoint word matched with the query word in the knowledge base, and judge is the emotion polarity when the query word T matches the viewpoint word op; the knowledge in c_set will be integrated into the text;
the knowledge expectation is calculated and potentially conflicting knowledge is filtered by equations (6) and (7):
Em_op = (p_num/fr - n_num/fr)   (6)
[Equation (7), rendered only as an image in the original, computes the knowledge expectation from Em_op.]
for the emotion classification task, p_num and n_num are the numbers of positive and negative labels assigned by users to the query word and the viewpoint word in the dataset; for the emotion detection task, emotions are divided into two categories, namely emotions of positive orientation and emotions of negative orientation, and the numbers of sub-labels under the two categories of labels are taken as the values of p_num and n_num; knowledge with a higher probability of conflict has a smaller expectation, so potentially conflicting knowledge can be effectively filtered by setting expectation_gate; in equation (7), expectation is the calculated knowledge expectation, Em_op is the intermediate result obtained from equation (6), and Em_i denotes the expected value of knowledge whose occurrence frequency is i; summing over Em_i achieves the purpose of normalizing the knowledge expectation;
in the knowledge global attention mechanism, the knowledge that best matches the text is selected by formulas (8), (9), (10):
[Equation (8), rendered only as an image in the original, computes the knowledge weight w by balancing the similarity information, the distance information and the knowledge expectation with the balance factor C.]
simi = sim(op1, op2) = cos(vec(op1, op2))   (9)
dis = |argmax(S) - idx(T)|, where S = sim(op_j, input_0[i])   (10)
equation (8) comprises two steps:
1) first, the similarity information and distance information in e_set are calculated; the similarity information simi and the distance information dis are obtained by comparing the viewpoint words in the knowledge with the viewpoint words in the input text; as shown in formula (9), simi is calculated as the cosine similarity of the vectorized op1 and op2, wherein op1 is a viewpoint word appearing in the knowledge and op2 is a viewpoint word appearing in the input text; the distance information dis represents the degree of matching between the viewpoint word and the query word T; s_l is the number of words in the input text; in formula (10), the input text is first traversed to obtain the similarity array S of the viewpoint word and the text, and then the number of words separating the viewpoint word with the maximum similarity from the query word T is found;
2) after the similarity information and position information are calculated, the information-balance problem is considered; knowledge to be integrated has a lower expectation because, in equation (3), a lower expectation threshold is set so that e_set contains more knowledge; therefore, in formula (8), the knowledge expectation calculated by formula (7) is reused and balanced against the calculation result of step 1), wherein the hyperparameter C in equation (8) is the balance factor; finally, the weight w of the knowledge is obtained through formulas (8)-(10), and sorting the knowledge by w selects the most effective knowledge for the input text;
the knowledge integration module is used for integrating the knowledge output by the knowledge query module into the input text; for the input text input_0, the finally integrated knowledge is K1 and K2; integrating K1 and K2 can help the system make more reasonable inferences, however directly splicing K1 and K2 into the input text would distort the meaning of the input text itself;
the word representation generation module is used for converting the knowledge-enhanced text input_1 into knowledge-enhanced word representations; the knowledge-enhanced text input_1 is first converted into the sum of three encodings: sequence encoding, segment encoding and position encoding; the encoded sum is then passed as input to the system.
CN202111531641.6A 2021-12-14 2021-12-14 Word distributed expression learning system based on emotion knowledge enhancement Active CN114417814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111531641.6A CN114417814B (en) 2021-12-14 2021-12-14 Word distributed expression learning system based on emotion knowledge enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111531641.6A CN114417814B (en) 2021-12-14 2021-12-14 Word distributed expression learning system based on emotion knowledge enhancement

Publications (2)

Publication Number Publication Date
CN114417814A CN114417814A (en) 2022-04-29
CN114417814B true CN114417814B (en) 2022-11-15

Family

ID=81268153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111531641.6A Active CN114417814B (en) 2021-12-14 2021-12-14 Word distributed expression learning system based on emotion knowledge enhancement

Country Status (1)

Country Link
CN (1) CN114417814B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750648A (en) * 2019-10-21 2020-02-04 南京大学 Text emotion classification method based on deep learning and feature fusion
CN113535889A (en) * 2020-04-20 2021-10-22 阿里巴巴集团控股有限公司 Comment analysis method and device
CN113535957A (en) * 2021-07-27 2021-10-22 哈尔滨工业大学 Conversation emotion recognition network model based on dual knowledge interaction and multitask learning, construction method, electronic device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933664B (en) * 2019-03-12 2021-09-07 中南大学 Fine-grained emotion analysis improvement method based on emotion word embedding

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750648A (en) * 2019-10-21 2020-02-04 南京大学 Text emotion classification method based on deep learning and feature fusion
CN113535889A (en) * 2020-04-20 2021-10-22 阿里巴巴集团控股有限公司 Comment analysis method and device
CN113535957A (en) * 2021-07-27 2021-10-22 哈尔滨工业大学 Conversation emotion recognition network model based on dual knowledge interaction and multitask learning, construction method, electronic device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-Head Self-Attention Transformation Networks for Aspect-Based Sentiment Analysis; Yuming Lin et al.; IEEE Access; IEEE; 2021-01-05; vol. 9; pp. 8762-8770 *
Collaborative Extraction of Opinion Word Pairs Based on Dependency Relation Analysis (基于依赖联系分析的观点词对协同抽取); Zhao Wei et al.; Computer Science (计算机科学); 2020-08-15; vol. 47, no. 8; pp. 164-170 *
Research on Fine-Grained Text Sentiment Analysis Based on Attention Mechanism (基于注意力机制的细粒度文本情感分析研究); Sun Ling; China Master's Theses Full-text Database (Information Science and Technology); 2021-02-15; I138-2683 *

Also Published As

Publication number Publication date
CN114417814A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
Devika et al. Sentiment analysis: a comparative study on different approaches
Liu et al. Learning to spot and refactor inconsistent method names
Chouikhi et al. Arabic sentiment analysis using BERT model
Zhao et al. ZYJ123@ DravidianLangTech-EACL2021: Offensive language identification based on XLM-RoBERTa with DPCNN
Ge et al. BACO: A background knowledge-and content-based framework for citing sentence generation
Kaur Incorporating sentimental analysis into development of a hybrid classification model: A comprehensive study
Al-Rajebah et al. Extracting ontologies from Arabic Wikipedia: A linguistic approach
CN112183059A (en) Chinese structured event extraction method
Drozdov et al. Unsupervised labeled parsing with deep inside-outside recursive autoencoders
CN115952292A (en) Multi-label classification method, device and computer readable medium
Samih et al. Enhanced sentiment analysis based on improved word embeddings and XGboost.
Phan et al. Exploring zero-shot cross-lingual aspect-based sentiment analysis using pre-trained multilingual language models
Mahto et al. Emotion prediction for textual data using GloVe based HeBi-CuDNNLSTM model
Li et al. Learning sentiment-enhanced word representations by fusing external hybrid sentiment knowledge
CN112182183A (en) Patent harmful effect knowledge mining method, device, equipment and storage medium
CN114417814B (en) Word distributed expression learning system based on emotion knowledge enhancement
Yang et al. Zero-training sentence embedding via orthogonal basis
CN113869049B (en) Fact extraction method and device with legal attribute based on legal consultation problem
Ahmad et al. Aspect Based Sentiment Analysis and Opinion Mining on Twitter Data Set Using Linguistic Rules
Zhao et al. Multi-modal sarcasm generation: dataset and solution
Nayab et al. Aspect-context level information extraction via transformer based interactive attention mechanism for sentiment classification
Kothuri et al. MALO-LSTM: Multimodal Sentiment Analysis Using Modified Ant Lion Optimization with Long Short Term Memory Network
Oshadi et al. AppGuider: Feature Comparison System using Neural Network with FastText and Aspect-based Sentiment Analysis on Play Store User Reviews
Wang Cross-lingual Transfer Learning for Low-Resource Natural Language Processing Tasks
Jin et al. Representation and Extraction of Diesel Engine Maintenance Knowledge Graph with Bidirectional Relations Based on BERT and the Bi-LSTM-CRF Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20220429

Assignee: Guilin Zhongchen Information Technology Co.,Ltd.

Assignor: GUILIN University OF ELECTRONIC TECHNOLOGY

Contract record no.: X2022450000215

Denomination of invention: A Word Distributed Representation Learning System Based on Emotional Knowledge Enhancement

Granted publication date: 20221115

License type: Common License

Record date: 20221206