CN115409028A - Knowledge and data driven multi-granularity Chinese text sentiment analysis method - Google Patents

Knowledge and data driven multi-granularity Chinese text sentiment analysis method

Info

Publication number
CN115409028A
CN115409028A (application CN202210830349.2A)
Authority
CN
China
Prior art keywords
word
vector
radical
knowledge
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210830349.2A
Other languages
Chinese (zh)
Inventor
刘忠宝
张兴芹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Foreign Language Vocational And Technical University
Original Assignee
Shandong Foreign Language Vocational And Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Foreign Language Vocational And Technical University
Priority to CN202210830349.2A
Publication of CN115409028A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a knowledge and data driven multi-granularity Chinese text sentiment analysis method. On the basis of vectorized representations of characters, words, radicals and parts of speech, features are extracted through models such as a bidirectional gated recurrent unit and an attention mechanism to obtain feature vectors; a sentiment knowledge graph is represented as sentiment knowledge vectors through the TransE model; the feature vectors and the sentiment knowledge vectors are fused through a multi-head attention mechanism to obtain knowledge-enhanced feature vectors; finally, sentiment tendency is identified from these feature vectors through a fully connected layer and a classification function. Comparison and ablation experiments show that the F1 score of the method is markedly higher than that of other models.

Description

Knowledge and data driven multi-granularity Chinese text sentiment analysis method
Technical Field
The invention relates to the technical field of Chinese text sentiment analysis, in particular to a knowledge and data driven multi-granularity Chinese text sentiment analysis method.
Background
In recent years, Chinese sentiment analysis has received great attention from researchers and has advanced considerably. Existing research on Chinese text sentiment analysis can be grouped into three categories: knowledge-driven sentiment analysis, data-driven sentiment analysis, and sentiment analysis driven jointly by knowledge and data.
Knowledge-driven sentiment analysis mainly constructs a knowledge graph, obtains effective explicit features from it, and formulates corresponding knowledge extraction rules for sentiment recognition. Work of this kind is still scarce; one representative study builds a movie-and-review knowledge graph with domain-specific attributes in order to make full use of review data for movie recommendation, proposing a semi-automatic method for extracting knowledge from reviews. The method performs syntactic analysis on preprocessed review data, constructs a sentiment dictionary, and annotates the data to formulate knowledge extraction rules; it then extracts structured review knowledge by combining the dictionary with clustering, and finally fuses this structured knowledge with the movie ontology to form a movie-and-review knowledge graph applied to sentiment analysis. Although knowledge-driven sentiment analysis is easy to interpret, it often ignores the context of the text, cannot capture deep semantic information, and generalizes poorly.
Data-driven sentiment analysis rests on traditional methods and deep learning methods. Traditional methods use a sentiment dictionary or a machine learning model; although they achieve a certain effect, considerable manpower and resources must be spent on hand-crafting complex semantic and grammatical features, which greatly limits their applicability and generalization. Since deep learning models can perform feature extraction and semantic representation automatically, overcoming these shortcomings, researchers have introduced them into text sentiment analysis. Deep-learning-based methods mainly extract deep semantic features of characters, words and other inputs through a deep model in order to identify the sentiment tendency of the text. Although data-driven methods capture the syntactic and semantic information between context and target well, they fall short in effectively integrating external knowledge to help understand the text.
Sentiment analysis driven jointly by knowledge and data mainly represents a knowledge graph as knowledge vectors and fuses them with the feature vectors produced by a deep learning model. Noticing the complementary strengths and weaknesses of knowledge-driven and data-driven approaches, researchers combine the two, using the prior knowledge in the knowledge graph as a supervision signal for the deep model to improve its semantic analysis capability, thereby obtaining text representations with richer semantic information for sentiment recognition. At present, most of this research targets English text. Experiments on the SemEval-2014 Task 4 dataset and a Twitter dataset show that such methods achieve better classification results.
This survey of related work shows that scholars have produced a series of results in text sentiment analysis, yet important challenges remain as research deepens. First, most data-driven Chinese text sentiment analysis methods are adapted from English ones, ignoring the essential difference between Chinese (pictographic) and English (alphabetic) writing. Second, some studies recognize the importance of character, word and part-of-speech features for sentiment analysis and fuse them, but whether further features beyond these can be introduced for higher-performance sentiment recognition has not been studied in depth. Finally, sentiment analysis driven jointly by knowledge and data has mostly been studied on English texts and English knowledge graphs, while research based on Chinese knowledge graphs and Chinese texts has not yet been explored.
Disclosure of Invention
To remedy the defects of the prior art, the invention provides a knowledge and data driven multi-granularity Chinese text sentiment analysis method: on the basis of vectorized representations of characters, words, radicals and parts of speech, features are extracted through models such as a bidirectional gated recurrent unit and an attention mechanism to obtain feature vectors; a sentiment knowledge graph is represented as sentiment knowledge vectors through the TransE model; the feature vectors and the sentiment knowledge vectors are fused through a multi-head attention mechanism to obtain knowledge-enhanced feature vectors; finally, sentiment tendency is identified from these feature vectors through a fully connected layer and a classification function.
The technical scheme adopted by the invention for solving the technical problem is as follows: a knowledge and data driven multi-granularity Chinese text sentiment analysis method is provided, and comprises the following steps:
S1, preprocess the Chinese text to form five types of input data: character-level text, word-level text, character-level radical text, word-level radical text and part-of-speech text;
S2, convert the input data with a word embedding method into a set of five types of vectors: character vectors, character-level radical vectors, word vectors, word-level radical vectors and part-of-speech vectors;
S3, using a BiGRU model and a dot-product attention mechanism, fuse the character vectors with the character-level radical vectors to obtain character-radical feature vectors, fuse the word vectors with the word-level radical vectors to obtain word-radical feature vectors, and fuse the word vectors with the part-of-speech vectors to obtain word-part-of-speech feature vectors;
S4, construct a sentiment knowledge graph from the sentiment vocabulary ontology, and represent the triples in the graph as distributed vectors to obtain sentiment knowledge vectors; fuse the character-radical, word-radical and word-part-of-speech feature vectors with the sentiment knowledge vectors through a multi-head attention mechanism to obtain knowledge-enhanced character-radical feature output vectors, word-radical feature vectors and word-part-of-speech feature vectors;
S5, generate the sentiment recognition result from the enhanced character-radical feature output vectors, the enhanced word-radical feature vectors and the enhanced word-part-of-speech feature vectors.
The step S1 specifically includes the following processes:
S1.1, for an input text T consisting of m characters, its character-level text is T_c = {c_1, c_2, ..., c_m}, where each element represents a character of T; the input text T is segmented into n words with a word segmentation tool, giving the word-level text T_w = {w_1, w_2, ..., w_n}, where each element represents a word of T;
S1.2, according to the radical mapping of the Xinhua dictionary, the character-level text T_c and the word-level text T_w are processed to obtain the character-level radical text T_rc = {rc_1, rc_2, ..., rc_m} and the word-level radical text T_rw = {rw_1, rw_2, ..., rw_n}, where each element of T_rc represents a character-level radical and each element of T_rw a word-level radical; the word-level text T_w is converted with the jieba part-of-speech tagging tool into the part-of-speech text T_pos = {pos_1, pos_2, ..., pos_n}, where each element represents the part of speech of the corresponding word; this yields the input data {T_c, T_rc, T_w, T_rw, T_pos}.
In step S2, the input data are converted with a word embedding method to obtain the vector set {E_c, E_rc, E_w, E_rw, E_pos}, where:
E_c = {e_1^c, e_2^c, ..., e_m^c} denotes the character vector set, each element being a character vector;
E_rc = {e_1^rc, e_2^rc, ..., e_m^rc} denotes the character-level radical vector set, each element being a character-level radical vector;
E_w = {e_1^w, e_2^w, ..., e_n^w} denotes the word vector set, each element being a word vector;
E_rw = {e_1^rw, e_2^rw, ..., e_n^rw} denotes the word-level radical vector set, each element being a word-level radical vector;
E_pos = {e_1^pos, e_2^pos, ..., e_n^pos} denotes the part-of-speech vector set, each element being a part-of-speech vector.
In step S3, the BiGRU model and the dot-product attention mechanism fuse the character vectors with the character-level radical vectors to obtain the character-radical feature vectors, specifically as follows:
S3.1, set the initial state of the BiGRU_c model to 0 and feed the character vector set E_c into BiGRU_c, obtaining the character feature vector set y^c = {y_1^c, y_2^c, ..., y_m^c}, where each element is a character feature vector:
y^c = BiGRU_c(E_c)
S3.2, fuse y^c with the character-level radical vector set E_rc through the dot-product attention mechanism to obtain the fused vector set Ê_rc = {ê_1^rc, ê_2^rc, ..., ê_m^rc}:
α_i = softmax(e_i^rc · (y_i^c)^T)
ê_i^rc = α_i e_i^rc
where α_i is the weight matrix obtained from the dot product of the i-th element e_i^rc of E_rc and the i-th element y_i^c of y^c, · denotes the dot product, T the matrix transpose, and softmax(·) the softmax normalization function;
S3.3, take Ê_rc as the input vector of the BiGRU_rc model and pass the hidden state of BiGRU_c at the last time step to BiGRU_rc as its initial state, obtaining the character-radical feature vector set V_rc = {v_1^rc, v_2^rc, ..., v_m^rc}, where each element is a character-radical feature vector:
V_rc = BiGRU_rc(Ê_rc)
The knowledge-enhanced character-radical feature output vector in step S4 is obtained as follows:
S4.1, construct the sentiment knowledge graph with the sentiment words of the sentiment vocabulary ontology as head entities h, the sentiment categories as tail entities t, and the sentiment intensity of each sentiment word as the relation r;
S4.2, represent the triples of the sentiment knowledge graph as distributed vectors through the TransE model to obtain the sentiment knowledge vector K_rc;
S4.3, with a multi-head attention mechanism, take the character-radical feature vector V_rc as the Query vector and the sentiment knowledge vector K_rc as the corresponding Key and Value vectors, and fuse them to obtain the knowledge-enhanced feature output vector O_rc:
K_rc = TransE(h, r, t)
O_rc = MultiHead(V_rc, K_rc, K_rc)
where TransE(·) denotes the TransE model and MultiHead(·) the multi-head attention mechanism.
Step S5 specifically includes the following processes:
s5.1, outputting vectors to the enhanced character-radical characteristics
Figure BDA0003747991440000046
Enhanced word-radical feature vector
Figure BDA0003747991440000047
And enhanced word-part-of-speech feature vectors
Figure BDA0003747991440000048
Performing maximum pooling operation, and performing feature fusion by vector splicing to obtain a fused feature vector V y
S5.2, fusing the feature vector V y Inputting a fully-connected neural network, and performing normalization processing by using a softmax function to obtain a probability output P;
and S5.3, selecting the value with the maximum probability as the emotion recognition result y.
The invention has the beneficial effects based on the technical scheme that:
Aimed at the particularity of Chinese text and the practical needs of sentiment analysis, the invention uses the radicals of Chinese characters and the parts of speech of words to aid semantic understanding of Chinese text, pursues the problem of Chinese sentiment analysis, and introduces the "knowledge + data" research paradigm into the field of Chinese sentiment analysis. By fusing sentiment knowledge vectors with the feature vectors of a deep model, it fully mines the latent semantic and sentiment information of characters, words, radicals and parts of speech, improves the capability of Chinese sentiment analysis, and further enriches its theoretical and methodological systems; experiments verify the performance gains the method brings to Chinese sentiment analysis.
Drawings
FIG. 1 is a schematic diagram of an example text conversion process.
FIG. 2 is a schematic diagram of the Skip-Gram model structure.
Fig. 3 is a schematic diagram of character-radical feature vector generation.
FIG. 4 is a schematic diagram of the TransE model.
Fig. 5 is a schematic diagram of the generation process of the output vector.
Fig. 6 is a schematic diagram of the output process.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the drawings.
The invention provides a knowledge and data driven multi-granularity Chinese text sentiment analysis method, which comprises the following steps of:
S1, preprocess the Chinese text to form five types of input data: character-level text, word-level text, character-level radical text, word-level radical text and part-of-speech text. Specifically:
S1.1, for an input text T consisting of m characters, its character-level text is T_c = {c_1, c_2, ..., c_m}, where c_i (i = 1, 2, ..., m) represents each character of T; the input text T is segmented into n words with the jieba segmentation tool, giving the word-level text T_w = {w_1, w_2, ..., w_n}, where w_i (i = 1, 2, ..., n) represents each word of T;
S1.2, according to the radical mapping of the Xinhua dictionary, the character-level text T_c and the word-level text T_w are processed to obtain the character-level radical text T_rc = {rc_1, rc_2, ..., rc_m} and the word-level radical text T_rw = {rw_1, rw_2, ..., rw_n}, where rc_i (i = 1, 2, ..., m) represents a character-level radical and rw_i (i = 1, 2, ..., n) a word-level radical; the word-level text T_w is converted with the jieba part-of-speech tagging tool into the part-of-speech text T_pos = {pos_1, pos_2, ..., pos_n}, where pos_i (i = 1, 2, ..., n) represents the part of speech of the corresponding word; this yields the input data {T_c, T_rc, T_w, T_rw, T_pos}.
From the above analysis, |T_c| = |T_rc| and |T_w| = |T_rw| = |T_pos|, where |·| denotes the text length. FIG. 1 illustrates the conversion process on an example input text.
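As a sketch of step S1, the toy example below builds the five text views from one sentence. The radical map, segmenter and tagger here are hypothetical stand-ins (the method itself uses jieba and the Xinhua dictionary mapping), and taking the radical of a word's first character as its word-level radical is an illustrative assumption:

```python
# Sketch of step S1: build the five text views from one input sentence.
# RADICALS is a toy character-to-radical map standing in for the Xinhua
# dictionary mapping; toy_seg/toy_pos stand in for jieba.
RADICALS = {"电": "田", "影": "彡", "好": "女", "看": "目"}

def preprocess(sentence, segment, pos_tag):
    T_c = list(sentence)                        # character-level text
    T_w = segment(sentence)                     # word-level text
    T_rc = [RADICALS.get(c, c) for c in T_c]    # character-level radical text
    # word-level radical: radical of the first character of each word (an assumption)
    T_rw = [RADICALS.get(w[0], w[0]) for w in T_w]
    T_pos = [pos_tag(w) for w in T_w]           # part-of-speech text
    return {"T_c": T_c, "T_rc": T_rc, "T_w": T_w, "T_rw": T_rw, "T_pos": T_pos}

toy_seg = lambda s: ["电影", "好看"]                         # "movie" / "good-looking"
toy_pos = lambda w: {"电影": "n", "好看": "a"}.get(w, "x")   # noun / adjective

views = preprocess("电影好看", toy_seg, toy_pos)
```

Note that the result satisfies the size relations above: |T_c| = |T_rc| and |T_w| = |T_rw| = |T_pos|.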
S2, convert the input data with a word embedding method into a set of five types of vectors: character vectors, character-level radical vectors, word vectors, word-level radical vectors and part-of-speech vectors. Specifically, the Word2vec word embedding method converts the input data into the vector set {E_c, E_rc, E_w, E_rw, E_pos}, where:
E_c = {e_1^c, e_2^c, ..., e_m^c} denotes the character vector set, each element being a character vector;
E_rc = {e_1^rc, e_2^rc, ..., e_m^rc} denotes the character-level radical vector set, each element being a character-level radical vector;
E_w = {e_1^w, e_2^w, ..., e_n^w} denotes the word vector set, each element being a word vector;
E_rw = {e_1^rw, e_2^rw, ..., e_n^rw} denotes the word-level radical vector set, each element being a word-level radical vector;
E_pos = {e_1^pos, e_2^pos, ..., e_n^pos} denotes the part-of-speech vector set, each element being a part-of-speech vector.
When training vectors, the Word2vec method offers the continuous bag-of-words model (CBOW) and the skip-gram model. Since skip-gram yields better vectors when trained on large-scale corpora, it is used here to vectorize the input data. Taking word vectors as an example, the skip-gram model uses the center word w_c to predict the probability of its context word w_o. The model structure is shown in FIG. 2, where w_t denotes w_c; w_{t-2}, w_{t-1}, w_{t+1} and w_{t+2} denote w_o; and SUM is a summation operation.
The model represents each word both as a center-word vector and as a context-word vector, and computes the conditional probability between the center word and the context word to be predicted:
p(w_o | w_c) = exp(v_o · v_c) / Σ_{i=1}^{N} exp(v_i · v_c)
where p is the conditional probability, w_c the center word, w_o the context word, v_c the center-word vector, v_o the context-word vector, N the vocabulary size, c, o and i indices of words in the vocabulary, and exp(·) the exponential function with base e.
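The skip-gram conditional probability can be computed directly; this sketch uses hand-picked 2-dimensional vectors for a hypothetical three-word vocabulary:

```python
import math

def skipgram_prob(v_center, ctx_vectors, o):
    """p(w_o | w_c) = exp(v_o . v_c) / sum_i exp(v_i . v_c): softmax over the vocabulary."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    scores = [math.exp(dot(v, v_center)) for v in ctx_vectors]
    return scores[o] / sum(scores)

# toy vocabulary: one center-word vector and three context-word vectors
v_c = [1.0, 0.0]
V_o = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
probs = [skipgram_prob(v_c, V_o, i) for i in range(3)]
```

Context words whose vectors align with the center-word vector receive higher probability, and the probabilities sum to 1.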
S3, using the BiGRU model and the dot-product attention mechanism, fuse the character vectors with the character-level radical vectors to obtain character-radical feature vectors, fuse the word vectors with the word-level radical vectors to obtain word-radical feature vectors, and fuse the word vectors with the part-of-speech vectors to obtain word-part-of-speech feature vectors.
Given the obvious sequential character of Chinese text, the BiGRU model is adopted as the base model. It exploits contextual semantic features effectively by concatenating the feature vectors of a forward and a backward GRU. The GRU model works as follows:
r_t = sigmoid(x_t × W_xr + h_{t-1} × W_hr + b_r)
z_t = sigmoid(x_t × W_xz + h_{t-1} × W_hz + b_z)
h̃_t = tanh(x_t × W_xh + (r_t ⊙ h_{t-1}) × W_hh + b_h)
h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t
where x_t is the input vector at time t, r_t and z_t are the reset gate and the update gate at time t, W and b are the corresponding weight matrices and bias vectors, h̃_t is the candidate memory cell, sigmoid(·) and tanh(·) are activation functions, h_t is the output vector at the current time, ⊙ denotes the element-wise (Hadamard) product, and × matrix multiplication.
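A scalar sketch of one GRU step following the gate formulas above; the weights are arbitrary illustrative values, and with 1-dimensional quantities the matrix products reduce to ordinary multiplication:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, W):
    """One scalar GRU step: reset gate, update gate, candidate, new hidden state."""
    r = sigmoid(x_t * W["xr"] + h_prev * W["hr"] + W["br"])               # reset gate r_t
    z = sigmoid(x_t * W["xz"] + h_prev * W["hz"] + W["bz"])               # update gate z_t
    h_cand = math.tanh(x_t * W["xh"] + (r * h_prev) * W["hh"] + W["bh"])  # candidate
    return z * h_prev + (1.0 - z) * h_cand                                # h_t

# arbitrary toy weights (an assumption, purely for illustration)
W = {"xr": 0.5, "hr": 0.1, "br": 0.0, "xz": 0.5, "hz": 0.1, "bz": 0.0,
     "xh": 1.0, "hh": 1.0, "bh": 0.0}
h = 0.0
for x in [1.0, -0.5, 0.2]:  # run a short input sequence
    h = gru_step(x, h, W)
```

Because tanh bounds the candidate and the update gate interpolates, the hidden state stays in (-1, 1).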
The principle of BiGRU is given by:
h→_t = GRU(x_t, h→_{t-1})
h←_t = GRU(x_t, h←_{t+1})
y_t = [h→_t, h←_t]
where x_t is the input vector at time t, h→_t and h←_t are the feature vectors produced by the forward and backward GRU models respectively, and y_t, their concatenation, is the feature vector obtained by the BiGRU model at the current time.
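The bidirectional concatenation can be sketched independently of the gate internals; the toy linear step function below is a stand-in for the GRU recurrence:

```python
def bigru(xs, step, h0=0.0):
    """Run a recurrence forward and backward over the sequence and pair the two
    hidden states at each time step: y_t = (forward h_t, backward h_t)."""
    fwd, h = [], h0
    for x in xs:                 # forward pass
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], h0
    for x in reversed(xs):       # backward pass
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()                # realign backward states with time order
    return [(f, b) for f, b in zip(fwd, bwd)]

toy_step = lambda x, h: 0.5 * h + 0.5 * x   # stand-in for the GRU recurrence
ys = bigru([1.0, 0.0, -1.0], toy_step)
```

Each output pairs a summary of the past with a summary of the future, which is how BiGRU exploits context in both directions.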
The attention mechanism is inspired by human attention: by assigning different weights to the input data, it ranks them by importance. In implementation, attention is realized mainly in a linear-weighting form or a dot-product form; the two do not differ essentially in effect, but the dot-product form computes faster, so dot-product attention is used here for feature fusion. The calculation is:
Attention(Q, K, V) = softmax(QK^T)V
where Q and K are the Query and Key matrices, V is the Value matrix, and softmax(·) denotes the normalization function.
On this basis, the feature extraction layer uses the BiGRU model and dot-product attention to fuse the character vectors with the character-level radical vectors into character-radical feature vectors, the word vectors with the word-level radical vectors into word-radical feature vectors, and the word vectors with the part-of-speech vectors into word-part-of-speech feature vectors.
Referring to fig. 3, taking the generation of the character-radical feature vectors as an example, the process is as follows:
S3.1, set the initial state of the BiGRU_c model to 0 and feed the character vector set E_c into BiGRU_c, obtaining the character feature vector set y^c = {y_1^c, y_2^c, ..., y_m^c}, where y_i^c denotes a character feature vector:
y^c = BiGRU_c(E_c)
S3.2, fuse y^c with the character-level radical vector set E_rc through the dot-product attention mechanism to obtain the fused vector set Ê_rc = {ê_1^rc, ê_2^rc, ..., ê_m^rc}:
α_i = softmax(e_i^rc · (y_i^c)^T)
ê_i^rc = α_i e_i^rc
where α_i is the weight matrix obtained from the dot product of the i-th element e_i^rc of E_rc and the i-th element y_i^c of y^c, · denotes the dot product, T the matrix transpose, and softmax(·) the softmax normalization function;
S3.3, take Ê_rc as the input vector of the BiGRU_rc model and pass the hidden state of BiGRU_c at the last time step to BiGRU_rc as its initial state, obtaining the character-radical feature vector set V_rc = {v_1^rc, v_2^rc, ..., v_m^rc}, where v_i^rc denotes a character-radical feature vector:
V_rc = BiGRU_rc(Ê_rc)
Similarly, the word-radical and word-part-of-speech feature vectors are generated in the same way as the character-radical feature vectors, finally yielding the word-radical feature vector set V_rw = {v_1^rw, v_2^rw, ..., v_n^rw} and the word-part-of-speech feature vector set V_posw = {v_1^posw, v_2^posw, ..., v_n^posw}, where v_i^rw denotes a word-radical feature vector and v_i^posw a word-part-of-speech feature vector.
S4, construct the sentiment knowledge graph from the sentiment vocabulary ontology, and represent its triples as distributed vectors to obtain sentiment knowledge vectors; fuse the character-radical, word-radical and word-part-of-speech feature vectors with the sentiment knowledge vectors through a multi-head attention mechanism to obtain the knowledge-enhanced character-radical feature output vectors, word-radical feature vectors and word-part-of-speech feature vectors.
The TransE model is a knowledge graph embedding model: it represents the entities and relations of a knowledge graph as distributed vectors and thereby obtains entity semantic vectors. A schematic of the TransE model is shown in FIG. 4.
Let the head entity be h, the tail entity t and the relation r. For a given triple (h, r, t), the TransE model represents it by the vectors (h, r, t) in FIG. 4, where h, r and t are the vector representations of the head entity, the relation and the tail entity. During training, a triple with h + r ≈ t is a positive sample, and a triple with h + r far from t is a negative sample; a loss function L is constructed that reduces the distance within positive triples and enlarges it within negative triples:
d(h, r, t) = ||h + r - t||_{L1/L2}
L = Σ_{(h,r,t)∈S+} Σ_{(h',r,t')∈S-} max(0, γ + d(h, r, t) - d(h', r, t'))
where d measures the distance between the two vectors h + r and t, || · || is computed with the L1 or L2 norm, S+ denotes the set of positive triples, S- the set of negative triples, and γ > 0 is the margin of the loss function.
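The distance and margin loss above can be sketched as follows; the 2-dimensional embeddings are hand-picked toy values, with the positive triple nearly satisfying h + r ≈ t:

```python
def dist(h, r, t, norm=1):
    """d(h, r, t) = ||h + r - t|| using the L1 (norm=1) or L2 (norm=2) norm."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return sum(d * d for d in diffs) ** 0.5

def margin_loss(pos, neg, gamma=1.0):
    """L = sum over positive/negative pairs of max(0, gamma + d(pos) - d(neg))."""
    return sum(max(0.0, gamma + dist(*p) - dist(*n)) for p in pos for n in neg)

# toy triples: positive satisfies h + r = t exactly; negative has a corrupted tail
pos = [([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])]   # d = 0
neg = [([0.0, 0.0], [1.0, 0.0], [0.0, 3.0])]   # d = |1| + |-3| = 4
loss = margin_loss(pos, neg, gamma=1.0)
```

With the negative distance already exceeding the positive by more than the margin, the hinge term is zero and no gradient would flow for this pair.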
The multi-head attention (MultiHead) mechanism strengthens the model's attention capability by concatenating several attention heads side by side, and can thus represent semantic information at different positions and in different aspects. Its principle is:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h)W^O
where head_i denotes an attention head, h the number of heads, Q, K and V the Query, Key and Value vectors, W_i^Q, W_i^K and W_i^V the weight matrices of the Query, Key and Value of the i-th head, W^O an output weight matrix, and Concat(·) the concatenation function; Concat(·)W^O corresponds to a linear-layer operation.
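A sketch of the multi-head computation above. For simplicity both heads use identity projection matrices, so each head reduces to plain dot-product attention, and the 4x2 output matrix W^O is an arbitrary illustrative choice:

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    return [v / sum(e) for v in e]

def attention(Q, K, V):
    scores = matmul(Q, [list(c) for c in zip(*K)])        # Q K^T
    return matmul([softmax(r) for r in scores], V)

def multi_head(Q, K, V, heads, W_o):
    """MultiHead(Q,K,V) = Concat(head_1..head_h) W_o,
    head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)."""
    outs = [attention(matmul(Q, Wq), matmul(K, Wk), matmul(V, Wv))
            for (Wq, Wk, Wv) in heads]
    concat = [sum((o[i] for o in outs), []) for i in range(len(Q))]  # row-wise concat
    return matmul(concat, W_o)

I2 = [[1.0, 0.0], [0.0, 1.0]]          # identity projections (toy simplification)
heads = [(I2, I2, I2), (I2, I2, I2)]   # two heads
W_o = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]  # 4x2 output projection
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = multi_head(Q, K, V, heads, W_o)
```

In step S4.3 the same pattern is applied with V_rc as Q and the sentiment knowledge vector K_rc as both K and V.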
On this basis, taking the generation of the enhanced character-radical feature output vector as an example (see fig. 5), the process is as follows:
S4.1, take the roughly 27,000 sentiment words of the sentiment vocabulary ontology of Dalian University of Technology as head entities h, the 21 sentiment categories as tail entities t, and the sentiment intensity of each sentiment word as the relation r, thereby constructing the sentiment knowledge graph;
S4.2, represent the triples of the sentiment knowledge graph as distributed vectors through the TransE model to obtain the sentiment knowledge vectors K_rc, K_rw and K_posw (the three are the same vector);
S4.3, with the multi-head attention mechanism, take the character-radical feature vector V_rc as the Query vector and the sentiment knowledge vector K_rc as the corresponding Key and Value vectors, and fuse them to obtain the knowledge-enhanced feature output vector O_rc:
K_rc = TransE(h, r, t)
O_rc = MultiHead(V_rc, K_rc, K_rc)
where TransE(·) denotes the TransE model and MultiHead(·) the multi-head attention mechanism.
The enhanced word-radical feature vector V̂_rw and the enhanced word-part-of-speech feature vector V̂_posw can be obtained by the same method.
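As a rough illustration of the triple construction in step S4.1, the sketch below builds (head, relation, tail) triples from a tiny hypothetical lexicon. The lexicon entries, category names and intensities here are invented stand-ins, not entries from the actual ontology library.

```python
# Hypothetical mini emotion lexicon (the real ontology library has ~27,000
# entries and 21 categories): word -> (emotion category, emotion intensity)
lexicon = {
    "开心": ("乐", 7),   # "happy"   -> joy category,     intensity 7
    "悲痛": ("哀", 9),   # "grief"   -> sorrow category,  intensity 9
    "厌恶": ("恶", 5),   # "disgust" -> disgust category, intensity 5
}

# Each entry becomes one knowledge-graph triple:
# head h = emotion word, relation r = emotion intensity, tail t = category.
triples = [(word, intensity, category)
           for word, (category, intensity) in lexicon.items()]
```

These triples are what a TransE model would then embed as the distributed vectors of step S4.2.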
S5: the enhanced character-radical feature output vector, the enhanced word-radical feature vector and the enhanced word-part-of-speech feature vector are used to generate the emotion recognition result. Referring to FIG. 6, the specific process is as follows:
S5.1: a max-pooling operation is applied to the enhanced character-radical feature output vector V̂_rc, the enhanced word-radical feature vector V̂_rw and the enhanced word-part-of-speech feature vector V̂_posw, and feature fusion is performed by vector concatenation to obtain the fused feature vector V_y;
S5.2: the fused feature vector V_y is input into a fully connected neural network and normalized with the softmax function to obtain the probability output P;
S5.3: the class with the maximum probability is selected as the emotion recognition result y. The calculation formulas are:
V_y = Concat(max(V̂_rc), max(V̂_rw), max(V̂_posw))
P = softmax(W V_y + b)
y = argmax(P)
where Concat(·) denotes the concatenation function, max(·) the max-pooling operation, W the weight matrix, b the bias, softmax(·) the normalization function, and argmax(·) the function returning the index of the maximum probability.
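The classification head of step S5 can be sketched in a few lines of numpy. The function name and the shapes of W and b are illustrative assumptions; the operations (max-pooling per feature matrix, concatenation, fully connected layer, softmax, argmax) follow the formulas above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(V_rc, V_rw, V_posw, W, b):
    # max-pooling over the sequence dimension of each enhanced feature matrix,
    # then feature fusion by concatenation
    V_y = np.concatenate([m.max(axis=0) for m in (V_rc, V_rw, V_posw)])
    P = softmax(W @ V_y + b)       # fully connected layer + softmax
    return int(np.argmax(P)), P    # recognition result y and probabilities P
```

With W of shape (num_classes, d_rc + d_rw + d_posw), the argmax over P gives the predicted emotion label.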
To verify the effectiveness of the method of the invention, experimental analysis was performed.
The experimental environment and configuration are shown in table 1:
TABLE 1 Experimental Environment and configuration
The datasets used in the experiments are the NLPECC dataset and a self-built Douban movie dataset. The NLPECC dataset contains 44,875 samples in total, with six emotion labels: like, sad, disgust, angry, happy, and other. In the experiments, the NLPECC dataset is divided into training, validation and test sets at a ratio of 6:2:2. The self-built Douban movie dataset consists of review data crawled from Douban Movies for 5 films, namely Da Sheng Gui Lai, Charlotte Vexation, American Captain 3, Hourly Space 3, and July and Ansheng, with a data size of about 300,000 reviews. After preprocessing the data — data cleaning, converting full-width characters to half-width, punctuation normalization, converting uppercase English letters to lowercase, traditional-to-simplified conversion, deduplication, and so on — about 250,000 review records remain. Table 2 gives the statistics of the preprocessed data.
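Two of the preprocessing steps mentioned above can be sketched concretely: full-width-to-half-width conversion exploits the fixed 0xFEE0 code-point offset between the two ASCII variants, and deduplication is a simple seen-set filter. The function names are illustrative; traditional-to-simplified conversion is omitted since it needs an external dictionary (e.g. an OpenCC-style tool).

```python
def to_halfwidth(s: str) -> str:
    # Full-width ASCII variants occupy U+FF01..U+FF5E, offset 0xFEE0 from
    # their half-width counterparts; U+3000 is the full-width space.
    out = []
    for ch in s:
        code = ord(ch)
        if code == 0x3000:
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)

def preprocess(comments):
    seen, cleaned = set(), []
    for c in comments:
        c = to_halfwidth(c).lower().strip()  # half-width + lowercase English
        if c and c not in seen:              # drop empties and duplicates
            seen.add(c)
            cleaned.append(c)
    return cleaned
```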
TABLE 2 Pre-processed dataset information
The 1-star and 2-star reviews are labeled as negative, the 3-star reviews as neutral, and the 4-star and 5-star reviews as positive; the neutral reviews are discarded because their sentiment-label error rate is too high to meet the experimental requirements. This leaves about 67,000 negative reviews and about 140,000 positive reviews, an unbalanced distribution. To address the data imbalance, 50,000 reviews are randomly sampled from each of the positive and negative reviews, constructing an experimental dataset of 100,000 movie reviews in total. In the experiments, the dataset is divided into training, validation and test sets at a ratio of 6:2:2, i.e. 60,000 training samples (30,000 positive and 30,000 negative), 20,000 validation samples (10,000 of each polarity) and 20,000 test samples (10,000 of each polarity).
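The balanced sampling and 6:2:2 split described above can be sketched as follows; the function name and seed are illustrative assumptions.

```python
import random

def balanced_split(pos, neg, n_per_class, seed=42):
    """Randomly sample n_per_class reviews from each polarity, shuffle,
    and split 6:2:2 into train / validation / test sets."""
    rng = random.Random(seed)
    data = rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)
    rng.shuffle(data)
    n = len(data)
    n_train, n_val = int(n * 0.6), int(n * 0.2)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```

With n_per_class = 50,000 this yields exactly the 60,000 / 20,000 / 20,000 partition used in the experiments.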
The hyper-parameter settings of the proposed model used in the experiments are shown in Table 3:
TABLE 3 Superparameter settings
The experiments measure emotion recognition performance using Precision (P), Recall (R) and the harmonic mean F1-score (F1). The calculation formulas are:
P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2 × P × R / (P + R)
where true positives (TP) are positive samples correctly classified as positive, false positives (FP) are negative samples incorrectly classified as positive, and false negatives (FN) are positive samples incorrectly classified as negative. P is the proportion of samples predicted positive by the model that are actually positive, and R is the proportion of actual positive samples that the model predicts as positive.
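The three metrics above reduce to a few lines; the function name is illustrative.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    p = tp / (tp + fp)           # precision
    r = tp / (tp + fn)           # recall
    f1 = 2 * p * r / (p + r)     # harmonic mean of P and R
    return p, r, f1
```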
The knowledge and data driven multi-granularity Chinese text sentiment analysis method (KEAMM) proposed by the invention is compared with the following existing Chinese text sentiment analysis models:
(1) BiGRU. The model extracts features from word vectors with a bidirectional gated recurrent unit to perform text sentiment analysis.
(2) DCCNN. The model performs convolution over two channels, one taking character vectors and the other word vectors, and fuses the features of the two channels for sentiment analysis.
(3) FG_MCCNN. The model performs sentiment analysis through multiple CNN channels, comprehensively using character vectors, word vectors, and vectors fusing word vectors with part of speech.
(4) BERT-BiLSTM. The model constructs word vectors with the BERT model, then performs feature extraction with BiLSTM to realize emotion recognition.
(5) BERT-MCNN. The model is an emotion analysis model fusing a BERT model and a multilayer collaborative convolutional neural network.
The method of the present invention takes characters, words, radicals, parts of speech and other features as input. To verify the effectiveness of each module, the following ablation experiments were performed after adjusting the model structure:
(1) Two BiGRUs. The character text and the word text are modeled by two BiGRU models respectively, and their output vectors are concatenated for emotion recognition.
(2) Four BiGRUs. The character text, character-level radical text, word text and word-level radical text are modeled by four BiGRUs respectively, and the output vectors of the four channels are concatenated for emotion recognition.
(3) Five BiGRUs. The model abandons the attention-based feature fusion of characters, words, radicals and parts of speech used in KEAMM; instead, features are extracted from the character text, character-level radical text, word text, word-level radical text and part-of-speech text by five BiGRUs respectively, and the output feature vectors of the five channels are concatenated for emotion recognition.
(4) Attention-based Multi-granularity Model (AMM). The knowledge-introduction part of KEAMM is discarded, i.e. the feature fusion layer is removed; the feature vectors output by the feature extraction layer are concatenated for emotion recognition.
(5) KEAMM. The model proposed by the invention.
Table 4 gives the experimental results of the above respective models on the experimental data set.
TABLE 4 results of the experiment
A comprehensive analysis of the comparison and ablation experiment results in Table 4 on the Douban movie dataset and the NLPECC dataset leads to the following conclusions.
The BiGRU model uses only word features, with F1 values of 85.63% and 77.70%; the Two BiGRUs model is a dual-channel model using character features and word features simultaneously, with F1 values of 86.58% and 79.35%, improvements of 0.95% and 1.65% respectively over BiGRU. This shows that using characters and words together combines the advantages of both granularities and extracts richer semantic features to assist sentiment analysis.
Comparing the DCCNN model with the FG_MCCNN model: DCCNN is a dual-channel CNN using character features and word features simultaneously, with F1 values of 86.53% and 79.32%; FG_MCCNN is a three-channel CNN that additionally uses part-of-speech features, with F1 values of 86.92% and 80.47%, improvements of 0.39% and 1.15% respectively. This indicates that part-of-speech features bring a gain in emotion recognition performance. The conclusion is also confirmed in the ablation experiments, where the Five BiGRUs model improves the F1 values of the Four BiGRUs model by 0.34% and 0.43% respectively.
The F1 values of the DCCNN model are 86.53% and 79.32%, while those of the Two BiGRUs model are 86.58% and 79.35%, improvements of 0.05% and 0.03% respectively. Both are dual-channel models using character features and word features simultaneously; the difference is that the former uses CNN as the base model and the latter uses BiGRU. The slightly higher F1 values of Two BiGRUs suggest that the convolution in the CNN model can lose textual semantic features, whereas the BiGRU model can learn long-term contextual dependencies and capture richer semantic features, leading to better results.
The above analysis shows that the CNN models perform worse than the BiGRU models in this experiment. In Table 4, the F1 values of Four BiGRUs are 87.07% and 80.80%, improvements of 0.49% and 1.45% over Two BiGRUs and of 0.53% and 1.48% over DCCNN; the F1 values of Five BiGRUs are 87.41% and 81.23%, improvements of 0.49% and 0.76% over FG_MCCNN. This shows that radical features play a role in Chinese sentiment analysis, and further confirms that fusing multiple effective features can improve emotion recognition performance.
The F1 values of the AMM model are 88.24% and 82.20%, improvements of 0.83% and 0.97% over Five BiGRUs. Comparing the two models shows that Five BiGRUs fuses features of different granularities only by simple concatenation, with no information interaction between character, word, radical and part-of-speech features during feature extraction, while AMM extracts character and word features with BiGRU models and lets them interact and fuse through a dot-product attention mechanism. The fused feature vectors can thus perceive emotional tendency from the deep semantic features of characters, words, radicals and parts of speech, further improving emotion recognition performance.
The F1 values of BERT-BiLSTM are 88.25% and 82.99%, and those of BERT-MCNN are 88.68% and 83.14%. Both are higher than the various baseline and ablation models represented by AMM, and their recognition performance is lower only than that of the proposed KEAMM model. This is mainly because the BERT model can represent text vectors dynamically, and its semantic representation capability can be fine-tuned for the sentiment analysis task, helping the model learn domain knowledge and generate richer semantic features, thereby improving performance.
Finally, the F1 values of the proposed method reach 89.23% and 84.84%, exceeding all comparison models and demonstrating its effectiveness and superiority. Compared with the AMM model in the ablation experiments, the F1 values improve by 0.99% and 2.64%; compared with the BERT-BiLSTM model in the comparison experiments, by 0.98% and 1.85%; and compared with the BERT-MCNN model, by 0.55% and 1.7%. AMM, BERT-BiLSTM and BERT-MCNN introduce no emotion knowledge, whereas KEAMM injects emotion knowledge vectors into the AMM model through a multi-head attention mechanism, so its performance exceeds BERT-MCNN and the other models. This shows that in Chinese sentiment analysis, introducing explicit emotion knowledge can guide the deep model to perform more accurate analysis and further improve performance.
Based on the particularity of Chinese text and the practical requirements of sentiment analysis, the proposed knowledge and data driven multi-granularity Chinese text sentiment analysis method introduces the "knowledge + data" research paradigm into the field of Chinese sentiment analysis: it deeply fuses emotion knowledge vectors with the feature vectors obtained from the BiGRU model and the attention mechanism, yielding a Chinese text sentiment analysis method integrating multi-granularity features such as characters, words, radicals and parts of speech. The results of the comparison and ablation experiments show that the F1 values of the method are significantly improved over the other models.

Claims (6)

1. A knowledge and data driven multi-granularity Chinese text sentiment analysis method is characterized by comprising the following steps:
S1: preprocessing a Chinese text to form 5 types of input data: character-level text, character-level radical text, word-level text, word-level radical text and part-of-speech text;
S2: converting the input data, by a word embedding method, into a set of 5 types of vectors: character vectors, character-level radical vectors, word vectors, word-level radical vectors and part-of-speech vectors;
S3: using a BiGRU model and a dot-product attention mechanism, performing feature fusion on the character vectors and the character-level radical vectors to obtain character-radical feature vectors, on the word vectors and the word-level radical vectors to obtain word-radical feature vectors, and on the word vectors and the part-of-speech vectors to obtain word-part-of-speech feature vectors;
S4: constructing an emotion knowledge graph from an emotion vocabulary ontology library, and performing distributed vector representation on the triples in the emotion knowledge graph to obtain emotion knowledge vectors; fusing the character-radical feature vectors, the word-radical feature vectors and the word-part-of-speech feature vectors with the emotion knowledge vectors respectively through a multi-head attention mechanism to obtain a knowledge-enhanced character-radical feature output vector, a knowledge-enhanced word-radical feature vector and a knowledge-enhanced word-part-of-speech feature vector;
S5: generating the emotion recognition result from the enhanced character-radical feature output vector, the enhanced word-radical feature vector and the enhanced word-part-of-speech feature vector.
2. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 1, wherein: the step S1 specifically includes the following processes:
S1.1: for an input text T consisting of m characters, its character-level text is T_c = {c_1, c_2, ..., c_m}, where each element represents a character in T; the input text T is segmented into n words with a word segmentation tool to obtain the word-level text T_w = {w_1, w_2, ..., w_n}, where each element represents a word in T;
S1.2: according to the radical mapping relations of the Xinhua Dictionary, the character-level text T_c and the word-level text T_w are processed to obtain the character-level radical text T_rc = {rc_1, rc_2, ..., rc_m} and the word-level radical text T_rw = {rw_1, rw_2, ..., rw_n}, where each element of T_rc represents a character-level radical and each element of T_rw represents a word-level radical; the word-level text T_w is converted into the part-of-speech text T_pos = {pos_1, pos_2, ..., pos_n} with the jieba part-of-speech tagging tool, where each element represents the part of speech of the corresponding word, thereby obtaining the input data {T_c, T_rc, T_w, T_rw, T_pos}.
3. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 2, wherein: in step S2, the input data are converted by the word embedding method into the vector set {E_c, E_rc, E_w, E_rw, E_pos}, in which: E_c = {e_c^1, e_c^2, ..., e_c^m} denotes the character vector set, where each element represents a character vector; E_rc = {e_rc^1, e_rc^2, ..., e_rc^m} denotes the character-level radical vector set, where each element represents a character-level radical vector; E_w = {e_w^1, e_w^2, ..., e_w^n} denotes the word vector set, where each element represents a word vector; E_rw = {e_rw^1, e_rw^2, ..., e_rw^n} denotes the word-level radical vector set, where each element represents a word-level radical vector; and E_pos = {e_pos^1, e_pos^2, ..., e_pos^n} denotes the part-of-speech vector set, where each element represents a part-of-speech vector.
4. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 3, wherein: in step S3, the BiGRU model and the dot-product attention mechanism are used to perform feature fusion on the character vectors and the character-level radical vectors to obtain the character-radical feature vectors, which specifically includes the following steps:
S3.1: the initial states of the BiGRU_c model are set to 0, and the character vector set E_c is input into the BiGRU_c model to obtain the character feature vector set y_c, where each element represents a character feature vector; the calculation formula is:
y_c = BiGRU_c(E_c)
S3.2: through the dot-product attention mechanism, y_c and the character-level radical vector set E_rc are feature-fused to obtain the fused vector set Ê_rc; the calculation formulas are:
α_i = softmax(e_rc^i · (y_c^i)^T)
ê_rc^i = α_i · y_c^i
where α_i denotes the weight matrix obtained from the dot product of e_rc^i, the i-th element of the character-level radical vector set E_rc, and y_c^i, the i-th element of the character feature vector set y_c; · denotes the dot-product operation, T the matrix transposition, and softmax(·) the softmax normalization function;
S3.3: Ê_rc is used as the input vector of the BiGRU_rc model, and the hidden-layer state of the BiGRU_c model at the last moment is passed to the BiGRU_rc model as its initial state, thereby obtaining the character-radical feature vector set V_rc, where each element represents a character-radical feature vector; the calculation formula is:
V_rc = BiGRU_rc(Ê_rc)
5. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 4, wherein: the knowledge-enhanced character-radical feature output vector of step S4 is obtained specifically through the following process:
S4.1: constructing the emotion knowledge graph by taking the emotion words in the emotion vocabulary ontology library as head entities h, the emotion categories as tail entities t, and the emotion intensity of the emotion words as the relation r;
S4.2: performing distributed vector representation on the triples in the emotion knowledge graph through the TransE model to obtain the emotion knowledge vector K_rc;
S4.3: using the multi-head attention mechanism, taking the character-radical feature vector V_rc as the Query vector and the emotion knowledge vector K_rc as the corresponding Key vector and Value vector, and performing feature fusion to obtain the knowledge-enhanced feature output vector V̂_rc; the calculation formulas are:
K_rc = TransE(h, r, t)
V̂_rc = MultiHead(V_rc, K_rc, K_rc)
where TransE(·) denotes the TransE model and MultiHead(·) is the multi-head attention mechanism.
6. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 5, wherein: step S5 specifically includes the following processes:
S5.1: applying a max-pooling operation to the enhanced character-radical feature output vector V̂_rc, the enhanced word-radical feature vector V̂_rw and the enhanced word-part-of-speech feature vector V̂_posw, and performing feature fusion by vector concatenation to obtain the fused feature vector V_y;
S5.2: inputting the fused feature vector V_y into a fully connected neural network and performing normalization with the softmax function to obtain the probability output P;
S5.3: selecting the class with the maximum probability as the emotion recognition result y.
CN202210830349.2A 2022-07-15 2022-07-15 Knowledge and data driven multi-granularity Chinese text sentiment analysis method Withdrawn CN115409028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210830349.2A CN115409028A (en) 2022-07-15 2022-07-15 Knowledge and data driven multi-granularity Chinese text sentiment analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210830349.2A CN115409028A (en) 2022-07-15 2022-07-15 Knowledge and data driven multi-granularity Chinese text sentiment analysis method

Publications (1)

Publication Number Publication Date
CN115409028A true CN115409028A (en) 2022-11-29

Family

ID=84158335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210830349.2A Withdrawn CN115409028A (en) 2022-07-15 2022-07-15 Knowledge and data driven multi-granularity Chinese text sentiment analysis method

Country Status (1)

Country Link
CN (1) CN115409028A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117786120A (en) * 2024-02-28 2024-03-29 山东省计算中心(国家超级计算济南中心) Text emotion classification method and system based on hierarchical attention mechanism
CN117786120B (en) * 2024-02-28 2024-05-24 山东省计算中心(国家超级计算济南中心) Text emotion classification method and system based on hierarchical attention mechanism


Similar Documents

Publication Publication Date Title
Zhang et al. Multi-scale attention with dense encoder for handwritten mathematical expression recognition
CN107943784B (en) Relationship extraction method based on generation of countermeasure network
CN110598005B (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
Li et al. Improving attention-based handwritten mathematical expression recognition with scale augmentation and drop attention
CN109766277B (en) Software fault diagnosis method based on transfer learning and DNN
Zhang et al. Radical analysis network for learning hierarchies of Chinese characters
CN111144448A (en) Video barrage emotion analysis method based on multi-scale attention convolutional coding network
CN106980608A (en) A kind of Chinese electronic health record participle and name entity recognition method and system
CN110647612A (en) Visual conversation generation method based on double-visual attention network
CN110489750A (en) Burmese participle and part-of-speech tagging method and device based on two-way LSTM-CRF
CN113298151A (en) Remote sensing image semantic description method based on multi-level feature fusion
CN111259153B (en) Attribute-level emotion analysis method of complete attention mechanism
Li et al. Text-to-text generative adversarial networks
CN110909736A (en) Image description method based on long-short term memory model and target detection algorithm
Cheng et al. Sentiment analysis using multi-head attention capsules with multi-channel CNN and bidirectional GRU
CN111428481A (en) Entity relation extraction method based on deep learning
CN115630156A (en) Mongolian emotion analysis method and system fusing Prompt and SRU
CN113486645A (en) Text similarity detection method based on deep learning
Wu et al. TDv2: a novel tree-structured decoder for offline mathematical expression recognition
CN116029305A (en) Chinese attribute-level emotion analysis method, system, equipment and medium based on multitask learning
Yu et al. Cross-Domain Slot Filling as Machine Reading Comprehension.
CN115409028A (en) Knowledge and data driven multi-granularity Chinese text sentiment analysis method
CN115169429A (en) Lightweight aspect-level text emotion analysis method
CN112364654A (en) Education-field-oriented entity and relation combined extraction method
CN115062109A (en) Entity-to-attention mechanism-based entity relationship joint extraction method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221129

WW01 Invention patent application withdrawn after publication