CN115409028A - Knowledge and data driven multi-granularity Chinese text sentiment analysis method - Google Patents

Knowledge and data driven multi-granularity Chinese text sentiment analysis method

Info

Publication number
CN115409028A
CN115409028A (application CN202210830349.2A)
Authority
CN
China
Prior art keywords
word
vector
radical
knowledge
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210830349.2A
Other languages
Chinese (zh)
Inventor
刘忠宝
张兴芹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Foreign Language Vocational And Technical University
Original Assignee
Shandong Foreign Language Vocational And Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Foreign Language Vocational And Technical University
Priority to CN202210830349.2A
Publication of CN115409028A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a knowledge and data driven multi-granularity Chinese text sentiment analysis method. On the basis of vectorized representations of characters, words, radicals and parts of speech, features are extracted through models such as a bidirectional gated recurrent unit and an attention mechanism to obtain feature vectors; a sentiment knowledge graph is represented as sentiment knowledge vectors through the TransE model; the feature vectors and the sentiment knowledge vectors are fused through a multi-head attention mechanism to obtain knowledge-enhanced feature vectors; finally, sentiment tendency is identified from these feature vectors through a fully connected layer and a classification function. Comparison and ablation experiments show that the F1 score of the method is markedly higher than that of other models.

Description

Knowledge and data driven multi-granularity Chinese text sentiment analysis method
Technical Field
The invention relates to the technical field of Chinese text sentiment analysis, in particular to a knowledge and data driven multi-granularity Chinese text sentiment analysis method.
Background
In recent years, Chinese sentiment analysis has received great attention from researchers and has advanced considerably. Existing research on Chinese text sentiment analysis can be grouped into three categories: knowledge-driven sentiment analysis, data-driven sentiment analysis, and sentiment analysis driven jointly by knowledge and data.
Knowledge-driven sentiment analysis mainly constructs a knowledge graph, obtains effective explicit features from it, and formulates corresponding knowledge extraction rules for sentiment recognition. Work of this kind is still scarce; one representative study builds a movie-and-review knowledge graph with domain-specific attributes in order to make full use of review data for movie recommendation, proposing a semi-automatic method for extracting knowledge from reviews. The method performs syntactic analysis on preprocessed review data, constructs a sentiment dictionary, and annotates the data to formulate knowledge extraction rules; it then extracts structured review knowledge by combining the dictionary with clustering, and finally fuses this structured knowledge with the movie ontology to form a movie-and-review knowledge graph applied to sentiment analysis. Although knowledge-driven sentiment analysis is easy to interpret, it often ignores the context of the text, cannot capture deep semantic information, and generalizes poorly.
Data-driven sentiment analysis rests on traditional methods and deep learning methods. Traditional methods use a sentiment dictionary or a machine learning model; although they achieve a certain effect, considerable manpower and resources must be spent on hand-crafting complex semantic and grammatical features, which greatly limits their applicability and generalization. Since deep learning models can perform feature extraction and semantic representation automatically, overcoming these shortcomings, researchers have introduced them into text sentiment analysis. Deep-learning-based methods mainly extract deep semantic features of characters, words and other inputs through a deep model in order to identify the sentiment tendency of the text. Although data-driven methods capture the syntactic and semantic information between context and target well, they fall short in effectively integrating external knowledge to help understand the text.
Sentiment analysis driven jointly by knowledge and data mainly represents a knowledge graph as knowledge vectors and fuses them with the feature vectors produced by a deep learning model. Noticing the complementary strengths and weaknesses of knowledge-driven and data-driven approaches, researchers combine the two, using the prior knowledge in the knowledge graph as a supervision signal for the deep model to improve its semantic analysis capability, thereby obtaining text representations with richer semantic information for sentiment recognition. At present, most of this research targets English text. Experiments on the SemEval-2014 Task 4 dataset and a Twitter dataset show that such methods achieve better classification results.
This survey of related work shows that scholars have produced a series of results in text sentiment analysis, yet important challenges remain as research deepens. First, most data-driven Chinese text sentiment analysis methods are adapted from English ones, ignoring the essential difference between Chinese (pictographic) and English (alphabetic) writing. Second, some studies recognize the importance of character, word and part-of-speech features for sentiment analysis and fuse them, but whether further features beyond these can be introduced for higher-performance sentiment recognition has not been studied in depth. Finally, sentiment analysis driven jointly by knowledge and data has mostly been studied on English texts and English knowledge graphs, while research based on Chinese knowledge graphs and Chinese texts has not yet been explored.
Disclosure of Invention
To remedy the defects of the prior art, the invention provides a knowledge and data driven multi-granularity Chinese text sentiment analysis method: on the basis of vectorized representations of characters, words, radicals and parts of speech, features are extracted through models such as a bidirectional gated recurrent unit and an attention mechanism to obtain feature vectors; a sentiment knowledge graph is represented as sentiment knowledge vectors through the TransE model; the feature vectors and the sentiment knowledge vectors are fused through a multi-head attention mechanism to obtain knowledge-enhanced feature vectors; finally, sentiment tendency is identified from these feature vectors through a fully connected layer and a classification function.
The technical scheme adopted by the invention for solving the technical problem is as follows: a knowledge and data driven multi-granularity Chinese text sentiment analysis method is provided, and comprises the following steps:
S1, preprocess the Chinese text to form five types of input data: character-level text, word-level text, character-level radical text, word-level radical text and part-of-speech text;
S2, convert the input data with a word embedding method into a set of five types of vectors: character vectors, character-level radical vectors, word vectors, word-level radical vectors and part-of-speech vectors;
S3, using a BiGRU model and a dot-product attention mechanism, fuse the character vectors with the character-level radical vectors to obtain character-radical feature vectors, fuse the word vectors with the word-level radical vectors to obtain word-radical feature vectors, and fuse the word vectors with the part-of-speech vectors to obtain word-part-of-speech feature vectors;
S4, construct a sentiment knowledge graph from the sentiment vocabulary ontology, and represent the triples in the graph as distributed vectors to obtain sentiment knowledge vectors; fuse the character-radical, word-radical and word-part-of-speech feature vectors with the sentiment knowledge vectors through a multi-head attention mechanism to obtain knowledge-enhanced character-radical feature output vectors, word-radical feature vectors and word-part-of-speech feature vectors;
S5, generate the sentiment recognition result from the enhanced character-radical feature output vectors, the enhanced word-radical feature vectors and the enhanced word-part-of-speech feature vectors.
The step S1 specifically includes the following processes:
S1.1, for an input text T consisting of m characters, its character-level text is T_c = {c_1, c_2, ..., c_m}, where each element represents a character of T; the input text T is segmented into n words with a word segmentation tool, giving the word-level text T_w = {w_1, w_2, ..., w_n}, where each element represents a word of T;
S1.2, according to the radical mapping of the Xinhua dictionary, the character-level text T_c and the word-level text T_w are processed to obtain the character-level radical text T_rc = {rc_1, rc_2, ..., rc_m} and the word-level radical text T_rw = {rw_1, rw_2, ..., rw_n}, where each element of T_rc represents a character-level radical and each element of T_rw a word-level radical; the word-level text T_w is converted with the jieba part-of-speech tagging tool into the part-of-speech text T_pos = {pos_1, pos_2, ..., pos_n}, where each element represents the part of speech of the corresponding word; this yields the input data {T_c, T_rc, T_w, T_rw, T_pos}.
In step S2, the input data are converted with a word embedding method to obtain the vector set {E_c, E_rc, E_w, E_rw, E_pos}, where:
E_c = {e_1^c, e_2^c, ..., e_m^c} denotes the character vector set, each element being a character vector;
E_rc = {e_1^rc, e_2^rc, ..., e_m^rc} denotes the character-level radical vector set, each element being a character-level radical vector;
E_w = {e_1^w, e_2^w, ..., e_n^w} denotes the word vector set, each element being a word vector;
E_rw = {e_1^rw, e_2^rw, ..., e_n^rw} denotes the word-level radical vector set, each element being a word-level radical vector;
E_pos = {e_1^pos, e_2^pos, ..., e_n^pos} denotes the part-of-speech vector set, each element being a part-of-speech vector.
In step S3, the BiGRU model and the dot-product attention mechanism fuse the character vectors with the character-level radical vectors to obtain the character-radical feature vectors, specifically as follows:
S3.1, set the initial state of the BiGRU_c model to 0 and feed the character vector set E_c into BiGRU_c, obtaining the character feature vector set y^c = {y_1^c, y_2^c, ..., y_m^c}, where each element is a character feature vector:
y^c = BiGRU_c(E_c)
S3.2, fuse y^c with the character-level radical vector set E_rc through the dot-product attention mechanism to obtain the fused vector set Ê_rc = {ê_1^rc, ê_2^rc, ..., ê_m^rc}:
α_i = softmax(e_i^rc · (y_i^c)^T)
ê_i^rc = α_i e_i^rc
where α_i is the weight matrix obtained from the dot product of the i-th element e_i^rc of E_rc and the i-th element y_i^c of y^c, · denotes the dot product, T the matrix transpose, and softmax(·) the softmax normalization function;
S3.3, take Ê_rc as the input vector of the BiGRU_rc model and pass the hidden state of BiGRU_c at the last time step to BiGRU_rc as its initial state, obtaining the character-radical feature vector set V_rc = {v_1^rc, v_2^rc, ..., v_m^rc}, where each element is a character-radical feature vector:
V_rc = BiGRU_rc(Ê_rc)
The knowledge-enhanced character-radical feature output vector in step S4 is obtained as follows:
S4.1, construct the sentiment knowledge graph with the sentiment words of the sentiment vocabulary ontology as head entities h, the sentiment categories as tail entities t, and the sentiment intensity of each sentiment word as the relation r;
S4.2, represent the triples of the sentiment knowledge graph as distributed vectors through the TransE model to obtain the sentiment knowledge vector K_rc;
S4.3, with a multi-head attention mechanism, take the character-radical feature vector V_rc as the Query vector and the sentiment knowledge vector K_rc as the corresponding Key and Value vectors, and fuse them to obtain the knowledge-enhanced feature output vector O_rc:
K_rc = TransE(h, r, t)
O_rc = MultiHead(V_rc, K_rc, K_rc)
where TransE(·) denotes the TransE model and MultiHead(·) the multi-head attention mechanism.
Step S5 specifically includes the following processes:
s5.1, outputting vectors to the enhanced character-radical characteristics
Figure BDA0003747991440000046
Enhanced word-radical feature vector
Figure BDA0003747991440000047
And enhanced word-part-of-speech feature vectors
Figure BDA0003747991440000048
Performing maximum pooling operation, and performing feature fusion by vector splicing to obtain a fused feature vector V y
S5.2, fusing the feature vector V y Inputting a fully-connected neural network, and performing normalization processing by using a softmax function to obtain a probability output P;
and S5.3, selecting the value with the maximum probability as the emotion recognition result y.
The invention has the beneficial effects based on the technical scheme that:
Aimed at the particularity of Chinese text and the practical needs of sentiment analysis, the invention uses the radicals of Chinese characters and the parts of speech of words to aid semantic understanding of Chinese text, pursues the problem of Chinese sentiment analysis, and introduces the "knowledge + data" research paradigm into the field of Chinese sentiment analysis. By fusing sentiment knowledge vectors with the feature vectors of a deep model, it fully mines the latent semantic and sentiment information of characters, words, radicals and parts of speech, improves the capability of Chinese sentiment analysis, and further enriches its theoretical and methodological systems; experiments verify the performance gains the method brings to Chinese sentiment analysis.
Drawings
FIG. 1 is a schematic diagram of an example text conversion process.
FIG. 2 is a schematic diagram of the Skip-Gram model structure.
Fig. 3 is a schematic diagram of character-radical feature vector generation.
FIG. 4 is a schematic diagram of the TransE model.
Fig. 5 is a schematic diagram of the generation process of the output vector.
Fig. 6 is a schematic diagram of the output process.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the drawings.
The invention provides a knowledge and data driven multi-granularity Chinese text sentiment analysis method, which comprises the following steps of:
S1, preprocess the Chinese text to form five types of input data: character-level text, word-level text, character-level radical text, word-level radical text and part-of-speech text. Specifically:
S1.1, for an input text T consisting of m characters, its character-level text is T_c = {c_1, c_2, ..., c_m}, where c_i (i = 1, 2, ..., m) represents each character of T; the input text T is segmented into n words with the jieba segmentation tool, giving the word-level text T_w = {w_1, w_2, ..., w_n}, where w_i (i = 1, 2, ..., n) represents each word of T;
S1.2, according to the radical mapping of the Xinhua dictionary, the character-level text T_c and the word-level text T_w are processed to obtain the character-level radical text T_rc = {rc_1, rc_2, ..., rc_m} and the word-level radical text T_rw = {rw_1, rw_2, ..., rw_n}, where rc_i (i = 1, 2, ..., m) represents a character-level radical and rw_i (i = 1, 2, ..., n) a word-level radical; the word-level text T_w is converted with the jieba part-of-speech tagging tool into the part-of-speech text T_pos = {pos_1, pos_2, ..., pos_n}, where pos_i (i = 1, 2, ..., n) represents the part of speech of the corresponding word; this yields the input data {T_c, T_rc, T_w, T_rw, T_pos}.
From the above analysis, |T_c| = |T_rc| and |T_w| = |T_rw| = |T_pos|, where |·| denotes the text length. FIG. 1 illustrates the conversion process on an example input text.
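As a sketch of step S1, the toy example below builds the five text views from one sentence. The radical map, segmenter and tagger here are hypothetical stand-ins (the method itself uses jieba and the Xinhua dictionary mapping), and taking the radical of a word's first character as its word-level radical is an illustrative assumption:

```python
# Sketch of step S1: build the five text views from one input sentence.
# RADICALS is a toy character-to-radical map standing in for the Xinhua
# dictionary mapping; toy_seg/toy_pos stand in for jieba.
RADICALS = {"电": "田", "影": "彡", "好": "女", "看": "目"}

def preprocess(sentence, segment, pos_tag):
    T_c = list(sentence)                        # character-level text
    T_w = segment(sentence)                     # word-level text
    T_rc = [RADICALS.get(c, c) for c in T_c]    # character-level radical text
    # word-level radical: radical of the first character of each word (an assumption)
    T_rw = [RADICALS.get(w[0], w[0]) for w in T_w]
    T_pos = [pos_tag(w) for w in T_w]           # part-of-speech text
    return {"T_c": T_c, "T_rc": T_rc, "T_w": T_w, "T_rw": T_rw, "T_pos": T_pos}

toy_seg = lambda s: ["电影", "好看"]                         # "movie" / "good-looking"
toy_pos = lambda w: {"电影": "n", "好看": "a"}.get(w, "x")   # noun / adjective

views = preprocess("电影好看", toy_seg, toy_pos)
```

Note that the result satisfies the size relations above: |T_c| = |T_rc| and |T_w| = |T_rw| = |T_pos|.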
S2, convert the input data with a word embedding method into a set of five types of vectors: character vectors, character-level radical vectors, word vectors, word-level radical vectors and part-of-speech vectors. Specifically, the Word2vec word embedding method converts the input data into the vector set {E_c, E_rc, E_w, E_rw, E_pos}, where:
E_c = {e_1^c, e_2^c, ..., e_m^c} denotes the character vector set, each element being a character vector;
E_rc = {e_1^rc, e_2^rc, ..., e_m^rc} denotes the character-level radical vector set, each element being a character-level radical vector;
E_w = {e_1^w, e_2^w, ..., e_n^w} denotes the word vector set, each element being a word vector;
E_rw = {e_1^rw, e_2^rw, ..., e_n^rw} denotes the word-level radical vector set, each element being a word-level radical vector;
E_pos = {e_1^pos, e_2^pos, ..., e_n^pos} denotes the part-of-speech vector set, each element being a part-of-speech vector.
When training vectors, the Word2vec method offers the continuous bag-of-words model (CBOW) and the skip-gram model. Since skip-gram yields better vectors when trained on large-scale corpora, it is used here to vectorize the input data. Taking word vectors as an example, the skip-gram model uses the center word w_c to predict the probability of its context word w_o. The model structure is shown in FIG. 2, where w_t denotes w_c; w_{t-2}, w_{t-1}, w_{t+1} and w_{t+2} denote w_o; and SUM is a summation operation.
The model represents each word both as a center-word vector and as a context-word vector, and computes the conditional probability between the center word and the context word to be predicted:
p(w_o | w_c) = exp(v_o · v_c) / Σ_{i=1}^{N} exp(v_i · v_c)
where p is the conditional probability, w_c the center word, w_o the context word, v_c the center-word vector, v_o the context-word vector, N the vocabulary size, c, o and i indices of words in the vocabulary, and exp(·) the exponential function with base e.
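The skip-gram conditional probability can be computed directly; this sketch uses hand-picked 2-dimensional vectors for a hypothetical three-word vocabulary:

```python
import math

def skipgram_prob(v_center, ctx_vectors, o):
    """p(w_o | w_c) = exp(v_o . v_c) / sum_i exp(v_i . v_c): softmax over the vocabulary."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    scores = [math.exp(dot(v, v_center)) for v in ctx_vectors]
    return scores[o] / sum(scores)

# toy vocabulary: one center-word vector and three context-word vectors
v_c = [1.0, 0.0]
V_o = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
probs = [skipgram_prob(v_c, V_o, i) for i in range(3)]
```

Context words whose vectors align with the center-word vector receive higher probability, and the probabilities sum to 1.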
S3, using the BiGRU model and the dot-product attention mechanism, fuse the character vectors with the character-level radical vectors to obtain character-radical feature vectors, fuse the word vectors with the word-level radical vectors to obtain word-radical feature vectors, and fuse the word vectors with the part-of-speech vectors to obtain word-part-of-speech feature vectors.
Given the obvious sequential character of Chinese text, the BiGRU model is adopted as the base model. It exploits contextual semantic features effectively by concatenating the feature vectors of a forward and a backward GRU. The GRU model works as follows:
r_t = sigmoid(x_t × W_xr + h_{t-1} × W_hr + b_r)
z_t = sigmoid(x_t × W_xz + h_{t-1} × W_hz + b_z)
h̃_t = tanh(x_t × W_xh + (r_t ⊙ h_{t-1}) × W_hh + b_h)
h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t
where x_t is the input vector at time t, r_t and z_t are the reset gate and the update gate at time t, W and b are the corresponding weight matrices and bias vectors, h̃_t is the candidate memory cell, sigmoid(·) and tanh(·) are activation functions, h_t is the output vector at the current time, ⊙ denotes the element-wise (Hadamard) product, and × matrix multiplication.
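A scalar sketch of one GRU step following the gate formulas above; the weights are arbitrary illustrative values, and with 1-dimensional quantities the matrix products reduce to ordinary multiplication:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, W):
    """One scalar GRU step: reset gate, update gate, candidate, new hidden state."""
    r = sigmoid(x_t * W["xr"] + h_prev * W["hr"] + W["br"])               # reset gate r_t
    z = sigmoid(x_t * W["xz"] + h_prev * W["hz"] + W["bz"])               # update gate z_t
    h_cand = math.tanh(x_t * W["xh"] + (r * h_prev) * W["hh"] + W["bh"])  # candidate
    return z * h_prev + (1.0 - z) * h_cand                                # h_t

# arbitrary toy weights (an assumption, purely for illustration)
W = {"xr": 0.5, "hr": 0.1, "br": 0.0, "xz": 0.5, "hz": 0.1, "bz": 0.0,
     "xh": 1.0, "hh": 1.0, "bh": 0.0}
h = 0.0
for x in [1.0, -0.5, 0.2]:  # run a short input sequence
    h = gru_step(x, h, W)
```

Because tanh bounds the candidate and the update gate interpolates, the hidden state stays in (-1, 1).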
The principle of BiGRU is given by:
h→_t = GRU(x_t, h→_{t-1})
h←_t = GRU(x_t, h←_{t+1})
y_t = [h→_t, h←_t]
where x_t is the input vector at time t, h→_t and h←_t are the feature vectors produced by the forward and backward GRU models respectively, and y_t, their concatenation, is the feature vector obtained by the BiGRU model at the current time.
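The bidirectional concatenation can be sketched independently of the gate internals; the toy linear step function below is a stand-in for the GRU recurrence:

```python
def bigru(xs, step, h0=0.0):
    """Run a recurrence forward and backward over the sequence and pair the two
    hidden states at each time step: y_t = (forward h_t, backward h_t)."""
    fwd, h = [], h0
    for x in xs:                 # forward pass
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], h0
    for x in reversed(xs):       # backward pass
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()                # realign backward states with time order
    return [(f, b) for f, b in zip(fwd, bwd)]

toy_step = lambda x, h: 0.5 * h + 0.5 * x   # stand-in for the GRU recurrence
ys = bigru([1.0, 0.0, -1.0], toy_step)
```

Each output pairs a summary of the past with a summary of the future, which is how BiGRU exploits context in both directions.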
The attention mechanism is inspired by human attention: by assigning different weights to the input data, it ranks them by importance. In implementation, attention is realized mainly in a linear-weighting form or a dot-product form; the two do not differ essentially in effect, but the dot-product form computes faster, so dot-product attention is used here for feature fusion. The calculation is:
Attention(Q, K, V) = softmax(QK^T)V
where Q and K are the Query and Key matrices, V is the Value matrix, and softmax(·) denotes the normalization function.
On this basis, the feature extraction layer uses the BiGRU model and dot-product attention to fuse the character vectors with the character-level radical vectors into character-radical feature vectors, the word vectors with the word-level radical vectors into word-radical feature vectors, and the word vectors with the part-of-speech vectors into word-part-of-speech feature vectors.
Referring to fig. 3, taking the generation of the character-radical feature vectors as an example, the process is as follows:
S3.1, set the initial state of the BiGRU_c model to 0 and feed the character vector set E_c into BiGRU_c, obtaining the character feature vector set y^c = {y_1^c, y_2^c, ..., y_m^c}, where y_i^c denotes a character feature vector:
y^c = BiGRU_c(E_c)
S3.2, fuse y^c with the character-level radical vector set E_rc through the dot-product attention mechanism to obtain the fused vector set Ê_rc = {ê_1^rc, ê_2^rc, ..., ê_m^rc}:
α_i = softmax(e_i^rc · (y_i^c)^T)
ê_i^rc = α_i e_i^rc
where α_i is the weight matrix obtained from the dot product of the i-th element e_i^rc of E_rc and the i-th element y_i^c of y^c, · denotes the dot product, T the matrix transpose, and softmax(·) the softmax normalization function;
S3.3, take Ê_rc as the input vector of the BiGRU_rc model and pass the hidden state of BiGRU_c at the last time step to BiGRU_rc as its initial state, obtaining the character-radical feature vector set V_rc = {v_1^rc, v_2^rc, ..., v_m^rc}, where v_i^rc denotes a character-radical feature vector:
V_rc = BiGRU_rc(Ê_rc)
Similarly, the word-radical and word-part-of-speech feature vectors are generated in the same way as the character-radical feature vectors, finally yielding the word-radical feature vector set V_rw = {v_1^rw, v_2^rw, ..., v_n^rw} and the word-part-of-speech feature vector set V_posw = {v_1^posw, v_2^posw, ..., v_n^posw}, where v_i^rw denotes a word-radical feature vector and v_i^posw a word-part-of-speech feature vector.
S4, construct the sentiment knowledge graph from the sentiment vocabulary ontology, and represent its triples as distributed vectors to obtain sentiment knowledge vectors; fuse the character-radical, word-radical and word-part-of-speech feature vectors with the sentiment knowledge vectors through a multi-head attention mechanism to obtain the knowledge-enhanced character-radical feature output vectors, word-radical feature vectors and word-part-of-speech feature vectors.
The TransE model is a knowledge graph embedding model: it represents the entities and relations of a knowledge graph as distributed vectors and thereby obtains entity semantic vectors. A schematic of the TransE model is shown in FIG. 4.
Let the head entity be h, the tail entity t and the relation r. For a given triple (h, r, t), the TransE model represents it by the vectors (h, r, t) in FIG. 4, where h, r and t are the vector representations of the head entity, the relation and the tail entity. During training, a triple with h + r ≈ t is a positive sample, and a triple with h + r far from t is a negative sample; a loss function L is constructed that reduces the distance within positive triples and enlarges it within negative triples:
d(h, r, t) = ||h + r - t||_{L1/L2}
L = Σ_{(h,r,t)∈S+} Σ_{(h',r,t')∈S-} max(0, γ + d(h, r, t) - d(h', r, t'))
where d measures the distance between the two vectors h + r and t, || · || is computed with the L1 or L2 norm, S+ denotes the set of positive triples, S- the set of negative triples, and γ > 0 is the margin of the loss function.
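The distance and margin loss above can be sketched as follows; the 2-dimensional embeddings are hand-picked toy values, with the positive triple nearly satisfying h + r ≈ t:

```python
def dist(h, r, t, norm=1):
    """d(h, r, t) = ||h + r - t|| using the L1 (norm=1) or L2 (norm=2) norm."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return sum(d * d for d in diffs) ** 0.5

def margin_loss(pos, neg, gamma=1.0):
    """L = sum over positive/negative pairs of max(0, gamma + d(pos) - d(neg))."""
    return sum(max(0.0, gamma + dist(*p) - dist(*n)) for p in pos for n in neg)

# toy triples: positive satisfies h + r = t exactly; negative has a corrupted tail
pos = [([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])]   # d = 0
neg = [([0.0, 0.0], [1.0, 0.0], [0.0, 3.0])]   # d = |1| + |-3| = 4
loss = margin_loss(pos, neg, gamma=1.0)
```

With the negative distance already exceeding the positive by more than the margin, the hinge term is zero and no gradient would flow for this pair.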
The multi-head attention (MultiHead) mechanism strengthens the model's attention capability by concatenating several attention heads side by side, and can thus represent semantic information at different positions and in different aspects. Its principle is:
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)
MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h)W^O
where head_i denotes an attention head, h the number of heads, Q, K and V the Query, Key and Value vectors, W_i^Q, W_i^K and W_i^V the weight matrices of the Query, Key and Value of the i-th head, W^O an output weight matrix, and Concat(·) the concatenation function; Concat(·)W^O corresponds to a linear-layer operation.
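A sketch of the multi-head computation above. For simplicity both heads use identity projection matrices, so each head reduces to plain dot-product attention, and the 4x2 output matrix W^O is an arbitrary illustrative choice:

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    return [v / sum(e) for v in e]

def attention(Q, K, V):
    scores = matmul(Q, [list(c) for c in zip(*K)])        # Q K^T
    return matmul([softmax(r) for r in scores], V)

def multi_head(Q, K, V, heads, W_o):
    """MultiHead(Q,K,V) = Concat(head_1..head_h) W_o,
    head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)."""
    outs = [attention(matmul(Q, Wq), matmul(K, Wk), matmul(V, Wv))
            for (Wq, Wk, Wv) in heads]
    concat = [sum((o[i] for o in outs), []) for i in range(len(Q))]  # row-wise concat
    return matmul(concat, W_o)

I2 = [[1.0, 0.0], [0.0, 1.0]]          # identity projections (toy simplification)
heads = [(I2, I2, I2), (I2, I2, I2)]   # two heads
W_o = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]  # 4x2 output projection
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = multi_head(Q, K, V, heads, W_o)
```

In step S4.3 the same pattern is applied with V_rc as Q and the sentiment knowledge vector K_rc as both K and V.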
On this basis, taking the generation of the enhanced character-radical feature output vector as an example (see fig. 5), the process is as follows:
S4.1, take the roughly 27,000 sentiment words of the sentiment vocabulary ontology of Dalian University of Technology as head entities h, the 21 sentiment categories as tail entities t, and the sentiment intensity of each sentiment word as the relation r, thereby constructing the sentiment knowledge graph;
S4.2, represent the triples of the sentiment knowledge graph as distributed vectors through the TransE model to obtain the sentiment knowledge vectors K_rc, K_rw and K_posw (the three are the same vector);
S4.3, with the multi-head attention mechanism, take the character-radical feature vector V_rc as the Query vector and the sentiment knowledge vector K_rc as the corresponding Key and Value vectors, and fuse them to obtain the knowledge-enhanced feature output vector O_rc:
K_rc = TransE(h, r, t)
O_rc = MultiHead(V_rc, K_rc, K_rc)
where TransE(·) denotes the TransE model and MultiHead(·) the multi-head attention mechanism.
The enhanced word-radical feature vector V̂_rw and the enhanced word-part-of-speech feature vector V̂_posw can be obtained by the same method.
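As a rough illustration of the triple construction in step S4.1, the sketch below builds (head, relation, tail) triples from a tiny hypothetical lexicon. The lexicon entries, category names and intensities here are invented stand-ins, not entries from the actual ontology library.

```python
# Hypothetical mini emotion lexicon (the real ontology library has ~27,000
# entries and 21 categories): word -> (emotion category, emotion intensity)
lexicon = {
    "开心": ("乐", 7),   # "happy"   -> joy category,     intensity 7
    "悲痛": ("哀", 9),   # "grief"   -> sorrow category,  intensity 9
    "厌恶": ("恶", 5),   # "disgust" -> disgust category, intensity 5
}

# Each entry becomes one knowledge-graph triple:
# head h = emotion word, relation r = emotion intensity, tail t = category.
triples = [(word, intensity, category)
           for word, (category, intensity) in lexicon.items()]
```

These triples are what a TransE model would then embed as the distributed vectors of step S4.2.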
S5: the enhanced character-radical feature output vector, the enhanced word-radical feature vector and the enhanced word-part-of-speech feature vector are used to generate the emotion recognition result. Referring to FIG. 6, the specific process is as follows:
S5.1: a max-pooling operation is applied to the enhanced character-radical feature output vector V̂_rc, the enhanced word-radical feature vector V̂_rw and the enhanced word-part-of-speech feature vector V̂_posw, and feature fusion is performed by vector concatenation to obtain the fused feature vector V_y;
S5.2: the fused feature vector V_y is input into a fully connected neural network and normalized with the softmax function to obtain the probability output P;
S5.3: the class with the maximum probability is selected as the emotion recognition result y. The calculation formulas are:
V_y = Concat(max(V̂_rc), max(V̂_rw), max(V̂_posw))
P = softmax(W V_y + b)
y = argmax(P)
where Concat(·) denotes the concatenation function, max(·) the max-pooling operation, W the weight matrix, b the bias, softmax(·) the normalization function, and argmax(·) the function returning the index of the maximum probability.
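The classification head of step S5 can be sketched in a few lines of numpy. The function name and the shapes of W and b are illustrative assumptions; the operations (max-pooling per feature matrix, concatenation, fully connected layer, softmax, argmax) follow the formulas above.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def classify(V_rc, V_rw, V_posw, W, b):
    # max-pooling over the sequence dimension of each enhanced feature matrix,
    # then feature fusion by concatenation
    V_y = np.concatenate([m.max(axis=0) for m in (V_rc, V_rw, V_posw)])
    P = softmax(W @ V_y + b)       # fully connected layer + softmax
    return int(np.argmax(P)), P    # recognition result y and probabilities P
```

With W of shape (num_classes, d_rc + d_rw + d_posw), the argmax over P gives the predicted emotion label.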
To verify the effectiveness of the method of the invention, experimental analysis was performed.
The experimental environment and configuration are shown in table 1:
TABLE 1 Experimental Environment and configuration
The datasets used in the experiments are the NLPECC dataset and a self-built Douban movie dataset. The NLPECC dataset contains 44,875 samples in total, with six emotion labels: like, sad, disgust, angry, happy, and other. In the experiments, the NLPECC dataset is divided into training, validation and test sets at a ratio of 6:2:2. The self-built Douban movie dataset consists of review data crawled from Douban Movies for 5 films, namely Da Sheng Gui Lai, Charlotte Vexation, American Captain 3, Hourly Space 3, and July and Ansheng, with a data size of about 300,000 reviews. After preprocessing the data — data cleaning, converting full-width characters to half-width, punctuation normalization, converting uppercase English letters to lowercase, traditional-to-simplified conversion, deduplication, and so on — about 250,000 review records remain. Table 2 gives the statistics of the preprocessed data.
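Two of the preprocessing steps mentioned above can be sketched concretely: full-width-to-half-width conversion exploits the fixed 0xFEE0 code-point offset between the two ASCII variants, and deduplication is a simple seen-set filter. The function names are illustrative; traditional-to-simplified conversion is omitted since it needs an external dictionary (e.g. an OpenCC-style tool).

```python
def to_halfwidth(s: str) -> str:
    # Full-width ASCII variants occupy U+FF01..U+FF5E, offset 0xFEE0 from
    # their half-width counterparts; U+3000 is the full-width space.
    out = []
    for ch in s:
        code = ord(ch)
        if code == 0x3000:
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)

def preprocess(comments):
    seen, cleaned = set(), []
    for c in comments:
        c = to_halfwidth(c).lower().strip()  # half-width + lowercase English
        if c and c not in seen:              # drop empties and duplicates
            seen.add(c)
            cleaned.append(c)
    return cleaned
```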
TABLE 2 Pre-processed dataset information
The 1-star and 2-star reviews are labeled as negative, the 3-star reviews as neutral, and the 4-star and 5-star reviews as positive; the neutral reviews are discarded because their sentiment-label error rate is too high to meet the experimental requirements. This leaves about 67,000 negative reviews and about 140,000 positive reviews, an unbalanced distribution. To address the data imbalance, 50,000 reviews are randomly sampled from each of the positive and negative reviews, constructing an experimental dataset of 100,000 movie reviews in total. In the experiments, the dataset is divided into training, validation and test sets at a ratio of 6:2:2, i.e. 60,000 training samples (30,000 positive and 30,000 negative), 20,000 validation samples (10,000 of each polarity) and 20,000 test samples (10,000 of each polarity).
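The balanced sampling and 6:2:2 split described above can be sketched as follows; the function name and seed are illustrative assumptions.

```python
import random

def balanced_split(pos, neg, n_per_class, seed=42):
    """Randomly sample n_per_class reviews from each polarity, shuffle,
    and split 6:2:2 into train / validation / test sets."""
    rng = random.Random(seed)
    data = rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)
    rng.shuffle(data)
    n = len(data)
    n_train, n_val = int(n * 0.6), int(n * 0.2)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])
```

With n_per_class = 50,000 this yields exactly the 60,000 / 20,000 / 20,000 partition used in the experiments.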
The hyper-parameter settings of the proposed model used in the experiments are shown in Table 3:
TABLE 3 Superparameter settings
The experiments measure emotion recognition performance using Precision (P), Recall (R) and the harmonic mean F1-score (F1). The calculation formulas are:
P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2 × P × R / (P + R)
where true positives (TP) are positive samples correctly classified as positive, false positives (FP) are negative samples incorrectly classified as positive, and false negatives (FN) are positive samples incorrectly classified as negative. P is the proportion of samples predicted positive by the model that are actually positive, and R is the proportion of actual positive samples that the model predicts as positive.
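The three metrics above reduce to a few lines; the function name is illustrative.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    p = tp / (tp + fp)           # precision
    r = tp / (tp + fn)           # recall
    f1 = 2 * p * r / (p + r)     # harmonic mean of P and R
    return p, r, f1
```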
The knowledge and data driven multi-granularity Chinese text sentiment analysis method (KEAMM) proposed by the invention is compared with the following existing Chinese text sentiment analysis models:
(1) BiGRU. The model extracts features from word vectors with a bidirectional gated recurrent unit to perform text sentiment analysis.
(2) DCCNN. The model performs convolution over two channels, one taking character vectors and the other word vectors, and fuses the features of the two channels for sentiment analysis.
(3) FG_MCCNN. The model performs sentiment analysis through multiple CNN channels, comprehensively using character vectors, word vectors, and vectors fusing word vectors with part of speech.
(4) BERT-BiLSTM. The model constructs word vectors with the BERT model, then performs feature extraction with BiLSTM to realize emotion recognition.
(5) BERT-MCNN. The model is an emotion analysis model fusing a BERT model and a multilayer collaborative convolutional neural network.
The method of the present invention takes characters, words, radicals, parts of speech and other features as input. To verify the effectiveness of each module, the following ablation experiments were performed after adjusting the model structure:
(1) Two BiGRUs. The character text and the word text are modeled by two BiGRU models respectively, and their output vectors are concatenated for emotion recognition.
(2) Four BiGRUs. The character text, character-level radical text, word text and word-level radical text are modeled by four BiGRUs respectively, and the output vectors of the four channels are concatenated for emotion recognition.
(3) Five BiGRUs. The model abandons the attention-based feature fusion of characters, words, radicals and parts of speech used in KEAMM; instead, features are extracted from the character text, character-level radical text, word text, word-level radical text and part-of-speech text by five BiGRUs respectively, and the output feature vectors of the five channels are concatenated for emotion recognition.
(4) Attention-based Multi-granularity Model (AMM). The knowledge-introduction part of KEAMM is discarded, i.e. the feature fusion layer is removed; the feature vectors output by the feature extraction layer are concatenated for emotion recognition.
(5) KEAMM. The model proposed by the invention.
Table 4 gives the experimental results of the above respective models on the experimental data set.
TABLE 4 results of the experiment
A comprehensive analysis of the comparison and ablation experiment results in Table 4 on the Douban movie dataset and the NLPECC dataset leads to the following conclusions.
The BiGRU model uses only word features, with F1 values of 85.63% and 77.70%; the Two BiGRUs model is a dual-channel model using character features and word features simultaneously, with F1 values of 86.58% and 79.35%, improvements of 0.95% and 1.65% respectively over BiGRU. This shows that using characters and words together combines the advantages of both granularities and extracts richer semantic features to assist sentiment analysis.
Comparing the DCCNN model with the FG_MCCNN model: DCCNN is a dual-channel CNN using character features and word features simultaneously, with F1 values of 86.53% and 79.32%; FG_MCCNN is a three-channel CNN that additionally uses part-of-speech features, with F1 values of 86.92% and 80.47%, improvements of 0.39% and 1.15% respectively. This indicates that part-of-speech features bring a gain in emotion recognition performance. The conclusion is also confirmed in the ablation experiments, where the Five BiGRUs model improves the F1 values of the Four BiGRUs model by 0.34% and 0.43% respectively.
The F1 values of the DCCNN model are 86.53% and 79.32%, while those of the Two BiGRUs model are 86.58% and 79.35%, improvements of 0.05% and 0.03% respectively. Both are dual-channel models using character features and word features simultaneously; the difference is that the former uses CNN as the base model and the latter uses BiGRU. The slightly higher F1 values of Two BiGRUs suggest that the convolution in the CNN model can lose textual semantic features, whereas the BiGRU model can learn long-term contextual dependencies and capture richer semantic features, leading to better results.
The above analysis shows that the CNN models perform worse than the BiGRU models in this experiment. In Table 4, the F1 values of Four BiGRUs are 87.07% and 80.80%, improvements of 0.49% and 1.45% over Two BiGRUs and of 0.53% and 1.48% over DCCNN; the F1 values of Five BiGRUs are 87.41% and 81.23%, improvements of 0.49% and 0.76% over FG_MCCNN. This shows that radical features play a role in Chinese sentiment analysis, and further confirms that fusing multiple effective features can improve emotion recognition performance.
The F1 values of the AMM model are 88.24% and 82.20%, improvements of 0.83% and 0.97% over Five BiGRUs. Comparing the two models shows that Five BiGRUs fuses features of different granularities only by simple concatenation, with no information interaction between character, word, radical and part-of-speech features during feature extraction, while AMM extracts character and word features with BiGRU models and lets them interact and fuse through a dot-product attention mechanism. The fused feature vectors can thus perceive emotional tendency from the deep semantic features of characters, words, radicals and parts of speech, further improving emotion recognition performance.
The F1 values of BERT-BiLSTM are 88.25% and 82.99%, and those of BERT-MCNN are 88.68% and 83.14%. Both are higher than the various baseline and ablation models represented by AMM, and their recognition performance is lower only than that of the proposed KEAMM model. This is mainly because the BERT model can represent text vectors dynamically, and its semantic representation capability can be fine-tuned for the sentiment analysis task, helping the model learn domain knowledge and generate richer semantic features, thereby improving performance.
Finally, the F1 values of the proposed method reach 89.23% and 84.84%, exceeding all comparison models and demonstrating its effectiveness and superiority. Compared with the AMM model in the ablation experiments, the F1 values improve by 0.99% and 2.64%; compared with the BERT-BiLSTM model in the comparison experiments, by 0.98% and 1.85%; and compared with the BERT-MCNN model, by 0.55% and 1.7%. AMM, BERT-BiLSTM and BERT-MCNN introduce no emotion knowledge, whereas KEAMM injects emotion knowledge vectors into the AMM model through a multi-head attention mechanism, so its performance exceeds BERT-MCNN and the other models. This shows that in Chinese sentiment analysis, introducing explicit emotion knowledge can guide the deep model to perform more accurate analysis and further improve performance.
Based on the particularity of Chinese text and the practical requirements of sentiment analysis, the proposed knowledge and data driven multi-granularity Chinese text sentiment analysis method introduces the "knowledge + data" research paradigm into the field of Chinese sentiment analysis: it deeply fuses emotion knowledge vectors with the feature vectors obtained from the BiGRU model and the attention mechanism, yielding a Chinese text sentiment analysis method integrating multi-granularity features such as characters, words, radicals and parts of speech. The results of the comparison and ablation experiments show that the F1 values of the method are significantly improved over the other models.

Claims (6)

1. A knowledge and data driven multi-granularity Chinese text sentiment analysis method is characterized by comprising the following steps:
S1: preprocessing a Chinese text to form 5 types of input data: character-level text, character-level radical text, word-level text, word-level radical text and part-of-speech text;
S2: converting the input data, by a word embedding method, into a set of 5 types of vectors: character vectors, character-level radical vectors, word vectors, word-level radical vectors and part-of-speech vectors;
S3: using a BiGRU model and a dot-product attention mechanism, performing feature fusion on the character vectors and the character-level radical vectors to obtain character-radical feature vectors, on the word vectors and the word-level radical vectors to obtain word-radical feature vectors, and on the word vectors and the part-of-speech vectors to obtain word-part-of-speech feature vectors;
S4: constructing an emotion knowledge graph from an emotion vocabulary ontology library, and performing distributed vector representation on the triples in the emotion knowledge graph to obtain emotion knowledge vectors; fusing the character-radical feature vectors, the word-radical feature vectors and the word-part-of-speech feature vectors with the emotion knowledge vectors respectively through a multi-head attention mechanism to obtain a knowledge-enhanced character-radical feature output vector, a knowledge-enhanced word-radical feature vector and a knowledge-enhanced word-part-of-speech feature vector;
S5: generating the emotion recognition result from the enhanced character-radical feature output vector, the enhanced word-radical feature vector and the enhanced word-part-of-speech feature vector.
2. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 1, wherein: the step S1 specifically includes the following processes:
S1.1: for an input text T consisting of m characters, its character-level text is T_c = {c_1, c_2, ..., c_m}, where each element represents a character in T; the input text T is segmented into n words with a word segmentation tool to obtain the word-level text T_w = {w_1, w_2, ..., w_n}, where each element represents a word in T;
S1.2: according to the radical mapping relations of the Xinhua Dictionary, the character-level text T_c and the word-level text T_w are processed to obtain the character-level radical text T_rc = {rc_1, rc_2, ..., rc_m} and the word-level radical text T_rw = {rw_1, rw_2, ..., rw_n}, where each element of T_rc represents a character-level radical and each element of T_rw represents a word-level radical; the word-level text T_w is converted into the part-of-speech text T_pos = {pos_1, pos_2, ..., pos_n} with the jieba part-of-speech tagging tool, where each element represents the part of speech of the corresponding word, thereby obtaining the input data {T_c, T_rc, T_w, T_rw, T_pos}.
3. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 2, wherein: in step S2, the input data are converted by the word embedding method into the vector set {E_c, E_rc, E_w, E_rw, E_pos}, in which: E_c = {e_c^1, e_c^2, ..., e_c^m} denotes the character vector set, where each element represents a character vector; E_rc = {e_rc^1, e_rc^2, ..., e_rc^m} denotes the character-level radical vector set, where each element represents a character-level radical vector; E_w = {e_w^1, e_w^2, ..., e_w^n} denotes the word vector set, where each element represents a word vector; E_rw = {e_rw^1, e_rw^2, ..., e_rw^n} denotes the word-level radical vector set, where each element represents a word-level radical vector; and E_pos = {e_pos^1, e_pos^2, ..., e_pos^n} denotes the part-of-speech vector set, where each element represents a part-of-speech vector.
4. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 3, wherein: in step S3, the BiGRU model and the dot-product attention mechanism are used to perform feature fusion on the character vectors and the character-level radical vectors to obtain the character-radical feature vectors, which specifically includes the following steps:
S3.1: the initial states of the BiGRU_c model are set to 0, and the character vector set E_c is input into the BiGRU_c model to obtain the character feature vector set y_c, where each element represents a character feature vector; the calculation formula is:
y_c = BiGRU_c(E_c)
S3.2: through the dot-product attention mechanism, y_c and the character-level radical vector set E_rc are feature-fused to obtain the fused vector set Ê_rc; the calculation formulas are:
α_i = softmax(e_rc^i · (y_c^i)^T)
ê_rc^i = α_i · y_c^i
where α_i denotes the weight matrix obtained from the dot product of e_rc^i, the i-th element of the character-level radical vector set E_rc, and y_c^i, the i-th element of the character feature vector set y_c; · denotes the dot-product operation, T the matrix transposition, and softmax(·) the softmax normalization function;
S3.3: Ê_rc is used as the input vector of the BiGRU_rc model, and the hidden-layer state of the BiGRU_c model at the last moment is passed to the BiGRU_rc model as its initial state, thereby obtaining the character-radical feature vector set V_rc, where each element represents a character-radical feature vector; the calculation formula is:
V_rc = BiGRU_rc(Ê_rc)
5. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 4, wherein: the knowledge-enhanced character-radical feature output vector of step S4 is obtained specifically through the following process:
S4.1: constructing the emotion knowledge graph by taking the emotion words in the emotion vocabulary ontology library as head entities h, the emotion categories as tail entities t, and the emotion intensity of the emotion words as the relation r;
S4.2: performing distributed vector representation on the triples in the emotion knowledge graph through the TransE model to obtain the emotion knowledge vector K_rc;
S4.3: using the multi-head attention mechanism, taking the character-radical feature vector V_rc as the Query vector and the emotion knowledge vector K_rc as the corresponding Key vector and Value vector, and performing feature fusion to obtain the knowledge-enhanced feature output vector V̂_rc; the calculation formulas are:
K_rc = TransE(h, r, t)
V̂_rc = MultiHead(V_rc, K_rc, K_rc)
where TransE(·) denotes the TransE model and MultiHead(·) is the multi-head attention mechanism.
6. The knowledge and data driven multi-granularity Chinese text sentiment analysis method of claim 5, wherein: step S5 specifically includes the following processes:
S5.1: applying a max-pooling operation to the enhanced character-radical feature output vector V̂_rc, the enhanced word-radical feature vector V̂_rw and the enhanced word-part-of-speech feature vector V̂_posw, and performing feature fusion by vector concatenation to obtain the fused feature vector V_y;
S5.2: inputting the fused feature vector V_y into a fully connected neural network and performing normalization with the softmax function to obtain the probability output P;
S5.3: selecting the class with the maximum probability as the emotion recognition result y.
CN202210830349.2A 2022-07-15 2022-07-15 Knowledge and data driven multi-granularity Chinese text sentiment analysis method Withdrawn CN115409028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210830349.2A CN115409028A (en) 2022-07-15 2022-07-15 Knowledge and data driven multi-granularity Chinese text sentiment analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210830349.2A CN115409028A (en) 2022-07-15 2022-07-15 Knowledge and data driven multi-granularity Chinese text sentiment analysis method

Publications (1)

Publication Number Publication Date
CN115409028A true CN115409028A (en) 2022-11-29

Family

ID=84158335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210830349.2A Withdrawn CN115409028A (en) 2022-07-15 2022-07-15 Knowledge and data driven multi-granularity Chinese text sentiment analysis method

Country Status (1)

Country Link
CN (1) CN115409028A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117786120A (en) * 2024-02-28 2024-03-29 山东省计算中心(国家超级计算济南中心) Text emotion classification method and system based on hierarchical attention mechanism
CN117786120B (en) * 2024-02-28 2024-05-24 山东省计算中心(国家超级计算济南中心) Text emotion classification method and system based on hierarchical attention mechanism


Similar Documents

Publication Publication Date Title
Zhang et al. Multi-scale attention with dense encoder for handwritten mathematical expression recognition
CN107943784B (en) Relationship extraction method based on generation of countermeasure network
CN110598005B (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
Li et al. Improving attention-based handwritten mathematical expression recognition with scale augmentation and drop attention
CN109766277B (en) Software fault diagnosis method based on transfer learning and DNN
Zhang et al. Radical analysis network for learning hierarchies of Chinese characters
CN111144448A (en) Video barrage emotion analysis method based on multi-scale attention convolutional coding network
CN106980608A (en) A kind of Chinese electronic health record participle and name entity recognition method and system
CN110647612A (en) Visual conversation generation method based on double-visual attention network
CN110489750A (en) Burmese participle and part-of-speech tagging method and device based on two-way LSTM-CRF
CN113298151A (en) Remote sensing image semantic description method based on multi-level feature fusion
CN111259153B (en) Attribute-level emotion analysis method of complete attention mechanism
Li et al. Text-to-text generative adversarial networks
CN110909736A (en) Image description method based on long-short term memory model and target detection algorithm
Cheng et al. Sentiment analysis using multi-head attention capsules with multi-channel CNN and bidirectional GRU
CN111428481A (en) Entity relation extraction method based on deep learning
CN115630156A (en) Mongolian emotion analysis method and system fusing Prompt and SRU
CN113486645A (en) Text similarity detection method based on deep learning
Wu et al. TDv2: a novel tree-structured decoder for offline mathematical expression recognition
CN116029305A (en) Chinese attribute-level emotion analysis method, system, equipment and medium based on multitask learning
Yu et al. Cross-Domain Slot Filling as Machine Reading Comprehension.
CN115409028A (en) Knowledge and data driven multi-granularity Chinese text sentiment analysis method
CN115169429A (en) Lightweight aspect-level text emotion analysis method
CN112364654A (en) Education-field-oriented entity and relation combined extraction method
CN115062109A (en) Entity-to-attention mechanism-based entity relationship joint extraction method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221129

WW01 Invention patent application withdrawn after publication