CN112926336A - Microblog case aspect-level viewpoint identification method based on text comment interactive attention - Google Patents
Classifications
- G06F40/30 — Semantic analysis
- G06F16/35 — Information retrieval of unstructured textual data; Clustering; Classification
- G06F16/951 — Indexing; Web crawling techniques
- G06F18/22 — Pattern recognition; Matching criteria, e.g. proximity measures
- G06F40/126 — Character encoding
- G06F40/216 — Parsing using statistical methods
- G06F40/242 — Dictionaries
- G06F40/289 — Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention relates to a microblog case aspect-level viewpoint identification method based on text-comment interactive attention, and belongs to the technical field of natural language processing. Texts and related comments of several hot cases are crawled from a microblog platform to form a data set, which is then word-segmented to construct a dictionary. The method first encodes the text and the comments separately, then fuses the text information with the comment information through an interactive attention mechanism, and finally recognizes case-aspect viewpoints in the comments from the fused features. The method effectively improves the accuracy of case aspect-level viewpoint identification and addresses the poor performance of existing approaches on microblog data.
Description
Technical Field
The invention relates to a microblog case aspect-level viewpoint identification method based on text comment interactive attention, and belongs to the technical field of natural language processing.
Background
In recent years, netizens have paid increasing attention to legal cases, microblog media report on such cases more frequently, and many netizens comment on them. The texts of such news items also contain viewpoints about different objects; facing this huge volume of data, grasping public-opinion trends by manually reading large numbers of comments is impractical, and the public and judicial authorities care most about netizen viewpoints on particular aspects of a case. Research on case-related aspect-level viewpoint identification for microblog data therefore has real significance for quickly grasping the Internet situation. However, the form and expression of microblog data are flexible and changeable, so judging aspect-level viewpoints with traditional natural language processing methods is difficult. In fact, a microblog text is a statement of case facts that describes various aspects of the case, and microblog comments mostly develop discussions around that text, so combining the text information can assist the understanding of case-related comments.
Disclosure of Invention
The invention provides a microblog case aspect-level viewpoint identification method based on interactive attention over text and comments. It fuses text information with comment information to enrich and strengthen the semantic representation of comments, recognizes case-aspect viewpoints in comment text from the fused features, and addresses the poor aspect-level viewpoint identification performance caused by insufficient or ambiguous comment information.
The technical scheme of the invention is as follows: a microblog case aspect-level viewpoint identification method based on interactive attention over text and comments. Texts and comments of several hot cases are crawled from a microblog platform, a microblog case corpus is constructed, the corpus is preprocessed (word segmentation and the like), and a dictionary is built. For the problem of aspect-level viewpoint recognition in case-related microblog documents, a case aspect-level viewpoint recognition method based on text-comment interactive attention is proposed, which achieves aspect-level viewpoint recognition by fusing the contextual information of social media.
The method comprises the following steps:
step1, acquiring relevant texts and comments of the hot cases from the microblog, constructing a microblog case corpus, and then segmenting the corpus to construct a dictionary;
step2, encoding the microblog texts and comments by using a multi-head attention mechanism;
step3, fusing the information of the texts and the comments through an interactive attention mechanism;
step4, identifying aspect level views of microblog cases using a multi-label classifier.
As a further scheme of the present invention, the Step1 specifically comprises the following steps:
step1.1, firstly, crawling microblog case linguistic data from the Internet by utilizing a web crawler program;
step1.2, filtering and denoising the crawled microblog case linguistic data to construct a microblog case data set;
Step1.3, extracting the comments related to the microblog cases from the Step1.2 data set, matching each comment to the text it belongs to, marking the corresponding labels, forming the microblog case corpus through manual processing, segmenting the comments with a Chinese word-segmentation tool, and constructing a dictionary.
As a further scheme of the invention, the step Step1.3 comprises the following specific steps:
step1.3.1, extracting the comments related to the microblog cases from the step1.2 data set, and corresponding to the texts;
step1.3.2, marking each comment with a corresponding label;
step1.3.3, segmenting the comments with a Chinese word-segmentation tool, feeding the comments in batches until all comments have been processed;
step1.3.4, constructing a dictionary from the words obtained in Step1.3.3: first create an empty dictionary, then process each word in turn, adding the word to the dictionary if it is not yet contained and skipping to the next word if it is, until all words are processed.
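The dictionary construction of Step1.3.3-Step1.3.4 can be sketched as follows. `build_vocab` and the sample tokens are illustrative names, not from the patent; a real pipeline would take tokens from a Chinese word-segmentation tool:

```python
def build_vocab(segmented_comments):
    """Build a word -> index dictionary from pre-segmented comments.

    Minimal sketch of Step1.3.4: start from an empty dictionary, add each
    word only if it is not yet contained, otherwise skip to the next word.
    """
    vocab = {}
    for comment in segmented_comments:
        for word in comment:
            if word not in vocab:          # add only unseen words
                vocab[word] = len(vocab)   # next free index
    return vocab

# hypothetical pre-segmented comments (tokens would come from a segmenter)
vocab = build_vocab([["被告", "有罪"], ["被告", "无罪"]])
```

Processing the comments in batches, as Step1.3.3 describes, only changes how `segmented_comments` is fed in; the membership check keeps the dictionary free of duplicates either way.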
As a further scheme of the present invention, the Step2 specifically comprises the following steps:
step2.1, using the text and comments of the microblog case as input at two ends of the model, adopting the same coding mode for the text and the comments, and expressing each sentence into an embedded matrix about the sentence;
the text and the comments of the microblog case are used as the two inputs of the encoding end; assume a sentence contains n words, so that the sentence X is represented as:
X = (x_1, x_2, ..., x_n)
after the sentence is embedded, the word-embedding sequence is:
E = (w_1, w_2, ..., w_n)
E represents the sentence as a two-dimensional embedding matrix in which all word embeddings of the sentence are concatenated; its size is n × d, where n is the number of words and d is the embedding dimension, and each element of the sequence E is independent;
step2.2, coding the microblog text and the comments by adopting a multi-head attention mechanism, so that each word and all the words in the sentence are concerned by calculation, and the representation of the microblog case text and the comments is obtained;
reading each text sequence with a multi-head attention mechanism and computing the attention of every word over all words: the two-dimensional embedding matrix E is linearly transformed into the query, key and value matrices Q, K and V, which are fed into a multi-head attention mechanism with 8 heads; the outputs of all heads are finally concatenated and mapped by a linear conversion layer into a single output of the same size as one head, according to the formula:
A = Linear(Multihead(Q, K, V))
the matrix A is the representation of the text sequence obtained by multi-head attention encoding; Q, K and V are the query, key and value matrices obtained from E;
and encoding the text and the comments by adopting a multi-head attention mechanism, so that each word in the sentence and all the words are concerned by calculation, and the representation of the text and the comments of the microblog case is obtained.
The multi-head attention mechanism differs in that the attention computation is performed multiple times, allowing the model to learn relevant information in different representation subspaces. Every word in the sentence attends to all words, so a dependency can be computed directly regardless of the distance between words; this learns the internal structure of a sentence without depending on the computation of the previous time step and therefore parallelizes well. Macroscopically, multi-head attention can be understood as a query against a series of key-value pair mappings. Each word is embedded to obtain the sentence embedding matrix, and linear transformations of this matrix give the corresponding query (Q), key (K) and value (V). Q, K and V are each linearly transformed and fed to scaled dot-product attention h times, the so-called multiple heads; parameters are not shared between heads, so the linear transformations applied to Q, K and V differ per head. The h scaled dot-product attention results are then concatenated and passed through one more linear transformation to give the multi-head attention result, as shown in the following formula:
MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h) W^O
wherein:
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
In this work, h = 8 parallel scaled dot-product attention heads are used, with d_k = d_v = d_model / h for each head.
The input consists of queries and keys of dimension d_k and values of dimension d_v. The dot products of Q with all keys are computed, each divided by sqrt(d_k), and a softmax function is applied to obtain the weights on the values, as shown in the equation:
Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
Here sqrt(d_k), the square root of the key-vector dimension, is a scaling factor that keeps the inner products from growing too large.
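The multi-head encoding of Step2.2 can be sketched with NumPy. The random weight matrices below are stand-ins for the learned projections, so only shapes and the head-splitting logic are meaningful:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(E, h=8, seed=0):
    """Sketch of Step2.2: encode a sentence matrix E (n x d) with h-head
    self-attention. d must be divisible by h, with d_k = d_v = d / h.
    The weight matrices are random stand-ins for learned parameters."""
    n, d = E.shape
    d_k = d // h
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) * 0.1 for _ in range(4))
    Q, K, V = E @ Wq, E @ Wk, E @ Wv               # linear transformations of E
    heads = []
    for i in range(h):                              # one scaled dot-product per head
        s = slice(i * d_k, (i + 1) * d_k)           # this head's subspace
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k) # every word attends to all words
        heads.append(softmax(scores) @ V[:, s])
    return np.concatenate(heads, axis=1) @ Wo       # concat heads, final linear layer

E = np.random.default_rng(1).standard_normal((5, 64))  # 5 words, d = 64
A = multi_head_attention(E)
```

Because no step depends on the previous time step, all rows of `scores` can be computed at once, which is the parallelism the passage above refers to.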
As a further scheme of the present invention, the Step3 specifically comprises the following steps:
step3.1, firstly, calculating the similarity between the text matrix and the comment matrix;
the representations of the text and the comments obtained by multi-head attention encoding are denoted by the matrices L and K respectively, and attention is computed in two directions: from comment to text and from text to comment, both derived from a shared similarity matrix between the context-embedded representations of text and comment; the similarity matrix is calculated as follows:
S_tj = α(K_:t, L_:j)
wherein S_tj is the similarity between the t-th comment word and the j-th text word, α is a trainable scalar function measuring the similarity of two input vector representations of the encoded text and comment, K_:t is the t-th column vector of K, and L_:j is the j-th column vector of L;
step3.2, calculating the attention from the text to the comment, namely an attention coding module from the text to the comment, wherein the text to comment coding module shows which text words are most relevant to each comment word;
text-to-comment attention indicates which text words are most relevant to each comment word; let a_t denote the attention weights of the text words for the t-th comment word, with Σ_j a_tj = 1 for all t, obtained by a_t = softmax(S_t:); the attended text vector for each comment word is then:
L̃_:t = Σ_j a_tj L_:j
step3.3, calculating the attention of the comment to the text, namely an attention coding module for commenting to the text, wherein the attention coding module for commenting to the text indicates which comment words are most similar to a certain text word;
the attention of the comments to the text indicates which comment words are most similar to some text word; the attention weights on the comment words are obtained by p = softmax(max_col(S)), where the maximum function max_col(S) is taken across the columns of S; the attended comment vector is then:
k̃ = Σ_t p_t K_:t
which is a weighted sum of the comment words most relevant to the text; this vector k̃ is tiled across all time steps to give K̃;
step3.4, fusing text and comment information through an interactive attention mechanism, mutually focusing attention on the text and the comment, and fusing the text and the comment information to obtain a representation of the comment containing the text information;
finally, the comment word embeddings and the text-comment interactive attention vectors are spliced together; the resulting matrix is denoted G, as shown in the formula:
G_:t = β(K_:t, L̃_:t, K̃_:t)
each column vector of G can be viewed as a text-aware representation of one comment word, where β is an arbitrary trainable neural network that fuses its 3 input vectors.
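The two attention directions of Step3 can be sketched as follows. A plain dot product stands in for the trainable α, and concatenation stands in for the trainable β; both are assumptions, not the patent's learned functions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interactive_attention(K, L):
    """Sketch of Step3. K is the d x T comment representation and L the
    d x J text representation (columns are word vectors). A dot product
    stands in for alpha; concatenation stands in for beta."""
    S = K.T @ L                        # S[t, j]: comment word t vs text word j
    a = softmax(S, axis=1)             # text-to-comment weights, rows sum to 1
    L_att = L @ a.T                    # attended text vector per comment word
    p = softmax(S.max(axis=1))         # comment-to-text weights via column max
    k_att = K @ p                      # single vector over the comment words
    T = K.shape[1]
    K_att = np.tile(k_att[:, None], (1, T))          # tile across time steps
    return np.concatenate([K, L_att, K_att], axis=0) # beta as concatenation

rng = np.random.default_rng(0)
K = rng.standard_normal((16, 6))   # 6 comment words, d = 16
L = rng.standard_normal((16, 9))   # 9 text words
G = interactive_attention(K, L)    # one fused column per comment word
```

Each column of `G` stacks a comment word with the text it attends to and the globally relevant comment summary, matching the three inputs of β above.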
As a further scheme of the present invention, the Step4 specifically comprises the following steps:
step4.1, performing linear transformation on the information characteristics of the text and the comments fused in the step3 to obtain the probabilities of four classes in each sentence of comment;
step4.2, making a binary decision for each of the four classes by passing its score through a Sigmoid function;
step4.3, if the Sigmoid output for a class is greater than 0.4, the comment belongs to that class; otherwise it does not;
step4.4, map the comments to the corresponding tags, identifying the aspect level views of the case.
Through this process of mutual attention between text and comments, the matrix G obtained by fusing the text-comment interactive attention information is a comment representation containing the text information. This representation is passed through a linear transformation to 4 dimensions and a non-linear activation function, giving the 4-class vector F. Each class is then binarized through a Sigmoid function, which maps the 4 class scores into (0, 1) to give the predicted values P. The classification process is shown by the following two formulas:
F=tanh(Linear(G))
P=Sigmoid(F)
for the predicted values P, an output value greater than 0.4 is classified as positive, i.e. the comment belongs to the corresponding class; otherwise it does not. Each comment may belong to one or more classes, which realizes multi-label identification of microblog case aspect-level viewpoints.
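The classification head above can be sketched directly from the two formulas; `W` and `b` are hypothetical stand-ins for the learned linear layer:

```python
import numpy as np

def classify(g, W, b, threshold=0.4):
    """Sketch of Step4: F = tanh(Linear(G)), P = Sigmoid(F), then keep
    every class whose sigmoid output exceeds the 0.4 threshold.
    g is a fused comment vector; W and b stand in for learned weights."""
    F = np.tanh(g @ W + b)               # linear transformation to 4 classes
    P = 1.0 / (1.0 + np.exp(-F))         # map each class score into (0, 1)
    return (P > threshold).astype(int)   # independent multi-label decisions

g = np.array([1.0, -1.0])                # toy fused representation
W = np.array([[2.0, -2.0, 0.5, 0.0],     # hypothetical learned weights
              [0.0,  0.0, 0.0, 0.0]])
labels = classify(g, W, np.zeros(4))
```

Because each class is thresholded independently, any subset of the four classes can fire for one comment, which is what makes the classifier multi-label.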
The invention has the beneficial effects that: the method comprises the steps of firstly coding texts and comments respectively based on a Transformer framework, then realizing the fusion of text information and comment information based on an interactive attention mechanism, and realizing the recognition of comment text case aspect-level viewpoints based on the fused characteristics. By adopting the interactive attention and the microblog text information, the accuracy of case-aspect-level viewpoint identification can be remarkably improved.
The method solves the problem of poor microblog case aspect-level viewpoint identification performance, and has a practical effect on microblog case aspect-level viewpoint identification by fusing text and comment information through an interactive attention mechanism.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a general model diagram of the present invention;
FIG. 3 is a schematic diagram of a text comment interactive attention-coding network in accordance with the present invention.
Detailed Description
Example 1: as shown in fig. 1-3, a microblog case aspect-level viewpoint identification method based on interactive attention of text comments proceeds through Step1 to Step4 exactly as described above: corpus construction and dictionary building, multi-head attention encoding of texts and comments, interactive attention fusion, and multi-label classification.
The present invention uses Hamming Loss (HL) to estimate the accuracy of the model. HL counts the misclassified label slots, i.e. predicted labels that do not belong to a sample and labels that belong to a sample but are not predicted; the smaller the Hamming Loss, the better the performance. For a test set S = {(x_1, Y_1), (x_2, Y_2), ..., (x_n, Y_n)}, the loss is calculated as follows:
HL = (1 / (n · L)) Σ_{i=1}^{n} Σ_{j=1}^{L} XOR(Y_{i,j}, P_{i,j})
where n is the number of samples, L is the number of labels, Y_{i,j} is the true value of the j-th component of the i-th sample, and P_{i,j} is the predicted value of the j-th component of the i-th prediction. XOR is the exclusive-or: XOR(0,1) = XOR(1,0) = 1 and XOR(0,0) = XOR(1,1) = 0.
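The Hamming Loss definition above translates into a few lines; this is a minimal sketch with toy labels, not the patent's evaluation script:

```python
def hamming_loss(Y, P):
    """Fraction of the n x L label slots where prediction and truth
    disagree (the XOR in the formula above)."""
    n, L = len(Y), len(Y[0])
    wrong = sum(y != p                       # XOR on 0/1 labels
                for row_y, row_p in zip(Y, P)
                for y, p in zip(row_y, row_p))
    return wrong / (n * L)

# 2 samples x 4 labels; exactly two slots disagree
hl = hamming_loss([[1, 0, 1, 0], [0, 1, 0, 0]],
                  [[1, 0, 0, 0], [0, 1, 0, 1]])
```

With two of eight slots wrong, `hl` is 0.25; a perfect model scores 0.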
Precision (P), Recall (R) and the F1 value (F1-score, F1) are adopted as the evaluation indexes of the model.
Weighted-F1: the invention uses the weighted F1 value. P, R and F1 are calculated for each class and then averaged to give an F1 value over the entire sample, with the average weighted by the number of samples in each class.
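The Weighted-F1 computation can be sketched as follows; label-wise counts are an assumption about how the per-class statistics are gathered in a multi-label setting:

```python
def weighted_f1(Y, P):
    """Per-class F1 from label-wise counts, averaged with each class
    weighted by its support (number of true samples), as described above."""
    L = len(Y[0])
    f1s, supports = [], []
    for j in range(L):
        tp = sum(y[j] and p[j] for y, p in zip(Y, P))          # true positives
        fp = sum((not y[j]) and p[j] for y, p in zip(Y, P))    # false positives
        fn = sum(y[j] and (not p[j]) for y, p in zip(Y, P))    # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        supports.append(sum(y[j] for y in Y))                  # class weight
    total = sum(supports)
    return sum(f * s for f, s in zip(f1s, supports)) / total if total else 0.0

score = weighted_f1([[1, 0], [1, 1]], [[1, 0], [0, 1]])
```

Weighting by support keeps a rare class (like class2 in Table 2) from dominating the average the way a plain macro mean would.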
To clarify how much the text-comment interactive attention network contributes to the model, we removed it and compared the results with those of the full model, as shown in Table 1.
TABLE 1 ablation test results
| Number | Model structure | P | R | F1 |
|---|---|---|---|---|
| 1 | The method of the invention | 0.7349 | 0.7122 | 0.6603 |
| 2 | (-) Interactive attention network | 0.6708 | 0.6832 | 0.6080 |
The "(-) interactive attention network" row is the model with the text-comment interactive attention network removed. Table 1 shows that, compared with the multi-head attention mechanism alone, the interactive attention network fused with text information improves classification precision by 6.4 percentage points, recall by 2.9 points, and the F1 value by 5.2 points.
To evaluate the recognition effect on each class, we also calculated the precision, recall and F1 value of each class. The experimental results are shown in Table 2.
TABLE 2 Recognition effect of each class

| Numbering | Categories | P | R | F1 |
|---|---|---|---|---|
| 1 | class1 | 0.8053 | 0.9229 | 0.8601 |
| 2 | class2 | 0.8222 | 0.1588 | 0.2622 |
| 3 | class3 | 0.6450 | 0.7842 | 1.0000 |
| 4 | class4 | 0.6489 | 0.2837 | 0.3948 |
In Table 2, class1, class2, class3 and class4 represent four different categories. In our dataset, the number of comments containing class1 and class3 is larger, the number containing class2 is smaller, and the number containing class4 is about half. Analysis of Table 2 shows that the precision of class1 and class2 is higher, while the recall and F1 values of class1 and class3 are higher.
To further evaluate the effect of the text comment interactive attention-based microblog case aspect-level viewpoint identification model, experiments were performed on the data set provided by the invention using different baseline models and compared with the proposed method. The experimental results are shown in Table 3.
TABLE 3 model comparison experiment
| Modeling method | P | R | F1 |
|---|---|---|---|
| CNN | 0.6958 | 0.7014 | 0.6385 |
| CNN-RNN | 0.6831 | 0.6783 | 0.6206 |
| Transformer | 0.6669 | 0.6847 | 0.6096 |
| The method of the invention | 0.7349 | 0.7122 | 0.6603 |
The baseline models used for comparison do not fuse the text information; they classify the comments alone. Analysis of Table 3 shows that:
(1) Compared with CNN, the method of the invention improves precision by 3.9%, recall by 1.08% and the F1 value by 2.1%.
(2) Compared with CNN-RNN, the method improves precision by 5.1%, recall by 3.4% and the F1 value by 3.9%.
(3) Compared with the Transformer, the method improves precision by 6.8%, recall by 2.7% and the F1 value by 5%.
In conclusion, fusing the text and comment information through the interactive attention mechanism has a practical effect on microblog case aspect-level viewpoint identification.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (9)
1. A microblog case aspect-level viewpoint identification method based on text comment interactive attention is characterized by comprising the following steps of:
step1, constructing a microblog case corpus, and then segmenting words and constructing a dictionary for the corpus;
step2, encoding the microblog texts and comments by using a multi-head attention mechanism;
step3, fusing the information of the texts and the comments through an interactive attention mechanism;
step4, identifying aspect level views of microblog cases using a multi-label classifier.
2. The microblog case aspect-level viewpoint identification method based on text comment interactive attention as recited in claim 1, wherein the specific steps of Step1 are as follows:
step1.1, firstly, crawling microblog case linguistic data from the Internet by utilizing a web crawler program;
step1.2, filtering and denoising the crawled microblog case linguistic data to construct a microblog case data set;
and Step1.3, extracting the comments related to the microblog cases from the Step1.2 data set, matching them to the texts to which they belong, and marking corresponding labels to form the microblog case corpus through manual processing; then segmenting the comments with a Chinese word segmentation tool and constructing a dictionary.
3. The microblog case aspect-level viewpoint identification method based on text comment interactive attention as recited in claim 2, wherein the specific steps of Step1.3 are as follows:
step1.3.1, extracting the comments related to the microblog cases from the step1.2 data set, and corresponding to the texts;
step1.3.2, marking each comment with a corresponding label;
step1.3.3, segmenting the comments by using a Chinese segmentation tool, and inputting all the comments according to batches until all the comments are input;
step1.3.4, constructing a dictionary from the words obtained in Step1.3.3: first construct an empty dictionary, then input each word in turn; if the dictionary does not contain the word, add it, otherwise skip to the next word, until all words are processed.
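A minimal sketch of the Step1.3.4 dictionary construction; the reserved special tokens and the sample segmented comments are assumptions for illustration:

```python
def build_vocab(tokenized_comments, specials=("<pad>", "<unk>")):
    """Build a word-to-index dictionary: start empty, add each unseen
    word in order of first appearance, skip words already present."""
    vocab = {}
    for tok in specials:          # reserved ids, an assumed convention
        vocab[tok] = len(vocab)
    for comment in tokenized_comments:
        for word in comment:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

# Toy segmented comments (Chinese word segmentation done beforehand).
comments = [["案件", "判决", "公正"], ["判决", "太", "轻"]]
vocab = build_vocab(comments)
```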
4. The microblog case aspect-level viewpoint identification method based on text comment interactive attention as recited in claim 1, wherein the specific steps of Step2 are as follows:
step2.1, using the text and comments of the microblog case as input at two ends of the model, adopting the same coding mode for the text and the comments, and expressing each sentence into an embedded matrix about the sentence;
step2.2, coding the microblog texts and the comments by adopting a multi-head attention mechanism, so that each word and all the words in the sentence are concerned by calculation, and the representation of the microblog case texts and the comments is obtained.
5. The microblog case aspect-level viewpoint identification method based on text comment interactive attention as recited in claim 4, wherein in Step2.1, the text and the comments of the microblog case are used as the two inputs of the coding end; assuming a sentence containing n words, the sentence X is represented by the following formula:
X=(x1,x2,...,xn)
after the sentence is embedded, the expression formula of the word embedding sequence is as follows:
E=(w1,w2,...,wn)
E is the sentence represented as a two-dimensional embedding matrix that ties all the word embeddings of the sentence together, with dimensions n × d, where n is the number of words and d is the embedding dimension; at this point each element in E is still independent of the others.
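A toy sketch of this embedding step; the vocabulary, the dimension d = 8 and the random table (standing in for trained embeddings) are all assumptions:

```python
import numpy as np

# Stand-in vocabulary and embedding table.
vocab = {"<unk>": 0, "案件": 1, "判决": 2, "公正": 3}
d = 8
rng = np.random.default_rng(0)
table = rng.standard_normal((len(vocab), d))

def embed(sentence):
    """Map a tokenized sentence X = (x1, ..., xn) to the n x d matrix E."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in sentence]
    return table[ids]             # one embedding row per word

E = embed(["案件", "判决", "公正"])   # E has shape (3, 8): n = 3 words, d = 8
```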
6. The microblog case aspect-level viewpoint identification method based on text comment interactive attention as recited in claim 4, wherein in Step2.2, each text sequence is read using a multi-head attention mechanism and the attention between each word and all words is calculated; the two-dimensional embedding matrix E is converted into Q, K and V, which are linearly transformed and input into a multi-head attention mechanism with 8 heads; finally the output values of all the heads are spliced and converted by a linear conversion layer into a single output of the same size as one head, with the specific calculation formula as follows:
A=Linear(Multihead(Q,K,V))
the matrix A represents the representation of the text sequence obtained by multi-head attention encoding; Q, K and V are the query, key and value inputs obtained from E;
and encoding the text and the comments by adopting a multi-head attention mechanism, so that each word in the sentence and all the words are concerned by calculation, and the representation of the text and the comments of the microblog case is obtained.
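The encoding of claim 6 can be sketched with NumPy as follows; the random projection weights stand in for trained parameters, and only the 8-head count and the final linear layer follow the text:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(E, num_heads=8, seed=0):
    """Project E to Q, K, V; run scaled dot-product attention per head so
    each word attends to all words; splice the heads and apply a final
    linear layer: A = Linear(Multihead(Q, K, V))."""
    n, d = E.shape
    assert d % num_heads == 0
    dk = d // num_heads
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv, Wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    Q, K, V = E @ Wq, E @ Wk, E @ Wv
    heads = []
    for h in range(num_heads):
        s = slice(h * dk, (h + 1) * dk)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dk)   # each word vs all words
        heads.append(softmax(scores) @ V[:, s])
    return np.concatenate(heads, axis=1) @ Wo        # linear conversion layer

E = np.random.default_rng(1).standard_normal((5, 64))  # n = 5 words, d = 64
A = multi_head_self_attention(E)                       # A keeps shape (5, 64)
```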
7. The microblog case aspect-level viewpoint identification method based on text comment interactive attention as recited in claim 1, wherein the specific steps of Step3 are as follows:
step3.1, firstly, calculating the similarity between the text matrix and the comment matrix;
The representations of the text and the comments obtained through multi-head attention encoding are denoted by the matrices L and K respectively, and the attention is calculated in two directions: from comment to text and from text to comment. Both come from a shared similarity matrix between the context-embedded representations of the text and the comments, calculated as follows:

Stj = α(K:t, L:j)

where Stj represents the similarity between the t-th comment word and the j-th text word, α is a trainable scalar function over the two input vector representations of the encoded text and comments, K:t is the t-th column vector of K, and L:j is the j-th column vector of L;
step3.2, calculating the attention from the text to the comment, namely an attention coding module from the text to the comment, wherein the text to comment coding module shows which text words are most relevant to each comment word;
The text-to-comment attention indicates which text words are most relevant to each comment word. Let at denote the attention weights over the text words for the t-th comment word, with Σj atj = 1 for all t. The attention weights are calculated by at = softmax(St:), and each text vector participating in the attention is expressed as shown in the formula:

L̂:t = Σj atj L:j
step3.3, calculating the comment-to-text attention, namely the comment-to-text attention coding module, which indicates which comment words are most similar to a certain text word;
The comment-to-text attention obtains the attention weights over the comment words by p = softmax(maxcol(S)), where the maximum function maxcol(S) is taken across the columns of S. The comment vector participating in the attention is then expressed as shown in the formula:

k̂ = Σt pt K:t

k̂ represents the weighted sum of the comment words most relevant to the text; it is tiled over all positions to represent this comment vector;
step3.4, fusing text and comment information through an interactive attention mechanism, mutually focusing attention on the text and the comment, and fusing the text and the comment information to obtain a representation of the comment containing the text information;
and finally, the comment word embeddings and the text-comment interactive attention vectors are spliced, and the obtained matrix is represented by G, as shown in the formula:

G:t = β(K:t, L̂:t, k̂)

where L̂:t is the attended text vector for the t-th comment word and k̂ is the attended comment vector obtained above. Each column vector in G can be viewed as a text-aware representation of the corresponding comment word, and the β function is an arbitrary trainable neural network that fuses the 3 input vectors.
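A shape-level sketch of this interactive (BiDAF-style) attention, under simplifying assumptions: the similarity α is taken as a plain dot product, the fusion β as concatenation, and rows rather than columns index words:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interactive_attention(K, L):
    """K: comment encoding (T, d); L: text encoding (J, d).
    Returns G, a text-aware representation of each comment word."""
    S = K @ L.T                              # S[t, j]: comment word t vs text word j
    attended_text = softmax(S, axis=1) @ L   # which text words matter per comment word
    p = softmax(S.max(axis=1))               # max across columns, softmax over comment words
    c = p @ K                                # single attended comment vector
    C = np.tile(c, (K.shape[0], 1))          # tiled over all comment positions
    return np.concatenate([K, attended_text, C], axis=1)  # beta = concatenation

rng = np.random.default_rng(0)
K = rng.standard_normal((4, 16))             # 4 comment words
L = rng.standard_normal((6, 16))             # 6 text words
G = interactive_attention(K, L)              # shape (4, 48)
```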
8. The microblog case aspect-level viewpoint identification method based on text comment interactive attention as recited in claim 1, wherein the specific steps of Step4 are as follows:
step4.1, performing linear transformation on the information characteristics of the text and the comments fused in the step3 to obtain the probabilities of four classes in each sentence of comment;
step4.2, obtaining two classifications of each of the four classifications according to the probability of each classification through a Sigmoid function;
step4.3, if the classification result is greater than 0.4, the comment belongs to the class; otherwise it does not;
step4.4, map the comments to the corresponding tags, identifying the aspect level views of the case.
9. The microblog case aspect-level viewpoint identification method based on text comment interactive attention as recited in claim 1, wherein Step4 comprises the following steps:
through the process in which the text and comments attend to each other, the matrix G obtained by fusing the text-comment interactive attention information is a comment representation containing the text information; this representation is linearly transformed through 4 layers with a nonlinear activation function to obtain vectors of the 4 classes; then each class is binarily classified through a Sigmoid function, mapping the vectors of the 4 classes to (0,1) to obtain the predicted value P; the classification process is shown by the following two formulas:
F=tanh(Linear(G))
P=Sigmoid(F)
and for the predicted value P, when an output value is greater than 0.4 it is classified as positive, that is, the comment belongs to the corresponding class, and otherwise it does not; each comment belongs to one or more classes, thereby realizing identification of microblog case aspect-level viewpoints.
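The classification head of claim 9 reduces to a few lines; the weight shapes are illustrative stand-ins for trained parameters, while the tanh/sigmoid pair and the 0.4 threshold follow the text:

```python
import numpy as np

def predict_labels(g, W, b, threshold=0.4):
    """F = tanh(Linear(G)); P = Sigmoid(F); a class is assigned whenever
    its sigmoid output exceeds the threshold, so a comment may receive
    one or more of the 4 labels."""
    F = np.tanh(W @ g + b)
    P = 1.0 / (1.0 + np.exp(-F))
    return (P > threshold).astype(int), P

rng = np.random.default_rng(0)
g = rng.standard_normal(32)                   # fused comment representation
W, b = rng.standard_normal((4, 32)), np.zeros(4)
labels, probs = predict_labels(g, W, b)       # labels is a 0/1 vector of length 4
```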
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110163045.0A CN112926336A (en) | 2021-02-05 | 2021-02-05 | Microblog case aspect-level viewpoint identification method based on text comment interactive attention |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112926336A true CN112926336A (en) | 2021-06-08 |
Family
ID=76170853
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110163045.0A Pending CN112926336A (en) | 2021-02-05 | 2021-02-05 | Microblog case aspect-level viewpoint identification method based on text comment interactive attention |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112926336A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111581474A (en) * | 2020-04-02 | 2020-08-25 | 昆明理工大学 | Evaluation object extraction method of case-related microblog comments based on multi-head attention system |
CN111680154A (en) * | 2020-04-13 | 2020-09-18 | 华东师范大学 | Comment text attribute level emotion analysis method based on deep learning |
CN112287197A (en) * | 2020-09-23 | 2021-01-29 | 昆明理工大学 | Method for detecting sarcasm of case-related microblog comments described by dynamic memory cases |
Non-Patent Citations (1)
Title |
---|
顾健伟 et al.: "Machine Reading Comprehension Based on the Combination of Bi-Directional Attention Flow and Self-Attention", Journal of Nanjing University (Natural Science) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116541523A (en) * | 2023-04-28 | 2023-08-04 | 重庆邮电大学 | Legal judgment public opinion classification method based on big data |
CN116541523B (en) * | 2023-04-28 | 2024-08-16 | 芽米科技(广州)有限公司 | Legal judgment public opinion classification method based on big data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210608 |