CN114547299A - Short text sentiment classification method and device based on composite network model - Google Patents

Short text sentiment classification method and device based on composite network model

Info

Publication number
CN114547299A
CN114547299A
Authority
CN
China
Prior art keywords
convolution
vector
network
feature vector
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210150471.5A
Other languages
Chinese (zh)
Inventor
李校林
赵路伟
伍晓思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202210150471.5A priority Critical patent/CN114547299A/en
Publication of CN114547299A publication Critical patent/CN114547299A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of natural language processing, and particularly relates to a short text sentiment classification method and device based on a composite network model. The method comprises the steps of acquiring a text data set and processing it; inputting the processed text data set into a BERT model to obtain word vector representations; sending the word vector representations into a TextCNN network and a BiGRU network respectively to obtain a first feature vector and a second feature vector; and splicing the first feature vector and the second feature vector to obtain a spliced feature vector. The method combines the TextCNN and BiGRU networks in parallel, so that the model can fully combine the ability of the TextCNN to extract local features with the ability of the BiGRU to associate context information, makes up for the shortcomings of training with a single network, and effectively avoids negative interaction between the TextCNN and BiGRU networks.

Description

Short text sentiment classification method and device based on composite network model
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a short text emotion classification method and device based on a composite network model.
Background
The emotion analysis task, also called opinion mining, is a popular research task in the field of natural language processing. It integrates knowledge from disciplines such as computer science, signal processing and artificial intelligence with web data mining, and its main goal is to mine the emotional tendency expressed by commentators from given text content. Analyzing text emotion through deep learning is a research hotspot in natural language processing, and in recent years many researchers have applied different neural networks to text emotion analysis and achieved good results.
However, the conventional Chinese short text emotion analysis task generally adopts a static pre-training model to generate word vectors: each word is mapped to a single fixed word vector, so characteristics such as polysemy, grammar and semantics cannot be handled differently according to the surrounding text. Meanwhile, in the choice of neural network, the traditional text convolutional neural network (TextCNN) can extract local features of text data by exploiting position invariance, but it cannot represent the pooled word vectors in combination with the other word vectors of the context. An RNN, by contrast, can fully combine the word vectors of the context during learning, so that the output vector reflects the associated information in the sentence well; however, the traditional RNN suffers from vanishing and exploding gradients during training, and a unidirectional RNN cannot comprehensively combine the context on both sides when training on word vectors, so the training effect still needs improvement.
Disclosure of Invention
In order to solve the above problems, the invention provides a short text sentiment classification method and device based on a composite network model, which combine the TextCNN and BiGRU networks in parallel, so that the model can fully combine the ability of the TextCNN to extract local features with the ability of the BiGRU to associate context information, makes up for the shortcomings of training with a single network, and effectively avoids negative interaction between the TextCNN and BiGRU networks.
A short text sentiment classification method based on a composite network model constructs a BERT-CNN-BiGRU composite network model; the model comprises a BERT model, a TextCNN network, a BiGRU network and an improved self-attention module, and the method comprises the following steps:
s1, acquiring a Chinese short text data set and processing it, wherein the processing includes defining a BERT tokenizer, removing stop words, word segmentation, length padding, splitting the data set, and the like;
s2, inputting the processed Chinese short text data set into a BERT model to obtain word vector representation;
s3, respectively sending the word vector representations into a TextCNN network and a BiGRU network to obtain a first feature vector and a second feature vector;
s4, splicing the first feature vector and the second feature vector to obtain a spliced feature vector;
and S5, sending the spliced feature vectors into an improved self-attention module to obtain classified feature vectors, and obtaining a classification result through a linear layer with a softmax activation function.
Furthermore, the TextCNN network is a three-channel text convolution network, which includes a first convolution layer with a convolution kernel of 2, a second convolution layer with a convolution kernel of 3, a third convolution layer with a convolution kernel of 4, and a pooling layer, and the process of obtaining the first feature vector through the TextCNN network is as follows:
convolving the word vector representation with a first convolution layer, a second convolution layer and a third convolution layer respectively to obtain a first convolution result, a second convolution result and a third convolution result;
pooling the first convolution result, the second convolution result and the third convolution result through a pooling layer to obtain a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector;
and splicing the first low-dimensional vector, the second low-dimensional vector and the third low-dimensional vector to obtain a first feature vector.
Further, the calculation formula of the three convolutional layers is as follows:
C = σ(W ⊗ X + b);
wherein ⊗ denotes the convolution operation, C is the convolution result, W is the weight vector of the convolution kernel, X is the word vector matrix formed by the word vector representations, b is the bias vector, and σ is the sigmoid activation function.
Further, the pooling layer respectively obtains maximum eigenvalues of the first convolution result, the second convolution result and the third convolution result as a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector, and the formulas are expressed as:
M_i = max(c_i1, c_i2, ..., c_ij);
wherein M_i represents the maximum feature value obtained after pooling the first convolution result, the second convolution result or the third convolution result, and c_ij represents the value generated by the j-th convolution in the i-th convolution result: each sliding of the convolution window produces one value, the values produced by the repeated convolutions are spliced, and the spliced result is the convolution result output by the convolution layer. Max pooling takes the maximum value over the convolution results of the repeated convolution operations of one convolution layer, thereby achieving dimension reduction.
Further, the calculation formula of the improved self-attention module is as follows:
h' = softmax(X·X^T/√d_x + W)·X;
wherein h' is the classification feature vector, X is the spliced feature vector, 1/√d_x is the adjustment factor (d_x being the dimension of X), and W is a position weight parameter.
A short text sentiment classification device based on a composite network model comprises:
the data processing module is used for processing the acquired text data;
the BERT module is used for acquiring word vector representation of the text processed in the data processing module;
the text convolution module is used for performing convolution pooling on the word vector representation output by the BERT module to obtain a first feature vector;
the gating cycle module is used for acquiring the context information of the word vector representations output by the BERT module and outputting a second feature vector;
the splicing unit is used for splicing the first feature vector output by the text convolution module and the second feature vector output by the gating cycle module to obtain a spliced vector;
the improved self-attention module is used for screening key information in the spliced vector to obtain a classified feature vector;
and the classification module is used for acquiring the emotion classification result of the text data according to the classification feature vector.
The invention has the beneficial effects that:
the invention provides a short text sentiment classification method and a short text sentiment classification device based on a composite network model, wherein a BERT-CNN-BiGRU composite network model is constructed, TextCNN and BiGRU networks are combined in parallel in the model, the training efficiency is higher, and the classification time is shortened. Position information is introduced into the self-attention mechanism for improvement, the improved self-attention mechanism is utilized for weighting and summing, so that the characteristics obtained by the two mechanisms are fully combined, the training efficiency is improved, the attention weight parameters can reflect key information, and the text emotion classification accuracy is greatly improved.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of a BERT-CNN-BiGRU composite network model structure according to the present invention;
FIG. 3 is a diagram of a GRU network architecture;
fig. 4 is a diagram illustrating the structure of the TextCNN network according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention constructs a BERT-CNN-BiGRU composite network model based on a BERT model and a CNN-BiGRU composite neural network; the structure of the model is shown in figure 2. The BERT-CNN-BiGRU composite network model combines its network modules in a series-parallel manner: a short text data set is sent into the BERT model for training to obtain a sequence of word vector representations, and this sequence is then sent into a TextCNN network and a BiGRU network respectively for training. The TextCNN and BiGRU networks are parallel and their training processes are independent. The two networks are trained separately to obtain their corresponding feature vectors, which are spliced and then fused through an improved self-attention mechanism, so that the feature vector finally generated for classification fully combines the ability of the TextCNN network to extract and capture local feature information with the ability of the BiGRU network to combine context-associated semantic information. Classification is finally performed through a linear layer with a softmax activation function to obtain the final judgment result.
In the overall training process of the model, the TextCNN and BiGRU networks are combined in parallel. In a natural language processing task, the TextCNN network can extract and capture local feature information of an input text sequence by exploiting its position invariance, but operations such as convolution and pooling inevitably lose the information outside the moving window, so the output feature vector cannot be well combined with the full-text information for classification. An RNN such as the BiGRU network, when processing an input text sequence, can fully combine context-associated semantic information according to the time or input order, but it may also pick up a certain amount of interference information that affects the judgment, and it has no ability to capture local feature information. If the TextCNN and BiGRU networks were combined in series, their training results would interfere with each other: after local feature extraction by the TextCNN, the input text vector loses the information outside the sliding window and can no longer be effectively combined with context-associated semantic information when sent into the BiGRU network; conversely, if context-associated semantic information is first extracted and combined through the BiGRU network, the text vector structure is damaged when it is then sent into the TextCNN network, so local feature information cannot be extracted effectively. The invention therefore trains the TextCNN and BiGRU networks in a parallel combination, which combines the advantages of the two networks and effectively avoids negative interaction between them. The training efficiency and the training results are obviously improved.
In an embodiment, a short text sentiment classification method based on a composite network model, as shown in fig. 1, includes the following steps:
s1, acquiring a microblog text data set and processing the microblog text data set;
s2, inputting the processed microblog text data set into a BERT model to obtain word vector representation;
s3, respectively sending the word vector representations into a TextCNN network and a BiGRU network to obtain a first feature vector and a second feature vector;
s4, splicing the first feature vector and the second feature vector to obtain a spliced feature vector;
and S5, sending the spliced feature vectors into an improved self-attention module to obtain classified feature vectors, and obtaining a classification result through a linear layer with a softmax activation function.
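For illustration only, a minimal PyTorch-style sketch of how steps S2 to S5 can be wired together is given below. The hyper-parameters (number of filters, GRU hidden size), the "bert-base-chinese" pre-trained weights and the use of the final BiGRU hidden states as the second feature vector are assumptions rather than details fixed by the patent, and the improved self-attention of step S5 is simplified to a plain linear classifier here (the attention module itself is sketched separately in a later section).

import torch
import torch.nn as nn
from transformers import BertModel   # assumes the HuggingFace transformers package is available

class BertCnnBiGruClassifier(nn.Module):
    """Sketch of the BERT-CNN-BiGRU composite network (steps S2-S5)."""
    def __init__(self, num_classes=2, embed_dim=768, num_filters=100, gru_hidden=128):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")          # S2: word vector representation
        # three-channel TextCNN branch: kernels spanning 2, 3 and 4 words
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (2, 3, 4)])
        # BiGRU branch over the same word vectors
        self.bigru = nn.GRU(embed_dim, gru_hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(3 * num_filters + 2 * gru_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        x = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state  # (B, L, 768)
        # S3a: first feature vector - sigmoid convolution + max pooling per channel, then splice
        c = x.transpose(1, 2)                                                # (B, 768, L) for Conv1d
        f1 = torch.cat([torch.sigmoid(conv(c)).max(dim=2).values for conv in self.convs], dim=1)
        # S3b: second feature vector - final forward and backward BiGRU states
        _, h_n = self.bigru(x)                                               # h_n: (2, B, gru_hidden)
        f2 = torch.cat([h_n[0], h_n[1]], dim=1)
        spliced = torch.cat([f1, f2], dim=1)                                 # S4: spliced feature vector
        logits = self.classifier(spliced)                                    # S5 simplified (no attention here)
        return torch.softmax(logits, dim=-1)                                 # classification result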
The data set adopted in this embodiment is the open-source data set weibo_senti_100k, which is composed of text comment data from Sina Weibo microblogs; the comments originate from a large number of different types of microblogs and cover multiple types of microblog comments. Emoji expressions, which are often used in comments, have been replaced with the corresponding textual information. The emotion tags of the microblog comments in the data set fall into two categories [0, 1]: comments with negative emotion polarity are tagged 0 and comments with positive emotion polarity are tagged 1. There are 119,988 microblog comments in total, of which 59,993 are positive and 59,995 are negative.
Specifically, the microblog data set is characterized by short text sentences; although it contains a large number of emoji emoticons and special characters, each comment has a single emotion polarity and carries considerable interference information, so the data set is suitable for operating directly on whole sentences rather than for segmentation operations.
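As a hedged illustration of the preprocessing in step S1, the sketch below loads the microblog comments and tokenizes them with the BERT tokenizer, applying length padding and a data-set split. The file name, column names, maximum length and split ratio are assumptions for illustration; the patent does not fix these values, and stop-word removal is omitted here.

import pandas as pd
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer

# assumed CSV layout: a "label" column (0 = negative, 1 = positive) and a "review" text column
data = pd.read_csv("weibo_senti_100k.csv")
texts, labels = data["review"].tolist(), data["label"].tolist()

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # define the BERT tokenizer
encodings = tokenizer(texts,
                      padding="max_length",   # length padding
                      truncation=True,
                      max_length=128,         # assumed maximum length for short texts
                      return_tensors="pt")

# divide the data set (an 80/20 split is assumed here)
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42)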
The RNN may suffer from vanishing or exploding gradients as the number of time steps increases during training, which prevents it from effectively extracting and using historical information. The LSTM network uses three gates to maintain and control the cell state: a forget gate discards previously useless historical information and an input gate stores new valid information, thereby overcoming these deficiencies of the RNN.
The GRU network improves and simplifies the LSTM network, effectively addressing the problem that the LSTM structure is too complex. The GRU network structure is shown in fig. 3. The GRU replaces the forget gate and input gate of the original LSTM with a single update gate, which decides which information to keep or discard after the computation is finished. The other gate in the GRU is a reset gate, which manages how the new input is combined with the previous computation result and determines to what extent the hidden-layer information of the previous moment is ignored. The GRU merges the cell state and hidden state of the LSTM into one, and, while retaining the effectiveness of the LSTM, has a simpler structure, fewer parameters, better model convergence and shorter training time.
Preferably, the forward propagation formula of the GRU network is:
r_t = σ(W_r · [h_{t-1}, x_t]);
z_t = σ(W_z · [h_{t-1}, x_t]);
h̃_t = tanh(W_h · [r_t ⊙ h_{t-1}, x_t]);
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t;
wherein z_t is the update gate at time t, r_t is the reset gate at time t, x_t is the input fed into the GRU network at time t (i.e. the word vector representation output by the BERT model), h_t denotes the activation state at time t, h̃_t denotes the candidate activation state at time t, W_r, W_z and W_h are the corresponding weight parameter matrices, and σ is the sigmoid activation function.
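To make the forward-propagation formulas concrete, a minimal NumPy sketch of a single GRU time step is given below; it follows the four formulas above directly, and the weight shapes and the convention of concatenating [h_{t-1}, x_t] are assumptions consistent with those formulas (bias terms are omitted, as in the formulas above).

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    # x_t: input at time t (e.g. a BERT word vector); h_prev: hidden state h_{t-1}
    # W_r, W_z, W_h: weight matrices of shape (hidden_dim, hidden_dim + input_dim)
    concat = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ concat)                                   # reset gate
    z_t = sigmoid(W_z @ concat)                                   # update gate
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))   # candidate activation state
    return (1.0 - z_t) * h_prev + z_t * h_cand                    # new activation state h_t

# toy usage with random weights: 6-dimensional input, 4-dimensional hidden state
rng = np.random.default_rng(0)
h_t = gru_step(rng.normal(size=6), np.zeros(4),
               rng.normal(size=(4, 10)), rng.normal(size=(4, 10)), rng.normal(size=(4, 10)))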
When processing the sequence data output by BERT, a unidirectional GRU network can only combine the preceding semantic information with the input at the next time step according to the temporal order, and cannot effectively capture the related information that follows. For this reason, a bidirectional GRU (BiGRU) network is used for training.
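In practice the bidirectional GRU branch can be realised with a standard library implementation; the short sketch below uses PyTorch's nn.GRU, with a hidden size of 128 assumed for illustration, and concatenates the final forward and backward states into the second feature vector (one reasonable reading of the design, not a detail fixed by the patent).

import torch
import torch.nn as nn

bigru = nn.GRU(input_size=768, hidden_size=128, batch_first=True, bidirectional=True)
word_vecs = torch.randn(8, 64, 768)          # (batch, seq_len, BERT dim) placeholder input
outputs, h_n = bigru(word_vecs)              # outputs: (8, 64, 256); h_n: (2, 8, 128)
second_feature = torch.cat([h_n[0], h_n[1]], dim=1)   # (8, 256) forward + backward final states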
The TextCNN network adopted by the invention is a three-channel text convolution network whose structure is shown in figure 4, where W_i in the figure is the i-th input word vector. The word vectors output by the BERT model in this experiment are 768-dimensional, and in NLP tasks the convolution kernel width is set to the full word-vector dimension of 768, so convolution kernels spanning 2, 3 and 4 words are used;
specifically, the first convolution layer with convolution kernel 2, the second convolution layer with convolution kernel 3, and the third convolution layer with convolution kernel 4, and the process of obtaining the first feature vector through the TextCNN network in step S3 is as follows:
convolving the word vector representation with a first convolution layer, a second convolution layer and a third convolution layer respectively to obtain a first convolution result, a second convolution result and a third convolution result;
pooling the first convolution result, the second convolution result and the third convolution result through a pooling layer to obtain a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector;
and splicing the first low-dimensional vector, the second low-dimensional vector and the third low-dimensional vector to obtain a first feature vector.
Specifically, the calculation formula of the convolutional layer is as follows:
C = σ(W ⊗ X + b);
wherein ⊗ denotes the convolution operation, C is the convolution result, W is the weight vector of the convolution kernel, X is the matrix formed by the word vector representations, b is the bias vector, and σ is the sigmoid activation function.
The pooling layer can extract local feature information captured by the convolution layer and play an effective dimension reduction role, and the formula is as follows:
M_i = max(c_i1, c_i2, ..., c_ij);
wherein M_i represents the maximum feature value obtained after pooling the first convolution result, the second convolution result or the third convolution result, and c_ij represents the value generated by the j-th convolution in the i-th convolution result. After convolution with kernels spanning 2, 3 and 4 word vectors, the results of the three channels have different sizes; max pooling screens the maximum feature value from the feature values generated by each sliding window, and the screened maximum values are spliced to obtain a new feature vector that combines the feature values of all sliding windows, thereby eliminating the length differences between the vector sequences produced by the different kernel sizes and achieving dimension reduction.
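A minimal sketch of the three-channel TextCNN branch following the convolution and max-pooling formulas above is given below; the number of filters per channel is an assumed value, and the sigmoid activation mirrors the convolution formula.

import torch
import torch.nn as nn

class TextCNNEncoder(nn.Module):
    # Three-channel TextCNN: kernels spanning 2, 3 and 4 words, sigmoid convolution, max pooling, splice.
    def __init__(self, embed_dim=768, num_filters=100):      # num_filters is an assumed value
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, kernel_size=k) for k in (2, 3, 4)])

    def forward(self, word_vecs):                 # word_vecs: (batch, seq_len, embed_dim)
        x = word_vecs.transpose(1, 2)             # (batch, embed_dim, seq_len) for Conv1d
        pooled = []
        for conv in self.convs:
            c = torch.sigmoid(conv(x))            # C = sigma(W (*) X + b), one channel
            pooled.append(torch.max(c, dim=2).values)   # M_i = max(c_i1, ..., c_ij)
        return torch.cat(pooled, dim=1)           # first feature vector, (batch, 3 * num_filters)

first_feature = TextCNNEncoder()(torch.randn(8, 64, 768))    # -> shape (8, 300)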
The purpose of the attention mechanism is to give higher weight to the key information while keeping the original feature information, so that the key information has a larger influence on the final classification result. The self-attention mechanism attends only to the internal elements of the sequence itself, which gives higher training efficiency and more accurate results. The calculation formula of the self-attention mechanism is as follows:
H = softmax(X·X^T/√d_x)·X;
wherein H is the output of the self-attention mechanism, X is the input vector sequence, and d_x is its dimension.
In the calculation of the self-attention mechanism, vectors at different positions in the spliced vector sequence have different influences on the final result; for example, a vector near the front often obtains a larger attention weight parameter because its observation window is smaller and it is combined with less other information, which distorts the final judgment result.
Specifically, the calculation formula for the improved self-attention module is as follows:
h' = softmax(X·X^T/√d_x + W)·X;
wherein h' is the classification feature vector, X is the spliced feature vector, 1/√d_x is the adjustment factor (d_x being the dimension of X), and W is the position weight parameter matrix, expressed as:
W = [a_mn];
where a_mn is a position weight parameter.
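The improved self-attention module can be sketched as below. The exact placement of the position weight matrix follows the reconstructed formula above (added to the scaled score matrix before the softmax); this placement, the learnable zero initialisation of the matrix, and the treatment of the spliced features as a sequence are assumptions for illustration rather than verified details of the patent.

import math
import torch
import torch.nn as nn

class ImprovedSelfAttention(nn.Module):
    # Position-weighted self-attention: h' = softmax(X X^T / sqrt(d_x) + W) X
    def __init__(self, seq_len, dim):
        super().__init__()
        self.scale = 1.0 / math.sqrt(dim)                              # adjustment factor 1/sqrt(d_x)
        self.pos_weight = nn.Parameter(torch.zeros(seq_len, seq_len))  # position weight matrix W = [a_mn]

    def forward(self, x):                                      # x: (batch, seq_len, dim) spliced features
        scores = torch.bmm(x, x.transpose(1, 2)) * self.scale  # X X^T / sqrt(d_x)
        attn = torch.softmax(scores + self.pos_weight, dim=-1) # balance the early-position bias with W
        return torch.bmm(attn, x)                              # classification feature vectors h'

h_prime = ImprovedSelfAttention(seq_len=16, dim=556)(torch.randn(8, 16, 556))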
In this embodiment, the microblog data set is used for training, and accuracy, precision, recall and the F1 value are used for evaluation after model training is completed.
The accuracy is the proportion of correct judgment in all judgments, which represents the proportion of samples with correct prediction results in the total samples, and the formula is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN);
the precision rate is also called precision rate, and it represents the proportion of the correctly predicted samples in all the samples predicted to be positive according to the predicted result, and its formula is:
Figure BDA0003510160530000093
the recall ratio is called recall ratio, and represents the proportion of the correctly predicted positive samples in all the actually positive samples, and the formula is as follows:
Figure BDA0003510160530000094
the F1 value is a comprehensive evaluation index after combining the accuracy rate and the recall rate, the larger the value is, the better the value is, and the formula is as follows:
Figure BDA0003510160530000095
TP (True Positives) denotes positive-class samples correctly predicted as positive, FP (False Positives) denotes negative-class samples incorrectly predicted as positive, FN (False Negatives) denotes positive-class samples incorrectly predicted as negative, and TN (True Negatives) denotes negative-class samples correctly predicted as negative.
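The four evaluation indexes can be computed directly from the confusion-matrix counts defined above; the short sketch below does so, with made-up counts in the usage line (they are not results from the patent).

def classification_metrics(tp, fp, fn, tn):
    # accuracy, precision, recall and F1 from the confusion-matrix counts
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(classification_metrics(tp=55, fp=5, fn=7, tn=53))   # illustrative counts only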
To verify the performance of the model presented herein, comparative experiments based on BERT and CNN-BiGRU composite neural networks were set up, and the accuracy, precision, recall, and F1 values were taken for each experiment and compared.
The comparison comprised the following five experiments:
(1) W2V-CNN-BiGRU: word vectors produced by the traditional pre-training model word2vec are trained in the composite network designed herein.
(2) BERT: a single BERT model is trained directly, and the obtained results are classified directly.
(3) BERT-TextCNN: BERT is used as the pre-training model to obtain and output word vectors, which are then trained through a TextCNN network.
(4) BERT-BiGRU: BERT is used as the pre-training model to obtain and output word vectors, which are then trained through a BiGRU network.
(5) BERT-CNN-BiGRU: BERT is used as the pre-training model to obtain word vectors, the output is sent into a TextCNN network and a BiGRU network respectively for training, the outputs of the two networks are spliced, and a weighted calculation is performed by the improved self-attention mechanism.
The results obtained after test comparison of the individual experiments on the test set are shown in table 1:
TABLE 1 comparative results
As can be seen from Table 1, the indexes of the BERT-CNN-BiGRU model show clear advantages over the other models. The BERT model is built by stacking Transformer encoders, and the self-attention mechanism in the Transformer can fully combine the context of the input data when generating word vectors, which is a great advantage of BERT over the traditional word2vec model. Therefore, even when the BERT model is trained alone and its output feature vectors are classified directly, the result is still clearly superior to the w2v composite network model of experiment 1, with greatly improved accuracy and F1 value. However, there is still room for improvement when the BERT output is classified only after local features or context-associated information are extracted by another network: connecting BERT to either the TextCNN or the BiGRU network obviously improves the accuracy and F1 value of the final output compared with training the BERT model alone. Comparing experiments 3 and 4 together with the preceding training procedure, the TextCNN model performs slightly better on the task herein than the model using the BiGRU network alone. In this embodiment, the BERT-CNN-BiGRU model fully combines the ability of the CNN network to extract local features with the ability of the BiGRU to extract context-associated semantic information, and the resulting feature vector contains more key information, so the accuracy and F1 value of experiment 5 are the highest among all comparison experiments.
A short text sentiment classification device based on a composite network model comprises:
the data processing module is used for processing the acquired text data;
the BERT module is used for acquiring word vector representation of the text processed in the data processing module;
the text convolution module is used for performing convolution pooling on the word vector representation output by the BERT module to obtain a first feature vector;
the gating cycle module is used for acquiring the context information of the word vector representations output by the BERT module and outputting a second feature vector;
the splicing unit is used for splicing the first feature vector output by the text convolution module and the second feature vector output by the gating cycle module to obtain a spliced vector;
the improved self-attention module is used for screening key information in the spliced vector to obtain a classified feature vector;
and the classification module is used for acquiring the emotion classification result of the text data according to the classification feature vector.
Specifically, the text convolution module is used for extracting local features in a word vector sequence by utilizing the position invariance of a convolution neural network and screening local key contents;
specifically, the gating cycle module adopts a BiGRU network which is a bidirectional improved RNN network, and the gating cycle module has the function of fully combining context associated semantic information to ensure that a vector sequence output by the gating cycle module can fully combine context semantic information in a text;
specifically, the improved self-attention module has the function of combining the local feature vectors extracted and captured by the TextCNN network with the vectors with context-associated semantic information generated by the BiGRU network, so that the vectors finally used for classification can combine the advantages of the local feature vectors and the vectors, thereby improving the classification accuracy. And more important information can be given more weight when the attention mechanism weight is calculated, so that the influence on the final classification result is larger.
In the self-attention mechanism, a weight parameter iterator, namely a position weight parameter matrix is added, and the function of the weight parameter iterator is to improve the defects of the TextCNN and the BiGRU network in the training process, for example, when the self-attention mechanism is calculated, a vector with a front position tends to obtain larger attention weight. Balancing it with this weighting parameter matrix will result in a better classification.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A short text sentiment classification method based on a composite network model is characterized in that a BERT-CNN-BiGRU composite network model is constructed, the model comprises a BERT model, a TextCNN network, a BiGRU network and an improved self-attention module, and the short text sentiment classification method based on the composite network model comprises the following steps:
s1, acquiring a Chinese short text data set, and processing the Chinese short text data set;
s2, inputting the processed Chinese short text data set into a BERT model to obtain word vector representation;
s3, respectively sending the word vector representations into a TextCNN network and a BiGRU network to obtain a first feature vector and a second feature vector;
s4, splicing the first feature vector and the second feature vector to obtain a spliced feature vector;
and S5, sending the spliced feature vectors into an improved self-attention module to obtain classified feature vectors, and obtaining a classification result through a linear layer with a softmax activation function.
2. The short text sentiment classification method based on the composite network model according to claim 1, wherein the TextCNN network is a three-channel text convolution network, which includes a first convolution layer with a convolution kernel of 2, a second convolution layer with a convolution kernel of 3, a third convolution layer with a convolution kernel of 4 and a pooling layer, and the process of obtaining the first feature vector through the TextCNN network is as follows:
respectively sending the word vector representations into a first convolution layer, a second convolution layer and a third convolution layer for convolution to obtain a first convolution result, a second convolution result and a third convolution result;
pooling the first convolution result, the second convolution result and the third convolution result through a pooling layer to obtain a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector;
and splicing the first low-dimensional vector, the second low-dimensional vector and the third low-dimensional vector to obtain a first feature vector.
3. The method for short text sentiment classification based on a composite network model according to claim 2, wherein the calculation formula of the three convolutional layers is as follows:
C = σ(W ⊗ X + b);
wherein ⊗ denotes the convolution operation, C is the convolution result, W is the weight vector of the convolution kernel, X is the word vector matrix formed by the word vector representations, b is the bias vector, and σ is the sigmoid activation function.
4. The method for short text sentiment classification based on a composite network model according to claim 2, wherein the pooling layer respectively obtains maximum eigenvalues of the first convolution result, the second convolution result and the third convolution result as a first low-dimensional vector, a second low-dimensional vector and a third low-dimensional vector, and the formulas are as follows:
M_i = max(c_i1, c_i2, ..., c_ij);
wherein M_i represents the maximum feature value obtained after pooling the first convolution result, the second convolution result or the third convolution result, and c_ij represents the value generated by the j-th convolution in the i-th convolution result.
5. The method for short text sentiment classification based on a composite network model according to claim 1, wherein the calculation formula of the improved self-attention module is as follows:
h' = softmax(X·X^T/√d_x + W)·X;
wherein h' is the classification feature vector, X is the spliced feature vector, 1/√d_x is the adjustment factor (d_x being the dimension of X), and W is a position weight parameter.
6. A short text sentiment classification device based on a composite network model, characterized by comprising:
the data processing module is used for processing the acquired text data;
the BERT module is used for acquiring word vector representation of the text processed in the data processing module;
the text convolution module is used for performing convolution pooling on the word vector representation output by the BERT module to obtain a first feature vector;
the gating cycle module is used for acquiring the context information of the word vector representations output by the BERT module and outputting a second feature vector;
the splicing unit is used for splicing the first feature vector output by the text convolution module and the second feature vector output by the gating cycle module to obtain a spliced vector;
the improved self-attention module is used for screening key information in the spliced vector to obtain a classified feature vector;
and the classification module is used for acquiring the emotion classification result of the text data according to the classification feature vector.
CN202210150471.5A 2022-02-18 2022-02-18 Short text sentiment classification method and device based on composite network model Pending CN114547299A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210150471.5A CN114547299A (en) 2022-02-18 2022-02-18 Short text sentiment classification method and device based on composite network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210150471.5A CN114547299A (en) 2022-02-18 2022-02-18 Short text sentiment classification method and device based on composite network model

Publications (1)

Publication Number Publication Date
CN114547299A true CN114547299A (en) 2022-05-27

Family

ID=81675808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210150471.5A Pending CN114547299A (en) 2022-02-18 2022-02-18 Short text sentiment classification method and device based on composite network model

Country Status (1)

Country Link
CN (1) CN114547299A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374285A (en) * 2022-10-26 2022-11-22 思创数码科技股份有限公司 Government affair resource catalog theme classification method and system
CN115374285B (en) * 2022-10-26 2023-02-07 思创数码科技股份有限公司 Government affair resource catalog theme classification method and system
CN116205222A (en) * 2023-05-06 2023-06-02 南京邮电大学 Aspect-level emotion analysis system and method based on multichannel attention fusion
CN117521639A (en) * 2024-01-05 2024-02-06 湖南工商大学 Text detection method combined with academic text structure
CN117521639B (en) * 2024-01-05 2024-04-02 湖南工商大学 Text detection method combined with academic text structure

Similar Documents

Publication Publication Date Title
Luan et al. Research on text classification based on CNN and LSTM
CN112667818B (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN114547299A (en) Short text sentiment classification method and device based on composite network model
CN110232395B (en) Power system fault diagnosis method based on fault Chinese text
CN110598005A (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
CN113255320A (en) Entity relation extraction method and device based on syntax tree and graph attention machine mechanism
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113705238B (en) Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN113255366B (en) Aspect-level text emotion analysis method based on heterogeneous graph neural network
CN112528668A (en) Deep emotion semantic recognition method, system, medium, computer equipment and terminal
CN113449204B (en) Social event classification method and device based on local aggregation graph attention network
Zhang Research on text classification method based on LSTM neural network model
CN112199503B (en) Feature-enhanced unbalanced Bi-LSTM-based Chinese text classification method
CN112784532A (en) Multi-head attention memory network for short text sentiment classification
CN114722835A (en) Text emotion recognition method based on LDA and BERT fusion improved model
CN114547230A (en) Intelligent administrative law enforcement case information extraction and case law identification method
CN114694255A (en) Sentence-level lip language identification method based on channel attention and time convolution network
Gao et al. REPRESENTATION LEARNING OF KNOWLEDGE GRAPHS USING CONVOLUTIONAL NEURAL NETWORKS.
CN111723572B (en) Chinese short text correlation measurement method based on CNN convolutional layer and BilSTM
CN113779966A (en) Mongolian emotion analysis method of bidirectional CNN-RNN depth model based on attention
CN113255360A (en) Document rating method and device based on hierarchical self-attention network
CN112560440A (en) Deep learning-based syntax dependence method for aspect-level emotion analysis
CN113283605B (en) Cross focusing loss tracing reasoning method based on pre-training model
CN114722798A (en) Ironic recognition model based on convolutional neural network and attention system
CN114693949A (en) Multi-modal evaluation object extraction method based on regional perception alignment network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination