Disclosure of Invention
In view of the above, the present invention provides an e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model in order to solve the above technical problems.
The invention adopts the following technical scheme:
an e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model comprises the following steps:
acquiring an initial text sequence;
converting the initial text sequence into a corresponding initial word vector;
performing convolution operation on the initial word vector to obtain new features of each word in the initial text sequence to form a text feature matrix;
based on an attention mechanism, processing the text feature matrix to obtain an attention signal, and performing attention fusion with the initial word vector to obtain a target word vector after attention fusion;
based on a Bi-LSTM model, extracting forward output features and backward output features of the target word vector to obtain a target text feature vector of each word containing the forward output features and the backward output features;
and classifying the obtained target text feature vector of each word as the input of a linear function softmax to obtain a final emotion classification result.
Further, the converting the initial text sequence into a corresponding initial word vector includes:
and converting the initial text sequence into a corresponding initial word vector by adopting a word2vec word vector model.
Further, the convolving the initial word vector to obtain new features of each word in the initial text sequence, and forming a text feature matrix, including:
the initial word vectors form a $t \times l$ word vector matrix, and a convolution operation is performed on the input $t \times l$ word vector matrix by $t$ convolution filters of length $l$, wherein the new feature of the i-th word in the initial text sequence is:
$$z_i = f(D^T \cdot x_{i:i+t-1} + b)$$
wherein $b$ is a bias term, $D^T$ is the weight, and $f$ is the nonlinear function ReLU;
the text feature expression is obtained as follows:
$$Z = [z_1, z_2, \ldots, z_{n-t+1}]$$
a maximum pooling operation is then applied, and the maximum value of $Z$ is taken as the feature of the convolution filter; the text feature matrix is formed as:
$$Y = [Z_1, Z_2, \ldots, Z_n]$$
further, the processing the text feature matrix based on the attention mechanism to obtain an attention signal, and performing attention fusion with the initial word vector to obtain a target word vector after attention fusion, including:
according to the text feature matrix, an attention mechanism is introduced; the query vector of each input text feature $Z_i$ is set to $q$, and the attention distribution coefficient $\alpha_i$ of each text feature $Z_i$ is obtained by the following attention distribution coefficient calculation formula:
$$\alpha_i = \mathrm{softmax}(s(Z_i, q)) = \frac{\exp(s(Z_i, q))}{\sum_{j=1}^{n} \exp(s(Z_j, q))}$$
wherein $i \in [1, 2, \ldots, n]$, $j$ is the summation index in the softmax function, and $s(Z_i, q)$ is the attention scoring function, calculated with an additive model as follows:
$$s(Z_i, q) = V^T \tanh(W Z_i + U q)$$
a weighted-average attention signal is obtained:
$$\sum_{i=1}^{n} \alpha_i Z_i$$
the weighted-average attention signal is mapped to the initial word vector and fused with the initial word vector by attention fusion to obtain the target word vector after attention fusion, wherein $\omega_i$ is the i-th target word vector, $\mu_1$ is the weight of the initial word vector, $\mu_2$ is the weight of the attention signal, and $x_i$ is the i-th initial word vector.
Further, based on the Bi-LSTM model, extracting the forward output feature and the backward output feature of the target word vector to obtain a target text feature vector of each word including the forward output feature and the backward output feature, including:
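based on the Bi-LSTM model, the i-th target text feature vector of the target word vector at time t is:
$$h_{it} = [\overrightarrow{h_{it}} \oplus \overleftarrow{h_{it}}]$$
wherein $h_{it}$ is the i-th target text feature vector of the target word vector at time t, and $\overrightarrow{h_{it}}$ and $\overleftarrow{h_{it}}$ are the forward output feature and the backward output feature, respectively.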
further, the classifying the obtained target text feature vector of each word as the input of the linear function softmax to obtain a final emotion classification result, which comprises the following steps:
classifying the obtained target text feature vector of each word as the input of a linear function softmax to obtain a final emotion classification result, wherein the softmax function is as follows:
$$y = \mathrm{softmax}(W_c M + b_c)$$
wherein $W_c$ represents the weight matrix and $b_c$ represents the bias term.
The beneficial effects of the invention include: the invention combines an attention mechanism (Attention), a bidirectional long short-term memory network (BiLSTM), and a convolutional neural network (CNN) into an Attention+BiLSTM+CNN model, i.e. the AB-CNN model. First, word vectors are acquired: sentences are vectorized so that words are represented by high-dimensional vectors, and the word vectors are loaded as the word embedding layer that feeds the convolution layer. A convolution operation on the initial word vectors yields new features of each word in the initial text sequence, captures the important text features, and forms a text feature matrix. The text feature matrix is then processed based on the attention mechanism: the average attention weight of the emotion words in each text is calculated to obtain an attention signal, which is fused with the initial word vectors to obtain the target word vectors after attention fusion. Next, based on the Bi-LSTM model, the forward and backward output features of the target word vectors are extracted to obtain, for each word, a target text feature vector containing both. Because the text is read from both directions at once, all the context information of each word in the initial text sequence is used, which further enhances the emotional color of the extracted features. Finally, the target text feature vectors are used as the input of the softmax function for classification, giving the final emotion classification result.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In order to explain the technical solutions described in the present application, the following description will be given by way of specific embodiments.
As shown in FIG. 1, this embodiment provides an e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model. First, the attention mechanism (Attention), the bidirectional long short-term memory network (Bi-LSTM), and the convolutional neural network (CNN) are described.
The attention mechanism in deep learning is modeled on human visual attention and was originally proposed by Treisman et al. in 1980. Its physiological basis is that a human observing the external environment first scans the whole scene quickly, then locks onto the target area of interest according to the processing of brain signals, and finally forms a focus of attention, so that more detail is acquired about the target while other useless information is suppressed.
When text tasks are processed in NLP, the attention mechanism can be used to concentrate attention on the text content that matters. This can improve the running speed of the model, reduce its complexity, save training time, and at the same time improve its prediction accuracy. In the emotion analysis task, an attention layer is introduced into the CNN so that the model focuses on words or sentences related to emotion and discards text information that carries no relevant emotional color.
The essence of the attention mechanism is an addressing process. Given an input text sequence X and a query vector q, the role of the query vector is to find the important information in X. The query is performed over the whole text sequence X; each word contributes its own share of attention when the text content is extracted, and words carrying emotional color receive more attention.
Thus, the specific position of each word in the text needs to be known during the query, so an attention variable $U \in [1, N]$ is defined to represent the index of the selected query information. U = i indicates that the i-th word in the text sequence X is selected; the calculation process is shown in FIG. 2. This is how the attention mechanism reduces the complexity of the model: instead of feeding all the text content into the model for training, only the words or sentences related to emotion are selected from X as input.
The attention mechanism can be divided into three steps: firstly, inputting text information; secondly, calculating an attention distribution weight alpha; thirdly, a weighted average of the input information is calculated. The method comprises the following specific steps:
(1) Text information input: $X = [X_1, X_2, \ldots, X_N]$ represents the N input text information items;
(2) Attention weight coefficient calculation: the attention weight coefficient between the i-th word and q is as follows:
$$\alpha_i = P(U = i \mid X, q) = \mathrm{softmax}(s(X_i, q))$$
wherein $\alpha_i$ is called the attention coefficient and $s(X_i, q)$ is the attention scoring function, for which the main calculation methods include:
additive model: $s(X_i, q) = V^T \tanh(W X_i + U q)$
scaled dot-product model: $s(X_i, q) = \dfrac{X_i^T q}{\sqrt{d}}$
where W, U, and V are parameters of the network model and d is the dimension of the input word vector.
(3) Attention weighted average: the attention coefficient $\alpha_i$ can be understood as the degree to which the i-th item of information is attended to when the context query vector is q. The input text information X is encoded as the weighted average:
$$\sum_{i=1}^{N} \alpha_i X_i$$
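By way of illustration only, the three steps above can be sketched in plain Python. The sketch uses the scaled dot-product model for the score; the function names (`softmax`, `scaled_dot_score`, `attend`) and the toy values of X and q are assumptions made for this example and do not come from the embodiment.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_score(x, q):
    """Scaled dot-product score s(x, q) = (x . q) / sqrt(d)."""
    d = len(x)
    return sum(a * b for a, b in zip(x, q)) / math.sqrt(d)

def attend(X, q):
    """Return the attention coefficients and the weighted average of the rows of X."""
    alpha = softmax([scaled_dot_score(x, q) for x in X])
    d = len(X[0])
    context = [sum(alpha[i] * X[i][k] for i in range(len(X))) for k in range(d)]
    return alpha, context

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy inputs: N = 3 items, d = 2
q = [1.0, 1.0]                            # toy query vector
alpha, context = attend(X, q)
```

In this toy input the third item is most aligned with q, so it receives the largest coefficient, and the coefficients sum to 1 as a probability distribution should.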
Long short-term memory (LSTM) networks are one implementation of recurrent neural networks (RNNs). In practical applications, RNNs were found to suffer from a series of problems such as vanishing gradients, exploding gradients, and a limited range of information that can be read. LSTM was introduced to solve these problems; because it retains information across time steps, it can quickly learn the relationship between input text data and its context.
On the basis of a simple RNN, LSTM is improved in two ways:
(1) New internal state. LSTM introduces a new internal state $c_t \in \mathbb{R}^D$ dedicated to linear recurrent information transfer, and outputs information to the external state $h_t \in \mathbb{R}^D$ of the hidden layer. The internal state is calculated by the following formulas:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$
wherein $f_t \in [0,1]^D$, $i_t \in [0,1]^D$, and $o_t \in [0,1]^D$ are three gates that control the path of information transfer, $\odot$ is the element-wise product of vectors, $c_{t-1}$ is the memory cell at the previous time, and $\tilde{c}_t \in \mathbb{R}^D$ is a candidate state obtained by a nonlinear function:
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
at each instant t, the internal state c of the LSTM network t History information up to the present time is recorded.
(2) Gating mechanism. LSTM networks introduce a gating mechanism to control the path of information transfer. The three gates are the forgetting gate $f_t$, the input gate $i_t$, and the output gate $o_t$. The value of each gate in the LSTM network lies in (0, 1), indicating that information is allowed to pass through in a certain proportion. The three gates are calculated as follows:
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
wherein $\sigma(\cdot)$ is the logistic function, $x_t$ is the input at the current time, and $h_{t-1}$ is the external state at the previous time.
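As a minimal sketch under stated assumptions, one LSTM time step following the standard gate and state equations can be written in plain Python. The helper names (`affine`, `lstm_step`), the parameter layout (a dictionary mapping each gate to a `(W, U, b)` triple), and all numeric values are hypothetical and chosen only to keep the example small.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-of-lists matrices W and U."""
    return [
        sum(W[r][k] * x[k] for k in range(len(x)))
        + sum(U[r][k] * h[k] for k in range(len(h)))
        + b[r]
        for r in range(len(b))
    ]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'i', 'o', 'c' to a (W, U, b) triple."""
    f_t = [sigmoid(v) for v in affine(*params["f"], x_t, h_prev)]   # forgetting gate
    i_t = [sigmoid(v) for v in affine(*params["i"], x_t, h_prev)]   # input gate
    o_t = [sigmoid(v) for v in affine(*params["o"], x_t, h_prev)]   # output gate
    c_hat = [math.tanh(v) for v in affine(*params["c"], x_t, h_prev)]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy parameters (hypothetical): input size 2, hidden size D = 2,
# the same (W, U, b) triple reused for every gate to keep the example short.
W = [[0.1, 0.2], [0.3, 0.4]]
U = [[0.0, 0.1], [0.1, 0.0]]
b = [0.0, 0.0]
params = {gate: (W, U, b) for gate in "fioc"}

h_t, c_t = lstm_step([1.0, -1.0], [0.0, 0.0], [0.0, 0.0], params)
```

Because the previous internal state is zero here, the new internal state reduces to the input gate times the candidate state, which makes the result easy to check by hand.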
The bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM) consists of 2 independent LSTMs. The input sequence is fed into the 2 LSTMs in forward order and in reverse order for feature extraction, and the vector formed by concatenating the 2 output vectors is used as the final feature expression of the word. The structure of the BiLSTM model is shown in FIG. 3. The design idea is that the feature data obtained at time t carries both past and future information. The output of the forward LSTM layer at time t is denoted $\overrightarrow{h_t}$, and the output of the backward LSTM layer at time t is denoted $\overleftarrow{h_t}$.
Experiments show that the text feature extraction efficiency and performance of the BiLSTM model are superior to those of a single LSTM model. The parameters of the 2 LSTMs in the BiLSTM are independent of each other; they share only the word-embedding vector list.
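The bidirectional reading and concatenation can be illustrated with a short sketch. To stay self-contained it uses a scalar tanh recurrent cell as a stand-in for each LSTM, which is a simplification and not the LSTM of the embodiment; the names `run_rnn` and `bidirectional` and the weights 0.5 are assumptions.

```python
import math

def run_rnn(inputs, w=0.5, u=0.5):
    """Stand-in recurrent layer (scalar tanh RNN, not a full LSTM)."""
    h, outputs = 0.0, []
    for x in inputs:
        h = math.tanh(w * x + u * h)
        outputs.append(h)
    return outputs

def bidirectional(inputs):
    """Concatenate, per position, the forward and backward outputs."""
    forward = run_rnn(inputs)
    # Run on the reversed sequence, then reverse the outputs so that
    # index t again refers to position t of the original sequence.
    backward = run_rnn(inputs[::-1])[::-1]
    return [[f, b] for f, b in zip(forward, backward)]

features = bidirectional([1.0, 0.0, -1.0])
```

At each position the first component has seen only the inputs up to that position and the second only the inputs from that position onward, which is the property described above.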
CNNs (Convolutional Neural Networks) were traditionally used in the field of computer vision. From the initial convolution layer to the later addition of pooling, dropout, padding, and other layers, the networks have become increasingly refined. GoogLeNet, VGGNet, and ResNet, the best known in the field of image recognition, were subsequently proposed, and these networks raised the image classification accuracy of neural networks beyond the human level. Convolutional neural networks thus have efficient feature extraction and classification capabilities, and when text information is regarded as a one-dimensional image, a CNN can be used to classify the text. The model structure is shown in FIG. 4.
First, the text sequence to be input is represented by word vectors in the word embedding layer and used as the CNN input. The text matrix in the word embedding layer is denoted X, with $X = [X_1, X_2, \ldots, X_N]$.
Features of the text are then extracted by sliding a convolution over the original input text sequence. The convolution kernel is denoted by k. The kernel k performs an n-gram convolution operation in a sliding-window manner with a sliding step length s, so that (N-n+1) pieces of feature information are obtained for each convolution kernel (this count corresponds to a step length of 1). The pooling layer then selects the text features with the largest weight values and ignores unimportant text features, giving the word vector represented by the final text features.
Finally, classification is performed at the fully connected layer: the text features filtered by the pooling layer are fully connected to the predicted class labels, all text features are combined, the probability of each class label is calculated, and the label with the maximum probability is taken as the classification result.
The e-commerce evaluation multi-classification emotion analysis method based on the AB-CNN model combines the attention mechanism (Attention), the bidirectional long short-term memory network (BiLSTM), and the convolutional neural network (CNN) into an AB-CNN model, and performs classification prediction on an e-commerce platform data set.
The AB-CNN model structure is shown in FIG. 5, and mainly comprises: input layer, word embedding layer, convolution layer, dropout layer, attention layer, biLSTM layer, full connection layer, and output layer.
The E-commerce evaluation multi-classification emotion analysis method based on the AB-CNN model provided by the embodiment comprises the following steps:
Step S1: an initial text sequence is obtained.
Step S1 corresponds to the input layer of the AB-CNN model structure. The initial text sequence is the text sequence to be processed; the initial text sequence x consists of n words and is expressed as $x = [x_1, x_2, \ldots, x_n]$.
Step S2: converting the initial text sequence into a corresponding initial word vector:
Step S2 corresponds to the word embedding layer of the AB-CNN model structure. As a specific implementation, a word2vec word vector model is used to convert the initial text sequence into the corresponding initial word vectors; correspondingly, the word embedding layer is a word2vec word vector embedding layer. The word vector dimension is set to 128 and the vectors are initialized, and each text is vectorized in the form:
$$x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n$$
where n represents the length of each comment text sequence, each word is represented by an h-dimensional vector, $x_i$ is the vector representation of the i-th word in the sentence, and $\oplus$ is the concatenation operator.
The segmented text is used as input. The text sequence x entering the word embedding layer is an $n \times h$ vector matrix, which is converted into low-dimensional word vectors (a $t \times l$ word vector matrix). Through the embedding layer, the words are converted from text into numeric vectors.
Step S3: performing convolution operation on the initial word vector to obtain new features of each word in the initial text sequence, and forming a text feature matrix:
Step S3 corresponds to the convolution layer of the AB-CNN model structure.
The initial word vectors form a $t \times l$ word vector matrix, and a convolution operation is performed on the input $t \times l$ word vector matrix by $t$ convolution filters of length $l$, wherein the new feature of the i-th word in the initial text sequence is:
$$z_i = f(D^T \cdot x_{i:i+t-1} + b)$$
wherein $b$ is a bias term, $D^T$ is the weight, and $f$ is the nonlinear function ReLU. The filter is applied to each possible word window in the sentence, $[x_{1:t}, x_{2:t+1}, \ldots, x_{n-t+1:n}]$, and the text feature expression is obtained as follows:
$$Z = [z_1, z_2, \ldots, z_{n-t+1}]$$
A maximum pooling operation is then applied and the maximum value of $Z$ is taken as the feature of the convolution filter; the purpose is to capture, for each feature, the most important value, i.e. the highest one. The text feature matrix is formed as:
$$Y = [Z_1, Z_2, \ldots, Z_n]$$
The convolutional layer outputs Y.
As a specific embodiment, dropout layers are added after the convolution layers to prevent overfitting.
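The convolution and maximum pooling of Step S3 can be sketched as follows for a single filter. The sketch assumes the filter D spans t consecutive word vectors of dimension l and is applied with a step of 1; the function name `conv_features` and the toy matrices are hypothetical.

```python
def relu(v):
    return max(0.0, v)

def conv_features(X, D, b):
    """z_i = ReLU(sum(D * x_{i:i+t-1}) + b) for every window of t rows of X."""
    t, width = len(D), len(D[0])
    feats = []
    for i in range(len(X) - t + 1):
        window = X[i:i + t]
        s = sum(D[r][k] * window[r][k] for r in range(t) for k in range(width))
        feats.append(relu(s + b))
    return feats

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # n = 4 words, l = 2
D = [[1.0, -1.0], [0.5, 0.5]]                          # one filter, t = 2
Z = conv_features(X, D, b=0.0)   # n - t + 1 = 3 features
pooled = max(Z)                  # maximum pooling over the filter's features
```

With these toy values Z is `[1.5, 0.0, 0.0]`, so maximum pooling keeps 1.5 as the feature of this filter.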
Step S4: based on an attention mechanism, processing the text feature matrix to obtain an attention signal, and performing attention fusion with the initial word vector to obtain a target word vector after attention fusion:
step S4 corresponds to the attention layer of the AB-CNN model structure.
The convolution layer extracts the important features of the text, and the attention layer can then extract the emotion-polarity words within the important features of each text, which saves running time and reduces the complexity of the model. An attention mechanism is introduced on the text feature matrix Y output by the convolution layer: the query vector of each input text feature $Z_i$ is set to $q$, and the attention distribution coefficient $\alpha_i$ of each text feature $Z_i$ is obtained by the following attention distribution coefficient calculation formula:
$$\alpha_i = \mathrm{softmax}(s(Z_i, q)) = \frac{\exp(s(Z_i, q))}{\sum_{j=1}^{n} \exp(s(Z_j, q))}$$
wherein $i \in [1, 2, \ldots, n]$; $j$ is the summation index in the softmax function: the scores of all text features $Z_j$ are summed to compute the probability distribution of the i-th text feature, i.e. the weight coefficient $\alpha_i$. $s(Z_i, q)$ is the attention scoring function, which can be selected from the additive model, the dot-product model, the scaled dot-product model, and the bilinear model. This embodiment uses the additive model, calculated as follows:
$$s(Z_i, q) = V^T \tanh(W Z_i + U q)$$
The attention coefficient $\alpha_i$ expresses the degree to which the i-th item of emotion information is attended to when the context query vector is $q$. The input text information is encoded with these coefficients to obtain the weighted-average attention signal:
$$\sum_{i=1}^{n} \alpha_i Z_i$$
The attention signal is mapped onto the corresponding input word vector matrix $x_i$ to obtain a text matrix with the attention mechanism, which is then fused with the initial word vector by attention fusion to obtain the target word vector after attention fusion, wherein $\omega_i$ is the i-th target word vector, $\mu_1$ is the weight of the initial word vector, $\mu_2$ is the weight of the attention signal, and $x_i$ is the i-th initial word vector; $\omega = [\omega_1, \omega_2, \ldots, \omega_n]$.
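A rough sketch of the attention layer is given below. The additive score and the softmax follow the formulas of this step. The fusion, however, is an assumption: the sketch takes the attention signal of word i to be $\alpha_i Z_i$ and fuses it with the initial word vector as the weighted sum $\mu_1 x_i + \mu_2 \alpha_i Z_i$, which may differ from the fusion formula of the embodiment. All function names and numeric values are hypothetical.

```python
import math

def additive_score(z, q, W, U, V):
    """Additive model: s(z, q) = V^T tanh(W z + U q)."""
    hidden = []
    for r in range(len(V)):
        wz = sum(W[r][k] * z[k] for k in range(len(z)))
        uq = sum(U[r][k] * q[k] for k in range(len(q)))
        hidden.append(math.tanh(wz + uq))
    return sum(v * h for v, h in zip(V, hidden))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(x_i, signal_i, mu1, mu2):
    """ASSUMPTION: fusion as the weighted sum mu1 * x_i + mu2 * signal_i."""
    return [mu1 * a + mu2 * s for a, s in zip(x_i, signal_i)]

I2 = [[1.0, 0.0], [0.0, 1.0]]   # toy W and U (identity matrices)
V = [1.0, 1.0]
Z = [[1.0, 0.0], [0.0, 1.0]]    # toy text features
x = [[0.2, 0.4], [0.6, 0.8]]    # toy initial word vectors
q = [1.0, 0.0]                  # toy query vector

alpha = softmax([additive_score(z, q, I2, I2, V) for z in Z])
# ASSUMPTION: the attention signal for word i is alpha_i * Z_i.
signals = [[alpha[i] * v for v in Z[i]] for i in range(len(Z))]
omega = [fuse(x[i], signals[i], mu1=0.5, mu2=0.5) for i in range(len(x))]
```

The weights `mu1` and `mu2` control how much of the original word vector and how much of the attention signal survive in each target word vector.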
Step S5: based on a Bi-LSTM model, extracting forward output features and backward output features of the target word vector to obtain a target text feature vector of each word containing the forward output features and the backward output features:
step S5: biLSTM layer corresponding to AB-CNN model structure.
The text word vector ω of the emotion polarity is output by the attention layer as input to the Bi-LSTM layer. The information of the input sequence in the forward direction and the backward direction is combined through the two LSTMs, so that the emotion colors of the input text content are further enriched, and the classification effect of the model is improved.
For the output of time t, the forward LSTM layer has information of time t and previous times in the input sequence, and the backward LSTM layer has information of time t and subsequent times in the input sequence.
The information of the input sequence in the forward direction and the backward direction is combined through the two LSTMs, and for the output of the t moment, the forward LSTM layer has the information of the t moment and the previous moment in the input sequence, and the backward LSTM layer has the information of the t moment and the subsequent moment in the input sequence. The Bi-LSTM model can effectively improve the accuracy and the forward output of the Bi-LSTM at the time t
And backward output->
The following are provided:
the Bi-LSTM outputs the target text feature vector containing the emotion color at the time t and the ith target word vector at the time t
The method comprises the following steps:
extracting the text sequence of Bi-LSTM layerMeaning information, which can be output as
Step S6: classifying the obtained target text feature vector of each word as the input of a linear function softmax to obtain a final emotion classification result:
step S6 corresponds to the full connection layer of the AB-CNN model structure.
The input text first passes through the embedding layer, where word2vec vectorizes the text; the convolution layer then extracts the important features of the text, the attention layer extracts the emotion-related semantic features, and the Bi-LSTM extracts the text context information. This further enhances the emotional color of the extracted semantic features and gives a deeper semantic feature representation. Finally, the result M obtained by the Bi-LSTM layer is used as the input of the softmax function for classification to obtain the final emotion classification result, wherein the softmax function is as follows:
$$y = \mathrm{softmax}(W_c M + b_c)$$
wherein $W_c$ represents the weight matrix and $b_c$ represents the bias term.
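The classification formula can be illustrated with a small sketch. It treats M as a single feature vector, which is a simplification, and maps the three outputs to the three emotion categories of the data set; the function name `classify`, the label order, and the toy values of M, W_c, and b_c are assumptions.

```python
import math

LABELS = ["positive", "neutral", "negative"]  # assumed label order

def classify(M, W_c, b_c):
    """Return softmax(W_c M + b_c) and the label with the highest probability."""
    logits = [sum(w * m for w, m in zip(row, M)) + b for row, b in zip(W_c, b_c)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]   # shift for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs, LABELS[probs.index(max(probs))]

M = [0.2, -0.4, 0.9]                 # toy feature vector from the Bi-LSTM layer
W_c = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]              # toy weight matrix (identity)
b_c = [0.0, 0.0, 0.0]                # toy bias term
probs, label = classify(M, W_c, b_c)
```

Taking the label with the maximum probability as the result matches the role of the output layer described next.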
The output layer is used for outputting the final emotion classification result.
The implementation process of the e-commerce evaluation multi-classification emotion analysis method based on the AB-CNN model is described above. Its effect is described below, covering the data set division, the evaluation indices, and the model parameter selection. The performance of the model is then presented and compared with other existing deep learning models.
(1) Introduction to data set
The adopted data set is a public e-commerce platform comment data set containing 21091 items in total: 8033 positive reviews, 4355 neutral reviews, and 8703 negative reviews. The specific division is shown in Table 1 below. Table 1 is a data set profile.
TABLE 1
| Emotion category | Data set content example | Training set | Test set |
| Positive | "Good item, good seller" | 6443 | 1590 |
| Neutral | "The sound works well! But it also has shortcomings!" | 3479 | 876 |
| Negative | "Not delivered at all! Money wasted!" | 6951 | 1752 |
| Total | - | 16873 | 4218 |
(2) Data partitioning and training process
The data set is divided into a training set and a test set at a ratio of 4:1. The model of this embodiment is trained on a Windows 10 operating system using a CPU: an Intel(R) Core(TM) i7-5500U 2.40 GHz processor with 16 GB of RAM. The programming language is Python 3.7 and the development tool is PyCharm. The Chinese word segmentation tool is jieba 0.38, and the deep learning framework is TensorFlow 1.15.0 with Keras 2.3.1.
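As a quick consistency check, the per-class counts of Table 1 can be summed to confirm the totals and the 4:1 division. The variable names are chosen for this example only.

```python
# (training count, test count) per emotion category, taken from Table 1
splits = {
    "positive": (6443, 1590),
    "neutral": (3479, 876),
    "negative": (6951, 1752),
}

train_total = sum(train for train, _ in splits.values())
test_total = sum(test for _, test in splits.values())
total = train_total + test_total
train_share = train_total / total   # close to 0.8 for a 4:1 division
```

The sums reproduce the 16873 training items, 4218 test items, and 21091 items overall stated in the data set description.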
(3) Evaluation index
Because this embodiment solves a three-class classification problem, accuracy and the multi-class evaluation indices Kappa coefficient and Hamming distance are selected.
Accuracy: reflects the model's ability to judge the entire data set, i.e. for the test set, the ability to judge positive samples as positive, neutral samples as neutral, and negative samples as negative. It is the proportion of correctly classified samples among all samples, computed over n classes; in this embodiment n = 3, so it is the accuracy of three-class classification.
Kappa coefficient: the Kappa coefficient is a statistical measure of consistency. Its range is [-1, 1], and in practical use it typically lies in [0, 1]. The higher the coefficient, the higher the classification accuracy achieved by the model. It is calculated as follows:
$$k = \frac{P_o - P_e}{1 - P_e}$$
wherein $P_o$ represents the overall classification accuracy, and $P_e$ is expressed as:
$$P_e = \frac{a_1 b_1 + a_2 b_2 + \cdots + a_n b_n}{N \times N}$$
wherein $a_i$ represents the number of real samples of class i, $b_i$ represents the number of samples predicted as class i, and $N$ is the total number of samples.
Table 2 shows the kappa coefficient table.
TABLE 2
Hamming distance: the Hamming distance is also applicable to multi-class problems. It simply measures the distance between the predicted labels and the real labels, and its value lies between 0 and 1. A distance of 0 indicates that the predictions are exactly the same as the real results, and a distance of 1 indicates that the predictions are completely opposite to the real results.
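The three evaluation indices can be sketched in plain Python as follows. The Kappa coefficient is computed as (P_o - P_e) / (1 - P_e), with P_e taken from the per-class real and predicted counts; the Hamming distance is taken here to be the fraction of samples whose predicted label differs from the real label, which is an assumption consistent with a value between 0 and 1. The function names and the toy labels are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label equals the real label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_distance(y_true, y_pred):
    """Proportion of samples whose predicted label differs from the real label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def kappa(y_true, y_pred):
    """Kappa = (P_o - P_e) / (1 - P_e), with P_e = sum(a_i * b_i) / N^2."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    real_counts = Counter(y_true)    # a_i: real samples of class i
    pred_counts = Counter(y_pred)    # b_i: samples predicted as class i
    p_e = sum(real_counts[c] * pred_counts[c] for c in real_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)

y_true = [0, 0, 1, 1, 2, 2]   # toy real labels for three classes
y_pred = [0, 0, 1, 2, 2, 2]   # toy predictions with one mistake
acc = accuracy(y_true, y_pred)
ham = hamming_distance(y_true, y_pred)
kap = kappa(y_true, y_pred)
```

For these toy labels the accuracy is 5/6, P_e is 1/3, and the Kappa coefficient is 0.75. Under this definition the accuracy and the Hamming distance always sum to 1.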
(4) Parameter selection
Selecting a suitable input text length is the first problem to solve. If the input length is too short, the text is truncated and its true emotion polarity cannot be captured, which affects the final performance of the model. If the input length is too long, a large number of zeros are appended to the word vectors, which reduces the training accuracy of the model and affects the final evaluation indices. As shown in FIG. 6 and FIG. 7, most texts in the data set have a length of at most 200 and only a very small part exceed 200: texts with a length below 201 account for 94% of the whole data set, and texts with a length of 201 or more account for only 6%. Therefore, this embodiment considers the text length and its frequency of occurrence together and selects 200 as the length of the input text.
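The effect of a fixed input length can be shown with a short helper. It assumes that zeros are appended after the text and that over-long texts keep their first 200 tokens; the truncation side, the function name `pad_or_truncate`, and the padding id 0 are assumptions of this sketch.

```python
MAX_LEN = 200  # input text length selected in this embodiment

def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=0):
    """Keep the first max_len ids, then append pad_id until the length is max_len."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))

short_text = pad_or_truncate([5, 6, 7])             # padded with 197 zeros
long_text = pad_or_truncate(list(range(1, 251)))    # truncated from 250 to 200
```

A short comment ends up mostly zeros, while a long comment loses everything after position 200, which is the trade-off discussed above.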
The number of iterations is one of the most important choices when training the model: too many iterations cause the model to overfit, while too few leave the model insufficiently trained, so that the optimal state cannot be reached. As shown in FIG. 8 and Table 3, the performance of the model keeps improving while the number of iterations is below 16 and starts to decrease once it exceeds 16. Experimental analysis therefore gives 16 as the optimal number of iterations for this embodiment. Table 3 is a selection table of iteration numbers.
TABLE 3
| Number of iterations | Accuracy | Kappa coefficient | Hamming distance |
| 4 | 0.8367 | 0.7431 | 0.1633 |
| 8 | 0.8978 | 0.8397 | 0.1022 |
| 12 | 0.9033 | 0.8483 | 0.0967 |
| 16 | 0.9061 | 0.8528 | 0.0939 |
| 20 | 0.8917 | 0.8304 | 0.1083 |
| 24 | 0.8774 | 0.8082 | 0.1226 |
Overfitting easily occurs during model training. It manifests as follows: on the training data the loss is small and the prediction accuracy is high, but on the test data the loss is large and the prediction accuracy is low. To avoid overfitting, dropout is introduced. It strengthens the generalization ability of the model, because the model no longer depends too heavily on particular local features and complex co-adaptation between neurons is reduced. Experimental analysis shows that when the dropout value is 0.45, the performance of the model is optimal and overfitting is prevented. The experimental results are shown in Table 4 and FIG. 9. Table 4 is a selection table of dropout values.
TABLE 4
Random inactivation value | Accuracy | Kappa coefficient | Hamming distance
0.15 | 0.9045 | 0.8507 | 0.0955
0.25 | 0.8985 | 0.8411 | 0.1015
0.35 | 0.8988 | 0.8414 | 0.1012
0.45 | 0.9078 | 0.8555 | 0.0922
0.55 | 0.8940 | 0.8342 | 0.1060
0.65 | 0.8895 | 0.8264 | 0.1105
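As a non-limiting illustration, the random inactivation discussed above can be sketched in plain Python. The sketch assumes the common "inverted dropout" variant, in which kept activations are rescaled during training so that nothing needs to change at inference time; the embodiment does not state which variant it uses, and the function name `dropout` is chosen only for this example.

```python
import random
from typing import List


def dropout(activations: List[float], rate: float, rng: random.Random,
            training: bool = True) -> List[float]:
    """Inverted dropout: zero each activation with probability `rate`
    during training and rescale the survivors by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: activations pass through unchanged
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
ones = [1.0] * 10000
dropped = dropout(ones, rate=0.45, rng=rng)

zero_fraction = sum(1 for a in dropped if a == 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print(round(zero_fraction, 2))    # close to the dropout value 0.45
print(round(mean_activation, 2))  # close to 1.0 because of the rescaling
```

Because a different random subset of units is silenced at each training step, no single unit can be relied upon, which is the mechanism behind the reduced co-adaptation described in the text.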
Intuitively, the batch size is the number of samples used in one training step, and it influences both how well and how quickly the model is optimized. Setting the batch size lets the training process handle one batch of data at a time. An excessively large batch size tends to make the network converge to poor local optima, so a proper batch size must be selected to ensure the training effect of the model. Experimental analysis shows that a batch size of 16 gives the best convergence accuracy and the best training effect. The experimental results are shown in table 5 and fig. 10. Table 5 is a selection table of batch data sizes.
TABLE 5
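As a non-limiting illustration of what the batch size controls, the following Python sketch splits a data set into shuffled mini-batches. The helper names `iter_batches` and `steps_per_epoch` and the data set size of 100 are assumptions made for this sketch and do not come from the embodiment.

```python
import math
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(samples: Sequence[T], batch_size: int,
                 rng: random.Random) -> Iterator[List[T]]:
    """Yield shuffled mini-batches; the last batch may be smaller."""
    order = list(range(len(samples)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of parameter updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


rng = random.Random(0)
data = list(range(100))  # hypothetical data set of 100 sample indices
batches = list(iter_batches(data, batch_size=16, rng=rng))
print(len(batches), [len(b) for b in batches])
print(steps_per_epoch(len(data), 16))
```

A smaller batch size means more, noisier updates per pass over the data; a larger batch size means fewer, smoother updates, which is the trade-off the batch size experiment explores.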
The learning rate determines whether and when the objective function converges to a local minimum; a suitable learning rate lets the objective function converge to a local minimum within a reasonable time. Too large a learning rate may cause the loss to explode or become NaN, while too small a learning rate may leave the loss almost unchanged for a long time. In this embodiment, several fixed learning rates are tried, the relation between the number of iterations and the loss is observed, and the learning rate with the fastest decrease in loss is identified. Experimental analysis shows that with a learning rate of 0.0001 the model performs best and the loss decreases fastest. The experimental results are shown in table 6, fig. 11 and fig. 12. Table 6 is a selection table of learning rates.
TABLE 6
Learning rate | Accuracy | Loss value | Kappa coefficient | Hamming distance
0.01 | 0.3770 | 1.0600 | 0.00 | 0.6230
0.001 | 0.8696 | 0.6178 | 0.7951 | 0.1304
0.0001 | 0.9002 | 0.3036 | 0.8438 | 0.0998
0.00001 | 0.8867 | 0.3721 | 0.8222 | 0.1133
0.000001 | 0.5142 | 0.9329 | 0.2133 | 0.4858
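As a non-limiting illustration of the behaviour described above, the following Python sketch runs plain gradient descent on the toy objective f(w) = w², which is an assumption made for this example and is unrelated to the actual loss of the model. The learning rates used here are therefore not comparable with those in Table 6; only the qualitative pattern (divergence, fast convergence, stagnation) carries over.

```python
from typing import List


def gradient_descent(lr: float, steps: int = 50, w0: float = 1.0) -> List[float]:
    """Minimise f(w) = w**2 with a fixed learning rate and
    return the loss observed after each update."""
    w = w0
    losses = []
    for _ in range(steps):
        w -= lr * 2.0 * w  # the gradient of w**2 is 2w
        losses.append(w * w)
    return losses


too_large = gradient_descent(lr=1.1)     # each step overshoots: loss explodes
suitable = gradient_descent(lr=0.3)      # loss shrinks quickly towards 0
too_small = gradient_descent(lr=0.0001)  # loss barely moves in 50 steps

for name, losses in [("too large", too_large),
                     ("suitable", suitable),
                     ("too small", too_small)]:
    print(f"{name:>9}: first={losses[0]:.4g} last={losses[-1]:.4g}")
```

Comparing the loss curves of several fixed learning rates in this way is the same procedure the embodiment applies to the real model when it selects the rate with the fastest decrease in loss.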
The final hyperparameter settings of the model of this embodiment are shown in Table 7 below.
TABLE 7
Hyperparameter | Value
Word vector dimension | 128
Convolution kernel size | 3
Convolution kernel number | 250
BiLSTM hidden layer size | 64
Maximum input text length | 200
Number of iterations | 16
Random inactivation value | 0.45
Batch data size | 16
Learning rate | 0.0001
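As a non-limiting illustration, the settings of Table 7 can be collected in a single configuration object. The class name `ABCNNConfig` and all field names are hypothetical. The two derived properties rest on assumptions that the embodiment does not state explicitly: that the convolution uses stride 1 without padding, and that the forward and backward BiLSTM outputs are concatenated.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ABCNNConfig:
    """Hyperparameter values copied from Table 7 (field names are hypothetical)."""
    embedding_dim: int = 128       # word vector dimension
    kernel_size: int = 3           # convolution kernel size
    num_filters: int = 250         # number of convolution kernels
    lstm_hidden_size: int = 64     # BiLSTM hidden layer size, per direction
    max_len: int = 200             # maximum input text length
    iterations: int = 16           # number of training iterations
    dropout: float = 0.45          # random inactivation value
    batch_size: int = 16           # batch data size
    learning_rate: float = 1e-4    # learning rate

    @property
    def conv_output_len(self) -> int:
        # Assumption: stride 1 and no padding, so each window covers
        # kernel_size consecutive words.
        return self.max_len - self.kernel_size + 1

    @property
    def bilstm_output_dim(self) -> int:
        # Assumption: forward and backward outputs are concatenated.
        return 2 * self.lstm_hidden_size


config = ABCNNConfig()
print(asdict(config))
print(config.conv_output_len, config.bilstm_output_dim)
```

Keeping the values in one immutable object makes it straightforward to rerun a sweep such as those in Tables 3 to 6 by creating a modified copy with `dataclasses.replace`.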
(5) Model comparison
To evaluate the performance of the model proposed in this embodiment, 8 previously proposed deep learning models were selected for comparison experiments, as shown in table 8. Table 8 is a comparison table of the performance of different deep learning models.
TABLE 8
Method | Accuracy | Kappa coefficient | Hamming distance
BiGRU | 0.9004 | 0.8441 | 0.0996
ATT+CNN | 0.9125 | 0.8629 | 0.0875
ATT+Bi-LSTM | 0.8976 | 0.8397 | 0.1024
CNN | 0.8966 | 0.8384 | 0.1034
LSTM+CNN | 0.8791 | 0.8103 | 0.1209
ATT+LSTM+CNN | 0.9016 | 0.8503 | 0.0938
CNN+Bi-LSTM | 0.9073 | 0.8555 | 0.0927
CNN+BiGRU | 0.8976 | 0.8402 | 0.1024
Model of this embodiment | 0.9151 | 0.8673 | 0.0848
The analysis of table 8 shows that the model proposed in this embodiment achieves the best accuracy, Kappa coefficient and Hamming distance among the compared models, because it introduces an attention mechanism and a bidirectional long short-term memory network (BiLSTM) on top of a convolutional neural network. When text information is extracted, the BiLSTM combines the forward and backward information of the input text sequence, so the extracted features carry both past and future context; accordingly, the CNN+Bi-LSTM model outperforms the single CNN model, with accuracy improved by 1.07 percentage points. The attention mechanism considers the whole text, focuses the model on words or sentences related to emotion, and disregards text without emotional color. Introducing the attention mechanism thus further improves performance, so that the model of this embodiment outperforms the other deep learning models.
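As a non-limiting illustration, the three evaluation indexes used in the comparison can be computed from true and predicted labels as sketched below. The sketch assumes single-label classification, in which case the Hamming distance is the fraction of misclassified samples, that is, 1 minus the accuracy. The function names and the six example labels are assumptions made for this sketch and are not experimental data.

```python
from collections import Counter
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_distance(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_chance = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_chance == 1.0:
        return 1.0  # degenerate case: a single class, predicted perfectly
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical labels: 0 = positive, 1 = neutral, 2 = negative.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

print(accuracy(y_true, y_pred))
print(hamming_distance(y_true, y_pred))
print(cohen_kappa(y_true, y_pred))
```

Unlike accuracy, the Kappa coefficient discounts agreement that a classifier would reach by chance given the label frequencies, which is why it is reported alongside accuracy for a multi-class task.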
As can be seen from the confusion matrix of fig. 13, the prediction accuracy for the positive, neutral and negative labels on the test set reached 87.42%, 90.30% and 95.83%, respectively. In particular, the prediction accuracy for both neutral and negative emotion exceeds 90%, which shows that the model of this embodiment performs well in multi-classification emotion analysis tasks.
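As a non-limiting illustration, per-label prediction accuracy can be read from a confusion matrix as sketched below. The sketch assumes that rows correspond to true labels and columns to predicted labels, so that the per-label accuracy is the diagonal entry divided by the row sum. The counts are hypothetical and are not the values of fig. 13.

```python
from typing import List, Sequence


def per_class_accuracy(confusion: Sequence[Sequence[int]]) -> List[float]:
    """For each true label (row), return the share of its samples that
    were predicted correctly (diagonal entry divided by the row sum)."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


labels = ["positive", "neutral", "negative"]

# Hypothetical counts: rows are true labels, columns are predicted labels.
confusion = [
    [90, 8, 2],    # true positive reviews
    [5, 85, 10],   # true neutral reviews
    [1, 4, 95],    # true negative reviews
]

for label, acc in zip(labels, per_class_accuracy(confusion)):
    print(f"{label}: {acc:.2%}")
```

Reporting the accuracy per label in this way shows whether a good overall accuracy hides a weak class, which matters for a three-class task where neutral reviews are typically the hardest to separate.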
As can be seen from the performance analysis of the different combined models in table 9 and fig. 14, when only the attention mechanism is used in the emotion analysis model, the model performs poorly: the accuracy is only 60.36% and the Kappa coefficient is 0.3731. When only BiLSTM is used, the model can process the context information of the text, and the accuracy, Kappa coefficient and Hamming distance all improve, but the model processes the whole text, so the time cost is excessive. When the attention mechanism is combined with BiLSTM, the model attends to text information in both directions and to sentences related to emotion, so its performance improves further. When the two are combined with CNN, the model of this embodiment is obtained: according to table 9, its accuracy is 1.85 percentage points higher than that of CNN alone, 31.15 percentage points higher than that of ATT alone, and 0.78 percentage points higher than that of CNN+BiLSTM (0.26 percentage points higher than that of ATT+CNN), so the feature extraction and classification capability of the model is further improved and the model achieves the best performance. Table 9 is a comparison table of ablation experimental models.
TABLE 9
Method | Accuracy | Kappa coefficient | Hamming distance
CNN | 0.8966 | 0.8384 | 0.1034
ATT | 0.6036 | 0.3731 | 0.3964
BiLSTM | 0.8770 | 0.8068 | 0.1230
ATT+CNN | 0.9125 | 0.8629 | 0.0875
ATT+BiLSTM | 0.8976 | 0.8397 | 0.1024
CNN+BiLSTM | 0.9073 | 0.8555 | 0.0927
Model of this embodiment | 0.9151 | 0.8673 | 0.0848
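As a non-limiting illustration, the accuracy gains of the model of this embodiment over each ablation variant can be recomputed from Table 9 as sketched below. The accuracy values are copied from Table 9; the variable names are assumptions made for this sketch, and the gains are expressed in percentage points.

```python
# Accuracy of each ablation variant, copied from Table 9.
ablation_accuracy = {
    "CNN": 0.8966,
    "ATT": 0.6036,
    "BiLSTM": 0.8770,
    "ATT+CNN": 0.9125,
    "ATT+BiLSTM": 0.8976,
    "CNN+BiLSTM": 0.9073,
}
proposed_accuracy = 0.9151  # model of this embodiment, Table 9

# Gain of the proposed model over each variant, in percentage points.
gains = {
    name: round((proposed_accuracy - acc) * 100, 2)
    for name, acc in ablation_accuracy.items()
}

# List the variants from the largest gain to the smallest.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<11} +{gain:.2f} percentage points")
```

Ordering the variants by gain shows at a glance which missing component costs the most accuracy relative to the full model.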
(6) Conclusion
Emotion analysis is an important branch of natural language processing, and emotion analysis on e-commerce platforms is valued by many consumers and e-commerce websites, so it has high research value in practical applications. This embodiment proposes an AB-CNN model architecture that combines an attention mechanism and BiLSTM to improve the prediction accuracy of the multi-classification model. Words or sentences related to emotion are extracted by the attention mechanism, while contextual text information is acquired by the BiLSTM, which further strengthens the emotional features and makes the classification more accurate. Finally, in the comparison with previously proposed models, the model of this embodiment achieves the best experimental results, and the ablation experiments on different combinations of its components show that introducing the attention mechanism and the BiLSTM each improves the performance of the model to a different degree.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the scope of the embodiments of the present application, and are intended to be included within the scope of the present application.