CN115878804B

CN115878804B - E-commerce evaluation multi-classification emotion analysis method based on AB-CNN model

Info

Publication number: CN115878804B
Application number: CN202211697001.7A
Authority: CN
Inventors: 李红婵; 路延通; 钱宇超; 李春磊; 杨文贺; 朱颢东
Original assignee: Zhengzhou University of Light Industry
Current assignee: Zhengzhou Zhiduoxin Technology Co ltd
Priority date: 2022-12-28
Filing date: 2022-12-28
Publication date: 2023-06-20
Anticipated expiration: 2042-12-28
Also published as: CN115878804A

Abstract

The invention relates to an e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model, which combines an attention mechanism, a bidirectional long short-term memory network (Bi-LSTM) and a convolutional neural network (CNN), abbreviated as the AB-CNN model. The method first obtains word vectors and loads them as a word embedding layer feeding a convolutional layer, which performs a convolution operation on the initial word vectors. An attention signal is then obtained with the attention mechanism and fused with the initial word vectors to give attention-fused target word vectors. A Bi-LSTM model then reads the text in both directions simultaneously, making full use of the context information at each time step and further strengthening the emotional features. Finally, the resulting target text feature vector of each word is fed into a softmax function for classification to obtain the emotion classification result. The method can improve the accuracy of emotion analysis on e-commerce platforms.

Description

E-commerce evaluation multi-classification emotion analysis method based on AB-CNN model

Technical Field

The invention relates to an e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model.

Background

With the explosive growth of Internet big data, many e-commerce platforms have emerged, and grasping the emotion polarity of consumers has become a hot topic. It can not only guide enterprises in formulating marketing strategies but also help them accurately anticipate which products will sell well in the future.

Emotion analysis, also called emotional tendency analysis or opinion mining, is the process of extracting information from user opinions: by analyzing text, audio, images and so on, people's views, attitudes and emotions are obtained. Emotion analysis of text refers to analyzing, processing, summarizing and reasoning about emotionally colored text. With the development of the Internet, a large amount of text carrying emotional polarity has appeared online, and researchers have gradually moved from analyzing individual emotion words to analyzing more complex emotional sentences and articles. Emotion analysis research can therefore be divided into three levels: word level, sentence level and document level. According to the type of text processed, it can also be divided into emotion analysis for social platforms, which mainly processes comments on social platforms, and emotion analysis for e-commerce platforms, which mainly processes product reviews. For example, "This phone is good value for money and runs smoothly" shows that the consumer is satisfied with the product; "The phone feels just so-so overall!" indicates that the consumer moderately approves of the product; "This phone is rubbish, it lags terribly!" indicates that the consumer is dissatisfied with the product. Emotion analysis of e-commerce reviews helps consumers quickly learn a product's public reputation, so it is favored by many consumers and e-commerce websites, while emotion analysis of social platform comments is widely used in public opinion monitoring and information prediction.

Text emotion analysis has evolved from emotion dictionary methods to machine learning methods and, today, deep learning methods, and it remains a hot research topic. Its development has mainly focused on three approaches: methods based on emotion dictionaries and rules, methods based on traditional machine learning, and methods based on deep learning.

Emotion dictionary-based methods use an emotion dictionary to obtain the emotion values of the emotion words in a document and then determine the document's overall emotional tendency through weighted calculation. For example, Zargari H et al. proposed an N-Gram emotion dictionary method based on global intensifier words, which increases the emotion phrases covered by the dictionary by considering the relationships between multiple intensifiers and emotion words; Yan et al. manually constructed a comprehensive and efficient polarity dictionary consisting of a basic dictionary, a negation dictionary, a degree-adverb dictionary and a transition-word dictionary, and obtained good experimental results; Liang et al. constructed a broad-coverage microblog emotion dictionary tailored to the complex, diverse and colloquial nature of microblog text, and determined the emotional tendency of an entire microblog post from the final computed score; Zhang et al. trained on microblog text using established dictionaries such as degree-adverb, Internet-slang and negation-word dictionaries to obtain updated emotion values.
Xiao et al. proposed a Chinese microblog emotion analysis strategy that effectively analyzes emotional tendencies in microblogs, identifying and labeling Internet slang by constructing an Internet-slang classifier, which further improved classification accuracy; Xu et al. built an extended emotion dictionary containing basic emotion words, scene-specific emotion words and polysemous emotion words, further improving text emotion classification; Thien et al. proposed a Vietnamese emotion dictionary containing over 100,000 emotion words; Zhao et al. proposed a new emotion word recognition method that combines multi-feature linear fusion with a multi-cycle strategy, uses existing dictionaries for recognition, and builds a general-purpose emotion word dictionary suitable for cross-domain use.

Machine learning-based emotion analysis methods use a large amount of labeled or unlabeled corpora, extract features with traditional machine learning algorithms, and finally output the emotion analysis result. For example, Guangahan M proposed a microblog emotion mining method based on word2vec and a support vector machine (SVM): the trained word2vec vectors are weighted, the frequencies of different words in the microblog corpus are counted, and emotion analysis is finally performed with the SVM; Xue et al. constructed an emotion classifier based on the naive Bayes principle and used it to analyze the emotional tendency of test text; by comparing two machine learning methods, SVM and naive Bayes (NB), Wawre et al. concluded that for large-scale datasets naive Bayes achieves higher classification accuracy than SVM; Kamal et al. proposed an emotion analysis system that combines rules with machine learning to identify emotion polarity; Rathor et al. weighted the features and then compared three machine learning algorithms, SVM, NB and maximum entropy (ME), finding that all three achieve good classification results. Zhang et al. proposed an AE-SVM-based algorithm and achieved good results on a high-dimensional dataset for analyzing employees' emotions toward their companies.

Owing to the great success of deep learning in the image domain, deep learning-based emotion analysis has also come into wide use. Current deep learning models include convolutional neural networks (CNN), long short-term memory networks (LSTM), bidirectional long short-term memory networks (Bi-LSTM), recurrent neural networks (RNN), attention mechanisms and so on. For example, Teng et al. proposed a multi-dimensional topic classification model built from LSTM networks that can process vectors, arrays and high-dimensional data; Yin et al. fed the obtained data into a BiGRU layer for feature enhancement through stacking and reuse, and this continual enhancement gave faster convergence and better results than other classification models; He et al. extracted word-embedding features and sequence features from word-based vectors, fused the two as input to an SVM, and finally determined the emotion polarity of the text; Zeng et al. proposed a PosATT-LSTM model that considers both the importance of context words and their positional relationships; another method combines word2vec with a stacked bidirectional LSTM (stacked Bi-LSTM) model and was shown to outperform other machine learning models; Su et al. proposed an emoticon-attention-based convolutional neural network (AEB-CNN) that combines emoticons and the attention mechanism with a CNN, improving the accuracy of emotion analysis.

Therefore, emotion polarity analysis for e-commerce platforms is a hot topic of great research value and challenge, with important applications in formulating enterprise marketing strategies and in studying and assessing public opinion. At present, most emotion analysis research is limited to binary classification, and there is comparatively little research on three or more classes. Although binary emotion analysis can achieve higher prediction accuracy, users' emotions in real life are not purely positive or negative and may carry richer shades, such as happiness, anger, excitement, sadness and disgust. The accuracy of emotion analysis on current e-commerce platforms therefore still has considerable room for improvement.

Disclosure of Invention

In view of the above, the present invention provides an e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model in order to solve the above technical problems.

The invention adopts the following technical scheme:

An e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model comprises the following steps:

acquiring an initial text sequence;

converting the initial text sequence into a corresponding initial word vector;

performing convolution operation on the initial word vector to obtain new features of each word in the initial text sequence to form a text feature matrix;

Based on an attention mechanism, processing the text feature matrix to obtain an attention signal, and performing attention fusion with the initial word vector to obtain a target word vector after attention fusion;

based on a Bi-LSTM model, extracting forward output features and backward output features of the target word vector to obtain a target text feature vector of each word containing the forward output features and the backward output features;

and classifying the obtained target text feature vector of each word, taken as the input of a softmax function, to obtain a final emotion classification result.

Further, converting the initial text sequence into a corresponding initial word vector includes:

converting the initial text sequence into the corresponding initial word vector using a word2vec word vector model.

Further, performing the convolution operation on the initial word vector to obtain new features of each word in the initial text sequence and forming a text feature matrix includes:

the initial word vectors form a $t \times l$ word vector matrix, and $t$ convolution filters of length $l$, with weight $D$, perform a convolution operation on this input $t \times l$ word vector matrix; the new feature of the $i$-th word in the initial text sequence is:

$$z_i = f(D^{T} \cdot x_{i:i+t-1} + b)$$

where $b$ is a bias term, $D^{T}$ is the weight, and $f$ is the nonlinear ReLU function;

the text feature expression is obtained as follows:

$$Z = [z_1, z_2, \ldots, z_{n-t+1}]$$

a max-pooling operation is then applied, and the maximum value $\hat{z} = \max\{Z\}$ is taken as the feature of the convolution filter; the text feature matrix is formed as:

$$Y = [Z_1, Z_2, \ldots, Z_n]$$

Further, processing the text feature matrix based on the attention mechanism to obtain an attention signal, and performing attention fusion with the initial word vector to obtain an attention-fused target word vector, includes:

an attention mechanism is introduced on the text feature matrix: let the query vector for each input text feature $Z_i$ be $q$; the attention distribution coefficient $\alpha_i$ of each text feature $Z_i$ is then calculated as:

$$\alpha_i = \mathrm{softmax}(s(Z_i, q)) = \frac{\exp(s(Z_i, q))}{\sum_{j=1}^{n} \exp(s(Z_j, q))}$$

where $i \in [1, n]$, $j$ is the summation index of the softmax function, and $s(Z_i, q)$ is the attention scoring function, computed here with the additive model:

$$s(Z_i, q) = V^{T} \tanh(W Z_i + U q)$$

a weighted-average attention signal is obtained:

$$\mathrm{att} = \sum_{i=1}^{n} \alpha_i Z_i$$

the weighted-average attention signal is mapped onto the initial word vectors and fused with them in the following manner to obtain the attention-fused target word vector:

$$\omega_i = \mu_1 x_i + \mu_2 \, \mathrm{att}$$

where $\omega_i$ is the $i$-th target word vector, $\mu_1$ is the weight of the initial word vector, $\mu_2$ is the weight of the attention signal, and $x_i$ is the $i$-th initial word vector.
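The attention signal and fusion step can be sketched in plain Python. This is a minimal illustration, not the patented implementation: it assumes the fusion is a weighted sum of the initial word vector $x_i$ (weight $\mu_1$) and the attention signal (weight $\mu_2$), as the definitions of $\mu_1$ and $\mu_2$ suggest, and that the attention signal has already been mapped to the word-vector dimension. The function names are hypothetical.

```python
from typing import List

Vector = List[float]


def weighted_attention_signal(alphas: List[float], features: List[Vector]) -> Vector:
    """Weighted average of the text features: sum_i alpha_i * Z_i."""
    dim = len(features[0])
    signal = [0.0] * dim
    for alpha, z in zip(alphas, features):
        for k in range(dim):
            signal[k] += alpha * z[k]
    return signal


def fuse(word_vectors: List[Vector], signal: Vector, mu1: float, mu2: float) -> List[Vector]:
    """Assumed fusion rule: omega_i = mu1 * x_i + mu2 * signal (signal already mapped to word-vector size)."""
    return [[mu1 * x + mu2 * s for x, s in zip(x_i, signal)] for x_i in word_vectors]


# Usage: two features, attention weights summing to 1, equal fusion weights.
signal = weighted_attention_signal([0.25, 0.75], [[1.0, 0.0], [0.0, 1.0]])
targets = fuse([[1.0, 1.0], [2.0, 0.0]], signal, mu1=0.5, mu2=0.5)
print(signal)   # [0.25, 0.75]
print(targets)  # [[0.625, 0.875], [1.125, 0.375]]
```

Keeping $\mu_1$ and $\mu_2$ as explicit arguments makes it easy to see how much of each target word vector comes from the original embedding versus the attention signal.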

Further, extracting the forward and backward output features of the target word vector based on the Bi-LSTM model, to obtain for each word a target text feature vector containing both the forward and backward output features, includes:

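based on the Bi-LSTM model, the $i$-th target text feature vector of the target word vector at time $t$ is:

$$h_{i,t} = [\overrightarrow{h}_{i,t} ; \overleftarrow{h}_{i,t}]$$

where $h_{i,t}$ is the $i$-th target text feature vector of the target word vector at time $t$, formed by concatenating the forward LSTM output $\overrightarrow{h}_{i,t}$ and the backward LSTM output $\overleftarrow{h}_{i,t}$ at time $t$.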

Further, classifying the obtained target text feature vector of each word as the input of the softmax function to obtain a final emotion classification result includes:

classifying the obtained target text feature vector of each word as the input of the softmax function to obtain the final emotion classification result, where the softmax layer computes:

$$y = \mathrm{softmax}(W_c M + b_c)$$

where $M$ is the input formed from the target text feature vectors, $W_c$ is the weight matrix, and $b_c$ is the bias term.
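The classification layer $y = \mathrm{softmax}(W_c M + b_c)$ can be sketched as follows. As an assumption for illustration, $M$ is treated as a single flattened feature vector and $W_c$ has one row per emotion class; the function names and toy numbers are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(W_c: List[List[float]], M: List[float], b_c: List[float]) -> Tuple[List[float], int]:
    """Return class probabilities softmax(W_c M + b_c) and the index of the most likely class."""
    logits = [sum(w * m for w, m in zip(row, M)) + b for row, b in zip(W_c, b_c)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda k: probs[k])
    return probs, best


# Usage with three hypothetical emotion classes and a 2-dimensional feature vector.
probs, label = classify([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, 1.0], [0.0, 0.0, 0.0])
print(label)  # 2
```

Subtracting the largest logit leaves the softmax output unchanged but avoids overflow in `math.exp` for large scores.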

The beneficial effects of the invention include the following. The invention classifies the emotional features of text by combining an attention mechanism, a bidirectional long short-term memory network and a convolutional neural network (Attention + BiLSTM + CNN), referred to as the AB-CNN model. The method first obtains word vectors and extracts features, vectorizing sentences so that words are represented by high-dimensional vectors. The word vectors are loaded as a word embedding layer into the convolutional layer, which performs a convolution operation on the initial word vectors to obtain new features for each word in the initial text sequence, capturing the important text features and forming a text feature matrix. The text feature matrix is processed with the attention mechanism, which computes the average attention weight of the emotion words in each text to obtain an attention signal; this signal is fused with the initial word vectors to give attention-fused target word vectors. The Bi-LSTM model then extracts forward and backward output features from the target word vectors, producing for each word a target text feature vector that contains both. Because the text is read in both directions simultaneously, all of the context information of each word in the initial text sequence is used, further strengthening the average attention weight of each emotion word in each text. Finally, the target text feature vector of each word is fed into the softmax function for classification to obtain the emotion classification result, which can improve the accuracy of emotion analysis on e-commerce platforms.

Drawings

In order to more clearly illustrate the technical solution of the embodiments of the present invention, the following briefly describes the drawings that are required to be used in the embodiments:

FIG. 1 is a schematic overall flow chart of an e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model provided by an embodiment of the application;

FIG. 2 is a schematic diagram of the attention mechanism;

FIG. 3 is a block diagram of a bidirectional long short-term memory network;

FIG. 4 is a block diagram of a convolutional neural network;

FIG. 5 is a diagram of the AB-CNN model structure;

FIG. 6 is a sentence length and frequency statistics plot;

FIG. 7 is a graph of cumulative distribution function of sentence length;

FIG. 8 is a chart for selecting the number of iterations;

FIG. 9 is a graph for selecting the dropout value;

FIG. 10 is a graph for selecting the batch size;

FIG. 11 is a chart for selecting the learning rate;

FIG. 12 is a graph of the relationship between loss value and learning rate;

FIG. 13 is a diagram of the test set confusion matrix;

FIG. 14 is a comparison of the models in the ablation experiment.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".

In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.

Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

In order to explain the technical solutions described in the present application, the following description will be given by way of specific embodiments.

As shown in FIG. 1, the embodiment provides an E-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model. First, attention mechanisms (Attention), bi-directional long-short-term memory networks (Bi-LSTM), and Convolutional Neural Networks (CNN) are described.

The attention mechanism in deep learning is modeled on human visual attention and was originally proposed by Treisman et al. in 1980. Its physiological basis is that, when observing the external environment, a person rapidly scans the whole scene, quickly locks onto a target area of interest based on the brain's processing of signals, and finally forms a focus of attention, thereby acquiring more detailed information while suppressing other, useless information.

When processing text tasks in NLP, the attention mechanism can be used to focus more attention on the text content that matters, which can speed up the model, reduce its complexity and save training time, while also further improving prediction accuracy. In the emotion analysis task, introducing an attention layer into the CNN lets the model focus on words or sentences related to emotion and discard other text that carries no emotional coloring.

The essence of the attention mechanism is an addressing process. Given an input text sequence X and a query vector q, the role of the query vector is to find the important information in X. The query is performed over the whole text sequence X: each word contributes some attention when text content is extracted, and words carrying emotional coloring receive more of it.

Therefore, the position of each word in the text must be known during the query, so an attention variable $U \in [1, N]$ is defined to represent the index of the selected information. $U = i$ indicates that the $i$-th word of the text sequence X is selected; the calculation process is shown in FIG. 2. This process reflects how the attention mechanism reduces model complexity: instead of feeding all of the text content into the model for training, only the words or sentences related to emotion are selected from X as input.

The attention mechanism can be divided into three steps: first, inputting the text information; second, calculating the attention distribution weights $\alpha$; third, calculating the weighted average of the input information. The specific steps are as follows:

(1) Text information input: $X = [X_1, X_2, \ldots, X_N]$ represents the N input text items;

(2) Attention weight coefficient calculation: the attention weight coefficient between the $i$-th word and $q$ is:

$$\alpha_i = P(U = i \mid X, q) = \mathrm{softmax}(s(X_i, q)) = \frac{\exp(s(X_i, q))}{\sum_{j=1}^{N} \exp(s(X_j, q))}$$

where $\alpha_i$ is called the attention coefficient and $s(X_i, q)$ is the attention scoring function, which is mainly computed in one of the following ways:

additive model: $$s(X_i, q) = V^{T} \tanh(W X_i + U q)$$

dot-product model: $$s(X_i, q) = X_i^{T} q$$

scaled dot-product model: $$s(X_i, q) = \frac{X_i^{T} q}{\sqrt{d}}$$

bilinear model: $$s(X_i, q) = X_i^{T} W q$$

where $W$, $U$ and $V$ are learnable parameters of the network model and $d$ is the dimension of the input word vector.

(3) Attention-weighted average: the attention coefficient $\alpha_i$ can be understood as the degree to which the $i$-th item is attended to when the query vector is $q$; the input text information X is then encoded as the weighted average:

$$\mathrm{att}(X, q) = \sum_{i=1}^{N} \alpha_i X_i$$
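The three steps above (scores, softmax coefficients, weighted average) can be sketched in plain Python. The four scoring functions follow the additive, dot-product, scaled dot-product and bilinear formulas listed in step (2); the helper names, the choice of scorer in the usage line and the toy vectors are hypothetical and only illustrate the computation.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in A]


# Scoring functions s(x, q)
def additive(x: Vector, q: Vector, W: Matrix, U: Matrix, V: Vector) -> float:
    """V^T tanh(W x + U q)."""
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W, x), matvec(U, q))]
    return dot(V, hidden)


def dot_product(x: Vector, q: Vector) -> float:
    return dot(x, q)


def scaled_dot_product(x: Vector, q: Vector) -> float:
    """x^T q / sqrt(d), where d is the vector dimension."""
    return dot(x, q) / math.sqrt(len(x))


def bilinear(x: Vector, q: Vector, W: Matrix) -> float:
    """x^T W q."""
    return dot(x, matvec(W, q))


def attention(X: List[Vector], q: Vector,
              score: Callable[[Vector, Vector], float]) -> Tuple[List[float], Vector]:
    """Return alpha_i = softmax(s(X_i, q)) and att(X, q) = sum_i alpha_i * X_i."""
    scores = [score(x, q) for x in X]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    context = [sum(a * x[k] for a, x in zip(alphas, X)) for k in range(len(X[0]))]
    return alphas, context


# Usage: a query pointing at the first item puts most of the weight there.
alphas, context = attention([[1.0, 0.0], [0.0, 1.0]], [5.0, 0.0], scaled_dot_product)
print([round(a, 3) for a in alphas])
```

The additive and bilinear scorers take extra parameter matrices, so they can be passed to `attention` through a closure such as `lambda x, q: additive(x, q, W, U, V)`.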

The long short-term memory network (LSTM) is a variant of the recurrent neural network (RNN). In practice, simple RNNs were found to suffer from vanishing gradients, exploding gradients and a limited range over which information can be retained. LSTM was introduced to address these problems; its ability to retain information across time steps allows it to learn the relationship between the input text and its context quickly.

On the basis of a simple RNN, LSTM is improved in two ways:

(1) New internal state. LSTM introduces a new internal state $c_t \in \mathbb{R}^D$ dedicated to linear recurrent information transfer, while outputting information to the external state $h_t \in \mathbb{R}^D$ of the hidden layer. The internal and external states are computed as:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

$$h_t = o_t \odot \tanh(c_t)$$

where $f_t \in [0,1]^D$, $i_t \in [0,1]^D$ and $o_t \in [0,1]^D$ are three gates that control the path of information transfer, $\odot$ is the element-wise product, $c_{t-1}$ is the memory cell at the previous time step, and $\tilde{c}_t$ is the candidate state obtained by a nonlinear function:

$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$

At each time step $t$, the internal state $c_t$ of the LSTM network records the history information up to the current time.

(2) Gating mechanism. LSTM networks introduce a gating mechanism to control the path of information transfer. The three gates are the forget gate $f_t$, the input gate $i_t$ and the output gate $o_t$. Gate values lie in (0, 1), indicating the proportion of information allowed to pass through. The three gates are calculated as follows:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$

where $\sigma(\cdot)$ is the logistic function, $x_t$ is the input at the current time step, and $h_{t-1}$ is the external state at the previous time step.
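A single LSTM time step, following the gate, candidate-state and state-update formulas above, can be sketched in plain Python. The parameter layout (a dict mapping each gate name to a `(W, U, b)` triple) and the function names are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Params = Dict[str, Tuple[Matrix, Matrix, Vector]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """W x + U h + b."""
    return [wx + uh + bk for wx, uh, bk in zip(matvec(W, x), matvec(U, h), b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector, p: Params) -> Tuple[Vector, Vector]:
    f = [sigmoid(v) for v in affine(*p["f"], x_t, h_prev)]          # forget gate
    i = [sigmoid(v) for v in affine(*p["i"], x_t, h_prev)]          # input gate
    o = [sigmoid(v) for v in affine(*p["o"], x_t, h_prev)]          # output gate (uses h_{t-1})
    c_tilde = [math.tanh(v) for v in affine(*p["c"], x_t, h_prev)]  # candidate state
    c_t = [fk * ck + ik * ctk for fk, ck, ik, ctk in zip(f, c_prev, i, c_tilde)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Usage: with all-zero parameters every gate equals 0.5 and the candidate state is 0.
zero = ([[0.0]], [[0.0]], [0.0])
params = {g: zero for g in ("f", "i", "o", "c")}
h, c = lstm_step([1.0], [0.0], [1.0], params)
print(c)  # [0.5]
```

With zero weights the forget gate halves the previous cell state, which is a quick sanity check that $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ is wired correctly.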

A bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM) consists of two independent LSTMs. The input sequence is fed into the two LSTMs in forward and reverse order, respectively, for feature extraction, and the vector formed by concatenating the two output vectors is used as the final feature representation of each word. The structure of the BiLSTM model is shown in FIG. 3; the design idea is that the feature data obtained at time $t$ carries both past and future information. The output of the forward LSTM layer at time $t$ is denoted $\overrightarrow{h}_t$, and the output of the backward LSTM layer at time $t$ is denoted $\overleftarrow{h}_t$; the output at time $t$ is their concatenation:

$$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]$$

Experiments have shown that the BiLSTM model extracts text features more efficiently and performs better than a single LSTM. The parameters of the two LSTMs in a BiLSTM are independent of each other; they share only the word-embedding vector table.
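The bidirectional wiring (two independent recurrences, one over the reversed sequence, with outputs re-aligned and concatenated per position) can be sketched independently of the cell. To keep the example short, a toy running-sum step stands in for the LSTM cell; in practice each direction would use its own LSTM parameters. The function names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Step = Callable[[Vector, Vector], Vector]


def run_direction(step: Step, xs: List[Vector], h0: Vector) -> List[Vector]:
    """Apply a recurrent step over xs in order, collecting the state after each input."""
    h, outputs = h0, []
    for x in xs:
        h = step(x, h)
        outputs.append(h)
    return outputs


def bidirectional(step_fwd: Step, step_bwd: Step, xs: List[Vector], h0: Vector) -> List[Vector]:
    """Return [h_fwd_t ; h_bwd_t] for every position t."""
    fwd = run_direction(step_fwd, xs, h0)
    bwd = run_direction(step_bwd, xs[::-1], h0)[::-1]  # re-align backward outputs with positions
    return [f + b for f, b in zip(fwd, bwd)]  # list '+' concatenates the two vectors


# Usage: a running sum stands in for the LSTM cell in both directions.
running_sum: Step = lambda x, h: [h[0] + x[0]]
features = bidirectional(running_sum, running_sum, [[1.0], [2.0], [3.0]], [0.0])
print(features)  # [[1.0, 6.0], [3.0, 5.0], [6.0, 3.0]]
```

At the first position the forward half has seen only the first input while the backward half has seen the whole sequence, which is the "past and future information" property described above.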

CNNs (convolutional neural networks) are traditionally used in computer vision. The networks have become increasingly refined, growing from plain convolutional layers to include pooling, dropout, padding and other components. GoogLeNet, VGGNet and, best known of all, ResNet were subsequently proposed for image recognition, and these networks pushed the image classification accuracy of neural networks beyond the human level. Convolutional neural networks thus have efficient feature extraction and classification capabilities; when text is treated as a one-dimensional image, a CNN can be used to classify it. The model structure is shown in FIG. 4.

First, the input text sequence is represented by word vectors in the word embedding layer and used as the CNN input. Denoting the text matrix of the word embedding layer by X, $X = [X_1, X_2, \ldots, X_N]$.

Features of the text are then extracted by sliding convolution kernels over the original input text sequence. Let a convolution kernel be denoted by $k$; each kernel performs an n-gram convolution in sliding-window fashion with stride $s$, yielding $\lfloor (N-n)/s \rfloor + 1$ feature values per kernel, which equals $N - n + 1$ for $s = 1$. The pooling layer then selects the text features with the largest values and ignores the unimportant ones, giving the vector representation of the final text features.

Finally, classification is performed in the fully connected layer: the text features retained by the pooling layer are fully connected to the predicted class labels, all text features are combined to compute the probability of each class label, and the label with the highest probability is taken as the classification result.
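The number of n-gram windows each kernel produces in the sliding-window convolution step can be checked with a small helper. The helper names are hypothetical; the count $N - n + 1$ holds for stride $s = 1$, while a general stride gives $\lfloor (N-n)/s \rfloor + 1$.

```python
from typing import List


def window_starts(N: int, n: int, s: int = 1) -> List[int]:
    """Start index of every length-n window over N tokens with stride s."""
    return list(range(0, N - n + 1, s))


def num_windows(N: int, n: int, s: int = 1) -> int:
    """Closed-form count of the windows above: floor((N - n) / s) + 1."""
    return (N - n) // s + 1


# A 10-word comment scanned by a 3-gram kernel.
print(num_windows(10, 3))          # 8, i.e. N - n + 1
print(window_starts(10, 3, s=2))  # [0, 2, 4, 6]
```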

The e-commerce evaluation multi-classification emotion analysis method based on the AB-CNN model combines an attention mechanism (Attention), a bidirectional long short-term memory network (BiLSTM) and a convolutional neural network (CNN) into the AB-CNN model, which performs classification prediction on an e-commerce platform dataset.

The AB-CNN model structure is shown in FIG. 5, and mainly comprises: input layer, word embedding layer, convolution layer, dropout layer, attention layer, biLSTM layer, full connection layer, and output layer.

The E-commerce evaluation multi-classification emotion analysis method based on the AB-CNN model provided by the embodiment comprises the following steps:

Step S1: an initial text sequence is obtained.

Step S1 corresponds to the input layer of the AB-CNN model structure. The initial text sequence is the text sequence to be processed; it consists of $n$ words and is expressed as $x = [x_1, x_2, \ldots, x_n]$.

Step S2: converting the initial text sequence into a corresponding initial word vector:

Step S2 corresponds to the word embedding layer of the AB-CNN model structure. The initial text sequence is converted into corresponding initial word vectors; as a specific implementation, a word2vec word vector model is used, so the word embedding layer is a word2vec word vector embedding layer. The word vector dimension is set to 128 and the vectors are initialized, and each text is vectorized in the following form:

$$x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n$$

where $n$ is the length of each comment text sequence, each word is represented by an $h$-dimensional vector, $x_i \in \mathbb{R}^h$ is the vector representation of the $i$-th word in the sentence, and $\oplus$ is the concatenation operator.
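The lookup-and-concatenate step can be sketched with a tiny hand-written table standing in for trained word2vec vectors. The table contents, the 2-dimensional vectors (the text uses 128 dimensions) and the zero-vector fallback for words missing from the table are assumptions for illustration only.

```python
from typing import Dict, List

Vector = List[float]


def embed(tokens: List[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    """Look up each token; words missing from the table map to a zero vector (an assumption)."""
    return [table.get(tok, [0.0] * dim) for tok in tokens]


def concat(vectors: List[Vector]) -> Vector:
    """x_{1:n} = x_1 (+) x_2 (+) ... (+) x_n as one flat list."""
    return [v for vec in vectors for v in vec]


# Hypothetical 2-dimensional vectors for an already segmented comment.
table = {"phone": [0.9, 0.1], "smooth": [0.2, 0.8]}
matrix = embed(["phone", "runs", "smooth"], table, dim=2)
print(matrix)               # [[0.9, 0.1], [0.0, 0.0], [0.2, 0.8]]
print(len(concat(matrix)))  # 6, i.e. n * h
```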

The segmented text is used as input. The text sequence $x$ entering the word embedding layer forms an $n \times h$ vector matrix and is converted into low-dimensional word vectors (a t-dimensional word vector matrix). Through the embedding layer, the words are converted from text into numeric vectors.

Step S3: performing convolution operation on the initial word vector to obtain new features of each word in the initial text sequence, and forming a text feature matrix:

Step S3 corresponds to the convolution layer of the AB-CNN model structure.

The initial word vector forms a t-l dimension word vector matrix, and the matrix is formed by t convolution filters with the length of l

Performing convolution operation on the input t/l-dimensional word vector matrix, wherein the new feature of the ith word in the initial text sequence is as follows:

$$z_i = f(D^{T} \cdot x_{i:i+t-1} + b)$$

where b is a bias term, $D^{T}$ is the weight, and f is the nonlinear ReLU function. The filter is applied to every possible window of words in the sentence, $[x_{1:t}, x_{2:t+1}, \ldots, x_{n-t+1:n}]$, giving the text feature expression:

$$Z = [z_1, z_2, \ldots, z_{n-t+1}]$$

A max-pooling operation is then applied, and the maximum value $\hat{z} = \max\{Z\}$ is taken as the feature of the convolution filter, so that the most important feature, the one with the highest value, is captured. The text feature matrix is formed as:

$$Y = [Z_1, Z_2, \ldots, Z_n]$$

The convolutional layer outputs Y.

As a specific embodiment, a dropout layer is added after the convolutional layer to prevent overfitting.
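The convolution and max-pooling of Step S3 can be sketched in plain Python as below, assuming a single filter D that spans a window of t consecutive word vectors, so that z_i = ReLU(D^T · x_{i:i+t-1} + b), and assuming each filter contributes one pooled value. The function names and toy numbers are illustrative; a framework implementation would typically use a layer such as Keras `Conv1D` with 250 filters of size 3 (Table 7).

```python
def relu(v):
    return v if v > 0.0 else 0.0


def conv_feature_map(x, D, b, t):
    """Z = [z_1, ..., z_{n-t+1}] for one filter, with z_i = ReLU(D . x_{i:i+t-1} + b).

    x is an n x h matrix of word vectors and D a t x h filter, so the dot product
    runs over every element of the t-word window.
    """
    Z = []
    for i in range(len(x) - t + 1):
        window = x[i:i + t]
        s = sum(d * w for d_row, w_row in zip(D, window) for d, w in zip(d_row, w_row))
        Z.append(relu(s + b))
    return Z


def max_pool(Z):
    """Max-over-time pooling keeps the strongest response of the filter."""
    return max(Z)


def pooled_features(x, filters, biases, t):
    """One pooled value per filter (Table 7 uses 250 filters of size 3)."""
    return [max_pool(conv_feature_map(x, D, b, t)) for D, b in zip(filters, biases)]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]  # toy: n = 4 words, h = 2
D = [[1.0, 1.0], [1.0, -1.0]]                          # toy filter, window t = 2
Z = conv_feature_map(x, D, b=0.0, t=2)
print(Z, max_pool(Z))  # [0.0, 1.0, 1.0] 1.0
```

A window of t words over n words yields n - t + 1 features, which is why Z ends at $z_{n-t+1}$.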

Step S4: processing the text feature matrix with an attention mechanism to obtain an attention signal, and fusing the attention signal with the initial word vectors to obtain attention-fused target word vectors:

Step S4 corresponds to the attention layer of the AB-CNN model structure.

The convolutional layer extracts the important features of the text, and the attention layer then extracts the emotion-polarity words within those features, which saves running time and reduces model complexity. Attention is applied to the text feature matrix Y output by the convolutional layer: with q as the query vector for each input text feature $Z_i$, the attention distribution coefficient $\alpha_i$ of each text feature $Z_i$ is computed as:

$$\alpha_i = \mathrm{softmax}\big(s(Z_i, q)\big) = \frac{\exp\big(s(Z_i, q)\big)}{\sum_{j=1}^{n} \exp\big(s(Z_j, q)\big)}$$

where $i \in [1, 2, \ldots, n]$ and j is the summation index of the softmax function, running over all obtained text features $Z_j$ to compute the probability distribution of the i-th text, i.e., the weighting coefficient $\alpha_i$. $s(Z_i, q)$ is the attention scoring function, which can be an additive model, a dot-product model, a scaled dot-product model, or a bilinear model. This embodiment uses the additive model:

$$s(Z_i, q) = V^{T}\tanh(W Z_i + U q)$$

Given the context query vector q, the attention coefficient $\alpha_i$ expresses the degree to which the i-th piece of emotion information is attended to. Encoding the input text information P with these coefficients yields a weighted-average attention signal.

This attention signal is mapped onto the corresponding input word vector matrix $x_i$, giving a text matrix that carries the attention mechanism.

This matrix is then fused with the initial word vectors through a weighted attention fusion to obtain the attention-fused target word vectors, where $\omega_i$ is the i-th target word vector, $\mu_1$ is the weight of the original word vector, $\mu_2$ is the weight of the attention signal, $x_i$ is the i-th initial word vector, and $\omega = [\omega_1, \omega_2, \ldots, \omega_n]$.
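The attention step can be sketched as below, assuming the additive score s(Z_i, q) = V^T tanh(W Z_i + U q), a softmax over the scores to obtain α_i, and the weighted sum Σ α_i Z_i as the attention signal. The fusion formula itself is not legible in this document, so the rule ω_i = μ_1 x_i + μ_2 · signal, the values μ_1 = μ_2 = 0.5, the assumption that the signal has the same dimension as x_i, and all weights are hypothetical placeholders.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_score(z, q, W, U, V):
    """s(Z_i, q) = V^T tanh(W Z_i + U q)."""
    hidden = [math.tanh(a + c) for a, c in zip(matvec(W, z), matvec(U, q))]
    return sum(v_j * h_j for v_j, h_j in zip(V, hidden))


def attention_signal(Z, q, W, U, V):
    """Return the coefficients alpha_i and the weighted average sum_i alpha_i * Z_i."""
    alphas = softmax([additive_score(z, q, W, U, V) for z in Z])
    signal = [sum(a * z[k] for a, z in zip(alphas, Z)) for k in range(len(Z[0]))]
    return alphas, signal


def fuse(x, signal, mu1=0.5, mu2=0.5):
    """Hypothetical fusion omega_i = mu1 * x_i + mu2 * signal, applied to every word."""
    return [[mu1 * a + mu2 * s for a, s in zip(x_i, signal)] for x_i in x]


Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy text features Z_i
x = [[0.2, 0.1], [0.0, 0.3], [0.4, 0.4]]  # toy initial word vectors x_i
q = [0.5, -0.5]                            # toy query vector
W = U = [[1.0, 0.0], [0.0, 1.0]]           # identity weights, for readability only
V = [1.0, 1.0]

alphas, signal = attention_signal(Z, q, W, U, V)
omega = fuse(x, signal)
```

Because the α_i come from a softmax they sum to 1, so the signal is a convex combination of the text features.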

Step S5: extracting forward and backward output features of the target word vectors with a Bi-LSTM model to obtain, for each word, a target text feature vector containing both the forward and the backward output features:

Step S5 corresponds to the BiLSTM layer of the AB-CNN model structure.

The emotion-polarity text word vectors ω output by the attention layer serve as the input of the Bi-LSTM layer. The two LSTMs combine forward and backward information from the input sequence, further enriching the emotional coloring of the input text and improving the model's classification performance.

For the output at time t, the forward LSTM layer holds information from time t and earlier time steps of the input sequence, and the backward LSTM layer holds information from time t and later time steps.

This bidirectional design effectively improves accuracy. At time t, the Bi-LSTM produces a forward output and a backward output.

At time t, the Bi-LSTM combines the forward and backward outputs into the target text feature vector for the i-th target word vector, which carries emotional coloring.

The Bi-LSTM layer extracts the semantic information of the text sequence; its output is denoted M.
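The LSTM equations of this step did not survive extraction, so the sketch below uses the standard LSTM cell (input, forget and output gates plus a memory cell c_t updated from c_{t-1}) rather than the exact formulation of this method. It runs one LSTM left to right and another right to left over the target word vectors ω and concatenates the two hidden states at each time step, which is the usual meaning of a Bi-LSTM output. Weights are random placeholders and the names `LSTMCell` and `bilstm` are hypothetical.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class LSTMCell:
    """Standard LSTM cell with random placeholder weights (not trained values)."""

    def __init__(self, input_dim, hidden_dim, seed):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Gates: i (input), f (forget), o (output), g (candidate); each row spans [x_t; h_{t-1}].
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                      for _ in range(hidden_dim)] for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x_t, h_prev, c_prev):
        z = x_t + h_prev  # list concatenation [x_t; h_{t-1}]
        pre = {k: [sum(w * v for w, v in zip(row, z)) + bias
                   for row, bias in zip(self.W[k], self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # c_t from c_{t-1}
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, seq):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        outputs = []
        for x_t in seq:
            h, c = self.step(x_t, h, c)
            outputs.append(h)
        return outputs


def bilstm(seq, hidden_dim=64, seed=0):
    """Forward pass over seq, backward pass over reversed seq, concatenated per time step."""
    input_dim = len(seq[0])
    forward = LSTMCell(input_dim, hidden_dim, seed).run(seq)
    backward = LSTMCell(input_dim, hidden_dim, seed + 1).run(seq[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]


omega = [[0.1 * (k + t) for k in range(8)] for t in range(5)]  # toy: 5 words x 8 dims
H = bilstm(omega, hidden_dim=4)
print(len(H), len(H[0]))  # 5 8
```

Re-reversing the backward outputs aligns them with the forward outputs, so position t pairs "past up to t" with "future from t".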

Step S6: feeding the obtained target text feature vector of each word into the softmax function for classification to obtain the final emotion classification result:

Step S6 corresponds to the fully connected layer of the AB-CNN model structure.

The input text first passes through the embedding layer, where word2vec vectorizes it; the convolutional layer then extracts the important features of the text; the attention layer extracts the emotion-related semantic features; and the Bi-LSTM extracts the contextual information of the text, further strengthening the emotional coloring of the extracted semantic features to obtain a deeper semantic feature representation. Finally, the result M of the Bi-LSTM layer is fed into the softmax function for classification to obtain the final emotion classification result:

$$y = \mathrm{softmax}(W_c M + b_c)$$

where $W_c$ is the weight matrix and $b_c$ is the bias term.

The output layer outputs the final emotion classification result.
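The classification step y = softmax(W_c M + b_c) can be sketched as follows. The weight matrix, bias, feature vector and the label order (positive, neutral, negative) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math


def classify(M, W_c, b_c, labels=("positive", "neutral", "negative")):
    """y = softmax(W_c M + b_c); return the class probabilities and the predicted label."""
    logits = [sum(w * m for w, m in zip(row, M)) + b for row, b in zip(W_c, b_c)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # max-shift for numerical stability
    total = sum(exps)
    y = [e / total for e in exps]
    return y, labels[y.index(max(y))]


M = [0.2, -0.1, 0.4]                 # toy Bi-LSTM output vector
W_c = [[1.0, 0.0, 2.0],              # hypothetical 3 x 3 weight matrix
       [0.0, 1.0, 0.0],
       [-1.0, 0.0, -1.0]]
b_c = [0.0, 0.0, 0.0]
y, label = classify(M, W_c, b_c)
print(label)  # positive
```

Here the logits are 1.0, -0.1 and -0.6, so the first class receives the largest probability.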

With the implementation of the AB-CNN-based e-commerce evaluation multi-classification emotion analysis method described, its effectiveness is demonstrated below, covering dataset partitioning, evaluation metrics, and model parameter selection. The model's performance is then presented and compared with other existing deep learning models.

(1) Dataset introduction

The dataset used is a public e-commerce platform review dataset containing 21,091 samples in total: 8,033 positive (good) reviews, 4,355 neutral (average) reviews and 8,703 negative (bad) reviews. The split is shown in Table 1, which gives an overview of the dataset.

TABLE 1

Emotion category	Example review	Training set	Test set
Positive	"Great item, great seller"	6443	1590
Neutral	"The sound works well! But it also has drawbacks!"	3479	876
Negative	"Never shipped at all! What a waste of money!"	6951	1752
Total	-	16873	4218

(2) Data partitioning and training process

The dataset is divided into a training set and a test set at a ratio of 4:1. The model of this embodiment is trained on the Windows 10 operating system using a CPU (Intel(R) Core(TM) i7-5500U @ 2.40 GHz) with 16 GB of RAM. The programming language is Python 3.7, the development tool is PyCharm, the Chinese word segmentation tool is jieba 0.38, and the deep learning frameworks are TensorFlow 1.15.0 and Keras 2.3.1.
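A 4:1 split of 21,091 samples holds out round(21091 / 5) = 4,218 samples, leaving 16,873 for training, which matches the totals in Table 1. The sketch below is a plain shuffled split with a hypothetical seed; the text does not say whether the split was stratified, and the per-class counts in Table 1 are not exactly 80/20, so this does not reproduce the original partition.

```python
import random


def split_4_to_1(samples, seed=42):
    """Shuffle, then hold out one fifth (rounded) as the test set."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) / 5)
    return items[n_test:], items[:n_test]


train, test = split_4_to_1(range(21091))
print(len(train), len(test))  # 16873 4218
```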

(3) Evaluation metrics

Because this embodiment addresses a three-class classification problem, accuracy and two multi-class evaluation metrics, the Kappa coefficient and the Hamming distance, are selected.

Accuracy: reflects the model's ability to make correct decisions over the whole dataset, that is, on the test set, its ability to correctly identify positive, neutral and negative samples. It is the proportion of correctly classified samples among all samples.

In this embodiment, n = 3, corresponding to accuracy over three classes.

Kappa coefficient: the Kappa coefficient is a statistical measure of agreement. Its range is [-1, 1], and in practice it usually lies in [0, 1]. The higher the coefficient, the higher the classification accuracy achieved by the model. It is calculated as:

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

where $P_o$ is the overall classification accuracy.

$P_e$ is expressed as:

$$P_e = \frac{a_1 b_1 + a_2 b_2 + \cdots + a_c b_c}{N \times N}$$

where $a_i$ is the number of true samples of class i, $b_i$ is the number of samples predicted as class i, c is the number of classes, and N is the total number of samples.
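The Kappa formula above can be sketched directly from label lists: P_o is the share of matching labels, and P_e sums a_i · b_i over classes and divides by N². The function name is illustrative; scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic and could be used as a cross-check.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """kappa = (P_o - P_e) / (1 - P_e), with P_e = sum_i a_i * b_i / N^2."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # overall accuracy
    a = Counter(y_true)  # a_i: true samples per class
    b = Counter(y_pred)  # b_i: predicted samples per class
    p_e = sum(a[c] * b[c] for c in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined if every label is the same class (P_e = 1)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(cohen_kappa(y_true, y_pred))
```

In this toy case P_o = 5/6 and P_e = (2·2 + 2·1 + 2·3)/36 = 1/3, so κ = (5/6 - 1/3)/(2/3) = 0.75: agreement well above chance but short of perfect.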

Table 2 is the Kappa coefficient table.

TABLE 2

Hamming distance: the Hamming distance is also suitable for multi-class problems. It simply measures the distance between the predicted labels and the true labels, and its value lies between 0 and 1. A distance of 0 means the predictions are identical to the true labels, and a distance of 1 means the predictions are completely different from the true labels.
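For single-label multi-class predictions, the Hamming distance is the fraction of samples whose predicted label differs from the true label, so it equals 1 - accuracy; every row of Tables 3, 4 and 6 satisfies this (for example 0.9061 + 0.0939 = 1). The sketch below computes both; the function names are illustrative, and scikit-learn's `sklearn.metrics.hamming_loss` gives the same value for this case.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hamming_distance(y_true, y_pred):
    """Share of samples whose predicted label differs from the true label (0 = identical, 1 = all wrong)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = ["pos", "neu", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "neu"]
print(accuracy(y_true, y_pred), hamming_distance(y_true, y_pred))  # 0.6 0.4
```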

(4) Parameter selection

Choosing an appropriate input text length is the first issue to resolve. If the input length is too short, texts are truncated, their true emotion polarity cannot be captured, and the final performance of the model suffers. The input length cannot be too long either, because many zeros would then be padded after the word vectors, lowering the training accuracy of the model and affecting the final evaluation metrics. As shown in FIG. 6 and FIG. 7, most texts in the dataset are shorter than 200 and only a very small fraction are longer: texts shorter than 201 account for 94% of the dataset, and longer texts for only 6%. Considering both text length and frequency of occurrence, this embodiment selects 200 as the input text length.
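The length choice can be sketched as a pad-or-truncate step plus a coverage check. Following the text, zeros are appended after the sequence; keeping the first 200 tokens when truncating is an assumption. Note that Keras' `pad_sequences` defaults to `padding="pre"` and `truncating="pre"`, so reproducing this behaviour there would require `padding="post"` and `truncating="post"`. The helper names and toy lengths are illustrative.

```python
MAX_LEN = 200  # maximum input text length selected in the text


def pad_or_truncate(ids, max_len=MAX_LEN, pad_value=0):
    """Keep the first max_len tokens; append zeros to shorter sequences (post-padding)."""
    ids = list(ids[:max_len])
    return ids + [pad_value] * (max_len - len(ids))


def coverage(lengths, max_len=MAX_LEN):
    """Fraction of texts that fit within max_len without truncation."""
    return sum(n <= max_len for n in lengths) / len(lengths)


print(len(pad_or_truncate([5, 7, 9])), len(pad_or_truncate(list(range(300)))))  # 200 200
print(coverage([12, 80, 150, 199, 350]))  # 0.8
```

Applied to the real length distribution, `coverage` would give the 94% figure the text reports for a limit of 200.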

The number of iterations is one of the most important choices for the model: too many iterations cause overfitting, while too few leave the model insufficiently trained, so that it cannot reach its optimal state. The number of iterations therefore has to be selected carefully. As shown in FIG. 8 and Table 3, model performance begins to decline once the number of iterations exceeds 16, and it keeps improving as the number of iterations grows toward 16 without reaching its optimum below 16. Based on this experimental analysis, this embodiment selects 16 as the optimal number of iterations. Table 3 is the iteration-number selection table.

TABLE 3

Number of iterations	Accuracy	Kappa coefficient	Hamming distance
4	0.8367	0.7431	0.1633
8	0.8978	0.8397	0.1022
12	0.9033	0.8483	0.0967
16	0.9061	0.8528	0.0939
20	0.8917	0.8304	0.1083
24	0.8774	0.8082	0.1226
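The selection from Table 3 can be re-checked in a few lines: choosing the row with the highest accuracy, the highest Kappa, or the lowest Hamming distance each returns 16 iterations, and every row satisfies Hamming distance = 1 - accuracy. The values are copied from Table 3; the helper name `best_setting` is illustrative.

```python
# (number of iterations, accuracy, Kappa coefficient, Hamming distance), copied from Table 3
TABLE_3 = [
    (4, 0.8367, 0.7431, 0.1633),
    (8, 0.8978, 0.8397, 0.1022),
    (12, 0.9033, 0.8483, 0.0967),
    (16, 0.9061, 0.8528, 0.0939),
    (20, 0.8917, 0.8304, 0.1083),
    (24, 0.8774, 0.8082, 0.1226),
]


def best_setting(rows, metric=1, higher_is_better=True):
    """Return the first column of the row that optimises the chosen metric column."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[metric])[0]


print(best_setting(TABLE_3),
      best_setting(TABLE_3, metric=2),
      best_setting(TABLE_3, metric=3, higher_is_better=False))  # 16 16 16
```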

Overfitting easily occurs during model training. It manifests as follows: on the training data the loss is small and the prediction accuracy is high, whereas on the test data the loss is larger and the prediction accuracy is lower. To avoid overfitting, dropout is introduced. It strengthens the model's generalization ability because the model no longer relies too heavily on particular local features, and complex co-adaptations among neurons are reduced. Experimental analysis shows that the model performs best, and overfitting is prevented, with a dropout rate of 0.45. The experimental results are shown in Table 4 and FIG. 9. Table 4 is the dropout-rate selection table.

TABLE 4

Dropout rate	Accuracy	Kappa coefficient	Hamming distance
0.15	0.9045	0.8507	0.0955
0.25	0.8985	0.8411	0.1015
0.35	0.8988	0.8414	0.1012
0.45	0.9078	0.8555	0.0922
0.55	0.8940	0.8342	0.1060
0.65	0.8895	0.8264	0.1105
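The dropout described above can be sketched as inverted dropout, which is what Keras' `Dropout(rate)` layer does: during training each unit is zeroed with probability `rate` and the survivors are scaled by 1/(1 - rate), so the expected activation is unchanged, and at inference the layer passes inputs through untouched. The function below is an illustrative plain-Python version using the rate 0.45 from Table 4.

```python
import random

DROPOUT_RATE = 0.45  # value selected in Table 4


def dropout(values, rate=DROPOUT_RATE, training=True, rng=None):
    """Inverted dropout: zero each unit with probability `rate`, rescale survivors by 1/(1-rate)."""
    if not training or rate == 0.0:
        return list(values)  # identity at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


out = dropout([1.0] * 10, rng=random.Random(0))
```

Because a different random subset of units is dropped at each step, no unit can rely on a specific partner being present, which is the reduction in co-adaptation mentioned above.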

Intuitively, the batch size is the number of samples selected for one training step, and it affects both the degree and the speed of model optimization. Setting the batch size determines how many samples are processed at a time during training. An excessively large batch size makes the network prone to converging to poor local optima, so an appropriate batch size must be chosen to ensure a good training effect. Experimental analysis shows that a batch size of 16 gives the best convergence accuracy and the best training effect. The experimental results are shown in Table 5 and FIG. 10. Table 5 is the batch-size selection table.
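As a worked example of what a batch size of 16 means for this dataset: one pass over the 16,873 training samples (Table 1) takes ceil(16873 / 16) = 1,055 parameter updates, the last batch holding 16873 - 1054 × 16 = 9 samples. The sketch below assumes batches are taken in order without dropping the remainder; the generator name is illustrative.

```python
import math

BATCH_SIZE = 16     # value selected in the text
TRAIN_SIZE = 16873  # training-set size from Table 1


def iterate_minibatches(samples, batch_size=BATCH_SIZE):
    """Yield consecutive mini-batches; the final batch may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


batches = list(iterate_minibatches(list(range(TRAIN_SIZE))))
print(len(batches), math.ceil(TRAIN_SIZE / BATCH_SIZE), len(batches[-1]))  # 1055 1055 9
```

A smaller batch means more, noisier updates per pass; a larger one means fewer, smoother updates, which is the trade-off the selection above balances.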

TABLE 5

The learning rate determines whether and when the objective function converges to a local minimum; an appropriate learning rate lets it converge to a local minimum in a reasonable time. Too large a learning rate can make the loss explode or become NaN, while too small a learning rate can leave the loss barely changing for a long time. In this embodiment, different fixed learning rates were tried, the relationship between the number of iterations and the loss was observed, and the learning rate under which the loss decreased fastest was identified. Experimental analysis shows that with a learning rate of 0.0001 the model performs best and the loss decreases fastest. The experimental results are shown in Table 6, FIG. 11 and FIG. 12. Table 6 is the learning-rate selection table.

TABLE 6

Learning rate	Accuracy	Loss	Kappa coefficient	Hamming distance
0.01	0.3770	1.0600	0.00	0.6230
0.001	0.8696	0.6178	0.7951	0.1304
0.0001	0.9002	0.3036	0.8438	0.0998
0.00001	0.8867	0.3721	0.8222	0.1133
0.000001	0.5142	0.9329	0.2133	0.4858
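To make the "too large explodes, too small barely moves" argument concrete, the toy below runs fixed-step gradient descent on L(w) = w², whose gradient is 2w, so each step multiplies w by (1 - 2·lr). This is a one-parameter illustration, not the AB-CNN loss: the divergence threshold (lr > 1 here) is specific to this toy and says nothing about which rate suits the real model.

```python
def final_loss(lr, steps=50, w0=1.0):
    """Fixed-learning-rate gradient descent on the toy loss L(w) = w**2."""
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * w  # gradient of w**2 is 2w
    return w * w


for lr in (1.5, 0.1, 1e-4):
    print(f"lr={lr:g}: loss after 50 steps = {final_loss(lr):.3g}")
```

With lr = 1.5 the factor is -2 per step and the loss blows up; with lr = 0.1 it is 0.8 and the loss vanishes quickly; with lr = 0.0001 it is 0.9998 and the loss hardly moves in 50 steps.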

The final hyperparameter settings of the model of this embodiment are shown in Table 7 below.

TABLE 7

Hyperparameter	Value
Word vector dimension	128
Convolution kernel size	3
Number of convolution kernels	250
BiLSTM hidden layer size	64
Maximum input text length	200
Number of iterations	16
Dropout rate	0.45
Batch size	16
Learning rate	0.0001
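The Table 7 settings can be collected in one configuration object, sketched below with illustrative field names. Two derived shapes follow under stated assumptions: a stride-1 "valid" convolution over 200 tokens with kernel size 3 gives 200 - 3 + 1 = 198 positions (matching Z = [z_1, ..., z_{n-t+1}]), and concatenating the forward and backward hidden states of size 64 gives a 128-dimensional Bi-LSTM output per time step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ABCNNConfig:
    """Hyperparameter values from Table 7 (field names are illustrative)."""
    embedding_dim: int = 128    # word vector dimension
    kernel_size: int = 3        # convolution kernel size
    num_filters: int = 250      # number of convolution kernels
    lstm_hidden: int = 64       # BiLSTM hidden layer size (per direction, assumed)
    max_len: int = 200          # maximum input text length
    num_iterations: int = 16
    dropout_rate: float = 0.45
    batch_size: int = 16
    learning_rate: float = 1e-4

    @property
    def conv_output_length(self) -> int:
        # Assumes a stride-1 "valid" convolution: n - t + 1 positions.
        return self.max_len - self.kernel_size + 1

    @property
    def bilstm_output_dim(self) -> int:
        # Assumes the forward and backward hidden states are concatenated.
        return 2 * self.lstm_hidden


cfg = ABCNNConfig()
print(cfg.conv_output_length, cfg.bilstm_output_dim)  # 198 128
```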

(5) Model comparison

To evaluate the performance of the model of this embodiment, eight existing deep learning models were selected for comparative experiments, as shown in Table 8. Table 8 compares the performance of the different deep learning models.

TABLE 8

Method	Accuracy	Kappa coefficient	Hamming distance
BiGRU	0.9004	0.8441	0.0996
ATT+CNN	0.9125	0.8629	0.0875
ATT+Bi-LSTM	0.8976	0.8397	0.1024
CNN	0.8966	0.8384	0.1034
LSTM+CNN	0.8791	0.8103	0.1209
ATT+LSTM+CNN	0.9016	0.8503	0.0938
CNN+Bi-LSTM	0.9073	0.8555	0.0927
CNN+BiGRU	0.8976	0.8402	0.1024
This embodiment (AB-CNN)	0.9151	0.8673	0.0848

Table 8 shows that the model proposed in this embodiment achieves the best accuracy, Kappa coefficient and Hamming distance of all compared models, because it introduces an attention mechanism and a bidirectional long short-term memory network on top of a convolutional neural network. When extracting text information, the bidirectional LSTM combines forward and backward information from the input sequence, so the extracted features carry both past and future context; as a result, the CNN+Bi-LSTM model outperforms the single CNN model, improving accuracy by 1.07 percentage points. The attention mechanism considers the text content as a whole, focuses the model on emotion-related words or sentences, and discards content without emotional coloring. Introducing it improves performance further, so the model of this embodiment outperforms the other deep learning models.

The confusion matrix in FIG. 13 shows that the prediction accuracies for the positive, neutral and negative labels on the test set reach 87.42%, 90.30% and 95.83%, respectively. In particular, the prediction accuracy for neutral and negative emotion exceeds 90%, indicating that the model of this embodiment performs well on multi-class emotion analysis tasks.
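If the per-class figures are read as recall (correct predictions divided by the true samples of that class, i.e. a row-normalised confusion matrix), weighting them by the test-set sizes in Table 1 should recover overall accuracy. The check below does so and lands on 0.9151, the accuracy reported for this embodiment in Table 8, which supports that reading. The 2 × 2 confusion matrix is a hypothetical toy; the helper names are illustrative.

```python
# Per-class prediction accuracies reported for FIG. 13 and test-set sizes from Table 1.
PER_CLASS_RATE = {"positive": 0.8742, "neutral": 0.9030, "negative": 0.9583}
TEST_SUPPORT = {"positive": 1590, "neutral": 876, "negative": 1752}


def per_class_recall(cm):
    """Row-normalise a confusion matrix whose rows are true classes and columns are predictions."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def overall_accuracy(rates, support):
    """Support-weighted mean of per-class recall, i.e. correct predictions / all samples."""
    correct = sum(rates[c] * support[c] for c in rates)
    return correct / sum(support.values())


print(per_class_recall([[8, 2], [1, 9]]))                         # [0.8, 0.9]
print(round(overall_accuracy(PER_CLASS_RATE, TEST_SUPPORT), 4))  # 0.9151
```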

As the performance analysis of the different component combinations in Table 9 and FIG. 14 shows, a model with only the attention mechanism performs poorly, with an accuracy of only 60.36% and a Kappa coefficient of 0.3731. Using BiLSTM alone lets the model process contextual information, and accuracy, Kappa coefficient and Hamming distance all improve, but because the model processes the entire text, its time cost is excessive. Combining the attention mechanism with BiLSTM lets the model attend to text information in both directions and focus on emotion-related sentences, further improving performance. Combining both with CNN yields the model of this embodiment, whose accuracy is 1.85 percentage points higher than CNN alone, 31.15 points higher than ATT alone, and 0.78 points higher than CNN+BiLSTM (0.26 points higher than ATT+CNN). This further strengthens the model's feature extraction and classification ability, and the model achieves the best performance. Table 9 compares the ablation models.

TABLE 9

Method	Accuracy	Kappa coefficient	Hamming distance
CNN	0.8966	0.8384	0.1034
ATT	0.6036	0.3731	0.3964
BiLSTM	0.8770	0.8068	0.1230
ATT+CNN	0.9125	0.8629	0.0875
ATT+BiLSTM	0.8976	0.8397	0.1024
CNN+BiLSTM	0.9073	0.8555	0.0927
This embodiment (AB-CNN)	0.9151	0.8673	0.0848
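The percentage-point gains discussed for Tables 8 and 9 can be re-derived from the table values. Doing so reproduces 1.07 (CNN+BiLSTM over CNN), 1.85 (over CNN) and 31.15 (over ATT), and shows that a 0.78-point gain corresponds to CNN+BiLSTM, while the gain over ATT+CNN is 0.26 points. The dictionary keys and the helper name are illustrative.

```python
# Accuracy values copied from Tables 8 and 9.
ACCURACY = {
    "CNN": 0.8966, "ATT": 0.6036, "BiLSTM": 0.8770,
    "ATT+CNN": 0.9125, "ATT+BiLSTM": 0.8976, "CNN+BiLSTM": 0.9073,
    "AB-CNN": 0.9151,  # the model of this embodiment
}


def gain_points(model, baseline):
    """Absolute accuracy gain of `model` over `baseline`, in percentage points."""
    return round((ACCURACY[model] - ACCURACY[baseline]) * 100, 2)


print(gain_points("CNN+BiLSTM", "CNN"))  # 1.07
for base in ("CNN", "ATT", "ATT+CNN", "CNN+BiLSTM"):
    print(base, gain_points("AB-CNN", base))
```

These are absolute differences; relative to ATT alone, the accuracy gain would be about 52%, not 31.15%.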

(6) Conclusion

Emotion analysis is an important branch of natural language processing, and emotion analysis on e-commerce platforms is valued by many consumers and e-commerce websites, so it has high research value in practical applications. This embodiment proposes an AB-CNN model architecture that combines the attention mechanism with BiLSTM to improve the prediction accuracy of the multi-classification model. The attention mechanism extracts emotion-related words or sentences while the BiLSTM acquires contextual text information, further strengthening the degree of emotion and making the model's classification predictions more accurate. Finally, in comparisons with existing models, the model of this embodiment achieves the best experimental results, and ablation experiments on different combinations of its components show that introducing the attention mechanism and BiLSTM improves model performance to different degrees.

The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims

1. An e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model, characterized by comprising the following steps:

acquiring an initial text sequence, wherein the initial text sequence is a text sequence to be processed, and the initial text sequence x consists of n words and is expressed as $x = [x_1, x_2, \ldots, x_n]$;

Converting the initial text sequence into a corresponding initial word vector;

feeding the obtained target text feature vector of each word into the softmax function for classification to obtain a final emotion classification result;

wherein performing the convolution operation on the initial word vectors to obtain new features of each word in the initial text sequence and form a text feature matrix comprises:

$$z_i = f(D^{T} \cdot x_{i:i+t-1} + b)$$

where b is a bias term, $D^{T}$ is the weight, and f is the nonlinear ReLU function;

the text feature expression is obtained as follows:

$$Z = [z_1, z_2, \ldots, z_{n-t+1}]$$

a max-pooling operation is applied, and the maximum value $\hat{z} = \max\{Z\}$ is taken as the feature of the convolution filter; the text feature matrix is formed as:

$$Y = [Z_1, Z_2, \ldots, Z_n]$$

wherein processing the text feature matrix based on the attention mechanism to obtain an attention signal, and performing attention fusion with the initial word vectors to obtain the attention-fused target word vectors, comprises computing the attention distribution coefficient $\alpha_i$ of each text feature $Z_i$ as:

$$\alpha_i = \frac{\exp\big(s(Z_i, q)\big)}{\sum_{j=1}^{n} \exp\big(s(Z_j, q)\big)}$$

where $i \in [1, 2, \ldots, n]$, j is the summation index of the softmax function, running over all obtained text features $Z_j$ to compute the probability distribution of the i-th text, and $s(Z_i, q)$ is the attention scoring function, computed with an additive model as follows:

$$s(Z_i, q) = V^{T}\tanh(W Z_i + U q)$$

where W, U and V are parameters of the network model;

a weighted-average attention signal is obtained; this weighted-average attention signal is mapped onto the initial word vectors, and attention fusion with the initial word vectors is performed through a weighted attention fusion to obtain the attention-fused target word vectors.

2. The e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model according to claim 1, wherein converting the initial text sequence into corresponding initial word vectors comprises:

3. The e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model according to claim 1, wherein extracting, based on the Bi-LSTM model, the forward output features and the backward output features of the target word vectors to obtain, for each word, a target text feature vector containing both comprises:

where $c_{t-1}$ is the memory cell at the previous time step.

4. The e-commerce evaluation multi-classification emotion analysis method based on an AB-CNN model according to claim 3, wherein feeding the obtained target text feature vector of each word into the softmax function for classification to obtain the final emotion classification result comprises:

$$y = \mathrm{softmax}(W_c M + b_c)$$

where $W_c$ is the weight matrix and $b_c$ is the bias term.