Disclosure of Invention
In order to overcome the problems in the related art, embodiments of the invention provide a specific target emotion classification method and device based on attention coding and a graph convolution network.
According to a first aspect of the embodiments of the present invention, there is provided a method for classifying a specific target emotion based on attention coding and graph convolution network, including the following steps:
obtaining a word vector corresponding to the context and a word vector corresponding to the specific target;
inputting the word vector corresponding to the context and the word vector corresponding to the specific target into a preset bidirectional recurrent neural network model to obtain a hidden state vector corresponding to the context and a hidden state vector corresponding to the specific target;
extracting a syntactic vector from a syntactic dependency tree corresponding to the context based on a preset graph convolutional neural network combined with point-by-point convolution;
performing multi-head self-attention coding on the hidden state vector corresponding to the context, the hidden state vector corresponding to the specific target and the syntax vector to respectively obtain context semantic information coding, specific target semantic information coding and syntax information coding;
performing multi-head interactive attention coding on the context semantic information code with the syntax information code, and on the specific target semantic information code with the syntax information code, to respectively obtain a context-syntax information code and a specific target-syntax information code;
average-pooling the context semantic information code, the context-syntax information code and the specific target-syntax information code, and then splicing the results to obtain a feature representation corresponding to the specific target;
and inputting the feature representation into a preset normalized exponential function (softmax) to obtain an emotion classification result of the specific target.
Optionally, the word vector corresponding to the context and the word vector corresponding to the specific target are input into a preset bidirectional recurrent neural network model (Bi-LSTM) to obtain the hidden state vector corresponding to the context and the hidden state vector corresponding to the specific target; wherein n represents the dimension of the word vector corresponding to the context, m represents the dimension of the word vector corresponding to the specific target, the forward LSTM in the preset bidirectional recurrent neural network model performs the forward operation process, the backward LSTM performs the backward operation process, H_c represents the hidden state vector corresponding to the context, and H_t represents the hidden state vector corresponding to the specific target.
Optionally, a position weight corresponding to each word in the context is obtained according to the position of the specific target in the context and a position weight assignment function;
obtaining a syntactic dependency tree corresponding to a context;
obtaining an adjacency matrix corresponding to the words in the context according to the syntactic dependency tree, wherein the adjacency matrix reflects the adjacency relation of the words in the context;
inputting the adjacency matrix and the position weight corresponding to each word into a preset graph convolution neural network to obtain an output result of an output layer;
and performing point-by-point convolution on the output result to obtain the syntactic vector.
Optionally, the position weight corresponding to each word in the context is obtained according to the position of the specific target in the context and a position weight assignment function F(·); wherein τ+1 represents the starting position of the specific target, m represents the number of words in the specific target, n represents the number of words in the context, and q_i represents the position weight of the i-th word in the context.
Optionally, the adjacency matrix, the position weight of each word and the output result of the previous layer are input into a preset graph convolution operation formula to obtain the output result of the current layer, and the input operation is repeated until the output result of the output layer is obtained; in the preset graph convolution operation formula, A_ij represents the value in row i and column j of the adjacency matrix, A ∈ R^(n×n) indicates that the adjacency matrix A is a matrix of n rows and n columns, h_j^(l-1) represents the layer l-1 output result of the j-th word of the preset graph convolutional neural network, h_j^l represents the layer l output result of the j-th word of the preset graph convolutional neural network, q_j represents the position weight of the j-th word in the context, d_i represents the depth of the i-th word in the syntactic dependency tree, W^l represents a weight, b^l represents a bias, and ReLU(·) represents the activation function.
Optionally, the output result of the output layer of the preset graph convolution network is input into a preset point-by-point convolution formula to obtain the syntactic vector; wherein the preset point-by-point convolution formula is as follows:
PWC(h) = σ(h * W_pwc + b_pwc)
where h^l represents the output result of the output layer of the preset graph convolution network, PWC(h) represents the syntactic vector, σ represents the activation function ReLU, * represents the convolution operation, W_pwc is the learnable weight of the convolution kernel, and b_pwc is the bias of the convolution kernel.
Optionally, the hidden state vector H_c corresponding to the context, the hidden state vector H_t corresponding to the specific target and the syntactic vector H_g are respectively input into a preset multi-head attention coding formula to obtain a context semantic information code H_cs, a specific target semantic information code H_ts and a syntax information code H_gs; wherein the preset multi-head attention coding formula is as follows:
H_cs = MHA(H_c, H_c)
H_ts = MHA(H_t, H_t)
H_gs = MHA(H_g, H_g)
MHA(k, q) = [o_1; o_2; ...; o_{n_head}] · W_mh
o_h = Attention_h(k, q)
Attention(k, q) = softmax(f_s(k, q)) k
f_s(k_i, q_j) = tanh([k_i; q_j] · W_att)
where f_s(k_i, q_j) represents the semantic relevance between the first input vector k = {k_1, k_2, ..., k_n} of the multi-head attention and the second input vector q = {q_1, q_2, ..., q_m} of the multi-head attention; when multi-head self-attention coding is performed, k = q, and when multi-head interactive attention coding is performed, k ≠ q; ";" refers to the concatenation of vectors; W_att is a learnable weight, a matrix of 1 row and 2·d_hid columns; d_hid represents the dimension of the hidden state vector; softmax(·) represents the normalized exponential function; Attention_h(k, q) and o_h represent the h-th output result in the multi-head attention, h ∈ [1, n_head]; W_mh, a matrix of d_hid rows and d_hid columns, applies a linear transformation to o_h; and d_h represents the vector dimension of the multi-head attention coding output.
Optionally, the context semantic information code H_cs and the syntax information code H_gs, as well as the specific target semantic information code H_ts and the syntax information code H_gs, are respectively input into the preset multi-head attention coding formula to obtain a context-syntax information code H_cg and a specific target-syntax information code H_gt; wherein:
H_gt = MHA(H_gs, H_ts)
H_cg = MHA(H_gs, H_cs)
Optionally, the context-syntax information code H_cg, the specific target-syntax information code H_gt and the context semantic information code H_cs are input into a preset average pooling calculation formula, and the output results are spliced to obtain the feature representation u corresponding to the specific target; that is, the average pooling results of the context-syntax information code, the specific target-syntax information code and the context semantic information code are concatenated as
u = [avgpool(H_cg); avgpool(H_gt); avgpool(H_cs)]
where ";" refers to the concatenation of vectors.
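The pooling-and-splicing step above can be sketched as follows; the three small toy matrices standing in for H_cg, H_gt and H_cs are illustrative only.

```python
def avg_pool(H):
    # Column-wise mean over the word dimension of a words x dims matrix.
    n = len(H)
    return [sum(col) / n for col in zip(*H)]

H_cg = [[0.2, 0.4], [0.6, 0.0]]   # context-syntax code: 2 words x 2 dims
H_gt = [[1.0, 0.0], [0.0, 1.0]]   # specific target-syntax code
H_cs = [[0.5, 0.5], [0.3, 0.1]]   # context semantic code

# ";" concatenation of the three average-pooled results gives u.
u = avg_pool(H_cg) + avg_pool(H_gt) + avg_pool(H_cs)
```

The feature representation u therefore has three times the dimension of a single pooled code.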
Optionally, the feature representation is input into a preset conversion formula, and the conversion result is then input into a preset normalized exponential function to obtain the emotion classification result of the specific target; the preset conversion formula and the preset normalized exponential function can be written as
x = W_u · u + b_u
y = softmax(x)
where u is the feature representation, W_u and b_u ∈ R^c are respectively a learnable weight and a bias term, y ∈ R^c is the emotion classification result of the specific target, and c represents the number of classification categories.
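The final transform and normalized exponential function can be sketched as below; the 3-dimensional toy feature, the identity weights and the category names are illustrative assumptions (c = 3 sentiment categories).

```python
import math

def softmax(x):
    # Normalized exponential function; subtracting max(x) for stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    s = sum(exps)
    return [e / s for e in exps]

def classify(u, W_u, b_u):
    # x = W_u . u + b_u, then y = softmax(x); one column of W_u per category.
    x = [sum(u_i * w for u_i, w in zip(u, col)) + b
         for col, b in zip(W_u, b_u)]
    return softmax(x)

u = [0.4, 0.2, 0.5]                       # feature representation
W_u = [[1.0, 0.0, 0.0],                   # weights for "negative" (toy)
       [0.0, 1.0, 0.0],                   # "neutral"
       [0.0, 0.0, 1.0]]                   # "positive"
b_u = [0.0, 0.0, 0.0]
y = classify(u, W_u, b_u)                 # y in R^c, components sum to 1
```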
Compared with the prior art, the embodiment of the invention obtains the hidden state vector corresponding to the context and the hidden state vector corresponding to the specific target through a preset bidirectional recurrent neural network model, and, drawing on the advantages of multi-head self-attention in parallel computation and long-distance dependence, performs multi-head self-attention coding on both hidden state vectors, thereby extracting rich and sufficient context semantic information and specific target semantic information. A syntactic vector is extracted from the syntactic dependency tree corresponding to the context through a graph convolutional neural network combined with point-by-point convolution, and multi-head self-attention coding is performed on the syntactic vector to obtain a syntax information code. Multi-head interactive attention is then used to interactively fuse the syntax information code with the context semantic information code, and the syntax information code with the specific target semantic information code, and the fused results are spliced with the context semantic information code to obtain the final feature representation. This feature representation fully considers the relation between the context, the specific target and the syntactic information, prevents syntactically irrelevant context words from being taken as clues for judging the target emotion classification, and improves the accuracy of emotion classification.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
For a better understanding and practice, the invention is described in detail below with reference to the accompanying drawings.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "upon", "when" or "in response to determining", depending on the context.
Referring to fig. 1, fig. 1 is a flowchart illustrating a specific target emotion classification method based on attention coding and graph convolution network according to an exemplary embodiment of the present invention, where the method is executed by an emotion classification apparatus and includes the following steps:
s101: and acquiring a word vector corresponding to the context and a word vector corresponding to the specific target.
Word embedding is a digital representation of words by mapping a word into a high-dimensional vector, called a word vector, to achieve a representation of the word.
In the embodiment of the present application, the emotion classification apparatus first determines a context and a specific target within the text, where the context may be a sentence in the text, and the specific target is at least one word in the context, for example: the context is "the price is reasonable but the service is poor", and the specific targets are "price" and "service". Then, the emotion classification apparatus converts the context and the specific target into corresponding word vectors through a word embedding tool: if the context comprises n words, the word vectors corresponding to the context are n high-dimensional vectors, and if the specific target comprises m words, the word vectors corresponding to the specific target are m high-dimensional vectors.
The word embedding tool can be GloVe, word2vec and the like. In the embodiment of the application, GloVe is adopted to perform word vector conversion on the context and the specific target, based on its parallelizable processing and its advantage in handling large data sets, so as to obtain the word vector corresponding to the context and the word vector corresponding to the specific target.
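As an illustration of this step, the sketch below maps a context and a specific target to word vectors through a toy embedding table; the table, its 3-dimensional vectors and the `embed` helper are hypothetical stand-ins for a real GloVe vocabulary loaded from disk.

```python
# Toy 3-dimensional embedding table; a real system would load GloVe
# vectors from a file instead. All names and values are illustrative.
embeddings = {
    "the":        [0.1, 0.0, 0.2],
    "price":      [0.4, 0.3, 0.1],
    "is":         [0.0, 0.1, 0.0],
    "reasonable": [0.2, 0.5, 0.3],
}

def embed(tokens, dim=3):
    # Unknown words fall back to the zero vector, a common convention.
    return [embeddings.get(t, [0.0] * dim) for t in tokens]

context_vectors = embed("the price is reasonable".split())  # n = 4 vectors
target_vectors = embed(["price"])                           # m = 1 vector
```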
S102: and inputting the word vector corresponding to the context and the word vector corresponding to the specific target into a preset bidirectional recurrent neural network model to obtain the hidden state vector corresponding to the context and the hidden state vector corresponding to the specific target.
A Recurrent Neural Network (RNN) is a Recurrent Neural Network in which sequence data is input, recursion is performed in an evolution direction of a sequence, and all nodes (cyclic units) are connected in a chain manner, and common Recurrent Neural networks include a Bidirectional Recurrent Neural Network (Bi-RNN), a Long-Short Term Memory Network (LSTM), a Bidirectional Long-Short Term Memory Network (Bi-LSTM), and the like.
In the embodiment of the application, a Bidirectional recurrent neural network (Bi-RNN) or a Bidirectional Long Short-Term Memory network (Bi-LSTM) may be used as a preset Bidirectional recurrent neural network model to obtain a hidden state vector corresponding to a context and a hidden state vector corresponding to a specific target, and initially obtain semantic information included in the context and the specific target. The hidden layer of the bidirectional recurrent neural network model needs to store two values, one value participates in forward calculation, the other value participates in reverse calculation, namely one recurrent network comprises a forward recurrent neural network and a backward neural network, and the bidirectional recurrent neural network is more suitable for modeling time sequence data and is more beneficial to capturing bidirectional semantic dependence.
In an alternative embodiment, since the front-to-back order of words in the context needs to be considered when classifying the emotion of a specific target, adopting Bi-LSTM allows distant information to be propagated and avoids the long-term dependence problem. For example, for the context "I don't think this painting is good", the forward network in the Bi-LSTM model can learn that the negation word modifies the following "good", and thus determine that the emotional polarity of the sentence is negative. As another example, for "I found the hotel room I stayed in before terribly dirty", the backward network in the Bi-LSTM can learn that the intensifier modifies the degree of "dirty". Therefore, by using Bi-LSTM as the preset bidirectional recurrent neural network model, the obtained hidden state vector corresponding to the context and hidden state vector corresponding to the specific target describe the semantic information contained in the context and the specific target more accurately.
Specifically, the word vector corresponding to the context and the word vector corresponding to the specific target are input into the preset bidirectional recurrent neural network model (Bi-LSTM) to obtain the hidden state vector corresponding to the context and the hidden state vector corresponding to the specific target; wherein n represents the dimension of the word vector corresponding to the context, m represents the dimension of the word vector corresponding to the specific target, the forward LSTM in the preset bidirectional recurrent neural network model performs the forward operation process, the backward LSTM performs the backward operation process, H_c represents the hidden state vector corresponding to the context, and H_t represents the hidden state vector corresponding to the specific target.
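A minimal sketch of the bidirectional pass is given below; a single-tanh recurrent cell is used as a simplified stand-in for the LSTM cell, and the scalar inputs and weights w, u are illustrative assumptions.

```python
import math

def rnn_pass(xs, w=0.5, u=0.3):
    # One directional pass: h_t = tanh(w * x_t + u * h_{t-1}).
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states

def bi_rnn(xs):
    fwd = rnn_pass(xs)                 # forward operation process
    bwd = rnn_pass(xs[::-1])[::-1]     # backward operation process
    # Each hidden state concatenates its forward and backward components.
    return [(f, b) for f, b in zip(fwd, bwd)]

H_c = bi_rnn([0.2, -0.1, 0.7, 0.4])   # hidden states for a 4-word context
```

The same pass applied to the target's word vectors would yield H_t.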
S103: and extracting the syntactic vectors in the syntactic dependency tree corresponding to the context based on a preset graph convolutional neural network combined with point-by-point convolution.
Semantic dependency parsing (SDP), whose result is presented as a dependency tree, is used to analyze the semantic associations between words in a context and present them in a dependency structure. The specific process is as follows: (1) the context is segmented, for example, "Monkeys like to eat bananas." is segmented into "Monkeys / like / eat / bananas / ."; (2) part-of-speech tagging is performed on each word, for example: Monkeys/NN like/VV eat/VV bananas/NN ./PU; (3) a phrase syntax tree is generated from the part-of-speech tags; (4) the phrase syntax tree is converted into a syntactic dependency tree.
Graph convolutional neural networks (GCNs) are used to process data of a graph structure type, i.e. a topological structure, which may also be referred to as a non-euclidean structure, and common graph structures include, for example, social networks, information networks, and the like. The point-by-point convolution is to perform convolution operation again on the output result of the graph convolution neural network so as to better integrate the syntax in the sentence.
In the embodiment of the application, the emotion classification device acquires a syntactic dependency tree corresponding to a context, and acquires an initial syntactic vector in the syntactic dependency tree by using a preset graph convolution neural network, and then performs point-by-point convolution on the initial syntactic vector to obtain a syntactic vector corresponding to the context. The trained graph convolution neural network is used as a preset graph convolution neural network, and the specific training mode is the same as that of the existing neural network.
The syntactic information displayed in the syntactic dependency tree can be extracted by combining the preset graph convolutional neural network of point-by-point convolution to obtain a syntactic vector, so that words irrelevant in syntax are avoided being taken as clues for judging emotion polarity in the follow-up process, and the accuracy of emotion classification of a specific target is improved.
In an optional embodiment, to accurately obtain the syntax vector, step S103 specifically includes steps S1031 to S1035, referring to fig. 2, fig. 2 is a schematic flow diagram of step S103 in the specific target emotion classification method based on attention coding and graph convolution network according to an exemplary embodiment of the present invention, and steps S1031 to S1035 are as follows:
s1031: and obtaining the position weight corresponding to each word in the context according to the position of the specific target in the context and the position weight distribution function.
The importance of each word in the context to the sentiment classification of a particular object varies depending on the location of the particular object in the context. Specifically, the position weight corresponding to each word in the context is obtained according to the position of the specific target in the context and the position weight distribution function. The position weight distribution function can be preset according to different requirements of different specific target emotion classifications. For example: the position weight distribution function can be set as F (a), wherein a is the number of words separated from the nearest specific target between each word in the context, so that different position weights can be obtained according to the number of the separated words.
In an optional embodiment, the position weight corresponding to each word in the context is obtained according to the position of the specific target in the context and a position weight assignment function; wherein, in the position weight assignment function F(·), τ+1 represents the starting position of the specific target, m represents the number of words in the specific target, n represents the number of words in the context, and q_i represents the position weight of the i-th word in the context.
The importance degree of words at different positions can be analyzed more accurately through the weight assignment function F(·), and the position weight corresponding to each word in the context can be assigned more reasonably.
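Since the concrete form of F(·) is given in the accompanying formula rather than reproduced here, the sketch below implements one common linear-decay choice consistent with the variable definitions above (τ+1 start position, m target words, n context words); the exact decay form is an assumption.

```python
def position_weights(n, tau, m):
    # Assumed linear-decay F(.): weight 0 inside the target, decaying
    # with distance from the target elsewhere. Words are 1-indexed.
    q = []
    for i in range(1, n + 1):
        if i < tau + 1:                 # word left of the target
            q.append(1 - (tau + 1 - i) / n)
        elif i <= tau + m:              # word inside the target
            q.append(0.0)
        else:                           # word right of the target
            q.append(1 - (i - tau - m) / n)
    return q

# 6-word context, one-word target starting at position tau+1 = 3.
q = position_weights(n=6, tau=2, m=1)
```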
S1032: and acquiring a syntactic dependency tree corresponding to the context.
And the emotion classification equipment acquires a syntactic dependency tree corresponding to the context.
In the embodiment of the application, the syntactic dependency tree can be obtained through spaCy. The syntactic dependency tree vividly embodies the dependency relationships between words in the context. Referring to fig. 3, fig. 3 is a diagram illustrating a syntactic dependency tree according to an exemplary embodiment of the present invention. As shown, the context is "I am not very happy today"; in the constructed syntactic dependency tree, the root is "happy", which has three branches, "I", "today" and "not", and the branch of "not" is "very", which further modifies "not".
S1033: and obtaining an adjacency matrix corresponding to the words in the context according to the syntactic dependency tree, wherein the adjacency matrix reflects the adjacency relation of the words in the context.
In the embodiment of the application, the emotion classification device obtains an adjacency matrix corresponding to the words in the context according to the syntactic dependency tree. Where the adjacency matrix reflects the adjacency of words in the context. It should be noted that the word has an adjacency with itself by default.
Referring to fig. 4, fig. 4 is a schematic diagram of an adjacency matrix according to an exemplary embodiment of the present invention, and the adjacency matrix shown in fig. 4 corresponds to the syntactic dependency tree shown in fig. 3. As shown, the values on the diagonal lines are all 1, which indicates that each word has an adjacent relation with itself. The root of the syntactic dependency tree in FIG. 3 is "happy," which includes three branches, "I", "today", and "not", respectively, and thus the value is 1 at the intersection of the row where "happy" is located and the column value where "I", "today", and "not" are located in the corresponding adjacency matrix. The adjacency relation between the words can be accurately and quickly acquired through the adjacency matrix corresponding to the words in the context.
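The adjacency matrix of fig. 4 can be built directly from the dependency edges of the tree in fig. 3; the edge list below is written by hand for the example sentence (in practice it would come from a parser such as spaCy).

```python
tokens = ["I", "not", "very", "happy", "today"]
# (head, dependent) pairs from the example tree: root "happy" governs
# "I", "not" and "today"; "very" further modifies "not".
edges = [(3, 0), (3, 1), (3, 4), (1, 2)]

n = len(tokens)
# Each word is adjacent to itself by default (1s on the diagonal).
A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
for head, dep in edges:
    A[head][dep] = A[dep][head] = 1   # undirected adjacency
```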
S1034: and inputting the adjacency matrix and the position weight corresponding to each word into a preset graph convolution neural network to obtain an output result of an output layer.
In the embodiment of the application, the emotion classification device inputs the adjacency matrix and the position weight corresponding to each word into a preset graph convolution neural network to obtain an output result of an output layer. The hidden layer of the preset graph convolutional neural network is set to be 1 layer, and the activation function can be set according to actual conditions.
Specifically, the emotion classification device inputs the adjacency matrix, the position weight of each word and the hidden state vector H_c corresponding to the context into the input layer of the preset graph convolutional neural network; the output result of the input layer, the adjacency matrix and the position weight of each word are transmitted into the hidden layer; the hidden layer transmits its output result, the adjacency matrix and the position weight of each word to the output layer; and finally the output result of the output layer, namely the initial syntactic vector in the syntactic dependency tree, is obtained.
In an alternative embodiment, the hidden layer of the pre-defined graph convolution neural network may be a plurality of layers, and the activation function is RELU ().
Specifically, the emotion classification device inputs the adjacency matrix, the position weight of each word and the output result of the previous layer into a preset graph convolution operation formula to obtain the output result of the current layer, and repeatedly executes input operation until the output result of the output layer is obtained; the preset graph convolution operation formula is as follows:
where A_ij represents the value in row i and column j of the adjacency matrix, A ∈ R^(n×n) indicates that the adjacency matrix A is a matrix of n rows and n columns, h_j^(l-1) represents the layer l-1 output result of the j-th word of the preset graph convolutional neural network, h_j^l represents the layer l output result of the j-th word of the preset graph convolutional neural network, q_j represents the position weight of the j-th word in the context, d_i represents the depth of the i-th word in the syntactic dependency tree, W^l represents a weight, b^l represents a bias, and ReLU(·) represents the activation function.
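A scalar-feature sketch of one such layer is given below, combining the adjacency matrix A, the position weights q, the previous layer's outputs and a degree-style normalization standing in for d_i; the scalar weight W and bias b are illustrative simplifications of the matrices W^l and b^l.

```python
def relu(x):
    return max(0.0, x)

def gcn_layer(A, q, h_prev, W=1.0, b=0.0):
    # h_i^l = ReLU( sum_j A_ij * q_j * W * h_j^{l-1} / d_i + b ),
    # with d_i taken here as the row degree (self-loop included).
    n = len(A)
    out = []
    for i in range(n):
        d_i = sum(A[i])
        s = sum(A[i][j] * q[j] * W * h_prev[j] for j in range(n))
        out.append(relu(s / d_i + b))
    return out

A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]                        # 3-word chain with self-loops
q = [1.0, 0.0, 1.0]                    # middle word is the target (weight 0)
h1 = gcn_layer(A, q, [0.5, 0.2, 0.9])  # hidden layer
h2 = gcn_layer(A, q, h1)               # stacked layer (output layer)
```

Stacking the call, as in h2, corresponds to repeating the input operation until the output layer is reached.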
S1035: and performing point-by-point convolution on the output result to obtain the syntactic vector.
The emotion classification device performs point-by-point convolution on the output result (namely the initial syntactic vector) to obtain the syntactic vector. The point-by-point convolution operation refers to performing a convolution operation on the vector of each word in the initial syntactic vector one by one, so as to better integrate the syntactic information within each word.
In an optional embodiment, the emotion classification device inputs the output result of the output layer of the preset graph convolution network into a preset point-by-point convolution formula to obtain the syntactic vector; wherein the preset point-by-point convolution formula is as follows:
PWC(h) = σ(h * W_pwc + b_pwc)
where h^l represents the output result of the output layer of the preset graph convolution network, PWC(h) represents the syntactic vector, σ represents the activation function ReLU, * represents the convolution operation, W_pwc is the learnable weight of the convolution kernel, and b_pwc is the bias of the convolution kernel.
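A sketch of this point-by-point (kernel size 1) convolution: each word's vector is transformed independently by the same kernel weight and bias, with σ = ReLU as stated above; the toy 2-dimensional vectors and the identity kernel are illustrative.

```python
def relu(x):
    return max(0.0, x)

def pwc(h, W, b):
    # Kernel size 1: PWC(h) = sigma(h * W_pwc + b_pwc) per position.
    # W maps an input vector to an output vector: out_k = sum_d x_d * W[d][k].
    return [
        [relu(sum(x_d * W[d][k] for d, x_d in enumerate(vec)) + b[k])
         for k in range(len(b))]
        for vec in h
    ]

h = [[0.5, -0.2], [1.0, 0.3]]     # GCN output: 2 words x 2 dims
W = [[1.0, 0.0], [0.0, 1.0]]      # identity kernel (illustrative)
b = [0.0, 0.1]
syntax_vectors = pwc(h, W, b)
```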
S104: and carrying out multi-head self-attention coding on the hidden state vector corresponding to the context, the hidden state vector corresponding to the specific target and the syntactic vector to respectively obtain context semantic information coding, specific target semantic information coding and syntactic information coding.
The essence of the attention mechanism comes from the human visual attention mechanism, and it is applied to emotion classification so that more attention can be assigned to key words during the classification process. Specifically, a sentence of text can be imagined as a series of <Key, Value> data pairs. Given an element Query, the weight coefficient of the Value corresponding to each Key is obtained by calculating the similarity or correlation between the Query and that Key; after normalization by a softmax function, the weight coefficients and the corresponding Values are weighted and summed to obtain the attention result. In current research, Key and Value are often equal, i.e., Key = Value.
The Multi-head Attention coding (Multi-head Attention) represents that multiple times of Attention coding operation are carried out, each operation represents one head, parameters among the heads are not shared, and finally, the results are spliced and linear transformation is carried out once to obtain a Multi-head coding result.
The multi-head attention coding is further divided into multi-head self-attention coding and multi-head interactive attention coding. The Query and the Key of multi-head self attention are the same, and the Query and the Key of multi-head interactive attention coding are different. For multi-head self-attention coding, it needs to implement the calculation of attention values between each word in a certain sentence text and all words of the sentence text; for multi-headed interactive attention coding, it requires the implementation of the calculation of attention values between each word in a certain sentence of text and all the words of the other text.
In this embodiment of the application, the emotion classification device performs multi-head self-attention coding on the hidden state vector corresponding to the context, the hidden state vector corresponding to the specific target, and the syntax vector to obtain a context semantic information code, a specific target semantic information code, and a syntax information code, respectively.
Specifically, (1) the emotion classification device performs multi-head self-attention coding by taking hidden state vectors corresponding to the context as Query and Key to obtain context semantic information codes; (2) the emotion classification equipment performs multi-head self-attention coding by taking hidden state vectors corresponding to the specific target as Query and Key to obtain semantic information coding of the specific target; (3) and the emotion classification equipment performs multi-head self-attention coding by taking the syntax vector as Query and Key to obtain syntax information coding.
By carrying out multi-head self-attention coding on the hidden state vector corresponding to the context, the hidden state vector corresponding to the specific target and the syntactic vector, richer semantic information and emotional information can be extracted.
In an optional embodiment, the emotion classification device inputs the hidden state vector H_c corresponding to the context, the hidden state vector H_t corresponding to the specific target, and the syntactic vector H_g into a preset multi-head attention coding formula, respectively, to obtain the context semantic information code H_cs, the specific target semantic information code H_ts, and the syntax information code H_gs; wherein the preset multi-head attention coding formula is as follows:

H_cs = MHA(H_c, H_c)
H_ts = MHA(H_t, H_t)
H_gs = MHA(H_g, H_g)
o_h = Attention_h(k, q)
Attention(k, q) = softmax(f_s(k, q)) k
f_s(k_i, q_j) = tanh([k_i; q_j] · W_att)

wherein f_s(k_i, q_j) represents the semantic relevance between the first input vector k = {k_1, k_2, ..., k_n} of the multi-head attention and the second input vector q = {q_1, q_2, ..., q_m} of the multi-head attention; when multi-head self-attention coding is performed, k = q, and when multi-head interactive attention coding is performed, k ≠ q; ";" refers to the concatenation of vectors; W_att is a learnable weight, a matrix of 1 row and 2d_hid columns, where d_hid represents the dimension of the hidden state vector; softmax() represents the normalized exponential function; Attention_h(k, q) and o_h represent the h-th output result in the multi-head attention, h ∈ [1, n_head]; W_mha, a matrix of d_hid rows and d_hid columns, applies a linear transformation to o_h; and d_h represents the vector dimension of the multi-head attention coding output.
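The attention formula above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the shapes, the random initialization, and the use of a vector-valued `w_att` per head are assumptions for the sake of a runnable example.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable normalized exponential function
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(k, q, w_att):
    """One head: f_s(k_i, q_j) = tanh([k_i; q_j] . w_att), then a
    softmax over the k positions and a weighted sum of k."""
    m, n, d = q.shape[0], k.shape[0], k.shape[1]
    pairs = np.concatenate(
        [np.broadcast_to(k[None, :, :], (m, n, d)),
         np.broadcast_to(q[:, None, :], (m, n, d))], axis=-1)  # (m, n, 2d)
    scores = np.tanh(pairs @ w_att)          # (m, n) relevance scores
    return softmax(scores, axis=-1) @ k      # (m, d) attended output

def mha(k, q, head_weights, w_mha):
    """Concatenate every head's output o_h, then apply the linear
    map W_mha to obtain the multi-head attention code."""
    outs = np.concatenate([attention_head(k, q, w) for w in head_weights],
                          axis=-1)           # (m, n_head * d)
    return outs @ w_mha                      # (m, d_out)

rng = np.random.default_rng(0)
d_hid, n_head = 4, 3
H_c = rng.normal(size=(5, d_hid))            # toy context hidden states
heads = [rng.normal(size=(2 * d_hid,)) for _ in range(n_head)]
W_mha = rng.normal(size=(n_head * d_hid, d_hid))
H_cs = mha(H_c, H_c, heads, W_mha)           # self-attention: k = q
```

For interactive attention coding the same `mha` would be called with two different inputs (k ≠ q).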
S105: and respectively carrying out multi-head interactive attention coding on the context semantic information code and the syntax information code and the specific target semantic information code and the syntax information code to obtain a context-syntax information code and a specific target-syntax information code.
And the emotion classification equipment respectively carries out multi-head interactive attention coding on the context semantic information coding and the syntax information coding and the specific target semantic information coding and the syntax information coding to obtain a context-syntax information coding and a specific target-syntax information coding.
Specifically, (1) the emotion classification device performs multi-head interactive attention coding with the syntax information code as Key and the context semantic information code as Query to obtain the context-syntax information code, so that the syntax information code and the context semantic code are interactively fused and the close connection between the syntactic information and the context is fully considered. (2) The emotion classification device performs multi-head interactive attention coding with the syntax information code as Key and the specific target semantic information code as Query to obtain the specific target-syntax information code, so that the syntax information code and the specific target semantic information code are interactively fused and the close connection between the syntactic information and the specific target is fully considered.
In an alternative embodiment, the emotion classification device inputs the context semantic information code H_cs and the syntax information code H_gs, and the specific target semantic information code H_ts and the syntax information code H_gs, respectively, into the preset multi-head attention coding formula to obtain the context-syntax information code H_cg and the specific target-syntax information code H_gt; wherein:

H_gt = MHA(H_gs, H_ts)
H_cg = MHA(H_gs, H_cs)
s106: and averaging and pooling the context semantic information codes, the context-syntax information codes and the specific target-syntax information codes, and then splicing to obtain the feature representation corresponding to the specific target.
And the emotion classification device averages and pools the context semantic information code, the context-syntax information code and the specific target-syntax information code, and then splices the pooled results to obtain the feature representation corresponding to the specific target. The average pooling operation averages the values of the same dimension across positions, and the splicing operation concatenates vectors end to end; for example, splicing the vectors [1,1], [2,2], [3,3] yields [1,1,2,2,3,3].
In an alternative embodiment, the emotion classification device inputs the context-syntax information code H_cg, the specific target-syntax information code H_gt and the context semantic code H_cs into a preset average pooling calculation formula, and splices the output results to obtain the feature representation u corresponding to the specific target; wherein the preset average pooling calculation formula is as follows:

h_cg = AvgPool(H_cg)
h_gt = AvgPool(H_gt)
h_cs = AvgPool(H_cs)
u = [h_cg; h_gt; h_cs]

wherein h_cg represents the average pooling result of the context-syntax information code, h_gt represents the average pooling result of the specific target-syntax information code, h_cs represents the average pooling result of the context semantic code, and ";" refers to the concatenation of vectors.
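The averaging-and-splicing step above can be sketched as follows (the toy matrices and their dimensions are illustrative assumptions):

```python
import numpy as np

def avg_pool(H):
    # average the values of the same dimension across all positions
    return H.mean(axis=0)

H_cg = np.array([[1.0, 1.0], [3.0, 3.0]])   # context-syntax code (toy)
H_gt = np.array([[2.0, 2.0]])               # target-syntax code (toy)
H_cs = np.array([[4.0, 0.0], [0.0, 4.0]])   # context semantic code (toy)

# splice the pooled vectors end to end, as in [1,1],[2,2],[3,3] -> [1,1,2,2,3,3]
u = np.concatenate([avg_pool(H_cg), avg_pool(H_gt), avg_pool(H_cs)])
```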
S107: and inputting the feature representation into a preset normalization index function to obtain an emotion classification result of the specific target.
And the emotion classification device inputs the feature representation into a preset normalized exponential function to obtain the emotion classification result of the specific target. The preset normalized exponential function is the softmax() function; it yields a probability distribution over emotion polarities for different specific targets, from which the emotion classification result of the specific target is obtained.
In an optional embodiment, the emotion classification device inputs the feature representation into a preset conversion formula, and then inputs the conversion result into the preset normalized exponential function to obtain the emotion classification result of the specific target; wherein the preset conversion formula and the preset normalized exponential function are as follows:

x = W_u · u + b_u
y = softmax(x)

wherein u is the feature representation, W_u and b_u ∈ R^c are the learnable weight and bias term, respectively, y ∈ R^c is the emotion classification result of the specific target, and c represents the number of classification categories.
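The conversion-and-softmax step can be sketched as follows; the dimensions and the random parameters are illustrative assumptions (in the model, W_u and b_u are learned during training):

```python
import numpy as np

def softmax(x):
    # normalized exponential function over the class scores
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
c, dim_u = 3, 6                      # 3 polarity classes, toy feature size
u = rng.normal(size=(dim_u,))        # feature representation
W_u = rng.normal(size=(c, dim_u))    # learnable weight
b_u = rng.normal(size=(c,))          # learnable bias term
y = softmax(W_u @ u + b_u)           # probability distribution over polarities
label = int(np.argmax(y))            # predicted emotion class
```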
Referring to fig. 5 and fig. 6, fig. 5 is a schematic diagram illustrating the overall structure of a specific target emotion classification model based on attention coding and graph convolution network according to an exemplary embodiment of the present invention, and fig. 6 is a schematic diagram illustrating a graph convolutional neural network according to an exemplary embodiment of the present invention. The specific target emotion classification model based on attention coding and graph convolution network (hereinafter referred to as the AEGCN model) corresponds to the specific target emotion classification method based on attention coding and graph convolution network proposed in the embodiment of the present application, for example, steps S101 to S107. Specifically, the model first obtains the hidden state vector corresponding to the context and the hidden state vector corresponding to the specific target through a preset bidirectional recurrent neural network model; then, drawing on the advantages of multi-head self-attention in parallel computation and long-distance dependency modeling, it performs multi-head self-attention coding on the two hidden state vectors to extract rich and sufficient context semantic information and specific target semantic information.
The model also extracts the syntactic vector in the syntactic dependency tree corresponding to the context through the graph convolutional neural network combined with point-by-point convolution, and performs multi-head self-attention coding on the syntactic vector to obtain the syntax information code. Multi-head interactive attention is then used to interactively fuse the syntax information code with the context semantic information code, and the syntax information code with the specific target semantic information code, respectively; the fused results are spliced with the context semantic information code to obtain the final feature representation. This feature representation fully considers the connections among the context, the specific target and the syntactic information, prevents syntactically irrelevant context words from being taken as clues for judging the target's emotion classification, and improves the accuracy of emotion classification.
The following experimental demonstration is performed on the specific target emotion classification method based on attention coding and graph convolution network, which is provided by the embodiment of the application, and the demonstration process is as follows:
(1) Five data sets were chosen: TWITTER, REST14 and LAP14 (SemEval-2014 Task 4), REST15 (SemEval-2015 Task 12), and REST16 (SemEval-2016 Task 5).
Wherein the TWITTER data set was originally created by Tang et al, and contains tweets from social software TWITTER, which collectively include 6940 comments, each with a particular target marked therein and the emotional polarity of the particular target.
The SemEval-2014 Task4 data set is mainly used for fine-grained sentiment analysis and comprises LAP14 and REST14, the data set of each field is divided into training data, verification data (separated from the training data) and test data, and comprises 7794 comments, and each comment has a specific target marked therein and the sentiment polarity of the specific target.
The SemEval-2015 task 12 data set is mainly used for fine-grained emotion analysis and comprises REST15, the data set of each field is divided into training data, verification data (separated from the training data) and test data, and a total of 1746 comments, each comment has a specific target marked therein and the emotional polarity of the specific target.
SemEval-2016 task 5, a data set mainly used for fine-grained sentiment analysis, comprising REST16, wherein the data set of each field is divided into training data, verification data (separated from the training data) and test data, and comprises 2454 comments in total, and each comment has a specific target marked therein and the sentiment polarity of the specific target.
(2) The data in the data sets are preprocessed. Specifically, word vectors are initialized with the GloVe tool, converting each word in the data into a high-dimensional vector of dimension 300, and the weights of the model used in the experiments are initialized with a uniform distribution.
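The preprocessing step amounts to a table lookup. The sketch below uses random vectors as a stand-in for the pre-trained 300-dimensional GloVe table, and tokenizes with a plain whitespace split; both are assumptions for illustration only.

```python
import numpy as np

EMB_DIM = 300
rng = np.random.default_rng(42)
glove = {}  # stand-in for the GloVe table: word -> 300-dim vector

def embed(sentence):
    """Convert each word of a sentence into its 300-dim vector."""
    vectors = []
    for word in sentence.lower().split():
        if word not in glove:                      # out-of-vocabulary words
            glove[word] = rng.normal(size=EMB_DIM)  # get a random init
        vectors.append(glove[word])
    return np.stack(vectors)                       # (num_words, 300)

X = embed("the food was great")
```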
(3) The graph convolutional neural network structure is constructed, with all hidden layers having dimension 300. The Adam optimizer is used with a learning rate of 0.001, and the weight of the L2 regularization term is set to 0.00001. Meanwhile, to prevent overfitting, the dropout rate is set to 0.5. The batch size is 32, the number of heads in the multi-head attention is 3, and the number of GCN layers is set to 2.
(4) And comparing the experimental results.
The invention selects Accuracy and Macro-Averaged F1 as evaluation indexes. Accuracy is computed as the ratio of the number of correctly classified sentences to the total number of sentences. Macro-Averaged F1 is a multi-classification evaluation index, computed as follows:

Precision_i = TP_i / (TP_i + FP_i)
Recall_i = TP_i / (TP_i + FN_i)
Macro-F1 = (1/n) · Σ_{i=1..n} 2 · Precision_i · Recall_i / (Precision_i + Recall_i)

wherein TP_i (True Positive of class i) refers to the number of sentences predicted as class i and truly of class i; FP_i (False Positive of class i) refers to the number of sentences predicted as class i but not truly of class i; TN_i (True Negative of class i) refers to the number of sentences predicted as not class i and truly not of class i; FN_i (False Negative of class i) refers to the number of sentences predicted as not class i but truly of class i; and n is the total number of classes.
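Under the standard definition (per-class F1 averaged over the categories), Accuracy and Macro-F1 can be computed as follows; the toy labels are illustrative:

```python
def accuracy(y_true, y_pred):
    # fraction of sentences whose predicted class matches the true class
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, classes):
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)   # unweighted mean over classes

y_true = ["pos", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "pos", "pos"]
acc = accuracy(y_true, y_pred)   # 3 of 4 correct
```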
Referring to table 1 below, it can be seen from table 1 that the performance of the specific target emotion classification method based on attention coding and graph convolution network proposed in the present application is superior to that of conventional machine learning methods. The SVM model in the table performs classification with a support vector machine and relies on a large amount of manual feature extraction. The method proposed in the present application requires no manual feature extraction, and its accuracy on the TWITTER, LAP14 and restaurant data sets is respectively 9.76%, 5.42% and 0.88% higher than that of the SVM. The specific target emotion classification method based on attention coding and graph convolution network is therefore better suited to research on specific target emotion analysis.
The specific target emotion classification method based on attention coding and graph convolution network uses the bidirectional LSTM combined with the multi-head attention mechanism to encode semantics, and performs better than standard attention-mechanism methods and methods that perform semantic encoding with the multi-head attention mechanism alone. Taking MemNet as an example, its accuracy and F1 values were lower than those of the present method on all five data sets. Taking AEN as an example, across the three data sets (TWITTER, LAP14 and REST14), only one index of the present method (F1 on REST14) was lower, by 0.82%.
Owing to the incorporation of syntactic information, the specific target emotion classification method based on attention coding and graph convolution network proposed in the present application performs better than methods that do not consider syntactic information. Taking AOA as an example, among the accuracy and F1 values on the five data sets, only the accuracy on REST16 was slightly higher (by 0.11%) than that of the method proposed in the present application. In addition, the accuracy and F1 values of IAN on the five data sets are all lower than those of the present method, and the method proposed in the present application outperforms TNet-LF on four data sets.
The GCN combined with point-by-point convolution provided by the invention is superior to the GCN combined with an aspect-specific masking layer. The ASGCN model enforces an aspect-specific masking layer on top of the GCN to obtain a representation of the specific target that incorporates syntactic information, but this loses the contextual representation carrying the syntactic information. Among the accuracy and F1 values on the five data sets, the present method is lower than ASGCN only on three indexes, the F1 of REST14, the F1 of REST15 and the accuracy of REST16 (by 0.87%, 1.02% and 1.60%, respectively), and is better than ASGCN on the other seven indexes, which demonstrates the effectiveness of the method proposed in the present application.
TABLE 1
Next, an ablation study was performed on the specific target emotion classification method based on attention coding and graph convolution network proposed in the present application; the results are shown in table 2.
First, three data sets (LAP14, REST14 and REST15) exhibited a performance drop after the GCN was removed, but TWITTER and REST16 performed better. Since the sentences in the TWITTER data set are comparatively colloquial, and the TNet-LF method observed in the above experimental results performed best on both indexes of the REST16 data set, it can be concluded that the TWITTER and REST16 data sets are not very sensitive to syntactic information.
Secondly, without multi-head self-attention, the TWITTER data set does not perform well, but the results on the REST15 data set rise. This reflects that the TWITTER data set depends heavily on semantic information and that the multi-head self-attention mechanism applied in the present invention extracts rich semantic information well; it can also be inferred that the REST15 data set is more sensitive to syntactic information than the other data sets.
Finally, it can be seen that if multi-head interactive attention is removed, the present invention does not perform well on any of the five data sets, which indicates that multi-head interactive attention is very important to the present invention, and that the interaction of syntactic and semantic information is very important for data sets such as LAP14, REST14 and REST15.
Experimental results of ablation studies show that each step in the specific target emotion classification method based on attention coding and graph convolution network proposed in the embodiments of the present application is indispensable and effective.
TABLE 2
Referring to fig. 7, fig. 7 is a schematic structural diagram of a specific target emotion classification apparatus based on attention coding and graph convolution network according to an exemplary embodiment of the present invention. The units are included for executing the steps in the embodiments corresponding to fig. 1 and fig. 2, and refer to the related description in the embodiments corresponding to fig. 1 and fig. 2. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 7, the specific target emotion classification apparatus 7 based on attention coding and graph convolution network includes:
an obtaining unit 71, configured to obtain a word vector corresponding to a context and a word vector corresponding to a specific target;
the first neural network unit 72 is configured to input the word vector corresponding to the context and the word vector corresponding to the specific target into a preset bidirectional recurrent neural network model, so as to obtain a hidden state vector corresponding to the context and a hidden state vector corresponding to the specific target;
a second neural network unit 73, configured to extract a syntactic vector in the syntactic dependency tree corresponding to the context based on a preset graph convolutional neural network combining point-by-point convolution;
a first encoding unit 74, configured to perform multi-head self-attention encoding on the hidden state vector corresponding to the context, the hidden state vector corresponding to the specific target, and the syntax vector, so as to obtain a context semantic information code, a specific target semantic information code, and a syntax information code, respectively;
a second encoding unit 75, configured to perform multi-head interactive attention encoding on the context semantic information encoding and syntax information encoding, and the specific target semantic information encoding and syntax information encoding, respectively, to obtain a context-syntax information encoding and a specific target-syntax information encoding;
a splicing unit 76, configured to splice the context semantic information codes, the context-syntax information codes, and the specific target-syntax information codes after averaging and pooling, so as to obtain a feature representation corresponding to the specific target;
and a classification unit 77, configured to input the feature representation into a preset normalized exponential function, so as to obtain an emotion classification result of the specific target.
Optionally, the second neural network unit 73 includes:
a position weight assigning unit 731, configured to assign a function according to a position and a position weight of a specific target in a context, and obtain a position weight corresponding to each word in the context;
a first obtaining unit 732, configured to obtain a syntactic dependency tree corresponding to a context;
a second obtaining unit 733, configured to obtain an adjacency matrix corresponding to a word in the context according to the syntactic dependency tree, where the adjacency matrix reflects an adjacency relationship of the word in the context;
a third neural network unit 734, configured to input the adjacency matrix and the position weight corresponding to each word into a preset graph convolution neural network, so as to obtain an output result of an output layer;
and a point-by-point convolution unit 735, configured to perform point-by-point convolution on the output result to obtain the syntax vector.
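The point-by-point convolution performed by unit 735 reduces, for kernel size 1, to the same linear map applied independently at every sequence position. A minimal NumPy sketch follows; the dimensions and the ReLU activation are illustrative assumptions, not the patented configuration:

```python
import numpy as np

def pointwise_conv(H, W, b):
    """Kernel-size-1 convolution: one linear map applied
    independently at every position, followed by ReLU."""
    return np.maximum(H @ W + b, 0.0)

rng = np.random.default_rng(7)
H = rng.normal(size=(5, 8))     # 5 positions, 8-dim GCN outputs (toy)
W = rng.normal(size=(8, 8))     # shared per-position weight
b = np.zeros(8)
syntax_vec = pointwise_conv(H, W, b)
```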
Referring to fig. 8, fig. 8 is a schematic diagram of a specific target emotion classification device based on attention coding and graph convolution network according to an exemplary embodiment of the present invention. As shown in fig. 8, the specific target emotion classification device 8 based on attention coding and graph convolution network of this embodiment includes: a processor 80, a memory 81, and a computer program 82 stored in said memory 81 and operable on said processor 80, such as a specific target emotion classification program based on attention coding and graph convolution network. The processor 80, when executing the computer program 82, implements the steps in each of the above embodiments of the method for classifying specific target emotion based on attention coding and graph convolution network, such as the steps S101 to S107 shown in fig. 1. Alternatively, the processor 80, when executing the computer program 82, implements the functions of the modules/units in the above-described device embodiments, such as the functions of the units 71 to 77 shown in fig. 7.
Illustratively, the computer program 82 may be partitioned into one or more modules/units that are stored in the memory 81 and executed by the processor 80 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 82 in the specific target emotion classification device 8 based on attention coding and graph convolution network. For example, the computer program 82 may be partitioned into an acquisition unit, a first neural network unit, a second neural network unit, a first coding unit, a second coding unit, a splicing unit, and a classification unit, each unit functioning as follows:
the acquiring unit is used for acquiring word vectors corresponding to the context and word vectors corresponding to the specific target;
the first neural network unit is used for inputting the word vector corresponding to the context and the word vector corresponding to the specific target into a preset bidirectional cyclic neural network model to obtain a hidden state vector corresponding to the context and a hidden state vector corresponding to the specific target;
the second neural network unit is used for extracting a syntactic vector in the syntactic dependency tree corresponding to the context based on a preset graph convolutional neural network combined with point-by-point convolution;
the first coding unit is used for carrying out multi-head self-attention coding on the hidden state vector corresponding to the context, the hidden state vector corresponding to the specific target and the syntax vector to respectively obtain context semantic information coding, specific target semantic information coding and syntax information coding;
the second coding unit is used for respectively carrying out multi-head interactive attention coding on the context semantic information coding and the syntax information coding and the specific target semantic information coding and the syntax information coding to obtain a context-syntax information coding and a specific target-syntax information coding;
the splicing unit is used for splicing the context semantic information codes, the context-syntax information codes and the specific target-syntax information codes after averaging and pooling to obtain the feature representation corresponding to the specific target;
and the classification unit is used for inputting the feature representation into a preset normalized index function to obtain an emotion classification result of the specific target.
The specific target emotion classification device 8 based on attention coding and graph convolution network may include, but is not limited to, a processor 80 and a memory 81. Those skilled in the art will appreciate that fig. 8 is merely an example of the specific target emotion classification device 8 based on attention coding and graph convolution network and does not constitute a limitation thereon; the device may include more or fewer components than those shown, or combine some components, or use different components. For example, the specific target emotion classification device 8 based on attention coding and graph convolution network may further include an input-output device, a network access device, a bus, etc.
The Processor 80 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 81 may be an internal storage unit of the specific target emotion classification device 8 based on attention coding and graph convolution network, for example, a hard disk or a memory of the device. The memory 81 may also be an external storage device of the specific target emotion classification device 8 based on attention coding and graph convolution network, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) equipped on the device. Further, the memory 81 may include both an internal storage unit and an external storage device of the specific target emotion classification device 8 based on attention coding and graph convolution network. The memory 81 is used to store the computer program and other programs and data required by the specific target emotion classification device based on attention coding and graph convolution network; it may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice. The present invention is not limited to the above-described embodiments, and various modifications and variations of the present invention are intended to be included within the scope of the claims and the equivalent technology of the present invention if they do not depart from the spirit and scope of the present invention.