CN114896969A - Method for extracting aspect words based on deep learning - Google Patents
Method for extracting aspect words based on deep learning
- Publication number: CN114896969A (application CN202210514804.8A)
- Authority
- CN
- China
- Prior art keywords: layer, representing, expressed, sentence, output
- Legal status: Pending (the legal status is an assumption by Google Patents and is not a legal conclusion)
Classifications
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
- G06F18/2415—Classification techniques based on parametric or probabilistic models
- G06F18/25—Fusion techniques
- G06F40/126—Character encoding
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/048—Activation functions
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a method for extracting aspect words based on deep learning, comprising the following steps: constructing an aspect word extraction data set; embedding the sentence features of the data set into a semantic space; encoding sentence features with a multi-feature encoder; encoding sentence context with a context encoding layer based on a bidirectional LSTM; extracting the global semantic information of the sentence with a global semantic information extraction layer based on a multi-head self-attention mechanism, thereby capturing the semantic relations between aspect words and their context; and decoding the vectors learned by the model with a sequence decoding layer based on a conditional random field, so that aspect word extraction is completed as sequence labeling. The method can be applied to aspect word extraction from social media text. By fully learning multiple features of the sentence and capturing the contextual semantic information of aspect words with a multi-head self-attention mechanism, the method still performs well in complex scenarios and is characterized by high precision and strong robustness.
Description
Technical Field
The invention relates to a method for extracting aspect words based on deep learning, which can be used to extract aspect words from social media text, and belongs to the technical fields of the internet and natural language processing.
Background
With the continuous development of the internet, more and more netizens are accustomed to expressing their views and attitudes toward news events on social media (e.g., Weibo, Twitter). Social media platforms have gradually become sensors of real-world events, and online public opinion plays an increasingly important role in reflecting popular sentiment and refracting reality. At the same time, harmful opinions also circulate on the network, yet cyberspace is not beyond the law: supervising online public opinion with public opinion analysis technology helps create a healthy and harmonious network environment and helps government departments understand popular sentiment and properly handle public opinion events. Sentiment analysis is an important component of public opinion analysis, and its quality directly determines the quality of the overall analysis. Existing sentiment analysis techniques operate at the document level and the sentence level, and cannot meet the demand of public opinion analysis systems for fine-grained sentiment in social media content, so aspect-level sentiment analysis needs to be introduced. Aspect word extraction is a prerequisite for aspect-level sentiment analysis, and high-quality aspect words are of great significance to it.
In recent years, many scholars have conducted intensive research on aspect word extraction. Existing methods fall into two categories: methods based on supervised learning and methods based on unsupervised learning. Supervised methods treat aspect word extraction as a sequence labeling task; common approaches include graph-based, semantic-analysis-based and statistics-based methods. Although these methods improve extraction precision to a certain extent, they depend heavily on high-quality manually labeled data, which is costly to obtain, and the resulting models are difficult to transfer to new domains. Unsupervised methods can alleviate these problems to some extent, but they do not fully capture word order information and neglect character-level features, which leads to incomplete extracted aspect words.
The invention provides a method for extracting aspect words based on deep learning, aimed at the problem that existing aspect word extraction research does not fully learn the semantic features of sentences. First, a multi-feature encoding layer initially encodes the sentence; then the initial encoding is fed into a context encoding layer based on a bidirectional LSTM to learn the contextual information of the sentence; next, the output of that layer is fed into a global context information extraction layer based on a multi-head self-attention mechanism to learn the semantic associations between aspect words; finally, sequence decoding is completed by a sequence decoding layer based on a conditional random field, yielding the aspect word extraction result. The method improves both the robustness and the accuracy of the aspect word extraction model.
Disclosure of Invention
Aiming at the problems and defects in the prior art, the invention provides a method for extracting aspect words based on deep learning.
In order to achieve this purpose, the technical scheme of the invention is as follows. The method for extracting aspect words based on deep learning covers the whole extraction process, mainly comprising multi-feature encoding, context encoding, global context information extraction and word sequence decoding, and can effectively extract aspect words from comment text, thereby improving the precision of the task. The method comprises the following three steps:
Step 1, constructing an aspect word extraction data set;
Step 2, training an aspect word extraction model. First, a multi-feature encoding layer initially encodes the sentence; then the initial encoding is fed into a context encoding layer based on a bidirectional LSTM to learn the contextual information of the sentence; next, the output of that layer is fed into a global context information extraction layer based on a multi-head self-attention mechanism to learn the semantic associations between aspect words; finally, sequence decoding is completed by a sequence decoding layer based on a conditional random field, yielding the aspect word extraction result. In the training stage, the loss function compares the model's predicted values with the true values and computes a loss value, and the model parameters are updated by back-propagation so that they improve. In addition, after each round of training, the validation set is fed to the model for validation;
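The train/validate cycle described above can be sketched as follows. This is a minimal PyTorch sketch under assumed names (`model`, `loader`, `loss_fn`, `optimizer` are placeholders for the patent's aspect word extraction model and data, not part of the patent itself):

```python
import torch
from torch import nn

def train_epoch(model, loader, loss_fn, optimizer):
    # One training round: compare predictions with gold labels,
    # compute the loss value, and update parameters by back-propagation.
    model.train()
    total = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()           # back-propagate the loss
        optimizer.step()          # update model parameters
        total += loss.item()
    return total / max(len(loader), 1)

def validate(model, loader, loss_fn):
    # After each training round, the validation set is fed to the model.
    model.eval()
    total = 0.0
    with torch.no_grad():
        for x, y in loader:
            total += loss_fn(model(x), y).item()
    return total / max(len(loader), 1)
```

In practice the model here would be the full multi-feature/BiLSTM/attention/CRF stack described below; any torch module with a compatible loss works in this loop.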
Step 2, training the aspect word extraction model, is divided into 4 sub-steps:
Sub-step 2-1: initially encode the sentence with the multi-feature encoding layer. The specific process is as follows:
Define E_w as the overall embedding of three types of information, where E_T, E_S and E_P denote the word embedding, segment embedding and position embedding respectively; the information embedding in RoBERTa is expressed as:

E_w = E_T + E_S + E_P  (1)

Then a multi-layer Transformer encoder encodes the embedding result. With the input of the first encoder layer defined as H_0 = E_w, the encoding process is expressed as:

H_i = Transformer(H_{i-1}), i ∈ [1, L]  (2)

where H_i denotes the result of the i-th Transformer layer and L denotes the total number of Transformer layers in the RoBERTa-base encoder.
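Equations (1) and (2) can be illustrated with a minimal NumPy sketch; the stand-in `transformer_layer` below is only a placeholder for a real pretrained RoBERTa-base encoder layer, and all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 6, 8
L = 12  # RoBERTa-base has 12 Transformer layers

# Equation (1): the overall embedding is the element-wise sum of the
# word (token), segment and position embeddings.
E_T = rng.normal(size=(seq_len, hidden))   # word embedding
E_S = rng.normal(size=(seq_len, hidden))   # segment embedding
E_P = rng.normal(size=(seq_len, hidden))   # position embedding
E_w = E_T + E_S + E_P

# Equation (2): H_i = Transformer(H_{i-1}); a toy stand-in layer is
# used here in place of a real pretrained encoder layer.
def transformer_layer(H):
    return np.tanh(H)  # placeholder, preserves the (seq_len, hidden) shape

H = E_w  # H_0 = E_w
for _ in range(L):
    H = transformer_layer(H)
```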
Next, the characters corresponding to each word are encoded. Let the padded character sequence be C = {c_1, c_2, ..., c_n}, where n denotes the number of characters, and let Emb_c be the character embedding matrix; the character embedding process is expressed as:

E_c = Emb_c · C  (3)

In the character encoding stage a bidirectional long short-term memory network (BiLSTM) is used as the character encoder, and the encoding process is expressed as:

h_fwd = LSTM_fwd(E_c),  h_bwd = LSTM_bwd(E_c),  H_C = [h_fwd ; h_bwd]

where h_fwd denotes the forward hidden-state output of the BiLSTM, h_bwd denotes the backward hidden-state output, H_C denotes the final output of the BiLSTM, and [ ; ] denotes the vector concatenation operation.

Finally, the four features of different granularity (word, position, segment and character) are fused:

H_CW = [H_L ; H_C]

where H_CW denotes the vector representation fusing the word, position, segment and character features, H_L denotes the output of the last Transformer layer of RoBERTa-base, H_C denotes the final output of the BiLSTM, and [ ; ] denotes the vector concatenation operation.
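The character encoder and the feature fusion can be sketched in PyTorch roughly as follows; all dimensions (`char_dim`, `hidden`, the 768-dim RoBERTa-base output) are illustrative assumptions, not values fixed by the patent:

```python
import torch
from torch import nn

class CharEncoder(nn.Module):
    """Character-level encoder sketch: an embedding lookup (Eq. 3)
    followed by a bidirectional LSTM; the final forward and backward
    hidden states are concatenated."""
    def __init__(self, n_chars=100, char_dim=30, hidden=25):
        super().__init__()
        self.emb = nn.Embedding(n_chars, char_dim)          # Emb_c
        self.lstm = nn.LSTM(char_dim, hidden,
                            bidirectional=True, batch_first=True)

    def forward(self, char_ids):
        E_c = self.emb(char_ids)                            # E_c = Emb_c · C
        _, (h_n, _) = self.lstm(E_c)                        # h_n: (2, B, hidden)
        return torch.cat([h_n[0], h_n[1]], dim=-1)          # [h_fwd ; h_bwd]

# Fuse the RoBERTa word/position/segment representation H_L
# (768-dim for RoBERTa-base) with the character features H_C:
enc = CharEncoder()
H_L = torch.randn(1, 768)                                   # stand-in for RoBERTa output
H_C = enc(torch.randint(0, 100, (1, 12)))                   # 12 padded characters
H_CW = torch.cat([H_L, H_C], dim=-1)                        # fused features
```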
Sub-step 2-2: the vector representation fusing the four features of different granularity, obtained from the multi-feature encoding layer, is context-encoded in a context encoding layer based on a bidirectional long short-term memory network. The process is as follows.

The context encoding process based on the long short-term memory network is expressed as:

h_fwd = LSTM_fwd(H_CW),  h_bwd = LSTM_bwd(H_CW),  H_ctx = [h_fwd ; h_bwd]

where h_fwd denotes the output of the forward hidden layer of the BiLSTM, h_bwd denotes the output of the backward hidden layer, H_ctx denotes the final output of the BiLSTM, and [ ; ] denotes the vector concatenation operation.

The input gate i_t, output gate o_t and forget gate f_t of the LSTM cell are computed as:

i_t = sigmoid(W_i · [h_{t-1}, x_t] + b_i)
f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f)
o_t = sigmoid(W_o · [h_{t-1}, x_t] + b_o)

where W denotes a weight matrix, b denotes a bias value, sigmoid and tanh denote activation functions, and · denotes matrix multiplication.
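The gate equations the text refers to are the standard LSTM cell equations; a minimal NumPy sketch of one cell step (standard formulation, with assumed parameter shapes) is:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step. W, U, b hold the stacked parameters of the
    input gate i_t, forget gate f_t, output gate o_t and candidate
    cell state, in the standard formulation."""
    z = W @ x_t + U @ h_prev + b
    d = len(h_prev)
    i_t = sigmoid(z[0*d:1*d])        # input gate
    f_t = sigmoid(z[1*d:2*d])        # forget gate
    o_t = sigmoid(z[2*d:3*d])        # output gate
    g_t = np.tanh(z[3*d:4*d])        # candidate cell state
    c_t = f_t * c_prev + i_t * g_t   # new cell state
    h_t = o_t * np.tanh(c_t)         # new hidden state
    return h_t, c_t

rng = np.random.default_rng(1)
d_in, d_h = 4, 3
h, c = np.zeros(d_h), np.zeros(d_h)
W = rng.normal(size=(4*d_h, d_in))
U = rng.normal(size=(4*d_h, d_h))
b = np.zeros(4*d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
```

A bidirectional encoder runs this recurrence once left-to-right and once right-to-left and concatenates the two hidden states, as in the equations above.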
Sub-step 2-3: extract the global context information of the sentence with a global context information extraction layer based on a multi-head self-attention mechanism. The specific process is as follows.

First, the input vector is transformed by a linear layer:

h'_i = W_h · h_i + b_h

where h'_i is the feature vector obtained by the linear-layer computation, and W_h and b_h are the weight matrix and bias value respectively.

Then the feature vectors are multiplied by three weight matrices W_Q, W_K and W_V to obtain q_i, k_j and v_j:

q_i = W_Q · h'_i,  k_j = W_K · h'_j,  v_j = W_V · h'_j

Next, q_i is transposed and multiplied with k_j to obtain an attention score, which is divided by sqrt(d_k); finally the weights w_ij are obtained by softmax normalization:

w_ij = softmax(q_i^T · k_j / sqrt(d_k))

Then each v_j is multiplied by its weight w_ij and the results are summed to obtain the output vector b_i of the self-attention layer:

b_i = Σ_j w_ij · v_j

Let the output of the k-th self-attention head be b^k. The vector concatenation of the multi-head self-attention mechanism is expressed as:

H'_att = concat(b^1, b^2, ..., b^K)

where concat denotes the vector concatenation operation, and the outputs of the K self-attention heads are concatenated to obtain the hidden state vector H'_att.

Finally, H'_att passes through a linear layer to obtain the final output H_att of the multi-head attention mechanism, with W_O denoting a weight matrix and b_O the bias value:

H_att = W_O · H'_att + b_O
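The whole sub-step can be condensed into a minimal NumPy sketch of multi-head self-attention; the random matrices stand in for the learned weights W_Q, W_K, W_V and the output projection, and all sizes are hypothetical:

```python
import numpy as np

def multi_head_self_attention(H, K=4, seed=0):
    """Minimal multi-head self-attention sketch following the text:
    project H with W_Q, W_K, W_V per head, score q·k^T / sqrt(d_k),
    normalise with softmax, weight the values, concatenate the K head
    outputs and apply a final linear layer."""
    rng = np.random.default_rng(seed)
    n, d = H.shape
    d_k = d // K
    heads = []
    for _ in range(K):
        W_Q, W_K, W_V = (rng.normal(size=(d, d_k)) for _ in range(3))
        q, k, v = H @ W_Q, H @ W_K, H @ W_V
        scores = q @ k.T / np.sqrt(d_k)                 # q_i^T · k_j / sqrt(d_k)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)              # softmax weights w_ij
        heads.append(w @ v)                             # b_i = sum_j w_ij · v_j
    H_att = np.concatenate(heads, axis=-1)              # concat of K heads
    W_O = rng.normal(size=(d, d))                       # final linear layer
    return H_att @ W_O

H = np.random.default_rng(2).normal(size=(5, 8))        # 5 tokens, 8-dim features
out = multi_head_self_attention(H)
```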
Sub-step 2-4: a sequence decoding layer based on a conditional random field is used as the sequence decoder, and aspect word extraction in the sentence is completed by sequence labeling. Let the input of the sequence decoding layer be X = {x_1, x_2, ..., x_m} and the tag sequence be Y = {y_1, y_2, ..., y_m}; the prediction process is expressed as:

P(Y|X) = softmax(s(X,Y))  (27)

where s(X,Y) = Σ_i A_{y_i, y_{i+1}} + Σ_i H_{i, y_i} denotes the score of the tag prediction; A denotes a randomly initialized transition matrix whose entry A_{y_i, y_{i+1}} represents the correlation of adjacent tags y_i and y_{i+1}; H denotes the output of the previous layer, with H_{i, y_i} the score of the y_i-th tag at position i; P(Y|X) denotes the conditional probability of Y given X; and softmax is the activation function.

Finally, the Viterbi algorithm is used to compute the tag sequence with the highest score, which is taken as the final prediction Y*:

Y* = argmax_Y s(X, Y)

The loss function of the model is expressed as:

Loss = −ln P(Y|X)

where ln denotes the natural logarithm and P(Y|X) denotes the conditional probability of Y given X.
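The Viterbi decoding of the CRF scores can be sketched as follows; the emission/transition layout is the standard linear-chain CRF formulation (emissions playing the role of H, transitions the role of A), and the three-tag BIO scheme in the toy example is an assumption:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Viterbi decoding over CRF scores: emissions[t, y] is the score of
    tag y at position t, transitions[y, y2] scores adjacent tags (y, y2);
    returns the highest-scoring tag sequence Y*."""
    T, n_tags = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, n_tags), dtype=int)
    for t in range(1, T):
        # candidate score of ending at tag j coming from tag i
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):        # backtrack the best path
        best.append(int(back[t, best[-1]]))
    return best[::-1]

# Toy example with 3 tags (0: O, 1: B-ASP, 2: I-ASP):
em = np.array([[0.1, 2.0, 0.0],
               [0.2, 0.1, 1.5],
               [1.0, 0.3, 0.2]])
tr = np.zeros((3, 3))
tr[0, 2] = -5.0   # discourage the invalid O -> I-ASP transition
path = viterbi(em, tr)   # B-ASP, I-ASP, O
```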
Step 3, test the model with the test set. The text to be processed is first fed into the model obtained by the training process of step 2; the model performs multi-feature encoding, context encoding and global context information extraction on the sentence, and finally the sequence decoder completes the aspect word extraction.
Compared with the prior art, the invention has the following beneficial effects:
The method fully learns the features of the data set: it encodes the initial features of the sentence through the multi-feature encoding layer, mines the deep information of the sentence through the context encoding layer, and learns the associations between aspect words through the global context information extraction layer, so the accuracy of aspect word extraction is further improved and the model is strongly robust. The method ensures the integrity of the extracted aspect words and lays a good foundation for aspect-level sentiment classification.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a general framework diagram of a method of an embodiment of the invention;
FIG. 3 is a diagram of the internal structure of the context encoding layer based on a long short-term memory network;
FIG. 4 is a detailed diagram of the global context information extraction layer based on a multi-headed self-attention mechanism.
Detailed Description
The following examples serve to further illustrate the invention and to provide a better understanding of it.
Example 1: referring to FIGS. 1 to 4, a method for extracting aspect words based on deep learning comprises the following steps:
Step 2, training the aspect word extraction model, is implemented exactly as described in sub-steps 2-1 to 2-4 of the technical scheme above.
Step 3: the text to be processed is first fed into the model obtained by the training process of step 2; the model performs multi-feature encoding, context encoding and global context information extraction on the sentence, and the aspect word extraction result is finally obtained by sequence decoding.
In summary, the invention first uses the multi-feature encoding layer to initially encode the sentence; the context encoding layer then learns the contextual information of the sentence; the global context information extraction layer learns the association information between aspect words; and finally the sequence decoding layer completes the aspect word extraction.
It should be noted that the above-mentioned embodiments illustrate rather than limit the scope of the invention, and that those skilled in the art may, after reading the present disclosure, modify the invention into various equivalent forms, all falling within the scope of the appended claims.
Claims (5)
1. A method for extracting aspect words based on deep learning, characterized by comprising the following steps:
step 1, constructing an aspect word extraction data set;
step 2, training an aspect word extraction model;
step 3, testing the model on the data set.
2. The method for extracting aspect words based on deep learning of claim 1, wherein in step 1 an aspect word extraction data set is constructed: first the SemEval2014 Restaurant and Laptop data sets are collected, then the ACL14 Twitter public data set, and finally the data sets are divided into a training set and a validation set at a ratio of 8:2, used respectively for training and validating the aspect word extraction model.
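The 8:2 split described in this claim can be sketched as follows; the corpus below is a hypothetical stand-in for the real SemEval2014 Restaurant/Laptop and ACL14 Twitter data, and the shuffle seed is an assumption:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle labelled sentences and split them into training and
    validation sets at the 8:2 ratio described in the claim."""
    data = list(samples)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * train_ratio)
    return data[:cut], data[cut:]

# Hypothetical (sentence, BIO-tag-string) pairs standing in for the
# real corpora:
corpus = [(f"sentence {i}", "O O O O O") for i in range(10)]
train, dev = split_dataset(corpus)
```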
3. The method for extracting aspect words based on deep learning of claim 1, wherein step 2 comprises: first, the sentences from which aspect words are to be extracted are fed into the multi-feature encoding layer of the model to obtain an initial encoding result; then the initial encoding result is fed into a context encoding layer based on a bidirectional LSTM for context encoding of the sentence; next, a global context information extraction layer based on a multi-head self-attention mechanism extracts the global contextual features of the sentence and captures the semantic associations between aspect words; finally, decoding is completed by a sequence decoding layer based on a conditional random field to obtain the aspect word extraction result; in the training stage, the loss function compares the model's predicted values with the true values, computes the loss value, and updates the model parameters by back-propagation so that they improve.
4. The method for extracting aspect words based on deep learning of claim 3, wherein in step 2 the aspect word extraction model is trained in 4 sub-steps:
sub-step 2-1: initially encode the sentence with the multi-feature encoding layer; the specific process is as follows:
define E_w as the overall embedding of three types of information, where E_T, E_S and E_P denote the word embedding, segment embedding and position embedding respectively; the information embedding in RoBERTa is expressed as:

E_w = E_T + E_S + E_P  (1)

then a multi-layer Transformer encoder encodes the embedding result; with the input of the first encoder layer defined as H_0 = E_w, the encoding process is expressed as:

H_i = Transformer(H_{i-1}), i ∈ [1, L]  (2)

where H_i denotes the result of the i-th Transformer layer and L denotes the total number of Transformer layers in the RoBERTa-base encoder;
then the characters corresponding to each word are encoded; let the padded character sequence be C = {c_1, c_2, ..., c_n}, where n denotes the number of characters, and let Emb_c be the character embedding matrix; the character embedding process is expressed as:

E_c = Emb_c · C  (3)

in the character encoding stage a bidirectional long short-term memory network (BiLSTM) is used as the character encoder, and the encoding process is expressed as:

h_fwd = LSTM_fwd(E_c),  h_bwd = LSTM_bwd(E_c),  H_C = [h_fwd ; h_bwd]

where h_fwd denotes the forward hidden-state output of the BiLSTM, h_bwd the backward hidden-state output, H_C the final output of the BiLSTM, and [ ; ] the vector concatenation operation;

finally, the four features of different granularity (word, position, segment and character) are fused:

H_CW = [H_L ; H_C]

where H_CW denotes the vector representation fusing the word, position, segment and character features, H_L denotes the output of the last Transformer layer of RoBERTa-base, and H_C denotes the final output of the BiLSTM;
and a substep 2-2, obtaining vector representation fusing four different granularity characteristics from a multi-characteristic coding layer, and carrying out context coding on sentences in a context coding layer based on a bidirectional long-term and short-term memory network, wherein the process is as follows:
the context coding process based on the long-short term memory network can be expressed as follows:
wherein the content of the first and second substances,represents the output of the bi-directional LSTM forward hidden layer,representing the output of a bi-directional LSTM backward hidden layer, H ctx Representing the final output of the bi-directional LSTM,a join operation that represents a vector is performed,
the input gate i_t, output gate o_t, and forget gate f_t of the LSTM cell are computed respectively as:

i_t = sigmoid(W_i × [h_{t-1}, x_t] + b_i)
o_t = sigmoid(W_o × [h_{t-1}, x_t] + b_o)
f_t = sigmoid(W_f × [h_{t-1}, x_t] + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c × [h_{t-1}, x_t] + b_c)
h_t = o_t ⊙ tanh(c_t)

wherein W represents a weight matrix, b represents a bias value, sigmoid and tanh represent activation functions, × represents matrix multiplication, and ⊙ represents element-wise multiplication;
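A single LSTM step with the gate computations above can be sketched in NumPy as follows; the input and hidden dimensions are illustrative, and the four gate weight matrices are stacked into one W for brevity:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_{t-1}; x_t] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([h_prev, x_t]) + b  # matrix multiplication (×)
    d = h_prev.size
    i_t = sigmoid(z[0 * d:1 * d])              # input gate
    f_t = sigmoid(z[1 * d:2 * d])              # forget gate
    o_t = sigmoid(z[2 * d:3 * d])              # output gate
    g_t = np.tanh(z[3 * d:4 * d])              # candidate cell state
    c_t = f_t * c_prev + i_t * g_t             # element-wise (⊙) cell update
    h_t = o_t * np.tanh(c_t)                   # hidden state
    return h_t, c_t

rng = np.random.default_rng(1)
d_in, d_hid = 4, 3                             # illustrative sizes (assumption)
W = rng.normal(size=(4 * d_hid, d_hid + d_in))
b = np.zeros(4 * d_hid)
h, c = np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_cell(rng.normal(size=d_in), h, c, W, b)
print(h.shape)  # (3,)
```

A bidirectional encoder simply runs two such cells, one over the sequence left-to-right and one right-to-left, and concatenates their hidden states.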
and a substep 2-3, extracting the global context information of the sentence with a global context information extraction layer based on a multi-head self-attention mechanism, wherein the specific process is as follows:
first, the input vector is transformed by a linear layer; the calculation process can be expressed as:

H' = W_l × H_ctx + b_l

wherein H' is the feature vector obtained from the input by the linear-layer calculation, and W_l and b_l are a weight matrix and a bias value, respectively;
then, the feature vectors are multiplied by the three weight matrices W_Q, W_K, and W_V respectively to obtain q_i, k_j, and v_j; the calculation process is expressed as:

q_i = W_Q × h'_i
k_j = W_K × h'_j
v_j = W_V × h'_j
then, the transpose of q_i is multiplied by k_j to obtain the attention score, which is divided by √d_k; finally, normalization by the softmax function yields the attention weight w_ij; the calculation process can be expressed as:

w_ij = softmax(q_i^T × k_j / √d_k)
after that, v_j is multiplied by the weight w_ij and the products are summed to obtain the output vector of the self-attention head; the calculation process can be expressed as:

z_i = Σ_j w_ij × v_j
wherein the output of the k-th self-attention head is denoted head_k; the vector concatenation process of the multi-head self-attention mechanism can be expressed as:

H'_att = concat(head_1, head_2, ..., head_K)

wherein concat represents the vector concatenation operation, and the outputs of the K self-attention heads are concatenated to obtain the hidden-state vector H'_att;
finally, H'_att passes through a linear layer to obtain the final output H_att of the multi-head attention mechanism, wherein W_att represents a weight matrix and b_att represents a bias value; the calculation process is expressed as:

H_att = W_att × H'_att + b_att
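The whole of substep 2-3 (projections, scaled dot-product weights, per-head outputs, concatenation, and output projection) can be sketched in NumPy; sequence length, model dimension, and head count are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(H, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product self-attention with num_heads heads."""
    n, d = H.shape
    d_k = d // num_heads
    Q, K_, V = H @ Wq, H @ Wk, H @ Wv                    # q_i, k_j, v_j projections
    heads = []
    for k in range(num_heads):
        s = slice(k * d_k, (k + 1) * d_k)
        w = softmax(Q[:, s] @ K_[:, s].T / np.sqrt(d_k))  # w_ij = softmax(q_i^T k_j / sqrt(d_k))
        heads.append(w @ V[:, s])                         # z_i = sum_j w_ij v_j
    H_att_prime = np.concatenate(heads, axis=-1)          # concat(head_1, ..., head_K)
    return H_att_prime @ Wo                               # final linear layer

rng = np.random.default_rng(2)
n, d, K = 5, 8, 2                                         # illustrative sizes (assumption)
H_ctx = rng.normal(size=(n, d))
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) for _ in range(4))
H_att = multi_head_self_attention(H_ctx, Wq, Wk, Wv, Wo, K)
print(H_att.shape)  # (5, 8)
```

Each head attends over the full sentence, which is what lets this layer capture the global context that a recurrent encoder alone tends to compress away.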
and a substep 2-4, using a sequence decoding layer based on a conditional random field (CRF) as the sequence decoder, and completing the extraction of aspect words in the sentence through sequence labeling; assuming the input of the sequence decoding layer is X = {x_1, x_2, ..., x_m} and the label sequence is Y = {y_1, y_2, ..., y_m}, the prediction calculation process is represented as:
P(Y|X)=softmax(s(X,Y)) (27)
where s(X, Y) represents the score of the label prediction:

s(X, Y) = Σ_i A_{y_i, y_{i+1}} + Σ_i H_{i, y_i}

wherein A represents a randomly initialized transition matrix, A_{y_i, y_{i+1}} represents the transition score between adjacent labels y_i and y_{i+1}, H represents the output of the upper layer, H_{i, y_i} denotes the score of the y_i-th label at position i, P(Y|X) represents the conditional probability of Y given X, and softmax is the activation function; finally, the Viterbi algorithm is used to find the label sequence with the highest score as the final prediction Y*; the calculation process is expressed as:

Y* = argmax_Y s(X, Y)
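The Viterbi search over the CRF score can be sketched in NumPy; the tag set and score values below are made up for illustration (e.g. a 3-tag B/I/O scheme for aspect-word labeling):

```python
import numpy as np

def viterbi_decode(H, A):
    """Find argmax over label sequences of sum(A[y_i, y_{i+1}]) + sum(H[i, y_i])."""
    m, n_tags = H.shape
    score = H[0].copy()                     # best score ending in each tag at position 0
    backptr = np.zeros((m, n_tags), dtype=int)
    for i in range(1, m):
        # total[p, c] = best score through previous tag p, then transition to tag c
        total = score[:, None] + A + H[i][None, :]
        backptr[i] = total.argmax(axis=0)
        score = total.max(axis=0)
    best = [int(score.argmax())]            # backtrack from the best final tag
    for i in range(m - 1, 0, -1):
        best.append(int(backptr[i][best[-1]]))
    return best[::-1]

# emission scores H (3 positions, 3 tags) and transition matrix A -- illustrative values
H = np.array([[2.0, 0.1, 0.1],
              [0.1, 2.0, 0.1],
              [0.1, 0.1, 2.0]])
A = np.zeros((3, 3))                        # randomly initialized (and learned) in the patent
print(viterbi_decode(H, A))  # [0, 1, 2]
```

Dynamic programming makes the search linear in sentence length rather than exponential in the number of candidate label sequences.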
the loss function of the model can be expressed as:
where ln represents a natural logarithm, and P (Y | X) represents a conditional probability of Y occurring under the condition of X.
5. The method for extracting aspect words based on deep learning according to claim 1, wherein in step 3, the model is tested with the test set; specifically, the text to be processed is first fed into the model obtained through the training process of step 2; the model performs multi-feature encoding, context encoding, and global context information extraction on the sentence, and finally the extraction of aspect words is completed by the sequence decoder.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210514804.8A CN114896969A (en) | 2022-05-12 | 2022-05-12 | Method for extracting aspect words based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114896969A true CN114896969A (en) | 2022-08-12 |
Family
ID=82722227
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210514804.8A Pending CN114896969A (en) | 2022-05-12 | 2022-05-12 | Method for extracting aspect words based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114896969A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116737922A (en) * | 2023-03-10 | 2023-09-12 | 云南大学 | Tourist online comment fine granularity emotion analysis method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||