CN114896969A - Method for extracting aspect words based on deep learning - Google Patents

Method for extracting aspect words based on deep learning

Info

Publication number: CN114896969A
Application number: CN202210514804.8A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 杨鹏 (Yang Peng), 张朋辉 (Zhang Penghui), 戈妍妍 (Ge Yanyan)
Current assignee: Nanjing Youhui Xin'an Technology Co., Ltd. (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original assignee: Nanjing Youhui Xin'an Technology Co., Ltd.
Application filed by Nanjing Youhui Xin'an Technology Co., Ltd.
Priority to CN202210514804.8A

Classifications

    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates (natural language analysis; recognition of textual entities)
    • G06F 18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models
    • G06F 18/25 Fusion techniques
    • G06F 40/126 Character encoding
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/048 Activation functions
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/084 Backpropagation, e.g. using gradient descent


Abstract

The invention discloses a method for extracting aspect words based on deep learning, comprising the following steps: constructing an aspect word extraction data set; embedding sentence features of the data set into a semantic space; encoding sentence features with a multi-feature encoder; encoding the sentence context with a bidirectional-LSTM-based context coding layer; extracting the global semantic information of the sentence with a global semantic information extraction layer based on a multi-head self-attention mechanism, thereby capturing the semantic relation between aspect words and context; and decoding the vectors learned by the model with a conditional-random-field-based sequence decoding layer, completing the extraction of aspect words through sequence labeling. The method can be applied to aspect word extraction from social media text. By fully learning the multi-granularity features of a sentence and capturing the context semantic information of aspect words with a multi-head self-attention mechanism, the method remains effective in complex scenarios and offers high precision and strong robustness.

Description

Method for extracting aspect words based on deep learning
Technical Field
The invention relates to a method for extracting aspect words based on deep learning, which can be used for aspect word extraction from social media texts and belongs to the technical field of the internet and natural language processing.
Background
With the continuous development of the internet, more and more netizens are accustomed to expressing views and attitudes towards news events on social media (e.g., microblog, Twitter). Social media platforms have gradually become sensors of real-world events, and online public opinion plays an increasingly important role in reflecting popular sentiment and refracting reality. At the same time, the network is also filled with harmful opinions; cyberspace is not beyond the law, and supervising online public opinion through public opinion analysis technology helps create a healthy and harmonious network environment and enables government departments to understand public sentiment and properly handle public opinion events. Sentiment analysis is an important component of public opinion analysis, and its quality directly determines the quality of public opinion analysis. Existing sentiment analysis techniques operate at the document level or the sentence level, which cannot meet the demand of public opinion analysis systems for fine-grained sentiment about social media content, so aspect-level sentiment analysis needs to be introduced. Aspect word extraction is a prerequisite for aspect-level sentiment analysis, and high-quality aspect words are of great significance to it.
In recent years, many scholars have studied aspect word extraction in depth. Existing methods fall into two types: supervised and unsupervised. Supervised methods treat aspect word extraction as a sequence labeling task; common approaches include graph-based, semantic-analysis-based and statistics-based methods. Although these methods improve extraction precision to a certain extent, they depend heavily on high-quality manually labeled data, which is costly to obtain, and the resulting models are difficult to migrate to new domains. Unsupervised methods alleviate these problems to a certain extent, but they do not fully capture word-order information and neglect character-level features, which leads to incomplete extracted aspect words.
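As a concrete illustration of the sequence-labeling formulation mentioned above, aspect word extraction is commonly cast as BIO tagging, where a B tag marks the first token of an aspect word, an I tag its continuation, and O everything else. A minimal sketch (the tag names, sentence and helper function are illustrative assumptions, not taken from the patent):

```python
def extract_aspects(tokens, tags):
    """Collect aspect-word spans from a BIO tag sequence."""
    aspects, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-ASP":                  # start of a new aspect span
            if current:
                aspects.append(" ".join(current))
            current = [token]
        elif tag == "I-ASP" and current:    # continuation of the open span
            current.append(token)
        else:                               # an O tag closes any open span
            if current:
                aspects.append(" ".join(current))
            current = []
    if current:
        aspects.append(" ".join(current))
    return aspects

tokens = ["The", "battery", "life", "is", "great", "but", "the", "screen", "flickers"]
tags   = ["O", "B-ASP", "I-ASP", "O", "O", "O", "O", "B-ASP", "O"]
print(extract_aspects(tokens, tags))  # ['battery life', 'screen']
```

A model that predicts the tag sequence correctly thus recovers multi-word aspect terms whole, which is exactly the completeness problem the patent attributes to unsupervised methods.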
Aiming at the insufficient learning of sentence semantic features in existing aspect word extraction research, the invention provides a method for extracting aspect words based on deep learning. First, a multi-feature coding layer performs initial coding of the sentence; then the initial code is fed into a bidirectional-LSTM-based context coding layer to learn the context information of the sentence; next, the result is fed into a global context information extraction layer based on a multi-head self-attention mechanism to learn the semantic associations between aspect words; finally, a conditional-random-field-based sequence decoding layer completes sequence decoding to obtain the aspect word extraction result. The method improves both the robustness and the accuracy of the aspect word extraction model.
Disclosure of Invention
Aiming at the problems and defects of the prior art, the invention provides a method for extracting aspect words based on deep learning.
To achieve this purpose, the technical scheme of the invention is as follows: a method for extracting aspect words based on deep learning covers the whole aspect word extraction process, mainly comprising multi-feature coding, context coding, global context information extraction and word sequence decoding, and can effectively extract aspect words from comment texts, thereby improving the precision of the task. The method comprises three main steps:
Step 1, constructing an aspect word extraction data set. The SemEval-2014 Restaurant and Laptop data sets are collected first, then the ACL-14 Twitter public data set; finally the data are divided into a training set and a validation set at a ratio of 8:2, used respectively for training and validating the aspect word extraction model.
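The 8:2 train/validation split described in step 1 can be sketched as follows; the sample strings and the fixed seed are illustrative assumptions, not details given in the patent:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle and split samples into training and validation sets at the given ratio."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# toy stand-ins for sentences from the Restaurant/Laptop and Twitter data sets
samples = [f"sentence_{i}" for i in range(100)]
train, valid = split_dataset(samples)
print(len(train), len(valid))  # 80 20
```

Shuffling before cutting avoids ordering bias (e.g. all Twitter sentences landing in the validation set) when the three collected data sets are concatenated.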
Step 2, training the aspect word extraction model. First, a multi-feature coding layer performs initial coding of the sentence; then the initial code is fed into a bidirectional-LSTM-based context coding layer to learn the context information of the sentence; next, the result of the previous layer is fed into a global context information extraction layer based on a multi-head self-attention mechanism to learn the semantic associations between aspect words; finally, a conditional-random-field-based sequence decoding layer completes sequence decoding to obtain the aspect word extraction result. In the training stage, the loss function compares the model's predicted values with the true values and computes a loss value, and the model parameters are updated through back propagation so that they improve. In addition, after each training round, the validation set is fed into the model for validation.
The implementation of step 2, training the aspect word extraction model, is divided into 4 sub-steps:
Sub-step 2-1: the sentence is initially coded by the multi-feature coding layer. The specific process is as follows:
Define E_w as the overall embedding of three types of information, where E_T, E_S and E_P denote word embedding, segment embedding and position embedding respectively; the information embedding in RoBERTa is expressed as:
E_w = E_T + E_S + E_P    (1)
Then a multi-layer Transformer encoder encodes the embedding result. Defining the input of the first encoder layer as H_0, we have H_0 = E_w, and the encoding process is expressed as:
H_i = Transformer(H_{i-1}), i ∈ [1, L]    (2)
where H_i denotes the output of the i-th Transformer layer and L denotes the total number of Transformer layers of the RoBERTa-base encoder.
Next, the characters of each word are encoded. Suppose the padded character sequence is C = {c_1, c_2, ..., c_n}, where n denotes the number of characters, and let Emb_c be the character embedding matrix; the character embedding process is expressed as:
E_c = Emb_c · C    (3)
In the character encoding stage, a bidirectional long short-term memory network (BiLSTM) serves as the character encoder:
→h_t^c = LSTM_fw(E_c, →h_{t-1}^c)    (4)
←h_t^c = LSTM_bw(E_c, ←h_{t+1}^c)    (5)
H_C = →h^c ⊕ ←h^c    (6)
where →h_t^c denotes the forward hidden-state output of the BiLSTM, ←h_t^c denotes the backward hidden-state output, H_C denotes the final output of the BiLSTM, and ⊕ denotes vector concatenation.
Finally, the four features of different granularity (word, position, segment and character) are fused:
h_t^{CW} = h_t^L ⊕ h_t^C    (7)
H_CW = {h_1^{CW}, h_2^{CW}, ..., h_m^{CW}}    (8)
where H_CW denotes the vector representation fusing the word, position, segment and character features, H_L denotes the output of the last Transformer layer of RoBERTa-base, H_C denotes the final output of the BiLSTM, m denotes the number of words in the sentence, and ⊕ denotes vector concatenation.
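The fusion at the end of sub-step 2-1 reduces to concatenating the word-level representation from the Transformer with the character-level BiLSTM output along the feature axis. A minimal numpy sketch; the sequence length and the character dimension of 50 are assumptions for illustration (768 is the actual RoBERTa-base hidden size):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_word, d_char = 6, 768, 50     # d_char is an assumed dimension

H_L = rng.normal(size=(seq_len, d_word))  # last Transformer layer (word/position/segment)
H_C = rng.normal(size=(seq_len, d_char))  # character-level BiLSTM output

# fuse the four granularities by concatenating per token along the feature axis
H_CW = np.concatenate([H_L, H_C], axis=-1)
print(H_CW.shape)  # (6, 818)
```

Concatenation (rather than summation) keeps the word-level and character-level subspaces separate, leaving it to the following context coding layer to learn how to combine them.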
Sub-step 2-2: the vector representation fusing the four granularities, obtained from the multi-feature coding layer, is context-encoded in a context coding layer based on a bidirectional long short-term memory network. The process is as follows:
The context coding process based on the BiLSTM can be expressed as:
→h_t = LSTM_fw(H_CW, →h_{t-1})    (9)
←h_t = LSTM_bw(H_CW, ←h_{t+1})    (10)
h_t = →h_t ⊕ ←h_t    (11)
H_ctx = {h_1, h_2, ..., h_m}    (12)
where →h_t denotes the output of the forward hidden layer of the BiLSTM, ←h_t denotes the output of the backward hidden layer, H_ctx denotes the final output of the BiLSTM, and ⊕ denotes vector concatenation.
The input gate i_t, output gate o_t and forget gate f_t of the LSTM cell are computed as:
i_t = sigmoid(W_i x_t + U_i h_{t-1} + b_i)    (13)
o_t = sigmoid(W_o x_t + U_o h_{t-1} + b_o)    (14)
f_t = sigmoid(W_f x_t + U_f h_{t-1} + b_f)    (15)
The cell state c_t and hidden output h_t of the LSTM cell are computed as:
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c x_t + U_c h_{t-1} + b_c)    (16)
h_t = o_t ⊙ tanh(c_t)    (17)
where W and U denote weight matrices, b denotes a bias, sigmoid and tanh denote activation functions, and ⊙ denotes element-wise multiplication.
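The gate and state equations of sub-step 2-2 can be sketched as a single LSTM step in numpy. This is a generic LSTM cell under the standard gate formulation, not the patent's trained model; all dimensions and the parameter layout are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b stack the input/forget/output/candidate parameters."""
    W_i, W_f, W_o, W_c = W
    U_i, U_f, U_o, U_c = U
    b_i, b_f, b_o, b_c = b
    i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)       # input gate
    f_t = sigmoid(W_f @ x_t + U_f @ h_prev + b_f)       # forget gate
    o_t = sigmoid(W_o @ x_t + U_o @ h_prev + b_o)       # output gate
    c_tilde = np.tanh(W_c @ x_t + U_c @ h_prev + b_c)   # candidate cell state
    c_t = f_t * c_tilde * 0 + f_t * c_prev + i_t * c_tilde  # element-wise cell update
    h_t = o_t * np.tanh(c_t)                            # hidden-state output
    return h_t, c_t

rng = np.random.default_rng(1)
d_in, d_hid = 4, 3
W = rng.normal(size=(4, d_hid, d_in))
U = rng.normal(size=(4, d_hid, d_hid))
b = np.zeros((4, d_hid))
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_hid), np.zeros(d_hid), W, U, b)
print(h.shape, c.shape)  # (3,) (3,)
```

Running the step forward over the sequence, and again backward with a second parameter set, then concatenating the two hidden states per token, gives the bidirectional encoding used by the layer.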
Sub-step 2-3: the global context information of the sentence is extracted by a global context information extraction layer based on a multi-head self-attention mechanism. The specific process is as follows:
First, the input vector is transformed by a linear layer:
h'_i = W_1 h_i + b_1    (18)
where h'_i denotes the feature vector obtained by the linear layer, and W_1 and b_1 denote the weight matrix and bias respectively.
Then the feature vectors are multiplied by three weight matrices W^Q, W^K and W^V to obtain q_i, k_j and v_j:
q_i = W^Q h'_i    (19)
k_j = W^K h'_j    (20)
v_j = W^V h'_j    (21)
Next, the transpose q_i^T is multiplied by k_j to obtain the attention score, which is divided by √d_k and normalized by the softmax function to obtain the weight w_ij:
w_ij = softmax(q_i^T k_j / √d_k)    (22)
After that, each v_j is multiplied by its weight w_ij and the results are summed to obtain the output vector of the self-attention layer:
h_i^att = Σ_j w_ij v_j    (23)
The output of the k-th attention head is denoted H_att^k. The vector concatenation of the multi-head self-attention mechanism is expressed as:
H'_att = concat(H_att^1, H_att^2, ..., H_att^K)    (24)
where concat denotes vector concatenation, and the outputs of the K attention heads are joined to obtain the hidden-state vector H'_att.
Finally, H'_att passes through a linear layer, with weight matrix W_2 and bias b_2, to obtain the final output of the multi-head attention mechanism:
H_att = W_2 H'_att + b_2    (25)
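The per-head scaled dot-product attention and head concatenation of sub-step 2-3 can be sketched in numpy as follows. This is the standard multi-head self-attention computation; all dimensions and the random weights are illustrative assumptions:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(H, Wq, Wk, Wv, Wo):
    """H: (n, d); Wq/Wk/Wv hold one (d, d_k) matrix per head; Wo: (K*d_k, d)."""
    heads = []
    for W_Q, W_K, W_V in zip(Wq, Wk, Wv):
        Q, K, V = H @ W_Q, H @ W_K, H @ W_V
        d_k = Q.shape[-1]
        w = softmax(Q @ K.T / np.sqrt(d_k))  # scaled dot-product attention weights
        heads.append(w @ V)                  # weighted sum of value vectors
    concat = np.concatenate(heads, axis=-1)  # join the K head outputs
    return concat @ Wo                       # final linear layer

rng = np.random.default_rng(2)
n, d, d_k, K = 5, 8, 4, 2
Wq = [rng.normal(size=(d, d_k)) for _ in range(K)]
Wk = [rng.normal(size=(d, d_k)) for _ in range(K)]
Wv = [rng.normal(size=(d, d_k)) for _ in range(K)]
Wo = rng.normal(size=(K * d_k, d))
out = multi_head_self_attention(rng.normal(size=(n, d)), Wq, Wk, Wv, Wo)
print(out.shape)  # (5, 8)
```

Because every token attends to every other token, this layer captures sentence-global associations that the sequential BiLSTM layer only propagates step by step.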
Sub-step 2-4: a conditional-random-field-based sequence decoding layer serves as the sequence decoder, and the extraction of aspect words in the sentence is completed through sequence labeling. Let the input of the sequence decoding layer be X = {x_1, x_2, ..., x_m} and the tag sequence be Y = {y_1, y_2, ..., y_m}; the prediction is computed as:
s(X, Y) = Σ_{i=0}^{m} A_{y_i, y_{i+1}} + Σ_{i=0}^{m} H_{i, y_{i+1}}    (26)
P(Y|X) = softmax(s(X, Y))    (27)
where s(X, Y) denotes the score of the tag prediction, A denotes a randomly initialized transition matrix in which A_{y_i, y_{i+1}} represents the correlation of adjacent tags y_i and y_{i+1}, H denotes the output of the previous layer in which H_{i, y_{i+1}} denotes the score of the y_{i+1}-th tag, P(Y|X) denotes the conditional probability of Y given X, and softmax is the activation function.
Finally, the Viterbi algorithm computes the tag sequence with the highest score, which is taken as the final prediction Y*:
Y* = argmax_{Y'} s(X, Y')    (28)
The loss function of the model can be expressed as:
Loss = -ln P(Y|X)    (29)
where ln denotes the natural logarithm and P(Y|X) denotes the conditional probability of Y given X.
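The Viterbi decoding step above can be sketched in pure Python over emission scores (the per-tag scores H) and transition scores (the matrix A). The tag set, scores and transition penalty below are illustrative assumptions, not the patent's learned parameters:

```python
def viterbi(emissions, transitions):
    """emissions: list of per-position {tag: score}; transitions: {(prev, cur): score}.
    Returns the highest-scoring tag sequence by dynamic programming."""
    tags = list(emissions[0])
    scores = {t: emissions[0][t] for t in tags}   # best score of a path ending in tag t
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = {}, {}
        for cur in tags:
            best_prev = max(tags, key=lambda p: scores[p] + transitions[(p, cur)])
            new_scores[cur] = scores[best_prev] + transitions[(best_prev, cur)] + emit[cur]
            pointers[cur] = best_prev
        backpointers.append(pointers)
        scores = new_scores
    best_last = max(tags, key=lambda t: scores[t])
    path = [best_last]
    for pointers in reversed(backpointers):       # follow backpointers to recover the path
        path.append(pointers[path[-1]])
    path.reverse()
    return path

tags = ["B", "I", "O"]
emissions = [{"B": 2.0, "I": 0.1, "O": 0.5},
             {"B": 0.2, "I": 1.5, "O": 0.4},
             {"B": 0.1, "I": 0.2, "O": 2.0}]
# strongly penalize the invalid O -> I transition; other scores are illustrative
transitions = {(p, c): (-5.0 if (p == "O" and c == "I") else 0.0)
               for p in tags for c in tags}
print(viterbi(emissions, transitions))  # ['B', 'I', 'O']
```

The transition matrix is what lets the CRF layer rule out invalid tag sequences (such as an I tag with no preceding B), which a per-token softmax classifier cannot enforce.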
Step 3, testing the model with the test set. The text to be processed is first fed into the model obtained through the training process of step 2; the model performs multi-feature coding, context coding and global context information extraction on the sentence, and the sequence decoder finally completes the extraction of the aspect words.
Compared with the prior art, the invention has the following beneficial effects:
the method fully learns the characteristics of the data set, encodes the initial characteristics of the sentence through the multi-characteristic encoding layer, excavates the deep information of the sentence through the context encoding layer, and finally learns the association between the face words through the global context information extraction layer, so that the accuracy of extracting the face words by the model is further improved, and the model has strong robustness. The method can ensure the integrity of the extracted aspect words and lay a good foundation for the aspect level emotion classification.
Drawings
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a general framework diagram of a method of an embodiment of the invention;
FIG. 3 is a diagram of the internal structure of the context coding layer based on a long short-term memory network;
FIG. 4 is a detailed diagram of the global context information extraction layer based on a multi-headed self-attention mechanism.
Detailed Description
The following examples are included to further illustrate the invention and to provide a better understanding of it.
Example 1: referring to fig. 1 to 4, a method for extracting an aspect word based on deep learning includes the following steps:
step 1, constructing a facet word extraction data set. SemEval2014 resultatant dataset and Laptop dataset are collected firstly, then ACL14 Twitter public dataset is collected, and finally, the datasets are divided into training sets and verification sets according to the ratio of 8:2 and are respectively used for training and verifying the aspect extraction model.
Step 2, training an aspect word extraction model, wherein the implementation process of the step is divided into 4 sub-steps:
and a substep 2-1, performing initial coding on the sentence by using a multi-feature coding layer, wherein the specific process is as follows:
definition E w Representing the result of the overall embedding of the three types of information, E T 、E s And E P Representing word embedding, fragment embedding and position embedding respectively, information embedding in Roberta is expressed as:
E w =E T +E S +E P (1)
then, the multi-layer Transformer encoder encodes the embedded result, and the input of the first-layer encoder is defined as H 0 Then there is H 0 =E w Then the process of encoding is represented as:
H i =Transformer(H i-1 ),i∈[1,L] (2)
wherein H i Represents the result of i-th layer transform encoding, and L represents the total number of layers of the transform of the Roberta-base encoder.
Next, the characters corresponding to each word are encoded, and it is assumed that the character sequence after padding is C ═ C 1 ,c 2 ,...,c n N represents the number of characters. Suppose Emb c For an embedded matrix of characters, the embedding process of the characters can be expressed as:
E c =Emb c ·C (3)
in the character encoding stage, a bidirectional long-short term memory network is used as a character encoder, and the encoding process can be expressed as follows:
Figure BDA0003641055980000071
Figure BDA0003641055980000072
Figure BDA0003641055980000073
wherein the content of the first and second substances,
Figure BDA0003641055980000074
representing the forward hidden state output of the bi-directional LSTM,
Figure BDA0003641055980000075
representing a backward hidden state output, H, of a bi-directional LSTM C Representing the final output of the bi-directional LSTM,
Figure BDA0003641055980000081
representing the join operation of the vectors.
And finally, fusing four different granularity characteristics of words, positions, segments and characters, wherein the process is represented as follows:
Figure BDA0003641055980000082
Figure BDA0003641055980000083
wherein H CW Vector representation representing four features of fused words, positions, segments and characters, H L Representing the output of the last layer of the transform of the Roberta-base, H C Representing the final output of the bi-directional LSTM,
Figure BDA0003641055980000084
representing the join operation of the vectors.
And a substep 2-2, obtaining vector representation fusing four different granularity characteristics from a multi-characteristic coding layer, and carrying out context coding on sentences in a context coding layer based on a bidirectional long-term and short-term memory network, wherein the process is as follows:
the context coding process based on the long-short term memory network can be expressed as follows:
Figure BDA0003641055980000085
Figure BDA0003641055980000086
Figure BDA0003641055980000087
Figure BDA0003641055980000088
wherein the content of the first and second substances,
Figure BDA0003641055980000089
represents the output of the bi-directional LSTM forward hidden layer,
Figure BDA00036410559800000810
representing the output of a bi-directional LSTM backward hidden layer, H ctx Representing the final output of the bi-directional LSTM,
Figure BDA00036410559800000811
representing the join operation of the vectors.
Input gate of LSTM cell i t And an output gate o t And forget door f t The calculation process of (a) can be expressed as:
Figure BDA00036410559800000812
Figure BDA00036410559800000813
Figure BDA00036410559800000814
output of LSTM cell
Figure BDA00036410559800000815
And
Figure BDA00036410559800000816
the calculation method of (a) can be expressed as:
Figure BDA00036410559800000817
Figure BDA0003641055980000091
wherein W represents a weight matrix, b represents a bias value, sigmoid and tanh represent activation functions, and x represents matrix multiplication.
And a substep 2-3, extracting the global context information of the sentence by using a global context information extraction layer based on a multi-head self-attention mechanism, wherein the specific process is as follows:
first, by converting the input vector in a linear layer, the calculation process can be expressed as:
Figure BDA0003641055980000092
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003641055980000093
to input the feature vectors obtained by the linear layer calculation,
Figure BDA0003641055980000094
and
Figure BDA0003641055980000095
respectively weight matrix and bias value.
Then, the feature vectors are combined with three weight matrices W Q 、W K And W V Are multiplied respectively to obtain q i 、k j And v j The calculation process can representComprises the following steps:
Figure BDA0003641055980000096
Figure BDA0003641055980000097
Figure BDA0003641055980000098
then, q is added i Is transferred to
Figure BDA0003641055980000099
And k is j Multiplying to obtain an attention score, and dividing the attention score by the attention score
Figure BDA00036410559800000910
Finally, the weight matrix w can be obtained through the standardization of the softmax function ij The calculation process can be expressed as:
Figure BDA00036410559800000911
after that, v is adjusted i And a weight w ij Multiplying and then adding up the output vector from the attention layer
Figure BDA00036410559800000912
The calculation process can be expressed as:
Figure BDA00036410559800000913
wherein an indicates a matrix multiplication. The k output from the attention head is
Figure BDA00036410559800000914
Vector connection of multi-head self-attention mechanismThe process can be represented as:
Figure BDA00036410559800000915
wherein concat represents a vector join operation, and K outputs from the attention head are joined to obtain a hidden state vector H' att
Finally, H' att Obtaining the final output of the multi-head attention mechanism through the operation of the linear layer
Figure BDA0003641055980000101
A matrix of weights is represented by a matrix of weights,
Figure BDA0003641055980000102
representing the bias value, the calculation process is represented as:
Figure BDA0003641055980000103
and a substep 2-4, using a sequence decoding layer based on the conditional random field as a sequence decoder, and completing the extraction of the aspect words in the sentence through sequence marking. Let X be input to the sequence decoding layer as X ═ X 1 ,x 2 ,...,x m Y ═ Y for the tag sequence 1 ,y 2 ,...,y m }, the calculation process of prediction can be expressed as:
Figure BDA0003641055980000104
P(Y|X)=softmax(s(X,Y)) (27)
where s (X, Y) represents the score of the label prediction, A represents a randomly initialized matrix,
Figure BDA0003641055980000105
for representing adjacent labels y i And y i+1 The correlation of (c). H represents the output of the upper layer and,
Figure BDA0003641055980000106
denotes the y th i+1 The score of each tag. P (Y | X) represents the conditional probability of Y occurring under the condition of X, softmax being the activation function.
Finally, using Viterbi algorithm to calculate the label sequence with highest score, and using it as final prediction result
Figure BDA0003641055980000107
The calculation process can be expressed as:
Figure BDA0003641055980000108
the loss function of the model can be expressed as:
Figure BDA0003641055980000109
where ln represents a natural logarithm, and P (Y | X) represents a conditional probability of Y occurring under the condition of X.
Step 3: the text to be processed is first fed into the model obtained through the training process of step 2; the model performs multi-feature coding, context coding and global context information extraction on the sentence, and sequence decoding finally yields the aspect word extraction result.
In summary, the invention first uses the multi-feature coding layer to initially encode the sentence; the context coding layer then learns the context information of the sentence; the global context information extraction layer learns the association information between aspect words; and finally the sequence decoding layer completes the extraction of the aspect words.
It should be noted that the above-mentioned embodiments illustrate rather than limit the scope of the invention, and that those skilled in the art may, after reading the invention, modify it into various equivalent forms, all of which fall within the scope of the appended claims.

Claims (5)

1. A method for extracting aspect words based on deep learning is characterized by comprising the following steps:
step 1, constructing a facet word extraction data set,
step 2, training an aspect word extraction model,
and 3, testing the data set.
2. The method for extracting the aspect words based on the deep learning of claim 1, wherein in step 1, the aspect word extraction data set is constructed by first collecting the SemEval2014 Restaurant and Laptop data sets, then collecting the ACL14 Twitter public data set, and finally dividing the data sets into a training set and a verification set at a ratio of 8:2, used respectively for training and verifying the aspect word extraction model.
3. The method for extracting the aspect words based on the deep learning of claim 1, wherein step 2 comprises: first, sending the sentence from which aspect words are to be extracted into the multi-feature coding layer of the model to obtain an initial coding result of the sentence; then, sending the initial coding result of the sentence into a context coding layer based on the bidirectional LSTM to perform context coding of the sentence; then, extracting the global contextual features of the sentence with a global context information extraction layer based on a multi-head self-attention mechanism, capturing the semantic association among aspect words; and finally, completing decoding through a sequence decoding layer based on the conditional random field to obtain the aspect word extraction result; in the training stage, the loss function compares the model's predicted value with the true value to calculate a loss value, and the model parameters are updated through back propagation to improve them.
4. The method for extracting the aspect words based on the deep learning of claim 3, wherein in the step 2, the aspect word extraction model is trained, and the implementation process of the step is divided into 4 sub-steps:
and a substep 2-1, performing initial coding on the sentence by using a multi-feature coding layer, wherein the specific process is as follows:
E_w is defined as the result of the overall embedding of the three types of information, and E_T, E_S and E_P represent word embedding, segment embedding and position embedding respectively; the information embedding in Roberta is expressed as:
E_w = E_T + E_S + E_P (1)
then, a multi-layer Transformer encoder encodes the embedding result; the input of the first-layer encoder is defined as H_0, with H_0 = E_w, and the encoding process is represented as:
H_i = Transformer(H_{i-1}), i ∈ [1, L] (2)
wherein H_i represents the result of the i-th Transformer layer's encoding, and L represents the total number of Transformer layers in the Roberta-base encoder;
then, the characters corresponding to each word are encoded; the padded character sequence is denoted C = {c_1, c_2, ..., c_n}, where n denotes the number of characters and Emb_c is the character embedding matrix; the character embedding process is expressed as:
E_c = Emb_c · C (3)
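Formula (3) writes character embedding as a matrix product between the embedding matrix and the character sequence. A minimal sketch with hypothetical dimensions, treating the sequence as one-hot vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, emb_dim = 5, 4                      # hypothetical character vocabulary and dimension
Emb_c = rng.normal(size=(vocab_size, emb_dim))  # character embedding matrix

char_ids = [3, 1, 4]                            # padded character sequence as indices
C = np.eye(vocab_size)[char_ids]                # one-hot encoding, shape (n, vocab_size)

E_c = C @ Emb_c                                 # formula (3) as a one-hot matrix product
```

Multiplying a one-hot matrix by the embedding matrix is equivalent to a row lookup, which is how embedding layers are implemented in practice.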
in the character encoding stage, a bidirectional long-short term memory network is used as a character encoder, and the encoding process can be expressed as follows:
h_i^fw = LSTM_fw(E_c) (4)
h_i^bw = LSTM_bw(E_c) (5)
H_C = h^fw ⊕ h^bw (6)
wherein h_i^fw represents the forward hidden state output of the bidirectional LSTM, h_i^bw represents the backward hidden state output of the bidirectional LSTM, H_C represents the final output of the bidirectional LSTM, and ⊕ represents the vector join operation;
and finally, fusing four different granularity characteristics of words, positions, segments and characters, wherein the process is represented as follows:
H'_CW = H_L ⊕ H_C (7)
H_CW = W_cw · H'_CW + b_cw (8)
wherein H_CW represents the vector representation fusing the four features of words, positions, segments and characters, H_L represents the output of the last Transformer layer of Roberta-base, H_C represents the final output of the bidirectional LSTM, W_cw and b_cw represent a weight matrix and a bias value, and ⊕ represents the vector join operation;
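The character-encoding and fusion steps above run a forward and a backward pass over the character embeddings, join the two hidden-state sequences, and concatenate the result with the Transformer output. A minimal sketch using a plain tanh RNN as a stand-in for the LSTM cells; all sizes, weight names, and the simplified cell are illustrative assumptions:

```python
import numpy as np

def rnn_pass(E, W_x, W_h, reverse=False):
    """Simple tanh RNN over embeddings E (n, d); a stand-in for an LSTM pass."""
    steps = range(len(E) - 1, -1, -1) if reverse else range(len(E))
    h = np.zeros(W_h.shape[0])
    out = [None] * len(E)
    for t in steps:
        h = np.tanh(E[t] @ W_x + h @ W_h)
        out[t] = h
    return np.stack(out)

rng = np.random.default_rng(3)
n, d_emb, d_h, d_tr = 4, 5, 3, 6                # hypothetical sizes
E_c = rng.normal(size=(n, d_emb))               # character embeddings
W_x = rng.normal(scale=0.1, size=(d_emb, d_h))
W_hf = rng.normal(scale=0.1, size=(d_h, d_h))
W_hb = rng.normal(scale=0.1, size=(d_h, d_h))

h_fw = rnn_pass(E_c, W_x, W_hf)                 # forward hidden states
h_bw = rnn_pass(E_c, W_x, W_hb, reverse=True)   # backward hidden states
H_C = np.concatenate([h_fw, h_bw], axis=1)      # join forward and backward outputs

H_L = rng.normal(size=(n, d_tr))                # stand-in for the Transformer output
H_CW = np.concatenate([H_L, H_C], axis=1)       # fuse Transformer and character features
```

The join operation simply widens the feature dimension: the fused vector carries both subword-level (Transformer) and character-level context for each position.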
and a substep 2-2, obtaining the vector representation fusing four different granularity features from the multi-feature coding layer, and carrying out context coding of the sentence in a context coding layer based on a bidirectional long short-term memory network, wherein the process is as follows:
the context coding process based on the long-short term memory network can be expressed as follows:
h_t^fw = LSTM_fw(H_CW, h_{t-1}^fw) (9)
h_t^bw = LSTM_bw(H_CW, h_{t+1}^bw) (10)
h_t = h_t^fw ⊕ h_t^bw (11)
H_ctx = {h_1, h_2, ..., h_m} (12)
wherein h_t^fw represents the output of the forward hidden layer of the bidirectional LSTM, h_t^bw represents the output of the backward hidden layer of the bidirectional LSTM, H_ctx represents the final output of the bidirectional LSTM, and ⊕ represents the vector join operation,
the input gate i_t, output gate o_t and forget gate f_t of the LSTM cell are respectively calculated as:
i_t = sigmoid(W_i · [h_{t-1}, x_t] + b_i) (13)
o_t = sigmoid(W_o · [h_{t-1}, x_t] + b_o) (14)
f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f) (15)
the cell state c_t and the hidden state h_t of the LSTM cell are calculated as:
c_t = f_t × c_{t-1} + i_t × tanh(W_c · [h_{t-1}, x_t] + b_c) (16)
h_t = o_t × tanh(c_t) (17)
wherein W represents a weight matrix, b represents a bias value, sigmoid and tanh represent activation functions, and × represents element-wise multiplication;
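The gate and state equations above can be sketched as a single LSTM cell step. The dictionary-of-weights layout and all dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: gates, cell state update, hidden state output.

    W: dict of (d_h + d_in, d_h) weight matrices for gates i, o, f and
       the candidate cell c; b: dict of matching bias vectors.
    """
    z = np.concatenate([h_prev, x_t])                        # [h_{t-1}, x_t]
    i_t = sigmoid(z @ W["i"] + b["i"])                       # input gate
    o_t = sigmoid(z @ W["o"] + b["o"])                       # output gate
    f_t = sigmoid(z @ W["f"] + b["f"])                       # forget gate
    c_t = f_t * c_prev + i_t * np.tanh(z @ W["c"] + b["c"])  # new cell state
    h_t = o_t * np.tanh(c_t)                                 # new hidden state
    return h_t, c_t

rng = np.random.default_rng(1)
d_in, d_h = 4, 3                                             # hypothetical dimensions
W = {k: rng.normal(scale=0.1, size=(d_h + d_in, d_h)) for k in "iofc"}
b = {k: np.zeros(d_h) for k in "iofc"}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, b)
```

Note that the gate products are element-wise: each gate value in (0, 1) scales one component of the cell state, which is what lets the cell keep or discard individual features across time steps.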
and a substep 2-3, extracting the global context information of the sentence by using a global context information extraction layer based on a multi-head self-attention mechanism, wherein the specific process is as follows:
first, the input vector is transformed by a linear layer, and the calculation process can be expressed as:
H_lin = W_l · H_ctx + b_l (18)
wherein H_lin is the feature vector obtained by the linear layer calculation, and W_l and b_l are respectively a weight matrix and a bias value;
then, the feature vectors are multiplied with three weight matrices W_Q, W_K and W_V respectively to obtain q_i, k_j and v_j; the calculation process is expressed as:
q_i = W_Q · h_i (19)
k_j = W_K · h_j (20)
v_j = W_V · h_j (21)
then, the transpose of q_i is multiplied by k_j to obtain the attention score, which is divided by √d_k, where d_k is the dimension of the key vectors, and finally normalized by the softmax function to obtain the weight w_ij; the calculation process can be expressed as:
w_ij = softmax(q_i^T · k_j / √d_k) (22)
after that, each v_j is multiplied by its weight w_ij and the products are summed to obtain the output vector h_i^att of the self-attention layer; the calculation process can be expressed as:
h_i^att = Σ_j w_ij · v_j (23)
wherein the output of the k-th self-attention head is denoted h_k^att; the vector join process of the multi-head self-attention mechanism can be expressed as:
H'_att = concat(h_1^att, h_2^att, ..., h_K^att) (24)
wherein concat represents a vector join operation, and the outputs of the K self-attention heads are joined to obtain the hidden state vector H'_att;
finally, H'_att is passed through a linear layer to obtain the final output H_att of the multi-head attention mechanism, where W_att represents a weight matrix and b_att represents a bias value; the calculation process is represented as:
H_att = W_att · H'_att + b_att (25)
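The query/key/value projections, scaled dot-product weighting, head joining, and final linear layer described above can be sketched as a single-batch multi-head self-attention in NumPy. Head count and dimensions are illustrative assumptions, and the final bias is omitted for brevity:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(H, Wq, Wk, Wv, Wo, n_heads):
    """H: (m, d) input features; Wq/Wk/Wv/Wo: (d, d) projection matrices."""
    m, d = H.shape
    d_k = d // n_heads
    # project inputs to queries, keys, and values
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    # split into heads: (n_heads, m, d_k)
    split = lambda M: M.reshape(m, n_heads, d_k).transpose(1, 0, 2)
    Q, K, V = split(Q), split(K), split(V)
    # scaled dot-product attention weights, normalized per query position
    w = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_k), axis=-1)
    # weighted sum of values per head
    heads = w @ V
    # join the head outputs back into one (m, d) matrix
    concat = heads.transpose(1, 0, 2).reshape(m, d)
    # final linear layer
    return concat @ Wo, w

rng = np.random.default_rng(2)
m, d, n_heads = 5, 8, 2
Ws = [rng.normal(scale=0.1, size=(d, d)) for _ in range(4)]
out, w = multi_head_self_attention(rng.normal(size=(m, d)), *Ws, n_heads)
```

Each row of `w` sums to 1, so every output position is a convex combination of the value vectors of all positions; this global mixing is what lets the layer relate aspect words anywhere in the sentence.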
and a substep 2-4 of using a sequence decoding layer based on the conditional random field as the sequence decoder, completing the extraction of aspect words in the sentence through sequence labeling; assuming the input of the sequence decoding layer is X = {x_1, x_2, ..., x_m} and the label sequence is Y = {y_1, y_2, ..., y_m}, the prediction calculation process is represented as:
s(X,Y) = Σ_{i=0}^{m} A_{y_i,y_{i+1}} + Σ_{i=1}^{m} H_{i,y_i} (26)
P(Y|X)=softmax(s(X,Y)) (27)
where s(X,Y) represents the score of the label prediction, A represents a randomly initialized transition matrix, and A_{y_i,y_{i+1}} represents the correlation between adjacent labels y_i and y_{i+1}; H represents the output of the upper layer, and H_{i,y_i} denotes the score of the i-th word being assigned the label y_i; P(Y|X) represents the conditional probability of Y occurring under the condition of X, and softmax is the activation function; finally, the Viterbi algorithm is used to calculate the label sequence with the highest score, which serves as the final prediction result Y*; the calculation process is expressed as:
Y* = argmax_{Y'} s(X,Y') (28)
the loss function of the model can be expressed as:
L = -ln P(Y|X) (29)
where ln represents the natural logarithm, and P(Y|X) represents the conditional probability of Y occurring under the condition of X.
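The loss described above is the negative log of the softmax in formula (27) taken over all candidate label sequences. For a tiny example the normalizing sum can be brute-forced to make the computation concrete; the toy scores and function names are hypothetical, and real implementations replace the enumeration with the forward algorithm:

```python
import itertools
import numpy as np

def seq_score(H, A, Y):
    """s(X, Y): per-position tag scores plus adjacent-label transition scores."""
    emit = sum(H[i, y] for i, y in enumerate(Y))
    trans = sum(A[Y[i], Y[i + 1]] for i in range(len(Y) - 1))
    return emit + trans

def crf_nll(H, A, Y):
    """Negative log-likelihood -ln P(Y|X), with the softmax taken over
    every possible label sequence of length m."""
    m, k = H.shape
    scores = [seq_score(H, A, cand)
              for cand in itertools.product(range(k), repeat=m)]
    log_Z = np.log(np.sum(np.exp(scores)))   # log partition function
    return log_Z - seq_score(H, A, Y)

H = np.array([[1.0, 0.0], [0.0, 1.0]])       # toy emission scores (2 positions, 2 tags)
A = np.zeros((2, 2))                         # toy transition scores
loss = crf_nll(H, A, [0, 1])                 # NLL of the gold label sequence
```

Minimizing this loss raises the score of the gold sequence relative to all alternatives, which is exactly what back propagation does with the transition matrix and the upper-layer outputs during training.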
5. The method for extracting the aspect words based on the deep learning of claim 1, wherein in step 3, the model is tested using a test set; specifically, the text to be processed is first sent into the model obtained through the training process in step 2, the model performs multi-feature coding, context coding and global context information extraction on the sentence, and finally the extraction of the aspect words is completed using the sequence decoder.
CN202210514804.8A 2022-05-12 2022-05-12 Method for extracting aspect words based on deep learning Pending CN114896969A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210514804.8A CN114896969A (en) 2022-05-12 2022-05-12 Method for extracting aspect words based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210514804.8A CN114896969A (en) 2022-05-12 2022-05-12 Method for extracting aspect words based on deep learning

Publications (1)

Publication Number Publication Date
CN114896969A true CN114896969A (en) 2022-08-12

Family

ID=82722227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210514804.8A Pending CN114896969A (en) 2022-05-12 2022-05-12 Method for extracting aspect words based on deep learning

Country Status (1)

Country Link
CN (1) CN114896969A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737922A (en) * 2023-03-10 2023-09-12 云南大学 Tourist online comment fine granularity emotion analysis method and system


Similar Documents

Publication Publication Date Title
CN113158665B (en) Method for improving dialog text generation based on text abstract generation and bidirectional corpus generation
Xie et al. Attention-based dense LSTM for speech emotion recognition
CN113255755A (en) Multi-modal emotion classification method based on heterogeneous fusion network
CN111143563A (en) Text classification method based on integration of BERT, LSTM and CNN
CN110866542A (en) Depth representation learning method based on feature controllable fusion
CN113657115B (en) Multi-mode Mongolian emotion analysis method based on ironic recognition and fine granularity feature fusion
CN112800768A (en) Training method and device for nested named entity recognition model
CN113392717A (en) Video dense description generation method based on time sequence characteristic pyramid
Zhu et al. Multi-scale temporal network for continuous sign language recognition
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN114168754A (en) Relation extraction method based on syntactic dependency and fusion information
CN114648031A (en) Text aspect level emotion recognition method based on bidirectional LSTM and multi-head attention mechanism
CN110929476B (en) Task type multi-round dialogue model construction method based on mixed granularity attention mechanism
CN116245110A (en) Multi-dimensional information fusion user standing detection method based on graph attention network
CN113051904B (en) Link prediction method for small-scale knowledge graph
CN114896969A (en) Method for extracting aspect words based on deep learning
CN116258147A (en) Multimode comment emotion analysis method and system based on heterogram convolution
CN114692604A (en) Deep learning-based aspect-level emotion classification method
CN115906816A (en) Text emotion analysis method of two-channel Attention model based on Bert
CN115455144A (en) Data enhancement method of completion type space filling type for small sample intention recognition
CN114662456A (en) Image ancient poem generation method based on Faster R-convolutional neural network detection model
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN117668213B (en) Chaotic engineering abstract generation method based on cascade extraction and graph comparison model
CN116882398B (en) Implicit chapter relation recognition method and system based on phrase interaction
CN114996424B (en) Weak supervision cross-domain question-answer pair generation method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination