CN117454873A - Ironic detection method and system based on knowledge-enhanced neural network model - Google Patents
Ironic detection method and system based on knowledge-enhanced neural network model Download PDFInfo
- Publication number
- CN117454873A CN117454873A CN202311374400.4A CN202311374400A CN117454873A CN 117454873 A CN117454873 A CN 117454873A CN 202311374400 A CN202311374400 A CN 202311374400A CN 117454873 A CN117454873 A CN 117454873A
- Authority
- CN
- China
- Prior art keywords
- model
- text
- knowledge
- ironic
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 60
- 238000003062 neural network model Methods 0.000 title claims abstract description 27
- 238000012549 training Methods 0.000 claims abstract description 64
- 230000015654 memory Effects 0.000 claims abstract description 34
- 230000007246 mechanism Effects 0.000 claims abstract description 29
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 15
- 238000005457 optimization Methods 0.000 claims abstract description 15
- 230000002457 bidirectional effect Effects 0.000 claims abstract description 7
- 238000012216 screening Methods 0.000 claims abstract description 7
- 238000000034 method Methods 0.000 claims description 26
- 230000006870 function Effects 0.000 claims description 25
- 230000004913 activation Effects 0.000 claims description 11
- 238000012360 testing method Methods 0.000 claims description 11
- 238000010276 construction Methods 0.000 claims description 7
- 239000011159 matrix material Substances 0.000 claims description 6
- 238000007476 Maximum Likelihood Methods 0.000 claims description 4
- 238000011156 evaluation Methods 0.000 claims description 3
- 230000010354 integration Effects 0.000 claims description 3
- 230000011218 segmentation Effects 0.000 claims description 3
- 238000012163 sequencing technique Methods 0.000 claims description 3
- 238000013528 artificial neural network Methods 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 230000007787 long-term memory Effects 0.000 description 3
- 238000003058 natural language processing Methods 0.000 description 3
- 230000000306 recurrent effect Effects 0.000 description 3
- 230000006403 short-term memory Effects 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 2
- 230000008451 emotion Effects 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 238000012706 support-vector machine Methods 0.000 description 2
- 238000013519 translation Methods 0.000 description 2
- 238000003066 decision tree Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000002474 experimental method Methods 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 238000007637 random forest analysis Methods 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an irony detection method and system based on a knowledge-enhanced neural network model, comprising the following steps: S1, screening context information highly related to the text to be detected from an external knowledge source and integrating it with the original text; S2, performing word embedding on the integrated text data with the pre-trained language model RoBERTa and initializing the weights of a bidirectional long short-term memory (BiLSTM) network; S3, constructing a coding model composed of an 8-layer bidirectional LSTM (BiLSTM) network with an embedded multi-head self-attention mechanism, capturing long-distance dependencies and local semantic features in the text; S4, performing classification training and improving the model through an optimization algorithm to obtain the final knowledge-enhanced irony detection model; S5, acquiring the text to be detected, inputting it into the knowledge-enhanced irony detection model, and outputting the irony detection result. The invention enhances the model's understanding of irony, enables it to capture more complex language patterns effectively, and significantly improves the accuracy and robustness of irony detection.
Description
Technical Field
The invention relates to the technical field of natural language processing and text mining, and in particular to an irony detection method and system based on a knowledge-enhanced neural network model.
Background
With the popularity of social media and online platforms, the volume of text data (especially user-generated content) has grown exponentially. Such text frequently contains sarcasm and veiled mockery, which poses challenges for tasks such as sentiment analysis, public opinion monitoring, and natural language understanding; sarcasm detection has therefore become an important research direction in the field of natural language processing.
Early research methods relied primarily on manually extracted features, such as word frequency and sentiment lexicons, together with traditional machine learning algorithms such as Support Vector Machines (SVMs), decision trees, and random forests.
With the rapid development of deep learning, models based on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks have shown superior performance in text representation learning, automatic feature extraction, and classification accuracy.
However, most current irony detection models still focus mainly on analyzing the semantic and structural features within the text itself, largely ignoring the close associations that may exist between the text and external knowledge sources. To complicate matters further, irony and veiled mockery are often highly context-dependent, so a single text-analysis method alone often fails to achieve satisfactory detection performance.
Therefore, how to integrate external knowledge sources effectively and achieve high accuracy and robustness in irony detection tasks is a problem that those skilled in the art urgently need to solve.
Disclosure of Invention
In view of the above, the present invention provides an irony detection method and system based on a knowledge-enhanced neural network model to solve the problems mentioned in the background art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
An irony detection method based on a knowledge-enhanced neural network model, comprising the following steps:
S1, screening context information highly related to the text to be detected from an external knowledge source, and integrating the screened context information with the original text;
S2, performing word embedding on the integrated text data by using the pre-trained language model RoBERTa, and initializing the weights of a bidirectional long short-term memory (BiLSTM) network;
S3, constructing a coding model composed of an 8-layer bidirectional long short-term memory (BiLSTM) network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text;
S4, inputting the RoBERTa word embeddings into the coding model, performing multi-class training through a fully connected layer, the multi-head self-attention mechanism, and a classifier formed by a softmax activation function, and improving the model through an optimization algorithm to obtain the final knowledge-enhanced irony detection model;
S5, acquiring the text to be detected, inputting it into the knowledge-enhanced irony detection model, and outputting the irony detection result.
Preferably, the specific content of step S1 is:
S11, acquiring the sentences most relevant to the original text from different external knowledge sources as candidate contexts;
S12, ranking all candidate contexts by using the BERTScore text similarity algorithm, and selecting the candidate context that best matches the original text, according to the text similarity score, as the context of the original text;
S13, connecting the extracted context with the original text by using an EOS label representing the end of the sequence.
Preferably, the BERTScore text similarity algorithm is specifically:
wherein A and B are the text to be detected and the text from the external knowledge source, respectively.
Preferably, the specific steps for obtaining the pre-trained language model RoBERTa are:
collecting a large amount of unlabeled text data D = (d_1, d_2, …, d_n);
for each document d_i, performing word segmentation to obtain a word sequence T = {t_1, t_2, …, t_{n_i}};
initializing a word embedding matrix E = {e_1, e_2, …, e_V}, where e_i is the embedding vector of the i-th word in the vocabulary and V is the size of the vocabulary;
training the pre-trained language model based on the Transformer architecture and its self-attention mechanism, optimizing the model parameters by maximum likelihood estimation, and obtaining the pre-trained language model RoBERTa after training is completed.
Preferably, the specific content of the model training in step S4 is as follows:
S41, dividing the integrated text data into a training set and a test set, converting the words of the training set into a word embedding matrix X = {x_1, x_2, …, x_N} by using the pre-trained language model RoBERTa, and inputting it into the coding model;
S42, obtaining a hidden state sequence through the 8-layer bidirectional long short-term memory (BiLSTM) network of the coding model;
S43, introducing a multi-head attention mechanism to obtain the attention weight and the average weight of each layer;
S44, feeding the average-weighted representation into the fully connected layer, constructing a classifier using the softmax activation function, and classifying the output of the fully connected layer;
S45, performing model optimization using a cross entropy loss function and the Adam optimizer according to the classification results;
S46, updating the weights of the model according to the gradient of the loss function so as to improve the performance of the model.
Preferably, the self-attention mechanism is specifically:
wherein Q, K, and V are the query, key, and value matrices respectively, and d_k is the dimension of the key.
Preferably, step S4 further includes model testing and evaluation: evaluating, analyzing, and summarizing the test results of the model through accuracy, recall rate, and F value, and optimizing and improving the performance of the model.
Preferably, the 8-layer bidirectional long short-term memory (BiLSTM) network is constructed specifically as follows:
wherein H_t is the hidden state of the t-th time step and X_t is the input of the t-th time step;
the attention weight α_l of each layer is:
wherein T is the length of the sequence, and the similarity term denotes the similarity between the target time step t and the source time step j in the l-th BiLSTM layer;
wherein the first hidden-state term is the hidden state of the target sequence in the l-th layer at time t-1, the second is the hidden state of the source sequence in the l-th layer at time j, and a is a learnable function;
the average weight is then:
the inputs to the fully connected layer and the softmax classifier are expressed as:
the fully connected layer F is:
F(D) = W_f · D + b_f
wherein W_f and b_f are the weight and bias of the fully connected layer, respectively;
the classifier C constructed with the softmax activation function is:
C(F) = softmax(W_c · F + b_c)
wherein W_c and b_c are the weight and bias of the classifier, respectively;
the cross entropy loss function L is:
wherein N is the number of labeled samples, y_i is the true label of the i-th sample, and C(x_i) is the model's prediction for the i-th sample, taking values in [0, 1].
Preferably, the update rules for optimization using the Adam optimizer are specifically:
m_t = β_1 · m_{t-1} + (1 - β_1) · g_t
wherein m_t and v_t are the estimates of the first and second moments respectively, β_1 and β_2 are decay factors, g_t is the gradient of the loss function L with respect to the model parameters σ, α denotes the learning rate, ε is a small constant that prevents division by zero, and σ_t denotes the model parameters at time t.
An irony detection system based on a knowledge-enhanced neural network model, which applies the above irony detection method and comprises: a text acquisition module, a text integration module, a knowledge-enhanced irony detection model, and a model construction and training module;
the knowledge-enhanced irony detection model includes the pre-trained language model RoBERTa and a bidirectional long short-term memory (BiLSTM) model;
the text acquisition module is used for acquiring the text to be detected and an external knowledge source;
the text integration module is used for screening context information highly related to the text to be detected from the external knowledge source and integrating the screened context information with the original text;
the pre-trained language model RoBERTa is used for word embedding of the integrated text data and for initializing the weights of the bidirectional long short-term memory (BiLSTM) network;
the model construction and training module is used for constructing a coding model composed of an 8-layer bidirectional long short-term memory (BiLSTM) network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text; it inputs the RoBERTa word embeddings into the coding model, performs multi-class training through a fully connected layer, the multi-head self-attention mechanism, and a classifier formed by a softmax activation function, and improves the model through an optimization algorithm to obtain the bidirectional long short-term memory (BiLSTM) model and the final knowledge-enhanced irony detection model;
the knowledge-enhanced irony detection model takes the acquired text to be detected as input and outputs the irony detection result.
Compared with the prior art, the irony detection method and system based on the knowledge-enhanced neural network model provide context information by incorporating external knowledge sources, which enhances the model's understanding of irony and gives it greater versatility and robustness than models that do not use external information;
unlike prior work that mainly used simpler neural network architectures, the invention adopts a multi-layer approach comprising the pre-trained language model RoBERTa, an 8-layer bidirectional long short-term memory (BiLSTM) network, and a multi-head attention mechanism, enabling the model to capture more complex language patterns effectively and significantly improving the accuracy and robustness of irony detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the overall framework topology of a method for ironic detection based on a knowledge-enhanced neural network model provided by the present invention;
FIG. 2 is a schematic diagram of the topology of the pre-trained language model RoBERTa provided by the present invention;
fig. 3 is a schematic diagram of a bidirectional long-short-term memory BiLSTM network model and a multi-head attention mechanism according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses an irony detection method based on a knowledge-enhanced neural network model, comprising the following steps:
S1, screening context information highly related to the text to be detected from an external knowledge source, and integrating the screened context information with the original text;
S2, performing word embedding on the integrated text data by using the pre-trained language model RoBERTa, and initializing the weights of a bidirectional long short-term memory (BiLSTM) network;
S3, constructing a coding model composed of an 8-layer bidirectional long short-term memory (BiLSTM) network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text;
S4, inputting the RoBERTa word embeddings into the coding model, performing multi-class training through a fully connected layer, the multi-head self-attention mechanism, and a classifier formed by a softmax activation function, and improving the model through an optimization algorithm to obtain the final knowledge-enhanced irony detection model;
S5, acquiring the text to be detected, inputting it into the knowledge-enhanced irony detection model, and outputting the irony detection result.
In order to further implement the above technical solution, the specific content of step S1 is:
S11, acquiring the sentences most relevant to the original text from different external knowledge sources as candidate contexts;
S12, ranking all candidate contexts by using the BERTScore text similarity algorithm, and selecting the candidate context that best matches the original text, according to the text similarity score, as the context of the original text;
S13, connecting the extracted context with the original text by using an EOS label representing the end of the sequence.
In this embodiment, the external knowledge sources include Wikipedia, the New York Times, and British Broadcasting Corporation (BBC) data;
for Wikipedia, the sentence most relevant to the original text is found as a candidate context;
for the New York Times, named entities in the original text are first identified using the natural language processing tool spaCy, and the most relevant sentences are then retrieved through the NYT API as candidate contexts;
for the BBC data, the GDELT DOC API is used to retrieve the ten headlines most relevant to the entities in the original text as candidate contexts.
In order to further implement the above technical solution, the BERTScore text similarity algorithm is specifically:
wherein A and B are the text to be detected and the text from the external knowledge source, respectively.
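The patent gives only a textual description of the candidate ranking and EOS concatenation in steps S11-S13; a minimal sketch of how they could be realized with the open-source bert-score package is shown below. The function name, the choice of RoBERTa's "</s>" token as the EOS label, and the use of the F1 component of BERTScore are assumptions of this sketch, not part of the disclosure.

```python
# Illustrative sketch of steps S11-S13: rank candidate contexts with BERTScore
# and splice the best-matching one onto the original text with an EOS label.
# Assumes the open-source bert-score package; all names here are illustrative.
from bert_score import score

def attach_best_context(original_text, candidate_contexts, eos_token="</s>"):
    # BERTScore compares each candidate against the original text;
    # candidates and references must be lists of equal length.
    refs = [original_text] * len(candidate_contexts)
    _, _, f1 = score(candidate_contexts, refs, lang="en", verbose=False)
    best = candidate_contexts[int(f1.argmax())]   # highest similarity score
    # The EOS label marks the end of the retrieved context (step S13).
    return best + f" {eos_token} " + original_text
```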
In order to further implement the above technical solution, the specific steps for obtaining the pre-trained language model RoBERTa are as follows:
collecting a large amount of unlabeled text data D = (d_1, d_2, …, d_n);
for each document d_i, performing word segmentation to obtain a word sequence T = {t_1, t_2, …, t_{n_i}};
initializing a word embedding matrix E = {e_1, e_2, …, e_V}, where e_i is the embedding vector of the i-th word in the vocabulary and V is the size of the vocabulary;
training the pre-trained language model based on the Transformer architecture and its self-attention mechanism, optimizing the model parameters by maximum likelihood estimation, and obtaining the pre-trained language model RoBERTa after training is completed.
In this embodiment, the model parameters are optimized by maximum likelihood estimation (MLE) as follows:
wherein θ denotes the model parameters of the pre-trained language model RoBERTa and N is the size of the training set.
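As a hedged illustration of the word-embedding step S2, the pre-trained RoBERTa model can be loaded through the Hugging Face transformers library; the checkpoint name "roberta-base", the maximum sequence length, and the use of the last hidden states are assumptions of this sketch, since the patent does not specify them.

```python
# Illustrative sketch of step S2: RoBERTa word embeddings for the
# knowledge-augmented text. Checkpoint and sequence length are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
roberta = AutoModel.from_pretrained("roberta-base")

def embed(text, max_len=128):
    inputs = tokenizer(text, truncation=True, max_length=max_len,
                       padding="max_length", return_tensors="pt")
    with torch.no_grad():
        out = roberta(**inputs)
    # Token-level embeddings of shape (batch, seq_len, hidden); these feed
    # the BiLSTM coding model described in step S3.
    return out.last_hidden_state
```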
In order to further implement the above technical solution, the specific content of the model training in step S4 is:
S41, dividing the integrated text data into a training set and a test set, converting the words of the training set into a word embedding matrix X = {x_1, x_2, …, x_N} by using the pre-trained language model RoBERTa, and inputting it into the coding model;
S42, constructing an 8-layer BiLSTM model, which can be expressed as:
wherein H_t is the hidden state of the t-th time step and X_t is the input of the t-th time step;
S43, introducing a multi-head attention mechanism into the training model so as to capture key information in the text more effectively:
for each layer l of the 8-layer BiLSTM there is a sequence of hidden states, and the attention weight α_l of each layer is:
wherein T is the length of the sequence, and the similarity term denotes the similarity between the target time step t and the source time step j in the l-th BiLSTM layer;
wherein the first hidden-state term is the hidden state of the target sequence in the l-th layer at time t-1, the second is the hidden state of the source sequence in the l-th layer at time j, and a is a learnable function;
after obtaining the attention weight α_l of each layer, the average weight can be calculated;
S44, these average-weighted representations are then combined as the input to the fully connected layer and softmax classifier, which can be expressed as:
to further integrate the output of the bidirectional long short-term memory (BiLSTM) network and prepare the data for the classifier, a fully connected layer F is used:
F(D) = W_f · D + b_f
wherein W_f and b_f are the weight and bias of the fully connected layer, respectively;
next, the classifier C constructed with the softmax activation function is used to classify the output of the fully connected layer:
C(F) = softmax(W_c · F + b_c)
wherein W_c and b_c are the weight and bias of the classifier, respectively;
S45, in order to quantify the performance of the model and improve it through an optimization algorithm, a cross entropy loss function L is used:
wherein N is the number of labeled samples, y_i is the true label of the i-th sample, and C(x_i) is the model's prediction for the i-th sample, taking values in [0, 1];
optimization is performed using the Adam optimizer, with the following update rules:
m_t = β_1 · m_{t-1} + (1 - β_1) · g_t
wherein m_t and v_t are the estimates of the first and second moments respectively, β_1 and β_2 are decay factors usually set to 0.9 and 0.999, g_t is the gradient of the loss function L with respect to the BiLSTM model parameters σ, α denotes the learning rate, ε is a small constant preventing division by zero, usually set to 1×10^-8, and σ_t denotes the model parameters at time t;
S46, finally, after each training epoch, updating the weights of the model according to the gradient of the loss function so as to improve the performance of the model.
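Steps S41-S46 above are described only in prose; the following PyTorch sketch shows one way the described architecture (an 8-layer bidirectional LSTM over RoBERTa word embeddings, a multi-head attention layer, a fully connected layer, a softmax classifier trained with cross entropy, and the Adam optimizer) could be assembled. The hidden size of 512 and dropout of 0.5 follow the embodiment; the embedding dimension, head count, and class count are assumptions of this sketch.

```python
# Minimal sketch of the encoder/classifier described in steps S41-S46.
# Hidden size 512 and dropout 0.5 follow the embodiment; embedding dimension,
# head count and class count are assumptions of this sketch.
import torch
import torch.nn as nn

class KnowledgeEnhancedSarcasmModel(nn.Module):
    def __init__(self, embed_dim=768, hidden=512, num_layers=8,
                 num_heads=8, num_classes=2, dropout=0.5):
        super().__init__()
        # 8-layer bidirectional LSTM over RoBERTa word embeddings
        self.bilstm = nn.LSTM(embed_dim, hidden, num_layers=num_layers,
                              bidirectional=True, batch_first=True,
                              dropout=dropout)
        # Multi-head attention over the BiLSTM hidden states
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads,
                                          batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)   # fully connected layer F

    def forward(self, word_embeddings):                # (B, T, embed_dim)
        h, _ = self.bilstm(word_embeddings)            # (B, T, 2*hidden)
        a, _ = self.attn(h, h, h)                      # attention-weighted states
        pooled = a.mean(dim=1)                         # averaged representation D
        return self.fc(pooled)                         # logits; softmax is in the loss

model = KnowledgeEnhancedSarcasmModel()
criterion = nn.CrossEntropyLoss()                      # cross entropy loss L
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
```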
In order to further implement the above technical solution, the self-attention mechanism is specifically:
wherein Q, K, and V are the query, key, and value matrices respectively, and d_k is the dimension of the key.
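Since the attention formula itself is not reproduced in this text, a small sketch of the standard scaled dot-product self-attention, consistent with the Q, K, V, and d_k definitions above, is given for reference; tensor shapes are assumed.

```python
# Standard scaled dot-product self-attention softmax(Q K^T / sqrt(d_k)) V,
# consistent with the Q, K, V and d_k definitions above; shapes are assumed.
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # pairwise similarities
    weights = torch.softmax(scores, dim=-1)            # attention weights
    return weights @ V
```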
In order to further implement the above technical solution, step S4 further includes model testing and evaluation: evaluating, analyzing, and summarizing the test results of the model through accuracy, recall rate, and F value, and optimizing and improving the performance of the model.
In this embodiment, the performance indicators used are the accuracy, the recall rate, and the F1 score.
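The evaluation described here corresponds to standard classification metrics; the following sketch computes them with scikit-learn, which is an assumed choice since the patent names only the metrics themselves.

```python
# Standard evaluation metrics for the test phase; scikit-learn is an assumed
# choice, since the patent names only the metrics themselves.
from sklearn.metrics import accuracy_score, f1_score, recall_score

def evaluate(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "macro_f1": f1_score(y_true, y_pred, average="macro"),  # Macro-F1 as reported in Tables 1-3
    }
```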
In this embodiment, context information is generated from external knowledge sources (such as Wikipedia, the New York Times, and BBC data) and then fused with the original text data; the text is then converted into numeric vectors in the word embedding layer and processed by the coding layer and the multi-head attention mechanism layer.
At the coding layer, a knowledge-enhanced neural network model is used, which can more effectively capture complex patterns in text; the multi-headed attention mechanism layer further extracts key information in the text and uses it in the classification layer to predict whether the text is ironic.
In another embodiment, a plurality of sets of comparison experiments were performed to verify model validity:
first, the performance of different network models (including BERT-GRU-Softmax based on a pre-trained language model and gated recurrent neural network GRU, BERT-LSTM-Softmax based on a pre-trained language model and underlying long and short term memory LSTM network, three benchmark models, and the model of the present invention) on ironic detection tasks was experimentally studied on the semval-2018 Task3 dataset.
The invention adopts a bi-directional long-short-term memory BiLSTM model as shown in figure 3, a pre-trained RoBERTa neural network model as shown in figure 2, and Twitter comment text from a Semeval-2018Task3 data set which is randomly divided into a training set, a test set and a verification set according to a ratio of 8:1:1 is mainly used for training and evaluating ironic detection models.
For model construction, the invention uses a knowledge-enhanced neural network model that acquires context information from external knowledge sources (such as Wikipedia, the New York Times, and BBC data); the model mainly comprises an embedding layer, a coding layer, a multi-head attention mechanism layer, and a classification layer, where the embedding layer uses the pre-trained RoBERTa model and the coding layer uses an 8-layer bidirectional long short-term memory (BiLSTM) network;
for optimization, the Adam optimizer is used with the weight decay set to 1.0e-2; OneCycleLR is used as the learning rate scheduler with the maximum learning rate set to 1.0e-5; the number of training epochs is set to 20, each comprising 256 steps; the hidden layer size is set to 512 and the dropout rate to 0.5; during training, the micro-batch size is set to 8 and the maximum number of training epochs to 20; word embeddings are stored on the GPU, and 500 warm-up steps are added;
finally, the model performs classification through a softmax layer, outputting the probabilities that a sample belongs to the ironic and non-ironic categories, and the performance of the model is evaluated by comparison with the ground-truth labels.
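The hyper-parameters listed in this embodiment (Adam with weight decay 1.0e-2, OneCycleLR with a maximum learning rate of 1.0e-5, 20 epochs of 256 steps each, and a micro-batch size of 8) map onto PyTorch roughly as follows. This is a sketch that reuses the model and loss from the earlier sketch together with an assumed DataLoader, not the authors' actual training script; the 500 warm-up steps would be covered by OneCycleLR's built-in warm-up phase rather than set explicitly here.

```python
# Sketch of the optimization setup stated in this embodiment; `model` and
# `criterion` are taken from the earlier sketch and `train_loader` is an
# assumed DataLoader yielding (embeddings, labels) batches of size 8.
import torch

optimizer = torch.optim.Adam(model.parameters(),
                             lr=1.0e-5, weight_decay=1.0e-2)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1.0e-5,       # maximum learning rate
    epochs=20,           # 20 training periods
    steps_per_epoch=256, # 256 steps per period
)

for epoch in range(20):
    for embeddings, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(embeddings), labels)
        loss.backward()
        optimizer.step()
        scheduler.step()  # OneCycleLR advances once per optimization step
```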
Other comparison models were constructed and comparative tests performed; the experimental results on the SemEval-2018 dataset are shown in Table 1:
Network model method | Macro-F1 (%) |
---|---|
Pre-trained language model + gated recurrent neural network | 79.50 |
Pre-trained language model + basic long short-term memory network | 80.65 |
Singh et al. (2019): attention mechanism and emoticon textualization | 80.31 |
Potamias et al. (2020): Transformer model and recurrent convolutional network | 80.00 |
Turbo et al. (2022): multimodal integrated training network | 79.80 |
The invention (Wikipedia) | 82.97 |
Table 1 shows the performance of each model on the SemEval-2018 Task 3 dataset. The experimental results show that, after introducing Wikipedia context information, the invention achieves a Macro-F1 score of 82.97%, clearly surpassing the other baseline models and demonstrating its superiority in irony recognition.
Experimental results with different knowledge sources as context are shown in Table 2:
External knowledge source | Macro-F1 (%) |
---|---|
Without external knowledge | 80.65 |
New York Times | 80.77 |
BBC data | 81.39 |
Wikipedia | 82.97 |
Table 2 shows the experimental results for different knowledge sources. Even without introducing external knowledge, the proposed model achieves an F1 score of 80.65%, which fully demonstrates the strength of a neural network that integrates the pre-trained language model RoBERTa, long short-term memory (LSTM) networks, and an attention mechanism. After introducing context information for the original text, every knowledge source (Wikipedia, the New York Times, and BBC data) improves the performance of the model. Specifically, the model using Wikipedia as the knowledge source achieves an F1 score of 82.97%, a significant improvement over the 80.65% obtained without an external knowledge source, while the New York Times and BBC data raise the F1 score to 80.77% and 81.39%, respectively. These results clearly indicate that context information from external knowledge sources is critical for predicting irony.
The experimental results of the proposed model under different data enhancement strategies are shown in Table 3:
data enhancement policy | Macro-F1(%) |
Reverse translation | 77.58 |
Synonym replacement | 80.69 |
Word order replacement | 79.21 |
The invention is that | 82.97 |
Table 3 shows the experimental results for different data enhancement strategies. As can be seen, the three strategies affect the irony detection task differently. Synonym replacement brings a slight improvement, reaching an F1 score of 80.69% and slightly exceeding the 80.65% obtained without data enhancement, which indicates that synonym replacement as a data enhancement strategy can improve the generalization ability of the model to some extent. With back-translation and word-order replacement, the model only reaches F1 scores of 77.58% and 79.21%, slightly lower than the model without data enhancement, probably because both strategies introduce some noise while increasing data diversity, thereby affecting the performance of the model. When Wikipedia is introduced into the model as context information, performance improves significantly and the F1 score reaches 82.97%, further showing that context information from external knowledge sources provides richer semantic information and helps the model better understand the semantics of the text.
An irony detection system based on a knowledge-enhanced neural network model, comprising: a text acquisition module, a text integration module, a knowledge-enhanced irony detection model, and a model construction and training module;
the knowledge-enhanced irony detection model includes the pre-trained language model RoBERTa and a bidirectional long short-term memory (BiLSTM) model;
the text acquisition module is used for acquiring the text to be detected and an external knowledge source;
the text integration module is used for screening context information highly related to the text to be detected from the external knowledge source and integrating the screened context information with the original text;
the pre-trained language model RoBERTa is used for word embedding of the integrated text data and for initializing the weights of the bidirectional long short-term memory (BiLSTM) network;
the model construction and training module is used for constructing a coding model composed of an 8-layer bidirectional long short-term memory (BiLSTM) network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text; it inputs the RoBERTa word embeddings into the coding model, performs multi-class training through a fully connected layer, the multi-head self-attention mechanism, and a classifier formed by a softmax activation function, and improves the model through an optimization algorithm to obtain the bidirectional long short-term memory (BiLSTM) model and the final knowledge-enhanced irony detection model;
the knowledge-enhanced irony detection model takes the acquired text to be detected as input and outputs the irony detection result.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts between the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for irony detection based on a knowledge-enhanced neural network model, comprising the steps of:
S1, screening context information highly related to the text to be detected from an external knowledge source, and integrating the screened context information with the original text;
S2, performing word embedding on the integrated text data by using the pre-trained language model RoBERTa, and initializing the weights of a bidirectional long short-term memory (BiLSTM) network;
S3, constructing a coding model composed of an 8-layer bidirectional long short-term memory (BiLSTM) network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text;
S4, inputting the RoBERTa word embeddings into the coding model, performing multi-class training through a fully connected layer, the multi-head self-attention mechanism, and a classifier formed by a softmax activation function, and improving the model through an optimization algorithm to obtain the final knowledge-enhanced irony detection model;
S5, acquiring the text to be detected, inputting it into the knowledge-enhanced irony detection model, and outputting the irony detection result.
2. The method for irony detection based on a knowledge-enhanced neural network model according to claim 1, wherein the specific content of step S1 is:
S11, acquiring the sentences most relevant to the original text from different external knowledge sources as candidate contexts;
S12, ranking all candidate contexts by using the BERTScore text similarity algorithm, and selecting the candidate context that best matches the original text, according to the text similarity score, as the context of the original text;
S13, connecting the extracted context with the original text by using an EOS label representing the end of the sequence.
3. The method for irony detection based on a knowledge-enhanced neural network model according to claim 2, characterized in that the BERTScore text similarity algorithm is specifically:
wherein A and B are the text to be detected and the text from the external knowledge source, respectively.
4. The method for irony detection based on a knowledge-enhanced neural network model according to claim 1, wherein the specific steps of obtaining the pre-trained language model RoBERTa are:
collecting a large amount of unlabeled text data D = (d_1, d_2, …, d_n);
for each document d_i, performing word segmentation to obtain a word sequence T = {t_1, t_2, …, t_{n_i}};
initializing a word embedding matrix E = {e_1, e_2, …, e_V}, where e_i is the embedding vector of the i-th word in the vocabulary and V is the size of the vocabulary;
training the pre-trained language model based on the Transformer architecture and its self-attention mechanism, optimizing the model parameters by maximum likelihood estimation, and obtaining the pre-trained language model RoBERTa after training is completed.
5. The method for irony detection based on a knowledge-enhanced neural network model according to claim 1, wherein the specific content of the multi-class training of the model in step S4 is:
S41, dividing the integrated text data into a training set and a test set, converting the words of the training set into a word embedding matrix X = {x_1, x_2, …, x_N} by using the pre-trained language model RoBERTa, and inputting it into the coding model;
S42, obtaining a hidden state sequence through the 8-layer bidirectional long short-term memory (BiLSTM) network of the coding model;
S43, introducing a multi-head attention mechanism to obtain the attention weight and the average weight of each layer;
S44, feeding the average-weighted representation into the fully connected layer, constructing a classifier using the softmax activation function, and classifying the output of the fully connected layer;
S45, performing model optimization using a cross entropy loss function and the Adam optimizer according to the classification results;
S46, updating the weights of the model according to the gradient of the loss function so as to improve the performance of the model.
6. The method for irony detection based on a knowledge-enhanced neural network model according to claim 4 or 5, wherein the self-attention mechanism is specifically:
wherein Q, K, and V are the query, key, and value matrices respectively, and d_k is the dimension of the key.
7. The method for irony detection based on a knowledge-enhanced neural network model according to claim 1, further comprising the step of model testing and evaluation after step S4: evaluating, analyzing, and summarizing the test results of the model through accuracy, recall rate, and F value, and optimizing and improving the performance of the model.
8. The method for irony detection based on a knowledge-enhanced neural network model according to claim 5, wherein the 8-layer bidirectional long short-term memory (BiLSTM) network is constructed specifically as follows:
wherein H_t is the hidden state of the t-th time step and X_t is the word embedding of the t-th time step;
the attention weight α_l of each layer is:
wherein T is the length of the word sequence, and the similarity term denotes the similarity between the target time step t and the source time step j in the l-th BiLSTM layer;
wherein the first hidden-state term is the hidden state of the target sequence in the l-th layer at time t-1, the second is the hidden state of the source sequence in the l-th layer at time j, and a is a learnable function;
the average weight is then:
the inputs to the fully connected layer and the softmax classifier are expressed as:
the fully connected layer F is:
F(D) = W_f · D + b_f
wherein W_f and b_f are the weight and bias of the fully connected layer, respectively;
the classifier C constructed with the softmax activation function is:
C(F) = softmax(W_c · F + b_c)
wherein W_c and b_c are the weight and bias of the classifier, respectively;
the cross entropy loss function L is:
wherein N is the number of labeled samples, y_i is the true label of the i-th sample, and C(x_i) is the model's prediction for the i-th sample, taking values in [0, 1].
9. The method for irony detection based on a knowledge-enhanced neural network model according to claim 8, characterized in that the update rules for optimization using the Adam optimizer are specifically:
m_t = β_1 · m_{t-1} + (1 - β_1) · g_t
wherein m_t and v_t are the estimates of the first and second moments respectively, β_1 and β_2 are decay factors, g_t is the gradient of the loss function L with respect to the model parameters σ, α denotes the learning rate, ε is a small constant that prevents division by zero, and σ_t denotes the model parameters at time t.
10. An irony detection system based on a knowledge-enhanced neural network model, characterized in that it applies the irony detection method based on a knowledge-enhanced neural network model according to any one of claims 1-9 and comprises: a text acquisition module, a text integration module, a knowledge-enhanced irony detection model, and a model construction and training module;
the knowledge-enhanced irony detection model includes the pre-trained language model RoBERTa and a bidirectional long short-term memory (BiLSTM) model;
the text acquisition module is used for acquiring the text to be detected and an external knowledge source;
the text integration module is used for screening context information highly related to the text to be detected from the external knowledge source and integrating the screened context information with the original text;
the pre-trained language model RoBERTa is used for word embedding of the integrated text data and for initializing the weights of the bidirectional long short-term memory (BiLSTM) network;
the model construction and training module is used for constructing a coding model composed of an 8-layer bidirectional long short-term memory (BiLSTM) network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text; it inputs the RoBERTa word embeddings into the coding model, performs multi-class training through a fully connected layer, the multi-head self-attention mechanism, and a classifier formed by a softmax activation function, and improves the model through an optimization algorithm to obtain the bidirectional long short-term memory (BiLSTM) model and the final knowledge-enhanced irony detection model;
the knowledge-enhanced irony detection model takes the acquired text to be detected as input and outputs the irony detection result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311374400.4A CN117454873B (en) | 2023-10-23 | 2023-10-23 | Ironic detection method and system based on knowledge-enhanced neural network model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311374400.4A CN117454873B (en) | 2023-10-23 | 2023-10-23 | Ironic detection method and system based on knowledge-enhanced neural network model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117454873A true CN117454873A (en) | 2024-01-26 |
CN117454873B CN117454873B (en) | 2024-04-23 |
Family
ID=89584733
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311374400.4A Active CN117454873B (en) | 2023-10-23 | 2023-10-23 | Ironic detection method and system based on knowledge-enhanced neural network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117454873B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118155077A (en) * | 2024-04-17 | 2024-06-07 | 中国科学院地理科学与资源研究所 | Automatic open-pit mining area identification method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210012199A1 (en) * | 2019-07-04 | 2021-01-14 | Zhejiang University | Address information feature extraction method based on deep neural network model |
CN112307745A (en) * | 2020-11-05 | 2021-02-02 | 浙江大学 | Relationship enhanced sentence ordering method based on Bert model |
CN115510841A (en) * | 2022-09-16 | 2022-12-23 | 武汉大学 | Text matching method based on data enhancement and graph matching network |
-
2023
- 2023-10-23 CN CN202311374400.4A patent/CN117454873B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210012199A1 (en) * | 2019-07-04 | 2021-01-14 | Zhejiang University | Address information feature extraction method based on deep neural network model |
CN112307745A (en) * | 2020-11-05 | 2021-02-02 | 浙江大学 | Relationship enhanced sentence ordering method based on Bert model |
CN115510841A (en) * | 2022-09-16 | 2022-12-23 | 武汉大学 | Text matching method based on data enhancement and graph matching network |
Non-Patent Citations (3)
Title |
---|
YAN MENGXIANG ET.AL: "Deceptive review detection via hierarchical neural network model with attention mechanism", JOURNAL OF COMPUTER APPLICATIONS, vol. 39, no. 7, 8 May 2020 (2020-05-08), pages 1925 - 1930 * |
- LI TONG: "Research on Stock Prediction Based on Sentiment Analysis and Attention Mechanism", Wanfang Dissertation Full-text Database, 8 June 2022 (2022-06-08), pages 1 - 20 *
- YAN MENGXIANG; JI DONGHONG; REN YAFENG: "Deceptive review detection based on hierarchical attention mechanism neural network model", Journal of Computer Applications, vol. 39, no. 7, 28 February 2019 (2019-02-28), pages 1925 - 1930 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118155077A (en) * | 2024-04-17 | 2024-06-07 | 中国科学院地理科学与资源研究所 | Automatic open-pit mining area identification method and system |
Also Published As
Publication number | Publication date |
---|---|
CN117454873B (en) | 2024-04-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111897908B (en) | Event extraction method and system integrating dependency information and pre-training language model | |
CN109471938B (en) | Text classification method and terminal | |
WO2021135193A1 (en) | Visual object guidance-based social media short text named entity identification method | |
CN108319666B (en) | Power supply service assessment method based on multi-modal public opinion analysis | |
CN111126386B (en) | Sequence domain adaptation method based on countermeasure learning in scene text recognition | |
CN111738004A (en) | Training method of named entity recognition model and named entity recognition method | |
CN111310476B (en) | Public opinion monitoring method and system using aspect-based emotion analysis method | |
CN113591483A (en) | Document-level event argument extraction method based on sequence labeling | |
CN111506732B (en) | Text multi-level label classification method | |
CN112199501B (en) | Scientific and technological information text classification method | |
CN111782807B (en) | Self-bearing technology debt detection classification method based on multiparty integrated learning | |
CN113657115B (en) | Multi-mode Mongolian emotion analysis method based on ironic recognition and fine granularity feature fusion | |
CN117454873B (en) | Ironic detection method and system based on knowledge-enhanced neural network model | |
CN113220890A (en) | Deep learning method combining news headlines and news long text contents based on pre-training | |
CN111966827A (en) | Conversation emotion analysis method based on heterogeneous bipartite graph | |
CN112732910B (en) | Cross-task text emotion state evaluation method, system, device and medium | |
CN113806547A (en) | Deep learning multi-label text classification method based on graph model | |
CN115952292B (en) | Multi-label classification method, apparatus and computer readable medium | |
Zhi et al. | Financial fake news detection with multi fact CNN-LSTM model | |
CN111429184A (en) | User portrait extraction method based on text information | |
CN112287197A (en) | Method for detecting sarcasm of case-related microblog comments described by dynamic memory cases | |
CN111428513A (en) | False comment analysis method based on convolutional neural network | |
CN112417132A (en) | New intention recognition method for screening negative samples by utilizing predicate guest information | |
CN111460100A (en) | Criminal legal document and criminal name recommendation method and system | |
CN113051886B (en) | Test question duplicate checking method, device, storage medium and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |