CN117454873A - Irony detection method and system based on a knowledge-enhanced neural network model - Google Patents

Irony detection method and system based on a knowledge-enhanced neural network model

Info

Publication number
CN117454873A
CN117454873A
Authority
CN
China
Prior art keywords
model
text
knowledge
irony
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311374400.4A
Other languages
Chinese (zh)
Other versions
CN117454873B (en)
Inventor
Ren Yafeng (任亚峰)
Wang Zilin (王子霖)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Foreign Studies
Original Assignee
Guangdong University of Foreign Studies
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Foreign Studies filed Critical Guangdong University of Foreign Studies
Priority to CN202311374400.4A priority Critical patent/CN117454873B/en
Publication of CN117454873A publication Critical patent/CN117454873A/en
Application granted granted Critical
Publication of CN117454873B publication Critical patent/CN117454873B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an irony detection method and system based on a knowledge-enhanced neural network model, comprising the following steps: S1, screening context information highly relevant to the text to be detected from external knowledge sources and integrating it with the original text; S2, performing word embedding on the integrated text data using the pre-trained language model RoBERTa and initializing the weights of the bidirectional long short-term memory (BiLSTM) network; S3, constructing an encoding model composed of an 8-layer BiLSTM network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text; S4, performing classification training and improving the model through an optimization algorithm to obtain the final knowledge-enhanced irony detection model; S5, acquiring the text to be detected, inputting it into the knowledge-enhanced irony detection model, and outputting the irony detection result. By combining external knowledge sources, the invention enhances the model's understanding of irony, enables it to effectively capture more complex language patterns, and significantly improves the accuracy and robustness of irony detection.

Description

Irony detection method and system based on a knowledge-enhanced neural network model
Technical Field
The invention relates to the technical field of natural language processing and text mining, and in particular to an irony detection method and system based on a knowledge-enhanced neural network model.
Background
With the popularity of social media and online platforms, the volume of text data (especially user-generated content) has grown exponentially. Such text frequently contains sarcasm and veiled mockery, which poses challenges for tasks such as sentiment analysis, public opinion monitoring, and natural language understanding; irony detection has therefore become an important research direction in the field of natural language processing.
Early research methods relied primarily on manually extracted features, such as word frequency and sentiment lexicons, combined with traditional machine learning algorithms such as support vector machines (SVMs), decision trees, and random forests.
With the rapid development of deep learning, models based on convolutional neural networks (CNNs), recurrent neural networks (RNNs), and long short-term memory (LSTM) networks have exhibited superior performance in text representation learning, automatic feature extraction, and classification accuracy.
However, most irony detection models today still focus mainly on analyzing the semantic and structural features inside the text, while relatively ignoring possible close associations between the text and external knowledge sources. Further complicating matters, irony and veiled mockery are often highly context-dependent, so a single text analysis method alone often fails to achieve satisfactory detection performance.
Therefore, how to effectively integrate external knowledge sources and achieve high accuracy and robustness in irony detection tasks is an urgent problem for those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides an irony detection method and system based on a knowledge-enhanced neural network model to solve the problems mentioned in the background art.
In order to achieve the above purpose, the present invention adopts the following technical solution:
An irony detection method based on a knowledge-enhanced neural network model, comprising the following steps:
S1, screening context information highly relevant to the text to be detected from external knowledge sources, and integrating the screened context information with the original text;
S2, performing word embedding on the integrated text data using the pre-trained language model RoBERTa, and initializing the weights of the bidirectional long short-term memory (BiLSTM) network;
S3, constructing an encoding model composed of an 8-layer bidirectional long short-term memory (BiLSTM) network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text;
S4, inputting the word embeddings of the pre-trained language model RoBERTa into the encoding model, performing multi-class training through a classifier composed of a fully connected layer, the multi-head self-attention mechanism, and a softmax activation function, and improving the model through an optimization algorithm to obtain the final knowledge-enhanced irony detection model;
S5, acquiring the text to be detected, inputting it into the knowledge-enhanced irony detection model, and outputting the irony detection result.
Preferably, the specific content of step S1 is:
S11, acquiring the sentences most relevant to the original text from different external knowledge sources as candidate contexts;
S12, ranking all candidate contexts using the BERTScore text similarity algorithm, and selecting the candidate context that best matches the original text as its context according to the text similarity score;
S13, concatenating the extracted context with the original text using an EOS (end-of-sequence) token.
Preferably, the BERTScore text similarity algorithm is specifically:
R_BERT = (1/|A|) · Σ_{x_i ∈ A} max_{y_j ∈ B} x_i^T · y_j
P_BERT = (1/|B|) · Σ_{y_j ∈ B} max_{x_i ∈ A} x_i^T · y_j
BERTScore(A, B) = 2 · P_BERT · R_BERT / (P_BERT + R_BERT)
where A and B are the text to be detected and the text in the external knowledge source, respectively, represented by their contextual token embeddings x_i and y_j.
Preferably, the specific steps for obtaining the pre-trained language model RoBERTa are:
collecting a large amount of unlabeled text data D = (d_1, d_2, …, d_n);
performing word segmentation on each document d_i to obtain a word sequence T = {t_1, t_2, …, t_{n_i}};
initializing a word embedding matrix E = {e_1, e_2, …, e_V}, where e_i is the embedding vector of the i-th word in the vocabulary and V is the size of the vocabulary;
training the language model based on the Transformer architecture and its self-attention mechanism, optimizing the model parameters using maximum likelihood estimation, and obtaining the pre-trained language model RoBERTa after training is complete.
Preferably, the specific content of the model training in step S4 is:
S41, dividing the integrated text data into a training set and a test set, converting the words of the training set into a word embedding matrix X = {x_1, x_2, …, x_N} using the pre-trained language model RoBERTa, and inputting it into the encoding model;
S42, obtaining the hidden state sequence through the 8-layer bidirectional long short-term memory (BiLSTM) network of the encoding model;
S43, introducing the multi-head attention mechanism to obtain the attention weight and the average weight of each layer;
S44, feeding the average weights into the fully connected layer, constructing a classifier using the softmax activation function, and classifying the output of the fully connected layer;
S45, optimizing the model using the cross-entropy loss function and the Adam optimizer according to the classification results;
S46, updating the model weights according to the gradient of the loss function to improve model performance.
Preferably, the self-attention mechanism is specifically:
Attention(Q, K, V) = softmax(Q · K^T / √d_k) · V
where Q, K, and V are the query, key, and value matrices, respectively, and d_k is the dimension of the keys.
Preferably, step S4 further includes model testing and evaluation: the test results of the model are evaluated, analyzed, and summarized through precision, recall, and F-score, and the model's performance is optimized and improved.
Preferably, the 8-layer bidirectional long short-term memory (BiLSTM) network is constructed specifically as:
H_t = [h_t^→ ; h_t^←], with h_t^→ = LSTM(X_t, h_{t-1}^→) and h_t^← = LSTM(X_t, h_{t+1}^←)
where H_t is the hidden state of the t-th time step and X_t is the input of the t-th time step;
the attention weight α^l of each layer l is:
α_{t,j}^l = exp(e_{t,j}^l) / Σ_{k=1..T} exp(e_{t,k}^l)
where T is the length of the sequence and e_{t,j}^l is the similarity between the target time step t and the source time step j in the l-th BiLSTM layer:
e_{t,j}^l = a(s_{t-1}^l, h_j^l)
where s_{t-1}^l is the hidden state of the target sequence at layer l at time t-1, h_j^l is the hidden state of the source sequence at layer l at time j, and a is a learnable function;
the average weight ᾱ^l is:
ᾱ^l = (1/T) · Σ_{t=1..T} α_t^l
the input to the fully connected layer and softmax classifier is expressed as:
D = [ᾱ^1; ᾱ^2; …; ᾱ^8]
the fully connected layer F is:
F(D) = W_f · D + b_f
where W_f and b_f are the weight and bias of the fully connected layer, respectively;
the classifier C constructed with the softmax activation function is:
C(F) = softmax(W_c · F + b_c)
where W_c and b_c are the weight and bias of the classifier, respectively;
the cross-entropy loss function L is:
L = -(1/N) · Σ_{i=1..N} [ y_i · log C(x_i) + (1 - y_i) · log(1 - C(x_i)) ]
where N is the number of labeled samples, y_i is the true label of the i-th sample, and C(x_i) is the model prediction for the i-th sample, ranging over [0, 1].
Preferably, the update rules for optimization with the Adam optimizer are specifically:
m_t = β_1 · m_{t-1} + (1 - β_1) · g_t
v_t = β_2 · v_{t-1} + (1 - β_2) · g_t²
m̂_t = m_t / (1 - β_1^t),  v̂_t = v_t / (1 - β_2^t)
σ_t = σ_{t-1} - α · m̂_t / (√v̂_t + ε)
where m_t and v_t are the estimates of the first and second moments, respectively, β_1 and β_2 are the decay factors, g_t is the gradient of the loss function L with respect to the model parameters σ, α is the learning rate, ε is a small constant that prevents division by zero, and σ_t is the model parameters at time t.
An irony detection system based on a knowledge-enhanced neural network model, based on the above irony detection method, comprising: a text acquisition module, a text integration module, a knowledge-enhanced irony detection model, and a model construction and training module;
the knowledge-enhanced irony detection model includes the pre-trained language model RoBERTa and the bidirectional long short-term memory (BiLSTM) model;
the text acquisition module is used to acquire the text to be detected and the external knowledge sources;
the text integration module is used to screen context information highly relevant to the text to be detected from the external knowledge sources and to integrate the screened context information with the original text;
the pre-trained language model RoBERTa is used to perform word embedding on the integrated text data and to initialize the weights of the BiLSTM network;
the model construction and training module is used to construct an encoding model composed of an 8-layer BiLSTM network, to embed a multi-head self-attention mechanism in the BiLSTM network, and to capture long-distance dependencies and local semantic features in the text; it inputs the RoBERTa word embeddings into the encoding model, performs multi-class training through a classifier composed of a fully connected layer, the multi-head self-attention mechanism, and a softmax activation function, and improves the model through an optimization algorithm to obtain the BiLSTM model and the final knowledge-enhanced irony detection model;
the knowledge-enhanced irony detection model takes the acquired text to be detected as input and outputs the irony detection result.
Compared with the prior art, the irony detection method and system based on a knowledge-enhanced neural network model provide context information by combining external knowledge sources, which enhances the model's understanding of irony and gives it greater versatility and robustness than models that do not use external information;
unlike prior work that mainly used simpler neural network architectures, the invention adopts a multi-layer approach comprising the pre-trained language model RoBERTa, an 8-layer bidirectional long short-term memory (BiLSTM) network, and a multi-head attention mechanism, enabling the model to effectively capture more complex language patterns and significantly improving the accuracy and robustness of irony detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of the overall framework of the irony detection method based on a knowledge-enhanced neural network model provided by the present invention;
FIG. 2 is a schematic diagram of the topology of the pre-trained language model RoBERTa provided by the present invention;
FIG. 3 is a schematic diagram of the bidirectional long short-term memory (BiLSTM) network model and multi-head attention mechanism provided by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
The embodiment of the invention discloses an irony detection method based on a knowledge-enhanced neural network model, comprising the following steps:
S1, screening context information highly relevant to the text to be detected from external knowledge sources, and integrating the screened context information with the original text;
S2, performing word embedding on the integrated text data using the pre-trained language model RoBERTa, and initializing the weights of the bidirectional long short-term memory (BiLSTM) network;
S3, constructing an encoding model composed of an 8-layer bidirectional long short-term memory (BiLSTM) network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text;
S4, inputting the word embeddings of the pre-trained language model RoBERTa into the encoding model, performing multi-class training through a classifier composed of a fully connected layer, the multi-head self-attention mechanism, and a softmax activation function, and improving the model through an optimization algorithm to obtain the final knowledge-enhanced irony detection model;
S5, acquiring the text to be detected, inputting it into the knowledge-enhanced irony detection model, and outputting the irony detection result.
In order to further implement the above technical solution, the specific content of step S1 is:
S11, acquiring the sentences most relevant to the original text from different external knowledge sources as candidate contexts;
S12, ranking all candidate contexts using the BERTScore text similarity algorithm, and selecting the candidate context that best matches the original text as its context according to the text similarity score;
S13, concatenating the extracted context with the original text using an EOS (end-of-sequence) token.
In this embodiment, the external knowledge sources include Wikipedia, The New York Times, and the BBC (British Broadcasting Corporation);
for Wikipedia, the sentence most relevant to the original text is found as a candidate context;
for The New York Times, the natural language processing tool spaCy is first used to identify named entities in the original text, and the NYT API interface is used to retrieve the most relevant sentences as candidate contexts;
for the BBC, the GDELT DOC API interface is used to retrieve the ten headlines most relevant to the entities in the original text as candidate contexts.
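By way of illustration only, the following Python sketch outlines this retrieval step; the spaCy calls are real library APIs, while search_nyt and search_gdelt are hypothetical stubs standing in for the NYT API and GDELT DOC API queries, whose exact calls are not disclosed here.

```python
# Illustrative sketch of candidate-context retrieval (step S11).
# The spaCy calls are real; search_nyt / search_gdelt are hypothetical
# stubs standing in for the NYT API and GDELT DOC API queries.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def search_nyt(entities):
    # hypothetical placeholder for an NYT API query keyed on the entities
    return [f"Placeholder NYT sentence mentioning {e}." for e in entities]

def search_gdelt(entities):
    # hypothetical placeholder for a GDELT DOC API top-10 headline query
    return [f"Placeholder BBC headline mentioning {e}." for e in entities]

def gather_candidates(text):
    entities = [ent.text for ent in nlp(text).ents]  # named entities in the text
    return search_nyt(entities) + search_gdelt(entities)

print(gather_candidates("Oh great, the New York subway is delayed again."))
```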
In order to further implement the above technical solution, the BERTScore text similarity algorithm is specifically:
R_BERT = (1/|A|) · Σ_{x_i ∈ A} max_{y_j ∈ B} x_i^T · y_j
P_BERT = (1/|B|) · Σ_{y_j ∈ B} max_{x_i ∈ A} x_i^T · y_j
BERTScore(A, B) = 2 · P_BERT · R_BERT / (P_BERT + R_BERT)
where A and B are the text to be detected and the text in the external knowledge source, respectively, represented by their contextual token embeddings x_i and y_j.
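As a hedged illustration of steps S12-S13, the following sketch ranks candidate contexts with the open-source bert-score package and concatenates the best match to the original text; the sample sentences and the use of RoBERTa's </s> token as the EOS label are assumptions, not part of the disclosure.

```python
# Sketch of steps S12-S13: rank candidate contexts with BERTScore and
# join the best one to the original text with an EOS token. Requires
# `pip install bert-score`; sample sentences and the </s> EOS label
# (RoBERTa's end-of-sequence token) are illustrative assumptions.
from bert_score import score

original = "Oh great, another Monday morning meeting."
candidates = [
    "Meetings are a common feature of corporate life.",  # illustrative
    "Monday is the first day of the working week.",      # illustrative
]

# S12: BERTScore each candidate against the original text
P, R, F1 = score(candidates, [original] * len(candidates), lang="en")
best = candidates[int(F1.argmax())]

# S13: concatenate the selected context and the original text with an EOS marker
integrated = f"{best} </s> {original}"
print(integrated)
```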
In order to further implement the above technical solution, the specific steps for obtaining the pre-trained language model RoBERTa are:
collecting a large amount of unlabeled text data D = (d_1, d_2, …, d_n);
performing word segmentation on each document d_i to obtain a word sequence T = {t_1, t_2, …, t_{n_i}};
initializing a word embedding matrix E = {e_1, e_2, …, e_V}, where e_i is the embedding vector of the i-th word in the vocabulary and V is the size of the vocabulary;
training the language model based on the Transformer architecture and its self-attention mechanism, optimizing the model parameters using maximum likelihood estimation, and obtaining the pre-trained language model RoBERTa after training is complete.
In this embodiment, maximum likelihood estimation (MLE) optimizes the model parameters as:
θ* = argmax_θ Σ_{i=1..N} log P(d_i; θ)
where θ is the model parameters of the pre-trained language model RoBERTa and N is the size of the training set.
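A minimal sketch of the word-embedding step using the Hugging Face transformers library follows; the choice of the roberta-base checkpoint and of the last hidden state as the word embedding matrix X are assumptions, since the embodiment does not specify them.

```python
# Minimal sketch of step S2: contextual word embeddings from RoBERTa
# (Hugging Face transformers). Using roberta-base and the last hidden
# state as the word embedding matrix X are illustrative assumptions.
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
roberta = RobertaModel.from_pretrained("roberta-base")

integrated_text = ("Meetings are a common feature of corporate life. "
                   "</s> Oh great, another Monday morning meeting.")
inputs = tokenizer(integrated_text, return_tensors="pt", truncation=True)

with torch.no_grad():
    outputs = roberta(**inputs)

X = outputs.last_hidden_state  # shape (1, seq_len, 768); fed to the BiLSTM encoder
print(X.shape)
```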
In order to further implement the above technical solution, the specific content of the model training in step S4 is:
S41, dividing the integrated text data into a training set and a test set, converting the words of the training set into a word embedding matrix X = {x_1, x_2, …, x_N} using the pre-trained language model RoBERTa, and inputting it into the encoding model;
S42, constructing the 8-layer BiLSTM model, which can be expressed as:
H_t = [h_t^→ ; h_t^←], with h_t^→ = LSTM(X_t, h_{t-1}^→) and h_t^← = LSTM(X_t, h_{t+1}^←)
where H_t is the hidden state of the t-th time step and X_t is the input of the t-th time step;
s43, introducing a multi-head attention mechanism in a training model so as to more effectively capture key information in a text:
for each layer l of the 8-layer BiLSTM, there is a sequence of hidden statesAttention weight alpha of each layer l The method comprises the following steps:
wherein T is the length of the sequence,similarity of the target time step t and the source time step j in the first layer BiLSTM;
wherein,the hidden state of the target sequence at the first layer at time t-1; />A hidden state of the source sequence at the first layer at time j; a is a learnable function;
in obtaining the attention weight alpha of each layer l After that, the average weight can be calculated
S44. These are then combinedAs input to the full connectivity layer and softmax classifier, one mayExpressed as:
To further integrate the output of the bidirectional long short-term memory (BiLSTM) network and prepare the data for the classifier, a fully connected layer F is used:
F(D) = W_f · D + b_f
where W_f and b_f are the weight and bias of the fully connected layer, respectively;
next, the classifier C constructed with the softmax activation function is used to classify the output of the fully connected layer:
C(F) = softmax(W_c · F + b_c)
where W_c and b_c are the weight and bias of the classifier, respectively;
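By way of illustration, the following PyTorch sketch assembles the components described in S42-S44 (an 8-layer BiLSTM, multi-head self-attention, a fully connected layer, and a softmax classifier); it is one possible reading of the disclosure, not the inventors' implementation, and the mean pooling over the attention output is an assumption.

```python
# Illustrative sketch of the encoding model of S42-S44: 8-layer BiLSTM,
# multi-head self-attention, fully connected layer, softmax classifier.
# Hidden size 512 and dropout 0.5 follow the embodiment described later;
# mean pooling over the attention output is an assumption.
import torch
import torch.nn as nn

class KnowledgeEnhancedEncoder(nn.Module):
    def __init__(self, embed_dim=768, hidden=512, layers=8, heads=8, classes=2):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden, num_layers=layers,
                              bidirectional=True, batch_first=True, dropout=0.5)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.fc = nn.Linear(2 * hidden, classes)  # F(D) = W_f · D + b_f

    def forward(self, x):              # x: RoBERTa embeddings, shape (B, T, 768)
        h, _ = self.bilstm(x)          # hidden states H_t, shape (B, T, 2*hidden)
        a, _ = self.attn(h, h, h)      # multi-head self-attention over H
        d = a.mean(dim=1)              # averaged representation D
        return self.fc(d)              # logits; softmax(W_c·F + b_c) is applied
                                       # at inference or inside the training loss

model = KnowledgeEnhancedEncoder()
probs = torch.softmax(model(torch.randn(4, 32, 768)), dim=-1)  # C(F) = softmax(...)
print(probs.shape)  # torch.Size([4, 2])
```

Returning raw logits keeps the sketch compatible with PyTorch's CrossEntropyLoss, which applies the softmax internally during training.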
s45, in order to enable the model to quantify the performance of the model, and improve the model by an optimization algorithm, a cross entropy loss function L is used:
wherein N is the number of marked samples, y i For the actual tag of the ith sample, C (x i ) Model predictive value representing the ith sample, ranging from [0,1];
Optimization was performed using Adam optimizer, with the following rules:
m t =β 1 m t-1 +(1-β 1 )g t
wherein m is t 、v t Estimated values of first moment and second moment respectively, beta 1 And beta 2 Is usually set to 0.9 and 0.999 g t For the gradient of the loss function L with respect to the two-way long and short term memory network model parameters sigma, alpha represents the learning rate, epsilon is a small constant which is prevented from being divided by zero, and is usually set to be 1×10 -8 ,σ t Is a model parameter at time t;
s46, finally, after each training period is finished, updating the weight of the model according to the gradient of the loss function so as to improve the performance of the model.
In order to further implement the above technical solution, the self-attention mechanism is specifically:
Attention(Q, K, V) = softmax(Q · K^T / √d_k) · V
where Q, K, and V are the query, key, and value matrices, respectively, and d_k is the dimension of the keys.
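For concreteness, this formula transcribes directly into a few lines of PyTorch:

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q·K^T / √d_k)·V
import math
import torch

def attention(Q, K, V):
    d_k = K.size(-1)                                   # key dimension d_k
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # Q·K^T / √d_k
    return torch.softmax(scores, dim=-1) @ V

Q = K = V = torch.randn(2, 5, 64)  # (batch, sequence length, d_k)
print(attention(Q, K, V).shape)    # torch.Size([2, 5, 64])
```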
In order to further implement the above technical solution, step S4 further includes model testing and evaluation: the test results of the model are evaluated, analyzed, and summarized through precision, recall, and F-score, and the model's performance is optimized and improved.
In this embodiment, the performance metrics are:
precision: Precision = TP / (TP + FP)
recall: Recall = TP / (TP + FN)
F1 score: F1 = 2 · Precision · Recall / (Precision + Recall)
where TP, FP, and FN are the numbers of true positives, false positives, and false negatives, respectively.
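These metrics can be computed with scikit-learn, as in the following sketch with illustrative dummy labels:

```python
# Precision, recall and macro-F1 computed with scikit-learn (dummy labels).
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = ironic, 0 = non-ironic (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("macro-F1: ", f1_score(y_true, y_pred, average="macro"))
```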
In this embodiment, context information is generated from external knowledge sources (such as Wikipedia, The New York Times, and the BBC) and then fused with the original text data; the text is then converted into numeric vectors by the word embedding layer and processed by the encoding layer and the multi-head attention mechanism layer.
At the encoding layer, a knowledge-enhanced neural network model is used, which can capture complex patterns in the text more effectively; the multi-head attention mechanism layer further extracts key information from the text, which the classification layer uses to predict whether the text is ironic.
In another embodiment, several sets of comparison experiments were performed to verify the model's validity:
First, the performance of different network models on the irony detection task was studied experimentally on the SemEval-2018 Task 3 dataset, including BERT-GRU-Softmax (a pre-trained language model with a gated recurrent unit (GRU) network), BERT-LSTM-Softmax (a pre-trained language model with a basic long short-term memory (LSTM) network), three benchmark models, and the model of the present invention.
The invention adopts the bidirectional long short-term memory (BiLSTM) model shown in FIG. 3 and the pre-trained RoBERTa neural network model shown in FIG. 2; Twitter comment texts from the SemEval-2018 Task 3 dataset, randomly divided into training, test, and validation sets at a ratio of 8:1:1, are mainly used to train and evaluate the irony detection model.
In terms of model construction, the invention uses a knowledge-enhanced neural network model that acquires context information from external knowledge sources (such as Wikipedia, The New York Times, and the BBC); the model mainly consists of an embedding layer, an encoding layer, a multi-head attention mechanism layer, and a classification layer, where the embedding layer uses the pre-trained RoBERTa model and the encoding layer uses an 8-layer BiLSTM network;
for optimization, the Adam optimizer is used with weight decay set to 1.0e-2; OneCycleLR is used as the learning rate scheduler with the maximum learning rate set to 1.0e-5; the number of training epochs is set to 20, each comprising 256 steps; the hidden layer size is set to 512 and the dropout rate to 0.5; during training, the micro-batch size is set to 8 and the maximum number of training epochs to 20; word embeddings are stored on the GPU, and 500 warm-up steps are added;
finally, the model performs classification through a softmax layer, outputting the probabilities that a sample belongs to the ironic and non-ironic categories, and the model's performance is evaluated by comparison with the true labels.
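A sketch of this training configuration using PyTorch's Adam and OneCycleLR APIs follows; expressing the 500 warm-up steps through OneCycleLR's pct_start argument is an assumption, and `model` again refers to the encoder sketched earlier.

```python
# Sketch of the training configuration of this embodiment: Adam with
# weight decay 1e-2, OneCycleLR with max_lr 1e-5, 20 epochs of 256 steps.
# Approximating the 500 warm-up steps via pct_start is an assumption.
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-5, epochs=20, steps_per_epoch=256,
    pct_start=500 / (20 * 256))  # ≈500 warm-up steps out of 5120 total

for epoch in range(20):
    for step in range(256):
        optimizer.zero_grad()
        # ... forward pass and loss.backward() as sketched above ...
        optimizer.step()
        scheduler.step()
```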
Other comparison models were constructed and comparative tests were performed; the experimental results on the SemEval-2018 dataset are shown in Table 1:
Different network model methods | Macro-F1 (%)
Pre-trained language model + gated recurrent neural network | 79.50
Pre-trained language model + basic long short-term memory network | 80.65
Singh et al. (2019): attention mechanism + emoticon textualization | 80.31
Potamias et al. (2020): Transformer model + recurrent convolutional network | 80.00
Turbo et al. (2022): multimodal integrated training network | 79.80
The invention (Wikipedia) | 82.97
Table 1 shows the performance of each model on the SemEval-2018 Task 3 dataset. The experimental results show that, after introducing Wikipedia context information, the invention clearly surpasses the other baseline models with a Macro-F1 score of 82.97%, demonstrating the model's superiority in irony recognition.
The experimental results with different knowledge sources as context are shown in Table 2:
External knowledge source | Macro-F1 (%)
No external knowledge | 80.65
The New York Times | 80.77
BBC | 81.39
Wikipedia | 82.97
Table 2 shows the experimental results for different knowledge sources. Even without introducing external knowledge, the proposed model achieves a Macro-F1 score of 80.65%, which fully demonstrates the strength of the neural network model that integrates the pre-trained language model RoBERTa, long short-term memory (LSTM), and the attention mechanism. After introducing context information for the original text, all knowledge sources, including Wikipedia, The New York Times, and the BBC, improve the model's performance. Specifically, the model using Wikipedia as the knowledge source achieves an F1 score of 82.97%, a significant improvement over the 80.65% obtained without external knowledge; The New York Times and the BBC raise the model's F1 score to 80.77% and 81.39%, respectively. These data clearly indicate that context information from external knowledge sources is critical for predicting irony.
The experimental results of the proposed model under different knowledge enhancement methods are shown in Table 3:
Data enhancement strategy | Macro-F1 (%)
Back-translation | 77.58
Synonym replacement | 80.69
Word-order swapping | 79.21
The invention | 82.97
Table 3 shows the experimental results for different data enhancement strategies, from which it can be seen that the three strategies have different effects on the irony detection task. The synonym replacement strategy brings a slight improvement, reaching an F1 score of 80.69% and slightly exceeding the 80.65% obtained without data enhancement, which indicates that synonym replacement as a data enhancement strategy can improve the model's generalization ability to a certain extent. With back-translation and word-order swapping, the model only reaches F1 scores of 77.58% and 79.21%, slightly below the model without data enhancement, probably because both strategies introduce some noise while increasing data diversity, thereby affecting model performance. When Wikipedia is introduced into the model as context information, the performance improves significantly, with the F1 score reaching 82.97%, further showing that context information from external knowledge sources provides richer semantic information and helps the model better understand the semantics of the text.
An irony detection system based on a knowledge-enhanced neural network model, comprising: a text acquisition module, a text integration module, a knowledge-enhanced irony detection model, and a model construction and training module;
the knowledge-enhanced irony detection model includes the pre-trained language model RoBERTa and the bidirectional long short-term memory (BiLSTM) model;
the text acquisition module is used to acquire the text to be detected and the external knowledge sources;
the text integration module is used to screen context information highly relevant to the text to be detected from the external knowledge sources and to integrate the screened context information with the original text;
the pre-trained language model RoBERTa is used to perform word embedding on the integrated text data and to initialize the weights of the BiLSTM network;
the model construction and training module is used to construct an encoding model composed of an 8-layer BiLSTM network, to embed a multi-head self-attention mechanism in the BiLSTM network, and to capture long-distance dependencies and local semantic features in the text; it inputs the RoBERTa word embeddings into the encoding model, performs multi-class training through a classifier composed of a fully connected layer, the multi-head self-attention mechanism, and a softmax activation function, and improves the model through an optimization algorithm to obtain the BiLSTM model and the final knowledge-enhanced irony detection model;
the knowledge-enhanced irony detection model takes the acquired text to be detected as input and outputs the irony detection result.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and relevant details can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An irony detection method based on a knowledge-enhanced neural network model, characterized by comprising the following steps:
S1, screening context information highly relevant to the text to be detected from external knowledge sources, and integrating the screened context information with the original text;
S2, performing word embedding on the integrated text data using the pre-trained language model RoBERTa, and initializing the weights of the bidirectional long short-term memory (BiLSTM) network;
S3, constructing an encoding model composed of an 8-layer bidirectional long short-term memory (BiLSTM) network, embedding a multi-head self-attention mechanism in the BiLSTM network, and capturing long-distance dependencies and local semantic features in the text;
S4, inputting the word embeddings of the pre-trained language model RoBERTa into the encoding model, performing multi-class training through a classifier composed of a fully connected layer, the multi-head self-attention mechanism, and a softmax activation function, and improving the model through an optimization algorithm to obtain the final knowledge-enhanced irony detection model;
S5, acquiring the text to be detected, inputting it into the knowledge-enhanced irony detection model, and outputting the irony detection result.
2. The irony detection method based on a knowledge-enhanced neural network model according to claim 1, characterized in that the specific content of step S1 is:
S11, acquiring the sentences most relevant to the original text from different external knowledge sources as candidate contexts;
S12, ranking all candidate contexts using the BERTScore text similarity algorithm, and selecting the candidate context that best matches the original text as its context according to the text similarity score;
S13, concatenating the extracted context with the original text using an EOS (end-of-sequence) token.
3. The irony detection method based on a knowledge-enhanced neural network model according to claim 2, characterized in that the BERTScore text similarity algorithm is specifically:
R_BERT = (1/|A|) · Σ_{x_i ∈ A} max_{y_j ∈ B} x_i^T · y_j
P_BERT = (1/|B|) · Σ_{y_j ∈ B} max_{x_i ∈ A} x_i^T · y_j
BERTScore(A, B) = 2 · P_BERT · R_BERT / (P_BERT + R_BERT)
where A and B are the text to be detected and the text in the external knowledge source, respectively, represented by their contextual token embeddings x_i and y_j.
4. The irony detection method based on a knowledge-enhanced neural network model according to claim 1, characterized in that the specific steps for obtaining the pre-trained language model RoBERTa are:
collecting a large amount of unlabeled text data D = (d_1, d_2, …, d_n);
performing word segmentation on each document d_i to obtain a word sequence T = {t_1, t_2, …, t_{n_i}};
initializing a word embedding matrix E = {e_1, e_2, …, e_V}, where e_i is the embedding vector of the i-th word in the vocabulary and V is the size of the vocabulary;
training the language model based on the Transformer architecture and its self-attention mechanism, optimizing the model parameters using maximum likelihood estimation, and obtaining the pre-trained language model RoBERTa after training is complete.
5. The irony detection method based on a knowledge-enhanced neural network model according to claim 1, characterized in that the specific content of the multi-class training of the model in step S4 is:
S41, dividing the integrated text data into a training set and a test set, converting the words of the training set into a word embedding matrix X = {x_1, x_2, …, x_N} using the pre-trained language model RoBERTa, and inputting it into the encoding model;
S42, obtaining the hidden state sequence through the 8-layer bidirectional long short-term memory (BiLSTM) network of the encoding model;
S43, introducing the multi-head attention mechanism to obtain the attention weight and the average weight of each layer;
S44, feeding the average weights into the fully connected layer, constructing a classifier using the softmax activation function, and classifying the output of the fully connected layer;
S45, optimizing the model using the cross-entropy loss function and the Adam optimizer according to the classification results;
S46, updating the model weights according to the gradient of the loss function to improve model performance.
6. The irony detection method based on a knowledge-enhanced neural network model according to claim 4 or 5, characterized in that the self-attention mechanism is specifically:
Attention(Q, K, V) = softmax(Q · K^T / √d_k) · V
where Q, K, and V are the query, key, and value matrices, respectively, and d_k is the dimension of the keys.
7. The irony detection method based on a knowledge-enhanced neural network model according to claim 1, characterized by further comprising model testing and evaluation after step S4: the test results of the model are evaluated, analyzed, and summarized through precision, recall, and F-score, and the model's performance is optimized and improved.
8. The irony detection method based on a knowledge-enhanced neural network model according to claim 5, characterized in that the 8-layer bidirectional long short-term memory (BiLSTM) network is constructed specifically as:
H_t = [h_t^→ ; h_t^←], with h_t^→ = LSTM(X_t, h_{t-1}^→) and h_t^← = LSTM(X_t, h_{t+1}^←)
where H_t is the hidden state of the t-th time step and X_t is the word embedding of the t-th time step;
the attention weight α^l of each layer l is:
α_{t,j}^l = exp(e_{t,j}^l) / Σ_{k=1..T} exp(e_{t,k}^l)
where T is the length of the word sequence and e_{t,j}^l is the similarity between the target time step t and the source time step j in the l-th BiLSTM layer:
e_{t,j}^l = a(s_{t-1}^l, h_j^l)
where s_{t-1}^l is the hidden state of the target sequence at layer l at time t-1, h_j^l is the hidden state of the source sequence at layer l at time j, and a is a learnable function;
the average weight ᾱ^l is:
ᾱ^l = (1/T) · Σ_{t=1..T} α_t^l
the input to the fully connected layer and softmax classifier is expressed as:
D = [ᾱ^1; ᾱ^2; …; ᾱ^8]
the fully connected layer F is:
F(D) = W_f · D + b_f
where W_f and b_f are the weight and bias of the fully connected layer, respectively;
the classifier C constructed with the softmax activation function is:
C(F) = softmax(W_c · F + b_c)
where W_c and b_c are the weight and bias of the classifier, respectively;
the cross-entropy loss function L is:
L = -(1/N) · Σ_{i=1..N} [ y_i · log C(x_i) + (1 - y_i) · log(1 - C(x_i)) ]
where N is the number of labeled samples, y_i is the true label of the i-th sample, and C(x_i) is the model prediction for the i-th sample, ranging over [0, 1].
9. The irony detection method based on a knowledge-enhanced neural network model according to claim 8, characterized in that the update rules for optimization with the Adam optimizer are specifically:
m_t = β_1 · m_{t-1} + (1 - β_1) · g_t
v_t = β_2 · v_{t-1} + (1 - β_2) · g_t²
m̂_t = m_t / (1 - β_1^t),  v̂_t = v_t / (1 - β_2^t)
σ_t = σ_{t-1} - α · m̂_t / (√v̂_t + ε)
where m_t and v_t are the estimates of the first and second moments, respectively, β_1 and β_2 are the decay factors, g_t is the gradient of the loss function L with respect to the model parameters σ, α is the learning rate, ε is a small constant that prevents division by zero, and σ_t is the model parameters at time t.
10. An irony detection system based on a knowledge-enhanced neural network model, characterized in that it is based on the irony detection method based on a knowledge-enhanced neural network model according to any one of claims 1-9 and comprises: a text acquisition module, a text integration module, a knowledge-enhanced irony detection model, and a model construction and training module;
the knowledge-enhanced irony detection model includes the pre-trained language model RoBERTa and the bidirectional long short-term memory (BiLSTM) model;
the text acquisition module is used to acquire the text to be detected and the external knowledge sources;
the text integration module is used to screen context information highly relevant to the text to be detected from the external knowledge sources and to integrate the screened context information with the original text;
the pre-trained language model RoBERTa is used to perform word embedding on the integrated text data and to initialize the weights of the BiLSTM network;
the model construction and training module is used to construct an encoding model composed of an 8-layer BiLSTM network, to embed a multi-head self-attention mechanism in the BiLSTM network, and to capture long-distance dependencies and local semantic features in the text; it inputs the RoBERTa word embeddings into the encoding model, performs multi-class training through a classifier composed of a fully connected layer, the multi-head self-attention mechanism, and a softmax activation function, and improves the model through an optimization algorithm to obtain the BiLSTM model and the final knowledge-enhanced irony detection model;
the knowledge-enhanced irony detection model takes the acquired text to be detected as input and outputs the irony detection result.
CN202311374400.4A 2023-10-23 2023-10-23 Irony detection method and system based on a knowledge-enhanced neural network model Active CN117454873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311374400.4A CN117454873B (en) 2023-10-23 2023-10-23 Irony detection method and system based on a knowledge-enhanced neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311374400.4A CN117454873B (en) 2023-10-23 2023-10-23 Irony detection method and system based on a knowledge-enhanced neural network model

Publications (2)

Publication Number Publication Date
CN117454873A (en) 2024-01-26
CN117454873B CN117454873B (en) 2024-04-23

Family

ID=89584733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311374400.4A Active CN117454873B (en) 2023-10-23 2023-10-23 Irony detection method and system based on a knowledge-enhanced neural network model

Country Status (1)

Country Link
CN (1) CN117454873B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118155077A (en) * 2024-04-17 2024-06-07 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences Automatic open-pit mining area identification method and system


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210012199A1 (en) * 2019-07-04 2021-01-14 Zhejiang University Address information feature extraction method based on deep neural network model
CN112307745A (en) * 2020-11-05 2021-02-02 浙江大学 Relationship enhanced sentence ordering method based on Bert model
CN115510841A (en) * 2022-09-16 2022-12-23 武汉大学 Text matching method based on data enhancement and graph matching network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Yan Mengxiang et al.: "Deceptive review detection via hierarchical neural network model with attention mechanism", Journal of Computer Applications, vol. 39, no. 7, 8 May 2020 (2020-05-08), pages 1925-1930 *
Li Tong: "Research on stock prediction based on sentiment analysis and attention mechanism", Wanfang Dissertation Full-text Database, 8 June 2022 (2022-06-08), pages 1-20 *
Yan Mengxiang; Ji Donghong; Ren Yafeng: "Deceptive review detection based on a hierarchical attention mechanism neural network model", Journal of Computer Applications (计算机应用), vol. 39, no. 7, 28 February 2019 (2019-02-28), pages 1925-1930 *


Also Published As

Publication number Publication date
CN117454873B (en) 2024-04-23

Similar Documents

Publication Publication Date Title
CN111897908B (en) Event extraction method and system integrating dependency information and pre-training language model
CN109471938B (en) Text classification method and terminal
WO2021135193A1 (en) Visual object guidance-based social media short text named entity identification method
CN108319666B (en) Power supply service assessment method based on multi-modal public opinion analysis
CN111126386B (en) Sequence domain adaptation method based on countermeasure learning in scene text recognition
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN111310476B (en) Public opinion monitoring method and system using aspect-based emotion analysis method
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN111506732B (en) Text multi-level label classification method
CN112199501B (en) Scientific and technological information text classification method
CN111782807B (en) Self-bearing technology debt detection classification method based on multiparty integrated learning
CN113657115B (en) Multi-mode Mongolian emotion analysis method based on ironic recognition and fine granularity feature fusion
CN117454873B (en) Irony detection method and system based on a knowledge-enhanced neural network model
CN113220890A (en) Deep learning method combining news headlines and news long text contents based on pre-training
CN111966827A (en) Conversation emotion analysis method based on heterogeneous bipartite graph
CN112732910B (en) Cross-task text emotion state evaluation method, system, device and medium
CN113806547A (en) Deep learning multi-label text classification method based on graph model
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
Zhi et al. Financial fake news detection with multi fact CNN-LSTM model
CN111429184A (en) User portrait extraction method based on text information
CN112287197A (en) Method for detecting sarcasm of case-related microblog comments described by dynamic memory cases
CN111428513A (en) False comment analysis method based on convolutional neural network
CN112417132A (en) New intention recognition method for screening negative samples by utilizing predicate guest information
CN111460100A (en) Criminal legal document and criminal name recommendation method and system
CN113051886B (en) Test question duplicate checking method, device, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant