CN113627146B - Knowledge-constraint-based two-step rumor-refuting text generation method - Google Patents

Knowledge-constraint-based two-step rumor-refuting text generation method

Info

Publication number
CN113627146B
CN113627146B (application CN202110918103.6A)
Authority
CN
China
Prior art keywords
rumor-refuting
text
knowledge
model
Prior art date
Legal status
Active
Application number
CN202110918103.6A
Other languages
Chinese (zh)
Other versions
CN113627146A (en)
Inventor
曹冬林
朱多朵
李臣
林达真
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University
Priority to CN202110918103.6A
Publication of CN113627146A
Application granted
Publication of CN113627146B
Legal status: Active

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F40/00 Handling natural language data
            • G06F40/10 Text processing
              • G06F40/166 Editing, e.g. inserting or deleting
              • G06F40/186 Templates
            • G06F40/20 Natural language analysis
              • G06F40/205 Parsing
                • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
              • G06F40/237 Lexical tools
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N5/00 Computing arrangements using knowledge-based models
            • G06N5/02 Knowledge representation; Symbolic representation


Abstract

A two-step rumor-refuting text generation method based on knowledge constraints, relating to the field of natural language processing. Aiming at the problems that rumor-refuting texts depend heavily on external knowledge and that long rumor-refuting texts are difficult to generate, rumors are taken as the research object. A knowledge text generation model is built on a multi-layer Transformer decoder architecture, and a knowledge text sequence is generated from knowledge triples; a rumor-refuting conclusion generation model is built on a PyTorch version of the GPT2-ML model, with rumor constraints and knowledge constraints introduced to generate the rumor-refuting conclusion. The generated knowledge text sequence and the rumor-refuting conclusion together form the rumor-refuting text. The two-step method performs markedly better than other generation methods: it alleviates the difficulty of generating long rumor-refuting texts and makes the generated texts more logically coherent.

Description

Knowledge-constraint-based two-step rumor-refuting text generation method
Technical Field
The invention relates to the field of natural language processing, and in particular to a two-step rumor-refuting text generation method based on knowledge constraints.
Background
In recent years, with the rapid development of the Internet and information technology, networks have become the most important information exchange platform in today's society. According to a report issued by the China Internet Network Information Center, 99.3% of Chinese netizens access the Internet via mobile phones; the timeliness of online information and the portability of mobile phones greatly increase the efficiency of information propagation. However, with the rapid development of social media, the Internet has also provided fertile ground for the growth and spread of rumors.
Once widely spread, rumors cause losses to individuals and enterprises and damage the social trust system. Rumor-related research therefore has strong practical significance. At present, most research on rumors focuses on rumor characteristics, rumor propagation patterns, rumor detection and the like; research on automatically generating rumor-refuting texts is very limited.
Most existing rumor-refuting work is done manually. Manual refutation is laborious, consumes substantial human and material resources, and suffers from a timeliness lag. Automatic rumor-refuting text generation can greatly reduce the labor and time costs of refutation work, and is a key technology for suppressing rumor propagation in time and effectively reducing the social harm of rumors.
Rumor-refuting text generation can be regarded as a subtask of natural language generation, but unlike other text generation tasks, it cannot simply be treated as an end-to-end generation task: the information carried by a rumor itself is often insufficient to support generation of the refuting text. Even manual refutation requires external information such as common sense or domain expertise.
Among methods for introducing external knowledge, knowledge graphs are the most common. The basic unit of a knowledge graph is the entity-relation-entity triple. In the rumor-refuting text generation task, given a knowledge graph, chains of knowledge triples related to the rumor can be obtained and used to support text generation.
Rumor-refuting text generation mainly faces the following difficulties:
(1) It depends heavily on external knowledge: the rumor's own information cannot support generation of the refuting text, and even manual refutation relies on external information such as common sense or professional knowledge.
(2) Long texts are hard to generate: existing text generation methods only work well for open-ended generation, partly thanks to the randomness of the decoding search algorithm, whereas a rumor-refuting text must refute with logic, so generating long refuting texts is difficult.
Disclosure of Invention
The invention aims to provide a two-step rumor-refuting text generation method based on knowledge constraints, addressing the problems that rumor-refuting texts depend heavily on external knowledge and that long rumor-refuting texts are difficult to generate. The method increases the persuasiveness of the refuting text and supports generation of a complete refuting text, so that the generated text is more logically coherent.
The invention comprises the following steps:
1) Process the data to obtain knowledge triples;
2) Feed the knowledge triple sequence into the knowledge text generation model to obtain a knowledge text sequence;
3) Feed the knowledge text sequence and the rumor text sequence into the rumor-refuting conclusion generation model to obtain a rumor-refuting conclusion text sequence;
4) Feed the rumor text sequence and the knowledge triples into a rumor-refuting conclusion generation model to obtain a rumor-refuting text generated in one step (for comparison);
5) Use the knowledge text sequence and the rumor-refuting conclusion text sequence together as the rumor-refuting text generated in two steps.
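The five steps above can be sketched as a small pipeline. This is a hypothetical illustration only: the stub functions stand in for the trained models (dependency-based triple extraction, the Transformer-decoder knowledge text model, and the GPT2-ML conclusion model), and all strings are invented examples.

```python
def extract_triples(rumor):
    # Step 1 stand-in: real triples come from syntactic parsing of
    # collected knowledge, not from this fixed example.
    return [("garlic", "cannot cure", "colds")]

def generate_knowledge_text(triples):
    # Step 2 stand-in for the Transformer-decoder knowledge text model:
    # verbalize the triples into a knowledge text sequence.
    return "; ".join(" ".join(t) for t in triples) + "."

def generate_conclusion(rumor, knowledge_text):
    # Step 3 stand-in for the GPT2-ML conclusion model, conditioned on
    # both the rumor (rumor constraint) and the knowledge text
    # (knowledge constraint).
    return "Therefore the claim '" + rumor + "' is false."

def two_step_refute(rumor):
    # Step 5: knowledge text + conclusion jointly form the refuting text.
    triples = extract_triples(rumor)
    knowledge_text = generate_knowledge_text(triples)
    conclusion = generate_conclusion(rumor, knowledge_text)
    return knowledge_text + " " + conclusion

refutation = two_step_refute("Garlic cures colds")
```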
In step 1), the data is processed to obtain knowledge triples. The open-source LTP model from Harbin Institute of Technology can be used to obtain a syntactic parse tree, after which the triples are extracted with the predicate as the center.
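A minimal sketch of predicate-centered triple extraction, assuming the dependency parse is already given (in practice it would come from LTP). The parse is represented as (token, head index, relation) tuples using LTP's label names: HED for the sentence head, SBV for subject-verb, VOB for verb-object; the example sentence is invented.

```python
def extract_triple(parse):
    """Extract one (head entity, predicate, tail entity) triple,
    centered on the sentence's main predicate."""
    subj = pred = obj = None
    pred_idx = -1
    for i, (token, head, rel) in enumerate(parse):
        if rel == "HED":            # head of the sentence: the predicate
            pred, pred_idx = token, i
    for token, head, rel in parse:
        if rel == "SBV" and head == pred_idx:
            subj = token            # subject attached to the predicate
        if rel == "VOB" and head == pred_idx:
            obj = token             # object attached to the predicate
    return (subj, pred, obj)

# "Garlic cures colds": (token, index of its head, dependency relation)
parse = [("garlic", 1, "SBV"), ("cures", -1, "HED"), ("colds", 1, "VOB")]
triple = extract_triple(parse)
```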
In step 2), the knowledge text generation model adopts a multi-layer Transformer decoder architecture: starting from the Transformer, the encoder-decoder attention layer is removed and the attention mechanism within the decoder is modified.
In step 3), the rumor-refuting conclusion generation model adopts a PyTorch version of the GPT2-ML architecture trained on a 15 GB Chinese corpus.
In step 4), the one-step generation is based on an end-to-end model: a rumor-refuting conclusion model with the PyTorch GPT2-ML architecture trained on a 15 GB Chinese corpus takes the rumor text sequence and the knowledge triples as input, and outputs a knowledge text sequence and a rumor-refuting conclusion text sequence, which together serve as the one-step generated rumor-refuting text.
In step 5), the knowledge text sequence is produced by the knowledge text generation model based on the multi-layer Transformer decoder architecture, the rumor-refuting conclusion text sequence is produced by the rumor-refuting conclusion model with the PyTorch GPT2-ML architecture trained on a 15 GB Chinese corpus, and the two together serve as the rumor-refuting text generated in two steps.
The method takes rumors as the research object. A knowledge text generation model is built on a multi-layer Transformer decoder architecture and generates a knowledge text sequence from knowledge triples; a rumor-refuting conclusion generation model is built on a PyTorch version of GPT2-ML, with rumor constraints and knowledge constraints introduced to generate the rumor-refuting conclusion; the generated knowledge text sequence and conclusion together form the rumor-refuting text. The first step, generating a knowledge text sequence from the knowledge triples, both increases the persuasiveness of the refuting text and supports generation of a complete refuting text; the second step introduces rumor and knowledge constraints to generate the conclusion part; the texts from the two steps together constitute the rumor-refuting text. The two-step method performs markedly better than other generation methods: by introducing external knowledge it alleviates the difficulty of generating long refuting texts and makes the generated texts more logically coherent.
Drawings
Fig. 1 is the overall structure diagram of the present invention.
Fig. 2 is a structure diagram of the knowledge text generation model of the present invention.
Fig. 3 is a masking mechanism diagram of the knowledge text generation model.
Fig. 4 is a structure diagram of the two-step rumor-refuting text generation.
Fig. 5 is a structure diagram of the GPT2-ML model. The boxed section is the Transformer module structure; 48x denotes a total of 48 layers.
Fig. 6 is a structure diagram of the pre-trained GPT2-ML model.
Fig. 7 shows the experimental results of fine-tuning the knowledge text generation model. Curve a is the result of the knowledge text generation model KTG; curve b is KTG-Fine, i.e. the model after fine-tuning on the rumor-refuting knowledge text training set; the ordinate is the BLEU-1 score.
Fig. 8 shows the effect of training sets of different sizes on the rumor-refuting conclusion generation results.
Detailed Description
The invention will be further illustrated by the following examples in conjunction with the accompanying drawings.
The embodiment of the invention establishes two models, a knowledge text generation model and a rumor-refuting conclusion generation model, whose outputs together serve as the rumor-refuting text. Fig. 1 is the overall structure diagram of the invention. First, as shown in the left half of the figure, a knowledge text sequence is generated by the knowledge text generation model; second, as shown in the right half, a rumor-refuting conclusion text sequence is generated by the conclusion generation model; the texts generated in the two steps together constitute the rumor-refuting text.
1. Knowledge text generation model
The invention adopts a multi-layer Transformer decoder structure to generate knowledge text. It differs slightly from the original Transformer decoder: the encoder-decoder attention layer is removed, the attention mechanism within the decoder is changed, and the position embedding method is also modified to a certain extent. The overall structure of the model is shown in Fig. 2. The model is a knowledge text generation model based on a single Transformer decoder structure. "[BOS]" marks the beginning of a text sequence; "[SEP]" is a separator that divides the triple sequence from the knowledge text sequence during model training; "[EOS]" marks the end of a sentence. The middle part is the Transformer module used in the invention; 10x denotes 10 stacked Transformer layers. Linear & Softmax denotes the output layer.
The original Transformer model consists of an encoder and a decoder. The overall structure of this model is similar to the Transformer's decoder module, except that the encoder-decoder attention mechanism of the original paper is omitted.
In the knowledge text generation model, one or more knowledge triples G = ((s1, r1, o1), (s2, r2, o2), …, (sm, rm, om)) are given, where m is the number of triples, s is the head entity of a triple, r is the relation between the entities, and o is the tail entity. From these knowledge triples, a corresponding knowledge text sequence Y = (y1, y2, y3, …, ym) is generated, where m denotes the length of the output sequence and each y belongs to the vocabulary C. During the training phase, the greedy search algorithm selects the candidate sequence with the highest probability, and the loss against the real text sequence Y = (y1, y2, y3, …, ym) is computed; the loss function is usually cross entropy. The objective function of the model can be expressed as argmax P(Y|X; θ), where θ denotes the model parameters, and P(Y|X) can be expanded as:
P(Y|X) = ∏ (t=1 to m) P(yt | y1, …, yt-1, X; θ)
P(Y|X) is the joint probability distribution of the sequence y1, y2, …, ym given X. Decoding is usually performed step by step, i.e. each token is generated from left to right, so that at decoding step t the token to be generated depends not only on the model parameters θ but also on all tokens generated before step t; this is how a continuously readable sequence is produced.
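The left-to-right decoding described above can be sketched with greedy search, the strategy named for the training phase: at each step the highest-probability token is appended, so each token depends on everything generated before it. `toy_model` below is an invented stand-in for P(yt | y<t, X), not the actual trained model.

```python
def toy_model(prefix):
    # Invented deterministic distribution over a 3-token vocabulary
    # {0: "[EOS]", 1, 2}; a real model would condition on the prefix
    # through a Transformer decoder.
    if len(prefix) < 2:
        return [0.1, 0.6, 0.3]      # prefer token 1 early on
    return [0.8, 0.1, 0.1]          # then prefer "[EOS]"

def greedy_decode(max_len=10, eos=0):
    seq = []
    for _ in range(max_len):
        probs = toy_model(seq)
        nxt = max(range(len(probs)), key=probs.__getitem__)
        if nxt == eos:              # stop at the end symbol
            break
        seq.append(nxt)
    return seq

seq = greedy_decode()
```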
Because a single-decoder model is used, the attention mechanism needs a certain modification. The attention mechanism in the original decoder is unidirectional; however, to preserve the integrity of sentence semantics, attention over the input should be bidirectional, while attention over the target-side output should remain unidirectional. In model design these behaviors are usually implemented through masking, so this model makes small changes to the masking scheme: attention over the input-side data is bidirectional, and attention over the output-side data is unidirectional. In the prediction stage, the influence of the mask is not considered. The masking scheme is shown in Fig. 3.
Fig. 3 is the masking mechanism diagram of the knowledge text generation model. In the left half of Fig. 3, Src denotes the source input, here the knowledge triple sequence, and Trg denotes the target output, here the knowledge text sequence to be generated; dark cells denote masked (invisible) positions and bright cells denote visible ones, so through the mask the input information is visible bidirectionally and the output information is visible unidirectionally. The right half of Fig. 3 shows the correspondence between input and output, where "[BOS]" is the special start-of-sequence symbol, "[SEP]" the sequence separator, and "[EOS]" the end-of-sequence symbol; the dashed line indicates the sequence content visible during target-side generation, i.e. only the tokens preceding the current token can be seen. The model thus starts generating the target sequence from "[SEP]" until the end symbol "[EOS]" appears. With this mask, the model can be computed in parallel during training.
Given an input sequence X = (x1, x2, …, xn), Q, K, and V are obtained after matrix transformations, representing the query, key, and value respectively. In the self-attention mechanism the query comes from the sequence itself, while in other attention mechanisms the query comes from outside. The formula of the self-attention mechanism is as follows:
Attention(Q, K, V) = softmax(QK^T / √dk) V
Here dk is an artificially specified hyperparameter, the dimension of the key vectors: the longer the vectors, the larger the computed dot products become, so dividing by √dk balances the magnitudes.
As the formula shows, the self-attention computation is parallel; but in natural language processing tasks, text is presented as a sequence, so directly applying self-attention would let the model see the information of the whole sequence at once. In fields such as text generation, models typically decode unidirectionally, step by step, which requires the self-attention mechanism to attend only to the already generated sequence. In practice, the self-attention mechanism is therefore used together with masks: through a mask, self-attention focuses only on part of the information, and by adjusting the given mask it can be adapted to different tasks.
The multi-head self-attention mechanism builds on self-attention: the attention operation is performed several times in parallel, and the resulting values are concatenated to obtain the attended sequence. The multi-head mechanism is used mainly because different heads can attend to different aspects of the sequence.
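A minimal pure-Python sketch of scaled dot-product self-attention, Attention(Q, K, V) = softmax(QK^T / √dk) V. Plain lists stand in for tensors, and the learned linear maps producing Q, K, V are omitted for brevity (Q = K = V = the input), so this illustrates the arithmetic only, not a trainable layer.

```python
import math

def softmax(xs):
    m = max(xs)                                    # for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d_k = len(K[0])
    out = []
    for q in Q:                                    # one output row per query
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]                      # q . k / sqrt(d_k)
        weights = softmax(scores)                  # attention distribution
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])    # weighted sum of values
    return out

# Self-attention over a toy 2-token sequence (linear maps omitted)
X = [[1.0, 0.0], [0.0, 1.0]]
Y = attention(X, X, X)
```

Because the value rows here are one-hot, each output row equals its attention weights; the first token attends more to itself than to the other token.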
For the input triples G = ((s1, r1, o1), (s2, r2, o2), …, (sm, rm, om)), the entities and relations within a triple are separated by spaces, the triples are separated by the special symbol "[SEP]", and the result is then tokenized. Let X = (x1, x2, …, xn) denote the sequence obtained from this processing and h0 denote the model input; h0 is computed as follows:
h0=xWe+pWp
Here We is the word-vector embedding matrix, Wp is the position embedding matrix, x denotes the input data, and p denotes the input position codes; since the self-attention mechanism operates in parallel and does not consider sentence order, p supplies the timing information of the input. After h0 is obtained, it is passed through the 10-layer Transformer stack:
hl = transformer_block(hl-1), l ∈ [1, L]
The Transformer module is denoted transformer_block here, yielding h10, where L is the number of Transformer layers. Every transformer_block has the same structure and processes the data in the same way; each module mainly comprises a multi-head self-attention layer, a layer normalization layer, and a feed-forward layer.
These Transformer modules can be understood as stacked together; the data passes through each layer in turn, and finally a fully connected layer and a Softmax normalization yield the probability distribution of the corresponding word over the vocabulary, denoted P(y), where L is the number of Transformer layers (here L = 10):
P(y) = softmax(h10 Wo)
In the model training stage, a method based on maximum likelihood estimation is adopted: the model is trained with the cross-entropy loss function combined with a corresponding optimization algorithm. The loss function is as follows:
Loss = -∑ (t=1 to m) log P(yt | y1, …, yt-1)
2. Rumor-refuting conclusion generation model
The task of the rumor-refuting conclusion generation model is to generate the rumor-refuting text sequence given the rumor text and the corresponding external knowledge triples. The rumor-refuting text here can be subdivided into two parts: the knowledge text, which increases the persuasiveness of the refuting text, and the rumor-refuting conclusion, which mainly addresses the content of the rumor itself.
The rumor text sequence is defined as X = (x1, x2, x3, …, xn), where n is the length of the rumor sentence and each x belongs to the vocabulary C. The knowledge triples are defined as in the knowledge text generation part, G = ((s1, r1, o1), (s2, r2, o2), …, (sm, rm, om)), where m is the number of triples, s is the head entity, r is the relation between the entities, and o is the tail entity; each rumor text corresponds to at least one set of knowledge triples.
The rumor-refuting text sequence is defined as Y = (y1, y2, y3, …, yn), where n is the length of the rumor-refuting text and each y belongs to the vocabulary C. It consists of two parts: the external knowledge text, defined here as Y1 = (y1, y2, y3, …, ym), where m is the length of the knowledge text, and the rumor-refuting conclusion part, defined here as Y2 = (ym+1, ym+2, ym+3, …, yn). All sequences share one vocabulary.
The task of the rumor-refuting conclusion generation section can be expressed as generating the rumor-refuting text Y given the rumor X and the knowledge triples G. The overall framework of the model is shown in Fig. 4: the input data is the rumor and the knowledge triples, and the output is the rumor-refuting text sequence Y = (y1, y2, y3, …, yn).
The first step of the model generates the external knowledge text, i.e. the model in the dashed box in Fig. 4; this is the model trained in the knowledge text generation part, and during the subsequent training its parameters are frozen, only the parameters of the rumor-refuting conclusion generation model being updated. The generated external knowledge text serves as the basis of the refutation, enhancing the persuasiveness of the refuting text, and also supports the conclusion generation in the second step.
The second step of the model generates the rumor-refuting conclusion: the conclusion model takes the knowledge text generated in the first step and the rumor text sequence as input, and outputs the rumor-refuting conclusion text sequence. During this second step of training, the model computes loss only over the conclusion sequence {ym+1, ym+2, …, yn} and updates only the parameters of the conclusion model.
Similar to the teacher forcing idea, the conclusion generation model is fed the true knowledge text during the training phase, while during the prediction phase it is fed the knowledge text generated from the knowledge triples. The main reason for this is to prevent errors in the generated knowledge text from accumulating further in the conclusion generation model.
The amount of rumor-refuting text data is small, while text generation, unlike other natural language processing tasks, usually requires a large amount of data to train a model that produces highly readable text. BERT is a classical pre-trained language model that performs well on tasks such as text understanding, but BERT is bidirectional in pre-training and does not match the text generation task, while models based on the encoder-decoder framework must relearn the decoder's attention to the encoder side, which is hard to do well with too little training data. Experiments show that a general encoder-decoder framework with BERT-like models can perform text generation, but the required amount of text data is still far larger than the rumor-refuting text data available in this invention.
Since rumor-refuting text data is very limited, text generation models generally need large amounts of data to produce readable text, and the specificity of rumor-refuting data makes other text data unsuitable for auxiliary training, the conclusion generation part of the invention uses a pre-trained language model: the rumor-refuting text generation task is built on pre-trained GPT-2, and the rumor-refuting conclusion is generated by training and updating part of its parameters on this task.
The GPT-2 language model is a unidirectional autoregressive model, which matches the text generation task; experiments have shown that fine-tuning with only 3000 examples can achieve good results on article summarization. However, the authors of the GPT-2 paper did not release a pre-trained Chinese model, so GPT2-ML is adopted here; its model structure is shown in Fig. 5. The model trained on a 15 GB Chinese corpus is selected, and the rumor-refuting conclusion is generated on top of it in the experiments of the invention. In addition, GPT2-ML only provides a pre-trained TensorFlow model, which is converted into a PyTorch model for use.
During model training, most parameters of the model are frozen: only the parameters of the text-and-position encoding layer and the final fully connected layer (the green part in the figure) are updated, while the other parameters remain fixed; this fine-tuned model generates the rumor-refuting conclusion part. In addition, since the GPT-2 paper does not provide an open-source Chinese tokenizer, the tokenizer open-sourced by BERT is used here, while the vocabulary is the one supplied with the pre-trained model, consistent with the original GPT-2 paper.
The data first enters the text-and-position encoding layer to obtain the vector h0, computed as follows:
h0=xWc+pWp+qWq
where Wc, Wp, and Wq denote the word-vector embedding matrix, the position embedding matrix, and the segment embedding matrix respectively; the segment embedding can be understood as identifying the input and output sides. x is the input data; p is the position code; q marks the type of each input token, here 0 for the rumor text sequence and the knowledge text sequence and 1 for the rumor-refuting conclusion text sequence. Since the GPT2-ML language model computes the source and target information in parallel, q is needed to distinguish source-side from target-side input. It should be noted that Wc and Wq share one matrix: the open-source code of the pre-trained model is written this way, and staying consistent with it avoids disrupting the overall effect of the model.
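The encoding h0 = xWc + pWp + qWq can be illustrated position by position: each token sums a word embedding, a position embedding, and a segment embedding, with q = 0 for the source side (rumor + knowledge text) and q = 1 for the conclusion. The tiny 2-dimensional embedding tables below are invented for illustration, and row lookups replace the matrix products.

```python
def build_segment_ids(rumor_len, knowledge_len, conclusion_len):
    # 0 marks the source (rumor + knowledge text), 1 marks the target
    return [0] * (rumor_len + knowledge_len) + [1] * conclusion_len

def encode(token_ids, segment_ids, Wc, Wp, Wq):
    """h0[pos] = Wc[token] + Wp[pos] + Wq[segment], elementwise."""
    h0 = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        h0.append([c + p + q
                   for c, p, q in zip(Wc[tok], Wp[pos], Wq[seg])])
    return h0

# Invented toy embedding tables (2-dimensional vectors)
Wc = [[1.0, 0.0], [0.0, 1.0]]                # word vectors
Wp = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]    # position vectors
Wq = [[0.0, 0.0], [5.0, 5.0]]                # segment vectors

segment_ids = build_segment_ids(rumor_len=1, knowledge_len=1, conclusion_len=1)
h0 = encode([0, 1, 0], segment_ids, Wc, Wp, Wq)
```

The large segment vector for q = 1 makes it visible that the last position is marked as target-side.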
After the input hidden vector h0 is obtained, it is passed through the 48 Transformer layers in turn; let h48 be the vector obtained after all Transformer modules. The probability distribution over the vocabulary is then obtained through a fully connected layer and softmax, denoted P(Y), with the following formula:
P(Y)=softmax(h48Wo)
where the dimension of Wo matches that of the text embedding layer, so that a probability distribution of the output over the vocabulary is obtained. In the training stage, the model is trained with the maximum likelihood method; the loss function is the cross-entropy loss, with the following formula:
Loss = -∑ (t=m+1 to n) log P(yt | y1, …, yt-1)
The model runs in parallel during the training stage, and its input and output lengths are identical. When computing the loss, the segment identifier q distinguishes the source side from the target side, so that only the loss of the target sentence, i.e. of the sequence {ym+1, ym+2, …, yn}, is computed.
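Using the segment identifier to restrict the loss to target-side tokens can be sketched as follows. The per-step distributions are invented toy values; a real model would produce them with softmax over the vocabulary.

```python
import math

def masked_loss(step_probs, gold_ids, segment_ids):
    """Cross-entropy summed only over positions whose segment id is 1
    (the target/conclusion side), skipping source positions."""
    loss = 0.0
    for probs, gold, seg in zip(step_probs, gold_ids, segment_ids):
        if seg == 1:                      # only target-side tokens count
            loss += -math.log(probs[gold])
    return loss

# Three positions over a 2-token vocabulary; only the last is target-side
step_probs  = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
gold_ids    = [0, 1, 1]
segment_ids = [0, 0, 1]
loss = masked_loss(step_probs, gold_ids, segment_ids)
```

Only -log(0.5) from the final position contributes; the two source positions are ignored.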
To verify the effectiveness of two-step rumor-refuting text generation, the invention also tries to generate rumor-refuting text with a one-step method: the rumor and the knowledge triples are input into a model, which then generates the rumor-refuting text. The one-step generation is based on an end-to-end model in which the input data is decoded into output data; given that the training set is too small, the training is performed on a pre-trained GPT2-ML model, whose structure is shown in Fig. 6.
"[SEP]" is the text separator; the input of the model is the rumor text sequence and the knowledge triple sequence, and the output is the complete rumor-refuting text sequence. The fine-tuning scheme is kept identical to the two-step case: only the text-and-position encoding layer and the final fully connected layer of the model are updated, and the data separation and processing methods are consistent with the two-step generation model. During training, the model is trained with maximum likelihood estimation combined with a corresponding optimization algorithm, and the loss function is computed over the sequence {y1, y2, …, yn}:
Loss = -∑ (t=1 to n) log P(yt | y1, …, yt-1)
The decoding strategy used by the GPT-2 pre-trained model is top-k sampling, and all experiments of the invention keep this strategy. The decoding process can be expressed as follows: given the model and a sequence y1, y2, …, ym of m tokens, the decoding task is to generate the next n consecutive tokens, forming y1, y2, …, ym+n (here m and n differ from their meanings in the model description and do not denote the rumor-refuting text length; they are used only for notational convenience). The probability distribution of the final output is denoted P(y1:m+n); since the decoder decodes step by step, the distribution decomposes as:
P(y1:m+n) = ∏ (i=m+1 to m+n) P(yi | y1:i-1)
During model decoding, each time step outputs a probability distribution; the distribution output at time step i is denoted P(y_i | y_{1:i-1}) and, being obtained by a Softmax normalization, sums to 1. A hyperparameter k is given manually, and the k tokens with the largest probability in the distribution output at that moment are selected to form a vocabulary V(k); that is, V(k) is the set of k tokens maximizing

∑_{y ∈ V(k)} P(y | y_{1:i-1})
Then the probability of every token not in the vocabulary V(k) is set to 0, and renormalization yields the new probability distribution P'(y | y_{1:i-1}); the specific formula is

P'(y | y_{1:i-1}) = P(y | y_{1:i-1}) / ∑_{y' ∈ V(k)} P(y' | y_{1:i-1})  if y ∈ V(k),  and 0 otherwise.
After the new distribution at time step i is obtained, one token is randomly selected from it as the output at that step. The selection is not uniform over the vocabulary V(k) but follows the distribution: tokens with larger probability are more likely to be selected, and tokens with probability 0 are never selected.
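The top-k filtering, renormalization, and probability-weighted sampling described above can be sketched as follows; this is an illustrative pure-Python sketch, and the token names and probabilities are hypothetical:

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable tokens, zero out the rest,
    and renormalize so the kept probabilities sum to 1."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    z = sum(kept.values())
    return {tok: (p / z if tok in kept else 0.0) for tok, p in probs.items()}

def sample(probs, rng=random):
    """Draw one token according to the (filtered) distribution;
    zero-probability tokens can never be drawn."""
    r, acc = rng.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r < acc:
            return tok
    return tok  # numerical safety for rounding at the tail

dist = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = top_k_filter(dist, k=2)   # only "a" and "b" survive
```

With k = 2, "a" and "b" are renormalized to 0.625 and 0.375, and `sample(filtered)` can only ever return one of them.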
During decoding, the output of each step is selected in this way until the separator [SEP] is generated, at which point decoding stops. [SEP] rather than [EOS] is used as the stop symbol because the pre-trained model uses [SEP] to mark the end of a sentence, and keeping this convention preserves the model's effectiveness. If the stop symbol is never generated, decoding continues until the manually specified maximum length is reached.
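The overall decoding loop, stopping on the separator [SEP] or at the artificial maximum length, might look like this; the dummy next-token function only stands in for the fine-tuned model:

```python
def decode(next_token_fn, prompt, max_len=100, stop_token="[SEP]"):
    """Step-by-step decoding: keep appending sampled tokens until the
    separator is produced or the maximum length is reached."""
    seq = list(prompt)
    while len(seq) < max_len:
        tok = next_token_fn(seq)
        seq.append(tok)
        if tok == stop_token:
            break
    return seq

# Dummy stand-in for the model: emits "w" three times, then the separator.
def dummy_model(seq):
    return "w" if sum(t == "w" for t in seq) < 3 else "[SEP]"

out = decode(dummy_model, ["[BOS]"])
```

A model that never learns to emit the separator simply generates up to `max_len`, matching the behavior described above.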
The following specific examples contain two parts: a knowledge text generation part and a refutation conclusion generation part. The knowledge text generation part contains 3 embodiments: performance evaluation of the knowledge text generation model, fine-tuning experiments on the knowledge text model, and instance analysis of knowledge text generation. The refutation conclusion generation part contains 5 embodiments: refutation conclusion generation evaluation, fine-tuning method comparison, the influence of training data size on experimental results, two-step refutation text generation evaluation, and instance analysis.
1. Knowledge text generation experiments
The present invention constructs a corpus mapping knowledge triples to knowledge texts. Building the corpus requires extracting triples from sentences, which is mainly done with the open-source LTP model from the Harbin Institute of Technology. Specifically, a syntactic parse tree is first obtained with the LTP model, and triples are extracted with the predicate as the center; after the triples are obtained, the named entities in the sentence are identified with the LTP model, and extraction of triples related to those named entities is then attempted.
The knowledge text generation part of the invention uses two data sets in total: (1) the Chinese Wikipedia data set; (2) the refutation knowledge text data set. The purpose is to generate knowledge text and thereby lay the groundwork for complete refutation text generation.
Because the Chinese Wikipedia data set contains a lot of noise and mostly describes terms, the data is processed here as follows: the text is segmented according to the usual Chinese sentence-ending symbols, sentences containing special symbols are removed, and only sentences between 20 and 60 characters long are kept. Using the triple extraction tool, only sentences from which triples can be extracted are retained; sentences from which no triple can be extracted are discarded. This yields the Wikipedia data set used in the experiments; samples are shown in table 1.
Table 1 wikipedia dataset sample
As shown in Table 1, for a unified representation, the triples in the data set are regarded as knowledge triples and the corresponding texts as knowledge texts. It can be seen that for some texts the extracted triples work well, while for others they reflect the overall meaning of the sentence less completely. The Wikipedia data set details are shown in table 2:
table 2 wikipedia dataset detailed information
Quantity
Knowledge text average length: 36.5
Knowledge text maximum length: 60
Average number of triples: 1.48
Total number of samples: 474,789
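The sentence filtering described above (segmenting on Chinese sentence-ending symbols, removing sentences with special symbols, and keeping lengths between 20 and 60) can be sketched as follows; the concrete set of "special symbols" here is an assumption for illustration:

```python
import re

def clean_corpus(text, min_len=20, max_len=60):
    """Split raw text on common Chinese sentence-final punctuation,
    drop sentences containing special symbols, and keep only sentences
    whose length falls in [min_len, max_len] characters."""
    sentences = re.split(r"[。！？]", text)
    special = re.compile(r"[{}\[\]<>|#*@]")  # assumed symbol set
    return [s for s in sentences
            if s and not special.search(s) and min_len <= len(s) <= max_len]

# Toy corpus: a too-short sentence, a valid one, one with a special
# symbol, and one that is too long.
raw = "短句。" + "这" * 30 + "。" + "符#号" + "长" * 25 + "。" + "超" * 70 + "。"
kept = clean_corpus(raw)
```

Only the 30-character sentence survives; the retained sentences would then be passed to the triple extraction tool.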
The training set, validation set, and test set were divided at ratios of 80%, 10%, and 10%; the specific information is shown in table 3:
Table 3 wikipedia dataset partitioning
Quantity
Training set: 379,831
Validation set: 47,479
Test set: 47,479
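The 80%/10%/10% division can be sketched as follows; the random seed is an arbitrary choice for reproducibility, not specified in the experiments:

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle and partition samples into train/validation/test
    sets according to the given ratios."""
    rng = random.Random(seed)
    data = samples[:]
    rng.shuffle(data)
    n = len(data)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

train, val, test = split_dataset(list(range(1000)))
```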
The refutation knowledge text data set was obtained by manual labeling; the data source is the China Internet Joint Rumor-Refutation Platform. After the refutation-related knowledge texts were collected, the LTP tool was likewise used to extract triples, and texts from which the tool could not extract knowledge triples were labeled manually. 369 pieces of knowledge text data were finally obtained; the data information is shown in Table 4:
TABLE 4 refutation knowledge text data set
Quantity
Knowledge text average length: 29.8
Knowledge text maximum length: 54
Average number of triples: 1.41
Total number of samples: 369
Because the amount of refutation knowledge text data is so small, this embodiment of the invention uses all of it as a test set.
The comparative examples are all based on the Encoder-Decoder architecture. The maximum input length of the encoder is not fixed: the sequence length of each Batch is automatically adjusted to the maximum token length of the data in that Batch, and the decoder length is likewise adjusted per Batch during training. At prediction time the encoder is kept the same as during training, the maximum length of the decoder is set to 60, the decoding search algorithm is Beam Search, and the Beam Size is set to 10.
In all comparative examples, the text output by the decoder starts with "[BOS]" and ends with "[EOS]". All models are trained with maximum likelihood estimation using a cross-entropy loss function; no loss is calculated for the filler token "[PAD]", and the loss is computed only over the knowledge text.
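Skipping the loss on "[PAD]" positions can be illustrated with a small sketch; the padding id and the toy distributions are hypothetical:

```python
import math

PAD = 0  # hypothetical id of the "[PAD]" filler token

def masked_nll(step_probs, targets, pad_id=PAD):
    """Cross-entropy loss that skips every position whose target is the
    padding token, averaging over the remaining positions only."""
    losses = [-math.log(p[t]) for p, t in zip(step_probs, targets)
              if t != pad_id]
    return sum(losses) / len(losses)

probs = [
    [0.1, 0.6, 0.3],    # target token 1
    [0.2, 0.2, 0.6],    # target token 2
    [0.9, 0.05, 0.05],  # target PAD -> contributes no loss
]
loss = masked_nll(probs, [1, 2, PAD])
```

In a framework such as PyTorch the same effect is typically achieved by passing the padding id as `ignore_index` to the cross-entropy loss.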
The operating environment of the examples is shown in table 5:
TABLE 5 running Environment
Model training environment
Operating system: ubuntu 16.04
CUDA:NVIDIA CUDA Toolkit 10.1
Programming language: python 3.6
Deep learning framework: pytorch 0.4.4
GPU:NVIDIA GTX 1080x
RAM:32G
The single decoder model parameter settings used are shown in table 6:
TABLE 6 model training related parameter settings
Training hyper-parameters
Vocabulary size: 25002
Maximum length of position embedding: 1024
Maximum length of Mask: 1024
Dimension size: 768
Transformer layer number: 10
Number of attention heads per layer: 12
Activation function: GeLU
Dropout:0.1
Batch Size:32
Epoch:30
Optimization function: adam (Adam)
Initial learning rate: 0.00015
Generating the maximum length of sentences: 100
Beam Size:10
Example 1: knowledge text generation model performance evaluation
The knowledge text generation part compares different text generation models using multiple evaluation indexes; the results are shown in table 7, where the bolded results are the best for each index. These indexes are calculated mainly on an n-gram basis, so different word segmentation modes can make the index values differ slightly. All examples and comparative examples therefore use the same word segmentation tool as used in model training, and a unified evaluation tool is then used to calculate the scores.
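As an illustration of the n-gram basis of these indexes, the core of BLEU-1 (clipped unigram precision, with the brevity penalty omitted for brevity of the sketch) can be written as:

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Clipped unigram precision: each candidate token counts at most
    as many times as it appears in the reference."""
    cand, ref = Counter(candidate), Counter(reference)
    clipped = sum(min(c, ref[tok]) for tok, c in cand.items())
    return clipped / max(1, sum(cand.values()))

cand = ["the", "cat", "sat", "sat"]
ref = ["the", "cat", "sat", "here"]
p1 = unigram_precision(cand, ref)  # 3 of 4 candidate tokens match
```

Because the candidate and reference are token lists, a different tokenizer changes the counts, which is exactly why a shared word segmentation tool is required for fair comparison.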
Table 7 test results of different models on wiki dataset
BLEU-1 BLEU-2 BLEU-3 BLEU-4 ROUGE-L
Seq2seq 19.25 8.02 4.42 2.58 19.56
Seq2seq-Att 37.18 27.82 20.96 16.05 41.47
Transformer 41.67 33.04 26.25 20.12 48.64
KTG 42.16 33.04 27.01 22.14 49.48
As can be seen from table 7, after the attention mechanism is added to Seq2Seq the generation effect improves markedly. This is because the average length of the knowledge text is long, and a recurrent neural network easily loses information when generating long sentences; adding attention alleviates this information loss to a certain extent. The Transformer is structurally better suited to text generation tasks than the recurrent neural network. KTG is the knowledge text generation model used by the present invention, i.e., the single-decoder model with the improved mask; it outperforms the classical encoder-decoder methods on all indexes, and the experiments prove that the knowledge text generation model is effective.
Because the model is ultimately intended for the refutation text generation task, experiments were also carried out on knowledge text generation within the refutation data set. All knowledge texts in the refutation texts were used as the test set; the model was not trained on the refutation knowledge text data. The knowledge triples of the refutation knowledge text data set were directly input into the model to generate knowledge texts, with the real knowledge texts as reference texts. The results are shown in table 8.
TABLE 8 text generation results of different models on the refutation knowledge text data set
BLEU-1 BLEU-2 BLEU-3 BLEU-4 ROUGE-L
Seq2seq 18.88 9.15 4.77 2.33 18.04
Seq2seq-Att 40.35 32.22 25.95 20.84 45.84
Transformer 47.37 38.27 30.94 24.75 48.03
KTG 47.67 39.75 33.10 27.24 50.25
Comparing the performance of the methods in tables 7 and 8 on the different data sets, the models perform similarly on both test sets, indicating that the Wikipedia data is suitable for training refutation knowledge text generation. This is probably because the Wikipedia data set itself is a knowledge text data set built on factual knowledge and is similar to the refutation knowledge texts in both content and form. It also shows that the generalization capability of the model is high.
Compared with the other classical encoder-decoder models, the KTG model performs at least as well on the refutation knowledge text data set, proving the effectiveness of the KTG model in generating refutation knowledge text.
Example 2: knowledge text model fine-tuning experiments
The purpose of the fine-tuning experiment is to explore how to better generate refutation knowledge text. For this purpose, the refutation knowledge text data was re-divided at a ratio of 90% to 10%, without a validation set, as shown in table 9:
Table 9 refutation knowledge text data set partitioning
Quantity
Training set: 332
Test set: 37
Training was continued with the refutation knowledge text data on the basis of the model trained on the Wikipedia data; that is, the Wikipedia-trained model is regarded as a pre-trained model and then fine-tuned on the refutation knowledge text data set. Because the refutation knowledge text data is so scarce, the fine-tuning mode is to update only the last fully connected layer of the model. The fine-tuning results are shown in table 10:
Table 10 results of model fine-tuning on refutation knowledge text
BLEU-1 BLEU-2 BLEU-3 BLEU-4 ROUGE-L
KTG 49.50 40.70 33.59 28.00 52.36
KTG-Fine 50.35 41.65 34.72 29.12 53.10
In Table 10, KTG and KTG-Fine both denote the single-decoder model used in the knowledge text generation part of the present invention. KTG specifically refers to the experimental results on the refutation knowledge text test set after training the model on the Wikipedia data set; KTG-Fine denotes the results after fine-tuning the model on the refutation knowledge text training set, taking the best overall index results.
The index values of KTG in tables 8 and 10 differ slightly because table 8 reports results on all of the refutation knowledge text data, and the more data is tested, the more representative the result is of the model's effect. Table 10 explores the effect of model fine-tuning, so the refutation knowledge texts are divided into training and test sets to illustrate the impact of fine-tuning on the experimental results.
As can be seen from table 10, the experimental results after fine-tuning are better than before, which shows that fine-tuning the Wikipedia-trained model with refutation knowledge text data lets the model learn information specific to refutation knowledge texts and improves the method's effect.
To perceive the influence of fine-tuning on the experimental results intuitively, BLEU-1 is selected as the index for evaluating the model's effect; the result is shown in FIG. 7. BLEU-1 reflects the accuracy of the generated text, and fine-tuning lets the model perform better on refutation knowledge text generation. It can also be observed that toward the end of training the model's effect drops significantly, possibly because the amount of refutation knowledge text data is too small, and prolonged training on a small data set can damage the model's overall text generation capability.
Example 3: knowledge text generation instance analysis
Instance analyses were performed here on the Wikipedia knowledge texts and the refutation knowledge texts, respectively; the Wikipedia samples are shown in table 11:
table 11 wiki knowledge text generation sample
As can be seen from table 11, the Seq2Seq model tends to ignore the source input, probably because the recurrent neural network loses a relatively large amount of information; Seq2Seq-Att is much better than the model without attention and clearly captures the source input; the Transformer and KTG are clearly superior to the other models: the generated sentences are highly readable, do not deviate from the knowledge triples, and are essentially generated around them.
Two samples were likewise taken for analysis from the refutation knowledge text generation results; the sample information is shown in table 12:
Table 12 refutation knowledge text generation samples
For sample 1 in table 12, although the text generated by KTG differs somewhat from the real reference text, the generated sentence is still reasonable. For sample 2, the word "eyes" does not appear in the knowledge text, yet both the Transformer and KTG generate the same sentence as the real text, indicating that the model has some memory for conventional sentences. These examples all illustrate that the KTG-generated sentences are highly readable, do not deviate substantially from the knowledge triples, and show some logical coherence.
2. Refutation conclusion generation experiments
Based on the data labeling specification of the previous experiments, 369 pieces of refutation text data were labeled from the China Internet Joint Rumor-Refutation Platform. Each group of data comprises a rumor text, a knowledge text, and a refutation conclusion; in addition, knowledge triples were extracted from the knowledge text by the automatic extraction method, with all other parts obtained by manual labeling. For knowledge texts from which knowledge triples could not be extracted automatically, the corresponding triples were obtained by manual labeling. This yields the refutation text data set, whose details are shown in table 13:
Table 13 refutation text data set detailed information
Refutation text data set
Average rumor length: 12.6
Maximum rumor length: 43
Knowledge text average length: 29.8
Knowledge text maximum length: 54
Refutation conclusion average length: 15.7
Refutation conclusion maximum length: 36
Knowledge triple average number: 1.4
Total amount of data: 369
Because the amount of data is limited, this part of the experimental data was divided into training and test sets at a ratio of 90% to 10%, without a validation set; the specific information is shown in table 14.
TABLE 14 refutation data set sample numbers
Quantity
Training set: 332
Test set: 37
Data samples are shown in table 15:
Table 15 refutation text data sample
Because the GPT2-ML model has many parameters, a larger video memory is needed for the experiment, so the operating environment differs from the previous experiments; the specific environment is shown in table 16:
Table 16 experiment operating Environment
Model training environment
Operating system: ubuntu 16.04
CUDA:NVIDIA CUDA Toolkit 10.1
Programming language: python 3.6
Deep learning framework: pytorch 1.2
GPU:NVIDIA GTX Titan x 1
RAM:128G
In view of the small amount of experimental data, the parameter settings of the comparison experiments are kept small. In the comparison experiments, the hidden sizes of the long short-term memory (LSTM) units are all set to 128 and the word vector dimension is 128; the encoder input length is not limited, the maximum decoder length is set to 40, and at model prediction time the maximum length of all decoder models is set to 130.
The Transformer model has 6 encoder layers and 6 decoder layers, all self-attention mechanisms use 8 heads, the word vector dimension is 128, and the position encoding is consistent with the original paper. For fairness of the experiments, all experiments in this section share one vocabulary of size 8021 with consistent word segmentation and data processing; elements within a triple are separated by spaces, and triples are separated from each other by a special symbol.
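The triple serialization convention (spaces between elements, a special symbol between triples) can be sketched as follows; the separator token "&lt;T&gt;" is a hypothetical placeholder, not the actual symbol used:

```python
SEP_TRIPLE = "<T>"  # hypothetical special separator symbol

def serialize_triples(triples):
    """Join the elements of each knowledge triple with spaces and
    separate consecutive triples with a special symbol, matching the
    shared input encoding described in the text."""
    return f" {SEP_TRIPLE} ".join(" ".join(t) for t in triples)

s = serialize_triples([("地球", "是", "行星"), ("太阳", "是", "恒星")])
```

The resulting flat string can be concatenated with the rumor text sequence as model input.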
The two-step generation model is based on a pre-trained model; the GPT2-ML parameters and training settings are listed in table 17:
table 17 GPT2-ML model training related parameter settings
Training hyper-parameters
Vocabulary size: 8021
Maximum length of position embedding: 1024
Maximum length of Mask: 1024
Dimension size: 1536
Transformer layer number: 48
Number of attention heads per layer: 24
Activation function: GeLU
Dropout:0.1
Batch_size:8
Epoch:20
Optimization function: adam (Adam)
Initial learning rate: 0.00005
Maximum length of sentence at the time of generation: 150
Top-k:8
It should also be noted that the "maximum length of sentence at generation" of 150 is used mainly at prediction time: it refers not to the maximum length of the generated sentence alone, but to the constraint that the sum of the lengths of the rumor text sequence and the refutation text sequence must not exceed 150 during prediction. Top-k denotes the size of k in top-k random decoding.
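The total length budget at prediction time can be sketched as follows; the token ids are illustrative:

```python
def fit_length_budget(rumor_ids, generated_ids, budget=150):
    """Enforce the constraint that the rumor sequence plus the generated
    refutation sequence may not exceed the total length budget; once the
    budget is exhausted, generation must stop."""
    remaining = budget - len(rumor_ids)
    return generated_ids[:max(0, remaining)]

rumor = list(range(140))  # 140 prompt tokens
gen = list(range(30))     # 30 candidate continuation tokens
kept = fit_length_budget(rumor, gen)
```

Here only 10 of the 30 candidate tokens fit within the budget of 150.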
To ensure fairness of the experiments, all experiments in this section share one vocabulary. Because different word segmentations can change the evaluation results and the GPT-2 model has no open-source Chinese tokenizer, the word segmentation of all models adopts the open-source Bert Tokenizer, after which the results are computed with the evaluation tool.
Example 1: refutation conclusion generation evaluation
First, the results of refutation conclusion generation are evaluated. The reference text is the real refutation conclusion text sequence, and the input of the model is the rumor sequence and the knowledge text. This experimental part compares different text generation models using multiple evaluation indexes; the results are shown in table 18:
table 18 generation of text experiment results for different models on test set
BLEU-1 BLEU-2 BLEU-3 BLEU-4 ROUGE-L
Seq2seq 2.85 2.04e-8 3.97e-11 1.76e-11 3.97
Seq2seq-Att 2.85 2.04e-8 3.97e-11 1.76e-11 3.97
Transformer 2.85 2.04e-8 3.97e-11 1.76e-11 3.97
GPT2-ML 30.58 22.17 15.26 11.10 33.99
In the comparison experiments, some index values are identical because those models all finally converged to generating only periods and never produced an end symbol; since the maximum generation length of the refutation conclusion text was set to the same value for all comparison models, the calculated indexes coincide.
The reason the comparison models generate only periods is that the training data is too scarce; unlike models for tasks such as text classification, a text generation model usually needs a large amount of data to learn the implicit relations among sentences.
The experimental results show that by fine-tuning a pre-trained language model, the model can reach a good effect with a very small amount of data, and the generation effect is far higher than that of directly training a conventional text generation model in terms of accuracy, recall, and other aspects.
The higher quality of the generated refutation conclusion text mainly benefits from the memory of the pre-trained model, which has learned the relations among sentences from a large amount of data; text with high readability can be generated without damaging the overall structure and parameters of the model, preserving its text generation capability.
Example 2: comparison of fine-tuning methods
The GPT-2 model can be regarded simply as consisting of three parts: a text and position encoding layer, a Transformer module layer, and a fully connected layer. To explore the influence of fine-tuning the parameters of different parts on the final generation effect, experiments were carried out with different fine-tuning modes; the results are shown in table 19. Note that this evaluates the refutation conclusion generation model of the second step, i.e., the generation effect of the refutation conclusion, with the true refutation conclusion text as reference.
TABLE 19 influence of different trimming modes on experimental results
BLEU-1 BLEU-2 BLEU-3 BLEU-4 ROUGE-L
Head-Embed-Tran 0.35 0.00 0.00 0.00 5.95
Head 32.77 24.60 18.03 13.58 31.16
Embed 14.15 8.98 5.99 4.15 19.07
Head-Embed 34.56 27.55 22.09 18.02 34.17
The Head-embedded-Tran is used for updating parameters of the model text and position coding layer, the transducer module layer and the full connection layer; the Head only carries out parameter updating on the last full connection layer of the model; the embedded is to update the parameters of the text and position coding layer of the model; head-embedded updates the text and position coding layer and the last full-join layer of the model.
The Head-Embed-Tran results show that fine-tuning all parameters of the model performs poorly: the accuracy is almost 0, and only the recall index is slightly better, because the model tends to generate single common words such as "will" and "will not". This may be because the model was pre-trained with a large amount of question-answer corpus, and the data format given during training resembles the question-answer form, so the model more easily produces similar corpus. The main reason for the low accuracy and relatively high recall is that the generated words are very common and therefore likely to appear in the reference text, so the recall score is higher than the other indexes.
Comparing the results of Embed, Head-Embed, and Head, updating only the text and position embedding layer performs much worse than updating the fully connected layer, probably because the last fully connected layer directly influences the output of the model. Fine-tuning the text and position embedding layer is still much better than fine-tuning the whole model, because adjusting the entire model with a small amount of data destroys its parameters.
Comparing Head-Embed and Head shows that fine-tuning only the last fully connected layer already achieves good results, because leaving the Transformer module layer untouched retains the core parameters of the model; for a specific task, adjusting the last fully connected layer lets the model adapt well to new data and tasks. Meanwhile, the text and position embedding layer helps the model adapt to new data better: the evaluation indexes show that updating both the embedding layer and the last fully connected layer is slightly better than updating only the last fully connected layer.
Based on these experimental results, the later one-step generation experiment also uses the GPT-2 parameter updating mode of updating the text and position embedding layer and the last fully connected layer.
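Selecting which parameter groups to update in each fine-tuning mode can be sketched as a simple name filter; the parameter names below are hypothetical, not the real GPT2-ML names:

```python
def trainable_params(param_names, patterns):
    """Select the parameters to update during fine-tuning; everything
    not matched stays frozen."""
    return [n for n in param_names if any(p in n for p in patterns)]

names = ["embed.tokens", "embed.positions",
         "transformer.layer0.attn", "transformer.layer0.ffn",
         "head.fc"]

# "Head": fine-tune only the final fully connected layer.
head_only = trainable_params(names, patterns=("head",))
# "Head-Embed": embedding layers plus the final layer.
head_embed = trainable_params(names, patterns=("head", "embed"))
```

In a deep learning framework the unmatched parameters would simply have gradient updates disabled.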
Example 3: influence of training data size on experimental results
Because the refutation text data is very scarce, training data of different sizes were used to further explore the impact of data volume on the experimental results. The refutation text training set was randomly sampled at different proportions, the sampled data was used as a new training set, and the model was then fine-tuned; the test set remained identical to that of the experiments in example 1 and example 2, without changes.
The experiments here concern the refutation conclusion generation model, so what is evaluated is the effect of generating refutation conclusion text, with the true refutation conclusion text as reference. The same indexes as in the previous experiments were used; the results are shown in fig. 8.
As can be seen from fig. 8, as the amount of training data increases, the effect of the model improves. Although fluctuations appear in the middle, the training set size can basically be considered positively correlated with the model effect, with a continuing upward trend; accordingly, it is not difficult to see that increasing the amount of refutation text would raise the overall effect of the refutation conclusion generation model.
The most probable cause of the intermediate fluctuation is that training on small data is unstable because the data volume is too small, and under small data even minor changes in the test set are reflected in the evaluation indexes.
Example 4: two-step refutation text generation evaluation
To verify the effectiveness of the two-step refutation text generation method, the experimental results of two-step generation and direct generation of refutation text are compared in table 20 below. The comparison concerns the generation effect of the complete refutation text, i.e., the knowledge text together with the refutation conclusion, with the actual refutation text as reference.
Table 20 experimental results of two-step and one-step generation
BLEU-1 BLEU-2 BLEU-3 BLEU-4 ROUGE-L
Transformer 1.76 1.12 0.00 0.00 7.63
GPT2-ML(One-Step) 25.46 16.36 10.70 7.62 22.57
GPT2-ML(Two-Step) 47.17 37.61 30.42 24.84 43.74
The Transformer here is a conventional encoder-decoder model trained with the refutation text data, taking the rumor text sequence and knowledge triples as input and the refutation text sequence as output. Only a Transformer is used for the encoder-decoder framework because the refutation text data is too scarce, and a typical text generation model requires a large amount of training data. The experimental results show that the Transformer performs poorly; the best obtained result is reported here, because the model sometimes fails to converge and generates randomly.
GPT2-ML (One-Step) denotes the one-step refutation text generation method shown in fig. 6: refutation text is generated directly on the basis of the pre-trained model, with the rumor text sequence and the knowledge triplet sequence as input and the refutation text as output. The GPT2-ML fine-tuning uses the mode that performed best in the fine-tuning method comparison experiment. The experimental results show that the pre-trained model is significantly better than the Transformer without pre-training.
GPT2-ML (Two-Step) denotes the two-step refutation text generation method; the experimental results show that its generated results are far better than the one-step results on all indexes.
Comparing the experimental results of GPT2-ML (One-Step) and GPT2-ML (Two-Step), the two-step refutation text generation is clearly better than direct generation, which may be due to the following aspects: (1) the Wikipedia data consists of texts describing terms and is similar to knowledge text in content and form, whereas GPT2-ML, although pre-trained with a large amount of data, was trained on diverse data such as daily dialogue; (2) for long text generation, current methods perform well only in domains with high randomness, and the two-step refutation text generation method can alleviate the difficulty of long text generation to a certain extent; (3) existing decoding methods all decode step by step, so for long texts the generation errors accumulate step by step, and the output may finally deviate from the constraint of the rumor.
Finally, the experimental results show that using the pre-trained model can obviously improve the effect of refutation text generation on a small data set, and all indexes are much higher than those of the text generation methods without a pre-trained language model.
Example 5: instance analysis
Two-step refutation text generation samples are shown in tables 21, 22, and 23 below:
Table 21 two-step generation refutation sample one
Table 22 two-step generation refutation sample two
Table 23 two-step generation refutation sample three
As can be seen from the examples, the Transformer generates poorly because the training data is too scarce: it may directly ignore the input information, often fails to converge, and may generate random text, only commas, or the end symbol directly; the result shown is the best convergence obtained.
GPT2-ML (One-Step) is the one-step generation method with the GPT2-ML pre-trained model, i.e., the rumor text sequence and the knowledge triplet sequence are input to generate the refutation text. As the examples show, the generated text is quite readable, but it is merely text related to the rumor rather than a logically coherent refutation.
GPT2-ML (Two-Step) is the two-step generation method proposed by the invention. As the examples show, the rebuttal texts generated in two steps address the rumor more substantively, rather than simply turning the rumor into a negative sentence.
As can be seen from the example in Table 21, "artificial culture" appears neither at the input end nor in the reference text, yet the generated rebuttal conclusion produces this phrase, and it is reasonable in context; this mainly benefits from the knowledge stored in the pre-trained model itself.
The experiments and results show that, compared with the prior art, the invention has the following advantages and effects:
(1) Knowledge text effectively improves the persuasiveness of rebuttal texts and enhances the generation of rebuttal conclusions; the rebuttal-conclusion generation model effectively captures both the rumor information and the knowledge-text information.
(2) Compared with one-step generation, the two-step rebuttal text generation proposed in the invention is less prone to deviating from the rumor text. The example analysis shows that one-step results are only related to some words of the rumor or the knowledge; although readable, they are not logically sound rebuttals, because decoding proceeds step by step and accumulated decoding errors easily cause the rebuttal text to drift from the constraint of the rumor.
(3) Splitting the rebuttal text generation task into two steps alleviates the difficulty of long-text generation to a certain extent; the experimental results show that the two-step method clearly outperforms the other models on every metric.
(4) The experiments in the invention learn on top of a pre-trained language model, so highly readable rebuttal texts can be generated even with few samples. With the rapid development of pre-trained language models, which possess a certain memorization capacity, this provides a new direction for few-shot learning.
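As an illustration only, the two-step pipeline summarized in these advantages can be sketched as below. `knowledge_model` and `conclusion_model` are hypothetical stand-ins for the Transformer-decoder knowledge model and the GPT2-ML conclusion model, not the patent's actual code:

```python
# Hedged sketch of the two-step pipeline: knowledge triples -> knowledge
# text (step 1), then rumor + knowledge text -> rebuttal conclusion
# (step 2); the final rebuttal text is the concatenation of the two.

def two_step_rebuttal(rumor_text, triples, knowledge_model, conclusion_model):
    # Step 1: serialize the knowledge triples and generate knowledge text.
    triple_seq = " ".join(f"{s} {p} {o}" for s, p, o in triples)
    knowledge_text = knowledge_model(triple_seq)
    # Step 2: condition on rumor + knowledge text to generate the conclusion.
    conclusion = conclusion_model(rumor_text, knowledge_text)
    # Two-step output: knowledge text and conclusion used together.
    return knowledge_text + " " + conclusion

# Dummy models standing in for the two generators (illustrative only).
km = lambda seq: f"[knowledge: {seq}]"
cm = lambda rumor, know: f"[conclusion refuting '{rumor}']"
out = two_step_rebuttal("5G spreads viruses",
                        [("5G", "is", "radio technology")], km, cm)
print(out)
```

Keeping the two model calls separate is what lets each sub-task work with a shorter output sequence, which is the source of the long-text advantage claimed in point (3).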

Claims (4)

1. A knowledge-constraint-based two-step rumor-rebuttal text generation method, characterized by comprising the following steps:
1) Processing data to obtain knowledge triples;
2) Feeding the knowledge triple sequence into a knowledge text generation model to obtain a knowledge text sequence; the knowledge text generation model adopts a multi-layer Transformer decoder architecture in which the encoder-decoder attention layer of the original Transformer is removed, and a mask makes the attention over the decoder model's input-side data bidirectional and the attention over the output-side data unidirectional; each Transformer module comprises a multi-head self-attention layer, a layer-normalization layer, and a feed-forward layer; data pass through the Transformer module of each layer in turn and finally through a fully connected layer and a Softmax normalization to obtain the probability distribution of the corresponding word over the vocabulary;
3) Feeding the knowledge text sequence and the rumor text sequence into a rebuttal-conclusion generation model to obtain a rebuttal-conclusion text sequence; the rebuttal-conclusion generation model adopts the PyTorch version of the GPT2-ML architecture trained on a 15 GB Chinese corpus;
4) Feeding the rumor text sequence and the knowledge triples into the rebuttal-conclusion generation model to obtain a one-step-generated rebuttal text;
5) Using the knowledge text sequence and the rebuttal-conclusion text sequence together as the two-step-generated rebuttal text.
2. The knowledge-constraint-based two-step rumor-rebuttal text generation method of claim 1, wherein in step 1) the data are processed to obtain knowledge triples extracted with the open-source LTP model from the Harbin Institute of Technology: the LTP model is first used to obtain a syntactic parse tree, and triples are then extracted with predicates as centers.
3. The knowledge-constraint-based two-step rumor-rebuttal text generation method of claim 1, wherein in step 4) the one-step generation is based on an end-to-end model adopting the PyTorch version of the GPT2-ML architecture trained on a 15 GB Chinese corpus; the rumor text sequence and the knowledge triples are input, and the output knowledge text sequence and rebuttal-conclusion text sequence are used together as the one-step-generated rebuttal text.
4. The knowledge-constraint-based two-step rumor-rebuttal text generation method of claim 1, wherein in step 5) the knowledge text sequence is obtained from the knowledge text generation model based on the multi-layer Transformer decoder architecture, the rebuttal-conclusion text sequence is obtained from the rebuttal-conclusion model of the PyTorch-version GPT2-ML architecture trained on a 15 GB Chinese corpus, and the two are used together as the two-step-generated rebuttal text.
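The prefix mask described in claim 1, step 2) (bidirectional attention over input-side data, unidirectional over output-side data) can be sketched as a minimal NumPy example. This is illustrative only; the function name and layout are assumptions, not the patent's implementation:

```python
# Sketch of a prefix-LM attention mask: positions in the input prefix
# attend to each other bidirectionally, while output positions attend
# only to the prefix and to earlier output positions (causal).
import numpy as np

def prefix_lm_mask(prefix_len, total_len):
    """Return a (total_len, total_len) mask; 1 = may attend, 0 = masked."""
    mask = np.tril(np.ones((total_len, total_len), dtype=int))  # causal base
    mask[:, :prefix_len] = 1  # the prefix is visible to every position
    return mask

m = prefix_lm_mask(prefix_len=2, total_len=4)
print(m)
# Row 0 (a prefix token) already sees both prefix positions;
# row 3 (an output token) sees the prefix plus output positions 2 and 3.
```

This single mask is what lets one decoder-only stack behave like an encoder over the input and a left-to-right generator over the output, matching the removal of the separate encoder-decoder attention layer.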
CN202110918103.6A 2021-08-11 2021-08-11 Knowledge constraint-based two-step refute a rumour text generation method Active CN113627146B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110918103.6A CN113627146B (en) 2021-08-11 2021-08-11 Knowledge constraint-based two-step refute a rumour text generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110918103.6A CN113627146B (en) 2021-08-11 2021-08-11 Knowledge constraint-based two-step refute a rumour text generation method

Publications (2)

Publication Number Publication Date
CN113627146A CN113627146A (en) 2021-11-09
CN113627146B true CN113627146B (en) 2024-05-28

Family

ID=78384349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110918103.6A Active CN113627146B (en) 2021-08-11 2021-08-11 Knowledge constraint-based two-step refute a rumour text generation method

Country Status (1)

Country Link
CN (1) CN113627146B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104376110A (en) * 2014-11-27 2015-02-25 武汉理工数字传播工程有限公司 Chinese knowledge inference method based on ontology inference
CN112084314A (en) * 2020-08-20 2020-12-15 电子科技大学 Knowledge-introducing generating type session system
CN112256861A (en) * 2020-09-07 2021-01-22 中国科学院信息工程研究所 Rumor detection method based on search engine return result and electronic device
CN113010693A (en) * 2021-04-09 2021-06-22 大连民族大学 Intelligent knowledge graph question-answering method fusing pointer to generate network
CN113111188A (en) * 2021-04-14 2021-07-13 清华大学 Text generation method and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on network public opinion rumor judgment based on RNN; Wang Miao, Guo Yangming, Chen Zelin, Zhong Linlong; Computer Knowledge and Technology (24); full text *
Early rumor detection based on a deep bidirectional Transformer encoder; Ju Xinyi; Information & Communications (05); full text *

Also Published As

Publication number Publication date
CN113627146A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
Bakhtin et al. Real or fake? learning to discriminate machine from human generated text
CN107844469B (en) Text simplification method based on word vector query model
CN112733533B (en) Multi-modal named entity recognition method based on BERT model and text-image relation propagation
CN112000772B (en) Sentence-to-semantic matching method based on semantic feature cube and oriented to intelligent question and answer
CN113254604B (en) Reference specification-based professional text generation method and device
CN112926337B (en) End-to-end aspect level emotion analysis method combined with reconstructed syntax information
CN114627162A (en) Multimodal dense video description method based on video context information fusion
CN115017299A (en) Unsupervised social media summarization method based on de-noised image self-encoder
CN110851584A (en) Accurate recommendation system and method for legal provision
CN113111152A (en) Depression detection method based on knowledge distillation and emotion integration model
CN115392252A (en) Entity identification method integrating self-attention and hierarchical residual error memory network
CN116821291A (en) Question-answering method and system based on knowledge graph embedding and language model alternate learning
CN115905487A (en) Document question and answer method, system, electronic equipment and storage medium
CN111597316A (en) Multi-stage attention answer selection method fusing semantics and question key information
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN113627146B (en) Knowledge constraint-based two-step refute a rumour text generation method
KR102452814B1 (en) Methods for analyzing and extracting issues in documents
CN115840884A (en) Sample selection method, device, equipment and medium
CN115588486A (en) Traditional Chinese medicine diagnosis generating device based on Transformer and application thereof
CN115238705A (en) Semantic analysis result reordering method and system
CN115455144A (en) Data enhancement method of completion type space filling type for small sample intention recognition
CN114943216A (en) Case microblog attribute-level viewpoint mining method based on graph attention network
CN114036275B (en) Knowledge graph embedding multi-hop question-answering method
CN114970563B (en) Chinese question generation method and system fusing content and form diversity
CN118070775B (en) Performance evaluation method and device of abstract generation model and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant