CN112579583B - Evidence and statement combined extraction method for fact detection - Google Patents

Evidence and statement combined extraction method for fact detection

Info

Publication number
CN112579583B
CN112579583B · CN202011467223.0A
Authority
CN
China
Prior art keywords
evidence
constraint
statement
category
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011467223.0A
Other languages
Chinese (zh)
Other versions
CN112579583A (en)
Inventor
万海
陈海城
黄佳莉
曾娟
赵杭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202011467223.0A priority Critical patent/CN112579583B/en
Publication of CN112579583A publication Critical patent/CN112579583A/en
Application granted granted Critical
Publication of CN112579583B publication Critical patent/CN112579583B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval of structured data, e.g. relational data
    • G06F 16/21 Design, administration or maintenance of databases
    • G06F 16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval of structured data, e.g. relational data
    • G06F 16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval of structured data, e.g. relational data
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases
    • G06F 16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a fact-detection-oriented method for joint extraction of evidence and statements, which comprises the following steps: S1: a corpus for retrieval and a statement to be verified are specified, the corpus is cleaned, and entity extraction is performed on the statement to obtain an entity set; S2: document retrieval: for the given statement, a corresponding candidate document set is retrieved and constructed from the cleaned corpus by an entity-linking method according to the entity set, and all sentences in this set are taken as the candidate sentence set; S3: evidence is constructed by a greedy-strategy evidence search method, the pre-trained language model BERT is used as the evaluation model of the evidence, and the evaluation model is trained and tested to obtain the final target evidence and category. The invention can effectively improve the accuracy of evidence search.

Description

Evidence and statement combined extraction method for fact detection
Technical Field
The invention relates to the field of automatic fact detection, and in particular to a fact-detection-oriented method for the joint extraction of evidence and statements.
Background
The purpose of automatic fact detection is to enable a computer to automatically identify and filter false information on the Internet and to guarantee the truthfulness and reliability of information. With the successful application of deep learning to natural language processing in recent years, more and more research has attempted to incorporate deep learning techniques into automatic fact checking, with good results. The fact detection task is one of the automatic fact detection tasks used to determine the authenticity of a given statement, and involves two objectives: (1) evidence mining, i.e. for a given statement, retrieving from Wikipedia the subset of sentences most relevant to the statement as evidence; (2) statement verification, i.e. classifying the statement according to the evidence. The task comprises the traditional three-stage pipelined subtasks: document retrieval, evidence construction and statement verification. The input of the task is the statement and all documents on Wikipedia, and the output is the evidence and the label of the statement; the label has three possible values, "supported / refuted / not enough information", which indicate in turn that the evidence shows the statement to be true, shows it to be false, or is insufficient to judge its truth.
Since this task requires retrieving the target evidence from about five million unstructured Wikipedia documents, the fact detection task divides "evidence mining" into two phases, "document retrieval" and "evidence construction", to narrow the search space: the "document retrieval" stage retrieves, from the five million documents, several candidate documents that may contain the target evidence; the "evidence construction" stage screens out, from these candidate documents, the set of sentences that constitutes the target evidence. The problem to be solved in the "statement verification" stage is to classify the statement using the retrieved evidence.
Much prior work has achieved good results on this task. For example, one work published at the AAAI-19 conference observes that the traditional approach to semantically matching statements and evidence is to project them into a hand-designed feature vector space in which semantic matching is performed. That work considers such hand-designed feature spaces too limited to capture semantic information well, and therefore proposes to learn the feature space automatically with a deep model for deep semantic matching. It introduces a homogeneous neural semantic matching network into each of document retrieval, evidence construction and statement verification, improving the semantic matching precision of all three stages and achieving good results on the task. Another work published at the ACL-19 conference mainly improves the "statement verification" stage. It argues that traditional work, in the statement verification stage, simply concatenates all sentences of the evidence or generates "statement-sentence" pairs as input to predict the category of the statement, ignoring the semantic links between different sentences; it therefore uses the pre-trained language model BERT to encode the semantic information of the different sentences and then builds a fully connected evidence graph network to pass messages between sentences and capture their latent semantic links.
The task thus comprises the traditional three-stage pipelined subtasks of document retrieval, evidence construction and statement verification, and most existing approaches follow this three-stage framework. However, current methods have a specific disadvantage:
in the evidence construction stage, a score-ranking method is adopted: each sentence is scored individually and the 5 highest-scoring sentences are taken as the evidence. Such a method cannot find precise evidence, that is, many irrelevant sentences are introduced into the evidence, which lowers the quality of the evidence and makes manual verification difficult.
Disclosure of Invention
The invention provides a method for the joint extraction of evidence and statements for fact detection, aiming at overcoming the defect of the prior art that precise evidence cannot be retrieved during fact detection.
The method comprises the following steps:
S1: specifying a corpus for retrieval and a statement to be verified, cleaning the corpus, and performing entity extraction on the statement;
S2: document retrieval, namely, for the statement to be verified, retrieving and constructing a corresponding candidate document set from the corpus using an entity-linking method, and taking all sentences in the set as the candidate sentence set;
S3: the evidence mining and statement verification stage. In this stage, evidence is constructed by a greedy-strategy evidence search method, and the pre-trained language model BERT is used as the evaluation model of the evidence.
Wherein the evidence is a subset of the candidate sentence set, i.e. the sentences of the evidence are derived from the candidate sentence set.
The training and testing processes of the evaluation model at this stage are respectively as follows:
S3.1: training procedure. The greedy-strategy search scheme is converted into six equivalent constraints, and, so that the evaluation model can learn these six constraints, the method further converts them into six corresponding loss objective functions.
Training samples and test samples corresponding to the six constraints are constructed from the labeled evidence and candidate sentence sets already present in the dataset;
each instance in the training data must satisfy one or more of the constraints. Each training sample is substituted into the objective functions of the constraints it satisfies to compute the corresponding loss values, and the parameters of the evaluation model are then updated by stochastic gradient descent based on these losses;
S3.2: prediction procedure. For a given test case, a greedy-strategy evidence search method is adopted to iteratively construct the evidence. In each iteration of the search, the evidence and category with the highest score are taken as the predicted evidence and predicted category of the current iteration; the candidate evidences of the next iteration are formed from the predicted evidence obtained in the previous iteration plus one more candidate sentence. Iteration stops when the number of sentences contained in the predicted evidence reaches a given threshold. Thus each iteration yields a predicted evidence, a predicted category and the highest score of that iteration, and the method selects the predicted evidence and category with the highest overall score as the final target evidence and category.
The training examples corresponding to the six constraints are constructed in the following manner:
given a statement c to be verified in the training set, its labeled category y, its labeled evidence e = {s_e1, s_e2, …, s_eM} and the candidate sentence set S = {s_1, s_2, …, s_N}, the training samples are constructed as follows:
for constraint one, if y = N, i.e. the labeled category of the statement is "the truth of the statement cannot be established", the training examples of the constraint are all singleton subsets of S, i.e. the training example set is T_1 = {{s_i} : s_i ∈ S}, where each {s_i} is a training example of the constraint;
for constraint two, if y = T or y = F, i.e. the labeled category of the statement is "the statement is true" or "the statement is false", the training examples of the constraint are all singleton subsets of e, i.e. the training example set is T_2 = {{s_ei} : s_ei ∈ e}, where each {s_ei} is a training example of the constraint;
for constraint three, if y = T or y = F, the training example of the constraint is e itself, i.e. the training example set is T_3 = {e}, where e is a training example of the constraint;
for constraint four, if y = T or y = F, the training sample set of the constraint is T_4 = {{S_sub, S_vsub} : S_sub ⊆ e, S_vsub ⊆ S, |S_sub| = |S_vsub|, S_sub and S_vsub differ in exactly one sentence}, where S_sub is any subset of e and S_vsub is any subset of S containing the same number of sentences as S_sub and differing from it in exactly one sentence; each pair {S_sub, S_vsub} is a training example of the constraint;
for constraint five, if y = T or y = F, the training sample set of the constraint is T_5 = {{e, S'_sub} : S'_sub ⊊ e}, where S'_sub is any proper subset of e; each pair {e, S'_sub} is a training example of the constraint;
for constraint six, if y = T or y = F, the training sample set of the constraint is T_6 = {{e, S_sup} : e ⊊ S_sup ⊆ S, |S_sup| = |e| + 1}, where S_sup is any subset of S of which e is a proper subset and which contains exactly one more sentence than e; each pair {e, S_sup} is a training example of the constraint.
Preferably, the step of cleaning the corpus in S1 is to perform text cleaning on all documents in the corpus, including removing stop words, low-frequency words and special symbols;
Preferably, the entity extraction of the statement in S1 is to extract all entities in the statement, including organization names, person names and place names, using a hidden Markov model-based method.
Preferably, the entity linking process in S2 is as follows: for a given statement, the corresponding entity set is obtained from S1; all documents in the corpus are then traversed, and if the title of a document contains any entity of the statement, the document is added to the candidate document set.
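As a concrete illustration of the document-retrieval step just described, a minimal sketch might look as follows; the named-entity recognizer's interface, the corpus data structure (a mapping from document title to its list of sentences) and the title-matching details are assumptions, not part of the invention:

```python
from typing import Dict, List, Set

def extract_entities(statement: str, ner_model) -> Set[str]:
    """Extract all entity mentions (organization, person, place, ...) from the statement.
    ner_model is assumed to expose predict(text) -> list of (span, label) pairs,
    e.g. a hidden-Markov-model tagger as described in S1."""
    return {span for span, label in ner_model.predict(statement)}

def retrieve_candidate_documents(statement: str,
                                 corpus: Dict[str, List[str]],
                                 ner_model) -> Dict[str, List[str]]:
    """Entity linking as described in S2: keep a document as a candidate
    if its title contains any entity mentioned in the statement."""
    entities = extract_entities(statement, ner_model)
    candidates = {}
    for title, sentences in corpus.items():
        normalized_title = title.replace("_", " ").lower()
        if any(entity.lower() in normalized_title for entity in entities):
            candidates[title] = sentences
    return candidates

def candidate_sentence_set(candidates: Dict[str, List[str]]) -> List[str]:
    """All sentences of all candidate documents form the candidate sentence set."""
    return [s for sentences in candidates.values() for s in sentences]
```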
Preferably, in order to avoid the problem that the number of the sentences in the candidate document set is large due to the excessive number of the sentences, and further the searching efficiency is reduced, the invention designs an evidence searching method based on a greedy strategy, and the searching space is greatly reduced. The concrete flow of the evidence searching method based on the greedy strategy in the step is as follows:
step 1: set the currently searched evidence Ê = ∅, the current predicted category ŷ = ∅, the target evidence E* = ∅ and the target category y* = ∅; the set of all sentences contained in the candidate document set is S = {s_1, s_2, …, s_N}, where s_i denotes the i-th sentence, and the statement is c;
step 2: construct the candidate evidence set {A_i : s_i ∈ S}, where A_i = Ê ∪ {s_i} denotes the i-th candidate evidence;
step 3: evaluate each candidate evidence in the candidate evidence set with the pre-trained language model BERT, i.e. V_i = BERT(c, A_i), where V_i ∈ R^C is a C-dimensional score vector and C denotes the number of categories;
step 4: take the candidate evidence and category corresponding to the highest score as the current evidence and predicted category, i.e. (i*, k*) = argmax_{i,k} V_i[k], Ê ← A_{i*}, ŷ ← k*, with current highest score v̂ = V_{i*}[k*];
step 5: if the current highest score v̂ is higher than the historical highest score, update the target evidence and target category, i.e. (E*, y*) ← (Ê, ŷ);
step 6: remove the sentence just selected into the evidence from the candidate sentence set, i.e. S ← S \ Ê;
step 7: if the number of sentences contained in the currently searched evidence reaches the given threshold K, i.e. |Ê| ≥ K, stop the search and output (E*, y*); otherwise repeat step 2 to step 6;
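The following is a minimal Python sketch of this greedy loop, under the assumption that the evaluation model is exposed as a score(statement, evidence_sentences) function returning one score per category; the function and parameter names are illustrative only:

```python
from typing import Callable, List, Sequence, Tuple

def greedy_evidence_search(
    statement: str,
    candidate_sentences: List[str],
    score: Callable[[str, Sequence[str]], Sequence[float]],  # BERT-based evaluation model
    categories: Sequence[str] = ("T", "F", "N"),
    max_evidence: int = 5,                                    # threshold K
) -> Tuple[List[str], str]:
    """Greedy-strategy evidence search following steps 1-7 above."""
    current_evidence: List[str] = []                          # step 1: currently searched evidence
    best_evidence: List[str] = []
    best_category = None
    best_score = float("-inf")
    remaining = list(candidate_sentences)

    while remaining and len(current_evidence) < max_evidence:
        # steps 2-3: extend the current evidence by each remaining sentence and score it
        step_best = None
        for sentence in remaining:
            scores = score(statement, current_evidence + [sentence])   # vector V in R^C
            k = max(range(len(scores)), key=lambda j: scores[j])
            if step_best is None or scores[k] > step_best[0]:
                step_best = (scores[k], categories[k], sentence)
        step_score, step_category, chosen = step_best
        current_evidence.append(chosen)       # step 4: adopt the highest-scoring extension
        remaining.remove(chosen)              # step 6: remove the chosen sentence from the candidates
        if step_score > best_score:           # step 5: keep the best (evidence, category) seen so far
            best_score = step_score
            best_evidence, best_category = list(current_evidence), step_category
    return best_evidence, best_category       # step 7: output the target evidence and category
```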
Preferably, in S3.1, so that the evaluation model can correctly identify the target evidence and category, the invention converts the proposed search scheme into the following six constraints and converts these constraints into equivalent loss functions used to update the parameters of the evaluation model. Given a dataset D = {⟨c_i, S_i, e_i, y_i⟩ : 1 ≤ i ≤ N}, where c_i, S_i, e_i and y_i denote, in order, the i-th statement, the candidate sentence set corresponding to that statement, the labeled evidence of that statement and the labeled category of that statement, any sample in the dataset must satisfy one or more of the following constraints:
constraint one: if the labeled category y of a statement is N, i.e. "the truth of the statement cannot be established", then every candidate evidence corresponding to the statement scores higher on category N than on the other categories. The loss function of this constraint is:

L_1 = Σ_{⟨c,S,e,y⟩∈D : y=N} Σ_{s∈S} max(0, α_1 + max_{ŷ≠N} f_ŷ({s}) − f_N({s}))

where f_ŷ(·) denotes the score of a candidate evidence on category ŷ, {s} denotes a candidate evidence consisting of a single sentence, and α_1 ≥ 0 is a distance hyperparameter;

constraint two: if the labeled category y of a statement is T or F, i.e. "the statement is true" or "the statement is false", then every singleton subset of the labeled evidence of the statement scores lower on category N than on categories T and F. The loss function of this constraint is:

L_2 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{s∈e} max(0, α_2 + f_N({s}) − min_{ŷ∈{T,F}} f_ŷ({s}))

where α_2 ≥ 0 is a distance hyperparameter;

constraint three: the labeled evidence e scores higher on the labeled category y than on the wrong categories. The loss function of this constraint is:

L_3 = Σ_{⟨c,S,e,y⟩∈D : y≠N} max(0, α_3 + max_{ŷ≠y} f_ŷ(e) − f_y(e))

where α_3 ≥ 0 is a distance hyperparameter;

constraint four: any subset S_sub of the labeled evidence e scores higher than any set S_vsub that contains the same number of sentences as S_sub and differs from it in exactly one sentence. The loss function of this constraint is:

L_4 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{(S_sub, S_vsub)} max(0, α_4 + max_ŷ f_ŷ(S_vsub) − max_ŷ f_ŷ(S_sub))

where the inner sum runs over all pairs such that S_sub ⊆ e, S_vsub ⊆ S, |S_sub| = |S_vsub| and the two sets differ in exactly one sentence, and α_4 ≥ 0 is a distance hyperparameter;

constraint five: the labeled evidence e scores higher on the labeled category y than all of its proper subsets. The loss function of this constraint is:

L_5 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{S'_sub⊊e} max(0, α_5 + max_ŷ f_ŷ(S'_sub) − f_y(e))

where S'_sub denotes any proper subset of e and α_5 ≥ 0 is a distance hyperparameter;

constraint six: the labeled evidence e scores higher on the labeled category y than any of its proper supersets containing exactly one more sentence. The loss function of this constraint is:

L_6 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{S_sup} max(0, α_6 + max_ŷ f_ŷ(S_sup) − f_y(e))

where S_sup denotes any subset of S of which e is a proper subset and which contains exactly one more sentence than e, and α_6 ≥ 0 is a distance hyperparameter.
Preferably, the evaluation model is optimized by minimizing the following total loss as the optimization objective, using a stochastic gradient descent algorithm to perform the back-propagation of the model:

L = L_1 + L_2 + L_3 + L_4 + L_5 + L_6
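As an illustration, the margin-based constraints above can be read as hinge losses over the evaluation model's category scores, as in the sketch below; the exact functional form and all names are assumptions rather than the patented formulas, and only three representative constraints are shown (the others follow the same pattern):

```python
from typing import Callable, Dict, FrozenSet, Iterable

Scores = Dict[str, float]                     # category -> score f_yhat(E) of one candidate evidence
ScoreFn = Callable[[FrozenSet[str]], Scores]  # evaluation model restricted to one fixed statement

def hinge(margin: float, bad: float, good: float) -> float:
    """max(0, margin + bad - good): penalize when 'good' does not exceed 'bad' by the margin."""
    return max(0.0, margin + bad - good)

def loss_constraint_one(f: ScoreFn, sentences: Iterable[str], a1: float = 1.0) -> float:
    # y = N: every single-sentence candidate evidence should score highest on category N
    total = 0.0
    for s in sentences:
        v = f(frozenset({s}))
        total += hinge(a1, max(v["T"], v["F"]), v["N"])
    return total

def loss_constraint_three(f: ScoreFn, e: FrozenSet[str], y: str, a3: float = 1.0) -> float:
    # the labeled evidence e should score highest on its labeled category y
    v = f(e)
    return hinge(a3, max(score for cat, score in v.items() if cat != y), v[y])

def loss_constraint_five(f: ScoreFn, e: FrozenSet[str], y: str,
                         proper_subsets: Iterable[FrozenSet[str]], a5: float = 1.0) -> float:
    # e on category y should beat every proper subset of e (on any category)
    v_e = f(e)[y]
    return sum(hinge(a5, max(f(sub).values()), v_e) for sub in proper_subsets)
```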
compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the traditional fact detection task is a segment pipeline task consisting of three subtasks of document retrieval, evidence construction and declaration checking. The invention simplifies the three-stage pipeline type framework, combines the evidence construction and the statement check into one stage, combines a large amount of prior language knowledge contained in the pre-training language model, and obtains better effect in the aspect of accurate evidence search.
In the traditional fact verification method, a score-ranking method is adopted in the evidence construction stage: each sentence is scored individually and the 5 highest-scoring sentences are taken as the evidence. Such a method cannot find precise evidence, that is, many irrelevant sentences are introduced into the evidence, which lowers the quality of the evidence and makes manual verification difficult. The present method adopts a greedy-strategy evidence search method and converts it into equivalent loss functions for optimizing the evaluation model. The method can effectively search for precise evidence and obtains better results in precise evidence search.
Pre-trained language models have been widely applied to solve natural language inference problems. The invention fully utilizes a large amount of language prior knowledge contained in the pre-training language model, can more effectively encode the semantic information of the sentence, and is beneficial to improving the understanding of the model on the semantic relation between the evidence and the statement.
Drawings
FIG. 1 is a flowchart of a method for extracting evidence and declaration jointly for fact-oriented detection in embodiment 1.
FIG. 2 is a flow chart of the training phase.
FIG. 3 is a flowchart of an evidence search method based on a greedy strategy.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the present embodiments, certain elements of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The embodiment provides a method for jointly extracting evidence and verifying statements for fact detection, which comprises the following steps:
S1: specifying a corpus for retrieval and a statement to be verified, cleaning the corpus, and performing entity extraction on the statement to obtain an entity set;
S2: document retrieval: for the statement to be verified, retrieving and constructing a corresponding candidate document set from the cleaned corpus according to the entity set using an entity-linking method, and taking all sentences in the set as the candidate sentence set;
S3: constructing evidence by a greedy-strategy evidence search method, using the pre-trained language model BERT as the evaluation model of the evidence, and training and testing the evaluation model to obtain the final target evidence and target category;
Wherein the evidence is a subset of the candidate sentence set, i.e. the sentences of the evidence are derived from the candidate sentence set.
Cleaning the corpus in S1 refers to performing text cleaning on all documents in the corpus, including removing stop words, low-frequency words, and special symbols.
Entity extraction of the statement means extracting all entities in the statement, including organization names, person names and place names, using a hidden Markov model-based method.
The entity link in S2 specifically includes:
obtaining a corresponding entity set according to the step S1; and traversing all the documents in the corpus, and adding the documents into the candidate document set if the titles of the documents contain any entity in the statement to be checked.
Training and testing the evaluation model in S3, comprising the following steps:
S3.1: converting the greedy-strategy search scheme into six equivalent constraints so that the evaluation model can learn them, and converting the six constraints into six corresponding loss objective functions;
constructing the training samples and test samples corresponding to the six constraints from the labeled evidence and candidate sentence sets already present in the dataset;
for each sample in the training data, at least one constraint must be satisfied;
Substituting each training sample into the objective functions of the constraints it satisfies to compute the corresponding loss values, and then optimizing and updating the parameters of the evaluation model by stochastic gradient descent based on these losses;
S3.2: for a given test sample, a greedy-strategy evidence search method is adopted to iteratively construct the evidence:
during each iterative search, based on the currently searched evidence (initialized to the empty set before iteration starts), the pre-trained language model BERT computes the scores of every candidate sentence in the candidate sentence set, and the candidate sentence with the highest score and its corresponding category are then obtained;
the candidate sentence set is updated, i.e. the selected candidate sentence is deleted from the candidate sentence set;
the currently searched evidence is updated, i.e. the selected candidate sentence is added to the currently searched evidence;
the currently searched evidence and the corresponding category are taken as the predicted evidence and predicted category of the current iteration.
Iteration stops when the number of sentences contained in the currently searched evidence reaches a preset threshold;
since each iteration yields a predicted evidence, a predicted category and the highest score of that iteration, the predicted evidence and category with the highest overall score are taken as the final target evidence and category.
In S3.1, the six constraints are respectively as follows:
constraint one: if the labeled category y of a statement is N, i.e. "the truth of the statement cannot be established", then every candidate evidence corresponding to the statement scores higher on category N than on the other categories. The loss function of this constraint is:

L_1 = Σ_{⟨c,S,e,y⟩∈D : y=N} Σ_{s∈S} max(0, α_1 + max_{ŷ≠N} f_ŷ({s}) − f_N({s}))

where f_ŷ(·) denotes the score of a candidate evidence on category ŷ, {s} denotes a candidate evidence consisting of a single sentence, and α_1 ≥ 0 is a distance hyperparameter; D is the given dataset D = {⟨c_i, S_i, e_i, y_i⟩ : 1 ≤ i ≤ N}, where c_i, S_i, e_i and y_i denote, in order, the i-th statement, the candidate sentence set corresponding to the i-th statement, the labeled evidence of the i-th statement and the labeled category of the i-th statement;

constraint two: if the labeled category y of a statement is T or F, i.e. "the statement is true" or "the statement is false", then every singleton subset of the labeled evidence of the statement scores lower on category N than on categories T and F. The loss function of this constraint is:

L_2 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{s∈e} max(0, α_2 + f_N({s}) − min_{ŷ∈{T,F}} f_ŷ({s}))

where α_2 ≥ 0 is a distance hyperparameter;

constraint three: the labeled evidence e scores higher on the labeled category y than on the wrong categories. The loss function of this constraint is:

L_3 = Σ_{⟨c,S,e,y⟩∈D : y≠N} max(0, α_3 + max_{ŷ≠y} f_ŷ(e) − f_y(e))

where α_3 ≥ 0 is a distance hyperparameter;

constraint four: any subset S_sub of the labeled evidence e scores higher than any set S_vsub that contains the same number of sentences as S_sub and differs from it in exactly one sentence. The loss function of this constraint is:

L_4 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{(S_sub, S_vsub)} max(0, α_4 + max_ŷ f_ŷ(S_vsub) − max_ŷ f_ŷ(S_sub))

where the inner sum runs over all pairs such that S_sub ⊆ e, S_vsub ⊆ S, |S_sub| = |S_vsub| and the two sets differ in exactly one sentence, and α_4 ≥ 0 is a distance hyperparameter;

constraint five: the labeled evidence e scores higher on the labeled category y than all of its proper subsets. The loss function of this constraint is:

L_5 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{S'_sub⊊e} max(0, α_5 + max_ŷ f_ŷ(S'_sub) − f_y(e))

where S'_sub denotes any proper subset of e and α_5 ≥ 0 is a distance hyperparameter;

constraint six: the labeled evidence e scores higher on the labeled category y than any of its proper supersets containing exactly one more sentence. The loss function of this constraint is:

L_6 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{S_sup} max(0, α_6 + max_ŷ f_ŷ(S_sup) − f_y(e))

where S_sup denotes any subset of S of which e is a proper subset and which contains exactly one more sentence than e, and α_6 ≥ 0 is a distance hyperparameter.
The evaluation model is optimized by minimizing the following total loss as the optimization objective, using a stochastic gradient descent algorithm to perform the back-propagation of the model:

L = L_1 + L_2 + L_3 + L_4 + L_5 + L_6
the evidence searching method based on the greedy strategy comprises the following steps:
step 1: set the currently searched evidence Ê = ∅, the current predicted category ŷ = ∅, the target evidence E* = ∅ and the target category y* = ∅; the candidate sentence set contained in the candidate document set is S = {s_1, s_2, …, s_N}, where s_i denotes the i-th sentence, and the statement is c;
step 2: construct the candidate evidence set {A_i : s_i ∈ S}, where A_i = Ê ∪ {s_i} denotes the i-th candidate evidence;
step 3: evaluate each candidate evidence in the candidate evidence set with the pre-trained language model BERT, i.e. V_i = BERT(c, A_i), where V_i ∈ R^C is a C-dimensional score vector and C denotes the number of categories;
step 4: take the candidate evidence and category corresponding to the highest score as the current evidence and predicted category, i.e. (i*, k*) = argmax_{i,k} V_i[k], Ê ← A_{i*}, ŷ ← k*, with current highest score v̂ = V_{i*}[k*];
step 5: if the current highest score v̂ is higher than the historical highest score, update the target evidence and target category, i.e. (E*, y*) ← (Ê, ŷ);
step 6: remove the sentence just selected into the evidence from the candidate sentence set, i.e. S ← S \ Ê;
step 7: if the number of sentences contained in the currently searched evidence reaches the preset threshold K, i.e. |Ê| ≥ K, stop the search and output (E*, y*); otherwise repeat step 2 to step 6.
The training examples corresponding to the six constraints are constructed in the following manner:
given a statement c to be verified in the training set, its labeled category y, its labeled evidence e = {s_e1, s_e2, …, s_eM} and the candidate sentence set S = {s_1, s_2, …, s_N}, the training samples are constructed as follows:
for constraint one, if y = N, i.e. the labeled category of the statement is "the truth of the statement cannot be established", the training examples of the constraint are all singleton subsets of S, i.e. the training example set is T_1 = {{s_i} : s_i ∈ S}, where each {s_i} is a training example of the constraint;
for constraint two, if y = T or y = F, i.e. the labeled category of the statement is "the statement is true" or "the statement is false", the training examples of the constraint are all singleton subsets of e, i.e. the training example set is T_2 = {{s_ei} : s_ei ∈ e}, where each {s_ei} is a training example of the constraint;
for constraint three, if y = T or y = F, the training example of the constraint is e itself, i.e. the training example set is T_3 = {e}, where e is a training example of the constraint;
for constraint four, if y = T or y = F, the training sample set of the constraint is T_4 = {{S_sub, S_vsub} : S_sub ⊆ e, S_vsub ⊆ S, |S_sub| = |S_vsub|, S_sub and S_vsub differ in exactly one sentence}, where S_sub is any subset of e and S_vsub is any subset of S containing the same number of sentences as S_sub and differing from it in exactly one sentence; each pair {S_sub, S_vsub} is a training example of the constraint;
for constraint five, if y = T or y = F, the training sample set of the constraint is T_5 = {{e, S'_sub} : S'_sub ⊊ e}, where S'_sub is any proper subset of e; each pair {e, S'_sub} is a training example of the constraint;
for constraint six, if y = T or y = F, the training sample set of the constraint is T_6 = {{e, S_sup} : e ⊊ S_sup ⊆ S, |S_sup| = |e| + 1}, where S_sup is any subset of S of which e is a proper subset and which contains exactly one more sentence than e; each pair {e, S_sup} is a training example of the constraint. The present embodiment is described below with reference to a specific example:
Given an example: the statement c is "Giada at Home was only available on DVD", its labeled category y is F, and its labeled evidence is E = {s_e1, s_e2}, where s_e1 is "Giada at Home is a television show and first aired on October 18, 2008, on the Food Network." and s_e2 is "Food Network is an American basic cable and satellite television channel."
In the data preprocessing stage, as shown in fig. 1, entity labeling is performed on c to obtain the entity set {Giada at Home, DVD, Giada, Home}; an entity-linking technique is then used to retrieve a candidate document set from the corpus with document titles {Giada_at_Home, DVD, Giada}, where the text of the document "Giada_at_Home" has 3 sentences, the text of the document "DVD" has 2 sentences and the text of the document "Giada" has 4 sentences, so the candidate sentence set corresponding to c is S = {s_1, s_2, …, s_9}, where s_1 is (Giada_at_Home, 0), the first sentence of the document "Giada_at_Home", and the other s_i are numbered analogously.
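As an illustration, the candidate sentence set for this example could be indexed by (document title, sentence position) pairs as in the sketch below; the placeholder sentence texts are assumptions, since only the two evidence sentences are quoted in the example:

```python
# Candidate documents retrieved for the statement (unquoted sentences are placeholders).
candidate_documents = {
    "Giada_at_Home": [
        "Giada at Home is a television show and first aired on October 18, 2008, on the Food Network.",
        "<sentence 2 of Giada_at_Home>",
        "<sentence 3 of Giada_at_Home>",
    ],
    "DVD": ["<sentence 1 of DVD>", "<sentence 2 of DVD>"],
    "Giada": ["<sentence %d of Giada>" % i for i in range(1, 5)],
}

# The candidate sentence set S = {s_1, ..., s_9}: each s_i is addressed by (title, index).
S = [(title, idx) for title, sentences in candidate_documents.items()
     for idx in range(len(sentences))]
assert len(S) == 9  # 3 + 2 + 4 sentences, as in the example above
```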
In the training phase, as shown in fig. 2, the distance hyperparameter of every constraint is set to 1. Training data are constructed from the candidate sentence set S and the labeled evidence E, and the construction process is as follows:
1. Construct the proper subsets of E, namely {s_e1} and {s_e2}. These subsets must satisfy constraint two and constraint five, so they are substituted into the corresponding objective functions to compute the loss values:

L_2 = Σ_{s∈{s_e1, s_e2}} max(0, 1 + f_N({s}) − min_{ŷ∈{T,F}} f_ŷ({s}))
L_5 = Σ_{S'_sub⊊E} max(0, 1 + max_ŷ f_ŷ(S'_sub) − f_F(E))

2. Construct the set S_vsub = {{s_e1, s_i} : s_i ∈ S ∧ s_i ≠ s_e1 ∧ s_i ≠ s_e2} ∪ {{s_e2, s_i} : s_i ∈ S ∧ s_i ≠ s_e1 ∧ s_i ≠ s_e2} together with the set S_sub = E; each such pair must satisfy constraint four, so it is substituted into the corresponding objective function to compute the loss value:

L_4 = Σ_{S_vsub} max(0, 1 + max_ŷ f_ŷ(S_vsub) − max_ŷ f_ŷ(E))

3. Construct the proper supersets of E, S_sup = {{s_e1, s_e2, s_i} : s_i ∈ S ∧ s_i ≠ s_e1 ∧ s_i ≠ s_e2}; these and E must satisfy constraint six, so they are substituted into the corresponding objective function to compute the loss value:

L_6 = Σ_{S_sup} max(0, 1 + max_ŷ f_ŷ(S_sup) − f_F(E))

4. E itself must satisfy constraint three, so it is substituted into the corresponding objective function to compute the loss value:

L_3 = max(0, 1 + max_{ŷ≠F} f_ŷ(E) − f_F(E))

Based on the loss values of the constraints, the final target loss is calculated (for this example the label is not N, so constraint one contributes no training sample and L_1 = 0):

L = L_1 + L_2 + L_3 + L_4 + L_5 + L_6

This loss is then used to perform stochastic gradient descent and update the parameters of the evaluation model.
In the prediction stage, an evidence search method based on a greedy strategy is adopted for prediction, and as shown in fig. 3, a model prediction process is as follows:
step 1: set the currently searched evidence Ê = ∅, the current predicted category ŷ = ∅, the target evidence E* = ∅ and the target category y* = ∅; the set of all sentences contained in the candidate document set is S = {s_1, s_2, …, s_9}, and the statement is c;
step 2: construct the candidate evidence set {A_i : s_i ∈ S}, where A_i = Ê ∪ {s_i} denotes the i-th candidate evidence;
step 3: evaluate each candidate evidence in the candidate evidence set with the pre-trained language model BERT, i.e. V_i = BERT(c, A_i), where V_i ∈ R^C is a C-dimensional score vector and C denotes the number of categories;
step 4: take the candidate evidence and category corresponding to the highest score as the current evidence and predicted category, i.e. (i*, k*) = argmax_{i,k} V_i[k], Ê ← A_{i*}, ŷ ← k*, with current highest score v̂ = V_{i*}[k*];
step 5: if the current highest score v̂ is higher than the historical highest score, update the target evidence and target category, i.e. (E*, y*) ← (Ê, ŷ);
step 6: remove the sentence just selected into the evidence from the candidate sentence set, i.e. S ← S \ Ê;
step 7: if the number of sentences contained in the currently searched evidence reaches the given threshold K, i.e. |Ê| ≥ K, stop the search and output (E*, y*); otherwise repeat step 2 to step 6.
The terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (6)

1. A method for the joint extraction of evidence and statements for fact detection, the method comprising the following steps:
S1: specifying a corpus for retrieval and a statement to be verified, cleaning the corpus, and performing entity extraction on the statement to obtain an entity set;
S2: document retrieval: for the statement to be verified, retrieving and constructing a corresponding candidate document set from the cleaned corpus according to the entity set using an entity-linking method, and taking all sentences in the candidate document set as the candidate sentence subset;
the entity linking is specifically: obtaining the corresponding entity set according to step S1; traversing all documents in the corpus, and, if the title of a document contains any entity in the statement to be verified, adding the document to the candidate document set;
S3: constructing evidence by a greedy-strategy evidence search method, using the pre-trained language model BERT as the evaluation model of the evidence, and training and testing the evaluation model to obtain the final target evidence and target category;
wherein the evidence is a subset of the candidate sentence subset;
training and testing the evaluation model, comprising the following steps:
S3.1: converting the greedy-strategy search scheme into six equivalent constraints, and converting the six constraints into six corresponding loss objective functions;
Constructing training samples and testing samples corresponding to six constraints according to the existing marking evidence and candidate sentence sets in the data set;
for each sample in the training data, at least one constraint must be satisfied;
substituting each training sample into the objective functions of the constraints it satisfies to compute the corresponding loss values, and then optimizing and updating the parameters of the evaluation model by stochastic gradient descent based on these losses;
the six constraints are respectively as follows:
constraint one: if the labeled category y of a statement is N, i.e. "the truth of the statement cannot be established", then every candidate evidence corresponding to the statement scores higher on category N than on the other categories; the loss function of this constraint is:

L_1 = Σ_{⟨c,S,e,y⟩∈D : y=N} Σ_{s∈S} max(0, α_1 + max_{ŷ≠N} f_ŷ({s}) − f_N({s}))

where f_ŷ(·) denotes the score of a candidate evidence on category ŷ, f_N(·) denotes the score on category N, {s} denotes a candidate evidence containing only one candidate sentence, and α_1 ≥ 0 is a distance hyperparameter; D is the given dataset D = {⟨c_i, S_i, e_i, y_i⟩ : 1 ≤ i ≤ N}, where c_i, S_i, e_i and y_i denote, in order, the i-th statement, the candidate sentence subset corresponding to the i-th statement, the labeled evidence of the i-th statement and the labeled category of the i-th statement;

constraint two: if the labeled category y of a statement is T or F, i.e. "the statement is true" or "the statement is false", then every singleton subset of the labeled evidence of the statement scores lower on category N than on categories T and F; the loss function of this constraint is:

L_2 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{s∈e} max(0, α_2 + f_N({s}) − min_{ŷ∈{T,F}} f_ŷ({s}))

where α_2 ≥ 0 is a distance hyperparameter;

constraint three: the labeled evidence e scores higher on the labeled category y than on the wrong categories; the loss function of this constraint is:

L_3 = Σ_{⟨c,S,e,y⟩∈D : y≠N} max(0, α_3 + max_{ŷ≠y} f_ŷ(e) − f_y(e))

where α_3 ≥ 0 is a distance hyperparameter;

constraint four: any subset S_sub of the labeled evidence e scores higher than any set S_vsub that contains the same number of sentences as S_sub and differs from it in exactly one sentence; the loss function of this constraint is:

L_4 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{(S_sub, S_vsub)} max(0, α_4 + max_ŷ f_ŷ(S_vsub) − max_ŷ f_ŷ(S_sub))

where α_4 ≥ 0 is a distance hyperparameter, S_sub denotes any subset of the labeled evidence e, and S_vsub denotes a set of the same size as S_sub differing from it in exactly one element;

constraint five: the labeled evidence e scores higher on the labeled category y than all of its proper subsets; the loss function of this constraint is:

L_5 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{S'_sub⊊e} max(0, α_5 + max_ŷ f_ŷ(S'_sub) − f_y(e))

where α_5 ≥ 0 is a distance hyperparameter and S'_sub denotes any proper subset of the labeled evidence e;

constraint six: the labeled evidence e scores higher on the labeled category y than any of its proper supersets containing exactly one more sentence; the loss function of this constraint is:

L_6 = Σ_{⟨c,S,e,y⟩∈D : y≠N} Σ_{S_sup} max(0, α_6 + max_ŷ f_ŷ(S_sup) − f_y(e))

where α_6 ≥ 0 is a distance hyperparameter and S_sup denotes any subset of S of which e is a proper subset and which contains exactly one more sentence than e;
S3.2: for a given test sample, a greedy-strategy evidence search method is adopted to iteratively construct the evidence:
During each iterative search, calculating the score of each candidate sentence in the candidate sentence set on all categories by using a pre-training language model BERT on the basis of the currently searched evidence, and then obtaining the candidate sentence with the highest score and the corresponding category;
updating the candidate sentence set, namely deleting the selected candidate sentences from the candidate sentence set;
updating the current searched evidence, namely adding the selected candidate sentences into the current searched evidence;
taking the currently searched evidence and the corresponding category as a prediction evidence and a prediction category obtained by current iterative search;
stopping iteration if the number of sentences contained in the currently searched evidence reaches a preset threshold value;
since each iteration yields a predicted evidence, a predicted category and the highest score of that iteration, the predicted evidence and category with the highest overall score are taken as the final target evidence and category;
the evidence searching method based on the greedy strategy comprises the following steps:
step 1: set the currently searched evidence Ê = ∅, the current predicted category ŷ = ∅, the target evidence E* = ∅ and the target category y* = ∅; the candidate sentence set contained in the candidate document set is S = {s_1, s_2, …, s_N}, where s_i denotes the i-th sentence, and the statement is c;
step 2: construct the candidate evidence set {A_i : s_i ∈ S}, where A_i = Ê ∪ {s_i} denotes the i-th candidate evidence;
step 3: evaluate each candidate evidence in the candidate evidence set with the pre-trained language model BERT, i.e. V_i = BERT(c, A_i), where V_i ∈ R^C is a C-dimensional score vector and C denotes the number of categories;
step 4: take the candidate evidence and category corresponding to the highest score as the current evidence and predicted category, i.e. (i*, k*) = argmax_{i,k} V_i[k], Ê ← A_{i*}, ŷ ← k*, with current highest score v̂ = V_{i*}[k*];
step 5: if the current highest score v̂ is higher than the historical highest score, update the target evidence and target category, i.e. (E*, y*) ← (Ê, ŷ);
step 6: remove the sentence just selected into the evidence from the candidate sentence set, i.e. S ← S \ Ê;
step 7: if the number of sentences contained in the currently searched evidence reaches the preset threshold K, i.e. |Ê| ≥ K, stop the search and output (E*, y*); otherwise repeat step 2 to step 6.
2. The method for extracting evidence and declaration jointly oriented to fact detection according to claim 1, wherein cleansing the corpus in S1 is to perform text cleansing on all documents in the corpus, including removing stop words, low-frequency words and special symbols.
3. The method for extracting evidence and declaration jointly oriented to fact detection as claimed in claim 2, wherein the entity extraction of the declaration is to extract all entities in the declaration, including information of organization name, person name and place name, by using a hidden markov model-based method.
4. The fact-detection-oriented evidence and statement combined extraction method as claimed in claim 1, wherein the evaluation model is optimized by minimizing a loss function as the optimization objective, using a stochastic gradient descent algorithm to perform the back-propagation of the model.
5. The method for extracting evidence and declaration jointly oriented to fact detection according to claim 4, wherein the loss function is:
L = L_1 + L_2 + L_3 + L_4 + L_5 + L_6
6. The method for the joint extraction of evidence and statements for fact detection according to claim 5, wherein the training examples corresponding to the six constraints are constructed as follows:
given a statement c to be verified in the training set, its labeled category y, its labeled evidence e = {s_e1, s_e2, …, s_eM} and the candidate sentence set S = {s_1, s_2, …, s_N}, the training samples are constructed as follows:
for constraint one, if y = N, i.e. the labeled category of the statement is "the truth of the statement cannot be established", the training examples of the constraint are all singleton subsets of S, i.e. the training example set is T_1 = {{s_i} : s_i ∈ S}, where each {s_i} is a training example of the constraint;
for constraint two, if y = T or y = F, i.e. the labeled category of the statement is "the statement is true" or "the statement is false", the training examples of the constraint are all singleton subsets of e, i.e. the training example set is T_2 = {{s_ei} : s_ei ∈ e}, where each {s_ei} is a training example of the constraint;
for constraint three, if y = T or y = F, the training example of the constraint is e itself, i.e. the training example set is T_3 = {e}, where e is a training example of the constraint;
for constraint four, if y = T or y = F, the training sample set of the constraint is T_4 = {{S_sub, S_vsub} : S_sub ⊆ e, S_vsub ⊆ S, |S_sub| = |S_vsub|, S_sub and S_vsub differ in exactly one sentence}, where S_sub is any subset of e and S_vsub is any subset of S containing the same number of sentences as S_sub and differing from it in exactly one sentence; each pair {S_sub, S_vsub} is a training example of the constraint;
for constraint five, if y = T or y = F, the training sample set of the constraint is T_5 = {{e, S'_sub} : S'_sub ⊊ e}, where S'_sub is any proper subset of e; each pair {e, S'_sub} is a training example of the constraint;
for constraint six, if y = T or y = F, the training sample set of the constraint is T_6 = {{e, S_sup} : e ⊊ S_sup ⊆ S, |S_sup| = |e| + 1}, where S_sup is any subset of S of which e is a proper subset and which contains exactly one more sentence than e; each pair {e, S_sup} is a training example of the constraint.
CN202011467223.0A 2020-12-14 2020-12-14 Evidence and statement combined extraction method for fact detection Active CN112579583B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011467223.0A CN112579583B (en) 2020-12-14 2020-12-14 Evidence and statement combined extraction method for fact detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011467223.0A CN112579583B (en) 2020-12-14 2020-12-14 Evidence and statement combined extraction method for fact detection

Publications (2)

Publication Number Publication Date
CN112579583A CN112579583A (en) 2021-03-30
CN112579583B true CN112579583B (en) 2022-07-29

Family

ID=75134819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011467223.0A Active CN112579583B (en) 2020-12-14 2020-12-14 Evidence and statement combined extraction method for fact detection

Country Status (1)

Country Link
CN (1) CN112579583B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114048286B (en) * 2021-10-29 2024-06-07 南开大学 Automatic fact verification method integrating graph converter and common attention network
CN116383239B (en) * 2023-06-06 2023-08-15 中国人民解放军国防科技大学 Mixed evidence-based fact verification method, system and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5918240A (en) * 1995-06-28 1999-06-29 Xerox Corporation Automatic method of extracting summarization using feature probabilities
CN103488707A (en) * 2013-09-06 2014-01-01 中国人民解放军国防科学技术大学 Method of searching for candidate classes based on greedy strategy and heuristic algorithm
CN107533698A (en) * 2015-05-08 2018-01-02 汤森路透全球资源无限公司 The detection and checking of social media event
CN109766434A (en) * 2018-12-29 2019-05-17 北京百度网讯科技有限公司 Abstraction generating method and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5918240A (en) * 1995-06-28 1999-06-29 Xerox Corporation Automatic method of extracting summarization using feature probabilities
CN103488707A (en) * 2013-09-06 2014-01-01 中国人民解放军国防科学技术大学 Method of searching for candidate classes based on greedy strategy and heuristic algorithm
CN107533698A (en) * 2015-05-08 2018-01-02 汤森路透全球资源无限公司 The detection and checking of social media event
CN109766434A (en) * 2018-12-29 2019-05-17 北京百度网讯科技有限公司 Abstraction generating method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Reasoning over semantic level graph for fact checking;W Zhong et al.;《Proceedings of the 58th annual meeting of the association for computational linguistics》;20200101;1-7 *
Research on Text Summarization Technology Based on a Concept Object Model; 孙秀胜; China Master's Theses Full-text Database, Information Science and Technology Series (monthly); 2016-08-15 (No. 8); I138-1486 *

Also Published As

Publication number Publication date
CN112579583A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN113011533B (en) Text classification method, apparatus, computer device and storage medium
CN112732934B (en) Power grid equipment word segmentation dictionary and fault case library construction method
CN112579477A (en) Defect detection method, device and storage medium
CN112507699B (en) Remote supervision relation extraction method based on graph convolution network
CN109918505B (en) Network security event visualization method based on text processing
CN113505200B (en) Sentence-level Chinese event detection method combined with document key information
CN111738007A (en) Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
CN112579583B (en) Evidence and statement combined extraction method for fact detection
CN112183094A (en) Chinese grammar debugging method and system based on multivariate text features
CN111091009B (en) Document association auditing method based on semantic analysis
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN113065356B (en) IT equipment operation and maintenance fault suggestion processing method based on semantic analysis algorithm
CN116484024A (en) Multi-level knowledge base construction method based on knowledge graph
CN116244445B (en) Aviation text data labeling method and labeling system thereof
CN113742733A (en) Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device
CN112836029A (en) Graph-based document retrieval method, system and related components thereof
CN110569355B (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
CN114265935A (en) Science and technology project establishment management auxiliary decision-making method and system based on text mining
CN113505583A (en) Sentiment reason clause pair extraction method based on semantic decision diagram neural network
CN113836896A (en) Patent text abstract generation method and device based on deep learning
CN103577414B (en) Data processing method and device
CN112884087A (en) Biological enhancer and identification method for type thereof
CN114155913B (en) Gene regulation network construction method based on higher-order dynamic Bayes
CN111723301B (en) Attention relation identification and labeling method based on hierarchical theme preference semantic matrix
CN115358227A (en) Open domain relation joint extraction method and system based on phrase enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant