CN110362735B - Method and device for judging the authenticity of a statement, electronic device, readable medium - Google Patents

Publication number
CN110362735B
Authority
CN
China
Prior art keywords
statement
result
retrieval
candidate
results
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910634928.8A
Other languages
Chinese (zh)
Other versions
CN110362735A (en
Inventor
冯欣伟
戴松泰
余淼
周环宇
时鸿剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910634928.8A
Publication of CN110362735A
Application granted
Publication of CN110362735B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9532Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present disclosure provides a method for judging the authenticity of a statement, comprising: searching in a search engine using the statement as the search formula, and selecting a plurality of search results as candidate search results; and obtaining a confidence score representing the authenticity of the statement according to the support degree of the statement by each candidate search result and the support degree of each candidate search result by all other candidate search results. Embodiments of the present disclosure also provide a device for judging the authenticity of a statement, an electronic device, and a computer-readable medium.

Description

Method and device for judging the authenticity of a statement, electronic device, readable medium
Technical Field
The disclosed embodiments relate to the field of authenticity determination technology, and in particular, to a method and apparatus for determining statement authenticity, an electronic device, and a computer-readable medium.
Background
In reality, there are many "statements", such as "Mr. B is the incumbent president of University A", "Conference C is held at time D", "the current benchmark lending rate is E%", etc., and these statements may be true or may not be true (false).
In many cases, it is important to determine whether a statement is true. For example, in an automatic question-and-answer system, it is necessary to determine whether a given answer (e.g., one derived from a given document by reading comprehension) is currently true; as another example, in the field of automatic decision-making, a decision must be based on true statements (e.g., the decision whether to grant a loan is based on the current benchmark lending rate).
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for judging statement authenticity, an electronic device and a computer readable medium.
In a first aspect, embodiments of the present disclosure provide a method for determining statement authenticity, which includes:
searching in a search engine by taking the statement as a search formula, and selecting a plurality of search results as candidate search results;
and obtaining a confidence score representing the authenticity of the statement according to the support degree of the statement by each candidate retrieval result and the support degree of each candidate retrieval result by all other candidate retrieval results.
In some embodiments, retrieving in the retrieval engine with the statement as the retrieval formula and selecting a plurality of retrieval results as candidate retrieval results comprises:
retrieving in an intelligent retrieval engine with the statement as the retrieval formula to obtain a plurality of ranked retrieval results, and selecting the retrieval results ranked in a predetermined number of top positions as candidate retrieval results.
In some embodiments, retrieving in the retrieval engine with the statement as the retrieval formula and selecting a plurality of retrieval results as candidate retrieval results comprises:
retrieving in a plurality of retrieval engines respectively, with the statement as the retrieval formula, and selecting at least one retrieval result from each retrieval engine as a candidate retrieval result.
In some embodiments, after obtaining the confidence score representing the authenticity of the statement, the method further comprises:
judging whether the confidence score exceeds a first threshold; if so, judging that the statement is true, and if not, judging that the statement is not true.
In some embodiments, before retrieving in the retrieval engine with the statement as the retrieval formula, the method further comprises:
performing an initial search in a search engine with the statement as the search formula, and selecting a plurality of initial candidate search results from the obtained search results;
obtaining an initial score representing the authenticity of the statement according to the support degree of the statement by each initial candidate retrieval result and the support degree of each initial candidate retrieval result by all other initial candidate retrieval results;
and after obtaining the confidence score representing the authenticity of the statement, the method further comprises:
judging whether the difference obtained by subtracting the confidence score from the initial score exceeds a second threshold; if so, judging that the statement is not true, and if not, judging that the statement is true.
In some embodiments, said deriving a confidence score representing said statement authenticity based on said measure of support of said statement by each candidate search result and said measure of support of each candidate search result by all other candidate search results comprises:
obtaining a statement vector representation representing the content features of the statement, and respectively obtaining a plurality of result vector representations representing the content features of each candidate retrieval result;
according to the statement vector representation and the result vector representation, respectively obtaining a plurality of statement-result vector representations representing the support degree of the statement by each candidate retrieval result;
according to the statement-result vector representations, respectively obtaining result-result vector representations representing the support degree of each candidate retrieval result by all other candidate retrieval results;
the confidence score is derived from each of the statement-result vector representations and the result-result vector representations.
In some embodiments, said deriving a plurality of statement-result vector representations representing degrees of support of said statement by each candidate retrieval result, respectively, from said statement vector representation and result vector representation comprises:
processing the statement vector representations and each result vector representation respectively through a bidirectional attention mechanism to obtain a plurality of sets of corresponding intermediate result vector representations and intermediate statement vector representations;
concatenating each set of corresponding intermediate result vector representations and intermediate statement vector representations respectively through a self-attention mechanism to obtain a plurality of statement-result vector representations;
the obtaining, according to the plurality of statement-result vector representations, a result-result vector representation representing a degree of support of each candidate search result by all other candidate search results, respectively, includes:
each statement-result vector representation is processed separately from all other statement-result vector representations by an attention mechanism, resulting in a plurality of said result-result vector representations.
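As a rough illustration of the last step only, the following sketch lets each statement-result vector attend to all other statement-result vectors. The function names, the choice of scaled dot-product attention, and the toy two-dimensional vectors are assumptions made for illustration; the disclosure does not fix the exact attention formula.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_to_others(vectors):
    """For each vector, return an attention-weighted average of all OTHER vectors.

    Hypothetical stand-in for the result-result step: vectors[i] plays the
    role of the i-th statement-result vector representation. At least two
    vectors are required.
    """
    dim = len(vectors[0])
    outputs = []
    for i, query in enumerate(vectors):
        others = [v for j, v in enumerate(vectors) if j != i]
        # Scaled dot-product score between the query and every other vector.
        scores = [sum(q * k for q, k in zip(query, other)) / math.sqrt(dim)
                  for other in others]
        weights = softmax(scores)
        # Weighted sum of the other vectors, one component at a time.
        outputs.append([sum(w * other[d] for w, other in zip(weights, others))
                        for d in range(dim)])
    return outputs

# Toy statement-result vectors: the first two agree, the third does not.
statement_result = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
result_result = attend_to_others(statement_result)
```

In this toy input the first vector places more attention weight on the second vector than on the third, so its output stays closer to the direction the first two share.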
In a second aspect, embodiments of the present disclosure provide an apparatus for determining the authenticity of a statement, comprising:
the retrieval module is used for retrieving in a retrieval engine by taking the statement as a retrieval formula and selecting a plurality of retrieval results as candidate retrieval results;
and the confidence score module is used for obtaining a confidence score representing the truth of the statement according to the support degree of the statement by each candidate retrieval result and the support degree of each candidate retrieval result by all other candidate retrieval results.
In some embodiments, the retrieval module is configured to retrieve in an intelligent retrieval engine using the statement as the retrieval formula to obtain a plurality of ranked retrieval results, and to select the retrieval results ranked in a predetermined number of top positions as candidate retrieval results.
In some embodiments, the retrieval module is configured to perform retrieval in a plurality of retrieval engines respectively by using the statement as a retrieval formula, and select at least one of the retrieval results of each retrieval engine as a candidate retrieval result.
In some embodiments, the apparatus further comprises:
and the first judging module is used for judging whether the confidence score exceeds a first threshold value, if so, judging that the statement is real, and if not, judging that the statement is not real.
In some embodiments, the apparatus further comprises:
an initial score module, which is used for performing an initial search in a search engine by taking the statement as a search formula, selecting a plurality of initial candidate search results from the obtained search results, and obtaining an initial score representing the authenticity of the statement according to the support degree of the statement by each initial candidate search result and the support degree of each initial candidate search result by all other initial candidate search results;
and the second judgment module is used for judging whether the difference obtained by subtracting the confidence score from the initial score exceeds a second threshold value, if so, judging that the statement is not real, and otherwise, judging that the statement is real.
In some embodiments, the confidence score module comprises:
a vectorization unit, configured to obtain statement vector representations representing content features of the statements, and obtain a plurality of result vector representations representing content features of each candidate search result, respectively;
a statement-result unit for obtaining a plurality of statement-result vector representations representing the degrees of support of the statements by each candidate retrieval result, respectively, based on the statement vector representation and the result vector representation;
a result-result unit for obtaining, based on a plurality of said statement-result vector representations, a result-result vector representation representing the degree of support of each candidate search result by all other candidate search results, respectively;
and the confidence score unit is used for obtaining the confidence score according to the statement-result vector representation and the result-result vector representation.
In some embodiments, the statement-result unit is configured to process the statement vector representation and each result vector representation separately by a bidirectional attention mechanism, resulting in a plurality of sets of corresponding intermediate result vector representations and intermediate statement vector representations, and to concatenate each set of corresponding intermediate result vector representations and intermediate statement vector representations separately by a self-attention mechanism, resulting in a plurality of the statement-result vector representations;
the result-result unit is configured to process each statement-result vector representation with all other statement-result vector representations, respectively, by means of an attention mechanism, resulting in a plurality of said result-result vector representations.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
one or more processors;
memory having one or more programs stored thereon that, when executed by the one or more processors, cause the one or more processors to implement any of the above methods of determining the authenticity of a statement.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium having a computer program stored thereon, which, when executed by a processor, implements any of the above-mentioned methods of determining the authenticity of a statement.
In the embodiments of the present disclosure, the basis for judging the authenticity of the statement (the candidate search results) is obtained by searching in a search engine. Because the candidate search results come from a large number of different websites on the public network, they are, as a whole, highly reliable and timely. Therefore, compared with using a specific authoritative website or news item as the basis for judgment, the approach of the embodiments of the present disclosure avoids dependence on any individual piece of information and offers high judgment accuracy and broad applicability.
Further, data on the public network is constantly updated over time, and newly appearing data mostly concerns the "latest" facts. For example, while the president of University A is Mr. B, newly appearing content about the president of University A on the public network mostly relates to Mr. B; once the president of University A changes to Mr. F, newly appearing content about the president of University A becomes mostly related to Mr. F, and content relating to Mr. B no longer appears or appears only rarely. Therefore, the candidate retrieval results are also highly timely, i.e., they indicate well whether the statement is true at the present time (i.e., when the process of the embodiment of the disclosure is carried out), which improves the timeliness of the judgment.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
FIG. 1 is a flow chart of a method of determining the authenticity of a statement provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart of some of the steps in another method of determining the authenticity of a statement provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of some of the steps in another method of determining the authenticity of a statement provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of another method of determining statement authenticity provided by an embodiment of the present disclosure;
FIG. 5 is a flow chart of another method of determining statement authenticity provided by an embodiment of the present disclosure;
FIG. 6 is a flow chart of some of the steps in another method of determining the authenticity of a statement provided by an embodiment of the present disclosure;
FIG. 7 is a flow chart of some of the steps in another method for determining statement authenticity provided by an embodiment of the present disclosure;
FIG. 8 is a block diagram illustrating an apparatus for determining the authenticity of statements provided by embodiments of the present disclosure;
FIG. 9 is a block diagram illustrating an alternative apparatus for determining statement authenticity provided by embodiments of the present disclosure;
FIG. 10 is a block diagram of another apparatus for determining statement authenticity provided by an embodiment of the disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those skilled in the art, the method and apparatus for determining the authenticity of a statement, an electronic device, and a computer-readable medium provided in the present disclosure are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but which may be embodied in different forms and should not be construed as limited to the embodiments set forth in the disclosure. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As used in this disclosure, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used in the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
When the terms "comprises" and/or "comprising" are used in this disclosure, they specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Embodiments of the present disclosure may be described with reference to plan and/or cross-sectional views in light of idealized schematic illustrations of the present disclosure. Accordingly, the example illustrations can be modified in accordance with manufacturing techniques and/or tolerances.
Embodiments of the present disclosure are not limited to the embodiments shown in the drawings, but include modifications of configurations formed based on a manufacturing process. Thus, the regions illustrated in the figures have schematic properties, and the shapes of the regions shown in the figures illustrate specific shapes of regions of elements, but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used in this disclosure have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Description of technical terms
In the embodiments of the present disclosure, unless otherwise specified, the following technical terms should be understood in accordance with the following explanations:
Statement: a word or passage used to describe a certain concept or definition. A statement is necessarily an affirmative description and not a question; of course, a question may be included in a statement, for example, the statement may consist of a question and an answer to the question.
Authenticity (or reliability, credibility, etc.) of a statement: the concept or definition that a statement represents may be true at the present time (e.g., the statement "the earth revolves around the sun") or may not be true (e.g., the statement "the sun revolves around the earth"). Accordingly, the authenticity of a statement represents the extent or likelihood that, in the real world, the concept or definition that the statement represents is currently "true" or "factual". Specifically, authenticity may be represented by a confidence score (e.g., a value between 0 and 1), with a higher confidence score representing a higher likelihood that the statement is true; alternatively, authenticity may be represented by a judgment that the statement is "true" or "false", i.e., the statement is necessarily in one of the two cases "true" or "false".
Search engine: a tool for finding search results (e.g., web pages) related to a search formula in a public network.
FIG. 1 is a flow chart of a method of determining the authenticity of a statement according to an embodiment of the disclosure.
In a first aspect, referring to fig. 1, an embodiment of the present disclosure provides a method of determining statement authenticity, comprising:
S101, searching in a search engine by using the statement as a search formula, and selecting a plurality of search results as candidate search results.
In this step, a statement is used as a search formula (Query), a search is performed in a specific search engine to obtain a plurality of search results related to the statement, and then a plurality of search results are selected from the search results to serve as candidate search results for judging the authenticity of the statement.
Here, a statement refers to a word or passage used to describe a certain concept or definition; a statement is necessarily an affirmative description and not a question. Of course, the concept or definition that the statement represents may be true at the present time (e.g., the statement "the earth revolves around the sun") or may not be true (e.g., the statement "the sun revolves around the earth").
The specific form of the statement, as well as the applications of the embodiments of the present disclosure, may vary.
For example, the statement may be composed of a question and an answer to the question, where the answer may have been obtained by an automatic question-and-answer system; determining the authenticity of the statement is then equivalent to determining the correctness of the answer obtained by the automatic question-and-answer system, i.e., the embodiments of the present disclosure may be used for automatic question answering. Specifically, assuming the question is "Who is the incumbent president of University A?" and the automatic question-and-answer system gives the answer "Mr. B", the statement is "Who is the incumbent president of University A? Mr. B".
Alternatively, the statement may be a direct description, such as "the incumbent president of University A is Mr. B".
Specifically, the above statement may be text selected from an article, manually entered text, etc.; after its authenticity is verified, the text may be used as a basis for an automatic decision system or the like, i.e., the embodiments of the present disclosure may be used for automatic decision-making.
Specifically, the above statements may also be derived from the search results, for example, after a plurality of search results are obtained through keyword search, if the representative meanings of different search results are different, it is not possible to determine which search results are correct, so that a part (such as a title) of the search results may be used as the statements to verify which search results are correct, that is, the embodiment of the present disclosure may be used to verify the correctness of the search results.
S102, obtaining a confidence score representing the authenticity of the statement according to the support degree of the statement by each candidate retrieval result and the support degree of each candidate retrieval result by all other candidate retrieval results.
The confidence score represents the authenticity of the statement: the greater the value, the more likely the statement is true. For example, the confidence score may be a value between 0 and 1, with 0 representing the least plausibility of the statement and 1 representing the most plausibility of the statement.
In this step, from the perspective of text content, the support degree (or similarity, matching degree) of the statement by each candidate retrieval result is calculated, the support degree of each candidate retrieval result by all other candidate retrieval results is calculated, and a confidence score representing the authenticity of the statement is calculated according to these support degrees.
It can be seen that if a statement is currently true, most candidate search results obtained by using the statement as the search formula should express a meaning similar to the statement, i.e., they strongly support the statement (the candidate search results are all similar to the statement); likewise, these candidate search results expressing similar meanings should strongly support one another (the different candidate search results are relatively similar to each other). Therefore, from these support degrees, a confidence score representing the authenticity of the statement can be determined and used as the basis for judging that authenticity.
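The relationship between the two kinds of support degree and the confidence score can be sketched as follows. This is only a minimal illustration under stated assumptions: word-overlap (Jaccard) similarity is used as a crude stand-in for the support degree, and the two support degrees are averaged with equal weight; neither choice comes from the disclosure, which obtains the score from vector representations. The names `jaccard` and `confidence_score` are hypothetical.

```python
def jaccard(a, b):
    # Word-overlap similarity in [0, 1]; an assumed stand-in for "support degree".
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def confidence_score(statement, candidates):
    """Combine statement support and mutual support into one score in [0, 1]."""
    if not candidates:
        return 0.0
    per_candidate = []
    for i, cand in enumerate(candidates):
        # Support of the statement by this candidate result.
        stmt_support = jaccard(statement, cand)
        # Support of this candidate result by all other candidate results.
        others = [o for j, o in enumerate(candidates) if j != i]
        if others:
            mutual = sum(jaccard(cand, o) for o in others) / len(others)
        else:
            mutual = 0.0
        # Assumed combination rule: equal-weight average of the two supports.
        per_candidate.append(0.5 * stmt_support + 0.5 * mutual)
    return sum(per_candidate) / len(per_candidate)

candidates = [
    "the earth revolves around the sun once a year",
    "our planet earth revolves around the sun",
    "the sun is a star",
]
score_true = confidence_score("the earth revolves around the sun", candidates)
score_unrelated = confidence_score("bananas are blue vegetables", candidates)
```

With the same candidate results, the statement that shares content with them receives the higher score, because only the statement-support term differs between the two calls.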
In the embodiments of the present disclosure, the basis for judging the authenticity of the statement (the candidate search results) is obtained by searching in a search engine. Because the candidate search results come from a large number of different websites on the public network, they are, as a whole, highly reliable. Therefore, compared with using a specific authoritative website or news item as the basis for judgment, the approach of the embodiments of the present disclosure avoids dependence on any individual piece of information and offers high judgment accuracy and broad applicability.
Further, data on the public network is constantly updated over time, and newly appearing data mostly concerns the "latest" facts. For example, while the president of University A is Mr. B, newly appearing content about the president of University A on the public network mostly relates to Mr. B; once the president of University A changes to Mr. F, newly appearing content about the president of University A becomes mostly related to Mr. F, and content relating to Mr. B no longer appears or appears only rarely. Therefore, the candidate retrieval results are also highly timely, i.e., they indicate well whether the statement is true at the present time (i.e., when the process of the embodiment of the disclosure is carried out), which improves the timeliness of the judgment.
Referring to fig. 2, in some embodiments, searching in a search engine using the statement as the search formula and selecting a plurality of search results as candidate search results (step S101) includes:
S1011, retrieving in an intelligent retrieval engine by taking the statement as the retrieval formula to obtain a plurality of ranked retrieval results, and selecting the retrieval results ranked in a predetermined number of top positions as candidate retrieval results.
The search can be carried out in an intelligent search engine, and the search results in a predetermined number of top-ranked positions (e.g., the top N) are selected as candidate search results. Because an intelligent search engine automatically ranks the search results according to multiple factors such as reliability and timeliness, the top-ranked search results are more accurate, and using them as candidate search results gives a better judgment.
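Selecting the top N results can be sketched as below. The `search` callable, the default `top_n=5`, and the toy engine are hypothetical; no real search-engine API is implied.

```python
def select_candidates(statement, search, top_n=5):
    """Query one engine and keep its top_n highest-ranked results.

    `search` is a hypothetical callable: it takes the query string and
    returns results already ordered from most to least relevant.
    """
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    ranked = search(statement)
    return list(ranked[:top_n])

# Toy engine used only for illustration; it ignores the query.
def fake_engine(query):
    return ["page-1", "page-2", "page-3", "page-4"]

candidates = select_candidates(
    "the earth revolves around the sun", fake_engine, top_n=3
)
```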
Referring to fig. 3, in some embodiments, searching in a search engine using the statement as the search formula and selecting a plurality of search results as candidate search results (step S101) includes:
S1012, taking the statement as the search formula, searching in a plurality of search engines respectively, and selecting at least one search result from the search results of each search engine as a candidate search result.
That is, the search may be performed in a plurality of different search engines, so that each search engine yields a plurality of search results, and the final candidate search results also come from the plurality of search engines (for example, the search results in a predetermined number of top-ranked positions in each search engine are taken as candidate search results).
By simultaneously introducing a plurality of search engines, the diversity of search results can be further enriched, and the judgment accuracy is improved.
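A minimal sketch of pooling candidates from several engines follows. The engine callables and result strings are hypothetical, and removing duplicates that several engines return is an added assumption, not something the disclosure requires.

```python
def candidates_from_engines(statement, engines, per_engine=2):
    """Take the top per_engine results of every engine, in engine order.

    `engines` is a list of hypothetical callables, each returning a ranked
    list of results for the query. Results already collected from an earlier
    engine are skipped (an assumption made for this sketch).
    """
    merged, seen = [], set()
    for search in engines:
        for result in search(statement)[:per_engine]:
            if result not in seen:
                seen.add(result)
                merged.append(result)
    return merged

# Toy engines used only for illustration; they ignore the query.
def engine_a(query):
    return ["a.example/1", "shared.example/x", "a.example/2"]

def engine_b(query):
    return ["shared.example/x", "b.example/1", "b.example/2"]

pool = candidates_from_engines(
    "the earth revolves around the sun", [engine_a, engine_b]
)
```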
Referring to fig. 4, in some embodiments, after obtaining the confidence score representing the authenticity of the statement (S102), the method further comprises:
S1031, judging whether the confidence score exceeds a first threshold; if so, judging that the statement is true, and if not, judging that the statement is not true.
That is, the statement may be determined to be authentic or not, based directly on whether the above confidence score exceeds a predetermined first threshold.
For example, if the confidence score is a value between 0 and 1, the first threshold may be set to 0.5 (although other specific values are possible), such that the statement is judged to be not authentic when the confidence score is less than or equal to 0.5, and authentic when the confidence score is greater than 0.5.
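The first-threshold judgment of step S1031 reduces to a single comparison. In the sketch below the function name `is_true` is hypothetical, and the default of 0.5 is only the example value mentioned above.

```python
def is_true(confidence_score: float, first_threshold: float = 0.5) -> bool:
    """Judge the statement true only if the score strictly exceeds the threshold."""
    return confidence_score > first_threshold
```

A score exactly equal to the threshold is judged not true, matching the "less than or equal to 0.5" case in the example.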
Referring to fig. 5, in some embodiments, before performing retrieval in the retrieval engine with the statement as a retrieval formula (S101), the method further comprises:
S1001, performing an initial retrieval in the retrieval engine with the statement as a retrieval formula, and selecting a plurality of initial candidate retrieval results from the obtained retrieval results.
S1002, obtaining an initial score representing the truth of the statement according to the degree of support of the statement by each initial candidate retrieval result and the degree of support of each initial candidate retrieval result by all other initial candidate retrieval results.
After obtaining a confidence score representing the truth of the statement (S102), the method further comprises:
S1032, judging whether the difference obtained by subtracting the confidence score from the initial score exceeds a second threshold; if so, judging that the statement is not true, and if not, judging that the statement is true.
That is, in the case that a statement is determined to be necessarily authentic (e.g., determined manually), the statement may be retrieved and the corresponding confidence score may be obtained as an initial score according to the method of the embodiment of the present disclosure, and the initial score may be recorded; after a period of time, the statement may be retrieved again (e.g., according to a predetermined period) and a corresponding real-time confidence score may be obtained, and it is determined whether a decrease in the confidence score obtained each time with respect to the initial score (i.e., a difference between the initial score and the confidence score) exceeds a second threshold, if so, the statement is determined to be no longer true, and if not, the statement is determined to be still true.
Some statements are volatile, i.e., they may be true within a certain time frame but become false after an indefinite time has elapsed. For example, at a certain point in time the president of university A is Mr. B, but the president may be replaced, and the time of the replacement cannot be predicted. The statement "the president of university A is Mr. B" may therefore be true at a certain point in time and no longer true after a certain period, i.e., the statement has "volatility".
Meanwhile, different statements differ in the degree of attention they receive, the degree of social consensus about them, the degree to which they are indexed by the retrieval engine, and so on, so the confidence scores that the same algorithm gives to equally true statements can differ considerably. For this reason, an initial score may be obtained when a statement (preferably a volatile statement) is determined to be true, and whether it remains true (or when it stops being true) may then be judged from the degree to which its confidence score has decreased from the initial score, which improves the accuracy of the judgment.
Of course, it should be understood that some statements are non-volatile, such as the statement "the Earth revolves around the Sun", which will remain true for the foreseeable future; the above method may nevertheless also be applied to a non-volatile statement (in which case the judgment should always be that it is true).
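The second-threshold judgment of step S1032 can be sketched in the same way. The function name `still_true` and the numeric values in the usage lines are hypothetical; only the rule "initial score minus confidence score exceeds the second threshold" is taken from the text above.

```python
def still_true(initial_score: float,
               confidence_score: float,
               second_threshold: float) -> bool:
    """Judge whether a statement that was once true is still true.

    The statement is judged no longer true only when the drop from the
    recorded initial score exceeds the second threshold.
    """
    drop = initial_score - confidence_score
    return drop <= second_threshold


# Periodic re-check of a volatile statement (illustrative numbers only).
history = [still_true(0.9, score, 0.2) for score in (0.88, 0.8, 0.5)]
```

Because only the decrease relative to the statement's own initial score matters, a statement whose score is low in absolute terms can still be judged true.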
Referring to FIG. 6, in some embodiments, deriving a confidence score representative of the veracity of the statement based on the degree of support of the statement by each candidate search result and the degree of support of each candidate search result by all other candidate search results (S102) comprises:
S1021, obtaining a statement vector representation representing the content features of the statement, and respectively obtaining a plurality of result vector representations representing the content features of each candidate retrieval result.
The statement and the candidate retrieval results are texts composed of characters or words. To make them easier for the algorithm to process, the texts can first be converted into vector representations, that is, the statement and each candidate retrieval result are vectorized separately (this step functions as an encoding layer or representation layer).
For example, this step may specifically be to segment the statement and each candidate retrieval result into words, replace the words with BERT (Bidirectional Encoder Representations from Transformers) representations, and then deepen text understanding through a Bi-GRU (Bidirectional Gated Recurrent Unit) to obtain the vector representations of the statement and of each candidate retrieval result (the statement vector representation and the result vector representations).
In general, each candidate retrieval result includes a title and content, so the title and content of each candidate retrieval result may be concatenated into one paragraph (para), and the paragraph may then be vectorized.
Of course, it should be understood that it is also feasible if only one of the title and content of each candidate search result is vectorized.
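The preparation of the text that is later vectorized can be sketched as follows. The function name `build_paragraph` and the single-space separator are assumptions; the sketch covers both the case where title and content are joined and the case where only one of them is available.

```python
def build_paragraph(title: str, content: str, separator: str = " ") -> str:
    """Join the title and content of one candidate retrieval result.

    Empty or whitespace-only parts are skipped, so the function also
    works when only the title or only the content is used.
    """
    parts = [part.strip() for part in (title, content) if part and part.strip()]
    return separator.join(parts)


para = build_paragraph("President of university A", "Mr. B was appointed.")
```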
S1022, according to the statement vector representation and the result vector representation, a plurality of statement-result vector representations representing the support degree of the statement to each candidate retrieval result are respectively obtained.
The degree of support of the statement by each candidate retrieval result is calculated separately (this step functions as an interaction layer), and each degree of support is represented by the statement-result vector representation corresponding to that candidate retrieval result, so that a plurality of statement-result vector representations in one-to-one correspondence with the candidate retrieval results are obtained.
Referring to fig. 7, in some embodiments, this step (S1022) may include:
S10221, processing the statement vector representation with each result vector representation respectively through a bidirectional attention mechanism to obtain a plurality of sets of corresponding intermediate result vector representations and intermediate statement vector representations.
S10222, concatenating each set of corresponding intermediate result vector representations and intermediate statement vector representations respectively through a self-attention mechanism to obtain a plurality of statement-result vector representations.
As one way to obtain the statement-result vector representations, a bidirectional attention mechanism (Bi-Directional Attention Flow) and a self-attention mechanism (Self-Attention Mechanism) may be used.
Here, the attention (Attention) mechanism from document 1 to document 2 works as follows: a similarity calculation (such as dot product, concatenation, or a perceptron) is performed between each character (or word) in document 1 (Doc1) and each character (or word) in document 2 (Doc2) to obtain a multidimensional similarity representation; the similarities are then normalized (for example with a Softmax function) into a weight (or degree of importance) for each word; and the weights are multiplied with the corresponding words to obtain a new vector representation of document 2.
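The attention computation described above can be sketched in plain Python. This is one possible reading rather than the implementation of the disclosure: the similarity is taken to be a dot product, the Softmax is applied over the tokens of document 1, and each token of document 2 is re-expressed as the weighted sum of document 1 tokens. The names `softmax`, `dot` and `attend` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw similarity scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(doc1: List[Vector], doc2: List[Vector]) -> List[Vector]:
    """Attention from doc1 to doc2: a new representation of doc2.

    Assumption: each doc2 token becomes a similarity-weighted sum of
    the doc1 token vectors.
    """
    dim = len(doc1[0])
    new_doc2 = []
    for token2 in doc2:
        weights = softmax([dot(token1, token2) for token1 in doc1])
        new_doc2.append([
            sum(w * token1[i] for w, token1 in zip(weights, doc1))
            for i in range(dim)
        ])
    return new_doc2


new_repr = attend([[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0]])
```

In the usage line, the single document 2 token is far more similar to the first document 1 token, so almost all of the weight falls on that token.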
More specifically, the "bidirectional attention mechanism" refers to the following: with the statement as document 1 and each candidate retrieval result as document 2, a new vector representation of each candidate retrieval result (the intermediate result vector representation) is obtained; meanwhile, with each candidate retrieval result as document 1 and the statement as document 2, a new vector representation of the statement specific to each candidate retrieval result (the intermediate statement vector representation) is obtained. The intermediate result vector representation and the intermediate statement vector representation for one candidate retrieval result form one "set".
More specifically, the "self-attention mechanism" refers to the following: the corresponding intermediate result vector representation and intermediate statement vector representation produced by the above bidirectional attention mechanism (i.e., the two vector representations in one set) are concatenated to obtain the corresponding "statement-result vector representation"; here document 1 and document 2 are the same, both being the statement plus the candidate retrieval result.
The process of this step is similar to "reading comprehension", which refers to finding, in a candidate retrieval result, the position of the key content most relevant to the statement (or finding the answer to the statement). The embodiments of the present disclosure differ from "reading comprehension" in that, although a weight is obtained for each word, the position of the key content is not determined from the weights; instead, the intermediate result vector representation and the intermediate statement vector representation are concatenated to obtain the statement-result vector representation representing the degree of support of the statement by each candidate retrieval result.
The method is therefore equivalent to applying single-text reading comprehension technology to multiple texts while omitting the operation of determining the position of the key content, so that deep learning technology can be fully utilized and laborious feature engineering that generalizes poorly is avoided.
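Steps S10221 and S10222 can be sketched as follows, under simplifying assumptions: dot-product attention is applied in both directions, and the two intermediate representations are mean-pooled before being concatenated. The mean pooling is a stand-in chosen for brevity and is not the self-attention step of the disclosure; all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(doc1: List[Vector], doc2: List[Vector]) -> List[Vector]:
    """Dot-product attention from doc1 to doc2 (new representation of doc2)."""
    dim = len(doc1[0])
    out = []
    for tok2 in doc2:
        weights = softmax([sum(a * b for a, b in zip(tok1, tok2)) for tok1 in doc1])
        out.append([sum(w * tok1[i] for w, tok1 in zip(weights, doc1))
                    for i in range(dim)])
    return out


def mean_pool(tokens: List[Vector]) -> Vector:
    """Assumption: average the token vectors into one fixed-size vector."""
    dim = len(tokens[0])
    return [sum(tok[i] for tok in tokens) / len(tokens) for i in range(dim)]


def statement_result_vector(statement: List[Vector],
                            result: List[Vector]) -> Vector:
    # Direction 1: statement as document 1, result as document 2.
    intermediate_result = attend(statement, result)
    # Direction 2: result as document 1, statement as document 2.
    intermediate_statement = attend(result, statement)
    # Concatenate the two members of the set into one vector.
    return mean_pool(intermediate_result) + mean_pool(intermediate_statement)


vec = statement_result_vector([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])
```

Calling `statement_result_vector` once per candidate retrieval result yields one vector per candidate, matching the one-to-one correspondence described above.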
S1023, according to the plurality of statement-result vector representations, respectively obtaining result-result vector representations representing the degree of support of each candidate retrieval result by all other candidate retrieval results.
The above statement-result vector representations each relate to a different candidate retrieval result, so they also represent the contents of the candidate retrieval results. The degree of support of each statement-result vector representation by all other statement-result vector representations can therefore be calculated next, giving a result-result vector representation for each candidate retrieval result.
For example, assuming that there are m candidate retrieval results in total, that the statement-result vector representation corresponding to the nth candidate retrieval result is denoted Vec1(n), and that the result-result vector representation corresponding to the nth candidate retrieval result is denoted Vec2(n), Vec2(n) is calculated from Vec1(1), Vec1(2), …, Vec1(n-1) and Vec1(n+1), …, Vec1(m). It should be understood that the result-result vector representations obtained in this way correspond one-to-one to the candidate retrieval results, i.e., the total number of Vec2 is also m.
Referring to fig. 7, in some embodiments, the present step (S1023) may include:
S10231, processing each statement-result vector representation with all other statement-result vector representations respectively through an attention mechanism to obtain a plurality of result-result vector representations.
This step may also be performed using an attention mechanism. The calculation for each statement-result vector representation may specifically include: taking this statement-result vector representation as document 2 and each of the other statement-result vector representations as document 1, and performing the attention calculation for each pair to obtain a plurality of results; applying dimension reduction (such as addition or max pooling) to these results to obtain an intermediate result representing the degree of support of this statement-result vector representation by the other statement-result vector representations; and concatenating the intermediate result with this statement-result vector representation to obtain the result-result vector representation corresponding to it (or to the candidate retrieval result).
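The calculation of Vec2(n) from all Vec1(k) with k ≠ n can be sketched as follows. The sketch assumes that each statement-result vector representation is a short sequence of token vectors of equal dimension, uses dot-product attention, and uses element-wise addition as the dimension reduction; the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Sequence2D = List[Vector]


def attend(doc1: Sequence2D, doc2: Sequence2D) -> Sequence2D:
    """Dot-product attention from doc1 to doc2 (new representation of doc2)."""
    dim = len(doc1[0])
    out = []
    for tok2 in doc2:
        scores = [sum(a * b for a, b in zip(tok1, tok2)) for tok1 in doc1]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        weights = [e / sum(exps) for e in exps]
        out.append([sum(w * tok1[i] for w, tok1 in zip(weights, doc1))
                    for i in range(dim)])
    return out


def result_result_vectors(vec1: List[Sequence2D]) -> List[Sequence2D]:
    """Compute Vec2(n) for every n from all Vec1(k) with k != n."""
    if len(vec1) < 2:
        raise ValueError("at least two candidate results are required")
    vec2 = []
    for n, target in enumerate(vec1):
        # The target is document 2; every other representation is document 1.
        attended = [attend(other, target)
                    for k, other in enumerate(vec1) if k != n]
        # Dimension reduction by element-wise addition over the other results.
        summed = [
            [sum(a[t][i] for a in attended) for i in range(len(attended[0][t]))]
            for t in range(len(target))
        ]
        # Concatenate the intermediate result with the original representation.
        vec2.append([tok + extra for tok, extra in zip(target, summed)])
    return vec2


vec2 = result_result_vectors([[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]])
```

The output contains one entry per input, so the number of Vec2 equals the number of Vec1.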
S1024, obtaining a confidence score according to the statement-result vector representations and the result-result vector representations.
A confidence score is derived based on the statement-result vector representation and the result-result vector representation for each candidate search result.
Specifically, the above vector representations can be passed through a linear layer followed by a Sigmoid activation function, which yields a confidence score between 0 and 1.
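The linear layer followed by a Sigmoid can be sketched as follows. The sketch assumes that the vector representations have already been flattened into one feature vector; the weights and bias in the usage line are arbitrary illustrative numbers, not trained parameters, and the function names are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping any real value into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def confidence_score(features: List[float],
                     weights: List[float],
                     bias: float = 0.0) -> float:
    """Linear layer (weighted sum plus bias) followed by a Sigmoid."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(logit)


score = confidence_score([1.0, 2.0], [0.5, -0.25], bias=0.1)
```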
It should be appreciated that, based on the confidence score, whether the statement is true may then be judged in the manner described above using the first threshold, the second threshold, and so on; alternatively, the confidence score may be used directly as a characterization of the truth of the statement.
It should be understood that the above procedure is equivalent to deriving a classification result (true class or false class) from certain information, so it can in essence be a deep learning classification model. That is, the classification model can be preset with a layer structure (such as an input layer, an encoding layer/representation layer, an interaction layer, a linear layer, and an output layer) for implementing the above functions, and its parameters can be trained and adjusted using a large number of preset positive samples (statements determined to be true) and negative samples (statements determined to be false), so as to obtain the final classification model.
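Training with positive and negative samples can be illustrated on a deliberately small scale. The sketch below swaps the deep learning model for a single linear layer with a Sigmoid (logistic regression) trained by stochastic gradient descent, purely to show how labelled samples adjust the parameters; the feature values, learning rate and epoch count are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(x: List[float], weights: List[float], bias: float) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_classifier(samples: List[List[float]],
                     labels: List[int],
                     epochs: int = 200,
                     lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit the linear layer on labelled samples (1 = true, 0 = false)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Gradient of the cross-entropy loss with respect to the logit.
            error = predict(x, weights, bias) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Toy one-dimensional features: positive samples near 1, negative near 0.
weights, bias = train_classifier([[1.0], [0.0], [0.9], [0.1]], [1, 0, 1, 0])
```

After training, `predict` returns a value above 0.5 for inputs resembling the positive samples and below 0.5 for inputs resembling the negative samples.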
Of course, it should be understood that it is also feasible to implement the above operation process by using a plurality of preset algorithms, formulas and the like instead of using the deep learning classification model.
Fig. 8 is a block diagram of an apparatus for determining the authenticity of a statement in accordance with an embodiment of the disclosure.
In a second aspect, referring to fig. 8, an embodiment of the present disclosure provides an apparatus for judging the authenticity of a statement, including:
the retrieval module is used for retrieving in a retrieval engine by taking the statement as a retrieval formula and selecting a plurality of retrieval results as candidate retrieval results;
and the confidence score module is used for obtaining a confidence score representing the truth of the statement according to the support degree of the statement by each candidate retrieval result and the support degree of each candidate retrieval result by all other candidate retrieval results.
In some embodiments, the retrieval module is configured to retrieve in an intelligent retrieval engine with the statement as a retrieval formula to obtain a plurality of ranked retrieval results, and to select the retrieval results ranked within a predetermined number of top positions as candidate retrieval results.
In some embodiments, the retrieval module is configured to perform retrieval in a plurality of retrieval engines respectively by using the statement as a retrieval formula, and select at least one of the retrieval results of each retrieval engine as a candidate retrieval result.
Referring to fig. 9, in some embodiments, the apparatus further comprises:
and the first judgment module is used for judging whether the confidence score exceeds a first threshold value, judging that the statement is true if the confidence score exceeds the first threshold value, and judging that the statement is not true if the confidence score does not exceed the first threshold value.
Referring to fig. 10, in some embodiments, the apparatus further comprises:
the initial score module is used for carrying out initial retrieval in a retrieval engine by taking the statement as a retrieval formula, selecting a plurality of initial candidate retrieval results from the obtained retrieval results, and obtaining an initial score representing the truth of the statement according to the support degree of the statement by each initial candidate retrieval result and the support degree of each initial candidate retrieval result by all other initial candidate retrieval results;
and the second judgment module is used for judging whether the difference obtained by subtracting the confidence score from the initial score exceeds a second threshold value, if so, the statement is not true, and if not, the statement is true.
Referring to figs. 9 and 10, in some embodiments, the confidence score module includes:
a vectorization unit for obtaining a statement vector representation representing content features of the statement, and obtaining a plurality of result vector representations representing content features of each candidate retrieval result respectively;
a statement-result unit for deriving, based on the statement vector representation and the result vector representation, a plurality of statement-result vector representations representing degrees of support of the statements by each of the candidate retrieval results, respectively;
a result-result unit for obtaining, based on the plurality of statement-result vector representations, a result-result vector representation representing a degree of support of each candidate search result by all other candidate search results, respectively;
and the confidence score unit is used for obtaining a confidence score according to the statement-result vector representation and the result-result vector representation.
In some embodiments, the statement-result unit is configured to process the statement vector representation with each result vector representation respectively through a bidirectional attention mechanism to obtain a plurality of sets of corresponding intermediate result vector representations and intermediate statement vector representations, and to concatenate each set of corresponding intermediate result vector representations and intermediate statement vector representations respectively through a self-attention mechanism to obtain a plurality of the statement-result vector representations;
the result-result unit is configured to process each statement-result vector representation with all other statement-result vector representations respectively through an attention mechanism to obtain a plurality of result-result vector representations.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
one or more processors;
memory having one or more programs stored thereon that, when executed by the one or more processors, cause the one or more processors to implement any of the above methods of determining the authenticity of a statement.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable medium, on which a computer program is stored, where the program, when executed by a processor, implements any one of the above methods for determining the authenticity of a statement.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM (random access memory), ROM (read only memory), EEPROM (electrically erasable programmable read only memory), flash memory or other memory technology, CD-ROM (compact disc read only memory), Digital Versatile Discs (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. 
In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The present disclosure has disclosed example embodiments, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (10)

1. A method of determining the authenticity of a statement, comprising:
carrying out initial search in a search engine by taking the statement as a search formula, and selecting a plurality of initial candidate search results from the obtained search results;
obtaining an initial score representing the truth of the statement according to the support degree of the statement by each initial candidate retrieval result and the support degree of each initial candidate retrieval result by all other initial candidate retrieval results;
searching in a search engine by taking the statement as a search formula, and selecting a plurality of search results as candidate search results;
obtaining a confidence score representing the truth of the statement according to the support degree of the statement by each candidate retrieval result and the support degree of each candidate retrieval result by all other candidate retrieval results;
judging whether the difference obtained by subtracting the confidence score from the initial score exceeds a second threshold value, if so, judging that the statement is not real, and if not, judging that the statement is real;
the obtaining a confidence score representing the truth of the statement according to the support degree of the statement by each candidate retrieval result and the support degree of each candidate retrieval result by all other candidate retrieval results comprises:
obtaining a statement vector representation representing the content features of the statement, and respectively obtaining a plurality of result vector representations representing the content features of each candidate retrieval result;
according to the statement vector representation and the result vector representation, respectively obtaining a plurality of statement-result vector representations representing the support degree of the statement by each candidate retrieval result;
according to the statement-result vector representations, respectively obtaining result-result vector representations representing the support degree of each candidate retrieval result by all other candidate retrieval results;
the confidence score is derived from each of the statement-result vector representations and the result-result vector representations.
2. The method of claim 1, wherein the retrieving in the retrieval engine with the statement as a retrieval formula, and selecting a plurality of candidate retrieval results from the obtained retrieval results comprises:
and retrieving in an intelligent retrieval engine by taking the statement as a retrieval formula to obtain a plurality of ranked retrieval results, and selecting the retrieval results ranked within a predetermined number of top positions as candidate retrieval results.
3. The method of claim 1, wherein the retrieving in the retrieval engine with the statement as a retrieval formula, and selecting a plurality of candidate retrieval results from the obtained retrieval results comprises:
and taking the statement as a retrieval formula, respectively retrieving in a plurality of retrieval engines, and selecting at least one retrieval result as a candidate retrieval result from the retrieval results of each retrieval engine.
4. The method of claim 1, wherein,
the obtaining a plurality of statement-result vector representations representing the support degree of the statement by each candidate retrieval result respectively according to the statement vector representation and the result vector representation comprises:
processing the statement vector representations and each result vector representation respectively through a bidirectional attention mechanism to obtain a plurality of groups of corresponding intermediate result vector representations and intermediate statement vector representations;
stitching each set of corresponding intermediate result vector representations and intermediate statement vector representations respectively through a self-attention mechanism to obtain a plurality of statement-result vector representations;
the obtaining, according to the plurality of statement-result vector representations, a result-result vector representation representing a degree of support of each candidate search result by all other candidate search results, respectively, includes:
each statement-result vector representation is processed separately from all other statement-result vector representations by an attention mechanism, resulting in a plurality of said result-result vector representations.
5. An apparatus for determining the authenticity of a statement, comprising:
an initial score module, which is used for carrying out initial search in a search engine by taking the statement as a search formula, selecting a plurality of initial candidate search results from the obtained search results, and obtaining an initial score representing the truth of the statement according to the support degree of the statement by each initial candidate search result and the support degree of each initial candidate search result by all other initial candidate search results;
the retrieval module is used for retrieving in a retrieval engine by taking the statement as a retrieval formula and selecting a plurality of retrieval results as candidate retrieval results;
the confidence score module is used for obtaining a confidence score representing the truth of the statement according to the support degree of the statement by each candidate retrieval result and the support degree of each candidate retrieval result by all other candidate retrieval results;
the second judgment module is used for judging whether the difference obtained by subtracting the confidence score from the initial score exceeds a second threshold value, if so, the statement is judged not to be real, and if not, the statement is judged to be real;
the confidence score module comprises:
a vectorization unit configured to obtain a statement vector representation representing content features of the statement, and obtain a plurality of result vector representations representing content features of each of the candidate search results, respectively;
a statement-result unit for obtaining a plurality of statement-result vector representations representing the degrees of support of the statements by each candidate retrieval result, respectively, based on the statement vector representation and the result vector representation;
a result-result unit for obtaining, based on a plurality of said statement-result vector representations, a result-result vector representation representing a degree of support of each candidate search result by all other candidate search results, respectively;
and the confidence score unit is used for obtaining the confidence score according to the statement-result vector representation and the result-result vector representation.
6. The apparatus of claim 5, wherein,
the retrieval module is used for retrieving in an intelligent retrieval engine by taking the statement as a retrieval formula to obtain a plurality of ranked retrieval results, and selecting the retrieval results ranked within a predetermined number of top positions as candidate retrieval results.
7. The apparatus of claim 5, wherein,
the retrieval module is used for taking the statement as a retrieval formula, respectively retrieving in a plurality of retrieval engines, and selecting at least one retrieval result as a candidate retrieval result from the retrieval results of each retrieval engine.
8. The apparatus of claim 5, wherein,
the statement-result unit is used for processing the statement vector representation and each result vector representation respectively through a bidirectional attention mechanism to obtain multiple groups of corresponding intermediate result vector representations and intermediate statement vector representations, and splicing each group of corresponding intermediate result vector representations and intermediate statement vector representations respectively through a self-attention mechanism to obtain multiple statement-result vector representations;
the result-result unit is adapted to process each statement-result vector representation with all other statement-result vector representations, respectively, by means of an attention mechanism, resulting in a plurality of said result-result vector representations.
9. An electronic device, comprising:
one or more processors;
memory having one or more programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement a method of determining the authenticity of a statement according to any of claims 1 to 4.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method of judging the authenticity of a statement according to any one of claims 1 to 4.
CN201910634928.8A 2019-07-15 2019-07-15 Method and device for judging the authenticity of a statement, electronic device, readable medium Active CN110362735B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910634928.8A CN110362735B (en) 2019-07-15 2019-07-15 Method and device for judging the authenticity of a statement, electronic device, readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910634928.8A CN110362735B (en) 2019-07-15 2019-07-15 Method and device for judging the authenticity of a statement, electronic device, readable medium

Publications (2)

Publication Number Publication Date
CN110362735A CN110362735A (en) 2019-10-22
CN110362735B true CN110362735B (en) 2022-05-13

Family

ID=68219293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910634928.8A Active CN110362735B (en) 2019-07-15 2019-07-15 Method and device for judging the authenticity of a statement, electronic device, readable medium

Country Status (1)

Country Link
CN (1) CN110362735B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737580B (en) * 2020-06-30 2021-01-29 深圳市中电网络技术有限公司 Information verification method and device, computer equipment and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630788A (en) * 2014-10-28 2016-06-01 佳能株式会社 Method and device for determining approximate judgment with distinctive truth
CN108509463A (en) * 2017-02-28 2018-09-07 华为技术有限公司 A kind of answer method and device of problem
CN109033262A (en) * 2018-07-09 2018-12-18 北京寻领科技有限公司 Question and answer knowledge base update method and device
CN109783631A (en) * 2019-02-02 2019-05-21 北京百度网讯科技有限公司 Method of calibration, device, computer equipment and the storage medium of community's question and answer data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10073845B2 (en) * 2015-01-07 2018-09-11 International Business Machines Corporation Estimating article publication dates and authors based on social media context

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Detecting Rumor and Disinformation by Web Mining"; Boris Galitsky; "Sociotechnical Behavior Mining: From Data to Decisions? Papers from the 2015 AAAI Spring Symposium"; 20160915; pp. 16-23 *

Also Published As

Publication number Publication date
CN110362735A (en) 2019-10-22

Similar Documents

Publication Publication Date Title
US9449271B2 (en) Classifying resources using a deep network
US11868724B2 (en) Generating author vectors
CN106339756B (en) Generation method, searching method and the device of training data
Al-Yahya et al. Arabic fake news detection: comparative study of neural networks and transformer-based approaches
CN105144164B (en) Scoring concept terms using a deep network
WO2016210268A1 (en) Selecting representative video frames for videos
CN111263238B (en) Method and equipment for generating video comments based on artificial intelligence
CN112528010B (en) Knowledge recommendation method and device, computer equipment and readable storage medium
CN111310464A (en) Word vector acquisition model generation method and device and word vector acquisition method and device
CN113011172A (en) Text processing method and device, computer equipment and storage medium
CN114625858A (en) Intelligent government affair question-answer replying method and device based on neural network
CN116150399A (en) Knowledge graph question-answering method, electronic equipment and storage medium
CN110362735B (en) Method and device for judging the authenticity of a statement, electronic device, readable medium
US20240119078A1 (en) Sorting documents according to comprehensibility scores determined for the documents
CN114329051A (en) Data information identification method, device, equipment, storage medium and program product
CN106021346B (en) Retrieval processing method and device
US20210011921A1 (en) Method, apparatus, electronic device and computer readable medium for obtaining answer to question
CN115203388A (en) Machine reading understanding method and device, computer equipment and storage medium
CN114510561A (en) Answer selection method, device, equipment and storage medium
CN114329181A (en) Question recommendation method and device and electronic equipment
CN113688633A (en) Outline determination method and device
CN112395405B (en) Query document sorting method and device and electronic equipment
CN111026850A (en) Intellectual property matching technology of bidirectional coding representation of self-attention mechanism
CN111522903A (en) Deep hash retrieval method, equipment and medium
Putra et al. Influence of Sentiment on Mandiri Bank Stocks (BMRI) Using Feature Expansion with FastText and Logistic Regression Classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant