WO2019019969A1 - Knowledge verification method, knowledge verification device and storage medium - Google Patents

Knowledge verification method, knowledge verification device and storage medium

Info

Publication number
WO2019019969A1
Authority
WO
WIPO (PCT)
Prior art keywords
knowledge
evidence
conflict
target
candidate knowledge
Prior art date
Application number
PCT/CN2018/096652
Other languages
English (en)
French (fr)
Inventor
张振中
陈雪
Original Assignee
BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BOE Technology Group Co., Ltd. (京东方科技集团股份有限公司)
Priority to EP18838259.2A (EP3660693A4)
Publication of WO2019019969A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • G06N 5/046 Forward inferencing; Production systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/02 Knowledge representation; Symbolic representation
    • G06N 5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00 Computing arrangements based on specific mathematical models
    • G06N 7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • Embodiments of the present disclosure relate to a knowledge verification method, a knowledge verification device, and a storage medium.
  • Big data refers to data collections that cannot be captured, managed, and processed with conventional software tools within a certain time frame. Big data is characterized by large data volume, many data types, fast data processing speed, and low data value density.
  • Big data includes structured, semi-structured, and unstructured data.
  • With the rapid development of social networks, the Internet of Things, cloud computing, and the like, unstructured data has grown exponentially owing to its huge volume, many types, and strong timeliness, and has gradually become the mainstream data of the big data era.
  • At least one embodiment of the present disclosure provides a knowledge verification method, including: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge; acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge; calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
  • At least one embodiment of the present disclosure also provides a knowledge verification device, including a processor and a memory for storing non-transitory computer instructions that, when executed by the processor, perform the following operations: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge; acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge; calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
  • At least one embodiment of the present disclosure also provides a storage medium for storing non-transitory computer instructions that, when executed by a processor, perform the following operations: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge; acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge; calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
  • FIG. 1 is a schematic flowchart of a knowledge verification method provided by at least one embodiment of the present disclosure;
  • FIG. 2A is a schematic block diagram of a target evidence group/conflict evidence group provided by at least one embodiment of the present disclosure;
  • FIG. 2B is a schematic block diagram of logic rules of a target evidence group/conflict evidence group provided by at least one embodiment of the present disclosure;
  • FIG. 3 is a schematic flowchart of still another knowledge verification method provided by at least one embodiment of the present disclosure;
  • FIG. 4A is a schematic block diagram of an example of a target evidence group provided by at least one embodiment of the present disclosure;
  • FIG. 4B is a schematic block diagram of an example of a conflict evidence group provided by at least one embodiment of the present disclosure;
  • FIG. 5 is a schematic flowchart of another knowledge verification method provided by at least one embodiment of the present disclosure;
  • FIG. 6 is a schematic flowchart of yet another knowledge verification method provided by at least one embodiment of the present disclosure;
  • FIG. 7 is a schematic block diagram of a knowledge verification device provided by at least one embodiment of the present disclosure.
  • At least one embodiment of the present disclosure provides a knowledge verification method, a knowledge verification device, and a storage medium.
  • The knowledge verification method can model the logic rules of each piece of evidence for the candidate knowledge and calculate the verification probability of the candidate knowledge according to those logic rules, thereby automatically verifying the correctness of the candidate knowledge, resolving knowledge conflicts, and saving labor and time.
  • For example, the knowledge verification method and the knowledge verification device provided by the embodiments of the present disclosure can automatically analyze, process, and acquire useful knowledge from massive, unstructured big data and verify the correctness of the acquired knowledge.
  • FIG. 1 is a schematic flowchart of a knowledge verification method according to at least one embodiment of the present disclosure.
  • For example, as shown in FIG. 1, the knowledge verification method provided by the embodiments of the present disclosure may include the following operations:
  • Operation S11: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge;
  • Operation S12: acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge;
  • Operation S13: calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group;
  • Operation S14: comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
  • For example, the basic idea of a Markov logic network is that when an event violates one of a series of logic rules, the probability that the event holds is reduced, but it does not become impossible. The fewer logic rules an event violates, the more likely the event is to hold. Therefore, each logic rule is assigned a specific weight that reflects its binding force on possible events that satisfy the rule: the larger the weight of a logic rule, the greater the difference between two events that do and do not satisfy the rule.
  • The compatibility between the target candidate knowledge (or the conflict candidate knowledge) and the existing correct knowledge and the data source depends on how many logic rules it violates and on the importance of those rules.
  • The knowledge verification method provided by the embodiments of the present disclosure can model the logic rules between the extracted candidate knowledge and the evidence groups by means of a Markov logic network (for example, the logic rules between the target candidate knowledge and the target evidence group, and the logic rules between the conflict candidate knowledge and the conflict evidence group), calculate the verification probability of the extracted target candidate knowledge and the verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the evidence groups, and determine whether the extracted target candidate knowledge is correct knowledge according to the comparison result between the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge.
  • both the target candidate knowledge and the conflict candidate knowledge are extracted from the data source.
  • a data source can consist of unstructured data.
  • the data source can be a separate collection of different types of knowledge, such as a collection of medical knowledge, a collection of literary knowledge, a collection of historical knowledge, and a collection of physical knowledge.
  • the data source can also be a mixed set of various different knowledge (eg, physical, historical, mathematical, etc.).
  • various unstructured data in a data source can be a variety of knowledge from different sources.
  • the source of various knowledge can be textbooks, websites, essays, and literary works.
  • the source of medical knowledge may be a medical website, a medical essay, a medical textbook, and a medical record.
  • the knowledge verification method provided by the embodiment of the present disclosure is described in detail by taking a data source as a set of medical knowledge as an example.
  • the data source can also be other types of data sources.
  • a plurality of candidate knowledge may be extracted from the data source to form a candidate knowledge group; for example, all candidate knowledge in the data source may also be formed into a candidate knowledge group.
  • the target candidate knowledge and the conflict candidate knowledge may each be selected from the candidate knowledge group.
  • For example, the multiple pieces of candidate knowledge in the candidate knowledge group can be "vitamin C can prevent colds", "calcium helps prevent osteoporosis", "vitamin C cannot prevent colds", "shrimp skin can prevent osteoporosis", "lemons can prevent colds", and so on.
  • For example, there may be much contradictory knowledge in the candidate knowledge group; for instance, "vitamin C can prevent colds" and "vitamin C cannot prevent colds" in the above candidate knowledge group contradict each other.
  • When "vitamin C can prevent colds" is selected as the target candidate knowledge, "vitamin C cannot prevent colds" is the conflict candidate knowledge.
  • For example, natural language processing (NLP) techniques can be used to extract the target candidate knowledge and the conflict candidate knowledge from the data source.
  • natural language processing can include language processing techniques such as syntax analysis, word segmentation, lexical analysis, semantic analysis, and text recognition.
  • natural language processing can be processed by a method such as a deep learning neural network.
  • the use of deep learning neural networks to process unstructured data in data sources can improve the accuracy of selected target candidate knowledge and/or conflict candidate knowledge.
  • the deep learning neural network may include a neural network such as Recurrent Neural Networks (RNN) or Recursive Neural Networks (RNN).
  • Recurrent neural networks can be used for natural language processing tasks such as word vector representation, sentence validity checking, and part-of-speech tagging.
  • For example, the recurrent neural network may include a Long Short-Term Memory (LSTM) neural network. LSTM neural networks can learn long-term dependencies and can use context information over a wide range in text processing to determine the probability of the next word.
  • the deep learning neural network may, for example, analyze and process the natural language using one or a combination of the above neural networks.
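  • As an illustrative sketch (not part of the present disclosure), the snippet below shows an LSTM next-word model of the kind described above; the library (PyTorch), the class name, the vocabulary size, and the dimensions are all assumptions chosen for the example.

```python
# Minimal sketch: an LSTM language model that scores the probability of the next word.
import torch
import torch.nn as nn

class NextWordLSTM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices
        hidden_states, _ = self.lstm(self.embed(token_ids))
        return self.out(hidden_states)   # (batch, seq_len, vocab_size) next-word logits

model = NextWordLSTM()
tokens = torch.randint(0, 10000, (1, 12))          # a dummy 12-word sentence
next_word_probs = model(tokens).softmax(dim=-1)    # probability of each possible next word
```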
  • FIG. 2A shows a schematic block diagram of a target evidence group/conflict evidence group provided by at least one embodiment of the present disclosure;
  • FIG. 2B shows a schematic block diagram of the logic rules of a target evidence group/conflict evidence group provided by at least one embodiment of the present disclosure.
  • the target evidence set may be used to determine the likelihood that the target candidate knowledge is correct
  • the conflict evidence set may be used to determine the likelihood that the conflict candidate knowledge is correct.
  • However, the present disclosure is not limited thereto: the target evidence group can also be used to determine the likelihood that the target candidate knowledge is wrong, and the conflict evidence group can also be used to determine the likelihood that the conflict candidate knowledge is wrong.
  • The knowledge verification method provided by the embodiments of the present disclosure can model the logic rules of each piece of evidence for the candidate knowledge and calculate the verification probability of the candidate knowledge according to those logic rules, thereby automatically analyzing, processing, and acquiring useful knowledge from massive, unstructured big data, verifying the correctness of the acquired knowledge, resolving knowledge conflicts, and saving labor and time.
  • For example, as shown in FIG. 2A, each evidence group may include at least one of source evidence 102, redundancy evidence 103, and representation style evidence 104 (for example, the target evidence group may include at least one of source evidence 102, redundancy evidence 103, and representation style evidence 104; likewise, the conflict evidence group may include at least one of source evidence 102, redundancy evidence 103, and representation style evidence 104).
  • For the source evidence 102, the source evidence 102 in the target evidence group represents the source of the target candidate knowledge, and the source evidence 102 in the conflict evidence group represents the source of the conflict candidate knowledge.
  • source evidence 102, redundancy evidence 103, and presentation style evidence 104 may all be from a data source.
  • the target evidence group and the conflict evidence group may also include multiple pieces of evidence from the data source (for example, the evidence T shown in FIG. 2A).
  • Embodiments of the present disclosure do not limit the specific types of evidence in the target evidence set and the conflict evidence set.
  • the type of evidence in the target evidence group and the type of evidence in the conflict evidence group may be the same.
  • source evidence 102 may include evidence from multiple different sources.
  • the source evidence 102 may include, for example, first source evidence and second source evidence, and the first source evidence and the second source evidence are derived from medical textbooks and medical papers, respectively.
  • the target evidence group and the conflict evidence group may also include consistency evidence 101.
  • the consistency evidence 101 can be from an existing knowledge base.
  • The existing knowledge base can, for example, represent a collection of all or part of the existing correct knowledge.
  • existing knowledge bases and data sources can be selected based on target candidate knowledge and conflict candidate knowledge.
  • For example, when the target candidate knowledge is medical knowledge, the data source may be a collection of medical knowledge, and the existing knowledge base may be a collection of existing correct medical knowledge.
  • For example, the evidence in the target evidence group and the evidence in the conflict evidence group should correspond to each other and be equal in number.
  • evidence in the target evidence group and the conflict evidence group can also be obtained from data sources and/or existing knowledge bases using natural language processing techniques such as deep learning neural networks.
  • For example, as shown in FIG. 2B, the logic rule of the source evidence 102 can be expressed as: mentions(y, S); the logic rule of the redundancy evidence 103 can be expressed as: occurrence_count(y, N); the logic rule of the representation style evidence 104 can be expressed as: representation_style(y, M); and the logic rule of the consistency evidence 101 can be expressed as: first existing knowledge ∧ second existing knowledge => y.
  • For example, when y represents the target candidate knowledge, S represents the source of the target candidate knowledge, N represents the number of times the target candidate knowledge appears, and M represents the number of different ways the target candidate knowledge is expressed; when y represents the conflict candidate knowledge, S represents the source of the conflict candidate knowledge, N represents the number of times the conflict candidate knowledge appears, and M represents the number of different ways the conflict candidate knowledge is expressed.
  • For example, the basic idea of the source evidence 102 is that more authoritative information sources (i.e., knowledge sources) are more likely to contain correct knowledge.
  • For example, when y represents the target candidate knowledge, the weight W2 of the source evidence 102 can be expressed as the authority of S; the higher the authority of S, the greater the probability that the target candidate knowledge is correct.
  • The weights W2 of source evidence 102 from different sources may be different or the same.
  • For example, the weight W2 of the source evidence 102 can be preset: when S is a medical textbook, its weight W2 may be 10; when S is a medical paper, its weight W2 may also be 10; when S is a medical record, its weight W2 may be 9; and when S is a medical website, its weight W2 may be 5.
  • For example, if the target candidate knowledge (e.g., "vitamin C can prevent colds") comes from a medical website while the conflict candidate knowledge (e.g., "vitamin C cannot prevent colds") comes from a medical textbook, then the weight W2 of the source evidence 102 of the target candidate knowledge is 5 and the weight W2 of the source evidence 102 of the conflict candidate knowledge is 10, so the probability that the target candidate knowledge (e.g., "vitamin C can prevent colds") is correct is smaller than the probability that the conflict candidate knowledge (e.g., "vitamin C cannot prevent colds") is correct.
  • For example, the basic idea of the redundancy evidence 103 is that, compared with wrong knowledge, correct knowledge is likely to appear in more information sources.
  • For example, the weight W3 of the redundancy evidence 103 can be expressed as log_a N.
  • For example, suppose the target candidate knowledge (e.g., "vitamin C can prevent colds") appears in 8 medical textbooks while the conflict candidate knowledge (e.g., "vitamin C cannot prevent colds") appears in 16 medical textbooks. Then, if a is 2, the weight W3 of the redundancy evidence 103 of the target candidate knowledge is log_2 8 = 3 and the weight W3 of the redundancy evidence 103 of the conflict candidate knowledge is log_2 16 = 4, so the probability that the target candidate knowledge is correct is smaller than the probability that the conflict candidate knowledge is correct.
  • For example, the basic idea of the representation style evidence 104 is that, compared with wrong knowledge, correct knowledge is likely to be expressed in more different ways.
  • For example, the weight W4 of the representation style evidence 104 can be expressed as log_a M.
  • For example, for the target candidate knowledge (e.g., "vitamin C can prevent colds"), suppose there are 4 different expressions of it in the entire data source, such as "vitamin C can effectively prevent colds" and "taking vitamin C tablets can prevent colds"; and for the conflict candidate knowledge (e.g., "vitamin C cannot prevent colds"), suppose there are 8 different expressions in the entire data source, such as "vitamin C plays little role in preventing and treating colds" and "taking vitamin C to prevent colds has no effect". Then, if a is 2, the weight W4 of the representation style evidence 104 of the target candidate knowledge is log_2 4 = 2 and the weight W4 of the representation style evidence 104 of the conflict candidate knowledge is log_2 8 = 3, so the probability that the target candidate knowledge (e.g., "vitamin C can prevent colds") is correct is smaller than the probability that the conflict candidate knowledge (e.g., "vitamin C cannot prevent colds") is correct.
  • It should be noted that, in the above description, log_a denotes the logarithm with base a.
  • The weight W3 of the redundancy evidence and the weight W4 of the representation style evidence are not limited to the above functional expressions; they may also be other functions of N and M, respectively.
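  • As an illustrative sketch (not from the patent text), the snippet below computes the redundancy and representation style weights for the vitamin C example above; the function names are ours.

```python
import math

def redundancy_weight(n_occurrences, a=2):
    # W3 = log_a(N): how many information sources mention the statement
    return math.log(n_occurrences, a)

def style_weight(n_expressions, a=2):
    # W4 = log_a(M): how many distinct wordings of the statement were found
    return math.log(n_expressions, a)

# Numbers from the vitamin C example in the text (a = 2):
print(redundancy_weight(8), redundancy_weight(16))   # 3.0 vs 4.0
print(style_weight(4), style_weight(8))              # 2.0 vs 3.0
```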
  • For example, the basic idea of the consistency evidence 101 is that, compared with wrong knowledge, correct knowledge should be compatible with the existing correct knowledge; that is, correct knowledge should not conflict with the existing correct knowledge. The logic rule of the consistency evidence 101, "first existing knowledge ∧ second existing knowledge => y", means that the candidate knowledge y (for example, the target candidate knowledge or the conflict candidate knowledge) can be derived from the first existing knowledge and the second existing knowledge, i.e., y conflicts with neither of them.
  • In the logic rule of the consistency evidence 101, the first existing knowledge and the second existing knowledge are both knowledge in the existing knowledge base, that is, both are existing correct knowledge.
  • The logic rule of the consistency evidence 101 is a constraint rule between the existing correct knowledge and the target candidate knowledge. For example, suppose the target candidate knowledge is "shrimp skin can prevent osteoporosis", the conflict candidate knowledge is "shrimp skin cannot prevent osteoporosis", and the existing knowledge base contains first existing knowledge "shrimp skin contains calcium" and second existing knowledge "calcium can prevent osteoporosis". According to the logic rule of the consistency evidence 101 (i.e., first existing knowledge ∧ second existing knowledge => y), y = "shrimp skin can prevent osteoporosis" can be derived, so the target candidate knowledge does not conflict with the existing correct knowledge while the conflict candidate knowledge does, and the probability that the target candidate knowledge is correct is greater than the probability that the conflict candidate knowledge is correct.
  • For example, in one example, the first existing knowledge can be expressed as "contains(K, M)", the second existing knowledge can be expressed as "prevents(M, D)", and y can be expressed as "prevents(K, D)", where K may be a food, a medicine, or the like, M may be an element or substance contained in K, and D may be a symptom, a disease, or the like. Thus, the logic rule of the consistency evidence 101 can be modeled as "contains(K, M) ∧ prevents(M, D) => prevents(K, D)".
  • For example, if the first existing knowledge is "lemons contain a large amount of vitamin C", the second existing knowledge is "vitamin C can prevent colds", and y is "lemons can prevent colds", the logic rule of the consistency evidence 101 is expressed as: contains(lemon, vitamin C) ∧ prevents(vitamin C, cold) => prevents(lemon, cold).
  • For example, the weight W1 of the consistency evidence 101 is expressed as the logical value of the logic rule of the consistency evidence 101: when the logical value is true, the weight W1 is 1, and when the logical value is false, the weight W1 is 0.
  • For example, suppose the target candidate knowledge is "lemons can prevent colds", the conflict candidate knowledge is "lemons cannot prevent colds", the first existing knowledge is "lemons contain a large amount of vitamin C", and the second existing knowledge is "vitamin C can prevent colds". Based on the logic rule of the consistency evidence 101, the weight W1 of the consistency evidence 101 of the target candidate knowledge is 1 and the weight W1 of the consistency evidence 101 of the conflict candidate knowledge is 0, so the probability that the target candidate knowledge is correct is greater than the probability that the conflict candidate knowledge is correct.
  • For example, the existing knowledge base may include a plurality of pieces of existing correct knowledge (for example, the first existing knowledge, the second existing knowledge, the third existing knowledge, and the fourth existing knowledge shown in FIG. 2B).
  • A plurality of pieces of existing correct knowledge may constitute a plurality of pieces of consistency evidence 101 (e.g., the consistency evidence 101a and the consistency evidence 101b shown in FIG. 2A), and the plurality of pieces of consistency evidence 101 may have a plurality of weights W1 (e.g., the weight W1a and the weight W1b shown in FIG. 2A).
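  • As an illustrative sketch (not from the patent), the snippet below evaluates the consistency rule "contains(K, M) ∧ prevents(M, D) => prevents(K, D)" against a tiny knowledge base built from the examples above; the triple representation and the helper name are assumptions.

```python
# A tiny knowledge base holding the existing correct knowledge from the examples above.
existing_knowledge = {
    ("contains", "shrimp skin", "calcium"),
    ("prevents", "calcium", "osteoporosis"),
    ("contains", "lemon", "vitamin C"),
    ("prevents", "vitamin C", "cold"),
}

def consistency_weight(k, d):
    """W1 = 1 if prevents(k, d) is derivable as contains(k, m) ∧ prevents(m, d), else 0."""
    for relation, subject, m in existing_knowledge:
        if relation == "contains" and subject == k and ("prevents", m, d) in existing_knowledge:
            return 1
    return 0

print(consistency_weight("shrimp skin", "osteoporosis"))  # 1: supported by the knowledge base
print(consistency_weight("lemon", "cold"))                 # 1: supported by the knowledge base
print(consistency_weight("lemon", "osteoporosis"))         # 0: not derivable
```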
  • For example, the verification probability of the target candidate knowledge may be the compatibility probability of the target candidate knowledge with the data source and the existing knowledge base; that is, the verification probability of the target candidate knowledge is the probability that the target candidate knowledge is correct.
  • Likewise, the verification probability of the conflict candidate knowledge may be the compatibility probability of the conflict candidate knowledge with the data source and the existing knowledge base; that is, the verification probability of the conflict candidate knowledge is the probability that the conflict candidate knowledge is correct.
  • For another example, the verification probability of the target candidate knowledge may also be the incompatibility probability of the target candidate knowledge with the data source and the existing knowledge base; that is, the verification probability of the target candidate knowledge is the probability that the target candidate knowledge is wrong.
  • Likewise, the verification probability of the conflict candidate knowledge may also be the incompatibility probability of the conflict candidate knowledge with the data source and the existing knowledge base; that is, the verification probability of the conflict candidate knowledge is the probability that the conflict candidate knowledge is wrong.
  • It should be noted that, in the embodiments of the present disclosure, the verification probability is described taking the correct probability as an example, but the verification probability may also be the error probability; the embodiments of the present disclosure are not limited in this respect.
  • FIG. 3 is a schematic flowchart of still another knowledge verification method provided by at least one embodiment of the present disclosure.
  • operation S14 may include the following operations:
  • Operation S141: determining whether the verification probability of the target candidate knowledge is greater than the verification probability of the conflict candidate knowledge;
  • if not, operation S142 is performed: determining the conflict candidate knowledge as correct knowledge;
  • if yes, operation S143 is performed: determining the target candidate knowledge as correct knowledge.
  • For example, based on the logic rules of the pieces of evidence modeled by the Markov logic network, the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge can both be expressed as: P(y) = (1/Z) exp( ∑_{i=1}^{T} W_i f_i(y) )  (1), where Z is the normalization factor.
  • When y represents the target candidate knowledge, P(y) in the above formula (1) is the verification probability of the target candidate knowledge, f_i(y) is the logical value of the logic rule of the i-th evidence in the target evidence group (f_i(y) = 1 means the logic rule of the i-th evidence is true, and f_i(y) = 0 means it is false), W_i is the weight of the i-th evidence in the target evidence group, and T is the number of pieces of evidence in the target evidence group.
  • When y represents the conflict candidate knowledge, P(y) in the above formula (1) is the verification probability of the conflict candidate knowledge, f_i(y) is the logical value of the logic rule of the i-th evidence in the conflict evidence group, W_i is the weight of the i-th evidence in the conflict evidence group, and T is the number of pieces of evidence in the conflict evidence group.
  • For example, the vertices in a Markov logic network are ground predicates or ground atoms, and the logical relationships between the ground predicates or ground atoms are ground formulas. Each ground predicate or ground atom corresponds to a binary node (i.e., the feature value of the ground predicate or ground atom): if the ground predicate or ground atom is true, the corresponding binary node takes the value 1; if it is false, the corresponding binary node takes the value 0.
  • Each ground formula corresponds to a feature value: if the ground formula is true, the corresponding feature value is 1; if it is false, the corresponding feature value is 0.
  • For example, the source evidence 102, the redundancy evidence 103, and the representation style evidence 104 are all ground predicates or ground atoms, and the consistency evidence 101 is a ground formula. For the source evidence 102, the redundancy evidence 103, and the representation style evidence 104, the logic rule f_i(y) is true, i.e., f_i(y) = 1; for the consistency evidence 101, f_i(y) = 1 if the target candidate knowledge (or the conflict candidate knowledge) is compatible with the existing correct knowledge, and f_i(y) = 0 otherwise.
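  • As an illustrative sketch (not from the patent), formula (1) can be implemented as follows; the function names are ours, and normalizing over exactly two mutually exclusive candidates is a simplification for the two-candidate comparison described in the text.

```python
import math

def verification_score(weights, truth_values):
    # Un-normalized score exp(sum_i W_i * f_i(y)) from formula (1); the factor 1/Z is
    # omitted here because Z is identical for the target and the conflict candidate.
    return math.exp(sum(w * f for w, f in zip(weights, truth_values)))

def normalized_probabilities(target_score, conflict_score):
    # Treat the two mutually exclusive candidates as the whole event space,
    # so Z = score(target) + score(conflict).
    z = target_score + conflict_score
    return target_score / z, conflict_score / z
```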
  • FIG. 4A illustrates an example of a set of target evidence provided by at least one embodiment of the present disclosure
  • FIG. 4B illustrates an example of a set of conflicting evidence provided by at least one embodiment of the present disclosure.
  • the target candidate knowledge is "shrimp skin can prevent osteoporosis” and the conflict candidate knowledge is "shrimp skin cannot prevent osteoporosis”.
  • W 2 10
  • the conflict candidate knowledge ie, "shrimp skin cannot prevent osteoporosis"
  • the verification probability of the target candidate knowledge can be calculated.
  • the verification probability of the target candidate knowledge is expressed as follows:
  • the verification probability of conflict candidate knowledge is expressed as follows:
  • Z is the same for both target candidate knowledge and conflict candidate knowledge.
  • the target candidate knowledge can be determined as correct knowledge.
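  • As an illustrative sketch (not from the patent), the comparison for this example can be written as follows; the conflict-group weights are purely assumed, since FIG. 4B is not reproduced in the text.

```python
import math

# Target evidence group from FIG. 4A: W1 = 1 (consistency), W2 = 10 (source),
# W3 = log2(8) = 3 (redundancy), W4 = log2(4) = 2 (representation style), all f_i(y) = 1.
target_score = math.exp(1 + 10 + 3 + 2)      # exp(16), up to the shared factor 1/Z

# Assumed conflict-group weights (FIG. 4B is not given in the text): a weaker source,
# fewer occurrences and phrasings, and no support from the existing knowledge base.
conflict_score = math.exp(0 + 5 + 2 + 1)     # exp(8), up to the same factor 1/Z

# Z cancels out, so the un-normalized scores can be compared directly.
print("target is correct" if target_score > conflict_score else "conflict is correct")
```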
  • For example, the knowledge verification method may further include outputting the correct knowledge.
  • For example, the output correct knowledge can be shown on a display or played back as speech through a speaker.
  • the knowledge verification method can output all or part of the correct knowledge.
  • For example, the knowledge verification method may further include the following operation:
  • when the target candidate knowledge is determined to be correct knowledge, operation S22 is performed: outputting the target candidate knowledge.
  • FIG. 5 is a schematic flowchart of another knowledge verification method provided by at least one embodiment of the present disclosure.
  • For example, the knowledge verification method can also output the correct knowledge that the user desires to view, for example, display N pieces of correct knowledge.
  • For example, the knowledge verification method may also perform the following operations:
  • Operation S15: obtaining the verification probabilities of R pieces of correct knowledge and the verification probabilities of R pieces of erroneous knowledge that respectively contradict the R pieces of correct knowledge;
  • Operation S16: calculating the ratios of the verification probabilities of the R pieces of correct knowledge to the verification probabilities of the R pieces of erroneous knowledge;
  • Operation S17: sorting the R pieces of correct knowledge according to the ratio;
  • Operation S18: outputting the sorted N pieces of correct knowledge.
  • a plurality of correct knowledge and its verification probability and a plurality of erroneous knowledge and its verification probability may be determined according to the method illustrated in FIGS. 1 and/or 3.
  • the correct knowledge may be the target candidate knowledge or the conflict candidate knowledge; accordingly, the error knowledge may be the conflict candidate knowledge or the target candidate knowledge.
  • For example, the ratio can be expressed as: ratio = P(correct knowledge) / P(error knowledge).
  • Here, P(correct knowledge) may be P(target candidate knowledge) or P(conflict candidate knowledge); correspondingly, P(error knowledge) is P(conflict candidate knowledge) or P(target candidate knowledge).
  • N is a positive integer and N ≤ R.
  • N can be the amount of correct knowledge that the user desires to display.
  • N may be related to the number of candidate knowledge of the candidate knowledge set, N being, for example, 10% of the number of candidate knowledge.
  • the embodiment of the present disclosure does not specifically limit N.
  • N correct knowledge can correspond to the largest N ratios.
  • the N correct knowledge can be the target candidate knowledge with the largest N ratios. But not limited to this, N correct knowledge can also correspond to the smallest N ratios.
  • R correct knowledge can be all correct knowledge, ie R is the number of all correct knowledge; R correct knowledge can also be part of the correct knowledge.
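  • As an illustrative sketch (not from the patent), operations S15 to S18 can be written as follows; the example data and the function name are invented for the illustration.

```python
# Rank verified correct knowledge by the ratio P(correct) / P(error) and output the top N.
def top_n_correct(verified_triples, n):
    """verified_triples: list of (correct_knowledge, p_correct, p_error) tuples."""
    ranked = sorted(verified_triples, key=lambda item: item[1] / item[2], reverse=True)
    return [knowledge for knowledge, _, _ in ranked[:n]]

triples = [
    ("shrimp skin can prevent osteoporosis", 0.92, 0.08),
    ("lemons can prevent colds",             0.70, 0.30),
    ("calcium helps prevent osteoporosis",   0.97, 0.03),
]
print(top_n_correct(triples, n=2))
```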
  • the knowledge verification method provided by the embodiment of the present disclosure may also output information such as a ratio of correct knowledge, a verification probability of correct knowledge, and the like. It should be noted that the knowledge verification method provided by the embodiment of the present disclosure may also output error knowledge.
  • FIG. 6 is a schematic flowchart of still another knowledge verification method provided by at least one embodiment of the present disclosure.
  • the knowledge verification method may further include the following operations:
  • Operation S31: obtaining a candidate knowledge group from the data source;
  • Operation S32: selecting the target candidate knowledge from the candidate knowledge group;
  • Operation S33: determining whether conflict candidate knowledge that conflicts with the target candidate knowledge exists in the candidate knowledge group; if conflict candidate knowledge exists in the candidate knowledge group, proceeding to operation S11; if no conflict candidate knowledge exists in the candidate knowledge group, performing operation S34: determining whether the target candidate knowledge contradicts the existing knowledge in the existing knowledge base; if yes, performing operation S35: determining that the target candidate knowledge is erroneous knowledge; if not, performing operation S36: determining that the target candidate knowledge is correct knowledge.
  • For example, the candidate knowledge group may be composed of all the candidate knowledge in the data source.
  • For example, when the target candidate knowledge is determined to be correct knowledge in operation S36, it means that neither the data source nor the existing knowledge base contains knowledge that contradicts the target candidate knowledge. Therefore, the target candidate knowledge can be directly determined as correct knowledge, and the target candidate knowledge can be output as needed.
  • For another example, the candidate knowledge group may be composed of a plurality of pieces of candidate knowledge extracted from the data source.
  • For example, when conflict candidate knowledge exists in the candidate knowledge group, operation S142 or operation S143 shown in FIG. 3 may then be performed.
  • For example, when the target candidate knowledge (or the conflict candidate knowledge) is determined to be correct knowledge, it can be stored in the correct knowledge group.
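  • As an illustrative sketch (not from the patent), the decision flow of FIG. 6 can be written as follows; find_conflict, contradicts, and compare_probabilities stand in for components the text leaves abstract.

```python
def verify(target, candidate_group, knowledge_base,
           find_conflict, contradicts, compare_probabilities):
    """Sketch of operations S31 to S36 for one piece of target candidate knowledge."""
    conflict = find_conflict(target, candidate_group)                 # S33
    if conflict is not None:
        # S11 to S14: decide by comparing the two verification probabilities
        return compare_probabilities(target, conflict)
    if any(contradicts(target, known) for known in knowledge_base):   # S34
        return False                                                   # S35: erroneous knowledge
    return True                                                        # S36: correct knowledge
```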
  • For example, outputting the correct knowledge can include the following operations: obtaining correct knowledge from the correct knowledge group; constructing erroneous knowledge that contradicts the correct knowledge; obtaining a correct evidence group of the correct knowledge and an erroneous evidence group of the erroneous knowledge; calculating the verification probability of the correct knowledge based on the logic rules of the pieces of evidence in the correct evidence group; calculating the verification probability of the erroneous knowledge based on the logic rules of the pieces of evidence in the erroneous evidence group; calculating the ratio of the verification probability of the correct knowledge to the verification probability of the corresponding erroneous knowledge; sorting the correct knowledge according to the ratio; and outputting the sorted N pieces of correct knowledge.
  • For example, for correct knowledge verified by the method shown in FIG. 1 and/or FIG. 3, the verification probability of the correct knowledge and the verification probability of the erroneous knowledge can be obtained directly (for example, see operation S13 in FIG. 1 and/or FIG. 3), so the ratio between the verification probability of the correct knowledge and the verification probability of the erroneous knowledge can be calculated directly; alternatively, the ratio between the verification probability of the correct knowledge and the verification probability of the erroneous knowledge may already have been obtained (for example, see operation S16 in FIG. 5). The correct knowledge is then sorted according to the ratio.
  • FIG. 7 is a schematic block diagram of a knowledge verification device according to at least one embodiment of the present disclosure.
  • the knowledge verification apparatus 200 may include a processor 201, a memory 202, and a display 203. It should be noted that the components of the knowledge verification device shown in FIG. 7 are merely exemplary and not limiting, and the knowledge verification device may have other components depending on actual application needs.
  • components such as processor 201, memory 202, and display 203 can communicate over a network connection.
  • the components such as the processor 201, the memory 202, and the display 203 can communicate with each other directly or indirectly.
  • the network can include wireless networks, wired networks, and/or any combination of wireless networks and wired networks.
  • the network may include a local area network, the Internet, a telecommunications network, an Internet of Things based Internet and/or telecommunications network, and/or any combination of the above networks, and the like.
  • For example, the wired network may communicate via twisted pair, coaxial cable, or optical fiber transmission, and the wireless network may use, for example, a wireless communication network such as a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi.
  • the disclosure does not limit the type and function of the network.
  • For example, the processor 201 can be a central processing unit (CPU) or another form of processing unit having data processing capabilities and/or program execution capabilities, such as a field programmable gate array (FPGA) or a tensor processing unit (TPU). The processor 201 can control other components in the knowledge verification device to perform the desired functions.
  • the central processing unit (CPU) may be an X86 or ARM architecture or the like.
  • memory 202 can include any combination of one or more computer program products, which can include various forms of computer readable storage media, such as volatile memory and/or nonvolatile memory.
  • Volatile memory can include, for example, random access memory (RAM) and/or caches and the like.
  • the non-volatile memory may include, for example, a read only memory (ROM), a hard disk, an erasable programmable read only memory (EPROM), a portable compact disk read only memory (CD-ROM), a USB memory, a flash memory, and the like.
  • One or more non-transitory computer instructions may be stored on the memory 202, and the processor 201 may execute the non-transitory computer instructions to perform various functions.
  • Various applications and various data, such as the data source, the existing knowledge base, the weights, and the verification probabilities, may also be stored in the computer-readable storage medium.
  • the display 203 can be a liquid crystal display (LCD), an organic light emitting diode display (OLED), or the like.
  • the knowledge verification device may further include an input device (eg, a touch device, a keyboard, a microphone, a mouse, etc.), a speaker, and the like according to actual needs.
  • the user can utilize the display 203 and input devices and the like to implement interaction with the knowledge verification device 200. For example, the user can view the correct knowledge through the display 203, and can input candidate knowledge or the like to be verified through the input device.
  • For example, when the non-transitory computer instructions are executed by the processor 201, the following operations may be performed: acquiring the target candidate knowledge and the conflict candidate knowledge that contradicts the target candidate knowledge; acquiring the target evidence group of the target candidate knowledge and the conflict evidence group of the conflict candidate knowledge; calculating the verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating the verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
  • natural language processing can be used to extract target candidate knowledge and conflict candidate knowledge.
  • The natural language processing may, for example, use a deep learning neural network (for example, a recurrent neural network, a recursive neural network, etc.) for language processing.
  • both the target evidence group and the conflict evidence group include at least one of source evidence, redundancy evidence, and presentation style evidence.
  • Source evidence, redundancy evidence, and presentation style evidence are derived from data sources.
  • the target evidence group and the conflict evidence group also include consistency evidence, and the consistency evidence comes from the existing knowledge base.
  • For example, when y represents the target candidate knowledge, S represents the source of the target candidate knowledge, N represents the number of times the target candidate knowledge appears, and M represents the number of different ways the target candidate knowledge is expressed; when y represents the conflict candidate knowledge, S represents the source of the conflict candidate knowledge, N represents the number of times the conflict candidate knowledge appears, and M represents the number of different ways the conflict candidate knowledge is expressed.
  • For example, the weight of the source evidence can be expressed as the authority of S, the weight of the redundancy evidence can be expressed as log_a N, and the weight of the representation style evidence can be expressed as log_a M, where log_a denotes the logarithm with base a; the weight of the consistency evidence can be expressed as the logical value of the logic rule of the consistency evidence.
  • For example, embodiments of the present disclosure employ a Markov logic network to model the logic rules of the pieces of evidence and calculate the verification probability of the target candidate knowledge (or the conflict candidate knowledge) based on those logic rules. For example, based on the logic rules of the pieces of evidence modeled by the Markov logic network, the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge can both be expressed as P(y) = (1/Z) exp( ∑_{i=1}^{T} W_i f_i(y) ), where Z is the normalization factor.
  • When y represents the target candidate knowledge, P(y) is the verification probability of the target candidate knowledge, f_i(y) is the feature value of the logic rule of the i-th evidence in the target evidence group, W_i is the weight of the i-th evidence in the target evidence group, and T is the number of pieces of evidence in the target evidence group; when y represents the conflict candidate knowledge, P(y) is the verification probability of the conflict candidate knowledge, f_i(y) is the feature value of the logic rule of the i-th evidence in the conflict evidence group, W_i is the weight of the i-th evidence in the conflict evidence group, and T is the number of pieces of evidence in the conflict evidence group.
  • For example, when the verification probability is the correct probability and the non-transitory computer instructions are executed by the processor 201, the operation of "comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result" includes: determining whether the verification probability of the target candidate knowledge is greater than the verification probability of the conflict candidate knowledge; if not, determining the conflict candidate knowledge as correct knowledge; if yes, determining the target candidate knowledge as correct knowledge.
  • the following operations may also be implemented: obtaining verification probabilities of R correct knowledge and verification probabilities of R erroneous knowledge contradicting R correct knowledge; The ratio of the verification probability of R correct knowledge to the verification probability of R error knowledge; sorting R correct knowledge according to the ratio; outputting the N correct knowledge after sorting.
  • N is a positive integer and N ≤ R.
  • N can be the amount of correct knowledge that the user desires to display.
  • For example, when the non-transitory computer instructions are executed by the processor 201, the following operations may also be performed: outputting the sorted N pieces of correct knowledge to the display 203, and displaying the sorted N pieces of correct knowledge on the display 203.
  • N correct knowledge can correspond to the largest N ratios.
  • For example, when the non-transitory computer instructions are executed by the processor 201, the following operations may also be performed: acquiring a candidate knowledge group from the data source; selecting the target candidate knowledge from the candidate knowledge group; determining whether conflict candidate knowledge that conflicts with the target candidate knowledge exists in the candidate knowledge group; if conflict candidate knowledge exists in the candidate knowledge group, calculating the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result between the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge; if no conflict candidate knowledge exists in the candidate knowledge group, determining whether the target candidate knowledge contradicts the existing knowledge in the existing knowledge base, and if so, determining that the target candidate knowledge is erroneous knowledge, and if not, determining that the target candidate knowledge is correct knowledge.
  • the target candidate knowledge when the target candidate knowledge (or conflict candidate knowledge) is determined to be correct knowledge, the target candidate knowledge (or conflict candidate knowledge) can be stored in the correct knowledge set.
  • For example, when the non-transitory computer instructions are executed by the processor 201, the following operations can also be performed: obtaining correct knowledge from the correct knowledge group; constructing erroneous knowledge that contradicts the correct knowledge; obtaining a correct evidence group of the correct knowledge and an erroneous evidence group of the erroneous knowledge; calculating the verification probability of the correct knowledge based on the logic rules of the pieces of evidence in the correct evidence group; calculating the verification probability of the erroneous knowledge based on the logic rules of the pieces of evidence in the erroneous evidence group; calculating the ratio of the verification probability of the correct knowledge to the verification probability of the corresponding erroneous knowledge; sorting the correct knowledge according to the ratio; and outputting the sorted N pieces of correct knowledge.
  • At least one embodiment of the present disclosure also provides a storage medium for storing non-transitory computer instructions.
  • When the non-transitory computer instructions are executed by the processor, the following operations may be performed: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge; acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge; calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
  • the storage medium may be applied to the knowledge verification device described in any of the above embodiments, for example, it may be the memory 202 in the knowledge verification device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Analysis (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A knowledge verification method, a knowledge verification device, and a storage medium. The knowledge verification method includes: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge (S11); acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge (S12); calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group (S13); and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result (S14).

Description

Knowledge verification method, knowledge verification device, and storage medium
This application claims priority to Chinese Patent Application No. 201710606293.1, filed on July 24, 2017; the disclosure of the above Chinese patent application is incorporated herein by reference in its entirety as part of this application.
Technical Field
Embodiments of the present disclosure relate to a knowledge verification method, a knowledge verification device, and a storage medium.
Background
In fields such as scientific research, Internet applications, and e-commerce, the scale and variety of data are growing rapidly, and big data has gradually become a research hotspot. Big data refers to data collections that cannot be captured, managed, and processed with conventional software tools within a certain time frame. Big data is characterized by large data volume, many data types, fast data processing speed, and low data value density.
Big data includes structured, semi-structured, and unstructured data. With the rapid development of social networks, the Internet of Things, cloud computing, and the like, unstructured data has grown exponentially owing to its huge volume, many types, and strong timeliness, and has gradually become the mainstream data of the big data era.
Summary
At least one embodiment of the present disclosure provides a knowledge verification method, including: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge; acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge; calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
At least one embodiment of the present disclosure further provides a knowledge verification device, including a processor and a memory, the memory storing non-transitory computer instructions that, when executed by the processor, perform the following operations: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge; acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge; calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
At least one embodiment of the present disclosure further provides a storage medium storing non-transitory computer instructions that, when executed by a processor, perform the following operations: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge; acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge; calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and are not limiting of the present disclosure.
FIG. 1 is a schematic flowchart of a knowledge verification method provided by at least one embodiment of the present disclosure;
FIG. 2A is a schematic block diagram of a target evidence group/conflict evidence group provided by at least one embodiment of the present disclosure;
FIG. 2B is a schematic block diagram of logic rules of a target evidence group/conflict evidence group provided by at least one embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of still another knowledge verification method provided by at least one embodiment of the present disclosure;
FIG. 4A is a schematic block diagram of an example of a target evidence group provided by at least one embodiment of the present disclosure;
FIG. 4B is a schematic block diagram of an example of a conflict evidence group provided by at least one embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of another knowledge verification method provided by at least one embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of yet another knowledge verification method provided by at least one embodiment of the present disclosure; and
FIG. 7 is a schematic block diagram of a knowledge verification device provided by at least one embodiment of the present disclosure.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure are described clearly and completely below with reference to the drawings of the embodiments of the present disclosure. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the ordinary meanings understood by a person with ordinary skill in the field to which the present disclosure belongs. The terms "first", "second", and similar words used in the present disclosure do not denote any order, quantity, or importance, but are only used to distinguish different components. Words such as "comprise" or "include" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like are only used to indicate relative positional relationships, which may change correspondingly when the absolute position of the described object changes. To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of known functions and known components are omitted in the present disclosure.
With the rapid development of cloud computing, people pay more and more attention to big data. The big data era has two effects: on the one hand, the increase of data can satisfy people's different information needs; on the other hand, useful information and knowledge are hidden in a large amount of irrelevant data. Automatically extracting knowledge of a specified domain from massive unstructured data can help people quickly grasp knowledge and deepen their understanding of it. However, among the various pieces of knowledge automatically extracted from massive data, there may be conflicting and contradictory knowledge. At present, the correctness of the extracted knowledge is usually judged by domain experts to resolve knowledge conflicts. Such expert-based judgment requires a great deal of time and labor and is not suitable for judging the massive knowledge of the big data era.
At least one embodiment of the present disclosure provides a knowledge verification method, a knowledge verification device, and a storage medium. The knowledge verification method can model the logic rules of each piece of evidence for candidate knowledge and calculate the verification probability of the candidate knowledge according to those logic rules, thereby automatically verifying the correctness of the candidate knowledge, resolving knowledge conflicts, and saving labor and time. For example, the knowledge verification method and knowledge verification device provided by the embodiments of the present disclosure can automatically analyze, process, and acquire useful knowledge from massive unstructured big data and verify the correctness of the acquired knowledge.
The knowledge verification method, knowledge verification device, and storage medium provided by the embodiments of the present disclosure are described in detail below with reference to the drawings.
FIG. 1 shows a schematic flowchart of a knowledge verification method provided by at least one embodiment of the present disclosure.
For example, as shown in FIG. 1, the knowledge verification method provided by the embodiments of the present disclosure may include the following operations:
Operation S11: acquiring target candidate knowledge and conflict candidate knowledge that contradicts the target candidate knowledge;
Operation S12: acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge;
Operation S13: calculating a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and calculating a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group;
Operation S14: comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the comparison result.
For example, the basic idea of a Markov logic network is that when an event violates one of a series of logic rules, the probability that the event holds is reduced, but it does not become impossible. The fewer logic rules an event violates, the more likely the event is to hold. Therefore, each logic rule is assigned a specific weight that reflects its binding force on possible events that satisfy the rule: the larger the weight of a logic rule, the greater the difference between two events that do and do not satisfy the rule. The compatibility between the target candidate knowledge (or the conflict candidate knowledge) and the existing correct knowledge and the data source depends on how many logic rules it violates and on the importance of those rules.
The knowledge verification method provided by the embodiments of the present disclosure can model the logic rules between the extracted candidate knowledge and the evidence groups by means of a Markov logic network (for example, the logic rules between the target candidate knowledge and the target evidence group, and the logic rules between the conflict candidate knowledge and the conflict evidence group), calculate the verification probability of the extracted target candidate knowledge and the verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the evidence groups, and determine whether the extracted target candidate knowledge is correct knowledge according to the comparison result between the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge. For example, in operation S11, both the target candidate knowledge and the conflict candidate knowledge are extracted from the data source. The data source can consist of unstructured data.
For example, the data source can be a separate collection of a particular type of knowledge, such as a collection of medical knowledge, a collection of literary knowledge, a collection of historical knowledge, or a collection of physical knowledge. For another example, the data source can also be a mixed collection of various kinds of knowledge (for example, physics, history, mathematics, and the like).
For example, the various unstructured data in the data source can be knowledge from different sources. The sources of the knowledge can be textbooks, websites, papers, literary works, and the like. For example, when the data source is a collection of medical knowledge, the sources of the medical knowledge can be medical websites, medical papers, medical textbooks, medical records, and the like.
For example, in the description of the present disclosure, the knowledge verification method provided by the embodiments of the present disclosure is described in detail by taking a data source that is a collection of medical knowledge as an example. However, those skilled in the art should understand that the data source can also be another type of data source.
For example, a plurality of pieces of candidate knowledge can be extracted from the data source to form a candidate knowledge group; for another example, all the candidate knowledge in the data source can also form the candidate knowledge group. The target candidate knowledge and the conflict candidate knowledge can both be selected from the candidate knowledge group.
For example, the plurality of pieces of candidate knowledge in the candidate knowledge group can be "vitamin C can prevent colds", "calcium helps prevent osteoporosis", "vitamin C cannot prevent colds", "shrimp skin can prevent osteoporosis", "lemons can prevent colds", and so on. For example, there may be much contradictory knowledge in the candidate knowledge group; for instance, "vitamin C can prevent colds" and "vitamin C cannot prevent colds" in the above candidate knowledge group contradict each other. When "vitamin C can prevent colds" is selected as the target candidate knowledge, "vitamin C cannot prevent colds" is the conflict candidate knowledge.
For example, natural language processing (NLP) techniques can be used to extract the target candidate knowledge and the conflict candidate knowledge from the data source.
For example, natural language processing can include language processing techniques such as syntactic analysis, word segmentation, lexical analysis, semantic analysis, and text recognition. For example, natural language processing can use methods such as deep learning neural networks for language processing. Using deep learning neural networks to process the unstructured data in the data source can improve the accuracy of the selected target candidate knowledge and/or conflict candidate knowledge.
For example, the deep learning neural network can include neural networks such as Recurrent Neural Networks (RNN) and Recursive Neural Networks (RNN). Recurrent neural networks can be used for natural language processing tasks such as word vector representation, sentence validity checking, and part-of-speech tagging. The recurrent neural network can include a Long Short-Term Memory (LSTM) neural network. LSTM neural networks can learn long-term dependencies and can use context information over a wide range in text processing to determine the probability of the next word. The deep learning neural network can, for example, use one or a combination of the above neural networks to analyze and process natural language.
FIG. 2A shows a schematic block diagram of a target evidence group/conflict evidence group provided by at least one embodiment of the present disclosure; FIG. 2B shows a schematic block diagram of the logic rules of a target evidence group/conflict evidence group provided by at least one embodiment of the present disclosure.
For example, in operation S12, the target evidence group can be used to judge the likelihood that the target candidate knowledge is correct, and the conflict evidence group can be used to judge the likelihood that the conflict candidate knowledge is correct. However, the embodiments are not limited thereto: the target evidence group can also be used to judge the likelihood that the target candidate knowledge is wrong, and the conflict evidence group can also be used to judge the likelihood that the conflict candidate knowledge is wrong.
The knowledge verification method provided by the embodiments of the present disclosure can model the logic rules of each piece of evidence for candidate knowledge and calculate the verification probability of the candidate knowledge according to those logic rules, thereby automatically analyzing, processing, and acquiring useful knowledge from massive unstructured big data, verifying the correctness of the acquired knowledge, resolving knowledge conflicts, and saving labor and time.
For example, as shown in FIG. 2A, each evidence group can include at least one of source evidence 102, redundancy evidence 103, and representation style evidence 104 (for example, the target evidence group can include at least one of source evidence 102, redundancy evidence 103, and representation style evidence 104; likewise, the conflict evidence group can include at least one of source evidence 102, redundancy evidence 103, and representation style evidence 104). For example, for the source evidence 102, the source evidence 102 in the target evidence group represents the source of the target candidate knowledge, and the source evidence 102 in the conflict evidence group represents the source of the conflict candidate knowledge. For example, the source evidence 102, the redundancy evidence 103, and the representation style evidence 104 can all come from the data source.
It should be noted that the target evidence group and the conflict evidence group can each further include a plurality of pieces of evidence from the data source (for example, evidence T shown in FIG. 2A). The embodiments of the present disclosure do not limit the specific types of evidence in the target evidence group and the conflict evidence group. The types of evidence in the target evidence group and the types of evidence in the conflict evidence group can be the same.
For example, the source evidence 102 can include evidence from a plurality of different sources. The source evidence 102 can include, for example, first source evidence and second source evidence, and the first source evidence and the second source evidence come from a medical textbook and a medical paper, respectively.
For example, as shown in FIG. 2A, the target evidence group and the conflict evidence group can also each include consistency evidence 101. For example, the consistency evidence 101 can come from an existing knowledge base. The existing knowledge base can, for example, represent a collection of all or part of the existing correct knowledge.
For example, the existing knowledge base and the data source can be selected according to the target candidate knowledge and the conflict candidate knowledge. For example, when the target candidate knowledge is medical knowledge, the data source can be a collection of medical knowledge, and the existing knowledge base can be a collection of existing correct medical knowledge.
For example, the evidence in the target evidence group and the evidence in the conflict evidence group should correspond to each other and be equal in number.
For example, the pieces of evidence in the target evidence group and the conflict evidence group can also be obtained from the data source and/or the existing knowledge base by using natural language processing techniques such as deep learning neural networks.
For example, as shown in FIG. 2B, the logic rule of the source evidence 102 can be expressed as: mentions(y, S); the logic rule of the redundancy evidence 103 can be expressed as: occurrence_count(y, N); the logic rule of the representation style evidence 104 can be expressed as: representation_style(y, M); and the logic rule of the consistency evidence 101 can be expressed as: first existing knowledge ∧ second existing knowledge => y.
For example, when y represents the target candidate knowledge, S represents the source of the target candidate knowledge, N represents the number of times the target candidate knowledge appears, and M represents the number of different ways the target candidate knowledge is expressed; when y represents the conflict candidate knowledge, S represents the source of the conflict candidate knowledge, N represents the number of times the conflict candidate knowledge appears, and M represents the number of different ways the conflict candidate knowledge is expressed.
For example, the basic idea of the source evidence 102 is that more authoritative information sources (i.e., knowledge sources) are more likely to contain correct knowledge.
For example, when y represents the target candidate knowledge, the weight W2 of the source evidence 102 can be expressed as the authority of S; the higher the authority of S, the greater the probability that the target candidate knowledge is correct. The weights W2 of source evidence 102 from different sources may be different or the same. The weight W2 of the source evidence 102 can be preset. For example, when S is a medical textbook, its weight W2 can be 10; when S is a medical paper, its weight W2 can also be 10; when S is a medical record, its weight W2 can be 9; and when S is a medical website, its weight W2 can be 5.
For example, if the target candidate knowledge (e.g., "vitamin C can prevent colds") comes from a medical website while the conflict candidate knowledge (e.g., "vitamin C cannot prevent colds") comes from a medical textbook, the weight W2 of the source evidence 102 of the target candidate knowledge is 5 and the weight W2 of the source evidence 102 of the conflict candidate knowledge is 10, so the probability that the target candidate knowledge (e.g., "vitamin C can prevent colds") is correct is smaller than the probability that the conflict candidate knowledge (e.g., "vitamin C cannot prevent colds") is correct.
For example, the basic idea of the redundancy evidence 103 is that, compared with wrong knowledge, correct knowledge is likely to appear in more information sources.
For example, the weight W_3 of the redundancy evidence 103 may be expressed as log_a N.
For example, suppose the target candidate knowledge (e.g., "vitamin C can prevent colds") appears in 8 medical textbooks, while the conflict candidate knowledge (e.g., "vitamin C cannot prevent colds") appears in 16 medical textbooks. Then, with a = 2, the weight W_3 of the redundancy evidence 103 of the target candidate knowledge is log_2 8 = 3 and that of the conflict candidate knowledge is log_2 16 = 4, so the probability that the target candidate knowledge ("vitamin C can prevent colds") is correct is smaller than the probability that the conflict candidate knowledge ("vitamin C cannot prevent colds") is correct.
For example, the basic idea of the expression-style evidence 104 is that, compared with wrong knowledge, correct knowledge is likely to be expressed in more different ways.
For example, the weight W_4 of the expression-style evidence 104 may be expressed as log_a M.
For example, for the target candidate knowledge (e.g., "vitamin C can prevent colds"), suppose the whole data source contains 4 different phrasings such as "vitamin C can effectively prevent colds" and "taking vitamin C tablets can prevent colds"; for the conflict candidate knowledge (e.g., "vitamin C cannot prevent colds"), suppose the whole data source contains 8 different phrasings such as "vitamin C is of little use against colds" and "taking vitamin C has no effect on colds". Then, with a = 2, the weight W_4 of the expression-style evidence 104 of the target candidate knowledge is log_2 4 = 2 and that of the conflict candidate knowledge is log_2 8 = 3, so the probability that the target candidate knowledge ("vitamin C can prevent colds") is correct is smaller than the probability that the conflict candidate knowledge ("vitamin C cannot prevent colds") is correct.
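Both log-based weights are straightforward to compute; the sketch below reproduces the example numbers, assuming the base a = 2 used in the examples above.

```python
import math

def redundancy_weight(n_occurrences, a=2):
    """W_3 = log_a(N) for a candidate that occurs N times in the data source."""
    return math.log(n_occurrences, a)

def style_weight(n_phrasings, a=2):
    """W_4 = log_a(M) for a candidate expressed in M different ways."""
    return math.log(n_phrasings, a)

# Reproduces the numbers in the examples above (up to floating point):
# log2(8) = 3, log2(16) = 4, log2(4) = 2.
print(redundancy_weight(8), redundancy_weight(16), style_weight(4))
```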
It should be noted that, in the above description, log_a denotes the logarithm with base a. The weight W_3 of the redundancy evidence and the weight W_4 of the expression-style evidence are not limited to the above expressions; they may also be other functions of N and M, respectively.
For example, the basic idea of the consistency evidence 101 is that, compared with wrong knowledge, correct knowledge should be compatible with the existing correct knowledge, i.e., it should not conflict with the existing correct knowledge. The logic rule of the consistency evidence 101 is expressed as "first existing knowledge ∧ second existing knowledge => y", meaning that the candidate knowledge y (where y may be the target candidate knowledge or the conflict candidate knowledge) can be derived from the first existing knowledge and the second existing knowledge, i.e., the candidate knowledge y conflicts with neither of them.
For example, in the logic rule of the consistency evidence 101, the first existing knowledge and the second existing knowledge are both knowledge in the existing knowledge base, i.e., both are existing correct knowledge, and the logic rule of the consistency evidence 101 is a constraint between the existing correct knowledge and the target candidate knowledge. For example, suppose the target candidate knowledge is "shrimp skin can prevent osteoporosis", the conflict candidate knowledge is "shrimp skin cannot prevent osteoporosis", and the existing knowledge base contains the first existing knowledge "shrimp skin contains calcium" and the second existing knowledge "calcium can prevent osteoporosis". Then, from the logic rule of the consistency evidence 101 (first existing knowledge ∧ second existing knowledge => y), y can be derived as "shrimp skin can prevent osteoporosis"; the target candidate knowledge therefore does not conflict with the existing correct knowledge, the conflict candidate knowledge does conflict with it, and the probability that the target candidate knowledge is correct is larger than the probability that the conflict candidate knowledge is correct.
For example, in one example, the first existing knowledge may be expressed as "Contains(K, M)", the second existing knowledge as "Prevents(M, D)", and y as "Prevents(K, D)", where K may be a food or a drug, M may be an element or substance contained in K, and D may be a symptom or disease. The logic rule of the consistency evidence 101 can thus be modeled as "Contains(K, M) ∧ Prevents(M, D) => Prevents(K, D)". For example, if the first existing knowledge is "lemon contains a large amount of vitamin C", the second existing knowledge is "vitamin C can prevent colds", and y is "lemon can prevent colds", the logic rule of the consistency evidence 101 is expressed as: Contains(lemon, vitamin C) ∧ Prevents(vitamin C, cold) => Prevents(lemon, cold).
For example, the weight W_1 of the consistency evidence 101 is expressed as the logic value of its logic rule: when the logic value is true, W_1 is 1; when it is false, W_1 is 0. For example, suppose the target candidate knowledge is "lemon can prevent colds" and the conflict candidate knowledge is "lemon cannot prevent colds", with the first existing knowledge "lemon contains a large amount of vitamin C" and the second existing knowledge "vitamin C can prevent colds". Based on the logic rule of the consistency evidence 101, the weight W_1 of the consistency evidence 101 of the target candidate knowledge is 1 and that of the conflict candidate knowledge is 0, so the probability that the target candidate knowledge is correct is larger than the probability that the conflict candidate knowledge is correct.
For example, the existing knowledge base may include multiple pieces of existing correct knowledge (e.g., the first, second, third, and fourth existing knowledge shown in FIG. 2B), which may form multiple pieces of consistency evidence 101 (e.g., consistency evidence 101a and consistency evidence 101b shown in FIG. 2A) with multiple weights W_1 (e.g., weights W_1a and W_1b shown in FIG. 2A).
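The consistency check for the rule "Contains(K, M) ∧ Prevents(M, D) => Prevents(K, D)" can be sketched as follows. The relation layout (two sets of pairs) is an assumption made for illustration, and the sketch only covers the case discussed above, in which the premises of the rule are present in the existing knowledge base.

```python
# Existing correct knowledge, stored as simple relations for this sketch.
contains = {("shrimp skin", "calcium"), ("lemon", "vitamin C")}
prevents = {("calcium", "osteoporosis"), ("vitamin C", "cold")}

def derivable_prevention(k, d):
    """Apply Contains(K, M) ∧ Prevents(M, D) => Prevents(K, D) to the existing knowledge."""
    return any((k, m) in contains and (m, d) in prevents for _, m in contains)

def consistency_weight(k, d, candidate_claims_prevention):
    """W_1: 1 when the candidate agrees with what the existing knowledge derives, 0 when it contradicts it."""
    return 1 if candidate_claims_prevention == derivable_prevention(k, d) else 0

print(consistency_weight("shrimp skin", "osteoporosis", True))   # 1: "can prevent" is consistent
print(consistency_weight("shrimp skin", "osteoporosis", False))  # 0: "cannot prevent" conflicts
```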
For example, the verification probability of the target candidate knowledge may be the probability that the target candidate knowledge is compatible with the data source and the existing knowledge base, i.e., the probability that the target candidate knowledge is correct; likewise, the verification probability of the conflict candidate knowledge may be the probability that the conflict candidate knowledge is compatible with the data source and the existing knowledge base, i.e., the probability that the conflict candidate knowledge is correct.
As another example, the verification probability of the target candidate knowledge may instead be the probability that the target candidate knowledge is incompatible with the data source and the existing knowledge base, i.e., the probability that it is wrong; likewise, the verification probability of the conflict candidate knowledge may be the probability that the conflict candidate knowledge is incompatible with the data source and the existing knowledge base, i.e., the probability that it is wrong.
It should be noted that the embodiments of the present disclosure are described in detail taking the verification probability as the probability of being correct, but the verification probability may also be the probability of being wrong; the embodiments of the present disclosure do not limit this.
FIG. 3 is a schematic flowchart of another knowledge verification method provided by at least one embodiment of the present disclosure.
For example, as shown in FIG. 3, in one example, when the verification probability is the probability of being correct, operation S14 may include the following operations:
Operation S141: judging whether the verification probability of the target candidate knowledge is larger than the verification probability of the conflict candidate knowledge;
if not, performing operation S142: determining that the conflict candidate knowledge is correct knowledge;
if yes, performing operation S143: determining that the target candidate knowledge is correct knowledge.
For example, based on the logic rules of the pieces of evidence modeled with the Markov logic network, the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge can both be expressed as:
$$P(y) = \frac{1}{Z}\exp\left(\sum_{i=1}^{T} W_i f_i(y)\right) \qquad (1)$$
where Z is a normalization factor. When y denotes the target candidate knowledge, P(y) in formula (1) is the verification probability of the target candidate knowledge, f_i(y) is the logic value of the logic rule of the i-th piece of evidence in the target evidence group, f_i(y) = 1 means that the logic rule of the i-th piece of evidence in the target evidence group is true, f_i(y) = 0 means that it is false, W_i denotes the weight of the i-th piece of evidence in the target evidence group, and T denotes the number of pieces of evidence in the target evidence group. When y denotes the conflict candidate knowledge, P(y) in formula (1) is the verification probability of the conflict candidate knowledge, f_i(y) is the logic value of the logic rule of the i-th piece of evidence in the conflict evidence group, f_i(y) = 1 means that the logic rule of the i-th piece of evidence in the conflict evidence group is true, f_i(y) = 0 means that it is false, W_i denotes the weight of the i-th piece of evidence in the conflict evidence group, and T denotes the number of pieces of evidence in the conflict evidence group.
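Formula (1) translates directly into code; the sketch below assumes the weights, feature values, and the normalization factor Z are already known.

```python
import math

def verification_probability(weights, features, z):
    """Formula (1): P(y) = (1/Z) * exp(sum_{i=1}^{T} W_i * f_i(y)).

    weights:  the W_i of the T pieces of evidence in the evidence group
    features: the logic values f_i(y), 1 for true and 0 for false
    z:        the normalization factor Z shared by the candidates being compared
    """
    return math.exp(sum(w * f for w, f in zip(weights, features))) / z
```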
For example, the vertices of the Markov logic network are ground predicates or ground atoms, and the logical relations between ground predicates or ground atoms are ground formulas. Each ground predicate or ground atom corresponds to a binary node (i.e., the feature value of the ground predicate or ground atom): if the ground predicate or ground atom is true, the node takes the value 1; if it is false, the node takes the value 0. Each ground formula likewise corresponds to a feature value: 1 if the ground formula is true and 0 if it is false.
For example, the source evidence 102, the redundancy evidence 103, and the expression-style evidence 104 are ground predicates or ground atoms, while the consistency evidence 101 is a ground formula. For the source evidence 102, the redundancy evidence 103, and the expression-style evidence 104, the logic rule of f_i(y) is true, i.e., f_i(y) = 1. For the consistency evidence 101, if the target candidate knowledge (or the conflict candidate knowledge) is compatible with the existing correct knowledge, the logic rule of f_i(y) is true, i.e., f_i(y) = 1; otherwise f_i(y) = 0.
FIG. 4A shows an example of a target evidence group provided by at least one embodiment of the present disclosure; FIG. 4B shows an example of a conflict evidence group provided by at least one embodiment of the present disclosure.
For example, in a concrete example, the target candidate knowledge is "shrimp skin can prevent osteoporosis" and the conflict candidate knowledge is "shrimp skin cannot prevent osteoporosis".
For example, as shown in FIG. 4A, the target evidence group includes: Mention("shrimp skin can prevent osteoporosis", "medical textbook"), OccurrenceCount("shrimp skin can prevent osteoporosis", 8), ExpressionStyle("shrimp skin can prevent osteoporosis", 4), and "Contains(shrimp skin, calcium)" ∧ "Prevents(calcium, osteoporosis)" => "Prevents(shrimp skin, osteoporosis)". Accordingly, the weight of the source evidence 102 of the target candidate knowledge is W_2 = 10, the weight of its redundancy evidence 103 is W_3 = log_2 8 = 3, and the weight of its expression-style evidence 104 is W_4 = log_2 4 = 2. The target candidate knowledge (i.e., "shrimp skin can prevent osteoporosis") does not conflict with this consistency evidence 101, so the weight of the consistency evidence 101 of the target candidate knowledge is W_1 = 1.
For example, as shown in FIG. 4B, the conflict evidence group includes: Mention("shrimp skin cannot prevent osteoporosis", "medical textbook"), OccurrenceCount("shrimp skin cannot prevent osteoporosis", 4), ExpressionStyle("shrimp skin cannot prevent osteoporosis", 4), and "Contains(shrimp skin, calcium)" ∧ "Prevents(calcium, osteoporosis)" => "Prevents(shrimp skin, osteoporosis)". Accordingly, the weight of the source evidence 102 of the conflict candidate knowledge is W_2' = 10, the weight of its redundancy evidence 103 is W_3' = log_2 4 = 2, and the weight of its expression-style evidence 104 is W_4' = log_2 4 = 2. The conflict candidate knowledge (i.e., "shrimp skin cannot prevent osteoporosis") conflicts with this consistency evidence 101, so the weight of the consistency evidence 101 of the conflict candidate knowledge is W_1' = 0.
In summary, in operation S13, the verification probability of the target candidate knowledge can be computed based on the logic rules of the pieces of evidence in the target evidence group, and is expressed as follows:
$$P(\text{target candidate knowledge}) = \frac{1}{Z}\exp(1 + 10 + 3 + 2) = \frac{e^{16}}{Z}$$
Based on the logic rules of the pieces of evidence in the conflict evidence group, the verification probability of the conflict candidate knowledge can be computed, and is expressed as follows:
$$P(\text{conflict candidate knowledge}) = \frac{1}{Z}\exp(0 + 10 + 2 + 2) = \frac{e^{14}}{Z}$$
Z is the same for the target candidate knowledge and the conflict candidate knowledge.
For example, in operation S14, whether the target candidate knowledge is correct knowledge can be determined from the result of comparing its verification probability with that of the conflict candidate knowledge. For example, in the example shown in FIG. 4A and FIG. 4B,
P(target candidate knowledge) > P(conflict candidate knowledge),
so the target candidate knowledge can be determined to be correct knowledge.
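The comparison can be reproduced numerically as follows; since Z is the same for both candidates, it cancels out of the comparison and is set to 1 here purely for illustration.

```python
import math

target_weights   = [1, 10, 3, 2]   # W_1, W_2, W_3, W_4 of "shrimp skin can prevent osteoporosis"
conflict_weights = [0, 10, 2, 2]   # the failed consistency rule appears as W_1' = 0 for the conflict candidate

p_target   = math.exp(sum(target_weights))    # e^16 (taking Z = 1)
p_conflict = math.exp(sum(conflict_weights))  # e^14 (taking Z = 1)
assert p_target > p_conflict  # operation S14: the target candidate is determined to be correct knowledge
```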
For example, the knowledge verification method further includes outputting the correct knowledge. For example, the output correct knowledge may be displayed on a display, or may be output as speech through a loudspeaker.
For example, the knowledge verification method may output all or part of the correct knowledge. As shown in FIG. 3, in one example, the knowledge verification method may further include the following operations:
after performing operation S142, performing operation S21: outputting the conflict candidate knowledge;
after performing operation S143, performing operation S22: outputting the target candidate knowledge.
FIG. 5 is a schematic flowchart of yet another knowledge verification method provided by at least one embodiment of the present disclosure.
For example, the knowledge verification method may also output the correct knowledge that the user expects to see, for example N items of correct knowledge. As shown in FIG. 5, in another example, after operation S14 is performed, the knowledge verification method may further perform the following operations:
Operation S15: acquiring the verification probabilities of R items of correct knowledge and the verification probabilities of R items of wrong knowledge that contradict the R items of correct knowledge;
Operation S16: computing the ratios of the verification probabilities of the R items of correct knowledge to the verification probabilities of the R items of wrong knowledge;
Operation S17: ranking the R items of correct knowledge according to the ratios;
Operation S18: outputting the top N items of correct knowledge after ranking.
For example, in operation S15, multiple items of correct knowledge and their verification probabilities, and multiple items of wrong knowledge and their verification probabilities, may be determined according to the method shown in FIG. 1 and/or FIG. 3.
For example, the correct knowledge may be the target candidate knowledge or the conflict candidate knowledge; correspondingly, the wrong knowledge may be the conflict candidate knowledge or the target candidate knowledge.
For example, the ratio may be expressed as:
P(correct knowledge) / P(wrong knowledge),
where P(correct knowledge) may be P(target candidate knowledge) or P(conflict candidate knowledge); correspondingly, P(wrong knowledge) may be P(conflict candidate knowledge) or P(target candidate knowledge).
For example, N is a positive integer and N ≤ R. N may be the number of items of correct knowledge the user expects to see, and may be related to the number of candidate knowledge items in the candidate knowledge group; N may, for example, be 10% of the number of candidate knowledge items. The embodiments of the present disclosure do not specifically limit N.
For example, the N items of correct knowledge may correspond to the N largest ratios; for instance, they may be the target candidate knowledge items with the N largest ratios. This is not limiting, however: the N items of correct knowledge may also correspond to the N smallest ratios.
For example, the R items of correct knowledge may be all of the correct knowledge, i.e., R is the total number of items of correct knowledge; the R items of correct knowledge may also be only part of the correct knowledge.
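Operations S15-S18 amount to sorting by the ratio and taking a prefix; a minimal sketch is given below, where the input format (a list of dictionaries) is an assumption made for the sketch rather than a structure defined by the disclosure.

```python
def top_n_correct(verified, n):
    """Rank R verified items by P(correct) / P(wrong) and return the top N texts (S15-S18).

    Each element of `verified` is assumed to look like
    {"text": ..., "p_correct": ..., "p_wrong": ...}.
    """
    ranked = sorted(verified, key=lambda item: item["p_correct"] / item["p_wrong"], reverse=True)
    return [item["text"] for item in ranked[:n]]
```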
For example, the knowledge verification method provided by the embodiments of the present disclosure may also output information such as the ratio of the correct knowledge and the verification probability of the correct knowledge. It should be noted that the knowledge verification method provided by the embodiments of the present disclosure may also output the wrong knowledge.
FIG. 6 is a schematic flowchart of still another knowledge verification method provided by at least one embodiment of the present disclosure.
For example, as shown in FIG. 6, in one example, before operation S11 of FIG. 1 is performed, the knowledge verification method may further include the following operations (a minimal sketch of this branching is given after the list):
Operation S31: acquiring a candidate knowledge group from the data source;
Operation S32: selecting the target candidate knowledge from the candidate knowledge group;
Operation S33: judging whether the candidate knowledge group contains a conflict candidate knowledge that contradicts the target candidate knowledge; if the candidate knowledge group contains such a conflict candidate knowledge, proceeding to operation S11; if it does not, performing operation S34: judging whether the target candidate knowledge contradicts the existing knowledge in the existing knowledge base; if it does, performing operation S35: determining that the target candidate knowledge is wrong knowledge; if it does not, performing operation S36: determining that the target candidate knowledge is correct knowledge.
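In the sketch below, contradicts and knowledge_base_contradicts stand in for the contradiction tests described above; they are placeholder names for this illustration, not interfaces defined by the present disclosure.

```python
def pre_check(target, candidate_group, contradicts, knowledge_base_contradicts):
    """Operations S31-S36: decide how a target candidate knowledge should be handled."""
    conflict = next((c for c in candidate_group if contradicts(target, c)), None)
    if conflict is not None:
        return "verify", conflict   # a conflict candidate exists: proceed to operations S11-S14
    if knowledge_base_contradicts(target):
        return "wrong", None        # S35: the target contradicts existing correct knowledge
    return "correct", None          # S36: no contradiction anywhere, the target is correct knowledge
```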
For example, if the candidate knowledge group consists of all the candidate knowledge in the data source, then when the target candidate knowledge is determined to be correct knowledge, neither the data source nor the existing knowledge base contains knowledge that contradicts it. The target candidate knowledge can therefore be directly determined to be correct knowledge and output as needed.
As another example, if the candidate knowledge group consists of a plurality of candidate knowledge items extracted from the data source, then when the target candidate knowledge (or the conflict candidate knowledge) is determined to be correct knowledge, i.e., after operation S142 or S143 of FIG. 3 or operation S36 of FIG. 6 is performed, the target candidate knowledge (or conflict candidate knowledge) may be stored in a correct knowledge group. Outputting the correct knowledge may then include the following operations: acquiring correct knowledge from the correct knowledge group; constructing wrong knowledge that contradicts the correct knowledge; acquiring a correct evidence group of the correct knowledge and a wrong evidence group of the wrong knowledge; computing the verification probability of the correct knowledge based on the logic rules of the pieces of evidence in the correct evidence group; computing the verification probability of the wrong knowledge based on the logic rules of the pieces of evidence in the wrong evidence group; computing the ratio of the verification probability of the correct knowledge to the verification probability of the corresponding wrong knowledge; ranking the correct knowledge according to the ratios; and outputting the top N items of correct knowledge after ranking.
It should be noted that, to reduce the amount of computation, when the candidate knowledge group contains both the target candidate knowledge and a conflict candidate knowledge that contradicts it, i.e., when the correct knowledge has been verified by the method shown in FIG. 1 and/or FIG. 3, the verification probability of the correct knowledge and the verification probability of the wrong knowledge can be obtained directly (for example, referring to operation S13 in FIG. 1 and/or FIG. 3), so that the ratio between the two verification probabilities can be computed directly; alternatively, the ratio between the verification probability of the correct knowledge and that of the wrong knowledge may also be obtained directly (for example, referring to operation S16 in FIG. 5). The correct knowledge is then ranked according to the ratio.
FIG. 7 is a schematic block diagram of a knowledge verification device provided by at least one embodiment of the present disclosure.
For example, as shown in FIG. 7, the knowledge verification device 200 provided by an embodiment of the present disclosure may include a processor 201, a memory 202, and a display 203. It should be noted that the components of the knowledge verification device shown in FIG. 7 are exemplary rather than limiting; the knowledge verification device may have other components according to actual application needs.
For example, components such as the processor 201, the memory 202, and the display 203 may communicate with one another through a network connection, and may communicate with one another directly or indirectly.
For example, the network may include a wireless network, a wired network, and/or any combination of the two, and may include a local area network, the Internet, a telecommunication network, an Internet of Things based on the Internet and/or a telecommunication network, and/or any combination of the above networks. The wired network may, for example, communicate via twisted pair, coaxial cable, or optical fiber transmission, and the wireless network may, for example, use a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi. The present disclosure does not limit the type and function of the network.
For example, the processor 201 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or program execution capability, such as a field programmable gate array (FPGA) or a tensor processing unit (TPU), and may control the other components in the knowledge verification device to perform the desired functions. For example, the central processing unit (CPU) may be of the X86 or ARM architecture.
For example, the memory 202 may include any combination of one or more computer program products, which may include computer-readable storage media in various forms, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), USB memory, or flash memory. One or more non-transitory computer instructions may be stored on the memory 202, and the processor 201 may execute these instructions to implement various functions. The computer-readable storage medium may also store various applications and various data, such as the data source, the existing knowledge base, the weights, the verification probability of the target candidate knowledge, the verification probability of the conflict candidate knowledge, and various data used and/or generated by the applications.
For example, the display 203 may be a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
It should be noted that, in some embodiments, the knowledge verification device may further include an input device (e.g., a touch device, a keyboard, a microphone, or a mouse), a loudspeaker, and the like, according to actual needs. The user may use the display 203 and the input device to interact with the knowledge verification device 200; for example, the user may view the correct knowledge through the display 203 and input candidate knowledge to be verified through the input device.
For example, when executed by the processor 201, the non-transitory computer instructions implement the following operations: acquiring a target candidate knowledge and a conflict candidate knowledge that contradicts the target candidate knowledge; acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge; computing a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and computing a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining, according to the comparison result, whether the target candidate knowledge is correct knowledge.
For example, natural language processing (NLP) techniques may be used to extract the target candidate knowledge and the conflict candidate knowledge; the natural language processing may, for example, rely on deep learning neural networks (e.g., recurrent neural networks or recursive neural networks).
For example, the target evidence group and the conflict evidence group each include at least one of source evidence, redundancy evidence, and expression-style evidence, which come from a data source. As another example, the target evidence group and the conflict evidence group each further include consistency evidence, which comes from an existing knowledge base.
For example, the logic rule of the source evidence may be expressed as: Mention(y, S); the logic rule of the redundancy evidence may be expressed as: OccurrenceCount(y, N); the logic rule of the expression-style evidence may be expressed as: ExpressionStyle(y, M); and the logic rule of the consistency evidence may be expressed as: first existing knowledge ∧ second existing knowledge => y. When y denotes the target candidate knowledge, S denotes the source of the target candidate knowledge, N denotes the number of times the target candidate knowledge occurs, and M denotes the number of its different phrasings; when y denotes the conflict candidate knowledge, S denotes the source of the conflict candidate knowledge, N denotes the number of times the conflict candidate knowledge occurs, and M denotes the number of its different phrasings.
For example, the weight of the source evidence may be expressed as the authority of S, the weight of the redundancy evidence as log_a N, and the weight of the expression-style evidence as log_a M, where log_a denotes the logarithm with base a; the weight of the consistency evidence may be expressed as the logic value of the logic rule of the consistency evidence.
For example, the embodiments of the present disclosure model the logic rules of the pieces of evidence with a Markov logic network and compute the verification probability of the target candidate knowledge (or the conflict candidate knowledge) from those logic rules. For example, based on the logic rules of the pieces of evidence modeled with the Markov logic network, the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge can both be expressed as:
$$P(y) = \frac{1}{Z}\exp\left(\sum_{i=1}^{T} W_i f_i(y)\right) \qquad (1)$$
where Z is a normalization factor. When y denotes the target candidate knowledge, P(y) is the verification probability of the target candidate knowledge, f_i(y) is the feature value of the logic rule of the i-th piece of evidence in the target evidence group, f_i(y) = 1 means that the logic rule of the i-th piece of evidence in the target evidence group is true, f_i(y) = 0 means that it is false, W_i denotes the weight of the i-th piece of evidence in the target evidence group, and T denotes the number of pieces of evidence in the target evidence group; when y denotes the conflict candidate knowledge, P(y) is the verification probability of the conflict candidate knowledge, f_i(y) is the feature value of the logic rule of the i-th piece of evidence in the conflict evidence group, f_i(y) = 1 means that the logic rule of the i-th piece of evidence in the conflict evidence group is true, f_i(y) = 0 means that it is false, W_i denotes the weight of the i-th piece of evidence in the conflict evidence group, and T denotes the number of pieces of evidence in the conflict evidence group.
For example, in one example, when the verification probability is the probability of being correct, the operation of "comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining, according to the comparison result, whether the target candidate knowledge is correct knowledge", implemented when the non-transitory computer instructions are executed by the processor 201, includes: judging whether the verification probability of the target candidate knowledge is larger than the verification probability of the conflict candidate knowledge; if not, determining that the conflict candidate knowledge is correct knowledge; if yes, determining that the target candidate knowledge is correct knowledge.
For example, in one example, when executed by the processor 201, the non-transitory computer instructions further implement the following operations: acquiring the verification probabilities of R items of correct knowledge and the verification probabilities of R items of wrong knowledge that contradict the R items of correct knowledge; computing the ratios of the verification probabilities of the R items of correct knowledge to the verification probabilities of the R items of wrong knowledge; ranking the R items of correct knowledge according to the ratios; and outputting the top N items of correct knowledge after ranking.
For example, N is a positive integer and N ≤ R; N may be the number of items of correct knowledge the user expects to see.
For example, in one example, when executed by the processor 201, the non-transitory computer instructions further implement the following operations: outputting the top N items of correct knowledge after ranking to the display 203; and displaying the ranked N items of correct knowledge on the display 203.
For example, the N items of correct knowledge may correspond to the N largest ratios.
For example, in one example, when executed by the processor 201, the non-transitory computer instructions further implement the following operations: acquiring a candidate knowledge group from the data source; selecting the target candidate knowledge from the candidate knowledge group; judging whether the candidate knowledge group contains a conflict candidate knowledge that contradicts the target candidate knowledge; if it does, computing the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge, and determining whether the target candidate knowledge is correct knowledge according to the result of comparing the two verification probabilities; if it does not, judging whether the target candidate knowledge contradicts the existing knowledge in the existing knowledge base, and if it does, determining that the target candidate knowledge is wrong knowledge, and if it does not, determining that the target candidate knowledge is correct knowledge.
For example, in one example, when the target candidate knowledge (or the conflict candidate knowledge) is determined to be correct knowledge, it may be stored in a correct knowledge group. When executed by the processor 201, the non-transitory computer instructions further implement the following operations: acquiring correct knowledge from the correct knowledge group; constructing wrong knowledge that contradicts the correct knowledge; acquiring a correct evidence group of the correct knowledge and a wrong evidence group of the wrong knowledge; computing the verification probability of the correct knowledge based on the logic rules of the pieces of evidence in the correct evidence group; computing the verification probability of the wrong knowledge based on the logic rules of the pieces of evidence in the wrong evidence group; computing the ratio of the verification probability of the correct knowledge to the verification probability of the corresponding wrong knowledge; ranking the correct knowledge according to the ratios; and outputting the top N items of correct knowledge after ranking.
It should be noted that, for detailed descriptions of the data source, the existing knowledge base, the source evidence, the redundancy evidence, the expression-style evidence, the consistency evidence, and the like, reference may be made to the relevant descriptions in the embodiments of the knowledge verification method, and the repeated parts are not described again here.
At least one embodiment of the present disclosure further provides a storage medium for storing non-transitory computer instructions. When executed by a processor, the non-transitory computer instructions implement the following operations: acquiring a target candidate knowledge and a conflict candidate knowledge that contradicts the target candidate knowledge; acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge; computing a verification probability of the target candidate knowledge based on the logic rules of the pieces of evidence in the target evidence group, and computing a verification probability of the conflict candidate knowledge based on the logic rules of the pieces of evidence in the conflict evidence group; and comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining, according to the comparison result, whether the target candidate knowledge is correct knowledge.
For example, in one example of the embodiments of the present disclosure, the storage medium may be applied to the knowledge verification device described in any of the above embodiments; for example, it may be the memory 202 in the knowledge verification device.
For example, for a description of the storage medium, reference may be made to the description of the memory 202 in the embodiments of the knowledge verification device, and the repeated parts are not described again here.
The following points need to be noted regarding the present disclosure:
(1) The accompanying drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; other structures may refer to common designs.
(2) Without conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with one another to obtain new embodiments.
The above are merely specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto; the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (20)

  1. A knowledge verification method, comprising:
    acquiring a target candidate knowledge and a conflict candidate knowledge that contradicts the target candidate knowledge;
    acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge;
    computing a verification probability of the target candidate knowledge based on logic rules of pieces of evidence in the target evidence group, and computing a verification probability of the conflict candidate knowledge based on logic rules of pieces of evidence in the conflict evidence group; and
    comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining, according to a comparison result, whether the target candidate knowledge is correct knowledge.
  2. The knowledge verification method according to claim 1, wherein the target evidence group and the conflict evidence group each comprise at least one of source evidence, redundancy evidence, and expression-style evidence, the source evidence, the redundancy evidence, and the expression-style evidence coming from a data source.
  3. The knowledge verification method according to claim 2, wherein the target evidence group and the conflict evidence group each further comprise consistency evidence, the consistency evidence coming from an existing knowledge base.
  4. The knowledge verification method according to claim 3, wherein
    the logic rule of the source evidence is expressed as: Mention(y, S),
    the logic rule of the redundancy evidence is expressed as: OccurrenceCount(y, N),
    the logic rule of the expression-style evidence is expressed as: ExpressionStyle(y, M),
    the logic rule of the consistency evidence is expressed as: first existing knowledge ∧ second existing knowledge => y,
    wherein, when y denotes the target candidate knowledge, S denotes a source of the target candidate knowledge, N denotes a number of times the target candidate knowledge occurs, and M denotes a number of different phrasings of the target candidate knowledge;
    when y denotes the conflict candidate knowledge, S denotes a source of the conflict candidate knowledge, N denotes a number of times the conflict candidate knowledge occurs, and M denotes a number of different phrasings of the conflict candidate knowledge;
    a weight of the source evidence is expressed as an authority of S, a weight of the redundancy evidence is expressed as log_a N, a weight of the expression-style evidence is expressed as log_a M, log_a denoting a logarithm with base a, and a weight of the consistency evidence is expressed as a logic value of the logic rule of the consistency evidence.
  5. The knowledge verification method according to claim 4, wherein the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge are each expressed as:
    $$P(y) = \frac{1}{Z}\exp\left(\sum_{i=1}^{T} W_i f_i(y)\right)$$
    wherein Z is a normalization factor,
    when y denotes the target candidate knowledge, f_i(y) is a feature value of the logic rule of the i-th piece of evidence in the target evidence group, f_i(y) = 1 indicates that the logic rule of the i-th piece of evidence in the target evidence group is true, f_i(y) = 0 indicates that the logic rule of the i-th piece of evidence in the target evidence group is false, W_i denotes a weight of the i-th piece of evidence in the target evidence group, and T denotes a number of pieces of evidence in the target evidence group;
    when y denotes the conflict candidate knowledge, f_i(y) is a feature value of the logic rule of the i-th piece of evidence in the conflict evidence group, f_i(y) = 1 indicates that the logic rule of the i-th piece of evidence in the conflict evidence group is true, f_i(y) = 0 indicates that the logic rule of the i-th piece of evidence in the conflict evidence group is false, W_i denotes a weight of the i-th piece of evidence in the conflict evidence group, and T denotes a number of pieces of evidence in the conflict evidence group.
  6. The knowledge verification method according to any one of claims 1-5, wherein the verification probability is a probability of being correct, and
    comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge and determining, according to the comparison result, whether the target candidate knowledge is correct knowledge comprises:
    judging whether the verification probability of the target candidate knowledge is larger than the verification probability of the conflict candidate knowledge; if not, determining that the conflict candidate knowledge is correct knowledge and the target candidate knowledge is wrong knowledge, and outputting the conflict candidate knowledge; if yes, determining that the target candidate knowledge is correct knowledge and the conflict candidate knowledge is wrong knowledge, and outputting the target candidate knowledge.
  7. The knowledge verification method according to any one of claims 1-6, further comprising:
    acquiring verification probabilities of R items of correct knowledge and verification probabilities of R items of wrong knowledge that contradict the R items of correct knowledge;
    computing ratios of the verification probabilities of the R items of correct knowledge to the verification probabilities of the R items of wrong knowledge;
    ranking the R items of correct knowledge according to the ratios; and
    outputting the top N items of the correct knowledge after ranking, N being a positive integer and N ≤ R.
  8. The knowledge verification method according to claim 7, wherein the N items of correct knowledge correspond to the N largest ratios.
  9. A knowledge verification device, comprising a processor and a memory, the memory being configured to store non-transitory computer instructions, wherein, when executed by the processor, the non-transitory computer instructions implement the following operations:
    acquiring a target candidate knowledge and a conflict candidate knowledge that contradicts the target candidate knowledge;
    acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge;
    computing a verification probability of the target candidate knowledge based on logic rules of pieces of evidence in the target evidence group, and computing a verification probability of the conflict candidate knowledge based on logic rules of pieces of evidence in the conflict evidence group; and
    comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining, according to a comparison result, whether the target candidate knowledge is correct knowledge.
  10. The knowledge verification device according to claim 9, wherein the target evidence group and the conflict evidence group each comprise at least one of source evidence, redundancy evidence, and expression-style evidence, the source evidence, the redundancy evidence, and the expression-style evidence coming from a data source.
  11. The knowledge verification device according to claim 10, wherein the target evidence group and the conflict evidence group each further comprise consistency evidence, the consistency evidence coming from an existing knowledge base.
  12. The knowledge verification device according to claim 11, wherein
    the logic rule of the source evidence is expressed as: Mention(y, S),
    the logic rule of the redundancy evidence is expressed as: OccurrenceCount(y, N),
    the logic rule of the expression-style evidence is expressed as: ExpressionStyle(y, M),
    the logic rule of the consistency evidence is expressed as: first existing knowledge ∧ second existing knowledge => y,
    wherein, when y denotes the target candidate knowledge, S denotes a source of the target candidate knowledge, N denotes a number of times the target candidate knowledge occurs, and M denotes a number of different phrasings of the target candidate knowledge;
    when y denotes the conflict candidate knowledge, S denotes a source of the conflict candidate knowledge, N denotes a number of times the conflict candidate knowledge occurs, and M denotes a number of different phrasings of the conflict candidate knowledge;
    a weight of the source evidence is expressed as an authority of S, a weight of the redundancy evidence is expressed as log_a N, a weight of the expression-style evidence is expressed as log_a M, log_a denoting a logarithm with base a, and a weight of the consistency evidence is expressed as a logic value of the logic rule of the consistency evidence.
  13. The knowledge verification device according to claim 12, wherein the verification probability of the target candidate knowledge and the verification probability of the conflict candidate knowledge are each expressed as:
    $$P(y) = \frac{1}{Z}\exp\left(\sum_{i=1}^{T} W_i f_i(y)\right)$$
    wherein Z is a normalization factor,
    when y denotes the target candidate knowledge, f_i(y) is a feature value of the logic rule of the i-th piece of evidence in the target evidence group, f_i(y) = 1 indicates that the logic rule of the i-th piece of evidence in the target evidence group is true, f_i(y) = 0 indicates that the logic rule of the i-th piece of evidence in the target evidence group is false, W_i denotes a weight of the i-th piece of evidence in the target evidence group, and T denotes a number of pieces of evidence in the target evidence group;
    when y denotes the conflict candidate knowledge, f_i(y) is a feature value of the logic rule of the i-th piece of evidence in the conflict evidence group, f_i(y) = 1 indicates that the logic rule of the i-th piece of evidence in the conflict evidence group is true, f_i(y) = 0 indicates that the logic rule of the i-th piece of evidence in the conflict evidence group is false, W_i denotes a weight of the i-th piece of evidence in the conflict evidence group, and T denotes a number of pieces of evidence in the conflict evidence group.
  14. The knowledge verification device according to any one of claims 9-13, wherein, when executed by the processor, the non-transitory computer instructions further implement the following operations:
    acquiring verification probabilities of R items of correct knowledge and verification probabilities of R items of wrong knowledge that contradict the R items of correct knowledge;
    computing ratios of the verification probabilities of the R items of correct knowledge to the verification probabilities of the R items of wrong knowledge;
    ranking the R items of correct knowledge according to the ratios; and
    outputting the top N items of the correct knowledge after ranking, N being a positive integer and N ≤ R.
  15. The knowledge verification device according to claim 14, further comprising a display,
    wherein, when executed by the processor, the non-transitory computer instructions further implement the following operations:
    outputting the top N items of the correct knowledge after ranking to the display; and
    displaying the ranked N items of the correct knowledge on the display.
  16. The knowledge verification device according to claim 14 or 15, wherein the N items of correct knowledge correspond to the N largest ratios.
  17. The knowledge verification device according to any one of claims 9-16, wherein the verification probability is a probability of being correct, and
    when the non-transitory computer instructions are executed by the processor to perform the operation of comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge and determining, according to the comparison result, whether the target candidate knowledge is correct knowledge, the following operations are implemented:
    judging whether the verification probability of the target candidate knowledge is larger than the verification probability of the conflict candidate knowledge; if not, determining that the conflict candidate knowledge is correct knowledge and the target candidate knowledge is wrong knowledge, and outputting the conflict candidate knowledge; if yes, determining that the target candidate knowledge is correct knowledge and the conflict candidate knowledge is wrong knowledge, and outputting the target candidate knowledge.
  18. A storage medium for storing non-transitory computer instructions, wherein, when executed by a processor, the non-transitory computer instructions implement the following operations:
    acquiring a target candidate knowledge and a conflict candidate knowledge that contradicts the target candidate knowledge;
    acquiring a target evidence group of the target candidate knowledge and a conflict evidence group of the conflict candidate knowledge;
    computing a verification probability of the target candidate knowledge based on logic rules of pieces of evidence in the target evidence group, and computing a verification probability of the conflict candidate knowledge based on logic rules of pieces of evidence in the conflict evidence group; and
    comparing the verification probability of the target candidate knowledge with the verification probability of the conflict candidate knowledge, and determining, according to a comparison result, whether the target candidate knowledge is correct knowledge.
  19. The storage medium according to claim 18, wherein, when executed by the processor, the non-transitory computer instructions further implement the following operations:
    acquiring verification probabilities of R items of correct knowledge and verification probabilities of R items of wrong knowledge that contradict the R items of correct knowledge;
    computing ratios of the verification probabilities of the R items of correct knowledge to the verification probabilities of the R items of wrong knowledge;
    ranking the R items of correct knowledge according to the ratios; and
    outputting the top N items of the correct knowledge after ranking, N being a positive integer and N ≤ R.
  20. The storage medium according to claim 19, wherein, when executed by the processor, the non-transitory computer instructions further implement the following operation:
    displaying the ranked N items of the correct knowledge.