WO2018226404A1 - Machine reasoning based on knowledge graph - Google Patents

Machine reasoning based on knowledge graph

Info

Publication number
WO2018226404A1
WO2018226404A1 PCT/US2018/034017 US2018034017W WO2018226404A1 WO 2018226404 A1 WO2018226404 A1 WO 2018226404A1 US 2018034017 W US2018034017 W US 2018034017W WO 2018226404 A1 WO2018226404 A1 WO 2018226404A1
Authority
WO
WIPO (PCT)
Prior art keywords
determining
subgraph
expression
node
natural language
Application number
PCT/US2018/034017
Other languages
French (fr)
Inventor
Yatao LI
Huanhuan Xia
Bin Shao
Tie-Yan Liu
Original Assignee
Microsoft Technology Licensing, Llc
Application filed by Microsoft Technology Licensing, Llc
Publication of WO2018226404A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation

Definitions

  • Knowledge may be managed in the form of a database and processed accordingly to support human-computer interaction. For example, a person may query a machine in natural language to ascertain whether certain logic holds, or ask the machine a question to obtain an answer. The machine's interpretation and processing of human natural language is a basis and an important constituent of artificial intelligence. Although a number of natural-language-based machine interaction techniques have been proposed, the logical reasoning capability of machines over human natural language is still insufficient at present.
  • an electronic device comprising a processing unit; and a memory coupled to the processing unit and having instructions stored thereon, the instructions, when executed by the processing unit, causing the device to perform acts including: in response to receiving a natural language expression, determining a predefined expression template matching the natural language expression; extracting a plurality of items from the natural language expression based on the predefined expression template; obtaining reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
  • FIG. 1 is a block diagram illustrating an example computing system/server in which one or more implementations of the subject matter described herein can be implemented;
  • FIG. 2 is a block diagram illustrating an example architecture in which one or more implementations of the subject matter described herein can be implemented;
  • FIG. 3 illustrates a part of an example knowledge graph according to one or more implementations of the subject matter described herein;
  • FIG. 4 illustrates a part of an example knowledge graph according to one or more implementations of the subject matter described herein;
  • FIG. 5a and FIG. 5b illustrate a part of an example knowledge graph according to one or more implementations of the subject matter described herein;
  • FIG. 6 is a flowchart illustrating a logical reasoning method according to one or more implementations of the subject matter described herein.
  • the term “comprising” and its variants are to be read as open-ended terms that mean “comprising, but not limited to.”
  • the term “or” is to be read as “and/or” unless the context clearly indicates otherwise.
  • the term “based on” is to be read as “based at least in part on.”
  • the terms “an implementation” and “one implementation” are to be read as “at least one implementation.”
  • the term “another implementation” is to be read as “at least another implementation.”
  • the terms “first,” “second,” and the like may refer to different or same objects. Other definitions, explicit and implicit, may be included below. The definitions of the terms throughout the Description are consistent, unless the context clearly indicates otherwise.
  • FIG. 1 is a block diagram illustrating an example computing system/server 100 in which one or more implementations of the subject matter described herein can be implemented.
  • the computing system/server 100 as shown in FIG. 1 is provided only as an example and should not be construed as limiting the function and range of the use of the implementations of the subject matter described herein.
  • the computing system/server 100 is in a form of a general computing device.
  • Components of the computing system/server 100 may include, but are not limited to, one or more processors or processing units 110, a memory 120, one or more input devices 130, one or more output devices 140, a storage 150, and one or more communication units 160.
  • the processing unit 110 may be an actual or virtual processor and can execute various processes based on programs stored in the memory 120.
  • in a multiprocessor system, multiple processing units execute computer-executable instructions to improve the processing capacity.
  • the computing system/server 100 generally includes a plurality of computer storage media, which can be any available medium accessible by the computing system/server 100, including but not limited to volatile and non-volatile medium, and removable and non-removable medium.
  • the memory 120 can be a volatile memory (for example, a register, cache, Random Access Memory (RAM)), a non-volatile memory (for example, a Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory), or any combination thereof.
  • the storage 150 may be removable/non-removable, and may include a machine readable medium, such as flash drive, disk or any other medium, which may be configured to store information and be accessed in the computing system/server 100.
  • the computing system/server 100 may further include an additional removable/non-removable, volatile/non-volatile memory medium.
  • a disk drive may be provided to read from and write into a removable non-volatile disk and a disc drive may be provided to read from and write into a removable non-volatile disc.
  • each drive is connected to the bus via one or more data medium interfaces.
  • the memory 120 may include at least one program product including a set of program modules (for example at least one program module) configured to perform functions of various implementations of the subject matter described herein.
  • the program module (for example at least one program module) 122 may be stored in a memory 120, for example.
  • This program module 122 may include, but is not limited to, an operating system, one or more applications, other program modules, and operational data. Each of the examples or a specific combination of the examples may include an implementation of a networked environment.
  • the program module 122 may be used to perform the function and/or method of the implementations of the subject matter described herein.
  • the input device 130 may be one or more various input devices.
  • the input device 130 may include a user device, such as a mouse, keyboard, tracking ball, or the like.
  • the communication unit 160 communicates with a further computing entity via a communication medium.
  • functions of components in the computing system/server 100 can be implemented by a single computing cluster or multiple computing machines that can be communicated via a communication link. Therefore, the computing system/server 100 can be operated in a networked environment using a logical link with one or more other servers, network personal computers (PCs) or another general network node.
  • the communication media include, for example but not limited to, wired or wireless network techniques.
  • the computing system/server 100 can also communicate with one or more external devices (not shown) as required, such as a storage device, display device or the like, one or more devices that enable users to interact with the computing system/server 100, or any devices that enable the computing system/server 100 to communicate with one or more other computing devices (for example, a network card, a modem, or the like). Such a communication is performed via an input/output (I/O) interface (not shown).
  • the program module 122 may receive a natural language expression input by a user, such as "Why can Albert Einstein think?" The program module 122 may also receive data related to a knowledge graph from the storage 150, and obtain a reasoning result for the expression. The program module 122 then outputs the reasoning result via the output device 140, for example "Albert Einstein is a person; person has brain; brain is capable of thinking."
  • FIG. 2 is a block diagram illustrating an example architecture 200 in which one or more implementations of the subject matter described herein may be implemented.
  • a knowledge graph 210 is stored in a form of graph database in the memory such as the memory 120 or the storage 150 as shown in FIG. 1.
  • the knowledge graph 210 may be implemented by a distributed storage environment to accommodate a graph database of a greater capacity.
  • the user may input the natural language expression via the input device 130.
  • the natural language expression input by the user may be a natural language question such as "Why can Albert Einstein think?"
  • a graph engine 220 may be provided to answer the question and obtain reasoning for this question.
  • the natural language expression may be an expression of another type such as "Bill Gates sets up the Microsoft Corporation.”
  • the graph engine 220 may determine whether the expression is valid, i.e., may obtain reasoning for the expression.
  • the graph engine 220 determines a predefined expression template matching the natural language expression.
  • the graph engine 220 may be implemented by the processing unit 110 as shown in FIG. 1, or may be implemented by a distributed computing environment to meet higher computing performance requirements.
  • the predefined expression template may be of various types, such as "why can A B?", "how does A B?", "is A B?", or the like.
  • a template may be determined to match the natural language expression based on a degree of correlation between the natural language expression and the template. For example, since the natural language question "why can Albert Einstein think?" includes the keyword "can,” the natural language question has a high degree of correlation with the predefined expression template "why can A B,” and it may be determined that the natural language question matches this predefined expression template.
  • the graph engine 220 extracts a plurality of items. These items may include one or more specified items in the predefined expression template. For example, the template “why can A B?” specifies items A and B, and the template “how does A B?" specifies items A and B. As a result, two items “Albert Einstein” and “think” may be extracted from the natural language question "why can Albert Einstein think?"
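  • As an illustration only, template matching and item extraction might be sketched as follows. The description selects a template by a degree of correlation; this sketch substitutes plain regular expressions for that step, and the names TEMPLATES and match_template are hypothetical rather than taken from the described implementation.

```python
import re

# Hypothetical template table: each pattern captures the items A and B.
# Regex matching is a simplification of the "degree of correlation" test.
TEMPLATES = {
    "why can A B?": re.compile(r"^why can (?P<A>.+) (?P<B>\S+)\?$", re.IGNORECASE),
    "how does A B?": re.compile(r"^how does (?P<A>.+) (?P<B>\S+)\?$", re.IGNORECASE),
    "is A B?": re.compile(r"^is (?P<A>.+) (?P<B>\S+)\?$", re.IGNORECASE),
}


def match_template(expression):
    """Return (template name, extracted items), or None if no template matches."""
    text = expression.strip()
    for name, pattern in TEMPLATES.items():
        m = pattern.match(text)
        if m:
            return name, {"A": m.group("A"), "B": m.group("B")}
    return None


print(match_template("Why can Albert Einstein think?"))
```

  • In this sketch the last word before the question mark is taken as item B and everything between the keyword and that word as item A, which is enough for the example question but would not handle multi-word B items.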
  • the graph engine 220 may obtain reasoning for the natural language expression by querying the knowledge graph using the plurality of items.
  • the knowledge graph includes a data graph which includes nodes representing entities or concepts and edges representing logical relations between nodes.
  • the data graph may include entities (also referred to as facts) and concepts (also referred to as common sense).
  • the knowledge graph may also include a logical rule graph which includes nodes representing abstract identifiers and edges representing logical expressions therebetween, such as hyperedges.
  • the hyperedge is an edge linking a plurality of nodes.
  • the representation of the knowledge graph will be introduced with reference to FIG. 3, which shows a part of an example knowledge graph 300 according to one or more implementations of the subject matter described herein.
  • the knowledge graph 300 includes three layers, namely an entity layer 350, a concept layer 360, and a logical rule layer 370, which may be referred to as an entity graph, a concept graph, and a logical rule graph, respectively. The layers become increasingly abstract from the entity layer 350 to the logical rule layer 370, and the data volume gradually decreases accordingly.
  • the entity layer 350 may be linked to the concept layer 360 via the relations between nodes to form a data graph as described above.
  • the structure as shown in FIG. 3 is provided only for illustrative purpose, and the representation of the knowledge graph may include more or fewer layers or nodes.
  • the structure of FIG. 3 may include only the concept layer 360 and the logical rule layer 370.
  • the entity layer 350 and the concept layer 360 may not be distinguished from each other.
  • the entity layer 350 is also referred to as an entity graph, including nodes representing entities and edges representing logical relations between the entities.
  • the entity may represent a specific object existing in the real world, such as Albert Einstein, Bill Gates, and so on.
  • Each entity node records various attributes or information related to the node.
  • "Pal" 351 is shown to be an entity node in the entity layer 350.
  • the concept layer 360 is also referred to as a concept graph, including nodes representing the concepts and edges representing a logical relation between concepts.
  • the concept may be an abstraction of the entity, such as person, animal, dog, and so on.
  • FIG. 3 shows that "bark" 361, "animal" 362, "dog" 363, "person" 364 and "actor" 365 are concept nodes in the concept layer 360.
  • the entity layer 350 and the concept layer 360 may be connected or linked through the logical relation between each pair of nodes, such that the entity layer 350 and the concept layer 360 are combined as an organic whole.
  • This may be implemented by any method existing at present and/or to be developed in the future, and the subject matter described herein is not limited in this regard.
  • the logical rule layer 370 includes various logical rules, and each logical rule may include abstraction nodes and one or more hyperedges representing relations between the nodes. Each of the abstraction nodes may indicate the respective attribute, or may not indicate any attribute. As shown in FIG. 3, the logical rule layer 370 includes a plurality of nodes represented by abstraction identifiers "?A" 310, "?B" 330, "?C" 340, "?X" 322, "?Y" 324, and "?Z" 326, respectively.
  • the relation between nodes is represented by a logical expression, for example, A does not have X (A NotHasA X), and C has a prerequisite X (C HasPrerequisite X).
  • each logical rule may be represented by a subgraph.
  • the subgraph includes a node 320, a node 330, and a node 340, and defines that the node 320 is capable of C and the node 320 is a part of B.
  • the knowledge graph 300 may be a recursive hybrid hypergraph (RHHG) for expressing knowledge of various types in the real world, and the knowledge may be presented in various manners.
  • each node of the graph may be another graph.
  • at least a part of nodes in the data graph may be one or more data graphs, such as one or more concept graphs, one or more entity graphs, and/or one or more combinations of data graphs and concept graphs.
  • the "city" node may include a "Beijing" subgraph and a "Shanghai" subgraph.
  • the "Beijing" subgraph may include nodes of "Haidian", "Chaoyang", and so on, and edges representing logical relations therebetween.
  • the "Haidian" node may include one or more further subgraphs.
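  • A minimal sketch of the recursive property, assuming a hypothetical Graph class in which each node may hold a nested graph. The class, its fields, and the "related_to" edge label are illustrative assumptions, not the storage format of the described knowledge graph.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A graph whose nodes may themselves contain a nested Graph."""
    nodes: dict = field(default_factory=dict)  # node name -> nested Graph or None
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add_node(self, name, subgraph=None):
        self.nodes[name] = subgraph
        return self

    def depth(self):
        """Number of nesting levels, counting this graph as level 1."""
        nested = [g.depth() for g in self.nodes.values() if g is not None]
        return 1 + max(nested, default=0)


# "city" contains "Beijing" and "Shanghai"; "Beijing" contains its districts.
haidian = Graph()  # may hold further subgraphs
beijing = Graph().add_node("Haidian", haidian).add_node("Chaoyang")
beijing.edges.append(("Haidian", "related_to", "Chaoyang"))  # placeholder relation
city = Graph().add_node("Beijing", beijing).add_node("Shanghai", Graph())
root = Graph().add_node("city", city)

print(root.depth())
```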
  • the recursive graph may simulate the organization manner of knowledge in the real world.
  • “Hybrid” means that the knowledge in the graph may be heterogeneous.
  • the relations between nodes represented by edges may be deterministic, or may be probabilistic.
  • the edge between "dog" 363 and "bark" 361 may be represented as "can, 0.8", i.e., there is a probability of 80% that the "dog" 363 can bark. Accordingly, in traversing the graph, there may be a probability of 80% of traversing from "dog" 363 to "bark" 361.
  • the edge between "dog" 363 and "animal" 362 may be represented as "yes, 0.98", i.e., there is a probability of 98% that the "dog" 363 is an animal, etc.
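  • One way to picture probabilistic edges during traversal is sketched below. The probabilities 0.8 and 0.98 are the figures given above; the EDGES layout, the try_traverse helper, and the use of a seeded random generator are assumptions made for illustration.

```python
import random

# Edges keyed by (source, target); values are (relation, probability).
# Labels and probabilities follow the description; the layout is assumed.
EDGES = {
    ("dog", "bark"): ("can", 0.8),
    ("dog", "animal"): ("yes", 0.98),
}


def try_traverse(source, target, rng):
    """Follow an edge with the probability it carries; False if absent or not taken."""
    edge = EDGES.get((source, target))
    if edge is None:
        return False
    _, probability = edge
    return rng.random() < probability


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000
taken = sum(try_traverse("dog", "bark", rng) for _ in range(trials))
print(f"dog -> bark taken in {taken / trials:.1%} of trials")
```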
  • “hybrid” may further represent that the edges of the graph may represent an explicit relation, or an implicit relation.
  • the explicit relation may be for example “can,” “be,” or the like, as described above.
  • the implicit relation may be that A is related to B, A and B occur simultaneously, and the like, which may be represented by a statistical model (for example, a neural network model).
  • the relation between the "dog" 363 and the "actor" 365, as represented by a dotted line, may be an implicit relation, because a dog is not an actor most of the time.
  • the implicit relation indicates a certain implicit association between the "dog” 363 and the "actor” 365, which cannot be appropriately represented by the explicit relation above.
  • “Hypergraph” means that the graph may include one or more hyperedges that link a plurality of nodes. For example, Zhang San, Li Si and Wang Wu who are friends with each other may be linked together using a hyperedge to represent the mutual relations among them.
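  • A hyperedge can be sketched as a relation over a set of member nodes rather than a pair. The Hyperedge and Hypergraph classes below are hypothetical and only illustrate the friendship example; they are not the described storage format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """An edge that links any number of nodes under one relation."""
    relation: str
    members: frozenset


class Hypergraph:
    def __init__(self):
        self.hyperedges = []

    def link(self, relation, *nodes):
        """Create one hyperedge covering all the given nodes."""
        edge = Hyperedge(relation, frozenset(nodes))
        self.hyperedges.append(edge)
        return edge

    def related(self, node, relation):
        """All other nodes sharing a hyperedge of the given relation with `node`."""
        found = set()
        for edge in self.hyperedges:
            if edge.relation == relation and node in edge.members:
                found |= edge.members - {node}
        return found


g = Hypergraph()
g.link("friends", "Zhang San", "Li Si", "Wang Wu")
print(sorted(g.related("Li Si", "friends")))
```

  • A single hyperedge here replaces the three pairwise edges that an ordinary graph would need to express the same mutual relation.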
  • the recursive hybrid hypergraph may represent various types of knowledge in one graph in a uniform form, so as to facilitate operations on knowledge.
  • a plurality of nodes in the knowledge graph, which are respectively related to the plurality of items extracted from the natural language expression, may be determined. For example, for the question "why can Albert Einstein think?", the nodes "Albert Einstein" and "think" may be determined in the knowledge graph 300.
  • Then, a path including these nodes may be determined from the knowledge graph 300 (for example, the data graph), the path including a part of the edges in the knowledge graph 300. Subsequently, reasoning corresponding to the natural language expression is determined based on the logical relations represented by the edges included in the path.
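  • The node, path, and reasoning steps above might look like the following sketch. It assumes a toy list of triples and uses a breadth-first search, which is one possible way to find a path and is not stated in the description; TRIPLES, find_path, and explain are hypothetical names.

```python
from collections import deque

# Toy data graph as (source, relation, target) triples, following the
# Albert Einstein example; the last triple is an unrelated distractor.
TRIPLES = [
    ("Albert Einstein", "is a", "person"),
    ("person", "has", "brain"),
    ("brain", "can", "think"),
    ("dog", "is an", "animal"),
]


def find_path(start, goal):
    """Breadth-first search returning the edges from start to goal, or None."""
    adjacency = {}
    for source, relation, target in TRIPLES:
        adjacency.setdefault(source, []).append((relation, target))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


def explain(path):
    """Render the relations along the path as a reasoning chain."""
    return "; ".join(f"{s} {r} {t}" for s, r, t in path)


print(explain(find_path("Albert Einstein", "think")))
```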
  • the matched path determined in the knowledge graph may be obtained by pattern matching a subgraph of logical rules with the data subgraph.
  • a subgraph corresponding to the predefined expression template may be determined in a logical rule layer.
  • the subgraph includes a node 320, a node 330, and a node 340, which defines that the node 320 can C and the node 320 is a part of B.
  • the subgraph of the data layer matching the subgraph of the logical rule layer may be determined.
  • the subgraph of the data layer includes the nodes above.
  • the path corresponding to the natural language expression may be determined based on the matched subgraph of the data layer.
  • the knowledge graph may be stored in a form of a graph database.
  • the knowledge graph may support a traversing operation on the graph, and the pattern matching may be achieved by traversing the graph. This may improve parallel computing efficiency and may allow the capacity of the graph to be enlarged accordingly.
  • the pattern matching between graphs may be implemented by any method existing at present or to be developed in the future, and the subject matter described herein is not limited in this regard.
  • At least a part of the edges in the knowledge graph indicates probabilities of relations represented by these edges.
  • a plurality of subgraphs matching the subgraph of the logical rule layer may be determined from the data graph by pattern matching. Then, a degree of match between each of these subgraphs and the subgraph of the logical rule layer may be determined based on the probabilities of the relations represented by the edges, and a subgraph whose degree of match exceeds a predetermined threshold is determined as the subgraph of the data layer. For example, the subgraph with the highest degree of match may be determined as the subgraph of the data layer.
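  • The selection step can be sketched as below. The description does not say how the degree of match is computed from the edge probabilities; taking the product of the probabilities is an assumption made here, and the second candidate with its probabilities is invented purely for illustration.

```python
from math import prod


def degree_of_match(subgraph):
    """Product of edge probabilities; an assumed scoring, not the patent's formula."""
    return prod(probability for _, _, _, probability in subgraph)


def select_subgraph(candidates, threshold):
    """Return the highest-scoring candidate above the threshold, else None."""
    scored = [(degree_of_match(c), c) for c in candidates]
    passing = [item for item in scored if item[0] > threshold]
    if not passing:
        return None
    return max(passing, key=lambda item: item[0])[1]


# Each edge is (source, relation, target, probability).
candidate_a = [("dog", "yes", "animal", 0.98), ("dog", "can", "bark", 0.8)]
candidate_b = [("dog", "yes", "actor", 0.1), ("actor", "can", "bark", 0.5)]  # made-up values

best = select_subgraph([candidate_a, candidate_b], threshold=0.5)
print(best)
```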
  • FIG. 4 is a part of an example knowledge graph 400 according to one or more implementations of the subject matter described herein.
  • the knowledge graph 400 shows a subgraph obtained for the natural language expression "Why can Albert Einstein think, but computer cannot?"
  • the graph engine 220 may match the natural language expression with the predefined template or rule in the knowledge graph, namely the rules "why can B C?" and "why cannot A C?" in the logical rule layer 370, as shown in FIG. 3.
  • the corresponding subgraph of the rules in the rule layer 370 may be pattern matched with the data layer to determine the matched subgraph, as shown in FIG. 4.
  • "computer" 410 matches "?A" 310, "person" 430 matches "?B" 330, and "think" 440 matches "?C" 340 in the logical rule layer. In addition, "brain" 422 matches "?X" 322, "cerebral cortex" 424 matches "?Y" 324, and "neuron" 426 matches "?Z" 326.
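  • Binding abstraction identifiers such as "?B" and "?X" to data nodes can be sketched with a small backtracking matcher. The rule below (something that can C and is a part of B) paraphrases the subgraph described earlier; the relation names, the DATA triples, and the functions unify and match_rule are assumptions, and the description leaves the actual matching method open.

```python
def unify(term, value, bindings):
    """Bind a '?variable' to value, or check a constant or an existing binding."""
    if term.startswith("?"):
        if term in bindings:
            return bindings[term] == value
        bindings[term] = value
        return True
    return term == value


def match_rule(rule, data, bindings):
    """Yield every variable binding under which all rule edges occur in data."""
    if not rule:
        yield dict(bindings)
        return
    (src, rel, dst), rest = rule[0], rule[1:]
    for d_src, d_rel, d_dst in data:
        if d_rel != rel:
            continue
        trial = dict(bindings)  # copy so a failed attempt leaves no trace
        if unify(src, d_src, trial) and unify(dst, d_dst, trial):
            yield from match_rule(rest, data, trial)


# Rule subgraph: some ?X can ?C, and ?X is a part of ?B.
RULE = [("?X", "can", "?C"), ("?X", "part_of", "?B")]
# Illustrative data layer; the "cpu" triple is a made-up distractor.
DATA = [
    ("brain", "can", "think"),
    ("brain", "part_of", "person"),
    ("cpu", "part_of", "computer"),
]

results = list(match_rule(RULE, DATA, {"?B": "person", "?C": "think"}))
print(results)
```

  • Starting the same search with "?B" bound to "computer" yields no binding in this toy data, which mirrors the "why cannot A C?" half of the example.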
  • FIGS. 5a and 5b are diagrams of example knowledge graphs 500 and 550 according to one or more implementations of the subject matter described herein.
  • In the graph 500, "Pal" 510 is a "dog" 520, and "dog" 520 is an "animal" 530.
  • the graph 500 may be transformed into a graph 550, in which it is shown that "Pal" 510 is an "animal" 530. This may achieve logical deduction over the transitive relation.
  • When an intermediate node (e.g. "dog" 520) lies between two nodes (e.g. "Pal" 510 and "animal" 530) and the relations on either side of it are transitive, the path from "Pal" 510 to "dog" 520 and then to "animal" 530 may be correspondingly transformed into the path from "Pal" 510 to "animal" 530.
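  • The transformation from graph 500 to graph 550 can be sketched as collapsing consecutive edges that carry the same transitive relation. Which relations count as transitive is an assumption recorded in the hypothetical TRANSITIVE set; the description does not enumerate them.

```python
# Assumption: only the "is a" relation is treated as transitive here.
TRANSITIVE = {"is a"}


def collapse(path):
    """Merge consecutive edges that share the same transitive relation."""
    result = []
    for edge in path:
        if result:
            src, rel, mid = result[-1]
            # The previous edge ends where this one starts, with the same
            # transitive relation, so the intermediate node can be bypassed.
            if mid == edge[0] and rel == edge[1] and rel in TRANSITIVE:
                result[-1] = (src, rel, edge[2])
                continue
        result.append(edge)
    return result


path = [("Pal", "is a", "dog"), ("dog", "is a", "animal")]
print(collapse(path))
```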
  • FIG. 6 is a flowchart illustrating a logical reasoning method 600 according to one or more implementations of the subject matter described herein.
  • the method 600 may be performed by the graph engine 220 or processing unit 110, and the subject matter described herein is not limited in this regard.
  • a predefined expression template matching the natural language expression is determined.
  • the natural language expression may be a natural language question, such as "Why can Albert Einstein think?"
  • a plurality of items is extracted from the natural language expression based on the predefined expression template.
  • reasoning for the natural language expression is obtained by querying a knowledge graph using the plurality of items.
  • the knowledge graph includes nodes representing entities or concepts and edges representing logical relations between nodes.
  • the knowledge graph may be stored in a form of a graph database at the memory 120 or storage 150, for example.
  • obtaining the reasoning includes: determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively; determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and determining the reasoning based on the logical relations represented by the edges included in the path.
  • the knowledge graph includes a logical rule layer and a data layer
  • the data layer includes nodes representing entities or concepts
  • the logical rule layer includes subgraphs representing logical rules
  • determining the path includes: determining a subgraph corresponding to the predefined expression template from the logical rule layer; determining a subgraph of the data layer matching the subgraph of the logical rule layer based on the plurality of nodes; and determining the path based on the subgraph of the data layer.
  • At least a part of the edges in the path indicate a probability of relations represented by the edges
  • determining the subgraph of the data layer comprises: determining a plurality of subgraphs of the data layer matching the subgraph of the logical rule layer; determining degrees of match between the plurality of subgraphs and the subgraph of the logical rule layer based on the probability; and determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the subgraph of the data layer.
  • determining the subgraph of the data layer includes: in response to determining that an intermediate node is included between a first node and a second node in the data graph, determining whether a first relation between the first node and the intermediate node and a second relation between the intermediate node and the second node are transitive; and in response to determining that the first relation and the second relation are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
  • the functionality described herein can be performed, at least in part, by one or more hardware logic components.
  • illustrative types of hardware logic components include Field-Programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • Program code for carrying out methods of the subject matter described herein may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine readable medium may be any tangible medium that may contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine readable medium may be a machine readable signal medium or a machine readable storage medium.
  • a machine readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • More specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-implemented method comprises: in response to receiving a natural language expression, determining a predefined expression template matching the natural language expression; extracting a plurality of items from the natural language expression based on the predefined expression template; obtaining reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
  • obtaining the reasoning comprises: determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively; determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and determining the reasoning based on the logical relations represented by the edges included in the path.
  • the knowledge graph includes a logical rule layer and a data layer
  • the data layer includes nodes representing entities or concepts
  • the logical rule layer includes a subgraph representing logical rules
  • determining the path comprises: determining from the logical rule layer a first subgraph corresponding to the predefined expression template; determining from the data layer a second subgraph matching the first subgraph based on the plurality of nodes; and determining the path based on the second subgraph.
  • determining the second subgraph comprises: determining a plurality of subgraphs of the data layer matching the first subgraph; determining degrees of match between the plurality of subgraphs and the first subgraph based on the probability; and determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the second subgraph.
  • determining the second subgraph comprises: in response to determining that the data graph includes an intermediate node between a first node and a second node, determining whether a first relation between the first node and the intermediate node and a second relation between the second node and the intermediate node are transitive; and in response to determining that the first and second relations are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
  • the knowledge graph is stored in a form of graph database.
  • the natural language expression is a natural language question.
  • an electronic device comprising a processing unit; and a memory coupled to the processing unit and having instructions stored thereon, the instructions, when executed by the processing unit, causing the device to perform acts including: in response to receiving a natural language expression, determining a predefined expression template matching the natural language expression; extracting a plurality of items from the natural language expression based on the predefined expression template; obtaining reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
  • obtaining the reasoning comprises: determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively; determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and determining the reasoning based on the logical relations represented by the edges included in the path.
  • the knowledge graph includes a logical rule layer and a data layer
  • the data layer includes nodes representing entities or concepts
  • the logical rule layer includes a subgraph representing logical rules
  • determining the path comprising: determining from the logical rule layer a first subgraph corresponding to the predefined expression template; determining from the data layer a second subgraph matching the first subgraph based on the plurality of nodes; and determining the path based on the second subgraph.
  • determining the second subgraph comprising: determining a plurality of subgraphs of the data layer matching the first subgraph; determining degrees of match between the plurality of subgraphs and the first subgraph based on the probability; and determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the second subgraph.
  • determining the second subgraph comprises: in response to determining that the data graph includes an intermediate node between a first node and a second node, determining whether a first relation between the first node and the intermediate node and a second relation between the second node and the intermediate node are transitive; and in response to determining that the first and second relations are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
  • the knowledge graph is stored in a form of graph database.
  • the natural language expression is a natural language question.
  • a computer program product is tangibly stored in a non-transitory computer storage medium and comprising machine executable instructions.
  • the machine executable instructions, when executed by a device, cause the device to: in response to receiving a natural language expression, determine a predefined expression template matching the natural language expression; extract a plurality of items from the natural language expression based on the predefined expression template; obtain reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
  • obtaining the reasoning comprises: determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively; determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and determining the reasoning based on the logical relations represented by the edges included in the path.
  • the knowledge graph includes a logical rule layer and a data layer
  • the data layer includes nodes representing entities or concepts
  • the logical rule layer includes a subgraph representing logical rules
  • determining the path comprising: determining from the logical rule layer a first subgraph corresponding to the predefined expression template; determining from the data layer a second subgraph matching the first subgraph based on the plurality of nodes; and determining the path based on the second subgraph.
  • determining the second subgraph comprising: determining a plurality of subgraphs of the data layer matching the first subgraph; determining degrees of match between the plurality of subgraphs and the first subgraph based on the probability; and determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the second subgraph.
  • determining the second subgraph comprises: in response to determining that the data graph includes an intermediate node between a first node and a second node, determining whether a first relation between the first node and the intermediate node and a second relation between the second node and the intermediate node are transitive; and in response to determining that the first and second relations are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
  • the knowledge graph is stored in a form of graph database.
  • the natural language expression is a natural language question.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Implementations of the subject matter described herein relate to machine reasoning based on a knowledge graph. In some implementations, there is provided a computer-implemented method. The method comprises, in response to receiving a natural language expression, determining a predefined expression template matching the natural language expression. A plurality of items is extracted from the natural language expression based on the predefined expression template. Reasoning for the natural language expression is obtained by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.

Description

MACHINE REASONING BASED ON KNOWLEDGE GRAPH
BACKGROUND
[0001] With the development of networks, people can obtain a great deal of knowledge of various types. This knowledge may be managed in the form of a database and processed accordingly to promote human-computer interaction. For example, a person may query a machine in natural language to ascertain whether certain logic holds, or may ask the machine a question to obtain a solution, or the like. The machine's interpretation and processing of human natural language form a basis and an important constituent of artificial intelligence. Although a number of machine interaction techniques based on natural language have been proposed, the logical reasoning capability of machines on human natural language is still insufficient at present.
SUMMARY
[0002] According to some implementations, there is provided an electronic device. The device comprises a processing unit; and a memory coupled to the processing unit and having instructions stored thereon, the instructions, when executed by the processing unit, causing the device to perform acts including: in response to receiving a natural language expression, determining a predefined expression template matching the natural language expression; extracting a plurality of items from the natural language expression based on the predefined expression template; obtaining reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
[0003] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] FIG. 1 is a block diagram illustrating an example computing system/server in which one or more implementations of the subject matter described herein can be implemented;
[0005] FIG. 2 is a block diagram illustrating an example architecture in which one or more implementations of the subject matter described herein can be implemented;
[0006] FIG. 3 illustrates a part of an example knowledge graph according to one or more implementations of the subject matter described herein;
[0007] FIG. 4 illustrates a part of an example knowledge graph according to one or more implementations of the subject matter described herein;
[0008] FIG. 5a and FIG. 5b illustrate a part of an example knowledge graph according to one or more implementations of the subject matter described herein; and
[0009] FIG. 6 is a flowchart illustrating a logical reasoning method according to one or more implementations of the subject matter described herein.
[0010] Throughout the drawings, the same or similar reference signs refer to the same or similar elements.
DETAILED DESCRIPTION OF EMBODIMENTS
[0011] The subject matter described herein will now be described with reference to various example embodiments. It is to be understood that the implementations are described only to enable those skilled in the art to better understand and further implement the subject matter described herein and by no means imply any limitations as to the scope of the subject matter described herein.
[0012] As used herein, the term "comprising" and its variants are to be read as open-ended terms that mean "comprising, but not limited to." The term "or" is to be read as "and/or" unless the context clearly indicates otherwise. The term "based on" is to be read as "based at least in part on." The terms "an implementation" and "one implementation" are to be read as "at least one implementation." The term "another implementation" is to be read as "at least another implementation." The terms "first," "second," and the like may refer to different or same objects. Other definitions, explicit and implicit, may be included below. The definitions of the terms throughout the Description are consistent, unless the context clearly indicates otherwise.
[0013] FIG. 1 is a block diagram illustrating an example computing system/server 100 in which one or more implementations of the subject matter described herein can be implemented. The computing system/server 100 as shown in FIG. 1 is provided only as an example and should not be construed as limiting the function and range of the use of the implementations of the subject matter described herein.
[0014] As shown in FIG. 1, the computing system/server 100 is in the form of a general computing device. Components of the computing system/server 100 may include, but are not limited to, one or more processors or processing units 110, a memory 120, one or more input devices 130, one or more output devices 140, a storage 150, and one or more communication units 160. The processing unit 110 may be an actual or virtual processor and can execute various processes based on programs stored in the memory 120. In a multiprocessing system, multiple processing units execute computer-executable instructions in parallel to improve the processing capacity.
[0015] The computing system/server 100 generally includes a plurality of computer storage media, which can be any available medium accessible by the computing system/server 100, including but not limited to volatile and non-volatile medium, and removable and non-removable medium. The memory 120 can be a volatile memory (for example, a register, cache, Random Access Memory (RAM)), a non-volatile memory (for example, a Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory), or any combination thereof. The storage 150 may be removable/non-removable, and may include a machine readable medium, such as flash drive, disk or any other medium, which may be configured to store information and be accessed in the computing system/server 100.
[0016] The computing system/server 100 may further include an additional removable/non-removable, volatile/non-volatile memory medium. Although not shown in FIG. 1, a disk drive may be provided to read from and write into a removable non-volatile disk and a disc drive may be provided to read from and write into a removable non-volatile disc. In these cases, each drive is connected to the bus via one or more data medium interfaces. The memory 120 may include at least one program product including a set of program modules (for example at least one program module) configured to perform functions of various implementations of the subject matter described herein.
[0017] The program module (for example at least one program module) 122 may be stored in the memory 120, for example. This program module 122 may include, but is not limited to, an operating system, one or more applications, other program modules and operational data. Each of the examples or a specific combination of the examples may include an implementation of a networked environment. The program module 122 may be used to perform the function and/or method of the implementations of the subject matter described herein.
[0018] The input device 130 may be one or more various input devices. For example, the input device 130 may include a user device, such as a mouse, keyboard, trackball, or the like. The communication unit 160 communicates with a further computing entity via a communication medium. Additionally, functions of components in the computing system/server 100 can be implemented by a single computing cluster or by multiple computing machines that can communicate via a communication link. Therefore, the computing system/server 100 can be operated in a networked environment using a logical link with one or more other servers, network personal computers (PCs) or another general network node. The communication media include, for example but not limited to, wired or wireless network techniques.
[0019] The computing system/server 100 can also communicate with one or more external devices (not shown) as required, such as a storage device, display device or the like, one or more devices that enable users to interact with the computing system/server 100, or any devices that enable the computing system/server 100 to communicate with one or more other computing devices (for example, a network card, a modem, or the like). Such a communication is performed via an input/output (I/O) interface (not shown).
[0020] As shown in FIG. 1, the program module 122 may receive a natural language expression input by a user, such as "Why can Albert Einstein think?" The program module 122 may also receive data related to a knowledge graph from the storage 150, and obtain a reasoning result for the expression. The program module 122 then outputs the reasoning result via the output device 140, for example "Albert Einstein is a person; person has brain; brain is capable of thinking."
[0021] FIG. 2 is a block diagram illustrating an example architecture 200 in which one or more implementations of the subject matter described herein may be implemented. In the example architecture 200, a knowledge graph 210 is stored in a form of graph database in the memory such as the memory 120 or the storage 150 as shown in FIG. 1. The knowledge graph 210 may be implemented by a distributed storage environment to accommodate a graph database of a greater capacity. The user may input the natural language expression via the input device 130. In some implementations, the natural language expression input by the user may be a natural language question such as "Why can Albert Einstein think?" A graph engine 220 may be provided to answer the question and obtain reasoning for this question. The natural language expression may be an expression of another type such as "Bill Gates sets up the Microsoft Corporation." The graph engine 220 may determine whether the expression is valid, i.e., may obtain reasoning for the expression.
[0022] In response to receiving the natural language expression, the graph engine 220 determines a predefined expression template matching the natural language expression. The graph engine 220 may be implemented by the processing unit 110 as shown in FIG. 1, or may be implemented by a distributed computing environment to meet higher requirements on computing performance. The predefined expression template may be of various types, such as "why can A B?", "how does A B?", "is A B?", or the like. In some implementations, a template may be determined to match the natural language expression based on a degree of correlation between the natural language expression and the template. For example, since the natural language question "why can Albert Einstein think?" includes the keyword "can," the natural language question has a high degree of correlation with the predefined expression template "why can A B?", and it may be determined that the natural language question matches this predefined expression template.
[0023] Based on the predefined expression template, the graph engine 220 extracts a plurality of items. These items may include one or more specified items in the predefined expression template. For example, the template "why can A B?" specifies items A and B, and the template "how does A B?" specifies items A and B. As a result, two items "Albert Einstein" and "think" may be extracted from the natural language question "why can Albert Einstein think?"
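The template matching and item extraction described above can be sketched as follows. This is only an illustrative sketch: it substitutes regular expressions for the degree-of-correlation matching mentioned in the description, and the table name TEMPLATES, the function match_template, and the assumption that item B is a single word are all hypothetical.

```python
import re

# Hypothetical template table: each pattern captures the items A and B
# that the corresponding predefined expression template specifies.
TEMPLATES = {
    "why can A B?": re.compile(r"^why can (?P<A>.+) (?P<B>\w+)\?$", re.IGNORECASE),
    "how does A B?": re.compile(r"^how does (?P<A>.+) (?P<B>\w+)\?$", re.IGNORECASE),
    "is A B?": re.compile(r"^is (?P<A>.+) (?P<B>\w+)\?$", re.IGNORECASE),
}


def match_template(expression):
    """Return (template name, extracted items) for the first matching template, or None."""
    text = expression.strip()
    for name, pattern in TEMPLATES.items():
        found = pattern.match(text)
        if found:
            return name, found.groupdict()
    return None


result = match_template("Why can Albert Einstein think?")
```

Here the two extracted items are bound to the names A and B of the matched template, so later stages can look them up as nodes in the knowledge graph.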
[0024] The graph engine 220 may obtain reasoning for the natural language expression by querying the knowledge graph using the plurality of items. The knowledge graph includes a data graph which includes nodes representing entities or concepts and edges representing logical relations between nodes. The data graph may include entities (also referred to as facts) and concepts (also referred to as common sense). In some implementations, the knowledge graph may also include a logical rule graph which includes nodes representing abstract identifiers and edges representing logical expressions therebetween, such as hyperedges. The hyperedge is an edge linking a plurality of nodes.
[0025] In order to describe the principle of the subject matter described herein in more detail, in particular how to query the knowledge graph using the items extracted from the natural language expression, the representation of the knowledge graph will be introduced with reference to FIG. 3, which shows a part of an example knowledge graph 300 according to one or more implementations of the subject matter described herein. In the example as shown in FIG. 3, the knowledge graph 300 includes three layers, including an entity layer 350, a concept layer 360, and a logical rule layer 370, which may be referred to as an entity graph, a concept graph, and a logical rule graph, respectively. The layers become increasingly abstract from the entity layer 350 to the logical rule layer 370, and the data volume gradually decreases from the entity layer 350 to the logical rule layer 370 accordingly. The entity layer 350 may be linked to the concept layer 360 via the relations between nodes to form a data graph as described above. However, it is to be understood that the structure as shown in FIG. 3 is provided only for illustrative purpose, and the representation of the knowledge graph may include more or fewer layers or nodes. For example, the structure of FIG. 3 may include only the concept layer 360 and the logical rule layer 370. In some implementations, the entity layer 350 and the concept layer 360 may not be distinguished from each other.
[0026] The entity layer 350 is also referred to as an entity graph, including nodes representing entities and edges representing logical relations between the entities. The entity may represent a specific object existing in the real world, such as Albert Einstein, Bill Gates, and so on. Each entity node records various attributes or information related to the node. In the example of FIG. 3, "Pal" 351 is shown to be an entity node in the entity layer 350.
[0027] The concept layer 360 is also referred to as a concept graph, including nodes representing the concepts and edges representing a logical relation between concepts. The concept may be an abstraction of the entity, such as person, animal, dog, and so on. For example, FIG. 3 shows that "bark" 361, "animal" 362, "dog" 363, "person" 364 and "actor" 365 are concept nodes in the concept layer 360.
[0028] As described above, the entity layer 350 and the concept layer 360 may be connected or linked through the logical relation between each pair of nodes, such that the entity layer 350 and the concept layer 360 are combined as an organic whole. This may be implemented by any method existing at present and/or to be developed in the future, and the subject matter described herein is not limited in this regard.
[0029] The logical rule layer 370 includes various logical rules, and each logical rule may include abstraction nodes and one or more hyperedges representing relations between the nodes. Each of the abstraction nodes may indicate the respective attribute, or may not indicate any attribute. As shown in FIG. 3, the logical rule layer 370 includes a plurality of nodes represented by abstraction identifiers "?A" 310, "?B" 330, "?C" 340, "?X" 322, "?Y" 324, and "?Z" 326, respectively. The relation between nodes is represented by a logic expression, for example, A does not have X (A NotHasA X), and C has a prerequisite X (C HasPrerequisite X). In the logical rule layer 370, a plurality of logical rules may be predefined, and each logical rule may be represented by a subgraph. For example, as shown in FIG. 3, regarding "why can B C?" the subgraph includes a node 320, a node 330, and a node 340, and defines that the node 320 is capable of C and the node 320 is a part of B.
[0030] In some implementations, the knowledge graph 300 may be a recursive hybrid hypergraph (RHHG) for expressing knowledge of various types in the real world, and the knowledge may be presented in various manners.
[0031] "Recursive" means that knowledge is organized in a hierarchical manner, and each node of the graph may be another graph. For example, at least a part of nodes in the data graph may be one or more data graphs, such as one or more concept graphs, one or more entity graphs, and/or one or more combinations of data graphs and concept graphs. For example, the "city" node may include a "Beijing" subgraph and a "Shanghai" subgraph. The "Beijing" subgraph may include nodes of "Haidian", "Chaoyang", and so on, and edges representing logical relations therebetween. The "Haidian" node may include one or more further subgraphs. The recursive graph may simulate the organization manner of knowledge in the real world.
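A minimal sketch of the recursive organization described above, in which each node may itself hold a graph. The Graph class, its fields, and the "relatedTo" relation label are hypothetical names chosen for illustration and are not taken from the described implementations.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A node that may itself contain a nested graph of further nodes."""
    name: str
    nodes: dict = field(default_factory=dict)  # node name -> nested Graph
    edges: list = field(default_factory=list)  # (source, relation, target)

    def add(self, child):
        """Attach a nested graph as a node of this graph and return it."""
        self.nodes[child.name] = child
        return child

    def depth(self):
        """Number of nesting levels, counting this graph as level 1."""
        return 1 + max((g.depth() for g in self.nodes.values()), default=0)


city = Graph("city")
beijing = city.add(Graph("Beijing"))
city.add(Graph("Shanghai"))
beijing.add(Graph("Haidian"))
beijing.add(Graph("Chaoyang"))
# Hypothetical relation label between two nodes of the "Beijing" subgraph.
beijing.edges.append(("Haidian", "relatedTo", "Chaoyang"))
```

Because every node is itself a Graph, the "Haidian" node can later receive its own subgraphs without changing the structure of the enclosing levels.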
[0032] "Hybrid" means that the knowledge in the graph may be heterogeneous. For example, the relations between nodes represented by edges may be deterministic, or may be probabilistic. For example, the edge between "dog" 363 and "bark" 361 may be represented as "can, 0.8", i.e., the "dog" 363 has a probability of 80% that it can bark. Accordingly, in traversing the graph, there may be a probability of 80% to traverse from the "dog" 363 to "bark" 361. The edge between the "dog" 363 and the "animal" 362 may be represented as "yes, 0.98", i.e., there is a probability of 98% that the "dog" 363 is an animal, etc. By using the probabilistic edges, relatively unimportant information may be filtered out in traversing the graph, so as to save computation overhead.
[0033] In addition, "hybrid" may further represent that the edges of the graph may represent an explicit relation, or an implicit relation. The explicit relation may be for example "can," "be," or the like, as described above. The implicit relation may be that A is related to B, A and B occur simultaneously, and the like, which may be represented by a statistical model (for example, a neural network model). For example, the relation between the "dog" 363 and the "actor" 365, as represented by a dotted line, may be an implicit relation, because a dog is not an actor most of the time. The implicit relation indicates a certain implicit association between the "dog" 363 and the "actor" 365, which cannot be appropriately represented by the explicit relation above.
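One possible way to use probabilistic edges for filtering during traversal is sketched below. The threshold-based filter, the names EDGES and neighbors, the "isA" label, and the low-probability "sing" edge are assumptions made for illustration; the description does not prescribe a particular filtering rule.

```python
# Edge table: (source, relation, target) -> probability of the relation.
EDGES = {
    ("dog", "can", "bark"): 0.8,
    ("dog", "isA", "animal"): 0.98,
    ("dog", "can", "sing"): 0.05,  # assumed, illustrative low-probability edge
}


def neighbors(node, min_probability=0.5):
    """Return outgoing edges of a node whose probability meets the threshold."""
    return sorted(
        (target, relation, probability)
        for (source, relation, target), probability in EDGES.items()
        if source == node and probability >= min_probability
    )


likely = neighbors("dog")
```

With the default threshold, only the "bark" and "animal" edges are followed, so the traversal skips the unlikely edge and the work that would follow from it.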
[0034] "Hypergraph" means that the graph may include one or more hyperedges that link a plurality of nodes. For example, Zhang San, Li Si and Wang Wu who are friends with each other may be linked together using a hyperedge to represent the mutual relations among them.
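As a small illustration of the hyperedge idea, a single labeled edge can hold a set of nodes rather than a pair. The tuple layout and the helper pairwise are hypothetical and are shown only to make the contrast with ordinary two-node edges concrete.

```python
from itertools import combinations

# A hyperedge: one relation label attached to a whole set of nodes.
friends = ("friends", frozenset({"Zhang San", "Li Si", "Wang Wu"}))


def pairwise(hyperedge):
    """Expand a hyperedge into the ordinary two-node edges it stands for."""
    label, members = hyperedge
    return sorted((a, label, b) for a, b in combinations(sorted(members), 2))


pairs = pairwise(friends)
```

One hyperedge replaces the three pairwise edges that would otherwise be needed to state that all three people are friends with each other.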
[0035] The recursive hybrid hypergraph may represent various types of knowledge in one graph in a uniform form, so as to facilitate operations on knowledge.
[0036] Referring to FIGS. 3 and 4, how to obtain reasoning based on the knowledge graph will be further introduced below. In some implementations, a plurality of nodes in the knowledge graph, which are respectively related to a plurality of items extracted from the natural language expression, may be determined. For example, as for the question "why can Albert Einstein think?" nodes "Albert Einstein" and "think" may be determined in the knowledge graph 300. A path including these nodes may be determined from the knowledge graph 300 (for example, the data graph), the path including a part of the edges in the knowledge graph 300 (for example, the data graph).
[0037] In some implementations, nodes in the knowledge graph 300 (for example, the data graph), which are respectively associated with the extracted items, may be determined. Then, a path including these nodes may be determined from the knowledge graph 300 (for example, the data graph), the path including a part of the edges in the knowledge graph 300. Subsequently, reasoning corresponding to the natural language expression is determined based on the logical relation represented by the edges included in the path.
[0038] In some implementations, the matched path determined in the knowledge graph may be obtained by pattern matching a subgraph of logical rules with the data subgraph. For example, a subgraph corresponding to the predefined expression template may be determined in a logical rule layer. As shown in FIG. 3, for the natural language question "why can B C?" the subgraph includes a node 320, a node 330, and a node 340, which defines that the node 320 can C and the node 320 is a part of B.
[0039] Based on the nodes extracted from the natural language expression, the subgraph of the data layer matching the subgraph of the logical rule layer may be determined. The subgraph of the data layer includes the nodes above. Next, the path corresponding to the natural language expression may be determined based on the matched subgraph of the data layer.
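The pattern matching between a rule subgraph and the data layer can be sketched as a small backtracking matcher over edge triples. The relation labels, the extra "?E" variable linking the entity to its concept, and the names DATA, RULE, unify and match are assumptions for illustration and are not the described implementation.

```python
# Data layer as (source, relation, target) triples; labels are illustrative.
DATA = [
    ("Albert Einstein", "IsA", "person"),
    ("brain", "PartOf", "person"),
    ("brain", "CapableOf", "think"),
    ("wheel", "PartOf", "car"),
]

# Rule subgraph for "why can B C?"; names starting with "?" are variables.
RULE = [
    ("?E", "IsA", "?B"),        # assumed link from the extracted entity to B
    ("?X", "PartOf", "?B"),
    ("?X", "CapableOf", "?C"),
]


def unify(term, value, binding):
    """Bind a variable, or check a constant, against a data value; None on conflict."""
    if term.startswith("?"):
        if binding.get(term, value) != value:
            return None
        return {**binding, term: value}
    return binding if term == value else None


def match(rule, data, binding):
    """Yield every binding under which all rule edges appear in the data."""
    if not rule:
        yield binding
        return
    (src, rel, dst), rest = rule[0], rule[1:]
    for d_src, d_rel, d_dst in data:
        if d_rel != rel:
            continue
        candidate = unify(src, d_src, binding)
        if candidate is not None:
            candidate = unify(dst, d_dst, candidate)
        if candidate is not None:
            yield from match(rest, data, candidate)


# Seed the variables with the items extracted from the expression.
matches = list(match(RULE, DATA, {"?E": "Albert Einstein", "?C": "think"}))
```

Seeding the bindings with the extracted items restricts the search to data subgraphs that contain those nodes, which mirrors the requirement that the matched subgraph include them.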
[0040] The knowledge graph may be stored in a form of a graph database. As a result, in pattern matching, the knowledge graph may support a traversing operation on the graph, and the pattern matching may be achieved by traversing the graph. This may improve a parallel computing efficiency and may enlarge the capacity of the graph accordingly. However, it is to be understood that the pattern matching between graphs may be implemented by any method existing at present or to be developed in the future, and the subject matter described herein is not limited in this regard.
[0041] As described above, at least a part of the edges in the knowledge graph indicates probabilities of relations represented by these edges. In this case, a plurality of subgraphs matching the subgraph of the logical rule layer may be determined from the data graph by pattern matching. Then, degrees of match between these subgraphs and the subgraph of the logical rule layer may be determined based on the probabilities of the relations represented by the edges, and a subgraph with a degree of match over a predetermined threshold is determined as the subgraph of the data layer. For example, the subgraph with the highest degree of match may be determined as the subgraph of the data layer.
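A sketch of selecting a matched subgraph by degree of match follows. Taking the degree of match as the product of edge probabilities is an assumption, as are the candidate names and probability values, which are invented purely for illustration.

```python
from math import prod

# Hypothetical candidate subgraphs, each listed by its edge probabilities.
CANDIDATES = {
    "via brain": [0.99, 0.95, 0.9],
    "via other part": [0.99, 0.2, 0.3],
}


def degree_of_match(probabilities):
    """Assumed scoring rule: the product of the edge probabilities."""
    return prod(probabilities)


def select(candidates, threshold):
    """Return the best-scoring candidate above the threshold, or None."""
    scored = {name: degree_of_match(p) for name, p in candidates.items()}
    above = {name: score for name, score in scored.items() if score > threshold}
    return max(above, key=above.get) if above else None


chosen = select(CANDIDATES, threshold=0.5)
```

Other scoring rules, such as the minimum edge probability, would fit the same interface; the product is used here only because it penalizes every weak edge along the subgraph.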
[0042] FIG. 4 is a part of an example knowledge graph 400 according to one or more implementations of the subject matter described herein. The knowledge graph 400 shows a subgraph obtained for the natural language expression "Why can Albert Einstein think, but computer cannot?"
[0043] Specifically, upon receiving the natural language question "Why can Albert Einstein think, but computer cannot?", the graph engine 220 may match the natural language expression with the predefined template or rule in the knowledge graph, namely the rules "why can B C?" and "why cannot A C?" in the logical rule layer 370, as shown in FIG. 3. Next, the corresponding subgraph of the rules in the rule layer 370 may be pattern matched with the data layer to determine the matched subgraph, as shown in FIG. 4.
[0044] In the knowledge graph 400, "computer" 410 matches "?A" 310 in the logical rule graph, "person" 430 matches "?B" 330 in the logical rule graph, and "think" 440 matches "?C" 340 in the logical rule layer. In addition, "brain" 422 matches "?X" 322, "cerebral cortex" 424 matches "?Y" 324, and "neuron" 426 matches "?Z" 326.
[0045] Reasoning for the natural language expression may be obtained based on the knowledge graph 400 matching the logical rules. Specifically, as shown in FIG. 4, "brain" 422 can "think" 440, "person" 430 has a "brain" 422, and "Albert Einstein" 450 is a "person" 430. Therefore, "Albert Einstein" 450 can "think" 440. Likewise, "think" 440 requires "brain" 422, and "computer" 410 does not have a "brain" 422. Consequently, "computer" 410 cannot "think" 440.
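The two conclusions above can be reproduced with a few lines over a set of facts. This sketch treats a missing "has" fact as "does not have", which is a simplifying assumption (the description uses an explicit negative relation), and the names FACTS, has_part and can are hypothetical. The recursion also assumes the "is a" relations contain no cycles.

```python
FACTS = {
    ("Albert Einstein", "is a", "person"),
    ("person", "has", "brain"),
    ("brain", "can", "think"),
    ("think", "requires", "brain"),
}


def has_part(entity, part):
    """True if the entity, or a concept it is an instance of, has the part."""
    if (entity, "has", part) in FACTS:
        return True
    return any(
        s == entity and r == "is a" and has_part(o, part) for s, r, o in FACTS
    )


def can(entity, ability):
    """True if the entity has every prerequisite part that provides the ability."""
    prerequisites = [o for s, r, o in FACTS if s == ability and r == "requires"]
    return bool(prerequisites) and all(
        has_part(entity, p) and (p, "can", ability) in FACTS for p in prerequisites
    )


einstein_thinks = can("Albert Einstein", "think")
computer_thinks = can("computer", "think")
```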
[0046] In some implementations, since the relations between the nodes are transitive, a plurality of nodes may be reduced to two nodes and the relation therebetween. FIGS. 5a and 5b are diagrams of example knowledge graphs 500 and 550 according to one or more implementations of the subject matter described herein. As shown in FIG. 5a, "Pal" 510 is a "dog" 520, and "dog" 520 is an "animal" 530. Since the relation of "isA" (be) is transitive, the graph 500 may be transformed to be a graph 550, in which it is shown that "Pal" 510 is "animal" 530. This may achieve logic deduction of the transitive relation.
[0047] For example, if it is determined that an intermediate node (e.g. "dog" 520) is located between two nodes (e.g. "Pal" 510 and "animal" 530) in the data graph 500, it may be determined whether the relation between "Pal" 510 and "dog" 520 and the relation between "dog" 520 and "animal" 530 are transitive. Since the relations are both "be" ("isA"), it may be determined that the two relations are transitive. In this case, the knowledge graph 500 may be transformed into the knowledge graph 550, and the path from "Pal" 510 to "dog" 520 and then to "animal" 530 may be correspondingly transformed into the path from "Pal" 510 to "animal" 530.
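The path transformation for transitive relations can be sketched as follows. The set TRANSITIVE and the function collapse_path are hypothetical names, and treating only "isA" as transitive is an assumption for the example.

```python
# Relations assumed to be transitive for this illustration.
TRANSITIVE = {"isA"}


def collapse_path(path):
    """Merge consecutive edges that share the same transitive relation."""
    collapsed = []
    for src, rel, dst in path:
        if collapsed:
            prev_src, prev_rel, prev_dst = collapsed[-1]
            if prev_rel == rel and rel in TRANSITIVE and prev_dst == src:
                # Replace a -> b -> c with the direct edge a -> c.
                collapsed[-1] = (prev_src, rel, dst)
                continue
        collapsed.append((src, rel, dst))
    return collapsed


short = collapse_path([("Pal", "isA", "dog"), ("dog", "isA", "animal")])
```

Edges whose relation is not in TRANSITIVE, or whose relations differ, are left untouched, so a path such as "Pal" isA "dog" followed by "dog" can "bark" keeps its intermediate node.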
[0048] FIG. 6 is a flow chart illustrating a logic reasoning method 600 according to one or more implementations of the subject matter described herein. The method 600 may be performed by the graph engine 220 or processing unit 110, and the subject matter described herein is not limited in this regard.
[0049] At 602, in response to receiving the natural language expression, a predefined expression template matching the natural language expression is determined. In some implementations, the natural language expression may be a natural language question, such as "Why can Albert Einstein think?"
[0050] At 604, a plurality of items is extracted from the natural language expression based on the predefined expression template.
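As a minimal sketch of 602 and 604, template matching and item extraction may be approximated with regular expressions. The template names, the patterns, and the item names "subject" and "ability" are hypothetical; the subject matter described herein does not specify how the predefined expression templates are encoded.

```python
import re

# Hypothetical predefined expression templates: a name paired with a pattern
# whose named groups identify the items to extract.
TEMPLATES = [
    ("why_can", re.compile(r"^Why can (?P<subject>.+) (?P<ability>\w+)\?$")),
    ("why_cannot", re.compile(r"^Why can(?:not|'t) (?P<subject>.+) (?P<ability>\w+)\?$")),
]


def match_template(expression):
    """Return (template name, extracted items) for the first matching template."""
    text = expression.strip()
    for name, pattern in TEMPLATES:
        match = pattern.match(text)
        if match:
            return name, match.groupdict()
    return None  # no predefined template matches the expression


result = match_template("Why can Albert Einstein think?")
```

The extracted items may then be used to look up the associated nodes when the knowledge graph is queried.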
[0051] At 606, reasoning for the natural language expression is obtained by querying a knowledge graph using the plurality of items. The knowledge graph includes nodes representing entities or concepts and edges representing logical relations between nodes. The knowledge graph may be stored in a form of a graph database at the memory 120 or storage 150, for example.
[0052] In some implementations, obtaining the reasoning includes: determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively; determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and determining the reasoning based on the logical relations represented by the edges included in the path.
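One possible way to determine such a path, shown here only as a sketch, is a breadth-first search between the nodes associated with two extracted items. The edge labels "isA", "has" and "can", the in-memory list of triples, and the function names are assumptions made for illustration; a graph database query could serve the same purpose.

```python
from collections import deque

# Hypothetical data: (source, relation, target) triples with assumed labels.
EDGES = [
    ("Albert Einstein", "isA", "person"),
    ("person", "has", "brain"),
    ("brain", "can", "think"),
]


def find_path(edges, start, goal):
    """Breadth-first search; returns the list of edges from start to goal, or None."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relation, target)]))
    return None


def explain(path):
    """Render the relations along the path as a simple textual reasoning chain."""
    return "; ".join(f"{s} {r} {t}" for s, r, t in path)


path = find_path(EDGES, "Albert Einstein", "think")
reasoning = explain(path)
```

The sketch connects only two nodes; a path that must include more than two nodes could be assembled by chaining such searches between consecutive nodes.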
[0053] In some implementations, the knowledge graph includes a logical rule layer and a data layer, the data layer includes nodes representing entities or concepts, and the logical rule layer includes subgraphs representing logical rules, and determining the path includes: determining a subgraph corresponding to the predefined expression template from the logical rule layer; determining a subgraph of the data layer matching the subgraph of the logical rule layer based on the plurality of nodes; and determining the path based on the subgraph of the data layer.
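For illustration only, matching a subgraph of the logical rule layer against the data layer may be sketched as a small backtracking pattern matcher. The convention that names beginning with "?" are variables, the example rule, and the edge labels are assumptions of this sketch rather than features recited herein.

```python
def unify(term, value, binding):
    """Bind a variable (a name starting with '?') or compare a constant."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match_pattern(pattern, data, binding=None):
    """Yield variable bindings under which every pattern edge exists in data."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    (p_source, p_relation, p_target), rest = pattern[0], pattern[1:]
    for source, relation, target in data:
        if relation != p_relation:
            continue
        candidate = dict(binding)  # copy so a failed attempt leaves binding intact
        if unify(p_source, source, candidate) and unify(p_target, target, candidate):
            yield from match_pattern(rest, data, candidate)


# Hypothetical rule subgraph: an x that is a c, where c has a p that can a.
RULE = [("?x", "isA", "?c"), ("?c", "has", "?p"), ("?p", "can", "?a")]
DATA = [
    ("Albert Einstein", "isA", "person"),
    ("person", "has", "brain"),
    ("brain", "can", "think"),
    ("Pal", "isA", "dog"),
]

# Seed the variables with the nodes found for the extracted items.
matches = list(match_pattern(RULE, DATA, {"?x": "Albert Einstein", "?a": "think"}))
```

Each resulting binding identifies one subgraph of the data layer, from which the path may be read off by substituting the bound nodes into the rule edges.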
[0054] In some implementations, at least a part of the edges in the path indicate a probability of the relations represented by the edges, and determining the subgraph of the data layer includes: determining a plurality of subgraphs of the data layer matching the subgraph of the logical rule layer; determining degrees of match between the plurality of subgraphs and the subgraph of the logical rule layer based on the probability; and determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the subgraph of the data layer.
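A minimal sketch of the threshold-based selection follows. It assumes, purely for illustration, that the degree of match is the product of the edge probabilities; the subject matter described herein does not fix a particular formula, and the example probabilities and function names are hypothetical.

```python
import math


def degree_of_match(subgraph):
    """Assumed scoring: product of the probabilities carried by the edges."""
    return math.prod(probability for _, _, _, probability in subgraph)


def select_subgraphs(candidates, threshold):
    """Keep the candidate subgraphs whose degree of match exceeds the threshold."""
    return [g for g in candidates if degree_of_match(g) > threshold]


# Hypothetical candidates: (source, relation, target, probability) tuples.
candidate_a = [
    ("Albert Einstein", "isA", "person", 1.0),
    ("person", "has", "brain", 0.9),
    ("brain", "can", "think", 0.95),
]
candidate_b = [
    ("computer", "has", "brain", 0.4),
    ("brain", "can", "think", 0.5),
]

selected = select_subgraphs([candidate_a, candidate_b], threshold=0.5)
```

Under this assumed scoring, longer chains of uncertain edges score lower, so a threshold filters out subgraphs supported only by weak relations.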
[0055] In some implementations, determining the subgraph of the data layer includes: in response to determining that an intermediate node is included between a first node and a second node in the data graph, determining whether a first relation between the first node and the intermediate node and a second relation between the intermediate node and the second node are transitive; and in response to determining that the first relation and the second relation are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
[0056] The functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-Programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
[0057] Program code for carrying out methods of the subject matter described herein may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
[0058] In the context of this disclosure, a machine readable medium may be any tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
[0059] Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the subject matter described herein. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable subcombination.
[0060] Some exemplary implementations of the subject matter described herein are listed below.
[0061] In accordance with some implementations, there is provided a computer-implemented method. The method comprises: in response to receiving a natural language expression, determining a predefined expression template matching the natural language expression; extracting a plurality of items from the natural language expression based on the predefined expression template; and obtaining reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
[0062] In some implementations, obtaining the reasoning comprises: determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively; determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and determining the reasoning based on the logical relations represented by the edges included in the path.
[0063] In some implementations, the knowledge graph includes a logical rule layer and a data layer, the data layer includes nodes representing entities or concepts, and the logical rule layer includes a subgraph representing logical rules, and determining the path comprising: determining from the logical rule layer a first subgraph corresponding to the predefined expression template; determining from the data layer a second subgraph matching the first subgraph based on the plurality of nodes; and determining the path based on the second subgraph.
[0064] In some implementations, at least a part of the edges in the path indicate a probability of relations represented by the edges, determining the second subgraph comprising: determining a plurality of subgraphs of the data layer matching the first subgraph; determining degrees of match between the plurality of subgraphs and the first subgraph based on the probability; and determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the second subgraph.
[0065] In some implementations, determining the second subgraph comprises: in response to determining that the data graph includes an intermediate node between a first node and a second node, determining whether a first relation between the first node and the intermediate node and a second relation between the second node and the intermediate node are transitive; and in response to determining that the first and second relations are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
[0066] In some implementations, the knowledge graph is stored in a form of graph database.
[0067] In some implementations, the natural language expression is a natural language question.
[0068] In accordance with some implementations, there is provided an electronic device. The electronic device comprises a processing unit; and a memory coupled to the processing unit and having instructions stored thereon, the instructions, when executed by the processing unit, causing the device to perform acts including: in response to receiving a natural language expression, determining a predefined expression template matching the natural language expression; extracting a plurality of items from the natural language expression based on the predefined expression template; and obtaining reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
[0069] In some implementations, obtaining the reasoning comprises: determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively; determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and determining the reasoning based on the logical relations represented by the edges included in the path.
[0070] In some implementations, the knowledge graph includes a logical rule layer and a data layer, the data layer includes nodes representing entities or concepts, and the logical rule layer includes a subgraph representing logical rules, and determining the path comprising: determining from the logical rule layer a first subgraph corresponding to the predefined expression template; determining from the data layer a second subgraph matching the first subgraph based on the plurality of nodes; and determining the path based on the second subgraph.
[0071] In some implementations, at least a part of the edges in the path indicate a probability of relations represented by the edges, determining the second subgraph comprising: determining a plurality of subgraphs of the data layer matching the first subgraph; determining degrees of match between the plurality of subgraphs and the first subgraph based on the probability; and determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the second subgraph.
[0072] In some implementations, determining the second subgraph comprises: in response to determining that the data graph includes an intermediate node between a first node and a second node, determining whether a first relation between the first node and the intermediate node and a second relation between the second node and the intermediate node are transitive; and in response to determining that the first and second relations are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
[0073] In some implementations, the knowledge graph is stored in a form of graph database.
[0074] In some implementations, the natural language expression is a natural language question.
[0075] In accordance with some implementations, there is provided a computer program product. The computer program product is tangibly stored in a non-transitory computer storage medium and comprises machine executable instructions. The machine executable instructions, when executed by a device, cause the device to: in response to receiving a natural language expression, determine a predefined expression template matching the natural language expression; extract a plurality of items from the natural language expression based on the predefined expression template; and obtain reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
[0076] In some implementations, obtaining the reasoning comprises: determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively; determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and determining the reasoning based on the logical relations represented by the edges included in the path.
[0077] In some implementations, the knowledge graph includes a logical rule layer and a data layer, the data layer includes nodes representing entities or concepts, and the logical rule layer includes a subgraph representing logical rules, determining the path comprising: determining from the logical rule layer a first subgraph corresponding to the predefined expression template; determining from the data layer a second subgraph matching the first subgraph based on the plurality of nodes; and determining the path based on the second subgraph.
[0078] In some implementations, at least a part of the edges in the path indicate a probability of relations represented by the edges, determining the second subgraph comprising: determining a plurality of subgraphs of the data layer matching the first subgraph; determining degrees of match between the plurality of subgraphs and the first subgraph based on the probability; and determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the second subgraph.
[0079] In some implementations, determining the second subgraph comprises: in response to determining that the data graph includes an intermediate node between a first node and a second node, determining whether a first relation between the first node and the intermediate node and a second relation between the second node and the intermediate node are transitive; and in response to determining that the first and second relations are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
[0080] In some implementations, the knowledge graph is stored in a form of graph database.
[0081] In some implementations, the natural language expression is a natural language question.
[0082] Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter specified in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A computer-implemented method, comprising:
in response to receiving a natural language expression, determining a predefined expression template matching the natural language expression;
extracting a plurality of items from the natural language expression based on the predefined expression template;
obtaining reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
2. The method according to claim 1, wherein obtaining the reasoning comprises:
determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively;
determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and
determining the reasoning based on the logical relations represented by the edges included in the path.
3. The method according to claim 2, wherein the knowledge graph includes a logical rule layer and a data layer, the data layer includes nodes representing entities or concepts, and the logical rule layer includes a subgraph representing logical rules, and determining the path comprising:
determining from the logical rule layer a first subgraph corresponding to the predefined expression template;
determining from the data layer a second subgraph matching the first subgraph based on the plurality of nodes; and
determining the path based on the second subgraph.
4. The method according to claim 3, wherein at least a part of the edges in the path indicate a probability of relations represented by the edges, determining the second subgraph comprising:
determining a plurality of subgraphs of the data layer matching the first subgraph;
determining degrees of match between the plurality of subgraphs and the first subgraph based on the probability; and
determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the second subgraph.
5. The method according to claim 3, wherein determining the second subgraph comprises:
in response to determining that the data graph includes an intermediate node between a first node and a second node, determining whether a first relation between the first node and the intermediate node and a second relation between the second node and the intermediate node are transitive; and
in response to determining that the first and second relations are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
6. The method according to claim 1, wherein the knowledge graph is stored in a form of graph database.
7. The method according to claim 1, wherein the natural language expression is a natural language question.
8. An electronic device, comprising:
a processing unit; and
a memory coupled to the processing unit and having instructions stored thereon, the instructions, when executed by the processing unit, causing the device to perform acts including:
in response to receiving a natural language expression, determining a predefined language template matching the natural language expression;
extracting a plurality of items from the natural language expression based on the predefined expression template;
obtaining reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
9. The electronic device according to claim 8, wherein obtaining the reasoning comprises:
determining a plurality of nodes in the knowledge graph associated with the plurality of items, respectively;
determining a path including the plurality of nodes from the knowledge graph, the path consisting of a part of the edges in the knowledge graph; and
determining the reasoning based on the logical relations represented by the edges included in the path.
10. The electronic device according to claim 9, wherein the knowledge graph includes a logical rule layer and a data layer, the data layer includes nodes representing entities or concepts, and the logical rule layer includes a subgraph representing logical rules, and determining the path comprising:
determining from the logical rule layer a first subgraph corresponding to the predefined expression template;
determining from the data layer a second subgraph matching the first subgraph based on the plurality of nodes; and
determining the path based on the second subgraph.
11. The electronic device according to claim 10, wherein at least a part of the edges in the path indicate a probability of relations represented by the edges, determining the second subgraph comprising:
determining a plurality of subgraphs of the data layer matching the first subgraph;
determining degrees of match between the plurality of subgraphs and the first subgraph based on the probability; and
determining, from the plurality of subgraphs, a subgraph with a degree of match over a predetermined threshold as the second subgraph.
12. The electronic device according to claim 10, wherein determining the second subgraph comprises:
in response to determining that the data graph includes an intermediate node between a first node and a second node, determining whether a first relation between the first node and the intermediate node and a second relation between the second node and the intermediate node are transitive; and
in response to determining that the first and second relations are transitive, transforming a path from the first node via the intermediate node to the second node into a path from the first node to the second node.
13. The electronic device according to claim 8, wherein the knowledge graph is stored in a form of graph database.
14. The electronic device according to claim 8, wherein the natural language expression is a natural language question.
15. A computer program product being tangibly stored in a computer storage medium and comprising machine executable instructions, the machine executable instructions, when executed by a device, causing the device to:
in response to receiving a natural language expression, determine a predefined language template matching the natural language expression;
extract a plurality of items from the natural language expression based on the predefined expression template;
obtain reasoning for the natural language expression by querying a knowledge graph using the plurality of items, the reasoning answering a question related to the expression or verifying meaning correctness of the expression, the knowledge graph including nodes representing entities or concepts and edges representing logical relations between the nodes.
PCT/US2018/034017 2017-06-09 2018-05-23 Machine reasoning based on knowledge graph WO2018226404A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710433308.9 2017-06-09
CN201710433308.9A CN109033063B (en) 2017-06-09 2017-06-09 Machine inference method based on knowledge graph, electronic device and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2018226404A1 true WO2018226404A1 (en) 2018-12-13

Family

ID=63643046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/034017 WO2018226404A1 (en) 2017-06-09 2018-05-23 Machine reasoning based on knowledge graph

Country Status (2)

Country Link
CN (1) CN109033063B (en)
WO (1) WO2018226404A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840255A (en) * 2019-01-09 2019-06-04 平安科技(深圳)有限公司 Reply document creation method, device, equipment and storage medium
CN110175226A (en) * 2019-05-09 2019-08-27 厦门邑通软件科技有限公司 A kind of dialogue decision-making technique based on various dimensions scene analysis
CN111831797A (en) * 2019-04-19 2020-10-27 广东省智能制造研究所 Management and recommendation system for manufacturing industry processing equipment model
CN112541043A (en) * 2020-12-24 2021-03-23 北京明略软件系统有限公司 Method, device and equipment for detecting connectivity of nodes of knowledge graph
CN113282720A (en) * 2020-02-20 2021-08-20 清华大学 Visual reasoning method and device
US11227018B2 (en) 2019-06-27 2022-01-18 International Business Machines Corporation Auto generating reasoning query on a knowledge graph
CN115827935A (en) * 2023-02-09 2023-03-21 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment
WO2023169072A1 (en) * 2022-03-08 2023-09-14 支付宝(杭州)信息技术有限公司 Configuration method and apparatus, and analysis method and apparatus for entities in knowledge graph

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766453A (en) * 2019-01-18 2019-05-17 广东小天才科技有限公司 A kind of method and system of user's corpus semantic understanding
CN110008413B (en) * 2019-03-14 2023-11-10 海信集团有限公司 Traffic travel problem query method and device
CN110245253B (en) * 2019-05-21 2021-11-23 华中师范大学 Semantic interaction method and system based on environmental information
CN112580357A (en) * 2019-09-29 2021-03-30 微软技术许可有限责任公司 Semantic parsing of natural language queries
CN110717025B (en) * 2019-10-08 2022-08-12 北京百度网讯科技有限公司 Question answering method and device, electronic equipment and storage medium
CN112818092B (en) * 2020-04-20 2023-08-11 腾讯科技(深圳)有限公司 Knowledge graph query statement generation method, device, equipment and storage medium
CN111898004A (en) * 2020-06-20 2020-11-06 中国建设银行股份有限公司 Data mining method and device, electronic equipment and readable storage medium thereof
CN112581955B (en) * 2020-11-30 2024-03-08 广州橙行智动汽车科技有限公司 Voice control method, server, voice control system, and readable storage medium
US20220300799A1 (en) * 2021-03-16 2022-09-22 International Business Machines Corporation Neuro-Symbolic Approach for Entity Linking
CN114238648B (en) * 2021-11-17 2022-11-08 中国人民解放军军事科学院国防科技创新研究院 Game countermeasure behavior decision method and device based on knowledge graph

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7013308B1 (en) * 2000-11-28 2006-03-14 Semscript Ltd. Knowledge storage and retrieval system and method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750795B (en) * 2015-03-12 2017-09-01 北京云知声信息技术有限公司 A kind of intelligent semantic searching system and method
CN105787105B (en) * 2016-03-21 2019-04-19 浙江大学 A kind of Chinese encyclopaedic knowledge map classification system construction method based on iterative model
CN105868313B (en) * 2016-03-25 2019-02-12 浙江大学 A kind of knowledge mapping question answering system and method based on template matching technique

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ABDALGHANI ABUJABAL ET AL: "Automated Template Generation for Question Answering over Knowledge Graphs", WORLD WIDE WEB, INTERNATIONAL WORLD WIDE WEB CONFERENCES STEERING COMMITTEE, REPUBLIC AND CANTON OF GENEVA SWITZERLAND, 3 April 2017 (2017-04-03), pages 1191 - 1200, XP058327227, ISBN: 978-1-4503-4913-0, DOI: 10.1145/3038912.3052583 *
IAN ROBINSON ET AL: "Graph Databases, 2nd Edition", 25 June 2015 (2015-06-25), pages 205 - 210, XP055519499, ISBN: 978-1-4919-3089-2, Retrieved from the Internet <URL:https://epo.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV25TsQwEB1xFVTLJbFLQCkoMcrhI673CAXVaregiiYeW6LZgiz_zzgJgm0oqEeKNYrmePa8NwCP6JT22mnRlpaEpMIKbHMlWtTx0U2roCLf-W0uN7VZ1_J1JIV1vfjfkA-fOwwMFmOn2Q2aEX3a7hWaLDcdHByRWc7lNM7xLdbbn4sVVRoVdw-fDZ_4VS5Wk_8ddAGnPlIPLuHI765g8r1wIR3j7xqSOspLpwvcY6w-3VN> [retrieved on 20181026] *
WILLIAM TUNSTALL-PEDOE: "True Knowledge: Open-Domain Question Answering Using Structured Knowledge and Inference", AI MAGAZINE., vol. 31, no. 3, 1 January 2010 (2010-01-01), CA, pages 80, XP055519478, ISSN: 0738-4602, DOI: 10.1609/aimag.v31i3.2298 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840255A (en) * 2019-01-09 2019-06-04 平安科技(深圳)有限公司 Reply document creation method, device, equipment and storage medium
CN109840255B (en) * 2019-01-09 2023-09-19 平安科技(深圳)有限公司 Reply text generation method, device, equipment and storage medium
CN111831797A (en) * 2019-04-19 2020-10-27 广东省智能制造研究所 Management and recommendation system for manufacturing industry processing equipment model
CN110175226A (en) * 2019-05-09 2019-08-27 厦门邑通软件科技有限公司 A kind of dialogue decision-making technique based on various dimensions scene analysis
CN110175226B (en) * 2019-05-09 2021-06-08 厦门邑通软件科技有限公司 Dialogue decision method based on multi-dimensional scene analysis
US11227018B2 (en) 2019-06-27 2022-01-18 International Business Machines Corporation Auto generating reasoning query on a knowledge graph
CN113282720A (en) * 2020-02-20 2021-08-20 清华大学 Visual reasoning method and device
CN112541043A (en) * 2020-12-24 2021-03-23 北京明略软件系统有限公司 Method, device and equipment for detecting connectivity of nodes of knowledge graph
WO2023169072A1 (en) * 2022-03-08 2023-09-14 支付宝(杭州)信息技术有限公司 Configuration method and apparatus, and analysis method and apparatus for entities in knowledge graph
CN115827935A (en) * 2023-02-09 2023-03-21 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment

Also Published As

Publication number Publication date
CN109033063A (en) 2018-12-18
CN109033063B (en) 2022-02-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18773288

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18773288

Country of ref document: EP

Kind code of ref document: A1