Detailed Description
To aid understanding, the technical solutions of the embodiments of the present specification are described in detail below with reference to the drawings and specific embodiments. It should be understood that the specific features of these embodiments are detailed explanations of the technical solutions of the present specification, not limitations of them, and that the technical features of the embodiments may be combined with one another provided no conflict arises.
The embodiments of the specification provide a knowledge-graph-based data inference method, which addresses the problem that large-scale knowledge inference cannot be realized because its time and space overhead is excessive.
In the technical scheme of the embodiments of the specification, the data undergoes multi-batch iterative rule inference tailored to the rule-inference characteristics of the knowledge graph, yielding a scheme suited to knowledge-graph rule inference. This achieves the technical effect that iterative inference can be performed in a distributed environment and applied to large-scale inference.
The term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Example one
Fig. 1 is a schematic diagram of a knowledge-graph-based data inference application scenario according to an embodiment of the present disclosure, in which a terminal 100 on the user side communicates with a server 200 on the network side. The client 101 in the terminal 100 may be an app or a website implementing an internet-based service; it presents an interface over the knowledge-graph data to the user and supplies the data to the network side for inference, and the server 200 returns the inferred data to the client 101.
The knowledge-graph-based data inference method provided by the embodiments of the specification is built around a domain knowledge-graph data system and its algorithms; it can support application scenarios such as finance, insurance, customer service, and enterprises, and builds a graph-data application ecosystem. Fig. 2 is a flow diagram of the knowledge-graph-based data inference method in an embodiment of the specification. As shown in fig. 2, the method includes:
step 110, obtaining input instance data of a knowledge graph;
specifically, data processing may be performed according to a plurality of preset inference rules to obtain input instance data of the knowledge graph. Because many original input data in the map are irrelevant to subsequent reasoning, irrelevant map data can be filtered out according to the contents required by subsequent rules, and the residual Xuya input instance data enter the subsequent steps for reasoning.
Further, each inference rule includes a rule body, the rule body includes one or more sub-goals, and a sub-goal may be an assignment statement.
Specifically, a Knowledge Graph, also called a scientific knowledge map and known in library and information science as knowledge-domain visualization or knowledge-domain mapping, is a family of graphs that display the development process and structural relationships of knowledge: it describes knowledge resources and their carriers with visualization technology, and mines, analyzes, constructs, draws, and displays knowledge and the interrelations among knowledge items. More specifically, the knowledge graph combines theories and methods from disciplines such as mathematics, graphics, information visualization, and information science with techniques such as citation analysis and co-occurrence analysis, and uses visual graphs to vividly display a discipline's core structure, development history, frontier fields, and overall knowledge architecture, thereby achieving multi-disciplinary fusion. Through data mining, information processing, knowledge metrics, and graph drawing, it displays a complex knowledge field, reveals the dynamic development laws of that field, and provides a practical, valuable reference for discipline research.
In the embodiment of the present specification, the initial input data required for inference, that is, the input instance data of the knowledge graph, is first obtained. Before that, the original input instance data of the knowledge graph must be obtained and preprocessed: the original data is filtered according to the plurality of preset inference rules, and the records that do not conform to those rules are removed, so that the remaining original input instance data conforms to the preset inference rules and serves as the input instance data of the knowledge graph.
The preset inference rules may be DataLog rules suited to the knowledge graph. DataLog is a logic-based programming language, a restricted form of Prolog adapted to knowledge bases; it is a restricted form of Horn-clause logic within first-order predicate logic and only allows variables or constants as predicate arguments. DataLog statements are composed of facts and rules; like Prolog, they enable deductive reasoning over a knowledge base, i.e., new facts can be deduced from known facts by inference. Specifically, a DataLog rule includes: a rule head, which may be a relational atom such as student(x); an implication symbol, which represents a logical relationship rather than an operator; and a rule body, which includes one or more sub-goals. A sub-goal may be a relational atom or an arithmetic atom, where an arithmetic atom yields a Boolean value, such as a > 10. The sub-goals are implicitly joined by "and", so the meaning of a rule can be described as checking all possible values of its variables: when certain values make all sub-goals in the rule body true, the rule head is true; otherwise the rule head is false. For example, P :- P1, P2, ..., Pn means: if P1, P2, ..., Pn are all true, then P is true.
In order to make DataLog rules better applicable to knowledge-graph reasoning, and in particular easier to manage in a distributed environment, the existing DataLog rules are extended by allowing custom functions inside sub-goals; specifically, a sub-goal may be an assignment statement such as b = def_age(a).
For example, sub-goals may be divided into three types:
1. relational atoms, e.g. edge_friends(X, Y);
2. assignment atoms, e.g. a = def_age(b);
3. arithmetic atoms, whose result is a Boolean arithmetic expression, such as a > b.
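As an illustrative sketch, the three sub-goal types could be checked for a given variable binding as below; the tuple encoding, the evaluation strategy, and the helper names are assumptions for illustration, not the patent's actual data format.

```python
def eval_rule_body(body, env, facts):
    """Check a rule body under a variable binding `env`.
    Illustrative sub-goal encodings:
      ("rel", name, args)       relational atom, e.g. type_person(X)
      ("assign", var, fn, arg)  assignment atom, e.g. b = def_age(a)
      ("arith", predicate)      arithmetic atom, e.g. 35 < b < 50
    The rule head is true iff every sub-goal evaluates to true."""
    env = dict(env)  # copy, so assignments made here stay local
    for goal in body:
        kind = goal[0]
        if kind == "rel":
            _, name, args = goal
            if tuple(env[a] for a in args) not in facts.get(name, set()):
                return False
        elif kind == "assign":
            _, var, fn, arg = goal
            env[var] = fn(env[arg])  # a custom function inside a sub-goal
        elif kind == "arith":
            _, pred = goal
            if not pred(env):
                return False
        else:
            raise ValueError(f"unknown sub-goal kind: {kind}")
    return True

# Stand-in for def_age: subtract the birth year from an assumed reference year.
def_age = lambda birth_year: 2018 - birth_year

# Body of a rule like type_middleAger(X) :- type_person(X); b = def_age(a); 35 < b < 50
body = [
    ("rel", "type_person", ("X",)),
    ("assign", "b", def_age, "a"),
    ("arith", lambda e: 35 < e["b"] < 50),
]
facts = {"type_person": {(1,)}}
```

With `{"X": 1, "a": 1982}` all three sub-goals hold (the computed age 36 falls in the range), so the rule head would be true; with `{"X": 1, "a": 1960}` the arithmetic atom fails.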
For a knowledge graph, relational statements (including rule heads) can be defined into three types:
1. edges, agreed to begin with edge_ followed by the English name of the edge, such as edge_friends(x, y), meaning that the friend of x is y;
2. types, agreed to begin with type_ followed by the English type name, e.g. type_person(x), indicating that x is a person;
3. attributes, agreed to begin with prop_ followed by the English attribute name, such as prop_age(X, a), indicating that the age of X is a.
Besides relational statements, the second kind of statement in a rule body is an assignment statement, such as a = def_age(b); the third kind is an arithmetic statement, whose result is a Boolean-type arithmetic expression, e.g. a > b.
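A minimal, hypothetical helper reflecting these prefix conventions; the function name and the returned labels are illustrative, not part of the patent.

```python
def classify_predicate(name: str) -> str:
    """Classify a relational predicate by its agreed name prefix."""
    if name.startswith("edge_"):
        return "edge"       # e.g. edge_friends(x, y): the friend of x is y
    if name.startswith("type_"):
        return "type"       # e.g. type_person(x): x is a person
    if name.startswith("prop_"):
        return "property"   # e.g. prop_age(X, a): the age of X is a
    raise ValueError(f"unknown relational predicate: {name}")
```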
Step 120, dividing the plurality of preset inference rules into two or more batches of rules according to the inference dependency relationships among them, where different batches of rules have different run priorities;
specifically, the different run priorities of the different batch rules include: the running priority is specifically a dependency number, the smaller the dependency number is, the higher the running priority is, the different batches of rules have different dependency numbers, the same batch of rules have the same dependency number, the dependency number of a rule indicates the degree of dependency of the rule on other rules, and the dependency number is positively correlated with the degree of dependency.
In particular, dependency relationships are also referred to as "logical relationships". In project management, a dependency is a relationship in which a change to one of two activities (a predecessor and a successor) affects the other. Typical dependencies between activities take three forms: mandatory dependencies (inherent in the work), discretionary dependencies (determined by the project team), and external dependencies (between a project activity and a non-project activity). In the embodiment of the present specification, when a rule performs inference it may need the inference results of other rules to derive its own result; in that case a dependency relationship exists between the rules. Classifying the rules by their dependency relationships and inferring batch by batch makes it possible to carry out inference iteratively in a distributed environment.
According to the dependency relationships of the knowledge graph, dependencies exist among the rules: some rules may depend on no other rule, some may depend on one other rule, and some may depend on several other rules.
There are many rules with dependency relationships in the rule base, and from these relationships one can determine how many other rules each rule depends on; this count is called the rule's dependency number. The rules are divided into batches by dependency number: rules with the same dependency number go into the same batch. For example, rules that depend on two other rules form one batch, and rules that depend on one other rule form another. This sorts the unordered rules in the rule base into batches, so that rule inference can be performed on the data batch by batch, with the rules of the same batch inferring over the data iteratively.
For example, the preset inference rules are divided according to their mutual inference dependency relationships into two batches, here called the first batch and the second batch; each batch groups rules that share one dependency number, and the two batches have different dependency numbers. "First" and "second" merely distinguish the two batches and carry no other significance. The dependency numbers of the two batches are then compared: the smaller the dependency number, the higher the corresponding run priority. If the dependency number of the first batch is smaller than that of the second batch, iterative inference is first performed on the data with the first batch of rules, and afterwards iterative inference is performed with the second batch. Here the dependency number indicates a rule's degree of dependency on other rules: the larger the number, the higher the degree of dependency, and conversely the smaller the number, the lower the degree.
For example:
Rule 1: edge_fatherInLaw(X, Z) :- edge_wife(X, Y); edge_father(Y, Z). # The wife of X is Y, and the father of Y is Z, so it is deduced that the father-in-law of X is Z;
Rule 2: type_middleAger(X) :- type_person(X); prop_age(X, a); a > 35 and a < 50. # X is a person, the age of X is a, and 35 < a < 50, so X is a middle-aged person;
Rule 3: prop_age(X, b) :- type_person(X); prop_birthDate(X, a); b = def_age(a). # X is a person, the birthday of X is a, the function def_age computes the age b, so the age of X is b.
With the above three example rules, we can derive the dependencies between them: rule 2 depends on rule 3, since the sub-goal prop_age is derived by rule 3. Because neither rule 1 nor rule 3 depends on any other rule, their dependency number is defined as 0, so they can serve as the first batch of rules with the higher run priority; meanwhile, since rule 2 depends on rule 3, the dependency number of rule 2 is defined as 1, so it serves as the second batch, with a lower run priority than the first. Similarly, if other rules have layer-by-layer dependencies, such as rule 4 depending on rule 5 and rule 5 depending on rule 1, then the dependency number of rule 4 is defined as 2.
Of course, the preset inference rules can be divided by their different dependency numbers to distinguish priorities, yielding one batch of rules, two batches, or many batches.
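The batching described above can be sketched as follows. The rule names, the shape of the `deps` mapping, and the use of chain depth as the dependency number are assumptions drawn from the rule-4/rule-5 example in the text, not the patent's actual implementation.

```python
def dependency_numbers(rules, deps):
    """deps maps a rule to the set of rules it directly depends on.
    A rule's dependency number is 0 if it depends on nothing, else
    1 + the largest dependency number among its prerequisites."""
    memo = {}
    def number(rule):
        if rule not in memo:
            direct = deps.get(rule, set())
            memo[rule] = 0 if not direct else 1 + max(number(r) for r in direct)
        return memo[rule]
    return {r: number(r) for r in rules}

def divide_into_batches(rules, deps):
    """Group rules by dependency number; a smaller number means a
    higher run priority, so it forms an earlier batch."""
    nums = dependency_numbers(rules, deps)
    batches = {}
    for r in rules:
        batches.setdefault(nums[r], []).append(r)
    return [batches[n] for n in sorted(batches)]

# The five-rule example from the text: rule 2 depends on rule 3,
# rule 4 on rule 5, and rule 5 on rule 1.
rules = ["rule1", "rule2", "rule3", "rule4", "rule5"]
deps = {"rule2": {"rule3"}, "rule4": {"rule5"}, "rule5": {"rule1"}}
```

Here `divide_into_batches(rules, deps)` puts rules 1 and 3 (dependency number 0) in the first batch, rules 2 and 5 (number 1) in the second, and rule 4 (number 2) in the third.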
Step 130, running the batches of rules in order of run priority from high to low, each in a distributed iterative manner, to obtain the summary inference result of the knowledge graph. When the first batch of rules is run for inference, the input data is the input instance data; when the second and subsequent batches are run, the input data is the inference result output by the previous batch together with the input instance data.
Specifically, an iteration in iterative inference is one repetition of a feedback process, usually aimed at approaching a desired goal or result. Each repetition of the process is called an iteration, and the result of each iteration serves as the initial value of the next, until the data inference is complete and the summary inference result of the knowledge graph is obtained. Therefore, when the first batch is iterated, the input data is the input instance data; when the second and subsequent batches are run, the inference result output by the previous batch is added to the input instance data and used as the input data of the current batch, realizing the effect of iterative inference.
Specifically, the distributed iterative manner is MapReduce distributed iteration: Map computation is performed on the input instance data to obtain first input data, where the first input data includes a first node, the node corresponding to the next sub-goal of the rule being applied; then, taking the first node as the partition key, Reduce computation is performed on the first input data under MapReduce distributed computation to obtain the summary inference result of the knowledge graph.
In particular, MapReduce is a programming model for parallel computation over large-scale data sets (larger than 1 TB). The concepts "Map" and "Reduce" and their main ideas are borrowed from functional programming languages, along with features borrowed from vector programming languages. The model greatly eases running programs on distributed systems for programmers with no background in distributed parallel programming. Current software implementations specify a Map function that maps a set of key-value pairs into a new set of key-value pairs, and a concurrent Reduce function that ensures all mapped key-value pairs sharing the same key are grouped together.
Briefly, a mapping function performs a specified operation on each element of a conceptual list of independent elements (for example, a list of test results). Suppose one finds that all students' results were over-rated by one point; a "minus one" mapping function can be defined to correct the error. In fact, each element is operated on independently, and the original list is not modified, since a new list is created to hold the new values. That is, a Map operation can be highly parallel, which is very useful for high-performance applications and for the field of parallel computing. Reduce, in turn, performs an appropriate merge over the elements of a list; continuing the example, if one wants the average score of a class, a Reduce over the corrected list yields it. Although not as parallel as the mapping function, the reduction function is still useful in a highly parallel environment, because a reduction always has a simple answer and large-scale operations are relatively independent.
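The score example above can be written out directly; the concrete scores are made up for illustration.

```python
from functools import reduce

scores = [91, 78, 85, 100]  # hypothetical test results, each over-rated by one point

# Map: apply the "minus one" correction to every element independently,
# producing a new list and leaving the original untouched.
corrected = list(map(lambda s: s - 1, scores))

# Reduce: merge the corrected list into a single value (here, a sum),
# then derive the class average from it.
total = reduce(lambda a, b: a + b, corrected)
average = total / len(corrected)
```

Each Map application touches one element only, which is why the Map side parallelizes freely; the Reduce side merges, so it carries the sequential dependency.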
MapReduce distributes the large-scale operations on the data set to the nodes of the network to achieve reliability: each node periodically reports the work it has completed and its latest status. If a node stays silent beyond a preset interval, the master node (similar to the master server in the Google File System) records the node as dead and sends the data assigned to that node to another node. Each operation uses atomic operations on named files to ensure that conflicts between parallel threads do not occur; when files are renamed, the system may copy them under a name other than the task name (avoiding side effects).
The system can automatically divide the big data to be processed of one Job (Job) into a plurality of data blocks, each data block corresponds to one calculation Task (Task), and automatically schedules the calculation nodes to process the corresponding data blocks. The job and task scheduling function is mainly responsible for distributing and scheduling computing nodes (Map nodes or Reduce nodes), monitoring the execution states of the nodes and controlling the synchronization executed by the Map nodes. In a large-scale MapReduce computing cluster formed by low-end commercial servers, node hardware (host, disk, memory and the like) errors and software errors are normal, so that the MapReduce can detect and isolate error nodes and schedule and allocate new nodes to take over the computing tasks of the error nodes. Meanwhile, the system also maintains the reliability of data storage, improves the reliability of data storage by using a multi-backup redundant storage mechanism, and can detect and recover error data in time.
In the foregoing, since the rules are divided into several batches, the batches must be run one by one in order when performing inference. Since each batch is processed the same way at runtime, the specific process of running each batch is described in detail below, taking the running of one batch as an example. In an alternative implementation, as shown in figs. 3 and 5, the process of running a single batch of rules in the distributed iterative manner specifically includes:
step 131: applying the batch of rules to the input data with distributed computation to initialize it, obtaining intermediate data whose inference is unfinished and/or to-be-checked data whose inference is finished;
step 132: applying the batch of rules to the intermediate data and the input data inductively, according to the feature correlation of the data, and performing N rounds of iterative inference until the output no longer contains intermediate data, where N is a positive integer;
step 133: checking the to-be-checked data output by each round of iterative inference; if the check succeeds, outputting the inference result of the batch, which then serves as part of the input data for running the next batch.
In step 131, a batch of rules is applied to the input data using distributed computation (such as MapReduce) for initialization. The aim is to partition the input data so that each machine in the distributed system processes different data, achieving distributed processing, addressing the space-time-efficiency problem, and improving both time and space efficiency when the source data is very large. Further, after initialization two types of data may be obtained: intermediate data whose inference is unfinished, and to-be-checked data whose inference is finished. The two types are handled differently:
and for the intermediate data which is not subjected to inference, applying the batch rule to carry out second inference by taking the intermediate data which is not subjected to inference and the input data as objects, then judging whether the intermediate data which is not subjected to inference exists or not, if the intermediate data which is not subjected to inference exists, applying the batch rule to carry out third inference by taking the intermediate data which is not subjected to inference and the input data as objects again, and then judging whether the intermediate data which is not subjected to inference exists or not. And circulating the processing logic until the output data does not contain intermediate data any more, and further judging that the reasoning is successful.
After inference succeeds, the finished data must be checked, i.e., the to-be-checked data must be processed, as shown in step 133. In step 133, to guarantee the accuracy of the inference result and prevent duplicate output from distorting it, the to-be-checked data is verified. Because a rule's head is true only if all its sub-goals are true, the to-be-checked data obtained so far is essentially the content of each rule's sub-goals: it shows that the sub-goals have been derived, but it must still be verified that every sub-goal is true. If they all are, the content of the rule head can be concluded to be true, the verification succeeds, and the result is output.
In summary, the embodiments of the present specification adapt to the distributed setting by performing multi-batch iterative rule inference on the data with graph-node partitioning, following the inference characteristics of knowledge-graph rules. On one hand, during the data initialization of step 131, clustering is performed according to the node content, so that related node instances enter the same compute node for processing; on the other hand, when the intermediate data is inferred in step 132, clustering is again performed according to the nodes (i.e., the feature correlation of the data: related data is gathered together for computation). This solves the problem that large-scale knowledge inference cannot be realized because its time and space overhead is too large, and achieves the technical effect that iterative inference can be performed in a distributed environment and applied to large-scale inference.
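Steps 131-133 can be summarized as a loop skeleton; the three callables stand in for the distributed computations and are purely illustrative placeholders, not the patent's implementation.

```python
def run_batch(rules, input_data, initialize, infer, check):
    """Skeleton of running one batch of rules.
    initialize: step 131, splits input into intermediate / to-be-checked data.
    infer:      step 132, one round of inference over the intermediate data.
    check:      step 133, verifies a finished record before output."""
    middle, to_check = initialize(rules, input_data)         # step 131
    while middle:                                            # step 132: iterate until
        middle, finished = infer(rules, middle, input_data)  # no intermediate data remains
        to_check.extend(finished)
    return [d for d in to_check if check(d)]                 # step 133: verify, then output

# Toy stand-ins: every input item becomes intermediate data, each round
# finishes one item, and the check keeps only values of at least 20.
initialize = lambda rules, data: (list(data), [])
infer = lambda rules, middle, data: (middle[1:], [middle[0] * 10])
check = lambda value: value >= 20

result = run_batch(None, [1, 2, 3], initialize, infer, check)
```

The returned list would then serve as part of the input data for the next batch, as step 133 describes.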
Example two
In order to better explain the knowledge-graph-based data processing method provided by the embodiments of the present specification, it is described below with reference to a specific example.
For a clearer explanation, referring to table 1, the data types, rule examples, and the like required by the present embodiment are explained as follows:
description of (A) data types
1. Input data: {data_type=input, type=prop/edge/type, entity_id, prop_name, prop_value, edge_name, edge_entity_id, entity_type}. # edge_entity_id is the ID of the node the edge points to, and entity_type is the type of the node.
2. Intermediate data: {data_type=middle, rule, entity_id, item_id, var_map}. # rule is the inference rule the intermediate data belongs to, entity_id is the node ID related to the next sub-goal, item_id is the number of the next sub-goal, and var_map holds the variable assignments made during inference.
3. To-be-checked data: {data_type=check, type, entity_id, prop_name, prop_value, edge_name, edge_entity_id, entity_type}.
4. Result data: {data_type=result, type, entity_id, prop_name, prop_value, edge_name, edge_entity_id, entity_type}.
(II) Rule examples
1. edge_fatherInLaw(X, Z) :- edge_wife(X, Y); edge_father(Y, Z). # The wife of X is Y, and the father of Y is Z, so the father-in-law of X is Z.
2. type_middleAger(X) :- type_person(X); prop_age(X, a); a > 35 and a < 50. # X is a person, the age of X is a, and 35 < a < 50, so X is a middle-aged person.
3. prop_age(X, b) :- type_person(X); prop_birthDate(X, a); b = def_age(a). # X is a person, the birthday of X is a, the function def_age yields the age b, so the age of X is b.
The data processing method based on the knowledge graph provided by the embodiment of the present specification is described below, as shown in fig. 4 and 5:
s1: filtering data necessary for reaching inference (equivalent to step S110 in the previous embodiment)
In the knowledge graph, there are many data irrelevant to reasoning, and this step needs to filter out irrelevant graph data to obtain the following input example data, as shown in table 1:
TABLE 1

| data_type | type | entity_id | prop_name | prop_value | edge_name | edge_entity_id | entity_type |
|-----------|------|-----------|-----------|------------|-----------|----------------|-------------|
| input | edge | 1 | | | wife | 2 | |
| input | edge | 2 | | | father | 3 | |
| input | prop | 1 | birthDate | 1982-01-11 | | | |
| input | prop | 2 | birthDate | 1986-06-12 | | | |
| input | prop | 3 | birthDate | 1960-08-20 | | | |
| input | type | 1 | | | | | Person |
| input | type | 2 | | | | | Person |
| input | type | 3 | | | | | Person |
S2: Divide the rules into several batches according to the rule dependency relationships (corresponding to step 120 of the foregoing embodiment).
Among rule 1, rule 2, and rule 3 of the rule examples above, the following dependency holds: rule 2 depends on rule 3 (the sub-goal prop_age is derived by rule 3). Therefore rules 1 and 3 have dependency number 0 and form the first batch, and rule 2 has dependency number 1 and forms the second batch.
S31: First iterative inference, computed by applying the first batch of rules (corresponding to part of step 130), specifically including:
performing initial map calculation on input data by using MapReduce distributed calculation to obtain the following output data, as shown in Table 2:
TABLE 2

| data_type | rule | entity_id | item_id | var_map |
|-----------|------|-----------|---------|---------|
| middle | edge_fatherInLaw | 2 | 1 | {X=1, Y=2} |
| middle | prop_age | 1 | 1 | {X=1} |
| middle | prop_age | 2 | 1 | {X=2} |
| middle | prop_age | 3 | 1 | {X=3} |
In the table above, the data type is intermediate data (middle); we illustrate with the first row. In the first row, the rule is edge_fatherInLaw, the node ID related to the next sub-goal (entity_id) is 2, the number of the next sub-goal (item_id) is 1, and the assignments made so far in the inference (var_map) are {X=1, Y=2}. For step S31, the initial Map calculation yields a suitable split point, namely entity_id; establishing this split point makes iterative inference possible in a distributed environment, thereby solving the problem that the time and space overhead of large-scale inference is unacceptable.
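A hypothetical sketch of that initial Map step for rule 1; the field names follow the tables of this example, but the function itself is an illustration, not the patent's code.

```python
def map_rule1(record):
    """Initial Map for rule 1, edge_fatherInLaw(X, Z) :- edge_wife(X, Y); edge_father(Y, Z).
    A matching wife edge emits a middle record keyed by the node the
    NEXT sub-goal needs (entity_id = Y), which becomes the split point."""
    if record.get("type") == "edge" and record.get("edge_name") == "wife":
        x, y = record["entity_id"], record["edge_entity_id"]
        return {"data_type": "middle", "rule": "edge_fatherInLaw",
                "entity_id": y,   # next sub-goal edge_father(Y, Z) lives at node Y
                "item_id": 1,     # number of the next sub-goal
                "var_map": {"X": x, "Y": y}}
    return None  # record is irrelevant to this rule

# The wife edge from Table 1 (node 1 -> node 2)
row = map_rule1({"data_type": "input", "type": "edge", "entity_id": 1,
                 "edge_name": "wife", "edge_entity_id": 2})
```

Under these assumptions the emitted record matches the first row of Table 2: a middle record for edge_fatherInLaw keyed by entity_id 2 with var_map {X=1, Y=2}.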
Then, Reduce calculation is performed, partitioned by entity_id, giving the output in Table 3:
TABLE 3

| data_type | type | entity_id | prop_name | prop_value | edge_name | edge_entity_id | entity_type |
|-----------|------|-----------|-----------|------------|-----------|----------------|-------------|
| check | edge | 1 | | | fatherInLaw | 3 | |
| check | prop | 1 | age | 36 | | | |
| check | prop | 2 | age | 31 | | | |
| check | prop | 3 | age | 57 | | | |
In table 3, the data type is to-be-checked data (check). Since inference of this data is not yet finished, after the last Reduce calculation ends, one more round of MapReduce is required to obtain the inference result shown in Table 4, at which point the first-batch rule inference is finished. In the verification of the to-be-checked data, following the DataLog rules, all possible values of the variables in a rule are examined; when certain values make all sub-goals in the rule body true, the rule head is true. For the first row of this embodiment: for the rule head edge_fatherInLaw(X, Z), when the sub-goals edge_wife(X, Y) and edge_father(Y, Z) are all true, the rule head edge_fatherInLaw(X, Z) is determined to be true.
TABLE 4

| data_type | type | entity_id | prop_name | prop_value | edge_name | edge_entity_id | entity_type |
|-----------|------|-----------|-----------|------------|-----------|----------------|-------------|
| result | edge | 1 | | | fatherInLaw | 3 | |
| result | prop | 1 | age | 36 | | | |
| result | prop | 2 | age | 31 | | | |
| result | prop | 3 | age | 57 | | | |
It can be seen from the steps above that, during the Reduce calculation, the computation is clustered by entity_id: the records whose entity_id is 1 form one cluster, those whose entity_id is 2 form another, and so on. The entity_id can be understood as the data feature of step 132 in the foregoing embodiment, and computing "according to the feature correlation of the data" can be understood as gathering records with the same or similar features together for computation. This makes the method applicable to a distributed environment, with different compute nodes processing data of the same or similar features; that is, through reasonable node partitioning, knowledge-graph inference becomes applicable to a distributed computing environment.
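The clustering by entity_id in the Reduce step can be sketched as a simple grouping. The record shapes follow Table 2; the helper itself is illustrative, not the patent's implementation.

```python
from collections import defaultdict

def group_by_entity(middle_records):
    """Cluster intermediate records by entity_id so that each Reduce task
    (compute node) only sees the records belonging to one graph node."""
    buckets = defaultdict(list)
    for rec in middle_records:
        buckets[rec["entity_id"]].append(rec)
    return dict(buckets)

records = [  # shaped like the rows of Table 2
    {"rule": "edge_fatherInLaw", "entity_id": 2},
    {"rule": "prop_age", "entity_id": 1},
    {"rule": "prop_age", "entity_id": 2},
    {"rule": "prop_age", "entity_id": 3},
]
groups = group_by_entity(records)
```

With the Table 2 rows, node 2 receives both the edge_fatherInLaw record and its own prop_age record, so one compute node holds everything needed to continue that inference.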
S32: The Nth iterative inference (corresponding to part of step 130 of the previous embodiment).
The result inferred by the previous batch of rules is added to the input data for this batch, inference is computed with the second batch of rules, and the Nth iteration is carried out on whatever data remains unfinished.
In the inference computation of the second batch of rules, MapReduce distributed computation is used to perform the initial Map calculation on the input data, giving the output in Table 5:
TABLE 5
After the initial Map calculation, Reduce calculation is performed to obtain the following to-be-checked data, as shown in Table 6:
TABLE 6
data_type | type | entity_id | prop_name | prop_value | edge_name | edge_entity_id | entity_type
check | type | 1 | | | | | MiddleAger
The data to be checked obtained above is processed through one more round of mapreduce to obtain the following data, as shown in table 7:
TABLE 7
data_type | type | entity_id | prop_name | prop_value | edge_name | edge_entity_id | entity_type
result | type | 1 | | | | | MiddleAger
Finally, after inference is finished, the summary inference result of the knowledge graph is obtained, as shown in table 8. By dividing graph nodes, multi-batch rule iterative inference is performed on the data so as to suit a distributed computing environment; a suitable segmentation point is found according to the characteristics of knowledge graph rule inference, and the inference can be iterated in the distributed environment, thereby achieving the technical effects of performing iterative inference in a distributed environment and being applicable to large-scale inference.
TABLE 8
data_type | type | entity_id | prop_name | prop_value | edge_name | edge_entity_id | entity_type
result | edge | 1 | | | fatherInLaw | 3 |
result | prop | 1 | age | 36 | | |
result | prop | 2 | age | 31 | | |
result | prop | 3 | age | 57 | | |
result | type | 1 | | | | | MiddleAger
It can be seen that the final output inference results are: the father-in-law of entity 1 is entity 3, the age of entity 1 is 36, the age of entity 2 is 31, the age of entity 3 is 57, and entity 1 is a middle-aged person (MiddleAger).
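The overall multi-batch flow — running batches in priority order and feeding each batch's inferred facts into the input of the next batch — can be sketched as follows. The rule representation and the toy rules here are illustrative assumptions, not the patent's implementation.

```python
# Hypothetical sketch of the multi-batch flow: batches run in priority order,
# and each batch's derived facts join the input of the next batch.
def run_batches(batches, input_facts):
    """batches: list of batches, highest priority first; each batch is a list
    of rule functions mapping a fact set to newly derived facts."""
    facts = set(input_facts)
    for batch in batches:
        derived = set()
        for rule in batch:
            derived |= rule(facts)
        facts |= derived  # results become part of the next batch's input
    return facts

# Two toy batches; the second deliberately depends on what the first derives,
# which is why batch order matters.
batch1 = [lambda f: {("fatherInLaw", "1", "3")}
          if ("wife", "1", "2") in f and ("father", "2", "3") in f else set()]
batch2 = [lambda f: {("type", "1", "MiddleAger")}
          if ("fatherInLaw", "1", "3") in f and ("age", "1", "36") in f else set()]

start = {("wife", "1", "2"), ("father", "2", "3"), ("age", "1", "36")}
result = run_batches([batch1, batch2], start)
print(("fatherInLaw", "1", "3") in result)   # True
print(("type", "1", "MiddleAger") in result)  # True
```

If the batches were run in the opposite order, batch2's rule would see no fatherInLaw fact and derive nothing, which illustrates why lower dependency numbers must run first.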
Example three
Based on the same inventive concept as the data inference method based on the knowledge graph in the foregoing embodiment, an embodiment of the present specification further provides a data inference device based on the knowledge graph, as shown in fig. 6, including:
a first input instance data obtaining unit 11 for obtaining input instance data of a knowledge graph;
the inference rule batch dividing unit 12 is configured to divide a plurality of preset inference rules into more than two batch rules according to inference dependency relationships among the multiple inference rules, where different batch rules have different operation priorities;
the first summarizing inference result obtaining unit 13 is configured to sequentially run the rules of each batch in a distributed iteration manner from high to low according to the running priority of each batch to perform inference to obtain a summarizing inference result of the knowledge graph; when a first batch rule is operated for reasoning, the specific input data is the input instance data; when the rules of the second batch and more than the second batch are operated for reasoning, the specific input data are the reasoning result output by the rules of the previous batch and the input instance data.
In an alternative implementation, the apparatus further includes:
the priority dividing unit is used to specifically set the running priority as a dependency number, where a smaller dependency number corresponds to a higher running priority; different batches of rules have different dependency numbers, rules in the same batch have the same dependency number, the dependency number of a rule indicates the degree to which the rule depends on other rules, and the dependency number is positively correlated with that degree of dependency.
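One way such dependency numbers could be computed is sketched below: a rule that uses only base facts gets number 0, and a rule that uses other rules' heads gets one more than the deepest rule it depends on. The rule set here (including "respectedElder") and the dictionary representation are hypothetical examples, not taken from the patent.

```python
# Sketch of the dependency-number batching described above. Rule format is an
# assumption: each rule maps its head predicate to the body predicates it uses.
rules = {
    "fatherInLaw": ["wife", "father"],  # depends only on base facts
    "middleAger": ["age"],              # depends only on base facts
    "respectedElder": ["fatherInLaw", "middleAger"],  # depends on other rules
}

def dependency_number(head, rules, seen=()):
    """0 for rules using only base facts; 1 + max over rule dependencies otherwise."""
    deps = [b for b in rules[head] if b in rules and b not in seen]
    if not deps:
        return 0
    return 1 + max(dependency_number(b, rules, seen + (head,)) for b in deps)

batches = {}
for head in rules:
    batches.setdefault(dependency_number(head, rules), []).append(head)
print(batches)  # {0: ['fatherInLaw', 'middleAger'], 1: ['respectedElder']}
```

Rules sharing a dependency number form one batch, and the batch with the smallest number runs first, matching the priority rule stated above.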
In an alternative implementation, the apparatus further includes:
an original input instance data obtaining unit for obtaining original input instance data of the knowledge graph;
and the second input instance data obtaining unit is used for carrying out data processing according to a plurality of preset inference rules to obtain the input instance data of the knowledge graph.
In an alternative implementation, the apparatus further includes:
the initialization unit is used for initializing the input data by applying the batch rule by using distributed computation to obtain intermediate data with unfinished reasoning or data to be checked with finished reasoning;
the intermediate data inference unit is used to gather the intermediate data and the input data according to the feature correlation of the data, apply the batch rule to perform inference again, and carry out N rounds of iterative inference until the output data no longer contains intermediate data, where N is a positive integer;
and the to-be-checked data checking unit is used to check the to-be-checked data output by each iteration of inference; if the check succeeds, the inference result of running this batch of rules is output, and that inference result serves as part of the input data for running the next batch of rules.
In an alternative implementation, one of the inference rules includes a rule body, wherein the rule body includes one or more sub-targets, wherein the sub-targets have an assignment statement.
In an alternative implementation, the distributed iteration mode is MapReduce.
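The N-round iteration performed by the intermediate data inference unit can be sketched as a fixed-point loop: the batch rule is re-applied until no new (intermediate) data is produced. The transitive-closure rule below is a toy assumption used only to make the loop observable, not a rule from the patent.

```python
# Sketch of the N-round iteration described above: re-apply a rule step until
# the output contains no new intermediate data.
def iterate_to_fixpoint(step, data, max_rounds=100):
    """Re-apply `step` until it derives nothing new (no intermediate data left)."""
    for _ in range(max_rounds):
        new = step(data) - data
        if not new:  # nothing new derived: inference for this batch is finished
            return data
        data = data | new
    raise RuntimeError("did not converge within max_rounds")

# Toy step rule: transitive closure of parent edges, so early rounds leave
# pairs still to be extended (the "intermediate" data of the text).
def step(facts):
    return facts | {(a, c) for (a, b) in facts for (b2, c) in facts if b == b2}

chain = {("1", "2"), ("2", "3"), ("3", "4")}
closure = iterate_to_fixpoint(step, chain)
print(("1", "4") in closure)  # True: derived only after two rounds of iteration
```

N is simply the number of rounds the loop takes for a given batch, which is why it is a positive integer that varies with the data.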
Various changes and specific examples of the data inference method based on the knowledge graph in the first embodiment of fig. 1 are also applicable to the data inference device based on the knowledge graph in the present embodiment, and through the foregoing detailed description of the data inference method based on the knowledge graph, those skilled in the art can clearly know the implementation method of the data inference device based on the knowledge graph in the present embodiment, so for the brevity of the description, detailed description is not provided here.
Example four
Based on the same inventive concept as the data inference method based on the knowledge graph in the previous embodiment, the present specification further provides a server, as shown in fig. 7, including a memory 304, a reasoner 302, and a computer program stored in the memory 304 and operable on the reasoner 302, wherein the reasoner 302 implements the steps of any one of the data inference methods based on the knowledge graph described above when executing the computer program.
In fig. 7, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, linking together various circuits including one or more reasoners represented by reasoner 302 and memory represented by memory 304. The bus 300 may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore not described further herein. A bus interface 306 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium.
Reasoner 302 is responsible for managing bus 300 and general reasoning, and memory 304 may be used to store data used by reasoner 302 in performing operations.
On the other hand, based on the same inventive concept as in the foregoing embodiments, an embodiment of the present specification further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by the reasoner, the steps of any one of the data inference methods based on the knowledge graph described above are implemented.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a reasoner of a general purpose computer, special purpose computer, embedded inference engine, or other programmable data inference apparatus to produce a machine, such that the instructions, which execute via the reasoner of the computer or other programmable data inference apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data inference apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data inference device to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented inference, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, the specification is intended to include such modifications and variations.