CN116610822A - Knowledge graph multi-hop reasoning method for diabetes text - Google Patents


Info

Publication number
CN116610822A
CN116610822A
Authority
CN
China
Prior art keywords
entity
agent
path
rule
reasoning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310899864.0A
Other languages
Chinese (zh)
Inventor
郭永安
狄杰斯
钱琪杰
周沂
王宇翱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202310899864.0A
Publication of CN116610822A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining for mining of medical data, e.g. analysing previous cases of other patients
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention belongs to the technical field of medical text information data processing and discloses a knowledge graph multi-hop reasoning method for diabetes text. The method comprises: vectorizing the source entities and relations and obtaining an embedded representation of the knowledge graph triple data; constructing a reinforcement learning agent and obtaining the agent's next action through the agent's random-sampling policy function; mining logic rules in the knowledge graph through the meta-path set corresponding to the instance path set; calculating the agent's total reward from its hit reward and rule reward, and training the agent's policy network by maximizing the expected value of the total reward; and, based on the extracted entities, the relations between them and the logic rules, predicting a new relation between the source entity and the target entity of an instance path whose corresponding meta-path matches a rule body, and outputting the reasoning result. The invention improves the quality of the paths selected by the agent, the performance of multi-hop reasoning, and the interpretability of multi-hop knowledge reasoning.

Description

Knowledge graph multi-hop reasoning method for diabetes text
Technical Field
The invention belongs to the technical field of medical text information data processing, and particularly relates to a knowledge graph multi-hop reasoning method for diabetes text.
Background
In the medical field, owing to the rapid development of informatization and the popularization of medical information systems, massive medical text information and clinical diagnosis data have accumulated in medical databases. In particular, with the recent rapid development of artificial intelligence and the proposals for intelligent medicine, precision medicine and medical auxiliary diagnosis, knowledge graphs have gradually attracted attention in the medical field. Knowledge graphs can effectively mine, organize and manage the knowledge in large-scale data and improve the quality of knowledge services, thereby providing more intelligent services for doctors and patients.
Medical knowledge reasoning based on knowledge graphs aims at distinguishing erroneous medical knowledge from existing knowledge graph data and at exploring and deducing new knowledge. Through medical knowledge reasoning, new relations between existing entity pairs in a medical knowledge graph can be obtained and fed back into the graph, so that the existing medical knowledge graph is expanded and completed, providing complete knowledge support for advanced medical applications. Given the wide practical application of medical knowledge graphs and their current incompleteness, knowledge-graph-based medical knowledge reasoning has become a popular problem in knowledge graph and knowledge reasoning research.
However, although medical knowledge graphs have been widely applied and developed, they still have drawbacks. In particular, incompleteness introduced during construction severely restricts their usefulness: diabetes data texts contain a large number of entities and relations, and the knowledge graphs constructed from diabetes texts are mostly sparse, which causes the following problems in the reasoning process. (1) Incomplete action space: although the current diabetes-text knowledge graph already contains a large number of fact triples, it still cannot cover the whole diabetes text database, and the constructed graph is mostly sparse; in the reasoning process, missing key entities and relations lead to an incomplete action space, making it difficult for the agent to select a correct search path. (2) Inference paths that are correct but unreliable: at the semantic level, a reliable reasoning path should be semantically similar to the query relation, while an unreliable one differs greatly from it. Reinforcement learning relies to some extent on reward feedback, and because rule constraints are lacking, the model must optimize the action probability distribution according to the magnitude of the rewards; however, owing to the complexity of the knowledge graph environment, it is often difficult for the model to assign the right rewards, and because the agent's reward feedback is delayed, the agent will search out paths whose results are correct but whose reasoning is unreliable.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a knowledge graph multi-hop reasoning method for diabetes text. It addresses the difficulty of reasoning over sparse knowledge graphs caused by missing and unreliable paths: using a new reward mechanism for sparse knowledge graph environments, logic rules are integrated into a reinforcement-learning-based multi-hop reasoning method, and by feeding back path reliability, both the quality of the paths the agent selects and the performance of multi-hop reasoning are improved.
The invention discloses a knowledge graph multi-hop reasoning method for diabetes text, which comprises the following steps:
step 1, extracting the source entity, the target entity and the relation between them from the diabetes text knowledge graph, vectorizing them through pre-training, and obtaining an embedded representation of the knowledge graph triple data;
step 2, constructing a reinforcement learning agent; using the (relation, entity) pairs most similar to the current query relation and the previous-hop reasoning relation of the source entity to expand the agent's action space, and obtaining the agent's next action through the agent's random-sampling policy function;
step 3, extracting the instance path set of the source entity, summarizing the corresponding meta-path set, and mining the logic rules in the diabetes text knowledge graph;
step 4, calculating the agent's total reward from the hit reward obtained when the agent hits the target entity and the rule reward obtained when the agent's meta-path conforms to a logic rule, and training the agent's policy network by maximizing the expected value of the total reward;
and step 5, based on the extracted entities, the relations between them and the logic rules, finding the instance paths whose corresponding meta-paths match a rule body, so as to predict a new relation between the source entity and the target entity of such an instance path, and outputting the reasoning result.
Further, the step 1 specifically includes: the diabetes text knowledge graph is expressed in the form $\mathcal{G} = (\mathcal{E}, \mathcal{R})$, where $\mathcal{E}$ represents the entity set and $\mathcal{R}$ represents the relation set. Each entity is assumed to belong to the entity type set $\mathcal{T}$, defined by a unique type mapping $\psi: \mathcal{E} \to \mathcal{T}$. Each directed connection in the knowledge graph represents a triple $(e_h, r, e_t)$. For any relation $r \in \mathcal{R}$, $r^{-1}$ is used to represent the corresponding inverse relation, i.e., $(e_h, r, e_t)$ is equivalent to $(e_t, r^{-1}, e_h)$. Given a query $(e_s, r_q, ?)$, where $e_s$ is the source entity and $r_q$ is the query relation, the result of the query, i.e., of the reasoning, is a target entity $e_o$. The agent starts from the source entity $e_s$ and continuously selects an outgoing edge of the current entity and jumps to the next entity, until it reaches the target entity $e_o$ or the number of hops it has traversed on $\mathcal{G}$ satisfies a predefined maximum step size.
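As a concrete illustration of this formalization, the following is a minimal Python sketch of a triple store with inverse relations and hop-wise traversal; the toy triples and entity names are illustrative assumptions, not part of the patent data.

```python
# Minimal sketch of the knowledge-graph formalization G = (E, R):
# each triple (e_h, r, e_t) is stored together with its inverse
# (e_t, r^-1, e_h), and a query (e_s, r_q, ?) is answered by walking
# outgoing edges from the source entity.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self, triples):
        self.out_edges = defaultdict(list)   # entity -> [(relation, tail)]
        for h, r, t in triples:
            self.out_edges[h].append((r, t))
            self.out_edges[t].append((r + "^-1", h))  # inverse relation

    def actions(self, entity):
        """All outgoing edges of the current entity (the raw action space)."""
        return self.out_edges[entity]

# Toy triples (illustrative only).
kg = KnowledgeGraph([
    ("Insulin", "resembles", "Recombinant Human Insulin"),
    ("Recombinant Human Insulin", "treats", "Diabetes mellitus"),
])

# From the source entity the agent can hop along outgoing edges.
print(kg.actions("Insulin"))
# Inverse edges make the graph traversable in both directions.
print(kg.actions("Diabetes mellitus"))
```

The inverse relations are what allow the agent to walk "backwards" along an edge during multi-hop search.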
Further, in step 2, the elements of the constructed agent include: state, action, reward, policy network and transition function. The state is the node embedding of the agent's current node; the action represents all possible next operations of the agent at the current node; the reward is the feedback obtained after the agent takes an action; the policy network is the network with which the agent performs reinforcement learning from the node state, action, reward and transition function; and the transition function gives the resulting state after the agent performs its next action;
when constructing the reinforcement learning agent, the state at the $l$-th hop is $s_l = (r_q, e_l, h_l)$, where $r_q$ denotes the embedded representation of the relation to be queried, $e_l$ denotes the embedded representation of the entity reached at the $l$-th hop, and $h_l$ denotes the path history information of the exploration up to the $l$-th hop. The entities and relations in $\mathcal{G}$ are represented as low-dimensional continuous vectors $\mathbf{e}$ and $\mathbf{r}$; the concatenation $\mathbf{a} = [\mathbf{r}; \mathbf{e}]$ of a relation vector and an entity vector represents an action; the state $s_l$ corresponds to the current action space $A_l$ composed of all candidate actions, and the action space $A_l$ is represented by stacking the vectors of all actions in it. The history path mentioned in the state is encoded by an LSTM: $h_l = \mathrm{LSTM}(h_{l-1}, \mathbf{a}_{l-1})$.
further, in step 2, when updating the agent action space, it is assumed that the triples are queriedThe agent is currently located in the entity +.>The reasoning relation of the previous jump is +.>The method comprises the steps of carrying out a first treatment on the surface of the First, all relations and +.>And->Similarity of (3): />Wherein->For the ith relation vector in relation set R, ">"is a dot product operation.
Further, in step 2, after computing the similarity of all relations in the relation set $R$ to the entity's current query relation $r_q$ and the previous-hop reasoning relation $r_{l-1}$, the $x$ relations with the highest similarity to $r_q$ and $r_{l-1}$ are taken; for each of them the underlying ConvE embedding model predicts the corresponding tail entity, forming (relation, entity) pairs that provide an additional action space $A_{add}$ of size $x$. Combined with the agent's original action space at the current node, this updates the search space the agent generates at nodes where the current reasoning path is missing; the update procedure is denoted $A'_l = A_l \cup A_{add}$.
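The action-space expansion step can be sketched as follows; the toy relation vectors, the top-$x$ size, and the placeholder tail predictor standing in for the ConvE model are all illustrative assumptions.

```python
# Sketch of the action-space expansion of step 2: score every relation
# by dot product against the query relation r_q and the previous-hop
# relation r_{l-1}, keep the top-x, and add predicted (relation, entity)
# pairs as an additional action space A_add.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def expand_action_space(relation_vecs, r_q, r_prev, predict_tail, x=2):
    # sim(r_i) = r_i . r_q + r_i . r_{l-1}
    sims = {name: dot(v, r_q) + dot(v, r_prev)
            for name, v in relation_vecs.items()}
    top = sorted(sims, key=sims.get, reverse=True)[:x]
    # A_add: one (relation, predicted tail entity) pair per selected relation
    return [(r, predict_tail(r)) for r in top]

relation_vecs = {
    "treats":    [1.0, 0.0],
    "resembles": [0.8, 0.2],
    "causes":    [-1.0, 0.5],
}
a_add = expand_action_space(
    relation_vecs,
    r_q=[1.0, 0.0],                                  # query relation embedding
    r_prev=[0.9, 0.1],                               # previous-hop relation embedding
    predict_tail=lambda r: f"<ConvE tail for {r}>",  # placeholder predictor
)
print(a_add)
```

In the real method the placeholder predictor would be a trained ConvE link predictor; here it only marks where that call happens.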
Further, in step 2, the state $s_l$ at the $l$-th step is input into the reinforcement learning policy network; the policy network guides the agent to search the action space, yielding a probability distribution over the next actions, and the next action is chosen by random sampling, from which the next path-search step proceeds. The policy network is defined as $\pi_\theta(a_l \mid s_l) = \sigma\big(A_l \times W_2\,\mathrm{ReLU}(W_1[h_l; e_l; r_q])\big)$, where $\pi_\theta(a_l \mid s_l)$ denotes the probability distribution over all actions in the action space in state $s_l$, $\sigma$ denotes the softmax function, $W_1$ and $W_2$ denote two linear neural networks, and ReLU denotes the activation function. Then a random-sampling method selects one action in the action space to perform the state transition, and the next hop follows.
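The softmax-and-sample step of the policy can be sketched as follows; the raw action scores here are illustrative stand-ins for the output of $W_2\,\mathrm{ReLU}(W_1[h_l; e_l; r_q])$.

```python
# Sketch of the policy distribution: scores for the actions in the
# current action space are passed through a softmax, and the next
# action is drawn by random sampling.
import math, random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for stability
    z = sum(exps)
    return [e / z for e in exps]

def sample_action(actions, scores, rng):
    probs = softmax(scores)
    return rng.choices(actions, weights=probs, k=1)[0], probs

actions = ["treats -> Diabetes mellitus", "causes -> Hypoglycemia"]
scores = [2.0, 0.0]   # would come from the policy network
rng = random.Random(0)
action, probs = sample_action(actions, scores, rng)
print(action, probs)
```

Random sampling (rather than always taking the argmax) is what lets the agent keep exploring lower-probability edges during training.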
Further, in step 3, two types of paths are distinguished: instance paths and meta-paths. An instance path of length $l$ on the diabetes text knowledge graph $\mathcal{G}$ is given by a sequence $P: e_1 \xrightarrow{r_1} e_2 \xrightarrow{r_2} \cdots \xrightarrow{r_l} e_{l+1}$, where $(e_i, r_i, e_{i+1}) \in \mathcal{G}$. The corresponding sequence of entity types $\psi(e_1) \xrightarrow{r_1} \psi(e_2) \cdots \xrightarrow{r_l} \psi(e_{l+1})$ is called a meta-path.
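The instance-path/meta-path correspondence can be sketched as follows; the type mapping $\psi$ is an illustrative assumption in the style of Hetionet.

```python
# Sketch of step 3: an instance path alternates entities and relations,
# and the corresponding meta-path replaces each entity by its type psi(e).

ENTITY_TYPE = {
    "Insulin": "Compound",
    "Recombinant Human Insulin": "Compound",
    "Diabetes mellitus": "Disease",
}

def meta_path(instance_path):
    """instance_path: [e_1, r_1, e_2, ..., r_l, e_{l+1}]."""
    return [ENTITY_TYPE[x] if i % 2 == 0 else x
            for i, x in enumerate(instance_path)]

p = ["Insulin", "resembles", "Recombinant Human Insulin",
     "treats", "Diabetes mellitus"]
print(meta_path(p))
```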
Further, the logic rules commonly used for knowledge graph reasoning are written as $\text{head} \leftarrow \text{body}$. Consider cyclic rules of the form $r_h(X, Y) \leftarrow r_1(X, Z_1) \wedge r_2(Z_1, Z_2) \wedge \cdots \wedge r_l(Z_{l-1}, Y)$, where the conjunction on the right-hand side is the rule body; a cyclic rule connects, by its rule head $r_h(X, Y)$, the source entity type $\psi(e_1)$ and the target entity type $\psi(e_{l+1})$ of a meta-path, i.e., the body path from $X$ to $Y$ is closed by the head relation $r_h$.
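Matching a meta-path against the body of a cyclic rule can be sketched as follows; the concrete rule and types are illustrative assumptions.

```python
# Sketch of matching a meta-path against a cyclic rule
# r_h(X, Y) <- r_1(X, Z_1) ^ ... ^ r_l(Z_{l-1}, Y): the relation
# sequence of the meta-path must equal the rule-body relations, and the
# end types must match the head's argument types.

def matches_rule_body(meta_path, rule):
    """meta_path: [t_1, r_1, t_2, ..., r_l, t_{l+1}] (types and relations).
    rule: dict with 'head', 'body' (relation list), 'src_type', 'dst_type'."""
    relations = meta_path[1::2]
    return (relations == rule["body"]
            and meta_path[0] == rule["src_type"]
            and meta_path[-1] == rule["dst_type"])

rule = {
    "head": "treats",
    "body": ["resembles", "treats"],
    "src_type": "Compound",
    "dst_type": "Disease",
}
mp = ["Compound", "resembles", "Compound", "treats", "Disease"]
# If the body matches, the rule head predicts treats(X, Y).
print(matches_rule_body(mp, rule))
```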
Further, in step 4, a set of meta-paths $\mathcal{M} = \{M_1, M_2, \ldots, M_m\}$ is considered in the agent path reasoning process, where each element corresponds to the body of a cyclic rule. For each meta-path $M$, $s(M)$ denotes a score representing the confidence of the corresponding rule; for an instance path $P$, $M(P)$ denotes the corresponding meta-path;
if the extracted meta-path sequence corresponds to a rule, the confidence of that rule is used as an additional rule reward $R_{rule} = s(M(P))$. Each rule corresponds to a confidence score; the higher the confidence score, the more frequently the extracted meta-path sequence appears in the knowledge graph and the more reliable the corresponding rule. The hit reward is $R_{hit} = 1$ if the agent reaches the target entity $e_o$, and $0$ otherwise. The agent total reward function is $R = (1 - \lambda)\, R_{hit} + \lambda\, b\, R_{rule}$. The hyper-parameter $b$ is set either to $1$, i.e., the reward is increased as long as the meta-path corresponds to a rule body, or to $R_{hit}$, i.e., the extra reward is obtained only when the prediction is correct, which helps the agent extract meta-paths corresponding to high-confidence rule bodies. The hyper-parameter $\lambda$ balances the two components of the reward.
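The reward combination can be sketched as follows, assuming the total reward takes the form $R = (1 - \lambda) R_{hit} + \lambda\, b\, R_{rule}$ with $b \in \{1, R_{hit}\}$; the numeric values are illustrative.

```python
# Sketch of the step-4 total reward: R_hit is 1 iff the target entity
# is hit, R_rule is the confidence s(M(P)) of the matched rule (0 if no
# rule matches), and b is either 1 or R_hit.

def total_reward(hit, rule_confidence, lam=0.5, b_mode="one"):
    r_hit = 1.0 if hit else 0.0
    b = 1.0 if b_mode == "one" else r_hit   # b = 1 or b = R_hit
    return (1.0 - lam) * r_hit + lam * b * rule_confidence

# Path hits the target and its meta-path matches a rule of confidence 0.8:
print(total_reward(True, 0.8))
# With b = R_hit, a miss earns no rule reward even if a rule matched:
print(total_reward(False, 0.8, b_mode="hit"))
```

The `b_mode="hit"` variant reproduces the stricter setting in which the rule reward is granted only for correct predictions.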
Further, in step 5, the policy network of the agent is trained by maximizing the expected value of the reward sum, $J(\theta) = \mathbb{E}\big[R\big]$; the optimization of the agent parameters maximizes the expectation of the reward, and the parameters are updated by the REINFORCE algorithm: $\theta \leftarrow \theta + \beta\, \nabla_\theta J(\theta)$, where $\theta$ denotes the parameters of the policy network and $J(\theta)$ denotes the reward the model can obtain under network parameters $\theta$. The REINFORCE policy-gradient formula is $\nabla_\theta J(\theta) \approx \nabla_\theta \sum_t R\, \log \pi_\theta(a_t \mid s_t)$, where $\beta$ denotes the learning rate.
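The REINFORCE update can be illustrated on a toy one-state problem with two actions; the rewards, learning rate and episode count are illustrative assumptions, and the real method updates a neural policy network over graph actions rather than a two-parameter softmax.

```python
# Toy sketch of REINFORCE: for a softmax policy over two actions, the
# gradient of R * log pi(a) nudges the parameters toward actions that
# earn reward.
import math, random

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce(theta, rewards, beta=0.1, episodes=500, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs, k=1)[0]
        r = rewards[a]
        # grad of log pi(a): 1 - pi(a) for the chosen action, -pi(j) otherwise
        for j in range(len(theta)):
            grad = (1.0 - probs[j]) if j == a else -probs[j]
            theta[j] += beta * r * grad   # theta <- theta + beta * R * grad
    return theta

theta = reinforce([0.0, 0.0], rewards=[1.0, 0.0])
print(softmax(theta))   # probability mass shifts to the rewarded action
```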
The beneficial effects of the invention are as follows. Compared with existing knowledge reasoning techniques for diabetes texts, the method expands the agent's action space during the sequential decision process of the reinforcement learning agent, thereby completing missing reasoning paths and achieving more comprehensive knowledge reasoning. By introducing meta-paths and logic rules, a rule reward is added to the agent's overall reward when a meta-path conforms to a logic rule; feeding back path reliability in this way improves the quality of the paths the agent selects and the performance of multi-hop reasoning. Moreover, exploiting the strong interpretability of logic rules, reinforcement learning is guided to preferentially explore highly interpretable paths, improving the interpretability of multi-hop knowledge reasoning.
Drawings
FIG. 1 is a flow chart of knowledge reasoning in an embodiment;
FIG. 2 is a schematic diagram of knowledge reasoning performed in a sparse knowledge graph environment in an embodiment;
FIG. 3 is a diagram illustrating the expansion of the agent action space in an embodiment.
Detailed Description
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
Referring to fig. 1, the embodiment of the invention provides a knowledge graph multi-hop reasoning method for diabetes text, which comprises the following steps:
step 1, extracting the source entity, the target entity and the relation between them from the diabetes text knowledge graph, vectorizing them through pre-training, and obtaining an embedded representation of the knowledge graph triple data;
step 2, constructing a reinforcement learning agent; using the (relation, entity) pairs most similar to the current query relation and the previous-hop reasoning relation of the source entity to expand the agent's action space, and obtaining the agent's next action through the agent's random-sampling policy function;
step 3, extracting the instance path set of the source entity, summarizing the corresponding meta-path set, and mining the logic rules in the diabetes text knowledge graph;
step 4, calculating the agent's total reward from the hit reward obtained when the agent hits the target entity and the rule reward obtained when the agent's meta-path conforms to a logic rule, and training the agent's policy network by maximizing the expected value of the total reward;
and step 5, based on the extracted entities, the relations between them and the logic rules, finding the instance paths whose corresponding meta-paths match a rule body, so as to predict a new relation between the source entity and the target entity of such an instance path, and outputting the reasoning result.
As shown in FIG. 2, the knowledge reasoning task is performed in a sparse knowledge graph environment, and each entity belongs to the entity type set $\mathcal{T}$ defined by the mapping $\psi: \mathcal{E} \to \mathcal{T}$. In this embodiment, for example, the type of the entity Insulin is Hormone. In addition, a missing relation between entities (thick black solid arrow) can be deduced from existing triples (thin black solid arrows) along an inference path (arrow direction); for example, from (Insulin, resembles, Recombinant Human Insulin) and (Recombinant Human Insulin, treats, Diabetes mellitus) one can deduce (Insulin, treats, Diabetes mellitus). However, a sparse graph environment leaves many key reasoning paths missing; for instance, the missing relation in the figure (thick black dotted arrow) makes the required inference result (thin black dotted arrow) difficult to obtain, so knowledge reasoning in a sparse knowledge graph environment often suffers from interrupted inference. Meanwhile, a sparse graph environment makes it difficult for the model to obtain enough training samples; sufficient information to guide training is lacking, and the policy decision network of reinforcement learning is hard to train well. The embodiment of the invention realizes complete path reasoning by fusing semantic information and logic rules, solving the inference truncation problem caused by paths missing from the sparse knowledge graph.
The constructed reinforcement learning agent elements include: state, action, reward, policy network and per-step transition function. The state is the node embedding of the agent's current node; the action represents all possible next operations of the agent at the current node; the reward is the feedback obtained after the agent takes an action; the policy network is the network with which the agent performs reinforcement learning from the node state, action, reward and transition function; and the transition function gives the resulting state after the agent performs its next action.
Knowledge reasoning is carried out based on reinforcement learning: the multi-hop reasoning problem is modeled as a sequential decision problem, and the reinforcement learning agent is trained through feedback and interaction. During the agent's path search, the agent's policy network is updated by learning from paths that obtain high rewards, and path reasoning is performed by the reinforcement learning method. When the path is interrupted and the reasoning path is missing during inference, the agent's next hop is constrained by the semantic information of the previous-hop relation and the query relation, guiding the direction of the agent's path search. The possible action space is enlarged by expanding the action space, and the added actions depend on the history information (encoded by an LSTM) and the current state information. FIG. 3 shows a schematic diagram of the agent action space expansion: suppose the query triple is $(e_s, r_q, ?)$, the agent is located at the current entity $e_l$, and the relation of the previous hop is $r_{l-1}$. The similarity of every relation in the relation set $R$ to the current query relation $r_q$ and the previous-hop query relation $r_{l-1}$ is computed, the underlying ConvE embedding model predicts the corresponding tail entities to form (relation, entity) pairs that expand the reinforcement learning agent's action space, and these pairs are then embedded into the current entity's original action space as an additional action space, solving the problem of incoherent paths in multi-hop reasoning.
Before the agent selects a promising action at each hop, the action space of the current entity is extended with the (relation, entity) pairs most similar to the current query relation and the previous-hop reasoning relation. The similarity of all relations in the relation set to the entity's current query relation $r_q$ and the previous-hop reasoning relation $r_{l-1}$ is computed as: $\mathrm{sim}(r_i) = \mathbf{r}_i \cdot \mathbf{r}_q + \mathbf{r}_i \cdot \mathbf{r}_{l-1}$.
sorting according to similarity before selectionPerson and->And->The most similar relation is taken as the most likely complement path set of the intelligent agent in the current state. Next, by +_ according to the current entity>And the dynamic complement path provides an extra size for the reinforcement learning agent by carrying out link prediction based on the embedded method ConvEIs->I.e. +.>. Combining the action space of the intelligent agent at the current node originally, the intelligent agent generates a larger search space at the node with the current reasoning path missing, namely +.>
The state $s_l$ at the $l$-th step is input into the reinforcement learning policy network; the policy network guides the agent to search the action space, producing a probability distribution over the next actions; the next action is selected by random sampling, and the next path-search step proceeds. The policy network is defined as $\pi_\theta(a_l \mid s_l) = \sigma\big(A_l \times W_2\,\mathrm{ReLU}(W_1[h_l; e_l; r_q])\big)$, where $\pi_\theta(a_l \mid s_l)$ denotes the probability distribution over all actions in the action space in state $s_l$, $\sigma$ denotes the softmax function, $W_1$ and $W_2$ denote two linear neural networks, and ReLU denotes the activation function. Then a random-sampling method selects one action in the action space to perform the state transition, and the next hop follows.
We distinguish between two types of paths: instance paths and meta-paths. An instance path of length $l$ on $\mathcal{G}$ is given by a sequence $P: e_1 \xrightarrow{r_1} e_2 \xrightarrow{r_2} \cdots \xrightarrow{r_l} e_{l+1}$, where $(e_i, r_i, e_{i+1}) \in \mathcal{G}$. We call the corresponding sequence of entity types $\psi(e_1) \xrightarrow{r_1} \psi(e_2) \cdots \xrightarrow{r_l} \psi(e_{l+1})$ a meta-path. In the embodiment of the invention, Insulin $\xrightarrow{\text{resembles}}$ Recombinant Human Insulin $\xrightarrow{\text{treats}}$ Diabetes mellitus constitutes an instance path of length 2, and Compound $\xrightarrow{\text{resembles}}$ Compound $\xrightarrow{\text{treats}}$ Disease is the corresponding meta-path.
Logical rules commonly used for knowledge graph reasoning can be written as $\text{head} \leftarrow \text{body}$. We consider cyclic rules of the form $r_h(X, Y) \leftarrow r_1(X, Z_1) \wedge \cdots \wedge r_l(Z_{l-1}, Y)$, where the conjunction of atoms constitutes the rule body and the rule head $r_h$ (not to be confused with the head entity of a triple) connects the source entity type and the target entity type of the meta-path, closing the body path into a cycle. Specifically, the rule body corresponds to a path starting with a compound and ending with a disease. In the embodiment of the invention, the rule miner adopts the AnyBURL method; the meta-path Compound $\xrightarrow{\text{resembles}}$ Compound $\xrightarrow{\text{treats}}$ Disease corresponds to the body of the rule $\text{treats}(X, Y) \leftarrow \text{resembles}(X, Z) \wedge \text{treats}(Z, Y)$, which suggests that insulin may treat diabetes.
In the agent path reasoning process, a set of meta-paths $\mathcal{M} = \{M_1, \ldots, M_m\}$ is considered, where each element corresponds to the body of a cyclic rule. For each meta-path $M$ we use $s(M)$ to denote a score representing the confidence of the corresponding rule; furthermore, for an instance path $P$ we use $M(P)$ to denote the corresponding meta-path. If the extracted meta-path sequence corresponds to a rule, the confidence of that rule is granted as an additional rule reward $R_{rule} = s(M(P))$. Each rule corresponds to a confidence score; the higher the confidence score, the more frequently the extracted meta-path sequence appears in the knowledge graph and the more reliable the corresponding rule. The hit reward is $R_{hit} = 1$ if the agent reaches the target entity and $0$ otherwise. The agent total reward function is $R = (1 - \lambda)\, R_{hit} + \lambda\, b\, R_{rule}$. The hyper-parameter $b$ is set either to $1$, i.e., the reward is increased as long as the meta-path corresponds to a rule body, or to $R_{hit}$, i.e., the extra reward is obtained only when the prediction is correct, which helps the agent extract meta-paths corresponding to high-confidence rule bodies. The hyper-parameter $\lambda$ balances the two components of the reward.
We performed experiments for this embodiment on the Hetionet dataset. Hetionet is a medical dataset containing 1552 compounds and 137 diseases, among which 755 treatment links between compounds and diseases are observed. We randomly split these 755 triples into a training set, a validation set and a test set, where the training set contains 483 triples, the validation set contains 121 triples, and the test set contains 151 triples. The full dataset consists of 47,031 entities of 11 different types and 2,250,197 edges of 24 different types.
In the Hetionet dataset, we use as rule bodies the meta-paths that connect Compound-type entities with Disease-type entities and correspond to various pharmacological mechanisms of action, and we use the confidence of each rule as its quality score. The confidence of a rule is defined as the rule support divided by the body support in the data. We estimate the confidence score of each rule by sampling 5000 instance paths corresponding to the rule body and then calculating the frequency with which the rule head holds. Table 1 summarizes these 10 meta-paths and their corresponding scores:
TABLE 1
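The sampling-based confidence estimate described above (frequency with which the rule head holds among sampled rule-body paths) can be sketched as follows; the toy body paths and head triples are illustrative assumptions.

```python
# Sketch of the Table-1 confidence estimate: confidence = (number of
# sampled rule-body paths whose rule head also holds in the data) /
# (number of sampled rule-body paths).
import random

def rule_confidence(body_paths, head_triples, n_samples=5000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        src, dst = rng.choice(body_paths)   # a path matching the rule body
        if (src, dst) in head_triples:      # does the rule head hold?
            hits += 1
    return hits / n_samples

# 3 of 4 Compound -> ... -> Disease body paths also carry a treats edge,
# so the estimated confidence should be close to 0.75:
body_paths = [("C1", "D1"), ("C2", "D1"), ("C3", "D2"), ("C4", "D2")]
head_triples = {("C1", "D1"), ("C2", "D1"), ("C3", "D2")}
print(rule_confidence(body_paths, head_triples))
```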
In order to verify the advantages of the method in knowledge graph reasoning, this embodiment selects seven widely used knowledge graph reasoning methods for comparison with the proposed knowledge graph reasoning method based on logic rules and reinforcement learning. The seven methods are: the logic rule based method AnyBURL; the embedding-based methods TransE, DistMult and ConvE; the graph convolution methods R-GCN and CompGCN; and the reinforcement learning based method MINERVA.
TABLE 2
The method is applied to the medical dataset Hetionet, and the Hits@1, Hits@3, Hits@10 and mean reciprocal rank (MRR) values of the link prediction task are calculated. Hits@1, Hits@3 and Hits@10 denote the proportions of queries for which the inferred tail entity ranks 1st, within the top 3, and within the top 10 of the candidate entity list, respectively, and MRR denotes the mean of the reciprocal rank of the inferred tail entity in the candidate ranking list. During inference, beam search is performed to find the most promising paths, and the target entities are ranked according to the probability of their corresponding paths.
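The two evaluation metrics are standard and can be computed directly from the rank of the true tail entity for each query:

```python
def hits_at_k(ranks, k):
    """Hits@k: fraction of queries whose true tail entity ranks within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """MRR: mean of 1/rank of the true tail entity over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)
```

For instance, ranks `[1, 3, 10, 20]` give Hits@1 = 0.25, Hits@3 = 0.5, Hits@10 = 0.75 and MRR = (1 + 1/3 + 1/10 + 1/20) / 4.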
Table 2 compares the method with the commonly used knowledge graph reasoning methods on the Hetionet dataset; the results show that the method of the invention outperforms the other knowledge graph reasoning methods on the medical dataset Hetionet.
Through the action space expansion strategy and the introduction of meta-paths and logic rules, when the reinforcement learning agent's reasoning path is truncated at the current node, the global information provided by the logic rules can be combined with the target prediction information to dynamically complete the reasoning path and perform the next reasoning hop. This provides the reinforcement learning agent with a larger and more effective action search space, thereby alleviating the reasoning truncation caused by missing paths in sparse knowledge graphs; adding a rule reward to the agent's overall reward when a meta-path matches a logic rule improves the reliability and interpretability of multi-hop knowledge reasoning.
The foregoing is merely a preferred embodiment of the present invention, and is not intended to limit the present invention, and all equivalent variations using the description and drawings of the present invention are within the scope of the present invention.

Claims (10)

1. A knowledge graph multi-hop reasoning method for diabetes text is characterized by comprising the following steps:
step 1, extracting a source entity, a target entity and a relation between the source entity and the target entity in a diabetes text knowledge graph, vectorizing the source entity, the target entity and the relation between the source entity and the target entity through pre-training, and embedding and representing knowledge graph triplet data;
step 2, constructing a reinforcement learning agent, expanding the agent's action space using the relation-entity pairs most similar to the current query relation and to the previous-hop reasoning relation of the source entity, and obtaining the agent's next action by random sampling from the agent's policy function;
step 3, extracting an instance path set of the source entity, summarizing a meta path set corresponding to the instance path set, and mining logic rules in the diabetes text knowledge graph;
step 4, calculating the total reward of the agent from the hit reward obtained when the agent hits the target entity and the rule reward obtained when the agent's meta-path matches a logic rule, and training the agent's policy network by maximizing the expected value of the total reward;
and step 5, based on the extracted relation between the source entity and the target entity and the logic rules, finding instance paths whose corresponding meta-paths match a rule body, so as to predict a new relation between the source entity and the target entity of the instance path, and outputting a reasoning result.
2. The knowledge-graph multi-hop reasoning method for diabetes text according to claim 1, wherein step 1 is specifically: the diabetes text knowledge graph is expressed in the form $G = (\mathcal{E}, \mathcal{R})$, where $\mathcal{E}$ denotes the entity set and $\mathcal{R}$ denotes the relation set; each entity is assumed to belong to an entity type set $\mathcal{T}$, defined by a mapping $\tau: \mathcal{E} \rightarrow \mathcal{T}$; each directed edge $(e_1, r, e_2)$ in the knowledge graph represents a triple; for any relation $r \in \mathcal{R}$, $r^{-1}$ denotes the corresponding inverse relation, i.e., $(e_1, r, e_2)$ is equivalent to $(e_2, r^{-1}, e_1)$; given a query $(e_s, r_q, ?)$, where $e_s$ is the source entity and $r_q$ is the query relation, the result of the query or reasoning is a target entity $e_t$; the agent starts from the source entity $e_s$ and repeatedly selects an outgoing edge of the current entity and jumps to the next entity, until it reaches the target entity $e_t$ or the number of hops it has traversed over $G$ satisfies a predefined maximum step size.
3. The knowledge-graph multi-hop reasoning method for diabetes text according to claim 1, wherein in step 2, the elements of the constructed agent comprise: states, actions, rewards, a policy network and a transition function; the state represents the embedded representation of the node the agent currently occupies, the action represents all possible next operations of the agent at the current node, the reward represents the feedback obtained after the agent takes an action, the policy network represents the network with which the agent performs reinforcement learning according to the node state, action, reward and transition function, and the transition function represents the state resulting after the agent performs the next action;
when constructing the reinforcement learning agent, the state at step $t$ is $s_t = (r_q, e_t, h_t)$, where $r_q$ denotes the embedded representation of the relation to be queried, $e_t$ denotes the embedded representation of the entity reached at step $t$, and $h_t$ denotes the path history information explored up to step $t$; the entities and relations in $G$ are represented as low-dimensional continuous vectors $\mathbf{e}$ and $\mathbf{r}$, and the concatenation $[\mathbf{r}; \mathbf{e}]$ of a relation vector and an entity vector represents an action $a_t$; the current action space $A_t$ consists of all actions available in state $s_t$, and is represented by stacking the vectors of all actions in $A_t$; the path history $h_t$ mentioned in the state is encoded by an LSTM: $h_t = \mathrm{LSTM}(h_{t-1}, a_{t-1})$.
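A minimal numpy sketch of the LSTM step that folds the previous action into the path history, $h_t = \mathrm{LSTM}(h_{t-1}, a_{t-1})$. Packing the four gates into a single weight matrix `W` is an implementation assumption; the claim only specifies that an LSTM encodes the history.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_history_step(a_prev, h_prev, c_prev, W, b):
    """One LSTM step encoding the path history: h_t = LSTM(h_{t-1}, a_{t-1}).

    a_prev:         embedding of the previous action [r; e]
    h_prev, c_prev: previous hidden state and cell state (dimension d)
    W:              gate weights, shape (4d, len(a_prev) + d); b: bias, shape (4d,)
    """
    d = h_prev.shape[0]
    z = W @ np.concatenate([a_prev, h_prev]) + b
    i, f = sigmoid(z[:d]), sigmoid(z[d:2 * d])          # input and forget gates
    o, g = sigmoid(z[2 * d:3 * d]), np.tanh(z[3 * d:])  # output gate and candidate
    c = f * c_prev + i * g                              # new cell state
    h = o * np.tanh(c)                                  # new hidden state = history h_t
    return h, c
```

At each hop the agent feeds the embedding of the action it just took into this cell, and the resulting `h` becomes the history component of the next state $s_{t+1}$.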
4. The knowledge-graph multi-hop reasoning method for diabetes text according to claim 3, wherein in step 2, when updating the agent's action space, suppose the query triple is $(e_s, r_q, e_t)$, the agent is currently located at entity $e_c$, and the reasoning relation of the previous hop is $r_{t-1}$; first, the similarity $\mathrm{sim}(\mathbf{r}_i, \mathbf{r}_q)$ and $\mathrm{sim}(\mathbf{r}_i, \mathbf{r}_{t-1})$ of every relation to the query relation and the previous-hop relation is calculated, where $\mathbf{r}_i$ is the $i$-th relation vector in the relation set $R$.
5. The knowledge-graph multi-hop reasoning method for diabetes text according to claim 4, wherein in step 2, after calculating the similarity of all relations in the relation set $R$ to the current query relation $r_q$ and the previous-hop reasoning relation $r_{t-1}$ of the entity, the $k$ relations with the highest similarity are taken and the underlying ConvE embedding model is used to predict the corresponding tail entities, forming relation-entity pairs that provide an additional action space $A_{add}$ of size $k$; this is merged with the agent's action space at the current node to update the search space generated by the agent at nodes where the current reasoning path is missing, the update procedure being denoted $A_t' = A_t \cup A_{add}$.
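The action-space expansion in claims 4 and 5 can be sketched as below. The exact similarity measure is not recoverable from the garbled source, so cosine similarity to the sum of the two reference relations is an assumption, and `predict_tail` stands in for the ConvE model the claim names.

```python
import numpy as np

def expand_action_space(actions, rel_embs, r_q, r_prev, predict_tail, k=2):
    """Score each relation by similarity to the query relation r_q and the
    previous-hop relation r_prev (cosine to their sum -- an assumption), keep
    the top-k, and let an embedding model such as ConvE propose a tail entity
    for each, yielding k extra (relation, entity) actions: A' = A ∪ A_add."""
    target = r_q + r_prev
    norms = np.linalg.norm(rel_embs, axis=1) * np.linalg.norm(target) + 1e-9
    scores = rel_embs @ target / norms                 # similarity of each relation
    top = np.argsort(-scores, kind="stable")[:k]      # k most similar relations
    extra = [(int(i), predict_tail(int(i))) for i in top]
    return list(actions) + extra                       # merged action space A'
```

The merged list gives the policy network extra candidate hops at nodes where the graph itself offers no suitable outgoing edge.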
6. The knowledge-graph multi-hop reasoning method for diabetes text according to claim 5, wherein in step 2, the state $s_t$ at step $t$ is input into the reinforcement learning policy network, which guides the agent to search the action space and yields a probability distribution over the next actions; the next action is selected by random sampling, and the path search proceeds to the next step; the policy network is defined as $\pi_\theta(a_t \mid s_t) = \sigma\big(\mathbf{A}_t \cdot W_2\,\mathrm{ReLU}(W_1 [\mathbf{e}_t; \mathbf{r}_q; \mathbf{h}_t])\big)$, where $\pi_\theta(a_t \mid s_t)$ denotes the probability distribution over all actions in the action space in state $s_t$, $\sigma$ denotes the softmax function, $W_1$ and $W_2$ denote two linear neural networks, $\mathbf{A}_t$ denotes the stacked action-embedding matrix, and ReLU denotes the activation function; then, an action is selected from the action space by random sampling to perform the state transition and carry out the next hop.
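A numpy sketch of this policy network forward pass; the layer shapes are illustrative assumptions consistent with the formula in the claim.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    z = np.exp(x - np.max(x))  # shift for numerical stability
    return z / z.sum()

def policy_distribution(e_t, r_q, h_t, A_t, W1, W2):
    """pi_theta(a|s) = softmax(A_t @ W2 @ ReLU(W1 @ [e_t; r_q; h_t])):
    each row of the stacked action matrix A_t is scored against a
    transformed state vector, then normalized into a distribution."""
    s = np.concatenate([e_t, r_q, h_t])
    return softmax(A_t @ (W2 @ relu(W1 @ s)))
```

The next action is then drawn by random sampling, e.g. `np.random.choice(len(p), p=p)`, matching the claim's state-transition step.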
7. The knowledge-graph multi-hop reasoning method for diabetes text according to claim 1, wherein in step 3, two types of paths are distinguished: instance paths and meta-paths; an instance path of length $l$ on the diabetes text knowledge graph $G$ is given by a sequence $e_1 \xrightarrow{r_1} e_2 \xrightarrow{r_2} \cdots \xrightarrow{r_l} e_{l+1}$, where $(e_i, r_i, e_{i+1}) \in G$; the corresponding sequence of entity types $\tau(e_1) \xrightarrow{r_1} \tau(e_2) \xrightarrow{r_2} \cdots \xrightarrow{r_l} \tau(e_{l+1})$ is called a meta-path.
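The instance-path-to-meta-path mapping is a direct substitution of each entity by its type $\tau(e)$, which can be sketched as:

```python
def to_meta_path(instance_path, entity_type):
    """Map an instance path (e1, r1, e2, ..., rl, e_{l+1}) to its meta-path by
    replacing every entity with its type tau(e) while keeping the relations."""
    return tuple(
        entity_type[x] if i % 2 == 0 else x  # even positions hold entities
        for i, x in enumerate(instance_path)
    )
```

Rule matching then reduces to comparing this tuple against the meta-paths that serve as rule bodies.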
8. The knowledge-graph multi-hop reasoning method for diabetes text according to claim 7, wherein the logic rules for knowledge-graph reasoning are written in the form $\mathrm{head} \leftarrow \mathrm{body}$; cyclic rules of the form $r_h(X, Y) \leftarrow r_1(X, Z_1) \wedge r_2(Z_1, Z_2) \wedge \cdots \wedge r_l(Z_{l-1}, Y)$ are considered, where the body of the cyclic rule corresponds to a meta-path connecting the source entity type $t_s$ and the target entity type $t_t$ of the rule head, i.e., the rule body is the meta-path $t_s \xrightarrow{r_1} \cdots \xrightarrow{r_l} t_t$.
9. The knowledge-graph multi-hop reasoning method for diabetes text according to claim 8, wherein in step 4, during path reasoning by the agent a set of meta-paths $M = \{m_1, \ldots, m_k\}$ is considered, where each element corresponds to the body of a cyclic rule; for each meta-path $m_i \in M$, $s(m_i)$ denotes a score representing the confidence of the corresponding rule; for an instance path $P$, $\tilde{m}(P)$ denotes the corresponding meta-path;
if the meta-path of the extracted path matches a rule body, the confidence of that rule is used as an additional rule reward $r_{rule} = s(\tilde{m}(P))$; each rule corresponds to a confidence score, and a higher confidence score means that the sequence of the extracted meta-path appears more frequently in the knowledge graph and that the corresponding rule is more reliable; when the agent reaches the target entity it obtains a hit reward $r_{hit}$; the agent's total reward function is $R = r_{hit} + \lambda \cdot b \cdot s(\tilde{m}(P))$; the hyper-parameter $b$ is either set to 1, so that the reward is increased whenever the meta-path matches a rule body, or set to $b = r_{hit}$, so that the extra reward is obtained only when the prediction is correct, which helps the agent extract meta-paths corresponding to high-confidence rule bodies; the hyper-parameter $\lambda$ balances the two components of the reward.
10. The knowledge-graph multi-hop reasoning method for diabetes text according to claim 1, wherein in step 5, the policy network of the agent is trained by maximizing the expected value of the total reward, $J(\theta) = \mathbb{E}_{(e_s, r_q, e_t)}\,\mathbb{E}_{a_1, \ldots, a_T \sim \pi_\theta}[R]$, where $\theta$ denotes the parameters of the policy network and $R$ denotes the reward the model obtains under the network parameters $\theta$; the agent's parameters are updated by the REINFORCE algorithm, whose policy-gradient update is $\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta)$, where $\alpha$ denotes the learning rate.
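A minimal REINFORCE step for a categorical policy, illustrating the gradient update in claim 10. Reducing the policy to raw logits over a small action set is a simplification for the sketch; the patent's policy is the network of claim 6.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def reinforce_step(logits, action, reward, lr=0.1):
    """One REINFORCE update on a categorical policy:
    grad of log softmax(logits)[action] is onehot(action) - softmax(logits),
    and theta <- theta + lr * R * grad log pi_theta(action)."""
    grad = -softmax(logits)
    grad[action] += 1.0            # onehot(action) - pi(.)
    return logits + lr * reward * grad
```

After a positively rewarded episode the logit of the taken action rises, so the agent samples that action (and hence that path) more often in the future.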
CN202310899864.0A 2023-07-21 2023-07-21 Knowledge graph multi-hop reasoning method for diabetes text Pending CN116610822A (en)


Publications (1)

Publication Number Publication Date
CN116610822A true CN116610822A (en) 2023-08-18

Family

ID=87682283


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117747124A (en) * 2024-02-20 2024-03-22 浙江大学 Medical large model logic inversion method and system based on network excitation graph decomposition

Citations (7)

Publication number Priority date Publication date Assignee Title
CN113780002A (en) * 2021-08-13 2021-12-10 北京信息科技大学 Knowledge reasoning method and device based on graph representation learning and deep reinforcement learning
EP3975050A1 (en) * 2020-09-28 2022-03-30 Siemens Aktiengesellschaft Method and system for evaluating consistency of an engineered system
CN115526321A (en) * 2022-09-24 2022-12-27 中国人民解放军战略支援部队信息工程大学 Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy
CN115526317A (en) * 2022-09-24 2022-12-27 中国人民解放军战略支援部队信息工程大学 Multi-agent knowledge inference method and system based on deep reinforcement learning
CN115640410A (en) * 2022-12-06 2023-01-24 南京航空航天大学 Knowledge graph multi-hop question-answering method based on reinforcement learning path reasoning
CN115660086A (en) * 2022-10-20 2023-01-31 河北工业大学 Knowledge graph reasoning method based on logic rule and reinforcement learning
CN115860122A (en) * 2022-12-07 2023-03-28 之江实验室 Knowledge graph multi-hop inference method based on multi-agent reinforcement learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230818