CN114662693A - Reinforced learning knowledge graph reasoning method based on action sampling - Google Patents

Reinforced learning knowledge graph reasoning method based on action sampling

Info

Publication number
CN114662693A
Authority
CN
China
Prior art keywords
action
sampler
agent
motion
lstm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210244316.XA
Other languages
Chinese (zh)
Inventor
贾海涛
乔磊崖
李家伟
李嘉豪
林萧
曾靓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210244316.XA priority Critical patent/CN114662693A/en
Publication of CN114662693A publication Critical patent/CN114662693A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • G06N5/041Abduction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a reinforcement learning knowledge graph reasoning method based on action sampling. Aiming at the problems of insufficient representation capability, invalid redundant action selection and the lack of a memory component in traditional knowledge graph reinforcement learning inference algorithms, the method selects, according to the original fact-prediction scores of representation learning methods on a dataset, the representation learning method best suited to represent the reinforcement learning environment, thereby enhancing the algorithm's representation capability; designs an action sampler to reduce invalid redundant action selection by the agent during walking; and uses an LSTM as a memory component to encode historical information and increase model precision, so that the algorithm can outperform path-based reasoning algorithms even without pre-training. The method maps the paths obtained by the agent walking in the environment to a three-layer LSTM policy network and, through action sampling, encourages the agent to select more meaningful paths, finally achieving more accurate entity-relation path learning.

Description

Reinforced learning knowledge graph reasoning method based on action sampling
Technical Field
The invention belongs to the field of natural language processing.
Background
In recent years, deep learning techniques have achieved state-of-the-art results on a variety of classification and recognition problems. However, complex natural language processing problems often require multiple interrelated decisions, and enabling deep learning models to learn to reason remains a challenging problem. To process complex queries without obvious answers, an intelligent machine must be able to reason over existing resources and learn to infer unknown answers.
With the continuous development of knowledge graph reasoning technology, reinforcement learning has been shown to achieve good results on knowledge reasoning tasks. DeepPath, published at EMNLP 2017, introduced reinforcement learning into knowledge graph reasoning for the first time; it simply samples paths from the knowledge graph and feeds them into a policy network for training. The main task is, given an entity pair (entity1, entity2) in a knowledge graph, to have the model reason out the path from the head entity to the tail entity; its subtasks include Link Prediction and Fact Prediction. However, DeepPath suffers from the following problems:
(1) states in the environment are represented simply by TransE, giving insufficient representation capability;
(2) the random action sampling mode may cause the agent to take many invalid redundant actions, consuming computational cost and producing false paths;
(3) the state vector is fed directly into the policy network, losing the rich correlations and semantic information among the original states.
Aiming at these problems, the invention proposes a Reinforcement Learning Knowledge Graph Reasoning method based on Action Sampling and an LSTM Memory component (RLKGR-ASM).
Disclosure of Invention
The invention provides a reinforcement learning knowledge graph reasoning method based on action sampling, and aims to solve the problems of insufficient representation capability, invalid action selection, no memory component and the like of the existing reinforcement learning reasoning method. The method comprises the following steps:
(1) In the data processing layer, select the optimal representation method for each dataset and represent the triples and inference relations in the data as feature vectors.
(2) In the pre-training layer, pre-train the model with a randomized breadth-first search (BFS) strategy and expert data to improve model convergence.
(3) In the reward retraining layer, add the reward function for retraining, and add the action sampler and LSTM memory component to the RL model.
(4) The output layer uses a policy network to produce the output.
Drawings
FIG. 1 RLKGR-ASM algorithm flow chart
FIG. 2 is a schematic diagram of the LSTM memory component
FIG. 3 is a schematic diagram of the action sampler
FIG. 4 MAP scores of the Trans-series models on the fact prediction task
FIG. 5 MAP-value comparison on the link prediction task over the NELL-995 dataset
FIG. 6 MAP-value comparison on the link prediction task over the FB15K-237 dataset
FIG. 7 Hits@1, Hits@3, MRR and MAP values of this experiment and DeepPath on the link prediction task over the NELL-995 and FB15K-237 datasets
FIG. 8 MAP values of TransE, TransR, TransH, TransD, DeepPath and RLKGR-ASM (this experiment) on the fact prediction task
FIG. 9 Number of inference paths used by PRA and this experiment
FIG. 10 Iteration time per round of DeepPath, RLKGR-ASM (without action sampler) and RLKGR-ASM (this experiment) on the two datasets (unit: seconds)
Detailed Description
The technical solution in the embodiments of the present invention will be described clearly and completely below with reference to FIG. 1.
As shown in FIG. 1, the invention is based on action sampling and an LSTM memory component, and the reasoning algorithm mainly comprises data preprocessing, pre-training, reward retraining and output. The specific implementation is as follows.
Step one: data processing layer
After basic preprocessing of the NELL-995 and FB15K-237 datasets used in the experiments, four embedding-based methods (TransE, TransH, TransR and TransD) are applied directly to the fact prediction task. The evaluation metric is kept consistent with that of the final experimental results: mean average precision (MAP). The results are shown in FIG. 4. As shown, TransD performs best on NELL-995, while TransH performs best on FB15K-237.
The original reasoning score of an embedding method on a dataset directly reflects how well the representation method fits that dataset: the higher the score, the better the reasoning effect, i.e. the more completely the method captures the original semantic information of the data and the stronger the representation capability of the algorithm's environment. Based on this, the invention selects TransD as the representation method for NELL-995 and TransH as the representation method for FB15K-237.
Step two: pre-training layer
The model is pre-trained using a randomized breadth-first search (BFS) strategy with expert data to improve model convergence. For each relation, the algorithm learns a supervised policy using a subset of all positive samples (entity pairs); for each positive sample (e_s, e_t), bidirectional BFS is employed during pre-training to find a correct path between the entities. For each sequence of path relations (r_1, r_2, ..., r_n), θ is updated to maximize the expected reward J(θ), as shown in equation (1).
J(\theta) = \mathbb{E}_{a \sim \pi(a \mid s;\theta)}\left[\sum_{t} R(s_t, a_t)\right] \qquad (1)
For supervised learning, the algorithm gives each successful walk a reward of +1 and, as shown in equation (2), updates the gradient of the policy network with the correct paths found by BFS.
\nabla_{\theta} J(\theta) = \nabla_{\theta} \sum_{t} \log \pi(a = r_t \mid s_t; \theta) \qquad (2)
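For illustration only, the following is a minimal Python sketch of the supervised pre-training update of equations (1) and (2). The names `policy_net`, `optimizer`, `states` and `relation_ids` are illustrative assumptions rather than part of the claimed implementation; the sketch assumes a policy network that maps a batch of state vectors to logits over all relations.

```python
# Minimal sketch of one BFS-supervised pre-training step (equations (1)-(2)).
# `policy_net` is assumed to map a (T, state_dim) tensor to (T, num_relations) logits.
import torch

def pretrain_on_path(policy_net, optimizer, states, relation_ids):
    """Raise log pi(r_t | s_t) for every step of a correct path found by
    bidirectional BFS (each successful step is rewarded with +1)."""
    optimizer.zero_grad()
    logits = policy_net(torch.stack(states))          # (T, num_relations)
    log_probs = torch.log_softmax(logits, dim=-1)
    idx = torch.arange(len(relation_ids))
    chosen = log_probs[idx, torch.tensor(relation_ids)]
    loss = -chosen.sum()                              # maximize the sum of log-probs
    loss.backward()
    optimizer.step()
    return loss.item()
```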
Step three: reward retraining layer
The RL agent and the external environment of reinforcement learning are defined, and the environment is initialized according to the definition of the global reward function.
The reinforcement learning system consists of two parts. The first part is the external environment E, which specifies the dynamics of the interaction between the KG and the agent. This environment is modeled as a Markov Decision Process (MDP). The MDP is defined as a tuple ⟨S, A, P, R⟩, where S is the continuous state space, A = {a_1, a_2, ..., a_n} is the set of all available actions, P is the state transition probability matrix, and R(s, a) is the reward function for each pair (s, a).
The second part of the system is the agent, which is represented as a policy network π_θ(s, a) = p(a | s; θ). It maps the state to a stochastic policy, and the neural network parameters θ are updated by stochastic gradient descent.
The components of the system are respectively as follows:
action (Action): give the entity pair with relation r (e)s,et) Reinforcement learning agents are expected to find the most informative paths connecting these two entities. Starting from the head entity es, the agent uses the policy network to select the most likely relationship, expanding the path in each step until it reaches the target entity et. We defineThe output dimension of the strategy is equal to the related coefficient in a large-scale Knowledge Graph (KG), namely, the action space is defined as all relations in KG.
States: The entities and relations in the KG are discrete symbols. To obtain semantic information for these symbols, the invention uses TransD and TransH to map NELL-995 and FB15K-237, respectively, into a low-dimensional space. The state vector at step t is ω_t = (e_{t⊥}, e_{target⊥} − e_{t⊥}), where e_{t⊥} is the projected embedding of the current entity and e_{target⊥} is the projected embedding vector of the target entity.
Reward: During reward retraining, the agent needs reward feedback to judge the quality of a walk and thereby update the network parameters. The global reward function defined by the invention is shown in equation (3):
r_{\text{GLOBAL}} = \begin{cases} +1, & \text{if the agent reaches the target entity} \\ -1, & \text{otherwise} \end{cases} \qquad (3)
if the agent can reach the target through a series of actions, a +1 global reward is obtained.
For the relational reasoning task, it is observed that short paths tend to provide more reliable reasoning evidence than long paths. Shorter relation chains can also improve reasoning efficiency by limiting the length of the RL agent's interactions with the environment. The efficiency reward is defined as shown in equation (4).
r_{\text{EFFICIENCY}} = \frac{1}{\text{length}(p)} \qquad (4)
where a path p is defined as a sequence of relations r_1 → r_2 → ... → r_n.
The training samples (entity1, entity2) have similar state representations in vector space, so the agent tends to find paths with similar syntax and semantics. These paths usually contain redundant information. To encourage the agent to find diverse paths, a diversity reward function is defined using the cosine similarity between the current path and existing paths, as shown in equation (5).
r_{\text{DIVERSITY}} = -\frac{1}{|F|}\sum_{i=1}^{|F|} \cos(\mathbf{p}, \mathbf{p}_i) \qquad (5)

where p denotes the embedding of the relation chain r_1 → r_2 → ... → r_n (the sum of its relation embeddings) and F is the set of paths already found.
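As an illustrative, non-limiting sketch of equations (3)-(5), the three reward terms can be computed as follows in Python; how the terms are weighted and combined into the total reward is not fixed here and is an assumption of the sketch.

```python
# Sketch of the three reward terms (equations (3)-(5)); numpy only.
import numpy as np

def global_reward(reached_target: bool) -> float:
    # +1 if the walk reaches the target entity, -1 otherwise
    return 1.0 if reached_target else -1.0

def efficiency_reward(path_relations) -> float:
    # shorter relation chains receive a larger efficiency reward
    return 1.0 / len(path_relations)

def diversity_reward(path_embedding, previous_path_embeddings) -> float:
    # negative mean cosine similarity to the paths already found
    if not previous_path_embeddings:
        return 0.0
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    sims = [cos(path_embedding, p) for p in previous_path_embeddings]
    return -float(np.mean(sims))
```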
An LSTM memory component is then built and incorporated into the state representation; the path-planning algorithm in the dynamic environment is optimized and reinforced by combining it with the LSTM, as shown in FIG. 2.
The policy network of the ordinary DeepPath reinforcement learning algorithm receives only the state representation of the current step, yet the search strategy is related to historical information. To let the algorithm fully exploit the correlation between history and state, the invention uses a three-layer LSTM network to encode the historical search information.
The cell state s_t and output h_t of the LSTM layer at time t are computed as follows. First, the current input x_t, the output h_{t-1} at time t−1 and the bias term b_f of the forget gate are combined to obtain the forget gate's activation value f_t, which removes invalid information from the state at time t−1. Sigmoid is chosen as the activation function to produce a normalized output, as shown in equation (6).
f_t = \mathrm{sigmoid}(W_{f,x} x_t + W_{f,h} h_{t-1} + b_f) \qquad (6)
Next, the LSTM layer selects the more useful information to store in the cell state s_t. First, the candidate value \tilde{s}_t that may be added to the cell state and the activation value i_t of the input gate are calculated, as shown in equations (7) and (8).
\tilde{s}_t = \tanh(W_{s,x} x_t + W_{s,h} h_{t-1} + b_s) \qquad (7)
i_t = \mathrm{sigmoid}(W_{i,x} x_t + W_{i,h} h_{t-1} + b_i) \qquad (8)
Finally, the current cell state s_t is updated according to the previous calculation results, as shown in equation (9), where ⊙ denotes the Hadamard (element-wise) product.

s_t = f_t \odot s_{t-1} + i_t \odot \tilde{s}_t \qquad (9)
The output h_t of the LSTM network can be expressed by equations (10) and (11).
o_t = \mathrm{sigmoid}(W_{o,x} x_t + W_{o,h} h_{t-1} + b_o) \qquad (10)
h_t = o_t \odot \tanh(s_t) \qquad (11)
After three layers of encoding, the history can be expressed as h_t = LSTM(h_{t-1}, w_t), with h_{t-1} = 0 when t = 0. Once encoding is complete, the RL state at this time is denoted s_t = (h_t, w_t). This state is fed into the policy network and trained through a fully connected neural network consisting of two ReLU layers and one Softmax layer to obtain the action probability matrix.
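A minimal PyTorch sketch of the history-encoding policy network described above (a three-layer LSTM followed by two ReLU layers and a Softmax output) is given below for illustration; the layer sizes and the exact way the joint state (h_t, w_t) is formed are assumptions of the sketch, not prescriptions.

```python
# Sketch of the three-layer LSTM memory component plus the policy head
# (two ReLU layers + Softmax); all dimensions are illustrative.
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    def __init__(self, state_dim, hidden_dim, num_relations):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, num_layers=3, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_relations),
        )

    def forward(self, step_states):
        # step_states: (batch, T, state_dim); the last LSTM output encodes the history h_t
        h_seq, _ = self.lstm(step_states)
        h_t = h_seq[:, -1, :]
        joint = torch.cat([h_t, step_states[:, -1, :]], dim=-1)   # joint state (h_t, w_t)
        return torch.softmax(self.head(joint), dim=-1)            # action probability matrix
```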
When the agent selects an action, the invention sets an action sampler, as shown in fig. 3.
In the reinforcement learning algorithm, the agent continuously extends its path through interaction with the environment: after receiving the joint state of the current state and the historical information, the policy network outputs an action probability matrix, and the agent selects the next action according to this matrix to extend the path.
To prevent the agent from selecting too many invalid paths, an action sampler is added to the agent's action selection: whenever a random walk of the agent terminates, the termination node e_d and the action (relation) r_d selected there are recorded and added to the action sampler's memory as the invalid entity-relation pair (e_d, r_d). In subsequent walks, suppose the agent arrives at e_t; if e_t exists in the action sampler's entity memory set, the sampler removes r_d from the action space when the next action is selected. The next action chosen by the agent therefore cannot be an invalid action that has already occurred, which encourages the agent, with higher probability, to complete a full walk and search for a more informative path set, while also saving computation.
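For illustration, a minimal Python sketch of the action sampler under the reading given above follows: it records (e_d, r_d) pairs that led to a terminated walk and masks those relations out of the action space when the agent returns to that entity. The class and method names are illustrative assumptions.

```python
# Sketch of the action sampler: remembered invalid (entity, relation) pairs
# are filtered out of the candidate action space on later visits.
from collections import defaultdict

class ActionSampler:
    def __init__(self):
        self.invalid = defaultdict(set)   # entity -> relations recorded as invalid there

    def record_failure(self, entity, relation):
        """Record that choosing `relation` at `entity` led the walk to terminate."""
        self.invalid[entity].add(relation)

    def filter_actions(self, entity, candidate_relations):
        """Remove relations previously recorded as invalid at this entity."""
        blocked = self.invalid.get(entity, set())
        return [r for r in candidate_relations if r not in blocked]
```

In use, `filter_actions` would be applied to the candidate relation list before sampling from the action probability matrix returned by the policy network.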
Step four: output layer
In order to find inference paths controlled by the reward function, the supervised policy network needs to be retrained using the reward function. The training process is similar to pre-training, except that the reward function part is added; the gradient of the parameters is updated as shown in equation (12).
\nabla_{\theta} J(\theta) = \nabla_{\theta} \sum_{t} R_{\text{total}} \log \pi(a = r_t \mid s_t; \theta) \qquad (12)
Starting from the source node, the agent selects a relation according to the stochastic policy π(a | s) to extend its inference path. A chosen relation link may lead the agent to a new entity or may produce no result (the action sampler reduces the occurrence of the latter case); a failed step gives the agent a negative reward, and if it successfully reaches the target entity, the agent obtains a positive reward of +1.
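The following is a minimal, illustrative Python sketch of the reward-retraining step of equation (12): a REINFORCE-style update that scales the log-probabilities of the chosen relations by the total episode reward. The scalar `total_reward` (some combination of the global, efficiency and diversity rewards) and the generic `policy_net` interface are assumptions of the sketch.

```python
# Sketch of the reward-retraining update (equation (12)).
import torch

def retrain_on_episode(policy_net, optimizer, states, actions, total_reward):
    """states: list of per-step state tensors; actions: chosen relation ids;
    total_reward: scalar episode reward used to scale the policy gradient."""
    optimizer.zero_grad()
    logits = policy_net(torch.stack(states))
    log_probs = torch.log_softmax(logits, dim=-1)
    idx = torch.arange(len(actions))
    chosen = log_probs[idx, torch.tensor(actions)]
    loss = -(total_reward * chosen.sum())     # gradient ascent on R * sum of log pi
    loss.backward()
    optimizer.step()
    return loss.item()
```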
The purpose of Link Prediction is to predict the target entity. For each entity-relation pair (e, r), there is one true target entity and about 10 generated false targets. Here, the results of PRA, DeepPath, TransE, TransR, RLKGR-ASM (this experiment) and RLKGR-ASM (without pre-training) are listed for the 10 most representative relations of NELL-995 and FB15K-237, as shown in FIGS. 5 and 6. As can be seen from the figures, the reinforcement-learning-based inference methods outperform the embedding-based methods (TransE, TransR) and the path-based method (PRA) in most cases, and RLKGR-ASM (this experiment) achieves better results overall on each link prediction task on NELL-995. The MAP values of the invention are higher than those of the other algorithms, and particularly good results are obtained on FB15K-237. On NELL-995, the MAP of this experiment increases by 7.8%, 2.7%, 13.9% and 1.9% compared with TransE, TransR, PRA and DeepPath, respectively; on FB15K-237, it increases by 8.1%, 7.4%, 7.2% and 4.1%, respectively, which demonstrates the effectiveness of this experiment.
In addition, because it performs no supervised pre-training on expert data, RLKGR-ASM (without pre-training) achieves MAP values on the link prediction task over both datasets that are better than the embedding-based methods TransE and TransR and the path-based method PRA, although it still falls short of DeepPath and this experiment. On NELL-995, the overall MAP of RLKGR-ASM (without pre-training) improves by 5.4%, 0.3% and 11.5%, respectively, compared with these conventional algorithms; on FB15K-237, it improves by 2.7%, 2.0% and 1.8%, respectively.
In addition, FIG. 7 gives a detailed comparison of the model performance of this algorithm with DeepPath, listing the Hits@1, Hits@3, MRR and MAP values of the two algorithms on the link prediction task over the NELL-995 and FB15K-237 datasets. As can be seen from the table, on the NELL-995 dataset this experiment improves Hits@1, Hits@3, MRR and MAP over DeepPath by 1.3%, 2.5%, 2.1% and 1.9%, respectively; on the FB15K-237 dataset, it improves them by 4.6%, 6.3%, 4.9% and 4.1%, respectively.
Fact Prediction aims at predicting the truth of an unknown fact; the ratio of positive to negative triples in the dataset is about 1:10. Unlike link prediction, which ranks target entities, this task directly ranks all positive and negative samples of a particular relation. For the fact prediction task, this section compares TransE, TransR, TransD, TransH and DeepPath with RLKGR-ASM (this experiment). MAP is used as the evaluation metric of the comparison experiment, with Hits@N and MRR as auxiliary metrics for the finer comparison with DeepPath.
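For reference, a minimal Python sketch of the evaluation metrics mentioned above (Hits@N, the reciprocal rank underlying MRR, and the average precision underlying MAP) is given below; MAP and MRR are then the means of these per-query values over all queries.

```python
# Sketch of the ranking metrics used in the evaluation; pure Python.
def hits_at_n(rank: int, n: int) -> float:
    # 1 if the true answer is ranked within the top n, else 0
    return 1.0 if rank <= n else 0.0

def reciprocal_rank(rank: int) -> float:
    # contribution of one query to MRR
    return 1.0 / rank

def average_precision(ranked_labels) -> float:
    """ranked_labels: 1/0 relevance labels of candidates sorted by model score;
    the mean of this value over all queries gives MAP."""
    hits, precisions = 0, []
    for i, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / max(len(precisions), 1)
```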
FIG. 8 lists the MAP scores of the embedding-based Trans-series models, DeepPath and this experiment on the fact prediction task. It can be seen that this experiment outperforms both the embedding-based methods and the traditional reinforcement-learning-based DeepPath on the fact prediction task. On the NELL-995 dataset, the MAP of this experiment increases by 14.4%, 13.8%, 12.1%, 11.4% and 3.4% compared with TransE, TransR, TransD, TransH and DeepPath, respectively; on the FB15K-237 dataset, it increases by 4.2%, 1.0%, 1.7%, 1.6% and 0.8%, respectively. Excellent performance is achieved on the NELL-995 dataset, while the improvement on FB15K-237 is slight.
In addition, taking the link prediction relations "athleteHomeStadium", "worksFor" and "organizationHiredPerson" in the NELL-995 dataset as examples, FIG. 9 lists the number of inference paths used by PRA and RLKGR-ASM (this experiment). It can be seen that the number of inference paths used in this experiment is far smaller than that used by PRA, which shows that, compared with path-based inference methods, the reinforcement learning method can achieve a better MAP with a more compact set of learned paths.
As for time overhead, as shown in FIG. 10, the addition of the LSTM memory component adds to the computational cost of this experiment. Compared with DeepPath, the average iteration time per round of this experiment on the NELL-995 dataset is 13.19862 seconds, an increase of 31.88%; the average iteration time per round on the FB15K-237 dataset is 18.01331 seconds, an increase of 16.31%.
Without the action sampler, the agent makes invalid action selections during walking and wastes computation. On the NELL-995 dataset, the iteration time per round without the action sampler is 14.25433 seconds, and the action sampler reduces the time overhead of this experiment by 7.42%; on the FB15K-237 dataset, the iteration time per round without the action sampler is 19.23654 seconds, and the action sampler reduces the time overhead of this experiment by 6.34%.
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the present invention is not limited in scope to these specific embodiments. Variations will be obvious to those skilled in the art, and all inventions utilizing the concepts of the present invention are intended to be protected.

Claims (4)

1. A reinforcement learning knowledge graph reasoning algorithm based on action sampling comprises the following steps:
step 1: selecting an optimal representation method for different data sets in a data processing layer, and representing the triple and the inference relation in the data as a feature vector;
step 2: pre-training the model with a randomized breadth-first search (BFS) strategy and expert data in the pre-training layer so as to improve the convergence of the model;
step 3 (the core content of this patent): adding the reward function for retraining, and adding an action sampler and an LSTM memory component to the RL model; the invention adopts a three-layer LSTM network to encode the historical search information, as shown in the following formula;
h_t = LSTM(h_{t-1}, w_t), with h_{t-1} = 0 when t = 0
the three-layer LSTM receives the entity embedding vector at each step; three gating modules are added to the structure of the LSTM's recurrent cell, which gives it a memory function while alleviating the vanishing- and exploding-gradient problems that may exist in traditional neural networks; after encoding is completed, the RL state at this time is denoted s_t = (h_t, w_t); this state is input into the policy network and trained through a fully connected neural network consisting of two ReLU layers and one Softmax layer to obtain an action probability matrix, and the agent selects the next action according to the action probability matrix fed back by the policy network to continuously extend the path;
the following formula gives the action probability matrix output by the policy network;
\pi_{\theta}(a_t \mid s_t) = \sigma(A_t \times W_2\,\mathrm{ReLU}(W_1[h_t; s_t]))
in order to prevent the agent from selecting too many invalid paths, an action sampler is added to the agent's action selection: whenever a random walk of the agent terminates, the termination node e_d and the action (relation) r_d selected there are recorded and added to the action sampler's memory as the invalid entity-relation pair (e_d, r_d); in subsequent walks, suppose the agent arrives at e_t; if e_t exists in the action sampler's entity memory set, the sampler removes r_d from the action space when the next action is selected; the next action chosen by the agent therefore cannot be an invalid action that has already occurred, which encourages the agent, with higher probability, to complete a full walk and search for a more informative path set, while also saving computation;
step 4: the output layer uses a policy network for output.
2. The method as claimed in claim 1, wherein step 1 selects the representation learning method with the better effect according to the representation abilities of different representation learning methods on the specific dataset, thereby improving the representation capability of the reinforcement learning environment from the bottom layer.
3. The method of claim 1, wherein step 3 adds an LSTM memory component to encode historical information and help the agent find inference paths more efficiently, so that the algorithm can dispense with pre-training while still achieving accuracy better than path-based and embedding-based inference methods; with pre-training, the method further improves the result precision: on NELL-995, the MAP of this experiment increases by 7.8%, 2.7%, 13.9% and 1.9% compared with TransE, TransR, PRA and DeepPath, respectively; on FB15K-237, it increases by 8.1%, 7.4%, 7.2% and 4.1%, respectively; for the fact prediction task, on the NELL-995 dataset the MAP of this experiment increases by 14.4%, 13.8%, 12.1%, 11.4% and 3.4% compared with TransE, TransR, TransD, TransH and DeepPath, respectively; on the FB15K-237 dataset, it increases by 4.2%, 1.0%, 1.7%, 1.6% and 0.8%, respectively.
4. The method according to claim 1, wherein step 3 sets an action sampler to reduce invalid redundant action selection by the agent during walking, encourage the agent to select more meaningful paths, and effectively save time cost: on the NELL-995 dataset, the iteration time per round without the action sampler is 14.25433 seconds, and the action sampler reduces the time overhead of this experiment by 7.42%; on the FB15K-237 dataset, the iteration time per round without the action sampler is 19.23654 seconds, and the action sampler reduces the time overhead of this experiment by 6.34%.
CN202210244316.XA 2022-03-14 2022-03-14 Reinforced learning knowledge graph reasoning method based on action sampling Pending CN114662693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210244316.XA CN114662693A (en) 2022-03-14 2022-03-14 Reinforced learning knowledge graph reasoning method based on action sampling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210244316.XA CN114662693A (en) 2022-03-14 2022-03-14 Reinforced learning knowledge graph reasoning method based on action sampling

Publications (1)

Publication Number Publication Date
CN114662693A true CN114662693A (en) 2022-06-24

Family

ID=82029373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210244316.XA Pending CN114662693A (en) 2022-03-14 2022-03-14 Reinforced learning knowledge graph reasoning method based on action sampling

Country Status (1)

Country Link
CN (1) CN114662693A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116081797A (en) * 2022-08-25 2023-05-09 北控水务(中国)投资有限公司 Dynamic optimization method, device and equipment for full-flow control quantity of sewage treatment plant


Similar Documents

Publication Publication Date Title
Neill An overview of neural network compression
Tamaazousti et al. Learning more universal representations for transfer-learning
US20210034968A1 (en) Neural network learning apparatus for deep learning and method thereof
CN111581343A (en) Reinforced learning knowledge graph reasoning method and device based on graph convolution neural network
CN117435715B (en) Question answering method for improving time sequence knowledge graph based on auxiliary supervision signals
CN108876044B (en) Online content popularity prediction method based on knowledge-enhanced neural network
Schilling et al. Hyperparameter optimization with factorized multilayer perceptrons
Bellinger et al. Active Measure Reinforcement Learning for Observation Cost Minimization.
CN115526317A (en) Multi-agent knowledge inference method and system based on deep reinforcement learning
CN115964459A (en) Multi-hop inference question-answering method and system based on food safety cognitive map
CN116561302A (en) Fault diagnosis method, device and storage medium based on mixed knowledge graph reasoning
CN115526321A (en) Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy
Putra et al. lpspikecon: Enabling low-precision spiking neural network processing for efficient unsupervised continual learning on autonomous agents
CN114662693A (en) Reinforced learning knowledge graph reasoning method based on action sampling
CN118155860A (en) Method, equipment and medium for aligning traditional Chinese medicine large model preference
Rowe Algorithms for artificial intelligence
CN117150041A (en) Small sample knowledge graph completion method based on reinforcement learning
CN116719947A (en) Knowledge processing method and device for detecting power inspection defects
Gupta et al. A Roadmap to Domain Knowledge Integration in Machine Learning
CN114626530A (en) Reinforced learning knowledge graph reasoning method based on bilateral path quality assessment
Peng A Brief Summary of Interactions Between Meta-Learning and Self-Supervised Learning
CN114722212A (en) Automatic meta-path mining method oriented to character relation network
CN113051353A (en) Attention mechanism-based knowledge graph path reachability prediction method
Wang et al. Research on knowledge graph completion model combining temporal convolutional network and Monte Carlo tree search
CN114491080B (en) Unknown entity relationship inference method oriented to character relationship network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220624

WD01 Invention patent application deemed withdrawn after publication