CN114662693A - Reinforced learning knowledge graph reasoning method based on action sampling - Google Patents
Reinforced learning knowledge graph reasoning method based on action sampling
- Publication number
- CN114662693A (application CN202210244316.XA)
- Authority
- CN
- China
- Prior art keywords
- action
- sampler
- agent
- motion
- lstm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 47
- 230000009471 action Effects 0.000 title claims abstract description 44
- 238000005070 sampling Methods 0.000 title claims abstract description 9
- 238000012549 training Methods 0.000 claims abstract description 22
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 20
- 230000002787 reinforcement Effects 0.000 claims abstract description 18
- 230000008569 process Effects 0.000 claims abstract description 9
- 230000000694 effects Effects 0.000 claims abstract description 4
- 239000003795 chemical substances by application Substances 0.000 claims description 34
- 238000002474 experimental method Methods 0.000 claims description 33
- 230000006870 function Effects 0.000 claims description 10
- 239000011159 matrix material Substances 0.000 claims description 7
- 238000004364 calculation method Methods 0.000 claims description 4
- 238000013528 artificial neural network Methods 0.000 claims description 3
- 238000012545 processing Methods 0.000 claims description 3
- 238000005295 random walk Methods 0.000 claims description 2
- 230000008034 disappearance Effects 0.000 claims 1
- 238000004880 explosion Methods 0.000 claims 1
- 230000006386 memory function Effects 0.000 claims 1
- 238000011156 evaluation Methods 0.000 description 4
- 230000004913 activation Effects 0.000 description 3
- 230000003993 interaction Effects 0.000 description 3
- 238000003058 natural language processing Methods 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 230000006978 adaptation Effects 0.000 description 1
- 230000002146 bilateral effect Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 238000011478 gradient descent method Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000005012 migration Effects 0.000 description 1
- 238000013508 migration Methods 0.000 description 1
- 230000001537 neural effect Effects 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Animal Behavior & Ethology (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a reinforcement learning knowledge graph reasoning method based on action sampling. To address three problems of conventional reinforcement-learning inference algorithms for knowledge graphs (insufficient representation capability, selection of invalid redundant actions, and the lack of a memory component), the method selects, according to each representation learning method's raw fact-prediction score on a dataset, the representation method best suited to that dataset to represent the reinforcement learning environment and thereby strengthen the algorithm's representation capability; designs an action sampler to reduce the agent's invalid redundant action selections during walks; and uses an LSTM as a memory component to encode historical information and increase model precision, so that even without pre-training the algorithm outperforms path-based reasoning algorithms. The method maps the path obtained by the agent walking in the environment into a three-layer LSTM policy network, uses action sampling to steer the agent toward more meaningful paths, and finally realizes more accurate learning of entity-relation paths.
Description
Technical Field
The invention belongs to the field of natural language processing.
Background
In recent years, deep learning techniques have achieved state-of-the-art results on a variety of classification and recognition problems. However, complex natural language processing problems often require multiple interrelated decisions, and enabling deep learning models to learn to reason remains a challenge. To process complex queries without obvious answers, an intelligent machine must be able to reason over existing resources and learn to infer unknown answers.
With the continued development of knowledge graph reasoning technology, reinforcement learning has been shown to achieve good results on knowledge reasoning tasks. DeepPath, published at EMNLP 2017, first introduced reinforcement learning into knowledge graph reasoning; it simply samples the knowledge graph and feeds the samples into a policy network for training. The main task is, given an entity pair (entity1, entity2) in a knowledge graph, to have the model reason out the path from the head entity to the tail entity; its subtasks include Link Prediction and Fact Prediction. However, DeepPath suffers from the following problems:
(1) states in the environment are represented simply by TransE, whose representation capability is insufficient;
(2) the random action sampling scheme may cause the agent to take many invalid redundant actions, wasting computation and producing spurious paths;
(3) the state vector is fed directly into the policy network, losing the rich correlations and semantic information among the original states.
To address these problems, the invention proposes a Reinforcement Learning Knowledge Graph Reasoning method based on Action Sampling and an LSTM memory component (RLKGR-ASM).
Disclosure of Invention
The invention provides a reinforcement learning knowledge graph reasoning method based on action sampling, which aims to solve the problems of existing reinforcement learning reasoning methods: insufficient representation capability, invalid action selection, and the lack of a memory component. The method comprises the following steps:
(1) At the data processing layer, select the optimal representation method for each dataset and represent the triples and inference relations in the data as feature vectors.
(2) At the pre-training layer, pre-train the model using a randomized breadth-first search (BFS) strategy and expert data to improve model convergence.
(3) At the retraining layer, retrain with a reward function added, and add the action sampler and the LSTM memory component to the RL model.
(4) The output layer uses the policy network for output.
Drawings
FIG. 1 Flow chart of the RLKGR-ASM algorithm
FIG. 2 Schematic diagram of the LSTM memory component
FIG. 3 Schematic diagram of the action sampler
FIG. 4 MAP scores of the Trans-series embedding methods on the fact prediction task
FIG. 5 MAP comparison on the link prediction task, NELL-995 dataset
FIG. 6 MAP comparison on the link prediction task, FB15K-237 dataset
FIG. 7 Hits@1, Hits@3, MRR and MAP values of this experiment and DeepPath on the link prediction task, NELL-995 and FB15K-237 datasets
FIG. 8 MAP values of TransE, TransR, TransH, TransD, DeepPath and RLKGR-ASM (this experiment) on the fact prediction task
FIG. 9 Number of inference paths used by PRA and this experiment
FIG. 10 Per-round iteration time of DeepPath, RLKGR-ASM (without the action sampler) and RLKGR-ASM (this experiment) on the two datasets (unit: seconds)
Detailed Description
The technical solution in the embodiments of the present invention is described clearly and completely below with reference to FIG. 1.
As shown in FIG. 1, the invention is based on action sampling and an LSTM memory component; the reasoning algorithm mainly comprises four parts: data preprocessing, pre-training, reward retraining, and output. The specific implementation is as follows.
Step one: data processing layer
After basic preprocessing of the two datasets used in the experiments, NELL-995 and FB15K-237, the four embedding-based methods TransE, TransH, TransR, and TransD are applied directly to the fact prediction task, using the same evaluation metric as the final experiments: mean average precision (MAP). The results are shown in FIG. 4: TransD performs best on NELL-995, while TransH performs best on FB15K-237.
The raw inference score of an embedding method on a dataset directly reflects how well that representation method fits the dataset: the higher the score, the better the inference effect, i.e., the more completely the method captures the original semantic information of the data and the stronger the representation capability of the algorithm's environment. On this basis, the invention selects TransD as the representation method for NELL-995 and TransH as the representation method for FB15K-237.
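The selection rule above amounts to picking, per dataset, the embedding method with the highest raw fact-prediction MAP. A minimal sketch follows; the MAP numbers are illustrative placeholders, not the patent's reported values.

```python
# Hypothetical sketch of the data-processing-layer selection rule:
# choose the embedding model with the highest raw fact-prediction MAP.
def select_representation(map_scores: dict) -> str:
    """Return the name of the embedding method with the highest MAP."""
    return max(map_scores, key=map_scores.get)

# Placeholder scores (not the patent's numbers), shaped so that the
# outcome matches the description: TransD for NELL-995, TransH for FB15K-237.
nell_scores = {"TransE": 0.74, "TransH": 0.76, "TransR": 0.75, "TransD": 0.79}
fb_scores = {"TransE": 0.66, "TransH": 0.72, "TransR": 0.69, "TransD": 0.70}

print(select_representation(nell_scores))  # TransD
print(select_representation(fb_scores))    # TransH
```

The chosen method then supplies the entity and relation embeddings that define the reinforcement learning environment for that dataset.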
Step two: pre-training layer
The model is pre-trained using a randomized breadth-first search (BFS) strategy with expert data to improve model convergence.
For each relation, the algorithm learns a supervised policy using a subset of all positive samples (entity pairs). For each positive sample (e_s, e_t), a bidirectional BFS is used during pre-training to find correct paths between the entities. For each path's relation sequence (r_1, r_2, ..., r_n), the parameters θ are updated to maximize the expected reward J(θ), as shown in equation (1).
For this supervised learning, the algorithm rewards each successful walk with +1, and, as shown in equation (2), the gradient of the policy network is updated using the correct paths found by BFS.
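The pre-training update is essentially a REINFORCE-style gradient step on BFS-found expert paths with a fixed +1 reward. The sketch below is a minimal illustration under assumptions: a linear per-state softmax policy and made-up state/action indices stand in for the patent's actual network and knowledge graph.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, lr = 4, 5, 0.1
theta = rng.normal(size=(n_states, n_actions))  # per-state policy logits (toy)

def policy(state):
    """Softmax policy over actions for one state."""
    logits = theta[state]
    e = np.exp(logits - logits.max())
    return e / e.sum()

# A BFS-found "expert" path as (state, correct action) pairs; each
# successful step earns reward +1, so the update is a gradient ascent
# step on log pi(a|s) scaled by that reward.
expert_path = [(0, 2), (1, 4), (3, 1)]

before = [policy(s)[a] for s, a in expert_path]
for s, a in expert_path:
    p = policy(s)
    grad = -p                       # d log softmax / d logits, off-target part
    grad[a] += 1.0                  # on-target part
    theta[s] = theta[s] + lr * 1.0 * grad  # reward = +1

after = [policy(s)[a] for s, a in expert_path]
```

After the updates, each demonstrated action becomes more probable under the policy, which is the intended effect of the supervised pre-training step.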
Step three: reward retraining layer
The RL agent and the external reinforcement learning environment are defined, and the environment is initialized according to the definition of the global reward function.
The reinforcement learning system consists of two parts. The first part is the external environment E, which specifies the dynamics of the interaction between the knowledge graph (KG) and the agent. The environment is modeled as a Markov Decision Process (MDP), defined as a tuple <S, A, P, R>, where S is the continuous state space, A = {a_1, a_2, ..., a_n} is the set of all available actions, P is the transition probability matrix, and R(s, a) is the reward function for each pair (s, a).
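The <S, A, P, R> tuple over a knowledge graph can be sketched with the KG reduced to an adjacency map. All names and the tiny two-edge graph below are illustrative assumptions, not the patent's environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class KGEnvironment:
    """Minimal sketch of the MDP environment over a knowledge graph."""
    adjacency: Dict[str, List[Tuple[str, str]]]  # entity -> [(relation, next entity)]
    reward: Callable[[str, str], float]          # R(s, a)

    def actions(self, state: str) -> List[str]:
        # The action space at a state is the set of relations leaving it.
        return [r for r, _ in self.adjacency.get(state, [])]

    def step(self, state: str, relation: str) -> str:
        # Deterministic transition: follow the relation if it exists,
        # otherwise stay put (a failed step).
        for r, nxt in self.adjacency.get(state, []):
            if r == relation:
                return nxt
        return state

env = KGEnvironment(
    adjacency={"Rome": [("capital_of", "Italy")],
               "Italy": [("in_continent", "Europe")]},
    reward=lambda s, a: 1.0,  # placeholder; the real R is defined by eqs. (3)-(5)
)
print(env.step("Rome", "capital_of"))  # Italy
```

In the patent's setting the transition matrix P and reward R come from the KG structure and the global/efficiency/diversity rewards; here both are stubbed for clarity.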
The second part of the system is the agent, represented as a policy network π_θ(s, a) = p(a | s; θ). It maps states to a stochastic policy, and the network parameters θ are updated by stochastic gradient descent.
The components of the system are respectively as follows:
action (Action): give the entity pair with relation r (e)s,et) Reinforcement learning agents are expected to find the most informative paths connecting these two entities. Starting from the head entity es, the agent uses the policy network to select the most likely relationship, expanding the path in each step until it reaches the target entity et. We defineThe output dimension of the strategy is equal to the related coefficient in a large-scale Knowledge Graph (KG), namely, the action space is defined as all relations in KG.
Status (States): the entities and relationships in the KG are discrete symbols, and to obtain semantic information for these symbols, the present invention uses TransD and TransH to map NELL-995 and FB15K-237, respectively, into a low-dimensional space. The state vector of the t step is
ωt=(et⊥,etarget⊥-et⊥). Where et is the embedding of the current entity, etargetIs the embedded vector of the target entity.
Reward (Reward): in the rewarding retraining process, the intelligent agent needs to obtain the rewarding feedback to judge the quality of the walk, so as to update the network parameters, and the global rewarding function defined by the invention is shown as the following formula (3):
if the agent can reach the target through a series of actions, a +1 global reward is obtained.
For the relational reasoning task, it is observed that short paths tend to provide more reliable inference evidence than long paths, and shorter relation chains also improve reasoning efficiency by limiting the length of the agent's interaction with the environment. The efficiency reward is defined as shown in equation (4), where a path p is defined as a sequence of relations r_1 → r_2 → ... → r_n.
Training samples (entity1, entity2) have similar state representations in the vector space, so the agent tends to find paths with similar syntax and semantics. Such paths usually contain redundant information; to encourage the agent to find diverse paths, a diversity reward function is defined using the cosine similarity between the current path and existing paths, as shown in equation (5).
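The three reward terms can be sketched as follows. Since equations (3)-(5) are not reproduced in this text, the exact forms below are assumptions: a −1 failure penalty for the global reward, 1/len(p) for the efficiency reward, and negative mean cosine similarity for the diversity reward, which are the common choices in this line of work.

```python
import numpy as np

def global_reward(reached_target: bool) -> float:
    # Sketch of eq. (3): +1 for reaching the target entity.
    # The failure penalty of -1 is an assumption.
    return 1.0 if reached_target else -1.0

def efficiency_reward(path_length: int) -> float:
    # Sketch of eq. (4): shorter paths score higher; 1/len(p) is assumed.
    return 1.0 / path_length

def diversity_reward(path_vec, existing_path_vecs) -> float:
    # Sketch of eq. (5): penalize similarity to already-found paths
    # via negative mean cosine similarity (assumed sign convention).
    cos = [np.dot(path_vec, q) / (np.linalg.norm(path_vec) * np.linalg.norm(q))
           for q in existing_path_vecs]
    return -float(np.mean(cos))

print(efficiency_reward(4))  # 0.25: a 4-hop path earns a smaller bonus
```

A total reward would combine these terms, typically as a weighted sum; the weights are not specified in this text.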
An LSTM memory component is then built and added to the state representation, and the dynamic-environment path planning algorithm is optimized with the LSTM, as shown in FIG. 2.
The policy network of the ordinary DeepPath reinforcement learning algorithm receives only the state representation at the current time, yet the search policy depends on historical information. So that the algorithm can fully exploit the correlation between history and state, the invention uses a three-layer LSTM network to encode the historical search information.
The cell state s_t and output h_t of the LSTM layer at time t are computed as follows.
First, the current input x_t, the previous output h_{t−1}, and the forget-gate bias b_f are combined to obtain the forget-gate activation f_t, which deletes invalid information from the state at time t−1; sigmoid is chosen as the activation function to normalize the output, as shown in equation (6):
f_t = sigmoid(W_{f,x} x_t + W_{f,h} h_{t−1} + b_f)    (6)
Next, the LSTM layer selects the more useful information to store in the cell state s_t. The candidate values s̃_t that may be added to the cell state and the input-gate activation i_t are computed as shown in equations (7) and (8):
s̃_t = tanh(W_{c,x} x_t + W_{c,h} h_{t−1} + b_c)    (7)
i_t = sigmoid(W_{i,x} x_t + W_{i,h} h_{t−1} + b_i)    (8)
The current cell state s_t is then updated from these results, as shown in equation (9), where ⊙ denotes the Hadamard (element-wise) product:
s_t = f_t ⊙ s_{t−1} + i_t ⊙ s̃_t    (9)
Finally, the output h_t of the LSTM network is expressed by equations (10) and (11):
o_t = sigmoid(W_{o,x} x_t + W_{o,h} h_{t−1} + b_o)    (10)
h_t = o_t ⊙ tanh(s_t)    (11)
After three layers of encoding, the history can be expressed as h_t = LSTM(h_{t−1}, ω_t), with h_{t−1} = 0 when t = 0. After encoding, the RL state at this time is denoted s_t = (h_t, ω_t); this state is input into the policy network and trained through a fully connected neural network consisting of two ReLU layers and one Softmax layer to obtain the action probability matrix.
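Equations (6) through (11) can be checked with a minimal NumPy implementation of a single LSTM cell step. This is a sketch under assumptions: the dimensions, random weights, and the 3-step toy history below are illustrative, not the patent's trained three-layer network.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, s_prev, W, b):
    """One LSTM cell update following equations (6)-(11).
    W maps gate name -> (W_x, W_h) weight pair; b maps gate name -> bias."""
    f_t = sigmoid(W["f"][0] @ x_t + W["f"][1] @ h_prev + b["f"])      # forget gate, eq. (6)
    s_cand = np.tanh(W["c"][0] @ x_t + W["c"][1] @ h_prev + b["c"])   # candidate state, eq. (7)
    i_t = sigmoid(W["i"][0] @ x_t + W["i"][1] @ h_prev + b["i"])      # input gate, eq. (8)
    s_t = f_t * s_prev + i_t * s_cand                                 # cell state, eq. (9)
    o_t = sigmoid(W["o"][0] @ x_t + W["o"][1] @ h_prev + b["o"])      # output gate, eq. (10)
    h_t = o_t * np.tanh(s_t)                                          # output, eq. (11)
    return h_t, s_t

rng = np.random.default_rng(0)
d = 4  # toy embedding dimension
W = {g: (rng.normal(size=(d, d)), rng.normal(size=(d, d))) for g in "fico"}
b = {g: np.zeros(d) for g in "fico"}

h, s = np.zeros(d), np.zeros(d)  # h_{t-1} = 0 when t = 0, as in the text
for x_t in rng.normal(size=(3, d)):  # encode a 3-step walk history
    h, s = lstm_step(x_t, h, s, W, b)
```

The final h plays the role of the encoded history in the joint state (h_t, ω_t) passed to the policy network; note that each component of h is bounded in (−1, 1) because it is an output gate times a tanh.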
When the agent selects an action, the invention sets an action sampler, as shown in fig. 3.
In the reinforcement learning algorithm, the agent continuously extends its path through interaction with the environment: after receiving the joint state formed by the current state and the historical information, the policy network outputs an action probability matrix, and the agent selects its next action from this matrix to extend the path.
To prevent the agent from repeatedly selecting invalid paths, an action sampler is added to the action-selection step: whenever the agent's random walk terminates at a dead end, the terminating node e_d and the selected action (relation) r_d are recorded in the action sampler's memory as an invalid entity-relation pair (e_d, r_d). In a subsequent walk, suppose the agent arrives at e_t; if e_t exists in the sampler's entity memory set, the sampler removes r_d from the action space before the next action is selected. The agent's next action can therefore never be an invalid action that has already occurred, which encourages the agent, with higher probability, to complete a full walk and search a more informative path set, while also saving computation.
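The mechanism above reduces to a memory of dead-end (entity, relation) pairs plus a mask over the action space. A minimal sketch, with illustrative entity and relation names:

```python
class ActionSampler:
    """Sketch of the action sampler: remember (e_d, r_d) pairs that ended a
    walk at a dead end, and mask them out of the action space later."""

    def __init__(self):
        self.invalid = set()  # memory of invalid (entity, relation) pairs

    def record_dead_end(self, entity: str, relation: str) -> None:
        # Called when a walk terminates without reaching the target.
        self.invalid.add((entity, relation))

    def filter_actions(self, entity: str, candidate_relations: list) -> list:
        # Remove relations already known to dead-end from this entity.
        return [r for r in candidate_relations
                if (entity, r) not in self.invalid]

sampler = ActionSampler()
sampler.record_dead_end("e_t", "worksFor")  # this edge previously dead-ended
print(sampler.filter_actions("e_t", ["worksFor", "bornIn"]))  # ['bornIn']
```

The agent then samples its next action from the filtered list, so known-invalid choices never recur, which is the source of the reported per-round time savings.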
Step four: output layer
To find inference paths governed by the reward function, the supervised policy network is retrained using the reward function. The training process is similar to pre-training except that the reward-function part is added; the parameter gradient is updated as shown in equation (12).
Starting from the source node, the agent selects a relation according to the stochastic policy π(a | s) to extend its inference path. A chosen relation link may lead the agent to a new entity or may produce no result (the action sampler reduces such occurrences); a failed step earns the agent a negative reward, while successfully reaching the target entity earns a positive reward of +1.
The purpose of Link Prediction is to predict the target entity. For each entity-relation pair (e, r) there is one true target entity and about 10 generated false ones. The results of PRA, DeepPath, TransE, TransR, RLKGR-ASM (this experiment), and RLKGR-ASM (no pre-training) on the 10 most representative relations of NELL-995 and FB15K-237 are listed in FIGS. 5 and 6. As the tables show, the reinforcement-learning-based inference methods outperform the embedding-based methods (TransE, TransR) and the path-based method (PRA) in most cases, and RLKGR-ASM (this experiment) achieves the best overall results on each link prediction task on NELL-995. The MAP of the invention is higher than that of the other algorithms, with particularly strong results on FB15K-237. On NELL-995, the MAP of this experiment is 7.8%, 2.7%, 13.9%, and 1.9% higher than TransE, TransR, PRA, and DeepPath, respectively; on FB15K-237, it is 8.1%, 7.4%, 7.2%, and 4.1% higher, respectively, demonstrating the effectiveness of this experiment.
In addition, even without supervised pre-training on expert data, the link prediction MAP of RLKGR-ASM (without pre-training) on both datasets exceeds the embedding-based methods TransE and TransR and the path-based method PRA, although it remains below DeepPath and this experiment. On NELL-995, the overall MAP of RLKGR-ASM (without pre-training) improves on those conventional algorithms by 5.4%, 0.3%, and 11.5%, respectively; on FB15K-237, by 2.7%, 2.0%, and 1.8%, respectively.
FIG. 7 gives a detailed comparison of this algorithm and DeepPath, listing Hits@1, Hits@3, MRR, and MAP for both algorithms on the link prediction task on the NELL-995 and FB15K-237 datasets. On NELL-995, this experiment improves on DeepPath in Hits@1, Hits@3, MRR, and MAP by 1.3%, 2.5%, 2.1%, and 1.9%, respectively; on FB15K-237, by 4.6%, 6.3%, 4.9%, and 4.1%, respectively.
Fact Prediction aims at predicting the truth of an unknown fact; the ratio of positive to negative triples in the dataset is about 1:10. Unlike link prediction, which ranks target entities, this task directly ranks all positive and negative samples of a particular relation. For the fact prediction task, this section compares TransE, TransR, TransD, TransH, and DeepPath with RLKGR-ASM (this experiment). MAP is used as the evaluation metric for the comparison, with Hits@N and MRR as auxiliary metrics for the fine-grained comparison with DeepPath.
FIG. 8 lists the MAP scores of the embedding-based Trans-series models, DeepPath, and this experiment on the fact prediction task. The experiment outperforms both the embedding-based methods and the conventional reinforcement-learning-based DeepPath: on NELL-995, the MAP of this experiment is 14.4%, 13.8%, 12.1%, 11.4%, and 3.4% higher than TransE, TransR, TransD, TransH, and DeepPath, respectively; on FB15K-237, it is 4.2%, 1.0%, 1.7%, 1.6%, and 0.8% higher, respectively. Performance on NELL-995 is excellent, while the improvement on FB15K-237 is slight.
In addition, taking the NELL-995 link prediction relations "athleteHomeStadium", "worksFor", and "organizationHiredPerson" as examples, FIG. 9 lists the number of inference paths used by PRA and by RLKGR-ASM (this experiment). The number of inference paths used in this experiment is far smaller than that used by PRA, showing that, compared with path-based inference, the reinforcement learning method can achieve a better MAP with a more compact set of learned paths.
As for time overhead (FIG. 10), the added LSTM memory component increases the computational burden: compared with DeepPath, the average per-round iteration time of this experiment is 13.19862 seconds on NELL-995, an increase of 31.88%, and 18.01331 seconds on FB15K-237, an increase of 16.31%.
Without the action sampler, the agent makes invalid action selections during its walks and wastes computation. On NELL-995, the per-round iteration time without the action sampler is 14.25433 seconds, so the action sampler reduces the experiment's time overhead by 7.42%; on FB15K-237, the per-round time without the sampler is 19.23654 seconds, and the action sampler reduces the overhead by 6.34%.
Although illustrative embodiments of the invention have been described above to help those skilled in the art understand it, the invention is not limited in scope to these specific embodiments. Obvious variations that utilize the concepts of the present invention are all intended to fall within its protection.
Claims (4)
1. A reinforcement learning knowledge graph reasoning method based on action sampling, comprising the following steps:
Step 1: at the data processing layer, select the optimal representation method for each dataset and represent the triples and inference relations in the data as feature vectors;
Step 2: at the pre-training layer, pre-train the model using a randomized breadth-first search (BFS) strategy and expert data to improve model convergence;
Step 3 (the core of the patent): add a reward function for retraining, and add the action sampler and LSTM memory component to the RL model; the invention uses a three-layer LSTM network to encode the historical search information, as shown in the following formula:
h_t = LSTM(h_{t−1}, ω_t), with h_{t−1} = 0 when t = 0
The three-layer LSTM receives the entity embedding vector at the current time; the three gating modules added in the LSTM recurrent cell give it a memory function while alleviating the gradient vanishing and gradient explosion problems that conventional recurrent neural networks may suffer. After encoding, the RL state at this time is denoted s_t = (h_t, ω_t); this state is input into the policy network and trained through a fully connected neural network consisting of two ReLU layers and one Softmax layer to obtain the action probability matrix, from which the agent selects its next action to continue extending the path;
The policy network's output action probability matrix is given by the following formula:
π_θ(a_t | s_t) = σ(A_t × W_2 ReLU(W_1 [h_t; s_t]))
To prevent the agent from repeatedly selecting invalid paths, an action sampler is added to the action-selection step: whenever the agent's random walk terminates at a dead end, the terminating node e_d and the selected action (relation) r_d are recorded in the action sampler's memory as an invalid entity-relation pair (e_d, r_d); in a subsequent walk, suppose the agent arrives at e_t: if e_t exists in the sampler's entity memory set, the sampler removes r_d from the action space before the next action is selected, so the agent's next action can never be an invalid action that has already occurred, encouraging the agent, with higher probability, to complete a full walk and search a more informative path set while also saving computation;
Step 4: the output layer uses the policy network for output.
2. The method as claimed in claim 1, wherein step 1 selects the better-performing representation learning method according to the representation ability of different representation learning methods on a specific dataset, improving the representation capability of the reinforcement learning environment from the bottom layer.
3. The method of claim 1, wherein step 3 adds an LSTM memory component to encode historical information and help the agent find inference paths more efficiently, so that the algorithm can dispense with pre-training while achieving accuracy better than path-based and embedding-based inference methods; with pre-training, the method further improves result precision: on NELL-995, the MAP of this experiment is 7.8%, 2.7%, 13.9%, and 1.9% higher than TransE, TransR, PRA, and DeepPath, respectively; on FB15K-237, it is 8.1%, 7.4%, 7.2%, and 4.1% higher, respectively; for the fact prediction task, on NELL-995 the MAP of this experiment is 14.4%, 13.8%, 12.1%, 11.4%, and 3.4% higher than TransE, TransR, TransD, TransH, and DeepPath, respectively, and on FB15K-237 it is 4.2%, 1.0%, 1.7%, 1.6%, and 0.8% higher, respectively.
4. The method according to claim 1, wherein step 3 provides an action sampler that reduces the agent's invalid redundant action selections during walks, promotes the selection of more meaningful paths, and effectively saves time: on NELL-995, the per-round iteration time without the action sampler is 14.25433 seconds, and the action sampler reduces the experiment's time overhead by 7.42%; on FB15K-237, the per-round time without the sampler is 19.23654 seconds, and the action sampler reduces the overhead by 6.34%.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210244316.XA CN114662693A (en) | 2022-03-14 | 2022-03-14 | Reinforced learning knowledge graph reasoning method based on action sampling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210244316.XA CN114662693A (en) | 2022-03-14 | 2022-03-14 | Reinforced learning knowledge graph reasoning method based on action sampling |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114662693A true CN114662693A (en) | 2022-06-24 |
Family
ID=82029373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210244316.XA Pending CN114662693A (en) | 2022-03-14 | 2022-03-14 | Reinforced learning knowledge graph reasoning method based on action sampling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114662693A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116081797A (en) * | 2022-08-25 | 2023-05-09 | 北控水务(中国)投资有限公司 | Dynamic optimization method, device and equipment for full-flow control quantity of sewage treatment plant |
2022-03-14: CN application CN202210244316.XA filed (published as CN114662693A); status: active, pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Neill | An overview of neural network compression | |
Tamaazousti et al. | Learning more universal representations for transfer-learning | |
US20210034968A1 (en) | Neural network learning apparatus for deep learning and method thereof | |
CN111581343A (en) | Reinforced learning knowledge graph reasoning method and device based on graph convolution neural network | |
CN117435715B (en) | Question answering method for improving time sequence knowledge graph based on auxiliary supervision signals | |
CN108876044B (en) | Online content popularity prediction method based on knowledge-enhanced neural network | |
Schilling et al. | Hyperparameter optimization with factorized multilayer perceptrons | |
Bellinger et al. | Active Measure Reinforcement Learning for Observation Cost Minimization. | |
CN115526317A (en) | Multi-agent knowledge inference method and system based on deep reinforcement learning | |
CN115964459A (en) | Multi-hop inference question-answering method and system based on food safety cognitive map | |
CN116561302A (en) | Fault diagnosis method, device and storage medium based on mixed knowledge graph reasoning | |
CN115526321A (en) | Knowledge reasoning method and system based on intelligent agent dynamic path completion strategy | |
Putra et al. | lpspikecon: Enabling low-precision spiking neural network processing for efficient unsupervised continual learning on autonomous agents | |
CN114662693A (en) | Reinforced learning knowledge graph reasoning method based on action sampling | |
CN118155860A (en) | Method, equipment and medium for aligning traditional Chinese medicine large model preference | |
Rowe | Algorithms for artificial intelligence | |
CN117150041A (en) | Small sample knowledge graph completion method based on reinforcement learning | |
CN116719947A (en) | Knowledge processing method and device for detecting power inspection defects | |
Gupta et al. | A Roadmap to Domain Knowledge Integration in Machine Learning | |
CN114626530A (en) | Reinforced learning knowledge graph reasoning method based on bilateral path quality assessment | |
Peng | A Brief Summary of Interactions Between Meta-Learning and Self-Supervised Learning | |
CN114722212A (en) | Automatic meta-path mining method oriented to character relation network | |
CN113051353A (en) | Attention mechanism-based knowledge graph path reachability prediction method | |
Wang et al. | Research on knowledge graph completion model combining temporal convolutional network and Monte Carlo tree search | |
CN114491080B (en) | Unknown entity relationship inference method oriented to character relationship network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20220624 ||