CN109241291B - Knowledge graph optimal path query system and method based on deep reinforcement learning - Google Patents


Info

Publication number
CN109241291B
CN109241291B (Application CN201810791353.6A)
Authority
CN
China
Prior art keywords
layer
entity
network
reinforcement learning
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810791353.6A
Other languages
Chinese (zh)
Other versions
CN109241291A (en)
Inventor
黄震华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN201810791353.6A priority Critical patent/CN109241291B/en
Publication of CN109241291A publication Critical patent/CN109241291A/en
Application granted granted Critical
Publication of CN109241291B publication Critical patent/CN109241291B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method for querying an optimal path of a knowledge graph based on deep reinforcement learning. The method comprises two modules: the first module is an offline training module for the knowledge graph optimal path model, and the second module is an online application module for that model. The offline training module is provided with a deep reinforcement learning component that subjects the current entity to deep reinforcement training and learning to obtain the next entity; repeating this training and learning yields an optimal path model. The starting entity and the target entity are then input into the optimal path model obtained by the first module to finally obtain the optimal path, which improves operating efficiency.

Description

Knowledge graph optimal path query system and method based on deep reinforcement learning
Technical Field
The invention relates to the field of computers, and in particular to a knowledge graph optimal path query system and method based on deep reinforcement learning.
Background
A Knowledge Graph aims to describe the various entities (Entity) that exist in the real world and the relationships (Relation) among them. It is usually organized and represented as a directed graph: nodes represent entities and edges are formed by relationships, where a relationship connects two entities and indicates whether the association it describes holds between them. If an edge exists between two entities they are associated; otherwise they are not. In practical applications, a value between 0 and 1 is attached to each entity relationship (i.e., each edge of the graph) to reflect the degree of association between entities; depending on the application, this value may represent confidence, closeness, distance, cost, and so on. Such a knowledge graph is called a probabilistic knowledge graph.
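For illustration only, a minimal Python sketch of such a probabilistic knowledge graph as a confidence-weighted directed adjacency structure (the entity and relation names below are hypothetical, not taken from the patent):

```python
from collections import defaultdict

class ProbabilisticKG:
    """Directed graph whose edges carry a relation label and a confidence in (0, 1)."""

    def __init__(self):
        # adjacency: head entity -> list of (relation, tail entity, confidence)
        self.adj = defaultdict(list)

    def add_edge(self, head, relation, tail, confidence):
        if not 0.0 < confidence < 1.0:
            raise ValueError("confidence must lie strictly between 0 and 1")
        self.adj[head].append((relation, tail, confidence))

    def neighbors(self, entity):
        return self.adj[entity]

# hypothetical example entities and relations
kg = ProbabilisticKG()
kg.add_edge("Alice", "works_at", "SCNU", 0.9)
kg.add_edge("SCNU", "located_in", "Guangzhou", 0.95)
```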
Optimal path query between entities of a probabilistic knowledge graph is extremely important for discovering the relationship between two entities, and it is one of the core technologies used in knowledge extraction, entity search, knowledge graph network optimization, entity relationship analysis, and similar applications. For such complex query and retrieval workloads, an effective data organization method and an efficient query processing method are needed to compute the results required by the user accurately and efficiently; improving query efficiency while reducing processing cost is therefore both necessary and challenging. Topologically, a probabilistic knowledge graph is a weighted directed graph.
At present, the mainstream optimal path query methods for graphs include the Dijkstra, Floyd, and Bellman-Ford algorithms. With the advent of the big data era, however, their query efficiency can no longer meet acceptable time limits or fit within the storage space a machine can provide, so they cannot solve the optimal path query problem at massive data scale.
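For reference, a classic Dijkstra-style query over the ProbabilisticKG sketch above might look as follows; taking 1 − confidence as the edge cost is an assumption made only for this illustration:

```python
import heapq

def shortest_path(kg, start, goal):
    """Dijkstra over ProbabilisticKG; edge cost assumed to be 1 - confidence."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue
        for relation, nxt, conf in kg.neighbors(node):
            nd = d + (1.0 - conf)            # assumed cost transform
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (node, relation)
                heapq.heappush(heap, (nd, nxt))
    # reconstruct the path of (entity, relation, entity) hops; empty if goal unreachable
    path, node = [], goal
    while node in prev:
        parent, relation = prev[node]
        path.append((parent, relation, node))
        node = parent
    return list(reversed(path))
```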
It has been found that for a large-scale data network such as a probabilistic knowledge graph, reducing query time usually means trading space for time by storing the results of frequently issued queries. The Landmarks-BFS method, for example, sorts the entities of the probabilistic knowledge graph by user query frequency, prunes the optimal paths among commonly used entities, and stores those paths in a set. Other approaches accelerate query preprocessing, such as parallel querying based on bidirectional search, target-guided querying, and hierarchy-based querying. These techniques satisfy efficiency requirements, but because pruning discards some intermediate nodes, query accuracy decreases; with incorrect pruning the shortest path may not be found at all, while with too little pruning the search easily degenerates into breadth-first search, which is slow and scales poorly. Querying the exact shortest path of a probabilistic knowledge graph is therefore difficult: a balance must be struck between time and space, and it is hard to guarantee query quality while keeping query time acceptable to users.
Disclosure of Invention
To overcome at least one deficiency of the prior art, the invention provides an optimal path query method between probabilistic knowledge graph entities that offers high accuracy, strong generalization capability, high speed, and easy extensibility.
In order to solve the technical problems, the technical scheme of the invention is as follows:
The system comprises two modules, namely a first module and a second module. The first module is an offline training module for the knowledge graph optimal path model; the second is an online application module for that model. The offline training module is equipped with a deep reinforcement learning component: the current entity undergoes deep reinforcement training and learning to obtain the next entity, and repeating this training and learning over successive entities yields an optimal path model. A starting entity and a target entity are then input into the optimal path model produced by the first module to finally obtain the optimal path. Through the cooperation of the two modules, the goals of high accuracy, strong generalization capability, high speed, and easy extensibility are achieved.
Further, the deep reinforcement learning component consists of an encoder, a network component, and a logistic regression component. The network component comprises a conversion component and a training component; the conversion component comprises a CNN neural network and an FC neural network, and the training component comprises a reinforcement learning Policy network and a reinforcement learning Value network.
Further, the reinforcement learning Policy network consists of five fully connected layers. The number of nodes decreases layer by layer over the first four layers, and the fifth layer has k neurons. Dropout is applied between the first and second layers and between the second and third layers to prevent overfitting, with tanh as the activation function. Batch normalization is applied between the third and fourth layers to enhance the generalization capability of the model, with the sigmoid function as the activation function. The fourth and fifth layers are fully connected, yielding the probabilities of the k candidate relations to be predicted, which serve as the action selection for the next entity;
The reinforcement learning Value network also consists of five fully connected layers, whose widths decrease from the first layer to the fourth layer, and the fifth layer has only one neuron. Dropout is applied between the first and second layers and between the second and third layers to prevent overfitting; the first and second layers use tanh activations and the third layer uses a sigmoid activation. Batch normalization is applied between the third and fourth layers to enhance the generalization capability of the model, with ReLU as the activation function. The fourth and fifth layers are fully connected, and the output is the cumulative return, predicted by the Value network, from the current state to the target state.
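A PyTorch sketch of one possible realization of these two five-layer fully connected networks is given below; the layer widths follow those listed in the detailed description (Policy: 256/64/32/16/10 on a length-512 input, Value: 256/128/64/32/1), while the dropout rate and the exact placement of activations are assumptions.

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Five fully connected layers: 512 -> 256 -> 64 -> 32 -> 16 -> k relation probabilities."""

    def __init__(self, in_dim=512, k=10, p_drop=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 16)
        self.fc5 = nn.Linear(16, k)
        self.drop = nn.Dropout(p_drop)          # dropout between the early layers
        self.bn = nn.BatchNorm1d(32)            # batch normalization between layers 3 and 4

    def forward(self, x):
        x = self.drop(torch.tanh(self.fc1(x)))
        x = self.drop(torch.tanh(self.fc2(x)))
        x = torch.tanh(self.fc3(x))
        x = torch.sigmoid(self.fc4(self.bn(x)))
        return torch.softmax(self.fc5(x), dim=-1)   # probabilities over k candidate relations

class ValueNetwork(nn.Module):
    """Five fully connected layers: 512 -> 256 -> 128 -> 64 -> 32 -> 1 predicted return."""

    def __init__(self, in_dim=512, p_drop=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 32)
        self.fc5 = nn.Linear(32, 1)
        self.drop = nn.Dropout(p_drop)
        self.bn = nn.BatchNorm1d(64)

    def forward(self, x):
        x = self.drop(torch.tanh(self.fc1(x)))
        x = self.drop(torch.tanh(self.fc2(x)))
        x = torch.sigmoid(self.fc3(x))
        x = torch.relu(self.fc4(self.bn(x)))
        return self.fc5(x)                       # cumulative return to the target state
```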
The invention provides a knowledge graph optimal path query method based on deep reinforcement learning, which specifically comprises the following steps:
S1, sort the entity relations in the probabilistic knowledge graph from largest to smallest by user access frequency per unit time, select n relations, and generate the required data sample set;
S2, input the data sample set into the deep reinforcement learning component for training and learning;
S3, carry out training and learning in three stages, namely stage 1, stage 2, and stage 3, within the deep reinforcement learning component;
stage 1: an encoder converts each entity into an initial word vector, and a CNN convolutional neural network of 1 to 10 layers further processes the encoded initial word vector into the word vector required by the deep reinforcement learning component;
stage 2: predict the next relationship the current entity passes through, based on the reinforcement learning Policy network;
stage 3: perform value calculation on the selected strategy, based on the reinforcement learning Value network;
S4, obtain the queried optimal path model after the training and learning of step S3;
S5, input a starting entity and a target entity, convert each into a word vector in turn, fuse the two word vectors, and input them into the optimal path model of step S4 until the target entity is found, finally obtaining an optimal query path whose starting point is the starting entity and whose end point is the target entity.
Further, in step S1, n relations are selected, where n is not less than 1/10 of the total number of entity relations in the probabilistic knowledge graph; γ = n/2 relations are randomly selected from the n relations, and these γ relations of the probabilistic knowledge graph, together with the two entities connected by each relation, constitute the data sample set required for model training.
Further, the entities e_1 and e_2 input in stage 1 of step S3 are converted by the encoder and the network component into two word vectors G_θ(e_1) and G_θ(e_2), where θ is the set of network parameters to be optimized. A similarity calculation is performed on the two word vectors obtained in stage 1 to find their cosine distance, as shown in the following formula:
D_θ(e_1, e_2) = ||G_θ(e_1) − G_θ(e_2)||_cos
During training, each received data sample may be denoted as {(F, e_1, e_2)}, where F is the label of the data sample, from which the training loss function L(θ) is constructed as shown in the following formula:
[The loss function appears as an image in the original document (Figure BDA0001734989890000051).]
where n is the total number of training samples.
Further, the loss function L(θ) needs to be minimized, and it can be refined as:
[The refined form of L(θ) appears as an image in the original document (Figure BDA0001734989890000052); it is expressed in terms of L_s and L_u.]
Here L_s denotes the loss between identical entities and L_u the loss between different entities; L_u needs to be made as small as possible so that L_s is as large as possible.
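As an illustration of the similarity step only, the cosine distance D_θ and a generic label-weighted combination of a same-entity term and a different-entity term might be computed as below; the margin and the exact combination are assumptions, since the patent's refined loss is shown only as an image:

```python
import torch
import torch.nn.functional as F

def cosine_distance(g1: torch.Tensor, g2: torch.Tensor) -> torch.Tensor:
    """D_theta(e1, e2): cosine distance between the two encoded word vectors."""
    return 1.0 - F.cosine_similarity(g1, g2, dim=-1)

def pairwise_loss(g1, g2, labels, margin=1.0):
    """Assumed contrastive-style combination: labels == 1 for identical entities,
    labels == 0 for different entities (the patent's exact refined loss is an image)."""
    d = cosine_distance(g1, g2)
    same_term = labels * d.pow(2)                            # pull identical entities together
    diff_term = (1 - labels) * F.relu(margin - d).pow(2)     # push different entities apart
    return (same_term + diff_term).mean()
```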
Further, stages 2 and 3 of step S3 are carried out in the training component of the deep reinforcement learning component, which comprises a policy network and a value network. Stage 2 performs policy training and stage 3 performs value training, optimizing the parameter sets of the two networks, namely the Policy network parameters θ_p and the Value network parameters θ_v. Both trainings use the quadruple <state, reward, action, model>, where states are represented by entities in the probabilistic knowledge graph.
Further, a policy function and a value function based on target-driven deep reinforcement learning are obtained in the policy network and the value network: the policy function is fitted by a neural network acting as a nonlinear function estimator, giving the policy function f(e_t, g|θ_p); the value function, which estimates the return from the current node to the target node, is likewise fitted by a neural network acting as a nonlinear function estimator, giving the value function h(e_t, g|θ_v).
Further, the return obtained from the value function is multiplied by the policy estimate given by the policy function to represent the loss function of the policy network, as shown in the following formula:
L_f = log f(e_t, g|θ_p) × (r_t + γ h(e_{t+1}, g|θ_v) − h(e_t, g|θ_v)),
where γ ∈ (0,1) is a discount factor. L_f is differentiated with respect to the parameters θ_p, and the Policy network parameters θ_p are updated by gradient ascent according to the following formula:
[The update formula appears as an image in the original document (Figure BDA0001734989890000061); in it, ∇ denotes the derivation operation, H(f(e_t, g|θ_p)) denotes the entropy term of the policy function f(e_t, g|θ_p), and the remaining coefficient is the learning rate.]
If the product of the current policy and the return brought by selecting that policy is positive, the Policy network parameters θ_p are updated in the positive direction so that the likelihood of predicting that state next time increases; if the product is negative, θ_p is updated in the reverse direction so that the probability of predicting that state next time becomes as small as possible, until the policy predicted by the current network no longer fluctuates.
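A sketch of how this gradient-ascent actor update could be written in PyTorch; since the exact update formula in the patent is an image, the entropy coefficient β and the use of a generic optimizer are assumptions:

```python
import torch

def policy_update(policy_net, value_net, optimizer, states, next_states,
                  actions, rewards, gamma=0.9, beta=0.01):
    """One gradient-ascent step on L_f = log f(e_t,g|theta_p) * advantage plus an
    assumed entropy bonus. states/next_states: (batch, 512); actions: (batch,) long;
    rewards: (batch,) float. The optimizer minimizes, so the objective is negated."""
    probs = policy_net(states)                                        # f(e_t, g | theta_p)
    log_prob = torch.log(probs.gather(1, actions.unsqueeze(1)).squeeze(1) + 1e-8)
    with torch.no_grad():                                             # critic estimates the advantage
        advantage = rewards + gamma * value_net(next_states).squeeze(1) \
                    - value_net(states).squeeze(1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1)
    loss = -(log_prob * advantage + beta * entropy).mean()            # negate for gradient ascent
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```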
Further, the absolute value of the difference between the obtained value function h(e_t, g|θ_v) and the actual return r_t + γ h(e_{t+1}, g|θ_v) of the current entity is calculated to obtain the loss function of the value network, as shown in the following formula:
L_h = |(r_t + γ × h(e_{t+1}, g|θ_v)) − h(e_t, g|θ_v)|,
where γ ∈ (0,1) is a discount factor. L_h is differentiated with respect to the parameters θ_v, and the Value network parameters θ_v are updated by gradient descent according to the following formula:
[The update formula appears as an image in the original document (Figure BDA0001734989890000071); ∇ denotes the derivation operation.]
If the error between the predicted return h(e_t, g|θ_v) and the computed return r_t + γ h(e_{t+1}, g|θ_v) is larger than a user-given threshold l, the Value network parameters θ_v are updated to make the prediction error as small as possible, until the error between the predicted return h(e_t, g|θ_v) and the computed return r_t + γ h(e_{t+1}, g|θ_v) no longer fluctuates outside the user-given range [−l, l].
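A corresponding critic-update sketch, minimizing the absolute-error loss L_h above by gradient descent (the choice of optimizer and batching are assumptions):

```python
import torch

def value_update(value_net, optimizer, states, next_states, rewards, gamma=0.9):
    """One gradient-descent step on L_h = |r_t + gamma*h(e_{t+1},g) - h(e_t,g)|."""
    with torch.no_grad():                                 # bootstrap target is not differentiated
        target = rewards + gamma * value_net(next_states).squeeze(1)
    predicted = value_net(states).squeeze(1)
    loss = (target - predicted).abs().mean()              # L_h, averaged over the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```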
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
(1) The invention provides a probabilistic knowledge graph and assigns each entity relationship a probability between 0 and 1, so that optimal path queries on the knowledge graph better match practical application requirements.
(2) Because the invention trains in a reinforcement learning manner, it reduces, on the one hand, the poor final results caused by unreasonable label design in existing deep learning methods; on the other hand, it shrinks the search space by saving the shortest path from the current entity to a given entity in each iteration, so the model is more adaptable and more accurate.
(3) The method is based on deep learning: the initial word vector and the target word vector are fused by two pre-trained convolutional neural networks with identical structures and shared weights, which avoids restarting training whenever the target entity changes, improves the generalization capability of the model, and improves calculation accuracy.
(4) Each module of the invention has a clear logical structure, a flexible calculation mode, and good loose coupling. The network structure can be set flexibly to meet calculation requirements, the method is not tied to specific development tools or programming software, and it can be quickly extended to distributed and parallel development environments; reinforcement learning and deep learning in particular can be computed in a distributed manner, improving operating efficiency.
Drawings
Fig. 1 is a technical framework diagram of a knowledge graph optimal path query method based on deep reinforcement learning.
FIG. 2 is a logical block diagram of a deep reinforcement learning component.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The invention provides a knowledge graph optimal path query system based on deep reinforcement learning, which comprises two modules, namely a first module and a second module. The first module is an offline training module for the knowledge graph optimal path model; the second is an online application module for that model. The offline training module is equipped with a deep reinforcement learning component: the current entity undergoes deep reinforcement training, and through loading and transformation of the data in the first module, the next entity on the way from the current entity to the target entity is learned; repeating this training and learning yields a trained optimal path model. In the second module, the target entity and the starting entity are converted and input into the optimal path model generated by the first module, and an optimal query path is finally obtained. Through the cooperation of the two modules, the goals of high accuracy, strong generalization capability, high speed, and easy extensibility are achieved.
The first module first constructs a data sample set for offline training of the optimal path model, as follows: the entity relations in the probabilistic knowledge graph are sorted from largest to smallest by user access frequency over the most recent m unit-time intervals, and the first n relations are selected, where n is not less than 1/8 of the total number of entity relations in the probabilistic knowledge graph; γ = n/2 relations are then randomly selected from these n relations, so that the γ relations of the probabilistic knowledge graph, together with the two entities connected by each relation, constitute the data sample set required for model training.
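A Python sketch of this sample-set construction, building on the ProbabilisticKG sketch above; the access-frequency bookkeeping (access_counts) is an assumed structure:

```python
import random

def build_sample_set(kg, access_counts, fraction=1/8):
    """Take the top-n most frequently accessed relation instances (n >= 1/8 of all
    edges), then randomly keep gamma = n // 2 of them together with the two entities
    each relation connects. access_counts maps (head, relation, tail) to the access
    frequency over the last m unit-time intervals (an assumed bookkeeping structure)."""
    edges = [(h, r, t, c) for h, triples in kg.adj.items() for (r, t, c) in triples]
    n = max(1, int(len(edges) * fraction))
    ranked = sorted(edges, key=lambda e: access_counts.get((e[0], e[1], e[2]), 0),
                    reverse=True)[:n]
    gamma = n // 2
    sampled = random.sample(ranked, min(gamma, len(ranked)))
    # each sample keeps the relation, its confidence, and the two connected entities
    return [{"head": h, "relation": r, "tail": t, "confidence": c}
            for (h, r, t, c) in sampled]
```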
On this basis, the first module inputs each constructed data sample into the deep reinforcement learning component shown in Fig. 2 for training and learning, and searches for the relationship with the highest next-step probability associated with the current entity; once obtained, the return value of the next entity reached through the selected relationship is fused in to update the parameters of the deep reinforcement learning component. The first module iterates this process, continuously updating the parameters of the deep reinforcement learning component, until the current entity is the target entity or the number of iterations exceeds the maximum iteration threshold given by the user; at that point a candidate path from the starting entity to the target entity is obtained. The first module then calculates the total return of the current candidate path and compares it with the total return of the complete path queried previously; if the return of the current path is higher, the current path is taken as the optimal path queried so far to obtain the optimal path model. This process is repeated until the parameters of the deep reinforcement learning component converge.
As shown in Fig. 2, the deep reinforcement learning component of module one consists of a word2vec encoder, a CNN (Convolutional Neural Network), an FC (Fully Connected) neural network, a reinforcement learning Policy network, a reinforcement learning Value network, and a logistic regression component. The training process of the deep reinforcement learning component is divided into three stages. In stage 1, the word2vec encoder converts an entity into an initial word vector, and a multi-layer CNN convolutional neural network further processes the encoded initial word vector into the word vector required by the deep reinforcement learning component; stage 2 predicts the next relationship the current entity passes through, based on the reinforcement learning Policy network; stage 3 performs value calculation for the selected strategy, based on the reinforcement learning Value network.
In stage 1, the invention first inputs c entities and converts them through a word2vec word-embedding encoder into c corresponding word vectors of identical dimension. Two of these c entity word vectors are then selected at random and fed into a multi-layer CNN convolutional neural network with eight layers in total: the first layer convolves each of the 2 input entity word vectors; the second layer applies max pooling to the first layer's convolutions; the third and fourth layers continue convolving the data produced by the second (pooling) layer; after the max pooling of the fifth layer, the sixth and seventh layers are applied in turn for further convolution; and finally the eighth layer, an average pooling layer, produces the two final word vectors. In particular, after the second and fifth layers complete their max pooling, the outputs are batch-normalized. The word vectors obtained at the eighth layer are the output of stage 1. The training task of the multi-layer CNN convolutional neural network is to compute the distance between the two word vectors obtained at the eighth layer, so that the distance for a positive sample is as small as possible and the distance for a negative sample is as large as possible. In addition, the two multi-layer convolutional neural networks have identical structures and share their network weights.
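A simplified sketch of such a shared-weight (siamese) 1-D CNN encoder in PyTorch; the channel counts, kernel sizes, and paddings here are assumptions rather than the exact dimensions listed later in the detailed description:

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Eight-stage 1-D CNN: conv, max-pool(+BN), conv, conv, max-pool(+BN), conv, conv, avg-pool."""

    def __init__(self, out_dim=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 4, kernel_size=2, stride=2),                   # layer 1: convolution
            nn.MaxPool1d(2), nn.BatchNorm1d(4),                         # layer 2: max pooling + batch norm
            nn.Conv1d(4, 8, kernel_size=4, stride=2, padding=1),        # layer 3: convolution
            nn.Conv1d(8, 8, kernel_size=4, stride=1, padding=2),        # layer 4: convolution
            nn.MaxPool1d(2), nn.BatchNorm1d(8),                         # layer 5: max pooling + batch norm
            nn.Conv1d(8, 16, kernel_size=4, stride=2, padding=1),       # layer 6: convolution
            nn.Conv1d(16, 16, kernel_size=4, stride=1, padding=2),      # layer 7: convolution
            nn.AdaptiveAvgPool1d(8),                                    # layer 8: average pooling
        )
        self.fc = nn.Linear(16 * 8, out_dim)   # project back to a length-512 vector

    def forward(self, x):                      # x: (batch, 512) word2vec vectors
        x = self.features(x.unsqueeze(1))      # add a channel dimension
        return self.fc(x.flatten(1))

# the same encoder instance is applied to both entities, so the weights are shared
encoder = CNNEncoder()
```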
The reinforcement learning Policy network is trained mainly in stage 2. The invention first takes the word vector of the current entity and the word vector of the target entity as input, and the output vector produced by a fully connected layer serves as the input word vector of the Policy network. The Policy network consists of five fully connected layers; the number of nodes decreases layer by layer over the first four layers, and the fifth layer has k neurons. Dropout is applied between the first and second layers and between the second and third layers to prevent overfitting, with tanh as the activation function. Batch normalization is applied between the third and fourth layers to enhance the generalization capability of the model, with the sigmoid function as the activation function. The fourth and fifth layers are fully connected, yielding the probabilities of the k candidate relations to be predicted, which serve as the action selection for the next entity. The output of the Policy network is the relation with the highest probability, which is treated as the action (Action) obtained by the Policy network. The k relations are chosen as follows: first the k_1 relations with the highest confidence are selected, then k − k_1 relations are randomly selected from the remaining ones, and the k relations are sorted by confidence from high to low, giving the k highest-confidence relations output by the Policy network. The training task of the Policy network is to select the best possible strategy, maximizing the return generated by the next entity reached through the selected relation.
Stage 3 mainly trains the reinforcement learning Value network. The input of the Value network is the same as that of the Policy network: the word vector of the current entity and the word vector of the target entity are taken as input, and the output vector is obtained through a fully connected layer. The Value network consists of five fully connected layers whose widths decrease from the first to the fourth layer, and the fifth layer has only one neuron. Dropout is applied between the first and second layers and between the second and third layers to prevent overfitting; the first and second layers use tanh activations and the third layer uses a sigmoid activation. Batch normalization is applied between the third and fourth layers to enhance the generalization capability of the model, with ReLU as the activation function. The fourth and fifth layers are fully connected, and the output is the cumulative return, predicted by the Value network, from the current state to the target state. The training task of the Value network is to minimize the error between the predicted return of the current state and the sum of the confidence of the relation given by the Policy network and the predicted return of the next state.
The second module takes a starting entity and a target entity in the probabilistic knowledge graph as input, converts each of them into a one-dimensional word vector through the word2vec word-embedding encoder followed by the 8-layer CNN convolutional neural network, and then fuses the two one-dimensional word vectors as the input of the reinforcement learning Policy network and Value network. The Policy network and the Value network alternate: starting from the starting entity, each step moves from the current entity to the next entity that is optimal with respect to the target entity, until the target entity is found. Finally, an optimal query path whose starting point is the starting entity and whose end point is the target entity is obtained.
The invention also provides a knowledge graph optimal path query method based on deep reinforcement learning, which specifically comprises the following steps:
S1, first sort the entity relations in the probabilistic knowledge graph from largest to smallest by user access frequency over the most recent m unit-time intervals, and select the first n relations, where n is not less than 1/8 of the total number of entity relations in the probabilistic knowledge graph; then randomly select γ = n/2 relations from the n relations, so that the γ relations of the probabilistic knowledge graph, together with the two entities connected by each relation, constitute the data sample set required for model training.
S2, convert the input current entity and target entity into two one-dimensional word vectors of length 512 using Google's word2vec word-embedding encoder.
S3, then carry out the training and learning of the three stages, stage 1, stage 2, and stage 3, in the deep reinforcement learning component.
Stage 1: two CNN convolutional neural networks with identical structures and shared weights are constructed, as follows:
The first layer of the CNN convolutional neural network contains 512 neurons and uses 2 convolution kernels of size 2×1 with a fixed stride of 2; this layer convolves the one-dimensional word vectors (of length 512) produced by the word2vec word-embedding encoder, yielding 2 one-dimensional vectors of length 256. The second layer applies max pooling to the 2 one-dimensional word vectors output by the first layer, using 2 convolution kernels of size 2×1 with stride 1, again yielding 2 one-dimensional vectors of length 256; a batch normalization operation is then applied to these 2 vectors. The third layer convolves the 2 batch-normalized one-dimensional vectors output by the second layer with 4×1 convolution kernels at a fixed stride of 4, obtaining 8 one-dimensional vectors of length 64. The fourth layer uses 1 convolution kernel of size 4×1 with stride 1 to convolve the 8 one-dimensional vectors output by the third layer again, obtaining 8 one-dimensional vectors of length 64. The fifth layer applies max pooling to the 8 one-dimensional vectors of the fourth layer, with kernel size 2×1, 4 kernels, and stride 2, obtaining 32 one-dimensional vectors of length 32; a batch normalization operation is then applied to these 32 vectors. The sixth layer convolves the 32 batch-normalized one-dimensional vectors output by the fifth layer with 2 convolution kernels of size 4×1 at a fixed stride of 2, obtaining 64 one-dimensional vectors of length 16. The seventh layer convolves the 64 one-dimensional vectors output by the sixth layer with 4×1 convolution kernels at stride 4, obtaining 40 one-dimensional vectors of length 512. Finally, the eighth layer applies average pooling, obtaining 256 one-dimensional vectors of length 4, which are then fully connected to 512 neurons, yielding a one-dimensional vector of length 512.
After the two CNN convolutional neural networks with identical structures and shared weights are constructed, the invention trains and optimizes them using the entities and relations in the probabilistic knowledge graph, as follows:
the inputs of the two CNN convolutional neural networks are respectively two entities e1And e2And the output is two one-dimensional vectors G of length 512θ(e1) And Gθ(e2) And theta is a network parameter set to be optimized. Then, similarity calculation is performed on the two one-dimensional vectors, namely, the cosine distance of the two one-dimensional vectors is calculated: dθ(e1,e2)=||Gθ(e1)-Gθ(e2)||cosIf e is1And e2The two entities differ significantly, then Dθ(e1,e2) Is larger, and if e1And are the same or similar, then Dθ(e1,e2) Is smaller.
Thus, during training, the data samples received by the two CNN convolutional neural networks can be expressed as {(F, e_1, e_2)}, where F is the label of each data sample: F is 1 if e_1 and e_2 represent the same entity, and 0 otherwise. The training loss function is then derived as:
[The loss function appears as an image in the original document (Figure BDA0001734989890000151).]
where n is the total number of training samples.
On this basis, let L_s denote the loss between identical entities and L_u the loss between different entities. To minimize the loss function L(θ), L_u needs to be as small as possible so that L_s is as large as possible. The training loss function L(θ) can thus be refined as:
[The refined loss function appears as an image in the original document (Figure BDA0001734989890000152).]
During training, minimizing the loss function L(θ) ultimately makes the distance between identical entities as small as possible and the distance between different entities as large as possible, which increases the discriminability of the samples. In addition, 1,000,000 sample entities are selected during training; 250,000 pairs of identical entities are randomly selected from them as positive samples and 250,000 pairs of different entities as negative samples, and the mixed samples are fed into the network for training.
After the two CNN convolutional neural networks have been computed, one-dimensional vectors of length 512 corresponding to the current entity and the target entity are obtained. The two vectors are then fused by a further full connection: the two length-512 vectors are concatenated into a one-dimensional vector of length 1024, which is fed into a fully connected layer with 512 neurons, finally producing a one-dimensional vector of length 512 that represents the fused current entity and target entity;
Stages 2 and 3 are mainly used to train the Policy network and the Value network in the deep reinforcement learning component and to optimize their parameter sets, namely the Policy network parameters θ_p and the Value network parameters θ_v. The two stages are trained iteratively, searching for the next optimal strategy and dynamically updating θ_p and θ_v until the globally optimal strategy is obtained. Each iteration finds the target entity within a limited number of steps and updates θ_p and θ_v. In particular, module one sets a maximum number of iterations c_max; if the current iteration count exceeds it, iteration stops.
To this end, the invention first defines, on the basis of the probabilistic knowledge graph, the quadruple <state, reward, action, model> needed in the training of the two networks. States are represented by entities in the probabilistic knowledge graph, e.g., the current entity e_t, the target entity g, and the starting entity s. The reward for moving from the current entity e_t to the next entity e_{t+1} is denoted r_t, which equals the confidence of the relation between e_t and e_{t+1}. The action m is the action selected as the agent's behavior and corresponds to the relation between the current entity and the next entity in the probabilistic knowledge graph. Finally, the model is the target-driven deep reinforcement learning policy function or value function in the Policy network or the Value network: the policy function is fitted by a neural network acting as a nonlinear function estimator, i.e., the policy function is f(e_t, g|θ_p); the value function, which estimates the return from the current node to the target node, is likewise fitted by a neural network acting as a nonlinear function estimator, i.e., the value function is h(e_t, g|θ_v).
Stage 2: first, the parameter set θ_p of the Policy network is initialized randomly. The Policy network then receives as input the one-dimensional vectors corresponding to the current entity and the target entity. The first layer of the Policy network has 256 neurons, fully connected to the one-dimensional vector (of length 512) corresponding to the current and target entities; the second layer has 64 neurons; the third layer has 32 neurons; the fourth layer has 16 neurons; and the fifth layer has 10 neurons, representing the values of 10 output entities and the probabilities of selecting them. These 10 entities consist of the 7 next-layer entities with the highest confidence from the current entity plus 3 entities selected at random from the remaining ones; if the number of next-layer entities is less than 10, the surplus entity slots are padded with 0. The first, second, and third layers use the tanh activation function, while the fourth and fifth layers use the sigmoid activation function. Meanwhile, dropout and batch normalization are applied between layers to improve prediction accuracy. Finally, the 10 neurons of the fifth layer output the probabilities of the 10 relations selected by the Policy network, and the relation with the highest probability, obtained through a softmax function, is taken as the chosen action.
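A sketch of this candidate construction and action choice; the helper select_relation below is hypothetical (it is reused by the training-loop and query sketches given after steps S4 and S5), and the masking of padded slots is an assumption:

```python
import random
import torch

def select_relation(policy_net, kg, current, state, k=10, k_top=7):
    """Build k = 10 candidate edges (the 7 highest-confidence ones plus 3 drawn at
    random from the rest), score them with the Policy network, and return the slot
    index and edge of the highest-probability candidate."""
    edges = sorted(kg.neighbors(current), key=lambda e: e[2], reverse=True)
    rest = edges[k_top:]
    candidates = edges[:k_top] + random.sample(rest, min(k - k_top, len(rest)))
    candidates.sort(key=lambda e: e[2], reverse=True)       # order slots by confidence
    with torch.no_grad():
        probs = policy_net(state)                           # (1, k) probabilities over the slots
    mask = torch.zeros_like(probs)
    mask[0, :len(candidates)] = 1.0                         # ignore padded (empty) slots
    best = int(torch.argmax(probs * mask, dim=1))
    relation, nxt, conf = candidates[best]
    return best, relation, nxt, conf
```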
During stage 2 training, the loss function of the Policy network, expressed as the return obtained from the value function multiplied by the policy estimate given by the current policy function, is as follows:
L_f = log f(e_t, g|θ_p) × (r_t + γ h(e_{t+1}, g|θ_v) − h(e_t, g|θ_v)),
where γ ∈ (0,1) is the discount factor. L_f is then differentiated with respect to the parameters θ_p, and θ_p is updated by gradient ascent, obtaining:
[The update formula appears as an image in the original document (Figure BDA0001734989890000171); in it, ∇ denotes the derivation operation, H(f(e_t, g|θ_p)) denotes the entropy term of the policy function f(e_t, g|θ_p), and the remaining coefficient is the learning rate. The entropy term is added to prevent the Policy network from settling on a suboptimal policy too early and falling into a local optimum.]
If the product of the current policy and the return brought by selecting that policy is positive, the value of θ_p is updated in the positive direction so that the likelihood of predicting that state next time increases; if the product is negative, θ_p is updated in the reverse direction so that the probability of predicting that state next time becomes as small as possible, until the policy predicted by the current network no longer fluctuates;
and (3) stage: first, a parameter set theta of the Value networkvRandom initialization is performed. Then, as with Policy networks, the Value network receives as input one-dimensional vectors corresponding to the current entity and the target entity. The first layer of the Value network is provided with 256 neurons which are in full connection with one-dimensional vectors (with the length of 512) corresponding to the current entity and the target entity; the second layer has 128 neurons; the third layer has 64 neurons; the fourth layer has 32 neurons; the fifth layer has a neuron that represents the value of the current state. Dropout technology is adopted between the first layer and the second layer and between the second layer and the third layerAnd (4) stopping overfitting. The first layer and the second layer both adopt tanh activation functions, and the third layer and the fourth layer both adopt sigmod activation functions. And a batch standardization process is carried out between the third layer and the fourth layer to enhance the generalization capability of the model. And a fully-connected neural network is adopted between the fourth layer and the fifth layer to finally obtain the predicted value.
During stage 3 training, the absolute value of the difference between the actual return of the current entity, r_t + γ h(e_{t+1}, g|θ_v), and the predicted return h(e_t, g|θ_v) is computed and used as the loss function of the Value network, as shown below:
L_h = |(r_t + γ × h(e_{t+1}, g|θ_v)) − h(e_t, g|θ_v)|,
where γ ∈ (0,1) is the discount factor. L_h is then differentiated with respect to the parameters θ_v, and θ_v is updated by gradient descent, obtaining:
[The update formula appears as an image in the original document (Figure BDA0001734989890000181); ∇ denotes the derivation operation.]
If the error between the predicted return h(e_t, g|θ_v) and the computed return r_t + γ h(e_{t+1}, g|θ_v) is greater than a user-specified threshold l, θ_v is updated to make the prediction error as small as possible, until the error between the predicted return h(e_t, g|θ_v) and the computed return r_t + γ h(e_{t+1}, g|θ_v) no longer fluctuates outside the user-given range [−l, l];
and S4, continuously updating the parameters of the deep reinforcement learning component in the iteration process until the current entity is the target entity or the iteration times exceed the maximum iteration threshold value given by the user, and obtaining a candidate path from the initial entity to the target entity. And then, calculating the total return of the current candidate path and comparing the total return with the total return of the complete path of the previous query, if the benefit of the current path is higher than that of the previous query path, taking the current path as the optimal path model of the query, and repeatedly executing the processes until the parameters of the deep reinforcement learning component are converged.
S5, two entities of the probabilistic knowledge graph, a starting entity s and a target entity g, are input and converted into one-dimensional vectors of length 512 by the trained word2vec word-embedding encoder. The two vectors are combined into a one-dimensional vector of length 1024 and used as input to the trained multi-layer CNN convolutional neural network, yielding one-dimensional vectors of length 512 corresponding to the starting entity and the target entity respectively. On this basis, the two one-dimensional vectors are passed through a fully connected layer to generate a new vector of length 1024, which serves as the input of the trained reinforcement learning Policy network and Value network. The Policy network and the Value network alternate: starting from the starting entity, each step moves from the current entity to the next entity that is optimal with respect to the target entity, until the target entity is found. An optimal query path Path(s, g) with starting point s and end point g is thus finally obtained.
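Module two's online query could then be sketched as follows, reusing the hypothetical fuse and select_relation helpers; only the Policy network is used to pick each hop here, which simplifies the alternation of the two networks described above:

```python
def query_optimal_path(policy_net, encoder, kg, fuse, start, goal, c_max=50):
    """Online application (module two): starting from `start`, repeatedly pick the
    next relation with the trained Policy network until `goal` is reached."""
    policy_net.eval()
    path, current = [], start
    for _ in range(c_max):
        if current == goal or not kg.neighbors(current):
            break
        state = fuse(encoder, current, goal)
        _, relation, nxt, conf = select_relation(policy_net, kg, current, state)
        path.append((current, relation, nxt, conf))
        current = nxt
    return path if current == goal else None     # None if the target was not reached
```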
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such changes should be covered by the claims of the present invention.

Claims (9)

1. A knowledge graph optimal path query system based on deep reinforcement learning, characterized by comprising two modules, namely a first module and a second module, wherein the first module is an offline training module of the knowledge graph optimal path model and the second module is an online application module of the knowledge graph optimal path model; the offline training module of the knowledge graph optimal path model is provided with a deep reinforcement learning component; the current entity is subjected to deep reinforcement training and learning to obtain a next entity, and repeated training and learning over successive current entities yields an optimal path model; a starting entity and a target entity are input into the optimal path model obtained by the first module to finally obtain an optimal path;
the deep reinforcement learning component consists of an encoder, a network component, and a logistic regression component, wherein the network component comprises a conversion component and a training component, the conversion component comprises a CNN neural network and an FC neural network, and the training component comprises a reinforcement learning Policy network and a reinforcement learning Value network;
the reinforcement learning Policy network consists of five fully connected layers, the number of nodes decreasing layer by layer over the first four layers and the fifth layer having k neurons; dropout is applied between the first and second layers and between the second and third layers of the Policy network to prevent overfitting, with tanh as the activation function; batch normalization is applied between the third and fourth layers to enhance the generalization capability of the model, with the sigmoid function as the activation function; and the fourth and fifth layers are fully connected, yielding the probabilities of the k candidate relations to be predicted, which serve as the action selection for the next entity.
2. The system according to claim 1, characterized in that the reinforcement learning Value network consists of five fully connected layers whose widths decrease from the first layer to the fourth layer, the fifth layer having only one neuron; dropout is applied between the first and second layers and between the second and third layers of the Value network to prevent overfitting; the activation functions of the first and second layers are tanh and that of the third layer is sigmoid; batch normalization is applied between the third and fourth layers to enhance the generalization capability of the model, with ReLU as the activation function; the fourth and fifth layers are fully connected, and the output is the cumulative return, predicted by the Value network, from the current state to the target state.
3. A knowledge graph optimal path query method based on deep reinforcement learning is characterized by comprising the following steps:
S1, sort the entity relations in the probabilistic knowledge graph from largest to smallest by user access frequency per unit time, select n relations, and generate the required data sample set;
S2, input the data sample set into the deep reinforcement learning component for training and learning;
S3, carry out training and learning in three stages, namely stage 1, stage 2, and stage 3, within the deep reinforcement learning component;
stage 1: an encoder converts each entity into an initial word vector, and a CNN convolutional neural network of 1 to 10 layers further processes the encoded initial word vector into the word vector required by the deep reinforcement learning component;
stage 2: predict the next relationship the current entity passes through, based on the reinforcement learning Policy network;
stage 3: perform value calculation on the selected strategy, based on the reinforcement learning Value network;
S4, obtain the queried optimal path model after the training and learning of step S3;
S5, input a starting entity and a target entity, convert each into a word vector in turn, fuse the two word vectors, and input them into the optimal path model of step S4 until the target entity is found, finally obtaining an optimal query path whose starting point is the starting entity and whose end point is the target entity.
4. The method for querying the optimal path of the knowledge graph based on deep reinforcement learning according to claim 3, characterized in that in step S1 n is not less than 1/10 of the total number of entity relations in the probabilistic knowledge graph; γ = n/2 relations are randomly selected from the n relations, and these γ relations of the probabilistic knowledge graph, together with the two entities connected by each relation, form the data sample set required for model training.
5. The method for querying the optimal path of the knowledge graph based on deep reinforcement learning according to claim 3, characterized in that the entities e_1 and e_2 input in stage 1 of step S3 are converted by the encoder and the network component into two word vectors G_θ(e_1) and G_θ(e_2), where θ is the set of network parameters to be optimized; a similarity calculation is performed on the two word vectors obtained in stage 1 to find their cosine distance, as shown in the following formula:
D_θ(e_1, e_2) = ||G_θ(e_1) − G_θ(e_2)||_cos
during training, each received data sample may be denoted as {(F, e_1, e_2)}, where F is the label of the data sample, from which the training loss function is constructed as shown in the following formula:
[The loss function appears as an image in the original document (Figure FDA0003264348400000021).]
where n is the total number of training samples;
stages 2 and 3 of step S3 are performed in the training component of the deep reinforcement learning component, stage 2 performing policy training and stage 3 performing value training; during training the parameter sets of the two networks, namely the Policy network parameters θ_p and the Value network parameters θ_v, are optimized, and a quadruple <state, reward, action, model> is provided, wherein states are represented by entities in the probabilistic knowledge graph.
6. The method for querying the optimal path of the knowledge graph based on deep reinforcement learning according to claim 5, characterized in that the loss function L(θ) needs to be minimized and can be refined as:
[The refined loss function appears as an image in the original document (Figure FDA0003264348400000022).]
where L_s denotes the loss between identical entities and L_u the loss between different entities; L_u needs to be made as small as possible so that L_s is as large as possible.
7. The method for querying the optimal path of the knowledge graph based on deep reinforcement learning according to claim 5, characterized in that stages 2 and 3 of step S3 are performed in the training component of the deep reinforcement learning component to obtain a policy function and a value function; the policy function is fitted by a neural network acting as a nonlinear function estimator, giving the policy function f(e_t, g|θ_p); the value function, which estimates the return from the current node to the target node, is likewise fitted by a neural network acting as a nonlinear function estimator, giving the value function h(e_t, g|θ_v).
8. The method for querying the optimal path of the knowledge graph based on deep reinforcement learning according to claim 7, characterized in that the return from the current node to the target node is multiplied by the policy estimate given by the policy function to represent the loss function of the policy network, as shown in the following formula:
L_f = log f(e_t, g|θ_p) × (r_t + γ h(e_{t+1}, g|θ_v) − h(e_t, g|θ_v)),
where γ ∈ (0,1) is a discount factor; L_f is differentiated with respect to the parameters θ_p, and the Policy network parameters θ_p are updated by gradient ascent according to the following formula:
[The update formula appears as an image in the original document (Figure FDA0003264348400000023); in it, ∇ denotes the derivation operation, H(f(e_t, g|θ_p)) denotes the entropy term of the policy function f(e_t, g|θ_p), and the remaining coefficient is the learning rate.]
if the product of the current policy and the return brought by selecting that policy is positive, the Policy network parameters θ_p are updated in the positive direction so that the likelihood of predicting that state next time increases; if the product is negative, θ_p is updated in the reverse direction so that the probability of predicting that state next time becomes as small as possible, until the policy predicted by the current network no longer fluctuates.
9. The method for querying the optimal path of the knowledge-graph based on the deep reinforcement learning as claimed in claim 7, wherein the obtained cost function h (e) ist,g|θv) Actual profit r from current entityt+γh(et+1,g|θv) And calculating the absolute value of the difference between the two to obtain a loss function of the value network, which is shown as the following formula:
Lh=|(rt+γ×h(et+1,g|θv))-h(et,g|θv)|,
wherein γ ∈ (0, 1) represents a discount factor; L_h is differentiated with respect to the parameter θ_v, and the parameter θ_v of the Value network is updated in a gradient-descending manner, giving the following formula:
θ_v ← θ_v − α ∇_{θ_v} L_h,
wherein ∇_{θ_v} denotes the derivation operation and α is the learning rate; if the error between the predicted profit h(e_t, g | θ_v) and the calculated profit r_t + γ h(e_{t+1}, g | θ_v) is larger than the threshold l given by the user, the parameter θ_v of the Value network is updated so that the prediction error becomes as small as possible, until the error between the predicted profit h(e_t, g | θ_v) and the calculated profit r_t + γ h(e_{t+1}, g | θ_v) no longer fluctuates outside the user-given range [−l, l].
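Likewise, under the same assumptions, a sketch of the value update of claim 9: the absolute difference L_h between the predicted profit and the calculated profit is minimized by gradient descent, and training of θ_v can stop once the error stays within the user-given threshold range [−l, l]; the default parameter values are placeholders.

```python
import torch

def value_update(value_net, e_t, e_next, g, r_t, gamma=0.9, lr=1e-3, threshold=0.05):
    """One gradient-descent step on L_h = |(r_t + gamma*h(e_{t+1}, g)) - h(e_t, g)|.

    Returns True once the prediction error lies inside the user-given
    threshold range [-threshold, threshold], i.e. theta_v no longer needs updating.
    """
    target = r_t + gamma * value_net(e_next, g).detach()  # calculated profit (TD target)
    predicted = value_net(e_t, g)                         # predicted profit h(e_t, g | theta_v)
    loss = (target - predicted).abs().mean()              # L_h

    optimizer = torch.optim.SGD(value_net.parameters(), lr=lr)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    return loss.item() <= threshold
```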
CN201810791353.6A 2018-07-18 2018-07-18 Knowledge graph optimal path query system and method based on deep reinforcement learning Active CN109241291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810791353.6A CN109241291B (en) 2018-07-18 2018-07-18 Knowledge graph optimal path query system and method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN109241291A CN109241291A (en) 2019-01-18
CN109241291B true CN109241291B (en) 2022-02-15

Family

ID=65072112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810791353.6A Active CN109241291B (en) 2018-07-18 2018-07-18 Knowledge graph optimal path query system and method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN109241291B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109818786B (en) * 2019-01-20 2021-11-26 北京工业大学 Method for optimally selecting distributed multi-resource combined path capable of sensing application of cloud data center
CN109829579B (en) * 2019-01-22 2023-01-24 平安科技(深圳)有限公司 Shortest route calculation method, shortest route calculation device, computer device, and storage medium
CN111563209B (en) * 2019-01-29 2023-06-30 株式会社理光 Method and device for identifying intention and computer readable storage medium
CN111611339A (en) * 2019-02-22 2020-09-01 北京搜狗科技发展有限公司 Recommendation method and device for inputting related users
CN109947098A (en) * 2019-03-06 2019-06-28 天津理工大学 A kind of distance priority optimal route selection method based on machine learning strategy
CN110347857B (en) * 2019-06-06 2020-12-01 武汉理工大学 Semantic annotation method of remote sensing image based on reinforcement learning
CN110391843B (en) * 2019-06-19 2021-01-05 北京邮电大学 Transmission quality prediction and path selection method and system for multi-domain optical network
CN110288878B (en) * 2019-07-01 2021-10-08 科大讯飞股份有限公司 Self-adaptive learning method and device
CN110825821B (en) * 2019-09-30 2022-11-22 深圳云天励飞技术有限公司 Personnel relationship query method and device, electronic equipment and storage medium
CN110956254B (en) * 2019-11-12 2022-04-05 浙江工业大学 Case reasoning method based on dynamic knowledge representation learning
CN110990548B (en) * 2019-11-29 2023-04-25 支付宝(杭州)信息技术有限公司 Method and device for updating reinforcement learning model
CN110825890A (en) * 2020-01-13 2020-02-21 成都四方伟业软件股份有限公司 Method and device for extracting knowledge graph entity relationship of pre-training model
CN113255347B (en) * 2020-02-10 2022-11-15 阿里巴巴集团控股有限公司 Method and equipment for realizing data fusion and method for realizing identification of unmanned equipment
CN111382359B (en) * 2020-03-09 2024-01-12 北京京东振世信息技术有限公司 Service policy recommendation method and device based on reinforcement learning, and electronic equipment
CN111581343B (en) * 2020-04-24 2022-08-30 北京航空航天大学 Reinforced learning knowledge graph reasoning method and device based on graph convolution neural network
CN111597209B (en) * 2020-04-30 2023-11-14 清华大学 Database materialized view construction system, method and system creation method
CN111401557B (en) * 2020-06-03 2020-09-18 超参数科技(深圳)有限公司 Agent decision making method, AI model training method, server and medium
CN114248265B (en) * 2020-09-25 2023-07-07 广州中国科学院先进技术研究所 Method and device for learning multi-task intelligent robot based on meta-simulation learning
CN112801731B (en) * 2021-01-06 2021-11-02 广东工业大学 Federal reinforcement learning method for order taking auxiliary decision
CN112966591B (en) * 2021-03-03 2023-01-20 河北工业职业技术学院 Knowledge map deep reinforcement learning migration system for mechanical arm grabbing task
CN114626530A (en) * 2022-03-14 2022-06-14 电子科技大学 Reinforced learning knowledge graph reasoning method based on bilateral path quality assessment
CN115099401B (en) * 2022-05-13 2024-04-26 清华大学 Learning method, device and equipment of continuous learning framework based on world modeling
CN115936091B (en) * 2022-11-24 2024-03-08 北京百度网讯科技有限公司 Training method and device for deep learning model, electronic equipment and storage medium
CN117009548B (en) * 2023-08-02 2023-12-26 广东立升科技有限公司 Knowledge graph supervision system based on secret equipment maintenance

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124497A1 (en) * 2015-10-28 2017-05-04 Fractal Industries, Inc. System for automated capture and analysis of business information for reliable business venture outcome prediction
CN106598856B (en) * 2016-12-14 2019-03-01 威创集团股份有限公司 A kind of path detection method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776729A (en) * 2016-11-18 2017-05-31 同济大学 A kind of extensive knowledge mapping path query fallout predictor building method
CN106934012A (en) * 2017-03-10 2017-07-07 上海数眼科技发展有限公司 A kind of question answering in natural language method and system of knowledge based collection of illustrative plates
CN107577805A (en) * 2017-09-26 2018-01-12 华南理工大学 A kind of business service system towards the analysis of daily record big data
CN107944025A (en) * 2017-12-12 2018-04-20 北京百度网讯科技有限公司 Information-pushing method and device
CN108073711A (en) * 2017-12-21 2018-05-25 北京大学深圳研究生院 A kind of Relation extraction method and system of knowledge based collection of illustrative plates

Also Published As

Publication number Publication date
CN109241291A (en) 2019-01-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant