CN112948506A - Improved meta-learning relation prediction method based on convolutional neural network - Google Patents

Improved meta-learning relation prediction method based on convolutional neural network

Info

Publication number
CN112948506A
CN112948506A (application number CN202110357268.0A)
Authority
CN
China
Prior art keywords
entity
relation
neural network
relationship
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110357268.0A
Other languages
Chinese (zh)
Inventor
吴涛
朱静
先兴平
许爱东
马红玉
冯柏淋
王宇轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
CSG Electric Power Research Institute
Research Institute of Southern Power Grid Co Ltd
Original Assignee
Chongqing University of Post and Telecommunications
Research Institute of Southern Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications, Research Institute of Southern Power Grid Co Ltd filed Critical Chongqing University of Post and Telecommunications
Priority to CN202110357268.0A priority Critical patent/CN112948506A/en
Publication of CN112948506A publication Critical patent/CN112948506A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases
    • G06F 16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases
    • G06F 16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of graph data relation prediction, and particularly relates to an improved meta-learning relation prediction method based on a convolutional neural network, which comprises the following steps: acquiring data to be predicted in real time, and converting the data to be predicted into triple data; inputting the converted triple data into a trained improved meta-learning relation prediction model to obtain a prediction result for the data to be predicted. The method solves the problem that deep learning models, which need the support of large amounts of data, cannot be used when samples are scarce, and at the same time uses a convolutional neural network to further derive entity features from the neighbors of each entity, thereby improving computational efficiency.

Description

Improved meta-learning relation prediction method based on convolutional neural network
Technical Field
The invention belongs to the field of graph data relation prediction, and particularly relates to an improved meta-learning relation prediction method based on a convolutional neural network.
Background
With the advent of the artificial intelligence era, more and more applications are being deployed, such as intelligent recommendation, knowledge question answering, and decision support. These applications generate large amounts of data resources in which many implicit relations between entities remain undiscovered; prediction is therefore needed to infer relations that may exist between entities, or to discover and restore missing content information.
Meanwhile, deep learning now offers a variety of complex models that can describe and fit the feature distribution of a data set well, but it easily overfits on data sets with few samples. Although overfitting can be mitigated by regularization, data augmentation and similar techniques, these strategies do not provide richer information for the model's learning process. Yet, reflecting on the development of human intelligence and cognition, people are not constrained by small samples. For example, when a human learns to recognize images of cats or dogs, only a small number of pictures of the animal are needed for the human to quickly extract effective features and gain the ability to recognize such pictures. Similarly, deep learning should accumulate experience from learning tasks that are similar to one another and transfer it to new similar tasks; this is "learning to learn" (meta-learning), which provides higher-order generalization ability.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an improved meta-learning relation prediction method based on a convolutional neural network, which comprises the following steps: acquiring data to be predicted in real time, and converting the data to be predicted into triple data; inputting the converted triple data into a trained improved meta-learning relation prediction model to obtain a prediction result for the data to be predicted;
the process of training the improved meta-learning relationship prediction model comprises the following steps:
s1: acquiring original data, converting the original data into triple data, and initializing all the triple data to obtain vector representation of a triple;
s2: dividing the converted triple data to obtain a training set and a test set;
s3: selecting triples in a training set to establish a task area, and defining a support set and a query set according to the task area;
s4: extracting specific relation elements in a support set by adopting a relation element learner;
s5: processing the specific relation element by adopting an embedded learner to obtain a gradient element, and updating the relation element according to the gradient element;
s6: transferring the updated relation elements to a query set, and calculating corresponding score functions; calculating a loss function value of the model according to the corresponding score function;
s7: adjusting the parameters of the model according to the loss function of the model, and finishing the training of the model when the loss function value of the model is minimum;
s8: taking the complete triples in the test set as a support set, and taking the triples to be predicted in the test set as a query set; performing neighbor entity fusion processing on the head entity and the tail entity in the query set to obtain a head entity fusing neighbor node information and a tail entity fusing neighbor node information;
s9: and calculating the score of the query set triple to be predicted according to the head-tail entity pair fused with the neighbor node information, and taking the triple with the highest score as the complementary triple to be predicted.
Preferably, the process of establishing the task area includes: adding the triples of the training set D_Train that share the same relation into one set, and defining that set as the task T_r corresponding to the relation; randomly extracting a task T_r from the task area T, selecting N_S triple samples as the support set S_r of the task, and taking the remaining N_Q samples as the query set Q_r of the task.
Further, the selected support set S_r contains fewer samples than the query set Q_r, namely: N_S < N_Q.
Preferably, the process of extracting the specific relation element of the support set with the relation element learner includes: embedding the head and tail entities in the support set to obtain relation elements; inputting the obtained relation elements into an L-layer neural network to extract the specific relation information corresponding to each entity pair; and summing the relation information of all entity pairs and averaging the summed result, the average relation information being the specific relation element.
Further, the obtained specific relationship element is:
x^0 = h_i ⊕ t_i
x^l = σ(W^l x^(l-1) + b^l)
R_(h_i,t_i) = W^L x^(L-1) + b^L
preferably, the processing of the specific relationship element by the embedded learner includes: calculating the true values of the triples in the support set and the query set through entity embedding and the relationship elements; calculating a loss function and a gradient element based on the embedded learner according to the true values of the triples; and updating the relation element according to the calculated gradient element.
Preferably, the formula for calculating the score function and the loss function is:
s_(h_i,t_i) = ‖h_i + R_(T_r) − t_i‖
L(S_r) = Σ_((h_i,t_i)∈S_r) [γ + s_(h_i,t_i) − s_(h_i,t_i′)]_+
preferably, the process of obtaining the header entity of the fused neighbor node information and the header entity of the fused neighbor node information: performing neighbor sampling on each head entity and tail entity in the query set to obtain neighbor nodes, and constructing neighbor matrixes according to the neighbor nodes; inputting the neighbor matrix into the embedding layer to obtain a word vector matrix; extracting the characteristics of a word vector matrix by adopting a convolutional neural network; performing pooling treatment on the obtained characteristics to obtain characteristic values; cascading all the pooled feature values to obtain a final feature vector; classifying the final feature vectors to obtain a head entity fusing neighbor node information; the process of obtaining the tail entity fusing the neighbor node information is the same as that of the head entity.
Further, the number of neighbor nodes collected by each node is 20.
The method solves the problem that deep learning models, which need the support of large amounts of data, cannot be used when samples are scarce, and at the same time uses a convolutional neural network to further derive entity features from the neighbors of each entity, thereby improving computational efficiency.
Drawings
FIG. 1 is a flow chart of the improved meta-learning prediction method based on a text-domain convolutional neural network according to the present invention;
FIG. 2 shows the data processing procedure of the text-domain convolutional neural network according to the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
An improved meta-learning relation prediction method based on a convolutional neural network, as shown in FIG. 1, comprises: acquiring data to be predicted in real time, and converting the data to be predicted into triple data; and inputting the converted triple data into a trained improved meta-learning relation prediction model to obtain a prediction result for the data to be predicted.
The process of training the improved meta-learning relationship prediction model comprises the following steps:
s1: acquiring original data, converting the original data into triple data, and initializing all the triple data to obtain vector representation of a triple;
s2: dividing the converted triple data to obtain a training set and a test set;
s3: selecting triples in a training set to establish a task area, and defining a support set and a query set according to the task area;
s4: extracting specific relation elements in a support set by adopting a relation element learner;
s5: processing the specific relation element by adopting an embedded learner to obtain a gradient element, and updating the relation element according to the gradient element;
s6: transferring the updated relation elements to a query set, and calculating corresponding score functions; calculating a loss function value of the model according to the corresponding score function;
s7: adjusting the parameters of the model according to the loss function of the model, and finishing the training of the model when the loss function value of the model is minimum;
s8: taking the complete triples in the test set as a support set, and taking the triples to be predicted in the test set as a query set; performing neighbor entity fusion processing on the head entity and the tail entity in the query set to obtain a head entity fusing neighbor node information and a tail entity fusing neighbor node information;
s9: and calculating the score of the query set triple to be predicted according to the head-tail entity pair fused with the neighbor node information, and taking the triple with the highest score as the complementary triple to be predicted.
The process of establishing the task area comprises the following steps: adding the triples of the training set D_Train that share the same relation into one set, and defining that set as the task T_r corresponding to the relation; randomly extracting a task T_r from the task area T, selecting N_S triple samples as the support set S_r of the task, and taking the remaining N_Q samples as the query set Q_r of the task; the selected support set S_r contains fewer samples than the query set Q_r, namely: N_S < N_Q.
The process of extracting the specific relation element of the support set with the relation element learner comprises: embedding the head and tail entities in the support set to obtain relation elements; inputting the obtained relation elements into an L-layer neural network to extract the specific relation information corresponding to each entity pair; and summing the relation information of all entity pairs and averaging the summed result, the average relation information being the specific relation element.
The process of processing the specific relationship element by adopting the embedded learner comprises the following steps: calculating the true values of the triples in the support set and the query set through entity embedding and the relationship elements; calculating a loss function and a gradient element based on the embedded learner according to the true values of the triples; and updating the relation element according to the calculated gradient element.
As shown in fig. 2, the process of obtaining the head entity fused with neighbor node information and the tail entity fused with neighbor node information is: performing neighbor sampling on each head entity and tail entity in the query set to obtain neighbor nodes, and constructing a neighbor matrix from the neighbor nodes; inputting the neighbor matrix into the embedding layer to obtain a word vector matrix; extracting features of the word vector matrix with a convolutional neural network; pooling the obtained features to obtain feature values; concatenating all pooled feature values to obtain the final feature vector; classifying the final feature vector to obtain the head entity fused with neighbor node information; the tail entity fused with neighbor node information is obtained in the same way as the head entity.
Preferably, the number of neighbor nodes collected by each node is 20.
A specific implementation mode of a relation prediction method based on improved meta learning of a convolutional neural network comprises the following processes:
step 1: extracting entity relation triples in a relevant range, and defining the entity relation triples as a training set D of a task to be predictedTrainAnd each training set has a separate support set and query set.
Step 2: defining the triples of samples to be predicted as a test set D of prediction tasksTest. Where the test set and training set are very similar but are a single set.
And step 3: to DTrainAnd DTestAll the triples in the triples are initialized to obtain a triplet (h, r, t) represented by a vector, wherein h is a head entity vector, r is a relation vector, t is a tail entity vector, and (h, r, t) belongs to G, G is a triplet sample set, h, t belongs to E, and E is an entity set.
The data is expressed in a triple form and comprises a head entity, a tail entity and a relation between the head entity and the tail entity; and initializing the triple data to obtain the vector form representation of the triple data. Vector representation of triple data can be more easily identified and operated on.
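As an illustrative sketch only (not the patent's implementation), the conversion of raw (head, relation, tail) records into randomly initialized d-dimensional vectors might look like the following; the dimension d, the uniform initialization range, and the example entity names are all assumptions:

```python
import numpy as np

def init_triples(raw_triples, d=8, seed=0):
    """Map each entity/relation name to a randomly initialized d-dim vector
    and return the triples in vector form (h, r, t)."""
    rng = np.random.default_rng(seed)
    entities = {e for h, _, t in raw_triples for e in (h, t)}
    relations = {r for _, r, _ in raw_triples}
    ent_emb = {e: rng.uniform(-0.1, 0.1, d) for e in sorted(entities)}
    rel_emb = {r: rng.uniform(-0.1, 0.1, d) for r in sorted(relations)}
    vec_triples = [(ent_emb[h], rel_emb[r], ent_emb[t]) for h, r, t in raw_triples]
    return vec_triples, ent_emb, rel_emb

# hypothetical example data
triples, E, R = init_triples([("Paris", "capital_of", "France"),
                              ("Lyon", "city_in", "France")], d=8)
```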
Step 4: add the triples of the training set D_Train that share the same relation into one set, defined as the task T_r corresponding to the relation.
Step 5: randomly extract a task T_r from the task area T, take out N_S triple samples as the support set S_r of the task, and use the remaining N_Q samples as the query set Q_r of the task, where N_S < N_Q.
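Steps 4 and 5 above can be sketched as follows; this is a hedged illustration, with the task dictionary layout and the toy triples being assumptions:

```python
import numpy as np
from collections import defaultdict

def build_tasks(train_triples):
    """Group training triples by relation: one task T_r per relation."""
    tasks = defaultdict(list)
    for h, r, t in train_triples:
        tasks[r].append((h, r, t))
    return tasks

def sample_task(tasks, n_support, rng):
    """Pick a random task T_r and split its triples into the support set S_r
    (N_S samples) and the query set Q_r (the remaining N_Q samples)."""
    r = rng.choice(sorted(tasks))
    samples = list(tasks[r])
    rng.shuffle(samples)
    return samples[:n_support], samples[n_support:]

tasks = build_tasks([("a", "r1", "b"), ("c", "r1", "d"), ("e", "r1", "f"),
                     ("g", "r2", "h")])
rng = np.random.default_rng(0)
S_r, Q_r = sample_task({"r1": tasks["r1"]}, n_support=1, rng=rng)  # N_S < N_Q
```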
Step 6: to extract relation elements from the support set S_r, define a relation element learner, which can be regarded as a simple neural network that maps the head-tail entity pairs of the support set S_r to relation elements; its input is the head-tail entity pairs of S_r. First, the entity-pair-specific relation element is extracted through an L-layer fully connected neural network:
x^0 = h_i ⊕ t_i
x^l = σ(W^l x^(l-1) + b^l)
R_(h_i,t_i) = W^L x^(L-1) + b^L
where h_i and t_i are d-dimensional vectors and L is the number of layers of the neural network.
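A minimal numpy sketch of the L-layer fully connected relation element learner described above, under the assumption that x^0 is the concatenation of h_i and t_i; the hidden layer size, LeakyReLU slope, and random initialization are assumptions:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def relation_meta(h, t, weights, biases):
    """x^0 = h ⊕ t; x^l = σ(W^l x^(l-1) + b^l) for hidden layers;
    final layer is linear: R_(h,t) = W^L x^(L-1) + b^L."""
    x = np.concatenate([h, t])                   # x^0: concatenated entity pair
    for W, b in zip(weights[:-1], biases[:-1]):  # hidden layers with LeakyReLU
        x = leaky_relu(W @ x + b)
    W_L, b_L = weights[-1], biases[-1]
    return W_L @ x + b_L                         # entity-pair relation element

d = 4
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 2 * d)), rng.standard_normal((d, 8))]
biases = [np.zeros(8), np.zeros(d)]
R_pair = relation_meta(rng.standard_normal(d), rng.standard_normal(d),
                       weights, biases)
```

Averaging the K entity-pair relation elements of step 7 would then be, e.g., `R_Tr = np.mean(np.stack(pair_elements), axis=0)`.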
Step 7: average the relation elements of the K entity pairs, with the formula:

R_(T_r) = (1/K) Σ_(i=1)^K R_(h_i,t_i)

where K denotes the number of relation elements and R_(T_r) denotes the relation element of task T_r.
Step 8: define the embedding learner, which is used to obtain gradient elements for rapidly updating the relation elements. In task T_r, calculate the score of each entity pair in the support set S_r according to the entity scoring formula:
s_(h_i,t_i) = ‖h_i + R_(T_r) − t_i‖
calculating a loss score from the scores of each entity pair:
L(S_r) = Σ_((h_i,t_i)∈S_r) [γ + s_(h_i,t_i) − s_(h_i,t_i′)]_+
where s_(h_i,t_i) denotes the score of the i-th entity pair of the support set S_r, h_i denotes the i-th head entity vector, t_i denotes the i-th tail entity vector, t_i′ denotes a corrupted (negative-sample) tail entity, R_(T_r) denotes the relation element of T_r, T_r denotes a task randomly extracted from the task area T, L(S_r) denotes the loss score over the entity pairs, S_r denotes the support set, and γ denotes a margin hyperparameter, i.e. a tolerable error boundary. A smaller loss value indicates a better representation, meaning the current model can encode the true triples.
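The TransE-style score and margin loss above can be sketched in numpy as follows; the negative tail t′ and the margin value γ = 1.0 are assumptions for illustration:

```python
import numpy as np

def score(h, R, t):
    """s_(h,t) = ||h + R - t||: smaller means the triple is more plausible."""
    return np.linalg.norm(h + R - t)

def margin_loss(pairs, neg_tails, R, gamma=1.0):
    """L(S_r) = sum over pairs of [gamma + s(h,t) - s(h,t')]_+ ."""
    total = 0.0
    for (h, t), t_neg in zip(pairs, neg_tails):
        total += max(0.0, gamma + score(h, R, t) - score(h, R, t_neg))
    return total

h = np.array([1.0, 0.0]); t = np.array([1.0, 1.0]); R = np.array([0.0, 1.0])
t_neg = np.array([5.0, -5.0])       # a far-away corrupted tail
loss = margin_loss([(h, t)], [t_neg], R, gamma=1.0)
```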
Step 9: since the gradient indicates how a parameter should be updated, take the partial derivative of the loss function with respect to the relation element as the gradient element:
G_(T_r) = ∇_(R_(T_r)) L(S_r)

where G_(T_r) denotes the gradient element of T_r, ∇ denotes the gradient, L(S_r) denotes the loss function, and S_r denotes the support set.
The relation element is then rapidly updated; the update formula is:

R′_(T_r) = R_(T_r) − β G_(T_r)

where β denotes the update step size, G_(T_r) denotes the gradient element of T_r, and R_(T_r) denotes the relation element of T_r. Steps 6 to 9 are repeated, continuously updating R′, until all support sets S_r have been traversed.
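Because the score is s = ‖h + R − t‖, the gradient of the active margin terms with respect to R has a simple closed form, which the following sketch implements; the step size β and the toy vectors are assumptions:

```python
import numpy as np

def grad_relation(pairs, neg_tails, R, gamma=1.0, eps=1e-12):
    """G_(T_r) = dL(S_r)/dR for the margin loss with s = ||h + R - t||."""
    G = np.zeros_like(R)
    for (h, t), t_neg in zip(pairs, neg_tails):
        d_pos, d_neg = h + R - t, h + R - t_neg
        s_pos, s_neg = np.linalg.norm(d_pos), np.linalg.norm(d_neg)
        if gamma + s_pos - s_neg > 0:        # only active margin terms contribute
            G += d_pos / (s_pos + eps) - d_neg / (s_neg + eps)
    return G

def update_relation(R, G, beta=0.1):
    """R' = R - beta * G: rapid task-specific update of the relation element."""
    return R - beta * G

R = np.array([0.0, 1.0])
h = np.array([1.0, 0.0]); t = np.array([0.0, 0.0]); t_neg = np.array([1.0, 1.0])
G = grad_relation([(h, t)], [t_neg], R)
R_new = update_relation(R, G, beta=0.1)
```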
Step 10: transfer the obtained updated relation element to the query set Q_r, and calculate the corresponding score function values:

s_(h_j,t_j) = ‖h_j + R′_(T_r) − t_j‖

Calculate the loss function value from the scores to update the relation element:

L(Q_r) = Σ_((h_j,t_j)∈Q_r) [γ + s_(h_j,t_j) − s_(h_j,t_j′)]_+
step 11: selecting the test set D in step 2TestAnd taking the complete triple in the test set as a support set, selecting the relationship element R finally obtained from the step 5 to the step 10, and initializing the relationship in the triple according to the selected relationship element to obtain the initialized triple vector representation.
Step 12: take the triples to be predicted in the test set D_Test as the query set, sample the neighbors of each head entity and tail entity, and construct the neighbor matrix. To construct the neighbor matrix efficiently, each node samples about 20 neighbor nodes, and the neighbor matrix is built from the sampled neighbors. If a node has fewer than 20 neighbors, sampling with replacement is used until 20 vertices have been sampled; if a node has more than 20 neighbors, sampling without replacement is used.
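The with-replacement / without-replacement sampling rule of step 12 can be sketched as follows (the neighbor lists used here are hypothetical):

```python
import numpy as np

def sample_neighbors(neighbors, k=20, rng=None):
    """Sample exactly k neighbors per node: with replacement when a node has
    fewer than k neighbors, without replacement otherwise."""
    rng = rng or np.random.default_rng()
    neighbors = list(neighbors)
    replace = len(neighbors) < k
    idx = rng.choice(len(neighbors), size=k, replace=replace)
    return [neighbors[i] for i in idx]

rng = np.random.default_rng(0)
few = sample_neighbors(["a", "b", "c"], k=20, rng=rng)   # oversample with replacement
many = sample_neighbors(list(range(50)), k=20, rng=rng)  # subsample without replacement
```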
Step 13: the embedding layer takes the neighbor matrix as input; because samples are scarce, word vectors are pre-trained on an external corpus, and the embedding layer is initialized with the pre-trained word vector matrix.
Step 14: after the embedding layer, a convolutional neural network is used to extract features, as shown in fig. 2. Since adjacent neighbor vectors in the neighbor matrix are highly correlated, a one-dimensional convolution is adopted, where the width of the convolution kernel equals the dimension d of the neighbor vectors and the height H is a hyperparameter. The convolution operation is:
o_i = w · A[i:i+H−1], i = 1, 2, …, T−H+1

where o_i denotes the i-th convolution feature, w denotes the convolution kernel, A[i:i+H−1] denotes rows i through i+H−1 of the neighbor matrix A, i denotes the row index, H denotes the height of the kernel, and T denotes the number of sampled neighbor nodes.
Preferably, T, the number of sampled neighbors, is 20. A bias b is then added and an activation function f is applied to obtain the required features; the activation formula is:
ci=f(oi+b)
For one convolution kernel this yields the feature vector c ∈ R^(T−H+1), i.e. T−H+1 features in total. More convolution kernels of different heights can be used to obtain richer feature expressions.
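The one-dimensional convolution of step 14 can be sketched in numpy as below; the use of ReLU as the activation f, the random matrices, and the kernel height H = 3 are assumptions:

```python
import numpy as np

def conv1d_over_neighbors(A, w, b=0.0):
    """Slide a kernel of height H over the T x d neighbor matrix A:
    o_i = w · A[i:i+H-1], then c_i = f(o_i + b) with f = ReLU (assumed).
    Returns the T-H+1 activated features for this kernel."""
    T, d = A.shape
    H = w.shape[0]
    assert w.shape[1] == d                 # kernel width = vector dimension d
    o = np.array([np.sum(w * A[i:i + H]) for i in range(T - H + 1)])
    return np.maximum(o + b, 0.0)          # activation f

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8))           # T = 20 sampled neighbors, d = 8
w = rng.standard_normal((3, 8))            # kernel height H = 3
c = conv1d_over_neighbors(A, w)            # c has T - H + 1 = 18 features
```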
Step 15: the feature maps obtained from convolution kernels of different sizes have different lengths, so 1-max pooling is applied to each feature map, reducing each convolution kernel's output to a single value. After pooling, the feature values of all convolution kernels are concatenated to obtain the final feature vector.
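Step 15's 1-max pooling and concatenation can be sketched as follows (the feature-map values are hypothetical):

```python
import numpy as np

def one_max_pool(feature_maps):
    """1-max pooling: each convolution kernel's feature map is reduced to a
    single value; the values are concatenated into the final feature vector."""
    return np.array([fm.max() for fm in feature_maps])

maps = [np.array([0.2, 1.5, 0.1]),        # feature map of one kernel height
        np.array([3.0, 0.4]),             # feature map of another height
        np.array([0.0, 0.0, 0.7, 0.2])]
final_vec = one_max_pool(maps)            # one value per kernel
```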
Step 16: input the final feature vector into a fully connected layer for classification, using dropout to prevent overfitting, to obtain the head entity or tail entity embedding fused with neighbor node information, and thereby a new test set D′_Test. The neighbor node information is embedded into the head or tail entity through the convolutional neural network (TextCNN).
Step 17: take the triples to be predicted in the test set D′_Test as the query set, calculate the score of each query-set triple according to the score calculation formula using the triple vectors obtained in step 11, and take the triple with the highest score as the predicted completion triple.
The above embodiments further illustrate the objects, technical solutions and advantages of the present invention. It should be understood that they are only preferred embodiments of the present invention and should not be construed as limiting it; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (9)

1. An improved meta-learning relation prediction method based on a convolutional neural network, characterized by comprising: acquiring data to be predicted in real time, and converting the data to be predicted into triple data; inputting the converted triple data into a trained improved meta-learning relation prediction model to obtain a prediction result for the data to be predicted;
the process of training the improved meta-learning relationship prediction model comprises the following steps:
s1: acquiring original data, converting the original data into triple data, and initializing all the triple data to obtain vector representation of a triple;
s2: dividing the converted triple data to obtain a training set and a test set;
s3: selecting triples in a training set to establish a task area, and defining a support set and a query set according to the task area;
s4: extracting specific relation elements in a support set by adopting a relation element learner;
s5: processing the specific relation element by adopting an embedded learner to obtain a gradient element, and updating the relation element according to the gradient element;
s6: transferring the updated relation elements to a query set, and calculating corresponding score functions; calculating a loss function of the model according to the corresponding score function;
s7: adjusting the parameters of the model according to the loss function of the model, and finishing the training of the model when the loss function value of the model is minimum;
s8: taking the complete triples in the test set as a support set, and taking the triples to be predicted in the test set as a query set; performing neighbor entity fusion processing on the head entity and the tail entity in the query set to obtain a head entity fusing neighbor node information and a tail entity fusing neighbor node information;
s9: and calculating the score of the query set triple to be predicted according to the head-tail entity pair fused with the neighbor node information, and taking the triple with the highest score as the complementary triple to be predicted.
2. The method of claim 1, wherein the process of establishing the task area comprises: adding the triples of the training set D_Train that share the same relation into one set, and defining that set as the task T_r corresponding to the relation; randomly extracting a task T_r from the task area T, selecting N_S triple samples as the support set S_r of the task, and taking the remaining N_Q samples as the query set Q_r of the task.
3. The method of claim 2, wherein the selected support set S_r contains fewer samples than the query set Q_r, namely: N_S < N_Q.
4. The method of claim 1, wherein the extraction of the specific relation element of the support set with the relation element learner comprises: embedding the head and tail entities in the support set to obtain relation elements; inputting the obtained relation elements into an L-layer neural network to extract the specific relation information corresponding to each entity pair; and summing the relation information of all entity pairs and averaging the summed result, the average relation information being the specific relation element.
5. The method of claim 4, wherein the obtained specific relation element is:

x^0 = h_i ⊕ t_i
x^l = σ(W^l x^(l−1) + b^l)
R_(h_i,t_i) = W^L x^(L−1) + b^L

where x^0 denotes the initial relation element, h_i denotes the i-th head entity vector, ⊕ denotes the concatenation operation, t_i denotes the i-th tail entity vector, x^l denotes the specific relation element of layer l of the neural network, l denotes the l-th layer, σ(·) denotes the LeakyReLU activation function, W^l denotes the layer-l weight of the neural network, b^l denotes the layer-l bias term of the neural network, R_(h_i,t_i) denotes the relation element of the i-th entity pair, W^L denotes the layer-L weight of the neural network, L denotes the total number of layers of the neural network, x^(L−1) denotes the relation element of layer L−1, and b^L denotes the layer-L bias term of the neural network.
6. The method of claim 1, wherein the processing of the specific relation element by the embedded learner comprises: calculating the true values of the triples in the support set and the query set through entity embedding and the relationship elements; calculating a loss function and a gradient element based on the embedded learner according to the true values of the triples; and updating the relation element according to the calculated gradient element.
7. The method of claim 1, wherein the score function and the loss function are calculated as:

s(h_i, t_i) = ‖h_i + R_(T_r) − t_i‖_2

L(S_r) = Σ_((h_i,t_i)∈S_r) [γ + s(h_i, t_i) − s(h_i, t_i′)]_+

wherein s(h_i, t_i) represents the score of the i-th entity pair of the support set S_r; h_i represents the i-th head entity vector; t_i represents the i-th tail entity vector; t_i′ denotes the tail entity of the corresponding corrupted (negative) triple; [·]_+ denotes max(·, 0); R_(T_r) represents the relation element of task T_r; T_r denotes a task randomly drawn from the task set T; L(S_r) represents the loss over the entity pairs of the support set S_r; and γ represents the margin hyperparameter, i.e. the tolerable error boundary.
8. The method of claim 1, wherein the process of obtaining the head entity fused with neighbor node information and the tail entity fused with neighbor node information comprises: performing neighbor sampling on each head entity and tail entity in the query set to obtain neighbor nodes, and constructing a neighbor matrix from those nodes; inputting the neighbor matrix into the embedding layer to obtain a word vector matrix; extracting features from the word vector matrix with a convolutional neural network; pooling the extracted features to obtain feature values; cascading all pooled feature values into a final feature vector; and classifying the final feature vector to obtain the head entity fused with neighbor node information; the tail entity fused with neighbor node information is obtained by the same process.
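The convolution-and-pooling pipeline of claim 8 can be sketched as below. The filter widths (2, 3, 4), the embedding size, and random filters are illustrative assumptions; the final classification step is omitted for brevity:

```python
import numpy as np

def encode_entity(neighbor_matrix, filters):
    """Fuse sampled-neighbor information into one entity feature vector."""
    n = neighbor_matrix.shape[0]
    pooled = []
    for f in filters:                        # each filter spans the full embedding width
        w = f.shape[0]
        conv = np.array([(neighbor_matrix[i:i + w] * f).sum()   # valid 1-D convolution
                         for i in range(n - w + 1)])
        pooled.append(conv.max())            # max pooling -> one feature value per filter
    return np.array(pooled)                  # cascade pooled values into the final vector

rng = np.random.default_rng(2)
d, n_neighbors = 5, 20                       # claim 9: 20 neighbors sampled per node
neighbors = rng.normal(size=(n_neighbors, d))    # word-vector matrix from the embedding layer
filters = [rng.normal(size=(w, d)) for w in (2, 3, 4)]
vec = encode_entity(neighbors, filters)
print(vec.shape)
```

The same encoder is applied to the tail entity's neighbor matrix, as the claim states.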
9. The method of claim 8, wherein the number of neighbor nodes collected by each node is 20.
CN202110357268.0A 2021-04-01 2021-04-01 Improved meta-learning relation prediction method based on convolutional neural network Pending CN112948506A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110357268.0A CN112948506A (en) 2021-04-01 2021-04-01 Improved meta-learning relation prediction method based on convolutional neural network


Publications (1)

Publication Number Publication Date
CN112948506A 2021-06-11

Family

ID=76232096


Country Status (1)

Country Link
CN (1) CN112948506A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753570A (en) * 2019-01-11 2019-05-14 中山大学 A kind of scene map vectorization method based on Horn logical AND figure neural network
CN111046193A (en) * 2019-12-27 2020-04-21 南京邮电大学 Domain knowledge graph completion method based on meta-learning
CN111178543A (en) * 2019-12-30 2020-05-19 广东石油化工学院 Probability domain generalization learning method based on meta-learning
CN111291139A (en) * 2020-03-17 2020-06-16 中国科学院自动化研究所 Attention mechanism-based knowledge graph long-tail relation completion method
CN111639196A (en) * 2020-06-03 2020-09-08 核工业湖州工程勘察院有限公司 Multi-layer gradually-enhanced ground disaster knowledge graph and automatic completion method thereof
CN111931506A (en) * 2020-05-22 2020-11-13 北京理工大学 Entity relationship extraction method based on graph information enhancement
CN111949764A (en) * 2020-08-18 2020-11-17 桂林电子科技大学 Knowledge graph completion method based on bidirectional attention mechanism
CN112528035A (en) * 2020-07-22 2021-03-19 中国人民解放军国防科技大学 Knowledge graph reasoning method and device based on relational attention and computer equipment


Similar Documents

Publication Publication Date Title
Chen et al. Shallowing deep networks: Layer-wise pruning based on feature representations
CN111612066B (en) Remote sensing image classification method based on depth fusion convolutional neural network
US8862527B2 (en) Neural networks and method for training neural networks
CN109063113B (en) Rapid image retrieval method, retrieval model and model construction method based on asymmetric depth discrete hash
CN112766280A (en) Remote sensing image road extraction method based on graph convolution
CN109886401A (en) A kind of complex network representative learning method
CN113190688A (en) Complex network link prediction method and system based on logical reasoning and graph convolution
Jha et al. Extracting low‐dimensional psychological representations from convolutional neural networks
CN111461907A (en) Dynamic network representation learning method oriented to social network platform
CN116402133B (en) Knowledge graph completion method and system based on structure aggregation graph convolutional network
Irfan et al. Brain inspired lifelong learning model based on neural based learning classifier system for underwater data classification
CN114548256A (en) Small sample rare bird identification method based on comparative learning
Sokkhey et al. Development and optimization of deep belief networks applied for academic performance prediction with larger datasets
CN114332075A (en) Rapid structural defect identification and classification method based on lightweight deep learning model
Li et al. Performance analysis of fine-tune transferred deep learning
CN116434347B (en) Skeleton sequence identification method and system based on mask pattern self-encoder
CN109934281B (en) Unsupervised training method of two-class network
CN112948506A (en) Improved meta-learning relation prediction method based on convolutional neural network
Laleh et al. Chaotic continual learning
CN115131605A (en) Structure perception graph comparison learning method based on self-adaptive sub-graph
CN113408652A (en) Semi-supervised learning image classification method based on group representation features
CN113205175A (en) Multi-layer attribute network representation learning method based on mutual information maximization
Botteghi et al. Low-dimensional state and action representation learning with mdp homomorphism metrics
Yamashita et al. SWAP-NODE: A regularization approach for deep convolutional neural networks
CN110210988B (en) Symbolic social network embedding method based on deep hash

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210611)