CN113111134A - Self-coding and attention mechanism-based heterogeneous graph node feature embedding method - Google Patents
- Publication number: CN113111134A (application CN202110428607.XA)
- Authority
- CN
- China
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
A heterogeneous graph node feature embedding method based on self-coding and an attention mechanism is built on a heterogeneous graph constructed from master nodes, slave nodes and their relations. Master nodes are associated with one another, and each master node contains slave nodes. The features of the slave nodes are aggregated through the inclusion relations and encoded into their master node. The neighbour features around each master node are then fused through the association relations to obtain the master-node embedding. Finally, by computing the similarity between the slave-node features before encoding and after decoding, the method judges whether the master-node embedding can represent the slave nodes it encoded. Compared with other methods, node embedding is realized on a heterogeneous graph: first, from the perspective of the graph data structure, the processing range is wider, since a homogeneous graph can be converted into a heterogeneous graph; second, through the aggregate-fuse-reverse-aggregate operations, feature information of different node types is propagated, making the final result more representative.
Description
Technical Field
The invention relates to the technical field of heterogeneous graph node embedding, and in particular to a heterogeneous graph node feature embedding method based on self-coding and an attention mechanism.
Background
A graph is the most direct representation of the relations among entities in real applications. However, graphs are large in scale and high in dimension, which makes them difficult to apply directly; the development of graph embedding techniques and graph neural networks addresses this problem. Although existing graph embedding techniques can compress high-dimensional data into low-dimensional representations, they still face the following problems:
1) Most of them produce embeddings for all nodes of a complete graph and cannot produce embeddings for only part of the nodes in the graph.
2) Most of them target homogeneous graphs, and embedding methods for heterogeneous graphs are relatively few.
Disclosure of Invention
In order to overcome these shortcomings, the invention provides a heterogeneous graph node feature embedding method based on self-coding and an attention mechanism, which has a wide processing range and yields a more representative final result.
The technical solution adopted by the invention to solve the above technical problems is as follows:
A heterogeneous graph node feature embedding method based on self-coding and an attention mechanism comprises the following steps:
a) constructing a heterogeneous graph composed of a plurality of master nodes and a plurality of slave nodes, wherein the master nodes are of one type, the slave nodes comprise a plurality of sub-types, the master nodes are associated with one another to form association relations, and the master nodes are linked to the slave nodes to form inclusion relations;
b) aggregating the features of the slave nodes contained in a master node;
c) traversing the inclusion relations of the current master node, and taking the node features of all slave nodes in the inclusion relations formed by that master node;
d) repeating steps b) and c) until the inclusion relations of all master nodes in the heterogeneous graph have been traversed, storing the original master-node features as featR_main, and updating the features of all master nodes to obtain a preliminary master-node representation;
e) using the association relations among the master nodes, employing the graph attention network GAT as the fuser with a SELU activation function, and fusing the feature information of each master node with that of its first-order neighbour nodes through the attention mechanism, thereby updating the features of all master nodes to obtain the master-node embedding;
f) reverse-aggregating the master nodes to obtain the slave-node features;
g) calculating the similarity between all slave-node features obtained in step f) and all slave-node features in step b);
h) repeating steps b) to g) N or more times, and taking the embedding from step e) as the final embedding of the master nodes in the heterogeneous graph.
Further, in step b), the feature feat^e_{main_i} corresponding to the e-th inclusion relation of the i-th master node is calculated by the formula feat^e_{main_i} = SELU(GCN(feat_{sub_j})), j = 1, …, N(main_i), where SELU(·) is the SELU activation function, GCN(·) is the GCN graph convolution operation, feat_{sub_j} is the feature of the j-th slave node, and N(main_i) is the number of slave nodes contained by the i-th master node.
Further, in step c), the node features feat_{main_i} of all slave nodes in the inclusion relations formed by the i-th master node are calculated by combining the per-relation features feat^e_{main_i}, e = 1, …, E_i, through the mean(·) and concat(·) operations, where E_i is the number of edge types contained between the i-th master node and all slave nodes of its inclusion relations, mean(·) is the averaging operation, and concat(·) is the concatenation operation.
Further, in step d), all master-node features feat_main are updated through an assignment operation to obtain the preliminary representation.
Further, in step e), all master-node features feat_main are updated through an assignment operation to obtain the master-node embedding.
Further, in step f), the feature feat_{Nsub_i} of the i-th slave node after reverse aggregation is calculated from the node features of all slave nodes in the inclusion relations formed by the j-th master node and the original features featR_{main_j} of the j-th master node.
Further, in step g), the similarity difference Loss between all reverse-aggregated slave-node features feat_Nsub and all original slave-node features feat_sub is calculated by the formula Loss = SmoothL1(feat_Nsub, feat_sub), where SmoothL1(·) is the SmoothL1 loss function.
Preferably, N takes the value 100.
The invention has the following beneficial effects. The method is built on a heterogeneous graph constructed from master nodes, slave nodes and their relations. Master nodes are associated with one another, and each master node contains slave nodes. The features of the slave nodes are aggregated through the inclusion relations and encoded into their master node. The neighbour features around each master node are then fused through the association relations to obtain the master-node embedding. Finally, by computing the similarity between the slave-node features before encoding and after decoding, the method judges whether the master-node embedding can represent the slave nodes it encoded. Compared with other methods, node embedding is realized on a heterogeneous graph: first, from the perspective of the graph data structure, the processing range is wider, since a homogeneous graph can be converted into a heterogeneous graph; second, through the aggregate-fuse-reverse-aggregate operations, feature information of different node types is propagated, making the final result more representative.
Drawings
FIG. 1 is a diagram illustrating the relations between the master and slave nodes of the heterogeneous graph;
FIG. 2 is a flow chart of the method of the invention.
Detailed Description
The invention is further explained below with reference to FIG. 1 and FIG. 2.
A heterogeneous graph node feature embedding method based on self-coding and attention mechanism comprises the following steps:
a) As shown in FIG. 1, a heterogeneous graph is constructed, composed of a plurality of master nodes and a plurality of slave nodes; the master nodes are of one type, the slave nodes comprise a plurality of sub-types, and the heterogeneous graph is a simple graph without isolated vertices. The master nodes are associated with one another to form association relations, and the master nodes are linked to the slave nodes to form inclusion relations. That is, a master node contains slave nodes, and the slave nodes have no associations with one another.
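The structure described in step a) can be sketched with a small container class. This is an illustrative sketch only; the class and all identifiers (`HeteroGraph`, `assoc`, `contains`, the node names) are assumptions, not part of the patent:

```python
# Minimal sketch of the heterogeneous graph of step a): one master-node type,
# several slave sub-types, association edges between masters, and inclusion
# edges from a master to its slaves. All names are illustrative.
from collections import defaultdict

class HeteroGraph:
    def __init__(self):
        self.assoc = defaultdict(set)      # master -> set of associated masters
        self.contains = defaultdict(list)  # master -> [(slave_id, slave_type)]

    def add_association(self, m1, m2):
        # association relation between two master nodes (undirected)
        self.assoc[m1].add(m2)
        self.assoc[m2].add(m1)

    def add_inclusion(self, master, slave, slave_type):
        # inclusion relation: a master node contains a typed slave node;
        # slaves carry no edges among themselves, matching step a)
        self.contains[master].append((slave, slave_type))

g = HeteroGraph()
g.add_association("m0", "m1")
g.add_inclusion("m0", "s0", "typeA")
g.add_inclusion("m0", "s1", "typeB")
g.add_inclusion("m1", "s2", "typeA")
```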
b) The features of the slave nodes contained in a master node are aggregated.
c) The inclusion relations of the current master node are traversed, and the node features of all slave nodes in the inclusion relations formed by that master node are taken.
d) Steps b) and c) are repeated until the inclusion relations of all master nodes in the heterogeneous graph have been traversed; the original master-node features are stored as featR_main, and the features of all master nodes are updated to obtain a preliminary master-node representation.
e) Using the association relations among the master nodes, the graph attention network GAT is employed as the fuser with a SELU activation function, and the feature information of each master node and its first-order neighbour nodes is fused through the attention mechanism, updating the features of all master nodes to obtain the master-node embedding.
f) The master nodes are reverse-aggregated to obtain the slave-node features.
g) The similarity between all slave-node features from step f) and all slave-node features from step b) is calculated.
h) Steps b) to g) are repeated N or more times, and the embedding from step e) is taken as the final embedding of the master nodes in the heterogeneous graph.
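The aggregate-fuse-reverse-aggregate loop of steps b) to g) can be sketched end to end as follows. This is a hedged data-flow illustration, not the trained model: the learned GCN aggregator and GAT fuser are replaced by fixed mean pooling with a SELU nonlinearity and uniform attention, and every name is an assumption:

```python
# Hedged end-to-end sketch of steps b)-g). Mean pooling + SELU stands in for
# the GCN aggregator, and a uniform average over {self} U neighbours stands
# in for the GAT fuser; only the data flow matches the patent's steps.
import numpy as np

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def smooth_l1(pred, target, beta=1.0):
    d = np.abs(pred - target)
    return np.mean(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta))

# toy graph: 2 masters; master 0 contains slaves {0, 1}, master 1 contains {2}
contains = {0: [0, 1], 1: [2]}
assoc = {0: [1], 1: [0]}                              # association edges
feat_sub = np.array([[1., 0.], [0., 1.], [1., 1.]])   # slave features

# b)-d) aggregate: pool contained slave features into each master, apply SELU
feat_main = np.stack([selu(feat_sub[contains[i]].mean(0)) for i in range(2)])
featR_main = feat_main.copy()                         # stored original features

# e) fuse: uniform attention over the master and its first-order neighbours
embed = np.stack([
    np.mean(np.vstack([feat_main[i], feat_main[assoc[i]]]), axis=0)
    for i in range(2)
])

# f) reverse aggregation: send each master embedding back to its slaves
feat_Nsub = np.zeros_like(feat_sub)
for m, slaves in contains.items():
    feat_Nsub[slaves] = embed[m]

# g) similarity between decoded and original slave features
loss = smooth_l1(feat_Nsub, feat_sub)
```

In the patent, this loop is repeated N or more times with trainable aggregator and fuser weights; here a single pass illustrates the shapes involved.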
Example 1:
In step b), the feature feat^e_{main_i} corresponding to the e-th inclusion relation of the i-th master node is calculated by the formula feat^e_{main_i} = SELU(GCN(feat_{sub_j})), j = 1, …, N(main_i), where SELU(·) is the SELU activation function, GCN(·) is the GCN graph convolution operation, feat_{sub_j} is the feature of the j-th slave node, and N(main_i) is the number of slave nodes contained by the i-th master node.
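A minimal sketch of this per-relation encoder follows. The patent does not publish the internals of its GCN(·), so mean pooling with an illustrative weight matrix `W` stands in for the convolution; the function name and all parameters are assumptions:

```python
# Hedged sketch of Example 1: one mean-aggregating graph-convolution step
# over the slaves of one inclusion relation, followed by SELU. The weight
# matrix W is illustrative; the patent's GCN(.) is not specified in detail.
import numpy as np

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def aggregate_relation(slave_feats, W):
    """feat^e_{main_i} = SELU(GCN over the N(main_i) slave features)."""
    pooled = slave_feats.mean(axis=0)   # mean over the contained slave nodes
    return selu(pooled @ W)             # linear projection + SELU

rng = np.random.default_rng(0)
slaves = rng.normal(size=(3, 4))        # N(main_i) = 3 slaves, 4-dim features
W = rng.normal(size=(4, 2))             # projection to a 2-dim relation feature
feat_e = aggregate_relation(slaves, W)
```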
Example 2:
In step c), the node features feat_{main_i} of all slave nodes in the inclusion relations formed by the i-th master node are calculated by combining the per-relation features feat^e_{main_i}, e = 1, …, E_i, through the mean(·) and concat(·) operations, where E_i is the number of edge types contained between the i-th master node and all slave nodes of its inclusion relations, mean(·) is the averaging operation, and concat(·) is the concatenation operation.
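The published text renders this formula as an image, so its exact composition of mean(·) and concat(·) is not recoverable; the sketch below assumes one plausible reading, in which the per-edge-type features are concatenated together with their mean. Treat the arrangement as an assumption:

```python
# Hedged sketch of Example 2: combine a master node's per-edge-type features
# (E_i rows) using mean(.) and concat(.). The exact composition is an
# assumption, since the original formula is lost.
import numpy as np

def combine_relations(per_type_feats):
    # per_type_feats: (E_i, d) array, one row per contained edge type
    mean_part = per_type_feats.mean(axis=0)                     # mean(.)
    return np.concatenate([mean_part, per_type_feats.ravel()])  # concat(.)

feats = np.array([[1., 2.], [3., 4.]])   # E_i = 2 edge types, d = 2
combined = combine_relations(feats)
```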
Example 3:
In step d), all master-node features feat_main are updated through an assignment operation to obtain the preliminary representation.
Example 4:
In step e), all master-node features feat_main are updated through an assignment operation to obtain the master-node embedding.
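The step-e) fusion over association neighbours can be sketched with single-head attention. This is a simplified dot-product stand-in for the GAT attention named in the patent, with a SELU nonlinearity; the scoring function and all names are assumptions:

```python
# Hedged sketch of the step-e) fuser: attend over a master node and its
# first-order association neighbours, then update that node's feature.
# Dot-product scores replace GAT's learned attention; weights are omitted.
import numpy as np

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    return scale * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse(feat_main, neighbours, i):
    # attention over {i} U neighbours[i]
    idx = [i] + neighbours[i]
    scores = softmax(feat_main[idx] @ feat_main[i])  # dot-product attention
    return selu(scores @ feat_main[idx])             # weighted sum + SELU

feat_main = np.array([[1., 0.], [0., 1.], [1., 1.]])
neighbours = {0: [1, 2], 1: [0], 2: [0]}
embed0 = fuse(feat_main, neighbours, 0)              # updated feature of master 0
```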
Example 5:
In step f), the feature feat_{Nsub_i} of the i-th slave node after reverse aggregation is calculated from the node features of all slave nodes in the inclusion relations formed by the j-th master node and the original features featR_{main_j} of the j-th master node.
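The reverse-aggregation formula of Example 5 is rendered as an image in the source, so only a hedged reading is possible: each decoded slave feature is taken from its master's embedding, modulated by the master's stored original feature featR_main. The names follow the patent; the arithmetic is an assumption:

```python
# Hedged sketch of Example 5: broadcast each master embedding back to its
# slaves, scaled elementwise by the stored original master feature. The
# exact decoder formula is lost; this scaling is an assumption.
import numpy as np

def reverse_aggregate(embed_main, featR_main, contains, n_slaves):
    feat_Nsub = np.zeros((n_slaves, embed_main.shape[1]))
    for m, slaves in contains.items():
        # decoded slave feature = master embedding * original master feature
        feat_Nsub[slaves] = embed_main[m] * featR_main[m]
    return feat_Nsub

embed_main = np.array([[0.5, 1.0], [1.0, 2.0]])
featR_main = np.array([[1.0, 1.0], [2.0, 0.5]])
contains = {0: [0, 1], 1: [2]}           # master -> contained slave indices
feat_Nsub = reverse_aggregate(embed_main, featR_main, contains, 3)
```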
Example 6:
In step g), the similarity difference Loss between all reverse-aggregated slave-node features feat_Nsub and all original slave-node features feat_sub is calculated by the formula Loss = SmoothL1(feat_Nsub, feat_sub), where SmoothL1(·) is the SmoothL1 loss function.
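The SmoothL1 similarity loss matches the standard definition (quadratic below a threshold beta, linear above), which a few lines of NumPy reproduce; the beta value here is the common default, not stated in the patent:

```python
# Standard SmoothL1 loss used in Example 6: quadratic for |pred - target|
# below beta, linear above, averaged over all elements. beta=1.0 is the
# conventional default (the patent does not state a value).
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    d = np.abs(pred - target)
    return np.mean(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta))

feat_Nsub = np.array([[0.0, 2.0]])       # decoded slave features
feat_sub = np.array([[0.5, 0.0]])        # original slave features
loss = smooth_l1(feat_Nsub, feat_sub)    # elementwise 0.125 and 1.5 -> 0.8125
```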
Example 7:
N takes the value 100.
Claims (8)
1. A heterogeneous graph node feature embedding method based on self-coding and an attention mechanism, characterized by comprising the following steps:
a) constructing a heterogeneous graph composed of a plurality of master nodes and a plurality of slave nodes, wherein the master nodes are of one type, the slave nodes comprise a plurality of sub-types, the master nodes are associated with one another to form association relations, and the master nodes are linked to the slave nodes to form inclusion relations;
b) aggregating the features of the slave nodes contained in a master node;
c) traversing the inclusion relations of the current master node, and taking the node features of all slave nodes in the inclusion relations formed by that master node;
d) repeating steps b) and c) until the inclusion relations of all master nodes in the heterogeneous graph have been traversed, storing the original master-node features as featR_main, and updating the features of all master nodes to obtain a preliminary master-node representation;
e) using the association relations among the master nodes, employing the graph attention network GAT as the fuser with a SELU activation function, and fusing the feature information of each master node with that of its first-order neighbour nodes through the attention mechanism, thereby updating the features of all master nodes to obtain the master-node embedding;
f) reverse-aggregating the master nodes to obtain the slave-node features;
g) calculating the similarity between all slave-node features obtained in step f) and all slave-node features in step b);
h) repeating steps b) to g) N or more times, and taking the embedding from step e) as the final embedding of the master nodes in the heterogeneous graph.
2. The self-coding and attention mechanism-based heterogeneous graph node feature embedding method according to claim 1, wherein in step b) the feature feat^e_{main_i} corresponding to the e-th inclusion relation of the i-th master node is calculated by the formula feat^e_{main_i} = SELU(GCN(feat_{sub_j})), j = 1, …, N(main_i), where SELU(·) is the SELU activation function, GCN(·) is the GCN graph convolution operation, feat_{sub_j} is the feature of the j-th slave node, and N(main_i) is the number of slave nodes contained by the i-th master node.
3. The self-coding and attention mechanism-based heterogeneous graph node feature embedding method according to claim 2, wherein in step c) the node features feat_{main_i} of all slave nodes in the inclusion relations formed by the i-th master node are calculated by combining the per-relation features feat^e_{main_i}, e = 1, …, E_i, through the mean(·) and concat(·) operations, where E_i is the number of edge types contained between the i-th master node and all slave nodes of its inclusion relations, mean(·) is the averaging operation, and concat(·) is the concatenation operation.
4. The self-coding and attention mechanism-based heterogeneous graph node feature embedding method according to claim 3, wherein in step d) all master-node features feat_main are updated through an assignment operation to obtain the preliminary representation.
5. The self-coding and attention mechanism-based heterogeneous graph node feature embedding method according to claim 3, wherein in step e) all master-node features feat_main are updated through an assignment operation to obtain the master-node embedding.
6. The self-coding and attention mechanism-based heterogeneous graph node feature embedding method according to claim 3, wherein in step f) the feature feat_{Nsub_i} of the i-th slave node after reverse aggregation is calculated from the node features of all slave nodes in the inclusion relations formed by the j-th master node and the original features featR_{main_j} of the j-th master node.
7. The self-coding and attention mechanism-based heterogeneous graph node feature embedding method according to claim 6, wherein in step g) the similarity difference Loss between all reverse-aggregated slave-node features feat_Nsub and all original slave-node features feat_sub is calculated by the formula Loss = SmoothL1(feat_Nsub, feat_sub), where SmoothL1(·) is the SmoothL1 loss function.
8. The self-coding and attention mechanism-based heterogeneous graph node feature embedding method according to claim 1, wherein N takes the value 100.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110428607.XA | 2021-04-21 | 2021-04-21 | Self-coding and attention mechanism-based heterogeneous graph node feature embedding method
Publications (1)

Publication Number | Publication Date
---|---
CN113111134A | 2021-07-13
Family
ID=76719228
Family Applications (1)

Application Number | Title | Status
---|---|---
CN202110428607.XA (publication CN113111134A) | Self-coding and attention mechanism-based heterogeneous graph node feature embedding method | Pending
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115798722A (en) * | 2023-02-02 | 2023-03-14 | 神州医疗科技股份有限公司 | Immune drug population high-low risk screening method and system based on knowledge graph |
Legal Events

Code | Title | Description
---|---|---
PB01 | Publication | Application publication date: 20210713
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication |