CN112989842A - Construction method of universal embedded framework of multi-semantic heterogeneous graph - Google Patents

Construction method of universal embedded framework of multi-semantic heterogeneous graph Download PDF

Info

Publication number
CN112989842A
Authority
CN
China
Prior art keywords
semantic
node
attention
hnse
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110215070.9A
Other languages
Chinese (zh)
Inventor
王瑞锦
张志扬
张凤荔
周世杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110215070.9A priority Critical patent/CN112989842A/en
Publication of CN112989842A publication Critical patent/CN112989842A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for constructing a universal embedding framework for multi-semantic heterogeneous graphs, which comprises the following steps. Step 1: construct a neighborhood exploration strategy, α-exploration, which smoothly splices the DFS and BFS exploration strategies to adapt to different heterogeneous network structures. Step 2: construct an HNSE model based on α-exploration, comprising an α-exploration neighborhood exploration layer, a multi-semantic learning layer and a node classification layer, which learns low-dimensional node embeddings while preserving the heterogeneous and semantic information of the nodes. Step 3: realize a multi-layer HNSE model in residual form and connect a fully connected output layer behind it. Step 4: construct three extension strategies for HNSE. The invention embeds each vertex of a multi-semantic heterogeneous graph by aggregating direct/meta-path neighbor nodes of different types, designs a node aggregation sampling strategy for HNSE that combines meta-path neighbors and direct neighbors so as to guide the multi-head attention mechanism in HNSE, and uses meta-paths to improve the capture of the multi-semantic information of the nodes.

Description

Construction method of universal embedded framework of multi-semantic heterogeneous graph
Technical Field
The invention relates to the field of graph neural networks, in particular to a method for constructing a universal embedding framework for multi-semantic heterogeneous graphs.
Background
Graph embedding work realizes applications such as node classification and link prediction on a topological graph by extracting deep feature representations of the nodes in the graph. With the multimodality of various network structures, the latest graph embedding methods gradually abandon the modeling approach of homogeneous information networks and instead model interconnected graph data as heterogeneous information networks composed of different types of nodes and edges, using the comprehensive structural information and rich semantic information in the network to discover more accurate knowledge. Compared with a homogeneous network, a heterogeneous network lets multiple types of objects and relations coexist and contains rich structural and semantic information, thereby providing a new accurate and interpretable way to discover hidden patterns. For example, the heterogeneous network of a recommendation system no longer has only two kinds of objects, users and commodities, but includes more comprehensive content such as shops and brands, and the relations no longer consist only of purchases but include finer-grained interactions such as collections and favorites. Based on this information, semantic mining methods such as meta-paths, meta-graphs and attributed heterogeneous networks can be used to generate more detailed knowledge discovery, for example improving the interpretability and accuracy of a recommendation system.
Heterogeneous graphs contain more than two types of nodes or edges. Due to the particularity of heterogeneous networks, many early representation learning methods for homogeneous networks cannot be directly applied to heterogeneous networks, and two main challenges exist:
(1) Heterogeneity of nodes and edges. Different types of nodes and edges represent different semantics, so representation learning on heterogeneous networks requires mapping different types of objects into different spaces. In addition, how to preserve the heterogeneous neighbors of each node and how to handle heterogeneous node sequences also deserve considerable attention.
(2) The problem of multi-semantic description brought by the rich information in heterogeneous networks. A heterogeneous network describes the semantics of nodes from multiple dimensions; how to effectively extract and utilize this multi-dimensional information and abstract it into semantic information assigned to the nodes, so as to obtain a comprehensive node representation, is also a great challenge.
A multi-semantic heterogeneous graph is more complex than a common heterogeneous graph: on top of its multiple edge/node attributes, every node contains multiple kinds of semantic information, so it is tempting to let the different link information a node participates in represent the multiple semantics contained in the node. This idea, however, has two disadvantages. First, it weakens one of the most fundamental elements of graph embedding: we work on a complex graph, not on a set of links. Second, even if a network embedding model based on link paths achieves an excellent effect on a certain graph structure, it is difficult to ensure that the model will work well on another graph structure; worse, when facing a graph structure on which it is difficult to establish effective links, such a method based on specific links cannot achieve a good effect no matter how the parameters are adjusted.
In recent years, deep neural networks have enjoyed great success in the fields of computer vision and natural language processing, and some works have started trying to model different types of data in homogeneous/heterogeneous networks using deep models. Compared with a shallow model, a deep model can better capture nonlinear relations and thus extract the complex semantic information contained in the nodes.
Graph convolutional networks propagate the structural information of the graph layer by layer by performing neighborhood convolution operations, which frees graph embedding methods from the constraints of link-guided learning; however, such works focus on improving the underlying graph neural network model, for example by introducing autoencoders, or on improving the sampling of nodes. They have made some progress, but they also point out that the meta-path is an element that is difficult to handle in heterogeneous graph representation learning: generally speaking, the meta-path splits the original structure of the graph, but it greatly simplifies the description of multiple kinds of semantic information. As a result, current multi-semantic heterogeneous network embedding models cannot take both graph structure capture and multi-semantic capture into account.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for constructing a universal embedding framework for multi-semantic heterogeneous graphs.
The purpose of the invention is realized by the following technical scheme:
the construction method of the universal embedding framework for multi-semantic heterogeneous graphs comprises the following steps:
step 1: constructing a neighborhood exploration strategy, α-exploration, which smoothly splices the DFS and BFS exploration strategies to adapt to different heterogeneous network structures and capture specific-semantic neighbors;
step 2: constructing an HNSE model based on α-exploration, wherein the HNSE model comprises an α-exploration neighborhood exploration layer, a multi-semantic learning layer and a node classification layer, and learns low-dimensional node embeddings while preserving the heterogeneous and semantic information of the nodes;
step 3: realizing a multi-layer HNSE model in residual form and connecting a fully connected output layer behind the multi-layer HNSE model;
step 4: constructing three extension strategies for HNSE, including a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer, so as to meet the requirements of different types of heterogeneous networks on the embedding framework.
Further, the neighborhood exploration strategy α -exploration comprises the steps of:
step 101: defining a parameter alpha for guiding the exploration direction;
step 102: for each direct neighbor V_j of node V_i in a given heterogeneous graph G under meta-path Φ, with probability α, performing no operation and jumping to the next direct neighbor node; with probability 1−α, performing biased walk sampling and replacing the original direct neighbor V_j with the meta-path neighbor obtained by the walk sampling;
step 103: adding the direct neighbor to the neighbor set with probability α; adding the meta-path neighbor to the neighbor set with probability 1−α.
Further, the multi-semantic learning layer comprises specific semantic learning and multi-semantic merging; the two are applied at different positions of the framework; for a specific semantic of a node, α-exploration is used to explore the neighborhood of the node, and an attention mechanism is used to aggregate the obtained neighbor information; at the overall framework level, a multi-head mechanism is used to combine the different semantics of the nodes.
Further, the specific semantics learning takes the meta path as a guide to learn the specific semantics of the node;
applying a layer of linear mapping to the node and to each node type, different from the node's own type, in its meta-path-based neighbor node set, so that nodes of different types are mapped into a unified feature space;
then calculating the attention coefficient between V_i and each neighbor, which maps the high-dimensional node features into raw attention scores;
finally, performing weighted aggregation with the attention coefficients.
Further, the multi-semantic merging specifically comprises: after obtaining the feature representations of the nodes under the specific semantics, combining the multiple semantic information by using a multi-head mechanism so as to complete the merging of the multi-semantic feature representation of the nodes; each attention head in the multi-head mechanism is assigned a different semantic learning task.
Further, the number of the attention heads is equal to the number of the meta-paths, and if the multi-head attention mechanism is executed on the final layer of the network, semantic information on each attention head is aggregated in an averaging manner.
Further, the step 3 further comprises model training; the model training specifically comprises: after the final embedding of the nodes is obtained, it is applied to different downstream tasks and different loss functions are designed; for the semi-supervised node classification task, the final embedding is fed into a fully connected layer with a softmax function to produce node classification labels; the cross-entropy loss is minimized under the guidance of the labeled data.
Further, the shared attention mechanism is specifically as follows: a shared attention weight is applied for neighboring nodes of the same type.
Further, the multi-semantic self-attention layer calculates the importance difference of specific semantics represented by different attention heads by improving the aggregation operation of multi-head attention so as to know the importance of each semantic in the task.
The invention has the following beneficial effects: each vertex of the multi-semantic heterogeneous graph is embedded by aggregating direct/meta-path neighbor nodes of different types, and a node aggregation sampling strategy combining meta-path neighbors and direct neighbors is designed for HNSE to guide the multi-head attention mechanism in HNSE; meanwhile, three variants are provided for HNSE, namely a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer. Comprehensive experiments of HNSE on three popular datasets show that the node classification precision of the proposed method on multi-semantic heterogeneous graphs is comprehensively superior to that of the latest methods, and that meta-paths improve the capture of the multi-semantic information of the nodes.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is an architectural diagram of the HNSE of the present invention.
FIG. 3 is a graph comparing the classification performance of HNSE on various data sets.
Fig. 4 is a graph of the impact of the number of attention heads in M-HNSE on its Mic F1 on the IMDB and AMiner datasets.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
In this embodiment, as shown in fig. 1, the method for constructing a universal embedded framework of a multi-semantic-heterogeneous graph includes the following steps:
step 1: constructing a neighborhood exploration strategy, α-exploration, which smoothly splices the DFS and BFS exploration strategies to adapt to different heterogeneous network structures and capture specific-semantic neighbors;
step 2: constructing an HNSE model based on α-exploration, wherein the HNSE model comprises an α-exploration neighborhood exploration layer, a multi-semantic learning layer and a node classification layer, and learns low-dimensional node embeddings while preserving the heterogeneous and semantic information of the nodes;
step 3: realizing a multi-layer HNSE model in residual form and connecting a fully connected output layer behind the multi-layer HNSE model;
step 4: constructing three extension strategies for HNSE, including a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer, so as to meet the requirements of different types of heterogeneous networks on the embedding framework.
1. Framework structure
This scheme first gives the basic concepts and the problem definition of the multi-semantic heterogeneous graph, and then divides the HNSE framework into two parts, the α-exploration neighborhood exploration strategy and the multi-semantic learning model; finally, the framework can be spliced with any downstream task learning framework.
1.1 problem definition:
heterogeneous graph: a heterogeneous graph G = (V, ε) is composed of a set of vertices V and a set of edges ε, and contains a set of node types A and a set of edge types R satisfying |A| + |R| > 2. Each v ∈ V belongs to a node type in A, represented by a mapping function φ: V → A; each e ∈ ε belongs to an edge type in R, represented by a mapping function ψ: ε → R.
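For illustration only, a minimal Python sketch of such a typed graph and its two mapping functions follows; the class HeteroGraph and its method names are assumptions made for this example and are not part of the scheme. The small author/paper graph built at the end matches the example used later in Section 1.2.2.

```python
from collections import defaultdict

class HeteroGraph:
    """Minimal heterogeneous graph: typed nodes, typed edges, adjacency lists."""

    def __init__(self):
        self.node_type = {}             # phi: V -> A
        self.edge_type = {}             # psi: (u, v) -> R
        self.adj = defaultdict(set)     # direct neighbors

    def add_node(self, v, ntype):
        self.node_type[v] = ntype

    def add_edge(self, u, v, etype):
        self.edge_type[(u, v)] = etype
        self.edge_type[(v, u)] = etype  # treat edges as undirected
        self.adj[u].add(v)
        self.adj[v].add(u)

    def neighbors(self, v, ntype=None):
        """Direct neighbors of v, optionally filtered by node type."""
        if ntype is None:
            return set(self.adj[v])
        return {u for u in self.adj[v] if self.node_type[u] == ntype}

# A tiny author/paper graph satisfying |A| + |R| > 2
g = HeteroGraph()
for a in ["A1", "A2", "A3"]:
    g.add_node(a, "author")
for p in ["P1", "P2"]:
    g.add_node(p, "paper")
g.add_edge("A1", "P1", "writes")
g.add_edge("A3", "P1", "writes")
g.add_edge("A1", "P2", "writes")
g.add_edge("A2", "P2", "writes")
g.add_edge("A1", "A2", "co-author")
```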
1.2 neighborhood exploration strategy
1.2.1 neighbor definition of nodes
If the neighborhood aggregation sampling of a node is compared to random walk sampling, two neighbor exploration strategies can be defined on a heterogeneous graph. Neighborhood exploration based on meta-paths: the meta-path is a basic structure in heterogeneous graph networks and emphasizes the association with semantic information; one meta-path usually corresponds to one semantic, while a heterogeneous graph may contain multiple meta-paths. We denote a meta-path by Φ. For a node on the heterogeneous graph and a meta-path Φ, the group of neighbors determined by Φ is called the meta-path neighbors of the node with respect to Φ. Given a node i and a meta-path Φ in a heterogeneous graph, the meta-path-based neighbors of i, denoted N_i^Φ, are defined as the set of nodes connected to node i through the meta-path Φ; the meta-path-based neighbors of a node include the node itself. Meta-path neighbors can usually explore a larger range on a heterogeneous graph and better judge the structural similarity between nodes; moreover, the meta-path intuitively partitions semantic information, which gives it an important guiding role when learning a multi-semantic, complex heterogeneous network. Its disadvantage is that it splits the heterogeneous graph network structure, which weakens the structural equivalence between nodes.
Direct neighborhood exploration: the direct neighborhood N_i is the set of neighbors of node i in the graph; to keep the direct neighbors and the meta-path neighbors disjoint, and unlike the previous setup, the node itself is not added to its direct neighbor set. The direct neighborhood exploration strategy pays special attention to local information near a node on the heterogeneous graph; based on graph convolution or graph attention networks, this strategy can completely and effectively represent and learn the network structure, but it is limited to the local neighborhood of the node. If a larger area is to be explored, the number of network layers must be deepened or the single-layer parameters increased (raising the order of aggregated neighbors), which greatly increases model complexity and brings problems such as over-smoothing.
The direct neighbors better preserve the graph structure information around a node, while the meta-path neighbors more clearly distinguish the semantic information carried by the node. We can consider meta-path-based neighborhood exploration to be biased towards DFS (depth-first exploration), focusing on the macroscopic view of the node, while direct-neighbor-based neighborhood exploration favors BFS (breadth-first exploration), focusing on the microscopic view of the node. Based on this, the two neighborhood exploration strategies are smoothly spliced to adapt to different heterogeneous networks; whether to explore the heterogeneous graph in a manner biased towards DFS or BFS is decided by a parameter α.
1.2.2 α-exploration strategy
Given a heterogeneous graph G = (V, ε) and a meta-path Φ = (A_1 → A_2 → ... → A_l), for node v_i and each of its direct neighbors v_j: with probability α, no operation is performed and we jump to the next direct neighbor node; with probability 1−α, biased walk sampling is performed as follows:

$$p\left(v^{t+1} \mid v^{t}, \Phi\right) = \begin{cases} \frac{1}{\left|N_{\Phi}^{t+1}\left(v^{t}\right)\right|}, & \left(v^{t+1}, v^{t}\right) \in \varepsilon,\; \phi\left(v^{t+1}\right) = A_{t+1} \\ 0, & \text{otherwise} \end{cases}$$

where N_Φ^{t+1}(v^t) denotes the direct neighbors of v^t whose node type is A_{t+1}. The meta-path-based random walk starts from node v_j, but the guiding meta-path takes node v_i as its start node rather than v_j. The meta-path is set here to a symmetric format to better capture the structural similarity between nodes:

$$\Phi = \left(A_1 \rightarrow A_2 \rightarrow \cdots \rightarrow A_l \rightarrow \cdots \rightarrow A_2 \rightarrow A_1\right)$$

After the biased walk sampling is completed, we replace the direct neighbor with the meta-path neighbor obtained by the walk sampling:

$$v_j \leftarrow \tilde{v}_j, \qquad \phi\left(\tilde{v}_j\right) = \phi\left(v_i\right),$$

where ṽ_j is reached from v_j along the meta-path Φ. Finally, we add the direct neighbor to the neighbor set N_i^α with probability α; with probability 1−α, we add the meta-path neighbor to N_i^α.
The algorithm takes as input the heterogeneous graph G, the meta-path Φ and a node v_i, and outputs the node's neighbor set N_i^α; it follows the sampling steps described above (the full pseudo-code is given as an image in the original, and a sketch follows the example below).
For example, using α-exploration to explore the neighbors of node A_1 with the guiding meta-path Φ = Author → Paper → Author (APA): its direct neighbors are N_i = {P_1, P_2, A_2} and its meta-path neighbors under APA are {A_2, A_3}. For P_1, with probability α it is added directly to N_i^α; otherwise, following the meta-path "A_1 → P_1 → A_3", A_3 is added to N_i^α. For P_2, with probability α it is added directly to N_i^α; otherwise, following the meta-path "A_1 → P_2 → A_2", A_2 is added to N_i^α. For A_2, since "A_1 → A_2 → any node" cannot form the guiding meta-path, A_2 is added directly to N_i^α. The final N_i^α may therefore be one of: {P_1, P_2, A_2}, {A_3, P_2, A_2}, {P_1, A_2}, {A_3, A_2}.
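For illustration, the sampling above can be sketched in Python as follows, reusing the hypothetical HeteroGraph from the earlier sketch; here a meta-path is given as a sequence of node types, and all function names are assumptions. On the example graph, exploring A_1 under APA reproduces the neighbor sets listed above.

```python
import random

def metapath_walk(g, start, type_seq, forbid=None):
    """One random walk from `start` following the node types in `type_seq`
    (the first type must be the type of `start`); nodes in `forbid` are skipped.
    Returns the end node of the walk, or None if the walk cannot be completed."""
    if g.node_type[start] != type_seq[0]:
        return None
    forbid = forbid or set()
    cur = start
    for ntype in type_seq[1:]:
        candidates = [u for u in g.neighbors(cur, ntype) if u not in forbid]
        if not candidates:
            return None
        cur = random.choice(candidates)
    return cur

def alpha_exploration(g, v_i, metapath, alpha):
    """Neighbor set N_i^alpha: each direct neighbor v_j is kept with probability
    alpha, otherwise it is replaced by the meta-path neighbor reached through v_j
    (when the guiding meta-path, started at v_i, can pass through v_j)."""
    neighbor_set = set()
    for v_j in g.neighbors(v_i):
        # v_j must match the second type of the meta-path to continue the walk
        can_walk = len(metapath) > 1 and g.node_type[v_j] == metapath[1]
        if not can_walk or random.random() < alpha:
            neighbor_set.add(v_j)                      # keep the direct neighbor
        else:
            # continue the walk from v_j along the remaining node types
            tail = metapath_walk(g, v_j, metapath[1:], forbid={v_i})
            neighbor_set.add(tail if tail is not None else v_j)
    return neighbor_set

# e.g. exploring A1 with the APA meta-path, as in the example above
print(alpha_exploration(g, "A1", ["author", "paper", "author"], alpha=0.5))
```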
It can be clearly seen that, as α changes, the neighborhood exploration strategy shifts gradually from DFS to BFS. Thus, if the same α is used on different datasets, the variation in the α value profoundly affects the prediction performance of the downstream embedding framework, and this difference will be seen on the different datasets. Therefore, for datasets with a reasonable and complete meta-path design scheme, α should be set to a smaller value, biasing the neighborhood exploration strategy towards DFS; for datasets with few or incomplete meta-paths, α should be set to a larger value, biasing the strategy towards BFS.
1.3 HNSE
As shown in FIG. 2, which gives the HNSE model architecture diagram preferred in this scheme, the model aims to keep the heterogeneous information and semantic information of the nodes while learning a low-dimensional embedded representation of each node, $h'_i \in \mathbb{R}^{n_{\phi(i)}}$, where n_{φ(i)} is the embedding-space dimension for node type φ(i).
In order to learn the multi-semantic information of nodes, this scheme provides two methods: one separates and learns semantic information by using meta-paths, and the other combines different semantic information by using a multi-head mechanism; the two methods are applied at different positions of the model. For a specific semantic of a node, α-exploration is used to explore the neighborhood of the node, and an attention mechanism aggregates the obtained neighbor information, which helps the model represent the neighbor information of the node more flexibly and effectively; at the overall model level, a multi-head mechanism is used to combine the different semantics of the nodes.
Specific semantic learning: the meta-path Φ_k is used as a guide to learn the specific semantics of node v_i. In order to preserve the heterogeneous information of the node, for node v_i and the neighbor set obtained from Φ_k by α-exploration, a layer of linear mapping M_{φ(j)} is applied for each node type φ(j) different from the node's own type, so as to map nodes of different types into a unified feature space:

$$h'_j = M_{\phi(j)} \cdot h_j$$

In particular, for the node itself and for neighbor nodes of the same type as the node, we use M_{φ(i)} as the mapping matrix; for the node itself, this is equivalent to adding a self-loop. Then, the attention coefficient between v_i and each of its neighbors is computed one by one:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[h'_i \,\|\, h'_j\right]\right)\right)}{\sum_{l \in N_i^{\alpha}} \exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[h'_i \,\|\, h'_l\right]\right)\right)}$$

where [·‖·] denotes vector concatenation and a is a trainable parameter of a single-layer feedforward neural network whose purpose is to map the high-dimensional node features into raw attention scores. The attention coefficients are then used for weighted aggregation:

$$z_i^{\Phi_k} = \sigma\left(\sum_{j \in N_i^{\alpha}} \alpha_{ij} \, h'_j\right)$$

Multi-semantic merging: after the feature representation of the node under each specific semantic is obtained, a multi-head mechanism is used to combine the multiple semantic information so as to complete the merging of the node's multi-semantic feature representation. It should be noted that this scheme assigns a different semantic learning task to each attention head, so that the number of attention heads in the model equals the number of meta-path types; this allows the model to learn the feature differences between different semantics on the basis of the self-stabilizing property of the multi-head mechanism. Specifically, the feature representations on the attention heads are aggregated by concatenation:

$$z_i = \big\Vert_{k=1}^{K} z_i^{\Phi_k}$$

In particular, if the multi-head attention mechanism is performed on the final (prediction) layer of the network, the semantic information on the attention heads is aggregated by averaging:

$$z_i = \sigma\left(\frac{1}{K}\sum_{k=1}^{K} z_i^{\Phi_k}\right)$$

At this point we have obtained the multi-semantic feature representation of v_i aggregated over its first-order neighborhood. Because the direct-neighborhood-based method needs a deeper model to explore a wider neighborhood, in most cases several specific-semantic learning layers are stacked to capture higher-order neighborhood information, and the multi-layer HNSE model is realized in residual form: the multi-semantic feature embedding of node v_i at layer (m+1) can be expressed as

$$z_i^{(m+1)} = z_i^{(m)} + \mathrm{HNSE}^{(m+1)}\left(z_i^{(m)}\right)$$
The multi-layer HNSE model is followed by a fully connected output layer (usually a softmax or logistic sigmoid for classification problems). Overall, the model resembles a multi-head attention network.
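For illustration, a condensed PyTorch sketch of one HNSE layer as described above follows: per-type linear mappings, attention over the α-exploration neighbor set of each meta-path, and concatenation over heads (averaging on the final layer). The class names, the dictionary-based storage of features and neighbor sets, and the choice of ELU as the activation σ are assumptions made for this sketch, not the definitive implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpecificSemanticHead(nn.Module):
    """One attention head: learns the specific semantics of a single meta-path."""

    def __init__(self, in_dims, out_dim):
        super().__init__()
        # one linear mapping M_phi(j) per node type, projecting into a shared space
        self.proj = nn.ModuleDict({t: nn.Linear(d, out_dim, bias=False)
                                   for t, d in in_dims.items()})
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)   # the vector a

    def forward(self, feats, node_types, neighbor_sets, i):
        """Embed node i from its alpha-exploration neighbor set (assumed to include i)."""
        h_i = self.proj[node_types[i]](feats[i])
        neigh = list(neighbor_sets[i])
        h_neigh = torch.stack([self.proj[node_types[j]](feats[j]) for j in neigh])
        # raw attention scores e_ij, then softmax over the neighborhood
        e = F.leaky_relu(self.attn(torch.cat(
            [h_i.expand_as(h_neigh), h_neigh], dim=-1))).squeeze(-1)
        alpha = torch.softmax(e, dim=0)
        return F.elu((alpha.unsqueeze(-1) * h_neigh).sum(dim=0))

class HNSELayer(nn.Module):
    """K heads, one per meta-path; concatenate heads, or average on the final layer."""

    def __init__(self, in_dims, out_dim, num_metapaths, final=False):
        super().__init__()
        self.heads = nn.ModuleList(
            [SpecificSemanticHead(in_dims, out_dim) for _ in range(num_metapaths)])
        self.final = final

    def forward(self, feats, node_types, neighbor_sets_per_metapath, i):
        z = [head(feats, node_types, neighbor_sets_per_metapath[k], i)
             for k, head in enumerate(self.heads)]
        return torch.stack(z).mean(0) if self.final else torch.cat(z, dim=-1)
```

In this sketch `feats` and `node_types` are dictionaries keyed by node id, and `neighbor_sets_per_metapath[k]` maps each node to the N_i^α set sampled under meta-path Φ_k.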
1.4 model training
After obtaining the final embedded representation Z of the nodes, it can be applied to different downstream tasks with different loss functions. For the semi-supervised node classification task, this scheme feeds the final embedding into a fully connected layer with a softmax function for node classification labeling, and minimizes the cross-entropy loss under the guidance of the labeled data:

$$L = -\sum_{i \in V_l} y_i \ln \hat{y}_i$$

where V_l is the set of labeled nodes, and y_i and ŷ_i are the true class label and the predicted class label of node v_i, respectively.
The algorithm is as follows:
INPUT: heterogeneous graph G, node features h_i, attention head number K, meta-path set Φ = {Φ_0, Φ_1, Φ_2, Φ_3, ..., Φ_K}
OUTPUT: node multi-semantic embeddings Z, attention α under specific semantics, semantic attention β
For attention head 0 to K do: assign the corresponding meta-path to the head, perform α-exploration and specific-semantic attention aggregation for each node, and then merge the head outputs as described above (the full pseudo-code is given as an image in the original; a sketch of the training procedure is shown below).
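A sketch of the semi-supervised training loop implied by the loss above is given below: the node embeddings are fed to a fully connected softmax classifier, and the cross-entropy is taken over the labeled node set V_l only. The function and argument names are placeholders, the model is assumed to expose the interface of the layer sketch given earlier, and the optimizer settings follow the hyperparameter section later in this description.

```python
import torch
import torch.nn as nn

def train_hnse(model, feats, node_types, neighbor_sets, labels, labeled_nodes,
               num_classes, embed_dim, epochs=200, lr=0.005, weight_decay=0.001):
    """Semi-supervised node classification: softmax classifier on top of the
    node embeddings, cross-entropy over the labeled node set V_l only."""
    classifier = nn.Linear(embed_dim, num_classes)
    params = list(model.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        optimizer.zero_grad()
        # embed every labeled node with the (multi-layer) HNSE model
        z = torch.stack([model(feats, node_types, neighbor_sets, i)
                         for i in labeled_nodes])
        y = torch.tensor([labels[i] for i in labeled_nodes])
        loss = loss_fn(classifier(z), y)     # cross-entropy L over V_l
        loss.backward()
        optimizer.step()
    return classifier
```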
2. Extension strategies
2.1 shared attention mechanism
A way to greatly reduce model parameters is to apply a shared attention weight to neighbor nodes of the same type. This scheme rewrites the similarity measure between node v_i and its neighbor ṽ_j as:

$$e_{ij} = \mathrm{LeakyReLU}\left(a_r^{\top}\left[h'_i \,\|\, h'_j\right]\right)$$

where r denotes the type of the link between v_i and ṽ_j. For direct neighbors, if two neighbors are both connected to node v_i by links of type r, they share the same attention vector a_r; meta-path neighbors are separated from direct neighbors, and all meta-path neighbors share one attention vector. The shared attention mechanism can effectively reduce model parameters and computation, but it applies the same attention weight by default to node neighbors linked by edges of the same type, so this approach works poorly on multi-semantic heterogeneous graphs where there are few meta-paths and most of the semantic information has to be learned from the direct neighborhood.
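An illustrative sketch of this shared scoring follows: one trainable attention vector a_r per link type, plus a single vector shared by all meta-path neighbors. The PyTorch class and its method names are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedAttentionScorer(nn.Module):
    """Attention scores with one trainable vector a_r per link type, plus a single
    vector shared by all meta-path neighbors (illustrative names and shapes)."""

    def __init__(self, dim, link_types):
        super().__init__()
        self.attn = nn.ParameterDict(
            {r: nn.Parameter(torch.randn(2 * dim)) for r in link_types})
        self.attn["metapath"] = nn.Parameter(torch.randn(2 * dim))

    def score(self, h_i, h_j, link_type):
        """e_ij = LeakyReLU(a_r^T [h_i || h_j]); use link_type='metapath' for
        neighbors obtained through a meta-path."""
        a = self.attn[link_type]
        return F.leaky_relu(torch.dot(a, torch.cat([h_i, h_j], dim=-1)))
```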
2.2 Multi-headed semantic attention divergence regularization
Disagreement on Inputs: the input divergence regularization term aims to achieve divergence between the attention heads by differentiating the input vectors of each attention head. In this scheme, divergence on the attention inputs is realized by computing the cosine similarity between the feature vectors of the input nodes on different attention heads and maximizing the average cosine distance between all pairs of attention heads; the applied regularization term is:

$$D_{in} = -\frac{1}{K^{2}} \sum_{p=1}^{K} \sum_{q=1}^{K} \frac{h_i'^{(p)} \cdot h_i'^{(q)}}{\left\| h_i'^{(p)} \right\| \left\| h_i'^{(q)} \right\|}$$

where h_i'^{(k)} denotes the vector representation of node i (of node type n) after the feature-space conversion on the k-th attention head, and K is the number of attention heads.

Disagreement on Outputs: similar to the disagreement on inputs, semantic divergence on the outputs is realized by maximizing the average cosine distance between the outputs z^{(k)} of each pair of attention heads:

$$D_{out} = -\frac{1}{K^{2}} \sum_{p=1}^{K} \sum_{q=1}^{K} \frac{z_i^{(p)} \cdot z_i^{(q)}}{\left\| z_i^{(p)} \right\| \left\| z_i^{(q)} \right\|}$$
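A minimal sketch of both divergence terms follows: it computes the mean pairwise cosine similarity across heads, so adding it to the loss (with some weight) maximizes the average cosine distance. The function name and the weighting of the penalty are assumptions made here.

```python
import torch.nn.functional as F

def disagreement_penalty(head_vectors):
    """Mean pairwise cosine similarity across the K attention heads.

    head_vectors has shape (K, d): per-head projected inputs h'_i for the
    input-divergence term, or per-head outputs z_i for the output-divergence term.
    Adding this penalty to the loss maximizes the average cosine distance between
    heads (the diagonal contributes a constant and is kept here for brevity)."""
    normed = F.normalize(head_vectors, dim=-1)   # unit-length rows
    sim = normed @ normed.t()                    # (K, K) cosine similarities
    return sim.mean()

# usage (illustrative): loss = task_loss + lam * disagreement_penalty(per_head_inputs)
```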
2.3 Multi-semantic self-attention layer
This section considers the difference in importance between semantics, which is particularly important on some heterogeneous graph structures. Since each attention head represents a specific semantic, given the K specific-semantic embeddings {z_i^{Φ_1}, ..., z_i^{Φ_K}} already output, this scheme improves the aggregation operation of multi-head attention by adding a self-attention layer that computes the importance differences of the specific semantics represented by the different attention heads, so as to learn the importance of each semantic for the task. In this scheme, the attention weight between semantics is computed by the following formula:

$$\beta_{\Phi_k} = \frac{\exp\left(w_{\Phi_k}\right)}{\sum_{k'=1}^{K} \exp\left(w_{\Phi_{k'}}\right)}, \qquad w_{\Phi_k} = \frac{1}{|V|} \sum_{i \in V} v^{\top} \tanh\left(W \, z_i^{\Phi_k} + b\right)$$

where v is a trainable semantic attention vector, W is a parameter matrix, b is a bias, and the attention scores of the node embeddings under a particular semantic are averaged before normalization. Finally, the K semantic embeddings are combined by attention-weighted summation:

$$z_i = \sum_{k=1}^{K} \beta_{\Phi_k} \, z_i^{\Phi_k}$$
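For illustration, a PyTorch sketch of this semantic-level self-attention follows, mirroring the formula above: per-semantic scores are averaged over all nodes, normalized by a softmax, and used to weight the K semantic embeddings. The hidden size of the projection and the class name are assumptions.

```python
import torch
import torch.nn as nn

class SemanticAttention(nn.Module):
    """Self-attention over the K specific-semantic embeddings.

    z: tensor of shape (N, K, d) holding every node's embedding under each of
    the K semantics; returns (N, d) fused embeddings and the K weights beta."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.project = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, 1, bias=False))  # v^T tanh(Wz + b)

    def forward(self, z):
        w = self.project(z).mean(dim=0)          # (K, 1): scores averaged over nodes
        beta = torch.softmax(w, dim=0)           # (K, 1): importance of each semantic
        return (beta.unsqueeze(0) * z).sum(dim=1), beta.squeeze(-1)
```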
3. Experiments
The experiments verify the effectiveness and efficiency of the multi-semantic heterogeneous graph embedding framework for representation learning.
3.1 Datasets
IMDB: entities in the dataset include actors, movies and directors, where the link types in which movies participate include movie–actor and movie–director (i.e., the direct-neighbor types of a movie node include actor and director), and the movies are classified into three categories according to their genre labels: action, comedy or drama. The features of a movie are defined as its set of keywords. Finally, two meta-path schemes, MAM and MDM, are set to perform the movie classification task.
DBLP: we construct a subset of the DBLP dataset; the entities in the dataset include papers, authors, conferences and terms, where each author belongs to one of four research areas: database, data mining, machine learning and information retrieval. The only link type involving authors is author–paper, the papers are published at 20 different conferences, each author is labeled according to the research area of his or her papers, and the features of an author are defined as the set of keywords of the papers the author has published. Here, APA, APCPA and APTPA are set as meta-path schemes to perform the author classification task.
AMiner: we build a subset of the AMiner dataset; the entities in the dataset include papers and authors, and the link types in which authors participate include author–author and author–paper. Similar to DBLP, the features of each paper in AMiner are its bag of keywords, and both papers and authors are labeled with one of four research areas: database, data mining, natural language processing and computer vision. Here APA and APCPA are taken as meta-path schemes to perform the author classification task.
3.2 comparative experiment
Comparisons will be made here with some of the latest baseline methods on the above three datasets, including (heterogeneous) network embedding methods and graph neural network based methods, to verify the validity of the proposed HNSE framework.
metapath2vec designs meta-paths to guide random walks in a heterogeneous graph and then uses a skip-gram model to learn latent space representations of the vertices.
GCN is a semi-supervised graph convolutional network designed for homogeneous graphs. We convert the homogeneous graph embedding methods (GCN and GAT) into models for heterogeneous graph embedding learning by ignoring the types of nodes and links, with the number of GCN layers set to 3.
GAT introduces an attention strategy to the GNN framework to enrich the feature representation of nodes by aggregating the features of direct neighbors. We set the number of GAT layers to 3 and the number of attention network nodes per layer to 8 x 8 in the comparative experiment.
HAN introduces two levels of hierarchical attention into the GNN, where node-level attention captures the relationships between neighboring nodes generated by one meta-path scheme, while semantic-level attention aggregates multiple meta-path schemes; for each node in the graph, the number of attention network nodes is set to 8 × 8.
HetSANN learns the feature representation of heterogeneous nodes by applying a graph neural network to focus on aggregating multi-relationship information of the projected neighborhood. We used a 3-layer attention mechanism in the comparative experiments, each consisting of 8 attention heads.
In addition, we also test several extended versions of HNSE, which respectively verify the validity of the embedding learning methods they contain.
Satt-HNSE applies shared attention weights to neighbor nodes of the same type on the basis of HNSE.
DI-HNSE performs divergence regularization of attention head inputs on the basis of HNSE.
DO-HNSE performs divergence regularization of the attention head output on the basis of HNSE.
M-HNSE changes the multi-semantic learning layer of HNSE into a self-attention layer so as to learn the importance difference of specific semantics represented by different attention heads.
3.3 Hyperparameter settings
For the proposed HNSE framework, the parameters are initialized randomly and the Adam optimizer is used; the learning rate is set to 0.005, the regularization parameter to 0.001, the initial node embedding size to 64, and the number of hidden units of the specific-semantic attention layer to 8. In addition, training is stopped if the validation loss does not decrease for 100 consecutive epochs.
Attention head number K: we uniformly set K to 8, but its composition differs across datasets. For example, the AMiner dataset has only one meta-path (i.e., only one kind of meta-path-based semantic information), so a minimum of one attention head is needed to learn all meta-path semantics; in this case that attention head is repeated 8 times. The DBLP dataset, however, has three meta-paths and therefore requires a minimum of three attention heads; here each of the three heads is repeated 3 times and the last one is dropped so that the total number of attention heads remains the same (8).
Tuning parameter in the α-exploration strategy: to verify the flexibility of HNSE, i.e., its ability to adjust across different datasets, the experiments assign different tuning parameters to the datasets: α_IMDB = 0.5, α_AMiner = 0.75, α_DBLP = 0.25.
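For reference, the settings of this section can be gathered as follows; the dictionary layout and key names are illustrative only.

```python
# Hyperparameters reported in this section, gathered for reference
# (dictionary layout and key names are illustrative, not part of the patent).
HNSE_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 0.005,
    "regularization": 0.001,          # weight decay
    "initial_embedding_size": 64,
    "semantic_attention_hidden_units": 8,
    "early_stopping_patience": 100,   # epochs without validation-loss improvement
    "attention_heads": 8,             # heads are assigned to meta-paths per dataset
    "alpha": {"IMDB": 0.5, "AMiner": 0.75, "DBLP": 0.25},
}
```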
3.4 Evaluation metrics
Each of the three datasets is randomly split into a training set, a validation set and a test set at a ratio of 0.8 : 0.1 : 0.1. The best parameter combination of each compared model is then selected on the validation set and evaluated on the test set with Micro F1 and Macro F1. For each model, the average performance over 10 repetitions is reported; the comparative results are shown in Table 1.
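As an illustration of this evaluation protocol, a small sketch using scikit-learn (an assumption, not named in the original) is given below; `predict_fn` is a placeholder for any trained classifier.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def split_and_score(node_ids, labels, predict_fn, seed=0):
    """Random 0.8 / 0.1 / 0.1 split, then Micro/Macro F1 on the test set.
    `predict_fn` maps a list of node ids to predicted labels (placeholder)."""
    train, rest = train_test_split(node_ids, test_size=0.2, random_state=seed)
    val, test = train_test_split(rest, test_size=0.5, random_state=seed)
    y_true = [labels[i] for i in test]
    y_pred = predict_fn(test)
    return (f1_score(y_true, y_pred, average="micro"),
            f1_score(y_true, y_pred, average="macro"))
```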
TABLE 1 Comparison of the performance of each model (the table is provided as an image in the original publication)
3.5 model analysis
HNSE and its variant models achieve the best results in all classification tasks, which also verifies the usability and flexibility of the models across different datasets. In the movie classification task on IMDB, M-HNSE improves over the best baseline, HetSANN, by 4.6% and 3.7% on Mic F1 and Mac F1 respectively. In the author classification task on AMiner, M-HNSE is higher than the best baseline, GAT, by 4.0% and 5.9% respectively; this is because GAT, which performs first-order neighborhood aggregation, is similar to HNSE when α is close to 1, and its difference from M-HNSE is only a layer of self-attention applied across the attention heads, which verifies that the importance differences between semantics are learnable and can be learned through a self-attention mechanism over the multi-head attention. In the author classification task on DBLP, DI-HNSE and DO-HNSE achieve the goal of learning specific semantics by differentiating the feature learning processes of different meta-paths, based on a more complete meta-path design scheme and a divergent multi-head attention mechanism; among them, DO-HNSE achieves the best effect, improving over the best baseline model, HAN, by 3.6% and 3% on Mic F1 and Mac F1 respectively. On this task, M-HNSE, the most complex HNSE variant, does not obtain the best precision, because in the DBLP dataset the multiple semantics contained in a node's neighborhood are clearly partitioned by the meta-paths; on the basis of defining one attention head per meta-path, the effect of applying a layer of attention to learn the differences between semantics is almost equal to that of applying divergence regularization to the attention heads, but the model complexity is increased.
Meanwhile, the classification performance of Satt-HNSE is slightly lower than that of the basic HNSE model on all three datasets, but it reduces the complexity of the model accordingly (on the IMDB dataset, the per-iteration time of the Satt-HNSE model is 37% lower than that of HNSE). Therefore, on some complex heterogeneous graphs where the extremely large number and variety of nodes make attention weight training difficult, sharing attention weights is still needed in some scenarios to reduce the training cost of the model.
3.6 analysis of parameters
By studying some parameters of the basic HNSE, we can understand the working principle of HNSE and the range over which the model varies.
Tuning parameter in the α-exploration strategy: in this scheme, the value of α is varied on the three datasets while other parameters are fixed, and the classification performance of HNSE on each dataset is reported. As shown in FIG. 3, as α increases, the F1 score first peaks on the DBLP dataset, then on IMDB, and finally on AMiner, which conforms to the sampling principle of the α-exploration strategy: the appropriate value of α depends on whether the meta-path construction of the dataset is complete and whether the multi-semantic information represented on different meta-paths is rich enough. For example, when a large α is used on the AMiner dataset, HNSE can only obtain the "co-authored the same paper" semantic information from the APA meta-path and ignores most of the semantic information in the direct neighborhood, resulting in poor prediction accuracy. GAT and HAN are selected for comparison in the figure because, at the two extreme α values, the model structure of HNSE is very close to these two methods, which to some extent explains why the two ends of the accuracy curves in the figure are close to the accuracy of GAT and HAN.
Attention head number K: generally, the more attention heads, the better the performance of the model. However, since the multi-head attention mechanism in HNSE is closely tied to the training data (the number of meta-paths), the influence of the attention head number K in M-HNSE on the Mic F1 of the IMDB and AMiner datasets is tested, as shown in FIG. 4. On the AMiner dataset the multi-head attention mechanism is mainly responsible for improving the stability of the model and its effect on precision is very small, but on the IMDB dataset the Mic F1 score of the model grows markedly as K increases exponentially, because IMDB has more complicated meta-path types than AMiner, so the model depends more on multiple attention heads to perform different semantic learning tasks, i.e., a division of labour along meta-paths; however, when K grows beyond a certain point, the model still shows an over-fitting trend. When the number of meta-paths implied in the dataset is 2 to 4, K = 8 is suggested, which balances the precision and the complexity of the model; of course, when facing a heterogeneous graph containing more complex semantics, the value of K may need to be increased appropriately.
The method embeds each vertex of the multi-semantic heterogeneous graph by aggregating direct/meta-path neighbor nodes of different types, and designs a node aggregation sampling strategy combining meta-path neighbors and direct neighbors for HNSE to guide the multi-head attention mechanism in HNSE; meanwhile, three variants are provided for HNSE, namely a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer. Comprehensive experiments of HNSE on three popular datasets show that the node classification precision of the proposed method on multi-semantic heterogeneous graphs is comprehensively superior to that of the latest methods.
The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (9)

1. The construction method of a universal embedding framework for multi-semantic heterogeneous graphs, characterized by comprising the following steps:
step 1: constructing a neighborhood exploration strategy, α-exploration, which smoothly splices the DFS and BFS exploration strategies to adapt to different heterogeneous network structures;
step 2: constructing an HNSE model based on α-exploration, wherein the HNSE model comprises an α-exploration neighborhood exploration layer, a multi-semantic learning layer and a node classification layer, and learns low-dimensional node embeddings while preserving the heterogeneous and semantic information of the nodes;
step 3: realizing a multi-layer HNSE model in residual form and connecting a fully connected output layer behind the multi-layer HNSE model;
step 4: constructing three extension strategies for HNSE, including a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer.
2. The method for building a universal embedding framework for multi-semantic heterogeneous graphs according to claim 1, characterized in that said neighborhood exploration strategy α-exploration comprises the following steps:
step 101: defining a parameter alpha for guiding the exploration direction;
step 102: for each direct neighbor V_j of node V_i in a given heterogeneous graph G under meta-path Φ, with probability α, performing no operation and jumping to the next direct neighbor node; with probability 1−α, performing biased walk sampling and replacing the original direct neighbor V_j with the meta-path neighbor obtained by the walk sampling;
Step 103: adding the direct neighbors into the neighbor set according to the probability of alpha; the meta-path neighbors are added to the set of neighbors with a probability of 1-alpha.
3. The method of claim 1, wherein the multi-semantic learning layer comprises specific semantic learning and multi-semantic merging; the two are applied at different positions of the framework; for a specific semantic of a node, α-exploration is used to explore the neighborhood of the node, and an attention mechanism is used to aggregate the obtained neighbor information; at the overall framework level, a multi-head mechanism is used to combine the different semantics of the nodes.
4. The method of claim 3, wherein the specific semantic learning takes meta path as guidance to learn specific semantics of nodes;
applying a layer of linear mapping to the node and to each node type, different from the node's own type, in its meta-path-based neighbor node set, so that nodes of different types are mapped into a unified feature space;
then calculating the attention coefficient between V_i and each neighbor, which maps the high-dimensional node features into raw attention scores;
finally, performing weighted aggregation with the attention coefficients.
5. The method of claim 3, wherein the multi-semantic merging specifically comprises: after obtaining the feature representations of the nodes under the specific semantics, combining the multiple semantic information by using a multi-head mechanism so as to complete the merging of the multi-semantic feature representation of the nodes; each attention head in the multi-head mechanism is assigned a different semantic learning task.
6. The method of claim 5, wherein the number of the attention heads is equal to the number of meta-paths, and if the multi-head attention mechanism is performed at the final layer of the network, the semantic information on each attention head is aggregated by averaging.
7. The method of claim 1, wherein the step 3 further comprises model training; the model training specifically comprises: after the final embedding of the nodes is obtained, it is applied to different downstream tasks and different loss functions are designed; for the semi-supervised node classification task, the final embedding is fed into a fully connected layer with a softmax function to produce node classification labels; the cross-entropy loss is minimized under the guidance of the labeled data.
8. The method of claim 1, wherein the shared attention mechanism is specifically: a shared attention weight is applied for neighboring nodes of the same type.
9. The method of claim 1, wherein the multi-semantic-self-attention layer calculates the difference in importance of the specific semantics represented by different heads of attention by improving the aggregation operation of multi-head attention to understand the importance of each semantic in the task.
CN202110215070.9A 2021-02-25 2021-02-25 Construction method of universal embedded framework of multi-semantic heterogeneous graph Pending CN112989842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110215070.9A CN112989842A (en) 2021-02-25 2021-02-25 Construction method of universal embedded framework of multi-semantic heterogeneous graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110215070.9A CN112989842A (en) 2021-02-25 2021-02-25 Construction method of universal embedded framework of multi-semantic heterogeneous graph

Publications (1)

Publication Number Publication Date
CN112989842A true CN112989842A (en) 2021-06-18

Family

ID=76350945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110215070.9A Pending CN112989842A (en) 2021-02-25 2021-02-25 Construction method of universal embedded framework of multi-semantic heterogeneous graph

Country Status (1)

Country Link
CN (1) CN112989842A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298319B (en) * 2021-06-22 2022-03-08 东北大学秦皇岛分校 Traffic speed prediction method based on skip map attention gating cycle network
CN113298319A (en) * 2021-06-22 2021-08-24 东北大学秦皇岛分校 Traffic speed prediction method based on skip map attention gating cycle network
CN113254803A (en) * 2021-06-24 2021-08-13 暨南大学 Social recommendation method based on multi-feature heterogeneous graph neural network
CN113869461B (en) * 2021-07-21 2024-03-12 中国人民解放军国防科技大学 Author migration classification method for scientific cooperation heterogeneous network
CN113869461A (en) * 2021-07-21 2021-12-31 中国人民解放军国防科技大学 Author migration and classification method for scientific cooperation heterogeneous network
CN113869992A (en) * 2021-12-03 2021-12-31 平安科技(深圳)有限公司 Artificial intelligence based product recommendation method and device, electronic equipment and medium
CN113869992B (en) * 2021-12-03 2022-03-18 平安科技(深圳)有限公司 Artificial intelligence based product recommendation method and device, electronic equipment and medium
CN114168804A (en) * 2021-12-17 2022-03-11 中国科学院自动化研究所 Similar information retrieval method and system based on heterogeneous subgraph neural network
CN114168804B (en) * 2021-12-17 2022-06-10 中国科学院自动化研究所 Similar information retrieval method and system based on heterogeneous subgraph neural network
CN114547408A (en) * 2022-01-18 2022-05-27 北京工业大学 Similar student searching method based on fine-grained student space-time behavior heterogeneous network representation
CN114547408B (en) * 2022-01-18 2024-04-02 北京工业大学 Similar student searching method based on fine-grained student space-time behavior heterogeneous network characterization
CN114756713A (en) * 2022-03-17 2022-07-15 哈尔滨工业大学(威海) Graph representation learning method based on multi-source interaction fusion
CN114936907A (en) * 2022-06-15 2022-08-23 山东大学 Commodity recommendation method and system based on node type interaction
CN114936907B (en) * 2022-06-15 2024-04-30 山东大学 Commodity recommendation method and system based on node type interaction
CN117390521A (en) * 2023-12-11 2024-01-12 福建理工大学 Social heterogeneous graph node classification method integrating deep semantic graph convolution
CN117390521B (en) * 2023-12-11 2024-03-19 福建理工大学 Social heterogeneous graph node classification method integrating deep semantic graph convolution

Similar Documents

Publication Publication Date Title
CN112989842A (en) Construction method of universal embedded framework of multi-semantic heterogeneous graph
Sun et al. Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems
Ghosh et al. The journey of graph kernels through two decades
CN113918832B (en) Graph convolution collaborative filtering recommendation system based on social relationship
CN113918833B (en) Product recommendation method realized through graph convolution collaborative filtering of social network relationship
Shi et al. Rhine: relation structure-aware heterogeneous information network embedding
CN113918834B (en) Graph convolution collaborative filtering recommendation method fusing social relations
Xiao et al. Link prediction based on feature representation and fusion
Kalintha et al. Kernelized evolutionary distance metric learning for semi-supervised clustering
CN112990431A (en) Neighborhood exploration method based on heterogeneous graph neural network
CN114461907A (en) Knowledge graph-based multi-element environment perception recommendation method and system
Liu et al. Effective model integration algorithm for improving link and sign prediction in complex networks
Chen et al. Heterogeneous graph convolutional network with local influence
Shi et al. Heterogeneous Graph Representation Learning and Applications
Gong et al. Exploring temporal information for dynamic network embedding
Li et al. Adaptive subgraph neural network with reinforced critical structure mining
Luo et al. Predicting protein-protein interactions using sequence and network information via variational graph autoencoder
Fu et al. Robust representation learning for heterogeneous attributed networks
Li et al. Self-supervised nodes-hyperedges embedding for heterogeneous information network learning
Chen et al. Gaussian mixture embedding of multiple node roles in networks
CN116257696A (en) Service recommendation method and system based on cross-modal knowledge graph comparison learning
Wang et al. Enabling inductive knowledge graph completion via structure-aware attention network
Ma et al. Self-supervised learning for heterogeneous graph via structure information based on metapath
Liu Large-scale machine learning for classification and search
Liu et al. An improved two-stage label propagation algorithm based on LeaderRank

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination