CN112989842A - Construction method of universal embedded framework of multi-semantic heterogeneous graph - Google Patents

Construction method of universal embedded framework of multi-semantic heterogeneous graph Download PDF

Info

Publication number
CN112989842A
Authority
CN
China
Prior art keywords
semantic
node
attention
hnse
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110215070.9A
Other languages
Chinese (zh)
Inventor
王瑞锦
张志扬
张凤荔
周世杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110215070.9A priority Critical patent/CN112989842A/en
Publication of CN112989842A publication Critical patent/CN112989842A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for constructing a universal embedding framework for multi-semantic heterogeneous graphs, which comprises the following steps. Step 1: construct a neighborhood exploration strategy, α-exploration, which smoothly splices the DFS and BFS exploration strategies to adapt to different heterogeneous network structures. Step 2: construct an HNSE model based on α-exploration, comprising an α-exploration neighborhood exploration layer, a multi-semantic learning layer and a node classification layer, which learns low-dimensional node embeddings while preserving the heterogeneous and semantic information of the nodes. Step 3: realize a multi-layer HNSE model in residual form and connect a fully connected output layer behind it. Step 4: construct three extension strategies for HNSE. The invention embeds each vertex of a multi-semantic heterogeneous graph by aggregating direct/meta-path neighbor nodes of different types, designs a node aggregation sampling strategy for HNSE that combines meta-path neighbors and direct neighbors so as to guide the multi-head attention mechanism in HNSE, and uses meta-paths to improve the capture of the multi-semantic information of the nodes.

Description

Construction method of universal embedded framework of multi-semantic heterogeneous graph
Technical Field
The invention relates to the field of graph neural networks, in particular to a method for constructing a universal embedding framework for multi-semantic heterogeneous graphs.
Background
Graph embedding work realizes applications such as node classification and link prediction on a topological graph by extracting deep feature representations of the nodes in the graph. With the multimodality of various network structures, the latest graph embedding methods gradually abandon the modeling approach of homogeneous information networks and instead model interconnected graph data as heterogeneous information networks composed of different types of nodes and edges, using the comprehensive structural information and rich semantic information in the network to discover more accurate knowledge. Compared with a homogeneous network, a heterogeneous network lets multiple types of objects and relations coexist and contains rich structural and semantic information, thereby providing a new accurate and interpretable way to discover hidden patterns. For example, the heterogeneous network of a recommendation system no longer has only two kinds of objects, users and commodities, but includes more comprehensive content such as shops and brands, and the relations no longer consist only of purchases but include finer-grained interactions such as collections and favorites. Based on this information, semantic mining methods such as meta-paths, meta-graphs and attributed heterogeneous networks can be used to generate more detailed knowledge discovery, for example improving the interpretability and accuracy of a recommendation system.
Heterogeneous graphs contain more than two types of nodes or edges. Due to the particularity of heterogeneous networks, many early representation learning methods for homogeneous networks cannot be directly applied to heterogeneous networks, and two main challenges exist:
(1) Heterogeneity of nodes and edges. Different types of nodes and edges represent different semantics, so representation learning on heterogeneous networks requires mapping different types of objects into different spaces. In addition, how to preserve the heterogeneous neighbors of each node and how to handle heterogeneous node sequences also deserve considerable attention.
(2) The problem of multi-semantic description brought by the rich information in heterogeneous networks. A heterogeneous network describes the semantics of nodes from multiple dimensions; how to effectively extract and utilize this multi-dimensional information and abstract it into semantic information assigned to the nodes, so as to obtain a comprehensive node representation, is also a great challenge.
A multi-semantic heterogeneous graph is more complex than a common heterogeneous graph: on top of its multiple edge/node attributes, every node contains multiple kinds of semantic information, so it is tempting to let the different link information a node participates in represent the multiple semantics contained in the node. This idea, however, has two disadvantages. First, it weakens one of the most fundamental elements of graph embedding: we work on a complex graph, not on a set of links. Second, even if a network embedding model based on link paths achieves an excellent effect on a certain graph structure, it is difficult to ensure that the model will work well on another graph structure; worse, when facing a graph structure on which it is difficult to establish effective links, such a method based on specific links cannot achieve a good effect no matter how the parameters are adjusted.
In recent years, deep neural networks have enjoyed great success in the fields of computer vision and natural language processing, and some works have started trying to model different types of data in homogeneous/heterogeneous networks using deep models. Compared with a shallow model, a deep model can better capture nonlinear relations and thus extract the complex semantic information contained in the nodes.
Graph convolutional networks propagate the structural information of the graph layer by layer by performing neighborhood convolution operations, which frees graph embedding methods from the constraints of link-guided learning; however, such works focus on improving the underlying graph neural network model, for example by introducing autoencoders, or on improving the sampling of nodes. They have made some progress, but they also point out that the meta-path is an element that is difficult to handle in heterogeneous graph representation learning: generally speaking, the meta-path splits the original structure of the graph, but it greatly simplifies the description of multiple kinds of semantic information. As a result, current multi-semantic heterogeneous network embedding models cannot take both graph structure capture and multi-semantic capture into account.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method for constructing a universal embedding framework for multi-semantic heterogeneous graphs.
The purpose of the invention is realized by the following technical scheme:
the construction method of the universal embedding framework for multi-semantic heterogeneous graphs comprises the following steps:
step 1: constructing a neighborhood exploration strategy, α-exploration, which smoothly splices the DFS and BFS exploration strategies to adapt to different heterogeneous network structures and capture specific-semantic neighbors;
step 2: constructing an HNSE model based on α-exploration, wherein the HNSE model comprises an α-exploration neighborhood exploration layer, a multi-semantic learning layer and a node classification layer, and learns low-dimensional node embeddings while preserving the heterogeneous and semantic information of the nodes;
step 3: realizing a multi-layer HNSE model in residual form and connecting a fully connected output layer behind the multi-layer HNSE model;
step 4: constructing three extension strategies for HNSE, including a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer, so as to meet the requirements of different types of heterogeneous networks on the embedding framework.
Further, the neighborhood exploration strategy α -exploration comprises the steps of:
step 101: defining a parameter alpha for guiding the exploration direction;
step 102: for each direct neighbor V_j of node V_i in a given heterogeneous graph G under meta-path Φ, with probability α, performing no operation and jumping to the next direct neighbor node; with probability 1−α, performing biased walk sampling and replacing the original direct neighbor V_j with the meta-path neighbor obtained by the walk sampling;
step 103: adding the direct neighbor to the neighbor set with probability α; adding the meta-path neighbor to the neighbor set with probability 1−α.
Further, the multi-semantic learning layer comprises specific semantic learning and multi-semantic merging; the two are applied at different positions of the framework; for a specific semantic of a node, α-exploration is used to explore the neighborhood of the node, and an attention mechanism is used to aggregate the obtained neighbor information; at the overall framework level, a multi-head mechanism is used to combine the different semantics of the nodes.
Further, the specific semantics learning takes the meta path as a guide to learn the specific semantics of the node;
applying a layer of linear mapping to the node and to each node type, different from the node's own type, in its meta-path-based neighbor node set, so that nodes of different types are mapped into a unified feature space;
then calculating the attention coefficient between V_i and each neighbor, which maps the high-dimensional node features into raw attention scores;
finally, performing weighted aggregation with the attention coefficients.
Further, the multi-semantic merging specifically comprises: after obtaining the feature representations of the nodes under the specific semantics, combining the multiple semantic information by using a multi-head mechanism so as to complete the merging of the multi-semantic feature representation of the nodes; each attention head in the multi-head mechanism is assigned a different semantic learning task.
Further, the number of the attention heads is equal to the number of the meta-paths, and if the multi-head attention mechanism is executed on the final layer of the network, semantic information on each attention head is aggregated in an averaging manner.
Further, the step 3 further comprises model training; the model training specifically comprises: after the final embedding of the nodes is obtained, it is applied to different downstream tasks and different loss functions are designed; for the semi-supervised node classification task, the final embedding is fed into a fully connected layer with a softmax function to produce node classification labels; the cross-entropy loss is minimized under the guidance of the labeled data.
Further, the shared attention mechanism is specifically as follows: a shared attention weight is applied for neighboring nodes of the same type.
Further, the multi-semantic self-attention layer calculates the importance difference of specific semantics represented by different attention heads by improving the aggregation operation of multi-head attention so as to know the importance of each semantic in the task.
The invention has the following beneficial effects: each vertex of the multi-semantic heterogeneous graph is embedded by aggregating direct/meta-path neighbor nodes of different types, and a node aggregation sampling strategy combining meta-path neighbors and direct neighbors is designed for HNSE to guide the multi-head attention mechanism in HNSE; meanwhile, three variants are provided for HNSE, namely a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer. Comprehensive experiments of HNSE on three popular datasets show that the node classification precision of the proposed method on multi-semantic heterogeneous graphs is comprehensively superior to that of the latest methods, and that meta-paths improve the capture of the multi-semantic information of the nodes.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is an architectural diagram of the HNSE of the present invention.
FIG. 3 is a graph comparing the classification performance of HNSE on various data sets.
Fig. 4 is a graph of the impact of the number of attention heads in M-HNSE on its Mic F1 on the IMDB and AMiner datasets.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
In this embodiment, as shown in fig. 1, the method for constructing a universal embedded framework of a multi-semantic-heterogeneous graph includes the following steps:
step 1: constructing a neighborhood exploration strategy, α-exploration, which smoothly splices the DFS and BFS exploration strategies to adapt to different heterogeneous network structures and capture specific-semantic neighbors;
step 2: constructing an HNSE model based on α-exploration, wherein the HNSE model comprises an α-exploration neighborhood exploration layer, a multi-semantic learning layer and a node classification layer, and learns low-dimensional node embeddings while preserving the heterogeneous and semantic information of the nodes;
step 3: realizing a multi-layer HNSE model in residual form and connecting a fully connected output layer behind the multi-layer HNSE model;
step 4: constructing three extension strategies for HNSE, including a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer, so as to meet the requirements of different types of heterogeneous networks on the embedding framework.
1. Framework structure
This scheme first gives the basic concepts and the problem definition of the multi-semantic heterogeneous graph, and then divides the HNSE framework into two parts, the α-exploration neighborhood exploration strategy and the multi-semantic learning model; finally, the framework can be spliced with any downstream task learning framework.
1.1 problem definition:
heterogeneous graph: a heterogeneous graph G = (V, ε) is composed of a set of vertices V and a set of edges ε, and contains a set of node types A and a set of edge types R satisfying |A| + |R| > 2. Each v ∈ V belongs to a node type in A, represented by a mapping function φ: V → A; each e ∈ ε belongs to an edge type in R, represented by a mapping function ψ: ε → R.
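For illustration only, a minimal Python sketch of such a typed graph and its two mapping functions follows; the class HeteroGraph and its method names are assumptions made for this example and are not part of the scheme. The small author/paper graph built at the end matches the example used later in Section 1.2.2.

```python
from collections import defaultdict

class HeteroGraph:
    """Minimal heterogeneous graph: typed nodes, typed edges, adjacency lists."""

    def __init__(self):
        self.node_type = {}             # phi: V -> A
        self.edge_type = {}             # psi: (u, v) -> R
        self.adj = defaultdict(set)     # direct neighbors

    def add_node(self, v, ntype):
        self.node_type[v] = ntype

    def add_edge(self, u, v, etype):
        self.edge_type[(u, v)] = etype
        self.edge_type[(v, u)] = etype  # treat edges as undirected
        self.adj[u].add(v)
        self.adj[v].add(u)

    def neighbors(self, v, ntype=None):
        """Direct neighbors of v, optionally filtered by node type."""
        if ntype is None:
            return set(self.adj[v])
        return {u for u in self.adj[v] if self.node_type[u] == ntype}

# A tiny author/paper graph satisfying |A| + |R| > 2
g = HeteroGraph()
for a in ["A1", "A2", "A3"]:
    g.add_node(a, "author")
for p in ["P1", "P2"]:
    g.add_node(p, "paper")
g.add_edge("A1", "P1", "writes")
g.add_edge("A3", "P1", "writes")
g.add_edge("A1", "P2", "writes")
g.add_edge("A2", "P2", "writes")
g.add_edge("A1", "A2", "co-author")
```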
1.2 neighborhood exploration strategy
1.2.1 neighbor definition of nodes
If the neighborhood aggregation sampling of a node is compared to random walk sampling, two neighbor exploration strategies can be defined on a heterogeneous graph. Neighborhood exploration based on meta-paths: the meta-path is a basic structure in heterogeneous graph networks and emphasizes the association with semantic information; one meta-path usually corresponds to one semantic, while a heterogeneous graph may contain multiple meta-paths. We denote a meta-path by Φ. For a node on the heterogeneous graph and a meta-path Φ, the group of neighbors determined by Φ is called the meta-path neighbors of the node with respect to Φ. Given a node i and a meta-path Φ in a heterogeneous graph, the meta-path-based neighbors of i, denoted N_i^Φ, are defined as the set of nodes connected to node i through the meta-path Φ; the meta-path-based neighbors of a node include the node itself. Meta-path neighbors can usually explore a larger range on a heterogeneous graph and better judge the structural similarity between nodes; moreover, the meta-path intuitively partitions semantic information, which gives it an important guiding role when learning a multi-semantic, complex heterogeneous network. Its disadvantage is that it splits the heterogeneous graph network structure, which weakens the structural equivalence between nodes.
Direct neighborhood exploration: the direct neighborhood N_i is the set of neighbors of node i in the graph; to keep the direct neighbors and the meta-path neighbors disjoint, and unlike the previous setup, the node itself is not added to its direct neighbor set. The direct neighborhood exploration strategy pays special attention to local information near a node on the heterogeneous graph; based on graph convolution or graph attention networks, this strategy can completely and effectively represent and learn the network structure, but it is limited to the local neighborhood of the node. If a larger area is to be explored, the number of network layers must be deepened or the single-layer parameters increased (raising the order of aggregated neighbors), which greatly increases model complexity and brings problems such as over-smoothing.
The direct neighbors better preserve the graph structure information around a node, while the meta-path neighbors more clearly distinguish the semantic information carried by the node. We can consider meta-path-based neighborhood exploration to be biased towards DFS (depth-first exploration), focusing on the macroscopic view of the node, while direct-neighbor-based neighborhood exploration favors BFS (breadth-first exploration), focusing on the microscopic view of the node. Based on this, the two neighborhood exploration strategies are smoothly spliced to adapt to different heterogeneous networks; whether to explore the heterogeneous graph in a manner biased towards DFS or BFS is decided by a parameter α.
1.2.2 α-exploration strategy
Given a heterogeneous graph G = (V, ε) and a meta-path Φ = (A_1 → A_2 → ... → A_l), for node v_i and each of its direct neighbors v_j: with probability α, no operation is performed and we jump to the next direct neighbor node; with probability 1−α, biased walk sampling is performed as follows:

$$p\left(v^{t+1} \mid v^{t}, \Phi\right) = \begin{cases} \frac{1}{\left|N_{\Phi}^{t+1}\left(v^{t}\right)\right|}, & \left(v^{t+1}, v^{t}\right) \in \varepsilon,\; \phi\left(v^{t+1}\right) = A_{t+1} \\ 0, & \text{otherwise} \end{cases}$$

where N_Φ^{t+1}(v^t) denotes the direct neighbors of v^t whose node type is A_{t+1}. The meta-path-based random walk starts from node v_j, but the guiding meta-path takes node v_i as its start node rather than v_j. The meta-path is set here to a symmetric format to better capture the structural similarity between nodes:

$$\Phi = \left(A_1 \rightarrow A_2 \rightarrow \cdots \rightarrow A_l \rightarrow \cdots \rightarrow A_2 \rightarrow A_1\right)$$

After the biased walk sampling is completed, we replace the direct neighbor with the meta-path neighbor obtained by the walk sampling:

$$v_j \leftarrow \tilde{v}_j, \qquad \phi\left(\tilde{v}_j\right) = \phi\left(v_i\right),$$

where ṽ_j is reached from v_j along the meta-path Φ. Finally, we add the direct neighbor to the neighbor set N_i^α with probability α; with probability 1−α, we add the meta-path neighbor to N_i^α.
The algorithm takes as input the heterogeneous graph G, the meta-path Φ and a node v_i, and outputs the node's neighbor set N_i^α; it follows the sampling steps described above (the full pseudo-code is given as an image in the original, and a sketch follows the example below).
For example, using α-exploration to explore the neighbors of node A_1 with the guiding meta-path Φ = Author → Paper → Author (APA): its direct neighbors are N_i = {P_1, P_2, A_2} and its meta-path neighbors under APA are {A_2, A_3}. For P_1, with probability α it is added directly to N_i^α; otherwise, following the meta-path "A_1 → P_1 → A_3", A_3 is added to N_i^α. For P_2, with probability α it is added directly to N_i^α; otherwise, following the meta-path "A_1 → P_2 → A_2", A_2 is added to N_i^α. For A_2, since "A_1 → A_2 → any node" cannot form the guiding meta-path, A_2 is added directly to N_i^α. The final N_i^α may therefore be one of: {P_1, P_2, A_2}, {A_3, P_2, A_2}, {P_1, A_2}, {A_3, A_2}.
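For illustration, the sampling above can be sketched in Python as follows, reusing the hypothetical HeteroGraph from the earlier sketch; here a meta-path is given as a sequence of node types, and all function names are assumptions. On the example graph, exploring A_1 under APA reproduces the neighbor sets listed above.

```python
import random

def metapath_walk(g, start, type_seq, forbid=None):
    """One random walk from `start` following the node types in `type_seq`
    (the first type must be the type of `start`); nodes in `forbid` are skipped.
    Returns the end node of the walk, or None if the walk cannot be completed."""
    if g.node_type[start] != type_seq[0]:
        return None
    forbid = forbid or set()
    cur = start
    for ntype in type_seq[1:]:
        candidates = [u for u in g.neighbors(cur, ntype) if u not in forbid]
        if not candidates:
            return None
        cur = random.choice(candidates)
    return cur

def alpha_exploration(g, v_i, metapath, alpha):
    """Neighbor set N_i^alpha: each direct neighbor v_j is kept with probability
    alpha, otherwise it is replaced by the meta-path neighbor reached through v_j
    (when the guiding meta-path, started at v_i, can pass through v_j)."""
    neighbor_set = set()
    for v_j in g.neighbors(v_i):
        # v_j must match the second type of the meta-path to continue the walk
        can_walk = len(metapath) > 1 and g.node_type[v_j] == metapath[1]
        if not can_walk or random.random() < alpha:
            neighbor_set.add(v_j)                      # keep the direct neighbor
        else:
            # continue the walk from v_j along the remaining node types
            tail = metapath_walk(g, v_j, metapath[1:], forbid={v_i})
            neighbor_set.add(tail if tail is not None else v_j)
    return neighbor_set

# e.g. exploring A1 with the APA meta-path, as in the example above
print(alpha_exploration(g, "A1", ["author", "paper", "author"], alpha=0.5))
```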
It can be clearly seen that, as α changes, the neighborhood exploration strategy shifts gradually from DFS to BFS. Thus, if the same α is used on different datasets, the variation in the α value profoundly affects the prediction performance of the downstream embedding framework, and this difference will be seen on the different datasets. Therefore, for datasets with a reasonable and complete meta-path design scheme, α should be set to a smaller value, biasing the neighborhood exploration strategy towards DFS; for datasets with few or incomplete meta-paths, α should be set to a larger value, biasing the strategy towards BFS.
1.3 HNSE
As shown in FIG. 2, which gives the HNSE model architecture diagram preferred in this scheme, the model aims to keep the heterogeneous information and semantic information of the nodes while learning a low-dimensional embedded representation of each node, $h'_i \in \mathbb{R}^{n_{\phi(i)}}$, where n_{φ(i)} is the embedding-space dimension for node type φ(i).
In order to learn the multi-semantic information of nodes, this scheme provides two methods: one separates and learns semantic information by using meta-paths, and the other combines different semantic information by using a multi-head mechanism; the two methods are applied at different positions of the model. For a specific semantic of a node, α-exploration is used to explore the neighborhood of the node, and an attention mechanism aggregates the obtained neighbor information, which helps the model represent the neighbor information of the node more flexibly and effectively; at the overall model level, a multi-head mechanism is used to combine the different semantics of the nodes.
Specific semantic learning: the meta-path Φ_k is used as a guide to learn the specific semantics of node v_i. In order to preserve the heterogeneous information of the node, for node v_i and the neighbor set obtained from Φ_k by α-exploration, a layer of linear mapping M_{φ(j)} is applied for each node type φ(j) different from the node's own type, so as to map nodes of different types into a unified feature space:

$$h'_j = M_{\phi(j)} \cdot h_j$$

In particular, for the node itself and for neighbor nodes of the same type as the node, we use M_{φ(i)} as the mapping matrix; for the node itself, this is equivalent to adding a self-loop. Then, the attention coefficient between v_i and each of its neighbors is computed one by one:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[h'_i \,\|\, h'_j\right]\right)\right)}{\sum_{l \in N_i^{\alpha}} \exp\left(\mathrm{LeakyReLU}\left(a^{\top}\left[h'_i \,\|\, h'_l\right]\right)\right)}$$

where [·‖·] denotes vector concatenation and a is a trainable parameter of a single-layer feedforward neural network whose purpose is to map the high-dimensional node features into raw attention scores. The attention coefficients are then used for weighted aggregation:

$$z_i^{\Phi_k} = \sigma\left(\sum_{j \in N_i^{\alpha}} \alpha_{ij} \, h'_j\right)$$

Multi-semantic merging: after the feature representation of the node under each specific semantic is obtained, a multi-head mechanism is used to combine the multiple semantic information so as to complete the merging of the node's multi-semantic feature representation. It should be noted that this scheme assigns a different semantic learning task to each attention head, so that the number of attention heads in the model equals the number of meta-path types; this allows the model to learn the feature differences between different semantics on the basis of the self-stabilizing property of the multi-head mechanism. Specifically, the feature representations on the attention heads are aggregated by concatenation:

$$z_i = \big\Vert_{k=1}^{K} z_i^{\Phi_k}$$

In particular, if the multi-head attention mechanism is performed on the final (prediction) layer of the network, the semantic information on the attention heads is aggregated by averaging:

$$z_i = \sigma\left(\frac{1}{K}\sum_{k=1}^{K} z_i^{\Phi_k}\right)$$

At this point we have obtained the multi-semantic feature representation of v_i aggregated over its first-order neighborhood. Because the direct-neighborhood-based method needs a deeper model to explore a wider neighborhood, in most cases several specific-semantic learning layers are stacked to capture higher-order neighborhood information, and the multi-layer HNSE model is realized in residual form: the multi-semantic feature embedding of node v_i at layer (m+1) can be expressed as

$$z_i^{(m+1)} = z_i^{(m)} + \mathrm{HNSE}^{(m+1)}\left(z_i^{(m)}\right)$$
The multi-layer HNSE model is followed by a fully connected output layer (usually a softmax or logistic sigmoid for classification problems). Overall, the model resembles a multi-head attention network.
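For illustration, a condensed PyTorch sketch of one HNSE layer as described above follows: per-type linear mappings, attention over the α-exploration neighbor set of each meta-path, and concatenation over heads (averaging on the final layer). The class names, the dictionary-based storage of features and neighbor sets, and the choice of ELU as the activation σ are assumptions made for this sketch, not the definitive implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpecificSemanticHead(nn.Module):
    """One attention head: learns the specific semantics of a single meta-path."""

    def __init__(self, in_dims, out_dim):
        super().__init__()
        # one linear mapping M_phi(j) per node type, projecting into a shared space
        self.proj = nn.ModuleDict({t: nn.Linear(d, out_dim, bias=False)
                                   for t, d in in_dims.items()})
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)   # the vector a

    def forward(self, feats, node_types, neighbor_sets, i):
        """Embed node i from its alpha-exploration neighbor set (assumed to include i)."""
        h_i = self.proj[node_types[i]](feats[i])
        neigh = list(neighbor_sets[i])
        h_neigh = torch.stack([self.proj[node_types[j]](feats[j]) for j in neigh])
        # raw attention scores e_ij, then softmax over the neighborhood
        e = F.leaky_relu(self.attn(torch.cat(
            [h_i.expand_as(h_neigh), h_neigh], dim=-1))).squeeze(-1)
        alpha = torch.softmax(e, dim=0)
        return F.elu((alpha.unsqueeze(-1) * h_neigh).sum(dim=0))

class HNSELayer(nn.Module):
    """K heads, one per meta-path; concatenate heads, or average on the final layer."""

    def __init__(self, in_dims, out_dim, num_metapaths, final=False):
        super().__init__()
        self.heads = nn.ModuleList(
            [SpecificSemanticHead(in_dims, out_dim) for _ in range(num_metapaths)])
        self.final = final

    def forward(self, feats, node_types, neighbor_sets_per_metapath, i):
        z = [head(feats, node_types, neighbor_sets_per_metapath[k], i)
             for k, head in enumerate(self.heads)]
        return torch.stack(z).mean(0) if self.final else torch.cat(z, dim=-1)
```

In this sketch `feats` and `node_types` are dictionaries keyed by node id, and `neighbor_sets_per_metapath[k]` maps each node to the N_i^α set sampled under meta-path Φ_k.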
1.4 model training
After obtaining the final embedded representation Z of the nodes, it can be applied to different downstream tasks with different loss functions. For the semi-supervised node classification task, this scheme feeds the final embedding into a fully connected layer with a softmax function for node classification labeling, and minimizes the cross-entropy loss under the guidance of the labeled data:

$$L = -\sum_{i \in V_l} y_i \ln \hat{y}_i$$

where V_l is the set of labeled nodes, and y_i and ŷ_i are the true class label and the predicted class label of node v_i, respectively.
The algorithm is as follows:
INPUT: heterogeneous graph G, node features h_i, attention head number K, meta-path set Φ = {Φ_0, Φ_1, Φ_2, Φ_3, ..., Φ_K}
OUTPUT: node multi-semantic embeddings Z, attention α under specific semantics, semantic attention β
For attention head 0 to K do: assign the corresponding meta-path to the head, perform α-exploration and specific-semantic attention aggregation for each node, and then merge the head outputs as described above (the full pseudo-code is given as an image in the original; a sketch of the training procedure is shown below).
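A sketch of the semi-supervised training loop implied by the loss above is given below: the node embeddings are fed to a fully connected softmax classifier, and the cross-entropy is taken over the labeled node set V_l only. The function and argument names are placeholders, the model is assumed to expose the interface of the layer sketch given earlier, and the optimizer settings follow the hyperparameter section later in this description.

```python
import torch
import torch.nn as nn

def train_hnse(model, feats, node_types, neighbor_sets, labels, labeled_nodes,
               num_classes, embed_dim, epochs=200, lr=0.005, weight_decay=0.001):
    """Semi-supervised node classification: softmax classifier on top of the
    node embeddings, cross-entropy over the labeled node set V_l only."""
    classifier = nn.Linear(embed_dim, num_classes)
    params = list(model.parameters()) + list(classifier.parameters())
    optimizer = torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        optimizer.zero_grad()
        # embed every labeled node with the (multi-layer) HNSE model
        z = torch.stack([model(feats, node_types, neighbor_sets, i)
                         for i in labeled_nodes])
        y = torch.tensor([labels[i] for i in labeled_nodes])
        loss = loss_fn(classifier(z), y)     # cross-entropy L over V_l
        loss.backward()
        optimizer.step()
    return classifier
```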
2. Extension strategies
2.1 shared attention mechanism
A way to greatly reduce model parameters is to apply a shared attention weight to neighbor nodes of the same type. This scheme rewrites the similarity measure between node v_i and its neighbor ṽ_j as:

$$e_{ij} = \mathrm{LeakyReLU}\left(a_r^{\top}\left[h'_i \,\|\, h'_j\right]\right)$$

where r denotes the type of the link between v_i and ṽ_j. For direct neighbors, if two neighbors are both connected to node v_i by links of type r, they share the same attention vector a_r; meta-path neighbors are separated from direct neighbors, and all meta-path neighbors share one attention vector. The shared attention mechanism can effectively reduce model parameters and computation, but it applies the same attention weight by default to node neighbors linked by edges of the same type, so this approach works poorly on multi-semantic heterogeneous graphs where there are few meta-paths and most of the semantic information has to be learned from the direct neighborhood.
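An illustrative sketch of this shared scoring follows: one trainable attention vector a_r per link type, plus a single vector shared by all meta-path neighbors. The PyTorch class and its method names are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedAttentionScorer(nn.Module):
    """Attention scores with one trainable vector a_r per link type, plus a single
    vector shared by all meta-path neighbors (illustrative names and shapes)."""

    def __init__(self, dim, link_types):
        super().__init__()
        self.attn = nn.ParameterDict(
            {r: nn.Parameter(torch.randn(2 * dim)) for r in link_types})
        self.attn["metapath"] = nn.Parameter(torch.randn(2 * dim))

    def score(self, h_i, h_j, link_type):
        """e_ij = LeakyReLU(a_r^T [h_i || h_j]); use link_type='metapath' for
        neighbors obtained through a meta-path."""
        a = self.attn[link_type]
        return F.leaky_relu(torch.dot(a, torch.cat([h_i, h_j], dim=-1)))
```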
2.2 Multi-headed semantic attention divergence regularization
Disagreement on Inputs: the input divergence regularization term aims to achieve divergence between the attention heads by differentiating the input vectors of each attention head. In this scheme, divergence on the attention inputs is realized by computing the cosine similarity between the feature vectors of the input nodes on different attention heads and maximizing the average cosine distance between all pairs of attention heads; the applied regularization term is:

$$D_{in} = -\frac{1}{K^{2}} \sum_{p=1}^{K} \sum_{q=1}^{K} \frac{h_i'^{(p)} \cdot h_i'^{(q)}}{\left\| h_i'^{(p)} \right\| \left\| h_i'^{(q)} \right\|}$$

where h_i'^{(k)} denotes the vector representation of node i (of node type n) after the feature-space conversion on the k-th attention head, and K is the number of attention heads.

Disagreement on Outputs: similar to the disagreement on inputs, semantic divergence on the outputs is realized by maximizing the average cosine distance between the outputs z^{(k)} of each pair of attention heads:

$$D_{out} = -\frac{1}{K^{2}} \sum_{p=1}^{K} \sum_{q=1}^{K} \frac{z_i^{(p)} \cdot z_i^{(q)}}{\left\| z_i^{(p)} \right\| \left\| z_i^{(q)} \right\|}$$
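A minimal sketch of both divergence terms follows: it computes the mean pairwise cosine similarity across heads, so adding it to the loss (with some weight) maximizes the average cosine distance. The function name and the weighting of the penalty are assumptions made here.

```python
import torch.nn.functional as F

def disagreement_penalty(head_vectors):
    """Mean pairwise cosine similarity across the K attention heads.

    head_vectors has shape (K, d): per-head projected inputs h'_i for the
    input-divergence term, or per-head outputs z_i for the output-divergence term.
    Adding this penalty to the loss maximizes the average cosine distance between
    heads (the diagonal contributes a constant and is kept here for brevity)."""
    normed = F.normalize(head_vectors, dim=-1)   # unit-length rows
    sim = normed @ normed.t()                    # (K, K) cosine similarities
    return sim.mean()

# usage (illustrative): loss = task_loss + lam * disagreement_penalty(per_head_inputs)
```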
2.3 Multi-semantic self-attention layer
This section considers the difference in importance between semantics, which is particularly important on some heterogeneous graph structures. Since each attention head represents a specific semantic, given the K specific-semantic embeddings {z_i^{Φ_1}, ..., z_i^{Φ_K}} already output, this scheme improves the aggregation operation of multi-head attention by adding a self-attention layer that computes the importance differences of the specific semantics represented by the different attention heads, so as to learn the importance of each semantic for the task. In this scheme, the attention weight between semantics is computed by the following formula:

$$\beta_{\Phi_k} = \frac{\exp\left(w_{\Phi_k}\right)}{\sum_{k'=1}^{K} \exp\left(w_{\Phi_{k'}}\right)}, \qquad w_{\Phi_k} = \frac{1}{|V|} \sum_{i \in V} v^{\top} \tanh\left(W \, z_i^{\Phi_k} + b\right)$$

where v is a trainable semantic attention vector, W is a parameter matrix, b is a bias, and the attention scores of the node embeddings under a particular semantic are averaged before normalization. Finally, the K semantic embeddings are combined by attention-weighted summation:

$$z_i = \sum_{k=1}^{K} \beta_{\Phi_k} \, z_i^{\Phi_k}$$
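For illustration, a PyTorch sketch of this semantic-level self-attention follows, mirroring the formula above: per-semantic scores are averaged over all nodes, normalized by a softmax, and used to weight the K semantic embeddings. The hidden size of the projection and the class name are assumptions.

```python
import torch
import torch.nn as nn

class SemanticAttention(nn.Module):
    """Self-attention over the K specific-semantic embeddings.

    z: tensor of shape (N, K, d) holding every node's embedding under each of
    the K semantics; returns (N, d) fused embeddings and the K weights beta."""

    def __init__(self, dim, hidden=128):
        super().__init__()
        self.project = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, 1, bias=False))  # v^T tanh(Wz + b)

    def forward(self, z):
        w = self.project(z).mean(dim=0)          # (K, 1): scores averaged over nodes
        beta = torch.softmax(w, dim=0)           # (K, 1): importance of each semantic
        return (beta.unsqueeze(0) * z).sum(dim=1), beta.squeeze(-1)
```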
3. Experiments
The experiments verify the effectiveness and efficiency of the multi-semantic heterogeneous graph embedding framework for representation learning.
3.1 Datasets
IMDB: entities in the dataset include actors, movies and directors, where the link types in which movies participate include movie–actor and movie–director (i.e., the direct-neighbor types of a movie node include actor and director), and the movies are classified into three categories according to their genre labels: action, comedy or drama. The features of a movie are defined as its set of keywords. Finally, two meta-path schemes, MAM and MDM, are set to perform the movie classification task.
DBLP: we construct a subset of the DBLP dataset; the entities in the dataset include papers, authors, conferences and terms, where each author belongs to one of four research areas: database, data mining, machine learning and information retrieval. The only link type involving authors is author–paper, the papers are published at 20 different conferences, each author is labeled according to the research area of his or her papers, and the features of an author are defined as the set of keywords of the papers the author has published. Here, APA, APCPA and APTPA are set as meta-path schemes to perform the author classification task.
AMiner: we build a subset of the AMiner dataset; the entities in the dataset include papers and authors, and the link types in which authors participate include author–author and author–paper. Similar to DBLP, the features of each paper in AMiner are its bag of keywords, and both papers and authors are labeled with one of four research areas: database, data mining, natural language processing and computer vision. Here APA and APCPA are taken as meta-path schemes to perform the author classification task.
3.2 comparative experiment
Comparisons will be made here with some of the latest baseline methods on the above three datasets, including (heterogeneous) network embedding methods and graph neural network based methods, to verify the validity of the proposed HNSE framework.
metapath2vec designs meta-paths to guide random walks in a heterogeneous graph and then uses a skip-gram model to learn latent space representations of the vertices.
GCN is a semi-supervised graph convolutional network designed for homogeneous graphs. We convert the homogeneous graph embedding methods (GCN and GAT) into models for heterogeneous graph embedding learning by ignoring the types of nodes and links, with the number of GCN layers set to 3.
GAT introduces an attention strategy to the GNN framework to enrich the feature representation of nodes by aggregating the features of direct neighbors. We set the number of GAT layers to 3 and the number of attention network nodes per layer to 8 x 8 in the comparative experiment.
HAN introduces two levels of hierarchical attention into the GNN, where node-level attention captures the relationships between neighboring nodes generated by one meta-path scheme, while semantic-level attention aggregates multiple meta-path schemes; for each node in the graph, the number of attention network nodes is set to 8 × 8.
HetSANN learns the feature representation of heterogeneous nodes by applying a graph neural network to focus on aggregating multi-relationship information of the projected neighborhood. We used a 3-layer attention mechanism in the comparative experiments, each consisting of 8 attention heads.
In addition, we also test several extended versions of HNSE, which respectively verify the validity of the embedding learning methods they contain.
Satt-HNSE applies shared attention weights to neighbor nodes of the same type on the basis of HNSE.
DI-HNSE performs divergence regularization of attention head inputs on the basis of HNSE.
DO-HNSE performs divergence regularization of the attention head output on the basis of HNSE.
M-HNSE changes the multi-semantic learning layer of HNSE into a self-attention layer so as to learn the importance difference of specific semantics represented by different attention heads.
3.3 Hyperparameter settings
For the proposed HNSE framework, the parameters are initialized randomly and the Adam optimizer is used; the learning rate is set to 0.005, the regularization parameter to 0.001, the initial node embedding size to 64, and the number of hidden units of the specific-semantic attention layer to 8. In addition, training is stopped if the validation loss does not decrease for 100 consecutive epochs.
Attention head number K: we uniformly set K to 8, but its composition differs across datasets. For example, the AMiner dataset has only one meta-path (i.e., only one kind of meta-path-based semantic information), so a minimum of one attention head is needed to learn all meta-path semantics; in this case that attention head is repeated 8 times. The DBLP dataset, however, has three meta-paths and therefore requires a minimum of three attention heads; here each of the three heads is repeated 3 times and the last one is dropped so that the total number of attention heads remains the same (8).
Tuning parameter in the α-exploration strategy: to verify the flexibility of HNSE, i.e., its ability to adjust across different datasets, the experiments assign different tuning parameters to the datasets: α_IMDB = 0.5, α_AMiner = 0.75, α_DBLP = 0.25.
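For reference, the settings of this section can be gathered as follows; the dictionary layout and key names are illustrative only.

```python
# Hyperparameters reported in this section, gathered for reference
# (dictionary layout and key names are illustrative, not part of the patent).
HNSE_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 0.005,
    "regularization": 0.001,          # weight decay
    "initial_embedding_size": 64,
    "semantic_attention_hidden_units": 8,
    "early_stopping_patience": 100,   # epochs without validation-loss improvement
    "attention_heads": 8,             # heads are assigned to meta-paths per dataset
    "alpha": {"IMDB": 0.5, "AMiner": 0.75, "DBLP": 0.25},
}
```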
3.4 Evaluation metrics
Each of the three datasets is randomly split into a training set, a validation set and a test set at a ratio of 0.8 : 0.1 : 0.1. The best parameter combination of each compared model is then selected on the validation set and evaluated on the test set with Micro F1 and Macro F1. For each model, the average performance over 10 repetitions is reported; the comparative results are shown in Table 1.
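As an illustration of this evaluation protocol, a small sketch using scikit-learn (an assumption, not named in the original) is given below; `predict_fn` is a placeholder for any trained classifier.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def split_and_score(node_ids, labels, predict_fn, seed=0):
    """Random 0.8 / 0.1 / 0.1 split, then Micro/Macro F1 on the test set.
    `predict_fn` maps a list of node ids to predicted labels (placeholder)."""
    train, rest = train_test_split(node_ids, test_size=0.2, random_state=seed)
    val, test = train_test_split(rest, test_size=0.5, random_state=seed)
    y_true = [labels[i] for i in test]
    y_pred = predict_fn(test)
    return (f1_score(y_true, y_pred, average="micro"),
            f1_score(y_true, y_pred, average="macro"))
```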
TABLE 1 Comparison of the performance of each model (the table is provided as an image in the original publication)
3.5 model analysis
HNSE and its variant models achieve the best results in all classification tasks, which also verifies the usability and flexibility of the models across different datasets. In the movie classification task on IMDB, M-HNSE improves over the best baseline, HetSANN, by 4.6% and 3.7% on Mic F1 and Mac F1 respectively. In the author classification task on AMiner, M-HNSE is higher than the best baseline, GAT, by 4.0% and 5.9% respectively; this is because GAT, which performs first-order neighborhood aggregation, is similar to HNSE when α is close to 1, and its difference from M-HNSE is only a layer of self-attention applied across the attention heads, which verifies that the importance differences between semantics are learnable and can be learned through a self-attention mechanism over the multi-head attention. In the author classification task on DBLP, DI-HNSE and DO-HNSE achieve the goal of learning specific semantics by differentiating the feature learning processes of different meta-paths, based on a more complete meta-path design scheme and a divergent multi-head attention mechanism; among them, DO-HNSE achieves the best effect, improving over the best baseline model, HAN, by 3.6% and 3% on Mic F1 and Mac F1 respectively. On this task, M-HNSE, the most complex HNSE variant, does not obtain the best precision, because in the DBLP dataset the multiple semantics contained in a node's neighborhood are clearly partitioned by the meta-paths; on the basis of defining one attention head per meta-path, the effect of applying a layer of attention to learn the differences between semantics is almost equal to that of applying divergence regularization to the attention heads, but the model complexity is increased.
Meanwhile, the classification performance of Satt-HNSE is slightly lower than that of the basic HNSE model on all three datasets, but it reduces the complexity of the model accordingly (on the IMDB dataset, the per-iteration time of the Satt-HNSE model is 37% lower than that of HNSE). Therefore, on some complex heterogeneous graphs where the extremely large number and variety of nodes make attention weight training difficult, sharing attention weights is still needed in some scenarios to reduce the training cost of the model.
3.6 analysis of parameters
By studying some parameters of the basic HNSE, we can understand the working principle of HNSE and the range over which the model varies.
Tuning parameter in the α-exploration strategy: in this scheme, the value of α is varied on the three datasets while other parameters are fixed, and the classification performance of HNSE on each dataset is reported. As shown in FIG. 3, as α increases, the F1 score first peaks on the DBLP dataset, then on IMDB, and finally on AMiner, which conforms to the sampling principle of the α-exploration strategy: the appropriate value of α depends on whether the meta-path construction of the dataset is complete and whether the multi-semantic information represented on different meta-paths is rich enough. For example, when a large α is used on the AMiner dataset, HNSE can only obtain the "co-authored the same paper" semantic information from the APA meta-path and ignores most of the semantic information in the direct neighborhood, resulting in poor prediction accuracy. GAT and HAN are selected for comparison in the figure because, at the two extreme α values, the model structure of HNSE is very close to these two methods, which to some extent explains why the two ends of the accuracy curves in the figure are close to the accuracy of GAT and HAN.
Attention head number K: generally, the more attention heads, the better the performance of the model. However, since the multi-head attention mechanism in HNSE is closely tied to the training data (the number of meta-paths), the influence of the attention head number K in M-HNSE on the Mic F1 of the IMDB and AMiner datasets is tested, as shown in FIG. 4. On the AMiner dataset the multi-head attention mechanism is mainly responsible for improving the stability of the model and its effect on precision is very small, but on the IMDB dataset the Mic F1 score of the model grows markedly as K increases exponentially, because IMDB has more complicated meta-path types than AMiner, so the model depends more on multiple attention heads to perform different semantic learning tasks, i.e., a division of labour along meta-paths; however, when K grows beyond a certain point, the model still shows an over-fitting trend. When the number of meta-paths implied in the dataset is 2 to 4, K = 8 is suggested, which balances the precision and the complexity of the model; of course, when facing a heterogeneous graph containing more complex semantics, the value of K may need to be increased appropriately.
The method embeds each vertex of the multi-semantic heterogeneous graph by aggregating direct/meta-path neighbor nodes of different types, and designs a node aggregation sampling strategy combining meta-path neighbors and direct neighbors for HNSE to guide the multi-head attention mechanism in HNSE; meanwhile, three variants are provided for HNSE, namely a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer. Comprehensive experiments of HNSE on three popular datasets show that the node classification precision of the proposed method on multi-semantic heterogeneous graphs is comprehensively superior to that of the latest methods.
The foregoing shows and describes the general principles and broad features of the present invention and advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (9)

1. The construction method of a universal embedding framework for multi-semantic heterogeneous graphs, characterized by comprising the following steps:
step 1: constructing a neighborhood exploration strategy, α-exploration, which smoothly splices the DFS and BFS exploration strategies to adapt to different heterogeneous network structures;
step 2: constructing an HNSE model based on α-exploration, wherein the HNSE model comprises an α-exploration neighborhood exploration layer, a multi-semantic learning layer and a node classification layer, and learns low-dimensional node embeddings while preserving the heterogeneous and semantic information of the nodes;
step 3: realizing a multi-layer HNSE model in residual form and connecting a fully connected output layer behind the multi-layer HNSE model;
step 4: constructing three extension strategies for HNSE, including a shared attention mechanism, multi-head semantic attention divergence regularization and a multi-semantic self-attention layer.
2. The method for building a universal embedding framework for multi-semantic heterogeneous graphs according to claim 1, characterized in that said neighborhood exploration strategy α-exploration comprises the following steps:
step 101: defining a parameter alpha for guiding the exploration direction;
step 102: for each direct neighbor V_j of node V_i in a given heterogeneous graph G under meta-path Φ, with probability α, performing no operation and jumping to the next direct neighbor node; with probability 1−α, performing biased walk sampling and replacing the original direct neighbor V_j with the meta-path neighbor obtained by the walk sampling;
Step 103: adding the direct neighbors into the neighbor set according to the probability of alpha; the meta-path neighbors are added to the set of neighbors with a probability of 1-alpha.
3. The method of claim 1, wherein the multi-semantic learning layer comprises specific semantic learning and multi-semantic merging; the two are applied at different positions of the framework; for a specific semantic of a node, α-exploration is used to explore the neighborhood of the node, and an attention mechanism is used to aggregate the obtained neighbor information; at the overall framework level, a multi-head mechanism is used to combine the different semantics of the nodes.
4. The method of claim 3, wherein the specific semantic learning takes meta path as guidance to learn specific semantics of nodes;
applying a layer of linear mapping to the node and to each node type, different from the node's own type, in its meta-path-based neighbor node set, so that nodes of different types are mapped into a unified feature space;
then calculating the attention coefficient between V_i and each neighbor, which maps the high-dimensional node features into raw attention scores;
finally, performing weighted aggregation with the attention coefficients.
5. The method of claim 3, wherein the multi-semantic merging specifically comprises: after obtaining the feature representations of the nodes under the specific semantics, combining the multiple semantic information by using a multi-head mechanism so as to complete the merging of the multi-semantic feature representation of the nodes; each attention head in the multi-head mechanism is assigned a different semantic learning task.
6. The method of claim 5, wherein the number of the attention heads is equal to the number of meta-paths, and if the multi-head attention mechanism is performed at the final layer of the network, the semantic information on each attention head is aggregated by averaging.
7. The method of claim 1, wherein the step 3 further comprises model training; the model training specifically comprises: after the final embedding of the nodes is obtained, it is applied to different downstream tasks and different loss functions are designed; for the semi-supervised node classification task, the final embedding is fed into a fully connected layer with a softmax function to produce node classification labels; the cross-entropy loss is minimized under the guidance of the labeled data.
8. The method of claim 1, wherein the shared attention mechanism is specifically: a shared attention weight is applied for neighboring nodes of the same type.
9. The method of claim 1, wherein the multi-semantic-self-attention layer calculates the difference in importance of the specific semantics represented by different heads of attention by improving the aggregation operation of multi-head attention to understand the importance of each semantic in the task.
CN202110215070.9A 2021-02-25 2021-02-25 Construction method of universal embedded framework of multi-semantic heterogeneous graph Pending CN112989842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110215070.9A CN112989842A (en) 2021-02-25 2021-02-25 Construction method of universal embedded framework of multi-semantic heterogeneous graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110215070.9A CN112989842A (en) 2021-02-25 2021-02-25 Construction method of universal embedded framework of multi-semantic heterogeneous graph

Publications (1)

Publication Number Publication Date
CN112989842A true CN112989842A (en) 2021-06-18

Family

ID=76350945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110215070.9A Pending CN112989842A (en) 2021-02-25 2021-02-25 Construction method of universal embedded framework of multi-semantic heterogeneous graph

Country Status (1)

Country Link
CN (1) CN112989842A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298319B (en) * 2021-06-22 2022-03-08 东北大学秦皇岛分校 Traffic speed prediction method based on skip map attention gating cycle network
CN113298319A (en) * 2021-06-22 2021-08-24 东北大学秦皇岛分校 Traffic speed prediction method based on skip map attention gating cycle network
CN113254803A (en) * 2021-06-24 2021-08-13 暨南大学 Social recommendation method based on multi-feature heterogeneous graph neural network
CN113869461B (en) * 2021-07-21 2024-03-12 中国人民解放军国防科技大学 Author migration classification method for scientific cooperation heterogeneous network
CN113869461A (en) * 2021-07-21 2021-12-31 中国人民解放军国防科技大学 Author migration and classification method for scientific cooperation heterogeneous network
CN113869992A (en) * 2021-12-03 2021-12-31 平安科技(深圳)有限公司 Artificial intelligence based product recommendation method and device, electronic equipment and medium
CN113869992B (en) * 2021-12-03 2022-03-18 平安科技(深圳)有限公司 Artificial intelligence based product recommendation method and device, electronic equipment and medium
CN114168804A (en) * 2021-12-17 2022-03-11 中国科学院自动化研究所 Similar information retrieval method and system based on heterogeneous subgraph neural network
CN114168804B (en) * 2021-12-17 2022-06-10 中国科学院自动化研究所 Similar information retrieval method and system based on heterogeneous subgraph neural network
CN114547408A (en) * 2022-01-18 2022-05-27 北京工业大学 Similar student searching method based on fine-grained student space-time behavior heterogeneous network representation
CN114547408B (en) * 2022-01-18 2024-04-02 北京工业大学 Similar student searching method based on fine-grained student space-time behavior heterogeneous network characterization
CN114756713A (en) * 2022-03-17 2022-07-15 哈尔滨工业大学(威海) Graph representation learning method based on multi-source interaction fusion
CN114936907A (en) * 2022-06-15 2022-08-23 山东大学 Commodity recommendation method and system based on node type interaction
CN114936907B (en) * 2022-06-15 2024-04-30 山东大学 Commodity recommendation method and system based on node type interaction
CN117390521A (en) * 2023-12-11 2024-01-12 福建理工大学 Social heterogeneous graph node classification method integrating deep semantic graph convolution
CN117390521B (en) * 2023-12-11 2024-03-19 福建理工大学 Social heterogeneous graph node classification method integrating deep semantic graph convolution

Similar Documents

Publication Publication Date Title
CN112989842A (en) Construction method of universal embedded framework of multi-semantic heterogeneous graph
Sun et al. Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems
Ghosh et al. The journey of graph kernels through two decades
CN113918832B (en) Graph convolution collaborative filtering recommendation system based on social relationship
CN113918833B (en) Product recommendation method realized through graph convolution collaborative filtering of social network relationship
Shi et al. Rhine: relation structure-aware heterogeneous information network embedding
CN113918834B (en) Graph convolution collaborative filtering recommendation method fusing social relations
Xiao et al. Link prediction based on feature representation and fusion
Kalintha et al. Kernelized evolutionary distance metric learning for semi-supervised clustering
CN112990431A (en) Neighborhood exploration method based on heterogeneous graph neural network
CN114461907A (en) Knowledge graph-based multi-element environment perception recommendation method and system
Liu et al. Effective model integration algorithm for improving link and sign prediction in complex networks
Chen et al. Heterogeneous graph convolutional network with local influence
Shi et al. Heterogeneous Graph Representation Learning and Applications
Gong et al. Exploring temporal information for dynamic network embedding
Li et al. Adaptive subgraph neural network with reinforced critical structure mining
Luo et al. Predicting protein-protein interactions using sequence and network information via variational graph autoencoder
Fu et al. Robust representation learning for heterogeneous attributed networks
Li et al. Self-supervised nodes-hyperedges embedding for heterogeneous information network learning
Chen et al. Gaussian mixture embedding of multiple node roles in networks
CN116257696A (en) Service recommendation method and system based on cross-modal knowledge graph comparison learning
Wang et al. Enabling inductive knowledge graph completion via structure-aware attention network
Ma et al. Self-supervised learning for heterogeneous graph via structure information based on metapath
Liu Large-scale machine learning for classification and search
Liu et al. An improved two-stage label propagation algorithm based on LeaderRank

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination