CN112148931B

CN112148931B - Meta-path learning method for high-order heterogeneous graph classification

Info

Publication number: CN112148931B
Application number: CN202011045034.4A
Authority: CN
Inventors: 杨亮; 栗位勋; 王悦雪; 张亚娟; 顾军华
Original assignee: Hebei University of Technology
Current assignee: Hebei University of Technology
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2022-11-04
Anticipated expiration: 2040-09-29
Also published as: CN112148931A

Abstract

The invention discloses a meta-path learning method for high-order heterogeneous graph classification. The method comprises the following steps:

- Step 1: construct multiple meta-paths in a multi-channel manner to obtain the L-th layer adjacency matrix $A^{(L)}$.
- Step 2: complete the meta-path information to obtain the adjacency matrix $A^{(L2)}$.
- Step 3: apply a GCN aggregation operation separately to each channel of the three-dimensional adjacency matrix $A^{(L2)}$ to obtain the optimization target Z. This completes the learning of the whole meta-path.

The method can learn latent meta-path information. The relatively scattered, short original meta-path data are learned by the model into longer, information-rich meta-paths. After the intermediate adjacency matrices are obtained, a classification operation is performed on each of their slices, which makes the classification result more accurate.

Description

Meta-path learning method for high-order heterogeneous graph classification

Technical Field

The invention belongs to the technical field of heterogeneous graph classification. It relates in particular to a meta-path learning method for high-order heterogeneous graph classification.

Background

With the advent and application of neural networks, research in pattern recognition and data mining has advanced considerably. Machine learning tasks that previously relied mainly on hand-crafted features, such as object detection, machine translation and speech recognition, are being transformed by end-to-end deep learning paradigms. These include convolutional neural networks (CNNs), long short-term memory networks (LSTMs) and autoencoders. CNNs have achieved great success in image processing. However, they are designed for regular, grid-structured data and do not directly apply to irregular data such as graph networks, for example social networks, citation networks and recommender systems.

Graph neural networks (GNNs) are widely used for graph representation learning and perform strongly on tasks such as node classification and link prediction. However, most GNNs assume a fixed, homogeneous graph, that is, a graph in which all nodes are of one type and all edges are of one type. When GNNs learn representations on an uncertain graph, or on a heterogeneous graph containing multiple node types and edge types, their accuracy degrades.

Graph Transformer Networks (GTNs) can transform a heterogeneous graph into multiple new graphs defined by meta-paths of arbitrary edge types and arbitrary lengths. At the same time, they learn node representations by convolution on the learned meta-path graphs. The GTN model disclosed in Graph Transformer Networks (https://arxiv.org/pdf/1911.06455.pdf) constructs each GT layer as follows: it selects an intermediate adjacency matrix directly from the candidate adjacency matrix set and multiplies it with the adjacency matrix of the previous layer to obtain the new adjacency matrix. This causes three problems:

- the candidate adjacency matrix set contains limited usable information;
- the usable information in the learned meta-paths is incomplete;
- classification accuracy is low.

Disclosure of Invention

In view of the shortcomings of the prior art, the technical problem to be solved by the present invention is to provide a meta-path learning method for high-order heterogeneous graph classification.

The technical solution adopted by the invention to solve the above technical problem is as follows:

A meta-path learning method for high-order heterogeneous graph classification, characterized by comprising the following steps:

Step 1, constructing multiple meta-paths in a multi-channel manner to obtain the L-th layer adjacency matrix $A^{(L)}$:

First GT layer:
- Select two intermediate adjacency matrices $Q_0$ and $Q_1$ from the candidate adjacency matrix set $\mathbb{A}$ by a selection operation.
- Multiply $Q_0$ and $Q_1$, then normalize the product to obtain the first-layer adjacency matrix $A^{(1)}$.
- Concatenate $A^{(1)}$ with $\mathbb{A}$ to obtain the candidate adjacency matrix set $\mathbb{A}^{(1)}$. This completes the stacking of the first GT layer.

Second GT layer:
- Select an intermediate adjacency matrix $Q_2$ from the candidate adjacency matrix set $\mathbb{A}^{(1)}$ by the selection operation.
- Multiply $Q_2$ with the first-layer adjacency matrix $A^{(1)}$, then normalize the product to obtain the second-layer adjacency matrix $A^{(2)}$.
- Concatenate $A^{(2)}$ with $\mathbb{A}^{(1)}$ to obtain the candidate adjacency matrix set $\mathbb{A}^{(2)}$. This completes the stacking of the second GT layer.

These steps are repeated until L GT layers have been stacked, giving the L-th layer adjacency matrix $A^{(L)}$;

Step 2, completing the meta-path information:

The L-th layer adjacency matrix $A^{(L)}$ and the (L-1)-th layer adjacency matrix $A^{(L-1)}$ are added according to formula (6) to obtain the adjacency matrix $A^{(L1)}$. $A^{(L1)}$ is then concatenated with itself according to formula (7) to obtain the adjacency matrix $A^{(L2)}$. This completes the meta-path information;

$$A^{(L1)} = A^{(L)} + A^{(L-1)} \qquad (6)$$

$$A^{(L2)} = A^{(L1)} \,\|\, A^{(L1)} \qquad (7)$$

Step 3, a GCN aggregation operation is applied to each channel of the adjacency matrix $A^{(L2)}$ to obtain the optimization target Z. This completes the learning of the whole meta-path.

Compared with the prior art, the invention has the following beneficial effects:

1. When stacking the first GT layer, the invention concatenates the first-layer adjacency matrix $A^{(1)}$ with the candidate adjacency matrix set $\mathbb{A}$ to obtain a new, higher-order candidate adjacency matrix set $\mathbb{A}^{(1)}$. In other words, $A^{(1)}$ is added to $\mathbb{A}$ and supplements the information of its adjacency matrices. The candidate set therefore contains more usable information, and so do the intermediate adjacency matrices selected from it. Because every GT layer performs this operation, the information learned by the meta-paths becomes increasingly complete, which improves classification accuracy.

2. The L-th layer adjacency matrix $A^{(L)}$ and the (L-1)-th layer adjacency matrix $A^{(L-1)}$ are added to obtain $A^{(L1)}$. $A^{(L1)}$ is then concatenated with itself to obtain $A^{(L2)}$, which completes the meta-path information.

3. The method can learn latent meta-path information. The relatively scattered, short original meta-path data are learned by the model into longer, information-rich meta-paths. After the intermediate adjacency matrices are obtained, a classification operation is performed on each of their slices, which makes the classification result more accurate.

Drawings

FIG. 1 is a flow chart of the present invention.

Detailed Description

The embodiments of the present invention are described in detail below with reference to the accompanying drawing. This description does not limit the scope of protection of the present invention.

The invention provides a meta-path learning method for high-order heterogeneous graph classification (the method for short; see FIG. 1), comprising the following steps:

Step 1, constructing multiple meta-paths in a multi-channel manner to obtain the L-th layer adjacency matrix $A^{(L)}$:

First GT layer:
- Select two intermediate adjacency matrices $Q_0$ and $Q_1$ from the candidate adjacency matrix set $\mathbb{A}$ by a selection operation.
- Multiply $Q_0$ and $Q_1$, then normalize the product to obtain the first-layer adjacency matrix $A^{(1)}$.
- Concatenate $A^{(1)}$ with $\mathbb{A}$ to obtain the candidate adjacency matrix set $\mathbb{A}^{(1)}$. This completes the stacking of the first GT layer.

Second GT layer:
- Select an intermediate adjacency matrix $Q_2$ from the candidate adjacency matrix set $\mathbb{A}^{(1)}$ by the selection operation.
- Multiply $Q_2$ with the first-layer adjacency matrix $A^{(1)}$, then normalize the product to obtain the second-layer adjacency matrix $A^{(2)}$.
- Concatenate $A^{(2)}$ with $\mathbb{A}^{(1)}$ to obtain the candidate adjacency matrix set $\mathbb{A}^{(2)}$. This completes the stacking of the second GT layer.

These steps are repeated until L GT layers have been stacked, giving the L-th layer adjacency matrix $A^{(L)}$;

Step 2, completing the meta-path information:

The L-th layer adjacency matrix $A^{(L)}$ and the (L-1)-th layer adjacency matrix $A^{(L-1)}$ are added according to formula (6) to obtain the adjacency matrix $A^{(L1)}$. The $A^{(L1)}$ obtained from formula (6) is then concatenated with itself according to formula (7) to obtain the adjacency matrix $A^{(L2)}$. This completes the meta-path information;

$$A^{(L1)} = A^{(L)} + A^{(L-1)} \qquad (6)$$

$$A^{(L2)} = A^{(L1)} \,\|\, A^{(L1)} \qquad (7)$$

Example 1

This example provides a meta-path learning method for high-order heterogeneous graph classification, comprising the following steps:

Step 1, constructing multiple meta-paths in a multi-channel manner to obtain the adjacency matrix $A^{(L)}$;

Note: in this example the heterogeneous graph is $G = (V, E)$, where V is the node set and E is the edge set. $\mathcal{T}^v$ and $\mathcal{T}^e$ denote the set of node types and the set of edge types, respectively.

A heterogeneous graph satisfies $|\mathcal{T}^v| + |\mathcal{T}^e| > 2$. It has a node type mapping function $f_v: V \rightarrow \mathcal{T}^v$ and an edge type mapping function $f_e: E \rightarrow \mathcal{T}^e$. Every node $v \in V$ has a node type, i.e., $f_v(v) \in \mathcal{T}^v$, and every edge $e \in E$ has an edge type, i.e., $f_e(e) \in \mathcal{T}^e$;

Multiple graphs, one per edge type, are input into the model, each represented by an adjacency matrix. The model input is therefore the candidate adjacency matrix set formed by stacking these adjacency matrices:

$$\mathbb{A} = \{A_k\}_{k=1}^{K} \in \mathbb{R}^{N \times N \times K}$$

The symbols are defined as follows:
- $A_k \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the k-th edge type;
- N is the number of nodes, $N = |V|$;
- $\mathbb{R}$ denotes the real matrix space;
- $K = |\mathcal{T}^e|$ is the number of edge types.
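To make the model input concrete, the following minimal sketch builds a candidate adjacency matrix set $\mathbb{A}$ from typed edge lists, one N×N slice per edge type. It uses plain Python with no dependencies. The toy movie graph, its node numbering and the helper names `adjacency` and `transpose` are hypothetical and are not taken from the patent.

```python
def adjacency(n, edges):
    """Build an n x n 0/1 adjacency matrix (list of lists) from directed (src, dst) pairs."""
    a = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        a[src][dst] = 1.0
    return a


def transpose(m):
    """Reverse every edge of a relation, e.g. Movie->Director becomes Director->Movie."""
    return [list(row) for row in zip(*m)]


# Hypothetical toy graph: nodes 0-1 are movies (M), node 2 is a director (D),
# nodes 3-4 are actors (A). All node types share one index space of size N = |V|.
N = 5
md_edges = [(0, 2), (1, 2)]          # Movie -> Director
ma_edges = [(0, 3), (1, 3), (1, 4)]  # Movie -> Actor

A_MD = adjacency(N, md_edges)
A_DM = transpose(A_MD)
A_MA = adjacency(N, ma_edges)
A_AM = transpose(A_MA)

# Candidate adjacency matrix set: K slices of shape N x N, one per edge type.
candidate_set = [A_MD, A_DM, A_MA, A_AM]
K = len(candidate_set)
```

Each reverse relation is the transpose of its forward relation, so K = 4 slices describe the two movie relations in both directions.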

Stacking the first GT layer:

Two intermediate adjacency matrices $Q_0$ and $Q_1$ are obtained from the candidate adjacency matrix set $\mathbb{A}$ by the selection operation of formula (1). Here $W_\phi$ takes the values $W_{\phi 0}$ and $W_{\phi 1}$, respectively, giving

$$Q_0 = F(\mathbb{A}; W_{\phi 0}), \qquad Q_1 = F(\mathbb{A}; W_{\phi 1})$$

where:
- $Q_0, Q_1 \in \mathbb{R}^{N \times N \times C}$;
- K is the number of edge types, N is the number of nodes, and $\mathbb{R}$ denotes the real matrix space;
- C is the number of channels of the convolution kernel in the selection operation, which is also the number of meta-paths;
- $W_{\phi 0}$ and $W_{\phi 1}$ are multi-channel convolution kernels that can select multiple meta-paths from the candidate adjacency matrix set $\mathbb{A}$;

Formula (1) defines the selection operation that produces an intermediate adjacency matrix Q. The weight matrix $W_\phi$ is passed through the softmax function and then used to perform a 1×1 convolution over the candidate adjacency matrix set $\mathbb{A}$:

$$Q = F(\mathbb{A}; W_\phi) = \phi\big(\mathbb{A}; \mathrm{softmax}(W_\phi)\big) \qquad (1)$$

In formula (1):
- $\phi$ denotes the convolution;
- F denotes the selection function;
- $W_\phi \in \mathbb{R}^{m \times m}$ is the weight matrix of the given 1×1 convolution kernel, and m is the dimension of the kernel;

The intermediate adjacency matrix Q consists of multiple slices. It can be regarded as the result of applying an attention mechanism to the candidate adjacency matrix set $\mathbb{A}$: each slice is assigned its own attention coefficient as its weight, and Q is obtained by the weighted summation of formula (2).

In formula (2), $\alpha_{t_r}^{(l)}$ is the weight of the r-th edge type $t_r$ at layer l, and $A^{(l)}$ denotes the adjacency matrix of the l-th layer;
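The selection operation of formulas (1) and (2) can be read as a softmax-weighted sum of the candidate slices, computed once per channel. The sketch below assumes that each channel's kernel is a row of K logits. This is our interpretation of a 1×1 convolution over K slices, not the patent's stated layout. `softmax` and `select` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select(candidate_set, w_phi):
    """Return C intermediate matrices Q[c] = sum_r softmax(w_phi[c])[r] * A_r.

    candidate_set: K slices, each an N x N list of lists.
    w_phi: C rows of K logits (assumed kernel layout, one row per channel).
    """
    n = len(candidate_set[0])
    q = []
    for logits in w_phi:
        if len(logits) != len(candidate_set):
            raise ValueError("each kernel row needs one logit per candidate slice")
        alpha = softmax(logits)  # attention coefficient of each edge-type slice
        q.append([[sum(a * s[i][j] for a, s in zip(alpha, candidate_set))
                   for j in range(n)] for i in range(n)])
    return q


# Two slices, two channels: channel 0 weights them equally,
# channel 1 strongly prefers slice 0.
A1 = [[1.0, 0.0], [0.0, 0.0]]
A2 = [[0.0, 1.0], [0.0, 0.0]]
Q = select([A1, A2], [[0.0, 0.0], [10.0, -10.0]])
```

Pushing one logit far above the others turns the soft selection into a near-hard choice of a single edge type. This is how one channel can end up following one specific relation.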

$Q_0$ and $Q_1$ are then multiplied and normalized according to formula (3) to obtain the first-layer adjacency matrix $A^{(1)}$. The normalization prevents exploding or vanishing gradients:

$$A^{(1)} = D^{-1} Q_0 Q_1 \qquad (3)$$

In formula (3), D is the degree matrix of the product $Q_0 Q_1$;

$A^{(1)}$ is concatenated with the candidate adjacency matrix set $\mathbb{A}$ according to formula (4), with l = 1, to obtain the candidate adjacency matrix set $\mathbb{A}^{(1)}$. This completes the stacking of the first GT layer:

$$\mathbb{A}^{(l)} = \mathbb{A}^{(l-1)} \,\|\, A^{(l)}, \qquad \mathbb{A}^{(0)} = \mathbb{A} \qquad (4)$$

In formula (4), $\|$ denotes the concatenation operation.
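For the first GT layer, formulas (3) and (4) reduce to three operations: a matrix product, a row normalization by the degree matrix, and an append to the candidate set. The sketch handles a single channel and takes $Q_0$ and $Q_1$ as given. Rows with zero degree are left as zeros rather than divided by zero; this guard is our assumption and is not stated in the patent. The function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def row_normalize(m):
    """Compute D^{-1} M, where D is the diagonal matrix of row sums of M.

    Zero rows stay zero instead of raising ZeroDivisionError (assumed guard).
    """
    out = []
    for row in m:
        deg = sum(row)
        out.append([x / deg if deg else 0.0 for x in row])
    return out


def first_gt_layer(candidates, q0, q1):
    """Formula (3): A1 = D^{-1} Q0 Q1; formula (4): append A1 to the candidate set."""
    a1 = row_normalize(matmul(q0, q1))
    return a1, candidates + [a1]  # a new list; the input candidate set is unchanged


# Toy single-channel example: edge 0->1 in the first slice, edge 1->0 in the second.
A_fwd = [[0.0, 1.0], [0.0, 0.0]]
A_back = [[0.0, 0.0], [1.0, 0.0]]
A1, candidates1 = first_gt_layer([A_fwd, A_back], A_fwd, A_back)
```

The single nonzero entry of `A1` is the length-2 path 0 → 1 → 0. This shows how the product composes two relations into a longer meta-path. After the append, the next layer can select that composed path as a candidate.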

Stacking the second GT layer:

An intermediate adjacency matrix $Q_2$ is obtained from the candidate adjacency matrix set $\mathbb{A}^{(1)}$ by the selection operation of formula (1), with $W_\phi$ taking $W_{\phi 2}$:

$$Q_2 = F(\mathbb{A}^{(1)}; W_{\phi 2})$$

$Q_2$ is multiplied with the first-layer adjacency matrix $A^{(1)}$ and normalized according to formula (5), with l = 2, to obtain the second-layer adjacency matrix $A^{(2)}$:

$$A^{(l)} = \big(D^{(l-1)}\big)^{-1} A^{(l-1)} Q_l \qquad (5)$$

In formula (5):
- $D^{(l-1)}$ is the degree matrix of the product $A^{(l-1)} Q_l$;
- $Q_l$ is the intermediate adjacency matrix selected from the candidate adjacency matrix set $\mathbb{A}^{(l-1)}$;

$A^{(2)}$ is concatenated with the candidate adjacency matrix set $\mathbb{A}^{(1)}$ according to formula (4), with l = 2, to obtain the candidate adjacency matrix set $\mathbb{A}^{(2)}$. This completes the stacking of the second GT layer.

The GT layer stacking operation is repeated until L GT layers have been stacked, giving the L-th layer adjacency matrix $A^{(L)}$. $A^{(L)}$ is the adjacency matrix of the whole meta-path and is a high-order adjacency matrix. L also denotes the total length of the meta-path;
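Formulas (1) to (5) can be combined into one L-layer stacking loop for a single channel, sketched below. This is a hedged illustration, not the patent's implementation. The kernel layout (one logit per current candidate slice), the zero-row guard in the normalization and all function names are assumptions. A multi-channel version would presumably run C such loops with separate kernels.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select(candidates, logits):
    """Formulas (1)/(2) for one channel: softmax-weighted sum of the candidate slices."""
    if len(logits) != len(candidates):
        raise ValueError("need exactly one logit per candidate slice")
    alpha = softmax(logits)
    n = len(candidates[0])
    return [[sum(a * c[i][j] for a, c in zip(alpha, candidates)) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def row_normalize(m):
    """D^{-1} M with D = diag(row sums of M); zero rows stay zero (assumed guard)."""
    out = []
    for row in m:
        deg = sum(row)
        out.append([x / deg if deg else 0.0 for x in row])
    return out


def stack_gt_layers(candidates, kernels):
    """Stack len(kernels) - 1 GT layers for one channel; return [A^(1), ..., A^(L)].

    kernels[0] and kernels[1] select Q_0 and Q_1 from the original K slices;
    kernels[l] for l >= 2 selects Q_l from the grown set, so it needs K + l - 1 logits.
    """
    a = row_normalize(matmul(select(candidates, kernels[0]),
                             select(candidates, kernels[1])))  # formula (3)
    candidates = candidates + [a]                              # formula (4), l = 1
    layers = [a]
    for logits in kernels[2:]:
        q = select(candidates, logits)                         # Q_l from the grown set
        a = row_normalize(matmul(a, q))                        # formula (5)
        candidates = candidates + [a]                          # formula (4)
        layers.append(a)
    return layers


# Hypothetical smoke test: K = 2 slices, L = 3 layers, all-zero (uniform) logits.
SWAP = [[0.0, 1.0], [1.0, 0.0]]
IDENT = [[1.0, 0.0], [0.0, 1.0]]
layers = stack_gt_layers([SWAP, IDENT], [[0.0] * 2, [0.0] * 2, [0.0] * 3, [0.0] * 4])
```

Uniform logits only serve as a smoke test; in training, the logits would be learnable parameters. The growing length of each kernel row mirrors the growth of the candidate set by one slice per layer under formula (4).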

Step 1 can be understood as follows. Let p be a meta-path of the heterogeneous graph G:

$$p = v_1 \xrightarrow{t_1} v_2 \xrightarrow{t_2} \cdots \xrightarrow{t_l} v_{l+1}$$

where v denotes a node and $t_r$ denotes the r-th edge type.

A meta-path p can be represented by an adjacency matrix. For example, the meta-path Author-Paper-Conference (APC) can be represented by the adjacency matrix $A_{APC} = A_{AP} A_{PC}$. The adjacency matrix of a meta-path is therefore obtained by multiplying several adjacency matrices.

Likewise, after step 1 the adjacency matrix $A^{(L)}$ is a chain product of the matrices selected at each layer. Each degree matrix in formulas (3) and (5) multiplies from the left, so unrolling them gives

$$A^{(L)} = \Lambda\, Q_0 Q_1 Q_2 \cdots Q_L, \qquad \Lambda = \big(D^{(L-1)}\big)^{-1} \cdots \big(D^{(1)}\big)^{-1} D^{-1},$$

where $\Lambda$ is diagonal;
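The Author-Paper-Conference example can be checked with a toy product of two rectangular relation matrices. The data below are hypothetical. Entry (i, j) of $A_{AP} A_{PC}$ counts the APC paths from author i to conference j.

```python
def matmul(a, b):
    """Product of an (n x m) and an (m x p) list-of-lists matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical toy data: 2 authors, 3 papers, 2 conferences.
A_AP = [[1, 1, 0],   # author 0 wrote papers 0 and 1
        [0, 1, 1]]   # author 1 wrote papers 1 and 2
A_PC = [[1, 0],      # paper 0 appeared at conference 0
        [1, 0],      # paper 1 appeared at conference 0
        [0, 1]]      # paper 2 appeared at conference 1

# Entry (i, j) counts Author-Paper-Conference paths from author i to conference j.
A_APC = matmul(A_AP, A_PC)
```

Author 0 reaches conference 0 through two different papers, which is why that entry is 2. In the method above, the layer-wise normalization of formulas (3) and (5) would rescale such counts row by row.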

Step 2, completing the meta-path information:

The L-th layer adjacency matrix $A^{(L)}$ and the (L-1)-th layer adjacency matrix $A^{(L-1)}$ are added according to formula (6) to obtain $A^{(L1)}$. $A^{(L1)}$ is then concatenated with itself according to formula (7) to obtain $A^{(L2)} \in \mathbb{R}^{N \times N \times (C+C)}$;

$$A^{(L1)} = A^{(L)} + A^{(L-1)} \qquad (6)$$

$$A^{(L2)} = A^{(L1)} \,\|\, A^{(L1)} \qquad (7)$$

Step 2 turns the adjacency matrix $A^{(L)}$ obtained in step 1 into $A^{(L2)}$. The meta-paths of $A^{(L2)}$ are more complete, which gives better expressive power and better classification;
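Step 2 is a per-channel element-wise sum followed by a concatenation along the channel axis. The minimal sketch below represents a C-channel tensor as a list of C matrices, which is an assumed layout. It follows formula (7) literally, so the output has C + C channels, matching the stated shape of $A^{(L2)}$. The function names are hypothetical.

```python
def add(a, b):
    """Formula (6) for one channel: element-wise A^(L) + A^(L-1)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def complete_meta_paths(a_last, a_prev):
    """Step 2 sketch. a_last, a_prev: lists of C channel matrices (A^(L), A^(L-1)).

    Returns A^(L2) as a list of 2C channel matrices. Formula (7) concatenates A^(L1)
    with itself, so the second half repeats the first (the same list objects are reused).
    """
    if len(a_last) != len(a_prev):
        raise ValueError("A^(L) and A^(L-1) must have the same number of channels")
    a_l1 = [add(x, y) for x, y in zip(a_last, a_prev)]  # formula (6)
    return a_l1 + a_l1                                   # formula (7)


A_L = [[[1.0, 0.0], [0.0, 1.0]]]          # hypothetical single-channel A^(L)
A_L_minus_1 = [[[0.0, 1.0], [1.0, 0.0]]]  # hypothetical single-channel A^(L-1)
A_L2 = complete_meta_paths(A_L, A_L_minus_1)
```

The sum keeps the (L-1)-layer paths alongside the L-layer ones, so shorter meta-paths remain visible to the GCN in step 3. This is one reading of the "information completion" described above.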

Step 3, a GCN aggregation operation is applied separately to each channel of the adjacency matrix $A^{(L2)}$. The resulting node vector representations are concatenated according to formula (8) to obtain the GCN convolution result Z. Z is used as the optimization target, which completes the learning of the whole meta-path:

$$Z = \big\Vert_{i=1}^{C} \sigma\Big( \big(\tilde{D}_i^{(l)}\big)^{-1} \tilde{A}_i^{(l)} X W_z \Big) \qquad (8)$$

In formula (8), $\tilde{A}_i^{(l)}$ satisfies formula (9). It is the sum of the adjacency matrix $A_i^{(l)}$ of the i-th channel of the l-th layer and the identity matrix I. Adding I makes each node's own information propagate as well, with the aim of improving overall classification accuracy:

$$\tilde{A}_i^{(l)} = A_i^{(l)} + I \qquad (9)$$

The remaining symbols are:
- $\tilde{D}_i^{(l)}$: the degree matrix of $\tilde{A}_i^{(l)}$;
- $W_z \in \mathbb{R}^{d \times d}$: the weight matrix of the GCN;
- $X \in \mathbb{R}^{N \times d}$: the input feature matrix;
- $\sigma$: an activation function;
- $\|$: concatenation over the C channels;
- I: the identity matrix;
- d: the feature dimension.

To verify the effect of the method, the convolution result Z obtained by the method is used for a supervised node classification task on heterogeneous graph datasets with multiple node and edge types. This example uses two citation network datasets, DBLP and ACM, and one movie dataset, IMDB. Each dataset is split into a training set, a validation set and a test set;

The parameters of each dataset are listed in Table 1, where each entry is the count of the corresponding item. The datasets are as follows:

- DBLP contains three node types, paper (P), author (A) and conference (C), and four edge types, PA, AP, PC and CP. The labels are the authors' research areas.
- ACM contains three node types, paper (P), author (A) and subject (S), and four edge types, PA, AP, PS and SP. The labels are paper categories.
- IMDB contains three node types, movie (M), actor (A) and director (D). The labels are movie genres.

Each node in DBLP and ACM is represented as a bag of words of its keywords. Each node in IMDB is represented as a bag of words of its plot.

Table 1 Parameters of the datasets

Node classification task: the GTN model of the prior literature (https://arxiv.org/pdf/1911.06455.pdf) and the meta-paths constructed in this application are evaluated under identical conditions;

Experimental parameters:

- Five GT layers are constructed in total, and two meta-paths are learned.
- The convolution result Z is fed into two dense layers, followed by a softmax function and a cross-entropy loss.
- For a fair comparison, the embedding dimension d is set to 64, i.e., d = 64 for the weight $W_z \in \mathbb{R}^{d \times d}$ of formula (8).
- The number of training layers is 4.
- The Adam optimizer is used with a learning rate of 0.005.
- The number of channels is 2.
- Training stops when the accuracy no longer changes.

For each of the three datasets, the labels are reduced to 1%, 2%, 3%, 4% and 5% of the original labels for comparison experiments. The resulting node classification accuracies are shown in Table 2.

Table 2 Experimental results (node classification accuracy)

Dataset (label ratio)	GTN (%)	This application (%)
ACM 1%	67.77	80.38
ACM 2%	78.73	83.7
ACM 3%	80.35	85.94
ACM 4%	80.72	84.18
ACM 5%	84.61	85.87
IMDB 1%	23.21	25.7
IMDB 2%	28.57	36.01
IMDB 3%	35.43	39.46
IMDB 4%	35.11	41.24
IMDB 5%	35.02	45.61
DBLP 1%	74.85	85.79
DBLP 2%	65.91	90.94
DBLP 3%	77.48	90.17
DBLP 4%	77.67	91.8
DBLP 5%	82.17	92.2

As Table 2 shows, with the same number of labels, node classification using the meta-paths constructed in this application is more accurate than the GTN model of the prior literature in every setting. The meta-paths constructed by this application therefore classify better.

The reason lies in how the GT layers are stacked. The method concatenates each layer's adjacency matrix with the candidate adjacency matrix set obtained from the previous GT layer, producing a new candidate set with more paths and more complete information. It then selects the intermediate adjacency matrix for the next GT layer from this new set. Iterating this makes the finally learned meta-paths more complete and improves classification accuracy.
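The comparison in Table 2 can be quantified as the absolute gain, in percentage points, of this application over GTN for each dataset and label ratio. The sketch only copies the table's numbers and adds no new measurements. `TABLE2` and `gains` are illustrative names.

```python
# Accuracies (%) copied from Table 2 as (GTN, this application) pairs,
# for label ratios 1%, 2%, 3%, 4%, 5%.
TABLE2 = {
    "ACM":  [(67.77, 80.38), (78.73, 83.7), (80.35, 85.94), (80.72, 84.18), (84.61, 85.87)],
    "IMDB": [(23.21, 25.7), (28.57, 36.01), (35.43, 39.46), (35.11, 41.24), (35.02, 45.61)],
    "DBLP": [(74.85, 85.79), (65.91, 90.94), (77.48, 90.17), (77.67, 91.8), (82.17, 92.2)],
}


def gains(rows):
    """Absolute accuracy gain (percentage points) of this application over GTN."""
    return [round(ours - gtn, 2) for gtn, ours in rows]


for name, rows in TABLE2.items():
    g = gains(rows)
    print(name, g, "mean:", round(sum(g) / len(g), 2))
```

By this arithmetic every setting improves. The largest gain is about 25 points, on DBLP at 2% labels; the smallest is about 1.3 points, on ACM at 5% labels.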

In this example, a movie recommendation system is used as the application. It is a typical heterogeneous graph network with four edge types, i.e., K = 4:

- MD (Movie-Director);
- DM (Director-Movie);
- MA (Movie-Actor);
- AM (Actor-Movie).

There are three node types, M (Movie), A (Actor) and D (Director), and each type contains multiple nodes; for example, type M contains many movie titles. The input feature matrix $X \in \mathbb{R}^{N \times d}$ of formula (8) represents movie genres, such as comedy and action;

Four graphs are input into the model. The model input is therefore the candidate adjacency matrix set formed by stacking four adjacency matrices, $\mathbb{A} = \{A_k\}_{k=1}^{4} \in \mathbb{R}^{N \times N \times 4}$, where each $A_k \in \mathbb{R}^{N \times N}$ is an adjacency matrix;

After steps 1-3, multiple meta-paths are learned, such as AMD, MDM and MAM:

- From two movies by the same director, the model learns what kind of movie that director is good at.
- From two movies with the same actor, the model learns which genre the two movies belong to.

Movies are thus classified in detail according to the genres their directors excel at, or according to the movies their actors appear in, and recommendations are made on that basis. For example, if a person likes a certain director's movies, the model recommends movies by that director that the person has not yet watched. For the meta-path AM (Actor-Movie), suppose one person likes movie S featuring a certain actor, and another person also likes that actor but has not seen S. The system then recommends S to the second person based on their shared preference. The invention can also be applied to social networks, citation networks and similar settings.
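The MDM meta-path in the recommendation scenario can be illustrated by the direct product $A_{MD} A_{DM}$, which links movies that share a director. This is a hypothetical, hand-built lookup for illustration only. It neither learns meta-paths nor uses the trained model, and both the data and the `recommend` helper are assumptions.

```python
def matmul(a, b):
    """Product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical toy data: 3 movies, 2 directors.
A_MD = [[1, 0],   # movie 0 directed by director 0
        [1, 0],   # movie 1 directed by director 0
        [0, 1]]   # movie 2 directed by director 1
A_DM = [list(col) for col in zip(*A_MD)]  # Director-Movie is the reverse relation

A_MDM = matmul(A_MD, A_DM)  # entry (i, j) > 0 when movies i and j share a director


def recommend(watched, a_mdm):
    """Rank unwatched movies by the number of MDM paths from the watched ones."""
    scores = {}
    for m in watched:
        for j, count in enumerate(a_mdm[m]):
            if count and j not in watched:
                scores[j] = scores.get(j, 0) + count
    return sorted(scores, key=lambda j: (-scores[j], j))
```

For a viewer who watched movie 0, `recommend({0}, A_MDM)` returns `[1]`, the other movie by the same director. In the method itself, such paths would come from the learned, weighted matrices rather than a fixed product.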

Anything not described in this invention is applicable to the prior art.

Claims

1. A meta-path learning method for high-order heterogeneous graph classification, characterized in that the method is used in a movie recommendation system having four edge types (movie-director, director-movie, movie-actor and actor-movie) and three node types (movie, actor and director); the method comprises the following steps:

Let the heterogeneous graph be $G = (V, E)$, where V is the node set composed of the movies, actors and directors, and E is the edge set composed of the link relationships among the movies, actors and directors. $\mathcal{T}^v$ and $\mathcal{T}^e$ denote the set of node types and the set of edge types, respectively, with $|\mathcal{T}^v| + |\mathcal{T}^e| > 2$.

The graph has a node type mapping function $f_v: V \rightarrow \mathcal{T}^v$ and an edge type mapping function $f_e: E \rightarrow \mathcal{T}^e$. Every node $v \in V$ has a node type, i.e., $f_v(v) \in \mathcal{T}^v$, and every edge $e \in E$ has an edge type, i.e., $f_e(e) \in \mathcal{T}^e$;

Multiple heterogeneous graphs formed by the adjacency relationships among movies, directors and actors are input into the model, each being an adjacency matrix. The model input is therefore the candidate adjacency matrix set formed by stacking these adjacency matrices, $\mathbb{A} = \{A_k\}_{k=1}^{K} \in \mathbb{R}^{N \times N \times K}$, where $K = |\mathcal{T}^e|$ is the number of edge types;

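Step 1, constructing multiple meta-paths in a multi-channel manner. For the first GT layer:

- Select two intermediate adjacency matrices $Q_0$ and $Q_1$ from the candidate adjacency matrix set $\mathbb{A}$ by a selection operation.
- Multiply $Q_0$ and $Q_1$, then normalize the product to obtain the first-layer adjacency matrix $A^{(1)}$.
- Concatenate $A^{(1)}$ with $\mathbb{A}$ to obtain the candidate adjacency matrix set $\mathbb{A}^{(1)}$. This completes the stacking of the first GT layer;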

For the second GT layer:

- Select an intermediate adjacency matrix $Q_2$ from the candidate adjacency matrix set $\mathbb{A}^{(1)}$ by the selection operation.
- Multiply $Q_2$ with the first-layer adjacency matrix $A^{(1)}$, then normalize the product to obtain the second-layer adjacency matrix $A^{(2)}$.
- Concatenate $A^{(2)}$ with $\mathbb{A}^{(1)}$ to obtain the candidate adjacency matrix set $\mathbb{A}^{(2)}$. This completes the stacking of the second GT layer;

These steps are repeated until L GT layers have been stacked, giving the L-th layer adjacency matrix $A^{(L)}$;

Step 2, completing the meta-path information according to formulas (6) and (7):

$$A^{(L1)} = A^{(L)} + A^{(L-1)} \qquad (6)$$

$$A^{(L2)} = A^{(L1)} \,\|\, A^{(L1)} \qquad (7)$$

Step 3, a GCN aggregation operation is applied separately to each channel of the adjacency matrix $A^{(L2)}$ to obtain the GCN convolution result Z. Z is taken as the optimization target, which completes the learning of the whole meta-path, and is used for a supervised node classification task. The convolution result Z is expressed as:

$$Z = \big\Vert_{i=1}^{C} \sigma\Big( \big(\tilde{D}_i^{(l)}\big)^{-1} \tilde{A}_i^{(l)} X W_z \Big) \qquad (8)$$

In formula (8):
- $X \in \mathbb{R}^{N \times d}$ is the input feature matrix representing the movie genres;
- $\tilde{D}_i^{(l)}$ is the degree matrix of $\tilde{A}_i^{(l)}$;
- $W_z \in \mathbb{R}^{d \times d}$ is the weight matrix of the GCN;
- $\sigma$ is an activation function;
- i denotes the i-th channel, l the l-th layer, C the number of channels, and d the dimension;

Through steps 1-3 the model learns multiple meta-paths. It then recommends movies to viewers according to the genres the directors excel at or the movies the actors star in.