CN113205175A - Multi-layer attribute network representation learning method based on mutual information maximization - Google Patents

Multi-layer attribute network representation learning method based on mutual information maximization

Info

Publication number
CN113205175A
Authority
CN
China
Prior art keywords
network
multilayer
attribute
layer
attribute network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110398736.9A
Other languages
Chinese (zh)
Inventor
江昊
王强
聂琦
羿舒文
彭姿文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110398736.9A
Publication of CN113205175A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The invention belongs to the technical field of deep network representation learning, and particularly relates to a multi-layer attribute network representation learning method based on mutual information maximization. The method inputs a multilayer attribute network and a target representation space dimension; constructs a multilayer attribute network representation learning model and its loss function by combining the attribute dimension, number of network layers, total number of network nodes, and target representation space dimension of the multilayer attribute network with the mutual information maximization principle; trains the multilayer attribute network representation learning model under this loss function; and outputs the multilayer attribute network node representation matrix. The invention uses the mutual information maximization principle to extend existing single-layer attribute network representation learning methods to multilayer attribute networks and obtains node vector representations of the multilayer attribute network in a low-dimensional target space; the relationships between the node vectors preserve the structural proximity and attribute similarity between nodes in the multilayer attribute network, which benefits analysis tasks on multilayer attribute networks.

Description

Multi-layer attribute network representation learning method based on mutual information maximization
Technical Field
The invention belongs to the field of deep learning and network characterization learning, and particularly relates to a multi-layer attribute network characterization learning method based on mutual information maximization.
Background
Networks have strong representation capability and can model entities and the relationships among them in many fields, such as molecular networks, protein interaction networks, recommendation systems, social networks, and citation networks. In recent years, effective network analysis techniques have provided methods for mining the latent information of data, leading to wide applications such as community detection, link prediction, and node classification. However, many network analysis methods incur high time and space complexity when dealing with large-scale networks. Furthermore, many machine learning algorithms expect their input, including network-structured data, to be represented as vectors. Traditional methods use statistical parameters, kernel functions, or manually designed features to describe structural information, but these designs are expensive and cannot be adjusted during learning. Network representation learning is one of the most effective ways to solve such problems: it converts high-dimensional sparse network information into low-dimensional dense real-valued vectors that machine learning algorithms can use efficiently, thereby benefiting many downstream tasks.
Currently, there are many methods for single-layer network representation learning, but a real-world network usually involves multiple kinds of relationships, each of which can independently form one layer of a network; such a network is called a multi-layer network. Compared with a single-layer network, the entities of a multi-layer network have more complex relationships and richer semantics. Assuming that each layer has an independent semantic space, one can embed the layers one by one and then concatenate the per-layer embeddings, but this stores the information of each layer independently. This is feasible for multi-layer networks whose layers differ greatly. However, existing single-layer network representation learning methods applied to multi-layer networks face the following problems: (1) Many multi-layer networks are not simply random combinations of multiple single-layer networks; their layers exhibit significant correlation. Fusing embeddings by simple operations such as concatenation or weighted summation loses the connections among layers and ignores inter-layer correlation. (2) Each individual layer is typically sparse and biased, so concatenating independently learned embeddings may be strongly biased. (3) A multi-layer network is composed of networks with different relationships, and the layers generated by different relationships have latent consistency and complementarity, which simple operations such as concatenation and weighted summation can hardly exploit.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-layer attribute network representation learning method based on mutual information maximization, so as to solve the problem that existing single-layer attribute network representation learning methods are difficult to apply directly to multi-layer attribute network representation.
The technical scheme of the invention provides a multilayer attribute network representation learning method based on mutual information maximization, which is characterized by comprising the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the multilayer attribute network representation learning model by utilizing a mutual information maximization principle by combining the attribute dimension, the network layer number, the total number of network nodes and the target representation space dimension of the multilayer attribute network;
step 3, randomly initializing a node representation matrix of the multilayer attribute network, training the multilayer attribute network representation learning model by combining a loss function of the multilayer attribute network representation learning model, and outputting the optimized node representation matrix of the multilayer attribute network;
preferably, each layer in the multi-layer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is defined as:
$G=\{G^{(1)},G^{(2)},\ldots,G^{(r)},X\}=\{V,E,X\}$,
wherein the number of network layers in step 1 is r, r is a positive integer and r>1; $G^{(l)}$ denotes the l-th layer network; each layer's connecting edges correspond to an actual semantic relationship, there are r relationships in total, corresponding to the r layers, and $l\in[1,r]$;
in step 1, the total number of network nodes is N, N is a positive integer; $V=\{v_1,v_2,\ldots,v_N\}$ is the set of nodes, and $v_i$ denotes the i-th node in the multi-layer attribute network G, $i\in[1,N]$; the set of nodes in each layer is the same, but the sets of connecting edges differ: $E=\{E^{(1)},E^{(2)},\ldots,E^{(r)}\}$ is the set of network edges, and $E^{(l)}$ is the connecting-edge set of the l-th layer network;
the attribute dimension in step 1 is f, where f is a positive integer; $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multi-layer attribute network, and the attribute of the i-th node corresponds to the f-dimensional vector $X_i$ in the i-th row of the matrix;
the target characterization space dimension in step 1 is d, d<N, where N is the total number of multi-layer attribute network nodes;
preferably, the attribute dimension of the multilayer attribute network in step 2 is f, the number of network layers of the multilayer attribute network in step 2 is r, the total number of network nodes of the multilayer attribute network in step 2 is N, and the target characterization space dimension of the multilayer attribute network in step 2 is d;
the mutual information maximization in step 2 is realized by maximizing a lower bound of the mutual information, namely
$\mathrm{MI}(X;Y)\ \ge\ \mathbb{E}_{\mathbb{P}_{XY}}\!\left[T_\omega(x,y)\right]-\log\,\mathbb{E}_{\mathbb{P}_X\otimes\mathbb{P}_Y}\!\left[e^{T_\omega(x,y)}\right],$
where X and Y denote two random variables and MI(X;Y) denotes their mutual information, a measure of the (possibly nonlinear) dependence between the two variables; $\mathbb{P}_{XY}$ is the joint distribution of X and Y, $\mathbb{P}_X\otimes\mathbb{P}_Y$ is the product of their marginal distributions, and $T_\omega$ is a deep-neural-network-based discriminator parameterized by ω; the expected values in the formula can be estimated by sampling from $\mathbb{P}_{XY}$ and from $\mathbb{P}_X\otimes\mathbb{P}_Y$; if the discriminator can accurately distinguish samples of the joint distribution from samples of the product of the marginal distributions, X and Y are considered to have high mutual information; the expressive power of the discriminator ensures that the lower bound approaches the mutual information of the random variables X and Y with high precision;
the multilayer attribute network characterization learning model in step 2 is specifically constructed as follows:
for the l-th layer network $G^{(l)}$, $l\in[1,r]$, of the multi-layer attribute network, the single-layer attribute network characterization learning model adopted is $g_l$, whose trainable model parameter set is $\theta_l$ and whose hyper-parameter set to be tuned is $\Phi_l$; model $g_l$ acts as $Y_l=g_l(G^{(l)},X)$, where X is the node attribute matrix of the multi-layer attribute network, the output $Y_l\in\mathbb{R}^{N\times d}$ is the node characterization matrix of the l-th layer network, d is the target characterization space dimension, N is the total number of multi-layer attribute network nodes, and the loss function of model $g_l$ is $\mathcal{L}_{g_l}$; the single-layer attribute network characterization learning model adopted by each layer has the same form, but the trainable model parameters are independent;
the multi-layer attribute network node characterization matrix is defined as a trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of multilayer attribute network nodes, and d is the target characterization space dimension;
step 2.1, for $l\in[1,r]$, input $G^{(l)}$ and X into the l-th layer attribute network characterization learning model $g_l$ to obtain the output $Y_l$;
step 2.2, use a row-shuffling function $\mathcal{C}$ to randomly permute the row order of $Y_l$, obtaining the node characterization negative-sample matrix $\widetilde{Y}_l$ of the l-th layer network of the multilayer attribute network;
Step 2.3, for l ∈ [1, r)]Characterizing the nodes of the l-th network of the multilayer attribute network into a matrix YlMultilayer attribute network node characterization matrix Z input discriminator
Figure BDA00030150761700000311
Deriving a positive sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure BDA00030150761700000312
Characterizing nodes of a layer I network of a multi-layer attribute network into a negative sample matrix
Figure BDA00030150761700000313
Multilayer attribute network node characterization matrix Z input discriminator
Figure BDA00030150761700000314
Deriving a negative sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure BDA0003015076170000041
Wherein i ∈ [1, N ∈ ]]N is the total number of nodes in the multilayer attribute network, and a discriminator
Figure BDA0003015076170000042
A bilinear function may be employed, of the form:
Figure BDA0003015076170000043
Figure BDA0003015076170000044
sigma is a sigmoid non-linear function,
Figure BDA0003015076170000045
the method comprises the following steps of (1) obtaining a trainable shared scoring matrix, wherein d is a target characterization space dimension of a multilayer attribute network;
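A minimal PyTorch sketch of steps 2.2 and 2.3 follows, assuming dense characterization matrices; the class and function names and the Xavier initialization of M are assumptions of the example:

```python
import torch
import torch.nn as nn

def row_shuffle(Y: torch.Tensor) -> torch.Tensor:
    """Corruption function C: randomly permute the rows of Y_l to obtain
    the node characterization negative-sample matrix."""
    return Y[torch.randperm(Y.size(0))]

class BilinearDiscriminator(nn.Module):
    """D(Y_li, Z_i) = sigmoid(Y_li M Z_i^T) with a trainable shared
    d x d scoring matrix M, applied to all N rows at once."""

    def __init__(self, d: int):
        super().__init__()
        self.M = nn.Parameter(torch.empty(d, d))
        nn.init.xavier_uniform_(self.M)

    def forward(self, Y: torch.Tensor, Z: torch.Tensor) -> torch.Tensor:
        # (N, d) @ (d, d) -> (N, d), then row-wise dot with Z -> (N,)
        return torch.sigmoid(((Y @ self.M) * Z).sum(dim=1))

# Toy usage with the embodiment's sizes N = 1000, d = 32.
N, d = 1000, 32
Y_l = torch.randn(N, d)                     # layer-l node characterization matrix
Z = torch.randn(N, d, requires_grad=True)   # multilayer characterization matrix
disc = BilinearDiscriminator(d)
positive = disc(Y_l, Z)                 # step 2.3 positive-sample outputs
negative = disc(row_shuffle(Y_l), Z)    # step 2.3 negative-sample outputs
```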
the loss function of the multilayer attribute network characterization learning model in step 2 adopts a positive/negative-example binary cross-entropy loss of the form:
$\mathcal{L}=\sum_{l=1}^{r}\left(\mathcal{L}_{g_l}+\omega_l\,\mathcal{L}_l\right)+\lambda\,\lVert\theta\rVert^2,$
$\mathcal{L}_l=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}\!\left(Y_{l,i},Z_i\right)+\log\!\left(1-\mathcal{D}\!\left(\widetilde{Y}_{l,i},Z_i\right)\right)\right],$
where r is the number of network layers of the multilayer attribute network, $\mathcal{L}_{g_l}$ is the loss function of model $g_l$, $\omega_l$ is the hyper-parameter controlling the importance of the mutual information of different layers, λ is the regularization-term coefficient, $\lVert\theta\rVert^2$ is the model parameter regularization term, N is the total number of network nodes of the multi-layer attribute network, and $\mathcal{D}(Y_{l,i},Z_i)$ and $\mathcal{D}(\widetilde{Y}_{l,i},Z_i)$ are the outputs of the discriminator $\mathcal{D}$ of the multilayer attribute network characterization learning model;
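The sketch below assembles this loss in PyTorch; the grouping of the per-layer model loss $\mathcal{L}_{g_l}$ with the ω-weighted cross-entropy term follows the reconstructed formula above and should be read as one plausible interpretation rather than the definitive form:

```python
import torch

def multilayer_loss(pos_list, neg_list, single_layer_losses, omegas, params, lam):
    """Positive/negative-example binary cross-entropy over the r layers,
    weighted by omega_l, plus the per-layer model losses L_{g_l} and an
    L2 regularization term with coefficient lambda."""
    eps = 1e-8  # numerical guard inside the logarithms
    loss = 0.0
    for pos, neg, l_gl, w in zip(pos_list, neg_list, single_layer_losses, omegas):
        n = pos.size(0)
        bce = -(torch.log(pos + eps) + torch.log(1.0 - neg + eps)).sum() / (2 * n)
        loss = loss + l_gl + w * bce
    reg = sum((p ** 2).sum() for p in params)
    return loss + lam * reg
```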
preferably, the node characterization matrix of the multi-layer attribute network described in step 3 is the trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of nodes of the multilayer attribute network, and d is the target characterization space dimension of the multilayer attribute network;
in step 3, the multi-layer attribute network characterization learning model is trained with its loss function; the training method adopts grid search over the model's hyper-parameters, i.e., every combination of candidate parameter values is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters comprise: the hyper-parameter set $\Phi_l$ of model $g_l$ to be tuned, the hyper-parameters $\omega_l$ controlling the importance of the mutual information of different layers, the regularization-term coefficient λ, and the learning rate lr of the multilayer attribute network characterization learning model; the training method may adopt gradient descent to minimize the loss function of the multilayer attribute network characterization learning model, where the trainable model parameters comprise the parameter sets $\theta_l$, $l\in[1,r]$, the scoring matrix M, and the characterization matrix Z;
the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and tuning of the multilayer attribute network characterization learning model.
The method fuses the single-layer attribute network characterization matrices using mutual information, so that the fused multilayer attribute network characterization matrix contains as much input-related information as possible in a lower dimension and focuses on the frequent patterns in each input layer. Mutual information is invariant under reparameterization of the variables, a property that can reduce some of the unnecessary noise introduced during training. In addition, the method can extend existing single-layer attribute network characterization learning methods to multi-layer attribute network characterization learning.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the embodiments or the prior art are briefly described below; the accompanying drawings in the following description show some embodiments of the present invention:
FIG. 1: flow chart of the multi-layer attribute network representation learning method based on mutual information maximization according to the invention.
FIG. 2: flow chart of the multi-layer attribute network characterization learning model based on mutual information maximization according to an embodiment of the invention.
Detailed Description
The method is mainly based on deep learning technology and maximizes the mutual information between the multilayer attribute network characterization matrix and each layer's attribute network characterization matrix, so as to realize multilayer attribute network characterization learning.
The method provided by the invention can be implemented using computer software technology. To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings in the embodiments of the present invention.
A first embodiment of the present invention is a multi-layer attribute network characterization learning method based on mutual information maximization, which is characterized by comprising the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
each layer in the multi-layer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is defined as:
$G=\{G^{(1)},G^{(2)},\ldots,G^{(r)},X\}=\{V,E,X\}$,
wherein the number of network layers in step 1 is r=2, r is a positive integer and r>1; $G^{(l)}$ denotes the l-th layer network; each layer's connecting edges correspond to an actual semantic relationship, there are r relationships in total, corresponding to the r layers, and $l\in[1,r]$;
in step 1, the total number of network nodes is N=1000, N is a positive integer; $V=\{v_1,v_2,\ldots,v_N\}$ is the set of nodes, and $v_i$ denotes the i-th node in the multi-layer attribute network G, $i\in[1,N]$; the set of nodes in each layer is the same, but the sets of connecting edges differ: $E=\{E^{(1)},E^{(2)},\ldots,E^{(r)}\}$ is the set of network edges, and $E^{(l)}$ is the connecting-edge set of the l-th layer network;
the attribute dimension in step 1 is f=200, f is a positive integer; $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multi-layer attribute network, and the attribute of the i-th node corresponds to the f-dimensional vector $X_i$ in the i-th row of the matrix;
the target characterization space dimension in step 1 is d=32, d<N, where N is the total number of multi-layer attribute network nodes;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the multilayer attribute network representation learning model by utilizing a mutual information maximization principle by combining the attribute dimension, the network layer number, the total number of network nodes and the target representation space dimension of the multilayer attribute network;
in step 2, the attribute dimension of the multilayer attribute network is f=200, the number of network layers is r=2, the total number of network nodes is N=1000, and the target characterization space dimension is d=32;
the mutual information maximization in step 2 is realized by maximizing a lower bound of the mutual information, namely
$\mathrm{MI}(X;Y)\ \ge\ \mathbb{E}_{\mathbb{P}_{XY}}\!\left[T_\omega(x,y)\right]-\log\,\mathbb{E}_{\mathbb{P}_X\otimes\mathbb{P}_Y}\!\left[e^{T_\omega(x,y)}\right],$
where X and Y denote two random variables and MI(X;Y) denotes their mutual information, a measure of the (possibly nonlinear) dependence between the two variables; $\mathbb{P}_{XY}$ is the joint distribution of X and Y, $\mathbb{P}_X\otimes\mathbb{P}_Y$ is the product of their marginal distributions, and $T_\omega$ is a deep-neural-network-based discriminator parameterized by ω; the expected values in the formula can be estimated by sampling from $\mathbb{P}_{XY}$ and from $\mathbb{P}_X\otimes\mathbb{P}_Y$; if the discriminator can accurately distinguish samples of the joint distribution from samples of the product of the marginal distributions, X and Y are considered to have high mutual information; the expressive power of the discriminator ensures that the lower bound approaches the mutual information of the random variables X and Y with high precision;
the multilayer attribute network characterization learning model in step 2 is specifically constructed as follows:
for the l-th layer network $G^{(l)}$, $l\in[1,r]$, r=2, of the multi-layer attribute network, the single-layer attribute network characterization learning model adopted is $g_l$, whose trainable model parameter set is $\theta_l$ and whose hyper-parameter set to be tuned is $\Phi_l$; model $g_l$ acts as $Y_l=g_l(G^{(l)},X)$, where X is the node attribute matrix of the multi-layer attribute network, the output $Y_l\in\mathbb{R}^{N\times d}$ is the node characterization matrix of the l-th layer network, d=32 is the target characterization space dimension of the multilayer attribute network, N=1000 is the total number of multilayer attribute network nodes, and the loss function of model $g_l$ is $\mathcal{L}_{g_l}$; the single-layer attribute network characterization learning model adopted by each layer has the same form, but the trainable model parameters are independent;
the multi-layer attribute network node characterization matrix is defined as a trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of nodes of the multilayer attribute network, and d is the target characterization space dimension of the multilayer attribute network;
step 2.1, for $l\in[1,r]$, r=2 being the number of network layers of the multilayer attribute network, input $G^{(l)}$ and X into the l-th layer attribute network characterization learning model $g_l$ to obtain the output $Y_l$;
step 2.2, use a row-shuffling function $\mathcal{C}$ to randomly permute the row order of $Y_l$, obtaining the node characterization negative-sample matrix $\widetilde{Y}_l$ of the l-th layer network of the multilayer attribute network;
Step 2.3, for l ∈ [1, r)]And r is 2, the network layer number of the multilayer attribute network, and a node characterization matrix Y of the l-th layer network of the multilayer attribute networklMultilayer attribute network node characterization matrix Z input discriminator
Figure BDA0003015076170000078
Deriving a positive sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure BDA0003015076170000079
Characterizing nodes of a layer I network of a multi-layer attribute network into a negative sample matrix
Figure BDA00030150761700000710
Multilayer attribute network node characterization matrix Z input discriminator
Figure BDA00030150761700000711
Deriving a negative sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure BDA00030150761700000712
Wherein i ∈ [1, N ∈ ]]N1000 is the total number of nodes in the multilayer attribute network, and the discriminator
Figure BDA00030150761700000713
A bilinear function may be employed, of the form:
Figure BDA00030150761700000714
Figure BDA00030150761700000715
sigma is a sigmoid non-linear function,
Figure BDA00030150761700000716
the method comprises the following steps of (1) obtaining a trainable shared scoring matrix, wherein d is 32 and is a target characterization space dimension of a multi-layer attribute network;
the loss function of the multilayer attribute network characterization learning model in step 2 adopts a positive/negative-example binary cross-entropy loss of the form:
$\mathcal{L}=\sum_{l=1}^{r}\left(\mathcal{L}_{g_l}+\omega_l\,\mathcal{L}_l\right)+\lambda\,\lVert\theta\rVert^2,$
$\mathcal{L}_l=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}\!\left(Y_{l,i},Z_i\right)+\log\!\left(1-\mathcal{D}\!\left(\widetilde{Y}_{l,i},Z_i\right)\right)\right],$
where r=2 is the number of network layers of the multilayer attribute network, $\mathcal{L}_{g_l}$ is the loss function of model $g_l$, $\omega_l$ is the hyper-parameter controlling the importance of the mutual information of different layers, λ is the regularization-term coefficient, $\lVert\theta\rVert^2$ is the model parameter regularization term, N=1000 is the total number of network nodes of the multi-layer attribute network, and $\mathcal{D}(Y_{l,i},Z_i)$ and $\mathcal{D}(\widetilde{Y}_{l,i},Z_i)$ are the outputs of the discriminator $\mathcal{D}$ of the multilayer attribute network characterization learning model;
step 3, randomly initializing a node representation matrix of the multilayer attribute network, training the multilayer attribute network representation learning model by combining a loss function of the multilayer attribute network representation learning model, and outputting the optimized node representation matrix of the multilayer attribute network;
the node characterization matrix of the multilayer attribute network in step 3 is the trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N=1000 is the total number of nodes, and d=32 is the target characterization space dimension of the multilayer attribute network;
in step 3, the multi-layer attribute network characterization learning model is trained with its loss function; the training method adopts grid search over the model's hyper-parameters, i.e., every combination of candidate parameter values is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters comprise: the hyper-parameter set $\Phi_l$ of model $g_l$ to be tuned, the hyper-parameters $\omega_l$ controlling the importance of the mutual information of different layers, the regularization-term coefficient λ, and the learning rate lr of the multilayer attribute network characterization learning model; the training method may adopt gradient descent to minimize the loss function of the multilayer attribute network characterization learning model, where the trainable model parameters comprise the parameter sets $\theta_l$, $l\in[1,r]$, r=2 being the number of network layers, the scoring matrix M, and the characterization matrix Z;
the learning rate of the multilayer attribute network characterization learning model may take values in:
lr ∈ {0.0001, 0.0005, 0.001, 0.005},
the regularization-term coefficient may take values in:
λ ∈ {0.00001, 0.0001, 0.001, 0.01},
and the hyper-parameters controlling the importance of the mutual information of different layers may take values in:
ω1 ∈ {0.6, 0.8, 2.0, 3.0},
ω2 ∈ {0.6, 0.8, 2.0, 3.0};
the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and tuning of the multilayer attribute network characterization learning model.
The second embodiment of the invention is described in combination with the single-layer attribute network characterization learning method Deep Graph Infomax (DGI), and comprises the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
in specific implementation, each layer in the multi-layer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is:
$G=\{G^{(1)},G^{(2)},X\}=\{V,E,X\}$,
wherein the number of network layers in step 1 is r=2, r is a positive integer and r>1; $G^{(1)}$ denotes the layer-1 network and $G^{(2)}$ the layer-2 network;
in step 1, the total number of network nodes is N=1000, N is a positive integer; $V=\{v_1,v_2,\ldots,v_N\}$ is the set of nodes, and $v_i$ denotes the i-th node in the two-layer attribute network G, $i\in[1,N]$; the set of nodes in each layer is the same, but the sets of connecting edges differ: $E=\{E^{(1)},E^{(2)}\}$ is the set of network edges, $E^{(1)}$ is the connecting-edge set of the layer-1 network, and $E^{(2)}$ is the connecting-edge set of the layer-2 network;
the attribute dimension in step 1 is f=200, f is a positive integer; $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multi-layer attribute network, and the attribute of the i-th node corresponds to the f-dimensional vector $X_i$ in the i-th row of the matrix;
the target characterization space dimension in step 1 is d=32, d<N, where N is the total number of multi-layer attribute network nodes;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the model by utilizing a mutual information maximization principle by combining attribute dimensions, network layer numbers, network node total numbers and target representation space dimensions in a multilayer attribute network;
in step 2, the attribute dimension of the multilayer attribute network is f=200, the number of network layers is r=2, the total number of network nodes is N=1000, and the target characterization space dimension is d=32;
FIG. 2 shows the flow of the multi-layer attribute network characterization learning model combined with the single-layer attribute network characterization learning model Deep Graph Infomax (DGI):
step 201, input the multilayer attribute network and the target characterization space dimension, where the multilayer attribute network is $G=\{G^{(1)},G^{(2)},X\}$ and the target characterization space dimension of the multilayer attribute network is d=32;
step 202, randomly perturb the node attribute matrix by a row-shuffling function to generate the multi-layer attribute network negative samples, i.e. $\{A,\widetilde{X}\}=\{A,\mathcal{C}(X)\}$; the row-shuffling function only changes the node attribute matrix, where $\mathcal{C}$ is the row-shuffling function, $A=\{A^{(1)},A^{(2)}\}$ is the set of adjacency matrices of the different layers, $A^{(1)}$ is the adjacency matrix of the layer-1 network $G^{(1)}$, $A^{(2)}$ is the adjacency matrix of the layer-2 network $G^{(2)}$, and $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multilayer attribute network, N=1000 being the total number of network nodes and f=200 the attribute dimension of the multilayer attribute network; the adjacency matrices $A^{(1)},A^{(2)}$ have the specific form:
$A^{(l)}_{ij}=\begin{cases}w^{(l)}_{ij},&\text{if the }i\text{-th and }j\text{-th nodes are connected in the }l\text{-th layer network},\\0,&\text{if no connecting edge exists between them},\end{cases}$
where $w^{(l)}_{ij}$ is the connecting-edge weight between the i-th and j-th nodes in the l-th layer network, $l\in[1,r]$, and r=2 is the number of network layers of the multilayer attribute network;
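A one-function sketch of this corruption step, assuming dense tensors; the name `corrupt` is illustrative:

```python
import torch

def corrupt(A_list, X):
    """Step 202: the row-shuffling function C perturbs only the node
    attribute matrix X; the per-layer adjacency matrices are unchanged."""
    X_tilde = X[torch.randperm(X.size(0))]
    return A_list, X_tilde
```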
step 203, encode the original attribute network and the negative-sample attribute network of each layer into the target characterization space through an encoder, obtaining the original node local characterization vectors and the negative-sample node local characterization vectors of each layer network;
the original attribute networks are $\{A^{(1)},X\}$ and $\{A^{(2)},X\}$, and the negative-sample attribute networks are $\{A^{(1)},\widetilde{X}\}$ and $\{A^{(2)},\widetilde{X}\}$; the encoder adopts a graph convolutional network (GCN), so the node local characterization matrices are:
$Y^{(l)}=\sigma\!\left(\hat{D}^{(l)\,-\frac{1}{2}}\hat{A}^{(l)}\hat{D}^{(l)\,-\frac{1}{2}}XW^{(l)}\right),\qquad \widetilde{Y}^{(l)}=\sigma\!\left(\hat{D}^{(l)\,-\frac{1}{2}}\hat{A}^{(l)}\hat{D}^{(l)\,-\frac{1}{2}}\widetilde{X}W^{(l)}\right),$
where $Y^{(l)}$ is the l-th layer original node local characterization matrix, $Y^{(l)}_i$ is the original node local characterization vector of the i-th node of the l-th layer, $\widetilde{Y}^{(l)}$ is the l-th layer negative-sample node local characterization matrix, $\widetilde{Y}^{(l)}_i$ is the negative-sample node local characterization vector of the i-th node of the l-th layer, $\hat{A}^{(l)}=A^{(l)}+\gamma I_N$ is the l-th layer adjacency matrix with inserted self-loops, γ controls the importance of the node itself and generally takes an integer value in [1,5], $I_N$ is the N×N identity matrix, and $\hat{D}^{(l)}$ is the corresponding degree matrix, an N×N diagonal matrix of the form:
$\hat{D}^{(l)}_{ii}=\sum_{j}\hat{A}^{(l)}_{ij},$
N=1000 is the total number of network nodes of the multi-layer attribute network, $W^{(l)}$ is the learnable weight parameter matrix of the l-th layer, σ is the ReLU nonlinear activation function, $l\in[1,r]$, and r=2 is the number of network layers of the multilayer attribute network;
step 204, input the original node local characterization vectors of each layer network into a Readout function to obtain the global vector of each layer network characterization;
the Readout function adopts average pooling, in the specific form:
$s^{(l)}=\sigma\!\left(\frac{1}{N}\sum_{i=1}^{N}Y^{(l)}_i\right),$
where $Y^{(l)}_i$ is the original node local characterization vector of the i-th node of the l-th layer, $s^{(l)}$ is the global vector of the l-th layer network characterization, σ denotes the sigmoid nonlinear function, $l\in[1,r]$, r=2 is the number of network layers, and N=1000 is the total number of network nodes of the multilayer attribute network;
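A corresponding one-line Readout sketch:

```python
import torch

def readout(Y: torch.Tensor) -> torch.Tensor:
    """Average-pooling Readout: s = sigmoid(mean over the N rows of Y),
    yielding one d-dimensional global vector per layer."""
    return torch.sigmoid(Y.mean(dim=0))
```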
step 205, input the original node local characterization vector of each layer and the global vector of that layer's network characterization into the layer-characterization discriminator to obtain its output;
the layer-characterization discriminator is realized by a bilinear function of the form:
$\mathcal{D}_1\!\left(Y^{(l)}_i,s^{(l)}\right)=\sigma\!\left(Y^{(l)}_i\,M_1\,s^{(l)\top}\right),$
where $Y^{(l)}_i$ is the original node local characterization vector of the i-th node of the l-th layer, $s^{(l)}$ is the global vector of the l-th layer network characterization, σ is the sigmoid nonlinear function, $M_1\in\mathbb{R}^{d\times d}$ is a trainable shared scoring matrix, d=32 is the target characterization space dimension of the multi-layer attribute network, $l\in[1,r]$, and r=2 is the number of network layers; the output is the output of the discriminator $\mathcal{D}_1$;
step 206, input the negative-sample node local characterization vector of each layer and the global vector of that layer's network characterization into the layer-characterization discriminator to obtain its output;
this discriminator is shared with the layer-characterization discriminator of step 205 and has the form:
$\mathcal{D}_1\!\left(\widetilde{Y}^{(l)}_i,s^{(l)}\right)=\sigma\!\left(\widetilde{Y}^{(l)}_i\,M_1\,s^{(l)\top}\right),$
where $\widetilde{Y}^{(l)}_i$ is the negative-sample node local characterization vector of the i-th node of the l-th layer, and the remaining symbols are as in step 205; the output is the output of the discriminator $\mathcal{D}_1$;
step 207, input the original node local characterization vector of each layer and the multilayer attribute network node characterization vector into the multilayer-attribute-network-characterization discriminator to obtain its output;
the multilayer-attribute-network-characterization discriminator is realized by a bilinear function of the form:
$\mathcal{D}_2\!\left(Y^{(l)}_i,Z_i\right)=\sigma\!\left(Y^{(l)}_i\,M_2\,Z_i^{\top}\right),$
where $Y^{(l)}_i$ is the original node local characterization vector of the i-th node of the l-th layer, $Z_i$ is the multilayer attribute network node characterization vector of the i-th node, σ is the sigmoid nonlinear function, $M_2\in\mathbb{R}^{d\times d}$ is a trainable shared scoring matrix, d=32 is the target characterization space dimension of the multi-layer attribute network, $l\in[1,r]$, and r=2 is the number of network layers; the output is the output of the discriminator $\mathcal{D}_2$;
step 208, input the negative-sample node local characterization vector of each layer and the multilayer attribute network node characterization vector into the multilayer-attribute-network-characterization discriminator to obtain its output;
this discriminator is shared with the discriminator of step 207 and has the form:
$\mathcal{D}_2\!\left(\widetilde{Y}^{(l)}_i,Z_i\right)=\sigma\!\left(\widetilde{Y}^{(l)}_i\,M_2\,Z_i^{\top}\right),$
where $\widetilde{Y}^{(l)}_i$ is the negative-sample node local characterization vector of the i-th node of the l-th layer, and the remaining symbols are as in step 207; the output is the output of the discriminator $\mathcal{D}_2$;
the loss function described in step 2 has the form:
$\mathcal{L}=\sum_{l=1}^{r}\left(\mathcal{L}^{(l)}_{layer}+\omega_l\,\mathcal{L}^{(l)}_{multi}\right)+\lambda\,\lVert\theta_{attr}\rVert^2,$
$\mathcal{L}^{(l)}_{layer}=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}_1\!\left(Y^{(l)}_i,s^{(l)}\right)+\log\!\left(1-\mathcal{D}_1\!\left(\widetilde{Y}^{(l)}_i,s^{(l)}\right)\right)\right],$
$\mathcal{L}^{(l)}_{multi}=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}_2\!\left(Y^{(l)}_i,Z_i\right)+\log\!\left(1-\mathcal{D}_2\!\left(\widetilde{Y}^{(l)}_i,Z_i\right)\right)\right],$
where $\omega_l$ is the hyper-parameter controlling the importance of the mutual information of the different layers, λ is the regularization-term coefficient, $\theta_{attr}=\{W^{(1)},W^{(2)},M_1,M_2,Z\}$ is the model parameter set of the regularization term, and N=1000 is the total number of network nodes of the multilayer attribute network; the $\mathcal{D}_1(Y^{(l)}_i,s^{(l)})$ terms are the outputs obtained in step 205, the $\mathcal{D}_1(\widetilde{Y}^{(l)}_i,s^{(l)})$ terms are the outputs obtained in step 206, the $\mathcal{D}_2(Y^{(l)}_i,Z_i)$ terms are the outputs obtained in step 207, and the $\mathcal{D}_2(\widetilde{Y}^{(l)}_i,Z_i)$ terms are the outputs obtained in step 208, with $l\in[1,r]$ and r=2 the number of network layers of the multilayer attribute network;
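Putting steps 201 through 208 together, the sketch below implements the whole forward pass and the loss for the two-layer case; parameter initialization, dense adjacency handling, and folding the λ regularization into the optimizer are assumptions made for compactness, and the grouping of the loss terms follows the reconstructed formula above:

```python
import torch
import torch.nn as nn

class MultiLayerDGI(nn.Module):
    """End-to-end sketch of steps 201-208 for the two-layer case."""

    def __init__(self, N: int, f: int, d: int, gamma: float = 2.0):
        super().__init__()
        self.W = nn.ModuleList([nn.Linear(f, d, bias=False) for _ in range(2)])
        self.M1 = nn.Parameter(torch.empty(d, d))  # layer-characterization discriminator
        self.M2 = nn.Parameter(torch.empty(d, d))  # multilayer-characterization discriminator
        self.Z = nn.Parameter(torch.empty(N, d))   # multilayer node characterizations
        for p in (self.M1, self.M2, self.Z):
            nn.init.xavier_uniform_(p)
        self.gamma = gamma

    def gcn(self, A: torch.Tensor, X: torch.Tensor, l: int) -> torch.Tensor:
        # Step 203: Y = ReLU(D^-1/2 (A + gamma*I) D^-1/2 X W_l)
        A_hat = A + self.gamma * torch.eye(A.size(0), device=A.device)
        d_is = A_hat.sum(dim=1).clamp(min=1e-8).pow(-0.5)
        return torch.relu((d_is[:, None] * A_hat * d_is[None, :]) @ self.W[l](X))

    def forward(self, A_list, X, omegas=(1.0, 1.0), eps=1e-8):
        X_neg = X[torch.randperm(X.size(0))]   # step 202: corrupt the attributes
        loss = 0.0
        for l, A in enumerate(A_list):
            Y = self.gcn(A, X, l)              # original local characterizations
            Y_neg = self.gcn(A, X_neg, l)      # negative-sample characterizations
            s = torch.sigmoid(Y.mean(dim=0))   # step 204: readout global vector
            # Steps 205/206: layer-characterization discriminator D1.
            pos_l = torch.sigmoid((Y @ self.M1) @ s)
            neg_l = torch.sigmoid((Y_neg @ self.M1) @ s)
            # Steps 207/208: multilayer-characterization discriminator D2.
            pos_m = torch.sigmoid(((Y @ self.M2) * self.Z).sum(dim=1))
            neg_m = torch.sigmoid(((Y_neg @ self.M2) * self.Z).sum(dim=1))
            n = Y.size(0)
            layer_bce = -(torch.log(pos_l + eps) + torch.log(1 - neg_l + eps)).sum() / (2 * n)
            multi_bce = -(torch.log(pos_m + eps) + torch.log(1 - neg_m + eps)).sum() / (2 * n)
            loss = loss + layer_bce + omegas[l] * multi_bce
        return loss  # the lambda*||theta_attr||^2 term can be added via weight decay
```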
step 3, randomly initializing a node representation matrix of the multilayer attribute network, training the multilayer attribute network representation learning model by combining a loss function of the multilayer attribute network representation learning model, and outputting the optimized node representation matrix of the multilayer attribute network;
the multi-layer attribute network node characterization matrix in step 3 is the trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N=1000 is the total number of nodes, and d=32 is the target characterization space dimension of the multilayer attribute network;
in step 3, the multi-layer attribute network characterization learning model is trained with its loss function; the training method may adopt grid search over the model's hyper-parameters, i.e., every combination of candidate parameter values is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters comprise $\{lr,\lambda,\gamma,\omega_1,\omega_2\}$, where lr is the model learning rate, λ is the loss-function regularization-term coefficient, γ is the parameter with which the encoder GCN controls the importance of the node itself, and $\omega_1,\omega_2$ are the hyper-parameters controlling the importance of the mutual information of the layer-1 and layer-2 networks;
the learning rate of the multilayer attribute network characterization learning model may take values in:
lr ∈ {0.0001, 0.0005, 0.001, 0.005},
the regularization-term coefficient may take values in:
λ ∈ {0.00001, 0.0001, 0.001, 0.01},
the parameter with which the encoder GCN controls the importance of the node itself may take values in:
γ ∈ {1.0, 2.0, 3.0, 4.0, 5.0},
and the hyper-parameters controlling the importance of the mutual information of different layers may take values in:
ω1 ∈ {0.6, 0.8, 2.0, 3.0},
ω2 ∈ {0.6, 0.8, 2.0, 3.0};
the training method may adopt gradient descent to minimize the loss function of the multilayer attribute network characterization learning model, where $\{W^{(1)},W^{(2)},M_1,M_2,Z\}$ is the trainable model parameter set;
the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and tuning of the multilayer attribute network characterization learning model.
The method provided by the invention has the following advantages or beneficial technical effects:
the invention provides a multi-layer attribute network representation learning method based on mutual information maximization. The method fuses single-layer attribute network node representation matrixes by using a mutual information maximization principle, so that the fused multilayer attribute network node representation matrixes can express as much information as possible in a lower-dimensional space and focus on frequent modes in each layer of attribute network. Under the re-parameterization of variables, the mutual information is invariant. Using this property may reduce some of the unnecessary noise introduced during the training process. In addition, the method can extend the existing single-layer attribute network representation learning method to a multi-layer attribute network representation learning method.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (4)

1. A multi-layer attribute network representation learning method based on mutual information maximization is characterized by comprising the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the multilayer attribute network representation learning model by utilizing a mutual information maximization principle by combining the attribute dimension, the network layer number, the total number of network nodes and the target representation space dimension of the multilayer attribute network;
and 3, randomly initializing a node characterization matrix of the multilayer attribute network, training the multilayer attribute network characterization learning model by combining a loss function of the multilayer attribute network characterization learning model, and outputting the optimized node characterization matrix of the multilayer attribute network.
2. The mutual information maximization-based multi-layer attribute network characterization learning method according to claim 1, characterized in that:
each layer in the multilayer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is defined as:
$G=\{G^{(1)},G^{(2)},\ldots,G^{(r)},X\}=\{V,E,X\}$,
wherein the number of network layers in step 1 is r, r is a positive integer and r>1; $G^{(l)}$ denotes the l-th layer network; each layer's connecting edges correspond to an actual semantic relationship, there are r relationships in total, corresponding to the r layers, and $l\in[1,r]$;
in step 1, the total number of network nodes is N, N is a positive integer; $V=\{v_1,v_2,\ldots,v_N\}$ is the set of nodes, and $v_i$ denotes the i-th node in the multi-layer attribute network G, $i\in[1,N]$; the set of nodes in each layer is the same, but the sets of connecting edges differ: $E=\{E^{(1)},E^{(2)},\ldots,E^{(r)}\}$ is the set of network edges, and $E^{(l)}$ is the connecting-edge set of the l-th layer network;
the attribute dimension in step 1 is f, where f is a positive integer; $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multi-layer attribute network, and the attribute of the i-th node corresponds to the f-dimensional vector $X_i$ in the i-th row of the matrix;
the target characterization space dimension in step 1 is d, d<N, where N is the total number of multi-layer attribute network nodes.
3. The mutual information maximization-based multi-layer attribute network characterization learning method according to claim 1, characterized in that:
the attribute dimension of the multilayer attribute network in the step 2 is f, the number of network layers of the multilayer attribute network in the step 2 is r, the total number of network nodes of the multilayer attribute network in the step 2 is N, and the target characterization space dimension of the multilayer attribute network in the step 2 is d;
the mutual information maximization in step 2 is realized by maximizing a lower bound of the mutual information, namely
$\mathrm{MI}(X;Y)\ \ge\ \mathbb{E}_{\mathbb{P}_{XY}}\!\left[T_\omega(x,y)\right]-\log\,\mathbb{E}_{\mathbb{P}_X\otimes\mathbb{P}_Y}\!\left[e^{T_\omega(x,y)}\right],$
where X and Y denote two random variables and MI(X;Y) denotes their mutual information, a measure of the (possibly nonlinear) dependence between the two variables; $\mathbb{P}_{XY}$ is the joint distribution of X and Y, $\mathbb{P}_X\otimes\mathbb{P}_Y$ is the product of their marginal distributions, and $T_\omega$ is a deep-neural-network-based discriminator parameterized by ω; the expected values in the formula can be estimated by sampling from $\mathbb{P}_{XY}$ and from $\mathbb{P}_X\otimes\mathbb{P}_Y$; if the discriminator can accurately distinguish samples of the joint distribution from samples of the product of the marginal distributions, X and Y are considered to have high mutual information; the expressive power of the discriminator ensures that the lower bound approaches the mutual information of the random variables X and Y with high precision;
the multilayer attribute network characterization learning model in step 2 is specifically constructed as follows:
for the l-th layer network $G^{(l)}$, $l\in[1,r]$, of the multi-layer attribute network, the single-layer attribute network characterization learning model adopted is $g_l$, whose trainable model parameter set is $\theta_l$ and whose hyper-parameter set to be tuned is $\Phi_l$; model $g_l$ acts as $Y_l=g_l(G^{(l)},X)$, where X is the node attribute matrix of the multi-layer attribute network, the output $Y_l\in\mathbb{R}^{N\times d}$ is the node characterization matrix of the l-th layer network, d is the target characterization space dimension, N is the total number of multi-layer attribute network nodes, and the loss function of model $g_l$ is $\mathcal{L}_{g_l}$; the single-layer attribute network characterization learning model adopted by each layer has the same form, but the trainable model parameters are independent;
the multi-layer attribute network node characterization matrix is defined as a trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of multilayer attribute network nodes, and d is the target characterization space dimension;
step 2.1, for $l\in[1,r]$, input $G^{(l)}$ and X into the l-th layer attribute network characterization learning model $g_l$ to obtain the output $Y_l$;
step 2.2, use a row-shuffling function $\mathcal{C}$ to randomly permute the row order of $Y_l$, obtaining the node characterization negative-sample matrix $\widetilde{Y}_l$ of the l-th layer network of the multilayer attribute network;
Step 2.3, for l ∈ [1, r)]Characterizing the nodes of the l-th network of the multilayer attribute network into a matrix YlMultilayer attribute network node characterization matrix Z input discriminator
Figure FDA00030150761600000316
Deriving a positive sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure FDA0003015076160000031
Characterizing nodes of a layer I network of a multi-layer attribute network into a negative sample matrix
Figure FDA0003015076160000032
Multilayer attribute network node characterization matrix Z input discriminator
Figure FDA0003015076160000033
Deriving a negative sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure FDA0003015076160000034
Wherein i ∈ [1, N ∈ ]]N is the total number of nodes in the multilayer attribute network, and a discriminator
Figure FDA0003015076160000035
A bilinear function may be employed, of the form:
Figure FDA0003015076160000036
Figure FDA0003015076160000037
sigma is a sigmoid non-linear function,
Figure FDA0003015076160000038
the method comprises the following steps of (1) obtaining a trainable shared scoring matrix, wherein d is a target characterization space dimension of a multilayer attribute network;
the loss function of the multilayer attribute network characterization learning model in step 2 adopts a positive/negative-example binary cross-entropy loss of the form:
$\mathcal{L}=\sum_{l=1}^{r}\left(\mathcal{L}_{g_l}+\omega_l\,\mathcal{L}_l\right)+\lambda\,\lVert\theta\rVert^2,$
$\mathcal{L}_l=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}\!\left(Y_{l,i},Z_i\right)+\log\!\left(1-\mathcal{D}\!\left(\widetilde{Y}_{l,i},Z_i\right)\right)\right],$
where r is the number of network layers of the multilayer attribute network, $\mathcal{L}_{g_l}$ is the loss function of model $g_l$, $\omega_l$ is the hyper-parameter controlling the importance of the mutual information of different layers, λ is the regularization-term coefficient, $\lVert\theta\rVert^2$ is the model parameter regularization term, N is the total number of network nodes of the multi-layer attribute network, and $\mathcal{D}(Y_{l,i},Z_i)$ and $\mathcal{D}(\widetilde{Y}_{l,i},Z_i)$ are the outputs of the discriminator $\mathcal{D}$ of the multilayer attribute network characterization learning model.
4. The mutual information maximization-based multi-layer attribute network characterization learning method according to claim 1, characterized in that:
the node characterization matrix of the multilayer attribute network in step 3 is the trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of nodes of the multilayer attribute network, and d is the target characterization space dimension of the multilayer attribute network;
in step 3, the multi-layer attribute network characterization learning model is trained with its loss function; the training method adopts grid search over the model's hyper-parameters, i.e., every combination of candidate parameter values is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters comprise: the hyper-parameter set $\Phi_l$ of model $g_l$ to be tuned, the hyper-parameters $\omega_l$ controlling the importance of the mutual information of different layers, the regularization-term coefficient λ, and the learning rate lr of the multilayer attribute network characterization learning model; the training method may adopt gradient descent to minimize the loss function of the multilayer attribute network characterization learning model, where the trainable model parameters comprise the parameter sets $\theta_l$, $l\in[1,r]$, the scoring matrix M, and the characterization matrix Z;
the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and tuning of the multilayer attribute network characterization learning model.
CN202110398736.9A 2021-04-12 2021-04-12 Multi-layer attribute network representation learning method based on mutual information maximization Pending CN113205175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110398736.9A 2021-04-12 2021-04-12 Multi-layer attribute network representation learning method based on mutual information maximization

Publications (1)

Publication Number Publication Date
CN113205175A 2021-08-03

Family

ID=77026776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110398736.9A Pending CN113205175A (en) 2021-04-12 2021-04-12 Multi-layer attribute network representation learning method based on mutual information maximization

Country Status (1)

Country Link
CN (1) CN113205175A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622307A (en) * 2017-09-11 2018-01-23 浙江工业大学 A kind of Undirected networks based on deep learning connect side right weight Forecasting Methodology
CN109101629A (en) * 2018-08-14 2018-12-28 合肥工业大学 A kind of network representation method based on depth network structure and nodal community
CN109376857A (en) * 2018-09-03 2019-02-22 上海交通大学 A kind of multi-modal depth internet startup disk method of fusion structure and attribute information

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304367A (en) * 2023-02-24 2023-06-23 河北师范大学 Algorithm and device for obtaining communities based on graph self-encoder self-supervision training
CN116304367B (en) * 2023-02-24 2023-12-01 河北师范大学 Algorithm and device for obtaining communities based on graph self-encoder self-supervision training


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210803