CN113205175A - Multi-layer attribute network representation learning method based on mutual information maximization - Google Patents
- Publication number
- CN113205175A (application CN202110398736.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- multilayer
- attribute
- layer
- attribute network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention belongs to the technical field of deep network representation learning, and specifically relates to a multi-layer attribute network representation learning method based on mutual information maximization. A multi-layer attribute network and a target representation space dimension are input; a multi-layer attribute network representation learning model and its loss function are constructed from the attribute dimension, the number of network layers, the total number of network nodes, and the target representation space dimension, using the mutual information maximization principle; the model is then trained with this loss function, and the multi-layer attribute network node representation matrix is output. The invention uses the mutual information maximization principle to extend existing single-layer attribute network representation learning methods to multi-layer attribute networks, obtaining node vector representations of the multi-layer attribute network in a low-dimensional target space. The relationships between the node vectors preserve the structural proximity and attribute similarity between nodes in the multi-layer attribute network, which benefits analysis tasks on multi-layer attribute networks.
Description
Technical Field
The invention belongs to the field of deep learning and network characterization learning, and particularly relates to a multi-layer attribute network characterization learning method based on mutual information maximization.
Background
Networks have strong representational power and can model entities and the relationships among them in many domains, such as molecular networks, protein interaction networks, recommendation systems, social networks, and citation networks. In recent years, effective network analysis techniques have provided methods for mining the latent information in such data, leading to wide applications such as community detection, link prediction, and node classification. However, many network analysis methods incur high time and space complexity when dealing with large-scale networks. Furthermore, many machine learning algorithms require input that can be represented as vectors, including network-structured data. Traditional methods use statistical parameters, kernel functions, or manually designed features to describe structural information, but these designs are expensive to produce and cannot be adjusted during learning. Network representation learning is one of the most effective ways to solve such problems. It converts high-dimensional sparse network information into low-dimensional dense real-valued vectors that machine learning algorithms can use efficiently, thereby benefiting many downstream tasks.
Currently, there are many methods for single-layer network representation learning, but a real-world network usually involves multiple kinds of relationships, each of which can independently form one layer of a network; such a network is called a multi-layer network. Compared with a single-layer network, the entities of a multi-layer network have more complex relationships and richer semantics. Assuming each layer has an independent semantic space, one could embed the multi-layer network layer by layer and then concatenate the per-layer embeddings, but this stores the information of each layer independently. That is feasible for multi-layer networks whose layers differ greatly. However, existing single-layer network representation learning methods face the following problems when applied to multi-layer networks: (1) Many multi-layer networks are not simply random combinations of multiple single-layer networks; their layers exhibit significant correlation. Fusing embeddings through simple operations such as concatenation or weighted summation loses the connections among layers and ignores inter-layer correlation. (2) Each individual layer is typically sparse and biased, so concatenating independently learned embeddings may carry strong bias. (3) A multi-layer network is composed of networks with different relationships, and the layers generated by different relationships have latent consistency and complementarity. Simple operations such as concatenation and weighted summation can hardly exploit this latent consistency and complementarity.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-layer attribute network representation learning method based on mutual information maximization, with the goal of solving the problem that existing single-layer attribute network representation learning methods are difficult to apply directly to multi-layer attribute network representation.
The technical scheme of the invention provides a multilayer attribute network representation learning method based on mutual information maximization, which is characterized by comprising the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the multilayer attribute network representation learning model by utilizing a mutual information maximization principle by combining the attribute dimension, the network layer number, the total number of network nodes and the target representation space dimension of the multilayer attribute network;
step 3, randomly initializing a node representation matrix of the multilayer attribute network, training the multilayer attribute network representation learning model by combining a loss function of the multilayer attribute network representation learning model, and outputting the optimized node representation matrix of the multilayer attribute network;
preferably, each layer in the multi-layer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is defined as:
G = {G^(1), G^(2), …, G^(r), X} = {V, E, X},
where r is the number of network layers in step 1, r is a positive integer and r > 1; G^(l) denotes the l-th layer network, l ∈ [1, r]; each edge of each layer of the multi-layer attribute network corresponds to an actual semantic relationship, and there are r relationships in total, corresponding to the r layer networks;
in step 1, the total number of network nodes is N, where N is a positive integer; V = {v_1, v_2, …, v_N} is the node set, where v_i denotes the i-th node in the multi-layer attribute network G, i ∈ [1, N]; the node set of every layer network is the same, but the edge sets differ: E = {E^(1), E^(2), …, E^(r)} is the set of network edges, where E^(l) is the edge set of the l-th layer network;
the attribute dimension in step 1 is f, where f is a positive integer; X ∈ R^{N×f} is the node attribute matrix of the multi-layer attribute network, where the attribute of the i-th node corresponds to the f-dimensional vector X_i in the i-th row of the matrix;
The target characterization space dimension in the step 1 is d, d is less than N, and N is the total number of the multilayer attribute network nodes;
preferably, the attribute dimension of the multilayer attribute network in step 2 is f, the number of network layers of the multilayer attribute network in step 2 is r, the total number of network nodes of the multilayer attribute network in step 2 is N, and the target characterization space dimension of the multilayer attribute network in step 2 is d;
the mutual information maximization in step 2 is realized by maximizing a lower bound of the mutual information, namely

MI(X; Y) ≥ E_{P(X,Y)}[T_ω(x, y)] − log E_{P(X)P(Y)}[e^{T_ω(x, y)}],
where X and Y denote two random variables and MI(X; Y) denotes their mutual information, a measure of the nonlinear correlation between them; P(X,Y) is the joint distribution of X and Y, P(X)P(Y) is the product of their marginal distributions, and T_ω is a discriminator based on a deep neural network and parameterized by ω. The expectations in the formula can be estimated by sampling from P(X,Y) and from P(X)P(Y). If the discriminator can accurately distinguish samples of the joint distribution from samples of the product of marginals, X and Y are considered to have high mutual information. The expressive power of the discriminator ensures that the lower bound approaches the mutual information of the random variables X and Y with high precision;
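As an illustration of this principle, the following minimal sketch (an assumption for exposition, not the patent's model) estimates the lower bound from samples: correlated pairs stand in for draws from the joint distribution, row-shuffled pairs stand in for the product of marginals, and a fixed scaled inner product stands in for the trained discriminator T_ω.

```python
import numpy as np

rng = np.random.default_rng(0)

n, dim = 2000, 4
x = rng.normal(size=(n, dim))
y = x + 0.1 * rng.normal(size=(n, dim))        # y strongly correlated with x

def T(a, b):
    # stand-in discriminator T_omega(a, b): a scaled inner product
    return 0.5 * np.einsum("ij,ij->i", a, b)

joint = T(x, y)                                # samples from the joint P(X, Y)
y_shuffled = y[rng.permutation(n)]             # break the pairing -> P(X)P(Y)
marginal = T(x, y_shuffled)

# MI(X; Y) >= E_joint[T] - log E_marginal[exp(T)]
lower_bound = joint.mean() - np.log(np.exp(marginal).mean())
print(lower_bound > 0.0)
```

Because x and y are strongly correlated, the estimated lower bound comes out positive; shuffling the pairing is exactly the negative-sampling device used later in the method.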
the multilayer attribute network characterization learning model in the step 2 is specifically constructed by the following steps:
for the l-th layer network G^(l) of the multi-layer attribute network, l ∈ [1, r], the single-layer attribute network representation learning model adopted is g_l, with a set of trainable model parameters and a set of hyper-parameters to be tuned; model g_l computes Y_l = g_l(G^(l), X), where X is the node attribute matrix of the multi-layer attribute network and the output Y_l ∈ R^{N×d} is the node representation matrix of the l-th layer network, d is the target representation space dimension, and N is the total number of multi-layer attribute network nodes; model g_l has loss function L_l; the single-layer attribute network representation learning model adopted for each layer has the same form, but the trainable model parameters of each layer are independent;
defining the multi-layer attribute network node representation matrix as a trainable parameter matrix Z ∈ R^{N×d}, where the representation vector of the i-th node corresponds to the d-dimensional vector Z_i in the i-th row of the matrix, N is the total number of multi-layer attribute network nodes, and d is the target representation space dimension;
step 2.1, for l ∈ [1, r], input G^(l) and X into the l-th layer single-layer attribute network representation learning model g_l to obtain the output Y_l;
step 2.2, use a row shuffling function to randomly permute the rows of Y_l, obtaining the node representation negative-sample matrix Ỹ_l of the l-th layer network;
step 2.3, for l ∈ [1, r], input the node representation matrix Y_l of the l-th layer network and the multi-layer attribute network node representation matrix Z into the discriminator D, obtaining the positive-sample output D(Y_{l,i}, Z_i) of the discriminator for the i-th node of the multi-layer attribute network; input the negative-sample matrix Ỹ_l of the l-th layer network and Z into the discriminator D, obtaining the negative-sample output D(Ỹ_{l,i}, Z_i) of the discriminator for the i-th node of the multi-layer attribute network;
where i ∈ [1, N], N is the total number of nodes in the multi-layer attribute network; the discriminator D may adopt a bilinear function of the form:

D(Y_{l,i}, Z_i) = σ( (Y_{l,i})ᵀ M Z_i ),
where σ is the sigmoid nonlinear function, M ∈ R^{d×d} is a trainable shared scoring matrix, and d is the target representation space dimension of the multi-layer attribute network;
the loss function of the multi-layer attribute network representation learning model in step 2 adopts a binary cross entropy loss over positive and negative examples, of the form:

L = Σ_{l=1}^{r} [ L_l − ω_l · (1/(2N)) Σ_{i=1}^{N} ( log D(Y_{l,i}, Z_i) + log(1 − D(Ỹ_{l,i}, Z_i)) ) ] + λ·R,

where r is the number of network layers of the multi-layer attribute network, L_l is the loss function of model g_l, ω_l is a hyper-parameter controlling the importance of the mutual information of different layers, λ is the regularization term coefficient, R is the model parameter regularization term, N is the total number of network nodes of the multi-layer attribute network, and D(Y_{l,i}, Z_i) and D(Ỹ_{l,i}, Z_i) are the outputs of the discriminator D of the multi-layer attribute network representation learning model;
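The following toy sketch illustrates steps 2.1 to 2.3 and the positive/negative binary cross entropy loss. It is an assumption-laden stand-in: random matrices replace the single-layer model outputs Y_l, and the values of Z, the scoring matrix M, the weights ω_l, and λ are arbitrary; the per-layer model losses L_l are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, r = 6, 4, 2                              # toy sizes: nodes, dim, layers

Z = rng.normal(size=(N, d))                    # multi-layer node representations
Ys = [rng.normal(size=(N, d)) for _ in range(r)]  # stand-ins for g_l outputs
M = rng.normal(size=(d, d))                    # trainable shared scoring matrix
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

def discriminator(Y, Z):
    # bilinear discriminator D(Y_i, Z_i) = sigmoid(Y_i^T M Z_i), per node
    return sigmoid(np.einsum("ij,jk,ik->i", Y, M, Z))

omega = [1.0] * r                              # per-layer MI importance weights
lam = 1e-4                                     # regularization coefficient
loss = 0.0
for l, Y in enumerate(Ys):
    Y_neg = Y[rng.permutation(N)]              # step 2.2: row-shuffled negatives
    pos = discriminator(Y, Z)                  # step 2.3: positive-sample scores
    neg = discriminator(Y_neg, Z)              # step 2.3: negative-sample scores
    # binary cross entropy over positive and negative examples
    bce = -(np.log(pos + 1e-12) + np.log(1.0 - neg + 1e-12)).mean()
    loss += omega[l] * bce
loss += lam * (M ** 2).sum()                   # parameter regularization term
print(loss > 0.0)
```

In a real implementation the loss would be minimized with respect to Z, M, and the parameters of each g_l; here the point is only the shape of the computation.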
preferably, the node representation matrix of the multi-layer attribute network described in step 3 is the trainable parameter matrix Z ∈ R^{N×d}, where the representation vector of the i-th node corresponds to the d-dimensional vector Z_i in the i-th row of the matrix, N is the total number of multi-layer attribute network nodes, and d is the target representation space dimension of the multi-layer attribute network;
in step 3, the multi-layer attribute network representation learning model is trained with its loss function; the training method uses grid search over the hyper-parameters of the model, i.e., among all candidate values, every combination of hyper-parameters is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters include: the hyper-parameter set of model g_l to be tuned, the hyper-parameters ω_l controlling the importance of the mutual information of different layers, the regularization term coefficient λ, and the learning rate lr of the multi-layer attribute network representation learning model; the training method may adopt gradient descent to minimize the loss function of the multi-layer attribute network representation learning model, where the trainable parameters are those of the models g_l, the node representation matrix Z, and the scoring matrix M, with l ∈ [1, r];
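As a minimal illustration of the gradient descent step (not the full model), the sketch below updates a single node's representation vector to reduce the positive/negative binary cross entropy under a bilinear discriminator. The vectors y and y_neg, the matrix M, the learning rate, and the iteration count are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
y, y_neg = rng.normal(size=d), rng.normal(size=d)  # positive / negative scores' inputs
M = np.eye(d)                                       # toy shared scoring matrix
z = rng.normal(size=d)                              # node representation to optimize
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
lr = 0.05                                           # learning rate

def bce(z):
    # -log D(y, z) - log(1 - D(y_neg, z)) with D(a, z) = sigmoid(a^T M z)
    return -np.log(sigmoid(y @ M @ z)) - np.log(1.0 - sigmoid(y_neg @ M @ z))

before = bce(z)
for _ in range(200):
    p, n = sigmoid(y @ M @ z), sigmoid(y_neg @ M @ z)
    grad = -(1.0 - p) * (M.T @ y) + n * (M.T @ y_neg)  # analytic d(bce)/dz
    z -= lr * grad                                     # gradient descent update
print(bce(z) < before)
```

The update pushes z toward the positive sample's direction and away from the negative one, which is exactly what minimizing the cross entropy requires.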
the optimized node representation matrix of the multi-layer attribute network in step 3 is the parameter matrix Z after training and tuning of the multi-layer attribute network representation learning model.
The method uses mutual information to fuse the single-layer attribute network representation matrices, so that the fused multi-layer attribute network representation matrix uses a lower dimensionality while retaining as much input-related information as possible and focusing on the frequent patterns in each input layer. Mutual information is invariant under reparameterization of the variables; this property can reduce unnecessary noise introduced during training. In addition, the method can extend existing single-layer attribute network representation learning methods to multi-layer attribute network representation learning.
Drawings
To more clearly illustrate the embodiments of the present invention and the prior-art solutions, the embodiments and the prior art are briefly described below; the accompanying drawings used in the following description show some embodiments of the present invention:
FIG. 1: the invention discloses a multi-layer attribute network representation learning method based on mutual information maximization.
FIG. 2: the embodiment of the invention provides a flow chart of a multi-layer attribute network characterization learning model based on mutual information maximization.
Detailed Description
The method is mainly based on the deep learning technology to maximize the mutual information of the multilayer attribute network representation matrix and each layer of attribute network representation matrix so as to realize multilayer attribute network representation learning.
The method provided by the invention can realize the process by using a computer software technology. In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention.
A first embodiment of the present invention is a multi-layer attribute network characterization learning method based on mutual information maximization, which is characterized by comprising the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
each layer in the multi-layer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is defined as:
G = {G^(1), G^(2), …, G^(r), X} = {V, E, X},
where the number of network layers in step 1 is r = 2, r is a positive integer and r > 1; G^(l) denotes the l-th layer network, l ∈ [1, r]; each edge of each layer of the multi-layer attribute network corresponds to an actual semantic relationship, and there are r relationships in total, corresponding to the r layer networks;
in step 1, the total number of network nodes is N = 1000, where N is a positive integer; V = {v_1, v_2, …, v_N} is the node set, where v_i denotes the i-th node in the multi-layer attribute network G, i ∈ [1, N]; the node set of every layer network is the same, but the edge sets differ: E = {E^(1), E^(2), …, E^(r)} is the set of network edges, where E^(l) is the edge set of the l-th layer network;
the attribute dimension in step 1 is f = 200, where f is a positive integer; X ∈ R^{N×f} is the node attribute matrix of the multi-layer attribute network, where the attribute of the i-th node corresponds to the f-dimensional vector X_i in the i-th row of the matrix;
the target representation space dimension in step 1 is d = 32, with d < N, where N is the total number of multi-layer attribute network nodes;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the multilayer attribute network representation learning model by utilizing a mutual information maximization principle by combining the attribute dimension, the network layer number, the total number of network nodes and the target representation space dimension of the multilayer attribute network;
in step 2, the attribute dimension of the multi-layer attribute network is f = 200, the number of network layers is r = 2, the total number of network nodes is N = 1000, and the target representation space dimension is d = 32;
the mutual information maximization in step 2 is realized by maximizing a lower bound of the mutual information, namely

MI(X; Y) ≥ E_{P(X,Y)}[T_ω(x, y)] − log E_{P(X)P(Y)}[e^{T_ω(x, y)}],
where X and Y denote two random variables and MI(X; Y) denotes their mutual information, a measure of the nonlinear correlation between them; P(X,Y) is the joint distribution of X and Y, P(X)P(Y) is the product of their marginal distributions, and T_ω is a discriminator based on a deep neural network and parameterized by ω. The expectations in the formula can be estimated by sampling from P(X,Y) and from P(X)P(Y). If the discriminator can accurately distinguish samples of the joint distribution from samples of the product of marginals, X and Y are considered to have high mutual information. The expressive power of the discriminator ensures that the lower bound approaches the mutual information of the random variables X and Y with high precision;
the multilayer attribute network characterization learning model in the step 2 is specifically constructed by the following steps:
for the l-th layer network G^(l) of the multi-layer attribute network, l ∈ [1, r] with r = 2, the single-layer attribute network representation learning model adopted is g_l, with a set of trainable model parameters and a set of hyper-parameters to be tuned; model g_l computes Y_l = g_l(G^(l), X), where X is the node attribute matrix of the multi-layer attribute network and the output Y_l ∈ R^{N×d} is the node representation matrix of the l-th layer network, d = 32 is the target representation space dimension of the multi-layer attribute network, and N = 1000 is the total number of multi-layer attribute network nodes; model g_l has loss function L_l; the single-layer attribute network representation learning model adopted for each layer has the same form, but the trainable model parameters of each layer are independent;
defining the multi-layer attribute network node representation matrix as a trainable parameter matrix Z ∈ R^{N×d}, where the representation vector of the i-th node corresponds to the d-dimensional vector Z_i in the i-th row of the matrix, N is the total number of multi-layer attribute network nodes, and d is the target representation space dimension of the multi-layer attribute network;
step 2.1, for l ∈ [1, r], where r = 2 is the number of network layers of the multi-layer attribute network, input G^(l) and X into the l-th layer single-layer attribute network representation learning model g_l to obtain the output Y_l;
step 2.2, use a row shuffling function to randomly permute the rows of Y_l, obtaining the node representation negative-sample matrix Ỹ_l of the l-th layer network;
step 2.3, for l ∈ [1, r], where r = 2 is the number of network layers of the multi-layer attribute network, input the node representation matrix Y_l of the l-th layer network and the multi-layer attribute network node representation matrix Z into the discriminator D, obtaining the positive-sample output D(Y_{l,i}, Z_i) of the discriminator for the i-th node of the multi-layer attribute network; input the negative-sample matrix Ỹ_l of the l-th layer network and Z into the discriminator D, obtaining the negative-sample output D(Ỹ_{l,i}, Z_i) of the discriminator for the i-th node of the multi-layer attribute network;
where i ∈ [1, N], N = 1000 is the total number of nodes of the multi-layer attribute network; the discriminator D may adopt a bilinear function of the form:

D(Y_{l,i}, Z_i) = σ( (Y_{l,i})ᵀ M Z_i ),
where σ is the sigmoid nonlinear function, M ∈ R^{d×d} is a trainable shared scoring matrix, and d = 32 is the target representation space dimension of the multi-layer attribute network;
the loss function of the multi-layer attribute network representation learning model in step 2 adopts a binary cross entropy loss over positive and negative examples, of the form:

L = Σ_{l=1}^{r} [ L_l − ω_l · (1/(2N)) Σ_{i=1}^{N} ( log D(Y_{l,i}, Z_i) + log(1 − D(Ỹ_{l,i}, Z_i)) ) ] + λ·R,

where r = 2 is the number of network layers of the multi-layer attribute network, L_l is the loss function of model g_l, ω_l is a hyper-parameter controlling the importance of the mutual information of different layers, λ is the regularization term coefficient, R is the model parameter regularization term, N = 1000 is the total number of network nodes of the multi-layer attribute network, and D(Y_{l,i}, Z_i) and D(Ỹ_{l,i}, Z_i) are the outputs of the discriminator D of the multi-layer attribute network representation learning model;
step 3, randomly initializing a node representation matrix of the multilayer attribute network, training the multilayer attribute network representation learning model by combining a loss function of the multilayer attribute network representation learning model, and outputting the optimized node representation matrix of the multilayer attribute network;
the node representation matrix of the multi-layer attribute network in step 3 is the trainable parameter matrix Z ∈ R^{N×d}, where the representation vector of the i-th node corresponds to the d-dimensional vector Z_i in the i-th row of the matrix, N = 1000 is the total number of multi-layer attribute network nodes, and d = 32 is the target representation space dimension of the multi-layer attribute network;
in step 3, the multi-layer attribute network representation learning model is trained with its loss function; the training method uses grid search over the hyper-parameters of the model, i.e., among all candidate values, every combination of hyper-parameters is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters include: the hyper-parameter set of model g_l to be tuned, the hyper-parameters ω_l controlling the importance of the mutual information of different layers, the regularization term coefficient λ, and the learning rate lr of the multi-layer attribute network representation learning model; the training method may adopt gradient descent to minimize the loss function of the multi-layer attribute network representation learning model, where the trainable parameters are those of the models g_l, the node representation matrix Z, and the scoring matrix M, with l ∈ [1, r] and r = 2 the number of network layers of the multi-layer attribute network;
the learning rate of the multi-layer attribute network representation learning model may be chosen from:
lr ∈ [0.0001, 0.0005, 0.001, 0.005],
the regularization term coefficient from:
λ ∈ [0.00001, 0.0001, 0.001, 0.01],
and the hyper-parameters controlling the importance of the mutual information of different layers from:
ω1 ∈ [0.6, 0.8, 2.0, 3.0],
ω2 ∈ [0.6, 0.8, 2.0, 3.0],
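The grid search over these candidate values can be sketched as follows; evaluate() is a hypothetical placeholder for training the model with one hyper-parameter combination and returning a validation score (its formula here is arbitrary, chosen only so the loop has something deterministic to compare).

```python
import itertools

# candidate values listed above
lrs     = [0.0001, 0.0005, 0.001, 0.005]
lams    = [0.00001, 0.0001, 0.001, 0.01]
omega1s = [0.6, 0.8, 2.0, 3.0]
omega2s = [0.6, 0.8, 2.0, 3.0]

def evaluate(lr, lam, w1, w2):
    # hypothetical stand-in for "train model, return validation score"
    return -(lr + lam + abs(w1 - 1.0) + abs(w2 - 1.0))

best_score, best_params = float("-inf"), None
for lr, lam, w1, w2 in itertools.product(lrs, lams, omega1s, omega2s):
    score = evaluate(lr, lam, w1, w2)
    if score > best_score:                      # keep the best-performing combo
        best_score, best_params = score, (lr, lam, w1, w2)

print(len(lrs) * len(lams) * len(omega1s) * len(omega2s))  # combinations tried
print(best_params)
```

With four candidates per hyper-parameter the search tries 4 × 4 × 4 × 4 = 256 combinations; in practice each evaluation would be a full training run, so the grid size is the dominant cost.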
the optimized node representation matrix of the multi-layer attribute network in step 3 is the parameter matrix Z after training and tuning of the multi-layer attribute network representation learning model.
The second embodiment of the invention is described by combining a single-layer attribute network characterization learning method Deep Graph InfoMax (DGI), and comprises the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
in specific implementation, each layer in the multi-layer attribute network in step 1 is an undirected network;
the multilayer attribute network in step 1 is:
G = {G^(1), G^(2), X} = {V, E, X},
where the number of network layers in step 1 is r = 2, r is a positive integer and r > 1; G^(1) denotes the layer-1 network and G^(2) denotes the layer-2 network;
in step 1, the total number of network nodes is N = 1000, where N is a positive integer; V = {v_1, v_2, …, v_N} is the node set, where v_i denotes the i-th node in the two-layer attribute network G, i ∈ [1, N]; the node set of each layer network is the same, but the edge sets differ: E = {E^(1), E^(2)} is the set of network edges, where E^(1) is the edge set of the layer-1 network and E^(2) is the edge set of the layer-2 network;
the attribute dimension in step 1 is f = 200, where f is a positive integer; X ∈ R^{N×f} is the node attribute matrix of the multi-layer attribute network, where the attribute of the i-th node corresponds to the f-dimensional vector X_i in the i-th row of the matrix;
the target representation space dimension in step 1 is d = 32, with d < N, where N is the total number of multi-layer attribute network nodes;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the model by utilizing a mutual information maximization principle by combining attribute dimensions, network layer numbers, network node total numbers and target representation space dimensions in a multilayer attribute network;
in step 2, the attribute dimension of the multi-layer attribute network is f = 200, the number of network layers is r = 2, the total number of network nodes is N = 1000, and the target representation space dimension is d = 32;
FIG. 2 shows a multi-layer attribute network characterization learning model process in conjunction with the single-layer attribute network characterization learning model Deep Graph InfoMax (DGI),
where A^(l)_{ij} > 0 indicates that the i-th node and the j-th node in the l-th layer network of the multi-layer attribute network are connected by an edge with that weight, and A^(l)_{ij} = 0 indicates that no edge exists between the i-th node and the j-th node in the l-th layer network, l ∈ [1, r], where r = 2 is the number of network layers of the multi-layer attribute network;
the original attribute networks refer to {A^(1), X} and {A^(2), X}, and the negative-sample attribute networks refer to {A^(1), X̃} and {A^(2), X̃}, where X̃ denotes a row-shuffled (corrupted) attribute matrix; the encoder adopts the graph convolutional neural network GCN, and the node local representation matrix Y^(l) ∈ R^{N×d} is:

Y^(l) = σ( (D̂^(l))^{−1/2} Â^(l) (D̂^(l))^{−1/2} X W^(l) ), with Â^(l) = A^(l) + γ·I_N,
where Y^(l) is the original node local representation matrix of the l-th layer and Y^(l)_i is the original local representation vector of the i-th node of the l-th layer; Ỹ^(l) is the l-th layer negative-sample node local representation matrix and Ỹ^(l)_i is the negative-sample local representation vector of the i-th node of the l-th layer; Â^(l) = A^(l) + γ·I_N is the adjacency matrix of the l-th layer with self-loops inserted, where γ controls the importance of the node itself and generally takes an integer value in [1, 5]; I_N is the N × N identity matrix; D̂^(l) is the corresponding degree matrix, an N × N diagonal matrix of the form D̂^(l)_ii = Σ_j Â^(l)_ij;
n1000 is the total number of network nodes in the multi-layer attribute network, W(l)Is the weight parameter matrix learnable by the l-th layer, sigma is the ReLU nonlinear activation function, and l is the [1, r ]]And r is 2, the network layer number of the multilayer attribute network;
the Readout function adopts average pooling, in the concrete form:

s^(l) = σ( (1/N) Σ_{i=1}^{N} Y^(l)_i ),
where Y^(l)_i is the original local representation vector of the i-th node of the l-th layer, s^(l) is the global vector of the l-th layer network representation, σ denotes the sigmoid nonlinear function, l ∈ [1, r] with r = 2, and N = 1000 is the total number of network nodes of the multi-layer attribute network;
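The GCN encoder layer and the average-pooling Readout described above can be sketched together as follows; the toy graph, γ, and the random weights W are assumptions, and σ is ReLU in the encoder but sigmoid in the readout, as in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, f, d, gamma = 4, 3, 2, 1                    # tiny toy sizes; gamma in [1, 5]

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)      # undirected adjacency matrix
X = rng.normal(size=(N, f))                    # node attribute matrix
W = rng.normal(size=(f, d))                    # learnable layer weight matrix

A_hat = A + gamma * np.eye(N)                  # insert self-loops weighted by gamma
D_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetric degree normalization

Y = np.maximum(A_norm @ X @ W, 0.0)            # encoder: ReLU(D^-1/2 A_hat D^-1/2 X W)
s = 1.0 / (1.0 + np.exp(-Y.mean(axis=0)))      # readout: sigmoid of average pooling

print(Y.shape, s.shape)
```

Y plays the role of the per-node local representations Y^(l), and s plays the role of the layer's global summary vector s^(l) fed to the layer discriminator.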
the discriminator of the layer representation is realized by a bilinear function of the form:

D^(l)(Y^(l)_i, s^(l)) = σ( (Y^(l)_i)ᵀ B s^(l) ),
where Y^(l)_i is the original local representation vector of the i-th node of the l-th layer, s^(l) is the global vector of the l-th layer network representation, σ is the sigmoid nonlinear function, B ∈ R^{d×d} is a trainable shared scoring matrix, d = 32 is the target representation space dimension of the multi-layer attribute network, l ∈ [1, r], and r = 2 is the number of network layers of the multi-layer attribute network; the output is the output of the discriminator D^(l);
the discriminator of the negative-sample layer characterization shares its scoring matrix with the discriminator of step 205, and is of the form:
D_1(Ỹ_i^(l), s^(l)) = σ( (Ỹ_i^(l))^T M_1 s^(l) ),
wherein Ỹ_i^(l) is the negative-sample node local characterization vector of the i-th node of the l-th layer, s^(l) is the global vector of the l-th layer network characterization, σ is the sigmoid nonlinear function, M_1 ∈ R^{d×d} is a trainable shared scoring matrix, d = 32 is the target characterization space dimension of the multilayer attribute network, l ∈ [1, r], r = 2 is the number of network layers of the multilayer attribute network, and the output is the discriminator output D_1(Ỹ_i^(l), s^(l));
the discriminator of the multilayer attribute network characterization is realized by a bilinear function, in the form:
D_2(Y_i^(l), Z_i) = σ( (Y_i^(l))^T M_2 Z_i ),
wherein Y_i^(l) is the original-node local characterization vector of the i-th node of the l-th layer, Z_i is the multilayer attribute network node characterization vector of the i-th node, σ is the sigmoid nonlinear function, M_2 ∈ R^{d×d} is a trainable shared scoring matrix, d = 32 is the target characterization space dimension of the multilayer attribute network, l ∈ [1, r], r = 2 is the number of network layers of the multilayer attribute network, and the output is the discriminator output D_2(Y_i^(l), Z_i);
the discriminator of the negative-sample multilayer attribute network characterization shares its scoring matrix with the discriminator in step 207, and is of the form:
D_2(Ỹ_i^(l), Z_i) = σ( (Ỹ_i^(l))^T M_2 Z_i ),
wherein Ỹ_i^(l) is the negative-sample node local characterization vector of the i-th node of the l-th layer, Z_i is the multilayer attribute network node characterization vector of the i-th node, σ is the sigmoid nonlinear function, M_2 ∈ R^{d×d} is a trainable shared scoring matrix, d = 32 is the target characterization space dimension of the multilayer attribute network, l ∈ [1, r], r = 2 is the number of network layers of the multilayer attribute network, and the output is the discriminator output D_2(Ỹ_i^(l), Z_i);
the loss function described in step 2 is of the form:
L = − Σ_{l=1}^{r} ω_l Σ_{i=1}^{N} [ log D_1(Y_i^(l), s^(l)) + log(1 − D_1(Ỹ_i^(l), s^(l))) + log D_2(Y_i^(l), Z_i) + log(1 − D_2(Ỹ_i^(l), Z_i)) ] + λ‖θ_attr‖²,
wherein ω_l is the hyper-parameter controlling the importance of the mutual information of the different layers, λ is the regularization term coefficient, θ_attr = {W^(1), W^(2), M_1, M_2, Z} is the model parameter regularization term, N = 1000 is the total number of network nodes of the multilayer attribute network, the D_1(Y_i^(l), s^(l)) term is the output obtained in step 205, the D_1(Ỹ_i^(l), s^(l)) term is the output obtained in step 206, the D_2(Y_i^(l), Z_i) term is the output obtained in step 207, the D_2(Ỹ_i^(l), Z_i) term is the output obtained in step 208, l ∈ [1, r], and r = 2 is the number of network layers of the multilayer attribute network;
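A minimal sketch of this positive/negative binary cross-entropy objective (argument names are ours; pos1/neg1 are the step-205/206 outputs and pos2/neg2 the step-207/208 outputs, each a per-layer list of per-node score arrays):

```python
import numpy as np

def infomax_loss(pos1, neg1, pos2, neg2, omega, lam=0.0, reg=0.0):
    """Negative log-likelihood of the positive/negative discriminations,
    weighted per layer by omega[l], plus an L2 regularization term."""
    loss = 0.0
    for l in range(len(pos1)):
        term = (np.log(pos1[l]) + np.log(1.0 - neg1[l])
                + np.log(pos2[l]) + np.log(1.0 - neg2[l]))
        loss -= omega[l] * term.sum()
    return loss + lam * reg
```

The loss falls as the discriminators assign high scores to positive pairs and low scores to negative pairs, which is exactly the mutual-information-maximizing behavior described above.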
step 3, randomly initializing a node representation matrix of the multilayer attribute network, training the multilayer attribute network representation learning model by combining a loss function of the multilayer attribute network representation learning model, and outputting the optimized node representation matrix of the multilayer attribute network;
the multilayer attribute network node characterization matrix in step 3 is a trainable parameter matrix Z ∈ R^{N×d}; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector Z_i of the i-th row of the matrix, N = 1000 is the total number of multilayer attribute network nodes, and d = 32 is the target characterization space dimension of the multilayer attribute network;
in step 3, the multilayer attribute network characterization learning model is trained with its loss function; the training method can adopt grid search over the model hyper-parameters, i.e., among all candidate parameters, every parameter combination is tried by cyclic traversal, and the best-performing parameters are taken as the final result; the hyper-parameters comprise {lr, λ, γ, ω_1, ω_2}, wherein lr is the model learning rate, λ is the loss function regularization term coefficient, γ is the parameter of the encoder GCN controlling the importance of the node itself, and ω_1, ω_2 are the hyper-parameters controlling the importance of the mutual information of the layer-1 and layer-2 networks;
the learning rate of the multilayer attribute network characterization learning model can take the following values:
lr ∈ [0.0001, 0.0005, 0.001, 0.005],
the regularization term coefficient can take the following values:
λ ∈ [0.00001, 0.0001, 0.001, 0.01],
the parameter of the encoder GCN controlling the importance of the node itself can take the following values:
γ ∈ [1.0, 2.0, 3.0, 4.0, 5.0],
the hyper-parameters controlling the importance of the mutual information of different layers can take the following values:
ω_1 ∈ [0.6, 0.8, 2.0, 3.0],
ω_2 ∈ [0.6, 0.8, 2.0, 3.0],
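The grid search over these candidate values can be sketched as follows; `train_and_eval` is a hypothetical stand-in for fitting the model with one hyper-parameter combination and returning a validation score:

```python
import itertools

# Candidate values copied from the lists above (key names are ours).
GRID = {
    "lr":     [0.0001, 0.0005, 0.001, 0.005],
    "lam":    [0.00001, 0.0001, 0.001, 0.01],
    "gamma":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "omega1": [0.6, 0.8, 2.0, 3.0],
    "omega2": [0.6, 0.8, 2.0, 3.0],
}

def grid_search(train_and_eval, grid=GRID):
    """Cyclically traverse every parameter combination and keep the
    best-performing one as the final result."""
    best_score, best_params = float("-inf"), None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = train_and_eval(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

With the grid above this is 4 × 4 × 5 × 4 × 4 = 1280 combinations, each requiring one training run.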
the training method can adopt a gradient descent method to minimize the loss function of the multilayer attribute network characterization learning model, wherein θ_attr = {W^(1), W^(2), M_1, M_2, Z} is the trainable model parameter set;
the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and adjustment of the multilayer attribute network characterization learning model.
The method provided by the invention has the following advantages or beneficial technical effects:
the invention provides a multi-layer attribute network representation learning method based on mutual information maximization. The method fuses single-layer attribute network node representation matrixes by using a mutual information maximization principle, so that the fused multilayer attribute network node representation matrixes can express as much information as possible in a lower-dimensional space and focus on frequent modes in each layer of attribute network. Under the re-parameterization of variables, the mutual information is invariant. Using this property may reduce some of the unnecessary noise introduced during the training process. In addition, the method can extend the existing single-layer attribute network representation learning method to a multi-layer attribute network representation learning method.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.
Claims (4)
1. A multi-layer attribute network representation learning method based on mutual information maximization is characterized by comprising the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the multilayer attribute network representation learning model by utilizing a mutual information maximization principle by combining the attribute dimension, the network layer number, the total number of network nodes and the target representation space dimension of the multilayer attribute network;
and step 3, randomly initializing a node characterization matrix of the multilayer attribute network, training the multilayer attribute network characterization learning model by combining a loss function of the multilayer attribute network characterization learning model, and outputting the optimized node characterization matrix of the multilayer attribute network.
2. The mutual information maximization-based multi-layer attribute network characterization learning method according to claim 1, characterized in that:
each layer in the multilayer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is defined as:
G={G(1),G(2),...,G(r),X}={V,E,X},
wherein the number of network layers in step 1 is r, r is a positive integer and r > 1, and G^(l) denotes the l-th network; the connecting edges of each layer of the multilayer attribute network correspond to an actual semantic relationship, there are r relationships in total, corresponding to the r networks, and l ∈ [1, r];
In step 1, the total number of network nodes is N, N is a positive integer, and V = {v_1, v_2, ..., v_N} is the set of nodes, where v_i represents the i-th node in the multilayer attribute network G, i ∈ [1, N]; the set of nodes in each layer network is the same, but the sets of connecting edges differ: E = {E^(1), E^(2), ..., E^(r)} is the set of network edges, and E^(l) is the connecting edge set of the l-th layer network;
the attribute dimension in step 1 is f, f is a positive integer, and X ∈ R^{N×f} is the node attribute matrix of the multilayer attribute network, wherein the attributes of the i-th node in the multilayer attribute network correspond to the f-dimensional vector X_i of the i-th row of the matrix;
The target characterization space dimension in the step 1 is d, d is less than N, and N is the total number of the multilayer attribute network nodes.
3. The mutual information maximization-based multi-layer attribute network characterization learning method according to claim 1, characterized in that:
the attribute dimension of the multilayer attribute network in the step 2 is f, the number of network layers of the multilayer attribute network in the step 2 is r, the total number of network nodes of the multilayer attribute network in the step 2 is N, and the target characterization space dimension of the multilayer attribute network in the step 2 is d;
the mutual information maximization method in step 2 is realized by maximizing a lower bound of the mutual information, namely
MI(X; Y) ≥ E_{P_XY}[T_ω(x, y)] − log E_{P_X ⊗ P_Y}[e^{T_ω(x, y)}],
where X, Y denote two variables and MI(X; Y) denotes the mutual information of the variables, a measure of the nonlinear correlation between the two variables; P_XY is the joint distribution of the variables X, Y; P_X ⊗ P_Y is the product of the marginal distributions of the variables X, Y; T_ω is a deep-neural-network-based discriminator parameterized by ω; the expected values in the formula can be estimated by sampling from P_XY and from P_X ⊗ P_Y; if the discriminator can accurately distinguish samples of the joint distribution from samples of the product of the marginal distributions, X and Y are considered to have higher mutual information; the expressive power of the discriminator ensures that the lower bound approaches the mutual information of the random variables X and Y with high precision;
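This lower bound can be illustrated numerically. The sketch below (entirely ours, not from the patent) uses a fixed, untrained critic T(x, y) = 0.3·x·y on correlated Gaussian samples, for which the true mutual information is known in closed form, and checks that the sampled bound stays below it; a trained neural critic T_ω would tighten the bound:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.8, 100_000

# Correlated Gaussian pair (X, Y) with correlation rho.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

def critic(a, b):
    return 0.3 * a * b            # fixed toy critic T(x, y)

# E_{P_XY}[T] - log E_{P_X x P_Y}[e^T], both expectations by sampling;
# shuffling y breaks the dependence, giving samples of the product of marginals.
t_joint = critic(x, y)
t_marg = critic(x, rng.permutation(y))
dv_bound = t_joint.mean() - np.log(np.exp(t_marg).mean())

true_mi = -0.5 * np.log(1.0 - rho**2)   # exact MI of a bivariate Gaussian
```

With this weak critic the sampled bound (roughly 0.19 nats) sits well below the true mutual information (about 0.51 nats); maximizing over T_ω is what closes the gap.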
the multilayer attribute network characterization learning model in the step 2 is specifically constructed by the following steps:
for the l-th network G^(l) of the multilayer attribute network, l ∈ [1, r], the single-layer attribute network characterization learning model is g_l, whose trainable model parameter set is θ_l and whose hyper-parameter set to be adjusted is Φ_l; the model g_l has the functional form Y_l = g_l(G^(l), X), where X is the node attribute matrix of the multilayer attribute network and the output Y_l ∈ R^{N×d} is the node characterization matrix of the l-th network of the multilayer attribute network, d is the target characterization space dimension, and N is the total number of multilayer attribute network nodes; the model g_l has a loss function L_l; the single-layer attribute network characterization learning model adopted by each layer has the same form, but the trainable model parameters are independent;
defining the multilayer attribute network node characterization matrix as a trainable parameter matrix Z ∈ R^{N×d}; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector Z_i of the i-th row of the matrix, N is the total number of multilayer attribute network nodes, and d is the target characterization space dimension;
step 2.1, for l ∈ [1, r], input G^(l) and X into the l-th layer attribute network characterization learning model g_l to obtain the output Y_l;
step 2.2, use the row shuffling function to randomly permute the row order of Y_l, obtaining the node characterization negative-sample matrix Ỹ_l of the l-th network of the multilayer attribute network;
step 2.3, for l ∈ [1, r], input the node characterization matrix Y_l of the l-th network of the multilayer attribute network and the multilayer attribute network node characterization matrix Z into the discriminator D, obtaining the positive-sample output D(Y_i^l, Z_i) of the discriminator with respect to the i-th node of the multilayer attribute network; input the node characterization negative-sample matrix Ỹ_l of the l-th network and the multilayer attribute network node characterization matrix Z into the discriminator D, obtaining the negative-sample output D(Ỹ_i^l, Z_i) of the discriminator with respect to the i-th node of the multilayer attribute network;
wherein i ∈ [1, N], N is the total number of nodes in the multilayer attribute network, and the discriminator D may adopt a bilinear function of the form:
D(Y_i^l, Z_i) = σ( (Y_i^l)^T M Z_i ),
where σ is the sigmoid nonlinear function, M ∈ R^{d×d} is a trainable shared scoring matrix, and d is the target characterization space dimension of the multilayer attribute network;
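The row-shuffling corruption of step 2.2 can be sketched as follows (our naming; the shuffle keeps the set of characterization vectors intact but destroys the node correspondence, which is what makes the permuted rows usable as negative samples):

```python
import numpy as np

rng = np.random.default_rng(0)

def corruption(Y):
    """Randomly permute the rows of Y_l to obtain the negative-sample
    characterization matrix: the same vectors, detached from their nodes."""
    return Y[rng.permutation(Y.shape[0])]
```

Scoring each row of Y_l and of its corrupted counterpart against Z_i with the bilinear discriminator then yields the positive and negative outputs of step 2.3.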
the loss function of the multilayer attribute network characterization learning model in step 2 adopts positive and negative example binary cross-entropy loss, in the form:
L = Σ_{l=1}^{r} [ L_l − ω_l Σ_{i=1}^{N} ( log D(Y_i^l, Z_i) + log(1 − D(Ỹ_i^l, Z_i)) ) ] + λ‖θ‖²,
wherein r is the number of network layers of the multilayer attribute network, L_l is the loss function of the model g_l, ω_l is the hyper-parameter controlling the importance of the mutual information of different layers, λ is the regularization term coefficient, ‖θ‖² is the model parameter regularization term, N is the total number of network nodes of the multilayer attribute network, and D(Y_i^l, Z_i) and D(Ỹ_i^l, Z_i) are the outputs of the multilayer attribute network characterization learning model discriminator D.
4. The mutual information maximization-based multi-layer attribute network characterization learning method according to claim 1, characterized in that:
the node characterization matrix of the multilayer attribute network in the step 3 is a trainable parameter matrixThe characterization vector of the ith node in the multilayer attribute network corresponds to the d-dimensional vector Z of the ith row of the matrixiN is the total number of nodes of the multilayer attribute network, and d is the target representation space dimension of the multilayer attribute network;
in step 3, the multilayer attribute network characterization learning model is trained with its loss function; the training method adopts grid search over the hyper-parameters of the multilayer attribute network characterization learning model, i.e., among all candidate parameters, every parameter combination is tried by cyclic traversal, and the best-performing parameters are taken as the final result; the hyper-parameters comprise: the hyper-parameter set Φ_l to be adjusted of the model g_l, the hyper-parameter ω_l controlling the importance of the mutual information of different layers, the regularization term coefficient λ, and the learning rate lr of the multilayer attribute network characterization learning model; the training method can adopt a gradient descent method to minimize the loss function of the multilayer attribute network characterization learning model, wherein θ = {θ_1, ..., θ_r, M, Z} are the trainable model parameters, l ∈ [1, r];
And the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and adjustment of the multilayer attribute network characterization learning model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110398736.9A CN113205175A (en) | 2021-04-12 | 2021-04-12 | Multi-layer attribute network representation learning method based on mutual information maximization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113205175A true CN113205175A (en) | 2021-08-03 |
Family
ID=77026776
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107622307A (en) * | 2017-09-11 | 2018-01-23 | 浙江工业大学 | A kind of Undirected networks based on deep learning connect side right weight Forecasting Methodology |
CN109101629A (en) * | 2018-08-14 | 2018-12-28 | 合肥工业大学 | A kind of network representation method based on depth network structure and nodal community |
CN109376857A (en) * | 2018-09-03 | 2019-02-22 | 上海交通大学 | A kind of multi-modal depth internet startup disk method of fusion structure and attribute information |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116304367A (en) * | 2023-02-24 | 2023-06-23 | 河北师范大学 | Algorithm and device for obtaining communities based on graph self-encoder self-supervision training |
CN116304367B (en) * | 2023-02-24 | 2023-12-01 | 河北师范大学 | Algorithm and device for obtaining communities based on graph self-encoder self-supervision training |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gao et al. | Deep transfer learning for image‐based structural damage recognition | |
Li et al. | LGM-Net: Learning to generate matching networks for few-shot learning | |
Zou et al. | Deep learning based feature selection for remote sensing scene classification | |
Kadam et al. | Efficient approach towards detection and identification of copy move and image splicing forgeries using mask R-CNN with MobileNet V1 | |
Rohekar et al. | Constructing deep neural networks by Bayesian network structure learning | |
Chen et al. | Automated design of neural network architectures with reinforcement learning for detection of global manipulations | |
CN113159067A (en) | Fine-grained image identification method and device based on multi-grained local feature soft association aggregation | |
KR20190098801A (en) | Classificating method for image of trademark using machine learning | |
CN111461175A (en) | Label recommendation model construction method and device of self-attention and cooperative attention mechanism | |
CN114611617A (en) | Depth field self-adaptive image classification method based on prototype network | |
Choudhary et al. | Inference-aware convolutional neural network pruning | |
Kumar et al. | Pair wise training for stacked convolutional autoencoders using small scale images | |
Nalini et al. | Comparative analysis of deep network models through transfer learning | |
Tan et al. | Performance comparison of three types of autoencoder neural networks | |
CN112948581B (en) | Patent automatic classification method and device, electronic equipment and storage medium | |
CN113205175A (en) | Multi-layer attribute network representation learning method based on mutual information maximization | |
He et al. | Classification of metro facilities with deep neural networks | |
CN112905906A (en) | Recommendation method and system fusing local collaboration and feature intersection | |
Djibrine et al. | Transfer Learning for Animal Species Identification from CCTV Image: Case Study Zakouma National Park | |
CN114265954B (en) | Graph representation learning method based on position and structure information | |
Lee et al. | Ensemble of binary tree structured deep convolutional network for image classification | |
Fisher et al. | Tentnet: Deep learning tent detection algorithm using a synthetic training approach | |
CN112015854B (en) | Heterogeneous data attribute association method based on self-organizing mapping neural network | |
Chu et al. | Mixed-precision quantized neural network with progressively decreasing bitwidth for image classification and object detection | |
Rodriguez-Coayahuitl et al. | Convolutional genetic programming |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20210803 |