CN113205175A - Multi-layer attribute network representation learning method based on mutual information maximization - Google Patents

Multi-layer attribute network representation learning method based on mutual information maximization

Info

Publication number
CN113205175A
Authority
CN
China
Prior art keywords
network
multilayer
attribute
layer
attribute network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110398736.9A
Other languages
Chinese (zh)
Inventor
江昊
王强
聂琦
羿舒文
彭姿文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202110398736.9A
Publication of CN113205175A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The invention belongs to the technical field of deep network representation learning, and particularly relates to a multi-layer attribute network representation learning method based on mutual information maximization. The method inputs a multilayer attribute network and a target representation space dimension; constructs a multilayer attribute network representation learning model and its loss function by combining the attribute dimension, number of network layers, total number of network nodes, and target representation space dimension of the multilayer attribute network with the mutual information maximization principle; trains the multilayer attribute network representation learning model under this loss function; and outputs the multilayer attribute network node representation matrix. The invention uses the mutual information maximization principle to extend existing single-layer attribute network representation learning methods to multilayer attribute networks and obtains node vector representations of the multilayer attribute network in a low-dimensional target space; the relationships between the node vectors preserve the structural proximity and attribute similarity between nodes in the multilayer attribute network, which benefits analysis tasks on multilayer attribute networks.

Description

Multi-layer attribute network representation learning method based on mutual information maximization
Technical Field
The invention belongs to the field of deep learning and network characterization learning, and particularly relates to a multi-layer attribute network characterization learning method based on mutual information maximization.
Background
Networks have strong representation capability and can model entities and the relationships among them in many fields, such as molecular networks, protein interaction networks, recommendation systems, social networks, and citation networks. In recent years, effective network analysis techniques have provided methods for mining the latent information of data, leading to wide applications such as community detection, link prediction, and node classification. However, many network analysis methods incur high time and space complexity when dealing with large-scale networks. Furthermore, many machine learning algorithms expect their input, including network-structured data, to be represented as vectors. Traditional methods use statistical parameters, kernel functions, or manually designed features to describe structural information, but these designs are expensive and cannot be adjusted during learning. Network representation learning is one of the most effective ways to solve such problems: it converts high-dimensional sparse network information into low-dimensional dense real-valued vectors that machine learning algorithms can use efficiently, thereby benefiting many downstream tasks.
Currently, there are many methods for single-layer network representation learning, but a real-world network usually involves multiple kinds of relationships, each of which can independently form one layer of a network; such a network is called a multi-layer network. Compared with a single-layer network, the entities of a multi-layer network have more complex relationships and richer semantics. Assuming that each layer has an independent semantic space, one can embed the layers one by one and then concatenate the per-layer embeddings, but this stores the information of each layer independently. This is feasible for multi-layer networks whose layers differ greatly. However, existing single-layer network representation learning methods applied to multi-layer networks face the following problems: (1) Many multi-layer networks are not simply random combinations of multiple single-layer networks; their layers exhibit significant correlation. Fusing embeddings by simple operations such as concatenation or weighted summation loses the connections among layers and ignores inter-layer correlation. (2) Each individual layer is typically sparse and biased, so concatenating independently learned embeddings may be strongly biased. (3) A multi-layer network is composed of networks with different relationships, and the layers generated by different relationships have latent consistency and complementarity, which simple operations such as concatenation and weighted summation can hardly exploit.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-layer attribute network representation learning method based on mutual information maximization, so as to solve the problem that existing single-layer attribute network representation learning methods are difficult to apply directly to multi-layer attribute network representation.
The technical scheme of the invention provides a multilayer attribute network representation learning method based on mutual information maximization, which is characterized by comprising the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the multilayer attribute network representation learning model by utilizing a mutual information maximization principle by combining the attribute dimension, the network layer number, the total number of network nodes and the target representation space dimension of the multilayer attribute network;
step 3, randomly initializing a node representation matrix of the multilayer attribute network, training the multilayer attribute network representation learning model by combining a loss function of the multilayer attribute network representation learning model, and outputting the optimized node representation matrix of the multilayer attribute network;
preferably, each layer in the multi-layer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is defined as:
$G=\{G^{(1)},G^{(2)},\ldots,G^{(r)},X\}=\{V,E,X\}$,
wherein the number of network layers in step 1 is r, r is a positive integer and r>1; $G^{(l)}$ denotes the l-th layer network; each layer's connecting edges correspond to an actual semantic relationship, there are r relationships in total, corresponding to the r layers, and $l\in[1,r]$;
in step 1, the total number of network nodes is N, N is a positive integer; $V=\{v_1,v_2,\ldots,v_N\}$ is the set of nodes, and $v_i$ denotes the i-th node in the multi-layer attribute network G, $i\in[1,N]$; the set of nodes in each layer is the same, but the sets of connecting edges differ: $E=\{E^{(1)},E^{(2)},\ldots,E^{(r)}\}$ is the set of network edges, and $E^{(l)}$ is the connecting-edge set of the l-th layer network;
the attribute dimension in step 1 is f, where f is a positive integer; $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multi-layer attribute network, and the attribute of the i-th node corresponds to the f-dimensional vector $X_i$ in the i-th row of the matrix;
the target characterization space dimension in step 1 is d, d<N, where N is the total number of multi-layer attribute network nodes;
preferably, the attribute dimension of the multilayer attribute network in step 2 is f, the number of network layers of the multilayer attribute network in step 2 is r, the total number of network nodes of the multilayer attribute network in step 2 is N, and the target characterization space dimension of the multilayer attribute network in step 2 is d;
the mutual information maximization in step 2 is realized by maximizing a lower bound of the mutual information, namely
$\mathrm{MI}(X;Y)\ \ge\ \mathbb{E}_{\mathbb{P}_{XY}}\!\left[T_\omega(x,y)\right]-\log\,\mathbb{E}_{\mathbb{P}_X\otimes\mathbb{P}_Y}\!\left[e^{T_\omega(x,y)}\right],$
where X and Y denote two random variables and MI(X;Y) denotes their mutual information, a measure of the (possibly nonlinear) dependence between the two variables; $\mathbb{P}_{XY}$ is the joint distribution of X and Y, $\mathbb{P}_X\otimes\mathbb{P}_Y$ is the product of their marginal distributions, and $T_\omega$ is a deep-neural-network-based discriminator parameterized by ω; the expected values in the formula can be estimated by sampling from $\mathbb{P}_{XY}$ and from $\mathbb{P}_X\otimes\mathbb{P}_Y$; if the discriminator can accurately distinguish samples of the joint distribution from samples of the product of the marginal distributions, X and Y are considered to have high mutual information; the expressive power of the discriminator ensures that the lower bound approaches the mutual information of the random variables X and Y with high precision;
the multilayer attribute network characterization learning model in step 2 is specifically constructed as follows:
for the l-th layer network $G^{(l)}$, $l\in[1,r]$, of the multi-layer attribute network, the single-layer attribute network characterization learning model adopted is $g_l$, whose trainable model parameter set is $\theta_l$ and whose hyper-parameter set to be tuned is $\Phi_l$; model $g_l$ acts as $Y_l=g_l(G^{(l)},X)$, where X is the node attribute matrix of the multi-layer attribute network, the output $Y_l\in\mathbb{R}^{N\times d}$ is the node characterization matrix of the l-th layer network, d is the target characterization space dimension, N is the total number of multi-layer attribute network nodes, and the loss function of model $g_l$ is $\mathcal{L}_{g_l}$; the single-layer attribute network characterization learning model adopted by each layer has the same form, but the trainable model parameters are independent;
the multi-layer attribute network node characterization matrix is defined as a trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of multilayer attribute network nodes, and d is the target characterization space dimension;
step 2.1, for $l\in[1,r]$, input $G^{(l)}$ and X into the l-th layer attribute network characterization learning model $g_l$ to obtain the output $Y_l$;
step 2.2, use a row-shuffling function $\mathcal{C}$ to randomly permute the row order of $Y_l$, obtaining the node characterization negative-sample matrix $\widetilde{Y}_l$ of the l-th layer network of the multilayer attribute network;
Step 2.3, for l ∈ [1, r)]Characterizing the nodes of the l-th network of the multilayer attribute network into a matrix YlMultilayer attribute network node characterization matrix Z input discriminator
Figure BDA00030150761700000311
Deriving a positive sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure BDA00030150761700000312
Characterizing nodes of a layer I network of a multi-layer attribute network into a negative sample matrix
Figure BDA00030150761700000313
Multilayer attribute network node characterization matrix Z input discriminator
Figure BDA00030150761700000314
Deriving a negative sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure BDA0003015076170000041
Wherein i ∈ [1, N ∈ ]]N is the total number of nodes in the multilayer attribute network, and a discriminator
Figure BDA0003015076170000042
A bilinear function may be employed, of the form:
Figure BDA0003015076170000043
Figure BDA0003015076170000044
sigma is a sigmoid non-linear function,
Figure BDA0003015076170000045
the method comprises the following steps of (1) obtaining a trainable shared scoring matrix, wherein d is a target characterization space dimension of a multilayer attribute network;
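A minimal PyTorch sketch of steps 2.2 and 2.3 follows, assuming dense characterization matrices; the class and function names and the Xavier initialization of M are assumptions of the example:

```python
import torch
import torch.nn as nn

def row_shuffle(Y: torch.Tensor) -> torch.Tensor:
    """Corruption function C: randomly permute the rows of Y_l to obtain
    the node characterization negative-sample matrix."""
    return Y[torch.randperm(Y.size(0))]

class BilinearDiscriminator(nn.Module):
    """D(Y_li, Z_i) = sigmoid(Y_li M Z_i^T) with a trainable shared
    d x d scoring matrix M, applied to all N rows at once."""

    def __init__(self, d: int):
        super().__init__()
        self.M = nn.Parameter(torch.empty(d, d))
        nn.init.xavier_uniform_(self.M)

    def forward(self, Y: torch.Tensor, Z: torch.Tensor) -> torch.Tensor:
        # (N, d) @ (d, d) -> (N, d), then row-wise dot with Z -> (N,)
        return torch.sigmoid(((Y @ self.M) * Z).sum(dim=1))

# Toy usage with the embodiment's sizes N = 1000, d = 32.
N, d = 1000, 32
Y_l = torch.randn(N, d)                     # layer-l node characterization matrix
Z = torch.randn(N, d, requires_grad=True)   # multilayer characterization matrix
disc = BilinearDiscriminator(d)
positive = disc(Y_l, Z)                 # step 2.3 positive-sample outputs
negative = disc(row_shuffle(Y_l), Z)    # step 2.3 negative-sample outputs
```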
the loss function of the multilayer attribute network characterization learning model in step 2 adopts a positive/negative-example binary cross-entropy loss of the form:
$\mathcal{L}=\sum_{l=1}^{r}\left(\mathcal{L}_{g_l}+\omega_l\,\mathcal{L}_l\right)+\lambda\,\lVert\theta\rVert^2,$
$\mathcal{L}_l=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}\!\left(Y_{l,i},Z_i\right)+\log\!\left(1-\mathcal{D}\!\left(\widetilde{Y}_{l,i},Z_i\right)\right)\right],$
where r is the number of network layers of the multilayer attribute network, $\mathcal{L}_{g_l}$ is the loss function of model $g_l$, $\omega_l$ is the hyper-parameter controlling the importance of the mutual information of different layers, λ is the regularization-term coefficient, $\lVert\theta\rVert^2$ is the model parameter regularization term, N is the total number of network nodes of the multi-layer attribute network, and $\mathcal{D}(Y_{l,i},Z_i)$ and $\mathcal{D}(\widetilde{Y}_{l,i},Z_i)$ are the outputs of the discriminator $\mathcal{D}$ of the multilayer attribute network characterization learning model;
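The sketch below assembles this loss in PyTorch; the grouping of the per-layer model loss $\mathcal{L}_{g_l}$ with the ω-weighted cross-entropy term follows the reconstructed formula above and should be read as one plausible interpretation rather than the definitive form:

```python
import torch

def multilayer_loss(pos_list, neg_list, single_layer_losses, omegas, params, lam):
    """Positive/negative-example binary cross-entropy over the r layers,
    weighted by omega_l, plus the per-layer model losses L_{g_l} and an
    L2 regularization term with coefficient lambda."""
    eps = 1e-8  # numerical guard inside the logarithms
    loss = 0.0
    for pos, neg, l_gl, w in zip(pos_list, neg_list, single_layer_losses, omegas):
        n = pos.size(0)
        bce = -(torch.log(pos + eps) + torch.log(1.0 - neg + eps)).sum() / (2 * n)
        loss = loss + l_gl + w * bce
    reg = sum((p ** 2).sum() for p in params)
    return loss + lam * reg
```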
preferably, the node characterization matrix of the multi-layer attribute network described in step 3 is the trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of nodes of the multilayer attribute network, and d is the target characterization space dimension of the multilayer attribute network;
in step 3, the multi-layer attribute network characterization learning model is trained with its loss function; the training method adopts grid search over the model's hyper-parameters, i.e., every combination of candidate parameter values is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters comprise: the hyper-parameter set $\Phi_l$ of model $g_l$ to be tuned, the hyper-parameters $\omega_l$ controlling the importance of the mutual information of different layers, the regularization-term coefficient λ, and the learning rate lr of the multilayer attribute network characterization learning model; the training method may adopt gradient descent to minimize the loss function of the multilayer attribute network characterization learning model, where the trainable model parameters comprise the parameter sets $\theta_l$, $l\in[1,r]$, the scoring matrix M, and the characterization matrix Z;
the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and tuning of the multilayer attribute network characterization learning model.
The method fuses the single-layer attribute network characterization matrices using mutual information, so that the fused multilayer attribute network characterization matrix contains as much input-related information as possible in a lower dimension and focuses on the frequent patterns in each input layer. Mutual information is invariant under reparameterization of the variables, a property that can reduce some of the unnecessary noise introduced during training. In addition, the method can extend existing single-layer attribute network characterization learning methods to multi-layer attribute network characterization learning.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the embodiments or the prior art are briefly described below; the accompanying drawings in the following description show some embodiments of the present invention:
FIG. 1: flow chart of the multi-layer attribute network representation learning method based on mutual information maximization according to the invention.
FIG. 2: flow chart of the multi-layer attribute network characterization learning model based on mutual information maximization according to an embodiment of the invention.
Detailed Description
The method is mainly based on deep learning technology and maximizes the mutual information between the multilayer attribute network characterization matrix and each layer's attribute network characterization matrix, so as to realize multilayer attribute network characterization learning.
The method provided by the invention can be implemented using computer software technology. To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings in the embodiments of the present invention.
A first embodiment of the present invention is a multi-layer attribute network characterization learning method based on mutual information maximization, which is characterized by comprising the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
each layer in the multi-layer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is defined as:
$G=\{G^{(1)},G^{(2)},\ldots,G^{(r)},X\}=\{V,E,X\}$,
wherein the number of network layers in step 1 is r=2, r is a positive integer and r>1; $G^{(l)}$ denotes the l-th layer network; each layer's connecting edges correspond to an actual semantic relationship, there are r relationships in total, corresponding to the r layers, and $l\in[1,r]$;
in step 1, the total number of network nodes is N=1000, N is a positive integer; $V=\{v_1,v_2,\ldots,v_N\}$ is the set of nodes, and $v_i$ denotes the i-th node in the multi-layer attribute network G, $i\in[1,N]$; the set of nodes in each layer is the same, but the sets of connecting edges differ: $E=\{E^{(1)},E^{(2)},\ldots,E^{(r)}\}$ is the set of network edges, and $E^{(l)}$ is the connecting-edge set of the l-th layer network;
the attribute dimension in step 1 is f=200, f is a positive integer; $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multi-layer attribute network, and the attribute of the i-th node corresponds to the f-dimensional vector $X_i$ in the i-th row of the matrix;
the target characterization space dimension in step 1 is d=32, d<N, where N is the total number of multi-layer attribute network nodes;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the multilayer attribute network representation learning model by utilizing a mutual information maximization principle by combining the attribute dimension, the network layer number, the total number of network nodes and the target representation space dimension of the multilayer attribute network;
in step 2, the attribute dimension of the multilayer attribute network is f=200, the number of network layers is r=2, the total number of network nodes is N=1000, and the target characterization space dimension is d=32;
the mutual information maximization in step 2 is realized by maximizing a lower bound of the mutual information, namely
$\mathrm{MI}(X;Y)\ \ge\ \mathbb{E}_{\mathbb{P}_{XY}}\!\left[T_\omega(x,y)\right]-\log\,\mathbb{E}_{\mathbb{P}_X\otimes\mathbb{P}_Y}\!\left[e^{T_\omega(x,y)}\right],$
where X and Y denote two random variables and MI(X;Y) denotes their mutual information, a measure of the (possibly nonlinear) dependence between the two variables; $\mathbb{P}_{XY}$ is the joint distribution of X and Y, $\mathbb{P}_X\otimes\mathbb{P}_Y$ is the product of their marginal distributions, and $T_\omega$ is a deep-neural-network-based discriminator parameterized by ω; the expected values in the formula can be estimated by sampling from $\mathbb{P}_{XY}$ and from $\mathbb{P}_X\otimes\mathbb{P}_Y$; if the discriminator can accurately distinguish samples of the joint distribution from samples of the product of the marginal distributions, X and Y are considered to have high mutual information; the expressive power of the discriminator ensures that the lower bound approaches the mutual information of the random variables X and Y with high precision;
the multilayer attribute network characterization learning model in step 2 is specifically constructed as follows:
for the l-th layer network $G^{(l)}$, $l\in[1,r]$, r=2, of the multi-layer attribute network, the single-layer attribute network characterization learning model adopted is $g_l$, whose trainable model parameter set is $\theta_l$ and whose hyper-parameter set to be tuned is $\Phi_l$; model $g_l$ acts as $Y_l=g_l(G^{(l)},X)$, where X is the node attribute matrix of the multi-layer attribute network, the output $Y_l\in\mathbb{R}^{N\times d}$ is the node characterization matrix of the l-th layer network, d=32 is the target characterization space dimension of the multilayer attribute network, N=1000 is the total number of multilayer attribute network nodes, and the loss function of model $g_l$ is $\mathcal{L}_{g_l}$; the single-layer attribute network characterization learning model adopted by each layer has the same form, but the trainable model parameters are independent;
the multi-layer attribute network node characterization matrix is defined as a trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of nodes of the multilayer attribute network, and d is the target characterization space dimension of the multilayer attribute network;
step 2.1, for $l\in[1,r]$, r=2 being the number of network layers of the multilayer attribute network, input $G^{(l)}$ and X into the l-th layer attribute network characterization learning model $g_l$ to obtain the output $Y_l$;
step 2.2, use a row-shuffling function $\mathcal{C}$ to randomly permute the row order of $Y_l$, obtaining the node characterization negative-sample matrix $\widetilde{Y}_l$ of the l-th layer network of the multilayer attribute network;
Step 2.3, for l ∈ [1, r)]And r is 2, the network layer number of the multilayer attribute network, and a node characterization matrix Y of the l-th layer network of the multilayer attribute networklMultilayer attribute network node characterization matrix Z input discriminator
Figure BDA0003015076170000078
Deriving a positive sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure BDA0003015076170000079
Characterizing nodes of a layer I network of a multi-layer attribute network into a negative sample matrix
Figure BDA00030150761700000710
Multilayer attribute network node characterization matrix Z input discriminator
Figure BDA00030150761700000711
Deriving a negative sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure BDA00030150761700000712
Wherein i ∈ [1, N ∈ ]]N1000 is the total number of nodes in the multilayer attribute network, and the discriminator
Figure BDA00030150761700000713
A bilinear function may be employed, of the form:
Figure BDA00030150761700000714
Figure BDA00030150761700000715
sigma is a sigmoid non-linear function,
Figure BDA00030150761700000716
the method comprises the following steps of (1) obtaining a trainable shared scoring matrix, wherein d is 32 and is a target characterization space dimension of a multi-layer attribute network;
the loss function of the multilayer attribute network characterization learning model in step 2 adopts a positive/negative-example binary cross-entropy loss of the form:
$\mathcal{L}=\sum_{l=1}^{r}\left(\mathcal{L}_{g_l}+\omega_l\,\mathcal{L}_l\right)+\lambda\,\lVert\theta\rVert^2,$
$\mathcal{L}_l=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}\!\left(Y_{l,i},Z_i\right)+\log\!\left(1-\mathcal{D}\!\left(\widetilde{Y}_{l,i},Z_i\right)\right)\right],$
where r=2 is the number of network layers of the multilayer attribute network, $\mathcal{L}_{g_l}$ is the loss function of model $g_l$, $\omega_l$ is the hyper-parameter controlling the importance of the mutual information of different layers, λ is the regularization-term coefficient, $\lVert\theta\rVert^2$ is the model parameter regularization term, N=1000 is the total number of network nodes of the multi-layer attribute network, and $\mathcal{D}(Y_{l,i},Z_i)$ and $\mathcal{D}(\widetilde{Y}_{l,i},Z_i)$ are the outputs of the discriminator $\mathcal{D}$ of the multilayer attribute network characterization learning model;
step 3, randomly initializing a node representation matrix of the multilayer attribute network, training the multilayer attribute network representation learning model by combining a loss function of the multilayer attribute network representation learning model, and outputting the optimized node representation matrix of the multilayer attribute network;
the node characterization matrix of the multilayer attribute network in step 3 is the trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N=1000 is the total number of nodes, and d=32 is the target characterization space dimension of the multilayer attribute network;
in step 3, the multi-layer attribute network characterization learning model is trained with its loss function; the training method adopts grid search over the model's hyper-parameters, i.e., every combination of candidate parameter values is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters comprise: the hyper-parameter set $\Phi_l$ of model $g_l$ to be tuned, the hyper-parameters $\omega_l$ controlling the importance of the mutual information of different layers, the regularization-term coefficient λ, and the learning rate lr of the multilayer attribute network characterization learning model; the training method may adopt gradient descent to minimize the loss function of the multilayer attribute network characterization learning model, where the trainable model parameters comprise the parameter sets $\theta_l$, $l\in[1,r]$, r=2 being the number of network layers, the scoring matrix M, and the characterization matrix Z;
the learning rate of the multilayer attribute network characterization learning model may take values in:
lr ∈ {0.0001, 0.0005, 0.001, 0.005},
the regularization-term coefficient may take values in:
λ ∈ {0.00001, 0.0001, 0.001, 0.01},
and the hyper-parameters controlling the importance of the mutual information of different layers may take values in:
ω1 ∈ {0.6, 0.8, 2.0, 3.0},
ω2 ∈ {0.6, 0.8, 2.0, 3.0};
the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and tuning of the multilayer attribute network characterization learning model.
The second embodiment of the invention is described in combination with the single-layer attribute network characterization learning method Deep Graph Infomax (DGI), and comprises the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
in specific implementation, each layer in the multi-layer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is:
$G=\{G^{(1)},G^{(2)},X\}=\{V,E,X\}$,
wherein the number of network layers in step 1 is r=2, r is a positive integer and r>1; $G^{(1)}$ denotes the layer-1 network and $G^{(2)}$ the layer-2 network;
in step 1, the total number of network nodes is N=1000, N is a positive integer; $V=\{v_1,v_2,\ldots,v_N\}$ is the set of nodes, and $v_i$ denotes the i-th node in the two-layer attribute network G, $i\in[1,N]$; the set of nodes in each layer is the same, but the sets of connecting edges differ: $E=\{E^{(1)},E^{(2)}\}$ is the set of network edges, $E^{(1)}$ is the connecting-edge set of the layer-1 network, and $E^{(2)}$ is the connecting-edge set of the layer-2 network;
the attribute dimension in step 1 is f=200, f is a positive integer; $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multi-layer attribute network, and the attribute of the i-th node corresponds to the f-dimensional vector $X_i$ in the i-th row of the matrix;
the target characterization space dimension in step 1 is d=32, d<N, where N is the total number of multi-layer attribute network nodes;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the model by utilizing a mutual information maximization principle by combining attribute dimensions, network layer numbers, network node total numbers and target representation space dimensions in a multilayer attribute network;
in step 2, the attribute dimension of the multilayer attribute network is f=200, the number of network layers is r=2, the total number of network nodes is N=1000, and the target characterization space dimension is d=32;
FIG. 2 shows the flow of the multi-layer attribute network characterization learning model combined with the single-layer attribute network characterization learning model Deep Graph Infomax (DGI):
step 201, input the multilayer attribute network and the target characterization space dimension, where the multilayer attribute network is $G=\{G^{(1)},G^{(2)},X\}$ and the target characterization space dimension of the multilayer attribute network is d=32;
step 202, randomly perturb the node attribute matrix by a row-shuffling function to generate the multi-layer attribute network negative samples, i.e. $\{A,\widetilde{X}\}=\{A,\mathcal{C}(X)\}$; the row-shuffling function only changes the node attribute matrix, where $\mathcal{C}$ is the row-shuffling function, $A=\{A^{(1)},A^{(2)}\}$ is the set of adjacency matrices of the different layers, $A^{(1)}$ is the adjacency matrix of the layer-1 network $G^{(1)}$, $A^{(2)}$ is the adjacency matrix of the layer-2 network $G^{(2)}$, and $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multilayer attribute network, N=1000 being the total number of network nodes and f=200 the attribute dimension of the multilayer attribute network; the adjacency matrices $A^{(1)},A^{(2)}$ have the specific form:
$A^{(l)}_{ij}=\begin{cases}w^{(l)}_{ij},&\text{if the }i\text{-th and }j\text{-th nodes are connected in the }l\text{-th layer network},\\0,&\text{if no connecting edge exists between them},\end{cases}$
where $w^{(l)}_{ij}$ is the connecting-edge weight between the i-th and j-th nodes in the l-th layer network, $l\in[1,r]$, and r=2 is the number of network layers of the multilayer attribute network;
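A one-function sketch of this corruption step, assuming dense tensors; the name `corrupt` is illustrative:

```python
import torch

def corrupt(A_list, X):
    """Step 202: the row-shuffling function C perturbs only the node
    attribute matrix X; the per-layer adjacency matrices are unchanged."""
    X_tilde = X[torch.randperm(X.size(0))]
    return A_list, X_tilde
```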
step 203, encode the original attribute network and the negative-sample attribute network of each layer into the target characterization space through an encoder, obtaining the original node local characterization vectors and the negative-sample node local characterization vectors of each layer network;
the original attribute networks are $\{A^{(1)},X\}$ and $\{A^{(2)},X\}$, and the negative-sample attribute networks are $\{A^{(1)},\widetilde{X}\}$ and $\{A^{(2)},\widetilde{X}\}$; the encoder adopts a graph convolutional network (GCN), so the node local characterization matrices are:
$Y^{(l)}=\sigma\!\left(\hat{D}^{(l)\,-\frac{1}{2}}\hat{A}^{(l)}\hat{D}^{(l)\,-\frac{1}{2}}XW^{(l)}\right),\qquad \widetilde{Y}^{(l)}=\sigma\!\left(\hat{D}^{(l)\,-\frac{1}{2}}\hat{A}^{(l)}\hat{D}^{(l)\,-\frac{1}{2}}\widetilde{X}W^{(l)}\right),$
where $Y^{(l)}$ is the l-th layer original node local characterization matrix, $Y^{(l)}_i$ is the original node local characterization vector of the i-th node of the l-th layer, $\widetilde{Y}^{(l)}$ is the l-th layer negative-sample node local characterization matrix, $\widetilde{Y}^{(l)}_i$ is the negative-sample node local characterization vector of the i-th node of the l-th layer, $\hat{A}^{(l)}=A^{(l)}+\gamma I_N$ is the l-th layer adjacency matrix with inserted self-loops, γ controls the importance of the node itself and generally takes an integer value in [1,5], $I_N$ is the N×N identity matrix, and $\hat{D}^{(l)}$ is the corresponding degree matrix, an N×N diagonal matrix of the form:
$\hat{D}^{(l)}_{ii}=\sum_{j}\hat{A}^{(l)}_{ij},$
N=1000 is the total number of network nodes of the multi-layer attribute network, $W^{(l)}$ is the learnable weight parameter matrix of the l-th layer, σ is the ReLU nonlinear activation function, $l\in[1,r]$, and r=2 is the number of network layers of the multilayer attribute network;
step 204, input the original node local characterization vectors of each layer network into a Readout function to obtain the global vector of each layer network characterization;
the Readout function adopts average pooling, in the specific form:
$s^{(l)}=\sigma\!\left(\frac{1}{N}\sum_{i=1}^{N}Y^{(l)}_i\right),$
where $Y^{(l)}_i$ is the original node local characterization vector of the i-th node of the l-th layer, $s^{(l)}$ is the global vector of the l-th layer network characterization, σ denotes the sigmoid nonlinear function, $l\in[1,r]$, r=2 is the number of network layers, and N=1000 is the total number of network nodes of the multilayer attribute network;
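A corresponding one-line Readout sketch:

```python
import torch

def readout(Y: torch.Tensor) -> torch.Tensor:
    """Average-pooling Readout: s = sigmoid(mean over the N rows of Y),
    yielding one d-dimensional global vector per layer."""
    return torch.sigmoid(Y.mean(dim=0))
```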
step 205, input the original node local characterization vector of each layer and the global vector of that layer's network characterization into the layer-characterization discriminator to obtain its output;
the layer-characterization discriminator is realized by a bilinear function of the form:
$\mathcal{D}_1\!\left(Y^{(l)}_i,s^{(l)}\right)=\sigma\!\left(Y^{(l)}_i\,M_1\,s^{(l)\top}\right),$
where $Y^{(l)}_i$ is the original node local characterization vector of the i-th node of the l-th layer, $s^{(l)}$ is the global vector of the l-th layer network characterization, σ is the sigmoid nonlinear function, $M_1\in\mathbb{R}^{d\times d}$ is a trainable shared scoring matrix, d=32 is the target characterization space dimension of the multi-layer attribute network, $l\in[1,r]$, and r=2 is the number of network layers; the output is the output of the discriminator $\mathcal{D}_1$;
step 206, input the negative-sample node local characterization vector of each layer and the global vector of that layer's network characterization into the layer-characterization discriminator to obtain its output;
this discriminator is shared with the layer-characterization discriminator of step 205 and has the form:
$\mathcal{D}_1\!\left(\widetilde{Y}^{(l)}_i,s^{(l)}\right)=\sigma\!\left(\widetilde{Y}^{(l)}_i\,M_1\,s^{(l)\top}\right),$
where $\widetilde{Y}^{(l)}_i$ is the negative-sample node local characterization vector of the i-th node of the l-th layer, and the remaining symbols are as in step 205; the output is the output of the discriminator $\mathcal{D}_1$;
step 207, input the original node local characterization vector of each layer and the multilayer attribute network node characterization vector into the multilayer-attribute-network-characterization discriminator to obtain its output;
the multilayer-attribute-network-characterization discriminator is realized by a bilinear function of the form:
$\mathcal{D}_2\!\left(Y^{(l)}_i,Z_i\right)=\sigma\!\left(Y^{(l)}_i\,M_2\,Z_i^{\top}\right),$
where $Y^{(l)}_i$ is the original node local characterization vector of the i-th node of the l-th layer, $Z_i$ is the multilayer attribute network node characterization vector of the i-th node, σ is the sigmoid nonlinear function, $M_2\in\mathbb{R}^{d\times d}$ is a trainable shared scoring matrix, d=32 is the target characterization space dimension of the multi-layer attribute network, $l\in[1,r]$, and r=2 is the number of network layers; the output is the output of the discriminator $\mathcal{D}_2$;
step 208, input the negative-sample node local characterization vector of each layer and the multilayer attribute network node characterization vector into the multilayer-attribute-network-characterization discriminator to obtain its output;
this discriminator is shared with the discriminator of step 207 and has the form:
$\mathcal{D}_2\!\left(\widetilde{Y}^{(l)}_i,Z_i\right)=\sigma\!\left(\widetilde{Y}^{(l)}_i\,M_2\,Z_i^{\top}\right),$
where $\widetilde{Y}^{(l)}_i$ is the negative-sample node local characterization vector of the i-th node of the l-th layer, and the remaining symbols are as in step 207; the output is the output of the discriminator $\mathcal{D}_2$;
the loss function described in step 2 has the form:
$\mathcal{L}=\sum_{l=1}^{r}\left(\mathcal{L}^{(l)}_{layer}+\omega_l\,\mathcal{L}^{(l)}_{multi}\right)+\lambda\,\lVert\theta_{attr}\rVert^2,$
$\mathcal{L}^{(l)}_{layer}=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}_1\!\left(Y^{(l)}_i,s^{(l)}\right)+\log\!\left(1-\mathcal{D}_1\!\left(\widetilde{Y}^{(l)}_i,s^{(l)}\right)\right)\right],$
$\mathcal{L}^{(l)}_{multi}=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}_2\!\left(Y^{(l)}_i,Z_i\right)+\log\!\left(1-\mathcal{D}_2\!\left(\widetilde{Y}^{(l)}_i,Z_i\right)\right)\right],$
where $\omega_l$ is the hyper-parameter controlling the importance of the mutual information of the different layers, λ is the regularization-term coefficient, $\theta_{attr}=\{W^{(1)},W^{(2)},M_1,M_2,Z\}$ is the model parameter set of the regularization term, and N=1000 is the total number of network nodes of the multilayer attribute network; the $\mathcal{D}_1(Y^{(l)}_i,s^{(l)})$ terms are the outputs obtained in step 205, the $\mathcal{D}_1(\widetilde{Y}^{(l)}_i,s^{(l)})$ terms are the outputs obtained in step 206, the $\mathcal{D}_2(Y^{(l)}_i,Z_i)$ terms are the outputs obtained in step 207, and the $\mathcal{D}_2(\widetilde{Y}^{(l)}_i,Z_i)$ terms are the outputs obtained in step 208, with $l\in[1,r]$ and r=2 the number of network layers of the multilayer attribute network;
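Putting steps 201 through 208 together, the sketch below implements the whole forward pass and the loss for the two-layer case; parameter initialization, dense adjacency handling, and folding the λ regularization into the optimizer are assumptions made for compactness, and the grouping of the loss terms follows the reconstructed formula above:

```python
import torch
import torch.nn as nn

class MultiLayerDGI(nn.Module):
    """End-to-end sketch of steps 201-208 for the two-layer case."""

    def __init__(self, N: int, f: int, d: int, gamma: float = 2.0):
        super().__init__()
        self.W = nn.ModuleList([nn.Linear(f, d, bias=False) for _ in range(2)])
        self.M1 = nn.Parameter(torch.empty(d, d))  # layer-characterization discriminator
        self.M2 = nn.Parameter(torch.empty(d, d))  # multilayer-characterization discriminator
        self.Z = nn.Parameter(torch.empty(N, d))   # multilayer node characterizations
        for p in (self.M1, self.M2, self.Z):
            nn.init.xavier_uniform_(p)
        self.gamma = gamma

    def gcn(self, A: torch.Tensor, X: torch.Tensor, l: int) -> torch.Tensor:
        # Step 203: Y = ReLU(D^-1/2 (A + gamma*I) D^-1/2 X W_l)
        A_hat = A + self.gamma * torch.eye(A.size(0), device=A.device)
        d_is = A_hat.sum(dim=1).clamp(min=1e-8).pow(-0.5)
        return torch.relu((d_is[:, None] * A_hat * d_is[None, :]) @ self.W[l](X))

    def forward(self, A_list, X, omegas=(1.0, 1.0), eps=1e-8):
        X_neg = X[torch.randperm(X.size(0))]   # step 202: corrupt the attributes
        loss = 0.0
        for l, A in enumerate(A_list):
            Y = self.gcn(A, X, l)              # original local characterizations
            Y_neg = self.gcn(A, X_neg, l)      # negative-sample characterizations
            s = torch.sigmoid(Y.mean(dim=0))   # step 204: readout global vector
            # Steps 205/206: layer-characterization discriminator D1.
            pos_l = torch.sigmoid((Y @ self.M1) @ s)
            neg_l = torch.sigmoid((Y_neg @ self.M1) @ s)
            # Steps 207/208: multilayer-characterization discriminator D2.
            pos_m = torch.sigmoid(((Y @ self.M2) * self.Z).sum(dim=1))
            neg_m = torch.sigmoid(((Y_neg @ self.M2) * self.Z).sum(dim=1))
            n = Y.size(0)
            layer_bce = -(torch.log(pos_l + eps) + torch.log(1 - neg_l + eps)).sum() / (2 * n)
            multi_bce = -(torch.log(pos_m + eps) + torch.log(1 - neg_m + eps)).sum() / (2 * n)
            loss = loss + layer_bce + omegas[l] * multi_bce
        return loss  # the lambda*||theta_attr||^2 term can be added via weight decay
```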
step 3, randomly initializing a node representation matrix of the multilayer attribute network, training the multilayer attribute network representation learning model by combining a loss function of the multilayer attribute network representation learning model, and outputting the optimized node representation matrix of the multilayer attribute network;
the multi-layer attribute network node characterization matrix in step 3 is the trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N=1000 is the total number of nodes, and d=32 is the target characterization space dimension of the multilayer attribute network;
in step 3, the multi-layer attribute network characterization learning model is trained with its loss function; the training method may adopt grid search over the model's hyper-parameters, i.e., every combination of candidate parameter values is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters comprise $\{lr,\lambda,\gamma,\omega_1,\omega_2\}$, where lr is the model learning rate, λ is the loss-function regularization-term coefficient, γ is the parameter with which the encoder GCN controls the importance of the node itself, and $\omega_1,\omega_2$ are the hyper-parameters controlling the importance of the mutual information of the layer-1 and layer-2 networks;
the learning rate of the multilayer attribute network characterization learning model may take values in:
lr ∈ {0.0001, 0.0005, 0.001, 0.005},
the regularization-term coefficient may take values in:
λ ∈ {0.00001, 0.0001, 0.001, 0.01},
the parameter with which the encoder GCN controls the importance of the node itself may take values in:
γ ∈ {1.0, 2.0, 3.0, 4.0, 5.0},
and the hyper-parameters controlling the importance of the mutual information of different layers may take values in:
ω1 ∈ {0.6, 0.8, 2.0, 3.0},
ω2 ∈ {0.6, 0.8, 2.0, 3.0};
the training method may adopt gradient descent to minimize the loss function of the multilayer attribute network characterization learning model, where $\{W^{(1)},W^{(2)},M_1,M_2,Z\}$ is the trainable model parameter set;
the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and tuning of the multilayer attribute network characterization learning model.
The method provided by the invention has the following advantages or beneficial technical effects:
the invention provides a multi-layer attribute network representation learning method based on mutual information maximization. The method fuses single-layer attribute network node representation matrixes by using a mutual information maximization principle, so that the fused multilayer attribute network node representation matrixes can express as much information as possible in a lower-dimensional space and focus on frequent modes in each layer of attribute network. Under the re-parameterization of variables, the mutual information is invariant. Using this property may reduce some of the unnecessary noise introduced during the training process. In addition, the method can extend the existing single-layer attribute network representation learning method to a multi-layer attribute network representation learning method.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (4)

1. A multi-layer attribute network representation learning method based on mutual information maximization is characterized by comprising the following steps:
step 1, constructing a multilayer attribute network by combining attribute dimensions, network layer number and network node total number, and introducing target representation space dimensions;
step 2, constructing a multilayer attribute network representation learning model and a loss function of the multilayer attribute network representation learning model by utilizing a mutual information maximization principle by combining the attribute dimension, the network layer number, the total number of network nodes and the target representation space dimension of the multilayer attribute network;
and 3, randomly initializing a node characterization matrix of the multilayer attribute network, training the multilayer attribute network characterization learning model by combining a loss function of the multilayer attribute network characterization learning model, and outputting the optimized node characterization matrix of the multilayer attribute network.
2. The mutual information maximization-based multi-layer attribute network characterization learning method according to claim 1, characterized in that:
each layer in the multilayer attribute network in step 1 is an undirected network;
the multi-layer attribute network in step 1 is defined as:
$G=\{G^{(1)},G^{(2)},\ldots,G^{(r)},X\}=\{V,E,X\}$,
wherein the number of network layers in step 1 is r, r is a positive integer and r>1; $G^{(l)}$ denotes the l-th layer network; each layer's connecting edges correspond to an actual semantic relationship, there are r relationships in total, corresponding to the r layers, and $l\in[1,r]$;
in step 1, the total number of network nodes is N, N is a positive integer; $V=\{v_1,v_2,\ldots,v_N\}$ is the set of nodes, and $v_i$ denotes the i-th node in the multi-layer attribute network G, $i\in[1,N]$; the set of nodes in each layer is the same, but the sets of connecting edges differ: $E=\{E^{(1)},E^{(2)},\ldots,E^{(r)}\}$ is the set of network edges, and $E^{(l)}$ is the connecting-edge set of the l-th layer network;
the attribute dimension in step 1 is f, where f is a positive integer; $X\in\mathbb{R}^{N\times f}$ is the node attribute matrix of the multi-layer attribute network, and the attribute of the i-th node corresponds to the f-dimensional vector $X_i$ in the i-th row of the matrix;
the target characterization space dimension in step 1 is d, d<N, where N is the total number of multi-layer attribute network nodes.
3. The mutual information maximization-based multi-layer attribute network characterization learning method according to claim 1, characterized in that:
the attribute dimension of the multilayer attribute network in the step 2 is f, the number of network layers of the multilayer attribute network in the step 2 is r, the total number of network nodes of the multilayer attribute network in the step 2 is N, and the target characterization space dimension of the multilayer attribute network in the step 2 is d;
the mutual information maximization in step 2 is realized by maximizing a lower bound of the mutual information, namely
$\mathrm{MI}(X;Y)\ \ge\ \mathbb{E}_{\mathbb{P}_{XY}}\!\left[T_\omega(x,y)\right]-\log\,\mathbb{E}_{\mathbb{P}_X\otimes\mathbb{P}_Y}\!\left[e^{T_\omega(x,y)}\right],$
where X and Y denote two random variables and MI(X;Y) denotes their mutual information, a measure of the (possibly nonlinear) dependence between the two variables; $\mathbb{P}_{XY}$ is the joint distribution of X and Y, $\mathbb{P}_X\otimes\mathbb{P}_Y$ is the product of their marginal distributions, and $T_\omega$ is a deep-neural-network-based discriminator parameterized by ω; the expected values in the formula can be estimated by sampling from $\mathbb{P}_{XY}$ and from $\mathbb{P}_X\otimes\mathbb{P}_Y$; if the discriminator can accurately distinguish samples of the joint distribution from samples of the product of the marginal distributions, X and Y are considered to have high mutual information; the expressive power of the discriminator ensures that the lower bound approaches the mutual information of the random variables X and Y with high precision;
the multilayer attribute network characterization learning model in step 2 is specifically constructed as follows:
for the l-th layer network $G^{(l)}$, $l\in[1,r]$, of the multi-layer attribute network, the single-layer attribute network characterization learning model adopted is $g_l$, whose trainable model parameter set is $\theta_l$ and whose hyper-parameter set to be tuned is $\Phi_l$; model $g_l$ acts as $Y_l=g_l(G^{(l)},X)$, where X is the node attribute matrix of the multi-layer attribute network, the output $Y_l\in\mathbb{R}^{N\times d}$ is the node characterization matrix of the l-th layer network, d is the target characterization space dimension, N is the total number of multi-layer attribute network nodes, and the loss function of model $g_l$ is $\mathcal{L}_{g_l}$; the single-layer attribute network characterization learning model adopted by each layer has the same form, but the trainable model parameters are independent;
the multi-layer attribute network node characterization matrix is defined as a trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of multilayer attribute network nodes, and d is the target characterization space dimension;
step 2.1, for $l\in[1,r]$, input $G^{(l)}$ and X into the l-th layer attribute network characterization learning model $g_l$ to obtain the output $Y_l$;
step 2.2, use a row-shuffling function $\mathcal{C}$ to randomly permute the row order of $Y_l$, obtaining the node characterization negative-sample matrix $\widetilde{Y}_l$ of the l-th layer network of the multilayer attribute network;
Step 2.3, for l ∈ [1, r)]Characterizing the nodes of the l-th network of the multilayer attribute network into a matrix YlMultilayer attribute network node characterization matrix Z input discriminator
Figure FDA00030150761600000316
Deriving a positive sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure FDA0003015076160000031
Characterizing nodes of a layer I network of a multi-layer attribute network into a negative sample matrix
Figure FDA0003015076160000032
Multilayer attribute network node characterization matrix Z input discriminator
Figure FDA0003015076160000033
Deriving a negative sample output of a discriminator with respect to an ith node of a multi-layer attribute network
Figure FDA0003015076160000034
Wherein i ∈ [1, N ∈ ]]N is the total number of nodes in the multilayer attribute network, and a discriminator
Figure FDA0003015076160000035
A bilinear function may be employed, of the form:
Figure FDA0003015076160000036
Figure FDA0003015076160000037
sigma is a sigmoid non-linear function,
Figure FDA0003015076160000038
the method comprises the following steps of (1) obtaining a trainable shared scoring matrix, wherein d is a target characterization space dimension of a multilayer attribute network;
the loss function of the multilayer attribute network characterization learning model in step 2 adopts a positive/negative-example binary cross-entropy loss of the form:
$\mathcal{L}=\sum_{l=1}^{r}\left(\mathcal{L}_{g_l}+\omega_l\,\mathcal{L}_l\right)+\lambda\,\lVert\theta\rVert^2,$
$\mathcal{L}_l=-\frac{1}{2N}\sum_{i=1}^{N}\left[\log\mathcal{D}\!\left(Y_{l,i},Z_i\right)+\log\!\left(1-\mathcal{D}\!\left(\widetilde{Y}_{l,i},Z_i\right)\right)\right],$
where r is the number of network layers of the multilayer attribute network, $\mathcal{L}_{g_l}$ is the loss function of model $g_l$, $\omega_l$ is the hyper-parameter controlling the importance of the mutual information of different layers, λ is the regularization-term coefficient, $\lVert\theta\rVert^2$ is the model parameter regularization term, N is the total number of network nodes of the multi-layer attribute network, and $\mathcal{D}(Y_{l,i},Z_i)$ and $\mathcal{D}(\widetilde{Y}_{l,i},Z_i)$ are the outputs of the discriminator $\mathcal{D}$ of the multilayer attribute network characterization learning model.
4. The mutual information maximization-based multi-layer attribute network characterization learning method according to claim 1, characterized in that:
the node characterization matrix of the multilayer attribute network in step 3 is the trainable parameter matrix $Z\in\mathbb{R}^{N\times d}$; the characterization vector of the i-th node in the multilayer attribute network corresponds to the d-dimensional vector $Z_i$ in the i-th row of the matrix, N is the total number of nodes of the multilayer attribute network, and d is the target characterization space dimension of the multilayer attribute network;
in step 3, the multi-layer attribute network characterization learning model is trained with its loss function; the training method adopts grid search over the model's hyper-parameters, i.e., every combination of candidate parameter values is tried in turn and the best-performing combination is taken as the final result; the hyper-parameters comprise: the hyper-parameter set $\Phi_l$ of model $g_l$ to be tuned, the hyper-parameters $\omega_l$ controlling the importance of the mutual information of different layers, the regularization-term coefficient λ, and the learning rate lr of the multilayer attribute network characterization learning model; the training method may adopt gradient descent to minimize the loss function of the multilayer attribute network characterization learning model, where the trainable model parameters comprise the parameter sets $\theta_l$, $l\in[1,r]$, the scoring matrix M, and the characterization matrix Z;
the optimized node characterization matrix of the multilayer attribute network in step 3 is the parameter matrix Z after training and tuning of the multilayer attribute network characterization learning model.
CN202110398736.9A 2021-04-12 2021-04-12 Multi-layer attribute network representation learning method based on mutual information maximization Pending CN113205175A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110398736.9A 2021-04-12 2021-04-12 Multi-layer attribute network representation learning method based on mutual information maximization

Publications (1)

Publication Number Publication Date
CN113205175A 2021-08-03

Family

ID=77026776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110398736.9A Pending CN113205175A (en) 2021-04-12 2021-04-12 Multi-layer attribute network representation learning method based on mutual information maximization

Country Status (1)

Country Link
CN (1) CN113205175A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622307A (en) * 2017-09-11 2018-01-23 浙江工业大学 A kind of Undirected networks based on deep learning connect side right weight Forecasting Methodology
CN109101629A (en) * 2018-08-14 2018-12-28 合肥工业大学 A kind of network representation method based on depth network structure and nodal community
CN109376857A (en) * 2018-09-03 2019-02-22 上海交通大学 A kind of multi-modal depth internet startup disk method of fusion structure and attribute information

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304367A (en) * 2023-02-24 2023-06-23 河北师范大学 Algorithm and device for obtaining communities based on graph self-encoder self-supervision training
CN116304367B (en) * 2023-02-24 2023-12-01 河北师范大学 Algorithm and device for obtaining communities based on graph self-encoder self-supervision training


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210803