CN114372505A - Unsupervised network alignment method and system - Google Patents

Unsupervised network alignment method and system

Info

Publication number
CN114372505A
CN114372505A
Authority
CN
China
Prior art keywords
network
matrix
unsupervised
node
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111500307.4A
Other languages
Chinese (zh)
Inventor
侯家琛
战德成
杨林瑶
徐延才
李小双
王晓
王飞跃
张俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Academy Of Intelligent Industries
State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Qingdao Academy Of Intelligent Industries
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Academy Of Intelligent Industries
Priority to CN202111500307.4A
Publication of CN114372505A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention provides an unsupervised network alignment method and system. The method comprises an embedded structure calculation step, in which the adjacency matrices of a source network and a target network are obtained and the embedded structures of the source network and the target network are learned through two graph neural networks respectively; a projection matrix calculation step, in which the projection matrices of the source network and the target network are calculated from their adjacency matrices, embedded structures and hyper-parameters; and a node alignment calculation step, in which the similarity of the projection matrices is calculated to obtain a similarity matrix, and nodes of the source network and the target network are matched according to the similarity matrix. The invention solves the problems of high cost, long time consumption and poor effect of existing network alignment methods.

Description

Unsupervised network alignment method and system
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to an unsupervised network alignment method and an unsupervised network alignment system.
Background
Network alignment (NA), which identifies node correspondences across networks, is critical to integrating knowledge from multiple networks and provides a comprehensive view for network analysis. The most advanced NA approaches rely on labeled cross-network anchor links to model node similarity in low-dimensional embeddings. However, labeling anchor links is labor-intensive, costly, time-consuming, and often infeasible due to privacy and security concerns. While some unsupervised network alignment methods attempt to align the latent spaces of different networks by learning a mapping function with an adversarial model, they rely on highly discriminative attributes to learn the embeddings and have difficulty converging to an optimal solution.
Disclosure of Invention
The embodiment of the application provides an unsupervised network alignment method and system, and aims to at least solve the problems of high cost, long time consumption and poor effect of the conventional network alignment method.
In a first aspect, an embodiment of the present application provides an unsupervised network alignment method, including: an embedded structure calculation step of acquiring the adjacency matrices of a source network and a target network and learning the embedded structures of the source network and the target network respectively through two graph neural networks; a projection matrix calculation step of calculating the projection matrices of the source network and the target network from their adjacency matrices, embedded structures and hyper-parameters; and a node alignment calculation step of calculating the similarity of the projection matrices to obtain a similarity matrix and matching nodes of the source network and the target network according to the similarity matrix.
In some of these embodiments, the projection matrix calculating step further comprises: performing forward projection and then backward projection restoration in sequence when calculating the projection matrix.
In some of these embodiments, the projection matrix calculating step further comprises: calculating the loss function from the Sinkhorn distance.
In some of these embodiments, the projection matrix calculating step further comprises: optimizing the projection matrix with a cycle-consistent adversarial model based on the CycleGAN framework.
In some of these embodiments, the embedded structure calculating step further comprises: and calculating the embedded structure by adopting a DGI machine learning model.
In some of these embodiments, the embedded structure calculating step further comprises: averaging the embedded structure values of all nodes in the source network and the target network respectively to generate the whole-graph embedded structures of the source network and the target network.
In some of these embodiments, the embedded structure calculating step further comprises: and training the DGI machine learning model by using binary cross entropy loss.
In some of these embodiments, the node alignment calculating step further comprises: performing the node matching by collective correspondence assignment based on a greedy strategy.
In some of these embodiments, the node alignment calculating step further comprises: calculating the Euclidean distances between the projection matrices of the source network and the target network to obtain a distance matrix, and computing the node assignment dictionary from the distance matrix.
In a second aspect, an embodiment of the present application provides an unsupervised network alignment system, applicable to the above unsupervised network alignment method, including: an embedded structure calculation module for acquiring the adjacency matrices of the source network and the target network and learning the embedded structures of the source network and the target network respectively through two graph neural networks; a projection matrix calculation module for calculating the projection matrices of the source network and the target network from their adjacency matrices, embedded structures and hyper-parameters; and a node alignment calculation module for calculating the similarity of the projection matrices to obtain a similarity matrix and matching nodes of the source network and the target network according to the similarity matrix.
Compared with the related art, the unsupervised network alignment method provided by the embodiment of the application has the following advantages:
1. The invention provides an unsupervised method that effectively utilizes structural information and a cycle-consistent adversarial model to learn node representations and optimize the mapping function for the network alignment problem.
2. The invention initializes the mapping function of the adversarial model by iteratively solving the Wasserstein-Procrustes problem, facilitating the learning of the adversarial model.
3. The invention optimizes the adversarial mapping model by minimizing the Sinkhorn distance between the translated source and target embeddings, which helps to alleviate the mode collapse problem.
4. The invention, based on a greedy strategy, collectively and efficiently assigns node correspondences under a one-to-one constraint.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of an unsupervised network alignment method of the present invention;
FIG. 2 is a block diagram of an unsupervised network alignment system of the present invention;
FIG. 3 is a block diagram of an electronic device according to an embodiment of the present invention;
FIG. 4 is a general diagram of the unsupervised network alignment method of the present invention;
FIG. 5 is a diagram illustrating the experimental effect of the unsupervised network alignment method of the present invention;
in the above figures:
1. embedded structure calculation module; 2. projection matrix calculation module; 3. node alignment calculation module; 60. bus; 61. processor; 62. memory; 63. communication interface.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. References to "a," "an," "the," and similar words throughout this application do not limit the number and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof in this application are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The method first embeds the networks based on an unsupervised Deep Graph Infomax (DGI) model, capturing the global and local structural features of the nodes; it then iteratively solves the Wasserstein-Procrustes problem based on the embedding matrices and adjacency matrices to initialize a linear mapping function that converts the embeddings into a common vector space. A pair of adversarial networks is then applied to refine the linear mapping function under a cycle-consistency constraint. Finally, the distance between the converted source and target embeddings is calculated, and one-to-one alignment is performed by a collective correspondence assignment algorithm.
Nodes are represented in a common vector space by mapping across embedding spaces, with the mapping function learned by a well-initialized cycle-consistent adversarial model. The networks are embedded according to their structural information, and a mapping function between the different embedding spaces is learned by iteratively solving the equivalent Wasserstein-Procrustes problem, so that adversarial training is better initialized. The mapping function is then refined by a cycle-consistent adversarial model, further reducing the difference between embedding distributions through cyclic adversarial learning, and the Sinkhorn distance is added to the loss function to alleviate the mode collapse problem. Finally, accurate and robust corresponding nodes are assigned efficiently under a one-to-one constraint, using the distances between the mapped embeddings in a collective correspondence assignment algorithm.
Embodiments of the invention are described in detail below with reference to the accompanying drawings:
referring to fig. 1 and 4, the unsupervised network alignment method of the present invention includes the following steps:
s1: and acquiring an adjacency matrix of the source network and the target network, and learning the embedded structures of the source network and the target network respectively through two graph neural networks.
Optionally, the embedded structure is calculated by using a DGI machine learning model.
Optionally, the embedded structure values of all nodes in the source network and the target network are respectively added and averaged to generate the embedded structure of the whole graph of the source network and the target network.
Optionally, the DGI machine learning model is trained using binary cross entropy loss.
In a specific implementation, this step adopts the DGI machine learning model, an unsupervised learning model that can learn the embedded structure without labels or attributes.
In a multi-network system, networks are paired one-to-one and denoted as G1 = (V1, E1) and G2 = (V2, E2), where V1, V2 are the node sets and E1, E2 are the corresponding links between nodes. Given a node in G1, the task of unsupervised network alignment (UNA) is to find its corresponding node in G2 without observing any node correspondences.
In general, if both networks have n nodes, each network Gi learns a d-dimensional embedding matrix Zi ∈ R^(n×d).
Because different networks have different embedding spaces with heterogeneous characteristics, a linear transformation is learned to project one embedding space onto another. When the node correspondence is known, the orthogonal Procrustes method applies; when the transformation matrix is known, the mapping matrix is found by minimizing the squared Wasserstein distance. When neither the node correspondence nor the linear transformation matrix is known in advance, the Wasserstein-Procrustes analysis applies, i.e., the node correspondence and the corresponding linear transformation are learned and optimized jointly.
The goal of the DGI model is to maximize the local mutual information, i.e., to maximize the joint probability of patch-summary pairs. The model is therefore trained with a binary cross entropy loss:

L = (1/(n+m)) · ( Σ_{i=1..n} E[log D(h_i, s)] + Σ_{j=1..m} E[log(1 - D(h̃_j, s))] )

where D is the discriminator, D(h_i, s) is the probability score of a positive patch-summary pair, s is the whole-graph summary embedding, and h̃_j is a negative sample obtained by preserving the original adjacency matrix while generating corrupted features through a row-wise shuffling of X; m is the number of negative samples. The model maximizes this objective by updating the parameters of the patch encoder and the discriminator through gradient-descent training. A linear transformation (LA) of the adjacency matrix is employed as the input features based on security and privacy concerns. With the DGI model, both the global and the local structural features of the nodes are retained in their embeddings, which helps to effectively capture the structural similarity of nodes.
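As an illustrative sketch of the DGI-style objective above, the following PyTorch code implements a one-layer GCN patch encoder, a mean-readout graph summary s, a bilinear discriminator D, and the binary cross entropy loss; these concrete architectural choices are assumptions of the sketch, not requirements fixed by the method.

```python
import torch
import torch.nn as nn

class DGI(nn.Module):
    """Minimal DGI: one-layer GCN encoder plus a bilinear discriminator."""
    def __init__(self, in_dim, emb_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, emb_dim, bias=False)
        self.act = nn.PReLU()
        self.W = nn.Parameter(torch.empty(emb_dim, emb_dim))
        nn.init.xavier_uniform_(self.W)

    def encode(self, a_hat, x):
        # one-layer GCN patch encoder: H = PReLU(A_hat X W)
        return self.act(a_hat @ self.lin(x))

    def forward(self, a_hat, x):
        h_pos = self.encode(a_hat, x)              # positive patch embeddings
        x_neg = x[torch.randperm(x.size(0))]       # corruption: row-wise shuffle of X,
        h_neg = self.encode(a_hat, x_neg)          # original adjacency preserved
        s = torch.sigmoid(h_pos.mean(dim=0))       # whole-graph summary (mean readout)
        pos = torch.sigmoid(h_pos @ self.W @ s)    # D(h_i, s)
        neg = torch.sigmoid(h_neg @ self.W @ s)    # D(h~_j, s)
        return h_pos, pos, neg

def dgi_loss(pos, neg, eps=1e-8):
    """Binary cross entropy: positive pairs pushed to 1, negatives to 0."""
    return -(torch.log(pos + eps).mean() + torch.log(1.0 - neg + eps).mean())
```

Here a_hat is the (normalized) adjacency matrix as a dense tensor and x the node input features; per the text above, a linear transformation of the adjacency matrix can serve as x when raw attributes are unavailable.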
S2: calculating the projection matrices of the source network and the target network from their adjacency matrices, embedded structures, and hyper-parameters.
Optionally, when the projection matrix is calculated, forward projection and then backward projection restoration are performed in sequence.
Optionally, the loss function is calculated from the Sinkhorn distance.
Optionally, a cycle-consistent adversarial model based on the CycleGAN framework is used to optimize the projection matrix.
In a specific implementation, because the source network G1 and the target network G2 may represent their nodes in different embedding spaces, the application aligns the embedding spaces in order to compute the similarity between nodes of different networks, i.e., it learns a linear mapping function that projects one embedding space onto another.
The initial linear transformation is learned by iterative calculation on the Wasserstein-Procrustes problem. First, the permutation matrix of the Wasserstein-Procrustes problem is initialized by solving a classical quadratic graph matching problem with orthogonal regularization. The first component of the objective learns a permutation matrix that maximizes the structural consistency between G1 and G2, and the second component ensures that each node in G1 is assigned to only one node in G2.
In a specific implementation, the initial projection matrix is calculated from an orthogonal Procruster (Procrustes) analysis, using the following algorithm:
inputting: node embedding Z1、Z2Adjacent matrix A1、A2Of a hyperparameter λ0、μ、η
And (3) outputting: q
Initializing a permutation matrix:
Figure RE-GDA0003519421730000061
initialization of a transformation matrix: q ═ UVT
Figure RE-GDA0003519421730000062
When T is 1 → T, run
Figure RE-GDA0003519421730000071
Figure RE-GDA0003519421730000072
U∑VT=SVD(Q-ηGt);Q=UVT
Thereafter, in each iteration t, P is first calculated by solving the walerstein (Wasserstein) problem by using the simhonen (Sinkhorn) algorithm and regularization based on the current transformation matrix Qt. Then, from Wasserstein-Proklusters (Wasserstein products)
Figure RE-GDA0003519421730000073
Calculating a gradient G for QtAnd uses it to update the transformation matrix, thereby solving the probukast (Procustes) problem. The mapping initialization procedure based on the Wasserstein-Procrustes problem solution is shown in the above algorithm. Based on the algorithm, Z is1Is transformed into Z2The mapping function of the embedding space of (2) is initialized to the transformation matrix. Similarly, slave Z may also be initialized2To Z1The mapping function of (2).
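A numpy sketch of this initialization is given below. It is a sketch under stated assumptions: the convex initialization of the permutation matrix is replaced by a uniform coupling, and the gradient Gt = -2 Z1^T Pt Z2 follows the standard Wasserstein-Procrustes formulation of the objective ‖Z1 Q - Pt Z2‖_F² over orthogonal Q.

```python
import numpy as np

def sinkhorn_plan(M, lam=20.0, iters=50):
    """Entropy-regularized transport plan for cost matrix M, uniform marginals."""
    n, m = M.shape
    r, c = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-lam * (M / M.max()))   # rescale cost so the kernel stays well-conditioned
    v = np.ones(m)
    for _ in range(iters):
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]

def wp_initialize(Z1, Z2, eta=0.05, T=50):
    """Iterative Wasserstein-Procrustes solve for the orthogonal mapping Q."""
    n = Z1.shape[0]
    P = np.full((n, n), 1.0 / n)                 # stand-in for the convex P0 init
    U, _, Vt = np.linalg.svd(Z1.T @ P @ Z2)      # Procrustes init: Q = U V^T
    Q = U @ Vt
    for _ in range(T):
        # Wasserstein step: correspondence P_t for the current Q via Sinkhorn
        A = Z1 @ Q
        M = (A**2).sum(1)[:, None] + (Z2**2).sum(1)[None, :] - 2.0 * A @ Z2.T
        P = sinkhorn_plan(M)
        # Procrustes step: gradient on the orthogonal group, then SVD projection
        G = -2.0 * Z1.T @ P @ Z2
        U, _, Vt = np.linalg.svd(Q - eta * G)
        Q = U @ Vt
    return Q
```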
The method adopts a cycle-consistent adversarial model based on the CycleGAN framework to optimize the projection matrix, so that the mapping function is encouraged to produce distribution-invariant network embeddings. The Sinkhorn distance between the mapped structural embeddings is used to guide the training of the cycle-consistent adversarial model. Given G1 and G2, the model learns two generators, F12: Z1 → Z2 and F21: Z2 → Z1, whose aim is to project one vector space into the other at the distribution level. For example, to convert G1 to G2, the generator F12 is responsible for converting the space Z1 into Z1′, with the aim of minimizing the divergence between the distributions of Z1′ and Z2.
In a specific implementation, the projection matrices and their results are adjusted by the cycle-consistent adversarial network: two mapping matrices are learned, and one forward projection followed by one backward projection restores the embedding, so that the discrepancy loss of the two projections is reduced as much as possible and the points of the two networks are aligned one to one. The loss function is calculated from the Sinkhorn distance.
The Sinkhorn distance is a useful distance measure between probability distributions. It is used here to measure the divergence between Z1′ and Z2 and between Z2′ and Z1.
Formally, the Sinkhorn distance between matrices X and Y is defined as:

d_sh(X, Y) = min_{P ∈ U_α(r,c)} ⟨P, M⟩

where ⟨·,·⟩ denotes the Frobenius dot product, M is the matrix of distances between X and Y, and U_α(r, c) is the transport polytope with an entropy constraint, defined as:

U_α(r, c) = { P ∈ R_+^(n×n) | P·1 = r, P^T·1 = c, h(P) ≥ h(r) + h(c) - α }

where h(·) is the entropy function, r and c are the sample weights in the source and target domains, and α is a hyper-parameter. Various distance measures can be used to build the distance matrix M; here the cosine distance is used:

M_ij = 1 - (x_i · y_j) / (‖x_i‖ ‖y_j‖)

where x_i and y_j are the ith and jth row vectors of X and Y. The entries of this distance matrix are evidently limited to the range [0, 2]. The Sinkhorn distance is then calculated according to the following algorithm, where λ1 is the Lagrange multiplier of the entropy constraint; ⟨·,·⟩ again denotes the inner product calculation, and r and c are set proportional to the node degrees.
Calculation of the Sinkhorn distance:

Input: M, r, c, λ1, T
Output: d_sh(X, Y)
K = exp(-λ1·M) (elementwise);
initialize v = 1;
For t = 1 → T, run:
  μ = r ./ (K·v);
  v = c ./ (K^T·μ);
d_sh(X, Y) = ⟨diag(μ)·K·diag(v), M⟩.
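A direct numpy transcription of this routine might read as follows; lam stands for λ1, and r and c are assumed to be degree-proportional weights normalized to sum to one.

```python
import numpy as np

def sinkhorn_distance(M, r, c, lam=20.0, T=100):
    """Sinkhorn distance <P, M> for cost matrix M and marginals r, c."""
    K = np.exp(-lam * M)                 # elementwise Gibbs kernel
    v = np.ones_like(c)
    mu = np.ones_like(r)
    for _ in range(T):                   # alternating marginal scalings
        mu = r / (K @ v)
        v = c / (K.T @ mu)
    P = mu[:, None] * K * v[None, :]     # transport plan diag(mu) K diag(v)
    return float((P * M).sum())          # Frobenius dot product <P, M>
```

For example, with degree vectors deg1 and deg2, one would call sinkhorn_distance(M, deg1 / deg1.sum(), deg2 / deg2.sum()).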
After the Sinkhorn distance is defined, the adversarial loss function of the generator F12 is defined as:

L_adv(F12) = d_sh(F12(Z1), Z2)

The generator F21 is used to convert Z2 to Z1, and its adversarial loss function is calculated analogously:

L_adv(F21) = d_sh(F21(Z2), Z1)

With these adversarial losses, each generator attempts to convert the embedding of one network to be similar to the embedding of the other network. To further reduce the possible mapping space and guide the model to learn a one-to-one mapping, the method jointly trains the two generators by adding a cycle-consistency constraint with a reconstruction loss. The embedding of G1 is transformed by F21 to generate the reconstructed node embedding Z1″ = F21(F12(Z1)); similarly, the reconstructed node embedding Z2″ of G2 is obtained from F12. The reconstruction loss function is then calculated as:

L_rec = ‖Z1″ - Z1‖_1 + ‖Z2″ - Z2‖_1

where ‖·‖_1 denotes the L1 distance. Finally, the complete loss function of the generators is:

L = L_adv(F12) + L_adv(F21) + λ2·L_rec

where λ2 is the relative weight of the reconstruction loss.
The linear generators are jointly optimized with the Adam optimizer by minimizing the above loss function. With the initialized mapping function, training of the cycle-consistent adversarial model is faster and does not require attribute embeddings. Under the continuous supervision of the Sinkhorn distribution distance and the cycle-consistency constraint, the model converges to an optimal solution and learns a one-to-one mapping function.
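A compact sketch of this generator-side training loop is shown below. The differentiable Sinkhorn loss, the cosine cost, the embedding dimension d, and the weight lam2 are assumptions of the sketch; in practice F12 and F21 would be initialized from the transformation matrices Q obtained by the algorithm above.

```python
import torch
import torch.nn as nn

def sinkhorn_dist(X, Y, lam=20.0, T=50):
    """Differentiable Sinkhorn distance between the row sets of X and Y
    (uniform marginals, cosine cost bounded in [0, 2])."""
    Xn = X / X.norm(dim=1, keepdim=True)
    Yn = Y / Y.norm(dim=1, keepdim=True)
    M = 1.0 - Xn @ Yn.T
    r = torch.full((X.size(0),), 1.0 / X.size(0))
    c = torch.full((Y.size(0),), 1.0 / Y.size(0))
    K = torch.exp(-lam * M)
    u, v = torch.ones_like(r), torch.ones_like(c)
    for _ in range(T):
        u = r / (K @ v)
        v = c / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    return (P * M).sum()

d = 128                                   # embedding dimension (assumption)
F12 = nn.Linear(d, d, bias=False)         # generator F12: Z1 -> Z2
F21 = nn.Linear(d, d, bias=False)         # generator F21: Z2 -> Z1
opt = torch.optim.Adam(list(F12.parameters()) + list(F21.parameters()), lr=1e-3)
lam2 = 1.0                                # relative weight of the reconstruction loss

def train_step(Z1, Z2):
    opt.zero_grad()
    Z1p, Z2p = F12(Z1), F21(Z2)           # forward projections Z1', Z2'
    adv = sinkhorn_dist(Z1p, Z2) + sinkhorn_dist(Z2p, Z1)            # adversarial losses
    rec = (F21(Z1p) - Z1).abs().sum() + (F12(Z2p) - Z2).abs().sum()  # L1 cycle loss
    loss = adv + lam2 * rec
    loss.backward()
    opt.step()
    return loss.item()
```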
S3: calculating the similarity of the projection matrices to obtain a similarity matrix, and matching nodes of the source network and the target network according to the similarity matrix.
Optionally, the node matching adopts collective correspondence assignment based on a greedy strategy.
Optionally, the Euclidean distances between the projection matrices of the source network and the target network are calculated to obtain a distance matrix, and the node assignment dictionary is computed from the distance matrix.
In a specific implementation, similarity is calculated over the projected embedded structures to obtain a similarity matrix, and node matching is then performed on the similarity matrix; node alignment is thus calculated from the similarity between the two networks.
In a specific implementation, the following collective correspondence assignment algorithm based on the greedy strategy is adopted:

Input: network distance matrix S
Output: node assignment dictionary A
For t = 1 → n, run:
  find the minimum entry of S and record its row and column indices i and j;
  assign the ith node of G1 to the jth node of G2, adding the pair to A;
  set the elements of the ith row and jth column of S to infinity.

In this implementation, the Euclidean distance between Z1 and Z2 is calculated to obtain the distance matrix S.
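A numpy sketch of this greedy collective assignment, including the construction of the Euclidean distance matrix S, might look like this:

```python
import numpy as np

def greedy_collective_assignment(Z1, Z2):
    """One-to-one node assignment per the greedy algorithm above."""
    # Euclidean distance matrix S between the projected embeddings
    S = np.linalg.norm(Z1[:, None, :] - Z2[None, :, :], axis=-1)
    A = {}
    for _ in range(min(S.shape)):
        i, j = np.unravel_index(np.argmin(S), S.shape)  # globally smallest entry
        A[int(i)] = int(j)             # pair node i of G1 with node j of G2
        S[i, :] = np.inf               # block row i and column j to enforce
        S[:, j] = np.inf               # the one-to-one constraint
    return A
```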
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The embodiment of the application provides an unsupervised network alignment system, which is suitable for the unsupervised network alignment method. As used below, the terms "unit," "module," and the like may implement a combination of software and/or hardware of predetermined functions. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware or a combination of software and hardware is also possible and contemplated.
Fig. 2 is a block diagram of an unsupervised network alignment system according to the present invention, please refer to fig. 2, which includes:
Embedded structure calculation module 1: acquires the adjacency matrices of the source network and the target network, and learns the embedded structures of the source network and the target network respectively through two graph neural networks.
Optionally, the embedded structure is calculated by using a DGI machine learning model.
Optionally, the embedded structure values of all nodes in the source network and the target network are respectively added and averaged to generate the embedded structure of the whole graph of the source network and the target network.
Optionally, the DGI machine learning model is trained using binary cross entropy loss.
In a specific implementation, this module adopts the DGI machine learning model, an unsupervised learning model that can learn the embedded structure without labels or attributes.
In a multi-network system, networks are paired one-to-one and denoted as G1 = (V1, E1) and G2 = (V2, E2), where V1, V2 are the node sets and E1, E2 are the corresponding links between nodes. Given a node in G1, the task of unsupervised network alignment (UNA) is to find its corresponding node in G2 without observing any node correspondences.
In general, if both networks have n nodes, each network Gi learns a d-dimensional embedding matrix Zi ∈ R^(n×d).
Because different networks have different embedding spaces with heterogeneous characteristics, a linear transformation is learned to project one embedding space onto another. When the node correspondence is known, the orthogonal Procrustes method applies; when the transformation matrix is known, the mapping matrix is found by minimizing the squared Wasserstein distance. When neither the node correspondence nor the linear transformation matrix is known in advance, the Wasserstein-Procrustes analysis applies, i.e., the node correspondence and the corresponding linear transformation are learned and optimized jointly.
The goal of the DGI model is to maximize the local mutual information, i.e., to maximize the joint probability of patch-summary pairs. The model is therefore trained with a binary cross entropy loss:

L = (1/(n+m)) · ( Σ_{i=1..n} E[log D(h_i, s)] + Σ_{j=1..m} E[log(1 - D(h̃_j, s))] )

where D is the discriminator, D(h_i, s) is the probability score of a positive patch-summary pair, s is the whole-graph summary embedding, and h̃_j is a negative sample obtained by preserving the original adjacency matrix while generating corrupted features through a row-wise shuffling of X; m is the number of negative samples. The model maximizes this objective by updating the parameters of the patch encoder and the discriminator through gradient-descent training. A linear transformation (LA) of the adjacency matrix is employed as the input features based on security and privacy concerns. With the DGI model, both the global and the local structural features of the nodes are retained in their embeddings, which helps to effectively capture the structural similarity of nodes.
Projection matrix calculation module 2: calculates the projection matrices of the source network and the target network from their adjacency matrices, embedded structures, and hyper-parameters.
Optionally, when the projection matrix is calculated, forward projection and then backward projection restoration are performed in sequence.
Optionally, the loss function is calculated from the Sinkhorn distance.
Optionally, a cycle-consistent adversarial model based on the CycleGAN framework is used to optimize the projection matrix.
In a specific implementation, because the source network G1 and the target network G2 may represent their nodes in different embedding spaces, the application aligns the embedding spaces in order to compute the similarity between nodes of different networks, i.e., it learns a linear mapping function that projects one embedding space onto another.
The initial linear transformation is learned by iterative calculation on the Wasserstein-Procrustes problem. First, the permutation matrix of the Wasserstein-Procrustes problem is initialized by solving a classical quadratic graph matching problem with orthogonal regularization. The first component of the objective learns a permutation matrix that maximizes the structural consistency between G1 and G2, and the second component ensures that each node in G1 is assigned to only one node in G2.
In a specific implementation, the initial projection matrix is calculated from an orthogonal Procruster (Procrustes) analysis, using the following algorithm:
inputting: node embedding Z1、Z2Adjacent matrix A1、A2Of a hyperparameter λ0、μ、η
And (3) outputting: q
Initializing a permutation matrix:
Figure RE-GDA0003519421730000121
initialization of a transformation matrix: q ═ UVT
Figure RE-GDA0003519421730000122
When T is 1 → T, run
Figure RE-GDA0003519421730000123
Figure RE-GDA0003519421730000124
U∑VT=SVD(Q-ηGt);Q=UVT
Thereafter, in each iteration t, P is first calculated by solving the walerstein (Wasserstein) problem by using the simhonen (Sinkhorn) algorithm and regularization based on the current transformation matrix Qt. Then, from Wasserstein-Proklusters (Wasserstein products)
Figure RE-GDA0003519421730000125
Calculating a gradient G for QtAnd uses it to update the transformation matrix, thereby solving the probukast (Procustes) problem. The mapping initialization procedure based on the Wasserstein-Procrustes problem solution is shown in the above algorithm. Based on the algorithm, Z is1Is transformed into Z2The mapping function of the embedding space of (2) is initialized to the transformation matrix. Similarly, slave Z may also be initialized2To Z1The mapping function of (2).
The system adopts a cycle-consistent adversarial model based on the CycleGAN framework to optimize the projection matrix, so that the mapping function is encouraged to produce distribution-invariant network embeddings. The Sinkhorn distance between the mapped structural embeddings is used to guide the training of the cycle-consistent adversarial model. Given G1 and G2, the model learns two generators, F12: Z1 → Z2 and F21: Z2 → Z1, whose aim is to project one vector space into the other at the distribution level. For example, to convert G1 to G2, the generator F12 is responsible for converting the space Z1 into Z1′, with the aim of minimizing the divergence between the distributions of Z1′ and Z2.
In a specific implementation, the projection matrices and their results are adjusted by the cycle-consistent adversarial network: two mapping matrices are learned, and one forward projection followed by one backward projection restores the embedding, so that the discrepancy loss of the two projections is reduced as much as possible and the points of the two networks are aligned one to one. The loss function is calculated from the Sinkhorn distance.
The Sinkhorn distance is a useful distance measure between probability distributions. It is used here to measure the divergence between Z1′ and Z2 and between Z2′ and Z1.
Formally, the Sinkhorn distance between matrices X and Y is defined as:

d_sh(X, Y) = min_{P ∈ U_α(r,c)} ⟨P, M⟩

where ⟨·,·⟩ denotes the Frobenius dot product, M is the matrix of distances between X and Y, and U_α(r, c) is the transport polytope with an entropy constraint, defined as:

U_α(r, c) = { P ∈ R_+^(n×n) | P·1 = r, P^T·1 = c, h(P) ≥ h(r) + h(c) - α }

where h(·) is the entropy function, r and c are the sample weights in the source and target domains, and α is a hyper-parameter. Various distance measures can be used to build the distance matrix M; here the cosine distance is used:

M_ij = 1 - (x_i · y_j) / (‖x_i‖ ‖y_j‖)

where x_i and y_j are the ith and jth row vectors of X and Y. The entries of this distance matrix are evidently limited to the range [0, 2]. The Sinkhorn distance is then calculated according to the following algorithm, where λ1 is the Lagrange multiplier of the entropy constraint; ⟨·,·⟩ again denotes the inner product calculation, and r and c are set proportional to the node degrees.
Calculation of the Sinkhorn distance:

Input: M, r, c, λ1, T
Output: d_sh(X, Y)
K = exp(-λ1·M) (elementwise);
initialize v = 1;
For t = 1 → T, run:
  μ = r ./ (K·v);
  v = c ./ (K^T·μ);
d_sh(X, Y) = ⟨diag(μ)·K·diag(v), M⟩.
After the Sinkhorn distance is defined, the adversarial loss function of the generator F12 is defined as:

L_adv(F12) = d_sh(F12(Z1), Z2)

The generator F21 is used to convert Z2 to Z1, and its adversarial loss function is calculated analogously:

L_adv(F21) = d_sh(F21(Z2), Z1)

With these adversarial losses, each generator attempts to convert the embedding of one network to be similar to the embedding of the other network. To further reduce the possible mapping space and guide the model to learn a one-to-one mapping, the system jointly trains the two generators by adding a cycle-consistency constraint with a reconstruction loss. The embedding of G1 is transformed by F21 to generate the reconstructed node embedding Z1″ = F21(F12(Z1)); similarly, the reconstructed node embedding Z2″ of G2 is obtained from F12. The reconstruction loss function is then calculated as:

L_rec = ‖Z1″ - Z1‖_1 + ‖Z2″ - Z2‖_1

where ‖·‖_1 denotes the L1 distance. Finally, the complete loss function of the generators is:

L = L_adv(F12) + L_adv(F21) + λ2·L_rec

where λ2 is the relative weight of the reconstruction loss.
The linear generators are jointly optimized with the Adam optimizer by minimizing the above loss function. With the initialized mapping function, training of the cycle-consistent adversarial model is faster and does not require attribute embeddings. Under the continuous supervision of the Sinkhorn distribution distance and the cycle-consistency constraint, the model converges to an optimal solution and learns a one-to-one mapping function.
Node alignment calculation module 3: calculates the similarity of the projection matrices to obtain a similarity matrix, and matches nodes of the source network and the target network according to the similarity matrix.
Optionally, the node matching adopts collective correspondence assignment based on a greedy strategy.
Optionally, the Euclidean distances between the projection matrices of the source network and the target network are calculated to obtain a distance matrix, and the node assignment dictionary is computed from the distance matrix.
In a specific implementation, similarity is calculated over the projected embedded structures to obtain a similarity matrix, and node matching is then performed on the similarity matrix; node alignment is thus calculated from the similarity between the two networks.
In a specific implementation, the following collective correspondence assignment algorithm based on the greedy strategy is adopted:

Input: network distance matrix S
Output: node assignment dictionary A
For t = 1 → n, run:
  find the minimum entry of S and record its row and column indices i and j;
  assign the ith node of G1 to the jth node of G2, adding the pair to A;
  set the elements of the ith row and jth column of S to infinity.

In this implementation, the Euclidean distance between Z1 and Z2 is calculated to obtain the distance matrix S.
In summary, the unsupervised network alignment method and system of the present application first learn the embeddings of the different networks based on an unsupervised Deep Graph Infomax (DGI) model, which preserves useful global and local structural proximity in the embeddings. The mapping function is then initialized by iterative optimization of the Wasserstein-Procrustes problem to facilitate learning of the adversarial model. A cycle-consistent adversarial learning model is then used with the goal of minimizing the Sinkhorn distance between the mapped embedding spaces. Through the above steps, the mapping function between the different embedding spaces is learned. Finally, to efficiently learn one-to-one correspondences in large-scale networks, a collective correspondence assignment method is used.
The present application helps identify more accurate corresponding nodes by preventing the many-to-one alignments caused by the hubness problem, which disrupt alignment accuracy. As shown in FIG. 5, experiments at different noise levels demonstrate the superiority of the method on the unsupervised network alignment problem. Based on the knowledge fusion of the aligned networks, accurate prediction and prescription of the networks can be realized through a parallel learning method, making a network system more intelligent and controllable. The embedding-based performance of the present application is better than that of optimization-based baselines, because the embedding effectively captures the proximity between structural nodes. The present application is also highly robust to noise: its accuracy degrades less than that of other comparable methods. The application can be extended to many types of networks, and different types of information (e.g., relational information) can be fused into the network embedding to improve it. Furthermore, combining unsupervised domain translation with downstream network transfer learning tasks would be another direction for the present application.
Additionally, an unsupervised network alignment method described in connection with fig. 1 may be implemented by an electronic device. Fig. 3 is a block diagram of an electronic device according to an embodiment of the invention.
The electronic device may comprise a processor 61 and a memory 62 in which computer program instructions are stored.
Specifically, the processor 61 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 62 may include mass storage for data or instructions. By way of example and not limitation, memory 62 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 62 may include removable or non-removable (or fixed) media, where appropriate. The memory 62 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 62 is non-volatile memory. In particular embodiments, memory 62 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or FLASH memory, or a combination of two or more of these, where appropriate. The RAM may be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM), where the DRAM may be Fast Page Mode DRAM (FPMDRAM), Extended Data Output DRAM (EDODRAM), Synchronous DRAM (SDRAM), and the like.
The memory 62 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions executed by the processor 61.
The processor 61 implements any of the above embodiments of the unsupervised network alignment method by reading and executing computer program instructions stored in the memory 62.
In some of these embodiments, the electronic device may also include a communication interface 63 and a bus 60. As shown in fig. 3, the processor 61, the memory 62, and the communication interface 63 are connected via a bus 60 to complete communication therebetween.
The communication interface 63 implements data communication with components such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
The bus 60 includes hardware, software, or both, coupling the components of the electronic device to one another. Bus 60 includes, but is not limited to, at least one of the following: a Data Bus, an Address Bus, a Control Bus, an Expansion Bus, and a Local Bus. By way of example and not limitation, bus 60 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 60 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The electronic device may perform an unsupervised network alignment method in the embodiments of the present application.
In addition, in combination with the unsupervised network alignment method in the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium to implement. The computer readable storage medium having stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement any of the above-described embodiments of an unsupervised network alignment method.
The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The technical features of the embodiments described above may be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An unsupervised network alignment method, comprising:
an embedded structure calculation step of acquiring the adjacency matrices of a source network and a target network, and learning the embedded structures of the source network and the target network respectively through two graph neural networks;
a projection matrix calculation step of calculating the projection matrices of the source network and the target network from their adjacency matrices, embedded structures and hyper-parameters; and
a node alignment calculation step of calculating the similarity of the projection matrices to obtain a similarity matrix, and matching nodes of the source network and the target network according to the similarity matrix.
2. The unsupervised network alignment method of claim 1, wherein the projection matrix calculating step further comprises:
and when the projection matrix is calculated, carrying out forward projection and backward projection reduction in sequence.
3. The unsupervised network alignment method of claim 2, wherein the projection matrix calculating step further comprises:
the loss function is calculated by the sinkhorn distance.
4. The unsupervised network alignment method of claim 3, wherein the projection matrix calculating step further comprises:
and optimizing the projection matrix by adopting a cycle consistent countermeasure model based on a cycleGAN framework.
5. The unsupervised network alignment method of claim 1, wherein the embedding structure calculating step further comprises:
computing the embedded structure using a DGI machine learning model.
6. The unsupervised network alignment method of claim 5, wherein the embedding structure calculating step further comprises:
and respectively adding and averaging the embedded structure values of all nodes in the source network and the target network to generate the embedded structure of the whole graph of the source network and the target network.
7. The unsupervised network alignment method of claim 5, wherein the embedding structure calculating step further comprises:
training the DGI machine learning model using binary cross entropy loss.
8. The unsupervised network alignment method of claim 1, wherein the node alignment calculation step further comprises:
and the node matching adopts collective corresponding distribution based on a greedy strategy.
9. The unsupervised network alignment method of claim 8, wherein the node alignment calculation step further comprises:
and calculating the Euclidean distance between the projection matrixes of the source network and the target network to obtain a distance matrix, and calculating according to the distance matrix to obtain a node distribution dictionary.
10. An unsupervised network alignment system, comprising:
an embedded structure calculation module for acquiring the adjacency matrices of a source network and a target network and learning the embedded structures of the source network and the target network respectively through two graph neural networks;
a projection matrix calculation module for calculating the projection matrices of the source network and the target network from their adjacency matrices, embedded structures and hyper-parameters; and
a node alignment calculation module for calculating the similarity of the projection matrices to obtain a similarity matrix and matching nodes of the source network and the target network according to the similarity matrix.
CN202111500307.4A 2021-12-09 2021-12-09 Unsupervised network alignment method and system Pending CN114372505A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111500307.4A CN114372505A (en) 2021-12-09 2021-12-09 Unsupervised network alignment method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111500307.4A CN114372505A (en) 2021-12-09 2021-12-09 Unsupervised network alignment method and system

Publications (1)

Publication Number Publication Date
CN114372505A true CN114372505A (en) 2022-04-19

Family

ID=81139903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111500307.4A Pending CN114372505A (en) 2021-12-09 2021-12-09 Unsupervised network alignment method and system

Country Status (1)

Country Link
CN (1) CN114372505A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861822A (en) * 2023-02-07 2023-03-28 海豚乐智科技(成都)有限责任公司 Target local point and global structured matching method and device
CN116738201A (en) * 2023-02-17 2023-09-12 云南大学 Illegal account identification method based on graph comparison learning
CN116738201B (en) * 2023-02-17 2024-01-16 云南大学 Illegal account identification method based on graph comparison learning

Similar Documents

Publication Publication Date Title
CN108171320B (en) Image domain conversion network and conversion method based on generative countermeasure network
US10275473B2 (en) Method for learning cross-domain relations based on generative adversarial networks
Zhang et al. Neural collaborative subspace clustering
CN110163258B (en) Zero sample learning method and system based on semantic attribute attention redistribution mechanism
Xia et al. Supervised hashing for image retrieval via image representation learning
Lin et al. Supervised online hashing via hadamard codebook learning
CN107273927B (en) Unsupervised field adaptive classification method based on inter-class matching
CN114372505A (en) Unsupervised network alignment method and system
Jiang et al. When to learn what: Deep cognitive subspace clustering
CN110363068B (en) High-resolution pedestrian image generation method based on multiscale circulation generation type countermeasure network
US20220253639A1 (en) Complementary learning for multi-modal saliency detection
Amiri et al. Efficient multi-modal fusion on supergraph for scalable image annotation
CN112734049A (en) Multi-initial-value meta-learning framework and method based on domain self-adaptation
Wang et al. Generative partial multi-view clustering
Wei et al. Center-aligned domain adaptation network for image classification
Pradhyumna A survey of modern deep learning based generative adversarial networks (gans)
CN114926742A (en) Loop detection and optimization method based on second-order attention mechanism
CN115880556B (en) Multi-mode data fusion processing method, device, equipment and storage medium
Wu et al. Multi-instance learning from positive and unlabeled bags
KR20210064817A (en) Method for Transfer Learning between Different Deep Learning Models
Shi et al. Relative Entropic Optimal Transport: a (Prior-aware) Matching Perspective to (Unbalanced) Classification
CN114417251A (en) Retrieval method, device, equipment and storage medium based on hash code
Shapira et al. Cross‐Collection Map Inference by Intrinsic Alignment of Shape Spaces
CN114332469A (en) Model training method, device, equipment and storage medium
Javaheripi et al. Swann: Small-world architecture for fast convergence of neural networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230223

Address after: 266000 26F, block B, Chuangye building, high tech Zone, Qingdao, Shandong Province

Applicant after: QINGDAO ACADEMY OF INTELLIGENT INDUSTRIES

Applicant after: STATE GRID ZHEJIANG ELECTRIC POWER Co.,Ltd.

Address before: 266000 26F, block B, Chuangye building, high tech Zone, Qingdao, Shandong Province

Applicant before: QINGDAO ACADEMY OF INTELLIGENT INDUSTRIES

TA01 Transfer of patent application right