CN111931903B - Network alignment method based on double-layer graph attention neural network - Google Patents

Info

  • Publication number: CN111931903B (application CN202010654776.0A; other version: CN111931903A)
  • Authority: CN (China)
  • Prior art keywords: user, node, vector, network, social network
  • Legal status: Active, granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
  • Inventors: Lu Meilian (卢美莲), Dai Yinlong (戴银龙)
  • Original and current assignee: Beijing University of Posts and Telecommunications (the listed assignees may be inaccurate; Google has not performed a legal analysis)
  • Original language: Chinese (zh)
  • Application CN202010654776.0A filed by Beijing University of Posts and Telecommunications; published as CN111931903A; granted and published as CN111931903B

Classifications

  • G06N 3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
  • G06N 20/00: Machine learning
  • G06N 3/084: Learning methods; backpropagation, e.g. using gradient descent
  • G06Q 50/01: ICT specially adapted for specific business sectors; social networking
  • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Medical Informatics (AREA)
  • Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a network alignment method based on a double-layer graph attention neural network, which comprises two stages: network embedded representation and embedded vector space alignment. In the network embedded representation stage, a double-layer graph attention neural network is proposed for network representation learning, so as to extract the embedded vector of each user in a social network; in the embedded vector space alignment stage, a classification model is constructed from the obtained embedded vectors of the social network user nodes and a partially known anchor link set to predict anchor links between different social networks, and a bidirectional embedded vector space alignment strategy is proposed to satisfy the one-to-one matching constraint of user entities between different social networks. With this arrangement, the method can effectively capture the different influence weights among users, neighbor users and features in the social network, so as to learn an accurate representation of each user in the social network and improve the accuracy of anchor link prediction between different social networks.

Description

Network alignment method based on double-layer graph attention neural network
Technical Field
The invention relates to the technical field of data mining and machine learning, in particular to a network alignment method based on a double-layer graph attention neural network.
Background
With the rapid development of the internet and mobile devices, online social networks have become an indispensable platform for people to share and exchange information. Because different social platforms provide different services, a person will typically register accounts on multiple social network platforms at the same time to meet different needs. These users shared by different social networking platforms naturally form anchor links that connect the different social networks, facilitating information interaction between them. Mining information interactions across multiple social domains can be effectively applied to a variety of downstream social network applications such as cross-domain link prediction, cross-domain recommendation and cross-domain information dissemination. However, these social networking platforms are usually maintained individually by different companies and are to some degree informationally isolated from each other. Aligning accounts belonging to the same user across different social platforms has therefore become an urgent research topic. Current research on network alignment methods can be broadly divided into two categories: unsupervised network alignment methods and supervised network alignment methods.
(1) Unsupervised network alignment methods: methods based on unsupervised network alignment models attempt to align user accounts between different social networks without known anchor links. In this type of approach, researchers typically measure user similarity between different social networks based on the rarity of usernames in the social network and the consistency of neighborhood structure, and then predict anchor links using greedy methods or methods that minimize the structural inconsistency between the two social networks.
(2) Supervised network alignment methods: the general idea of supervised network alignment models is to convert the alignment problem between different social networks into a classification problem about anchor links, i.e., to determine whether any two users from different social networks have an anchor link relationship. Early studies built classification models by manually extracting certain features of users in social networks; although this solves the alignment problem for some users in some social network scenarios to a certain extent, it still has significant limitations. First, manually extracting user features is very tedious, one cannot directly judge which features are effective, and the effective features may differ between social network scenarios. Second, because social network platforms protect user privacy, part of a user's real information is often hidden, so some information is missing when user features are extracted manually, which affects the accuracy of the anchor link prediction task.
In recent years, motivated by the wide and successful application of network representation learning in single-social-network analysis tasks, some researchers have begun to apply network representation learning to network alignment tasks across multiple social networks. This type of approach attempts to learn a common embedded vector space for users in different social networks without manually extracting the effective features of the users. These approaches, while attempting to model user behavior in the social network from the user's social structure and profile information, ignore, when capturing the user node representation, both the different influence weights of different neighboring user nodes and the different influence weights of different attribute information on user information interactions.
In view of the importance of network alignment research to multi-social-network analysis tasks and the limitations of existing research, the present invention aims to propose a network alignment method based on a double-layer graph attention neural network, which combines a user-level attention mechanism and a feature-level attention mechanism so that the model can learn an accurate representation of each user in the social network and improve the prediction accuracy of anchor links.
Disclosure of Invention
In view of the above, the invention aims to provide a network alignment method based on a double-layer graph attention neural network, which can effectively capture different influence weights among users, neighbor users and among features in a social network while modeling social behaviors of the users by using information such as attributes, local social structures, global social structures and the like of the users in the social network, thereby learning accurate representation of the users in the social network and improving accuracy of anchor link prediction among different social networks.
Based on the above object, the present invention provides a network alignment method based on a double-layer graph attention neural network, which is characterized by comprising:
basic definition: social network abstraction is a directed graph g= (V, E, X), where v= { V i I=1, …, N } represents a set of user nodes in the social network, N being the number of user nodes in the social network; e= { E i,j =(v i ,v j )|v i ∈V,v j E V represents a set of relationships between users in a social network, e i,j =(v i ,v j ) Representing user v i And user v j The association relation exists between the two; x= { X i I=1, …, N } represents the features of all usersVector set, for each user node v i All have a node feature vector x i Correspondingly, the feature vector can be extracted from the personal data, the behavior and the network social structure information of the user node, without losing generality, the two networks to be aligned are named as a source social network and a target social network, and G is used for each network s And G t A representation;
for any two users from different social networks
Figure BDA0002576285310000031
And->
Figure BDA0002576285310000032
We use
Figure BDA0002576285310000033
Representing an anchor linkage relationship between a source social network and a target social network, wherein +.>
Figure BDA0002576285310000034
And->
Figure BDA0002576285310000035
Is that the same user is respectively in different social networks G s And G t An account in (a); the anchor links are one-to-one link relations between two users in different social networks, and the situation that the two anchor links share the same user account of the same social network does not exist;
two different social networks G s And G t All the anchor links between are defined as the set of anchor links, and
Figure BDA0002576285310000036
representation of->
Figure BDA0002576285310000037
Representing a user account in a source social network, +.>
Figure BDA0002576285310000038
Representing a user account in a target social network; for two different social networks G s =(V s ,E s ,X s ) And G t =(V t ,E t ,X t ) Network alignment aims at finding a set of anchor link sets T between two social networks, where any element e 'in set T' ij E T represents two user accounts +. >
Figure BDA0002576285310000039
And->
Figure BDA00025762853100000310
An anchor link between the two;
S1, a network preprocessing module: preprocessing the social network according to the input network type and the user attribute information it contains, and constructing an initialized user node feature vector matrix;
S2, a network embedded representation module: taking the initialized user node feature vector matrix obtained by the network preprocessing module and the adjacency matrix of the social network as inputs, and capturing the complex information interaction relations of users in the social network through a double-layer graph attention neural network, so as to learn the latent information of the user nodes and obtain accurate user node embedded vectors;
S3, an embedded vector space alignment module: constructing a classification model from the user node embedded vectors of the source and target social networks learned in S2 to predict anchor links, and adopting a bidirectional embedded vector space alignment strategy to satisfy the constraint of one-to-one matching of user accounts between different social networks;
S4, taking the intersection of the anchor links predicted in the two alignment directions, completing the network alignment.
Preferably, the step S2 includes the following: the feature vector of user v_i is expressed as x_i. Based on the network type, various feature vectors of the user are extracted and stacked laterally to generate the initialized feature vector representation x_i ∈ R^d of the user, where d represents the dimension of the user-initialized feature vector (d', d'' and d''' appearing hereinafter represent other dimensions). The initialized feature vectors of all users in the social network are built into a matrix X, where each row is the feature vector of a particular user node: X = (x_1, x_2, …, x_N)^T.
Preferably, the network type is a topological network, the feature vector of the user is randomly initialized in a random matrix mode, and the weight parameters of the random matrix are learned through the training stage of the double-layer diagram attention neural network model.
Preferably, the network type is an attribute network, and the user attributes are vectorized as follows: user information such as the username is randomly initialized in a word-embedding manner to obtain the username feature vector; a Doc2Vec model is adopted to mine the user's language style from the user's long text information and learn the user's text feature vector; the user's trajectory information is vector-initialized through spatial clustering to obtain the user's spatial feature vector; and the user's score and number of check-ins are each taken directly as one feature dimension of the user and vector-initialized.
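As a concrete illustration of the attribute vectorization described above, here is a minimal numpy sketch: the short-text username attribute is randomly initialized (to be refined during training), while the score and check-in count are each used directly as one feature dimension. `EMBED_DIM`, the function name and the toy inputs are illustrative assumptions, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8  # assumed embedding size for the short-text (username) attribute

def init_user_features(usernames, scores, checkin_counts):
    """Build an initialized feature matrix by laterally stacking attribute vectors.

    Username vectors are randomly initialized (refined later in training);
    score and check-in count are each used directly as one feature dimension.
    """
    n = len(usernames)
    name_vecs = rng.normal(size=(n, EMBED_DIM))            # random embedding init
    numeric = np.column_stack([scores, checkin_counts])    # one dimension each
    return np.hstack([name_vecs, numeric])                 # -> (n, EMBED_DIM + 2)

X = init_user_features(["alice", "bob"], scores=[4.5, 3.0], checkin_counts=[12, 3])
```

A full implementation would stack further blocks for the Doc2Vec text vectors and the spatially clustered trajectory vectors in the same lateral fashion.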
Preferably, the step S2 includes the following:
S2.1, embedding a user layer node into a representation sub-module: capturing different influence weights among users to carry out weighted aggregation on local neighborhood information of the users in the social network, so that node embedding vectors of user levels are learned;
s2.2, embedding a characteristic layer node into a representation sub-module: the method is responsible for learning influence weights among different features of the user so as to capture interaction relations among the features with finer granularity, and therefore node embedding vectors of the user at the feature level are learned;
s2.3, embedding a vector fusion sub-module: and the method is responsible for reserving and resetting the user embedded vectors from different layers of the user level and the feature level so as to fuse the node embedded vectors of multiple views and improve the accuracy of network alignment tasks.
Preferably, S2.1 includes the following: a learnable transformation matrix W_1 ∈ R^{d'×d} converts the input vector into a higher-dimensional vector, namely:

h_i = W_1 · x_i

According to the graph attention neural network, for any two user nodes v_i and v_j, the relation strength e_ij between the two user nodes is first calculated:

e_ij = LeakyReLU(a^T · [W·h_i^(l) ∥ W·h_j^(l)])

where h_i^(l) and h_j^(l) represent the embedded vectors of user nodes v_i and v_j at layer l, W represents the weight parameter of layer l, "∥" is the concatenation operator representing the lateral concatenation of two vectors, and LeakyReLU(·) is the activation function of the neuron. When aggregating the neighborhood information of user node v_i, the information contributions from different neighbors are normalized by applying a softmax(·) function to the relation strengths between the user node and all its neighbor user nodes v_k ∈ N(v_i), calculated as follows:

a_ij = exp(e_ij) / Σ_{v_k ∈ N(v_i)} exp(e_ik)

a_ij is called the attention coefficient between user nodes v_i and v_j; the larger the value of a_ij, the closer the relationship between the two users. Having calculated the attention coefficients between user node v_i and all its neighbor user nodes (including itself), the new embedded vector of each user node v_i may be defined as follows:

h_i = δ(Σ_{v_j ∈ N(v_i)} a_ij · W·h_j)

where δ(·) is the activation function of the neuron. By linearly aggregating the neighborhood information of each user with the different influence weights, the user-level node embedding vector h_i of each user in the social network is obtained, forming the user-level vector matrix M = (h_1, h_2, …, h_N)^T.
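The user-level attention computation above (linear transform, relation strength e_ij, softmax normalization over the neighborhood, weighted aggregation) can be sketched in numpy as follows; the concrete shapes, the self-loop convention and tanh as the activation δ are assumptions made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def user_level_attention(X, adj, W, a):
    """One user-level graph-attention step.

    X: (N, d) input features; adj: (N, N) 0/1 adjacency including self-loops;
    W: (d2, d) shared transform; a: (2*d2,) attention vector.
    """
    H = X @ W.T                                         # W x_i for every node
    d2 = H.shape[1]
    # e_ij = LeakyReLU(a^T [W h_i || W h_j]), split into the two halves of a
    e = leaky_relu((H @ a[:d2])[:, None] + (H @ a[d2:])[None, :])
    e = np.where(adj > 0, e, -np.inf)                   # restrict to N(v_i)
    att = np.exp(e - e.max(axis=1, keepdims=True))      # softmax over neighbours
    att = att / att.sum(axis=1, keepdims=True)          # attention coefficients a_ij
    return np.tanh(att @ H)                             # weighted linear aggregation

N, d, d2 = 4, 5, 3
adj = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
H_user = user_level_attention(rng.normal(size=(N, d)), adj,
                              rng.normal(size=(d2, d)), rng.normal(size=2 * d2))
```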
Preferably, S2.2 comprises the following: the user-level vector matrix M = (h_1, h_2, …, h_N)^T obtained in S2.1 is taken as input, and the multi-dimensional attention between the features of any two user nodes in the social network is taken into account, i.e., an attention coefficient is calculated for each corresponding dimension of the two user node vectors.

Let h_i and h_j respectively represent two user nodes v_i and v_j in the social network; the relationship between two user node embedded vectors can be defined as:

f(h_i, h_j) = W_5 · tanh(W_4 · h_i + W_3 · h_j + b_2) + b_1

where W_3, W_4 and W_5 are parameter matrices, b_1 and b_2 are bias terms, and tanh(·) is the activation function of the neuron.

A feed-forward neural network is used to calculate, based on f(h_i, h_j), the dependency relationship of any two user nodes at the feature level. Let β_ij represent the attention coefficient vector of the feature layer of user nodes v_i and v_j, and [β_ij]_k its k-th component. To facilitate comparing the attention coefficients of corresponding feature dimensions between different attention coefficient vectors, the attention coefficient vectors of all neighbors of the user are normalized per feature dimension using a softmax(·) function:

[β_ij]_k = exp([f(h_i, h_j)]_k) / Σ_{v_l ∈ N(v_i)} exp([f(h_i, h_l)]_k)

After computing the attention coefficients [β_ij]_k between each feature dimension of any two users, the coefficients can be combined, by corresponding feature dimension, into an attention coefficient vector β_ij = ([β_ij]_1, [β_ij]_2, …, [β_ij]_{d''}) between the two users. The attention vector has the same dimension as the user node vector, and each component [β_ij]_k is the influence weight of the corresponding dimension of the user node vector; the larger [β_ij]_k is, the stronger the association of the k-th-dimension features of the two user nodes v_i and v_j. Finally, for any user node v_i in the social network, the embedded vectors of its neighbor users are weighted and linearly aggregated according to the learned attention coefficient vectors; unlike the aggregation of the user-level attention mechanism, the aggregation function of the feature-level attention mechanism aggregates neighborhood information by element-wise multiplication:

ĥ_i = δ(Σ_{v_j ∈ N(v_i)} β_ij ⊙ h_j)

where ⊙ denotes the element-wise product of two vectors of the same shape (yielding a vector of the same shape) and δ(·) is the activation function of the neuron. By linearly aggregating the neighborhood information of each user according to the influence weights of the different features, the node embedded representation ĥ_i of each user at the feature level is obtained, forming the feature-level vector matrix M̂ = (ĥ_1, ĥ_2, …, ĥ_N)^T.
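The feature-level mechanism differs from the user-level one in that it produces one attention coefficient per feature dimension and aggregates by element-wise multiplication. A numpy sketch under assumed shapes (square parameter matrices, self-loops in the adjacency, tanh as the final activation δ):

```python
import numpy as np

rng = np.random.default_rng(2)

def feature_level_attention(H, adj, W3, W4, W5, b1, b2):
    """Feature-level attention: a coefficient per feature dimension per pair.

    H: (N, d2) user-level vectors; adj: (N, N) adjacency with self-loops;
    W3, W4, W5: (d2, d2) parameter matrices; b1, b2: (d2,) bias terms.
    """
    # f(h_i, h_j) = W5 . tanh(W4 h_i + W3 h_j + b2) + b1, for all node pairs
    f = np.tanh((H @ W4.T)[:, None, :] + (H @ W3.T)[None, :, :] + b2) @ W5.T + b1
    f = np.where(adj[:, :, None] > 0, f, -np.inf)       # restrict to N(v_i)
    beta = np.exp(f - f.max(axis=1, keepdims=True))     # per-dimension softmax
    beta = beta / beta.sum(axis=1, keepdims=True)       # [beta_ij]_k coefficients
    # element-wise weighted aggregation over the neighbourhood
    return np.tanh(np.einsum('ijk,jk->ik', beta, H))

N, d2 = 4, 3
adj = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
H = rng.normal(size=(N, d2))
W3, W4, W5 = (rng.normal(size=(d2, d2)) for _ in range(3))
H_feat = feature_level_attention(H, adj, W3, W4, W5,
                                 rng.normal(size=d2), rng.normal(size=d2))
```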
Preferably, S2.3 comprises the following: the embedded vector fusion sub-module takes the user-level vector matrix M = (h_1, h_2, …, h_N)^T and the feature-level vector matrix M̂ = (ĥ_1, ĥ_2, …, ĥ_N)^T as input, and uses a gating mechanism to automatically learn the weight parameters of the node embedded vectors of the same user at the different levels, so as to effectively retain and reset the information representations from the different levels.

For any user node v_i in the social network, the module first calculates a weight relation vector between the user-level node embedding vector h_i and the feature-level node embedding vector ĥ_i, as follows:

F = sigmoid(W_6 · h_i + W_7 · ĥ_i + b_3)

where W_6 and W_7 are parameter matrices of the gated neural network, b_3 is a bias term and sigmoid(·) is the activation function of the neuron. According to the learned weight relation vector F, the user node embedded vectors from the different layers can be selectively retained and reset, and the final user node embedded vector is represented as:

z_i = F ⊙ h_i + (1 − F) ⊙ ĥ_i

where F ⊙ h_i represents the selective retention of the user-level node embedded vector and (1 − F) ⊙ ĥ_i represents the selective reset of the feature-level node embedded vector; here 1 − F is a vector operation, subtracting each dimension of the vector F from 1.

By fusing the user-level node embedded representation and the feature-level node embedded representation of each user with the gating mechanism, the final node embedded representation z_i of each user in the social network is obtained, forming the node embedding vector matrix Z = (z_1, z_2, …, z_N)^T.
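The gated retention-and-reset of the two views can be sketched directly from the two formulas above; the shapes and toy inputs are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_user, h_feat, W6, W7, b3):
    """Fuse user-level and feature-level embeddings with a gate F.

    F = sigmoid(W6 h_i + W7 h_hat_i + b3); z_i = F * h_i + (1 - F) * h_hat_i,
    so each dimension of z_i interpolates between the two views.
    """
    F = sigmoid(h_user @ W6.T + h_feat @ W7.T + b3)   # weight relation vector
    return F * h_user + (1.0 - F) * h_feat            # retain / reset per dimension

d2 = 3
h_u = rng.normal(size=(4, d2))
h_f = rng.normal(size=(4, d2))
Z = gated_fusion(h_u, h_f, rng.normal(size=(d2, d2)),
                 rng.normal(size=(d2, d2)), rng.normal(size=d2))
```

Because the gate is a convex combination per dimension, every component of z_i lies between the corresponding components of the two input views.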
For any given pair of user nodes v_i and v_j in the social network, whose node embedded vectors are z_i and z_j respectively, the probability of an edge between the two nodes can be expressed as:

p(v_i, v_j) = σ(z_i^T · z_j)

where σ(x) = 1 / (1 + e^{−x}) is the sigmoid function.

To optimize the model parameters of the double-layer graph attention neural network, we need to define an objective function of the model, whose goal is to maximize the probability of the observable edges in the social network, namely:

O = Σ_{e_{i,j} ∈ E} log p(v_i, v_j)

To avoid trivial solutions, for each observable edge e_{i,j} = (v_i, v_j) we employ a negative sampling technique to maximize the objective function, namely:

log σ(z_i^T · z_j) + Σ_{k=1}^{K} E_{v_k ∼ P_n(v)} [log σ(−z_i^T · z_k)]

where the first term models the positive examples in the social network, and the second term models negative examples by randomly generating edges associated with the nodes through the negative sampling technique; the probability of each node being sampled satisfies P_n(v) ∝ d_v^{3/4}, K represents the number of sampled negative edges, and d_v represents the degree of user node v. With this objective function, the parameters of the double-layer graph attention neural network model can be learned by a back-propagation optimization algorithm, yielding the node vector matrices of the source social network and the target social network, Z^s ∈ R^{|V^s|×d'''} and Z^t ∈ R^{|V^t|×d'''} respectively, where |V^s| represents the number of user nodes in the source social network and |V^t| the number of user nodes in the target social network; z_i^s represents the node embedded vector of user v_i^s in the source social network, and z_j^t represents the node embedded vector of user v_j^t in the target social network; Z^s and Z^t are also referred to as the embedded vector spaces corresponding to the source social network and the target social network.
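The negative-sampling objective can be sketched as follows; the sampling distribution P_n(v) ∝ d_v^{3/4} is the word2vec-style convention assumed here, and the function name and toy inputs are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)   # numerically stable log(sigmoid(x))

def edge_objective(Z, edges, degrees, K=5):
    """Negative-sampling objective over the observed edges (to be maximized).

    Positive term: log sigma(z_i . z_j) per observed edge; negative term:
    K nodes drawn with probability proportional to degree^(3/4).
    """
    p = degrees ** 0.75
    p = p / p.sum()
    total = 0.0
    for i, j in edges:
        total += log_sigmoid(Z[i] @ Z[j])             # observed (positive) edge
        for k in rng.choice(len(Z), size=K, p=p):
            total += log_sigmoid(-(Z[i] @ Z[k]))      # sampled negative edge
    return total

Z = rng.normal(size=(6, 3))
loss = edge_objective(Z, edges=[(0, 1), (2, 3)],
                      degrees=np.array([3.0, 2.0, 2.0, 1.0, 1.0, 1.0]))
```

In a trained model the gradient of this objective would flow back through the attention layers; here the sketch only evaluates the objective value.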
Preferably, the step S3 includes the following: based on step S2, the node embedded vector matrix Z^s of the source social network and the node embedded vector matrix Z^t of the target social network are obtained. Each row of a node vector matrix represents the node embedded vector of one user in the social network, and the whole node vector matrix is also called the embedded vector space corresponding to that social network.

A mapping function M is defined that can map a user node vector from one embedded vector space to another. Assuming we now project the source social network into the target social network to find target nodes matching the source nodes, the objective function can be defined as:

min_θ Σ_{(v_i^s, v_j^t) ∈ T'} ∥ M^{s→t}(z_i^s; θ) − z_j^t ∥

where M^{s→t}(·) represents the mapping function from the source social network to the target social network, here constructed with a multi-layer perceptron, θ is the weight parameter of the multi-layer perceptron, and T' is the known anchor link set used for training. The objective function aims at minimizing, over the user pairs with an anchor link relation, the distance between the source user node mapped into the target social network and the target user node; a classification model is thereby constructed to predict whether any two users between different social networks have an anchor link, and the target user node nearest to the projected source user node is selected to construct a candidate anchor link.
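A hypothetical sketch of this mapping-and-matching step: a small multi-layer perceptron stands in for M^{s→t}, and each mapped source node is matched to its nearest target node to form candidate anchor links. All shapes and names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

def mlp_map(Zs, W1, b1, W2, b2):
    """Two-layer perceptron standing in for the mapping M_{s->t}."""
    return np.tanh(Zs @ W1.T + b1) @ W2.T + b2

def candidate_links(Zs_mapped, Zt):
    """Nearest target node (Euclidean) for each mapped source node."""
    dist = np.linalg.norm(Zs_mapped[:, None, :] - Zt[None, :, :], axis=2)
    return dist.argmin(axis=1)

Zs = rng.normal(size=(4, 3))     # source embedded vector space
Zt = rng.normal(size=(5, 3))     # target embedded vector space
W1, W2 = rng.normal(size=(8, 3)), rng.normal(size=(3, 8))
matches = candidate_links(mlp_map(Zs, W1, np.zeros(8), W2, np.zeros(3)), Zt)
```

In practice the perceptron weights would be trained on the known anchor link set before candidate links are extracted.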
From the above, it can be seen that the invention models the complex interaction behavior of users in the social network using a graph neural network together with the users' attributes, local social structure and global social structure, and can thereby obtain more accurate user node embedded vectors;
the invention provides a double-layer graph annotation intention neural network to learn the attention coefficients among users with user levels and feature levels respectively, and captures the difference of influence weights among different users from multiple perspectives, so that the learned user node embedding representation is more in line with the actual situation of the users in the social network;
the invention provides a bidirectional embedded vector space alignment strategy to predict anchor links among different social networks, so that users among different social networks are aligned and one-to-one matching constraint relation is satisfied. Meanwhile, the accuracy of the anchor link prediction is improved by further confirming the bidirectional embedded vector space alignment strategy.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings can be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a method framework of the present invention;
FIG. 2 is a schematic diagram of the fusion of user node embedded representations from different perspectives in the present invention;
FIG. 3 is a schematic diagram of the bi-directional embedded vector space alignment strategy of the present invention.
Detailed Description
The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.
It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present invention should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
As shown in fig. 1 to 3, in this embodiment:
first, the present invention describes the social network alignment problem as follows:
A social network is abstracted as a directed graph $G=(V,E,X)$, where $V=\{v_i \mid i=1,\dots,N\}$ is the set of user nodes in the social network and $N$ is the number of user nodes in the social network; $E=\{e_{i,j}=(v_i,v_j) \mid v_i \in V, v_j \in V\}$ is the set of relationships between users in the social network, where $e_{i,j}=(v_i,v_j)$ indicates an association between user $v_i$ and user $v_j$; $X=\{x_i \mid i=1,\dots,N\}$ is the set of feature vectors of all users. Each user node $v_i$ has a corresponding node feature vector $x_i$, which may be extracted from the node's profile data, behavior, structural attributes, and the like. For each edge $e_{i,j}$ in the relationship set, let $w_{i,j}$ denote the weight of the edge: if the two users have a link relationship in the social network, $w_{i,j}=1$; otherwise $w_{i,j}=0$. The matrix

$$A = (w_{i,j}) \in \mathbb{R}^{N \times N}$$

is called the adjacency matrix of graph $G$. Without loss of generality, we call the two networks to be aligned the source social network and the target social network, denoted $G^s$ and $G^t$ respectively.
For any two users $v_i^s \in V^s$ and $v_j^t \in V^t$ from different social networks, we use $e'_{i,j}=(v_i^s, v_j^t)$ to represent the association between the two user nodes, with $w'_{i,j}$ as the relationship weight. If $v_i^s$ and $v_j^t$ are two accounts of the same user in the source social network and the target social network respectively, then $w'_{i,j}=1$ indicates an anchor link relationship between the two users; otherwise $w'_{i,j}=0$. The goal of network alignment is to find the anchor link set between the different social networks,

$$T = \{\, e'_{i,j} = (v_i^s, v_j^t) \mid w'_{i,j} = 1 \,\},$$

where user node $v_i^s$ comes from the source social network, user node $v_j^t$ comes from the target social network, and $w'_{i,j}=1$ means that the two user nodes $v_i^s$ and $v_j^t$ belong to the same user entity in the real world.
Next, referring to fig. 1, the present invention provides a network alignment method framework comprising three modules: a network preprocessing module, a network embedded representation module, and an embedded vector space alignment module.
The network preprocessing module: for a social network $G=(V,E)$, we first preprocess the network according to the type of the input network and the information it contains, so as to construct an initialized user vector matrix. Common network types can be divided into topology networks and attribute networks. For a topology network, the invention uses an embedding layer to randomly initialize the node embedding vector $x_i$ of each user; the weight parameters of the embedding layer are learned during the training stage of the double-layer graph attention neural network model. For an attribute network, user attributes are vectorized in different ways depending on the information they contain. Short text attributes such as a user name may be randomly initialized with an embedding layer; for long text attributes such as user comments, methods such as topic models are generally adopted to learn the topic preferences of users; for a user's check-in information, users are considered to have similar access preferences toward merchants in the same area, so the check-in information is initialized with a spatial clustering method; numerical attributes such as user ratings and check-in counts can be used directly as one dimension of the user attributes. After extracting the attribute vectors of a user's various attributes, we laterally concatenate these attribute vectors to generate the user's initialization vector in the attribute network. Thus, for each user in the social network, we ultimately generate an initialized representation $x_i \in \mathbb{R}^d$, where $d$ is the dimension of the user initialization vector. The initialization vectors of all users in the social network form a feature matrix $X$, where each row is the feature vector of a specific user node: $X = (x_1, x_2, \dots, x_N)^T$.
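As a concrete illustration of the preprocessing step, the sketch below builds each user's initialization vector by laterally concatenating per-attribute vectors and stacks them into the feature matrix X. The specific attribute choices (a 4-dimensional random user-name embedding, a 2-dimensional topic vector, a 3-region check-in one-hot, a scalar rating) are hypothetical, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_user_vector(name_dim=4, topic_vec=None, region_onehot=None, rating=None):
    """Build one user's initialization vector by laterally concatenating
    attribute vectors, as the preprocessing module describes."""
    parts = [rng.normal(size=name_dim)]          # short text: random embedding
    if topic_vec is not None:
        parts.append(np.asarray(topic_vec, dtype=float))      # long text: topic preferences
    if region_onehot is not None:
        parts.append(np.asarray(region_onehot, dtype=float))  # check-ins: spatial cluster
    if rating is not None:
        parts.append(np.array([rating], dtype=float))         # numeric attribute as one dim
    return np.concatenate(parts)

# Stack all users' vectors into the feature matrix X (one row per user).
users = [init_user_vector(topic_vec=[0.2, 0.8], region_onehot=[0, 1, 0], rating=4.5)
         for _ in range(3)]
X = np.stack(users)   # shape (N, d) with d = 4 + 2 + 3 + 1 = 10
```

Each row of `X` corresponds to one user's $x_i$; the random parts stand in for embedding-layer weights that the model would later learn.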
Network embedded representation module: this module takes the initialized user node feature vector matrix $X=(x_1, x_2, \dots, x_N)^T$ produced by the network preprocessing module and the adjacency matrix $A \in \mathbb{R}^{N \times N}$ of the social network as inputs, and captures the complex information interactions of users in the social network through a double-layer graph attention neural network so as to learn a latent information representation $z_i$ for each user node. The module is further subdivided into three sub-modules: a user-layer node embedding representation sub-module, a feature-layer node embedding representation sub-module, and an embedding vector fusion sub-module.
1. User layer node embedded representation sub-module
The user-layer node embedding representation sub-module is responsible for capturing the different influence weights among users so as to perform weighted aggregation of each user's local neighborhood information in the social network, thereby learning node embedding representations at the user level. To ensure that the node vectors have sufficient representational capacity, the invention first uses a learnable transformation matrix $W_1 \in \mathbb{R}^{d' \times d}$ to map the input vector into a higher-dimensional vector, namely:

$$h_i = W_1 \cdot x_i$$
For any two user nodes $v_i$ and $v_j$, the relation strength $e_{ij}$ between the two users is first calculated:

$$e_{ij} = \mathrm{LeakyReLU}\!\left(a^T\!\left[\, W^{(l)} h_i^{(l)} \,\|\, W^{(l)} h_j^{(l)} \,\right]\right)$$

where $h_i^{(l)}$ and $h_j^{(l)}$ are the vector representations of user nodes $v_i$ and $v_j$ at layer $l$, $W^{(l)}$ is the weight parameter of layer $l$, "$\|$" is the concatenation operator that splices two vectors laterally, and $\mathrm{LeakyReLU}(\cdot)$ is a neuron activation function. To calculate the information contribution ratio of different neighbors when aggregating the neighborhood information of user node $v_i$, a $\mathrm{softmax}(\cdot)$ function is used to normalize the relation strengths between the user node and all of its neighbor user nodes $v_k \in N(v_i)$:

$$a_{ij} = \mathrm{softmax}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{v_k \in N(v_i)} \exp(e_{ik})}$$

$a_{ij}$ is called the attention coefficient between user nodes $v_i$ and $v_j$; the larger the value of $a_{ij}$, the closer the relationship between the two users. Given the calculated attention coefficients between user node $v_i$ and all of its neighbor nodes (including itself), the new latent information representation of each user node $v_i$ can be defined as:

$$h_i^{(l+1)} = \delta\!\left(\sum_{v_j \in N(v_i)} a_{ij}\, W^{(l)} h_j^{(l)}\right)$$
where δ (·) is the activation function of the neuron. According to the linear aggregation of different influence weights on the neighborhood information of the users, the node embedded representation h of each user at the user level in the social network can be obtained i Form a user level vector matrix m= (h 1 ,h 2 ,…,h N ) T
2. Feature layer node embedded representation submodule
The feature-layer node embedding representation sub-module is responsible for learning the influence weights among different features of users so as to capture finer-grained interactions among features, thereby learning node embedding representations at the feature level. This sub-module takes the user-level vector matrix $M=(h_1, h_2, \dots, h_N)^T$ obtained in the previous stage as input and considers multi-dimensional attention between the features of any two nodes in the social network, i.e., an attention coefficient is calculated for each corresponding dimension of the two user node vectors.
Let $h_i$ and $h_j$ respectively denote the embedding vectors of two user nodes $v_i$ and $v_j$ in the social network. The relationship between the two user node embedding vectors can be defined as:

$$f(h_i, h_j) = W_5 \cdot \tanh(W_4 \cdot h_i + W_3 \cdot h_j + b_2) + b_1$$

where $W_3$, $W_4$, and $W_5$ are parameter matrices, $b_1$ and $b_2$ are bias terms, and $\tanh(\cdot)$ is a neuron activation function. This feed-forward network $f(h_i, h_j)$ is used to calculate the feature-level dependency between any two user nodes. Let $\beta_{ij}$ denote the feature-layer attention coefficient vector of user nodes $v_i$ and $v_j$, and $[\beta_{ij}]_k$ the $k$-th dimension of the attention vector. Likewise, to make the attention coefficients of corresponding feature dimensions comparable across different attention coefficient vectors, the attention coefficient vectors of all of a user's neighbors are normalized per feature dimension with a $\mathrm{softmax}(\cdot)$ function:

$$[\beta_{ij}]_k = \frac{\exp\!\left([f(h_i, h_j)]_k\right)}{\sum_{v_m \in N(v_i)} \exp\!\left([f(h_i, h_m)]_k\right)}$$

After computing the attention coefficient $[\beta_{ij}]_k$ between each feature dimension of any two users, the coefficients can be assembled by corresponding feature dimension into the attention coefficient vector between the two users: $\beta_{ij} = ([\beta_{ij}]_1, [\beta_{ij}]_2, \dots, [\beta_{ij}]_d)$. The attention vector has the same dimensionality as the user node vector, and each dimension $[\beta_{ij}]_k$ is the influence weight of the corresponding dimension of the user node vector; the larger $[\beta_{ij}]_k$, the stronger the association between the features of user nodes $v_i$ and $v_j$ in the $k$-th dimension.
Finally, for any user node $v_i$ in the social network, the latent information representations of its neighbor users are linearly aggregated with weights given by the attention coefficient vectors learned between the different user nodes. Unlike the aggregation of the user-layer attention mechanism, the aggregation function of the feature-layer attention mechanism aggregates neighborhood information by element-wise multiplication:

$$h'_i = \delta\!\left(\sum_{v_j \in N(v_i)} \beta_{ij} \odot h_j\right)$$

where $\odot$ denotes the element-wise product of two vectors of the same shape (the result is a vector of that same shape) and $\delta(\cdot)$ is a neuron activation function. By aggregating each user's neighborhood information with the influence weights of the different features, the feature-level node embedding representation $h'_i$ of each user in the social network is obtained, forming the feature-level vector matrix $M' = (h'_1, h'_2, \dots, h'_N)^T$.
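The per-dimension attention of the feature layer can be sketched as follows, written as an explicit loop for clarity. The square parameter matrices, tanh as the activation $\delta(\cdot)$, and the toy complete graph are assumptions for illustration only.

```python
import numpy as np

def feature_layer_attention(M, A, W3, W4, W5, b1, b2):
    """Feature-layer attention: f(h_i, h_j) = W5.tanh(W4.h_i + W3.h_j + b2) + b1
    yields one score per feature dimension; scores are softmax-normalized over
    the neighbors dimension-by-dimension, then aggregated element-wise."""
    N, d = M.shape
    H_feat = np.zeros_like(M)
    betas = np.zeros((N, N, d))
    for i in range(N):
        nbrs = np.nonzero(A[i] > 0)[0]
        F = np.stack([W5 @ np.tanh(W4 @ M[i] + W3 @ M[j] + b2) + b1
                      for j in nbrs])                 # (n_nbrs, d) scores
        B = np.exp(F - F.max(axis=0))
        B = B / B.sum(axis=0)                         # softmax per dimension k
        betas[i, nbrs] = B
        H_feat[i] = np.tanh((B * M[nbrs]).sum(axis=0))  # delta(sum beta_ij (.) h_j)
    return H_feat, betas

rng = np.random.default_rng(2)
N, d = 4, 3
A = np.ones((N, N))                                   # toy complete graph
M = rng.normal(size=(N, d))
W3, W4, W5 = (rng.normal(size=(d, d)) for _ in range(3))
b1, b2 = rng.normal(size=d), rng.normal(size=d)
H_feat, betas = feature_layer_attention(M, A, W3, W4, W5, b1, b2)
```

Unlike the user layer, where one scalar coefficient scales a whole neighbor vector, each neighbor here contributes through a per-dimension weight vector.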
3. Embedded vector fusion submodule
The embedding vector fusion sub-module is responsible for retaining and resetting user latent information from the two different levels, the user level and the feature level, so as to fuse the multi-view node embedding representations and improve the accuracy of the subsequent network alignment task. The sub-module takes the user-level vector matrix $M=(h_1, h_2, \dots, h_N)^T$ and the feature-level vector matrix $M'=(h'_1, h'_2, \dots, h'_N)^T$ as inputs, and uses a gating mechanism to automatically learn the weight parameters of the same user's node embedding representations at the different levels, so as to effectively retain and reset the information representations from the different levels.
As shown in fig. 2, for any user node $v_i$ in the social network, the module first calculates the weight relation vector between the user-level node embedding representation $h_i$ and the feature-level node embedding representation $h'_i$:

$$F = \mathrm{sigmoid}(W_6 \cdot h_i + W_7 \cdot h'_i + b_3)$$

where $W_6$ and $W_7$ are the parameter matrices of the gated neural network, $b_3$ is a bias term, and $\mathrm{sigmoid}(\cdot)$ is a neuron activation function. According to the learned weight relation vector $F$, the user latent information representations from the different levels can be selectively retained and reset, and the final user node embedding representation is:

$$z_i = F \odot h_i + (1 - F) \odot h'_i$$

where $F \odot h_i$ represents the selective retention of the user-level node embedding representation and $(1-F) \odot h'_i$ represents the selective reset of the feature-level node embedding representation; here $1-F$ is a vector operation that subtracts each dimension of the vector $F$ from 1.
By fusing each user's user-level and feature-level node embedding representations with the gating mechanism, the final node embedding representation $z_i$ of each user in the social network is obtained, forming the node embedding vector matrix $Z = (z_1, z_2, \dots, z_N)^T$.
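The gated fusion step can be sketched directly from the two equations above; the parameter names W6, W7, b3 are illustrative labels for the gate's matrices and bias, since the patent's original symbols are not legible here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h, h_feat, W6, W7, b3):
    """Gate F = sigmoid(W6.h + W7.h' + b3); fused vector
    z = F (.) h + (1-F) (.) h', a per-dimension convex combination
    of the user-level and feature-level embeddings."""
    F = sigmoid(W6 @ h + W7 @ h_feat + b3)
    z = F * h + (1.0 - F) * h_feat
    return z, F

rng = np.random.default_rng(3)
d = 4
h, h_feat = rng.normal(size=d), rng.normal(size=d)
W6, W7 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b3 = rng.normal(size=d)
z, F = gated_fusion(h, h_feat, W6, W7, b3)
```

Because each gate value lies in (0, 1), every dimension of z lies between the corresponding dimensions of h and h', which is what "retain and reset" amounts to numerically.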
For any given pair of user nodes $v_i$ and $v_j$ in the social network, with node embedding vectors $z_i$ and $z_j$, the probability of an edge between the two nodes can be expressed as:

$$p(v_i, v_j) = \sigma(z_i^T \cdot z_j)$$

where $\sigma(x) = 1/(1+e^{-x})$ is the sigmoid function.
To optimize the model parameters of the double-layer graph attention neural network, we need to define an objective function whose goal is to maximize the probability of the observable edges in the social network, namely:

$$O = \sum_{e_{i,j} \in E} \log p(v_i, v_j)$$

To avoid trivial solutions, for each observable edge $e_{i,j}=(v_i, v_j)$ we adopt a negative sampling technique and maximize the objective:

$$O = \log \sigma(z_i^T z_j) + \sum_{k=1}^{K} \mathbb{E}_{v_k \sim P_n(v)}\!\left[\log \sigma(-z_i^T z_k)\right]$$

where the first term models the positive examples in the social network, and the second term models negative examples, i.e., edges randomly generated around a node by the negative sampling technique; the probability of each node being sampled satisfies $P_n(v) \propto d_v^{3/4}$, $K$ is the number of sampled negative edges, and $d_v$ is the degree of user node $v$. According to this objective function, the parameters of the double-layer graph attention neural network model can be learned with a back-propagation optimization algorithm, yielding the node vector matrices of the source and target social networks, $Z^s \in \mathbb{R}^{|V^s| \times d'}$ and $Z^t \in \mathbb{R}^{|V^t| \times d'}$ respectively, where $|V^s|$ is the number of user nodes in the source social network and $|V^t|$ the number of user nodes in the target social network; $z_i^s$ is the node embedding vector of user $v_i^s$, and $z_j^t$ the node embedding vector of user $v_j^t$. $Z^s$ and $Z^t$ are also called the embedding vector spaces corresponding to the source and target social networks.
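The per-edge negative-sampling objective and the degree-based sampling distribution can be sketched as below; the toy embeddings and degree list are illustrative.

```python
import numpy as np

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)   # numerically stable log of sigmoid(x)

def edge_objective(Z, edge, neg_nodes):
    """Per-edge objective: log sigma(z_i . z_j) + sum_k log sigma(-z_i . z_k)
    over the K sampled negative nodes."""
    i, j = edge
    return log_sigmoid(Z[i] @ Z[j]) + sum(log_sigmoid(-(Z[i] @ Z[k]))
                                          for k in neg_nodes)

def negative_sampling_dist(degrees):
    """Sampling distribution P_n(v) proportional to d_v ** (3/4)."""
    p = np.asarray(degrees, dtype=float) ** 0.75
    return p / p.sum()

rng = np.random.default_rng(4)
Z = rng.normal(size=(5, 3))                      # toy node embedding matrix
obj = edge_objective(Z, (0, 1), neg_nodes=[2, 4])
Pn = negative_sampling_dist([3, 1, 2, 4, 1])     # node degrees d_v
```

Maximizing `obj` over all observable edges pushes linked nodes together and sampled non-links apart; in practice one would maximize it with gradient ascent (back-propagation), as the text describes.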
Embedded vector space alignment module: from the modules above, the node embedding vector matrix $Z^s$ of the source social network and the node embedding vector matrix $Z^t$ of the target social network are obtained. Each row of a node vector matrix is the node embedding representation of one user in the social network, and the whole node vector matrix is called the embedding vector space of that social network. To align the two embedding vector spaces effectively, we need to project the embedding vector space of the source social network and the embedding vector space of the target social network into a common vector space.
First we define a mapping function $M$ that maps user node vectors from one embedding vector space to another. Suppose we project the source social network onto the target social network to find the target node matching each source node. Given a partially known anchor link set $T$ as supervision information, the objective function can be defined as:

$$\min_{\theta} \sum_{(v_i^s,\, v_j^t) \in T} \left\| M^{s \to t}(z_i^s;\, \theta) - z_j^t \right\|_2$$

where $M^{s \to t}(\cdot)$ is the mapping function from the source social network to the target social network; the invention constructs this mapping function with a multi-layer perceptron, whose weight parameters are $\theta$. The objective is to minimize, for each user pair with an anchor link relationship, the distance between the source user node mapped into the target social network and the target user node, so as to build a classification model that predicts whether an anchor link exists between any two users of different social networks. Since a user typically has only one active account on a given social network platform, the target user node closest to the projected source user node is selected here to build a candidate anchor link.
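A minimal sketch of the mapping and candidate selection follows. The two-layer perceptron with a tanh hidden activation is an assumed stand-in for $M^{s \to t}(\cdot;\theta)$; the patent does not fix the architecture here.

```python
import numpy as np

def mlp_map(z, W1, b1, W2, b2):
    """Two-layer perceptron standing in for the mapping M^{s->t}(z; theta)."""
    return W2 @ np.tanh(W1 @ z + b1) + b2

def alignment_loss(Zs, Zt, anchors, params):
    """Supervised objective: sum over known anchor links (i, j) of
    || M^{s->t}(z_i^s) - z_j^t ||_2."""
    return sum(np.linalg.norm(mlp_map(Zs[i], *params) - Zt[j])
               for i, j in anchors)

def nearest_target(z_mapped, Zt):
    """Candidate anchor link: target node closest to the projected source node."""
    return int(np.argmin(np.linalg.norm(Zt - z_mapped, axis=1)))

Zt = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])   # toy target embeddings
idx = nearest_target(np.array([0.9, 1.1]), Zt)
params = (np.eye(2), np.zeros(2), np.eye(2), np.zeros(2))
loss = alignment_loss(np.array([[0.5, 0.5]]), Zt, [(0, 0)], params)
```

In a real system `params` would be trained by gradient descent on `alignment_loss` over the known anchor set T.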
The user alignment problem between different social networks typically satisfies a one-to-one matching constraint: the same user entity has at most one active account on each social network platform. As shown in fig. 3 (a), a unidirectional embedded vector space mapping may produce one-to-many matching relationships between social networks, which violates the actual network scenario. Therefore, the invention proposes a bidirectional embedded vector space alignment strategy to ensure that the network alignment task between the two social networks satisfies the one-to-one matching constraint. Referring to fig. 3, the invention is described with a specific example whose steps are as follows:
Step 1: construct a multi-layer perceptron model $M^{s \to t}$ that projects from the source social network to the target social network according to the known anchor link set $T$, and learn its weight parameters $\theta_1$ by minimizing the distance between each source user node in an anchor link and the corresponding target user node after projection onto the target social network.

Step 2: with the learned multi-layer perceptron model $M^{s \to t}$, for each user node $v_i^s$ in the source social network, as shown in fig. 3 (a), first project it into the target embedding vector space, then find the target user node closest to its projection in the target social network to form an anchor link with the source user node, and add this link to the candidate anchor link set $A_1$.
Step 3: construct a multi-layer perceptron model $M^{t \to s}$ that projects from the target social network to the source social network according to the known anchor link set $T$, and learn its weight parameters $\theta_2$ by minimizing the distance between each target user node in an anchor link and the corresponding source user node after projection onto the source social network.

Step 4: with the learned multi-layer perceptron model $M^{t \to s}$, for each user node $v_j^t$ in the target social network, as shown in fig. 3 (b), first project it into the source embedding vector space, then find the source user node closest to its projection in the source social network to form an anchor link with the target user node, and add this link to the candidate anchor link set $A_2$.
Step 5: take the intersection of the candidate anchor link sets $A_1$ and $A_2$ as the final predicted anchor link set: $A = A_1 \cap A_2$.
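The bidirectional strategy of Steps 1 to 5 reduces, at prediction time, to intersecting the two candidate sets; the sketch below shows only that final step, with dict-based candidate maps as an illustrative representation.

```python
def bidirectional_align(cand_s2t, cand_t2s):
    """Keep only the anchor links proposed in both directions (A = A1 ∩ A2).
    cand_s2t maps each source node to its nearest target (Step 2, set A1);
    cand_t2s maps each target node to its nearest source (Step 4, set A2)."""
    A1 = {(s, t) for s, t in cand_s2t.items()}
    A2 = {(s, t) for t, s in cand_t2s.items()}
    return A1 & A2

# A one-to-many conflict (sources 0 and 1 both map to target 5) is resolved:
# only the mutually nearest pair survives the intersection.
A = bidirectional_align({0: 5, 1: 5}, {5: 0, 6: 1})
```

The intersection enforces the one-to-one constraint because each direction proposes at most one partner per node, so any surviving pair is mutually nearest.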
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the invention. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the present invention should be included in the scope of the present invention.

Claims (5)

1. A network alignment method based on a dual-layer graph attention neural network, comprising:
basic definition: a social network is abstracted as a directed graph $G=(V,E,X)$, where $V=\{v_i \mid i=1,\dots,N\}$ is the set of user nodes in the social network and $N$ is the number of user nodes in the social network; $E=\{e_{i,j}=(v_i,v_j) \mid v_i \in V, v_j \in V\}$ is the set of relationships between users in the social network, where $e_{i,j}=(v_i,v_j)$ indicates an association between user $v_i$ and user $v_j$; $X=\{x_i \mid i=1,\dots,N\}$ is the set of feature vectors of all users, and each user node $v_i$ has a corresponding node feature vector $x_i$, which can be extracted from the node's profile data, behavior, and network social structure information; without loss of generality, the two networks to be aligned are named the source social network and the target social network, denoted $G^s$ and $G^t$;
for any two users $v_i^s$ and $v_j^t$ from different social networks, we use $e'_{i,j}=(v_i^s, v_j^t)$ to represent an anchor link relationship between the source social network and the target social network, where $v_i^s$ and $v_j^t$ are accounts of the same user in the different social networks $G^s$ and $G^t$; an anchor link is a one-to-one link relationship between two users in different social networks, and no two anchor links share the same user account of the same social network;
the set of all anchor link relationships between two different social networks $G^s$ and $G^t$ is called the anchor link set, denoted $T=\{e'_{i,j}=(v_i^s, v_j^t)\}$, where $v_i^s$ is a user account in the source social network and $v_j^t$ is a user account in the target social network; for two different social networks $G^s=(V^s,E^s,X^s)$ and $G^t=(V^t,E^t,X^t)$, network alignment aims at finding the anchor link set $T$ between the two social networks, where any element $e'_{i,j} \in T$ represents an anchor link between the two user accounts $v_i^s$ and $v_j^t$;
s1, a network preprocessing module: preprocessing a social network according to the input network type and the contained user attribute information, and constructing an initialized user node feature vector matrix;
s2, a network embedded representation module: the method comprises the steps of taking an initialized user node feature vector matrix obtained by a network preprocessing module and an adjacent matrix of a social network as inputs, capturing a complex information interaction relation of a user in the social network through a double-layer graph attention neural network, learning potential information of the user node in the social network, and obtaining an accurate user node embedded vector, wherein the method comprises the following steps of:
s2.1, the user-layer node embedding representation sub-module: captures the different influence weights among users to perform weighted aggregation of each user's local neighborhood information in the social network, thereby learning user-level node embedding vectors; comprising the following steps:
using a learnable transformation matrix $W_1 \in \mathbb{R}^{d' \times d}$ to map the input vector into a higher-dimensional vector, namely:

$$h_i = W_1 \cdot x_i$$
according to the graph attention neural network, for any two user nodes $v_i$ and $v_j$, the relation strength $e_{ij}$ between the two user nodes is first calculated:

$$e_{ij} = \mathrm{LeakyReLU}\!\left(a^T\!\left[\, W^{(l)} h_i^{(l)} \,\|\, W^{(l)} h_j^{(l)} \,\right]\right)$$

where $h_i^{(l)}$ and $h_j^{(l)}$ are the embedding vectors of user nodes $v_i$ and $v_j$ at layer $l$, $W^{(l)}$ is the weight parameter of layer $l$, "$\|$" is the concatenation operator that splices two vectors laterally, and $\mathrm{LeakyReLU}(\cdot)$ is a neuron activation function; to calculate the information contribution ratio of different neighbors when aggregating the neighborhood information of user node $v_i$, a $\mathrm{softmax}(\cdot)$ function is used to normalize the relation strengths between the user node and all of its neighbor user nodes $v_k \in N(v_i)$:

$$a_{ij} = \frac{\exp(e_{ij})}{\sum_{v_k \in N(v_i)} \exp(e_{ik})}$$

$a_{ij}$ is called the attention coefficient between user nodes $v_i$ and $v_j$; the larger the value of $a_{ij}$, the closer the relationship between the two users; given the calculated attention coefficients between user node $v_i$ and all of its neighbor nodes (including itself), the new embedding vector of each user node $v_i$ can be defined as:

$$h_i^{(l+1)} = \delta\!\left(\sum_{v_j \in N(v_i)} a_{ij}\, W^{(l)} h_j^{(l)}\right)$$

where $\delta(\cdot)$ is a neuron activation function; by linearly aggregating each user's neighborhood information with these different influence weights, the user-level node embedding vector $h_i$ of each user in the social network is obtained, forming the user-level vector matrix $M=(h_1, h_2, \dots, h_N)^T$;
S2.2, the feature-layer node embedding representation sub-module: responsible for learning the influence weights among different features of users so as to capture finer-grained interactions among features, thereby learning feature-level node embedding vectors; comprising the following steps:
taking the user-level vector matrix $M=(h_1, h_2, \dots, h_N)^T$ obtained in S2.1 as input, and considering the multi-dimensional attention between the features of any two user nodes in the social network, i.e., calculating an attention coefficient for each corresponding dimension of the two user node vectors,
let $h_i$ and $h_j$ respectively denote the embedding vectors of two user nodes $v_i$ and $v_j$ in the social network; the relationship between the two user node embedding vectors can be defined as:

$$f(h_i, h_j) = W_5 \cdot \tanh(W_4 \cdot h_i + W_3 \cdot h_j + b_2) + b_1$$

where $W_3$, $W_4$, and $W_5$ are parameter matrices, $b_1$ and $b_2$ are bias terms, and $\tanh(\cdot)$ is a neuron activation function,

this feed-forward network $f(h_i, h_j)$ is used to calculate the feature-level dependency of any two user nodes; let $\beta_{ij}$ denote the feature-layer attention coefficient vector of user nodes $v_i$ and $v_j$, and $[\beta_{ij}]_k$ the $k$-th dimension of the attention vector; likewise, to facilitate comparing the attention coefficients of features of corresponding dimensions between different attention coefficient vectors, the attention coefficient vectors of all of a user's neighbors are normalized by corresponding feature dimension using a $\mathrm{softmax}(\cdot)$ function, so that:

$$[\beta_{ij}]_k = \frac{\exp\!\left([f(h_i, h_j)]_k\right)}{\sum_{v_m \in N(v_i)} \exp\!\left([f(h_i, h_m)]_k\right)}$$

after computing the attention coefficient $[\beta_{ij}]_k$ between each feature dimension of any two users, the coefficients can be assembled by corresponding feature dimension into the attention coefficient vector between the two users, $\beta_{ij} = ([\beta_{ij}]_1, [\beta_{ij}]_2, \dots, [\beta_{ij}]_{d''})$; the attention vector and the user node vector have the same dimensionality, each dimension $[\beta_{ij}]_k$ being the influence weight of the corresponding dimension of the user node vector; the larger $[\beta_{ij}]_k$, the stronger the association between the features of user nodes $v_i$ and $v_j$ in the $k$-th dimension,
finally, for any user node $v_i$ in the social network, the embedding vectors of its neighbor users are linearly aggregated with weights given by the attention coefficient vectors learned between the different user nodes; unlike the aggregation of the user-layer attention mechanism, the aggregation function of the feature-layer attention mechanism aggregates neighborhood information by element-wise multiplication:

$$h'_i = \delta\!\left(\sum_{v_j \in N(v_i)} \beta_{ij} \odot h_j\right)$$

where $\odot$ denotes the element-wise product of two vectors of the same shape (the result is a vector of that same shape) and $\delta(\cdot)$ is a neuron activation function; by linearly aggregating each user's neighborhood information with the influence weights of the different features, the feature-level node embedding representation $h'_i$ of each user in the social network is obtained, forming the feature-level vector matrix $M'=(h'_1, h'_2, \dots, h'_N)^T$;
S2.3, the embedding vector fusion sub-module: responsible for retaining and resetting user embedding vectors from the two different levels, the user level and the feature level, so as to fuse the multi-view node embedding vectors and improve the accuracy of the network alignment task; comprising the following steps:
the embedding vector fusion sub-module takes the user-level vector matrix $M=(h_1, h_2, \dots, h_N)^T$ and the feature-level vector matrix $M'=(h'_1, h'_2, \dots, h'_N)^T$ as inputs, and uses a gating mechanism to automatically learn the weight parameters of the same user's node embedding vectors at the different levels, so as to effectively retain and reset the information representations from the different levels,
for any user node $v_i$ in the social network, the module first calculates the weight relation vector between the user-level node embedding vector $h_i$ and the feature-level node embedding vector $h'_i$:

$$F = \mathrm{sigmoid}(W_6 \cdot h_i + W_7 \cdot h'_i + b_3)$$

where $W_6$ and $W_7$ are the parameter matrices of the gated neural network, $b_3$ is a bias term, and $\mathrm{sigmoid}(\cdot)$ is a neuron activation function; according to the learned weight relation vector $F$, the user node embedding vectors from the different levels can be selectively retained and reset, and the final user node embedding vector is:

$$z_i = F \odot h_i + (1 - F) \odot h'_i$$

where $F \odot h_i$ represents the selective retention of the user-level node embedding vector and $(1-F) \odot h'_i$ represents the selective reset of the feature-level node embedding vector, $1-F$ being a vector operation that subtracts each dimension of the vector $F$ from 1,
according to the fusion of the user level node embedded representation and the feature level node embedded representation of the user in the social network by using a gating mechanism, the final node embedded representation z of each user in the social network can be obtained i Component node embedding vector matrix z= (Z) 1 ,z 2 ,...,z N ) T
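As a hedged illustration of the gated fusion step above, the following NumPy sketch computes F = sigmoid(W_1 h_i + W_2 h̃_i + b) and z_i = F ⊙ h_i + (1 − F) ⊙ h̃_i. The names `W1`, `W2`, `b` and the dimension `d = 4` are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(h_user, h_feat, W1, W2, b):
    """Gated fusion of a user-level and a feature-level embedding:
    F = sigmoid(W1 @ h_user + W2 @ h_feat + b)    (weight relation vector)
    z = F * h_user + (1 - F) * h_feat             (element-wise retain/reset)
    """
    F = sigmoid(W1 @ h_user + W2 @ h_feat + b)
    return F * h_user + (1.0 - F) * h_feat

rng = np.random.default_rng(0)
d = 4
h_user = rng.normal(size=d)             # user-level embedding h_i
h_feat = rng.normal(size=d)             # feature-level embedding h~_i
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b = np.zeros(d)
z = fuse(h_user, h_feat, W1, W2, b)     # fused node embedding z_i
print(z.shape)  # (4,)
```

Because F is a sigmoid output, every dimension of z is a convex combination of the corresponding dimensions of the two input embeddings.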
For any given pair of user nodes v_i and v_j in a social network, with node embedding vectors z_i and z_j respectively, the probability of an edge existing between the two nodes can be expressed as:

p(v_i, v_j) = σ(z_i^T z_j)

where σ(x) = 1 / (1 + e^(−x)) is the sigmoid function;
To optimize the model parameters of the double-layer graph attention neural network, an objective function of the model must be defined; its goal is to maximize the probability of occurrence of the observable edges in the social network, namely:

O = Σ_{e_{i,j} ∈ E} log p(v_i, v_j)
To avoid trivial solutions, for each observable edge e_{i,j} = (v_i, v_j) a negative sampling technique is employed to maximize the objective function, namely:

log σ(z_i^T z_j) + Σ_{k=1}^{K} E_{v_k ∼ P_n(v)} [ log σ(−z_i^T z_k) ]

where the first term models the positive examples in the social network, and the second term models negative examples whose edges associated with each node are randomly generated by the negative sampling technique; the probability of each node being sampled satisfies P_n(v) ∝ d_v^(3/4),
K denotes the number of sampled negative-example edges, and d_v denotes the degree of user node v. According to this objective function, the parameters of the double-layer graph attention neural network model can be learned with a back-propagation optimization algorithm, yielding the node vector matrices of the source social network and the target social network, Z^s ∈ R^(|V^s| × d') and Z^t ∈ R^(|V^t| × d') respectively, where |V^s| denotes the number of user nodes in the source social network and |V^t| the number of user nodes in the target social network; z_i^s denotes the node embedding vector of user v_i^s in the source social network, and z_j^t the node embedding vector of user v_j^t in the target social network. Z^s and Z^t are also referred to as the embedding vector spaces corresponding to the source and target social networks;
S3, embedding vector space alignment module: constructs a classification model from the user node embedding vectors of the source and target social networks learned in S2 to predict anchor links, and adopts a bidirectional embedding-vector-space alignment strategy to satisfy the constraint that user accounts match one-to-one across different social networks;
S4, taking the intersection of the anchor links predicted in the two alignment directions to complete the network alignment.
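The bidirectional strategy of S3/S4 can be sketched as a simple set intersection: only candidate pairs predicted in both mapping directions are kept, which enforces the one-to-one matching constraint. The user identifiers below are purely illustrative.

```python
def intersect_candidates(src_to_tgt, tgt_to_src):
    """src_to_tgt: {(source_user, target_user), ...} predicted source->target.
    tgt_to_src: {(target_user, source_user), ...} predicted target->source.
    Returns the pairs confirmed in both directions."""
    backward = {(s, t) for (t, s) in tgt_to_src}
    return src_to_tgt & backward

forward = {("u1", "a"), ("u2", "b"), ("u3", "c")}
backward = {("a", "u1"), ("b", "u2"), ("d", "u4")}
anchors = intersect_candidates(forward, backward)
print(sorted(anchors))  # [('u1', 'a'), ('u2', 'b')]
```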
2. The network alignment method based on the double-layer graph attention neural network according to claim 1, wherein the step S2 comprises the following steps: the feature vector of user v_i is denoted x_i; according to the network type, various feature vectors of the user are extracted and stacked laterally to generate the user's initialized feature vector representation x_i ∈ R^d, where d denotes the dimension of the user's initialized feature vector and d', d'', d''' appearing hereinafter denote other dimensions; the initialized feature vectors of all users in the social network are assembled into a state matrix X in which each row is the feature vector of a particular user node: X = (x_1, x_2, …, x_N)^T.
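The lateral stacking of per-user feature vectors into the state matrix X can be sketched as follows; the three feature vectors and their values are purely illustrative, not the patent's actual features.

```python
import numpy as np

# Hypothetical feature vectors for one user (dimensions chosen arbitrarily).
name_vec = np.array([0.1, 0.2])            # e.g. user-name embedding
text_vec = np.array([0.3, 0.4, 0.5])       # e.g. Doc2Vec text features
space_vec = np.array([0.6])                # e.g. spatial-cluster feature

# Lateral stacking: concatenate into the user's initialized vector x_i (d = 6).
x_i = np.concatenate([name_vec, text_vec, space_vec])

# State matrix X = (x_1, x_2, ..., x_N)^T: one row per user node
# (the other two rows are synthetic stand-ins for other users).
X = np.stack([x_i, x_i * 0.5, x_i + 1.0])
print(X.shape)  # (3, 6)
```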
3. The network alignment method based on the double-layer graph attention neural network according to claim 2, wherein when the network type is a topology network, the feature vector of the user is randomly initialized by means of a random matrix, and the weight parameters of the random matrix are learned during the training phase of the double-layer graph attention neural network model.
4. The network alignment method based on the double-layer graph attention neural network according to claim 2, wherein if the network type is an attribute network, the user attributes are vectorized as follows: user information such as the user name is randomly initialized by word embedding to obtain a user-name feature vector; the user's language style is mined from the user's long-text information with a Doc2Vec model to learn the user's text feature vector; the user's trajectory information is vector-initialized through spatial clustering to obtain the user's spatial feature vector; and the user's ratings and check-in counts are taken directly as feature dimensions of the user and vector-initialized.
5. The network alignment method based on the double-layer graph attention neural network according to claim 1, wherein the step S3 comprises the following steps: based on step S2, the node embedding vector matrix Z^s of the source social network and the node embedding vector matrix Z^t of the target social network are obtained; each row of a node vector matrix is the node embedding vector of one user in the social network, and the whole node vector matrix is also called the embedding vector space corresponding to the social network;
A mapping function M is defined that can map a user node vector from one embedding vector space to another. Supposing the source social network is projected onto the target social network to find the target nodes matching the source nodes, the objective function can be defined as:

min_θ Σ_{(v_i^s, v_j^t) ∈ T} || M^(s→t)(z_i^s; θ) − z_j^t ||

where T is the set of user pairs with a known anchor-link relationship and M^(s→t)(·) denotes the mapping function from the source social network to the target social network; a multi-layer perceptron is adopted to construct the mapping function, and θ is the weight parameter of the multi-layer perceptron. The objective function aims to minimize the distance, after mapping into the target social network, between the source user node and the target user node of each user pair with an anchor-link relationship, thereby constructing a classification model that predicts whether any two users across different social networks have an anchor link; the target user nodes nearest to the projected source user node are selected to construct candidate anchor links.
CN202010654776.0A 2020-07-09 2020-07-09 Network alignment method based on double-layer graph attention neural network Active CN111931903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010654776.0A CN111931903B (en) 2020-07-09 2020-07-09 Network alignment method based on double-layer graph attention neural network

Publications (2)

Publication Number Publication Date
CN111931903A CN111931903A (en) 2020-11-13
CN111931903B true CN111931903B (en) 2023-07-07

Family

ID=73312715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010654776.0A Active CN111931903B (en) 2020-07-09 2020-07-09 Network alignment method based on double-layer graph attention neural network

Country Status (1)

Country Link
CN (1) CN111931903B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396492A (en) * 2020-11-19 2021-02-23 天津大学 Conversation recommendation method based on graph attention network and bidirectional long-short term memory network
CN112395466B (en) * 2020-11-27 2023-05-12 上海交通大学 Fraud node identification method based on graph embedded representation and cyclic neural network
CN112446542B (en) * 2020-11-30 2023-04-07 山西大学 Social network link prediction method based on attention neural network
CN114625978A (en) * 2020-12-10 2022-06-14 国家计算机网络与信息安全管理中心 Heterogeneous network user anchor link prediction method based on type perception and electronic device
CN112381179B (en) * 2020-12-11 2024-02-23 杭州电子科技大学 Heterogeneous graph classification method based on double-layer attention mechanism
CN112507246B (en) * 2020-12-13 2022-09-13 天津大学 Social recommendation method fusing global and local social interest influence
CN112507247B (en) * 2020-12-15 2022-09-23 重庆邮电大学 Cross-social network user alignment method fusing user state information
CN112667920A (en) * 2020-12-29 2021-04-16 复旦大学 Text perception-based social influence prediction method, device and equipment
CN112860810B (en) * 2021-02-05 2023-07-14 中国互联网络信息中心 Domain name multiple graph embedded representation method, device, electronic equipment and medium
CN112818257B (en) * 2021-02-19 2022-09-02 北京邮电大学 Account detection method, device and equipment based on graph neural network
CN113127752B (en) * 2021-03-18 2023-04-07 中国人民解放军战略支援部队信息工程大学 Social network account aligning method and system based on user naming habit mapping learning
CN113095948B (en) * 2021-03-24 2023-06-06 西安交通大学 Multi-source heterogeneous network user alignment method based on graph neural network
CN112800770B (en) * 2021-04-15 2021-07-09 南京樯图数据研究院有限公司 Entity alignment method based on heteromorphic graph attention network
CN113065045B (en) * 2021-04-20 2022-07-22 支付宝(杭州)信息技术有限公司 Method and device for carrying out crowd division and training multitask model on user
CN113238885B (en) * 2021-05-08 2023-07-07 长安大学 Method and equipment for predicting implicit deviation instruction based on graph attention network
CN113409157B (en) * 2021-05-19 2022-06-28 桂林电子科技大学 Cross-social network user alignment method and device
CN113407784B (en) * 2021-05-28 2022-08-12 桂林电子科技大学 Social network-based community dividing method, system and storage medium
CN113240098B (en) * 2021-06-16 2022-05-17 湖北工业大学 Fault prediction method and device based on hybrid gated neural network and storage medium
CN113628059B (en) * 2021-07-14 2023-09-15 武汉大学 Associated user identification method and device based on multi-layer diagram attention network
CN113807012A (en) * 2021-09-14 2021-12-17 杭州莱宸科技有限公司 Water supply network division method based on connection strengthening
CN113901831B (en) * 2021-09-15 2024-04-26 昆明理工大学 Parallel sentence pair extraction method based on pre-training language model and bidirectional interaction attention
CN113779406A (en) * 2021-09-16 2021-12-10 浙江网商银行股份有限公司 Data processing method and device
CN113792937B (en) * 2021-09-29 2022-09-13 中国人民解放军国防科技大学 Social network influence prediction method and device based on graph neural network
CN114662143B (en) * 2022-02-28 2024-05-03 北京交通大学 Sensitive link privacy protection method based on graph embedding
CN115063251B (en) * 2022-05-30 2024-09-03 华侨大学 Social propagation dynamic network representation method based on relationship strength and feedback mechanism
CN114969540A (en) * 2022-06-10 2022-08-30 重庆大学 Method for predicting future interaction behavior of user in social network
CN116049695B (en) * 2022-12-20 2023-07-04 中国科学院空天信息创新研究院 Group perception and standing analysis method, system and electronic equipment crossing social network
CN115861822B (en) * 2023-02-07 2023-05-12 海豚乐智科技(成都)有限责任公司 Target local point and global structured matching method and device
CN116776193B (en) * 2023-05-17 2024-08-06 广州大学 Method and device for associating virtual identities across social networks based on attention mechanism
CN116566743B (en) * 2023-07-05 2023-09-08 北京理工大学 Account alignment method, equipment and storage medium
CN117670572B (en) * 2024-02-02 2024-05-03 南京财经大学 Social behavior prediction method, system and product based on graph comparison learning

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109636658A (en) * 2019-01-17 2019-04-16 电子科技大学 A kind of social networks alignment schemes based on picture scroll product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10268646B2 (en) * 2017-06-06 2019-04-23 Facebook, Inc. Tensor-based deep relevance model for search on online social networks


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Unified Link Prediction Framework for Predicting Arbitrary Relations in Heterogeneous Academic Networks; Meilian Lu, et al.; IEEE Access; 124967-124987 *


Similar Documents

Publication Publication Date Title
CN111931903B (en) Network alignment method based on double-layer graph attention neural network
Tian et al. Spatial‐temporal attention wavenet: A deep learning framework for traffic prediction considering spatial‐temporal dependencies
Ye et al. Coupled layer-wise graph convolution for transportation demand prediction
You et al. Image-based appraisal of real estate properties
CN112925989B (en) Group discovery method and system of attribute network
CN111695415A (en) Construction method and identification method of image identification model and related equipment
CN113255895B (en) Structure diagram alignment method and multi-diagram joint data mining method based on diagram neural network representation learning
Pajares et al. A Hopfield Neural Network for combining classifiers applied to textured images
CN112200266B (en) Network training method and device based on graph structure data and node classification method
CN113761250A (en) Model training method, merchant classification method and device
CN113240086B (en) Complex network link prediction method and system
Yu et al. Forecasting a short‐term wind speed using a deep belief network combined with a local predictor
CN116129286A (en) Method for classifying graphic neural network remote sensing images based on knowledge graph
CN116010813A (en) Community detection method based on influence degree of fusion label nodes of graph neural network
CN111309923A (en) Object vector determination method, model training method, device, equipment and storage medium
CN114254738A (en) Double-layer evolvable dynamic graph convolution neural network model construction method and application
Zhang et al. C 3-GAN: Complex-Condition-Controlled Urban Traffic Estimation through Generative Adversarial Networks
CN114417063A (en) Multi-view-based important node identification method for graph neural network
Wu et al. Learning spatial–temporal pairwise and high-order relationships for short-term passenger flow prediction in urban rail transit
Diao et al. DMSTG: Dynamic Multiview Spatio-Temporal Networks for Traffic Forecasting
CN112465253B (en) Method and device for predicting links in urban road network
Chen et al. Visual Odometry for Self‐Driving with Multihypothesis and Network Prediction
Xiao-Xu et al. An intelligent inspection robot of power distribution network based on image automatic recognition system
CN112560946A (en) Edge server hot spot prediction method for online and offline associated reasoning
Hela et al. CarParkingVQA: Visual Question Answering application on Car parking occupancy detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant