CN113095948A - Multi-source heterogeneous network user alignment method based on graph neural network - Google Patents

Multi-source heterogeneous network user alignment method based on graph neural network Download PDF

Info

Publication number
CN113095948A
CN113095948A CN202110315041.XA CN202110315041A CN113095948A CN 113095948 A CN113095948 A CN 113095948A CN 202110315041 A CN202110315041 A CN 202110315041A CN 113095948 A CN113095948 A CN 113095948A
Authority
CN
China
Prior art keywords
user
node
neural network
network
social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110315041.XA
Other languages
Chinese (zh)
Other versions
CN113095948B (en
Inventor
刘晓明
尹伟
沈超
周亚东
管晓宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202110315041.XA priority Critical patent/CN113095948B/en
Publication of CN113095948A publication Critical patent/CN113095948A/en
Application granted granted Critical
Publication of CN113095948B publication Critical patent/CN113095948B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-source heterogeneous network user alignment method based on a graph neural network, which comprises the steps of extracting attribute features of users through a text processing algorithm, and reducing the dimension of user attribute feature vectors by using a dimension reduction algorithm; obtaining a network topological structure by using a random walk algorithm, taking the attribute characteristics after dimensionality reduction and the network structure as the input of a graph neural network, and learning to obtain identity characteristics containing user attribute and structure information; calculating the similarity of the user names and the social roles among the cross-networks, and finding out candidate user pairs; and calculating and aligning the identities of the users of the multi-source heterogeneous network by using a neural network model. The method can be used for aligning the user identity of the social network, has important application in the fields of social network analysis, human and object image completion and the like, has low calculation complexity of the algorithm and high expandability, can align the user identity in the complex network, and has strong applicability to real data.

Description

Multi-source heterogeneous network user alignment method based on graph neural network
Technical Field
The invention belongs to the field of social media data mining, and particularly relates to a multi-source heterogeneous network user alignment method based on a graph neural network.
Background
In recent years, users have registered to use many online social networks, such as Twitter, Instagram, linkedln, etc., and due to the diversity of functions, different online social network platforms attract users for various purposes, such as information lookup/sharing and maintaining social relationships. For example, a user may use Twitter to post opinions about political events while sharing their leisure activities using Instagram. To better utilize the services provided by each social network, users tend to join multiple online social networks. It has become increasingly popular for users to have accounts (also referred to as user identities) on multiple social networks. According to a report of a social media study, 93% of Instagram users are participating in Facebook at the same time, while 53% of Twitter users also use Instagram.
The method has important significance in aligning the user identities of the multi-source heterogeneous network. As the act of users registering accounts on multiple social media sites has become increasingly popular, new opportunities and challenges have been created for various data mining and learning tasks. First, a user who has accounts on multiple social media websites has the potential to fully understand the interests of the user and provide better suggestions or services. Since users use different online social networks for different purposes, analyzing user identities on a single social media may not be fully aware of their personalities and interests. However, if we can link a person's user identity to multiple data sets, collecting and analyzing their data on these social media websites together, we may have a more comprehensive understanding of the user and provide better services. Second, users with accounts on multiple social media websites can enable us to integrate patterns between online social websites and solve some problems that cannot be solved by data from only one website. Data sparsity issues, for example, in many predictive tasks; for example, a newly established social media service may not have enough historical data to recommend to the user. If we can identify these users on other sophisticated social media websites, knowledge can be transferred from mature social media to new social media, mitigating issues of data sparsity, etc. Finally, users who have accounts on multiple social media websites may also help analyze user migration patterns and guide Web development. Users migrating from one social network to another typically reflect the user experience of Web development. User identity alignment across different social media sites provides a great opportunity to study usage migration behavior.
Although some platforms provide a function of unifying user IDs, many people do not fill in correct information for security. How to align the personal identity of a user in various social networks is a research focus.
The graph neural network is a method for processing graph domain information based on deep learning, and can capture relevant information such as the relationship between vertexes in a graph, a network topological structure and the like. Because of their better performance and interpretability, graphical neural networks have recently become a widely used method of graph analysis.
Disclosure of Invention
Aiming at solving the defects in the prior art, aiming at the problems of user anonymity, content heterogeneity, network diversity and the like in a social network, the invention aims to provide a multisource heterogeneous network user alignment method based on a graph neural network.
The invention is realized by the following technical scheme.
A multi-source heterogeneous network user alignment method based on a graph neural network comprises the following steps:
step 1, preprocessing an online social network, defining friends or concern relations between users of the social network, and constructing an undirected and unauthorized social network topological graph;
step 2, extracting attribute characteristics of the user through a text processing algorithm, and reducing dimensions through a principal component analysis method;
step 3, carrying out random walk sampling on all nodes of the social network topological graph to obtain the correlation of the nodes among the social networks;
step 4, taking the node attribute features after dimensionality reduction as input, training a graph neural network according to social network structure information obtained by random walk, and extracting identity features containing user attributes and structure information for each node in the topological graph through the graph neural network;
step 5, calculating the similarity of the user name and the social role among the cross-social networks, and finding out a candidate user pair according to an empirical value;
and 6, splicing the obtained identity characteristics of the user pairs crossing the social networks to serve as input of a neural network, training a multilayer neural network model according to the users with known aligned identities, and aligning the identities of the users in the candidate user pairs.
With respect to the above technical solutions, the present invention has a further preferable solution:
preferably, representing users of the social network as nodes, representing friends or concern relations among the users as edges, and constructing an undirected and unauthorized topological graph; the undirected topology graph is represented as G ═ V, E, where V is the set of nodes in the social network and E is the set of edges in the topology graph G.
The specific method of the step 2 comprises the following steps:
step 2.1, preprocessing the user attribute information, removing fixed labels or punctuation marks, and separating each word through a blank space; removing stop words in the stop phrases provided by the NLTK to finally obtain a word bag document;
step 2.2, calculating to obtain an attribute feature vector of each user by using a TF-IDF algorithm;
and 2.3, reducing the dimension of the attribute characteristics of the user by using a principal component analysis method.
The specific method of the step 3 comprises the following steps:
step 3.1, performing multiple random walks with fixed length on each node in the social network topological graph G, and recording a traversal node sequence;
and 3.2, sampling the first-order neighbors and the second-order neighbors of each node in the social network topological graph G in a fixed quantity, if the number of the actual neighbors of a certain node is less than the number of required samples, adopting a sampling method with a return, and otherwise, adopting a sampling mode without the return.
In the step 3.1, a random walk algorithm is executed on each node in the social network topological graph G, one node is selected as a next hop from neighbors of a current node at a medium probability each time, the step length is t, the random walk times are γ, and finally the mutual relation between the nodes is obtained. The preset length t of the random walk step is 5, and the preset length γ of the random walk times for each node is 50.
In the step 3.2, according to the feature of aggregating neighbors in two hops, the sampling number of the first-order neighbors is k1, and the sampling number of the second-order neighbors is k 2. The preset length k1 of the first order neighbor sample number is 25, and the preset length k2 of the second order neighbor sample number is 15.
The specific method of the step 4 comprises the following steps:
step 4.1, inputting the attribute characteristics after dimension reduction into a two-layer graph neural network graph, carrying out k-th aggregation on the target node by using an average aggregation mode, averaging each dimension of the sampling neighbor feature vector of the target node, splicing the obtained result with the k-1 layer vector of the target vertex, and carrying out nonlinear transformation, wherein the activation function is RELU;
step 4.2, constructing a loss function by using an unsupervised training mode;
and 4.3, after training is finished, obtaining the user identity characteristic of each node, and for the newly added node in the social network, using the attribute information and the neighbor information of the node as input, and obtaining the identity characteristic through fast calculation of a graph neural network.
The specific method of the step 5 comprises the following steps:
step 5.1, calculating the similarity of the user names by using the editing distance, namely the minimum editing times for changing one character string into another character string;
step 5.2, calculating the score of each node in the network by using a PageRank algorithm, sorting according to the score, and expressing the social status of each user by percentage;
and 5.3, selecting the nodes with the user name editing distance within 5 and the social ranking within 25% as candidate user pairs.
The specific method of the step 6 comprises the following steps:
step 6.1, constructing a multilayer neural network model, splicing the identity characteristics of user pairs across the social network, using a two-dimensional vector as input of the neural network model, using [1,0] to represent that two users across the social network are the same person, and using softmax as an activation function to construct a loss function L, wherein [0,1] represents that the two users are not the same person;
step 6.2, taking the user pairs with the aligned identities as positive samples with the aligned identities, and constructing negative samples with the unaligned identities by a random extraction method, wherein the proportion of the positive samples to the negative samples is 1: 1, fitting a model after multiple rounds of training by using a random gradient descent method;
and 6.3, inputting the identity characteristics of the candidate user pairs into the trained neural network, judging whether the candidate user pairs are the same user, and aligning the two social networks.
Due to the adoption of the technical scheme, the invention has the following beneficial effects:
(1) according to the method and the device, the user identity characteristics are extracted through the user name, the attribute information and the social network structure information of the user, so that the user identity between the cross-networks is aligned, the required data are all public data and are easy to obtain, and the method and the device are suitable for various online social networks.
(2) The invention acquires the structural information of the network by random walk, trains the neural network of the graph by an unsupervised method, simultaneously learns the node attribute information and the structural information end to end, acquires the identity characteristics of the user by aggregating the identity characteristics of the multi-layer friends of the user, and fully considers the actual characteristics of the social network.
(3) The graph neural network used by the invention is a direct push type learning mode, a new user is added in the social network, and the graph neural network can be directly obtained by the identity feature aggregation of friends, so that the calculation cost is saved, and the network does not need to be trained repeatedly.
(4) The alignment model is a multilayer neural network, can learn to obtain a special relationship between two networks, and is suitable for any social network.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention:
FIG. 1 is a schematic flow chart of a multi-source heterogeneous network user alignment method based on a graph neural network according to the present invention;
FIG. 2 is a schematic diagram of a node neighbor sampling manner in the neural network;
fig. 3 is an aggregation manner of nodes in the neural network of the figure.
Detailed Description
The present invention will now be described in detail with reference to the drawings and specific embodiments, wherein the exemplary embodiments and descriptions of the present invention are provided to explain the present invention without limiting the invention thereto.
As shown in fig. 1, the multi-source heterogeneous network user alignment method based on the graph neural network provided by the embodiment of the present invention includes the following steps:
step S1: multi-source heterogeneous social network data pre-processing
The online social network data set in the embodiment is derived from https:// www.aminer.cn/cosnet and comprises two parts, namely a social network data set and an academic data set, wherein user alignment data in the social network is collected by Perito et al through a Google resume service system. Selecting two social networks of Twitter and MySpace in a social network data set for experiment, wherein the social network Twitter data set comprises 40,171,624 users and 1,468,365,182 edges and contains 28,199 parts of detailed user data; the social network MySpace dataset includes 854,498 users and 6,489,736 edges, with 9,993 shares of the detailed user profile.
Preprocessing an online social network, defining friends or concern relations between users of the social network, and constructing an undirected and unauthorized social network topological graph. Representing users of the social network as nodes, representing friends or attention relations among the users as edges, and constructing a topological graph G of the undirected weightless graph as (V, E), wherein V is a set of nodes in the social network, and E is a set of edges in the topological graph G.
Step S2: and extracting attribute features of the user through a text processing algorithm, and reducing the dimension through a principal component analysis method for subsequent input. The method specifically comprises the following steps:
step S201: the user attribute information is preprocessed, fixed labels or punctuation marks are removed, each word is separated by a space, for example, the user name of the user is Black-Hawk, the processing result is 1.Black 2.Hawk, the address filled by the user is "Georgia Tech, Atlanta, Georgia", and the processing result is 1.Georgia 2.Tech 3.Atlanta 4. Georgia. In addition, the stop words in the stop phrases provided by the NLTK are removed, and finally the bag-of-words document is obtained.
Step S202: and calculating to obtain the attribute feature vector of each user by using a TF-IDF algorithm.
Step S203: and reducing the dimension of the attribute features of the users by using a principal component analysis method, wherein the attribute features of the users occupy 64-dimensional features.
Step S3: all nodes of the social network topological graph G are subjected to random walk to obtain the relevant relations of the nodes among the social networks, and the relevant neighbors of each node are sampled, and the method specifically comprises the following steps:
step S301: and executing a random walk algorithm on each node in the social network topological graph G, selecting one node from the neighbors of the current node at a medium probability each time as a next hop, recording a traversal node sequence, and finally obtaining the mutual relation between the nodes, wherein the step length is 5, and the random walk times are 50.
Step S302: the used graph neural network mainly considers the characteristic of aggregating neighbors in two hops, a fixed number of sampling is carried out on a first-order neighbor (a directly connected node) and a second-order neighbor (a neighbor) of each node in a social network topological graph G, the sampling number of the first-order neighbor is 25, the sampling number of the second-order neighbor is 15, if the actual neighbor number of a certain node is less than the required sample number, a sampling method with a release is adopted, otherwise, a sampling mode without the release is adopted, for example, as shown in FIG. 2, the sampling number of the first-order neighbor is 3, the sampling number of the second-order neighbor is 4, the sampling nodes of the first-order neighbor of the node 1 are 2,3 and 4, and the sampling nodes of the second-order neighbor are.
Step S4: taking the node attribute features after dimensionality reduction as input, training a graph neural network according to social network structure information obtained by random walk, and extracting identity features containing user attributes and structure information for each node in a topological graph through the graph neural network, wherein the method specifically comprises the following steps:
step S401: inputting the attribute features after dimensionality reduction into a two-layer graph neural network GraphSAGE, and using an average aggregation mode to perform k-th aggregation on a target node, wherein the calculation step of performing k-th aggregation on the target node comprises the steps of averaging each dimensionality of a sampling neighbor feature vector of the target node, splicing the obtained result with the k-1-th layer vector of a target vertex, performing nonlinear transformation, obtaining an activation function RELU, and performing aggregation mode of node identity features as shown in FIG. 3, wherein the features of the nodes in the graph are all k-1-th aggregation results, and the k-th aggregation step of the node 1 is to firstly average the features of neighbors 2,3 and 4 of the node 1, namely [ (2+3+1)/3, (4+ 3)/3, (2+6+3)/3]T=[2,3,3.67]TAnd then the characteristics of the node 1 are spliced with the characteristics to obtain [4,2,5,2,3,3.67 ]]TThen multiplied by the matrix W to be trainedkAnd after the function is activated, the identity characteristic of the node 1 can be obtained.
Step S402: using an unsupervised training approach, a loss function is constructed:
Figure BDA0002990838940000091
wherein Z isuAn embedded vector is generated for the node u through a model, the node v is a neighbor that the node u reaches through random walk, and ZvFor the embedded vector of node v, σ stands for sigmoid activation function, PnThe probability distribution of negative sampling is shown, Q is the number of negative samples, an Adam optimizer is used for training, and fitting is achieved through a multi-round training graph neural network.
Step S403: and after training is finished, obtaining the user identity characteristic of each node, regarding the newly added node in the social network, taking the attribute information of the node and the neighbor attribute information as input, and obtaining the identity characteristic through calculation of a graph neural network.
Step S5: calculating the similarity of user names and social roles among the cross-social networks, and finding out candidate user pairs according to experience values, wherein the method specifically comprises the following steps:
step S501: the user name similarity is calculated using the edit distance, i.e. the minimum number of edits required to change one string to another. The user name edit distance calculation process is shown in table 1.
TABLE 1
Figure BDA0002990838940000092
Figure BDA0002990838940000101
Step S502: and calculating the score of each node in the network by using a PageRank algorithm, sorting according to the scores, and expressing the social status of each user by percentage.
Step S503: and selecting the nodes with the user name editing distance within 5 and the social ranking within 25% as candidate user pairs.
Step S6: splicing the obtained identity characteristics of the user pairs crossing the social network as input of a neural network, training a multilayer neural network model according to the users with known identity alignment, and aligning the identity of the user in the candidate user pairs, specifically comprising the following steps:
step S601: constructing a multi-layer neural network model, splicing the identity characteristics of user pairs between networks to serve as the input of the neural network model, using a two-dimensional vector as the output, [1,0]]Representing two users across the network as the same person, [0,1]Then it means that the two users are not the same person, the activation function uses softmax, constructs the loss function L:
Figure BDA0002990838940000102
where y represents the true result of the input user pair,
Figure BDA0002990838940000103
representing the predicted outcome.
Step S602: the user alignment problem is a two-classification problem, and comprises two types of positive samples (identity aligned samples) and negative samples (identity unaligned samples), wherein the identity aligned user pair is used as the positive sample, the negative sample is constructed by a random extraction method, and the proportion of the positive sample to the negative sample is 1: 1, using a random gradient descent method, wherein the learning rate lambda is 0.001, and the model is fitted after multiple rounds of training.
Step S603: and inputting the identity characteristics of the candidate user pairs into the trained neural network, judging whether the candidate user pairs are the same user, and aligning the two networks.
In order to verify the effect of the user alignment method based on the graph neural network proposed in the present embodiment, the test set in the present embodiment is tested, and the accuracy and the recall rate are used as the evaluation indexes of the model.
The experimental results of this example are as follows:
the accuracy values of the test set stabilized at 0.7796 and the recall stabilized at 0.7248.
The experimental result shows that the user identity across networks can be aligned effectively and accurately by the user alignment method based on the graph neural network. The method can be used for aligning the user identities among a plurality of social networks, has important application in a plurality of fields of social network data analysis, figure portrait missing information completion, social platform commodity recommendation and the like, has low calculation complexity and high expandability of an algorithm, can align the user identities in the complex networks, and is suitable for the plurality of social networks.
Although illustrative embodiments of the present invention have been described above to facilitate the understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, and various changes may be made apparent to those skilled in the art as long as they are within the spirit and scope of the present invention as defined and defined by the appended claims, and all matters of the invention which utilize the inventive concepts are protected.

Claims (10)

1. A multi-source heterogeneous network user alignment method based on a graph neural network is characterized by comprising the following steps:
step 1, preprocessing an online social network, defining friends or concern relations between users of the social network, and constructing an undirected and unauthorized social network topological graph;
step 2, extracting attribute characteristics of the user through a text processing algorithm, and reducing dimensions through a principal component analysis method;
step 3, carrying out random walk sampling on all nodes of the social network topological graph to obtain the correlation of the nodes among the social networks;
step 4, taking the node attribute features after dimensionality reduction as input, training a graph neural network according to social network structure information obtained by random walk, and extracting identity features containing user attributes and structure information for each node in the topological graph through the graph neural network;
step 5, calculating the similarity of the user name and the social role among the cross-social networks, and finding out a candidate user pair according to an empirical value;
and 6, splicing the obtained identity characteristics of the user pairs crossing the social networks to serve as input of a neural network, training a multilayer neural network model according to the users with known aligned identities, and aligning the identities of the users in the candidate user pairs.
2. The multi-source heterogeneous network user alignment method based on the graph neural network is characterized in that the users of the social network are represented as nodes, friends or concern relations among the users are represented as edges, and a undirected and unweighted topological graph is constructed; the undirected topology graph is represented as G ═ V, E, where V is the set of nodes in the social network and E is the set of edges in the topology graph G.
3. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 2 is as follows:
step 2.1, preprocessing the user attribute information, removing fixed labels or punctuation marks, and separating each word through a blank space; removing stop words in the stop phrases provided by the NLTK to finally obtain a word bag document;
step 2.2, calculating to obtain an attribute feature vector of each user by using a TF-IDF algorithm;
and 2.3, reducing the dimension of the attribute characteristics of the user by using a principal component analysis method.
4. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 3 is as follows:
step 3.1, performing multiple random walks with fixed length on each node in the social network topological graph G, and recording a traversal node sequence;
and 3.2, sampling the first-order neighbors and the second-order neighbors of each node in the social network topological graph G in a fixed quantity, if the number of the actual neighbors of a certain node is less than the number of required samples, adopting a sampling method with a return, and otherwise, adopting a sampling mode without the return.
5. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 3, characterized in that in step 3.1, a random walk algorithm is executed on each node in the social network topological graph G, one node is selected as a next hop from neighbors of a current node at a medium probability each time, the step length is t, the random walk times are γ, and finally the mutual relationship between the nodes is obtained;
in the step 3.2, according to the feature of aggregating neighbors in two hops, the sampling number of the first-order neighbors is k1, and the sampling number of the second-order neighbors is k 2.
6. The multi-source heterogeneous network user alignment method based on the graph neural network as claimed in claim 5, wherein the preset length t of the random walk step is 5, and the preset length γ of the random walk times for each node is 50.
7. The multi-source heterogeneous network user alignment method based on the graph neural network as claimed in claim 5, wherein the preset length k 1-25 for the first-order neighbor sample number and the preset length k 2-15 for the second-order neighbor sample number.
8. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 4 is as follows:
step 4.1, inputting the attribute characteristics after dimension reduction into a two-layer graph neural network graph, carrying out k-th aggregation on the target node by using an average aggregation mode, averaging each dimension of the sampling neighbor feature vector of the target node, splicing the obtained result with the k-1 layer vector of the target vertex, and carrying out nonlinear transformation, wherein the activation function is RELU;
step 4.2, constructing a loss function by using an unsupervised training mode:
Figure FDA0002990838930000031
wherein Z isuAn embedded vector is generated for the node u through a model, the node v is a neighbor that the node u reaches through random walk, and ZvFor the embedded vector of node v, σ stands for sigmoid activation function, PnIs the probability distribution of negative samples, Q is the number of negative samplesMesh;
and 4.3, after training is finished, obtaining the user identity characteristic of each node, and for the newly added node in the social network, using the attribute information and the neighbor information of the node as input, and obtaining the identity characteristic through fast calculation of a graph neural network.
9. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 5 is as follows:
step 5.1, calculating the similarity of the user names by using the editing distance, namely the minimum editing times for changing one character string into another character string;
step 5.2, calculating the score of each node in the network by using a PageRank algorithm, sorting according to the score, and expressing the social status of each user by percentage;
and 5.3, selecting the nodes with the user name editing distance within 5 and the social ranking within 25% as candidate user pairs.
10. The multi-source heterogeneous network user alignment method based on the graph neural network according to claim 1, wherein the specific method in the step 6 is as follows:
step 6.1, constructing a multilayer neural network model, splicing the identity characteristics of user pairs across social networks, using a two-dimensional vector as input of the neural network model, using [1,0] to represent that two users across the social networks are the same person, using [0,1] to represent that the two users are not the same person, using softmax as an activation function, and constructing a loss function L:
Figure FDA0002990838930000041
where y represents the true result of the input user pair,
Figure FDA0002990838930000042
representing the predicted result;
Step 6.2, taking the user pairs with the aligned identities as positive samples with the aligned identities, and constructing negative samples with the unaligned identities by a random extraction method, wherein the proportion of the positive samples to the negative samples is 1: 1, fitting a model after multiple rounds of training by using a random gradient descent method;
and 6.3, inputting the identity characteristics of the candidate user pairs into the trained neural network, judging whether the candidate user pairs are the same user, and aligning the two social networks.
CN202110315041.XA 2021-03-24 2021-03-24 Multi-source heterogeneous network user alignment method based on graph neural network Active CN113095948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110315041.XA CN113095948B (en) 2021-03-24 2021-03-24 Multi-source heterogeneous network user alignment method based on graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110315041.XA CN113095948B (en) 2021-03-24 2021-03-24 Multi-source heterogeneous network user alignment method based on graph neural network

Publications (2)

Publication Number Publication Date
CN113095948A true CN113095948A (en) 2021-07-09
CN113095948B CN113095948B (en) 2023-06-06

Family

ID=76669894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110315041.XA Active CN113095948B (en) 2021-03-24 2021-03-24 Multi-source heterogeneous network user alignment method based on graph neural network

Country Status (1)

Country Link
CN (1) CN113095948B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762372A (en) * 2021-08-30 2021-12-07 中国电子科技集团公司第十五研究所 Method and device for identifying organization members in instant messaging information
CN113779520A (en) * 2021-09-07 2021-12-10 中国船舶重工集团公司第七0九研究所 Cross-space target virtual identity correlation method based on multilayer attribute analysis
CN114663245A (en) * 2022-03-16 2022-06-24 南京信息工程大学 Cross-social network identity matching method
CN114817757A (en) * 2022-04-02 2022-07-29 广州大学 Cross-social network virtual identity association method based on graph convolution network
CN115269845A (en) * 2022-08-01 2022-11-01 安徽大学 Network alignment method and system based on social network user personality

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10305845B1 (en) * 2013-04-05 2019-05-28 Hrl Laboratories, Llc Accurate user alignment across online social media platforms
CN110852755A (en) * 2019-11-06 2020-02-28 支付宝(杭州)信息技术有限公司 User identity identification method and device for transaction scene
WO2020114022A1 (en) * 2018-12-04 2020-06-11 平安科技(深圳)有限公司 Knowledge base alignment method and apparatus, computer device and storage medium
CN111931903A (en) * 2020-07-09 2020-11-13 北京邮电大学 Network alignment method based on double-layer graph attention neural network
CN112084373A (en) * 2020-08-05 2020-12-15 国家计算机网络与信息安全管理中心 Multi-source heterogeneous network user alignment method based on graph embedding
CN112507247A (en) * 2020-12-15 2021-03-16 重庆邮电大学 Cross-social network user alignment method fusing user state information

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10305845B1 (en) * 2013-04-05 2019-05-28 Hrl Laboratories, Llc Accurate user alignment across online social media platforms
WO2020114022A1 (en) * 2018-12-04 2020-06-11 平安科技(深圳)有限公司 Knowledge base alignment method and apparatus, computer device and storage medium
CN110852755A (en) * 2019-11-06 2020-02-28 支付宝(杭州)信息技术有限公司 User identity identification method and device for transaction scene
CN111931903A (en) * 2020-07-09 2020-11-13 北京邮电大学 Network alignment method based on double-layer graph attention neural network
CN112084373A (en) * 2020-08-05 2020-12-15 国家计算机网络与信息安全管理中心 Multi-source heterogeneous network user alignment method based on graph embedding
CN112507247A (en) * 2020-12-15 2021-03-16 重庆邮电大学 Cross-social network user alignment method fusing user state information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JING ZHANG ET AL: "MEgo2Vec: Embedding Matched Ego Networks for User Alignment Across Social Networks", 《CIKM\'18》 *
YIXIN CAO ET AL: "Multi-Channel Graph Neural Network for Entity Alignment", 《ARXIV》 *
滕磊: "基于异构网络联合嵌入的用户对齐技术研究", 《中国优秀硕士学位论文全文数据库》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762372A (en) * 2021-08-30 2021-12-07 中国电子科技集团公司第十五研究所 Method and device for identifying organization members in instant messaging information
CN113779520A (en) * 2021-09-07 2021-12-10 中国船舶重工集团公司第七0九研究所 Cross-space target virtual identity correlation method based on multilayer attribute analysis
CN114663245A (en) * 2022-03-16 2022-06-24 南京信息工程大学 Cross-social network identity matching method
CN114817757A (en) * 2022-04-02 2022-07-29 广州大学 Cross-social network virtual identity association method based on graph convolution network
CN114817757B (en) * 2022-04-02 2023-07-21 广州大学 Cross-social network virtual identity association method based on graph rolling network
CN115269845A (en) * 2022-08-01 2022-11-01 安徽大学 Network alignment method and system based on social network user personality
CN115269845B (en) * 2022-08-01 2023-06-23 安徽大学 Network alignment method and system based on social network user personality

Also Published As

Publication number Publication date
CN113095948B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
US11687728B2 (en) Text sentiment analysis method based on multi-level graph pooling
CN113095948B (en) Multi-source heterogeneous network user alignment method based on graph neural network
CN111061856B (en) Knowledge perception-based news recommendation method
Fire et al. Link prediction in social networks using computationally efficient topological features
CN111368074A (en) Link prediction method based on network structure and text information
CN112084373B (en) Graph embedding-based multi-source heterogeneous network user alignment method
CN112036445B (en) User identity recognition method across social networks based on neural tensor network
CN108647800B (en) Online social network user missing attribute prediction method based on node embedding
CN114265986B (en) Information pushing method and system fusing knowledge graph structure and path semantics
CN112199608A (en) Social media rumor detection method based on network information propagation graph modeling
CN111581966A (en) Context feature fusion aspect level emotion classification method and device
CN112966091A (en) Knowledge graph recommendation system fusing entity information and heat
CN104850647A (en) Microblog group discovering method and microblog group discovering device
CN111666496A (en) Group recommendation method based on comment text
Wang et al. Multi-modal knowledge graphs representation learning via multi-headed self-attention
CN115270007B (en) POI recommendation method and system based on mixed graph neural network
CN115310005A (en) Neural network recommendation method and system based on meta-path fusion and heterogeneous network
Liu et al. Identifying experts in community question answering website based on graph convolutional neural network
Ashraf et al. Author profiling on bi-lingual tweets
Saraswathi et al. Deep Learning Enabled Social Media Recommendation Based on User Comments.
CN113239159A (en) Cross-modal retrieval method of videos and texts based on relational inference network
CN115600642B (en) Stream media-oriented decentralization federation learning method based on neighbor trust aggregation
CN116304311A (en) Online social network spam comment user detection method
Qi et al. ScratchGAN: Network representation learning for scratch with preference‐based generative adversarial nets
CN115905572A (en) Social robot detection method and storage medium for twitter users

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant