CN114782209B - Social network topological graph-based associated user identity recognition method - Google Patents

Info

Publication number
CN114782209B
Authority
CN
China
Prior art keywords
node
network
ego
social network
social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210429087.9A
Other languages
Chinese (zh)
Other versions
CN114782209A (en
Inventor
胡瑞敏
甄宇
任灵飞
吴俊杭
胡文怡
李登实
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN202210429087.9A
Publication of CN114782209A
Application granted
Publication of CN114782209B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Most current approaches embed social networks into a low-dimensional vector space and then align users in that space. However, because social networks are extremely large and complex, the embedding process is susceptible to error propagation and to noise from different neighbors. To address this, the invention provides an associated user identity recognition method based on a social network topological graph. The method first forms the ego network of each user (that is, it extracts the local network formed by the user's first-order neighbors), then extracts user node sequences by random walk, then learns a low-dimensional vector representation of each user with a natural language model framework, and finally trains a mapping matrix that maps the two social networks into the same feature space for alignment. By using ego networks, the invention avoids the interference caused by high-order neighbors, thereby improving the node embedding results and the association accuracy.

Description

Social network topological graph-based associated user identity recognition method
Technical Field
The invention relates to the technical field of multi-social-network data analysis and mining, and in particular to an associated user identity recognition method based on a social network topological graph.
Background
Associated user identity recognition aims to find the correspondence between the different identities of the same user across multiple social network platforms. It is a key technology in the analysis and mining of multi-social-network data, has broad commercial demand, and has important applications in network security and personalized recommendation.
Most current methods are based on DeepWalk (Perozzi B., Al-Rfou R., Skiena S. DeepWalk: Online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2014: 701-710.), which borrows from Word2vec (Mikolov T., Sutskever I., Chen Kai, et al. Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2013: 3111-3119.). Word2vec is a method for obtaining word vectors in natural language processing; it converts sparse, high-dimensional discrete vectors into relatively dense, low-dimensional continuous vectors. Although Word2vec was designed for words, predicting the surrounding words from the vector of the center word, node representations can borrow the same idea. Because both the nodes in a social network and the words in natural language follow power-law distributions, DeepWalk applies the word-vector method to social networks. It combines random walks with Skip-gram: random walks extract node sequences from the social network, and Skip-gram then learns an embedding vector for each node. However, applied to two networks, this approach only yields two separate feature spaces and does not unify them.
Subsequently, Zhou et al. proposed the ACCM method in 2020 (Zhou F, Zhang K, Xie S, et al. Learning to Correlate Accounts Across Online Social Networks: An Embedding-Based Approach[J]. INFORMS Journal on Computing, 2020, 32.). It likewise extracts node sequences by random walk and then maps the set of node sequences into a feature vector space with the Skip-gram method, so that a feature space is obtained for each of the two social networks. To unify the feature spaces, it uses some known matched users as constraints to train a mapping matrix that projects the feature spaces of the two social networks into the same feature space. Similarity is then measured in the unified feature space, and user identities are associated according to the similarity results. Although this reduces the errors and the matching difficulty caused by the two networks having different feature vector spaces, the method embeds the entire social network, so the higher-order neighbors of a node exert too much influence. Higher-order neighbors often play no key role for the node and introduce additional noise, which makes the node embeddings less accurate and introduces more errors.
Disclosure of Invention
The invention aims to provide an associated user identity recognition method based on a social network topological graph, to solve the technical problem that existing methods achieve low recognition accuracy because, when embedding a node's neighbors, they introduce excessive noise from high-order neighbors (that is, nodes not directly connected to the node).
To solve this technical problem, the invention provides an associated user identity recognition method based on network representation, comprising the following steps:
S1: acquiring two known social network datasets, wherein each dataset contains the friend relationships between users, and the two datasets share associated users;
S2: constructing the topological graphs of social networks G_1 and G_2 from the users and friend relationships in the respective datasets, wherein each topological graph comprises nodes and edges, the nodes representing users and the edges representing friend relationships; forming a first-order ego network for each node in G_1 and G_2, wherein the first-order ego networks of all nodes in G_1 together form one ego topological graph set, and the first-order ego networks of all nodes in G_2 together form another ego topological graph set;
S3: for each of G_1 and G_2, using the ego topological graph set to extract s node sequences from the ego network of each node by random walk, thereby forming the node sequence sets of the two social networks;
S4: mapping the node sequence sets of the two social networks into two feature spaces with a skip-gram model, and learning low-dimensional vector representations of the nodes in those feature spaces to obtain a feature vector for each node;
S5: training a target feature mapping matrix from the associated users of the two datasets, mapping the two feature spaces into the same feature space, calculating the similarity between each new node of G_1 and each node of G_2, and performing associated user identity recognition according to the calculated similarity, wherein a new node of G_1 is the node obtained by mapping an original node of G_1 with the trained target feature mapping matrix.
In one embodiment, the two social network datasets include dataset one and dataset two, step S2 comprising:
S2.1: constructing the topological graph of social network G_1 from dataset one, wherein G_1 comprises n nodes v_1, v_2, …, v_n; starting from node v_1 in G_1, extracting the node and all of its first-order neighbors, then adding, according to the edges in G_1, the edges between the node and its first-order neighbors and the edges among the first-order neighbors, thereby forming the ego network graph G_{v_1} of node v_1; repeating this process for v_2 through v_n until the ego network graphs of all n nodes are formed, which together constitute the ego network set of G_1;
S2.2: constructing the topological graph of social network G_2 from dataset two, wherein G_2 comprises m nodes v'_1, v'_2, …, v'_m; starting from node v'_1 in G_2, extracting the node and all of its first-order neighbors, then adding, according to the edges in G_2, the edges between the node and its first-order neighbors and the edges among the first-order neighbors, thereby forming the ego network graph G_{v'_1} of node v'_1; repeating this process for v'_2 through v'_m until the ego network graphs of all m nodes are formed, which together constitute the ego network set of G_2.
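The ego network construction of step S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the adjacency-map representation, the function names `build_graph`, `ego_network` and `ego_network_set`, and the toy edge list are all assumptions made for the example.

```python
def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from friend pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def ego_network(adj, center):
    """First-order ego network of `center`: the node itself, its direct
    neighbors, and every edge of the full graph whose endpoints are both kept."""
    kept = {center} | adj[center]
    return {v: adj[v] & kept for v in kept}


def ego_network_set(adj):
    """One ego network per node, keyed by the center node (hypothetical layout)."""
    return {v: ego_network(adj, v) for v in adj}


# Toy friend list (made up for illustration).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
adj = build_graph(edges)
egos = ego_network_set(adj)
print(sorted(egos["a"]))  # nodes kept in the ego network of "a"
```

In the ego network of "a", node "c" keeps its links to "a" and "b" but loses its link to "d", because "d" is a second-order neighbor of "a".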
In one embodiment, step S3 includes:
S3.1: according to the ego network set formed from G_1, starting from node v_1, extracting s node sequences from the corresponding ego network G_{v_1} by random walk, wherein each sequence begins at node v_1 and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of every node, giving n × s node sequences in total, which are combined into the node sequence set L_1 of G_1;
S3.2: according to the ego network set formed from G_2, starting from node v'_1, extracting s node sequences from the corresponding ego network G_{v'_1} by random walk, wherein each sequence begins at node v'_1 and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of every node, giving m × s node sequences in total, which are combined into the node sequence set L_2 of G_2.
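A minimal sketch of the random-walk extraction in step S3 is given below. The text does not specify the transition rule, so a uniform choice among the current node's neighbors inside the ego network is assumed here; the names `random_walks` and `node_sequence_set`, the seed handling, and the toy ego networks are likewise hypothetical.

```python
import random


def random_walks(ego, start, s, t, rng):
    """Return s walks of t nodes each, all starting at `start` and moving to a
    uniformly chosen neighbor inside the ego network (assumed transition rule)."""
    walks = []
    for _ in range(s):
        walk = [start]
        while len(walk) < t:
            neighbors = sorted(ego[walk[-1]])  # sorted for reproducibility
            if not neighbors:  # isolated node: the walk cannot continue
                break
            walk.append(rng.choice(neighbors))
        walks.append(walk)
    return walks


def node_sequence_set(egos, s, t, seed=0):
    """Collect s walks from every ego network into one corpus of node sequences."""
    rng = random.Random(seed)
    corpus = []
    for center, ego in egos.items():
        corpus.extend(random_walks(ego, center, s, t, rng))
    return corpus


# Two toy ego networks, keyed by their center node (made up for illustration).
egos = {
    "a": {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}},
    "d": {"d": {"c", "e"}, "c": {"d"}, "e": {"d"}},
}
corpus = node_sequence_set(egos, s=3, t=5)
print(len(corpus))  # number of centers times s
```

Because each walk is confined to one ego network, a sequence that starts at "a" can never reach "d" or "e", which is how the walk stays clear of high-order neighbors.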
In one embodiment, S4 comprises:
S4.1: inputting the node sequence set extracted from the ego network set of G_1 into a skip-gram model as training data, adjusting the model parameters, and mapping each node to a p-dimensional feature vector, so that the G_1 network is finally mapped into a feature space G_1 = {u_1, u_2, …, u_n} in which each node is represented by its feature vector;
S4.2: inputting the node sequence set extracted from the ego network set of G_2 into a skip-gram model as training data, adjusting the model parameters, and mapping each node to a p-dimensional feature vector, so that the G_2 network is finally mapped into a feature space G_2 = {u'_1, u'_2, …, u'_m} in which each node is represented by its feature vector.
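To make the skip-gram input of step S4 concrete, the sketch below generates the (center, context) training pairs that a skip-gram model learns from. It only illustrates how node sequences become training examples; the window size and the function name `skipgram_pairs` are assumptions, and the embedding training itself would normally be delegated to a library (for example, gensim's `Word2Vec` class selects skip-gram when `sg=1`).

```python
def skipgram_pairs(sequences, window):
    """For every position in every sequence, pair the center node with each
    node at most `window` steps away (the skip-gram context)."""
    pairs = []
    for seq in sequences:
        for i, center in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, seq[j]))
    return pairs


# Two toy node sequences, as a random walk might produce them.
walks = [["a", "b", "c", "a"], ["d", "e", "d", "c"]]
pairs = skipgram_pairs(walks, window=1)
print(pairs[:3])
```

Nodes that co-occur in the same walks share many context pairs, which is what pushes their learned vectors close together.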
In one embodiment, step S5 includes:
S5.1: training on the associated users of the two social network datasets from step S1, which serve as the mapping basis, to obtain a target feature mapping matrix, and mapping the vector spaces of the two social networks into the same feature space based on that matrix;
S5.2: mapping the nodes of G_1 into G_2 according to the target feature mapping matrix to obtain the corresponding new nodes, calculating the similarity between each new node of G_1 and each node of G_2, and performing associated user identity recognition according to the calculated similarity.
In one embodiment, step S5.1 comprises:
A mapping matrix is constructed from the two new feature spaces obtained in step S4 and trained by minimizing the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$, which yields the final target mapping matrix $W^* = (X^T X)^{-1} X^T Y$, where X and Y respectively denote the two new feature spaces, W is the mapping matrix, and $W^*$ is the target mapping matrix.
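The closed form of the minimizer follows from the standard least-squares argument, sketched here under two assumptions not stated in the text: the matrix-valued objective is reduced to a scalar by taking its trace (the squared Frobenius norm), and $X^T X$ is invertible, which requires at least as many linearly independent anchor pairs as embedding dimensions.

```latex
\begin{aligned}
J(W) &= \operatorname{tr}\!\left[(Y - XW)^{T}(Y - XW)\right] \\
\frac{\partial J}{\partial W} &= -2\,X^{T}(Y - XW) = 0 \\
X^{T}X\,W &= X^{T}Y \\
W^{*} &= (X^{T}X)^{-1}X^{T}Y
\end{aligned}
```

Here each row of X is the embedding of a known matched user in the first network and the same row of Y is the embedding of that user in the second network, so $XW^*$ is the best linear approximation of Y in the least-squares sense.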
In one embodiment, step S5.2 comprises:
According to the target mapping matrix, each node in G_1 is mapped into G_2 to obtain the corresponding new node, calculated as:
$\hat{u}_1 = u_1 W^*$
where u_1 is a node in G_1 and $\hat{u}_1$ is the new node corresponding to u_1, that is, the image of node u_1 of G_1 in G_2;
the cosine similarity between each new node and each node in social network G_2 is then calculated:
$\mathrm{sim}(\hat{u}_1, u'_i) = \dfrac{\hat{u}_1 \cdot u'_i}{\lVert \hat{u}_1 \rVert \, \lVert u'_i \rVert}$
where u'_i is the i-th node in G_2 and $\mathrm{sim}(\hat{u}_1, u'_i)$ denotes the similarity between node $\hat{u}_1$ and u'_i;
the cosine similarity values between $\hat{u}_1$ and every node in social network G_2 are compared and sorted in descending order, and the top N nodes are taken as the association matching results in G_2 for node u_1 of G_1.
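The ranking in step S5.2 can be illustrated with a short sketch. It assumes the G_1 node has already been mapped into the G_2 space; the names `cosine` and `top_n_matches`, the zero-vector convention, and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_n_matches(mapped, candidates, n):
    """Rank the G_2 nodes by cosine similarity to the mapped G_1 node, in
    descending order, and return the names of the first n."""
    scored = [(cosine(mapped, vec), name) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in scored[:n]]


# Toy G_2 embeddings and one mapped G_1 node (values made up for illustration).
g2 = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
mapped_u1 = [0.9, 0.1]
print(top_n_matches(mapped_u1, g2, n=2))
```

Cosine similarity ignores vector length, so the ranking depends only on the direction of the mapped vector relative to each candidate.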
The above technical solutions in the embodiments of the present application at least have one or more of the following technical effects:
In the associated user identity recognition method based on a social network topological graph, once the social network topological graph has been built, a better embedding is obtained by first forming the ego network of each user (that is, the local network formed by the user's first-order neighbors), then extracting user node sequences by random walk, then learning a low-dimensional vector representation of each user with a natural language model framework, and finally training a mapping matrix that maps the two social networks into the same feature space for alignment. By using ego networks, the method avoids the interference caused by high-order neighbors, so the node embedding results and the association accuracy are both improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for identifying an associated user identity based on a network representation according to an embodiment of the present invention.
Detailed Description
User alignment across social networks refers to finding the users who share the same identity among multiple social networks. It has important applications in fields such as link prediction and personalized recommendation, and has research value in the field of data mining. Through extensive research and practice, the inventors found that most current approaches embed social networks into a low-dimensional vector space and then align users in that space. However, because social networks are extremely large and complex, the embedding process is susceptible to error propagation and to noise from different neighbors.
To obtain a better embedding, the method of the invention therefore first forms the ego network of each user (that is, it extracts the local network formed by the user's first-order neighbors), then extracts user node sequences by random walk, then learns a low-dimensional vector representation of each user with a natural language model framework, and finally trains a mapping matrix that maps the two social networks into the same feature space for alignment. By using ego networks, the invention avoids the interference caused by high-order neighbors, thereby improving the node embedding results and the association accuracy.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides an associated user identity recognition method based on a social network topological graph, which comprises the following steps:
S1: acquiring two known social network datasets, wherein each dataset contains the friend relationships between users, and the two datasets share associated users;
S2: constructing the topological graphs of social networks G_1 and G_2 from the users and friend relationships in the respective datasets, wherein each topological graph comprises nodes and edges, the nodes representing users and the edges representing friend relationships; forming a first-order ego network for each node in G_1 and G_2, wherein the first-order ego networks of all nodes in G_1 together form one ego topological graph set, and the first-order ego networks of all nodes in G_2 together form another ego topological graph set;
S3: for each of G_1 and G_2, using the ego topological graph set to extract s node sequences from the ego network of each node by random walk, thereby forming the node sequence sets of the two social networks;
S4: mapping the node sequence sets of the two social networks into two feature spaces with a skip-gram model, and learning low-dimensional vector representations of the nodes in those feature spaces to obtain a feature vector for each node;
S5: training a target feature mapping matrix from the associated users of the two datasets, mapping the two feature spaces into the same feature space, calculating the similarity between each new node of G_1 and each node of G_2, and performing associated user identity recognition according to the calculated similarity, wherein a new node of G_1 is the node obtained by mapping an original node of G_1 with the trained target feature mapping matrix.
Fig. 1 shows a flowchart of the associated user identity recognition method based on network representation provided by an embodiment of the invention, in which "SG model" denotes the skip-gram model.
Specifically, step S1 acquires the datasets. Step S2 forms the social network topology from the given datasets and generates the ego network topology of each node by extracting its first-order neighbors. Step S3 uses random walks to extract, from the ego network of each node, sequences that carry the node's structural information; the resulting node sequences form a corpus. Step S4 converts the node sequences in the corpus into node representation vectors with the skip-gram model from natural language processing. Step S5 obtains a space mapping matrix from the partially known associated nodes, maps the vector spaces of the two social networks into the same vector space, and computes node similarity from the newly obtained representation vectors in that space.
In S2, the first-order neighbors of every node are extracted for each of the two social networks G_1 and G_2, forming the two corresponding ego network sets.
In S4, the node sequences are mapped into a vector matrix with the skip-gram model from natural language processing. Ready-made Python packages implement this model, so it only needs to be called with the required parameters adjusted.
In step S5, the invention trains a mapping matrix W: from the k known matched associated nodes X = {u_1, u_2, …, u_k} and Y = {u'_1, u'_2, …, u'_k} it constructs the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$ and finds the final target mapping matrix by minimizing it.
The scheme of the invention can be implemented as an automated workflow using computer software technology.
In one embodiment, the two social network datasets include dataset one and dataset two, step S2 comprising:
S2.1: constructing the topological graph of social network G_1 from dataset one, wherein G_1 comprises n nodes v_1, v_2, …, v_n; starting from node v_1 in G_1, extracting the node and all of its first-order neighbors, then adding, according to the edges in G_1, the edges between the node and its first-order neighbors and the edges among the first-order neighbors, thereby forming the ego network graph G_{v_1} of node v_1; repeating this process for v_2 through v_n until the ego network graphs of all n nodes are formed, which together constitute the ego network set of G_1;
S2.2: constructing the topological graph of social network G_2 from dataset two, wherein G_2 comprises m nodes v'_1, v'_2, …, v'_m; starting from node v'_1 in G_2, extracting the node and all of its first-order neighbors, then adding, according to the edges in G_2, the edges between the node and its first-order neighbors and the edges among the first-order neighbors, thereby forming the ego network graph G_{v'_1} of node v'_1; repeating this process for v'_2 through v'_m until the ego network graphs of all m nodes are formed, which together constitute the ego network set of G_2.
In one embodiment, step S3 includes:
S3.1: according to the ego network set formed from G_1, starting from node v_1, extracting s node sequences from the corresponding ego network G_{v_1} by random walk, wherein each sequence begins at node v_1 and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of every node, giving n × s node sequences in total, which are combined into the node sequence set L_1 of G_1;
S3.2: according to the ego network set formed from G_2, starting from node v'_1, extracting s node sequences from the corresponding ego network G_{v'_1} by random walk, wherein each sequence begins at node v'_1 and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of every node, giving m × s node sequences in total, which are combined into the node sequence set L_2 of G_2.
In a specific implementation, the s node sequences of step S3.1 are denoted l_1, l_2, …, l_s.
In one embodiment, S4 comprises:
S4.1: inputting the node sequence set extracted from the ego network set of G_1 into a skip-gram model as training data, adjusting the model parameters, and mapping each node to a p-dimensional feature vector, so that the G_1 network is finally mapped into a feature space G_1 = {u_1, u_2, …, u_n} in which each node is represented by its feature vector;
S4.2: inputting the node sequence set extracted from the ego network set of G_2 into a skip-gram model as training data, adjusting the model parameters, and mapping each node to a p-dimensional feature vector, so that the G_2 network is finally mapped into a feature space G_2 = {u'_1, u'_2, …, u'_m} in which each node is represented by its feature vector.
In one embodiment, step S5 includes:
S5.1: training on the associated users of the two social network datasets from step S1, which serve as the mapping basis, to obtain a target feature mapping matrix, and mapping the vector spaces of the two social networks into the same feature space based on that matrix;
S5.2: mapping the nodes of G_1 into G_2 according to the target feature mapping matrix to obtain the corresponding new nodes, calculating the similarity between each new node of G_1 and each node of G_2, and performing associated user identity recognition according to the calculated similarity.
Specifically, the two feature spaces formed in step S4 are learned separately, so their dimensions do not correspond to one another; a feature mapping matrix therefore needs to be trained to map the two feature spaces into one feature space before the similarity between nodes is calculated.
In one embodiment, step S5.1 comprises:
A mapping matrix is constructed from the two new feature spaces obtained in step S4 and trained by minimizing the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$, which yields the final target mapping matrix $W^* = (X^T X)^{-1} X^T Y$, where X and Y respectively denote the two new feature spaces, W is the mapping matrix, and $W^*$ is the target mapping matrix.
In one embodiment, step S5.2 comprises:
According to the target mapping matrix, each node in G_1 is mapped into G_2 to obtain the corresponding new node, calculated as:
$\hat{u}_1 = u_1 W^*$
where u_1 is a node in G_1 and $\hat{u}_1$ is the new node corresponding to u_1, that is, the image of node u_1 of G_1 in G_2;
the cosine similarity between each new node and each node in social network G_2 is then calculated:
$\mathrm{sim}(\hat{u}_1, u'_i) = \dfrac{\hat{u}_1 \cdot u'_i}{\lVert \hat{u}_1 \rVert \, \lVert u'_i \rVert}$
where u'_i is the i-th node in G_2, the index i ranges over every node in social network G_2, that is, i = 1, …, m, and $\mathrm{sim}(\hat{u}_1, u'_i)$ denotes the similarity between node $\hat{u}_1$ and u'_i;
the cosine similarity values between $\hat{u}_1$ and every node in social network G_2 are compared and sorted in descending order, and the top N nodes are taken as the association matching results in G_2 for node u_1 of G_1.
Specifically, because the two feature spaces (X, Y) of G_1 and G_2 are obtained separately, they are not the same space and similarity cannot be calculated between them directly, so a mapping matrix must be trained. The known partially associated nodes in the datasets of step S1 serve as the mapping basis for matching the dimensions of the two feature spaces, so that the nodes of the G_1 network can be mapped into the G_2 network through the mapping matrix W.
For example, suppose the vector spaces G_1 = {u_1, u_2, …, u_n} and G_2 = {u'_1, u'_2, …, u'_m} of the two social networks have been obtained, where u_i is the embedding vector of node v_i produced by the skip-gram model, n is the total number of nodes in G_1, u'_i is the embedding vector of node v'_i produced by the skip-gram model, and m is the total number of nodes in G_2. The k known matched associated nodes X = {u_1, u_2, …, u_k} and Y = {u'_1, u'_2, …, u'_k} from the datasets of step S1 are selected, where the indices 1 to k renumber these k nodes and do not coincide with the earlier numbering. A mapping matrix W is then constructed using only the two new vector spaces X and Y, such that XW is as close to Y as possible. The final mapping matrix $W^* = (X^T X)^{-1} X^T Y$ is found by minimizing the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$. Finally, each node vector in G_1 = {u_1, u_2, …, u_n} can be mapped into social network G_2 by the mapping matrix $W^*$; for node u_1, for instance, the mapped node $\hat{u}_1 = u_1 W^*$ in G_2 is obtained, and the remaining nodes are handled in the same way.
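The mapping-matrix fit described in this example can be sketched in pure Python as an ordinary least-squares solve of the normal equations $X^T X\,W = X^T Y$. This is an illustration under assumptions, not the patented code: the helper names (`transpose`, `matmul`, `solve`, `fit_mapping`), the Gauss-Jordan solver, and the toy anchor vectors are all made up, and a numerical library would normally replace the hand-written linear algebra.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ w = b for w by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; more independent anchors needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_mapping(X, Y):
    """Least-squares W minimizing ||Y - XW||^2 via the normal equations."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))


# Toy anchors: rows of X are G_1 embeddings of k = 3 known matched users,
# rows of Y their G_2 embeddings, generated here by a known 90-degree rotation.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_true = [[0.0, 1.0], [-1.0, 0.0]]
Y = matmul(X, W_true)

W = fit_mapping(X, Y)
mapped = matmul([[2.0, 3.0]], W)[0]  # map a further G_1 node into the G_2 space
print(W, mapped)
```

Because the toy Y is an exact linear image of X, the fit recovers the rotation; with real embeddings the residual is nonzero and W is only the best linear approximation.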
After multiplication by $W^*$, the nodes of the G_1 network are mapped into the G_2 network; the cosine similarity between each new G_1 node vector and each vector in the G_2 network is then calculated, the nodes of G_2 are ranked by similarity, and the top several most similar nodes are selected. For example, once the mapped node $\hat{u}_1 = u_1 W^*$ of node u_1 is obtained, the cosine similarity $\mathrm{sim}(\hat{u}_1, u'_i)$ between $\hat{u}_1$ and each node of social network G_2 is calculated; these values are sorted in descending order, and the top N nodes are taken as the association matching results in G_2 for node u_1 of G_1. The other nodes of G_1 are handled in the same way.
Finally, the predicted results are compared with the ground truth to obtain an identity association index value. In its definition, an indicator term records whether the true associated user a'_i of user a_i appears among the top N selected predicted users (TopN), and a_1, …, a_n denote the n user nodes from the steps above, where n is the total number of nodes.
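One plausible reading of this index is a top-N hit rate: the fraction of users whose true associated user appears among their first N predictions. The exact formula is not recoverable from the text, so the sketch below is an assumption, as are the name `top_n_hit_rate` and the toy data.

```python
def top_n_hit_rate(predictions, truth, n):
    """Fraction of users whose true match is among their first n ranked
    candidates (an assumed form of the identity association index)."""
    hits = sum(
        1 for user, ranked in predictions.items() if truth.get(user) in ranked[:n]
    )
    return hits / len(predictions)


# Toy ranked predictions for three G_1 users and their true G_2 matches.
predictions = {
    "a1": ["x", "y", "z"],
    "a2": ["z", "x", "y"],
    "a3": ["y", "z", "x"],
}
truth = {"a1": "x", "a2": "x", "a3": "x"}

for n in (1, 2, 3):
    print(n, top_n_hit_rate(predictions, truth, n))
```

The value can only grow as N increases, so results are comparable only when they are reported for the same N.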
The applicant ran the method of this embodiment on a computer with an Intel(R) Core(TM) i5-9500 CPU @ 3.00 GHz, using the public Foursquare-Twitter dataset, and compared it with the methods of (Tan S, Guan Z, Cai D, Qin X, Bu J, Chen C (2014) Mapping users across networks by manifold alignment on hypergraph. Proc. AAAI Conf. Artificial Intelligence, Quebec City, Canada, 159-165.), (Perozzi B., Al-Rfou R., Skiena S. DeepWalk: Online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2014: 701-710.), and (Zhou F, Zhang K, Xie S, et al. Learning to Correlate Accounts Across Online Social Networks: An Embedding-Based Approach[J]. INFORMS Journal on Computing, 2020, 32.). The identity association performance is improved relative to these methods, and the method can be applied in fields such as recommendation systems and network security.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Various modifications may be made to the particular embodiments described, or equivalents may be substituted, by those skilled in the art without departing from the spirit of the invention or exceeding the scope of the invention as defined by the appended claims.

Claims (4)

1. The method for identifying the identity of the associated user based on the social network topological graph is characterized by comprising the following steps of:
S1: acquiring two known social network data sets, wherein the known social network data sets comprise friend relations between users, and the two social network data sets have associated users;
S2: respectively constructing topological graphs of the social networks $G_1$ and $G_2$ according to the users and friend relations in the social network data sets, wherein each social network topological graph comprises nodes and connecting edges, the nodes represent the users, and the connecting edges represent the friend relations; forming a first-order ego network of each node in the social networks $G_1$ and $G_2$ respectively, wherein the first-order ego network graphs of the nodes in the $G_1$ network are combined to form one ego topological graph set, and the first-order ego network graphs of the nodes in the $G_2$ network are combined to form another ego topological graph set;
S3: using the ego topological graph sets of the two social networks $G_1$ and $G_2$ respectively, forming s node sequences from the ego network of each node, wherein the node sequences are extracted by a random walk method to form the node sequence sets of the two social networks;
S4: respectively mapping the node sequence sets of the two formed social networks into two feature spaces by using a skip-gram model, and learning the low-dimensional vector representation of the nodes in the mapped feature spaces to obtain the feature vector representation of each node;
S5: training according to the associated users of the two social network data sets to obtain a target feature mapping matrix, mapping the two feature spaces into the same feature space, then calculating the similarity between a new node in the social network G 1 and each node in the social network G 2, and carrying out associated user identity recognition according to the calculated similarity, wherein the new node in the social network G 1 is a node obtained by mapping the original node in the social network G 1 according to the trained target feature mapping matrix;
Wherein, step S5 includes:
S5.1: training the associated users of the two social data sets in the step S1 as mapping basis to obtain a target feature mapping matrix, and mapping vector spaces of the two social networks into the same feature space based on the target feature mapping matrix;
S5.2: according to the target feature mapping matrix, mapping the nodes in G 1 to G 2, obtaining corresponding new nodes, then calculating the similarity between each new node of G 1 and each node in G 2, and carrying out associated user identification according to the calculated similarity;
Step S5.1 includes:
Constructing a mapping matrix from the two feature spaces obtained in step S4, and training by minimizing the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$ to obtain the final target mapping matrix $W^* = (X^T X)^{-1} X^T Y$, wherein X and Y respectively represent the two feature spaces, W is the mapping matrix, and $W^*$ is the target mapping matrix;
Step S5.2 includes:
According to the target mapping matrix, mapping each node in $G_1$ into $G_2$ to obtain a corresponding new node, calculated as follows:
$\hat{u}_1 = u_1 W^*$
wherein $u_1$ is a node in $G_1$ and $\hat{u}_1$ is the new node corresponding to $u_1$, namely the node obtained by mapping node $u_1$ of $G_1$ into $G_2$;
calculating the cosine similarity between each new node and each node in the social network $G_2$:
$\mathrm{sim}(\hat{u}_1, u'_i) = \dfrac{\hat{u}_1 \cdot u'_i}{\lVert \hat{u}_1 \rVert \, \lVert u'_i \rVert}$
where $u'_i$ is the i-th node in $G_2$ and $\mathrm{sim}(\hat{u}_1, u'_i)$ represents the similarity between node $\hat{u}_1$ and $u'_i$;
sorting the nodes in the social network $G_2$ by their cosine similarity with $\hat{u}_1$ in descending order, and taking the top N, in order, as the association matching results in the social network $G_2$ for node $u_1$ in the social network $G_1$.
2. The social network topology-based associated user identification method of claim 1, wherein the two social network datasets comprise dataset one and dataset two, step S2 comprising:
S2.1: constructing a topological graph of the social network $G_1$ according to dataset one, wherein $G_1$ comprises n nodes, $v_1, v_2, \dots, v_n$ respectively; starting from node $v_1$ in $G_1$, extracting the node and all of its first-order neighbors, then adding, according to the edges in $G_1$, the connecting edges between the extracted node and its first-order neighbors and the connecting edges among the first-order neighbors, to form the ego network graph $G_{v_1}$ of node $v_1$; repeating the process for $v_2$ to $v_n$ until the ego network graphs of all n nodes are formed, finally forming an ego network set $\{G_{v_1}, G_{v_2}, \dots, G_{v_n}\}$;
S2.2: constructing a topological graph of the social network $G_2$ according to dataset two, wherein $G_2$ comprises m nodes, $v'_1, v'_2, \dots, v'_m$ respectively; starting from node $v'_1$ in $G_2$, extracting the node and all of its first-order neighbors, then adding, according to the edges in $G_2$, the connecting edges between the extracted node and its first-order neighbors and the connecting edges among the first-order neighbors, to form the ego network graph $G_{v'_1}$ of node $v'_1$; repeating the process for $v'_2$ to $v'_m$ until the ego network graphs of all m nodes are formed, finally forming an ego network set $\{G_{v'_1}, G_{v'_2}, \dots, G_{v'_m}\}$.
3. The social network topology-based associated user identification method of claim 1, wherein step S3 comprises:
S3.1: according to the ego network set formed from $G_1$, starting from node $v_1$, extracting s node sequences from the corresponding ego network $G_{v_1}$ by random walk, wherein each sequence starts at node $v_1$ and has length t; repeating the process for the remaining nodes, so that s node sequences are extracted from the ego network of each node and $n \times s$ node sequences are obtained in total, which are combined into the node sequence set $L_1$ of $G_1$;
S3.2: according to the ego network set formed from $G_2$, starting from node $v'_1$, extracting s node sequences from the corresponding ego network $G_{v'_1}$ by random walk, wherein each sequence starts at node $v'_1$ and has length t; repeating the process for the remaining nodes, so that s node sequences are extracted from the ego network of each node and $m \times s$ node sequences are obtained in total, which are combined into the node sequence set $L_2$ of $G_2$.
4. The social network topology-based associated user identification method of claim 1, wherein S4 comprises:
S4.1: inputting the ego network set formed from $G_1$ into a skip-gram model as training data, adjusting the model parameters, and mapping each node into a p-dimensional feature vector, so that the $G_1$ network is finally mapped into a feature space $G_1 = \{u_1, u_2, \dots, u_n\}$, wherein each node is represented by its feature vector;
S4.2: inputting the ego network set formed from $G_2$ into a skip-gram model as training data, adjusting the model parameters, and mapping each node into a p-dimensional feature vector, so that the $G_2$ network is finally mapped into a feature space $G_2 = \{u'_1, u'_2, \dots, u'_m\}$, wherein each node is represented by its feature vector.
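The ego network construction and random-walk extraction recited in claims 2 and 3 can be sketched as follows. This is an illustrative sketch only: the adjacency-dictionary input, the uniform choice of the next neighbor, the fixed seed, and the names `ego_network`, `random_walks`, and `sequence_set` are assumptions, not details given in the claims.

```python
import random

def ego_network(adj, v):
    """First-order ego network of v: v, its neighbors, and every edge among them."""
    members = {v} | set(adj[v])
    return {u: sorted(w for w in adj[u] if w in members) for u in members}

def random_walks(ego, start, s, t, rng):
    """Draw s walks of up to t nodes, each starting at `start` and staying inside `ego`."""
    walks = []
    for _ in range(s):
        walk = [start]
        # Stop early only if the current node has no neighbor inside the ego network.
        while len(walk) < t and ego[walk[-1]]:
            walk.append(rng.choice(ego[walk[-1]]))
        walks.append(walk)
    return walks

def sequence_set(adj, s, t, seed=0):
    """Collect s walks from the ego network of every node (n * s sequences in total)."""
    rng = random.Random(seed)
    sequences = []
    for v in sorted(adj):
        sequences.extend(random_walks(ego_network(adj, v), v, s, t, rng))
    return sequences

# Toy undirected graph (assumed): node -> set of friends.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
ego_1 = ego_network(adj, 1)
walks = sequence_set(adj, s=2, t=5)
```

The resulting node sequences play the role of sentences; a skip-gram trainer could then consume them to learn a p-dimensional vector per node.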
CN202210429087.9A 2022-04-22 2022-04-22 Social network topological graph-based associated user identity recognition method Active CN114782209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210429087.9A CN114782209B (en) 2022-04-22 2022-04-22 Social network topological graph-based associated user identity recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210429087.9A CN114782209B (en) 2022-04-22 2022-04-22 Social network topological graph-based associated user identity recognition method

Publications (2)

Publication Number Publication Date
CN114782209A CN114782209A (en) 2022-07-22
CN114782209B true CN114782209B (en) 2024-06-11

Family

ID=82430692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210429087.9A Active CN114782209B (en) 2022-04-22 2022-04-22 Social network topological graph-based associated user identity recognition method

Country Status (1)

Country Link
CN (1) CN114782209B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116776193A (en) * 2023-05-17 2023-09-19 广州大学 Method and device for associating virtual identities across social networks based on attention mechanism

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8306922B1 (en) * 2009-10-01 2012-11-06 Google Inc. Detecting content on a social network using links
WO2019085641A1 (en) * 2017-11-01 2019-05-09 上海掌门科技有限公司 Method and apparatus for friend recommendation
CN110532436A (en) * 2019-07-17 2019-12-03 中国人民解放军战略支援部队信息工程大学 Across social network user personal identification method based on community structure
CN111080304A (en) * 2019-12-12 2020-04-28 支付宝(杭州)信息技术有限公司 Credible relationship identification method, device and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8433670B2 (en) * 2011-03-03 2013-04-30 Xerox Corporation System and method for recommending items in multi-relational environments

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
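刘奇飞; 杜彦辉; 芦天亮. Cross-social-network user identity association method based on user relationships. 计算机应用研究 (Application Research of Computers). (02), full text. *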

Also Published As

Publication number Publication date
CN114782209A (en) 2022-07-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant