CN114782209B - Social network topological graph-based associated user identity recognition method - Google Patents
- Publication number
- CN114782209B (CN202210429087.9A)
- Authority
- CN
- China
- Prior art keywords
- node
- network
- ego
- social network
- social
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
Abstract
Most current approaches embed social networks into a low-dimensional vector space and then align users in that space. However, because social networks are extremely large and complex, the network embedding process is susceptible to error propagation and to noise from different neighbors. To address this, the invention provides an associated user identity recognition method based on a social network topological graph. The method first forms the ego network of each user (that is, it extracts the local network formed by the user's first-order neighbors), then extracts user node sequences by random walk, then learns a low-dimensional vector representation of each user with a natural language model framework, and finally trains a mapping matrix that maps the two social networks into the same feature space for alignment. By using ego networks, the invention avoids the interference caused by higher-order neighbors, which improves the node embedding result and the association accuracy.
Description
Technical Field
The invention relates to the technical field of data analysis and mining across multiple social networks, and in particular to an associated user identity recognition method based on a social network topological graph.
Background
Associated user identity recognition aims to find the correspondence between the different identities of the same user on multiple social network platforms. It is a key technology in the analysis and mining of data from multiple social networks, is in wide commercial demand, and has important applications in network security and personalized recommendation.
Most current methods are based on DeepWalk (Perozzi B., Al-Rfou R., Skiena S. DeepWalk: Online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2014: 701-710.), which borrows from the Word2vec method (Mikolov T., Sutskever I., Chen Kai, et al. Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2013: 3111-3119.). Word2vec is a method for obtaining word vectors in natural language processing; it converts sparse, high-dimensional discrete vectors into relatively dense, low-dimensional continuous vectors. Although it was designed for word vectors, predicting the vectors of surrounding words from the vector of the center word, node representations can borrow the same idea. Because both the nodes in a social network and the words in natural language follow power-law distributions, DeepWalk applies the word vector method to social networks. DeepWalk combines random walk with the Skip-gram method: it extracts node sequences for the nodes of a social network by random walk and then uses Skip-gram to obtain the embedding vectors of the nodes. However, applied to two social networks, this method only yields two separate feature spaces and does not unify them.
Subsequently, in 2020, Zhou et al. proposed the ACCM method (Zhou F, Zhang K, Xie S, et al. Learning to Correlate Accounts Across Online Social Networks: An Embedding-Based Approach[J]. INFORMS Journal on Computing, 2020, 32.), which also extracts node sequences by random walk and then maps the set of node sequences into a feature vector space with the Skip-gram method. This yields a feature space for each of the two social networks. To unify the feature spaces, ACCM additionally uses some known matched users as constraints to train a mapping matrix, so that the feature spaces of the two social networks can be projected into the same feature space. Similarity is then measured in the unified feature space, and user identities are associated according to the similarity results. Although this method reduces the errors and the matching difficulty caused by the two social networks having different feature vector spaces, it uses the whole social network during network embedding, so the higher-order neighbors of a node have too much influence. Higher-order neighbors often do not play a key role for the node and introduce additional noise, so the node embedding result is less accurate and more errors are introduced.
Disclosure of Invention
The invention aims to provide an associated user identity recognition method based on a social network topological graph, in order to solve the technical problem that conventional methods have low recognition accuracy because higher-order neighbors (namely, other nodes that are not directly connected to the node) introduce excessive noise when nodes are embedded.
To solve the above technical problem, the invention provides an associated user identity recognition method based on network representation, which comprises the following steps:
S1: acquiring two known social network data sets, wherein each data set comprises friend relations between users, and the two data sets share associated users;
S2: constructing topological graphs of the social networks $G_1$ and $G_2$, respectively, from the users and friend relations in the social network data sets, wherein each social network topological graph comprises nodes and connecting edges, the nodes represent the users, and the connecting edges represent the friend relations; forming a first-order ego network for each node of $G_1$ and of $G_2$, wherein the first-order ego network graphs of the nodes in $G_1$ are combined to form one ego topological graph set, and the first-order ego network graphs of the nodes in $G_2$ are combined to form another ego topological graph set;
S3: for each of the two social networks $G_1$ and $G_2$, using its ego topological graph set to form s node sequences from the ego network of each node, wherein the node sequences are extracted by a random walk method, thereby forming the node sequence sets of the two social networks;
S4: mapping the node sequence sets of the two social networks into two feature spaces, respectively, by using a skip-gram model, and learning low-dimensional vector representations of the nodes in the mapped feature spaces to obtain a feature vector representation of each node;
S5: training on the associated users of the two social network data sets to obtain a target feature mapping matrix, mapping the two feature spaces into the same feature space, calculating the similarity between each new node of social network $G_1$ and each node of social network $G_2$, and performing associated user identity recognition according to the calculated similarity, wherein a new node of $G_1$ is the node obtained by mapping an original node of $G_1$ with the trained target feature mapping matrix.
In one embodiment, the two social network datasets include dataset one and dataset two, step S2 comprising:
S2.1: constructing a topological graph of social network $G_1$ from dataset one, wherein $G_1$ comprises n nodes $v_1, v_2, \dots, v_n$; starting from node $v_1$ in $G_1$, extracting the node and all of its first-order neighbors, then adding, according to the edges in $G_1$, the connecting edges between the extracted node and its first-order neighbors and the connecting edges among the first-order neighbors, thereby forming the ego network graph $G_{v_1}$ of node $v_1$; repeating this process until the ego network graphs of all n nodes are formed, and finally forming the ego network set $\{G_{v_1}, G_{v_2}, \dots, G_{v_n}\}$;
S2.2: constructing a topological graph of social network $G_2$ from dataset two, wherein $G_2$ comprises m nodes $v'_1, v'_2, \dots, v'_m$; starting from node $v'_1$ in $G_2$, extracting the node and all of its first-order neighbors, then adding, according to the edges in $G_2$, the connecting edges between the extracted node and its first-order neighbors and the connecting edges among the first-order neighbors, thereby forming the ego network graph $G_{v'_1}$ of node $v'_1$; repeating this process until the ego network graphs of all m nodes are formed, and finally forming the ego network set $\{G_{v'_1}, G_{v'_2}, \dots, G_{v'_m}\}$.
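The ego network construction of step S2 can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the adjacency-map representation and the function names `build_graph`, `ego_network`, and `ego_network_set` are assumptions introduced here, and the friend relations are assumed to be undirected.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map from friend-relation edges."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def ego_network(graph: Graph, center: Hashable) -> Graph:
    """Return the first-order ego network of `center`.

    Keeps the center, its direct neighbors, and every edge of the
    original graph whose two endpoints are both kept.
    """
    kept = {center} | graph[center]
    return {node: graph[node] & kept for node in kept}


def ego_network_set(graph: Graph) -> Dict[Hashable, Graph]:
    """One ego network per node, keyed by the center node."""
    return {node: ego_network(graph, node) for node in graph}


# Toy graph (illustrative only): a triangle 1-2-3 with a tail 3-4-5.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
g1 = build_graph(edges)
egos = ego_network_set(g1)
print(egos[1])  # nodes 4 and 5 are not part of the ego network of node 1
```

Because each ego network keeps only edges whose endpoints are both the center or its direct neighbors, anything extracted from it later cannot involve a higher-order neighbor of the center.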
In one embodiment, step S3 includes:
S3.1: according to the ego network set formed from $G_1$, starting from node $v_1$, extracting s node sequences from the corresponding ego network $G_{v_1}$ by random walk, wherein each sequence starts at node $v_1$ and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of each node and $n \times s$ node sequences are obtained in total, which are combined into the node sequence set $L_1$ of $G_1$;
S3.2: according to the ego network set formed from $G_2$, starting from node $v'_1$, extracting s node sequences from the corresponding ego network $G_{v'_1}$ by random walk, wherein each sequence starts at node $v'_1$ and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of each node and $m \times s$ node sequences are obtained in total, which are combined into the node sequence set $L_2$ of $G_2$.
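The sequence extraction of step S3 might look like the following sketch. The text does not specify how the next node is chosen or whether the length t counts the start node, so two assumptions are made here: each step picks uniformly among the neighbors inside the ego network, and t counts nodes including the start. The function name `random_walks` and the fixed seed are illustrative.

```python
import random
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def random_walks(ego: Graph, start: Hashable, s: int, t: int,
                 rng: random.Random) -> List[List[Hashable]]:
    """Draw s uniform random walks of t nodes each, all starting at `start`.

    Every step stays inside `ego`, so a walk can never reach a
    higher-order neighbor of the start node.
    """
    walks = []
    for _ in range(s):
        walk = [start]
        while len(walk) < t:
            neighbors = sorted(ego[walk[-1]])  # sorted for reproducibility
            if not neighbors:  # isolated node: the walk cannot continue
                break
            walk.append(rng.choice(neighbors))
        walks.append(walk)
    return walks


# Ego network of node 1 in a toy graph (a triangle 1-2-3).
ego_1 = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
rng = random.Random(0)  # fixed seed, an arbitrary choice
walks_1 = random_walks(ego_1, start=1, s=4, t=5, rng=rng)
for walk in walks_1:
    print(walk)
```

Running this for every node of a network and concatenating the results gives the node sequence set of that network, with s sequences per node.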
In one embodiment, S4 comprises:
S4.1: inputting the node sequences extracted from the ego network set of $G_1$ into a skip-gram model as training data, adjusting the model parameters, and mapping each node to a p-dimensional feature vector, so that the $G_1$ network is finally mapped into a feature space $G_1 = \{u_1, u_2, \dots, u_n\}$ in which each node is represented by its feature vector;
S4.2: inputting the node sequences extracted from the ego network set of $G_2$ into a skip-gram model as training data, adjusting the model parameters, and mapping each node to a p-dimensional feature vector, so that the $G_2$ network is finally mapped into a feature space $G_2 = \{u'_1, u'_2, \dots, u'_m\}$ in which each node is represented by its feature vector.
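A skip-gram model is trained to predict the nodes that surround a center node in a sequence. The sketch below shows only how such (center, context) training pairs could be drawn from node sequences; it does not train the model. The function name `skipgram_pairs` and the `window` parameter are assumptions for illustration and are not taken from the patent.

```python
from typing import Hashable, List, Sequence, Tuple


def skipgram_pairs(sequences: Sequence[Sequence[Hashable]],
                   window: int) -> List[Tuple[Hashable, Hashable]]:
    """List (center, context) pairs for every node in every sequence."""
    pairs = []
    for seq in sequences:
        for i, center in enumerate(seq):
            lo = max(0, i - window)
            hi = min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a node is not its own context
                    pairs.append((center, seq[j]))
    return pairs


# Two short node sequences (illustrative values).
walks = [[1, 2, 3, 1], [1, 3, 2, 3]]
pairs = skipgram_pairs(walks, window=1)
print(pairs[:4])  # [(1, 2), (2, 1), (2, 3), (3, 2)]
```

In practice an existing skip-gram implementation would consume the node sequences directly and return one p-dimensional vector per node.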
In one embodiment, step S5 includes:
S5.1: training with the associated users of the two social network data sets of step S1 as the mapping basis to obtain a target feature mapping matrix, and mapping the vector spaces of the two social networks into the same feature space based on the target feature mapping matrix;
S5.2: mapping the nodes of $G_1$ into $G_2$ according to the target feature mapping matrix to obtain the corresponding new nodes, calculating the similarity between each new node of $G_1$ and each node of $G_2$, and performing associated user identity recognition according to the calculated similarity.
In one embodiment, step S5.1 comprises:
constructing a mapping matrix from the two new feature spaces obtained in step S4, and training by minimizing the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$ to obtain the final target mapping matrix $W^* = (X^T X)^{-1} X^T Y$, wherein X and Y respectively represent the two new feature spaces, W is the mapping matrix, and $W^*$ is the target mapping matrix.
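The closed-form minimizer of this least-squares objective can be derived by setting the gradient to zero. The derivation below reads the objective as a squared Frobenius norm and assumes that $X^T X$ is invertible, which requires at least as many matched users as embedding dimensions; both are assumptions of this note rather than statements from the patent.

```latex
\begin{aligned}
J(W) &= \operatorname{tr}\!\left[(Y - XW)^{T}(Y - XW)\right] = \lVert Y - XW \rVert_F^{2},\\
\nabla_W J(W) &= -2\,X^{T}(Y - XW) = 0
\;\Longrightarrow\; X^{T}X\,W = X^{T}Y,\\
W^{*} &= (X^{T}X)^{-1}X^{T}Y .
\end{aligned}
```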
In one embodiment, step S5.2 comprises:
mapping each node of $G_1$ into $G_2$ according to the target mapping matrix to obtain the corresponding new node, calculated as:
$$\hat{u}_1 = u_1 W^*$$
wherein $u_1$ is a node in $G_1$ and $\hat{u}_1$ is the new node corresponding to $u_1$, that is, the node obtained by mapping node $u_1$ of $G_1$ into $G_2$;
calculating the cosine similarity between each new node and each node in social network $G_2$:
$$\mathrm{sim}(\hat{u}_1, u'_i) = \frac{\hat{u}_1 \cdot u'_i}{\lVert \hat{u}_1 \rVert \, \lVert u'_i \rVert}$$
wherein $u'_i$ is the i-th node in $G_2$ and $\mathrm{sim}(\hat{u}_1, u'_i)$ represents the similarity between node $\hat{u}_1$ and $u'_i$;
sorting the cosine similarity values between $\hat{u}_1$ and each node in $G_2$ from largest to smallest, and taking the top N nodes, in order, as the association matching results in $G_2$ for node $u_1$ of $G_1$.
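The ranking in step S5.2 can be sketched as follows. The vectors are toy values, and the names `cosine` and `top_n_matches` are introduced here for illustration; the mapped vector is assumed to have been computed beforehand with the target mapping matrix.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_matches(mapped: Sequence[float],
                  g2_vectors: Dict[Hashable, Sequence[float]],
                  n: int) -> List[Tuple[Hashable, float]]:
    """Rank all G2 nodes by similarity to `mapped`, largest first."""
    scored = [(node, cosine(mapped, vec)) for node, vec in g2_vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Toy 2-dimensional vectors; the values are illustrative only.
g2 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
mapped_u1 = [2.0, 0.1]  # stands for a G1 node vector after mapping into G2
print(top_n_matches(mapped_u1, g2, n=2))
```

Here node "a" ranks first and node "c" second, because the mapped vector points almost along the first axis.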
The above technical solutions in the embodiments of the present application at least have one or more of the following technical effects:
In the associated user identity recognition method based on a social network topological graph, after the social network topological graph is built and in order to obtain a better embedding, the ego network of each user (namely, the local network formed by the user's first-order neighbors) is formed first; user node sequences are then extracted by random walk; a natural language model framework is then used to learn a low-dimensional vector representation of each user; and finally a mapping matrix is trained to map the two social networks into the same feature space for alignment. By using ego networks, the method avoids the interference caused by higher-order neighbors, which improves the node embedding result and the association accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a method for identifying an associated user identity based on a network representation according to an embodiment of the present invention.
Detailed Description
User alignment across social networks refers to finding users with the same identity in multiple social networks. It has important applications in fields such as link prediction and personalized recommendation, and is of research value in the data mining field. Through extensive research and practice, the inventors have found that most current approaches embed social networks into a low-dimensional vector space and then align users in that space. However, because social networks are extremely large and complex, the network embedding process is susceptible to error propagation and to noise from different neighbors.
Based on this, and to obtain a better embedding, the method of the present invention first forms the ego network of each user (i.e., extracts the local network formed by the user's first-order neighbors), then extracts user node sequences by random walk, then learns a low-dimensional vector representation of each user with a natural language model framework, and finally trains a mapping matrix to map the two social networks into the same feature space for alignment. By using ego networks, the invention avoids the interference caused by higher-order neighbors, which improves the node embedding result and the association accuracy.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides an associated user identity recognition method based on a social network topological graph, which comprises the following steps:
S1: acquiring two known social network data sets, wherein each data set comprises friend relations between users, and the two data sets share associated users;
S2: constructing topological graphs of the social networks $G_1$ and $G_2$, respectively, from the users and friend relations in the social network data sets, wherein each social network topological graph comprises nodes and connecting edges, the nodes represent the users, and the connecting edges represent the friend relations; forming a first-order ego network for each node of $G_1$ and of $G_2$, wherein the first-order ego network graphs of the nodes in $G_1$ are combined to form one ego topological graph set, and the first-order ego network graphs of the nodes in $G_2$ are combined to form another ego topological graph set;
S3: for each of the two social networks $G_1$ and $G_2$, using its ego topological graph set to form s node sequences from the ego network of each node, wherein the node sequences are extracted by a random walk method, thereby forming the node sequence sets of the two social networks;
S4: mapping the node sequence sets of the two social networks into two feature spaces, respectively, by using a skip-gram model, and learning low-dimensional vector representations of the nodes in the mapped feature spaces to obtain a feature vector representation of each node;
S5: training on the associated users of the two social network data sets to obtain a target feature mapping matrix, mapping the two feature spaces into the same feature space, calculating the similarity between each new node of social network $G_1$ and each node of social network $G_2$, and performing associated user identity recognition according to the calculated similarity, wherein a new node of $G_1$ is the node obtained by mapping an original node of $G_1$ with the trained target feature mapping matrix.
Fig. 1 shows a flowchart of the associated user identity recognition method based on network representation according to an embodiment of the present invention, where the SG model is the skip-gram model.
Specifically, step S1 acquires the data sets. Step S2 forms the social network topology from the acquired social network data sets and generates the ego network topology of each node by extracting its first-order neighbors. Step S3 uses a random walk method to extract, from the ego networks of the nodes, sequences containing node structure information; these node sequences together form a corpus. Step S4 converts the node sequences in the corpus with the skip-gram model from natural language processing to obtain the representation vectors of the nodes. Step S5 obtains a space mapping matrix from the partially known associated nodes, maps the vector spaces of the two social networks into the same vector space, and computes node similarity using the newly obtained representation vectors in the new space.
In S2, the first-order neighbors of each node are extracted for the two social networks $G_1$ and $G_2$, respectively, to form the ego network sets $\{G_{v_1}, G_{v_2}, \dots, G_{v_n}\}$ and $\{G_{v'_1}, G_{v'_2}, \dots, G_{v'_m}\}$.
In S4, the node sequences are mapped into a vector matrix by the skip-gram model from natural language processing. Ready-made Python packages implement this model, so it only needs to be called with the required parameters adjusted.
In step S5, the invention trains a mapping matrix W: from k known matched associated nodes $X = \{u_1, u_2, \dots, u_k\}$ and $Y = \{u'_1, u'_2, \dots, u'_k\}$ it constructs the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$, and the final target mapping matrix is found by minimizing this objective function.
The scheme of the invention can be implemented as an automated workflow using computer software.
In one embodiment, the two social network datasets include dataset one and dataset two, step S2 comprising:
S2.1: constructing a topological graph of social network $G_1$ from dataset one, wherein $G_1$ comprises n nodes $v_1, v_2, \dots, v_n$; starting from node $v_1$ in $G_1$, extracting the node and all of its first-order neighbors, then adding, according to the edges in $G_1$, the connecting edges between the extracted node and its first-order neighbors and the connecting edges among the first-order neighbors, thereby forming the ego network graph $G_{v_1}$ of node $v_1$; repeating this process until the ego network graphs of all n nodes are formed, and finally forming the ego network set $\{G_{v_1}, G_{v_2}, \dots, G_{v_n}\}$;
S2.2: constructing a topological graph of social network $G_2$ from dataset two, wherein $G_2$ comprises m nodes $v'_1, v'_2, \dots, v'_m$; starting from node $v'_1$ in $G_2$, extracting the node and all of its first-order neighbors, then adding, according to the edges in $G_2$, the connecting edges between the extracted node and its first-order neighbors and the connecting edges among the first-order neighbors, thereby forming the ego network graph $G_{v'_1}$ of node $v'_1$; repeating this process until the ego network graphs of all m nodes are formed, and finally forming the ego network set $\{G_{v'_1}, G_{v'_2}, \dots, G_{v'_m}\}$.
In one embodiment, step S3 includes:
S3.1: according to the ego network set formed from $G_1$, starting from node $v_1$, extracting s node sequences from the corresponding ego network $G_{v_1}$ by random walk, wherein each sequence starts at node $v_1$ and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of each node and $n \times s$ node sequences are obtained in total, which are combined into the node sequence set $L_1$ of $G_1$;
S3.2: according to the ego network set formed from $G_2$, starting from node $v'_1$, extracting s node sequences from the corresponding ego network $G_{v'_1}$ by random walk, wherein each sequence starts at node $v'_1$ and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of each node and $m \times s$ node sequences are obtained in total, which are combined into the node sequence set $L_2$ of $G_2$.
In a specific implementation, the s node sequences in step S3.1 are denoted $l_1, l_2, \dots, l_s$.
In one embodiment, S4 comprises:
S4.1: inputting the node sequences extracted from the ego network set of $G_1$ into a skip-gram model as training data, adjusting the model parameters, and mapping each node to a p-dimensional feature vector, so that the $G_1$ network is finally mapped into a feature space $G_1 = \{u_1, u_2, \dots, u_n\}$ in which each node is represented by its feature vector;
S4.2: inputting the node sequences extracted from the ego network set of $G_2$ into a skip-gram model as training data, adjusting the model parameters, and mapping each node to a p-dimensional feature vector, so that the $G_2$ network is finally mapped into a feature space $G_2 = \{u'_1, u'_2, \dots, u'_m\}$ in which each node is represented by its feature vector.
In one embodiment, step S5 includes:
S5.1: training with the associated users of the two social network data sets of step S1 as the mapping basis to obtain a target feature mapping matrix, and mapping the vector spaces of the two social networks into the same feature space based on the target feature mapping matrix;
S5.2: mapping the nodes of $G_1$ into $G_2$ according to the target feature mapping matrix to obtain the corresponding new nodes, calculating the similarity between each new node of $G_1$ and each node of $G_2$, and performing associated user identity recognition according to the calculated similarity.
Specifically, the two feature spaces formed in step S4 are learned separately, so their dimensions do not correspond to one another. A feature mapping matrix therefore needs to be trained to map the two feature spaces into one feature space before the similarity of the nodes is calculated.
In one embodiment, step S5.1 comprises:
constructing a mapping matrix from the two new feature spaces obtained in step S4, and training by minimizing the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$ to obtain the final target mapping matrix $W^* = (X^T X)^{-1} X^T Y$, wherein X and Y respectively represent the two new feature spaces, W is the mapping matrix, and $W^*$ is the target mapping matrix.
In one embodiment, step S5.2 comprises:
mapping each node of $G_1$ into $G_2$ according to the target mapping matrix to obtain the corresponding new node, calculated as:
$$\hat{u}_1 = u_1 W^*$$
wherein $u_1$ is a node in $G_1$ and $\hat{u}_1$ is the new node corresponding to $u_1$, that is, the node obtained by mapping node $u_1$ of $G_1$ into $G_2$;
calculating the cosine similarity between each new node and each node in social network $G_2$:
$$\mathrm{sim}(\hat{u}_1, u'_i) = \frac{\hat{u}_1 \cdot u'_i}{\lVert \hat{u}_1 \rVert \, \lVert u'_i \rVert}$$
wherein $u'_i$ is the i-th node in $G_2$, the index i ranges over every node of social network $G_2$, i.e. $i = 1, \dots, m$, and $\mathrm{sim}(\hat{u}_1, u'_i)$ represents the similarity between node $\hat{u}_1$ and $u'_i$;
sorting the cosine similarity values between $\hat{u}_1$ and each node in $G_2$ from largest to smallest, and taking the top N nodes, in order, as the association matching results in $G_2$ for node $u_1$ of $G_1$.
Specifically, the two feature spaces (X, Y) of $G_1$ and $G_2$ are obtained separately and are not the same space, so similarity cannot be calculated between them directly and a mapping matrix needs to be trained. The partially known associated nodes in the data sets of step S1 can serve as the mapping basis for matching the dimensions of the two feature spaces, so that the nodes of the $G_1$ network can be mapped into the $G_2$ network through the mapping matrix W.
For example, suppose the vector spaces $G_1 = \{u_1, u_2, \dots, u_n\}$ and $G_2 = \{u'_1, u'_2, \dots, u'_m\}$ of the two social networks have been obtained, where $u_i$ is the embedding vector of node $v_i$ obtained by the skip-gram model, n is the total number of nodes in $G_1$, $u'_i$ is the embedding vector of node $v'_i$ obtained by the skip-gram model, and m is the total number of nodes in $G_2$. The k known matched associated nodes $X = \{u_1, u_2, \dots, u_k\}$ and $Y = \{u'_1, u'_2, \dots, u'_k\}$ from the data sets of step S1 are selected, where the indices 1 to k renumber these k nodes and do not coincide with the earlier numbering. A mapping matrix W is then constructed using only the two new vector spaces X and Y, such that XW is as close as possible to Y. The final mapping matrix $W^* = (X^T X)^{-1} X^T Y$ is found by minimizing the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$. Finally, each node vector of social network $G_1$ can be mapped into social network $G_2$ by the mapping matrix $W^*$; for node $u_1$, for example, the mapped node $\hat{u}_1 = u_1 W^*$ in $G_2$ is obtained, and the remaining nodes are handled in the same way.
The nodes of the $G_1$ network are mapped into the $G_2$ network by multiplication with $W^*$; cosine similarity is then calculated between the new vector of each $G_1$ node and the vectors in the $G_2$ network, the nodes of $G_2$ are ranked by similarity, and the top few most similar nodes are selected. For example, once the mapped node $\hat{u}_1$ of node $u_1$ is obtained, the cosine similarity $\mathrm{sim}(\hat{u}_1, u'_i)$ between $\hat{u}_1$ and each node in social network $G_2$ is calculated; the cosine similarity values are then sorted from largest to smallest, and the top N nodes are taken, in order, as the association matching results in $G_2$ for node $u_1$ of $G_1$. The other nodes of $G_1$ are handled in the same way.
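Training the mapping matrix from matched users can be sketched with NumPy as below. The data are synthetic: `W_true` is a made-up ground-truth map used only to generate matching vectors, and the sizes `k` and `p` are arbitrary. The sketch treats each row of X and Y as one matched user and solves the least-squares problem in two equivalent ways, assuming $X^T X$ is invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
k, p = 50, 4                       # k matched users, p-dimensional embeddings

X = rng.normal(size=(k, p))        # rows: matched-user vectors from G1
W_true = rng.normal(size=(p, p))   # synthetic ground-truth map, demo only
Y = X @ W_true                     # rows: corresponding vectors from G2

# Least-squares solution of min_W ||Y - XW||_F^2.
W_star, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Same result via the normal equations, valid when X^T X is invertible.
W_normal = np.linalg.solve(X.T @ X, X.T @ Y)

u1 = rng.normal(size=p)            # any node vector of G1
u1_mapped = u1 @ W_star            # its image in the feature space of G2

print(np.allclose(W_star, W_true), np.allclose(W_star, W_normal))
```

`np.linalg.lstsq` is generally the safer choice numerically; the explicit normal-equation solve is shown only because it mirrors a closed-form least-squares solution.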
Finally, the results are compared with the ground truth to obtain an identity association index value, expressed as follows:
Here, the indicator term denotes whether the true associated user $a'_i$ of user $a_i$ is present among the top N selected predicted users (topN), and $a_1, \ldots, a_n$ refer to the $n$ user nodes in the above step, where $n$ is the total number of nodes.
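One plausible reading of this index is a top-N hit rate: the fraction of users whose true counterpart appears among their top N predictions. The sketch below assumes that reading, since the exact formula is not reproduced in the text. The function name `top_n_hit_rate` and the sample data are hypothetical.

```python
from typing import Dict, List


def top_n_hit_rate(predictions: Dict[str, List[str]],
                   truth: Dict[str, str]) -> float:
    """Fraction of users whose true match is in their top-N prediction list.

    Assumption: the index is an unweighted average of per-user hit indicators.
    """
    hits = sum(
        1
        for user, true_match in truth.items()
        if true_match in predictions.get(user, [])
    )
    return hits / len(truth) if truth else 0.0


# Assumed toy data: top-2 predictions for three G1 users.
predictions = {"a1": ["b1", "b7"], "a2": ["b5", "b9"], "a3": ["b4", "b3"]}
truth = {"a1": "b1", "a2": "b2", "a3": "b3"}
rate = top_n_hit_rate(predictions, truth)
```

Here users a1 and a3 count as hits and a2 as a miss, so the rate is two thirds.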
The applicant ran the method of this embodiment on a computer with an Intel(R) Core(TM) i5-9500 CPU @ 3.00 GHz and compared it, on the public Foursquare-Twitter data set, with the methods of the following documents: (Tan S, Guan Z, Cai D, Qin X, Bu J, Chen C (2014) Mapping users across networks by manifold alignment on hypergraph. Proc. AAAI Conf. Artificial Intelligence, Quebec City, Canada, 159-165.), (Perozzi B., Al-Rfou R., Skiena S. DeepWalk: Online learning of social representations [C] // Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2014: 701-710.), and (Zhou F, Zhang K, Xie S, et al. Learning to Correlate Accounts Across Online Social Networks: An Embedding-Based Approach [J]. INFORMS Journal on Computing, 2020, 32.). The identity association effect is improved relative to these methods, and the method can be applied to fields such as recommendation systems and network security.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Various modifications may be made to the particular embodiments described, or equivalents may be substituted, by those skilled in the art without departing from the spirit of the invention or exceeding the scope of the invention as defined by the appended claims.
Claims (4)
1. The method for identifying the identity of the associated user based on the social network topological graph is characterized by comprising the following steps of:
S1: acquiring two known social network data sets, wherein the known social network data sets comprise friend relations between users, and the two social network data sets have associated users;
S2: respectively constructing topological graphs of the social networks $G_1$ and $G_2$ according to the users and friend relations in the social network data sets, wherein each social network topological graph comprises nodes and connecting edges, the nodes representing users and the connecting edges representing friend relations; forming a first-order ego network of each node in the social networks $G_1$ and $G_2$ respectively, wherein the first-order ego network graphs of the nodes in the $G_1$ network are combined to form one ego topological graph set, and the first-order ego network graphs of the nodes in the $G_2$ network are combined to form another ego topological graph set;
S3: using the ego topological graph sets of the two social networks $G_1$ and $G_2$ respectively, forming s node sequences from the ego network of each node, wherein the node sequences are extracted by a random walk method to form the node sequence sets of the two social networks;
S4: respectively mapping the node sequence sets of the two formed social networks into two feature spaces by using a skip-gram model, and learning the low-dimensional vector representation of the nodes in the mapped feature spaces to obtain the feature vector representation of each node;
S5: training on the associated users of the two social network data sets to obtain a target feature mapping matrix, mapping the two feature spaces into the same feature space, then calculating the similarity between each new node of the social network $G_1$ and each node in the social network $G_2$, and carrying out associated user identity recognition according to the calculated similarity, wherein a new node of the social network $G_1$ is the node obtained by mapping an original node of the social network $G_1$ with the trained target feature mapping matrix;
Wherein, step S5 includes:
S5.1: training the associated users of the two social data sets in the step S1 as mapping basis to obtain a target feature mapping matrix, and mapping vector spaces of the two social networks into the same feature space based on the target feature mapping matrix;
S5.2: according to the target feature mapping matrix, mapping the nodes in $G_1$ to $G_2$ to obtain the corresponding new nodes, then calculating the similarity between each new node of $G_1$ and each node in $G_2$, and carrying out associated user identification according to the calculated similarity;
Step S5.1 includes:
Constructing a mapping matrix using the two new feature spaces obtained in step S4, and training by minimizing the objective function $W^* = \arg\min_W (Y - XW)^T (Y - XW)$ to obtain the final target mapping matrix $W^* = (X^T X)^{-1} X^T Y$, wherein $X$ and $Y$ respectively represent the two new feature spaces, $W$ is the mapping matrix, and $W^*$ is the target mapping matrix;
step S5.2 comprises:
According to the target mapping matrix, mapping each node in $G_1$ into $G_2$ to obtain a corresponding new node, calculated as follows:
$\hat{u}_1 = u_1 W^*$
wherein $u_1$ is a node in $G_1$, and $\hat{u}_1$ is the new node corresponding to $u_1$, namely the node obtained by mapping node $u_1$ of $G_1$ into $G_2$;
and calculating the cosine similarity between each new node and each node in the social network $G_2$:
$\cos(\hat{u}_1, u'_i) = \frac{\hat{u}_1 \cdot u'_i}{\|\hat{u}_1\| \, \|u'_i\|}$
where $u'_i$ is the i-th node in $G_2$, and $\cos(\hat{u}_1, u'_i)$ represents the similarity between node $\hat{u}_1$ and $u'_i$;
sorting the cosine similarity values between $\hat{u}_1$ and each node in the social network $G_2$ in descending order, and taking the top N nodes of $G_2$, in order, as the association matching results for node $u_1$ of the social network $G_1$.
2. The social network topology-based associated user identification method of claim 1, wherein the two social network datasets comprise dataset one and dataset two, step S2 comprising:
S2.1: constructing a topological graph of the social network $G_1$ according to data set one, wherein $G_1$ comprises n nodes, namely $v_1, v_2, \ldots, v_n$; starting from node $v_1$ in $G_1$, extracting the node and all of its first-order neighbors, then, according to the edges in $G_1$, supplementing the connecting edges between the extracted node and its first-order neighbors and the connecting edges among the first-order neighbors, thereby forming the ego network graph $G_{v_1}$ of node $v_1$; repeating this process for $v_2$ to $v_n$ until the ego network graphs of all n nodes are formed, finally forming an ego network set $\{G_{v_1}, G_{v_2}, \ldots, G_{v_n}\}$;
S2.2: constructing a topological graph of the social network $G_2$ according to data set two, wherein $G_2$ comprises m nodes, namely $v'_1, v'_2, \ldots, v'_m$; starting from node $v'_1$ in $G_2$, extracting the node and all of its first-order neighbors, then, according to the edges in $G_2$, supplementing the connecting edges between the extracted node and its first-order neighbors and the connecting edges among the first-order neighbors, thereby forming the ego network graph $G_{v'_1}$ of node $v'_1$; repeating this process for $v'_2$ to $v'_m$ until the ego network graphs of all m nodes are formed, finally forming an ego network set $\{G_{v'_1}, G_{v'_2}, \ldots, G_{v'_m}\}$.
3. The social network topology-based associated user identification method of claim 1, wherein step S3 comprises:
S3.1: according to the ego network set formed from $G_1$, starting from node $v_1$, extracting s node sequences from the corresponding ego network $G_{v_1}$ by random walk, wherein each sequence begins with node $v_1$ and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of each node, giving n × s node sequences in total, which are combined into the node sequence set $L_1$ of $G_1$;
S3.2: according to the ego network set formed from $G_2$, starting from node $v'_1$, extracting s node sequences from the corresponding ego network $G_{v'_1}$ by random walk, wherein each sequence begins with node $v'_1$ and has length t; repeating this process for the remaining nodes, so that s node sequences are extracted from the ego network of each node, giving m × s node sequences in total, which are combined into the node sequence set $L_2$ of $G_2$.
4. The social network topology-based associated user identification method of claim 1, wherein S4 comprises:
S4.1: inputting the ego network set formed from $G_1$ into a skip-gram model as training data, adjusting the model parameters, and mapping each node into a p-dimensional feature vector, so that the $G_1$ network is finally mapped into a feature space $G_1 = \{u_1, u_2, \ldots, u_n\}$ in which each node is represented by its feature vector;
S4.2: inputting the ego network set formed from $G_2$ into a skip-gram model as training data, adjusting the model parameters, and mapping each node into a p-dimensional feature vector, so that the $G_2$ network is finally mapped into a feature space $G_2 = \{u'_1, u'_2, \ldots, u'_m\}$ in which each node is represented by its feature vector.
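The ego network construction and random-walk sampling described in steps S2 and S3 of the claims above can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is stored as an undirected adjacency dictionary, walks are uniform over neighbors, and the names `ego_network` and `random_walks` and the toy graph are hypothetical. Skip-gram training on the resulting sequences (step S4) is not shown.

```python
import random
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def ego_network(graph: Graph, center: str) -> Graph:
    """First-order ego network: the center, its neighbors, and all edges
    of the original graph whose endpoints both lie in that node set."""
    members = {center} | graph[center]
    return {node: graph[node] & members for node in members}


def random_walks(ego: Graph, start: str, s: int, t: int,
                 rng: random.Random) -> List[List[str]]:
    """Sample s uniform random walks of length t, each beginning at start."""
    walks = []
    for _ in range(s):
        walk = [start]
        while len(walk) < t:
            neighbors = sorted(ego[walk[-1]])  # sorted for reproducibility
            if not neighbors:  # isolated node: the walk cannot continue
                break
            walk.append(rng.choice(neighbors))
        walks.append(walk)
    return walks


# Assumed toy friendship graph (undirected, stored symmetrically).
graph: Graph = {
    "v1": {"v2", "v3"},
    "v2": {"v1", "v3", "v4"},
    "v3": {"v1", "v2"},
    "v4": {"v2"},
}
rng = random.Random(0)
# s = 2 walks of length t = 4 per node, giving n * s sequences in total.
sequences = [
    walk
    for v in sorted(graph)
    for walk in random_walks(ego_network(graph, v), v, s=2, t=4, rng=rng)
]
```

Restricting each walk to the ego network of its start node keeps every sequence within one hop of that node, which is the difference from a walk over the full graph.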
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210429087.9A CN114782209B (en) | 2022-04-22 | 2022-04-22 | Social network topological graph-based associated user identity recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210429087.9A CN114782209B (en) | 2022-04-22 | 2022-04-22 | Social network topological graph-based associated user identity recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114782209A (en) | 2022-07-22 |
CN114782209B (en) | 2024-06-11 |
Family
ID=82430692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210429087.9A Active CN114782209B (en) | 2022-04-22 | 2022-04-22 | Social network topological graph-based associated user identity recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114782209B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116776193A (en) * | 2023-05-17 | 2023-09-19 | 广州大学 | Method and device for associating virtual identities across social networks based on attention mechanism |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8306922B1 (en) * | 2009-10-01 | 2012-11-06 | Google Inc. | Detecting content on a social network using links |
WO2019085641A1 (en) * | 2017-11-01 | 2019-05-09 | 上海掌门科技有限公司 | Method and apparatus for friend recommendation |
CN110532436A (en) * | 2019-07-17 | 2019-12-03 | 中国人民解放军战略支援部队信息工程大学 | Across social network user personal identification method based on community structure |
CN111080304A (en) * | 2019-12-12 | 2020-04-28 | 支付宝(杭州)信息技术有限公司 | Credible relationship identification method, device and equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8433670B2 (en) * | 2011-03-03 | 2013-04-30 | Xerox Corporation | System and method for recommending items in multi-relational environments |
Non-Patent Citations (1)
Title |
---|
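Liu Qifei; Du Yanhui; Lu Tianliang. Cross-social-network user identity association method based on user relationships. Application Research of Computers. (02), full text. * |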
Also Published As
Publication number | Publication date |
---|---|
CN114782209A (en) | 2022-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110851645B (en) | Image retrieval method based on similarity maintenance under deep metric learning | |
CN111414461B (en) | Intelligent question-answering method and system fusing knowledge base and user modeling | |
CN110070116B (en) | Segmented selection integration image classification method based on deep tree training strategy | |
CN108805077A (en) | A kind of face identification system of the deep learning network based on triple loss function | |
CN114332984B (en) | Training data processing method, device and storage medium | |
CN113298191B (en) | User behavior identification method based on personalized semi-supervised online federal learning | |
CN113157957A (en) | Attribute graph document clustering method based on graph convolution neural network | |
CN110598061A (en) | Multi-element graph fused heterogeneous information network embedding method | |
Barman et al. | Shape: A novel graph theoretic algorithm for making consensus-based decisions in person re-identification systems | |
CN110866134A (en) | Image retrieval-oriented distribution consistency keeping metric learning method | |
CN115357728A (en) | Large model knowledge graph representation method based on Transformer | |
CN114782209B (en) | Social network topological graph-based associated user identity recognition method | |
CN114282059A (en) | Video retrieval method, device, equipment and storage medium | |
CN117036760A (en) | Multi-view clustering model implementation method based on graph comparison learning | |
CN116206327A (en) | Image classification method based on online knowledge distillation | |
CN116977763A (en) | Model training method, device, computer readable storage medium and computer equipment | |
Tian et al. | Genetic algorithm based deep learning model selection for visual data classification | |
CN116862024A (en) | Credible personalized federal learning method and device based on clustering and knowledge distillation | |
CN112541530B (en) | Data preprocessing method and device for clustering model | |
CN115600642B (en) | Stream media-oriented decentralization federation learning method based on neighbor trust aggregation | |
CN117010373A (en) | Recommendation method for category and group to which asset management data of power equipment belong | |
Shi et al. | EpiRep: Learning node representations through epidemic dynamics on networks | |
CN115910232A (en) | Multi-view drug pair response prediction method, device, equipment and storage medium | |
CN114169007B (en) | Medical privacy data identification method based on dynamic neural network | |
CN115661539A (en) | Less-sample image identification method embedded with uncertainty information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||