CN113434782A - Cross-social network user identity recognition method based on joint embedded learning model


Publication number
CN113434782A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110718740.9A
Other languages
Chinese (zh)
Other versions
CN113434782B (en)
Inventor
王李冬
关佶红
常乐
曹世华
胡克用
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Dayu Chuangfu Technology Co ltd
Original Assignee
Qianjiang College of Hangzhou Normal University
Application filed by Qianjiang College of Hangzhou Normal University
Priority to CN202110718740.9A
Publication of CN113434782A
Application granted
Publication of CN113434782B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking


Abstract

The invention discloses a cross-social-network user identity recognition method based on a joint embedded learning model. First, candidate paired user pairs are selected from two social networks using user name similarity and network structure. Second, a user pair network graph (UPG) is constructed with all candidate paired user pairs as nodes. Third, on the basis of the constructed UPG and the labeled user pair data, the labels of the labeled matched user pairs, structure information, and attribute information are fused to build a joint embedded learning model, which is designed as a deep neural network with 1 input and 2 outputs. Finally, the loss function of the joint embedded model is minimized using stochastic gradient descent; once learning is finished, the learned model parameters are used to predict, for each user pair to be predicted, whether the two accounts belong to the same user. The method can effectively predict whether two users from different networks are the same user, and plays a vital role in commercial cross-social-network applications.

Description

Cross-social network user identity recognition method based on joint embedded learning model
Technical Field
The invention relates to the field of user relationship mining for social networks, and in particular to a cross-social-network user identity recognition method based on a joint embedded learning model.
Background
From early e-mail and BBS to today's social media networks (SMNs), more and more users have become accustomed to interacting and obtaining information on social networks every day. People often need to register as users of different websites in order to enjoy the services those websites provide, so it is common for one user to own accounts on several different social networking sites. Because each social networking site is independent, data are not shared, and no uniform identifier exists on the network to uniquely identify a person, the accounts that belong to the same person on different sites are not directly linked. To obtain a complete profile of a user, the user's data on different social networks must be integrated, which requires associating user identities across social platforms, i.e., identifying the accounts a user holds on multiple social networks. In recent years, social network identity recognition methods based on representation learning have become prevalent, and researchers have begun to identify users across multiple social networks with algorithms based on network embedding. However, the following problems remain in cross-social-network user identity recognition based on representation learning:
1. Existing representation-learning-based methods are either supervised or unsupervised. The former require a large amount of labeled data, which is difficult to obtain and costly in manual effort; the latter require no labeled data, but their results are often unsatisfactory.
2. Comprehensively using multi-modal data such as user attribute information, network structure information, and user label information can improve the accuracy of user identity recognition, but embedding this information into a unified vector space is a difficult problem.
3. Existing representation-learning-based identity association methods usually split the task into two steps, node embedding learning and identity recognition, so the label information of users cannot be integrated effectively.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a cross-social-network user identity association method based on a joint embedding model.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
Step 1. For the users of social networks G_A and G_B, select candidate paired user pairs from the two social networks using user name similarity and network structure;
Step 2. Take all candidate paired user pairs P = {p_i} as nodes; if the two users in p_i are respectively neighbors of the two users in p_j, an edge exists between p_i and p_j; construct the user pair network graph (UPG) on this principle;
Step 3. On the basis of the constructed UPG and the labeled user pair data, fuse the labels of the labeled paired users, structure information, and attribute information to build a joint embedded learning model, and design the model as a deep neural network with 1 input and 2 outputs;
Step 4. Minimize the loss function of the joint embedded learning model using stochastic gradient descent; after learning is finished, use the model to predict each user pair to be predicted and output whether the pair is the same user.
Further, the step 1 is specifically realized as follows:
1-1. G_A = (U_A, E_A, X_A) represents social network A, where U_A is the set of users of social network A, E_A is the set of user relationships of social network A, X_A is the user attribute matrix of social network A, and
Figure BDA0003136107610000021
represents user i in social network A; G_B = (U_B, E_B, X_B) represents social network B, and its parameters are defined analogously;
1-2. Acquire data from the different social network platforms using a crawler;
1-3. For a pair of users
Figure BDA0003136107610000022
from social networks G_A and G_B respectively, compute the similarity of their user names n_k and n_j according to formula (1), and add every user pair whose similarity exceeds 0.8 to the candidate paired user set P;
Figure BDA0003136107610000023
where lev(n_k, n_j) denotes the Levenshtein distance and l(n_k) denotes the character length of user name n_k;
1-4. Taking each user pair in P as a seed user pair, expand to its neighbor nodes: from the neighbors of the seed pair, select the user pairs that have r common neighbors (known pairs) and add them to P; r is set differently for different datasets.
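Step 1-3 can be sketched in a few lines of Python. The formula image is not reproduced in this text, so the form used here, 1 - lev(n_k, n_j) / max(l(n_k), l(n_j)), is an assumption; it is consistent with the worked example in the detailed description, where "vio" and "violet" have a similarity of 0.5. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def name_similarity(n_k: str, n_j: str) -> float:
    """Assumed form of formula (1): 1 - lev / length of the longer name."""
    longest = max(len(n_k), len(n_j))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return 1.0 - levenshtein(n_k, n_j) / longest


def candidate_pairs(names_a, names_b, threshold=0.8):
    """Keep every cross-network name pair whose similarity exceeds the threshold."""
    return [(a, b) for a in names_a for b in names_b
            if name_similarity(a, b) > threshold]
```

The pairwise loop is quadratic in the number of users, so a real crawl of this size would presumably need blocking or indexing first; the sketch only illustrates the thresholding rule.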
Further, the step 2 is specifically realized as follows:
2-1. UPG = (U_UPG, E_UPG) represents the user pair network graph, where U_UPG is the set of nodes and E_UPG is the set of relationships between nodes; each candidate paired user pair p_i is taken as a node of the UPG, denoted u'_i, with u'_i ∈ U_UPG;
2-2. Suppose
Figure BDA0003136107610000031
and
Figure BDA0003136107610000032
are two nodes in the UPG; an edge exists between them if the following relationship holds:
Figure BDA0003136107610000033
where
Figure BDA0003136107610000034
denotes the set of neighbor nodes of user
Figure BDA0003136107610000035
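The edge rule of step 2-2 can be written compactly as below. This is a hedged reconstruction from the prose of step 2 (the two users of one pair must be neighbors of the corresponding users of the other pair), not the formula from the original figure; the index names a, b, c, d and the neighbor-set symbol N(·) are assumptions introduced here.

```latex
% Assumed notation: u'_i = (u^A_a, u^B_b) and u'_j = (u^A_c, u^B_d)
(u'_i, u'_j) \in E_{UPG}
\iff
u^A_c \in N(u^A_a) \;\wedge\; u^B_d \in N(u^B_b)
```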
Further, the step 3 is specifically realized as follows:
3-1. Using partial user attribute information crawled by the crawler, text analysis and matching techniques are combined with manual judgment to label the exact corresponding account of a user in the other network; the labeled matched user pairs serve as the supervision information for model training;
3-2. For the two users
Figure BDA0003136107610000036
and
Figure BDA0003136107610000037
in each candidate paired user pair generated in step 2-1, their attributes are converted into features by one-hot encoding, denoted
Figure BDA0003136107610000038
and
Figure BDA0003136107610000039
respectively; the attributes comprise user name, gender, graduating institution, and geographic location;
3-3. Construct a joint embedded learning model for the constructed user pair network graph; the attribute vectors of the two users in a node,
Figure BDA00031361076100000310
are concatenated, denoted
Figure BDA00031361076100000311
and d_i is used as the input of the joint embedded learning model; the output has a left branch and a right branch: the left branch uses a multilayer perceptron to output the probabilities that the node label y_i is 0 or 1, where 1 indicates that the two users in the node are the same user and 0 indicates that they are different users; the right branch uses a skip-gram model to output the predicted probabilities of the context nodes;
The m-th layer of the skip-gram model is expressed as:
Figure BDA00031361076100000312
Figure BDA00031361076100000313
Figure BDA0003136107610000041
where δ(·) denotes the sigmoid function, and W_m and b_m are the weight and bias parameters of layer m; formulas (4) and (5) give the (m+1)-th layers of the left and right branches, respectively;
Figure BDA0003136107610000042
denotes the weight parameter of the left branch at layer m+1,
Figure BDA0003136107610000043
denotes the weight parameter of the right branch at layer m+1, and
Figure BDA0003136107610000044
and
Figure BDA0003136107610000045
are defined analogously;
The last layer of the left branch of the model is designed as a softmax layer, whose input is:
Figure BDA0003136107610000046
The last layer of the right branch of the model is designed as a softmax layer, whose input is:
Figure BDA0003136107610000047
where k is the number of hidden layers of the left branch and k' is the number of hidden layers of the right branch.
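For orientation, a 1-input, 2-output network of this kind is commonly written as shared sigmoid layers followed by two branch-specific layers. The block below is a sketch under that assumption, not a recovery of the formula images; the symbols h, W^l, W^r, b^l, and b^r are hypothetical names for the hidden activations and the branch parameters described in the text.

```latex
\begin{aligned}
h^{(0)} &= d_i, \qquad
h^{(m)} = \delta\big(W_m\, h^{(m-1)} + b_m\big)
  && \text{(shared layer } m\text{)} \\
h_l^{(m+1)} &= \delta\big(W^{l}_{m+1}\, h^{(m)} + b^{l}_{m+1}\big)
  && \text{(left branch, label prediction)} \\
h_r^{(m+1)} &= \delta\big(W^{r}_{m+1}\, h^{(m)} + b^{r}_{m+1}\big)
  && \text{(right branch, context prediction)}
\end{aligned}
```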
Further, the step 4 is specifically realized as follows:
4-1. The left branch of the joint embedded learning model is a multilayer perceptron, and its loss function is defined as:
Figure BDA0003136107610000048
where
Figure BDA0003136107610000049
denotes a labeled node in the UPG, and p(y_i|d_i), the probability of y_i given d_i, is computed as follows:
Figure BDA00031361076100000410
The right branch uses a negative sampling mechanism, and its loss function is defined as:
Figure BDA00031361076100000411
where δ(·) denotes the sigmoid function, n = |U_UPG|, u' denotes a context node of node u'_i, and
Figure BDA00031361076100000412
denotes t randomly selected negative samples;
4-2. The parameters are computed by mini-batch gradient descent; set the batch size of the left branch, b_1, to 200 and the batch size of the right branch, b_2, to 200; randomly sample b_1 labeled nodes from U_UPG, compute L^(L), and according to its gradient value update the parameters W_m, b_m,
Figure BDA0003136107610000051
and
Figure BDA0003136107610000052
4-3. Randomly sample b_2 nodes from U_UPG, compute
Figure BDA0003136107610000053
and according to its gradient value update the parameters W_m, b_m,
Figure BDA0003136107610000054
and
Figure BDA0003136107610000055
4-4. Return to step 4-2 and iterate 100 times;
4-5. Input a node u'_j of the UPG to be predicted: compute the attribute vectors of its two users according to step 3-2, concatenate them into a vector d_j, feed d_j into the joint embedded learning model, and compute the label of the node u'_j.
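The two losses of step 4-1 are described as a softmax-based classification loss and a negative-sampling loss. Their usual textbook forms are sketched below for reference; these are assumptions, not the formulas from the original figures. The symbols z_y (softmax input for label y), U_L (the set of labeled nodes), h_i (right-branch output for d_i), e_u (embedding of node u), and N_t (the t negative samples) are hypothetical, and the normalization constants may differ from the patent's.

```latex
% Left branch: softmax probability and log-likelihood loss over labeled nodes
p(y_i \mid d_i) = \frac{\exp\big(z_{y_i}(d_i)\big)}{\sum_{y \in \{0,1\}} \exp\big(z_{y}(d_i)\big)},
\qquad
L^{(L)} = -\sum_{u'_i \in U_L} \log p(y_i \mid d_i)

% Right branch: skip-gram with negative sampling, n = |U_{UPG}|
L^{(R)} = -\frac{1}{n} \sum_{i=1}^{n} \Big[ \log \delta\big(e_{u'}^{\top} h_i\big)
          + \sum_{u'' \in N_t} \log \delta\big(-e_{u''}^{\top} h_i\big) \Big]
```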
The invention has the following beneficial effects:
The invention focuses on how a network embedding method can effectively integrate the key factors of user identity recognition, and realizes user identity recognition across two social platforms. Cross-platform identity association plays a crucial role in commercial cross-social-network applications such as user behavior analysis over multiple social networks, cross-network information service push, cross-platform friend recommendation, and network security governance for government offices and enterprises. The method can effectively predict whether two users from different networks are the same user.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is an example of generating candidate paired user pairs;
FIG. 3 is an example of generating the user pair network graph;
FIG. 4 is an example of the joint embedding model;
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in FIG. 1, the method for identifying the user identity across the social network based on the joint embedded learning model comprises the following steps:
Step 1. For the users of social networks G_A and G_B, select candidate paired user pairs from the two social networks using user name similarity and network structure;
Step 2. Take all candidate paired user pairs P = {p_i} as nodes; if the two users in p_i are respectively neighbors of the two users in p_j, an edge exists between p_i and p_j; construct the User Pair network Graph (UPG) on this principle;
Step 3. On the basis of the constructed UPG and the labeled user pair data (labeled user pairs), fuse the labels of the labeled paired users, structure information, and attribute information to build a joint embedded learning model, and design the model as a deep neural network with 1 input and 2 outputs;
Step 4. Minimize the loss function of the joint embedded model using stochastic gradient descent; after learning is finished, use the model to predict each user pair to be predicted and output whether the pair is the same user.
The specific implementation process of the step 1 is as follows:
1-1. G_A = (U_A, E_A, X_A) represents social network A, where U_A is the set of users of social network A, E_A is the set of user relationships of social network A, X_A is the user attribute matrix of social network A, and
Figure BDA0003136107610000061
represents a user in social network A; G_B = (U_B, E_B, X_B) represents social network B, and its parameters are defined analogously. The invention uses web crawlers to collect data from Sina Weibo (G_A) and Zhihu (G_B); G_A contains about 1.23 × 10^5 user nodes and G_B contains about 1.95 × 10^5 user records. The user information common to both networks includes user name, gender, graduating institution, and location.
1-2. Data from the different social network platforms are obtained using a crawler.
1-3. For a pair of users
Figure BDA0003136107610000062
from social networks G_A and G_B respectively, compute the similarity of their user name strings n_k and n_j according to the following formula, and add the user pairs whose similarity exceeds 0.8 to the candidate paired user set P:
Figure BDA0003136107610000063
Figure BDA0003136107610000064
where lev(n_k, n_j) denotes the Levenshtein distance and l(n_k) denotes the character length of user name n_k. For example, the user names "vio" and "violet" have a similarity of 0.5.
1-4. Taking each user pair in P as a seed user pair, expand to its neighbor nodes: from the neighbors of the seed pair, select the user pairs that have r common neighbors (known pairs) and add them to P; r is set differently for different datasets. FIG. 2 gives an example of this step. In FIG. 2, assume that
Figure BDA0003136107610000065
are user pairs with a user name similarity greater than 0.8, and let r = 2; according to this step, the four user pairs
Figure BDA0003136107610000071
are added to P as candidate paired user pairs, and finally
Figure BDA0003136107610000072
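The seed expansion of step 1-4 can be sketched as follows. This is one possible reading of the step, under stated assumptions: a new pair (a2, b2) formed from the neighbors of a seed pair is accepted when at least r already-known pairs (x, y) satisfy x ∈ N(a2) and y ∈ N(b2). The "at least r" threshold, the single expansion pass, and the names expand_candidates, nbrs_a, and nbrs_b are assumptions, not taken from the patent.

```python
def expand_candidates(pairs, nbrs_a, nbrs_b, r=2):
    """One expansion pass over the seed pairs.

    pairs  : iterable of (user_in_A, user_in_B) seed pairs
    nbrs_a : dict mapping a user of network A to the set of its neighbors
    nbrs_b : dict mapping a user of network B to the set of its neighbors
    """
    known = set(pairs)
    added = set()
    for a, b in known:
        for a2 in nbrs_a.get(a, ()):
            for b2 in nbrs_b.get(b, ()):
                if (a2, b2) in known:
                    continue
                # Count known pairs that are neighbors of (a2, b2) in both networks.
                shared = sum(1 for x, y in known
                             if x in nbrs_a.get(a2, ())
                             and y in nbrs_b.get(b2, ()))
                if shared >= r:
                    added.add((a2, b2))
    return known | added
```

New pairs are collected separately from the seeds, so pairs added in this pass do not themselves act as seeds; repeating the call would give an iterative variant.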
The specific implementation process of the step 2 is as follows:
2-1. UPG = (U_UPG, E_UPG) represents the user pair network graph, where U_UPG is the set of nodes and E_UPG is the set of relationships between nodes. Each candidate paired user pair p_i is taken as a node of the UPG, denoted u'_i, with u'_i ∈ U_UPG.
2-2. Suppose
Figure BDA0003136107610000073
and
Figure BDA0003136107610000074
are two nodes in the UPG; an edge exists between them if the following relationship holds:
Figure BDA0003136107610000075
where
Figure BDA0003136107610000076
denotes the set of neighbor nodes of user
Figure BDA0003136107610000077
For step 2, the invention constructs the user pair network graph from the two social networks shown in FIG. 2; the result is shown in FIG. 3. Following steps 2-1 and 2-2, the generated user pair network graph contains 6 nodes and 8 edges.
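The construction in steps 2-1 and 2-2 can be sketched as below, assuming the edge rule stated in step 2 of the method: two candidate pairs are connected when their users are neighbors in both source networks. The function name build_upg and the dictionary-of-sets graph representation are assumptions made for illustration.

```python
from itertools import combinations


def build_upg(pairs, nbrs_a, nbrs_b):
    """Build the user pair network graph.

    Returns (nodes, edges): nodes is the list of candidate pairs, and edges
    is a set of index pairs (i, j) with i < j.
    """
    nodes = list(pairs)
    edges = set()
    for i, j in combinations(range(len(nodes)), 2):
        (a1, b1), (a2, b2) = nodes[i], nodes[j]
        # Edge rule: the A-side users are neighbors in G_A
        # and the B-side users are neighbors in G_B.
        if a2 in nbrs_a.get(a1, ()) and b2 in nbrs_b.get(b1, ()):
            edges.add((i, j))
    return nodes, edges
```

The check assumes undirected (symmetric) neighbor sets; for directed follow relationships the condition would need to state which direction counts.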
The specific implementation process of the step 3 is as follows:
3-1. Using partial user attribute information crawled by the crawler (such as accounts on other platforms, mobile phone numbers, and e-mail addresses that users provide in their personal introductions), text analysis and matching techniques are combined with manual judgment to label the exact corresponding account of a user in the other network. The labeled matched user pairs serve as the supervision information for model training.
3-2. For the two users
Figure BDA0003136107610000078
and
Figure BDA0003136107610000079
in each candidate paired user pair generated in step 2-1, their attributes (user name, gender, graduating institution, and geographic location) are converted into features by one-hot encoding, denoted
Figure BDA00031361076100000710
and
Figure BDA00031361076100000711
respectively. Specifically, for the user name attribute, Chinese characters are converted to pinyin, uppercase letters are converted to lowercase, and special characters such as underscores are removed; several character substrings are then extracted from the user name
Figure BDA00031361076100000712
and the character substrings are one-hot encoded. For example, from the user name "violet", the length-3 character substrings {"vio", "iol", "ole", "let"} can be extracted. Categorical attributes such as gender, geographic location, and graduating institution are one-hot encoded directly. For example, gender has only two options, "male" and "female", so "male" can be encoded as {10} and "female" as {01}; the remaining attributes are handled similarly.
3-3. As shown in FIG. 4, a joint embedding model is built for the constructed user pair network graph. The attribute vectors of the two users in a node (denoted
Figure BDA0003136107610000081
) are concatenated, denoted
Figure BDA0003136107610000082
and used as the input of the joint embedding model; the output has a left branch and a right branch: the left branch uses a multilayer perceptron to output the probabilities that the predicted node label y_i is 0 or 1 (1 indicates that the two users in the node are the same user, and 0 indicates that they are different users), and the right branch uses a skip-gram model to output the predicted probabilities of the context nodes. The m-th layer of the model is expressed as:
Figure BDA0003136107610000083
Figure BDA0003136107610000084
Figure BDA0003136107610000085
where δ(·) denotes the sigmoid function, and W_m and b_m are the weight and bias parameters of layer m. The latter two formulas give the (m+1)-th layers of the left branch and the right branch, respectively;
Figure BDA0003136107610000086
denotes the weight parameter of the left branch at layer m+1,
Figure BDA0003136107610000087
denotes the weight parameter of the right branch at layer m+1, and
Figure BDA0003136107610000088
and
Figure BDA0003136107610000089
are defined analogously.
The last layer of the left branch (node label prediction) of the model is designed as a softmax layer, whose input is:
Figure BDA00031361076100000810
The last layer of the right branch (context node prediction) of the model is designed as a softmax layer, whose input is:
Figure BDA00031361076100000811
where k is the number of hidden layers of the left branch and k' is the number of hidden layers of the right branch.
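The feature conversion of step 3-2 and the concatenation at the start of step 3-3 can be sketched as below. Several points are assumptions: pinyin conversion is omitted (non-ASCII characters are simply dropped), the substring length is fixed at 3 as in the "violet" example, and the one-hot codes of all substrings are merged into a single multi-hot vector over a substring vocabulary. The function names are hypothetical.

```python
import re


def char_ngrams(name, n=3):
    """Lowercase the name, drop everything except a-z and 0-9, return its n-grams."""
    cleaned = re.sub(r"[^a-z0-9]", "", name.lower())
    return [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]


def one_hot(value, categories):
    """One-hot code of a categorical attribute, e.g. gender."""
    return [1 if value == c else 0 for c in categories]


def encode_username(name, vocab, n=3):
    """Multi-hot vector marking which vocabulary substrings occur in the name."""
    grams = set(char_ngrams(name, n))
    return [1 if g in grams else 0 for g in vocab]


def pair_input(x_a, x_b):
    """Concatenate the attribute vectors of the two users in a node (d_i)."""
    return list(x_a) + list(x_b)
```

For instance, char_ngrams("Vio_Let") yields the same four substrings as the "violet" example in step 3-2, because case and the underscore are normalized away first.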
The specific implementation process of the step 4 is as follows:
4-1. The left branch of the joint embedding model is a multilayer perceptron, and its loss function is defined as:
Figure BDA00031361076100000812
where
Figure BDA00031361076100000813
denotes a labeled node in the UPG, and p(y_i|d_i), the probability of y_i given d_i, is computed as follows:
Figure BDA0003136107610000091
The right branch uses a negative sampling mechanism, and its loss function is defined as:
Figure BDA0003136107610000092
where δ(·) denotes the sigmoid function, n = |U_UPG|, u' denotes a context node of node u'_i, and
Figure BDA0003136107610000093
denotes t randomly selected negative samples. For the remaining parameters, see step 3-3.
4-2. The parameters are computed by mini-batch gradient descent. Set the batch size of the left branch, b_1, to 200 and the batch size of the right branch, b_2, to 200; randomly sample b_1 labeled nodes, compute ∇L^(L), and according to the gradient value update the parameters W_m, b_m,
Figure BDA0003136107610000094
and
Figure BDA0003136107610000095
4-3. Sample b_2 nodes from U_UPG, compute
Figure BDA0003136107610000096
and according to the gradient value update the parameters W_m, b_m,
Figure BDA0003136107610000097
and
Figure BDA0003136107610000098
4-4. Return to step 4-2 and iterate 100 times.
4-5. Input a node u'_j of the UPG to be predicted: compute the attribute vectors of its two users according to step 3-2, concatenate them into a vector d_j, feed d_j into the joint embedding model, and compute the label of the node u'_j.
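The alternating schedule of steps 4-2 to 4-4 can be sketched as a plain loop. Only the schedule comes from the text (batch sizes of 200, 100 iterations, left-branch update followed by right-branch update); the callbacks step_left and step_right are hypothetical placeholders for whatever computes the gradients of the two losses and updates the shared and branch-specific parameters.

```python
import random


def train(labeled_nodes, all_nodes, step_left, step_right,
          b1=200, b2=200, iterations=100, seed=0):
    """Alternate mini-batch updates of the two branches.

    step_left(batch)  : hypothetical callback, one update on the label loss
    step_right(batch) : hypothetical callback, one update on the context loss
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        # Step 4-2: sample b1 labeled nodes and update via the left branch.
        batch_l = rng.sample(labeled_nodes, min(b1, len(labeled_nodes)))
        step_left(batch_l)
        # Step 4-3: sample b2 nodes from the whole UPG and update via the right branch.
        batch_r = rng.sample(all_nodes, min(b2, len(all_nodes)))
        step_right(batch_r)
```

Because both callbacks touch the shared layers W_m and b_m, the label signal and the graph-context signal shape the same representation, which appears to be the point of training the two branches jointly rather than in two separate stages.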
In step 4, taking the crawled Sina Weibo user data and Zhihu user data as an example, 7325 user pairs are extracted, of which 2213 are labeled; 30% of the labeled data are used as model training data and the rest as test data. For this pair of networks, the user pair network graph is constructed, the joint embedding model is built according to FIG. 4, and the model parameters are learned. User identity association is then performed on the test pairs and the accuracy is computed; the final accuracy reaches 84.7%.

Claims (5)

1. A cross-social-network user identity recognition method based on a joint embedded learning model, characterized by comprising the following steps:
Step 1. For the users of social networks G_A and G_B, select candidate paired user pairs from the two social networks using user name similarity and network structure;
Step 2. Take all candidate paired user pairs P = {p_i} as nodes; if the two users in p_i are respectively neighbors of the two users in p_j, an edge exists between p_i and p_j; construct the user pair network graph (UPG) on this principle;
Step 3. On the basis of the constructed UPG and the labeled user pair data, fuse the labels of the labeled paired users, structure information, and attribute information to build a joint embedded learning model, and design the model as a deep neural network with 1 input and 2 outputs;
Step 4. Minimize the loss function of the joint embedded learning model using stochastic gradient descent; after learning is finished, use the model to predict each user pair to be predicted and output whether the pair is the same user.
2. The method for identifying the user identity across the social network based on the joint embedded learning model according to claim 1, wherein the step 1 is implemented as follows:
1-1. G_A = (U_A, E_A, X_A) represents social network A, where U_A is the set of users of social network A, E_A is the set of user relationships of social network A, X_A is the user attribute matrix of social network A, and
Figure FDA0003136107600000011
represents user i in social network A; G_B = (U_B, E_B, X_B) represents social network B, and its parameters are defined analogously;
1-2. Acquire data from the different social network platforms using a crawler;
1-3. For a pair of users
Figure FDA0003136107600000012
from social networks G_A and G_B respectively, compute the similarity of their user names n_k and n_j according to formula (1), and add every user pair whose similarity exceeds 0.8 to the candidate paired user set P;
Figure FDA0003136107600000013
where lev(n_k, n_j) denotes the Levenshtein distance and l(n_k) denotes the character length of user name n_k;
1-4. Taking each user pair in P as a seed user pair, expand to its neighbor nodes: from the neighbors of the seed pair, select the user pairs that have r common neighbors (known pairs) and add them to P; r is set differently for different datasets.
3. The method for identifying the user identity across the social network based on the joint embedded learning model according to claim 2, wherein the step 2 is implemented as follows:
2-1. UPG = (U_UPG, E_UPG) represents the user pair network graph, where U_UPG is the set of nodes and E_UPG is the set of relationships between nodes; each candidate paired user pair p_i is taken as a node of the UPG, denoted u'_i, with u'_i ∈ U_UPG;
2-2. Suppose
Figure FDA0003136107600000021
and
Figure FDA0003136107600000022
are two nodes in the UPG; an edge exists between them if the following relationship holds:
Figure FDA0003136107600000023
where
Figure FDA0003136107600000024
denotes the set of neighbor nodes of user
Figure FDA0003136107600000025
4. The method for identifying the user identity across the social network based on the joint embedded learning model according to claim 3, wherein the step 3 is implemented as follows:
3-1. Using partial user attribute information crawled by the crawler, text analysis and matching techniques are combined with manual judgment to label the exact corresponding account of a user in the other network; the labeled matched user pairs serve as the supervision information for model training;
3-2. For the two users
Figure FDA0003136107600000026
and
Figure FDA0003136107600000027
in each candidate paired user pair generated in step 2-1, their attributes are converted into features by one-hot encoding, denoted
Figure FDA0003136107600000028
and
Figure FDA0003136107600000029
respectively; the attributes comprise user name, gender, graduating institution, and geographic location;
3-3. Construct a joint embedded learning model for the constructed user pair network graph; the attribute vectors of the two users in a node,
Figure FDA00031361076000000210
are concatenated, denoted
Figure FDA00031361076000000211
and d_i is used as the input of the joint embedded learning model; the output has a left branch and a right branch: the left branch uses a multilayer perceptron to output the probabilities that the node label y_i is 0 or 1, where 1 indicates that the two users in the node are the same user and 0 indicates that they are different users; the right branch uses a skip-gram model to output the predicted probabilities of the context nodes;
the m-th layer of the skip-gram model is expressed as:
[Figure FDA0003136107600000031]
[Figure FDA0003136107600000032]
[Figure FDA0003136107600000033]
wherein δ(·) denotes the sigmoid function, and Wm and bm are the weight and bias parameters of the m-th layer; formula (4) and formula (5) represent the (m+1)-th layers of the left branch and the right branch, respectively; [Figure FDA0003136107600000034] denotes the weight parameter of the left branch at the (m+1)-th layer, [Figure FDA0003136107600000035] denotes the weight parameter of the right branch at the (m+1)-th layer, and [Figure FDA0003136107600000036] and [Figure FDA0003136107600000037] are defined analogously;
the last layer of the left branch of the model is a softmax layer, whose input is:
[Figure FDA0003136107600000038]
the last layer of the right branch of the model is a softmax layer, whose input is:
[Figure FDA0003136107600000039]
wherein k denotes the number of hidden layers of the left branch and k' denotes the number of hidden layers of the right branch.
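Steps 3-2 and 3-3 can be illustrated with a small pure-Python sketch: one-hot encode the attributes of both users, concatenate them into d_i, and pass d_i through a shared sigmoid layer followed by two softmax heads. The layer sizes, the toy vocabularies, the random initialization and all function names are assumptions of this example; the claim's own layer formulas are given in figures not reproduced here, so this shows only the general shape of a two-branch model.

```python
import math
import random


def one_hot(value, vocabulary):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode_pair(user_a, user_b, vocabularies):
    """One-hot encode every attribute of both users and concatenate into d_i."""
    d = []
    for user in (user_a, user_b):
        for attribute, vocabulary in vocabularies.items():
            d.extend(one_hot(user.get(attribute), vocabulary))
    return d


def linear(x, weights, bias):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid_layer(x, weights, bias):
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(x, weights, bias)]


def softmax(z):
    peak = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(d, shared, left_head, right_head):
    """Shared hidden layer, then label head (left) and context head (right)."""
    h = sigmoid_layer(d, *shared)
    label_probs = softmax(linear(h, *left_head))      # p(y = 0), p(y = 1)
    context_probs = softmax(linear(h, *right_head))   # one entry per UPG node
    return label_probs, context_probs


# Hypothetical toy vocabularies for the four attributes named in step 3-2.
vocabularies = {
    "user_name": ["alice", "bob"],
    "gender": ["f", "m"],
    "institution": ["uni_x", "uni_y"],
    "location": ["hangzhou", "guangzhou"],
}
user_a = {"user_name": "alice", "gender": "f", "institution": "uni_x", "location": "hangzhou"}
user_b = {"user_name": "alice", "gender": "f", "institution": "uni_x", "location": "guangzhou"}

d_i = encode_pair(user_a, user_b, vocabularies)
rng = random.Random(0)
n_nodes = 5  # assumed number of nodes in the UPG
shared = init_layer(8, len(d_i), rng)
left_head = init_layer(2, 8, rng)
right_head = init_layer(n_nodes, 8, rng)
label_probs, context_probs = forward(d_i, shared, left_head, right_head)
```

Sharing the lower layer is what makes the embedding "joint": the same hidden representation has to serve both label prediction and context prediction.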
5. The method for identifying the user identity across the social network based on the joint embedded learning model according to claim 4, wherein the step 4 is implemented as follows:
4-1. the left branch of the joint embedded learning model is a multilayer perceptron, and the loss function of this branch is defined as:
[Figure FDA00031361076000000310]
wherein [Figure FDA00031361076000000311] denotes the labeled nodes in the UPG, and p(y_i | d_i) denotes the probability of y_i given d_i, calculated as follows:
[Figure FDA00031361076000000312]
the right branch adopts a negative sampling mechanism, and its loss function is defined as:
[Figure FDA00031361076000000313]
wherein δ(·) denotes the sigmoid function, n = |U_UPG|, u' denotes the context node of node u'_i, and [Figure FDA00031361076000000314] denotes t randomly selected negative samples;
4-2. computing the parameters by mini-batch gradient descent; the batch size b_1 of the left branch is set to 200 and the batch size b_2 of the right branch is set to 200; randomly sampling b_1 labeled nodes from U_UPG, computing the gradient of L(L), and updating the parameters Wm and bm, [Figure FDA0003136107600000041] and [Figure FDA0003136107600000042] according to the gradient values;
4-3. randomly sampling b_2 nodes from U_UPG, computing the gradient of [Figure FDA0003136107600000043], and updating the parameters Wm and bm, [Figure FDA0003136107600000044] and [Figure FDA0003136107600000045] according to the gradient values;
4-4. returning to step 4-2 and iterating 100 times;
4-5. for a node u'_j to be predicted in the UPG, obtaining the attribute vectors of the two users in the node according to step 3-2, concatenating them into a vector d_j, inputting d_j into the joint embedded learning model, and computing the label of the node u'_j.
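The alternating schedule of steps 4-2 to 4-4 and the two kinds of loss can be sketched as follows. The loss formulas of the claim are given in figures not reproduced in this text, so the functions below use the standard textbook forms (mean negative log-likelihood for the labeled branch, negative-sampling loss for the context branch) as an assumption. The names `left_loss`, `right_loss` and `training_schedule` are hypothetical, and the gradient update itself is left to the caller.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def left_loss(probs, labels):
    """Mean negative log-likelihood of the true labels.

    probs[i] is [p(y=0 | d_i), p(y=1 | d_i)]; labels[i] is 0 or 1.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def right_loss(score_positive, scores_negative):
    """Negative-sampling loss for one (node, context) pair.

    score_positive is the model score for the true context node;
    scores_negative holds the scores of the t sampled negative nodes.
    """
    loss = -math.log(sigmoid(score_positive))
    loss -= sum(math.log(sigmoid(-s)) for s in scores_negative)
    return loss


def training_schedule(labeled_nodes, all_nodes, b1=200, b2=200, iterations=100, seed=0):
    """Yield (left_batch, right_batch) for each of the `iterations` rounds.

    The left batch is drawn from labeled nodes only (step 4-2), the right
    batch from all UPG nodes (step 4-3). Batches are capped at the number
    of available nodes so that small toy graphs also work.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        left_batch = rng.sample(labeled_nodes, min(b1, len(labeled_nodes)))
        right_batch = rng.sample(all_nodes, min(b2, len(all_nodes)))
        yield left_batch, right_batch


all_nodes = list(range(500))
labeled_nodes = all_nodes[:50]
rounds = list(training_schedule(labeled_nodes, all_nodes))
```

In a full training loop, each round would compute the gradient of the left loss on `left_batch`, update the shared and left-branch parameters, then do the same for the right loss on `right_batch`.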
CN202110718740.9A 2021-06-28 2021-06-28 Cross-social network user identity recognition method based on joint embedded learning model Active CN113434782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110718740.9A CN113434782B (en) 2021-06-28 2021-06-28 Cross-social network user identity recognition method based on joint embedded learning model

Publications (2)

Publication Number Publication Date
CN113434782A (en) 2021-09-24
CN113434782B (en) 2022-03-01

Family

ID=77755095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110718740.9A Active CN113434782B (en) 2021-06-28 2021-06-28 Cross-social network user identity recognition method based on joint embedded learning model

Country Status (1)

Country Link
CN (1) CN113434782B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663245A (en) * 2022-03-16 2022-06-24 南京信息工程大学 Cross-social network identity matching method
CN114817757A (en) * 2022-04-02 2022-07-29 广州大学 Cross-social network virtual identity association method based on graph convolution network
CN116776193A (en) * 2023-05-17 2023-09-19 广州大学 Method and device for associating virtual identities across social networks based on attention mechanism

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140108152A1 (en) * 2012-10-12 2014-04-17 Google Inc. Managing Social Network Relationships Between A Commercial Entity and One or More Users
CN109753602A (en) * 2018-12-04 2019-05-14 中国科学院计算技术研究所 A kind of across social network user personal identification method and system based on machine learning
CN110347932A (en) * 2019-06-04 2019-10-18 中国科学院信息工程研究所 A kind of across a network user's alignment schemes based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIDONG WANG: "Factor Graph Model Based User Profile Matching Across Social Networks", 《IEEE ACCESS》 *
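WANG LIDONG: "User identity matching across social platforms based on the CLA algorithm" (基于CLA 算法的跨社交平台用户身份匹配), Computer Applications and Software (《计算机应用与软件》) *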

Also Published As

Publication number Publication date
CN113434782B (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN113434782B (en) Cross-social network user identity recognition method based on joint embedded learning model
CN110097125B (en) Cross-network account association method based on embedded representation
CN111753024B (en) Multi-source heterogeneous data entity alignment method oriented to public safety field
CN109857871B (en) User relationship discovery method based on social network mass contextual data
CN109753602B (en) Cross-social network user identity recognition method and system based on machine learning
CN107330461A (en) Collaborative filtering recommending method based on emotion with trust
WO2018112696A1 (en) Content pushing method and content pushing system
CN113628059B (en) Associated user identification method and device based on multi-layer diagram attention network
CN114817663B (en) Service modeling and recommendation method based on class perception graph neural network
CN112988917B (en) Entity alignment method based on multiple entity contexts
CN112084373B (en) Graph embedding-based multi-source heterogeneous network user alignment method
CN107391542A (en) A kind of open source software community expert recommendation method based on document knowledge collection of illustrative plates
CN113095948B (en) Multi-source heterogeneous network user alignment method based on graph neural network
CN112884045B (en) Classification method of random edge deletion embedded model based on multiple visual angles
CN109960755B (en) User privacy protection method based on dynamic iteration fast gradient
CN110136017A (en) A kind of group's discovery method based on data enhancing and nonnegative matrix sparse decomposition
CN113254652A (en) Social media posting authenticity detection method based on hypergraph attention network
CN113283243B (en) Entity and relationship combined extraction method
CN112749566B (en) Semantic matching method and device for English writing assistance
CN112561599A (en) Click rate prediction method based on attention network learning and fusing domain feature interaction
CN112905906A (en) Recommendation method and system fusing local collaboration and feature intersection
CN111563374A (en) Personnel social relationship extraction method based on judicial official documents
CN114154024B (en) Link prediction method based on dynamic network attribute representation
Ma et al. Friend closeness based user matching cross social networks
CN116049527A (en) Social network specific target account mining method oriented to military field

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230726

Address after: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee after: Guangzhou Dayu Chuangfu Technology Co.,Ltd.

Address before: Hangzhou City, Zhejiang province 310036 Xiasha Higher Education Park forest Street No. 16

Patentee before: HANGZHOU NORMAL UNIVERSITY QIANJIANG College
