CN105741175A - Method for linking accounts in OSNs (On-line Social Networks) - Google Patents
- Publication number
- CN105741175A (application number CN201610057577.5A)
- Authority
- CN
- China
- Prior art keywords
- account
- osn
- node
- social networks
- online social
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a method for linking accounts in a plurality of OSNs (On-line Social Networks) on the basis of node similarity. The friend relationship is extracted from OSN website accounts for linking the plurality of accounts, belonging to one real user, in different OSN platforms. The method comprises five parts: a data preprocessing part performs preprocessing on an account node relationship diagram of the OSNs; a node sequence extraction part obtains an account node sequence set through random walk; an account vector representation part generates a vector model of each account through a word to vector tool word2vec; a linear conversion matrix calculation part obtains a linear conversion matrix W from one OSN to another OSN by a gradient descent method; a linking account obtaining part maps the account in one OSN to a coordinate space of another OSN; and through similarity measure and threshold value screening, the linking accounts or a candidate set corresponding to all of the accounts are obtained. Errors of the account linking result caused by unreal feature information of the accounts are avoided, so that the account linking robustness is improved.
Description
Technical field
The invention belongs to the technical field of network information and, more specifically, relates to a method for associating accounts in online social networks.
Background art
Account association is the main technical means of discovering all the accounts that a single user may hold across various online social networks. It can be used to track malicious users who commit cybercrime so as to stop malicious events from spreading, to mine account information in depth so as to optimize recommendation methods, and to issue risk warnings after an account is found to be stolen so as to protect the user's accounts on other websites.
The features extracted by traditional account association methods include the personal information of the account itself, such as gender, age, and geographic location, and user behavior patterns, such as writing style, word usage habits, and mouse dwell time. However, because personal privacy is involved, personal information is often neither truthful nor complete, and data on user behavior patterns are mostly hard to obtain and contain errors, so the accuracy of traditional account association methods is not high. In addition, traditional methods require the account information to be basically true. Their approach is to enumerate the important feature attributes of an account so as to describe a user as completely as possible, build a model by feature extraction, filter out irrelevant candidate accounts with the model, and so find the account with the highest similarity as the associated account. Although traditional methods accomplish the task of account association, they require a large amount of input data and can hardly avoid the deviation that false account information introduces into the association result, so the robustness of account association is not high.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a method for associating accounts in online social networks that improves the robustness of account association.
To achieve this object, the method of the present invention for associating accounts in online social networks is characterized by comprising the following steps:
(1) According to requirements, determine the two online social networks OSN_X and OSN_Y for which account association is to be performed. For each network, represent the friend relationships among its accounts as an undirected graph, the account node relationship graph, composed of a node set V representing the accounts and an edge set E representing the friend relationships between accounts. This yields the account node relationship graphs RD_X and RD_Y of OSN_X and OSN_Y respectively;
(2) Traverse all nodes of the two account node relationship graphs RD_X and RD_Y to obtain the account node sequence sets WalkList_X and WalkList_Y of OSN_X and OSN_Y respectively;
For online social network OSN_X, traverse all nodes of the account node relationship graph RD_X, selecting each node in turn as the start node of a random walk. During the walk, a neighbor of the start node, or of the node just reached, is selected at random as the next hop, until the visited nodes form a node sequence of length L;
Each traversal yields a set of node sequences with different nodes as starting points. Traversing all nodes of RD_X repeatedly, many times over, yields the account node sequence set WalkList_X, which therefore contains multiple node sequences starting from any given node;
For online social network OSN_Y, the same method is used to obtain the account node sequence set WalkList_Y;
(3) Convert the two account node sequence sets WalkList_X and WalkList_Y separately with the word-to-vector tool Word2Vec to obtain the account vector models Model_X and Model_Y, which give a distributed representation of the accounts in an S-dimensional space (S generally ranges from tens to hundreds). Specifically:
All node sequences in the account node sequence set WalkList_X of OSN_X are input as the corpus into Word2Vec and converted according to the configured window (window) and dimension (size), giving the vector Vec_x_i corresponding to each account x_i. Each account x_i together with its vector Vec_x_i forms one entry of the account vector model Model_X of OSN_X, where x_i denotes the i-th account of OSN_X, i = 1, 2, ..., m, and m is the number of accounts in OSN_X;
The account node sequence set WalkList_Y is processed in the same way: each account y_j of OSN_Y together with its vector Vec_y_j forms one entry of the account vector model Model_Y of OSN_Y, where y_j denotes the j-th account of OSN_Y, j = 1, 2, ..., n, and n is the number of accounts in OSN_Y;
(4) Calculate the linear transformation matrix W between the coordinate spaces of the two online social networks OSN_X and OSN_Y
4.1) Build the training set RealPairL from known real account pairs <x_k, y_k> that belong to the same user in OSN_X and OSN_Y, where x_k denotes the account of the k-th shared user in OSN_X, y_k denotes the account of the k-th shared user in OSN_Y, and there are K shared users in total. Look up the vector Vec_x_k of account x_k in the account vector model Model_X and the vector Vec_y_k of account y_k in the account vector model Model_Y;
4.2) Use stochastic gradient descent to solve the following optimization problem:
$$\min_{W} \sum_{k=1}^{K} \left\lVert W\,\mathrm{Vec\_x}_k - \mathrm{Vec\_y}_k \right\rVert^{2}$$
First, W is initialized as an S × S matrix whose elements are all small random values, and then H iterations are carried out. In the h-th iteration (0 < h ≤ H), a sample point <Vec_x_k, Vec_y_k> is selected at random and the gradient T = (W^(h-1) Vec_x_k - Vec_y_k)(Vec_x_k)' is computed, where W^(h-1) is the linear transformation matrix after iteration h-1 and (Vec_x_k)' is the transpose of Vec_x_k. The linear transformation matrix is then updated: W^(h) = W^(h-1) - αT, where α is the learning rate. After a number of iterations the value of the sum in the optimization problem gradually converges; the number of iterations at that point is H, and the transformation matrix at that point is the required transformation matrix W.
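The update rule in step 4.2 can be checked with a short derivation. This is a sketch under an assumption: the objective is taken to be the sum of squared errors over the K training pairs, written with a factor of 1/2 for convenience (the factor only rescales the learning rate α). The shorthand x_k and y_k for Vec_x_k and Vec_y_k is introduced here for readability.

```latex
\begin{align}
J(W) &= \sum_{k=1}^{K} \tfrac{1}{2}\,\lVert W x_k - y_k \rVert^{2},
  \qquad x_k = \mathrm{Vec\_x}_k,\quad y_k = \mathrm{Vec\_y}_k \\
\frac{\partial}{\partial W}\,\tfrac{1}{2}\,\lVert W x_k - y_k \rVert^{2}
  &= (W x_k - y_k)\,x_k^{\top} = T \\
W^{(h)} &= W^{(h-1)} - \alpha\,T
\end{align}
```

Under this reading, T is the gradient of the error on one randomly chosen pair, which is what makes the procedure stochastic rather than full-batch gradient descent.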
(5) Account association
For each account x_i in online social network OSN_X, calculate:
b_i = W Vec_x_i;
b_i is the vector representation of node x_i of OSN_X in the coordinate space of OSN_Y. The cosine similarity function is then used to calculate the similarity between b_i and the vector Vec_y_j of each account in OSN_Y. The account y_jmax whose similarity is the largest and exceeds a set threshold is selected as the associated account of x_i or, depending on the application, the t accounts with the largest similarity (for example, t = 5) are selected as the candidate set.
The object of the present invention is achieved in this way.
The invention discloses a method, based on node similarity, for associating accounts across multiple online social networks (OSNs). Friend relationships are extracted from OSN website accounts to associate the multiple accounts that belong to the same real user on different OSN platforms. According to the account association requirements, the friend relationships among the accounts of each of two online social networking sites are extracted and represented as an undirected graph, called the account node relationship graph, composed of a node set V representing the accounts and an edge set E representing the friend relationships between accounts. Random walks on the graph then yield a set of account node sequences, from which the word-to-vector tool word2vec generates a vector model of the accounts. Next, accounts publicly known to be associated across the different OSNs are taken as the training set, and their vector representations in the different OSNs are used to find, by gradient descent, the linear transformation matrix W from one OSN to the other. The accounts of one OSN are mapped into the coordinate space of the other OSN, and similarity measurement and threshold screening give the associated account, or the corresponding candidate set, for every account. The invention can be used to track malicious users who commit cybercrime so as to stop malicious events from spreading, to mine account information in depth so as to optimize recommendation algorithms, and to issue risk warnings after an account is found to be stolen so as to protect the user's accounts on other websites.
The present invention has the following beneficial effects:
(1) Less input data. The method is based on node similarity and needs only the accounts (usually the account numbers) and the friend relationships (or follow relationships). It needs no other cumbersome information such as account profiles or behavioral features, and it avoids the errors that false account feature information introduces into the association result, which improves the robustness of account association;
(2) The coordinate transformation matrix of the invention places accounts from different online social networks in the same space for comparison. The calculation is simple and general: once the transformation matrix has been computed, it applies to all the different accounts of the source online social network. Moreover, given the transformation matrix W from one online social network to another, by symmetry the transformation matrix from the other network back to the first is the inverse of W and does not need to be computed again.
(3) The invention extracts friend relationships from OSN website accounts to associate the multiple accounts that belong to the same real user on different OSN platforms. This is a new approach to account association that uses only the friend relationships of the OSNs. The invention can also be combined with other account association methods to improve association accuracy.
Description of the drawings
Fig. 1 is a flow chart of one embodiment of the method of the present invention for associating accounts in online social networks;
Fig. 2 is a schematic diagram of one example of an account node relationship graph;
Fig. 3 is a flow chart of one embodiment of obtaining the account node sequence set by random walk.
Detailed description of the embodiments
Embodiments of the present invention are described below with reference to the drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
I. Basic principle
Although the information recorded for the associated accounts of one real user differs between two online social networks, the distribution of relationships among the accounts is the same. For example, three accounts on Sina Weibo are 1, 2, and 3, and their corresponding accounts on Renren are a1, a2, and a3; if 1, 2, and 3 are friends, then a1, a2, and a3 are friends as well. The invention therefore uses the friend relationships (or follow relationships, etc.) of accounts as the feature for discovering associated accounts, and builds an account node relationship graph. The node set of the graph is then traversed: each node in turn is selected as the start node of a random walk, and during the walk a neighbor of the start node, or of the node just reached, is selected at random as the next hop, until the visited nodes form a node sequence of length L that records account relationship information. Each traversal yields a set of node sequences with different nodes as starting points. Traversing the whole node set repeatedly, many times over, yields the account node sequence set WalkList_X, which contains multiple node sequences starting from any given node. Given enough time and steps, the more random walks are performed, the richer the account relationship information contained in the sequence set.
The S-dimensional real-valued representation of an account in the vector space (size = S, generally ranging from tens to hundreds) can be obtained with word2vec, and the cosine similarity between vectors can be used to measure the degree of association between accounts. The similarity of two accounts '1' and '2' in the same space can be compared with the function ModelX.most_similar('1', '2'), but accounts in two different online social networks, such as '1' in space X and 'a1' in space Y, lie in different vector spaces, so this function cannot be used to compare them.
A linear transformation is a mapping between vector spaces X and Y that preserves vector addition and scalar multiplication. The key is to find the mapping function y = Wx between the two vector spaces, where W is the transformation matrix. Because friend relationships are distributed similarly in different social networks, accounts in the spaces of different websites, whose similarity cannot be compared directly, can be placed in the same space by the linear transformation and compared there.
Finally, the linear transformation matrix is used to compare the vectors of the accounts in the two online social networks and find the associated account or, depending on the application, to select the t accounts with the largest similarity (for example, t = 5) as the candidate set.
II. Account association method
In this embodiment, as shown in Fig. 1, the method of the present invention for associating accounts in online social networks comprises five steps: data preprocessing (step S101), obtaining the account node sequence sets by random walk (step S102), distributed vector representation (step S103), finding the linear transformation matrix between the different OSNs (online social networks) (step S104), and calculating distances to obtain the associated accounts (step S105).
1. Data preprocessing
First, the online social networks (Online Social Networking, OSN for short) to be associated are determined according to requirements, such as Sina Weibo, Douban, Renren, Twitter, and Facebook. Friend relationships exist among the accounts of these OSNs and can be described as an undirected graph composed of a node set V representing the accounts and an edge set E representing the relationships between accounts. The invention calls this the account node relationship graph; Fig. 2 shows the one used in this embodiment.
In this embodiment, a web crawler reads the friend list of each account and stores the friend relationships of each account in a text document, giving the account friend relationships of the target online social networks, that is, of the two online social networks OSN_X and OSN_Y.
For convenience, only online social network OSN_X is used as the example below; the other online social network OSN_Y is processed in the same way.
For the account node relationship graph of the online social network (OSN_X) shown in Fig. 2, the text document obtained contains the following edges, one node pair per edge:
1 2, 1 3, 1 8, 2 3, 2 4, 3 5, 3 7, 4 5, 4 7, 5 6, 8 9, 8 10, 8 11, 9 10, 9 12, 10 11, 10 13, 11 12, 11 13, 11 14, 13 14.
Each edge in the text document is read and recorded in the graph edge dictionary dict, with each account (node) as a key and all the nodes connected to it as the value. The relationship graph of the accounts of a social network can thus be represented as a dictionary-type variable {account: (adjacent accounts)}. This variable is the account node relationship graph and serves as the input to the algorithm. Specifically: [dictionary listing missing from the source text]
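The construction of the graph edge dictionary can be sketched in Python as follows. This is a minimal illustration, not the patent's own listing: the function name build_graph is hypothetical, and the edge list assumes the run-together digits in the text separate into node pairs over nodes 1 to 14 (for example, "1011" is read as the edge between 10 and 11).

```python
from collections import defaultdict

# Edges of the Fig. 2 example graph, assuming the digit runs in the text
# split into pairs of node numbers between 1 and 14.
EDGES = [(1, 2), (1, 3), (1, 8), (2, 3), (2, 4), (3, 5), (3, 7), (4, 5),
         (4, 7), (5, 6), (8, 9), (8, 10), (8, 11), (9, 10), (9, 12),
         (10, 11), (10, 13), (11, 12), (11, 13), (11, 14), (13, 14)]


def build_graph(edges):
    """Return {account: tuple of adjacent accounts} for an undirected graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        # Friendship is undirected, so record the edge in both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return {node: tuple(sorted(neighbors))
            for node, neighbors in adjacency.items()}


graph = build_graph(EDGES)
print(graph[1])  # neighbors of account 1
```

Storing neighbors as a tuple mirrors the {account: (adjacent accounts)} shape described above and lets a random walk pick the next hop by index.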
2. Obtaining the account node sequence sets by random walk
Random walks are performed on the account node relationship graph represented by the graph edge dictionary, giving the account node sequence sets WalkList_X and WalkList_Y of the two social networking sites respectively.
As shown in Fig. 3, the inner loop traverses all nodes, taking each as the start node of a random walk. A neighbor of the start node, or of the node just reached, is selected at random as the next hop, until the visited nodes form a node sequence of length L. Taking Fig. 2 as an example, every node from 1 to 14 is traversed as the start node of a random walk with L = 7, which gives multiple sequences such as {1,8,11,14,13,11,12}, {2,4,7,3,5,6,5}, and {3,5,4,7,3,1,8}.
The larger the loop count, that is, the number of times the traversal of all nodes in the account node relationship graphs RD_X and RD_Y is repeated, the richer the account relationship information obtained and the higher the association accuracy. When the loop ends, a large node sequence set WalkList is obtained, composed of the numbers (account numbers) or words (account names) that represent the accounts.
The specific implementation comprises:
Step S201: initialize the parameters, including setting the loop count to 0 and presetting the number of loops;
Step S202: determine whether the loop count has reached the preset number of loops; if so, go to step S206; otherwise, go to step S203;
Step S203: determine whether all nodes have been traversed; if so, go to step S205; otherwise, go to step S204;
Step S204: perform a random walk with the node being traversed as the start node until the visited nodes form a node sequence of length L, then return to step S203;
Step S205: add 1 to the loop count and return to step S202;
Step S206: return the account node sequence set.
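Steps S201 to S206 can be sketched as two nested loops. The code below is an illustration under assumptions: the function name random_walks, the seed parameter, and the small toy_graph are hypothetical and not part of the patent, and a walk is cut short if it reaches a node with no neighbors, a case the text does not address.

```python
import random


def random_walks(graph, walk_length, num_loops, seed=None):
    """Return a list of node sequences (the WalkList) from random walks.

    graph maps each node to a tuple of its neighbors.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_loops):            # S202 / S205: outer loop counter
        for start in graph:               # S203: visit every node once
            walk = [start]                # S204: walk from this start node
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:         # isolated node: stop early (assumption)
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks                          # S206


# Hypothetical four-node graph used only to exercise the function.
toy_graph = {1: (2, 3), 2: (1, 3), 3: (1, 2, 4), 4: (3,)}
walks = random_walks(toy_graph, walk_length=7, num_loops=2, seed=0)
print(len(walks))  # one walk per node per loop
```

With num_loops = 2, every node starts two walks, which matches the statement that the sequence set contains multiple sequences beginning at the same node.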
The following is the account node sequence set WalkList_X for a loop count of 2 and a walk length of 7: [listing missing from the source text]
3. Distributed vector representation
The two account node sequence sets WalkList_X and WalkList_Y are converted separately with the word-to-vector tool Word2Vec to obtain the vector models Model_X and Model_Y, which give an S-dimensional distributed representation of the accounts in a high-dimensional space. Specifically:
With the window (window) and dimension (size = S) configured, all the node sequences of length L in the account node sequence set are input as the corpus into Word2Vec and converted, giving the vectors Vec_x_i and Vec_y_j corresponding to the accounts x_i and y_j. Each account x_i together with its vector Vec_x_i forms one entry of the account vector model Model_X of online social network OSN_X, where x_i denotes the i-th account of OSN_X, i = 1, 2, ..., m, and m is the number of accounts in OSN_X. The account node sequence set WalkList_Y is processed in the same way: each account y_j of OSN_Y together with its vector Vec_y_j forms one entry of the account vector model Model_Y of OSN_Y, where y_j denotes the j-th account of OSN_Y, j = 1, 2, ..., n, and n is the number of accounts in OSN_Y.
Word2vec is a tool that converts words (text) into vectors. Its detailed procedure belongs to the prior art and is not repeated here.
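Although the text leaves Word2Vec's internals to the prior art, the role of the window parameter can be illustrated. The sketch below assumes the skip-gram variant with a fixed symmetric window, which is a simplification (the text does not say which variant is used, and real implementations typically vary the effective window). The function name skipgram_pairs is hypothetical. It lists the (account, context account) pairs that one walk contributes as training examples.

```python
def skipgram_pairs(walk, window):
    """Return (center, context) pairs for one node sequence.

    Every node within `window` positions of the center node counts as
    its context, on both sides.
    """
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# One of the example walks from the text, with accounts as string tokens.
walk = ["1", "8", "11", "14", "13", "11", "12"]
pairs = skipgram_pairs(walk, window=2)
print(pairs[:3])  # [('1', '8'), ('1', '11'), ('8', '1')]
```

Accounts that often appear within the same window across many walks end up with similar vectors, which is why walks over the friend graph carry relationship information into the vector model.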
In this embodiment, size (S) = 4 and window = 2, and the resulting model Model {account: (S-dimensional vector)} is as follows: [listing missing from the source text]
Note that in this embodiment vectors are written as rows (for ease of typesetting), but in the present invention each vector is a column vector. If row vectors are used, the formulas below need to be adjusted accordingly, that is, the linear transformation matrix must be placed after the vector in the multiplication.
4. Finding the linear transformation matrix between the coordinate spaces of the different OSNs (online social networks)
4.1) Build the training set RealPairL from known real account pairs <x_k, y_k> that belong to the same user in OSN_X and OSN_Y, where x_k denotes the account of the k-th shared user in OSN_X, y_k denotes the account of the k-th shared user in OSN_Y, and there are K shared users in total. Look up the vector Vec_x_k of account x_k in the account vector model Model_X and the vector Vec_y_k of account y_k in the account vector model Model_Y. The linear transformation matrix W is initialized as the identity matrix and denoted W1;
4.2) Use stochastic gradient descent to solve the following optimization problem:
$$\min_{W} \sum_{k=1}^{K} \left\lVert W\,\mathrm{Vec\_x}_k - \mathrm{Vec\_y}_k \right\rVert^{2}$$
First, W is initialized as an S × S matrix whose elements are all random values, and then H iterations are carried out. In the h-th iteration (0 < h ≤ H), a sample point <Vec_x_k, Vec_y_k> is selected at random and the gradient T = (W^(h-1) Vec_x_k - Vec_y_k)(Vec_x_k)' is computed, where W^(h-1) is the linear transformation matrix after iteration h-1 and (Vec_x_k)' is the transpose of Vec_x_k. The linear transformation matrix is then updated: W^(h) = W^(h-1) - αT, where α is the learning rate. After a number of iterations the value of the sum in the optimization problem gradually converges; the number of iterations at that point is H, and the transformation matrix at that point is the required transformation matrix W;
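The iteration in step 4.2 can be sketched in plain Python. This is an illustration under assumptions, not the patent's implementation: the function name train_linear_map, the learning rate, the iteration count, and the toy training pairs are all hypothetical, the objective is assumed to be the squared error between W Vec_x_k and Vec_y_k, and vectors are treated as columns so that W multiplies from the left.

```python
import random


def train_linear_map(pairs, dim, alpha=0.05, iterations=5000, seed=0):
    """Fit W (dim x dim, list of rows) so that W @ x is close to y.

    pairs is a list of (vec_x, vec_y) tuples of length-`dim` float lists.
    """
    rng = random.Random(seed)
    # Small random initial values, as described in the text.
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(dim)]
    for _ in range(iterations):
        x, y = rng.choice(pairs)                      # one random sample
        # Residual r = W x - y.
        r = [sum(W[i][j] * x[j] for j in range(dim)) - y[i]
             for i in range(dim)]
        # Gradient T = r x^T, then update W <- W - alpha * T.
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= alpha * r[i] * x[j]
    return W


# Toy data: targets generated by a known matrix so the fit can be checked.
true_W = [[0.0, 1.0], [-1.0, 0.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
pairs = [(x, [sum(true_W[i][j] * x[j] for j in range(2)) for i in range(2)])
         for x in xs]
W = train_linear_map(pairs, dim=2)
print([[round(w, 3) for w in row] for row in W])
```

On noise-free toy data the learned matrix approaches the generating matrix; with real account vectors the error would only decrease to whatever level a linear map can reach.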
In this embodiment, the known association file trainConnect.txt is input as the training set. Its content is as follows: 1 a1, 3 a3, 4 a4, 6 a6, 9 a9, 12 a12, 13 a13.
Then, according to the account pairs of this training set, the corresponding vectors are looked up in the vector models Model_X and Model_Y obtained in step 3. Each row is one vector (4-dimensional), and a 1 is appended to make it 5-dimensional. For example, Model_X[1] = [0.39482608 0.51815981 -0.23675969 0.38197696], but the row corresponding to account 1 is [0.39482608 0.51815981 -0.23675969 0.38197696 1]. In this embodiment, the resulting linear transformation matrix W (5 × 5) is:
W = [[-0.01109113  0.37251355 -0.43007925 -0.05281413  0.]
 [ 0.84859938  0.31506663  0.24449666 -0.82006226  0.]
 [ 0.04590003 -0.64139036 -0.21258527  0.64892452  0.]
 [ 0.23426961 -0.07661551 -0.21128366  0.71278242  0.]
 [-0.32104064 -0.27574391  0.1407943   0.2562349   1.]]
5. Calculating distances to obtain the associated accounts
When finding the accounts associated with the same user in different online social networks (OSNs), the similarity of accounts in different spaces cannot be compared directly. The account vectors must first be transformed into the same space by the linear matrix operation; the distance (similarity) between the accounts of the two OSNs is then compared in that space to find the associated accounts.
For each account x_i in online social network OSN_X, calculate:
b_i = W Vec_x_i;
b_i is the vector representation of node x_i of OSN_X in the coordinate space of OSN_Y. The cosine similarity function is then used to calculate the distance between b_i and the vector Vec_y_j of each account in OSN_Y. The account y_jmax with the smallest distance, that is, the largest similarity, provided that the similarity exceeds a set threshold, is selected as the associated account of x_i or, depending on the application, the t accounts with the largest similarity (for example, t = 5) are selected as the candidate set.
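The mapping and matching step can be sketched as follows. The function names, the threshold value of 0.8, and the toy vectors are assumptions made for illustration; the text specifies only that cosine similarity, a threshold, and a top-t candidate set are used.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_vector(W, x):
    """Compute b = W x for a matrix given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def link_account(W, vec_x, model_y, threshold=0.8, top_t=5):
    """Return (associated account or None, list of top-t candidates)."""
    b = map_vector(W, vec_x)
    ranked = sorted(((cosine(b, vec_y), name)
                     for name, vec_y in model_y.items()), reverse=True)
    best_sim, best_name = ranked[0]
    linked = best_name if best_sim > threshold else None
    return linked, [name for _, name in ranked[:top_t]]


# Toy example: W rotates the X space onto the Y space.
W = [[0.0, 1.0], [-1.0, 0.0]]
model_y = {"a1": [0.0, -1.0], "a2": [1.0, 0.0], "a3": [-1.0, 0.2]}
linked, candidates = link_account(W, [1.0, 0.0], model_y, top_t=2)
print(linked, candidates)  # a1 ['a1', 'a2']
```

Returning None when no similarity clears the threshold keeps weak matches out of the single-account result while still exposing them through the candidate list.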
III. Example verification
In this embodiment, friend relationships exist among the accounts of many popular online social networks, such as Sina Weibo, Douban, Renren, Twitter, and Facebook, and this information can be described as a graph composed of a node set V representing the accounts and an edge set E representing the relationships between accounts. The data set used in this example comes from https://snap.stanford.edu/data/egonets-Facebook.html. A text document containing 1034 nodes and 53498 edges is used as the input of online social network OSN_X, with each node represented by a number.
Online social network OSN_Y is constructed as follows. An 'a' is prefixed to the numbers that denote the nodes of OSN_X, to distinguish them from the nodes of OSN_X. Then 10% of all edges are deleted at random; 10% of the nodes are deleted at random together with all edges incident to them; 10% new nodes are added (numbered upward from the largest node number of OSN_X, to avoid confusion with the original nodes), each with 50 randomly generated edges; and, apart from the new nodes, 5 edges are randomly generated for each node of the original node set. All nodes, whether changed or not (counting both nodes and edges), for example node '1' in OSN_X and its corresponding node 'a1' in OSN_Y, are taken as account association pairs and form a set, of which 70% is used as the training set and 30% as the test set.
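The construction of OSN_Y can be sketched in Python. Several details are assumptions rather than statements from the text: the function names are hypothetical, new edges are assumed to connect only to surviving original nodes, only surviving original nodes are paired with their OSN_X counterparts, and a small ring graph stands in for the Facebook data.

```python
import random


def edge(u, v):
    """Canonical (ordered) form of an undirected edge."""
    return (u, v) if u < v else (v, u)


def make_osn_y(edges_x, seed=0, frac=0.10, k_new=50, k_old=5):
    """Perturb OSN_X into OSN_Y; return (edges_y, train_pairs, test_pairs)."""
    rng = random.Random(seed)
    old_ids = sorted({n for e in edges_x for n in e})
    edges = {edge("a%d" % u, "a%d" % v) for u, v in edges_x}
    # Delete 10% of the edges at random.
    edges -= set(rng.sample(sorted(edges), int(frac * len(edges))))
    # Delete 10% of the nodes and every edge that touches them.
    names = ["a%d" % n for n in old_ids]
    removed = set(rng.sample(names, int(frac * len(names))))
    edges = {e for e in edges if e[0] not in removed and e[1] not in removed}
    kept = [n for n in names if n not in removed]
    # Add 10% new nodes, numbered after the largest OSN_X node number.
    first_new = max(old_ids) + 1
    for i in range(int(frac * len(old_ids))):
        node = "a%d" % (first_new + i)
        for other in rng.sample(kept, min(k_new, len(kept))):
            edges.add(edge(node, other))
    # Add up to k_old random edges for each surviving original node.
    for node in kept:
        others = [n for n in kept if n != node]
        for other in rng.sample(others, min(k_old, len(others))):
            edges.add(edge(node, other))
    # Pair each surviving node with its OSN_X counterpart; split 70/30.
    pairs = [(int(name[1:]), name) for name in kept]
    rng.shuffle(pairs)
    cut = int(0.7 * len(pairs))
    return edges, pairs[:cut], pairs[cut:]


# Stand-in for OSN_X: a ring of 40 accounts numbered 0..39.
edges_x = [(i, (i + 1) % 40) for i in range(40)]
edges_y, train, test = make_osn_y(edges_x)
print(len(train), len(test))  # 25 11
```

Because the perturbation is random, the seed is fixed here so that the split sizes and node sets are reproducible from run to run.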
With the method of the present invention for associating accounts in online social networks, all the accounts in the 30% test set were successfully associated, which shows that the method has strong robustness.
Although illustrative embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of those embodiments. To those of ordinary skill in the art, various changes are obvious as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.
Claims (3)
1. A method for associating accounts in online social networks, characterized by comprising the following steps:
(1) Determine, according to requirements, the two online social networks OSN_X and OSN_Y between which accounts are to be associated. For each network, represent the friend relationships among its accounts as an undirected graph, the account node relationship graph, composed of a node set V representing the accounts and an edge set E representing the friend relationships between accounts. This yields the account node relationship graphs RD_X and RD_Y of OSN_X and OSN_Y respectively;
(2) Traverse all nodes in each of the two account node relationship graphs RD_X and RD_Y to obtain the account node sequence sets WalkList_X and WalkList_Y of OSN_X and OSN_Y respectively;
For the online social network OSN_X, traverse all nodes in the account node relationship graph RD_X, selecting each node in turn as the start node of a random walk. During the walk, one neighbor of the start node (or of the node just reached) is chosen at random as the next hop, until the nodes visited form a node sequence of length L. Each traversal yields a set of node sequences with different nodes as starting points. Looping over all nodes of RD_X multiple times yields the account node sequence set WalkList_X, which therefore contains multiple node sequences starting from any given node;
For the online social network OSN_Y, the same procedure is applied, so that both account node sequence sets WalkList_X and WalkList_Y are obtained;
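The random-walk traversal of step (2) might look like the following sketch. The function name `random_walks`, the `passes` parameter (number of full traversals), the seeded generator, and the decision to stop early at an isolated node are assumptions made for illustration; the claim itself only fixes the walk length L and the uniform choice of the next hop.

```python
import random


def random_walks(graph, walk_length, passes, seed=0):
    """Return node sequences produced by uniform random walks over `graph`.

    `graph` maps each node to a list of its neighbours (undirected graph).
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(passes):              # repeated traversals of all nodes
        for start in graph:              # each node is a start node once per pass
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:       # isolated node: stop early (assumption)
                    break
                walk.append(rng.choice(neighbours))
            # Store nodes as strings so the walks can serve as a text corpus.
            walks.append([str(n) for n in walk])
    return walks
```

With `passes` traversals, every node appears as the first element of exactly `passes` sequences, which matches the remark that several sequences start from the same node.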
(3) Convert the two account node sequence sets WalkList_X and WalkList_Y separately with the word-to-vector tool Word2Vec, obtaining account vector models Model_X and Model_Y that give each account a distributed representation in an S-dimensional space (S typically ranges from tens to hundreds), specifically:
All node sequences in the account node sequence set WalkList_X of OSN_X are fed to Word2Vec as the corpus and converted according to the configured window (window) and dimension (size) parameters, giving the vector Vec_x_i corresponding to each account x_i. Each account x_i and its vector Vec_x_i form one entry of the account vector model Model_X of OSN_X, where x_i denotes the i-th account of OSN_X, i = 1, 2, ..., m, and m is the number of accounts in OSN_X;
The account node sequence set WalkList_Y is processed in the same way: each account y_j of OSN_Y and its vector Vec_y_j form one entry of the account vector model Model_Y of OSN_Y, where y_j denotes the j-th account of OSN_Y, j = 1, 2, ..., n, and n is the number of accounts in OSN_Y;
(4) Compute the linear transformation matrix W between the coordinate spaces of the two online social networks OSN_X and OSN_Y
4.1) Build the training set RealPairL from known real account associations <x_k, y_k> between accounts that belong to the same user in OSN_X and OSN_Y, where x_k denotes the account of the k-th shared user in OSN_X, y_k denotes the account of the k-th shared user in OSN_Y, and there are K shared users in total. Look up the vector Vec_x_k of account x_k in the account vector model Model_X and the vector Vec_y_k of account y_k in the account vector model Model_Y;
4.2) Use stochastic gradient descent to solve the optimization problem for W:
First, initialize W as an S × S matrix whose elements are random values, then perform H iterations. In the h-th iteration (0 < h ≤ H), randomly select a sample point <Vec_x_k, Vec_y_k> and compute the gradient T = (W^(h-1) Vec_x_k - Vec_y_k)(Vec_x_k)', where W^(h-1) is the linear transformation matrix after iteration h-1 and (Vec_x_k)' is the transpose of Vec_x_k. Next, update the linear transformation matrix: W^(h) = W^(h-1) - αT, where α is the learning rate. After a number of iterations the value of the summation in the optimization problem gradually converges; the iteration count at that point is H, and the transformation matrix at that point is the required transformation matrix W;
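The update rule of step 4.2 can be illustrated with the short sketch below. The objective itself is not reproduced in the text; the sketch assumes a least-squares objective, the sum over k of ||W Vec_x_k - Vec_y_k||^2, whose per-sample gradient agrees with the stated T up to a constant factor that can be absorbed into the learning rate. The names `learn_transform` and `total_squared_error`, the initialization range, and the use of a fixed iteration count instead of a convergence check are hypothetical.

```python
import random


def learn_transform(pairs, dim, iterations, learning_rate, seed=0):
    """Fit a dim x dim matrix W so that W x is close to y for each (x, y) pair."""
    rng = random.Random(seed)
    # Initialise every element of W with a small random value.
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(iterations):
        x, y = rng.choice(pairs)         # one randomly selected sample point
        # Residual r = W x - y for the sampled pair.
        r = [sum(W[i][j] * x[j] for j in range(dim)) - y[i] for i in range(dim)]
        # Gradient T = r x', followed by the update W <- W - learning_rate * T.
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= learning_rate * r[i] * x[j]
    return W


def total_squared_error(W, pairs):
    """Sum of ||W x - y||^2 over all pairs (the assumed objective value)."""
    total = 0.0
    for x, y in pairs:
        for i, row in enumerate(W):
            r = sum(w * xj for w, xj in zip(row, x)) - y[i]
            total += r * r
    return total
```

In practice one would monitor `total_squared_error` between iterations and stop once it no longer decreases noticeably, which corresponds to the convergence criterion described in the claim.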
(5) Account association
For each account x_i in the online social network OSN_X, compute:
b_i = W Vec_x_i;
b_i is the vector representation of node x_i of OSN_X in the coordinate space of OSN_Y. Then use the cosine similarity function to compute the similarity between b_i and the vector Vec_y_j of each account in OSN_Y. Select the account y_jmax whose similarity is the largest and exceeds the set threshold as the account associated with x_i, or, depending on the purpose, select the t accounts with the highest similarity (for example, t = 5) as the candidate set.
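Step (5) can be sketched as below. The function names `cosine` and `associate`, the dictionary layout of `model_y` (account name mapped to its vector), and the default threshold of 0.8 are illustrative assumptions; the claim only requires "a set threshold" and a top-t candidate list. The sketch also assumes `model_y` is non-empty.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(vec_x, W, model_y, threshold=0.8, top_t=5):
    """Map vec_x into OSN_Y's space and rank OSN_Y accounts by cosine similarity."""
    # b = W vec_x: the OSN_X account expressed in OSN_Y's coordinate space.
    b = [sum(w * x for w, x in zip(row, vec_x)) for row in W]
    ranked = sorted(((cosine(b, vec), account) for account, vec in model_y.items()),
                    reverse=True)
    candidates = [account for _, account in ranked[:top_t]]
    best_score, best_account = ranked[0]
    # Accept the best account only if its similarity exceeds the threshold.
    match = best_account if best_score > threshold else None
    return match, candidates
```

Returning both the thresholded match and the top-t list lets the caller pick whichever output suits the application, mirroring the two alternatives in the claim.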
2. The method for associating accounts according to claim 1, characterized in that the account node relationship graph is obtained as follows:
2.1) Use a web crawler to read the friend list of each account and store the friend relationships of each account in a text document, thereby obtaining the account friend relationships of the target online social networks, namely the two online social networks OSN_X and OSN_Y;
2.2) Read each edge record in the text document into the graph-edge dictionary dict, using each account (node) as the key and all nodes connected to it as the value. The friend relationship graph of a social network's accounts can then be represented as a dictionary-type variable {account: (adjacent accounts)}, and this variable is the account node relationship graph.
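A minimal sketch of step 2.2 follows. The record format (one edge per line, two whitespace-separated account identifiers) and the function name `load_graph` are assumptions, since the claim does not specify how the crawler lays out the text document.

```python
from collections import defaultdict


def load_graph(lines):
    """Build {account: set of adjacent accounts} from 'u v' edge records."""
    graph = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue                 # skip blank or malformed records
        u, v = parts
        graph[u].add(v)
        graph[v].add(u)              # friendship is undirected
    return dict(graph)
```

Because the function accepts any iterable of strings, it can be called directly on an open text file as well as on an in-memory list of records.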
3. The method for associating accounts according to claim 1, characterized in that the account association candidate set is obtained as follows:
Use accounts in the two OSNs that are publicly known to belong to the same user entity as the training set, and use gradient descent to obtain the transformation matrix W. The coordinates of a node in one OSN are then transformed into the coordinate system of the other OSN, so that the distance between two account vectors can be compared using the cosine similarity cos(b_i, Vec_y_j), giving the candidate set of associated accounts in the other OSN for a given account in one OSN. The greater the similarity, the smaller the distance and the stronger the association between accounts Vec_y_j and Vec_x_i. The node with the highest similarity, provided that similarity exceeds the given threshold, is determined to be the associated account; alternatively, depending on the purpose, the user may select the t accounts with the highest similarity (for example, t = 5) as the association candidate set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610057577.5A CN105741175B (en) | 2016-01-27 | 2016-01-27 | A method of account in online social networks is associated |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610057577.5A CN105741175B (en) | 2016-01-27 | 2016-01-27 | A method of account in online social networks is associated |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105741175A true CN105741175A (en) | 2016-07-06 |
CN105741175B CN105741175B (en) | 2019-08-20 |
Family
ID=56246762
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610057577.5A Expired - Fee Related CN105741175B (en) | 2016-01-27 | 2016-01-27 | A method of account in online social networks is associated |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105741175B (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107330020A (en) * | 2017-06-20 | 2017-11-07 | 电子科技大学 | A kind of user subject analytic method based on structure and attributes similarity |
CN107392782A (en) * | 2017-06-29 | 2017-11-24 | 上海斐讯数据通信技术有限公司 | Corporations' construction method, device and computer-processing equipment based on word2Vec |
CN108021610A (en) * | 2017-11-02 | 2018-05-11 | 阿里巴巴集团控股有限公司 | Random walk, random walk method, apparatus and equipment based on distributed system |
CN108022171A (en) * | 2016-10-31 | 2018-05-11 | 腾讯科技(深圳)有限公司 | A kind of data processing method and equipment |
CN108073687A (en) * | 2017-11-17 | 2018-05-25 | 阿里巴巴集团控股有限公司 | Random walk, random walk method, apparatus and equipment based on cluster |
CN108985309A (en) * | 2017-05-31 | 2018-12-11 | 腾讯科技(深圳)有限公司 | A kind of data processing method and device |
CN109242515A (en) * | 2018-08-29 | 2019-01-18 | 阿里巴巴集团控股有限公司 | Cross-platform abnormal account recognition methods and device |
WO2019051962A1 (en) * | 2017-09-14 | 2019-03-21 | 平安科技(深圳)有限公司 | Real relationship matching method and apparatus for social platform users, and readable storage medium |
CN109658094A (en) * | 2017-10-10 | 2019-04-19 | 阿里巴巴集团控股有限公司 | Random walk, random walk method, apparatus and equipment based on cluster |
CN109739938A (en) * | 2018-12-28 | 2019-05-10 | 广州华多网络科技有限公司 | A kind of correlating method, device and the equipment of more accounts |
CN110019975A (en) * | 2017-10-10 | 2019-07-16 | 阿里巴巴集团控股有限公司 | Random walk, random walk method, apparatus and equipment based on cluster |
CN110046194A (en) * | 2019-03-19 | 2019-07-23 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus and electronic equipment of expanding node relational graph |
CN110162956A (en) * | 2018-03-12 | 2019-08-23 | 华东师范大学 | The method and apparatus for determining interlock account |
CN110515986A (en) * | 2019-08-27 | 2019-11-29 | 腾讯科技(深圳)有限公司 | A kind of processing method of social network diagram, device and storage medium |
CN111090814A (en) * | 2019-12-30 | 2020-05-01 | 四川大学 | Iterative cross-social network user account correlation method based on degree punishment |
CN111143701A (en) * | 2019-12-13 | 2020-05-12 | 中国电子科技网络信息安全有限公司 | Social network user recommendation method and system based on multiple dimensions |
CN111177248A (en) * | 2020-04-10 | 2020-05-19 | 上海飞旗网络技术股份有限公司 | Data storage method and device based on feature recognition and format conversion |
CN111192154A (en) * | 2019-12-25 | 2020-05-22 | 西安交通大学 | Social network user node matching method based on style migration |
CN111368013A (en) * | 2020-06-01 | 2020-07-03 | 深圳市卡牛科技有限公司 | Unified identification method, system, equipment and storage medium based on multiple accounts |
CN111915429A (en) * | 2020-08-11 | 2020-11-10 | 北京开科唯识技术有限公司 | Account checking method and device |
CN112232834A (en) * | 2020-09-29 | 2021-01-15 | 中国银联股份有限公司 | Resource account determination method, device, equipment and medium |
WO2021043093A1 (en) * | 2019-09-02 | 2021-03-11 | 平安科技(深圳)有限公司 | Method and apparatus for associating and registering multiple accounts, computer device and storage medium |
CN112819056A (en) * | 2021-01-25 | 2021-05-18 | 百果园技术(新加坡)有限公司 | Group control account mining method, device, equipment and storage medium |
CN112861015A (en) * | 2019-11-27 | 2021-05-28 | 北京达佳互联信息技术有限公司 | Account associated information acquisition method and device in application program and electronic equipment |
CN116090525A (en) * | 2022-11-15 | 2023-05-09 | 广东工业大学 | Embedded vector representation method and system based on hierarchical random walk sampling strategy |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009088671A1 (en) * | 2008-01-04 | 2009-07-16 | Yahoo! Inc. | Identifying and employing social network relationships |
CN102457501A (en) * | 2010-10-26 | 2012-05-16 | 腾讯科技(深圳)有限公司 | Identification method and system for instant messaging account |
CN103927303A (en) * | 2013-01-10 | 2014-07-16 | 华为技术有限公司 | Method and device for searching accounts |
CN104052651A (en) * | 2014-06-03 | 2014-09-17 | 西安交通大学 | Method and device for building social contact group |
CN104573057A (en) * | 2015-01-22 | 2015-04-29 | 电子科技大学 | Account correlation method used for UGC (User Generated Content)-spanning website platform |
CN104765729A (en) * | 2014-01-02 | 2015-07-08 | 中国人民大学 | Cross-platform micro-blogging community account matching method |
Non-Patent Citations (1)
Title |
---|
FREDRIK JOHANSSON 等: "Detecting Multiple Aliases in Social Media", 《2013 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2013)》 * |
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108022171B (en) * | 2016-10-31 | 2021-10-15 | 腾讯科技(深圳)有限公司 | Data processing method and equipment |
CN108022171A (en) * | 2016-10-31 | 2018-05-11 | 腾讯科技(深圳)有限公司 | A kind of data processing method and equipment |
CN108985309B (en) * | 2017-05-31 | 2022-11-29 | 腾讯科技(深圳)有限公司 | Data processing method and device |
CN108985309A (en) * | 2017-05-31 | 2018-12-11 | 腾讯科技(深圳)有限公司 | A kind of data processing method and device |
CN107330020A (en) * | 2017-06-20 | 2017-11-07 | 电子科技大学 | A kind of user subject analytic method based on structure and attributes similarity |
CN107330020B (en) * | 2017-06-20 | 2020-03-24 | 电子科技大学 | User entity analysis method based on structure and attribute similarity |
CN107392782A (en) * | 2017-06-29 | 2017-11-24 | 上海斐讯数据通信技术有限公司 | Corporations' construction method, device and computer-processing equipment based on word2Vec |
WO2019051962A1 (en) * | 2017-09-14 | 2019-03-21 | 平安科技(深圳)有限公司 | Real relationship matching method and apparatus for social platform users, and readable storage medium |
CN109658094B (en) * | 2017-10-10 | 2020-09-18 | 阿里巴巴集团控股有限公司 | Random walk, random walk method based on cluster, random walk device and equipment |
US10776334B2 (en) | 2017-10-10 | 2020-09-15 | Alibaba Group Holding Limited | Random walking and cluster-based random walking method, apparatus and device |
US10901971B2 (en) | 2017-10-10 | 2021-01-26 | Advanced New Technologies Co., Ltd. | Random walking and cluster-based random walking method, apparatus and device |
CN110019975A (en) * | 2017-10-10 | 2019-07-16 | 阿里巴巴集团控股有限公司 | Random walk, random walk method, apparatus and equipment based on cluster |
CN109658094A (en) * | 2017-10-10 | 2019-04-19 | 阿里巴巴集团控股有限公司 | Random walk, random walk method, apparatus and equipment based on cluster |
WO2019085614A1 (en) * | 2017-11-02 | 2019-05-09 | 阿里巴巴集团控股有限公司 | Random walking, and random walking method, apparatus and device based on distributed system |
CN108021610A (en) * | 2017-11-02 | 2018-05-11 | 阿里巴巴集团控股有限公司 | Random walk, random walk method, apparatus and equipment based on distributed system |
US11074246B2 (en) | 2017-11-17 | 2021-07-27 | Advanced New Technologies Co., Ltd. | Cluster-based random walk processing |
WO2019095858A1 (en) * | 2017-11-17 | 2019-05-23 | 阿里巴巴集团控股有限公司 | Random walk method, apparatus and device, and cluster-based random walk method, apparatus and device |
CN108073687B (en) * | 2017-11-17 | 2020-09-08 | 阿里巴巴集团控股有限公司 | Random walk, random walk method based on cluster, random walk device and equipment |
TWI709049B (en) * | 2017-11-17 | 2020-11-01 | 開曼群島商創新先進技術有限公司 | Random walk, cluster-based random walk method, device and equipment |
CN108073687A (en) * | 2017-11-17 | 2018-05-25 | 阿里巴巴集团控股有限公司 | Random walk, random walk method, apparatus and equipment based on cluster |
CN110162956B (en) * | 2018-03-12 | 2024-01-19 | 华东师范大学 | Method and device for determining associated account |
CN110162956A (en) * | 2018-03-12 | 2019-08-23 | 华东师范大学 | The method and apparatus for determining interlock account |
CN109242515A (en) * | 2018-08-29 | 2019-01-18 | 阿里巴巴集团控股有限公司 | Cross-platform abnormal account recognition methods and device |
CN109242515B (en) * | 2018-08-29 | 2021-07-23 | 创新先进技术有限公司 | Cross-platform abnormal account identification method and device |
CN109739938A (en) * | 2018-12-28 | 2019-05-10 | 广州华多网络科技有限公司 | A kind of correlating method, device and the equipment of more accounts |
CN110046194A (en) * | 2019-03-19 | 2019-07-23 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus and electronic equipment of expanding node relational graph |
CN110515986A (en) * | 2019-08-27 | 2019-11-29 | 腾讯科技(深圳)有限公司 | A kind of processing method of social network diagram, device and storage medium |
CN110515986B (en) * | 2019-08-27 | 2023-01-06 | 腾讯科技(深圳)有限公司 | Processing method and device of social network diagram and storage medium |
WO2021043093A1 (en) * | 2019-09-02 | 2021-03-11 | 平安科技(深圳)有限公司 | Method and apparatus for associating and registering multiple accounts, computer device and storage medium |
CN112861015A (en) * | 2019-11-27 | 2021-05-28 | 北京达佳互联信息技术有限公司 | Account associated information acquisition method and device in application program and electronic equipment |
CN111143701A (en) * | 2019-12-13 | 2020-05-12 | 中国电子科技网络信息安全有限公司 | Social network user recommendation method and system based on multiple dimensions |
CN111192154A (en) * | 2019-12-25 | 2020-05-22 | 西安交通大学 | Social network user node matching method based on style migration |
CN111192154B (en) * | 2019-12-25 | 2023-05-02 | 西安交通大学 | Social network user node matching method based on style migration |
CN111090814B (en) * | 2019-12-30 | 2021-02-09 | 四川大学 | Iterative cross-social network user account correlation method based on degree punishment |
CN111090814A (en) * | 2019-12-30 | 2020-05-01 | 四川大学 | Iterative cross-social network user account correlation method based on degree punishment |
CN111177248A (en) * | 2020-04-10 | 2020-05-19 | 上海飞旗网络技术股份有限公司 | Data storage method and device based on feature recognition and format conversion |
CN111368013A (en) * | 2020-06-01 | 2020-07-03 | 深圳市卡牛科技有限公司 | Unified identification method, system, equipment and storage medium based on multiple accounts |
CN111915429A (en) * | 2020-08-11 | 2020-11-10 | 北京开科唯识技术有限公司 | Account checking method and device |
CN112232834A (en) * | 2020-09-29 | 2021-01-15 | 中国银联股份有限公司 | Resource account determination method, device, equipment and medium |
CN112232834B (en) * | 2020-09-29 | 2024-04-26 | 中国银联股份有限公司 | Resource account determination method, device, equipment and medium |
CN112819056A (en) * | 2021-01-25 | 2021-05-18 | 百果园技术(新加坡)有限公司 | Group control account mining method, device, equipment and storage medium |
CN116090525A (en) * | 2022-11-15 | 2023-05-09 | 广东工业大学 | Embedded vector representation method and system based on hierarchical random walk sampling strategy |
CN116090525B (en) * | 2022-11-15 | 2024-02-13 | 广东工业大学 | Embedded vector representation method and system based on hierarchical random walk sampling strategy |
Also Published As
Publication number | Publication date |
---|---|
CN105741175B (en) | 2019-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105741175A (en) | Method for linking accounts in OSNs (On-line Social Networks) | |
CN103227731B (en) | Based on the complex network node importance local calculation method improving " structural hole " | |
Danglade et al. | On the use of machine learning to defeature CAD models for simulation | |
CN112256981B (en) | Rumor detection method based on linear and nonlinear propagation | |
CN104462592A (en) | Social network user behavior relation deduction system and method based on indefinite semantics | |
CN102799671A (en) | Network individual recommendation method based on PageRank algorithm | |
CN105719191A (en) | System and method of discovering social group having unspecified behavior senses in multi-dimensional space | |
CN105158761A (en) | Radar synthetic phase unwrapping method based on branch-cut method and surface fitting | |
CN102708327A (en) | Network community discovery method based on spectrum optimization | |
CN113486190A (en) | Multi-mode knowledge representation method integrating entity image information and entity category information | |
Asoodeh et al. | Oil-CO2 MMP determination in competition of neural network, support vector regression, and committee machine | |
CN105574541A (en) | Compactness sorting based network community discovery method | |
CN105678590A (en) | topN recommendation method for social network based on cloud model | |
Mo et al. | Choosing a heuristic and root node for edge ordering in BDD-based network reliability analysis | |
Samantaray et al. | Modelling response of infiltration loss toward water table depth using RBFN, RNN, ANFIS techniques | |
CN102819611B (en) | Local community digging method of complicated network | |
CN105205184A (en) | Latent variable model-based user preference extraction method | |
CN107658029A (en) | A kind of brand-new distribution and privatization miRNA diseases contact Forecasting Methodology | |
CN107133274A (en) | A kind of distributed information retrieval set option method based on figure knowledge base | |
Campbell et al. | Cross-domain entity resolution in social media | |
CN104899283A (en) | Frequent sub-graph mining and optimizing method for single uncertain graph | |
CN109977131A (en) | A kind of house type matching system | |
CN103186696B (en) | Towards the auxiliary variable reduction method of high dimensional nonlinear soft-sensing model | |
Holzer et al. | An analysis of the renormalization group method for asymptotic expansions with logarithmic switchback terms | |
CN105761152A (en) | Topic participation prediction method based on triadic group in social network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20190820; Termination date: 20220127 |