CN105741175A - Method for linking accounts in OSNs (On-line Social Networks) - Google Patents

Method for linking accounts in OSNs (On-line Social Networks)

Publication number
CN105741175A
Authority
CN
China
Prior art keywords
account
osn
node
social networks
online social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610057577.5A
Other languages
Chinese (zh)
Other versions
CN105741175B (en)
Inventor
罗绪成
周帆
刘梦娟
解书颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201610057577.5A priority Critical patent/CN105741175B/en
Publication of CN105741175A publication Critical patent/CN105741175A/en
Application granted granted Critical
Publication of CN105741175B publication Critical patent/CN105741175B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for linking accounts in a plurality of OSNs (On-line Social Networks) on the basis of node similarity. The friend relationship is extracted from OSN website accounts for linking the plurality of accounts, belonging to one real user, in different OSN platforms. The method comprises five parts: a data preprocessing part performs preprocessing on an account node relationship diagram of the OSNs; a node sequence extraction part obtains an account node sequence set through random walk; an account vector representation part generates a vector model of each account through a word to vector tool word2vec; a linear conversion matrix calculation part obtains a linear conversion matrix W from one OSN to another OSN by a gradient descent method; a linking account obtaining part maps the account in one OSN to a coordinate space of another OSN; and through similarity measure and threshold value screening, the linking accounts or a candidate set corresponding to all of the accounts are obtained. Errors of the account linking result caused by unreal feature information of the accounts are avoided, so that the account linking robustness is improved.

Description

Method for linking accounts in online social networks
Technical field
The invention belongs to the technical field of network information and, more specifically, relates to a method for linking accounts in online social networks.
Background
Account linking is the main technique for discovering all the accounts that one user may hold in various online social networks. It can be used to track malicious users who commit network crimes so as to stop malicious events from spreading, to mine account information in depth so as to optimize recommendation methods, and to issue a risk warning once an account is detected as stolen so as to protect the user's accounts on other websites.
The features extracted by traditional account linking methods include personal information of the account itself, such as gender, age and geographic location, and user behavior patterns, such as writing style, word-usage habits and mouse dwell time. Because personal privacy is involved, however, personal information is often neither truthful nor complete; data on user behavior patterns are also mostly hard to obtain and contain errors, so the accuracy of traditional account linking methods is not high. In addition, traditional methods require the account information to be essentially truthful. Their approach is to exhaust the important feature attributes of an account so as to describe a user as completely as possible, build a model through feature extraction, and then filter out irrelevant candidate accounts with the model, thereby finding the account with the highest similarity as the linked account. Although traditional methods accomplish the account linking task, they need a large amount of input data and can hardly avoid the deviation that false account information introduces into the linking result, so the robustness of account linking is not high.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and to provide a method for linking accounts in online social networks, so as to improve the robustness of account linking.
To achieve the above object, the method of the invention for linking accounts in online social networks is characterized by comprising the following steps:
(1) According to requirements, determine the two online social networks whose accounts are to be linked. For each of the two online social networks OSN_X and OSN_Y, represent the friend relationships among its accounts as an undirected graph, namely an account node relationship graph, composed of a node set V representing the accounts and an edge set E representing the friend relationships between accounts, thereby obtaining the respective account node relationship graphs RD_X and RD_Y of OSN_X and OSN_Y;
(2) Traverse all nodes in each of the two account node relationship graphs RD_X and RD_Y to obtain the respective account node sequence sets WalkList_X and WalkList_Y of OSN_X and OSN_Y;
For the online social network OSN_X, traverse all nodes in the account node relationship graph RD_X, selecting each node in turn as the start node of a random walk. During the walk, one neighbor of the start node or of the currently reached node is chosen at random as the next hop, until the visited nodes form a node sequence of length L;
Each traversal yields a set of node sequences that start from different nodes. Looping over all nodes in RD_X multiple times yields the account node sequence set WalkList_X, so that several node sequences start from any given node;
For the online social network OSN_Y, the same method is applied to obtain the account node sequence set WalkList_Y;
(3) Convert the two account node sequence sets WalkList_X and WalkList_Y separately with the word-to-vector tool Word2Vec to obtain the account vector models Model_X and Model_Y, which give distributed representations of the accounts in an S-dimensional space (S generally ranges from tens to hundreds). Specifically:
All node sequences in WalkList_X, the account node sequence set of OSN_X, are fed as the corpus into Word2Vec and converted according to the configured window (window) and dimension (size), giving the vector Vec_x_i for each account x_i. Each account x_i together with its vector Vec_x_i forms one item of the account vector model Model_X of OSN_X, where x_i denotes the i-th account of OSN_X, i = 1, 2, ..., m, and m is the number of accounts in OSN_X;
The account node sequence set WalkList_Y is processed in the same way, so that each account y_j of OSN_Y together with its vector Vec_y_j forms one item of the account vector model Model_Y of OSN_Y, where y_j denotes the j-th account of OSN_Y, j = 1, 2, ..., n, and n is the number of accounts in OSN_Y;
(4) Compute the linear transformation matrix W between the coordinate spaces of the two online social networks OSN_X and OSN_Y
4.1) Build the training set RealPairL from known real linked account pairs <x_k, y_k> that belong to the same user in OSN_X and OSN_Y, where x_k denotes the account of the k-th common user in OSN_X, y_k denotes the account of the k-th common user in OSN_Y, and there are K common users in total. Look up the vector Vec_x_k of account x_k in Model_X and the vector Vec_y_k of account y_k in Model_Y;
4.2) Use stochastic gradient descent to solve the following optimization problem:
$$\min_{W}\ \frac{1}{2K}\sum_{k=1}^{K}\left\| W\,\mathrm{Vec\_x}_k - \mathrm{Vec\_y}_k \right\|^{2};$$
First, W is initialized as an S × S matrix whose elements are all small random values, and then H iterations are carried out. In the h-th iteration (0 < h ≤ H), a sample point <Vec_x_k, Vec_y_k> is chosen at random and the gradient T = (W^(h-1) Vec_x_k - Vec_y_k)(Vec_x_k)' is computed, where W^(h-1) is the linear transformation matrix after the (h-1)-th iteration and (Vec_x_k)' is the transpose of Vec_x_k. The linear transformation matrix is then updated: W^(h) = W^(h-1) - αT, where α is the learning rate. After a number of iterations the value of the sum in the above optimization problem gradually converges; the number of iterations at that point is H, and the transformation matrix at that point is the required transformation matrix W.
(5) Account linking
For each account x_i in the online social network OSN_X, compute:
b_i = W Vec_x_i
b_i is the vector representation of node x_i of OSN_X in the coordinate space of OSN_Y. The cosine similarity function is then used to compute the similarity between b_i and the vector Vec_y_j of each account in OSN_Y. The account y_jmax whose similarity is the largest and exceeds the set threshold is selected as the linked account of x_i, or, depending on the purpose, the t accounts with the highest similarity (for example, t = 5) are selected as the candidate set.
The object of the invention is achieved as follows.
The invention discloses a method, based on node similarity, for linking accounts across multiple online social networks (OSNs). Friend relationships are extracted from OSN website accounts to link the multiple accounts that belong to the same real user on different OSN platforms. According to the account linking requirement, the friend relationships among the accounts of each of the two online social networking sites are extracted and represented as an undirected graph, called the account node relationship graph, composed of a node set V representing the accounts and an edge set E representing the friend relationships between accounts. Random walks on the graph then produce a set of account node sequences, from which the word-to-vector tool word2vec generates a vector model of each account. Next, linked accounts that are publicly known on the different OSNs serve as the training set: using the representation vectors of these accounts in the different OSNs, a linear transformation matrix W from one OSN to the other is obtained by gradient descent. The accounts of one OSN are mapped into the coordinate space of the other OSN, and similarity measurement and threshold screening give the linked account, or the candidate set, for every account. The invention can be used to track malicious users committing network crimes so as to stop malicious events from spreading, to mine account information in depth so as to optimize recommendation algorithms, and to issue a risk warning once an account is detected as stolen so as to protect the user's accounts on other websites.
The invention has the following beneficial effects:
(1) Less input data. The method is based on node similarity and needs only the accounts (a user ID can generally be used) and the friend relationships (or follow relationships); it does not need other cumbersome information such as account profiles or behavioral features. It also avoids the errors that false account feature information introduces into the linking result, which improves the robustness of account linking;
(2) The coordinate transformation matrix of the invention places the accounts of different online social networks in the same space for comparison. The computation is simple and general: once the transformation matrix has been computed, it applies to all the different accounts of the source online social network. Moreover, given the transformation matrix W from one online social network to another, by symmetry the transformation matrix from the latter back to the former is the inverse of W and need not be computed again;
(3) The invention extracts friend relationships from OSN website accounts to link the multiple accounts belonging to the same real user on different OSN platforms. This is a new approach to account linking that uses only the friend relationships of the OSNs. The invention can also be combined with other account linking methods to improve linking accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of one embodiment of the method of the invention for linking accounts in online social networks;
Fig. 2 is a schematic diagram of an example account node relationship graph;
Fig. 3 is a flow chart of one embodiment of obtaining the account node sequence set by random walk.
Detailed description of the embodiments
Embodiments of the invention are described below with reference to the drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
I. Basic principle
Although the information recorded by the linked accounts of the same real user in two online social networks differs, the distribution of relationships among the accounts is the same. For example, three accounts in Sina Weibo are 1, 2 and 3, and their corresponding accounts in Renren are a1, a2 and a3; if 1, 2 and 3 are friends, then a1, a2 and a3 are friends as well. The invention therefore uses the friend relationships (or follow relationships, etc.) of accounts as the feature for discovering linked accounts and builds an account node relationship graph. The node set is then traversed according to the graph: each node in turn is selected as the start node of a random walk, and during the walk one neighbor of the start node or of the currently reached node is chosen at random as the next hop, until the visited nodes form a node sequence of length L that records account relationship information. Each traversal yields a set of node sequences starting from different nodes; looping over the whole node set multiple times yields the account node sequence set WalkList_X, in which several node sequences start from any given node. Given enough time and steps, the more random walks are performed, the richer the account relationship information contained in the sequence set.
The S-dimensional real-valued representation of an account in the vector space (size = S, generally ranging from tens to hundreds) can be obtained with word2vec, and the cosine similarity between vectors can be used to measure the degree of association between accounts. Two accounts '1' and '2' in the same space can be compared with the function ModelX.most_similar('1', '2'), but accounts in two different online social networks, such as '1' in space X and 'a1' in space Y, lie in different vector spaces and cannot be compared with this function.
A linear transformation is a mathematical mapping between vector spaces X and Y that preserves vector addition and scalar multiplication. The key is to find the mapping function y = Wx between the two vector spaces, where W is the transformation matrix. Because friend relationships are distributed similarly in different social networks, accounts in the spaces of different websites, whose similarity cannot be compared directly, can be placed in the same space by the linear transformation and then compared.
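For reference, the linearity property mentioned above and its use in this method can be written out as a short worked statement. The function name f is introduced here only for illustration; W, Vec_x_i and b_i are the symbols used elsewhere in this description.

```latex
% Linearity of the mapping f(x) = Wx between the two embedding spaces:
f(u + v) = f(u) + f(v), \qquad f(c\,u) = c\,f(u), \qquad f(x) = W x .
% Applied to an account vector of OSN_X, this gives its image in the space of OSN_Y:
b_i = W\,\mathrm{Vec\_x}_i .
```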
Finally, the vectors of the accounts in the two online social networks are compared by means of the linear transformation matrix to find the linked account, or, depending on the purpose, the t accounts with the highest similarity (for example, t = 5) are selected as the candidate set.
II. Account linking method
In this embodiment, as shown in Fig. 1, the method of the invention for linking accounts in online social networks comprises five steps: data preprocessing (step S101), obtaining the account node sequence sets by random walk (step S102), distributed vector representation (step S103), computing the linear transformation matrix between the different OSNs (online social networks) (step S104), and computing distances to obtain the linked accounts (step S105).
1. Data preprocessing
First, according to requirements, determine the online social networks (Online Social Networking, OSN for short) to be linked, such as Sina Weibo, Douban, Renren, Twitter or Facebook. Friend relationships exist among the accounts of these OSNs and can be described as an undirected graph composed of a node set V representing the accounts and an edge set E representing the relationships between accounts, which the invention calls the account node relationship graph; Fig. 2 shows the graph used in this embodiment.
In this embodiment, a web crawler reads the friend list of each account and stores the friend relationships of each account in a text document, thereby obtaining the account friend relationships of the target online social networks, i.e., the two online social networks OSN_X and OSN_Y.
For convenience, only the online social network OSN_X is used as the example; the other online social network OSN_Y is processed in the same way.
For the account node relationship graph of the online social network (OSN_X) shown in Fig. 2, the text document obtained is: 1 2, 1 3, 1 8, 2 3, 2 4, 3 5, 3 7, 4 5, 4 7, 5 6, 8 9, 8 10, 8 11, 9 10, 9 12, 10 11, 10 13, 11 12, 11 13, 11 14, 13 14.
Each edge in the text document is read and recorded in a graph edge dictionary dict: each account, i.e., node, is a key, and all nodes connected to that node are its value. The relationship graph among the accounts of a social network can thus be represented as a dictionary-type variable {account: (adjacent accounts)}; this variable is the account node relationship graph and serves as the input to the algorithm.
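The dictionary form described above can be sketched as follows. This is a minimal Python sketch, assuming the edge list of the Fig. 2 example is read as pairs of node numbers (1-2, 1-3, 1-8, and so on); the names `EDGES_TEXT` and `build_graph`, and the choice of strings as keys, are illustrative assumptions and do not come from the patent.

```python
from collections import defaultdict

# Edge list of the Fig. 2 example, one "u v" pair per edge (assumed reading).
EDGES_TEXT = """1 2, 1 3, 1 8, 2 3, 2 4, 3 5, 3 7, 4 5, 4 7, 5 6, 8 9,
8 10, 8 11, 9 10, 9 12, 10 11, 10 13, 11 12, 11 13, 11 14, 13 14"""


def build_graph(edges_text):
    """Return {account: set of adjacent accounts} for an undirected edge list."""
    graph = defaultdict(set)
    for pair in edges_text.replace("\n", " ").split(","):
        u, v = pair.split()
        graph[u].add(v)   # undirected: record the edge in both directions
        graph[v].add(u)
    return dict(graph)


RD_X = build_graph(EDGES_TEXT)
print(sorted(RD_X["1"], key=int))   # neighbours of account 1
```

Storing neighbours in a set makes duplicate edges harmless; a list would work equally well for the random walk that follows.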
2. Obtaining the account node sequence sets by random walk
Random walks are performed on the account node relationship graphs represented by the graph edge dictionaries to obtain the account node sequence sets WalkList_X and WalkList_Y of the two social networking sites.
As shown in Fig. 3, the inner loop traverses all nodes, taking each as the start node of a random walk: one neighbor of the start node or of the currently reached node is chosen at random as the next hop, until the visited nodes form a node sequence of length L. Taking Fig. 2 as an example, all nodes from 1 to 14 are traversed as start nodes with L = 7, which yields sequences such as {1,8,11,14,13,11,12}, {2,4,7,3,5,6,5} and {3,5,4,7,3,1,8}.
The larger the loop count, i.e., the number of times the traversal of all nodes in the account node relationship graph RD_X or RD_Y is repeated, the richer the account relationship information obtained and the higher the linking accuracy. When the loop ends, a large node sequence set WalkList is obtained, made up of the numbers (account IDs) or words (account names) that represent the accounts.
The concrete implementation comprises:
Step S201: initialize the parameters, including setting the loop count to 0 and setting the preset number of loops;
Step S202: check whether the loop count has reached the preset number of loops; if so, go to step S206; otherwise, go to step S203;
Step S203: check whether all nodes have been traversed; if so, go to step S205; otherwise, go to step S204;
Step S204: perform a random walk with the current node as the start node until the visited nodes form a node sequence of length L, then return to step S203;
Step S205: add 1 to the loop count and return to step S202;
Step S206: return the account node sequence set.
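Steps S201 to S206 can be sketched in a few lines of Python. This is a hedged illustration, not the patent's own code: the function name `random_walks`, the `seed` parameter, the early stop for a node without neighbours, and the toy graph are all assumptions made for the example.

```python
import random


def random_walks(graph, walk_length, num_loops, seed=None):
    """Return a list of node sequences, following steps S201-S206."""
    rng = random.Random(seed)
    walk_list = []
    for _ in range(num_loops):                 # S202 / S205: outer loop counter
        for start in graph:                    # S203: every node is a start node once per loop
            walk = [start]
            while len(walk) < walk_length:     # S204: extend until the sequence has length L
                neighbours = sorted(graph[walk[-1]])
                if not neighbours:             # isolated node: stop early (assumption)
                    break
                walk.append(rng.choice(neighbours))
            walk_list.append(walk)
    return walk_list                           # S206


# Hypothetical toy graph in {account: adjacent accounts} form (not the Fig. 2 graph).
toy_graph = {"1": {"2", "3"}, "2": {"1", "3"}, "3": {"1", "2", "4"}, "4": {"3"}}
walks = random_walks(toy_graph, walk_length=7, num_loops=2, seed=42)
print(len(walks), walks[0])
```

With two loops over four nodes the sketch returns eight sequences, two starting at each node, which matches the statement that several sequences start from any given node.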
The following is the account node sequence set WalkList_X for a loop count of 2 and a walk length of 7:
3. Distributed vector representation
The two account node sequence sets WalkList_X and WalkList_Y are converted separately with the word-to-vector tool Word2Vec to obtain the vector models Model_X and Model_Y, which give S-dimensional distributed representations of the accounts in a high-dimensional space. Specifically:
With the window (window) and dimension (size = S) set, all node sequences of length L in an account node sequence set are fed as the corpus into Word2Vec and converted, giving the vectors Vec_x_i and Vec_y_j for the accounts x_i and y_j. Each account x_i together with its vector Vec_x_i forms one item of the account vector model Model_X of OSN_X, where x_i denotes the i-th account of OSN_X, i = 1, 2, ..., m, and m is the number of accounts in OSN_X. The account node sequence set WalkList_Y is processed in the same way, so that each account y_j of OSN_Y together with its vector Vec_y_j forms one item of the account vector model Model_Y of OSN_Y, where y_j denotes the j-th account of OSN_Y, j = 1, 2, ..., n, and n is the number of accounts in OSN_Y.
Word2vec is a tool that converts words (text) into vectors. Its detailed procedure belongs to the prior art and is not repeated here.
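Although the internals of Word2Vec are outside the scope of this description, the role of the window parameter can be illustrated with a small sketch: each node sequence is treated as a sentence, and every account is paired with the accounts at most `window` positions away from it. The function `context_pairs` is a hypothetical helper written for this illustration; real Word2Vec implementations may differ in detail (for example in how the effective window is sampled) and go on to train vectors from such pairs.

```python
def context_pairs(walk, window):
    """List (centre, context) pairs for one node sequence with a fixed window."""
    pairs = []
    for i, centre in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:                       # a node is not its own context
                pairs.append((centre, walk[j]))
    return pairs


# One of the example sequences from the random-walk step, with window = 2.
walk = ["1", "8", "11", "14", "13", "11", "12"]
pairs = context_pairs(walk, window=2)
print(pairs[:4])
```

Accounts that often appear close together in the walks end up sharing many such pairs, which is why accounts with similar neighbourhoods tend to receive similar vectors.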
In this embodiment, size (S) = 4 and window = 2, and the resulting model Model {account: (S-dimensional vector)} is as follows:
Note that in this embodiment vectors are written as rows (for ease of typesetting), whereas in the invention each vector is a column vector. If row vectors are used, the formulas below must be adjusted accordingly, i.e., the linear transformation matrix must be placed after the vector in the multiplication.
4. Computing the linear transformation matrix between the coordinate spaces of the different OSNs (online social networks)
4.1) Build the training set RealPairL from known real linked account pairs <x_k, y_k> that belong to the same user in OSN_X and OSN_Y, where x_k denotes the account of the k-th common user in OSN_X, y_k denotes the account of the k-th common user in OSN_Y, and there are K common users in total. Look up the vector Vec_x_k of account x_k in Model_X and the vector Vec_y_k of account y_k in Model_Y; the linear transformation matrix W is initialized as the identity matrix, denoted W_1;
4.2) Use stochastic gradient descent to solve the following optimization problem:
$$\min_{W}\ \frac{1}{2K}\sum_{k=1}^{K}\left\| W\,\mathrm{Vec\_x}_k - \mathrm{Vec\_y}_k \right\|^{2}$$
First, W is initialized as an S × S matrix whose elements are all random values, and then H iterations are carried out. In the h-th iteration (0 < h ≤ H), a sample point <Vec_x_k, Vec_y_k> is chosen at random and the gradient T = (W^(h-1) Vec_x_k - Vec_y_k)(Vec_x_k)' is computed, where W^(h-1) is the linear transformation matrix after the (h-1)-th iteration and (Vec_x_k)' is the transpose of Vec_x_k. The linear transformation matrix is then updated: W^(h) = W^(h-1) - αT, where α is the learning rate. After a number of iterations the value of the sum in the above optimization problem gradually converges; the number of iterations at that point is H, and the transformation matrix at that point is the required transformation matrix W;
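The update rule above can be tried out on toy data. The following is a minimal pure-Python sketch under stated assumptions: the 2-dimensional vectors, the ground-truth matrix `W_true`, the learning rate 0.1, the iteration count 2000 and the helper names are all hypothetical and chosen only so that the example is small and converges; they are not values from the patent.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by column vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mean_loss(W, pairs):
    """Objective 1/(2K) * sum_k ||W x_k - y_k||^2."""
    total = 0.0
    for x, y in pairs:
        total += sum((p - q) ** 2 for p, q in zip(matvec(W, x), y))
    return total / (2 * len(pairs))


def train_linear_map(pairs, dim, alpha=0.1, iterations=2000, seed=0):
    rng = random.Random(seed)
    # W starts as a dim x dim matrix of small random values.
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(dim)]
    for _ in range(iterations):
        x, y = rng.choice(pairs)                       # one random sample point
        residual = [p - q for p, q in zip(matvec(W, x), y)]
        for r in range(dim):                           # T = residual * x' (outer product)
            for c in range(dim):
                W[r][c] -= alpha * residual[r] * x[c]  # W <- W - alpha * T
    return W


# Hypothetical training pairs generated from a known map, so recovery can be checked.
W_true = [[0.0, 1.0], [-1.0, 0.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.3]]
pairs = [(x, matvec(W_true, x)) for x in xs]
W = train_linear_map(pairs, dim=2)
print(mean_loss(W, pairs))
```

In practice a library such as NumPy would replace the nested loops; plain lists are used here only to keep the sketch dependency-free.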
In this embodiment, the known linked pairs in the text file trainConnect.txt are input as the training set; its content is as follows: 1 a1, 3 a3, 4 a4, 6 a6, 9 a9, 12 a12, 13 a13.
Then, according to the account pairs of this training set, the corresponding vectors are looked up in the vector models Model_X and Model_Y obtained in step 3. Each row is a vector (4-dimensional), and a 1 is appended at the end so that it becomes 5-dimensional. For example, Model_X[1] = [0.39482608 0.51815981 -0.23675969 0.38197696], whereas the row corresponding to account 1 is [0.39482608 0.51815981 -0.23675969 0.38197696 1]. Specifically, in this embodiment the linear transformation matrix W (5 × 5) obtained is:
W = [[-0.01109113  0.37251355 -0.43007925 -0.05281413  0.]
[ 0.84859938  0.31506663  0.24449666 -0.82006226  0.]
[ 0.04590003 -0.64139036 -0.21258527  0.64892452  0.]
[ 0.23426961 -0.07661551 -0.21128366  0.71278242  0.]
[-0.32104064 -0.27574391  0.1407943   0.2562349   1.]]
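As a small worked example, the listed numbers can be applied to the augmented vector of account 1. The sketch below uses the row-vector form (vector times matrix), which the note on row vectors earlier in this description allows; that choice, the helper name `map_row_vector`, and the reading of the signs in the listing are assumptions of this illustration rather than statements of the source. Under that form the last column of W, (0, 0, 0, 0, 1), keeps the appended 1 unchanged, so the last row of W acts as a constant offset.

```python
# Values as listed in this embodiment (signs read from the listing above).
W = [
    [-0.01109113, 0.37251355, -0.43007925, -0.05281413, 0.0],
    [0.84859938, 0.31506663, 0.24449666, -0.82006226, 0.0],
    [0.04590003, -0.64139036, -0.21258527, 0.64892452, 0.0],
    [0.23426961, -0.07661551, -0.21128366, 0.71278242, 0.0],
    [-0.32104064, -0.27574391, 0.1407943, 0.2562349, 1.0],
]
vec_x1 = [0.39482608, 0.51815981, -0.23675969, 0.38197696]


def map_row_vector(vec, W):
    """Append 1 to vec and multiply it, as a row vector, by W."""
    aug = list(vec) + [1.0]
    return [sum(aug[i] * W[i][j] for i in range(len(aug)))
            for j in range(len(W[0]))]


b1 = map_row_vector(vec_x1, W)
print(b1)
```

The first four components of `b1` would then be compared with the vectors of OSN_Y; the fifth is the carried-over constant.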
5. Computing distances to obtain linked accounts
When looking for the accounts linked to the same user in different online social networks (OSNs), accounts in different spaces cannot be compared directly. The account vectors must first be transformed into the same space by the linear matrix operation; the linked account is then found by comparing, in that common space, the distance similarity of the accounts of the two OSNs.
For each account x_i in the online social network OSN_X, compute:
b_i = W Vec_x_i
b_i is the vector representation of node x_i of OSN_X in the coordinate space of OSN_Y. The cosine similarity function is then used to compute the distance between b_i and the vector Vec_y_j of each account in OSN_Y. The account y_jmax with the smallest distance, i.e., the largest similarity, that also exceeds the set threshold is selected as the linked account of x_i, or, depending on the purpose, the t accounts with the highest similarity (for example, t = 5) are selected as the candidate set.
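The mapping and selection step can be sketched as follows. This is an illustrative Python sketch, assuming column vectors and a matrix W already obtained; the function names, the threshold value 0.8 and the toy vectors are hypothetical and not taken from the patent.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by column vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def link_account(vec_x, W, model_y, threshold=0.8, top_t=None):
    """Map vec_x into the space of OSN_Y and pick the best match or a top-t candidate set."""
    b = matvec(W, vec_x)                                    # b_i = W Vec_x_i
    ranked = sorted(((cosine(b, vec_y), name) for name, vec_y in model_y.items()),
                    reverse=True)
    if top_t is not None:                                   # candidate-set mode
        return [name for _, name in ranked[:top_t]]
    if ranked and ranked[0][0] > threshold:                 # best match above threshold
        return ranked[0][1]
    return None                                             # no sufficiently similar account


# Hypothetical toy data.
W = [[0.0, 1.0], [-1.0, 0.0]]
model_y = {"a1": [0.0, -1.0], "a2": [1.0, 0.0], "a3": [-1.0, 0.2]}
print(link_account([1.0, 0.0], W, model_y))
print(link_account([1.0, 0.0], W, model_y, top_t=2))
```

Returning `None` when no similarity exceeds the threshold keeps unlinked accounts distinguishable from linked ones; the candidate-set mode ignores the threshold and simply ranks.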
III. Example verification
In this embodiment, friend relationships exist among the accounts of many popular online social networks such as Sina Weibo, Douban, Renren, Twitter and Facebook, and this information can be described as a graph composed of a node set V representing the accounts and an edge set E representing the relationships between accounts. The data set used in this example comes from https://snap.stanford.edu/data/Egonets-Facebook.html; a text document containing 1034 nodes and 53498 edges is used as the input for the online social network OSN_X, with each node represented by a number.
The online social network OSN_Y is constructed as follows. The letter 'a' is prefixed to the number of every node of OSN_X so that the nodes can be told apart from those of OSN_X. Then 10% of all edges are deleted at random, and 10% of the nodes are deleted at random together with all edges incident to them. Next, 10% new nodes are added (numbered upward from the largest node number of OSN_X so that they cannot be confused with the original nodes, each new node receiving 50 random edges), and, the new nodes aside, 5 random edges are generated for each node of the original node set. All nodes that were not changed (nodes and edges included), for example node '1' in OSN_X and its counterpart 'a1' in OSN_Y, are taken as linked account pairs to form a set, 70% of which is used as the training set and 30% as the test set.
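The construction of OSN_Y can be sketched as below. This is one possible reading of the procedure, written as a hedged Python sketch: the function name `perturb_network`, its parameters, the treatment of surviving original nodes as the linked pairs, and the small ring-shaped toy network are all assumptions made for illustration.

```python
import random


def perturb_network(edges, frac=0.10, new_node_edges=50, extra_edges=5,
                    train_frac=0.7, seed=0):
    """Build a perturbed copy of a network; return (edges_y, train_pairs, test_pairs)."""
    rng = random.Random(seed)
    edges = sorted({tuple(sorted(e)) for e in edges})
    nodes = sorted({n for e in edges for n in e})
    # Delete a fraction of the edges at random.
    kept = set(rng.sample(edges, len(edges) - int(frac * len(edges))))
    # Delete a fraction of the nodes together with their incident edges.
    removed = set(rng.sample(nodes, int(frac * len(nodes))))
    survivors = [n for n in nodes if n not in removed]
    kept = {(u, v) for u, v in kept if u not in removed and v not in removed}

    def add_random_edges(node, count):
        candidates = [p for p in survivors if p != node]
        for other in rng.sample(candidates, min(count, len(candidates))):
            kept.add(tuple(sorted((node, other))))

    # Add new nodes numbered after the largest original node number.
    first_new = max(nodes) + 1
    for n in range(first_new, first_new + int(frac * len(nodes))):
        add_random_edges(n, new_node_edges)
    # Add a few random edges to every surviving original node.
    for n in survivors:
        add_random_edges(n, extra_edges)

    edges_y = sorted((f"a{u}", f"a{v}") for u, v in kept)   # prefix 'a' marks OSN_Y
    pairs = [(n, f"a{n}") for n in survivors]
    rng.shuffle(pairs)
    cut = int(train_frac * len(pairs))
    return edges_y, pairs[:cut], pairs[cut:]


# Hypothetical toy network: 20 nodes on a ring plus chords.
toy_edges = [(i, i % 20 + 1) for i in range(1, 21)] + \
            [(i, (i + 4) % 20 + 1) for i in range(1, 21)]
edges_y, train, test = perturb_network(toy_edges, new_node_edges=3, extra_edges=1)
print(len(edges_y), len(train), len(test))
```

Because the perturbed copy keeps a known correspondence between `n` and `a{n}`, the held-out pairs give a ground truth against which the linking result can be checked.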
With the method of the invention for linking accounts in online social networks, all the accounts in the 30% test set were linked successfully, which shows that the method of the invention is highly robust.
Although illustrative embodiments of the invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are obvious as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

Claims (3)

1. A method for associating accounts in online social networks, characterized in that it comprises the following steps:
(1) Determine, according to requirements, the two online social networks OSN_X and OSN_Y between which accounts are to be associated. For each network, represent the friend relationships among its accounts as an undirected graph, the account node relationship graph, composed of a node set V representing the accounts and an edge set E representing the friend relationships between accounts. This yields the account node relationship graphs RD_X and RD_Y of OSN_X and OSN_Y, respectively;
(2) Traverse all nodes in each of the two account node relationship graphs RD_X and RD_Y to obtain the account node sequence sets WalkList_X and WalkList_Y of the online social networks OSN_X and OSN_Y, respectively;
For the online social network OSN_X, traverse all nodes in the account node relationship graph RD_X, selecting each node in turn as the start node of a random walk. During the walk, a neighbor of the start node, or of the node just reached, is chosen at random as the next hop, until the visited nodes form a node sequence of length L. Each traversal yields a set of node sequences with different starting nodes. All nodes in RD_X are traversed in this way multiple times to obtain the account node sequence set WalkList_X, so that several node sequences start from any given node;
For the online social network OSN_Y, the same method is applied to obtain WalkList_Y, so that both account node sequence sets WalkList_X and WalkList_Y are available;
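The random-walk traversal of step (2) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_walks`, the parameters `passes` and `seed`, the toy graph, and the choice to stop early at a node without neighbors are all assumptions made for the example.

```python
import random

def build_walks(graph, walk_length, passes, seed=None):
    """Return a list of random-walk node sequences (a 'WalkList').

    graph: dict mapping each account to a list of adjacent accounts.
    walk_length: target sequence length L.
    passes: how many times the whole node set is traversed (assumed name).
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(passes):              # traverse the whole graph several times
        for start in graph:              # each node is a start node once per pass
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:        # isolated node: stop early (assumption)
                    break
                walk.append(rng.choice(neighbors))  # uniformly random next hop
            walks.append(walk)
    return walks

# Toy graph in the {account: [adjacent accounts]} form (hypothetical data)
rd_x = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walk_list_x = build_walks(rd_x, walk_length=5, passes=3, seed=0)
```

With three passes over four nodes, the sketch produces twelve sequences, three starting at each node, which matches the remark that several sequences start from any given node.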
(3) Vectorize the two account node sequence sets WalkList_X and WalkList_Y separately with the word-to-vector tool Word2Vec to obtain the account vector models Model_X and Model_Y, which give distributed representations of the accounts in an S-dimensional space (S typically ranges from tens to hundreds), specifically:
All node sequences in the account node sequence set WalkList_X of OSN_X are fed as a corpus into the word-to-vector tool Word2Vec and converted according to the configured window (window) and dimension (size), yielding the vector Vec_x_i corresponding to each account x_i. Each account x_i together with its vector Vec_x_i forms one item, and these items constitute the account vector model Model_X of the online social network OSN_X, where x_i denotes the i-th account of OSN_X, i = 1, 2, ..., m, and m is the number of accounts in OSN_X;
The account node sequence set WalkList_Y is processed in the same way: each account y_j in OSN_Y together with its vector Vec_y_j forms one item, and these items constitute the account vector model Model_Y of the online social network OSN_Y, where y_j denotes the j-th account of OSN_Y, j = 1, 2, ..., n, and n is the number of accounts in OSN_Y;
(4) Calculate the linear transformation matrix W between the two coordinate spaces corresponding to the online social networks OSN_X and OSN_Y:
4.1) Build the training set RealPairL from the known real account associations <x_k, y_k>, i.e. pairs of accounts in OSN_X and OSN_Y that belong to the same user, where x_k denotes the account of the k-th common user in OSN_X, y_k denotes the account of the k-th common user in OSN_Y, and there are K common users in total. Look up the vector Vec_x_k corresponding to account x_k in the account vector model Model_X and the vector Vec_y_k corresponding to account y_k in the account vector model Model_Y;
4.2) Use stochastic gradient descent to solve the following optimization problem:
$$\min_{W} \; \frac{1}{2K} \sum_{k=1}^{K} \left\| W\,\mathrm{Vec\_x}_k - \mathrm{Vec\_y}_k \right\|^{2}$$
First, W is initialized as an S × S matrix whose elements are all random values; then H iterations are carried out. In the h-th iteration (0 < h ≤ H), a sample point <Vec_x_k, Vec_y_k> is chosen at random and the gradient T = (W^(h-1) Vec_x_k - Vec_y_k)(Vec_x_k)' is computed, where W^(h-1) is the linear transformation matrix after the (h-1)-th iteration and (Vec_x_k)' is the transpose of Vec_x_k. The linear transformation matrix is then updated as W^(h) = W^(h-1) - αT, where α is the learning rate. After a number of iterations, the value of the summation in the above optimization problem gradually converges; the number of iterations at that point is H, and the transformation matrix at that point is the required transformation matrix W;
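The update rule of step 4.2) can be illustrated with a small dependency-free sketch. Plain Python lists are used only to keep the example self-contained. The names `mat_vec`, `mean_loss` and `fit_transform_sgd`, the initialization range, the learning rate, and the toy training pairs are assumptions. A fixed iteration count H also stands in for the convergence check described above.

```python
import random

def mat_vec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def mean_loss(W, pairs):
    """Objective (1 / 2K) * sum_k ||W x_k - y_k||^2 over the training pairs."""
    total = 0.0
    for x, y in pairs:
        total += sum((p - q) ** 2 for p, q in zip(mat_vec(W, x), y))
    return total / (2 * len(pairs))

def fit_transform_sgd(pairs, dim, alpha=0.05, iterations=2000, seed=0):
    """Estimate W by stochastic gradient descent, one random pair per iteration."""
    rng = random.Random(seed)
    # Random S x S initialization (the range is an assumption)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    for _ in range(iterations):
        x, y = rng.choice(pairs)                               # random sample point
        residual = [p - q for p, q in zip(mat_vec(W, x), y)]   # W x_k - y_k
        # Gradient T is the outer product residual * x^T; update W <- W - alpha * T
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= alpha * residual[i] * x[j]
    return W

# Toy training set (hypothetical): y_k is x_k rotated by 90 degrees
real_pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [-1.0, 0.0]),
              ([1.0, 1.0], [-1.0, 1.0]), ([1.0, -1.0], [1.0, 1.0])]
W = fit_transform_sgd(real_pairs, dim=2)
final_loss = mean_loss(W, real_pairs)
```

In practice one would monitor `mean_loss` and stop once it no longer decreases, which is the stopping condition the text describes.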
(5) Account association:
For each account x_i in the online social network OSN_X, perform the following calculation:
b_i = W Vec_x_i
b_i is the vector representation of node x_i of OSN_X in the coordinate space of OSN_Y. The cosine similarity function is then used to compute the similarity between b_i and the vector Vec_y_j of each account in the online social network OSN_Y. Depending on the purpose, either the account y_jmax with the greatest similarity that also exceeds a set threshold is selected as the associated account of x_i, or the t accounts with the greatest similarity (for example, t = 5) are selected as the candidate set.
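Step (5) can be sketched as below. This is an illustrative example under stated assumptions: the function names `cosine` and `associate`, the default threshold of 0.8, the handling of an empty model, and the toy vectors are hypothetical, and the account vector model is assumed to be a dict from account to vector.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def associate(vec_x, W, model_y, threshold=0.8, t=5):
    """Map vec_x into the OSN_Y space and rank the OSN_Y accounts.

    Returns (match, candidates): match is the most similar account if its
    similarity exceeds the threshold, else None; candidates are the top-t
    accounts by similarity.
    """
    b = [sum(w * x for w, x in zip(row, vec_x)) for row in W]   # b_i = W Vec_x_i
    ranked = sorted(((cosine(b, vec_y), acct) for acct, vec_y in model_y.items()),
                    reverse=True)
    if not ranked:                      # empty model: nothing to associate
        return None, []
    best_sim, best_acct = ranked[0]
    match = best_acct if best_sim > threshold else None
    return match, [acct for _, acct in ranked[:t]]

# Toy data (hypothetical): W rotates by 90 degrees, three accounts in OSN_Y
W = [[0.0, -1.0], [1.0, 0.0]]
model_y = {"y1": [0.0, 1.0], "y2": [1.0, 0.0], "y3": [-1.0, 0.2]}
match, candidates = associate([1.0, 0.0], W, model_y, threshold=0.8, t=2)
```

Returning both the thresholded match and the top-t list lets a caller pick whichever output suits its purpose, mirroring the two alternatives named in the step.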
2. The method for associating accounts according to claim 1, characterized in that the account node relationship graph is obtained as follows:
2.1) Use a web crawler to read the friend list of each account and store the friend relationships of each account in a text document, thereby obtaining the account friend relationships of the target online social networks, i.e. the two online social networks OSN_X and OSN_Y;
2.2) Read each edge record in the text document into a graph-edge dictionary dict, with each account (node) as the key and all nodes connected to it as the value. The relationships among the accounts of a social network can then be expressed as a dictionary-type variable {account: (adjacent accounts)}, and this variable is the account node relationship graph.
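Step 2.2) can be sketched as follows, assuming each line of the text document holds one friend relation as two whitespace-separated account names. That file format, the function name `load_graph`, the skipping of malformed lines and self-loops, and the sample data are assumptions. The adjacent accounts are stored here as sorted lists rather than tuples.

```python
from collections import defaultdict
import io

def load_graph(lines):
    """Build the undirected {account: [adjacent accounts]} dict from edge records."""
    graph = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:          # skip blank or malformed records (assumption)
            continue
        u, v = parts
        if u == v:                   # ignore self-loops (assumption)
            continue
        graph[u].add(v)              # friendship is undirected, so record
        graph[v].add(u)              # the edge in both directions
    return {node: sorted(neighbors) for node, neighbors in graph.items()}

# An in-memory stand-in for the crawled text document (hypothetical data)
edge_file = io.StringIO("alice bob\nbob carol\nalice carol\ncarol dave\n")
rd_x = load_graph(edge_file)
```

Any iterable of lines works as input, so an open file handle can be passed in place of the `io.StringIO` object.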
3. The method for associating accounts according to claim 1, characterized in that the account association candidate set is obtained as follows:
Accounts in the two OSNs that are publicly associated with the same user entity are used as the training set, and the transformation matrix W is solved by gradient descent. The coordinates of the nodes in one OSN are then transformed into the coordinate system of the other OSN, so that the distance between two account vectors can be compared using the cosine similarity cos(b_i, Vec_y_j), giving the candidate set of associated accounts in the other OSN for a given account in one OSN. The greater the similarity, the smaller the distance and the stronger the association between the accounts corresponding to Vec_y_j and Vec_x_i. The node with the highest similarity whose similarity also exceeds a given threshold is then determined to be the associated account; alternatively, depending on the purpose, the user may select the t accounts with the greatest similarity (for example, t = 5) as the association candidate set.
CN201610057577.5A 2016-01-27 2016-01-27 Method for associating accounts in online social networks Expired - Fee Related CN105741175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610057577.5A CN105741175B (en) 2016-01-27 2016-01-27 Method for associating accounts in online social networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610057577.5A CN105741175B (en) 2016-01-27 2016-01-27 Method for associating accounts in online social networks

Publications (2)

Publication Number Publication Date
CN105741175A true CN105741175A (en) 2016-07-06
CN105741175B CN105741175B (en) 2019-08-20

Family

ID=56246762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610057577.5A Expired - Fee Related CN105741175B (en) 2016-01-27 2016-01-27 Method for associating accounts in online social networks

Country Status (1)

Country Link
CN (1) CN105741175B (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330020A (en) * 2017-06-20 2017-11-07 电子科技大学 A kind of user subject analytic method based on structure and attributes similarity
CN107392782A (en) * 2017-06-29 2017-11-24 上海斐讯数据通信技术有限公司 Corporations' construction method, device and computer-processing equipment based on word2Vec
CN108021610A (en) * 2017-11-02 2018-05-11 阿里巴巴集团控股有限公司 Random walk, random walk method, apparatus and equipment based on distributed system
CN108022171A (en) * 2016-10-31 2018-05-11 腾讯科技(深圳)有限公司 A kind of data processing method and equipment
CN108073687A (en) * 2017-11-17 2018-05-25 阿里巴巴集团控股有限公司 Random walk, random walk method, apparatus and equipment based on cluster
CN108985309A (en) * 2017-05-31 2018-12-11 腾讯科技(深圳)有限公司 A kind of data processing method and device
CN109242515A (en) * 2018-08-29 2019-01-18 阿里巴巴集团控股有限公司 Cross-platform abnormal account recognition methods and device
WO2019051962A1 (en) * 2017-09-14 2019-03-21 平安科技(深圳)有限公司 Real relationship matching method and apparatus for social platform users, and readable storage medium
CN109658094A (en) * 2017-10-10 2019-04-19 阿里巴巴集团控股有限公司 Random walk, random walk method, apparatus and equipment based on cluster
CN109739938A (en) * 2018-12-28 2019-05-10 广州华多网络科技有限公司 A kind of correlating method, device and the equipment of more accounts
CN110019975A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Random walk, random walk method, apparatus and equipment based on cluster
CN110046194A (en) * 2019-03-19 2019-07-23 阿里巴巴集团控股有限公司 A kind of method, apparatus and electronic equipment of expanding node relational graph
CN110162956A (en) * 2018-03-12 2019-08-23 华东师范大学 The method and apparatus for determining interlock account
CN110515986A (en) * 2019-08-27 2019-11-29 腾讯科技(深圳)有限公司 A kind of processing method of social network diagram, device and storage medium
CN111090814A (en) * 2019-12-30 2020-05-01 四川大学 Iterative cross-social network user account correlation method based on degree punishment
CN111143701A (en) * 2019-12-13 2020-05-12 中国电子科技网络信息安全有限公司 Social network user recommendation method and system based on multiple dimensions
CN111177248A (en) * 2020-04-10 2020-05-19 上海飞旗网络技术股份有限公司 Data storage method and device based on feature recognition and format conversion
CN111192154A (en) * 2019-12-25 2020-05-22 西安交通大学 Social network user node matching method based on style migration
CN111368013A (en) * 2020-06-01 2020-07-03 深圳市卡牛科技有限公司 Unified identification method, system, equipment and storage medium based on multiple accounts
CN111915429A (en) * 2020-08-11 2020-11-10 北京开科唯识技术有限公司 Account checking method and device
CN112232834A (en) * 2020-09-29 2021-01-15 中国银联股份有限公司 Resource account determination method, device, equipment and medium
WO2021043093A1 (en) * 2019-09-02 2021-03-11 平安科技(深圳)有限公司 Method and apparatus for associating and registering multiple accounts, computer device and storage medium
CN112819056A (en) * 2021-01-25 2021-05-18 百果园技术(新加坡)有限公司 Group control account mining method, device, equipment and storage medium
CN112861015A (en) * 2019-11-27 2021-05-28 北京达佳互联信息技术有限公司 Account associated information acquisition method and device in application program and electronic equipment
CN116090525A (en) * 2022-11-15 2023-05-09 广东工业大学 Embedded vector representation method and system based on hierarchical random walk sampling strategy

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009088671A1 (en) * 2008-01-04 2009-07-16 Yahoo! Inc. Identifying and employing social network relationships
CN102457501A (en) * 2010-10-26 2012-05-16 腾讯科技(深圳)有限公司 Identification method and system for instant messaging account
CN103927303A (en) * 2013-01-10 2014-07-16 华为技术有限公司 Method and device for searching accounts
CN104765729A (en) * 2014-01-02 2015-07-08 中国人民大学 Cross-platform micro-blogging community account matching method
CN104052651A (en) * 2014-06-03 2014-09-17 西安交通大学 Method and device for building social contact group
CN104573057A (en) * 2015-01-22 2015-04-29 电子科技大学 Account correlation method used for UGC (User Generated Content)-spanning website platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FREDRIK JOHANSSON 等: "Detecting Multiple Aliases in Social Media", 《2013 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING (ASONAM 2013)》 *

Cited By (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108022171B (en) * 2016-10-31 2021-10-15 腾讯科技(深圳)有限公司 Data processing method and equipment
CN108022171A (en) * 2016-10-31 2018-05-11 腾讯科技(深圳)有限公司 A kind of data processing method and equipment
CN108985309B (en) * 2017-05-31 2022-11-29 腾讯科技(深圳)有限公司 Data processing method and device
CN108985309A (en) * 2017-05-31 2018-12-11 腾讯科技(深圳)有限公司 A kind of data processing method and device
CN107330020A (en) * 2017-06-20 2017-11-07 电子科技大学 A kind of user subject analytic method based on structure and attributes similarity
CN107330020B (en) * 2017-06-20 2020-03-24 电子科技大学 User entity analysis method based on structure and attribute similarity
CN107392782A (en) * 2017-06-29 2017-11-24 上海斐讯数据通信技术有限公司 Corporations' construction method, device and computer-processing equipment based on word2Vec
WO2019051962A1 (en) * 2017-09-14 2019-03-21 平安科技(深圳)有限公司 Real relationship matching method and apparatus for social platform users, and readable storage medium
CN109658094B (en) * 2017-10-10 2020-09-18 阿里巴巴集团控股有限公司 Random walk, random walk method based on cluster, random walk device and equipment
US10776334B2 (en) 2017-10-10 2020-09-15 Alibaba Group Holding Limited Random walking and cluster-based random walking method, apparatus and device
US10901971B2 (en) 2017-10-10 2021-01-26 Advanced New Technologies Co., Ltd. Random walking and cluster-based random walking method, apparatus and device
CN110019975A (en) * 2017-10-10 2019-07-16 阿里巴巴集团控股有限公司 Random walk, random walk method, apparatus and equipment based on cluster
CN109658094A (en) * 2017-10-10 2019-04-19 阿里巴巴集团控股有限公司 Random walk, random walk method, apparatus and equipment based on cluster
WO2019085614A1 (en) * 2017-11-02 2019-05-09 阿里巴巴集团控股有限公司 Random walking, and random walking method, apparatus and device based on distributed system
CN108021610A (en) * 2017-11-02 2018-05-11 阿里巴巴集团控股有限公司 Random walk, random walk method, apparatus and equipment based on distributed system
US11074246B2 (en) 2017-11-17 2021-07-27 Advanced New Technologies Co., Ltd. Cluster-based random walk processing
WO2019095858A1 (en) * 2017-11-17 2019-05-23 阿里巴巴集团控股有限公司 Random walk method, apparatus and device, and cluster-based random walk method, apparatus and device
CN108073687B (en) * 2017-11-17 2020-09-08 阿里巴巴集团控股有限公司 Random walk, random walk method based on cluster, random walk device and equipment
TWI709049B (en) * 2017-11-17 2020-11-01 開曼群島商創新先進技術有限公司 Random walk, cluster-based random walk method, device and equipment
CN108073687A (en) * 2017-11-17 2018-05-25 阿里巴巴集团控股有限公司 Random walk, random walk method, apparatus and equipment based on cluster
CN110162956B (en) * 2018-03-12 2024-01-19 华东师范大学 Method and device for determining associated account
CN110162956A (en) * 2018-03-12 2019-08-23 华东师范大学 The method and apparatus for determining interlock account
CN109242515A (en) * 2018-08-29 2019-01-18 阿里巴巴集团控股有限公司 Cross-platform abnormal account recognition methods and device
CN109242515B (en) * 2018-08-29 2021-07-23 创新先进技术有限公司 Cross-platform abnormal account identification method and device
CN109739938A (en) * 2018-12-28 2019-05-10 广州华多网络科技有限公司 A kind of correlating method, device and the equipment of more accounts
CN110046194A (en) * 2019-03-19 2019-07-23 阿里巴巴集团控股有限公司 A kind of method, apparatus and electronic equipment of expanding node relational graph
CN110515986A (en) * 2019-08-27 2019-11-29 腾讯科技(深圳)有限公司 A kind of processing method of social network diagram, device and storage medium
CN110515986B (en) * 2019-08-27 2023-01-06 腾讯科技(深圳)有限公司 Processing method and device of social network diagram and storage medium
WO2021043093A1 (en) * 2019-09-02 2021-03-11 平安科技(深圳)有限公司 Method and apparatus for associating and registering multiple accounts, computer device and storage medium
CN112861015A (en) * 2019-11-27 2021-05-28 北京达佳互联信息技术有限公司 Account associated information acquisition method and device in application program and electronic equipment
CN111143701A (en) * 2019-12-13 2020-05-12 中国电子科技网络信息安全有限公司 Social network user recommendation method and system based on multiple dimensions
CN111192154A (en) * 2019-12-25 2020-05-22 西安交通大学 Social network user node matching method based on style migration
CN111192154B (en) * 2019-12-25 2023-05-02 西安交通大学 Social network user node matching method based on style migration
CN111090814B (en) * 2019-12-30 2021-02-09 四川大学 Iterative cross-social network user account correlation method based on degree punishment
CN111090814A (en) * 2019-12-30 2020-05-01 四川大学 Iterative cross-social network user account correlation method based on degree punishment
CN111177248A (en) * 2020-04-10 2020-05-19 上海飞旗网络技术股份有限公司 Data storage method and device based on feature recognition and format conversion
CN111368013A (en) * 2020-06-01 2020-07-03 深圳市卡牛科技有限公司 Unified identification method, system, equipment and storage medium based on multiple accounts
CN111915429A (en) * 2020-08-11 2020-11-10 北京开科唯识技术有限公司 Account checking method and device
CN112232834A (en) * 2020-09-29 2021-01-15 中国银联股份有限公司 Resource account determination method, device, equipment and medium
CN112232834B (en) * 2020-09-29 2024-04-26 中国银联股份有限公司 Resource account determination method, device, equipment and medium
CN112819056A (en) * 2021-01-25 2021-05-18 百果园技术(新加坡)有限公司 Group control account mining method, device, equipment and storage medium
CN116090525A (en) * 2022-11-15 2023-05-09 广东工业大学 Embedded vector representation method and system based on hierarchical random walk sampling strategy
CN116090525B (en) * 2022-11-15 2024-02-13 广东工业大学 Embedded vector representation method and system based on hierarchical random walk sampling strategy

Also Published As

Publication number Publication date
CN105741175B (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN105741175A (en) Method for linking accounts in OSNs (On-line Social Networks)
CN103227731B (en) Based on the complex network node importance local calculation method improving "structural hole"
Danglade et al. On the use of machine learning to defeature CAD models for simulation
CN112256981B (en) Rumor detection method based on linear and nonlinear propagation
CN104462592A (en) Social network user behavior relation deduction system and method based on indefinite semantics
CN102799671A (en) Network individual recommendation method based on PageRank algorithm
CN105719191A (en) System and method of discovering social group having unspecified behavior senses in multi-dimensional space
CN105158761A (en) Radar synthetic phase unwrapping method based on branch-cut method and surface fitting
CN102708327A (en) Network community discovery method based on spectrum optimization
CN113486190A (en) Multi-mode knowledge representation method integrating entity image information and entity category information
Asoodeh et al. Oil-CO2 MMP determination in competition of neural network, support vector regression, and committee machine
CN105574541A (en) Compactness sorting based network community discovery method
CN105678590A (en) topN recommendation method for social network based on cloud model
Mo et al. Choosing a heuristic and root node for edge ordering in BDD-based network reliability analysis
Samantaray et al. Modelling response of infiltration loss toward water table depth using RBFN, RNN, ANFIS techniques
CN102819611B (en) Local community digging method of complicated network
CN105205184A (en) Latent variable model-based user preference extraction method
CN107658029A (en) A kind of brand-new distribution and privatization miRNA diseases contact Forecasting Methodology
CN107133274A (en) A kind of distributed information retrieval set option method based on figure knowledge base
Campbell et al. Cross-domain entity resolution in social media
CN104899283A (en) Frequent sub-graph mining and optimizing method for single uncertain graph
CN109977131A (en) A kind of house type matching system
CN103186696B (en) Towards the auxiliary variable reduction method of high dimensional nonlinear soft-sensing model
Holzer et al. An analysis of the renormalization group method for asymptotic expansions with logarithmic switchback terms
CN105761152A (en) Topic participation prediction method based on triadic group in social network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190820

Termination date: 20220127

CF01 Termination of patent right due to non-payment of annual fee