CN111475739A - Heterogeneous social network user anchor link identification method based on meta-path - Google Patents


Info

Publication number
CN111475739A
Authority
CN
China
Prior art keywords
user
social network
matrix
users
heterogeneous social
Legal status
Granted
Application number
CN202010438376.6A
Other languages
Chinese (zh)
Other versions
CN111475739B (en)
Inventor
杨武
王巍
玄世昌
苘大鹏
吕继光
刘娟
Current Assignee
Harbin Engineering University
Original Assignee
Harbin Engineering University
Application filed by Harbin Engineering University
Priority to CN202010438376.6A
Publication of CN111475739A
Application granted
Publication of CN111475739B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention belongs to the technical field of anchor link identification for social network entities, and particularly relates to a meta-path-based method for identifying user anchor links in heterogeneous social networks. The method targets heterogeneous social networks with many entity types, complex link relationships and few labeled entities. Because it is based on unsupervised learning, it avoids problems such as uneven data distribution and feature selection; it uses the link relationships and attribute information of user entities in the social network, combined with meta-path techniques, to fully mine the link relationships related to users. Through the definitions of cost functions, matrix norms and an objective function, the user anchor link identification problem is converted into the problem of optimizing an objective function.

Description

Heterogeneous social network user anchor link identification method based on meta-path
Technical Field
The invention belongs to the technical field of social network entity anchor link identification, and particularly relates to a heterogeneous social network user anchor link identification method based on a meta-path.
Background
Since the start of the 21st century, Internet technology has kept evolving and more and more people take part in social networks. According to statistics, by 2019 the global social network industry had 2.82 billion users, accounting for 70.4% of all Internet users. Outside China, people often communicate on Twitter and Facebook; in China, people browse trending news on microblogs and communicate with friends in real time on QQ and WeChat. Online social networks stem from everyday person-to-person interactions and are snapshots of real-world activities mapped onto the network, and the same user often participates in several social networks. As the number of social network users keeps growing, identifying anchor-linked users (aligned users) across networks has many important practical applications. The heterogeneity of social networks, the differences among multiple social networks, the lack of a large number of labeled users with known anchor links, and the one-to-one constraint all make anchor link user identification challenging. How to build a heterogeneous social network user anchor link identification model from the numerous and diverse user-related data in social networks has become a research focus in community discovery, recommender systems and multi-network fusion.
The social network anchor link identification problem was first proposed by J. Zhang et al. in 2013: the researchers fused the information owned by the same user in the Twitter and Foursquare networks, so that a user's homepage displayed in one network can link directly to the user's account in the other network. Danai Koutra converted the user anchor link identification problem into a bipartite graph matching problem based on the link characteristics between users. Tang proposed an algorithm that uses user attribute information: it analyzes all the semantic information of a user in a social network with a topic model and combines it with network structure features to identify anchor links. Yizhou Sun et al. first proposed the meta-path concept; in a bibliographic co-author network, they used meta-paths to characterize complex relationships among entities such as papers, authors, conferences and publishers, and used them for co-author link prediction. So far, there is no novel and effective method that extends meta-path techniques to social networks while using both the link relationships and the attribute information of users for anchor link identification. For the user entities of heterogeneous social networks, the anchor link identification problem is to identify, by analyzing users' information in the networks, all accounts registered by the same real-world user across two or more networks, such that the aligned accounts satisfy a one-to-one mapping between the networks.
Disclosure of Invention
The invention aims to provide a heterogeneous social network user anchor link identification method based on a meta path.
The object of the invention is achieved by the following technical solution, which comprises the following steps:
step 1: input the heterogeneous social networks S1 and S2;
step 2: according to the meta path
Figure BDA0002503153140000011
which expresses the first-order link relationship between users, determine in each of the networks S1 and S2 whether a first-order link exists between two users, and represent the result with matrices $M^{(1)}$ and $M^{(2)}$, whose rows and columns correspond to the users of one network; an element equal to 1 indicates that the two users have a first-order link, and 0 indicates that they do not;
step 3: according to the second-order link relationships expressed by the three meta paths
Figure BDA0002503153140000021
,
Figure BDA0002503153140000022
and
Figure BDA0002503153140000023
obtain, in each of the networks S1 and S2, the total number of meta-path instances matching any of the three formats between each pair of users, denoted $P^{(1)}_{ij}$ for users $u_{1i}$ and $u_{1j}$ in S1 and $P^{(2)}_{ij}$ for users $u_{2i}$ and $u_{2j}$ in S2, where $u_{1i}$ and $u_{1j}$ are two users in the heterogeneous social network S1, and $u_{2i}$ and $u_{2j}$ are two users in the heterogeneous social network S2;
step 4: compute the second-order link relationship matrix $B^{(1)}$ of the heterogeneous social network S1 and the second-order link relationship matrix $B^{(2)}$ of the heterogeneous social network S2. Following the Sørensen index, the elements $b^{(1)}_{ij}$ of $B^{(1)}$ and $b^{(2)}_{ij}$ of $B^{(2)}$ are:
$$b^{(1)}_{ij}=\frac{2P^{(1)}_{ij}}{d(u_{1i})+d(u_{1j})},\qquad b^{(2)}_{ij}=\frac{2P^{(2)}_{ij}}{d(u_{2i})+d(u_{2j})}$$
where $P^{(1)}_{ij}$ and $P^{(2)}_{ij}$ are the meta-path totals obtained in step 3; $d(u_{1i})$ and $d(u_{1j})$ are the degrees of users $u_{1i}$ and $u_{1j}$ in the heterogeneous social network S1; and $d(u_{2i})$ and $d(u_{2j})$ are the degrees of users $u_{2i}$ and $u_{2j}$ in the heterogeneous social network S2;
step 5: use matrix $B^{(1)}$ to correct matrix $M^{(1)}$ and matrix $B^{(2)}$ to correct matrix $M^{(2)}$, obtaining the final friend-relationship adjacency matrices $M^{(1)}$ and $M^{(2)}$ between users:
if an element of $M^{(1)}$ is 1, skip it; if it is 0, check whether the element at the corresponding position of $B^{(1)}$ is not less than 0.5; if so, change the 0 in $M^{(1)}$ to 1; otherwise the element of $M^{(1)}$ remains unchanged;
if an element of $M^{(2)}$ is 1, skip it; if it is 0, check whether the element at the corresponding position of $B^{(2)}$ is not less than 0.5; if so, change the 0 in $M^{(2)}$ to 1; otherwise the element of $M^{(2)}$ remains unchanged;
step 6: according to the meta path
Figure BDA00025031531400000214
which represents the check-in relationship between users and locations, determine in each of the two networks whether a user and a location are connected by a check-in made through a text, and represent the result with matrices $N^{(1)}$ and $N^{(2)}$, whose rows correspond to users and columns to locations: an element is 1 if the corresponding user has checked in at the corresponding location, and 0 otherwise;
step 7: obtain the user anchor link identification objective function F(X, Y):
$$F(X,Y)=\|X^TM^{(1)}X-M^{(2)}\|_F^2+\|X^TN^{(1)}Y-N^{(2)}\|_F^2-\|X\circ V\|_1+\|X\|_1+\|Y\|_1$$
where X is the user anchor link mapping matrix; Y is the user-location mapping matrix; $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix; $\|\cdot\|_1$ denotes the (entrywise) L1 norm of a matrix; and V is the inter-network user attribute feature similarity matrix, whose element $V_{m,n}$ is:
Figure BDA0002503153140000032
where
Figure BDA0002503153140000033
is the user name similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2,
Figure BDA0002503153140000034
is the user name of $u_i$, and
Figure BDA0002503153140000035
is the user name of $u_j$;
Figure BDA0002503153140000036
is the similarity of the online activity time patterns of $u_i$ in S1 and $u_j$ in S2,
Figure BDA0002503153140000037
is the online activity time vector of $u_i$, and
Figure BDA0002503153140000038
is the online activity time vector of $u_j$;
Figure BDA0002503153140000039
is the text content similarity of $u_i$ in S1 and $u_j$ in S2,
Figure BDA00025031531400000310
is the word frequency vector of the text content of $u_i$, and
Figure BDA00025031531400000311
is the word frequency vector of the text content of $u_j$;
step 8: compute the matrix X at which the user anchor link identification objective function F(X, Y) reaches its minimum, which gives the user anchor link mapping matrix X. An element of X equal to 1 indicates that the two corresponding users have an anchor link, and 0 indicates that they do not.
The invention may further include the following features:
the minimum of the objective function F(X, Y) in step 8 is computed with alternating projected gradient descent: fix Y, take the partial derivative of the objective function with respect to X and update the elements of X; then fix X, take the partial derivative with respect to Y and update the elements of Y. After each pair of X and Y updates, correct the elements of X and Y: an element greater than 1 is projected to 1, an element less than 0 is projected to 0, and all other elements are unchanged. Set a maximum number of iterations for the alternating projected gradient descent, keep updating X and Y without exceeding it until the objective function reaches its minimum, and output the user anchor link mapping matrix X;
Figure BDA00025031531400000312
The user name similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7,
Figure BDA0002503153140000041
is calculated as follows:
Figure BDA0002503153140000042
Figure BDA0002503153140000043
where g is the number of matching characters between the user names of $u_i$ and $u_j$, and h is half the number of transpositions among the matching characters;
Figure BDA0002503153140000044
is the length of the user name of $u_i$;
Figure BDA0002503153140000045
is the length of the user name of $u_j$; and l is the length of the common prefix of the two user names.
The online activity time pattern similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7,
Figure BDA0002503153140000046
is calculated as follows:
Figure BDA0002503153140000047
Figure BDA0002503153140000048
Figure BDA0002503153140000049
where
Figure BDA00025031531400000410
is the activity frequency of user $u_i$ in S1 in the m-th period;
Figure BDA00025031531400000411
is the activity frequency of user $u_j$ in S2 in the m-th period. For the online activity time pattern, the 24-hour clock is used and a day is divided into 24 periods of one hour each;
Figure BDA00025031531400000412
Figure BDA00025031531400000413
Figure BDA00025031531400000414
indicates that user $u_i$ in S1 publishes $k_i$ texts in the m-th period;
Figure BDA00025031531400000415
indicates that user $u_j$ in S2 publishes $k_j$ texts in the m-th period.
The text content similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7,
Figure BDA00025031531400000416
is calculated as follows:
step 7.1: for user $u_i$ in S1 and user $u_j$ in S2, compute the keywords of their respective texts with the TF-IDF algorithm and merge them into one set;
step 7.2: for the text content of each of the two users, compute the relative frequency of each word in the set, producing for the two users the word frequency vectors
Figure BDA0002503153140000051
and
Figure BDA0002503153140000052
of the same length;
step 7.3: compute the cosine similarity of the word frequency vectors
Figure BDA0002503153140000053
and
Figure BDA0002503153140000054
which turns the text content similarity into a comparison of the angle between two vectors of the same dimension:
Figure BDA0002503153140000055
Beneficial effects of the invention:
the invention provides a meta-path-based method for identifying user anchor links in heterogeneous social networks, aimed at scenarios with many entity types, complex link relationships and few labeled entities. Because the method is based on unsupervised learning, it avoids problems such as uneven data distribution and feature selection, and it fully mines the link relationships related to users by combining meta-path techniques with the link relationships and attribute information of user entities in the social network. Through the definitions of cost functions, matrix norms and an objective function, the user anchor link identification problem is converted into the problem of optimizing an objective function.
Drawings
FIG. 1 is a heterogeneous social network structure schema diagram.
Fig. 2 is a schematic diagram of friend relationships between users in a network.
Fig. 3 is a schematic diagram of a check-in relationship between a user and a location within a network.
FIG. 4(a) is a plot of the fitting function $f_1(x)=\theta_0+\theta_1x$.
FIG. 4(b) is a plot of the fitting function $f_2(x)=\theta_0+\theta_1x+\theta_2x^2$.
FIG. 4(c) is a plot of the fitting function $f_3(x)=\theta_0+\theta_1x+\theta_2x^2+\theta_3x^3+\theta_4x^4$.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a schema diagram of the heterogeneous social network structure, in which the circles denote the entity types of the network. FIG. 2 is a schematic diagram of the relationships between users within a social network: circles represent user entities, and undirected edges between circles represent link relationships between entities. For example, the undirected edge between users f and g indicates a link relationship between them, whereas users f and e have no first-order link but share many common neighbors, as shown by the dashed ellipse; for instance, both are friends with user a, so a is a common friend of f and e. FIG. 3 is a schematic diagram of the check-in relationships between users and locations within a network. The fitting function corresponding to FIG. 4(a) is $f_1(x)=\theta_0+\theta_1x$, the fitting function corresponding to FIG. 4(b) is $f_2(x)=\theta_0+\theta_1x+\theta_2x^2$, and FIG. 4(c) shows the fitting function $f_3(x)=\theta_0+\theta_1x+\theta_2x^2+\theta_3x^3+\theta_4x^4$. The invention provides a heterogeneous social network user anchor link identification method based on a meta path.
The method comprises the following implementation steps:
1. Input the heterogeneous social networks S1 and S2.
2. According to the meta path
Figure BDA0002503153140000061
which expresses the first-order link relationship, determine whether a first-order link exists between users in each of the two networks, and represent the result with matrices $M^{(1)}$ and $M^{(2)}$: 1 indicates that two users in the network have a first-order link, and 0 indicates that they do not.
3. According to the second-order link relationships expressed by the three meta paths
Figure BDA0002503153140000062
and
Figure BDA0002503153140000063
determine whether a second-order link exists between users, and obtain, by adjacency matrix multiplication, the total number of meta-path instances matching the three formats between each pair of users in the two networks.
4. Divide twice the total number of meta paths corresponding to the second-order link relationships between users, obtained in step 3, by the sum of the degrees of the two end-point users of the meta paths, and use the resulting values to correct the matrices $M^{(1)}$ and $M^{(2)}$: if a value is not less than 0.5, set the corresponding matrix element to 1 (elements that are already 1 stay unchanged); if it is less than 0.5, the element remains unchanged.
5. Define a cost function based on the friend relationships between users.
6. According to the meta path
Figure BDA0002503153140000064
which represents the check-in relationship between users and locations, determine in each of the two networks whether a user and a location are connected by a check-in made through a text, and represent the result with matrices $N^{(1)}$ and $N^{(2)}$.
7. Define a cost function based on the check-in relationships between users and locations.
8. Define a cost function based on the user attribute information: user name, online activity time pattern and text content.
9. Obtain the total cost function for user anchor link identification.
10. Obtain the objective function for user anchor link identification.
11. Minimize the objective function with projected gradient descent to obtain the user anchor link mapping matrix.
1. Definitions used in the scheme: E denotes the set of entities in the heterogeneous social network, E = U ∪ L ∪ T ∪ C, where U denotes users, L locations, T timestamps and C texts; R denotes the set of link relationships between entities; A denotes the set of attribute information possessed by a type of entity; for users, A = n ∪ t ∪ w, where n denotes the user name, t the user's daily activity time vector, and w the word frequency vector of the text content published by the user.
Figure BDA0002503153140000071
User name representing user i in the heterogeneous social network S1,
Figure BDA0002503153140000072
A time vector representing daily activities of user i in the heterogeneous social network S1,
Figure BDA0002503153140000073
A word frequency vector representing the textual content posted by user i in the heterogeneous social network S1.
2. According to the meta path
Figure BDA0002503153140000074
which expresses the first-order link relationship between users, determine in each of the networks S1 and S2 whether a first-order link exists between two users, and represent the result with matrices $M^{(1)}$ and $M^{(2)}$, whose rows and columns correspond to the users of one network; an element equal to 1 indicates that the two users in that network have a first-order link, and 0 indicates that they do not.
3. According to the second-order link relationships expressed by the three meta paths
Figure BDA0002503153140000075
and
Figure BDA0002503153140000076
determine whether second-order links exist between users in each network, and obtain, by adjacency matrix multiplication, the total number of meta-path instances matching the three formats between each pair of users in the two networks. For example, for two given users $u_i$ and $u_j$ in network S1, check whether meta-path instances conforming to $MP_2$ exist between them and count them with a meta-path counting method; the count is denoted
Figure BDA0002503153140000077
The same counting is performed for $MP_3$ and $MP_4$, and finally the counts for the three meta paths between the two users are added up; the total is denoted
Figure BDA0002503153140000078
Figure BDA0002503153140000079
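The adjacency matrix multiplication used in item 3 can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: the concrete meta paths $MP_2$ to $MP_4$ survive only as images, so the example assumes a user-user-user (friend of a friend) path and a hypothetical user-location-user path; the function names, the toy data and the choice to ignore self-pairs on the diagonal are also assumptions.

```python
import numpy as np

def first_order_matrix(n_users, edges):
    """First-order link matrix M: M[i, j] = 1 when users i and j are linked (undirected)."""
    M = np.zeros((n_users, n_users), dtype=int)
    for i, j in edges:
        M[i, j] = M[j, i] = 1
    return M

def count_meta_path(relations):
    """Count meta-path instances between end nodes by multiplying the
    relation matrices along the path, e.g. [M, M] for user-user-user."""
    counts = relations[0]
    for R in relations[1:]:
        counts = counts @ R
    return counts

def total_meta_path_counts(meta_paths):
    """Sum the instance counts of several user-to-user meta paths.
    Self-pairs (the diagonal) are set to 0, which is an assumption."""
    P = sum(count_meta_path(rels) for rels in meta_paths)
    np.fill_diagonal(P, 0)
    return P

# Toy network S1: path 0-1-2-3, plus a hypothetical user-location matrix.
M1 = first_order_matrix(4, [(0, 1), (1, 2), (2, 3)])
R_ul = np.array([[1, 0], [0, 0], [1, 0], [0, 1]])
P1 = total_meta_path_counts([[M1, M1], [R_ul, R_ul.T]])
```

Here users 0 and 2 get a total of 2: one common friend (user 1) plus one shared location.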
4. According to the Sørensen index formula, twice the total number of meta paths obtained in step 3,
Figure BDA00025031531400000710
is divided by the sum of the degrees of users $u_i$ and $u_j$, and the resulting value is used as an element of the second-order link relationship matrix, giving the matrices $B^{(1)}$ and $B^{(2)}$, where
Figure BDA00025031531400000711
and
Figure BDA00025031531400000712
denote the degrees of $u_i$ and $u_j$, respectively.
Figure BDA0002503153140000081
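The Sørensen-style normalization of item 4 can be written in a few lines of NumPy. This is a sketch under stated assumptions: user degrees are taken from the first-order friend matrix M (the source does not say which links the degree counts), pairs whose degree sum is zero are given 0, and the toy example counts only the friend-of-a-friend meta path; the function name is hypothetical.

```python
import numpy as np

def second_order_matrix(P, M):
    """B[i, j] = 2 * P[i, j] / (deg(u_i) + deg(u_j)), a Sorensen-style index.

    Degrees come from the first-order matrix M (an assumption);
    pairs whose degree sum is 0 get B[i, j] = 0.
    """
    deg = M.sum(axis=1).astype(float)
    denom = deg[:, None] + deg[None, :]
    B = np.zeros(P.shape, dtype=float)
    np.divide(2.0 * P, denom, out=B, where=denom > 0)
    return B

M = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
P = M @ M                      # friend-of-a-friend counts only
np.fill_diagonal(P, 0)         # ignore self-pairs (assumption)
B = second_order_matrix(P, M)
```

Users 0 and 2 share one common friend and have degrees 1 and 2, so their entry is 2 * 1 / 3 ≈ 0.67, which would pass the 0.5 threshold used in the next item.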
5. After $B^{(1)}$ and $B^{(2)}$ are calculated, use $B^{(1)}$ to correct $M^{(1)}$ and $B^{(2)}$ to correct $M^{(2)}$. If an element of $M^{(1)}$ or $M^{(2)}$ is already 1, skip it; if it is 0, check whether the element at the corresponding position of $B^{(1)}$ or $B^{(2)}$ is not less than 0.5, and if so change the 0 in $M^{(1)}$ or $M^{(2)}$ to 1; otherwise leave it unchanged. When all corrections are finished, the final complete friend-relationship adjacency matrices $M^{(1)}$ and $M^{(2)}$ between users are obtained.
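The correction rule of item 5 reduces to a single masked assignment. A minimal NumPy sketch; the function name and the toy matrices are hypothetical, while the 0.5 threshold comes from the text.

```python
import numpy as np

def complete_friend_matrix(M, B, threshold=0.5):
    """Change a 0 in M to 1 wherever B >= threshold; existing 1s are kept."""
    M_final = M.copy()
    M_final[(M == 0) & (B >= threshold)] = 1
    return M_final

M = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[0.0, 0.0, 0.67, 0.2],
              [0.0, 0.0, 0.0, 0.67],
              [0.67, 0.0, 0.0, 0.0],
              [0.2, 0.67, 0.0, 0.0]])
M_final = complete_friend_matrix(M, B)
```

Copying M first keeps the original first-order matrix available, which is convenient when the same correction is applied to $M^{(1)}$ and $M^{(2)}$ separately.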
6. Following the usual definitions of cost functions and matrix norms in machine learning, a cost function is defined based on the complete friend relationships between users in the networks, where X denotes the user anchor link mapping matrix and
Figure BDA0002503153140000082
denotes the square of the Frobenius norm of a matrix.
Figure BDA0002503153140000083
7. According to the meta path
Figure BDA0002503153140000084
which represents the check-in relationship between users and locations, determine in each of the two networks whether a user and a location are connected by a check-in made through a text, and represent the result with matrices $N^{(1)}$ and $N^{(2)}$, whose rows correspond to users and columns to locations: an element is 1 if the corresponding user has checked in at the corresponding location, and 0 otherwise.
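A sketch of building N from the check-in meta path of item 7. The meta path itself is only available as an image, so this assumes it chains a "user published text" relation with a "text checked in at location" relation; both input matrices and the function name are hypothetical.

```python
import numpy as np

def checkin_matrix(user_text, text_location):
    """N[u, l] = 1 when user u published at least one text checked in at location l.

    user_text: users x texts 0/1 matrix (hypothetical input)
    text_location: texts x locations 0/1 matrix (hypothetical input)
    """
    # The product counts user-text-location paths; any positive count means a check-in.
    return (user_text @ text_location > 0).astype(int)

user_text = np.array([[1, 1, 0],
                      [0, 0, 1]])
text_location = np.array([[1, 0],
                          [0, 0],
                          [0, 1]])
N = checkin_matrix(user_text, text_location)
```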
8. Following the usual definitions of cost functions and matrix norms in machine learning, define a cost function based on the check-in relationships between users and locations in the networks.
Figure BDA0002503153140000085
9. The similarity between the user names of different users is measured with the Jaro-Winkler similarity. For user $u_i$ in network S1 and user $u_j$ in network S2, the corresponding user names are denoted
Figure BDA0002503153140000086
and
Figure BDA0002503153140000087
and the user name similarity is denoted
Figure BDA0002503153140000088
Here g is the number of matching characters in the two user names, h is half the number of transpositions among the matching characters,
Figure BDA0002503153140000089
and
Figure BDA00025031531400000810
are the lengths of the two user names, and l is the length of the common prefix of the two user names.
Figure BDA00025031531400000811
Figure BDA00025031531400000812
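The Jaro-Winkler similarity of item 9 can be sketched with the textbook definition, using the same g, h and l as the text. The exact formulas in the source are images, so two details are assumptions: the Winkler prefix scale p = 0.1 and the cap of 4 characters on the common prefix, which are the usual defaults.

```python
def jaro(s1, s2):
    """Jaro similarity with g matching characters and h = half the transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    g = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                g += 1
                break
    if g == 0:
        return 0.0
    # Matched characters that appear in a different order are transpositions.
    transpositions, k = 0, 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    h = transpositions / 2
    return (g / len1 + g / len2 + (g - h) / g) / 3

def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Boost the Jaro score by the common prefix length l (capped at max_prefix)."""
    sim = jaro(s1, s2)
    prefix_len = 0  # l in the text
    for a, b in zip(s1, s2):
        if a != b or prefix_len == max_prefix:
            break
        prefix_len += 1
    return sim + prefix_len * p * (1 - sim)

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

For "MARTHA" and "MARHTA", g = 6, h = 1 and l = 3, so the Jaro score is 17/18 and the prefix bonus lifts it to about 0.9611.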
10. For a user's online activity time pattern, the 24-hour clock is used and a day is divided into 24 periods of one hour each. The texts a user publishes in the network are taken as the markers of the user's participation in social activity: count the total number of texts the user publishes in a day, count the number published in each period, and divide each period's count by the total to obtain the proportion of texts published in that period. For example, if user $u_a$ publishes k texts in the m-th period, this is written $num(u_a, m) = k$, and the user's activity frequency in the m-th period is denoted $t(u_a, m)$.
$$t(u_a,m)=\frac{num(u_a,m)}{\sum_{m'=0}^{23}num(u_a,m')}$$
Each period is computed in turn, finally giving for user $u_a$ a temporal activity vector $u_{a,t}$ of length 24:
$$u_{a,t}=\big(t(u_a,0),\,t(u_a,1),\,\ldots,\,t(u_a,23)\big)\qquad(8)$$
Each element of the vector gives the user's activity frequency in the corresponding period. For user $u_i$ in network S1 and user $u_j$ in network S2, first obtain their online activity time vectors
Figure BDA0002503153140000092
and
Figure BDA0002503153140000093
with the method above, then compute the similarity of the two vectors as their inner product, which gives the similarity of the two users' online activity time patterns.
Figure BDA0002503153140000094
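The activity-frequency vector of item 10 and its inner-product similarity can be sketched as below. The input format (a list of posting hours, 0 to 23, per user) and the function names are assumptions.

```python
import numpy as np

def activity_vector(post_hours):
    """24-element vector t(u, m): share of the user's texts posted in hour m (0-23)."""
    counts = np.bincount(np.asarray(post_hours, dtype=int), minlength=24).astype(float)
    total = counts.sum()
    return counts / total if total > 0 else counts

def activity_similarity(hours_i, hours_j):
    """Inner product of two users' activity vectors."""
    return float(activity_vector(hours_i) @ activity_vector(hours_j))

# Hypothetical posting hours for u_i in S1 and u_j in S2.
hours_i = [9, 9, 21, 21]
hours_j = [9, 21]
print(activity_similarity(hours_i, hours_j))  # 0.5
```

Both users split their activity evenly between 09:00 and 21:00, so the inner product is 0.5 * 0.5 + 0.5 * 0.5 = 0.5.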
11. For the text content published by users: for user $u_i$ in network S1 and user $u_j$ in network S2, first compute the keywords in their respective texts with the TF-IDF algorithm and merge the keywords into one set. Next, compute for each user's text content the relative frequency of each word in the set, generating for the two users the word frequency vectors
Figure BDA0002503153140000095
and
Figure BDA0002503153140000096
of the same length. Finally, compute the cosine similarity of the two vectors, which turns the text content similarity into a comparison of the angle between vectors of the same dimension.
Figure BDA0002503153140000097
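A self-contained sketch of the three text-similarity steps of item 11: TF-IDF keywords, a merged keyword set, and the cosine of relative word frequencies. The source does not say which corpus the IDF is computed over, how many keywords are kept, or which IDF variant is used; the sketch assumes a corpus of all users' token lists, k = 5, and the smoothed IDF log((1 + N) / (1 + df)) + 1. Tokenization is left to the caller, and all function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(tokens, corpus, k=5):
    """Top-k words of one user's tokens ranked by TF-IDF against `corpus`
    (a list of token lists). Corpus choice and smoothed IDF are assumptions."""
    if not tokens:
        return []
    doc_sets = [set(doc) for doc in corpus]
    tf = Counter(tokens)
    scores = {}
    for word, count in tf.items():
        df = sum(word in doc for doc in doc_sets)
        idf = math.log((1 + len(doc_sets)) / (1 + df)) + 1
        scores[word] = count / len(tokens) * idf
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]

def relative_frequencies(tokens, vocab):
    """Relative frequency of each vocabulary word in the user's tokens."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def text_similarity(tokens_i, tokens_j, corpus, k=5):
    # Merge both users' keywords into one vocabulary so the vectors have equal length.
    vocab = sorted(set(tfidf_keywords(tokens_i, corpus, k))
                   | set(tfidf_keywords(tokens_j, corpus, k)))
    return cosine(relative_frequencies(tokens_i, vocab),
                  relative_frequencies(tokens_j, vocab))

a = "coffee harbin snow coffee".split()
b = "coffee snow ski".split()
c = "football match goal".split()
corpus = [a, b, c]
```

With this toy corpus, users a and b share "coffee" and "snow" and their similarity comes out to 1/√2, while a and c share no keywords and score 0.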
12. Integrate the three user attributes from items 9, 10 and 11 (user name, online activity time pattern and text content) to obtain the inter-network user attribute feature similarity matrix V, and define a cost function based on the similarity of user attribute information between the networks.
Figure BDA0002503153140000098
13. Combine the cost functions for the link relationships and the attribute information of users in the heterogeneous social networks to define the total cost function for user anchor link identification.
Figure BDA0002503153140000099
14. To prevent overfitting, regularization terms are added to the original cost function to form the objective function:
$$F(X, Y) = \|X^T M^{(1)} X - M^{(2)}\|_F^2 + \|X^T N^{(1)} Y - N^{(2)}\|_F^2 - \|X \circ V\|_1 + \|X\|_1 + \|Y\|_1 \quad (13)$$
15. The following two lemmas make the objective function easier to solve.
Lemma 1: For a given matrix $A$, the square of its Frobenius norm equals the trace of $AA^T$:
$$\|A\|_F^2 = \mathrm{tr}(AA^T) \quad (14)$$
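Lemma 2: For given matrices $A$ and $B$ of the same size with non-negative entries, the $L_1$ norm of their Hadamard product equals the trace of $AB^T$, or equivalently of $A^TB$:
$$\|A \circ B\|_1 = \mathrm{tr}(AB^T) = \mathrm{tr}(A^TB) \quad (15)$$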
The objective function then translates into:
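$$F(X, Y) = \mathrm{tr}\left((X^T M^{(1)} X - M^{(2)})(X^T M^{(1)} X - M^{(2)})^T\right) + \mathrm{tr}\left((X^T N^{(1)} Y - N^{(2)})(X^T N^{(1)} Y - N^{(2)})^T\right) - \mathrm{tr}(XV^T) + \|X\|_1 + \|Y\|_1 \quad (16)$$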
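To check that the trace form in equation (16) agrees with the objective written using Frobenius and L1 norms (the form to which the two lemmas are applied), the sketch below evaluates both with NumPy on random non-negative matrices. Assumptions: $\|\cdot\|_1$ is read as the entrywise L1 norm, and the shapes are $X$: $n_1 \times n_2$, $M^{(1)}$: $n_1 \times n_1$, $M^{(2)}$: $n_2 \times n_2$, $N^{(1)}$: $n_1 \times p_1$, $N^{(2)}$: $n_2 \times p_2$, $Y$: $p_1 \times p_2$ and $V$: $n_1 \times n_2$, with $p_1$, $p_2$ the numbers of locations in S1 and S2. The function names are hypothetical.

```python
import numpy as np


def l1(A: np.ndarray) -> float:
    """Entrywise L1 norm (np.linalg.norm(A, 1) would give the max column sum instead)."""
    return float(np.abs(A).sum())


def objective_norm_form(X, Y, M1, M2, N1, N2, V):
    link = np.linalg.norm(X.T @ M1 @ X - M2, "fro") ** 2
    checkin = np.linalg.norm(X.T @ N1 @ Y - N2, "fro") ** 2
    return link + checkin - l1(X * V) + l1(X) + l1(Y)


def objective_trace_form(X, Y, M1, M2, N1, N2, V):
    A = X.T @ M1 @ X - M2  # Lemma 1: ||A||_F^2 = tr(A A^T)
    B = X.T @ N1 @ Y - N2
    # Lemma 2: ||X o V||_1 = tr(X V^T) for non-negative X and V
    return np.trace(A @ A.T) + np.trace(B @ B.T) - np.trace(X @ V.T) + l1(X) + l1(Y)


rng = np.random.default_rng(0)
n1, n2, p1, p2 = 5, 4, 3, 3
M1 = (rng.random((n1, n1)) < 0.4).astype(float)
M2 = (rng.random((n2, n2)) < 0.4).astype(float)
N1 = (rng.random((n1, p1)) < 0.4).astype(float)
N2 = (rng.random((n2, p2)) < 0.4).astype(float)
V = rng.random((n1, n2))
X = rng.random((n1, n2))
Y = rng.random((p1, p2))
print(np.isclose(objective_norm_form(X, Y, M1, M2, N1, N2, V),
                 objective_trace_form(X, Y, M1, M2, N1, N2, V)))  # True
```

Lemma 2 relies on non-negativity: with negative entries, $\mathrm{tr}(XV^T)$ can be smaller than $\|X \circ V\|_1$, which is one reason the iteration keeps $X$ and $Y$ inside $[0, 1]$.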
16. The objective function $F(X, Y)$ is minimized by repeatedly updating $X$ and $Y$. When $X$ and $Y$ no longer change, that is, when both matrices have converged, the objective function has reached its minimum, and $X$ is then the mapping matrix of the user anchor link relationships between the two networks. Alternating projected gradient descent is used: with $Y$ fixed, the partial derivative of the objective function with respect to $X$ is computed and the elements of $X$ are updated; with $X$ fixed, the partial derivative with respect to $Y$ is computed and the elements of $Y$ are updated. After each round of updates to $X$ and $Y$, two further steps are performed. First, the elements of $X$ and $Y$ are corrected by projection: an element greater than 1 is set to 1, an element less than 0 is set to 0, and any other element is left unchanged. Second, the objective function is re-evaluated with the updated matrices to obtain the latest cost value.
[Formula image not reproduced: update rules for X and Y]
17. A maximum number of iterations is set for the alternating projected gradient descent, and $X$ and $Y$ are updated repeatedly without exceeding it. When the stopping condition is met, the objective function has reached its minimum and $X$ is the user anchor link relation mapping matrix: an element of $X$ equal to 1 indicates that the two corresponding users are connected by an anchor link, and an element equal to 0 indicates that they are not.
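Paragraphs 16 and 17 can be sketched with NumPy as below. The update formulas in the source are only an image, so the gradients here are derived by hand from equation (16) under two assumptions: the L1 terms are differentiated as plain sums (valid because projection keeps $X$ and $Y$ in $[0, 1]$), and a fixed, hypothetical step size `lr` is used. The final thresholding at 0.5 that turns $X$ into 0/1 anchor links is also an assumption; the source only states how 1 and 0 entries are read. All names are hypothetical.

```python
import numpy as np


def objective(X, Y, M1, M2, N1, N2, V):
    """F(X, Y) in the trace form of equation (16), for X, Y with entries in [0, 1]."""
    R = X.T @ M1 @ X - M2
    S = X.T @ N1 @ Y - N2
    return np.trace(R @ R.T) + np.trace(S @ S.T) - np.trace(X @ V.T) + X.sum() + Y.sum()


def grad_X(X, Y, M1, M2, N1, N2, V):
    R = X.T @ M1 @ X - M2
    S = X.T @ N1 @ Y - N2
    # d/dX of tr(R R^T) = 2 (M1 X R^T + M1^T X R); of tr(S S^T) = 2 N1 Y S^T;
    # of -tr(X V^T) = -V; of sum(X) = 1 (X is non-negative).
    return 2 * (M1 @ X @ R.T + M1.T @ X @ R) + 2 * N1 @ Y @ S.T - V + 1.0


def grad_Y(X, Y, N1, N2):
    S = X.T @ N1 @ Y - N2
    # d/dY of tr(S S^T) = 2 N1^T X S; of sum(Y) = 1 (Y is non-negative).
    return 2 * N1.T @ X @ S + 1.0


def align(M1, M2, N1, N2, V, lr=0.01, max_iter=500, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.random(V.shape)                     # n1 x n2 user mapping
    Y = rng.random((N1.shape[1], N2.shape[1]))  # p1 x p2 location mapping
    cost = objective(X, Y, M1, M2, N1, N2, V)
    for _ in range(max_iter):
        X_new = X - lr * grad_X(X, Y, M1, M2, N1, N2, V)  # Y fixed
        Y_new = Y - lr * grad_Y(X_new, Y, N1, N2)          # X fixed
        # Projection after each pair of updates: > 1 becomes 1, < 0 becomes 0.
        X_new, Y_new = np.clip(X_new, 0.0, 1.0), np.clip(Y_new, 0.0, 1.0)
        cost = objective(X_new, Y_new, M1, M2, N1, N2, V)  # latest cost value
        converged = max(np.abs(X_new - X).max(), np.abs(Y_new - Y).max()) < tol
        X, Y = X_new, Y_new
        if converged:
            break
    anchors = (X >= 0.5).astype(int)  # hypothetical rounding to 0/1 anchor links
    return anchors, X, Y, cost


M = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
N = np.array([[1, 0], [0, 1], [1, 0], [0, 0]], dtype=float)
anchors, X, Y, cost = align(M, M, N, N, np.eye(4))
print(anchors)
```

With a fixed step size there is no guarantee of monotone descent; the iteration cap plays the role described in paragraph 17 when the change tolerance is never reached.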
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A meta-path-based method for identifying user anchor links in heterogeneous social networks, characterized by comprising the following steps:
step 1: inputting heterogeneous social networks S1 and S2;
step 2: according to the meta path [meta-path image not reproduced], determine in each of the networks S1 and S2 whether a first-order link exists between each pair of users, and represent the result with matrices $M^{(1)}$ and $M^{(2)}$, whose rows and columns each correspond to the users of one network; a matrix element of 1 indicates that the two users have a first-order link in that network, and 0 indicates that they do not;
step 3: according to the three meta paths [meta-path images not reproduced] that represent second-order links, obtain in each of the networks S1 and S2 the total number of meta-path instances of these three forms between each pair of users [count symbols not reproduced], where $u_{1i}$ and $u_{1j}$ are two users in heterogeneous social network S1, and $u_{2i}$ and $u_{2j}$ are two users in heterogeneous social network S2;
step 4: compute the second-order link matrix $B^{(1)}$ of heterogeneous social network S1 and the second-order link matrix $B^{(2)}$ of heterogeneous social network S2; the elements $b^{(1)}_{ij}$ of $B^{(1)}$ and $b^{(2)}_{ij}$ of $B^{(2)}$ are given by:
[formula images not reproduced]
where $d(u_{1i})$ and $d(u_{1j})$ denote the degrees of users $u_{1i}$ and $u_{1j}$ in heterogeneous social network S1, and $d(u_{2i})$ and $d(u_{2j})$ denote the degrees of users $u_{2i}$ and $u_{2j}$ in heterogeneous social network S2;
and 5: by means of matrices B(1)Correction matrix M(1)Using a matrix B(2)Correction matrix M(2)To obtain the final friend relation adjacent matrix M between users(1)And M(2)
If M is(1)Skipping if the middle element is 1; if M is(1)If the middle element is 0, check B(1)Whether the element of the corresponding position in (1) is not less than 0.5; if B is present(1)If the element at the corresponding position in M is not less than 0.5, M is added(1)Element 0 in (1) is changed to 1; otherwise, M(1)The middle element remains unchanged;
if M is(2)Skipping if the middle element is 1; if M is(2)If the middle element is 0, check B(2)Whether the element of the corresponding position in (1) is not less than 0.5; if B is present(2)If the element at the corresponding position in M is not less than 0.5, M is added(2)Element 0 in (1) is changed to 1; otherwise, M(2)The middle element remains unchanged;
step 6: according to the meta path [meta-path image not reproduced], which represents check-in relations between users and locations, determine in each of the two networks whether a user and a location are connected by a check-in relation through a text, and represent the result with matrices $N^{(1)}$ and $N^{(2)}$, whose rows correspond to users and whose columns correspond to locations; an element is 1 if the corresponding user has checked in at the corresponding location, and 0 otherwise;
step 7: obtain the user anchor link identification objective function $F(X, Y)$:
$$F(X, Y) = \|X^T M^{(1)} X - M^{(2)}\|_F^2 + \|X^T N^{(1)} Y - N^{(2)}\|_F^2 - \|X \circ V\|_1 + \|X\|_1 + \|Y\|_1$$
where $X$ is the user anchor link mapping matrix; $Y$ is the user-location relation mapping matrix; $\|\cdot\|_F^2$ denotes the squared Frobenius norm of a matrix; $\|\cdot\|_1$ denotes the L1 norm of a matrix; and $V$ is the inter-network user attribute similarity matrix, whose element $V_{m,n}$ is computed from the three similarities below [formula image not reproduced],
in which $sim_n(u_i, u_j)$ is the user-name similarity between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2, $n_i$ is the user name of $u_i$, and $n_j$ is the user name of $u_j$;
$sim_t(u_i, u_j)$ is the online activity time pattern similarity between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2, $\mathbf{u}_{i,t}$ is the online activity time vector of $u_i$, and $\mathbf{u}_{j,t}$ is the online activity time vector of $u_j$;
$sim_c(u_i, u_j)$ is the text content similarity between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2, $\mathbf{u}_{i,c}$ is the word-frequency vector of $u_i$'s text content, and $\mathbf{u}_{j,c}$ is the word-frequency vector of $u_j$'s text content;
step 8: compute the matrix $X$ that minimizes the user anchor link identification objective function $F(X, Y)$, giving the user anchor link relation mapping matrix $X$; an element of $X$ equal to 1 indicates that the two corresponding users are connected by an anchor link, and an element equal to 0 indicates that they are not.
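Steps 5 and 6 of claim 1 are simple matrix operations; a minimal NumPy sketch follows. The formula for $B^{(1)}$ and $B^{(2)}$ is not reproduced in the source text, so `correct_adjacency` takes the second-order matrix `B` as a given input, and `checkin_matrix` assumes the user-text-location meta path has already been reduced to (user, location) pairs taken from posts that carry a location. Both function names and the input layout are hypothetical.

```python
import numpy as np


def correct_adjacency(M: np.ndarray, B: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Step 5: set M[i, j] from 0 to 1 where B[i, j] >= threshold; 1 entries are kept."""
    corrected = M.copy()  # leave the input matrix untouched
    corrected[(M == 0) & (B >= threshold)] = 1
    return corrected


def checkin_matrix(checkins, users, locations) -> np.ndarray:
    """Step 6: N[u, l] = 1 if user u checked in at location l through some post."""
    user_index = {u: k for k, u in enumerate(users)}
    loc_index = {loc: k for k, loc in enumerate(locations)}
    N = np.zeros((len(users), len(locations)), dtype=int)
    for user, loc in checkins:
        N[user_index[user], loc_index[loc]] = 1  # repeated check-ins still give 1
    return N


M1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
B1 = np.array([[0.0, 0.9, 0.6], [0.9, 0.0, 0.2], [0.6, 0.2, 0.0]])
print(correct_adjacency(M1, B1))  # the (0, 2) and (2, 0) entries become 1
print(checkin_matrix([("a", "x"), ("b", "z"), ("a", "x")], ["a", "b"], ["x", "y", "z"]))
```

Because the correction only ever turns 0 into 1, existing first-order links are never removed; second-order evidence can only add edges.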
2. The meta-path-based heterogeneous social network user anchor link identification method according to claim 1, wherein the minimum of the user anchor link identification objective function $F(X, Y)$ in step 8 is computed as follows: alternating projected gradient descent is used; with $Y$ fixed, the partial derivative of the objective function with respect to $X$ is computed and the elements of $X$ are updated; with $X$ fixed, the partial derivative of the objective function with respect to $Y$ is computed and the elements of $Y$ are updated; after each pair of updates to $X$ and $Y$, the elements of $X$ and $Y$ are corrected: an element greater than 1 is projected to 1, an element less than 0 is projected to 0, and any other element is left unchanged; a maximum number of iterations is set for the alternating projected gradient descent, and $X$ and $Y$ are updated repeatedly without exceeding it until the objective function reaches its minimum, whereupon the user anchor link relation mapping matrix $X$ is output;
[formula image not reproduced: update rules for X and Y]
3. The meta-path-based heterogeneous social network user anchor link identification method according to claim 1 or 2, wherein the user-name similarity $sim_n(u_i, u_j)$ between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
[formula images not reproduced]
where $g$ is the number of matching characters between the user names of $u_i$ and $u_j$; $h$ is half the number of transpositions among the matching characters; $|n_i|$ is the length of the user name of $u_i$; $|n_j|$ is the length of the user name of $u_j$; and $l$ is the length of the common prefix of the two user names.
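The variables in claim 3 (matching characters $g$, half the transpositions $h$, the two name lengths and the common-prefix length $l$) are those of the Jaro-Winkler string similarity. Since the claimed formulas are only images, the sketch below implements the standard Jaro-Winkler definition as an assumption: $sim_{jaro} = \frac{1}{3}\left(\frac{g}{|n_i|} + \frac{g}{|n_j|} + \frac{g - h}{g}\right)$, followed by $sim_{jaro} + l \cdot p \cdot (1 - sim_{jaro})$. The prefix weight $p = 0.1$ and the cap of 4 prefix characters are the conventional defaults, not values stated in the source.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity of two strings."""
    if not s1 and not s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)  # matching distance
    used1 = [False] * len(s1)
    used2 = [False] * len(s2)
    g = 0  # number of matching characters
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                g += 1
                break
    if g == 0:
        return 0.0
    matched1 = [c for c, used in zip(s1, used1) if used]
    matched2 = [c for c, used in zip(s2, used2) if used]
    h = sum(a != b for a, b in zip(matched1, matched2)) / 2  # half the transpositions
    return (g / len(s1) + g / len(s2) + (g - h) / g) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the common-prefix length l (capped at max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

The prefix boost suits user names well, since people who keep an identity across platforms often reuse the start of a handle and vary only a suffix.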
4. The meta-path-based heterogeneous social network user anchor link identification method according to claim 1 or 2, wherein the online activity time pattern similarity $sim_t(u_i, u_j)$ between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
$$sim_t(u_i, u_j) = \sum_{m=0}^{23} t(u_i, m)\, t(u_j, m)$$
$$t(u_i, m) = \frac{num(u_i, m)}{\sum_{m'=0}^{23} num(u_i, m')}, \qquad t(u_j, m) = \frac{num(u_j, m)}{\sum_{m'=0}^{23} num(u_j, m')}$$
where $t(u_i, m)$ is the activity frequency of user $u_i$ in heterogeneous social network S1 in the $m$-th period, and $t(u_j, m)$ is the activity frequency of user $u_j$ in heterogeneous social network S2 in the $m$-th period; to capture users' online activity time patterns, the 24-hour clock is used and a day is divided into 24 periods of one hour each;
$$\mathbf{u}_{i,t} = (t(u_i, 0), t(u_i, 1), \ldots, t(u_i, 23)), \qquad \mathbf{u}_{j,t} = (t(u_j, 0), t(u_j, 1), \ldots, t(u_j, 23))$$
$num(u_i, m) = k_i$ indicates that user $u_i$ in heterogeneous social network S1 published $k_i$ texts in the $m$-th period;
$num(u_j, m) = k_j$ indicates that user $u_j$ in heterogeneous social network S2 published $k_j$ texts in the $m$-th period.
5. The meta-path-based heterogeneous social network user anchor link identification method according to claim 3, wherein the online activity time pattern similarity $sim_t(u_i, u_j)$ between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
$$sim_t(u_i, u_j) = \sum_{m=0}^{23} t(u_i, m)\, t(u_j, m)$$
$$t(u_i, m) = \frac{num(u_i, m)}{\sum_{m'=0}^{23} num(u_i, m')}, \qquad t(u_j, m) = \frac{num(u_j, m)}{\sum_{m'=0}^{23} num(u_j, m')}$$
where $t(u_i, m)$ is the activity frequency of user $u_i$ in heterogeneous social network S1 in the $m$-th period, and $t(u_j, m)$ is the activity frequency of user $u_j$ in heterogeneous social network S2 in the $m$-th period; to capture users' online activity time patterns, the 24-hour clock is used and a day is divided into 24 periods of one hour each;
$$\mathbf{u}_{i,t} = (t(u_i, 0), t(u_i, 1), \ldots, t(u_i, 23)), \qquad \mathbf{u}_{j,t} = (t(u_j, 0), t(u_j, 1), \ldots, t(u_j, 23))$$
$num(u_i, m) = k_i$ indicates that user $u_i$ in heterogeneous social network S1 published $k_i$ texts in the $m$-th period;
$num(u_j, m) = k_j$ indicates that user $u_j$ in heterogeneous social network S2 published $k_j$ texts in the $m$-th period.
6. The meta-path-based heterogeneous social network user anchor link identification method according to claim 1 or 2, wherein the text content similarity $sim_c(u_i, u_j)$ between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
step 7.1: for user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2, extract the keywords in each user's texts with the TF-IDF algorithm and merge them into one set;
step 7.2: for each of the two users' text content, compute the relative frequency of every word in the set, producing word-frequency vectors $\mathbf{u}_{i,c}$ and $\mathbf{u}_{j,c}$ of the same length;
step 7.3: compute the cosine similarity of the word-frequency vectors $\mathbf{u}_{i,c}$ and $\mathbf{u}_{j,c}$, which converts text-content similarity into a comparison of the angle between vectors of the same dimension:
$$sim_c(u_i, u_j) = \frac{\mathbf{u}_{i,c} \cdot \mathbf{u}_{j,c}}{\|\mathbf{u}_{i,c}\|\,\|\mathbf{u}_{j,c}\|}$$
7. The meta-path-based heterogeneous social network user anchor link identification method according to claim 3, wherein the text content similarity $sim_c(u_i, u_j)$ between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
step 7.1: for user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2, extract the keywords in each user's texts with the TF-IDF algorithm and merge them into one set;
step 7.2: for each of the two users' text content, compute the relative frequency of every word in the set, producing word-frequency vectors $\mathbf{u}_{i,c}$ and $\mathbf{u}_{j,c}$ of the same length;
step 7.3: compute the cosine similarity of the word-frequency vectors $\mathbf{u}_{i,c}$ and $\mathbf{u}_{j,c}$, which converts text-content similarity into a comparison of the angle between vectors of the same dimension:
$$sim_c(u_i, u_j) = \frac{\mathbf{u}_{i,c} \cdot \mathbf{u}_{j,c}}{\|\mathbf{u}_{i,c}\|\,\|\mathbf{u}_{j,c}\|}$$
8. The meta-path-based heterogeneous social network user anchor link identification method according to claim 4, wherein the text content similarity $sim_c(u_i, u_j)$ between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
step 7.1: for user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2, extract the keywords in each user's texts with the TF-IDF algorithm and merge them into one set;
step 7.2: for each of the two users' text content, compute the relative frequency of every word in the set, producing word-frequency vectors $\mathbf{u}_{i,c}$ and $\mathbf{u}_{j,c}$ of the same length;
step 7.3: compute the cosine similarity of the word-frequency vectors $\mathbf{u}_{i,c}$ and $\mathbf{u}_{j,c}$, which converts text-content similarity into a comparison of the angle between vectors of the same dimension:
$$sim_c(u_i, u_j) = \frac{\mathbf{u}_{i,c} \cdot \mathbf{u}_{j,c}}{\|\mathbf{u}_{i,c}\|\,\|\mathbf{u}_{j,c}\|}$$
9. The meta-path-based heterogeneous social network user anchor link identification method according to claim 5, wherein the text content similarity $sim_c(u_i, u_j)$ between user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
step 7.1: for user $u_i$ in heterogeneous social network S1 and user $u_j$ in S2, extract the keywords in each user's texts with the TF-IDF algorithm and merge them into one set;
step 7.2: for each of the two users' text content, compute the relative frequency of every word in the set, producing word-frequency vectors $\mathbf{u}_{i,c}$ and $\mathbf{u}_{j,c}$ of the same length;
step 7.3: compute the cosine similarity of the word-frequency vectors $\mathbf{u}_{i,c}$ and $\mathbf{u}_{j,c}$, which converts text-content similarity into a comparison of the angle between vectors of the same dimension:
$$sim_c(u_i, u_j) = \frac{\mathbf{u}_{i,c} \cdot \mathbf{u}_{j,c}}{\|\mathbf{u}_{i,c}\|\,\|\mathbf{u}_{j,c}\|}$$

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010438376.6A CN111475739B (en) 2020-05-22 2020-05-22 Heterogeneous social network user anchor link identification method based on meta-path

Publications (2)

Publication Number Publication Date
CN111475739A true CN111475739A (en) 2020-07-31
CN111475739B CN111475739B (en) 2022-07-29

Family

ID=71764700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010438376.6A Active CN111475739B (en) 2020-05-22 2020-05-22 Heterogeneous social network user anchor link identification method based on meta-path

Country Status (1)

Country Link
CN (1) CN111475739B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014003623A1 (en) * 2012-06-26 2014-01-03 Telefonaktiebolaget L M Ericsson (Publ) Methods and nodes for soft cell uplink prioritization
CN109635989A (en) * 2018-08-30 2019-04-16 电子科技大学 A kind of social networks link prediction method based on multi-source heterogeneous data fusion
CN109635201A (en) * 2018-12-18 2019-04-16 苏州大学 The heterogeneous cross-platform association user account method for digging of social networks
CN109949174A (en) * 2019-03-14 2019-06-28 哈尔滨工程大学 A kind of isomery social network user entity anchor chain connects recognition methods
CN110097125A (en) * 2019-05-07 2019-08-06 郑州轻工业学院 A kind of across a network account correlating method indicated based on insertion
CN110134883A (en) * 2019-04-22 2019-08-16 哈尔滨英赛克信息技术有限公司 A kind of isomery social network position entity anchor chain connects recognition methods
CN110532436A (en) * 2019-07-17 2019-12-03 中国人民解放军战略支援部队信息工程大学 Across social network user personal identification method based on community structure

Non-Patent Citations (6)

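Title
S. Sajadmanesh et al.: "Predicting anchor links between heterogeneous social networks", 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) *
吕继光 (Lü Jiguang): "Research on overlapping community detection algorithms in heterogeneous networks", China Outstanding Doctoral and Master's Theses Full-text Database (Master's), Basic Sciences *
尹劼 (Yin Jie): "Meta-path-based link prediction in aligned heterogeneous social networks", China Outstanding Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology *
朱俊星 (Zhu Junxing): "Research on and application of key technologies for entity alignment and information association in multi-source social networks", China Outstanding Doctoral and Master's Theses Full-text Database (Doctoral), Information Science and Technology *
马江涛 (Ma Jiangtao): "Research on knowledge graph construction technology based on social networks", China Outstanding Doctoral and Master's Theses Full-text Database (Doctoral), Information Science and Technology *
黄立威 (Huang Liwei) et al.: "A meta-path-based link prediction model for heterogeneous information networks", Chinese Journal of Computers *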

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN112085614A (en) * 2020-08-05 2020-12-15 国家计算机网络与信息安全管理中心 Cross-social-network virtual user identity alignment method based on spatio-temporal behavior data
CN112307343A (en) * 2020-11-05 2021-02-02 重庆邮电大学 Cross-E-book city user alignment method based on double-layer iterative compensation and full-face representation
CN112307343B (en) * 2020-11-05 2023-04-07 重庆邮电大学 Cross-E-book city user alignment method based on double-layer iterative compensation and full-face representation
CN113297500A (en) * 2021-06-23 2021-08-24 哈尔滨工程大学 Social network isolated node link prediction method

Also Published As

Publication number Publication date
CN111475739B (en) 2022-07-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant