CN111475739A - Heterogeneous social network user anchor link identification method based on meta-path - Google Patents
Heterogeneous social network user anchor link identification method based on meta-path
- Publication number
- CN111475739A (CN202010438376.6A)
- Authority
- CN
- China
- Prior art keywords
- user
- social network
- matrix
- users
- heterogeneous social
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention belongs to the technical field of social network entity anchor link identification, and particularly relates to a meta-path-based method for identifying user anchor links in heterogeneous social networks. The method targets heterogeneous social networks with many entity types, complex link relationships and few labeled entities. Because it is based on unsupervised learning, it avoids problems such as uneven data distribution and feature selection, and it fully mines the link relationships related to users by combining the link relationships and attribute information of user entities in the social network with the meta-path technique. By defining a cost function, matrix norms and an objective function, the user anchor link identification problem is converted into the problem of optimizing the objective function.
Description
Technical Field
The invention belongs to the technical field of social network entity anchor link identification, and particularly relates to a heterogeneous social network user anchor link identification method based on a meta-path.
Background
Since the beginning of the 21st century, internet technology has developed steadily and more and more people have become participants in social networks. According to statistics, by 2019 the number of social network users worldwide had reached 2.82 billion, accounting for 70.4 percent of all internet users. Outside China, people often communicate using Twitter and Facebook; in China, people browse trending news on microblogs and use QQ and WeChat for real-time communication with friends. Online social networks stem from everyday interactions between people and are snapshots of real-world activities mapped onto the network, and the same user often participates in multiple social networks. As the number of users participating in social networks keeps growing, identifying anchor-linked users (aligned users) across networks has many important practical applications. The heterogeneity of social networks, the differences among multiple social networks, the lack of a large number of labeled users with known anchor link relationships, and the one-to-one constraint all make anchor link user identification challenging. How to build a heterogeneous social network user anchor link identification model from the large and varied user-related data in social networks has become a research focus for community discovery, recommendation systems and multi-network fusion.
The social network anchor link identification problem was first proposed in 2013 by J. Zhang et al., who fused across networks the information held by corresponding users in the Twitter and Foursquare networks, so that a user's home page in one network could link directly to the other network. Danai Koutra converted the user anchor link identification problem into a bipartite graph matching problem based on the characteristics of the link relationships between users. Tang goose proposed an algorithm that uses user attribute information: it analyzes all semantic information of a user in a social network with a topic model and combines it with network structure features to perform anchor link identification. Yizhou Sun et al. first proposed the meta-path concept and, in a paper co-authorship network, used meta-paths to characterize the complex relationships among entities such as papers, authors, conferences and publishers in order to predict co-author links. So far there is no novel and effective method for extending the meta-path technique to social networks while using both the link relationships and the attribute information of users for anchor link identification. For the different user entities in heterogeneous social networks, the anchor link identification problem is to identify, by analyzing users' information in the networks, all accounts registered by the same real-world user across two or more networks, where the aligned accounts satisfy a one-to-one link mapping between the different networks.
Disclosure of Invention
The invention aims to provide a heterogeneous social network user anchor link identification method based on a meta path.
The purpose of the invention is realized by the following technical scheme: the method comprises the following steps:
step 1: inputting heterogeneous social networks S1 and S2;
step 2: according to the first-order link relationship between users represented by the meta-path $MP_1$, determine in each of the two networks S1 and S2 whether a first-order link relationship exists between users, and express the result with matrices $M^{(1)}$ and $M^{(2)}$; the rows and columns of each matrix correspond to the users in one network, a matrix element of 1 indicates that the two users have a first-order link relationship, and a matrix element of 0 indicates that they do not;
step 3: according to the second-order link relationships represented by the three meta-paths $MP_2$, $MP_3$ and $MP_4$, obtain, in each of the two networks S1 and S2, the total number of meta-paths between users that satisfy these three formats, denoted here $P^{(1)}_{ij}$ and $P^{(2)}_{ij}$;
wherein $u_{1i}$ and $u_{1j}$ are two users in the heterogeneous social network S1, and $u_{2i}$ and $u_{2j}$ are two users in the heterogeneous social network S2;
step 4: compute the second-order link relationship matrix $B^{(1)}$ of the heterogeneous social network S1 and the second-order link relationship matrix $B^{(2)}$ of the heterogeneous social network S2; the elements of $B^{(1)}$ and $B^{(2)}$ are:

$$B^{(1)}_{ij} = \frac{2\,P^{(1)}_{ij}}{d(u_{1i}) + d(u_{1j})}, \qquad B^{(2)}_{ij} = \frac{2\,P^{(2)}_{ij}}{d(u_{2i}) + d(u_{2j})}$$

wherein $P^{(1)}_{ij}$ is the total number of meta-paths between $u_{1i}$ and $u_{1j}$ and $P^{(2)}_{ij}$ is the total number of meta-paths between $u_{2i}$ and $u_{2j}$ obtained in step 3; $d(u_{1i})$ and $d(u_{1j})$ denote the degrees of users $u_{1i}$ and $u_{1j}$ in the heterogeneous social network S1; $d(u_{2i})$ and $d(u_{2j})$ denote the degrees of users $u_{2i}$ and $u_{2j}$ in the heterogeneous social network S2;
step 5: use matrix $B^{(1)}$ to correct matrix $M^{(1)}$ and matrix $B^{(2)}$ to correct matrix $M^{(2)}$, obtaining the final friend relationship adjacency matrices $M^{(1)}$ and $M^{(2)}$ between users;
if an element of $M^{(1)}$ is 1, skip it; if an element of $M^{(1)}$ is 0, check whether the element at the corresponding position in $B^{(1)}$ is not less than 0.5; if it is not less than 0.5, change the element 0 in $M^{(1)}$ to 1; otherwise, the element of $M^{(1)}$ remains unchanged;
if an element of $M^{(2)}$ is 1, skip it; if an element of $M^{(2)}$ is 0, check whether the element at the corresponding position in $B^{(2)}$ is not less than 0.5; if it is not less than 0.5, change the element 0 in $M^{(2)}$ to 1; otherwise, the element of $M^{(2)}$ remains unchanged;
step 6: according to the check-in relationship between users and locations represented by the check-in meta-path, determine in each of the two networks whether a check-in relationship connected through a text exists between a user and a location, and express the result with matrices $N^{(1)}$ and $N^{(2)}$; the rows and columns of each matrix correspond to users and locations respectively, and each element indicates whether a check-in relationship exists: if the corresponding user and location have a check-in relationship, the element is 1, otherwise it is 0;
step 7: obtain the user anchor link identification objective function F(X, Y):

$$F(X,Y) = \|X^{T}M^{(1)}X - M^{(2)}\|_F^2 + \|X^{T}N^{(1)}Y - N^{(2)}\|_F^2 - \|X \circ V\|_1 + \|X\|_1 + \|Y\|_1$$

wherein X denotes the user anchor link relationship mapping matrix; Y denotes the user location relationship mapping matrix; $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix; $\|\cdot\|_1$ denotes the L1 norm of a matrix; V is the inter-network user attribute feature similarity matrix, whose element $V_{m,n}$ combines three similarities between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2:
the user name similarity, computed from the user name of $u_i$ and the user name of $u_j$; the online activity time pattern similarity, computed from the online activity time vector of $u_i$ and the online activity time vector of $u_j$; and the text content similarity, computed from the word frequency vector of the text content of $u_i$ and the word frequency vector of the text content of $u_j$;
step 8: compute the matrix X at which the user anchor link identification objective function F(X, Y) attains its minimum, obtaining the user anchor link relationship mapping matrix X; an element of 1 in matrix X indicates that the two corresponding users have an anchor link relationship, and an element of 0 indicates that no anchor link relationship exists between the two corresponding users.
The present invention may further comprise:
The minimum of the user anchor link identification objective function F(X, Y) in step 8 is computed as follows: using alternating projected gradient descent, fix Y, take the partial derivative of the objective function with respect to X, and update the elements of matrix X; then fix X, take the partial derivative of the objective function with respect to Y, and update the elements of matrix Y; after each round of updating X and Y, correct the elements of matrices X and Y: if an element is greater than 1, project it to 1; if an element is less than 0, project it to 0; otherwise leave it unchanged; set a maximum number of iterations for the alternating projected gradient descent, keep updating matrices X and Y while the maximum number of iterations is not exceeded, and output the user anchor link relationship mapping matrix X once the objective function reaches its minimum.
The user name similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated from the following quantities: g is the number of matching characters between the user names of $u_i$ and $u_j$; h equals half the number of transpositions occurring among the matched characters; the length of the user name of $u_i$; the length of the user name of $u_j$; and l, the length of the common prefix of the two user names.
The online activity time pattern similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as the inner product of their online activity time vectors:

$$\sum_{m=0}^{23} t(u_i,m)\,t(u_j,m)$$

wherein $t(u_i,m)$ is the activity frequency of user $u_i$ in the heterogeneous social network S1 within the m-th time period, and $t(u_j,m)$ is the activity frequency of user $u_j$ in the heterogeneous social network S2 within the m-th time period; for the user's online activity time pattern, a 24-hour clock is used and a day is divided into 24 time periods of one hour each; the activity frequency is the number of texts published in a period divided by the total number of texts published in the day:

$$t(u_i,m) = \frac{num(u_i,m)}{\sum_{m'=0}^{23} num(u_i,m')}$$

$num(u_i,m)=k_i$ indicates that user $u_i$ in the heterogeneous social network S1 published $k_i$ texts in the m-th time period; $num(u_j,m)=k_j$ indicates that user $u_j$ in the heterogeneous social network S2 published $k_j$ texts in the m-th time period.
The text content similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
step 7.1: for user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2, compute the keywords in their respective texts using the TF-IDF algorithm, and merge the keywords into one set;
step 7.2: for the text content of each of the two users, compute the relative word frequency of every word in the set, generating for the two users word frequency vectors $w_i$ and $w_j$ of the same length;
step 7.3: compute the cosine similarity of the word frequency vectors $w_i$ and $w_j$, so that comparing text content becomes comparing the angle between two vectors of the same dimension:

$$\cos(w_i, w_j) = \frac{w_i \cdot w_j}{\|w_i\|\,\|w_j\|}$$
The beneficial effects of the invention are as follows:
The invention provides a meta-path-based method for identifying user anchor links in heterogeneous social networks, targeting networks with many entity types, complex link relationships and few labeled entities. Because the method is based on unsupervised learning, it avoids problems such as uneven data distribution and feature selection, and it fully mines the link relationships related to users by combining the link relationships and attribute information of user entities in the social network with the meta-path technique. By defining a cost function, matrix norms and an objective function, the user anchor link identification problem is converted into the problem of optimizing the objective function.
Drawings
FIG. 1 is a heterogeneous social network structure schema diagram.
Fig. 2 is a schematic diagram of friend relationships between users in a network.
Fig. 3 is a schematic diagram of a check-in relationship between a user and a location within a network.
FIG. 4(a) is a plot of the fitting function $f_1(x)=\theta_0+\theta_1 x$.
FIG. 4(b) is a plot of the fitting function $f_2(x)=\theta_0+\theta_1 x+\theta_2 x^2$.
FIG. 4(c) is a plot of the fitting function $f_3(x)=\theta_0+\theta_1 x+\theta_2 x^2+\theta_3 x^3+\theta_4 x^4$.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
FIG. 1 shows the structure schema of a heterogeneous social network; the content in the circles represents the entity types. FIG. 2 shows a schematic diagram of the relationships between users within a social network: circles represent user entities, and undirected edges between circles represent link relationships between entities. The undirected edge between users f and g represents a first-order link relationship. Users f and e have no first-order link relationship but share many common neighbors, as shown by the dashed ellipse; for example, both have friend a. FIG. 3 shows a schematic diagram of the check-in relationship between users and locations within a network. The fitting function corresponding to FIG. 4(a) is $f_1(x)=\theta_0+\theta_1 x$, the fitting function corresponding to FIG. 4(b) is $f_2(x)=\theta_0+\theta_1 x+\theta_2 x^2$, and the fitting function corresponding to FIG. 4(c) is $f_3(x)=\theta_0+\theta_1 x+\theta_2 x^2+\theta_3 x^3+\theta_4 x^4$. The invention provides a heterogeneous social network user anchor link identification method based on a meta-path.
The method comprises the following implementation steps:
1. Input the heterogeneous social networks S1 and S2.
2. According to the first-order link relationship represented by the meta-path $MP_1$, determine in each of the two networks whether a first-order link relationship exists between users, and express the result with matrices $M^{(1)}$ and $M^{(2)}$, where 1 indicates that a first-order link relationship exists between two users in the network and 0 indicates that it does not.
3. According to the second-order link relationships represented by the three meta-paths $MP_2$, $MP_3$ and $MP_4$, determine whether a second-order link relationship exists between users. Using adjacency matrix multiplication, obtain in each of the two networks the total number of meta-paths between users that satisfy the three formats.
4. Divide twice the total number of meta-paths corresponding to the second-order link relationship obtained in step 3 by the sum of the degrees of the meta-paths' end-point users, and use the resulting value to correct matrices $M^{(1)}$ and $M^{(2)}$. If the value is not less than 0.5, set the corresponding matrix element to 1 (elements that are already 1 stay unchanged); if the value is less than 0.5, the element remains unchanged.
5. Define a cost function based on the friend relationships between users.
6. According to the check-in relationship between users and locations represented by the check-in meta-path, determine in each of the two networks whether a check-in relationship connected through a text exists between a user and a location, and express the result with matrices $N^{(1)}$ and $N^{(2)}$.
7. Define a cost function based on the check-in relationships between users and locations.
8. Define a cost function based on the user attribute information: user name, online activity time pattern and text content.
9. Obtain the total cost function for user anchor link identification.
10. Obtain the objective function for user anchor link identification.
11. Solve for the minimum of the objective function with projected gradient descent to obtain the user anchor link relationship mapping matrix.
1. The scheme relies on the following definitions. E denotes the set of entities in the heterogeneous social network, E = U ∪ L ∪ T ∪ C, where U denotes users, L denotes locations, T denotes timestamps and C denotes texts. R denotes the set of link relationships between entities. A denotes the set of attribute information possessed by a given type of entity; for users, A = n ∪ t ∪ w, where n denotes the user name, t denotes the user's daily activity time vector, and w denotes the word frequency vector of the text content published by the user. $n^{(1)}_i$ denotes the user name of user i in the heterogeneous social network S1, $t^{(1)}_i$ denotes the daily activity time vector of user i in S1, and $w^{(1)}_i$ denotes the word frequency vector of the text content published by user i in S1.
2. According to the first-order link relationship between users represented by the meta-path $MP_1$, determine in each of the two networks S1 and S2 whether a first-order link relationship exists between users, and express the result with matrices $M^{(1)}$ and $M^{(2)}$. The rows and columns of each matrix correspond to the users in one network; a matrix element of 1 indicates that the two users in the network have a first-order link relationship, and 0 indicates that they do not.
3. According to the second-order link relationships represented by the three meta-paths $MP_2$, $MP_3$ and $MP_4$, determine whether a second-order link relationship exists between users in the network. Using adjacency matrix multiplication, obtain in each of the two networks the total number of meta-paths between users that satisfy the three formats. For example, in network S1, for two given users $u_i$ and $u_j$, check whether any meta-path conforming to $MP_2$ exists between them and count the number of such meta-paths with the meta-path counting method. The same counting is performed for $MP_3$ and $MP_4$, and finally the counts for the three meta-path definitions are added to give the total number of meta-paths between the two users, denoted here $P^{(1)}_{ij}$.
4. Following the Sørensen index formula, divide twice the total number of meta-paths obtained in step 3 by the sum of the degrees of users $u_i$ and $u_j$, and use the resulting value as an element of the second-order link relationship matrix, giving matrices $B^{(1)}$ and $B^{(2)}$:

$$B^{(1)}_{ij} = \frac{2\,P^{(1)}_{ij}}{d(u_i) + d(u_j)}$$

where $d(u_i)$ and $d(u_j)$ denote the degrees of $u_i$ and $u_j$; $B^{(2)}$ is obtained in the same way for network S2.
5. After $B^{(1)}$ and $B^{(2)}$ have been computed, use $B^{(1)}$ to correct $M^{(1)}$ and $B^{(2)}$ to correct $M^{(2)}$. If the corresponding element of $M^{(1)}$ or $M^{(2)}$ is already 1, skip it; if the element is 0, check whether the element at the corresponding position in $B^{(1)}$ or $B^{(2)}$ is not less than 0.5. If so, change the 0 in $M^{(1)}$ or $M^{(2)}$ to 1; if not, leave it unchanged. After all corrections are finished, the final complete friend relationship adjacency matrices $M^{(1)}$ and $M^{(2)}$ between users are obtained.
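The counting, Sørensen scoring and correction described in items 3 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single second-order meta-path (two users joined through one common neighbor) stands in for the three meta-paths $MP_2$ to $MP_4$, whose exact forms are not reproduced here, and the function names `second_order_counts`, `sorensen_matrix` and `correct_adjacency` are hypothetical.

```python
def second_order_counts(M):
    """Count two-hop paths between distinct users via the product M x M.

    Assumption: one common-neighbor meta-path replaces MP2..MP4.
    """
    n = len(M)
    return [[sum(M[i][k] * M[k][j] for k in range(n)) if i != j else 0
             for j in range(n)] for i in range(n)]


def sorensen_matrix(M):
    """B[i][j] = 2 * paths(i, j) / (deg(i) + deg(j)), as in item 4."""
    n = len(M)
    deg = [sum(row) for row in M]
    P = second_order_counts(M)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = deg[i] + deg[j]
            if i != j and total > 0:
                B[i][j] = 2.0 * P[i][j] / total
    return B


def correct_adjacency(M, threshold=0.5):
    """Keep existing 1s; turn a 0 into 1 when B is not less than the threshold."""
    B = sorensen_matrix(M)
    n = len(M)
    return [[1 if (M[i][j] == 1 or B[i][j] >= threshold) else 0
             for j in range(n)] for i in range(n)]


# Users in index order: f, e, a, g. Edges: f-a, f-g, e-a.
M1 = [[0, 0, 1, 1],
      [0, 0, 1, 0],
      [1, 1, 0, 0],
      [1, 0, 0, 0]]
M1_corrected = correct_adjacency(M1)
```

In this toy network, f and e are not friends but share the neighbor a, so their score is 2·1/(2+1) ≈ 0.67 and the corrected matrix links them, while e and g share no neighbor and stay unlinked.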
6. Following the definitions of cost function and matrix norm in machine learning, define a cost function based on the complete friend relationships between users in the networks:

$$\|X^{T}M^{(1)}X - M^{(2)}\|_F^2$$

where X denotes the user anchor link relationship mapping matrix and $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of the corresponding matrix.
7. According to the check-in relationship between users and locations represented by the check-in meta-path, determine in each of the two networks whether a check-in relationship connected through a text exists between a user and a location, and express the result with matrices $N^{(1)}$ and $N^{(2)}$. The rows and columns of each matrix correspond to users and locations respectively, and each element indicates whether a check-in relationship exists: if the corresponding user and location have a check-in relationship, the element is 1, otherwise it is 0.
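As a rough illustration of item 7, the check-in matrix can be built from the texts that carry a location tag. The record layout (one `(author, location)` pair per text, with `None` for untagged texts) and the function name `checkin_matrix` are assumptions made for this sketch, not part of the source.

```python
def checkin_matrix(texts, users, locations):
    """Build a user-by-location 0/1 matrix from (author, location) text records.

    Each text links its author to its location tag, which mirrors a
    user-text-location path; texts without a tag are ignored.
    """
    user_index = {u: i for i, u in enumerate(users)}
    location_index = {loc: j for j, loc in enumerate(locations)}
    N = [[0] * len(locations) for _ in users]
    for author, location in texts:
        if location is None:
            continue
        if author in user_index and location in location_index:
            N[user_index[author]][location_index[location]] = 1
    return N


users = ["f", "e"]
locations = ["cafe", "park"]
texts = [("f", "cafe"), ("f", "cafe"), ("e", None), ("e", "park")]
N1 = checkin_matrix(texts, users, locations)
```

Repeated check-ins at the same place still give a single 1, since the matrix only records whether the relationship exists.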
8. Following the definitions of cost function and matrix norm in machine learning, define a cost function based on the check-in relationships between users and locations in the networks:

$$\|X^{T}N^{(1)}Y - N^{(2)}\|_F^2$$

where Y denotes the user location relationship mapping matrix.
9. For user names, the similarity between the user names of different users is measured with the Jaro-Winkler similarity. For user $u_i$ in network S1 and user $u_j$ in network S2, the similarity is computed from their respective user names, where g denotes the number of matching characters in the two user names, h equals half the number of transpositions occurring among the matched characters, the lengths of the two user names enter as normalizing terms, and l denotes the length of the common prefix of the two user names.
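For reference, the standard Jaro-Winkler similarity can be written with the same quantities g, h and l. The sketch below follows the textbook definition; the prefix scaling factor `p=0.1` and the prefix cap of 4 characters are the conventional defaults and are assumptions here, since the source does not state which constants it uses.

```python
def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Standard Jaro-Winkler similarity between two user names."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    g = 0  # number of matching characters
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                g += 1
                break
    if g == 0:
        return 0.0
    # h is half the number of matched characters that are out of order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    h = sum(a != b for a, b in zip(seq1, seq2)) / 2
    jaro = (g / len1 + g / len2 + (g - h) / g) / 3
    # l is the common prefix length, capped at max_prefix.
    l = 0
    for a, b in zip(s1, s2):
        if a != b or l == max_prefix:
            break
        l += 1
    return jaro + l * p * (1 - jaro)


score = jaro_winkler("MARTHA", "MARHTA")
```

For the classic pair "MARTHA" and "MARHTA", g = 6, h = 1 and l = 3, which gives a score of about 0.961.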
10. For the user's online activity time pattern, a 24-hour clock is used and a day is divided into 24 time periods of one hour each. Texts published by a user in the network are taken as the indicator that the user is participating in social activity: count the total number of texts the user publishes in a day, count the number published in each time period, and divide each count by the total to obtain the proportion of texts published in each period. For example, if user $u_a$ publishes k texts during the m-th period, this is written $num(u_a,m)=k$, and the user's activity frequency in the m-th period is written $t(u_a,m)$:

$$t(u_a,m) = \frac{num(u_a,m)}{\sum_{m'=0}^{23} num(u_a,m')}$$

Each time period is computed in turn, finally giving for user $u_a$ a temporal activity vector $u_{a,t}$ of length 24.

$$u_{a,t} = \big(t(u_a,0),\ t(u_a,1),\ \ldots,\ t(u_a,23)\big) \qquad (8)$$

Each element of the vector gives the user's activity frequency in the corresponding time period. For user $u_i$ in network S1 and user $u_j$ in network S2, first obtain their respective online activity time vectors with the method above, then compute the similarity of the two vectors with the inner product to obtain the similarity of the two users' online activity time patterns.
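A short Python sketch of item 10, assuming each published text is represented only by the hour (0 to 23) at which it was posted; the function names are hypothetical.

```python
def activity_vector(post_hours):
    """Return the 24-element vector of per-hour posting proportions."""
    counts = [0] * 24
    for hour in post_hours:
        counts[hour % 24] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 24  # a user with no texts has an all-zero pattern
    return [c / total for c in counts]


def time_pattern_similarity(v1, v2):
    """Inner product of two activity vectors."""
    return sum(a * b for a, b in zip(v1, v2))


u_i = activity_vector([9, 9, 21, 21])   # posts at 09:00 and 21:00
u_j = activity_vector([9, 21])          # same two hours, fewer posts
sim = time_pattern_similarity(u_i, u_j)
```

Because the vectors hold proportions rather than raw counts, two users with the same daily rhythm score the same regardless of how many texts each one publishes.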
11. For the text content published by users, consider user $u_i$ in network S1 and user $u_j$ in network S2. First, compute the keywords in their respective texts with the TF-IDF algorithm and merge the keywords into one set. Second, for the text content of each of the two users, compute the relative word frequency of every word in the set, generating for the two users word frequency vectors $w_i$ and $w_j$ of the same length. Finally, compute the cosine similarity of the two vectors, so that comparing text content becomes comparing the angle between two vectors of the same dimension.
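The three stages of item 11 can be sketched as follows. Several details are assumptions made for illustration only: texts are already tokenized, the corpus used for IDF consists of just the two users' texts, a smoothed IDF (`log((1 + N) / (1 + df)) + 1`) is used so that words shared by both users are not scored zero, and each user contributes the top `k` keywords.

```python
import math
from collections import Counter


def top_keywords(tokens, corpus, k=5):
    """Rank a user's words by TF-IDF and return the k highest."""
    tf = Counter(tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (assumption)
        scores[word] = (count / len(tokens)) * idf
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def text_similarity(tokens_i, tokens_j, k=5):
    """Cosine similarity of relative word frequencies over the merged keyword set."""
    if not tokens_i or not tokens_j:
        return 0.0
    corpus = [set(tokens_i), set(tokens_j)]
    vocab = sorted(set(top_keywords(tokens_i, corpus, k))
                   | set(top_keywords(tokens_j, corpus, k)))

    def relative_frequencies(tokens):
        tf = Counter(tokens)
        return [tf[w] / len(tokens) for w in vocab]

    w_i = relative_frequencies(tokens_i)
    w_j = relative_frequencies(tokens_j)
    dot = sum(a * b for a, b in zip(w_i, w_j))
    norm = math.sqrt(sum(a * a for a in w_i)) * math.sqrt(sum(b * b for b in w_j))
    return dot / norm if norm > 0 else 0.0


same = text_similarity(["coffee", "park", "coffee"], ["coffee", "park", "coffee"])
different = text_similarity(["coffee", "park"], ["train", "office"])
```

Identical texts give a similarity of 1 (up to rounding) and texts with no shared keyword give 0.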
12. Integrate the three kinds of user attribute information given by (6), (9) and (10) (user name, online activity time pattern and text content) to obtain the inter-network user attribute feature similarity matrix V, and define a cost function based on the similarity of user attribute information between the networks.
13. Combine the cost functions for the link relationships and the attribute information of users in the heterogeneous social networks to define the total cost function for user anchor link identification.
14. To prevent the fitting function from overfitting, add regularization terms to the total cost function to form the objective function.
15. The following two lemmas reduce the difficulty of solving the objective function.
Lemma 1: for a given matrix A, the square of its Frobenius norm equals the trace of the matrix $AA^{T}$:

$$\|A\|_F^2 = \mathrm{tr}(AA^{T})$$

Lemma 2: given a matrix A and a matrix B, the $L_1$ norm of the Hadamard product of A and B equals the trace of $AB^{T}$ or $A^{T}B$:

$$\|A \circ B\|_1 = \mathrm{tr}(AB^{T}) = \mathrm{tr}(A^{T}B) \qquad (15)$$

The objective function then becomes:

$$F(X,Y) = \mathrm{tr}\big((X^{T}M^{(1)}X - M^{(2)})(X^{T}M^{(1)}X - M^{(2)})^{T}\big) + \mathrm{tr}\big((X^{T}N^{(1)}Y - N^{(2)})(X^{T}N^{(1)}Y - N^{(2)})^{T}\big) - \mathrm{tr}(XV^{T}) + \|X\|_1 + \|Y\|_1 \qquad (16)$$
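The two lemmas are easy to check numerically on small matrices. The sketch below uses plain Python lists and hypothetical helper names. Note that the Lemma 2 identity, read with the entrywise $L_1$ norm, holds when the entries of A and B are non-negative, which is the case for X and V here since their entries lie in [0, 1].

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def frobenius_sq(A):
    """Squared Frobenius norm: sum of squared entries."""
    return sum(a * a for row in A for a in row)


def l1_hadamard(A, B):
    """Entrywise L1 norm of the Hadamard (elementwise) product."""
    return sum(abs(a * b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


A = [[1.0, 2.0], [0.0, 3.0]]
B = [[2.0, 1.0], [1.0, 4.0]]

lemma1 = (frobenius_sq(A), trace(matmul(A, transpose(A))))
lemma2 = (l1_hadamard(A, B),
          trace(matmul(A, transpose(B))),
          trace(matmul(transpose(A), B)))
```

Both sides of Lemma 1 evaluate to 14 for this A, and all three expressions in Lemma 2 evaluate to 16.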
16. The objective function F(X, Y) is minimized by repeatedly updating the parameters X and Y. When X and Y no longer change, that is, when the matrices X and Y have converged, the objective function reaches its minimum, and the matrix X at that point is the user anchor link relationship mapping matrix between the two networks. Alternating projected gradient descent is used: fix Y, take the partial derivative of the objective function with respect to X, and update the elements of matrix X; then fix X, take the partial derivative of the objective function with respect to Y, and update the elements of matrix Y. After each round of updating X and Y, two things are done. First, the elements of matrices X and Y are corrected: if an element is greater than 1, it is projected to 1; if an element is less than 0, it is projected to 0; otherwise it is left unchanged. Second, the value of the objective function is recomputed after the update to obtain the latest cost value.
17. Set a maximum number of iterations for the alternating projected gradient descent and keep updating matrices X and Y while the maximum is not exceeded. When the stopping condition is met, the objective function has reached its minimum and the matrix X at that point is the user anchor link relationship mapping matrix. An element of 1 in matrix X indicates that the two corresponding users have an anchor link relationship, and an element of 0 indicates that no anchor link relationship exists between the two corresponding users.
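Items 16 and 17 can be sketched in plain Python as follows. The gradients are derived here from the trace form of F(X, Y) and are not taken from the source; the L1 terms use the subgradient 1, which is valid while entries stay non-negative. The step-halving rule that rejects any round that does not lower the objective is an added safeguard and an assumption, as are the step size, the initial matrices and all function names.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def objective(X, Y, M1, M2, N1, N2, V):
    """F(X,Y) = |X'M1X - M2|_F^2 + |X'N1Y - N2|_F^2 - tr(XV') + |X|_1 + |Y|_1."""
    R = sub(matmul(matmul(transpose(X), M1), X), M2)
    S = sub(matmul(matmul(transpose(X), N1), Y), N2)
    fro = lambda A: sum(a * a for row in A for a in row)
    l1 = lambda A: sum(abs(a) for row in A for a in row)
    attr = sum(x * v for rx, rv in zip(X, V) for x, v in zip(rx, rv))
    return fro(R) + fro(S) - attr + l1(X) + l1(Y)


def gradients(X, Y, M1, M2, N1, N2, V):
    """Partial derivatives of F with respect to X and Y (for X, Y >= 0)."""
    Xt = transpose(X)
    R = sub(matmul(matmul(Xt, M1), X), M2)
    S = sub(matmul(matmul(Xt, N1), Y), N2)
    A = matmul(matmul(M1, X), transpose(R))
    B = matmul(matmul(transpose(M1), X), R)
    C = matmul(matmul(N1, Y), transpose(S))
    gX = [[2 * (a + b + c) - v + 1 for a, b, c, v in zip(ra, rb, rc, rv)]
          for ra, rb, rc, rv in zip(A, B, C, V)]
    D = matmul(matmul(transpose(N1), X), S)
    gY = [[2 * d + 1 for d in row] for row in D]
    return gX, gY


def projected_step(A, G, lr):
    """Gradient step followed by projection of every entry onto [0, 1]."""
    return [[min(1.0, max(0.0, a - lr * g)) for a, g in zip(ra, rg)]
            for ra, rg in zip(A, G)]


def solve(M1, M2, N1, N2, V, X, Y, lr=0.05, max_iter=200):
    history = [objective(X, Y, M1, M2, N1, N2, V)]
    for _ in range(max_iter):
        gX, _ = gradients(X, Y, M1, M2, N1, N2, V)      # fix Y, update X
        X_new = projected_step(X, gX, lr)
        _, gY = gradients(X_new, Y, M1, M2, N1, N2, V)  # fix X, update Y
        Y_new = projected_step(Y, gY, lr)
        value = objective(X_new, Y_new, M1, M2, N1, N2, V)
        if value < history[-1]:
            X, Y = X_new, Y_new
            history.append(value)
        else:
            lr *= 0.5  # reject the round and shrink the step (assumption)
            if lr < 1e-8:
                break
    return X, Y, history


M = [[0, 1], [1, 0]]
N = [[1, 0], [0, 1]]
V = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.5], [0.5, 0.5]]
X, Y, history = solve(M, M, N, N, V, half, half)
```

In this toy setup the two networks are identical copies, and evaluating `objective` at the identity alignment gives 2 while the swapped alignment gives 8, so the objective does prefer the correct mapping; whether gradient descent from a given start reaches it is not guaranteed, since the problem is non-convex.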
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A heterogeneous social network user anchor link identification method based on meta-path is characterized by comprising the following steps:
step 1: inputting heterogeneous social networks S1 and S2;
step 2: according to the first-order link relationship between users represented by the meta-path $MP_1$, determine in each of the two networks S1 and S2 whether a first-order link relationship exists between users, and express the result with matrices $M^{(1)}$ and $M^{(2)}$; the rows and columns of each matrix correspond to the users in one network, a matrix element of 1 indicates that the two users have a first-order link relationship, and a matrix element of 0 indicates that they do not;
step 3: according to the second-order link relationships represented by the three meta-paths $MP_2$, $MP_3$ and $MP_4$, obtain, in each of the two networks S1 and S2, the total number of meta-paths between users that satisfy these three formats, denoted here $P^{(1)}_{ij}$ and $P^{(2)}_{ij}$;
wherein $u_{1i}$ and $u_{1j}$ are two users in the heterogeneous social network S1, and $u_{2i}$ and $u_{2j}$ are two users in the heterogeneous social network S2;
step 4: compute the second-order link relationship matrix $B^{(1)}$ of the heterogeneous social network S1 and the second-order link relationship matrix $B^{(2)}$ of the heterogeneous social network S2; the elements of $B^{(1)}$ and $B^{(2)}$ are:

$$B^{(1)}_{ij} = \frac{2\,P^{(1)}_{ij}}{d(u_{1i}) + d(u_{1j})}, \qquad B^{(2)}_{ij} = \frac{2\,P^{(2)}_{ij}}{d(u_{2i}) + d(u_{2j})}$$

wherein $P^{(1)}_{ij}$ is the total number of meta-paths between $u_{1i}$ and $u_{1j}$ and $P^{(2)}_{ij}$ is the total number of meta-paths between $u_{2i}$ and $u_{2j}$ obtained in step 3; $d(u_{1i})$ and $d(u_{1j})$ denote the degrees of users $u_{1i}$ and $u_{1j}$ in the heterogeneous social network S1; $d(u_{2i})$ and $d(u_{2j})$ denote the degrees of users $u_{2i}$ and $u_{2j}$ in the heterogeneous social network S2;
and 5: by means of matrices B(1)Correction matrix M(1)Using a matrix B(2)Correction matrix M(2)To obtain the final friend relation adjacent matrix M between users(1)And M(2);
If M is(1)Skipping if the middle element is 1; if M is(1)If the middle element is 0, check B(1)Whether the element of the corresponding position in (1) is not less than 0.5; if B is present(1)If the element at the corresponding position in M is not less than 0.5, M is added(1)Element 0 in (1) is changed to 1; otherwise, M(1)The middle element remains unchanged;
if M is(2)Skipping if the middle element is 1; if M is(2)If the middle element is 0, check B(2)Whether the element of the corresponding position in (1) is not less than 0.5; if B is present(2)If the element at the corresponding position in M is not less than 0.5, M is added(2)Element 0 in (1) is changed to 1; otherwise, M(2)The middle element remains unchanged;
step 6: according to the meta-path that represents the check-in relationship between users and locations, determining in each of the two networks whether a check-in relationship connected through text exists between a user and a location, and recording the result in matrices $N^{(1)}$ and $N^{(2)}$, whose rows and columns correspond to users and locations; an element is 1 if the corresponding user and location have a check-in relationship, and 0 otherwise;
step 7: acquiring the user anchor link identification objective function F(X, Y);
wherein X denotes the user anchor link relationship mapping matrix; Y denotes the user location relationship mapping matrix; $\|\cdot\|_F^2$ denotes the square of the Frobenius norm of a matrix; $\|\cdot\|_1$ denotes the L1 norm of a matrix; V is the similarity matrix of user attribute features between the networks, and its element $V_{m,n}$ is defined in terms of the following quantities:
the user name similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2, defined on the user name of $u_i$ and the user name of $u_j$; the online activity time pattern similarity between user $u_i$ in S1 and user $u_j$ in S2, defined on the online activity time vector of $u_i$ and the online activity time vector of $u_j$; and the text content similarity between user $u_i$ in S1 and user $u_j$ in S2, defined on the word-frequency vector of the text content of $u_i$ and the word-frequency vector of the text content of $u_j$;
step 8: computing the matrix X at which the user anchor link identification objective function F(X, Y) attains its minimum, to obtain the user anchor link relationship mapping matrix X; an element of X equal to 1 indicates that the two corresponding users have an anchor link relationship, and an element equal to 0 indicates that no anchor link relationship exists between them.
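Two of the steps in claim 1 are plain element-wise matrix operations: the correction rule of step 5 and the reading of anchor links from X in step 8. The following is a minimal sketch only, not the patented implementation. The function names, the list-of-lists matrix representation, and the convention that rows of X index users of S1 and columns index users of S2 are assumptions made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def correct_adjacency(M: Matrix, B: Matrix, threshold: float = 0.5) -> Matrix:
    """Step 5 sketch: keep every 1 in M; turn a 0 into 1 when the entry at
    the same position in B is not less than the threshold (0.5 in the claim)."""
    corrected = []
    for m_row, b_row in zip(M, B):
        corrected.append([
            1 if m == 1 or b >= threshold else 0
            for m, b in zip(m_row, b_row)
        ])
    return corrected


def anchor_pairs(X: Matrix) -> List[Tuple[int, int]]:
    """Step 8 sketch: list the (row, column) index pairs whose entry in X is 1.
    Treating rows as S1 users and columns as S2 users is an assumption."""
    return [(i, j) for i, row in enumerate(X) for j, x in enumerate(row) if x == 1]


# Toy data, invented for illustration only.
M1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
B1 = [[0.0, 0.9, 0.6], [0.9, 0.0, 0.2], [0.6, 0.2, 0.0]]
M1_corrected = correct_adjacency(M1, B1)

X = [[1, 0], [0, 0], [0, 1]]
pairs = anchor_pairs(X)
```

In the toy data, the only entries of `M1` that change are the two zeros whose counterparts in `B1` are 0.6; existing ones are never removed, which matches the "skip if 1" wording of step 5.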
2. The heterogeneous social network user anchor link identification method based on meta-path according to claim 1, wherein the minimum of the user anchor link identification objective function F(X, Y) in step 8 is calculated by alternating projected gradient descent: with Y fixed, the partial derivative of the objective function with respect to X is computed and the elements of matrix X are updated; with X fixed, the partial derivative of the objective function with respect to Y is computed and the elements of matrix Y are updated; after each update of X and Y, the elements of the matrices X and Y are corrected, such that an element greater than 1 is projected to 1, an element less than 0 is projected to 0, and any other element is left unchanged; a maximum number of iterations is set for the alternating projected gradient descent, and the matrices X and Y are updated continuously while that maximum is not exceeded, until the objective function reaches its minimum, whereupon the user anchor link relationship mapping matrix X is output.
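The update loop of claim 2 can be sketched generically as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the gradients are passed in as callables because the objective F(X, Y) is not reproduced here, the fixed step size `lr` is an assumption, and the quadratic toy objective at the bottom is purely hypothetical and chosen only so that the projection onto [0, 1] is visible.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Grad = Callable[[Matrix, Matrix], Matrix]


def project_unit_interval(A: Matrix) -> Matrix:
    """Entries above 1 become 1, entries below 0 become 0, others are kept."""
    return [[min(1.0, max(0.0, a)) for a in row] for row in A]


def gradient_step(A: Matrix, G: Matrix, lr: float) -> Matrix:
    """One plain gradient descent step A - lr * G, element by element."""
    return [[a - lr * g for a, g in zip(a_row, g_row)]
            for a_row, g_row in zip(A, G)]


def alternating_pgd(grad_x: Grad, grad_y: Grad, X: Matrix, Y: Matrix,
                    lr: float = 0.1, max_iter: int = 100) -> Tuple[Matrix, Matrix]:
    """Alternate between an X update (Y fixed) and a Y update (X fixed),
    projecting onto [0, 1] after each update, for at most max_iter rounds."""
    for _ in range(max_iter):
        X = project_unit_interval(gradient_step(X, grad_x(X, Y), lr))
        Y = project_unit_interval(gradient_step(Y, grad_y(X, Y), lr))
    return X, Y


# Hypothetical toy objective: F(X, Y) = ||X - T||_F^2 + ||Y - U||_F^2.
T = [[1.5, -0.3], [0.2, 0.8]]
U = [[0.5]]


def toy_grad_x(X: Matrix, Y: Matrix) -> Matrix:
    return [[2.0 * (x - t) for x, t in zip(xr, tr)] for xr, tr in zip(X, T)]


def toy_grad_y(X: Matrix, Y: Matrix) -> Matrix:
    return [[2.0 * (y - u) for y, u in zip(yr, ur)] for yr, ur in zip(Y, U)]


X_opt, Y_opt = alternating_pgd(toy_grad_x, toy_grad_y,
                               [[0.5, 0.5], [0.5, 0.5]], [[0.0]],
                               lr=0.1, max_iter=200)
```

For the toy objective, the entries of `T` that lie outside [0, 1] are pulled to the nearest bound by the projection, while the entries inside the interval are reached directly. A convergence test on the change in F could replace the fixed iteration count; the claim itself only requires a maximum number of iterations.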
3. The heterogeneous social network user anchor link identification method based on meta-path according to claim 1 or 2, wherein the user name similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
4. The heterogeneous social network user anchor link identification method based on meta-path according to claim 1 or 2, wherein the online activity time pattern similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
wherein the two frequency terms denote, respectively, the activity frequency of user $u_i$ in the heterogeneous social network S1 within the m-th time period and the activity frequency of user $u_j$ in the heterogeneous social network S2 within the m-th time period; the online activity time pattern of a user is described on a 24-hour clock, with one day divided into 24 time periods of one hour each.
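The online activity time vector described in claim 4 can be built by counting a user's activity per hour of the day. The sketch below is an assumption-laden illustration: the function name is hypothetical, the input is assumed to be a collection of Python `datetime` objects, and normalising the counts to relative frequencies is an assumption, since the claim only speaks of "frequency of activity". The similarity formula itself is not reproduced here and is therefore not implemented.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def activity_time_vector(timestamps: Iterable[datetime]) -> List[float]:
    """Return a 24-element vector whose m-th entry is the share of the
    user's activity that falls in hour m (0 to 23) of the day."""
    counts = Counter(ts.hour for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        # A user with no recorded activity gets an all-zero vector.
        return [0.0] * 24
    return [counts.get(hour, 0) / total for hour in range(24)]


# Toy timestamps, invented for illustration only.
stamps = [
    datetime(2020, 5, 22, 9, 15),
    datetime(2020, 5, 22, 9, 45),
    datetime(2020, 5, 23, 21, 0),
    datetime(2020, 5, 24, 21, 30),
]
vec = activity_time_vector(stamps)
```

Two such vectors, one per network, have the same length by construction, so they can be compared entry by entry regardless of how many posts each user made.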
5. The heterogeneous social network user anchor link identification method based on meta-path according to claim 3, wherein the online activity time pattern similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
wherein the two frequency terms denote, respectively, the activity frequency of user $u_i$ in the heterogeneous social network S1 within the m-th time period and the activity frequency of user $u_j$ in the heterogeneous social network S2 within the m-th time period; the online activity time pattern of a user is described on a 24-hour clock, with one day divided into 24 time periods of one hour each.
6. The heterogeneous social network user anchor link identification method based on meta-path according to claim 1 or 2, wherein the text content similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
step 7.1: for user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2, extracting the keywords of each user's text with the TF-IDF algorithm and merging the keywords into one set;
step 7.2: for the text content of each of the two users, calculating the relative word frequency of each word in the set, and generating for each user a word-frequency vector, the two vectors having the same length;
step 7.3: calculating the cosine similarity of the two word-frequency vectors, thereby converting the text content similarity into the angle between two vectors of the same dimension.
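Steps 7.1 to 7.3 can be sketched as below. This is an illustrative sketch rather than the patent's implementation. It assumes the texts are already tokenised into word lists, that each user keeps the `top_k` highest-scoring words as keywords, and that a smoothed IDF of the form ln((1 + n) / (1 + df)) + 1 is used; the source specifies neither the tokenisation, the keyword cut-off, nor the IDF variant, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def tf_idf_keywords(doc: Sequence[str], corpus: Sequence[Sequence[str]],
                    top_k: int) -> List[str]:
    """Step 7.1 sketch: rank the words of one document by TF-IDF."""
    tf = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # assumed smoothing
        scores[word] = (count / len(doc)) * idf
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def relative_frequency_vector(doc: Sequence[str],
                              vocabulary: Sequence[str]) -> List[float]:
    """Step 7.2 sketch: relative frequency of each vocabulary word in doc."""
    tf = Counter(doc)
    total = len(doc)
    return [tf[w] / total if total else 0.0 for w in vocabulary]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Step 7.3 sketch: cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(doc_i: Sequence[str], doc_j: Sequence[str],
                    corpus: Sequence[Sequence[str]], top_k: int = 5) -> float:
    """Merge both users' keywords into one set, then compare the vectors."""
    keywords = sorted(set(tf_idf_keywords(doc_i, corpus, top_k))
                      | set(tf_idf_keywords(doc_j, corpus, top_k)))
    return cosine_similarity(relative_frequency_vector(doc_i, keywords),
                             relative_frequency_vector(doc_j, keywords))


# Toy token lists, invented for illustration only.
doc_a = ["coffee", "harbin", "coffee", "music"]
doc_b = ["coffee", "harbin", "coffee", "music"]
doc_c = ["football", "match"]
corpus = [doc_a, doc_b, doc_c]
same = text_similarity(doc_a, doc_b, corpus)
different = text_similarity(doc_a, doc_c, corpus)
```

Building both vectors over the merged keyword set is what guarantees the equal length required in step 7.2; users who share no keywords get a similarity of 0 because their vectors have no overlapping non-zero entries.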
7. The heterogeneous social network user anchor link identification method based on meta-path according to claim 3, wherein the text content similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
step 7.1: for user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2, extracting the keywords of each user's text with the TF-IDF algorithm and merging the keywords into one set;
step 7.2: for the text content of each of the two users, calculating the relative word frequency of each word in the set, and generating for each user a word-frequency vector, the two vectors having the same length;
step 7.3: calculating the cosine similarity of the two word-frequency vectors, thereby converting the text content similarity into the angle between two vectors of the same dimension.
8. The heterogeneous social network user anchor link identification method based on meta-path according to claim 4, wherein the text content similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
step 7.1: for user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2, extracting the keywords of each user's text with the TF-IDF algorithm and merging the keywords into one set;
step 7.2: for the text content of each of the two users, calculating the relative word frequency of each word in the set, and generating for each user a word-frequency vector, the two vectors having the same length;
step 7.3: calculating the cosine similarity of the two word-frequency vectors, thereby converting the text content similarity into the angle between two vectors of the same dimension.
9. The heterogeneous social network user anchor link identification method based on meta-path according to claim 5, wherein the text content similarity between user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2 in step 7 is calculated as follows:
step 7.1: for user $u_i$ in the heterogeneous social network S1 and user $u_j$ in S2, extracting the keywords of each user's text with the TF-IDF algorithm and merging the keywords into one set;
step 7.2: for the text content of each of the two users, calculating the relative word frequency of each word in the set, and generating for each user a word-frequency vector, the two vectors having the same length;
step 7.3: calculating the cosine similarity of the two word-frequency vectors, thereby converting the text content similarity into the angle between two vectors of the same dimension.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010438376.6A CN111475739B (en) | 2020-05-22 | 2020-05-22 | Heterogeneous social network user anchor link identification method based on meta-path |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010438376.6A CN111475739B (en) | 2020-05-22 | 2020-05-22 | Heterogeneous social network user anchor link identification method based on meta-path |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111475739A (en) | 2020-07-31 |
CN111475739B (en) | 2022-07-29 |
Family ID=71764700
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010438376.6A Active CN111475739B (en) | 2020-05-22 | 2020-05-22 | Heterogeneous social network user anchor link identification method based on meta-path |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111475739B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085614A (en) * | 2020-08-05 | 2020-12-15 | 国家计算机网络与信息安全管理中心 | Cross-social-network virtual user identity alignment method based on spatio-temporal behavior data |
CN112307343A (en) * | 2020-11-05 | 2021-02-02 | 重庆邮电大学 | Cross-E-book city user alignment method based on double-layer iterative compensation and full-face representation |
CN113297500A (en) * | 2021-06-23 | 2021-08-24 | 哈尔滨工程大学 | Social network isolated node link prediction method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014003623A1 (en) * | 2012-06-26 | 2014-01-03 | Telefonaktiebolaget L M Ericsson (Publ) | Methods and nodes for soft cell uplink prioritization |
CN109635989A (en) * | 2018-08-30 | 2019-04-16 | 电子科技大学 | A social network link prediction method based on multi-source heterogeneous data fusion |
CN109635201A (en) * | 2018-12-18 | 2019-04-16 | 苏州大学 | Method for mining associated user accounts across heterogeneous social network platforms |
CN109949174A (en) * | 2019-03-14 | 2019-06-28 | 哈尔滨工程大学 | A heterogeneous social network user entity anchor link identification method |
CN110097125A (en) * | 2019-05-07 | 2019-08-06 | 郑州轻工业学院 | A cross-network account association method based on embedding representation |
CN110134883A (en) * | 2019-04-22 | 2019-08-16 | 哈尔滨英赛克信息技术有限公司 | A heterogeneous social network location entity anchor link identification method |
CN110532436A (en) * | 2019-07-17 | 2019-12-03 | 中国人民解放军战略支援部队信息工程大学 | Cross-social-network user identity identification method based on community structure |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014003623A1 (en) * | 2012-06-26 | 2014-01-03 | Telefonaktiebolaget L M Ericsson (Publ) | Methods and nodes for soft cell uplink prioritization |
CN109635989A (en) * | 2018-08-30 | 2019-04-16 | 电子科技大学 | A social network link prediction method based on multi-source heterogeneous data fusion |
CN109635201A (en) * | 2018-12-18 | 2019-04-16 | 苏州大学 | Method for mining associated user accounts across heterogeneous social network platforms |
CN109949174A (en) * | 2019-03-14 | 2019-06-28 | 哈尔滨工程大学 | A heterogeneous social network user entity anchor link identification method |
CN110134883A (en) * | 2019-04-22 | 2019-08-16 | 哈尔滨英赛克信息技术有限公司 | A heterogeneous social network location entity anchor link identification method |
CN110097125A (en) * | 2019-05-07 | 2019-08-06 | 郑州轻工业学院 | A cross-network account association method based on embedding representation |
CN110532436A (en) * | 2019-07-17 | 2019-12-03 | 中国人民解放军战略支援部队信息工程大学 | Cross-social-network user identity identification method based on community structure |
Non-Patent Citations (6)
Title |
---|
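S. SAJADMANESH et al.: "Predicting anchor links between heterogeneous social networks", 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) * |
吕继光: "Research on overlapping community detection algorithms in heterogeneous networks", China Excellent Doctoral and Master's Theses Full-text Database (Master's), Basic Sciences * |
尹劼: "Meta-path-based link prediction in aligned heterogeneous social networks", China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology * |
朱俊星: "Research on and application of several key technologies for entity alignment and information association in multi-source social networks", China Excellent Doctoral and Master's Theses Full-text Database (Doctoral), Information Science and Technology * |
马江涛: "Research on knowledge graph construction technology based on social networks", China Excellent Doctoral and Master's Theses Full-text Database (Doctoral), Information Science and Technology * |
黄立威 et al.: "A meta-path-based link prediction model for heterogeneous information networks", Chinese Journal of Computers * |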
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085614A (en) * | 2020-08-05 | 2020-12-15 | 国家计算机网络与信息安全管理中心 | Cross-social-network virtual user identity alignment method based on spatio-temporal behavior data |
CN112307343A (en) * | 2020-11-05 | 2021-02-02 | 重庆邮电大学 | Cross-E-book city user alignment method based on double-layer iterative compensation and full-face representation |
CN112307343B (en) * | 2020-11-05 | 2023-04-07 | 重庆邮电大学 | Cross-E-book city user alignment method based on double-layer iterative compensation and full-face representation |
CN113297500A (en) * | 2021-06-23 | 2021-08-24 | 哈尔滨工程大学 | Social network isolated node link prediction method |
Also Published As
Publication number | Publication date |
---|---|
CN111475739B (en) | 2022-07-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||