CN109949174A - Heterogeneous social network user entity anchor link identification method - Google Patents

Heterogeneous social network user entity anchor link identification method

Info

Publication number
CN109949174A
Authority
CN
China
Prior art keywords
user
similarity
social networks
users
heterogeneous
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910194845.1A
Other languages
Chinese (zh)
Other versions
CN109949174B (en
Inventor
王巍
杨武
玄世昌
苘大鹏
吕继光
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN201910194845.1A
Publication of CN109949174A
Application granted
Publication of CN109949174B
Legal status: Active

Abstract

The invention belongs to the field of social network analysis and in particular relates to a heterogeneous social network user entity anchor link identification method, comprising the following steps: given two heterogeneous social networks $G_1$ and $G_2$, denote their users by $u^1$ and $u^2$ respectively; calculate the user similarity based on user attributes in the two heterogeneous social networks; calculate the user similarity based on user relationships in the two heterogeneous social networks; rank the friends $\{f_i\}$ of each user $u$ and select the top-$K$ pairs $uf_i$ as close friends. The invention improves on the earlier common-friend relationship similarity: it extracts close friends on the basis of user attributes and uses the similarity of close-friend relationships to strengthen the identification of user anchor links, which greatly improves the identification effect.

Description

Heterogeneous social network user entity anchor link identification method
Technical Field
The invention belongs to the field of social network analysis, and particularly relates to a heterogeneous social network user entity anchor link identification method.
Background
With the rapid development of the internet, social networks such as Facebook, Twitter, Renren and Sina Weibo have gradually entered people's lives. People do not use only one platform; they use several social platforms at the same time, with a different focus on each. For example, on Twitter a user may follow celebrities, writers, or people and things of interest, while on YouTube the user may follow or like favourite videos, as shown in Fig. 1. Each user typically has a separate account in each social network, and these accounts of the same user appear to have no connection or correspondence with one another. It is therefore necessary to find the correspondence between the different accounts of the same user across multiple social networks, for example in order to use information from multiple networks for link recommendation and community analysis.
In recent years, more and more researchers have studied heterogeneous social network entity anchor links, including link prediction across multiple social networks to address cross-network recommendation. Within this research, the identification of user entity anchor links has already received attention. Before the concept of the anchor link was proposed, similar work already existed under the name of user matching, which made the final matching result more accurate by studying user-name attributes in different social networks. The anchor link was proposed in 2013 by Kong X et al. In their article, experiments were carried out on the two social platforms Foursquare and Twitter: a user can add their Twitter and Facebook account information to Foursquare, and other users visiting that user's homepage can follow the Twitter and Facebook buttons directly to the corresponding social network platform; Kong X et al. call such a link an anchor link. After the anchor link concept was proposed, research on user anchor links became more extensive. Some researchers proposed converting anchor link identification into a one-to-one matching problem, identifying user anchor links mainly through common neighbours among users in different social networks; Nacéra Bennacer et al. proposed an algorithm that iteratively matches users in multiple social networks using network topology and personal information.
Other researchers studied the structural characteristics of social networks and improved the common-neighbour method of user anchor link identification by additionally considering preferential attachment. Addressing the complex characteristics of cross-network user data, and centring on users with anchor links, further work analysed user behaviour from three aspects: analysis of the behaviour patterns of users with anchor links, integration of user data, and integration of user interests. User behaviour is an important component in the study of user anchor link identification. With the development of social network location services and research on location anchor links, user anchor link identification based on location information has also become more widespread. Wang Ying, combining location information, proposed a new user anchor link identification framework that mines user similarity in terms of network structure, user check-in behaviour and other aspects, and realises link prediction simulation based on unsupervised and supervised learning. Dong Y et al. predicted user anchor links in social networks using a factor graph model, but this approach does not suit all scenarios because its implementation is complex and dependent on the data set. Zhang et al. extracted features from time, space, text and other content to identify user anchor links. Sun Song, Li Qiudan et al. proposed a mapping method that integrates textual and structural information and developed a prototype system based on it that lets the user set and adjust the weights of different information or set desired indices in order to optimise user anchor link identification.
The invention aims to provide, in an unsupervised learning mode, a user entity anchor link identification algorithm based on similarity judgment; the algorithm builds an anchor link identification model by combining users' natural attributes with an improved user relationship analysis, thereby improving identification accuracy.
Disclosure of Invention
The invention aims to provide a method for identifying user entity anchor links of a heterogeneous social network.
A heterogeneous social network user entity anchor link identification method comprises the following steps:
(1) given two heterogeneous social networks $G_1$ and $G_2$, denoting their users by $u^1$ and $u^2$ respectively;
(2) calculating user similarity based on user attributes in the two heterogeneous social networks;
(3) calculating user similarity based on user relationships in the two heterogeneous social networks;
(4) for each user $u$ and the corresponding friends $\{f_i\}$, sorting the pairs $uf_i$ and selecting the top-$K$ pairs $uf_i$ as close friends;
(5) comparing the close friends $f_i$ of user $u$ in heterogeneous social network $G_1$ with the close friends $f'_i$ of user $u'$ in heterogeneous social network $G_2$ to obtain $K^2$ similarity results, represented by a two-dimensional matrix score whose rows and columns are $f_i$ and $f'_i$; the KM algorithm is used to realise maximum weight matching, and the result is stored in a two-dimensional matrix score';
(6) examining all values of the matrix score', namely the close-friend similarities, and selecting the largest close-friend similarity as the user similarity based on user relationships;
(7) integrating the user attributes and the user relationships;
(8) initializing a two-dimensional matrix $S_u$ whose rows and columns represent the users of the two social networks, where $n_1$ and $n_2$ are the numbers of users in the two heterogeneous social networks $G_1$ and $G_2$ respectively; initializing the adjustment factors α, β, γ, θ and μ all to 0.2; calculating the user similarity values based on user attributes and user relationships; achieving the best matching through the KM algorithm; and storing the best matching result, namely the heterogeneous social network user entity anchor links, in a two-dimensional matrix $S'_u$.
Giving two heterogeneous social networks $G_1$ and $G_2$ and denoting their users by $u^1$ and $u^2$ respectively comprises the following:
a heterogeneous social network $G = (V, E)$, where the node set $V$ contains several kinds of information nodes, $V = \{V_{num} \mid num \in Z^+\}$, num denotes the kind of node, and when num = 1 the node represents a user $u$ with $u \in V_1$; the links $E$ between nodes comprise multiple types, each determined by the kinds $num_1$ and $num_2$ of the two nodes it connects;
user attribute similarity is recorded in two-dimensional user matrices $P_{attr}$;
user relationship similarity is recorded in a two-dimensional user matrix $Q$;
wherein attr ∈ {n, a, l, c}, and n, a, l, c respectively denote four attributes of the user: user name, user attribution (home region), user real-time shared location and user published content. Each value in $P_{attr}$ is the value obtained for a user $u^1$ of $G_1$ and a user $u^2$ of $G_2$ by the user attribute similarity judgment, and each value in $Q$ is the value obtained for $u^1$ and $u^2$ by the user relationship similarity judgment.
Calculating the user similarity based on user attributes in the two heterogeneous social networks comprises the following steps:
(3.1) calculating user similarity based on user names in the two heterogeneous social networks
Each user name is converted into a word list and measured with the Jaccard similarity algorithm:
$$P_n(u^1, u^2) = \frac{|n_{u^1} \cap n_{u^2}|}{|n_{u^1} \cup n_{u^2}|}$$
wherein $n_{u^1}$ denotes the user name (word list) of user $u^1$ in $G_1$, $n_{u^2}$ denotes the user name (word list) of user $u^2$ in $G_2$, and $P_n(u^1, u^2)$ is the entry of matrix $P_n$ holding the similarity of users $u^1$ and $u^2$ under the user-name-based similarity judgment;
(3.2) calculating user similarity based on user attribution in the two heterogeneous social networks
A region is given as S = [country, province, city]; any missing item of S is replaced with 0. In the two social networks $G_1$ and $G_2$ the regions are denoted $S_1$ = [country$_1$, province$_1$, city$_1$] and $S_2$ = [country$_2$, province$_2$, city$_2$]. For any two user regions, the three kinds of position item are ranked by how strongly they identify a user: the country identifies least and the city identifies most. Actual matching starts from the item with the lowest identification degree. When a position item is missing from a user region, the less specific items are completed from the more specific ones: the province corresponding to the city, or the country corresponding to the province, is looked up and filled in. The three position items of a user region are weighted 1:1:1; after the missing items have been completed, the weights are accumulated according to whether each pair of corresponding items is the same. If the cities of the two regions are the same, the user similarity identified from the user region is taken to be 1;
the actual similarity comparison result is denoted $P_a(u^1, u^2)$, the entry of matrix $P_a$ holding the similarity of users $u^1$ and $u^2$ under the attribution-based similarity judgment;
(3.3) calculating user similarity based on the users' real-time shared locations in the two heterogeneous social networks
The day is divided into time intervals of two hours each, giving 12 interval sets $T_i$. The number of locations visited by the user in each time set is counted, the frequency with which the user visits locations in each time set is calculated, all frequency values are arranged as a vector, and finally the vectors of any two users in the two social networks are compared to obtain the similarity of the users' check-in locations;
user $u$ has visited $count_i$ locations in the $i$-th time set, denoted $(u, i, count_i)$;
the total number of locations visited by user $u$ in one day is $count = \sum_{i=1}^{12} count_i$;
the frequency with which the user visits locations in each time set is:
$$W_i = \frac{count_i}{count}$$
after the frequency for each time period has been calculated, the frequencies over all time periods are collected in a vector W:
$$W = \{W_i \mid 1 \le i \le 12\}$$
for the frequency vectors $W_{u^1}$ and $W_{u^2}$ of a user $u^1$ in $G_1$ and a user $u^2$ in $G_2$, the similarity of the two is calculated, and the similarity result based on the users' real-time shared locations is denoted $P_l(u^1, u^2)$:
$$P_l(u^1, u^2) = \frac{W_{u^1} \cdot W_{u^2}}{\|W_{u^1}\| \, \|W_{u^2}\|}$$
wherein $P_l(u^1, u^2)$ is the entry of matrix $P_l$ holding the similarity of users $u^1$ and $u^2$ under the similarity judgment based on real-time shared locations;
(3.4) calculating user similarity based on the users' published content in the two heterogeneous social networks
The TF-IDF algorithm is used to distinguish how important individual words are to the content:
$$tfidf(\alpha) = tf_{\alpha,k} \times idf(\alpha)$$
wherein $tf_{\alpha,k}$ denotes the word frequency and $idf(\alpha)$ denotes the inverse document frequency;
in $G_1$ and $G_2$, the TF-IDF algorithm is first used to extract keywords from the content published by users $u^1$ and $u^2$ respectively; all the extracted words are merged into one set; the frequency of each word of the set in the content published by $u^1$ and by $u^2$ is calculated, producing the word-frequency vectors $C_{u^1}$ and $C_{u^2}$; finally, the cosine similarity of the two vectors is calculated, which is the user similarity result obtained from the users' published content. The similarity result for users $u^1$ and $u^2$ based on published content is denoted $P_c(u^1, u^2)$, the entry of matrix $P_c$ holding the similarity of users $u^1$ and $u^2$ under the content-based similarity judgment.
Calculating the user similarity based on user relationships in the two heterogeneous social networks comprises the following steps:
for each user $u$ in $G_1$ and $G_2$, collect the friend set $f = \{f_i \mid i \in [0, n]\}$ together with the one-way interaction behaviors and joint participation behaviors between $u$ and each friend. One-way interaction behaviors comprise commenting, replying, liking and forwarding, and are denoted single_behavior; joint participation behaviors comprise commenting on and forwarding topics, and are denoted double_behavior. For each user $u$ in $G_1$ and $G_2$, every friend $f_i$ in $u$'s friend set $f$ is scored, with an initial score of 0. If $u$ performs any single_behavior towards $f_i$, or $f_i$ performs any single_behavior towards $u$, the corresponding score is increased by 1; if $u$ and $f_i$ both perform any double_behavior on the same topic, the corresponding score is increased by 1.
Examining all values of the matrix score', namely the close-friend similarities, and selecting the largest close-friend similarity as the user similarity based on user relationships comprises the following:
the user relationship similarity is finally denoted $Q(u^1, u^2)$, the entry of matrix $Q$ holding the similarity of users $u^1$ and $u^2$ under the relationship-based similarity judgment.
Integrating the user attributes and the user relationships comprises the following:
the heterogeneous social network user entity anchor link identification algorithm USDU is:
$$S_u = \alpha P_n + \beta P_a + \gamma P_l + \theta P_c + \mu Q$$
wherein $S_u$ is a two-dimensional matrix whose rows and columns correspond to the users of the two social networks; a value of 0 indicates that the two users have no user anchor link and a value of 1 indicates that they have one; α, β, γ, θ and μ are adjustment factors with α + β + γ + θ + μ = 1.
The invention has the beneficial effects that:
The main idea of the invention is to identify user anchor links through similarity judgment. In an unsupervised learning mode, the similarity of user attributes and the similarity of user relationships in the heterogeneous social networks are judged separately and then integrated into the heterogeneous social network user entity anchor link identification algorithm. The earlier common-friend relationship similarity is optimised and improved: close friends are extracted on the basis of user attributes, and the similarity of close-friend relationships strengthens the identification of user anchor links, which greatly improves the identification effect.
Drawings
FIG. 1 is a schematic diagram of a social network.
FIG. 2 is a diagram of social network user friend relationships.
Fig. 3 is a diagram of a user's close friend relationship.
Fig. 4 is the close-friend bipartite graph of a user in $G_1$ and a user in $G_2$ for the example K = 3.
Fig. 5 is the close-friend similarity matching graph of a user in $G_1$ and a user in $G_2$ for the example K = 3.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
1. Given a heterogeneous social network $G = (V, E)$, the node set $V$ contains several kinds of information nodes, $V = \{V_{num} \mid num \in Z^+\}$, where num denotes the kind of node; when num = 1 the node represents a user $u$, with $u \in V_1$. The links $E$ between nodes comprise multiple types, each determined by the kinds $num_1$ and $num_2$ of the two nodes it connects.
2. Given two heterogeneous social networks $G_1$ and $G_2$, their users are denoted $u^1$ and $u^2$ respectively. A pair $(u^1, u^2)$ is a user anchor link if and only if $u^1$ and $u^2$ are accounts of the same real-world user.
3. Given two heterogeneous social networks $G_1$ and $G_2$, $P_{attr}$ and $Q$ are the two-dimensional user matrices used for the user attribute similarity judgment and the user relationship similarity judgment respectively, where attr ∈ {n, a, l, c} and n, a, l, c denote four attributes of the user: user name, user attribution (home region), user real-time shared location and user published content. Each value in $P_{attr}$ is the value obtained for $u^1$ and $u^2$ by the user attribute similarity judgment, and each value in $Q$ is the value obtained for $u^1$ and $u^2$ by the user relationship similarity judgment.
4. First, the user similarity based on user attributes in the two heterogeneous social networks is calculated.
(1) Calculate the user similarity based on user names in the two heterogeneous social networks. User names generally consist of Chinese characters, letters, underscores and the like; each user name is converted into a word list and measured with the Jaccard similarity algorithm, as shown below.
$$P_n(u^1, u^2) = \frac{|n_{u^1} \cap n_{u^2}|}{|n_{u^1} \cup n_{u^2}|}$$
wherein $n_{u^1}$ denotes the user name (word list) of user $u^1$ in $G_1$, $n_{u^2}$ denotes the user name (word list) of user $u^2$ in $G_2$, and $P_n(u^1, u^2)$ is the entry of matrix $P_n$ holding the similarity of users $u^1$ and $u^2$ under the user-name-based similarity judgment.
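The user-name step can be illustrated with a minimal Python sketch. The text only says that a user name is converted into a word list; the tokenisation used here (one token per lower-cased character, with underscores and spaces dropped) and the function names are assumptions made for illustration, not the patent's stated procedure.

```python
def username_tokens(name):
    # Assumption: each lower-cased character is one token; "_" and " " are dropped.
    return {ch for ch in name.lower() if ch not in "_ "}


def jaccard_similarity(name_a, name_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of the two token sets."""
    a, b = username_tokens(name_a), username_tokens(name_b)
    if not a and not b:
        return 0.0  # two empty names carry no evidence of a match
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(jaccard_similarity("abc", "abd"))
    print(jaccard_similarity("Tom_Li", "tomli"))
```

A word-level or n-gram tokenisation could be swapped in by changing `username_tokens` only.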
(2) Calculate the user similarity based on user attribution in the two heterogeneous social networks. A region is given as S = [country, province, city]; any missing item of S is replaced with 0. In the two social networks $G_1$ and $G_2$ the regions are denoted $S_1$ = [country$_1$, province$_1$, city$_1$] and $S_2$ = [country$_2$, province$_2$, city$_2$]. For any two user regions, the three kinds of position item are ranked by how strongly they identify a user: the country identifies least and the city identifies most. Actual matching starts from the item with the lowest identification degree. When a position item is missing from a user region, the less specific items are completed from the more specific ones: the province corresponding to the city, or the country corresponding to the province, is looked up and filled in. The three position items of a user region are weighted 1:1:1, and after the missing items have been completed the weights are accumulated according to whether each pair of corresponding items is the same. If the cities of the two regions are the same, the user similarity identified from the user region is taken to be 1.
For example, the region of user $U_1$ is $S_1$ = [China, 0, Harbin] and the region of user $U_2$ is $S_2$ = [China, Heilongjiang, 0]. When the two regions are matched, the most specific position item present in each of $S_1$ and $S_2$ is found first, namely Harbin and Heilongjiang. Because these two items are of different kinds, the items of $S_1$ are completed: the province corresponding to the city of Harbin is Heilongjiang, so $S_1$ is completed to $S'_1$ = [China, Heilongjiang, Harbin]. $S'_1$ and $S_2$ are then matched; two items are the same, so the similarity of user $U_1$ and user $U_2$ for user anchor link identification by user region is 2/3.
The actual similarity comparison result is denoted $P_a(u^1, u^2)$, the entry of matrix $P_a$ holding the similarity of users $u^1$ and $u^2$ under the attribution-based similarity judgment.
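The region-matching rule and the Harbin example can be sketched as follows. The two lookup tables are hypothetical stand-ins for whatever gazetteer an implementation would use, and the function names are illustrative only.

```python
# Hypothetical lookup tables; a real system would use a full gazetteer.
CITY_TO_PROVINCE = {"Harbin": "Heilongjiang"}
PROVINCE_TO_COUNTRY = {"Heilongjiang": "China"}


def complete_region(region):
    """Fill a missing province from the city, then a missing country from the province."""
    country, province, city = region
    if province == 0 and city != 0:
        province = CITY_TO_PROVINCE.get(city, 0)
    if country == 0 and province != 0:
        country = PROVINCE_TO_COUNTRY.get(province, 0)
    return [country, province, city]


def region_similarity(region_1, region_2):
    """Weights 1:1:1 over [country, province, city]; identical cities give 1."""
    a, b = complete_region(region_1), complete_region(region_2)
    if a[2] != 0 and a[2] == b[2]:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x != 0 and x == y)
    return matches / 3


if __name__ == "__main__":
    s1 = ["China", 0, "Harbin"]
    s2 = ["China", "Heilongjiang", 0]
    print(region_similarity(s1, s2))  # two of three items agree after completion
```

Items that remain 0 after completion are treated as non-matching, which is an assumption; the text does not say how two missing items should be compared.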
(3) Calculate the user similarity based on the users' real-time shared locations in the two heterogeneous social networks. Here location is combined with time, that is, the user's time-varying location is analysed. Time is expressed on a 24-hour clock and the day is divided into intervals of two hours each, giving 12 interval sets $T_i$, $\{i \mid 1 \le i \le 12\}$. The number of locations visited by the user in each time set is counted, the frequency with which the user visits locations in each time set is calculated, all frequency values are arranged as a vector, and finally the vectors of any two users in the two social networks are compared to obtain the similarity of the users' check-in locations. Given a user $u$, suppose the user has visited $count_i$ locations in the $i$-th time set, denoted $(u, i, count_i)$. The total number of locations visited by user $u$ in one day is $count = \sum_{i=1}^{12} count_i$. The frequency with which the user visits locations in each time set is shown below.
$$W_i = \frac{count_i}{count}$$
After the frequency for each time period has been calculated, the frequencies over all time periods are collected in a vector $W = \{W_i \mid 1 \le i \le 12\}$. For the frequency vectors $W_{u^1}$ and $W_{u^2}$ of a user $u^1$ in $G_1$ and a user $u^2$ in $G_2$, the similarity of the two is calculated. Cosine similarity is used as the measure, and the similarity result based on the users' real-time shared locations is denoted $P_l(u^1, u^2)$, as shown below.
$$P_l(u^1, u^2) = \frac{W_{u^1} \cdot W_{u^2}}{\|W_{u^1}\| \, \|W_{u^2}\|}$$
The closer the value of $P_l(u^1, u^2)$ is to 1, the more similar users $u^1$ and $u^2$ in the two heterogeneous social networks are in the frequency with which they visit locations over time. $P_l(u^1, u^2)$ is the entry of matrix $P_l$ holding the similarity of users $u^1$ and $u^2$ under the similarity judgment based on real-time shared locations.
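The time-slot frequency vector and its cosine comparison can be sketched in a few lines of Python. The input format (one hour-of-day value per visited location) is an assumption for illustration; the text only specifies the twelve two-hour slots, the per-slot frequency and the cosine measure.

```python
import math


def visit_frequencies(checkin_hours):
    """Map hour-of-day values (0-23), one per visited location, to 12 slot frequencies."""
    counts = [0] * 12
    for hour in checkin_hours:
        counts[hour // 2] += 1          # two-hour slots: 0-1, 2-3, ..., 22-23
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def cosine_similarity(v, w):
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return dot / norm if norm else 0.0  # zero vectors are treated as dissimilar


if __name__ == "__main__":
    w1 = visit_frequencies([8, 9, 12, 20])
    w2 = visit_frequencies([8, 13, 21, 21])
    print(cosine_similarity(w1, w2))
```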
(4) Calculate the user similarity based on the users' published content in the two heterogeneous social networks. Users are compared through the keywords of the content they publish in the two networks; because the keywords are not the whole content but its core, the TF-IDF algorithm is used to extract them. The TF-IDF algorithm distinguishes how important individual words are to the content, as shown in the following formula.
$$tfidf(\alpha) = tf_{\alpha,k} \times idf(\alpha)$$
wherein $tf_{\alpha,k}$ denotes the word frequency and $idf(\alpha)$ denotes the inverse document frequency. In $G_1$ and $G_2$, the TF-IDF algorithm is first used to extract keywords from the content published by users $u^1$ and $u^2$ respectively; all the extracted words are merged into one set; the frequency of each word of the set in the content published by $u^1$ and by $u^2$ is calculated, producing the word-frequency vectors $C_{u^1}$ and $C_{u^2}$; finally, the cosine similarity of the two vectors is calculated, which is the user similarity result obtained from the users' published content. The similarity result for users $u^1$ and $u^2$ based on published content is denoted $P_c(u^1, u^2)$, the entry of matrix $P_c$ holding the similarity of users $u^1$ and $u^2$ under the content-based similarity judgment.
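A minimal Python sketch of the content step follows: TF-IDF picks keywords per user, the keywords are merged into one vocabulary, and the word-frequency vectors over that vocabulary are compared by cosine similarity. The smoothed idf formula, the number of keywords `k`, and the assumption that content is already tokenised are illustrative choices, not taken from the text.

```python
import math
from collections import Counter


def top_keywords(doc_tokens, corpus, k=5):
    """Return the k tokens of doc_tokens with the highest TF-IDF score."""
    tf = Counter(doc_tokens)
    n_docs = len(corpus)

    def idf(term):
        df = sum(1 for doc in corpus if term in doc)
        return math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed idf (assumption)

    scored = {t: (c / len(doc_tokens)) * idf(t) for t, c in tf.items()}
    return set(sorted(scored, key=lambda t: (-scored[t], t))[:k])


def content_similarity(tokens_a, tokens_b, corpus, k=5):
    """Cosine similarity of word-frequency vectors over the merged keyword set."""
    vocab = sorted(top_keywords(tokens_a, corpus, k) | top_keywords(tokens_b, corpus, k))
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    va, vb = [ca[t] for t in vocab], [cb[t] for t in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    a = ["music", "concert", "music", "travel"]
    b = ["music", "travel", "food"]
    print(content_similarity(a, b, corpus=[a, b]))
```

Here `corpus` is the list of all users' token lists against which document frequency is counted; for Chinese text a word segmenter would be needed before this step.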
5. Calculate the user similarity based on user relationships. The friend relationships of social network users are shown in Fig. 2. For each user $u$ in $G_1$ and $G_2$, collect the friend set $f = \{f_i \mid i \in [0, n]\}$ together with the one-way interaction behaviors and joint participation behaviors between $u$ and each friend. One-way interaction behaviors comprise commenting, replying, liking and forwarding, and are denoted single_behavior; joint participation behaviors comprise commenting on and forwarding topics, and are denoted double_behavior. The user's close friends are screened out through these interaction behaviors; the close-friend relationships of a user are shown in Fig. 3. For each user $u$ in $G_1$ and $G_2$, every friend $f_i$ in $u$'s friend set $f$ is scored, with an initial score of 0. If $u$ performs any single_behavior towards $f_i$, or $f_i$ performs any single_behavior towards $u$, the corresponding score is increased by 1. If $u$ and $f_i$ both perform any double_behavior on the same topic, the corresponding score is increased by 1.
6. Sort the pairs $uf_i$ by score and select the top-$K$ pairs $uf_i$; the selected $\{f_i\}$ are the close friends of $u$. K is set to 3.
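The scoring and top-K selection of close friends can be sketched as below. The event format (pairs of user identifiers) and the choice to add 1 per observed event, rather than once per friend, are assumptions; the text does not settle that point.

```python
from collections import Counter


def close_friends(user, friends, single_events, joint_events, k=3):
    """Score each friend of `user` and return the k highest-scoring ones.

    single_events: (actor, target) pairs for one-way single_behavior events.
    joint_events:  (user_a, user_b) pairs who took part in the same topic
                   (double_behavior).
    """
    scores = Counter({f: 0 for f in friends})   # initial score 0 for every friend
    for actor, target in single_events:
        if actor == user and target in scores:
            scores[target] += 1                 # u acted towards f_i
        elif target == user and actor in scores:
            scores[actor] += 1                  # f_i acted towards u
    for a, b in joint_events:
        if a == user and b in scores:
            scores[b] += 1
        elif b == user and a in scores:
            scores[a] += 1
    ranked = sorted(scores, key=lambda f: (-scores[f], f))  # ties broken by name
    return ranked[:k]


if __name__ == "__main__":
    print(close_friends(
        "u", ["a", "b", "c", "d"],
        single_events=[("u", "a"), ("a", "u"), ("b", "u")],
        joint_events=[("u", "a"), ("c", "u")],
    ))
```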
7. Compare the close friends $f_i$ of $u$ in $G_1$ with the close friends $f'_i$ of $u'$ in $G_2$ to obtain $K^2$ similarity results, as shown in Fig. 4. The results are represented by a two-dimensional matrix score whose rows and columns are $f_i$ and $f'_i$. The KM algorithm is used to achieve maximum weight matching, as in Fig. 5, and the results are stored in a two-dimensional matrix score'.
8. Examine all values of the matrix score', namely the close-friend similarities, and select the largest close-friend similarity as the user similarity based on user relationships. The user relationship similarity is finally denoted $Q(u^1, u^2)$, the entry of matrix $Q$ holding the similarity of users $u^1$ and $u^2$ under the relationship-based similarity judgment.
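Steps 7 and 8 can be illustrated on a small K × K matrix. This sketch does not implement the KM algorithm: for K = 3 it simply enumerates all permutations, which yields the same maximum-weight matching on such a small matrix. The entries of `score` are assumed to be close-friend similarity values that have already been computed.

```python
from itertools import permutations


def best_friend_matching(score):
    """Maximum-weight one-to-one matching by brute force (stand-in for KM)."""
    k = len(score)
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(score[i][perm[i]] for i in range(k)),
    )
    return [(i, best[i]) for i in range(k)]


def relationship_similarity(score):
    """Largest similarity among the matched close-friend pairs."""
    return max(score[i][j] for i, j in best_friend_matching(score))


if __name__ == "__main__":
    score = [
        [0.9, 0.1, 0.2],
        [0.3, 0.8, 0.1],
        [0.2, 0.4, 0.7],
    ]
    print(best_friend_matching(score))
    print(relationship_similarity(score))
```

Enumeration costs K! evaluations, so it is only reasonable for very small K such as the K = 3 used in the text.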
9. Finally, integrate the user attributes and the user relationships. The heterogeneous social network user entity anchor link identification algorithm USDU is shown in formula (5).
$$S_u = \alpha P_n + \beta P_a + \gamma P_l + \theta P_c + \mu Q$$
$S_u$ is a two-dimensional matrix whose rows and columns correspond to the users of the two social networks; a value of 0 indicates that the two users have no user anchor link and a value of 1 indicates that they have one. α, β, γ, θ and μ are adjustment factors that satisfy the following equation:
$$\alpha + \beta + \gamma + \theta + \mu = 1$$
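The weighted combination of the five similarity matrices can be sketched directly. The matrices are plain nested lists here and the function name is illustrative.

```python
def fuse_similarities(matrices, weights):
    """Element-wise weighted sum S_u of equally sized similarity matrices."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("adjustment factors must sum to 1")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    # One user per network, so every matrix is 1 x 1: P_n, P_a, P_l, P_c, Q.
    p_n, p_a, p_l, p_c, q = [[1.0]], [[0.5]], [[0.0]], [[0.5]], [[1.0]]
    s_u = fuse_similarities([p_n, p_a, p_l, p_c, q], [0.2] * 5)
    print(s_u)
```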
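10. Initialize a two-dimensional matrix $S_u$ whose rows and columns represent the users of the two social networks, where $n_1$ and $n_2$ are the numbers of users in the two heterogeneous social networks $G_1$ and $G_2$ respectively. The adjustment factors α, β, γ, θ and μ are all initialized to 0.2, and the user similarity values based on user attributes and user relationships are calculated. The best matching is achieved through the KM algorithm, and the best matching result, namely the heterogeneous social network user entity anchor links, is stored in a two-dimensional matrix $S'_u$.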
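The final one-to-one matching on $S_u$ can be sketched with a compact pure-Python implementation of the Kuhn-Munkres (Hungarian) method in its potentials form. This is one standard formulation, offered as an illustration under that assumption rather than as the patent's implementation; it negates the similarities to turn maximum-weight matching into a minimum-cost assignment and transposes the matrix when $n_1 > n_2$.

```python
def max_weight_matching(sim):
    """Return sorted (row, col) pairs of a maximum-weight one-to-one matching."""
    n, m = len(sim), len(sim[0])
    transposed = n > m
    if transposed:                       # the routine below needs rows <= columns
        sim = [list(col) for col in zip(*sim)]
        n, m = m, n
    cost = [[-x for x in row] for row in sim]
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row / column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                        # flip the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    pairs = [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]
    if transposed:
        pairs = [(c, r) for r, c in pairs]
    return sorted(pairs)


if __name__ == "__main__":
    s_u = [[0.9, 0.8], [0.9, 0.1]]
    print(max_weight_matching(s_u))
```

Each returned pair is a candidate anchor link; an implementation would typically also apply a similarity threshold before accepting a pair, which the text does not specify.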
With the rapid development of the internet, social networks gradually enter people's lives, such as Facebook, Twitter, man-net, and surf microblog. People use the social network not only under one platform, but also simultaneously use a plurality of social platforms, and the attention points of the social platforms are different. Such as focusing on Twitter, a few stars, writers, or interested people and things, focusing on YouTube, or giving a like video like a praise, etc. As shown in fig. 1. Each user typically has a separate account in a different social network, and these accounts of the same user are seemingly without any connections or correspondences between each other. It is necessary to find the correspondence of different accounts of the same user in multiple social networks, for example, using information of multiple networks for link recommendation and community analysis.
In recent years, more and more researchers have conducted relevant research on heterogeneous social network entity anchor links, including research on link prediction under some multi-social networks to solve the problem of cross-network recommendation. In the research on the identification of the anchor links of heterogeneous social network entities, the identification of the anchor links of user entities has been researched. When the concept of anchor link is not provided at first, work similar to the identification of the anchor link of the user already exists, which can be called user matching at early stage, and the matching result of the final user is more accurate by researching the attribute of the user name in different social networks. Anchor chains were proposed in 2013 by Kong X et al. In the article, by performing experiments under two social platforms of Foursquare and Twitter, a user can add account information of the user in Twitter and Facebook to the Foursquare, and when other users access the homepage of the user, the user can directly link to the corresponding social network platform through buttons of the Twitter and Facebook, and the link is called as an anchor link by Kong X and the like. With the proposal of the anchor link concept, the research on the anchor link of the user is more extensive. Congratulatory and intelligent people and the like propose a matching problem of converting anchor links into one-to-one, and realize the identification of the anchor links of users mainly through common neighbors among users in different social networks; nac' era Bennacer et al, an algorithm is proposed to iteratively match users in multiple social networks using network topology and personal information. 
The Libanflag and the bradyseis are improved by researching the structural characteristics of a social network and a method for identifying the anchor link of the user of the common neighbor, and a method for considering the preferential connection on the basis of the common neighbor is adopted. Aiming at the complex characteristics of cross-network user data, users with anchor links are centered around, and user behaviors are analyzed from three aspects of analysis of user behavior patterns with anchor links, integration of user data and integration of user interests on the basis of cross-network user behavior analysis. User behavior is an important component in studying user anchor link recognition. With the development of social network location services and the research of location anchor links, some location information-based user anchor link identification is also more and more extensive. The Wangying combined with the position information provides a novel user anchor link identification framework, mining analysis is carried out on user similarity in the aspects of network structure, user sign-in behavior and the like, and link prediction simulation is realized based on unsupervised learning and supervised learning. Dong Y et al predict user anchor links in social networks by using a model of a factor graph, but this approach is not suitable for all scenarios because the implementation process is complex and dependent on the data set. Zhang et al use time, space, text and other content to extract features, and finally realize the identification of user anchor links. Sun Song and Li Qiudan et al propose a mapping method that integrates textual and structural information and develop a prototype system based on the proposed method that allows the user to set and adjust the weights of different information or set desired indices to achieve optimization of user anchor link identification. 
The invention aims to provide an unsupervised, similarity-judgment-based algorithm for identifying user entity anchor links. The algorithm builds an anchor link identification model by combining the users' natural attributes with an improved analysis of user relations, thereby improving identification accuracy.
The purpose of the invention is realized as follows:
1. carrying out similarity judgment on the user names in the two heterogeneous social networks;
2. carrying out similarity judgment on user attributions in the two heterogeneous social networks;
3. carrying out similarity judgment on the real-time sharing positions of the users in the two heterogeneous social networks;
4. carrying out similarity judgment on user published contents in the two heterogeneous social networks;
5. carrying out similarity judgment on user relations in the two heterogeneous social networks;
6. User entities are characterized from two aspects, user attributes and user relations, where the user attributes comprise the user name, user attribution, user real-time shared location and user published content. Two-dimensional matrices $P_n$, $P_a$, $P_l$, $P_c$ and $Q$ are constructed to hold the similarity judgment results for the user name, user attribution, user real-time shared location, user published content and user relation respectively, and on this basis a heterogeneous social network user entity anchor link identification algorithm is proposed.
7. The anchor link relation between users in two social networks is one-to-one, whereas the similarities computed from user attributes and user relations produce many-to-many candidate pairs. The many-to-many problem is therefore modelled as a bipartite graph, and the optimal matching of user anchor links is obtained with the KM algorithm.
The invention provides a heterogeneous social network user entity anchor link identification method, USDU (user anchor linkage for Similarity Determination in Unsupervised mode), whose main idea is to identify user anchor links through similarity judgment. In an unsupervised manner, the similarity of user attributes and the similarity of user relations in the heterogeneous social networks are judged separately, the results are integrated into the heterogeneous social network user entity anchor link identification algorithm, and the best matching is obtained with the KM algorithm on a bipartite graph. The earlier common-friend relation similarity is optimized and improved: close friends are extracted with the user attributes as the reference, and the similarity of the close-friend relations strengthens the discriminability of user anchor links and improves the identification result.
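The matching step named above can be sketched as follows. The patent names the KM (Kuhn-Munkres) algorithm but gives no implementation, so this is only an illustration: the function name `km_max_matching`, the potentials-based O(n^3) formulation and the restriction to matrices with no more rows than columns are all assumptions of the sketch, not details taken from the source.

```python
def km_max_matching(sim):
    """Return (row, col) pairs of a maximum-weight one-to-one matching.

    sim[i][j] is the similarity of user i in G1 and user j in G2.
    Assumption of this sketch: len(sim) <= len(sim[0]).
    """
    n, m = len(sim), len(sim[0])
    assert n <= m, "sketch assumes no more rows than columns"
    inf = float("inf")
    # Maximising similarity is the same as minimising negated similarity.
    cost = [[-x for x in row] for row in sim]
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path just found.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)
```

For the matrix `[[0.9, 0.8], [0.8, 0.1]]` a greedy choice of the largest entry first would pair user 0 with user 0 and total 1.0, whereas the optimal one-to-one matching pairs 0 with 1 and 1 with 0 for a total of 1.6. This is the reason a global matching step is used rather than a per-row maximum.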

Claims (6)

1. A heterogeneous social network user entity anchor link identification method is characterized by comprising the following steps:
(1) given two heterogeneous social networks $G_1$ and $G_2$, with the users of each network denoted separately;
(2) calculating user similarity based on user attributes in the two heterogeneous social networks;
(3) calculating user similarity based on user relationship in two heterogeneous social networks;
(4) ranking the close friends $\{f_i\}$ of each user $u$ by similarity and selecting the top $K$ pairs $u f_i$;
(5) comparing the attributes of the close friends $f_i$ of user $u$ in heterogeneous social network $G_1$ with the attributes of the close friends $f_i'$ of user $u'$ in heterogeneous social network $G_2$ to obtain $K^2$ similarity results, which are stored in a two-dimensional matrix score whose rows and columns are indexed by $f_i$ and $f_i'$; applying the KM algorithm to obtain the maximum-weight matching, and storing the result in a two-dimensional matrix score';
(6) judging all values of the matrix score', namely the similarity of close friends, and selecting the close friend value with the maximum similarity as the user similarity based on the user relationship;
(7) integrating the user attribute and the user relationship;
(8) initializing a two-dimensional matrix $S_u$ whose rows and columns represent the users of the two social networks, where $n_1$ and $n_2$ are the numbers of users in the two heterogeneous social networks $G_1$ and $G_2$ respectively; initializing the adjustment factors α, β, γ, θ and μ all to 0.2; calculating the user similarity values based on user attributes and user relationships; obtaining the best matching with the KM algorithm; and storing the best-matching result, namely the heterogeneous social network user entity anchor links, in a two-dimensional matrix $S'_u$.
2. The method for identifying the anchor links of the user entities in the heterogeneous social networks according to claim 1, wherein the giving of two heterogeneous social networks $G_1$ and $G_2$ and the denoting of their users comprises:
a heterogeneous social network is $G = (V, E)$, where the node set $V$ contains several kinds of information nodes, $V = \{V_{num} \mid num \in Z^+\}$, and $num$ denotes the kind of node; when $num = 1$ the node represents a user $u$, with $u \in V_1$; the links $E$ between the nodes likewise comprise several types, where $num_1$ and $num_2$ denote the kinds of the nodes;
user attribute similarity judgment:
user relationship similarity judgment:
where $attr = \{n, a, l, c\}$, and $n$, $a$, $l$, $c$ denote the four attributes of a user: user name, user attribution, user real-time shared location and user published content; each value in $P$ is the value obtained by the user attribute similarity judgment for a pair of users, one from each network, and each value in $Q$ is the value obtained by the user relationship similarity judgment for such a pair.
3. The method for identifying the anchor links of the user entities in the heterogeneous social networks according to claim 1, wherein the calculating the user similarity based on the user attributes in the two heterogeneous social networks comprises:
(3.1) calculating user similarity based on user names in two heterogeneous social networks
Each user name is converted into a list of words, and the two lists are compared with the Jaccard similarity algorithm:
where the two operands are the user name of a user in $G_1$ and the user name of a user in $G_2$, and the result is the entry of matrix $P_n$ for the two users, i.e. their similarity value under the similarity judgment method based on the user name;
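The user name comparison in step (3.1) can be sketched as below. The claim only says that the user name is converted into a word list and compared with Jaccard similarity; the tokenisation rule (lower-casing and splitting on punctuation and underscores) and the function names are assumptions made for the illustration.

```python
import re


def tokenize_username(name):
    """Split a user name into a set of lower-case tokens (assumed rule)."""
    return {tok for tok in re.split(r"[\W_]+", name.lower()) if tok}


def jaccard_username_similarity(name1, name2):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of the two token sets."""
    a, b = tokenize_username(name1), tokenize_username(name2)
    if not a and not b:
        return 0.0  # no usable tokens on either side
    return len(a & b) / len(a | b)
```

For example, `"wang_wei_2019"` and `"Wang.Wei"` share the tokens `wang` and `wei` out of three distinct tokens, giving a similarity of 2/3. Each such value would fill one entry of $P_n$.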
(3.2) calculating user similarity based on user attribution in two heterogeneous social networks
Given a region $S = [\text{country}, \text{province}, \text{city}]$, any missing item of $S$ is replaced with 0. In the two social networks $G_1$ and $G_2$ the regions are denoted $S_1 = [\text{country}_1, \text{province}_1, \text{city}_1]$ and $S_2 = [\text{country}_2, \text{province}_2, \text{city}_2]$. For any two user regions, the three location items differ in how strongly they identify a user: the country identifies least and the city identifies most. In actual matching, matching starts from the item that identifies least. When a location item of a user region is missing, it is completed from an item that identifies more strongly, that is, the province corresponding to the city, or the country corresponding to the province, is looked up and filled in. The location items of a user region are weighted 1:1:1; after the missing items are completed, the weights are accumulated according to whether each pair of corresponding items is the same, and if the cities of the two regions are the same, the similarity of the users by region is taken to be 1;
the actual similarity comparison result is the entry of matrix $P_a$ for the two users, i.e. their similarity value under the similarity judgment method based on the user attribution;
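One possible reading of the region rule in step (3.2) is sketched below. The lookup tables `CITY_TO_PROVINCE` and `PROVINCE_TO_COUNTRY` are hypothetical stand-ins for whatever gazetteer is actually used, and counting each matching item as 1/3 is an interpretation of the 1:1:1 weighting, not something the claim spells out.

```python
# Hypothetical lookup tables for completing missing items.
CITY_TO_PROVINCE = {"Harbin": "Heilongjiang", "Nanjing": "Jiangsu"}
PROVINCE_TO_COUNTRY = {"Heilongjiang": "China", "Jiangsu": "China"}


def complete_region(region):
    """Fill a missing province from the city and a missing country from the province."""
    country, province, city = region
    if province == 0 and city != 0:
        province = CITY_TO_PROVINCE.get(city, 0)
    if country == 0 and province != 0:
        country = PROVINCE_TO_COUNTRY.get(province, 0)
    return [country, province, city]


def region_similarity(s1, s2):
    """Similarity of two [country, province, city] regions; 0 marks a missing item."""
    r1, r2 = complete_region(s1), complete_region(s2)
    # Same city: the claim treats the two regions as fully similar.
    if r1[2] != 0 and r1[2] == r2[2]:
        return 1.0
    # Otherwise accumulate one third per matching, non-missing item.
    matches = sum(1 for x, y in zip(r1, r2) if x != 0 and x == y)
    return matches / 3
```

With these tables, `[0, 0, "Harbin"]` and `["China", "Heilongjiang", "Harbin"]` score 1.0, while `["China", 0, "Harbin"]` and `["China", "Jiangsu", "Nanjing"]` agree only on the country and score 1/3.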
(3.3) calculating the similarity of the users based on the real-time sharing positions of the users in the two heterogeneous social networks
Time intervals are set by time of day: every two hours forms one time period, so that a day is divided into 12 interval sets $T_i$. The number of locations the user visits in each time set is counted, the frequency of visited locations in each time set is calculated, all the frequency values are expressed as a vector, and finally the vector similarity of any two users in the two social networks is compared to obtain the similarity of the users' check-in locations;
user $u$ has visited $count_i$ locations in the $i$-th time set, denoted $(u, i, count_i)$;
the locations visited by user $u$ in one day are represented by these triples over the 12 time sets, where $\sum_{i=1}^{12} count_i$ is the total number of locations the user visits in a day;
the frequency with which the user visits locations in each time set is:
$W_i = \dfrac{count_i}{\sum_{j=1}^{12} count_j}$
after the frequency of visited locations has been calculated for each time period, the frequencies over all time periods are collected in a vector $W$:
$W = \{W_i \mid 1 \le i \le 12\}$
for the location-visit frequency vectors of a pair of users, one from each social network, the similarity of the two vectors is calculated; the result is the similarity based on the users' real-time shared location and is the entry of matrix $P_l$ for the two users, i.e. their similarity value under the similarity judgment method based on the real-time shared location;
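The time-slot frequency vector of step (3.3) can be sketched as follows. The input format (a list of check-in hours, one per visited location) and the use of cosine similarity to compare the two vectors are assumptions: the claim only says that the vector similarity is compared, and cosine is borrowed here from step (3.4).

```python
import math


def visit_frequency_vector(checkin_hours):
    """Build the 12-slot vector W from check-in hours (0-23), one slot per two hours."""
    counts = [0] * 12
    for hour in checkin_hours:
        counts[(hour % 24) // 2] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def cosine_similarity(w1, w2):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zero."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```

A user who checks in at 08:00, 09:00, 13:00 and 20:00 has two visits in slot 4 and one each in slots 6 and 10, so the vector holds 0.5, 0.25 and 0.25 in those positions and zeros elsewhere.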
(3.4) calculating the similarity of the users based on the published contents of the users in the two heterogeneous social networks
The TF-IDF algorithm is used to measure how important a word is to a piece of content:
$\mathrm{tfidf}(\alpha) = tf_{\alpha,k} \times idf(\alpha)$
where $tf_{\alpha,k}$ denotes the term frequency and $idf(\alpha)$ denotes the inverse document frequency;
in $G_1$ and $G_2$, the TF-IDF algorithm is first applied to the content published by each of two users, one from each network, to extract keywords; all the extracted vocabulary is combined into one set; the frequency of each word of the set in each user's published content is calculated, giving one term-frequency vector per user; finally the cosine similarity of the two vectors is calculated, which is the user similarity obtained from the published content. This result is the entry of matrix $P_c$ for the two users, i.e. their similarity value under the similarity judgment method based on the user published content.
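The content comparison of step (3.4) can be sketched as below, using only the standard library. Several details are assumptions of the sketch rather than statements of the patent: posts arrive already tokenised, the document collection used for the IDF is the two users' posts taken together, the IDF is the smoothed variant `log((1 + N) / (1 + df)) + 1`, and the number of keywords `k` is a free parameter.

```python
import math
from collections import Counter


def top_keywords(user_posts, all_posts, k):
    """Return the k words of a user's posts with the highest TF-IDF score."""
    n_docs = len(all_posts)
    doc_freq = Counter(w for post in all_posts for w in set(post))
    term_freq = Counter(w for post in user_posts for w in post)
    total = sum(term_freq.values())
    scores = {
        w: (c / total) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1)
        for w, c in term_freq.items()
    }
    # Ties are broken alphabetically so the result is deterministic.
    return set(sorted(scores, key=lambda w: (-scores[w], w))[:k])


def content_similarity(posts1, posts2, k=5):
    """Cosine similarity of term-frequency vectors over the merged keyword set."""
    all_posts = posts1 + posts2
    vocab = sorted(top_keywords(posts1, all_posts, k) | top_keywords(posts2, all_posts, k))
    c1 = Counter(w for post in posts1 for w in post)
    c2 = Counter(w for post in posts2 for w in post)
    v1 = [c1[w] for w in vocab]
    v2 = [c2[w] for w in vocab]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0
```

Two users whose posts contain exactly the same words score close to 1, and two users with no word in common score 0; each value would fill one entry of $P_c$.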
4. The method for identifying the anchor links of the user entities in the heterogeneous social networks according to claim 1, wherein the calculating the user similarity based on the user relationship in the two heterogeneous social networks comprises:
for each user $u$ in $G_1$ and $G_2$, the friend set $f = \{f_i \mid i \in [0, n]\}$ is collected; the one-way interactive behaviors comprise the user behaviors comment, reply, like and forward, and are denoted single_behavior; the joint participation behaviors comprise users commenting on and forwarding a topic, and are denoted double_behavior; for each user $u$ in $G_1$ and $G_2$, every friend $f_i$ in the friend set $f$ of $u$ is scored, with an initial score of 0: if $u$ performs any single_behavior towards $f_i$, or $f_i$ performs any single_behavior towards $u$, the corresponding score is increased by 1; if $u$ and $f_i$ both perform any double_behavior on the same topic, the corresponding score is increased by 1.
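The close-friend scoring of this claim can be sketched as follows. The event formats, the behavior labels and the choice to add 1 per event (rather than at most once per friend) are assumptions made for the illustration; the claim itself only fixes the two behavior categories and the initial score of 0.

```python
SINGLE_BEHAVIOR = {"comment", "reply", "like", "forward"}
DOUBLE_BEHAVIOR = {"comment_topic", "forward_topic"}  # hypothetical labels


def score_friends(u, friends, single_events, topic_events):
    """Score each friend of u.

    single_events: iterable of (actor, target, behavior) tuples.
    topic_events:  iterable of (user, topic, behavior) tuples.
    """
    scores = {f: 0 for f in friends}
    for actor, target, behavior in single_events:
        if behavior not in SINGLE_BEHAVIOR:
            continue
        if actor == u and target in scores:
            scores[target] += 1   # u acted on the friend
        elif target == u and actor in scores:
            scores[actor] += 1    # the friend acted on u
    topics_by_user = {}
    for user, topic, behavior in topic_events:
        if behavior in DOUBLE_BEHAVIOR:
            topics_by_user.setdefault(user, set()).add(topic)
    for f in friends:
        shared = topics_by_user.get(u, set()) & topics_by_user.get(f, set())
        scores[f] += len(shared)  # one point per topic both took part in
    return scores


def top_k_close_friends(scores, k):
    """Friends with the k highest scores, ties broken by name."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]
```

The top K friends returned by `top_k_close_friends` correspond to the close friends whose attributes are compared across the two networks in steps (4) and (5) of claim 1.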
5. The method for identifying the anchor links of the user entities in the heterogeneous social network according to claim 1, wherein the determining all values of the matrix score', that is, the similarity of close friends, and selecting the close friend value with the highest similarity as the user similarity based on the user relationship comprises:
the final user relationship similarity is the entry of matrix $Q$ for the two users, i.e. their similarity value under the similarity judgment method based on the user relationship.
6. The method for identifying anchor links of user entities in heterogeneous social networks according to claim 1, wherein the integrating the user attributes and the user relationships comprises:
the heterogeneous social network user entity anchor link identification algorithm USDU is as follows:
$S_u = \alpha P_n + \beta P_a + \gamma P_l + \theta P_c + \mu Q$
where $S_u$ is a two-dimensional matrix whose rows and columns represent the users of the two social networks; 0 means that the two users have no user anchor link and 1 means that they have a user anchor link; α, β, γ, θ and μ are scaling factors, and α + β + γ + θ + μ = 1.
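The weighted combination above can be sketched in a few lines. The matrices are plain nested lists here, and the function name and the default weights of 0.2 each (the initial values given in claim 1) are choices made for the illustration.

```python
def combine_similarity(p_n, p_a, p_l, p_c, q, weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Return S_u = alpha*P_n + beta*P_a + gamma*P_l + theta*P_c + mu*Q."""
    assert abs(sum(weights) - 1.0) < 1e-9, "scaling factors must sum to 1"
    matrices = (p_n, p_a, p_l, p_c, q)
    rows, cols = len(p_n), len(p_n[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(cols)]
        for i in range(rows)
    ]
```

Because every input matrix holds values between 0 and 1 and the weights sum to 1, each entry of the combined matrix also lies between 0 and 1, which keeps it comparable across user pairs before the one-to-one matching is applied.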
CN201910194845.1A 2019-03-14 2019-03-14 Heterogeneous social network user entity anchor link identification method Active CN109949174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910194845.1A CN109949174B (en) 2019-03-14 2019-03-14 Heterogeneous social network user entity anchor link identification method

Publications (2)

Publication Number Publication Date
CN109949174A true CN109949174A (en) 2019-06-28
CN109949174B CN109949174B (en) 2023-06-09

Family

ID=67009854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910194845.1A Active CN109949174B (en) 2019-03-14 2019-03-14 Heterogeneous social network user entity anchor link identification method

Country Status (1)

Country Link
CN (1) CN109949174B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145527A (en) * 2017-04-14 2017-09-08 东南大学 Link prediction method based on first path in alignment isomery social networks
CN107480714A (en) * 2017-08-09 2017-12-15 东北大学 Across social network user recognition methods based on full visual angle characteristic
US20180341696A1 (en) * 2017-05-27 2018-11-29 Hefei University Of Technology Method and system for detecting overlapping communities based on similarity between nodes in social network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
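WENLONG ZHU et al.: "Location-Aware Influence Blocking Maximization in Social Networks", 《IEEE ACCESS》 *
LUO LIANG et al.: "Research on entity user association technology across social networks" (跨社交网络的实体用户关联技术研究), 《信息网络安全》 (Netinfo Security) *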

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442758A (en) * 2019-07-23 2019-11-12 腾讯科技(深圳)有限公司 A kind of figure alignment schemes, device and storage medium
CN110442758B (en) * 2019-07-23 2022-05-06 腾讯科技(深圳)有限公司 Graph alignment method, device and storage medium
CN110413900A (en) * 2019-08-01 2019-11-05 电子科技大学 More social networks account matching process based on viterbi algorithm
CN110737651A (en) * 2019-09-29 2020-01-31 武汉海昌信息技术有限公司 reducible desensitization data cleaning and exchanging method
CN111475738A (en) * 2020-05-22 2020-07-31 哈尔滨工程大学 Heterogeneous social network location anchor link identification method based on meta-path
CN111475739A (en) * 2020-05-22 2020-07-31 哈尔滨工程大学 Heterogeneous social network user anchor link identification method based on meta-path
CN111475738B (en) * 2020-05-22 2022-05-17 哈尔滨工程大学 Heterogeneous social network location anchor link identification method based on meta-path
CN111475739B (en) * 2020-05-22 2022-07-29 哈尔滨工程大学 Heterogeneous social network user anchor link identification method based on meta-path

Also Published As

Publication number Publication date
CN109949174B (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant