CN107784124B - LBSN (location based service) hyper-network link prediction method based on space-time relationship - Google Patents


Publication number
CN107784124B
Authority
CN
China
Prior art keywords
user
super
edge
network
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711182961.9A
Other languages
Chinese (zh)
Other versions
CN107784124A (en)
Inventor
胡敏
陈元会
黄宏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201711182961.9A
Publication of CN107784124A
Application granted
Publication of CN107784124B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Abstract

The invention relates to an LBSN (location-based social network) hyper-network link prediction method based on space-time relationships, belonging to the field of data mining. The method comprises the following steps: S1: acquiring a data source; S2: constructing a hyper-network model; S3: defining and quantifying the hyper-network edge weights; S4: based on the model, first constructing multiple types of weighted super-edge structures, mining different semantic relationships among users through the different structures, and finally training the model parameters by gradient descent so as to predict the link relationships in the network. Through the weighted super-edge structures, the method can effectively mine multiple kinds of association among nodes, alleviate the sparsity problem in the network, improve the noise immunity and stability of the model, and substantially improve prediction accuracy.

Description

LBSN (location-based social network) hyper-network link prediction method based on space-time relationship
Technical Field
The invention belongs to the field of data mining, and relates to an LBSN (location-based social network) hyper-network link prediction method based on space-time relationships.
Background
With the continuous development of computer information technology and the rapid spread of the Internet, online social platforms have become an indispensable part of people's lives. Through such platforms people can build their own friendship networks and communicate and interact with friends instantly, which greatly facilitates daily life. In recent years in particular, the emergence of Location-Based Social Networks (LBSNs) has allowed some location services to win over a large number of users in a short time and achieve great success. In an LBSN, a user can check in at the places he or she has visited and share those check-ins with friends. Because check-in behavior truly reflects the user's location activity, it establishes a close connection between the online virtual world and the offline real world, and brings new opportunities and challenges to social network link prediction.
At present, link prediction methods fall mainly into two groups: similarity-based methods and learning-based methods. The main idea of the similarity-based methods is to calculate a similarity score for every pair of nodes that is not yet linked; the higher the score, the more likely a link will form between the two nodes. Representative methods include the common neighbors index (CN), the preferential attachment index (PA), the Adamic/Adar index (AA), the Jaccard coefficient, Katz, Rooted PageRank, etc. The learning-based methods convert the link prediction problem into a binary classification problem: key features that influence link formation are identified, feature vectors are built from these features, and a model is trained on them to make the prediction.
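As a point of reference for the similarity-based family, the following is a minimal Python sketch of four of the neighbor-based indices named above (CN, Jaccard, AA, PA) in their commonly used textbook forms. The function name `similarity_scores` and the toy graph are illustrative assumptions and do not come from the patent.

```python
import math

def similarity_scores(adj, u, v):
    """Return classic neighbor-based link-prediction scores for nodes u and v.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "CN": len(common),                                   # common neighbors
        "Jaccard": len(common) / len(union) if union else 0.0,
        # Adamic/Adar: common neighbors with few links of their own count more.
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        "PA": len(nu) * len(nv),                             # preferential attachment
    }

# Hypothetical toy friendship graph.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}
scores = similarity_scores(adj, "a", "b")
```

A higher score for an unlinked pair is read as a higher likelihood that a link will form between them.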
A heterogeneous social network is a network that contains multiple types of nodes and edges. A location-based social network is a heterogeneous social network, since it has user nodes, location nodes, category nodes, user-location edges, user-user edges, and so on. At present, most link prediction research focuses on homogeneous networks, that is, networks with only one type of node (user nodes) and one type of edge (user-user edges); research on heterogeneous networks is relatively scarce, and most link prediction methods designed for homogeneous networks are not applicable to them. For link prediction in heterogeneous networks, some researchers start from time information, considering the time at which links are established or the correlation between two users in their space-time relationship; these studies show that taking the time factor into account can effectively improve prediction accuracy. Other research is based on meta-paths. A meta-path is a path connecting nodes of different types that carries certain semantic information; for example, user-comedy-user indicates that both users like comedy. The correlation between nodes is then calculated over the different semantic paths by random walks and similar techniques.
These methods alleviate the data sparsity problem; however, most of this research targets unweighted networks only, and for weighted networks the weight characteristics of the network still need to be considered specifically. An existing method predicts links by constructing a weighted hyper-triangle structure on a weighted hyper-network model. However, the existing weighted hyper-triangle structure can only capture the influence of neighboring nodes on link formation; integrating other, richer weighted super-edge structures would relieve the data sparsity problem and further improve prediction accuracy. In addition, existing hyper-network-based methods cannot make use of time information, so their accuracy still leaves considerable room for improvement.
Disclosure of Invention
In view of this, the present invention provides an LBSN hyper-network link prediction method based on space-time relationships. To address the heterogeneity of heterogeneous social networks and the space-time correlation between users, it proposes a four-layer "space-time-user-location-category" weighted hyper-network model that effectively incorporates time information into the hyper-network model. The edge weights of the weighted hyper-network are corrected by considering implicit behavior among users, potential associations between locations, and user preference, which improves the interpretability of the model. Finally, super-edges and super-edge structures are defined on the corrected weighted hyper-network model, and the associations between users are mined through them.
In order to achieve the purpose, the invention provides the following technical scheme:
An LBSN hyper-network link prediction method based on space-time relationships comprises the following steps:
S1: acquiring a data source; data of high accuracy and reliability are acquired from an existing large social network platform; the acquired data comprise the friend relationships among users, the users' comments on and ratings of locations, the comment times, the longitude and latitude of the locations, and the categories of the locations;
S2: constructing a hyper-network model; a space-time sub-network, a social sub-network, a location sub-network and a category sub-network are constructed, so that the model is divided into a space-time layer, a user layer, a location layer and a category layer; the space-time sub-network is constructed from the users' check-in times at locations and is used for mining the space-time similarity between users;
S3: defining and quantifying the hyper-network edge weights; the edge weights in the hyper-network model are defined in four different ways: by user influence, by implicit association, by user preference, and by node degree information;
S4: through S1 to S3 a weighted hyper-network model is constructed; based on this model, multiple types of weighted super-edge structures are first constructed, different semantic relationships among users are mined through the different structures, and finally the model parameters are trained by gradient descent so that the link relationships in the network are predicted.
Further, the step S2 specifically includes:
extracting a friend relationship list of a user, a check-in relationship list of the user and the category information of the position through the original data information;
S21: extracting space-time nodes from the users' check-in times; if two or more users visit the same location within the same time period, that location is defined as a space-time node; a space-time node reflects the interest preference of users at a specific location at a specific time;
S22: constructing the four-layer space-time-user-location-category hyper-network model, which is divided into a space-time subnet, a social subnet, a location subnet and a category subnet; the association among the four subnets is as follows: users visit points of interest of certain categories according to their own interest preferences and check in at, comment on and rate them, and users who show a particular interest preference within a specific time are associated through the same space-time node; this completes the construction of the four-layer sub-network for the location-based social network.
Further, the step S3 specifically includes:
S31: defining and quantifying the user-user edge weight through user influence; in a location-based social network, each user has a different influence; the user influence is divided into the individual influence of a user and the influence between users, which are measured through the following network and the following behavior respectively;
definition of the following behavior: if user v checks in at a place where friend u has checked in, user v is considered to have produced a following behavior toward user u, and a directed edge from v to u is generated accordingly;
definition of the following network $G_f=(V_f,E_f)$: $G_f$ denotes the directed network formed by following behaviors, $V_f$ denotes the users in the following network, and $E_f$ denotes the directed edges produced by following behaviors;
S311: individual influence of the user $I_u$: measures the influence that a user's own behavior has on the other users in the network; the influence of users in different time periods is taken into account by dividing time into slices, and the following behaviors of the users within each time slice form a corresponding following network
Figure GDA0003149903950000031
where S time slices are divided and $t_s$ is the s-th time slice; the final individual influence of a user is contributed by the individual influence in every time slice, and the further a time slice lies from the current time, the more its individual influence is attenuated;
considering existence of isolated nodes in the network, the user individual influence is solved by adopting a LeaderRank algorithm, and an iterative formula is as follows:
Figure GDA0003149903950000032
where $N_u$ denotes the neighbor nodes of user u, and
Figure GDA0003149903950000033
denotes the out-degree of user v; in the steady state, LeaderRank distributes the score of the ground node evenly among all other nodes, so the final score of a node is expressed as:
$I_u = I_u(t_d) + I_g(t_d)/N$
where $I_g(t_d)$ is the score of the ground node in the steady state, and N is the total number of users;
since the influence of the user decreases with time, the decay function is defined as:
$W_u(t_i) = \exp(-\ln 2 \times (t_c - t_i)/t_m)$
where $t_c$ denotes the current time, $t_i$ denotes the i-th time slice, and $t_m$ denotes the half-life of the decay in influence;
the total individual influence $I_u$ of user u at the current time is:
Figure GDA0003149903950000034
where $I_u(t_i)$ denotes the individual influence of user u in time slice $t_i$;
S312: influence between users: the influence between users $I_i(u,v)$ measures the influence of user u on user v; the following behavior is regarded as the interaction between users and is used to measure the influence between them;
two metrics are proposed, the following location ratio $I_p$ and the following check-in ratio $I_c$:
Figure GDA0003149903950000041
Figure GDA0003149903950000042
where $M(v,u)$ denotes the number of check-in places at which user v follows user u, $Position_u$ denotes the total number of check-in locations of user u, $K(v,u)$ denotes the total number of check-ins in which user v follows user u, and $Checkin_u$ denotes the total number of check-ins of user u;
the user influence I (u, v) is:
Figure GDA0003149903950000043
based on the user influence, quantizing the user-user edge weight, and for the node pair u and v, if the user influence of u on v is high, the corresponding edge weight should be high, and the edge weight between the user and the user is quantized as:
Figure GDA0003149903950000044
where $w$ with $(u, w) \in S$ denotes a neighbor node of user u in the social subnet $S$, and $I(u,v)$ denotes the influence between a user and its neighbor node in the social subnet;
S32: defining and quantifying the position-position edge weight and the category-category edge weight through implicit associations;
the edge weight between positions and the edge weight between categories are defined as:
Figure GDA0003149903950000045
where $geodist(p,p')$ denotes the distance between positions p and p', $Max|W_p|$ is the maximum number of times two positions are associated, $w(p,p')$ is the number of times positions p and p' are associated by users, and
Figure GDA0003149903950000046
is a correlation time threshold;
Figure GDA0003149903950000047
where $|P(c,c')|$ denotes the number of locations that belong to both category c and category c', and $Max|P_c|$ denotes the maximum number of locations that belong to category c and some other category at the same time;
S33: defining and quantifying the user-position edge weight through user preference; in a location-based social network, the rating a user gives a position directly reflects how much the user prefers it; positions the user prefers more should receive higher weights, so the user-position edge weight is corrected with an exponential function:
Figure GDA0003149903950000051
wherein r (u, p) is the score of user u at location p;
s34: the remaining edge weights are defined and quantized by node out-degree.
Further, the step S4 specifically includes:
s41: defining a super edge and super edge weight;
three types of super edges are defined:
class I super-edge $SE_I$: a super-edge that contains only one type of node; it is a special kind of super-edge in the hyper-network;
class II super-edge $SE_{II}$: an edge formed by a node pair between two adjacent subnet layers, characterized by containing only two kinds of heterogeneous nodes;
class III super-edge $SE_{III}$: an edge formed by nodes of three adjacent subnet layers, characterized by containing only three kinds of heterogeneous nodes;
the super-edge weight is the weight of each super-edge, and is calculated from the edge weights contained in the super-edge;
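The three classes above differ in how many kinds of heterogeneous nodes a super-edge contains. A minimal sketch of that classification rule is given below; the function name `super_edge_class` and the layer labels are hypothetical, and the sketch deliberately does not check whether the layers involved are adjacent.

```python
def super_edge_class(node_types):
    """Return 1, 2 or 3 (class I/II/III) from the number of distinct node types.

    node_types: iterable of layer labels, one per node in the super-edge.
    Returns None when the count falls outside the three defined classes.
    Assumption: layer adjacency is not verified here.
    """
    kinds = len(set(node_types))
    return kinds if 1 <= kinds <= 3 else None

# Hypothetical layer labels for the four-layer model.
class_one = super_edge_class(["user", "user"])
class_two = super_edge_class(["user", "location"])
class_three = super_edge_class(["spatiotemporal", "user", "location"])
undefined = super_edge_class(["spatiotemporal", "user", "location", "category"])
```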
s42: hyperlink prediction: based on the defined three types of super edges, a weighted super edge structure is provided, and the hyperlink prediction problem between users is solved through the weighted super edge structure; mining implicit semantic relations among nodes by constructing various types of super-edge structures;
S421: weighted super-triangular structures, comprising the single weighted super-triangular structure, the double weighted super-triangular structure and the large weighted super-triangular structure;
single weighted super-triangular structure: the similarity between user nodes is calculated through a single weighted super-triangular structure formed by a space-time node and user nodes, expressing that two users like to be active at the same time and at the same location; all the super-edge structures defined here are closed-loop structures and are directed;
double weighted super-triangular structure: a structure comprising two consecutive weighted super-triangular structures;
large weighted super-triangular structure: a triangular structure composed of two class III super-edges;
S422: weighted hyper-rectangular structure: expresses that user nodes like to be active at two related space-time nodes; its weight is the product of the corresponding super-edge weights;
S423: weighted super-hybrid structures, comprising the weighted super-hybrid I structure and the weighted super-hybrid II structure, defined respectively as:
weighted super-hybrid I structure: formed by adding a class I super-edge to a single triangular structure;
weighted super-hybrid II structure: formed by adding a class I super-edge to a rectangular structure;
the deeper the hierarchy is, the longer the associated link is, and the richer the super-edge structure is;
the weighted super-edge structure comprises: a weighted hyper-triangular structure, a weighted hyper-rectangular structure, and a weighted hyper-hybrid structure; different structures have different degrees of influence on the link prediction, so their similarity is expressed as:
$S(u,v)=\theta_1 WS_1(u,v)+\theta_2 WS_2(u,v)+\dots+\theta_{19} WS_{19}(u,v)$
where $\theta_i$ is the weight of the i-th weighted super-edge structure, obtained by training with gradient descent; the parameter update is:
Figure GDA0003149903950000061
where $\theta_{i\text{-}old}$ denotes the weight before an iteration of training, $\theta_{i\text{-}new}$ denotes the weight after it, $\lambda$ denotes the learning step size, and y indicates whether a link exists between the users; when the change in every parameter is smaller than a given threshold, the parameter update has converged and the optimal parameter set $\theta^+$ is obtained; finally, the optimal parameter set $\theta^+$ is used to predict the link relationships among users: when y is 1 the link between the users is considered to exist, otherwise it is considered not to exist; the defining formulas are:
Figure GDA0003149903950000062
Figure GDA0003149903950000063
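The training loop described here (a weighted sum of structure similarities, step size λ, stopping when every parameter changes by less than a threshold) can be sketched in Python as follows. The update and prediction formulas themselves are not legible in this text, so the sketch assumes a sigmoid applied to S(u,v), the logistic loss, and a 0.5 decision threshold; these choices, the constant bias feature, and the names `train_theta` and `predict_link` are illustrative assumptions rather than the patented formulas.

```python
import math

def train_theta(features, labels, step=0.5, tol=1e-6, max_iter=5000):
    """Fit theta for S(u,v) = sum_i theta_i * WS_i(u,v) by batch gradient descent.

    features: per-pair lists [WS_1, ..., WS_k]; labels: 1 if a link exists, else 0.
    Assumption: S is passed through a sigmoid and trained with the logistic loss.
    """
    k = len(features[0])
    theta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        for x, y in zip(features, labels):
            s = sum(t * xi for t, xi in zip(theta, x))
            p = 1.0 / (1.0 + math.exp(-s))
            for i in range(k):
                grad[i] += (p - y) * x[i] / len(features)
        new_theta = [t - step * g for t, g in zip(theta, grad)]
        # Stop once every parameter changes by less than the threshold.
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta

def predict_link(theta, x):
    """Return 1 (link predicted) when the assumed sigmoid score is at least 0.5."""
    s = sum(t * xi for t, xi in zip(theta, x))
    return 1 if 1.0 / (1.0 + math.exp(-s)) >= 0.5 else 0

# Hypothetical data: a constant bias feature followed by two structure similarities.
X = [[1.0, 0.9, 0.8], [1.0, 0.7, 0.9], [1.0, 0.1, 0.2], [1.0, 0.2, 0.1]]
y = [1, 1, 0, 0]
theta = train_theta(X, y)
predictions = [predict_link(theta, x) for x in X]
```

In the method itself there would be nineteen structure similarities per user pair instead of the two used in this toy data.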
the invention has the beneficial effects that: according to the method, various incidence relations among nodes can be effectively mined through the weighted super-edge structure, the problem of sparsity in a network can be solved, the noise immunity and stability of a model can be improved, and the prediction accuracy is greatly improved.
Drawings
In order to make the object, technical scheme and beneficial effect of the invention more clear, the invention provides the following drawings for explanation:
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a hyper-network model based on spatio-temporal relationships in LBSN;
FIG. 3 is a two-layer hyper-network model.
FIG. 4 is a three-tier hyper-network model.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in FIG. 1 and FIG. 2, the LBSN hyper-network link prediction method based on space-time relationships comprises four modules: data acquisition, hyper-network model construction, definition and quantification of the network edge weights, and link prediction.
The implementation of the present invention is described in detail below.
S1: acquiring a data source. The data used are Yelp data, an open data set published by the Yelp website. The acquired data mainly comprise the friend relationships among users, the users' comments on and ratings of shops, the longitude and latitude of the shops, the categories they belong to, and so on.
S2: constructing a hyper-network model. In a location-based social network, link formation is influenced by many factors, such as time, location and social factors. The invention integrates these factors by constructing a hyper-network model and applies the model to link prediction. The model is divided into four layers: a space-time layer, a user layer, a location layer and a category layer. The specific scheme is as follows:
S21: constructing spatio-temporal nodes. If two or more users visit the same location within a particular time period, that location is defined as a spatio-temporal node. A spatio-temporal node thus reflects the interest preference of users at a particular location at a particular time, which reflects the similarity between users better than the mere fact that two users have both visited a location.
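The extraction of spatio-temporal nodes can be sketched as a grouping of check-ins by location and time period. The text does not specify how a "time period" is delimited, so the fixed-width window used below, the function name `spatiotemporal_nodes` and the sample check-ins are assumptions for illustration.

```python
from collections import defaultdict

def spatiotemporal_nodes(checkins, window_seconds):
    """Group check-ins into (location, time-window) buckets shared by >= 2 users.

    checkins: iterable of (user, location, unix_timestamp).
    Returns {(location, window_index): set_of_users}.
    Assumption: a time period is a fixed-width window of window_seconds.
    """
    buckets = defaultdict(set)
    for user, location, ts in checkins:
        buckets[(location, ts // window_seconds)].add(user)
    # Keep only buckets visited by at least two distinct users.
    return {key: users for key, users in buckets.items() if len(users) >= 2}

checkins = [
    ("u1", "cafe", 100), ("u2", "cafe", 900),  # same one-hour window
    ("u3", "cafe", 4000),                      # later window, alone
    ("u1", "gym", 200),                        # only one visitor
]
nodes = spatiotemporal_nodes(checkins, window_seconds=3600)
```

Each returned key would become one node of the space-time subnet, linked to the users in its set.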
S22: and constructing a spatio-temporal-user-position-category four-layer hyper-network model. The method is mainly divided into a space-time subnet, a social subnet, a location subnet and a category subnet. The association relationship between the four layers of sub-networks can be summarized in that users can visit some interest points under some types according to own interest preference, check in, comment and score the interest points, and if the users have special interest preference in a specific time, the users can be associated with the same spatio-temporal node. So far, the construction of the four-layer sub-network under the social network based on the position is completed, and then the edge weight of the hyper-network is defined and quantified.
S3: defining and quantifying the hyper-network edge weights. The four-layer hyper-network model contains four types of nodes and ten types of edges, of which four lie within sub-networks and six lie between sub-networks. The invention defines the edge weights in the network by four different methods: based on user influence, based on implicit associations between locations, based on user preference, and based on node degree. The specific steps are as follows:
S31: defining the user-user weight through user influence. In a location-based social network, each user has a different influence. If a friend has very little influence on us, that friend is unlikely to lead us to take a given action or to come into contact with other people. Defining and quantifying the user-user edge weights through user influence is therefore one feasible way to improve model interpretability. The invention divides user influence into the individual influence of a user and the influence between users, and measures them through the following network and the following behavior respectively.
Definition of the following behavior. If user v checks in at a place where friend u has checked in, user v is considered to have produced a following behavior toward user u, and a directed edge from v to u is generated accordingly.
Definition of the following network $G_f=(V_f,E_f)$, where $G_f$ denotes the directed network formed by following behaviors, $V_f$ denotes the users in the following network, and $E_f$ denotes the directed edges produced by following behaviors.
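A minimal sketch of how the directed edges of the following network might be derived from raw check-ins is shown below. It assumes that "a place where friend u has checked in" means that u's first check-in there is strictly earlier than v's check-in; that temporal reading, the function name `build_following_edges` and the sample data are assumptions.

```python
def build_following_edges(friends, checkins):
    """Return directed edges (v, u): v checked in where friend u had checked in before.

    friends: {user: set of friends}; checkins: list of (user, location, timestamp).
    Assumption: u's first visit must be strictly earlier than v's check-in.
    """
    first_visit = {}
    for user, loc, ts in checkins:
        key = (user, loc)
        first_visit[key] = min(ts, first_visit.get(key, ts))
    edges = set()
    for v, loc, tv in checkins:
        for u in friends.get(v, ()):
            tu = first_visit.get((u, loc))
            if tu is not None and tu < tv:
                edges.add((v, u))
    return edges

friends = {"v": {"u"}, "u": {"v"}, "w": set()}
checkins = [("u", "cafe", 1), ("v", "cafe", 5), ("w", "cafe", 9), ("v", "gym", 2)]
edges = build_following_edges(friends, checkins)
```

User w also visits the cafe after u, but produces no edge because w and u are not friends.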
S311: individual influence of the user. The individual influence $I_u$ measures the influence that a user's own behavior has on the other users in the network; it is a measure from the global perspective. Individual influence changes dynamically over time: some users may be active at first, so that their check-in behavior generates many following edges and a large influence, and then become less active, so that their influence gradually falls to a stable value. To measure user influence accurately, the time factor should therefore be considered.
The invention considers the influence of users in different time periods by dividing the time slices, and forms the following behavior of the users in each time slice into a corresponding following network
Figure GDA0003149903950000081
where S time slices are divided; the user's final individual influence is contributed by the individual influence in each time slice, and the further a time slice lies from the current time, the more its individual influence is attenuated.
Considering that isolated nodes exist in the network, the method uses the LeaderRank algorithm to solve for the individual influence of users. By introducing a ground node, LeaderRank solves the problem in PageRank that isolated nodes make the ranking result non-unique; it converges quickly, is robust to noise, and is well suited to this method. The iterative formula of the algorithm is:
Figure GDA0003149903950000082
where
Figure GDA0003149903950000083
denotes the out-degree of user v. In the steady state, LeaderRank distributes the score of the ground node evenly among all other nodes, so the final score of a node can be expressed as:
$I_u = I_u(t_d) + I_g(t_d)/N$ (2)
where $I_g(t_d)$ is the score of the ground node in the steady state, and N is the total number of users.
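Since the iterative formula is not legible in this text, the sketch below follows the commonly published form of LeaderRank: a ground node is linked to and from every user, each node splits its score equally among its out-links at every step, and the ground node's steady-state score is finally shared equally among the N users, as in formula (2). The function name `leader_rank`, the tolerance and the toy graph are assumptions.

```python
def leader_rank(nodes, edges, tol=1e-10, max_iter=1000):
    """LeaderRank scores for a directed graph.

    edges are (v, u) pairs meaning "v follows u", so score flows from v to u.
    """
    ground = object()                   # ground node, linked to and from every user
    out = {n: {ground} for n in nodes}
    out[ground] = set(nodes)
    for v, u in edges:
        out[v].add(u)
    score = {n: 1.0 for n in nodes}     # users start with one unit each
    score[ground] = 0.0                 # the ground node starts empty
    for _ in range(max_iter):
        new = {n: 0.0 for n in out}
        for v, targets in out.items():
            share = score[v] / len(targets)   # split equally over out-links
            for u in targets:
                new[u] += share
        delta = max(abs(new[n] - score[n]) for n in out)
        score = new
        if delta < tol:
            break
    bonus = score[ground] / len(nodes)  # I_g(t_d) / N from formula (2)
    return {n: score[n] + bonus for n in nodes}

# Hypothetical following network: b and c both follow a.
ranks = leader_rank(["a", "b", "c"], [("b", "a"), ("c", "a")])
```

In this toy graph user a is followed by both other users and therefore receives the highest score.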
Since the influence of the user decreases with time, the decay function is defined as:
$W_u(t_i) = \exp(-\ln 2 \times (t_c - t_i)/t_m)$ (3)
where $t_c$ denotes the current time, $t_i$ denotes the i-th time slice, and $t_m$ denotes the half-life of the decay in influence.
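Formula (3) is a half-life decay: a time slice that lies exactly one half-life $t_m$ before the current time receives weight 0.5. The short Python sketch below evaluates it; the function name `decay_weight` and the sample values are illustrative assumptions.

```python
import math

def decay_weight(t_c, t_i, t_m):
    """W_u(t_i) = exp(-ln2 * (t_c - t_i) / t_m), as in formula (3)."""
    return math.exp(-math.log(2) * (t_c - t_i) / t_m)

# Hypothetical values: current time 10, half-life 2, slices at times 10, 8 and 6.
weights = [decay_weight(t_c=10, t_i=t, t_m=2) for t in (10, 8, 6)]
```

The three weights are approximately 1, 0.5 and 0.25, so each additional half-life halves the contribution of a time slice.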
The total individual influence $I_u$ of user u at the current time is:
Figure GDA0003149903950000084
where $I_u(t_i)$ denotes the individual influence of user u in time slice $t_i$.
S312: influence between users. The influence between users $I_i(u,v)$ measures the influence of user u on user v; it is a measure from the local perspective. In general, the more two users interact, the greater their influence on each other. The invention regards the following behavior as the interaction between users and measures the influence between users accordingly.
Two metrics are proposed, the following location ratio $I_p$ and the following check-in ratio $I_c$:
Figure GDA0003149903950000091
Figure GDA0003149903950000092
where $M(v,u)$ denotes the number of check-in places at which user v follows user u, $Position_u$ denotes the total number of check-in locations of user u, $K(v,u)$ denotes the total number of check-ins in which user v follows user u, and $Checkin_u$ denotes the total number of check-ins of user u.
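The two ratio formulas are not legible in this text. Reading the symbol definitions literally, one plausible form is $I_p = M(v,u)/Position_u$ and $I_c = K(v,u)/Checkin_u$; the sketch below computes the ratios under that assumption only. The function name `follow_ratios` and the sample check-ins are hypothetical.

```python
def follow_ratios(follow_checkins_v_to_u, checkins_u):
    """Return (I_p, I_c) under the assumed forms I_p = M/Position_u, I_c = K/Checkin_u.

    follow_checkins_v_to_u: locations of v's check-ins that followed u (one per check-in).
    checkins_u: locations of all of u's check-ins (one per check-in).
    """
    m = len(set(follow_checkins_v_to_u))   # M(v, u): distinct followed places
    k = len(follow_checkins_v_to_u)        # K(v, u): total follow check-ins
    positions_u = len(set(checkins_u))     # Position_u: distinct places of u
    total_u = len(checkins_u)              # Checkin_u: total check-ins of u
    if total_u == 0:
        return 0.0, 0.0
    return m / positions_u, k / total_u

i_p, i_c = follow_ratios(
    ["cafe", "cafe", "gym"],
    ["cafe", "gym", "park", "zoo", "cafe", "cafe"],
)
```

Here v followed u at 2 of u's 4 distinct places and in 3 check-ins against u's 6 check-ins, so both ratios come out as 0.5.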
From the above analysis, the user influence I (u, v) is:
Figure GDA0003149903950000093
Based on the user influence, the user-user edge weight can be quantified. For a node pair u and v, if the influence of u on v is high, the corresponding edge weight should be high, so the edge weight between users is quantified as:
Figure GDA0003149903950000094
where $w$ with $(u, w) \in S$ denotes a neighbor node of user u in the social subnet $S$, and $I(u,v)$ denotes the magnitude of the influence between a user and its neighbor node in the social subnet.
S32: defining and quantifying the position-position edge weight and the category-category edge weight through implicit associations. If a user visits two locations consecutively within a certain time threshold, an implicit association exists between the two locations. Similarly, if two categories appear together at multiple locations, an implicit association exists between the two categories. For example, data statistics show that the categories Festivals and Arts & Entertainment often appear together in the category attributes of locations, which implies a certain correlation between them. Based on these considerations, the edge weight between positions and the edge weight between categories are defined by the following formulas.
Figure GDA0003149903950000095
Where Max | WpI is the maximum of the number of times two locations are associated, w (p, p ') is the number of times locations p and p' are associated by the user,
Figure GDA0003149903950000096
the threshold value of the association times can be adjusted according to network characteristics and experimental performance.
Figure GDA0003149903950000101
where |P(c, c')| is the number of locations that belong to both category c and category c', and Max|P_c| is the maximum number of locations that belong to both category c and some other category.
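The category-category formula is not reproduced in this text. The sketch below assumes the simplest reading of the two symbols just defined, namely the co-occurrence count |P(c, c')| divided by Max|P_c|. The function name `category_edge_weights` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations

def category_edge_weights(location_categories):
    """location_categories: dict location -> set of category labels.

    Assumed form: W(c, c') = |P(c, c')| / Max|P_c|.
    """
    co_count = Counter()  # (c, c') -> number of locations carrying both
    for cats in location_categories.values():
        for c, c2 in permutations(sorted(cats), 2):
            co_count[(c, c2)] += 1
    max_per_cat = {}  # c -> largest co-occurrence count involving c
    for (c, _), n in co_count.items():
        max_per_cat[c] = max(max_per_cat.get(c, 0), n)
    return {pair: n / max_per_cat[pair[0]] for pair, n in co_count.items()}

locations = {
    "p1": {"Festivals", "Arts & Entertainment"},
    "p2": {"Festivals", "Arts & Entertainment", "Food"},
    "p3": {"Food", "Bars"},
}
weights = category_edge_weights(locations)
print(weights[("Festivals", "Arts & Entertainment")])
print(weights[("Festivals", "Food")])
```

Because the normalizer depends on the first category, the weight from c to c' need not equal the weight from c' to c.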
S33: user-location edge weights are defined and quantified by user preferences. In the social network based on the position, the scoring attribute of the position by the user can intuitively reflect the preference degree of the user to the position. For example, user u1At p1,p2,p3Three locations were scored and given a score value of 5, 3, 1, respectively, and if the user's scoring attribute for this location was not taken into account, then each user-location edge was assigned 1/3, but in practice this would be inaccurate, because if user u were to be present1To p3Is given a score of 1, indicating that the user is dissatisfied with the place, at which time u should be increased1-p1By decreasing u1-p3The edge weight of (2). It can be seen from this example that the user should be given a higher weight to the preferred high position, and in the present invention, the user-position edge weight is modified by an exponential function:
Figure GDA0003149903950000102
where r (u, p) is the score of user u at location p.
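The exact exponential correction is not reproduced in this text. One possible form, used here purely as an assumption, normalizes exp(r(u, p)) over all locations scored by the same user. The function name `user_location_weights` is hypothetical, and the example reuses the scores 5, 3, 1 from the paragraph above.

```python
import math

def user_location_weights(scores):
    """scores: dict location -> score r(u, p) given by one user.

    Assumed form: W(u, p) = exp(r(u, p)) / sum over p' of exp(r(u, p')).
    """
    if not scores:
        return {}
    top = max(scores.values())
    # Subtracting the maximum keeps exp() in a safe range without
    # changing the normalized result.
    exp_scores = {p: math.exp(r - top) for p, r in scores.items()}
    total = sum(exp_scores.values())
    return {p: e / total for p, e in exp_scores.items()}

weights = user_location_weights({"p1": 5, "p2": 3, "p3": 1})
for place, weight in weights.items():
    print(place, round(weight, 4))
```

Compared with the uniform 1/3 assignment, this pushes the weight of u1-p1 above 1/3 and the weight of u1-p3 well below it, which is the behavior the example calls for.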
S34: the remaining edge weights are defined and quantized by node out-degree.
S4: LBSN hyper-network link prediction method based on space-time relationship. Through the process of S1-S3, a weighted hyper-network model is constructed, and a weighted hyper-edge structure is constructed for link prediction based on the model.
S41: Define the super-edges and the super-edge weights. The LBSN hyper-network model contains multiple types of super-edges. For example, an edge formed between a user node and a location node is a super-edge, and so is an edge formed between a user node and a spatio-temporal node. Since different super-edges contain different numbers of heterogeneous node types, three classes of super-edges are defined.
Class I super-edge SE_I. A class I super-edge contains only one type of node and is a special kind of super-edge in the hyper-network. For example, a super-edge formed by two user nodes is a class I super-edge. It represents the association between nodes in the same-layer subnet; in the social subnet, for example, it is the friend relationship between users.
Class II super-edge SE_II. A class II super-edge is an edge formed by a node pair across two adjacent subnet layers and contains only two types of heterogeneous nodes. For example, the super-edge formed between a user node and a location node, or between a user node and a spatio-temporal node, is a class II super-edge.
Class III super-edge SE_III. A class III super-edge is an edge formed across three adjacent subnet layers and contains only three types of heterogeneous nodes. For example, the super-edge formed by a user node, a location node, and a category node is a class III super-edge.
FIG. 3 shows two adjacent subnet layers and FIG. 4 shows three adjacent subnet layers. In FIG. 3, (T1-T2) forms a class I super-edge, denoted SE_I(T1-T2); (U1-T1) forms a class II super-edge, denoted SE_II(U1-T1); and (U3-T1) also forms a class II super-edge, denoted SE_II(U3-T1). In FIG. 4, (U1-P1-C1) forms a class III super-edge, denoted SE_III(U1-P1-C1), and (U3-P3-C1) forms a class III super-edge, denoted SE_III(U3-P3-C1).
Super-edge weight. The super-edge weight is the weight of each super-edge and is calculated from the edge weights contained in the super-edge. For example, the weight of the class II super-edge SE_II(U1-T1) in FIG. 3 is
Figure GDA0003149903950000111
The weight of a class III super-edge in FIG. 4 is
Figure GDA0003149903950000112
S41: Hyperlink prediction. Based on the three classes of super-edges defined above, weighted super-edge structures are proposed, and the hyperlink prediction problem between users is solved through them. Conventional methods mainly compute the degree of association between nodes through a weighted hyper-triangular structure: two super-edges are associated through a node that co-occurs in both, which yields a hyper-triangular structure used to measure the similarity between nodes. This approach suits heterogeneous networks, captures the additional association between two nodes simply and efficiently, and improves prediction accuracy while alleviating the data sparsity problem. However, a hyper-network can describe associations between heterogeneous nodes as well as between homogeneous nodes, so the deeper the network hierarchy considered and the longer the association chain, the better the fine-grained implicit associations between nodes are reflected. The present invention mines the implicit semantic relationships between nodes by constructing multiple types of super-edge structures.
S411: Weighted hyper-triangular structures. These comprise the single weighted, double weighted, and large-weighted hyper-triangular structures, defined as follows:
Single weighted hyper-triangular structure. In FIG. 3, the spatio-temporal node T1 and the user nodes U1 and U3 form a single weighted hyper-triangular structure that can be used to calculate the similarity between U1 and U3. The semantic information expressed by this structure is that the two users like to be active at the same time and at the same location. The more single weighted hyper-triangular structures contain U1 and U3, and the larger their weights, the greater the similarity between U1 and U3 and the more likely a link is to form. The structure comprises two class II super-edges, SE_II(U1-T1) and SE_II(T1-U3), and its weight is the product of the corresponding super-edge weights:
Figure GDA0003149903950000113
It should be emphasized that the super-edge structures defined in the present invention are all closed-loop structures and are directional. Therefore W_S3(U1ΔU3) ≠ W_S3(U3ΔU1); the same applies below.
Double weighted hyper-triangular structure. A double weighted hyper-triangular structure consists of two consecutive weighted hyper-triangular structures. For example, in FIG. 3, SE_II(U1-T1) and SE_II(T1-U2) form one weighted hyper-triangular structure, and SE_II(U2-T2) and SE_II(T2-U3) form another. The two can be combined into a double weighted hyper-triangular structure that measures the similarity between U1 and U3. Its semantic information is that users U1 and U3 each like to be active at the same time and location as user U2. Its weight is the product of the weights of the two corresponding single weighted hyper-triangular structures:
W_S6(U1ΔΔU3) = W_S3(U1ΔU2) · W_S3(U2ΔU3)    (14)
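The single and double weighted hyper-triangular weights can be sketched as products over directed weights. The sketch assumes that each class II super-edge weight is available as one number per ordered node pair; the dictionary `w`, its values, and the function names are hypothetical.

```python
def single_triangle(w, u, t, v):
    """W_S3(u Δ v) through spatio-temporal node t.

    Product of the two class II super-edge weights u-t and t-v.
    """
    return w[(u, t)] * w[(t, v)]

def double_triangle(w, u, t1, mid, t2, v):
    """W_S6(u ΔΔ v): product of two consecutive single triangles."""
    return single_triangle(w, u, t1, mid) * single_triangle(w, mid, t2, v)

# Hypothetical directed super-edge weights; (a, b) and (b, a) may differ.
w = {
    ("U1", "T1"): 0.5, ("T1", "U1"): 0.5,
    ("U3", "T1"): 0.8, ("T1", "U3"): 0.4,
    ("T1", "U2"): 0.2,
    ("U2", "T2"): 0.5, ("T2", "U3"): 0.5,
}
print(single_triangle(w, "U1", "T1", "U3"))
print(single_triangle(w, "U3", "T1", "U1"))
print(double_triangle(w, "U1", "T1", "U2", "T2", "U3"))
```

Because the weights are stored per ordered pair, the structure from U1 to U3 and the structure from U3 to U1 generally receive different weights, which matches the directionality noted above.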
Large-weighted hyper-triangular structure. A large-weighted hyper-triangular structure is a triangular structure composed of two class III super-edges. For example, in FIG. 4, the super-edges SE_III(Ui-Pj-Ck) and SE_III(Ui-Pj-Ck) form a large-weighted hyper-triangular structure, whose semantic information is that two users share a preference for the same category. Its weight is the product of the two class III super-edge weights:
Figure GDA0003149903950000121
S412: Weighted hyper-rectangular structure. In FIG. 3, the super-edges SE_II(U1-T1), SE_I(T1-T2), and SE_II(T2-U3) form a weighted hyper-rectangular structure. It contains the two nodes U1 and U3 and can be used to measure the similarity between them. Its semantic information is that users U1 and U3 like to be active at two associated spatio-temporal nodes, and its weight is the product of the corresponding super-edge weights:
Figure GDA0003149903950000122
S413: Weighted super-hybrid structures. These comprise the weighted super-hybrid I structure and the weighted super-hybrid II structure, defined as follows:
Weighted super-hybrid I structure: the hybrid I structure is formed by adding a class I super-edge to a single weighted hyper-triangular structure. For example, in FIG. 1, the structure formed by the super-edges SE_II(U1-T1), SE_II(T1-U2), and SE_I(U2-U3) is a hybrid I structure. Its semantic information is that U2, a friend of U3, likes to be active at the same location and time as U1. Its weight is the product of the corresponding single weighted hyper-triangular structure weight and the class I super-edge weight:
Figure GDA0003149903950000123
Weighted super-hybrid II structure: the hybrid II structure is formed by adding a class I super-edge to a weighted hyper-rectangular structure. For example, in FIG. 1, the structure formed by the super-edges SE_II(U1-T1), SE_I(T1-T2), SE_II(U2-T2), and SE_I(U2-U3) is a hybrid II structure. Its weight is the product of the corresponding weighted hyper-rectangular structure weight and the class I super-edge weight:
Figure GDA0003149903950000124
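Since the hyper-rectangular and hybrid weights are all described as products of super-edge weights along a chain of nodes, one helper can cover them. This is a sketch only: the name `chain_weight`, the per-ordered-pair weight dictionary, and the numeric values are assumptions.

```python
import math

def chain_weight(w, nodes):
    """Product of the super-edge weights along consecutive nodes."""
    return math.prod(w[(a, b)] for a, b in zip(nodes, nodes[1:]))

# Hypothetical super-edge weights keyed by ordered node pair.
w = {
    ("U1", "T1"): 0.5,   # class II
    ("T1", "T2"): 0.5,   # class I (spatio-temporal subnet)
    ("T2", "U3"): 0.5,   # class II
    ("T1", "U2"): 0.25,  # class II
    ("T2", "U2"): 0.5,   # class II
    ("U2", "U3"): 0.5,   # class I (social subnet)
}

# Hyper-rectangle: U1-T1, T1-T2, T2-U3
rectangle = chain_weight(w, ["U1", "T1", "T2", "U3"])
# Hybrid I: single triangle U1-T1-U2 plus the class I super-edge U2-U3
hybrid_1 = chain_weight(w, ["U1", "T1", "U2", "U3"])
# Hybrid II: rectangle U1-T1-T2-U2 plus the class I super-edge U2-U3
hybrid_2 = chain_weight(w, ["U1", "T1", "T2", "U2", "U3"])
print(rectangle, hybrid_1, hybrid_2)
```

Longer chains multiply more weights, so with weights below 1 the deeper structures contribute smaller values unless their structure coefficient compensates.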
As can be seen, the deeper the hierarchy, the longer the association chain and the richer the super-edge structures. The present invention lists 19 effective weighted super-edge structures, as shown in Table 1.
Figure GDA0003149903950000125
Figure GDA0003149903950000131
From the above analysis, different weighted super-edge structures carry different semantic information. For example, the S2 structure embodies the notion of location entropy: if two users have both checked in at a place that many people visit, it is hard to infer a friend relationship between them, because the co-occurrence may be a coincidence; but if two users often check in at a place that few people visit, there is likely some relationship between them. The popularity of a location therefore affects link prediction, and the S2 structure captures this effectively. The S3 structure can mine a user's short-term interest, interpreted here as an interest the user has only during a certain period of time, such as going to a movie theater at 7 pm every Friday. Such an interest occurs only in a specific time period, yet it reflects the user's personality well.
Since different structures influence link prediction to different degrees, the similarity between users u and v is expressed as:
S(u, v) = θ_1·W_S1(u, v) + θ_2·W_S2(u, v) + ... + θ_19·W_S19(u, v)    (19)
where θ_i is the weight of the i-th weighted super-edge structure, obtained by training with the gradient descent method. The parameter update process is:
Figure GDA0003149903950000141
where λ is the learning step size and y indicates whether a link exists between the users. When the change in every parameter is smaller than a certain threshold, the parameter update has converged and the optimal parameter set θ+ is obtained. Finally, the optimal parameter set θ+ is used to predict the link relationships between users: when y is 1, a link between the users is considered to exist; otherwise, no link is considered to exist. The defining formulas are:
Figure GDA0003149903950000142
Figure GDA0003149903950000143
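The update rule and the decision formula are not reproduced in this text, so the sketch below fills them in with a standard logistic model: the similarity S(u, v) is passed through a sigmoid, the parameters are fitted by batch gradient descent on the cross-entropy loss, and a link is predicted when the output is at least 0.5. These choices, the three-structure toy data, and all function names are assumptions. The argument `step` plays the role of the learning step size λ, and training stops when every parameter changes by less than `tol`.

```python
import math

def similarity(theta, w):
    """S(u, v) = sum over i of theta_i * W_Si(u, v) for one user pair."""
    return sum(t * x for t, x in zip(theta, w))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, n_structures, step=0.5, tol=1e-6, max_epochs=2000):
    """samples: list of (structure_weights, y) with y in {0, 1}."""
    theta = [0.0] * n_structures
    for _ in range(max_epochs):
        grad = [0.0] * n_structures
        for w, y in samples:
            err = sigmoid(similarity(theta, w)) - y
            for i, w_i in enumerate(w):
                grad[i] += err * w_i / len(samples)
        new_theta = [t - step * g for t, g in zip(theta, grad)]
        largest_change = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if largest_change < tol:  # every parameter changed less than tol
            break
    return theta

def predict(theta, w):
    """Return 1 if a link is predicted between the user pair, else 0."""
    return 1 if sigmoid(similarity(theta, w)) >= 0.5 else 0

# Toy data: three structure weights per user pair, label 1 = linked.
samples = [
    ([0.9, 0.1, 0.0], 1), ([0.8, 0.3, 0.1], 1),
    ([0.1, 0.2, 0.9], 0), ([0.0, 0.1, 0.7], 0),
]
theta = train(samples, n_structures=3)
print([predict(theta, w) for w, _ in samples])
```

With 19 structures the only change is the length of each weight vector.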
The invention integrates the time factor into the hyper-network model by introducing spatio-temporal nodes, and then constructs a four-layer weighted hyper-network model based on user influence, implicit association relationships, user preference, and node degree information, which improves the interpretability of the model. Finally, it mines the semantic relationships between users through multiple weighted super-edge structures, which alleviates the data sparsity problem and improves prediction accuracy. The invention is thus an effective method for link prediction in weighted networks.
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, although the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.

Claims (4)

1. A LBSN hyper-network link prediction method based on a space-time relationship is characterized in that: the method comprises the following steps:
s1: acquiring a data source; acquiring data information with high accuracy and reliability from the existing large social network platform; the acquired data content comprises friend relationships among users, comments and scores of the users on the positions, comment time, longitude and latitude of the positions and the types of the positions;
s2: constructing a hyper-network model; the method comprises the steps of constructing a space-time sub-network, a social sub-network, a position sub-network and a category sub-network, wherein the space-time sub-network is constructed by using the sign-in time of a user to a position and is used for mining the space-time similarity between the users;
s3: defining and quantizing the hyper-network edge weight; defining edge weight values in a hyper-network model through four different modes of user influence, implicit association relation, user preference and node degree information;
s4: through the process of S1-S3, a weighted super-network model is constructed, based on the model, various types of weighted super-edge structures are constructed firstly, different semantic relationships among users are mined through different structures, finally, model parameters are trained through a gradient descent method, the link relationship in the network is predicted, and the constructed weighted super-network model is divided into a time space layer, a user layer, a position layer and a category layer.
2. The LBSN hyper-network link prediction method based on the spatiotemporal relationship as claimed in claim 1, wherein: the step S2 specifically includes:
extracting a friend relationship list of a user, a check-in relationship list of the user and the category information of the position through the original data information;
s21: extracting time-space nodes through the sign-in time of the user; the spatio-temporal node means that if two or more users commonly visit a certain position in a certain time period, the position is defined as a spatio-temporal node; the spatio-temporal nodes reflect the interest preference of a user at a specific position at a specific time;
s22: constructing a space-time-user-position-category four-layer hyper-network model; the method comprises the following steps of dividing the method into a space-time subnet, a social subnet, a position subnet and a category subnet; the incidence relation among the four layers of sub-networks is that users can visit some interest points under some types according to own interest preference, check in, comment and score the interest points, and if the users have special interest preference in a specific time, the users can be associated by the same time-space node; so far, the construction of the four-layer sub-network under the social network based on the position is completed.
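The spatio-temporal node extraction of step S21 can be sketched as follows. The sketch assumes that a "time period" is a fixed-width slot of hours and that a spatio-temporal node is keyed by the pair (location, slot); both choices, the record layout, and the function name `spatiotemporal_nodes` are hypothetical.

```python
from collections import defaultdict

def spatiotemporal_nodes(checkins, slot_hours=1):
    """Return {(location, slot): set of users} for slots shared by 2+ users.

    checkins: iterable of (user, location, hour) tuples, where hour is
    an integer such as the hour of the week.
    """
    visitors = defaultdict(set)
    for user, location, hour in checkins:
        visitors[(location, hour // slot_hours)].add(user)
    # Keep only location/slot pairs visited by at least two distinct users.
    return {node: users for node, users in visitors.items() if len(users) >= 2}

checkins = [
    ("U1", "cinema", 19), ("U3", "cinema", 19), ("U1", "cinema", 19),
    ("U2", "cinema", 9), ("U1", "gym", 7),
]
print(spatiotemporal_nodes(checkins))
```

Repeated check-ins by the same user do not create a node on their own, because users are collected in a set before counting.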
3. The LBSN hyper-network link prediction method based on the spatiotemporal relationship as claimed in claim 1, wherein: the step S3 specifically includes:
s31: the user-user weight is strengthened through user influence; in a location-based social network, the influence of each user is different; dividing the influence of the users into individual influence of the users and influence among the users, and measuring the influence through a following network and a following behavior respectively;
defining the following behavior: if the user v signs in the place where the friend u signs in, the user v is considered to generate a following behavior for the user u, and a directed edge from v to u is correspondingly generated;
defining the following network G_f = (V_f, E_f): wherein G_f represents the directed network formed by following behaviors, V_f represents the users in the following network, and E_f represents the directed edges generated by following behaviors;
S311: individual influence of the user I_u: this measures the influence that a user exerts on other users in the network through the user's own behavior; time is divided into slices so that the user influence in the different time periods
Figure FDA0002974108890000021
is taken into account; the following behaviors of users within each time slice form a corresponding following network; S time slices are divided in total, and t_s is the s-th time slice; the user's final individual influence is contributed by the individual influence in each time slice, and the further a time slice is from the current time, the more its individual influence is attenuated;
considering that isolated nodes exist in the network, the individual influence of users is computed with the LeaderRank algorithm, whose iterative formula is:
Figure FDA0002974108890000022
wherein N_u represents the neighbor nodes of user u, and
Figure FDA0002974108890000023
represents the out-degree of user v; in the steady state, LeaderRank distributes the score of the ground node evenly to all other nodes, and the final score of a node is expressed as:
I_u = I_u(t_d) + I_g(t_d)/N
wherein I_g(t_d) is the score of the ground node in the steady state, and N is the total number of users;
since the influence of the user decreases with time, the decay function is defined as:
W_u(t_i) = exp(-ln2 × (t_c - t_i)/t_m)
wherein t_c denotes the current time, t_i denotes the i-th time slice, and t_m denotes the half-life of the influence decay;
the total individual influence I_u of user u at the current time is:
Figure FDA0002974108890000024
wherein I_u(t_i) denotes the individual influence of user u in time slice t_i;
S312: influence between users: the influence between users I_i(u, v) measures the influence of user u on user v; following behaviors are regarded as interactions among users and are used to measure the influence between them;
two metrics are proposed, the following location ratio I_p and the following check-in ratio I_c:
Figure FDA0002974108890000025
Figure FDA0002974108890000026
where M(v, u) is the number of check-in locations at which user v follows user u, Location_u is the total number of check-in locations of user u, K(v, u) is the total number of check-ins in which user v follows user u, and Checkin_u is the total number of check-ins of user u;
the user influence I (u, v) is:
Figure FDA0002974108890000031
based on the user influence, quantizing the user-user edge weight, and for the node pair u and v, if the user influence of u on v is high, the corresponding edge weight should be high, and the edge weight between the user and the user is quantized as:
Figure FDA0002974108890000032
wherein w, with (u, w) ∈ S, denotes a neighbor node of user u in the social subnet, and I(u, v) represents the influence between the user and a neighbor node in its social subnet;
s32: defining and quantizing a position-position edge weight and a category-category edge weight through a hidden incidence relation;
defining an edge weight value between positions and an edge weight value between categories:
Figure FDA0002974108890000033
wherein geodist(p, p') denotes the distance between locations p and p', Max|W_p| is the maximum number of times two locations are associated, w(p, p') is the number of times locations p and p' are associated by users, and
Figure FDA0002974108890000034
is the threshold on the number of associations;
Figure FDA0002974108890000035
where |P(c, c')| is the number of locations that belong to both category c and category c', and Max|P_c| is the maximum number of locations that belong to both category c and some other category;
S33: defining and quantifying the user-location edge weights by user preference; in a location-based social network, a user's score for a location directly reflects how much the user likes the location; locations that the user prefers are given higher weights, and the user-location edge weight is corrected by an exponential function:
Figure FDA0002974108890000036
wherein r (u, p) is the score of user u at location p;
s34: the remaining edge weights are defined and quantized by node out-degree.
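The half-life decay in step S311 can be sketched directly from the stated function W_u(t_i) = exp(-ln2 × (t_c - t_i)/t_m). The aggregation into a total individual influence is not reproduced in this text; the sketch assumes a decay-weighted sum of the per-slice scores, and the function names and sample values are hypothetical.

```python
import math

def decay_weight(t_c, t_i, t_m):
    """W_u(t_i) = exp(-ln2 * (t_c - t_i) / t_m); halves every t_m."""
    return math.exp(-math.log(2) * (t_c - t_i) / t_m)

def total_individual_influence(slice_scores, t_c, t_m):
    """Assumed aggregation: sum of W_u(t_i) * I_u(t_i) over time slices.

    slice_scores: dict t_i -> I_u(t_i), the per-slice individual influence.
    """
    return sum(decay_weight(t_c, t_i, t_m) * score
               for t_i, score in slice_scores.items())

# Per-slice influence of one user at slices 0, 5, and 10 (current time 10).
scores = {0: 4.0, 5: 2.0, 10: 1.0}
print(decay_weight(10, 5, 5))
print(total_individual_influence(scores, t_c=10, t_m=5))
```

A slice that lies one half-life in the past counts half as much as the current slice, and a slice two half-lives back counts a quarter as much.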
4. The LBSN hyper-network link prediction method based on the spatiotemporal relationship as claimed in claim 1, wherein: the step S4 specifically includes:
s41: defining a super edge and super edge weight;
three types of super edges are defined:
class I super-edge SE_I: a super-edge that contains only one type of node and is a special kind of super-edge in the hyper-network;
class II super-edge SE_II: an edge formed by a node pair across two adjacent subnet layers, characterized by containing only two types of heterogeneous nodes;
class III super-edge SE_III: an edge formed across three adjacent subnet layers, characterized by containing only three types of heterogeneous nodes;
the super-edge weight is the weight of each super-edge and is calculated from the edge weights contained in the super-edge;
s42: hyperlink prediction: based on the defined three types of super edges, a weighted super edge structure is provided, and the hyperlink prediction problem between users is solved through the weighted super edge structure; mining implicit semantic relations among nodes by constructing various types of super-edge structures;
S421: the weighted hyper-triangular structures comprise a single weighted hyper-triangular structure, a double weighted hyper-triangular structure, and a large-weighted hyper-triangular structure;
single weighted hyper-triangular structure: the similarity between user nodes is calculated through a single weighted hyper-triangular structure formed by a spatio-temporal node and user nodes, which expresses that two users like to be active at the same time and at the same location; the defined super-edge structures are all closed-loop structures and are directional;
double weighted hyper-triangular structure: a structure comprising two consecutive weighted hyper-triangular structures;
large-weighted hyper-triangular structure: a triangular structure composed of two class III super-edges;
S422: weighted hyper-rectangular structure: expresses that user nodes like to be active at two associated spatio-temporal nodes, and its weight is the product of the corresponding super-edge weights;
S423: weighted super-hybrid structures: these comprise a weighted super-hybrid I structure and a weighted super-hybrid II structure, defined respectively as:
weighted super-hybrid I structure: the hybrid I structure is formed by adding a class I super-edge to a single weighted hyper-triangular structure;
weighted super-hybrid II structure: the hybrid II structure is formed by adding a class I super-edge to a weighted hyper-rectangular structure;
the deeper the hierarchy, the longer the association chain and the richer the super-edge structures;
the weighted super-edge structures comprise: the weighted hyper-triangular structures, the weighted hyper-rectangular structure, and the weighted super-hybrid structures; different structures influence link prediction to different degrees, so the similarity is expressed as:
S(u, v) = θ_1·W_S1(u, v) + θ_2·W_S2(u, v) + ... + θ_19·W_S19(u, v)
wherein θ_i is the weight of the i-th weighted super-edge structure, obtained by training with the gradient descent method; the parameter update process is:
Figure FDA0002974108890000051
wherein θ_i-old represents the weight before an iteration of training, θ_i-new represents the weight after the iteration, λ represents the learning step size, and y indicates whether a link exists between the users; when the change in every parameter is smaller than a certain threshold, the parameter update has converged and the optimal parameter set θ+ is obtained; finally, the optimal parameter set θ+ is used to predict the link relationships between users: when y is 1, a link between the users is considered to exist; otherwise, no link is considered to exist; the defining formulas are:
Figure FDA0002974108890000052
Figure FDA0002974108890000053

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711182961.9A CN107784124B (en) 2017-11-23 2017-11-23 LBSN (location based service) hyper-network link prediction method based on space-time relationship

Publications (2)

Publication Number Publication Date
CN107784124A CN107784124A (en) 2018-03-09
CN107784124B true CN107784124B (en) 2021-08-24

Family

ID=61430658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711182961.9A Active CN107784124B (en) 2017-11-23 2017-11-23 LBSN (location based service) hyper-network link prediction method based on space-time relationship

Country Status (1)

Country Link
CN (1) CN107784124B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109086373B (en) * 2018-07-23 2021-01-12 东南大学 Method for constructing fair link prediction evaluation system
CN109635989B (en) * 2018-08-30 2022-03-29 电子科技大学 Social network link prediction method based on multi-source heterogeneous data fusion
US11321632B2 (en) * 2018-11-21 2022-05-03 Paypal, Inc. Machine learning based on post-transaction data
CN110134883B (en) * 2019-04-22 2023-06-06 哈尔滨英赛克信息技术有限公司 Heterogeneous social network location entity anchor link identification method
CN110851491B (en) * 2019-10-17 2023-06-30 天津大学 Network link prediction method based on multiple semantic influence of multiple neighbor nodes
CN111368788B (en) * 2020-03-17 2023-10-27 北京迈格威科技有限公司 Training method and device for image recognition model and electronic equipment
CN112765754B (en) * 2020-12-31 2023-12-22 西安电子科技大学 Superside-based time evolution graph design method suitable for satellite-to-ground network
CN113297500B (en) * 2021-06-23 2023-07-25 哈尔滨工程大学 Social network isolated node link prediction method
CN115242659A (en) * 2022-08-09 2022-10-25 安徽大学 High-order collective influence-based hyper-network node analysis method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559407A (en) * 2013-11-14 2014-02-05 北京航空航天大学深圳研究院 Recommendation system and method for measuring node intimacy in weighted graph with direction
CN107085616A (en) * 2017-05-31 2017-08-22 东南大学 A kind of false comment suspected sites detection method excavated based on multidimensional property in LBSN
CN107145527A (en) * 2017-04-14 2017-09-08 东南大学 Link prediction method based on first path in alignment isomery social networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9083757B2 (en) * 2012-11-21 2015-07-14 Telefonaktiebolaget L M Ericsson LLP Multi-objective server placement determination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant