CN112508725B - Community structure-based location awareness influence maximization method - Google Patents


Publication number
CN112508725B
Authority
CN
China
Prior art keywords
user
users
influence
community
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011404347.4A
Other languages
Chinese (zh)
Other versions
CN112508725A (en)
Inventor
李晓
彭岩
王洁
赵成安
王康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University filed Critical Capital Normal University
Priority to CN202011404347.4A priority Critical patent/CN112508725B/en
Publication of CN112508725A publication Critical patent/CN112508725A/en
Application granted granted Critical
Publication of CN112508725B publication Critical patent/CN112508725B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Abstract

The invention discloses a community structure-based location-aware influence maximization method, comprising the following steps. Step 1: in a social network, define the location-aware influence maximization problem, and find location-aware opinion leaders through an online query. Step 2: preprocess the user data in the social network, design a PR tree index structure according to users' check-in records in the social network, and model users' location preferences so as to quickly find the target users of the online query. Step 3: according to the given social network structure and users' check-in records, design a deep learning-based community discovery method to find the community structure of the network. Step 4: for the target users, use the community structure to design a community structure-based influence maximization method that finds the seed user set, i.e., the opinion leader set, for a specific online query, so as to maximize the influence range of the online query in the whole network.

Description

Community structure-based location awareness influence maximization method
Technical Field
The invention is applied to finding opinion leaders in a network when carrying out word-of-mouth marketing on a social network platform, and belongs to the field of social network analysis and data mining.
Background
Influence maximization is a basic method for word-of-mouth marketing in social networks. It aims to find opinion leaders in the social network so that a marketed product spreads through them from one person to ten and from ten to a hundred, and users in the social network come to know and purchase the product. The influence maximization problem is stated as follows: given a social network, find a certain number of influential users (i.e., a set of opinion leaders) and use this set of users as the starting point for propagating marketing information, thereby maximizing the spread of the information in the network. Compared with other marketing modes, word-of-mouth marketing has advantages such as low cost, fast propagation and wide reach, and is favored by a large number of merchants.
With the popularization of the mobile internet and mobile devices, online social networks have become increasingly diverse, and location-based social networks have appeared. In such social networks, users tend to share their own location information. In recent years, there has been growing demand for word-of-mouth marketing of products that carry location information. For example, a newly opened Western-style restaurant wants to promote itself on a review website through word-of-mouth marketing. Its marketing strategy is to provide free meal vouchers to a specific number of users, who then share their dining experience on the website, so that information about the restaurant spreads quickly there and more users eventually learn of the restaurant and go there to eat. The problem is how to find a group of a given number of users (called opinion leaders) such that the largest number of users in the network become aware of the restaurant and go there for a meal. Notably, users' location preferences need to be taken into account when solving this problem.
Existing research has considered integrating location information into the influence maximization problem, and various location-aware influence maximization methods have been proposed. One method considers the historical movement trajectories of users in a location-based social network, proposes a two-stage information propagation model, and designs two heuristic algorithms to search for a group of users with the greatest influence. However, it does not consider the interaction between the selected users, which reduces the influence of the information on the whole network. For location-based social networks, other work searches for a group of users with the greatest influence within a certain area. Further work considers the distance between the user and the promoted product, defines a distance-aware influence maximization problem, and proposes an approximation algorithm to solve it. For the influence maximization problem in location-based social networks, an influence-range framework based on multi-objective optimization has also been proposed, together with an approximation algorithm based on the greedy algorithm and a heuristic particle swarm algorithm.
However, the above literature assumes that a user has a location preference for only one area, whereas in reality users tend to have location preferences for multiple areas. In addition, most of the literature solves the problem with a greedy algorithm, which results in low running efficiency.
Disclosure of Invention
The invention aims to quickly find, for a location-aware marketed product in a social network, a set of highly influential users according to users' location preferences, so as to maximize the diffusion and propagation of the product in the social network and meet merchants' word-of-mouth marketing needs.
In order to solve the above problems, the invention defines a new location-aware influence maximization problem and provides an efficient method for solving it.
The technical scheme of the invention is a location awareness influence maximization method based on a community structure, which comprises the following steps:
Step 1: in a social network, defining the location-aware influence maximization problem, and finding location-aware opinion leaders through an online query;
Step 2: preprocessing the user data in the social network, and modeling users' location preferences with a PR tree index structure according to users' check-in records in the social network, so as to quickly find the target users of the online query;
Step 3: according to the given social network structure and users' check-in records, designing a deep learning-based community discovery method to find the community structure of the network;
Step 4: for the target users, using the community structure to design a community structure-based influence maximization method that finds the seed user set, i.e., the opinion leader set, for a specific online query, so as to maximize the influence range of the online query in the whole network.
Further, the location-aware influence maximization problem is defined as follows: for a social network, given the maximum influence tree information propagation model, a user set S composed of k users is to be found such that the influence range of this group of users on the target users in the social network is the largest, where the target users of the query are the users having a location preference for the query region.
Further, in step 2, the location preferences of users are modeled with a PR tree index structure, which specifically comprises the following steps:
Step (2.1): creating the PR tree structure, the PR tree being built on the basis of the R-tree, with the nodes of the PR tree storing the location preferences of users;
Step (2.2): searching the PR tree for the online query, and quickly finding the target users of the query through pruning.
Further, in the step 2, a seed user set for a specific online query is found by using an influence maximization method based on a community structure, influence of the user is stored by creating an offline index structure, and a seed user is determined based on the offline index method, the community discovery method and a seed user determination method based on the community structure; the offline indexing method comprises the following steps:
creating a community-based influence index for each tree node of the PR tree, wherein the index structure consists of a plurality of influence lists, and each influence list stores the influence of users in one community;
a community-based user index is created for each tree node in the PR tree, with the index structure storing the influence of each user in the community in which they reside.
Further, in the step 3, the community structure of the network is determined by using the structure of the social network and the check-in record of the user, by using a community discovery method based on deep learning, and the seed user is determined by using an offline indexing method, a community discovery method and a seed user determination method based on the community structure; the community discovery method comprises the following steps:
based on the social network structure information, calculating the structure-based closeness between adjacent users by using the number of common neighbors between the users; calculating the behavior-based closeness between adjacent users by using the check-in information of the users in the social network; and calculating the integrated closeness between adjacent users by combining the two with a weight function;
taking the integrated closeness matrix among users as input, and designing a community discovery method based on a deep autoencoder to obtain latent low-dimensional representations of the users;
obtaining the community structure of the social network by applying a k-means clustering algorithm to the low-dimensional representations of the users.
Further, in the step 4, a seed user set for a specific online query is found by using an influence maximization method based on a community structure, an offline index structure is created by the method to store the influence of the user, and the seed user is determined based on the offline index, a community discovery method and a seed user determination algorithm based on the community structure; the seed user determination algorithm based on the community structure comprises the following steps:
for an online query, determining, by traversing the PR tree, the tree nodes that are completely covered by the query region and the region that is not completely covered by tree nodes; for the tree nodes completely covered by the query region, constructing influence lists from the created offline index to store users and their influence; constructing an influence list for the region not completely covered by tree nodes; aggregating the above lists to create an influence list for each community; and finally, constructing a priority queue from the influence list of each community;
iteratively finding the seed users one by one in a greedy manner using the constructed priority queues, wherein in each iteration the seed user of the current round is determined according to the states of the users in the priority queues: for the first seed user, the user at the head of the priority queues with the largest influence is selected as the seed user; for each subsequent seed user, if the state of the user at the head of a priority queue is "invalid", the estimated marginal influence of that user is calculated and the user's position in the queue is updated; if the state is "estimated", the true marginal influence of the user is calculated and the user's position is updated; and if the state is "exact", the user is the seed user determined for the round.
Advantageous effects:
Compared with existing influence maximization methods, the method disclosed by the invention has the following advantages:
1) Compared with traditional influence maximization methods, the method takes into account the location information of the query and the location preferences of users, so that the seed user set found is more effective and meets the needs of viral marketing.
2) Compared with existing methods based on the greedy algorithm, the method is more efficient and greatly reduces running time.
3) Compared with existing methods based on heuristic algorithms, the method is more effective and can greatly enlarge the influence range of the seed user set in the social network.
4) The method strikes a better balance between the efficiency and the effectiveness of influence maximization, preserving the influence range of the seed user set in the social network while improving efficiency.
Drawings
FIG. 1 is a schematic diagram of a process framework;
FIG. 2 is a diagram of a PR tree structure;
FIG. 3 shows the community-based influence index of node R_4 in the PR tree;
FIG. 4 shows the community-based user index of node R_4 in the PR tree.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by a person skilled in the art based on the embodiments of the present invention belong to the protection scope of the present invention without creative efforts.
According to an embodiment of the present invention, as shown in fig. 1, a method for maximizing location awareness influence based on a community structure is provided, which includes the following steps:
Step 1: in a social network, defining the location-aware influence maximization problem, and finding location-aware opinion leaders through an online query;
Step 2: preprocessing the user data in the social network, and modeling users' location preferences with a PR tree index structure according to users' check-in records in the social network, so as to quickly find the target users of the online query;
Step 3: according to the given social network structure and users' check-in records, designing a deep learning-based community discovery method to find the community structure of the network;
Step 4: for the target users, using the community structure to design a community structure-based influence maximization method that finds the seed user set, i.e., the opinion leader set, for a specific online query, so as to maximize the influence range of the online query in the whole network.
The social network is composed of a plurality of communities, the connection among nodes in the communities is relatively tight, and the connection among the communities is relatively sparse; the following describes the definition of the influence maximization problem of location awareness, the PR tree index structure, the community discovery method, and the influence maximization method based on the community structure.
Location-aware impact maximization problem definition
Query model: for a product to be marketed through online word of mouth, an online query Q = (R, k) is defined, where R represents the query region and k represents the size of the seed user set to be found; the set of users with greater influence is referred to as the seed user set.
Data model: from users' check-in records in the social network, a user's location preference for query Q can be expressed as the frequency with which the user has checked in within the query region.
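The data model can be illustrated with a minimal sketch. It assumes a rectangular query region and takes "frequency" to mean the share of a user's check-ins that fall inside the region; both choices, and the names `in_rect` and `location_preference`, are illustrative assumptions rather than details given in the text.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y), assumed region shape
Point = Tuple[float, float]               # one check-in location


def in_rect(p: Point, r: Rect) -> bool:
    """True if point p lies inside rectangle r (boundary included)."""
    x, y = p
    return r[0] <= x <= r[2] and r[1] <= y <= r[3]


def location_preference(checkins: List[Point], region: Rect) -> float:
    """Share of a user's check-ins that fall inside the query region.

    Normalising by the total number of check-ins is an assumption;
    a user without check-ins gets preference 0.0.
    """
    if not checkins:
        return 0.0
    inside = sum(1 for p in checkins if in_rect(p, region))
    return inside / len(checkins)
```

For a user with check-ins at (1, 1), (2, 2), (9, 9) and (5, 5) and the region (0, 0, 5, 5), three of the four check-ins are inside, so the preference under these assumptions is 0.75.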
Location-aware influence maximization problem: for a social network, given the maximum influence tree information propagation model, the location-aware influence maximization problem is defined as finding a set S of k users such that this set of users has the largest influence on the target users in the social network, where the target users of the query are the users having a location preference for the query region, that is:
$$S = \arg\max_{T \subseteq V,\ |T| = k} \sigma_{RA}(T, Q) \qquad (1)$$
$$\sigma_{RA}(T, Q) = \sum_{v \in V} P(T, v) \cdot \gamma(v, Q) \qquad (2)$$
where σ_RA(T, Q) denotes the influence range of the user set T in the social network, P(T, v) denotes the probability that the user set T influences user v, calculated with the existing maximum influence tree information propagation algorithm, γ(v, Q) denotes the location preference of user v for query Q, and V denotes the set of users in the social network.
Theorem 1. The location-aware influence maximization problem is NP-hard, and the influence range σ_RA(T, Q) is monotone and submodular.
Based on Theorem 1, the traditional greedy algorithm can be extended to solve the problem. However, the greedy algorithm has a long running time because it must traverse all users to compute the marginal influence of each one. A fast algorithm therefore needs to be designed to solve the location-aware influence maximization problem.
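The baseline greedy procedure mentioned above can be sketched as follows. This is a sketch under stated assumptions: P(T, v) is replaced by a simple stand-in (independent pairwise probabilities `p[(u, v)]`, with seeds influencing themselves with probability 1) instead of the maximum influence tree model, and the names `influence_prob`, `sigma_ra` and `greedy_seeds` are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

User = Hashable


def influence_prob(T: Set[User], v: User, p: Dict[Tuple[User, User], float]) -> float:
    """Stand-in for P(T, v): chance that at least one seed in T influences v.

    Assumes independent pairwise probabilities p[(u, v)]; the patent instead
    uses the maximum influence tree propagation model.
    """
    if v in T:
        return 1.0
    miss = 1.0
    for u in T:
        miss *= 1.0 - p.get((u, v), 0.0)
    return 1.0 - miss


def sigma_ra(T: Set[User], users: List[User],
             p: Dict[Tuple[User, User], float], gamma: Dict[User, float]) -> float:
    """Influence range of equation (2): sum over v of P(T, v) * gamma(v, Q)."""
    return sum(influence_prob(T, v, p) * gamma.get(v, 0.0) for v in users)


def greedy_seeds(users: List[User], p: Dict[Tuple[User, User], float],
                 gamma: Dict[User, float], k: int) -> List[User]:
    """Plain greedy: each round recomputes every user's marginal influence."""
    seeds: List[User] = []
    chosen: Set[User] = set()
    for _ in range(min(k, len(users))):
        base = sigma_ra(chosen, users, p, gamma)
        best, best_gain = None, -1.0
        for u in users:
            if u in chosen:
                continue
            gain = sigma_ra(chosen | {u}, users, p, gamma) - base
            if gain > best_gain:
                best, best_gain = u, gain
        seeds.append(best)
        chosen.add(best)
    return seeds
```

Every round evaluates the objective once per remaining user, which is the traversal cost that motivates the faster method described in the rest of the document.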
PR tree index structure
Since the defined location-aware influence maximization problem takes a set of users having a preference for the query location as target users, solving the location-aware influence maximization problem requires first finding the target users of the query quickly. The PR tree index structure is based on an R-tree, with nodes of the tree storing the user's location preferences. And for the query Q, traversing the whole tree in a depth-first mode from the root node, and quickly filtering out tree nodes which do not intersect with the query region R through a pruning strategy.
In the PR tree, a leaf node O stores elements of the form <PD, M>, where PD is a pointer to a document D and M is a minimum bounding rectangle (MBR). Let E be an element of the leaf node O. The document D pointed to by the pointer PD of E contains the following: U, the users with a preference for M; TN, the total number of check-ins of each user in U; and LN, the number of check-ins of each user in U within the area M.
In addition, the leaf node O contains a pointer to a pseudo document, which is formed by aggregating the documents of all the elements on O.
In the PR tree, a non-leaf node N stores elements of the form <PC, M>, where PC is a pointer to a child node of N and M is the aggregate of the minimum bounding rectangles of all the elements on that child node. A non-leaf node also contains a pointer to a pseudo document, which is aggregated from the pseudo documents of all its child nodes. FIG. 2 shows a PR tree structure containing pseudo documents.
Given a query Q, the following describes how to quickly find the target users by traversing the PR tree and how to calculate each user's preference for the query. For a tree node B, B.D denotes its pseudo document and B.M denotes its minimum bounding rectangle. Since the target users of query Q are the users with a location preference for the query region R:
1. If the intersection of B.M and R is empty, none of the users stored in node B is a target user of the query. Node B can therefore be pruned, and none of its child nodes needs to be traversed.
2. If B.M intersects with R,
1) If B.M is completely contained by R, then all users stored on node B are the target users of query Q, at which point it is not necessary to continue traversing the child nodes of B, but only needs to access pseudo document B.D to calculate the location preference of each target user for Q.
2) If B.M is not fully contained by R, then the child nodes of node B need to be visited in turn. In particular, if node B is a leaf node, all elements on it need to be accessed at this time.
The above process is repeated until all target users are found and their location preferences for the query are calculated. In particular, if a target user is stored in multiple tree nodes, the user's location preference for the query is calculated by summing their preferences for each tree node. Through the pruning process, the target user of the query can be quickly found.
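The pruning traversal described above can be sketched in a few lines. The sketch simplifies the PR tree: every node carries its MBR and a (pseudo) document mapping each user to (TN, LN), leaf elements are modelled as childless nodes that also keep the raw check-in points, and a preference is taken to be LN / TN. The `Node` layout, the function names and the LN / TN ratio are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Point = Tuple[float, float]


@dataclass
class Node:
    mbr: Rect                                  # B.M
    doc: Dict[str, Tuple[int, int]]            # B.D: user -> (TN, LN)
    children: List["Node"] = field(default_factory=list)
    points: Dict[str, List[Point]] = field(default_factory=dict)  # leaf elements only


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def find_target_users(node: Node, region: Rect,
                      prefs: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Depth-first search returning user -> location preference for the region."""
    if prefs is None:
        prefs = {}
    if not intersects(node.mbr, region):
        return prefs                           # case 1: prune the whole subtree
    if contains(region, node.mbr):
        # case 2.1: fully covered, read the pseudo document and stop descending
        for user, (tn, ln) in node.doc.items():
            prefs[user] = prefs.get(user, 0.0) + ln / tn
        return prefs
    if node.children:
        for child in node.children:            # case 2.2: partial overlap, descend
            find_target_users(child, region, prefs)
    else:
        # partially covered leaf element: count the check-ins inside the region
        for user, pts in node.points.items():
            tn = node.doc[user][0]
            inside = sum(1 for x, y in pts
                         if region[0] <= x <= region[2] and region[1] <= y <= region[3])
            if inside:
                prefs[user] = prefs.get(user, 0.0) + inside / tn
    return prefs
```

Preferences found at several tree nodes are summed per user, matching the rule stated above for users stored in multiple tree nodes.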
Community discovery method based on deep learning
The deep learning-based community discovery method uses the structural information of the social network and the check-in records of its users to calculate the closeness between adjacent users, then obtains low-dimensional representations of the users with a deep autoencoder, and finally obtains the community structure of the social network with a k-means clustering algorithm. The closeness calculation between users and the community discovery algorithm based on the deep autoencoder are introduced below.
1. Closeness calculation between users
1) Structure-based closeness between neighboring users: according to the topological structure information of the social network, the structure-based closeness between the adjacent users is calculated by utilizing the number of common neighbors between the users, and if the two users have more common neighbors, the structure-based closeness between the users is higher. For a pair of adjacent users (u, v), the closeness between them based on the structure is calculated as follows,
Figure BDA0002818147450000071
where F_u denotes the set of friends of user u in the social network and F_v denotes the set of friends of user v in the social network.
2) Closeness between adjacent users based on behavior: in a social network, a user can check in, and if the two users have similar behaviors, the closeness between the two users is higher. Therefore, the historical check-in behavior records of the users are utilized to calculate the closeness among the users based on the behaviors. For a pair of adjacent users (u, v), the closeness between them based on behavior is calculated as follows,
Figure BDA0002818147450000072
wherein r is u Represents the regional preference vector of user u and r u =(r u1 ,r u2 ,...,r uc ),r uc Represents the ratio of the number of check-ins of user u in area c to the total number of check-ins, r v The same is true. In addition, c areas are obtained by using a k-means clustering algorithm aiming at all check-in places of all users in the social network.
3) Integrated closeness between adjacent users: the compactness based on the structure and the compactness based on the behavior between the adjacent users are aggregated by utilizing the weight factors to obtain the comprehensive compactness between the adjacent users in the following calculation mode,
I uv =W S ·S uv +w B ·B uv
wherein, w S And w B Weight factors representing the organization-based affinity and the behavior-based affinity, respectively, and w S +w B =1。
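The three closeness measures can be illustrated with a short sketch. The exact formulas for S_uv and B_uv are given only as equation images, so the sketch substitutes two common choices and labels them as assumptions: the Jaccard coefficient of the friend sets for the structure-based closeness, and the cosine similarity of the regional preference vectors for the behavior-based closeness. Only the weighted sum with w_S + w_B = 1 is taken directly from the text.

```python
import math
from typing import Hashable, Sequence, Set


def structural_closeness(f_u: Set[Hashable], f_v: Set[Hashable]) -> float:
    """Assumed form of S_uv: Jaccard coefficient of the friend sets F_u and F_v."""
    union = f_u | f_v
    return len(f_u & f_v) / len(union) if union else 0.0


def behavioral_closeness(r_u: Sequence[float], r_v: Sequence[float]) -> float:
    """Assumed form of B_uv: cosine similarity of regional preference vectors."""
    dot = sum(a * b for a, b in zip(r_u, r_v))
    norm_u = math.sqrt(sum(a * a for a in r_u))
    norm_v = math.sqrt(sum(b * b for b in r_v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def integrated_closeness(s_uv: float, b_uv: float, w_s: float = 0.5) -> float:
    """I_uv = w_S * S_uv + w_B * B_uv with w_B = 1 - w_S."""
    if not 0.0 <= w_s <= 1.0:
        raise ValueError("w_s must lie in [0, 1]")
    return w_s * s_uv + (1.0 - w_s) * b_uv
```

With friend sets {1, 2, 3} and {2, 3, 4} the assumed structural closeness is 0.5; combined with a behavioral closeness of 1.0 and equal weights, the integrated closeness is 0.75.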
2. Community discovery algorithm based on a deep autoencoder
1) Deep autoencoder design: the integrated closeness between adjacent users is used to obtain the closeness matrix I of all users. This matrix is taken as the input of the deep autoencoder, passed through its several hidden layers, and the latent low-dimensional representations of all users are finally obtained through the output layer. The parameters of the deep autoencoder are learned by backpropagation.
2) k-means clustering algorithm: a k-means clustering algorithm is applied to the latent low-dimensional representations of the users, and the resulting K clusters are taken as the final K communities of the social network.
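The final clustering step can be sketched on its own. The sketch assumes the low-dimensional user representations have already been produced (the autoencoder is not implemented here) and uses a deliberately simple, deterministic initialisation with the first k points as centres; the function names and that initialisation are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def _nearest(point: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the centre with the smallest squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])))


def kmeans(points: List[Sequence[float]], k: int, max_iters: int = 100) -> List[int]:
    """Cluster user embeddings into k communities; returns one label per user."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [list(p) for p in points[:k]]   # assumed deterministic initialisation
    labels: Optional[List[int]] = None
    for _ in range(max_iters):
        new_labels = [_nearest(p, centers) for p in points]
        if new_labels == labels:
            break                              # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                        # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels if labels is not None else []
```

Each returned label plays the role of a community identifier; in practice a library implementation with a better initialisation would likely be preferred.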
Community structure-based location awareness influence maximization method
The community structure-based location-aware influence maximization method uses the community structure information of the social network to approximate a user's influence in the social network by the user's influence within his or her community, thereby improving solving efficiency. To further improve efficiency, the method creates an offline index structure to store the influence of users. The offline index and the community structure-based seed user determination algorithm are introduced below.
1. Offline index
1) Community-based influence index: each tree node in the PR tree stores a community-based influence index. In this index structure, for a node R_i in the PR tree, each community C_j corresponds to an influence list I(R_i, C_j); the list stores the influence of each user of the community, with the users arranged in descending order of influence. The user influences in the influence list of community C_j on tree node R_i are calculated as follows:
(1) Find the set of users that belong to community C_j and are located at tree node R_i: [Figure BDA0002818147450000081].
(2) Compute the set of influencers [Figure BDA0002818147450000083] that can influence each user in the user set [Figure BDA0002818147450000082].
(3) For each user in the set of influencers [Figure BDA0002818147450000084], calculate in advance his influence on the user set [Figure BDA0002818147450000085]:
Figure BDA0002818147450000086
where γ(v, R_i) denotes the location preference of user v stored at tree node R_i, obtained from the PR tree index structure.
(4) For tree node R_i and community C_j, there is a descending influence list I(R_i, C_j), consisting of multiple tuples [Figure BDA0002818147450000087].
For example, FIG. 3 shows node R in a PR tree 4 The community-based influence index.
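One way to picture the construction of the influence lists for a single tree node is the sketch below. The influence formula itself survives only as an equation image, so the sketch assumes that the influence of u on the user set is the sum of p(u, v) · γ(v, R_i) over the users v of the same community stored on the node, that a user influences himself with probability 1, and that only influencers from the same community are considered. These assumptions and all names are illustrative.

```python
from typing import Dict, Hashable, List, Tuple

User = Hashable


def build_influence_lists(
    node_pref: Dict[User, float],              # user v -> gamma(v, R_i) on this node
    community_of: Dict[User, int],             # user -> community id
    p: Dict[Tuple[User, User], float],         # assumed pairwise influence p(u, v)
) -> Dict[int, List[Tuple[User, float]]]:
    """Return community id -> list of (user, influence), sorted in descending order."""
    lists: Dict[int, List[Tuple[User, float]]] = {}
    for cid in sorted({community_of[v] for v in node_pref}):
        # step (1): users of this community stored on the node
        targets = [v for v in node_pref if community_of[v] == cid]
        target_set = set(targets)
        # step (2): same-community users able to influence at least one target
        influencers = set(targets) | {
            u for (u, v) in p if v in target_set and community_of.get(u) == cid
        }
        # step (3): precompute each influencer's preference-weighted influence
        scored = []
        for u in influencers:
            total = sum((1.0 if u == v else p.get((u, v), 0.0)) * node_pref[v]
                        for v in targets)
            scored.append((u, total))
        # step (4): descending list of (user, influence) tuples
        scored.sort(key=lambda t: (-t[1], str(t[0])))
        lists[cid] = scored
    return lists
```

Because the lists depend only on the tree node and the community, they can be computed once offline and reused by every query whose region covers the node.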
2) Community-based user index: each tree node in the PR tree also stores a community-based user index. For each user u, there is a list containing multiple triples [Figure BDA0002818147450000088]; each triple in the list represents the influence [Figure BDA00028181474500000810] of user u on the set of users [Figure BDA0002818147450000089] on the node.
For example, FIG. 4 shows the community-based user index of tree node R_4.
2. Seed user determination algorithm based on community structure
Based on the designed offline index structure and greedy algorithm, a seed user determination algorithm based on a community structure is designed, and a seed user set aiming at online query is quickly found in an iterative manner. The seed user determination algorithm based on the community structure comprises two steps of priority queue construction and rapid seed user determination.
1) Priority queue construction
Given a query Q = (R, k), a priority queue is constructed by traversing the nodes of the PR tree from the root node, determining the set of tree nodes that are completely covered by the query region R, i.e., R Q ={R 1 ,R 2 ,...,R i ,...,R r And determine that there are not completely tree nodesCovered region R 0 =R-R Q . Lower pair of R Q And R 0 The following operations were performed, respectively.
Figure BDA0002818147450000091
For R Q
(1) For each tree node R i ∈R Q To, for
Figure BDA0002818147450000092
Selection List I (R) i ,C j ) Wherein m is the number of communities.
(2) For each community C j Polymerizing it in the set R Q List on each tree node in
I(R 1 ,C j ),I(R 2 ,C j ),...,I(R r ,C j ) To obtain the list I (R) Q ,C j )。I(R Q ,C j ) The list is also composed of a plurality of duplets
Figure BDA0002818147450000093
The composition of the components, wherein,
Figure BDA0002818147450000094
set R of tree nodes on behalf of user Q The influence of all users on all tree nodes in the tree is calculated as follows,
Figure BDA0002818147450000095
wherein the content of the first and second substances,
Figure BDA0002818147450000096
represents a pair of regions R Q With location preference and in community C j Is selected.
Figure BDA0002818147450000097
For R 0
(1) Order to
Figure BDA0002818147450000098
Represents to the region R 0 With location preference and in community C j Set of users in (1), order
Figure BDA0002818147450000099
The representation can influence
Figure BDA00028181474500000910
Set of influencers of the user, pair
Figure BDA00028181474500000911
Calculate his pair
Figure BDA00028181474500000912
The influence of the user, namely:
Figure BDA00028181474500000913
(2) For each community C j Creating a plurality of binary groups
Figure BDA00028181474500000914
List of compositions I (R) 0 ,C j )。
Figure BDA00028181474500000915
Finally, for each community C_j, the lists I(R_Q, C_j) and I(R_0, C_j) are aggregated to obtain the list I(C_j). The list I(C_j) is composed of a number of two-tuples
Figure BDA00028181474500000916
and:
Figure BDA00028181474500000917
where
Figure BDA00028181474500000918
denotes the set of users who have a location preference for query Q and are in community C_j.
Figure BDA00028181474500000919
In this way, m lists I(C_1), I(C_2), ..., I(C_m) are obtained for the m communities. The users in these m lists form the candidate seed user set. From these m lists, m priority queues H(C_1), H(C_2), ..., H(C_j), ..., H(C_m) are created. Each priority queue H(C_j) contains a number of triplets
Figure BDA00028181474500000920
where "invalid" is the state of user u. The seed user determination method described below uses these states to find the seed user set quickly by updating the priority queues.
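The queue construction described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the aggregation formulas are only available as images in the source, so summing a user's influence across lists is an assumption, and the names `aggregate_lists` and `build_priority_queues` are hypothetical.

```python
import heapq
from collections import defaultdict

INVALID = "invalid"  # initial state of every user in a queue


def aggregate_lists(lists):
    """Merge several [(user, influence), ...] lists into one dict.

    Assumption: a user's influence values are summed across lists.
    """
    total = defaultdict(float)
    for pairs in lists:
        for user, influence in pairs:
            total[user] += influence
    return dict(total)


def build_priority_queues(covered_lists, residual_lists):
    """Build one max-priority queue H(C_j) per community.

    covered_lists[j]  : list of lists I(R_i, C_j), one per fully covered node
    residual_lists[j] : the single list I(R_0, C_j)
    heapq is a min-heap, so the negated influence is stored.
    """
    queues = {}
    for j in set(covered_lists) | set(residual_lists):
        merged = aggregate_lists(
            covered_lists.get(j, []) + [residual_lists.get(j, [])]
        )
        heap = [(-score, user, INVALID) for user, score in merged.items()]
        heapq.heapify(heap)
        queues[j] = heap
    return queues
```

Each heap entry plays the role of a triplet (user, influence, state); the head of a heap is the user with the largest stored influence in that community.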
2) Rapid seed user determination method
Based on the m priority queues, the method finds k seed users one after another using a greedy algorithm. In the conventional greedy algorithm, the marginal influence of every user must be computed in each round, and the user with the largest marginal influence is then chosen as the seed user for that round; this process is very inefficient. (In the present invention, influence refers to the influence between two adjacent users. For example, the influence of user a on user b may represent the probability that b goes to a place if a goes there, or the probability that b replies to a post published by a. The influence between users is obtained with the existing, widely adopted weighting model and is computed from the frequency of interaction between the two users in the social network.) The fast seed user selection method therefore computes estimates of the marginal influence for some of the candidate seed users in order to find seed users more efficiently. In each iteration, because users with a larger estimated marginal influence tend to have a larger true marginal influence, the method computes the true marginal influence of users with larger estimates first. In addition, while the priority queues are being updated, each user u in each queue H(C_j) is in one of three states: "invalid", "estimate" and "exact". The states are updated dynamically as needed, and the stored states allow the seed user set to be found quickly.
The "invalid" state means that the influence stored for user u is its initial influence, i.e.,
Figure BDA0002818147450000101
The "estimate" state means that the influence stored for user u is an estimated marginal influence, computed by the fast estimation method described below. The "exact" state means that the influence stored for user u is its true marginal influence, computed by the following formula:
$\Delta\sigma_{RA}(u \mid S_j, Q) = \sigma_{RA}(\{u\} \cup S_j, Q) - \sigma_{RA}(S_j, Q)$ (5)
where S_j denotes the current set of seed users in community C_j.
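Equation (5) can be illustrated with a short Python sketch. The spread function used here is a toy coverage count introduced only for illustration; it is an assumption and does not reproduce the maximum influence tree propagation model of the invention. The names `marginal_influence`, `coverage_sigma` and `reach` are hypothetical.

```python
def marginal_influence(u, seeds, sigma):
    """Equation (5): sigma(seeds with u added) minus sigma(seeds)."""
    seeds = frozenset(seeds)
    return sigma(seeds | {u}) - sigma(seeds)


# Toy data (assumption): the target users each candidate can reach.
reach = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"z"}}


def coverage_sigma(seeds):
    """Toy spread function: number of distinct target users reached."""
    covered = set()
    for s in seeds:
        covered |= reach.get(s, set())
    return len(covered)


# Marginal influence of b once a is already a seed: only z is new.
gain_b = marginal_influence("b", {"a"}, coverage_sigma)
```

Because the toy spread function counts each reached user once, the gain of a candidate shrinks as the seed set grows, which is the property the state-based queue update relies on.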
The following describes in detail how the seed user set is found quickly by updating the priority queues.
(1) The first seed user is determined. For the initial m priority queues, the first user of each queue is taken, giving m users, and the stored influences of these users are compared, i.e.
Figure BDA0002818147450000102
The user with the greatest influence is selected as the first seed user and removed from its priority queue.
(2) Subsequent seed users are determined.
Because seed users may influence one another, the interaction between a candidate and the current seed user set must be taken into account when selecting subsequent seed users, and each round must select the user with the smallest interaction influence (that is, the user with the largest marginal influence). At the beginning of each round, the state of every user is reset to "invalid". For each community C_j, let I_u denote the set of users in C_j that user u can influence, and let
Figure BDA0002818147450000103
denote the set of users that the current seed set S_j can influence.
In each round, the first user of each of the m priority queues is taken, their influences are compared, and the user u_t with the greatest influence and the queue H(C_t) containing it are found. If
Figure BDA0002818147450000104
then user u_t has no interaction with the current seed user set S_t of its community, and u_t is the seed user sought in this round. If
Figure BDA0002818147450000105
then user u_t interacts with the current seed user set S_t of its community, and:
■ If the state of u_t is "invalid", the state of u_t is updated to "estimate" and its stored influence is updated to the estimated marginal influence
Figure BDA0002818147450000111
Theorem 2 below describes how the estimated marginal influence is computed quickly.
■ If the state of u_t is "estimate", the state of u_t is updated to "exact" and its true marginal influence Δσ_RA(u_t | S_t, Q) is computed by equation (5).
■ If the state of u_t is "exact", u_t is the seed user sought in this round; it is added to the seed user set and removed from the priority queue.
(3) Steps (1) and (2) are executed repeatedly, finding seed users one after another, until k seed users have been found.
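The state-driven queue update in steps (1) to (3) can be sketched as follows. This is an illustrative sketch under stated assumptions: the estimator of formula (6) is available only as an image in the source, so it is passed in as a caller-supplied function, and the function name `select_seeds`, its parameters, and the toy data are hypothetical.

```python
import heapq

INVALID, ESTIMATE, EXACT = "invalid", "estimate", "exact"


def select_seeds(initial, k, reach, estimate_gain, exact_gain):
    """Greedy seed selection over per-community priority queues.

    initial: {community: {user: initial influence}}
    reach:   {user: set of users it can influence}  (I_u in the text)
    estimate_gain(u, seeds, stored) -> upper bound on the marginal influence
    exact_gain(u, seeds)            -> true marginal influence, equation (5)
    """
    heaps = {}
    for c, users in initial.items():
        heaps[c] = [(-score, u) for u, score in users.items()]
        heapq.heapify(heaps[c])
    seeds = {c: [] for c in heaps}
    covered = {c: set() for c in heaps}  # users reachable from S_j
    chosen = []
    while len(chosen) < k and any(heaps.values()):
        state = {}  # every user starts the round as "invalid"
        while True:
            # Community whose head has the largest stored influence.
            c = min((c for c in heaps if heaps[c]), key=lambda c: heaps[c][0])
            neg_score, u = heaps[c][0]
            st = state.get((c, u), INVALID)
            if st == EXACT or not (reach[u] & covered[c]):
                heapq.heappop(heaps[c])
                seeds[c].append(u)
                covered[c] |= reach[u]
                chosen.append(u)
                break
            if st == INVALID:
                new_score = estimate_gain(u, seeds[c], -neg_score)
                state[(c, u)] = ESTIMATE
            else:
                new_score = exact_gain(u, seeds[c])
                state[(c, u)] = EXACT
            heapq.heapreplace(heaps[c], (-new_score, u))
    return chosen


# Toy data (assumption): two communities and a coverage-style influence.
reach = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"w"}, "d": {"v"}}
initial = {0: {"a": 2.0, "b": 2.0, "c": 1.0}, 1: {"d": 1.5}}


def exact(u, seeds):
    covered = set().union(*(reach[s] for s in seeds))
    return float(len(reach[u] - covered))


def estimate(u, seeds, stored):
    return min(stored, float(len(reach[u])))


picked = select_seeds(initial, 3, reach, estimate, exact)
```

The design point is that the expensive exact computation runs only for users that reach the head of a queue twice in the same round; users whose stored bound never reaches the head are never evaluated.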
Theorem 2. Given a query Q, for a user u in community C_j, the upper bound of its marginal influence under the current seed user set S_j (i.e., the estimated marginal influence) is computed as follows:
Figure BDA0002818147450000112
where
Figure BDA0002818147450000113
is taken from the influence of user u stored in the priority queue, and
Figure BDA0002818147450000114
denotes the set of users who have a location preference for query Q and are in community C_j.
With formula (6), the upper bound of the marginal influence of u under the current seed user set S_j can be estimated quickly.
Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all inventions that make use of the inventive concept are protected as long as the changes fall within the spirit and scope of the invention as defined by the appended claims.

Claims (3)

1. A location awareness influence maximization method based on a community structure is characterized by comprising the following steps:
step 1: in a social network, defining the location-aware influence maximization problem, and finding location-aware opinion leaders through online queries; the location-aware influence maximization problem is as follows: for a social network and a given maximum influence tree information propagation model, a user set S consisting of k users is to be found such that the influence spread of this set over the target users in the social network is maximized, where the target users of a query are the set of users who have a location preference for the query region;
step 2: preprocessing the user data in the social network, and modeling the location preferences of users with a PR tree index structure according to the users' check-in records in the social network, so that the target users of an online query can be found quickly; in step 2, modeling the location preferences of users with the PR tree index structure specifically comprises the following steps:
step (2.1): creating a PR tree structure, the PR tree being built on the basis of the R tree, wherein the nodes of the PR tree store the location preferences of users;
step (2.2): for an online query, searching the PR tree and quickly finding the target users of the query through pruning operations; specifically:
(2.2.1) if the intersection of B.M and R is empty, none of the users stored in node B is a target user of the query; node B can therefore be pruned, and none of its child nodes needs to be traversed;
(2.2.2) if B.M intersects R:
1) if B.M is completely contained in R, all users stored in node B are target users of query Q; in this case only the pseudo document B.D needs to be accessed to compute each target user's location preference for Q, without traversing the child nodes of B any further;
2) if B.M is not completely contained in R, the child nodes of node B need to be visited in turn; if node B is a leaf node, all elements stored in it need to be accessed;
the above processes 1)-2) are repeated until all target users have been found and their location preferences for the query have been computed; if a target user is stored in several tree nodes, the user's location preference for the query is obtained by summing the user's preferences over those tree nodes; through this pruning process the target users of the query are found quickly;
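The pruning rules (2.2.1) and (2.2.2) can be sketched in Python as below. The node layout is an assumption made for illustration: B.M is modeled as an axis-aligned rectangle, B.D as a mapping from user to a preference weight, and leaf elements as (user, x, y, weight) tuples; the names `Node`, `intersects`, `contains` and `query` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    mbr: tuple  # B.M as (xmin, ymin, xmax, ymax)
    doc: dict   # B.D: user -> preference weight stored under this node
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)  # leaf: (user, x, y, weight)


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def query(node, R, prefs=None):
    """Collect each target user's preference for region R, with pruning."""
    prefs = defaultdict(float) if prefs is None else prefs
    if not intersects(node.mbr, R):   # (2.2.1): prune the whole subtree
        return prefs
    if contains(R, node.mbr):         # (2.2.2) 1): read B.D only
        for user, w in node.doc.items():
            prefs[user] += w
        return prefs
    if node.children:                 # (2.2.2) 2): descend into children
        for child in node.children:
            query(child, R, prefs)
    else:                             # partially covered leaf: check elements
        for user, x, y, w in node.points:
            if R[0] <= x <= R[2] and R[1] <= y <= R[3]:
                prefs[user] += w
    return prefs


leaf1 = Node((0, 0, 1, 1), {"a": 2.0},
             points=[("a", 0.5, 0.5, 1.0), ("a", 0.9, 0.9, 1.0)])
leaf2 = Node((2, 0, 4, 1), {"a": 1.0, "b": 1.0},
             points=[("b", 2.5, 0.5, 1.0), ("a", 3.5, 0.5, 1.0)])
leaf3 = Node((10, 10, 11, 11), {"c": 1.0}, points=[("c", 10.5, 10.5, 1.0)])
root = Node((0, 0, 11, 11), {"a": 3.0, "b": 1.0, "c": 1.0},
            children=[leaf1, leaf2, leaf3])
result = dict(query(root, (0, 0, 3, 1)))
```

In this toy tree, `leaf1` is answered from its pseudo document, `leaf2` is scanned element by element, and `leaf3` is pruned without being read.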
step 3: designing a deep-learning-based community discovery method that finds the community structure of the network from the given social network structure and the users' check-in records; in step 3, obtaining the community structure of the social network with the deep learning method according to the social network structure and the users' check-in records specifically comprises the following steps:
step (3.1): calculating the structure-based closeness between adjacent users according to the social network structure;
step (3.2): calculating the behavior-based closeness between adjacent users according to the users' check-in records;
step (3.3): designing a weight function and calculating the comprehensive closeness between adjacent users;
step (3.4): designing a deep autoencoder according to the hybrid influence between adjacent users, and calculating the low-dimensional vector representations of the users;
step (3.5): obtaining the community structure of the social network from the low-dimensional representations of the users with a k-means clustering algorithm;
the closeness between users is calculated as follows:
1) structure-based closeness between adjacent users: according to the topology of the social network, the structure-based closeness between adjacent users is calculated from the number of their common neighbors; the more common neighbors two users have, the higher the structure-based closeness between them; for a pair of adjacent users (u, v), the structure-based closeness is calculated as follows:
Figure FDA0003946901720000021
where F_u denotes the set of friends of user u in the social network and F_v denotes the set of friends of user v in the social network;
2) behavior-based closeness between adjacent users: in the social network, users perform check-ins; the more similar the check-in behavior of two users, the higher the closeness between them; the behavior-based closeness is calculated from the users' historical check-in records; for a pair of adjacent users (u, v), it is calculated as follows:
Figure FDA0003946901720000022
where r_u denotes the region preference vector of user u, r_u = (r_u1, r_u2, …, r_uc), r_uc denotes the ratio of the number of check-ins of user u in region c to the user's total number of check-ins, and r_v is defined analogously; the c regions are obtained by applying a k-means clustering algorithm to all check-in locations of all users in the social network;
3) comprehensive closeness between adjacent users: the structure-based closeness and the behavior-based closeness between adjacent users are aggregated with weight factors to obtain the comprehensive closeness, calculated as follows:
$I_{uv} = w_S \cdot S_{uv} + w_B \cdot B_{uv}$
where w_S and w_B denote the weight factors of the structure-based closeness and the behavior-based closeness, respectively, and w_S + w_B = 1;
the deep autoencoder design and the deep-autoencoder-based community discovery method are as follows:
1) deep autoencoder design: the comprehensive closeness between adjacent users yields a closeness matrix I over all users; this matrix is taken as the input of the deep autoencoder and passed through several hidden layers and finally the output layer to obtain the latent low-dimensional representations of all users, the parameters of the deep autoencoder being learned by back-propagation;
2) k-means clustering: a k-means clustering algorithm is applied to the latent low-dimensional representations of the users, and the resulting K clusters are taken as the final K communities of the social network;
step 4: for the target users, using the community structure to design a community-structure-based influence maximization method that finds the seed user set, i.e., the opinion leader set, for a specific online query, so as to maximize its influence spread over the whole network.
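The comprehensive closeness of step 3 can be illustrated with the following Python sketch. Only the weighted sum I_uv = w_S · S_uv + w_B · B_uv is given as text in the claim; the formulas for S_uv and B_uv are images in the source, so the Jaccard coefficient over friend sets and the cosine similarity of region preference vectors used here are assumptions, and all function names are hypothetical.

```python
import math


def structural_closeness(F_u, F_v):
    """Assumed form of S_uv: shared friends over all friends (Jaccard)."""
    union = F_u | F_v
    return len(F_u & F_v) / len(union) if union else 0.0


def behavioral_closeness(r_u, r_v):
    """Assumed form of B_uv: cosine similarity of region preference vectors."""
    dot = sum(a * b for a, b in zip(r_u, r_v))
    norm = math.sqrt(sum(a * a for a in r_u)) * math.sqrt(sum(b * b for b in r_v))
    return dot / norm if norm else 0.0


def combined_closeness(F_u, F_v, r_u, r_v, w_s=0.5):
    """I_uv = w_S * S_uv + w_B * B_uv with w_S + w_B = 1."""
    w_b = 1.0 - w_s
    return (w_s * structural_closeness(F_u, F_v)
            + w_b * behavioral_closeness(r_u, r_v))


# Two users sharing 2 of 4 distinct friends and identical check-in regions.
closeness = combined_closeness({1, 2, 3}, {2, 3, 4}, (1.0, 0.0), (1.0, 0.0))
```

Evaluating `combined_closeness` for every pair of adjacent users would fill the closeness matrix I that the claim feeds to the deep autoencoder.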
2. The method of claim 1, wherein the method comprises:
in step 4, finding the seed user set for a specific online query with the community-structure-based influence maximization method comprises storing the influence of users by creating an offline index structure, and determining the seed users with the offline index and a community-structure-based seed user determination method; the offline indexing method comprises the following steps:
creating a community-based influence index for each tree node of the PR tree, wherein the index structure consists of a plurality of influence lists, and each influence list stores the influence of users in one community;
a community-based user index is created for each tree node in the PR tree, with the index structure storing the influence of each user in the community in which they reside.
3. The method of claim 1, wherein the method comprises:
in the step 4, the seed user set for a specific online query is found by using an influence maximization method based on a community structure, an offline index structure is created by the method to store the influence of the user, and the seed user is determined by using a seed user determination algorithm based on the offline index and the community structure; the seed user determination algorithm based on the community structure comprises the following two steps of priority queue construction and rapid seed user determination:
for online query, determining tree nodes which are completely covered by a query region and regions which are not completely covered by the tree nodes by traversing the PR tree; aiming at tree nodes completely covered by the query area, constructing an influence list according to the created offline index, and storing users and influences thereof; constructing an influence list aiming at an area which is not completely covered by the tree nodes; aggregating the lists, and creating an influence list for each community; finally, constructing a priority queue according to the influence list of each community;
given a query Q = (R, k), a priority queue is constructed as follows: the nodes of the PR tree are traversed from the root to determine the set of tree nodes completely covered by the query region R, i.e., R_Q = {R_1, R_2, …, R_i, …, R_r}, and the region not completely covered by tree nodes, R_0 = R - R_Q; the following operations are then performed on R_Q and R_0:
Figure FDA0003946901720000031
for R_Q:
(1) for each tree node R_i ∈ R_Q, and for
Figure FDA0003946901720000041
select the list I(R_i, C_j), where m is the number of communities;
(2) for each community C_j, aggregate its lists on each tree node in the set R_Q, namely I(R_1, C_j), I(R_2, C_j), …, I(R_r, C_j), to obtain the list I(R_Q, C_j); the list I(R_Q, C_j) is likewise composed of a number of two-tuples
Figure FDA0003946901720000042
where
Figure FDA0003946901720000043
denotes the influence of user u on all users across all tree nodes in the set R_Q, calculated as follows:
Figure FDA0003946901720000044
where
Figure FDA0003946901720000045
denotes the set of users who have a location preference for region R_Q and are in community C_j;
Figure FDA0003946901720000046
for R_0:
(1) let
Figure FDA0003946901720000047
denote the set of users who have a location preference for region R_0 and are in community C_j, and let
Figure FDA0003946901720000048
denote the set of influencers who can influence the users in
Figure FDA0003946901720000049
for
Figure FDA00039469017200000410
compute its influence on the users in
Figure FDA00039469017200000411
namely:
Figure FDA00039469017200000412
(2) for each community C_j, create a list I(R_0, C_j) composed of a number of two-tuples of the form
Figure FDA00039469017200000413
Figure FDA00039469017200000414
finally, for each community C_j, the lists I(R_Q, C_j) and I(R_0, C_j) are aggregated to obtain the list I(C_j); the list I(C_j) is composed of a number of two-tuples
Figure FDA00039469017200000415
and:
Figure FDA00039469017200000416
where
Figure FDA00039469017200000417
denotes the set of users who have a location preference for query Q and are in community C_j;
in this way, m lists I(C_1), I(C_2), …, I(C_m) are obtained for the m communities; the users in these m lists form the candidate seed user set; from these m lists, m priority queues H(C_1), H(C_2), …, H(C_j), …, H(C_m) are created, each priority queue H(C_j) containing a number of triplets
Figure FDA00039469017200000418
the seed user determination method below uses the states of the users and finds the seed user set quickly by updating the priority queues;
using the created priority queues, the seed users are found iteratively, one after another, in a greedy manner; in each iteration, the seed user of the current round is determined according to the states of the users in the priority queues; for the first seed user, the user with the largest influence among the first users of the priority queues is selected as the seed user; for subsequent seed users: if the state of the first user of a priority queue is "invalid", its estimated marginal influence is calculated and its position in the queue is updated; if its state is "estimate", its true marginal influence is calculated and its position is updated; if its state is "exact", that user is the seed user determined for the round;
given a query Q, for a user u in community C_j, the upper bound of its marginal influence under the current seed user set S_j is calculated as follows:
Figure FDA0003946901720000051
where
Figure FDA0003946901720000052
is taken from the influence of user u stored in the priority queue, and
Figure FDA0003946901720000053
denotes the set of users who have a location preference for query Q and are in community C_j.
CN202011404347.4A 2020-12-04 2020-12-04 Community structure-based location awareness influence maximization method Active CN112508725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011404347.4A CN112508725B (en) 2020-12-04 2020-12-04 Community structure-based location awareness influence maximization method

Publications (2)

Publication Number Publication Date
CN112508725A CN112508725A (en) 2021-03-16
CN112508725B true CN112508725B (en) 2023-02-17

Family

ID=74968406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011404347.4A Active CN112508725B (en) 2020-12-04 2020-12-04 Community structure-based location awareness influence maximization method

Country Status (1)

Country Link
CN (1) CN112508725B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971345A (en) * 2016-01-08 2017-07-21 车海莺 A kind of location recommendation method based on position social networks
CN107123056A (en) * 2017-03-03 2017-09-01 华南理工大学 A kind of location-based social big data information maximization method
CN107123055A (en) * 2017-03-03 2017-09-01 华南理工大学 A kind of social big data information maximization method based on PageRank
CN107291860A (en) * 2017-06-09 2017-10-24 北京邮电大学 Seed user determines method
CN107766462A (en) * 2017-09-28 2018-03-06 重庆大学 Point of interest based on user preference, social credit worthiness and geographical position recommends method
CN110069718A (en) * 2019-04-15 2019-07-30 哈尔滨工程大学 A kind of social networks rumour suppressing method based on theme
CN110334293A (en) * 2019-07-12 2019-10-15 吉林大学 A kind of facing position social networks has Time Perception position recommended method based on fuzzy clustering
CN110838072A (en) * 2019-10-24 2020-02-25 华中科技大学 Social network influence maximization method and system based on community discovery

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Modeling User Mobility for Location Promotion in Location-based Social Networks;Wen-Yuan Zhu 等;《Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining》;20151231;第1573-1582页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant