CN112508725B

CN112508725B - Community structure-based location awareness influence maximization method

Info

Publication number: CN112508725B
Application number: CN202011404347.4A
Authority: CN
Inventors: 李晓; 彭岩; 王洁; 赵成安; 王康
Original assignee: Capital Normal University
Current assignee: Capital Normal University
Priority date: 2020-12-04
Filing date: 2020-12-04
Publication date: 2023-02-17
Anticipated expiration: 2040-12-04
Also published as: CN112508725A

Abstract

The invention discloses a location-aware influence maximization method based on community structure, comprising the following steps. Step 1: in a social network, define the location-aware influence maximization problem, in which location-aware opinion leaders are found through an online query. Step 2: preprocess the user data in the social network, design a PR tree index structure according to the users' check-in records in the social network, and model the users' location preferences so that the target users of an online query can be found quickly. Step 3: design a deep-learning-based community discovery method that finds the community structure of the network from the given social network structure and the users' check-in records. Step 4: for the target users, use the community structure to design a community-structure-based influence maximization method that finds a seed user set (that is, a set of opinion leaders) for a specific online query, so as to maximize the influence spread of the online query in the whole network.

Description

Community structure-based location awareness influence maximization method

Technical Field

The invention is used to find opinion leaders in a network when carrying out word-of-mouth marketing on a social network platform, and belongs to the field of social network analysis and data mining.

Background

Influence maximization is a basic method for word-of-mouth marketing in social networks. It aims to find the opinion leaders in a social network so that, through them, a marketed product spreads from one user to ten and from ten to a hundred, and users in the social network come to know and purchase the product. The influence maximization problem is described as follows: given a social network, find a certain number of influential users (that is, a set of opinion leaders) and use this set of users as the starting point for propagating marketing information, thereby maximizing the propagation of the information in the network. Compared with other marketing modes, word-of-mouth marketing has advantages such as low cost, high propagation speed and wide propagation range, and is favored by a large number of merchants.

With the popularization of the mobile internet and mobile devices, online social networks have gradually diversified, and location-based social networks have appeared. In such social networks, users tend to share their own location information. In recent years, there has been increasing demand for word-of-mouth marketing of products that contain location information. For example, a newly opened restaurant wants to promote itself on a public review website by means of word-of-mouth marketing. The marketing strategy is to provide free meal vouchers to a specific number of users, who then share their dining experience on the website, so that information about the restaurant spreads quickly on the website and more users eventually learn of the restaurant and go there to eat. The problem is how to find a group of a given number of users (called opinion leaders) such that the largest number of users in the network become aware of the restaurant and go there for a meal. Note in particular that the location preferences of users must be taken into account when solving this problem.

At present, researchers have considered integrating location information into the influence maximization problem, and various location-aware influence maximization methods have been proposed. One method considers the historical movement trajectories of users in a location-based social network, proposes a two-stage information propagation model, and designs two heuristic algorithms to find a group of users with the greatest influence. However, it does not consider the interaction between the selected users, which reduces the spread of the information over the whole network. For location-based social networks, there is also work that searches for a group of users with the greatest influence within a certain area. Other work considers the distance between the user and the promoted product, defines a distance-aware influence maximization problem, and proposes an approximation algorithm to solve it. For the influence maximization problem in location-based social networks, an influence spread framework based on multi-objective optimization has also been proposed, together with a greedy approximation algorithm and a heuristic particle swarm algorithm to solve it.

However, the above literature assumes that a user has a location preference for only one area, whereas in reality a user tends to have location preferences for multiple areas. In addition, most of the literature uses a greedy algorithm to solve the problem, which results in low running efficiency.

Disclosure of Invention

The invention aims, for a location-aware marketed product in a social network, to quickly find a set of highly influential users according to the location preferences of users, so as to maximize the diffusion and propagation of the marketed product in the social network and meet the word-of-mouth marketing needs of merchants.

To solve the above problems, the invention defines a new location-aware influence maximization problem and provides an efficient method for solving it.

The technical solution of the invention is a location-aware influence maximization method based on community structure, which comprises the following steps:

step 1: in a social network, defining the location-aware influence maximization problem, in which location-aware opinion leaders are found through an online query;

step 2: preprocessing the user data in the social network, and modeling the location preferences of users with a PR tree index structure according to the users' check-in records in the social network, so as to quickly find the target users of an online query;

step 3: designing a deep-learning-based community discovery method to find the community structure of the network according to the given social network structure and the users' check-in records;

step 4: for the target users, using the community structure to design a community-structure-based influence maximization method that finds a seed user set, namely an opinion leader set, for a specific online query, so as to maximize the influence spread of the online query in the whole network.

Further, the location-aware influence maximization problem is as follows: for a social network, given the maximum influence tree information propagation model, a user set S composed of k users needs to be found such that the influence spread of this group of users on the target users in the social network is maximized, wherein the target users of the query are the users who have a location preference for the query area.

Further, in step 2, the location preference of the user is modeled with a PR tree index structure, which specifically includes the following steps:

step (2.1): creating the PR tree structure, where the PR tree is built on the basis of the R-tree and its nodes store the location preferences of users;

step (2.2): searching the PR tree for the online query, and quickly finding the target users of the query through pruning operations.

Further, in the step 2, a seed user set for a specific online query is found by using an influence maximization method based on a community structure, influence of the user is stored by creating an offline index structure, and a seed user is determined based on the offline index method, the community discovery method and a seed user determination method based on the community structure; the offline indexing method comprises the following steps:

creating a community-based influence index for each tree node of the PR tree, wherein the index structure consists of a plurality of influence lists, and each influence list stores the influence of users in one community;

a community-based user index is created for each tree node in the PR tree, with the index structure storing the influence of each user in the community in which they reside.

Further, in the step 3, the community structure of the network is determined by using the structure of the social network and the check-in record of the user, by using a community discovery method based on deep learning, and the seed user is determined by using an offline indexing method, a community discovery method and a seed user determination method based on the community structure; the community discovery method comprises the following steps:

based on the social network structure information, calculating the structure-based closeness between adjacent users from the number of common neighbors between the users; calculating the behavior-based closeness between adjacent users from the check-in information of the users in the social network; and calculating the combined closeness between adjacent users by means of a weight function;

taking the combined closeness matrix among users as input, and designing a community discovery method based on a deep autoencoder to obtain a latent low-dimensional representation of the users;

obtaining the community structure of the social network from the low-dimensional representation of the users in combination with a k-means clustering algorithm.

Further, in the step 4, a seed user set for a specific online query is found by using an influence maximization method based on a community structure, an offline index structure is created by the method to store the influence of the user, and the seed user is determined based on the offline index, a community discovery method and a seed user determination algorithm based on the community structure; the seed user determination algorithm based on the community structure comprises the following steps:

for online query, determining tree nodes which are completely covered by a query region and regions which are not completely covered by the tree nodes by traversing the PR tree; aiming at tree nodes completely covered by the query area, constructing an influence list according to the created offline index, and storing users and influences thereof; constructing an influence list aiming at an area which is not completely covered by the tree nodes; aggregating the above lists, and creating an influence list for each community; finally, constructing a priority queue according to the influence list of each community;

using the constructed priority queues in a greedy manner, the seed users are found iteratively one by one, and in each iteration the seed user of the current round is determined according to the states of the users in the priority queues. For the first seed user, the head user with the largest influence among the priority queues is selected as the seed user. For each subsequent seed user: if the state of the head user of a priority queue is "invalid", his estimated marginal influence is calculated and his position in the queue is updated; if the state is "estimate", the true marginal influence of the user is calculated and his position is updated; if the state is "exact", the user is the seed user determined for the round.

Advantageous effects:

compared with the existing influence maximization method, the method disclosed by the invention has the following advantages:

1) Compared with traditional influence maximization methods, the method takes into account the location information of the query and the location preferences of users, so the seed user set found is more effective and can meet the needs of viral marketing.

2) Compared with the existing method based on the greedy algorithm, the method has higher efficiency and greatly saves the running time.

3) Compared with the existing method based on the heuristic algorithm, the method has better effect, and can greatly improve the influence range of the seed user set in the social network.

4) The method and the device can better balance the efficiency and the effect of the influence maximization method, and can ensure the influence range of the seed user set in the social network while improving the efficiency.

Drawings

FIG. 1 is a schematic diagram of a process framework;

FIG. 2 is a diagram of a PR tree structure;

FIG. 3 shows the community-based influence index of node R₄ in the PR tree;

FIG. 4 shows the community-based user index of node R₄ in the PR tree.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by a person skilled in the art based on the embodiments of the present invention belong to the protection scope of the present invention without creative efforts.

According to an embodiment of the present invention, as shown in fig. 1, a method for maximizing location awareness influence based on a community structure is provided, which includes the following steps:

step 2: preprocessing user data in the social network, and modeling the position preference of a user by utilizing a PR tree index structure according to the sign-in record of the user in the social network so as to quickly find a target user for online query;

The social network is composed of a plurality of communities, the connection among nodes in the communities is relatively tight, and the connection among the communities is relatively sparse; the following describes the definition of the influence maximization problem of location awareness, the PR tree index structure, the community discovery method, and the influence maximization method based on the community structure.

Location-aware impact maximization problem definition

Query model: for a product to be promoted through online word-of-mouth marketing, an online query Q = (R, k) is defined, where R denotes the query region and k denotes the size of the seed user set to be found. A set of users with greater influence is referred to as a seed user set.

Data model: according to a user's check-in records in the social network, the user's location preference for query Q can be expressed as the frequency with which he checks in within the query region.
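As an illustration of the data model, the following minimal sketch computes a location preference as the fraction of a user's check-ins that fall inside a rectangular query region. The function name `location_preference`, the point and rectangle encodings, and the choice of a relative frequency (check-ins in the region divided by total check-ins) are assumptions made for this example.

```python
def location_preference(checkins, region):
    """Fraction of a user's check-ins that fall inside a rectangular region.

    checkins: list of (x, y) check-in coordinates for one user (hypothetical format).
    region:   (xmin, ymin, xmax, ymax) rectangle standing in for the query region R.
    """
    if not checkins:
        return 0.0
    xmin, ymin, xmax, ymax = region
    inside = sum(1 for x, y in checkins
                 if xmin <= x <= xmax and ymin <= y <= ymax)
    return inside / len(checkins)


# Three of the four check-ins lie inside the region.
pref = location_preference([(1, 1), (2, 2), (9, 9), (1, 3)], (0, 0, 5, 5))
print(pref)
```

A user with no check-ins in the region gets preference 0 and is therefore not a target user of the query.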

Location-aware influence maximization problem: for a social network, given the maximum influence tree information propagation model, the location-aware influence maximization problem is defined as finding a set S of k users such that this set of users has the largest influence on the target users in the social network, where the target users of the query are the users who have a location preference for the query area, that is:

$$\sigma_{RA}(T, Q) = \sum_{v \in V} P(T, v) \cdot \gamma(v, Q) \qquad (2)$$

where σ_RA(T, Q) denotes the influence spread of the user set T in the social network; P(T, v) denotes the influence probability of the user set T on user v, calculated with the existing maximum influence tree information propagation algorithm; γ(v, Q) denotes the location preference of user v for the query Q; and V denotes the set of users in the social network.
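Formula (2) is a preference-weighted sum and can be sketched directly. In the example below, `influence_spread` follows formula (2), while `toy_prob` is only a stand-in for P(T, v): it combines hypothetical pairwise probabilities as if they were independent, and is not the maximum influence tree computation referred to in the text. All names and numbers are illustrative assumptions.

```python
def influence_spread(seed_set, users, influence_prob, preference):
    """sigma_RA(T, Q) = sum over v in V of P(T, v) * gamma(v, Q), as in formula (2)."""
    return sum(influence_prob(seed_set, v) * preference.get(v, 0.0) for v in users)


# Hypothetical pairwise influence probabilities (assumption, not from the source).
pair_prob = {("a", "b"): 0.5, ("a", "c"): 0.2, ("b", "c"): 0.5}


def toy_prob(seed_set, v):
    """Stand-in for P(T, v): a seed is influenced with probability 1; otherwise
    the pairwise probabilities are combined as independent attempts."""
    if v in seed_set:
        return 1.0
    miss = 1.0
    for u in seed_set:
        miss *= 1.0 - pair_prob.get((u, v), 0.0)
    return 1.0 - miss


users = ["a", "b", "c"]
gamma = {"a": 0.0, "b": 1.0, "c": 0.5}   # location preferences for the query
spread = influence_spread({"a"}, users, toy_prob, gamma)
print(round(spread, 6))
```

Users with zero preference for the query contribute nothing to the sum, which is how the target users enter the objective.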

Theorem 1. The location-aware influence maximization problem is NP-hard, and the influence spread σ_RA(T, Q) is monotone and submodular.

Based on Theorem 1, the traditional greedy algorithm can be extended to solve the problem. However, the greedy algorithm has a long running time because it must traverse the users to compute the marginal influence of each one. A fast algorithm therefore needs to be designed to solve the location-aware influence maximization problem.
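For reference, the traditional greedy baseline mentioned above can be sketched as follows. The spread function is passed in as a callable; the toy spread used here is a weighted-coverage function (monotone and submodular) chosen only for illustration, and the names `greedy_seeds`, `reach` and `weight` are assumptions.

```python
def greedy_seeds(candidates, k, spread):
    """Plain greedy selection: each round re-evaluates the marginal gain of
    every remaining candidate, which is what makes the baseline slow."""
    seeds = []
    for _ in range(k):
        base = spread(seeds)
        best, best_gain = None, float("-inf")
        for u in candidates:
            if u in seeds:
                continue
            gain = spread(seeds + [u]) - base
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:
            break
        seeds.append(best)
    return seeds


# Toy weighted-coverage spread (hypothetical data).
reach = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"w"}}
weight = {"x": 1.0, "y": 0.5, "z": 0.8, "w": 0.4}


def toy_spread(seeds):
    covered = set()
    for u in seeds:
        covered |= reach[u]
    return sum(weight[v] for v in covered)


chosen = greedy_seeds(["a", "b", "c"], 2, toy_spread)
print(chosen)
```

Each round costs one spread evaluation per remaining candidate, which is the overhead the indexing and priority-queue design in the later sections is meant to avoid.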

PR tree index structure

Since the defined location-aware influence maximization problem takes a set of users having a preference for the query location as target users, solving the location-aware influence maximization problem requires first finding the target users of the query quickly. The PR tree index structure is based on an R-tree, with nodes of the tree storing the user's location preferences. And for the query Q, traversing the whole tree in a depth-first mode from the root node, and quickly filtering out tree nodes which do not intersect with the query region R through a pruning strategy.

In the PR tree, a leaf node O stores several elements of the form <PD, M>, where PD is a pointer to a document D and M is a minimum bounding rectangle (MBR). Let E be an element of leaf node O. The document D pointed to by the pointer PD of E contains the following: U, the set of users with a preference for M; TN, the total number of check-ins of each user in U; and LN, the number of check-ins of each user in U within the region M.

In addition, the leaf node O contains a pointer to a pseudo document, which is formed by aggregating the documents of all elements on O.

In the PR tree, a non-leaf node N stores several elements of the form <PC, M>, where PC is a pointer to a child node of N and M is the aggregate of the minimum bounding rectangles of all elements on that child node. A non-leaf node also contains a pointer to a pseudo document, which is aggregated from the pseudo documents of all its child nodes. FIG. 2 shows a PR tree structure containing pseudo documents.

Given a query Q, the following describes how to find the target users quickly by traversing the PR tree and calculating each user's preference for the query. For a tree node B, B.D denotes its pseudo document and B.M denotes its minimum bounding rectangle. Since the target users of query Q are the users with a location preference for the query region R:

1. If the intersection of B.M and R is empty, none of the users stored in node B is a target user of the query. Node B can therefore be pruned, and none of its child nodes needs to be traversed.

2. If B.M intersects R:

1) If B.M is completely contained in R, all users stored on node B are target users of query Q. It is then unnecessary to traverse the child nodes of B; only the pseudo document B.D needs to be accessed to calculate the location preference of each target user for Q.

2) If B.M is not completely contained in R, the child nodes of node B need to be visited in turn. In particular, if node B is a leaf node, all elements on it need to be accessed.

The above process is repeated until all target users are found and their location preferences for the query are calculated. In particular, if a target user is stored in multiple tree nodes, the user's location preference for the query is calculated by summing their preferences for each tree node. Through the pruning process, the target user of the query can be quickly found.
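The pruning rules above can be sketched as a depth-first traversal. This is a simplified illustration under stated assumptions: the `Node` class, its `doc` field (user mapped to a pair of check-ins inside the MBR and total check-ins, mirroring LN and TN) and the handling of partially overlapping bottom-level entries (collected in a separate list rather than resolved) are hypothetical choices for the example, not the structure defined by the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    mbr: tuple                                      # (xmin, ymin, xmax, ymax)
    doc: dict                                       # user -> (LN inside mbr, TN total)
    children: list = field(default_factory=list)    # empty for bottom-level entries


def disjoint(a, b):
    return a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1]


def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def find_targets(root, region):
    """Return (preferences of target users, entries only partly inside region)."""
    prefs, partial = {}, []
    stack = [root]                        # explicit stack gives a depth-first walk
    while stack:
        node = stack.pop()
        if disjoint(node.mbr, region):
            continue                      # rule 1: prune the whole subtree
        if contains(region, node.mbr):
            # rule 2.1: read the (pseudo) document, do not descend further
            for user, (ln, tn) in node.doc.items():
                prefs[user] = prefs.get(user, 0.0) + ln / tn
            continue
        if node.children:
            stack.extend(node.children)   # rule 2.2: visit the children in turn
        else:
            partial.append(node)          # left for separate handling
    return prefs, partial


leaf1 = Node((0, 0, 2, 2), {"u1": (2, 4), "u2": (1, 2)})
leaf2 = Node((3, 3, 5, 5), {"u1": (1, 4)})
leaf3 = Node((8, 8, 9, 9), {"u3": (5, 5)})
root = Node((0, 0, 9, 9), {"u1": (3, 4), "u2": (1, 2), "u3": (5, 5)},
            [leaf1, leaf2, leaf3])
prefs, partial = find_targets(root, (0, 0, 6, 6))
print(prefs, partial)
```

User u1 appears in two covered nodes, so his preference is the sum of the per-node preferences, as described in the paragraph above.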

Community discovery method based on deep learning

The deep-learning-based community discovery method uses the structural information of the social network and the check-in records of its users to calculate the closeness between adjacent users, then obtains a low-dimensional representation of the users with a deep autoencoder, and finally obtains the community structure of the social network with a k-means clustering algorithm. The closeness calculation between users and the community discovery algorithm based on the deep autoencoder are introduced in turn below.

1. Closeness calculation between users

1) Structure-based closeness between neighboring users: according to the topological structure information of the social network, the structure-based closeness between the adjacent users is calculated by utilizing the number of common neighbors between the users, and if the two users have more common neighbors, the structure-based closeness between the users is higher. For a pair of adjacent users (u, v), the closeness between them based on the structure is calculated as follows,

where F_u denotes the set of friends of user u in the social network, and F_v denotes the set of friends of user v in the social network.

2) Closeness between adjacent users based on behavior: in a social network, a user can check in, and if the two users have similar behaviors, the closeness between the two users is higher. Therefore, the historical check-in behavior records of the users are utilized to calculate the closeness among the users based on the behaviors. For a pair of adjacent users (u, v), the closeness between them based on behavior is calculated as follows,

where r_u denotes the regional preference vector of user u, r_u = (r_u1, r_u2, ..., r_uc), and r_uc denotes the ratio of the number of check-ins of user u in area c to his total number of check-ins; r_v is defined in the same way. The c areas are obtained by applying a k-means clustering algorithm to all check-in locations of all users in the social network.

3) Combined closeness between adjacent users: the structure-based closeness and the behavior-based closeness between adjacent users are aggregated with weight factors to obtain the combined closeness between adjacent users, calculated as follows:

$$I_{uv} = w_S \cdot S_{uv} + w_B \cdot B_{uv}$$

where w_S and w_B are the weight factors of the structure-based closeness and the behavior-based closeness respectively, and w_S + w_B = 1.
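The three closeness measures can be sketched as follows. Only the weighted combination I_uv = w_S·S_uv + w_B·B_uv is taken from the text. The formulas for S_uv and B_uv are not reproduced in this copy, so the sketch substitutes the Jaccard coefficient of the friend sets and the cosine similarity of the regional preference vectors; both are assumptions for illustration, as are all function names.

```python
import math


def structural_closeness(friends_u, friends_v):
    """Assumed stand-in for S_uv: Jaccard coefficient of the two friend sets,
    which grows with the number of common neighbors."""
    union = friends_u | friends_v
    return len(friends_u & friends_v) / len(union) if union else 0.0


def behavioral_closeness(r_u, r_v):
    """Assumed stand-in for B_uv: cosine similarity of the regional
    preference vectors r_u and r_v."""
    dot = sum(a * b for a, b in zip(r_u, r_v))
    norm = math.sqrt(sum(a * a for a in r_u)) * math.sqrt(sum(b * b for b in r_v))
    return dot / norm if norm else 0.0


def combined_closeness(s_uv, b_uv, w_s=0.5):
    """I_uv = w_S * S_uv + w_B * B_uv with w_S + w_B = 1 (from the text)."""
    return w_s * s_uv + (1.0 - w_s) * b_uv


s = structural_closeness({"a", "b", "c"}, {"b", "c", "d"})   # 2 shared of 4
b = behavioral_closeness((1.0, 0.0), (1.0, 0.0))             # identical behavior
i_uv = combined_closeness(s, b, w_s=0.5)
print(s, b, i_uv)
```

Evaluating `combined_closeness` for every pair of adjacent users fills the closeness matrix that the next subsection uses as input.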

2. Community discovery algorithm based on a deep autoencoder

1) Deep autoencoder design: the combined closeness between adjacent users yields the closeness matrix I of all users. This matrix is taken as the input of the deep autoencoder and passed through its hidden layers, and the latent low-dimensional representation of all users is finally obtained through the output layer. The parameters of the deep autoencoder are learned by backpropagation.

2) k-means clustering algorithm: the k-means clustering algorithm is applied to the latent low-dimensional representations of the users, and the resulting K clusters are taken as the final K communities of the social network.
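The clustering stage can be sketched with a small, dependency-free k-means (Lloyd's algorithm). The embeddings below are made-up two-dimensional points standing in for the autoencoder output, which is not implemented here; the function name, the random initialization and the iteration cap are assumptions for the example.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Minimal Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break                                   # assignments are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                             # keep old centroid if empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels


# Hypothetical low-dimensional user embeddings (stand-in for autoencoder output).
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
labels = kmeans(embeddings, k=2)
print(labels)
```

Users that share a label form one community; the label values themselves carry no meaning.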

Community structure-based location awareness influence maximization method

The community-structure-based location-aware influence maximization method uses the community structure information of the social network to approximate a user's influence in the whole social network by his influence within his community, which improves solving efficiency. To improve efficiency further, the method creates an offline index structure to store the influence of users. The offline index and the community-structure-based seed user determination algorithm are introduced in turn below.

1. Offline index

1) Community-based influence index: each tree node in the PR tree stores a community-based influence index. In this index structure, for a node R_i in the PR tree, each community C_j corresponds to an influence list I(R_i, C_j). The list stores the influence of each user of the community, with users arranged in descending order of influence. The user influence in the influence list of community C_j on tree node R_i is calculated as follows:

(1) Find the set of users who belong to community C_j and are located at tree node R_i.

(2) Compute the set of influencers, that is, the users who can influence the users in the set found in (1).

(3) For each user in the set of influencers, calculate in advance his influence on the set of users found in (1), where γ(v, R_i) denotes the location preference of user v stored at tree node R_i, obtained from the PR tree index structure.

(4) Tree node R_i and community C_j thus have a descending influence list I(R_i, C_j), which consists of a plurality of tuples.

For example, FIG. 3 shows the community-based influence index of node R₄ in the PR tree.
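A possible construction of the community-based influence index is sketched below. The exact influence formula is not reproduced in this copy; by analogy with formula (2), the sketch assumes that the influence of u on the users of community C_j at node R_i is the sum of P(u, v)·γ(v, R_i) over those users v, and that influencers are restricted to the same community. The data layout and all names are hypothetical.

```python
def build_influence_index(node_prefs, community_of, pair_prob):
    """Build descending influence lists I(R_i, C_j).

    node_prefs:   node id -> {user v: gamma(v, R_i)}
    community_of: user -> community id
    pair_prob:    (u, v) -> assumed influence probability of u on v
    """
    scores = {}
    for node, prefs in node_prefs.items():
        for (u, v), p in pair_prob.items():
            # Only users stored at the node, influenced from inside their community.
            if v in prefs and community_of[u] == community_of[v]:
                per_user = scores.setdefault((node, community_of[v]), {})
                per_user[u] = per_user.get(u, 0.0) + p * prefs[v]
    # Each list holds (user, influence) tuples sorted by descending influence.
    return {key: sorted(per_user.items(), key=lambda kv: -kv[1])
            for key, per_user in scores.items()}


node_prefs = {"R4": {"b": 1.0, "c": 0.5}}
community_of = {"a": 0, "b": 0, "c": 0, "d": 1}
pair_prob = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "c"): 0.5, ("d", "b"): 0.9}
index = build_influence_index(node_prefs, community_of, pair_prob)
print(index)
```

Because the lists depend only on the tree nodes and the communities, not on any particular query, they can be computed once offline and reused for every online query.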

2) Community-based user index: each tree node in the PR tree also stores a community-based user index. For each user u, there is a list containing multiple triples, and each triple in the list represents the influence of user u on a set of users on the node.

For example, FIG. 4 shows the community-based user index of tree node R₄.

2. Seed user determination algorithm based on community structure

Based on the designed offline index structure and greedy algorithm, a seed user determination algorithm based on a community structure is designed, and a seed user set aiming at online query is quickly found in an iterative manner. The seed user determination algorithm based on the community structure comprises two steps of priority queue construction and rapid seed user determination.

1) Priority queue construction

Given a query Q = (R, k), the priority queues are constructed by first traversing the PR tree from the root node to determine the set of tree nodes that are completely covered by the query region R, namely R_Q = {R₁, R₂, ..., R_i, ..., R_r}, and the region that is not completely covered by tree nodes, R₀ = R - R_Q. The following operations are then performed on R_Q and R₀ respectively.

For R_Q:

(1) For each tree node R_i ∈ R_Q and each community C_j, select the list I(R_i, C_j), where m is the number of communities.

(2) For each community C_j, aggregate its lists on all tree nodes in the set R_Q, namely I(R₁, C_j), I(R₂, C_j), ..., I(R_r, C_j), to obtain the list I(R_Q, C_j). The list I(R_Q, C_j) also consists of a plurality of two-tuples, each of which stores a user u together with the influence of u on all users on all tree nodes in the set R_Q. This influence is computed over the set of users who have a location preference for region R_Q and are in community C_j.

For R₀:

(1) Consider the set of users who have a location preference for region R₀ and are in community C_j, together with the set of influencers who can influence these users. For each influencer, calculate his influence on these users.

(2) For each community C_j, create a list I(R₀, C_j) consisting of a plurality of two-tuples.

Finally, for each community C_j, aggregate the lists I(R_Q, C_j) and I(R₀, C_j) to obtain the list I(C_j). The list I(C_j) consists of a plurality of two-tuples, and the influence stored for each user is computed over the set of users who both have a location preference for query Q and are located in community C_j.

In this way, m lists I(C₁), I(C₂), ..., I(C_m) are obtained for the m communities. The users in these m lists form the set of candidate seed users. From these m lists, m priority queues H(C₁), H(C₂), ..., H(C_j), ..., H(C_m) are created. Each priority queue H(C_j) contains a plurality of triples, in which "invalid" represents the state of user u. The seed user determination method uses the user states to find the seed user set quickly by updating the priority queues.
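The queue construction can be sketched with Python's `heapq`. The aggregation formulas are not reproduced in this copy, so the sketch assumes that a user's influence values from the covered nodes and from the residual region R₀ are simply added per community. Entries are stored as (negated influence, user, state) so that the smallest heap element is the user with the largest influence. The input layout and names are hypothetical.

```python
import heapq


def build_queues(lists_by_region):
    """Aggregate per-region influence lists into one priority queue per community.

    lists_by_region: {(region id, community id): [(user, influence), ...]}
                     covering the fully covered tree nodes and the residual region.
    Returns {community id: heap of (-influence, user, "invalid")}.
    """
    totals = {}
    for (_, community), entries in lists_by_region.items():
        per_user = totals.setdefault(community, {})
        for user, influence in entries:
            # Assumed aggregation: sum the user's influence over all regions.
            per_user[user] = per_user.get(user, 0.0) + influence
    queues = {}
    for community, per_user in totals.items():
        heap = [(-influence, user, "invalid") for user, influence in per_user.items()]
        heapq.heapify(heap)
        queues[community] = heap
    return queues


lists_by_region = {
    ("R1", 0): [("a", 0.75), ("b", 0.25)],
    ("R2", 0): [("b", 1.0)],
    ("R0", 1): [("d", 0.5)],
}
queues = build_queues(lists_by_region)
print(queues[0][0], queues[1][0])
```

Every entry starts in the "invalid" state because its stored value is the initial influence, not yet a marginal influence with respect to any seed set.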

2) Rapid seed user determination method

Based on the m priority queues, the method finds the k seed users one by one with a greedy algorithm. In the conventional algorithm, the marginal influence of every user has to be calculated in each round, and the user with the largest marginal influence is then selected as the seed user of that round. (The influence mentioned in the present invention refers to the influence between two adjacent users. For example, the influence of user a on user b can represent the probability that b goes to a place if a goes there, or the probability that b replies to a post published by a. In the present invention, the influence between users is obtained with the existing, widely adopted weighting model and is calculated from the frequency of interaction between the two users in the social network.) This process is very inefficient. Therefore, the designed fast seed user selection method calculates estimates of the marginal influence for only some of the candidate seed users, which improves the efficiency of finding seed users. Because users with a larger estimated marginal influence tend to have a larger true marginal influence, in each iteration the method preferentially calculates the true marginal influence of users with a larger estimated marginal influence. In addition, while the priority queues are being updated, each user u in each queue H(C_j) can be in one of three states: "invalid", "estimate" and "exact". The state of a user is updated dynamically as required, and the seed user set is found quickly using the stored user states.

The "invalid" state means that the influence stored for user u is his initial influence. The "estimate" state means that the influence stored for user u is his estimated marginal influence, which is calculated by the fast estimation method described below. The "exact" state means that the influence stored for user u is his true marginal influence, which is calculated by the following formula:

$$\Delta\sigma_{RA}(u \mid S_j, Q) = \sigma_{RA}(\{u\} \cup S_j, Q) - \sigma_{RA}(S_j, Q) \qquad (5)$$

where S_j denotes the current set of seed users in community C_j.

Next, how to quickly find the seed user set by updating the priority queue is described in detail.

(1) Determine the first seed user. For the initial m priority queues, take the first user of each queue, giving m users, and compare their stored influence. The user with the greatest influence is selected as the first seed user and removed from his priority queue.

(2) Subsequent seed users are determined.

Since seed users may influence each other, the interaction between a candidate and the current seed user set must be considered when selecting subsequent seed users, and each round must select the user with the smallest interaction (that is, the user with the largest marginal influence). At the beginning of each round, the state of all users is reset to "invalid". For each community C_j, let I_u denote the set of users in C_j that user u can influence, and consider likewise the set of users that the current seed set S_j can influence.

In each round, take the first user of each of the m priority queues, compare their influence, and find the user u_t with the greatest influence and the queue H(C_t) in which he is located. If the set of users that u_t can influence does not intersect the set of users that S_t can influence, user u_t has no interaction with the current seed user set S_t in the community, and u_t is the seed user sought in this round. Otherwise, user u_t interacts with the current seed user set S_t in the community, and then:

■ If the state of u_t is "invalid", the state of u_t is updated to "estimated" and its stored influence is updated to its estimated marginal influence. Theorem 2 below describes how to calculate the estimated marginal influence quickly.

■ If the state of u_t is "estimated", the state of u_t is updated to "accurate" and its true marginal influence Δσ_RA(u_t | S_t, Q) is calculated by equation (5).

■ If the state of u_t is "accurate", then u_t is the seed user sought in this round; it is added to the seed user set and removed from the priority queue.

(3) Seed users are found one after another by executing steps (1) and (2) in a loop until k seed users have been found.
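By way of non-limiting illustration, the three-state update of the priority queues described above can be sketched in Python as follows. The sketch rests on several assumptions that are not part of the original text: influence is modelled as plain set coverage, the names `select_seeds`, `marginal`, `estimate`, `reach` and `influence_lists` are hypothetical, and the `estimate` function is only a stand-in upper bound, since the formula of Theorem 2 is not reproduced in this text.

```python
import heapq

INVALID, ESTIMATED, ACCURATE = "invalid", "estimated", "accurate"


def select_seeds(influence_lists, reach, k, marginal, estimate):
    """Pick up to k seed users from per-community priority queues.

    influence_lists: {community: {user: initial influence}}
    reach:           {user: set of users that user can influence}
    marginal(u, seeds):         true marginal influence, the role of equation (5)
    estimate(u, seeds, stored): upper bound on it, the role of Theorem 2
    """
    # heapq is a min-heap, so influences are stored negated.
    heaps = {c: [(-inf, u) for u, inf in users.items()]
             for c, users in influence_lists.items()}
    for heap in heaps.values():
        heapq.heapify(heap)

    seeds = {c: [] for c in heaps}       # S_j for each community
    covered = {c: set() for c in heaps}  # users already reachable from S_j
    chosen = []

    while len(chosen) < k:
        state = {}  # an absent entry means "invalid": states reset every round
        while True:
            live = [c for c in heaps if heaps[c]]
            if not live:
                return chosen  # fewer than k candidates exist
            # Compare the heads of all queues and take the largest influence.
            c = min(live, key=lambda name: heaps[name][0])
            neg_inf, u = heapq.heappop(heaps[c])
            current = state.get(u, INVALID)
            if current == ACCURATE or not (reach[u] & covered[c]):
                # No overlap with S_j, or the stored value is already exact.
                seeds[c].append(u)
                covered[c] |= reach[u]
                chosen.append(u)
                break
            if current == INVALID:
                state[u] = ESTIMATED
                value = estimate(u, seeds[c], -neg_inf)
            else:
                state[u] = ACCURATE
                value = marginal(u, seeds[c])
            heapq.heappush(heaps[c], (-value, u))  # re-position u in its queue
    return chosen


# Toy data: influence is modelled as plain set coverage (an assumption).
reach = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {7}, "d": {7, 8}}
influence_lists = {"X": {"a": 3, "b": 3}, "Y": {"c": 1, "d": 2}}


def marginal(u, seeds):
    already = set().union(*(reach[s] for s in seeds))
    return len(reach[u] - already)


def estimate(u, seeds, stored):
    # Hypothetical cheap bound: discount only the overlap with the latest seed.
    return min(stored, len(reach[u] - reach[seeds[-1]]))


result = select_seeds(influence_lists, reach, 3, marginal, estimate)
print(result)
```

In this toy run, user b overlaps with the seed a already chosen in community X, so b is first re-ranked by its estimate and only later by its true marginal influence, while user d in community Y is selected directly because it has no overlap with the empty seed set of its community.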

Theorem 2: Given a query Q, for a user u in community C_j, the upper bound of its marginal influence under the current seed user set S_j (that is, the estimated marginal influence) is calculated as

where

is taken from the influence of user u stored in the priority queue, and

denotes the set of users who have a location preference for query Q and are in community C_j.

With formula (6), the upper bound of the marginal influence of u under the current seed user set S_j can be estimated quickly.

Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, the invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art, and all such changes that make use of the inventive concept are protected, provided they do not depart from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A location awareness influence maximization method based on a community structure, characterized by comprising the following steps:

step 1: in a social network, defining the location awareness influence maximization problem, in which location-aware opinion leaders are found through an online query; the location awareness influence maximization problem is as follows: given a social network and the maximum influence tree information propagation model, a user set S consisting of k users is sought such that the influence spread of this set of users on the target users in the social network is maximized, wherein the target users of the query are the users who have a location preference for the query region;

step 2: preprocessing the user data in the social network, and modeling the location preferences of users with a PR tree index structure according to the check-in records of the users in the social network, so as to quickly find the target users of an online query; in step 2, modeling the location preferences of users with the PR tree index structure specifically comprises the following steps:

step (2.1): creating the PR tree structure, the PR tree being built on the basis of an R tree, wherein the nodes of the PR tree store the location preferences of users;

step (2.2): for an online query, searching the PR tree and quickly finding the target users of the query through pruning operations, specifically:

(2.2.1) if the intersection of B.M and R is empty, none of the users stored in node B is a target user of the query; node B can therefore be pruned, and none of its child nodes is traversed;

(2.2.2) if B.M intersects R:

1) if B.M is completely contained in R, all users stored in node B are target users of query Q; in this case it suffices to access the pseudo-document B.D to calculate the location preference of each target user for Q, without traversing the child nodes of B;

2) if B.M is not completely contained in R, the child nodes of node B are visited in turn; if node B is a leaf node, all elements stored in it are accessed;

processes 1) and 2) above are repeated until all target users have been found and their location preferences for the query have been calculated; if a target user is stored in several tree nodes, the user's location preference for the query is obtained by summing the user's preferences over those tree nodes; the target users of the query are thus found quickly through the pruning process;
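By way of non-limiting illustration, the pruning search of step (2.2) can be sketched in Python as follows. The sketch makes several assumptions that are not stated in the text: B.M is taken to be an axis-aligned bounding rectangle, the pseudo-document B.D is a mapping from users to their preference for the node, each leaf element is a tuple (user, x, y, weight), and the names `Node`, `find_targets`, `intersects` and `contains` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def intersects(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(outer, inner):
    """True if rectangle inner lies completely inside rectangle outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


@dataclass
class Node:
    mbr: tuple                                    # plays the role of B.M
    doc: dict                                     # plays the role of B.D
    children: list = field(default_factory=list)
    points: list = field(default_factory=list)    # leaf elements (assumed shape)


def find_targets(root, region):
    """Return {user: location preference for the query region}."""
    prefs = defaultdict(float)
    stack = [root]
    while stack:
        node = stack.pop()
        if not intersects(node.mbr, region):
            continue                              # (2.2.1): prune the subtree
        if contains(region, node.mbr):
            for user, p in node.doc.items():      # case 1): read B.D only
                prefs[user] += p
        elif node.children:
            stack.extend(node.children)           # case 2): visit the children
        else:
            for user, x, y, w in node.points:     # case 2), leaf: scan elements
                if region[0] <= x <= region[2] and region[1] <= y <= region[3]:
                    prefs[user] += w
    return dict(prefs)


leaf1 = Node((0, 0, 2, 2), {"u1": 0.6, "u2": 0.4},
             points=[("u1", 1, 1, 0.6), ("u2", 2, 2, 0.4)])
leaf2 = Node((3, 0, 6, 2), {"u1": 0.4, "u3": 1.0},
             points=[("u1", 3.5, 1, 0.4), ("u3", 5.5, 1, 1.0)])
leaf3 = Node((8, 8, 9, 9), {"u4": 1.0}, points=[("u4", 8.5, 8.5, 1.0)])
root = Node((0, 0, 9, 9), {}, children=[leaf1, leaf2, leaf3])

targets = find_targets(root, (0, 0, 4, 3))
print(targets)
```

In this toy tree, the first leaf lies entirely inside the query region and is answered from its pseudo-document, the second leaf is only partly covered and is scanned element by element, and the third leaf is pruned; the preference of user u1 is the sum of its contributions from the first two leaves.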

step 3: designing a community discovery method based on deep learning to find the community structure of the network from the given social network structure and the check-in records of the users; in step 3, the community structure of the social network is obtained with a deep learning method from the social network structure and the check-in records of the users, specifically:

step (3.1): calculating the structure-based closeness between adjacent users according to the social network structure;

step (3.2): calculating the behavior-based closeness between adjacent users according to the check-in records of the users;

step (3.3): designing a weight function and calculating the combined closeness between adjacent users;

step (3.4): designing a deep autoencoder according to the combined closeness between adjacent users, and calculating the low-dimensional vector representations of the users;

step (3.5): obtaining the community structure of the social network from the low-dimensional representations of the users with the k-means clustering algorithm;

the closeness between users is calculated as follows:

1) structure-based closeness between adjacent users: according to the topology of the social network, the structure-based closeness between adjacent users is calculated from the number of their common neighbors; the more common neighbors two users have, the higher the structure-based closeness between them; for a pair of adjacent users (u, v), the structure-based closeness between them is calculated as follows,

where F_u denotes the set of friends of user u in the social network and F_v denotes the set of friends of user v in the social network;

2) behavior-based closeness between adjacent users: users in the social network can check in; if two users behave similarly, the closeness between them is higher; the behavior-based closeness between two users is calculated from their historical check-in records; for a pair of adjacent users (u, v), the behavior-based closeness between them is calculated as follows,

where r_u denotes the region preference vector of user u, r_u = (r_u1, r_u2, …, r_uc), and r_uc denotes the ratio of the number of check-ins of user u in region c to the user's total number of check-ins; r_v is defined likewise; the c regions are obtained by applying the k-means clustering algorithm to all check-in locations of all users in the social network;

3) combined closeness between adjacent users: the structure-based closeness and the behavior-based closeness between adjacent users are aggregated with weight factors to obtain the combined closeness between adjacent users, calculated as follows,

$$I_{uv} = w_S \cdot S_{uv} + w_B \cdot B_{uv}$$

where w_S and w_B are the weight factors of the structure-based closeness and the behavior-based closeness, respectively, and w_S + w_B = 1;
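By way of non-limiting illustration, the weighted aggregation above can be sketched in Python as follows. The formulas for the structure-based closeness S_uv and the behavior-based closeness B_uv are not reproduced in this text; the sketch therefore substitutes the Jaccard coefficient of the friend sets and the cosine similarity of the region preference vectors as stand-ins, which is an assumption, as are the function names and the default weight w_S = 0.5.

```python
import math


def region_preference(checkins, num_regions):
    """r_u: share of the user's check-ins that fall in each region."""
    total = len(checkins)
    return [checkins.count(region) / total for region in range(num_regions)]


def structural_closeness(friends_u, friends_v):
    """Stand-in for S_uv (assumption): Jaccard coefficient of the friend sets."""
    union = friends_u | friends_v
    return len(friends_u & friends_v) / len(union) if union else 0.0


def behavioral_closeness(r_u, r_v):
    """Stand-in for B_uv (assumption): cosine similarity of r_u and r_v."""
    dot = sum(a * b for a, b in zip(r_u, r_v))
    norm = math.sqrt(sum(a * a for a in r_u)) * math.sqrt(sum(b * b for b in r_v))
    return dot / norm if norm else 0.0


def combined_closeness(s_uv, b_uv, w_s=0.5):
    """I_uv = w_S * S_uv + w_B * B_uv with w_S + w_B = 1."""
    w_b = 1.0 - w_s
    return w_s * s_uv + w_b * b_uv


# Toy example: friend sets and check-in region ids of two adjacent users.
friends_u, friends_v = {"v", "a", "b"}, {"u", "a", "c"}
r_u = region_preference([0, 0, 1, 2], num_regions=3)
r_v = region_preference([0, 0, 1, 2], num_regions=3)

s_uv = structural_closeness(friends_u, friends_v)
b_uv = behavioral_closeness(r_u, r_v)
i_uv = combined_closeness(s_uv, b_uv, w_s=0.5)
print(s_uv, b_uv, i_uv)
```

Only one weight is passed to `combined_closeness` because the constraint w_S + w_B = 1 fixes the other.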

the deep autoencoder is designed, and the community discovery method based on the deep autoencoder proceeds, as follows:

1) deep autoencoder design: the closeness matrix I of all users is obtained from the combined closeness between adjacent users and is taken as the input of the deep autoencoder; it passes through several hidden layers of the deep autoencoder and finally through the output layer, yielding the latent low-dimensional representations of all users, the parameters of the deep autoencoder being learned by back-propagation;

2) k-means clustering algorithm: the latent low-dimensional representations of the users are clustered with the k-means algorithm into K clusters, which are the final K communities of the social network;

2. The method of claim 1, wherein the method comprises:

in step 4, the seed user set for a specific online query is found with an influence maximization method based on the community structure, which comprises creating an offline index structure to store the influence of users, and determining the seed users with the offline index method and a seed user determination method based on the community structure; the offline indexing method comprises the following steps:

3. The method of claim 1, wherein the method comprises:

in step 4, the seed user set for a specific online query is found with an influence maximization method based on the community structure; the method creates an offline index structure to store the influence of users and determines the seed users with a seed user determination algorithm based on the offline index and the community structure; the seed user determination algorithm based on the community structure comprises two steps, priority queue construction and fast seed user determination:

for an online query, the PR tree is traversed to determine the tree nodes that are completely covered by the query region and the region that is not completely covered by tree nodes; for the tree nodes completely covered by the query region, influence lists storing users and their influence are constructed from the created offline index; influence lists are also constructed for the region not completely covered by tree nodes; the lists are aggregated to create one influence list for each community; finally, a priority queue is constructed from the influence list of each community;

given a query Q = (R, k), the priority queues are constructed by traversing the nodes of the PR tree from the root node, determining the set of tree nodes that are completely covered by the query region R, that is, R_Q = {R_1, R_2, …, R_i, …, R_r}, and determining the region that is not completely covered by tree nodes, R_0 = R - R_Q; the following operations are then performed on R_Q and R_0:

for R_Q:

(1) for each tree node R_i ∈ R_Q and each community C_j, 1 ≤ j ≤ m, the list I(R_i, C_j) is selected, where m is the number of communities;

(2) for each community C_j, the lists I(R_1, C_j), I(R_2, C_j), …, I(R_r, C_j) on the tree nodes in the set R_Q are aggregated to obtain the list I(R_Q, C_j); the list I(R_Q, C_j) likewise consists of a number of two-tuples

where

in which

denotes the set of users who have a location preference for region R_Q and are in community C_j;

for R_0:

(1) let

denote the set of users who have a location preference for region R_0 and are in community C_j, and let

denote the set of influencers that can influence those users; for each influencer, its influence on those users is calculated, namely:

(2) for each community C_j, a list I(R_0, C_j) consisting of a number of two-tuples is created;

finally, for each community C_j, the lists I(R_Q, C_j) and I(R_0, C_j) are aggregated to obtain the list I(C_j); the list I(C_j) consists of a number of two-tuples

where

in which

denotes the set of users who have a location preference for query Q and are in community C_j;

in this way, m lists I(C_1), I(C_2), …, I(C_m) are obtained for the m communities; the users in the m lists form the candidate seed user set; from these m lists, m priority queues H(C_1), H(C_2), …, H(C_j), …, H(C_m) are created, the priority queue H(C_j) containing a number of triples

the seed user determination method below uses the states of the users and finds the seed user set quickly by updating the priority queues;

using the constructed priority queues, seed users are found one after another in a greedy, iterative manner; in each iteration, the seed user of the current round is determined according to the states of the users in the priority queues; for the first seed user, the user with the greatest influence among the first users of the priority queues is selected as the seed user; for subsequent seed users: if the state of the first user of a priority queue is "invalid", the estimated marginal influence of that user is calculated and the user's position in the queue is updated; if the state is "estimated", the true marginal influence of the user is calculated and the user's position is updated; if the state is "accurate", the user is the seed user determined for the round;

given a query Q, for a user u in community C_j, the upper bound of its marginal influence under the current seed user set S_j is calculated as follows,

where

is taken from the influence of user u stored in the priority queue,