CN112905905A

CN112905905A - Interest point-area joint recommendation method in location social network

Info

Publication number: CN112905905A
Application number: CN202110092706.5A
Authority: CN
Inventors: 袁浩; 徐建
Original assignee: Hangzhou Dianzi University
Current assignee: Hangzhou Dianzi University
Priority date: 2021-01-22
Filing date: 2021-01-22
Publication date: 2021-06-04

Abstract

The invention belongs to the technical field of computer applications and specifically discloses a point-of-interest-region joint recommendation method for location-based social networks.

Description

Interest point-area joint recommendation method in location social network

Technical Field

The invention belongs to the technical field of computer applications and specifically relates to a point-of-interest-region joint recommendation method for location-based social networks.

Background

In recent years, location-based social networks have developed rapidly. Well-known platforms such as Foursquare have attracted nearly a billion registered users and generated a large number of historical check-in records. This data plays a unique and key role in point-of-interest recommendation.

Current point-of-interest recommendation works in two stages. It first infers a user's point-of-interest preference from the user's historical check-ins, and then recommends according to that preference. Historical check-in data, however, implies more than a preference for individual points of interest. It also implies a preference for the regions where those points are located, and this regional preference indirectly shapes the point-of-interest preference. Work in this field tends to ignore region-level interest, and with it users' region-level check-in habits. As a result, systems cannot capture users' real and complete point-of-interest preferences, which is a shortcoming of current point-of-interest recommendation techniques.

Disclosure of Invention

The invention aims to overcome a shortcoming of the prior art: it captures only part of a user's interest. The invention fully mines the region-level interest that existing point-of-interest recommendation neglects and, on that basis, provides a method for constructing a user's region-level interest. The specific technical scheme is as follows:

A point-of-interest-region joint recommendation method in a location-based social network comprises the following steps:

step 1, preprocessing user historical check-in data:

the user historical check-in data consists of a series of quintuple records l, each containing a user ID, a point-of-interest ID, longitude, latitude and visit frequency; during preprocessing, points of interest visited by at least 10 different users and users who checked in at no fewer than 15 different points of interest are selected to form the core of the dataset;

step 2, constructing a region level interest characteristic matrix R:

a region-level interest feature matrix is constructed from the processed check-in dataset; it reflects the user's interest in the regions where points of interest are located and serves as the input to region-level interest modeling in the next step;

step 3, modeling regional interest:

taking the region-level interest feature matrix as input, a logistic matrix factorization model learns a user latent vector and a point-of-interest area latent vector, and the user's region-level interest preference is obtained from the dot product of the two vectors;

step 4, adding regional dynamic weight:

since users differ in how much attention they pay to regions, a region-level interest dynamic weight is obtained with the DBSCAN clustering algorithm and combined with the region-level interest preference to express the user's real regional preference;

step 5, combining the personal interests and the region interests of the user:

while region-level interest is captured, the user's interest in each individual point of interest cannot be ignored; fusing this personal interest makes the user's interest preference as complete as possible and improves point-of-interest recommendation;

step 6, recommending the top-k points of interest to the user.

Further, the step 1 specifically includes the following steps:

step 1-1, first take the pair (user ID, point-of-interest ID) as the key for all records and the accumulated number of check-ins as the value, which gives the most basic data unit: user ID, point-of-interest ID and check-in frequency;

step 1-2, select points of interest visited by at least 10 different users and users who checked in at no fewer than 15 different points of interest to form the core of the dataset, and finally build from it a dictionary H with the record format { [user ID, point-of-interest ID]: check-in frequency }.
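The following Python sketch is a rough illustration of steps 1-1 and 1-2. It aggregates quintuple records into the dictionary H and applies the 10-user and 15-point-of-interest thresholds. The function name is hypothetical. The sketch filters in a single pass rather than repeating the filter until no record is removed; this is an assumption, because the source does not say which is intended.

```python
from collections import defaultdict

def build_checkin_dict(records, min_users_per_poi=10, min_pois_per_user=15):
    """Aggregate raw check-ins into H = {(user_id, poi_id): check-in frequency}.

    records: iterable of (user_id, poi_id, longitude, latitude, frequency).
    """
    # Step 1-1: key by (user ID, POI ID) and accumulate check-in frequencies.
    aggregated = defaultdict(int)
    for user_id, poi_id, _lon, _lat, freq in records:
        aggregated[(user_id, poi_id)] += freq

    # Count distinct users per POI and distinct POIs per user.
    users_per_poi = defaultdict(set)
    pois_per_user = defaultdict(set)
    for user_id, poi_id in aggregated:
        users_per_poi[poi_id].add(user_id)
        pois_per_user[user_id].add(poi_id)

    # Step 1-2 (single pass, an assumption): keep a pair only if both the
    # POI and the user meet their thresholds.
    return {
        (u, p): f
        for (u, p), f in aggregated.items()
        if len(users_per_poi[p]) >= min_users_per_poi
        and len(pois_per_user[u]) >= min_pois_per_user
    }
```

With small thresholds, for example `build_checkin_dict(records, 2, 2)`, the same logic can be checked on a toy dataset before running it on real check-in data.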

Further, the step 2 specifically includes the following steps:

step 2-1, first define a concept: the circular geographic area centered on a point of interest's geographic location with radius r is called the logical area of that point of interest; the number of other points of interest in this area that a user has visited affects the user's interest in the logical area;

step 2-2, matrix construction:

step 2-2-1, from the dataset obtained in step 1, construct the point-of-interest set G = (g₁, g₂, g₃, …, g_n) and a dictionary M = {g_i: [long_i, lat_i]} mapping each point-of-interest ID to its longitude and latitude; G contains every distinct point of interest, and M records each point-of-interest ID with its coordinates;

step 2-2-2, compute the distance between each point of interest and a target point of interest with the haversine formula, using a distance threshold of 10 km; while computing distances, store in a dictionary X, for each target point of interest, the set of other points of interest within the threshold; once all distances are computed, X is complete, with the record format { target point-of-interest ID: set S }, where every point of interest in S lies within the distance threshold of the target point of interest;

step 2-2-3, initializing a region level interest characteristic matrix R, and constructing the matrix R through the dictionary X, wherein the specific formula is as follows:

R_ui denotes user u's preference, in matrix R, for the area in which point of interest i is located. X_i denotes the set of points of interest within the logical area of target point of interest i. Y_u denotes the set of points of interest visited by user u, so X_i ∩ Y_u is the set of points of interest the user has visited within the logical area of the target point of interest. α is an area compensation factor. Even for the same user, the same logical-area size and the same number of visited points of interest, the user's region-level interest is still affected by whether the points of interest in the logical area are densely or sparsely distributed. The sparser the points of interest in the logical area, the more interested the user is in that area; conversely, the denser they are, the less interested the user is compared with a sparse area.
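The Python sketch below illustrates steps 2-2-1 and 2-2-2. It computes haversine distances and builds the dictionary X from the dictionary M. It assumes coordinates in degrees and a mean Earth radius of 6371 km, neither of which the source states. The helper names `haversine_km` and `build_logical_areas` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant, not given in the source

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def build_logical_areas(M, threshold_km=10.0):
    """Build X = {target POI ID: set of other POI IDs within threshold_km}.

    M maps each POI ID to [longitude, latitude], like dictionary M in step 2-2-1.
    The target POI itself is excluded, following "other interest points".
    """
    ids = list(M)
    X = {poi: set() for poi in ids}
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:  # distance is symmetric, so test each pair once
            if haversine_km(*M[a], *M[b]) <= threshold_km:
                X[a].add(b)
                X[b].add(a)
    return X
```

Comparing every pair costs O(n²) in the number of points of interest. That matches the description, but it may be slow on large datasets, where a spatial index would be a natural optimization.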

Further, step 3-1: let l_ij denote the event that user i chooses to interact with the logical area where point of interest j is located (i.e., user i likes and visits that logical area), and let β_i and β_j be the user bias vector and the point-of-interest bias vector, respectively. The probability P(l_ij | u_i, h_j, β_i, β_j) that user i visits the logical area where point of interest j is located is:

where u_i is the latent vector of user i and h_j is the latent vector of the logical area where point of interest j is located; the probability P can be read as the user's preference score for the area where the point of interest is located;

step 3-2, the parameters U, H and β in the above formula are learned by solving the following optimization problem:

the parameters are learned by iteratively optimizing the objective function with stochastic gradient descent, and the user's region-level interest, P(l_ij | u_i, h_j, β_i, β_j), is finally expressed with the learned U, H and β.
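The optimization problem itself is not written out here. The following sketch therefore assumes the standard logistic matrix factorization objective, optimized by stochastic gradient descent as step 3-2 describes. The objective includes a confidence weight w = c · R_ij, user and point-of-interest biases, and L2 regularization. The hyperparameter names and defaults (`rank`, `confidence`, `lr`, `reg`, `epochs`) are illustrative, not values from the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_lmf(R, rank=8, confidence=1.0, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Fit logistic matrix factorization to a non-negative matrix R with SGD.

    Assumed per-entry loss (standard logistic MF, not quoted from the patent):
        w = confidence * R[i, j],  s = U[i] @ H[j] + bu[i] + bh[j]
        loss = -(w * s - (1 + w) * log(1 + exp(s))) + L2 penalty on U[i], H[j]
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = R.shape
    U = rng.normal(scale=0.1, size=(n_rows, rank))  # user latent vectors u_i
    H = rng.normal(scale=0.1, size=(n_cols, rank))  # area latent vectors h_j
    bu = np.zeros(n_rows)  # user biases (beta_i)
    bh = np.zeros(n_cols)  # point-of-interest biases (beta_j)
    entries = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    for _ in range(epochs):
        for k in rng.permutation(len(entries)):  # visit entries in random order
            i, j = entries[k]
            w = confidence * R[i, j]
            s = U[i] @ H[j] + bu[i] + bh[j]
            g = (1.0 + w) * sigmoid(s) - w  # derivative of the loss w.r.t. s
            u_i = U[i].copy()  # keep the pre-update value for the H update
            U[i] -= lr * (g * H[j] + reg * U[i])
            H[j] -= lr * (g * u_i + reg * H[j])
            bu[i] -= lr * g
            bh[j] -= lr * g
    return U, H, bu, bh

def predict(U, H, bu, bh):
    """Matrix of P(l_ij | u_i, h_j, beta_i, beta_j) for every row i and column j."""
    return sigmoid(U @ H.T + bu[:, None] + bh[None, :])
```

Calling `predict` on the learned parameters returns the full matrix of region-level preference scores. Step 4 later weights this quantity by 2 × (1 - d).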

Further, step 4-1: the dynamic weight is derived indirectly with a density-based clustering method that uses two parameters, the scanning radius eps and the minimum number of points of interest MinPts:

step 4-1-1, cluster each user's historical check-in points of interest separately. For a target user, initialize the set T of that user's checked-in points of interest, all marked unvisited, and take an unprocessed point of interest t_i from T. If t_i has not yet been assigned to a cluster or marked as noise, scan its neighborhood of radius eps. If the neighborhood contains at least MinPts points of interest, create a new cluster c_i and add all points of interest in the neighborhood to a candidate set N; otherwise, mark t_i as a noise point;

step 4-1-2, for every unprocessed point of interest t in the candidate set N, examine its neighborhood; if the neighborhood contains at least MinPts points of interest, add them to N; if t has not been assigned to any cluster, add it to cluster c_i;

step 4-1-3, repeat step 4-1-2, continuing to examine the unprocessed points of interest in N until N is empty;

step 4-1-4, repeat steps 4-1-1 to 4-1-3 until every point of interest is either assigned to a cluster or marked as a noise point;

the final output is a set of clusters C = (c₁, c₂, …, c_m) together with the set of noise points.
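The clustering in steps 4-1-1 to 4-1-4 is standard DBSCAN. The sketch below applies it to one user's checked-in points of interest using haversine distances. The defaults eps = 10 km and MinPts = 5 follow the detailed description; the function names and data layout are assumptions. Following common DBSCAN practice, a point first marked as noise can later join a cluster as a border point.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between (longitude, latitude) pairs in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))

def dbscan(points, eps_km=10.0, min_pts=5):
    """Cluster one user's checked-in POIs; returns (list of clusters, noise set).

    points: {poi_id: (longitude, latitude)}. A neighborhood includes the point itself.
    """
    def region_query(pid):
        return {q for q in points if haversine_km(points[pid], points[q]) <= eps_km}

    label = {}  # poi_id -> cluster index, or -1 for noise
    clusters = []
    for pid in points:  # step 4-1-1: take the next unprocessed POI
        if pid in label:
            continue
        neighborhood = region_query(pid)
        if len(neighborhood) < min_pts:
            label[pid] = -1  # noise for now; may become a border point later
            continue
        cid = len(clusters)
        clusters.append({pid})
        label[pid] = cid
        candidates = list(neighborhood - {pid})  # candidate set N
        while candidates:  # steps 4-1-2 and 4-1-3: expand until N is empty
            q = candidates.pop()
            if label.get(q, -1) != -1:
                continue  # already in a cluster
            first_visit = q not in label
            label[q] = cid
            clusters[cid].add(q)
            if first_visit:
                q_neighborhood = region_query(q)
                if len(q_neighborhood) >= min_pts:  # q is a core point
                    candidates.extend(q_neighborhood - clusters[cid])
    noise = {pid for pid, lab in label.items() if lab == -1}
    return clusters, noise
```

The check-in dispersion d of step 4-2 is then `len(noise) / len(points)`.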

Step 4-2, using the clustering result, the check-in distribution dispersion d is defined as the ratio of the number of noise points of interest produced by clustering to the number of points of interest the user has checked in at. Combining steps 3 and 4 gives the user's actual region-level interest:

S = 2 × (1 - d) × P(l_ij | u_i, h_j, β_i, β_j)

the region dynamic weight is 2 × (1 - d), where d is the check-in distribution dispersion obtained by clustering the historical check-ins of each user i; the weight balances the user's preference for individual points of interest against the preference for the regions where they are located.
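The weighting in step 4-2 reduces to a few lines. The sketch below computes the dispersion d from a user's noise count and then the weighted region-level score S. The function names are hypothetical.

```python
def checkin_dispersion(n_noise_pois, n_checkin_pois):
    """d: fraction of a user's checked-in POIs that clustering labels as noise."""
    if n_checkin_pois == 0:
        raise ValueError("user has no checked-in points of interest")
    return n_noise_pois / n_checkin_pois

def region_dynamic_weight(d):
    """Region-level dynamic weight 2 * (1 - d)."""
    return 2.0 * (1.0 - d)

def region_level_score(d, p_region):
    """S = 2 * (1 - d) * P(l_ij | u_i, h_j, beta_i, beta_j)."""
    return region_dynamic_weight(d) * p_region
```

The weight ranges from 2, for a user with no noise points (d = 0), down to 0, for a user whose check-ins are all noise (d = 1).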

Further, the POI-level interest of the user in step 5 may be expressed as:

P(l_ij | u_i, v_j, η_i, η_j) is the probability that user i visits point of interest j, i.e., user i's preference score for that point of interest. Here u_i is the latent vector of user i, v_j is the latent vector of the point of interest, and η_i and η_j are the user bias vector and the area bias vector of the point of interest, respectively. The probability P is obtained by modeling the user's historical check-in matrix as in step 3-2. Each element of this matrix is the frequency with which the user visited the point of interest, and 0 means the user has not interacted with it. Aggregating the modeled POI-level interest P(l_ij | u_i, v_j, η_i, η_j) with the region-level interest P(l_ij | u_i, h_j, β_i, β_j) weighted in step 4 yields the user's true preference S':

S' = P(l_ij | u_i, v_j, η_i, η_j) + 2 × (1 - d) × P(l_ij | u_i, h_j, β_i, β_j).

Further, for each user, all points of interest are ranked by the comprehensive interest preference score S', and the top-k points of interest with the highest scores form a list recommended to the user.
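The following vectorized sketch covers steps 5 and 6. It assumes the POI-level and region-level probabilities are already available as user-by-point-of-interest matrices. The function names are hypothetical. Masking visited points of interest follows the detailed description of step 6, which recommends only points the user has not visited before.

```python
import numpy as np

def fuse_scores(p_poi, p_region, d):
    """S' = P_poi + 2 * (1 - d) * P_region, with one dispersion value per user.

    p_poi, p_region: (n_users, n_pois) probability matrices from the two models.
    d: (n_users,) check-in dispersion of each user.
    """
    weight = 2.0 * (1.0 - np.asarray(d, dtype=float))
    return p_poi + weight[:, None] * p_region

def top_k(scores, visited, k):
    """Indices of the k highest-scoring unvisited POIs for each user."""
    masked = np.where(visited, -np.inf, scores)  # never recommend visited POIs
    order = np.argsort(-masked, axis=1, kind="stable")
    return order[:, :k]
```

If k exceeds a user's number of unvisited points of interest, the tail of that row will contain visited ones (score -inf), so callers may want to cap k.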

The invention has the beneficial effects that:

The point-of-interest-region joint recommendation method for location-based social networks makes full use of users' interest in the regions where points of interest are located, so deeper check-in behavior is mined. Combined with users' interest in the points of interest themselves, this yields a more complete user preference. With it, the method recommends the points of interest users are most likely to be interested in and guides users to check in at them.

When recommending points of interest, the method considers not only the user's point-of-interest preference but also the user's region-level preference for the areas where points of interest are located. When capturing region-level interest, it also accounts for differences in how much attention each user pays to those areas by establishing a region-level dynamic weight, which makes point-of-interest recommendation more accurate.

Drawings

FIG. 1 is a flow chart of the present invention.

FIG. 2 is a schematic diagram of the region level interest modeling process of the present invention.

FIG. 3 is a visualization of the clustering result of one user's historical check-in records.

FIG. 4 is a visualization of the clustering result of another user's historical check-in records.

Detailed Description

The invention will be further explained with reference to the drawings.

FIG. 1 is a flowchart of the point-of-interest-region joint recommendation method in a location-based social network according to the present invention. It shows the six steps of the method: preprocessing user check-in data, constructing the region-level interest feature matrix, modeling region-level interest, adding the region dynamic weight, combining the user's personal interest with region interest, and recommending the top-k points of interest.

The point-of-interest-region joint recommendation method in a location-based social network comprises the following specific steps:

step 1, preprocessing user historical check-in data:

the user historical check-in data consists of a series of quintuple records l, each containing a user ID, a point-of-interest ID, longitude, latitude and visit frequency. During preprocessing, points of interest visited by at least 10 different users and users who checked in at no fewer than 15 different points of interest are selected to form the core of the dataset.

Step 2, constructing a region level interest characteristic matrix R:

A region-level interest feature matrix is constructed from the processed check-in dataset; it reflects the user's interest in the regions where points of interest are located and serves as the input to region-level interest modeling in the next step.

Step 3, modeling regional interest:

Taking the region-level interest feature matrix as input, a logistic matrix factorization model learns a user latent vector and a point-of-interest area latent vector, and the user's region-level interest preference is obtained from the dot product of the two vectors.

Step 4, adding regional dynamic weight:

Since users differ in how much attention they pay to regions, a region-level interest dynamic weight is obtained with the DBSCAN clustering algorithm and combined with the region-level interest preference to express the user's real regional preference.

Step 5, combining the personal interests and the region interests of the user:

While region-level interest is captured, the user's interest in each individual point of interest cannot be ignored; fusing this personal interest makes the user's interest preference as complete as possible and improves point-of-interest recommendation.

Step 6, recommending the top-k interest points to the user:

Combining the above interest scores, the top-k points of interest that the user has not visited before and is most likely to be interested in are recommended to the user.

Specifically, the data preprocessing part of step 1 will have the following steps:

1-1 Users' historical check-in data is both messy and redundant. To address this, the pair (user ID + point-of-interest ID) is first taken as the key for all records, with the accumulated number of check-ins as the value. This gives the most basic data unit of the invention: (user ID, point-of-interest ID, check-in frequency).

1-2 To obtain a better recommendation effect, the invention selects points of interest visited by at least 10 different users and users who checked in at no fewer than 15 different points of interest as the core of the dataset. This removes records with few check-ins and makes the dataset more refined and effective. Points of interest with too few visits, or users who visited too few points of interest, carry too little data to yield useful information. The dictionary H finally constructed from the dataset therefore has the record format { [user ID, point-of-interest ID]: check-in frequency }.

Further, in step 2, a region-level interest feature matrix R is constructed, and the specific steps are as follows:

2-1 First, define a concept: the circular geographic area centered on a point of interest's own geographic location with radius r is called the logical area of that point of interest. The number of other points of interest in this area that a user has visited affects the user's interest in the logical area of that point of interest.

2-2 matrix construction:

2-2-1 First, the point-of-interest set G = (g₁, g₂, g₃, …, g_n) is constructed from the dataset obtained in step 1. A dictionary M = {g_i: [long_i, lat_i]} is also built, mapping each point-of-interest ID to its longitude and latitude. G contains all distinct points of interest, and M records each point-of-interest ID with its coordinates, which simplifies later conversion between point-of-interest IDs and geographic positions.

2-2-2 The distance between each point of interest and the target point of interest is computed with the haversine formula, also known as the half-versed-sine formula. It determines the great-circle distance between two points from their longitudes and latitudes and is commonly used to compute distances between two points on the Earth. In the invention the distance threshold is set to 10 km. While distances are computed, the dictionary X stores, for each target point of interest, the set of other points of interest within the threshold. Once all distances are computed, X is complete, with the record format { target point-of-interest ID: set S }, where every point of interest in S lies within the distance threshold of the target point of interest.
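For reference, the haversine distance between two points is given below. The points have latitudes φ₁, φ₂ and longitudes λ₁, λ₂, in radians, on a sphere of radius r_E (about 6371 km for the Earth):

```latex
d_{\text{hav}} = 2 r_E \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right)
  + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)
```

A pair of points of interest belongs in each other's set S exactly when d_hav is at most 10 km.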

2-2-3, initializing a region level interest characteristic matrix R, and constructing the matrix R through the dictionary X, wherein the specific formula is as follows:

R_ui represents user u's preference, in matrix R, for the area in which point of interest i is located. X_i is the set of points of interest within the logical area of target point of interest i (i.e., those whose distance to i is within the threshold). Y_u is the set of points of interest visited by user u, so X_i ∩ Y_u is the set of points of interest the user has visited within the logical area of the target point of interest. α is the area compensation factor. The compensation mechanism is needed because, for the same user, the same logical-area size and the same number of visited points of interest, the user's region-level interest is still affected by whether the points of interest in the logical area are densely or sparsely distributed. The sparser the points of interest in the logical area, the more interested the user is in that area; conversely, the denser they are, the less interested the user is compared with a sparse area.
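The exact expression for R_ui, including how α enters it, is not given here. The following Python sketch therefore stops at the set quantities the definition is built on: |X_i ∩ Y_u| and |X_i| for each user and target point of interest. The function name and the sparse output layout are assumptions.

```python
def region_overlap_counts(X, visited):
    """For every user u and target POI i, return (|X_i ∩ Y_u|, |X_i|).

    X: {poi_id: set of POI IDs in the logical area of poi_id} (dictionary X)
    visited: {user_id: set of POI IDs the user has checked in at} (Y_u)
    Pairs with an empty intersection are omitted to keep the result sparse.
    """
    counts = {}
    for u, Y_u in visited.items():
        for i, X_i in X.items():
            overlap = len(X_i & Y_u)
            if overlap:
                counts[(u, i)] = (overlap, len(X_i))
    return counts
```

Because |X_i| is larger when the logical area is dense, it is the natural quantity for a density compensation such as α to act on. How the patent combines the two is not specified in this text.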

In step 3, the constructed region-level interest feature matrix is modeled with logistic matrix factorization to obtain the user's region-level interest preference. The modeling process is as follows:

3-1 Let l_ij denote the event that user i chooses to interact with the logical area where point of interest j is located (i.e., user i likes and visits that logical area), and let β_i and β_j be the user bias vector and the point-of-interest bias vector, respectively. The probability P(l_ij | u_i, h_j, β_i, β_j) that user i visits the logical area where point of interest j is located is:

where u_i is the latent vector of user i and h_j is the latent vector of the logical area where point of interest j is located. The probability P can be read as the user's preference score for the area where the point of interest is located.
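The expression for P does not appear in this text. Its notation (event l_ij, user bias β_i, item bias β_j) matches the standard logistic matrix factorization model. Under that model the probability takes the following form; treat it as an assumed reconstruction, not the patent's verbatim formula:

```latex
P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)
  = \frac{\exp\left(u_i h_j^{\top} + \beta_i + \beta_j\right)}
         {1 + \exp\left(u_i h_j^{\top} + \beta_i + \beta_j\right)}
  = \sigma\left(u_i h_j^{\top} + \beta_i + \beta_j\right)
```

This is consistent with FIG. 2, where the score comes from the dot product of the user and area latent matrices, here passed through the logistic function σ together with the two biases.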

3-2 Finally, the parameters U, H and β in the above formula are learned by solving the following optimization problem:

Based on the above region-level interest, adding the region dynamic weight in step 4 comprises the following specific steps:

4-1 The region-level interest constructed above describes an idealized user. In real life, however, habits differ and users attach very different importance to regions, so adding a region dynamic weight is reasonable and effective. In this step, the dynamic weight is derived indirectly with a density-based clustering method. Its two main parameters are the scanning radius eps and the minimum number of points of interest MinPts; in the invention, eps is 10 km and MinPts is 5, with distances measured by the haversine formula.

4-1-1 Each user's historical check-in points of interest are clustered separately. For a target user, initialize an unvisited point-of-interest set T (the points of interest the target user has checked in at) and take an unprocessed point of interest t_i from T. A noise point is one whose neighborhood contains fewer than MinPts points of interest. If t_i has not yet been assigned to a cluster or marked as noise, scan the area within radius eps around it. If the neighborhood contains at least MinPts points of interest, create a new cluster c_i and add all points of interest in the neighborhood to the candidate set N; otherwise, t_i is considered a noise point.

4-1-2 For every unprocessed point of interest t in the candidate set N, examine its neighborhood; if it contains at least MinPts points of interest, add them to N. If t has not been assigned to any cluster, add it to cluster c_i.

4-1-3 Repeat step 4-1-2, continuing to examine the unprocessed points of interest in N until N is empty.

4-1-4 Repeat steps 4-1-1 to 4-1-3 until every point of interest is assigned to a cluster or marked as a noise point; the algorithm then ends.

The final output of the above steps is a set of clusters C = (c₁, c₂, …, c_m) together with the set of noise points.

4-2 The clustering result identifies the noise points of interest. The ratio of the number of noise points of interest to the number of points of interest the user has checked in at is taken as the check-in distribution dispersion d. More noise points mean a more scattered check-in distribution. They also suggest that the user visits points of interest with specific individual targets and may not favor region-based visiting. Finally, combining steps 3 and 4 gives the user's actual region-level interest:

S = 2 × (1 - d) × P(l_ij | u_i, h_j, β_i, β_j)

In step 5, to incorporate the user's interest in the points of interest themselves, the user's POI-level interest is expressed, analogously to step 3-1, as:

P(l_ij | u_i, v_j, η_i, η_j) is the probability that user i visits point of interest j, i.e., user i's preference score for that point of interest. Here u_i is the latent vector of user i, v_j is the latent vector of the point of interest, and η_i and η_j are the user bias vector and the bias vector of the region where the point of interest is located, respectively. The probability P can be obtained by modeling the user's historical check-in matrix with step 3-2. Each element of this matrix is the frequency with which the user visited the point of interest, and 0 means the user has no interaction with it. Aggregating the modeled POI-level interest P(l_ij | u_i, v_j, η_i, η_j) with the region-level interest P(l_ij | u_i, h_j, β_i, β_j) weighted in step 4 forms the user's true preference S':

S' = P(l_ij | u_i, v_j, η_i, η_j) + 2 × (1 - d) × P(l_ij | u_i, h_j, β_i, β_j)

Finally, the point-of-interest recommendation in step 6 comprises the following:

For each user, all points of interest are ranked by the comprehensive interest preference score S', and the top-k points of interest with the highest scores form a list recommended to the user.

FIG. 2 is a schematic diagram of the region-level interest modeling process. The left side shows the initialized user latent matrix and point-of-interest area latent matrix; the goal of the logistic matrix factorization model is to learn these two matrices. After logistic matrix factorization, the learned user latent matrix and point-of-interest area latent matrix shown on the right are obtained. Each user's interest score for each point of interest then follows from the dot product of the two matrices.

FIG. 3 shows the clustering of one user's check-in points of interest. The user's check-ins are dense: red marks noise points of interest, and green and blue mark the two clusters found. The ratio of this user's noise points of interest to all checked-in points of interest is 0.085. This indicates that the user strongly prefers regions and values region-based visiting, so region-level interest receives a large weight.

FIG. 4 shows the clustering of another user's check-in points of interest. The relatively scattered red noise points of interest show that this user has no habit of region-based visiting and likes to visit places all over. The user's check-in dispersion is 0.71, so the user is assigned a lower region-level interest dynamic weight.
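Plugging the two dispersion values into the region-level dynamic weight 2 × (1 - d) gives:

```latex
w_{\text{FIG. 3}} = 2 \times (1 - 0.085) = 1.83, \qquad
w_{\text{FIG. 4}} = 2 \times (1 - 0.71) = 0.58
```

Region-level interest therefore counts roughly three times as much for the first user as for the second when S' is formed.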

Claims

1. A point-of-interest-area joint recommendation method in a location social network is characterized by comprising the following steps:

step 1, preprocessing user historical check-in data:

step 2, constructing a region level interest characteristic matrix R:

step 3, modeling regional interest:

step 4, adding regional dynamic weight:

step 5, combining the personal interests and the region interests of the user:

step 6, recommending the top-k points of interest to the user.

2. The method of claim 1, wherein the method comprises the following steps:

the step 1 specifically comprises the following steps:

3. The method of claim 2, wherein the method comprises the following steps:

the step 2 specifically comprises the following steps:

step 2-2, matrix construction:

4. The method of claim 3, wherein the method comprises the following steps:

step 3-1, let l_ij denote the event that user i chooses to interact with the logical area where point of interest j is located (i.e., user i likes and visits that logical area), and let β_i and β_j be the user bias vector and the point-of-interest bias vector, respectively; the probability P(l_ij | u_i, h_j, β_i, β_j) that user i visits the logical area where point of interest j is located is:

wherein u_i is the latent vector of user i, h_j is the latent vector of the logical area where point of interest j is located, and the probability P can be expressed as the user's preference score for the area where the point of interest is located;

5. The method of claim 4, wherein the method comprises the following steps:

step 4-1, the dynamic weight is derived indirectly with a density-based clustering method using two parameters, the scanning radius eps and the minimum number of points of interest MinPts:

S = 2 × (1 - d) × P(l_ij | u_i, h_j, β_i, β_j)

6. The method of claim 5, wherein the method comprises:

the POI-level interest of the user in step 5 may be expressed as:

S' = P(l_ij | u_i, v_j, η_i, η_j) + 2 × (1 - d) × P(l_ij | u_i, h_j, β_i, β_j).

7. The method of claim 6, wherein the method comprises the following steps: