CN112905905A - Interest point-area joint recommendation method in location social network - Google Patents


Info

Publication number
CN112905905A
Authority
CN
China
Prior art keywords
interest
user
points
point
area
Legal status
Pending
Application number
CN202110092706.5A
Other languages
Chinese (zh)
Inventor
袁浩
徐建
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Application filed by Hangzhou Dianzi University
Priority to CN202110092706.5A
Publication of CN112905905A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of computer application, and particularly discloses a point of interest-region joint recommendation method in a location social network.

Description

Interest point-area joint recommendation method in location social network
Technical Field
The invention belongs to the technical field of computer application, and particularly relates to a point of interest-region joint recommendation method in a location social network.
Background
In recent years, with the rapid development of location-based social networks, well-known platforms such as Foursquare have attracted enormous numbers of registered users and thus generated large volumes of historical check-in records; this information plays a unique and key role in point-of-interest (POI) recommendation.
Current POI recommendation infers a user's POI preferences from the user's historical check-ins and recommends POIs accordingly. Historical check-in data, however, encode not only a user's preferences for individual POIs but also the user's preferences for the areas in which those POIs lie, and the latter indirectly shape the former. Work in the POI recommendation field tends to ignore this area-level interest, i.e. users' habits of checking in within particular areas, so systems cannot capture users' true and complete POI preferences. This is a shortcoming of current POI recommendation techniques.
Disclosure of Invention
The invention aims to overcome the prior art's incomplete capture of user interest: it fully mines the region-level interest that previous POI recommendation work has neglected and, on that basis, provides a method for modeling users' region-level interest. The specific technical scheme is as follows:
a point-of-interest-area joint recommendation method in a location-based social network, comprising the following steps:
step 1, preprocessing users' historical check-in data:
the historical check-in data consist of a series of five-tuple records l, each containing a user ID, a POI ID, a longitude, a latitude and a visit frequency; during preprocessing, only POIs visited by at least 10 distinct users and users with check-ins at no fewer than 15 distinct POIs are kept, and these form the core of the data set;
step 2, constructing a region-level interest feature matrix R:
from the preprocessed check-in data set, a region-level interest feature matrix is constructed; the matrix reflects each user's interest in the regions where POIs are located and serves as the input to region-level interest modeling in the next step;
step 3, modeling region-level interest:
with the region-level interest feature matrix as input, logistic matrix factorization learns a user feature vector and a POI feature vector, and the user's region-level interest preference is obtained from the dot product of the two vectors;
step 4, adding a region dynamic weight:
because users differ in how much attention they pay to regions, a dynamic weight for region-level interest is obtained with the DBSCAN clustering algorithm and combined with the region-level interest preference to express the user's actual region preference;
step 5, combining the user's personal interest and region-level interest:
while region-level interest is captured, the user's interest in each individual POI cannot be ignored, so personal interest is fused in to make the user's interest preference as complete as possible and to improve POI recommendation;
step 6, recommending the top-k POIs to the user.
Further, step 1 specifically comprises the following steps:
step 1-1, first determine the key (user ID, POI ID), then aggregate all records under this key with the accumulated check-in count as the value, which gives the most basic data unit (user ID, POI ID, check-in frequency);
step 1-2, keep POIs visited by at least 10 distinct users and users with check-ins at no fewer than 15 distinct POIs as the core of the data set, and finally build from this data set a dictionary H with record format {[user ID, POI ID]: check-in frequency}.
Further, step 2 specifically comprises the following steps:
step 2-1, first define a concept: the circular geographic area centred on a POI's own location with radius r is called the logical area of that POI; the number of other POIs in this logical area that a user has visited influences the user's interest in the logical area;
step 2-2, matrix construction:
step 2-2-1, from the data set obtained in step 1, construct a POI set $G = (g_1, g_2, g_3, \dots, g_n)$ and a dictionary $M = \{g_i: [\mathrm{long}_i, \mathrm{lat}_i]\}$ mapping each POI ID to its longitude and latitude; G contains all distinct POIs, and M records each POI ID with its coordinates;
step 2-2-2, compute the distance between each POI and the target POI with the haversine formula, using a distance threshold of 10 km; while computing distances, store in a dictionary X, for each target POI, the set of other POIs whose distance to it is within the threshold. Once all distances are computed, X is complete, with record format {target POI ID: set S}, where every POI in S lies within the threshold distance of the target POI;
step 2-2-3, initialize the region-level interest feature matrix R and fill it using the dictionary X according to the following formula:
Figure BDA0002911278430000031
$R_{ui}$ denotes the preference of user u for the area of POI i in matrix R, where $X_i$ is the set of POIs in the logical area of target POI i, $Y_u$ is the set of POIs visited by user u, and $X_i \cap Y_u$ is the set of POIs in the logical area of the target POI that the user has visited. α is an area compensation factor: for the same user, with logical areas of the same size and the same number of visited POIs, region-level interest is still affected by whether the POIs in the logical area are densely or sparsely distributed. The sparser the POIs in the logical area, the greater the user's interest in that area; conversely, the denser the POIs, the smaller the user's interest compared with a sparse area.
Further, in step 3-1, let $l_{ij}$ denote the event that user i chooses to interact with the logical area of POI j (i.e. user i likes and visits that logical area), and let $\beta_i$ and $\beta_j$ be the user bias and the POI bias, respectively; the probability $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$ that user i visits the logical area of POI j is:
Figure BDA0002911278430000032
where $u_i$ is the latent vector of user i and $h_j$ is the latent vector of the logical area of POI j; the probability P can be read as the user's preference score for the area in which the POI is located;
step 3-2, finally, the parameters U, H and β in the above formula are learned by solving the following optimization problem:
Figure BDA0002911278430000033
the parameters are learned by optimizing this objective iteratively with stochastic gradient descent, and the user's region-level interest, $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$, is finally expressed through the learned parameters U, H and β.
Further, step 4-1 derives the dynamic weight indirectly with a density-based clustering method that uses two parameters, the scan radius eps and the minimum number of POIs MinPts:
step 4-1-1, cluster each user's historical check-in POIs separately. For a target user, initialize the set T of unvisited POIs and examine an unprocessed POI $t_i$ in T, i.e. one not yet assigned to a cluster or marked as noise. Scan the region within radius eps of $t_i$: if the neighborhood contains at least MinPts POIs, create a new cluster $c_i$ and add all POIs in the neighborhood to the candidate set N; otherwise mark $t_i$ as a noise point;
step 4-1-2, examine the neighborhood of every unprocessed POI t in the candidate set N; if it contains at least MinPts POIs, add them to N; if t has not been assigned to any cluster, add it to cluster $c_i$;
step 4-1-3, repeat step 4-1-2, continuing to examine unprocessed POIs in N until N is empty;
step 4-1-4, repeat steps 4-1-1 to 4-1-3 until every POI has been assigned to a cluster or marked as noise;
the final output is a set of clusters $C = (c_1, c_2, \dots, c_m)$ together with a set of noise points.
Step 4-2, using the clustering result, take the ratio of the number of noise POIs to the number of POIs the user has checked in at as the check-in distribution dispersion d; combining steps 3 and 4 gives the user's actual region-level interest:
$S = 2(1-d)\,P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$
where the region dynamic weight is $2(1-d)$ and d is the check-in distribution dispersion obtained by clustering the historical check-ins of each user i; the weight balances the user's preference for POIs against the preference for the regions in which they lie.
Further, the POI-level interest of the user in step 5 may be expressed as:
Figure BDA0002911278430000041
$P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j)$ is the probability that user i visits POI j, i.e. user i's preference score for the POI, where $u_i$ is the latent vector of user i, $v_j$ is the latent vector of the POI, and $\eta_i$ and $\eta_j$ are the user bias and the area bias of the POI, respectively. The probability P is obtained by modeling the user's historical check-in matrix (Figure BDA0002911278430000042) as in step 3-2; each element of the matrix is the frequency with which the user visited the POI, and 0 means the user has not interacted with that POI. The modeled POI-level interest $P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j)$ and the region-level interest $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$ weighted in step 4 are aggregated into the user's true preference S':
$S' = P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j) + 2(1-d)\,P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$.
Further, all POIs are ranked for each user by the combined preference score S', and the top-k highest-scoring POIs are selected to form a list that is recommended to the user.
The beneficial effects of the invention are as follows:
the POI-area joint recommendation method for location-based social networks makes full use of users' interest in the areas where POIs are located, mining check-in behavior more deeply; combined with users' interest in the POIs themselves, this more complete picture of user preference allows more relevant POIs to be recommended and guides users to check in at them.
When recommending POIs, the method considers not only the user's POI preference but also the user's region-level preference for the areas in which POIs lie; when capturing region-level interest, it further accounts for how much attention each user pays to those areas by introducing a region-level dynamic weight, yielding more accurate POI recommendations.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic diagram of the region level interest modeling process of the present invention.
FIG. 3 is a visualization of the clustering result for one user's historical check-in records.
FIG. 4 is a visualization of the clustering result for another user's historical check-in records.
Detailed Description
The invention will be further explained with reference to the drawings.
FIG. 1 is a flowchart of the POI-area joint recommendation method in a location-based social network according to the present invention. It shows the six steps of the method: preprocessing user check-in data, constructing the region-level interest feature matrix, modeling region-level interest, adding the region dynamic weight, combining the user's personal and region-level interests, and recommending the top-k POIs.
The interest point-area joint recommendation method in the location social network comprises the following specific steps:
step 1, preprocessing users' historical check-in data:
the historical check-in data consist of a series of five-tuple records l, each containing a user ID, a POI ID, a longitude, a latitude and a visit frequency; during preprocessing, POIs visited by at least 10 distinct users and users with check-ins at no fewer than 15 distinct POIs are selected to form the core of the data set.
Step 2, constructing a region-level interest feature matrix R:
from the preprocessed check-in data set, a region-level interest feature matrix is constructed; it reflects each user's interest in the regions where POIs are located and serves as the input to region-level interest modeling in the next step.
Step 3, modeling region-level interest:
with the region-level interest feature matrix as input, logistic matrix factorization learns a user feature vector and a POI feature vector, and the user's region-level interest preference is obtained from the dot product of the two vectors.
Step 4, adding a region dynamic weight:
because users differ in how much attention they pay to regions, a dynamic weight for region-level interest is obtained with the DBSCAN clustering algorithm and combined with the region-level interest preference to express the user's actual region preference.
Step 5, combining the user's personal interest and region-level interest:
while region-level interest is captured, the user's interest in each individual POI cannot be ignored, so personal interest is fused in to make the user's interest preference as complete as possible and to improve POI recommendation.
Step 6, recommending the top-k POIs to the user:
combining the above interest scores, the top-k highest-scoring POIs that the user has not visited before are recommended to the user.
Specifically, the data preprocessing of step 1 comprises the following steps:
1-1 Users' historical check-in data are both messy and redundant. To address this, the key (user ID + POI ID) is determined first, all records are aggregated under this key, and the accumulated check-in count is taken as the value; this is the most basic data unit of the invention (user ID, POI ID, check-in frequency).
1-2 To obtain better recommendations, the invention keeps POIs visited by at least 10 distinct users and users with check-ins at no fewer than 15 distinct POIs as the core of the data set, removing records with few check-ins. This makes the data set more compact and effective, because rarely visited POIs, or users who have visited few POIs, carry too little data for useful information to be extracted. The dictionary H finally constructed from the data set therefore has the record format {[user ID, POI ID]: check-in frequency}.
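A minimal Python sketch of steps 1-1 and 1-2, assuming the raw records arrive as (user ID, POI ID, longitude, latitude, frequency) five-tuples as described above. The function name `build_checkin_dict` is hypothetical, and the filter is applied in a single pass; the text does not say whether filtering is repeated until both thresholds hold simultaneously.

```python
from collections import Counter, defaultdict


def build_checkin_dict(records, min_users_per_poi=10, min_pois_per_user=15):
    """Build H = {(user_id, poi_id): check-in frequency} from raw five-tuples.

    records: iterable of (user_id, poi_id, lon, lat, freq).
    Assumption: a single filtering pass (no iteration to a fixed point).
    """
    # Step 1-1: use (user ID, POI ID) as the key and accumulate check-in counts.
    H = Counter()
    for user_id, poi_id, _lon, _lat, freq in records:
        H[(user_id, poi_id)] += freq

    # Count distinct users per POI and distinct POIs per user.
    users_per_poi = defaultdict(set)
    pois_per_user = defaultdict(set)
    for user_id, poi_id in H:
        users_per_poi[poi_id].add(user_id)
        pois_per_user[user_id].add(poi_id)

    # Step 1-2: keep only well-supported POIs and sufficiently active users.
    return {
        (u, p): f
        for (u, p), f in H.items()
        if len(users_per_poi[p]) >= min_users_per_poi
        and len(pois_per_user[u]) >= min_pois_per_user
    }
```

With the thresholds given in the text the call is `build_checkin_dict(records, 10, 15)`, which are also the defaults; smaller thresholds are convenient for toy data.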
Further, the region-level interest feature matrix R is constructed in step 2 as follows:
2-1 First, a concept is defined: the circular geographic area centred on a POI's own location with radius r is called the logical area of that POI; the number of other POIs in this logical area that a user has visited affects the user's interest in the logical area.
2-2 matrix construction:
2-2-1 First, a POI set $G = (g_1, g_2, g_3, \dots, g_n)$ and a dictionary $M = \{g_i: [\mathrm{long}_i, \mathrm{lat}_i]\}$ mapping POI IDs to longitude and latitude coordinates are constructed from the data set obtained in step 1. G contains all distinct POIs, and M records each POI ID with its coordinates, which simplifies later conversion between POI IDs and location coordinates.
2-2-2 The distance between each POI and the target POI is computed with the haversine formula (the half-versed-sine formula), which gives the great-circle distance between two points from their longitudes and latitudes and is commonly used to compute distances between two points on the Earth. In the invention, the distance threshold is set to 10 km; while distances are computed, the dictionary X stores, for each target POI, the set of other POIs within the threshold. Once all distances have been computed, X is complete, with record format {target POI ID: set S}, where every POI in S lies within the threshold distance of the target POI.
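The haversine distance and the dictionary X of step 2-2-2 can be sketched as follows. This is an illustrative brute-force O(n²) version, not the patent's implementation; the mean Earth radius of 6371 km and the helper names `haversine_km` and `build_logical_areas` are assumptions. Coordinates follow the [long_i, lat_i] order of dictionary M, and each target POI is excluded from its own set, matching "other POIs" in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_logical_areas(M, threshold_km=10.0):
    """Return X = {poi_id: set of other POIs within threshold_km}.

    M: {poi_id: [longitude, latitude]}, as in dictionary M.
    """
    X = {g: set() for g in M}
    ids = list(M)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            ga, gb = ids[a], ids[b]
            if haversine_km(*M[ga], *M[gb]) <= threshold_km:
                # Distance is symmetric, so record the pair in both directions.
                X[ga].add(gb)
                X[gb].add(ga)
    return X
```

For large POI sets a spatial index would avoid the quadratic loop; the brute-force form simply mirrors the step as described.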
2-2-3 The region-level interest feature matrix R is initialized and filled using the dictionary X according to the following formula:
Figure BDA0002911278430000071
$R_{ui}$ denotes the preference of user u for the area in which POI i is located in matrix R. $X_i$ is the set of POIs in the logical area of target POI i (i.e. whose distance to i is within the threshold), and $Y_u$ is the set of POIs visited by user u, so $X_i \cap Y_u$ is the set of POIs in the logical area of the target POI that the user has visited. α is the area compensation factor; this compensation mechanism is introduced because, for the same user with logical areas of the same size and the same number of visited POIs, region-level interest is also affected by whether the POIs in the logical area are densely or sparsely distributed. The sparser the POIs in the logical area, the greater the user's interest in that area; conversely, the denser the POIs, the smaller the user's interest compared with a sparse area.
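Because the exact expression for $R_{ui}$, including how α is applied, appears only in the formula image and is not reproduced in the text, the sketch below computes only the ingredients the text names: $|X_i \cap Y_u|$, the number of POIs in the logical area of POI i that user u visited, and $|X_i|$, the number of POIs in that area, which indicates how dense it is. The function name is hypothetical, and combining these counts with α into $R_{ui}$ is deliberately left out.

```python
def region_overlap_counts(X, Y):
    """Return {(user_id, poi_id): (|X_i ∩ Y_u|, |X_i|)} for non-zero overlaps.

    X: {poi_id: set of POIs in its logical area} (dictionary X of step 2-2-2).
    Y: {user_id: set of POIs the user visited}.
    The alpha-compensated formula for R_ui is NOT implemented here.
    """
    counts = {}
    for u, visited in Y.items():
        for i, area in X.items():
            overlap = len(area & visited)  # visited POIs inside the logical area
            if overlap:
                counts[(u, i)] = (overlap, len(area))
    return counts
```

Storing only non-zero overlaps keeps the structure sparse, which suits check-in data where most users touch few logical areas.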
In step 3, the constructed region-level interest feature matrix is modeled with logistic matrix factorization to obtain the user's region-level interest preference. The modeling process comprises the following steps:
3-1 Let $l_{ij}$ denote the event that user i chooses to interact with the logical area of POI j (i.e. user i likes and visits that logical area), and let $\beta_i$ and $\beta_j$ be the user bias and the POI bias, respectively. The probability $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$ that user i visits the logical area of POI j is:
Figure BDA0002911278430000081
where $u_i$ is the latent vector of user i and $h_j$ is the latent vector of the logical area of POI j; the probability P can be read as the user's preference score for the area in which the POI is located.
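The probability formula itself is given only as an image. In the standard logistic matrix factorization formulation it takes the logistic form $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j) = \frac{\exp(u_i h_j^\top + \beta_i + \beta_j)}{1 + \exp(u_i h_j^\top + \beta_i + \beta_j)}$; assuming the patent follows that formulation, it can be evaluated as below. The function name `lmf_probability` is hypothetical.

```python
import math


def lmf_probability(u_i, h_j, beta_i, beta_j):
    """Logistic-MF probability that user i interacts with the area of POI j.

    Assumes the standard form sigmoid(u_i . h_j + beta_i + beta_j).
    """
    z = sum(a * b for a, b in zip(u_i, h_j)) + beta_i + beta_j
    # Numerically stable logistic function: avoid exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

The output lies in (0, 1), which is why it can be read directly as a preference score.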
3-2 Finally, the parameters U, H and β in the above formula are learned by solving the following optimization problem:
Figure BDA0002911278430000082
The parameters are learned by optimizing this objective iteratively with stochastic gradient descent, and the user's region-level interest, $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$, is finally expressed through the learned parameters U, H and β.
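The objective is likewise shown only as an image. Assuming it is the usual logistic matrix factorization log-likelihood with L2 regularization, $\sum_{i,j} \left[c R_{ij}(u_i h_j^\top + \beta_i + \beta_j) - (1 + c R_{ij}) \log(1 + \exp(u_i h_j^\top + \beta_i + \beta_j))\right] - \frac{\lambda}{2}\sum_i \lVert u_i\rVert^2 - \frac{\lambda}{2}\sum_j \lVert h_j\rVert^2$, where c is a confidence scale (a hypothetical name chosen to avoid clashing with the area compensation factor α), one stochastic step per matrix cell looks like this. Maximizing the log-likelihood is equivalent to gradient descent on its negative; applying the regularizer at every sampled cell is a common simplification, and all hyperparameters are illustrative.

```python
import math
import random


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lmf(R, n_factors=8, c=1.0, lam=0.1, lr=0.05, epochs=200, seed=0):
    """Fit U, H and biases to a dense matrix R (users x POI areas) by SGD."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    U = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    H = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    bu = [0.0] * n_users  # user biases (beta_i)
    bh = [0.0] * n_items  # area biases (beta_j)
    cells = [(u, j) for u in range(n_users) for j in range(n_items)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for u, j in cells:
            conf = c * R[u][j]
            z = sum(a * b for a, b in zip(U[u], H[j])) + bu[u] + bh[j]
            # d(log-likelihood)/dz for one cell: c*r - (1 + c*r) * sigmoid(z)
            g = conf - (1.0 + conf) * _sigmoid(z)
            for k in range(n_factors):
                uk, hk = U[u][k], H[j][k]
                U[u][k] += lr * (g * hk - lam * uk)
                H[j][k] += lr * (g * uk - lam * hk)
            bu[u] += lr * g
            bh[j] += lr * g
    return U, H, bu, bh


def predict(U, H, bu, bh, u, j):
    """Region-level interest P(l_uj | u_u, h_j, beta_u, beta_j)."""
    z = sum(a * b for a, b in zip(U[u], H[j])) + bu[u] + bh[j]
    return _sigmoid(z)
```

On a realistic sparse matrix one would sample observed cells plus a subset of zeros rather than sweeping every cell.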
Building on the region-level interest above, the region dynamic weight is added in step 4 as follows:
4-1 The region-level interest constructed above describes an idealized situation. In real life, user habits differ and the importance users attach to regions varies greatly, so adding a region dynamic weight is reasonable and effective. In this step, the dynamic weight is derived indirectly with a density-based clustering method whose two main parameters are the scan radius eps and the minimum number of POIs MinPts; in the invention, eps = 10 km, MinPts = 5, and distances are measured with the haversine formula.
4-1-1 Each user's historical check-in POIs are clustered separately. For a target user, the set T of unvisited POIs (the POIs the target user has checked in at) is initialized, and an unprocessed POI $t_i$ in T is examined. If $t_i$ has not yet been assigned to a cluster or marked as noise (a point whose neighborhood contains fewer than MinPts POIs), the region within radius eps of $t_i$ is scanned: if the neighborhood contains at least MinPts POIs, a new cluster $c_i$ is created and all POIs in the neighborhood are added to the candidate set N; otherwise $t_i$ is marked as a noise point.
4-1-2 The neighborhood of every unprocessed POI t in the candidate set N is examined; if it contains at least MinPts POIs, they are added to N. If t has not been assigned to any cluster, it is added to cluster $c_i$.
4-1-3 Step 4-1-2 is repeated, continuing to examine unprocessed POIs in N until N is empty.
4-1-4 Steps 4-1-1 to 4-1-3 are repeated until every POI has been assigned to a cluster or marked as noise, at which point the algorithm ends.
The final output of the above steps is a set of clusters $C = (c_1, c_2, \dots, c_m)$ together with a set of noise points.
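A compact Python sketch of the per-user DBSCAN procedure in steps 4-1-1 to 4-1-4, with eps = 10 km, MinPts = 5 and the haversine metric as stated in step 4-1. Two details are assumptions: a point's neighborhood includes the point itself (the usual DBSCAN convention), and a POI first marked as noise is relabeled as a border point if a later cluster reaches it. The function names are hypothetical.

```python
import math

NOISE = -1


def _haversine_km(p, q):
    """Great-circle distance in km between (lon, lat) points p and q."""
    lon1, lat1 = p
    lon2, lat2 = q
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def dbscan(points, eps_km=10.0, min_pts=5):
    """Cluster one user's check-in POIs; returns a cluster index or NOISE per point."""
    n = len(points)
    labels = [None] * n  # None = not yet processed

    def region(i):
        # Neighborhood within eps, including the point itself.
        return [j for j in range(n) if _haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neigh = region(i)
        if len(neigh) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1  # new cluster c_i
        labels[i] = cluster
        candidates = [j for j in neigh if j != i]  # candidate set N
        while candidates:
            j = candidates.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point, not expanded further
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neigh_j = region(j)
            if len(neigh_j) >= min_pts:  # j is a core point: expand N
                candidates.extend(neigh_j)
    return labels
```

Each point is expanded at most once, so the loop terminates once N is empty, as in step 4-1-3.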
4-2 From the clustering result, the noise POIs produced by clustering are identified, and the ratio of the number of noise POIs to the number of POIs the user has checked in at is taken as the check-in distribution dispersion d. More noise POIs mean a more dispersed check-in distribution, which also suggests that the user visits POIs with specific targets in mind and may not favor visiting by region. Finally, combining steps 3 and 4 gives the user's actual region-level interest:
$S = 2(1-d)\,P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$
where the region dynamic weight is $2(1-d)$ and d is the check-in distribution dispersion obtained by clustering the historical check-ins of each user i; the weight balances the user's preference for POIs against the preference for the regions in which they lie.
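The dispersion d of step 4-2 and the weight 2(1 - d) reduce to a few lines. This sketch assumes the clustering step returns one label per checked-in POI, with -1 marking noise; the function names are hypothetical.

```python
def checkin_dispersion(labels, noise=-1):
    """d = (number of noise POIs) / (number of POIs the user checked in at)."""
    if not labels:
        raise ValueError("user has no check-ins")
    return sum(1 for label in labels if label == noise) / len(labels)


def region_weight(d):
    """Region dynamic weight 2 * (1 - d)."""
    return 2.0 * (1.0 - d)


def region_level_interest(p_region, d):
    """S = 2 * (1 - d) * P(l_ij | u_i, h_j, beta_i, beta_j)."""
    return region_weight(d) * p_region
```

The weight falls from 2 (no noise points) to 0 (all check-ins are noise). Plugging in the dispersions reported for FIG. 3 and FIG. 4 gives weights of 2 × (1 - 0.085) = 1.83 and 2 × (1 - 0.71) = 0.58.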
In step 5, the user's interest in the POIs themselves is incorporated; analogously to step 3-1, the user's POI-level interest can be expressed as:
Figure BDA0002911278430000091
$P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j)$ is the probability that user i visits POI j, i.e. user i's preference score for the POI, where $u_i$ is the latent vector of user i, $v_j$ is the latent vector of the POI, and $\eta_i$ and $\eta_j$ are the user bias and the bias of the area in which the POI is located, respectively. The probability P can be obtained by modeling the user's historical check-in matrix (Figure BDA0002911278430000092) as in step 3-2 (each element of the matrix is the frequency with which the user visited the POI; 0 means the user has not interacted with that POI). The modeled POI-level interest $P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j)$ and the region-level interest $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$ weighted in step 4 are aggregated into the user's true preference S':
$S' = P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j) + 2(1-d)\,P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$
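The aggregation into S' for one user can be sketched as follows, assuming both probabilities have already been computed for the same set of POIs and stored in dictionaries keyed by POI ID (a hypothetical data layout and function name).

```python
def combined_scores(p_poi, p_region, d):
    """S'_j = P_poi(j) + 2 * (1 - d) * P_region(j) for each POI j of one user.

    p_poi, p_region: {poi_id: probability}, assumed to share the same keys.
    d: the user's check-in distribution dispersion.
    """
    w = 2.0 * (1.0 - d)  # region dynamic weight
    return {j: p_poi[j] + w * p_region[j] for j in p_poi}
```

Because the weight is per user, a user with dispersed check-ins (large d) is ranked mostly by POI-level interest, while a user who clusters check-ins is ranked with more region-level influence.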
Finally, POI recommendation in step 6 comprises the following step:
all POIs are ranked for each user by the combined preference score S', and the top-k highest-scoring POIs are selected to form a list that is recommended to the user.
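The ranking of step 6 can be sketched with `heapq.nlargest`. Following step 6 as summarized earlier in this description, POIs the user has already visited are excluded; the function name and argument layout are hypothetical.

```python
import heapq


def recommend_top_k(scores, visited, k):
    """Return the k unvisited POI IDs with the highest S' scores, best first.

    scores: {poi_id: S'} for one user; visited: set of POIs already checked in at.
    """
    unvisited = [(j, s) for j, s in scores.items() if j not in visited]
    return [j for j, _ in heapq.nlargest(k, unvisited, key=lambda js: js[1])]
```

`heapq.nlargest` avoids sorting the whole candidate list when k is much smaller than the number of POIs.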
FIG. 2 is a schematic diagram of the region-level interest modeling process. The left side shows the initialized user latent matrix and POI-area latent matrix; the goal of the logistic matrix factorization model is to learn these two matrices. After modeling, the learned user latent matrix and POI-area latent matrix shown on the right are obtained, and each user's interest score for each POI is computed from the dot product of the two matrices.
FIG. 3 shows the clustering of one user's check-in POIs. The user's check-ins are dense: red marks noise POIs, and green and blue mark the two clusters found. The ratio of this user's noise POIs to all of the user's check-in POIs is 0.085, indicating that the user has a strong preference for areas and attaches importance to area visits, so a large weight is assigned to region-level interest.
FIG. 4 shows the clustering of another user's check-in POIs. The scattered distribution of red noise POIs indicates that this user has no habit of area-oriented visiting and likes to visit places all over; the check-in dispersion is 0.71, so a lower region-level dynamic weight is assigned.

Claims (7)

1. A point-of-interest-area joint recommendation method in a location social network, characterized in that it comprises the following steps:
Step 1: preprocessing the users' historical check-in data:
The historical check-in data consist of a series of five-tuple records l, each containing a user ID, an interest point ID, longitude, latitude and visit frequency. During preprocessing, interest points visited by at least 10 distinct users and users who have checked in at no fewer than 15 distinct interest points are selected to form the core of the data set;
Step 2: constructing the region-level interest feature matrix R:
For the processed check-in data set, a region-level interest feature matrix is constructed. It reflects each user's interest in the area where each interest point is located and serves as the input to region-level interest modeling in the next step;
Step 3: modeling region-level interest:
With the region-level interest feature matrix as input, user feature vectors and interest-point feature vectors are learned by logistic matrix factorization, and the user's region-level interest preference is obtained from the dot product of the two vectors;
Step 4: adding the region dynamic weight:
Because users differ in how much attention they pay to areas, a region-level dynamic interest weight is obtained with the DBSCAN clustering algorithm and combined with the region-level interest preference to express the user's true region preference;
Step 5: combining the user's personal interest and region-level interest:
While region-level interest is captured, the user's interest in each individual interest point cannot be ignored; fusing in this personal interest makes the user's interest preference as complete as possible and improves point-of-interest recommendation;
Step 6: recommending the top-k interest points to the user.
2. The method according to claim 1, wherein step 1 specifically comprises the following steps:
Step 1-1: the pair (user ID, interest point ID) is taken as the key for all records and the accumulated number of check-ins as the value, giving the most basic data unit: user ID, interest point ID and check-in frequency;
Step 1-2: interest points visited by at least 10 distinct users and users who have checked in at no fewer than 15 distinct interest points are selected to form the core of the data set; finally, a dictionary H is built from the data set with the record format {[user ID, interest point ID]: check-in frequency}.
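Steps 1-1 and 1-2 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. The function name `preprocess` is hypothetical. The key is a tuple `(user_id, poi_id)` because Python lists cannot be dictionary keys. The filter is applied in a single pass; the text does not say whether filtering is repeated until both thresholds hold.

```python
from collections import Counter, defaultdict


def preprocess(checkins, min_users=10, min_pois=15):
    """Build dictionary H {(user_id, poi_id): check-in frequency}.

    checkins: iterable of (user_id, poi_id, lon, lat, freq) five-tuples.
    """
    # Step 1-1: accumulate the check-in frequency per (user, POI) key
    raw = Counter()
    for user, poi, _lon, _lat, freq in checkins:
        raw[(user, poi)] += freq

    # Distinct users per POI and distinct POIs per user
    users_per_poi = defaultdict(set)
    pois_per_user = defaultdict(set)
    for user, poi in raw:
        users_per_poi[poi].add(user)
        pois_per_user[user].add(poi)

    # Step 1-2: keep records whose POI and user both meet the thresholds
    return {
        (u, p): f
        for (u, p), f in raw.items()
        if len(users_per_poi[p]) >= min_users and len(pois_per_user[u]) >= min_pois
    }
```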
3. The method according to claim 2, wherein step 2 specifically comprises the following steps:
Step 2-1: a concept is first defined. The circular geographic area centred on an interest point's location with radius r is called the logical area of that interest point. Within this logical area, the number of other interest points a user has visited influences the user's interest in the logical area of the interest point;
Step 2-2: matrix construction:
Step 2-2-1: from the data set obtained in step 1, an interest point set $G = \{g_1, g_2, g_3, \dots, g_n\}$ and a dictionary $M = \{g_i: [long_i, lat_i]\}$ mapping each interest point ID to its longitude and latitude are constructed. The set G contains all distinct interest points, and the dictionary M records each interest point ID together with its longitude and latitude;
Step 2-2-2: the distance between each interest point and a target interest point is computed with the haversine formula, with the distance threshold set to 10 km. While the distances are computed, a dictionary X stores, for each target interest point, the set of other interest points within the threshold. Once all distances have been computed, the dictionary X is complete, with the record format {target interest point ID: set S}, where every interest point in S is at most the threshold distance from the target interest point;
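A brute-force sketch of steps 2-2-1 and 2-2-2 follows. It assumes a mean Earth radius of 6371 km. The function names `haversine_km` and `build_logical_areas` are hypothetical. The dictionary `M` uses the {g_i: [long_i, lat_i]} layout described above. The target point itself is left out of its own set, because the text speaks of "other" interest points.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def build_logical_areas(M, threshold_km=10.0):
    """Dictionary X {target POI: set of other POIs within threshold_km}.

    M: dict {poi_id: [lon, lat]}. Brute-force O(n^2) pairwise scan.
    """
    X = {g: set() for g in M}
    ids = list(M)
    for idx, gi in enumerate(ids):
        for gj in ids[idx + 1:]:
            if haversine_km(*M[gi], *M[gj]) <= threshold_km:
                # distance is symmetric, so record the pair in both directions
                X[gi].add(gj)
                X[gj].add(gi)
    return X
```

The pairwise scan is quadratic in the number of interest points. For large data sets a spatial index would be the usual substitute, but that goes beyond what the text describes.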
Step 2-2-3: the region-level interest feature matrix R is initialized and then built from the dictionary X using the following formula:
[Formula image FDA0002911278420000021 (definition of $R_{ui}$) not reproduced in the text]
$R_{ui}$ denotes the preference of user u for the area of interest point i in the matrix R. $X_i$ is the set of interest points within the logical area of the target interest point i, $Y_u$ is the set of interest points visited by user u, $X_i \cap Y_u$ is the set of interest points the user visited within the logical area of the target interest point, and α is an area compensation factor. For the same user, with logical areas of the same size and the same number of visited interest points, the user's region-level interest is also affected by how densely or sparsely the interest points are distributed within the logical area. The sparser the interest points in the logical area, the greater the user's interest in that area; conversely, the denser they are, the smaller the user's interest compared with a sparse area.
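The formula for $R_{ui}$ survives only as an image, so the sketch below computes only the ingredients the text names. For each user u and target point i, it returns the overlap $|X_i \cap Y_u|$ and the logical-area size $|X_i|$. The text says point density matters, and $|X_i|$ is one natural density measure. How α combines these counts cannot be recovered, so the sketch stops there. The function name `region_overlap_counts` is hypothetical.

```python
def region_overlap_counts(X, visits):
    """Ingredients of R_ui named in the text, not the R_ui formula itself.

    X: {poi_id: set of POIs in that POI's logical area}   (X_i)
    visits: {user_id: set of POIs visited by the user}    (Y_u)
    Returns {(user, poi): (|X_i & Y_u|, |X_i|)} for pairs with a non-empty overlap.
    """
    counts = {}
    for u, Y_u in visits.items():
        for i, X_i in X.items():
            overlap = len(X_i & Y_u)  # POIs the user visited inside i's logical area
            if overlap:
                counts[(u, i)] = (overlap, len(X_i))
    return counts
```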
4. The method of claim 3, wherein the method comprises the following steps:
Step 3-1: let $l_{ij}$ denote the event that user i chooses to interact with the logical area in which interest point j is located (i.e., user i likes and visits that logical area), and let $\beta_i$ and $\beta_j$ be the user bias and the interest-point bias, respectively. The probability $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$ that user i visits the logical area of interest point j is:
[Formula image FDA0002911278420000031 (definition of $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$) not reproduced in the text]
where $u_i$ is the latent vector of user i and $h_j$ is the latent vector of the logical area in which interest point j is located. The probability P can be read as the user's preference score for the area in which the interest point is located;
Step 3-2: finally, the parameters U, H and β in the above formula are learned by solving the following optimization problem:
[Formula image FDA0002911278420000032 (objective function) not reproduced in the text]
The parameters are learned by optimizing the objective function iteratively with stochastic gradient descent. The learned parameters U, H and β then represent the user's region-level interest, i.e., $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$.
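The probability formula and objective of steps 3-1 and 3-2 appear only as images. The sketch below therefore assumes the standard logistic matrix factorization form, $P = \sigma(u_i \cdot h_j + \beta_i + \beta_j)$. The per-entry log-likelihood `conf_scale`·R_ij·log P + log(1 - P) weights each observed entry by its value in R, and L2 regularization is applied to U and H. These choices, the hyperparameter values, and the names `train_lmf` and `predict` are assumptions, not the patent's exact formulation. The update is plain stochastic gradient ascent on the log-likelihood, which is equivalent to descent on its negative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_lmf(R, k=4, conf_scale=1.0, reg=0.01, lr=0.05, epochs=200, seed=0):
    """Logistic matrix factorization trained entry by entry with SGD (a sketch)."""
    rng = np.random.default_rng(seed)
    R = np.asarray(R, dtype=float)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))  # user latent vectors u_i
    H = 0.1 * rng.standard_normal((n_items, k))  # logical-area latent vectors h_j
    b_user = np.zeros(n_users)                   # user biases beta_i
    b_item = np.zeros(n_items)                   # interest-point biases beta_j
    pairs = np.array([(i, j) for i in range(n_users) for j in range(n_items)])
    for _ in range(epochs):
        rng.shuffle(pairs)  # shuffle the (user, item) rows in place
        for i, j in pairs:
            s = U[i] @ H[j] + b_user[i] + b_item[j]
            r = conf_scale * R[i, j]
            g = r - (1.0 + r) * sigmoid(s)  # d(log-likelihood)/ds
            u_old = U[i].copy()
            U[i] += lr * (g * H[j] - reg * U[i])
            H[j] += lr * (g * u_old - reg * H[j])
            b_user[i] += lr * g
            b_item[j] += lr * g
    return U, H, b_user, b_item


def predict(U, H, b_user, b_item):
    """Matrix of P(l_ij | u_i, h_j, beta_i, beta_j) for every user and logical area."""
    return sigmoid(U @ H.T + b_user[:, None] + b_item[None, :])


R = np.array([[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]])
P = predict(*train_lmf(R))
```

Under the same assumptions, the routine could also fit the POI-level model of claim 6 if it were given the check-in frequency matrix in place of R.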
5. The method of claim 4, wherein the method comprises the following steps:
Step 4-1: the dynamic weight is derived indirectly with a density-based clustering method (DBSCAN), which uses two parameters, the scan radius eps and the minimum number of interest points MinPts:
Step 4-1-1: each user's historical check-in interest points are clustered independently. For a target user, a set T of unvisited interest points is initialized, and an unprocessed interest point $t_i$ is taken from T. If $t_i$ has not yet been assigned to a cluster or marked as noise, its neighborhood of radius eps is scanned. If the neighborhood contains at least MinPts interest points, a new cluster $c_i$ is created and all interest points in the neighborhood are added to a candidate set N; if it contains fewer than MinPts, $t_i$ is marked as a noise point;
Step 4-1-2: the neighborhood of every unprocessed interest point t in the candidate set N is checked, and if it contains at least MinPts interest points, those points are added to N; if t has not been assigned to any cluster, it is added to cluster $c_i$;
Step 4-1-3: step 4-1-2 is repeated, checking the unprocessed interest points in N until N is empty;
Step 4-1-4: steps 4-1-1 to 4-1-3 are repeated until every interest point has been assigned to a cluster or marked as noise;
The final output is a set of clusters $C = \{c_1, c_2, \dots, c_m\}$, which also includes the set of noise points.
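Steps 4-1-1 to 4-1-4 describe standard DBSCAN; a compact pure-Python version follows. It assumes planar (x, y) coordinates with Euclidean distance, so check-in longitude/latitude would first need projecting or a haversine distance. It also assumes a point counts as part of its own neighborhood and uses the hypothetical label `NOISE = -1`. These conventions are assumptions, not taken from the text. A point first marked as noise is reassigned to a cluster if it later turns up in a core point's neighborhood, matching step 4-1-2.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    points: list of (x, y) coordinates; Euclidean distance is assumed.
    """
    labels = [None] * len(points)  # None = not yet processed

    def neighbors(idx):
        px, py = points[idx]
        return [j for j, (qx, qy) in enumerate(points)
                if math.hypot(px - qx, py - qy) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]  # candidate set N
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # reachable noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand N
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels
```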
Step 4-2: using the clustering result, the ratio of the number of noise interest points produced by clustering to the number of interest points the user has checked in at is taken as the check-in distribution dispersion d. Combining steps 3 and 4 gives the user's actual region-level interest:
$$S = 2(1-d)\,P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$$
The region dynamic weight is set to 2×(1-d), where d is the check-in distribution dispersion obtained by clustering each user i's historical check-ins. This weight balances the user's preference for individual interest points against the user's preference for the areas in which those points lie.
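Step 4-2 reduces to a ratio and a scaling. The sketch assumes one cluster label per checked-in interest point, with noise marked as -1. The helper names `checkin_dispersion`, `region_weight` and `region_level_interest` are hypothetical.

```python
def checkin_dispersion(labels, noise_label=-1):
    """d = (number of noise check-in POIs) / (number of check-in POIs) for one user."""
    if not labels:
        raise ValueError("user has no check-ins")
    return sum(1 for lab in labels if lab == noise_label) / len(labels)


def region_weight(d):
    """Region dynamic weight 2 * (1 - d)."""
    return 2.0 * (1.0 - d)


def region_level_interest(p_region, d):
    """S = 2 * (1 - d) * P(l_ij | u_i, h_j, beta_i, beta_j); works on scalars or NumPy arrays."""
    return region_weight(d) * p_region
```

A user with no noise points (d = 0) gets the maximum weight of 2, and a user whose check-ins are all noise (d = 1) gets 0, so the region-level term drops out entirely.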
6. The method of claim 5, wherein the method comprises:
The user's POI-level interest in step 5 can be expressed as:
[Formula image FDA0002911278420000041 (definition of $P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j)$) not reproduced in the text]
$P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j)$ is the probability that user i visits interest point j, i.e., user i's preference score for that interest point. Here $u_i$ is the latent vector of user i, $v_j$ is the latent vector of interest point j, and $\eta_i$ and $\eta_j$ are the user bias and the bias term of interest point j, respectively. The probability P is fitted with the method of step 3-2 to the user's historical check-in matrix
[Matrix image FDA0002911278420000042 not reproduced in the text]
in which each element is the frequency with which the user visited the interest point, and a value of 0 means the user has not interacted with that interest point. The modeled POI-level interest $P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j)$ is aggregated with the region-level interest $P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j)$ obtained after step 4 to form the user's true preference S':
$$S' = P(l_{ij} \mid u_i, v_j, \eta_i, \eta_j) + 2(1-d)\,P(l_{ij} \mid u_i, h_j, \beta_i, \beta_j).$$
7. The method according to claim 6, wherein the method comprises the following step:
For each user, all interest points are ranked by the comprehensive interest preference score S', and the top-k highest-scoring interest points are selected to form a list that is recommended to the user.
CN202110092706.5A 2021-01-22 2021-01-22 Interest point-area joint recommendation method in location social network Pending CN112905905A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110092706.5A CN112905905A (en) 2021-01-22 2021-01-22 Interest point-area joint recommendation method in location social network

Publications (1)

Publication Number Publication Date
CN112905905A (en) 2021-06-04

Family

ID=76117158

Country Status (1)

Country Link
CN (1) CN112905905A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653637A (en) * 2015-12-28 2016-06-08 苏州大学 Interest point recommendation method based on hierarchical structure
US10019734B2 (en) * 2007-07-03 2018-07-10 Vulcan Inc. Method and system for continuous, dynamic, adaptive recommendation based on a continuously evolving personal region of interest
CN111324816A (en) * 2020-03-05 2020-06-23 重庆大学 Interest point recommendation method based on region division and context influence
CN112052405A (en) * 2020-08-24 2020-12-08 杭州电子科技大学 Passenger searching area recommendation method based on driver experience
CN112131490A (en) * 2020-09-18 2020-12-25 东南大学 Region-sensitive interest point recommendation method driven by knowledge graph

Non-Patent Citations (1)

Title
HAO YUAN ET AL: "PRPOIR: Exploiting the Region-Level Interest for POI Recommendation", 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI) *

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN113505306A (en) * 2021-06-21 2021-10-15 广东交通职业技术学院 Interest point recommendation method, system and medium based on heterogeneous graph neural network
CN113505306B (en) * 2021-06-21 2022-04-22 广东交通职业技术学院 Interest point recommendation method, system and medium based on heterogeneous graph neural network
CN113761381A (en) * 2021-09-23 2021-12-07 北京百度网讯科技有限公司 Method, device and equipment for recommending interest points and storage medium
CN113761381B (en) * 2021-09-23 2023-08-08 北京百度网讯科技有限公司 Method, device, equipment and storage medium for recommending interest points
CN114048391A (en) * 2022-01-13 2022-02-15 中国测绘科学研究院 Interest activity recommendation method based on geographic grid
CN114048391B (en) * 2022-01-13 2022-04-19 中国测绘科学研究院 Interest activity recommendation method based on geographic grid
CN115048560A (en) * 2022-03-30 2022-09-13 华为技术有限公司 Data processing method and related device
CN115860780A (en) * 2022-12-16 2023-03-28 江苏易交易信息科技有限公司 Accurate analysis recommendation matching method and system for potential trading users
CN115860780B (en) * 2022-12-16 2023-11-21 江苏易交易信息科技有限公司 Transaction potential user accurate analysis recommendation matching method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210604