CN116166878A

CN116166878A - Time perception self-adaptive interest point recommendation method based on K-means clustering

Info

Publication number: CN116166878A
Application number: CN202211571570.7A
Authority: CN
Inventors: 朱俊; 梁太波; 韩立新
Original assignee: Nanjing Vocational University of Industry Technology NUIT
Current assignee: Nanjing Vocational University of Industry Technology NUIT
Priority date: 2022-01-21
Filing date: 2022-12-08
Publication date: 2023-05-26
Also published as: CN114528480A

Abstract

The invention discloses a time-aware adaptive point-of-interest recommendation method based on K-means clustering, comprising the following steps: first, converting a check-in dataset into a three-dimensional scoring matrix; second, counting the number of check-in users, the number of visited points of interest, and the number of check-ins in each time slot, and constructing a three-dimensional check-in feature vector for each time slot; third, applying K-means clustering to the time slots and calculating the time similarity between time slots in the same cluster; fourth, calculating the user similarity at the current time using scoring information from the other time slots in the same time cluster; fifth, improving the traditional user-based collaborative filtering method with the time clustering result and the in-cluster time similarity, so that the method adaptively generates point-of-interest prediction scores according to the current recommendation time; and sixth, comparing the recommendation accuracy of the proposed recommendation system with that of other classical recommendation systems to evaluate the accuracy and effectiveness of the proposed technique.

Description

Time perception self-adaptive interest point recommendation method based on K-means clustering

Technical Field

The invention relates to a time-aware adaptive point-of-interest recommendation method based on K-means clustering for location-based social networks, and belongs to the technical field of artificial intelligence and machine learning.

Background

In recent years, communication, positioning, and mobile internet technologies have developed rapidly, and location-based social networks (LBSNs) have become a new medium through which people share and pass on information, providing a platform that closely connects online virtual networks with the offline real world. A large number of mature location-based social network platforms now exist in China and abroad, such as Facebook, YouTube, Twitter, Weibo, Douban, Dianping, Meituan, and WeChat Moments. In a location-based social network, users can establish complex social relationships, such as friends, colleagues, and relatives; browse geotagged places of interest ("points of interest", POIs) such as restaurants, shops, and movie theatres; and check in through a mobile device when visiting a POI, publishing their geographic location and sharing suggestions and comments about the POI. LBSNs bring convenience to users and help merchants understand the real users behind the network, so that personalized services can be tailored to the needs of different users, which makes them highly practical.

As the number of users communicating in LBSNs increases, LBSNs store and accumulate rich information such as check-in records, social relationships, spatiotemporal data, and text, images, and video. This mass of information gives users abundant data resources, but it also causes information overload and makes it harder for users to find the items they want. Recommendation systems, which address the information overload problem, have therefore attracted growing research attention. For example, Amazon uses a recommendation system to recommend goods to users, raising click-through rates and turnover for merchants, and the movie recommendation website Netflix attracted many research teams to work on recommendation accuracy by hosting a recommendation system competition. As a special kind of information filtering system, a recommendation system does not require users to supply explicit keywords; instead, it models users' interests by analysing their historical behavior and discovers their latent preferences, so that goods and services matching their needs can be recommended proactively. Drawing on large amounts of user, friend, and location information, researchers have built applications for LBSNs such as friend recommendation, expert discovery, point-of-interest recommendation, activity recommendation, and route recommendation. Point-of-interest (POI) recommendation, a natural product of the joint development of traditional recommendation systems and location-based social networks, has become a research hotspot.

Point-of-interest recommendation is an important branch of recommendation systems, and both its development history and its key techniques closely follow those of traditional recommendation systems. Some POI recommendation research therefore treats a location as an ordinary item, like a film or a piece of music, and generates recommendations with traditional methods. By design strategy, traditional recommendation algorithms fall into collaborative filtering algorithms, content-based recommendation algorithms, and hybrid recommendation algorithms. Collaborative filtering algorithms in turn include memory-based algorithms (e.g., user-based collaborative filtering and item-based collaborative filtering) and model-based algorithms (e.g., singular value decomposition, clustering models, and probabilistic latent semantic analysis). Content-based POI recommendation extracts relevant information, such as tags, categories, and user reviews, from visited locations; user preferences are extracted from the user's profile and then matched against location profiles to obtain recommendations. User-based collaborative filtering (UBCF) converts users' check-in behavior into a user-POI scoring matrix, finds users similar to the current active user from existing check-in records, predicts the active user's scores for places not yet checked in according to the preferences of those similar users, and recommends the POIs with the highest predicted scores. Item-based collaborative filtering (IBCF) rests on the assumption that a user always prefers locations that are highly similar to places the user liked before. IBCF therefore first calculates the similarity between points of interest and then recommends to the active user the locations most similar to the POIs the user has visited. Singular value decomposition (SVD) is a classical representative of matrix factorization, whose main task is to generate low-rank approximations. The low-dimensional orthogonal matrices produced by SVD reduce the noise in the original matrix and reveal latent associations between users and items more effectively. Among the various recommendation techniques, collaborative filtering does not require much domain-specific knowledge, avoids complex information collection and content analysis, is easy to engineer, and can be applied to products conveniently. Collaborative filtering has thus become the most widely used and popular technique in the traditional recommendation field.

The conventional recommendation techniques above ignore the influence of temporal context on users' check-in behavior. In fact, time is a very important piece of context in point-of-interest recommendation, and a user's check-in habits are always closely related to it. From a macroscopic perspective, a user's preference for points of interest is affected by the broader seasonal context: for example, a food platform recommends dumpling shops in winter, and a travel site recommends water parks in summer. More importantly, user preferences drift over time: a user who used to prefer KTV and movie theatres may recently prefer bookstores and coffee shops. Beyond these macroscopic features, fine-grained time effects better reflect a user's check-in preferences in a specific period: for example, catering POIs are visited most at around 12:00 and 18:00, and the popularity of bars rises from 21:00 onwards. How to introduce time information into a recommendation algorithm and provide a suitable POI recommendation list for a user in a specific time period has become an urgent need for social application platforms.

At present, some recommendation systems integrate time context into the point of interest recommendation problem, but the existing time-aware point of interest recommendation systems still have some drawbacks and disadvantages, which are summarized as follows:

(1) The related research of the point-of-interest recommendation technology based on the time feature is still relatively less compared with the recommendation technology considering other category contexts such as social relations, geographic features and the like. Most of the point-of-interest recommendation technologies are not good at handling dynamically changing user demands, are difficult to support the correction and adjustment of user preferences generated over time, and cannot give the point-of-interest recommendation results most in line with the current time situation in real time.

(2) The time-dependent dynamics of user similarity are ignored. Existing research does not consider how user similarity varies along the time dimension when calculating it, and the same similarity matrix is shared across all time periods. In reality, user similarity may change over time. For example, at noon on a workday a user often visits a restaurant near the workplace with colleagues, and at that time the similarity between the user and the colleagues is higher than the similarity between the user and family members; after coming home from work, the user often visits a supermarket near home with family members, and at that time the similarity between the user and family members is higher. Using a single global user similarity at all times therefore does not reflect reality.

(3) Data sparseness problem of user-time-point of interest three-dimensional matrix. The number of addresses visited by the user is very small compared to thousands of geographic locations in a location social network, which results in a very sparse scoring matrix itself. The problem of data sparseness is more pronounced in point of interest recommendation systems that consider space-time context. This is because, in order to explore the behavior pattern of the user in the target period, the present sparse check-in data set needs to be further divided into several subsets according to the time axis, which undoubtedly aggravates the sparseness of the scoring matrix. Therefore, a method capable of alleviating the data sparseness problem must be studied to improve the accuracy and reliability of the recommended results over a certain period of time.

These shortcomings of existing time-aware point-of-interest recommendation techniques cause serious problems in the design, development, deployment, and operation of location-based social network platforms. In particular, on network platforms with massive item information they reduce the service quality of the recommendation system and thereby hurt the sales performance of e-commerce systems.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to build a point-of-interest recommendation system with accurate results that can generate a POI list in real time for a given time point, and provides a time-aware adaptive point-of-interest recommendation method based on K-means clustering. Considering both the differences and the correlations in users' check-in data across time slots, the invention proposes analysing the distance from each time point to the cluster centres: K-means clustering is used to mine the correlations between time slots, and clustering the time slots alleviates the sparsity of the high-dimensional check-in data, improves the effectiveness of score prediction, and strengthens the service quality of the recommendation system.

The technical scheme adopted to solve the technical problem is as follows: divide a day into 24 time slots; using the time tags, count the number of check-in users, the number of visited points of interest, and the number of check-ins in each time slot, and apply K-means clustering to the time slots based on these three-dimensional features; calculate the similarity of users in different time slots from the time clustering result and the users' historical check-in information; improve the scoring method of the traditional UBCF algorithm with the time clustering so that it adaptively generates POI prediction scores according to the time slot; and rank the prediction scores of all unvisited locations and recommend the top-ranked locations to the user (as shown in FIG. 1).

The method comprises the following specific processes:

Step 1: Collect and organize the users' original check-in dataset and convert it into a user-time-POI three-dimensional scoring matrix.

Step 2: Count the number of check-in users, the number of visited points of interest, and the number of check-ins in each time slot. Based on these statistics, construct a three-dimensional check-in feature vector for each time slot to form the time slot check-in feature set.

Step 3: Based on the statistics from Step 2, cluster the time slots with the K-means method, and calculate the time similarity between time slots in the same cluster.

Step 4: Following the basic principle of high similarity within clusters and low similarity between clusters, calculate the user similarity at the current recommendation time by making use of the scoring information in the other time slots of the same time cluster.

Step 5: Improve the scoring method of the traditional user-based collaborative filtering algorithm with the time clustering result and the in-cluster time similarity, so that it adaptively generates POI prediction scores according to the current recommendation time, and recommend to the user the top-ranked unvisited locations for the current time.

Step 6: Evaluate the recommendation quality with recommendation accuracy metrics, and compare the accuracy of the proposed recommendation system with that of other classical recommendation systems to evaluate the accuracy and effectiveness of the proposed technique.

The beneficial effects are that:

(1) The time-aware adaptive point-of-interest recommendation method based on K-means clustering can, at any time, generate a real-time POI recommendation list for a user according to the user's current behavior habits and the current popularity of points of interest, and can help merchants push advertisements to users accurately, thereby attracting more potential consumers.

(2) The method clusters time, mines the time-dependent dynamics of user similarity, and finds different groups of similar users for a user at different times. This "time-varying" neighbour search better matches how user preferences change in reality, which improves users' satisfaction with the social network platform and increases the accuracy and interpretability of the recommendation system, and is therefore significant for practical applications.

(3) Clustering time with the K-means method allows all time slots in a cluster to share scoring data, fully exploits the similarity between time slots, and alleviates the data sparsity of the high-order scoring matrix. The method is general and portable: it can be applied not only to point-of-interest recommendation systems but also to personalized recommendation of other traditional items, and has broad prospects for industrial application.

Drawings

FIG. 1 is a flowchart of a time-aware adaptive interest point recommendation method based on K-means clustering.

Fig. 2 is a flowchart of specific steps of a time-aware adaptive interest point recommendation method based on K-means clustering.

FIG. 3 is a schematic diagram of check-in records of a user in a location social network in an embodiment of the present invention.

FIG. 4 is a schematic diagram of the number of check-in users, the number of visited points of interest, and the number of check-ins in each time slot in an embodiment of the present invention.

FIG. 5 is a graph showing K-means clustering results for all time slots in an embodiment of the present invention.

FIG. 6 is a bar chart comparing the Precision of the proposed recommendation algorithm with that of the classical user-based collaborative filtering (UBCF) and social-relationship-based collaborative filtering (SCF) algorithms in an embodiment of the present invention.

FIG. 7 is a bar chart comparing the Recall of the proposed recommendation algorithm with that of the classical user-based collaborative filtering (UBCF) and social-relationship-based collaborative filtering (SCF) algorithms in an embodiment of the present invention.

FIG. 8 is a bar chart comparing the combined accuracy metric F1 of the proposed recommendation algorithm with that of the classical user-based collaborative filtering (UBCF) and social-relationship-based collaborative filtering (SCF) algorithms in an embodiment of the present invention.

Detailed Description

The invention will now be described in further detail with reference to the accompanying drawings and specific examples.

The specific flow of the design and implementation of the invention is shown in figure 2, and the main variables and parameters in the process are shown in table 1.

TABLE 1 Functions of the main variables and parameters

First, the original sign-in data set of the user is collected and arranged and converted into a three-dimensional scoring matrix of the user-time-interest points. The operation steps are as follows:

(1.a) Sort out the users' original check-in dataset C to obtain n check-in records, denoted $C = \{c_1, c_2, \ldots, c_n\}$. Each check-in record is a quintuple of user ID, check-in time, geographic latitude, geographic longitude, and point-of-interest ID. The set of all users in the check-in dataset is denoted U and the set of all points of interest is denoted L; NU and NL are the numbers of users and points of interest, respectively.

(1.b) Divide the day into 24 discrete time slots, with the set of time slots denoted $T = \{0, 1, 2, \ldots, 23\}$. Round the check-in time in each check-in record down to the hour to obtain the corresponding time slot value $t \in [0, 23]$.

(1.c) Count the check-ins in the set of quintuple check-in records and generate a quadruple $(u_i, t, l_j, n_{i,t,j})$ for each user-time-POI combination, where $u_i$ is the i-th user ($i \in [1, NU]$), $l_j$ is the j-th point of interest ($j \in [1, NL]$), t is the time slot value obtained from the check-in time in the record ($t \in [0, 23]$), and $n_{i,t,j}$ is the number of times user $u_i$ visited point of interest $l_j$ in time slot t.

(1.d) Convert the number of check-ins $n_{i,t,j}$ of user $u_i$ at point of interest $l_j$ in time slot t into the score $r_{i,t,j}$ of user $u_i$ for $l_j$ in time slot t. If user $u_i$ visited $l_j$ in time slot t, then $r_{i,t,j} = 1$; otherwise $r_{i,t,j} = 0$:

$$r_{i,t,j} = \begin{cases} 1, & n_{i,t,j} > 0 \\ 0, & n_{i,t,j} = 0 \end{cases}$$

where $r_{i,t,j}$ is the score of user $u_i$ for location $l_j$ in time slot t, and $n_{i,t,j}$ is the number of times user $u_i$ visited $l_j$ in time slot t.

Collect all scores to form the user-time-POI three-dimensional scoring matrix $R = \{r_{i,t,j}\}$, $i \in [1, NU]$, $t \in [0, 23]$, $j \in [1, NL]$, where i is the user index, t is the time slot value, j is the location index, NU is the total number of users, NL is the total number of points of interest, and $r_{i,t,j}$ is the score of user $u_i$ for location $l_j$ in time slot t.
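The first step can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `build_scores`, the timestamp format, and the toy records are all assumptions, and the scoring matrix R is stored sparsely as a dictionary holding only the entries equal to 1.

```python
from collections import Counter
from datetime import datetime

def build_scores(records):
    """records: iterable of (user_id, checkin_time, lat, lon, poi_id) quintuples.

    Returns (counts, scores): counts maps (user, slot, poi) -> n_{i,t,j};
    scores is a sparse view of R that stores only the entries with r_{i,t,j} = 1.
    """
    counts = Counter()
    for user, ts, _lat, _lon, poi in records:
        # Assumed timestamp format; the hour of day is the time slot t in [0, 23].
        slot = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour
        counts[(user, slot, poi)] += 1
    scores = {key: 1 for key, n in counts.items() if n > 0}
    return counts, scores

# Hypothetical toy records, not taken from any real dataset.
records = [
    ("u1", "2010-05-01 15:13:23", 34.05, -118.24, "p1"),
    ("u1", "2010-05-08 15:40:00", 34.05, -118.24, "p1"),
    ("u2", "2010-05-01 00:11:20", 37.77, -122.42, "p2"),
]
counts, scores = build_scores(records)
print(counts[("u1", 15, "p1")], scores.get(("u2", 15, "p1"), 0))
```

A dictionary is used here because most of the NU × 24 × NL entries are zero; a dense array would work equally well for small datasets.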

Second, count the number of check-in users, the number of visited points of interest, and the number of check-ins in each time slot. Based on these statistics, construct a three-dimensional check-in feature vector for each time slot to form the time slot check-in feature set. The specific operation steps are as follows:

(2.a) Count the number of users $Unum_t$ in the check-in dataset who checked in during time slot t:

$$Unum_t = \sum_{u \in U} \mathrm{isCheck}(u, t)$$

where u is a user in the location-based social network, U is the set of all users in the check-in dataset, and the isCheck function indicates whether user u has any check-in in time slot t:

$$\mathrm{isCheck}(u, t) = \begin{cases} 1, & \sum_{l \in L} r_{u,t,l} > 0 \\ 0, & \text{otherwise} \end{cases}$$

where l is a point of interest in the location-based social network, L is the set of all points of interest in the check-in dataset, and $r_{u,t,l}$ is the score of user u for location l in time slot t.

(2.b) Count the number of points of interest $Pnum_t$ in the check-in dataset that were visited during time slot t:

$$Pnum_t = \sum_{l \in L} \mathrm{isChecked}(l, t)$$

where l is a point of interest in the location-based social network, L is the set of all points of interest in the check-in dataset, and the isChecked function indicates whether point of interest l was visited in time slot t:

$$\mathrm{isChecked}(l, t) = \begin{cases} 1, & \sum_{u \in U} r_{u,t,l} > 0 \\ 0, & \text{otherwise} \end{cases}$$

where u is a user in the location-based social network, U is the set of all users in the check-in dataset, and $r_{u,t,l}$ is the score of user u for location l in time slot t.

(2.c) Count the total number of check-ins $Cnum_t$ in the check-in dataset that occurred during time slot t:

$$Cnum_t = \sum_{i=1}^{n} \mathrm{isTime}(c_i, t)$$

where n is the number of check-in records in the check-in dataset C, and the isTime function indicates whether the i-th check-in record $c_i$ occurred in time slot t:

$$\mathrm{isTime}(c_i, t) = \begin{cases} 1, & time_i \text{ in } t \\ 0, & \text{otherwise} \end{cases}$$

where $time_i$ is the check-in time of the i-th check-in record $c_i$, and "$time_i$ in t" means that the time slot corresponding to $time_i$ is t.

(2.d) Based on the above statistics, construct the three-dimensional check-in feature vector $x_t = \{Unum_t, Pnum_t, Cnum_t\}$ for each time slot t, forming the time slot check-in feature set $X = \{x_0, x_1, \ldots, x_{23}\}$, where $t \in [0, 23]$, $Unum_t$ is the number of users who checked in during time slot t, $Pnum_t$ is the number of points of interest visited during time slot t, and $Cnum_t$ is the total number of check-ins that occurred during time slot t.
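The three per-slot statistics can be computed in a single pass over the check-in counts. The sketch below is illustrative only: `slot_features` and the toy `counts` dictionary are hypothetical names and data, and the input is assumed to map (user, slot, POI) triples to the number of check-ins $n_{u,t,l}$.

```python
def slot_features(counts):
    """counts: dict mapping (user, slot, poi) -> number of check-ins n_{u,t,l}.

    Returns a list X of 24 vectors [Unum_t, Pnum_t, Cnum_t], one per time slot.
    """
    users = [set() for _ in range(24)]   # distinct users seen in each slot
    pois = [set() for _ in range(24)]    # distinct POIs visited in each slot
    totals = [0] * 24                    # total check-ins in each slot
    for (user, slot, poi), n in counts.items():
        if n > 0:
            users[slot].add(user)
            pois[slot].add(poi)
            totals[slot] += n
    return [[len(users[t]), len(pois[t]), totals[t]] for t in range(24)]

# Hypothetical toy counts.
counts = {
    ("u1", 12, "p1"): 2,
    ("u2", 12, "p1"): 1,
    ("u2", 12, "p3"): 1,
    ("u1", 21, "p2"): 3,
}
X = slot_features(counts)
print(X[12], X[21])
```

In this toy input, slot 12 has two distinct users, two distinct POIs, and four check-ins, so its feature vector is `[2, 2, 4]`.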

Third, based on the statistics from the second step, cluster the time slots with the K-means method, and calculate the time similarity between time slots in the same cluster. The implementation steps are as follows:

(3.a) Cluster the 24 time slots with the K-means method, which is algorithmically simple and converges quickly, to generate nc cluster centres $Cen = \{cen_1, cen_2, \ldots, cen_{nc}\}$ ($nc \in [2, 24]$).

(3.b) For any two time slots t and t' in the same time cluster, calculate the time similarity $timesimi(t, t')$ between them:

where u is a user in the location-based social network, U is the set of all users in the check-in dataset, l is a point of interest in the location-based social network, L is the set of all points of interest in the check-in dataset, $r_{u,t,l}$ is the score of user u for location l in time slot t, $r_{u,t',l}$ is the score of user u for location l in time slot t', and NU is the total number of users in the check-in dataset.
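The time similarity formula itself is not reproduced in this text. As a purely illustrative stand-in that uses the same symbols (u, U, l, L, $r_{u,t,l}$, $r_{u,t',l}$, NU), the sketch below assumes that the time similarity is the cosine similarity between each user's POI score vectors in the two slots, averaged over all NU users. This functional form, the function name `time_similarity`, and the toy data are assumptions, not the patent's definition.

```python
from math import sqrt

def time_similarity(scores, users, pois, t, t2):
    """Assumed form: mean over all users of the cosine similarity between the
    user's POI score vectors in slots t and t2 (a user contributes 0 when
    either vector is all zeros)."""
    total = 0.0
    for u in users:
        a = [scores.get((u, t, l), 0) for l in pois]
        b = [scores.get((u, t2, l), 0) for l in pois]
        dot = sum(x * y for x, y in zip(a, b))
        norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
        if norm > 0:
            total += dot / norm
    return total / len(users)

# Hypothetical toy data: u1 visits p1 in both slots; u2 visits different POIs.
users = ["u1", "u2"]
pois = ["p1", "p2"]
scores = {
    ("u1", 11, "p1"): 1, ("u1", 12, "p1"): 1,
    ("u2", 11, "p2"): 1, ("u2", 12, "p1"): 1,
}
sim_11_12 = time_similarity(scores, users, pois, 11, 12)
sim_11_11 = time_similarity(scores, users, pois, 11, 11)
print(sim_11_12, sim_11_11)
```

Under this assumed form, two slots are similar when the same users visit the same places in both, which matches the stated goal of sharing scoring data only between behaviorally related time slots.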

Fourth, following the basic principle of high similarity within clusters and low similarity between clusters, calculate the user similarity at the current recommendation time by making use of the scoring information in the other time slots of the same time cluster. The implementation steps are as follows:

(4.a) Select a target user $u_t$ in the location-based social network as the object of the recommendation service, and convert the current recommendation time $time_r$ to a time slot $t_r$.

(4.b) From the clustering result, determine the cluster $cen_j$ to which time slot $t_r$ belongs and the number nj of time slots in that cluster, denoted $cen_j = \{t_r, t_2, t_3, \ldots, t_{nj}\}$. Compute the user similarity $sim(u_t, v, t_r)$ between the target user $u_t$ and each other user v at time slot $t_r$:

where $u_t$ is the target user currently served by the recommendation system, v is another user in the location-based social network, $t_r$ is the time slot corresponding to the current recommendation time, nj is the number of time slots in the cluster $cen_j$ containing $t_r$, NL is the total number of points of interest in the check-in dataset, $r_{u_t, cen_j[a], l}$ is the score of target user $u_t$ for point of interest l in time slot $cen_j[a]$ of cluster $cen_j$, and $r_{v, cen_j[b], l}$ is the score of user v for point of interest l in time slot $cen_j[b]$ of cluster $cen_j$, with $a \in [1, nj]$ and $b \in [1, nj]$.
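The user similarity formula is likewise not reproduced in this text. The sketch below only illustrates the idea of pooling scores across the time slots of one cluster: it assumes, hypothetically, that each user's scores are summed over all slots in the cluster and that the two pooled POI vectors are compared with cosine similarity. The function name `user_similarity` and the toy data are illustrative assumptions.

```python
from math import sqrt

def user_similarity(scores, pois, cluster, u, v):
    """Assumed form: cosine similarity between the two users' POI score
    vectors after summing scores over every time slot in `cluster`."""
    pu = [sum(scores.get((u, t, l), 0) for t in cluster) for l in pois]
    pv = [sum(scores.get((v, t, l), 0) for t in cluster) for l in pois]
    dot = sum(x * y for x, y in zip(pu, pv))
    norm = sqrt(sum(x * x for x in pu)) * sqrt(sum(y * y for y in pv))
    return dot / norm if norm > 0 else 0.0

# Hypothetical toy data: u and v visit p1 in different slots of the same
# cluster; w visits p1 only outside the cluster.
cluster = [11, 12, 13]
pois = ["p1", "p2"]
scores = {
    ("u", 11, "p1"): 1,
    ("v", 12, "p1"): 1,
    ("w", 20, "p1"): 1,
    ("w", 12, "p2"): 1,
}
sim_uv = user_similarity(scores, pois, cluster, "u", "v")
sim_uw = user_similarity(scores, pois, cluster, "u", "w")
print(sim_uv, sim_uw)
```

The toy case shows why pooling helps with sparsity: u and v never check in during the same hour, so a similarity computed on slot 11 alone would be zero, yet they are recognized as similar once slots 11 to 13 are treated as one cluster.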

Fifth, improve the scoring method of the traditional user-based collaborative filtering algorithm with the time clustering result and the in-cluster time similarity, so that it adaptively generates POI prediction scores according to the current recommendation time, and recommend to the user the top-ranked unvisited locations for the current time. The implementation steps are as follows:

(5.a) Determine a target user $u_t$ in the location-based social network as the object of the recommendation service, and convert the current recommendation time $time_r$ to a time slot $t_r$.

(5.b) From the clustering result, determine the cluster $cen_j$ to which time slot $t_r$ belongs and the number nj of time slots in that cluster, denoted $cen_j = \{t_r, t_2, t_3, \ldots, t_{nj}\}$.

(5.c) Calculate the predicted score of target user $u_t$ for visiting point of interest l at time $t_r$:

where $u_t$ is the target user currently served by the recommendation system, $t_r$ is the time slot corresponding to the current recommendation time, l is a point of interest in the location-based social network that the target user has not visited, v is another user in the location-based social network, U is the set of all users, $sim(u_t, v, t_r)$ is the user similarity between user $u_t$ and user v at time slot $t_r$, nj is the number of time slots in the cluster $cen_j$ containing $t_r$, $r_{v, cen_j[i], l}$ is the score of user v for point of interest l at time $cen_j[i]$, with $i \in [1, nj]$, and $timesimi(t_r, cen_j[i])$ is the similarity between the current time $t_r$ and another time $cen_j[i]$.

(5.d) Sort all locations that target user $u_t$ has not visited by predicted score, form the recommendation list $TopNList_t$ from the N top-ranked locations, and return it to the target user.
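The prediction formula is not reproduced in this text, so the following is only a sketch under stated assumptions. It assumes the predicted score is a user-similarity-weighted average over the other users, where each neighbour's in-cluster scores are weighted by their time similarity to $t_r$; the normalization by the sum of similarities and the treatment of "unvisited" as "never visited in any slot" are also assumptions. `predict_scores`, `top_n`, and the two similarity callables are hypothetical names.

```python
def predict_scores(scores, users, pois, cluster, target, t_r, user_sim, time_sim):
    """Assumed form: similarity-weighted vote of the other users, where each
    neighbour's scores in the cluster are weighted by time similarity to t_r."""
    visited = {l for (u, _t, l) in scores if u == target}
    preds = {}
    for l in pois:
        if l in visited:
            continue  # only POIs the target user has never visited are candidates
        num = den = 0.0
        for v in users:
            if v == target:
                continue
            s = user_sim(target, v)
            vote = sum(time_sim(t_r, t) * scores.get((v, t, l), 0) for t in cluster)
            num += s * vote
            den += abs(s)
        preds[l] = num / den if den > 0 else 0.0
    return preds

def top_n(preds, n):
    """Return the n POIs with the highest predicted score (ties broken by name)."""
    return sorted(preds, key=lambda l: (-preds[l], l))[:n]

# Hypothetical toy data and precomputed similarities.
users = ["u", "v", "w"]
pois = ["p1", "p2", "p3"]
cluster = [11, 12]
scores = {("u", 11, "p1"): 1, ("v", 11, "p2"): 1, ("v", 12, "p3"): 1, ("w", 12, "p3"): 1}
user_sim = lambda a, b: {"v": 0.8, "w": 0.2}[b]
time_sim = lambda a, b: 1.0 if a == b else 0.5
preds = predict_scores(scores, users, pois, cluster, "u", 11, user_sim, time_sim)
recommended = top_n(preds, 1)
print(preds, recommended)
```

Because the vote for each POI depends on $t_r$ through the time similarity weights, the same user can receive a different ranked list at a different recommendation time, which is the adaptive behavior the fifth step describes.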

Sixth, evaluate the recommendation quality with recommendation accuracy metrics, and compare the accuracy of the proposed recommendation system with that of other classical recommendation systems to evaluate the accuracy and effectiveness of the proposed technique. The implementation steps are as follows:

(6.a) Randomly select NU × 10% of the users in the target dataset as the target user set AU, and run each recommendation algorithm for every target user in the set to generate a recommendation list, where NU is the total number of users in the check-in dataset.

(6.b) Evaluate the accuracy of each recommendation system with the accuracy metrics: for one run of an algorithm on the target user set AU, the values of Precision, Recall, and the combined accuracy metric F1 are the averages of those metrics over all users in AU.

(6.c) Repeat steps (6.a) and (6.b) Ntimes times, i.e., every algorithm is run independently Ntimes times.

(6.d) The Precision, Recall, and F1 values of each recommendation algorithm are the averages of the results of the Ntimes runs.

(6.e) Compare and analyse the results for each metric: if the Precision of the time-aware adaptive point-of-interest recommendation algorithm based on K-means clustering is higher than that of the other recommendation algorithms, the proposed technique hits users' preferred items more accurately; if its Recall is higher than that of the other recommendation algorithms, the proposed technique retrieves more of the users' preferred items; and if its F1 value is higher than that of the other recommendation algorithms, the proposed technique has stronger overall recommendation accuracy.
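The per-run averaging in (6.b) can be sketched as follows, using the standard definitions of Precision (hits divided by list length), Recall (hits divided by the number of relevant items), and F1 (their harmonic mean). The text does not state how the ground-truth set for each user is obtained, so the `truth` dictionary of held-out visited POIs is an assumption, as are the function names and the toy data.

```python
def precision_recall_f1(recommended, relevant):
    """Standard top-N metrics for one user."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def evaluate(rec_lists, truth):
    """Average Precision, Recall, and F1 over all target users in AU."""
    rows = [precision_recall_f1(rec_lists[u], truth[u]) for u in rec_lists]
    n = len(rows)
    return tuple(sum(row[k] for row in rows) / n for k in range(3))

# Hypothetical recommendation lists and held-out ground truth for two users.
rec_lists = {"u1": ["p1", "p2", "p3", "p4"], "u2": ["p5", "p6", "p7", "p8"]}
truth = {"u1": ["p1", "p9"], "u2": ["p0"]}
avg_precision, avg_recall, avg_f1 = evaluate(rec_lists, truth)
print(avg_precision, avg_recall, avg_f1)
```

Repeating this evaluation Ntimes times with freshly sampled target users and averaging the three values again would correspond to steps (6.c) and (6.d).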

In the following, a specific social network based on location is taken as an example to describe in detail how the time-aware adaptive interest point recommendation method based on K-means clustering in the present invention operates.

Gowalla is a location-based social networking service provider where users share their locations by checking in. The Gowalla dataset contains the social relationships and check-in information of 196591 users on the website from February 2009 to October 2010. The Gowalla dataset contains 1256379 points of interest, 6442892 check-in records of users at those points of interest, and 950327 social relationships among the users. It has become one of the test datasets most commonly used by recommendation system researchers.

The invention takes the check-in data of five popular areas in the Gowalla dataset, namely Los Angeles, San Francisco, New York, Maricopa, and King, as an example for illustration.

The first step, collecting and sorting the original sign-in data set of the user, converting the original sign-in data set into a three-dimensional scoring matrix of the user-time-interest points, and the operation steps are as follows:

(1. A) collecting and sorting user check-in data of Los Angeles, san Francisco, new York, maricopa and King regions in an example dataset Gowalla, obtaining a check-in dataset C consisting of 50007 historical access records of 1572 users at 1420 addresses, denoted as C= { C ₁ ,c ₂ ,…,c ₅₀₀₀₇ }. A schematic diagram of historical access records of users in a location social network in a Gowalla dataset is shown in FIG. 3. 13864 social relations are formed among the users, the average number of check-in records of each user is 31.81, the average number of social relations of each user is 8.82, and the average number of times that each interest point is accessed is 35.22.

Each check-in record is a quintuple of user ID, check-in time, geographic latitude, geographic longitude and point-of-interest ID. The set of all users in the check-in dataset is denoted by U and the set of all points of interest by L; the number of users NU is 1572 and the number of points of interest NL is 1420.

(1.b) Divide the day into 24 discrete time slots, with the set of time slots denoted $T = \{0, 1, 2, \ldots, 23\}$. Round the check-in time in each check-in record down to the hour to obtain the value of the corresponding time slot $t$ ($t \in [0, 23]$). For example, the check-in time 15:13:23 corresponds to time slot $t = 15$, and the check-in time 00:11:20 corresponds to time slot $t = 0$.
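As a minimal sketch of step (1.b), the slot can be obtained by keeping only the hour of the check-in time. The function name `time_slot` and the `HH:MM:SS` input format are assumptions made for illustration; the actual Gowalla records carry full timestamps.

```python
from datetime import datetime


def time_slot(check_in_time: str) -> int:
    """Map an 'HH:MM:SS' check-in time to its hourly slot t in [0, 23].

    Hypothetical helper: minutes and seconds are dropped, which matches
    the two examples given in the text (15:13:23 -> 15, 00:11:20 -> 0).
    """
    return datetime.strptime(check_in_time, "%H:%M:%S").hour


print(time_slot("15:13:23"))  # 15
print(time_slot("00:11:20"))  # 0
```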

(1.c) Count the check-ins in the set of check-in quintuples and generate a four-tuple $(u_i, t, l_j, n_{i,t,j})$ for each user-time-interest point combination, where $u_i$ is the $i$-th user ($i \in [1, 1572]$), $l_j$ is the $j$-th point of interest ($j \in [1, 1420]$), $t$ is the value of the time slot obtained by rounding the check-in time in the record ($t \in [0, 23]$), and $n_{i,t,j}$ is the number of times user $u_i$ accessed point of interest $l_j$ in time slot $t$.

Summarize all scores to form the user-time-interest point three-dimensional scoring matrix $R = \{r_{i,t,j}\}$, $i \in [1, 1572]$, $t \in [0, 23]$, $j \in [1, 1420]$, where $i$ is the user number, $t$ is the value of the time slot, $j$ is the address number, and $r_{i,t,j}$ is the score of user $u_i$ for address $l_j$ in time slot $t$.
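The counting and scoring steps can be sketched as follows. The sketch assumes check-ins have already been reduced to `(user, slot, poi)` triples and applies the binary rule stated in claim 2 (score 1 if the user visited the point of interest in that slot, otherwise 0). Storing only the non-zero entries in a set is an illustrative choice, since the full 1572 × 24 × 1420 matrix is very sparse; the names `build_scores` and `score` are hypothetical.

```python
from collections import Counter


def build_scores(check_ins):
    """check_ins: iterable of (user_id, slot, poi_id), one per check-in record.

    Returns (counts, scores):
      counts[(u, t, l)] is the check-in count n_{u,t,l};
      scores is the set of (u, t, l) keys whose score r_{u,t,l} is 1.
    """
    counts = Counter(check_ins)
    scores = {key for key, n in counts.items() if n > 0}
    return counts, scores


def score(scores, u, t, l):
    """Look up r_{u,t,l}; every triple absent from the set scores 0."""
    return 1 if (u, t, l) in scores else 0


# Made-up records, for illustration only.
records = [("u1", 15, "p1"), ("u1", 15, "p1"), ("u2", 0, "p2")]
counts, scores = build_scores(records)
print(counts[("u1", 15, "p1")], score(scores, "u1", 15, "p1"))  # 2 1
```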

The statistics of the number of checked-in users, the number of accessed points of interest and the number of check-ins in each time slot are shown in FIG. 4.
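The per-slot statistics can be computed in a single pass over the check-in records. The following is a sketch under the assumption that each record has been reduced to a `(user, slot, poi)` triple; `slot_features` is a hypothetical name, and the returned tuples correspond to the feature vectors $(Unum_t, Pnum_t, Cnum_t)$ described in claim 3.

```python
def slot_features(check_ins, num_slots=24):
    """Return x[t] = (Unum_t, Pnum_t, Cnum_t) for every slot t.

    Unum_t: distinct users who checked in during slot t
    Pnum_t: distinct points of interest accessed during slot t
    Cnum_t: total number of check-in records in slot t
    """
    users = [set() for _ in range(num_slots)]
    pois = [set() for _ in range(num_slots)]
    totals = [0] * num_slots
    for u, t, l in check_ins:
        users[t].add(u)
        pois[t].add(l)
        totals[t] += 1
    return [(len(users[t]), len(pois[t]), totals[t]) for t in range(num_slots)]


# Made-up records, for illustration only.
records = [("u1", 15, "p1"), ("u1", 15, "p1"), ("u2", 15, "p2"), ("u2", 0, "p2")]
x = slot_features(records)
print(x[15])  # (2, 2, 3)
```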

(3.a) Cluster the 24 time slots using the K-means method, which is algorithmically simple and converges quickly, generating 3 clusters, $Cen = \{cen_1, cen_2, cen_3\}$. The time slot set of the first cluster is {7,8,9,10,11,12,13}, that of the second cluster is {0,1,2,3,16,17,18,19,20,21,22,23}, and that of the third cluster is {4,5,6,14,15}. The K-means clustering result for the 24 time slots is shown in FIG. 5.
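For illustration, a plain Lloyd-style K-means over the slot feature vectors might look like the sketch below. The initialization from evenly spaced input points and the absence of feature scaling are assumptions: the text does not say how centers are initialized or whether the three features are normalized. The feature values in the example are invented and are not the Gowalla statistics.

```python
def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])),
    )


def kmeans(points, k, max_iter=100):
    """Cluster equal-length numeric tuples; returns (labels, centers)."""
    # Assumption: deterministic start from k evenly spaced input points.
    step = len(points) / k
    centers = [tuple(points[int(i * step)]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Invented (Unum, Pnum, Cnum) vectors for six slots, for illustration only.
features = [(1, 1, 1), (2, 2, 2), (1, 2, 1), (100, 100, 100), (101, 99, 100), (102, 100, 101)]
labels, centers = kmeans(features, k=2)
clusters = [[t for t, lab in enumerate(labels) if lab == c] for c in range(2)]
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

With real data, the 24 vectors from the per-slot statistics would be passed in with `k=3` to obtain slot sets analogous to the three clusters listed above.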

(3.b) Calculate the time similarity between any two time slots $t$ and $t'$ within each of the three time clusters:

where $u$ is a user in the location-based social network, $U$ is the set of all users in the check-in dataset, $l$ is a point of interest in the location-based social network, $L$ is the set of all points of interest in the check-in dataset, $r_{u,t,l}$ is the score of user $u$ for address $l$ in time slot $t$, and $r_{u,t',l}$ is the score of user $u$ for address $l$ in time slot $t'$.
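The similarity formula itself is not reproduced in this text. One form consistent with the symbols listed above is the cosine similarity between the user × point-of-interest score slices at $t$ and $t'$; the sketch below assumes that form and binary scores, so it should be read as an illustration rather than as the formula of the invention. `time_similarity` is a hypothetical name.

```python
from math import sqrt


def time_similarity(scores, t, t2):
    """Assumed cosine similarity between the score slices at slots t and t2.

    scores: set of (user, slot, poi) triples whose score is 1.
    With binary scores, the dot product is the number of shared (user, poi)
    pairs and each squared norm is the number of pairs in that slot.
    """
    a = {(u, l) for (u, s, l) in scores if s == t}
    b = {(u, l) for (u, s, l) in scores if s == t2}
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


# Made-up scores, for illustration only.
scores = {("u1", 20, "p1"), ("u2", 20, "p2"), ("u1", 21, "p1"), ("u3", 21, "p3")}
print(time_similarity(scores, 20, 21))  # 0.5
```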

(4.a) Select a target user $u_t$ in the location-based social network as the object of the recommendation service, and convert the current recommendation time $time_r$ to a time slot $t_r$. Suppose the current time $time_r$ is 20:14:13; the corresponding time slot $t_r$ is then 20.

(4.b) Based on the clustering result, determine the cluster $cen_j$ to which time slot $t_r$ belongs and the number of time slots $nj$ in that cluster, denoted $cen_j = \{t_r, t_2, t_3, \ldots, t_{nj}\}$. For example, when the recommendation time slot $t_r$ is 20, the cluster is $cen_j = \{20, 0, 1, 2, 3, 16, 17, 18, 19, 21, 22, 23\}$ and the number of time slots in this cluster is 12 ($nj = 12$).

Compute the similarity between the active user $u_t$ and each other user $v$ at time slot $t_r$:

where $u_t$ is the target user currently served by the recommendation system, $v$ is another user in the location-based social network, $t_r$ is the time slot corresponding to the current recommendation time, $nj$ is the number of time slots in the cluster $cen_j$ to which $t_r$ belongs, and $r_{u_t,cen_j[a],l}$ denotes the score of target user $u_t$ for point of interest $l$ at another time slot $cen_j[a]$ of cluster $cen_j$.
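The cluster lookup and a cluster-aware user similarity can be sketched as below. The user similarity formula is not reproduced in this text, so the sketch assumes a cosine similarity over the binary scores of both users across all slots of the cluster containing $t_r$; that choice, and the names `cluster_of` and `user_similarity`, are assumptions. The three slot sets are the clusters reported in step (3.a).

```python
from math import sqrt

# Slot clusters reported in step (3.a).
CLUSTERS = [
    {7, 8, 9, 10, 11, 12, 13},
    {0, 1, 2, 3, 16, 17, 18, 19, 20, 21, 22, 23},
    {4, 5, 6, 14, 15},
]


def cluster_of(slot, clusters=CLUSTERS):
    """Return the slot set cen_j that contains the given slot."""
    for cluster in clusters:
        if slot in cluster:
            return cluster
    raise ValueError(f"slot {slot} is not in any cluster")


def user_similarity(scores, u, v, t_r, clusters=CLUSTERS):
    """Assumed cosine similarity between users u and v, restricted to the
    slots in the cluster of t_r. scores: set of (user, slot, poi) with score 1.
    """
    cluster = cluster_of(t_r, clusters)
    a = {(s, l) for (w, s, l) in scores if w == u and s in cluster}
    b = {(s, l) for (w, s, l) in scores if w == v and s in cluster}
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


print(len(cluster_of(20)))  # 12
```

Pooling the other slots of the cluster gives each user more non-zero scores to compare than slot $t_r$ alone would, which is how the time clustering is meant to ease sparsity.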

(5.d) Sort all addresses not yet accessed by target user $u_t$ by predicted score, form the recommendation list $TopNList_t$ from the $N$ top-ranked locations, and return it to the target user ($N$ may be a multiple of 5; in general, $5 \le N \le 50$).
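A possible shape for the prediction and Top-N steps is sketched below. The prediction formula is not reproduced in this text; the sketch assumes a user-based collaborative filtering sum in which each neighbor's score is weighted by the user similarity and by the time similarity between $t_r$ and the slot of that score, normalized by the total user similarity. It also assumes that "not accessed" means never visited by the target user in any slot. All names and example values are hypothetical.

```python
def predict_scores(scores, target, cluster, user_sim, time_sim):
    """Assumed prediction: sum over neighbors v and cluster slots s of
    user_sim[v] * time_sim[s] * r_{v,s,l}, divided by sum(|user_sim|).

    scores:   set of (user, slot, poi) with score 1
    user_sim: dict neighbor -> similarity to the target at t_r
    time_sim: dict slot -> similarity to t_r
    """
    pred = {}
    for v, s, l in scores:
        if v == target or s not in cluster:
            continue
        pred[l] = pred.get(l, 0.0) + user_sim.get(v, 0.0) * time_sim.get(s, 0.0)
    norm = sum(abs(w) for w in user_sim.values()) or 1.0
    return {l: p / norm for l, p in pred.items()}


def top_n(scores, target, predictions, n):
    """Rank addresses the target has never visited by predicted score."""
    visited = {l for (u, _, l) in scores if u == target}
    candidates = [(l, p) for l, p in predictions.items() if l not in visited]
    candidates.sort(key=lambda item: (-item[1], item[0]))  # ties broken by id
    return [l for l, _ in candidates[:n]]


# Made-up data, for illustration only.
scores = {("t", 20, "p1"), ("v1", 20, "p2"), ("v1", 21, "p3"),
          ("v2", 20, "p1"), ("v2", 22, "p3"), ("v2", 8, "p4")}
pred = predict_scores(scores, "t", {20, 21, 22},
                      user_sim={"v1": 0.5, "v2": 0.5},
                      time_sim={20: 1.0, 21: 0.5, 22: 0.2})
print(top_n(scores, "t", pred, n=2))  # ['p2', 'p3']
```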

(6.a) Randomly select 157 users from the target dataset as the target user set AU, and for each target user in the set run the time-aware adaptive interest point recommendation algorithm, the classical user-based collaborative filtering algorithm UBCF and the social-relationship-based collaborative filtering algorithm SCF to generate recommendation lists.

(6.c) Repeat steps (6.a) and (6.b) 100 times, i.e., all algorithms are run independently 100 times.

(6.d) Set the Precision, Recall and comprehensive precision index F1 of the proposed recommendation algorithm and of the UBCF and SCF algorithms to the averages over the 100 runs. The Precision, Recall and F1 results of each recommendation algorithm for different values of N are shown in Tables 2, 3 and 4, respectively, where the bold value in each row is the maximum of that row:

Table 2 Precision index values of different recommendation algorithms

Table 3 Recall index values of different recommendation algorithms

Table 4 F1 index values of different recommendation algorithms

Histograms comparing the Precision, Recall and comprehensive precision index F1 of the proposed recommendation algorithm with those of the classical UBCF and SCF algorithms in this example are shown in FIGS. 6, 7 and 8, respectively.
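The three indices can be computed per target user and then averaged. The sketch below uses the standard Top-N definitions (Precision = hits / list length, Recall = hits / number of relevant items, F1 = harmonic mean of the two); step (6.b), which would state the exact definitions used, is not shown in this text, so treating them as the standard ones is an assumption. The function names are hypothetical.

```python
from statistics import mean


def precision_recall_f1(recommended, relevant):
    """Standard Top-N metrics for one user.

    recommended: list of recommended addresses
    relevant:    set of addresses the user actually visited (held-out data)
    """
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_metrics(rows):
    """Column-wise mean of (precision, recall, f1) rows, usable both for
    averaging over the users in AU and for averaging over repeated runs."""
    return tuple(mean(col) for col in zip(*rows))


# Made-up example: 2 of 5 recommendations hit, out of 3 relevant addresses.
p, r, f1 = precision_recall_f1(["p1", "p2", "p3", "p4", "p5"], {"p2", "p5", "p9"})
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.4 0.667 0.5
```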

(6.e) Compare and analyze the results for each index: the Precision of the time-aware adaptive interest point recommendation algorithm based on K-means clustering is higher than that of the other recommendation algorithms, indicating that the proposed technique hits the user's preferred items more accurately; its Recall is higher than that of the other recommendation algorithms, indicating that the proposed technique has stronger query capability; its comprehensive precision index F1 is higher than that of the other recommendation algorithms, indicating that the proposed technique has stronger overall capability in terms of recommendation accuracy.

Unlike conventional interest point recommendation algorithms, the invention aims to build an interest point recommendation system that generates an interest point list in real time for a given time point and produces accurate recommendations. It takes into account both the differences and the correlations between the characteristics of user check-in data in different time slots, proposes analyzing the distance from each time point to the cluster centers, and uses the K-means clustering method to mine the correlations between time slots. Time clustering alleviates the sparsity of the high-dimensional check-in data, improves the accuracy and effectiveness of score prediction, and strengthens the service quality of the recommendation system. The proposed technique has broad application prospects and is expected to be widely used in the location-based social network market.

The above technical process is only a preferred embodiment of the present invention and does not represent all of its details. Any modification, equivalent replacement or improvement made by those skilled in the art within the scope of the present disclosure and within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims

1. A time-aware adaptive interest point recommendation method based on K-means clustering, characterized by comprising the following steps:

Step 1: collecting and organizing the users' original check-in dataset and converting it into a user-time-interest point three-dimensional scoring matrix;

Step 2: counting the number of checked-in users, the number of accessed points of interest and the number of check-ins in each time slot; constructing a three-dimensional check-in feature vector for each time slot from these statistics to form a time slot check-in data feature set;

Step 3: based on the statistics of Step 2, clustering the time slots by the K-means method and calculating the time similarity between time slots in the same cluster;

Step 4: following the basic principle of high similarity within clusters and low similarity between clusters, calculating the similarity between users at the current recommendation time by using the scoring information of the other time slots in the same time cluster;

Step 5: improving the scoring method of the traditional user-based collaborative filtering algorithm by using the time clustering result and the intra-cluster time similarity, so that it adaptively generates point-of-interest prediction scores according to the current recommendation time, and recommending to the user several top-ranked unvisited addresses for the current time;

Step 6: evaluating the recommendation quality with recommendation accuracy indices and comparing it with that of other classical recommendation systems to assess the accuracy and effectiveness of the proposed technique.

2. The K-means clustering-based time-aware adaptive interest point recommendation method according to claim 1, wherein step 1 of the method comprises:

Step 11: organizing the users' original check-in dataset C to obtain n check-in records, denoted $C = \{c_1, c_2, \ldots, c_n\}$; each check-in record is a quintuple of user ID, check-in time, geographic latitude, geographic longitude and point-of-interest ID; the set of all users in the check-in dataset is denoted by U and the set of all points of interest by L, and NU and NL are the numbers of users and points of interest, respectively;

Step 12: dividing the day into 24 discrete time slots, the set of time slots being denoted $T = \{0, 1, 2, \ldots, 23\}$; rounding the check-in time in each check-in record to obtain the value of the corresponding time slot $t$, $t \in [0, 23]$;

Step 13: counting the check-ins in the set of check-in quintuples and generating a four-tuple $(u_i, t, l_j, n_{i,t,j})$ for each user-time-interest point combination, where $u_i$ is the $i$-th user, $i \in [1, NU]$, $l_j$ is the $j$-th point of interest, $j \in [1, NL]$, $t$ is the value of the time slot obtained by rounding the check-in time in the record, $t \in [0, 23]$, and $n_{i,t,j}$ is the number of times user $u_i$ accessed point of interest $l_j$ in time slot $t$;

Step 14: converting the number of check-ins $n_{i,t,j}$ of user $u_i$ at point of interest $l_j$ in time slot $t$ into the score $r_{i,t,j}$ of user $u_i$ for point of interest $l_j$ in time slot $t$: if user $u_i$ has visited point of interest $l_j$ in time slot $t$, the score $r_{i,t,j} = 1$; otherwise, $r_{i,t,j} = 0$:

$$r_{i,t,j} = \begin{cases} 1, & n_{i,t,j} > 0 \\ 0, & n_{i,t,j} = 0 \end{cases} \qquad (1)$$

where $r_{i,t,j}$ is the score of user $u_i$ for address $l_j$ in time slot $t$, and $n_{i,t,j}$ is the number of times user $u_i$ accessed point of interest $l_j$ in time slot $t$;

summarizing all scores to form the user-time-interest point three-dimensional scoring matrix $R = \{r_{i,t,j}\}$, $i \in [1, NU]$, $t \in [0, 23]$, $j \in [1, NL]$, where $i$ is the user number, $t$ is the value of the time slot, $j$ is the address number, NU is the total number of users, NL is the total number of points of interest, and $r_{i,t,j}$ is the score of user $u_i$ for address $l_j$ in time slot $t$.

3. The K-means clustering-based time-aware adaptive interest point recommendation method according to claim 1, wherein step 2 of the method comprises:

Step 21: counting the number of users $Unum_t$ in the check-in dataset who checked in during time slot $t$:

$Unum_t = \sum_{u \in U} isCheck(u, t)$ (2)

where $l$ is a point of interest in the location-based social network, $L$ is the set of all points of interest in the check-in dataset, and $r_{u,t,l}$ is the score of user $u$ for address $l$ in time slot $t$;

Step 22: counting the number of points of interest $Pnum_t$ in the check-in dataset that are accessed in time slot $t$:

$Pnum_t = \sum_{l \in L} isChecked(l, t)$ (4)

where $u$ is a user in the location-based social network, $U$ is the set of all users in the check-in dataset, and $r_{u,t,l}$ is the score of user $u$ for address $l$ in time slot $t$;

Step 23: counting the total number of check-ins $Cnum_t$ in the check-in dataset that occur in time slot $t$:

where "$time_i$ in $t$" indicates that the time slot corresponding to the check-in time $time_i$ of the $i$-th check-in record $c_i$ is $t$;

Step 24: based on these statistics, constructing the three-dimensional check-in feature vector $x_t = \{Unum_t, Pnum_t, Cnum_t\}$ of each time slot $t$ to form the time slot check-in data feature set $X = \{x_0, x_1, \ldots, x_{23}\}$, $t \in [0, 23]$, where $Unum_t$ is the number of users who checked in during time slot $t$, $Pnum_t$ is the number of points of interest accessed in time slot $t$, and $Cnum_t$ is the total number of check-ins that occur in time slot $t$.

4. The K-means clustering-based time-aware adaptive interest point recommendation method according to claim 1, wherein step 3 of the method comprises:

Step 31: clustering the 24 time slots using the K-means method, which is algorithmically simple and converges quickly, to generate nc cluster centers $Cen = \{cen_1, cen_2, \ldots, cen_{nc}\}$ ($nc \in [2, 24]$);

Step 32: for any two time slots $t$ and $t'$ in each time cluster, calculating the time similarity between the two time slots:

5. The K-means clustering-based time-aware adaptive interest point recommendation method according to claim 1, wherein step 4 of the method comprises:

Step 41: selecting a target user $u_t$ in the location-based social network as the object of the recommendation service, and converting the current recommendation time $time_r$ to a time slot $t_r$;

Step 42: determining, from the clustering result, the cluster $cen_j$ to which time slot $t_r$ belongs and the number of time slots $nj$ in that cluster, denoted $cen_j = \{t_r, t_2, t_3, \ldots, t_{nj}\}$, and computing the similarity between the active user $u_t$ and each other user $v$ at time slot $t_r$:

where $u_t$ is the target user currently served by the recommendation system, $v$ is another user in the location-based social network, $t_r$ is the time slot corresponding to the current recommendation time, $nj$ is the number of time slots in the cluster $cen_j$ to which $t_r$ belongs, and NL is the total number of points of interest in the check-in dataset.

6. The K-means clustering-based time-aware adaptive interest point recommendation method according to claim 1, wherein step 5 of the method comprises:

Step 51: determining a target user $u_t$ in the location-based social network as the object of the recommendation service, and converting the current recommendation time $time_r$ to a time slot $t_r$;

Step 52: determining, from the clustering result, the cluster $cen_j$ to which time slot $t_r$ belongs and the number of time slots $nj$ in that cluster, denoted $cen_j = \{t_r, t_2, t_3, \ldots, t_{nj}\}$;

Step 53: calculating the predicted score of target user $u_t$ for accessing point of interest $l$ at time $t_r$:

where $r_{v,cen_j[i],l}$ denotes the score of user $v$ for point of interest $l$ at time $cen_j[i]$, $i \in [1, nj]$, and $timesimi(t_r, cen_j[i])$ denotes the similarity between the current time $t_r$ and another time $cen_j[i]$;

Step 54: sorting all addresses not yet accessed by target user $u_t$ by predicted score, forming the recommendation list $TopNList_t$ from the $N$ top-ranked locations, and returning it to the target user.

7. The K-means clustering-based time-aware adaptive interest point recommendation method according to claim 1, wherein step 6 of the method comprises:

Step 61: randomly selecting NU×10% of the users from the target dataset as the target user set AU, and running each recommendation algorithm for each target user in the set to generate a recommendation list, where NU is the total number of users in the check-in dataset;

Step 62: evaluating the accuracy of each recommendation system with the accuracy indices, where the Precision, Recall and comprehensive precision index F1 of one run of each algorithm on the target user set AU are the averages of those indices over all users in AU;

Step 63: repeating Steps 61 and 62 Ntimes times, i.e., all algorithms are run independently Ntimes times;

Step 64: setting the Precision, Recall and comprehensive precision index F1 of each recommendation algorithm to the averages over the Ntimes runs;

Step 65: comparing and analyzing the results for each index: if the Precision of the time-aware adaptive interest point recommendation algorithm based on K-means clustering is higher than that of the other recommendation algorithms, the technique hits the user's preferred items more accurately; if its Recall is higher than that of the other recommendation algorithms, the technique has stronger query capability; and if its comprehensive precision index F1 is higher than that of the other recommendation algorithms, the technique has stronger overall capability in terms of recommendation accuracy.