CN116166878A - Time perception self-adaptive interest point recommendation method based on K-means clustering - Google Patents
- Publication number
- CN116166878A (CN202211571570.7A)
- Authority
- CN
- China
- Prior art keywords
- time
- user
- interest
- time slot
- check
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0269—Targeted advertisements based on user profile or attribute
- G06Q30/0271—Personalized advertisement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a time-aware adaptive point-of-interest (POI) recommendation method based on K-means clustering, comprising the following steps: first, converting a check-in data set into a three-dimensional scoring matrix; second, counting the number of check-in users, the number of visited points of interest, and the number of check-ins in each time slot, and constructing a three-dimensional check-in feature vector for each time slot; third, performing K-means clustering on the time slots and calculating the time similarity between time slots in the same cluster; fourth, calculating user similarity at the current time using the scoring information of the other time slots in the same time cluster; fifth, improving the traditional user-based collaborative filtering method with the time clustering result and the intra-cluster time similarity, so that it adaptively generates POI prediction scores for the current recommendation time; and sixth, comparing the recommendation accuracy of the proposed recommendation system with that of other classical recommendation systems to evaluate the accuracy and effectiveness of the proposed technique.
Description
Technical Field
The invention relates to a time-aware adaptive point-of-interest recommendation method based on K-means clustering for location-based social networks, and belongs to the technical field of artificial intelligence and machine learning.
Background
In recent years, communication, positioning, and mobile internet technologies have developed rapidly, and location-based social networks (LBSNs) have become a new medium through which people share and exchange information, providing a platform that closely connects online virtual networks with the offline real world. A large number of mature location-based social network platforms now exist in China and abroad, such as Facebook, YouTube, Twitter, Weibo, Douban, Dianping, Meituan, and WeChat Moments. In a location-based social network, users can establish complex social relationships (friends, colleagues, relatives, and so on); browse geotagged places of interest ("points of interest", POIs) such as restaurants, shops, and movie theaters; check in with a mobile device when visiting a POI; publish geographic location information; and share suggestions and comments about POIs. LBSNs bring convenience to users and help merchants understand the real users behind the network, so that personalized services can be tailored to the needs of different users, which makes them highly practical and advanced.
As the number of users interacting in LBSNs grows, LBSNs store and accumulate a wealth of usable information, such as check-in records, social relationships, spatiotemporal data, and various text, images, and video. This massive volume of information gives users abundant data resources, but it also causes information overload and makes it harder for users to find their target items accurately. Recommendation systems, which address the information overload problem, have therefore attracted growing attention from researchers. For example, Amazon uses a recommendation system to recommend goods to users, raising click-through rates and sales for merchants, and the movie recommendation website Netflix attracted many research teams to work on improving recommendation accuracy by hosting a recommendation system competition. As a special kind of information filtering system, a recommendation system does not require users to actively supply specific keywords. Instead, it models users' interests by analyzing their historical behavior and discovers their latent preferences, so that it can proactively recommend goods and services that meet their needs. Drawing on large amounts of user, friend, and location information, researchers have built LBSN applications such as friend recommendation, expert discovery, POI recommendation, activity recommendation, and route recommendation. POI recommendation, a natural product of the joint development of traditional recommendation systems and location-based social networks, has become a research hotspot.
POI recommendation is an important branch of recommendation systems, and both its development history and its key techniques descend directly from traditional recommendation systems. Some POI recommendation research therefore treats a location as an ordinary item, like a film or a piece of music, and generates recommendations with traditional methods. By design strategy, traditional recommendation algorithms fall into collaborative filtering algorithms, content-based recommendation algorithms, and hybrid recommendation algorithms. Collaborative filtering algorithms in turn include memory-based algorithms (e.g., user-based and item-based collaborative filtering) and model-based algorithms (e.g., singular value decomposition, clustering models, and probabilistic latent semantic analysis). Content-based POI recommendation extracts relevant information, such as tags, categories, and user reviews, from visited locations; user preferences are extracted from the user's profile and matched against the location profile to produce recommendations. User-based collaborative filtering (UBCF) converts users' check-in behavior into a user-POI scoring matrix, finds users similar to the current active user from existing check-in records, predicts the active user's scores for places not yet visited from the preferences of those similar users, and recommends the POIs with the highest predicted scores. Item-based collaborative filtering (IBCF) rests on the assumption that a user prefers locations highly similar to places the user liked before. IBCF therefore first calculates the similarity between POIs and then recommends to the active user the places most similar to the POIs that user has visited. Singular value decomposition (SVD) is a classical representative of matrix factorization, whose main task is to generate low-rank approximations. The low-dimensional orthogonal matrices produced by SVD reduce noise relative to the original matrix and can reveal latent associations between users and items more effectively. Among the various recommendation techniques, collaborative filtering requires little domain-specific knowledge, avoids complex information collection and content analysis, is easy to engineer, and can be applied to products conveniently. Collaborative filtering has thus become the most widely used and popular technique in traditional recommendation.
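The UBCF procedure described above can be illustrated with a minimal sketch. This is not the patent's implementation: the function names `cosine` and `ubcf_rank`, the toy `visits` data, and the choice of cosine similarity over binary visit sets are all assumptions made for illustration. Each unvisited place is scored by summing the similarities of the other users who visited it. The usual normalizing denominator is omitted because it is the same for every candidate place of a given active user and so does not change the ranking.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sets of visited POIs (binary scores)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def ubcf_rank(visits, active):
    """Rank POIs the active user has not visited by similarity-weighted votes.

    visits: dict mapping user -> set of visited POI ids (hypothetical layout).
    """
    seen = visits[active]
    scores = {}
    for other, pois in visits.items():
        if other == active:
            continue
        sim = cosine(seen, pois)
        for poi in pois - seen:
            scores[poi] = scores.get(poi, 0.0) + sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data, invented for illustration only.
visits = {
    "alice": {"cafe", "bookstore"},
    "bob": {"cafe", "bookstore", "cinema"},
    "carol": {"bar"},
}
ranked = ubcf_rank(visits, "alice")
```

Here `bob` shares two places with `alice`, so his unvisited place `cinema` ranks above `bar`, which comes from a user with no overlap.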
The traditional recommendation techniques above ignore the influence of temporal context on users' check-in behavior. In fact, time is very important contextual information in POI recommendation, and users' check-in habits are closely tied to it. At a macroscopic level, a user's preference for POIs is affected by the broader temporal environment: for example, the Meituan platform recommends dumpling shops to users in winter, and travel websites recommend water parks in summer. More importantly, user preferences drift over time: a user who used to prefer KTV venues and movie theaters may recently have come to prefer bookstores and coffee shops. Beyond these macroscopic features, fine-grained temporal effects better reflect a user's check-in preferences in a specific period: for example, restaurant POIs are visited most at around 12:00 and 18:00, and the popularity of bars rises from 21:00 onward. How to introduce time information into the recommendation algorithm and provide a suitable POI recommendation list for a user in a specific time period has become an urgent need for social application platforms.
At present, some recommendation systems integrate temporal context into POI recommendation, but existing time-aware POI recommendation systems still have shortcomings, summarized as follows:
(1) Research on POI recommendation based on temporal features is still relatively scarce compared with research that considers other kinds of context, such as social relationships and geographic features. Most POI recommendation techniques handle dynamically changing user needs poorly, have difficulty supporting the revision and adjustment of user preferences over time, and cannot deliver in real time the POI recommendations best suited to the current time.
(2) The temporal dynamics of user similarity are ignored. Existing research does not consider how user similarity varies along the time dimension when calculating it, and shares a single similarity matrix across all time periods. In reality, user similarity can change over time. For example, at noon on a workday a user often visits restaurants near the workplace with colleagues, and the similarity between the user and those colleagues is then higher than that between the user and family members; after returning home from work, the user often visits supermarkets near home with family members, and the similarity between the user and family members is then higher. Using a global user similarity at all times therefore does not match reality.
(3) The user-time-POI three-dimensional matrix is sparse. The number of places a user has visited is tiny compared with the thousands of geographic locations in a location-based social network, so the scoring matrix is very sparse to begin with. Data sparsity is even more pronounced in POI recommendation systems that consider spatiotemporal context, because exploring the user's behavior pattern in a target period requires dividing the already sparse check-in data set into several subsets along the time axis, which further aggravates the sparsity of the scoring matrix. A method that alleviates data sparsity must therefore be studied to improve the accuracy and reliability of recommendations within a given time period.
These shortcomings of existing time-aware POI recommendation techniques seriously hamper the design, development, deployment, and operation of location-based social network platforms. In particular, they degrade the service quality of recommendation systems on platforms with massive item information, which in turn hurts the sales performance of e-commerce systems.
Disclosure of Invention
To address the shortcomings of the prior art and to build a POI recommendation system that produces accurate recommendations and can generate a POI list in real time for a given time point, the invention provides a time-aware adaptive POI recommendation method based on K-means clustering. Considering both the differences and the correlations among users' check-in data features in different time slots, the invention proposes analyzing the distance from each time point to the cluster centers, uses K-means clustering to mine the correlations between time slots, alleviates the sparsity of high-dimensional check-in data through time clustering, improves the effectiveness of score prediction, and strengthens the service quality of the recommendation system.
The technical scheme adopted to solve the technical problem is as follows: divide a day into 24 time slots; count, according to the time tags, the number of check-in users, the number of visited POIs, and the number of check-ins in each time slot; perform K-means clustering on the time slots based on these three-dimensional features; calculate user similarity in different time slots from the time clustering result and users' historical check-in information; improve the scoring method of the traditional UBCF algorithm with time clustering so that it adaptively generates POI prediction scores for each time slot; and rank the prediction scores of all unvisited places and recommend the top-ranked ones to the user (as shown in FIG. 1).
The specific process is as follows:
Step 1: Collect and organize the user's original check-in data set and convert it into a user-time-POI three-dimensional scoring matrix.
Step 2: Count the number of check-in users, the number of visited POIs, and the number of check-ins in each time slot. Based on these statistics, construct a three-dimensional check-in feature vector for each time slot to form the time slot check-in feature set.
Step 3: Based on the statistics from Step 2, cluster the time slots with the K-means method, and calculate the time similarity between time slots in the same cluster.
Step 4: Following the principle of high similarity within clusters and low similarity between clusters, calculate user similarity at the current recommendation time by making appropriate use of the scoring information from the other time slots in the same time cluster.
Step 5: Improve the scoring method of the traditional user-based collaborative filtering algorithm with the time clustering result and the intra-cluster time similarity, so that it adaptively generates POI prediction scores for the current recommendation time, and recommend to the user the top-ranked unvisited places for the current time.
Step 6: Evaluate recommendation quality with recommendation accuracy metrics, and compare the recommendation accuracy of the proposed system with that of other classical recommendation systems to evaluate the accuracy and effectiveness of the proposed technique.
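The evaluation in Step 6 can be illustrated with the standard top-N definitions of precision, recall, and F1. The sketch below assumes those standard definitions, since the text here names the metrics without giving formulas. The function name `precision_recall_f1` and the toy POI ids are hypothetical.

```python
def precision_recall_f1(recommended, visited):
    """Standard top-N metrics for one user.

    recommended: list of recommended POI ids (the top-N list).
    visited: set of POI ids the user actually visited in the test data.
    """
    hits = len(set(recommended) & set(visited))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(visited) if visited else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy example: 2 of the 4 recommended places were actually visited,
# out of 3 places visited in total.
precision, recall, f1 = precision_recall_f1(
    ["p1", "p2", "p3", "p4"], {"p2", "p4", "p9"}
)
```

In practice these per-user values would be averaged over all test users, and possibly over time slots, before systems are compared. That averaging scheme is likewise an assumption here.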
Beneficial effects:
(1) The time-aware adaptive POI recommendation method based on K-means clustering can generate, at any time, a real-time POI recommendation list for a user according to the user's current behavior habits and the current popularity trends of POIs, and can help merchants push advertisements to users accurately, thereby attracting more potential consumers.
(2) The method clusters time, mines the temporal dynamics of user similarity, and finds different groups of similar users for a user at different times. This time-varying neighbor search better matches how user preferences change in reality, which improves user satisfaction with the social network platform and increases the accuracy and interpretability of the recommendation system, and is of great significance for practical applications.
(3) Clustering time with the K-means method allows scoring data to be shared among all time slots within a cluster, fully mines the similarity between time slots, and alleviates the data sparsity of the high-order scoring matrix. The method has a degree of generality and portability: it can be applied not only to POI recommendation systems but also to personalized recommendation of other traditional items, and has broad prospects for industrial application.
Drawings
FIG. 1 is a flowchart of the time-aware adaptive POI recommendation method based on K-means clustering.
FIG. 2 is a flowchart of the specific steps of the time-aware adaptive POI recommendation method based on K-means clustering.
FIG. 3 is a schematic diagram of a user's check-in records in a location-based social network in an embodiment of the invention.
FIG. 4 is a schematic diagram of the number of check-in users, the number of visited POIs, and the number of check-ins in each time slot in an embodiment of the invention.
FIG. 5 shows the K-means clustering results for all time slots in an embodiment of the invention.
FIG. 6 is a bar chart comparing the Precision of the proposed recommendation algorithm with that of the classical user-based collaborative filtering (UBCF) and social-relationship-based collaborative filtering (SCF) algorithms in an embodiment of the invention.
FIG. 7 is a bar chart comparing the Recall of the proposed recommendation algorithm with that of the classical UBCF and SCF algorithms in an embodiment of the invention.
FIG. 8 is a bar chart comparing the F1 value (the combined accuracy index) of the proposed recommendation algorithm with that of the classical UBCF and SCF algorithms in an embodiment of the invention.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings and specific examples.
The specific design and implementation flow of the invention is shown in FIG. 2, and the main variables and parameters used in the process are listed in Table 1.
TABLE 1 Functions of the main variables and parameters
First, the user's original check-in data set is collected, organized, and converted into a user-time-POI three-dimensional scoring matrix. The operation steps are as follows:
(1.a) Organize the user's original check-in data set $C$ to obtain $n$ check-in records, denoted $C = \{c_1, c_2, \dots, c_n\}$. Each check-in record is a five-tuple of user ID, check-in time, geographic latitude, geographic longitude, and point-of-interest ID. The set of all users in the check-in data set is denoted $U$ and the set of all points of interest is denoted $L$; $NU$ and $NL$ are the numbers of users and points of interest, respectively.
(1.b) Divide the day into 24 discrete time slots, with the set of time slots denoted $T = \{0, 1, 2, \dots, 23\}$. Round the check-in time in each check-in record to obtain the corresponding time slot value $t$ ($t \in [0, 23]$).
(1.c) Count the check-ins in the set of five-tuple check-in records and generate, for each user-time-POI combination, a four-tuple $(u_i, t, l_j, n_{i,t,j})$, where $u_i$ is the $i$-th user ($i \in [1, NU]$), $l_j$ is the $j$-th point of interest ($j \in [1, NL]$), $t$ is the time slot value obtained by rounding the time point in the check-in record ($t \in [0, 23]$), and $n_{i,t,j}$ is the number of times user $u_i$ visited point of interest $l_j$ in time slot $t$.
(1.d) Convert the number of check-ins $n_{i,t,j}$ of user $u_i$ at point of interest $l_j$ in time slot $t$ into the score $r_{i,t,j}$ of user $u_i$ for point of interest $l_j$ in time slot $t$. If user $u_i$ has visited point of interest $l_j$ in time slot $t$, then $r_{i,t,j} = 1$; otherwise $r_{i,t,j} = 0$:
$$r_{i,t,j} = \begin{cases} 1, & n_{i,t,j} > 0 \\ 0, & n_{i,t,j} = 0 \end{cases}$$
where $r_{i,t,j}$ is the score of user $u_i$ for point of interest $l_j$ in time slot $t$, and $n_{i,t,j}$ is the number of times user $u_i$ visited point of interest $l_j$ in time slot $t$.
All scores are then collected into the user-time-POI three-dimensional scoring matrix $R = \{r_{i,t,j}\}$, $i \in [1, NU]$, $t \in [0, 23]$, $j \in [1, NL]$, where $i$ is the user number, $t$ is the time slot value, $j$ is the point-of-interest number, $NU$ is the total number of users, $NL$ is the total number of points of interest, and $r_{i,t,j}$ is the score of user $u_i$ for point of interest $l_j$ in time slot $t$.
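Steps (1.a) to (1.d) can be sketched as follows. This is only an illustration under stated assumptions: "rounding" the check-in time is taken to mean truncating to the hour, so that the slot always lies in [0, 23]; the scoring matrix is stored sparsely as a dictionary in which missing keys mean a score of 0; and the function name `build_scores` and the sample records are hypothetical.

```python
from collections import Counter
from datetime import datetime

def build_scores(records):
    """Turn five-tuple check-in records into counts n and binary scores r.

    records: iterable of (user_id, checkin_time, latitude, longitude, poi_id).
    Returns (counts, scores), both keyed by (user_id, time_slot, poi_id).
    """
    counts = Counter()
    for user_id, checkin_time, _lat, _lon, poi_id in records:
        slot = checkin_time.hour  # assumption: truncate to the hour, 0..23
        counts[(user_id, slot, poi_id)] += 1
    # r = 1 wherever n > 0; absent keys are implicitly 0 (sparse storage).
    scores = {key: 1 for key, n in counts.items() if n > 0}
    return counts, scores

# Invented sample records, for illustration only.
records = [
    ("u1", datetime(2022, 1, 3, 12, 15), 32.06, 118.79, "p1"),
    ("u1", datetime(2022, 1, 4, 12, 40), 32.06, 118.79, "p1"),
    ("u2", datetime(2022, 1, 3, 18, 5), 32.05, 118.78, "p2"),
]
counts, scores = build_scores(records)
```

A sparse dictionary is used because, as the background notes, the three-dimensional matrix is mostly zeros. A dense NU x 24 x NL array would work equally well for small data sets.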
And secondly, counting the number of check-in users, the number of accessed interest points and the number of check-in times in each time slot. And constructing a three-dimensional sign-in feature vector of each time slot based on the statistical result to form a time slot sign-in data feature set. The specific operation steps are as follows:
(2. A) counting the number of users Unum whose check-in actions occur in the time slot t in the check-in data set t :
Where U is a user in the location social network, U represents all user sets in the check-in dataset, and the isCheck function represents whether user U has a check-in behavior in time slot t:
where L is a point of interest in the location social network, L represents a set of all points of interest in the check-in dataset, r u,t,l Representing the score of user u for address l at time slot t.
(2.b) Count the number of points of interest $Pnum_t$ in the check-in data set that are visited during time slot $t$:
$$Pnum_t = \sum_{l \in L} isChecked(l,t) \qquad (4)$$
where $l$ is a point of interest in the location social network, $L$ is the set of all points of interest in the check-in data set, and the function $isChecked$ indicates whether point of interest $l$ is visited in time slot $t$:
$$isChecked(l,t) = \begin{cases} 1, & \sum_{u \in U} r_{u,t,l} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
where $u$ is a user in the location social network, $U$ is the set of all users in the check-in data set, and $r_{u,t,l}$ is the score of user $u$ for point of interest $l$ in time slot $t$.
(2.c) Count the total number of check-ins $Cnum_t$ in the check-in data set that occur during time slot $t$:
$$Cnum_t = \sum_{i=1}^{n} isTime(c_i,t) \qquad (6)$$
where $n$ is the number of check-in records in the check-in data set $C$, and the function $isTime$ indicates whether the $i$-th check-in record $c_i$ occurs in time slot $t$:
$$isTime(c_i,t) = \begin{cases} 1, & time_i \text{ in } t \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$
where "$time_i$ in $t$" means that the check-in time $time_i$ of the $i$-th check-in record $c_i$ falls in time slot $t$.
(2.d) Based on the above statistics, construct the three-dimensional check-in feature vector $x_t=\{Unum_t, Pnum_t, Cnum_t\}$ of each time slot $t$, forming the time-slot check-in feature set $X=\{x_0,x_1,\dots,x_{23}\}$, where $t\in[0,23]$, $Unum_t$ is the number of users who check in during time slot $t$, $Pnum_t$ is the number of points of interest visited in time slot $t$, and $Cnum_t$ is the total number of check-ins that occur in time slot $t$.
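The three per-slot statistics can be gathered in a single pass over the check-ins. The sketch below assumes each check-in has already been reduced to a `(user, slot, poi)` triple; the function name `slot_features` and the toy data are illustrative assumptions.

```python
def slot_features(check_ins):
    """check_ins: list of (user, slot, poi) triples, one entry per check-in.

    Returns a list X of 24 tuples x_t = (Unum_t, Pnum_t, Cnum_t).
    """
    users = [set() for _ in range(24)]   # distinct users seen in each slot
    pois = [set() for _ in range(24)]    # distinct points of interest per slot
    total = [0] * 24                     # raw check-in count per slot
    for u, t, l in check_ins:
        users[t].add(u)
        pois[t].add(l)
        total[t] += 1
    return [(len(users[t]), len(pois[t]), total[t]) for t in range(24)]

# Hypothetical toy check-ins.
check_ins = [("u1", 15, "l1"), ("u1", 15, "l1"), ("u2", 15, "l3"), ("u2", 0, "l2")]
X = slot_features(check_ins)
```

Using sets for users and points of interest gives the same result as summing $isCheck$ and $isChecked$, since each indicator is 1 exactly once per distinct user or point of interest in the slot.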
In the third step, the time slots are clustered with the K-means method based on the statistics from the second step, and the time similarity between time slots in the same cluster is calculated. The implementation steps are as follows:
(3.a) Cluster the 24 time slots with the K-means method, which is algorithmically simple and converges quickly, generating $nc$ cluster centers $Cen=\{cen_1,cen_2,\dots,cen_{nc}\}$ ($nc\in[2,24]$).
(3.b) For any two time slots $t$ and $t'$ in the same time cluster, calculate the time similarity between them:
where $u$ is a user in the location social network, $U$ is the set of all users in the check-in data set, $l$ is a point of interest in the location social network, $L$ is the set of all points of interest in the check-in data set, $r_{u,t,l}$ is the score of user $u$ for point of interest $l$ in time slot $t$, $r_{u,t',l}$ is the score of user $u$ for point of interest $l$ in time slot $t'$, and $NU$ is the total number of users in the check-in data set.
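Step (3.a) can be illustrated with a small pure-Python K-means (Lloyd's algorithm) over three-dimensional feature vectors. The text does not specify initialization, feature scaling or the stopping rule, so the evenly spaced initial centers, the squared Euclidean distance on raw features and the iteration cap below are all assumptions, as are the toy points.

```python
def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Cluster points into k groups; returns (labels, centers)."""
    # Assumption: deterministic, evenly spaced initial centers.
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its center).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical (Unum_t, Pnum_t, Cnum_t) vectors for six slots, two obvious groups.
points = [(1, 1, 1), (2, 1, 2), (1, 2, 1), (50, 40, 90), (52, 41, 95), (49, 39, 88)]
labels, centers = kmeans(points, k=2)
```

Because $Cnum_t$ is usually much larger than $Unum_t$ and $Pnum_t$, a real run may want to normalize the features first; whether the original method does so is not stated.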
In the fourth step, following the basic principle that similarity is high within a cluster and low between clusters, the user similarity at the current recommendation time is calculated by also using the scoring information of the other time slots in the same time cluster. The implementation steps are as follows:
(4.a) Select a target user $u_t$ in the location social network as the object of the recommendation service, and convert the current recommendation time $time_r$ into the time slot $t_r$.
(4.b) According to the clustering result, determine the cluster $cen_j$ to which time slot $t_r$ belongs and the number of time slots $nj$ in that cluster, written $cen_j=\{t_r,t_2,t_3,\dots,t_{nj}\}$. Calculate the user similarity between the target user $u_t$ and another user $v$ at time slot $t_r$:
where $u_t$ is the target user currently served by the recommendation system, $v$ is another user in the location social network, $t_r$ is the time slot corresponding to the current recommendation time, $nj$ is the number of time slots in the cluster $cen_j$ to which $t_r$ belongs, $NL$ is the total number of points of interest in the check-in data set, $r_{u_t,cen_j[a],l}$ is the score of target user $u_t$ for point of interest $l$ at time slot $cen_j[a]$ of cluster $cen_j$, $r_{v,cen_j[b],l}$ is the score of user $v$ for point of interest $l$ at time slot $cen_j[b]$ of cluster $cen_j$, and $a\in[1,nj]$, $b\in[1,nj]$.
In the fifth step, the scoring method of the traditional user-based collaborative filtering algorithm is improved using the time clustering result and the intra-cluster time similarity, so that it adaptively generates predicted scores for points of interest according to the current recommendation time and recommends to the user the top-ranked unvisited points of interest for the current time. The implementation steps are as follows:
(5.a) Determine a target user $u_t$ in the location social network as the object of the recommendation service, and convert the current recommendation time $time_r$ into the time slot $t_r$.
(5.b) According to the clustering result, determine the cluster $cen_j$ to which time slot $t_r$ belongs and the number of time slots $nj$ in that cluster, written $cen_j=\{t_r,t_2,t_3,\dots,t_{nj}\}$.
(5.c) Calculate the predicted score of target user $u_t$ for visiting point of interest $l$ at time slot $t_r$:
where $u_t$ is the target user currently served by the recommendation system, $t_r$ is the time slot corresponding to the current recommendation time, $l$ is a point of interest in the location social network that the target user has not visited, $v$ is another user in the location social network, $U$ is the set of all users, $sim(u_t,v,t_r)$ is the user similarity between user $u_t$ and user $v$ at time slot $t_r$, $nj$ is the number of time slots in the cluster $cen_j$ to which $t_r$ belongs, $r_{v,cen_j[i],l}$ is the score of user $v$ for point of interest $l$ at time slot $cen_j[i]$, $i\in[1,nj]$, and $timesimi(t_r,cen_j[i])$ is the similarity between the current time slot $t_r$ and the other time slot $cen_j[i]$.
(5.d) Sort all points of interest that target user $u_t$ has not visited by predicted score, form the recommendation list $TopNList_t$ from the $N$ top-ranked ones, and return it to the target user.
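Step (5.d) is a filter-and-sort over the predicted scores. In the sketch below the predicted scores are taken as given inputs, since only the ranking step is illustrated; the function name `top_n`, the tie-breaking rule (by identifier) and the sample values are assumptions.

```python
def top_n(pred_scores, visited, n):
    """Return the n unvisited points of interest with the highest predicted score.

    pred_scores: dict mapping point-of-interest id -> predicted score.
    visited: set of ids the target user has already visited.
    Ties are broken by id so the result is deterministic (an assumption).
    """
    candidates = [(l, s) for l, s in pred_scores.items() if l not in visited]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [l for l, _ in candidates[:n]]

# Hypothetical predicted scores for one target user at slot t_r.
pred_scores = {"l1": 0.9, "l2": 0.4, "l3": 0.7, "l4": 0.7}
top_list = top_n(pred_scores, visited={"l1"}, n=2)
```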
In the sixth step, the recommendation quality is evaluated with recommendation precision indices, and the recommendation precision of the proposed recommendation system is compared with that of other classical recommendation systems to assess the accuracy and effectiveness of the proposed technology. The implementation steps are as follows:
(6.a) Randomly select $NU\times10\%$ users from the target data set as the target user set AU, and run each recommendation algorithm for every target user in the set to generate a recommendation list, where $NU$ is the total number of users in the check-in data set.
(6.b) Evaluate the accuracy of each recommendation system with the precision indices: for one run of each algorithm over the target user set AU, the values of Precision, Recall and the comprehensive precision index F1 are the averages of the corresponding index over all users in AU.
(6.c) Repeat steps (6.a) and (6.b) Ntimes times, i.e., every algorithm is run independently Ntimes times.
(6.d) Take the values of Precision, Recall and the comprehensive precision index F1 of each recommendation algorithm as the averages of the results of the Ntimes runs.
(6.e) Compare and analyze the index results: if the Precision of the time-aware adaptive point-of-interest recommendation algorithm based on K-means clustering is greater than that of the other recommendation algorithms, the proposed technology hits the user's preferred items more accurately; if its Recall is greater than that of the other recommendation algorithms, the proposed technology has a stronger retrieval capability; if its comprehensive precision index F1 is greater than that of the other recommendation algorithms, the proposed technology has a stronger overall capability in terms of recommendation precision.
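The per-user indices and their average over the target user set AU in step (6.b) can be sketched as follows. The text names the indices without stating their formulas, so the standard definitions are assumed here: Precision is hits divided by list length, Recall is hits divided by the number of relevant items, and F1 is their harmonic mean. The function names and sample lists are illustrative.

```python
def prf(recommended, relevant):
    """Precision, Recall and F1 for one user's recommendation list."""
    hits = len(set(recommended) & set(relevant))
    p = hits / len(recommended) if recommended else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

def mean_metrics(per_user):
    """Average (Precision, Recall, F1) over all users in AU.

    per_user: list of (recommended, relevant) pairs, one per target user.
    """
    rows = [prf(rec, rel) for rec, rel in per_user]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

# Hypothetical lists for two target users.
per_user = [(["a", "b"], ["a"]), (["c", "d"], ["x"])]
avg_p, avg_r, avg_f1 = mean_metrics(per_user)
```

Here F1 is averaged per user, following the wording that each index is the average over all users in AU, rather than being recomputed from the averaged Precision and Recall.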
In the following, a specific location-based social network is taken as an example to describe in detail how the time-aware adaptive point-of-interest recommendation method based on K-means clustering of the present invention operates.
Gowalla is a location-based social networking service provider where users share their locations by checking in. The Gowalla data set contains the social relations and check-in information of 196591 users of the website from February 2009 to October 2010. The data set contains 1256379 points of interest, 6442892 check-in records of users at those points of interest, and 950327 social relations among users. The Gowalla data set has become one of the test data sets most commonly used by recommendation system researchers.
The invention selects the check-in data of five popular regions in the Gowalla data set, namely Los Angeles, San Francisco, New York, Maricopa and King, as an example for illustration.
In the first step, the users' original check-in data set is collected, organized and converted into a user-time-point-of-interest three-dimensional scoring matrix. The steps are as follows:
(1.a) Collect and organize the user check-in data for the Los Angeles, San Francisco, New York, Maricopa and King regions in the example data set Gowalla, obtaining a check-in data set $C$ consisting of 50007 historical visit records of 1572 users at 1420 points of interest, written $C=\{c_1,c_2,\dots,c_{50007}\}$. A schematic diagram of users' historical visit records in the location social network of the Gowalla data set is shown in FIG. 3. There are 13864 social relations among these users; the average number of check-in records per user is 31.81, the average number of social relations per user is 8.82, and each point of interest is visited 35.22 times on average.
Each check-in record is a quintuple of user ID, check-in time, geographic latitude, geographic longitude and point-of-interest ID. The set of all users in the check-in data set is denoted by $U$ and the set of all points of interest by $L$; the number of users $NU$ is 1572 and the number of points of interest $NL$ is 1420.
(1.b) Divide the day into 24 discrete time slots, the set of which is written $T=\{0,1,2,\dots,23\}$. Round the check-in time of each check-in record down to the hour to obtain the value of the corresponding time slot $t$ ($t\in[0,23]$). For example, the check-in time 15:13:23 corresponds to time slot $t=15$, and the check-in time 00:11:20 corresponds to time slot $t=0$.
(1.c) Count check-ins over the set of check-in quintuples and generate a quadruple $(u_i,t,l_j,n_{i,t,j})$ for each user-time-point-of-interest combination, where $u_i$ is the $i$-th user ($i\in[1,1572]$), $l_j$ is the $j$-th point of interest ($j\in[1,1420]$), $t$ is the time slot value obtained by rounding down the check-in time in the record ($t\in[0,23]$), and $n_{i,t,j}$ is the number of times user $u_i$ visits point of interest $l_j$ in time slot $t$.
(1.d) Convert the number of check-ins $n_{i,t,j}$ of user $u_i$ at point of interest $l_j$ in time slot $t$ into the score $r_{i,t,j}$ of user $u_i$ for point of interest $l_j$ in time slot $t$. If user $u_i$ has visited $l_j$ in time slot $t$, then $r_{i,t,j}=1$; otherwise $r_{i,t,j}=0$:
$$r_{i,t,j} = \begin{cases} 1, & n_{i,t,j} > 0 \\ 0, & n_{i,t,j} = 0 \end{cases} \qquad (1)$$
where $r_{i,t,j}$ is the score of user $u_i$ for point of interest $l_j$ in time slot $t$, and $n_{i,t,j}$ is the number of times user $u_i$ visits point of interest $l_j$ in time slot $t$.
All scores are collected into the user-time-point-of-interest three-dimensional scoring matrix $R=\{r_{i,t,j}\}$, $i\in[1,1572]$, $t\in[0,23]$, $j\in[1,1420]$, where $i$ is the user index, $t$ is the time slot value, $j$ is the point-of-interest index, and $r_{i,t,j}$ is the score of user $u_i$ for point of interest $l_j$ in time slot $t$.
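The averages quoted in step (1.a) and the size of the scoring matrix $R$ follow directly from the four counts given in the example. The short check below recomputes them; the variable names are illustrative, and the density bound relies on the observation that each check-in record can set at most one entry of $R$ to 1.

```python
n_records, n_users, n_pois, n_links = 50007, 1572, 1420, 13864

avg_checkins_per_user = round(n_records / n_users, 2)   # reported as 31.81
avg_links_per_user = round(n_links / n_users, 2)        # reported as 8.82
avg_visits_per_poi = round(n_records / n_pois, 2)       # reported as 35.22

# R has one cell per (user, time slot, point of interest) triple.
matrix_cells = n_users * 24 * n_pois

# At most n_records cells can equal 1, so this bounds the density of R from above.
density_upper_bound = n_records / matrix_cells
```

The bound comes out below 0.1%, which makes the sparsity of the per-slot scoring data concrete.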
In the second step, the number of check-in users, the number of visited points of interest and the number of check-ins are counted for each time slot. A three-dimensional check-in feature vector is then constructed for each time slot from these statistics, forming the time-slot check-in feature set. The specific steps are as follows:
(2.a) Count the number of users $Unum_t$ in the check-in data set who check in during time slot $t$:
$$Unum_t = \sum_{u \in U} isCheck(u,t) \qquad (2)$$
where $u$ is a user in the location social network, $U$ is the set of all users in the check-in data set, and the function $isCheck$ indicates whether user $u$ has any check-in in time slot $t$:
$$isCheck(u,t) = \begin{cases} 1, & \sum_{l \in L} r_{u,t,l} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
where $l$ is a point of interest in the location social network, $L$ is the set of all points of interest in the check-in data set, and $r_{u,t,l}$ is the score of user $u$ for point of interest $l$ in time slot $t$.
(2.b) Count the number of points of interest $Pnum_t$ in the check-in data set that are visited during time slot $t$:
$$Pnum_t = \sum_{l \in L} isChecked(l,t) \qquad (4)$$
where $l$ is a point of interest in the location social network, $L$ is the set of all points of interest in the check-in data set, and the function $isChecked$ indicates whether point of interest $l$ is visited in time slot $t$:
$$isChecked(l,t) = \begin{cases} 1, & \sum_{u \in U} r_{u,t,l} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
where $u$ is a user in the location social network, $U$ is the set of all users in the check-in data set, and $r_{u,t,l}$ is the score of user $u$ for point of interest $l$ in time slot $t$.
(2.c) Count the total number of check-ins $Cnum_t$ in the check-in data set that occur during time slot $t$:
$$Cnum_t = \sum_{i=1}^{n} isTime(c_i,t) \qquad (6)$$
where $n$ is the number of check-in records in the check-in data set $C$, and the function $isTime$ indicates whether the $i$-th check-in record $c_i$ occurs in time slot $t$:
$$isTime(c_i,t) = \begin{cases} 1, & time_i \text{ in } t \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$
where "$time_i$ in $t$" means that the check-in time $time_i$ of the $i$-th check-in record $c_i$ falls in time slot $t$.
The statistics of the number of check-in users, the number of visited points of interest and the number of check-ins in each time slot are shown in FIG. 4.
(2.d) Based on the above statistics, construct the three-dimensional check-in feature vector $x_t=\{Unum_t, Pnum_t, Cnum_t\}$ of each time slot $t$, forming the time-slot check-in feature set $X=\{x_0,x_1,\dots,x_{23}\}$, where $t\in[0,23]$, $Unum_t$ is the number of users who check in during time slot $t$, $Pnum_t$ is the number of points of interest visited in time slot $t$, and $Cnum_t$ is the total number of check-ins that occur in time slot $t$.
In the third step, the time slots are clustered with the K-means method based on the statistics from the second step, and the time similarity between time slots in the same cluster is calculated. The implementation steps are as follows:
(3.a) Cluster the 24 time slots with the K-means method, which is algorithmically simple and converges quickly, generating 3 clusters $Cen=\{cen_1,cen_2,cen_3\}$. The time slot set of the first cluster is {7,8,9,10,11,12,13}, that of the second cluster is {0,1,2,3,16,17,18,19,20,21,22,23}, and that of the third cluster is {4,5,6,14,15}. The K-means clustering result for the 24 time slots is shown in FIG. 5.
(3.b) For any two time slots $t$ and $t'$ within each of the three time clusters, calculate the time similarity between them:
where $u$ is a user in the location social network, $U$ is the set of all users in the check-in data set, $l$ is a point of interest in the location social network, $L$ is the set of all points of interest in the check-in data set, $r_{u,t,l}$ is the score of user $u$ for point of interest $l$ in time slot $t$, and $r_{u,t',l}$ is the score of user $u$ for point of interest $l$ in time slot $t'$.
In the fourth step, following the basic principle that similarity is high within a cluster and low between clusters, the user similarity at the current recommendation time is calculated by also using the scoring information of the other time slots in the same time cluster. The implementation steps are as follows:
(4.a) Select a target user $u_t$ in the location social network as the object of the recommendation service, and convert the current recommendation time $time_r$ into the time slot $t_r$. Assume the current time $time_r$ is 20:14:13; the corresponding time slot is then $t_r=20$.
(4.b) According to the clustering result, determine the cluster $cen_j$ to which time slot $t_r$ belongs and the number of time slots $nj$ in that cluster, written $cen_j=\{t_r,t_2,t_3,\dots,t_{nj}\}$. For example, when the recommendation time slot $t_r$ is 20, its cluster is $cen_j=\{20,0,1,2,3,16,17,18,19,21,22,23\}$ and the number of time slots in this cluster is 12 ($nj=12$).
Calculate the user similarity between the target user $u_t$ and another user $v$ at time slot $t_r$:
where $u_t$ is the target user currently served by the recommendation system, $v$ is another user in the location social network, $t_r$ is the time slot corresponding to the current recommendation time, $nj$ is the number of time slots in the cluster $cen_j$ to which $t_r$ belongs, $r_{u_t,cen_j[a],l}$ is the score of target user $u_t$ for point of interest $l$ at time slot $cen_j[a]$ of cluster $cen_j$, $r_{v,cen_j[b],l}$ is the score of user $v$ for point of interest $l$ at time slot $cen_j[b]$ of cluster $cen_j$, and $a\in[1,nj]$, $b\in[1,nj]$.
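The lookup in steps (4.a) and (4.b) can be reproduced with the three clusters reported for this example. The sketch below is illustrative only: the helper names `slot_of` and `cluster_of` are assumptions, and the convention of listing $t_r$ first mirrors the notation $cen_j=\{t_r,t_2,\dots,t_{nj}\}$.

```python
# Cluster membership reported for the example in step (3.a).
clusters = [
    [7, 8, 9, 10, 11, 12, 13],
    [0, 1, 2, 3, 16, 17, 18, 19, 20, 21, 22, 23],
    [4, 5, 6, 14, 15],
]

def slot_of(time_r: str) -> int:
    """Convert an 'HH:MM:SS' recommendation time to its hour slot."""
    return int(time_r.split(":")[0])

def cluster_of(slot: int):
    """Return the cluster containing slot, with slot itself listed first."""
    for members in clusters:
        if slot in members:
            return [slot] + [t for t in members if t != slot]
    raise ValueError(f"slot {slot} is not in any cluster")

t_r = slot_of("20:14:13")
cen_j = cluster_of(t_r)
nj = len(cen_j)
```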
In the fifth step, the scoring method of the traditional user-based collaborative filtering algorithm is improved using the time clustering result and the intra-cluster time similarity, so that it adaptively generates predicted scores for points of interest according to the current recommendation time and recommends to the user the top-ranked unvisited points of interest for the current time. The implementation steps are as follows:
(5.a) Determine a target user $u_t$ in the location social network as the object of the recommendation service, and convert the current recommendation time $time_r$ into the time slot $t_r$.
(5.b) According to the clustering result, determine the cluster $cen_j$ to which time slot $t_r$ belongs and the number of time slots $nj$ in that cluster, written $cen_j=\{t_r,t_2,t_3,\dots,t_{nj}\}$.
(5.c) Calculate the predicted score of target user $u_t$ for visiting point of interest $l$ at time slot $t_r$:
where $u_t$ is the target user currently served by the recommendation system, $t_r$ is the time slot corresponding to the current recommendation time, $l$ is a point of interest in the location social network that the target user has not visited, $v$ is another user in the location social network, $U$ is the set of all users, $sim(u_t,v,t_r)$ is the user similarity between user $u_t$ and user $v$ at time slot $t_r$, $nj$ is the number of time slots in the cluster $cen_j$ to which $t_r$ belongs, $r_{v,cen_j[i],l}$ is the score of user $v$ for point of interest $l$ at time slot $cen_j[i]$, $i\in[1,nj]$, and $timesimi(t_r,cen_j[i])$ is the similarity between the current time slot $t_r$ and the other time slot $cen_j[i]$.
(5.d) Sort all points of interest that target user $u_t$ has not visited by predicted score, form the recommendation list $TopNList_t$ from the $N$ top-ranked ones, and return it to the target user ($N$ may be a multiple of 5, typically with $5\le N\le 50$).
In the sixth step, the recommendation quality is evaluated with recommendation precision indices, and the recommendation precision of the proposed recommendation system is compared with that of other classical recommendation systems to assess the accuracy and effectiveness of the proposed technology. The implementation steps are as follows:
(6.a) Randomly select 157 users from the target data set as the target user set AU, and for every target user in the set run the time-aware adaptive point-of-interest recommendation algorithm, the classical user-based collaborative filtering algorithm UBCF and the social-relation-based collaborative filtering algorithm SCF to generate recommendation lists.
(6.b) Evaluate the accuracy of each recommendation system with the precision indices: for one run of each algorithm over the target user set AU, the values of Precision, Recall and the comprehensive precision index F1 are the averages of the corresponding index over all users in AU.
(6.c) Repeat steps (6.a) and (6.b) 100 times, i.e., every algorithm is run independently 100 times.
(6.d) Take the values of Precision, Recall and the comprehensive precision index F1 of the proposed recommendation algorithm and of the UBCF and SCF algorithms as the averages of the results of the 100 runs. For different values of $N$, the Precision, Recall and F1 results of each recommendation algorithm are shown in Tables 2, 3 and 4 respectively, where the bold value in each row is the maximum of that row:
TABLE 2 Precision index values for different recommendation algorithms
Table 3 Recall index values for different recommendation algorithms
TABLE 4 recommendation precision F1 index values of different recommendation algorithms
Histograms comparing the Precision, Recall and comprehensive precision index F1 of the proposed recommendation algorithm with those of the classical UBCF and SCF algorithms in this example are shown in FIG. 6, FIG. 7 and FIG. 8, respectively.
(6.e) Compare and analyze the index results: the Precision of the time-aware adaptive point-of-interest recommendation algorithm based on K-means clustering is greater than that of the other recommendation algorithms, which shows that the proposed technology hits the user's preferred items more accurately; its Recall is greater than that of the other recommendation algorithms, which shows that the proposed technology has a stronger retrieval capability; and its comprehensive precision index F1 is greater than that of the other recommendation algorithms, which shows that the proposed technology has a stronger overall capability in terms of recommendation precision.
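The repeated evaluation protocol of steps (6.a) to (6.d) can be sketched as below. The callback `run_once`, which stands in for running one recommendation algorithm over AU and returning its averaged (Precision, Recall, F1), is hypothetical, as are the function names, the fixed seed and the truncation of $NU\times10\%$ to an integer (157 for 1572 users).

```python
import random

def sample_targets(users, fraction=0.10, rng=random):
    """Randomly pick floor(len(users) * fraction) distinct target users."""
    size = int(len(users) * fraction)
    return rng.sample(users, size)

def repeated_eval(users, run_once, n_times=100, seed=0):
    """Average (Precision, Recall, F1) over n_times independent runs.

    run_once(au) is a hypothetical callback that evaluates one algorithm
    on the sampled target set au and returns a (p, r, f1) tuple.
    """
    rng = random.Random(seed)   # seeded only to make the sketch reproducible
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_times):
        au = sample_targets(users, rng=rng)
        for k, value in enumerate(run_once(au)):
            totals[k] += value
    return tuple(x / n_times for x in totals)

users = list(range(1572))
au = sample_targets(users)
```

Resampling AU on every run, rather than fixing it once, matches the instruction to repeat both (6.a) and (6.b).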
Unlike conventional point-of-interest recommendation algorithms, the invention aims to build a point-of-interest recommendation system that generates a point-of-interest list in real time for a given time point and produces accurate recommendations. It takes into account both the differences and the correlations between the characteristics of user check-in data in different time slots, introduces an analysis based on the distance from each time point to the cluster centers, and uses the K-means clustering method to mine the correlations between time slots. Time clustering alleviates the sparsity of the high-dimensional check-in data, improves the accuracy and effectiveness of score prediction, and strengthens the service quality of the recommendation system. The proposed technology has broad application prospects and is expected to be widely used in the location-based social network market.
The technical process described above is only a preferred embodiment of the present invention and does not represent all of its details. Any modification, equivalent replacement or improvement made by those skilled in the art within the scope of the present disclosure and within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (7)
1. A time-aware adaptive point-of-interest recommendation method based on K-means clustering, characterized by comprising the following steps:
Step 1: collecting and organizing the users' original check-in data set and converting it into a user-time-point-of-interest three-dimensional scoring matrix;
Step 2: counting the number of check-in users, the number of visited points of interest and the number of check-ins in each time slot; constructing a three-dimensional check-in feature vector of each time slot from the statistics to form a time-slot check-in feature set;
Step 3: based on the statistics of step 2, clustering the time slots with the K-means method and calculating the time similarity between time slots in the same cluster;
Step 4: following the basic principle that similarity is high within a cluster and low between clusters, calculating the user similarity at the current recommendation time by also using the scoring information of the other time slots in the same time cluster;
Step 5: improving the scoring method of the traditional user-based collaborative filtering algorithm using the time clustering result and the intra-cluster time similarity, so that it adaptively generates predicted scores for points of interest according to the current recommendation time, and recommending to the user the top-ranked unvisited points of interest for the current time;
Step 6: evaluating the recommendation quality with recommendation precision indices and comparing the recommendation precision with that of other classical recommendation systems to assess the accuracy and effectiveness of the proposed technology.
2. The time-aware adaptive point-of-interest recommendation method based on K-means clustering according to claim 1, wherein step 1 comprises:
Step 11: organizing the users' original check-in data set $C$ to obtain $n$ check-in records, written $C=\{c_1,c_2,\dots,c_n\}$; each check-in record is a quintuple of user ID, check-in time, geographic latitude, geographic longitude and point-of-interest ID; the set of all users in the check-in data set is denoted by $U$, the set of all points of interest by $L$, and $NU$ and $NL$ are the numbers of users and points of interest respectively;
Step 12: dividing the day into 24 discrete time slots, the set of which is written $T=\{0,1,2,\dots,23\}$; rounding the check-in time of each check-in record down to the hour to obtain the value of the corresponding time slot $t$, $t\in[0,23]$;
Step 13: counting check-ins over the set of check-in quintuples and generating a quadruple $(u_i,t,l_j,n_{i,t,j})$ for each user-time-point-of-interest combination, where $u_i$ is the $i$-th user, $i\in[1,NU]$, $l_j$ is the $j$-th point of interest, $j\in[1,NL]$, $t$ is the time slot value obtained by rounding down the check-in time in the record, $t\in[0,23]$, and $n_{i,t,j}$ is the number of times user $u_i$ visits point of interest $l_j$ in time slot $t$;
Step 14: converting the number of check-ins $n_{i,t,j}$ of user $u_i$ at point of interest $l_j$ in time slot $t$ into the score $r_{i,t,j}$ of user $u_i$ for point of interest $l_j$ in time slot $t$; if user $u_i$ has visited $l_j$ in time slot $t$, then $r_{i,t,j}=1$; otherwise $r_{i,t,j}=0$:
$$r_{i,t,j} = \begin{cases} 1, & n_{i,t,j} > 0 \\ 0, & n_{i,t,j} = 0 \end{cases} \qquad (1)$$
where $r_{i,t,j}$ is the score of user $u_i$ for point of interest $l_j$ in time slot $t$, and $n_{i,t,j}$ is the number of times user $u_i$ visits point of interest $l_j$ in time slot $t$;
collecting all scores into the user-time-point-of-interest three-dimensional scoring matrix $R=\{r_{i,t,j}\}$, $i\in[1,NU]$, $t\in[0,23]$, $j\in[1,NL]$, where $i$ is the user index, $t$ is the time slot value, $j$ is the point-of-interest index, $NU$ is the total number of users, $NL$ is the total number of points of interest, and $r_{i,t,j}$ is the score of user $u_i$ for point of interest $l_j$ in time slot $t$.
3. The time-aware adaptive point-of-interest recommendation method based on K-means clustering according to claim 1, wherein step 2 comprises:
Step 21: counting the number of users $Unum_t$ in the check-in data set who check in during time slot $t$:
$$Unum_t = \sum_{u \in U} isCheck(u,t) \qquad (2)$$
where $u$ is a user in the location social network, $U$ is the set of all users in the check-in data set, and the function $isCheck$ indicates whether user $u$ has any check-in in time slot $t$:
$$isCheck(u,t) = \begin{cases} 1, & \sum_{l \in L} r_{u,t,l} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
where $l$ is a point of interest in the location social network, $L$ is the set of all points of interest in the check-in data set, and $r_{u,t,l}$ is the score of user $u$ for point of interest $l$ in time slot $t$;
Step 22: counting the number of points of interest $Pnum_t$ in the check-in data set that are visited during time slot $t$:
$$Pnum_t = \sum_{l \in L} isChecked(l,t) \qquad (4)$$
where $l$ is a point of interest in the location social network, $L$ is the set of all points of interest in the check-in data set, and the function $isChecked$ indicates whether point of interest $l$ is visited in time slot $t$:
$$isChecked(l,t) = \begin{cases} 1, & \sum_{u \in U} r_{u,t,l} > 0 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
where $u$ is a user in the location social network, $U$ is the set of all users in the check-in data set, and $r_{u,t,l}$ is the score of user $u$ for point of interest $l$ in time slot $t$;
Step 23: counting the total number of check-ins $Cnum_t$ in the check-in data set that occur during time slot $t$:
$$Cnum_t = \sum_{i=1}^{n} isTime(c_i,t) \qquad (6)$$
where $n$ is the number of check-in records in the check-in data set $C$, and the function $isTime$ indicates whether the $i$-th check-in record $c_i$ occurs in time slot $t$:
$$isTime(c_i,t) = \begin{cases} 1, & time_i \text{ in } t \\ 0, & \text{otherwise} \end{cases} \qquad (7)$$
where "$time_i$ in $t$" means that the check-in time $time_i$ of the $i$-th check-in record $c_i$ falls in time slot $t$;
Step 24: based on the above statistics, constructing the three-dimensional check-in feature vector $x_t=\{Unum_t, Pnum_t, Cnum_t\}$ of each time slot $t$, forming the time-slot check-in feature set $X=\{x_0,x_1,\dots,x_{23}\}$, where $t\in[0,23]$, $Unum_t$ is the number of users who check in during time slot $t$, $Pnum_t$ is the number of points of interest visited in time slot $t$, and $Cnum_t$ is the total number of check-ins that occur in time slot $t$.
4. The time-aware adaptive point-of-interest recommendation method based on K-means clustering according to claim 1, wherein step 3 comprises:
Step 31: clustering the 24 time slots with the K-means method, which is algorithmically simple and converges quickly, generating $nc$ cluster centers $Cen=\{cen_1,cen_2,\dots,cen_{nc}\}$ ($nc\in[2,24]$);
Step 32: for any two time slots $t$ and $t'$ in the same time cluster, calculating the time similarity between them:
where $u$ is a user in the location social network, $U$ is the set of all users in the check-in data set, $l$ is a point of interest in the location social network, $L$ is the set of all points of interest in the check-in data set, $r_{u,t,l}$ is the score of user $u$ for point of interest $l$ in time slot $t$, $r_{u,t',l}$ is the score of user $u$ for point of interest $l$ in time slot $t'$, and $NU$ is the total number of users in the check-in data set.
5. The time-aware adaptive point-of-interest recommendation method based on K-means clustering according to claim 1, wherein step 4 comprises:
Step 41: selecting a target user $u_t$ in the location social network as the object of the recommendation service, and converting the current recommendation time $time_r$ into the time slot $t_r$;
Step 42: according to the clustering result, determining the cluster $cen_j$ to which time slot $t_r$ belongs and the number of time slots $nj$ in that cluster, written $cen_j=\{t_r,t_2,t_3,\dots,t_{nj}\}$, and calculating the user similarity between the target user $u_t$ and another user $v$ at time slot $t_r$:
where $u_t$ is the target user currently served by the recommendation system, $v$ is another user in the location social network, $t_r$ is the time slot corresponding to the current recommendation time, $nj$ is the number of time slots in the cluster $cen_j$ to which $t_r$ belongs, $NL$ is the total number of points of interest in the check-in data set, $r_{u_t,cen_j[a],l}$ is the score of target user $u_t$ for point of interest $l$ at time slot $cen_j[a]$ of cluster $cen_j$, $r_{v,cen_j[b],l}$ is the score of user $v$ for point of interest $l$ at time slot $cen_j[b]$ of cluster $cen_j$, and $a\in[1,nj]$, $b\in[1,nj]$.
6. The K-means clustering-based time-aware adaptive interest point recommendation method according to claim 1, wherein step 5 of the method comprises:
Step 51: determining a target user u_t in the location social network as the object of the recommendation service, and converting the current recommendation time time_r into a time slot t_r;
Step 52: determining, according to the clustering result, the cluster cen_j to which time slot t_r belongs and the number of time slots nj in that cluster, denoted cen_j = {t_r, t_2, t_3, …, t_nj};
Step 53: calculating the predicted score of the target user u_t for visiting point of interest l at time slot t_r:
wherein u_t is the target object currently served by the recommendation system; t_r is the time slot corresponding to the current recommendation time; l is a point of interest in the location social network that the target user has not visited; v is another user in the location social network; U represents the set of all users; sim(u_t, v, t_r) represents the user similarity between user u_t and user v at time slot t_r; nj is the number of time slots in the cluster cen_j to which t_r belongs; the score term denotes the score of user v for point of interest l at time slot cen_j[i], i ∈ [1, nj]; and timesimi(t_r, cen_j[i]) represents the similarity between the current time slot t_r and another time slot cen_j[i];
Step 54: sorting all points of interest not yet visited by the target user u_t by predicted score, forming the N highest-ranked points of interest into a recommendation list TopNList_t, and returning the list to the target user.
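Steps 53 and 54 can be sketched as follows. The prediction formula itself is not reproduced in this text, so the weighting below is an assumption: each neighbour's scores are smoothed over the slots of cen_j using timesimi, then combined across users using sim and normalised by the total similarity weight. The names `predict_scores`, `top_n`, `user_sim`, and `slot_sim` are hypothetical.

```python
import numpy as np

def predict_scores(scores, user_sim, slot_sim, u, t_r, cluster_slots):
    """Predicted score of user u for every POI at slot t_r (ASSUMED weighting).

    scores:   (n_users, n_slots, NL) check-in scores.
    user_sim: (n_users,) values sim(u, v, t_r) for every user v.
    slot_sim: (n_slots, n_slots) values timesimi between time slots.
    """
    w_user = np.array(user_sim, dtype=float)  # copy, so the input is untouched
    w_user[u] = 0.0                           # the target user is not a neighbour
    w_slot = slot_sim[t_r, cluster_slots]     # timesimi(t_r, cen_j[i])
    # Smooth each user's scores over the cluster, weighted by slot similarity.
    smoothed = np.einsum("i,vil->vl", w_slot, scores[:, cluster_slots, :])
    norm = np.abs(w_user).sum()
    return w_user @ smoothed / norm if norm > 0 else np.zeros(scores.shape[2])

def top_n(pred, visited, n):
    """Indices of the n highest-scoring POIs that the user has not visited."""
    ranked = np.argsort(-pred, kind="stable")
    return [int(l) for l in ranked if int(l) not in visited][:n]

# Toy data: 3 users, 2 time slots, 3 POIs; user 0 is the target.
scores = np.zeros((3, 2, 3))
scores[1, 1, 2] = 1.0
scores[2, 0, 1] = 1.0
user_sim = np.array([1.0, 0.5, 0.2])
slot_sim = np.array([[1.0, 0.5], [0.5, 1.0]])
pred = predict_scores(scores, user_sim, slot_sim, u=0, t_r=0, cluster_slots=[0, 1])
print(top_n(pred, visited=set(), n=2))  # [2, 1]
```

Filtering visited points of interest before truncating to N keeps the list at full length whenever enough unvisited candidates exist.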
7. The K-means clustering-based time-aware adaptive interest point recommendation method according to claim 1, wherein said step 6 comprises:
Step 61: randomly selecting NU × 10% of the users from the target data set as the target user set AU, and running each recommendation algorithm for each target user in the set to generate a recommendation list, where NU denotes the total number of users in the check-in data set;
Step 62: evaluating the accuracy of each recommendation algorithm using the Precision, Recall, and comprehensive index F1, where the value of each index for one run of an algorithm over the target user set AU is the average of that index over all users in AU;
Step 63: repeating Step 61 and Step 62 Ntimes times, i.e., every algorithm is run independently Ntimes times;
Step 64: setting the Precision, Recall, and F1 values of a recommendation algorithm to the average of the results of the Ntimes runs;
Step 65: comparing and analyzing the results for each index: if the Precision of the K-means-clustering-based time-aware adaptive point-of-interest recommendation algorithm is greater than the Precision of the other recommendation algorithms, the technology hits user preferences more accurately; if its Recall is greater than the Recall of the other recommendation algorithms, the technology retrieves a larger share of the relevant points of interest; and if its F1 value is greater than the F1 values of the other recommendation algorithms, the overall recommendation accuracy of the technology is higher.
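The evaluation protocol of Steps 61 to 64 can be sketched as below. The claim does not define the indices, so the standard definitions are assumed: Precision is hits divided by list length, Recall is hits divided by the number of held-out visited points of interest, and F1 is their harmonic mean. The `recommend` callable and the `ground_truth` mapping are hypothetical stand-ins for a recommendation algorithm and a held-out test set.

```python
import random

def precision_recall_f1(recommended, relevant):
    """Standard per-user Precision, Recall, and F1 for one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    p = hits / len(recommended) if recommended else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

def evaluate(recommend, ground_truth, n_times, frac=0.10, seed=0):
    """Average (Precision, Recall, F1) over n_times independent runs.

    recommend:    callable user -> recommendation list (hypothetical).
    ground_truth: dict user -> set of held-out points of interest (assumed).
    """
    rng = random.Random(seed)
    users = sorted(ground_truth)
    size = max(1, round(len(users) * frac))  # |AU| = NU x 10%
    run_means = []
    for _ in range(n_times):
        au = rng.sample(users, size)          # Step 61: draw the target set AU
        per_user = [precision_recall_f1(recommend(u), ground_truth[u]) for u in au]
        # Step 62: each index for this run is its mean over all users in AU.
        run_means.append([sum(col) / len(au) for col in zip(*per_user)])
    # Step 64: final value of each index is the mean over the n_times runs.
    return tuple(sum(col) / n_times for col in zip(*run_means))

print(precision_recall_f1([1, 2, 3, 4], [2, 4, 5]))
```

Drawing a fresh AU in every run, rather than reusing one sample, is what makes the Ntimes runs independent; a fixed `seed` only makes the sketch reproducible.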
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210071029.3A CN114528480A (en) | 2022-01-21 | 2022-01-21 | Time-sensing self-adaptive interest point recommendation method based on K-means clustering |
CN2022100710293 | 2022-01-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116166878A (en) | 2023-05-26 |
Family
ID=81620186
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210071029.3A Pending CN114528480A (en) | 2022-01-21 | 2022-01-21 | Time-sensing self-adaptive interest point recommendation method based on K-means clustering |
CN202211571570.7A Pending CN116166878A (en) | 2022-01-21 | 2022-12-08 | Time perception self-adaptive interest point recommendation method based on K-means clustering |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210071029.3A Pending CN114528480A (en) | 2022-01-21 | 2022-01-21 | Time-sensing self-adaptive interest point recommendation method based on K-means clustering |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN114528480A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117635237A (en) * | 2023-12-22 | 2024-03-01 | 广州方块网络技术有限公司 | Advertisement management system based on SaaS information flow and cross-platform crowd data |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115408618B (en) * | 2022-09-26 | 2023-10-20 | 南京工业职业技术大学 | Point-of-interest recommendation method based on social relation fusion position dynamic popularity and geographic features |
CN115687801B (en) * | 2022-09-27 | 2024-01-19 | 南京工业职业技术大学 | Position recommendation method based on position aging characteristics and time perception dynamic similarity |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107657015A (en) * | 2017-09-26 | 2018-02-02 | 北京邮电大学 | A kind of point of interest recommends method, apparatus, electronic equipment and storage medium |
CN111104607A (en) * | 2018-10-25 | 2020-05-05 | 中国电子科技集团公司电子科学研究院 | Location recommendation method and device based on sign-in data |
CN114036376A (en) * | 2021-10-26 | 2022-02-11 | 南京理工大学紫金学院 | Time-aware self-adaptive interest point recommendation method based on K-means clustering |
Non-Patent Citations (2)
Title |
---|
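Si Yali (司亚利): "Research on an adaptive point-of-interest recommendation method based on user check-in behavior" (基于用户签到行为的自适应兴趣点推荐方法研究), China Doctoral Dissertations Full-text Database (《中国博士学位论文全文数据库》) * |
Tao Yongcai et al. (陶永才等): "A group point-of-interest recommendation model combining time-factor clustering" (一种结合时间因子聚类的群组兴趣点推荐模型), Journal of Chinese Computer Systems (《小型微型计算机系统》), vol. 42, no. 02 * |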
Also Published As
Publication number | Publication date |
---|---|
CN114528480A (en) | 2022-05-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||