CN114662015A - A method and system for point of interest recommendation based on deep reinforcement learning - Google Patents
Publication Info
- Publication number: CN114662015A
- Application number: CN202210175716.XA
- Authority: CN (China)
- Prior art keywords: user, poi, interest, reinforcement learning, recommendation
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
- G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06N20/00 Machine learning
Abstract
The present invention provides a point-of-interest (POI) recommendation method based on deep reinforcement learning, which fuses the contextual feature attributes of the user's consecutive check-in behavior sequence to realize POI recommendation. The implementation comprises: obtaining the user's historical check-in data and preprocessing it to obtain a user set and a POI set; sorting the records to obtain the user's consecutive check-in behavior sequence data, and constructing a POI-POI graph G_VV, a POI-functional-zone graph G_VZ, and a POI-time-slot graph G_VT; converting the consecutive check-in sequences into user feature vectors through an embedding layer; embedding G_VV, G_VZ, and G_VT into the same latent space via joint graph embedding learning to obtain feature vectors, which are concatenated and fed into an attention-based gated recurrent unit to generate a feature vector of the user's recent interest preferences; and inputting this vector into a recommendation model based on the deep reinforcement learning Actor-Critic framework to obtain a Top-k ordered POI recommendation list. The invention effectively fuses the user's check-in sequence information with the spatiotemporal and category information of POIs, improving the accuracy of the recommendation model.
Description
Technical Field
The present invention relates to the field of electronic information technology for automatic point-of-interest recommendation, and in particular to a point-of-interest recommendation method based on deep reinforcement learning.
Background
With the development of information technology and the Internet, people have moved from an era of information scarcity into one of information overload. In this era, both information consumers and information producers face great challenges: for consumers, finding the information they care about within a huge volume of information is very difficult; for producers, making the information they produce stand out and attract wide attention is equally difficult. Users also encounter "information overload" in daily travel: which restaurant or shopping mall to choose, and so on. These problems resemble the product-selection overload encountered in online shopping. In e-commerce, recommender systems emerged to solve users' information overload: they use a user's interest preferences and related information to recommend content the user is likely to be interested in. Facing the information overload encountered during travel, research on point-of-interest (POI) recommender systems has likewise grown. A POI recommender system can be described as a personalized information recommendation system that uses people's historical travel records to provide suggestions for their future travel.
POI recommendation can help users explore life services in specific scenarios and can bring considerable economic benefits to businesses by attracting customers. Unlike traditional explicit-feedback recommender systems (recommending online items such as news, movies, and products), in which a user's item ratings directly express interest preferences, implicit feedback must mine latent preferences from the user's historical POI visit trajectories, which increases the complexity of recommendation.
POI recommendation faces three main problems. 1) Compared with massive online click and rating data, POI recommendation suffers from a more severe data-sparsity problem. 2) The cold-start problem common to recommendation tasks takes two main forms in POI recommendation: locations that have never been visited are called cold-start POIs, and users who have never visited any location are called cold-start users. 3) The dynamic-preference problem: user preferences change over time and with the environment; moreover, due to spatiotemporal heterogeneity, a POI recommendation algorithm must adapt to different scenarios and to users of different cultural, educational, and socioeconomic backgrounds. It is therefore necessary to consider multiple influencing factors, including spatiotemporal constraints and spatiotemporal neighbors, to improve recommendation performance on this task.
Summary of the Invention
To remedy the above deficiencies in the prior art, the present invention proposes a point-of-interest recommendation method based on deep reinforcement learning.
To achieve this purpose, the technical solution of the present invention provides a point-of-interest recommendation method based on deep reinforcement learning that fuses the contextual feature attributes of the user's consecutive check-in behavior sequence. The implementation comprises the following steps.
S1. Obtain the user's historical check-in data; each check-in record contains a user ID, user rating and review, POI ID, check-in time, POI category, and POI geographic location. Preprocess the data set to obtain a user set and a POI set.
S2. Sort each user's preprocessed historical check-in records by visit time to obtain the user's consecutive check-in behavior sequence data.
S3. From the processed historical check-in data, construct three bipartite graphs: a POI-POI graph G_VV, a POI-functional-zone graph G_VZ, and a POI-time-slot graph G_VT.
S4. Convert the consecutive check-in behavior sequences obtained in S2 into user feature vectors through an embedding layer; embed G_VV, G_VZ, and G_VT into the same latent space via joint graph embedding learning to obtain feature vectors of POIs, functional zones, and time slots in a shared low-dimensional space; concatenate the user feature vector with the POI, functional-zone, and time-slot feature vectors.
S5. Feed the concatenated feature vector into an attention-based gated recurrent unit to generate a feature vector of the user's recent interest preferences.
S6. Input the user interest feature vector into a recommendation model based on the deep reinforcement learning Actor-Critic framework to obtain a Top-k ordered POI recommendation list.
Furthermore, data cleaning is performed in step S1: users with fewer than a check-ins and POIs checked in fewer than b times are deleted to obtain a new data set; the parameters a and b are preset.
Furthermore, step S3 is implemented as follows.
S31. Construct the POI-POI graph G_VV = (V ∪ V, ε_vv), where V is the set of POIs and ε_vv is the set of edges between POIs.
S32. Construct the POI-functional-zone graph G_VZ = (V ∪ Z, ε_vz), where V is the set of POIs, Z is the set of functional zones, and ε_vz is the set of edges between POIs and functional zones. The POI-functional-zone graph captures the geographic and semantic relationships between POIs and districts: the city is partitioned according to the core function that each district has and that represents it, yielding the set of functional zones; the functional zone z corresponding to a POI v is found from v's geographic location, an edge ε_vz is added between v and z, and its weight is set to 1.
S33. Construct the POI-time-slot graph G_VT = (V ∪ T, ε_vt), where V is the set of POIs, T is the set of time slots, and ε_vt is the set of edges between POIs and time slots. According to the user's historical check-in data, if a POI v is visited within a time slot t, an edge is added between v and t with its weight set to the visit frequency.
Furthermore, the joint graph embedding learning in step S4 is implemented as follows.
Given a bipartite graph G = (V_A ∪ V_B, ε), where V_A and V_B are two disjoint vertex sets, the embedding vector of each vertex in the latent space is learned with negative sampling by minimizing the objective O:
O = -Σ_{e_ij ∈ ε} w_ij · log p(v_j | v_i)
where, for each observed edge, log p(v_j | v_i) is approximated by the negative-sampling term
log σ(u_j · u_i) + Σ_{n=1}^{K} E_{v_n ~ P_n(v)} [log σ(-u_n · u_i)]
Here ε is the set of edges, w_ij is the weight of edge e_ij, and log p(v_j | v_i) is the probability that v_j occurs given its associated vertex v_i. The vertices v_i and v_j are the two endpoints of edge e_ij, with v_i in V_A and v_j in V_B; v_n is a vertex drawn from V_B by negative sampling with probability P_n(v), and u_i, u_j, u_n are the embedding vectors of the corresponding vertices. σ(·) is the sigmoid function, E is the expectation, K is the number of negative edges drawn per sample, and P_n(v) ∝ d_v^{3/4}, where d_v is the out-degree of vertex v. Joint training yields the representation vectors of POIs, zones, and time slots in the shared low-dimensional space.
Furthermore, step S5 comprises the following sub-steps.
S51. Input the consecutive check-in sequence features together with the <review, spatiotemporal, POI> features, as the user's overall historical behavior feature information, into the gated recurrent unit model for fusion.
S52. Use an attention mechanism to select among the fused information features, obtaining the feature vector of the user's recent interest preferences.
In S51, the consecutive check-in behavior sequence of a user u is defined as C_u = {(v_1, l_{v_1}, t_1, M_{v_1}), ..., (v_n, l_{v_n}, t_n, M_{v_n})}, where v denotes a checked-in POI, l_v the latitude-longitude coordinates of the POI, t the check-in time, and M_v a set of phrases describing POI v. At time t, the GRU state update is computed as:
r_t = σ(W_1 x_t^u + U_1 h_{t-1} + b_1)
z_t = σ(W_2 x_t^u + U_2 h_{t-1} + b_2)
h~_t = tanh(W_3 x_t^u + U_3 (r_t ⊙ h_{t-1}) + b_3)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h~_t
where ⊙ denotes element-wise multiplication, {U_1, U_2, U_3, W_1, W_2, W_3} ∈ R^{d×d} and {b_1, b_2, b_3} ∈ R^d are the parameter matrices the gated recurrent unit must train, h_{t-1} is the hidden state at the previous time t-1, r_t and z_t are the reset and update gates at time t, h~_t is the candidate state, h_t is the hidden-layer output vector, x_t^u is the input vector of user u's check-in at time t, R is the feature vector space, and d is the feature dimension.
Furthermore, step S6 comprises the following sub-steps.
S61. The Actor outputs, from the current state (State), a state action (Action): a list of a specified number of candidate POIs.
S62. The Critic uses a deep Q-network (DQN) to compute the action-value function and estimate the expected value of the policy, and according to this expectation selects or integrates the dominant policies in real time for output or update, which speeds up training while generating effective local policies during training.
S63. Recommend the Top-k POI set to the user, and compute the recommendation precision Precision@M and recall Recall@M.
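The two metrics in S63 can be computed as in the following minimal sketch (the function name and record layout are illustrative, not from the patent):

```python
def precision_recall_at_m(recommended, visited, m):
    """Precision@M and Recall@M for one user.

    recommended: ranked list of POI ids (Top-k list truncated to M)
    visited: set of ground-truth POI ids the user actually visited
    """
    top_m = recommended[:m]
    hits = sum(1 for poi in top_m if poi in visited)
    precision = hits / m
    recall = hits / len(visited) if visited else 0.0
    return precision, recall

# Example: 2 of the top-4 recommendations were actually visited
p, r = precision_recall_at_m([3, 7, 1, 9], {7, 9, 5, 2}, m=4)
```

Averaging these per-user values over the test set gives the overall Precision@M and Recall@M reported for the model.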
The present invention proposes the following improvements.
1. The graph embedding model fuses multiple influencing factors such as spatiotemporal and semantic information well, improving the performance of the POI recommender system.
2. The attention-based gated recurrent unit can model users' complex dynamic preferences and learn multiple kinds of correlations between POIs.
3. The reinforcement learning model can learn users' real needs and preferences through natural interaction with them and recommend accordingly, while alleviating the cold-start problem to some extent.
The present invention effectively fuses the user's check-in sequence information with the spatiotemporal and category information of POIs, addresses the limitations of data sparsity and dynamic user preferences, and effectively improves the accuracy of the recommendation model.
The solution of the present invention is simple and convenient to implement and highly practical; it solves the problems of low practicality and inconvenient application in the related art, can improve the user experience, and has significant market value.
Brief Description of the Drawings
FIG. 1 is a structural diagram of the point-of-interest recommendation method based on deep reinforcement learning according to an embodiment of the present invention.
FIG. 2 is a flowchart of the point-of-interest recommendation method based on deep reinforcement learning according to an embodiment of the present invention.
FIG. 3 shows example bipartite graphs of an embodiment of the present invention, where (a) is the POI-POI bipartite graph, (b) the POI-functional-zone bipartite graph, and (c) the POI-time-slot bipartite graph.
FIG. 4 is a structural diagram of the attention-based gated recurrent unit model of an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described below with reference to the drawings and embodiments.
An embodiment of the present invention provides a point-of-interest recommendation method that fuses the contextual features of the user's consecutive check-in behavior sequences; as shown in FIG. 2, it comprises the following steps.
S1: Obtain the user's historical check-in data; each check-in record contains a user ID, user rating and review, POI ID, check-in time, POI category, and POI geographic location. Preprocess the data set to obtain the user set and the point-of-interest (POI) set.
In this embodiment, the implementation of S1 further includes the following processing.
Data cleaning: delete users with fewer than a check-ins and POIs checked in fewer than b times to obtain a new data set. In a concrete implementation, the parameters a and b can be preset as required.
S2: Sort each user's preprocessed historical check-in records by visit time to obtain the user's consecutive check-in behavior sequence data.
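Step S2 can be sketched as follows (the record layout with 'user', 'poi', and 'time' keys is an assumption for illustration):

```python
from collections import defaultdict

def build_checkin_sequences(records):
    """Group check-in records by user and sort each group by check-in time.

    records: iterable of dicts with keys 'user', 'poi', 'time'.
    Returns {user_id: [poi ids in chronological order]}.
    """
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec['user']].append(rec)
    return {
        u: [r['poi'] for r in sorted(recs, key=lambda r: r['time'])]
        for u, recs in by_user.items()
    }
```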
S3: Construct three bipartite graphs from the processed historical check-in data, as shown in FIG. 3: the POI-POI graph G_VV, the POI-functional-zone graph G_VZ, and the POI-time-slot graph G_VT, where POI denotes a point of interest. For example, FIG. 3(a) shows the bipartite graph formed among POI_1, POI_2, ..., POI_6; FIG. 3(b) the bipartite graph between POI_1, POI_2, ... and functional zone_1, functional zone_2, ...; and FIG. 3(c) the bipartite graph between POI_1, POI_2, ... and time slot_1, time slot_2, ....
The concrete process of constructing the POI bipartite graphs includes the following.
S31. Construct the POI-POI graph G_VV = (V ∪ V, ε_vv), where V is the set of POIs and ε_vv is the set of edges between POIs.
S311. Collect the review information of all POIs to build a corpus C_review. Treat each user's reviews, and all the reviews of each POI, as one document each, and compute each document's topic-feature distribution vector with the Latent Dirichlet Allocation (LDA) topic model, i.e., a topic feature vector for each user and a topic feature vector for each POI.
S312. Compute the spatial distance between the topic feature vectors of two POIs with the cosine formula; the cosine similarity expresses the degree of similarity between POIs. If the cosine similarity s_ij between the topic feature vectors of the two endpoints v_i and v_j of a POI-POI edge (i.e., two distinct POIs) exceeds the threshold α, an edge is added between v_i and v_j with its weight set to the similarity s_ij.
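Steps S311-S312 can be sketched as follows; the LDA topic vectors are taken as given inputs, and the threshold parameter alpha follows the text:

```python
import math

def cosine(a, b):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def build_poi_poi_edges(topic_vecs, alpha):
    """topic_vecs: {poi_id: LDA topic-feature vector}. Connect POI pairs whose
    topic-vector cosine similarity exceeds alpha; edge weight = similarity."""
    pois = list(topic_vecs)
    edges = {}
    for i, vi in enumerate(pois):
        for vj in pois[i + 1:]:
            s = cosine(topic_vecs[vi], topic_vecs[vj])
            if s > alpha:
                edges[(vi, vj)] = s
    return edges
```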
S32. Construct the POI-functional-zone graph G_VZ = (V ∪ Z, ε_vz), where V is the set of POIs, Z is the set of functional zones, and ε_vz is the set of edges between POIs and functional zones. The POI-functional-zone graph captures the geographic and semantic relationships between POIs and districts. In a concrete implementation, the city can be partitioned in advance according to the core function that each district has and that represents it, yielding the set of functional zones. For example, the functional zone z corresponding to a POI v is found from v's geographic location (latitude-longitude coordinates), an edge is added between v and z, and its weight is set to 1.
S33. Construct the POI-time-slot graph G_VT = (V ∪ T, ε_vt), where V is the set of POIs, T is the set of time slots, and ε_vt is the set of edges between POIs and time slots. According to the user's historical check-in data, if a POI v is visited within a time slot t, an edge is added between v and t with its weight set to the visit frequency (the ratio of v's visits within time slot t to v's total visits).
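The edge-weighting rules of S32 and S33 can be sketched as follows (input layouts are assumed for illustration):

```python
from collections import Counter

def build_vz_edges(poi_zone):
    """poi_zone: {poi_id: functional zone id}. Each POI-zone edge gets weight 1."""
    return {(v, z): 1.0 for v, z in poi_zone.items()}

def build_vt_edges(checkins):
    """checkins: list of (poi_id, time_slot) pairs. Edge weight = visits of the
    POI within slot t divided by the POI's total visits (the visit frequency)."""
    slot_counts = Counter(checkins)           # visits of each (poi, slot) pair
    total = Counter(v for v, _ in checkins)   # total visits of each poi
    return {(v, t): c / total[v] for (v, t), c in slot_counts.items()}
```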
S4: Convert the consecutive check-in behavior sequences obtained in S2 into user feature vectors through the embedding layer; embed G_VV, G_VZ, and G_VT obtained in S3 into the same latent space via the joint graph embedding learning method, obtaining feature vectors of POIs, functional zones, and time slots in a shared low-dimensional space; concatenate the user feature vector with the POI, functional-zone, and time-slot feature vectors.
Further, the joint graph embedding learning method in S4 is implemented as follows.
Given a bipartite graph G = (V_A ∪ V_B, ε), where V_A and V_B are two disjoint vertex sets, the embedding vector of each vertex in the latent space is learned with negative sampling:
O = -Σ_{e_ij ∈ ε} w_ij · log p(v_j | v_i)    (1)
where ε is the set of edges, w_ij is the weight of edge e_ij, log p(v_j | v_i) is the probability that v_j occurs given its associated vertex v_i, n indexes the vertices drawn from V_B by negative sampling, and P_n(v) is the negative-sampling probability.
The objective function is shown in formula (1); the goal of training is that when one endpoint of an edge in the bipartite graph is selected, the conditional probability of its associated endpoint on the other side is maximized. For each edge, log p(v_j | v_i) is approximated by
log σ(u_j · u_i) + Σ_{n=1}^{K} E_{v_n ~ P_n(v)} [log σ(-u_n · u_i)]
where v_i and v_j are the two endpoints of edge e_ij, with v_i in V_A and v_j in V_B; v_n is a vertex drawn from V_B by negative sampling, and u_i, u_j, u_n are the embedding vectors of the corresponding vertices. σ(·) is the sigmoid function, E is the expectation, and K is the number of negative edges drawn per sample (K = 5 is preferred in this embodiment), with P_n(v) ∝ d_v^{3/4}, where d_v is the out-degree of vertex v. Joint training yields the representation vectors of the POIs, zones, and time slots in the shared low-dimensional space.
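One SGD step of the negative-sampling objective above can be sketched with numpy as follows. This is a minimal sketch under stated assumptions: the learning rate and the way negative vertices are supplied (pre-drawn from P_n(v) ∝ d_v^{3/4}) are standard choices, not details taken from the patent.

```python
import numpy as np

def sgd_step(emb_a, emb_b, i, j, weight, noise_ids, lr=0.025):
    """One weighted SGD step on edge (i, j) of a bipartite graph.

    emb_a, emb_b: embedding matrices for the two vertex sets V_A and V_B.
    noise_ids: K negative vertex indices drawn from the noise distribution.
    Ascends log sigma(u_j . u_i) + sum_n log sigma(-u_n . u_i), scaled by the
    edge weight w_ij, updating u_i, u_j, and the negatives in place.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    ui = emb_a[i].copy()
    grad_i = np.zeros_like(ui)
    # positive sample: pull u_j and u_i together
    g = weight * (1.0 - sigmoid(emb_b[j] @ ui))
    grad_i += g * emb_b[j]
    emb_b[j] += lr * g * ui
    # negative samples: push u_n and u_i apart
    for n in noise_ids:
        g = -weight * sigmoid(emb_b[n] @ ui)
        grad_i += g * emb_b[n]
        emb_b[n] += lr * g * ui
    emb_a[i] += lr * grad_i
```

Repeating such steps over edges sampled proportionally to their weights (as in edge-sampling-based graph embedding) jointly trains the three graphs sharing the POI vertex set.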
S5: Feed the concatenated feature vector into the attention-based gated recurrent unit to generate the feature vector of the user's recent interest preferences.
The concrete steps for generating this vector, shown in FIG. 4, are as follows.
S51. Input the user's consecutive check-in sequence features together with the <review, spatiotemporal, POI> features, as the user's overall historical behavior feature information, into the gated recurrent unit for fusion. The consecutive check-in behavior sequence of a user u can be defined as C_u = {(v_1, l_{v_1}, t_1, M_{v_1}), ..., (v_n, l_{v_n}, t_n, M_{v_n})}, where v denotes a checked-in POI, l_v the latitude-longitude coordinates of the POI, t the check-in time, and M_v a set of phrases describing POI v, such as reviews, ratings, and POI category; the subscripts 1, 2, ..., n identify the n POIs the user checked in at consecutively. At time t, the state update of the gated recurrent unit is computed as:
r_t = σ(W_1 x_t^u + U_1 h_{t-1} + b_1)
z_t = σ(W_2 x_t^u + U_2 h_{t-1} + b_2)
h~_t = tanh(W_3 x_t^u + U_3 (r_t ⊙ h_{t-1}) + b_3)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h~_t
where ⊙ denotes element-wise multiplication, {U_1, U_2, U_3, W_1, W_2, W_3} ∈ R^{d×d} and {b_1, b_2, b_3} ∈ R^d are the parameter matrices the gated recurrent unit must train, h_{t-1} is the hidden state at the previous time t-1, r_t and z_t are the reset and update gates at time t, h~_t is the candidate state, h_t is the hidden-layer output vector, x_t^u is the input vector of user u's check-in at time t, R is the feature vector space, and d is the feature dimension.
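The GRU update in S51 can be sketched as follows; this is the standard formulation matching the parameters named above, with illustrative dimensions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU state update.

    params holds U1..U3, W1..W3 (d x d matrices) and b1..b3 (d-vectors):
      r_t  = sigmoid(W1 x_t + U1 h_{t-1} + b1)        (reset gate)
      z_t  = sigmoid(W2 x_t + U2 h_{t-1} + b2)        (update gate)
      h~_t = tanh(W3 x_t + U3 (r_t * h_{t-1}) + b3)   (candidate state)
      h_t  = (1 - z_t) * h_{t-1} + z_t * h~_t
    """
    r = sigmoid(params['W1'] @ x_t + params['U1'] @ h_prev + params['b1'])
    z = sigmoid(params['W2'] @ x_t + params['U2'] @ h_prev + params['b2'])
    h_cand = np.tanh(params['W3'] @ x_t
                     + params['U3'] @ (r * h_prev) + params['b3'])
    return (1 - z) * h_prev + z * h_cand
```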
S52、采用注意力机制对融合信息特征进行选取,得到用户近期的兴趣偏好特征向量,计算公式如下:S52 , using the attention mechanism to select the fused information features to obtain the user's recent interest preference feature vector, and the calculation formula is as follows:
其中,e(ht)表示当前注意力机制层的权重,Wa表示注意力机制层的参数,a表示注意力机制层的权重占比,h为门控循环单元,表示时间t隐藏层输出单元。输入层、嵌入层、门控单元网络和注意力机制层组成编码器。如图4中,输入层的POI、地区和时间段特征向量的第i维度vi,ti,zi,经嵌入层、门控单元网络中各时刻隐藏层单元的输出向量h1,…,hT和注意力机制层中归一化后的各时刻注意力机制权重系数a1,…,aT,最终输出状态s,其中T是一个签到序列的总时长。Among them, e(h t ) represents the weight of the current attention mechanism layer, W a represents the parameters of the attention mechanism layer, a represents the weight ratio of the attention mechanism layer, h is the gated recurrent unit, represents the time t hidden layer output unit. The input layer, embedding layer, gating unit network and attention mechanism layer make up the encoder. As shown in Figure 4, the i-th dimension v i , t i , z i of the POI, region and time period feature vector of the input layer, the output vector h 1 , . . . ,h T and the normalized attention mechanism weight coefficients a 1 ,...,a T at each moment in the attention mechanism layer, and the final output state s, where T is the total duration of a check-in sequence.
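The attention pooling in S52 can be sketched as follows. This is one common formulation consistent with the symbols described (a linear scoring vector W_a followed by a softmax); the exact scoring form used in the patent's figure is not reproduced here.

```python
import numpy as np

def attention_pool(H, W_a):
    """Soft-attention pooling over GRU hidden states.

    H: (T, d) matrix of hidden states h_1..h_T; W_a: (d,) scoring vector.
    Computes e_t = W_a . h_t, a = softmax(e), and returns s = sum_t a_t h_t.
    """
    e = H @ W_a                  # unnormalized scores e(h_t)
    a = np.exp(e - e.max())      # shift for numerical stability
    a = a / a.sum()              # normalized attention weights a_1..a_T
    return a @ H                 # user preference vector s
```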
S6:将用户兴趣特征向量输入至基于深度强化学习行动者-评论家(Actor-Critic)框架的推荐模型中,得到Top-k有序兴趣点推荐列表。S6: Input the user interest feature vector into the recommendation model based on the deep reinforcement learning actor-critic (Actor-Critic) framework, and obtain a Top-k ordered interest point recommendation list.
数据源的获取可以直接从现有的基于社交网络的研究型推荐系统的网站中下载或者利用成熟的社交平台的公共API获取。The data source can be directly downloaded from the website of the existing social network-based research recommendation system or obtained by using the public API of the mature social platform.
The specific steps for extracting the user set and the point-of-interest set from the raw data are as follows:
Data cleaning: delete users with fewer than a check-ins and points of interest checked into fewer than b times, yielding a new data set. In practice, a and b can be set to 5–10 depending on the actual situation.
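The cleaning step above can be sketched as follows, assuming check-ins are (user, poi) records (the record layout is an assumption). Note that removing rare users can push a POI below its threshold and vice versa, so the sketch iterates until the data set is stable.

```python
from collections import Counter

def clean(checkins, a=5, b=5):
    """Drop users with fewer than a check-ins and POIs with fewer
    than b check-ins; repeat until no further rows are removed."""
    data = list(checkins)
    while True:
        user_counts = Counter(u for u, _ in data)
        poi_counts = Counter(p for _, p in data)
        kept = [(u, p) for u, p in data
                if user_counts[u] >= a and poi_counts[p] >= b]
        if len(kept) == len(data):   # stable: nothing else to remove
            return kept
        data = kept

# u3 has only 1 check-in and p2 is visited only once, so both are dropped
raw = [("u1", "p1")] * 5 + [("u2", "p1")] * 5 + [("u3", "p2")]
cleaned = clean(raw, a=5, b=5)
```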
The specific steps of point-of-interest recommendation under the reinforcement learning framework include:
S61: The actor (Actor) decodes the current state (State), i.e., the user's dynamic interest-preference features, through a decoder and outputs an action (Action): a list of a specified number of candidate points of interest. As shown in Figure 1, action a is decoded and output from state s.
S62: The critic (Critic) uses a deep Q-network (DQN) to compute the action-state value function and thereby estimate the expected value of each policy; according to this expectation, it selects or integrates the advantageous policies in real time for output or update, which speeds up training while generating effective local policies during training. In this embodiment, the state s and the action a are passed through a fully connected layer and fed into the deep Q-network, which outputs Q(s, a). The Q function Q(s, a) is the expected return obtainable from subsequent states after taking action a in a given state s. Based on the computed Q values, the model decides which action to take next.
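A minimal sketch of the critic's value estimate: the concatenated [s; a] vector is passed through one fully connected hidden layer to a scalar Q(s, a). The layer sizes are illustrative, and the DQN training machinery (experience replay, target network, TD updates) is omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def critic_q(s, a, params):
    """Q(s, a): estimated expected return of taking action a in state s,
    computed by a small fully connected network over [s; a]."""
    x = np.concatenate([s, a])                    # state-action input
    h = relu(params["W1"] @ x + params["b1"])     # hidden layer
    return float(params["W2"] @ h + params["b2"]) # scalar Q value

rng = np.random.default_rng(1)
d_s, d_a, d_h = 8, 4, 16          # state, action, hidden sizes (illustrative)
params = {
    "W1": rng.normal(size=(d_h, d_s + d_a)) * 0.1,
    "b1": np.zeros(d_h),
    "W2": rng.normal(size=d_h) * 0.1,
    "b2": np.zeros(1),
}
q = critic_q(rng.normal(size=d_s), rng.normal(size=d_a), params)
```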
After the agent (Agent) takes an action (Action), i.e., recommends a POI list to the user, the user can browse these POIs and choose to visit or skip (not visit) them as feedback. Here, the user's dwell time at a POI is treated as implicit feedback, and the agent immediately receives a reward (Reward) based on the user's feedback.
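The source does not give a concrete reward formula. Purely as an illustration of how a visit/skip decision plus dwell time could be turned into a scalar reward, one might write:

```python
def reward(visited: bool, dwell_minutes: float, cap: float = 60.0) -> float:
    """Illustrative reward (not the patented formula): 0 for a skip;
    for a visit, a base reward of 1 plus a dwell-time bonus capped at
    `cap` minutes, since longer stays read as stronger implicit
    positive feedback."""
    if not visited:
        return 0.0
    return 1.0 + min(dwell_minutes, cap) / cap
```

Under this sketch a skip yields 0, a 30-minute visit yields 1.5, and any visit of an hour or longer saturates at 2.0.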
S63: Recommend the Top-k point-of-interest set to the user; compute the recommendation precision Precision@M and recall Recall@M as follows:
Here |D_test| denotes the test set, |Top_M| denotes the size-M recommendation generated for the user, and |D_test ∩ Top_M| denotes the number of the M recommended points of interest that fall in the test set, i.e., the number of correct recommendations.
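The metric formulas themselves are images in the source; under the definitions in the paragraph above they can be computed per user as (assuming |Top_M| = M):

```python
def precision_recall_at_m(recommended, test_set, M):
    """Precision@M = |D_test ∩ Top_M| / M,
    Recall@M    = |D_test ∩ Top_M| / |D_test|."""
    top_m = list(recommended)[:M]                      # the Top-M recommendation
    hits = sum(1 for poi in top_m if poi in test_set)  # |D_test ∩ Top_M|
    precision = hits / M
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall

# 2 of the 4 recommended POIs (p1, p2) appear in a 4-item test set
p, r = precision_recall_at_m(["p1", "p9", "p2", "p7"], {"p1", "p2", "p3", "p4"}, M=4)
```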
In specific implementations, the method proposed by this technical solution can be run as an automatic process by those skilled in the art using computer software. System devices implementing the method, such as a computer-readable storage medium storing the corresponding computer program and computer equipment running the corresponding computer program, also fall within the protection scope of the present invention.
In some possible embodiments, a point-of-interest recommendation system based on deep reinforcement learning is provided, comprising a processor and a memory, the memory storing program instructions and the processor calling the instructions stored in the memory to execute the point-of-interest recommendation method based on deep reinforcement learning described above.
In some possible embodiments, a point-of-interest recommendation system based on deep reinforcement learning is provided, comprising a readable storage medium on which a computer program is stored, the computer program, when executed, implementing the point-of-interest recommendation method based on deep reinforcement learning described above.
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention pertains may make various modifications or additions to the described embodiments or substitute them in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210175716.XA CN114662015A (en) | 2022-02-25 | 2022-02-25 | A method and system for point of interest recommendation based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114662015A true CN114662015A (en) | 2022-06-24 |
Family
ID=82027854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210175716.XA Pending CN114662015A (en) | 2022-02-25 | 2022-02-25 | A method and system for point of interest recommendation based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114662015A (en) |
Non-Patent Citations (2)
Title |
---|
HUANG, JING, ET AL.: "Personalized POI recommendation using deep reinforcement learning", 《LBS 2021: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON LOCATION BASED SERVICES》, 30 November 2021 (2021-11-30) * |
MIN XIE, ET AL.: "Learning Graph-based POI Embedding for Location-based Recommendation", 《CIKM '16: PROCEEDINGS OF THE 25TH ACM INTERNATIONAL ON CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT》, 31 December 2016 (2016-12-31) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115408621A (en) * | 2022-08-12 | 2022-11-29 | 中国测绘科学研究院 | Point-of-interest recommendation method considering linear and nonlinear interaction of auxiliary information features |
CN116244513A (en) * | 2023-02-14 | 2023-06-09 | 烟台大学 | Random group POI recommendation method, system, equipment and storage medium |
CN116244513B (en) * | 2023-02-14 | 2023-09-12 | 烟台大学 | Random group POI recommendation method, system, equipment and storage medium |
CN116091174A (en) * | 2023-04-07 | 2023-05-09 | 湖南工商大学 | Recommendation policy optimization system, method and device and related equipment |
CN116955833A (en) * | 2023-09-20 | 2023-10-27 | 四川集鲜数智供应链科技有限公司 | User behavior analysis system and method |
CN116955833B (en) * | 2023-09-20 | 2023-11-28 | 四川集鲜数智供应链科技有限公司 | User behavior analysis system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021203819A1 (en) | Content recommendation method and apparatus, electronic device, and storage medium | |
Guo et al. | Combining geographical and social influences with deep learning for personalized point-of-interest recommendation | |
CN112749339A (en) | Tourism knowledge graph-based tourism route recommendation method and system | |
Qi et al. | Privacy-aware point-of-interest category recommendation in internet of things | |
CN110598130B (en) | A movie recommendation method integrating heterogeneous information network and deep learning | |
CN108256093B (en) | A Collaborative Filtering Recommendation Algorithm Based on User's Multi-interest and Interest Change | |
CN114662015A (en) | A method and system for point of interest recommendation based on deep reinforcement learning | |
Jiao et al. | A novel next new point-of-interest recommendation system based on simulated user travel decision-making process | |
Christoforidis et al. | RELINE: point-of-interest recommendations using multiple network embeddings | |
Qiao et al. | SocialMix: A familiarity-based and preference-aware location suggestion approach | |
Wang et al. | POI recommendation method using LSTM-attention in LBSN considering privacy protection | |
CN113569129A (en) | Click rate prediction model processing method, content recommendation method, device and equipment | |
Liu et al. | POI Recommendation Method Using Deep Learning in Location‐Based Social Networks | |
Noorian | A BERT-based sequential POI recommender system in social media | |
CN115422441A (en) | Continuous interest point recommendation method based on social space-time information and user preference | |
Liu et al. | Recommending attractive thematic regions by semantic community detection with multi-sourced VGI data | |
CN114417166B (en) | Continuous interest point recommendation method based on behavior sequence and dynamic social influence | |
Lang et al. | POI recommendation based on a multiple bipartite graph network model | |
Yu et al. | Personalized recommendation of collective points-of-interest with preference and context awareness | |
Xu et al. | Deep convolutional recurrent model for region recommendation with spatial and temporal contexts | |
Li et al. | Spatio-temporal intention learning for recommendation of next point-of-interest | |
CN116842478A (en) | User attribute prediction method based on twitter content | |
Chen et al. | A restaurant recommendation approach with the contextual information | |
Yu et al. | Exploiting location significance and user authority for point-of-interest recommendation | |
Ravi et al. | An intelligent fuzzy-induced recommender system for cloud-based cultural communities |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||