CN114662015A - A method and system for point of interest recommendation based on deep reinforcement learning - Google Patents
Publication Info
- Publication number: CN114662015A
- Application number: CN202210175716.XA
- Authority: CN (China)
- Prior art keywords: user, poi, interest, reinforcement learning, recommendation
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
- G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06N20/00 Machine learning
Abstract
The present invention provides a point-of-interest (POI) recommendation method based on deep reinforcement learning, which fuses the contextual feature attributes of the user's consecutive check-in behavior sequence to realize POI recommendation. The implementation comprises: obtaining the user's historical check-in data and preprocessing it to obtain a user set and a POI set; sorting the records to obtain the user's consecutive check-in behavior sequence data, and constructing a POI-POI graph G_VV, a POI-functional-zone graph G_VZ, and a POI-time-slot graph G_VT; converting the consecutive check-in sequences into user feature vectors through an embedding layer; embedding G_VV, G_VZ, and G_VT into the same latent space via joint graph embedding learning to obtain feature vectors, which are concatenated and fed into an attention-based gated recurrent unit to generate a feature vector of the user's recent interest preferences; and inputting this vector into a recommendation model based on the deep reinforcement learning Actor-Critic framework to obtain a Top-k ordered POI recommendation list. The invention effectively fuses the user's check-in sequence information with the spatiotemporal and category information of POIs, improving the accuracy of the recommendation model.
Description
Technical Field
The present invention relates to the field of electronic information technology for automatic point-of-interest recommendation, and in particular to a point-of-interest recommendation method based on deep reinforcement learning.
Background
With the development of information technology and the Internet, people have moved from an era of information scarcity into one of information overload. In this era, both information consumers and information producers face great challenges: for consumers, finding the information they care about within a huge volume of information is very difficult; for producers, making the information they produce stand out and attract wide attention is equally difficult. Users also encounter "information overload" in daily travel: which restaurant or shopping mall to choose, and so on. These problems resemble the product-selection overload encountered in online shopping. In e-commerce, recommender systems emerged to solve users' information overload: they use a user's interest preferences and related information to recommend content the user is likely to be interested in. Facing the information overload encountered during travel, research on point-of-interest (POI) recommender systems has likewise grown. A POI recommender system can be described as a personalized information recommendation system that uses people's historical travel records to provide suggestions for their future travel.
POI recommendation can help users explore life services in specific scenarios and can bring considerable economic benefits to businesses by attracting customers. Unlike traditional explicit-feedback recommender systems (recommending online items such as news, movies, and products), in which a user's item ratings directly express interest preferences, implicit feedback must mine latent preferences from the user's historical POI visit trajectories, which increases the complexity of recommendation.
POI recommendation faces three main problems. 1) Compared with massive online click and rating data, POI recommendation suffers from a more severe data-sparsity problem. 2) The cold-start problem common to recommendation tasks takes two main forms in POI recommendation: locations that have never been visited are called cold-start POIs, and users who have never visited any location are called cold-start users. 3) The dynamic-preference problem: user preferences change over time and with the environment; moreover, due to spatiotemporal heterogeneity, a POI recommendation algorithm must adapt to different scenarios and to users of different cultural, educational, and socioeconomic backgrounds. It is therefore necessary to consider multiple influencing factors, including spatiotemporal constraints and spatiotemporal neighbors, to improve recommendation performance on this task.
Summary of the Invention
To remedy the above deficiencies in the prior art, the present invention proposes a point-of-interest recommendation method based on deep reinforcement learning.
To achieve this purpose, the technical solution of the present invention provides a point-of-interest recommendation method based on deep reinforcement learning that fuses the contextual feature attributes of the user's consecutive check-in behavior sequence. The implementation comprises the following steps.
S1. Obtain the user's historical check-in data; each check-in record contains a user ID, user rating and review, POI ID, check-in time, POI category, and POI geographic location. Preprocess the data set to obtain a user set and a POI set.
S2. Sort each user's preprocessed historical check-in records by visit time to obtain the user's consecutive check-in behavior sequence data.
S3. From the processed historical check-in data, construct three bipartite graphs: a POI-POI graph G_VV, a POI-functional-zone graph G_VZ, and a POI-time-slot graph G_VT.
S4. Convert the consecutive check-in behavior sequences obtained in S2 into user feature vectors through an embedding layer; embed G_VV, G_VZ, and G_VT into the same latent space via joint graph embedding learning to obtain feature vectors of POIs, functional zones, and time slots in a shared low-dimensional space; concatenate the user feature vector with the POI, functional-zone, and time-slot feature vectors.
S5. Feed the concatenated feature vector into an attention-based gated recurrent unit to generate a feature vector of the user's recent interest preferences.
S6. Input the user interest feature vector into a recommendation model based on the deep reinforcement learning Actor-Critic framework to obtain a Top-k ordered POI recommendation list.
Furthermore, data cleaning is performed in step S1: users with fewer than a check-ins and POIs checked in fewer than b times are deleted to obtain a new data set; the parameters a and b are preset.
Furthermore, step S3 is implemented as follows.
S31. Construct the POI-POI graph G_VV = (V ∪ V, ε_vv), where V is the set of POIs and ε_vv is the set of edges between POIs.
S32. Construct the POI-functional-zone graph G_VZ = (V ∪ Z, ε_vz), where V is the set of POIs, Z is the set of functional zones, and ε_vz is the set of edges between POIs and functional zones. The POI-functional-zone graph captures the geographic and semantic relationships between POIs and districts: the city is partitioned according to the core function that each district has and that represents it, yielding the set of functional zones; the functional zone z corresponding to a POI v is found from v's geographic location, an edge ε_vz is added between v and z, and its weight is set to 1.
S33. Construct the POI-time-slot graph G_VT = (V ∪ T, ε_vt), where V is the set of POIs, T is the set of time slots, and ε_vt is the set of edges between POIs and time slots. According to the user's historical check-in data, if a POI v is visited within a time slot t, an edge is added between v and t with its weight set to the visit frequency.
Furthermore, the joint graph embedding learning in step S4 is implemented as follows.
Given a bipartite graph G = (V_A ∪ V_B, ε), where V_A and V_B are two disjoint vertex sets, the embedding vector of each vertex in the latent space is learned with negative sampling by minimizing the objective O:
O = -Σ_{e_ij ∈ ε} w_ij · log p(v_j | v_i)
where, for each observed edge, log p(v_j | v_i) is approximated by the negative-sampling term
log σ(u_j · u_i) + Σ_{n=1}^{K} E_{v_n ~ P_n(v)} [log σ(-u_n · u_i)]
Here ε is the set of edges, w_ij is the weight of edge e_ij, and log p(v_j | v_i) is the probability that v_j occurs given its associated vertex v_i. The vertices v_i and v_j are the two endpoints of edge e_ij, with v_i in V_A and v_j in V_B; v_n is a vertex drawn from V_B by negative sampling with probability P_n(v), and u_i, u_j, u_n are the embedding vectors of the corresponding vertices. σ(·) is the sigmoid function, E is the expectation, K is the number of negative edges drawn per sample, and P_n(v) ∝ d_v^{3/4}, where d_v is the out-degree of vertex v. Joint training yields the representation vectors of POIs, zones, and time slots in the shared low-dimensional space.
Furthermore, step S5 comprises the following sub-steps.
S51. Input the consecutive check-in sequence features together with the <review, spatiotemporal, POI> features, as the user's overall historical behavior feature information, into the gated recurrent unit model for fusion.
S52. Use an attention mechanism to select among the fused information features, obtaining the feature vector of the user's recent interest preferences.
In S51, the consecutive check-in behavior sequence of a user u is defined as C_u = {(v_1, l_{v_1}, t_1, M_{v_1}), ..., (v_n, l_{v_n}, t_n, M_{v_n})}, where v denotes a checked-in POI, l_v the latitude-longitude coordinates of the POI, t the check-in time, and M_v a set of phrases describing POI v. At time t, the GRU state update is computed as:
r_t = σ(W_1 x_t^u + U_1 h_{t-1} + b_1)
z_t = σ(W_2 x_t^u + U_2 h_{t-1} + b_2)
h~_t = tanh(W_3 x_t^u + U_3 (r_t ⊙ h_{t-1}) + b_3)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h~_t
where ⊙ denotes element-wise multiplication, {U_1, U_2, U_3, W_1, W_2, W_3} ∈ R^{d×d} and {b_1, b_2, b_3} ∈ R^d are the parameter matrices the gated recurrent unit must train, h_{t-1} is the hidden state at the previous time t-1, r_t and z_t are the reset and update gates at time t, h~_t is the candidate state, h_t is the hidden-layer output vector, x_t^u is the input vector of user u's check-in at time t, R is the feature vector space, and d is the feature dimension.
Furthermore, step S6 comprises the following sub-steps.
S61. The Actor outputs, from the current state (State), a state action (Action): a list of a specified number of candidate POIs.
S62. The Critic uses a deep Q-network (DQN) to compute the action-value function and estimate the expected value of the policy, and according to this expectation selects or integrates the dominant policies in real time for output or update, which speeds up training while generating effective local policies during training.
S63. Recommend the Top-k POI set to the user, and compute the recommendation precision Precision@M and recall Recall@M.
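The two metrics in S63 can be computed as in the following minimal sketch (the function name and record layout are illustrative, not from the patent):

```python
def precision_recall_at_m(recommended, visited, m):
    """Precision@M and Recall@M for one user.

    recommended: ranked list of POI ids (Top-k list truncated to M)
    visited: set of ground-truth POI ids the user actually visited
    """
    top_m = recommended[:m]
    hits = sum(1 for poi in top_m if poi in visited)
    precision = hits / m
    recall = hits / len(visited) if visited else 0.0
    return precision, recall

# Example: 2 of the top-4 recommendations were actually visited
p, r = precision_recall_at_m([3, 7, 1, 9], {7, 9, 5, 2}, m=4)
```

Averaging these per-user values over the test set gives the overall Precision@M and Recall@M reported for the model.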
The present invention proposes the following improvements.
1. The graph embedding model fuses multiple influencing factors such as spatiotemporal and semantic information well, improving the performance of the POI recommender system.
2. The attention-based gated recurrent unit can model users' complex dynamic preferences and learn multiple kinds of correlations between POIs.
3. The reinforcement learning model can learn users' real needs and preferences through natural interaction with them and recommend accordingly, while alleviating the cold-start problem to some extent.
The present invention effectively fuses the user's check-in sequence information with the spatiotemporal and category information of POIs, addresses the limitations of data sparsity and dynamic user preferences, and effectively improves the accuracy of the recommendation model.
The solution of the present invention is simple and convenient to implement and highly practical; it solves the problems of low practicality and inconvenient application in the related art, can improve the user experience, and has significant market value.
Brief Description of the Drawings
FIG. 1 is a structural diagram of the point-of-interest recommendation method based on deep reinforcement learning according to an embodiment of the present invention.
FIG. 2 is a flowchart of the point-of-interest recommendation method based on deep reinforcement learning according to an embodiment of the present invention.
FIG. 3 shows example bipartite graphs of an embodiment of the present invention, where (a) is the POI-POI bipartite graph, (b) the POI-functional-zone bipartite graph, and (c) the POI-time-slot bipartite graph.
FIG. 4 is a structural diagram of the attention-based gated recurrent unit model of an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described below with reference to the drawings and embodiments.
An embodiment of the present invention provides a point-of-interest recommendation method that fuses the contextual features of the user's consecutive check-in behavior sequences; as shown in FIG. 2, it comprises the following steps.
S1: Obtain the user's historical check-in data; each check-in record contains a user ID, user rating and review, POI ID, check-in time, POI category, and POI geographic location. Preprocess the data set to obtain the user set and the point-of-interest (POI) set.
In this embodiment, the implementation of S1 further includes the following processing.
Data cleaning: delete users with fewer than a check-ins and POIs checked in fewer than b times to obtain a new data set. In a concrete implementation, the parameters a and b can be preset as required.
S2: Sort each user's preprocessed historical check-in records by visit time to obtain the user's consecutive check-in behavior sequence data.
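Step S2 can be sketched as follows (the record layout with 'user', 'poi', and 'time' keys is an assumption for illustration):

```python
from collections import defaultdict

def build_checkin_sequences(records):
    """Group check-in records by user and sort each group by check-in time.

    records: iterable of dicts with keys 'user', 'poi', 'time'.
    Returns {user_id: [poi ids in chronological order]}.
    """
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec['user']].append(rec)
    return {
        u: [r['poi'] for r in sorted(recs, key=lambda r: r['time'])]
        for u, recs in by_user.items()
    }
```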
S3: Construct three bipartite graphs from the processed historical check-in data, as shown in FIG. 3: the POI-POI graph G_VV, the POI-functional-zone graph G_VZ, and the POI-time-slot graph G_VT, where POI denotes a point of interest. For example, FIG. 3(a) shows the bipartite graph formed among POI_1, POI_2, ..., POI_6; FIG. 3(b) the bipartite graph between POI_1, POI_2, ... and functional zone_1, functional zone_2, ...; and FIG. 3(c) the bipartite graph between POI_1, POI_2, ... and time slot_1, time slot_2, ....
The concrete process of constructing the POI bipartite graphs includes the following.
S31. Construct the POI-POI graph G_VV = (V ∪ V, ε_vv), where V is the set of POIs and ε_vv is the set of edges between POIs.
S311. Collect the review information of all POIs to build a corpus C_review. Treat each user's reviews, and all the reviews of each POI, as one document each, and compute each document's topic-feature distribution vector with the Latent Dirichlet Allocation (LDA) topic model, i.e., a topic feature vector for each user and a topic feature vector for each POI.
S312. Compute the spatial distance between the topic feature vectors of two POIs with the cosine formula; the cosine similarity expresses the degree of similarity between POIs. If the cosine similarity s_ij between the topic feature vectors of the two endpoints v_i and v_j of a POI-POI edge (i.e., two distinct POIs) exceeds the threshold α, an edge is added between v_i and v_j with its weight set to the similarity s_ij.
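Steps S311-S312 can be sketched as follows; the LDA topic vectors are taken as given inputs, and the threshold parameter alpha follows the text:

```python
import math

def cosine(a, b):
    """Cosine similarity of two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def build_poi_poi_edges(topic_vecs, alpha):
    """topic_vecs: {poi_id: LDA topic-feature vector}. Connect POI pairs whose
    topic-vector cosine similarity exceeds alpha; edge weight = similarity."""
    pois = list(topic_vecs)
    edges = {}
    for i, vi in enumerate(pois):
        for vj in pois[i + 1:]:
            s = cosine(topic_vecs[vi], topic_vecs[vj])
            if s > alpha:
                edges[(vi, vj)] = s
    return edges
```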
S32. Construct the POI-functional-zone graph G_VZ = (V ∪ Z, ε_vz), where V is the set of POIs, Z is the set of functional zones, and ε_vz is the set of edges between POIs and functional zones. The POI-functional-zone graph captures the geographic and semantic relationships between POIs and districts. In a concrete implementation, the city can be partitioned in advance according to the core function that each district has and that represents it, yielding the set of functional zones. For example, the functional zone z corresponding to a POI v is found from v's geographic location (latitude-longitude coordinates), an edge is added between v and z, and its weight is set to 1.
S33. Construct the POI-time-slot graph G_VT = (V ∪ T, ε_vt), where V is the set of POIs, T is the set of time slots, and ε_vt is the set of edges between POIs and time slots. According to the user's historical check-in data, if a POI v is visited within a time slot t, an edge is added between v and t with its weight set to the visit frequency (the ratio of v's visits within time slot t to v's total visits).
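The edge-weighting rules of S32 and S33 can be sketched as follows (input layouts are assumed for illustration):

```python
from collections import Counter

def build_vz_edges(poi_zone):
    """poi_zone: {poi_id: functional zone id}. Each POI-zone edge gets weight 1."""
    return {(v, z): 1.0 for v, z in poi_zone.items()}

def build_vt_edges(checkins):
    """checkins: list of (poi_id, time_slot) pairs. Edge weight = visits of the
    POI within slot t divided by the POI's total visits (the visit frequency)."""
    slot_counts = Counter(checkins)           # visits of each (poi, slot) pair
    total = Counter(v for v, _ in checkins)   # total visits of each poi
    return {(v, t): c / total[v] for (v, t), c in slot_counts.items()}
```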
S4: Convert the consecutive check-in behavior sequences obtained in S2 into user feature vectors through the embedding layer; embed G_VV, G_VZ, and G_VT obtained in S3 into the same latent space via the joint graph embedding learning method, obtaining feature vectors of POIs, functional zones, and time slots in a shared low-dimensional space; concatenate the user feature vector with the POI, functional-zone, and time-slot feature vectors.
Further, the joint graph embedding learning method in S4 is implemented as follows.
Given a bipartite graph G = (V_A ∪ V_B, ε), where V_A and V_B are two disjoint vertex sets, the embedding vector of each vertex in the latent space is learned with negative sampling:
O = -Σ_{e_ij ∈ ε} w_ij · log p(v_j | v_i)    (1)
where ε is the set of edges, w_ij is the weight of edge e_ij, log p(v_j | v_i) is the probability that v_j occurs given its associated vertex v_i, n indexes the vertices drawn from V_B by negative sampling, and P_n(v) is the negative-sampling probability.
The objective function is shown in formula (1); the goal of training is that when one endpoint of an edge in the bipartite graph is selected, the conditional probability of its associated endpoint on the other side is maximized. For each edge, log p(v_j | v_i) is approximated by
log σ(u_j · u_i) + Σ_{n=1}^{K} E_{v_n ~ P_n(v)} [log σ(-u_n · u_i)]
where v_i and v_j are the two endpoints of edge e_ij, with v_i in V_A and v_j in V_B; v_n is a vertex drawn from V_B by negative sampling, and u_i, u_j, u_n are the embedding vectors of the corresponding vertices. σ(·) is the sigmoid function, E is the expectation, and K is the number of negative edges drawn per sample (K = 5 is preferred in this embodiment), with P_n(v) ∝ d_v^{3/4}, where d_v is the out-degree of vertex v. Joint training yields the representation vectors of the POIs, zones, and time slots in the shared low-dimensional space.
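One SGD step of the negative-sampling objective above can be sketched with numpy as follows. This is a minimal sketch under stated assumptions: the learning rate and the way negative vertices are supplied (pre-drawn from P_n(v) ∝ d_v^{3/4}) are standard choices, not details taken from the patent.

```python
import numpy as np

def sgd_step(emb_a, emb_b, i, j, weight, noise_ids, lr=0.025):
    """One weighted SGD step on edge (i, j) of a bipartite graph.

    emb_a, emb_b: embedding matrices for the two vertex sets V_A and V_B.
    noise_ids: K negative vertex indices drawn from the noise distribution.
    Ascends log sigma(u_j . u_i) + sum_n log sigma(-u_n . u_i), scaled by the
    edge weight w_ij, updating u_i, u_j, and the negatives in place.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    ui = emb_a[i].copy()
    grad_i = np.zeros_like(ui)
    # positive sample: pull u_j and u_i together
    g = weight * (1.0 - sigmoid(emb_b[j] @ ui))
    grad_i += g * emb_b[j]
    emb_b[j] += lr * g * ui
    # negative samples: push u_n and u_i apart
    for n in noise_ids:
        g = -weight * sigmoid(emb_b[n] @ ui)
        grad_i += g * emb_b[n]
        emb_b[n] += lr * g * ui
    emb_a[i] += lr * grad_i
```

Repeating such steps over edges sampled proportionally to their weights (as in edge-sampling-based graph embedding) jointly trains the three graphs sharing the POI vertex set.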
S5: Feed the concatenated feature vector into the attention-based gated recurrent unit to generate the feature vector of the user's recent interest preferences.
The concrete steps for generating this vector, shown in FIG. 4, are as follows.
S51. Input the user's consecutive check-in sequence features together with the <review, spatiotemporal, POI> features, as the user's overall historical behavior feature information, into the gated recurrent unit for fusion. The consecutive check-in behavior sequence of a user u can be defined as C_u = {(v_1, l_{v_1}, t_1, M_{v_1}), ..., (v_n, l_{v_n}, t_n, M_{v_n})}, where v denotes a checked-in POI, l_v the latitude-longitude coordinates of the POI, t the check-in time, and M_v a set of phrases describing POI v, such as reviews, ratings, and POI category; the subscripts 1, 2, ..., n identify the n POIs the user checked in at consecutively. At time t, the state update of the gated recurrent unit is computed as:
r_t = σ(W_1 x_t^u + U_1 h_{t-1} + b_1)
z_t = σ(W_2 x_t^u + U_2 h_{t-1} + b_2)
h~_t = tanh(W_3 x_t^u + U_3 (r_t ⊙ h_{t-1}) + b_3)
h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h~_t
where ⊙ denotes element-wise multiplication, {U_1, U_2, U_3, W_1, W_2, W_3} ∈ R^{d×d} and {b_1, b_2, b_3} ∈ R^d are the parameter matrices the gated recurrent unit must train, h_{t-1} is the hidden state at the previous time t-1, r_t and z_t are the reset and update gates at time t, h~_t is the candidate state, h_t is the hidden-layer output vector, x_t^u is the input vector of user u's check-in at time t, R is the feature vector space, and d is the feature dimension.
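The GRU update in S51 can be sketched as follows; this is the standard formulation matching the parameters named above, with illustrative dimensions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU state update.

    params holds U1..U3, W1..W3 (d x d matrices) and b1..b3 (d-vectors):
      r_t  = sigmoid(W1 x_t + U1 h_{t-1} + b1)        (reset gate)
      z_t  = sigmoid(W2 x_t + U2 h_{t-1} + b2)        (update gate)
      h~_t = tanh(W3 x_t + U3 (r_t * h_{t-1}) + b3)   (candidate state)
      h_t  = (1 - z_t) * h_{t-1} + z_t * h~_t
    """
    r = sigmoid(params['W1'] @ x_t + params['U1'] @ h_prev + params['b1'])
    z = sigmoid(params['W2'] @ x_t + params['U2'] @ h_prev + params['b2'])
    h_cand = np.tanh(params['W3'] @ x_t
                     + params['U3'] @ (r * h_prev) + params['b3'])
    return (1 - z) * h_prev + z * h_cand
```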
S52、采用注意力机制对融合信息特征进行选取,得到用户近期的兴趣偏好特征向量,计算公式如下:S52 , using the attention mechanism to select the fused information features to obtain the user's recent interest preference feature vector, and the calculation formula is as follows:
其中,e(ht)表示当前注意力机制层的权重,Wa表示注意力机制层的参数,a表示注意力机制层的权重占比,h为门控循环单元,表示时间t隐藏层输出单元。输入层、嵌入层、门控单元网络和注意力机制层组成编码器。如图4中,输入层的POI、地区和时间段特征向量的第i维度vi,ti,zi,经嵌入层、门控单元网络中各时刻隐藏层单元的输出向量h1,…,hT和注意力机制层中归一化后的各时刻注意力机制权重系数a1,…,aT,最终输出状态s,其中T是一个签到序列的总时长。Among them, e(h t ) represents the weight of the current attention mechanism layer, W a represents the parameters of the attention mechanism layer, a represents the weight ratio of the attention mechanism layer, h is the gated recurrent unit, represents the time t hidden layer output unit. The input layer, embedding layer, gating unit network and attention mechanism layer make up the encoder. As shown in Figure 4, the i-th dimension v i , t i , z i of the POI, region and time period feature vector of the input layer, the output vector h 1 , . . . ,h T and the normalized attention mechanism weight coefficients a 1 ,...,a T at each moment in the attention mechanism layer, and the final output state s, where T is the total duration of a check-in sequence.
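The attention pooling in S52 can be sketched as follows. This is one common formulation consistent with the symbols described (a linear scoring vector W_a followed by a softmax); the exact scoring form used in the patent's figure is not reproduced here.

```python
import numpy as np

def attention_pool(H, W_a):
    """Soft-attention pooling over GRU hidden states.

    H: (T, d) matrix of hidden states h_1..h_T; W_a: (d,) scoring vector.
    Computes e_t = W_a . h_t, a = softmax(e), and returns s = sum_t a_t h_t.
    """
    e = H @ W_a                  # unnormalized scores e(h_t)
    a = np.exp(e - e.max())      # shift for numerical stability
    a = a / a.sum()              # normalized attention weights a_1..a_T
    return a @ H                 # user preference vector s
```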
S6:将用户兴趣特征向量输入至基于深度强化学习行动者-评论家(Actor-Critic)框架的推荐模型中,得到Top-k有序兴趣点推荐列表。S6: Input the user interest feature vector into the recommendation model based on the deep reinforcement learning actor-critic (Actor-Critic) framework, and obtain a Top-k ordered interest point recommendation list.
数据源的获取可以直接从现有的基于社交网络的研究型推荐系统的网站中下载或者利用成熟的社交平台的公共API获取。The data source can be directly downloaded from the website of the existing social network-based research recommendation system or obtained by using the public API of the mature social platform.
The specific steps for extracting the user set and the point-of-interest set from the raw data are as follows:
Data cleaning: delete users with fewer than a check-ins and points of interest checked into fewer than b times, yielding a new data set. In practice, a and b can be set to 5–10 depending on the actual situation.
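The cleaning step above can be sketched as follows, assuming check-ins are (user, poi) records (the record layout is an assumption). Note that removing rare users can push a POI below its threshold and vice versa, so the sketch iterates until the data set is stable.

```python
from collections import Counter

def clean(checkins, a=5, b=5):
    """Drop users with fewer than a check-ins and POIs with fewer
    than b check-ins; repeat until no further rows are removed."""
    data = list(checkins)
    while True:
        user_counts = Counter(u for u, _ in data)
        poi_counts = Counter(p for _, p in data)
        kept = [(u, p) for u, p in data
                if user_counts[u] >= a and poi_counts[p] >= b]
        if len(kept) == len(data):   # stable: nothing else to remove
            return kept
        data = kept

# u3 has only 1 check-in and p2 is visited only once, so both are dropped
raw = [("u1", "p1")] * 5 + [("u2", "p1")] * 5 + [("u3", "p2")]
cleaned = clean(raw, a=5, b=5)
```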
The specific steps of point-of-interest recommendation under the reinforcement learning framework include:
S61: The actor (Actor) decodes the current state (State), i.e., the user's dynamic interest-preference features, through a decoder and outputs an action (Action): a list of a specified number of candidate points of interest. As shown in Figure 1, action a is decoded and output from state s.
S62: The critic (Critic) uses a deep Q-network (DQN) to compute the action-state value function and thereby estimate the expected value of each policy; according to this expectation, it selects or integrates the advantageous policies in real time for output or update, which speeds up training while generating effective local policies during training. In this embodiment, the state s and the action a are passed through a fully connected layer and fed into the deep Q-network, which outputs Q(s, a). The Q function Q(s, a) is the expected return obtainable from subsequent states after taking action a in a given state s. Based on the computed Q values, the model decides which action to take next.
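A minimal sketch of the critic's value estimate: the concatenated [s; a] vector is passed through one fully connected hidden layer to a scalar Q(s, a). The layer sizes are illustrative, and the DQN training machinery (experience replay, target network, TD updates) is omitted.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def critic_q(s, a, params):
    """Q(s, a): estimated expected return of taking action a in state s,
    computed by a small fully connected network over [s; a]."""
    x = np.concatenate([s, a])                    # state-action input
    h = relu(params["W1"] @ x + params["b1"])     # hidden layer
    return float(params["W2"] @ h + params["b2"]) # scalar Q value

rng = np.random.default_rng(1)
d_s, d_a, d_h = 8, 4, 16          # state, action, hidden sizes (illustrative)
params = {
    "W1": rng.normal(size=(d_h, d_s + d_a)) * 0.1,
    "b1": np.zeros(d_h),
    "W2": rng.normal(size=d_h) * 0.1,
    "b2": np.zeros(1),
}
q = critic_q(rng.normal(size=d_s), rng.normal(size=d_a), params)
```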
After the agent (Agent) takes an action (Action), i.e., recommends a POI list to the user, the user can browse these POIs and choose to visit or skip (not visit) them as feedback. Here, the user's dwell time at a POI is treated as implicit feedback, and the agent immediately receives a reward (Reward) based on the user's feedback.
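The source does not give a concrete reward formula. Purely as an illustration of how a visit/skip decision plus dwell time could be turned into a scalar reward, one might write:

```python
def reward(visited: bool, dwell_minutes: float, cap: float = 60.0) -> float:
    """Illustrative reward (not the patented formula): 0 for a skip;
    for a visit, a base reward of 1 plus a dwell-time bonus capped at
    `cap` minutes, since longer stays read as stronger implicit
    positive feedback."""
    if not visited:
        return 0.0
    return 1.0 + min(dwell_minutes, cap) / cap
```

Under this sketch a skip yields 0, a 30-minute visit yields 1.5, and any visit of an hour or longer saturates at 2.0.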
S63: Recommend the Top-k point-of-interest set to the user; compute the recommendation precision Precision@M and recall Recall@M as follows:
Here |D_test| denotes the test set, |Top_M| denotes the size-M recommendation generated for the user, and |D_test ∩ Top_M| denotes the number of the M recommended points of interest that fall in the test set, i.e., the number of correct recommendations.
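The metric formulas themselves are images in the source; under the definitions in the paragraph above they can be computed per user as (assuming |Top_M| = M):

```python
def precision_recall_at_m(recommended, test_set, M):
    """Precision@M = |D_test ∩ Top_M| / M,
    Recall@M    = |D_test ∩ Top_M| / |D_test|."""
    top_m = list(recommended)[:M]                      # the Top-M recommendation
    hits = sum(1 for poi in top_m if poi in test_set)  # |D_test ∩ Top_M|
    precision = hits / M
    recall = hits / len(test_set) if test_set else 0.0
    return precision, recall

# 2 of the 4 recommended POIs (p1, p2) appear in a 4-item test set
p, r = precision_recall_at_m(["p1", "p9", "p2", "p7"], {"p1", "p2", "p3", "p4"}, M=4)
```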
In specific implementations, the method proposed by this technical solution can be run as an automatic process by those skilled in the art using computer software. System devices implementing the method, such as a computer-readable storage medium storing the corresponding computer program and computer equipment running the corresponding computer program, also fall within the protection scope of the present invention.
In some possible embodiments, a point-of-interest recommendation system based on deep reinforcement learning is provided, comprising a processor and a memory, the memory storing program instructions and the processor calling the instructions stored in the memory to execute the point-of-interest recommendation method based on deep reinforcement learning described above.
In some possible embodiments, a point-of-interest recommendation system based on deep reinforcement learning is provided, comprising a readable storage medium on which a computer program is stored, the computer program, when executed, implementing the point-of-interest recommendation method based on deep reinforcement learning described above.
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention pertains may make various modifications or additions to the described embodiments or substitute them in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210175716.XA CN114662015A (en) | 2022-02-25 | 2022-02-25 | A method and system for point of interest recommendation based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114662015A true CN114662015A (en) | 2022-06-24 |
Family
ID=82027854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210175716.XA Pending CN114662015A (en) | 2022-02-25 | 2022-02-25 | A method and system for point of interest recommendation based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114662015A (en) |
Non-Patent Citations (2)
Title |
---|
HUANG, JING, ET AL.: "Personalized POI recommendation using deep reinforcement learning", 《LBS 2021: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON LOCATION BASED SERVICES》, 30 November 2021 (2021-11-30) * |
MIN XIE, ET AL.: "Learning Graph-based POI Embedding for Location-based Recommendation", 《CIKM '16: PROCEEDINGS OF THE 25TH ACM INTERNATIONAL ON CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT》, 31 December 2016 (2016-12-31) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115408621A (en) * | 2022-08-12 | 2022-11-29 | 中国测绘科学研究院 | Point-of-interest recommendation method considering linear and nonlinear interaction of auxiliary information features |
CN116244513A (en) * | 2023-02-14 | 2023-06-09 | 烟台大学 | Random group POI recommendation method, system, equipment and storage medium |
CN116244513B (en) * | 2023-02-14 | 2023-09-12 | 烟台大学 | Random group POI recommendation method, system, equipment and storage medium |
CN116091174A (en) * | 2023-04-07 | 2023-05-09 | 湖南工商大学 | Recommendation policy optimization system, method and device and related equipment |
CN116955833A (en) * | 2023-09-20 | 2023-10-27 | 四川集鲜数智供应链科技有限公司 | User behavior analysis system and method |
CN116955833B (en) * | 2023-09-20 | 2023-11-28 | 四川集鲜数智供应链科技有限公司 | User behavior analysis system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021203819A1 (en) | Content recommendation method and apparatus, electronic device, and storage medium | |
Guo et al. | Combining geographical and social influences with deep learning for personalized point-of-interest recommendation | |
CN112749339A (en) | Tourism knowledge graph-based tourism route recommendation method and system | |
Qi et al. | Privacy-aware point-of-interest category recommendation in internet of things | |
CN110598130B (en) | A movie recommendation method integrating heterogeneous information network and deep learning | |
CN108256093B (en) | A Collaborative Filtering Recommendation Algorithm Based on User's Multi-interest and Interest Change | |
CN114662015A (en) | A method and system for point of interest recommendation based on deep reinforcement learning | |
Jiao et al. | A novel next new point-of-interest recommendation system based on simulated user travel decision-making process | |
Christoforidis et al. | RELINE: point-of-interest recommendations using multiple network embeddings | |
Qiao et al. | SocialMix: A familiarity-based and preference-aware location suggestion approach | |
Wang et al. | POI recommendation method using LSTM-attention in LBSN considering privacy protection | |
CN113569129A (en) | Click rate prediction model processing method, content recommendation method, device and equipment | |
Liu et al. | POI Recommendation Method Using Deep Learning in Location‐Based Social Networks | |
Noorian | A BERT-based sequential POI recommender system in social media | |
CN115422441A (en) | Continuous interest point recommendation method based on social space-time information and user preference | |
Liu et al. | Recommending attractive thematic regions by semantic community detection with multi-sourced VGI data | |
CN114417166B (en) | Continuous interest point recommendation method based on behavior sequence and dynamic social influence | |
Lang et al. | POI recommendation based on a multiple bipartite graph network model | |
Yu et al. | Personalized recommendation of collective points-of-interest with preference and context awareness | |
Xu et al. | Deep convolutional recurrent model for region recommendation with spatial and temporal contexts | |
Li et al. | Spatio-temporal intention learning for recommendation of next point-of-interest | |
CN116842478A (en) | User attribute prediction method based on twitter content | |
Chen et al. | A restaurant recommendation approach with the contextual information | |
Yu et al. | Exploiting location significance and user authority for point-of-interest recommendation | |
Ravi et al. | An intelligent fuzzy-induced recommender system for cloud-based cultural communities |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||