CN114662015A - Interest point recommendation method and system based on deep reinforcement learning - Google Patents
- Publication number
- CN114662015A (application CN202210175716.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- poi
- interest
- reinforcement learning
- deep reinforcement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention provides a point-of-interest recommendation method based on deep reinforcement learning, which fuses the context feature attributes of a user's continuous check-in behavior sequence to realize point-of-interest recommendation. The implementation process comprises: acquiring historical check-in data of users and preprocessing the data to obtain a user set and a point-of-interest (POI) set; sorting the records to obtain continuous check-in behavior sequence data for each user, and constructing a POI-POI graph G_VV, a POI-functional-area graph G_VZ and a POI-time-period graph G_VT; converting the continuous check-in behavior sequence into a user feature vector through an embedding layer; embedding G_VV, G_VZ and G_VT into the same latent space through joint graph embedding learning to obtain feature vectors, and inputting the concatenated feature vectors into an attention-based gated recurrent unit to generate the user's recent interest preference feature vector; and inputting this vector into a recommendation model based on the deep reinforcement learning Actor-Critic framework to obtain a Top-k ranked point-of-interest recommendation list. The invention effectively fuses user check-in sequence information with the spatio-temporal and category information of the points of interest, and improves the accuracy of the recommendation model.
Description
Technical Field
The invention relates to the technical field of electronic information for automatically recommending points of interest to users, and in particular to a point-of-interest recommendation method based on deep reinforcement learning.
Background
With the development of information technology and the Internet, people have gradually moved from an era of information scarcity to one of information overload. In this era, both information consumers and information producers face significant challenges: for consumers, finding the information they are interested in among a vast amount of data is very difficult; for producers, making their information stand out and attract users' attention is equally hard. Users encounter the same "information overload" problem in daily travel, for example when choosing a restaurant or a shopping mall, much like the product-selection overload encountered in online shopping. In the field of electronic commerce, recommendation systems were developed to solve this problem by recommending content that may interest users based on information such as their interest preferences. Facing information overload during travel, research on point-of-interest recommendation systems is likewise growing. A point-of-interest recommendation system can be described as a personalized information recommendation system that uses people's historical travel records to provide suggestions for their future trips.
POI recommendation can help a user explore life services in a specific scene and can also bring considerable economic benefit to merchants by attracting customers. Unlike traditional explicit-feedback recommendation systems (e.g., recommending news, movies or goods), which directly express the user's interest preferences through item ratings, implicit feedback mines the user's latent preferences from the historical record of POI visit trajectories, which increases the complexity of recommendation.
POI recommendation mainly faces the following problems: 1) compared with massive online click and rating data, POI recommendation faces a more severe data-sparsity problem; 2) the cold-start problem commonly encountered in recommendation tasks, which in the POI recommendation task mainly takes two forms: locations that have never been visited are called cold-start POIs, and users who have never visited any location are called cold-start users; 3) owing to temporal and spatial heterogeneity, a POI recommendation algorithm must suit users in different scenes and with different cultural, educational and socio-economic backgrounds. It is therefore necessary to consider various influencing factors, including spatio-temporal constraints and spatio-temporal neighbors, to improve the recommendation performance of the task.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides an interest point recommendation method based on deep reinforcement learning.
In order to achieve the above object, the technical solution of the present invention provides a point-of-interest recommendation method based on deep reinforcement learning, which fuses the context feature attributes of a user's continuous check-in behavior sequence to realize point-of-interest recommendation; the implementation process comprises the following steps,
s1, acquiring historical check-in data of users, wherein each check-in record comprises a user ID, user rating and comment, point-of-interest ID, check-in time, point-of-interest category and point-of-interest geographic position; preprocessing the data set to obtain a user set and a point-of-interest (POI) set;
s2, sorting the historical check-in records of each user preprocessed in S1 in order of visit time to obtain the users' continuous check-in behavior sequence data;
s3, constructing three bipartite graphs from the processed user historical check-in data: a POI-POI graph G_VV, a POI-functional-area graph G_VZ and a POI-time-period graph G_VT;
S4, converting the user continuous check-in behavior sequence obtained in S2 into a user feature vector through an embedding layer; embedding G_VV, G_VZ and G_VT into the same latent space through joint graph embedding learning to obtain the feature vectors of POIs, functional areas and time periods in a shared low-dimensional space; and concatenating the user feature vector with the POI, functional-area and time-period feature vectors;
s5, inputting the concatenated feature vectors into an attention-based gated recurrent unit to generate the user's recent interest preference feature vector;
and S6, inputting the user interest feature vector into a recommendation model based on the deep reinforcement learning Actor-Critic framework to obtain a Top-k ranked point-of-interest recommendation list.
Moreover, in step S1, data cleansing is performed, including deleting users with fewer than a check-ins and points of interest with fewer than b check-ins to obtain a new data set; parameters a and b are preset.
Furthermore, the implementation of step S3 is as follows,
s31, constructing the POI-POI graph G_VV = (V ∪ V, ε_vv), where V is the set of POIs and ε_vv is the set of edges between POIs;
s32, constructing the POI-functional-area graph G_VZ = (V ∪ Z, ε_vz), where V is the set of POIs, Z is the set of functional areas, and ε_vz is the set of edges between POIs and functional areas; the POI-functional-area graph captures the geographic and semantic relation between POIs and areas, and the city is divided according to the core function that characterizes each area to obtain the functional-area set; the functional area z corresponding to a POI v is found from the geographic position of v, an edge ε_vz is connected between v and z, and its weight is set to 1;
s33, constructing the POI-time-period graph G_VT = (V ∪ T, ε_vt), where V is the set of POIs, T is the set of time periods, and ε_vt is the set of edges between POIs and time periods; according to the users' historical check-in data, if a POI v was visited within a time period t, an edge is connected between v and t and its weight is set to the visit frequency.
Further, the joint graph embedding learning of step S4 is implemented as follows,
given a bipartite graph G = (V_A ∪ V_B, ε), where V_A and V_B are two mutually disjoint vertex sets, the embedding vector of each vertex in the latent space is computed using negative sampling by minimizing the objective

O = −Σ_{e_ij ∈ ε} w_ij · log p(v_j | v_i),

where ε is the set of edges, w_ij is the weight of edge e_ij, and log p(v_j | v_i), the probability that v_j associated with v_i occurs, is approximated with negative sampling as

log p(v_j | v_i) ≈ log σ(u_j · u_i) + Σ_{n=1}^{K} E_{v_n ∼ P_n(v)} [log σ(−u_n · u_i)].

Here v_i and v_j are the two endpoints of edge e_ij, with v_i belonging to V_A and v_j belonging to V_B; v_n is a vertex negatively sampled from V_B; u_i, u_j and u_n are the embedding vectors of the corresponding vertices; σ(·) is the Sigmoid function; E[·] is an expectation function; K is the number of edges negatively sampled each time; and P_n(v) ∝ d_v^{3/4} is the negative-sampling probability, where d_v is the out-degree of vertex v. The representation vectors of POIs, areas and time periods in the shared low-dimensional space are obtained through joint training.
further, step S5 includes the following sub-steps,
s51, inputting the continuous check-in sequence features and the <comment features, spatio-temporal features, POI features> as the user's overall historical behavior feature information into the gated recurrent unit model for fusion;
and S52, selecting among the fused information features with an attention mechanism to obtain the user's recent interest preference feature vector.
Moreover, in step S51 the continuous check-in behavior sequence of a user u is defined as S_u = {(v_1, l_{v_1}, t_1, M_{v_1}), …, (v_n, l_{v_n}, t_n, M_{v_n})}, where v denotes a checked-in point of interest, l_v the longitude-latitude coordinates of the point of interest, t the check-in time, and M_v a group of phrases describing the point of interest v. At time t, the state update of the GRU is calculated by the following formulas,

r_t = σ(W_1 x_t^u + U_1 h_{t−1} + b_1),
z_t = σ(W_2 x_t^u + U_2 h_{t−1} + b_2),
h̃_t = tanh(W_3 x_t^u + U_3 (r_t ⊙ h_{t−1}) + b_3),
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t,

wherein ⊙ indicates the element-wise product, {U_1, U_2, U_3, W_1, W_2, W_3} ∈ R^{d×d} and {b_1, b_2, b_3} ∈ R^d are the parameter matrices of the gated recurrent unit to be trained, h_{t−1} represents the hidden state at the previous time t−1, r_t and z_t are respectively the reset gate and update gate at time t, h̃_t is the candidate state, h_t represents the output vector of the hidden layer, x_t^u indicates the input vector of user u's check-in at time t, R is the feature vector space, and d is the feature vector dimension.
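As a minimal numerical sketch of the GRU state update above (this assumes the standard GRU gating; the toy dimension, random parameters, and function names are illustrative and not part of the patent):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU state update matching the symbols in the text: reset gate r_t,
    update gate z_t, candidate state, and hidden output h_t."""
    U1, U2, U3, W1, W2, W3, b1, b2, b3 = p
    r_t = sigmoid(W1 @ x_t + U1 @ h_prev + b1)               # reset gate
    z_t = sigmoid(W2 @ x_t + U2 @ h_prev + b2)               # update gate
    h_cand = np.tanh(W3 @ x_t + U3 @ (r_t * h_prev) + b3)    # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_cand               # hidden output h_t

d = 4                                     # toy feature dimension
rng = np.random.default_rng(0)
p = [rng.standard_normal((d, d)) * 0.1 for _ in range(6)] + [np.zeros(d)] * 3
h = np.zeros(d)
for _ in range(3):                        # feed a short check-in sequence
    h = gru_step(rng.standard_normal(d), h, p)
print(h.shape)
```

Because h_t is a convex combination of the previous state and a tanh output, the hidden state stays bounded in (−1, 1), which keeps long check-in sequences numerically stable.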
Further, step S6 includes the following sub-steps,
s61, the Actor network outputs, for the current State, a State Action: a list of a specified number of candidate points of interest;
s62, the Critic network uses a deep Q-value network (DQN) to estimate the value expectation of the action-state value function for each policy, and selects or integrates the dominant policies in real time according to this expectation for output or update, which speeds up training while generating effective local policies during training;
S63, recommending the Top-k point-of-interest set to the user, and calculating the recommendation precision Precision@M and recall Recall@M.
The present invention proposes the following improvements:
1. multiple influencing factors such as time, space and semantics are well fused on the basis of the graph embedding model, improving the performance of the POI recommendation system;
2. the attention-based gated recurrent unit can model the user's complex dynamic preferences and learn various correlations among points of interest;
3. the reinforcement learning model can understand the user's real needs and preferences through natural interaction with the user in order to make recommendations, while alleviating the cold-start problem to a certain degree.
The method effectively fuses user check-in sequence information with the spatio-temporal and category information of the points of interest, addresses the limitations of data sparsity and dynamic user preference, and effectively improves the accuracy of the recommendation model.
The scheme of the invention is simple and convenient to implement and highly practical; it solves the problems of low practicability and inconvenient application in the related art, can improve user experience, and has important market value.
Drawings
Fig. 1 is a schematic structural diagram of a point of interest recommendation method based on deep reinforcement learning according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a point of interest recommendation method based on deep reinforcement learning according to an embodiment of the present invention.
Fig. 3 is an example of a bipartite graph according to an embodiment of the present invention, in which (a) is a bipartite graph of POI-POI, (b) is a bipartite graph of POI-functional area, and (c) is a bipartite graph of POI-time period.
FIG. 4 is a block diagram of an attention-based gated recurrent unit model according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is specifically described below with reference to the accompanying drawings and examples.
The embodiment of the invention provides a point-of-interest recommendation method fusing the context features of a user's continuous check-in behavior sequence, which, as shown in FIG. 2, comprises the following steps:
s1: acquiring historical check-in data of users, wherein each check-in record comprises a user ID, user rating and comment, point-of-interest ID, check-in time, point-of-interest category and point-of-interest geographic position; preprocessing the data set to obtain a user set and a point-of-interest (POI) set.
The specific implementation of the step S1 in the embodiment further includes the following steps:
data cleansing: deleting users with fewer than a check-ins and points of interest with fewer than b check-ins to obtain a new data set. In specific implementation, the parameters a and b can be preset as needed.
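The cleaning rule above can be sketched as follows (the iteration-to-a-fixpoint detail is our assumption, since removing a user's records can push a POI below its threshold and vice versa; the patent only specifies the two deletions):

```python
from collections import Counter

def clean_checkins(records, a, b):
    """Drop users with fewer than a check-ins and POIs with fewer than b
    check-ins; repeat until the data set is stable."""
    records = list(records)
    while True:
        users = Counter(u for u, _ in records)   # check-ins per user
        pois = Counter(p for _, p in records)    # check-ins per POI
        kept = [(u, p) for u, p in records if users[u] >= a and pois[p] >= b]
        if len(kept) == len(records):
            return kept
        records = kept

data = [("u1", "p1"), ("u1", "p1"), ("u2", "p1")]
cleaned = clean_checkins(data, a=2, b=2)   # u2 has only one check-in, so it is dropped
print(cleaned)
```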
S2: respectively sorting the historical sign-in records of each user preprocessed by the S1 according to the sequence of the access time to obtain the continuous sign-in behavior sequence data of the user;
s3: constructing three bipartite graphs from the processed user historical check-in data, as shown in FIG. 3, which are respectively: a point-of-interest–point-of-interest graph G_VV, a point-of-interest–functional-area graph G_VZ, and a point-of-interest–time-period graph G_VT, also customarily called the POI-POI graph G_VV, POI-functional-area graph G_VZ and POI-time-period graph G_VT, where POI denotes a point of interest. For example, FIG. 3(a) shows the bipartite graph formed among points of interest _1, _2, …, _6; FIG. 3(b) shows the bipartite graph formed between the points of interest _1, _2, … and the functional areas _1, _2, …; and FIG. 3(c) shows the bipartite graph formed between the points of interest _1, _2, … and the time periods _1, _2, ….
The specific process for constructing the bipartite graphs comprises the following steps:
s31, constructing the POI-POI graph G_VV = (V ∪ V, ε_vv), where V is the set of POIs and ε_vv is the set of edges between POIs.
S311, collecting the comment information of all POIs to build a corpus C_review; the comments of each user and all comments of one POI are each regarded as a document, and the topic feature distribution vector of each document is computed with a Latent Dirichlet Allocation (LDA) topic model, i.e. a topic feature vector for each user and a topic feature vector for each POI.
S312, computing the cosine distance between the topic feature vectors of two POIs, which represents the degree of similarity between the POIs; if the cosine similarity s_ij between the topic feature vectors of the two endpoints v_i and v_j (i.e., different points of interest) of an edge of the POI-POI graph is greater than the corresponding threshold α, v_i and v_j are connected by an edge whose weight is set to the similarity s_ij.
S32, constructing the POI-functional-area graph G_VZ = (V ∪ Z, ε_vz), where Z is the set of functional areas and ε_vz is the set of edges between POIs and functional areas. The POI-functional-area graph captures the geographic and semantic relations between POIs and areas; in specific implementation, the city can be divided in advance according to the core function that characterizes each area, yielding the functional-area set. For example, the functional area z corresponding to a certain POI v is found from its geographic position (longitude-latitude coordinates), an edge is connected between v and z, and the weight of the edge is set to 1.
S33, constructing the POI-time-period graph G_VT = (V ∪ T, ε_vt), where T is the set of time periods and ε_vt is the set of edges between POIs and time periods. According to a user's historical check-in data, if a POI v was visited within a time period t, an edge is connected between v and t and the weight of the edge is set to the visit frequency (the ratio of the number of times v was visited within time period t to the total number of times v was visited).
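The edge-construction rules of S31–S33 can be sketched with plain dictionaries as follows (`topic_sim` and `poi_zone` are illustrative inputs standing in for the LDA similarities and the city's functional-area partition; all names are ours, not the patent's):

```python
from collections import defaultdict

def build_graphs(checkins, topic_sim, alpha, poi_zone):
    """topic_sim maps a POI pair to the cosine similarity of their topic
    vectors; poi_zone maps each POI to its functional area; checkins is a
    list of (poi, time_period) records."""
    # G_VV: connect POI pairs whose topic similarity exceeds threshold alpha
    g_vv = {pair: s for pair, s in topic_sim.items() if s > alpha}
    # G_VZ: one unit-weight edge from each POI to its functional area
    g_vz = {(v, z): 1.0 for v, z in poi_zone.items()}
    # G_VT: edge weight = visits of v in period t / total visits of v
    counts, totals = defaultdict(int), defaultdict(int)
    for v, t in checkins:
        counts[(v, t)] += 1
        totals[v] += 1
    g_vt = {(v, t): c / totals[v] for (v, t), c in counts.items()}
    return g_vv, g_vz, g_vt

g_vv, g_vz, g_vt = build_graphs(
    checkins=[("p1", "morning"), ("p1", "morning"), ("p1", "evening")],
    topic_sim={("p1", "p2"): 0.8, ("p1", "p3"): 0.1},
    alpha=0.5,
    poi_zone={"p1": "z1"})
print(g_vv, g_vz, g_vt)
```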
S4: converting the user continuous check-in behavior sequence obtained in S2 into a user feature vector through an embedding layer; embedding the G_VV, G_VZ and G_VT obtained in S3 into the same latent space by the joint graph embedding learning method to obtain the feature vectors of POIs, functional areas and time periods in a shared low-dimensional space; and concatenating the user feature vector with the POI, functional-area and time-period feature vectors;
further, the joint graph embedding learning method in S4 is implemented as follows:
Given a bipartite graph G = (V_A ∪ V_B, ε), where V_A and V_B are two mutually disjoint vertex sets, the embedding vector of each vertex in the latent space is computed using negative sampling by minimizing the objective:

O = −Σ_{e_ij ∈ ε} w_ij · log p(v_j | v_i),   (1)

where ε is the set of edges, w_ij is the weight of edge e_ij, log p(v_j | v_i) is the probability that v_j associated with v_i occurs, and P_n(v) is the negative-sampling probability.

The objective function is shown in formula (1); training aims to maximize the conditional probability that, when one end of an edge of the bipartite graph is chosen, the other end occurs. With negative sampling,

log p(v_j | v_i) ≈ log σ(u_j · u_i) + Σ_{n=1}^{K} E_{v_n ∼ P_n(v)} [log σ(−u_n · u_i)],

where v_i and v_j are the two endpoints of edge e_ij, with v_i belonging to V_A and v_j belonging to V_B; v_n is a vertex negatively sampled from V_B; u_i, u_j and u_n are the embedding vectors of their corresponding vertices; σ(·) is the Sigmoid function; E[·] is an expectation function; K is the number of edges negatively sampled each time (K = 5 is preferred in the embodiment); and P_n(v) ∝ d_v^{3/4}, where d_v is the out-degree of vertex v. The representation vectors of POIs, areas and time periods in the shared low-dimensional space are obtained by joint training.
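A minimal sketch of one stochastic-ascent step of this negative-sampling objective on a single edge (i, j) follows; the learning rate, toy initialization, and function names are our assumptions, not the patent's:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_update(emb_a, emb_b, i, j, neg_ids, lr=0.01):
    """One ascent step on log σ(u_j·u_i) + Σ_n log σ(−u_n·u_i): pull u_i
    toward its neighbour u_j and push it away from sampled negatives."""
    ui = emb_a[i].copy()
    g_pos = 1.0 - sigmoid(emb_b[j] @ ui)
    grad_i = g_pos * emb_b[j]
    emb_b[j] += lr * g_pos * ui
    for n in neg_ids:                     # K negatives drawn with P_n(v) ∝ d_v^(3/4)
        g_neg = sigmoid(emb_b[n] @ ui)
        grad_i -= g_neg * emb_b[n]
        emb_b[n] -= lr * g_neg * ui
    emb_a[i] += lr * grad_i

d = 8
A = np.zeros((1, d)); A[0, 0] = 0.1       # vertex of V_A, along axis 0
B = np.zeros((3, d))
B[0, 0] = 0.1                             # positive neighbour u_j, aligned with u_i
B[1, 1] = 0.1; B[2, 2] = 0.1              # two negatives, initially orthogonal
before = A[0] @ B[0]
for _ in range(20):
    edge_update(A, B, 0, 0, neg_ids=[1, 2])
after = A[0] @ B[0]
print(after > before)                     # alignment with the true neighbour grows
```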
s5: inputting the concatenated feature vectors into an attention-based gated recurrent unit to generate the user's recent interest preference feature vector.
The specific steps of generating the user's recent interest preference feature vector, shown in FIG. 4, are as follows:
s51, inputting the user continuous check-in sequence features and the <comment features, spatio-temporal features, POI features> as the user's overall historical behavior feature information into the gated recurrent unit for fusion. The continuous check-in behavior sequence of a user u can be defined as S_u = {(v_1, l_{v_1}, t_1, M_{v_1}), …, (v_n, l_{v_n}, t_n, M_{v_n})}, where v represents a checked-in point of interest, l_v represents the longitude-latitude coordinates of the point of interest, t represents the check-in time, and M_v is a set of phrases describing the point of interest v, such as comments, ratings and POI category; subscripts 1, 2, …, n identify the n points of interest that the user checked in consecutively. At time t, the state update of the gated recurrent unit is calculated by the following equations:

r_t = σ(W_1 x_t^u + U_1 h_{t−1} + b_1),
z_t = σ(W_2 x_t^u + U_2 h_{t−1} + b_2),
h̃_t = tanh(W_3 x_t^u + U_3 (r_t ⊙ h_{t−1}) + b_3),
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t,

wherein ⊙ indicates the element-wise product, {U_1, U_2, U_3, W_1, W_2, W_3} ∈ R^{d×d} and {b_1, b_2, b_3} ∈ R^d are the parameter matrices of the gated recurrent unit to be trained, h_{t−1} represents the hidden state at the previous time t−1, r_t and z_t are respectively the reset gate and update gate at time t, h̃_t is the candidate state, h_t represents the output vector of the hidden layer, x_t^u represents the input vector of user u's check-in at time t, R is the feature vector space, and d is the feature vector dimension.
S52, selecting among the fused information features with an attention mechanism to obtain the user's recent interest preference feature vector, with the calculation formulas:

a_t = exp(e(h_t)) / Σ_{i=1}^{T} exp(e(h_i)),   s = Σ_{t=1}^{T} a_t h_t,

wherein e(h_t) represents the weight of the current attention layer, W_a represents the parameters of the attention mechanism layer, a represents the weight ratios of the attention mechanism layer, h is the gated recurrent unit hidden state, and h_t represents the hidden-layer output unit at time t. The input layer, the embedding layer, the gated-unit network and the attention mechanism layer form the encoder. As in FIG. 4, the i-th dimensions v_i, t_i, z_i of the POI, time-period and area feature vectors enter the input layer; the embedding layer and gated-unit network yield the hidden-layer output vectors h_1, …, h_T at each time; the attention mechanism layer yields the normalized attention weight coefficients a_1, …, a_T at each time; and finally the state s is output, where T is the total length of a check-in sequence.
S6: inputting the user interest feature vector into a recommendation model based on the deep reinforcement learning Actor-Critic framework to obtain a Top-k ranked point-of-interest recommendation list.
The data source can be downloaded directly from the websites of existing research-oriented recommendation systems based on social networks, or acquired through the public APIs of mature social platforms.
The specific steps of extracting the user set and the interest point set from the original data are as follows:
data cleansing: deleting users with fewer than a check-ins and points of interest with fewer than b check-ins to obtain a new data set; in specific implementation, a and b can be taken as 5–10 according to actual conditions.
The specific steps of interest point recommendation based on the reinforcement learning framework comprise:
s61, the Actor network decodes the current State, i.e. the user's dynamic interest-preference feature, through a decoder and outputs a State Action: a list of a specified number of candidate points of interest; as shown in FIG. 1, action a is output by decoding state s;
s62, the Critic network estimates the value expectation of each policy through the action-state value function using a Deep Q-Network (DQN), and selects or integrates the dominant policies in real time according to this expectation for output or update, which speeds up training while generating effective local policies during training. In the embodiment, the state s and action a pass through a fully connected layer and are then input into the deep Q-value network, which outputs Q(s, a). The Q function Q(s, a) is the expected reward obtainable by following action a in a given state s. According to the result of the Q function, the model decides the action to take next.
After the Agent takes an Action, i.e. recommends a list of POIs to the user, the user may browse the POIs and choose to visit or skip (not visit) them as feedback; the user's stay at a POI is regarded as implicit feedback, and the Agent immediately receives a Reward based on the user's feedback.
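The critic's Q-value computation and the resulting Top-k ranking can be sketched as below; this is a deliberately minimal stand-in for the trained DQN (one ReLU hidden layer, random parameters, illustrative names), not the patent's network:

```python
import numpy as np

def q_value(s, a, W1, W2):
    """Critic sketch: concatenate state s with a candidate POI embedding a,
    pass through one ReLU hidden layer, output the scalar Q(s, a)."""
    h = np.maximum(0.0, W1 @ np.concatenate([s, a]))
    return float(W2 @ h)

def recommend_top_k(s, candidates, W1, W2, k):
    """Actor side of the sketch: rank all candidates by Q(s, a) and
    return the indices of the Top-k recommendation list."""
    q = np.array([q_value(s, a, W1, W2) for a in candidates])
    return list(np.argsort(-q)[:k])

rng = np.random.default_rng(0)
d = 4
s = rng.standard_normal(d)                   # user dynamic preference state
candidates = rng.standard_normal((6, d))     # six candidate POI embeddings
W1 = rng.standard_normal((8, 2 * d)) * 0.1
W2 = rng.standard_normal(8) * 0.1
top = recommend_top_k(s, candidates, W1, W2, k=3)
print(top)
```

In training, the user's implicit feedback on the recommended list would supply the reward used to update W1 and W2.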
S63, recommending the Top-k point-of-interest set to the user, and calculating the recommendation precision Precision@M and recall Recall@M according to the following formulas:

Precision@M = |D_test ∩ Top_M| / |Top_M|,   Recall@M = |D_test ∩ Top_M| / |D_test|,

wherein |D_test| represents the size of the test set, |Top_M| represents the size-M recommendation generated for the user, and |D_test ∩ Top_M| represents the number of the M recommended points of interest falling in the test set, i.e. the number of accurate recommendations.
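These two metrics reduce to a set intersection; a sketch with illustrative IDs:

```python
def precision_recall_at_m(top_m, d_test):
    """Precision@M = |D_test ∩ Top_M| / |Top_M|;
    Recall@M    = |D_test ∩ Top_M| / |D_test|."""
    hits = len(set(top_m) & set(d_test))
    return hits / len(top_m), hits / len(d_test)

# 4 recommended POIs, 3 held-out test POIs, 2 hits
p, r = precision_recall_at_m(top_m=[1, 2, 3, 4], d_test={2, 4, 5})
print(p, r)   # precision 2/4 = 0.5, recall 2/3
```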
In specific implementation, a person skilled in the art can realize the automatic operation of the above process using computer software technology; system devices implementing the method, such as a computer-readable storage medium storing the corresponding computer program of the technical solution of the present invention, and a computer device comprising and operating the corresponding computer program, should also fall within the protection scope of the present invention.
In some possible embodiments, a deep reinforcement learning-based interest point recommendation system is provided, which includes a processor and a memory, where the memory is used to store program instructions, and the processor is used to call the stored instructions in the memory to execute a deep reinforcement learning-based interest point recommendation method as described above.
In some possible embodiments, a deep reinforcement learning-based interest point recommendation system is provided, which includes a readable storage medium, on which a computer program is stored, and when the computer program is executed, the deep reinforcement learning-based interest point recommendation method is implemented as described above.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (10)
1. A point-of-interest recommendation method based on deep reinforcement learning, characterized in that: the method realizes point-of-interest recommendation by fusing the context feature attributes of a user's continuous check-in behavior sequence, and the implementation process comprises the following steps,
s1, obtaining historical sign-in data of the user, wherein each sign-in record comprises a user ID, a user score and comment, an interest point ID, sign-in time, interest point types and an interest point geographic position; preprocessing the data set to obtain a user set and a point of interest (POI) set;
s2, sorting the historical sign-in records of each user preprocessed in S1 according to the sequence of access time to obtain continuous sign-in behavior sequence data of the users;
s3, constructing 3 bipartite graphs according to the processed user historical check-in data, wherein the bipartite graphs are POI-POI graphs GVVPOI-function area map GVZAnd POI-time period map GVT;
S4, converting the user continuous check-in behavior sequence obtained in S2 into a user feature vector through an embedding layer; embedding the POIs, functional areas and time periods of G_VV, G_VZ and G_VT into the same latent space through joint graph embedding learning to obtain feature vectors of the POIs, functional areas and time periods in a shared low-dimensional space; and concatenating the user feature vector with the POI, functional area and time period feature vectors;
S5, inputting the concatenated feature vectors into an attention-based gated recurrent unit to generate the user's recent interest preference feature vector;
and S6, inputting the user interest feature vector into a recommendation model based on the deep reinforcement learning Actor-Critic framework to obtain a ranked Top-k interest point recommendation list.
2. The interest point recommendation method based on deep reinforcement learning of claim 1, characterized in that: in step S1, data cleansing is performed, including deleting users with fewer than a check-ins and interest points with fewer than b check-ins, to obtain a new data set; the parameters a and b are preset.
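As a sketch only, the cleansing of claim 2 might be implemented as below, assuming check-in records are (user ID, POI ID) pairs and that the two threshold filters are re-applied until stable, since removing a sparse POI can push a user below its threshold (the claim itself fixes only the thresholds a and b):

```python
from collections import Counter

def clean_checkins(records, a=5, b=5):
    """Keep only users with at least `a` check-ins and POIs with at
    least `b` check-ins. `records` is a list of (user_id, poi_id)
    pairs. Filtering repeats until no record is removed, because
    dropping a POI can lower a user's count below `a` and vice versa."""
    changed = True
    while changed:
        user_counts = Counter(u for u, _ in records)
        poi_counts = Counter(p for _, p in records)
        kept = [(u, p) for u, p in records
                if user_counts[u] >= a and poi_counts[p] >= b]
        changed = len(kept) != len(records)
        records = kept
    return records
```

The concrete default thresholds (5 and 5) are illustrative; the patent leaves a and b as preset parameters.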
3. The interest point recommendation method based on deep reinforcement learning of claim 1, characterized in that: the implementation of step S3 is as follows,
S31, constructing a POI-POI graph G_VV = (V ∪ V, ε_vv), where V is the set of POIs and ε_vv is the set of edges between POIs;
S32, constructing a POI-functional area graph G_VZ = (V ∪ Z, ε_vz), where V is the set of POIs, Z is the set of functional areas, and ε_vz is the set of edges between POIs and functional areas; the POI-functional area graph handles the geographical and semantic relations between POIs and regions: the city is divided according to the core function that each region represents, yielding the set of functional areas; the corresponding functional area z is found according to the geographic position of POI v, an edge ε_vz is connected between v and z, and the weight of that edge is set to 1;
S33, constructing a POI-time period graph G_VT = (V ∪ T, ε_vt), where V is the set of POIs, T is the set of time periods, and ε_vt is the set of edges between POIs and time periods; according to the users' historical check-in data, if a POI v is accessed within a time period t, an edge is connected between v and t, and the weight of the edge is set to the access frequency.
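The construction of G_VV, G_VZ and G_VT in claim 3 can be sketched as follows. The claim fixes the G_VZ edge weight (1) and the G_VT weight (access frequency); linking consecutively visited POIs in G_VV with a co-visit count, and the hour-to-period mapping, are illustrative assumptions:

```python
from collections import defaultdict

def build_graphs(user_sequences, poi_to_zone, hour_to_period=lambda h: h // 6):
    """Sketch of step S3. `user_sequences` maps each user to a
    time-ordered list of (poi_id, hour) check-ins; `poi_to_zone`
    maps a POI to its functional area. Edge weighting for G_VV
    (consecutive co-visits) is an assumption -- the claim only
    names the edge set."""
    g_vv = defaultdict(int)   # (poi, poi) -> consecutive co-visit count
    g_vz = {}                 # (poi, zone) -> 1, per the claim
    g_vt = defaultdict(int)   # (poi, period) -> visit frequency, per the claim
    for seq in user_sequences.values():
        for (v1, _), (v2, _) in zip(seq, seq[1:]):
            if v1 != v2:
                g_vv[(v1, v2)] += 1
        for v, hour in seq:
            g_vz[(v, poi_to_zone[v])] = 1
            g_vt[(v, hour_to_period(hour))] += 1
    return dict(g_vv), g_vz, dict(g_vt)
```

The default six-hour periods (`h // 6`) stand in for whatever time discretization the method actually uses.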
4. The point of interest recommendation method based on deep reinforcement learning according to claim 1, wherein: the joint graph embedding learning of step S4 is implemented as follows,
Given a bipartite graph G = (V_A ∪ V_B, ε), where V_A and V_B are two mutually disjoint vertex sets, the embedding vector of each vertex of the graph in the latent space is learned with negative sampling by maximizing the objective

O = Σ_{e_ij ∈ ε} w_ij · log p(v_j | v_i),

where log p(v_j | v_i) is approximated by negative sampling as

log σ(u_j · u_i) + Σ_{k=1}^{K} E_{v_n ∼ P_n(v)} [ log σ(−u_n · u_i) ].

Here ε is the set of edges, w_ij is the weight of edge e_ij, and log p(v_j | v_i) is the probability of v_j occurring in association with v_i; v_i and v_j are the two endpoints of edge e_ij, with v_i belonging to V_A and v_j belonging to V_B; v_n is a vertex drawn from V_B by negative sampling, and u_i, u_j and u_n are the embedding vectors of the corresponding vertices; σ(·) is the Sigmoid function, E is the expectation function, K is the number of negative samples drawn for each edge, and P_n(v) ∝ d_v^{3/4} is the negative-sampling distribution, where d_v is the out-degree of vertex v. The representation vectors of the POIs, functional areas and time periods in the shared low-dimensional space are obtained through joint training over G_VV, G_VZ and G_VT.
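A single-edge stochastic gradient step for the negative-sampling objective of claim 4 might look like the following sketch; the learning rate and the use of a d_v^0.75 noise distribution for P_n(v) are assumptions in line with common practice for this family of objectives:

```python
import numpy as np

rng = np.random.default_rng(0)

def neg_sampling_step(emb_a, emb_b, i, j, w_ij, degrees, K=5, lr=0.025):
    """One SGD step on edge (i, j) of a bipartite graph: maximize
    log sigma(u_j . u_i) + sum over K negatives of log sigma(-u_n . u_i),
    weighted by w_ij. `degrees` holds the degree of each V_B vertex;
    negatives are drawn proportional to degree ** 0.75 (an assumption)."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    p_n = degrees ** 0.75
    p_n = p_n / p_n.sum()
    grad_i = np.zeros_like(emb_a[i])
    # Positive pair: pull u_i and u_j together.
    g = w_ij * (1.0 - sigmoid(emb_b[j] @ emb_a[i]))
    grad_i += g * emb_b[j]
    emb_b[j] += lr * g * emb_a[i]
    # K negative samples: push u_i away from u_n.
    for n in rng.choice(len(emb_b), size=K, p=p_n):
        g = -w_ij * sigmoid(emb_b[n] @ emb_a[i])
        grad_i += g * emb_b[n]
        emb_b[n] += lr * g * emb_a[i]
    emb_a[i] += lr * grad_i
```

Iterating such steps over edges sampled from all three graphs, with shared POI vectors, gives the joint training the claim describes.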
5. the interest point recommendation method based on deep reinforcement learning of claim 1, characterized in that: step S5 includes the following sub-steps,
S51, inputting the continuous check-in sequence features together with the <review features, spatio-temporal features, POI features> as the user's overall historical behavior feature information into the gated recurrent unit model for fusion;
and S52, selecting among the fused information features with an attention mechanism to obtain the user's recent interest preference feature vector.
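Step S52 could be realized, for instance, as attention pooling over the GRU hidden states; scoring each state by a dot product with a learned query vector is an assumption here, as the claim names only an attention mechanism:

```python
import numpy as np

def attention_pool(hidden_states, query):
    """Score each GRU hidden state (rows of `hidden_states`, shape (T, d))
    against a learned query vector, softmax-normalize the scores over the
    T time steps, and return the weighted sum as the user's recent
    interest preference vector (shape (d,))."""
    scores = hidden_states @ query            # (T,) alignment scores
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ hidden_states            # attention-weighted pooling
```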
6. The interest point recommendation method based on deep reinforcement learning of claim 5, wherein: a user u's sequence of consecutive check-in behaviors in S51 is a time-ordered list of check-in tuples (v, l_v, t, m_v), where v denotes a checked-in point of interest, l_v the longitude-latitude coordinates of the point of interest, t the check-in time, and m_v a group of phrases describing the point of interest v; at time t, the state update of the GRU is calculated by the following formulas,

r_t = σ(W_1 x_t^u + U_1 h_{t-1} + b_1)
z_t = σ(W_2 x_t^u + U_2 h_{t-1} + b_2)
h̃_t = tanh(W_3 x_t^u + U_3 (r_t ⊙ h_{t-1}) + b_3)
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t

where ⊙ denotes the element-wise product, {U_1, U_2, U_3, W_1, W_2, W_3} ∈ R^{d×d} and {b_1, b_2, b_3} ∈ R^d are the parameter matrices and bias vectors of the gated recurrent unit to be trained, h_{t-1} denotes the hidden state at the previous time t−1, r_t and z_t are respectively the reset gate and the update gate at time t, h̃_t is the candidate state, h_t is the output vector of the hidden layer, x_t^u is the input vector of user u's check-in at time t, R is the feature vector space, and d is the feature vector dimension.
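The GRU state update of claim 6 translates directly into code; a minimal NumPy sketch, with `params` grouping the (W_i, U_i, b_i) triples of the reset gate, update gate, and candidate state:

```python
import numpy as np

def gru_step(x_t, h_prev, params):
    """One GRU state update: reset gate r_t, update gate z_t,
    candidate state h_tilde, and hidden output h_t. `params` is
    [(W1, U1, b1), (W2, U2, b2), (W3, U3, b3)], each W and U of
    shape (d, d) and each b of shape (d,)."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    (W1, U1, b1), (W2, U2, b2), (W3, U3, b3) = params
    r_t = sigmoid(W1 @ x_t + U1 @ h_prev + b1)               # reset gate
    z_t = sigmoid(W2 @ x_t + U2 @ h_prev + b2)               # update gate
    h_tilde = np.tanh(W3 @ x_t + U3 @ (r_t * h_prev) + b3)   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde              # hidden output
```

With all parameters at zero, both gates evaluate to 0.5 and the candidate state to zero, so the update halves the previous hidden state, which makes a convenient sanity check.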
7. The method for recommending interest points based on deep reinforcement learning according to any one of claims 1 to 6, wherein: step S6 includes the following sub-steps,
S61, the Actor outputs, for the current state, the state action: a list of a specified number of candidate interest points;
S62, the Critic uses a deep Q-value network (DQN) to compute the action-state value function estimating the expected value of the policy; according to this expectation, dominant policies are selected or integrated in real time for output or update, which improves the training speed while generating effective local policies during training;
S63, recommending the Top-k interest point set to the user, and calculating the recommendation precision Precision@M and recall Recall@M.
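Precision@M and Recall@M in step S63 are the standard top-M ranking metrics: of the first M recommended POIs, the fraction the user actually visited, and the fraction of actually visited POIs that appear in the top M. A minimal computation:

```python
def precision_recall_at_m(recommended, relevant, M):
    """Precision@M and Recall@M for one user. `recommended` is the
    ranked POI list produced by the model; `relevant` holds the POIs
    the user actually visited in the test period."""
    top_m = recommended[:M]
    hits = len(set(top_m) & set(relevant))     # relevant POIs in the top M
    precision = hits / M
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Reported figures would average these per-user values over the test set.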
8. A point of interest recommendation system based on deep reinforcement learning, characterized in that: the system is used to implement the deep reinforcement learning-based point of interest recommendation method according to any one of claims 1 to 7.
9. The deep reinforcement learning-based interest point recommendation system according to claim 8, characterized in that: it comprises a processor and a memory, wherein the memory is used to store program instructions, and the processor is used to call the instructions stored in the memory to execute the interest point recommendation method based on deep reinforcement learning according to any one of claims 1-7.
10. The deep reinforcement learning-based interest point recommendation system according to claim 8, characterized in that: it comprises a readable storage medium on which a computer program is stored, which, when executed, implements the interest point recommendation method based on deep reinforcement learning according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210175716.XA CN114662015A (en) | 2022-02-25 | 2022-02-25 | Interest point recommendation method and system based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210175716.XA CN114662015A (en) | 2022-02-25 | 2022-02-25 | Interest point recommendation method and system based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114662015A true CN114662015A (en) | 2022-06-24 |
Family
ID=82027854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210175716.XA Pending CN114662015A (en) | 2022-02-25 | 2022-02-25 | Interest point recommendation method and system based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114662015A (en) |
Non-Patent Citations (2)
Title |
---|
Huang, Jing, et al.: "Personalized POI recommendation using deep reinforcement learning", LBS 2021: Proceedings of the 16th International Conference on Location Based Services, 30 November 2021 (2021-11-30) * |
Min Xie, et al.: "Learning Graph-based POI Embedding for Location-based Recommendation", CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 31 December 2016 (2016-12-31) * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115408621A (en) * | 2022-08-12 | 2022-11-29 | 中国测绘科学研究院 | Interest point recommendation method considering linear and nonlinear interaction of auxiliary information features |
CN116244513A (en) * | 2023-02-14 | 2023-06-09 | 烟台大学 | Random group POI recommendation method, system, equipment and storage medium |
CN116244513B (en) * | 2023-02-14 | 2023-09-12 | 烟台大学 | Random group POI recommendation method, system, equipment and storage medium |
CN116091174A (en) * | 2023-04-07 | 2023-05-09 | 湖南工商大学 | Recommendation policy optimization system, method and device and related equipment |
CN116955833A (en) * | 2023-09-20 | 2023-10-27 | 四川集鲜数智供应链科技有限公司 | User behavior analysis system and method |
CN116955833B (en) * | 2023-09-20 | 2023-11-28 | 四川集鲜数智供应链科技有限公司 | User behavior analysis system and method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Urban2vec: Incorporating street view imagery and pois for multi-modal urban neighborhood embedding | |
CN111061946B (en) | Method, device, electronic equipment and storage medium for recommending scenerized content | |
CN114662015A (en) | Interest point recommendation method and system based on deep reinforcement learning | |
Liu et al. | Predicting the next location: A recurrent model with spatial and temporal contexts | |
CN107133262B (en) | A kind of personalized POI recommended methods based on more influence insertions | |
CN111061961A (en) | Multi-feature-fused matrix decomposition interest point recommendation method and implementation system thereof | |
CN109062962B (en) | Weather information fused gated cyclic neural network interest point recommendation method | |
Hu et al. | A graph embedding based model for fine-grained POI recommendation | |
KR102340463B1 (en) | Sample weight setting method and device, electronic device | |
CN113569129A (en) | Click rate prediction model processing method, content recommendation method, device and equipment | |
CN107644036A (en) | A kind of method, apparatus and system of data object push | |
CN115422441A (en) | Continuous interest point recommendation method based on social space-time information and user preference | |
CN115408618B (en) | Point-of-interest recommendation method based on social relation fusion position dynamic popularity and geographic features | |
Ma et al. | Exploring multiple spatio-temporal information for point-of-interest recommendation | |
CN111695046A (en) | User portrait inference method and device based on spatio-temporal mobile data representation learning | |
CN113469752A (en) | Content recommendation method and device, storage medium and electronic equipment | |
CN112597389A (en) | Control method and device for realizing article recommendation based on user behavior | |
CN109684561B (en) | Interest point recommendation method based on deep semantic analysis of user sign-in behavior change | |
CN115186197A (en) | User recommendation method based on end-to-end hyperbolic space | |
Noorian | A BERT-based sequential POI recommender system in social media | |
CN117633371A (en) | Recommendation method, device and readable storage medium based on multi-attention mechanism | |
CN116628345B (en) | Content recommendation method and device, electronic equipment and storage medium | |
CN115470362A (en) | Interest point real-time recommendation method based on city space-time knowledge graph | |
Chen et al. | A restaurant recommendation approach with the contextual information | |
CN116976961A (en) | Address selection method, address selection device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||