CN114021011A - Self-attention mechanism-based next interest point recommendation method - Google Patents


Info

Publication number
CN114021011A
CN114021011A
Authority
CN
China
Prior art keywords
user
interest
model
data
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111299901.1A
Other languages
Chinese (zh)
Inventor
朱金侠
邢长征
孟祥福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Technical University
Original Assignee
Liaoning Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning Technical University
Priority to CN202111299901.1A
Publication of CN114021011A
Legal status: Pending

Classifications

    • G06F16/9536: Search customisation based on social or collaborative filtering
      (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F16/00 Information retrieval; Database structures therefor; File system structures therefor › G06F16/90 Details of database functions independent of the retrieved data types › G06F16/95 Retrieval from the web › G06F16/953 Querying, e.g. by the use of web search engines)
    • G06F16/9535: Search customisation based on user profiles and personalisation (same hierarchy through G06F16/953)
    • G06N3/02: Neural networks (G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N3/00 Computing arrangements based on biological models)
    • G06N3/08: Learning methods (under G06N3/02 Neural networks)

Abstract

The invention discloses a next point-of-interest recommendation method based on a self-attention mechanism. First, sequence information, spatio-temporal information and context-dependent dynamic social relations are modeled in an integrated manner; second, two parallel channels (long/short-term channels) are designed to model the long/short-term preferences of the user and the user's friends, and a self-attention mechanism is used to capture the long-distance dependency between any two historical check-ins of the user; finally, the user's preference score for each point of interest at the next moment is predicted. The invention constructs the interactions between users and points of interest as an L2L graph, which represents the geographic proximity between points of interest. The L2L graph is essentially a weighted undirected graph in which each vertex represents a point of interest, edges represent the spatial correlation between points of interest, and edge weights represent geographic distance.

Description

Self-attention mechanism-based next interest point recommendation method
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a next interest point recommendation method based on a self-attention mechanism.
Background
Next point-of-interest recommendation has become an important task for location-based social networks (LBSNs); its goal is to predict the next point of interest that may interest the user. A location-based social network is a special kind of online social network that allows users to share check-ins, comments and other information, and to interact in a virtual world with people who share their interests. Thus, in a location-based social network, a user's next decision may be influenced by friends (social relations). Past models found that users' social information contributed little to the recommendation effect, because those models ignored the behavioral similarity between users and their friends. The invention mines the check-in information of the user and the user's friends through a self-attention mechanism to capture this behavioral similarity. Second, user interests are inherently dynamic, exhibiting different access preferences in different contexts and over different periods of time. For example, a user interested in extreme sports for a period of time may check in at places such as bungee-jumping and rock-climbing sites; when interested in history and culture, the same user may instead check in at historical sites and museums. In addition, different user preferences may depend on different friends, so the social relations that affect user decisions are also dynamic. For example, when checking in at a museum, a user may trust friends who like history more; when checking in at a gym, the user may trust friends who like sports more.
Disclosure of Invention
Based on the defects of the prior art, the technical problem to be solved by the invention is to provide a next interest point recommendation method based on a self-attention mechanism that improves both recommendation accuracy and model interpretability.
To this end, a next interest point recommendation method based on a self-attention mechanism is provided, which includes the following steps:
step 1, data acquisition;
data processing: constructing the collected Euclidean-space data into non-Euclidean-space data, and constructing the interaction data between users and interest points into an L2L graph between users and interest points;
data set partitioning: randomly selecting 80% of the historical interactions from the processed data set as a training set for training the model, the remainder serving as a test set for evaluating the recommendation effect of the model; regarding each observed user–interest point interaction in the data set as a positive sample, and then executing a negative sampling strategy to pair it with negative samples from the interest points the user has not interacted with;
step 2, constructing a model:
firstly, performing integrated modeling on sequence information, spatio-temporal information and context-related dynamic social relations; secondly, two parallel channels are designed to model the long/short-term preference of the user and friends thereof; finally, capturing a long-distance dependency relationship between any two historical check-ins of the user by using a self-attention mechanism;
step 3, model training and next interest point recommendation:
The training set and the test set obtained in step 1 are used respectively to train and evaluate the model constructed in step 2; the model obtains a preference score between the user and the interest point through probability calculation, and finally decides whether to recommend the interest point to the user according to the obtained preference score.
Preferably, in step 1, the construction and calculation proceed as follows:
first downloading the Gowalla data set, then preprocessing the data, constructing the collected Euclidean-space data into non-Euclidean-space data and constructing the interaction data between users and interest points into an L2L graph between users and interest points; regarding each observed user–interest point interaction in the data set as a positive sample, and then executing a negative sampling strategy to pair it with negative samples from the interest points the user has not interacted with.
Preferably, in step 2, the calculation proceeds as follows:
S201: constructing the embedding layer: the embedding layer comprises five types of feature information of user behavior: user information, interest point information, spatial information, temporal information and position information;
S202: long/short-term channels: obtaining the learned user and interest point embedding vectors through the long-term and short-term channels from the embedding vectors input by the embedding layer;
S203: constructing the prediction layer: predicting a preference score between the user and the item.
Optionally, in step 3, the calculation is as follows:
the embedding vectors learned by the long/short-term channels are input to the final prediction layer of the model to obtain the preference score between the user and the item, and the next interest point is recommended to the user accordingly.
From the above, the self-attention-based next interest point recommendation method of the invention has at least the following beneficial effects:
(1) A self-attention mechanism is used to model the similarity between user check-ins, and sequence information, spatio-temporal information and dynamic social relations are modeled in an integrated manner regardless of the type of check-in information. In addition, the self-attention mechanism automatically analyzes the similarity of user behaviors and dynamically assigns weights according to that similarity (the higher the similarity, the higher the weight). Finally, the aggregation does not depend on the graph structure, which improves the generalization ability of the model and gives the recommendation system better recommendation effect and interpretability.
(2) The three final embedding functions are added directly to obtain the preference score between the user and the item, which reduces the time and space complexity of the model and improves its recommendation performance.
(3) The method combines embedding techniques with recommendation over social-network graph data, and can effectively alleviate the data-sparsity and cold-start problems.
(4) To reduce the training difficulty of the model, behavioral similarity is captured only between direct (first-order) friends in the LBSN through a multi-head attention mechanism, making the model easier to train and generalize.
(5) The invention improves recommendation accuracy by taking the user's dynamic preferences into account.
The foregoing is only an overview of the technical solution of the invention. To make the technical means of the invention clearer and implementable according to the specification, and to make the above and other objects, features and advantages of the invention more readily understood, a detailed description is given below in conjunction with the preferred embodiments and the accompanying drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings of the embodiments will be briefly described below.
FIG. 1 is a diagram of a model architecture of a next point of interest recommendation method based on the self-attention mechanism according to the present invention;
FIG. 2 is a diagram of the prediction layer of the model;
FIG. 3 is a graph of the effect of long/short-term preferences on model recommendation.
Detailed Description
Other aspects, features and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which form a part of this specification, and which illustrate, by way of example, the principles of the invention. In the referenced drawings, the same or similar components in different drawings are denoted by the same reference numerals.
The invention discloses a self-attention mechanism-based next interest point recommendation method, which comprises the following steps:
step 1: data acquisition, data processing and data set division.
Data acquisition: the invention uses the Gowalla data set, which includes 1947 users, 569651 points of interest, 231192 user trajectories and 10274 friendship records. First the Gowalla data set is downloaded; then the data are preprocessed, constructing the collected Euclidean-space data into non-Euclidean-space data (i.e., graph data), mainly by constructing the interaction data between users and points of interest into an L2L graph. Each observed user–point-of-interest interaction in the data set is regarded as a positive sample, and a negative sampling strategy then pairs it with negative samples from the points of interest the user has not interacted with.
Data processing: the collected Euclidean-space data are constructed into non-Euclidean-space data (i.e., graph data), mainly by constructing the interaction data between users and points of interest into an L2L graph, as sketched below.
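For illustration only, the following is a minimal sketch of how such an L2L graph might be assembled: a weighted undirected graph over points of interest, with edges weighted by the haversine distance between coordinates. The distance threshold max_km, the data layout and the quadratic pairwise loop are assumptions for illustration, not details fixed by the patent (a Gowalla-scale build would need spatial indexing).

```python
import math
import networkx as nx

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def build_l2l_graph(pois, max_km=5.0):
    """pois: dict {poi_id: (lat, lon)}. Vertices are POIs; an edge connects two
    POIs whose geographic distance is below max_km, weighted by that distance."""
    g = nx.Graph()
    g.add_nodes_from(pois)
    ids = list(pois)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = haversine_km(*pois[a], *pois[b])
            if d < max_km:
                g.add_edge(a, b, weight=d)
    return g

g = build_l2l_graph({1: (41.15, -8.61), 2: (41.16, -8.62), 3: (40.42, -3.70)})
```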
Data set partitioning: 80% of the historical interactions are randomly selected from the processed data set as the training set for training the model; the rest serves as the test set for evaluating the recommendation effect of the model. Each observed user–point-of-interest interaction in the data set is regarded as a positive sample, and a negative sampling strategy then pairs it with negative samples from the points of interest the user has not interacted with (a sketch follows).
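A minimal sketch of the 80/20 split and the negative-sampling pairing just described; the data layout (a list of (user, poi) check-in pairs) and the single-negative-per-positive choice are assumptions for illustration.

```python
import random

def split_and_sample(interactions, all_pois, train_ratio=0.8, seed=42):
    """interactions: list of observed (user, poi) pairs.
    Returns a training set, a test set, and (u, v_pos, v_neg) triples where
    each positive is paired with a POI the user never interacted with."""
    rng = random.Random(seed)
    data = interactions[:]
    rng.shuffle(data)
    cut = int(train_ratio * len(data))
    train, test = data[:cut], data[cut:]

    seen = {}  # user -> set of POIs the user interacted with
    for u, v in interactions:
        seen.setdefault(u, set()).add(v)

    triples = []
    for u, v in train:
        v_neg = rng.choice(all_pois)   # assumes each user leaves POIs unvisited
        while v_neg in seen[u]:
            v_neg = rng.choice(all_pois)
        triples.append((u, v, v_neg))
    return train, test, triples
```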
Step 2: and (5) constructing a model.
Firstly, sequence information, spatio-temporal information and context-dependent dynamic social relations are modeled in an integrated manner; secondly, two parallel channels (long/short-term channels) are designed to model the long/short-term preferences of the user and the user's friends; finally, a self-attention mechanism captures the long-distance dependency between any two historical check-ins of the user. The calculation proceeds as follows:
S201: build the embedding layer (input layer): the embedding layer comprises five types of feature information of user behavior: user information, point-of-interest information, spatial information, temporal information, and position information (within the whole user trajectory).
S202: long/short-term channels: from the embedding vectors input by the embedding layer, the learned user and point-of-interest embedding vectors are obtained through the long-term and short-term channels.
S203: build the prediction layer (output layer): a preference score between the user and the item is predicted.
And step 3: model training and next interest point recommendation:
The training set and the test set obtained in step 1 are used respectively to train and evaluate the model constructed in step 2, and the model obtains the preference score between the user and the point of interest through probability calculation. Finally, whether to recommend the point of interest to the user is decided according to the obtained preference score. The calculation proceeds as follows:
the embedding vectors learned by the long/short-term channels are input to the final prediction layer of the model to obtain the preference score between the user and the item, and the next point of interest is recommended to the user accordingly.
The specific algorithm of the model architecture diagram is as follows:
The user behavior in the invention comprises five features: user information, point-of-interest information, spatial information, temporal information, and position information (within the whole user trajectory). These five types of feature information are embedded to obtain the latent representation of each user.
(1) User behavior feature embedding
1. User information embedding: the invention encodes user and friend information into low-dimensional vectors with the node2vec embedding method; for a user u, the embedding method outputs the feature embedding vector u (see the sketch after item 5 below).
2. Point-of-interest embedding: for a point of interest in the check-in data, the point-of-interest feature embedding vector v is obtained by a direct lookup with the point-of-interest ID.
3. Place embedding: the invention describes two-dimensional geographic influence with the L2L graph (L2L represents the geographic proximity between points of interest), which is essentially a weighted undirected graph in which a vertex represents a point of interest, edges represent the spatial correlation between points of interest, and edge weights represent geographic distance. Each place l is encoded as a low-dimensional vector on the L2L graph with the node2vec embedding method, which outputs the feature embedding vector l.
4. Time embedding: since different user behaviors may involve time slices of different granularity, the invention uses the time encoding method proposed in the literature. First, time is sliced at different lengths; second, the time slices are classified according to their lengths; finally, the time vector is looked up by type to obtain the feature embedding vector t of time point t.
5. Position embedding: to better model sequence effects, the model embeds positions with the Timing Signal method. Compared with traditional position-encoding methods, the Timing Signal method introduces no additional parameters. The components of the feature embedding vector p of position p can be expressed as:
p(2i) = sin(p / 10000^(2i/d))    (1)
p(2i+1) = cos(p / 10000^(2i/d))    (2)
where 2i and 2i+1 index the even and odd components, respectively, and d is the dimension of the latent vector.
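To make the embedding steps above concrete, the sketch below illustrates items 1 and 3 (node2vec over a toy graph, using the third-party node2vec package, whose API and parameters are assumptions here) and item 5 (the parameter-free Timing Signal embedding of equations (1)–(2)):

```python
import networkx as nx
import numpy as np
from node2vec import Node2Vec  # third-party package: pip install node2vec

# Items 1 and 3: node2vec embeddings over a (toy) friendship or L2L graph.
g = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])
n2v = Node2Vec(g, dimensions=64, walk_length=20, num_walks=50, workers=1)
w2v = n2v.fit(window=5, min_count=1)     # skip-gram over the sampled walks
u_vec = w2v.wv[str(0)]                   # 64-d embedding of node 0

# Item 5: Timing Signal position embedding, equations (1)-(2); parameter-free.
def timing_signal(max_len, d):
    p = np.arange(max_len)[:, None]       # positions 0..max_len-1
    i = np.arange(0, d, 2)[None, :]       # even component indices 2i
    angle = p / np.power(10000.0, i / d)  # exponent i/d equals 2i/d here
    emb = np.zeros((max_len, d))
    emb[:, 0::2] = np.sin(angle)          # p(2i), eq. (1)
    emb[:, 1::2] = np.cos(angle)          # p(2i+1), eq. (2)
    return emb

P = timing_signal(max_len=50, d=128)      # one row per position p
```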
(2) Concatenation of the user feature embedding vectors
After the embedding vectors of the user behavior features are obtained, the model concatenates them to obtain a hidden representation of each user check-in. When modeling the short-term preferences of a user and the user's friends, the sequence information of consecutive check-ins plays an important role in predicting the user's next decision, so position embedding must be introduced into the short-term channel. Since the long-term channel considers the entire check-in history of each user and the user's friends, it requires no position embedding.
For user u, the feature embedding vectors u, v, l, t and p are input to a d-dimensional fully-connected layer. The latent representations c^L and c^S of each check-in output in the long-term and short-term channels are respectively:
c^L = W_u u + W_v v + W_l l + W_t t    (3)
c^S = W_u u + W_v v + W_l l + W_t t + W_p p    (4)
where W_u, W_v, W_l, W_t, W_p ∈ R^{d×d} are transition matrices.
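A sketch of the check-in representation as reconstructed in equations (3)–(4): a sum of linearly transformed feature embeddings, with position added only in the short-term channel. The additive fusion form and module names are assumptions consistent with the transition matrices defined above, not a definitive implementation.

```python
import torch
import torch.nn as nn

class CheckinFusion(nn.Module):
    """Fuses user/POI/place/time (+ position) embeddings into one d-dim vector."""
    def __init__(self, d):
        super().__init__()
        self.W = nn.ModuleDict({k: nn.Linear(d, d, bias=False)
                                for k in ("u", "v", "l", "t", "p")})

    def forward(self, u, v, l, t, p=None):
        h = self.W["u"](u) + self.W["v"](v) + self.W["l"](l) + self.W["t"](t)
        if p is not None:                     # short-term channel only, eq. (4)
            h = h + self.W["p"](p)
        return h

d = 64
fuse = CheckinFusion(d)
u = v = l = t = p = torch.randn(8, d)         # a batch of 8 check-ins
c_long, c_short = fuse(u, v, l, t), fuse(u, v, l, t, p)
```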
(3) Self-attention layer
The invention uses a self-attention mechanism to model the similarity between user check-ins, and models sequence information, spatio-temporal information and dynamic social relations in an integrated manner regardless of the type of check-in information. In addition, the self-attention mechanism automatically analyzes the similarity of user behaviors and dynamically assigns weights according to that similarity (the higher the similarity, the higher the weight). To reduce the training difficulty of the model, behavioral similarity is captured only between direct (first-order) friends in the LBSN through a multi-head attention mechanism.
In the short-term channel, the current trajectory matrix S_u of user u and the current trajectory matrix S_{N(u)} of user u and the friends N(u) are given, where S_u, S_{N(u)} ∈ R^{M×d} are matrices composed of the hidden feature vectors generated by the embedding layer, d is the dimension of the hidden vectors, and M is the length of the user trajectory. A new check-in representation is generated within the user trajectory through the multi-head self-attention mechanism, calculated as:
SA(S) = concat(head_1, …, head_H) W_{s,k}    (5)
head_h = A^(h) S^(h)    (6)
where concat(·) is the concatenation operation, H is the number of heads of the multi-head self-attention mechanism, W_{s,k} ∈ R^{d×d} is a parameter matrix, S^(h) is the hidden feature matrix of the h-th head partitioned from S, and A^(h) is the attention weight matrix used to measure how strongly the short-term preferences of the user's friends influence the user's decision.
The attention weight matrix A^(h) is formed from the latent feature vectors of each pair of check-ins (c_i, c_j) input to the k-th nonlinear layer in the short-term channel; the attention weight a_{ij} is calculated as follows:
a_{ij} = exp(f(c_i, c_j)) / Σ_{j'=1}^{M} exp(f(c_i, c_{j'}))    (7)
where f(·, ·) is the attention function.
In the scaled dot-product attention layer, the most commonly used attention functions are additive attention and multiplicative (dot-product) attention. The invention uses multiplicative attention; in practice the two attention functions behave similarly when d is small, while additive attention outperforms dot-product attention when d is large. To counteract this effect, the invention scales the dot product by 1/√d, calculated as follows:
f(c_i, c_j) = c_i^T c_j / √d    (8)
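A compact sketch of the multi-head scaled dot-product self-attention of equations (5)–(8), written with plain tensor operations so the per-head partitioning and the scaling are visible. The per-head scaling by the head dimension and the random output projection are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multi_head_self_attention(X, W_out, H):
    """X: (M, d) trajectory matrix; W_out: (d, d) output projection W_{s,k};
    H: number of heads (d must be divisible by H). Implements eqs. (5)-(8)."""
    M, d = X.shape
    dh = d // H
    heads = []
    for h in range(H):
        Xh = X[:, h * dh:(h + 1) * dh]          # h-th head partition of X
        scores = Xh @ Xh.T / dh ** 0.5          # scaled dot product, eq. (8)
        A = F.softmax(scores, dim=-1)           # attention weights, eq. (7)
        heads.append(A @ Xh)                    # per-head output, eq. (6)
    return torch.cat(heads, dim=-1) @ W_out     # concat(...) W_{s,k}, eq. (5)

S = torch.randn(10, 64)                         # M = 10 check-ins, d = 64
S_new = multi_head_self_attention(S, torch.randn(64, 64) * 0.1, H=4)
```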
given user u and his friends N (u) at time tNAll previous historical access information
Figure BDA00033379063900000813
Figure BDA00033379063900000814
Calculating each pair of check-in the long-term channel by a multi-head attention mechanism
Figure BDA00033379063900000815
Attention weight of
Figure BDA00033379063900000816
Figure BDA00033379063900000817
Figure BDA0003337906390000091
Wherein, in the kth self-attention layer of the long-term preference model, a new check-in is generated
Figure BDA0003337906390000092
Vector representation of
Figure BDA0003337906390000093
The calculation method is as follows:
Figure BDA0003337906390000094
Figure BDA0003337906390000095
wherein the content of the first and second substances,
Figure BDA0003337906390000096
is the output of the (k-1) th non-linear layer,
Figure BDA0003337906390000097
is from
Figure BDA0003337906390000098
Implicit feature vector of h-th head of middle division, Wl,k∈Rd×dIs a parameter matrix.
To alleviate the vanishing-gradient problem caused by increasing depth in deep neural networks, the model adds a residual connection in the self-attention layers of the long/short-term channels and applies normalization in the residual connection layer to stabilize the learning of the K-layer neural network. Given the inputs S^(k-1) and L^(k-1) in the short/long-term channels, the k-th outputs of the channels are respectively:
S^(k) = layer_norm(S^(k-1) + SA(S^(k-1)))    (11)
L^(k) = layer_norm(L^(k-1) + SA(L^(k-1)))    (12)
where layer_norm(·) is the normalization function and SA(·) is the multi-head self-attention of equations (5)–(6) and (9)–(10).
The model processes the output of the self-attention layer with a fully-connected feed-forward network, which improves the representation ability of the model. In the long/short-term channels, given the outputs S^(k) and L^(k) generated by the k-th self-attention layer, the outputs of the position-wise feed-forward layers are computed separately with the ReLU activation function:
FFN(S^(k)) = ReLU(S^(k) W_1 + b_1) W_2 + b_2    (13)
FFN(L^(k)) = ReLU(L^(k) W_1 + b_1) W_2 + b_2    (14)
where W_1 and W_2 are trainable parameter matrices and b_1, b_2 are bias vectors.
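A sketch of one layer of the long/short-term channels combining equations (11)–(14): self-attention, a residual connection with layer normalization, then the position-wise feed-forward network. The use of nn.MultiheadAttention as a stand-in for the per-head partitioning above and the placement of a second layer norm after the FFN are standard-transformer assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class ChannelLayer(nn.Module):
    """One k-th layer: self-attention + residual/layer-norm (eqs. 11-12),
    then a position-wise ReLU feed-forward network (eqs. 13-14)."""
    def __init__(self, d, heads, hidden):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d)
        self.ffn = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d))
        self.norm2 = nn.LayerNorm(d)

    def forward(self, x):                     # x: (batch, M, d)
        a, _ = self.attn(x, x, x)             # self-attention over the trajectory
        x = self.norm1(x + a)                 # residual + layer_norm, eqs. 11-12
        return self.norm2(x + self.ffn(x))    # position-wise FFN, eqs. 13-14

layer = ChannelLayer(d=64, heads=4, hidden=256)
out = layer(torch.randn(2, 10, 64))           # 2 trajectories of length M = 10
```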
Friends with different degrees of affinity influence the user's next decision differently, so a vanilla attention layer is designed to dynamically capture the more influential user preferences and social relations. In the short-term channel, given the current check-in data of user u and the friends N(u), the attention weight of the short-term social relations is:
β_{u'} = exp(f(e^S_u, e^S_{u'})) / Σ_{u'' ∈ N(u) ∪ {u}} exp(f(e^S_u, e^S_{u''}))    (15)
where u' ∈ N(u) ∪ {u} and e^S_{u'} denotes the latent representation of user u' checking in at point of interest v at the next time t_{N+1}.
The latent representation s_u of user u checking in at point of interest v at the next time t_{N+1} in the short-term channel is then computed as:
s_u = Σ_{u' ∈ N(u) ∪ {u}} β_{u'} e^S_{u'}    (16)
In the long-term channel, given the historical check-in data of user u and the friends N(u) before time t_N, the latent representation l_u of user u checking in at point of interest v at the next time t_{N+1} is obtained analogously:
β_{u'} = exp(f(e^L_u, e^L_{u'})) / Σ_{u'' ∈ N(u) ∪ {u}} exp(f(e^L_u, e^L_{u''}))    (17)
l_u = Σ_{u' ∈ N(u) ∪ {u}} β_{u'} e^L_{u'}    (18)
where t_i ∈ [t_1, t_N].
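A sketch of the vanilla attention aggregation over the user and the user's friends as reconstructed in equations (15)–(18): a softmax over compatibility scores weights each representation. The scaled dot-product scoring function is an assumption; the patent's exact scoring function is not recoverable from the original.

```python
import torch
import torch.nn.functional as F

def social_aggregate(e_self, e_friends):
    """e_self: (d,) latent vector of the user; e_friends: (F, d) latent vectors
    of the friends N(u). Returns the weighted representation over N(u) ∪ {u}."""
    E = torch.cat([e_self[None, :], e_friends], dim=0)   # (F+1, d)
    scores = E @ e_self / e_self.shape[0] ** 0.5         # assumed dot-product score
    beta = F.softmax(scores, dim=0)                      # eq. (15)/(17) weights
    return beta @ E                                      # eq. (16)/(18) weighted sum

s_u = social_aggregate(torch.randn(64), torch.randn(5, 64))  # short-term example
l_u = social_aggregate(torch.randn(64), torch.randn(5, 64))  # long-term example
```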
In the model prediction layer, the user's preference score is defined by three embedding functions, namely s_u, l_u and u. The probability that user u selects point of interest v at time t_{N+1} is calculated as:
ŷ_{u,v} = (s_u + l_u + u)^T v    (19)
where u^T v represents the inherent interest of user u in point of interest v.
The invention uses the Bayesian Personalized Ranking (BPR) loss function to learn the model parameters, taking into account the pairwise preference between observed points of interest v and unobserved points of interest v' during model optimization. The final objective function is:
L = Σ_{(u,v,v')} −ln σ(ŷ_{u,v} − ŷ_{u,v'}) + λ‖Θ‖²    (20)
where σ(·) is the sigmoid function, λ is the regularization coefficient and Θ represents the parameter set.
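Under the additive score of equation (19), a sketch of the BPR objective of equation (20) for a batch of (v, v') pairs; variable names and the toy inputs are illustrative.

```python
import torch
import torch.nn.functional as F

def preference_score(s_u, l_u, u_vec, v_vec):
    """Eq. (19): the three embedding functions are added, then dotted with v."""
    return (s_u + l_u + u_vec) @ v_vec

def bpr_loss(pos_scores, neg_scores, params, lam=1e-4):
    """Eq. (20): pairwise BPR term plus L2 regularisation over the parameters."""
    pairwise = -F.logsigmoid(pos_scores - neg_scores).sum()
    reg = sum(p.pow(2).sum() for p in params)
    return pairwise + lam * reg

s_u, l_u, u_vec = (torch.randn(64) for _ in range(3))
v_pos, v_neg = torch.randn(64), torch.randn(64)
loss = bpr_loss(preference_score(s_u, l_u, u_vec, v_pos)[None],
                preference_score(s_u, l_u, u_vec, v_neg)[None],
                params=[u_vec])
```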
During model training, a Dropout strategy effectively prevents the model from overfitting, weakens fixed dependencies between neural-network nodes, and reduces the generalization error of the deep neural network. The invention adopts a message-dropout strategy: during training, nodes of the output layer are randomly dropped with probability p so that they are not aggregated into the model prediction layer.
The training instance of user u at time t_N can be expressed as (S_u, L_u, {(v, v')}), where S_u is the current trajectory, L_u is the historical check-in data of the user and the user's friends, and {(v, v')} are the paired points of interest at the next time. The model optimization process is shown as Algorithm 1, which is given as an image in the original document; an illustrative sketch follows.
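Since Algorithm 1 survives only as an image, the following is a hedged sketch of a typical optimization loop consistent with the description (mini-batches of positive/negative pairs and the BPR loss of equation (20)); the model interface, the batch layout and all names are assumptions, not the patent's listing.

```python
import torch
import torch.nn.functional as F

def train(model, optimizer, batches, epochs=30, drop_p=0.1):
    """Illustrative loop: for each assumed batch (S_u, L_u, v_pos, v_neg),
    score both POIs with the model and take a BPR gradient step; L2
    regularisation is assumed to be handled via the optimizer's weight decay."""
    for epoch in range(epochs):
        total = 0.0
        for S_u, L_u, v_pos, v_neg in batches:
            optimizer.zero_grad()
            y_pos = model(S_u, L_u, v_pos, drop_p)   # eq. (19) score; message
            y_neg = model(S_u, L_u, v_neg, drop_p)   # dropout applied inside
            loss = -F.logsigmoid(y_pos - y_neg).sum()  # eq. (20) pairwise term
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch}: BPR loss {total:.4f}")
```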
the experimental results show that
The data set adopted by the invention is a Gowalla data set, for the data set, the invention randomly selects 80% of interaction history of each user to form a training set, and the rest 20% is used as a test set. Table 1 gives the statistical information of the data set.
Table 1 experimental data information
Figure BDA0003337906390000121
1. Comparison algorithms
The next interest point recommendation model based on the self-attention mechanism proposed by the invention is compared with the following four state-of-the-art recommendation models:
(1) PRME-G: this model is a metric-embedding method. Each interest point is mapped into a low-dimensional Euclidean space through metric embedding, a Markov-chain model is used to predict transitions between interest points effectively, and the Euclidean distance between two interest points measures their sequential relation.
(2) GRU4Rec+ST: this model is a session-based recommendation model that models whole sessions and processes each event (e.g., in temporal order) independently. It embeds the spatio-temporal information of user check-ins into a dense vector space using the true sequence information, thereby addressing the difficulty of modeling sparse sequence data.
(3) ATST-LSTM: this model is an attention-based spatio-temporal long short-term memory model that selectively focuses on the relevant historical check-in records in a check-in sequence through an attention mechanism, modeling temporal and geographic influence simultaneously.
(4) DAN-SNR: this model is a deep attention network that uses a self-attention mechanism to model the context dependency of social influence and the behavioral correlation between a user and the user's friends, capturing social influence and sequential influence simultaneously.
2. Evaluation metrics
For each user in the test set, all items the user has not interacted with are regarded as negative samples, and the items the user has interacted with are regarded as positive samples. Both the proposed model and the comparison models output the user's preference scores over all items. To evaluate the effectiveness of top-K recommendation and preference ranking, the experiments use top-K recall (Recall@K) and normalized discounted cumulative gain (NDCG@K) as evaluation metrics.
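For reference, a sketch of the Recall@K and NDCG@K computations used as evaluation metrics, in the single-relevant-item form typical of next-POI prediction; the toy ranking is illustrative.

```python
import math

def recall_at_k(ranked_items, relevant, k):
    """Fraction of the relevant items that appear in the top-k list."""
    hits = len(set(ranked_items[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """DCG over the top-k list, normalised by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal

ranked = [3, 7, 1, 9, 4]          # items sorted by predicted preference score
print(recall_at_k(ranked, [7], k=3), ndcg_at_k(ranked, [7], k=3))
```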
Analysis of experimental results
1. Overall comparison
The invention runs experiments with the proposed model and the comparison models on the same data set; the experimental results are shown in Table 2.
TABLE 2 Overall comparison
[The table values are provided as an image in the original document.]
In the overall comparison of model performance, the proposed model shows a clear improvement in Recall@20 and NDCG@20 over the other models on the two different data sets, indicating good generalization ability.
2. Influence of dynamic social relationships
The IDS-NPR model obtains the end-user preference by combining the user's own preference with the context-dependent dynamic social-relation preference. To explore the influence of these two preferences on the model, the invention designs two variants: IDS-NPR-self (user preference only) and IDS-NPR-dynamic social (context-dependent dynamic social-relation preference only). In addition, to compare dynamic social relations with static social relations (considering only the long-term preferences of the user's friends), an IDS-NPR-social variant is designed. The experimental results are shown in Table 3.
TABLE 3 Comparison of the recommendation effectiveness of the model and its variants
[The table values are provided as an image in the original document.]
As can be seen from Table 3, IDS-NPR-self performs better on the data set than the other variants, indicating that the user's personal preference has the greatest influence on the user's next decision. Second, the IDS-NPR-dynamic social variant outperforms the IDS-NPR-social variant, verifying the effectiveness of dynamic social influence. Comparing Table 2 with Table 3 shows that modeling user preferences and dynamic social relations simultaneously improves the recommendation effect of the model.
3. Effect of long/short-term preferences
The invention models the long/short-term preferences of the user and the user's friends with the long/short-term channels, and designs four variants (SNI-SNRself-short, SNI-SNRself-long, SNI-SNRsocial-short and SNI-SNRsocial-long) to explore the influence of these four preferences on model performance; the experimental results are shown in FIG. 3.
FIG. 3 shows that the short-term preferences of the user and the user's friends have a greater impact on the user's next decision, because short-term preferences better capture context-dependent dynamic preferences, while long-term preferences mainly capture the long-distance dependencies of historical check-in trajectories but lack the sequence effects within them.
In summary, the invention first models sequence information, spatio-temporal information and context-dependent dynamic social relations in an integrated manner; second, it designs two parallel channels (long/short-term channels) to model the long/short-term preferences of the user and the user's friends, and uses a self-attention mechanism to capture the long-distance dependency between any two historical check-ins of the user; finally, it predicts the user's preference score for each point of interest at the next moment. The interactions between users and points of interest are constructed as an L2L graph representing the geographic proximity between points of interest: a weighted undirected graph in which each vertex represents a point of interest, edges represent the spatial correlation between points of interest, and edge weights represent geographic distance.
While the foregoing is directed to the preferred embodiment of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (4)

1. A next interest point recommendation method based on a self-attention mechanism is characterized by comprising the following steps:
step 1, data acquisition;
data processing: constructing the collected Euclidean-space data into non-Euclidean-space data, and constructing the interaction data between users and interest points into an L2L graph between users and interest points;
data set partitioning: randomly selecting 80% of the historical interactions from the processed data set as a training set for training the model, the remainder serving as a test set for evaluating the recommendation effect of the model; regarding each observed user–interest point interaction in the data set as a positive sample, and then executing a negative sampling strategy to pair it with negative samples from the interest points the user has not interacted with;
step 2, constructing a model:
firstly, performing integrated modeling of sequence information, spatio-temporal information and context-dependent dynamic social relations; secondly, designing two parallel channels to model the long/short-term preferences of the user and the user's friends; finally, capturing the long-distance dependency between any two historical check-ins of the user with a self-attention mechanism;
step 3, model training and next interest point recommendation:
using the training set and the test set obtained in step 1 respectively to train and evaluate the model constructed in step 2, the model obtaining a preference score between the user and the interest point through probability calculation, and finally deciding whether to recommend the interest point to the user according to the obtained preference score.
2. The self-attention mechanism-based next point of interest recommendation method of claim 1, wherein step 1 is constructed and calculated as follows:
first downloading the Gowalla data set, then preprocessing the data, constructing the collected Euclidean-space data into non-Euclidean-space data and constructing the interaction data between users and interest points into an L2L graph between users and interest points; regarding each observed user–interest point interaction in the data set as a positive sample, and then executing a negative sampling strategy to pair it with negative samples from the interest points the user has not interacted with.
3. The method for recommending a next point of interest based on a self-attention mechanism as claimed in claim 1, wherein in step 2 the calculation proceeds as follows:
S201: constructing the embedding layer: the embedding layer comprises five types of feature information of user behavior: user information, interest point information, spatial information, temporal information and position information;
S202: long/short-term channels: obtaining the learned user and interest point embedding vectors through the long-term and short-term channels from the embedding vectors input by the embedding layer;
S203: constructing the prediction layer: predicting a preference score between the user and the item.
4. The method for recommending a next point of interest based on a self-attention mechanism as claimed in claim 1, wherein in step 3 the calculation is as follows:
the embedding vectors learned by the long/short-term channels are input to the final prediction layer of the model to obtain the preference score between the user and the item, and the next interest point is recommended to the user accordingly.
CN202111299901.1A 2021-11-04 2021-11-04 Self-attention mechanism-based next interest point recommendation method Pending CN114021011A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111299901.1A CN114021011A (en) 2021-11-04 2021-11-04 Self-attention mechanism-based next interest point recommendation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111299901.1A CN114021011A (en) 2021-11-04 2021-11-04 Self-attention mechanism-based next interest point recommendation method

Publications (1)

Publication Number Publication Date
CN114021011A (en) 2022-02-08

Family

ID=80060694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111299901.1A Pending CN114021011A (en) 2021-11-04 2021-11-04 Self-attention mechanism-based next interest point recommendation method

Country Status (1)

Country Link
CN (1) CN114021011A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114637911A (en) * 2022-02-21 2022-06-17 桂林电子科技大学 Next interest point recommendation method of attention fusion perception network
CN114637911B (en) * 2022-02-21 2024-04-09 桂林电子科技大学 Method for recommending next interest point of attention fusion perception network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination