CN112258262B - Session recommendation method based on convolution self-attention network - Google Patents


Info

Publication number
CN112258262B
CN112258262B (granted from application CN202010969069.0A)
Authority
CN
China
Prior art keywords
session
recommendation
item
self-attention network
Prior art date
Legal status
Active
Application number
CN202010969069.0A
Other languages
Chinese (zh)
Other versions
CN112258262A (application publication)
Inventor
张寅
汪千缘
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010969069.0A
Publication of CN112258262A (application)
Application granted
Publication of CN112258262B (grant)
Legal status: Active


Classifications

    • G06Q30/0631 Electronic shopping [e-shopping]: item recommendations
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/08 Neural networks: learning methods
    • G06Q30/0202 Marketing: market predictions or forecasting for commercial activities


Abstract

The invention discloses a session recommendation method based on a convolutional self-attention network. The method comprises the following steps: 1) each item in a session is first represented as a low-dimensional vector, formed by adding an item embedding and a position embedding; 2) sequence modeling and intention modeling are performed on these vectors, where sequence modeling captures the sequential information of the session and intention modeling captures its key intention information; 3) based on the concatenated sequence information and key intention information, the model predicts whether the user will next click a repeated item or a non-repeated item. Compared with the prior art, the invention first captures the interdependence among different segments in a session, obtaining segment-sensitive item representations. It then uses a bilinear decoder to reduce the number of model parameters and improve performance and robustness. Finally, it improves the attention layer with a Gaussian bias, computing Gaussian weight factors to improve the repeat recommendation decoder.

Description

Session recommendation method based on convolution self-attention network
Technical Field
The invention relates to the application of neural network methods to session recommendation, and in particular to a method that captures local segment features of a session with convolution operations and enriches weight-factor information with a Gaussian bias.
Background
In the era of big data, "information overload" is a common problem, and obtaining valuable information from complex data is a key issue for the development of big data technology. Recommender systems (RS) are an effective way to alleviate information overload: a recommender system models consumers and their interaction information using the historical interactions between consumers and a website, mines consumer interests, filters and ranks the massive set of choices, and finally makes personalized recommendations.
Conventional personalized recommender systems usually need user information to make tailored recommendations. Many e-commerce recommenders (especially those of small retailers) and most news and media websites, however, do not track the identity of users over long periods. Although browser caches can provide some information to assist user identification and profiling, these techniques are often unreliable and may raise privacy concerns. Session-based recommendation instead predicts the user's next behavior from a sequence of anonymous behaviors over a period of time (e.g., clicks, purchases, favorites, add-to-cart). Such an anonymous behavior sequence is referred to herein as a "session", and each behavior within a session as an "item".
In recent years, deep learning techniques such as recurrent neural networks and self-attention networks have been successfully applied to session recommendation. Compared with a recurrent neural network (RNN), a self-attention network (SAN) has clear advantages in modeling long-term dependencies and avoiding information forgetting, but existing models still have three problems:
1) Local dependencies are ignored. Local correlation refers to the interdependence between different sequence segments within a session. A sequence segment is a more abstract feature unit than a single item. Capturing local correlation when modeling items yields better item representations and improves prediction accuracy.
2) The conventional fully-connected decoder has a huge number of parameters, long training time, and poor robustness.
3) The influence of the click order of items in the session on repeat recommendation is ignored. In repeat consumption, the item the user clicks next is more likely to be a recently clicked item.
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides a session recommendation method based on a convolutional self-attention network. The invention captures local correlation within a session with an encoder based on a convolutional self-attention network, obtaining segment-sensitive item representations and thereby improving session modeling. It reduces the number of model parameters with a bilinear decoder, improving performance and robustness. It improves the repeat recommendation decoder by modeling, with Gaussian weights, the distance between each item and the last item in the session.
The technical scheme adopted by the invention is as follows:
a session recommendation method based on a convolution self-attention network comprises the following steps:
S1: given a session as input, obtain a low-dimensional vector for each item in the session, the vector being formed by adding the item embedding and the position embedding of the item in the session;
S2: on the basis of the low-dimensional vectors obtained in S1, model the sequence information of the session with a sequence encoder based on a convolutional self-attention network, model the key intention information of the session with an intention encoder based on a convolutional self-attention network and a Gaussian attention mechanism, and compute the Gaussian weights;
S3: concatenate the sequence information and the key intention information obtained in S2 into a hidden representation of the session, and feed it into a repeat-explore selector to predict the probability that the user next selects a repeated or a non-repeated item; then compute the conditional probability of each repeated item in the repeat recommendation decoder and of each non-repeated item in the explore recommendation decoder, and add the marginal probabilities output by the two decoders to obtain the model's predicted probability over all candidate items.
Preferably, the sequence encoder based on the convolutional self-attention network models the sequence information of the session as follows:
S211: capture the features of the session segments around each item with a convolution operation, and let items interact through these features when modeling the item representations, obtaining segment-sensitive item representations;
S212: based on the item representations obtained in S211, model the sequence information of the session using the interdependencies between different items captured by the self-attention network.
Further, the self-attention network in S212 is a masked multi-head self-attention network.
Further, the intention encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights as follows:
S221: based on the item representations obtained in S211, capture the interdependencies between different items using the convolutional self-attention network;
S222: based on the item representations obtained in S221, compute the weight of each item in the session with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then compute the Gaussian-shifted weight of each item, taking the last item in the session as the expectation center of the Gaussian distribution.
Further, the repeat-explore selector predicts the probability that the user next selects a repeated or non-repeated item as follows:
the sequence information and the key intention information are concatenated, fed into a linear layer for mapping, and normalized by a softmax layer to obtain the repeat and explore recommendation probabilities, which decide whether clicked or unclicked items are recommended to the user.
Further, the conditional probability of each repeated item is calculated in the repeat recommendation decoder as follows:
with the Gaussian weights obtained in S222 as input, the conditional probability that the user next clicks each repeated item is obtained by aggregating the weights.
Further, the conditional probability of each non-repeated item is calculated in the explore recommendation decoder as follows:
the sequence information and the key intention information are concatenated into the hidden representation of the session, which is mapped onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalized by a softmax layer to obtain the conditional probability that the user next clicks each non-repeated item.
Compared with the prior art, the invention first captures the interdependence among different segments in a session, obtaining segment-sensitive item representations. It then uses a bilinear decoder to reduce the number of model parameters and improve performance and robustness. Finally, it improves the attention layer with a Gaussian bias: the computed Gaussian weight factors encode the positional distance between each item and the last item in the session, improving the repeat recommendation decoder.
Drawings
FIG. 1 is a flow chart of a session recommendation method based on a convolutional self-attention network;
fig. 2 is the overall framework of the invention.
Fig. 3 is a block diagram of a multi-headed convolution self-attention network.
Detailed Description
The invention is further illustrated and described below with reference to the drawings and detailed description.
As shown in fig. 1, the present invention provides a session recommendation method based on a convolutional self-attention network, which comprises the following steps:
s1: given a session as input, a low-dimensional vector is obtained for each item within the session, the low-dimensional vector being formed by adding the item embedding and the item's location embedding in the session.
S2: based on the low-dimensional vector obtained in S1, modeling the sequence information of the conversation using a sequence encoder based on a convolution self-attention network, modeling the key intention information of the conversation using an intention encoder based on a convolution self-attention network and a gaussian attention mechanism, and calculating a gaussian weight.
The method for modeling the sequence information of the session by the sequence encoder based on the convolution self-attention network comprises the following steps:
S211: A convolution operation captures the features of the session segments around each item, and items interact through these features when the item representations are modeled, yielding segment-sensitive item representations.
S212: Based on the item representations obtained in S211, the sequence information of the session is modeled using the interdependencies between different items captured by the self-attention network; the self-attention network is preferably a masked multi-head self-attention network.
The intention encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights as follows:
S221: based on the item representations obtained in S211, the interdependencies between different items are captured using the convolutional self-attention network;
S222: based on the item representations obtained in S221, the weight of each item in the session is computed with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then the Gaussian-shifted weight of each item is computed, taking the last item in the session as the expectation center of the Gaussian distribution.
S3: splicing the sequence information and the key intention information obtained in the S2 to obtain a session hidden layer representation, and inputting the session hidden layer representation into a repetition-exploration selector to predict the probability of the user to select repeated or unrepeated articles next; then, the conditional probability of each repeated object is calculated in the repeated recommendation decoder, the conditional probability of each non-repeated object is calculated in the exploration recommendation decoder, and the edge probabilities output by the two decoders are added to obtain the prediction probability of the model on all possible objects.
The specific method for predicting the probability of the user selecting the repeated or non-repeated articles next in the repeated-explored selector is as follows:
and splicing the sequence information and the key intention information, inputting the sequence information and the key intention information into a linear network layer for mapping transformation, and normalizing the sequence information and the key intention information through a softmax layer to obtain repeated recommendation probability and exploration recommendation probability, wherein the repeated recommendation probability and the exploration recommendation probability are used for judging whether clicked articles or non-clicked articles are recommended to a user.
The conditional probability of each repeated item is calculated in the repeat recommendation decoder as follows:
With the Gaussian weights obtained in S222 as input, the conditional probability that the user next clicks each repeated item is obtained by aggregating the weights.
The conditional probability of each non-repeated item is calculated in the explore recommendation decoder as follows:
The sequence information and the key intention information are concatenated into the hidden representation of the session, which is mapped onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalized by a softmax layer to obtain the conditional probability that the user next clicks each non-repeated item.
The above method is applied in a specific embodiment below to further illustrate its implementation.
Examples
The overall framework of the method in this embodiment is shown in fig. 2. To facilitate understanding and to unify notation, this section first gives a formal description of the terms used below. The relevant mathematical symbols and their meanings are shown in table 1.
TABLE 1 Session recommendation related math symbols and meanings
The session recommendation method based on the convolution self-attention network specifically comprises the following steps:
step 1, obtaining a vector representation of each article
1.1 For a given input session, the input sequence of items [ x ] is entered using an item embedding matrix emb 0 ,x 1 ,…,x t-1 ,x t ]The index is mapped into a real value vector sequence of a low-dimensional space, and the embedded representation of the object is obtained.
1.2) To supplement the positional information of the items within the session, a position code is additionally added, computed with trigonometric functions:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
where pos is the position of the item in the session, d_model is the dimension of the item embedding (and of the position code), i indexes the dimension, and 2i and 2i+1 distinguish the even and odd dimensions.
1.3) Adding the item embedding to the position code gives the final item vector representation [x_0, x_1, …, x_{t-1}, x_t].
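As an illustration, the item-vector construction of steps 1.1 to 1.3 can be sketched in NumPy. The embedding matrix, dimensions and session below are hypothetical; only the trigonometric position code follows the formulas above:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Trigonometric position code: even dimensions use sin, odd dimensions use cos."""
    pos = np.arange(seq_len)[:, None]          # item position in the session
    i = np.arange(d_model)[None, :]            # dimension index
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# Item vector = item embedding + position code (shapes illustrative only).
d_model, n_items, seq = 8, 50, [3, 7, 7, 12]
emb = np.random.randn(n_items, d_model) * 0.1  # hypothetical embedding matrix
x = emb[seq] + positional_encoding(len(seq), d_model)
```

The addition works because the embedding and the position code share the dimension d_model.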
Step 2, modeling sequence information, key intention information and calculating Gaussian weights
2.1) With the result [x_0, x_1, …, x_{t-1}, x_t] of step 1 as input, sequence information is modeled using a sequence encoder based on a convolutional self-attention network (Convolutional Self-Attention Network, ConvSAN for short). Local features around each item are captured to obtain segment-sensitive item representations, and the hidden representations [h_0, h_1, …, h_t] are output; the last hidden representation h_t contains the sequence information of the session.
ConvSAN contains two sublayers: a multi-headed convolution self-attention network layer and a feedforward neural network layer. The input and output of each sub-layer is subjected to residual connection and layer regularization. Residual connection helps return gradients, and layer regularization can accelerate model convergence. The calculation formula is as follows:
SubLayerOutput=LayerNorm(x+SubLayer(x))
A multi-head convolutional self-attention network is first introduced; its overall framework is shown in fig. 3. Q, K and V are the Query, Key and Value vectors of the network. To capture the features of the segments around each item, both Q and K are obtained by convolution with a kernel of size k. When modeling item i, the features of the length-k sequence segment ending at item i are extracted by convolution (only items to the left of item i are included, preventing future-information leakage), and any two items interact through these features. The convolved Q, K and V vectors of item i are:
Q_i^conv = Conv([x_{i-k+1}, …, x_i]; W^Q, B^Q)
K_i^conv = Conv([x_{i-k+1}, …, x_i]; W^K, B^K)
V_i = x_i
where W^Q and B^Q are the weight matrix and bias used to convolve Q, and likewise W^K and B^K for K (the W and B appearing below are all trainable parameters and are not described again). The self-attention operation is then performed:
Attention(Q^conv, K^conv, V) = softmax(Q^conv (K^conv)^T / sqrt(d_k)) V
where 1/sqrt(d_k) is a scaling factor that prevents the product Q^conv (K^conv)^T from becoming too large and entering the saturation region of the softmax function, and d_k is the dimension of K^conv.
A multi-head mechanism maps Q^conv, K^conv and V into several subspaces of the same dimension and concatenates the attention results of the different subspaces, which helps the network capture richer information:
MultiHead(Q^conv, K^conv, V) = concat(h_0, h_1, …, h_n)
Wherein the attention in the ith subspace is calculated as follows:
h_i = Attention(Q^conv W_i^Q, K^conv W_i^K, V W_i^V)
The feed-forward neural network layer is described next. The output of the convolutional self-attention layer, after layer regularization and residual connection, is the input of the feed-forward layer, which applies two linear transformations and a ReLU activation:
FFN(x) = ReLU(xW_1 + b_1)W_2 + b_2
Note that in the session recommendation scenario, when modeling x_t only x_0, x_1, …, x_t are known, not x_{t+1}. Therefore, to prevent future-information leakage, a mask is additionally added to the self-attention mechanism, masking the information after x_t; this yields masked multi-head attention.
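As a rough NumPy sketch of the masked convolutional self-attention step (not the patent's implementation: the single head, the window extraction by zero-padding the past, and the weight shapes are simplifying assumptions), where Q and K summarize the k items ending at each position and a causal mask hides future items:

```python
import numpy as np

def conv_qk(x, W, B, k):
    # Left-only (causal) 1-D convolution: row i summarizes the k items ending at i.
    t, d = x.shape
    pad = np.vstack([np.zeros((k - 1, d)), x])            # pad the past, never the future
    windows = np.stack([pad[i:i + k].reshape(-1) for i in range(t)])
    return windows @ W + B                                 # (t, d)

def masked_conv_self_attention(x, Wq, Bq, Wk, Bk, k):
    Q, K, V = conv_qk(x, Wq, Bq, k), conv_qk(x, Wk, Bk, k), x   # V_i = x_i
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # scaled dot products
    mask = np.triu(np.ones_like(scores), k=1).astype(bool) # positions j > i
    scores[mask] = -1e9                                    # mask future items
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                          # row-wise softmax
    return w @ V

t, d, k = 5, 8, 3
rng = np.random.default_rng(0)
x = rng.normal(size=(t, d))
Wq, Wk = rng.normal(size=(k * d, d)), rng.normal(size=(k * d, d))
h = masked_conv_self_attention(x, Wq, np.zeros(d), Wk, np.zeros(d), k)
```

Because of the mask, the first position can only attend to itself, so its output equals its own value vector.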
2.2) With the result [x_0, x_1, …, x_{t-1}, x_t] of step 1 as input, the intention encoder models the key intention information of the session and computes the Gaussian weights.
The intention encoder comprises two parts: a single ConvSAN layer and a Gaussian attention layer (Gaussian Attention, GaussAtten for short). The session embedding is fed into the ConvSAN layer, and its output hidden representations are input into the GaussAtten layer, which outputs the hidden representation h_t^i of the user's key intention and the Gaussian weight factors weight_Gauss. The calculation performed in the network is as follows:
α_tj is first introduced: it is a weight factor, and the larger it is, the larger the proportion of item j in the session representation; the key intention representation is the weighted sum of the item representations, h_t^i = Σ_j α_tj h_j. ĝ_tj is the corresponding Gaussian weight factor. They are calculated as:
α_tj = exp(q(h_j, h_t)) / Σ_j' exp(q(h_j', h_t))
ĝ_tj = exp(q(h_j, h_t) + G_tj) / Σ_j' exp(q(h_j', h_t) + G_tj')
where q is a function computing the similarity between h_j and h_t:
q(h_j, h_t) = v^T σ(A_1 h_j + A_2 h_t)
σ is an activation function, which may be a sigmoid or a softmax; the model of this embodiment uses the sigmoid. A_1 is a linear transformation matrix mapping h_j into the hidden space, and similarly A_2 and v.
Then introduceRatio alpha tj One more G tj . The invention first introduces the matrix G. />Is a position alignment matrix based on gaussian distribution, I is the session length. G tj Is one of the matrices, and measures the compactness between the article j and the article t at the central position, and the calculation process is as follows:
wherein sigma t Is the standard deviation, generally set as Gaussian window D t Half of (2); j is the position of item j in the session, P t A predicted center position of the article t corresponds to the expected position; g tj <0. Predicting the center position P t And Gaussian window D t All are learned:
obviously P t And D t Is limited in (0,I). P is p t And z t For scalar, the calculation process is as follows:
wherein the method comprises the steps ofH is->Is a dimension of (c). />And->For a linear mapping matrix, the output is mapped as a scalar. They share the same W p This is because there may be some correlation between the expectations and variances of the gaussian distribution when applied to the conversation, with the weights of the other items being gaussian shifted with the last item as the center position.
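A minimal sketch of the Gaussian bias G_tj above, assuming (as the embodiment does for repeat recommendation) that the center P_t is the last position of the session; the window value D_t here is an arbitrary illustrative number:

```python
import numpy as np

def gaussian_bias(I, P_t, D_t):
    """G_tj = -(j - P_t)^2 / (2 * sigma_t^2) with sigma_t = D_t / 2; always <= 0."""
    j = np.arange(I)              # positions 0 .. I-1 in the session
    sigma = D_t / 2.0             # standard deviation = half the Gaussian window
    return -((j - P_t) ** 2) / (2.0 * sigma ** 2)

# Centering on the last item gives recently clicked items the smallest penalty.
I = 6
G = gaussian_bias(I, P_t=I - 1, D_t=3.0)
```

Adding G to the attention logits before the softmax (as in the formula for ĝ_tj) boosts items near the last click and suppresses distant ones.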
The output h_t^s of the sequence encoder and the output h_t^i of the intention encoder are concatenated to obtain the final hidden representation of the session: c_t = [h_t^s ; h_t^i]. c_t is fed into the subsequent decoders, and weight_Gauss into the repeat recommendation decoder.
Step 3, predicting the probability that the user next clicks each item using the repeat-explore decoders and making recommendations
3.1) A repeat-explore selector (RES) is used to calculate the probability that the user next clicks a repeated item or a non-repeated item. The RES is a classifier that decides whether clicked items (repeat mechanism) or unclicked items (explore mechanism) are recommended to the user. It comprises two parts: a linear transformation layer that maps the hidden representation of the session into the scores of the two mechanisms, and a softmax layer that computes the normalized probabilities. The calculation is as follows:
[P(r | [x_0, x_1, …, x_t]), P(e | [x_0, x_1, …, x_t])] = softmax(c_t W_re)
where P(r | [x_0, x_1, …, x_t]) is the repeat-mechanism probability and P(e | [x_0, x_1, …, x_t]) the explore-mechanism probability. W_re ∈ R^{H×2} is a weight matrix, and H is the dimension of c_t.
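The selector reduces to a two-way softmax over a linear map of c_t; a minimal sketch with random stand-in values for c_t and W_re:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())       # shift for numerical stability
    return e / e.sum()

def repeat_explore_selector(c_t, W_re):
    """Map the session representation c_t (H,) through W_re (H, 2) and normalize."""
    p = softmax(c_t @ W_re)
    return {"repeat": p[0], "explore": p[1]}

rng = np.random.default_rng(1)
H = 16
probs = repeat_explore_selector(rng.normal(size=H), rng.normal(size=(H, 2)))
```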
3.2) A repeat recommendation decoder (Repeat Recommendation Decoder, D_R for short) calculates the probability that the user clicks a repeated item under the repeat mechanism. Its input is weight_Gauss and its output is a conditional probability distribution over clicked items:
P(x_i | r, [x_0, x_1, …, x_t]) = Σ_{j : x_j = x_i} ĝ_tj
where the sum runs over all occurrences of x_i in [x_0, x_1, …, x_t], because the same item x_i may occur several times in a session.
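Summing the Gaussian weight factors over every occurrence of each clicked item can be sketched as follows (the session and weights are illustrative; the weights are assumed already normalized):

```python
import numpy as np

def repeat_decoder(session, weight_gauss):
    """Sum the Gaussian attention weights over every occurrence of each clicked item."""
    probs = {}
    for item, w in zip(session, weight_gauss):
        probs[item] = probs.get(item, 0.0) + w
    return probs

session = [5, 9, 5, 2]                       # item 5 occurs twice
w = np.array([0.1, 0.2, 0.3, 0.4])           # assumed normalized weight_Gauss
p_repeat = repeat_decoder(session, w)
```

Item 5 accumulates the weights of both its occurrences, which is exactly why the sum over positions is needed.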
3.3) An explore recommendation decoder (Explore Recommendation Decoder, D_E for short) calculates the scores of items the user has not clicked. It comprises two parts: a bilinear transformation layer that maps the encoder representation onto the unclicked items, and a softmax that normalizes the classification result:
f = c_t B emb^T
P(x_i | e, [x_0, x_1, …, x_t]) = softmax(f)_{x_i}
where emb is the item embedding matrix and B is a bilinear transformation matrix of size |H| × |D|, H being the dimension of c_t and D the item embedding dimension.
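A sketch of the bilinear scoring c_t B emb^T; masking the already-clicked items before the softmax is an assumption here (the patent only states that the representation is mapped onto the unclicked items), and all shapes and values are illustrative:

```python
import numpy as np

def explore_decoder(c_t, B, emb, clicked):
    """Score items with the bilinear form c_t B emb^T; suppress clicked items."""
    scores = c_t @ B @ emb.T                  # (n_items,)
    scores[list(clicked)] = -1e9              # clicked items belong to the repeat decoder
    e = np.exp(scores - scores.max())
    return e / e.sum()                        # softmax over candidate items

rng = np.random.default_rng(2)
H, D, n_items = 16, 8, 20
p_explore = explore_decoder(rng.normal(size=H), rng.normal(size=(H, D)),
                            rng.normal(size=(n_items, D)), clicked={5, 9})
```

Sharing the embedding matrix emb on the output side is what keeps the decoder's parameter count small compared with a fully-connected layer.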
The sum of the two decoder outputs, weighted by the selector probabilities, gives the recommendation score of every item. Taking the prediction of a single item x_i as an example:
P(x_i) = P(x_i | r, [x_0, x_1, …, x_t]) P(r | [x_0, x_1, …, x_t]) + P(x_i | e, [x_0, x_1, …, x_t]) P(e | [x_0, x_1, …, x_t])
step 4, optimizing model parameters by using an optimizer, and performing multiple iteration experiments to enable the model to converge
This embodiment employs a cross-entropy loss function and the Adam optimizer. The loss function is:
L = -(1/m) Σ_i Σ_k y_{i,k} log(p_{i,k})
where m is the number of samples; y_{i,k} is the k-th class label of sample i, 1 for the positive class and 0 for the negative class; and p_{i,k} is the predicted probability of the k-th class of sample i. In this embodiment the positive class is the item the user clicks next, and all other items are negative.
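With one-hot labels, only the positive class contributes to each sample's term; a small numeric sketch with made-up predictions:

```python
import numpy as np

def cross_entropy(p, y):
    """L = -(1/m) * sum_i sum_k y_ik * log(p_ik); one-hot y keeps only the positive class."""
    m = p.shape[0]
    return -np.sum(y * np.log(p + 1e-12)) / m  # epsilon guards against log(0)

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])                # predicted distributions over 3 items
y = np.array([[1, 0, 0],
              [0, 1, 0]])                      # one-hot next-click labels
loss = cross_entropy(p, y)
```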
In order to test the actual effect of the session recommendation model based on the convolutional self-attention network described above, a test is performed below based on the corresponding data set.
1. Download the LASTFM and YOOCHOOSE datasets, select the listening records of LASTFM and the purchase records of YOOCHOOSE as the data, and preprocess them. For the YOOCHOOSE dataset, sequences of length less than 3 are first deleted; after this step only 4% of the remaining sessions have length greater than 10, so sessions longer than 10 are also deleted and the rest is kept. This dataset represents short sessions and is abbreviated YOO in the subsequent experiments. The LASTFM dataset is used to generate two datasets of longer sessions. For the first of these, 2000 played tracks are randomly selected and all records containing them are extracted; a maximum session length L = 20 is set, and sessions of length at most 20 are generated with a sliding window of size L and step L; finally, sessions in which two adjacent items are more than 2 hours apart are discarded because of the overly long interval. This dataset, abbreviated MUSIC_M20, represents medium-length sessions. For the third dataset, 20000 tracks are randomly selected and, with L = 50 and otherwise the same procedure as MUSIC_M20, the dataset MUSIC_L50 is generated, representing long sessions. Each of the three datasets is first randomly split into a training set and a test set, amounting to 55% and 45% of the data respectively; then 5% of the training set is randomly chosen as the validation set. Further, the training set is data-enhanced in this embodiment.
Specifically: for a session of length greater than 2, [x_0, x_1, …, x_{t-1}, x_t], the invention generates multiple sub-sessions by zero-padding: [0, …, x_0, x_1], [0, …, x_0, x_1, x_2], ……, [x_0, x_1, …, x_{t-1}, x_t]. This embodiment refers to these additionally generated subsets of the training set as the sub-session training set (subsessions-T, referred to hereafter by the English abbreviation). Data enhancement is performed only on the training set.
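The sub-session generation above can be sketched as follows (a hypothetical helper, not code from the patent; item IDs are assumed to be positive integers so that 0 can serve as the padding token):

```python
def augment_session(session):
    """Generate all prefix sub-sessions of `session`, left-padded with
    zeros to the original length; the full session is included last."""
    n = len(session)
    subs = []
    for end in range(2, n + 1):  # shortest sub-session keeps two items
        prefix = session[:end]
        subs.append([0] * (n - end) + prefix)
    return subs
```

For example, `augment_session([5, 7, 9])` yields `[[0, 5, 7], [5, 7, 9]]`, matching the padded prefixes described above.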
2. Set the evaluation indexes. In order to comprehensively evaluate the actual effect of the invention, six evaluation indexes are set in this embodiment: MRR@5, HR@5, NDCG@5 and MRR@20, HR@20, NDCG@20.
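The six indexes pair three standard metric families (MRR, HR, NDCG) with two cutoffs (5 and 20). For a single test session with one ground-truth next item, they can be sketched as follows (a minimal sketch; the reported numbers average these values over all test sessions):

```python
import math

def hr_at_k(ranked_items, target, k):
    """Hit Ratio@k: 1 if the target appears in the top-k list, else 0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def mrr_at_k(ranked_items, target, k):
    """MRR@k: reciprocal rank of the target within the top-k, else 0."""
    topk = ranked_items[:k]
    return 1.0 / (topk.index(target) + 1) if target in topk else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with a single relevant item: 1/log2(rank + 1), else 0."""
    topk = ranked_items[:k]
    return 1.0 / math.log2(topk.index(target) + 2) if target in topk else 0.0
```

With `ranked_items = [3, 1, 2]` and `target = 1`, HR@5 is 1.0, MRR@5 is 0.5, and NDCG@5 is 1/log2(3).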
3. Train the model for a fixed number of iterations. Each iteration proceeds as follows: first, a batch of session data is randomly drawn from the training set and fed simultaneously into the encoder and decoder to produce a prediction output; a loss value is then computed from the prediction output and the session's true label, and back-propagation is performed to update the model parameters. Model performance is monitored with the six evaluation indexes on the validation set, and the parameters performing best on the validation set are selected as the optimal parameters. The test-set result obtained under these parameters is taken as the final performance of the model.
This embodiment compares the effect of the presence or absence of the convolution operation in the encoder on model performance, where NoConv denotes a model without the convolution operation and WithConv a model with it. To isolate the effect of the convolution operation in the encoder, no Gaussian weights are used in the model, and a bidirectional linear transform decoder is used. The results are shown in Table 2:
TABLE 2 comparative experiment results with or without convolution operations
From the experimental results of Table 2, it can be concluded that: on the YOO dataset, WithConv and NoConv perform almost identically, differing by about 0.05% on the evaluation indexes. On the MUSIC_M20 dataset, the accuracy of WithConv improves by about 1% over NoConv. On the MUSIC_L50 dataset, the accuracy of WithConv improves by about 1.5%. This shows that convolutional self-attention, which models each item through its local correlations so that the item representation incorporates features of the surrounding sequence fragment, can effectively improve model accuracy.
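The segment-sensitive item representation described above can be sketched as follows; the window width w and the use of plain matrix-valued filters are illustrative assumptions, and the patent's encoder additionally applies masked multi-head self-attention on top of such representations:

```python
import numpy as np

def local_conv_features(E, W):
    """Slide a width-w convolution over the session's item embeddings
    E (shape [n, d]) so each position's output mixes the surrounding
    segment. W has shape [w, d, d]; zero padding keeps the output the
    same length as the input."""
    n, d = E.shape
    w = W.shape[0]
    pad = w // 2
    Ep = np.vstack([np.zeros((pad, d)), E, np.zeros((pad, d))])
    out = np.zeros_like(E)
    for i in range(n):
        # out[i] = sum_k Ep[i+k] @ W[k]: the representation of item i
        # now carries features of the session fragment around it
        for k in range(w):
            out[i] += Ep[i + k] @ W[k]
    return out
```

As a sanity check, a filter bank that is zero everywhere except an identity matrix at the window center reproduces the input embeddings unchanged.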
This embodiment also compares the performance of different decoders: the model using a fully connected decoder is denoted Full, and the model using a bidirectional linear transform decoder is denoted BiLinear. To isolate the improvement the bidirectional linear transform decoder brings to the recommendation system, only the convolution operation is used in the encoder, without Gaussian weights; the invention compares the two decoders on the evaluation indexes and the training time, as shown in Tables 3 and 4.
Table 3 comparative experiments for different decoders
Table 4 training time for different decoders
Through comparative analysis, the following conclusions can be drawn:
BiLinear performs best on all three datasets. On the YOO dataset, BiLinear is 0.3%-0.6% higher than Full on the six evaluation indexes. On MUSIC_M20 and MUSIC_L50, BiLinear is about 0.2% and 0.15% higher than Full on the evaluation indexes, respectively.
The training time of BiLinear is significantly shorter than that of Full. The size of the parameter matrix in the fully connected decoder depends on the size of the item space, so its robustness is poor, whereas the size of the bidirectional transform matrix of the BiLinear decoder remains unchanged. Clearly, using the bidirectional linear transform decoder yields higher model precision, fewer model parameters, and better robustness.
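The size argument can be made concrete with a sketch under illustrative dimensions (d is the hidden size, V the number of items; all names and values are assumptions): the fully connected decoder needs a d×V matrix that grows with the item space, while the bidirectional linear decoder needs only a fixed d×d matrix and reuses the item embedding matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, V = 64, 10000                 # illustrative hidden size and item count
E = rng.standard_normal((V, d))  # item embedding matrix, shared with the encoder
h = rng.standard_normal(d)       # session hidden-layer representation

# Fully connected decoder: one weight per (hidden unit, item) pair,
# so the parameter count scales with the item space.
W_full = rng.standard_normal((d, V))
scores_full = h @ W_full                 # raw scores over all V items

# Bidirectional linear transform decoder: a fixed d x d matrix that
# scores items through the shared embedding matrix.
B = rng.standard_normal((d, d))
scores_bilinear = (h @ B) @ E.T          # raw scores over all V items

assert W_full.size == d * V              # 640000 parameters, grows with V
assert B.size == d * d                   # 4096 parameters, independent of V
```

Both decoders produce a score per item, but the bilinear variant's parameter count does not change when new items are added, which matches the robustness argument above.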
This embodiment also compares the effect of the Gaussian offset on model performance, as shown in Table 5. The comparison models use the convolution operation with the repetition-exploration decoder. The model not using the Gaussian offset is denoted NoGaus, and the model applying the Gaussian offset weight factor only in the repeat recommendation decoder is denoted OnlyDec.
TABLE 5 influence of Gaussian offset weighting factors on model Performance experiments
From the experimental results of Table 5, it can be concluded that using the Gaussian offset weight factor in the repeat recommendation decoder effectively improves model performance.
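The Gaussian offset weight factor can be sketched as follows: positions are weighted by a Gaussian whose mean sits on the last item of the session, so recently clicked items dominate the repeat recommendation decoder. The standard deviation `sigma` is an illustrative assumption; the patent does not fix its value in this embodiment.

```python
import math

def gaussian_offset_weights(n, sigma=2.0):
    """Weight each of the n session positions by a Gaussian centred on
    the last position (index n - 1), so that recent items receive
    larger weights; the weights are normalised to sum to 1."""
    center = n - 1
    w = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]
    s = sum(w)
    return [x / s for x in w]
```

For a session of length 5 the last position receives the largest weight and the weights decay monotonically toward the start of the session.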
The above embodiment is only a preferred embodiment of the present invention, but is not intended to limit it. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, all technical schemes obtained by equivalent substitution or equivalent transformation fall within the protection scope of the invention.

Claims (4)

1. A session recommendation method based on a convolution self-attention network is characterized by comprising the following steps:
s1: giving a conversation as input, and acquiring a low-dimensional vector of each article in the conversation, wherein the low-dimensional vector is formed by adding article embedding and position embedding of the article in the conversation;
s2: modeling sequence information of a session by using a sequence encoder based on a convolution self-attention network on the basis of the low-dimensional vector obtained in the step S1, modeling key intention information of the session by using an intention encoder based on the convolution self-attention network and a Gaussian attention mechanism, and calculating Gaussian weights;
the method for modeling the sequence information of the session by the sequence encoder based on the convolution self-attention network comprises the following steps:
s211: capturing the characteristics of the conversation fragments around each article in the conversation by using convolution operation, and interacting with the characteristics when modeling the article representation to obtain the article representation sensitive to the conversation fragments;
s212: based on the item representation obtained in S211, capturing interdependencies between different items in the session using the self-attention network, modeling sequence information of the session; the self-attention network in S212 is a masked multi-head self-attention network;
the specific method for modeling key intention information of a conversation by an intention encoder based on a convolution self-attention network and a Gaussian attention mechanism and calculating Gaussian weights comprises the following steps:
s221: capturing interdependencies between different items using a convolutional self-attention network based on the item representation obtained in S211;
s222: calculating the weight of each item in the session by using an attention mechanism based on the item representations obtained in S221, wherein the weighted sum of different item representations is the key intention information of the session; then taking the last article in the session as the expected center of Gaussian distribution, and calculating Gaussian weights of the articles subjected to Gaussian shift;
s3: splicing the sequence information and the key intention information obtained in the S2 to obtain a session hidden layer representation, and inputting the session hidden layer representation into a repetition-exploration selector to predict the probability of the user to select repeated or unrepeated articles next; then, the conditional probability of each repeated object is calculated in the repeated recommendation decoder, the conditional probability of each non-repeated object is calculated in the exploration recommendation decoder, and the edge probabilities output by the two decoders are added to obtain the prediction probability of the model on all possible objects.
2. The convolutional self-attention network-based session recommendation method of claim 1, wherein the specific method for predicting the probability of a user next selecting a duplicate or non-duplicate item in the duplicate-explore selector is:
splicing the sequence information and the key intention information, inputting the result into a linear network layer for mapping transformation, and normalizing it through a softmax layer to obtain the repeat recommendation probability and the explore recommendation probability, which are used to decide whether a clicked item or a non-clicked item is recommended to the user.
3. The session recommendation method based on a convolution self-attention network according to claim 1, wherein the specific method for calculating the conditional probability of each duplicate item in the duplicate recommendation decoder is as follows:
taking the Gaussian weights obtained in S222 as input, and integrating them to calculate the conditional probability that the user next clicks each repeated item.
4. The session recommendation method based on a convolutional self-attention network as recited in claim 1, wherein the specific method for calculating the conditional probability of each nonrepeating item in the exploration recommendation decoder is as follows:
splicing the sequence information and the key intention information to obtain the session hidden-layer representation, mapping it onto the non-clicked items through a bidirectional linear transformation matrix and the item embedding matrix, and finally normalizing through a softmax layer to obtain the conditional probability that the user next clicks each non-repeated item.
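The probability combination in claims 1 to 4 can be sketched as follows (a minimal sketch with assumed names; raw decoder scores are taken as given here, whereas the actual decoders compute them with the mechanisms claimed above):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of raw scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def combine_decoders(mode_scores, repeat_scores, explore_scores):
    """mode_scores: two raw scores from the repetition-exploration
    selector; repeat_scores / explore_scores: dicts mapping items
    already in / not in the session to raw decoder scores. Each
    decoder's conditional distribution is weighted by its mode
    probability, and the marginal probabilities are added."""
    p_repeat, p_explore = softmax(mode_scores)
    rep_items, exp_items = list(repeat_scores), list(explore_scores)
    rep_probs = softmax([repeat_scores[i] for i in rep_items])
    exp_probs = softmax([explore_scores[i] for i in exp_items])
    probs = {}
    for i, p in zip(rep_items, rep_probs):
        probs[i] = probs.get(i, 0.0) + p_repeat * p
    for i, p in zip(exp_items, exp_probs):
        probs[i] = probs.get(i, 0.0) + p_explore * p
    return probs
```

With equal selector scores and uniform decoder scores, a single repeated item 'a' receives probability 0.5 while two explore items 'b' and 'c' receive 0.25 each, and the distribution sums to 1.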
CN202010969069.0A 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network Active CN112258262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010969069.0A CN112258262B (en) 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network


Publications (2)

Publication Number Publication Date
CN112258262A CN112258262A (en) 2021-01-22
CN112258262B true CN112258262B (en) 2023-09-26

Family

ID=74231420


Country Status (1)

Country Link
CN (1) CN112258262B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255780B (en) * 2021-05-28 2024-05-03 润联智能科技股份有限公司 Reduction gearbox fault prediction method and device, computer equipment and storage medium
CN113343097B (en) * 2021-06-24 2023-01-13 中山大学 Sequence recommendation method and system based on fragment and self-attention mechanism
CN113821724B (en) * 2021-09-23 2023-10-20 湖南大学 Time interval enhancement-based graph neural network recommendation method
CN113961816B (en) * 2021-11-26 2022-07-01 重庆理工大学 Graph convolution neural network session recommendation method based on structure enhancement
CN114791983B (en) * 2022-04-13 2023-04-07 湖北工业大学 Sequence recommendation method based on time sequence article similarity
CN115906863B (en) * 2022-10-25 2023-09-12 华南师范大学 Emotion analysis method, device, equipment and storage medium based on contrast learning

Citations (8)

Publication number Priority date Publication date Assignee Title
CN110119467A (en) * 2019-05-14 2019-08-13 苏州大学 A kind of dialogue-based item recommendation method, device, equipment and storage medium
CN110196946A (en) * 2019-05-29 2019-09-03 华南理工大学 A kind of personalized recommendation method based on deep learning
CN110619082A (en) * 2019-09-20 2019-12-27 苏州市职业大学 Project recommendation method based on repeated search mechanism
CN111080400A (en) * 2019-11-25 2020-04-28 中山大学 Commodity recommendation method and system based on gate control graph convolution network and storage medium
CN111127165A (en) * 2019-12-26 2020-05-08 纪信智达(广州)信息技术有限公司 Sequence recommendation method based on self-attention self-encoder
CN111241425A (en) * 2019-10-17 2020-06-05 陕西师范大学 POI recommendation method based on hierarchical attention mechanism
CN111259243A (en) * 2020-01-14 2020-06-09 中山大学 Parallel recommendation method and system based on session
CN111429234A (en) * 2020-04-16 2020-07-17 电子科技大学中山学院 Deep learning-based commodity sequence recommendation method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11521326B2 (en) * 2018-05-23 2022-12-06 Prove Labs, Inc. Systems and methods for monitoring and evaluating body movement


Non-Patent Citations (3)

Title
Dual Sequential Prediction Models Linking Sequential Recommendation and Information Dissemination; Qitian Wu et al.; Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; full text *
Rethinking the Item Order in Session-based Recommendation with Graph Neural Networks; Ruihong Qiu et al.; Proceedings of the 28th ACM International Conference on Information and Knowledge Management; full text *
Joint Deep Recommendation Model Based on a Dual-Layer Attention Mechanism; Liu Huiting; Ji Qiang; Liu Huimin; Zhao Peng; Journal of South China University of Technology (Natural Science Edition), No. 6; full text *

Also Published As

Publication number Publication date
CN112258262A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN112258262B (en) Session recommendation method based on convolution self-attention network
Wu et al. Session-based recommendation with graph neural networks
Yoon et al. Data valuation using reinforcement learning
CN110196946B (en) Personalized recommendation method based on deep learning
Wang et al. Session-based recommendation with hypergraph attention networks
CN112381581B (en) Advertisement click rate estimation method based on improved Transformer
CN108647251B (en) Recommendation sorting method based on wide-depth gate cycle combination model
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN111209386B (en) Personalized text recommendation method based on deep learning
CN111199343A (en) Multi-model fusion tobacco market supervision abnormal data mining method
CN111753209B (en) Sequence recommendation list generation method based on improved time sequence convolution network
CN114493755B (en) Self-attention sequence recommendation method fusing time sequence information
Pan et al. A variational point process model for social event sequences
CN110781401A (en) Top-n project recommendation method based on collaborative autoregressive flow
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
Choe et al. Recommendation system with hierarchical recurrent neural network for long-term time series
Yakhchi et al. Towards a deep attention-based sequential recommender system
Goldblum et al. The no free lunch theorem, kolmogorov complexity, and the role of inductive biases in machine learning
Cheng et al. Long-term effect estimation with surrogate representation
CN113609388A (en) Sequence recommendation method based on counterfactual user behavior sequence generation
CN115048855A (en) Click rate prediction model, training method and application device thereof
Lim et al. Regular time-series generation using SGM
CN117196763A (en) Commodity sequence recommending method based on time sequence perception self-attention and contrast learning
CN111079011A (en) Deep learning-based information recommendation method
CN116680456A (en) User preference prediction method based on graph neural network session recommendation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant