CN112258262B - Session recommendation method based on convolution self-attention network - Google Patents


Info

Publication number
CN112258262B
CN112258262B (granted from application CN202010969069.0A)
Authority
CN
China
Prior art keywords
session
recommendation
item
self-attention network
Prior art date
Legal status
Active
Application number
CN202010969069.0A
Other languages
Chinese (zh)
Other versions
CN112258262A (application publication)
Inventor
张寅
汪千缘
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010969069.0A
Publication of CN112258262A (application)
Application granted
Publication of CN112258262B (grant)
Legal status: Active


Classifications

    • G06Q30/0631 Electronic shopping [e-shopping]: item recommendations
    • G06N3/045 Neural networks: combinations of networks
    • G06N3/08 Neural networks: learning methods
    • G06Q30/0202 Marketing: market predictions or forecasting for commercial activities


Abstract

The invention discloses a session recommendation method based on a convolutional self-attention network. The method comprises the following steps: 1) each item in a session is first represented as a low-dimensional vector, formed by adding an item embedding and a position embedding; 2) sequence modeling and intention modeling are performed on these vectors, where sequence modeling captures the sequential information of the session and intention modeling captures its key intention information; 3) based on the concatenated sequence information and key intention information, the model predicts whether the user will next click a repeated item or a non-repeated item. Compared with the prior art, the invention first captures the interdependence among different segments in a session, obtaining segment-sensitive item representations. It then uses a bilinear decoder to reduce the number of model parameters and improve performance and robustness. Finally, it improves the attention layer with a Gaussian bias, computing Gaussian weight factors to improve the repeat recommendation decoder.

Description

Session recommendation method based on convolution self-attention network
Technical Field
The invention relates to the application of neural network methods to session recommendation, and in particular to a method that captures local segment features of a session with convolution operations and enriches weight-factor information with a Gaussian bias.
Background
In the era of big data, "information overload" is a common problem, and obtaining valuable information from complex data is a key issue for the development of big data technology. Recommender systems (RS) are an effective way to alleviate information overload: a recommender system models consumers and their interaction information using the historical interactions between consumers and a website, mines consumer interests, filters and ranks the massive set of choices, and finally makes personalized recommendations.
Conventional personalized recommender systems usually need user information to make tailored recommendations. Many e-commerce recommenders (especially those of small retailers) and most news and media websites, however, do not track the identity of users over long periods. Although browser caches can provide some information to assist user identification and profiling, these techniques are often unreliable and may raise privacy concerns. Session-based recommendation instead predicts the user's next behavior from a sequence of anonymous behaviors over a period of time (e.g., clicks, purchases, favorites, add-to-cart). Such an anonymous behavior sequence is referred to herein as a "session", and each behavior within a session as an "item".
In recent years, deep learning techniques such as recurrent neural networks and self-attention networks have been successfully applied to session recommendation. Compared with a recurrent neural network (RNN), a self-attention network (SAN) has clear advantages in modeling long-term dependencies and avoiding information forgetting, but existing models still have three problems:
1) Local dependencies are ignored. Local correlation refers to the interdependence between different sequence segments within a session. A sequence segment is a more abstract feature unit than a single item. Capturing local correlation when modeling items yields better item representations and improves prediction accuracy.
2) The conventional fully-connected decoder has a huge number of parameters, long training time, and poor robustness.
3) The influence of the click order of items in the session on repeat recommendation is ignored. In repeat consumption, the item the user clicks next is more likely to be a recently clicked item.
Disclosure of Invention
The invention aims to solve the above problems in the prior art and provides a session recommendation method based on a convolutional self-attention network. The invention captures local correlation within a session with an encoder based on a convolutional self-attention network, obtaining segment-sensitive item representations and thereby improving session modeling. It reduces the number of model parameters with a bilinear decoder, improving performance and robustness. It improves the repeat recommendation decoder by modeling, with Gaussian weights, the distance between each item and the last item in the session.
The technical scheme adopted by the invention is as follows:
a session recommendation method based on a convolution self-attention network comprises the following steps:
S1: given a session as input, obtain a low-dimensional vector for each item in the session, the vector being formed by adding the item embedding and the position embedding of the item in the session;
S2: on the basis of the low-dimensional vectors obtained in S1, model the sequence information of the session with a sequence encoder based on a convolutional self-attention network, model the key intention information of the session with an intention encoder based on a convolutional self-attention network and a Gaussian attention mechanism, and compute the Gaussian weights;
S3: concatenate the sequence information and the key intention information obtained in S2 into a hidden representation of the session, and feed it into a repeat-explore selector to predict the probability that the user next selects a repeated or a non-repeated item; then compute the conditional probability of each repeated item in the repeat recommendation decoder and of each non-repeated item in the explore recommendation decoder, and add the marginal probabilities output by the two decoders to obtain the model's predicted probability over all candidate items.
Preferably, the sequence encoder based on the convolutional self-attention network models the sequence information of the session as follows:
S211: capture the features of the session segments around each item with a convolution operation, and let items interact through these features when modeling the item representations, obtaining segment-sensitive item representations;
S212: based on the item representations obtained in S211, model the sequence information of the session using the interdependencies between different items captured by the self-attention network.
Further, the self-attention network in S212 is a masked multi-head self-attention network.
Further, the intention encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights as follows:
S221: based on the item representations obtained in S211, capture the interdependencies between different items using the convolutional self-attention network;
S222: based on the item representations obtained in S221, compute the weight of each item in the session with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then compute the Gaussian-shifted weight of each item, taking the last item in the session as the expectation center of the Gaussian distribution.
Further, the repeat-explore selector predicts the probability that the user next selects a repeated or non-repeated item as follows:
the sequence information and the key intention information are concatenated, fed into a linear layer for mapping, and normalized by a softmax layer to obtain the repeat and explore recommendation probabilities, which decide whether clicked or unclicked items are recommended to the user.
Further, the conditional probability of each repeated item is calculated in the repeat recommendation decoder as follows:
with the Gaussian weights obtained in S222 as input, the conditional probability that the user next clicks each repeated item is obtained by aggregating the weights.
Further, the conditional probability of each non-repeated item is calculated in the explore recommendation decoder as follows:
the sequence information and the key intention information are concatenated into the hidden representation of the session, which is mapped onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalized by a softmax layer to obtain the conditional probability that the user next clicks each non-repeated item.
Compared with the prior art, the invention first captures the interdependence among different segments in a session, obtaining segment-sensitive item representations. It then uses a bilinear decoder to reduce the number of model parameters and improve performance and robustness. Finally, it improves the attention layer with a Gaussian bias: the computed Gaussian weight factors encode the positional distance between each item and the last item in the session, improving the repeat recommendation decoder.
Drawings
FIG. 1 is a flow chart of a session recommendation method based on a convolutional self-attention network;
fig. 2 is the overall framework of the invention.
Fig. 3 is a block diagram of a multi-headed convolution self-attention network.
Detailed Description
The invention is further illustrated and described below with reference to the drawings and detailed description.
As shown in fig. 1, the present invention provides a session recommendation method based on a convolutional self-attention network, which comprises the following steps:
s1: given a session as input, a low-dimensional vector is obtained for each item within the session, the low-dimensional vector being formed by adding the item embedding and the item's location embedding in the session.
S2: based on the low-dimensional vector obtained in S1, modeling the sequence information of the conversation using a sequence encoder based on a convolution self-attention network, modeling the key intention information of the conversation using an intention encoder based on a convolution self-attention network and a gaussian attention mechanism, and calculating a gaussian weight.
The method for modeling the sequence information of the session by the sequence encoder based on the convolution self-attention network comprises the following steps:
S211: A convolution operation captures the features of the session segments around each item, and items interact through these features when the item representations are modeled, yielding segment-sensitive item representations.
S212: Based on the item representations obtained in S211, the sequence information of the session is modeled using the interdependencies between different items captured by the self-attention network; the self-attention network is preferably a masked multi-head self-attention network.
The intention encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights as follows:
S221: based on the item representations obtained in S211, the interdependencies between different items are captured using the convolutional self-attention network;
S222: based on the item representations obtained in S221, the weight of each item in the session is computed with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then the Gaussian-shifted weight of each item is computed, taking the last item in the session as the expectation center of the Gaussian distribution.
S3: splicing the sequence information and the key intention information obtained in the S2 to obtain a session hidden layer representation, and inputting the session hidden layer representation into a repetition-exploration selector to predict the probability of the user to select repeated or unrepeated articles next; then, the conditional probability of each repeated object is calculated in the repeated recommendation decoder, the conditional probability of each non-repeated object is calculated in the exploration recommendation decoder, and the edge probabilities output by the two decoders are added to obtain the prediction probability of the model on all possible objects.
The specific method for predicting the probability of the user selecting the repeated or non-repeated articles next in the repeated-explored selector is as follows:
and splicing the sequence information and the key intention information, inputting the sequence information and the key intention information into a linear network layer for mapping transformation, and normalizing the sequence information and the key intention information through a softmax layer to obtain repeated recommendation probability and exploration recommendation probability, wherein the repeated recommendation probability and the exploration recommendation probability are used for judging whether clicked articles or non-clicked articles are recommended to a user.
The conditional probability of each repeated item is calculated in the repeat recommendation decoder as follows:
With the Gaussian weights obtained in S222 as input, the conditional probability that the user next clicks each repeated item is obtained by aggregating the weights.
The conditional probability of each non-repeated item is calculated in the explore recommendation decoder as follows:
The sequence information and the key intention information are concatenated into the hidden representation of the session, which is mapped onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalized by a softmax layer to obtain the conditional probability that the user next clicks each non-repeated item.
The above method is applied in a specific embodiment below to further illustrate its implementation.
Examples
The overall framework of the method in this embodiment is shown in fig. 2. To facilitate understanding and to unify notation, this section first gives a formal description of the terms used below. The relevant mathematical symbols and their meanings are shown in table 1.
TABLE 1 Session recommendation related math symbols and meanings
The session recommendation method based on the convolution self-attention network specifically comprises the following steps:
step 1, obtaining a vector representation of each article
1.1 For a given input session, the input sequence of items [ x ] is entered using an item embedding matrix emb 0 ,x 1 ,…,x t-1 ,x t ]The index is mapped into a real value vector sequence of a low-dimensional space, and the embedded representation of the object is obtained.
1.2) To supplement the positional information of the items within the session, a position code is additionally added, computed with trigonometric functions:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
where pos is the position of the item in the session, d_model is the dimension of the item embedding (and of the position code), i indexes the dimension, and 2i and 2i+1 distinguish the even and odd dimensions.
1.3) Adding the item embedding to the position code gives the final item vector representation [x_0, x_1, …, x_{t-1}, x_t].
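As an illustration, the item-vector construction of steps 1.1 to 1.3 can be sketched in NumPy. The embedding matrix, dimensions and session below are hypothetical; only the trigonometric position code follows the formulas above:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Trigonometric position code: even dimensions use sin, odd dimensions use cos."""
    pos = np.arange(seq_len)[:, None]          # item position in the session
    i = np.arange(d_model)[None, :]            # dimension index
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

# Item vector = item embedding + position code (shapes illustrative only).
d_model, n_items, seq = 8, 50, [3, 7, 7, 12]
emb = np.random.randn(n_items, d_model) * 0.1  # hypothetical embedding matrix
x = emb[seq] + positional_encoding(len(seq), d_model)
```

The addition works because the embedding and the position code share the dimension d_model.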
Step 2, modeling sequence information, key intention information and calculating Gaussian weights
2.1) With the result [x_0, x_1, …, x_{t-1}, x_t] of step 1 as input, sequence information is modeled using a sequence encoder based on a convolutional self-attention network (Convolutional Self-Attention Network, ConvSAN for short). Local features around each item are captured to obtain segment-sensitive item representations, and the hidden representations [h_0, h_1, …, h_t] are output; the last hidden representation h_t contains the sequence information of the session.
ConvSAN contains two sublayers: a multi-headed convolution self-attention network layer and a feedforward neural network layer. The input and output of each sub-layer is subjected to residual connection and layer regularization. Residual connection helps return gradients, and layer regularization can accelerate model convergence. The calculation formula is as follows:
SubLayerOutput=LayerNorm(x+SubLayer(x))
A multi-head convolutional self-attention network is first introduced; its overall framework is shown in fig. 3. Q, K and V are the Query, Key and Value vectors of the network. To capture the features of the segments around each item, both Q and K are obtained by convolution with a kernel of size k. When modeling item i, the features of the length-k sequence segment ending at item i are extracted by convolution (only items to the left of item i are included, preventing future-information leakage), and any two items interact through these features. The convolved Q, K and V vectors of item i are:
Q_i^conv = Conv([x_{i-k+1}, …, x_i]; W^Q, B^Q)
K_i^conv = Conv([x_{i-k+1}, …, x_i]; W^K, B^K)
V_i = x_i
where W^Q and B^Q are the weight matrix and bias used to convolve Q, and likewise W^K and B^K for K (the W and B appearing below are all trainable parameters and are not described again). The self-attention operation is then performed:
Attention(Q^conv, K^conv, V) = softmax(Q^conv (K^conv)^T / sqrt(d_k)) V
where 1/sqrt(d_k) is a scaling factor that prevents the product Q^conv (K^conv)^T from becoming too large and entering the saturation region of the softmax function, and d_k is the dimension of K^conv.
A multi-head mechanism maps Q^conv, K^conv and V into several subspaces of the same dimension and concatenates the attention results of the different subspaces, which helps the network capture richer information:
MultiHead(Q^conv, K^conv, V) = concat(h_0, h_1, …, h_n)
Wherein the attention in the ith subspace is calculated as follows:
h_i = Attention(Q^conv W_i^Q, K^conv W_i^K, V W_i^V)
The feed-forward neural network layer is described next. The output of the convolutional self-attention layer, after layer regularization and residual connection, is the input of the feed-forward layer, which applies two linear transformations and a ReLU activation:
FFN(x) = ReLU(xW_1 + b_1)W_2 + b_2
Note that in the session recommendation scenario, when modeling x_t only x_0, x_1, …, x_t are known, not x_{t+1}. Therefore, to prevent future-information leakage, a mask is additionally added to the self-attention mechanism, masking the information after x_t; this yields masked multi-head attention.
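As a rough NumPy sketch of the masked convolutional self-attention step (not the patent's implementation: the single head, the window extraction by zero-padding the past, and the weight shapes are simplifying assumptions), where Q and K summarize the k items ending at each position and a causal mask hides future items:

```python
import numpy as np

def conv_qk(x, W, B, k):
    # Left-only (causal) 1-D convolution: row i summarizes the k items ending at i.
    t, d = x.shape
    pad = np.vstack([np.zeros((k - 1, d)), x])            # pad the past, never the future
    windows = np.stack([pad[i:i + k].reshape(-1) for i in range(t)])
    return windows @ W + B                                 # (t, d)

def masked_conv_self_attention(x, Wq, Bq, Wk, Bk, k):
    Q, K, V = conv_qk(x, Wq, Bq, k), conv_qk(x, Wk, Bk, k), x   # V_i = x_i
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # scaled dot products
    mask = np.triu(np.ones_like(scores), k=1).astype(bool) # positions j > i
    scores[mask] = -1e9                                    # mask future items
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                          # row-wise softmax
    return w @ V

t, d, k = 5, 8, 3
rng = np.random.default_rng(0)
x = rng.normal(size=(t, d))
Wq, Wk = rng.normal(size=(k * d, d)), rng.normal(size=(k * d, d))
h = masked_conv_self_attention(x, Wq, np.zeros(d), Wk, np.zeros(d), k)
```

Because of the mask, the first position can only attend to itself, so its output equals its own value vector.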
2.2) With the result [x_0, x_1, …, x_{t-1}, x_t] of step 1 as input, the intention encoder models the key intention information of the session and computes the Gaussian weights.
The intention encoder comprises two parts: a single ConvSAN layer and a Gaussian attention layer (Gaussian Attention, GaussAtten for short). The session embedding is fed into the ConvSAN layer, and its output hidden representations are input into the GaussAtten layer, which outputs the hidden representation h_t^i of the user's key intention and the Gaussian weight factors weight_Gauss. The calculation performed in the network is as follows:
α_tj is first introduced: it is a weight factor, and the larger it is, the larger the proportion of item j in the session representation; the key intention representation is the weighted sum of the item representations, h_t^i = Σ_j α_tj h_j. ĝ_tj is the corresponding Gaussian weight factor. They are calculated as:
α_tj = exp(q(h_j, h_t)) / Σ_j' exp(q(h_j', h_t))
ĝ_tj = exp(q(h_j, h_t) + G_tj) / Σ_j' exp(q(h_j', h_t) + G_tj')
where q is a function computing the similarity between h_j and h_t:
q(h_j, h_t) = v^T σ(A_1 h_j + A_2 h_t)
σ is an activation function, which may be a sigmoid or a softmax; the model of this embodiment uses the sigmoid. A_1 is a linear transformation matrix mapping h_j into the hidden space, and similarly A_2 and v.
Then introduceRatio alpha tj One more G tj . The invention first introduces the matrix G. />Is a position alignment matrix based on gaussian distribution, I is the session length. G tj Is one of the matrices, and measures the compactness between the article j and the article t at the central position, and the calculation process is as follows:
wherein sigma t Is the standard deviation, generally set as Gaussian window D t Half of (2); j is the position of item j in the session, P t A predicted center position of the article t corresponds to the expected position; g tj <0. Predicting the center position P t And Gaussian window D t All are learned:
obviously P t And D t Is limited in (0,I). P is p t And z t For scalar, the calculation process is as follows:
wherein the method comprises the steps ofH is->Is a dimension of (c). />And->For a linear mapping matrix, the output is mapped as a scalar. They share the same W p This is because there may be some correlation between the expectations and variances of the gaussian distribution when applied to the conversation, with the weights of the other items being gaussian shifted with the last item as the center position.
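A minimal sketch of the Gaussian bias G_tj above, assuming (as the embodiment does for repeat recommendation) that the center P_t is the last position of the session; the window value D_t here is an arbitrary illustrative number:

```python
import numpy as np

def gaussian_bias(I, P_t, D_t):
    """G_tj = -(j - P_t)^2 / (2 * sigma_t^2) with sigma_t = D_t / 2; always <= 0."""
    j = np.arange(I)              # positions 0 .. I-1 in the session
    sigma = D_t / 2.0             # standard deviation = half the Gaussian window
    return -((j - P_t) ** 2) / (2.0 * sigma ** 2)

# Centering on the last item gives recently clicked items the smallest penalty.
I = 6
G = gaussian_bias(I, P_t=I - 1, D_t=3.0)
```

Adding G to the attention logits before the softmax (as in the formula for ĝ_tj) boosts items near the last click and suppresses distant ones.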
The output h_t^s of the sequence encoder and the output h_t^i of the intention encoder are concatenated to obtain the final hidden representation of the session: c_t = [h_t^s ; h_t^i]. c_t is fed into the subsequent decoders, and weight_Gauss into the repeat recommendation decoder.
Step 3, predicting the probability that the user next clicks each item using the repeat-explore decoders and making recommendations
3.1) A repeat-explore selector (RES) is used to calculate the probability that the user next clicks a repeated item or a non-repeated item. The RES is a classifier that decides whether clicked items (repeat mechanism) or unclicked items (explore mechanism) are recommended to the user. It comprises two parts: a linear transformation layer that maps the hidden representation of the session into the scores of the two mechanisms, and a softmax layer that computes the normalized probabilities. The calculation is as follows:
[P(r | [x_0, x_1, …, x_t]), P(e | [x_0, x_1, …, x_t])] = softmax(c_t W_re)
where P(r | [x_0, x_1, …, x_t]) is the repeat-mechanism probability and P(e | [x_0, x_1, …, x_t]) the explore-mechanism probability. W_re ∈ R^{H×2} is a weight matrix, and H is the dimension of c_t.
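The selector reduces to a two-way softmax over a linear map of c_t; a minimal sketch with random stand-in values for c_t and W_re:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())       # shift for numerical stability
    return e / e.sum()

def repeat_explore_selector(c_t, W_re):
    """Map the session representation c_t (H,) through W_re (H, 2) and normalize."""
    p = softmax(c_t @ W_re)
    return {"repeat": p[0], "explore": p[1]}

rng = np.random.default_rng(1)
H = 16
probs = repeat_explore_selector(rng.normal(size=H), rng.normal(size=(H, 2)))
```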
3.2) A repeat recommendation decoder (Repeat Recommendation Decoder, D_R for short) calculates the probability that the user clicks a repeated item under the repeat mechanism. Its input is weight_Gauss and its output is a conditional probability distribution over clicked items:
P(x_i | r, [x_0, x_1, …, x_t]) = Σ_{j : x_j = x_i} ĝ_tj
where the sum runs over all occurrences of x_i in [x_0, x_1, …, x_t], because the same item x_i may occur several times in a session.
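Summing the Gaussian weight factors over every occurrence of each clicked item can be sketched as follows (the session and weights are illustrative; the weights are assumed already normalized):

```python
import numpy as np

def repeat_decoder(session, weight_gauss):
    """Sum the Gaussian attention weights over every occurrence of each clicked item."""
    probs = {}
    for item, w in zip(session, weight_gauss):
        probs[item] = probs.get(item, 0.0) + w
    return probs

session = [5, 9, 5, 2]                       # item 5 occurs twice
w = np.array([0.1, 0.2, 0.3, 0.4])           # assumed normalized weight_Gauss
p_repeat = repeat_decoder(session, w)
```

Item 5 accumulates the weights of both its occurrences, which is exactly why the sum over positions is needed.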
3.3) An explore recommendation decoder (Explore Recommendation Decoder, D_E for short) calculates the scores of items the user has not clicked. It comprises two parts: a bilinear transformation layer that maps the encoder representation onto the unclicked items, and a softmax that normalizes the classification result:
f = c_t B emb^T
P(x_i | e, [x_0, x_1, …, x_t]) = softmax(f)_{x_i}
where emb is the item embedding matrix and B is a bilinear transformation matrix of size |H| × |D|, H being the dimension of c_t and D the item embedding dimension.
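A sketch of the bilinear scoring c_t B emb^T; masking the already-clicked items before the softmax is an assumption here (the patent only states that the representation is mapped onto the unclicked items), and all shapes and values are illustrative:

```python
import numpy as np

def explore_decoder(c_t, B, emb, clicked):
    """Score items with the bilinear form c_t B emb^T; suppress clicked items."""
    scores = c_t @ B @ emb.T                  # (n_items,)
    scores[list(clicked)] = -1e9              # clicked items belong to the repeat decoder
    e = np.exp(scores - scores.max())
    return e / e.sum()                        # softmax over candidate items

rng = np.random.default_rng(2)
H, D, n_items = 16, 8, 20
p_explore = explore_decoder(rng.normal(size=H), rng.normal(size=(H, D)),
                            rng.normal(size=(n_items, D)), clicked={5, 9})
```

Sharing the embedding matrix emb on the output side is what keeps the decoder's parameter count small compared with a fully-connected layer.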
The sum of the two decoder outputs, weighted by the selector probabilities, gives the recommendation score of every item. Taking the prediction of a single item x_i as an example:
P(x_i) = P(x_i | r, [x_0, x_1, …, x_t]) P(r | [x_0, x_1, …, x_t]) + P(x_i | e, [x_0, x_1, …, x_t]) P(e | [x_0, x_1, …, x_t])
step 4, optimizing model parameters by using an optimizer, and performing multiple iteration experiments to enable the model to converge
This embodiment employs a cross-entropy loss function and the Adam optimizer. The loss function is:
L = -(1/m) Σ_i Σ_k y_{i,k} log(p_{i,k})
where m is the number of samples; y_{i,k} is the k-th class label of sample i, 1 for the positive class and 0 for the negative class; and p_{i,k} is the predicted probability of the k-th class of sample i. In this embodiment the positive class is the item the user clicks next, and all other items are negative.
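With one-hot labels, only the positive class contributes to each sample's term; a small numeric sketch with made-up predictions:

```python
import numpy as np

def cross_entropy(p, y):
    """L = -(1/m) * sum_i sum_k y_ik * log(p_ik); one-hot y keeps only the positive class."""
    m = p.shape[0]
    return -np.sum(y * np.log(p + 1e-12)) / m  # epsilon guards against log(0)

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])                # predicted distributions over 3 items
y = np.array([[1, 0, 0],
              [0, 1, 0]])                      # one-hot next-click labels
loss = cross_entropy(p, y)
```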
In order to test the actual effect of the session recommendation model based on the convolutional self-attention network described above, a test is performed below based on the corresponding data set.
1. Download the LASTFM and YOOCHOOSE datasets, select the listening records of LASTFM and the purchase records of YOOCHOOSE as the data, and preprocess them. For the YOOCHOOSE dataset, sequences of length less than 3 are first deleted; after this step only 4% of the remaining sessions have length greater than 10, so sessions longer than 10 are also deleted and the rest is kept. This dataset represents short sessions and is abbreviated YOO in the subsequent experiments. The LASTFM dataset is used to generate two datasets of longer sessions. For the first of these, 2000 played tracks are randomly selected and all records containing them are extracted; a maximum session length L = 20 is set, and sessions of length at most 20 are generated with a sliding window of size L and step L; finally, sessions in which two adjacent items are more than 2 hours apart are discarded because of the overly long interval. This dataset, abbreviated MUSIC_M20, represents medium-length sessions. For the third dataset, 20000 tracks are randomly selected and, with L = 50 and otherwise the same procedure as MUSIC_M20, the dataset MUSIC_L50 is generated, representing long sessions. Each of the three datasets is first randomly split into a training set and a test set, amounting to 55% and 45% of the data respectively; then 5% of the training set is randomly chosen as the validation set. Further, the training set is data-enhanced in this embodiment.
Specifically: for a session of length greater than 2, [x_0, x_1, …, x_{t-1}, x_t], the invention generates multiple sub-sessions by zero-padding: [0, …, x_0, x_1], [0, …, x_0, x_1, x_2], ……, [x_0, x_1, …, x_{t-1}, x_t]. This embodiment refers to these additionally generated subsets of the training set as the sub-session training set (subsessions-T, referred to hereafter by the English abbreviation). Data enhancement is performed only on the training set.
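The sub-session generation above can be sketched as follows (a hypothetical helper, not code from the patent; item IDs are assumed to be positive integers so that 0 can serve as the padding token):

```python
def augment_session(session):
    """Generate all prefix sub-sessions of `session`, left-padded with
    zeros to the original length; the full session is included last."""
    n = len(session)
    subs = []
    for end in range(2, n + 1):  # shortest sub-session keeps two items
        prefix = session[:end]
        subs.append([0] * (n - end) + prefix)
    return subs
```

For example, `augment_session([5, 7, 9])` yields `[[0, 5, 7], [5, 7, 9]]`, matching the padded prefixes described above.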
2. Set the evaluation indexes. In order to comprehensively evaluate the actual effect of the invention, six evaluation indexes are set in this embodiment: MRR@5, HR@5, NDCG@5 and MRR@20, HR@20, NDCG@20.
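The six indexes pair three standard metric families (MRR, HR, NDCG) with two cutoffs (5 and 20). For a single test session with one ground-truth next item, they can be sketched as follows (a minimal sketch; the reported numbers average these values over all test sessions):

```python
import math

def hr_at_k(ranked_items, target, k):
    """Hit Ratio@k: 1 if the target appears in the top-k list, else 0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def mrr_at_k(ranked_items, target, k):
    """MRR@k: reciprocal rank of the target within the top-k, else 0."""
    topk = ranked_items[:k]
    return 1.0 / (topk.index(target) + 1) if target in topk else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@k with a single relevant item: 1/log2(rank + 1), else 0."""
    topk = ranked_items[:k]
    return 1.0 / math.log2(topk.index(target) + 2) if target in topk else 0.0
```

With `ranked_items = [3, 1, 2]` and `target = 1`, HR@5 is 1.0, MRR@5 is 0.5, and NDCG@5 is 1/log2(3).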
3. Train the model for a fixed number of iterations. Each iteration proceeds as follows: first, a batch of session data is randomly drawn from the training set and fed simultaneously into the encoder and decoder to produce a prediction output; a loss value is then computed from the prediction output and the session's true label, and back-propagation is performed to update the model parameters. Model performance is monitored with the six evaluation indexes on the validation set, and the parameters performing best on the validation set are selected as the optimal parameters. The test-set result obtained under these parameters is taken as the final performance of the model.
This embodiment compares the effect of the presence or absence of the convolution operation in the encoder on model performance, where NoConv denotes a model without the convolution operation and WithConv a model with it. To isolate the effect of the convolution operation in the encoder, no Gaussian weights are used in the model, and a bidirectional linear transform decoder is used. The results are shown in Table 2:
TABLE 2 comparative experiment results with or without convolution operations
From the experimental results of Table 2, it can be concluded that: on the YOO dataset, WithConv and NoConv perform almost identically, differing by about 0.05% on the evaluation indexes. On the MUSIC_M20 dataset, the accuracy of WithConv improves by about 1% over NoConv. On the MUSIC_L50 dataset, the accuracy of WithConv improves by about 1.5%. This shows that convolutional self-attention, which models each item through its local correlations so that the item representation incorporates features of the surrounding sequence fragment, can effectively improve model accuracy.
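The segment-sensitive item representation described above can be sketched as follows; the window width w and the use of plain matrix-valued filters are illustrative assumptions, and the patent's encoder additionally applies masked multi-head self-attention on top of such representations:

```python
import numpy as np

def local_conv_features(E, W):
    """Slide a width-w convolution over the session's item embeddings
    E (shape [n, d]) so each position's output mixes the surrounding
    segment. W has shape [w, d, d]; zero padding keeps the output the
    same length as the input."""
    n, d = E.shape
    w = W.shape[0]
    pad = w // 2
    Ep = np.vstack([np.zeros((pad, d)), E, np.zeros((pad, d))])
    out = np.zeros_like(E)
    for i in range(n):
        # out[i] = sum_k Ep[i+k] @ W[k]: the representation of item i
        # now carries features of the session fragment around it
        for k in range(w):
            out[i] += Ep[i + k] @ W[k]
    return out
```

As a sanity check, a filter bank that is zero everywhere except an identity matrix at the window center reproduces the input embeddings unchanged.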
This embodiment also compares the performance of different decoders: the model using a fully connected decoder is denoted Full, and the model using a bidirectional linear transform decoder is denoted BiLinear. To isolate the improvement the bidirectional linear transform decoder brings to the recommendation system, only the convolution operation is used in the encoder, without Gaussian weights; the invention compares the two decoders on the evaluation indexes and the training time, as shown in Tables 3 and 4.
Table 3 comparative experiments for different decoders
Table 4 training time for different decoders
Through comparative analysis, the following conclusions can be drawn:
BiLinear performs best on all three datasets. On the YOO dataset, BiLinear is 0.3%-0.6% higher than Full on the six evaluation indexes. On MUSIC_M20 and MUSIC_L50, BiLinear is about 0.2% and 0.15% higher than Full on the evaluation indexes, respectively.
The training time of BiLinear is significantly shorter than that of Full. The size of the parameter matrix in the fully connected decoder depends on the size of the item space, so its robustness is poor, whereas the size of the bidirectional transform matrix of the BiLinear decoder remains unchanged. Clearly, using the bidirectional linear transform decoder yields higher model precision, fewer model parameters, and better robustness.
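The size argument can be made concrete with a sketch under illustrative dimensions (d is the hidden size, V the number of items; all names and values are assumptions): the fully connected decoder needs a d×V matrix that grows with the item space, while the bidirectional linear decoder needs only a fixed d×d matrix and reuses the item embedding matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
d, V = 64, 10000                 # illustrative hidden size and item count
E = rng.standard_normal((V, d))  # item embedding matrix, shared with the encoder
h = rng.standard_normal(d)       # session hidden-layer representation

# Fully connected decoder: one weight per (hidden unit, item) pair,
# so the parameter count scales with the item space.
W_full = rng.standard_normal((d, V))
scores_full = h @ W_full                 # raw scores over all V items

# Bidirectional linear transform decoder: a fixed d x d matrix that
# scores items through the shared embedding matrix.
B = rng.standard_normal((d, d))
scores_bilinear = (h @ B) @ E.T          # raw scores over all V items

assert W_full.size == d * V              # 640000 parameters, grows with V
assert B.size == d * d                   # 4096 parameters, independent of V
```

Both decoders produce a score per item, but the bilinear variant's parameter count does not change when new items are added, which matches the robustness argument above.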
This embodiment also compares the effect of the Gaussian offset on model performance, as shown in Table 5. The comparison models use the convolution operation with the repetition-exploration decoder. The model not using the Gaussian offset is denoted NoGaus, and the model applying the Gaussian offset weight factor only in the repeat recommendation decoder is denoted OnlyDec.
TABLE 5 influence of Gaussian offset weighting factors on model Performance experiments
From the experimental results of Table 5, it can be concluded that using the Gaussian offset weight factor in the repeat recommendation decoder effectively improves model performance.
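The Gaussian offset weight factor can be sketched as follows: positions are weighted by a Gaussian whose mean sits on the last item of the session, so recently clicked items dominate the repeat recommendation decoder. The standard deviation `sigma` is an illustrative assumption; the patent does not fix its value in this embodiment.

```python
import math

def gaussian_offset_weights(n, sigma=2.0):
    """Weight each of the n session positions by a Gaussian centred on
    the last position (index n - 1), so that recent items receive
    larger weights; the weights are normalised to sum to 1."""
    center = n - 1
    w = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]
    s = sum(w)
    return [x / s for x in w]
```

For a session of length 5 the last position receives the largest weight and the weights decay monotonically toward the start of the session.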
The above embodiment is only a preferred embodiment of the present invention, but is not intended to limit it. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, all technical schemes obtained by equivalent substitution or equivalent transformation fall within the protection scope of the invention.

Claims (4)

1. A session recommendation method based on a convolution self-attention network is characterized by comprising the following steps:
s1: giving a conversation as input, and acquiring a low-dimensional vector of each article in the conversation, wherein the low-dimensional vector is formed by adding article embedding and position embedding of the article in the conversation;
s2: modeling sequence information of a session by using a sequence encoder based on a convolution self-attention network on the basis of the low-dimensional vector obtained in the step S1, modeling key intention information of the session by using an intention encoder based on the convolution self-attention network and a Gaussian attention mechanism, and calculating Gaussian weights;
the method for modeling the sequence information of the session by the sequence encoder based on the convolution self-attention network comprises the following steps:
s211: capturing the characteristics of the conversation fragments around each article in the conversation by using convolution operation, and interacting with the characteristics when modeling the article representation to obtain the article representation sensitive to the conversation fragments;
s212: based on the item representation obtained in S211, capturing interdependencies between different items in the session using the self-attention network, modeling sequence information of the session; the self-attention network in S212 is a masked multi-head self-attention network;
the specific method for modeling key intention information of a conversation by an intention encoder based on a convolution self-attention network and a Gaussian attention mechanism and calculating Gaussian weights comprises the following steps:
s221: capturing interdependencies between different items using a convolutional self-attention network based on the item representation obtained in S211;
s222: calculating the weight of each item in the session by using an attention mechanism based on the item representations obtained in S221, wherein the weighted sum of different item representations is the key intention information of the session; then taking the last article in the session as the expected center of Gaussian distribution, and calculating Gaussian weights of the articles subjected to Gaussian shift;
s3: splicing the sequence information and the key intention information obtained in the S2 to obtain a session hidden layer representation, and inputting the session hidden layer representation into a repetition-exploration selector to predict the probability of the user to select repeated or unrepeated articles next; then, the conditional probability of each repeated object is calculated in the repeated recommendation decoder, the conditional probability of each non-repeated object is calculated in the exploration recommendation decoder, and the edge probabilities output by the two decoders are added to obtain the prediction probability of the model on all possible objects.
2. The convolutional self-attention network-based session recommendation method of claim 1, wherein the specific method for predicting the probability of a user next selecting a duplicate or non-duplicate item in the duplicate-explore selector is:
splicing the sequence information and the key intention information, inputting the result into a linear network layer for mapping transformation, and normalizing it through a softmax layer to obtain the repeat recommendation probability and the explore recommendation probability, which are used to decide whether a clicked item or a non-clicked item is recommended to the user.
3. The session recommendation method based on a convolution self-attention network according to claim 1, wherein the specific method for calculating the conditional probability of each duplicate item in the duplicate recommendation decoder is as follows:
taking the Gaussian weights obtained in S222 as input, and integrating them to calculate the conditional probability that the user next clicks each repeated item.
4. The session recommendation method based on a convolutional self-attention network as recited in claim 1, wherein the specific method for calculating the conditional probability of each nonrepeating item in the exploration recommendation decoder is as follows:
splicing the sequence information and the key intention information to obtain the session hidden-layer representation, mapping it onto the non-clicked items through a bidirectional linear transformation matrix and the item embedding matrix, and finally normalizing through a softmax layer to obtain the conditional probability that the user next clicks each non-repeated item.
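The probability combination in claims 1 to 4 can be sketched as follows (a minimal sketch with assumed names; raw decoder scores are taken as given here, whereas the actual decoders compute them with the mechanisms claimed above):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of raw scores."""
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def combine_decoders(mode_scores, repeat_scores, explore_scores):
    """mode_scores: two raw scores from the repetition-exploration
    selector; repeat_scores / explore_scores: dicts mapping items
    already in / not in the session to raw decoder scores. Each
    decoder's conditional distribution is weighted by its mode
    probability, and the marginal probabilities are added."""
    p_repeat, p_explore = softmax(mode_scores)
    rep_items, exp_items = list(repeat_scores), list(explore_scores)
    rep_probs = softmax([repeat_scores[i] for i in rep_items])
    exp_probs = softmax([explore_scores[i] for i in exp_items])
    probs = {}
    for i, p in zip(rep_items, rep_probs):
        probs[i] = probs.get(i, 0.0) + p_repeat * p
    for i, p in zip(exp_items, exp_probs):
        probs[i] = probs.get(i, 0.0) + p_explore * p
    return probs
```

With equal selector scores and uniform decoder scores, a single repeated item 'a' receives probability 0.5 while two explore items 'b' and 'c' receive 0.25 each, and the distribution sums to 1.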
CN202010969069.0A 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network Active CN112258262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010969069.0A CN112258262B (en) 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network


Publications (2)

Publication Number Publication Date
CN112258262A CN112258262A (en) 2021-01-22
CN112258262B true CN112258262B (en) 2023-09-26

Family

ID=74231420


Country Status (1)

Country Link
CN (1) CN112258262B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255780B (en) * 2021-05-28 2024-05-03 润联智能科技股份有限公司 Reduction gearbox fault prediction method and device, computer equipment and storage medium
CN113343097B (en) * 2021-06-24 2023-01-13 中山大学 Sequence recommendation method and system based on fragment and self-attention mechanism
CN113821724B (en) * 2021-09-23 2023-10-20 湖南大学 Time interval enhancement-based graph neural network recommendation method
CN113961816B (en) * 2021-11-26 2022-07-01 重庆理工大学 Graph convolution neural network session recommendation method based on structure enhancement
CN114791983B (en) * 2022-04-13 2023-04-07 湖北工业大学 Sequence recommendation method based on time sequence article similarity
CN115906863B (en) * 2022-10-25 2023-09-12 华南师范大学 Emotion analysis method, device, equipment and storage medium based on contrast learning

Citations (8)

Publication number Priority date Publication date Assignee Title
CN110119467A (en) * 2019-05-14 2019-08-13 苏州大学 A kind of dialogue-based item recommendation method, device, equipment and storage medium
CN110196946A (en) * 2019-05-29 2019-09-03 华南理工大学 A kind of personalized recommendation method based on deep learning
CN110619082A (en) * 2019-09-20 2019-12-27 苏州市职业大学 Project recommendation method based on repeated search mechanism
CN111080400A (en) * 2019-11-25 2020-04-28 中山大学 Commodity recommendation method and system based on gate control graph convolution network and storage medium
CN111127165A (en) * 2019-12-26 2020-05-08 纪信智达(广州)信息技术有限公司 Sequence recommendation method based on self-attention self-encoder
CN111241425A (en) * 2019-10-17 2020-06-05 陕西师范大学 POI recommendation method based on hierarchical attention mechanism
CN111259243A (en) * 2020-01-14 2020-06-09 中山大学 Parallel recommendation method and system based on session
CN111429234A (en) * 2020-04-16 2020-07-17 电子科技大学中山学院 Deep learning-based commodity sequence recommendation method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US11521326B2 (en) * 2018-05-23 2022-12-06 Prove Labs, Inc. Systems and methods for monitoring and evaluating body movement


Non-Patent Citations (3)

Title
Dual Sequential Prediction Models Linking Sequential Recommendation and Information Dissemination; Qitian Wu et al.; Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; full text *
Rethinking the Item Order in Session-based Recommendation with Graph Neural Networks; Ruihong Qiu et al.; Proceedings of the 28th ACM International Conference on Information and Knowledge Management; full text *
Joint Deep Recommendation Model Based on a Dual-Layer Attention Mechanism; Liu Huiting; Ji Qiang; Liu Huimin; Zhao Peng; Journal of South China University of Technology (Natural Science Edition), No. 6; full text *

Also Published As

Publication number Publication date
CN112258262A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN112258262B (en) Session recommendation method based on convolution self-attention network
Wu et al. Session-based recommendation with graph neural networks
Yoon et al. Data valuation using reinforcement learning
CN110196946B (en) Personalized recommendation method based on deep learning
Wang et al. Session-based recommendation with hypergraph attention networks
CN112381581B (en) Advertisement click rate estimation method based on improved Transformer
CN108647251B (en) Recommendation sorting method based on wide-depth gate cycle combination model
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN111209386B (en) Personalized text recommendation method based on deep learning
CN111199343A (en) Multi-model fusion tobacco market supervision abnormal data mining method
CN111753209B (en) Sequence recommendation list generation method based on improved time sequence convolution network
CN114493755B (en) Self-attention sequence recommendation method fusing time sequence information
Pan et al. A variational point process model for social event sequences
CN110781401A (en) Top-n project recommendation method based on collaborative autoregressive flow
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
Choe et al. Recommendation system with hierarchical recurrent neural network for long-term time series
Yakhchi et al. Towards a deep attention-based sequential recommender system
Goldblum et al. The no free lunch theorem, kolmogorov complexity, and the role of inductive biases in machine learning
Cheng et al. Long-term effect estimation with surrogate representation
CN113609388A (en) Sequence recommendation method based on counterfactual user behavior sequence generation
CN115048855A (en) Click rate prediction model, training method and application device thereof
Lim et al. Regular time-series generation using SGM
CN117196763A (en) Commodity sequence recommending method based on time sequence perception self-attention and contrast learning
CN111079011A (en) Deep learning-based information recommendation method
CN116680456A (en) User preference prediction method based on graph neural network session recommendation system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant