CN112258262A - Conversation recommendation method based on convolution self-attention network - Google Patents

Conversation recommendation method based on convolution self-attention network

Info

Publication number
CN112258262A
CN112258262A (application CN202010969069.0A)
Authority
CN
China
Prior art keywords
conversation
item
article
self-attention network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010969069.0A
Other languages
Chinese (zh)
Other versions
CN112258262B (en)
Inventor
张寅
汪千缘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010969069.0A priority Critical patent/CN112258262B/en
Publication of CN112258262A publication Critical patent/CN112258262A/en
Application granted granted Critical
Publication of CN112258262B publication Critical patent/CN112258262B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a session recommendation method based on a convolutional self-attention network. The method comprises the following steps: 1) each item in the session is represented as a low-dimensional vector, formed by adding the item embedding and the position embedding; 2) sequence modeling and intention modeling are performed on the low-dimensional vectors, where sequence modeling captures the sequence information of the session and intention modeling captures its key intention information; 3) based on the concatenated sequence information and key intention information, the method selectively predicts whether the user will next click a repeated item or a non-repeated item. Compared with the prior art, the method captures the interdependencies between different fragments in a session and obtains session-fragment-sensitive item representations. It then uses a bilinear decoder, which reduces the number of model parameters and improves the performance and robustness of the model. Finally, it improves the attention layer with a Gaussian offset and computes Gaussian weight factors, improving the performance of the repeat recommendation decoder.

Description

Conversation recommendation method based on convolution self-attention network
Technical Field
The invention relates to the application of neural network methods to session recommendation, and in particular to a technique that captures local session-fragment features with convolution operations and enriches the information carried by the weight factors with a Gaussian offset.
Background
"Information overload" is a common problem of the big data era. How to obtain valuable information from massive, complex data is a key problem in the development of big data technology. Recommendation Systems (RS) are an effective means of addressing information overload. A recommendation system models consumers and their interaction information from the historical interactions between consumers and a website, mines the consumers' interests and preferences, filters and ranks the vast space of choices, and finally makes personalized recommendations.
Conventional personalized recommendation systems usually need user profiles to personalize their recommendations. Many e-commerce recommendation systems (especially those of small retailers) and most news and media websites do not track the identities of users who visit them over long periods. Browser caches can provide some information that helps a website identify and profile users, but these techniques are often unreliable and may raise privacy concerns. Session-based recommendation instead predicts the user's next behavior from a sequence of anonymous behaviors over a period of time (e.g., click, purchase, favorite, add-to-cart). Such an anonymous behavior sequence is referred to herein as a "session", and a behavior within a session as an "item".
In recent years, deep learning techniques such as recurrent neural networks and self-attention networks have been successfully applied to session recommendation. Compared with Recurrent Neural Networks (RNNs), Self-Attention Networks (SANs) have clear advantages in modeling long-term dependencies and avoiding information forgetting, but existing models still have three problems:
1) Local dependencies are ignored. Local dependency refers to the interdependence between different sequence fragments within a session. A sequence fragment is a more abstract feature unit than a single item. Capturing local dependencies when modeling an item yields better item representations and improves prediction accuracy.
2) Conventional fully-connected decoders have a huge number of parameters, long training times, and poor robustness.
3) The influence of the click order of items in a session on repeat recommendation is ignored. In repeat consumption, the item the user clicks next is more likely to be one of the most recently clicked items.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing a session recommendation method based on a convolution self-attention network. The method captures the local dependencies within a session with an encoder based on a convolutional self-attention network, obtains session-fragment-sensitive item representations, and improves session modeling. It uses a bilinear decoder to reduce the number of model parameters and improve performance and robustness. It models the distance between each item and the last item in the session with Gaussian weights, improving the performance of the repeat recommendation decoder.
The technical scheme adopted by the invention is as follows:
A conversation recommendation method based on a convolution self-attention network comprises the following steps:
S1: giving a session as input, and obtaining a low-dimensional vector for each item in the session, wherein the low-dimensional vector is formed by adding the item embedding and the position embedding of the item in the session;
S2: on the basis of the low-dimensional vectors obtained in S1, modeling the sequence information of the session with a sequence encoder based on a convolutional self-attention network, modeling the key intention information of the session with an intent encoder based on the convolutional self-attention network and a Gaussian attention mechanism, and computing the Gaussian weights;
S3: concatenating the sequence information and the key intention information obtained in S2 to obtain a session hidden representation, and feeding it into a repeat-explore selector to predict the probability that the user next selects a repeated or non-repeated item; then computing the conditional probability of each repeated item in a repeat recommendation decoder and the conditional probability of each non-repeated item in an explore recommendation decoder, and adding the marginal probabilities output by the two decoders to obtain the model's predicted probability for every possible item.
Preferably, the method for modeling the sequence information of the session by the sequence encoder based on the convolutional self-attention network comprises the following steps:
S211: capturing the session-fragment features around each item in the session with convolution operations, these features interacting when the item representations are modeled, to obtain session-fragment-sensitive item representations;
S212: based on the item representations obtained in S211, capturing the interdependencies between different items in the session with a self-attention network to model the sequence information of the session.
Further, the self-attention network in S212 is a masked multi-head self-attention network.
Further, the specific method by which the intent encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights includes:
S221: based on the item representations obtained in S211, capturing the interdependencies between different items with a convolutional self-attention network;
S222: based on the item representations obtained in S221, computing the weight of each item in the session with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then computing the Gaussian-offset weight of each item, with the last item of the session as the expected center of the Gaussian distribution.
Further, the specific method by which the repeat-explore selector predicts the probability that the user next selects a repeated or non-repeated item is as follows:
the sequence information and the key intention information are concatenated, fed into a linear network layer for a mapping transformation, and normalized through a softmax layer to obtain the repeat recommendation probability and the explore recommendation probability, which decide whether a clicked or an unclicked item is recommended to the user.
Further, the specific method for computing the conditional probability of each repeated item in the repeat recommendation decoder is as follows:
the Gaussian weights obtained in S222 are taken as input and aggregated to compute the conditional probability that the user next clicks each repeated item.
Further, the specific method for computing the conditional probability of each non-repeated item in the explore recommendation decoder is as follows:
the sequence information and the key intention information are concatenated into a session hidden representation, mapped onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalized through a softmax layer to obtain the conditional probability that the user next clicks each unclicked item.
Compared with the prior art, the method captures the interdependencies between different fragments in a session and obtains session-fragment-sensitive item representations. It uses a bilinear decoder, reducing the number of model parameters and improving performance and robustness. Finally, it improves the attention layer with a Gaussian offset and computes Gaussian weight factors that encode the positional distance between each item and the last item of the session, improving the performance of the repeat recommendation decoder.
Drawings
FIG. 1 is a flow chart of a session recommendation method based on a convolutional self-attention network;
FIG. 2 is the overall framework of the invention.
FIG. 3 is a block diagram of a multi-headed convolutional self-attention network.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description.
As shown in fig. 1, the present invention provides a session recommendation method based on a convolutional self-attention network, which includes the following steps:
s1: given a conversation as an input, a low-dimensional vector for each item within the conversation is obtained, the low-dimensional vector being formed by adding the item embedding and the position embedding of the item in the conversation.
S2: on the basis of the low-dimensional vector obtained at S1, sequence information of the session is modeled using a sequence encoder based on the convolutional self-attention network, key intention information of the session is modeled using an intention encoder based on the convolutional self-attention network and the gaussian attention mechanism, and gaussian weights are calculated.
The sequence encoder based on the convolutional self-attention network models the sequence information of the session as follows:
S211: the session-fragment features around each item in the session are captured with convolution operations, and these features interact when the item representations are modeled, yielding session-fragment-sensitive item representations.
S212: based on the item representations obtained in S211, a self-attention network captures the interdependencies between different items in the session to model the sequence information of the session; the self-attention network is preferably a masked multi-head self-attention network.
The intent encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights as follows:
S221: based on the item representations obtained in S211, the interdependencies between different items are captured with a convolutional self-attention network;
S222: based on the item representations obtained in S221, the weight of each item in the session is computed with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then the Gaussian-offset weight of each item is computed, with the last item of the session as the expected center of the Gaussian distribution.
S3: The sequence information and the key intention information obtained in S2 are concatenated into a session hidden representation and fed into the repeat-explore selector, which predicts the probability that the user next selects a repeated or non-repeated item; then the conditional probability of each repeated item is computed in the repeat recommendation decoder, the conditional probability of each non-repeated item is computed in the explore recommendation decoder, and the marginal probabilities output by the two decoders are added to obtain the model's predicted probability for every possible item.
The specific method by which the repeat-explore selector predicts the probability that the user next selects a repeated or non-repeated item is:
the sequence information and the key intention information are concatenated, fed into a linear network layer for a mapping transformation, and normalized through a softmax layer to obtain the repeat recommendation probability and the explore recommendation probability, which decide whether a clicked or an unclicked item is recommended to the user.
The specific method for computing the conditional probability of each repeated item in the repeat recommendation decoder is:
the Gaussian weights obtained in S222 are taken as input and aggregated to compute the conditional probability that the user next clicks each repeated item.
The specific method for computing the conditional probability of each non-repeated item in the explore recommendation decoder is:
the sequence information and the key intention information are concatenated into a session hidden representation, mapped onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalized through a softmax layer to obtain the conditional probability that the user next clicks each unclicked item.
In order to further illustrate the specific implementation of the present invention, the above method is applied to specific embodiments.
Examples
The overall framework of the method in this example is shown in FIG. 2. To facilitate the following discussion and to keep the notation uniform, this section first fixes the notation for the terms used below. The relevant mathematical symbols and their meanings are shown in Table 1.
TABLE 1 Session recommendation related mathematical symbols and meanings
(Table 1 is rendered as an image in the original publication and is not reproduced here.)
The invention relates to a session recommendation method based on a convolution self-attention network, which specifically comprises the following steps:
Step 1. Obtain a vector representation of each item
1.1) For a given input session, the input item sequence [x0, x1, …, xt-1, xt] is mapped by the item embedding matrix emb from item indices to a sequence of real-valued vectors in a low-dimensional space, giving the item embedding representation.
1.2) To supplement the positional order of the items in the session, a position encoding is additionally added. A trigonometric position encoding is used, computed as follows:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos is the position of the item in the session, i indexes the embedding dimensions, d_model is the dimension of the position encoding (equal to the item embedding dimension), and the indices 2i and 2i+1 distinguish the even and odd dimensions.
1.3) The item embedding representation and the position encoding are added to obtain the final item vector representation [x0, x1, …, xt-1, xt].
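For illustration, the following is a minimal PyTorch sketch of step 1; the module name and the constructor arguments are assumptions made for the example and are not part of the invention:

    import math
    import torch
    import torch.nn as nn

    class SessionItemEmbedding(nn.Module):
        """Item embedding plus trigonometric position encoding (step 1)."""
        def __init__(self, num_items: int, d_model: int, max_len: int = 50):
            super().__init__()
            self.emb = nn.Embedding(num_items, d_model, padding_idx=0)
            # Precompute PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and
            #            PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).
            pe = torch.zeros(max_len, d_model)
            pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
            div = torch.exp(torch.arange(0, d_model, 2).float()
                            * (-math.log(10000.0) / d_model))
            pe[:, 0::2] = torch.sin(pos * div)
            pe[:, 1::2] = torch.cos(pos * div)
            self.register_buffer("pe", pe)

        def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
            # item_ids: (batch, seq_len) -> item vectors (batch, seq_len, d_model)
            return self.emb(item_ids) + self.pe[: item_ids.size(1)]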
Step 2. Model the sequence information and the key intention information, and compute the Gaussian weights
2.1) With the result of step 1, [x0, x1, …, xt-1, xt], as input, a sequence encoder based on a Convolutional Self-Attention Network (ConvSAN) models the sequence information. It captures the local features around each item, obtains session-fragment-sensitive item representations, and outputs the hidden representations [h0, h1, …, ht]. The last hidden state, ht, contains the sequence information of the session. The computation performed by the whole network can be expressed as:
[h0, h1, …, ht] = ConvSAN([x0, x1, …, xt])
ConvSAN contains two sublayers: a multi-head convolutional self-attention layer and a feedforward neural network layer. The input and output of each sublayer are connected by a residual connection followed by layer normalization; the residual connection helps gradients propagate backward, and layer normalization accelerates model convergence. The calculation formula is:
SubLayerOutput = LayerNorm(x + SubLayer(x))
The multi-head convolutional self-attention network is introduced next; its overall framework is shown in FIG. 3. Q, K and V are the Query (Q), Key (K) and Value (V) vectors of the network, respectively. To capture the features of the fragment around each item, both Q and K are convolved with a convolution kernel of size k. When item i is modeled in this way, a sequence-fragment feature of length k around the item is extracted by convolution ("around" here means to the left of item i, to prevent future information leakage), and any two items interact through these features. Denote the convolved Q and K vectors of item i as Qconv_i and Kconv_i, and its V vector as V_i. The computation is:
Qconv_i = [x_{i-k+1}, …, x_i] W_Q + B_Q
Kconv_i = [x_{i-k+1}, …, x_i] W_K + B_K
V_i = x_i
where W_Q and B_Q are the weight matrix and bias of the convolution operation on Q; likewise, W_K and B_K are those of the convolution operation on K (all W and B appearing in the following expressions are trainable parameters and are not described again). Self-attention is computed next:
Attention(Qconv, Kconv, V) = softmax(Qconv (Kconv)^T / sqrt(d_k)) V
where 1/sqrt(d_k) is a scaling factor that prevents the product Qconv (Kconv)^T from growing so large that it enters the saturation region of the softmax function; d_k is the dimension of Kconv.
Through the multi-head mechanism, Qconv, Kconv and V are mapped into several subspaces of the same dimension, and the attention results computed in the different subspaces are concatenated, which helps the network capture richer information:
MultiHead(Qconv, Kconv, V) = concat(h_0, h_1, …, h_t)
where the attention computed in the i-th subspace is:
h_i = Attention(Qconv W_i^Q, Kconv W_i^K, V W_i^V)
The feedforward neural network layer is introduced next. The output of the convolutional self-attention layer, after layer normalization and the residual connection, is the input of the feedforward layer, which applies two linear transformations and one ReLU activation:
FFN(x) = ReLU(x W_1 + b_1) W_2 + b_2
Note that in the session recommendation scenario, when modeling x_t only x_0, x_1, …, x_t are known, not x_{t+1}. Therefore, to prevent future information leakage, the invention adds a mask to the self-attention mechanism that hides the information after x_t, yielding Masked Multi-head Attention.
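The following is a minimal single-head sketch of this convolutional self-attention computation: causal width-k convolutions produce Qconv and Kconv, V is the input itself, and a mask hides future positions. The class and parameter names are assumptions; the patented model additionally uses multiple heads, residual connections, layer normalization and the feedforward sublayer described above:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvSelfAttention(nn.Module):
        """Single-head sketch: Q and K are causal width-k convolutions of the
        input, so each item attends through its local fragment; V = x."""
        def __init__(self, d_model: int, k: int = 3):
            super().__init__()
            self.k = k
            self.conv_q = nn.Conv1d(d_model, d_model, kernel_size=k)
            self.conv_k = nn.Conv1d(d_model, d_model, kernel_size=k)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            b, t, d = x.shape
            xc = F.pad(x.transpose(1, 2), (self.k - 1, 0))  # left pad -> causal
            q = self.conv_q(xc).transpose(1, 2)             # (b, t, d)
            k = self.conv_k(xc).transpose(1, 2)
            scores = q @ k.transpose(1, 2) / d ** 0.5       # (b, t, t)
            future = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                           device=x.device), diagonal=1)
            scores = scores.masked_fill(future, float("-inf"))  # mask x_{>i}
            return F.softmax(scores, dim=-1) @ x            # V = x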
2.2) Taking the result of step 1, [x0, x1, …, xt-1, xt], as input, an intent encoder models the key intention information of the session and computes the Gaussian weights.
The intent encoder contains two parts: a single ConvSAN layer and a Gaussian attention layer (GaussAtten). The session embeddings are fed into the ConvSAN layer, which outputs the hidden representations [h0^c, h1^c, …, ht^c]; these are fed into the GaussAtten layer, which outputs the hidden representation of the user's key intention, c_intent, together with the Gaussian weight factors weightGauss. The computation performed in the network is as follows:
[h0^c, h1^c, …, ht^c] = ConvSAN([x0, x1, …, xt])
(c_intent, weightGauss) = GaussAtten([h0^c, h1^c, …, ht^c])
the present invention first introduces alphatjtjIs a weighting factor, with a greater weighting factor indicating a conversation
Figure BDA0002683400830000087
In
Figure BDA0002683400830000088
The greater the specific gravity occupied.
Figure BDA0002683400830000089
Is a Gaussian weightA factor. They are calculated as follows:
Figure BDA00026834008300000810
Figure BDA00026834008300000811
q is a function computing the similarity between ht^c and hj^c. The similarity function is calculated as:
q(ht^c, hj^c) = v^T σ(A1 ht^c + A2 hj^c)
where σ is an activation function, which may be a sigmoid or a softmax function (the model of this embodiment uses sigmoid). A1 is a linear transformation matrix mapping ht^c into the hidden space; A2 and v serve the same purpose.
Then α̃_tj is introduced; compared with α_tj it adds a term G_tj. First the matrix G ∈ R^{I×I} is introduced: a position alignment matrix based on the Gaussian distribution, where I is the session length. G_tj is an element of this matrix that measures the closeness between item j and the center item t; it is computed as:
G_tj = -(j - P_t)^2 / (2 σ_t^2)
where σ_t is the standard deviation, usually set to half of the Gaussian window D_t; j is the position of item j in the session; P_t is the predicted center position for item t, corresponding to the expectation; and G_tj < 0. The predicted center position P_t and the Gaussian window D_t are both learned:
P_t = I · sigmoid(p_t),  D_t = I · sigmoid(z_t)
Clearly, P_t and D_t are restricted to the range (0, I). p_t and z_t are scalars, computed as follows:
p_t = U_p^T tanh(W_p ht^c)
z_t = U_d^T tanh(W_p ht^c)
where W_p ∈ R^{H×H} and H is the dimension of ht^c; U_p and U_d are linear mapping matrices that map the output to a scalar. They share the same W_p because, when the weights of the other items are Gaussian-offset with the last item of the session as the center, the expectation and the variance of the Gaussian distribution may be correlated.
The output of the sequence encoder, ht, is concatenated with the output of the intent encoder, c_intent, to obtain the final session hidden representation:
c_t = [ht ; c_intent]
c_t is fed to the subsequent decoders, and weightGauss is fed to the repeat recommendation decoder.
Step 3. Predict the probability of the user clicking each item next with the repeat-explore decoders and make recommendations
3.1) The probability of the user clicking a repeated item or not is computed with a Repeat-Explore Selector (RES). RES acts as a binary classifier that decides whether to recommend a clicked item (repeat mechanism) or an unclicked item (explore mechanism) to the user. It has two parts: the first is a linear transformation layer that maps the session hidden representation to scores for the two mechanisms; the second is a softmax layer that computes the normalized probabilities:
[P(r | [x0, x1, …, xt]), P(e | [x0, x1, …, xt])] = softmax(c_t W_re)
where P(r | [x0, x1, …, xt]) is the repeat mechanism probability and P(e | [x0, x1, …, xt]) is the explore mechanism probability; W_re ∈ R^{H×2} is a weight matrix and H is the dimension of c_t.
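A minimal sketch of RES (names assumed):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RepeatExploreSelector(nn.Module):
        """Binary classifier over the two mechanisms (sketch)."""
        def __init__(self, hidden_dim: int):
            super().__init__()
            self.w_re = nn.Linear(hidden_dim, 2, bias=False)  # W_re in R^{H x 2}

        def forward(self, c_t: torch.Tensor) -> torch.Tensor:
            # c_t: (batch, H) -> (batch, 2) = [P(r | session), P(e | session)]
            return F.softmax(self.w_re(c_t), dim=-1)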
3.2) A Repeat Recommendation Decoder (D_R for short) computes the probability of the user clicking repeated items under the repeat mechanism. Its input is weightGauss and its output is the conditional probability distribution over the clicked items:
P(x_i | r, [x0, x1, …, xt]) = Σ_{j: x_j = x_i} α̃_tj
that is, the sum of the Gaussian weight factors of all occurrences of x_i in [x0, x1, …, xt], since the same item x_i may appear multiple times in a session.
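This aggregation can be sketched as a scatter-add of the Gaussian weight factors over the item vocabulary (function and argument names are assumptions; padding positions are assumed to carry zero weight):

    import torch

    def repeat_decoder(weight_gauss: torch.Tensor,
                       session_items: torch.Tensor,
                       num_items: int) -> torch.Tensor:
        """Sum the Gaussian weight factors of every occurrence of each item,
        since the same item may appear several times in a session (sketch)."""
        # weight_gauss, session_items: (batch, seq_len)
        probs = torch.zeros(weight_gauss.size(0), num_items,
                            device=weight_gauss.device)
        probs.scatter_add_(1, session_items, weight_gauss)
        return probs  # P(x_i | r, session) for clicked items, 0 elsewhere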
3.3) An Explore Recommendation Decoder (D_E for short) computes the scores of the items not yet clicked by the user. It has two parts: the first is a bilinear transformation layer that maps the encoder representation onto the unclicked items; the second is a softmax function that normalizes the scores into probabilities:
P(x_i | e, [x0, x1, …, xt]) = softmax(f)_{x_i}
f = c_t B emb^T
where emb is the item embedding matrix and B is a bilinear transformation matrix of size H × D, with H the dimension of c_t and D the item embedding dimension.
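A sketch of D_E with the bilinear score f = c_t B emb^T and a mask restricting the softmax to unclicked items (names assumed):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ExploreDecoder(nn.Module):
        """Bilinear scoring over the unclicked items (sketch)."""
        def __init__(self, hidden_dim: int, emb_dim: int):
            super().__init__()
            self.B = nn.Parameter(torch.randn(hidden_dim, emb_dim) * 0.01)

        def forward(self, c_t, item_emb, clicked_mask):
            # c_t: (batch, H); item_emb: (num_items, D);
            # clicked_mask: (batch, num_items), True for already-clicked items.
            scores = c_t @ self.B @ item_emb.t()           # f = c_t B emb^T
            scores = scores.masked_fill(clicked_mask, float("-inf"))
            return F.softmax(scores, dim=-1)               # P(x_i | e, session)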
The two recommendation scores are combined to obtain the recommendation scores of all items. Taking the prediction of a single item x_i as an example:
P(x_i) = P(x_i | r, [x0, x1, …, xt]) · P(r | [x0, x1, …, xt]) + P(x_i | e, [x0, x1, …, xt]) · P(e | [x0, x1, …, xt])
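With the assumed names from the sketches above, this marginalization is a single line:

    # p_re[:, 0] = P(r | session), p_re[:, 1] = P(e | session)
    p_final = p_re[:, 0:1] * p_repeat + p_re[:, 1:2] * p_explore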
Step 4. Optimize the model parameters with an optimizer and iterate until the model converges
This embodiment uses a cross-entropy loss function and the Adam optimizer. The loss function is:
L = -(1/m) Σ_i Σ_k y_{i,k} log(p_{i,k})
where m is the number of samples; y_{i,k} denotes the k-th class of sample i, 1 for the positive class and 0 for the negative classes; and p_{i,k} is the predicted probability of the k-th class of sample i. In this embodiment the positive class is the next item clicked by the user, and the negative classes are all other items.
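A minimal training-step sketch with this loss and the Adam optimizer, assuming a `model` that wraps the modules sketched above and returns the final probability distribution:

    import torch
    import torch.nn.functional as F

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    p_final = model(batch_sessions)                # (batch, num_items)
    # Cross-entropy on probabilities: NLL of the log of the predicted
    # distribution at the true next item `target`.
    loss = F.nll_loss(torch.log(p_final + 1e-12), target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()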
In order to test the practical effect of the above session recommendation model based on the convolutional self-attention network, the following tests were performed on the corresponding data sets.
1. Download the LASTFM and Yoochoose data sets, select the listening records of LASTFM and the purchase records of Yoochoose as the data, and preprocess them. For the Yoochoose data set, sequences shorter than 3 are first deleted. Statistics show that only 4% of the processed data has a session length greater than 10, so sessions longer than 10 are deleted and the remaining data is kept. This data set represents short sessions and is abbreviated YOO in the subsequent experiments. The LASTFM data set is used to generate two data sets of greater length. Specifically: for the first of these, 2000 played tracks are randomly selected and all records containing them are filtered out; then the maximum session length L is set to 20, and sessions of length at most 20 are generated with a sliding window of size L and step L; finally, sessions in which two items are separated by more than 2 hours are discarded in this embodiment, because the time interval is too long. This data set is abbreviated MUSIC_M20 and represents medium-length sessions. For the third data set, this embodiment randomly selects 20000 tracks, sets L to 50, and in the same way as MUSIC_M20 generates a third data set, MUSIC_L50, representing long sessions. Each of the three data sets is randomly split into a training set and a test set, accounting for 55% and 45% of the whole data set, respectively; 5% of the training set is then randomly chosen as the validation set. In addition, this embodiment performs data enhancement on the training set. Specifically, for a session [x0, x1, …, xt-1, xt] with length greater than 2, multiple sub-sessions [0, 0, …, x0, x1], [0, …, x0, x1, x2], …, [x0, x1, …, xt-1, xt] are generated by zero padding. These additionally generated subsets of the training set are referred to as the child data set-training set (hereinafter "child data set-T"). Data enhancement is performed only on the training set.
2. Set the evaluation metrics. To comprehensively evaluate the practical effect of the invention, six evaluation metrics are used in this embodiment: MRR@5, HR@5, NDCG@5 and MRR@20, HR@20, NDCG@20.
3. The model runs a fixed number of iterations. Each iteration proceeds as follows: a batch of session data is randomly drawn from the training set and fed to the encoders and decoders to produce prediction outputs; the loss is computed from the predictions and the true labels of the sessions, and back-propagation updates the model parameters. Model performance is monitored on the validation set with the six evaluation metrics, and the parameters that perform best on the validation set are selected as the optimal parameters. The test-set results under those parameters are reported as the final performance of the model.
This embodiment compares the effect of the presence or absence of the convolution operation in the encoder on model performance, where NoConv denotes the model without the convolution operation and WithConv the model with it. To isolate the effect of the convolution operation on the encoder, no Gaussian weights are used in these models, and a bilinear decoder is used. The results are shown in Table 2:
TABLE 2 comparative experimental results with and without convolution operation
(Table 2 is rendered as an image in the original publication and is not reproduced here.)
From the experimental results in Table 2 the following conclusions can be drawn: on the YOO data set, WithConv and NoConv perform almost identically, with a gap of about 0.05% on the evaluation metrics. On the MUSIC_M20 data set, the accuracy of WithConv improves over NoConv by about 1%; on the MUSIC_L50 data set, by about 1.5%. The method uses convolutional self-attention to model items with local dependencies, folding the features of the surrounding sequence fragments into the item representations, which effectively improves model accuracy.
This embodiment also compares the performance of different decoders; the model with a fully-connected decoder is denoted Full and the model with a bilinear decoder BiLinear. To isolate the improvement in accuracy contributed by the bilinear decoder, the encoder uses only the convolution operation, without Gaussian weights. The two decoders are compared on the evaluation metrics and the training time, as shown in Tables 3 and 4.
TABLE 3 comparative experiments with different decoders
(Table 3 is rendered as an image in the original publication and is not reproduced here.)
TABLE 4 training times of different decoders
(Table 4 is rendered as an image in the original publication and is not reproduced here.)
Comparative analysis supports the following conclusions:
BiLinear performs best on all three data sets. On the YOO data set, BiLinear is 0.3%-0.6% higher than Full on the six evaluation metrics; on MUSIC_M20 and MUSIC_L50 it is about 0.2% and 0.15% higher, respectively.
The training time of BiLinear is significantly shorter than that of Full. The size of the parameter matrix in the fully-connected decoder depends on the size of the item space, and its robustness is poor, whereas the bilinear transformation matrix of the BiLinear decoder keeps a fixed size. Clearly, with the bilinear decoder the model is more accurate, has fewer parameters, and is more robust.
This example also compares the effect of the Gaussian offset on model performance, as shown in Table 5. The compared models use the convolution operation and the repeat-explore decoder. NoGauss denotes the model without the Gaussian offset, and OnlyDec the model that applies the Gaussian offset weight factors only in the repeat recommendation decoder.
TABLE 5 experiment of the Effect of Gaussian offset weighting factors on model Performance
(Table 5 is rendered as an image in the original publication and is not reproduced here.)
From the experimental results in Table 5 it can be concluded that using the Gaussian offset weight factors in the repeat recommendation decoder effectively improves the performance of the model.
The above-described embodiments are merely preferred embodiments of the present invention and should not be construed as limiting it. Various changes and modifications may be made by those of ordinary skill in the pertinent art without departing from the spirit and scope of the invention; therefore, technical solutions obtained by equivalent replacement or equivalent transformation fall within the protection scope of the invention.

Claims (7)

1. A conversation recommendation method based on a convolution self-attention network is characterized by comprising the following steps:
S1: giving a session as input, and obtaining a low-dimensional vector for each item in the session, wherein the low-dimensional vector is formed by adding the item embedding and the position embedding of the item in the session;
S2: on the basis of the low-dimensional vectors obtained in S1, modeling the sequence information of the session with a sequence encoder based on a convolutional self-attention network, modeling the key intention information of the session with an intent encoder based on the convolutional self-attention network and a Gaussian attention mechanism, and computing the Gaussian weights;
S3: concatenating the sequence information and the key intention information obtained in S2 to obtain a session hidden representation, and feeding it into a repeat-explore selector to predict the probability that the user next selects a repeated or non-repeated item; then computing the conditional probability of each repeated item in a repeat recommendation decoder and the conditional probability of each non-repeated item in an explore recommendation decoder, and adding the marginal probabilities output by the two decoders to obtain the model's predicted probability for every possible item.
2. The session recommendation method based on a convolutional self-attention network as claimed in claim 1, wherein the sequence encoder based on the convolutional self-attention network models the sequence information of the session as follows:
S211: capturing the session-fragment features around each item in the session with convolution operations, these features interacting when the item representations are modeled, to obtain session-fragment-sensitive item representations;
S212: based on the item representations obtained in S211, capturing the interdependencies between different items in the session with a self-attention network to model the sequence information of the session.
3. The session recommendation method based on a convolutional self-attention network as claimed in claim 2, wherein the self-attention network in S212 is a masked multi-head self-attention network.
4. The session recommendation method based on a convolutional self-attention network as claimed in claim 2, wherein the intent encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights by:
S221: based on the item representations obtained in S211, capturing the interdependencies between different items with a convolutional self-attention network;
S222: based on the item representations obtained in S221, computing the weight of each item in the session with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then computing the Gaussian-offset weight of each item, with the last item of the session as the expected center of the Gaussian distribution.
5. The method of claim 1, wherein the specific method by which the repeat-explore selector predicts the probability that the user next selects a repeated or non-repeated item is as follows:
concatenating the sequence information and the key intention information, feeding them into a linear network layer for a mapping transformation, and normalizing through a softmax layer to obtain the repeat recommendation probability and the explore recommendation probability, which decide whether a clicked or an unclicked item is recommended to the user.
6. The session recommendation method based on a convolutional self-attention network as claimed in claim 1, wherein the specific method for computing the conditional probability of each repeated item in the repeat recommendation decoder is as follows:
taking the Gaussian weights obtained in S222 as input, and aggregating them to compute the conditional probability that the user next clicks each repeated item.
7. The session recommendation method based on a convolutional self-attention network as claimed in claim 1, wherein the specific method for computing the conditional probability of each non-repeated item in the explore recommendation decoder is as follows:
concatenating the sequence information and the key intention information into a session hidden representation, mapping it onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalizing through a softmax layer to obtain the conditional probability that the user next clicks each unclicked item.
CN202010969069.0A 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network Active CN112258262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010969069.0A CN112258262B (en) 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010969069.0A CN112258262B (en) 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network

Publications (2)

Publication Number Publication Date
CN112258262A true CN112258262A (en) 2021-01-22
CN112258262B CN112258262B (en) 2023-09-26

Family

ID=74231420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010969069.0A Active CN112258262B (en) 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network

Country Status (1)

Country Link
CN (1) CN112258262B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255780A (en) * 2021-05-28 2021-08-13 润联软件系统(深圳)有限公司 Reduction gearbox fault prediction method and device, computer equipment and storage medium
CN113343097A (en) * 2021-06-24 2021-09-03 中山大学 Sequence recommendation method and system based on fragment and self-attention mechanism
CN113821724A (en) * 2021-09-23 2021-12-21 湖南大学 Graph neural network recommendation method based on time interval enhancement
CN113961816A (en) * 2021-11-26 2022-01-21 重庆理工大学 Graph convolution neural network session recommendation method based on structure enhancement
CN114791983A (en) * 2022-04-13 2022-07-26 湖北工业大学 Sequence recommendation method based on time sequence article similarity
CN115906863A (en) * 2022-10-25 2023-04-04 华南师范大学 Emotion analysis method, device and equipment based on comparative learning and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110119467A (en) * 2019-05-14 2019-08-13 苏州大学 A kind of dialogue-based item recommendation method, device, equipment and storage medium
CN110196946A (en) * 2019-05-29 2019-09-03 华南理工大学 A kind of personalized recommendation method based on deep learning
US20190362506A1 (en) * 2018-05-23 2019-11-28 Prove Labs, Inc. Systems and methods for monitoring and evaluating body movement
CN110619082A (en) * 2019-09-20 2019-12-27 苏州市职业大学 Project recommendation method based on repeated search mechanism
CN111080400A (en) * 2019-11-25 2020-04-28 中山大学 Commodity recommendation method and system based on gate control graph convolution network and storage medium
CN111127165A (en) * 2019-12-26 2020-05-08 纪信智达(广州)信息技术有限公司 Sequence recommendation method based on self-attention self-encoder
CN111241425A (en) * 2019-10-17 2020-06-05 陕西师范大学 POI recommendation method based on hierarchical attention mechanism
CN111259243A (en) * 2020-01-14 2020-06-09 中山大学 Parallel recommendation method and system based on session
CN111429234A (en) * 2020-04-16 2020-07-17 电子科技大学中山学院 Deep learning-based commodity sequence recommendation method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190362506A1 (en) * 2018-05-23 2019-11-28 Prove Labs, Inc. Systems and methods for monitoring and evaluating body movement
CN110119467A (en) * 2019-05-14 2019-08-13 苏州大学 A kind of dialogue-based item recommendation method, device, equipment and storage medium
CN110196946A (en) * 2019-05-29 2019-09-03 华南理工大学 A kind of personalized recommendation method based on deep learning
CN110619082A (en) * 2019-09-20 2019-12-27 苏州市职业大学 Project recommendation method based on repeated search mechanism
CN111241425A (en) * 2019-10-17 2020-06-05 陕西师范大学 POI recommendation method based on hierarchical attention mechanism
CN111080400A (en) * 2019-11-25 2020-04-28 中山大学 Commodity recommendation method and system based on gate control graph convolution network and storage medium
CN111127165A (en) * 2019-12-26 2020-05-08 纪信智达(广州)信息技术有限公司 Sequence recommendation method based on self-attention self-encoder
CN111259243A (en) * 2020-01-14 2020-06-09 中山大学 Parallel recommendation method and system based on session
CN111429234A (en) * 2020-04-16 2020-07-17 电子科技大学中山学院 Deep learning-based commodity sequence recommendation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QITIAN WU et al.: "Dual Sequential Prediction Models Linking Sequential Recommendation and Information Dissemination", Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
RUIHONG QIU et al.: "Rethinking the Item Order in Session-based Recommendation with Graph Neural Networks", Proceedings of the 28th ACM International Conference on Information and Knowledge Management
LIU Huiting; JI Qiang; LIU Huimin; ZHAO Peng: "Joint deep recommendation model based on a double-layer attention mechanism", Journal of South China University of Technology (Natural Science Edition), no. 06

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255780A (en) * 2021-05-28 2021-08-13 润联软件系统(深圳)有限公司 Reduction gearbox fault prediction method and device, computer equipment and storage medium
CN113255780B (en) * 2021-05-28 2024-05-03 润联智能科技股份有限公司 Reduction gearbox fault prediction method and device, computer equipment and storage medium
CN113343097A (en) * 2021-06-24 2021-09-03 中山大学 Sequence recommendation method and system based on fragment and self-attention mechanism
CN113821724A (en) * 2021-09-23 2021-12-21 湖南大学 Graph neural network recommendation method based on time interval enhancement
CN113821724B (en) * 2021-09-23 2023-10-20 湖南大学 Time interval enhancement-based graph neural network recommendation method
CN113961816A (en) * 2021-11-26 2022-01-21 重庆理工大学 Graph convolution neural network session recommendation method based on structure enhancement
CN114791983A (en) * 2022-04-13 2022-07-26 湖北工业大学 Sequence recommendation method based on time sequence article similarity
CN114791983B (en) * 2022-04-13 2023-04-07 湖北工业大学 Sequence recommendation method based on time sequence article similarity
CN115906863A (en) * 2022-10-25 2023-04-04 华南师范大学 Emotion analysis method, device and equipment based on comparative learning and storage medium
CN115906863B (en) * 2022-10-25 2023-09-12 华南师范大学 Emotion analysis method, device, equipment and storage medium based on contrast learning

Also Published As

Publication number Publication date
CN112258262B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
Wu et al. Session-based recommendation with graph neural networks
CN112258262B (en) Session recommendation method based on convolution self-attention network
Yoon et al. Data valuation using reinforcement learning
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN112381581B (en) Advertisement click rate estimation method based on improved Transformer
CN111753209B (en) Sequence recommendation list generation method based on improved time sequence convolution network
CN110781401A (en) Top-n project recommendation method based on collaborative autoregressive flow
Sina Mirabdolbaghi et al. Model optimization analysis of customer churn prediction using machine learning algorithms with focus on feature reductions
Choe et al. Recommendation system with hierarchical recurrent neural network for long-term time series
CN113609388A (en) Sequence recommendation method based on counterfactual user behavior sequence generation
CN117196763A (en) Commodity sequence recommending method based on time sequence perception self-attention and contrast learning
CN113535964B (en) Enterprise classification model intelligent construction method, device, equipment and medium
CN111259264A (en) Time sequence scoring prediction method based on generation countermeasure network
CN112232388A (en) ELM-RFE-based shopping intention key factor identification method
CN115293812A (en) E-commerce platform session perception recommendation prediction method based on long-term and short-term interests
CN113010774B (en) Click rate prediction method based on dynamic deep attention model
CN112884019B (en) Image language conversion method based on fusion gate circulation network model
Duan et al. Context-aware short-term interest first model for session-based recommendation
Yan et al. Modeling long-and short-term user behaviors for sequential recommendation with deep neural networks
Szwabe et al. Logistic regression setup for RTB CTR estimation
CN112559905A (en) Conversation recommendation method based on dual-mode attention mechanism and social similarity
Burnap et al. Predicting" Design Gaps" in the Market: Deep Consumer Choice Models under Probabilistic Design Constraints
CN116992157B (en) Advertisement recommendation method based on biological neural network
CN116610783B (en) Service optimization method based on artificial intelligent decision and digital online page system
CN113688229B (en) Text recommendation method, system, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant