CN112258262A - Conversation recommendation method based on convolution self-attention network - Google Patents

Conversation recommendation method based on convolution self-attention network

Info

Publication number
CN112258262A
CN112258262A (application CN202010969069.0A)
Authority
CN
China
Prior art keywords
conversation
item
article
self-attention network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010969069.0A
Other languages
Chinese (zh)
Other versions
CN112258262B (en)
Inventor
张寅
汪千缘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010969069.0A priority Critical patent/CN112258262B/en
Publication of CN112258262A publication Critical patent/CN112258262A/en
Application granted granted Critical
Publication of CN112258262B publication Critical patent/CN112258262B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a session recommendation method based on a convolutional self-attention network. The method comprises the following steps: 1) each item in the session is represented as a low-dimensional vector, formed by adding the item embedding and the position embedding; 2) sequence modeling and intention modeling are performed on the low-dimensional vectors, where sequence modeling captures the sequence information of the session and intention modeling captures its key intention information; 3) based on the concatenated sequence information and key intention information, the method selectively predicts whether the user will next click a repeated item or a non-repeated item. Compared with the prior art, the method captures the interdependencies between different fragments in a session and obtains session-fragment-sensitive item representations. It then uses a bilinear decoder, which reduces the number of model parameters and improves the performance and robustness of the model. Finally, it improves the attention layer with a Gaussian offset and computes Gaussian weight factors, improving the performance of the repeat recommendation decoder.

Description

Conversation recommendation method based on convolution self-attention network
Technical Field
The invention relates to the application of neural network methods to session recommendation, and in particular to a technique that captures local session-fragment features with convolution operations and enriches the information carried by the weight factors with a Gaussian offset.
Background
"Information overload" is a common problem of the big data era. How to obtain valuable information from massive, complex data is a key problem in the development of big data technology. Recommendation Systems (RS) are an effective means of addressing information overload. A recommendation system models consumers and their interaction information from the historical interactions between consumers and a website, mines the consumers' interests and preferences, filters and ranks the vast space of choices, and finally makes personalized recommendations.
Conventional personalized recommendation systems usually need user profiles to personalize their recommendations. Many e-commerce recommendation systems (especially those of small retailers) and most news and media websites do not track the identities of users who visit them over long periods. Browser caches can provide some information that helps a website identify and profile users, but these techniques are often unreliable and may raise privacy concerns. Session-based recommendation instead predicts the user's next behavior from a sequence of anonymous behaviors over a period of time (e.g., click, purchase, favorite, add-to-cart). Such an anonymous behavior sequence is referred to herein as a "session", and a behavior within a session as an "item".
In recent years, deep learning techniques such as recurrent neural networks and self-attention networks have been successfully applied to session recommendation. Compared with Recurrent Neural Networks (RNNs), Self-Attention Networks (SANs) have clear advantages in modeling long-term dependencies and avoiding information forgetting, but existing models still have three problems:
1) Local dependencies are ignored. Local dependency refers to the interdependence between different sequence fragments within a session. A sequence fragment is a more abstract feature unit than a single item. Capturing local dependencies when modeling an item yields better item representations and improves prediction accuracy.
2) Conventional fully-connected decoders have a huge number of parameters, long training times, and poor robustness.
3) The influence of the click order of items in a session on repeat recommendation is ignored. In repeat consumption, the item the user clicks next is more likely to be one of the most recently clicked items.
Disclosure of Invention
The invention aims to solve the above problems in the prior art by providing a session recommendation method based on a convolution self-attention network. The method captures the local dependencies within a session with an encoder based on a convolutional self-attention network, obtains session-fragment-sensitive item representations, and improves session modeling. It uses a bilinear decoder to reduce the number of model parameters and improve performance and robustness. It models the distance between each item and the last item in the session with Gaussian weights, improving the performance of the repeat recommendation decoder.
The technical scheme adopted by the invention is as follows:
A conversation recommendation method based on a convolution self-attention network comprises the following steps:
S1: giving a session as input, and obtaining a low-dimensional vector for each item in the session, wherein the low-dimensional vector is formed by adding the item embedding and the position embedding of the item in the session;
S2: on the basis of the low-dimensional vectors obtained in S1, modeling the sequence information of the session with a sequence encoder based on a convolutional self-attention network, modeling the key intention information of the session with an intent encoder based on the convolutional self-attention network and a Gaussian attention mechanism, and computing the Gaussian weights;
S3: concatenating the sequence information and the key intention information obtained in S2 to obtain a session hidden representation, and feeding it into a repeat-explore selector to predict the probability that the user next selects a repeated or non-repeated item; then computing the conditional probability of each repeated item in a repeat recommendation decoder and the conditional probability of each non-repeated item in an explore recommendation decoder, and adding the marginal probabilities output by the two decoders to obtain the model's predicted probability for every possible item.
Preferably, the method for modeling the sequence information of the session by the sequence encoder based on the convolutional self-attention network comprises the following steps:
S211: capturing the session-fragment features around each item in the session with convolution operations, these features interacting when the item representations are modeled, to obtain session-fragment-sensitive item representations;
S212: based on the item representations obtained in S211, capturing the interdependencies between different items in the session with a self-attention network to model the sequence information of the session.
Further, the self-attention network in S212 is a masked multi-head self-attention network.
Further, the specific method by which the intent encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights includes:
S221: based on the item representations obtained in S211, capturing the interdependencies between different items with a convolutional self-attention network;
S222: based on the item representations obtained in S221, computing the weight of each item in the session with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then computing the Gaussian-offset weight of each item, with the last item of the session as the expected center of the Gaussian distribution.
Further, the specific method by which the repeat-explore selector predicts the probability that the user next selects a repeated or non-repeated item is as follows:
the sequence information and the key intention information are concatenated, fed into a linear network layer for a mapping transformation, and normalized through a softmax layer to obtain the repeat recommendation probability and the explore recommendation probability, which decide whether a clicked or an unclicked item is recommended to the user.
Further, the specific method for computing the conditional probability of each repeated item in the repeat recommendation decoder is as follows:
the Gaussian weights obtained in S222 are taken as input and aggregated to compute the conditional probability that the user next clicks each repeated item.
Further, the specific method for computing the conditional probability of each non-repeated item in the explore recommendation decoder is as follows:
the sequence information and the key intention information are concatenated into a session hidden representation, mapped onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalized through a softmax layer to obtain the conditional probability that the user next clicks each unclicked item.
Compared with the prior art, the method captures the interdependencies between different fragments in a session and obtains session-fragment-sensitive item representations. It uses a bilinear decoder, reducing the number of model parameters and improving performance and robustness. Finally, it improves the attention layer with a Gaussian offset and computes Gaussian weight factors that encode the positional distance between each item and the last item of the session, improving the performance of the repeat recommendation decoder.
Drawings
FIG. 1 is a flow chart of a session recommendation method based on a convolutional self-attention network;
FIG. 2 is the overall framework of the invention.
FIG. 3 is a block diagram of a multi-headed convolutional self-attention network.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and the detailed description.
As shown in fig. 1, the present invention provides a session recommendation method based on a convolutional self-attention network, which includes the following steps:
s1: given a conversation as an input, a low-dimensional vector for each item within the conversation is obtained, the low-dimensional vector being formed by adding the item embedding and the position embedding of the item in the conversation.
S2: on the basis of the low-dimensional vector obtained at S1, sequence information of the session is modeled using a sequence encoder based on the convolutional self-attention network, key intention information of the session is modeled using an intention encoder based on the convolutional self-attention network and the gaussian attention mechanism, and gaussian weights are calculated.
The sequence encoder based on the convolutional self-attention network models the sequence information of the session as follows:
S211: the session-fragment features around each item in the session are captured with convolution operations, and these features interact when the item representations are modeled, yielding session-fragment-sensitive item representations.
S212: based on the item representations obtained in S211, a self-attention network captures the interdependencies between different items in the session to model the sequence information of the session; the self-attention network is preferably a masked multi-head self-attention network.
The intent encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights as follows:
S221: based on the item representations obtained in S211, the interdependencies between different items are captured with a convolutional self-attention network;
S222: based on the item representations obtained in S221, the weight of each item in the session is computed with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then the Gaussian-offset weight of each item is computed, with the last item of the session as the expected center of the Gaussian distribution.
S3: The sequence information and the key intention information obtained in S2 are concatenated into a session hidden representation and fed into the repeat-explore selector, which predicts the probability that the user next selects a repeated or non-repeated item; then the conditional probability of each repeated item is computed in the repeat recommendation decoder, the conditional probability of each non-repeated item is computed in the explore recommendation decoder, and the marginal probabilities output by the two decoders are added to obtain the model's predicted probability for every possible item.
The specific method by which the repeat-explore selector predicts the probability that the user next selects a repeated or non-repeated item is:
the sequence information and the key intention information are concatenated, fed into a linear network layer for a mapping transformation, and normalized through a softmax layer to obtain the repeat recommendation probability and the explore recommendation probability, which decide whether a clicked or an unclicked item is recommended to the user.
The specific method for computing the conditional probability of each repeated item in the repeat recommendation decoder is:
the Gaussian weights obtained in S222 are taken as input and aggregated to compute the conditional probability that the user next clicks each repeated item.
The specific method for computing the conditional probability of each non-repeated item in the explore recommendation decoder is:
the sequence information and the key intention information are concatenated into a session hidden representation, mapped onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalized through a softmax layer to obtain the conditional probability that the user next clicks each unclicked item.
In order to further illustrate the specific implementation of the present invention, the above method is applied to specific embodiments.
Examples
The overall framework of the method in this example is shown in FIG. 2. To facilitate the following discussion and to keep the notation uniform, this section first fixes the notation for the terms used below. The relevant mathematical symbols and their meanings are shown in Table 1.
TABLE 1 Session recommendation related mathematical symbols and meanings
(Table 1 is rendered as an image in the original publication and is not reproduced here.)
The invention relates to a session recommendation method based on a convolution self-attention network, which specifically comprises the following steps:
Step 1. Obtain a vector representation of each item
1.1) For a given input session, the input item sequence [x0, x1, …, xt-1, xt] is mapped by the item embedding matrix emb from item indices to a sequence of real-valued vectors in a low-dimensional space, giving the item embedding representation.
1.2) To supplement the positional order of the items in the session, a position encoding is additionally added. A trigonometric position encoding is used, computed as follows:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
where pos is the position of the item in the session, i indexes the embedding dimensions, d_model is the dimension of the position encoding (equal to the item embedding dimension), and the indices 2i and 2i+1 distinguish the even and odd dimensions.
1.3) The item embedding representation and the position encoding are added to obtain the final item vector representation [x0, x1, …, xt-1, xt].
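For illustration, the following is a minimal PyTorch sketch of step 1; the module name and the constructor arguments are assumptions made for the example and are not part of the invention:

    import math
    import torch
    import torch.nn as nn

    class SessionItemEmbedding(nn.Module):
        """Item embedding plus trigonometric position encoding (step 1)."""
        def __init__(self, num_items: int, d_model: int, max_len: int = 50):
            super().__init__()
            self.emb = nn.Embedding(num_items, d_model, padding_idx=0)
            # Precompute PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and
            #            PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).
            pe = torch.zeros(max_len, d_model)
            pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
            div = torch.exp(torch.arange(0, d_model, 2).float()
                            * (-math.log(10000.0) / d_model))
            pe[:, 0::2] = torch.sin(pos * div)
            pe[:, 1::2] = torch.cos(pos * div)
            self.register_buffer("pe", pe)

        def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
            # item_ids: (batch, seq_len) -> item vectors (batch, seq_len, d_model)
            return self.emb(item_ids) + self.pe[: item_ids.size(1)]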
Step 2. Model the sequence information and the key intention information, and compute the Gaussian weights
2.1) With the result of step 1, [x0, x1, …, xt-1, xt], as input, a sequence encoder based on a Convolutional Self-Attention Network (ConvSAN) models the sequence information. It captures the local features around each item, obtains session-fragment-sensitive item representations, and outputs the hidden representations [h0, h1, …, ht]. The last hidden state, ht, contains the sequence information of the session. The computation performed by the whole network can be expressed as:
[h0, h1, …, ht] = ConvSAN([x0, x1, …, xt])
ConvSAN contains two sublayers: a multi-head convolutional self-attention layer and a feedforward neural network layer. The input and output of each sublayer are connected by a residual connection followed by layer normalization; the residual connection helps gradients propagate backward, and layer normalization accelerates model convergence. The calculation formula is:
SubLayerOutput = LayerNorm(x + SubLayer(x))
The multi-head convolutional self-attention network is introduced next; its overall framework is shown in FIG. 3. Q, K and V are the Query (Q), Key (K) and Value (V) vectors of the network, respectively. To capture the features of the fragment around each item, both Q and K are convolved with a convolution kernel of size k. When item i is modeled in this way, a sequence-fragment feature of length k around the item is extracted by convolution ("around" here means to the left of item i, to prevent future information leakage), and any two items interact through these features. Denote the convolved Q and K vectors of item i as Qconv_i and Kconv_i, and its V vector as V_i. The computation is:
Qconv_i = [x_{i-k+1}, …, x_i] W_Q + B_Q
Kconv_i = [x_{i-k+1}, …, x_i] W_K + B_K
V_i = x_i
where W_Q and B_Q are the weight matrix and bias of the convolution operation on Q; likewise, W_K and B_K are those of the convolution operation on K (all W and B appearing in the following expressions are trainable parameters and are not described again). Self-attention is computed next:
Attention(Qconv, Kconv, V) = softmax(Qconv (Kconv)^T / sqrt(d_k)) V
where 1/sqrt(d_k) is a scaling factor that prevents the product Qconv (Kconv)^T from growing so large that it enters the saturation region of the softmax function; d_k is the dimension of Kconv.
Through the multi-head mechanism, Qconv, Kconv and V are mapped into several subspaces of the same dimension, and the attention results computed in the different subspaces are concatenated, which helps the network capture richer information:
MultiHead(Qconv, Kconv, V) = concat(h_0, h_1, …, h_t)
where the attention computed in the i-th subspace is:
h_i = Attention(Qconv W_i^Q, Kconv W_i^K, V W_i^V)
The feedforward neural network layer is introduced next. The output of the convolutional self-attention layer, after layer normalization and the residual connection, is the input of the feedforward layer, which applies two linear transformations and one ReLU activation:
FFN(x) = ReLU(x W_1 + b_1) W_2 + b_2
Note that in the session recommendation scenario, when modeling x_t only x_0, x_1, …, x_t are known, not x_{t+1}. Therefore, to prevent future information leakage, the invention adds a mask to the self-attention mechanism that hides the information after x_t, yielding Masked Multi-head Attention.
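The following is a minimal single-head sketch of this convolutional self-attention computation: causal width-k convolutions produce Qconv and Kconv, V is the input itself, and a mask hides future positions. The class and parameter names are assumptions; the patented model additionally uses multiple heads, residual connections, layer normalization and the feedforward sublayer described above:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConvSelfAttention(nn.Module):
        """Single-head sketch: Q and K are causal width-k convolutions of the
        input, so each item attends through its local fragment; V = x."""
        def __init__(self, d_model: int, k: int = 3):
            super().__init__()
            self.k = k
            self.conv_q = nn.Conv1d(d_model, d_model, kernel_size=k)
            self.conv_k = nn.Conv1d(d_model, d_model, kernel_size=k)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, d_model)
            b, t, d = x.shape
            xc = F.pad(x.transpose(1, 2), (self.k - 1, 0))  # left pad -> causal
            q = self.conv_q(xc).transpose(1, 2)             # (b, t, d)
            k = self.conv_k(xc).transpose(1, 2)
            scores = q @ k.transpose(1, 2) / d ** 0.5       # (b, t, t)
            future = torch.triu(torch.ones(t, t, dtype=torch.bool,
                                           device=x.device), diagonal=1)
            scores = scores.masked_fill(future, float("-inf"))  # mask x_{>i}
            return F.softmax(scores, dim=-1) @ x            # V = x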
2.2) Taking the result of step 1, [x0, x1, …, xt-1, xt], as input, an intent encoder models the key intention information of the session and computes the Gaussian weights.
The intent encoder contains two parts: a single ConvSAN layer and a Gaussian attention layer (GaussAtten). The session embeddings are fed into the ConvSAN layer, which outputs the hidden representations [h0^c, h1^c, …, ht^c]; these are fed into the GaussAtten layer, which outputs the hidden representation of the user's key intention, c_intent, together with the Gaussian weight factors weightGauss. The computation performed in the network is as follows:
[h0^c, h1^c, …, ht^c] = ConvSAN([x0, x1, …, xt])
(c_intent, weightGauss) = GaussAtten([h0^c, h1^c, …, ht^c])
the present invention first introduces alphatjtjIs a weighting factor, with a greater weighting factor indicating a conversation
Figure BDA0002683400830000087
In
Figure BDA0002683400830000088
The greater the specific gravity occupied.
Figure BDA0002683400830000089
Is a Gaussian weightA factor. They are calculated as follows:
Figure BDA00026834008300000810
Figure BDA00026834008300000811
q is a function computing the similarity between ht^c and hj^c. The similarity function is calculated as:
q(ht^c, hj^c) = v^T σ(A1 ht^c + A2 hj^c)
where σ is an activation function, which may be a sigmoid or a softmax function (the model of this embodiment uses sigmoid). A1 is a linear transformation matrix mapping ht^c into the hidden space; A2 and v serve the same purpose.
Then α̃_tj is introduced; compared with α_tj it adds a term G_tj. First the matrix G ∈ R^{I×I} is introduced: a position alignment matrix based on the Gaussian distribution, where I is the session length. G_tj is an element of this matrix that measures the closeness between item j and the center item t; it is computed as:
G_tj = -(j - P_t)^2 / (2 σ_t^2)
where σ_t is the standard deviation, usually set to half of the Gaussian window D_t; j is the position of item j in the session; P_t is the predicted center position for item t, corresponding to the expectation; and G_tj < 0. The predicted center position P_t and the Gaussian window D_t are both learned:
P_t = I · sigmoid(p_t),  D_t = I · sigmoid(z_t)
Clearly, P_t and D_t are restricted to the range (0, I). p_t and z_t are scalars, computed as follows:
p_t = U_p^T tanh(W_p ht^c)
z_t = U_d^T tanh(W_p ht^c)
where W_p ∈ R^{H×H} and H is the dimension of ht^c; U_p and U_d are linear mapping matrices that map the output to a scalar. They share the same W_p because, when the weights of the other items are Gaussian-offset with the last item of the session as the center, the expectation and the variance of the Gaussian distribution may be correlated.
The output of the sequence encoder, ht, is concatenated with the output of the intent encoder, c_intent, to obtain the final session hidden representation:
c_t = [ht ; c_intent]
c_t is fed to the subsequent decoders, and weightGauss is fed to the repeat recommendation decoder.
Step 3. Predict the probability of the user clicking each item next with the repeat-explore decoders and make recommendations
3.1) The probability of the user clicking a repeated item or not is computed with a Repeat-Explore Selector (RES). RES acts as a binary classifier that decides whether to recommend a clicked item (repeat mechanism) or an unclicked item (explore mechanism) to the user. It has two parts: the first is a linear transformation layer that maps the session hidden representation to scores for the two mechanisms; the second is a softmax layer that computes the normalized probabilities:
[P(r | [x0, x1, …, xt]), P(e | [x0, x1, …, xt])] = softmax(c_t W_re)
where P(r | [x0, x1, …, xt]) is the repeat mechanism probability and P(e | [x0, x1, …, xt]) is the explore mechanism probability; W_re ∈ R^{H×2} is a weight matrix and H is the dimension of c_t.
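A minimal sketch of RES (names assumed):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RepeatExploreSelector(nn.Module):
        """Binary classifier over the two mechanisms (sketch)."""
        def __init__(self, hidden_dim: int):
            super().__init__()
            self.w_re = nn.Linear(hidden_dim, 2, bias=False)  # W_re in R^{H x 2}

        def forward(self, c_t: torch.Tensor) -> torch.Tensor:
            # c_t: (batch, H) -> (batch, 2) = [P(r | session), P(e | session)]
            return F.softmax(self.w_re(c_t), dim=-1)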
3.2) A Repeat Recommendation Decoder (D_R for short) computes the probability of the user clicking repeated items under the repeat mechanism. Its input is weightGauss and its output is the conditional probability distribution over the clicked items:
P(x_i | r, [x0, x1, …, xt]) = Σ_{j: x_j = x_i} α̃_tj
that is, the sum of the Gaussian weight factors of all occurrences of x_i in [x0, x1, …, xt], since the same item x_i may appear multiple times in a session.
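This aggregation can be sketched as a scatter-add of the Gaussian weight factors over the item vocabulary (function and argument names are assumptions; padding positions are assumed to carry zero weight):

    import torch

    def repeat_decoder(weight_gauss: torch.Tensor,
                       session_items: torch.Tensor,
                       num_items: int) -> torch.Tensor:
        """Sum the Gaussian weight factors of every occurrence of each item,
        since the same item may appear several times in a session (sketch)."""
        # weight_gauss, session_items: (batch, seq_len)
        probs = torch.zeros(weight_gauss.size(0), num_items,
                            device=weight_gauss.device)
        probs.scatter_add_(1, session_items, weight_gauss)
        return probs  # P(x_i | r, session) for clicked items, 0 elsewhere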
3.3) An Explore Recommendation Decoder (D_E for short) computes the scores of the items not yet clicked by the user. It has two parts: the first is a bilinear transformation layer that maps the encoder representation onto the unclicked items; the second is a softmax function that normalizes the scores into probabilities:
P(x_i | e, [x0, x1, …, xt]) = softmax(f)_{x_i}
f = c_t B emb^T
where emb is the item embedding matrix and B is a bilinear transformation matrix of size H × D, with H the dimension of c_t and D the item embedding dimension.
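A sketch of D_E with the bilinear score f = c_t B emb^T and a mask restricting the softmax to unclicked items (names assumed):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ExploreDecoder(nn.Module):
        """Bilinear scoring over the unclicked items (sketch)."""
        def __init__(self, hidden_dim: int, emb_dim: int):
            super().__init__()
            self.B = nn.Parameter(torch.randn(hidden_dim, emb_dim) * 0.01)

        def forward(self, c_t, item_emb, clicked_mask):
            # c_t: (batch, H); item_emb: (num_items, D);
            # clicked_mask: (batch, num_items), True for already-clicked items.
            scores = c_t @ self.B @ item_emb.t()           # f = c_t B emb^T
            scores = scores.masked_fill(clicked_mask, float("-inf"))
            return F.softmax(scores, dim=-1)               # P(x_i | e, session)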
The two recommendation scores are combined to obtain the recommendation scores of all items. Taking the prediction of a single item x_i as an example:
P(x_i) = P(x_i | r, [x0, x1, …, xt]) · P(r | [x0, x1, …, xt]) + P(x_i | e, [x0, x1, …, xt]) · P(e | [x0, x1, …, xt])
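With the assumed names from the sketches above, this marginalization is a single line:

    # p_re[:, 0] = P(r | session), p_re[:, 1] = P(e | session)
    p_final = p_re[:, 0:1] * p_repeat + p_re[:, 1:2] * p_explore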
Step 4. Optimize the model parameters with an optimizer and iterate until the model converges
This embodiment uses a cross-entropy loss function and the Adam optimizer. The loss function is:
L = -(1/m) Σ_i Σ_k y_{i,k} log(p_{i,k})
where m is the number of samples; y_{i,k} denotes the k-th class of sample i, 1 for the positive class and 0 for the negative classes; and p_{i,k} is the predicted probability of the k-th class of sample i. In this embodiment the positive class is the next item clicked by the user, and the negative classes are all other items.
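A minimal training-step sketch with this loss and the Adam optimizer, assuming a `model` that wraps the modules sketched above and returns the final probability distribution:

    import torch
    import torch.nn.functional as F

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    p_final = model(batch_sessions)                # (batch, num_items)
    # Cross-entropy on probabilities: NLL of the log of the predicted
    # distribution at the true next item `target`.
    loss = F.nll_loss(torch.log(p_final + 1e-12), target)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()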
In order to test the practical effect of the above session recommendation model based on the convolutional self-attention network, the following tests were performed on the corresponding data sets.
1. Download the LASTFM and Yoochoose data sets, select the listening records of LASTFM and the purchase records of Yoochoose as the data, and preprocess them. For the Yoochoose data set, sequences shorter than 3 are first deleted. Statistics show that only 4% of the processed data has a session length greater than 10, so sessions longer than 10 are deleted and the remaining data is kept. This data set represents short sessions and is abbreviated YOO in the subsequent experiments. The LASTFM data set is used to generate two data sets of greater length. Specifically: for the first of these, 2000 played tracks are randomly selected and all records containing them are filtered out; then the maximum session length L is set to 20, and sessions of length at most 20 are generated with a sliding window of size L and step L; finally, sessions in which two items are separated by more than 2 hours are discarded in this embodiment, because the time interval is too long. This data set is abbreviated MUSIC_M20 and represents medium-length sessions. For the third data set, this embodiment randomly selects 20000 tracks, sets L to 50, and in the same way as MUSIC_M20 generates a third data set, MUSIC_L50, representing long sessions. Each of the three data sets is randomly split into a training set and a test set, accounting for 55% and 45% of the whole data set, respectively; 5% of the training set is then randomly chosen as the validation set. In addition, this embodiment performs data enhancement on the training set. Specifically, for a session [x0, x1, …, xt-1, xt] with length greater than 2, multiple sub-sessions [0, 0, …, x0, x1], [0, …, x0, x1, x2], …, [x0, x1, …, xt-1, xt] are generated by zero padding. These additionally generated subsets of the training set are referred to as the child data set-training set (hereinafter "child data set-T"). Data enhancement is performed only on the training set.
2. Set the evaluation metrics. To comprehensively evaluate the practical effect of the invention, six evaluation metrics are used in this embodiment: MRR@5, HR@5, NDCG@5 and MRR@20, HR@20, NDCG@20.
3. The model runs a fixed number of iterations. Each iteration proceeds as follows: a batch of session data is randomly drawn from the training set and fed to the encoders and decoders to produce prediction outputs; the loss is computed from the predictions and the true labels of the sessions, and back-propagation updates the model parameters. Model performance is monitored on the validation set with the six evaluation metrics, and the parameters that perform best on the validation set are selected as the optimal parameters. The test-set results under those parameters are reported as the final performance of the model.
This embodiment compares the effect of the presence or absence of the convolution operation in the encoder on model performance, where NoConv denotes the model without the convolution operation and WithConv the model with it. To isolate the effect of the convolution operation on the encoder, no Gaussian weights are used in these models, and a bilinear decoder is used. The results are shown in Table 2:
TABLE 2 comparative experimental results with and without convolution operation
(Table 2 is rendered as an image in the original publication and is not reproduced here.)
From the experimental results in Table 2 the following conclusions can be drawn: on the YOO data set, WithConv and NoConv perform almost identically, with a gap of about 0.05% on the evaluation metrics. On the MUSIC_M20 data set, the accuracy of WithConv improves over NoConv by about 1%; on the MUSIC_L50 data set, by about 1.5%. The method uses convolutional self-attention to model items with local dependencies, folding the features of the surrounding sequence fragments into the item representations, which effectively improves model accuracy.
This embodiment also compares the performance of different decoders; the model with a fully-connected decoder is denoted Full and the model with a bilinear decoder BiLinear. To isolate the improvement in accuracy contributed by the bilinear decoder, the encoder uses only the convolution operation, without Gaussian weights. The two decoders are compared on the evaluation metrics and the training time, as shown in Tables 3 and 4.
TABLE 3 comparative experiments with different decoders
(Table 3 is rendered as an image in the original publication and is not reproduced here.)
TABLE 4 training times of different decoders
(Table 4 is rendered as an image in the original publication and is not reproduced here.)
Comparative analysis supports the following conclusions:
BiLinear performs best on all three data sets. On the YOO data set, BiLinear is 0.3%-0.6% higher than Full on the six evaluation metrics; on MUSIC_M20 and MUSIC_L50 it is about 0.2% and 0.15% higher, respectively.
The training time of BiLinear is significantly shorter than that of Full. The size of the parameter matrix in the fully-connected decoder depends on the size of the item space, and its robustness is poor, whereas the bilinear transformation matrix of the BiLinear decoder keeps a fixed size. Clearly, with the bilinear decoder the model is more accurate, has fewer parameters, and is more robust.
This example also compares the effect of the Gaussian offset on model performance, as shown in Table 5. The compared models use the convolution operation and the repeat-explore decoder. NoGauss denotes the model without the Gaussian offset, and OnlyDec the model that applies the Gaussian offset weight factors only in the repeat recommendation decoder.
TABLE 5 experiment of the Effect of Gaussian offset weighting factors on model Performance
(Table 5 is rendered as an image in the original publication and is not reproduced here.)
From the experimental results in Table 5 it can be concluded that using the Gaussian offset weight factors in the repeat recommendation decoder effectively improves the performance of the model.
The above-described embodiments are merely preferred embodiments of the present invention and should not be construed as limiting it. Various changes and modifications may be made by those of ordinary skill in the pertinent art without departing from the spirit and scope of the invention; therefore, technical solutions obtained by equivalent replacement or equivalent transformation fall within the protection scope of the invention.

Claims (7)

1. A conversation recommendation method based on a convolution self-attention network is characterized by comprising the following steps:
S1: giving a session as input, and obtaining a low-dimensional vector for each item in the session, wherein the low-dimensional vector is formed by adding the item embedding and the position embedding of the item in the session;
S2: on the basis of the low-dimensional vectors obtained in S1, modeling the sequence information of the session with a sequence encoder based on a convolutional self-attention network, modeling the key intention information of the session with an intent encoder based on the convolutional self-attention network and a Gaussian attention mechanism, and computing the Gaussian weights;
S3: concatenating the sequence information and the key intention information obtained in S2 to obtain a session hidden representation, and feeding it into a repeat-explore selector to predict the probability that the user next selects a repeated or non-repeated item; then computing the conditional probability of each repeated item in a repeat recommendation decoder and the conditional probability of each non-repeated item in an explore recommendation decoder, and adding the marginal probabilities output by the two decoders to obtain the model's predicted probability for every possible item.
2. The session recommendation method based on a convolutional self-attention network as claimed in claim 1, wherein the sequence encoder based on the convolutional self-attention network models the sequence information of the session as follows:
S211: capturing the session-fragment features around each item in the session with convolution operations, these features interacting when the item representations are modeled, to obtain session-fragment-sensitive item representations;
S212: based on the item representations obtained in S211, capturing the interdependencies between different items in the session with a self-attention network to model the sequence information of the session.
3. The session recommendation method based on a convolutional self-attention network as claimed in claim 2, wherein the self-attention network in S212 is a masked multi-head self-attention network.
4. The session recommendation method based on a convolutional self-attention network as claimed in claim 2, wherein the intent encoder based on the convolutional self-attention network and the Gaussian attention mechanism models the key intention information of the session and computes the Gaussian weights by:
S221: based on the item representations obtained in S211, capturing the interdependencies between different items with a convolutional self-attention network;
S222: based on the item representations obtained in S221, computing the weight of each item in the session with an attention mechanism, the weighted sum of the item representations being the key intention information of the session; then computing the Gaussian-offset weight of each item, with the last item of the session as the expected center of the Gaussian distribution.
5. The method of claim 1, wherein the specific method by which the repeat-explore selector predicts the probability that the user next selects a repeated or non-repeated item is as follows:
concatenating the sequence information and the key intention information, feeding them into a linear network layer for a mapping transformation, and normalizing through a softmax layer to obtain the repeat recommendation probability and the explore recommendation probability, which decide whether a clicked or an unclicked item is recommended to the user.
6. The session recommendation method based on a convolutional self-attention network as claimed in claim 1, wherein the specific method for computing the conditional probability of each repeated item in the repeat recommendation decoder is as follows:
taking the Gaussian weights obtained in S222 as input, and aggregating them to compute the conditional probability that the user next clicks each repeated item.
7. The session recommendation method based on a convolutional self-attention network as claimed in claim 1, wherein the specific method for computing the conditional probability of each non-repeated item in the explore recommendation decoder is as follows:
concatenating the sequence information and the key intention information into a session hidden representation, mapping it onto the unclicked items through a bilinear transformation matrix and the item embedding matrix, and finally normalizing through a softmax layer to obtain the conditional probability that the user next clicks each unclicked item.
CN202010969069.0A 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network Active CN112258262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010969069.0A CN112258262B (en) 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010969069.0A CN112258262B (en) 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network

Publications (2)

Publication Number Publication Date
CN112258262A true CN112258262A (en) 2021-01-22
CN112258262B CN112258262B (en) 2023-09-26

Family

ID=74231420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010969069.0A Active CN112258262B (en) 2020-09-15 2020-09-15 Session recommendation method based on convolution self-attention network

Country Status (1)

Country Link
CN (1) CN112258262B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255780A (en) * 2021-05-28 2021-08-13 润联软件系统(深圳)有限公司 Reduction gearbox fault prediction method and device, computer equipment and storage medium
CN113343097A (en) * 2021-06-24 2021-09-03 中山大学 Sequence recommendation method and system based on fragment and self-attention mechanism
CN113821724A (en) * 2021-09-23 2021-12-21 湖南大学 Graph neural network recommendation method based on time interval enhancement
CN113961816A (en) * 2021-11-26 2022-01-21 重庆理工大学 Graph convolution neural network session recommendation method based on structure enhancement
CN114791983A (en) * 2022-04-13 2022-07-26 湖北工业大学 Sequence recommendation method based on time sequence article similarity
CN115906863A (en) * 2022-10-25 2023-04-04 华南师范大学 Emotion analysis method, device and equipment based on comparative learning and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110119467A (en) * 2019-05-14 2019-08-13 苏州大学 A kind of dialogue-based item recommendation method, device, equipment and storage medium
CN110196946A (en) * 2019-05-29 2019-09-03 华南理工大学 A kind of personalized recommendation method based on deep learning
US20190362506A1 (en) * 2018-05-23 2019-11-28 Prove Labs, Inc. Systems and methods for monitoring and evaluating body movement
CN110619082A (en) * 2019-09-20 2019-12-27 苏州市职业大学 Project recommendation method based on repeated search mechanism
CN111080400A (en) * 2019-11-25 2020-04-28 中山大学 Commodity recommendation method and system based on gate control graph convolution network and storage medium
CN111127165A (en) * 2019-12-26 2020-05-08 纪信智达(广州)信息技术有限公司 Sequence recommendation method based on self-attention self-encoder
CN111241425A (en) * 2019-10-17 2020-06-05 陕西师范大学 POI recommendation method based on hierarchical attention mechanism
CN111259243A (en) * 2020-01-14 2020-06-09 中山大学 Parallel recommendation method and system based on session
CN111429234A (en) * 2020-04-16 2020-07-17 电子科技大学中山学院 Deep learning-based commodity sequence recommendation method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190362506A1 (en) * 2018-05-23 2019-11-28 Prove Labs, Inc. Systems and methods for monitoring and evaluating body movement
CN110119467A (en) * 2019-05-14 2019-08-13 苏州大学 A kind of dialogue-based item recommendation method, device, equipment and storage medium
CN110196946A (en) * 2019-05-29 2019-09-03 华南理工大学 A kind of personalized recommendation method based on deep learning
CN110619082A (en) * 2019-09-20 2019-12-27 苏州市职业大学 Project recommendation method based on repeated search mechanism
CN111241425A (en) * 2019-10-17 2020-06-05 陕西师范大学 POI recommendation method based on hierarchical attention mechanism
CN111080400A (en) * 2019-11-25 2020-04-28 中山大学 Commodity recommendation method and system based on gate control graph convolution network and storage medium
CN111127165A (en) * 2019-12-26 2020-05-08 纪信智达(广州)信息技术有限公司 Sequence recommendation method based on self-attention self-encoder
CN111259243A (en) * 2020-01-14 2020-06-09 中山大学 Parallel recommendation method and system based on session
CN111429234A (en) * 2020-04-16 2020-07-17 电子科技大学中山学院 Deep learning-based commodity sequence recommendation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QITIAN WU et al.: "Dual Sequential Prediction Models Linking Sequential Recommendation and Information Dissemination", Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
RUIHONG QIU et al.: "Rethinking the Item Order in Session-based Recommendation with Graph Neural Networks", Proceedings of the 28th ACM International Conference on Information and Knowledge Management
LIU Huiting; JI Qiang; LIU Huimin; ZHAO Peng: "Joint deep recommendation model based on a double-layer attention mechanism", Journal of South China University of Technology (Natural Science Edition), no. 06

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255780A (en) * 2021-05-28 2021-08-13 润联软件系统(深圳)有限公司 Reduction gearbox fault prediction method and device, computer equipment and storage medium
CN113255780B (en) * 2021-05-28 2024-05-03 润联智能科技股份有限公司 Reduction gearbox fault prediction method and device, computer equipment and storage medium
CN113343097A (en) * 2021-06-24 2021-09-03 中山大学 Sequence recommendation method and system based on fragment and self-attention mechanism
CN113821724A (en) * 2021-09-23 2021-12-21 湖南大学 Graph neural network recommendation method based on time interval enhancement
CN113821724B (en) * 2021-09-23 2023-10-20 湖南大学 Time interval enhancement-based graph neural network recommendation method
CN113961816A (en) * 2021-11-26 2022-01-21 重庆理工大学 Graph convolution neural network session recommendation method based on structure enhancement
CN114791983A (en) * 2022-04-13 2022-07-26 湖北工业大学 Sequence recommendation method based on time sequence article similarity
CN114791983B (en) * 2022-04-13 2023-04-07 湖北工业大学 Sequence recommendation method based on time sequence article similarity
CN115906863A (en) * 2022-10-25 2023-04-04 华南师范大学 Emotion analysis method, device and equipment based on comparative learning and storage medium
CN115906863B (en) * 2022-10-25 2023-09-12 华南师范大学 Emotion analysis method, device, equipment and storage medium based on contrast learning

Also Published As

Publication number Publication date
CN112258262B (en) 2023-09-26

Similar Documents

Publication Publication Date Title
Wu et al. Session-based recommendation with graph neural networks
CN112258262B (en) Session recommendation method based on convolution self-attention network
Yoon et al. Data valuation using reinforcement learning
CN111797321B (en) Personalized knowledge recommendation method and system for different scenes
CN112381581B (en) Advertisement click rate estimation method based on improved Transformer
CN111753209B (en) Sequence recommendation list generation method based on improved time sequence convolution network
CN110781401A (en) Top-n project recommendation method based on collaborative autoregressive flow
Sina Mirabdolbaghi et al. Model optimization analysis of customer churn prediction using machine learning algorithms with focus on feature reductions
Choe et al. Recommendation system with hierarchical recurrent neural network for long-term time series
CN113609388A (en) Sequence recommendation method based on counterfactual user behavior sequence generation
CN117196763A (en) Commodity sequence recommending method based on time sequence perception self-attention and contrast learning
CN113535964B (en) Enterprise classification model intelligent construction method, device, equipment and medium
CN111259264A (en) Time sequence scoring prediction method based on generation countermeasure network
CN112232388A (en) ELM-RFE-based shopping intention key factor identification method
CN115293812A (en) E-commerce platform session perception recommendation prediction method based on long-term and short-term interests
CN113010774B (en) Click rate prediction method based on dynamic deep attention model
CN112884019B (en) Image language conversion method based on fusion gate circulation network model
Duan et al. Context-aware short-term interest first model for session-based recommendation
Yan et al. Modeling long-and short-term user behaviors for sequential recommendation with deep neural networks
Szwabe et al. Logistic regression setup for RTB CTR estimation
CN112559905A (en) Conversation recommendation method based on dual-mode attention mechanism and social similarity
Burnap et al. Predicting" Design Gaps" in the Market: Deep Consumer Choice Models under Probabilistic Design Constraints
CN116992157B (en) Advertisement recommendation method based on biological neural network
CN116610783B (en) Service optimization method based on artificial intelligent decision and digital online page system
CN113688229B (en) Text recommendation method, system, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant