Disclosure of Invention
The technical problem to be solved by the invention is to provide an item recommendation method based on a graph neural network that uses both directed and undirected structure information, so as to solve the problems in the prior art of neglecting items that appear repeatedly in a click sequence and of not making good use of the structure information in the session sequence diagram when generating the vector representations of the items.
In order to solve the above technical problem, the present invention provides an item recommendation method based on a graph neural network with directed and undirected structural information, the overall framework of which is shown in fig. 1, and the method comprises the following steps:
S1, let V represent the set of items appearing in all session sequences. An anonymous session sequence of length n can then be expressed as s = [v_1, v_2, …, v_n], where the items in the session s are arranged chronologically and each v_i ∈ V represents an item clicked by the user in the session s; the recommendation task is to predict the user's next click, i.e. to predict the sequence label v_{n+1} of the session s;
S2, receiving the historical session sequence and converting it into a directed session sequence diagram G_s = (V_s, ε_s, A_s), where V_s represents the node set, ε_s represents the edge set, and A_s represents the set of adjacency matrices; A_s is defined as the concatenation of three adjacency matrices, namely a weighted adjacency matrix of the undirected graph, a weighted in-degree adjacency matrix and a weighted out-degree adjacency matrix;
S3, mapping each node v_i ∈ V into a random embedding vector space to obtain a d-dimensional vector representation x_i ∈ R^d; extracting a first intermediate implicit vector of the items in the session sequence diagram with a graph convolution network, extracting a second intermediate implicit vector of the item transitions in the session sequence diagram with a gated graph neural network, and obtaining the item implicit vectors through a first linear transformation;
s4, inputting the item implicit vector into the target attention network, thereby obtaining a session implicit vector corresponding to the session sequence S;
s5, acquiring global information and local information of the conversation sequence S, and constructing a conversation vector representation through second linear transformation;
S6, predicting, with the softmax function, the probability of each candidate item being clicked next in the session sequence s, and recommending the items with the highest probabilities.
Further, the first intermediate implicit vector of the items in the session sequence diagram is extracted with the graph convolution network, that is, the undirected structure information with attention is extracted with the graph convolution network, and the steps are as follows:
S31, generating the feature matrix X of the session sequence diagram: each node v_i in the session sequence diagram corresponds to a d-dimensional feature vector x_i ∈ R^d, and the stack of these vectors constitutes the feature matrix of the session sequence diagram, X = [x_1, …, x_n]^T;
S32, for the k-th graph convolution layer, the matrix H^(k-1) denotes the input vectors of all nodes and H^(k) denotes their output vectors, where the initial d-dimensional node vectors are the initial input features of the first graph convolution layer:

H^(0) = X,  (1)
Before the input of each graph convolution layer, the feature of each node v_i is averaged with the feature vectors of its local neighbors (equation (2)), where a_ij is the edge weight between nodes v_i and v_j and d_i = Σ_j a_ij;
S33, the output of the graph convolution network is a first intermediate implicit vector.
Further, equation (2) can be simplified into a simple matrix operation over the whole graph, where S represents the result after symmetric normalization:

S = D̃^(-1/2) Ã D̃^(-1/2),  (3)

where, for equation (3), Ã is the weighted adjacency matrix of the undirected graph with self-loops added (i.e. plus the identity matrix I), D̃ is the degree matrix of Ã, and ⊙ is the point-wise operator.
Further, in order to increase the weight of the items connected by edges and to reduce the interference of noise from other items, the items connected by edges in the propagation matrix are weighted up through the left half of equation (4), i.e. the attention paid to self-information in the matrix is raised, as shown in fig. 2.
Further, with respect to equation (4), α and β are hyper-parameters that respectively control the proportion of the propagation-matrix information and of the identity-matrix information, thereby controlling the ratio in which node information with attention is absorbed during propagation. As shown in fig. 2, in the adjacency matrix and the propagation matrix, the repeatedly clicked item v_2 and the item v_3 visited during the repeated clicks carry higher attention information, i.e. larger weights.
The above steps locally smooth the implicit vector representations of the nodes along the graph; after the graph convolution network propagates the features as a feature pre-processing method, each node absorbs the attention information of its adjacent nodes, so that locally connected nodes finally obtain similar predictions.
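For illustration only, a minimal NumPy sketch of this feature-propagation stage might look as follows; the function name, the propagation matrix written as α·S + β·I, the default hyper-parameter values and the number of propagation steps K are assumptions made for the example, not the patented implementation.

```python
import numpy as np

def propagate_features(A_u, X, K=2, alpha=1.0, beta=1.0):
    """Feature-propagation sketch for the graph convolution stage.

    A_u : (n, n) weighted undirected adjacency matrix of the session graph
    X   : (n, d) initial item embedding matrix
    K   : number of propagation (smoothing) steps
    alpha, beta : assumed hyper-parameters balancing the propagation
                  matrix against the identity matrix (cf. equation (4))
    """
    n = A_u.shape[0]
    A_tilde = A_u + np.eye(n)                      # add self-loops
    d_tilde = A_tilde.sum(axis=1)                  # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))   # degree matrix to the power -1/2
    S = D_inv_sqrt @ A_tilde @ D_inv_sqrt          # symmetric normalization, cf. equation (3)
    S_hat = alpha * S + beta * np.eye(n)           # assumed form of the propagation matrix
    H = X
    for _ in range(K):                             # K rounds of local smoothing, cf. equation (2)
        H = S_hat @ H
    return H                                       # first intermediate implicit vectors
```

Since only the feature-propagation stage is used, no learnable weights appear in this sketch; the first intermediate implicit vectors are obtained purely by smoothing the initial embeddings over the attention-weighted undirected graph.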
further, a second intermediate implicit vector of item conversion in the session sequence diagram is extracted by using a gated graph neural network, namely, the directed structure information with attention is extracted by using the gated graph neural network, and the steps are as follows:
for nodes in the session sequence chart, the node vector updating steps are as follows:
In equation (6), H and b control the weights and the bias term, and a_i represents the result of the interaction between a node and its adjacent nodes. In equation (7), z_i and r_i are the update gate and the reset gate, respectively, and the weight matrices W_z, U_z, W_r, U_r and W_o, U_o are the learnable network parameters of the update gate, the reset gate and the output gate. h_i represents the second intermediate implicit vector of node v_i, [h_1, …, h_n] is the sequence of node vectors in the session, and the first intermediate implicit vectors output by the graph convolution network are used as its initial values. σ(·) is the sigmoid function and ⊙ is the point-wise operator. The adjacency matrix A_s describes how the nodes in the graph communicate with each other, and A_{s,i:} denotes the two columns of blocks in A_s that correspond to node v_i.
The matrix A_s is defined as the concatenation of an in-degree matrix and an out-degree matrix, which represent the weighted connections of the incoming and outgoing edges in the session sequence diagram, respectively. For example, given a session sequence s = [v_1, v_2, v_3, v_2, v_4], the corresponding session sequence diagram G_s and adjacency matrix A_s are shown in fig. 3;
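The following sketch illustrates how the in-degree and out-degree adjacency matrices of the example session s = [v_1, v_2, v_3, v_2, v_4] could be built. The text only states that the weights reflect the closeness between nodes; the count-based, degree-normalized weighting used here is an assumption for illustration.

```python
import numpy as np

def build_session_adjacency(session):
    """Build weighted in/out adjacency matrices for a session sequence.

    Edge weights are taken as transition counts normalized by the node's
    out-degree (resp. in-degree); repeated transitions therefore receive
    a larger raw count, in the spirit of fig. 3.
    """
    nodes = list(dict.fromkeys(session))            # unique items, click order kept
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    counts = np.zeros((n, n))
    for a, b in zip(session, session[1:]):          # consecutive clicks -> directed edge
        counts[idx[a], idx[b]] += 1.0
    out_deg = counts.sum(axis=1, keepdims=True)
    in_deg = counts.sum(axis=0, keepdims=True)
    A_out = np.divide(counts, out_deg, out=np.zeros_like(counts), where=out_deg > 0)
    A_in = np.divide(counts.T, in_deg.T, out=np.zeros_like(counts), where=in_deg.T > 0)
    A_undirected = counts + counts.T                # weighted undirected adjacency
    return nodes, A_in, A_out, A_undirected

# Example session from the text: s = [v1, v2, v3, v2, v4]
nodes, A_in, A_out, A_u = build_session_adjacency(["v1", "v2", "v3", "v2", "v4"])
```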
Further, the item implicit vectors are generated as follows: after the information is processed by the graph convolution network GCN and the gated graph neural network GGNN, the second intermediate implicit vectors are obtained. In order to balance the proportion of the undirected structure information with attention against the directed structure information, the two intermediate implicit vectors are combined under the control of a hyper-parameter γ, thus obtaining the final, accurate item implicit vectors H;
after the implicit vector representation of each item is obtained, a target vector is further constructed, so that the correlation of historical behaviors can be analyzed on the premise of considering the target item. The target items are all items to be predicted.
A local target attention model is used to compute, for every item v_i in the session s and every target item v_t ∈ V, an attention score β_{i,t}, where h_i and h_t are the second intermediate implicit vector representations of items v_i and v_t, respectively.

In equation (12), every item in the session sequence is matched with the target item, and a weight matrix is used to perform a pair-wise nonlinear transformation; the resulting self-attention scores are then normalized by the softmax function to obtain the final attention scores β_{i,t};
Finally, for each session sequence s, the user's interest with respect to a target item v_t can be expressed as a vector based on target attention, which represents the level of interest the user generates for the different target items;
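A minimal sketch of this target-attention step is given below; since equation (12) is not reproduced here, the bilinear form used for the pair-wise transformation and the function signature are assumptions made for illustration.

```python
import numpy as np

def target_attention(H, H_targets, W):
    """Target-attention sketch.

    H         : (n, d) implicit vectors of the items in the session
    H_targets : (m, d) implicit vectors of the candidate (target) items
    W         : (d, d) learnable weight matrix for the pair-wise transformation (assumed form)
    Returns an (m, d) matrix whose row t is the target-attention vector for target item t.
    """
    scores = H_targets @ W @ H.T                          # pair-wise matching scores, shape (m, n)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    beta = np.exp(scores)
    beta = beta / beta.sum(axis=1, keepdims=True)         # softmax over the session items
    return beta @ H                                       # weighted sum of the session item vectors
```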
further, a conversation sequence vector is generated, which comprises the following steps:
The user's short-term interest is represented as a local vector, which is given by the last item in the session sequence (equation (14)).
The user's long-term preference is defined as a global vector, in which all item vectors appearing in the session s are aggregated, while an attention mechanism is used to introduce the dependency between the last interacted item and all items [v_1, v_2, …, v_n] appearing in the whole session.
where q and W_1, W_2 are the corresponding weight parameters, and α_i represents the dependency between the last item and each item appearing in the whole session sequence.
And finally, splicing the local vector, the global vector and the vector based on the target attention obtained in the previous step, and obtaining a session vector corresponding to the session sequence s by utilizing linear conversion.
where the weight parameter W_3 projects the result of concatenating the three vectors into the d-dimensional vector space;
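The construction of the session vector can be sketched as follows; the exact form of the global-attention score and the parameter shapes are assumptions consistent with the q, W_1, W_2, W_3 symbols mentioned in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def session_vector(H, s_target, W1, W2, q, W3):
    """Sketch of the session-vector construction (local + global + target attention).

    H        : (n, d)  item implicit vectors of the session, in click order
    s_target : (d,)    target-attention-based vector for one target item
    W1, W2   : (d, d)  weight matrices of the (assumed) global-attention score
    q        : (d,)    weight vector of the global-attention score
    W3       : (d, 3d) projection applied to the concatenated vectors
    """
    s_local = H[-1]                                   # last clicked item = short-term interest
    scores = np.array([q @ sigmoid(W1 @ H[-1] + W2 @ h) for h in H])  # alpha_i (assumed form)
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()
    s_global = alpha @ H                              # aggregated long-term preference
    concat = np.concatenate([s_local, s_global, s_target])
    return W3 @ concat                                # session vector s_h
```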
further, the step of generating the recommendation at step S6 is as follows:
The implicit vector of each item v_i ∈ V is multiplied with the corresponding session vector s_h to obtain that item's recommendation score; the scores of all target items are then passed through the softmax function to obtain the output vector of the model, i.e. the probability of each target item being clicked at the next moment of the session sequence s.

The top K items with the highest scores are the items to be recommended.
For each session sequence diagram, the loss function is defined as the cross entropy between the predicted values and the actual values, where y_i is the one-hot encoded vector representing the item actually clicked at the next moment of the session sequence.
Finally, steps S2-S6 are trained iteratively with the back-propagation-through-time (BPTT) algorithm to learn the related parameters, such as W, α, β, and the like.
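A compact sketch of the scoring, softmax and cross-entropy loss of step S6 may look as follows; the helper name and the choice K = 20 are illustrative assumptions.

```python
import numpy as np

def predict_and_loss(item_embeddings, s_h, target_index, k=20):
    """Scoring and loss sketch for step S6.

    item_embeddings : (|V|, d) implicit vectors of all candidate items
    s_h             : (d,)     session vector of the current session
    target_index    : int      index of the item actually clicked next
    """
    z = item_embeddings @ s_h                    # recommendation score of every candidate item
    z = z - z.max()                              # numerical stability
    y_hat = np.exp(z) / np.exp(z).sum()          # softmax -> click probabilities
    loss = -np.log(y_hat[target_index] + 1e-12)  # cross entropy against the one-hot label
    top_k = np.argsort(-y_hat)[:k]               # top-K items to recommend
    return y_hat, loss, top_k
```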
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Aiming at the problems that existing methods ignore items that appear repeatedly in a click sequence and do not make good use of the structure information in the session sequence graph when generating the vector representations of the items, the invention provides an item recommendation method based on a graph neural network that uses undirected structure information and directed structure information. The method predicts the item the user will click at the next moment from the user's current session sequence data, and does not rely on the user's long-term preference information in the process.
For the recommendation based on the conversation sequence, firstly, a directed conversation sequence diagram is constructed from the information of the historical conversation sequence; and extracting undirected structure information and directed structure information of item conversion in the conversation sequence diagram respectively by using a graph convolution network GCN and a gate control graph neural network GGNN, generating an accurate item implicit vector, inputting the obtained item implicit vector into an attention network, and considering global information and local information of the conversation simultaneously, thereby constructing a more reliable conversation representation and deducing a next click item.
As shown in fig. 1 and 8, the item recommendation method of the neural network based on undirected structure information and directed structure information includes:
S1, let V represent the set of items appearing in all session sequences. An anonymous session sequence of length n can then be expressed as s = [v_1, v_2, …, v_n], where the items in the session s are arranged chronologically and each v_i ∈ V represents an item clicked by the user in the session s; the recommendation task is to predict the user's next click, i.e. to predict the sequence label v_{n+1} of the session s.
S2, converting the historical session sequence into a directed session sequence diagram G_s = (V_s, ε_s, A_s), where V_s represents the node set, ε_s represents the edge set, and A_s represents the set of adjacency matrices. In the session sequence diagram G_s, each node represents an item v_i ∈ V, and each edge (v_{i-1}, v_i) ∈ ε_s represents that the user clicked item v_{i-1} and then item v_i in succession. A_s is defined as the concatenation of three adjacency matrices: a weighted adjacency matrix of the undirected graph, a weighted in-degree adjacency matrix and a weighted out-degree adjacency matrix.
S3, processing the session sequence diagram G_s to obtain the item implicit vectors of all nodes in the session sequence diagram;
S4, inputting the item implicit vectors into a target attention network and obtaining a target-attention-based vector for the session sequence;
s5, acquiring global information and local information of the conversation sequence S, and constructing a conversation vector representation through second linear transformation;
S6, predicting, with the softmax function, the probability of each target item being clicked next in the session sequence s, and recommending the items with the highest probabilities.
In step S3, each node v_i in the session sequence diagram G_s is mapped into a random embedding vector space to obtain a d-dimensional vector representation x_i ∈ R^d, and the item implicit vectors H are obtained through the graph convolution network, the gated graph neural network and a linear transformation.
The method specifically comprises the following steps:
(1) According to the session sequence diagram G_s, an initial item implicit vector x_i is generated for each item v_i ∈ V, where V represents the set of items appearing in all session sequences; each node v_i in the session sequence diagram corresponds to a d-dimensional feature vector x_i, and the stack of these vectors constitutes the feature matrix of the session sequence diagram, X = [x_1, …, x_n]^T.
The specific steps of generating x_i are as follows:

1) The weighted undirected adjacency matrix in the session sequence diagram G_s is a sparse, symmetric adjacency matrix in which a_ij represents the edge weight between nodes v_i and v_j, and the absence of a connection between two nodes is expressed as a_ij = 0.
2) The degree matrix D is defined as the diagonal matrix D = diag(d_1, …, d_n), whose diagonal values equal the row sums of the adjacency matrix, d_i = Σ_j a_ij.
3) Through the random embedding described above, every node v_i in the graph obtains a corresponding d-dimensional feature vector x_i ∈ R^d, so the feature matrix of the session sequence is the stack of the feature vectors corresponding to the nodes in the graph, i.e. X = [x_1, …, x_n]^T.
(2) Input x_i into the graph convolution network with undirected structure information to obtain the first intermediate implicit vectors of all nodes in the graph (these vectors carry the undirected structure information).
(3) And obtaining a second intermediate implicit vector (with the directional structure information) of all nodes in the graph through the gated graph neural network with the directional structure information.
(4) And inputting the second intermediate implicit vector into the first linear transformation to obtain an accurate item implicit vector.
The graph neural network has natural adaptability to the recommendation based on the conversation sequence, because the graph neural network can automatically extract the characteristics of the conversation sequence graph under the premise of considering rich node connection relation.
Similar to convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs), the graph convolution network GCN learns the features of each node v_i in a multi-layer structure, obtains a new feature representation, and then inputs it into the corresponding linear classifier.
In the step (2), the step of generating the first intermediate implicit vector includes:
s31, for the graph convolution layer of the k layer, use the matrix H(k-1)Input vector h representing all nodesiBy H(k)An output vector representing the node. The initial d-dimensional node vector is the feature of the initial input and is input into the first-layer GCN:
H(0)=X, (1)
A GCN with K layers is equivalent to applying a K-layer MLP to the feature vectors x_i of all nodes in the graph, except that the implicit vector representation of each node is averaged with those of its neighbor nodes at the beginning of each layer. In each graph convolution layer, the node vector representation goes through three update phases: feature propagation, linear transformation and point-wise nonlinear activation. Only the feature propagation stage is used in the present invention for learning the item implicit vectors.
S32, at the beginning of each layer, the feature of each node v_i is averaged with the feature vectors of its local neighbors (equation (2)). The output of the graph convolution network is the first intermediate implicit vector.
Preferably, equation (2) is simplified into a simple matrix operation over the whole graph, as follows: let S represent the adjacency matrix with self-loops after symmetric normalization,

S = D̃^(-1/2) Ã D̃^(-1/2),  (3)

where Ã is the weighted undirected adjacency matrix with added self-loops, D̃ is the degree matrix of Ã, ⊙ is the point-wise operator, and I is the identity matrix. Because the adjacency matrix takes the weights of repeated items into account, the matrix S pays additional attention to repeatedly clicked items. Meanwhile, since S carries self-loops, the symmetric normalization makes the weight of items connected by multiple edges smaller than that of items connected by a single edge or by no edge. α and β are hyper-parameters.
In order to increase the weight of the items connected by edges and to reduce the interference of noise from other items, the items connected by edges in the propagation matrix are weighted up through the left half of equation (4), i.e. the attention paid to self-information in the matrix is raised.

The hyper-parameters α and β control the proportion of the propagation-matrix information and of the identity-matrix information, thereby controlling the ratio in which node information with attention is absorbed during propagation. From the specific example given in fig. 2 it can be seen that, in the adjacency matrix and the propagation matrix, the repeatedly clicked item v_2 and the item v_3 visited during the repeated clicks carry higher attention information, i.e. larger weights.
Thus, the equivalent updated form of equation (2) can be changed to a simple sparse matrix multiplication for all nodes.
The above steps locally smooth the implicit vector representation of the nodes along the graph, and after the graph convolution network is used as a feature preprocessing method to transmit the features, the nodes can absorb the attention information of adjacent nodes, and finally the locally connected nodes can have similar prediction performance.
In step (3), a gated graph neural network GGNN is constructed using the method of Li et al.
For each node v_i in the session sequence diagram G_s, the node vector is updated as follows:
in equation (6), the adjacency matrix
Representing the communication of the nodes in the graph,
representative node v
iIn that
Two columns of matrix blocks in (1) are,
is a sequence of node vectors in the session, and
namely, the final output of the graph convolution network is used as the initial input of the gated graph neural network,
and
the weights and the magnitude of the bias terms are controlled,
which is used to represent the result of the interaction between a node and an adjacent node.
For equation (7), the reset gates are obtained by sigma (·) sigmoid function
And a retrofit gate
Weight matrix W
z、U
z,W
r、U
rAnd W
o、U
oNetwork parameters that can be learned in the reset gate, the update gate and the output gate, respectively, are point-by-point operators. Finally, the
Represents node v
iImplicit vectors generated by GGNN gated graph neural networks.
The matrix A_s is defined as the concatenation of an in-degree matrix and an out-degree matrix, which represent the weighted connections of the incoming and outgoing edges in the session sequence diagram, respectively. For example, given a session sequence s = [v_1, v_2, v_3, v_2, v_4], the corresponding session sequence diagram G_s and adjacency matrix A_s are shown in fig. 3. It can be seen that the weights in the directed adjacency matrix are set according to the degree of closeness between nodes: for example, v_2 has an edge to v_3 and an edge to v_4, but the two weights differ, because v_2 and v_3 are connected by more edges, which means the similarity between them is higher. To achieve a better prediction effect, v_2 should absorb more of the information of v_3, so the model should pay more attention to v_3 than to v_4.
Therefore, for each session sequence diagram G_s, the GGNN model propagates node information with attention between adjacent nodes, while the reset gate and the update gate determine which information should be discarded and which should be retained, respectively.
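One gated update step can be sketched as follows; the message form (a sum of transformed in-degree and out-degree messages) and all parameter names are assumptions, while the gate equations follow the standard gated graph neural network of Li et al. referenced above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(A_in, A_out, H, params):
    """One gated update step of the GGNN (hedged sketch).

    A_in, A_out : (n, n) weighted in-degree / out-degree adjacency matrices
    H           : (n, d) current node vectors (initially the GCN output)
    params      : dict of learnable (d, d) matrices 'W_in', 'W_out', 'Wz', 'Uz',
                  'Wr', 'Ur', 'Wo', 'Uo' and a (d,) bias 'b' (names assumed)
    """
    a = A_in @ H @ params['W_in'] + A_out @ H @ params['W_out'] + params['b']  # neighbor interaction
    z = sigmoid(a @ params['Wz'] + H @ params['Uz'])               # update gate
    r = sigmoid(a @ params['Wr'] + H @ params['Ur'])               # reset gate
    h_tilde = np.tanh(a @ params['Wo'] + (r * H) @ params['Uo'])   # candidate state
    return (1.0 - z) * H + z * h_tilde                             # new node vectors
```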
(4) After the information is processed by the graph convolution network GCN and the gated graph neural network GGNN, two intermediate implicit vectors are obtained: the former performs undirected structure information processing with attention on the initial embedding vectors, and the latter, on that basis, extracts the directed structure information with attention in the graph structure more finely. In order to balance the proportion of the undirected structure information with attention against the directed structure information, the two are combined under the control of a hyper-parameter γ, thus obtaining the final, accurate item implicit vectors H.
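A minimal sketch of this balancing step, assuming a convex combination controlled by γ (the text only states that γ controls the proportion between the two kinds of information):

```python
def combine_structure_information(H_gcn, H_ggnn, gamma=0.5):
    """Assumed convex combination of the undirected (GCN) and directed (GGNN)
    structure information; gamma is the hyper-parameter mentioned in the text."""
    return gamma * H_gcn + (1.0 - gamma) * H_ggnn
```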
In step S4, the specific steps include:
s41, calculating all items v in the conversation S by using a local target attention model (the model is prior art and is not described in detail)
iFor each target item v
tAn attention score β for e V
i,tWherein
And
are respectively item v
iAnd v
tIs represented by an implicit vector.
In the above equation, the items in the session are matched with the candidate targets, and a weight matrix is used to perform a pair-wise nonlinear transformation. The resulting self-attention scores are then normalized by the softmax function to obtain the final attention scores.
S42, for each session sequence s, the user's interest with respect to a target item v_t can be expressed as a weighted combination of the item vectors; finally, a vector based on target attention is obtained, which represents the level of interest the user generates for the different target items.
In step S5, the short-term and long-term preferences of the user are further explored using the item vectors involved in the session S, so as to obtain local vectors and global vectors in the session, and a final session vector is generated by synthesizing the target attention-based vectors calculated in the above section.
S51, acquiring the local vector. In a session sequence s, the final behavior of the user is often determined by the last interacted item of the current sequence. Therefore, the short-term interest of the user is expressed as a local vector, which is the vector representation of the last item in the session sequence.
S52, acquiring the global vector. The long-term preference of the user is defined as a global vector, in which all item vectors appearing in the session s are aggregated, while an attention mechanism is used to introduce the dependency between the last interacted item and all items [v_1, v_2, …, v_n] appearing in the whole session.
where q and W_1, W_2 are the corresponding weight parameters.
S53, splicing the obtained local vector, global vector and target-attention-based vector, and obtaining the session vector corresponding to the session sequence s through a linear transformation.
where the weight parameter W_3 projects the result of concatenating the three vectors into the d-dimensional vector space. It should be noted that different session vectors are generated correspondingly for different target items (the items in the session sequence).
In step S6, after the session vector s_h corresponding to each session sequence s has been obtained, a score is computed for every item v_i ∈ V, i.e. the candidate item vector is multiplied with the session vector s_h, and the output vector of the model is then obtained through the softmax function; the scores represent the predicted recommendation scores of all target items, and the output vector gives, for each target item, the probability of being clicked at the next moment of the session sequence s. The top K items with the highest scores are the items to be recommended.
For each session sequence diagram G_s, the loss function is defined as the cross entropy between the predicted values and the actual values, where y_i is the one-hot encoded vector representing the item actually clicked at the next moment of the session sequence.
In use, the data sets may be used to iteratively train steps S2-S6, such as training using the time-based back propagation BPTT algorithm, to obtain parameters such as W, W1, W2, W3, etc. in the above steps, which may be initially randomly set and then learned during training.
In training, each sequence is used as a training sample, so the total error is the sum of the errors at each time step (recommendation). Note that in a recommendation scenario based on conversational sequences, most conversations are relatively short sequences. To prevent the occurrence of overfitting, a smaller number of training passes is used.
Experimental analysis:
1. experimental data set
The method was evaluated on real data using two public data sets: Yoochoose and Diginetica, the latter published for the CIKM Cup 2016. The Yoochoose data set contains the user click streams of an e-commerce platform over 6 months, while the Diginetica data set only contains the data of successful transactions, i.e. the purchase streams of the users.
At the same time, corresponding sequences and labels are further generated by slicing the input sequence data. For an input session sequence s = [v_1, v_2, …, v_n], a series of sequences and labels ([v_1], v_2), ([v_1, v_2], v_3), …, ([v_1, v_2, …, v_{n-1}], v_n) is generated as a data augmentation strategy, where [v_1, v_2, …, v_{n-1}] is the generated sequence and v_n is the label of that sequence, i.e. the item clicked at the next moment.
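The slicing strategy described above can be expressed directly in code; the helper name below is hypothetical.

```python
def augment_session(session):
    """Generate (prefix, label) training pairs from one session sequence."""
    return [(session[:i], session[i]) for i in range(1, len(session))]

# Example: [v1, v2, v3, v4] ->
# ([v1], v2), ([v1, v2], v3), ([v1, v2, v3], v4)
pairs = augment_session(["v1", "v2", "v3", "v4"])
```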
The details of the data set finally used are shown in table 1.
TABLE 1 Experimental data set statistics
2. Evaluation criteria
After the data set is determined, two metrics that are very common in the recommendation based on the conversation sequence are adopted as evaluation indexes of the algorithm.
(1) P @20(Precision) is a widely used measure of prediction accuracy. It represents the proportion of correct recommendations in the top 20 items of the algorithm recommendation.
(2) MRR@20 (Mean Reciprocal Rank) is the mean of the reciprocal ranks of the correctly recommended items in the algorithm's recommendations. When the true result is ranked beyond 20 in the algorithm's recommendation list, the corresponding reciprocal rank is 0. The MRR metric takes the recommendation order into account: a larger MRR value means that the true result is located near the top of the recommendation list, which demonstrates the effectiveness of the recommendation system.
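For reference, a straightforward sketch of the two evaluation metrics; the function name and argument layout are assumptions.

```python
def precision_and_mrr_at_k(ranked_lists, targets, k=20):
    """Compute P@K and MRR@K over a set of test sessions.

    ranked_lists : list of item lists, each ranked by predicted score (best first)
    targets      : list of the items actually clicked next
    """
    hits, rr = 0, 0.0
    for ranking, target in zip(ranked_lists, targets):
        top_k = ranking[:k]
        if target in top_k:
            hits += 1
            rr += 1.0 / (top_k.index(target) + 1)  # reciprocal rank; contributes 0 if not in top K
    n = len(targets)
    return hits / n, rr / n
```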
3. Experimental setup
The dimension of the implicit vectors is set to d = 100 on both data sets. All parameters are initialized with a Gaussian distribution with mean 0 and standard deviation 0.1. The parameters are optimized with a mini-batch Adam optimizer, the initial learning rate η is set to 0.001 and decayed by a factor of 0.1 every three training epochs. Furthermore, the batch size is set to 100 and the L2 regularization parameter to 10^-5.
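A hedged PyTorch sketch of this training configuration is given below; `model` stands for any implementation of steps S2-S6 with learnable parameters, and the two helper functions are hypothetical.

```python
import torch

def init_params(model):
    # Gaussian initialization with mean 0 and standard deviation 0.1
    for p in model.parameters():
        torch.nn.init.normal_(p, mean=0.0, std=0.1)

def make_optimizer(model):
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=0.001,            # initial learning rate eta
        weight_decay=1e-5,   # L2 regularization parameter
    )
    # decay the learning rate by a factor of 0.1 every three training epochs
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    return optimizer, scheduler
```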
4. Analysis of Experimental results
The performance of several methods on both the P @20 and MRR @20 indices is shown in table 2, where bolding shows the best results. The method provided by the invention can flexibly construct the relation between the items on the conversation sequence diagram, and extract the directed structure information and the undirected structure information with attention, so that the subsequent learning of target attention can be more accurate, and the final recommendation can be given by integrating the global interest and the local interest of the user in the conversation. From the experimental data in table 2, it is clear that the method achieves the best performance results on both indexes on the three data sets, which proves the effectiveness of the method.
Conventional recommendation methods such as POP and S-POP do not perform well on the session-sequence-based problem because they ignore the user's preferences in the current session and only consider the top K most popular items. BPR-MF shows that it is meaningful to use semantic information in a session, while the better-performing FPMC shows that modeling session sequences with first-order Markov chains is a relatively effective method. Item-KNN, also a traditional recommendation method, is superior to the former two. It is worth noting that Item-KNN relies only on computing the similarity between items, which suggests that the co-occurrence of items is also a relatively important piece of information. However, Item-KNN does not take the timing information in the session into account and cannot capture the transition information between items.
Unlike the traditional methods, the deep-learning-based methods generally perform better on the indexes of all data sets. GRU4Rec is a recurrent-neural-network-based method that can perform comparably to, or better than, some conventional methods, which demonstrates that the recurrent neural network has some modeling capability for sequence data. However, GRU4Rec focuses mainly on modeling the session sequence and cannot capture the user's preferences within the session. Later methods such as NARM and STAMP both significantly improve on GRU4Rec: NARM explicitly captures the user's main preference in the session, whereas STAMP uses an attention mechanism to consider the user's short-term interest, which is why they are superior to GRU4Rec. RepeatNet is also a recurrent-neural-network-based algorithm and achieves a better prediction effect by considering the user's repeated click behavior, which indicates that modeling the user's behavior habits has a certain importance. However, the improvement of RepeatNet over NARM and STAMP is limited, possibly because modeling the user's repeated click habits through item features alone is insufficient, and RNN-based structures cannot capture some common dependencies within a session.
The graph neural network based approach constructs each session sequence as a subgraph and encodes all the items in the session through the graph neural network. Both SR-GNN and TAGNN gave better results than all RNN-based models. SR-GNN utilizes a gated graph neural network to learn dependencies between items within a sequence of conversations, while TAGNN further exploits the dependencies between items within a conversation and target items with a mechanism of attention. However, these methods completely learn according to the directed relationship of the session sequence diagram, and do not comprehensively consider the undirected relationship in the session sequence diagram, because the relationship between the items in the session sequence is sometimes not unidirectional but bidirectional, and the more comprehensive relationship between the items can be captured by using undirected structure information. Moreover, they ignore the repetitive click feature that occurs in conversational sequences, and intuitively should the importance of a repeated occurrence of an item in a sequence be greater. In addition, in an actual recommendation scene, the association degree between items is variable, and the methods adopt an averaging method for the dependency relationship between items in the session, so that the different dependency degree of a certain item on other items cannot be reflected through a weighting or attention method.
The method presented herein performs better than the methods described above. Specifically, on the three data sets it achieves relative improvements of 3.55%, 1.38% and 1.18% in P@20 over the best-performing related methods, and of 1.92%, 4.34% and 1.98% in MRR@20. The method can extract the structural information in the session sequence diagram well: it extracts the undirected and the directed structure information in the graph in turn with the graph convolution network and the gated graph neural network, and combines them linearly so as to obtain an accurate vector expression. In addition, repeatedly clicked items in the session sequence are taken into account, and the weight of the repeated information is raised through the attention network; meanwhile, the proportion of a node's own information in the session sequence diagram is raised by adding self-loops and the matrix operation, so that the node is less easily disturbed by the noise of other nodes. Different weights are then assigned with the attention network according to the different dependencies between items, so that the network can generate accurate vector representations.
TABLE 2 comparison of the results
Ablation experiment:
the method can flexibly capture the relationship between the structural information and the items in the conversation sequence diagram. In order to verify the actual effect of each composition in the model, several model variants were set up for ablation experiments. In the experimental link, SR-GNN is selected as a reference method for comparison, and data in the experiment are displayed in the form of relative promotion percentage of comparison SR-GNN.
Firstly, performing combined analysis of directed structure information and undirected structure information: (a) -GCN, extracting only undirected structure information in the session sequence diagram. (b) GNN, extracting only the directed structure information in the session sequence graph. (c) GCN + GNN, inputting random initial vectors into two neural networks simultaneously, and then linearly combining the model output results. (d) -GCN + GNN (GCN), inputting the random initial vector into GCN, taking the output vector of GCN model as the input of GNN model, and finally linearly combining the output results of the two models. The results of the experimental comparison are shown in fig. 4 and 5.
Here AVG represents the average performance of the four combination settings over the three data sets. As can be seen from fig. 4 and 5, the method GCN+GNN(GCN), which integrates the directed and the undirected structure information, obtains the best results on both indicators P@20 and MRR@20 over the three data sets, which proves the importance of considering the directed and the undirected structure information together. The average data AVG in fig. 5 also show that using only the undirected structure information performs better than using only the directed structure information, whereas on individual data sets, namely on the MRR@20 index of the Yoochoose 1/4 and Diginetica data sets, the directed structure information performs slightly better. This reflects, to some extent, the connection between the user's preferences and the items in session-sequence-based recommendation: the direction of the transition between items has different importance in different scenarios, but on average the undirected structure information is more important. This illustrates that, in session-sequence-based recommendation, the direction of the user's transitions between items is worth considering, but the relationships between the items the user has viewed must also be considered in order to learn the user's preferences better. Better still is considering the directed and the undirected structure information together: the comprehensive and average performance AVG of the GCN+GNN methods in fig. 4 and fig. 5 is basically better than that of GCN or GNN used alone. The input of both network models in the GCN+GNN method is the random embedding vectors, whereas the input of the GNN model in the GCN+GNN(GCN) method is the vectors from which the GCN model has already extracted the undirected structure information; this shows that, compared with using the random vectors directly, extracting the undirected structure information first and then the directed structure information yields more accurately represented embedding vectors.
And then, performing combined analysis of repeated click attention information and the dependency relationship between the items: (a) GCN + GNN, does not consider the different dependencies between the click-repeatedly attention information and the items. (b) AttGCN + GNN, attention information for a repeatedly clicked item is considered only in GCN. (c) GCN + AttGNN, only consider varying degrees of dependency between items in GNN. (d) AttGCN + AttGNN, while fusing repeatedly clicked attention information in GCN and attention-bearing item dependencies in GNN. The experimental results are shown in fig. 6 and 7.
As can be seen from the data in fig. 6 and 7, AttGCN+AttGNN, which considers both the repeated-click attention information and the inter-item dependencies, achieves the best experimental results on both indicators over the three data sets, which indicates that repeated click behavior and inter-item dependencies are of certain importance in session-sequence-based recommendation. Meanwhile, according to the experimental results on the P@20 index in fig. 6, considering the repeated-click attention or the inter-item relations alone already performs better than GCN+GNN, which considers no attention information, and AttGCN+AttGNN, which considers both, performs best; this shows that the attention information helps retain the important information as much as possible and express it more accurately in the embedding vectors. However, according to the experimental results on the MRR@20 index in fig. 7, although AttGCN+AttGNN, which considers both kinds of attention, still obtains the best results, considering either one alone performs slightly worse than GCN+GNN, which considers no attention information; that is, using one kind of attention information alone can improve the accuracy of the recommendation results, but its effect on the recommendation ranking is not good. A possible reason is that, when one of the two models considers attention while the other does not, the representation of the adjacency matrices in the two models is not unified, so the vectors passed between the two models cannot consistently use attention to retain the structural information; the structural information is disturbed by the inconsistent attention patterns, so an accurately represented embedding vector is not generated in the end and an accurate prediction score for each item cannot be computed in the prediction stage.
In the session-sequence-based recommendation scenario, both the user's repeated click behavior and the graph structure information are worth considering, because the user's behavior can then be predicted well without knowing the user's historical preferences. The invention not only uses the GCN and GNN models to extract the directed and the undirected structure information in the session sequence diagram and combines them linearly, but also introduces an attention mechanism when generating the item implicit vectors, effectively extracting the user's repeated clicks and the complex transition information between items, so that the generated session vector gives more accurate predictions in the recommendation process. On real data sets from three realistic scenarios, the invention verifies that the proposed algorithm is superior to other state-of-the-art methods, and verifies the effectiveness of the attention mechanism and of the complex structure information through exhaustive ablation experiments.
Finally, it should be noted that while the above describes a preferred embodiment of the invention, it will be appreciated by those skilled in the art that, once the basic inventive concepts have been learned, numerous changes and modifications may be made without departing from the principles of the invention, which shall be deemed to be within the scope of the invention. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.