Disclosure of Invention
The technical problem to be solved by the invention is to provide an item recommendation method based on a graph neural network that uses both directed and undirected structure information, so as to solve the problems in the prior art of neglecting items that appear repeatedly in a click sequence and of not making good use of the structure information in the session sequence diagram when generating the vector representations of the items.
In order to solve the above technical problem, the present invention provides an item recommendation method based on a graph neural network with directed and undirected structural information, the overall framework of which is shown in fig. 1, and the method comprises the following steps:
S1, let V represent the set of items appearing in all session sequences. An anonymous session sequence of length n can then be expressed as s = [v_1, v_2, …, v_n], where the items in the session s are arranged chronologically and each v_i ∈ V represents an item clicked by the user in the session s; the recommendation task is to predict the user's next click, i.e. to predict the sequence label v_{n+1} of the session s;
S2, receiving the historical session sequence and converting it into a directed session sequence diagram G_s = (V_s, ε_s, A_s), where V_s represents the node set, ε_s represents the edge set, and A_s represents the set of adjacency matrices; A_s is defined as the concatenation of three adjacency matrices, namely a weighted adjacency matrix of the undirected graph, a weighted in-degree adjacency matrix and a weighted out-degree adjacency matrix;
S3, mapping each node v_i ∈ V into a random embedding vector space to obtain a d-dimensional vector representation x_i ∈ R^d; extracting a first intermediate implicit vector of the items in the session sequence diagram with a graph convolution network, extracting a second intermediate implicit vector of the item transitions in the session sequence diagram with a gated graph neural network, and obtaining the item implicit vectors through a first linear transformation;
s4, inputting the item implicit vector into the target attention network, thereby obtaining a session implicit vector corresponding to the session sequence S;
s5, acquiring global information and local information of the conversation sequence S, and constructing a conversation vector representation through second linear transformation;
S6, predicting, with the softmax function, the probability of each candidate item being clicked next in the session sequence s, and recommending the items with the highest probabilities.
Further, the first intermediate implicit vector of the items in the session sequence diagram is extracted with the graph convolution network, that is, the undirected structure information with attention is extracted with the graph convolution network, and the steps are as follows:
S31, generating the feature matrix X of the session sequence diagram: each node v_i in the session sequence diagram corresponds to a d-dimensional feature vector x_i ∈ R^d, and the stack of these vectors constitutes the feature matrix of the session sequence diagram, X = [x_1, …, x_n]^T;
S32, for the k-th graph convolution layer, the matrix H^(k-1) denotes the input vectors of all nodes and H^(k) denotes their output vectors, where the initial d-dimensional node vectors are the initial input features of the first graph convolution layer:

H^(0) = X,  (1)
Before the input of each graph convolution layer, the feature of each node v_i is averaged with the feature vectors of its local neighbors (equation (2)), where a_ij is the edge weight between nodes v_i and v_j and d_i = Σ_j a_ij;
S33, the output of the graph convolution network is a first intermediate implicit vector.
Further, equation (2) can be simplified into a simple matrix operation over the whole graph, where S represents the result after symmetric normalization:

S = D̃^(-1/2) Ã D̃^(-1/2),  (3)

where, for equation (3), Ã is the weighted adjacency matrix of the undirected graph with self-loops added (i.e. plus the identity matrix I), D̃ is the degree matrix of Ã, and ⊙ is the point-wise operator.
Further, in order to increase the weight of the items connected by edges and to reduce the interference of noise from other items, the items connected by edges in the propagation matrix are weighted up through the left half of equation (4), i.e. the attention paid to self-information in the matrix is raised, as shown in fig. 2.
Further, with respect to equation (4), α and β are hyper-parameters that respectively control the proportion of the propagation-matrix information and of the identity-matrix information, thereby controlling the ratio in which node information with attention is absorbed during propagation. As shown in fig. 2, in the adjacency matrix and the propagation matrix, the repeatedly clicked item v_2 and the item v_3 visited during the repeated clicks carry higher attention information, i.e. larger weights.
The above steps locally smooth the implicit vector representations of the nodes along the graph; after the graph convolution network propagates the features as a feature pre-processing method, each node absorbs the attention information of its adjacent nodes, so that locally connected nodes finally obtain similar predictions.
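For illustration only, a minimal NumPy sketch of this feature-propagation stage might look as follows; the function name, the propagation matrix written as α·S + β·I, the default hyper-parameter values and the number of propagation steps K are assumptions made for the example, not the patented implementation.

```python
import numpy as np

def propagate_features(A_u, X, K=2, alpha=1.0, beta=1.0):
    """Feature-propagation sketch for the graph convolution stage.

    A_u : (n, n) weighted undirected adjacency matrix of the session graph
    X   : (n, d) initial item embedding matrix
    K   : number of propagation (smoothing) steps
    alpha, beta : assumed hyper-parameters balancing the propagation
                  matrix against the identity matrix (cf. equation (4))
    """
    n = A_u.shape[0]
    A_tilde = A_u + np.eye(n)                      # add self-loops
    d_tilde = A_tilde.sum(axis=1)                  # degree of each node
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))   # degree matrix to the power -1/2
    S = D_inv_sqrt @ A_tilde @ D_inv_sqrt          # symmetric normalization, cf. equation (3)
    S_hat = alpha * S + beta * np.eye(n)           # assumed form of the propagation matrix
    H = X
    for _ in range(K):                             # K rounds of local smoothing, cf. equation (2)
        H = S_hat @ H
    return H                                       # first intermediate implicit vectors
```

Since only the feature-propagation stage is used, no learnable weights appear in this sketch; the first intermediate implicit vectors are obtained purely by smoothing the initial embeddings over the attention-weighted undirected graph.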
further, a second intermediate implicit vector of item conversion in the session sequence diagram is extracted by using a gated graph neural network, namely, the directed structure information with attention is extracted by using the gated graph neural network, and the steps are as follows:
for nodes in the session sequence chart, the node vector updating steps are as follows:
In equation (6), H and b control the weights and the bias term, and a_i represents the result of the interaction between a node and its adjacent nodes. In equation (7), z_i and r_i are the update gate and the reset gate, respectively, and the weight matrices W_z, U_z, W_r, U_r and W_o, U_o are the learnable network parameters of the update gate, the reset gate and the output gate. h_i represents the second intermediate implicit vector of node v_i, [h_1, …, h_n] is the sequence of node vectors in the session, and the first intermediate implicit vectors output by the graph convolution network are used as its initial values. σ(·) is the sigmoid function and ⊙ is the point-wise operator. The adjacency matrix A_s describes how the nodes in the graph communicate with each other, and A_{s,i:} denotes the two columns of blocks in A_s that correspond to node v_i.
The matrix A_s is defined as the concatenation of an in-degree matrix and an out-degree matrix, which represent the weighted connections of the incoming and outgoing edges in the session sequence diagram, respectively. For example, given a session sequence s = [v_1, v_2, v_3, v_2, v_4], the corresponding session sequence diagram G_s and adjacency matrix A_s are shown in fig. 3;
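The following sketch illustrates how the in-degree and out-degree adjacency matrices of the example session s = [v_1, v_2, v_3, v_2, v_4] could be built. The text only states that the weights reflect the closeness between nodes; the count-based, degree-normalized weighting used here is an assumption for illustration.

```python
import numpy as np

def build_session_adjacency(session):
    """Build weighted in/out adjacency matrices for a session sequence.

    Edge weights are taken as transition counts normalized by the node's
    out-degree (resp. in-degree); repeated transitions therefore receive
    a larger raw count, in the spirit of fig. 3.
    """
    nodes = list(dict.fromkeys(session))            # unique items, click order kept
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    counts = np.zeros((n, n))
    for a, b in zip(session, session[1:]):          # consecutive clicks -> directed edge
        counts[idx[a], idx[b]] += 1.0
    out_deg = counts.sum(axis=1, keepdims=True)
    in_deg = counts.sum(axis=0, keepdims=True)
    A_out = np.divide(counts, out_deg, out=np.zeros_like(counts), where=out_deg > 0)
    A_in = np.divide(counts.T, in_deg.T, out=np.zeros_like(counts), where=in_deg.T > 0)
    A_undirected = counts + counts.T                # weighted undirected adjacency
    return nodes, A_in, A_out, A_undirected

# Example session from the text: s = [v1, v2, v3, v2, v4]
nodes, A_in, A_out, A_u = build_session_adjacency(["v1", "v2", "v3", "v2", "v4"])
```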
Further, the item implicit vectors are generated as follows: after the information is processed by the graph convolution network GCN and the gated graph neural network GGNN, the second intermediate implicit vectors are obtained. In order to balance the proportion of the undirected structure information with attention against the directed structure information, the two intermediate implicit vectors are combined under the control of a hyper-parameter γ, thus obtaining the final, accurate item implicit vectors H;
after the implicit vector representation of each item is obtained, a target vector is further constructed, so that the correlation of historical behaviors can be analyzed on the premise of considering the target item. The target items are all items to be predicted.
A local target attention model is used to compute, for every item v_i in the session s and every target item v_t ∈ V, an attention score β_{i,t}, where h_i and h_t are the second intermediate implicit vector representations of items v_i and v_t, respectively.

In equation (12), every item in the session sequence is matched with the target item, and a weight matrix is used to perform a pair-wise nonlinear transformation; the resulting self-attention scores are then normalized by the softmax function to obtain the final attention scores β_{i,t};
Finally, for each session sequence s, the user's interest with respect to a target item v_t can be expressed as a vector based on target attention, which represents the level of interest the user generates for the different target items;
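A minimal sketch of this target-attention step is given below; since equation (12) is not reproduced here, the bilinear form used for the pair-wise transformation and the function signature are assumptions made for illustration.

```python
import numpy as np

def target_attention(H, H_targets, W):
    """Target-attention sketch.

    H         : (n, d) implicit vectors of the items in the session
    H_targets : (m, d) implicit vectors of the candidate (target) items
    W         : (d, d) learnable weight matrix for the pair-wise transformation (assumed form)
    Returns an (m, d) matrix whose row t is the target-attention vector for target item t.
    """
    scores = H_targets @ W @ H.T                          # pair-wise matching scores, shape (m, n)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    beta = np.exp(scores)
    beta = beta / beta.sum(axis=1, keepdims=True)         # softmax over the session items
    return beta @ H                                       # weighted sum of the session item vectors
```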
further, a conversation sequence vector is generated, which comprises the following steps:
The user's short-term interest is represented as a local vector, which is given by the last item in the session sequence (equation (14)).
The user's long-term preference is defined as a global vector, in which all item vectors appearing in the session s are aggregated, while an attention mechanism is used to introduce the dependency between the last interacted item and all items [v_1, v_2, …, v_n] appearing in the whole session.
where q and W_1, W_2 are the corresponding weight parameters, and α_i represents the dependency between the last item and each item appearing in the whole session sequence.
And finally, splicing the local vector, the global vector and the vector based on the target attention obtained in the previous step, and obtaining a session vector corresponding to the session sequence s by utilizing linear conversion.
where the weight parameter W_3 projects the result of concatenating the three vectors into the d-dimensional vector space;
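The construction of the session vector can be sketched as follows; the exact form of the global-attention score and the parameter shapes are assumptions consistent with the q, W_1, W_2, W_3 symbols mentioned in the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def session_vector(H, s_target, W1, W2, q, W3):
    """Sketch of the session-vector construction (local + global + target attention).

    H        : (n, d)  item implicit vectors of the session, in click order
    s_target : (d,)    target-attention-based vector for one target item
    W1, W2   : (d, d)  weight matrices of the (assumed) global-attention score
    q        : (d,)    weight vector of the global-attention score
    W3       : (d, 3d) projection applied to the concatenated vectors
    """
    s_local = H[-1]                                   # last clicked item = short-term interest
    scores = np.array([q @ sigmoid(W1 @ H[-1] + W2 @ h) for h in H])  # alpha_i (assumed form)
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()
    s_global = alpha @ H                              # aggregated long-term preference
    concat = np.concatenate([s_local, s_global, s_target])
    return W3 @ concat                                # session vector s_h
```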
further, the step of generating the recommendation at step S6 is as follows:
The implicit vector of each item v_i ∈ V is multiplied with the corresponding session vector s_h to obtain that item's recommendation score; the scores of all target items are then passed through the softmax function to obtain the output vector of the model, i.e. the probability of each target item being clicked at the next moment of the session sequence s.

The top K items with the highest scores are the items to be recommended.
For each session sequence diagram, the loss function is defined as the cross entropy between the predicted values and the actual values, where y_i is the one-hot encoded vector representing the item actually clicked at the next moment of the session sequence.
Finally, steps S2-S6 are trained iteratively with the back-propagation-through-time (BPTT) algorithm to learn the related parameters, such as W, α, β, and the like.
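A compact sketch of the scoring, softmax and cross-entropy loss of step S6 may look as follows; the helper name and the choice K = 20 are illustrative assumptions.

```python
import numpy as np

def predict_and_loss(item_embeddings, s_h, target_index, k=20):
    """Scoring and loss sketch for step S6.

    item_embeddings : (|V|, d) implicit vectors of all candidate items
    s_h             : (d,)     session vector of the current session
    target_index    : int      index of the item actually clicked next
    """
    z = item_embeddings @ s_h                    # recommendation score of every candidate item
    z = z - z.max()                              # numerical stability
    y_hat = np.exp(z) / np.exp(z).sum()          # softmax -> click probabilities
    loss = -np.log(y_hat[target_index] + 1e-12)  # cross entropy against the one-hot label
    top_k = np.argsort(-y_hat)[:k]               # top-K items to recommend
    return y_hat, loss, top_k
```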
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Aiming at the problems that existing methods ignore items that appear repeatedly in a click sequence and do not make good use of the structure information in the session sequence graph when generating the vector representations of the items, the invention provides an item recommendation method based on a graph neural network that uses undirected structure information and directed structure information. The method predicts the item the user will click at the next moment from the user's current session sequence data, and does not rely on the user's long-term preference information in the process.
For the recommendation based on the conversation sequence, firstly, a directed conversation sequence diagram is constructed from the information of the historical conversation sequence; and extracting undirected structure information and directed structure information of item conversion in the conversation sequence diagram respectively by using a graph convolution network GCN and a gate control graph neural network GGNN, generating an accurate item implicit vector, inputting the obtained item implicit vector into an attention network, and considering global information and local information of the conversation simultaneously, thereby constructing a more reliable conversation representation and deducing a next click item.
As shown in fig. 1 and 8, the item recommendation method of the neural network based on undirected structure information and directed structure information includes:
S1, let V represent the set of items appearing in all session sequences. An anonymous session sequence of length n can then be expressed as s = [v_1, v_2, …, v_n], where the items in the session s are arranged chronologically and each v_i ∈ V represents an item clicked by the user in the session s; the recommendation task is to predict the user's next click, i.e. to predict the sequence label v_{n+1} of the session s.
S2, converting the historical session sequence into a directed session sequence diagram G_s = (V_s, ε_s, A_s), where V_s represents the node set, ε_s represents the edge set, and A_s represents the set of adjacency matrices. In the session sequence diagram G_s, each node represents an item v_i ∈ V, and each edge (v_{i-1}, v_i) ∈ ε_s represents that the user clicked item v_{i-1} and then item v_i in succession. A_s is defined as the concatenation of three adjacency matrices: a weighted adjacency matrix of the undirected graph, a weighted in-degree adjacency matrix and a weighted out-degree adjacency matrix.
S3, processing the session sequence diagram G_s to obtain the item implicit vectors of all nodes in the session sequence diagram;
S4, inputting the item implicit vectors into a target attention network and obtaining a target-attention-based vector for the session sequence;
s5, acquiring global information and local information of the conversation sequence S, and constructing a conversation vector representation through second linear transformation;
S6, predicting, with the softmax function, the probability of each target item being clicked next in the session sequence s, and recommending the items with the highest probabilities.
In step S3, each node v_i in the session sequence diagram G_s is mapped into a random embedding vector space to obtain a d-dimensional vector representation x_i ∈ R^d, and the item implicit vectors H are obtained through the graph convolution network, the gated graph neural network and a linear transformation.
The method specifically comprises the following steps:
(1) According to the session sequence diagram G_s, an initial item implicit vector x_i is generated for each item v_i ∈ V, where V represents the set of items appearing in all session sequences; each node v_i in the session sequence diagram corresponds to a d-dimensional feature vector x_i, and the stack of these vectors constitutes the feature matrix of the session sequence diagram, X = [x_1, …, x_n]^T.
The specific steps of generating x_i are as follows:

1) The weighted undirected adjacency matrix in the session sequence diagram G_s is a sparse, symmetric adjacency matrix in which a_ij represents the edge weight between nodes v_i and v_j, and the absence of a connection between two nodes is expressed as a_ij = 0.
2) The degree matrix D is defined as the diagonal matrix D = diag(d_1, …, d_n), whose diagonal values equal the row sums of the adjacency matrix, d_i = Σ_j a_ij.
3) Through the random embedding described above, every node v_i in the graph obtains a corresponding d-dimensional feature vector x_i ∈ R^d, so the feature matrix of the session sequence is the stack of the feature vectors corresponding to the nodes in the graph, i.e. X = [x_1, …, x_n]^T.
(2) Input x_i into the graph convolution network with undirected structure information to obtain the first intermediate implicit vectors of all nodes in the graph (these vectors carry the undirected structure information).
(3) And obtaining a second intermediate implicit vector (with the directional structure information) of all nodes in the graph through the gated graph neural network with the directional structure information.
(4) And inputting the second intermediate implicit vector into the first linear transformation to obtain an accurate item implicit vector.
The graph neural network has natural adaptability to the recommendation based on the conversation sequence, because the graph neural network can automatically extract the characteristics of the conversation sequence graph under the premise of considering rich node connection relation.
Similar to convolutional neural networks (CNNs) and multi-layer perceptrons (MLPs), the graph convolution network GCN learns the features of each node v_i in a multi-layer structure, obtains a new feature representation, and then inputs it into the corresponding linear classifier.
In the step (2), the step of generating the first intermediate implicit vector includes:
s31, for the graph convolution layer of the k layer, use the matrix H(k-1)Input vector h representing all nodesiBy H(k)An output vector representing the node. The initial d-dimensional node vector is the feature of the initial input and is input into the first-layer GCN:
H(0)=X, (1)
A GCN with K layers is equivalent to applying a K-layer MLP to the feature vectors x_i of all nodes in the graph, except that the implicit vector representation of each node is averaged with those of its neighbor nodes at the beginning of each layer. In each graph convolution layer, the node vector representation goes through three update phases: feature propagation, linear transformation and point-wise nonlinear activation. Only the feature propagation stage is used in the present invention for learning the item implicit vectors.
S32, at the beginning of each layer, the feature of each node v_i is averaged with the feature vectors of its local neighbors (equation (2)). The output of the graph convolution network is the first intermediate implicit vector.
Preferably, equation (2) is simplified into a simple matrix operation over the whole graph, as follows: let S represent the adjacency matrix with self-loops after symmetric normalization,

S = D̃^(-1/2) Ã D̃^(-1/2),  (3)

where Ã is the weighted undirected adjacency matrix with added self-loops, D̃ is the degree matrix of Ã, ⊙ is the point-wise operator, and I is the identity matrix. Because the adjacency matrix takes the weights of repeated items into account, the matrix S pays additional attention to repeatedly clicked items. Meanwhile, since S carries self-loops, the symmetric normalization makes the weight of items connected by multiple edges smaller than that of items connected by a single edge or by no edge. α and β are hyper-parameters.
In order to increase the weight of the items connected by edges and to reduce the interference of noise from other items, the items connected by edges in the propagation matrix are weighted up through the left half of equation (4), i.e. the attention paid to self-information in the matrix is raised.

The hyper-parameters α and β control the proportion of the propagation-matrix information and of the identity-matrix information, thereby controlling the ratio in which node information with attention is absorbed during propagation. From the specific example given in fig. 2 it can be seen that, in the adjacency matrix and the propagation matrix, the repeatedly clicked item v_2 and the item v_3 visited during the repeated clicks carry higher attention information, i.e. larger weights.
Thus, the equivalent updated form of equation (2) can be changed to a simple sparse matrix multiplication for all nodes.
The above steps locally smooth the implicit vector representation of the nodes along the graph, and after the graph convolution network is used as a feature preprocessing method to transmit the features, the nodes can absorb the attention information of adjacent nodes, and finally the locally connected nodes can have similar prediction performance.
In step (3), a gated graph neural network GGNN is constructed using the method of Li et al.
For each node v_i in the session sequence diagram G_s, the node vector is updated as follows:
in equation (6), the adjacency matrix
Representing the communication of the nodes in the graph,
representative node v
iIn that
Two columns of matrix blocks in (1) are,
is a sequence of node vectors in the session, and
namely, the final output of the graph convolution network is used as the initial input of the gated graph neural network,
and
the weights and the magnitude of the bias terms are controlled,
which is used to represent the result of the interaction between a node and an adjacent node.
For equation (7), the reset gates are obtained by sigma (·) sigmoid function
And a retrofit gate
Weight matrix W
z、U
z,W
r、U
rAnd W
o、U
oNetwork parameters that can be learned in the reset gate, the update gate and the output gate, respectively, are point-by-point operators. Finally, the
Represents node v
iImplicit vectors generated by GGNN gated graph neural networks.
The matrix A_s is defined as the concatenation of an in-degree matrix and an out-degree matrix, which represent the weighted connections of the incoming and outgoing edges in the session sequence diagram, respectively. For example, given a session sequence s = [v_1, v_2, v_3, v_2, v_4], the corresponding session sequence diagram G_s and adjacency matrix A_s are shown in fig. 3. It can be seen that the weights in the directed adjacency matrix are set according to the degree of closeness between nodes: for example, v_2 has an edge to v_3 and an edge to v_4, but the two weights differ, because v_2 and v_3 are connected by more edges, which means the similarity between them is higher. To achieve a better prediction effect, v_2 should absorb more of the information of v_3, so the model should pay more attention to v_3 than to v_4.
Therefore, for each session sequence diagram G_s, the GGNN model propagates node information with attention between adjacent nodes, while the reset gate and the update gate determine which information should be discarded and which should be retained, respectively.
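One gated update step can be sketched as follows; the message form (a sum of transformed in-degree and out-degree messages) and all parameter names are assumptions, while the gate equations follow the standard gated graph neural network of Li et al. referenced above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(A_in, A_out, H, params):
    """One gated update step of the GGNN (hedged sketch).

    A_in, A_out : (n, n) weighted in-degree / out-degree adjacency matrices
    H           : (n, d) current node vectors (initially the GCN output)
    params      : dict of learnable (d, d) matrices 'W_in', 'W_out', 'Wz', 'Uz',
                  'Wr', 'Ur', 'Wo', 'Uo' and a (d,) bias 'b' (names assumed)
    """
    a = A_in @ H @ params['W_in'] + A_out @ H @ params['W_out'] + params['b']  # neighbor interaction
    z = sigmoid(a @ params['Wz'] + H @ params['Uz'])               # update gate
    r = sigmoid(a @ params['Wr'] + H @ params['Ur'])               # reset gate
    h_tilde = np.tanh(a @ params['Wo'] + (r * H) @ params['Uo'])   # candidate state
    return (1.0 - z) * H + z * h_tilde                             # new node vectors
```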
(4) After the information is processed by the graph convolution network GCN and the gated graph neural network GGNN, two intermediate implicit vectors are obtained: the former performs undirected structure information processing with attention on the initial embedding vectors, and the latter, on that basis, extracts the directed structure information with attention in the graph structure more finely. In order to balance the proportion of the undirected structure information with attention against the directed structure information, the two are combined under the control of a hyper-parameter γ, thus obtaining the final, accurate item implicit vectors H.
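A minimal sketch of this balancing step, assuming a convex combination controlled by γ (the text only states that γ controls the proportion between the two kinds of information):

```python
def combine_structure_information(H_gcn, H_ggnn, gamma=0.5):
    """Assumed convex combination of the undirected (GCN) and directed (GGNN)
    structure information; gamma is the hyper-parameter mentioned in the text."""
    return gamma * H_gcn + (1.0 - gamma) * H_ggnn
```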
In step S4, the specific steps include:
s41, calculating all items v in the conversation S by using a local target attention model (the model is prior art and is not described in detail)
iFor each target item v
tAn attention score β for e V
i,tWherein
And
are respectively item v
iAnd v
tIs represented by an implicit vector.
In the above equation, the items in the session are matched with the candidate targets, and a weight matrix is used to perform a pair-wise nonlinear transformation. The resulting self-attention scores are then normalized by the softmax function to obtain the final attention scores.
S42, for each session sequence s, the user's interest with respect to a target item v_t can be expressed as a weighted combination of the item vectors; finally, a vector based on target attention is obtained, which represents the level of interest the user generates for the different target items.
In step S5, the short-term and long-term preferences of the user are further explored using the item vectors involved in the session S, so as to obtain local vectors and global vectors in the session, and a final session vector is generated by synthesizing the target attention-based vectors calculated in the above section.
S51, acquiring the local vector. In a session sequence s, the final behavior of the user is often determined by the last interacted item of the current sequence. Therefore, the short-term interest of the user is expressed as a local vector, which is the vector representation of the last item in the session sequence.
S52, acquiring the global vector. The long-term preference of the user is defined as a global vector, in which all item vectors appearing in the session s are aggregated, while an attention mechanism is used to introduce the dependency between the last interacted item and all items [v_1, v_2, …, v_n] appearing in the whole session.
where q and W_1, W_2 are the corresponding weight parameters.
S53, splicing the obtained local vector, global vector and target-attention-based vector, and obtaining the session vector corresponding to the session sequence s through a linear transformation.
where the weight parameter W_3 projects the result of concatenating the three vectors into the d-dimensional vector space. It should be noted that different session vectors are generated correspondingly for different target items (the items in the session sequence).
In step S6, after the session vector s_h corresponding to each session sequence s has been obtained, a score is computed for every item v_i ∈ V, i.e. the candidate item vector is multiplied with the session vector s_h, and the output vector of the model is then obtained through the softmax function; the scores represent the predicted recommendation scores of all target items, and the output vector gives, for each target item, the probability of being clicked at the next moment of the session sequence s. The top K items with the highest scores are the items to be recommended.
For each session sequence diagram G_s, the loss function is defined as the cross entropy between the predicted values and the actual values, where y_i is the one-hot encoded vector representing the item actually clicked at the next moment of the session sequence.
In use, the data sets may be used to iteratively train steps S2-S6, such as training using the time-based back propagation BPTT algorithm, to obtain parameters such as W, W1, W2, W3, etc. in the above steps, which may be initially randomly set and then learned during training.
In training, each sequence is used as a training sample, so the total error is the sum of the errors at each time step (recommendation). Note that in a recommendation scenario based on conversational sequences, most conversations are relatively short sequences. To prevent the occurrence of overfitting, a smaller number of training passes is used.
Experimental analysis:
1. experimental data set
The method was evaluated on real data using two public data sets: Yoochoose and Diginetica, the latter published for the CIKM Cup 2016. The Yoochoose data set contains the user click streams of an e-commerce platform over 6 months, while the Diginetica data set only contains the data of successful transactions, i.e. the purchase streams of the users.
At the same time, corresponding sequences and labels are further generated by slicing the input sequence data. For an input session sequence s = [v_1, v_2, …, v_n], a series of sequences and labels ([v_1], v_2), ([v_1, v_2], v_3), …, ([v_1, v_2, …, v_{n-1}], v_n) is generated as a data augmentation strategy, where [v_1, v_2, …, v_{n-1}] is the generated sequence and v_n is the label of that sequence, i.e. the item clicked at the next moment.
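The slicing strategy described above can be expressed directly in code; the helper name below is hypothetical.

```python
def augment_session(session):
    """Generate (prefix, label) training pairs from one session sequence."""
    return [(session[:i], session[i]) for i in range(1, len(session))]

# Example: [v1, v2, v3, v4] ->
# ([v1], v2), ([v1, v2], v3), ([v1, v2, v3], v4)
pairs = augment_session(["v1", "v2", "v3", "v4"])
```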
The details of the data set finally used are shown in table 1.
TABLE 1 Experimental data set statistics
2. Evaluation criteria
After the data set is determined, two metrics that are very common in the recommendation based on the conversation sequence are adopted as evaluation indexes of the algorithm.
(1) P @20(Precision) is a widely used measure of prediction accuracy. It represents the proportion of correct recommendations in the top 20 items of the algorithm recommendation.
(2) MRR@20 (Mean Reciprocal Rank) is the mean of the reciprocal ranks of the correctly recommended items in the algorithm's recommendations. When the true result is ranked beyond 20 in the algorithm's recommendation list, the corresponding reciprocal rank is 0. The MRR metric takes the recommendation order into account: a larger MRR value means that the true result is located near the top of the recommendation list, which demonstrates the effectiveness of the recommendation system.
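For reference, a straightforward sketch of the two evaluation metrics; the function name and argument layout are assumptions.

```python
def precision_and_mrr_at_k(ranked_lists, targets, k=20):
    """Compute P@K and MRR@K over a set of test sessions.

    ranked_lists : list of item lists, each ranked by predicted score (best first)
    targets      : list of the items actually clicked next
    """
    hits, rr = 0, 0.0
    for ranking, target in zip(ranked_lists, targets):
        top_k = ranking[:k]
        if target in top_k:
            hits += 1
            rr += 1.0 / (top_k.index(target) + 1)  # reciprocal rank; contributes 0 if not in top K
    n = len(targets)
    return hits / n, rr / n
```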
3. Experimental setup
The dimension of the implicit vectors is set to d = 100 on both data sets. All parameters are initialized with a Gaussian distribution with mean 0 and standard deviation 0.1. The parameters are optimized with a mini-batch Adam optimizer, the initial learning rate η is set to 0.001 and decayed by a factor of 0.1 every three training epochs. Furthermore, the batch size is set to 100 and the L2 regularization parameter to 10^-5.
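A hedged PyTorch sketch of this training configuration is given below; `model` stands for any implementation of steps S2-S6 with learnable parameters, and the two helper functions are hypothetical.

```python
import torch

def init_params(model):
    # Gaussian initialization with mean 0 and standard deviation 0.1
    for p in model.parameters():
        torch.nn.init.normal_(p, mean=0.0, std=0.1)

def make_optimizer(model):
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=0.001,            # initial learning rate eta
        weight_decay=1e-5,   # L2 regularization parameter
    )
    # decay the learning rate by a factor of 0.1 every three training epochs
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    return optimizer, scheduler
```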
4. Analysis of Experimental results
The performance of several methods on both the P @20 and MRR @20 indices is shown in table 2, where bolding shows the best results. The method provided by the invention can flexibly construct the relation between the items on the conversation sequence diagram, and extract the directed structure information and the undirected structure information with attention, so that the subsequent learning of target attention can be more accurate, and the final recommendation can be given by integrating the global interest and the local interest of the user in the conversation. From the experimental data in table 2, it is clear that the method achieves the best performance results on both indexes on the three data sets, which proves the effectiveness of the method.
Conventional recommendation methods such as POP and S-POP do not perform well on the session-sequence-based problem because they ignore the user's preferences in the current session and only consider the top K most popular items. BPR-MF shows that it is meaningful to use semantic information in a session, while the better-performing FPMC shows that modeling session sequences with first-order Markov chains is a relatively effective method. Item-KNN, also a traditional recommendation method, is superior to the former two. It is worth noting that Item-KNN relies only on computing the similarity between items, which suggests that the co-occurrence of items is also a relatively important piece of information. However, Item-KNN does not take the timing information in the session into account and cannot capture the transition information between items.
Unlike the traditional methods, the deep-learning-based methods generally perform better on the indexes of all data sets. GRU4Rec is a recurrent-neural-network-based method that can perform comparably to, or better than, some conventional methods, which demonstrates that the recurrent neural network has some modeling capability for sequence data. However, GRU4Rec focuses mainly on modeling the session sequence and cannot capture the user's preferences within the session. Later methods such as NARM and STAMP both significantly improve on GRU4Rec: NARM explicitly captures the user's main preference in the session, whereas STAMP uses an attention mechanism to consider the user's short-term interest, which is why they are superior to GRU4Rec. RepeatNet is also a recurrent-neural-network-based algorithm and achieves a better prediction effect by considering the user's repeated click behavior, which indicates that modeling the user's behavior habits has a certain importance. However, the improvement of RepeatNet over NARM and STAMP is limited, possibly because modeling the user's repeated click habits through item features alone is insufficient, and RNN-based structures cannot capture some common dependencies within a session.
The graph neural network based approach constructs each session sequence as a subgraph and encodes all the items in the session through the graph neural network. Both SR-GNN and TAGNN gave better results than all RNN-based models. SR-GNN utilizes a gated graph neural network to learn dependencies between items within a sequence of conversations, while TAGNN further exploits the dependencies between items within a conversation and target items with a mechanism of attention. However, these methods completely learn according to the directed relationship of the session sequence diagram, and do not comprehensively consider the undirected relationship in the session sequence diagram, because the relationship between the items in the session sequence is sometimes not unidirectional but bidirectional, and the more comprehensive relationship between the items can be captured by using undirected structure information. Moreover, they ignore the repetitive click feature that occurs in conversational sequences, and intuitively should the importance of a repeated occurrence of an item in a sequence be greater. In addition, in an actual recommendation scene, the association degree between items is variable, and the methods adopt an averaging method for the dependency relationship between items in the session, so that the different dependency degree of a certain item on other items cannot be reflected through a weighting or attention method.
The method presented herein performs better than the methods described above. Specifically, on the three data sets it achieves relative improvements of 3.55%, 1.38% and 1.18% in P@20 over the best-performing related methods, and of 1.92%, 4.34% and 1.98% in MRR@20. The method can extract the structural information in the session sequence diagram well: it extracts the undirected and the directed structure information in the graph in turn with the graph convolution network and the gated graph neural network, and combines them linearly so as to obtain an accurate vector expression. In addition, repeatedly clicked items in the session sequence are taken into account, and the weight of the repeated information is raised through the attention network; meanwhile, the proportion of a node's own information in the session sequence diagram is raised by adding self-loops and the matrix operation, so that the node is less easily disturbed by the noise of other nodes. Different weights are then assigned with the attention network according to the different dependencies between items, so that the network can generate accurate vector representations.
TABLE 2 comparison of the results
Ablation experiment:
the method can flexibly capture the relationship between the structural information and the items in the conversation sequence diagram. In order to verify the actual effect of each composition in the model, several model variants were set up for ablation experiments. In the experimental link, SR-GNN is selected as a reference method for comparison, and data in the experiment are displayed in the form of relative promotion percentage of comparison SR-GNN.
Firstly, performing combined analysis of directed structure information and undirected structure information: (a) -GCN, extracting only undirected structure information in the session sequence diagram. (b) GNN, extracting only the directed structure information in the session sequence graph. (c) GCN + GNN, inputting random initial vectors into two neural networks simultaneously, and then linearly combining the model output results. (d) -GCN + GNN (GCN), inputting the random initial vector into GCN, taking the output vector of GCN model as the input of GNN model, and finally linearly combining the output results of the two models. The results of the experimental comparison are shown in fig. 4 and 5.
Here AVG represents the average performance of the four combination settings over the three data sets. As can be seen from fig. 4 and 5, the method GCN+GNN(GCN), which integrates the directed and the undirected structure information, obtains the best results on both indicators P@20 and MRR@20 over the three data sets, which proves the importance of considering the directed and the undirected structure information together. The average data AVG in fig. 5 also show that using only the undirected structure information performs better than using only the directed structure information, whereas on individual data sets, namely on the MRR@20 index of the Yoochoose 1/4 and Diginetica data sets, the directed structure information performs slightly better. This reflects, to some extent, the connection between the user's preferences and the items in session-sequence-based recommendation: the direction of the transition between items has different importance in different scenarios, but on average the undirected structure information is more important. This illustrates that, in session-sequence-based recommendation, the direction of the user's transitions between items is worth considering, but the relationships between the items the user has viewed must also be considered in order to learn the user's preferences better. Better still is considering the directed and the undirected structure information together: the comprehensive and average performance AVG of the GCN+GNN methods in fig. 4 and fig. 5 is basically better than that of GCN or GNN used alone. The input of both network models in the GCN+GNN method is the random embedding vectors, whereas the input of the GNN model in the GCN+GNN(GCN) method is the vectors from which the GCN model has already extracted the undirected structure information; this shows that, compared with using the random vectors directly, extracting the undirected structure information first and then the directed structure information yields more accurately represented embedding vectors.
And then, performing combined analysis of repeated click attention information and the dependency relationship between the items: (a) GCN + GNN, does not consider the different dependencies between the click-repeatedly attention information and the items. (b) AttGCN + GNN, attention information for a repeatedly clicked item is considered only in GCN. (c) GCN + AttGNN, only consider varying degrees of dependency between items in GNN. (d) AttGCN + AttGNN, while fusing repeatedly clicked attention information in GCN and attention-bearing item dependencies in GNN. The experimental results are shown in fig. 6 and 7.
As can be seen from the data in fig. 6 and 7, AttGCN+AttGNN, which considers both the repeated-click attention information and the inter-item dependencies, achieves the best experimental results on both indicators over the three data sets, which indicates that repeated click behavior and inter-item dependencies are of certain importance in session-sequence-based recommendation. Meanwhile, according to the experimental results on the P@20 index in fig. 6, considering the repeated-click attention or the inter-item relations alone already performs better than GCN+GNN, which considers no attention information, and AttGCN+AttGNN, which considers both, performs best; this shows that the attention information helps retain the important information as much as possible and express it more accurately in the embedding vectors. However, according to the experimental results on the MRR@20 index in fig. 7, although AttGCN+AttGNN, which considers both kinds of attention, still obtains the best results, considering either one alone performs slightly worse than GCN+GNN, which considers no attention information; that is, using one kind of attention information alone can improve the accuracy of the recommendation results, but its effect on the recommendation ranking is not good. A possible reason is that, when one of the two models considers attention while the other does not, the representation of the adjacency matrices in the two models is not unified, so the vectors passed between the two models cannot consistently use attention to retain the structural information; the structural information is disturbed by the inconsistent attention patterns, so an accurately represented embedding vector is not generated in the end and an accurate prediction score for each item cannot be computed in the prediction stage.
In the session-sequence-based recommendation scenario, both the user's repeated click behavior and the graph structure information are worth considering, because the user's behavior can then be predicted well without knowing the user's historical preferences. The invention not only uses the GCN and GNN models to extract the directed and the undirected structure information in the session sequence diagram and combines them linearly, but also introduces an attention mechanism when generating the item implicit vectors, effectively extracting the user's repeated clicks and the complex transition information between items, so that the generated session vector gives more accurate predictions in the recommendation process. On real data sets from three realistic scenarios, the invention verifies that the proposed algorithm is superior to other state-of-the-art methods, and verifies the effectiveness of the attention mechanism and of the complex structure information through exhaustive ablation experiments.
Finally, it should be noted that while the above describes a preferred embodiment of the invention, it will be appreciated by those skilled in the art that, once the basic inventive concepts have been learned, numerous changes and modifications may be made without departing from the principles of the invention, which shall be deemed to be within the scope of the invention. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.