CN116738047A - Session recommendation method based on multi-layer aggregation enhanced contrast learning - Google Patents


Info

Publication number
CN116738047A
Authority
CN
China
Prior art keywords
session
representation
nodes
satellite
item
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310704176.4A
Other languages
Chinese (zh)
Inventor
高世伟
王静宇
曾宇峰
党小超
董晓辉
陈致伟
方则宇
赵文丰
张稣艾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest Normal University
Original Assignee
Northwest Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest Normal University filed Critical Northwest Normal University
Priority to CN202310704176.4A priority Critical patent/CN116738047A/en
Publication of CN116738047A publication Critical patent/CN116738047A/en
Withdrawn legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions


Abstract

The invention discloses a session recommendation model based on multi-layer aggregation enhanced contrastive learning. It first creates a contrast view with a simple and effective noise-based multi-layer aggregation embedding enhancement. It then models the complex transition patterns of the session sequence by stacking a star graph neural network (SGNN), which solves the problem of long-range information propagation by adding a star node on top of a gated graph neural network so that non-adjacent items are considered, and adds uniform noise to the representation learned at each layer. A new contrast view is then generated by aggregating the representations of each layer, achieving more effective representation-level data enhancement without disrupting the context of the session sequence. The mutual information between the session characterizations learned from the two session embeddings is maximized through contrastive learning to improve performance in item/session feature extraction. Finally, the recommendation task and the self-supervised task are unified in one framework through multi-task learning. By jointly optimizing the two tasks, a more robust embedded representation is learned that accurately predicts the next item of interest to the user. The invention addresses the data sparsity problem in the recommendation field.

Description

Session recommendation method based on multi-layer aggregation enhanced contrast learning
Technical Field
The invention relates to a session recommendation method with important application prospects in the recommendation field.
Background
Recommendation systems are widely used in short-video recommendation, online shopping, news recommendation, and the like, because they can predict a user's interests from the user's historical behavior. Most existing recommenders rely on the user's personal information and long-term historical behavior data, which are used as input features and processed with machine learning and deep learning techniques to predict what the user is interested in. However, on many modern online platforms, acquiring a user's historical browsing behavior incurs a cost, and the user's preferences in the current session are easily overwhelmed by long-term historical behavior. Session recommendation systems were therefore developed to enhance the user experience.
As a subtask of recommendation, a session recommendation system has the advantage of relying only on the click behavior of the user's current session. Session recommendation can be characterized as taking only user and item ID information as input, which increases the difficulty of capturing the user's interests. Session recommendation aims to predict the items a user will be interested in, in chronological order, from a given short-term behavior sequence of an anonymous user.
Disclosure of Invention
Graph contrastive learning exploits the inherent relations in the data and learns data characteristics through different augmented views; it does not depend on manually labeled data and has great advantages for the data sparsity problem. It still has the following issue: most current recommendation models based on graph contrastive learning generate contrast views by randomly dropping nodes or edges. Because the session sequences in session recommendation are extremely sparse, randomly dropping nodes or edges may destroy the context of the current session, so traditional graph contrastive learning data enhancement methods are not suitable for session recommendation. To solve this problem, the present invention proposes a session recommendation model based on multi-layer aggregation enhanced contrastive learning.
The invention mainly comprises the following parts: (1) constructing the session graph; (2) data preprocessing; (3) constructing the session representation module; (4) constructing the contrast module; (5) constructing the multi-task learning module.
(1) Constructing the session graph. The session sequence is constructed as a session graph: the items of the session sequence serve as satellite nodes, and one star node is added to capture the long-range dependencies between non-adjacent nodes. In this way, the dependencies between items are obtained.
(2) Data preprocessing. Some sessions are very short and some items appear rarely, so that data carries too little information; such entries in the public datasets are therefore filtered out.
(3) Constructing the session representation module. The star graph neural network learns the information of the satellite nodes in the star graph, updating the satellite nodes mainly by propagating the information of the neighboring nodes and the star node. A gating mechanism fuses the information of the neighboring nodes and the star node. Meanwhile, data enhancement is achieved by adding noise to the representations of the satellite nodes.
To alleviate the overfitting problem common in graph neural networks, a highway network is applied at the last layer of the stacked star graph neural network (SGNN) to compute the final hidden state. Finally, a session embedding representing the global preference is generated by aggregating all satellite-node embeddings in the session sequence, with a soft attention mechanism distinguishing the importance of the items.
(4) Constructing the contrast module. The traditional graph enhancement methods are abandoned in the data enhancement part; instead, a simple and effective noise-based embedding enhancement creates views for contrastive learning in a multi-layer aggregation manner. The contrast module is basically identical to the session representation module, with only two differences: the input of the highway network differs, and the contrast module adds positional encodings to the items in the session so as to integrate order information into the representation.
(5) Constructing the multi-task learning module. Graph contrastive learning fits naturally into multi-task learning, and optimizing several objectives simultaneously can improve the performance of the session recommendation model. In this model, the primary task is next-item recommendation, and contrastive learning serves as an auxiliary task that helps extract general item features from the different contrast views. For the next-item recommendation task, the embedded representation of the session sequence is generated by the session representation module, and the prediction layer then makes the recommendation. To alleviate the popularity bias prevalent in recommendation, layer normalization is applied separately to the session embedding and the item embeddings; the recommendation scores are the products of the normalized session and item embeddings, and finally a softmax over all scores yields the output probabilities.
The session recommendation model based on multi-layer aggregation enhanced contrastive learning comprises the following detailed implementation steps:
step 1: and constructing a session diagram. For each session sequence S, it can be modeled as a session graph g= (V, E), where V represents the sessionNode v of (a) s Representing star nodes, E representing a set of edges in a session, edge +.>Represented as two adjacent point-in-time clicked items in a session. First, the edges are divided into input edges and output edges, and one is allocatedThe normalized weight is calculated by dividing the number of occurrences of the edge by the degree of departure of the starting node of the edge. Taking the item of the session sequence as a satellite node, and adding a satellite node v s To capture long-range dependencies of non-contiguous nodes. The satellite nodes are connected in one way, and the satellite nodes are connected in two directions.
Step 2: and (5) preprocessing data. Entries with a length of 1 session and less than 5 occurrences in the common dataset are filtered out. Further, for each session sequenceDividing it into sequences and corresponding tags, i.e.>
Step 3: and constructing a session representation module. The detailed module construction is divided into three steps, and the specific construction process is described in steps 3.1, 3.2 and 3.3:
step 3.1: initializing. Before learning the item representation, all items in V need to be encoded into a unified embedding space Rd, d representing the size of the embedding dimension. Each session S and item v i Embedded in the same space. It is noted that the initialization of the satellite nodes is not the same as the initialization of the satellite nodes. The embedding of the unique item in the session is directly used as the representation of the satellite node: h is a 0 =[v 1 ,v 2 ,v 3 ,...,v k], wherein vi E R is the d-dimensional embedding of satellite node i in the star map. Star nodeThe initial embedding of the satellite node is obtained by average aggregation of the initial embedding of the satellite node:
step 3.2: the item is embedded in the study. Satellite nodes in the star map are learned by adopting a star map neural network SGNN, and the satellite nodes in the map are updated mainly by transmitting information of adjacent nodes and the star nodes. The purpose of data enhancement is achieved by adding noise to the representation of the satellite nodes. To alleviate the overfitting problem in the graph neural network, a high-speed network is used to calculate the final hidden state. Details of updating nodes, noise addition, and high-speed network are described in detail in steps 3.2.1, 3.2.2, and 3.3.3:
step 3.2.1: updating the nodes. First, consider information from neighboring nodes. For satellite node v on star map i The update function is as follows:
W、W z 、W r 、W o ∈R d×2d and Uz 、U r 、U o ∈R d×d Is a trainable parameter, z s,i and rs,i The update gate and the reset gate are respectively,is a node embedded list of session S at layer 1-1, +.The product, σ, represents the sigmoid activation function. A is that s ∈R k×2k Is a concatenation of two adjacency matrices of input and output edges. For one session s= [ v 1 ,v 2 ,v 4 ,v 3 ,v 2 ,v 3 ]Its corresponding matrix A s As shown in fig. 1. A is that s,i: ∈R 1×2k Is node v s,i In the adjacent matrix A s Corresponding to two columns. /> and />Is an update gate and a reset gate for deciding which information needs to be retained and which information needs to be discarded. Final State->Is the previous hidden state->And candidate State->Under control of an update gate, all satellite nodes in the star map are updated.
Next, a gating mechanism fuses the information of the neighboring nodes h̃_i^l and the star node x_s^{l-1} into the satellite nodes to capture long-range dependencies:
h_i^l = (1 − α_i) h̃_i^l + α_i x_s^{l-1}
where α_i is the importance weight between the neighboring-node state h̃_i^l and the star node x_s^{l-1}:
α_i = ((W_1 h̃_i^l)^T (W_2 x_s^{l-1})) / √d
W_1, W_2 ∈ R^{d×d} are weight matrices, h̃_i^l and x_s^{l-1} are the representations corresponding to satellite node v_i and star node v_s, and √d is a scaling factor.
Step 3.2.2: noise addition at the node. The purpose of data enhancement is achieved by adding noise to the representation of the satellite nodes. Formally, for satellite node v i And its representation at layer 1The following representation level enhancements may be implemented:
Δ′ i is added withThe vector of the added noise is used to determine, ||delta I 2 = e, e is a very small constant, geometrically, by adding the scaled noise vector to ei, it can be rotated as shown in fig. 3. Each rotation corresponds to a deviation of ei and generates an augmented representation. Since the angle of rotation is small enough, the noisy representation retains most of the information of the original representation and is different from the representation. The option of generating noise from a uniform distribution may impart uniformity to the enhancement.
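A sketch of this perturbation with ‖Δ‖_2 = ε, where the noise is kept sign-aligned with the embedding so the rotation angle stays small (function and parameter names are ours, an assumption rather than the patent's code):

```python
import math
import random

def add_noise(e, eps=0.1, rng=None):
    """Perturb an embedding with uniform noise of L2 norm eps whose signs
    follow e, so the perturbation rotates e by only a small angle."""
    rng = rng or random.Random(0)
    raw = [rng.random() for _ in e]                  # U(0,1) noise components
    norm = math.sqrt(sum(x * x for x in raw)) or 1.0
    delta = [eps * x / norm * (1.0 if ei >= 0 else -1.0)
             for x, ei in zip(raw, e)]               # scale to norm eps, align signs
    return [ei + di for ei, di in zip(e, delta)]

e = [0.5, -0.3, 0.2]
e_aug = add_noise(e, eps=0.05)
```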
After updating the embedded representations of the satellite nodes, the embedding of the star node also needs to be updated. An attention mechanism distinguishes the importance of the different satellite nodes, where the importance of each satellite node is determined by its similarity to the star node:
β_i = softmax((W_3 h_i^l)^T (W_4 x_s^{l-1}) / √d)
where W_3, W_4 ∈ R^{d×d} are weight matrices. The embedding of the star node is then updated as the linear combination of the satellite nodes weighted by these coefficients:
x_s^l = Σ_{i=1}^{k} β_i h_i^l
step 3.2.3: high speed networks. One of the most common problems in graph neural networks is the over-fitting problem, which greatly affects the performance of the network, with the high-speed network applied at the last layer of the star graph neural network SGNN. Calculate the final hidden state h f Initial embedding h of satellite nodes 0 And embedding v of last layer L+n Is a weighted sum of (c). The high speed network can be expressed as:
h f =γ⊙h 0 +(1-Y)⊙v L+n
γ=σ(W 5 [h 0 ||v L+n ])
wherein +.is the element product, σ is the sigmoid function, ||is the concatenation operation, W 5 ∈R d×2d Is a trainable oneIs a weight matrix of (a).
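The highway combination can be sketched in a few lines (plain-Python lists instead of tensors; `W5` here is a toy stand-in for the trained weight matrix):

```python
import math

def highway(h0, hL, W5):
    """Highway state h_f = gamma * h0 + (1 - gamma) * hL, elementwise,
    with gamma = sigmoid(W5 @ [h0 || hL]); W5 has shape d x 2d."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    d = len(h0)
    concat = h0 + hL                                 # [h0 || hL], length 2d
    gamma = [sigmoid(sum(W5[i][j] * concat[j] for j in range(2 * d)))
             for i in range(d)]
    return [g * a + (1 - g) * b for g, a, b in zip(gamma, h0, hL)]
```

With a zero weight matrix, gamma is 0.5 everywhere and the output is the midpoint of the two states, which makes the gate's interpolation easy to check.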
Step 3.3: session embedded learning. After obtaining the satellite nodes and embedded representations of the satellite nodes, from the corresponding satellite nodes h f ∈R d×m The obtained sequence item is embedded into x epsilon R d×k . The global preferences and the recent preferences of the user are then taken into account to generate a final session representation as the user's preferences. Last item ie., S in session sequence last =x k As a recent preference of the user.
For the global preferences of the user, a session embedding representing the global preferences is generated by aggregating the embeddings of all satellite nodes in the session sequence. Because different items have different degrees of importance in modeling user preferences, a soft-attention mechanism is used to determine the importance of an item. It is noted that the importance of each item in the session sequence is determined by the star node v s Current item x i And the user' S nearest preference S last Jointly determined:
q 0 ∈R d ,q 1 、q 2 and q3 ∈R d×d Is a trainable weight matrix. The global preference S of the user is then set g With the current interest S last Taken together as the final session representation:
S h =q 4 [S g ||S last ]
i indicates a splicing operation, q 4 ∈R d×2d Is a trainable weight matrix.
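A simplified sketch of the soft-attention aggregation, with q_1, q_2, q_3 taken as identity matrices to keep it short (an assumption for illustration only; the real model learns these weights):

```python
import math

def soft_attention_session(xs, v_s, q0):
    """Global preference S_g = sum_i gamma_i * x_i with
    gamma_i = q0 . sigmoid(x_i + v_s + s_last); q1..q3 assumed identity.
    Returns (S_g, S_last); the model's final S_h = q4 @ [S_g || S_last]."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    s_last = xs[-1]                                  # recent preference
    gammas = []
    for x in xs:
        z = [xi + vi + li for xi, vi, li in zip(x, v_s, s_last)]
        gammas.append(sum(q * sig(zi) for q, zi in zip(q0, z)))
    d = len(xs[0])
    s_g = [sum(g * x[j] for g, x in zip(gammas, xs)) for j in range(d)]
    return s_g, s_last

s_g, s_last = soft_attention_session([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0])
```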
Step 4: and constructing a comparison module. Data enhancement is a key component in contrast learning where a simple and efficient noise-based embedding enhancement is used and views are created for contrast learning by way of multi-layer aggregation. Specifically, the two views have the same initial embedding and adjacency matrix, different comparison views are generated by using a cross-layer comparison mode, and item embedments with different layers obtained through the multi-layer stacked star-shaped graph neural network SGNN are aggregated to be used as a new item view v c The polymerization mode is mean polymerization:
the comparison module is basically consistent with the session representation generation module, and only two differences exist, namely, the input of the high-speed network is different, and the other is that position codes are added to the items in the session in the comparison module so as to integrate the sequence information into the representation. The high speed network of the contrast module is as follows:
γ c =σ(W 6 [h 0 ||v c ])
wherein +.is the element product, σ is the sigmoid function, ||is the concatenation operation, W 6 ∈R d×2d Is a trainable weight matrix. The session representation generation in the comparison module is the same as the session representation generation in the session representation generation module, and the final session representation S of the comparison module is obtained by adopting the formula in the step 3.3 session embedding learning c It should be noted that two modules share the same star node v s
Step 5: and constructing a multi-task learning module. Graph contrast learning is a multitasking learning method, and the performance of a session recommendation model can be improved by simultaneously optimizing a plurality of targets. In the model, the primary task is the next recommendation and the contrast learning serves as an auxiliary task to help extract general features of the item from the different contrast views. The two tasks are unified and jointly optimized:
L total =L main +λL cl
where lambda controls the magnitude of the contrast loss.
For the next-item recommendation task, the embedded representation of the session sequence has been generated by the session representation module, and the prediction layer then performs the recommendation. To alleviate the popularity bias prevalent in recommendation, layer normalization is applied separately to the session embedding S_h and the item embeddings v_i; the recommendation score is the product of the normalized embeddings, and a softmax over all scores yields the final output probabilities:
ŝ_i = LayerNorm(S_h)^T LayerNorm(v_i)
ŷ = softmax(ŝ)
where ŝ_i is computed for all candidate items v_i ∈ V. Using cross entropy as the loss function of the main task:
L_main = − Σ_{i=1}^{|V|} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ]
where ŷ_i represents the probability that item v_i is clicked next.
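The layer-normalized scoring and softmax can be sketched as follows (a pure-Python stand-in for the trained model; function names are ours):

```python
import math

def layer_norm(v, eps=1e-6):
    """Normalize a vector to zero mean and unit variance."""
    m = sum(v) / len(v)
    var = sum((x - m) ** 2 for x in v) / len(v)
    return [(x - m) / math.sqrt(var + eps) for x in v]

def predict(s_h, item_embs):
    """Score each item as dot(LayerNorm(S_h), LayerNorm(v_i)),
    then softmax over all items to get output probabilities."""
    s = layer_norm(s_h)
    scores = [sum(a * b for a, b in zip(s, layer_norm(v))) for v in item_embs]
    mx = max(scores)                                  # stabilize the softmax
    exps = [math.exp(x - mx) for x in scores]
    z = sum(exps)
    return [e / z for e in exps]

probs = predict([1.0, 2.0, 3.0], [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```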
Contrastive learning can be seen as a method of maximizing the mutual information between two latent representations. InfoNCE is used as the contrastive loss function, with the different representations of the same session sequence as the positive pair (S_h, S_c), where S_h and S_c are the normalized session representations generated by the session representation module and the contrast module, respectively. The negative pairs are the representations of the other sessions in the same batch, and sim(a, b) is implemented as the dot product between the two vectors:
L_cl = − log( exp(sim(S_h, S_c)/τ) / Σ_{j=1}^{B} exp(sim(S_h, S_c^j)/τ) )
where τ is a temperature parameter and B is the size of the batch.
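A batch-level sketch of this InfoNCE objective with dot-product similarity (function and variable names are ours):

```python
import math

def info_nce(views_a, views_b, tau=0.2):
    """Batch InfoNCE: the positive for session i is its second view;
    negatives are the second views of the other sessions in the batch."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    B = len(views_a)
    loss = 0.0
    for i in range(B):
        pos = math.exp(dot(views_a[i], views_b[i]) / tau)
        denom = sum(math.exp(dot(views_a[i], views_b[j]) / tau)
                    for j in range(B))
        loss += -math.log(pos / denom)
    return loss / B

# two identical orthonormal views: the positive clearly dominates
va = [[1.0, 0.0], [0.0, 1.0]]
loss = info_nce(va, va, tau=1.0)
```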
Step 6: and (3) verifying the effectiveness of the model, and after the whole session recommendation model is trained, storing parameters of each part. And forward transmitting the test data processed in advance through the model to obtain a predicted value. The performance of the model was evaluated by the evaluation indices p@20 and mrr@20. The formula of the evaluation index is defined as follows:
where N is the total number of sessions, y i Indicating whether the first 20 recommendations in the session contain a tag entry. Y if the recommended item contains a corresponding tag i And the value of (2) is 1, otherwise 0.
Wherein Rank (i) is the ranking of the tags in session i, rec (i) is the reciprocal of the ranking of the tags in session i, and if the ranking is greater than 20, the value of Rec (i) is set to 0. Both the evaluation index p@20 and the mrr@20 are values which are larger, representing better recommended performance.
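The two metrics can be computed directly from ranked recommendation lists (a sketch; argument names are ours):

```python
def p_at_20(ranked_lists, labels):
    """P@20: fraction of sessions whose label appears in the top 20."""
    hits = sum(1 for preds, y in zip(ranked_lists, labels) if y in preds[:20])
    return hits / len(labels)

def mrr_at_20(ranked_lists, labels):
    """MRR@20: mean reciprocal rank of the label, 0 when rank > 20."""
    total = 0.0
    for preds, y in zip(ranked_lists, labels):
        if y in preds[:20]:
            total += 1.0 / (preds.index(y) + 1)   # rank is 1-based
    return total / len(labels)

ranked = [[1, 2, 3], [4, 5, 6]]
labels = [2, 9]
```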
Drawings
FIG. 1 is a schematic illustration of a session diagram in the present invention
FIG. 2 is a diagram of a model construction of the present invention
FIG. 3 is a diagram of an exemplary noise-based representation enhancement in the present invention
FIG. 4 is a graph showing the performance of the present invention under different comparative loss parameters
Detailed Description
The invention will be further described with reference to the drawings and examples.
The session recommendation method of the invention is applied to the Diginetica, Tmall, and Nowplaying datasets. The three datasets differ in size, sparsity, and scenario.
Diginetica comes from CIKM Cup 2016 and consists of typical transaction data.
Tmall comes from the IJCAI competition in 2015 and contains the shopping logs of anonymous users on the Tmall online shopping platform.
Nowplaying describes users' music listening behavior extracted from Twitter.
After applying the preprocessing of step 2 to the datasets, the statistics of the resulting datasets are shown in the following table:
For a fair comparison, the same data preprocessing is used for all baseline models. Initial parameters are drawn from a Gaussian distribution with mean 0 and standard deviation 0.1, and L2 regularization is used as a penalty term with coefficient 10^-5. The embedding size is 100 for all models; Adam is used as the optimizer with an initial learning rate of 0.001, decayed by 0.1 every 3 epochs. The optimal hyperparameter values are obtained by searching over the validation set, for which 10% of the training data is randomly selected. A performance comparison with other strong current models on the Tmall dataset is shown in the following table:
wherein the best results are highlighted in bold and the second best results are underlined.
The table above shows the overall performance of SR-MACL compared with the baseline models. The average of 5 runs is taken as the final result. Comparing the experimental results leads to three conclusions:
(1) Models based on graph neural networks outperform models based on traditional deep learning (RNN-based, attention-based), which demonstrates the strong capability of graph neural networks in modeling the transition relationships of session sequences and shows that graph neural network methods are better suited to session recommendation.
(2) Conventional session recommendation models based on graph contrastive learning perform worse than session models based on a graph neural network alone. This suggests that generating enhanced views from other sessions' information for contrastive learning can introduce information from irrelevant items, leading to suboptimal model performance.
(3) SR-MACL outperforms the other baseline models on the datasets, indicating that the proposed contrastive learning through multi-layer aggregation-enhanced item representations is superior to the above methods that generate enhanced views from other session information.
In summary, the session recommendation model provided by the invention can achieve excellent prediction effect in the common field of session recommendation.

Claims (1)

1. A session recommendation method based on multi-layer aggregation enhanced contrastive learning, applied to session recommendation under data sparsity with data enhancement, characterized in that:
step 1: and constructing a session diagram. For each session sequence S, it can be modeled as a session graph g= (V, E), where V represents the sessionNode v of (a) s Representing star nodes, E representing a set of edges in a session, edgesRepresented as two adjacent point-in-time clicked items in a session. First, the edges are divided into input edges and output edges, and a normalized weight is assigned by dividing the number of occurrences of the edge by the degree of departure of the starting node of the edge. Taking the item of the session sequence as a satellite node, and adding a satellite node v s To capture long-range dependencies of non-contiguous nodes. The satellite nodes are connected in one way, and the satellite nodes are connected in two directionsAnd (5) connection.
Step 2: and (5) preprocessing data. Entries with a length of 1 session and less than 5 occurrences in the common dataset are filtered out. Further, for each session sequenceDividing it into sequences and corresponding tags, i.e
Step 3: and constructing a session representation module. The detailed module construction is divided into three steps, and the specific construction process is described in steps 3.1, 3.2 and 3.3:
step 3.1: initializing. Before learning the item representation, all the items in V need to be encoded into a unified embedding space R d And (3) inner part. Each session S and item v i Embedded in the same space. It is noted that the initialization of the satellite nodes is not the same as the initialization of the satellite nodes. Embedding unique items in a session directly as a representation of satellite nodesThe initial embedding of the satellite node is obtained by average aggregation of the initial embedding of the satellite node.
Step 3.2: the item is embedded in the study. The satellite nodes in the graph are updated primarily by propagating information of neighboring nodes and the satellite nodes. The purpose of data enhancement is achieved by adding noise to the representation of the satellite nodes. To alleviate the overfitting problem in the graph neural network, a high-speed network is used to calculate the final hidden state. Details of updating nodes, noise addition, and high-speed network are described in detail in steps 3.2.1, 3.2.2, and 3.3.3:
step 3.2.1: updating the nodes. First, consider information from neighboring nodes. For satellite node v on star map i The update function is as follows:
W、W z 、W r 、W o ∈R d×2d and Uz 、U r 、U o ∈R d×d Is a trainable parameter, z s,i and rs,i The update gate and the reset gate are respectively,is the node embedded list of session S at layer 1, +.. A is that s ∈R k×2k Is a concatenation of two adjacency matrices of input and output edges. For one session s= [ v 1 ,v 2 ,v 4 ,v 3 ,v 2 ,v 3 ]Its corresponding matrix A s As shown in fig. 1. A is that s,i: ∈R 1×2k Is node v s,i In the adjacent matrix A s Corresponding to two columns. /> and />Is an update gate and a reset gate for deciding which information needs to be retained and which information needs to be discarded. Final State->Is the previous hidden state->And candidate State->Under control of an update gate, all satellite nodes in the star map are updated.
Next, a gating mechanism fuses the information of the neighboring nodes h̃_i^l and the star node x_s^{l-1} into the satellite nodes to capture long-range dependencies:
h_i^l = (1 − α_i) h̃_i^l + α_i x_s^{l-1}
where α_i is the importance weight between the neighboring-node state h̃_i^l and the star node x_s^{l-1}:
α_i = ((W_1 h̃_i^l)^T (W_2 x_s^{l-1})) / √d
W_1, W_2 ∈ R^{d×d} are weight matrices, h̃_i^l and x_s^{l-1} are the representations corresponding to satellite node v_i and star node v_s, and √d is a scaling factor.
Step 3.2.2: noise addition at the node. The purpose of data enhancement is achieved by adding noise to the representation of the satellite nodes. Formally, for satellite node v i And its representation at the first layerThe following representation level enhancements may be implemented:
Δ' i is the noise vector that is added to the signal, ||delta I 2 = e, e is a very small constant, geometrically, by adding the scaled noise vector to ei, it can be rotated as shown in fig. 3. Each rotation corresponds to a deviation of ei and generates an augmented representation. Since the angle of rotation is small enough, the noisy representation retains most of the information of the original representation and is different from the representation. Selecting to produce from a uniform distributionNoise, which may impart uniformity to the enhancement.
After updating the embedded representations of the satellite nodes, the embedding of the star node also needs to be updated. An attention mechanism distinguishes the importance of the different satellite nodes, where the importance of each satellite node is determined by its similarity to the star node:
β_i = softmax((W_3 h_i^l)^T (W_4 x_s^{l-1}) / √d)
where W_3, W_4 ∈ R^{d×d} are weight matrices. The embedding of the star node is then updated as the linear combination of the satellite nodes weighted by these coefficients:
x_s^l = Σ_{i=1}^{k} β_i h_i^l
step 3.2.3: high speed networks. One of the most common problems in graph neural networks is the over-fitting problem, which greatly affects the performance of the network, with the high-speed network applied at the last layer of the star graph neural network SGNN. Calculate the final hidden state h f Initial embedding h of satellite nodes 0 And embedding v of last layer L+n Is a weighted sum of (c). The high speed network can be expressed as:
h_f = γ ⊙ h^0 + (1 − γ) ⊙ v^{L+n}
γ = σ(W_5 [h^0 || v^{L+n}])
where ⊙ is the element-wise product, σ is the sigmoid function, || is the concatenation operation, and W_5 ∈ R^{d×2d} is a trainable weight matrix.
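The highway combination above can be sketched in numpy as follows (names and the single-node, d-dimensional shapes are assumptions for illustration):

```python
import numpy as np

def highway(h0, vL, W5):
    """h_f = gamma * h0 + (1 - gamma) * vL, with gamma = sigmoid(W5 [h0 || vL])."""
    gate_in = np.concatenate([h0, vL])             # concatenation [h0 || vL], shape (2d,)
    gamma = 1.0 / (1.0 + np.exp(-(W5 @ gate_in)))  # sigmoid gate, shape (d,)
    return gamma * h0 + (1.0 - gamma) * vL         # element-wise convex combination

rng = np.random.default_rng(0)
d = 4
h0, vL = rng.normal(size=d), rng.normal(size=d)
h_f = highway(h0, vL, rng.normal(size=(d, 2 * d)))
```

Because γ ∈ (0, 1) element-wise, each component of h_f lies between the corresponding components of h^0 and v^{L+n}.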
Step 3.3: session embedded learning. After obtaining the satellite nodes and embedded representations of the satellite nodes, from the corresponding satellite nodes h f ∈R d×m The obtained sequence item is embedded into x epsilon R d×k . The global preferences and the recent preferences of the user are then taken into account to generate a final session representation as the user's preferences. In a sequence of sessionsLast item ie., S last =x k As a recent preference of the user.
For the user's global preference, a session embedding representing the global preference is generated by aggregating the embeddings of all satellite nodes in the session sequence. Because different items matter to different degrees when modeling user preference, a soft-attention mechanism is used to determine the importance of each item. Note that the importance of each item in the session sequence is jointly determined by the star node v_s, the current item x_i, and the user's recent preference S_last:
where q_0 ∈ R^d and q_1, q_2, q_3 ∈ R^{d×d} are trainable parameters. The user's global preference S_g and current interest S_last are then combined as the final session representation:
S_h = q_4 [S_g || S_last]
i indicates a splicing operation, q 4 ∈R d×2d Is a trainable weight matrix.
Step 4: and constructing a comparison module. Data enhancement is a key component in contrast learning where a simple and efficient noise-based embedding enhancement is used and views are created for contrast learning by way of multi-layer aggregation. Specifically, the two views have the same initial embedding and adjacency matrix, different comparison views are generated by using a cross-layer comparison mode, and the item embedments with different layers obtained through the multi-layer star-shaped graph neural network SGNN are aggregated to be used as a new item view v c The polymerization mode is mean polymerization:
the comparison module is basically consistent with the session representation generation module, and only two differences exist, namely, the input of the high-speed network is different, and the other is that position codes are added to the items in the session in the comparison module so as to integrate the sequence information into the representation. The high speed network of the contrast module is as follows:
γ^c = σ(W_6 [h^0 || v^c])
where ⊙ is the element-wise product, σ is the sigmoid function, || is the concatenation operation, and W_6 ∈ R^{d×2d} is a trainable weight matrix. Session representation generation in the contrast module is identical to that in the session representation generation module: the final session representation S_c of the contrast module is obtained with the formulas of Step 3.3 (session embedding learning). Note that the two modules share the same star node v_s.
Step 5: and constructing a multi-task learning module. Graph contrast learning is a multitasking learning method, and the performance of a session recommendation model can be improved by simultaneously optimizing a plurality of targets. In the model, the primary task is the next recommendation and the contrast learning serves as an auxiliary task to help extract general features of the item from the different contrast views. The two tasks are unified and jointly optimized:
L_total = L_main + λ L_cl
where λ controls the magnitude of the contrastive loss.
For the next-item recommendation task, an embedded representation of the session sequence has already been generated by the session representation generation module; the prediction layer then performs the next recommendation. To alleviate the popularity bias prevalent in recommendation, the session embedding S_h and the item embeddings v_i are each layer-normalized; the normalized session embedding and item embeddings are then multiplied to obtain a recommendation score for each candidate. Finally, a softmax over all the scores yields the final output probability ŷ:
where ŷ gives a probability for every candidate item v_i ∈ V. The cross-entropy loss is adopted as the loss function of the main task and can be expressed as:
where ŷ_i denotes the probability that item v_i is clicked next.
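A sketch of this prediction layer: layer-normalize the session and item embeddings, score candidates by dot product, apply a softmax, and take the cross-entropy on the target item. The parameter-free layer-norm form (no learned gain/bias) and all names are assumptions:

```python
import numpy as np

def layer_norm(x):
    """Parameter-free layer normalization over the last axis (assumed form)."""
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + 1e-8)

def predict_and_loss(S_h, V, target):
    """Scores z_i = ln(v_i) . ln(S_h); probabilities by softmax; CE loss on target."""
    z = layer_norm(V) @ layer_norm(S_h)  # recommendation scores over all candidates
    z = z - z.max()                      # numerically stable softmax
    y = np.exp(z) / np.exp(z).sum()      # output probabilities y_hat
    return y, -np.log(y[target])         # cross-entropy of the clicked item

rng = np.random.default_rng(0)
y_hat, loss = predict_and_loss(rng.normal(size=4), rng.normal(size=(10, 4)), target=3)
```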
Contrastive learning can be seen as a method of maximizing the mutual information between two latent representations. InfoNCE is used as the contrastive loss function: the two representations of the same session sequence form the positive pair, namely the normalized session representations generated by the session representation module and the contrast module respectively. Negative pairs are formed with the other sessions in the batch. The sim(a, b) function is implemented as the dot product between two vectors:
where τ is a temperature parameter and B is the batch size.
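A batched InfoNCE sketch in which row i of the two view matrices is the positive pair and the other rows in the batch serve as negatives (the matrix layout and names are assumptions):

```python
import numpy as np

def info_nce(S1, S2, tau=0.2):
    """S1, S2: (B, d) normalized session representations from the two views."""
    sim = (S1 @ S2.T) / tau                     # pairwise dot-product similarities
    sim = sim - sim.max(axis=1, keepdims=True)  # stabilize the row-wise softmax
    p = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    return -np.log(np.diag(p)).mean()           # positive pairs sit on the diagonal

views = np.eye(4)                               # toy batch of B = 4 sessions
loss_aligned = info_nce(views, views)           # views agree -> small loss
loss_mismatched = info_nce(views, np.roll(views, 1, axis=0))
```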
Step 6: and (3) verifying the effectiveness of the model, and after the whole session recommendation model is trained, storing parameters of each part. And forward transmitting the test data processed in advance through the model to obtain a predicted value. The performance of the model was evaluated by the evaluation indices p@20 and mrr@20. The formula of the evaluation index is defined as follows:
where N is the total number of sessions and y_i indicates whether the top-20 recommendations for session i contain the target item: y_i is 1 if they do, and 0 otherwise.
where Rank(i) is the rank of the target item in session i and Rec(i) is the reciprocal of that rank; if the rank is greater than 20, Rec(i) is set to 0. For both P@20 and MRR@20, larger values indicate better recommendation performance.
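P@20 and MRR@20 can be computed from the 1-based rank of the target item in each session's recommendation list, for example (names and data are illustrative):

```python
import numpy as np

def p_at_k(ranks, k=20):
    """Fraction of sessions whose target item appears in the top k."""
    return float((np.asarray(ranks) <= k).mean())

def mrr_at_k(ranks, k=20):
    """Mean reciprocal rank; ranks beyond k contribute 0."""
    r = np.asarray(ranks, dtype=float)
    return float(np.where(r <= k, 1.0 / r, 0.0).mean())

ranks = [1, 5, 30, 2]                          # hypothetical target ranks for 4 sessions
p20, mrr20 = p_at_k(ranks), mrr_at_k(ranks)    # 0.75 and 0.425
```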
CN202310704176.4A 2023-06-14 2023-06-14 Session recommendation method based on multi-layer aggregation enhanced contrast learning Withdrawn CN116738047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310704176.4A CN116738047A (en) 2023-06-14 2023-06-14 Session recommendation method based on multi-layer aggregation enhanced contrast learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310704176.4A CN116738047A (en) 2023-06-14 2023-06-14 Session recommendation method based on multi-layer aggregation enhanced contrast learning

Publications (1)

Publication Number Publication Date
CN116738047A true CN116738047A (en) 2023-09-12

Family

ID=87911044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310704176.4A Withdrawn CN116738047A (en) 2023-06-14 2023-06-14 Session recommendation method based on multi-layer aggregation enhanced contrast learning

Country Status (1)

Country Link
CN (1) CN116738047A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117933304A (en) * 2024-02-08 2024-04-26 哈尔滨工业大学 Multi-view session recommendation bridging model based on self-supervision
CN118427287A (en) * 2024-07-03 2024-08-02 吉林大学 Green channel line selection method based on graph neural network

Similar Documents

Publication Publication Date Title
Wu et al. Session-based recommendation with graph neural networks
Guan et al. Matrix factorization with rating completion: An enhanced SVD model for collaborative filtering recommender systems
CN112364976B (en) User preference prediction method based on session recommendation system
CN110232480B (en) Project recommendation method realized by using variational regularized stream and model training method
CN111127142B (en) Article recommendation method based on generalized nerve attention
CN116738047A (en) Session recommendation method based on multi-layer aggregation enhanced contrast learning
CN110032679B (en) Dynamic news recommendation method based on hierarchical attention network
CN113204522B (en) Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
KR102070049B1 (en) Apparatus and method for collaborative filtering using auxiliary information based on conditional variational autoencoder
CN109034953B (en) Movie recommendation method
CN114693397A (en) Multi-view multi-modal commodity recommendation method based on attention neural network
Zarzour et al. RecDNNing: a recommender system using deep neural network with user and item embeddings
CN115470406A (en) Graph neural network session recommendation method based on dual-channel information fusion
Xie et al. Sequential recommendation on dynamic heterogeneous information network
CN115618101A (en) Streaming media content recommendation method and device based on negative feedback and electronic equipment
CN114595383A (en) Marine environment data recommendation method and system based on session sequence
Chen et al. Context-aware ensemble of multifaceted factorization models for recommendation prediction in social networks
Yan et al. Negative correlation discovery for big multimedia data semantic concept mining and retrieval
CN113342994A (en) Recommendation system based on non-sampling cooperative knowledge graph network
Shan et al. NASM: nonlinearly attentive similarity model for recommendation system via locally attentive embedding
CN116662678A (en) Social interest embedding method adopting heterogeneous graph neural network
CN114912984B (en) Self-attention-based time scoring perception context recommendation method and system
CN113268657B (en) Deep learning recommendation method and system based on comments and item descriptions
CN114117229A (en) Project recommendation method of graph neural network based on directed and undirected structural information
CN115544352A (en) Prediction method and system based on multi-view task relation perception meta-learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230912