CN114154070A - MOOC recommendation method based on graph convolution neural network

Info

Publication number: CN114154070A
Application number: CN202111489426.4A
Authority: CN
Inventors: 王曙燕; 郭睿涵; 孙家泽; 王小银
Original assignee: Xian University of Posts and Telecommunications
Current assignee: Xian University of Posts and Telecommunications
Priority date: 2021-12-07
Filing date: 2021-12-07
Publication date: 2022-03-08

Abstract

The invention discloses a MOOC recommendation method based on a graph convolutional neural network. It addresses the application of recommendation systems to MOOC course recommendation and belongs to the field of deep learning recommendation systems. The method first performs data augmentation on the bipartite graph of user-item interactions on a MOOC platform to obtain two sub-views, and uses a graph convolutional neural network to extract node representations on the sub-views and on the original bipartite graph. It then constructs a self-supervised learning auxiliary task, combines the supervised learning task of the recommendation system with the self-supervised learning task to form a multi-task learning paradigm, and finally makes recommendations to the target user. The method can alleviate data sparsity in the field of online education, improve the recommendation accuracy of the model for long-tail items and its robustness to noisy data, provide learners with personalized learning recommendations, and support personalized education.

Description

MOOC recommendation method based on graph convolution neural network

Technical Field

The invention belongs to the field of deep learning recommendation systems. It relates in particular to the problem of MOOC recommendation based on graph neural networks, and provides a MOOC recommendation method based on a graph convolutional neural network.

Background

In the data age, personalized recommendation technology is widely applied across the internet but is less used in education. With the rapid growth of MOOC education data, analyzing and mining this data, extracting valuable information from it, and providing learners with personalized learning recommendations is an important route to personalized education. A recommendation method for online education that gives student users efficient, personalized recommendations is therefore both necessary and meaningful.

Recommendation models fall into three main types: shallow models, neural models, and models based on graph neural networks. Among the last type, models based on graph convolutional networks work end to end, integrate multi-hop neighbors into node representation learning, and achieve state-of-the-art recommendation performance. Such models still have limitations, however: user-item interaction information is sparse, long-tail items receive less exposure than high-frequency items, and interaction records contain noise. The present method adds an auxiliary self-supervised learning task to the traditional supervised recommendation task, which improves the recommendation accuracy of the model for long-tail items on a MOOC (massive open online course) data set, improves its robustness to noisy data, and strengthens the self-discrimination ability of node representation learning.

Disclosure of Invention

The invention provides a MOOC recommendation method based on a graph convolutional neural network. It uses the idea of contrastive learning from self-supervised learning to construct a self-supervised auxiliary task, and combines this contrastive task with the supervised learning task of the recommendation system to form a multi-task learning paradigm. Contrastive learning trains the encoder by maximizing the agreement between two augmented views of the same instance; by perturbing the input data it exploits the unlabeled data space, and thereby achieves significant improvements in downstream tasks.

The invention comprises the following steps:

Step one: acquiring the user-item interaction data of each user in the MOOC data set, wherein the user-item interaction data comprises a user ID, an item ID and item interaction record data;

Step two: preprocessing the MOOC data set by ID remapping, data screening and missing data filling to obtain a user-item interaction data set X_data, and converting the user-item interaction data set X_data into a bipartite graph G of user-item interactions;

Step three: denoting the data augmentation operation as s, and performing two completely independent data augmentation operations on the bipartite graph G of user-item interactions to form two views s₁(G) and s₂(G);

Step four: encoding the nodes of the bipartite graph G of user-item interactions with a graph convolutional neural network, wherein n is the layer number, Z⁽ⁿ⁾ is the node representation at layer n, Z⁽ⁿ⁻¹⁾ is the node representation at the previous layer, and H denotes the neighborhood aggregation function, according to the formula:

Z^{(n)} = H(Z^{(n-1)}, G)

Step five: performing the graph convolution operation on the two views s₁(G) and s₂(G), obtained by data augmentation of the bipartite graph G of user-item interactions, to obtain the node representations Z₁⁽ⁿ⁾ and Z₂⁽ⁿ⁾, according to the formulas:

Z_1^{(n)} = H(Z_1^{(n-1)}, s_1(G)), \quad Z_2^{(n)} = H(Z_2^{(n-1)}, s_2(G))

Step six: encoding the nodes of the bipartite graph G of user-item interactions with the graph convolutional neural network to obtain the representation Z_u of the user node and the representation Z_i of the item node, and expressing the predicted score ŷ_ui as the dot product of Z_u and Z_i, according to the formula:

\hat{y}_{ui} = Z_u^{\top} Z_i

Step seven: constructing the model by jointly modeling users and items and jointly optimizing the objective function, the total loss function of the joint optimization being expressed as:

\mathcal{L} = \mathcal{L}_{BPR} + \lambda_1 \mathcal{L}_{InfoNCE} + \lambda_2 \lVert \theta \rVert_2^2

wherein L_BPR denotes the Bayesian personalized ranking loss commonly used by recommendation systems, the goal of which is to make the score difference between positive and negative samples as large as possible; L_InfoNCE denotes the information noise contrastive estimation (InfoNCE) loss function, the optimization goal of which is to maximize the similarity between the representation vectors learned by the graph convolutional neural network for the same node under different views and to minimize the similarity between the representation vectors of different nodes; θ denotes the trainable parameters of the model; ‖θ‖₂² denotes the L2 regularization term, which prevents overfitting; and λ₁ and λ₂ are hyper-parameters;

Step eight: for any target user, predicting whether the target user will interact with a target item according to the dot product of the node representation vector of the target user and the node representation vector of the target item.
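
Steps two and three can be illustrated with a minimal Python sketch. It assumes that interactions are (user ID, item ID) pairs, implements data screening as an iterative filter that keeps users and items with more than k interactions, and uses random edge dropout as the data augmentation operation s. The function names, the drop rate of 0.2, the random seeds and the sample records are all hypothetical and are not specified by the invention.

```python
import random
from collections import Counter


def k_core_filter(interactions, k):
    """Repeatedly drop pairs whose user or item has k or fewer interactions."""
    pairs = set(interactions)
    while True:
        user_deg = Counter(u for u, _ in pairs)
        item_deg = Counter(i for _, i in pairs)
        kept = {(u, i) for u, i in pairs if user_deg[u] > k and item_deg[i] > k}
        if kept == pairs:
            return kept
        pairs = kept


def remap_ids(pairs):
    """Map raw user and item IDs to consecutive integers starting at 0."""
    user_map = {u: idx for idx, u in enumerate(sorted({u for u, _ in pairs}))}
    item_map = {i: idx for idx, i in enumerate(sorted({i for _, i in pairs}))}
    edges = sorted((user_map[u], item_map[i]) for u, i in pairs)
    return edges, user_map, item_map


def edge_dropout(edges, drop_rate, rng):
    """Return one augmented view: each edge is kept with probability 1 - drop_rate."""
    return [edge for edge in edges if rng.random() >= drop_rate]


# Hypothetical raw records: (user ID, item ID).
raw = [("stu_a", "course_x"), ("stu_a", "course_y"), ("stu_b", "course_x"),
       ("stu_b", "course_y"), ("stu_c", "course_x")]

filtered = k_core_filter(raw, k=1)
edges, user_map, item_map = remap_ids(filtered)   # edge list of the bipartite graph G

# Two independent augmentations of G give the two views s1(G) and s2(G).
view_1 = edge_dropout(edges, 0.2, random.Random(1))
view_2 = edge_dropout(edges, 0.2, random.Random(2))
```

Using two separately seeded generators is one simple way to keep the two augmentations independent; in training, new views would typically be drawn repeatedly rather than fixed once.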

The invention has the beneficial effects that:

The method adds a self-supervised learning auxiliary task to a recommendation model based on a graph convolutional neural network and trains the network with the supervision signal constructed by that auxiliary task, so that the network learns representations that are valuable for the downstream task. This alleviates the data sparsity problem and improves the recommendation accuracy of the model for long-tail items and its robustness to noisy data.

Drawings

FIG. 1 is a schematic flow chart of the recommendation method of the present invention.

Detailed description of the invention

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to specific examples.

Step one: acquiring the user ID, the item ID and the item interaction record data from the data set;

Step two: preprocessing the raw data by ID remapping, data screening and missing value filling. The complex item IDs in the original data set are mapped to simple numeric IDs, and K-core data filtering retains the users and items with more than K interaction records. The data are then converted into a model-friendly format, namely an intermediate key-value data structure designed in Python: the keys correspond to the input features, so that feature names can conveniently be used for indexing when writing the algorithm, and the values are the tensors used for computation and parameter updates;

Step three: constructing the graph convolutional neural network model, learning the user representation through neighborhood aggregation and obtaining the final representation vector of the user, according to the formulas:

z_u^{(n)} = f_{combine}\left(z_u^{(n-1)}, f_{aggregate}\left(\{ z_i^{(n-1)} \mid i \in N_u \}\right)\right)

z_u = f_{readout}\left(\left[ z_u^{(0)}, z_u^{(1)}, \dots, z_u^{(n)} \right]\right)

wherein z_u denotes the final representation vector of the user, z_u⁽ⁿ⁾ denotes the node representation vector of user u at layer n, f_combine denotes the combination function, f_aggregate denotes the neighborhood aggregation function, f_readout denotes the readout function, z_u⁽ⁿ⁻¹⁾ denotes the node representation vector of the user at the previous layer, z_i⁽ⁿ⁻¹⁾ denotes the node representation vector of an item at the previous layer, i denotes an item the user has interacted with, N_u denotes the set of items the target user has interacted with, and n denotes the number of layers of the graph convolutional neural network;

Step four: performing two completely independent data augmentation operations on the bipartite graph G of user-item interactions to form two views s₁(G) and s₂(G), wherein s comprises two methods, random node dropout and random edge dropout, given respectively by the formulas:

s_1(G) = (M' \odot V, E), \quad s_2(G) = (M'' \odot V, E)

s_1(G) = (V, M_1 \odot E), \quad s_2(G) = (V, M_2 \odot E)

wherein M′, M″ ∈ {0, 1}^|V| are two mask vectors applied to the node set, M₁, M₂ ∈ {0, 1}^|E| are two mask vectors applied to the edge set, ⊙ denotes the element-wise product of two vectors, V denotes the node set and E denotes the edge set;

Step five: performing the graph convolution operation on the two views s₁(G) and s₂(G), obtained by data augmentation of the bipartite graph G of user-item interactions, to obtain the node representation vectors Z₁⁽ⁿ⁾ and Z₂⁽ⁿ⁾, according to the formulas:

Z_1^{(n)} = H(Z_1^{(n-1)}, s_1(G)), \quad Z_2^{(n)} = H(Z_2^{(n-1)}, s_2(G))

Step six: encoding the nodes of the bipartite graph G of user-item interactions with the graph convolutional neural network to obtain the representation vector Z_u of the user node and the representation vector Z_i of the item node, and expressing the predicted score ŷ_ui as the dot product of Z_u and Z_i, according to the formula:

\hat{y}_{ui} = Z_u^{\top} Z_i

In the total loss function of the joint optimization, L_BPR denotes the Bayesian personalized ranking loss commonly used by recommendation systems, given by the formula:

\mathcal{L}_{BPR} = \sum_{(u, i, j) \in O} -\log \sigma(\hat{y}_{ui} - \hat{y}_{uj})

wherein O denotes the data records in the user-item interaction data set, u denotes a user, i denotes an item the user has interacted with, j denotes an item the user has not interacted with, ŷ_ui denotes the score of the positive sample, ŷ_uj denotes the score of the negative sample, and σ denotes the sigmoid function. The goal of the Bayesian personalized ranking loss L_BPR is to make the score difference between positive and negative samples as large as possible.

The user-side information noise contrastive estimation (InfoNCE) loss function is given by the formula:

\mathcal{L}_{InfoNCE}^{user} = \sum_{u \in U} -\log \frac{\exp(s(z'_u, z''_u) / \tau)}{\sum_{v \in U} \exp(s(z'_u, z''_v) / \tau)}

wherein U denotes the set of users, s denotes the function that computes the similarity between user vectors, and τ is a temperature parameter. Performing the graph convolution operation on the two views s₁(G) and s₂(G) yields the node representations z′_u and z″_u of user u, which are the representation vectors learned by the graph convolutional neural network for the same user u under different views. The representation learned for a user v (u ≠ v) on view s₂(G) is denoted z″_v, so that z′_u and z″_v are representation vectors learned for different user nodes under different views. The item-side InfoNCE loss function has the same form as the user-side InfoNCE loss function.

The overall contrastive loss is the sum of the user-side and item-side terms:

\mathcal{L}_{InfoNCE} = \mathcal{L}_{InfoNCE}^{user} + \mathcal{L}_{InfoNCE}^{item}

The optimization goal is to maximize the similarity between the representation vectors learned by the graph convolutional neural network for the same node under different views, and to minimize the similarity between the representation vectors learned for different nodes.
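
The loss terms described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation of the invention: cosine similarity is assumed for the similarity function s, the contrastive denominator is assumed to sum over all nodes of the second view, and the function names and the toy vectors used to exercise them are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Assumed choice for the similarity function s.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def bpr_loss(z_users, z_items, triples):
    """Bayesian personalized ranking loss over (user, positive item, negative item)."""
    total = 0.0
    for u, i, j in triples:
        diff = dot(z_users[u], z_items[i]) - dot(z_users[u], z_items[j])
        total += math.log1p(math.exp(-diff))  # equals -log(sigmoid(diff))
    return total


def info_nce_loss(view_1, view_2, tau):
    """Contrastive loss: row u of view_1 and row u of view_2 form the positive pair."""
    total = 0.0
    for u, z1 in enumerate(view_1):
        logits = [cosine(z1, z2) / tau for z2 in view_2]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - logits[u]
    return total


def joint_loss(l_bpr, l_info_nce, params, lambda_1, lambda_2):
    """Total loss: ranking loss + lambda_1 * contrastive loss + lambda_2 * L2 term."""
    l2 = sum(p * p for p in params)
    return l_bpr + lambda_1 * l_info_nce + lambda_2 * l2


# Toy check: matching representations across views give a lower contrastive loss.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(view_a, view_a, tau=0.5)
swapped = info_nce_loss(view_a, [[0.0, 1.0], [1.0, 0.0]], tau=0.5)
```

In this sketch the contrastive loss would be computed once for user nodes and once for item nodes and the two values added before being passed to joint_loss.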

The experimental results obtained by this method and several other recommendation methods on the MOOC data set are shown in Table 1:

Table 1. Experimental results

Method	Recall@5	NDCG@5
DMF	0.3326	0.2897
GC-MC	0.3485	0.302
NGCF	0.3586	0.3097
LightGCN	0.3584	0.311
This method	0.3863	0.3337

The data set is the online user learning record data set of XuetangX. The first column of Table 1 lists common recommendation methods. The second column gives the evaluation index Recall@5, the recall over the top 5 recommended items; recall is the proportion of actual positive samples that are correctly predicted, and the higher it is, the better the recommendation effect of the model. The third column gives the evaluation index NDCG@5, the normalized discounted cumulative gain over the top 5 recommended items; the closer highly relevant results appear to the top of the list, the higher NDCG@5 is and the better the recommendation effect.
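
The two evaluation indices can be computed for a single user as in the following sketch. It assumes binary relevance (an item is either a held-out positive or not), divides recall by the number of relevant items, and uses the common log2 position discount for NDCG; the function names and the sample course IDs are hypothetical.

```python
import math


def recall_at_k(ranked_items, relevant_items, k):
    """Share of the relevant items that appear in the top-k recommendations."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant_items)
    return hits / len(relevant_items)


def ndcg_at_k(ranked_items, relevant_items, k):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k])
              if item in relevant_items)
    ideal_hits = min(k, len(relevant_items))
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg


# Hypothetical ranking for one user; "c2" and "c6" are the held-out positives.
ranked = ["c1", "c2", "c3", "c4", "c5", "c6"]
relevant = {"c2", "c6"}
recall_5 = recall_at_k(ranked, relevant, 5)
ndcg_5 = ndcg_at_k(ranked, relevant, 5)
```

Moving a relevant item closer to the top of the ranking leaves Recall@5 unchanged but raises NDCG@5, which is why the two indices are reported together. The per-user values would be averaged over all test users.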

The MOOC data set is screened by the number of interactions per item to obtain a data set of long-tail items with few interactions. The experimental results obtained by this method and other recommendation methods on the long-tail item data set are shown in Table 2:

Table 2. Experimental results for long-tail item recommendation

Method	Recall@5
GC-MC	0.2835
NGCF	0.3042
LightGCN	0.3511
This method	0.3706

The first column of Table 2 lists common recommendation methods and the second column gives the evaluation index Recall@5; the higher Recall@5 is, the better the recommendation effect of the model on long-tail items. As the experimental results in Table 2 show, the method improves the recommendation accuracy of the model for long-tail items.
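
The screening used to build the long-tail data set can be sketched as follows. The description does not state the interaction-count threshold, so the max_interactions parameter, the function name and the sample edges are hypothetical.

```python
from collections import Counter


def long_tail_subset(edges, max_interactions):
    """Keep only interactions whose item has at most max_interactions records."""
    item_counts = Counter(item for _, item in edges)
    return [(user, item) for user, item in edges
            if item_counts[item] <= max_interactions]


# Hypothetical (user, item) edges: item 0 is popular, items 1 and 2 are long-tail.
sample_edges = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 2)]
tail_edges = long_tail_subset(sample_edges, max_interactions=1)
```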

A certain amount of randomly generated user-item interaction records is added to the MOOC data set to obtain a data set containing noise. The experimental results obtained by this method and other recommendation methods on the noisy data set are shown in Table 3:

Table 3. Experimental results for model robustness

Method	Recall@5	NDCG@5
DMF	0.3005	0.2634
GC-MC	0.3122	0.2745
NGCF	0.3266	0.2842
LightGCN	0.3344	0.2876
This method	0.3494	0.3022

The first column of Table 3 lists common recommendation methods. The second column gives the evaluation index Recall@5 and the third column gives the evaluation index NDCG@5; for both, a higher value indicates a better recommendation effect. As the experimental results in Table 3 show, the method improves the robustness of the model to noisy data.
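
One way to build such a noisy data set is sketched below. The description does not state how many records are added or how they are sampled, so uniform sampling of user-item pairs that are not yet in the data set, the function name and all parameter values are assumptions made for illustration.

```python
import random


def add_random_noise(edges, num_users, num_items, num_noise, rng):
    """Append num_noise random (user, item) pairs that are not already present."""
    existing = set(edges)
    if num_noise > num_users * num_items - len(existing):
        raise ValueError("not enough unused user-item pairs for the requested noise")
    noisy = list(edges)
    added = 0
    while added < num_noise:
        pair = (rng.randrange(num_users), rng.randrange(num_items))
        if pair not in existing:
            existing.add(pair)
            noisy.append(pair)
            added += 1
    return noisy


# Hypothetical example: 3 users, 3 items, 2 real interactions, 4 noise records.
clean_edges = [(0, 0), (1, 1)]
noisy_edges = add_random_noise(clean_edges, 3, 3, 4, random.Random(0))
```

Because only the training interactions are perturbed, evaluating on clean held-out records shows how much each method degrades under noise.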

Claims

1. A MOOC recommendation method based on a graph convolutional neural network, characterized by comprising the following steps:

Step two: preprocessing the MOOC data set by ID remapping, data screening and missing data filling to obtain a user-item interaction data set X_data, and converting the user-item interaction data set X_data into a bipartite graph G of user-item interactions;

Step three: denoting the data augmentation operation as s, and performing two completely independent data augmentation operations on the bipartite graph G of user-item interactions to form two views s₁(G) and s₂(G);

Z^{(n)} = H(Z^{(n-1)}, G)

Step five: performing the graph convolution operation on the two views s₁(G) and s₂(G), obtained by data augmentation of the bipartite graph G of user-item interactions, to obtain the node representations Z₁⁽ⁿ⁾ and Z₂⁽ⁿ⁾, according to the formulas:

Z_1^{(n)} = H(Z_1^{(n-1)}, s_1(G)), \quad Z_2^{(n)} = H(Z_2^{(n-1)}, s_2(G))

wherein L_InfoNCE represents the information noise contrastive estimation (InfoNCE) loss function,