CN113343113A - Cold start entity recommendation method for knowledge distillation based on graph convolution network - Google Patents
- Publication number
- CN113343113A (application CN202110755889.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- product
- matrix
- output
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/9535 — Search customisation based on user profiles and personalisation
- G06F16/288 — Entity relationship models
- G06F16/9536 — Search customisation based on social or collaborative filtering
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06Q50/01 — Social networking
Abstract
The invention discloses a cold start entity recommendation method for knowledge distillation based on a graph convolution network, which comprises the following steps: 1. constructing an implicit feedback matrix of users on products, a user attribute matrix, a product attribute matrix, a user graph adjacency matrix for the user student model, and a product graph adjacency matrix for the product student model; 2. constructing an input layer by means of one-hot coding and random initialization; 3. performing feature propagation through graph convolution on the teacher model, the user student model, and the product student model, respectively; 4. constructing a prediction layer to predict users' scores for products; 5. updating the feature matrices and the embedded characterization matrices of the attribute nodes by fitting the real labels according to the output of the prediction layer; 6. repeating steps 3-5 until the recommendation effect for new users and new products reaches its optimum. The method can fully mine the high-order information of the graph and the latent associations among users, products, and attribute nodes, thereby achieving accurate recommendation of cold start entities.
Description
Technical Field
The invention relates to the field of cold start recommendation, in particular to a cold start entity recommendation method for knowledge distillation based on a graph convolution network.
Background
The information overload problem of the internet era interferes with users' judgment, and recommendation systems have been successfully applied across industries, including e-commerce, music, video, and education. A recommendation system mainly recommends products to a user in a personalized manner according to the user's historical interaction records, such as clicks. Collaborative filtering is the most popular approach in traditional recommendation, obtaining user preferences and product characteristics by mining history. However, new users and new products appear without any history, so recommendation models based on collaborative filtering are severely limited when recommending new products to new users.
To solve the recommendation problem of cold start entities (new users and new products), collaborative filtering systems incorporating attribute information have been introduced: user attributes (gender, age, occupation, etc.) and product attributes (category, service, environment, etc.) are used to model user and product characterizations, and the relation between the collaborative information space and the attribute feature space is learned, so that personalized recommendation can be provided effectively for cold start entities. However, this approach only learns a simple mapping function between the two embedded characterization spaces, which limits recommendation performance for cold start entities.
By modeling users' behavior data on products as a user-product bipartite graph, the conventional rating-matrix-based collaborative filtering embedded characterization model can be converted into a graph problem. Introducing attribute information into the graph allows attribute characterizations to be learned effectively, so that new users and new products can be represented. Existing attribute-enhanced graph recommendation models (node attribute initialization, attribute feature fusion embedded characterization models, etc.) can characterize new users or new products that lack historical interaction information by their attributes, but the accuracy of attribute characterization still needs improvement. Moreover, attribute characterization and entity embedding characterization mutually reinforce each other in the graph rather than being learned by independent optimization. How to exploit attribute information in a graph model to accomplish accurate personalized recommendation of cold start entities (new users and new products) has become an urgent problem.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a cold-start entity recommendation method for knowledge distillation based on a graph convolution network, so that the internal interactions and potential associations between entity nodes and attribute nodes in a graph, as well as the relationship between the attribute characterization space and the corresponding entity embedding space, can be fully mined, thereby achieving more accurate recommendation of cold-start entities.
The invention adopts the following technical scheme for solving the technical problems:
the invention relates to a cold start entity recommendation method for knowledge distillation based on a graph convolution network, which is characterized by comprising the following steps of:
Step 1: let U denote the user set, U = {u_1, ..., u_i, ..., u_b, ..., u_M}, where u_i is the i-th user, u_b is the b-th user, M is the total number of users, and 1 ≤ i, b ≤ M. Let V denote the product set, V = {v_1, ..., v_j, ..., v_N}, where v_j is the j-th product, N is the total number of products, and 1 ≤ j ≤ N. Let R_ij denote the i-th user u_i's implicit feedback on the j-th product v_j, and let R = {R_ij}_{M×N} be the implicit feedback matrix, where R_ij = 1 if user u_i has an implicit feedback record for product v_j, and R_ij = 0 otherwise.
Let the user attribute matrix be defined so that its i-th entry is the d_u-dimensional attribute vector of the i-th user u_i, and let the product attribute matrix be defined so that its j-th entry is the d_v-dimensional attribute vector of the j-th product v_j.
Define the embedded characterization matrix of the user attribute nodes and randomly initialize it a first time, with each entry a K-dimensional embedded characterization vector of the corresponding user attribute node; randomly initialize it a second time, with e_k the K-dimensional embedded characterization vector of the k-th user attribute node.
Define the embedded characterization matrix of the product attribute nodes and randomly initialize it a first time, with each entry a K-dimensional embedded characterization vector of the corresponding product attribute node; randomly initialize it a second time, with f_l the K-dimensional embedded characterization vector of the l-th product attribute node.
Using the implicit feedback matrix R = {R_ij}_{M×N} and the user attribute matrix, construct the user graph adjacency matrix S_U. Using the implicit feedback matrix R and the product attribute matrix, construct the product graph adjacency matrix S_V.
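The constructions of Step 1 can be sketched with toy data as follows. The co-occurrence rule used here to build the adjacency matrices (an attribute node is linked to every product or user reachable through the entities carrying that attribute) is an assumption for illustration; the patent does not spell out the exact construction of S_U and S_V:

```python
import numpy as np

# Toy sizes: M users, N products, K_u user-attribute nodes, K_v product-attribute nodes.
M, N, K_u, K_v = 3, 4, 2, 2

# Implicit feedback matrix R = {R_ij}: R[i, j] = 1 if user u_i interacted with product v_j.
R = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]], dtype=float)

# Binary user-attribute matrix (M x K_u) and product-attribute matrix (N x K_v).
X = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 1]], dtype=float)

# Assumed construction: the user graph adjacency S_U links each user-attribute node
# to the products consumed by users holding that attribute (attribute -> user -> product).
S_U = (X.T @ R > 0).astype(float)          # shape (K_u, N)
# Likewise the product graph links each product-attribute node to the users of
# products carrying that attribute (attribute -> product -> user).
S_V = (Y.T @ R.T > 0).astype(float)        # shape (K_v, M)

print(S_U.shape, S_V.shape)   # (2, 4) (2, 3)
```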
Step 2: obtain feature matrices by one-hot coding, comprising a user feature matrix and a product feature matrix:
Step 2.1: perform one-hot coding on the user set U, thereby constructing the user feature matrix P = {p_1, ..., p_i, ..., p_M}, where p_i is the K-dimensional user feature vector of the i-th user u_i.
Step 2.2: perform one-hot coding on the product set V, thereby constructing the product feature matrix Q = {q_1, ..., q_j, ..., q_N}, where q_j is the K-dimensional product feature vector of the j-th product v_j.
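Steps 2.1-2.2 can be illustrated as follows. The patent states the one-hot coded feature vectors are K-dimensional, whereas a plain one-hot code over M users would be M-dimensional; this sketch therefore assumes a randomly initialized embedding table behind the one-hot lookup (the tables `W_u` and `W_v` are hypothetical names, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 3, 4, 8

# One-hot ID encodings: each entity gets a unit indicator row of an identity matrix.
U_onehot = np.eye(M)
V_onehot = np.eye(N)

# Assumed projection through randomly initialized embedding tables to obtain the
# K-dimensional feature vectors p_i and q_j of Steps 2.1-2.2.
W_u = rng.normal(scale=0.1, size=(M, K))
W_v = rng.normal(scale=0.1, size=(N, K))
P = U_onehot @ W_u   # user feature matrix, p_i = P[i]
Q = V_onehot @ W_v   # product feature matrix, q_j = Q[j]

print(P.shape, Q.shape)   # (3, 8) (4, 8)
```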
Step 3: construct the feature initialization layer:
Step 3.1: define the current number of updates as t, and initialize t = 0.
Step 3.2: define and initialize, for the t-th update, the user feature vector of the i-th user u_i, the product feature vector of the j-th product, the K-dimensional embedded characterization vector of the k-th user attribute node, and the K-dimensional embedded characterization vector of the l-th product attribute node.
Step 4: perform feature propagation in the teacher model through graph convolution:
Step 4.1: define the teacher model to comprise T' graph convolution layers.
Step 4.2: input into the teacher model, for the t-th update, the user feature vector of the i-th user u_i, the product feature vector of the j-th product, the K-dimensional embedded characterization vector of the k-th user attribute node, and the K-dimensional embedded characterization vector of the l-th product attribute node, and perform feature propagation. Using Equations (1) and (2), respectively compute the user feature vector of u_i, the product feature vector of v_j, the embedded characterization vector of the k-th user attribute node, and the embedded characterization vector of the l-th product attribute node output by the next graph convolution layer.
In Equations (1) and (2), A_i is the set of product nodes and user attribute nodes that the i-th user u_i interacts with; A_j is the set of user nodes and product attribute nodes that the j-th product v_j interacts with; A_k is the set of users interacting with the k-th user attribute node; and A_l is the set of products interacting with the l-th product attribute node. The remaining symbols are the feature vectors and embedded characterization vectors output for these nodes by the previous graph convolution layer.
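Equations (1) and (2) did not survive extraction as text, but the surrounding symbol descriptions (interaction neighborhoods A_i, A_j, A_k, A_l and layer-wise outputs) are consistent with a symmetrically normalized neighborhood aggregation in the style of LightGCN. A minimal sketch under that assumption:

```python
import numpy as np

def propagate(features, neighbors):
    """One graph convolution layer: each node's next-layer vector is the
    degree-normalized sum of its neighbors' current vectors. The symmetric
    1/sqrt(|A_node| * |A_neighbor|) rule is an assumption in the style of
    LightGCN; the patent's exact Equations (1)-(2) are not reproduced in
    the extracted text."""
    out = {}
    for node, nbrs in neighbors.items():
        acc = np.zeros_like(features[node], dtype=float)
        for n in nbrs:
            acc = acc + features[n] / np.sqrt(len(nbrs) * len(neighbors[n]))
        out[node] = acc
    return out

# Tiny heterogeneous graph: user u0 interacts with product j0 and carries
# user-attribute node k0; j0 and k0 each link back to u0.
feats = {"u0": np.ones(4), "j0": np.full(4, 2.0), "k0": np.full(4, 3.0)}
nbrs = {"u0": ["j0", "k0"], "j0": ["u0"], "k0": ["u0"]}
out = propagate(feats, nbrs)   # out["u0"] = (2 + 3) / sqrt(2) per component
```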
Step 5: construct the prediction layer of the teacher model from the output of the T'-th graph convolution layer:
Step 5.1: obtain the teacher model's user characterization U_i and product characterization V_j by Equation (3), whose inputs are the user feature vector of the i-th user u_i and the product feature vector of the j-th product v_j output by the T'-th layer of the teacher model.
Step 5.2: obtain the teacher model's prediction score of the i-th user u_i for the j-th product v_j by Equation (4). In Equation (4), ⟨·,·⟩ denotes the vector inner product.
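Step 5.2 states that the teacher's prediction score is the vector inner product of the two characterizations; Equation (4) can therefore be illustrated directly (the vectors below are toy values, not from the patent):

```python
import numpy as np

# Eq. (4): the teacher's predicted score of user i for product j is the
# inner product <U_i, V_j> of their final characterizations.
U_i = np.array([0.5, 1.0, -0.5])
V_j = np.array([1.0, 0.5, 2.0])
score = float(U_i @ V_j)
print(score)   # 0.5*1.0 + 1.0*0.5 + (-0.5)*2.0 = 0.0
```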
Step 6: construct the input layer of the user student model, inputting the second randomly initialized embedded characterization matrix of the user attribute nodes, the product feature matrix Q = {q_1, ..., q_j, ..., q_N}, and the user graph adjacency matrix S_U.
Step 7: construct the feature initialization layer:
Step 7.1: initialize t = 0.
Step 7.2: define and initialize, for the t-th update, the embedded characterization vector of the k-th user attribute node and the product feature vector of the j-th product.
And 8, carrying out feature propagation on the user student model through the graph convolution:
step 8.1, defining the user student model to comprise a T' layer graph convolution layer;
step 8.2, embedding the K-dimension of the kth updated user attribute node into the characterization vectorAnd the t updated product feature vector of the jth productInputting the user student model for feature propagation, and calculating the embedded characterization vector of the user attribute node output by the t +1 th convolution layer after the kth updating of the kth user attribute node by using the formula (5)And jth product vjThe characteristic vector of the product output by the t +1 th convolution layer after the t-th update
In the formula (5), the reaction mixture is,is the k-th user attribute node in the user graph adjacency matrix SUThe corresponding set of products in (a) to (b),is the embedded characterization vector output by the kth layer convolution layer after the kth updating of the kth user attribute node,is the jth product vjThe characteristic vector output by the t-th layer convolution layer after the t-th updating;is the jth product vjIn the user graph adjacency matrix SUTo a corresponding set of user attribute nodes.
Step 9: construct the prediction layer of the user student model from the user features output by the T'-th layer of the user student model:
Step 9.1: obtain, by Equation (6), the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the user student model. The inputs to Equation (6) are the embedded characterization vectors of the user attribute nodes output by the T'-th layer of the user student model, the product feature vector of the j-th product v_j output by that layer, and the set of user attribute nodes of the i-th user u_i.
Step 10: construct the input layer of the product student model, inputting the second randomly initialized embedded characterization matrix of the product attribute nodes, the user feature matrix P = {p_1, ..., p_i, ..., p_M}, and the product graph adjacency matrix S_V.
Step 11: following the processes of Steps 7 to 9, perform feature initialization and feature propagation on the product student model, thereby obtaining the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the product student model.
Step 12, obtaining the ith user u by using the formula (7)iFor jth product vjUser student model and product student model of (1)
In the formula (7), the reaction mixture is,ith user u representing user student model outputiThe user characterization of (a) is performed,j product v representing product student model outputjThe product characterization of (1);
Step 12.1: construct the loss function L_r of the teacher model according to Equation (8). In Equation (8), σ is the sigmoid activation function; D_u is the training data of user u; the two prediction terms are user u's prediction scores for products v_i and v_j; θ is the set of parameters to be optimized; and γ is the coefficient of the regularization term.
Step 12.2: construct the knowledge distillation loss function L_u of the user student model and the knowledge distillation loss function L_v of the product student model by Equation (9). In Equation (9), U_i is the user characterization of the i-th user u_i output by the teacher model, compared with the user characterization of u_i output by the user student model; V_j is the product characterization of the j-th product v_j output by the teacher model, compared with the product characterization of v_j output by the product student model.
Step 12.3: construct the loss function L_s of the score predictions of the user student model and the product student model by Equation (10):
L_s = ||U^g (V^g)^T − U^U (V^I)^T||   (10)
In Equation (10), U^g is the user feature matrix output by the teacher model; V^g is the product feature matrix output by the teacher model; U^U is the user characterization matrix output by the user student model; and V^I is the product characterization matrix output by the product student model.
Step 12.4: obtain the overall loss function Loss(θ) of the whole network according to Equation (11):
Loss(θ) = L_r + λL_u + μL_v + ηL_s   (11)
In Equation (11), λ, μ, and η are hyperparameters used to balance the distillation loss functions of the different parts. Minimize Loss(θ) by the gradient descent method to obtain the updated optimal parameters θ*.
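The loss terms of Steps 12.1-12.4 can be sketched as follows. The exact form of L_r in Equation (8) is not recoverable from the extracted text, so a BPR-style pairwise form is assumed here (and the γ regularization term is omitted); L_u and L_v are taken as L2 gaps between teacher and student characterizations, and L_s follows Equation (10) directly:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 3, 4, 5
Ug, Vg = rng.normal(size=(M, K)), rng.normal(size=(N, K))   # teacher characterizations
Uu, Vi = rng.normal(size=(M, K)), rng.normal(size=(N, K))   # student characterizations

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# L_r (Eq. (8), assumed BPR-style pairwise form; regularization omitted):
# for each (user u, product i, product j) triple, push the score of i above j.
triples = [(0, 1, 2), (1, 0, 3), (2, 2, 0)]   # toy (u, i, j) training data
L_r = -sum(np.log(sigmoid(Ug[u] @ Vg[i] - Ug[u] @ Vg[j])) for u, i, j in triples)

# L_u and L_v (Eq. (9), assumed L2 form): pull the student characterizations
# toward the teacher's.
L_u = np.sum((Ug - Uu) ** 2)
L_v = np.sum((Vg - Vi) ** 2)

# L_s (Eq. (10)): norm of the gap between teacher and student score matrices.
L_s = np.linalg.norm(Ug @ Vg.T - Uu @ Vi.T)

# Eq. (11): overall loss balanced by hyperparameters lambda, mu, eta.
lam, mu, eta = 0.1, 0.1, 0.01
Loss = L_r + lam * L_u + mu * L_v + eta * L_s
print(Loss > 0)   # True: every term is non-negative
```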
Step 13: use the updated user student model to obtain the optimal characterization of a new user u_c, and use the updated product student model to obtain the optimal characterization of a new product v_d, according to Equation (12).
In Equation (12), the characterization of u_c is computed from the embedded characterization vectors of its user attribute nodes output by the T'-th layer of the updated user student model, aggregated over the user attribute node set of u_c; likewise, the characterization of v_d is computed from the embedded characterization vectors f_l^{T'} of its product attribute nodes output by the T'-th layer of the updated product student model, aggregated over the product attribute node set of v_d.
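Step 13 can be illustrated as below. The exact aggregation in Equation (12) did not survive extraction; a mean over the new entity's final-layer attribute-node embeddings is one consistent reading and is assumed here (the attribute names and embedding values are toy data):

```python
import numpy as np

# Cold start inference: a new user u_c has no interaction history, only
# attribute nodes. Its characterization is taken (by assumption) as the mean
# of the final-layer embedded characterization vectors of those nodes.
E_final = {"female": np.array([0.2, 0.4]),
           "age_25": np.array([0.6, 0.0]),
           "student": np.array([0.1, 0.2])}   # learned attribute embeddings
new_user_attrs = ["female", "age_25"]          # attribute node set of u_c
U_c = np.mean([E_final[k] for k in new_user_attrs], axis=0)
print(U_c)   # mean of the two attribute vectors
```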
Compared with the prior art, the invention has the beneficial effects that:
1. Aiming at the problem that users and products in a recommendation system lack historical interaction records under cold start conditions (new users and new products), the invention provides a cold start entity recommendation method for knowledge distillation based on a graph convolution network, deeply mines the feature relations between collaborative filtering information and the entities and attributes, and iteratively updates the attribute and entity embedded characterizations, thereby effectively improving the accuracy of cold start entity recommendation.
2. The method processes the entity (user and product) sets by one-hot coding, which makes matrix indexing and computation fast and serves to expand the features of the representations used in recommendation.
3. The invention designs a heterogeneous graph that models users' implicit feedback data on products, user attribute data, and product attribute data as nodes and connection relations in the teacher model, constructs the indirect connections between entities and attributes into the graph adjacency matrices of the user student model and the product student model, and introduces knowledge distillation so that the entity characterizations output by the teacher model guide the corresponding entity attribute characterizations output by the student models, thereby solving the recommendation problem of cold start entities.
4. The method adopts graph convolution to perform representation learning on the interaction information between users and products and between entities and attributes; propagation through graph convolution better captures high-order similarity and learns more accurate attribute characterizations.
5. According to the method, the node representation embedded matrix of the entity and the attribute is updated according to the prediction result of the teacher model and the distillation knowledge of the student model, iterative learning is carried out on the whole neural network, and the recommendation precision of the cold start entity is effectively improved.
Drawings
FIG. 1 is a flow chart of the cold start entity recommendation method for knowledge distillation based on graph convolution network of the present invention.
Detailed Description
In this embodiment, as shown in fig. 1, the method for recommending a cold start entity based on knowledge distillation by using a graph convolution network is performed according to the following steps:
step 1, let U represent a user set, and U ═ U1,...,ui,...,ub,...,uM},uiRepresents the ith user, ubRepresenting the b-th user, M represents the total number of users, i is more than or equal to 1, and b is more than or equal to M; let V denote the product set, and V ═ V1,...,vj,...,vN},vjRepresents the jth product, N represents the total number of products, and j is more than or equal to 1 and less than or equal to N; let RijRepresents the ith user uiFor jth product vjIf the implicit feedback exists, the implicit feedback matrix R of the product is set as Rij}M×NIf the ith user uiFor jth item vjWith implicit feedback recording, then RijNot all right 1, otherwise Rij=0;
Let the user attribute matrix be Represents the ith user uiD of (A)uA dimension attribute vector; order product attribute matrix Denotes the jth product vjD of (A)vA dimension attribute vector;
defining an embedded characterization matrix of user attribute nodes, and randomly initializing for the first time to A K-dimensional embedded characterization vector representing a kth user attribute node; randomly initializing the embedded characterization matrix of the user attribute nodes for the second time toekA K-dimensional embedded characterization vector representing a kth user attribute node;
defining an embedded characterization matrix of product attribute nodes and randomly initializing for the first time to A K-dimensional embedded characterization vector representing the ith product attribute node; randomly initializing the embedded characterization matrix of the product attribute nodes for the second time toflA K-dimensional embedded characterization vector representing the ith product attribute node;
using implicit feedback matrix R ═ Rij}M×NAnd a user attribute matrix ofConstructing a user graph adjacency matrix SU;
Using implicit feedback matrix R ═ Rij}M×NAnd a product attribute matrixConstruct the product drawing adjacency matrix SV;
Step 2, obtaining a characteristic matrix through single hot coding, comprising the following steps: a user characteristic matrix and a product characteristic matrix:
step 2.1, performing one-hot coding on the user set U, thereby constructing a user feature matrix P ═ P1,...,pi,...,pMIn which p isiRepresents the ith user uiK-dimensional user feature vectors of (1);
step 2.2, performing one-hot coding on the item set V, thereby constructing a product characteristic matrix Q ═ Q1,...,qj,...,qNWherein q isjDenotes the jth product vjThe K-dimensional product feature vector of (1);
step 3, constructing a characteristic initialization layer:
step 3.1, defining the current updating times as t, and initializing t to be 0;
step 3.2, define and initialize ith user u updated for the t timeiUser feature vector ofDefining and initializing the product feature vector of the jth updated jth productDefining and initializing K-dimensional embedded characterization vectors for kth updated user attribute nodesDefining and initializing K-dimensional embedded characterization vectors for the ith updated product attribute node
And 4, carrying out feature propagation on the teacher model through graph convolution:
step 4.1, defining that the teacher model comprises a T' layer graph convolution layer;
step 4.2, updating the ith user u for the t timeiUser feature vector ofT updated product feature vector of jth productK-dimensional embedded characterization vector of kth user attribute node of t-th updateAnd the t updated K-dimensional embedded characterization vector of the ith product attribute nodeInputting a teacher model for feature propagation, and respectively calculating the ith user u by using an equation (1) and an equation (2)iUser feature vector output from t +1 th convolution layer after t-th updateJth product vjThe characteristic vector of the product output by the t +1 th convolution layer after the t-th updateEmbedding characterization vector of user attribute node output by t +1 th convolution layer after t-time updating of kth user attribute nodeAnd the embedded characterization vector of the product attribute node output by the t +1 th convolution layer after the t-th update of the ith product attribute node
In the formulae (1) and (2), AiIs the ith user uiThe set of interacted product and user attribute nodes,is the jth product vjThe characteristic vector output by the t-th layer convolution layer after the t-th updating;is the embedding characterization vector output by the kth layer convolution layer after the kth updating of the kth user attribute node; a. thejIs the jth product vjThe set of interacted user and product attribute nodes,is the characteristic vector output by the ith user at the t-th layer convolution layer after the t-th updating;is the embedded characterization vector output by the tth layer convolution layer after the tth update of the ith product attribute node; a. thekIs the user set interacted by the kth user attribute node; a. thelIs the product set interacted by the ith product attribute node;
and 5, constructing a prediction layer of the teacher model according to the output of the T-th layer of graph convolution:
step 5.1, obtaining a user representation U of the teacher model by using the formula (3)iAnd product characterization Vj:
In the formula (3), the reaction mixture is,ith user u representing output of T' th layer of teacher modeliThe feature vector of the user of (2),j product v representing teacher model T' layer outputjThe product feature vector of (1);
step 5.2, obtaining the ith user u by using the formula (4)iFor jth product vjPredictive scoring of teacher models
In the formula (4), < > represents the vector inner product;
step 6, constructing an input layer of the user student model, and inputting an embedded characterization matrix of the user attribute nodes for the second random initializationProduct characteristic matrix Q ═ Q1,...,qj,...,qN}, user graph adjacency matrix SU;
Step 7, constructing a characteristic initialization layer:
step 7.1, initializing t to be 0;
step 7.2, define and initialize the kth user attribute node of the t-th updateDefining the product feature vector of the jth updated product
Step 8: perform feature propagation in the user student model through graph convolution:
Step 8.1: define the user student model to comprise T' graph convolution layers;
Step 8.2: input the K-dimensional embedded characterization vector e_k^(t) of the k-th user-attribute node at the t-th update and the product feature vector q_j^(t) of the j-th product at the t-th update into the user student model for feature propagation, and calculate with formula (5) the embedded characterization vector e_k^(t+1) of the k-th user-attribute node output by the (t+1)-th convolution layer and the feature vector q_j^(t+1) of the j-th product v_j output by the (t+1)-th convolution layer:
In formula (5), the neighbor set of the k-th user-attribute node is the set of products corresponding to it in the user-graph adjacency matrix S^U; e_k^(t) is the embedded characterization vector of the k-th user-attribute node output by the t-th convolution layer; q_j^(t) is the feature vector of the j-th product v_j output by the t-th convolution layer; the neighbor set of the j-th product v_j is the set of user-attribute nodes corresponding to it in S^U.
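Formula (5) is likewise an image; a hedged sketch of the bipartite propagation it describes, assuming row-mean normalization over the adjacency matrix S^U (the `bipartite_propagate` helper and the normalization are assumptions, not the patented formula):

```python
import numpy as np

def bipartite_propagate(E, Q, S):
    """One layer of the user student model (sketch of formula (5)):
    user-attribute embeddings E aggregate from products via the
    user-graph adjacency S (num_attrs x num_products), and product
    vectors Q aggregate from attribute nodes via S^T."""
    deg_a = S.sum(axis=1, keepdims=True)        # products per attribute
    deg_p = S.sum(axis=0, keepdims=True).T      # attributes per product
    E_next = (S @ Q) / np.maximum(deg_a, 1)     # mean over product neighbors
    Q_next = (S.T @ E) / np.maximum(deg_p, 1)   # mean over attribute neighbors
    return E_next, Q_next

S = np.array([[1.0, 1.0], [1.0, 0.0]])   # 2 attribute nodes x 2 products
E = np.array([[1.0, 0.0], [0.0, 1.0]])
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
E1, Q1 = bipartite_propagate(E, Q, S)
# attribute 0 touches both products, so E1[0] is their mean: [1.0, 1.0]
```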
Step 9: construct the prediction layer of the user student model according to the user features output by the T'-th layer of the user student model:
Step 9.1: obtain the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the user student model by using formula (6):
In formula (6), e_k^(T') denotes the embedded characterization vector of the k-th user-attribute node output by the T'-th layer of the user student model; q_j^(T') denotes the product feature vector of the j-th product v_j output by the T'-th layer of the user student model; A_i denotes the set of user-attribute nodes of the i-th user u_i;
Step 10: construct the input layer of the product student model; its inputs are the second randomly initialized embedded characterization matrix of the product-attribute nodes, the user feature matrix P = {p_1, ..., p_i, ..., p_M}, and the product-graph adjacency matrix S^V;
Step 11: perform feature initialization and feature propagation in the product student model according to the processes of steps 7 to 9, so as to obtain the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the product student model;
Step 12: obtain the i-th user u_i's predicted score for the j-th product v_j under the user student model and the product student model by using formula (7):
In formula (7), U_i^U denotes the user characterization of the i-th user u_i output by the user student model, and V_j^I denotes the product characterization of the j-th product v_j output by the product student model;
Step 12.1: construct the loss function L_r of the teacher model according to formula (8):
In formula (8), σ denotes the sigmoid activation function; D_u denotes the training data of the u-th user u_u; r̂_ui denotes the u-th user u_u's predicted score for the i-th product v_i; r̂_uj denotes the u-th user u_u's predicted score for the j-th product v_j; θ denotes the parameters to be optimized; γ denotes the coefficient of the regularization term;
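Formula (8) is not reproduced in the text, but the paired predictions, the sigmoid, and the regularizer read like the standard Bayesian Personalized Ranking objective; a sketch under that assumption (the pairwise form is inferred, not stated verbatim):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(pos_scores, neg_scores, theta, gamma=0.01):
    """Sketch of formula (8), assuming the standard BPR pairwise form:
    -sum ln sigma(r_ui - r_uj) + gamma * ||theta||^2."""
    pairwise = -np.sum(np.log(sigmoid(pos_scores - neg_scores)))
    reg = gamma * np.sum(theta ** 2)
    return float(pairwise + reg)

# one observed (positive) vs. one unobserved (negative) product
loss = bpr_loss(np.array([2.0]), np.array([0.0]), np.array([1.0, -1.0]))
# -ln(sigma(2)) + 0.01 * 2 ≈ 0.1269 + 0.02
```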
Step 12.2: construct the knowledge distillation loss function L_u of the user student model and the knowledge distillation loss function L_v of the product student model by using formula (9):
In formula (9), U_i denotes the user characterization of the i-th user u_i output by the teacher model, and U_i^U denotes the user characterization of the i-th user u_i output by the user student model; V_j denotes the product characterization of the j-th product v_j output by the teacher model, and V_j^I denotes the product characterization of the j-th product v_j output by the product student model;
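Formula (9) is also an image; assuming the distillation term is a squared L2 distance pulling each student characterization toward the corresponding teacher characterization (the norm is not stated), it can be sketched as:

```python
import numpy as np

def kd_loss(teacher_reps, student_reps):
    """Sketch of formula (9): squared L2 gap between teacher and student
    characterization matrices (the exact norm is an assumption)."""
    return float(np.sum((teacher_reps - student_reps) ** 2))

U_teacher = np.array([[1.0, 0.0], [0.0, 1.0]])
U_student = np.array([[0.5, 0.0], [0.0, 1.0]])
Lu = kd_loss(U_teacher, U_student)   # (0.5)^2 = 0.25
```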
Step 12.3: construct the loss function L_s for score prediction between the user student model and the product student model by using formula (10):
L_s = ||U^g(V^g)^T − U^U(V^I)^T|| (10)
In formula (10), U^g denotes the user feature matrix output by the teacher model; V^g denotes the product feature matrix output by the teacher model; U^U denotes the user characterization matrix output by the user student model; V^I denotes the product characterization matrix output by the product student model;
Step 12.4: obtain the overall loss function Loss(θ) of the whole network according to formula (11):
Loss(θ) = L_r + λL_u + μL_v + ηL_s (11)
In formula (11), λ, μ, and η are hyperparameters used to balance the distillation loss functions of the different parts; the loss function Loss(θ) is minimized by the gradient descent method, so as to obtain the updated optimal parameters θ*;
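Formulas (10) and (11) can be sketched directly from the text, taking the unspecified norm in (10) to be the Frobenius norm and using placeholder weight values (the patent leaves λ, μ, η as hyperparameters):

```python
import numpy as np

def score_matching_loss(Ug, Vg, Uu, Vi):
    # Formula (10): gap between the teacher's predicted score matrix
    # U^g (V^g)^T and the students' U^U (V^I)^T.  np.linalg.norm on a
    # 2-D array defaults to the Frobenius norm (assumed here).
    return float(np.linalg.norm(Ug @ Vg.T - Uu @ Vi.T))

def total_loss(Lr, Lu, Lv, Ls, lam=0.1, mu=0.1, eta=0.1):
    # Formula (11): teacher loss plus weighted distillation and
    # score-matching terms; lam, mu, eta are placeholder values.
    return Lr + lam * Lu + mu * Lv + eta * Ls

Ug = np.array([[1.0, 0.0]]); Vg = np.array([[1.0, 0.0]])
Uu = np.array([[0.0, 0.0]]); Vi = np.array([[1.0, 0.0]])
Ls_val = score_matching_loss(Ug, Vg, Uu, Vi)   # |1 - 0| = 1.0
L = total_loss(1.0, 0.5, 0.5, Ls_val)          # 1.0 + 0.05 + 0.05 + 0.1
```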
Step 13: for the cold-start entities, namely new users and new products, obtain the optimal characterization of a new user u_c by using the updated user student model, and obtain the optimal characterization of a new product v_d by using the updated product student model:
In formula (12), e_k^(T') denotes the embedded characterization vector of the k-th user-attribute node output by the T'-th layer of the updated user student model, and A_c denotes the set of user-attribute nodes of the new user u_c; f_l^(T') denotes the embedded characterization vector of the l-th product-attribute node output by the T'-th layer of the updated product student model, and A_d denotes the set of product-attribute nodes of the new product v_d.
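A sketch of the cold-start inference of formula (12), assuming the new entity's characterization is the mean of its attribute-node embeddings (the pooling operator is not stated in the text; the attribute names below are hypothetical):

```python
import numpy as np

def cold_start_rep(attr_embeddings, attr_set):
    """Sketch of formula (12): a new user's (or product's) characterization
    is built purely from the learned embeddings of its attribute nodes,
    since a cold-start entity has no interaction history (mean pooling
    assumed)."""
    return np.mean([attr_embeddings[k] for k in attr_set], axis=0)

emb = {"age:20s": np.array([1.0, 0.0]),
       "city:NY": np.array([0.0, 1.0])}
new_user = cold_start_rep(emb, ["age:20s", "city:NY"])   # [0.5, 0.5]
```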
Example:
To verify the effectiveness of the method, the invention adopts three public data sets commonly used in recommendation: Yelp, Amazon-Video Games, and XING. Each data set filters out users with fewer than 3 ratings; 30% of users are then randomly drawn from all users as new users, and 30% of products are drawn from all products as new products. The interaction records and corresponding attributes of the old users and old products are used as training data. New-user and new-product recommendation under different scenarios is divided into three tasks: recommending old products to new users (task one); recommending new products to old users (task two); and recommending new products to new users (task three).
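The hold-out protocol above can be sketched as follows (`split_cold_start` is an illustrative helper, not part of the patent):

```python
import random

def split_cold_start(users, products, ratio=0.3, seed=0):
    """Randomly hold out `ratio` of users as 'new users' and `ratio` of
    products as 'new products'; the remaining 'old' entities supply the
    training interactions."""
    rng = random.Random(seed)
    new_users = set(rng.sample(users, int(ratio * len(users))))
    new_products = set(rng.sample(products, int(ratio * len(products))))
    old_users = [u for u in users if u not in new_users]
    old_products = [p for p in products if p not in new_products]
    return old_users, new_users, old_products, new_products

old_u, new_u, old_p, new_p = split_cold_start(list(range(10)), list(range(20)))
# 3 of 10 users and 6 of 20 products are held out as cold-start entities
```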
For the product recommendation tasks, the invention adopts Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) as evaluation criteria. Eight methods are selected for comparison: KNN, DropoutNet, LinMap, xDeepFM, CDL, Heater, PinSage, and Student.
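For the common single-held-out-item evaluation setting, the two criteria can be computed as:

```python
import math

def hit_ratio_at_k(ranked_items, ground_truth, k):
    # HR@K: 1 if the held-out item appears in the top-K recommendations.
    return 1.0 if ground_truth in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, ground_truth, k):
    # NDCG@K for a single held-out item: 1/log2(rank + 1) if it is hit
    # at position `rank`, else 0.
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == ground_truth:
            return 1.0 / math.log2(rank + 1)
    return 0.0

ranked = ["v3", "v7", "v1"]
hr = hit_ratio_at_k(ranked, "v7", 3)   # 1.0 (hit at position 2)
ndcg = ndcg_at_k(ranked, "v7", 3)      # 1/log2(3)
```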
Table 1. Recommendation results of the proposed method and the comparison methods on the three tasks on the Yelp data set
Table 2. Recommendation results of the proposed method and the comparison methods on task two on the Amazon-Video Games data set
Table 3. Recommendation results of the proposed method and the comparison methods on the three tasks on the XING data set
Tables 1, 2 and 3 show the experimental results on the Yelp, Amazon-Video Games and XING data sets, respectively. It can be seen that the proposed method outperforms the 8 comparison methods on both the HR and the NDCG indices across the three data sets.
Taking the three cold-start tasks together, the method proposed by the invention (PGD) is significantly better than the comparison methods on all three data sets, which demonstrates its feasibility.
Claims (1)
1. A cold start entity recommendation method for knowledge distillation based on graph convolution network is characterized by comprising the following steps:
Step 1: let U denote the user set, U = {u_1, ..., u_i, ..., u_b, ..., u_M}, where u_i denotes the i-th user, u_b denotes the b-th user, M denotes the total number of users, and 1 ≤ i, b ≤ M; let V denote the product set, V = {v_1, ..., v_j, ..., v_N}, where v_j denotes the j-th product, N denotes the total number of products, and 1 ≤ j ≤ N; let R_ij denote the implicit feedback of the i-th user u_i on the j-th product v_j, and let the implicit feedback matrix of the products be R = {R_ij}_{M×N};
Let the user attribute matrix be given, in which the i-th entry denotes the D_u-dimensional attribute vector of the i-th user u_i; let the product attribute matrix be given, in which the j-th entry denotes the D_v-dimensional attribute vector of the j-th product v_j;
Define the embedded characterization matrix of the user-attribute nodes and randomly initialize it for the first time; randomly initialize it for the second time; in both, e_k denotes the K-dimensional embedded characterization vector of the k-th user-attribute node;
Define the embedded characterization matrix of the product-attribute nodes and randomly initialize it for the first time; randomly initialize it for the second time; in both, f_l denotes the K-dimensional embedded characterization vector of the l-th product-attribute node;
Construct the user-graph adjacency matrix S^U by using the implicit feedback matrix R = {R_ij}_{M×N} and the user attribute matrix;
Construct the product-graph adjacency matrix S^V by using the implicit feedback matrix R = {R_ij}_{M×N} and the product attribute matrix;
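The construction of S^U is only named above, not specified; one plausible reading links a user-attribute node to every product touched by some user carrying that attribute. The `build_user_graph` helper and that link rule are assumptions, not the patented construction:

```python
import numpy as np

def build_user_graph(R, X):
    """Sketch of one reading of the user-graph adjacency S^U:
    attribute k is adjacent to product j whenever some user who has
    attribute k interacted with product j.
    R: M x N implicit feedback (0/1); X: M x Du binary user attributes."""
    return (X.T @ R > 0).astype(float)   # Du x N

R = np.array([[1, 0], [0, 1]])   # 2 users x 2 products
X = np.array([[1, 0], [1, 1]])   # 2 users x 2 attributes
S_U = build_user_graph(R, X)
# attribute 0 is shared by both users, so it touches both products
```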
Step 2, obtaining a characteristic matrix through single hot coding, comprising the following steps: a user characteristic matrix and a product characteristic matrix:
step 2.1, performing one-hot coding on the user set U, thereby constructing a user feature matrix P ═ P1,...,pi,...,pMIn which p isiRepresents the ith user uiK-dimensional user feature vectors of (1);
step 2.2, carrying out independent hot coding on the item set V, thereby constructing a productThe product characteristic matrix Q ═ Q1,...,qj,...,qNWherein q isjDenotes the jth product vjThe K-dimensional product feature vector of (1);
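The one-hot encoding of steps 2.1 and 2.2 simply assigns each entity a standard basis vector, i.e. the feature matrix of an entity set is an identity matrix:

```python
import numpy as np

def one_hot(ids):
    """One-hot encode an entity set: entity i receives the i-th standard
    basis vector, so the stacked feature matrix is the identity."""
    return np.eye(len(ids))

P = one_hot(["u1", "u2", "u3"])
# P[0] = [1, 0, 0] is user u1's feature vector
```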
step 3, constructing a characteristic initialization layer:
Step 3.1: define the current number of updates as t, and initialize t = 0;
Step 3.2: define and initialize the user feature vector p_i^(t) of the i-th user u_i at the t-th update; define and initialize the product feature vector q_j^(t) of the j-th product at the t-th update; define and initialize the K-dimensional embedded characterization vector e_k^(t) of the k-th user-attribute node at the t-th update; define and initialize the K-dimensional embedded characterization vector f_l^(t) of the l-th product-attribute node at the t-th update;
Step 4: perform feature propagation in the teacher model through graph convolution:
Step 4.1: define the teacher model to comprise T' graph convolution layers;
Step 4.2: input the t-th-update user feature vector p_i^(t) of the i-th user u_i, the t-th-update product feature vector q_j^(t) of the j-th product, the t-th-update K-dimensional embedded characterization vector e_k^(t) of the k-th user-attribute node, and the t-th-update K-dimensional embedded characterization vector f_l^(t) of the l-th product-attribute node into the teacher model for feature propagation, and calculate with formulae (1) and (2), respectively: the user feature vector p_i^(t+1) of the i-th user u_i output by the (t+1)-th convolution layer, the product feature vector q_j^(t+1) of the j-th product v_j output by the (t+1)-th convolution layer, the embedded characterization vector e_k^(t+1) of the k-th user-attribute node output by the (t+1)-th convolution layer, and the embedded characterization vector f_l^(t+1) of the l-th product-attribute node output by the (t+1)-th convolution layer:
In formulae (1) and (2), A_i is the set of product nodes and user-attribute nodes with which the i-th user u_i has interacted; q_j^(t) is the feature vector of the j-th product v_j output by the t-th convolution layer after the t-th update; e_k^(t) is the embedded characterization vector of the k-th user-attribute node output by the t-th convolution layer after the t-th update; A_j is the set of user nodes and product-attribute nodes with which the j-th product v_j has interacted; p_i^(t) is the feature vector of the i-th user output by the t-th convolution layer after the t-th update; f_l^(t) is the embedded characterization vector of the l-th product-attribute node output by the t-th convolution layer after the t-th update; A_k is the set of users with which the k-th user-attribute node interacts; A_l is the set of products with which the l-th product-attribute node interacts;
Step 5: construct the prediction layer of the teacher model according to the output of the T'-th graph convolution layer:
Step 5.1: obtain the user characterization U_i and the product characterization V_j of the teacher model by using formula (3):
In formula (3), p_i^(T') denotes the user feature vector of the i-th user u_i output by the T'-th layer of the teacher model, and q_j^(T') denotes the product feature vector of the j-th product v_j output by the T'-th layer of the teacher model;
Step 5.2: obtain the i-th user u_i's predicted score for the j-th product v_j under the teacher model by using formula (4):
In formula (4), ⟨·,·⟩ denotes the vector inner product;
Step 6: construct the input layer of the user student model; its inputs are the second randomly initialized embedded characterization matrix of the user-attribute nodes, the product feature matrix Q = {q_1, ..., q_j, ..., q_N}, and the user-graph adjacency matrix S^U;
Step 7, constructing a characteristic initialization layer:
Step 7.1: initialize t = 0;
Step 7.2: define and initialize the embedded characterization vector e_k^(t) of the k-th user-attribute node at the t-th update, and define the product feature vector q_j^(t) of the j-th product at the t-th update;
Step 8: perform feature propagation in the user student model through graph convolution:
Step 8.1: define the user student model to comprise T' graph convolution layers;
Step 8.2: input the K-dimensional embedded characterization vector e_k^(t) of the k-th user-attribute node at the t-th update and the product feature vector q_j^(t) of the j-th product at the t-th update into the user student model for feature propagation, and calculate with formula (5) the embedded characterization vector e_k^(t+1) of the k-th user-attribute node output by the (t+1)-th convolution layer and the feature vector q_j^(t+1) of the j-th product v_j output by the (t+1)-th convolution layer:
In formula (5), the neighbor set of the k-th user-attribute node is the set of products corresponding to it in the user-graph adjacency matrix S^U; e_k^(t) is the embedded characterization vector of the k-th user-attribute node output by the t-th convolution layer; q_j^(t) is the feature vector of the j-th product v_j output by the t-th convolution layer; the neighbor set of the j-th product v_j is the set of user-attribute nodes corresponding to it in S^U.
Step 9: construct the prediction layer of the user student model according to the user features output by the T'-th layer of the user student model:
Step 9.1: obtain the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the user student model by using formula (6):
In formula (6), e_k^(T') denotes the embedded characterization vector of the k-th user-attribute node output by the T'-th layer of the user student model; q_j^(T') denotes the product feature vector of the j-th product v_j output by the T'-th layer of the user student model; A_i denotes the set of user-attribute nodes of the i-th user u_i;
Step 10: construct the input layer of the product student model; its inputs are the second randomly initialized embedded characterization matrix of the product-attribute nodes, the user feature matrix P = {p_1, ..., p_i, ..., p_M}, and the product-graph adjacency matrix S^V;
Step 11: perform feature initialization and feature propagation in the product student model according to the processes of steps 7 to 9, so as to obtain the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the product student model;
Step 12: obtain the i-th user u_i's predicted score for the j-th product v_j under the user student model and the product student model by using formula (7):
In formula (7), U_i^U denotes the user characterization of the i-th user u_i output by the user student model, and V_j^I denotes the product characterization of the j-th product v_j output by the product student model;
Step 12.1: construct the loss function L_r of the teacher model according to formula (8):
In formula (8), σ denotes the sigmoid activation function; D_u denotes the training data of the u-th user u_u; r̂_ui denotes the u-th user u_u's predicted score for the i-th product v_i; r̂_uj denotes the u-th user u_u's predicted score for the j-th product v_j; θ denotes the parameters to be optimized; γ denotes the coefficient of the regularization term;
Step 12.2: construct the knowledge distillation loss function L_u of the user student model and the knowledge distillation loss function L_v of the product student model by using formula (9):
In formula (9), U_i denotes the user characterization of the i-th user u_i output by the teacher model, and U_i^U denotes the user characterization of the i-th user u_i output by the user student model; V_j denotes the product characterization of the j-th product v_j output by the teacher model, and V_j^I denotes the product characterization of the j-th product v_j output by the product student model;
Step 12.3: construct the loss function L_s for score prediction between the user student model and the product student model by using formula (10):
L_s = ||U^g(V^g)^T − U^U(V^I)^T|| (10)
In formula (10), U^g denotes the user feature matrix output by the teacher model; V^g denotes the product feature matrix output by the teacher model; U^U denotes the user characterization matrix output by the user student model; V^I denotes the product characterization matrix output by the product student model;
Step 12.4: obtain the overall loss function Loss(θ) of the whole network according to formula (11):
Loss(θ) = L_r + λL_u + μL_v + ηL_s (11)
In formula (11), λ, μ, and η are hyperparameters used to balance the distillation loss functions of the different parts; the loss function Loss(θ) is minimized by the gradient descent method, so as to obtain the updated optimal parameters θ*;
Step 13: obtain the optimal characterization of a new user u_c by using the updated user student model, and obtain the optimal characterization of a new product v_d by using the updated product student model:
In formula (12), e_k^(T') denotes the embedded characterization vector of the k-th user-attribute node output by the T'-th layer of the updated user student model, and A_c denotes the set of user-attribute nodes of the new user u_c; f_l^(T') denotes the embedded characterization vector of the l-th product-attribute node output by the T'-th layer of the updated product student model, and A_d denotes the set of product-attribute nodes of the new product v_d.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110755889.4A CN113343113A (en) | 2021-07-05 | 2021-07-05 | Cold start entity recommendation method for knowledge distillation based on graph convolution network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113343113A true CN113343113A (en) | 2021-09-03 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115545822A (en) * | 2022-09-20 | 2022-12-30 | 中国电信股份有限公司 | Product attribute recommendation method and device, computer storage medium and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111291270A (en) * | 2020-03-02 | 2020-06-16 | 合肥工业大学 | Attribute reasoning and product recommendation method based on self-adaptive graph convolution network |
US20200311552A1 (en) * | 2019-03-25 | 2020-10-01 | Samsung Electronics Co., Ltd. | Device and method for compressing machine learning model |
CN112861936A (en) * | 2021-01-26 | 2021-05-28 | 北京邮电大学 | Graph node classification method and device based on graph neural network knowledge distillation |
Non-Patent Citations (1)
Title |
---|
SHUAI WANG 等: "《Privileged Graph Distillation for Cold Start Recommendation》", 《ARXIV》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110516160B (en) | Knowledge graph-based user modeling method and sequence recommendation method | |
CN105740401B (en) | A kind of interested site recommended method and device based on individual behavior and group interest | |
CN111291270B (en) | Attribute reasoning and product recommendation method based on self-adaptive graph convolution network | |
CN112487199B (en) | User characteristic prediction method based on user purchasing behavior | |
CN113297370B (en) | End-to-end multi-modal question-answering method and system based on multi-interaction attention | |
CN108921657A (en) | A kind of sequence of recommendation method of knowledge based enhancing memory network | |
CN109584006B (en) | Cross-platform commodity matching method based on deep matching model | |
CN115186097A (en) | Knowledge graph and reinforcement learning based interactive recommendation method | |
Fuge et al. | Automatically inferring metrics for design creativity | |
CN112800344B (en) | Deep neural network-based movie recommendation method | |
CN111723285A (en) | Depth spectrum convolution collaborative filtering recommendation method based on scores | |
CN114722182A (en) | Knowledge graph-based online class recommendation method and system | |
CN105701225A (en) | Cross-media search method based on unification association supergraph protocol | |
CN114386513A (en) | Interactive grading prediction method and system integrating comment and grading | |
WO2023050232A1 (en) | Asset value evaluation method and apparatus, model training method and apparatus, and readable storage medium | |
CN115631008B (en) | Commodity recommendation method, device, equipment and medium | |
CN113343113A (en) | Cold start entity recommendation method for knowledge distillation based on graph convolution network | |
US20240037133A1 (en) | Method and apparatus for recommending cold start object, computer device, and storage medium | |
CN112256918A (en) | Short video click rate prediction method based on multi-mode dynamic routing | |
CN116610874A (en) | Cross-domain recommendation method based on knowledge graph and graph neural network | |
CN115545834A (en) | Personalized service recommendation method based on graph neural network and metadata | |
CN115840853A (en) | Course recommendation system based on knowledge graph and attention network | |
CN112784153B (en) | Tourist attraction recommendation method integrating attribute feature attention and heterogeneous type information | |
CN112818196B (en) | Data processing method, equipment, electronic device and storage medium based on electronic learning platform | |
CN108549979B (en) | Open-source software development team extension method based on precise embedded representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210903 |