CN113343113A - Cold start entity recommendation method for knowledge distillation based on graph convolution network - Google Patents
- Publication number
- CN113343113A (application CN202110755889.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- product
- matrix
- output
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/9535 — Search customisation based on user profiles and personalisation
- G06F16/288 — Entity relationship models
- G06F16/9536 — Search customisation based on social or collaborative filtering
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06Q50/01 — Social networking
Abstract
The invention discloses a cold start entity recommendation method for knowledge distillation based on a graph convolution network, which comprises the following steps: 1. constructing an implicit feedback matrix of users on products, a user attribute matrix, a product attribute matrix, a user graph adjacency matrix for the user student model, and a product graph adjacency matrix for the product student model; 2. constructing an input layer by means of one-hot coding and random initialization; 3. performing feature propagation through graph convolution on the teacher model, the user student model, and the product student model, respectively; 4. constructing a prediction layer to predict users' scores for products; 5. updating the feature matrices and the embedded characterization matrices of the attribute nodes by fitting the real labels according to the output of the prediction layer; 6. repeating steps 3-5 until the recommendation effect for new users and new products reaches its optimum. The method can fully mine the high-order information of the graph and the latent associations among users, products, and attribute nodes, thereby achieving accurate recommendation of cold start entities.
Description
Technical Field
The invention relates to the field of cold start recommendation, in particular to a cold start entity recommendation method for knowledge distillation based on a graph convolution network.
Background
The information overload problem of the internet era interferes with users' judgment, and recommendation systems have been successfully applied across industries, including e-commerce, music, video, and education. A recommendation system mainly recommends products to a user in a personalized manner according to the user's historical interaction records, such as clicks. Collaborative filtering is the most popular approach in traditional recommendation, obtaining user preferences and product characteristics by mining history. However, new users and new products appear without any history, so recommendation models based on collaborative filtering are severely limited when recommending new products to new users.
To solve the recommendation problem of cold start entities (new users and new products), collaborative filtering systems incorporating attribute information have been introduced: user attributes (gender, age, occupation, etc.) and product attributes (category, service, environment, etc.) are used to model user and product characterizations, and the relation between the collaborative information space and the attribute feature space is learned, so that personalized recommendation can be provided effectively for cold start entities. However, this approach only learns a simple mapping function between the two embedded characterization spaces, which limits recommendation performance for cold start entities.
By modeling users' behavior data on products as a user-product bipartite graph, the conventional rating-matrix-based collaborative filtering embedded characterization model can be converted into a graph problem. Introducing attribute information into the graph allows attribute characterizations to be learned effectively, so that new users and new products can be represented. Existing attribute-enhanced graph recommendation models (node attribute initialization, attribute feature fusion embedded characterization models, etc.) can characterize new users or new products that lack historical interaction information by their attributes, but the accuracy of attribute characterization still needs improvement. Moreover, attribute characterization and entity embedding characterization mutually reinforce each other in the graph rather than being learned by independent optimization. How to exploit attribute information in a graph model to accomplish accurate personalized recommendation of cold start entities (new users and new products) has become an urgent problem.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a cold-start entity recommendation method for knowledge distillation based on a graph convolution network, so that the internal interactions and potential associations between entity nodes and attribute nodes in a graph, as well as the relationship between the attribute characterization space and the corresponding entity embedding space, can be fully mined, thereby achieving more accurate recommendation of cold-start entities.
The invention adopts the following technical scheme for solving the technical problems:
the invention relates to a cold start entity recommendation method for knowledge distillation based on a graph convolution network, which is characterized by comprising the following steps of:
Step 1: let U denote the user set, U = {u_1, ..., u_i, ..., u_b, ..., u_M}, where u_i is the i-th user, u_b is the b-th user, M is the total number of users, and 1 ≤ i, b ≤ M. Let V denote the product set, V = {v_1, ..., v_j, ..., v_N}, where v_j is the j-th product, N is the total number of products, and 1 ≤ j ≤ N. Let R_ij denote the i-th user u_i's implicit feedback on the j-th product v_j, and let R = {R_ij}_{M×N} be the implicit feedback matrix, where R_ij = 1 if user u_i has an implicit feedback record for product v_j, and R_ij = 0 otherwise.
Let the user attribute matrix be defined so that its i-th entry is the d_u-dimensional attribute vector of the i-th user u_i, and let the product attribute matrix be defined so that its j-th entry is the d_v-dimensional attribute vector of the j-th product v_j.
Define the embedded characterization matrix of the user attribute nodes and randomly initialize it a first time, with each entry a K-dimensional embedded characterization vector of the corresponding user attribute node; randomly initialize it a second time, with e_k the K-dimensional embedded characterization vector of the k-th user attribute node.
Define the embedded characterization matrix of the product attribute nodes and randomly initialize it a first time, with each entry a K-dimensional embedded characterization vector of the corresponding product attribute node; randomly initialize it a second time, with f_l the K-dimensional embedded characterization vector of the l-th product attribute node.
Using the implicit feedback matrix R = {R_ij}_{M×N} and the user attribute matrix, construct the user graph adjacency matrix S_U. Using the implicit feedback matrix R and the product attribute matrix, construct the product graph adjacency matrix S_V.
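The constructions of Step 1 can be sketched with toy data as follows. The co-occurrence rule used here to build the adjacency matrices (an attribute node is linked to every product or user reachable through the entities carrying that attribute) is an assumption for illustration; the patent does not spell out the exact construction of S_U and S_V:

```python
import numpy as np

# Toy sizes: M users, N products, K_u user-attribute nodes, K_v product-attribute nodes.
M, N, K_u, K_v = 3, 4, 2, 2

# Implicit feedback matrix R = {R_ij}: R[i, j] = 1 if user u_i interacted with product v_j.
R = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]], dtype=float)

# Binary user-attribute matrix (M x K_u) and product-attribute matrix (N x K_v).
X = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 1]], dtype=float)

# Assumed construction: the user graph adjacency S_U links each user-attribute node
# to the products consumed by users holding that attribute (attribute -> user -> product).
S_U = (X.T @ R > 0).astype(float)          # shape (K_u, N)
# Likewise the product graph links each product-attribute node to the users of
# products carrying that attribute (attribute -> product -> user).
S_V = (Y.T @ R.T > 0).astype(float)        # shape (K_v, M)

print(S_U.shape, S_V.shape)   # (2, 4) (2, 3)
```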
Step 2: obtain feature matrices by one-hot coding, comprising a user feature matrix and a product feature matrix:
Step 2.1: perform one-hot coding on the user set U, thereby constructing the user feature matrix P = {p_1, ..., p_i, ..., p_M}, where p_i is the K-dimensional user feature vector of the i-th user u_i.
Step 2.2: perform one-hot coding on the product set V, thereby constructing the product feature matrix Q = {q_1, ..., q_j, ..., q_N}, where q_j is the K-dimensional product feature vector of the j-th product v_j.
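Steps 2.1-2.2 can be illustrated as follows. The patent states the one-hot coded feature vectors are K-dimensional, whereas a plain one-hot code over M users would be M-dimensional; this sketch therefore assumes a randomly initialized embedding table behind the one-hot lookup (the tables `W_u` and `W_v` are hypothetical names, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 3, 4, 8

# One-hot ID encodings: each entity gets a unit indicator row of an identity matrix.
U_onehot = np.eye(M)
V_onehot = np.eye(N)

# Assumed projection through randomly initialized embedding tables to obtain the
# K-dimensional feature vectors p_i and q_j of Steps 2.1-2.2.
W_u = rng.normal(scale=0.1, size=(M, K))
W_v = rng.normal(scale=0.1, size=(N, K))
P = U_onehot @ W_u   # user feature matrix, p_i = P[i]
Q = V_onehot @ W_v   # product feature matrix, q_j = Q[j]

print(P.shape, Q.shape)   # (3, 8) (4, 8)
```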
Step 3: construct the feature initialization layer:
Step 3.1: define the current number of updates as t, and initialize t = 0.
Step 3.2: define and initialize, for the t-th update, the user feature vector of the i-th user u_i, the product feature vector of the j-th product, the K-dimensional embedded characterization vector of the k-th user attribute node, and the K-dimensional embedded characterization vector of the l-th product attribute node.
Step 4: perform feature propagation in the teacher model through graph convolution:
Step 4.1: define the teacher model to comprise T' graph convolution layers.
Step 4.2: input into the teacher model, for the t-th update, the user feature vector of the i-th user u_i, the product feature vector of the j-th product, the K-dimensional embedded characterization vector of the k-th user attribute node, and the K-dimensional embedded characterization vector of the l-th product attribute node, and perform feature propagation. Using Equations (1) and (2), respectively compute the user feature vector of u_i, the product feature vector of v_j, the embedded characterization vector of the k-th user attribute node, and the embedded characterization vector of the l-th product attribute node output by the next graph convolution layer.
In Equations (1) and (2), A_i is the set of product nodes and user attribute nodes that the i-th user u_i interacts with; A_j is the set of user nodes and product attribute nodes that the j-th product v_j interacts with; A_k is the set of users interacting with the k-th user attribute node; and A_l is the set of products interacting with the l-th product attribute node. The remaining symbols are the feature vectors and embedded characterization vectors output for these nodes by the previous graph convolution layer.
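Equations (1) and (2) did not survive extraction as text, but the surrounding symbol descriptions (interaction neighborhoods A_i, A_j, A_k, A_l and layer-wise outputs) are consistent with a symmetrically normalized neighborhood aggregation in the style of LightGCN. A minimal sketch under that assumption:

```python
import numpy as np

def propagate(features, neighbors):
    """One graph convolution layer: each node's next-layer vector is the
    degree-normalized sum of its neighbors' current vectors. The symmetric
    1/sqrt(|A_node| * |A_neighbor|) rule is an assumption in the style of
    LightGCN; the patent's exact Equations (1)-(2) are not reproduced in
    the extracted text."""
    out = {}
    for node, nbrs in neighbors.items():
        acc = np.zeros_like(features[node], dtype=float)
        for n in nbrs:
            acc = acc + features[n] / np.sqrt(len(nbrs) * len(neighbors[n]))
        out[node] = acc
    return out

# Tiny heterogeneous graph: user u0 interacts with product j0 and carries
# user-attribute node k0; j0 and k0 each link back to u0.
feats = {"u0": np.ones(4), "j0": np.full(4, 2.0), "k0": np.full(4, 3.0)}
nbrs = {"u0": ["j0", "k0"], "j0": ["u0"], "k0": ["u0"]}
out = propagate(feats, nbrs)   # out["u0"] = (2 + 3) / sqrt(2) per component
```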
Step 5: construct the prediction layer of the teacher model from the output of the T'-th graph convolution layer:
Step 5.1: obtain the teacher model's user characterization U_i and product characterization V_j by Equation (3), whose inputs are the user feature vector of the i-th user u_i and the product feature vector of the j-th product v_j output by the T'-th layer of the teacher model.
Step 5.2: obtain the teacher model's prediction score of the i-th user u_i for the j-th product v_j by Equation (4). In Equation (4), ⟨·,·⟩ denotes the vector inner product.
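Step 5.2 states that the teacher's prediction score is the vector inner product of the two characterizations; Equation (4) can therefore be illustrated directly (the vectors below are toy values, not from the patent):

```python
import numpy as np

# Eq. (4): the teacher's predicted score of user i for product j is the
# inner product <U_i, V_j> of their final characterizations.
U_i = np.array([0.5, 1.0, -0.5])
V_j = np.array([1.0, 0.5, 2.0])
score = float(U_i @ V_j)
print(score)   # 0.5*1.0 + 1.0*0.5 + (-0.5)*2.0 = 0.0
```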
Step 6: construct the input layer of the user student model, inputting the second randomly initialized embedded characterization matrix of the user attribute nodes, the product feature matrix Q = {q_1, ..., q_j, ..., q_N}, and the user graph adjacency matrix S_U.
Step 7: construct the feature initialization layer:
Step 7.1: initialize t = 0.
Step 7.2: define and initialize, for the t-th update, the embedded characterization vector of the k-th user attribute node and the product feature vector of the j-th product.
And 8, carrying out feature propagation on the user student model through the graph convolution:
step 8.1, defining the user student model to comprise a T' layer graph convolution layer;
step 8.2, embedding the K-dimension of the kth updated user attribute node into the characterization vectorAnd the t updated product feature vector of the jth productInputting the user student model for feature propagation, and calculating the embedded characterization vector of the user attribute node output by the t +1 th convolution layer after the kth updating of the kth user attribute node by using the formula (5)And jth product vjThe characteristic vector of the product output by the t +1 th convolution layer after the t-th update
In the formula (5), the reaction mixture is,is the k-th user attribute node in the user graph adjacency matrix SUThe corresponding set of products in (a) to (b),is the embedded characterization vector output by the kth layer convolution layer after the kth updating of the kth user attribute node,is the jth product vjThe characteristic vector output by the t-th layer convolution layer after the t-th updating;is the jth product vjIn the user graph adjacency matrix SUTo a corresponding set of user attribute nodes.
Step 9: construct the prediction layer of the user student model from the user features output by the T'-th layer of the user student model:
Step 9.1: obtain, by Equation (6), the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the user student model. The inputs to Equation (6) are the embedded characterization vectors of the user attribute nodes output by the T'-th layer of the user student model, the product feature vector of the j-th product v_j output by that layer, and the set of user attribute nodes of the i-th user u_i.
Step 10: construct the input layer of the product student model, inputting the second randomly initialized embedded characterization matrix of the product attribute nodes, the user feature matrix P = {p_1, ..., p_i, ..., p_M}, and the product graph adjacency matrix S_V.
Step 11: following the processes of Steps 7 to 9, perform feature initialization and feature propagation on the product student model, thereby obtaining the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the product student model.
Step 12, obtaining the ith user u by using the formula (7)iFor jth product vjUser student model and product student model of (1)
In the formula (7), the reaction mixture is,ith user u representing user student model outputiThe user characterization of (a) is performed,j product v representing product student model outputjThe product characterization of (1);
Step 12.1: construct the loss function L_r of the teacher model according to Equation (8). In Equation (8), σ is the sigmoid activation function; D_u is the training data of user u; the two prediction terms are user u's prediction scores for products v_i and v_j; θ is the set of parameters to be optimized; and γ is the coefficient of the regularization term.
Step 12.2: construct the knowledge distillation loss function L_u of the user student model and the knowledge distillation loss function L_v of the product student model by Equation (9). In Equation (9), U_i is the user characterization of the i-th user u_i output by the teacher model, compared with the user characterization of u_i output by the user student model; V_j is the product characterization of the j-th product v_j output by the teacher model, compared with the product characterization of v_j output by the product student model.
Step 12.3: construct the loss function L_s of the score predictions of the user student model and the product student model by Equation (10):
L_s = ||U^g (V^g)^T − U^U (V^I)^T||   (10)
In Equation (10), U^g is the user feature matrix output by the teacher model; V^g is the product feature matrix output by the teacher model; U^U is the user characterization matrix output by the user student model; and V^I is the product characterization matrix output by the product student model.
Step 12.4: obtain the overall loss function Loss(θ) of the whole network according to Equation (11):
Loss(θ) = L_r + λL_u + μL_v + ηL_s   (11)
In Equation (11), λ, μ, and η are hyperparameters used to balance the distillation loss functions of the different parts. Minimize Loss(θ) by the gradient descent method to obtain the updated optimal parameters θ*.
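The loss terms of Steps 12.1-12.4 can be sketched as follows. The exact form of L_r in Equation (8) is not recoverable from the extracted text, so a BPR-style pairwise form is assumed here (and the γ regularization term is omitted); L_u and L_v are taken as L2 gaps between teacher and student characterizations, and L_s follows Equation (10) directly:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 3, 4, 5
Ug, Vg = rng.normal(size=(M, K)), rng.normal(size=(N, K))   # teacher characterizations
Uu, Vi = rng.normal(size=(M, K)), rng.normal(size=(N, K))   # student characterizations

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# L_r (Eq. (8), assumed BPR-style pairwise form; regularization omitted):
# for each (user u, product i, product j) triple, push the score of i above j.
triples = [(0, 1, 2), (1, 0, 3), (2, 2, 0)]   # toy (u, i, j) training data
L_r = -sum(np.log(sigmoid(Ug[u] @ Vg[i] - Ug[u] @ Vg[j])) for u, i, j in triples)

# L_u and L_v (Eq. (9), assumed L2 form): pull the student characterizations
# toward the teacher's.
L_u = np.sum((Ug - Uu) ** 2)
L_v = np.sum((Vg - Vi) ** 2)

# L_s (Eq. (10)): norm of the gap between teacher and student score matrices.
L_s = np.linalg.norm(Ug @ Vg.T - Uu @ Vi.T)

# Eq. (11): overall loss balanced by hyperparameters lambda, mu, eta.
lam, mu, eta = 0.1, 0.1, 0.01
Loss = L_r + lam * L_u + mu * L_v + eta * L_s
print(Loss > 0)   # True: every term is non-negative
```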
Step 13: use the updated user student model to obtain the optimal characterization of a new user u_c, and use the updated product student model to obtain the optimal characterization of a new product v_d, according to Equation (12).
In Equation (12), the characterization of u_c is computed from the embedded characterization vectors of its user attribute nodes output by the T'-th layer of the updated user student model, aggregated over the user attribute node set of u_c; likewise, the characterization of v_d is computed from the embedded characterization vectors f_l^{T'} of its product attribute nodes output by the T'-th layer of the updated product student model, aggregated over the product attribute node set of v_d.
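Step 13 can be illustrated as below. The exact aggregation in Equation (12) did not survive extraction; a mean over the new entity's final-layer attribute-node embeddings is one consistent reading and is assumed here (the attribute names and embedding values are toy data):

```python
import numpy as np

# Cold start inference: a new user u_c has no interaction history, only
# attribute nodes. Its characterization is taken (by assumption) as the mean
# of the final-layer embedded characterization vectors of those nodes.
E_final = {"female": np.array([0.2, 0.4]),
           "age_25": np.array([0.6, 0.0]),
           "student": np.array([0.1, 0.2])}   # learned attribute embeddings
new_user_attrs = ["female", "age_25"]          # attribute node set of u_c
U_c = np.mean([E_final[k] for k in new_user_attrs], axis=0)
print(U_c)   # mean of the two attribute vectors
```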
Compared with the prior art, the invention has the beneficial effects that:
1. Aiming at the problem that users and products in a recommendation system lack historical interaction records under cold start conditions (new users and new products), the invention provides a cold start entity recommendation method for knowledge distillation based on a graph convolution network, deeply mines the feature relations between collaborative filtering information and the entities and attributes, and iteratively updates the attribute and entity embedded characterizations, thereby effectively improving the accuracy of cold start entity recommendation.
2. The method processes the entity (user and product) sets by one-hot coding, which makes matrix indexing and computation fast and serves to expand the features of the representations used in recommendation.
3. The invention designs a heterogeneous graph that models users' implicit feedback data on products, user attribute data, and product attribute data as nodes and connection relations in the teacher model, constructs the indirect connections between entities and attributes into the graph adjacency matrices of the user student model and the product student model, and introduces knowledge distillation so that the entity characterizations output by the teacher model guide the corresponding entity attribute characterizations output by the student models, thereby solving the recommendation problem of cold start entities.
4. The method adopts graph convolution to perform representation learning on the interaction information between users and products and between entities and attributes; propagation through graph convolution better captures high-order similarity and learns more accurate attribute characterizations.
5. According to the method, the node representation embedded matrix of the entity and the attribute is updated according to the prediction result of the teacher model and the distillation knowledge of the student model, iterative learning is carried out on the whole neural network, and the recommendation precision of the cold start entity is effectively improved.
Drawings
FIG. 1 is a flow chart of the cold start entity recommendation method for knowledge distillation based on graph convolution network of the present invention.
Detailed Description
In this embodiment, as shown in fig. 1, the method for recommending a cold start entity based on knowledge distillation by using a graph convolution network is performed according to the following steps:
step 1, let U represent a user set, and U ═ U1,...,ui,...,ub,...,uM},uiRepresents the ith user, ubRepresenting the b-th user, M represents the total number of users, i is more than or equal to 1, and b is more than or equal to M; let V denote the product set, and V ═ V1,...,vj,...,vN},vjRepresents the jth product, N represents the total number of products, and j is more than or equal to 1 and less than or equal to N; let RijRepresents the ith user uiFor jth product vjIf the implicit feedback exists, the implicit feedback matrix R of the product is set as Rij}M×NIf the ith user uiFor jth item vjWith implicit feedback recording, then RijNot all right 1, otherwise Rij=0;
Let the user attribute matrix be Represents the ith user uiD of (A)uA dimension attribute vector; order product attribute matrix Denotes the jth product vjD of (A)vA dimension attribute vector;
defining an embedded characterization matrix of user attribute nodes, and randomly initializing for the first time to A K-dimensional embedded characterization vector representing a kth user attribute node; randomly initializing the embedded characterization matrix of the user attribute nodes for the second time toekA K-dimensional embedded characterization vector representing a kth user attribute node;
defining an embedded characterization matrix of product attribute nodes and randomly initializing for the first time to A K-dimensional embedded characterization vector representing the ith product attribute node; randomly initializing the embedded characterization matrix of the product attribute nodes for the second time toflA K-dimensional embedded characterization vector representing the ith product attribute node;
using implicit feedback matrix R ═ Rij}M×NAnd a user attribute matrix ofConstructing a user graph adjacency matrix SU;
Using implicit feedback matrix R ═ Rij}M×NAnd a product attribute matrixConstruct the product drawing adjacency matrix SV;
Step 2, obtaining a characteristic matrix through single hot coding, comprising the following steps: a user characteristic matrix and a product characteristic matrix:
step 2.1, performing one-hot coding on the user set U, thereby constructing a user feature matrix P ═ P1,...,pi,...,pMIn which p isiRepresents the ith user uiK-dimensional user feature vectors of (1);
step 2.2, performing one-hot coding on the item set V, thereby constructing a product characteristic matrix Q ═ Q1,...,qj,...,qNWherein q isjDenotes the jth product vjThe K-dimensional product feature vector of (1);
step 3, constructing a characteristic initialization layer:
step 3.1, defining the current updating times as t, and initializing t to be 0;
step 3.2, define and initialize ith user u updated for the t timeiUser feature vector ofDefining and initializing the product feature vector of the jth updated jth productDefining and initializing K-dimensional embedded characterization vectors for kth updated user attribute nodesDefining and initializing K-dimensional embedded characterization vectors for the ith updated product attribute node
And 4, carrying out feature propagation on the teacher model through graph convolution:
step 4.1, defining that the teacher model comprises a T' layer graph convolution layer;
step 4.2, updating the ith user u for the t timeiUser feature vector ofT updated product feature vector of jth productK-dimensional embedded characterization vector of kth user attribute node of t-th updateAnd the t updated K-dimensional embedded characterization vector of the ith product attribute nodeInputting a teacher model for feature propagation, and respectively calculating the ith user u by using an equation (1) and an equation (2)iUser feature vector output from t +1 th convolution layer after t-th updateJth product vjThe characteristic vector of the product output by the t +1 th convolution layer after the t-th updateEmbedding characterization vector of user attribute node output by t +1 th convolution layer after t-time updating of kth user attribute nodeAnd the embedded characterization vector of the product attribute node output by the t +1 th convolution layer after the t-th update of the ith product attribute node
In the formulae (1) and (2), AiIs the ith user uiThe set of interacted product and user attribute nodes,is the jth product vjThe characteristic vector output by the t-th layer convolution layer after the t-th updating;is the embedding characterization vector output by the kth layer convolution layer after the kth updating of the kth user attribute node; a. thejIs the jth product vjThe set of interacted user and product attribute nodes,is the characteristic vector output by the ith user at the t-th layer convolution layer after the t-th updating;is the embedded characterization vector output by the tth layer convolution layer after the tth update of the ith product attribute node; a. thekIs the user set interacted by the kth user attribute node; a. thelIs the product set interacted by the ith product attribute node;
and 5, constructing a prediction layer of the teacher model according to the output of the T-th layer of graph convolution:
step 5.1, obtaining a user representation U of the teacher model by using the formula (3)iAnd product characterization Vj:
In the formula (3), the reaction mixture is,ith user u representing output of T' th layer of teacher modeliThe feature vector of the user of (2),j product v representing teacher model T' layer outputjThe product feature vector of (1);
step 5.2, obtaining the ith user u by using the formula (4)iFor jth product vjPredictive scoring of teacher models
In the formula (4), < > represents the vector inner product;
step 6, constructing an input layer of the user student model, and inputting an embedded characterization matrix of the user attribute nodes for the second random initializationProduct characteristic matrix Q ═ Q1,...,qj,...,qN}, user graph adjacency matrix SU;
Step 7, constructing a characteristic initialization layer:
step 7.1, initializing t to be 0;
step 7.2, define and initialize the kth user attribute node of the t-th updateDefining the product feature vector of the jth updated product
Step 8: perform feature propagation in the user student model through graph convolution:
Step 8.1: define the user student model to comprise T' graph convolution layers;
Step 8.2: input the K-dimensional embedded characterization vector e_k^(t) of the k-th user-attribute node at the t-th update and the product feature vector q_j^(t) of the j-th product at the t-th update into the user student model for feature propagation, and calculate with formula (5) the embedded characterization vector e_k^(t+1) of the k-th user-attribute node output by the (t+1)-th convolution layer and the feature vector q_j^(t+1) of the j-th product v_j output by the (t+1)-th convolution layer:
In formula (5), the neighbor set of the k-th user-attribute node is the set of products corresponding to it in the user-graph adjacency matrix S^U; e_k^(t) is the embedded characterization vector of the k-th user-attribute node output by the t-th convolution layer; q_j^(t) is the feature vector of the j-th product v_j output by the t-th convolution layer; the neighbor set of the j-th product v_j is the set of user-attribute nodes corresponding to it in S^U.
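Formula (5) is likewise an image; a hedged sketch of the bipartite propagation it describes, assuming row-mean normalization over the adjacency matrix S^U (the `bipartite_propagate` helper and the normalization are assumptions, not the patented formula):

```python
import numpy as np

def bipartite_propagate(E, Q, S):
    """One layer of the user student model (sketch of formula (5)):
    user-attribute embeddings E aggregate from products via the
    user-graph adjacency S (num_attrs x num_products), and product
    vectors Q aggregate from attribute nodes via S^T."""
    deg_a = S.sum(axis=1, keepdims=True)        # products per attribute
    deg_p = S.sum(axis=0, keepdims=True).T      # attributes per product
    E_next = (S @ Q) / np.maximum(deg_a, 1)     # mean over product neighbors
    Q_next = (S.T @ E) / np.maximum(deg_p, 1)   # mean over attribute neighbors
    return E_next, Q_next

S = np.array([[1.0, 1.0], [1.0, 0.0]])   # 2 attribute nodes x 2 products
E = np.array([[1.0, 0.0], [0.0, 1.0]])
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
E1, Q1 = bipartite_propagate(E, Q, S)
# attribute 0 touches both products, so E1[0] is their mean: [1.0, 1.0]
```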
Step 9: construct the prediction layer of the user student model according to the user features output by the T'-th layer of the user student model:
Step 9.1: obtain the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the user student model by using formula (6):
In formula (6), e_k^(T') denotes the embedded characterization vector of the k-th user-attribute node output by the T'-th layer of the user student model; q_j^(T') denotes the product feature vector of the j-th product v_j output by the T'-th layer of the user student model; A_i denotes the set of user-attribute nodes of the i-th user u_i;
Step 10: construct the input layer of the product student model; its inputs are the second randomly initialized embedded characterization matrix of the product-attribute nodes, the user feature matrix P = {p_1, ..., p_i, ..., p_M}, and the product-graph adjacency matrix S^V;
Step 11: perform feature initialization and feature propagation in the product student model according to the processes of steps 7 to 9, so as to obtain the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the product student model;
Step 12: obtain the i-th user u_i's predicted score for the j-th product v_j under the user student model and the product student model by using formula (7):
In formula (7), U_i^U denotes the user characterization of the i-th user u_i output by the user student model, and V_j^I denotes the product characterization of the j-th product v_j output by the product student model;
Step 12.1: construct the loss function L_r of the teacher model according to formula (8):
In formula (8), σ denotes the sigmoid activation function; D_u denotes the training data of the u-th user u_u; r̂_ui denotes the u-th user u_u's predicted score for the i-th product v_i; r̂_uj denotes the u-th user u_u's predicted score for the j-th product v_j; θ denotes the parameters to be optimized; γ denotes the coefficient of the regularization term;
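Formula (8) is not reproduced in the text, but the paired predictions, the sigmoid, and the regularizer read like the standard Bayesian Personalized Ranking objective; a sketch under that assumption (the pairwise form is inferred, not stated verbatim):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(pos_scores, neg_scores, theta, gamma=0.01):
    """Sketch of formula (8), assuming the standard BPR pairwise form:
    -sum ln sigma(r_ui - r_uj) + gamma * ||theta||^2."""
    pairwise = -np.sum(np.log(sigmoid(pos_scores - neg_scores)))
    reg = gamma * np.sum(theta ** 2)
    return float(pairwise + reg)

# one observed (positive) vs. one unobserved (negative) product
loss = bpr_loss(np.array([2.0]), np.array([0.0]), np.array([1.0, -1.0]))
# -ln(sigma(2)) + 0.01 * 2 ≈ 0.1269 + 0.02
```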
Step 12.2: construct the knowledge distillation loss function L_u of the user student model and the knowledge distillation loss function L_v of the product student model by using formula (9):
In formula (9), U_i denotes the user characterization of the i-th user u_i output by the teacher model, and U_i^U denotes the user characterization of the i-th user u_i output by the user student model; V_j denotes the product characterization of the j-th product v_j output by the teacher model, and V_j^I denotes the product characterization of the j-th product v_j output by the product student model;
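Formula (9) is also an image; assuming the distillation term is a squared L2 distance pulling each student characterization toward the corresponding teacher characterization (the norm is not stated), it can be sketched as:

```python
import numpy as np

def kd_loss(teacher_reps, student_reps):
    """Sketch of formula (9): squared L2 gap between teacher and student
    characterization matrices (the exact norm is an assumption)."""
    return float(np.sum((teacher_reps - student_reps) ** 2))

U_teacher = np.array([[1.0, 0.0], [0.0, 1.0]])
U_student = np.array([[0.5, 0.0], [0.0, 1.0]])
Lu = kd_loss(U_teacher, U_student)   # (0.5)^2 = 0.25
```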
Step 12.3: construct the loss function L_s for score prediction between the user student model and the product student model by using formula (10):
L_s = ||U^g(V^g)^T − U^U(V^I)^T|| (10)
In formula (10), U^g denotes the user feature matrix output by the teacher model; V^g denotes the product feature matrix output by the teacher model; U^U denotes the user characterization matrix output by the user student model; V^I denotes the product characterization matrix output by the product student model;
Step 12.4: obtain the overall loss function Loss(θ) of the whole network according to formula (11):
Loss(θ) = L_r + λL_u + μL_v + ηL_s (11)
In formula (11), λ, μ, and η are hyperparameters used to balance the distillation loss functions of the different parts; the loss function Loss(θ) is minimized by the gradient descent method, so as to obtain the updated optimal parameters θ*;
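Formulas (10) and (11) can be sketched directly from the text, taking the unspecified norm in (10) to be the Frobenius norm and using placeholder weight values (the patent leaves λ, μ, η as hyperparameters):

```python
import numpy as np

def score_matching_loss(Ug, Vg, Uu, Vi):
    # Formula (10): gap between the teacher's predicted score matrix
    # U^g (V^g)^T and the students' U^U (V^I)^T.  np.linalg.norm on a
    # 2-D array defaults to the Frobenius norm (assumed here).
    return float(np.linalg.norm(Ug @ Vg.T - Uu @ Vi.T))

def total_loss(Lr, Lu, Lv, Ls, lam=0.1, mu=0.1, eta=0.1):
    # Formula (11): teacher loss plus weighted distillation and
    # score-matching terms; lam, mu, eta are placeholder values.
    return Lr + lam * Lu + mu * Lv + eta * Ls

Ug = np.array([[1.0, 0.0]]); Vg = np.array([[1.0, 0.0]])
Uu = np.array([[0.0, 0.0]]); Vi = np.array([[1.0, 0.0]])
Ls_val = score_matching_loss(Ug, Vg, Uu, Vi)   # |1 - 0| = 1.0
L = total_loss(1.0, 0.5, 0.5, Ls_val)          # 1.0 + 0.05 + 0.05 + 0.1
```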
Step 13: for the cold-start entities, namely new users and new products, obtain the optimal characterization of a new user u_c by using the updated user student model, and obtain the optimal characterization of a new product v_d by using the updated product student model:
In formula (12), e_k^(T') denotes the embedded characterization vector of the k-th user-attribute node output by the T'-th layer of the updated user student model, and A_c denotes the set of user-attribute nodes of the new user u_c; f_l^(T') denotes the embedded characterization vector of the l-th product-attribute node output by the T'-th layer of the updated product student model, and A_d denotes the set of product-attribute nodes of the new product v_d.
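A sketch of the cold-start inference of formula (12), assuming the new entity's characterization is the mean of its attribute-node embeddings (the pooling operator is not stated in the text; the attribute names below are hypothetical):

```python
import numpy as np

def cold_start_rep(attr_embeddings, attr_set):
    """Sketch of formula (12): a new user's (or product's) characterization
    is built purely from the learned embeddings of its attribute nodes,
    since a cold-start entity has no interaction history (mean pooling
    assumed)."""
    return np.mean([attr_embeddings[k] for k in attr_set], axis=0)

emb = {"age:20s": np.array([1.0, 0.0]),
       "city:NY": np.array([0.0, 1.0])}
new_user = cold_start_rep(emb, ["age:20s", "city:NY"])   # [0.5, 0.5]
```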
Example:
To verify the effectiveness of the method, the invention adopts three public data sets commonly used in recommendation: Yelp, Amazon-Video Games, and XING. Each data set filters out users with fewer than 3 ratings; 30% of users are then randomly drawn from all users as new users, and 30% of products are drawn from all products as new products. The interaction records and corresponding attributes of the old users and old products are used as training data. New-user and new-product recommendation under different scenarios is divided into three tasks: recommending old products to new users (task one); recommending new products to old users (task two); and recommending new products to new users (task three).
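The hold-out protocol above can be sketched as follows (`split_cold_start` is an illustrative helper, not part of the patent):

```python
import random

def split_cold_start(users, products, ratio=0.3, seed=0):
    """Randomly hold out `ratio` of users as 'new users' and `ratio` of
    products as 'new products'; the remaining 'old' entities supply the
    training interactions."""
    rng = random.Random(seed)
    new_users = set(rng.sample(users, int(ratio * len(users))))
    new_products = set(rng.sample(products, int(ratio * len(products))))
    old_users = [u for u in users if u not in new_users]
    old_products = [p for p in products if p not in new_products]
    return old_users, new_users, old_products, new_products

old_u, new_u, old_p, new_p = split_cold_start(list(range(10)), list(range(20)))
# 3 of 10 users and 6 of 20 products are held out as cold-start entities
```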
For the product recommendation tasks, the invention adopts Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) as evaluation criteria. Eight methods are selected for comparison: KNN, DropoutNet, LinMap, xDeepFM, CDL, Heater, PinSage, and Student.
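For the common single-held-out-item evaluation setting, the two criteria can be computed as:

```python
import math

def hit_ratio_at_k(ranked_items, ground_truth, k):
    # HR@K: 1 if the held-out item appears in the top-K recommendations.
    return 1.0 if ground_truth in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, ground_truth, k):
    # NDCG@K for a single held-out item: 1/log2(rank + 1) if it is hit
    # at position `rank`, else 0.
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == ground_truth:
            return 1.0 / math.log2(rank + 1)
    return 0.0

ranked = ["v3", "v7", "v1"]
hr = hit_ratio_at_k(ranked, "v7", 3)   # 1.0 (hit at position 2)
ndcg = ndcg_at_k(ranked, "v7", 3)      # 1/log2(3)
```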
Table 1. Recommendation results of the proposed method and the comparison methods on the three tasks on the Yelp data set
Table 2. Recommendation results of the proposed method and the comparison methods on task two on the Amazon-Video Games data set
Table 3. Recommendation results of the proposed method and the comparison methods on the three tasks on the XING data set
Tables 1, 2 and 3 show the experimental results on the Yelp, Amazon-Video Games and XING data sets, respectively. It can be seen that the proposed method outperforms the 8 comparison methods on both the HR and the NDCG indices across the three data sets.
Taking the three cold-start tasks together, the method proposed by the invention (PGD) is significantly better than the comparison methods on all three data sets, which demonstrates its feasibility.
Claims (1)
1. A cold start entity recommendation method for knowledge distillation based on graph convolution network is characterized by comprising the following steps:
Step 1: let U denote the user set, U = {u_1, ..., u_i, ..., u_b, ..., u_M}, where u_i denotes the i-th user, u_b denotes the b-th user, M denotes the total number of users, and 1 ≤ i, b ≤ M; let V denote the product set, V = {v_1, ..., v_j, ..., v_N}, where v_j denotes the j-th product, N denotes the total number of products, and 1 ≤ j ≤ N; let R_ij denote the implicit feedback of the i-th user u_i on the j-th product v_j, and let the implicit feedback matrix of the products be R = {R_ij}_{M×N};
Let the user attribute matrix be given, in which the i-th entry denotes the D_u-dimensional attribute vector of the i-th user u_i; let the product attribute matrix be given, in which the j-th entry denotes the D_v-dimensional attribute vector of the j-th product v_j;
Define the embedded characterization matrix of the user-attribute nodes and randomly initialize it for the first time; randomly initialize it for the second time; in both, e_k denotes the K-dimensional embedded characterization vector of the k-th user-attribute node;
Define the embedded characterization matrix of the product-attribute nodes and randomly initialize it for the first time; randomly initialize it for the second time; in both, f_l denotes the K-dimensional embedded characterization vector of the l-th product-attribute node;
Construct the user-graph adjacency matrix S^U by using the implicit feedback matrix R = {R_ij}_{M×N} and the user attribute matrix;
Construct the product-graph adjacency matrix S^V by using the implicit feedback matrix R = {R_ij}_{M×N} and the product attribute matrix;
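The construction of S^U is only named above, not specified; one plausible reading links a user-attribute node to every product touched by some user carrying that attribute. The `build_user_graph` helper and that link rule are assumptions, not the patented construction:

```python
import numpy as np

def build_user_graph(R, X):
    """Sketch of one reading of the user-graph adjacency S^U:
    attribute k is adjacent to product j whenever some user who has
    attribute k interacted with product j.
    R: M x N implicit feedback (0/1); X: M x Du binary user attributes."""
    return (X.T @ R > 0).astype(float)   # Du x N

R = np.array([[1, 0], [0, 1]])   # 2 users x 2 products
X = np.array([[1, 0], [1, 1]])   # 2 users x 2 attributes
S_U = build_user_graph(R, X)
# attribute 0 is shared by both users, so it touches both products
```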
Step 2, obtaining a characteristic matrix through single hot coding, comprising the following steps: a user characteristic matrix and a product characteristic matrix:
step 2.1, performing one-hot coding on the user set U, thereby constructing a user feature matrix P ═ P1,...,pi,...,pMIn which p isiRepresents the ith user uiK-dimensional user feature vectors of (1);
step 2.2, carrying out independent hot coding on the item set V, thereby constructing a productThe product characteristic matrix Q ═ Q1,...,qj,...,qNWherein q isjDenotes the jth product vjThe K-dimensional product feature vector of (1);
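The one-hot encoding of steps 2.1 and 2.2 simply assigns each entity a standard basis vector, i.e. the feature matrix of an entity set is an identity matrix:

```python
import numpy as np

def one_hot(ids):
    """One-hot encode an entity set: entity i receives the i-th standard
    basis vector, so the stacked feature matrix is the identity."""
    return np.eye(len(ids))

P = one_hot(["u1", "u2", "u3"])
# P[0] = [1, 0, 0] is user u1's feature vector
```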
step 3, constructing a characteristic initialization layer:
Step 3.1: define the current number of updates as t, and initialize t = 0;
Step 3.2: define and initialize the user feature vector p_i^(t) of the i-th user u_i at the t-th update; define and initialize the product feature vector q_j^(t) of the j-th product at the t-th update; define and initialize the K-dimensional embedded characterization vector e_k^(t) of the k-th user-attribute node at the t-th update; define and initialize the K-dimensional embedded characterization vector f_l^(t) of the l-th product-attribute node at the t-th update;
Step 4: perform feature propagation in the teacher model through graph convolution:
Step 4.1: define the teacher model to comprise T' graph convolution layers;
Step 4.2: input the t-th-update user feature vector p_i^(t) of the i-th user u_i, the t-th-update product feature vector q_j^(t) of the j-th product, the t-th-update K-dimensional embedded characterization vector e_k^(t) of the k-th user-attribute node, and the t-th-update K-dimensional embedded characterization vector f_l^(t) of the l-th product-attribute node into the teacher model for feature propagation, and calculate with formulae (1) and (2), respectively: the user feature vector p_i^(t+1) of the i-th user u_i output by the (t+1)-th convolution layer, the product feature vector q_j^(t+1) of the j-th product v_j output by the (t+1)-th convolution layer, the embedded characterization vector e_k^(t+1) of the k-th user-attribute node output by the (t+1)-th convolution layer, and the embedded characterization vector f_l^(t+1) of the l-th product-attribute node output by the (t+1)-th convolution layer:
In formulae (1) and (2), A_i is the set of product nodes and user-attribute nodes with which the i-th user u_i has interacted; q_j^(t) is the feature vector of the j-th product v_j output by the t-th convolution layer after the t-th update; e_k^(t) is the embedded characterization vector of the k-th user-attribute node output by the t-th convolution layer after the t-th update; A_j is the set of user nodes and product-attribute nodes with which the j-th product v_j has interacted; p_i^(t) is the feature vector of the i-th user output by the t-th convolution layer after the t-th update; f_l^(t) is the embedded characterization vector of the l-th product-attribute node output by the t-th convolution layer after the t-th update; A_k is the set of users with which the k-th user-attribute node interacts; A_l is the set of products with which the l-th product-attribute node interacts;
Step 5: construct the prediction layer of the teacher model according to the output of the T'-th graph convolution layer:
Step 5.1: obtain the user characterization U_i and the product characterization V_j of the teacher model by using formula (3):
In formula (3), p_i^(T') denotes the user feature vector of the i-th user u_i output by the T'-th layer of the teacher model, and q_j^(T') denotes the product feature vector of the j-th product v_j output by the T'-th layer of the teacher model;
Step 5.2: obtain the i-th user u_i's predicted score for the j-th product v_j under the teacher model by using formula (4):
In formula (4), ⟨·,·⟩ denotes the vector inner product;
Step 6: construct the input layer of the user student model; its inputs are the second randomly initialized embedded characterization matrix of the user-attribute nodes, the product feature matrix Q = {q_1, ..., q_j, ..., q_N}, and the user-graph adjacency matrix S^U;
Step 7, constructing a characteristic initialization layer:
Step 7.1: initialize t = 0;
Step 7.2: define and initialize the embedded characterization vector e_k^(t) of the k-th user-attribute node at the t-th update, and define the product feature vector q_j^(t) of the j-th product at the t-th update;
Step 8: perform feature propagation in the user student model through graph convolution:
Step 8.1: define the user student model to comprise T' graph convolution layers;
Step 8.2: input the K-dimensional embedded characterization vector e_k^(t) of the k-th user-attribute node at the t-th update and the product feature vector q_j^(t) of the j-th product at the t-th update into the user student model for feature propagation, and calculate with formula (5) the embedded characterization vector e_k^(t+1) of the k-th user-attribute node output by the (t+1)-th convolution layer and the feature vector q_j^(t+1) of the j-th product v_j output by the (t+1)-th convolution layer:
In formula (5), the neighbor set of the k-th user-attribute node is the set of products corresponding to it in the user-graph adjacency matrix S^U; e_k^(t) is the embedded characterization vector of the k-th user-attribute node output by the t-th convolution layer; q_j^(t) is the feature vector of the j-th product v_j output by the t-th convolution layer; the neighbor set of the j-th product v_j is the set of user-attribute nodes corresponding to it in S^U.
Step 9: construct the prediction layer of the user student model according to the user features output by the T'-th layer of the user student model:
Step 9.1: obtain the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the user student model by using formula (6):
In formula (6), e_k^(T') denotes the embedded characterization vector of the k-th user-attribute node output by the T'-th layer of the user student model; q_j^(T') denotes the product feature vector of the j-th product v_j output by the T'-th layer of the user student model; A_i denotes the set of user-attribute nodes of the i-th user u_i;
Step 10: construct the input layer of the product student model; its inputs are the second randomly initialized embedded characterization matrix of the product-attribute nodes, the user feature matrix P = {p_1, ..., p_i, ..., p_M}, and the product-graph adjacency matrix S^V;
Step 11: perform feature initialization and feature propagation in the product student model according to the processes of steps 7 to 9, so as to obtain the user characterization of the i-th user u_i and the product characterization of the j-th product v_j output by the product student model;
Step 12: obtain the i-th user u_i's predicted score for the j-th product v_j under the user student model and the product student model by using formula (7):
In formula (7), U_i^U denotes the user characterization of the i-th user u_i output by the user student model, and V_j^I denotes the product characterization of the j-th product v_j output by the product student model;
Step 12.1: construct the loss function L_r of the teacher model according to formula (8):
In formula (8), σ denotes the sigmoid activation function; D_u denotes the training data of the u-th user u_u; r̂_ui denotes the u-th user u_u's predicted score for the i-th product v_i; r̂_uj denotes the u-th user u_u's predicted score for the j-th product v_j; θ denotes the parameters to be optimized; γ denotes the coefficient of the regularization term;
Step 12.2: construct the knowledge distillation loss function L_u of the user student model and the knowledge distillation loss function L_v of the product student model by using formula (9):
In formula (9), U_i denotes the user characterization of the i-th user u_i output by the teacher model, and U_i^U denotes the user characterization of the i-th user u_i output by the user student model; V_j denotes the product characterization of the j-th product v_j output by the teacher model, and V_j^I denotes the product characterization of the j-th product v_j output by the product student model;
Step 12.3: construct the loss function L_s for score prediction between the user student model and the product student model by using formula (10):
L_s = ||U^g(V^g)^T − U^U(V^I)^T|| (10)
In formula (10), U^g denotes the user feature matrix output by the teacher model; V^g denotes the product feature matrix output by the teacher model; U^U denotes the user characterization matrix output by the user student model; V^I denotes the product characterization matrix output by the product student model;
Step 12.4: obtain the overall loss function Loss(θ) of the whole network according to formula (11):
Loss(θ) = L_r + λL_u + μL_v + ηL_s (11)
In formula (11), λ, μ, and η are hyperparameters used to balance the distillation loss functions of the different parts; the loss function Loss(θ) is minimized by the gradient descent method, so as to obtain the updated optimal parameters θ*;
Step 13: obtain the optimal characterization of a new user u_c by using the updated user student model, and obtain the optimal characterization of a new product v_d by using the updated product student model:
In formula (12), e_k^(T') denotes the embedded characterization vector of the k-th user-attribute node output by the T'-th layer of the updated user student model, and A_c denotes the set of user-attribute nodes of the new user u_c; f_l^(T') denotes the embedded characterization vector of the l-th product-attribute node output by the T'-th layer of the updated product student model, and A_d denotes the set of product-attribute nodes of the new product v_d.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110755889.4A CN113343113A (en) | 2021-07-05 | 2021-07-05 | Cold start entity recommendation method for knowledge distillation based on graph convolution network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113343113A true CN113343113A (en) | 2021-09-03 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115545822A (en) * | 2022-09-20 | 2022-12-30 | 中国电信股份有限公司 | Product attribute recommendation method and device, computer storage medium and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111291270A (en) * | 2020-03-02 | 2020-06-16 | 合肥工业大学 | Attribute reasoning and product recommendation method based on self-adaptive graph convolution network |
US20200311552A1 (en) * | 2019-03-25 | 2020-10-01 | Samsung Electronics Co., Ltd. | Device and method for compressing machine learning model |
CN112861936A (en) * | 2021-01-26 | 2021-05-28 | 北京邮电大学 | Graph node classification method and device based on graph neural network knowledge distillation |
Non-Patent Citations (1)
Title |
---|
SHUAI WANG 等: "《Privileged Graph Distillation for Cold Start Recommendation》", 《ARXIV》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110516160B (en) | Knowledge graph-based user modeling method and sequence recommendation method | |
CN105740401B (en) | A kind of interested site recommended method and device based on individual behavior and group interest | |
CN111291270B (en) | Attribute reasoning and product recommendation method based on self-adaptive graph convolution network | |
CN112487199B (en) | User characteristic prediction method based on user purchasing behavior | |
CN113297370B (en) | End-to-end multi-modal question-answering method and system based on multi-interaction attention | |
CN108921657A (en) | A kind of sequence of recommendation method of knowledge based enhancing memory network | |
CN109584006B (en) | Cross-platform commodity matching method based on deep matching model | |
CN115186097A (en) | Knowledge graph and reinforcement learning based interactive recommendation method | |
Fuge et al. | Automatically inferring metrics for design creativity | |
CN112800344B (en) | Deep neural network-based movie recommendation method | |
CN111723285A (en) | Depth spectrum convolution collaborative filtering recommendation method based on scores | |
CN114722182A (en) | Knowledge graph-based online class recommendation method and system | |
CN105701225A (en) | Cross-media search method based on unification association supergraph protocol | |
CN114386513A (en) | Interactive grading prediction method and system integrating comment and grading | |
WO2023050232A1 (en) | Asset value evaluation method and apparatus, model training method and apparatus, and readable storage medium | |
CN115631008B (en) | Commodity recommendation method, device, equipment and medium | |
CN113343113A (en) | Cold start entity recommendation method for knowledge distillation based on graph convolution network | |
US20240037133A1 (en) | Method and apparatus for recommending cold start object, computer device, and storage medium | |
CN112256918A (en) | Short video click rate prediction method based on multi-mode dynamic routing | |
CN116610874A (en) | Cross-domain recommendation method based on knowledge graph and graph neural network | |
CN115545834A (en) | Personalized service recommendation method based on graph neural network and metadata | |
CN115840853A (en) | Course recommendation system based on knowledge graph and attention network | |
CN112784153B (en) | Tourist attraction recommendation method integrating attribute feature attention and heterogeneous type information | |
CN112818196B (en) | Data processing method, equipment, electronic device and storage medium based on electronic learning platform | |
CN108549979B (en) | Open-source software development team extension method based on precise embedded representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210903 |