CN112364242A

CN112364242A - Graph convolution recommendation system for context-aware type

Info

Publication number: CN112364242A
Application number: CN202011249269.5A
Authority: CN
Inventors: 何向南; 吴剑灿; 王翔; 陈伟健
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2020-11-10
Filing date: 2020-11-10
Publication date: 2021-02-12
Anticipated expiration: 2040-11-10
Also published as: CN112364242B

Abstract

The invention discloses a context-aware graph convolution recommendation system comprising an encoder, a graph convolution layer, and a decoder. The encoder associates a hidden space vector with each non-zero feature of the input user information, item information and context information, and combines the hidden space vectors from the three domains of user information, item information and context information. The graph convolution layer performs graph convolution operations on a pre-constructed user-item bipartite graph with attributes, in combination with the output of the encoder, and obtains the final characterizations of the user and the item through multiple graph convolution operations. The decoder predicts the user's preference for the item under the context information, based on the final characterizations of the user and the item and the associated embedding set of the context information. The system is a general recommendation framework suitable for online service; it can incorporate various kinds of auxiliary information, capture the collaborative filtering effect, and improve model performance.

Description

Graph convolution recommendation system for context-aware type

Technical Field

The invention relates to the field of recommendation systems and graph data mining, in particular to a context-aware graph convolution recommendation system.

Background

As an important tool for alleviating information overload and improving user experience, personalized recommendation systems have become an indispensable service on today's internet. The collaborative filtering model is one of the most representative recommendation models: it uses the historical interaction records between users and items, such as clicks and purchases, to map each user and item into a high-dimensional vector space, and makes personalized recommendations by computing the similarity between vectors. Recently, as Graph Neural Networks (GNNs) have achieved great success in fields such as image processing and natural language processing, more and more researchers have introduced GNNs into recommendation systems, modeling the collaborative filtering signal as the high-order connectivity of the user-item bipartite graph and thereby improving model performance. Although it provides a general solution, the collaborative filtering model has some inherent disadvantages, such as being unable to use the context information associated with an interaction: the user-item bipartite graph does not contain context information, which often has a significant impact on the user's choice. For example, in a restaurant recommendation scenario, the two factors of time and place can effectively filter out unsuitable candidates; in an e-commerce recommendation scenario, a user's purchasing tendency is often highly similar to his or her recent consumption behavior. Therefore, it is important to develop a Context-Aware Recommender System (CARS) that comprehensively considers various kinds of auxiliary information.

Existing CARS models generally follow the paradigm of the Factorization Machine (FM) and transform the problem into a standard supervised learning task. Specifically, all information related to one interaction record is encoded into a feature vector by multi-hot encoding, and different feature interaction modules are then used to model the interactions among features, so as to predict the user's preference for the item in that record. With the rise of neural networks in recent years, the feature interaction modules have been replaced by neural networks of various structures to enhance their expressive power. A comprehensive analysis of recent CARS developments shows that they suffer from the following disadvantages: 1) they adopt a standard supervised learning strategy and neglect the relations among data samples, so the learned model cannot capture the collaborative filtering effect well, because identifying the collaborative filtering effect requires considering multiple interaction records simultaneously; 2) they tend to be highly complex, because they employ carefully designed network architectures to model complex feature interactions; when serving online, a forward pass through the network is required for each user-candidate item pair, and this inefficient, time-consuming inference strategy is unsuitable for online service.

Disclosure of Invention

The invention aims to provide a context-aware graph convolution recommendation system, which is a universal recommendation system framework suitable for online service, can combine various auxiliary information, can capture a collaborative filtering effect, and improves the performance of a model.

The purpose of the invention is realized by the following technical scheme:

a context-aware graph convolution recommendation system, comprising: an encoder, a graph convolution layer, and a decoder;

the encoder associates a hidden space vector with each non-zero feature of the input user information, item information and context information, combines the hidden space vectors from the three domains of user information, item information and context information, and outputs the initial characterizations of the user and the item and the associated embedding set of the context information;

the graph convolution layer performs graph convolution operations on a pre-constructed user-item bipartite graph with attributes, in combination with the output of the encoder, and obtains the final characterizations of the user and the item through multiple graph convolution operations;

and the decoder predicts the user's preference for the item under the context information, based on the final characterizations of the user and the item and the associated embedding set of the context information.

Compared with existing neural-network-based CARS models, the technical scheme provided by the invention mainly has the following advantages: 1) Prediction accuracy is significantly improved. 2) The system has few parameters and model inference is fast. 3) The graph convolution operation effectively improves the expressive power and generalization ability of the system.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a diagram illustrating a context-aware graph convolution recommendation system according to an embodiment of the present invention;

FIG. 2 is a diagram of a data structure and its conversion into a user-item bipartite graph with attributes according to an embodiment of the present invention;

FIG. 3 is a schematic diagram illustrating, on two data sets, the effect of the number of graph convolution layers and of whether the graph convolution layers model the context, according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a context-aware graph convolution recommendation system, which is a universal recommendation system framework suitable for online service, can be combined with various auxiliary information, can capture a collaborative filtering effect, and improves the performance of a model.

As shown in FIG. 1, the context-aware graph convolution recommendation system mainly includes: an encoder, a graph convolution layer, and a decoder;

the encoder associates a hidden space vector for each non-zero feature of the input user information, article information and context information, combines the hidden space vectors from three domains of the user information, the article information and the context information, and outputs an initial characterization of the user and the article and an associated embedding (embedding) set of the context information;

For ease of understanding, the following detailed description is provided for the principles and operation of the above-described system.

I. Data structure.

In order to efficiently organize various information related to interaction, the information is divided into four types: user and their static representation, item and its static attributes, dynamic context information, and interaction records. In the embodiment of the present invention, the context information is an abstract description of a real interaction scene, and therefore, the context information may also be referred to as a context scene. The user static representation (i.e. the user information mentioned above) refers to the attributes of the user, such as the age, occupation, etc. The static attribute of the article (i.e. the article information described above) refers to the related attribute information of the article.

As shown in FIG. 2, the above information constitutes a hybrid data structure, which is more expressive than the traditional CARS data form; for example, it can describe a user interacting with the same item in different context scenarios. The hybrid data structure is further converted into a user-item bipartite graph with attributes, so that it is suitable for a graph neural network. The right side of FIG. 2 shows the user-item bipartite graph, where $\mathbf{u}$ and $\mathbf{i}$ are the high-dimensional sparse feature vectors of a user and an item, respectively, after multi-hot encoding, and $\mathbf{c}$ is the feature vector of the context information after multi-hot encoding. The bipartite graph contains two types of nodes, user nodes (drawn as rectangles) and item nodes (drawn as circles), whose features are the corresponding encoder outputs; a connecting edge between a user node and an item node represents an interaction record between them, and the feature of the edge is the context feature (namely $c_1$, $c_2$).
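The conversion from interaction records to a bipartite graph with attributes can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout, the feature strings and the function name `build_bipartite_graph` are hypothetical, and node features (the encoder outputs) are left out so that only the edge structure is shown.

```python
from collections import defaultdict

# Hypothetical interaction records: (user, item, context features).
# The same user-item pair may appear under different context scenarios.
records = [
    ("u1", "i1", ("time=evening", "city=Columbus")),
    ("u1", "i1", ("time=noon", "city=Columbus")),
    ("u2", "i1", ("time=noon", "city=Dayton")),
    ("u2", "i2", ("time=evening", "city=Dayton")),
]


def build_bipartite_graph(records):
    """Return per-node edge lists; every edge carries its context features."""
    user_edges = defaultdict(list)  # user -> [(item, context), ...]
    item_edges = defaultdict(list)  # item -> [(user, context), ...]
    for user, item, context in records:
        user_edges[user].append((item, context))
        item_edges[item].append((user, context))
    return dict(user_edges), dict(item_edges)


user_edges, item_edges = build_bipartite_graph(records)
print(user_edges["u1"])  # two edges to i1, one per context scenario
```

Keeping one edge per interaction record, rather than one edge per user-item pair, is what lets the graph describe repeated interactions under different contexts.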

II. Structure of the system model and its working process.

The structure of the system model is shown in FIG. 1; it mainly includes three parts: an encoder, a graph convolution layer and a decoder.

1. Encoder.

The input of the encoder is a multi-hot encoded sparse feature vector that has only a few non-zero elements, referred to as "non-zero features". Taking the user domain as an example, a piece of user information may include features such as user ID, age, gender and occupation. Multi-hot encoding these features yields a high-dimensional sparse feature vector, and each non-zero feature in that vector is associated with a hidden space vector, so a user is described by several hidden space vectors. The encoder then combines these hidden space vectors to obtain the initial characterization of the user. The encoder operates on the item domain in the same way. Expressed as a formula:

$$p_u^{(0)} = \frac{1}{|u|} P^{\top} \mathbf{u}, \qquad q_i^{(0)} = \frac{1}{|i|} Q^{\top} \mathbf{i}$$

In the above formula, $u$ and $i$ denote the indexes of a user and an item, and $\mathbf{u}$ and $\mathbf{i}$ denote their multi-hot encoded feature vectors; $P$ denotes the embedding matrix associated with all user features (each feature is associated with one hidden space vector, and all the hidden space vectors are stacked into one matrix whose $k$-th row is the embedding vector of the $k$-th user feature, denoted $P_k$); $Q$ denotes the embedding matrix associated with all item features (the $l$-th row of the matrix is the embedding vector of the $l$-th item feature, denoted $Q_l$); and $|u|$ and $|i|$ denote the numbers of non-zero features of user $u$ and item $i$, respectively.

It should be noted that the subsequent graph convolution layer does not update the embedding vectors of the context information, so the encoder does not pool them and only associates each original feature with its hidden space vector. Thus, the output of the encoder includes the initial characterization $p_u^{(0)}$ of user $u$, the initial characterization $q_i^{(0)}$ of item $i$, and the set $\mathcal{V}_c = \{v_s \mid s \in c\}$ of hidden space vectors associated with the context information $c$ (i.e., the aforementioned associated embedding set of the context information), where $s$ is a non-zero feature in $c$ and $v_s$ denotes the hidden space vector associated with feature $s$.

In the embodiment of the invention, for the user, the characteristics such as gender, age, occupation and the like can be included; the item may include the brand, category, price, and other features of the item, and the information may be different according to the scene. The present invention is not limited to the types of user characteristics and article characteristics, and in theory, the present invention can process any type of information.

Those skilled in the art will appreciate that the dimension of the hidden space vector is a hyperparameter of the model and can be adjusted for different scenarios.
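The encoder described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: user and item embeddings are combined by averaging over the non-zero features (consistent with the counts $|u|$ and $|i|$), the embedding tables are plain dictionaries keyed by hypothetical feature names, and the 2-dimensional values are made up for the example.

```python
def mean_pool(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def encode(user_feats, item_feats, ctx_feats, P, Q, V):
    """Return the initial user and item characterizations and the context set."""
    p_u0 = mean_pool([P[k] for k in user_feats])  # pooled user embedding
    q_i0 = mean_pool([Q[l] for l in item_feats])  # pooled item embedding
    V_c = [V[s] for s in ctx_feats]               # context: lookup only, no pooling
    return p_u0, q_i0, V_c


# Hypothetical embedding tables (hidden dimension 2 for readability).
P = {"uid=7": [1.0, 0.0], "age=20s": [0.0, 1.0], "job=student": [2.0, 2.0]}
Q = {"iid=3": [0.5, 0.5], "cat=cafe": [1.5, -0.5]}
V = {"time=noon": [0.2, 0.4], "city=Dayton": [0.6, 0.0]}

p_u0, q_i0, V_c = encode(
    ["uid=7", "age=20s", "job=student"],
    ["iid=3", "cat=cafe"],
    ["time=noon", "city=Dayton"],
    P, Q, V,
)
print(p_u0, q_i0, V_c)
```

Looking up only the non-zero features is equivalent to multiplying the embedding matrix by the multi-hot vector, but avoids materializing the sparse vector.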

2. Graph convolution layer.

The graph convolution layer addresses the shortcomings of existing CARS models based on the supervised learning strategy: it uses all interaction data of users and items to explicitly capture the collaborative filtering effect and thereby improve the characterizations of users and items. In the pre-constructed bipartite graph with attributes, the connecting edges between user nodes and item nodes carry context features, which are important for understanding context-dependent interaction patterns; more accurate user and item characterizations can therefore be learned if the context features are integrated into the graph convolution operation. Based on this, the embodiment of the present invention proposes a new graph convolution operation, which is expressed as:

$$p_u^{(l+1)} = \sum_{(i,c)\in\mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|}} \Big( q_i^{(l)} + \frac{1}{|\mathcal{V}_c|} \sum_{v_s\in\mathcal{V}_c} v_s \Big)$$

$$q_i^{(l+1)} = \sum_{(u,c)\in\mathcal{N}_i} \frac{1}{\sqrt{|\mathcal{N}_i|}} \Big( p_u^{(l)} + \frac{1}{|\mathcal{V}_c|} \sum_{v_s\in\mathcal{V}_c} v_s \Big)$$

where $\mathcal{N}_u$ and $\mathcal{N}_i$ respectively denote the sets of all observed interaction records of user $u$ and item $i$; $(i, c)$ denotes the pair formed by the item and the context information in one interaction record, and $(u, c)$ denotes the pair formed by the user and the context information in one interaction record;

$\mathcal{V}_c$ denotes the set of hidden space vectors associated with the context $c$, as output by the encoder;

$1/\sqrt{|\mathcal{N}_u|}$ and $1/\sqrt{|\mathcal{N}_i|}$ are the normalization terms;

$p_u^{(l)}$ and $q_i^{(l)}$ respectively denote the characterizations of the user and the item obtained by the $l$-th graph convolution layer;

$p_u^{(l+1)}$ and $q_i^{(l+1)}$ respectively denote the characterizations of the user and the item obtained by the $(l+1)$-th graph convolution layer.

Analysis of the rationality of the graph convolution operation: from the user's point of view, the normalization term prevents the scale of the characterization from growing unreasonably as the number of graph convolution layers increases. The characterization of the context enters the graph convolution operation as a mean and is aggregated into the user characterization, which has the following effect: if a user prefers to interact with an item in a given context scenario, the characterizations of the user, the item and the context will be similar. By stacking multiple such graph convolution layers, a user node collects information from its multi-hop neighbors (which may be user nodes or item nodes), thereby improving the user's characterization. A similar analysis holds from the perspective of the item and is not repeated here.
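One propagation step of such a context-aware graph convolution can be sketched as below. This is an illustrative sketch, not the patented implementation: it assumes the normalization term is one over the square root of the node's number of interaction records, uses hypothetical node and feature names, and stores everything in plain dictionaries.

```python
import math


def mean_pool(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def aggregate(edges, neighbor_repr, ctx_emb):
    """Aggregate (neighbor + mean context) messages for every node in `edges`.

    `edges` maps a node to its non-empty list of (neighbor, context) records.
    """
    out = {}
    for node, nbrs in edges.items():
        norm = 1.0 / math.sqrt(len(nbrs))  # assumed normalization term
        acc = None
        for nbr, ctx in nbrs:
            ctx_mean = mean_pool([ctx_emb[s] for s in ctx])
            msg = [norm * (h + m) for h, m in zip(neighbor_repr[nbr], ctx_mean)]
            acc = msg if acc is None else [a + b for a, b in zip(acc, msg)]
        out[node] = acc
    return out


def graph_conv_layer(p, q, user_edges, item_edges, ctx_emb):
    """Users aggregate from items, items aggregate from users."""
    return aggregate(user_edges, q, ctx_emb), aggregate(item_edges, p, ctx_emb)


# Hypothetical toy graph: two users who both interacted with item i1.
p = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
q = {"i1": [1.0, 1.0]}
ctx_emb = {"noon": [0.0, 2.0], "night": [2.0, 0.0]}
user_edges = {"u1": [("i1", ("noon",))], "u2": [("i1", ("night",))]}
item_edges = {"i1": [("u1", ("noon",)), ("u2", ("night",))]}

new_p, new_q = graph_conv_layer(p, q, user_edges, item_edges, ctx_emb)
print(new_p, new_q)
```

The context embeddings are read but never written, matching the statement that the graph convolution layer does not update them.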

Considering that different graph convolution layers carry different semantics, after $L$ layers of graph convolution the layer outputs are combined as follows to obtain the final characterizations of the user and the item:

$$p_u = \sum_{l=0}^{L} \alpha_l\, p_u^{(l)}, \qquad q_i = \sum_{l=0}^{L} \alpha_l\, q_i^{(l)}$$

where $\alpha_l$ denotes the weight of the $l$-th layer, and $p_u$ and $q_i$ respectively denote the final characterizations of user $u$ and item $i$.
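The layer combination is a weighted sum of the per-layer characterizations of a node. The sketch below assumes uniform weights and includes the encoder output as layer 0; both choices are assumptions made for illustration, since the text only states that each layer has a weight. The function name and the numbers are hypothetical.

```python
def combine_layers(layer_reprs, alphas):
    """Weighted sum over the per-layer characterizations of one node."""
    dim = len(layer_reprs[0])
    return [
        sum(a * h[d] for a, h in zip(alphas, layer_reprs))
        for d in range(dim)
    ]


# Hypothetical characterizations of one user at layers 0, 1 and 2.
layers = [[1.0, 0.0], [1.0, 3.0], [4.0, 6.0]]
alphas = [1.0 / len(layers)] * len(layers)  # assumed uniform layer weights

p_u = combine_layers(layers, alphas)
print(p_u)  # approximately [2.0, 3.0]
```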

3. Decoder.

The decoder uses the characterizations produced by the graph convolution layer to predict the user's preference for the item under the context information. FM can be used as the core component, because it is a linear model and is more interpretable than a multi-layer perceptron. It is formulated as follows:

$$\hat{y}_{uic} = \sum_{k=1}^{|\mathcal{V}'|}\ \sum_{t=k+1}^{|\mathcal{V}'|} v_k^{\top} v_t$$

where $\hat{y}_{uic}$ is the preference of user $u$ for item $i$ under the context information $c$, as predicted by the decoder;

$\mathcal{V}' = \{p_u, q_i\} \cup \mathcal{V}_c$, that is, $\mathcal{V}'$ comprises the user's final characterization $p_u$ and the item's final characterization $q_i$ output by the graph convolution layer, together with the associated embedding set $\mathcal{V}_c$ of the context information;

$v_k$ and $v_t$ are both elements of the embedding set $\mathcal{V}'$.

It should be noted that the decoder in the embodiment of the present invention differs slightly from the original FM: the preceding encoder and graph convolution layers have already mapped all features of a user (or item) into a single vector, which lets the system model focus on inter-domain feature interactions and reduces the interference caused by intra-domain feature interactions.
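An FM-style decoder of this kind can be sketched as the sum of inner products over all pairs of vectors in the set formed by the user characterization, the item characterization and the context embeddings. The sketch assumes no bias or first-order terms, which is an assumption for illustration. It uses the standard FM identity $\sum_{k<t} v_k^{\top} v_t = \tfrac{1}{2}\sum_d \big[(\sum_k v_{k,d})^2 - \sum_k v_{k,d}^2\big]$ to avoid the double loop; all names and values are hypothetical.

```python
def fm_pairwise(vectors):
    """Sum of inner products over all unordered pairs, in O(n * D) time."""
    dim = len(vectors[0])
    total = 0.0
    for d in range(dim):
        s = sum(v[d] for v in vectors)          # sum of the d-th components
        sq = sum(v[d] * v[d] for v in vectors)  # sum of their squares
        total += 0.5 * (s * s - sq)
    return total


def fm_pairwise_naive(vectors):
    """Reference O(n^2 * D) version, kept only to check the identity."""
    total = 0.0
    for k in range(len(vectors)):
        for t in range(k + 1, len(vectors)):
            total += sum(a * b for a, b in zip(vectors[k], vectors[t]))
    return total


def decode(p_u, q_i, V_c):
    """Predicted preference from one user vector, one item vector and context."""
    return fm_pairwise([p_u, q_i] + list(V_c))


# Hypothetical final characterizations and context embeddings.
p_u, q_i = [1.0, 2.0], [0.5, 1.0]
V_c = [[1.0, 0.0], [0.0, 1.0]]

score = decode(p_u, q_i, V_c)
print(score)  # 7.0
```

Because the user and the item each contribute a single vector, every pairwise term crosses domains except those among the context embeddings themselves.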

4. Model optimization.

To optimize the model parameters, the log loss (logarithmic loss function) commonly adopted in recommendation systems is selected as the optimization objective of the model. The formula is as follows:

$$\mathcal{L} = -\sum_{(u,i,c)\in\mathcal{O}^{+}} \log \sigma(\hat{y}_{uic}) - \sum_{(u,i,c)\in\mathcal{O}^{-}} \log\big(1 - \sigma(\hat{y}_{uic})\big) + \lambda \lVert\Theta\rVert_2^2$$

where $\mathcal{O}^{+}$ and $\mathcal{O}^{-}$ respectively denote the set of observed interaction records (i.e., the positive sample set) and the set of negative sample records randomly paired with each observed record (i.e., the negative sample set); $\sigma(\cdot)$ denotes the sigmoid activation function; $\lambda$ is the coefficient of the $L_2$ regularization; and $\Theta$ denotes the trainable parameters of the model.
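The log loss over positive and negative samples with $L_2$ regularization can be sketched as follows. This is a minimal illustration that takes already-computed decoder scores as input; the function names are hypothetical, and the softplus rewriting is a numerical-stability choice made here, not something stated in the text.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_loss(pos_scores, neg_scores, params, lam):
    """Log loss over positive and negative scores plus an L2 penalty."""
    # -log(sigmoid(y)) == softplus(-y);  -log(1 - sigmoid(y)) == softplus(y)
    data_term = sum(softplus(-y) for y in pos_scores)
    data_term += sum(softplus(y) for y in neg_scores)
    reg_term = lam * sum(w * w for w in params)
    return data_term + reg_term


# Hypothetical scores: one observed record and one sampled negative record.
uninformed = log_loss([0.0], [0.0], [], 0.0)   # equals 2 * ln(2)
confident = log_loss([5.0], [-5.0], [], 0.0)   # much smaller
print(uninformed, confident)
```

A model that scores observed records high and sampled negatives low drives the data term toward zero, while the penalty keeps the embedding values small.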

Compared with conventional neural-network-based CARS models, the system provided by the embodiment of the invention has the following advantages: 1) Prediction accuracy is significantly improved. 2) The model has few parameters and inference is fast. 3) The proposed graph convolution operation effectively improves the expressive power and generalization ability of the model.

To demonstrate the above advantages, extensive experiments were performed on three data sets. Table 1 gives the statistics of the three data sets:

Data set	Yelp-NC	Yelp-OH	Amazon-Book
Number of users	6,336	5,170	44,709
Number of items	13,003	12,997	46,831
Number of interactions	185,408	143,884	1,174,785
Number of user features	24	24	-
Number of item features	68	213	24,816
Number of context features	13,209	13,347	46,900

Table 1 Statistics of the data sets

1. Prediction accuracy is significantly improved: compared with several baseline models, the accuracy of the invention (GCM) improves by 13.1% on average over three real-world data sets. The baseline models are: 1) MF (Koren et al., IEEE Computer 2009), matrix factorization, a classical collaborative filtering model that uses only user-item interactions for characterization learning. 2) LightGCN (He et al., SIGIR 2020), a graph-based collaborative filtering model that does not take into account the additional features of users and items or the context information. 3) FM (Steffen Rendle, ICDM 2010), the factorization machine, which converts all information related to an interaction into a feature vector and models user preference with second-order feature interactions. 4) NFM (He et al., SIGIR 2017), which uses a multi-layer perceptron to capture non-linear and high-order interactions between the features of users, items and context information. 5) xDeepFM (Lian et al., SIGKDD 2018), a recently proposed neural-network-based factorization model that combines explicit and implicit high-order feature interactions. 6) GIN (Li et al., SIGIR 2019), a graph-based recommendation model that mines user intent on a constructed item similarity graph with an attention mechanism. The experimental results are shown in Table 2:

TABLE 2 results of the experiment

It can be seen that the invention outperforms the existing models on all metrics across the three data sets. The performance improvement is attributed to the following: 1) embedding propagation is carried out on the bipartite graph with attributes, extracting useful information from neighbor nodes to improve the expressive power of the system model; 2) unlike GIN, which only propagates item embeddings on the graph, the invention integrates the characterizations of users, items and contexts into the graph for information propagation, so that more unified characterizations can be learned; 3) after accurate characterizations are obtained, the invention further employs FM to explicitly model the interactions between features, which can be verified from Table 3. In Table 3, GCM-MLP denotes the variant whose decoder is an MLP, and GCM-MF denotes the variant whose decoder is a matrix factorization model.

TABLE 3 verification results of different models

2. The system model has few parameters and fast inference: all trainable parameters of GCM come from the encoder, i.e. the feature embeddings of users, items and contexts, namely $P$, $Q$ and all $v$. Assuming that the numbers of features in the three domains of user, item and context are $U$, $I$ and $C$ respectively, and that the embedding dimension is $D$, the total number of parameters is $(U + I + C) \times D$, the same as FM, the simplest embedding-based model. Thanks to the three-part architecture of GCM, once the model has been trained offline, the final characterizations of all users and items can be obtained with a single forward pass through the graph convolution layers; only the decoder then needs to run during online service, giving an online time complexity similar to that of FM.
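As a worked instance of the parameter count $(U + I + C) \times D$, the sketch below plugs in the Yelp-OH feature counts from Table 1. Two things are assumptions made only for illustration: that the table's feature counts correspond exactly to $U$, $I$ and $C$, and the embedding dimension of 64, which the text does not specify.

```python
def embedding_param_count(num_user_feats, num_item_feats, num_ctx_feats, dim):
    """Total trainable parameters when every feature owns one embedding vector."""
    return (num_user_feats + num_item_feats + num_ctx_feats) * dim


# Yelp-OH counts from Table 1; dim=64 is a hypothetical embedding dimension.
n_params = embedding_param_count(24, 213, 13347, 64)
print(n_params)  # 869376
```

The count does not depend on the number of graph convolution layers, because propagation as described introduces no trainable weights of its own.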

Table 4 gives the time consumed in serving 1000 Yelp-OH users online; it can be seen that GCM is much faster than the other neural-network-based CARS models. The model CFM (Xin et al., IJCAI 2019) in Table 4 uses the outer product to model pairwise second-order interactions between features, and then applies a convolutional neural network to extract interaction patterns.

Model	FM	GCM	GIN	xDeepFM	CFM
Time (seconds)	8.51	14.93	35.45	365.82	2354.25

Table 4 Online service time consumption of different models

3. The graph convolution operation effectively improves the expressive power and generalization ability of the model: the graph convolution layer is the core of GCM, and its rationality and effectiveness can be verified from two aspects: 1) the number of graph convolution layers; 2) the modeling of context information. As can be seen from FIG. 3, as the number of layers increases, GCM keeps collecting useful information from multi-hop neighbors and its expressive power is enhanced; meanwhile, if the modeling of context is omitted, the generalization ability of the model decreases to some extent and over-smoothing occurs easily. The model GCM-C in FIG. 3 is a variant of GCM that, unlike GCM, uses no context information in the graph convolution layer, only user and item information.

Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A graph convolution recommendation system for a context-aware type, comprising: an encoder, a graph convolution layer, and a decoder;

2. The context-aware graph convolution recommendation system according to claim 1, wherein

the encoder associates a hidden space vector with each non-zero feature of the user information, so that a single user is described by a plurality of hidden space vectors, and then combines the plurality of hidden space vectors to obtain the initial characterization $p_u^{(0)}$ of the user;

the encoder operates on the item in the same manner to obtain the initial characterization $q_i^{(0)}$ of the item;

expressed as the formula:

$$p_u^{(0)} = \frac{1}{|u|} P^{\top} \mathbf{u}, \qquad q_i^{(0)} = \frac{1}{|i|} Q^{\top} \mathbf{i}$$

in the above formula, $u$ and $i$ respectively denote the indexes of the user and the item; $P$ denotes the embedding matrix composed of the hidden space vectors associated with all user features, whose $k$-th row is the embedding vector associated with the $k$-th user feature; $Q$ denotes the embedding matrix composed of the hidden space vectors associated with all item features, whose $l$-th row is the embedding vector associated with the $l$-th item feature; $\mathbf{u}$ and $\mathbf{i}$ are the multi-hot encoded feature vectors of user $u$ and item $i$; and $|u|$ and $|i|$ respectively denote the numbers of non-zero features of user $u$ and item $i$;

for the context information $c$, only the hidden space vectors are associated, giving the associated embedding set of the context information, namely the set $\mathcal{V}_c = \{v_s \mid s \in c\}$ formed by the hidden space vectors associated with the context information $c$,

where $s$ is a non-zero feature in the context information $c$, and $v_s$ denotes the hidden space vector associated with the non-zero feature $s$.

3. The context-aware graph convolution recommendation system according to claim 1, wherein the step of constructing the user-item bipartite graph with attributes comprises:

acquiring a mixed data structure consisting of a user and a static portrait thereof, an article and a static attribute thereof, context information and an interaction record;

converting the hybrid data structure into a user-item bipartite graph with attributes; the user-article bipartite graph comprises two types of nodes, namely a user node and an article node, wherein the characteristics of the user node and the article node are the characteristics of the encoder output respectively; and the connecting edge of the user node and the article node represents the interaction records of the user node and the article node, and the characteristic of the connecting edge is a context characteristic.

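4. The context-aware graph convolution recommendation system according to claim 1, wherein the graph convolution operation in the graph convolution layer is expressed as:

$$p_u^{(l+1)} = \sum_{(i,c)\in\mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|}} \Big( q_i^{(l)} + \frac{1}{|\mathcal{V}_c|} \sum_{v_s\in\mathcal{V}_c} v_s \Big)$$

$$q_i^{(l+1)} = \sum_{(u,c)\in\mathcal{N}_i} \frac{1}{\sqrt{|\mathcal{N}_i|}} \Big( p_u^{(l)} + \frac{1}{|\mathcal{V}_c|} \sum_{v_s\in\mathcal{V}_c} v_s \Big)$$

where $\mathcal{N}_u$ and $\mathcal{N}_i$ respectively denote the sets of all observed interaction records of user $u$ and item $i$;

$\mathcal{V}_c$ denotes the set of hidden space vectors associated with the context $c$, as output by the encoder, $s$ being a non-zero feature in the context information $c$ and $v_s$ denoting the hidden space vector associated with the non-zero feature $s$;

$1/\sqrt{|\mathcal{N}_u|}$ and $1/\sqrt{|\mathcal{N}_i|}$ are the normalization terms;

$p_u^{(l+1)}$ and $q_i^{(l+1)}$ respectively denote the characterizations of the user and the item obtained by the $(l+1)$-th graph convolution layer;

after $L$ layers of graph convolution, the layer outputs are combined as follows to obtain the final characterizations of the user and the item:

$$p_u = \sum_{l=0}^{L} \alpha_l\, p_u^{(l)}, \qquad q_i = \sum_{l=0}^{L} \alpha_l\, q_i^{(l)}$$

where $\alpha_l$ denotes the weight of the $l$-th layer, and $p_u$ and $q_i$ respectively denote the final characterizations of user $u$ and item $i$.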

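5. The context-aware graph convolution recommendation system according to claim 1, wherein the decoder predicts the user's preference for the item under the context information by the following formula:

$$\hat{y}_{uic} = \sum_{k=1}^{|\mathcal{V}'|}\ \sum_{t=k+1}^{|\mathcal{V}'|} v_k^{\top} v_t$$

where $\hat{y}_{uic}$ is the predicted preference of user $u$ for item $i$ under the context information $c$;

$\mathcal{V}' = \{p_u, q_i\} \cup \mathcal{V}_c$, that is, $\mathcal{V}'$ comprises the final characterization $p_u$ of the user, the final characterization $q_i$ of the item, and the associated embedding set $\mathcal{V}_c$ of the context information;

$v_k$ and $v_t$ are both elements of the embedding set $\mathcal{V}'$.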

6. The context-aware graph convolution recommendation system according to claim 1, wherein a logarithmic loss function is selected as the optimization objective of the system model, formulated as follows:

$$\mathcal{L} = -\sum_{(u,i,c)\in\mathcal{O}^{+}} \log \sigma(\hat{y}_{uic}) - \sum_{(u,i,c)\in\mathcal{O}^{-}} \log\big(1 - \sigma(\hat{y}_{uic})\big) + \lambda \lVert\Theta\rVert_2^2$$

where $\mathcal{O}^{+}$ and $\mathcal{O}^{-}$ respectively denote the set of observed interaction records and the set of negative sample records randomly paired with each observed record, $\sigma(\cdot)$ denotes the sigmoid activation function, $\lambda$ is the coefficient of the $L_2$ regularization, and $\Theta$ denotes the trainable parameters.