CN112925977A - Recommendation method based on self-supervision graph representation learning - Google Patents

Recommendation method based on self-supervision graph representation learning

Info

Publication number
CN112925977A
Authority
CN
China
Prior art keywords
node
graph
matrix
data enhancement
embedding
Prior art date
Legal status
Pending
Application number
CN202110219147.XA
Other languages
Chinese (zh)
Inventor
何向南
吴剑灿
王翔
冯福利
陈亮
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202110219147.XA
Publication of CN112925977A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90: Details of database functions independent of the retrieved data types
    • G06F 16/95: Retrieval from the web
    • G06F 16/953: Querying, e.g. by the use of web search engines
    • G06F 16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The invention discloses a recommendation method based on self-supervised graph representation learning. The method can be adapted to all existing graph-neural-network-based recommendation models; it uses the designed data enhancement strategies to perform self-supervised learning that assists the supervised recommendation task, and constructs higher-quality vector representations for users and items, thereby providing users with higher-quality and more accurate personalized recommendation content.

Description

Recommendation method based on self-supervision graph representation learning
Technical Field
The invention relates to the technical field of recommendation systems, and in particular to a recommendation method based on self-supervised graph characterization learning.
Background
A recommendation system aims to learn high-quality representations of users and items from user-item interaction data. Collaborative filtering models such as Matrix Factorization map the ID of each user and item into a vector (ID embedding) to represent the characteristics of the user or item.
In recent years, Graph Convolution Networks (GCNs) have brought a great leap in characterization learning by providing an efficient end-to-end way to fuse the features of multi-hop neighbors into node characterizations. Inspired by graph convolution networks, characterization learning in recommendation systems has evolved from using only ID features and interaction histories to exploiting the high-order connectivity in the user-item bipartite graph, achieving a huge improvement in recommendation accuracy.
A comprehensive analysis of recent graph-neural-network-based recommendation models reveals the following defects. 1) Sparse supervision signals: most current recommendation models adopt a supervised learning paradigm whose supervision signal comes from the observed interactions between users and items; however, compared with the whole interaction space, the observed interactions are extremely sparse, so the models cannot learn high-quality characterizations. 2) Skewed data distribution: observed interactions tend to follow a power-law distribution whose long tail consists of low-frequency items lacking supervision signals; in contrast, high-frequency items appear frequently in neighbor aggregation and in the loss function and exert a larger influence on characterization learning, so the model is biased toward recommending high-frequency items at the expense of the exposure of long-tail items. 3) Noisy interaction data: since most user feedback is implicit (e.g., clicks and views) rather than explicit (e.g., ratings and likes/dislikes), the observed interactions are usually noisy (for example, a user is misled into clicking an item and only later finds it unsuitable); moreover, a GCN amplifies the effect of noisy interactions during neighbor aggregation, making the learned characterizations exceptionally sensitive to noise.
Unlike supervised learning, which relies on labeled data, self-supervised learning (SSL) can actively extract supervision signals from the data itself by transforming the observed input data, and it has been widely used in fields such as Natural Language Processing (NLP) and Computer Vision (CV). For example, BERT (Devlin et al., NAACL-HLT 2019) captures inter-token dependencies by randomly masking certain tokens in a sentence and letting the predictor recover the masked tokens; RotNet (Gidaris et al., ICLR 2018) randomly rotates labeled images and then learns on images at different rotation angles so as to still recognize the original images. However, SSL is rarely applied in the recommendation field, especially in graph-network-based recommendation models. The main difficulty is that, unlike the CV and NLP fields, the data of a recommendation system is discrete and interrelated. Therefore, research on how to introduce self-supervised graph characterization learning into recommendation systems has great value.
Disclosure of Invention
The invention aims to provide a recommendation method based on self-supervised graph characterization learning, which can improve the accuracy and robustness of graph-neural-network-based recommendation models and alleviate the long-tail problem of existing models.
The purpose of the invention is realized by the following technical scheme:
a recommendation method based on self-supervision graph characterization learning comprises the following steps:
in the training stage, a data enhancement operation is selected, and data enhancement is performed on the embedding matrix or the adjacency matrix of the user-item bipartite graph based on the selected operation, so as to generate several enhanced views of each node; positive sample pairs are constructed from different enhanced views of the same node, and negative sample pairs from enhanced views of different nodes. The positive and negative sample pairs are input into a current graph-neural-network-based recommendation model for training, with the objectives of maximizing the consistency of characterizations between the different enhanced views in a positive sample pair and the difference of characterizations between the enhanced views in a negative sample pair. The nodes comprise the user nodes and item nodes of the user-item bipartite graph;
after training is finished, the final characterization vectors of all nodes are obtained through a single forward propagation. In the inference stage, for a user to be recommended, the matching score between the user and each item is computed as the inner product of their final characterization vectors; the items are sorted in descending order of matching score, and the first K items the user has not interacted with are selected as the recommendation result, where K is the size of the recommendation list.
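The inference stage described above can be sketched as follows; the function name `recommend_top_k`, the toy 2-dimensional characterization vectors, and the example data are illustrative assumptions, not part of the patent:

```python
import numpy as np

def recommend_top_k(user_vec, item_matrix, interacted, k):
    """Score every item by the inner product with the user's final
    characterization vector, mask out already-interacted items, and
    return the indices of the top-K remaining items."""
    scores = item_matrix @ user_vec          # one matching score per item
    scores[list(interacted)] = -np.inf       # exclude items already seen
    order = np.argsort(-scores)              # sort scores descending
    return order[:k].tolist()

# toy example: 4 items with 2-dimensional characterizations;
# the user has already interacted with item 0
user_vec = np.array([1.0, 0.0])
items = np.array([[0.9, 0.1],   # item 0 (already seen)
                  [0.8, 0.2],   # item 1
                  [0.1, 0.9],   # item 2
                  [0.5, 0.5]])  # item 3
top2 = recommend_top_k(user_vec, items, {0}, 2)  # → [1, 3]
```

Item 0 has the highest raw score but is excluded as already interacted, so items 1 and 3 are returned.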
According to the technical scheme provided by the invention, all existing graph-neural-network-based recommendation models can be adapted; the designed data enhancement strategies are used for self-supervised learning that assists the supervised recommendation task, and higher-quality vector characterizations are constructed for users and items, thereby providing users with higher-quality and more accurate personalized recommendation content.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a schematic diagram of the recommendation method based on self-supervised graph characterization learning according to an embodiment of the present invention;
FIG. 2 is a graph comparing the performance of SGL-ED and LightGCN in different groups according to an embodiment of the present invention;
FIG. 3 is a graph of training curves for SGL-ED and LightGCN according to an embodiment of the present invention;
fig. 4 is a graph illustrating the relationship between the model performance and the noise interaction ratio in the training set according to the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a recommendation method based on self-supervised graph characterization learning. In the training stage, a data enhancement operation is selected, and data enhancement is performed on the embedding matrix or the adjacency matrix of the user-item bipartite graph based on the selected operation, generating several enhanced views of each node; positive sample pairs are constructed from different enhanced views of the same node, and negative sample pairs from enhanced views of different nodes. The positive and negative sample pairs are input into a current graph-neural-network-based recommendation model for training, with the objectives of maximizing the consistency of characterizations between the different enhanced views in a positive sample pair and the difference of characterizations between the enhanced views in a negative sample pair; the nodes comprise the user nodes and item nodes of the user-item bipartite graph. After training is finished, the final characterization vectors of all nodes are obtained through a single forward propagation. In the inference stage, for a user to be recommended, the matching score between the user and each item is computed as the inner product of their final characterization vectors; the items are sorted in descending order of matching score, and the first K non-interacted items are selected as the recommendation result, where K (a positive integer) is the size of the recommendation list; its specific value is set according to the actual situation or experience, and the invention is not limited to any particular value.
As can be understood by those skilled in the art, the user-item bipartite graph is an existing data structure, users and items in the bipartite graph are two types of nodes respectively, and the connection edges of the nodes represent the interaction relationship between the users and the corresponding items.
In the above scheme of the embodiment of the present invention, SSL is used to construct supervision signals from unlabeled data and to mine the internal associations of the data, so as to supplement and enhance the main supervised learning task. The scheme is named the Self-supervised Graph Learning paradigm (SGL), and it comprises two main parts: data enhancement and contrastive learning. 1) Data enhancement is responsible for generating different views of each node. Since a graph-neural-network-based recommendation model can be understood as node-embedding propagation on the bipartite graph, it can be abstractly defined by the following formula:

Z = H(E, A)

where A is the bipartite-graph adjacency matrix constructed from the node set V (consisting of all user nodes and item nodes) and the observed interaction set ε; H(·) is the graph convolution function, which encodes the connection information into the node characterizations; E ∈ R^(|V|×d) is the ID embedding matrix of all nodes on the bipartite graph; and d is the dimension of the embedding space. It can be seen that the inputs of a graph-neural-network-based recommendation model are the embedding matrix E and the adjacency matrix A. From this perspective, unlabeled data are constructed by performing data enhancement on these two inputs, and four different data enhancement operations are designed, namely ID embedding Mask (IM), ID embedding Dropout (ID), Node Dropout (ND), and Edge Dropout (ED). 2) The goal of contrastive learning is to maximize the consistency between different views of the same node and the difference between views of different nodes.
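As a concrete illustration of the abstract form Z = H(E, A), the graph convolution function can be instantiated as LightGCN-style linear propagation over the symmetrically normalized adjacency matrix. This is a sketch under the assumption of a LightGCN-like model; the patent does not fix a specific instantiation of H(·):

```python
import numpy as np

def graph_convolution(E, A, n_layers=2):
    """Minimal LightGCN-style instantiation of H(E, A): propagate the ID
    embeddings over the symmetrically normalized adjacency matrix and
    average the layer outputs (illustrative, not the patented model)."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    A_norm = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    layers = [E]
    for _ in range(n_layers):
        layers.append(A_norm @ layers[-1])   # one propagation step
    return np.mean(layers, axis=0)           # final characterizations Z

# toy bipartite graph: users {0, 1}, items {2, 3}; edges (0,2), (1,2), (1,3)
A = np.zeros((4, 4))
for u, i in [(0, 2), (1, 2), (1, 3)]:
    A[u, i] = A[i, u] = 1.0
E = np.random.default_rng(0).normal(size=(4, 8))
Z = graph_convolution(E, A)
```

With zero propagation layers the function degenerates to the ID embeddings themselves, which matches the view of GNN recommenders as embedding propagation on top of a matrix-factorization-style embedding table.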
In fig. 1, the first layer depicts the workflow of the supervised recommendation task, while the second and third layers depict the workflow of the self-supervised learning task, performing data enhancement from the two different angles of ID embedding and graph structure respectively. In the embodiment of the invention, the embedding matrix E is enhanced by a data enhancement operation and then combined with the adjacency matrix A through the graph convolution function H(·) to obtain several enhanced views; alternatively, the adjacency matrix A is enhanced and then combined with the embedding matrix E through H(·) to obtain several enhanced views. Specifically, the embedding matrix and the adjacency matrix serve as the inputs of the graph convolution function; the two operations ID embedding Mask and ID embedding Dropout act on the embedding matrix, while Node Dropout and Edge Dropout act on the adjacency matrix. Therefore, if the selected operation is ID embedding Mask or ID embedding Dropout, the adjacency-matrix input of the graph convolution function is unchanged and the embedding matrix undergoes data enhancement; similarly, if the selected operation is Node Dropout or Edge Dropout, the embedding-matrix input is unchanged and the adjacency matrix undergoes data enhancement. Finally, several enhanced views are output by the graph convolution function.
The process of generating the enhanced views is formulated below, together with the corresponding data enhancement operations, from the two perspectives of ID embedding and graph structure.
First, data enhancement of the embedding matrix.
Before neighbor aggregation is performed, the embedding of each node describes the node's intrinsic characteristics. Two enhanced views Z′ and Z″ are generated from the embedding matrix by:

Z′ = H(t′(E), A),  Z″ = H(t″(E), A)

where t′ and t″ are defined as follows:

t′(E) = M′ ⊙ E,  t″(E) = M″ ⊙ E

In the above formulas, T denotes the set of random processes for masking the embedding matrix; t′ and t″ are two independent random masking processes sampled from T (corresponding to ID embedding Mask or ID embedding Dropout); M′ and M″ are the two correspondingly generated mask matrices; and ⊙ denotes the Hadamard product.
As will be appreciated by those skilled in the art, a view can be regarded as a result of graph characterization learning; the invention modifies the intrinsic characteristics of the nodes or the connection relationships between nodes (i.e., "data enhancement") to obtain different graph characterization results (i.e., "enhanced views").
The enhanced embedding matrices t' (E) and t "(E) are obtained by either of the following two types of data enhancement operations:
1) ID embedding Mask: with probability p, a subset of the dimensions of the embedding matrix E (a subset made up of certain dimensions of the embedding space) is masked out, so that the graph-neural-network-based recommendation model uses only part of the dimensions for characterization learning. With the two independent random masking processes t′ and t″, this design can focus on different embedding dimensions and learn the intrinsic associations between different dimension subsets.
2) ID embedding Dropout: with probability ρ, a subset of the elements of each node's embedding in E is set to 0, so that the node's remaining elements are used to distinguish it from other nodes. With the two independent dropout processes t′ and t″ (a special case of the random masking process), this design encourages recovering the node embedding information from partial information, reduces the dependence of the characterization on specific elements, and improves the robustness of the model.
The principle of both types of data enhancement operations is that the graph convolution function H(·) changes its output (i.e., the enhanced views) when its inputs change. Taking ID embedding Dropout as an example: the input ID embedding matrix E is the initial ID embedding matrix, and the enhanced ID embedding matrices are t′(E) and t″(E); the two random mask matrices M′ and M″ introduced by the data enhancement operation set part of the elements of the input matrix to 0 via the Hadamard product, thereby changing the input of the function.
The corresponding t′(E) and t″(E) are obtained according to the selected data enhancement operation and substituted into the expressions of Z′ and Z″ to obtain the corresponding enhanced views.
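A minimal sketch of the two embedding-side enhancement operations, assuming NumPy arrays; the shared-dimension mask and the element-wise dropout follow the descriptions above, and the function names are illustrative:

```python
import numpy as np

def embedding_mask(E, p, rng):
    """ID embedding Mask (sketch): drop a random subset of embedding
    dimensions, shared across all nodes, each with probability p."""
    keep = rng.random(E.shape[1]) >= p           # per-dimension keep flags
    M = np.broadcast_to(keep, E.shape)           # column-structured mask M
    return M * E                                 # t(E) = M ⊙ E

def embedding_dropout(E, rho, rng):
    """ID embedding Dropout (sketch): zero individual embedding elements
    independently with probability rho."""
    M = (rng.random(E.shape) >= rho).astype(E.dtype)
    return M * E                                 # t(E) = M ⊙ E

# two independent masking processes t' and t'' give two views of E
rng = np.random.default_rng(0)
E = np.ones((5, 8))
E1, E2 = embedding_mask(E, 0.3, rng), embedding_mask(E, 0.3, rng)
```

Note the structural difference: the mask operation zeroes whole columns (dimensions) of E, while the dropout operation zeroes individual elements.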
For a single node k, its enhanced views z′_k and z″_k are the k-th rows of the enhanced views Z′ and Z″. The pair (z′_k, z″_k) is a positive sample pair, while different enhanced views of different nodes constitute negative sample pairs; for example, the enhanced view z″_l of node l and z′_k form a negative sample pair.
Generally speaking, the nodes involved in positive sample pairs are all nodes on the bipartite graph; that is, node k may be either a user node or an item node. For negative sample pairs, the node type is theoretically unrestricted, but the node types are distinguished when defining the loss function: a user node forms negative sample pairs only with other user nodes, and an item node only with other item nodes; that is, node l and node k are nodes of the same type.
Second, data enhancement of the adjacency matrix.
Since the user-item bipartite graph contains collaborative filtering signals (for example, one-hop neighbors directly represent the historical interactions of users and items, two-hop neighbors describe users with similar behavior or items with the same audience, and higher-order connectivity from a user node to an item node reflects the user's potential interest in the item), mining the inherent patterns of the graph structure benefits characterization learning. Based on this, in the embodiment of the present invention, two enhanced views Z′ and Z″ are generated from the adjacency matrix by:
Z′ = H(E, s′(A)),  Z″ = H(E, s″(A))

where S denotes the set of random processes for masking the adjacency matrix, and s′ and s″ are two independent random masking processes sampled from S (corresponding to Node Dropout or Edge Dropout).
The enhanced adjacency matrices s′(A) and s″(A) are obtained by either of the following two types of data enhancement operations:
1) Node Dropout: with probability ρ, each node on the user-item bipartite graph is discarded together with its connected edges, formulated as follows:

s′(A) = M′ ⊙ A,  s″(A) = M″ ⊙ A

where M′ and M″ are the two correspondingly generated mask matrices (the same concept as before, except that they act on the adjacency matrix instead of the embedding matrix, so the mask matrices differ in size).
The above operation enables important nodes to be identified from different enhanced perspectives, reducing the sensitivity of the learned representations to structural changes.
2) Edge Dropout: with probability ρ, each edge on the bipartite graph is discarded, formulated as follows:

s′(A) = M′ ⊙ A,  s″(A) = M″ ⊙ A

where the mask matrices M′ and M″ are now sampled edge-wise rather than node-wise.
the operation only aggregates partial neighbors to generate the node representation, aims to capture the useful mode of the node part, gives stronger robustness to the representation, and resists the influence of noise interaction on the representation.
Similarly to before, after the adjacency matrix is processed by Node Dropout or Edge Dropout, the two enhanced views Z′ and Z″ are obtained. Likewise, for a single node k, the enhanced views z′_k and z″_k form a positive sample pair, and the enhanced view z″_l of another node l forms a negative sample pair with z′_k.
In general, to reduce training complexity, at the beginning of each epoch the two mask matrices M′ and M″ are regenerated according to the selected data enhancement operation, and the embedding matrix or the adjacency matrix is enhanced accordingly, yielding the enhanced matrices t′(E) and t″(E), or s′(A) and s″(A), from which the two enhanced views Z′ and Z″ are finally obtained. Then the positive and negative sample pairs are constructed in the manner described above, and training proceeds according to the training objective.
After the different enhanced views of the nodes are constructed, the InfoNCE loss function is used as the objective function of SSL, so that the auxiliary supervision signals of positive sample pairs promote consistent characterization between different views of the same node, while those of negative sample pairs promote differentiated characterization between different nodes. Here, "auxiliary" means that, in the proposed multi-task framework, the SSL task serves as an auxiliary task, and "supervision signal" means that each constructed sample pair carries a positive or negative label.
Since the bipartite graph is a heterogeneous graph composed of two different types of nodes (user nodes and item nodes), the objective function of self-supervised learning L_ssl is constructed by defining separate self-supervised losses for the user side and the item side:

L_ssl^user = Σ_{u∈U} -log( exp(s(z′_u, z″_u)/τ) / Σ_{v∈U} exp(s(z′_u, z″_v)/τ) )

L_ssl^item = Σ_{i∈I} -log( exp(s(z′_i, z″_i)/τ) / Σ_{j∈I} exp(s(z′_i, z″_j)/τ) )

L_ssl = L_ssl^user + L_ssl^item

where U and I denote the user node set and the item node set respectively; u and v are user node indices; i and j are item node indices; s(·) is the cosine similarity function, used to measure the similarity of two enhanced views; and τ is a hyperparameter similar to the temperature coefficient in SoftMax. As before, the enhanced views z′ and z″ can be obtained by any of the four data enhancement operations, forming the four SGL variants SGL-ID, SGL-IM, SGL-ND, and SGL-ED in Table 2. Following the earlier definitions of positive and negative sample pairs, (z′_u, z″_u) and (z′_i, z″_i) in the above losses are positive sample pairs, while (z′_u, z″_v) and (z′_i, z″_j) are negative sample pairs.
Finally, a multi-task training framework is designed to jointly optimize the classical recommendation task and the SSL task:

L = L_main + λ1·L_ssl + λ2·||Θ||_2^2

where L_main, corresponding to the right side of the first layer of fig. 1, is the BPR (Bayesian Personalized Ranking) loss function; λ1 and λ2 are hyperparameters controlling the strengths of the SSL task (L_ssl) and of the L2 regularization (||Θ||_2^2) respectively; and Θ denotes the trainable parameters of the graph-neural-network-based recommendation model.
With the scheme provided by the invention, an existing graph-neural-network-based recommendation model is trained (illustratively, training can be stopped if the recommendation metric does not improve in 10 consecutive tests). After training, the final characterization vectors of all nodes are obtained with a single forward propagation, so that the recommendation model can construct higher-quality characterization vectors for users and items and provide users with higher-quality and more accurate personalized recommendation content.
Compared with existing methods, the scheme (SGL) provided by the embodiment of the invention has the following advantages: 1) significantly improved recommendation accuracy; 2) effective alleviation of the long-tail problem; 3) greatly improved training efficiency; 4) markedly enhanced noise resistance. To demonstrate these advantages, extensive experiments were performed on three real data sets, using LightGCN (He et al., SIGIR 2020) as the GCN model for SGL. Table 1 gives the statistics of the three data sets.
Data set          Users    Items   Interactions  Sparsity
Yelp2018          31,668   38,048  1,561,406     0.00130
Amazon-Book       52,643   91,599  2,984,108     0.00062
Alibaba-iFashion  300,000  81,614  1,607,813     0.00007

TABLE 1. Statistics of the data sets
1) The recommendation precision is obviously improved.
By using SSL to assist the characterization learning of the recommendation task, the invention obtains higher-quality user and item characterizations: NDCG@20 is improved by 5.7%, 19.2%, and 6.5% on the three data sets respectively.
The results of the experiment are shown in table 2.
TABLE 2 SGL Performance comparison with LightGCN
In Table 2, the metric Recall@20 denotes the recall of the first 20 recommended items, and NDCG@20 denotes the normalized discounted cumulative gain of the first 20 recommended items; the suffixes ID, IM, ND, and ED in the model names correspond to the four data enhancement operations described above. As can be seen from Table 2, all four data enhancement schemes of SGL achieve higher recommendation accuracy than LightGCN, validating the above conclusion. Among the four schemes, SGL-ED is consistently superior to the other three, though each of the others has its own advantages; in practice, the type of data enhancement operation can therefore be selected according to need or experience. In addition, for single-layer models, the performance improvement on the two sparser data sets (Amazon-Book and Alibaba-iFashion) is more pronounced, showing that SGL better mines the intrinsic characteristics of sparse data and verifying the superiority of introducing SSL into graph-neural-network-based recommendation models.
2) The long tail problem is effectively alleviated.
As mentioned above, GNN-based recommendation models suffer from a serious long-tail problem. To verify that SGL effectively alleviates it, the following experiment was performed: the item pool is first divided into 10 groups according to item popularity, with smaller group IDs indicating lower popularity, while the total number of exposures of the items within each group is the same. Then, the Recall metric is decomposed into the sum of per-group recalls, formulated as follows:
Recall^(g) = (1/m) Σ_{u=1}^{m} |R_u^(g) ∩ T_u| / |T_u|

where (g) indexes the group, m denotes the total number of users, R_u^(g) is the portion of user u's recommendation list that falls in group g, and T_u is the list of relevant items of user u in the test set.
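The group-wise decomposition of Recall can be sketched as follows; each hit is credited to the popularity group of the hit item, so the per-group values sum to the overall Recall (the function name and toy data are illustrative):

```python
def group_recall(rec_lists, test_lists, item_group, n_groups):
    """Per-group decomposition of Recall (sketch):
    Recall^(g) = (1/m) * sum_u |R_u^(g) ∩ T_u| / |T_u|,
    so that the per-group values sum to the overall Recall."""
    m = len(rec_lists)
    out = [0.0] * n_groups
    for R_u, T_u in zip(rec_lists, test_lists):
        for item in set(R_u) & set(T_u):         # hits of user u
            out[item_group[item]] += 1.0 / (m * len(T_u))
    return out

# toy example: item 0 in group 0; items 1 and 2 in group 1
item_group = {0: 0, 1: 1, 2: 1}
recalls = group_recall([[0, 1], [2, 0]], [{0, 1}, {2}], item_group, 2)
```

In the toy example both users have perfect recall, so the two group contributions add up to an overall Recall of 1.0.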
FIG. 2 shows the results of this experiment; the suffixes denote the number of graph convolution layers, and each set of bars corresponds, from left to right, to SGL-ED-L1, SGL-ED-L2, SGL-ED-L3, LightGCN-L1, LightGCN-L2, and LightGCN-L3. As can be seen from fig. 2, LightGCN prefers to recommend head items while barely exposing long-tail items: for group 10 on the three data sets, the items in that group account for only 0.83% and 0.22% of the total item pool, respectively, yet contribute 39.72%, 39.92%, and 51.92% of the recall, indicating that LightGCN has difficulty learning high-quality characterizations of long-tail items, mainly due to sparse supervision signals. For SGL, these contribution rates drop to 36.27%, 29.15%, and 35.07% respectively, indicating that SGL is relatively more likely to recommend non-head items. Meanwhile, comparing with the results in Table 2, the improvement in recommendation accuracy brought by SGL mainly comes from accurate recommendation of mid-popularity ("waist") items, which also shows that characterization learning benefits from the auxiliary supervision signals to construct better node characterizations.
3) The training efficiency is greatly improved.
SSL has proven to offer great advantages in the pre-training of natural language models and graph structures, so its impact on training efficiency is investigated here. To this end, the training curves of SGL-ED and LightGCN are shown in FIG. 3, where the upper three plots show the BPR loss versus epoch on the three data sets, and the lower three plots show the recall on the corresponding test sets. It can be seen that SGL-ED converges much faster than LightGCN on Yelp2018 and Amazon-Book: specifically, SGL reaches its best performance at epochs 18 and 16 on the two data sets, respectively, whereas LightGCN requires 720 and 700 epochs, which indicates that SGL can greatly shorten the required training time while achieving a considerable performance improvement. Although a different curve trend is observed on Alibaba-iFashion, SGL-ED still converges faster than LightGCN there, again validating the benefit of introducing SSL.
4) The noise resistance is obviously enhanced.
The robustness of SGL to noisy interactions is verified experimentally. To this end, for the Yelp2018 and Amazon-Book data sets, a certain proportion (5%, 10%, 15% and 20%, respectively) of unobserved interactions is randomly added to the training set so as to corrupt it to different degrees, and the recall on the test set is then measured. The experimental results are shown in FIG. 4, where the bars represent the recall of each model and the lines represent the rate of performance degradation. As can be seen from FIG. 4, the accuracy of both SGL and LightGCN decreases as the noise ratio increases, but the decrease of SGL is significantly smaller than that of LightGCN, and the performance gap between the two widens as the noise ratio increases. This shows that, by contrasting different augmented views of nodes, SGL can discover useful patterns, especially graph-structure information, and reduce the model's dependency on particular edges. On the Amazon-Book data set, even with 20% noise interactions added, the performance of SGL is still higher than that of LightGCN without any added noise, which further verifies the advantage of SGL.
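The corruption protocol described above, adding a fixed proportion of random unobserved interactions to the training set, can be sketched as follows; the function name and the (user, item) pair representation are assumptions made for illustration:

```python
import numpy as np

def inject_noise(train_pairs, num_users, num_items, ratio, seed=0):
    """Corrupt a training set by appending ratio * |train| random
    user-item interactions that were not observed in the data."""
    rng = np.random.default_rng(seed)
    observed = set(map(tuple, train_pairs))
    n_fake = int(len(train_pairs) * ratio)
    fake = []
    while len(fake) < n_fake:
        u = int(rng.integers(num_users))
        i = int(rng.integers(num_items))
        if (u, i) not in observed:  # keep only truly unobserved pairs
            observed.add((u, i))
            fake.append((u, i))
    return train_pairs + fake
```

Training on the corrupted set and evaluating recall on the clean test set, at each noise ratio, reproduces the comparison reported in FIG. 4.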
Through the above description of the embodiments, those skilled in the art will clearly understand that the above embodiments can be implemented by software, or by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for causing a computer device (such as a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A recommendation method based on self-supervised graph representation learning, characterized by comprising the following steps:
in a training stage, selecting a data enhancement operation, performing data enhancement on an embedding matrix or an adjacency matrix of a user-item bipartite graph based on the selected data enhancement operation to generate a plurality of enhanced views of each node, and constructing positive sample pairs from different enhanced views of the same node and negative sample pairs from enhanced views of different nodes; inputting the positive sample pairs and the negative sample pairs into a current graph neural network based recommendation model for training, wherein the training target is to maximize the consistency between the representations of the different enhanced views in each positive sample pair and the difference between the representations of the enhanced views in each negative sample pair; the nodes comprise user nodes and item nodes in the user-item bipartite graph;
after training is finished, obtaining final representation vectors of all nodes through one forward propagation; in an inference stage, for a user to be recommended, calculating matching scores between the user and each item as the inner product of their final representation vectors, sorting items in descending order of matching score, and selecting the top K items the user has not interacted with as the recommendation result, wherein K is the size of the recommendation list.
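The inference-stage scoring of claim 1 (inner-product matching, descending sort, exclusion of already-interacted items) can be sketched as follows; the function name and the dense-matrix layout are illustrative assumptions:

```python
import numpy as np

def recommend_top_k(user_vec, item_embs, interacted, k):
    """Score every item by the inner product with the user's final
    embedding, mask out already-interacted items, return top-K ids."""
    scores = item_embs @ user_vec          # matching score per item
    scores[list(interacted)] = -np.inf     # exclude seen items
    return np.argsort(-scores)[:k].tolist()
```

Because scoring is a single matrix-vector product over the frozen final embeddings, inference requires no further graph propagation.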
2. The recommendation method based on self-supervised graph representation learning according to claim 1, characterized in that the graph neural network based recommendation model performs node embedding propagation on the user-item bipartite graph, defined as:
$$Z = H(E, \hat{A})$$
wherein $\hat{A} \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{V}|}$ is the adjacency matrix constructed from the node set $\mathcal{V}$ and the observed interaction set $\mathcal{E}$, $H(\cdot)$ is a graph convolution function, and $E$ is the embedding matrix;
the plurality of enhanced views are obtained either by performing data enhancement on the embedding matrix $E$ through the data enhancement operation and then combining it with the adjacency matrix $\hat{A}$ through the graph convolution function $H(\cdot)$, or by performing data enhancement on the adjacency matrix $\hat{A}$ and then combining it with the embedding matrix $E$ through the graph convolution function $H(\cdot)$.
3. The recommendation method based on self-supervised graph representation learning according to claim 2, characterized in that the data enhancement operations comprise: ID embedding mask, ID embedding Dropout, node Dropout, and edge Dropout; the object acted on by the ID embedding mask and ID embedding Dropout operations is the embedding matrix, and the object acted on by the node Dropout and edge Dropout operations is the adjacency matrix; only one data enhancement operation is used in each training run: two mask matrices M' and M'' are generated according to the selected data enhancement operation, data enhancement is then performed on the embedding matrix or the adjacency matrix to obtain enhanced matrices, and finally two enhanced views Z' and Z'' are obtained; for a single node k, its enhanced views z'_k and z''_k are the k-th rows of the enhanced views Z' and Z'', respectively; z'_k and z''_k form a positive sample pair, while enhanced views of different nodes form negative sample pairs.
4. The recommendation method based on self-supervised graph representation learning according to claim 2 or 3, characterized in that, if the object acted on by the selected data enhancement operation is the embedding matrix, the two enhanced views $Z'$ and $Z''$ are generated from the embedding matrix by:
$$Z' = H\left(t'(E), \hat{A}\right), \qquad Z'' = H\left(t''(E), \hat{A}\right)$$
wherein $\hat{A} \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{V}|}$ is the bipartite graph adjacency matrix constructed from the node set $\mathcal{V}$ and the observed interaction set $\mathcal{E}$, $H(\cdot)$ is the graph convolution function, $E$ is the embedding matrix, $\mathcal{T}$ is the set of random processes that mask the embedding matrix, $M'$ and $M''$ are the two correspondingly generated mask matrices, $\odot$ denotes the Hadamard product, and $t'$ and $t''$ are two mutually independent random mask processes sampled from $\mathcal{T}$, defined as follows:
$$t'(E) = M' \odot E, \qquad t''(E) = M'' \odot E$$
the enhanced embedding matrices $t'(E)$ and $t''(E)$ are obtained by either of the following two types of data enhancement operations:
ID embedding mask: with probability $\rho$, a subset of the dimensions of the embedding matrix $E$ is masked out, so that the graph neural network based recommendation model uses only part of the dimensions for representation learning; this data enhancement operation uses the two independent random mask processes $t'$ and $t''$, which can focus on different embedding dimensions to learn the intrinsic associations between different dimension subsets;
ID embedding Dropout: with probability $\rho$, a subset of the elements of a node's embedding in the embedding matrix $E$ is set to 0, so that the node's remaining elements are used to distinguish the node from other nodes; this data enhancement operation uses two independent dropout random processes $t'$ and $t''$ to recover the node embedding information from partial information; the dropout random process is a special case of the random mask process.
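A minimal numpy sketch of the two embedding-side operations is given below; it assumes dense embeddings and per-view independent masks, and the function names are illustrative. Note the key difference: dimension masking shares one column mask across all nodes of a view, while Dropout masks individual entries:

```python
import numpy as np

def embedding_mask_views(E, rho, seed=0):
    """Two augmented embeddings via the ID-embedding-mask operation:
    each view independently zeroes a random subset of embedding
    *dimensions* (columns) with probability rho."""
    rng = np.random.default_rng(seed)
    d = E.shape[1]
    M1 = (rng.random(d) >= rho).astype(E.dtype)  # column mask, view 1
    M2 = (rng.random(d) >= rho).astype(E.dtype)  # column mask, view 2
    return E * M1, E * M2

def embedding_dropout_views(E, rho, seed=0):
    """Two augmented embeddings via ID-embedding-Dropout: each view
    independently zeroes individual embedding entries with prob. rho."""
    rng = np.random.default_rng(seed)
    M1 = (rng.random(E.shape) >= rho).astype(E.dtype)
    M2 = (rng.random(E.shape) >= rho).astype(E.dtype)
    return E * M1, E * M2
```

Both functions realize t'(E) = M' ⊙ E and t''(E) = M'' ⊙ E; only how the mask matrices are drawn differs.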
5. The recommendation method based on self-supervised graph representation learning according to claim 2 or 3, characterized in that, if the object acted on by the selected data enhancement operation is the adjacency matrix, the two enhanced views $Z'$ and $Z''$ are generated from the adjacency matrix by:
$$Z' = H\left(E, s'(\hat{A})\right), \qquad Z'' = H\left(E, s''(\hat{A})\right)$$
wherein $\hat{A} \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{V}|}$ is the bipartite graph adjacency matrix constructed from the node set $\mathcal{V}$ and the observed interaction set $\mathcal{E}$, $H(\cdot)$ is the graph convolution function, $E$ is the embedding matrix, $\mathcal{T}$ is the set of random processes that mask the adjacency matrix, and $s'$ and $s''$ are two mutually independent random mask processes sampled from $\mathcal{T}$;
the enhanced adjacency matrices $s'(\hat{A})$ and $s''(\hat{A})$ are obtained by either of the following two types of data enhancement operations:
node Dropout: with probability $\rho$, each node on the user-item bipartite graph is discarded together with its connected edges, which is formulated as follows:
$$s'(\hat{A}) = M' \odot \hat{A}, \qquad s''(\hat{A}) = M'' \odot \hat{A}$$
where the mask matrices zero out the rows and columns of $\hat{A}$ corresponding to the discarded nodes;
edge Dropout: with probability $\rho$, each edge on the bipartite graph is discarded, which is formulated as follows:
$$s'(\hat{A}) = M' \odot \hat{A}, \qquad s''(\hat{A}) = M'' \odot \hat{A}$$
where the mask matrices zero out the individual entries of $\hat{A}$ corresponding to the discarded edges; in both cases, $M'$ and $M''$ are the two correspondingly generated mask matrices, and $\odot$ denotes the Hadamard product.
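A minimal sketch of the adjacency-side augmentation, shown here for edge Dropout as a Hadamard mask on a dense symmetric adjacency matrix; the function name and the dense representation are illustrative assumptions (a production system would typically use sparse matrices):

```python
import numpy as np

def edge_dropout_views(A, rho, seed=0):
    """Two augmented adjacency matrices via edge Dropout: each view
    independently drops every edge (nonzero entry) with probability
    rho, keeping the mask symmetric for an undirected bipartite graph."""
    rng = np.random.default_rng(seed)
    views = []
    for _ in range(2):
        keep = rng.random(A.shape) >= rho
        # mirror the upper triangle so (u, i) and (i, u) agree
        keep = np.triu(keep) | np.triu(keep, 1).T
        views.append(A * keep)
    return views
```

Node Dropout differs only in how the mask is built: a dropped node's entire row and column are zeroed instead of individual entries.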
6. The recommendation method based on self-supervised graph representation learning according to claim 2 or 3, characterized in that the training target is expressed as:
$$\mathcal{L} = \mathcal{L}_{BPR} + \lambda_1 \mathcal{L}_{ssl} + \lambda_2 \|\Theta\|_2^2$$
$$\mathcal{L}_{ssl} = \mathcal{L}_{ssl}^{user} + \mathcal{L}_{ssl}^{item}$$
$$\mathcal{L}_{ssl}^{user} = \sum_{u\in\mathcal{U}} -\log \frac{\exp\left(s(z'_u, z''_u)/\tau\right)}{\sum_{v\in\mathcal{U}} \exp\left(s(z'_u, z''_v)/\tau\right)}$$
$$\mathcal{L}_{ssl}^{item} = \sum_{i\in\mathcal{I}} -\log \frac{\exp\left(s(z'_i, z''_i)/\tau\right)}{\sum_{j\in\mathcal{I}} \exp\left(s(z'_i, z''_j)/\tau\right)}$$
wherein $\mathcal{L}_{BPR}$ is the BPR loss function, $\lambda_1$ and $\lambda_2$ are hyper-parameters, and $\Theta$ is the set of trainable parameters of the graph neural network based recommendation model; $\mathcal{L}_{ssl}$ is the objective function of self-supervised learning, and $\mathcal{L}_{ssl}^{user}$ and $\mathcal{L}_{ssl}^{item}$ denote the self-supervised learning loss functions on the user side and the item side, respectively; $\mathcal{U}$ and $\mathcal{I}$ denote the user node set and the item node set respectively, $u$ and $v$ are user node indices, $i$ and $j$ are item node indices, $s(\cdot)$ is the cosine similarity function, $\tau$ is a temperature hyper-parameter, and $z'_*$ and $z''_*$ denote the two enhanced views of node $* \in \{u, v, i, j\}$.
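The self-supervised objective of claim 6 can be sketched per node type as follows, assuming the k-th rows of Z' and Z'' are a node's two augmented representations; this is an illustrative dense-numpy version of the InfoNCE-style loss, not the patented implementation itself:

```python
import numpy as np

def ssl_infonce_loss(Z1, Z2, tau=0.2):
    """Contrastive loss over two augmented views: row k of Z1 and
    row k of Z2 are a positive pair; all other rows of Z2 act as
    negatives. Similarity s(.,.) is cosine similarity."""
    Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
    Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
    sim = Z1 @ Z2.T / tau                    # pairwise cosine / tau
    sim -= sim.max(axis=1, keepdims=True)    # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).sum()
```

Calling this once on the user rows and once on the item rows and summing gives the two terms of the self-supervised objective; the stability shift cancels in the log-softmax, so it does not change the loss value.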
CN202110219147.XA 2021-02-26 2021-02-26 Recommendation method based on self-supervision graph representation learning Pending CN112925977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110219147.XA CN112925977A (en) 2021-02-26 2021-02-26 Recommendation method based on self-supervision graph representation learning


Publications (1)

Publication Number Publication Date
CN112925977A true CN112925977A (en) 2021-06-08

Family

ID=76172315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110219147.XA Pending CN112925977A (en) 2021-02-26 2021-02-26 Recommendation method based on self-supervision graph representation learning

Country Status (1)

Country Link
CN (1) CN112925977A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299373A (en) * 2018-10-20 2019-02-01 上海交通大学 Recommender system based on figure convolution technique
CN111611472A (en) * 2020-03-31 2020-09-01 清华大学 Binding recommendation method and system based on graph convolution neural network
CN112084407A (en) * 2020-09-08 2020-12-15 辽宁工程技术大学 Collaborative filtering recommendation method fusing graph neural network and attention mechanism
CN112232925A (en) * 2020-11-02 2021-01-15 哈尔滨工程大学 Method for carrying out personalized recommendation on commodities by fusing knowledge maps


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANCAN WU等: "Self-supervised Graph Learning for Recommendation", 《HTTPS://ARXIV.ORG/ABS/2010.10783V1》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113469289A (en) * 2021-09-01 2021-10-01 成都考拉悠然科技有限公司 Video self-supervision characterization learning method and device, computer equipment and medium
CN113469289B (en) * 2021-09-01 2022-01-25 成都考拉悠然科技有限公司 Video self-supervision characterization learning method and device, computer equipment and medium
CN113961816A (en) * 2021-11-26 2022-01-21 重庆理工大学 Graph convolution neural network session recommendation method based on structure enhancement
CN114491283A (en) * 2022-04-02 2022-05-13 浙江口碑网络技术有限公司 Object recommendation method and device and electronic equipment
CN114491283B (en) * 2022-04-02 2022-07-22 浙江口碑网络技术有限公司 Object recommendation method and device and electronic equipment
CN114897161B (en) * 2022-05-17 2023-02-07 中国信息通信研究院 Mask-based graph classification backdoor attack defense method and system, electronic equipment and storage medium
CN114897161A (en) * 2022-05-17 2022-08-12 中国信息通信研究院 Mask-based graph classification backdoor attack defense method and system, electronic equipment and storage medium
CN115329211A (en) * 2022-08-01 2022-11-11 山东省计算中心(国家超级计算济南中心) Personalized interest recommendation method based on self-supervision learning and graph neural network
CN115329211B (en) * 2022-08-01 2023-06-06 山东省计算中心(国家超级计算济南中心) Personalized interest recommendation method based on self-supervision learning and graph neural network
CN115659176A (en) * 2022-10-14 2023-01-31 湖南大学 Training method of intelligent contract vulnerability detection model and related equipment
CN116151892A (en) * 2023-04-20 2023-05-23 中国科学技术大学 Item recommendation method, system, device and storage medium
CN116151892B (en) * 2023-04-20 2023-08-29 中国科学技术大学 Item recommendation method, system, device and storage medium
CN117131282A (en) * 2023-10-26 2023-11-28 江西财经大学 Multi-view graph contrast learning recommendation method and system integrating layer attention mechanism
CN117131282B (en) * 2023-10-26 2024-01-05 江西财经大学 Multi-view graph contrast learning recommendation method and system integrating layer attention mechanism
CN117934891A (en) * 2024-03-25 2024-04-26 南京信息工程大学 Image contrast clustering method and system based on graph structure
CN117934891B (en) * 2024-03-25 2024-06-07 南京信息工程大学 Image contrast clustering method and system based on graph structure

Similar Documents

Publication Publication Date Title
CN112925977A (en) Recommendation method based on self-supervision graph representation learning
Darban et al. GHRS: Graph-based hybrid recommendation system with application to movie recommendation
Deng et al. On deep learning for trust-aware recommendations in social networks
Lake et al. Human-level concept learning through probabilistic program induction
CN109389151B (en) Knowledge graph processing method and device based on semi-supervised embedded representation model
Ni et al. A two-stage embedding model for recommendation with multimodal auxiliary information
Xiao et al. Uprec: User-aware pre-training for recommender systems
CN113869424A (en) Semi-supervised node classification method based on two-channel graph convolutional network
Ji et al. Relationship-aware contrastive learning for social recommendations
Xie et al. TPNE: topology preserving network embedding
Pham et al. Unsupervised training of Bayesian networks for data clustering
Liu et al. Neural matrix factorization recommendation for user preference prediction based on explicit and implicit feedback
Ye et al. Multi-granularity sequential three-way recommendation based on collaborative deep learning
Wu et al. Heterogeneous representation learning and matching for few-shot relation prediction
Chen et al. Approximate personalized propagation for unsupervised embedding in heterogeneous graphs
Hong et al. DSER: Deep-sequential embedding for single domain recommendation
Wen et al. Session‐Based Recommendation with GNN and Time‐Aware Memory Network
Chen et al. Poverty/investment slow distribution effect analysis based on Hopfield neural network
Tu et al. Joint implicit and explicit neural networks for question recommendation in CQA services
Li et al. Capsule neural tensor networks with multi-aspect information for Few-shot Knowledge Graph Completion
Chen et al. Gaussian mixture embedding of multiple node roles in networks
CN115310004A (en) Graph nerve collaborative filtering recommendation method fusing project time sequence relation
Yin et al. Deep collaborative filtering: a recommendation method for crowdfunding project based on the integration of deep neural network and collaborative filtering
Han et al. An effective heterogeneous information network representation learning framework
Sangeetha et al. An Enhanced Neural Graph based Collaborative Filtering with Item Knowledge Graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210608