CN113850317A - Multi-type neighbor aggregation graph convolution recommendation method and system - Google Patents

Multi-type neighbor aggregation graph convolution recommendation method and system

Info

Publication number
CN113850317A
CN113850317A (application CN202111116056.XA)
Authority
CN
China
Prior art keywords
graph convolution
network model
samples
convolution network
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111116056.XA
Other languages
Chinese (zh)
Inventor
陈建芮
扶永照
王志慧
邵仲世
雷鸣
吴迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Normal University
Original Assignee
Shaanxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi Normal University filed Critical Shaanxi Normal University
Priority to CN202111116056.XA priority Critical patent/CN113850317A/en
Publication of CN113850317A publication Critical patent/CN113850317A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/29Graphical models, e.g. Bayesian networks

Abstract

The invention provides a graph convolution recommendation method and system based on multi-type neighbor aggregation, comprising the following steps: setting a threshold value; dividing the data samples to obtain positive samples, intermediate samples and negative samples; constructing a graph convolution network model; updating the constructed graph convolution network model through loss calculation to obtain an updated graph convolution network model; optimizing the set threshold, the parameters in the obtained graph convolution network model and the parameters of the loss calculation to obtain the optimized threshold and parameters; iterating until the threshold and the parameters reach the optimum; thereby obtaining an optimal graph convolution network model and recommending items through it. Compared with a recommendation method based only on positive and negative samples, the recommendation accuracy of the method is obviously improved and better recommendations can be made for the user. The method therefore also has reference significance for other recommendation models.

Description

Multi-type neighbor aggregation graph convolution recommendation method and system
Technical Field
The invention belongs to the field of information recommendation, and particularly relates to a multi-type neighbor aggregation graph convolution recommendation method and system.
Background
Internet technology is currently developing rapidly, but the amount of information has also increased greatly, so that when facing massive amounts of information users cannot find what is actually useful to them, and the effective utilization of information decreases. With the advent of recommendation systems, this problem has been effectively alleviated. A recommendation system can recommend information, products and the like that the user is interested in according to the user's needs and interests, thereby saving the user time.
Disclosure of Invention
The invention aims to provide a method and a system for recommending a multi-type neighbor aggregation graph convolution, which overcome the defect of poor recommendation effect of a recommendation system in the prior art.
In order to achieve the purpose, the invention adopts the technical scheme that:
the invention provides a graph convolution recommendation method based on multi-type neighbor aggregation, which comprises the following steps of:
step 1, setting a threshold value;
step 2, dividing the data samples in the training set and the test set according to the threshold set in step 1, so that both sample sets consist of positive samples, intermediate samples and negative samples;
step 3, constructing a graph convolution network model according to the positive samples and the intermediate samples obtained in the step 2;
step 4, updating the graph convolution network model constructed in the step 3 through loss calculation to obtain an updated graph convolution network model;
step 5, recommending items according to the updated graph convolution network model to obtain recommendation indexes;
step 6, iteratively executing the step 4 and the step 5 until the output recommendation index tends to be stable;
step 7, optimizing the threshold value set in the step 1, the parameters in the graph convolution network model obtained in the step 3 and the parameters of loss calculation in the step 4 according to the final recommendation index obtained in the step 6 to obtain the optimized threshold value and each parameter;
step 8, iteratively executing step 2 to step 7 until the threshold and each parameter in step 7 reach the optimum; thereby obtaining an optimal graph convolution network model, and recommending items through the optimal graph convolution network model.
Preferably, in step 1, the specific method for setting the threshold value is as follows:
an initial threshold is set according to the number of interactions between the user and the item.
Preferably, in step 2, the data samples are divided according to a set threshold, and the specific method is as follows:
classifying the training set and the test set according to the set threshold, wherein data whose interaction count is greater than the set threshold serve as positive samples of the user, data whose interaction count lies between 0 and the set threshold serve as intermediate samples, and the remaining data serve as negative samples; the number of positive samples in the training set and the test set accounts for 85%-95% of the total number of positive and intermediate samples.
Preferably, in step 3, a graph convolution network model is constructed according to the positive samples and the intermediate samples obtained in step 1, and the specific method is as follows:
S21, combining the data of the positive samples and the intermediate samples in the training set of step 1, respectively, to obtain an adjacency matrix A1 of the positive samples and an adjacency matrix A2 of the intermediate samples;
S22, obtaining the transfer function of the convolution layers in the graph convolution network model according to the adjacency matrix A1 of the positive samples and the adjacency matrix A2 of the intermediate samples obtained in S21;
S23, randomly generating an initial embedding matrix, and obtaining the embedding matrix of each convolution layer by combining the transfer function obtained in S22;
S24, obtaining the final embedding matrix of the graph convolution network model according to the embedding matrices obtained in S23, thereby obtaining the graph convolution network model.
Preferably, in step 4, the graph convolution network model constructed in step 3 is updated through loss calculation to obtain an updated graph convolution network model, and the specific method is as follows:
the loss values for all users in the training set are calculated using the following equation:
Loss = Loss1 + Loss2 + λ||E(0)||^2
wherein Loss is the loss value of all users in the training set; Loss1 is the loss value of all users in the training set that contain intermediate samples; Loss2 is the loss value of all users in the training set without intermediate samples; λ is a coefficient; E(0) is the randomly generated initial embedding matrix; ||E(0)||^2 is the two-norm of the initial embedding matrix E(0), used as a regularization term in the expression to prevent over-fitting.
Updating the obtained final embedding matrix by combining back propagation and gradient descent according to the obtained loss values of all users in the training set; taking the updated final embedding matrix as the randomly generated initial embedding matrix of the next epoch;
obtaining the final embedding matrix of the graph convolution network model according to the randomly generated initial embedding matrix, thereby obtaining the updated graph convolution network model.
Preferably, in step 5, item recommendation is performed according to the updated graph convolution network model to obtain a recommendation index, and the specific method is as follows:
calculating a rating value between each user and each corresponding item; obtaining a rating table of a project corresponding to each user;
in descending order of rating value, the items corresponding to the top 20 rating values are taken from the rating table, and these 20 items serve as the recommended item set of the user;
taking the positive sample set in the test set as the TestTrue set;
and respectively calculating the recall, precision and ndcg recommendation indexes according to the recommended item set of the user and the TestTrue set.
Preferably, the rating value between each user and each corresponding item is calculated by:
y_ui = e_u · e_i^T
wherein y_ui represents the preference degree of user u for item i; e_u is the embedding vector of user u after the multi-layer convolution layers; e_i^T is the transpose of the embedding vector of item i after the multi-layer convolution layers.
A graph convolution recommendation system based on multi-type neighbor aggregation, the system being capable of executing the method, comprising:
a threshold setting unit for setting a threshold;
the sample dividing unit is used for dividing the data samples in the training set and the test set according to the set threshold, so that both sample sets consist of positive samples, intermediate samples and negative samples;
the model building unit is used for building a graph convolution network model according to the obtained positive samples and the intermediate samples;
the model updating unit is used for updating the constructed graph convolution network model through loss calculation to obtain an updated graph convolution network model;
the item recommendation unit is used for recommending items according to the updated graph convolution network model to obtain recommendation indexes;
the iteration unit is used for performing iteration until the output recommendation index tends to be stable;
the parameter optimization unit is used for optimizing the set threshold, the obtained parameters in the graph convolution network model and the parameters of loss calculation according to the obtained final recommendation index to obtain the optimized threshold and each parameter;
the model optimization unit is used for iterating until the threshold and each parameter reach the optimum, thereby obtaining an optimal graph convolution network model and recommending items through the optimal graph convolution network model.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a graph convolution recommending method based on multi-type neighbor aggregation, which divides a sample into a positive sample, a middle sample and a negative sample by setting conditions, wherein the division of the three samples is an indispensable step for performing loss calculation subsequently; in the neighbor aggregation part, many neighbor information aggregated by the recommendation system model comes from the connection information contained in the positive sample, but the method makes some adjustment on the aggregation of the neighbor information; in the three types of samples, because the connection information contained in the positive sample and the middle sample has a positive effect on neighbor aggregation, but the influence of the positive sample and the middle sample on the neighbor aggregation is different, when the neighbor information is aggregated, the method endows the connection information contained in the positive sample and the middle sample with different weights, and then continuously transmits the aggregated neighbor information downwards through the convolutional layer; when calculating the loss, the loss calculation is performed based on the divided three types of samples corresponding to the network configuration. Compared with a recommendation method based on positive and negative samples, the recommendation accuracy of the method is obviously improved, and recommendation can be better performed for users. Therefore, the method has certain reference significance for other recommendation models.
Drawings
FIG. 1 is an overall flow diagram of the present method;
fig. 2 is a schematic diagram of neighbor aggregation and propagation in embodiment 1.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1 and fig. 2, the graph convolution recommendation method based on multi-type neighbor aggregation provided by the present invention includes the following steps:
Step 1, dividing the data set into a training set and a test set, setting a threshold according to the interaction counts, and classifying the training set and the test set according to the threshold to obtain positive samples, intermediate samples and negative samples; wherein the positive samples account for 85%-95% of the positive and intermediate samples in both the training set and the test set.
The data set includes a plurality of pieces of data, each piece of data including a user number, an item number, and a number of interactions.
Many data sets contain multiple types of data, for example data sets containing users, items and interaction counts, or data sets containing users, items and scores. However, in many models, data such as interaction counts and scores are not considered in the neighbor aggregation part: as long as a user has any interaction record with an item, that interaction information is aggregated, the interacted items are taken as positive samples, and all other items are taken as negative samples of the user. Yet some interactions are few in number, and such sparse interactions do not necessarily reflect the user's preference for the item. When such a data set is used, dividing the samples only into positive and negative samples for the subsequent calculation cannot achieve the optimal recommendation effect. We therefore set a threshold on the interaction count and use it to classify the original data set. Through this classification, on the basis of the original positive and negative samples, the samples are divided into three types: positive samples, intermediate samples and negative samples. The sample division process is as follows:
S11, setting an initial threshold ε according to the interaction count between users and items, and classifying the training set and the test set according to the initial threshold ε to obtain positive samples, intermediate samples and negative samples; wherein the number of positive samples in the training set and the test set accounts for 85%-95% of the total number of positive and intermediate samples;
taking data whose interaction count is greater than the initial threshold ε as positive samples of the user, data whose interaction count lies between 0 and the initial threshold ε as intermediate samples, and the rest as negative samples;
S12, setting a threshold range: the value of the threshold ε needs to be adjusted continuously according to the recommendation indexes obtained later; upper and lower limits are preset, and the value of ε is adjusted within these limits by a set step size, while still ensuring that the positive samples account for 85%-95% of the positive and intermediate samples in the training set and the test set, as sketched below.
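The threshold-based three-way split of S11-S12 can be written out as a minimal sketch; it assumes the interaction records are (user, item, count) triples, and the function and variable names (split_samples, epsilon) are illustrative rather than taken from the patent.

```python
def split_samples(interactions, epsilon):
    """Three-way split of (user, item, count) records by the threshold epsilon:
    count > epsilon       -> positive sample
    0 < count <= epsilon  -> intermediate sample
    items a user never interacted with are implicitly that user's negative samples."""
    positive, intermediate = [], []
    for user, item, count in interactions:
        if count > epsilon:
            positive.append((user, item))
        elif count > 0:
            intermediate.append((user, item))
    return positive, intermediate

# toy usage with (user, item, interaction count) triples
records = [(0, 10, 40), (0, 11, 3), (1, 10, 36), (1, 12, 50)]
pos, mid = split_samples(records, epsilon=35)
print(len(pos), len(mid))   # 3 1
```

The threshold ε itself is then swept over the preset range (for example in steps of 5, as in Example 1) and the value giving the best recommendation indexes is kept.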
Step 2, constructing and obtaining a graph convolution network model
S21, combining the data of the positive samples and the intermediate samples in the training set of step 1, respectively, to obtain an adjacency matrix A1 of the positive samples and an adjacency matrix A2 of the intermediate samples;
S22, obtaining the transfer function of the convolution layers in the graph convolution network model according to the adjacency matrix A1 of the positive samples and the adjacency matrix A2 of the intermediate samples obtained in S21;
S23, obtaining the embedding matrix of each convolution layer according to the transfer function obtained in S22;
S24, obtaining the final embedding matrix of the graph convolution network model according to the embedding matrices obtained in S23, thereby obtaining the graph convolution network model.
Specifically, in S22: since the positive samples and the intermediate samples play different roles in neighbor aggregation, different weights α and β are assigned to the aggregated positive-sample information and intermediate-sample information respectively, and the transfer function is:
E^(k+1) = α·D1^(-1/2) A1 D1^(-1/2) E^(k) + β·D2^(-1/2) A2 D2^(-1/2) E^(k)
wherein A1 is the adjacency matrix of the positive samples, indicating that the aggregated neighbor information comes from the positive samples; D1 is the degree matrix of A1; A2 is the adjacency matrix of the intermediate samples, indicating that the aggregated neighbor information comes from the intermediate samples; D2 is the degree matrix of A2; E^(k) denotes the embedding matrix of the k-th layer; E^(k+1) denotes the embedding matrix of the (k+1)-th layer. α and β are normalized, given initial values, and adjusted continuously by a set step size until the parameters corresponding to the optimal recommendation result are found.
In S23, the number of rows of the embedding matrix equals the number of users plus the number of items, and the number of columns is the embedding dimension; the initial embedding matrix E(0) is generated randomly. Through the propagation rule of the convolution layers, the embedding matrix of the next layer can be obtained from the embedding matrix of the previous layer.
In S24, after obtaining the embedding matrix of each layer through the propagation rule, the embedding matrix of each layer needs to be weighted and summed to obtain a final embedding matrix, where the expression is:
E = α0·E(0) + α1·E(1) + α2·E(2) + ... + αK·E(K)    (8)
wherein αk denotes the weight corresponding to the embedding matrix of each layer; since α0 + α1 + ... + αK must equal one, and the embedding matrices E(0) to E(K) are considered equally important, α0 to αK are all set equal, each taking the value 1/(K+1).
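A minimal sketch of the layer propagation of S22-S23 and the equal-weight summation of S24 follows. The symmetric normalization D^(-1/2) A D^(-1/2) of a bipartite adjacency built from the interaction matrix is an assumption (the patent only names adjacency and degree matrices), and the helper names (norm_adj, propagate) are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def norm_adj(R):
    """Build the bipartite adjacency A = [[0, R], [R^T, 0]] from a user-item
    interaction matrix R and return the normalized D^(-1/2) A D^(-1/2)."""
    A = sp.bmat([[None, R], [R.T, None]], format="csr")
    deg = np.asarray(A.sum(axis=1)).ravel()
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.power(deg, -0.5)
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0
    D = sp.diags(d_inv_sqrt)
    return D @ A @ D

def propagate(E0, A1_hat, A2_hat, alpha, beta, n_layers=3):
    """E^(k+1) = alpha * A1_hat @ E^(k) + beta * A2_hat @ E^(k), followed by
    the equal-weight (1/(K+1)) sum of the K+1 layer embeddings."""
    layers = [E0]
    for _ in range(n_layers):
        layers.append(alpha * (A1_hat @ layers[-1]) + beta * (A2_hat @ layers[-1]))
    return sum(layers) / len(layers)

# usage sketch: R1 / R2 are sparse 0/1 matrices built from positive / intermediate samples
# A1_hat, A2_hat = norm_adj(R1), norm_adj(R2)
# E = propagate(np.random.randn(m + n, 64), A1_hat, A2_hat, alpha=0.9, beta=0.1)
```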
Step 3, calculating to obtain three recommendation indexes according to the constructed graph convolution network model, wherein the three recommendation indexes are recall, precision and ndcg respectively; the specific method comprises the following steps:
First, between a user and an item the following is defined: for a user and an item, the dot product of their vectors after all convolution layers gives a rating value, which can be regarded as the user's preference degree for the item. The rating value is abbreviated as y, and the preference degrees of the m users for the n items are calculated as:
y_ui = e_u · e_i^T
wherein y_ui represents the preference degree of user u for item i; e_u is the embedding vector of user u after the multi-layer convolution layers; e_i^T is the transpose of the embedding vector of item i after the multi-layer convolution layers.
According to the above formula, the preference degree of m users to n items can be obtained, that is, the rating table is obtained, and the size of the rating table is m × n.
Then, for each user, the items whose rating values rank in the top 20 are selected from the rating table as that user's recommended item subset; it should be noted that if a recommended item of a user already appears in the training set, the item is removed from the user's recommendation list and a new recommended item is supplemented;
then, the recommended item subsets of the m users are combined to obtain the user recommended item set, recorded as the Rating set;
and then, the positive sample set of the users in the test set is recorded as the TestTrue set, and the three recommendation indexes recall, precision and ndcg are calculated from the Rating set and the TestTrue set.
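A compact sketch of this scoring and evaluation step is given below: the rating table y = E_user · E_item^T, the top-20 cut after masking items already seen in training, and recall@20 / precision@20 / ndcg@20 against the test positives. The metric formulas follow the usual top-k definitions, and the names (evaluate, train_pos, test_true) are illustrative, not taken from the patent.

```python
import numpy as np

def evaluate(E, n_users, train_pos, test_true, k=20):
    """E: (n_users + n_items) x d final embedding matrix.
    train_pos / test_true: dict user -> set of item indices.
    Returns mean recall@k, precision@k and ndcg@k over users with test positives."""
    users, items = E[:n_users], E[n_users:]
    ratings = users @ items.T                                # m x n rating table y_ui
    recalls, precisions, ndcgs = [], [], []
    for u in range(n_users):
        truth = test_true.get(u, set())
        if not truth:
            continue
        scores = ratings[u].copy()
        scores[list(train_pos.get(u, set()))] = -np.inf      # drop already-seen items
        topk = np.argpartition(-scores, k)[:k]
        topk = topk[np.argsort(-scores[topk])]               # top-k items, best first
        hits = np.array([1.0 if i in truth else 0.0 for i in topk])
        recalls.append(hits.sum() / len(truth))
        precisions.append(hits.sum() / k)
        dcg = (hits / np.log2(np.arange(2, k + 2))).sum()
        idcg = (1.0 / np.log2(np.arange(2, min(len(truth), k) + 2))).sum()
        ndcgs.append(dcg / idcg)
    return float(np.mean(recalls)), float(np.mean(precisions)), float(np.mean(ndcgs))
```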
Step 4, calculating loss value in training set
According to step 1, the samples in the training set are divided into three types, namely positive samples, intermediate samples and negative samples, so when calculating the loss we need to consider how to compute it from the differences among the three types of samples.
Since the difference between the positive samples and the intermediate samples is small compared with that to the negative samples, the loss calculation mainly considers the difference between positive and negative samples and the difference between intermediate and negative samples, measured by the difference of their rating values. Meanwhile, it must be judged whether a user has intermediate samples: intermediate samples are special, and a user may have none, so the loss function needs to be calculated piecewise, specifically:
1) For the users that contain intermediate samples, the user set is denoted U1, N_u1 is the positive sample set of such a user, M_u1 is the intermediate sample set, and L_u1 is the negative sample set; the loss for these users is calculated as:
Loss1 = Σ_{u∈U1} [ −ω Σ_{i∈N_u1} Σ_{j∈L_u1} ln σ(y_ui − y_uj) − γ Σ_{m∈M_u1} Σ_{j∈L_u1} ln σ(y_um − y_uj) ]    (10)
wherein σ is an activation function, and ω and γ are weights with ω + γ = 1.
The first term, −ln σ(y_ui − y_uj), is the loss calculated from the user's positive and negative samples; the goal of the loss calculation is to enlarge the difference between positive and negative samples and between intermediate and negative samples. Since y_ui denotes the preference degree of user u for item i, the larger the value of y_ui − y_uj, the more obvious the difference between item i and item j, and the smaller the loss value of this part.
The second term, −ln σ(y_um − y_uj), has the same form but is the loss calculated from the user's intermediate and negative samples. The proportion between the two parts (i.e. ω and γ) cannot be judged directly, and the optimal values need to be found through repeated experiments.
2) For the users without intermediate samples, the user set is denoted U2, the positive sample set of such a user is N_u2, and the negative sample set is L_u2; the loss for these users is calculated as:
Loss2 = −Σ_{u∈U2} Σ_{i∈N_u2} Σ_{j∈L_u2} ln σ(y_ui − y_uj)    (11)
wherein σ is the activation function. Since the users in U2 have no intermediate samples, the difference between intermediate and negative samples does not need to be considered; the loss function is set only through the difference between positive and negative samples, on the same principle as the positive-negative part of equation (10).
In addition, to prevent over-fitting we should add a two-norm regularization term with coefficient λ to the overall loss, so the final loss is:
Loss = Loss1 + Loss2 + λ||E(0)||^2    (12)
wherein Loss is the loss value of all users in the training set; Loss1 is the loss value of all users in the training set that contain intermediate samples; Loss2 is the loss value of all users in the training set without intermediate samples; λ is a coefficient; E(0) is the randomly generated initial embedding matrix; ||E(0)||^2 is the two-norm of the initial embedding matrix E(0), used as a regularization term in expression (12) to prevent over-fitting.
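The piecewise loss can be sketched as below. Because the patent's equation images are not reproduced here, the two parts are written as BPR-style pairwise terms −ln σ(y_ui − y_uj), which matches the description of σ and of the loss shrinking as y_ui − y_uj grows; this pairwise form, the sampling of one item of each type per user, and the names (segmented_loss, omega, gamma) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def segmented_loss(ratings, pos, mid, neg, E0, omega=0.9, gamma=0.1, lam=6e-4):
    """Loss = Loss1 (users with intermediate samples) + Loss2 (users without)
    + lam * ||E0||^2, each part being a pairwise -ln sigma(y_ui - y_uj) term.
    pos / mid / neg: dict user -> list of item indices; ratings: m x n table."""
    loss1 = loss2 = 0.0
    for u, pos_items in pos.items():
        i = np.random.choice(pos_items)           # one sampled positive item
        j = np.random.choice(neg[u])              # one sampled negative item
        pos_term = -np.log(sigmoid(ratings[u, i] - ratings[u, j]))
        if mid.get(u):                            # user has intermediate samples
            m = np.random.choice(mid[u])
            mid_term = -np.log(sigmoid(ratings[u, m] - ratings[u, j]))
            loss1 += omega * pos_term + gamma * mid_term
        else:
            loss2 += pos_term
    return loss1 + loss2 + lam * float(np.sum(E0 ** 2))
```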
Step 5, updating the final embedding matrix of step 2 by combining back propagation and gradient descent according to the final loss value, to obtain an updated embedding matrix; using the updated embedding matrix as the initial embedding matrix E(0) of the next epoch, and iteratively executing steps 2 to 4 until the three recommendation indexes output in step 3 tend to be stable;
The number of epochs is set according to the loss curve and should be chosen in the region where the loss curve has become stable. If the set number of epochs is M, then each parameter setting requires M epochs of training.
Step 6, taking the three final recommendation indexes of step 5 as the baseline, updating the threshold ε and the weights α, β, ω and γ within their respective preset ranges; iteratively executing steps 1 to 5 and finding the optimal parameters through the recommendation indexes.
The three final recommendation indexes of step 5 are obtained by taking the maximum values among all the indexes after they have stabilized, and are used to measure the recommendation effect of the current model.
The threshold ε used in step (1) to divide the positive, intermediate and negative samples is adjusted continuously through experiments.
The neighbor information aggregated in step (2) of the present invention is derived from the connection information contained in the positive sample and the intermediate sample, and the proportion of the two samples is different.
According to the method, the loss function calculation of the model in the step (3) corresponds to the network structure, and the recommendation effect can be obviously improved.
In the present method, on the basis of the graph convolution network, the samples are divided by set conditions into positive samples, intermediate samples and negative samples, which, compared with the traditional positive-negative division, better reflects the differences between samples and thus the degree of the user's preference for items. In addition, during aggregation and propagation, different weights are given to the connection information contained in the positive samples and in the intermediate samples, making the network structure of the whole method clearer and more reasonable. In the subsequent loss calculation, the loss is again computed in correspondence with the network structure, so the overall structure is more complete. Through these three steps, the recommendation effect of the method is obviously improved compared with traditional methods, and it has reference significance for other recommendation algorithms.
Example 1
The lastfm data set is used, containing 1892 users and 4489 items. The training set has 42135 interaction records and the test set has 10533. Both sets contain three columns of data: the first column is the user, the second the item, and the third the interaction count. The convolution network in the whole method has 3 layers, and the embedding dimension of the vectors is 64.
(1) Data sample partitioning
An initial threshold of 35 is set based on the interaction counts in the training set; with this threshold, the ratio of positive samples to intermediate samples in the divided training set and test set is about 9:1. When the samples are classified, items whose interaction count in the training set or test set is greater than 35 are taken as positive samples of the user, items whose interaction count lies between 0 and 35 as intermediate samples, and the samples obtained by removing the positive and intermediate samples from all items as negative samples.
2) The threshold ε is then adjusted continuously according to the recommendation indexes, while keeping the ratio of positive to intermediate samples in the training set and the test set near 9:1; the downward adjustment range is 25-35, the upward adjustment range is 35-50, and the adjustment step is 5. The optimal threshold is found through continuous adjustment.
(2) Multi-type neighbor aggregation and propagation
According to (1), with the current ε of 35, the method divides the samples into three types: positive, intermediate and negative. At this point there are 38358 positive sample records and 9555 intermediate sample records in the training set. In the neighbor aggregation process, experimental comparison shows that aggregating the neighbor information of negative samples has a negative influence on the results; therefore, aggregation of the positive-sample and intermediate-sample neighbor information is taken as the main part. Considering that positive samples and intermediate samples play different roles in neighbor aggregation, different weights α and β are given to the aggregated positive-sample information and intermediate-sample information. The aggregated information is passed between the convolution layers, and the propagation rule is:
E^(k+1) = α·D1^(-1/2) A1 D1^(-1/2) E^(k) + β·D2^(-1/2) A2 D2^(-1/2) E^(k)
wherein A1 and A2 are both 1892 × 4489 matrices, and D1 and D2 are both of size 1892 × 1. A1 is the connection matrix of the positive samples, indicating that the aggregated neighbor information comes from the positive samples, and D1 is the degree matrix of A1; A2 is the connection matrix of the intermediate samples, indicating that the aggregated neighbor information comes from the intermediate samples, and D2 is the degree matrix of A2. E^(k) is the embedding matrix of the k-th layer and E^(k+1) is the embedding matrix of the (k+1)-th layer. In the embedding matrix of users and items, the rows correspond to users plus items and the columns to the embedding dimension; the initial embedding matrix E(0) is randomly generated as a 6381 × 64 matrix. By the propagation rule of the convolution layers, the embedding matrix of the next layer is obtained from that of the previous layer. The parameters α and β are normalized; α takes values between 0 and 1, β changes accordingly, and both are adjusted in steps of 0.1. Currently α is 0.9 and β is 0.1.
After obtaining the embedding matrix of each layer through the propagation rule, weighted summation is required to be performed on the embedding matrix of each layer, so as to obtain a final embedding matrix, where the expression is:
E = α0·E(0) + α1·E(1) + α2·E(2) + ... + αK·E(K)    (14)
wherein α0 ... αK are the weights corresponding to the embedding matrices of each layer; since the number of network layers in this example is 3, α0 to αK are all 1/4.
(3) Making recommendations
Between a user and an item we define: for a user and an item, the dot product of their vectors yields a rating value, abbreviated as y, with the formula:
y_ui = e_u · e_i^T
wherein e_u and e_i are the embedding vectors of user u and item i after the multi-layer convolution layers. A rating table of each user with respect to each item is thus obtained, of size 1892 × 4489. The items whose rating values rank in the top 20 for each user are taken from the rating table and recorded as the Rating set, which is the recommendation set; at the same time, the positive sample set of each user in the test set is recorded as the TestTrue set, and the three recommendation indexes recall, precision and ndcg are calculated from the two sets.
(4) Calculation of losses
E(0), the initial embedding matrix, is not constant: it changes in every training and testing cycle. The loss is calculated in order to continuously optimize and train the initial embedding matrix E(0) and thus further improve the recommendation effect. The subsequent loss calculations are performed according to the pairwise loss between rating values, −ln σ(y_ui − y_uj).
According to (1) and (2), since the samples in the training set are divided into three types, namely positive samples, intermediate samples and negative samples, we need to consider how to calculate the loss from the three types of samples. Since the difference between positive and intermediate samples is small compared with that to negative samples, the loss calculation mainly considers the difference between the ratings of positive and negative samples and between the ratings of intermediate and negative samples. Meanwhile, it must be judged whether a user has intermediate samples; since intermediate samples are special, a user may have none, so the loss function is calculated piecewise according to this condition, as follows:
1) For the users that contain intermediate samples, the user set is denoted U1, N_u1 is the positive sample set of such a user, M_u1 the intermediate sample set, and L_u1 the negative sample set; the loss for these users is calculated as:
Loss1 = Σ_{u∈U1} [ −ω Σ_{i∈N_u1} Σ_{j∈L_u1} ln σ(y_ui − y_uj) − γ Σ_{m∈M_u1} Σ_{j∈L_u1} ln σ(y_um − y_uj) ]
2) For the users without intermediate samples, the user set is denoted U2, the positive sample set of such a user is N_u2, and the negative sample set is L_u2; the loss for these users is calculated as:
Loss2 = −Σ_{u∈U2} Σ_{i∈N_u2} Σ_{j∈L_u2} ln σ(y_ui − y_uj)
In addition to the loss calculated from this basis, we should also add a two-norm to the overall loss to prevent overfitting, with coefficient λ, so the final loss function is:
Loss = Loss1 + Loss2 + λ||E(0)||^2    (18)
wherein λ is 6 × 10^-4. The loss is calculated according to the loss function set above, and the initial embedding matrix E(0) is continuously updated and optimized in each test-and-training cycle; one test-and-training cycle is taken as one epoch, and the whole method runs for 1000 epochs, within which the optimal result is obtained. Over these 1000 epochs, the obtained recommendation indexes are a recall of 0.2836, an ndcg of 0.2183 and a precision of 0.0756.
In the above, the threshold, the parameters of the aggregation-propagation process and the parameters of the loss calculation all need to be adjusted continuously to find their optimal values. Each adjusted value is run for 1000 epochs, and the best three indexes over those 1000 epochs are computed to judge whether the adjusted value is optimal. Through repeated experiments, the optimal threshold is 45, the optimal α and β in the aggregation-propagation are 1 and 0 respectively, and the optimal ω and γ in the loss calculation are 0.9 and 0.1 respectively. With these settings, the calculated recommendation indexes are a recall of 0.2932, an ndcg of 0.2226 and a precision of 0.0759.

Claims (8)

1. A graph convolution recommendation method based on multi-type neighbor aggregation is characterized by comprising the following steps:
step 1, setting a threshold value;
step 2, dividing the data samples in the training set and the test set according to the threshold set in step 1, so that both sample sets comprise positive samples, intermediate samples and negative samples;
step 3, constructing a graph convolution network model according to the positive samples and the intermediate samples obtained from the training set in the step 2;
step 4, updating the graph convolution network model constructed in the step 3 through loss calculation to obtain an updated graph convolution network model;
step 5, recommending items according to the updated graph convolution network model, and combining the test set to obtain a recommendation index;
step 6, iteratively executing the step 4 and the step 5 until the output recommendation index tends to be stable;
step 7, optimizing the threshold value set in the step 1, the parameters in the graph convolution network model obtained in the step 3 and the parameters of loss calculation in the step 4 according to the final recommendation index obtained in the step 6 to obtain the optimized threshold value and each parameter;
step 8, iteratively executing step 2 to step 7 until the threshold and each parameter in step 7 reach the optimum; thereby obtaining an optimal graph convolution network model, and recommending items through the optimal graph convolution network model.
2. The graph convolution recommendation method based on multi-type neighbor aggregation according to claim 1, wherein in step 1, a specific method for setting the threshold is as follows:
an initial threshold is set according to the number of interactions between the user and the item.
3. The graph convolution recommendation method based on multi-type neighbor aggregation according to claim 1, wherein in step 2, the data samples are divided according to a set threshold, and the specific method is as follows:
classifying the training set and the test set according to the set threshold, wherein data whose interaction count is greater than the set threshold serve as positive samples of the user, data whose interaction count lies between 0 and the set threshold serve as intermediate samples, and the remaining data serve as negative samples; the number of positive samples in the training set and the test set accounts for 85%-95% of the total number of positive and intermediate samples.
4. The method for recommending graph convolution based on multi-type neighbor aggregation according to claim 1, wherein in step 3, a graph convolution network model is constructed according to the positive samples and the intermediate samples obtained in step 1, and the specific method is as follows:
S21, combining the data of the positive samples and the intermediate samples in the training set of step 1, respectively, to obtain an adjacency matrix A1 of the positive samples and an adjacency matrix A2 of the intermediate samples;
S22, obtaining the transfer function of the convolution layers in the graph convolution network model according to the adjacency matrix A1 of the positive samples and the adjacency matrix A2 of the intermediate samples obtained in S21;
S23, randomly generating an initial embedding matrix, and obtaining the embedding matrix of each convolution layer by combining the transfer function obtained in S22;
S24, obtaining the final embedding matrix of the graph convolution network model according to the embedding matrices obtained in S23, thereby obtaining the graph convolution network model.
5. The method for recommending graph convolution based on multi-type neighbor aggregation according to claim 4, wherein in step 4, the graph convolution network model constructed in step 3 is updated through loss calculation to obtain an updated graph convolution network model, and the specific method is as follows:
the loss values for all users in the training set are calculated using the following equation:
Loss = Loss1 + Loss2 + λ||E(0)||^2
wherein Loss is the loss value of all users in the training set; Loss1 is the loss value of all users in the training set that contain intermediate samples; Loss2 is the loss value of all users in the training set without intermediate samples; λ is a coefficient; E(0) is the randomly generated initial embedding matrix; ||E(0)||^2 is the two-norm of the initial embedding matrix E(0), used as a regularization term in the expression to prevent over-fitting;
updating the obtained final embedding matrix by combining back propagation and gradient descent according to the obtained loss values of all users in the training set; taking the updated final embedding matrix as the randomly generated initial embedding matrix of the next epoch;
obtaining the final embedding matrix of the graph convolution network model according to the randomly generated initial embedding matrix, thereby obtaining the updated graph convolution network model.
6. The method for recommending graph convolution based on multi-type neighbor aggregation according to claim 1, wherein in step 5, project recommendation is performed according to the updated graph convolution network model, and a recommendation index is obtained by combining a test set, and the method specifically comprises the following steps:
calculating a rating value between each user and each corresponding item; obtaining a rating table of a project corresponding to each user;
in descending order of rating value, the items corresponding to the top 20 rating values are taken from the rating table, and these 20 items serve as the recommended item set of the user;
taking the positive sample set in the test set as the TestTrue set;
and respectively calculating the recall, precision and ndcg recommendation indexes according to the recommended item set of the user and the TestTrue set.
7. The method of claim 6, wherein a rating value between each user and each corresponding item is calculated by the following formula:
y_ui = e_u · e_i^T
wherein y_ui represents the preference degree of user u for item i; e_u is the embedding vector of user u after the multi-layer convolution layers; e_i^T is the transpose of the embedding vector of item i after the multi-layer convolution layers.
8. A graph convolution recommendation system based on multi-type neighbor aggregation, the system being capable of executing the method of any one of claims 1 to 7, comprising:
a threshold setting unit for setting a threshold;
the sample dividing unit is used for dividing the data samples in the training set and the test set according to the set threshold, so that both sample sets comprise positive samples, intermediate samples and negative samples;
the model building unit is used for building a graph convolution network model according to the obtained positive samples and the intermediate samples;
the model updating unit is used for updating the constructed graph convolution network model through loss calculation to obtain an updated graph convolution network model;
the item recommendation unit is used for recommending items according to the updated graph convolution network model and obtaining recommendation indexes by combining the test set;
the iteration unit is used for performing iteration until the output recommendation index tends to be stable;
the parameter optimization unit is used for optimizing the set threshold, the obtained parameters in the graph convolution network model and the parameters of loss calculation according to the obtained final recommendation index to obtain the optimized threshold and each parameter;
the model optimization unit is used for iterating until the threshold and each parameter reach the optimum, thereby obtaining an optimal graph convolution network model and recommending items through the optimal graph convolution network model.
CN202111116056.XA 2021-09-23 2021-09-23 Multi-type neighbor aggregation graph convolution recommendation method and system Pending CN113850317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111116056.XA CN113850317A (en) 2021-09-23 2021-09-23 Multi-type neighbor aggregation graph convolution recommendation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111116056.XA CN113850317A (en) 2021-09-23 2021-09-23 Multi-type neighbor aggregation graph convolution recommendation method and system

Publications (1)

Publication Number Publication Date
CN113850317A true CN113850317A (en) 2021-12-28

Family

ID=78978924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111116056.XA Pending CN113850317A (en) 2021-09-23 2021-09-23 Multi-type neighbor aggregation graph convolution recommendation method and system

Country Status (1)

Country Link
CN (1) CN113850317A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114861065A (en) * 2022-05-31 2022-08-05 山东省人工智能研究院 Multi-behavior based personalized recommendation method for cascaded residual error graph convolution network
CN114861065B (en) * 2022-05-31 2023-06-06 山东省人工智能研究院 Personalized recommendation method of cascade residual error graph convolution network based on multiple behaviors

Similar Documents

Publication Publication Date Title
EP4181026A1 (en) Recommendation model training method and apparatus, recommendation method and apparatus, and computer-readable medium
CN104462383B (en) A kind of film based on a variety of behavior feedbacks of user recommends method
CN112184391B (en) Training method of recommendation model, medium, electronic equipment and recommendation model
CN111797321A (en) Personalized knowledge recommendation method and system for different scenes
CN108431833A (en) End-to-end depth collaborative filtering
CN109740924B (en) Article scoring prediction method integrating attribute information network and matrix decomposition
CN114202061A (en) Article recommendation method, electronic device and medium based on generation of confrontation network model and deep reinforcement learning
CN110442802A (en) A kind of more Behavior preference prediction techniques of social activity user
CN112861006A (en) Recommendation method and system fusing meta-path semantics
CN112256965A (en) Neural collaborative filtering model recommendation method based on lambdamat
Wu et al. Optimization matrix factorization recommendation algorithm based on rating centrality
CN114997476A (en) Commodity prediction method fusing commodity incidence relation
CN109190040B (en) Collaborative evolution-based personalized recommendation method and device
CN113850317A (en) Multi-type neighbor aggregation graph convolution recommendation method and system
CN111178986A (en) User-commodity preference prediction method and system
CN114510653A (en) Social group recommendation method, system, device and storage medium
CN106227767A (en) A kind of based on the adaptive collaborative filtering method of field dependency
CN109951327A (en) A kind of network failure data synthesis method based on Bayesian mixture models
CN115809374B (en) Method, system, device and storage medium for correcting mainstream deviation of recommendation system
CN110059257B (en) Project recommendation method based on score correction
CN109885758A (en) A kind of recommended method of the novel random walk based on bigraph (bipartite graph)
CN115840853A (en) Course recommendation system based on knowledge graph and attention network
CN115829683A (en) Power integration commodity recommendation method and system based on inverse reward learning optimization
CN113204713B (en) Core user and core article mining method in large-scale commodity sales
CN113010774B (en) Click rate prediction method based on dynamic deep attention model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination