CN113420866B

CN113420866B - Score prediction method based on a dual generative adversarial network

Info

Publication number: CN113420866B
Application number: CN202110698814.7A
Authority: CN
Inventors: Qin Jiwei (秦继伟); Wu Buchen (武步尘)
Original assignee: Xinjiang University
Current assignee: Xinjiang University
Priority date: 2021-06-23
Filing date: 2021-06-23
Publication date: 2022-10-11
Anticipated expiration: 2041-06-23
Also published as: CN113420866A

Abstract

The invention discloses a score prediction method based on a dual generative adversarial network, which relates mainly to the field of deep learning. The method comprises the following steps: S1, dividing samples into three types, namely samples the user likes, samples the user has not purchased, and samples the user dislikes; S2, using two GANs, where the generator G of the first GAN is a negative sample generator used to generate high-quality negative samples, and the generator G of the second GAN is a positive sample generator; S3, feeding the samples generated by the negative sample generator into the positive sample generator as additional labeled data, and randomly selecting some unpurchased samples as input to the negative sample generator, so as to generate positive samples; S4, inputting into the second GAN the purchase vector of items the user likes, requiring it to produce outputs close to 0 on the negative sample elements generated by the first GAN while producing values as close as possible to x (0 < x < 1) on a randomly sampled vector of unpurchased items. The method can improve the model's recommendation prediction accuracy and its generalization ability.

Description

Score prediction method based on a dual generative adversarial network

Technical Field

The invention relates to the field of deep learning, and in particular to a score prediction method based on a dual generative adversarial network.

Background

Collaborative Filtering (CF) is one of the most mature recommendation techniques. It computes the similarity between users' ratings from their historical rating records in order to build a user preference model, and the quality of this model is a key factor in the quality of the recommendation algorithm. When the user-item rating matrix is sparse, a user model is difficult to build; new users and new items also raise the cold-start problem, so recommendations cannot be made effectively. How to build a user preference model that makes full use of user and item information, especially the implicit information between them, has therefore been a focus of research.
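The user-similarity step of collaborative filtering can be illustrated with cosine similarity between two users' rating vectors. This is a minimal sketch for orientation only: cosine similarity is one common choice (the patent does not name a measure), and the function name and toy ratings are hypothetical.

```python
from math import sqrt


def cosine_user_similarity(ratings_u: dict, ratings_v: dict) -> float:
    """Cosine similarity between two users' rating vectors.

    Unrated items are treated as 0, so only items rated by both
    users contribute to the dot product.
    """
    common = set(ratings_u) & set(ratings_v)
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(r * r for r in ratings_u.values()))
    norm_v = sqrt(sum(r * r for r in ratings_v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no ratings has no defined similarity
    return dot / (norm_u * norm_v)


# Hypothetical toy history: user -> {item: rating}
history = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m2": 3, "m3": 1},
    "carol": {"m3": 5},
}
print(round(cosine_user_similarity(history["alice"], history["bob"]), 3))  # 0.975
print(cosine_user_similarity(history["alice"], history["carol"]))  # 0.0
```

Alice and Carol share no rated items, so their similarity is 0 even though nothing is known about whether their tastes differ; this is how a sparse rating matrix makes the user model hard to build.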

With the continuous development of deep learning, Generative Adversarial Networks (GANs) have been successfully applied to recommendation. A GAN is trained as a continual game between a generative model G (Generator) and a discriminative model D (Discriminator) so that G learns the distribution of the data. During training, the generator tries to fool the discriminator into accepting generated data as real, while the discriminator tries to correctly tell real data from generated data. In other words, the generator maps its input (for example, random noise) to data that should look real, and the discriminator estimates the probability that a given sample is real rather than produced by the generator. Eventually the discriminator can no longer distinguish the generator's output from real data, and the generator produces the data we need.
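The game described above is usually written as the following minimax objective, where $D(x)$ is the discriminator's estimated probability that $x$ is real and $z$ is the generator's input noise drawn from $p_z$. This is the generic GAN formulation, given for orientation; the patent does not state the exact objective its two GANs use.

```latex
\min_G \max_D \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D maximizes $V$ by assigning high probability to real samples and low probability to generated ones; G minimizes $V$ by making $D(G(z))$ large.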

CFGAN is a highly successful deep recommendation framework that combines generative adversarial networks with collaborative filtering. It introduces GANs into collaborative filtering, carries the relationship between implicit features and documents over to users and items, and, at the data preprocessing stage, represents each user as a vector, proposing vector-wise adversarial training. Because implicit feedback is easier to collect, CFGAN focuses on CF with implicit feedback. Its generator G tries to generate realistic purchase vectors, while the discriminator D tries to distinguish generated user purchase vectors from the real purchase vectors obtained from the ground truth. The generator competes with the discriminator by producing vectors that resemble real purchase vectors; however, because the generated vector is compared with the real one only on the items the user purchased, the generator eventually learns to output an all-ones vector. To prevent this, CFGAN introduces negative sampling and adds a corresponding term to the loss function, ensuring that G learns to output 1 for items the user purchased and 0 for items the user did not purchase, so that the generated purchase vectors are no longer all ones. CFGAN has performed very well, but its negative samples are selected too randomly: in real life, an item a user has not purchased often does not mean the user dislikes it; more likely, the user has simply not seen it. Moreover, previous algorithms do not exploit the information in the samples users have not purchased in the data set, so the models waste a large amount of implicit information and fall short of their performance limit.
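The negative-sampling fix described above can be sketched as follows: sampled unpurchased items are added to the mask the discriminator sees, and a squared penalty pushes the generator's output toward 0 on them. This is a simplified sketch in the spirit of CFGAN's partial masking and zero-reconstruction terms, not its published code; all function names, the sampling ratio and `alpha` are hypothetical, and for brevity one negative sample serves both roles.

```python
import random


def sample_negatives(purchased: set, n_items: int, ratio: float,
                     rng: random.Random) -> list:
    """Uniformly sample a fraction `ratio` of the user's unpurchased items."""
    unpurchased = [j for j in range(n_items) if j not in purchased]
    k = int(len(unpurchased) * ratio)
    return sorted(rng.sample(unpurchased, k))


def masked_output(generated: list, purchased: set, negatives: list) -> list:
    """Keep the generator output on purchased items and sampled negatives only.

    Without the negatives, the discriminator would only ever see purchased
    positions, and an all-ones vector would look perfectly real.
    """
    keep = purchased | set(negatives)
    return [g if j in keep else 0.0 for j, g in enumerate(generated)]


def zero_reconstruction_penalty(generated: list, negatives: list,
                                alpha: float) -> float:
    """Squared-error term pushing the output toward 0 on sampled negatives."""
    return alpha * sum(generated[j] ** 2 for j in negatives)


rng = random.Random(0)
purchased = {0, 2}
negatives = sample_negatives(purchased, n_items=6, ratio=0.5, rng=rng)
all_ones = [1.0] * 6  # the degenerate generator output
print(len(negatives))  # 2 of the 4 unpurchased items
print(zero_reconstruction_penalty(all_ones, negatives, alpha=0.1))  # 0.2
```

The uniform `rng.sample` call is exactly the "too random" selection criticized above: any unpurchased item is equally likely to be treated as a negative.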

Disclosure of Invention

The invention aims to solve the above problems in the prior art by providing a score prediction method based on a dual generative adversarial network that uses user embedding information, strengthens the affinity between users and items as well as among users, and improves the model so that it better copes with data sparsity and user cold start, thereby improving the model's recommendation prediction accuracy and generalization ability.

To achieve the above purpose, the invention adopts the following technical scheme:

The score prediction method based on the dual generative adversarial network comprises the following steps:

S1: dividing samples into three types, namely samples the user likes, samples the user has not purchased, and samples the user dislikes;

S2: using two GANs, where the generator G of the first GAN is a negative sample generator used to generate high-quality negative samples, and the generator G of the second GAN is a positive sample generator;

S3: feeding the samples generated by the negative sample generator into the positive sample generator as additional labeled data, and randomly selecting some unpurchased samples as input to the negative sample generator, so as to generate positive samples;

S4: inputting into the second GAN the purchase vector of items the user likes, requiring it to produce outputs close to 0 on the negative sample elements generated by the first GAN, and to produce values as close as possible to x (0 < x < 1) on a randomly sampled vector of unpurchased items.
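Step S1, together with the random draw of unpurchased samples in S3, can be sketched for a single user as below. The patent does not say how "liked" and "disliked" are decided, so the rating threshold of 4 is an assumption, as are the function name and the toy data.

```python
import random


def partition_items(user_ratings: dict, n_items: int, like_threshold: float = 4.0):
    """S1: split all items into liked, unpurchased and disliked sets for one user.

    ASSUMPTION: a rated item is "liked" if its rating is >= like_threshold and
    "disliked" otherwise; items the user never rated are "unpurchased".
    """
    liked = {j for j, r in user_ratings.items() if r >= like_threshold}
    disliked = {j for j, r in user_ratings.items() if r < like_threshold}
    unpurchased = set(range(n_items)) - set(user_ratings)
    return liked, unpurchased, disliked


liked, unpurchased, disliked = partition_items({0: 5, 1: 2, 3: 4}, n_items=5)
print(sorted(liked), sorted(unpurchased), sorted(disliked))  # [0, 3] [2, 4] [1]

# S3: a random subset of unpurchased items is drawn as input to the
# negative sample generator (subset size 1 is an arbitrary choice here).
rng = random.Random(42)
neg_gen_input = rng.sample(sorted(unpurchased), 1)
```

The three sets are disjoint and together cover every item, which is what lets S4 assign a distinct target to each type.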

Compared with the prior art, the invention has the following beneficial effects:

1. The invention makes full use of the user embedding matrix and strengthens the latent relationships between users; it combines negative samples with generative adversarial networks, which improves the model's accuracy; and it uses the samples that users have not purchased, so that the model exploits the implicit user-item relationships;

2. The invention provides a general GAN-based CF framework named DGAN, which makes full use of implicit user-item information, including unpurchased items; comparison experiments demonstrate the effectiveness of the implicit information it uses. Extensive experiments on two data sets show both the effectiveness and the superiority of the invention: compared with the latest top-N recommendation methods, its accuracy is clearly improved.

Drawings

FIG. 1 is the overall architecture diagram of the invention;

FIG. 2 shows the score prediction results of the invention and the comparison algorithms on MovieLens-100K;

FIG. 3 shows the score prediction results of the invention and the comparison algorithms on MovieLens-1M.

Detailed Description

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and these equivalents also fall within the scope of the present application.

Embodiment: as shown in FIG. 1, the score prediction method based on the dual generative adversarial network of the invention specifically comprises the following parts:

1. Negative sample generator

Collect the user's historical data, process it, and label the samples into three types: samples the user likes, samples the user has not purchased, and samples the user dislikes; then randomly select some of the disliked samples as input to the negative sample generator.
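The data-processing step above can be sketched as turning raw history records into one labeled vector per user and then drawing disliked items at random. The (user, item, rating) record format, the label encoding 1/0/-1 and the rating threshold are all hypothetical choices made for illustration; the patent does not specify them.

```python
import random

# Hypothetical label encoding for the three sample types.
LIKED, UNPURCHASED, DISLIKED = 1, 0, -1


def build_user_dicts(triples):
    """Group (user, item, rating) history records into per-user rating dicts."""
    per_user = {}
    for user, item, rating in triples:
        per_user.setdefault(user, {})[item] = rating
    return per_user


def label_vector(user_ratings: dict, n_items: int, like_threshold: float = 4.0) -> list:
    """Mark every item as LIKED, DISLIKED or UNPURCHASED for one user.

    ASSUMPTION: ratings >= like_threshold mean "liked", lower ratings "disliked".
    """
    vec = [UNPURCHASED] * n_items
    for item, rating in user_ratings.items():
        vec[item] = LIKED if rating >= like_threshold else DISLIKED
    return vec


def select_disliked(vec: list, k: int, rng: random.Random) -> list:
    """Randomly pick up to k disliked items as input to the negative generator."""
    disliked = [j for j, label in enumerate(vec) if label == DISLIKED]
    return rng.sample(disliked, min(k, len(disliked)))


history = [(0, 0, 5), (0, 1, 2), (0, 3, 1), (1, 2, 4)]
per_user = build_user_dicts(history)
vec0 = label_vector(per_user[0], n_items=5)
print(vec0)  # [1, -1, 0, -1, 0]
print(select_disliked(vec0, k=1, rng=random.Random(0)))  # one of items 1 or 3
```

Keeping the label as one vector per user matches the vector-wise representation used by CFGAN-style models, so the same vector can later be masked by sample type.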

2. Positive sample generator

The samples generated by the negative sample generator are fed into the positive sample generator as additional labeled data to generate positive samples. The positive sample generator takes as input the purchase vector of items the user likes and is required to produce outputs close to 0 on the elements corresponding to the negatives generated by the negative sample generator, while producing values as close as possible to x (0 < x < 1) on a randomly sampled vector of unpurchased items, which improves the accuracy of the score prediction model.
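The targets placed on the positive sample generator's output can be expressed as a simple per-element loss. The patent gives neither the loss nor the value of x, so this is a hypothetical squared-error surrogate: the target of 1.0 on liked items is borrowed from CFGAN, and x = 0.5 is arbitrary. In the full method this term would sit alongside the adversarial loss of the second GAN.

```python
def positive_generator_loss(output, liked, generated_negatives,
                            sampled_unpurchased, x=0.5):
    """Squared-error surrogate for the targets described in S4.

    ASSUMPTIONS (not stated in the patent): liked items target 1.0 as in
    CFGAN, negatives produced by the negative sample generator target 0.0,
    and randomly sampled unpurchased items target the constant x.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must satisfy 0 < x < 1")
    loss = sum((output[j] - 1.0) ** 2 for j in liked)
    loss += sum(output[j] ** 2 for j in generated_negatives)
    loss += sum((output[j] - x) ** 2 for j in sampled_unpurchased)
    return loss


output = [0.9, 0.1, 0.5, 0.0]  # hypothetical generator output for 4 items
loss = positive_generator_loss(output, liked={0}, generated_negatives=[1],
                               sampled_unpurchased=[2], x=0.5)
print(round(loss, 6))  # 0.02
```

The intermediate target x is what separates this from CFGAN: unpurchased items are treated as "unknown" rather than forced to 0, while only the generated negatives are pushed to 0.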

3. Scoring prediction recommendation model

Referring to FIG. 1, the negative sample generator and the positive sample generator are fused and learned jointly, and the score predictions are tested, so as to better recommend items the user likes and improve user satisfaction.

The specific steps are as follows:

Step 1: extract negative samples from the negative sample generator.

Step 2: generate item scores from the positive sample generator.

Step 3: fuse and jointly learn the negative and positive sample generators, and test the score predictions.
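The testing step is not specified further in the patent. Top-N evaluation of generated scores, as used in CFGAN-style experiments, typically ranks items the user has not interacted with in training and checks them against held-out items; a minimal precision@N sketch, with hypothetical names and scores, is:

```python
def top_n(scores: list, exclude: set, n: int) -> list:
    """Indices of the n highest-scoring items, skipping items seen in training."""
    candidates = [j for j in range(len(scores)) if j not in exclude]
    return sorted(candidates, key=lambda j: scores[j], reverse=True)[:n]


def precision_at_n(scores: list, train_items: set, test_items: set, n: int) -> float:
    """Fraction of the top-n recommended items that appear in the held-out test set."""
    recommended = top_n(scores, train_items, n)
    hits = sum(1 for j in recommended if j in test_items)
    return hits / n


scores = [0.9, 0.8, 0.1, 0.7, 0.3]  # hypothetical predicted scores for 5 items
print(top_n(scores, exclude={0}, n=2))  # [1, 3]
print(precision_at_n(scores, train_items={0}, test_items={3, 4}, n=2))  # 0.5
```

Excluding training items matters: otherwise a model that simply reproduces known purchases would score well without predicting anything new.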

4. Experimental verification

As shown in FIGS. 2-3, two data sets, MovieLens-100K and MovieLens-1M, were used together with the following 7 comparison algorithms: ItemPop, BPR, MPR, CDAE, IRGAN, CFGAN and GraphGAN; the results verify that the model outperforms the other recommendation algorithms compared.
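ItemPop, the non-personalized baseline in this list, recommends the same globally popular items to every user. A minimal sketch, assuming implicit feedback given as (user, item) pairs; the function names and toy data are hypothetical.

```python
from collections import Counter


def item_pop_scores(interactions, n_items: int) -> list:
    """ItemPop: score each item by the number of users who interacted with it."""
    counts = Counter(item for _user, item in set(interactions))
    return [counts.get(j, 0) for j in range(n_items)]


def recommend(scores: list, seen: set, n: int) -> list:
    """Same popularity ranking for every user, minus items the user has seen."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return [j for j in ranked if j not in seen][:n]


interactions = [(0, 1), (1, 1), (2, 1), (0, 3), (1, 3), (2, 0)]
scores = item_pop_scores(interactions, n_items=5)
print(scores)  # [1, 3, 0, 2, 0]
print(recommend(scores, seen={1}, n=2))  # [3, 0]
```

Because it ignores the individual user, ItemPop gives a floor that any personalized model such as a GAN-based recommender is expected to beat.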

Claims

1. A score prediction method based on a dual generative adversarial network, characterized by comprising the following steps:

S3: inputting the samples generated by the negative sample generator into the positive sample generator as additional labeled data to generate positive samples;

S4: inputting into the second GAN the purchase vector of items the user likes, requiring it to produce outputs close to 0 on the negative sample elements generated by the first GAN, and to produce values as close as possible to x on a vector formed from randomly sampled unpurchased items, where 0 < x < 1.