CN113420866A

CN113420866A - Score prediction method based on dual generative adversarial networks

Info

Publication number: CN113420866A
Application number: CN202110698814.7A
Authority: CN
Inventors: 秦继伟; 武步尘
Original assignee: Xinjiang University
Current assignee: Xinjiang University
Priority date: 2021-06-23
Filing date: 2021-06-23
Publication date: 2021-09-21
Anticipated expiration: 2041-06-23
Also published as: CN113420866B

Abstract

The invention discloses a score prediction method based on dual generative adversarial networks (GANs), which mainly relates to the field of deep learning. The method comprises the following steps: S1, dividing the samples into three types, namely samples liked by the user, samples not purchased, and samples disliked by the user; S2, using two GANs, wherein the generator G of the first GAN is a negative sample generator and the generator G of the second GAN is a positive sample generator, the negative sample generator being used to generate high-quality negative samples; S3, inputting the samples generated by the negative sample generator into the positive sample generator as additional labeled data, and randomly selecting some unpurchased samples as input to the negative sample generator, so as to generate positive samples; S4, the second GAN takes the user's liked-purchase vector as input and is required to produce an output close to 0 on the negative sample elements generated by the first GAN, while producing, as far as possible, a value x (0 < x < 1) on the randomly sampled unpurchased vector. The method can improve the accuracy of the model's recommendation predictions and the generalization ability of the model.

Description

Score prediction method based on dual generative adversarial networks

Technical Field

The invention relates to the field of deep learning, and in particular to a score prediction method based on dual generative adversarial networks.

Background

Collaborative filtering (CF) is one of the most mature recommendation techniques. It computes the similarity of ratings among users from their historical rating records in order to build a user preference model, and the quality of that model is a key factor in the quality of the recommendation algorithm. When the user-resource rating matrix is sparse, a user model is difficult to build; new users and new resources additionally raise the cold-start problem, and resource recommendation cannot be completed effectively. How to build a user preference model that makes full use of user and resource information, especially the implicit information between them, has therefore been a focus of research.

With the continuous development of deep learning, generative adversarial networks (GANs) have been successfully applied to recommendation. A GAN sets a generative model G (generator) and a discriminative model D (discriminator) against each other in a continuing game, so that G learns the distribution of the data. During training, the generator tries to fool the discriminator into accepting generated data as real, while the discriminator tries to judge correctly which data is real. In other words, the generative model attempts to produce realistic data from its input, and the discriminative model estimates the probability that a given sample is real rather than produced by the generative model. Eventually the discriminator can no longer tell the generator's data from real data, and the generator produces the data we need.
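For reference, the game described above is usually written as the standard GAN minimax objective. The formula comes from the general GAN literature rather than from this patent; $p_{\text{data}}$ (the real data distribution), $p_z$ (the distribution of the generator's input $z$) and $V$ are the conventional symbols, not names used by the invention.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to raise $V$ by scoring real samples near 1 and generated samples near 0, while G is trained to lower it by making $D(G(z))$ approach 1.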

CFGAN is a highly successful deep recommendation framework based on generative adversarial networks and collaborative filtering. CFGAN introduces GANs into collaborative filtering, carries the relationship between latent features and documents over to users and items, and proposes vector-wise adversarial training. Because implicit feedback data is easier to collect, it focuses on CF with implicit feedback. The generative model of CFGAN attempts to generate a realistic purchase vector, and D attempts to distinguish the generated user purchase vector from the real purchase vector taken from the ground truth. The generator plays against the discriminator by generating vectors similar to the purchase vector, but left unchecked it ends up producing an all-ones vector. For this reason, CFGAN introduces negative sampling and optimizes through the loss function, so that the generator G learns to output 1 for items the user purchased and 0 for items the user did not purchase, yielding a purchase vector that is not all ones. CFGAN performs very well, but its selection of negative samples is too random: in real life, an item the user has not purchased is often not an item the user dislikes, but simply one the user has not seen. Moreover, previous algorithms do not use the information in the unpurchased samples of the data set, so the model wastes a large amount of hidden information and does not reach its performance limit.
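The negative sampling idea described above can be illustrated with a rough sketch. This is not CFGAN's actual code: the function names, the sampling ratio and the toy vectors are all hypothetical, and for brevity a single sampled set is used both for masking and for the push-toward-zero penalty.

```python
import random

def sample_negatives(purchase, ratio, rng):
    """Randomly pick a share of the unpurchased item indices (random negative sampling)."""
    unpurchased = [j for j, e in enumerate(purchase) if e == 0]
    k = int(len(unpurchased) * ratio)
    return set(rng.sample(unpurchased, k))

def mask_generator_output(generated, purchase, negatives):
    """Keep generator scores on purchased items and sampled negatives; zero out the rest."""
    return [g if (e == 1 or j in negatives) else 0.0
            for j, (g, e) in enumerate(zip(generated, purchase))]

def zero_penalty(generated, negatives):
    """Squared error that pushes the scores of sampled negative items toward 0."""
    return sum(generated[j] ** 2 for j in negatives)

rng = random.Random(0)
purchase = [1, 0, 0, 1, 0, 0]                 # 1 = purchased, 0 = not purchased
generated = [0.9, 0.8, 0.7, 0.95, 0.6, 0.5]   # hypothetical generator output
negatives = sample_negatives(purchase, ratio=0.5, rng=rng)
masked = mask_generator_output(generated, purchase, negatives)
penalty = zero_penalty(generated, negatives)
```

Without the penalty term, a generator that outputs 1 everywhere would survive the masking step unchanged, which is the all-ones collapse the paragraph describes. The sketch also makes the criticism visible: `sample_negatives` chooses uniformly at random and cannot tell an unseen item from a disliked one.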

Disclosure of Invention

The invention aims to solve the problems in the prior art by providing a score prediction method based on dual generative adversarial networks. It uses user embedding information, strengthens the relationships between users and items and among users, improves the model, and copes better with data sparsity and user cold start, so as to improve the accuracy of the model's recommendation predictions and the generalization ability of the model.

In order to achieve the purpose, the invention is realized by the following technical scheme:

The score prediction method based on dual generative adversarial networks comprises the following steps:

S1, dividing the samples into three types, namely samples liked by the user, samples not purchased, and samples disliked by the user;

S2, using two GANs, wherein the generator G of the first GAN is a negative sample generator and the generator G of the second GAN is a positive sample generator, the negative sample generator being used to generate high-quality negative samples;

S3, inputting the samples generated by the negative sample generator into the positive sample generator as additional labeled data, and randomly selecting some unpurchased samples as input to the negative sample generator, so as to generate positive samples;

S4, the second GAN takes the user's liked-purchase vector as input and is required to produce an output close to 0 on the negative sample elements generated by the first GAN, while producing, as far as possible, a value x (0 < x < 1) on the randomly sampled unpurchased vector.

Compared with the prior art, the invention has the beneficial effects that:

1. The invention makes full use of the user embedding matrix and strengthens the latent relationships between users; combining negative samples with generative adversarial networks improves the accuracy of the model; and using samples not purchased by the user lets the model exploit the implicit relationships between users and items;

2. The invention provides a general GAN-based CF framework named DGAN, which makes full use of implicit user-resource information and uses unpurchased items; comparison experiments demonstrate the effectiveness of the implicit information used. Extensive experiments on two data sets demonstrate both the effectiveness and the superiority of the invention: compared with recent top-N recommendation methods, accuracy is clearly improved.

Drawings

FIG. 1 is a general architecture diagram of the present invention;

FIG. 2 shows the score prediction results of the present invention and the comparison algorithms on MovieLens-100K;

FIG. 3 shows the score prediction results of the present invention and the comparison algorithms on MovieLens-1M.

Detailed Description

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and these equivalents also fall within the scope of the present application.

Example: as shown in FIG. 1, the score prediction method based on dual generative adversarial networks of the present invention is carried out as follows.

1. Negative sample generator

The user's historical data is collected and processed, and the samples are labeled as one of three types: samples liked by the user, samples not purchased by the user, and samples disliked by the user. Some of the samples disliked by the user are then randomly selected as input to the negative sample generator.
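A minimal sketch of this labeling step is given below. The text does not say how "liked" and "disliked" are decided, so the rating threshold `LIKE_THRESHOLD`, the function names and the toy ratings are assumptions made purely for illustration.

```python
import random

LIKE_THRESHOLD = 4  # assumption: an explicit rating >= 4 counts as "liked"

def split_samples(ratings, all_items):
    """Split one user's items into liked, disliked and unpurchased sets.

    ratings: dict mapping item id -> explicit rating for items the user interacted with.
    all_items: iterable of every item id in the catalogue.
    """
    liked = {i for i, r in ratings.items() if r >= LIKE_THRESHOLD}
    disliked = {i for i, r in ratings.items() if r < LIKE_THRESHOLD}
    unpurchased = set(all_items) - set(ratings)
    return liked, disliked, unpurchased

def pick_disliked_inputs(disliked, count, rng):
    """Randomly choose some disliked items as input for the negative sample generator."""
    pool = sorted(disliked)  # sorted so that a fixed seed gives a reproducible draw
    return rng.sample(pool, min(count, len(pool)))

all_items = range(8)
ratings = {0: 5, 1: 2, 3: 4, 6: 1, 7: 3}   # hypothetical ratings on a 1-5 scale
liked, disliked, unpurchased = split_samples(ratings, all_items)
neg_inputs = pick_disliked_inputs(disliked, count=2, rng=random.Random(0))
```

With these toy ratings the user likes items 0 and 3, dislikes items 1, 6 and 7, and has not purchased items 2, 4 and 5. Keeping the unpurchased set separate from the disliked set is the point of the three-way split.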

2. Positive sample generator

Positive samples are generated by feeding the samples produced by the negative sample generator into the positive sample generator as additional labeled data. The positive sample generator takes the user's liked-purchase vector as input and is required to produce an output close to 0 on the elements generated by the negative sample generator, while producing, as far as possible, a value x (0 < x < 1) on the randomly sampled unpurchased vector, thereby improving the accuracy of the score prediction model.
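One way to express these two constraints is as a penalty added to the positive sample generator's loss. The sketch below is an assumption, not the loss of the invention: the text gives neither the loss function nor a value for x, so squared error and the default `x=0.5` are chosen here only for illustration, and all names are hypothetical.

```python
def positive_generator_penalty(output, negative_items, sampled_unpurchased, x=0.5):
    """Squared-error penalty for the two extra constraints on the positive generator.

    output: list of generated scores, one per item.
    negative_items: indices produced by the negative sample generator (target 0).
    sampled_unpurchased: randomly sampled unpurchased indices (target x, 0 < x < 1).
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    neg_term = sum(output[j] ** 2 for j in negative_items)
    unp_term = sum((output[j] - x) ** 2 for j in sampled_unpurchased)
    return neg_term + unp_term

output = [0.9, 0.0, 0.5, 0.8, 1.0]   # hypothetical generator scores
# Item 1 is a generated negative scored 0.0 and item 2 is an unpurchased item scored x.
penalty_ok = positive_generator_penalty(output, negative_items=[1], sampled_unpurchased=[2])
# Item 4 is a generated negative scored 1.0 and item 0 is an unpurchased item scored 0.9.
penalty_bad = positive_generator_penalty(output, negative_items=[4], sampled_unpurchased=[0])
```

The first call incurs no penalty because both constraints are met exactly; the second is penalized on both terms. Giving unpurchased items an intermediate target rather than 0 reflects the observation that an unpurchased item is not necessarily a disliked one.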

3. Scoring prediction recommendation model

Referring to FIG. 1, the negative sample generator and the positive sample generator are combined and trained together, and score prediction is then tested, so as to recommend items the user will like more reliably and to improve user satisfaction.

The method comprises the following specific steps:

Step 1: negative samples are obtained from the negative sample generator.

Step 2: item scores are generated by the positive sample generator.

Step 3: the negative and positive sample generators are combined and trained together, and the score predictions are tested.

4. Experimental verification

As shown in FIGS. 2-3, experiments were run on two data sets, MovieLens-100K and MovieLens-1M, against the following seven comparison algorithms: ItemPop, BPR, MPR, CDAE, IRGAN, CFGAN, and GraphGAN. The results verify that the model outperforms the other comparison recommendation algorithms.
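The text does not name the metric behind these comparisons. As an illustration only, a common way to score top-N recommendation is precision@N, sketched below with hypothetical function names and toy data; it should not be read as the evaluation protocol actually used in the experiments.

```python
def top_n(scores, purchased, n):
    """Rank items the user has not purchased by predicted score, highest first."""
    candidates = [j for j in range(len(scores)) if j not in purchased]
    candidates.sort(key=lambda j: scores[j], reverse=True)
    return candidates[:n]

def precision_at_n(recommended, relevant):
    """Fraction of recommended items that appear in the held-out relevant set."""
    if not recommended:
        return 0.0
    hits = sum(1 for j in recommended if j in relevant)
    return hits / len(recommended)

scores = [0.9, 0.2, 0.7, 0.4, 0.8]   # hypothetical predicted scores for one user
train_purchased = {0}                # items seen during training are excluded
test_liked = {2, 3}                  # hypothetical held-out liked items
recommended = top_n(scores, train_purchased, n=2)
p_at_2 = precision_at_n(recommended, test_liked)
```

Here the two recommended items are 4 and 2, of which only item 2 is in the held-out set, so precision@2 is 0.5.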

Claims

1. A score prediction method based on dual generative adversarial networks, characterized by comprising the following steps:

S3, inputting the samples generated by the negative sample generator into the positive sample generator as additional labeled data to generate positive samples;

S4, the second GAN takes the user's liked-purchase vector as input and is required to produce an output close to 0 on the negative sample elements generated by the first GAN, while producing, as far as possible, a value x (0 < x < 1) on the randomly sampled unpurchased vector; the method can improve the accuracy of the model's recommendation predictions and the generalization ability of the model.