CN109360069A

CN109360069A - A recommendation model based on pairwise adversarial training

Info

Publication number: CN109360069A
Application number: CN201811265107.3A
Authority: CN
Inventors: 叶阳东; 孙中川; 吴宾; 吴云鹏
Original assignee: Zhengzhou University
Current assignee: Zhengzhou University
Priority date: 2018-10-29
Filing date: 2018-10-29
Publication date: 2019-02-19
Anticipated expiration: 2038-10-29
Also published as: CN109360069B

Abstract

The invention discloses a recommendation model based on pairwise adversarial training. The model consists of two parts, a generator and a discriminator. The generator models user preferences and generates items the user is likely to enjoy, while the discriminator judges whether a user likes a given item. Based on the assumption that "compared with the items produced by the generator, the discriminator considers that the user prefers the items already interacted with", a pairwise loss function links the generator and the discriminator. Specifically, the discriminator strengthens its discriminative ability by minimizing the pairwise loss, and the generator models user preferences and tries to fool the discriminator by maximizing the pairwise loss. In addition, the invention replaces conventional sampling with a differentiable sampling scheme, so that the connection between the generator and the discriminator is differentiable and the model can be trained with gradient-based methods. Compared with existing methods, the invention can improve the stability and convergence speed of adversarial training in recommender systems.

Description

A recommendation model based on pairwise adversarial training

Technical field

The invention belongs to the technical field of recommender systems and, more specifically, concerns a recommendation model based on a pairwise loss under an adversarial training framework.

Background art

With the rapid development of e-commerce and online websites such as Taobao and Douban, users enjoy convenient services but are also troubled by information overload. Recommender systems are regarded as an effective tool for alleviating this problem: they model a user's historical behavior and recommend items the user may be interested in.

Recommender system models can be divided into generative models and discriminative models. Generative models model the user's behavioral preferences and have a sound theoretical basis, but they have difficulty exploiting information related to users and items, such as user reviews of items and visual information about items. Discriminative models judge the relationship between a user and an item directly from the features of both, but they cannot learn from unlabeled data.

To combine the advantages of both kinds of model, IRGAN (IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models), an information retrieval model that unifies generative and discriminative models through a minimax game, uses an adversarial training framework to combine a generative model with a discriminative model and thereby improve accuracy. As a subfield of information retrieval, recommender systems can also use the adversarial training framework to increase the representational capacity of the model. Unlike a conventional optimization problem, solving an adversarial model is a minimax game, and its objective function makes model training unstable. In addition, conventional adversarial training optimizes the model by gradient descent and therefore requires the whole model to be differentiable. To cope with the fact that items in information retrieval are discrete and non-differentiable, IRGAN optimizes the model with policy gradients. However, the variance of the policy gradient is proportional to the number of items, and the very large number of items in a recommender system gives the policy gradient a high variance, which makes model training even less stable. Together, these two factors make IRGAN training unstable and slow to converge in the recommender system field.

Summary of the invention

In view of the above shortcomings of the prior art and the need for improvement, the present invention combines the characteristics of recommender systems and adversarial models to provide a recommendation model based on pairwise adversarial training. Its purpose is to improve the stability and convergence speed of adversarial training in recommender systems and to make recommendations accurately and quickly.

In addition, we use categorical reparameterization with Gumbel-Softmax (Categorical Reparameterization with Gumbel-Softmax) to sample differentiably and to represent discrete items with continuous values. The proposed model is therefore differentiable end to end and can be optimized directly with gradient-based methods. Compared with policy gradients, the gradient of the Gumbel-Softmax method has lower variance, which further improves the stability of model training.

The model of the invention consists of two parts: a generator and a discriminator. The generator is responsible for modeling the user's behavioral preferences and can produce the user's preference list and items the user may like. The discriminator is responsible for judging whether the user likes a given item.

Based on the assumption that "compared with the items produced by the generator, the discriminator considers that the user prefers the items already interacted with", the invention unifies the generator and the discriminator under a pairwise adversarial training framework and trains the whole model by optimizing a pairwise loss function. Specifically, the generator models the user's behavioral preferences, generates items the user likes, and tries to fool the discriminator; its goal is to maximize the pairwise loss. The discriminator minimizes the pairwise loss so that the model's assumption continues to hold. Through continued alternating adversarial training, the generator and the discriminator reach a Nash equilibrium. At that point, the generator can simulate the user's behavioral preferences and generate items the user likes, and the discriminator can no longer distinguish the user's preference for generated items from the preference for interacted items.

One training iteration of the invention comprises the following steps:

(1) Fix the parameters of the generator and train the discriminator:

Select an interacted user-item pair from the data set. The generator produces the user's preference probabilities over the items, and the differentiable Gumbel-Softmax sampling method is then used to generate an item the user may like.

Take the triple (user, interacted item, generated item) as the input of the discriminator and update the parameters of the discriminator with a gradient-based method, with the goal of minimizing the model's pairwise loss function.

(2) Fix the parameters of the discriminator and train the generator:

Take the triple (user, interacted item, generated item) as the input of the discriminator and update the parameters of the generator with a gradient-based method, with the goal of maximizing the model's pairwise loss function.

After training, items the user is likely to enjoy are recommended according to the generator's model of the user's behavioral preferences.

With the above method provided by the invention, the stability and convergence speed of adversarial training in recommender systems can be improved.

With the above method provided by the invention, items that users like can be recommended to them more accurately and effectively.

Brief description of the drawings

To illustrate the technical solution of the invention more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below.

Fig. 1 is the structure diagram of the model of the invention;

Fig. 2 is an example diagram of the differentiable Gumbel-Softmax sampling method and the influence of its parameter;

Fig. 3 is the algorithm flow chart of the model training of the invention;

Fig. 4 shows the learning curves of the model.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below. The specific embodiments described here serve only to explain the invention and are not intended to limit it.

Let m and n denote the numbers of users and items, and let S ∈ R^{m×n} be the user-item interaction matrix: if user u and item i have an interaction record, then s_ui = 1, otherwise s_ui = 0. Let W ∈ R^{m×d} and V ∈ R^{n×d} denote the feature matrices of users and items, where w_u and v_i are the feature vectors of user u and item i and d is the feature dimension. b ∈ R^{n×1} denotes the item bias vector. Let g and f denote the generator and the discriminator, respectively. As described above, the objective function of the invention is as follows:

Here, L is the pairwise loss function, which may be a pairwise loss such as the logarithmic loss or the hinge loss. θ and φ are the parameter sets of the generator and the discriminator, respectively, p_real denotes the probability distribution of interacted items, and p_θ denotes the probability distribution of items produced by the generator.

In our implementation, both the generator and the discriminator are matrix factorization (MF, Matrix Factorization) models. MF uses the inner product of vectors to describe the preference value r_ui of user u for item i:
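
As a minimal sketch of such an MF scorer, the following Python assumes the common form r_ui = w_u · v_i + b_i, in which the item bias b_i from the definitions above is added to the inner product. That exact form, the function names, and the toy values are assumptions made for illustration and are not taken from the patent text.

```python
def mf_score(w_u, v_i, b_i):
    """Preference of user u for item i: inner product plus item bias (assumed form)."""
    return sum(a * c for a, c in zip(w_u, v_i)) + b_i


def preference_vector(w_u, V, b):
    """Scores of one user for all n items, r_u = (r_u1, ..., r_un)."""
    return [mf_score(w_u, v_i, b_i) for v_i, b_i in zip(V, b)]


# Toy example with d = 2 features and n = 3 items (illustrative values only).
w_u = [1.0, 2.0]
V = [[0.5, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.1, -0.2, 0.0]
r_u = preference_vector(w_u, V, b)
```

With these toy values the third item receives the highest score, because its feature vector aligns best with w_u.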

The training steps of the invention are as follows:

Step (1): pre-training. The generator is pre-trained to convergence with Bayesian personalized ranking (BPR, BPR: Bayesian Personalized Ranking from Implicit Feedback).

Step (2): train the discriminator. Traverse the user-item interaction records (u, i) in the matrix S and perform the following operations for each (u, i):

Step (2-1): use the generator g to produce the preference vector r_u = (r_u1, …, r_un) of user u over all items:

Here, θ denotes the parameters of the generator g.

Step (2-2): normalize the preferences r_u of user u over the items into probabilities:

Here, the index f marks the sampling probability used when training the discriminator, and the parameter τ ∈ (0, 1] controls where the sampling mass is concentrated: the smaller τ is, the larger the probability of items with higher r_ui.
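
The effect of τ can be illustrated with a temperature-scaled softmax. The sketch below assumes the normalization has the form softmax(r_u / τ), which matches the described behavior (a smaller τ concentrates probability on high-scoring items) but is not confirmed by the text; the function name and the toy scores are hypothetical.

```python
import math


def tempered_softmax(r_u, tau):
    """Normalize scores into probabilities; tau in (0, 1] sharpens the result.

    Assumed form: softmax(r_u / tau). The patent's own formula is not shown here.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    scaled = [r / tau for r in r_u]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


r_u = [0.6, 1.8, 3.0]
p_soft = tempered_softmax(r_u, tau=1.0)   # plain softmax
p_sharp = tempered_softmax(r_u, tau=0.2)  # more mass on the top-scoring item
```

Under this assumption, lowering τ from 1.0 to 0.2 moves probability mass from the two lower-scoring items to the highest-scoring one.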

Step (2-3): use the Gumbel-Softmax method to differentiably generate an item j from the preference probabilities of user u:

j = softmax((log p_u + z) / t)

Here, p_u is the preference probability of user u obtained in step (2-2), and z is a noise vector sampled from the Gumbel(0, 1) distribution. Item j is an n-dimensional approximately one-hot vector that represents the item produced by the generator g. As the parameter t approaches 0, j approaches a one-hot vector; as t tends to positive infinity, j becomes a uniform vector. Fig. 2 illustrates the differentiable sampling process and the influence of the parameter t on the approximately one-hot vector j.
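
The sampling rule j = softmax((log p_u + z) / t), as stated in claim 3, can be sketched in plain Python as follows. The helper names, the seed, and the toy probabilities are illustrative assumptions; a real implementation would use an automatic differentiation framework so that gradients can flow through j.

```python
import math
import random


def sample_gumbel(n, rng):
    """Draw n Gumbel(0, 1) samples via the inverse transform -log(-log(U))."""
    samples = []
    for _ in range(n):
        u = max(rng.random(), 1e-12)  # keep U strictly positive so log(U) is finite
        samples.append(-math.log(-math.log(u)))
    return samples


def gumbel_softmax(p_u, t, rng):
    """Relaxed one-hot sample j = softmax((log p_u + z) / t)."""
    z = sample_gumbel(len(p_u), rng)
    logits = [(math.log(p) + z_k) / t for p, z_k in zip(p_u, z)]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)           # fixed seed, for reproducibility only
p_u = [0.1, 0.2, 0.7]            # toy preference probabilities
j_sharp = gumbel_softmax(p_u, t=0.1, rng=rng)  # small t: close to one-hot
j_flat = gumbel_softmax(p_u, t=1e6, rng=rng)   # very large t: close to uniform
```

Every sample is a valid probability vector. The very large temperature is chosen only to make the uniform limit visible.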

Step (2-4): compute the score that the discriminator f assigns to item i, which user u has interacted with:

Here, φ denotes the parameters of the discriminator f.

Step (2-5): the generated item j is not a real item, so its feature vector and bias in the discriminator f have to be computed:

v_j^φ = j V^φ    (8)

b_j^φ = j b^φ    (9)

Here, j ∈ R^{1×n} is the approximately one-hot vector, V^φ ∈ R^{n×d} is the item feature matrix of the discriminator, and b^φ ∈ R^{n×1} is the item bias vector, so that b_j^φ and v_j^φ ∈ R^{1×d} can serve as the bias and feature vector of item j.
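
The products v_j = jV and b_j = jb given in claim 4 amount to a weighted average of the discriminator's item features and biases, with the entries of j as weights. A minimal sketch, with a hypothetical function name and toy values:

```python
def soft_item_embedding(j, V, b):
    """Return (v_j, b_j) = (j V, j b) for an approximately one-hot vector j."""
    n, d = len(V), len(V[0])
    v_j = [sum(j[k] * V[k][c] for k in range(n)) for c in range(d)]
    b_j = sum(j_k * b_k for j_k, b_k in zip(j, b))
    return v_j, b_j


V = [[0.5, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy item features (n = 3, d = 2)
b = [0.1, -0.2, 0.0]                      # toy item biases

# An exact one-hot vector selects one row of V and one entry of b.
v_hard, b_hard = soft_item_embedding([0.0, 1.0, 0.0], V, b)
# A relaxed vector blends the rows, which keeps the lookup differentiable in j.
v_soft, b_soft = soft_item_embedding([0.1, 0.8, 0.1], V, b)
```

Because the lookup is a linear function of j, gradients of the discriminator score with respect to j are well defined, which is what allows the generator to be trained through the sampled item.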

Step (2-6): compute the score that the discriminator f assigns to the generated item j:

Step (2-7): compute the pairwise loss of the discriminator f for items i and j:

Loss = log(1 + exp(f(j | u) - f(i | u)))    (11)

Formula (11) uses the logarithmic pairwise loss; other pairwise loss functions, such as the hinge loss, can also be used.
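
Both loss variants can be sketched as functions of the two discriminator scores f(i | u) and f(j | u). The logarithmic loss follows formula (11); the hinge variant and its margin of 1.0 are assumptions, since the text names the hinge loss without giving its form.

```python
import math


def log_pairwise_loss(f_i, f_j):
    """Formula (11): log(1 + exp(f_j - f_i)), evaluated in a numerically stable way."""
    x = f_j - f_i
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def hinge_pairwise_loss(f_i, f_j, margin=1.0):
    """Hinge alternative: zero once f_i exceeds f_j by at least the (assumed) margin."""
    return max(0.0, margin - (f_i - f_j))


# The loss is small when the interacted item i outscores the generated item j.
loss_good = log_pairwise_loss(f_i=5.0, f_j=0.0)
loss_bad = log_pairwise_loss(f_i=0.0, f_j=5.0)
```

Both functions decrease as f(i | u) - f(j | u) grows, which is why minimizing them enforces the model's assumption and maximizing them pushes generated items toward high discriminator scores.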

Step (2-8): the discriminator f minimizes the objective function, so the parameters φ of f are updated by gradient descent:

Here, α is the learning rate.
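
One such gradient descent step can be written out by hand for a single triple (u, i, j). The sketch assumes an MF discriminator of the form f(k | u) = w_u · v_k + b_k and the logarithmic loss of formula (11); the gradients are derived manually for this assumed form, and all names and values are illustrative. In practice an automatic differentiation library would compute them.

```python
import math


def scores(w, V, b, i, j):
    """Return f(i|u), f(j|u) and the soft item feature vector v_j = j V."""
    n, d = len(V), len(w)
    v_j = [sum(j[k] * V[k][c] for k in range(n)) for c in range(d)]
    b_j = sum(j[k] * b[k] for k in range(n))
    f_i = sum(w[c] * V[i][c] for c in range(d)) + b[i]
    f_j = sum(w[c] * v_j[c] for c in range(d)) + b_j
    return f_i, f_j, v_j


def pairwise_loss(w, V, b, i, j):
    f_i, f_j, _ = scores(w, V, b, i, j)
    return math.log1p(math.exp(f_j - f_i))


def discriminator_step(w, V, b, i, j, alpha):
    """One descent step: phi <- phi - alpha * dLoss/dphi (hand-derived gradients)."""
    n, d = len(V), len(w)
    f_i, f_j, v_j = scores(w, V, b, i, j)
    g = 1.0 / (1.0 + math.exp(-(f_j - f_i)))  # dLoss / d(f_j - f_i)
    # d(f_j - f_i)/db_k = j_k - [k == i]; the same factor scales the rows of V.
    coef = [j[k] - (1.0 if k == i else 0.0) for k in range(n)]
    new_w = [w[c] - alpha * g * (v_j[c] - V[i][c]) for c in range(d)]
    new_V = [[V[k][c] - alpha * g * coef[k] * w[c] for c in range(d)] for k in range(n)]
    new_b = [b[k] - alpha * g * coef[k] for k in range(n)]
    return new_w, new_V, new_b


w = [0.1, -0.2]                            # user features in the discriminator
V = [[0.5, 0.0], [0.0, 1.0], [1.0, 1.0]]   # item features in the discriminator
b = [0.1, -0.2, 0.0]                       # item biases in the discriminator
i, j = 0, [0.1, 0.8, 0.1]                  # interacted item and relaxed generated item

loss_before = pairwise_loss(w, V, b, i, j)
w2, V2, b2 = discriminator_step(w, V, b, i, j, alpha=0.1)
loss_after = pairwise_loss(w2, V2, b2, i, j)
```

For this toy triple the step lowers the pairwise loss, meaning the discriminator now scores the interacted item higher relative to the generated one.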

Step (3): train the generator. Traverse the user-item interaction records (u, i) in the matrix S and perform the following operations for each (u, i):

Step (3-1): use formula (3) to produce the preference vector r_u = (r_u1, …, r_un) of user u over all items.

Step (3-2): use formula (13) to normalize the preference vector of user u over the items into probabilities:

p_u = softmax(r_u)    (13)

Step (3-3): the generator aims to fit the user's preferences, so importance sampling is used when training the generator, giving interacted items a larger weight in sampling:

Here, the index g marks the sampling probability used when training the generator, |{s_ui | s_ui = 1}| is the number of items user u has interacted with, and λ is the parameter that controls the importance sampling: the larger its value, the larger the share of probability assigned to interacted items.

Step (3-4): use the Gumbel-Softmax method to differentiably generate an item j from the preference probabilities of user u:

Step (3-5): use formula (7) to compute the score that the discriminator f assigns to item i, which user u has interacted with.

Step (3-6): use formulas (8) and (9) to compute the feature vector and bias of item j in the discriminator f.

Step (3-7): use formula (10) to compute the score that the discriminator f assigns to the generated item j.

Step (3-8): use formula (11) to compute the pairwise loss of the discriminator f for items i and j.

Step (3-9): the generator maximizes the objective function, so the parameters θ of g are updated by gradient ascent:
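
To show how the gradient reaches the generator through the Gumbel-Softmax sample, the sketch below applies the chain rule by hand under several simplifying assumptions that are not from the patent: the generator is reduced to a free score vector r (standing in for r_u) instead of MF parameters, the importance sampling of step (3-3) is omitted, the noise z is held fixed, and s holds precomputed discriminator scores f(k | u) for every item k. Because the discriminator score is linear in j, f(j | u) = Σ_k j_k s_k.

```python
import math


def softmax(x):
    m = max(x)
    e = [math.exp(v - m) for v in x]
    total = sum(e)
    return [v / total for v in e]


def relaxed_sample(r, z, t):
    """p = softmax(r), then j = softmax((log p + z) / t) with fixed noise z."""
    p = softmax(r)
    j = softmax([(math.log(p_k) + z_k) / t for p_k, z_k in zip(p, z)])
    return p, j


def generator_loss(r, z, s, i, t):
    """Pairwise log loss of formula (11) as a function of the generator scores r."""
    _, j = relaxed_sample(r, z, t)
    f_j = sum(j_k * s_k for j_k, s_k in zip(j, s))  # f(j|u), linear in j
    return math.log1p(math.exp(f_j - s[i]))


def generator_grad(r, z, s, i, t):
    """Hand-derived dLoss/dr via the chain rule through both softmax layers."""
    p, j = relaxed_sample(r, z, t)
    f_j = sum(j_k * s_k for j_k, s_k in zip(j, s))
    g = 1.0 / (1.0 + math.exp(-(f_j - s[i])))        # dLoss / df_j
    # Through the outer softmax: dLoss / d(logit_l) = g * j_l * (s_l - f_j).
    dlogit = [g * j_l * (s_l - f_j) for j_l, s_l in zip(j, s)]
    # Through log-softmax and the division by t.
    total = sum(dlogit)
    return [(dl - p_m * total) / t for dl, p_m in zip(dlogit, p)]


r = [0.6, 1.8, 3.0]     # toy generator scores
z = [0.3, -0.1, 0.5]    # fixed illustrative noise, not a fresh Gumbel draw
s = [0.15, -0.4, 0.9]   # toy discriminator scores for all items
i, t, alpha = 0, 0.5, 0.1

grad = generator_grad(r, z, s, i, t)
r_new = [r_k + alpha * g_k for r_k, g_k in zip(r, grad)]  # gradient ASCENT step
```

In this toy case the ascent step raises the pairwise loss by shifting the generator's scores toward the item the discriminator rates highest, which is the sense in which the generator tries to fool the discriminator.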

Step (4): if the model has converged, stop training; otherwise return to step (2).

The training flow chart of the invention is shown in Fig. 3.

Fig. 4 shows the learning curves of the invention on two data sets, illustrating the stability and convergence speed of the model.

In the recommendation stage, for a user u, the generator g uses formula (3) to produce the user's preference vector r_u, sorts the items by it, and recommends the highest-scoring items to the user.
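
The ranking step can be sketched as a top-k selection over r_u. The function name, the parameter k, and the optional exclusion of already interacted items are assumptions added for illustration; the text only states that high-scoring items are recommended.

```python
def recommend(r_u, k, exclude=()):
    """Return the indices of the k highest-scoring items.

    `exclude` optionally removes items from the ranking (for example items the
    user has already interacted with); this filter is an assumption.
    """
    skipped = set(exclude)
    candidates = [idx for idx in range(len(r_u)) if idx not in skipped]
    candidates.sort(key=lambda idx: r_u[idx], reverse=True)
    return candidates[:k]


r_u = [0.6, 1.8, 3.0, -0.5]                  # toy generator scores for 4 items
top_two = recommend(r_u, k=2)                # ranks items 2 and 1 first
top_two_new = recommend(r_u, k=2, exclude=[2])
```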

Those skilled in the art will readily understand that the foregoing describes only preferred embodiments of the invention and is not intended to limit the invention.

Claims

1. A recommendation model based on pairwise adversarial training, characterized in that: the model comprises two parts, a generator and a discriminator; the generator models the user's behavioral preferences and generates a list of items the user likes, and the discriminator judges whether the user likes a given item; based on the assumption that "compared with the items produced by the generator, the discriminator considers that the user prefers the items already interacted with", a pairwise adversarial loss function links the generator and the discriminator; the goal of the discriminator is to minimize the pairwise loss and improve its discriminative ability, and the goal of the generator is to maximize the pairwise loss, fool the discriminator, and improve its ability to model user preferences; a differentiable sampling method and an optimization method based on gradient descent are used to improve the stability of adversarial training.

2. The recommendation model based on pairwise adversarial training according to claim 1, characterized in that: based on the assumption that "compared with the items produced by the generator, the discriminator considers that the user prefers the items already interacted with", a pairwise loss function unifies the generator and the discriminator under the adversarial training framework, where the discriminator minimizes the objective function and the generator maximizes the objective function, namely:

where i is an item that user u has interacted with, j is an item generated by the generator g, f is the discriminator, L is the pairwise loss function, θ and φ are the parameter sets of the generator and the discriminator respectively, p_real denotes the probability distribution of interacted items, and p_θ denotes the probability distribution of items produced by the generator; in the training stage, the generator and the discriminator are optimized alternately.

3. The recommendation model based on pairwise adversarial training according to claim 1, characterized in that: the sampling process of the model is differentiable, the sampling process being:

j = softmax((log p_u + z) / t)

where p_u is the probability distribution over items produced by the generator for user u, z is a noise vector sampled from the Gumbel(0, 1) distribution, and item j is an n-dimensional approximately one-hot vector that represents the item produced by the generator g; as the parameter t approaches 0, j approaches a one-hot vector, and as t tends to positive infinity, j becomes a uniform vector.

4. The recommendation model based on pairwise adversarial training according to claim 1, characterized in that: the feature vector and bias of the generated item j in the discriminator are obtained by a differentiable process:

v_j = jV

b_j = jb

where j ∈ R^{1×n} is the approximately one-hot vector representing the generated item, V ∈ R^{n×d} is the item feature matrix of the discriminator, and b ∈ R^{n×1} is the item bias vector, so that b_j and v_j ∈ R^{1×d} can serve as the bias and feature vector of item j.

5. The recommendation model based on pairwise adversarial training according to claim 1, characterized in that: the model is differentiable end to end, and its parameters are trained alternately with a gradient-based optimization method; the goal of the discriminator is to minimize the pairwise loss, so its parameters φ are updated by gradient descent:

The goal of the generator is to maximize the pairwise loss, so its parameters θ are updated by gradient ascent: