CN109902823A

CN109902823A - Model training method and device based on a generative adversarial network

Info

Publication number: CN109902823A
Application number: CN201811654623.5A
Authority: CN
Inventors: 刘志容; 董振华; 张宇宙; 刘明瑞; 郭贵斌
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2019-06-18
Anticipated expiration: 2038-12-29
Also published as: WO2020135642A1

Abstract

Embodiments of this application provide a model training method and device based on a generative adversarial network. The method includes: a device generates positive counterfeit items and negative counterfeit items for a first user by using a generative model; the device trains on multiple authentic item pairs and multiple counterfeit item pairs to obtain a discriminative model, where the discriminative model is used to distinguish between the authentic item pairs and the counterfeit item pairs, each authentic item pair includes one positive authentic item and one negative authentic item, and each counterfeit item pair includes one positive counterfeit item and one negative counterfeit item; and the device updates the generative model according to the loss function of the discriminative model. The embodiments of this application can improve both the generative capability of the generative model and the discriminative capability of the discriminative model.

Description

Model training method and device based on a generative adversarial network

Technical field

This application relates to the field of big data, and in particular, to a model training method and device based on a generative adversarial network.

Background

As information technology continues to develop, people face an increasingly severe problem of information overload. A personalized recommender system, as an effective information-filtering tool, can provide users with a variety of personalized recommendation services. The information retrieval generative adversarial network (Information Retrieval GAN, IRGAN) applies the generative adversarial network (Generative Adversarial Net, GAN) model to item recommendation. IRGAN trains on input item data to obtain a generative model and a discriminative model: the generative model is responsible for generating counterfeit items that resemble authentic items, and the discriminative model is responsible for distinguishing the generated counterfeit items from authentic samples. The training of the two models is interdependent. In an item recommendation scenario, the generative model generates counterfeit items and scores for items, and the items are then ranked by score to obtain the recommendation result.

Common training methods for IRGAN include the point-wise method and the pair-wise method. The main idea of point-wise training is to convert the recommendation problem into a classification or regression problem: it assumes that a user's preference for each item is independent, and trains on features extracted from items the user may like. The main idea of pair-wise training is to convert the recommendation problem into a binary classification problem: instead of assuming independence between items, it uses an item pair as the minimum training unit, where each pair usually includes one item the user likes and one item the user does not like. At present, pair-wise training performs worse than point-wise training. How to optimize pair-wise training so as to improve the generative capability of the generative model and the discriminative capability of the discriminative model in recommendation scenarios is a technical problem being studied by those skilled in the art.
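To make the point-wise/pair-wise distinction concrete, here is a minimal Python sketch. The patent does not state which losses it uses; the binary cross-entropy (point-wise) and the BPR-style log-sigmoid of a score difference (pair-wise) below are common formulations chosen only for illustration, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def pointwise_loss(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Binary cross-entropy: each (user, item) score is judged on its own,
    as if preferences for different items were independent."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = sigmoid(s)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(scores)

def pairwise_loss(pairs: Sequence[Tuple[float, float]]) -> float:
    """BPR-style loss over (liked score, disliked score) pairs: only the
    score difference inside each pair matters."""
    total = 0.0
    for s_pos, s_neg in pairs:
        total += -math.log(sigmoid(s_pos - s_neg))
    return total / len(pairs)

# Shifting both scores of a pair leaves the pair-wise loss unchanged,
# but changes the point-wise loss.
print(pairwise_loss([(3.0, 1.0)]), pairwise_loss([(10.0, 8.0)]))
print(pointwise_loss([3.0, 1.0], [1, 0]), pointwise_loss([10.0, 8.0], [1, 0]))
```

Because the pair-wise loss sees an item pair as one unit, it only learns the relative order of the liked and disliked item, which is exactly the property the pair-wise method relies on.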

Summary

Embodiments of this application disclose a model training method and device based on a generative adversarial network, which can improve the generative capability of the generative model and the discriminative capability of the discriminative model.

According to a first aspect, an embodiment of this application provides a model training method based on a generative adversarial network. The method includes:

A device generates positive counterfeit items and negative counterfeit items for a first user by using a generative model, where the negative counterfeit items are generated according to the positive counterfeit items, a positive counterfeit item of the first user is an item predicted to be of interest to the first user, and a negative counterfeit item of the first user is an item predicted not to be of interest to the first user. The device trains on multiple authentic item pairs and multiple counterfeit item pairs to obtain a discriminative model, where the discriminative model is used to distinguish between the authentic item pairs and the counterfeit item pairs. Each authentic item pair includes one positive authentic item and one negative authentic item, and each counterfeit item pair includes one positive counterfeit item and one negative counterfeit item. A positive authentic item is an item identified, based on the operation behavior of the first user, as being of interest to the first user, and a negative authentic item is an item identified, based on the operation behavior of the first user, as not being of interest to the first user. The device updates the generative model according to the loss function of the discriminative model.

By performing the foregoing method, the negative counterfeit item in each counterfeit item pair is generated based on the positive counterfeit item, which takes full account of the latent relationship between negative and positive counterfeit items. Counterfeit item pairs therefore carry more information, which improves the training effect and strengthens the generative capability of the generative model. As a result, recommendation results obtained by ranking the items generated by the generative model together with existing authentic items are more valuable to the user.

With reference to the first aspect, in a first possible implementation of the first aspect, after the device updates the generative model according to the loss function of the discriminative model, the method further includes: the device generates scores of counterfeit items by using the updated generative model, where the counterfeit items include the positive counterfeit items and negative counterfeit items generated for the first user; and the device ranks the authentic items and the counterfeit items according to the scores of the counterfeit items and the scores of existing authentic items, and recommends items to the first user in the ranked order. Recommendation results obtained by ranking the items generated by the generative model together with existing authentic items are more valuable to the user.
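The ranking-and-recommendation step can be sketched as a merge of the two score tables followed by a sort. This is a minimal illustration, assuming scores are plain floats keyed by hypothetical item IDs; the function name `recommend` and the tie-breaking rule are assumptions, not taken from the patent.

```python
from typing import Dict, List

def recommend(authentic_scores: Dict[str, float],
              counterfeit_scores: Dict[str, float],
              k: int) -> List[str]:
    """Rank authentic and counterfeit items together by score, highest first,
    and return the top-k item IDs. If an ID appears in both dicts, the
    counterfeit (generator) score is used."""
    merged = {**authentic_scores, **counterfeit_scores}
    ranked = sorted(merged, key=merged.__getitem__, reverse=True)
    return ranked[:k]

print(recommend({"a": 0.9, "b": 0.2}, {"x": 0.7, "y": 0.1}, k=3))  # ['a', 'x', 'b']
```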

With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a second possible implementation of the first aspect, after the device generates the positive and negative counterfeit items for the first user by using the generative model, and before the device trains on the multiple authentic item pairs and multiple counterfeit item pairs to obtain the discriminative model, the method further includes: the device matches each of multiple first positive counterfeit items with one first negative counterfeit item to form the multiple counterfeit item pairs, where the first negative counterfeit items are the top-M scored items among the negative counterfeit items of the first user, M is the number of first positive counterfeit items, and the first positive counterfeit items are positive counterfeit items of the first user sampled from the positive counterfeit items generated by the generative model. In addition, the device matches each of multiple first positive authentic items with one first negative authentic item to form the multiple authentic item pairs, where the first negative authentic items are the top-N scored items among the negative authentic items of the first user, N is the number of first positive authentic items, and the first positive authentic items are positive authentic items sampled from the existing positive authentic items of the first user.

Item pairs, including authentic item pairs and counterfeit item pairs, are formed from high-scoring items. Because high-scoring items are more likely to be of interest to the user, item pairs obtained in this way carry more information and less noise. Training on such item pairs allows the features that interest the user to be fully analyzed, so that a generative model with stronger generative capability is trained.
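The pair-construction step described above can be sketched as follows: sample the first positive items, keep the highest-scored negatives (as many as there are sampled positives), and match them one to one. This is a minimal sketch; the function name `build_pairs` and the choice to match the i-th sampled positive with the i-th best negative are assumptions, since the text only says each positive is matched with one negative.

```python
import random
from typing import Dict, List, Sequence, Tuple

def build_pairs(positives: Sequence[str],
                negative_scores: Dict[str, float],
                num_samples: int,
                rng: random.Random) -> List[Tuple[str, str]]:
    """Pair sampled positive items with the top-scored negative items.

    num_samples plays the role of M (counterfeit pairs) or N (authentic
    pairs). Requires len(positives) >= num_samples and
    len(negative_scores) >= num_samples.
    """
    if num_samples > len(negative_scores):
        raise ValueError("not enough negative items to pair")
    first_positives = rng.sample(list(positives), num_samples)
    # Top-M (or top-N) negatives, highest score first.
    first_negatives = sorted(negative_scores, key=negative_scores.__getitem__,
                             reverse=True)[:num_samples]
    # Hypothetical one-to-one matching: i-th sampled positive with i-th best negative.
    return list(zip(first_positives, first_negatives))

rng = random.Random(0)
print(build_pairs(["p1", "p2", "p3"], {"n1": 0.1, "n2": 0.9, "n3": 0.5}, 2, rng))
```

The same function serves both pair types; only the source of positives (sampled generator output versus sampled existing positives) and the score table differ.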

With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a third possible implementation of the first aspect, the initial generative model includes a positive generative model, a negative generative model, and a score generative model; and that the device generates positive and negative counterfeit items for the first user by using the generative model includes:

The device generates the distribution of positive counterfeit items of the first user by using the positive generative model, where the positive generative model is: [formula not reproduced in the source text]

The device generates the distribution of negative counterfeit items of the first user by using the negative generative model, where the negative generative model is: [formula not reproduced in the source text]

The device generates a score for each positive counterfeit item and each negative counterfeit item by using the score generator.

Here, g⁺(f⁺|u) is the distribution of positive counterfeit items; e_u is the embedding vector of the first user; the embedding of the positive counterfeit item to be generated is denoted by a symbol not reproduced in the source text; e_i is the embedding of the i-th positive counterfeit item; b is the bias of the first user; g⁻(f⁻|u, f⁺) is the distribution of negative counterfeit items; and the embedding of the negative counterfeit item to be generated is likewise denoted by a symbol not reproduced in the source text.
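The formulas for the positive and negative generative models are not reproduced in the source text, so the following is only a hedged sketch of one plausible form consistent with the listed symbols. It assumes g⁺ is a softmax over inner products e_u·e_i + b, and that g⁻ is conditioned on the sampled positive by favouring items close to f⁺ but far from the user; both forms, and every function name, are hypothetical illustrations rather than the patent's formulas.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - np.max(logits)        # numerical stability
    p = np.exp(z)
    return p / p.sum()

def positive_distribution(e_u, item_emb, b=0.0):
    """Hypothetical g+(f+|u): softmax over inner products e_u . e_i + b.
    A scalar b shared by all candidates does not change a softmax; it is kept
    only to mirror the symbols listed in the text."""
    return softmax(item_emb @ e_u + b)

def negative_distribution(e_u, pos_index, item_emb, b=0.0):
    """Hypothetical g-(f-|u, f+): favour items close to the sampled positive
    f+ but far from the user's own embedding; f+ itself is excluded."""
    logits = item_emb @ (item_emb[pos_index] - e_u) + b
    logits[pos_index] = -np.inf
    return softmax(logits)

def sample_counterfeit_pair(e_u, item_emb, rng, b=0.0):
    """Sample a positive counterfeit item, then a negative one conditioned on it."""
    i = rng.choice(len(item_emb), p=positive_distribution(e_u, item_emb, b))
    j = rng.choice(len(item_emb), p=negative_distribution(e_u, i, item_emb, b))
    return int(i), int(j)

rng = np.random.default_rng(0)
item_emb = rng.normal(size=(6, 4))   # 6 candidate items, 4-dim embeddings
e_u = rng.normal(size=4)
print(sample_counterfeit_pair(e_u, item_emb, rng))
```

The point the sketch illustrates is the dependency structure: the negative item is drawn only after, and as a function of, the positive item, which is what lets a counterfeit pair encode the relationship between its two members.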

With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a fourth possible implementation of the first aspect, that the device updates the generative model according to the loss function of the discriminative model includes: the device determines the attention index of the first user for items, where the attention index is obtained by training on the authentic item scores and counterfeit item scores of the first user with an attention network; the device obtains a reward value reward according to the loss function of the discriminative model, and optimizes the reward value reward by using the attention index of the first user for items to obtain a new reward value; and the device updates the generative model by using the new reward value.

Item pairs differ in importance. Introducing an attention network yields an importance weight for each item pair, which makes it possible to select good item pairs and reduce the negative effect of poor-quality ones, so that the resulting generative and discriminative models are more robust and adaptive. The item pairs here may be authentic item pairs or counterfeit item pairs.

With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a fifth possible implementation of the first aspect, that the device determines the attention index of the first user for items includes:

The device calculates the attention index of the first user for items by using the attention network according to the following formula:

α = softmax(g(r⁺, r⁻, f⁺, f⁻ | u))

Here, α is the attention index of the first user u for items. The expression for g is not reproduced in the source text; its parameters are: w_u, the trained weight of the first user; the trained weights of the first user's positive authentic item, negative authentic item, positive counterfeit item, and negative counterfeit item (their symbols are not reproduced in the source text); and b, the bias of the first user.
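Because the expression for g is not reproduced, the following is only a hedged sketch of how α = softmax(g(r⁺, r⁻, f⁺, f⁻ | u)) could be computed over a user's item pairs. It assumes g is a one-hidden-layer network with a ReLU and a projection vector h, and treats b as a hidden-layer bias vector; the parameter names `W_u`, `W_rp`, `W_rn`, `W_fp`, `W_fn`, `b`, `h` are hypothetical stand-ins for the weights listed above. A nonlinearity is used because a purely linear g would add the same user term to every pair, and that term would cancel in the softmax.

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)
    p = np.exp(z)
    return p / p.sum()

def attention_weights(e_u, r_pos, r_neg, f_pos, f_neg, params):
    """Hypothetical alpha = softmax(g(r+, r-, f+, f- | u)) over a user's P pairs.

    e_u: (d,) user embedding; r_pos, r_neg, f_pos, f_neg: (P, d) embeddings of
    the items in each pair. g is assumed to be a one-hidden-layer network:
    h . ReLU(e_u W_u + r+ W_r+ + r- W_r- + f+ W_f+ + f- W_f- + b).
    """
    hidden = np.maximum(
        0.0,
        e_u @ params["W_u"]            # (k,), broadcast over all pairs
        + r_pos @ params["W_rp"] + r_neg @ params["W_rn"]
        + f_pos @ params["W_fp"] + f_neg @ params["W_fn"]
        + params["b"],
    )
    return softmax(hidden @ params["h"])   # one weight per pair, sums to 1

rng = np.random.default_rng(0)
d, k, P = 4, 8, 3
params = {name: rng.normal(size=(d, k)) for name in ["W_u", "W_rp", "W_rn", "W_fp", "W_fn"]}
params["b"] = np.zeros(k)
params["h"] = rng.normal(size=k)
pairs = [rng.normal(size=(P, d)) for _ in range(4)]
alpha = attention_weights(rng.normal(size=d), *pairs, params)
print(alpha)
```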

With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a sixth possible implementation of the first aspect, that the reward value reward is optimized by using the attention index of the first user for items to obtain the new reward value includes: optimizing the reward value reward by using the attention index α of the first user for items to obtain the reward value reward_1 corresponding to the first user, where α, reward, and reward_1 satisfy reward_1 = α * reward; and determining the new reward value according to reward_1.
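The reward shaping reward_1 = α * reward, followed by a generator update, can be sketched as below. The patent does not specify the update rule; this sketch assumes a REINFORCE-style policy-gradient step (the usual way GAN generators over discrete items are updated) on the user embedding of a softmax policy over items. The function names and the learning rate are hypothetical, and the per-pair rewards are taken as given inputs rather than derived from a particular discriminator loss.

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)
    p = np.exp(z)
    return p / p.sum()

def shaped_rewards(alpha, reward):
    """reward_1 = alpha * reward, element-wise (one entry per item pair)."""
    return np.asarray(alpha) * np.asarray(reward)

def reinforce_step(e_u, item_emb, sampled, rewards, lr=0.01):
    """One REINFORCE ascent step on the user embedding of a generator whose
    policy is softmax(item_emb @ e_u). For that policy,
    grad_{e_u} log p(i) = e_i - sum_j p(j) e_j."""
    p = softmax(item_emb @ e_u)
    expected = p @ item_emb
    grad = np.zeros_like(e_u)
    for i, r in zip(sampled, rewards):
        grad += r * (item_emb[i] - expected)
    return e_u + lr * grad / len(sampled)

rng = np.random.default_rng(0)
item_emb = rng.normal(size=(5, 3))
e_u = rng.normal(size=3)
reward_1 = shaped_rewards([0.7, 0.3], [1.0, 1.0])   # alpha from the attention net
new_e_u = reinforce_step(e_u, item_emb, sampled=[0, 0], rewards=reward_1)
print(softmax(item_emb @ e_u)[0], softmax(item_emb @ new_e_u)[0])
```

Weighting by α means that pairs the attention network considers unimportant contribute proportionally less to the generator's gradient, which is the intended effect of reducing the influence of poor-quality pairs.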

According to a second aspect, an embodiment of this application provides a model training device based on a generative adversarial network. The device includes:

A generative model, configured to generate positive counterfeit items and negative counterfeit items for a first user, where the negative counterfeit items are generated according to the positive counterfeit items, a positive counterfeit item of the first user is an item predicted to be of interest to the first user, and a negative counterfeit item of the first user is an item predicted not to be of interest to the first user;

A training module, configured to train on multiple authentic item pairs and multiple counterfeit item pairs to obtain a discriminative model, where the discriminative model is used to distinguish between the authentic item pairs and the counterfeit item pairs; each authentic item pair includes one positive authentic item and one negative authentic item, and each counterfeit item pair includes one positive counterfeit item and one negative counterfeit item; a positive authentic item is an item identified, based on the operation behavior of the first user, as being of interest to the first user, and a negative authentic item is an item identified, based on the operation behavior of the first user, as not being of interest to the first user;

The training module is further configured to update the generative model according to the loss function of the discriminative model.

By running the foregoing units, the negative counterfeit item in each counterfeit item pair is generated based on the positive counterfeit item, which takes full account of the latent relationship between negative and positive counterfeit items. Counterfeit item pairs therefore carry more information, which improves the training effect and strengthens the generative capability of the generative model. As a result, recommendation results obtained by ranking the items generated by the generative model together with existing authentic items are more valuable to the user.

With reference to the second aspect, in a first possible implementation of the second aspect, the device further includes a recommendation module, where:

After the training module updates the generative model according to the loss function of the discriminative model, the updated generative model is configured to generate scores of counterfeit items, where the counterfeit items include the positive counterfeit items and negative counterfeit items generated for the first user;

The recommendation module is configured to rank the authentic items and the counterfeit items according to the scores of the counterfeit items and the scores of existing authentic items, and recommend items to the first user in the ranked order.

Recommendation results obtained by ranking the items generated by the generative model together with existing authentic items are more valuable to the user.

With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a second possible implementation of the second aspect, after the generative model generates positive and negative counterfeit items for the first user and before the training module trains on the multiple authentic item pairs and multiple counterfeit item pairs to obtain the discriminative model, the training module is further configured to:

match each of multiple first positive counterfeit items with one first negative counterfeit item to form the multiple counterfeit item pairs, where the first negative counterfeit items are the top-M scored items among the negative counterfeit items of the first user, M is the number of first positive counterfeit items, and the first positive counterfeit items are positive counterfeit items of the first user sampled from the positive counterfeit items generated by the generative model;

match each of multiple first positive authentic items with one first negative authentic item to form the multiple authentic item pairs, where the first negative authentic items are the top-N scored items among the negative authentic items of the first user, N is the number of first positive authentic items, and the first positive authentic items are positive authentic items sampled from the existing positive authentic items of the first user.

With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a third possible implementation of the second aspect, the initial generative model includes a positive generative model, a negative generative model, and a score generative model; and that the generative model is configured to generate positive and negative counterfeit items for the first user specifically means that it is configured to:

generate the distribution of positive counterfeit items of the first user by using the positive generative model, where the positive generative model is: [formula not reproduced in the source text]

generate the distribution of negative counterfeit items of the first user by using the negative generative model, where the negative generative model is: [formula not reproduced in the source text]

generate a score for each positive counterfeit item and each negative counterfeit item by using the score generator.

With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a fourth possible implementation of the second aspect, that the training module is configured to update the generative model according to the loss function of the discriminative model specifically means that it is configured to:

determine the attention index of the first user for items, where the attention index is obtained by training on the authentic item scores and counterfeit item scores of the first user with an attention network;

obtain a reward value reward according to the loss function of the discriminative model, and optimize the reward value reward by using the attention index of the first user for items to obtain a new reward value;

update the generative model by using the new reward value.

With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a fifth possible implementation of the second aspect, that the training module determines the attention index of the first user for items specifically means that it is configured to:

calculate the attention index of the first user for items by using the attention network according to the following formula:

α = softmax(g(r⁺, r⁻, f⁺, f⁻ | u))

With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a sixth possible implementation of the second aspect, that the reward value reward is optimized by using the attention index of the first user for items to obtain the new reward value specifically means:

optimizing the reward value reward by using the attention index α of the first user for items to obtain the reward value reward_1 corresponding to the first user, where α, reward, and reward_1 satisfy reward_1 = α * reward;

determining the new reward value according to reward_1.

According to a third aspect, an embodiment of this application provides a device. The device includes a processor and a memory, where the memory is configured to store program instructions and the sample data required for model training, and the processor is configured to invoke the program instructions to perform the method according to the first aspect or any possible implementation of the first aspect.

According to a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing program instructions. When the program instructions are run on a processor, the method according to the first aspect or any possible implementation of the first aspect is implemented.

Brief description of drawings

The following describes the accompanying drawings used in the embodiments of this application.

Fig. 1A is a schematic diagram of an application scenario according to an embodiment of this application;

Fig. 1B is a schematic diagram of another application scenario according to an embodiment of this application;

Fig. 1C is a schematic diagram of still another application scenario according to an embodiment of this application;

Fig. 1D is a schematic structural diagram of a device according to an embodiment of this application;

Fig. 2 is a schematic diagram of a processor processing flow according to an embodiment of this application;

Fig. 3 shows a model training method based on a generative adversarial network according to an embodiment of this application;

Fig. 4 is a schematic flowchart of training a discriminative model according to an embodiment of this application;

Fig. 5 is a schematic diagram of a scenario of an attention mechanism according to an embodiment of this application;

Fig. 6 is a schematic flowchart of training a generative model according to an embodiment of this application;

Fig. 7 is a schematic diagram of a scenario of jointly training the discriminative model and the generative model according to an embodiment of this application;

Fig. 8 is a schematic structural diagram of a device according to an embodiment of this application.

Detailed description of embodiments

The following describes the embodiments of this application with reference to the accompanying drawings.

The goal of a recommender system is to accurately predict how much a user likes particular commodities. The recommendation effect of a recommender system affects not only the user experience but also, directly, the revenue of the recommendation platform, so accurate recommendation is of great significance.

The following briefly introduces the recommendation principle and goal of a recommender system with reference to Table 1.

Table 1

User \ Item	101	102	103	104	105	106
A	5	3	2.5	?	?	?
B	2	2.5	5	2	?	?
C	2	?	?	4	4.5	5
D	5	?	3	4.5	?	4
E	4	3	2	4	3.5	4

The users shown in Table 1 are user A, user B, user C, user D, and user E, and the items shown are items 101, 102, 103, 104, 105, and 106. Table 1 also shows each user's scores for the items: the higher a user's score for an item, the stronger the user's preference for that item. For example, user A's score for item 101 is 5, indicating that user A likes item 101 very much. A question mark in Table 1 indicates that the user has not yet scored the item, and the goal of the recommender system is to predict how much users like the items they have not evaluated. For example, the scores of user A for items 104, 105, and 106 need to be predicted, the scores of user B for items 105 and 106 need to be predicted, and so on. After computation by its recommendation algorithm, the recommender system can fill in the missing scores, as shown in Table 2. If the recommender system intends to recommend a new item to user A, item 106 may be a good choice: its predicted score of 5 is higher than the predicted scores of user A's other unrated items (2 for item 104 and 4 for item 105), so user A is very likely to like item 106.

Table 2

User \ Item	101	102	103	104	105	106
A	5	3	2.5	2	4	5
B	2	2.5	5	2	2	4
C	2	4	3	4	4.5	5
D	5	3	3	4.5	3	4
E	4	3	2	4	3.5	4
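The recommendation rule illustrated by Tables 1 and 2 (fill in the missing scores, then recommend the unrated item with the highest predicted score) can be checked with a short Python sketch. The rows for users A, B, and C are copied from the tables; the dictionary layout and the function name `recommend_unrated` are illustrative choices, not part of the patent.

```python
from typing import Dict, Optional

# Rows for users A, B and C from Table 1 (None marks "?") and Table 2.
observed: Dict[str, Dict[int, Optional[float]]] = {
    "A": {101: 5, 102: 3, 103: 2.5, 104: None, 105: None, 106: None},
    "B": {101: 2, 102: 2.5, 103: 5, 104: 2, 105: None, 106: None},
    "C": {101: 2, 102: None, 103: None, 104: 4, 105: 4.5, 106: 5},
}
predicted: Dict[str, Dict[int, float]] = {
    "A": {101: 5, 102: 3, 103: 2.5, 104: 2, 105: 4, 106: 5},
    "B": {101: 2, 102: 2.5, 103: 5, 104: 2, 105: 2, 106: 4},
    "C": {101: 2, 102: 4, 103: 3, 104: 4, 105: 4.5, 106: 5},
}

def recommend_unrated(user: str) -> int:
    """Return the item the user has not scored yet with the highest predicted score."""
    unrated = [item for item, score in observed[user].items() if score is None]
    return max(unrated, key=lambda item: predicted[user][item])

for user in observed:
    print(user, recommend_unrated(user))   # A 106, B 106, C 102
```

Only unrated items are candidates, which is why user C is recommended item 102 (predicted 4) even though C's highest overall score is for item 106, which C has already rated.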

The GAN-based model training method proposed in the embodiments of this application can train a generative model with better performance. Therefore, when items are recommended based on the scores that the generative model assigns to counterfeit items, a better recommendation effect can be obtained.

The GAN-based model training method in the embodiments of this application can be applied in many scenarios, for example, ad click prediction, recommendation of the top-N items of interest, and prediction of the most relevant answer to a question. These are described below by way of example.

In an advertisement recommendation scenario, the ad recommender system needs to return one or more ranked ad lists to show to the user. The embodiments of this application can predict which ads users prefer, thereby improving the ad click-through rate. This application can form authentic item pairs from ads the user clicked and ads the user did not click, where a clicked ad corresponds to a positive authentic item and a non-clicked ad corresponds to a negative authentic item. Using IRGAN technology, the generative model generates counterfeit item pairs, and the discriminative model tries its best to determine which pairs are generated and which are real. Through IRGAN adversarial training, the user's click probability for each ad (equivalent to a score for the item) can be estimated. As shown in Fig. 1A, training on the user's historical behavior data for ads by using the GAN-based model training method yields a predicted click probability of the user for each ad.
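Forming authentic item pairs from a click log can be sketched as below. This assumes each impression is recorded as a hypothetical `(ad_id, clicked)` tuple and that every clicked ad is paired with every shown-but-not-clicked ad; in practice the pairs would more likely be sampled, for example with the top-N pairing described for authentic item pairs earlier in this document.

```python
from itertools import product
from typing import List, Sequence, Tuple

def authentic_pairs(impressions: Sequence[Tuple[str, bool]]) -> List[Tuple[str, str]]:
    """Build (positive, negative) authentic pairs for one user from an ad log.

    impressions: (ad_id, clicked) records shown to the user. Every clicked ad
    is paired with every ad that was shown but not clicked.
    """
    clicked = [ad for ad, was_clicked in impressions if was_clicked]
    skipped = [ad for ad, was_clicked in impressions if not was_clicked]
    return list(product(clicked, skipped))

log = [("ad1", True), ("ad2", False), ("ad3", True), ("ad4", False)]
print(authentic_pairs(log))
```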

In a top-N item recommendation scenario, the N items the user is most interested in need to be recommended to the user to encourage consumption, where the items may be e-commerce products, apps in an app market, and the like. This application can form authentic item pairs from items the user has consumed or downloaded and scored highly and items the user has consumed and scored low, where a highly scored item corresponds to a positive authentic item and a low-scored item corresponds to a negative authentic item. Using IRGAN technology, the generative model generates counterfeit item pairs, and the discriminative model tries its best to determine which pairs are generated and which are real. Through IRGAN adversarial training, the user's rating of each item can be estimated, which is equivalent to a score for the item. As shown in Fig. 1B, training on the user's historical behavior data for items by using the GAN-based model training method yields a ranking of the user's degree of interest in each item, so that the top-N items of interest can be output to the user.

In a question-answering scenario, the question-answering system needs to provide answers to the user's questions that meet the user's expectations as closely as possible, to improve the user-friendliness of the system. This application can form authentic item pairs from answers the user accepted and scored highly and answers the user accepted and scored low, where a highly scored answer corresponds to a positive authentic item and a low-scored answer corresponds to a negative authentic item. Using IRGAN technology, the generative model generates counterfeit item pairs, and the discriminative model tries its best to determine which pairs are generated and which are real. Through IRGAN adversarial training, the user's rating of each answer can be estimated, which is equivalent to a score for the item. As shown in Fig. 1C, training on the user's historical behavior data for questions and answers by using the GAN-based model training method yields a ranking of the user's satisfaction with each answer, so that the N answers with relatively high satisfaction can be output to the user.

It is introduced below with reference to Fig. 1 D to the equipment based on the model training method for generating confrontation network is executed.

D referring to Figure 1, Fig. 1 D are a kind of structural schematic diagrams of equipment provided by the embodiments of the present application, the equipment for pair Article is classified, which can be the cluster that an equipment, such as server or several equipment are constituted, below The structure of the equipment is simply introduced so that the equipment is a server as an example.The equipment 10 include processor 101, Memory 102 and communication interface 103, the processor 101, memory 102 and communication interface 103 are connected with each other by bus, Wherein:

The communication interface 103 is configured to obtain data of existing items, for example, item identifiers, scores, and information about the users who scored the items. Optionally, the communication interface 103 may establish communication connections with other devices, so that it can receive existing-item data sent by other devices or read existing-item data from them. Optionally, the communication interface 103 may connect to an external readable storage medium and read existing-item data from it. The communication interface 103 may also obtain existing-item data in other ways.

The memory 102 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM). The memory 102 stores the relevant program instructions and related data. The related data may include the data obtained through the communication interface 103, as well as new data and models generated by processing that data, results predicted by the models, and so on. The data may also be called samples.

The processor 101 may be one or more central processing units (CPUs); when the processor 101 is one CPU, it may be a single-core or a multi-core CPU. The processor 101 reads and executes the program stored in the memory 102 to perform the operations involved in the GAN-based model training method, for example training the discrimination model, training the generation model, and predicting item scores. Referring to Fig. 2, Fig. 2 illustrates the general execution flow of the processor. Information such as the existing items, the users who rated them, and the users' scores for the items is input into an initial discrimination model 201, where the information about an existing item may include its item identifier (ID) and the information about a rating user may include a user ID. The generation model 202 also generates fake items and inputs their related information into the initial discrimination model 201 to train it. The discrimination model 201 and the generation model 202 are trained adversarially, eventually yielding a discrimination model 201 that is highly capable of distinguishing real samples from fake samples, and a generation model 202 whose generated fake items are very close to real items. The generation model 202 then generates fake items and their scores. Next, ranking prediction 203 ranks the items of any given user according to the scores of all that user's items, producing an item recommendation list for that user; optionally, the ranked items include both real items and fake items. In this embodiment, the discrimination model 201 includes a discriminator and an attention network: the discriminator distinguishes real items from fake items, and the attention network records the attention weights that different users give to real items and fake items, providing a reference for the generation model. The generation model 202 includes an item generator and a score generator: the item generator generates fake items, and the score generator generates scores for the fake items. The item generator is further divided into a positive generator, which generates positive fake items, and a negative generator, which generates negative fake items. The item generator uses a dynamic sampling technique for sampling.

Optionally, the device may further include output components, such as a display or speakers, which show developers the parameters used to train the models. Developers can thus see these parameters and modify them, and enter the modified parameters into the device 10 through an input component, such as a mouse or keyboard. In addition, the device 10 can show developers the trained models and the model-predicted results through the output components.

The GAN-based model training method of an embodiment of this application is described in more detail below with reference to Fig. 3.

Referring to Fig. 3, Fig. 3 shows a GAN-based model training method according to an embodiment of this application. The method may be implemented on the device 10 shown in Fig. 1D or on other architectures, and includes the following steps:

Step S301: The device generates fake items for a first user by using the generation model.

Specifically, embodiments of this application involve real items and fake items. Fake items include positive fake items and negative fake items, and real items include positive real items and negative real items. Each of the multiple users has their own positive fake items, negative fake items, positive real items, and negative real items. For any user: a positive real item is an item the user has interacted with and paid relatively close attention to; a negative real item is an item the user has interacted with but not paid attention to; a positive fake item is an item the user has not interacted with and is predicted to pay relatively close attention to; and a negative fake item is an item the user has not interacted with and is predicted not to pay attention to. The first user in this embodiment is one of the multiple users. For ease of understanding, the first user is used as the example here; the characteristics of other users follow the same description.

The first user's interactions with items displayed on a terminal include downloading, rating, clicking, and browsing. These behaviors can be recorded by the terminal, and the corresponding items can be scored according to them; for example, the score may be given by the user directly, or computed by the terminal or the above device from the user's behavior data. The score measures how much attention the user pays to the item. A user's items with interactions can be divided into positive real items and negative real items according to their scores. For example, if scores range from 1 to 5, items scored 4 to 5 can be defined as the user's positive real items, and items scored 1 to 3 as the user's negative real items. The items here may be applications (apps), advertisements, videos, songs, answers in a question answering system, and so on.
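As a minimal sketch of the score-based split described above, the following Python function divides one user's rated items into positive and negative real items. The threshold parameter `pos_min` and the sample item IDs are illustrative assumptions; the text gives the 1-5 scale with 4-5 as positive and 1-3 as negative only as one example.

```python
def split_real_items(ratings, pos_min=4.0):
    """Partition one user's rated items into positive and negative real items.

    ratings: dict mapping item ID -> score on an assumed 1-5 scale.
    pos_min: hypothetical cut-off; scores >= pos_min (4-5 in the example)
    are positive real items, lower scores (1-3) are negative real items.
    """
    positives = [item for item, s in ratings.items() if s >= pos_min]
    negatives = [item for item, s in ratings.items() if s < pos_min]
    return positives, negatives


pos, neg = split_real_items({"app_a": 5, "app_b": 2, "video_c": 4, "song_d": 3})
print(pos, neg)  # ['app_a', 'video_c'] ['app_b', 'song_d']
```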

The positive fake items that the generation model generates for the first user are items predicted to be paid attention to by the first user, and the negative fake items it generates for the first user are items predicted not to be paid attention to by the first user. For example, if the generation model generates for the first user comedy movies 1, 2, and 3, which the first user may pay attention to, and horror movies 1, 2, and 3, which the first user may not pay attention to, then comedy movies 1 to 3 are the first user's positive fake items and horror movies 1 to 3 are the first user's negative fake items. The generation model can also generate scores for these six movies; the generated scores are predicted scores that indicate how much the first user likes each movie. The generation model generates positive and negative fake items for other users on the same principle as for the first user. The positive and negative fake items of different users may or may not be the same, and their scores may also be the same or different. The generation model is introduced below.

Specifically, the goal of the generation model is to generate fake item pairs whose relevance distribution approximates that of real item pairs as closely as possible, where a fake item pair includes one positive fake item and one negative fake item, and a real item pair includes one positive real item and one negative real item. The relevance distribution of the generated fake item pairs is given by formula (1):

$$G(f \mid u) = G\big((f^{+}, f^{-}) \mid u\big) = g^{+}(f^{+} \mid u) \cdot g^{-}(f^{-} \mid u, f^{+}) \quad (1)$$

In formula (1), $f$ denotes the generated fake item pair, $f^{+}$ the generated positive fake item, and $f^{-}$ the generated negative fake item. The generation model can be divided into two submodels, a positive generator and a negative generator: $g^{+}$ denotes the positive generator, $g^{-}$ the negative generator, and $u$ the first user. The positive generator $g^{+}$ generates the distribution of the first user $u$'s positive fake items, and the negative generator $g^{-}$ generates the distribution of the first user's negative fake items based on the positive fake item generated by $g^{+}$. The distribution of the positive fake items generated by $g^{+}$ is given by formula (2):

In formula (2), $e_u$ denotes the embedding vector of the first user, $e_{f^{+}}$ the embedding of the positive fake item, $e_i$ the embedding of the i-th positive fake item, and $b$ the bias of the first user. In this embodiment, the embeddings and the bias may be assigned default values at the first initial training, and they are usually updated after each round of training.
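Formula (2) itself is not reproduced in this text. Purely as an illustration, the sketch below assumes a softmax over the scores $e_u \cdot e_i + b$, a common way to turn embedding scores into a sampling distribution; the function name `positive_generator` and the toy embeddings are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def positive_generator(e_u, item_embs, b):
    """Assumed softmax form of g+(f+ | u) over candidate items.

    Candidate i gets the logit e_u . e_i + b; a softmax turns the logits
    into the probability of sampling i as the user's positive fake item.
    """
    logits = [dot(e_u, e_i) + b for e_i in item_embs]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


probs = positive_generator([1.0, 0.0], [[2.0, 0.0], [0.0, 1.0]], b=0.1)
```

Here the logits are 2.1 and 0.1, so the first candidate is sampled with probability $1/(1+e^{-2}) \approx 0.88$. Note that under this assumed softmax form a single per-user bias $b$ shifts every logit equally and cancels out of the probabilities, so the actual formula (2) may use $b$ differently.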

In this embodiment, the positive fake items and negative fake items generated by the generation model are expected to have some latent relationship, so a negative fake item is generated after the positive fake item. For example, the negative generator computes the relationship between the positive fake item and the negative fake item by means of an inner product, so that the distribution of the generated negative fake items is given by formula (3):

In formula (3), $e_{f^{-}}$ is the embedding of the negative fake item to be generated. For example, if a user likes comedies and dislikes horror films, the device will generally learn this "opposing" relationship between comedies and horror films. Therefore, after a comedy is generated for the user as a positive fake item by formula (2), a film whose genre is opposed to comedy, here a horror film, is likely to be generated as the negative fake item, and another comedy is unlikely to be generated as the negative fake item. The horror film serving as the negative fake item is generated based on the previously generated positive fake item (the comedy) rather than independently, which reflects the dependence of the negative fake item on the positive fake item.
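Formula (3) is likewise not reproduced. The sketch below illustrates only the dependence described above: the negative generator scores each candidate by an inner product with the already chosen positive fake item, and formula (1) multiplies the two probabilities. The negated inner product, the masking of the positive item, the omission of $e_u$, and the toy comedy/horror/drama embeddings are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # exp(-inf) == 0.0 masks an entry
    total = sum(exps)
    return [x / total for x in exps]


def negative_generator(item_embs, pos_idx):
    """Hypothetical g-(f- | u, f+): the logit of candidate j is the NEGATED inner
    product with the chosen positive item, so items pointing away from f+
    are the likeliest negatives. The positive item itself is masked out."""
    e_pos = item_embs[pos_idx]
    logits = [float("-inf") if j == pos_idx else -dot(e_pos, e_j)
              for j, e_j in enumerate(item_embs)]
    return softmax(logits)


def pair_probability(g_plus, item_embs, pos_idx, neg_idx):
    """Formula (1): G((f+, f-) | u) = g+(f+ | u) * g-(f- | u, f+)."""
    return g_plus[pos_idx] * negative_generator(item_embs, pos_idx)[neg_idx]


# Toy 2-D embeddings: comedy and horror point in opposite directions.
items = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]  # comedy, horror, drama
g_minus = negative_generator(items, pos_idx=0)  # positive fake item = comedy
```

With comedy chosen as the positive fake item, horror (the opposite direction) receives probability about 0.73 and drama about 0.27, in line with the intuition of the comedy/horror example.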

It can be understood that a series of positive fake items and negative fake items can be generated in this way. The device then uses a score generation model to generate a score for each generated positive fake item and negative fake item. Optionally, the score generation model may generate scores as shown in formula (4):

$$r_{u,t} = e_u \cdot e_t + b \quad (4)$$

In formula (4), $r_{u,t}$ denotes the generated score of the first user for the t-th fake item, and $e_t$ is the embedding of the t-th fake item t.
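Formula (4) is simple enough to state directly in code; the embedding values and bias below are made up for illustration.

```python
def score(e_u, e_t, b):
    """Formula (4): r_{u,t} = e_u . e_t + b (inner product plus the user bias)."""
    return sum(x * y for x, y in zip(e_u, e_t)) + b


r = score([0.5, 1.0], [2.0, 0.5], b=0.1)  # 0.5*2.0 + 1.0*0.5 + 0.1 ≈ 1.6
```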

In this embodiment, after a series of positive fake items with their scores and a series of negative fake items with their scores are generated in this way, some positive fake items are sampled from the generated positive fake items and some negative fake items are sampled from the generated negative fake items. The sampled positive and negative fake items form multiple fake item pairs, each consisting of one positive fake item and one negative fake item of the first user. The multiple fake item pairs can be generated as follows:

The device matches a first positive fake item with one negative fake item to form a fake item pair, where the negative fake item is one of the top M negative fake items by score among all negative fake items of the first user, M is the number of all positive fake items of the first user, the first positive fake item is any sampled positive fake item of the first user among the generated positive fake items, and M is a positive integer. Optionally, for a sampled positive fake item, the highest-scoring negative fake item is taken from the generated negative fake items to form a fake item pair with it, and that negative fake item is then removed from the sampling pool. Then, for the next sampled positive fake item, the highest-scoring remaining negative fake item is taken to form another fake item pair, and so on, until every sampled positive fake item is matched with one negative fake item, yielding multiple fake item pairs. An example implementation is illustrated schematically below:
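A minimal sketch of the greedy matching described above, assuming the sampled positive items and the scored negative items are already available. The same routine applies unchanged to real item pairs (with N in place of M). The function name `build_pairs` and the item IDs are hypothetical.

```python
def build_pairs(positives, negative_scores):
    """Greedily pair each sampled positive item with a negative item.

    positives: sampled positive item IDs, in sampling order (M of them).
    negative_scores: dict mapping each negative item ID -> its score.
    Taking the best remaining negative for each positive and then removing it
    from the pool is the same as zipping the positives with the top-M
    negatives in descending score order (ties keep dict insertion order).
    """
    pool = sorted(negative_scores, key=negative_scores.get, reverse=True)
    if len(pool) < len(positives):
        raise ValueError("not enough negative items to pair every positive")
    return list(zip(positives, pool[: len(positives)]))


pairs = build_pairs(["pos_1", "pos_2"], {"neg_1": 0.5, "neg_2": 1.1, "neg_3": 1.0})
print(pairs)  # [('pos_1', 'neg_2'), ('pos_2', 'neg_3')]
```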

Optionally, the device matches a first positive real item with one negative real item to form a real item pair, where the negative real item is one of the top N negative real items by score among all negative real items of the first user, N is the number of all positive real items of the first user, the first positive real item is any sampled positive real item of the first user among the existing positive real items, and N is a positive integer. Optionally, for a sampled positive real item, the highest-scoring negative real item is taken from the existing negative real items to form a real item pair with it, and that negative real item is then removed from the sampling pool. Then, for the next sampled positive real item, the highest-scoring remaining negative real item is taken to form another real item pair, and so on, until every sampled positive real item is matched with one negative real item, yielding multiple real item pairs.

Step S302: The device trains on the multiple real item pairs and the multiple fake item pairs, with the goal of minimizing a loss function, to obtain the discrimination model.

Specifically, the trained discrimination model is given by formula (5):

In formula (5), $v$ may be $r$ or $f$. When $v$ is $f$, $p(f \mid u)$ denotes the distribution of the fake item pairs generated by the generation model, $e_u$ the embedding of the first user, $e_{f^{+}}$ the embedding of the positive fake item, $e_{f^{-}}$ the embedding of the negative fake item, and $b$ the bias of the first user. When $v$ is $r$, $p(r \mid u)$ denotes the distribution of the real item pairs sampled from the real items, $e_u$ the embedding of the first user, $e_{r^{+}}$ the embedding of the positive real item, $e_{r^{-}}$ the embedding of the negative real item, and $b$ the bias of the first user. The discrimination model is responsible for distinguishing the distribution of the fake item pairs from the distribution of the real item pairs. It can be optimized with the cross-entropy loss function (6), which gives it a stronger ability to tell real items from fake items.

$$D(r, f \mid u) = \mathrm{cross\_entropy}\big(p(r \mid u),\, p(f \mid u)\big) \quad (6)$$
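Formula (5) is not reproduced in this text, so the sketch below assumes a common pairwise form, $p(v \mid u) = \sigma(e_u \cdot e_{v^{+}} - e_u \cdot e_{v^{-}} + b)$, and reads formula (6) as the standard binary cross-entropy of a GAN discriminator (real pairs labelled 1, fake pairs labelled 0). Both readings are assumptions, and the function names and embeddings are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    # Split on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def pair_prob(e_u, e_pos, e_neg, b):
    """Assumed pairwise form of p(v | u): sigmoid of how much more the user
    is predicted to prefer the pair's positive item than its negative item."""
    return sigmoid(dot(e_u, e_pos) - dot(e_u, e_neg) + b)


def discriminator_loss(p_real, p_fake, eps=1e-12):
    """Binary cross-entropy: the real pair is labelled 1 and the fake pair 0.
    eps guards against log(0)."""
    return -(math.log(p_real + eps) + math.log(1.0 - p_fake + eps))


e_u = [1.0, 0.0]
p_real = pair_prob(e_u, [2.0, 0.0], [0.0, 0.0], b=0.0)  # sigmoid(2) ≈ 0.88
p_fake = pair_prob(e_u, [0.0, 0.0], [1.0, 0.0], b=0.0)  # sigmoid(-1) ≈ 0.27
loss = discriminator_loss(p_real, p_fake)
```

Minimizing this loss pushes $p(r \mid u)$ toward 1 and $p(f \mid u)$ toward 0, which is the "tell real from fake" behavior the text describes.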

Optionally, during training of the discrimination model, the following process can be executed for each user:

1. Sample real item pairs $(r^{+}, r^{-})$ from the real data set.

2. Use the current generation model to generate fake items, and sample fake item pairs $(f^{+}, f^{-})$ from them.

3. Feed $(r^{+}, r^{-})$ and $(f^{+}, f^{-})$ together into the discrimination model for training, minimizing the loss function of the discrimination model.

4. Repeat the above steps until the rated items of all users have been used for training.

Optionally, the training may aim at reaching a preset number of training iterations n; the training process in this case is shown in Fig. 4.

Step S303: The device updates the generation model according to the loss function of the discrimination model.

In an optional solution, updating the generation model according to the loss function of the discrimination model may include the following. First, the device obtains a reward value reward according to the loss function of the discrimination model, which is shown in formula (6); the reward can be calculated from $D(r, f \mid u)$ in formula (6), for example $reward = \log(1 - D(r, f \mid u))$. Then, the device updates the generation model with this reward value to obtain a new generation model. The generation model can be trained by policy gradient to obtain the updated generation model; the policy gradient is given by formula (7):

In formula (7), $\mathbb{E}$ denotes the expectation, $f \sim G_u$ indicates that $f$ is generated by the generator $G(f \mid u)$, $i$ ranges from 1 to N, $f_i$ denotes the i-th sample generated by the generator, and reward is the reward value obtained above.
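Formula (7) is also not reproduced. The sketch below shows the standard REINFORCE (policy gradient) estimator that the description matches, for a generator assumed to be a softmax over item logits, together with the reward $\log(1 - D(r, f \mid u))$ from the text. The logit parameterization and the function names are hypothetical, and propagating the gradient back to the embeddings and biases is omitted. The helper requires $D$ to lie in $[0, 1)$, since $\log(1 - D)$ is otherwise undefined; the text does not say how larger loss values are handled. Whether the estimate is followed uphill or downhill depends on the reward's sign convention, which the text leaves implicit.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def reward_from_d(d_value):
    """reward = log(1 - D(r, f | u)); only defined when D lies in [0, 1)."""
    if not 0.0 <= d_value < 1.0:
        raise ValueError("D must lie in [0, 1) for log(1 - D) to be finite")
    return math.log(1.0 - d_value)


def policy_gradient(logits, sampled, rewards):
    """REINFORCE estimate (1/N) * sum_i reward_i * grad log G(f_i | u).

    logits: the generator's scores over candidate items (softmax policy).
    sampled: indices f_1..f_N drawn from the policy; rewards: one per sample.
    For a softmax, d log p(k) / d logit_j = 1[j == k] - p(j).
    Returns the gradient with respect to the logits only.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for k, r in zip(sampled, rewards):
        for j, p in enumerate(probs):
            grad[j] += r * ((1.0 if j == k else 0.0) - p)
    return [g / len(sampled) for g in grad]


grad = policy_gradient([0.0, 0.0, 0.0], sampled=[0], rewards=[1.0])
# grad ≈ [2/3, -1/3, -1/3]: a positive reward raises the sampled item's logit
```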

In another optional solution, updating the generation model according to the loss function of the discrimination model to obtain a new generation model may include three steps. In the first step, the device determines the first user's attention index for items, which is obtained by training an attention network on the scores of the first user's real items and fake items. In the second step, the device obtains a reward value reward according to the loss function of the discrimination model, and optimizes reward with the first user's attention index for items to obtain a new reward value. In the third step, the device updates the generation model with the new reward value. The three steps are described below.

Step 1: The device determines the first user's attention index for items.

Specifically, the first user's attention index for items is obtained by training an attention network on the first user's real items and fake items. In many cases, the first user gives different attention weights to real item pairs and fake item pairs, and an attention network can be used to remember the first user's weights for real item pairs and fake item pairs. There are many latent factors between the items of a pair. Taking movie ratings as an example, some users give high scores to movies they like and low scores to movies they dislike, for example 5 points for the positive movie and 1 point for the negative movie. Other users tend to give intermediate scores both to movies they like and to movies they dislike, for example 4 points for the positive movie and 3 points for the negative movie. Thus, for the same item pair, the score gap between the two movies differs from user to user. A pairwise module should take these factors into account, so an attention mechanism is used to remember these latent pairwise factors. Here, attention is represented by a series of weight vectors that express the importance of different item pairs to each user. For a given item pair, the attention weights of different users are usually different, and a higher attention weight means the pair is more important. The attention network may be a one-layer or multi-layer neural network related to the user, the generated fake item pairs, and the sampled real item pairs. Through the attention network, the first user's different weights for the two pairs can be learned. The network structure of the attention mechanism is shown in Fig. 5.

Specifically, the first user's attention index $\alpha$ for items can be calculated by formula (8):

In formula (8), $w_u$ denotes the attention weight of the first user, $w_{r^{+}}$ the first user's attention weight for the positive real item, $w_{r^{-}}$ the first user's attention weight for the negative real item, $w_{f^{+}}$ the first user's attention weight for the positive fake item, $w_{f^{-}}$ the first user's attention weight for the negative fake item, and $b$ the bias of the first user.

Step 2: The device obtains the reward value reward according to the loss function of the discrimination model (as described above), and optimizes reward with the first user's attention index for items to obtain a new reward value.

Specifically, optimizing reward with the first user's attention index for items to obtain a new reward value may proceed as follows: the device optimizes reward with the first user's attention index $\alpha$ to obtain the reward value $reward_1$ corresponding to the first user, where $\alpha$, reward, and $reward_1$ satisfy $reward_1 = \alpha \cdot reward$. The first user is one of the multiple users, and the reward values corresponding to the multiple users together form the new reward value. For example, the new reward value can be expressed as $reward0 = (reward_1^{(1)}, reward_1^{(2)}, reward_1^{(3)}, \ldots, reward_1^{(i)}, \ldots, reward_1^{(n-1)}, reward_1^{(n)})$, where $reward_1^{(i)}$ is the reward value corresponding to the i-th user among the multiple users.
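The per-user scaling $reward_1 = \alpha \cdot reward$ and the assembly of reward0 can be sketched as follows. Formula (8) is not reproduced here, so each user's $\alpha$ is simply passed in; the function name, user IDs, $\alpha$ values, and rewards are made up for illustration.

```python
def attention_rewards(user_ids, rewards, alphas):
    """Build reward0 = (reward_1^(1), ..., reward_1^(n)).

    For each user, reward_1 = alpha * reward, where alpha is that user's
    attention index (formula (8), supplied here as a plain number) and
    reward is derived from the discrimination model's loss.
    user_ids fixes the order of users in the returned vector.
    """
    return [alphas[u] * rewards[u] for u in user_ids]


reward0 = attention_rewards(["U1", "U2"],
                            {"U1": -0.5, "U2": -1.0},
                            {"U1": 0.8, "U2": 0.25})
# reward0 ≈ [-0.4, -0.25]
```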

Step 3: The device updates the generation model with the new reward value.

Specifically, the generation model can be trained by policy gradient to obtain a new generation model; the policy gradient is given by formula (9):

The symbols in formula (9) have the same meanings as in formula (7), except that reward0 in formula (9) is the updated reward value obtained above.

Training the new generation model may include the following operations:

1. Use the current generation model to generate fake item pairs $(f^{+}, f^{-})$.

2. Sample real item pairs $(r^{+}, r^{-})$ from the real data set.

3. Feed $(r^{+}, r^{-})$ and $(f^{+}, f^{-})$ into the discrimination model and calculate the reward value reward.

4. Calculate $\alpha$ with the attention network.

5. Update the reward value to obtain the new reward value reward0.

6. Update the generation model with the new reward value reward0.

7. Repeat the above steps.

Optionally, the training may aim at reaching a preset number of training iterations m; the training process in this case is shown in Fig. 6.

It can be understood that different item pairs differ in importance. By introducing the attention network, an importance weight is obtained for each pair, so that good pairs can be selected effectively and the negative impact of poor-quality pairs is reduced, making the resulting generation model and discrimination model more robust and adaptive.

In this embodiment, training the discrimination model and training the generation model are the key parts, and their training processes have each been introduced above. The two processes are combined below for a better understanding of this embodiment; Fig. 7 is the corresponding flowchart.

Preparation stage:

1. Initialize the generation model and the discrimination model with random parameters θ and φ.

2. Pre-train using the data set S composed of items.

Training stage:

1. Repeat

// Train the discrimination model

For d_epoch do

2. Keep the parameters of the generation model fixed.

3. Sample real item pairs $(r^{+}, r^{-})$ from the data set S composed of existing real items.

4. Use the generation model to generate fake items, and sample fake item pairs $(f^{+}, f^{-})$ from them.

5. Train the discrimination model with $(r^{+}, r^{-})$ and $(f^{+}, f^{-})$.

6. End for

// Train the generation model

For g_epoch do

7. Keep the parameters of the discrimination model fixed.

8. Use the generation model to generate fake items, and sample fake item pairs $(f^{+}, f^{-})$ from them.

9. Calculate the reward value reward with the discrimination model, following the policy gradient algorithm.

10. Update reward according to the attention network, and update the generation model with the updated reward value reward0.

End for

11. Until the discrimination model and the generation model converge.
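The alternating schedule of the pseudocode can be sketched as a small driver loop in which the actual update steps are passed in as callables. The hooks `d_step`, `g_step`, and `converged` are hypothetical placeholders, and the `max_rounds` cap is an added safeguard that the pseudocode does not mention.

```python
def adversarial_train(d_step, g_step, converged, d_epochs, g_epochs, max_rounds=100):
    """Alternate discriminator and generator updates until convergence.

    d_step(): one discriminator update with the generator frozen (steps 2-5).
    g_step(): one generator update with the discriminator frozen (steps 7-10).
    converged(): returns True once both models have converged (step 11).
    Returns the number of outer rounds executed.
    """
    for rnd in range(1, max_rounds + 1):
        for _ in range(d_epochs):
            d_step()
        for _ in range(g_epochs):
            g_step()
        if converged():
            return rnd
    return max_rounds


calls = {"d": 0, "g": 0}


def fake_d_step():
    calls["d"] += 1  # stand-in for one discriminator update


def fake_g_step():
    calls["g"] += 1  # stand-in for one generator update


rounds = adversarial_train(fake_d_step, fake_g_step,
                           converged=lambda: calls["g"] >= 6,
                           d_epochs=5, g_epochs=2)
```

In this toy run the stand-in convergence test fires after the third outer round, by which point the discriminator hook has run 15 times and the generator hook 6 times.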

In this embodiment, updating the generation model specifically means updating the embeddings and biases in formulas (2), (3), and (4) used by the generation model.

Step S304: The device generates scores for the fake items by using the updated generation model.

Specifically, the fake items include the positive fake items and negative fake items generated for each of the multiple users. In other words, after the new generation model is trained, it needs to re-score each positive fake item and each negative fake item previously generated by the generation model; the scores generated by the new generation model are more informative.

Step S305: The device ranks the real items and fake items according to the scores of the fake items and the scores of the existing real items, and recommends items to the first user in the ranked order.

Specifically, the device can rank the real items of the first user together with the fake items generated for the first user. The ranking may follow descending score order or other predefined rules. Items are then recommended to the user in the ranked order. The device can likewise rank the real items and fake items of other users. For example, suppose the fake items of user 1 are positive fake item 1 (score 4.7), positive fake item 2 (score 4), negative fake item 1 (score 0.5), negative fake item 2 (score 1.1), and negative fake item 3 (score 1), and the real items of user 1 are positive real item 1 (score 4.9), positive real item 2 (score 4.5), negative real item 1 (score 3.5), negative real item 2 (score 3.3), and negative real item 3 (score 3.4). Ranking by score from high to low then gives: positive real item 1, positive fake item 1, positive real item 2, positive fake item 2, negative real item 1, negative real item 3, negative real item 2, negative fake item 2, negative fake item 3, negative fake item 1. These real and fake items are then recommended to user 1 in this order.
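The ranking in this example can be checked with a short sort by descending score; the item names mirror the example above, and `rank_items` is a hypothetical helper name.

```python
def rank_items(scores):
    """Order item names by score, highest first (the descending-score rule)."""
    return sorted(scores, key=scores.get, reverse=True)


user1 = {
    "positive fake item 1": 4.7, "positive fake item 2": 4.0,
    "negative fake item 1": 0.5, "negative fake item 2": 1.1,
    "negative fake item 3": 1.0,
    "positive real item 1": 4.9, "positive real item 2": 4.5,
    "negative real item 1": 3.5, "negative real item 2": 3.3,
    "negative real item 3": 3.4,
}
for rank, name in enumerate(rank_items(user1), start=1):
    print(rank, name)
```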

The principles of this embodiment have been described in detail above; a specific example is given below.

Step 1: data input

In this embodiment, the identity IDs of all users in the data set and the IDs of the items each user has rated are input. Taking item recommendation as an example, there are 10 items in total, and the input information is shown in Table 3:

Table 3

Entry No.	User ID	Item ID
1	U1	I1
2	U1	I3
3	U1	I5
4	U1	I8
5	U2	I2
6	U2	I3
7	U2	I4

In Table 3, entry 1 indicates that the user identified as U1 has rated item I1, entry 2 indicates that the user identified as U1 has rated item I3, and so on.

Step 2: Initialize the parameters of the generation model and the discrimination model, including the sizes of the user embeddings (representation vectors) and item embeddings, the training batch size, and the learning rate, where the batch size is the number of samples taken at one time during sampling.

Step 3: keep generation model parameter constant, training discrimination model.Each user is needed from true when training Article pair is sampled in real article, the quantity of article pair is identical as the quantity of positive example authentic item, wherein positive example authentic item Refer to user the scored and higher article of scoring, such as 4 points or more of article.In the present embodiment, user U1 is come It saying, the article I1, I3, I5, I8 evaluated is exactly positive example authentic item, article I2, I4, the I6 that user U1 was not evaluated, I7, I9, I10 are exactly negative example authentic item.User U1 has 4 articles evaluated, so the authentic item of sampling is to being four It is right, specific as follows:

(I1, I2), (I3, I4), (I5, I9), (I8, I6)；

The negative authentic items I2, I4, I9 and I6 are drawn from the items that user U1 has not rated; they may be selected at random or according to another predefined strategy. During training, the generative model also needs to generate counterfeit item pairs: the positive generator in the generative module generates positive counterfeit items, and the negative generator generates negative counterfeit items.
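The authentic-pair sampling for user U1 can be sketched as follows. The helper names are hypothetical, uniform random drawing is just one of the strategies the text permits, and, because Table 3 records no scores, every rated item is treated as positive, as the U1 example does.

```python
import random

ALL_ITEMS = [f"I{k}" for k in range(1, 11)]  # the 10 items of the embodiment
U1_RATED = ["I1", "I3", "I5", "I8"]          # user U1's rated items (Table 3)


def split_items(rated, all_items):
    """Positive authentic items = rated items; negatives = the unrated rest."""
    rated_set = set(rated)
    negatives = [i for i in all_items if i not in rated_set]
    return list(rated), negatives


def sample_authentic_pairs(positives, negatives, rng):
    """Pair each positive with one negative drawn uniformly without replacement.

    Assumes the user has at least as many negatives as positives.
    """
    drawn = rng.sample(negatives, len(positives))
    return list(zip(positives, drawn))


pos, neg = split_items(U1_RATED, ALL_ITEMS)
pairs = sample_authentic_pairs(pos, neg, random.Random(42))
print(neg)    # ['I2', 'I4', 'I6', 'I7', 'I9', 'I10']
print(pairs)  # four pairs, one per positive item
```

Sampling without replacement keeps the four negatives distinct, matching the example pairs, where each negative appears once.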

For example, for user U1, the item pairs generated by the generative model may be:

(I1, I2), (I2, I6), (I5, I7), (I8, I9).

When training the discriminative model, the authentic item pairs (I1, I2), (I3, I4), (I5, I9), (I8, I6) and the generated counterfeit item pairs (I1, I2), (I2, I6), (I5, I7), (I8, I9) are fed together into the discriminative model. By minimizing its loss function, the discriminative model distinguishes authentic item pairs from counterfeit item pairs as well as possible, which improves its discriminative power. Training of the discriminative model is repeated until the item pairs of every user have been sufficiently trained on.
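The text does not specify the discriminative model's loss function. As a hedged illustration, the sketch below assumes one common pairwise choice: a pair's probability of being authentic is the logistic function of the score gap between its positive and negative item, and the loss is binary cross-entropy with authentic pairs labeled 1 and counterfeit pairs labeled 0. The inner-product scorer and all function names are assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pair_prob(user_vec, item_vecs, pair):
    """Hypothetical discriminator output: probability that `pair` is authentic.

    The pair is scored as s(u, positive) - s(u, negative), with s an inner
    product of discriminator embeddings.
    """
    pos, neg = pair
    margin = dot(user_vec, item_vecs[pos]) - dot(user_vec, item_vecs[neg])
    return sigmoid(margin)


def discriminator_loss(user_vec, item_vecs, real_pairs, fake_pairs, eps=1e-12):
    """Mean binary cross-entropy: authentic pairs labeled 1, counterfeit pairs 0."""
    loss = 0.0
    for p in real_pairs:
        loss -= math.log(pair_prob(user_vec, item_vecs, p) + eps)
    for p in fake_pairs:
        loss -= math.log(1.0 - pair_prob(user_vec, item_vecs, p) + eps)
    return loss / (len(real_pairs) + len(fake_pairs))


real = [("I1", "I2"), ("I3", "I4"), ("I5", "I9"), ("I8", "I6")]
fake = [("I1", "I2"), ("I2", "I6"), ("I5", "I7"), ("I8", "I9")]
zeros = {f"I{k}": [0.0, 0.0] for k in range(1, 11)}
print(discriminator_loss([0.0, 0.0], zeros, real, fake))  # ln 2, about 0.6931, with all-zero embeddings
```

Note that (I1, I2) appears in both lists of the example, so no discriminator can separate that pair: its two loss terms sum to at least 2 ln 2, with the minimum at probability 0.5.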

Step 4: Keep the discriminative model parameters fixed and train the generative model. As in the discriminator-training stage, for each user, authentic item pairs are sampled from the existing authentic items and counterfeit item pairs are generated by the generative model. Again taking user U1 as an example:

The authentic item pairs for user U1 may be:

(I1, I2), (I3, I4), (I5, I9), (I8, I6).

The counterfeit item pairs for user U1 may be:

(I1, I2), (I2, I6), (I5, I7), (I8, I9).

The difference from training the discriminative model is that the discriminative model now computes a reward value from the two groups of input item pairs. The generative module updates its parameters according to a new reward value obtained by updating this reward, and training of the generative module is repeated until the item pairs of every user have been sufficiently trained on.
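The text leaves open how the reward is turned into a parameter update. Because sampling discrete items is not differentiable, IRGAN trains its generator with a policy-gradient (REINFORCE) update; the sketch below assumes that approach, with a hypothetical tabular generator that keeps one logit per item for a single user. The reward passed in could be the original reward or the attention-adjusted one.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, sampled, rewards, lr):
    """One REINFORCE (policy-gradient) ascent step on a tabular softmax generator.

    logits:  one user's per-item generator logits (hypothetical parameterization)
    sampled: indices of the items the generator produced for that user
    rewards: reward for each sampled item
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for idx, r in zip(sampled, rewards):
        for j, p in enumerate(probs):
            # d log p[idx] / d logit[j] = 1[j == idx] - p[j]
            grad[j] += r * ((1.0 if j == idx else 0.0) - p)
    n = len(sampled)
    return [x + lr * g / n for x, g in zip(logits, grad)]


logits = [0.0] * 10  # uniform generator over I1..I10
new_logits = reinforce_step(logits, [6, 2], [1.0, 0.2], lr=0.5)
print(softmax(new_logits)[6] > softmax(logits)[6])  # True: I7 (index 6) received the largest reward
```

Items with higher reward gain probability mass, which is how the discriminator's feedback steers the generator toward item pairs it finds harder to reject.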

Step 5: Repeat Steps 3 and 4 until the discriminative model and the generative model are trained to their best.

Step 6: The device scores the generated counterfeit items using the generative model obtained from the final training.

Step 7: Input into the device the ID of the user to be scored, for example user U1. The device ranks all items for user U1 by score, where a higher score indicates a higher degree of preference. All items include the existing authentic items and the generated counterfeit items. Table 4 shows an example of the ranking result:

Table 4

User ID	Item ID	Score
U1	I3	2.54
U1	I5	2.35
U1	I7	1.93
U1	I1	1.54
U1	I8	1.32
U1	I2	1.14
U1	I4	0.97
U1	I10	0.78
U1	I9	0.76
U1	I6	0.54

According to the recommendation list in Table 4, among the items user U1 has not yet rated, the one U1 is most likely to like is item I7; items I3 and I5 score higher, but U1 has already rated them (Table 3).
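Read together with Table 3, Table 4 shows why I7 is the item named: I3 and I5 score higher, but U1 has already rated them. A short sketch of that rank-then-filter step follows; `rank_items` and `recommend` are hypothetical helper names, and excluding already-rated items is a reading of the example rather than a rule stated in the text.

```python
# Scores of user U1 from Table 4 and U1's rated items from Table 3.
SCORES_U1 = {"I3": 2.54, "I5": 2.35, "I7": 1.93, "I1": 1.54, "I8": 1.32,
             "I2": 1.14, "I4": 0.97, "I10": 0.78, "I9": 0.76, "I6": 0.54}
RATED_U1 = {"I1", "I3", "I5", "I8"}


def rank_items(scores):
    """Return item IDs sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def recommend(scores, already_rated, k=1):
    """Top-k items by score, skipping items the user has already rated."""
    return [item for item in rank_items(scores) if item not in already_rated][:k]


print(rank_items(SCORES_U1)[:3])            # ['I3', 'I5', 'I7']
print(recommend(SCORES_U1, RATED_U1))       # ['I7']
print(recommend(SCORES_U1, RATED_U1, k=3))  # ['I7', 'I2', 'I4']
```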

By executing the above method, the negative counterfeit item in each counterfeit item pair is generated depending on the positive counterfeit item. This fully takes into account the latent relationship between negative and positive counterfeit items, so that the counterfeit item pairs carry richer information, which improves the training effect and enhances the generative capability of the generative model. As a result, the recommendation result produced by ranking the items generated by the generative model together with the existing authentic items is of greater reference value to the user. Further, high-scoring items are selected to form item pairs, both authentic item pairs and counterfeit item pairs. Because high-scoring items receive more of the user's attention, item pairs obtained in this way carry more information and less noise for that user; training on such pairs allows the features the user attends to to be fully analyzed, yielding a generative model with stronger generative capability.

The device has been described above from the perspective of hardware. In practical applications, the terminal structure may also be described entirely in terms of functional modules. To help those skilled in the art better understand the idea of the present application, as shown in FIG. 8, an embodiment of the present application further provides a model training device 80 based on a generative adversarial network. The device 80 includes a generative model 801, a training model 802 and a discriminative model 803, which are described below:

The generative model 801 is configured to generate positive counterfeit items and negative counterfeit items for a first user, where the negative counterfeit items are generated according to the positive counterfeit items; a positive counterfeit item of the first user is an item predicted to be attended to by the first user, and a negative counterfeit item of the first user is an item predicted not to be attended to by the first user;

The training model 802 is configured to train multiple authentic item pairs and multiple counterfeit item pairs to obtain the discriminative model 803, which is used to distinguish between the multiple authentic item pairs and the multiple counterfeit item pairs. Each authentic item pair includes one positive authentic item and one negative authentic item, and each counterfeit item pair includes one positive counterfeit item and one negative counterfeit item. A positive authentic item is an item identified, according to the operation behavior of the first user, as attended to by the first user, and a negative authentic item is an item identified, according to the operation behavior of the first user, as not attended to by the first user;

The training model 802 is further configured to update the generative model according to the loss function of the discriminative model.

In an optional solution, the device further includes a recommendation model, where:

In another optional solution, after the generative model generates positive counterfeit items and negative counterfeit items for the first user, and before the training model trains the multiple authentic item pairs and the multiple counterfeit item pairs to obtain the discriminative model, the training model is further configured to:

In another optional solution, the initial generative model includes a positive generative model, a negative generative model and a scoring generative model; the generative model, configured to generate positive counterfeit items and negative counterfeit items for the first user, is specifically configured as follows:

In another optional solution, the training model updates the generative model according to the loss function of the discriminative model, specifically as follows:

The generative model is updated using the new reward value.

In another optional solution, the training model determines the attention index of the first user for items, specifically:

$$\alpha = \operatorname{softmax}\left(g\left(r^{+}, r^{-}, f^{+}, f^{-} \mid u\right)\right)$$

In another optional solution, optimizing the reward value reward by the attention index of the first user for items to obtain the new reward value is specifically as follows:

It should be noted that the implementation of each unit may also correspond to the description of the model training method based on a generative adversarial network in the foregoing embodiments, for example steps S301-S305.

An embodiment of the present application further provides a computer-readable storage medium storing instructions which, when run on a processor, implement the model training method based on a generative adversarial network described in the foregoing embodiments, for example steps S301-S305.

An embodiment of the present application further provides a computer program product which, when run on a processor, implements the model training method based on a generative adversarial network described in the foregoing embodiments, for example steps S301-S305.

A person of ordinary skill in the art will appreciate that all or part of the processes of the methods in the above embodiments may be completed by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium includes various media capable of storing program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Claims

1. A model training method based on a generative adversarial network, comprising:

generating, by a device through a generative model, positive counterfeit items and negative counterfeit items for a first user, wherein the negative counterfeit items are generated according to the positive counterfeit items, a positive counterfeit item of the first user is an item predicted to be attended to by the first user, and a negative counterfeit item of the first user is an item predicted not to be attended to by the first user;

training, by the device, multiple authentic item pairs and multiple counterfeit item pairs to obtain a discriminative model, wherein the discriminative model is used to distinguish between the multiple authentic item pairs and the multiple counterfeit item pairs; each authentic item pair comprises one positive authentic item and one negative authentic item, and each counterfeit item pair comprises one positive counterfeit item and one negative counterfeit item; the positive authentic item is an item identified, according to an operation behavior of the first user, as attended to by the first user, and the negative authentic item is an item identified, according to the operation behavior of the first user, as not attended to by the first user; and

updating, by the device, the generative model according to a loss function of the discriminative model.

2. The method according to claim 1, wherein after the device updates the generative model according to the loss function of the discriminative model, the method further comprises:

generating, by the device through the updated generative model, scores of counterfeit items, the counterfeit items comprising the positive counterfeit items and the negative counterfeit items generated for the first user; and

ranking, by the device, the authentic items and the counterfeit items according to the scores of the counterfeit items and the scores of existing authentic items, and recommending items to the first user in the ranked order.

3. The method according to claim 1 or 2, wherein after the device generates the positive counterfeit items and the negative counterfeit items for the first user through the generative model, and before the device trains the multiple authentic item pairs and the multiple counterfeit item pairs to obtain the discriminative model, the method further comprises:

matching, by the device, each of multiple first positive counterfeit items with a first negative counterfeit item to form the multiple counterfeit item pairs, wherein the first negative counterfeit items are the top M negative counterfeit items by score among the negative counterfeit items of the first user, M is the number of first positive counterfeit items, and the first positive counterfeit items are positive counterfeit items of the first user sampled from the positive counterfeit items generated by the generative model; and

matching, by the device, each of multiple first positive authentic items with a first negative authentic item to form the multiple authentic item pairs, wherein the first negative authentic items are the top N negative authentic items by score among the negative authentic items of the first user, N is the number of first positive authentic items, and the first positive authentic items are positive authentic items sampled from the existing positive authentic items of the first user.
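The pairing in claim 3 (sample the first positive items, then match them with the highest-scoring negatives) can be sketched as below. The function name `build_pairs` is hypothetical, and the claim does not say which sampled positive meets which negative; matching the i-th sampled positive with the i-th highest-scoring negative is an assumption. The example reuses the Table 4 scores of U1's unrated items purely for illustration.

```python
import random


def build_pairs(positives, negative_scores, count, rng):
    """Pair `count` sampled first positive items with the top-`count` negatives.

    positives:       the user's positive items (counterfeit or authentic)
    negative_scores: {negative item: score} for the same user
    count:           M for counterfeit pairs, N for authentic pairs
    Assumption: the i-th sampled positive is matched with the i-th
    highest-scoring negative.
    """
    if count > len(positives) or count > len(negative_scores):
        raise ValueError("not enough positive or negative items for the requested count")
    first_positives = rng.sample(positives, count)
    top_negatives = sorted(negative_scores, key=negative_scores.get, reverse=True)[:count]
    return list(zip(first_positives, top_negatives))


# Illustrative: U1's negative authentic items, scored as in Table 4.
neg_scores = {"I2": 1.14, "I4": 0.97, "I6": 0.54, "I7": 1.93, "I9": 0.76, "I10": 0.78}
pairs = build_pairs(["I1", "I3", "I5", "I8"], neg_scores, 4, random.Random(0))
print([n for _, n in pairs])  # ['I7', 'I2', 'I4', 'I10']
```

Choosing the top-scoring negatives yields harder negatives, in line with the description's point that high-scoring items carry more information and less noise.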

4. The method according to any one of claims 1 to 3, wherein the initial generative model comprises a positive generative model, a negative generative model and a scoring generative model; and the generating, by the device through the generative model, positive counterfeit items and negative counterfeit items for the first user comprises:

generating, by the device through the scoring generator, a score of each positive counterfeit item and a score of each negative counterfeit item;

wherein $g^{+}(f^{+} \mid u)$ is the distribution of the positive counterfeit items, $e_u$ is the embedding vector of the first user, … is the embedding of the positive counterfeit item to be generated, $e_i$ is the embedding of the i-th positive counterfeit item, and $b$ represents the bias of the first user; $g^{-}(f^{-} \mid u, f^{+})$ is the distribution of the negative counterfeit items, and … is the embedding of the negative counterfeit item to be generated.

5. The method according to any one of claims 1 to 4, wherein the updating, by the device, the generative model according to the loss function of the discriminative model comprises:

determining, by the device, an attention index of the first user for items, wherein the attention index of the first user for items is obtained by training the authentic item scores and the counterfeit item scores of the first user with an attention network;

obtaining, by the device, a reward value reward according to the loss function of the discriminative model, and optimizing the reward value reward by the attention index of the first user for items to obtain a new reward value; and

updating, by the device, the generative model using the new reward value.

6. The method according to claim 5, wherein the determining, by the device, the attention index of the first user for items comprises:

$$\alpha = \operatorname{softmax}\left(g\left(r^{+}, r^{-}, f^{+}, f^{-} \mid u\right)\right)$$

wherein α is the attention index of the first user u for items; $w_u$ denotes the trained weight of the first user; … denotes the trained weight of the positive authentic items of the first user; … denotes the trained weight of the negative authentic items of the first user; … denotes the trained weight of the positive counterfeit items of the first user; … denotes the trained weight of the negative counterfeit items of the first user; and $b$ is the bias of the first user.

7. The method according to claim 5 or 6, wherein the optimizing the reward value reward by the attention index of the first user for items to obtain a new reward value comprises:

optimizing the reward value reward by the attention index α of the first user for items to obtain a reward value reward_1 corresponding to the first user, wherein the attention index α, the reward value reward and the reward value reward_1 satisfy the relationship reward_1 = α * reward.
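Claims 6 and 7 fix the outer form α = softmax(g(r⁺, r⁻, f⁺, f⁻ | u)) and reward_1 = α * reward, but not the form of g. The sketch below assumes, purely for illustration, a per-pair linear score scaled by the user weight $w_u$ plus the user bias $b$, with the softmax taken across one user's item pairs; the function names and this form of g are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_index(r_pos, r_neg, f_pos, f_neg, w_u, w, b):
    """alpha = softmax(g(r+, r-, f+, f- | u)) over one user's item pairs.

    Hypothetical g: g_k = w_u * (w[0]*r+_k + w[1]*r-_k + w[2]*f+_k + w[3]*f-_k) + b,
    where r+_k, r-_k, f+_k, f-_k are the scores of the k-th pair's positive and
    negative authentic items and positive and negative counterfeit items.
    """
    g = [w_u * (w[0] * rp + w[1] * rn + w[2] * fp + w[3] * fn) + b
         for rp, rn, fp, fn in zip(r_pos, r_neg, f_pos, f_neg)]
    return softmax(g)


def reweight(alpha, rewards):
    """Claim 7: reward_1 = alpha * reward, applied element-wise per pair."""
    return [a * r for a, r in zip(alpha, rewards)]


alpha = attention_index([2.0, 1.0], [0.5, 0.5], [1.5, 1.0], [0.2, 0.4],
                        w_u=1.0, w=(1.0, -1.0, 1.0, -1.0), b=0.3)
print(reweight(alpha, [0.8, 0.6]))
```

In this particular form, $b$ shifts every pair's score equally and therefore does not change α; it would matter only if g were combined across users or passed through a nonlinearity before the softmax.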

8. A model training device based on a generative adversarial network, comprising:

a generative model, configured to generate positive counterfeit items and negative counterfeit items for a first user, wherein the negative counterfeit items are generated according to the positive counterfeit items, a positive counterfeit item of the first user is an item predicted to be attended to by the first user, and a negative counterfeit item of the first user is an item predicted not to be attended to by the first user; and

a training model, configured to train multiple authentic item pairs and multiple counterfeit item pairs to obtain a discriminative model, wherein the discriminative model is used to distinguish between the multiple authentic item pairs and the multiple counterfeit item pairs; each authentic item pair comprises one positive authentic item and one negative authentic item, and each counterfeit item pair comprises one positive counterfeit item and one negative counterfeit item; the positive authentic item is an item identified, according to an operation behavior of the first user, as attended to by the first user, and the negative authentic item is an item identified, according to the operation behavior of the first user, as not attended to by the first user;

9. The device according to claim 8, further comprising a recommendation model, wherein:

after the training model updates the generative model according to the loss function of the discriminative model, the updated generative model is configured to generate scores of counterfeit items, the counterfeit items comprising the positive counterfeit items and the negative counterfeit items generated for the first user; and

the recommendation model is configured to rank the authentic items and the counterfeit items according to the scores of the counterfeit items and the scores of existing authentic items, and to recommend items to the first user in the ranked order.

10. The device according to claim 8 or 9, wherein after the generative model generates the positive counterfeit items and the negative counterfeit items for the first user, and before the training model trains the multiple authentic item pairs and the multiple counterfeit item pairs to obtain the discriminative model, the training model is further configured to:

match each of multiple first positive counterfeit items with a first negative counterfeit item to form the multiple counterfeit item pairs, wherein the first negative counterfeit items are the top M negative counterfeit items by score among the negative counterfeit items of the first user, M is the number of first positive counterfeit items, and the first positive counterfeit items are positive counterfeit items of the first user sampled from the positive counterfeit items generated by the generative model; and

match each of multiple first positive authentic items with a first negative authentic item to form the multiple authentic item pairs, wherein the first negative authentic items are the top N negative authentic items by score among the negative authentic items of the first user, N is the number of first positive authentic items, and the first positive authentic items are positive authentic items sampled from the existing positive authentic items of the first user.

11. The device according to any one of claims 8 to 10, wherein the initial generative model comprises a positive generative model, a negative generative model and a scoring generative model; and the generative model, configured to generate positive counterfeit items and negative counterfeit items for the first user, is specifically configured as follows:

12. The device according to any one of claims 8 to 11, wherein the training model, configured to update the generative model according to the loss function of the discriminative model, is specifically configured to:

determine an attention index of the first user for items, wherein the attention index of the first user for items is obtained by training the authentic item scores and the counterfeit item scores of the first user with an attention network;

obtain a reward value reward according to the loss function of the discriminative model, and optimize the reward value reward by the attention index of the first user for items to obtain a new reward value; and

update the generative model using the new reward value.

13. The device according to claim 12, wherein the training model determines the attention index of the first user for items specifically as follows:

$$\alpha = \operatorname{softmax}\left(g\left(r^{+}, r^{-}, f^{+}, f^{-} \mid u\right)\right)$$

14. The device according to claim 12 or 13, wherein optimizing the reward value reward by the attention index of the first user for items to obtain the new reward value is specifically:

15. A computer-readable storage medium, wherein the computer-readable storage medium stores program instructions which, when run on a processor, implement the method according to any one of claims 1 to 8.