CN110992127A - Article recommendation method and device - Google Patents
- Publication number: CN110992127A
- Application number: CN201911114678.1A
- Authority: CN (China)
- Prior art keywords: sample, training sample, positive, training, negative
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Abstract
The invention discloses an article recommendation method and device, and relates to the technical field of computers. One embodiment of the method comprises: selecting positive and negative training samples from a sample set; cyclically performing the following steps to train a discriminator for item recommendation until a training stop criterion is reached: calculating, according to the attribute features and behavior features respectively indicated by the positive and negative training samples, the distance between them and the discrimination parameter of the negative training sample, where the discrimination parameter indicates the probability that the negative training sample is a positive training sample; reselecting a positive training sample and a related negative training sample from the training sample set according to the distance and the discrimination parameter; and training the discriminator with the reselected positive and negative training samples. A to-be-recommended positive sample is then determined from the to-be-recommended sample set using the discriminator, and the article corresponding to that positive sample is recommended to the user. This embodiment improves the prediction accuracy of the discriminator and thereby the accuracy of item recommendation by the discriminator.
Description
Technical Field
The invention relates to the technical field of computers, in particular to an article recommendation method and device.
Background
A recommendation system is an intelligent e-commerce application that helps users find products based on their needs and preferences. Specifically, a recommendation system can recommend products that a user is likely to prefer based on a trained user behavior model, and such a model can be produced by training any of several types of models.
The discriminator is one kind of user behavior model. When training a discriminator, multiple pairs of positive and negative training samples are randomly selected from a sample set containing both, and the discriminator is trained on those randomly selected pairs. Positive training samples in the sample set correspond to items that were exposed and clicked, and negative training samples correspond to items that were exposed but not clicked.
In the process of implementing the invention, the inventor finds that at least the following problems exist in the prior art:
the sample set is generally generated from a user's historical feedback data, which is often highly sparse: the number of negative training samples in the sample set far exceeds the number of positive training samples, and randomly selected positive-negative training sample pairs cannot reflect how relevant a negative training sample is to its positive training sample. This weakens the training of the discriminator, so its prediction accuracy is low and the accuracy of the recommended articles is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide an article recommendation method and apparatus that select negative training samples related to the positive training samples according to the distance between positive and negative training samples, and train a discriminator with the selected positive and negative training samples, improving the prediction accuracy of the discriminator and thereby the accuracy of article recommendation.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided an item recommendation method.
The article recommendation method of the embodiment of the invention comprises the following steps: acquiring a sample set, wherein training samples in the sample set indicate attribute characteristics of an article and behavior characteristics of a user for the article;
selecting a positive training sample and a negative training sample from the sample set, wherein the article corresponding to the positive training sample is exposed and clicked, and the article corresponding to the negative training sample is exposed and not clicked;
cyclically performing the following steps to train a discriminator for item recommendation until a training stop criterion is reached:
calculating the distance between the positive training sample and the negative training sample according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample;
calculating a discrimination parameter of the selected negative training sample, the discrimination parameter indicating a probability that the negative training sample is a positive training sample;
reselecting a positive training sample and a negative training sample related to the reselected positive training sample from the training sample set according to the distance and the discrimination parameter;
training the discriminator according to the reselected positive training sample and the negative training sample;
and determining a to-be-recommended positive sample from the to-be-recommended sample set by using the discriminator, and recommending the article corresponding to the to-be-recommended positive sample to the user.
Alternatively,
the reselecting a positive training sample and a negative training sample related to the reselected positive training sample from the training sample set according to the distance and the discrimination parameter includes:
taking the distance, the discrimination parameter and a negative training sample corresponding to the discrimination parameter as the input of a generator for selecting a sample so as to optimize the generator;
selecting, with the optimized generator, the positive training samples and negative training samples associated with the reselected positive training samples.
Alternatively,
and calculating an optimized gradient of the generator according to the distance and the discrimination parameter, and optimizing the generator by using a stochastic gradient method based on the optimized gradient.
Alternatively,
the reselecting positive training samples from the set of training samples and the negative training samples associated with the reselected positive training samples comprises:
randomly selecting positive training samples from the sample set by using the generator, and selecting negative training samples with the correlation degree larger than a first threshold value with the positive training samples according to the attribute characteristics of the articles indicated by the positive training samples and the behavior characteristics of the user aiming at the articles.
Alternatively,
the training stopping criterion is: and the difference between the discrimination parameter of the negative training sample calculated by the discriminator and the discrimination parameter of the positive training sample is smaller than a second threshold value, or the number of times of circularly training the discriminator is larger than a third threshold value.
Alternatively,
and the difference between the discrimination parameter of the reselected negative training sample and the distance between the reselected positive training sample and the negative training sample is larger than a fourth threshold value.
Alternatively,
the calculating the distance between the positive training sample and the negative training sample according to the attribute feature and the behavior feature respectively indicated by the positive training sample and the negative training sample includes:
pre-training the discriminator according to the attribute characteristics and the behavior characteristics, and calculating the distance between the positive training sample and the negative training sample according to the pre-trained discriminator.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an article recommendation device.
An article recommendation device of an embodiment of the present invention includes: the system comprises a sample acquisition module, a sample selection module, a training module and a recommendation module; wherein,
the sample acquisition module is used for acquiring a sample set, wherein training samples in the sample set indicate attribute characteristics of an article and behavior characteristics of a user aiming at the article;
the sample selection module is used for selecting a positive training sample and a negative training sample from the sample set, wherein the article corresponding to the positive training sample is exposed and clicked, and the article corresponding to the negative training sample is exposed and not clicked;
the training module is used for cyclically performing the following steps to train the discriminator for recommending articles until the training stop criterion is reached: calculating the distance between the positive training sample and the negative training sample according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample; calculating a discrimination parameter of the selected negative training sample, the discrimination parameter indicating a probability that the negative training sample is a positive training sample; reselecting a positive training sample and a negative training sample related to the reselected positive training sample from the training set according to the distance and the discrimination parameter, and training the discriminator according to the reselected positive training sample and the negative training sample;
the recommending module is used for determining a to-be-recommended positive sample from the to-be-recommended sample set by using the discriminator and recommending the article corresponding to the to-be-recommended positive sample to the user.
Optionally, the training module is configured to use the distance, the discrimination parameter, and a negative training sample corresponding to the discrimination parameter as inputs of a generator for selecting a sample, so as to optimize the generator, and select, by using the optimized generator, the positive training sample and a negative training sample related to the reselected positive training sample.
Alternatively,
and the training module randomly selects a positive training sample from the sample set by using the generator, and selects a negative training sample whose correlation with the positive training sample is greater than a first threshold according to the attribute characteristics of the article indicated by the positive training sample and the user's behavior characteristics for the article.
Alternatively,
the training stopping criterion is: and the difference between the discrimination parameter of the negative training sample calculated by the discriminator and the discrimination parameter of the positive training sample is smaller than a second threshold value, or the number of times of circularly training the discriminator is larger than a third threshold value.
To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an electronic device for item recommendation.
An electronic device for recommending an article according to an embodiment of the present invention includes: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method for item recommendation in accordance with an embodiment of the present invention.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided a computer-readable storage medium.
A computer-readable storage medium of an embodiment of the present invention has stored thereon a computer program that, when executed by a processor, implements a method of item recommendation of an embodiment of the present invention.
One embodiment of the above invention has the following advantages or benefits: the method comprises the steps of calculating the distance between a positive training sample and a negative training sample according to attribute characteristics and behavior characteristics respectively indicated by the positive training sample and the negative training sample, reselecting the positive training sample and the negative training sample related to the positive training sample according to the distance and discrimination parameters of the negative training sample, and training a discriminator by using the reselected positive training sample and the negative training sample to improve the training effect of the discriminator, so that the prediction accuracy of the discriminator is improved, and the accuracy of recommended articles when the discriminator is used for recommending articles is improved.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main steps of an item recommendation method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of the main modules of an item recommendation device according to an embodiment of the present invention;
FIG. 3 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 4 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the embodiments of the present invention and the technical features of the embodiments may be combined with each other without conflict.
Fig. 1 is a schematic diagram of the main steps of an item recommendation method according to an embodiment of the present invention.
As shown in fig. 1, an article recommendation method according to an embodiment of the present invention mainly includes the following steps:
step S101: a sample set is obtained, wherein training samples in the sample set indicate attribute characteristics of an item and behavior characteristics of a user for the item.
The sample set may be generated based on historical interaction data of the recommendation system with users: for example, an item exposed by the recommendation system and clicked by the user gives a positive training sample, and an item exposed but not clicked gives a negative training sample. The sample set may be defined as Γ = {Γ+, Γ-}, where Γ+ denotes the set of positive training samples and Γ- denotes the set of negative training samples. Any training sample s ∈ Γ comprises two parts: one part is the attribute feature e_c(s) of the item, and the other part is the user's behavior feature for the item, such as the user's click records for the item. The attribute feature e_c(s) may be derived from information characterizing the attributes of the item, such as the e-commerce platform's description of the item, or from an embedding representation of the user's historical clicks on the item. The user's behavior feature for the item may likewise come from the user's historical clicks on the item.
It is worth mentioning that when the attribute feature e_c(s) comes from an embedding representation of the user's historical click records for the item, the historical click records used for e_c(s) preferably do not overlap with those used for the user's behavior feature for the item. For example, the historical click records can be divided in time order so that the records corresponding to the attribute feature e_c(s) are earlier than the records corresponding to the behavior feature, i.e., the click records underlying e_c(s) were generated before the click records underlying the behavior feature. This avoids, as far as possible, overfitting during training.
To better extract behavior features from the historical click records, in one embodiment of the invention the historical click records are fed into a GRU model or a Long Short-Term Memory (LSTM) network, and the last hidden layer of the GRU or LSTM is taken as the representation of the user's behavior feature for the item. Of course, other models, such as an attention mechanism, may also be used to extract the behavior feature. Each training sample in the sample set is thus defined as the splice of the item's attribute feature e_c(s) and the user's click-behavior feature for the item; this spliced representation is the e_D(s) that is later fed to the discriminator.
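As a concrete illustration of this feature construction, the following PyTorch sketch (an assumption-laden example, not code from the patent) concatenates a learned item attribute embedding with the last hidden state of a GRU run over the user's click history. All class and parameter names, dimensions, and the use of a learned embedding for e_c(s) are illustrative choices, since the patent does not specify them.

```python
import torch
import torch.nn as nn

class SampleEncoder(nn.Module):
    """Builds a training-sample representation: the item attribute feature
    e_c(s) concatenated with a behavior feature taken from the last hidden
    state of a GRU run over the user's historical click records."""

    def __init__(self, num_items: int, attr_dim: int = 32, behavior_dim: int = 32):
        super().__init__()
        # Attribute feature: a learned item embedding stands in for features
        # derived from item descriptions or earlier click records.
        self.item_embedding = nn.Embedding(num_items, attr_dim)
        # Behavior feature: GRU over embeddings of historically clicked items.
        self.click_embedding = nn.Embedding(num_items, behavior_dim)
        self.gru = nn.GRU(behavior_dim, behavior_dim, batch_first=True)

    def forward(self, item_ids: torch.Tensor, click_history: torch.Tensor) -> torch.Tensor:
        # item_ids: (batch,), click_history: (batch, seq_len) of clicked item ids
        e_c = self.item_embedding(item_ids)                  # attribute feature e_c(s)
        _, h_last = self.gru(self.click_embedding(click_history))
        e_b = h_last.squeeze(0)                              # behavior feature (last hidden state)
        return torch.cat([e_c, e_b], dim=-1)                 # spliced representation e_D(s)
```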
Step S102: and selecting a positive training sample and a negative training sample from the sample set, wherein the article corresponding to the positive training sample is exposed and clicked, and the article corresponding to the negative training sample is exposed and not clicked.
It will be appreciated that at the start of training, positive and negative training samples may be randomly selected from the sample set.
Step S103: and calculating the distance between the positive training sample and the negative training sample according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample.
For the selected positive and negative training samples, the distance between them can be calculated according to the attribute features and behavior features they respectively indicate. For example, for a positive training sample s ∈ Γ+ and a negative training sample s' ∈ Γ-, the Euclidean distance between s and s' can be calculated using the following formula (1):
where p(s, s') characterizes the Euclidean distance between the positive training sample s and the negative training sample s'; λ1 and λ2 are preset adjustment parameters whose values may lie in (0, 1); e_c(s) characterizes the attribute feature of the positive training sample s; e_c(s') characterizes the attribute feature of the negative training sample s'; and the remaining two terms characterize the behavior features of s and s', respectively.
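The formula itself appears only as an image in the original publication. A form consistent with the surrounding description — a weighted sum of the Euclidean distances between the attribute features and between the behavior features, writing e_b(·) as an assumed symbol for the behavior feature — would be:

p(s, s') = λ1 · ||e_c(s) − e_c(s')||_2 + λ2 · ||e_b(s) − e_b(s')||_2   ...... (1)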
Step S104: calculating a discrimination parameter for the selected negative training sample, the discrimination parameter indicating a probability that the negative training sample is a positive training sample.
The selected training samples are input into the discriminator, which calculates discrimination parameters for the positive and the negative training samples respectively; the discriminator may adopt a simple multilayer fully-connected network, such as a GRU model. The discrimination parameter calculated by the discriminator indicates the probability that each training sample is a positive training sample. For example, suppose the discriminator calculates the discrimination parameters of 1 positive training sample and 9 negative training samples, and the parameters of the 10 training samples are: 1, 0.1, 0.2, 0.3, 0.2, 0.1, 0.4, 0.3, 0.8, 0.1. The probability that a training sample is a positive training sample can then be judged from the relative size of its discrimination parameter; for instance, samples with discrimination parameter greater than 0.9 are taken as positive samples and samples with discrimination parameter less than or equal to 0.9 as negative samples, so the training sample with parameter 1 is determined to be the positive training sample and the remaining 9 are negative training samples. For the negative training samples, the corresponding discrimination parameters likewise indicate the probability of being a positive training sample; for example, the negative training sample with discrimination parameter 0.8 is more likely to be a positive training sample than the one with discrimination parameter 0.1.
It can be understood that at the initial stage of discriminator training, the error of the discrimination parameters calculated by the discriminator is large. The discriminator may therefore first be pre-trained according to the attribute features and behavior features respectively indicated by the positive and negative training samples, for example by combining the features indicated by the training samples, the discrimination parameters calculated by the discriminator, and the labels of the positive and negative training samples in the sample set (the label marks whether a training sample is positive or negative). The pre-trained discriminator is then used to calculate the distance between the positive and negative training samples and the discrimination parameters of the negative training samples, so that the computed distances and discrimination parameters are as accurate as possible.
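The sketch below illustrates one way such a discriminator and its pre-training could look: a small fully-connected network producing a probability, pre-trained with binary cross-entropy against the sample labels. The architecture, loss choice, and hyperparameters are assumptions for illustration; the patent does not fix them.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Simple fully-connected discriminator: maps a sample representation
    e_D(s) to a discrimination parameter in (0, 1)."""

    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1), nn.Sigmoid(),
        )

    def forward(self, e_d: torch.Tensor) -> torch.Tensor:
        return self.net(e_d).squeeze(-1)   # probability that the sample is positive

def pretrain_discriminator(disc, e_d_batch, labels, epochs=5, lr=1e-3):
    """Pre-train the discriminator on labeled samples
    (label 1 = exposed and clicked, label 0 = exposed and not clicked)."""
    opt = torch.optim.Adam(disc.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(disc(e_d_batch), labels.float())
        loss.backward()
        opt.step()
```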
Step S105: reselecting a positive training sample and a negative training sample related to the reselected positive training sample from the training sample set according to the distance and the discrimination parameter; training the discriminator according to the reselected positive training samples and the negative training samples.
In one embodiment of the present invention, the distance, the discrimination parameter, and a negative training sample corresponding to the discrimination parameter may be used as an input of a generator for selecting a sample to optimize the generator; selecting, with the optimized generator, the positive training samples and negative training samples associated with the reselected positive training samples.
The generator is an adversarial network; its main principle is to sample, according to the distribution of the positive training samples, negative training samples that are adversarial to the positive training samples, so the training process of the generator is in effect a process of accurate sampling using a limited adversarial network. The generator may also adopt a GRU model or another multilayer fully-connected network. The goal of optimizing the generator is to maximize the discrimination parameter of the selected negative training samples while taking a distance penalty between the positive and negative training samples into account; on this basis, the generator can be optimized based on the following loss function (2):
where f_D(e_D(s')) characterizes the discrimination parameter of the negative training sample s', p(s, s') characterizes the Euclidean distance between the positive and negative training samples, and p_G(s'|s) characterizes the set of selected negative training samples.
Specifically, when optimizing the generator, an optimized gradient of the generator may be calculated according to the distance and the discrimination parameters, and the generator optimized by a stochastic gradient method based on that gradient.
Based on the above loss function (2) and the REINFORCE algorithm, a loss gradient of the generator can be obtained, as shown in the following formula (3). Based on this gradient, the generator is optimized with a stochastic gradient method such that the difference between the discrimination parameter of a negative training sample reselected by the optimized generator and the distance between the reselected positive and negative training samples is greater than a fourth threshold; that is, for the reselected positive and negative training samples, L_G calculated with the loss function (2) should be as large as possible. The fourth threshold may be adjusted dynamically during training or set to a fixed value.
where f_D(e_D(s')) characterizes the discrimination parameter of the negative training sample s', p(s, s') characterizes the Euclidean distance between the positive and negative training samples, and p_G(s'|s) characterizes the set of selected negative training samples.
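Since formulas (2) and (3) are rendered only as images in the original, the following sketch reconstructs one generator update from the surrounding text under these assumptions: loss (2) is taken to reward a sampled negative s' by f_D(e_D(s')) − p(s, s'), and the REINFORCE estimator of gradient (3) multiplies that reward by the gradient of log p_G(s'|s). Function and parameter names are illustrative.

```python
import torch

def generator_step(generator_log_prob, disc_scores, distances, optimizer):
    """One REINFORCE update of the generator.

    generator_log_prob: log p_G(s'|s) for the sampled negatives, shape (batch,)
    disc_scores:        f_D(e_D(s')) from the discriminator,      shape (batch,)
    distances:          p(s, s') distance penalties,               shape (batch,)
    """
    # Reward: discrimination parameter of the sampled negative minus the
    # distance penalty to its paired positive (to be maximized).
    reward = (disc_scores - distances).detach()
    # REINFORCE surrogate loss: its gradient is -reward * grad log p_G(s'|s).
    loss = -(reward * generator_log_prob).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```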
In addition, when the optimized generator is used to reselect the positive training samples and the negative training samples from the training set, the generator may be used to randomly select the positive training samples from the sample set, and then the negative training samples with the correlation degree greater than the first threshold value with the positive training samples are selected according to the attribute characteristics of the articles indicated by the positive training samples and the behavior characteristics of the users for the articles.
To calculate the correlation between the positive and negative training samples, the attribute features and behavior features of the samples can be extracted with a GRU model whose structure is the same as that of the generator but whose parameters differ from the generator's; the extracted feature is denoted e_G(s). It can be understood that the feature e_G(s) extracted by this GRU model may be the same as e_D(s), that is, the correlation between the positive and negative training samples can be calculated directly from e_D(s), which is formed by splicing e_c(s) and the behavior feature. When the model type or model parameters used in the two cases differ, e_G(s) may differ from e_D(s); for example, e_G(s) and e_D(s) may have different dimensions. Whether or not e_G(s) equals e_D(s), e_G(s) is used only for calculating the correlation between the positive and negative training samples; it is only necessary that the positive and negative training samples use the same model for feature extraction.
In addition, since the number of negative training samples in the sample set is large, the amount of computation can be reduced as follows: after positive training samples are randomly selected from the sample set, for each selected positive training sample a number of candidate negative training samples are randomly drawn from the sample set to form a candidate negative training sample set neg(s). The correlation between the negative training samples in neg(s) and the corresponding positive training sample is then calculated; specifically, the following formula (4) may be used to calculate the correlation M between the positive and negative training samples, and the sampling strategy of the generator is determined by M, i.e., negative training samples whose correlation with the positive training sample exceeds the first threshold are selected. The first threshold may be adjusted dynamically during training or set to a fixed value.
where e_G(s') characterizes the attribute and behavior features indicated by the negative training sample, and e_G(s) characterizes the attribute and behavior features indicated by the positive training sample.
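A sketch of this candidate-based selection is shown below. It assumes the correlation M of formula (4) is the inner product of the generator-side features e_G(·); the actual formula appears only as an image in the original, and the candidate-set size and first threshold used here are illustrative values.

```python
import random
import torch

def select_related_negatives(pos_ids, neg_pool, e_g, num_candidates=50, first_threshold=0.5):
    """For each chosen positive sample, draw a candidate set neg(s) of negatives
    at random and keep those whose correlation with the positive exceeds the
    first threshold. The correlation M is assumed to be the inner product of
    the generator-side features e_G(.)."""
    selected = {}
    for s in pos_ids:
        candidates = random.sample(neg_pool, k=min(num_candidates, len(neg_pool)))
        cand_feats = torch.stack([e_g[c] for c in candidates])   # (k, d)
        corr = cand_feats @ e_g[s]                               # M(s, s') for each candidate
        selected[s] = [c for c, m in zip(candidates, corr.tolist()) if m > first_threshold]
    return selected
```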
Step S106: and judging whether the training stopping criterion is reached, if so, executing the step S107, otherwise, executing the step S103.
Wherein the training stopping criterion is: and the difference between the discrimination parameter of the negative training sample calculated by the discriminator and the discrimination parameter of the positive training sample is smaller than a second threshold value, or the number of times of circularly training the discriminator is larger than a third threshold value.
In one embodiment of the present invention, a triplet loss is used as the optimization target of the discriminator; the triplet loss can be calculated as shown in equation (5), and training of the discriminator can be stopped when the triplet loss falls below a threshold.
L_D = Σ_{s∈Γ} [ f_D(e_D(s')) − f_D(e_D(s)) + r ]_+ ,   s' ~ p_G(s'|s) .......(5)
where f_D(e_D(s)) characterizes the discrimination parameter of the reselected positive training sample calculated by the discriminator, f_D(e_D(s')) characterizes the discrimination parameter of the reselected negative training sample calculated by the discriminator, r is a preset tuning parameter, and p_G(s'|s) characterizes the set of reselected negative training samples.
According to formula (5), when the difference between the discrimination parameter of the negative training sample and that of the positive training sample calculated by the discriminator is smaller than the second threshold, the triplet loss is below the threshold; that is, training of the discriminator can be stopped when this difference is smaller than the second threshold. Alternatively, training of the discriminator can be stopped when the number of training loops exceeds the third threshold.
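A minimal sketch of the triplet loss of equation (5) and of the stop criterion follows; the concrete threshold values and the use of batch means in the stop test are assumptions for illustration.

```python
import torch

def triplet_loss(disc_pos, disc_neg, r=0.1):
    """Triplet loss of equation (5): hinge on f_D(e_D(s')) - f_D(e_D(s)) + r,
    summed over the reselected positive/negative pairs."""
    return torch.clamp(disc_neg - disc_pos + r, min=0.0).sum()

def should_stop(disc_pos, disc_neg, loop_count, second_threshold=0.05, third_threshold=100):
    """Stop criterion: the gap between the discrimination parameters of the
    negative and positive samples falls below the second threshold, or the
    number of training loops exceeds the third threshold."""
    gap = (disc_neg.mean() - disc_pos.mean()).abs().item()
    return gap < second_threshold or loop_count > third_threshold
```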
In summary, the discriminator first calculates discrimination parameters from the attribute features and behavior features corresponding to the positive and negative training samples, and is trained with those parameters and the labels of the positive and negative training samples. Once a reasonably accurate discriminator is obtained, the generator is optimized using the discrimination parameters calculated by the discriminator together with the distances between the positive and negative training samples, so that the generator can reselect negative training samples that are more highly correlated with the positive training samples; in other words, the reselected positive and negative training samples are harder to distinguish than the initially selected ones. The reselected positive and negative training samples are then fed to the discriminator to optimize it, i.e., the discriminator is trained to distinguish positive and negative samples that are hard to tell apart. This training process of the discriminator is the same as the initial process of training the discriminator from the discrimination scores and labels of the positive and negative samples, so it is not repeated here.
When the generator is optimized, two distance limit signals are added, wherein one distance limit is the Euclidean distance between the attribute features of the positive training samples and the attribute features of the negative training samples, and the other distance limit is the Euclidean distance between the behavior features of the positive training samples and the behavior features of the negative training samples. By limiting the two distances, the signal strength of the generator is strengthened, so that a better negative sample generator can be obtained, and the generator can select positive and negative training samples with higher correlation from the sample set. In the process of multiple-cycle optimization, the positive and negative training samples with higher correlation degrees are continuously utilized to train the discriminator, so that the discrimination capability of the discriminator is gradually enhanced, and the discriminator with more accurate prediction effect is obtained.
Step S107: and determining a to-be-recommended positive sample from the to-be-recommended sample set by using the discriminator, and recommending the article corresponding to the to-be-recommended positive sample to the user.
Specifically, the trained discriminator is used to calculate a discrimination score for each sample in the to-be-recommended sample set, and the to-be-recommended positive samples are determined from the set according to the discrimination scores of the respective samples, where the discrimination score represents the probability that the corresponding sample is a to-be-recommended positive sample.
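A sketch of this inference step is given below, under the assumption of a top-k cutoff; the patent only requires selecting to-be-recommended positive samples by their discrimination scores and does not fix how many items are recommended.

```python
import torch

def recommend(disc, candidate_ids, candidate_features, top_k=10):
    """Score each to-be-recommended sample with the trained discriminator and
    return the items with the highest discrimination scores."""
    with torch.no_grad():
        scores = disc(candidate_features)           # probability of being a positive sample
    order = torch.argsort(scores, descending=True)
    return [candidate_ids[i] for i in order[:top_k].tolist()]
```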
When the article recommendation method of the embodiment of the invention is applied to an e-commerce platform, more accurate article advertisements can be provided to users, improving the click-through rate on those advertisements and the advertising effectiveness of the platform. For example, in the scenario where the e-commerce platform delivers advertisements through a third-party platform, after a user clicks an advertisement displayed by the third-party platform, the advertisement is directed to an intermediate page, which is a typical recommendation-system scenario. In this scenario, items may be presented to the user in a feed stream, and the user can click to view items on the intermediate page or keep scrolling to view more items. On the intermediate page, how the recommendation system recommends articles directly affects the user's clicks and conversions, and thus the advertising effectiveness of the e-commerce platform. In this intermediate-page recommendation scenario, adopting the article recommendation method of the embodiment of the invention greatly improves the offline AUC (the area under the ROC curve) and provides more accurate article recommendations to the user, thereby improving the user's click-through rate and conversion rate, further improving the advertising effectiveness of the e-commerce platform and benefiting its revenue.
According to the article recommendation method provided by the embodiment of the invention, the distance between the positive training sample and the negative training sample is calculated according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample, then the positive training sample and the negative training sample related to the positive training sample are reselected according to the distance and the discrimination parameters of the negative training sample, and the discriminator is trained by using the reselected positive training sample and the reselected negative training sample, so that the training effect of the discriminator is improved, the prediction accuracy of the discriminator is improved, and the accuracy of the recommended article when the discriminator is used for article recommendation is improved.
Fig. 2 is a schematic diagram of main blocks of an item recommendation device according to an embodiment of the present invention.
As shown in fig. 2, an item recommendation apparatus 200 according to an embodiment of the present invention includes: a sample acquisition module 201, a sample selection module 202, a training module 203 and a recommendation module 204; wherein,
the sample acquiring module 201 is configured to acquire a sample set, where a training sample in the sample set indicates attribute features of an item and behavior features of a user for the item;
the sample selection module 202 is configured to select a positive training sample and a negative training sample from the sample set, where an article corresponding to the positive training sample is exposed and clicked, and an article corresponding to the negative training sample is exposed and not clicked;
the training module 203 is configured to cyclically execute the following steps to train the arbiter for recommending the item until the training stop criterion is reached: calculating the distance between the positive training sample and the negative training sample according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample; calculating a discrimination parameter of the selected negative training sample, the discrimination parameter indicating a probability that the negative training sample is a positive training sample; reselecting a positive training sample and a negative training sample related to the reselected positive training sample from the training set according to the distance and the discrimination parameter, and training the discriminator according to the reselected positive training sample and the negative training sample;
the recommending module 204 is configured to determine a to-be-recommended positive sample from the to-be-recommended sample set by using the discriminator, and recommend an article corresponding to the to-be-recommended positive sample to a user.
In an embodiment of the present invention, the training module 203 is configured to use the distance, the discriminant parameter, and a negative training sample corresponding to the discriminant parameter as input of a generator for selecting a sample to optimize the generator, and select the positive training sample and a negative training sample related to the reselected positive training sample by using the optimized generator.
In an embodiment of the present invention, the training module 203 randomly selects a positive training sample from the sample set by using the generator, and selects a negative training sample whose correlation with the positive training sample is greater than a first threshold according to the attribute features of the item indicated by the positive training sample and the user's behavior features for the item.
In one embodiment of the present invention, the training stopping criterion is: and the difference between the discrimination parameter of the negative training sample calculated by the discriminator and the discrimination parameter of the positive training sample is smaller than a second threshold value, or the number of times of circularly training the discriminator is larger than a third threshold value.
According to the article recommending device provided by the embodiment of the invention, the distance between the positive training sample and the negative training sample is calculated according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample, then the positive training sample and the negative training sample related to the positive training sample are reselected according to the distance and the distinguishing parameters of the negative training sample, and the discriminator is trained by using the reselected positive training sample and the reselected negative training sample, so that the training effect of the discriminator is improved, the prediction accuracy of the discriminator is improved, and the accuracy of the recommended article is improved when the discriminator is used for recommending articles.
Fig. 3 shows an exemplary system architecture 300 of an item recommendation method or an item recommendation apparatus to which an embodiment of the present invention may be applied.
As shown in fig. 3, the system architecture 300 may include terminal devices 301, 302, 303, a network 304, and a server 305. The network 304 serves as a medium for providing communication links between the terminal devices 301, 302, 303 and the server 305. Network 304 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal device 301, 302, 303 to interact with the server 305 via the network 304 to receive or send messages or the like. The terminal devices 301, 302, 303 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal devices 301, 302, 303 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 305 may be a server providing various services, such as a background management server providing support for shopping websites browsed by the user using the terminal devices 301, 302, 303. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the item recommendation method provided in the embodiment of the present invention is generally executed by the server 305, and accordingly, the item recommendation apparatus is generally disposed in the server 305.
It should be understood that the number of terminal devices, networks, and servers in fig. 3 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 4, a block diagram of a computer system 400 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A driver 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is mounted into the storage section 408 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the system of the present invention when executed by a Central Processing Unit (CPU) 401.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a sample acquisition module, a sample selection module, a training module, and a recommendation module. Where the names of these modules do not in some cases constitute a limitation on the module itself, for example, a sample acquisition module may also be described as a "module that acquires a sample set".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: acquiring a sample set, wherein training samples in the sample set indicate attribute characteristics of an article and behavior characteristics of a user for the article; selecting a positive training sample and a negative training sample from the sample set, wherein the article corresponding to the positive training sample is exposed and clicked, and the article corresponding to the negative training sample is exposed and not clicked; circularly executing the following steps, training a discriminator for item recommendation until reaching a training stop criterion: calculating the distance between the positive training sample and the negative training sample according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample; calculating a discrimination parameter of the selected negative training sample, the discrimination parameter indicating a probability that the negative training sample is a positive training sample; reselecting a positive training sample and a negative training sample related to the reselected positive training sample from the training sample set according to the distance and the discrimination parameter; training the discriminator according to the reselected positive training sample and the negative training sample; and determining a to-be-recommended positive sample from the to-be-recommended sample set by using the discriminator, and recommending the article corresponding to the to-be-recommended positive sample to the user.
According to the technical scheme of the embodiment of the invention, the distance between the positive training sample and the negative training sample is calculated according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample, then the positive training sample and the negative training sample related to the positive training sample are reselected according to the distance and the discrimination parameters of the negative training sample, and the discriminator is trained by using the reselected positive training sample and the reselected negative training sample, so that the training effect of the discriminator is improved, the prediction accuracy of the discriminator is improved, and the accuracy of the recommended articles when the discriminator is used for recommending the articles is improved.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (13)
1. An item recommendation method, comprising:
acquiring a sample set, wherein training samples in the sample set indicate attribute characteristics of an article and behavior characteristics of a user for the article;
selecting a positive training sample and a negative training sample from the sample set, wherein the article corresponding to the positive training sample is exposed and clicked, and the article corresponding to the negative training sample is exposed and not clicked;
cyclically executing the following steps to train a discriminator for item recommendation until a training stop criterion is reached:
calculating the distance between the positive training sample and the negative training sample according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample;
calculating a discrimination parameter of the selected negative training sample, the discrimination parameter indicating a probability that the negative training sample is a positive training sample;
reselecting a positive training sample and a negative training sample related to the reselected positive training sample from the training sample set according to the distance and the discrimination parameter;
training the discriminator according to the reselected positive training sample and the negative training sample;
and determining a to-be-recommended positive sample from the to-be-recommended sample set by using the discriminator, and recommending the article corresponding to the to-be-recommended positive sample to the user.
2. The method of claim 1, wherein said reselecting positive training samples and negative training samples associated with the reselected positive training samples from the set of training samples according to the distance and the discrimination parameter comprises:
taking the distance, the discrimination parameter and a negative training sample corresponding to the discrimination parameter as the input of a generator for selecting a sample so as to optimize the generator;
reselecting the positive training samples and the negative training samples associated with the reselected positive training samples using the optimized generator.
3. The method of claim 2, wherein
an optimization gradient of the generator is calculated according to the distance and the discrimination parameter, and the generator is optimized by a stochastic gradient method based on the optimization gradient.
4. The method of claim 2, wherein the reselecting positive training samples and negative training samples associated with the reselected positive training samples from the set of training samples comprises:
randomly selecting positive training samples from the sample set by using the generator, and selecting negative training samples whose degree of correlation with the positive training samples is greater than a first threshold, according to the attribute characteristics of the articles indicated by the positive training samples and the behavior characteristics of the user for the articles.
5. The method of claim 1, wherein
the training stop criterion is: the difference between the discrimination parameter of the negative training sample and the discrimination parameter of the positive training sample, both calculated by the discriminator, is smaller than a second threshold, or the number of cycles of training the discriminator is greater than a third threshold.
6. The method of claim 1, wherein
the difference between the discrimination parameter of the reselected negative training sample and the distance between the reselected positive training sample and the reselected negative training sample is greater than a fourth threshold.
7. The method of claim 1, wherein the calculating the distance between the positive training sample and the negative training sample according to the attribute features and the behavior features respectively indicated by the positive training sample and the negative training sample comprises:
pre-training the discriminator according to the attribute characteristics and the behavior characteristics, and calculating the distance between the positive training sample and the negative training sample according to the pre-trained discriminator.
8. An item recommendation device, comprising: the system comprises a sample acquisition module, a sample selection module, a training module and a recommendation module; wherein,
the sample acquisition module is used for acquiring a sample set, wherein training samples in the sample set indicate attribute characteristics of an article and behavior characteristics of a user aiming at the article;
the sample selection module is used for selecting a positive training sample and a negative training sample from the sample set, wherein the article corresponding to the positive training sample is exposed and clicked, and the article corresponding to the negative training sample is exposed and not clicked;
the training module is used for cyclically executing the following steps to train the discriminator for item recommendation until the training stop criterion is reached: calculating the distance between the positive training sample and the negative training sample according to the attribute characteristics and the behavior characteristics respectively indicated by the positive training sample and the negative training sample; calculating a discrimination parameter of the selected negative training sample, the discrimination parameter indicating a probability that the negative training sample is a positive training sample; reselecting a positive training sample and a negative training sample related to the reselected positive training sample from the training sample set according to the distance and the discrimination parameter, and training the discriminator according to the reselected positive training sample and the negative training sample;
the recommending module is used for determining a to-be-recommended positive sample from the to-be-recommended sample set by using the discriminator and recommending the article corresponding to the to-be-recommended positive sample to the user.
9. The apparatus of claim 8, wherein
the training module is used for taking the distance, the discrimination parameter, and the negative training sample corresponding to the discrimination parameter as inputs of a generator for selecting samples so as to optimize the generator, and for selecting, by using the optimized generator, the positive training sample and the negative training sample related to the reselected positive training sample.
10. The apparatus of claim 9, wherein
the training module randomly selects a positive training sample from the sample set by using the generator, and selects a negative training sample whose degree of correlation with the positive training sample is greater than a first threshold, according to the attribute characteristics of the article indicated by the positive training sample and the behavior characteristics of the user for the article.
11. The apparatus of claim 8, wherein
the training stop criterion is: the difference between the discrimination parameter of the negative training sample and the discrimination parameter of the positive training sample, both calculated by the discriminator, is smaller than a second threshold, or the number of cycles of training the discriminator is greater than a third threshold.
12. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
13. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911114678.1A CN110992127B (en) | 2019-11-14 | 2019-11-14 | Article recommendation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110992127A true CN110992127A (en) | 2020-04-10 |
CN110992127B CN110992127B (en) | 2023-09-29 |
Family
ID=70084451
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911114678.1A Active CN110992127B (en) | 2019-11-14 | 2019-11-14 | Article recommendation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110992127B (en) |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140095346A1 (en) * | 2012-09-28 | 2014-04-03 | International Business Machines Corporation | Data analysis method and system thereof |
CN105677779A (en) * | 2015-12-30 | 2016-06-15 | 山东大学 | Feedback-type question type classifier system based on scoring mechanism and working method thereof |
CN105894359A (en) * | 2016-03-31 | 2016-08-24 | 百度在线网络技术(北京)有限公司 | Order pushing method, device and system |
CN106530010A (en) * | 2016-11-15 | 2017-03-22 | 平安科技(深圳)有限公司 | Collaborative filtering method and apparatus based on fusion of time factor |
CN107451894A (en) * | 2017-08-03 | 2017-12-08 | 北京京东尚科信息技术有限公司 | Data processing method, device and computer-readable recording medium |
CN107862556A (en) * | 2017-12-04 | 2018-03-30 | 北京奇艺世纪科技有限公司 | A kind of put-on method and system of VIP advertisements |
CN109241366A (en) * | 2018-07-18 | 2019-01-18 | 华南师范大学 | A kind of mixed recommendation system and method based on multitask deep learning |
CN109615421A (en) * | 2018-11-28 | 2019-04-12 | 安徽大学 | A kind of individual commodity recommendation method based on multi-objective Evolutionary Algorithm |
CN109902708A (en) * | 2018-12-29 | 2019-06-18 | 华为技术有限公司 | A kind of recommended models training method and relevant apparatus |
CN109815355A (en) * | 2019-01-28 | 2019-05-28 | 网易(杭州)网络有限公司 | Image search method and device, storage medium, electronic equipment |
CN110046952A (en) * | 2019-01-30 | 2019-07-23 | 阿里巴巴集团控股有限公司 | A kind of training method and device, a kind of recommended method and device of recommended models |
CN110363346A (en) * | 2019-07-12 | 2019-10-22 | 腾讯科技(北京)有限公司 | Clicking rate prediction technique, the training method of prediction model, device and equipment |
CN110427560A (en) * | 2019-08-08 | 2019-11-08 | 腾讯科技(深圳)有限公司 | A kind of model training method and relevant apparatus applied to recommender system |
Non-Patent Citations (2)
Title |
---|
刘瑞军;: "Simulation of user network information security detection in a distributed environment", no. 08 *
吴玺煜;陈启买;刘海;贺超波;: "Collaborative filtering recommendation algorithm based on knowledge graph representation learning", Computer Engineering (计算机工程), no. 02 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113743436A (en) * | 2020-06-29 | 2021-12-03 | 北京沃东天骏信息技术有限公司 | Feature selection method and device for generating user portrait |
CN113763092A (en) * | 2020-11-10 | 2021-12-07 | 北京沃东天骏信息技术有限公司 | Method, device, equipment and computer readable medium for recommending commodities |
CN113190725A (en) * | 2021-03-31 | 2021-07-30 | 北京达佳互联信息技术有限公司 | Object recommendation and model training method and device, equipment, medium and product |
CN113190725B (en) * | 2021-03-31 | 2023-12-12 | 北京达佳互联信息技术有限公司 | Object recommendation and model training method and device, equipment, medium and product |
EP4270291A1 (en) * | 2022-04-29 | 2023-11-01 | Naver Corporation | Method and computer device to recommend related product technical field |
Also Published As
Publication number | Publication date |
---|---|
CN110992127B (en) | 2023-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106873799B (en) | Input method and device | |
CN108776676B (en) | Information recommendation method and device, computer readable medium and electronic device | |
CN110992127B (en) | Article recommendation method and device | |
US10747771B2 (en) | Method and apparatus for determining hot event | |
CN112074857A (en) | Combining machine learning and social data to generate personalized recommendations | |
CN107679217B (en) | Associated content extraction method and device based on data mining | |
CN108960945A (en) | Method of Commodity Recommendation and device | |
CN107526718B (en) | Method and device for generating text | |
CN114036398B (en) | Content recommendation and ranking model training method, device, equipment and storage medium | |
CN111695041B (en) | Method and device for recommending information | |
CN110516033B (en) | Method and device for calculating user preference | |
CN110059172B (en) | Method and device for recommending answers based on natural language understanding | |
CN111104599A (en) | Method and apparatus for outputting information | |
WO2014107194A1 (en) | Identifying relevant user content | |
CN112328889A (en) | Method and device for determining recommended search terms, readable medium and electronic equipment | |
CN110827101A (en) | Shop recommendation method and device | |
CN117391824B (en) | Method and device for recommending articles based on large language model and search engine | |
CN113641855A (en) | Video recommendation method, device, equipment and storage medium | |
CN110555747A (en) | method and device for determining target user | |
CN112905885A (en) | Method, apparatus, device, medium, and program product for recommending resources to a user | |
CN109299351B (en) | Content recommendation method and device, electronic equipment and computer readable medium | |
CN116304352A (en) | Message pushing method, device, equipment and storage medium | |
CN112148865A (en) | Information pushing method and device | |
CN112784861A (en) | Similarity determination method and device, electronic equipment and storage medium | |
CN107483595B (en) | Information pushing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||