CN116541610B - Training method and device for recommendation model - Google Patents

Training method and device for recommendation model

Info

Publication number
CN116541610B
CN116541610B
Authority
CN
China
Prior art keywords
interaction
sequence
interactive
predicted
training
Prior art date
Legal status
Active
Application number
CN202310820760.6A
Other languages
Chinese (zh)
Other versions
CN116541610A (en)
Inventor
齐盛
董辉
Current Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd filed Critical Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202310820760.6A priority Critical patent/CN116541610B/en
Publication of CN116541610A publication Critical patent/CN116541610A/en
Application granted granted Critical
Publication of CN116541610B publication Critical patent/CN116541610B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                • G06F16/90 Details of database functions independent of the retrieved data types
                • G06F16/95 Retrieval from the web
                • G06F16/953 Querying, e.g. by the use of web search engines
                • G06F16/9535 Search customisation based on user profiles and personalisation
                • G06F18/00 Pattern recognition
                • G06F18/20 Analysing
                • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
            • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
                • G06N3/00 Computing arrangements based on biological models
                • G06N3/02 Neural networks
                • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                • G06N3/0475 Generative networks
                • G06N3/08 Learning methods
            • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
                • G06Q30/00 Commerce
                • G06Q30/06 Buying, selling or leasing transactions
                • G06Q30/0601 Electronic shopping [e-shopping]
                • G06Q30/0631 Item recommendations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to the technical field of artificial intelligence and provides a training method, apparatus, computer device, and computer-readable storage medium for a recommendation model. By jointly optimizing a generator and a discriminator, the target recommendation model can fully exploit the potential of the data. During training, a predicted interaction sequence is injected into the historical interaction sequence to form the training interaction sequence, which introduces some noise into the sequences used to train the target recommendation model. This acts as data augmentation and substantially mitigates the limitation of low-quality sequence data for training the target recommendation model. As a result, the method can greatly improve the prediction and recommendation accuracy of the target recommendation model and its online recommendation performance.

Description

Training method and device for recommendation model
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular relates to a training method and device for a recommendation model.
Background
Recommendation systems play an indispensable role in daily life, appearing in online shopping, news reading, and video watching. To make a recommendation system push more accurately, items and users are first modeled thoroughly, and the items a user is most likely to click are preferentially pushed to that user, improving user satisfaction and the efficiency of the whole system. In the prior art, recommendation models are generally used to recommend suitable objects (such as goods or services) to a user. For example, in a commercial scenario, the input of a recommendation model is typically a batch of user features and item features, and the model determines whether a user will click on a specific item and purchase it. The loss function is computed from the model's output and the user's real click and purchase results, guiding the optimization of the recommendation model. Recommendation models often take the form of sequential recommendation in e-commerce and news scenarios: the whole interaction sequence is modeled to predict what the user may click next. However, existing recommendation models only process and encode the user's historical interaction sequence during training without considering the quality and meaning of the data, so the trained recommendation model is suboptimal. Consequently, when such a model is used to recommend goods or services to users, the recommendations are often not what the users really want, which degrades the user experience and limits the conversion rate of the goods or services to a certain extent.
Disclosure of Invention
In view of the above, embodiments of the present disclosure provide a training method, apparatus, computer device, and computer-readable storage medium for a recommendation model, to solve the prior-art problem that the trained recommendation model is suboptimal, so that when the model recommends objects to a user, the recommended objects are not what the user really wants and the user experience is poor.
In a first aspect of an embodiment of the present disclosure, a training method for a recommendation model is provided, where the method includes:
inputting a historical interaction sequence from an interaction training sample into a preset generator to obtain a predicted interaction sequence corresponding to the historical interaction sequence; where all interaction feature information in the historical interaction sequence corresponds to objects on which a target user has performed a preset operation, and the predicted interaction feature information in the predicted interaction sequence corresponds to objects on which the target user has not performed the preset operation;
generating a training interaction sequence according to the historical interaction sequence and the predicted interaction sequence; where one part of the interaction feature information in the training interaction sequence is part of the interaction feature information of the historical interaction sequence, and the other part is all or part of the interaction feature information of the predicted interaction sequence;
inputting the training interaction sequence into a preset discriminator to obtain a predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence; where the predicted conversion result indicates whether the discriminator predicts that the target user will perform the preset operation on the object corresponding to that interaction feature information;
and adjusting the model parameters of the discriminator according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence, and the real conversion result corresponding to each piece of interaction feature information in the training interaction sequence, to obtain a target recommendation model.
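Read as an algorithm rather than a claim, the four steps above can be sketched as a single training step. All names and callables below are hypothetical stand-ins, not the patent's implementation; the loss here is a simple mismatch count standing in for whatever loss is actually used to adjust the discriminator's parameters.

```python
def train_step(history, generator, sampler, discriminator, real_results):
    """One training step over the four claimed operations (all stand-ins)."""
    predicted = generator(history)            # step 1: predicted interaction sequence
    train_seq = sampler(history, predicted)   # step 2: training interaction sequence
    pred_results = discriminator(train_seq)   # step 3: predicted conversion results
    # step 4: compare predicted vs. real conversion results to drive updates
    loss = sum(p != r for p, r in zip(pred_results, real_results))
    return train_seq, loss

# Toy stand-ins mirroring the [a, b, c, d] example used later in the text.
gen = lambda h: h[1:] + ["e"]       # predict "e" as the next item
mix = lambda h, p: h[:-1] + [p[-1]] # replace the last history item
dis = lambda s: [1, 1, 1, 0]        # fixed "conversion" outputs
seq, loss = train_step(["a", "b", "c", "d"], gen, mix, dis, [1, 1, 1, 0])
# seq == ["a", "b", "c", "e"], loss == 0
```

In a real system the lambdas would be learned sequence models and the mismatch count would be a differentiable loss.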
In a second aspect of the embodiments of the present disclosure, there is provided a training apparatus for a recommendation model, the apparatus including:
the sequence prediction module, configured to input a historical interaction sequence from an interaction training sample into a preset generator to obtain a predicted interaction sequence corresponding to the historical interaction sequence; where all interaction feature information in the historical interaction sequence corresponds to objects on which a target user has performed a preset operation, and the predicted interaction feature information in the predicted interaction sequence corresponds to objects on which the target user has not performed the preset operation;
the sequence generation module, configured to generate a training interaction sequence according to the historical interaction sequence and the predicted interaction sequence; where one part of the interaction feature information in the training interaction sequence is part of the interaction feature information of the historical interaction sequence, and the other part is all or part of the interaction feature information of the predicted interaction sequence;
the result prediction module, configured to input the training interaction sequence into a preset discriminator to obtain a predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence; where the predicted conversion result indicates whether the discriminator predicts that the target user will perform the preset operation on the object corresponding to that interaction feature information;
the model adjustment module, configured to adjust the model parameters of the discriminator according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence, and the real conversion result corresponding to each piece of interaction feature information in the training interaction sequence, to obtain a target recommendation model.
In a third aspect of the disclosed embodiments, a computer device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when the computer program is executed.
In a fourth aspect of the disclosed embodiments, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the above-described method.
Compared with the prior art, the embodiments of the present disclosure have the following beneficial effects. The historical interaction sequence in an interaction training sample is input into a preset generator to obtain the predicted interaction sequence corresponding to the historical interaction sequence; a training interaction sequence is then generated from the historical interaction sequence and the predicted interaction sequence; the training interaction sequence is then input into a preset discriminator to obtain the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence; finally, the model parameters of the discriminator are adjusted according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence, and the real conversion result corresponding to each piece of interaction feature information in the training interaction sequence, to obtain the target recommendation model.
In this embodiment, the discriminator performs a secondary check on the predicted interaction sequence generated by the generator, so that jointly optimizing the generator and the discriminator lets the target recommendation model fully exploit the potential of the data. Because the predicted interaction sequence is injected into the historical interaction sequence during training to form the training interaction sequence, some noise is introduced into the sequences used to train the target recommendation model; this acts as data augmentation and substantially mitigates the limitation of low-quality sequence data. The method provided by this embodiment can therefore greatly improve the prediction and recommendation accuracy of the target recommendation model and its online recommendation performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments or for the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a scene schematic diagram of an application scene of an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method of training a recommendation model provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a model training process for a target recommendation model provided by an embodiment of the present disclosure;
FIG. 4 is a block diagram of a training apparatus of a recommendation model provided by an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
A training method and apparatus for a recommendation model according to embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
In the prior art, recommendation models are generally used to recommend suitable objects (such as goods or services) to a user. For example, in a commercial scenario, the input of a recommendation model is typically a batch of user features and item features, and the model determines whether a user will click on a specific item and purchase it. The loss function is computed from the model's output and the user's real click and purchase results, guiding the optimization of the recommendation model. Recommendation models often take the form of sequential recommendation in e-commerce and news scenarios: the whole interaction sequence is modeled to predict what the user may click next. However, existing recommendation models only process and encode the user's historical interaction sequence during training without considering the quality and meaning of the data, so the trained recommendation model is suboptimal. Consequently, when such a model is used to recommend goods or services to users, the recommendations are often not what the users really want, which degrades the user experience and limits the conversion rate of the goods or services to a certain extent.
To solve the above problems, the disclosed method uses a discriminator to perform a secondary check on the predicted interaction sequence generated by a generator, so that jointly optimizing the generator and the discriminator lets the target recommendation model fully exploit the potential of the data. During training, the predicted interaction sequence is injected into the historical interaction sequence to form the training interaction sequence, introducing some noise into the sequences used to train the target recommendation model; this acts as data augmentation and substantially mitigates the limitation of low-quality sequence data. The method provided by this embodiment can therefore greatly improve the prediction and recommendation accuracy of the target recommendation model and its online recommendation performance.
For example, embodiments of the present disclosure may be applied to the application scenario shown in fig. 1. This scenario may include a terminal device 1 and a server 2.
The terminal device 1 may be hardware or software. When the terminal device 1 is hardware, it may be various electronic devices having a display screen and supporting communication with the server 2, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the terminal device 1 is software, it may be installed in the electronic device as described above. The terminal device 1 may be implemented as a plurality of software or software modules, or as a single software or software module, to which the embodiments of the present disclosure are not limited. Further, various applications, such as a data processing application, an instant messaging tool, social platform software, a search class application, a shopping class application, and the like, may be installed on the terminal device 1.
The server 2 may be a server that provides various services, for example, a background server that receives requests transmitted from a terminal device with which a communication connection has been established; the background server may receive and analyze such a request and generate a processing result. The server 2 may be a single server, a server cluster formed by multiple servers, or a cloud computing service center, which is not limited in the embodiments of the present disclosure.
The server 2 may be hardware or software. When the server 2 is hardware, it may be various electronic devices that provide various services to the terminal device 1. When the server 2 is software, it may be a plurality of software or software modules providing various services to the terminal device 1, or may be a single software or software module providing various services to the terminal device 1, which is not limited by the embodiments of the present disclosure.
The terminal device 1 and the server 2 may be communicatively connected via a network. The network may be a wired network using coaxial cable, twisted pair wire, and optical fiber connection, or may be a wireless network that can implement interconnection of various communication devices without wiring, for example, bluetooth (Bluetooth), near field communication (Near Field Communication, NFC), infrared (Infrared), etc., which are not limited by the embodiments of the present disclosure.
Specifically, a user may input an interaction training sample through the terminal device 1, and the terminal device 1 sends the interaction training sample to the server 2, on which a preset generator and discriminator are stored. The server 2 may input the historical interaction sequence in the interaction training sample into the preset generator to obtain the predicted interaction sequence corresponding to the historical interaction sequence; all interaction feature information in the historical interaction sequence corresponds to objects on which the target user has performed a preset operation, while the predicted interaction feature information in the predicted interaction sequence corresponds to objects on which the target user has not performed the preset operation. The server 2 may then generate a training interaction sequence from the historical interaction sequence and the predicted interaction sequence, in which one part of the interaction feature information comes from the historical interaction sequence and the other part is all or part of the interaction feature information of the predicted interaction sequence.
Next, the server 2 may input the training interaction sequence into the preset discriminator to obtain the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence; the predicted conversion result indicates whether the discriminator predicts that the target user will perform the preset operation on the corresponding object. Finally, the server 2 may adjust the model parameters of the discriminator according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result, and the real conversion result corresponding to each piece of interaction feature information in the training interaction sequence, to obtain the target recommendation model. In this way, the discriminator performs a secondary check on the predicted interaction sequence generated by the generator, joint optimization of the generator and the discriminator lets the target recommendation model fully exploit the potential of the data, and injecting the predicted interaction sequence into the historical interaction sequence introduces some noise into the training interaction sequence, achieving data augmentation and substantially mitigating the limitation of low-quality sequence data. The method provided by this embodiment can therefore greatly improve the prediction and recommendation accuracy of the target recommendation model and its online recommendation performance.
It should be noted that the specific types, numbers and combinations of the terminal device 1 and the server 2 and the network may be adjusted according to the actual requirements of the application scenario, which is not limited in the embodiment of the present disclosure.
It should be noted that the above application scenario is only shown for the convenience of understanding the present disclosure, and embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
Fig. 2 is a flowchart of a training method of a recommendation model provided in an embodiment of the present disclosure. The training method of fig. 2 may be performed by the terminal device or the server of fig. 1. As shown in fig. 2, the training method of the recommendation model includes:
s201: and inputting the historical interaction sequence in the interaction training sample into a preset generator to obtain a predicted interaction sequence corresponding to the historical interaction sequence.
In this embodiment, the generator may be understood as a neural network model that predicts the interaction sequence following a given interaction sequence. In one implementation, the preset generator may be a Transformer-based sequence encoder.
The historical interaction sequence may include multiple pieces of interaction feature information, arranged in a preset order. In one implementation, the order of the pieces of interaction feature information in the historical interaction sequence may be determined by the time at which the preset operation was performed on the corresponding object (i.e., when the interaction occurred); for example, the pieces of interaction feature information may be ordered by their interaction times to form the historical interaction sequence. Concretely, the products with which the same user interacted (e.g., purchased or favorited) on an e-commerce website or application over a period of time may be arranged in chronological order to form that user's sequence of interacted products (i.e., the historical interaction sequence). Note that all interaction feature information in the historical interaction sequence corresponds to objects (e.g., goods, videos) on which the target user has performed a preset operation (e.g., purchased, favorited). That is, the historical interaction sequence includes multiple pieces of interaction feature information, each corresponding to one object on which the target user has performed the preset operation (i.e., an interaction behavior), and the objects corresponding to the pieces of interaction feature information are all distinct.
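The ordering just described can be made concrete with a small sketch that builds a historical interaction sequence by sorting a user's interaction records by time. The record fields (`item`, `time`) are illustrative assumptions, not from the patent, and the earliest-first order matches the [a, b, c, d] example below.

```python
def build_history_sequence(interactions):
    """Sort interaction records by interaction time and keep the first
    occurrence of each object, yielding the historical interaction
    sequence. Field names are illustrative."""
    ordered = sorted(interactions, key=lambda rec: rec["time"])
    seen, sequence = set(), []
    for rec in ordered:
        if rec["item"] not in seen:   # each object appears only once
            seen.add(rec["item"])
            sequence.append(rec["item"])
    return sequence

clicks = [{"item": "b", "time": 2}, {"item": "a", "time": 1},
          {"item": "d", "time": 4}, {"item": "c", "time": 3}]
# build_history_sequence(clicks) → ["a", "b", "c", "d"]
```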
The predicted interaction sequence may include only predicted interaction feature information, or it may include both interaction feature information from the historical interaction sequence and interaction feature information predicted from it. Note that the predicted interaction feature information in the predicted interaction sequence corresponds to objects on which the target user has not performed the preset operation (e.g., not purchased, not favorited).
It should be noted that an object can be understood as something on which an interaction behavior can be performed, and interaction feature information as feature information reflecting the attributes of an object the user interacted with. For example, when the object is a good or service, the interaction feature information may reflect attributes such as its price, sales volume, product type, and product use.
For example, as shown in fig. 3, assuming that user A clicked on goods a, b, c, and d, the historical interaction sequence in the interaction training sample may be [a, b, c, d]. Inputting this historical interaction sequence into the preset generator yields the predicted interaction sequence [b, c, d, e] corresponding to [a, b, c, d], where e is predicted interaction feature information; that is, the good the user clicks next is predicted to be good e.
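A toy stand-in for this generator step, shown only to fix the input/output shapes: a real generator would be a learned Transformer sequence encoder, and the `catalog` argument is an assumption introduced for illustration.

```python
def predict_interaction_sequence(history, catalog):
    """Shift the history left one position and append a predicted next
    object: here, simply the first catalog object the user has not yet
    interacted with. Mirrors the [a, b, c, d] → [b, c, d, e] example."""
    next_item = next(item for item in catalog if item not in history)
    return history[1:] + [next_item]

# predict_interaction_sequence(["a", "b", "c", "d"],
#                              ["a", "b", "c", "d", "e", "f"])
# → ["b", "c", "d", "e"]
```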
S202: generating a training interaction sequence according to the historical interaction sequence and the predicted interaction sequence.
After the generator produces the predicted interaction sequence corresponding to the historical interaction sequence, a training interaction sequence for training the discriminator can be generated from the historical interaction sequence and the predicted interaction sequence. One part of the interaction feature information in the training interaction sequence may be part of the interaction feature information of the historical interaction sequence, and the other part may be all or part of the interaction feature information of the predicted interaction sequence. For example, as shown in fig. 3, the historical interaction sequence and the predicted interaction sequence may be input to a data sampler module to obtain the training interaction sequence. In this way, some noise is introduced into the training interaction sequence used to train the discriminator, achieving data augmentation and substantially mitigating the limitation of low-quality sequence data.
Specifically, the target number of pieces of interactive feature information to be replaced can be determined according to the number of pieces of interactive feature information in the historical interaction sequence and a preset replacement proportion. Then, the target number of pieces of interactive feature information can be selected from the predicted interaction sequence as target interactive feature information, and the target number of pieces of interactive feature information can be selected from the historical interaction sequence as interactive feature information to be replaced. Finally, the interactive feature information to be replaced in the historical interaction sequence can be replaced with the target interactive feature information, thereby obtaining the training interaction sequence. For example, as shown in fig. 3, assuming that the preset replacement proportion is 25%, since the historical interaction sequence is [ a, b, c, d ] and the predicted interaction sequence is [ b, c, d, e ], the one piece of interactive feature information d in the historical interaction sequence may be replaced with the predicted interactive feature information e selected as target interactive feature information, thereby obtaining the training interaction sequence [ a, b, c, e ].
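The replacement procedure above can be sketched as follows. The choice of which positions to replace and which predicted items to draw is an assumption; the text fixes only the replacement count:

```python
import math
import random

def build_training_sequence(history, predicted, replace_ratio=0.25, rng=None):
    """Build a training interaction sequence per S202: replace a preset
    proportion of historical items with items drawn from the predicted
    sequence. Random selection of positions/items is an assumed strategy."""
    rng = rng or random.Random(0)
    # Target number of items to replace = sequence length * preset replacement ratio.
    target_num = max(1, math.floor(len(history) * replace_ratio))
    # Select target interactive feature information from the predicted sequence.
    targets = rng.sample(predicted, target_num)
    # Select the positions (interactive feature information to be replaced).
    positions = rng.sample(range(len(history)), target_num)
    training = list(history)
    for pos, item in zip(positions, targets):
        training[pos] = item
    return training

seq = build_training_sequence(["a", "b", "c", "d"], ["b", "c", "d", "e"])
```

With a ratio of 25% and four items, exactly one item of [a, b, c, d] is replaced, mirroring the [a, b, c, e] example.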
S203: inputting the training interaction sequence into a preset discriminator to obtain a prediction conversion result corresponding to each interaction characteristic information in the training interaction sequence.
In this embodiment, the discriminator may be understood as a neural network model for predicting the conversion result of an object. Like the generator, the discriminator may be a Transformer-based sequence encoder. It should be understood that the network architectures of the generator and the discriminator are the same, while their model parameters are different. For example, as shown in fig. 3, the training interaction sequence [ a, b, c, e ] is input into a preset discriminator, so as to obtain the predicted conversion result corresponding to each piece of interactive feature information a, b, c, e in the training interaction sequence.
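The "same architecture, different parameters" relationship can be illustrated schematically. The config fields and initialization below are stand-ins, not a real Transformer implementation:

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class EncoderConfig:
    """Shared architecture hyperparameters for generator and discriminator.
    The concrete sizes here are illustrative assumptions."""
    num_layers: int = 2
    hidden_size: int = 8

def init_params(config, seed):
    """Stand-in for Transformer weight initialization: one weight vector per layer."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(config.hidden_size)]
            for _ in range(config.num_layers)]

config = EncoderConfig()                            # identical network architecture...
generator_params = init_params(config, seed=1)
discriminator_params = init_params(config, seed=2)  # ...but independent parameters
```

Both models are built from the same config object, so their shapes match, while their parameter values evolve independently during training.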
The predicted conversion result corresponding to a piece of interactive feature information may be used to characterize whether the object corresponding to that interactive feature information, as predicted by the discriminator, is subjected to a preset operation (such as ordering or purchasing) by the target user. It should be noted that, assuming the interacted object is a commodity, if the user performs a further action such as ordering or reserving on an interacted commodity within a period of time after interacting with the last commodity, such a further user action may be referred to as a conversion. When training the model, conversion behaviors are collected for the user's sequence of interacted commodities, so as to determine whether the user finally converted on each commodity. The training goal of the model is to predict, given the user's interacted-commodity sequence as input, on which commodities the user will perform actual conversion behaviors, so that in use the trained recommendation model can push commodities with a high conversion likelihood to the user, thereby improving the user's conversion rate on interacted commodities.
S204: and according to the predicted interactive sequence corresponding to the historical interactive sequence, the predicted conversion result corresponding to each interactive feature information in the training interactive sequence and the real conversion result corresponding to each interactive feature information in the training interactive sequence, the model parameters of the discriminator are adjusted to obtain the target recommendation model.
It should be noted that, in this embodiment, the interaction training sample may include the historical interaction sequence and the real conversion result corresponding to each piece of interactive feature information in the historical interaction sequence. Since the predicted interactive feature information in the predicted interaction sequence is only predicted, the user has not yet performed the preset operation on the corresponding object; therefore, the real conversion result corresponding to the predicted interactive feature information may be preset as not converted.
After the prediction conversion result corresponding to each piece of interaction characteristic information in the training interaction sequence is obtained, a loss value can be determined according to the prediction interaction sequence corresponding to the historical interaction sequence, the prediction conversion result corresponding to each piece of interaction characteristic information in the training interaction sequence and the real conversion result corresponding to each piece of interaction characteristic information in the training interaction sequence, so that model parameters of the discriminator can be adjusted by using the loss value, and a target recommendation model is obtained.
The beneficial effects of the embodiment of the disclosure are that: according to the embodiment of the disclosure, the historical interaction sequence in the interaction training sample can be input into a preset generator to obtain the predicted interaction sequence corresponding to the historical interaction sequence; then, a training interaction sequence can be generated according to the historical interaction sequence and the predicted interaction sequence; then, the training interaction sequence can be input into a preset discriminator to obtain a prediction conversion result corresponding to each interaction characteristic information in the training interaction sequence; finally, the model parameters of the discriminator can be adjusted according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each interaction characteristic information in the training interaction sequence and the real conversion result corresponding to each interaction characteristic information in the training interaction sequence, so as to obtain the target recommendation model. 
Therefore, in this embodiment, the predicted interaction sequence generated by the generator is subjected to secondary verification by the discriminator, so that the target recommendation model can fully mine the potential of the data by jointly optimizing the generator and the discriminator. Moreover, because the training interaction sequence is obtained by introducing the predicted interaction sequence into the historical interaction sequence during model training, some noise is introduced into the training interaction sequence used to train the target recommendation model, which achieves a data-enhancement effect and greatly alleviates the limitation of low-quality sequence data. Therefore, the method provided by this embodiment can greatly improve the prediction and recommendation accuracy of the target recommendation model and improve its online recommendation performance.
In some implementations of the present embodiment, the predicted interaction sequence may include predicted interactive feature information and an interaction probability corresponding to each piece of predicted interactive feature information. The interaction probability corresponding to a piece of predicted interactive feature information can be understood as the probability that the user performs the preset operation on the corresponding object. It should be noted that the larger this interaction probability, the higher the possibility that the user performs the preset operation on the corresponding object; conversely, the smaller the interaction probability, the lower that possibility.
In some implementations, S204 "adjusting the model parameters of the discriminator according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each piece of interactive feature information in the training interaction sequence, and the real conversion result corresponding to each piece of interactive feature information in the training interaction sequence, to obtain the target recommendation model" may include the following steps:
s204a: and determining a first loss value according to the interaction probability corresponding to the predicted interaction characteristic information.
In this embodiment, for each piece of predicted interactive feature information, the logarithm of its corresponding interaction probability may first be calculated, and the product of that logarithm and a preset coefficient may be taken as the loss term of that piece of predicted interactive feature information. The sum of the loss terms of all the pieces of predicted interactive feature information may then be taken as the first loss value. The first loss value may be understood as the loss value of the generator.
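A minimal sketch of this first loss value. Taking the preset coefficient to be -1 (which makes it a negative log-likelihood) is an assumption; the text leaves the coefficient open:

```python
import math

def first_loss(interaction_probs, coeff=-1.0):
    """First loss value (generator loss): for each piece of predicted
    interactive feature information, multiply the log of its interaction
    probability by a preset coefficient, then sum the terms.
    coeff = -1 is an assumed choice of the preset coefficient."""
    return sum(coeff * math.log(p) for p in interaction_probs)

loss = first_loss([0.8, 0.5])  # -(ln 0.8 + ln 0.5)
```

Higher interaction probabilities for the predicted items drive this loss toward zero, rewarding a confident generator.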
S204b: and determining a second loss value according to the predicted conversion result corresponding to each piece of interaction characteristic information in the training interaction sequence and the preset real conversion result corresponding to each piece of interaction characteristic information in the training interaction sequence.
The second loss value may be determined by using a cross entropy loss function, the predicted conversion result corresponding to each piece of interactive feature information in the training interaction sequence, and the preset real conversion result corresponding to each piece of interactive feature information in the training interaction sequence. Specifically, based on a cross entropy loss function (such as a binary cross entropy loss function), a binary cross entropy loss value can be determined according to the predicted conversion result and the preset real conversion result corresponding to each piece of interactive feature information in the training interaction sequence; the second loss value is then determined from the binary cross entropy loss value, for example by taking the binary cross entropy loss value as the second loss value. The second loss value may be understood as the loss value of the discriminator.
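A minimal sketch of the second loss value as a binary cross-entropy. Averaging over the sequence (rather than summing) is an assumed reduction choice:

```python
import math

def second_loss(predicted_probs, real_labels, eps=1e-12):
    """Second loss value (discriminator loss): mean binary cross-entropy
    between the predicted conversion results (probabilities) and the preset
    real conversion results (1 = converted, 0 = not converted)."""
    total = 0.0
    for p, y in zip(predicted_probs, real_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predicted_probs)
```

A perfect discriminator (probability 1 on converted items, 0 on the rest) drives this loss to essentially zero, while an uninformative 0.5 prediction costs ln 2 per item.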
S204c: a total loss value is determined based on the first loss value and the second loss value.
In one implementation, a weighted sum of the first loss value and the second loss value, according to a preset ratio, may be taken as the total loss value. For example, the total loss value may be calculated by the following formula:
L = L_G + λ · L_D

wherein L is the total loss value, L_G is the first loss value (i.e., the loss value of the generator based on the historical interaction sequence I), λ is a preset constant, and L_D is the second loss value (i.e., the loss value of the discriminator based on the training interaction sequence I′).
S204d: and adjusting model parameters of the discriminator according to the total loss value to obtain a target recommendation model.
After the total loss value is determined, if it does not meet the preset condition, the model parameters of the discriminator and the generator can be adjusted according to the total loss value to obtain an adjusted discriminator and generator, and execution returns to step S201 until the total loss value meets the preset condition or the number of training iterations of the discriminator and the generator reaches a preset number. When the total loss value meets the preset condition or the number of training iterations reaches the preset number, the discriminator at that moment can be taken as the target recommendation model.
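The alternating loop with its two stopping conditions can be sketched as follows. The step callables, the threshold, and the weighting constant lam are all assumptions standing in for the real training passes:

```python
def train(generator_step, discriminator_step, max_rounds=100,
          loss_threshold=0.05, lam=0.5):
    """Skeleton of the S201-S204 training loop: stop once the total loss
    meets the preset condition or the preset number of rounds is reached.
    Each step callable runs one pass and returns its loss value."""
    total = float("inf")
    for _ in range(max_rounds):
        l_g = generator_step()       # S201: predict sequences, compute first loss
        l_d = discriminator_step()   # S202-S203: sample, discriminate, second loss
        total = l_g + lam * l_d      # S204c: total loss as a weighted sum
        if total < loss_threshold:   # preset condition met: stop training
            break
    return total
```

In a real implementation each step would also apply parameter updates (e.g. a gradient step) before returning its loss; only the control flow is shown here.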
Next, the use of the target recommendation model (i.e., the inference/prediction stage) is described. In some implementations of this embodiment, the method further includes:
and inputting the interaction sequence to be predicted into the target recommendation model to obtain a prediction conversion result of the object to be predicted in the interaction sequence to be predicted. The interaction sequence to be predicted comprises interaction characteristic information corresponding to a historical interaction object corresponding to a target user and interaction characteristic information corresponding to an object to be predicted, and the object to be predicted can be understood as an object needing conversion result prediction. The predictive conversion result may reflect whether the object to be predicted will be subjected to a preset operation (e.g., order, purchase, etc.) by the target user.
Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 4 is a schematic diagram of a training apparatus of a recommendation model provided in an embodiment of the present disclosure. As shown in fig. 4, the training device of the recommendation model includes:
The sequence prediction module 401 is configured to input a historical interaction sequence in the interaction training sample into a preset generator to obtain a predicted interaction sequence corresponding to the historical interaction sequence; all interaction characteristic information in the history interaction sequence is interaction characteristic information corresponding to an object of which the target user has executed preset operation; the predicted interactive feature information in the predicted interactive sequence is interactive feature information corresponding to an object of which the target user does not execute preset operation;
a sequence generating module 402, configured to generate a training interaction sequence according to the historical interaction sequence and the predicted interaction sequence; wherein, part of the interactive characteristic information in the training interactive sequence is part of the interactive characteristic information in the history interactive sequence, and the other part of the interactive characteristic information in the training interactive sequence is all the interactive characteristic information or part of the interactive characteristic information of the prediction interactive sequence;
the result prediction module 403 is configured to input the training interaction sequence into a preset discriminator, so as to obtain a predicted conversion result corresponding to each interaction feature information in the training interaction sequence; the prediction conversion result corresponding to the interaction characteristic information is used for representing whether the object corresponding to the interaction characteristic information predicted by the discriminator is subjected to preset operation by the target user or not;
The model adjustment module 404 is configured to adjust model parameters of the discriminator according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each interaction feature information in the training interaction sequence, and the real conversion result corresponding to each interaction feature information in the training interaction sequence, so as to obtain a target recommendation model.
Optionally, the generator and the discriminator are both Transformer-based sequence encoders; the network architectures of the generator and the discriminator are the same, while their model parameters are different.
Optionally, the historical interaction sequence includes a plurality of pieces of interactive feature information; each piece of interactive feature information corresponds to one object on which the target user has performed the preset operation, and the objects corresponding to different pieces of interactive feature information are different; the ordering of the interactive feature information in the historical interaction sequence is determined according to the time at which the preset operation was performed on the corresponding object.
Optionally, the sequence generating module 402 is configured to:
determining the target quantity of the interactive feature information to be replaced according to the quantity of the interactive feature information in the historical interactive sequence and a preset replacement proportion;
Selecting the interactive characteristic information of the target number from the predicted interactive sequence as target interactive characteristic information;
selecting the interactive characteristic information of the target number from the historical interactive sequence as the interactive characteristic information to be replaced;
and replacing the interactive characteristic information to be replaced in the historical interactive sequence with the target interactive characteristic information to obtain a training interactive sequence.
Optionally, the predicted interaction sequence includes predicted interaction feature information and interaction probability corresponding to the predicted interaction feature information; the model adjustment module 404 is configured to:
determining a first loss value according to the interaction probability corresponding to the predicted interaction characteristic information;
determining a second loss value according to a predicted conversion result corresponding to each interactive feature information in the training interactive sequence and a preset real conversion result corresponding to each interactive feature information in the training interactive sequence;
determining a total loss value based on the first loss value and the second loss value;
and adjusting model parameters of the discriminator according to the total loss value to obtain a target recommendation model.
Optionally, the model adjustment module 404 is configured to:
calculating the logarithmic value of the interaction probability corresponding to the predicted interaction characteristic information;
Taking the product of the logarithmic value of the interaction probability and a preset coefficient as the first loss value.
Optionally, the model adjustment module 404 is configured to:
and determining the second loss value by using a cross entropy loss function, a predicted conversion result corresponding to each interactive feature information in the training interactive sequence and a preset real conversion result corresponding to each interactive feature information in the training interactive sequence.
Optionally, the apparatus further includes a result prediction module configured to:
inputting the interaction sequence to be predicted into the target recommendation model to obtain a prediction conversion result of an object to be predicted in the interaction sequence to be predicted; the interaction sequence to be predicted comprises interaction characteristic information corresponding to a historical interaction object corresponding to a target user and interaction characteristic information corresponding to the object to be predicted.
Compared with the prior art, the embodiment of the disclosure has the beneficial effects that: the embodiment of the disclosure provides a training device of a recommendation model, which comprises: the sequence prediction module is used for inputting a history interaction sequence in the interaction training sample into a preset generator to obtain a predicted interaction sequence corresponding to the history interaction sequence; all interaction characteristic information in the history interaction sequence is interaction characteristic information corresponding to an object of which the target user has executed preset operation; the predicted interactive feature information in the predicted interactive sequence is interactive feature information corresponding to an object of which the target user does not execute preset operation; the sequence generation module is used for generating a training interaction sequence according to the historical interaction sequence and the predicted interaction sequence; wherein, part of the interactive characteristic information in the training interactive sequence is part of the interactive characteristic information in the history interactive sequence, and the other part of the interactive characteristic information in the training interactive sequence is all the interactive characteristic information or part of the interactive characteristic information of the prediction interactive sequence; the result prediction module is used for inputting the training interaction sequence into a preset discriminator to obtain a prediction conversion result corresponding to each interaction characteristic information in the training interaction sequence; the prediction conversion result corresponding to the interaction characteristic information is used for representing whether the object corresponding to the interaction characteristic information predicted by the discriminator is subjected to preset operation by the target user or not; the 
model adjustment module is used for adjusting the model parameters of the discriminator according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each piece of interactive feature information in the training interaction sequence, and the real conversion result corresponding to each piece of interactive feature information in the training interaction sequence, to obtain the target recommendation model. Therefore, in this embodiment, the predicted interaction sequence generated by the generator is subjected to secondary verification by the discriminator, so that the target recommendation model can fully mine the potential of the data by jointly optimizing the generator and the discriminator. Because the training interaction sequence is obtained by introducing the predicted interaction sequence into the historical interaction sequence during model training, some noise is introduced into the training interaction sequence used to train the target recommendation model, which achieves a data-enhancement effect and greatly alleviates the limitation of low-quality sequence data. Therefore, the apparatus provided by this embodiment can greatly improve the prediction and recommendation accuracy of the target recommendation model and improve its online recommendation performance.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
Fig. 5 is a schematic diagram of a computer device 5 provided by an embodiment of the present disclosure. As shown in fig. 5, the computer device 5 of this embodiment includes: a processor 501, a memory 502, and a computer program 503 stored in the memory 502 and executable on the processor 501. The steps of the various method embodiments described above are implemented when the processor 501 executes the computer program 503. Alternatively, the processor 501, when executing the computer program 503, performs the functions of the modules/units in the apparatus embodiments described above.
Illustratively, the computer program 503 may be divided into one or more modules/units, which are stored in the memory 502 and executed by the processor 501 to implement the present disclosure. The one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, the instruction segments describing the execution of the computer program 503 in the computer device 5.
The computer device 5 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The computer device 5 may include, but is not limited to, the processor 501 and the memory 502. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the computer device 5 and does not limit it; the computer device 5 may include more or fewer components than shown, a combination of certain components, or different components. For example, the computer device may also include input and output devices, network access devices, buses, etc.
The processor 501 may be a central processing unit (Central Processing Unit, CPU) or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
The memory 502 may be an internal storage module of the computer device 5, for example, a hard disk or a memory of the computer device 5. The memory 502 may also be an external storage device of the computer device 5, for example, a plug-in hard disk provided on the computer device 5, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card), or the like. Further, the memory 502 may also include both internal memory modules of the computer device 5 and external memory devices. The memory 502 is used to store computer programs and other programs and data required by the computer device. The memory 502 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the division of the functional modules and units described above is illustrated. In practical applications, the above functions may be allocated to different functional modules and units as needed; that is, the internal structure of the apparatus may be divided into different functional modules or units to perform all or part of the functions described above. The functional modules and units in the embodiments may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module; the integrated modules may be implemented in the form of hardware or in the form of software functional modules. In addition, the specific names of the functional modules and units are only for distinguishing them from each other and are not used to limit the protection scope of the present disclosure. For the specific working process of the modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative; the division into modules is merely a logical function division, and there may be other division manners in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices or modules, and may be in electrical, mechanical or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
The integrated modules/units may be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as stand-alone products. Based on such understanding, the present disclosure may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
The above embodiments are merely illustrative of the technical solutions of the present disclosure and are not limiting; although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and are intended to be included within the scope of the present disclosure.

Claims (8)

1. A method for training a recommendation model, the method comprising:
inputting a historical interaction sequence from an interaction training sample into a preset generator to obtain a predicted interaction sequence corresponding to the historical interaction sequence; wherein all interaction feature information in the historical interaction sequence is interaction feature information corresponding to objects on which a target user has performed a preset operation, and the predicted interaction feature information in the predicted interaction sequence is interaction feature information corresponding to objects on which the target user has not performed the preset operation;
generating a training interaction sequence according to the historical interaction sequence and the predicted interaction sequence; wherein one part of the interaction feature information in the training interaction sequence is a part of the interaction feature information in the historical interaction sequence, and another part is all or a part of the interaction feature information in the predicted interaction sequence; specifically: determining a target number of pieces of interaction feature information to be replaced according to the number of pieces of interaction feature information in the historical interaction sequence and a preset replacement ratio; selecting the target number of pieces of interaction feature information from the predicted interaction sequence as target interaction feature information; selecting the target number of pieces of interaction feature information from the historical interaction sequence as the interaction feature information to be replaced; and replacing the interaction feature information to be replaced in the historical interaction sequence with the target interaction feature information to obtain the training interaction sequence;
inputting the training interaction sequence into a preset discriminator to obtain a predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence; wherein the predicted conversion result corresponding to a piece of interaction feature information indicates whether the discriminator predicts that the target user will perform the preset operation on the object corresponding to that piece of interaction feature information;
adjusting model parameters of the generator and the discriminator according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence, and the real conversion result corresponding to each piece of interaction feature information in the training interaction sequence, to obtain a target recommendation model; wherein the generator and the discriminator have the same network architecture but different model parameters;
wherein the predicted interaction sequence comprises predicted interaction feature information and interaction probabilities corresponding to the predicted interaction feature information; and the adjusting of the model parameters of the generator and the discriminator according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence, and the real conversion result corresponding to each piece of interaction feature information in the training interaction sequence to obtain the target recommendation model comprises:
determining a first loss value according to the interaction probability corresponding to the predicted interaction feature information; specifically: for each piece of predicted interaction feature information, first calculating the logarithm of the interaction probability corresponding to that piece of predicted interaction feature information, and taking the product of the logarithm of the interaction probability and a preset coefficient as the first loss value of that piece of interaction feature information;
determining a second loss value according to the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence and a preset real conversion result corresponding to each piece of interaction feature information in the training interaction sequence; wherein the preset real conversion result corresponding to the predicted interaction feature information in the training interaction sequence is "not converted";
determining a total loss value according to the first loss value and the second loss value;
and adjusting model parameters of the generator and the discriminator according to the total loss value to obtain a target recommendation model.
2. The method of claim 1, wherein the generator and the discriminator are both Transformer-based sequence encoders.
3. The method of claim 1, wherein the historical interaction sequence comprises a plurality of pieces of interaction feature information; each piece of interaction feature information corresponds to one object on which the target user has performed the preset operation, and the objects corresponding to different pieces of interaction feature information are different; and the order of the pieces of interaction feature information in the historical interaction sequence is determined according to the times at which the preset operation was performed on the corresponding objects.
4. The method of claim 1, wherein determining the second loss value according to the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence and the preset real conversion result corresponding to each piece of interaction feature information in the training interaction sequence comprises:
determining the second loss value by using a cross-entropy loss function, the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence, and the preset real conversion result corresponding to each piece of interaction feature information in the training interaction sequence.
5. The method according to any one of claims 1-4, further comprising:
inputting an interaction sequence to be predicted into the target recommendation model to obtain a predicted conversion result for an object to be predicted in the interaction sequence to be predicted; wherein the interaction sequence to be predicted comprises interaction feature information corresponding to historical interaction objects of a target user and interaction feature information corresponding to the object to be predicted.
6. A training device for a recommendation model, the device comprising:
a sequence prediction module, configured to input a historical interaction sequence from an interaction training sample into a preset generator to obtain a predicted interaction sequence corresponding to the historical interaction sequence; wherein all interaction feature information in the historical interaction sequence is interaction feature information corresponding to objects on which a target user has performed a preset operation, and the predicted interaction feature information in the predicted interaction sequence is interaction feature information corresponding to objects on which the target user has not performed the preset operation;
a sequence generation module, configured to generate a training interaction sequence according to the historical interaction sequence and the predicted interaction sequence; wherein one part of the interaction feature information in the training interaction sequence is a part of the interaction feature information in the historical interaction sequence, and another part is all or a part of the interaction feature information in the predicted interaction sequence; specifically: determining a target number of pieces of interaction feature information to be replaced according to the number of pieces of interaction feature information in the historical interaction sequence and a preset replacement ratio; selecting the target number of pieces of interaction feature information from the predicted interaction sequence as target interaction feature information; selecting the target number of pieces of interaction feature information from the historical interaction sequence as the interaction feature information to be replaced; and replacing the interaction feature information to be replaced in the historical interaction sequence with the target interaction feature information to obtain the training interaction sequence;
a result prediction module, configured to input the training interaction sequence into a preset discriminator to obtain a predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence; wherein the predicted conversion result corresponding to a piece of interaction feature information indicates whether the discriminator predicts that the target user will perform the preset operation on the object corresponding to that piece of interaction feature information;
a model adjustment module, configured to adjust model parameters of the generator and the discriminator according to the predicted interaction sequence corresponding to the historical interaction sequence, the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence, and the real conversion result corresponding to each piece of interaction feature information in the training interaction sequence, to obtain a target recommendation model; wherein the generator and the discriminator have the same network architecture but different model parameters;
wherein the predicted interaction sequence comprises predicted interaction feature information and interaction probabilities corresponding to the predicted interaction feature information; and the model adjustment module is specifically configured to:
determine a first loss value according to the interaction probability corresponding to the predicted interaction feature information; specifically: for each piece of predicted interaction feature information, first calculate the logarithm of the interaction probability corresponding to that piece of predicted interaction feature information, and take the product of the logarithm of the interaction probability and a preset coefficient as the first loss value of that piece of interaction feature information;
determine a second loss value according to the predicted conversion result corresponding to each piece of interaction feature information in the training interaction sequence and a preset real conversion result corresponding to each piece of interaction feature information in the training interaction sequence; wherein the preset real conversion result corresponding to the predicted interaction feature information in the training interaction sequence is "not converted";
determine a total loss value according to the first loss value and the second loss value; and
adjust model parameters of the generator and the discriminator according to the total loss value to obtain the target recommendation model.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 5.
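The claimed training flow resembles an ELECTRA-style replaced-item objective: a generator proposes not-yet-interacted items, some historical items are swapped out for them, and a discriminator labels each position as converted or not. The following is only an illustrative sketch of claim 1's replacement step and the two loss terms; all function names, the default replacement ratio, and the coefficient value are assumptions for illustration, not taken from the patent.

```python
import math
import random

def build_training_sequence(history, predicted, replace_ratio=0.2, rng=None):
    """Replace a fraction of the historical items with generator-predicted
    items: the target number is derived from len(history) and the ratio."""
    rng = rng or random.Random(0)
    target_num = max(1, int(len(history) * replace_ratio))
    # pick positions in the history to overwrite, and items from the
    # predicted sequence to insert at those positions
    positions = rng.sample(range(len(history)), target_num)
    replacements = rng.sample(predicted, target_num)
    training_seq = list(history)
    labels = [1] * len(history)          # 1 = real item ("converted")
    for pos, item in zip(positions, replacements):
        training_seq[pos] = item
        labels[pos] = 0                  # generated items are "not converted"
    return training_seq, labels

def first_loss(interaction_probs, coeff=-1.0):
    """First loss term: for each predicted item, a preset coefficient times
    the log of its interaction probability (summed over items here)."""
    return sum(coeff * math.log(p) for p in interaction_probs)

def second_loss(pred_probs, labels):
    """Second loss term: cross-entropy between the discriminator's predicted
    conversion probabilities and the preset real conversion labels."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for y, p in zip(labels, pred_probs)) / len(labels)
```

In a full training loop, the total loss would combine the two terms (for example as a sum or weighted sum) and drive parameter updates for both the generator and the discriminator, which per claim 1 share one network architecture but keep separate parameters.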
CN202310820760.6A 2023-07-06 2023-07-06 Training method and device for recommendation model Active CN116541610B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310820760.6A CN116541610B (en) 2023-07-06 2023-07-06 Training method and device for recommendation model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310820760.6A CN116541610B (en) 2023-07-06 2023-07-06 Training method and device for recommendation model

Publications (2)

Publication Number Publication Date
CN116541610A CN116541610A (en) 2023-08-04
CN116541610B true CN116541610B (en) 2023-09-29

Family

ID=87458247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310820760.6A Active CN116541610B (en) 2023-07-06 2023-07-06 Training method and device for recommendation model

Country Status (1)

Country Link
CN (1) CN116541610B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116911913B (en) * 2023-09-12 2024-02-20 深圳须弥云图空间科技有限公司 Method and device for predicting interaction result
CN116911912B (en) * 2023-09-12 2024-03-15 深圳须弥云图空间科技有限公司 Method and device for predicting interaction objects and interaction results
CN117591750A (en) * 2024-01-19 2024-02-23 北京博点智合科技有限公司 Training method of content recommendation model, content recommendation method and related products

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287412A (en) * 2019-06-10 2019-09-27 腾讯科技(深圳)有限公司 Content recommendation method, recommended models generation method, equipment and storage medium
CN110765353A (en) * 2019-10-16 2020-02-07 腾讯科技(深圳)有限公司 Processing method and device of project recommendation model, computer equipment and storage medium
WO2023273611A1 (en) * 2021-06-30 2023-01-05 北京有竹居网络技术有限公司 Speech recognition model training method and apparatus, speech recognition method and apparatus, medium, and device
CN115935185A (en) * 2022-12-01 2023-04-07 北京龙智数科科技服务有限公司 Training method and device for recommendation model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287412A (en) * 2019-06-10 2019-09-27 腾讯科技(深圳)有限公司 Content recommendation method, recommended models generation method, equipment and storage medium
CN110765353A (en) * 2019-10-16 2020-02-07 腾讯科技(深圳)有限公司 Processing method and device of project recommendation model, computer equipment and storage medium
WO2023273611A1 (en) * 2021-06-30 2023-01-05 北京有竹居网络技术有限公司 Speech recognition model training method and apparatus, speech recognition method and apparatus, medium, and device
CN115935185A (en) * 2022-12-01 2023-04-07 北京龙智数科科技服务有限公司 Training method and device for recommendation model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Application of grey system theory in data prediction for pipeline integrity; Yin Yexin et al.; Oil & Gas Storage and Transportation; Vol. 25, No. 5, pp. 26-27 *

Also Published As

Publication number Publication date
CN116541610A (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN116541610B (en) Training method and device for recommendation model
CN109460514A (en) Method and apparatus for pushed information
CN110008973B (en) Model training method, method and device for determining target user based on model
CN107517251B (en) Information pushing method and device
CN109685537B (en) User behavior analysis method, device, medium and electronic equipment
CN110866040B (en) User portrait generation method, device and system
CN115935185A (en) Training method and device for recommendation model
CN110717597A (en) Method and device for acquiring time sequence characteristics by using machine learning model
CN110866625A (en) Promotion index information generation method and device
CN111582649B (en) Risk assessment method and device based on user APP single-heat coding and electronic equipment
CN113778979A (en) Method and device for determining live broadcast click rate
CN116911953A (en) Article recommendation method, apparatus, electronic device and computer readable storage medium
CN112116397A (en) User behavior characteristic real-time processing method and device, storage medium and electronic equipment
US20170249697A1 (en) System and method for machine learning based line assignment
CN113792039B (en) Data processing method and device, electronic equipment and storage medium
CN114925275A (en) Product recommendation method and device, computer equipment and storage medium
CN115048561A (en) Recommendation information determination method and device, electronic equipment and readable storage medium
CN113837843A (en) Product recommendation method, device, medium and electronic equipment
CN113792952A (en) Method and apparatus for generating a model
CN116911912B (en) Method and device for predicting interaction objects and interaction results
CN116911913B (en) Method and device for predicting interaction result
CN107103366B (en) Method and apparatus for generating age information of user
CN116308658A (en) Recommendation method and device
CN116383638A (en) Training method and device for recommendation model
US20240119505A1 (en) Method, apparatus, device and medium for processing item recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant