CN111090756A

CN111090756A - Artificial intelligence-based multi-target recommendation model training method and device

Info

Publication number: CN111090756A
Application number: CN202010214210.6A
Authority: CN
Inventors: 刘剑; 刘鸿; 陈凯; 夏锋
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Shenzhen Yayue Technology Co ltd
Priority date: 2020-03-24
Filing date: 2020-03-24
Publication date: 2020-05-01
Anticipated expiration: 2040-03-24
Also published as: CN111090756B

Abstract

The invention provides a training method and a device of a multi-target recommendation model based on artificial intelligence, electronic equipment and a storage medium; the method comprises the following steps: acquiring a training sample of the multi-target recommendation model, wherein the training sample is marked with at least two labels corresponding to the interactive features; the interactive features comprise a first interactive feature and at least one second interactive feature, wherein the sampling time window of the second interactive feature is larger than that of the first interactive feature; respectively inputting the training samples into at least one teacher model; respectively carrying out second interactive characteristic prediction on the training samples through at least one teacher model to obtain corresponding prediction results; updating the label of the corresponding second interactive feature in the training sample based on the obtained prediction result to obtain the training sample after at least one label is updated; training a multi-target recommendation model based on the training samples with the updated at least one label; according to the method and the device, the prediction accuracy of the multi-target recommendation model can be improved.

Description

Artificial intelligence-based multi-target recommendation model training method and device

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a training method and device of a multi-target recommendation model based on artificial intelligence, electronic equipment and a storage medium.

Background

Artificial Intelligence (AI) is a theory, method and technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

The recommendation system is an important application branch of artificial intelligence, and multi-target recommendation models are widely used in current personalized information-flow recommendation scenarios. However, the multiple targets of a multi-target recommendation model often have different window periods, so training samples collected with the same sampling window for all targets are less accurate, and a multi-target recommendation model trained on those samples has low prediction accuracy.

Disclosure of Invention

The embodiment of the invention provides a training method, a training device, electronic equipment and a storage medium of a multi-target recommendation model based on artificial intelligence, which can improve the prediction precision of the multi-target recommendation model and further improve the accuracy of recommending media objects based on the prediction result of the multi-target recommendation model.

The technical scheme of the embodiment of the invention is realized as follows:

the embodiment of the invention provides a multi-target recommendation model training method based on artificial intelligence, which comprises the following steps:

acquiring a training sample of a multi-target recommendation model for recommending media objects, wherein the training sample is marked with at least two labels corresponding to interactive features;

wherein the interactive features include a first interactive feature and at least one second interactive feature, and the sampling time window of the second interactive feature is larger than that of the first interactive feature;

respectively inputting the training samples into at least one teacher model, wherein each teacher model is used for predicting one second interactive feature;

respectively carrying out second interactive characteristic prediction on the training samples through the at least one teacher model to obtain corresponding prediction results;

updating the label of the corresponding second interactive feature in the training sample based on the obtained prediction result of the at least one teacher model to obtain the training sample with at least one updated label;

training the multi-target recommendation model based on the training sample with the at least one updated label, so that the multi-target recommendation model can perform feature prediction corresponding to the first interactive feature and the at least one second interactive feature based on an input media object, and recommend the media object based on a feature prediction result.
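As a rough illustration of the label-updating step in the method above, the following Python sketch replaces the label of each second interactive feature with the prediction of its teacher model. It is a minimal sketch under stated assumptions, not the implementation of the embodiment: the sample layout (a dict with `features` and `labels`), the feature names `play` and `share`, the `watch_ratio` input, and the rule-based `share_teacher` are all hypothetical.

```python
from typing import Callable, Dict, List

Sample = Dict[str, dict]
Teacher = Callable[[Dict[str, float]], float]


def update_labels(samples: List[Sample], teachers: Dict[str, Teacher]) -> List[Sample]:
    """Return copies of the samples in which the label of every second
    interactive feature is replaced by its teacher model's prediction."""
    updated = []
    for sample in samples:
        labels = dict(sample["labels"])  # copy so the input stays untouched
        for feature_name, teacher in teachers.items():
            labels[feature_name] = teacher(sample["features"])
        updated.append({"features": sample["features"], "labels": labels})
    return updated


def share_teacher(features: Dict[str, float]) -> float:
    """Hypothetical teacher for the long-window feature "share".
    A fixed rule stands in for a trained model."""
    return 0.8 if features["watch_ratio"] > 0.9 else 0.1


# Samples collected with the short (first-feature) window: "share" is 0
# simply because the share had not happened yet when the sample was made.
samples = [
    {"features": {"watch_ratio": 0.95}, "labels": {"play": 1, "share": 0}},
    {"features": {"watch_ratio": 0.20}, "labels": {"play": 0, "share": 0}},
]

updated = update_labels(samples, {"share": share_teacher})
print(updated[0]["labels"])  # {'play': 1, 'share': 0.8}
```

In this sketch the first interactive feature (`play`) keeps its original label; only the labels of long-window features that have a teacher are overwritten.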

The embodiment of the invention also provides a training device of the multi-target recommendation model based on artificial intelligence, which comprises:

an acquisition module, configured to acquire a training sample of a multi-target recommendation model for recommending media objects, wherein the training sample is marked with at least two labels corresponding to interactive features; wherein the interactive features include a first interactive feature and at least one second interactive feature, and the sampling time window of the second interactive feature is larger than that of the first interactive feature;

the input module is used for respectively inputting the training samples into at least one teacher model, and each teacher model is used for predicting one second interactive feature;

the prediction module is used for respectively carrying out second interactive characteristic prediction on the training samples through the at least one teacher model to obtain corresponding prediction results;

the updating module is used for updating the label of the corresponding second interactive feature in the training sample based on the obtained prediction result of the at least one teacher model to obtain the training sample with the updated at least one label;

the training module is used for training the multi-target recommendation model based on the training sample after updating the at least one label, so that the multi-target recommendation model can perform feature prediction corresponding to the first interactive feature and the at least one second interactive feature based on an input media object, and recommend the media object based on a feature prediction result.

In the foregoing solution, the acquisition module is further configured to collect data of the media object corresponding to the first interactive feature and to the at least one second interactive feature based on the sampling time window of the first interactive feature; and

construct training samples of the multi-target recommendation model based on the collected data.

In the above scheme, the updating module is further configured to label the prediction result of each teacher model as a label of a corresponding second interactive feature in the training sample, so as to update the label of the corresponding second interactive feature in the training sample, and obtain the training sample after updating at least one label.

In the above scheme, the apparatus further comprises:

the teacher model training module is used for acquiring training samples of the at least one teacher model;

the training samples of the teacher models are obtained by sampling based on the sampling time windows of the corresponding second interactive features, and at least labels corresponding to the corresponding second interactive features are labeled;

and training the corresponding teacher model based on the training sample of each teacher model respectively, so that the teacher model can predict the corresponding second interactive characteristics based on the input media object.

In the above scheme, the teacher model training module is further configured to input a training sample of each teacher model to a corresponding teacher model, and predict the second interactive feature through the corresponding teacher model to obtain a corresponding prediction result;

determining the value of the loss function of each teacher model based on the obtained prediction result and the label marked by the training sample of each teacher model;

and updating the model parameters of the corresponding teacher model based on the value of the loss function of each teacher model.
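The teacher training loop described above (predict, compute the loss against the labels, update the parameters) can be sketched as follows. The source does not specify the teacher's architecture or loss function, so this example assumes a logistic-regression teacher trained with log loss and plain gradient descent; the toy data, `train_teacher`, and `teacher_predict` are hypothetical names introduced for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_teacher(data: List[Tuple[List[float], int]],
                  epochs: int = 500, lr: float = 1.0):
    """Fit a logistic-regression teacher by batch gradient descent.
    Returns the weights, the bias, and the mean loss of every epoch."""
    n = len(data)
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    losses = []
    for _ in range(epochs):
        total_loss = 0.0
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Log loss between the prediction and the labelled value.
            total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        # Update the model parameters from the value of the loss gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
        losses.append(total_loss / n)
    return weights, bias, losses


# Toy teacher samples, assumed to be collected with the long sampling
# window of the second interactive feature (label 1 = behavior occurred).
teacher_data = [([0.1], 0), ([0.2], 0), ([0.8], 1), ([0.9], 1)]
weights, bias, losses = train_teacher(teacher_data)


def teacher_predict(x: List[float]) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
```

Once trained, `teacher_predict` plays the role of the teacher model whose output replaces the label of the corresponding second interactive feature.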

In the above scheme, the training module is further configured to predict the interactive features of the training sample after updating the at least one label through the multi-target recommendation model to obtain a feature prediction result;

acquiring, for each interactive feature, the difference between the feature prediction result and the label corresponding to that interactive feature;

determining the value of a loss function corresponding to the corresponding interactive features in the multi-target recommendation model based on the difference corresponding to each interactive feature;

and updating the model parameters of the multi-target recommendation model based on the values of the loss functions corresponding to the interactive features in the multi-target recommendation model.

In the foregoing solution, the training module is further configured to determine an error signal of each interactive feature based on the loss function corresponding to each interactive feature when the value of the loss function corresponding to each interactive feature exceeds the corresponding loss threshold;

and reversely propagating each error signal in the multi-target recommendation model, and updating the model parameters of each layer in the propagation process.
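The per-feature loss and the threshold check described above might look like the following sketch. It assumes binary cross-entropy as the loss (the source does not name one) and a sigmoid output, in which case the error signal with respect to the logit is simply prediction minus label; the feature names, thresholds, and function names are hypothetical.

```python
import math
from typing import Dict


def feature_losses(predictions: Dict[str, float],
                   labels: Dict[str, float]) -> Dict[str, float]:
    """Binary cross-entropy per interactive feature. Labels may be soft
    values in [0, 1], such as teacher predictions."""
    eps = 1e-12
    losses = {}
    for name, p in predictions.items():
        y = labels[name]
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        losses[name] = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return losses


def error_signals(predictions: Dict[str, float],
                  labels: Dict[str, float],
                  thresholds: Dict[str, float]) -> Dict[str, float]:
    """Emit an error signal only for features whose loss exceeds its
    threshold. The signal is d(loss)/d(logit) = prediction - label."""
    losses = feature_losses(predictions, labels)
    return {
        name: predictions[name] - labels[name]
        for name, loss in losses.items()
        if loss > thresholds[name]
    }


predictions = {"play": 0.9, "share": 0.3}
labels = {"play": 1, "share": 0.8}  # "share" label updated by a teacher
signals = error_signals(predictions, labels, {"play": 0.2, "share": 0.2})
print(sorted(signals))  # ['share']
```

Here the `play` loss is already below its threshold, so only the `share` error signal would be propagated backward.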

In the above scheme, the multi-target recommendation model includes a sharing layer, a feature extraction layer, a feature splicing layer and a prediction layer, and the training module is further configured to sequentially propagate the error signal of the first interactive feature to the prediction layer, the feature splicing layer, the feature extraction layer and the sharing layer, so as to realize backward propagation of the error signal of the first interactive feature in the multi-target recommendation model;

sequentially propagating the error signal of the second interactive feature to the prediction layer, the feature splicing layer and the feature extraction layer; and

blocking the error signal of the second interactive feature so that it cannot propagate to the sharing layer;

and updating the model parameters of each layer in the multi-target recommendation model in the process of back propagation of the error signal of the first interactive characteristic and the error signal of the second interactive characteristic.
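The blocking behavior described above can be illustrated with a deliberately tiny model in which the gradients are written out by hand. This is a sketch under simplifying assumptions, not the architecture of the embodiment: the prediction, splicing, and extraction layers are collapsed into one scalar "tower" weight per target, the sharing layer is one scalar weight, and the loss is squared error. In an automatic-differentiation framework the same effect is usually obtained with a stop-gradient operation on the shared output that feeds the second target.

```python
from dataclasses import dataclass


@dataclass
class TinyModel:
    shared: float  # weight of the sharing layer
    tower1: float  # tower weight for the first (short-window) feature
    tower2: float  # tower weight for the second (long-window) feature


def train_step(m: TinyModel, x: float, y1: float, y2: float,
               lr: float = 0.1, block_second: bool = True) -> TinyModel:
    """One gradient step with squared-error losses on both targets."""
    h = m.shared * x            # output of the sharing layer
    e1 = m.tower1 * h - y1      # error signal of the first feature
    e2 = m.tower2 * h - y2      # error signal of the second feature

    # Both error signals update their own tower.
    g_tower1 = e1 * h
    g_tower2 = e2 * h

    # Only the first feature's error signal reaches the sharing layer,
    # unless blocking is switched off.
    g_shared = e1 * m.tower1 * x
    if not block_second:
        g_shared += e2 * m.tower2 * x

    return TinyModel(m.shared - lr * g_shared,
                     m.tower1 - lr * g_tower1,
                     m.tower2 - lr * g_tower2)


model = TinyModel(shared=1.0, tower1=0.5, tower2=0.5)
a = train_step(model, x=2.0, y1=2.0, y2=0.0)
b = train_step(model, x=2.0, y1=2.0, y2=5.0)
# With blocking, the second target changes tower2 but not the shared weight.
print(a.shared == b.shared, a.tower2 == b.tower2)  # True False
```

Passing `block_second=False` lets the second feature's label influence the shared weight, which is the coupling the blocking step is meant to prevent.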

In the above scheme, the multi-target recommendation model includes a feature mapping layer, a feature extraction layer, a feature splicing layer, and a prediction layer, and the apparatus further includes:

the recommendation module is used for acquiring user data and content data of the media object to be recommended;

respectively mapping the user data and the content data through the feature mapping layer to obtain feature vectors corresponding to the user data and the content data;

extracting the features of the obtained feature vectors through the feature extraction layer to obtain the feature vectors of the media objects to be recommended;

splicing the feature vectors of the media objects to be recommended through the feature splicing layer to obtain spliced vectors;

predicting interactive features through the prediction layer based on the splicing vector to obtain a feature prediction result corresponding to the media object to be recommended;

recommending the media object to be recommended based on the characteristic prediction result.
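The data flow through the four layers named above (mapping, extraction, splicing, prediction) can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the embedding tables, the single ReLU extraction layer applied to both user and content vectors, the weight values, and the feature names `play` and `share` do not come from the source.

```python
import math
from typing import Dict, List

# Feature mapping layer: hypothetical embedding tables keyed by ID.
USER_EMBEDDING = {"u1": [0.2, 0.4]}
ITEM_EMBEDDING = {"v1": [0.5, -0.1], "v2": [-0.3, 0.6]}

# Feature extraction layer: one small ReLU layer (made-up weights).
EXTRACT_WEIGHTS = [[1.0, 0.5], [-0.5, 1.0]]

# Prediction layer: one sigmoid head per interactive feature.
HEAD_WEIGHTS = {
    "play": [0.8, 0.1, 0.6, -0.2],
    "share": [0.1, 0.4, -0.3, 0.9],
}


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def extract(vec: List[float]) -> List[float]:
    return [max(0.0, dot(row, vec)) for row in EXTRACT_WEIGHTS]


def predict(user_id: str, item_id: str) -> Dict[str, float]:
    # 1) Map user data and content data to feature vectors.
    user_vec = USER_EMBEDDING[user_id]
    item_vec = ITEM_EMBEDDING[item_id]
    # 2) Extract features from each vector.
    user_feat = extract(user_vec)
    item_feat = extract(item_vec)
    # 3) Splice (concatenate) the extracted features.
    spliced = user_feat + item_feat
    # 4) Predict every interactive feature from the spliced vector.
    return {name: 1.0 / (1.0 + math.exp(-dot(w, spliced)))
            for name, w in HEAD_WEIGHTS.items()}


scores = predict("u1", "v1")
```

`scores` holds one probability per interactive feature for the given user and media object, which is the feature prediction result used for recommendation.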

In the above scheme, the recommendation module is further configured to determine a login user corresponding to the media information flow page;

acquiring user data of the login user and content data of a media object to be recommended;

predicting interactive features through the multi-target recommendation model based on the acquired user data and the acquired content data to obtain feature prediction results corresponding to the first interactive features and the at least one second interactive feature;

determining at least one target media object in the media objects to be recommended based on the obtained characteristic prediction result;

recommending the target media object to the login user so as to present the target media object on the media information flow page.
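The source does not say how the per-feature predictions are combined when the target media objects are chosen. One simple possibility, shown here purely as an assumption, is a weighted sum of the predicted probabilities followed by a top-k cut; `select_targets`, the weights, and the candidate IDs are hypothetical.

```python
from typing import Dict, List


def select_targets(predictions: Dict[str, Dict[str, float]],
                   weights: Dict[str, float], k: int) -> List[str]:
    """Rank candidates by a weighted sum of their predicted interactive
    features and return the IDs of the top k."""
    def score(item_id: str) -> float:
        return sum(weights[name] * p
                   for name, p in predictions[item_id].items())

    ranked = sorted(predictions, key=score, reverse=True)
    return ranked[:k]


# Predicted (play, share) probabilities for three candidate media objects.
candidate_predictions = {
    "v1": {"play": 0.9, "share": 0.1},
    "v2": {"play": 0.6, "share": 0.7},
    "v3": {"play": 0.2, "share": 0.2},
}

targets = select_targets(candidate_predictions,
                         {"play": 1.0, "share": 1.0}, k=2)
print(targets)  # ['v2', 'v1']
```

The selected IDs would then be returned to the terminal of the login user for presentation on the media information flow page.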

An embodiment of the present invention further provides an electronic device, including:

a memory for storing executable instructions;

and the processor is used for realizing the training method of the multi-target recommendation model based on artificial intelligence provided by the embodiment of the invention when the executable instructions stored in the memory are executed.

The embodiment of the invention also provides a computer-readable storage medium, which stores executable instructions, and when the executable instructions are executed by a processor, the training method of the multi-target recommendation model based on artificial intelligence provided by the embodiment of the invention is realized.

The embodiment of the invention has the following beneficial effects:

The obtained training samples of the multi-target recommendation model are marked with labels of the first interactive feature and the second interactive feature, and the sampling time window of the second interactive feature is larger than that of the first interactive feature. In other words, the sampling time windows of the two interactive features are not synchronized, which reduces the accuracy of the training samples. Based on this, at least one teacher model capable of predicting the second interactive feature is obtained; the training samples of the multi-target recommendation model are input into the at least one teacher model; the corresponding second interactive feature of the training samples is predicted through the at least one teacher model; the labels of the corresponding second interactive feature in the training samples are updated based on the obtained prediction results to obtain training samples with at least one updated label; and the multi-target recommendation model is trained based on the training samples with the at least one updated label.

Therefore, second interactive feature prediction is performed on the training samples through at least one trained teacher model, and the labels corresponding to the second interactive feature in the training samples are updated based on the prediction results, so that the prediction capability of the teacher model is transferred. The training samples with the updated labels are then used to train the multi-target recommendation model, which improves the prediction accuracy of the multi-target recommendation model for interactive features with different sampling time windows. As a result, when media objects are recommended by combining the prediction results of the first interactive feature and the second interactive feature, the accuracy of the media object recommendation is correspondingly improved.

Drawings

FIG. 1 is a schematic diagram of an implementation scenario of a training method for a multi-target recommendation model based on artificial intelligence according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention;

FIG. 3 is a flowchart illustrating a method for training a multi-objective recommendation model based on artificial intelligence according to an embodiment of the present invention;

FIG. 4 is a first schematic structural diagram of a multi-objective recommendation model provided by an embodiment of the present invention;

FIG. 5 is a structural diagram of a multi-objective recommendation model according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of the back propagation blocking of the error signal for a second interaction feature provided by an embodiment of the present invention;

FIG. 7 is a data flow graph illustrating media object recommendation based on a multi-objective recommendation model according to an embodiment of the present invention;

FIG. 8 is a flowchart illustrating a method for training a multi-objective recommendation model based on artificial intelligence according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of an architecture of a training method for a multi-objective recommendation model according to an embodiment of the present invention;

FIG. 10 is a diagram of a media information flow page provided by an embodiment of the invention;

FIG. 11 is a schematic structural diagram of a training apparatus for multi-objective recommendation model based on artificial intelligence according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

In the following description, the terms "first/second/third" are used only to distinguish similar objects and do not denote a particular order. It is understood that, where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the invention described herein can be practiced in an order other than that shown or described herein.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.

Before the embodiments of the present invention are described in further detail, the terms and expressions used in the embodiments are explained; the following explanations apply to these terms and expressions.

1) Media object: any of the various types of information disseminated over the internet, such as news, information, short videos, and the like;

2) Content data: data related to the media object, such as an identification of the media object, a content tag, a publishing source, related text, etc.;

3) User data: data such as user identification, historical behavior (search records, browsing records, etc.), and user context (geographic location, network status, etc.);

4) Interactive features: features of the interaction between a user and a media object; for example, in personalized video stream recommendation, the interactive features include a user playing a video, a user sharing a video, and other users playing the shared content;

5) Multi-target recommendation model: a model that can predict multiple targets (namely multiple interactive features) based on input media objects to be recommended; for example, the sharing rate and the playing rate of a certain media object to be recommended can be predicted at the same time.

Based on the above explanations of the terms involved in the embodiments of the present invention, an implementation scenario of the training method for the multi-target recommendation model based on artificial intelligence provided in the embodiments of the present invention is described below. Referring to FIG. 1, FIG. 1 is a schematic diagram of an implementation scenario of the training method for the multi-target recommendation model based on artificial intelligence provided in the embodiments of the present invention. To support an exemplary application, an application client, such as an instant messaging client or a video playing client, is provided on each terminal (including a terminal 200-1 and a terminal 200-2). The terminal 200-1 is located at the publishing side of the media object, and the terminal 200-2 is located at the receiving side of the media object. The terminals are connected to the server 100 through the network 300; the network 300 may be a wide area network, a local area network, or a combination of the two, and uses wireless or wired links for data transmission.

A server 100 for obtaining training samples of a multi-target recommendation model for media object recommendation; respectively inputting the training samples into at least one teacher model; respectively carrying out second interactive characteristic prediction on the training samples through at least one teacher model to obtain corresponding prediction results; updating the label of the corresponding second interactive feature in the training sample based on the obtained prediction result of the at least one teacher model to obtain the training sample after the at least one label is updated; training a multi-target recommendation model based on the training samples with the updated at least one label; therefore, training of the multi-target recommendation model is achieved.

A media object publisher opens a client of the terminal 200-1 and publishes a media object to be recommended, for example, the terminal 200-1 is used for generating and sending an interactive characteristic prediction request carrying the media object to be recommended to the server 100;

the server 100 is used for acquiring content data of a media object to be recommended and user data of a login user corresponding to a media information flow page; predicting interactive characteristics through a multi-target recommendation model based on the acquired user data and content data to obtain characteristic prediction results corresponding to the first interactive characteristics and at least one second interactive characteristic; determining at least one target media object in the media objects to be recommended based on the obtained characteristic prediction result; recommending the target media object to the login user, namely returning the target media object to the terminal 200-2;

and the terminal 200-2 is used for presenting the target media object on the media information flow page.

In practical applications, the server 100 may be a server configured independently to support various services, or may be a server cluster; the terminal (e.g., terminal 200-1) may be any type of user terminal such as a smartphone, tablet, laptop, etc., and may also be a wearable computing device, a Personal Digital Assistant (PDA), a desktop computer, a cellular phone, a media player, a navigation device, a game console, a television, or a combination of any two or more of these or other data processing devices.

The hardware structure of an electronic device that implements the training method for the multi-target recommendation model based on artificial intelligence provided by the embodiments of the present invention is described in detail below; the electronic device includes, but is not limited to, a server or a terminal. Referring to FIG. 2, FIG. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 200 shown in FIG. 2 includes: at least one processor 210, memory 250, at least one network interface 220, and a user interface 230. The various components in electronic device 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in FIG. 2.

The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.

The user interface 230 includes one or more output devices 231, including one or more speakers and/or one or more visual display screens, that enable the presentation of media content. The user interface 230 also includes one or more input devices 232, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.

The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.

The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 250 described in embodiments of the invention is intended to comprise any suitable type of memory.

In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.

An operating system 251 including system programs for processing various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks;

a network communication module 252 for communicating with other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wireless Fidelity (WiFi), and Universal Serial Bus (USB), etc.;

a presentation module 253 to enable presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 231 (e.g., a display screen, speakers, etc.) associated with the user interface 230;

an input processing module 254 for detecting one or more user inputs or interactions from one of the one or more input devices 232 and translating the detected inputs or interactions.

In some embodiments, the training apparatus for the artificial intelligence based multi-objective recommendation model provided by the embodiments of the present invention can be implemented in software. FIG. 2 shows an artificial intelligence based multi-objective recommendation model training apparatus 255 stored in the memory 250, which can be software in the form of programs and plug-ins and includes the following software modules: an acquisition module 2551, an input module 2552, a prediction module 2553, an update module 2554 and a training module 2555. These modules are logical and can therefore be arbitrarily combined or further split according to the functions implemented; the functions of the modules are described below.

In other embodiments, the artificial intelligence based multi-target recommendation model training apparatus provided in the embodiments of the present invention may be implemented by a combination of hardware and software, and as an example, the artificial intelligence based multi-target recommendation model training apparatus provided in the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the artificial intelligence based multi-target recommendation model training method provided in the embodiments of the present invention, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.

In current personalized information stream recommendation scenarios, multi-target recommendation models are widely used. To improve the ability of the recommendation model to track the current needs of the user, the training samples of the model usually need to be updated hourly or even in real time. However, the sampling time windows suited to the multiple prediction targets differ, and for a target whose sampling time window is larger than an hour-level window, the hour-level window is too short for the behavior to occur. For example, on an e-commerce shopping website, a time window of several hours or even several days may exist between a click behavior and a purchase behavior for the same commodity. If samples are constructed based on the hour-level time window, a sample in which the purchase behavior occurs several hours later is easily treated as a sample without a purchase behavior, and the prediction accuracy of the multi-target recommendation model trained on such samples is low. This problem is caused by the sampling time windows of the multiple targets not being synchronized.
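The mislabeling described above is easy to reproduce with a small example. The following Python sketch labels one exposure under an hour-level window and under a day-level window; the timestamps, the function `label_in_window`, and the feature names `click` and `buy` are hypothetical and serve only to illustrate the effect.

```python
from datetime import datetime, timedelta
from typing import List


def label_in_window(shown_at: datetime, event_times: List[datetime],
                    window: timedelta) -> int:
    """Return 1 if any event falls inside [shown_at, shown_at + window]."""
    deadline = shown_at + window
    return int(any(shown_at <= t <= deadline for t in event_times))


shown_at = datetime(2020, 3, 24, 10, 0)
clicks = [datetime(2020, 3, 24, 10, 5)]     # click 5 minutes after exposure
purchases = [datetime(2020, 3, 24, 15, 0)]  # purchase 5 hours after exposure

hour = timedelta(hours=1)
day = timedelta(days=1)

hourly_sample = {"click": label_in_window(shown_at, clicks, hour),
                 "buy": label_in_window(shown_at, purchases, hour)}
daily_sample = {"click": label_in_window(shown_at, clicks, day),
                "buy": label_in_window(shown_at, purchases, day)}

print(hourly_sample)  # {'click': 1, 'buy': 0}
print(daily_sample)   # {'click': 1, 'buy': 1}
```

Under the hour-level window the purchase is recorded as a negative label even though it does occur later, which is exactly the kind of inaccurate sample the paragraph above refers to.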

In the related art, in order to solve the problem of low accuracy of training samples caused by asynchronous sampling time windows among targets in the multi-target recommendation model, the sampling time windows of all targets are generally adjusted to be the corresponding maximum sampling time windows of all targets. For example, if the sampling time window of target a is in the order of hours and the sampling time window of target B is in the order of days, the sampling time window of target a is also adjusted to the order of days. Therefore, the problem that updating of the sampling time windows of the samples among the multiple targets is not synchronous is solved, but the sampling time windows of all the targets are adjusted to be the largest window period, so that the tracking capacity of the multiple-target recommendation model for the current requirements of the user is reduced, the prediction precision of the multiple-target recommendation model is reduced, and the user experience is influenced. Based on this, embodiments of the present invention provide a training method for multi-objective recommendation models based on artificial intelligence, so as to at least solve the above problems, which will be described in detail below.

Based on the above description of the implementation scenario and the electronic device, the training method of the artificial intelligence-based multi-target recommendation model according to the embodiment of the present invention is described below. Referring to fig. 3, fig. 3 is a schematic flowchart of a training method of an artificial intelligence-based multi-target recommendation model according to an embodiment of the present invention. In some embodiments, the training method may be implemented by a server or a terminal alone, or by a server and a terminal in cooperation. Taking implementation by the server as an example, the training method of the artificial intelligence-based multi-target recommendation model provided in the embodiments of the present invention includes:

Step 301: The server obtains training samples of a multi-target recommendation model for media object recommendation.

Here, the training sample is labeled with at least two labels corresponding to interactive features including: the first interactive feature and at least one second interactive feature, wherein the sampling time window of the second interactive feature is larger than that of the first interactive feature.

In some embodiments, the server may need to obtain training samples for training when training the multi-objective recommendation model for media object recommendation. Because the recommendation model is a multi-target recommendation model, which can predict multiple targets (i.e. multiple interactive features) for the input media objects to be recommended, the training samples obtained here are labeled with at least two labels corresponding to the interactive features.

In practical applications, the media object may be a video, a commodity, or the like. For example, in personalized short-video recommendation, the media object is a short video, and the interactive features may be "play", "share", "share-back", and the like; on an e-commerce shopping platform, the media object is a commodity, and the interactive features may be "click", "collect", "buy", and the like. Different interactive features have different sensitivities to time, so the optimal sampling time window differs from one interactive feature to another. Illustratively, taking the e-commerce shopping platform as an example, there may be a gap of hours or even days between the interactive feature "click to view a certain commodity" and the interactive feature "purchase a certain commodity". A click happens quickly, and a large number of clicks may occur within one hour, so the interactive feature "click" can be sampled based on a short time window (such as an hour level); a "buy" action takes longer, perhaps hours or even days, so the interactive feature "buy" needs to be sampled based on a longer time window, such as a day level.

For the multi-target recommendation model to predict a plurality of interactive features at the same time, training samples marked with labels corresponding to each of the interactive features need to be obtained. However, one training sample can only be collected based on a single sampling time window, so the sample data of an interactive feature that needs a different sampling time window is not sufficiently accurate. Based on this, in the embodiment of the present invention, the interactive features are divided into a first interactive feature and a second interactive feature, wherein the sampling time window of the second interactive feature is larger than that of the first interactive feature. Here, there may be one or more first interactive features and one or more second interactive features. The obtained training sample of the multi-target recommendation model is marked with a label corresponding to the first interactive feature and a label corresponding to the second interactive feature.

In some embodiments, the server may obtain the training samples of the multi-target recommendation model by: acquiring, based on the sampling time window of the first interactive feature, data of the media object corresponding to the first interactive feature and data corresponding to at least one second interactive feature, and constructing a training sample of the multi-target recommendation model based on the acquired data.

In practical application, in order to improve the real-time recommendation capability of the multi-target recommendation model to the media object, the training samples obtained by the model need to be updated in an hour level or even a real-time level, so that the real-time training of the multi-target recommendation model and the updating of model parameters are realized. Therefore, when training samples of the multi-target recommendation model are collected, the training samples can be collected and constructed according to the sampling time window of the first interactive feature with the minimum sampling time window. Specifically, data of the media object corresponding to the first interactive feature and data of at least one second interactive feature are collected based on a sampling time window of the first interactive feature, and a training sample of the multi-target recommendation model is constructed based on the collected data. For example, the sampling time window of the first interactive feature may be one hour, and then, when the training sample is constructed, every hour passes, the data corresponding to the first interactive feature and the second interactive feature in the time window of the last hour is collected and integrated to construct the training sample of the hour, and the sample of the hour level is used as the training sample of the multi-target recommendation model.
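The hour-level sample construction described above can be illustrated with a minimal sketch. The event-log layout (user, item, action, time in hours), the function name `build_hourly_samples`, and the example data are assumptions made for illustration only and are not part of the embodiment. The sketch shows how a purchase that happens after the window closes ends up recorded as "no purchase".

```python
from collections import defaultdict

def build_hourly_samples(events, window_start, window_hours=1.0):
    """Build one sample per (user, item) pair seen inside the window.

    events: iterable of (user, item, action, t_hours) tuples (hypothetical log format).
    Labels only reflect actions inside [window_start, window_start + window_hours).
    """
    window_end = window_start + window_hours
    labels = defaultdict(lambda: {"click": 0, "buy": 0})
    for user, item, action, t in events:
        if window_start <= t < window_end and action in ("click", "buy"):
            labels[(user, item)][action] = 1
    return dict(labels)

events = [
    ("u1", "i1", "click", 0.2),
    ("u1", "i1", "buy", 5.0),    # purchase happens hours after the click
    ("u2", "i1", "click", 0.7),
    ("u2", "i1", "buy", 0.9),    # purchase happens inside the same hour
]
samples = build_hourly_samples(events, window_start=0.0)
# The sample for ("u1", "i1") carries buy = 0 although a purchase occurs later.
print(samples[("u1", "i1")])
```

This mislabelled second interactive feature is the kind of label that the teacher model is later used to correct.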

By applying the embodiment, the training sample of the multi-target recommendation model is constructed based on the sampling time window of the first interactive feature, and the sampling time window of the first interactive feature is smaller than the sampling time window of the second interactive feature, so that the real-time property of the sample label of the first interactive feature is ensured, and the tracking capability of the multi-target recommendation model to the current requirement of the user is improved and the prediction accuracy of the multi-target recommendation model is further improved by training the multi-target recommendation model based on the training sample; therefore, when the media object recommendation is carried out by combining the prediction results of the first interactive characteristic and the second interactive characteristic, the real-time performance of the media object recommendation is correspondingly improved.

Step 302: The training samples are respectively input into at least one teacher model, wherein each teacher model is used to predict one second interactive feature.

Because the training samples of the multi-target recommendation model are constructed based on the sampling time window of the first interactive feature, which is smaller than the sampling time window of the second interactive feature, the data for the second interactive feature in these training samples may be inaccurate. For example, suppose the first interactive feature is "click on a commodity" and the second interactive feature is "buy a commodity". Since a "buy" may follow a "click" by hours or even days, the sampling time window of the first interactive feature may be at an hour level while that of the second interactive feature is at a day level. When a training sample of the multi-target recommendation model is constructed based on the hour-level sampling time window of the first interactive feature, the second interactive feature is likely to occur only after several hours, so the label of the training sample for the second interactive feature is wrong (that is, a "purchasing behavior" occurring several hours later is treated as "no purchase"). This reduces the fidelity of the training samples and affects the training of the multi-target recommendation model.

Therefore, after the training samples of the multi-target recommendation model are obtained, the labels corresponding to the second interactive feature in the training samples need to be updated. In the embodiment of the invention, a teacher model for predicting the second interactive feature is constructed and trained, and the prediction results of the trained teacher model are used to update the training samples of the multi-target recommendation model (namely, the student model), thereby guiding the training of the multi-target recommendation model. In some embodiments, the server may train the teacher model by: obtaining training samples of at least one teacher model; and training each teacher model based on its training samples, so that each teacher model can predict the corresponding second interactive feature based on an input media object.

Because the label corresponding to the second interactive feature in the training samples of the multi-target recommendation model is not accurate, training the multi-target recommendation model directly on that label would reduce the prediction accuracy of the trained model for the second interactive feature. Therefore, a teacher model that can be used to predict the second interactive feature needs to be trained.

First, training samples for training a teacher model are acquired. To ensure that the label of the second interactive feature is accurate, the training samples of the teacher model are collected and constructed based on the sampling time window of the second interactive feature, and each training sample is labeled with at least the label of the corresponding second interactive feature. In practical application, because the first interactive feature and the second interactive feature are associated, the training samples of the teacher model may additionally be labeled with the label corresponding to the first interactive feature. In that case, when the teacher model is trained, a training sample is input into the teacher model, a plurality of interactive features are predicted at the same time, and the model parameters of the teacher model are updated based on the predictions for these interactive features. Because the interactive features are learned jointly, parameters and information are shared among them, and the resulting teacher model performs better than one trained on a single interactive feature.

In some embodiments, the server may perform the training of the teacher model by: respectively inputting the training samples of the teacher models to the corresponding teacher models, and predicting second interactive characteristics through the corresponding teacher models to obtain corresponding prediction results; determining the value of the loss function of each teacher model based on the obtained prediction result and the label marked by the training sample of each teacher model; based on the values of the loss functions of the teacher models, model parameters of the corresponding teacher models are updated.

In practical application, the teacher model can be trained in the following way for each teacher model: inputting the training sample of the teacher model into the teacher model, and predicting second interactive characteristics through the teacher model to obtain a prediction result of the teacher model; determining the value of a loss function of the teacher model based on the prediction result and the label marked by the training sample of the teacher model; thereby updating the parameters of the teacher model based on the values of the loss function of the teacher model.

Specifically, a difference between the prediction result and a label of a training sample of the teacher model may be obtained, and based on the difference, a value of a loss function of the teacher model is determined; when the loss function value is determined to exceed the set loss threshold value, determining an error signal of the teacher model based on the loss function value; the error signal is propagated back in the teacher model, so that the model parameters of each layer in the teacher model are updated in the process of the error signal propagation. Thus, training of the teacher model is completed.
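The teacher training loop above can be sketched as follows. This is a minimal illustration that assumes a single-feature logistic model as a stand-in for the teacher, a logarithmic loss, and hypothetical values for the loss threshold and learning rate; none of these choices is prescribed by the embodiment.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b, samples):
    """Average logarithmic loss of the model p = sigmoid(w * x + b)."""
    eps = 1e-12
    total = 0.0
    for x, y in samples:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)

def train_teacher(samples, loss_threshold=0.3, lr=0.5, max_steps=5000):
    w, b = 0.0, 0.0
    for _ in range(max_steps):
        if log_loss(w, b, samples) <= loss_threshold:
            break  # the loss no longer exceeds the threshold: stop updating
        # Error signal (prediction minus label) propagated back to each parameter.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in samples) / len(samples)
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in samples) / len(samples)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical day-level samples: (feature value, label of the second interactive feature).
day_level_samples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b = train_teacher(day_level_samples)
```

A real teacher model would have many layers, but the control flow is the same: compute the loss, compare it with the threshold, and propagate the error signal back while it is still too large.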

After at least one trained teacher model is obtained, the training samples of the multi-target recommendation model are respectively input into the at least one teacher model.

Step 303: Second interactive feature prediction is performed on the training samples through each of the at least one teacher model to obtain corresponding prediction results.

After the training samples of the multi-target recommendation model are input into the at least one teacher model, each teacher model predicts its second interactive feature for the training samples to obtain a corresponding prediction result.

Step 304: The label of the corresponding second interactive feature in the training samples is updated based on the obtained prediction result of the at least one teacher model, to obtain training samples with at least one label updated.

That is, the labels corresponding to the second interactive features in the training samples of the multi-target recommendation model are updated based on the prediction results of the at least one teacher model.

In some embodiments, the server may update the labels of the respective second interactive features in the training sample by: marking the prediction result of each teacher model as the label of the corresponding second interactive feature in the training sample, so as to update that label and obtain the training sample with at least one label updated.

After the prediction result of each teacher model for the second interactive feature of the training samples of the multi-target recommendation model is obtained, each prediction result is used as the label of the corresponding second interactive feature in the training samples, replacing the original label, so that training samples with at least one label updated are obtained.
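The label update can be sketched as follows. The sample layout (a dict with "features" and "labels"), the function name `update_second_feature_labels`, and the stand-in teacher are hypothetical. Writing the teacher's probability directly into the label is one possible reading of "marking the prediction result as a label" and is an assumption here.

```python
def update_second_feature_labels(samples, teachers):
    """Return copies of samples whose second-feature labels are teacher predictions.

    samples: list of dicts with "features" and "labels" keys (hypothetical layout).
    teachers: dict mapping a second-feature name to a callable features -> probability.
    """
    updated = []
    for sample in samples:
        labels = dict(sample["labels"])  # copy so the original sample is untouched
        for name, teacher in teachers.items():
            labels[name] = teacher(sample["features"])
        updated.append({"features": sample["features"], "labels": labels})
    return updated

# Stand-in teacher: any trained model exposing features -> probability would do.
def buy_teacher(features):
    return 0.8 if features["price_viewed"] else 0.1

hourly = [
    {"features": {"price_viewed": True}, "labels": {"click": 1, "buy": 0}},
    {"features": {"price_viewed": False}, "labels": {"click": 1, "buy": 0}},
]
relabelled = update_second_feature_labels(hourly, {"buy": buy_teacher})
```

Only the labels named in `teachers` are replaced, so the label of the first interactive feature ("click" in this example) keeps its hour-level value.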

Step 305: The multi-target recommendation model is trained based on the training samples with the at least one label updated.

Here, the multi-target recommendation model can predict, based on an input media object, the first interactive feature and the at least one second interactive feature, so that the media object can be recommended based on the feature prediction results.

Before the training process of the multi-target recommendation model is described, the model structure of the multi-target recommendation model provided by the embodiment of the invention is described first. In practical application, a specific model is set in the multi-target recommendation model for each prediction target (i.e. each interactive feature); that is, different prediction targets correspond to different specific models. These specific models may share the same model structure with different model parameters, or may differ in both structure and parameters. Referring to fig. 4, fig. 4 is a schematic structural diagram of a multi-target recommendation model according to an embodiment of the present invention, where the multi-target recommendation model can predict three interactive features, namely interactive feature 1, interactive feature 2, and interactive feature 3, at the same time; each interactive feature corresponds to its own specific model, and a shared layer exists between the interactive features.

Specifically, referring to fig. 5, fig. 5 is a schematic structural diagram of a multi-target recommendation model according to an embodiment of the present invention. Each interactive feature (interactive feature 1, interactive feature 2, …) in the multi-target recommendation model corresponds to a specific model that includes a feature extraction layer, a feature concatenation layer (also referred to below as the feature splicing layer), and a prediction layer; in addition, a feature mapping layer serves as the shared layer of the interactive features. The feature extraction layer is composed of a Wide layer, a DNN (Deep Neural Network) layer, and a shared NFM (Neural Factorization Machine) layer; the feature concatenation layer is composed of a fully connected layer; and the prediction layer is composed of an MLP (Multi-Layer Perceptron) model.

In some embodiments, the server may train the multi-target recommendation model by: predicting the interactive features of the training samples with the at least one label updated through the multi-target recommendation model to obtain feature prediction results; acquiring, for each interactive feature, the difference between its feature prediction result and its corresponding label; determining the value of the loss function corresponding to each interactive feature in the multi-target recommendation model based on the difference corresponding to that interactive feature; and updating the model parameters of the multi-target recommendation model based on the values of the loss functions corresponding to the interactive features.

Specifically, the training samples with at least one label updated are input into the multi-target recommendation model, and each interactive feature is predicted through the multi-target recommendation model to obtain feature prediction results. For each interactive feature, the difference between its feature prediction result and its corresponding label is acquired. The value of the loss function corresponding to each interactive feature in the multi-target recommendation model is then determined based on that difference; the loss function may be a logarithmic loss function, a squared error loss function, or the like, and the specific loss function can be chosen as required. Finally, the model parameters of the multi-target recommendation model are updated based on the values of the loss functions corresponding to the interactive features.
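The per-feature loss computation can be sketched as follows. The feature names, the example predictions, and the pairing of a logarithmic loss with "click" and a squared error loss with "buy" are assumptions chosen for illustration; the embodiment leaves the choice of loss function open.

```python
import math

def log_loss(pred, label, eps=1e-12):
    # Works for hard labels (0 or 1) and for soft labels in [0, 1].
    return -(label * math.log(pred + eps) + (1 - label) * math.log(1 - pred + eps))

def squared_error(pred, label):
    return (pred - label) ** 2

def per_feature_losses(predictions, labels, loss_fns):
    """Average loss for each interactive feature over a batch."""
    losses = {}
    for name, loss_fn in loss_fns.items():
        values = [loss_fn(p, y) for p, y in zip(predictions[name], labels[name])]
        losses[name] = sum(values) / len(values)
    return losses

predictions = {"click": [0.9, 0.2], "buy": [0.7, 0.1]}
labels = {"click": [1, 0], "buy": [0.8, 0.1]}  # "buy" labels come from a teacher
losses = per_feature_losses(
    predictions, labels, {"click": log_loss, "buy": squared_error}
)
```

Keeping one loss value per interactive feature, rather than a single summed loss, makes it possible to compare each value with its own loss threshold and to route each error signal separately during back propagation.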

In some embodiments, the server may update the model parameters of the multi-objective recommendation model by: when the value of the loss function corresponding to each interactive feature exceeds the corresponding loss threshold value, determining an error signal of the corresponding interactive feature based on the loss function corresponding to each interactive feature; and reversely propagating each error signal in the multi-target recommendation model, and updating the model parameters of each layer in the propagation process.

In practical applications, a corresponding loss threshold may be set in advance for each interactive feature. When the loss function value corresponding to each interactive feature exceeds the corresponding loss threshold value, determining an error signal of the corresponding interactive feature according to the loss function corresponding to each interactive feature; and then, the error signals are reversely propagated in the multi-target recommendation model, so that the model parameters of all layers are updated in the process of propagation.

In some embodiments, the server may update the model parameters of each layer in the multi-objective recommendation model by: the error signal of the first interactive feature is sequentially transmitted to a prediction layer, a feature splicing layer, a feature extraction layer and a sharing layer, so that the error signal of the first interactive feature is reversely transmitted in the multi-target recommendation model; the error signals of the second interactive features are sequentially transmitted to the prediction layer, the feature splicing layer and the feature extraction layer; blocking the error signal of the second interactive characteristic, so that the error signal of the second interactive characteristic cannot be transmitted to the shared layer; and updating the model parameters of each layer in the multi-target recommendation model in the process of back propagation of the error signal of the first interactive characteristic and the error signal of the second interactive characteristic.

Because the label of the second interactive feature is obtained by updating based on the prediction result of the teacher model, in order to prevent the error signal of the second interactive feature from influencing the model parameters corresponding to the first interactive feature during back propagation, a blocking mechanism, namely a GradientBlock mechanism, is set for the error signal of the second interactive feature in the embodiment of the invention. Referring to fig. 6, fig. 6 is a schematic diagram of blocking the back propagation of the error signal of the second interactive feature provided by the embodiment of the invention. Here, the blocking mechanism is arranged between the specific model corresponding to the second interactive feature and the shared layer, so that the error signal of the second interactive feature cannot be propagated to the shared layer. This ensures that the model parameters corresponding to the first interactive feature are not disturbed and that the prediction effect for the first interactive feature is not degraded.

Specifically, the error signal of the first interactive feature is sequentially propagated to a prediction layer, a feature splicing layer and a feature extraction layer corresponding to the first interactive feature, and a sharing layer (feature mapping layer) of each interactive feature, so as to realize the back propagation of the error signal of the first interactive feature in the multi-target recommendation model; therefore, in the process of back propagation of the error signal of the first interactive feature, the model parameters of each layer and the model parameters of the shared layer corresponding to the first interactive feature are updated.

The error signal of the second interactive feature is sequentially transmitted to a prediction layer, a feature splicing layer and a feature extraction layer corresponding to the second interactive feature, so that the error signal of the second interactive feature is reversely transmitted in the multi-target recommendation model; therefore, in the back propagation process of the error signal of the second interactive characteristic, the model parameters of each layer corresponding to the second interactive characteristic are updated.
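The blocking of the second interactive feature's error signal can be illustrated with a toy model whose gradients are written out by hand. The model (one shared weight and one weight per tower), the parameter names, and the use of logarithmic loss are assumptions made so that the example stays self-contained; in a deep learning framework the same effect is typically obtained with a stop-gradient or detach operation on the shared output fed to the second tower.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradients(params, x, y_first, y_second, block_second=True):
    """Manual back propagation for a toy model with one shared weight.

    shared layer:  h  = w_shared * x
    first tower:   p1 = sigmoid(a_first * h)
    second tower:  p2 = sigmoid(a_second * h)
    Both towers use log loss, so d(loss)/d(logit) = prediction - label.
    """
    h = params["w_shared"] * x
    err_first = sigmoid(params["a_first"] * h) - y_first
    err_second = sigmoid(params["a_second"] * h) - y_second

    # Each error signal updates the parameters of its own tower.
    grads = {"a_first": err_first * h, "a_second": err_second * h}

    # The first feature's error signal always reaches the shared layer;
    # the second feature's error signal reaches it only when not blocked.
    shared = err_first * params["a_first"] * x
    if not block_second:
        shared += err_second * params["a_second"] * x
    grads["w_shared"] = shared
    return grads

params = {"w_shared": 0.5, "a_first": 1.0, "a_second": -1.0}
blocked = gradients(params, x=2.0, y_first=1, y_second=0.3)
unblocked = gradients(params, x=2.0, y_first=1, y_second=0.3, block_second=False)
```

With blocking enabled, the gradient of `w_shared` depends only on the first interactive feature, while the second tower's own weight receives the same gradient in both cases.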

In some embodiments, since the multi-target recommendation model includes the feature mapping layer, the feature extraction layer, the feature concatenation layer, and the prediction layer, the server may recommend the media object to be recommended based on the multi-target recommendation model by: acquiring user data and content data of a media object to be recommended; respectively mapping the user data and the content data through a feature mapping layer to obtain feature vectors corresponding to the user data and the content data; extracting the features of the obtained feature vectors through a feature extraction layer to obtain the feature vectors of the media objects to be recommended; splicing the feature vectors of the media objects to be recommended through the feature splicing layer to obtain spliced vectors; predicting interactive features through a prediction layer based on the splicing vector to obtain a feature prediction result corresponding to the media object to be recommended; and recommending the media object to be recommended based on the characteristic prediction result.

How media objects are recommended based on the trained multi-target recommendation model is described below in conjunction with fig. 5 and fig. 7. Referring to fig. 7, fig. 7 is a data flow diagram for media object recommendation based on a multi-target recommendation model according to an embodiment of the present invention. As can be seen from fig. 5, the multi-target recommendation model includes a feature mapping layer, a feature extraction layer, a feature concatenation layer, and a prediction layer. In practical application, when a media object is recommended based on the multi-target recommendation model, the server acquires user data and content data of the media object to be recommended, and then inputs the user data and the content data into the multi-target recommendation model.

The multi-target recommendation model performs feature mapping on the user data and the content data through the feature mapping layer to obtain feature vectors corresponding to the user data and the content data. For example, the mapping may be performed by one-hot encoding or by a pre-trained feature mapping model.
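As a minimal sketch of the one-hot option, the following maps one categorical user field and one categorical content field to vectors. The field names and vocabularies are hypothetical.

```python
def one_hot(value, vocabulary):
    """Map a categorical value to a one-hot vector over a fixed vocabulary."""
    return [1.0 if value == entry else 0.0 for entry in vocabulary]

# Hypothetical vocabularies for one user field and one content field.
AGE_GROUPS = ["18-24", "25-34", "35+"]
CATEGORIES = ["sports", "music", "news", "games"]

user_vector = one_hot("25-34", AGE_GROUPS)
content_vector = one_hot("news", CATEGORIES)
```

A value outside the vocabulary maps to the all-zero vector in this sketch; a production system would more likely reserve a dedicated "unknown" slot.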

After the feature vectors corresponding to the user data and the content data are obtained, feature extraction is performed on them through the feature extraction layer to obtain the feature vectors of the media object to be recommended. Specifically, the feature extraction layer is composed of a Wide layer, a DNN layer, and a shared NFM layer: the DNN layer performs implicit feature crossing on the feature vectors of the user data and the content data to extract high-order feature vectors; the NFM layer performs explicit feature crossing on the feature vectors of the user data and the content data and sums the results to obtain a multi-dimensional feature vector; and the Wide layer performs a weighted linear summation of the feature vectors of the user data and the content data and outputs a feature vector with reduced dimensionality.
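Two of the three components can be sketched in a few lines. The explicit crossing below uses the pairwise-product pooling commonly associated with Neural Factorization Machines; treating this as the crossing used by the NFM layer of the embodiment is an assumption, as are the function names and example values.

```python
def wide_output(features, weights, bias=0.0):
    """Weighted linear sum of the input features (the Wide component)."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def bi_interaction(embeddings):
    """Sum of element-wise products over all pairs of embedding vectors.

    Uses the identity sum_{i<j} v_i * v_j = 0.5 * ((sum v_i)^2 - sum v_i^2),
    applied per dimension, so the cost is linear in the number of vectors.
    """
    dim = len(embeddings[0])
    pooled = []
    for d in range(dim):
        total = sum(v[d] for v in embeddings)
        total_sq = sum(v[d] ** 2 for v in embeddings)
        pooled.append(0.5 * (total ** 2 - total_sq))
    return pooled

embeddings = [[1.0, 2.0], [3.0, 0.5], [-1.0, 1.0]]
crossed = bi_interaction(embeddings)
linear = wide_output([1.0, 0.0, 1.0], [0.2, 0.4, -0.1])
```

The pooled vector has the same dimensionality as a single embedding, which is what allows it to be spliced with the outputs of the DNN and Wide components in the next layer.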

After the feature vectors of the media object to be recommended are obtained, they are spliced through the feature splicing layer to obtain a spliced vector. The interactive features are then predicted through the prediction layer based on the spliced vector to obtain the feature prediction results corresponding to the media object to be recommended. Specifically, the prediction layer may be an artificial neural network model that predicts the interactive features by calling an activation function; the prediction performed by the prediction layer may be a regression prediction or a classification prediction. For regression prediction, a first activation function (such as a regression function) is called to obtain the feature prediction result of each interactive feature; for classification prediction, a second activation function (such as a softmax classification function) is called to obtain the feature prediction result of each interactive feature.
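The two kinds of output can be sketched as follows. The softmax function is the one named in the text; using an identity (linear) output for regression is an assumption, since the embodiment does not specify the regression function, and the example quantities are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def identity(logit):
    """Linear output, a common choice when the target is a real-valued quantity."""
    return logit

class_probs = softmax([2.0, 0.0])  # e.g. scores for [interaction, no interaction]
regression_value = identity(37.5)  # e.g. a predicted real-valued target
```

For two classes, the softmax output for the first class equals the sigmoid of the difference between the two logits, which is why binary interactive features are often modelled with a single sigmoid output instead.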

The media object to be recommended is then recommended based on the feature prediction results output by the multi-target recommendation model. For example, the feature prediction result may be an estimated click rate, so that the media object to be recommended can be recommended based on the estimated click rate.

In some embodiments, the server may also recommend media objects to be recommended based on the multi-objective recommendation model by: determining a login user corresponding to a media information flow page; acquiring user data of a login user and content data of a media object to be recommended; predicting interactive characteristics through a multi-target recommendation model based on the acquired user data and content data to obtain characteristic prediction results corresponding to the first interactive characteristics and at least one second interactive characteristic; determining at least one target media object in the media objects to be recommended based on the obtained characteristic prediction result; and recommending the target media object to the login user so as to present the target media object on the media information flow page.

When it is detected that a user opens or browses a media information stream page, the login user corresponding to the media information stream page is determined, and the user data of the login user and the content data of the media objects to be recommended are acquired. The acquired user data and content data are input into the multi-target recommendation model, and the interactive features are predicted through the multi-target recommendation model to obtain feature prediction results corresponding to the first interactive feature and the at least one second interactive feature. At least one target media object is then determined among the media objects to be recommended according to the feature prediction results and recommended to the login user, so that the target media object is presented on the media information stream page opened by the user.
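Selecting target media objects from the per-feature predictions can be sketched as follows. Fusing the predictions with a weighted sum and keeping the top-scoring objects is an assumption; the embodiment does not specify how the feature prediction results are combined, and the weights and scores below are hypothetical.

```python
def select_targets(candidates, weights, top_k=2):
    """Rank candidates by a weighted sum of per-feature predictions.

    candidates: dict mapping a media object id to {feature name: predicted score}.
    weights: dict mapping a feature name to its weight (a hypothetical fusion rule).
    """
    def fused(scores):
        return sum(weights[name] * scores[name] for name in weights)

    ranked = sorted(candidates, key=lambda obj: fused(candidates[obj]), reverse=True)
    return ranked[:top_k]

predictions = {
    "video_a": {"play": 0.9, "share": 0.1},
    "video_b": {"play": 0.6, "share": 0.5},
    "video_c": {"play": 0.3, "share": 0.2},
}
targets = select_targets(predictions, {"play": 1.0, "share": 2.0})
```

Giving the second interactive feature a larger weight, as in this example, lets a less frequently played object outrank a more frequently played one when it is much more likely to be shared.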

By applying the above embodiment of the invention, the obtained training samples of the multi-target recommendation model are marked with labels of the first interactive feature and the second interactive feature, and the sampling time window of the second interactive feature is larger than that of the first interactive feature; that is, the sampling time windows of the two interactive features are asynchronous, which reduces the accuracy of the training samples. Based on this, at least one teacher model capable of predicting the second interactive feature is obtained; the training samples of the multi-target recommendation model are input into the at least one teacher model; the corresponding second interactive feature of the training samples is predicted through the at least one teacher model; the label of the corresponding second interactive feature in the training samples is updated based on the obtained prediction result to obtain training samples with at least one label updated; and the multi-target recommendation model is trained based on the training samples with the at least one label updated. In this way, the prediction accuracy of the multi-target recommendation model can be improved.

An exemplary application of the embodiments of the present invention in a practical application scenario will be described below.

In the training process of the multi-target recommendation model, different interactive features (namely, targets) have different sensitivities to the time window, so the optimal sampling time windows corresponding to the different interactive features also differ. Illustratively, taking the e-commerce shopping platform as an example, there may be a gap of hours or even days between the interactive feature "click to view a certain commodity" and the interactive feature "purchase a certain commodity". A click happens quickly, and a large number of clicks may occur within one hour, so the interactive feature "click" can be sampled based on a short time window (such as an hour level); a "buy" action takes longer, perhaps hours or even days, so the interactive feature "buy" needs to be sampled based on a longer time window, such as a day level. When a training sample is constructed based on a short sampling time window (hour level), the "buy" behavior is likely to occur only after several hours, so the label of the training sample for this interactive feature is wrong (that is, a "purchasing behavior" occurring several hours later is treated as "no purchase"). This reduces the fidelity of the training samples and affects the training of the multi-target recommendation model. It is a problem caused by the asynchronous updating of sample data among the interactive features in the multi-target recommendation model.

In order to solve the above problems, in the related art, the training samples of the multi-target recommendation model are usually acquired and constructed based on the sampling time window corresponding to the interactive feature with the largest sampling time window. Although the scheme solves the problem of asynchronous sample data updating, the real-time recommendation capability of the model is reduced, namely the real-time tracking capability of the multi-target recommendation model for the current requirements of the user is reduced.

Based on this, the embodiment of the present invention provides a training method for a multi-objective recommendation model based on artificial intelligence, so as to at least solve the above problems. Referring to fig. 8 and 9, fig. 8 is a schematic flowchart of a training method for a multi-objective recommendation model based on artificial intelligence according to an embodiment of the present invention, and fig. 9 is a schematic architecture diagram of the training method for the multi-objective recommendation model according to the embodiment of the present invention, where the training method for the multi-objective recommendation model based on artificial intelligence according to the embodiment of the present invention includes:

Step 801: the server obtains training samples of a multi-objective recommendation model for media object recommendation.

Here, in order for the multi-objective recommendation model to predict multiple interactive features at the same time, training samples marked with labels corresponding to each interactive feature need to be obtained. However, one training sample can be constructed based on only one sampling time window, so the sampled data of an interactive feature that has a different sampling time window may not be accurate enough. Therefore, in the embodiment of the present invention, the interactive features are divided into a first interactive feature and a second interactive feature, wherein the sampling time window of the second interactive feature is larger than the sampling time window of the first interactive feature. Here, there may be one or more first interactive features and one or more second interactive features. The obtained training sample of the multi-target recommendation model is marked with a label corresponding to the first interactive feature and a label corresponding to the second interactive feature. Illustratively, when the media object is a short video, the first interactive features may be "play" and "share", and the second interactive feature may be "share reflow"; or, when the media object is a commodity on an e-commerce shopping platform, the first interactive feature may be "click", and the second interactive features may be "collect" and "purchase".

In practical application, in order to improve the real-time recommendation capability of the multi-target recommendation model for media objects, the training samples of the model need to be updated at the hour level or even in real time, so that the multi-target recommendation model is trained and its parameters are updated in real time. Therefore, the training samples of the multi-target recommendation model can be collected and constructed according to the sampling time window of the first interactive feature, which has the smallest sampling time window. For example, if the sampling time window of the first interactive feature is one hour, then each time an hour elapses, the data corresponding to the first interactive feature and the second interactive feature within that past hour is collected and integrated to construct the training sample for that hour, and these hour-level samples are used as the training samples of the multi-target recommendation model. Illustratively, for short videos, the sampling time windows of the interactive features "play" and "share" are at the hour level, and the sampling time window of "share reflow" is at the day level; in this case, a training sample for the media object "short video" can be constructed based on the hour-level time window.
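The hour-level sample construction described above might be sketched as follows. The record layouts, the function name `build_window_samples`, and the use of seconds as timestamps are assumptions made for illustration only.

```python
HOUR = 3600  # assumed sampling window of the first interactive feature, in seconds
FEATURES = ("play", "share", "share_reflow")

def build_window_samples(impressions, events, window_start, window=HOUR):
    """Build one sample per impression in the window, labeled only with events seen in that window.

    impressions: iterable of (timestamp, user_id, item_id)
    events:      iterable of (timestamp, user_id, item_id, feature_name)
    """
    window_end = window_start + window
    samples = {}
    for ts, user, item in impressions:
        if window_start <= ts < window_end:
            samples[(user, item)] = dict.fromkeys(FEATURES, 0)
    for ts, user, item, feature in events:
        if window_start <= ts < window_end and (user, item) in samples:
            samples[(user, item)][feature] = 1
    return samples

impressions = [(50, "u1", "v1")]
events = [
    (100, "u1", "v1", "play"),
    (200, "u1", "v1", "share"),
    (4000, "u1", "v1", "share_reflow"),  # arrives after the hourly window has closed
]
samples = build_window_samples(impressions, events, window_start=0)
print(samples)
```

In this sketch the "share_reflow" event falls outside the hourly window, so the hour-level sample carries a 0 label for it even though the reflow did occur later.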

Step 802: training samples of at least one teacher model are obtained.

Here, since the training samples of the multi-target recommendation model are constructed based on the sampling time window of the first interactive feature, the accuracy of the sample label of the second interactive feature is reduced. Therefore, after the training sample of the multi-target recommendation model is obtained, the label corresponding to the second interactive feature in the training sample needs to be updated and adjusted.

In the embodiment of the invention, a teacher model for predicting the second interactive feature is constructed and trained, and its prediction results guide the updating of the training samples of the multi-target recommendation model (that is, the student model), thereby guiding the training of the multi-target recommendation model. Referring to fig. 9, the teacher model trained on day-level samples predicts the second interactive feature for the training samples of the multi-target recommendation model, and the prediction results guide the training of the student model on the second interactive feature. That is, the prediction result of the teacher model for a second interactive feature is used as the label of that second interactive feature in the training sample of the multi-target recommendation model.

Here, the server may train the at least one teacher model as follows. First, training samples for the teacher model are obtained. In order to ensure the accuracy of the label of the second interactive feature in these samples, the training samples of the teacher model are collected and constructed based on the sampling time window of the second interactive feature, and each is marked at least with the label of the corresponding second interactive feature. In practical application, because the first interactive feature and the second interactive feature are associated, the training samples of the teacher model may additionally be marked with the label of the first interactive feature. When the teacher model is trained, a training sample is input into the teacher model, a plurality of interactive features are predicted simultaneously, and the model parameters of the teacher model are updated based on the prediction results for these interactive features. Because the plurality of interactive features are learned together, with parameter sharing and information sharing, the resulting teacher model performs better than one trained on a single interactive feature.

In the embodiment of the present invention, the structure of the teacher model is the same as that of the student model (i.e., the multi-objective recommendation model), and specifically, refer to fig. 4 and 5.

Step 803: train each teacher model based on its own training samples, so that each teacher model can predict the corresponding second interactive feature based on an input media object.

The training can be performed in the following way for each teacher model: inputting the training sample of the teacher model into the teacher model, and predicting second interactive characteristics through the teacher model to obtain a prediction result of the teacher model; determining the value of a loss function of the teacher model based on the prediction result and the label marked by the training sample of the teacher model; thereby updating the parameters of the teacher model based on the values of the loss function of the teacher model. Thus, training of the teacher model is completed.
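The predict, compute-loss, update-parameters cycle described above can be sketched with a deliberately small stand-in. A single-layer logistic model trained by stochastic gradient descent is assumed here in place of the teacher network of fig. 4 and 5; the data, learning rate, and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(samples, weights, bias):
    """Mean logarithmic loss of the model over (features, label) pairs."""
    total = 0.0
    for x, y in samples:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)

def train_teacher(samples, epochs=200, lr=0.5):
    """Predict, measure the error against the label, and update the parameters."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Hypothetical day-level samples: (feature vector, label of the second interactive feature).
day_samples = [([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.1, 0.9], 0), ([0.0, 1.0], 0)]
loss_before = log_loss(day_samples, [0.0, 0.0], 0.0)
weights, bias = train_teacher(day_samples)
loss_after = log_loss(day_samples, weights, bias)
print(loss_before, loss_after)
```

The loss on the day-level samples decreases after training, which corresponds to the teacher model fitting the labels collected under the longer sampling time window.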

Step 804: perform second interactive feature prediction on the training samples through the at least one teacher model to obtain corresponding prediction results.

The training samples of the multi-target recommendation model are input into each of the at least one teacher model, and each teacher model predicts its corresponding second interactive feature for these training samples to obtain the corresponding prediction result.

Step 805: update the label of the corresponding second interactive feature in the training sample based on the obtained prediction result of the at least one teacher model, to obtain the training sample with at least one updated label.
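A minimal sketch of this label update is given below. The sample layout, the function `update_second_feature_labels`, and the stand-in `reflow_teacher` are hypothetical; using the teacher output directly as a probability-valued label is also an assumption of this sketch.

```python
def update_second_feature_labels(samples, teachers):
    """Return copies of `samples` whose second-feature labels are replaced by teacher predictions.

    samples:  list of {"features": [...], "labels": {name: value}}
    teachers: {second_feature_name: callable(features) -> prediction}
    """
    updated = []
    for sample in samples:
        labels = dict(sample["labels"])  # copy, so the original sample is left untouched
        for name, teacher in teachers.items():
            labels[name] = teacher(sample["features"])
        updated.append({"features": sample["features"], "labels": labels})
    return updated

def reflow_teacher(features):
    # Stand-in for a trained teacher model (hypothetical rule, for illustration only).
    return 0.8 if features[0] > 0.5 else 0.1

hourly = [{"features": [0.9], "labels": {"play": 1, "share": 1, "share_reflow": 0}}]
relabeled = update_second_feature_labels(hourly, {"share_reflow": reflow_teacher})
print(relabeled[0]["labels"])
```

Only the label of the second interactive feature changes; the labels of the first interactive features collected in the hourly window are kept as they are.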

Step 806: predict the interactive features of the training samples with the at least one updated label through the multi-target recommendation model to obtain feature prediction results.

In practical application, a specific model is set in the multi-target recommendation model for each prediction target (that is, each interactive feature), so the models corresponding to different prediction targets are different. These models may share the same model structure while having different model parameters, or may differ in both structure and parameters. Referring to fig. 4, the multi-objective recommendation model here can predict three interactive features, namely interactive feature 1, interactive feature 2 and interactive feature 3, at the same time; each interactive feature corresponds to its own specific model, and a shared layer exists among the interactive features. Specifically, referring to fig. 5, in the multi-objective recommendation model, the specific model corresponding to each interactive feature includes a feature extraction layer, a feature concatenation layer, and a prediction layer; in addition, a feature mapping layer serves as the shared layer for all interactive features.

The training sample with at least one updated label is input into the multi-target recommendation model, and each interactive feature is predicted through the multi-target recommendation model to obtain the feature prediction result corresponding to that interactive feature.
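The layered structure described above (a shared feature mapping layer followed by a per-feature extraction layer, concatenation layer, and prediction layer) might be sketched as a forward pass. The layer sizes, the random initialization, and the choice to concatenate the mapped vector with the extracted vector are assumptions of this sketch; the figures referenced in the text are not reproduced here.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer with ReLU; `weights` holds one weight row per output unit."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init(n_out, n_in, rng):
    """Randomly initialized (weights, biases) pair for a layer."""
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

class MultiTargetModel:
    def __init__(self, n_inputs, targets, hidden=4, seed=0):
        rng = random.Random(seed)
        self.shared = init(hidden, n_inputs, rng)       # shared feature mapping layer
        self.towers = {
            name: {
                "extract": init(hidden, hidden, rng),   # feature extraction layer
                "predict": init(1, 2 * hidden, rng),    # prediction layer
            }
            for name in targets
        }

    def predict(self, inputs):
        mapped = dense(inputs, *self.shared)
        out = {}
        for name, tower in self.towers.items():
            extracted = dense(mapped, *tower["extract"])
            spliced = mapped + extracted                # feature concatenation layer (assumed inputs)
            w, b = tower["predict"]
            logit = sum(wi * xi for wi, xi in zip(w[0], spliced)) + b[0]
            out[name] = 1.0 / (1.0 + math.exp(-logit))
        return out

model = MultiTargetModel(3, ["play", "share", "share_reflow"])
preds = model.predict([0.2, 0.7, 0.1])
print(preds)
```

Each interactive feature gets its own tower parameters, while every tower reads from the same shared mapping layer.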

Step 807: determine the value of the loss function corresponding to each interactive feature in the multi-target recommendation model, based on the feature prediction result of each interactive feature and the label corresponding to that interactive feature.

The difference between the feature prediction result of each interactive feature and the label corresponding to that interactive feature is obtained, giving the difference corresponding to each interactive feature. The value of the loss function corresponding to each interactive feature in the multi-target recommendation model is then determined based on that difference. The loss function may be a logarithmic loss function, a squared error loss function, or the like, and the specific loss function can be chosen as required.
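The two loss functions named above can be written out per interactive feature as follows. The function names and the example predictions and labels are hypothetical.

```python
import math

def log_loss(prediction, label, eps=1e-12):
    """Logarithmic (cross-entropy) loss for one prediction; also valid for labels in [0, 1]."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))

def squared_error(prediction, label):
    """Squared error loss for one prediction."""
    return (prediction - label) ** 2

def per_feature_losses(predictions, labels, loss_fn=log_loss):
    """Compute one loss value for each interactive feature."""
    return {name: loss_fn(predictions[name], labels[name]) for name in labels}

predictions = {"play": 0.9, "share": 0.2, "share_reflow": 0.6}
labels = {"play": 1, "share": 0, "share_reflow": 0.8}  # 0.8 stands for a teacher-provided label
losses = per_feature_losses(predictions, labels)
squared = per_feature_losses(predictions, labels, loss_fn=squared_error)
print(losses, squared)
```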

Step 808: when the value of the loss function corresponding to an interactive feature exceeds the corresponding loss threshold, determine the error signal of that interactive feature based on its loss function.
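The threshold check of this step might look like the sketch below. It assumes a sigmoid output trained with the logarithmic loss, for which the derivative of the loss with respect to the logit is the prediction minus the label; the function name and the threshold values are hypothetical.

```python
def error_signals(predictions, labels, losses, thresholds):
    """Return {feature: error signal} only for features whose loss exceeds its threshold."""
    signals = {}
    for name, loss in losses.items():
        if loss > thresholds[name]:
            # For sigmoid + log loss, d(loss)/d(logit) = prediction - label.
            signals[name] = predictions[name] - labels[name]
    return signals

predictions = {"play": 0.9, "share": 0.4}
labels = {"play": 1, "share": 0}
losses = {"play": 0.105, "share": 0.511}     # log-loss values rounded for illustration
thresholds = {"play": 0.2, "share": 0.2}     # assumed per-feature loss thresholds
signals = error_signals(predictions, labels, losses, thresholds)
print(signals)  # prints "{'share': 0.4}"
```

Here only "share" produces an error signal, because the loss for "play" is already below its threshold.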

Step 809: back-propagate each error signal in the multi-target recommendation model, and update the model parameters of each layer during propagation.

In practical application, the label of the second interactive feature is obtained by updating based on the prediction result of the teacher model. In order to prevent the error signal of the second interactive feature from influencing the model parameters corresponding to the first interactive feature during back propagation, in the embodiment of the invention a corresponding blocking mechanism, namely a GradientBlock mechanism, is set for the error signal of the second interactive feature. Referring to fig. 6, the blocking mechanism is provided between the specific model corresponding to the second interactive feature and the shared layer, so that the error signal of the second interactive feature cannot be propagated to the shared layer. This ensures that the model parameters corresponding to the first interactive feature are not disturbed and that the prediction effect for the first interactive feature is not degraded.
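The effect of the blocking mechanism can be illustrated on a toy model with one shared weight and one head weight per interactive feature. This is a hand-written sketch under simplifying assumptions (scalar weights, sigmoid heads, log loss), not the GradientBlock implementation itself; the function `train_step` and its `blocked` parameter are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(params, x, labels, blocked=("share_reflow",), lr=0.1):
    """One gradient step on a toy model: a shared weight, then one head weight per feature."""
    hidden = params["shared"] * x                          # shared layer output
    shared_grad = 0.0
    new_heads = {}
    for name, head_w in params["heads"].items():
        error = sigmoid(head_w * hidden) - labels[name]    # error signal at the head
        new_heads[name] = head_w - lr * error * hidden     # the head itself always learns
        if name not in blocked:                            # blocked heads do not reach the shared layer
            shared_grad += error * head_w * x
    return {"shared": params["shared"] - lr * shared_grad, "heads": new_heads}

params = {"shared": 0.5, "heads": {"play": 1.0, "share_reflow": 1.0}}
a = train_step(params, 1.0, {"play": 1, "share_reflow": 0.0})
b = train_step(params, 1.0, {"play": 1, "share_reflow": 1.0})
c = train_step(params, 1.0, {"play": 1, "share_reflow": 0.0}, blocked=())
print(a["shared"] == b["shared"], a["shared"] == c["shared"])  # prints "True False"
```

With blocking enabled, changing the label of the second interactive feature alters only its own head weight and leaves the shared weight unchanged; without blocking, the shared weight moves as well. Stop-gradient operations in common deep learning frameworks have a comparable effect.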

Step 810: the terminal, in response to an opening operation of the user for the media information flow page, sends an acquisition request for the media object.

Here, the terminal sends an acquisition request of the media object to the server in response to an opening operation of the user for the media information flow page so as to acquire a target media object recommended by the server, and therefore the media object is presented to the user for the user to watch.

Step 811: the server receives the acquisition request and determines the login user corresponding to the media information flow page.

Here, after receiving the acquisition request, the server first determines a login user who opens the media information flow page in order to recommend the media object.

Step 812: acquire the user data of the login user and the content data of the media objects to be recommended.

Here, after the login user is determined, user data of the login user, such as user identification, historical behavior (search records, browsing records, etc.), user environment (geographical location, network status, etc.), is obtained. And then, acquiring content data of the media object to be recommended, such as an identifier, a content tag, a publishing source, related text and the like of the media object to be recommended.

Step 813: predict the interactive features through the multi-target recommendation model based on the acquired user data and content data, to obtain feature prediction results corresponding to the first interactive feature and the at least one second interactive feature.

Step 814: determine at least one target media object among the media objects to be recommended based on the obtained feature prediction results, and recommend the target media object to the terminal of the login user.

For example, when the media object is a short video, the corresponding interactive feature may include "play", "share", and "share reflow", and accordingly, the feature prediction result may be a predicted play probability, a predicted share probability, and the like. And after the characteristic prediction results corresponding to the interactive characteristics are obtained, selecting at least one target media object from the media objects to be recommended for recommendation based on the characteristic prediction results.
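One possible way to select target media objects from the feature prediction results is a weighted sum of the predicted probabilities followed by a top-k cut. The text does not specify how the predictions are combined, so the weighting scheme, the weights, and the function `recommend` below are assumptions for illustration only.

```python
def recommend(candidates, weights, top_k=2):
    """Rank candidates by a weighted sum of predicted probabilities and keep the best top_k."""
    def score(item):
        return sum(weights[name] * p for name, p in item["predictions"].items())
    return sorted(candidates, key=score, reverse=True)[:top_k]

candidates = [
    {"id": "v1", "predictions": {"play": 0.9, "share": 0.1, "share_reflow": 0.05}},
    {"id": "v2", "predictions": {"play": 0.6, "share": 0.5, "share_reflow": 0.40}},
    {"id": "v3", "predictions": {"play": 0.3, "share": 0.1, "share_reflow": 0.02}},
]
weights = {"play": 1.0, "share": 2.0, "share_reflow": 3.0}  # hypothetical importance weights
top = recommend(candidates, weights)
print([item["id"] for item in top])  # prints "['v2', 'v1']"
```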

Step 815: the terminal receives the target media object and presents it on the media information flow page.

The terminal presents a media information stream page including the target media object. Referring to fig. 10, fig. 10 is a schematic diagram of a media information stream page provided by an embodiment of the present invention, where a media object is a short video, and in a "recommendation" column, a short video recommended by a server (such as an "annoying autumn wind" original video, etc.) is presented for a user to view.

Continuing with the description of the multi-objective artificial intelligence based recommendation model training apparatus 255 provided in the embodiments of the present invention, in some embodiments, the multi-objective artificial intelligence based recommendation model training apparatus may be implemented by using a software module. Referring to fig. 11, fig. 11 is a schematic structural diagram of the training apparatus 255 for multi-objective artificial intelligence-based recommendation model according to the embodiment of the present invention, where the training apparatus 255 for multi-objective artificial intelligence-based recommendation model according to the embodiment of the present invention includes:

an obtaining module 2551, configured to obtain a training sample of a multi-target recommendation model for media object recommendation, where the training sample is labeled with at least two labels corresponding to interactive features; wherein the interactive features include a first interactive feature and at least one second interactive feature, and the sampling time window of the second interactive feature is larger than that of the first interactive feature;

an input module 2552, configured to input the training samples into at least one teacher model respectively, where each teacher model is configured to predict one of the second interaction features;

the prediction module 2553 is configured to perform second interactive feature prediction on the training samples through the at least one teacher model, so as to obtain corresponding prediction results;

an updating module 2554, configured to update the label of the corresponding second interactive feature in the training sample based on the obtained prediction result of the at least one teacher model, so as to obtain a training sample after updating the at least one label;

a training module 2555, configured to train the multi-target recommendation model based on the training sample with the at least one updated label, so that the multi-target recommendation model can perform feature prediction corresponding to the first interactive feature and the at least one second interactive feature based on an input media object, so as to recommend the media object based on a feature prediction result.

In some embodiments, the obtaining module 2551 is further configured to collect data of the media object corresponding to the first interactive feature and data of the at least one second interactive feature based on a sampling time window of the first interactive feature; and constructing training samples of the multi-target recommendation model based on the collected data.

In some embodiments, the updating module 2554 is further configured to use the prediction result of each teacher model as the label of the corresponding second interactive feature in the training sample, so as to update that label and obtain the training sample with at least one updated label.

In some embodiments, the apparatus further comprises:

In some embodiments, the teacher model training module is further configured to input a training sample of each teacher model to a corresponding teacher model, and predict the second interaction feature through the corresponding teacher model to obtain a corresponding prediction result;

In some embodiments, the training module 2555 is further configured to perform, by using the multi-objective recommendation model, prediction of the interactive features on the training samples after updating the at least one label, so as to obtain a feature prediction result;

In some embodiments, the training module 2555 is further configured to determine an error signal of each interactive feature based on the loss function corresponding to each interactive feature when the value of the loss function corresponding to each interactive feature exceeds the corresponding loss threshold;

In some embodiments, the multi-objective recommendation model includes a sharing layer, a feature extraction layer, a feature splicing layer, and a prediction layer, and the training module 2555 is further configured to propagate the error signal of the first interactive feature to the prediction layer, the feature splicing layer, the feature extraction layer, and the sharing layer in sequence, so as to implement back propagation of the error signal of the first interactive feature in the multi-objective recommendation model;

In some embodiments, the multi-target recommendation model includes a feature mapping layer, a feature extraction layer, a feature concatenation layer, and a prediction layer, and the apparatus further includes:

In some embodiments, the recommendation module is further configured to determine a login user corresponding to a media information flow page;

An embodiment of the present invention further provides an electronic device, where the electronic device includes:

a memory for storing executable instructions;

In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories. The computer may be a variety of computing devices including intelligent terminals and servers.

In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).

By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.

The above description is only an example of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims

1. A training method of a multi-target recommendation model based on artificial intelligence is characterized by comprising the following steps:

2. The method of claim 1, wherein obtaining training samples for a multi-objective recommendation model for media object recommendations comprises:

acquiring data of the media object corresponding to the first interactive feature and data corresponding to the at least one second interactive feature based on the sampling time window of the first interactive feature; and

3. The method of claim 1, wherein updating the labels of the corresponding second interactive features in the training sample based on the obtained prediction results of the at least one teacher model to obtain the training sample after updating the at least one label comprises:

using the prediction result of each teacher model as the label of the corresponding second interactive feature in the training sample, so as to update the label of the corresponding second interactive feature in the training sample and obtain the training sample after updating at least one label.

4. The method of claim 1, wherein prior to inputting the training samples into at least one teacher model, respectively, the method further comprises:

obtaining a training sample of the at least one teacher model;

5. The method of claim 4, wherein the training of the respective teacher model based on the training samples of each teacher model comprises:

respectively inputting the training samples of the teacher models to the corresponding teacher models, and predicting the second interactive characteristics through the corresponding teacher models to obtain corresponding prediction results;

6. The method of claim 1, wherein training the multi-objective recommendation model based on the training sample after updating the at least one label comprises:

predicting the interactive features of the training sample with the updated at least one label through the multi-target recommendation model to obtain a feature prediction result;

7. The method of claim 6, wherein updating the model parameters of the multi-objective recommendation model based on the values of the loss functions corresponding to the interactive features in the multi-objective recommendation model comprises:

when the value of the loss function corresponding to each interactive feature exceeds the corresponding loss threshold value, determining an error signal of the corresponding interactive feature based on the loss function corresponding to each interactive feature;

8. The method of claim 7, wherein the multi-objective recommendation model comprises a sharing layer, a feature extraction layer, a feature concatenation layer and a prediction layer, and wherein propagating each of the error signals back in the multi-objective recommendation model and updating model parameters of each layer in the process of propagation comprises:

the error signal of the first interactive feature is sequentially transmitted to the prediction layer, the feature splicing layer, the feature extraction layer and the sharing layer, so that the error signal of the first interactive feature is reversely transmitted in the multi-target recommendation model;

9. The method of claim 1, wherein the multi-objective recommendation model comprises a feature mapping layer, a feature extraction layer, a feature concatenation layer, and a prediction layer, the method further comprising:

acquiring user data and content data of a media object to be recommended;

10. The method of claim 1, wherein the method further comprises:

determining a login user corresponding to a media information flow page;

11. An artificial intelligence-based multi-objective recommendation model training device, the device comprising:

12. An electronic device, characterized in that the electronic device comprises:

a memory for storing executable instructions;

a processor for implementing the artificial intelligence based multi-objective recommendation model training method of any one of claims 1 to 10 when executing the executable instructions stored in the memory.

13. A computer-readable storage medium having stored thereon executable instructions for, when executed, implementing the artificial intelligence based multi-objective recommendation model training method of any one of claims 1 to 10.