CN113596528B - Training method and device of video push model, server and storage medium - Google Patents


Info

Publication number
CN113596528B
CN113596528B (application number CN202010366374.0A)
Authority
CN
China
Prior art keywords
video
information
account
video information
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010366374.0A
Other languages
Chinese (zh)
Other versions
CN113596528A (en)
Inventor
王琳
叶璨
黄俊逸
胥凯
闫阳辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010366374.0A priority Critical patent/CN113596528B/en
Publication of CN113596528A publication Critical patent/CN113596528A/en
Application granted granted Critical
Publication of CN113596528B publication Critical patent/CN113596528B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/26208: Content or additional data distribution scheduling, the scheduling operation being performed under constraints
    • H04N 21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/251: Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/26258: Content or additional data distribution scheduling for generating a list of items to be played back in a given order, e.g. playlist
    • H04N 21/4666: Learning process for intelligent management characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
    • H04N 21/4667: Processing of monitored end-user data, e.g. trend analysis based on the log file of viewer selections
    • H04N 21/4668: Learning process for intelligent management for recommending content, e.g. movies
    • H04N 21/4826: End-user interface for program selection using recommendation lists, e.g. of programs or channels sorted according to their score
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure relates to a training method, a device, a server and a storage medium of a video push model, wherein the method comprises the following steps: acquiring account information of a sample account and actual operation information of the sample account on pushed video information; inputting the account information and the video information into a video operation prediction model to obtain prediction operation information of the sample account on the video information; training the video operation prediction model according to the prediction operation information and the actual operation information; according to the trained video operation prediction model, obtaining prediction operation information of a plurality of sample accounts on target video information, and using the prediction operation information as training sample data of a video push model to be trained; and training the video push model to be trained according to the training sample data. By adopting the method, the training efficiency of the video push model can be improved.

Description

Training method and device of video push model, server and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for training a video push model, a server, and a storage medium.
Background
With the development of computer technology, applications for browsing videos have emerged in large numbers, and more and more accounts choose to browse videos through these applications. To achieve accurate pushing of videos, the videos pushed to an account are typically determined by training a model for pushing videos.
In the related art, a model for pushing videos is generally trained by acquiring a large amount of online video operation sample data and repeatedly training the model until it converges. However, the process of acquiring a large amount of online video operation sample data is complicated, so the training time of the model is long and the training efficiency of the model is low.
Disclosure of Invention
The present disclosure provides a training method, an apparatus, a server and a storage medium for a video push model, so as to at least solve the problem of low model training efficiency in the related art. The technical solution of the present disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a training method for a video push model, including:
acquiring account information of a sample account and actual operation information of the sample account on pushed video information;
inputting the account information and the video information into a video operation prediction model to obtain prediction operation information of the sample account on the video information;
training the video operation prediction model according to the prediction operation information and the actual operation information;
according to the trained video operation prediction model, obtaining prediction operation information of a plurality of sample accounts on target video information, and using the prediction operation information as training sample data of a video push model to be trained;
and training the video push model to be trained according to the training sample data.
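The five steps above form a two-stage pipeline, which can be sketched as follows. This is a toy illustration only: the `Simulator` class and its running-mean update are stand-ins of our own, not the networks described in this disclosure. Stage one fits the video operation prediction model on real logs; stage two uses it as a sample simulator for the push model.

```python
class Simulator:
    """Toy stand-in for the video operation prediction model: it keeps a
    running estimate of the operation score per (account, video) pair."""
    def __init__(self):
        self.table = {}

    def predict(self, account, video):
        # Prediction operation information for this account/video pair.
        return self.table.get((account, video), 0.5)

    def update(self, account, video, actual):
        # Move the estimate halfway toward the actual operation information.
        prev = self.predict(account, video)
        self.table[(account, video)] = 0.5 * (prev + actual)

def build_training_samples(simulator, accounts, target_videos):
    """Stage two: the trained simulator generates training sample data for
    the push model, replacing online log collection."""
    return {(a, v): simulator.predict(a, v)
            for a in accounts for v in target_videos}

# Stage one: train the simulator on logged (account, video, actual) triples.
sim = Simulator()
for account, video, actual in [("u1", "A", 1.0), ("u1", "A", 1.0)]:
    sim.update(account, video, actual)

samples = build_training_samples(sim, ["u1", "u2"], ["A", "B"])
```

The point of the sketch is the data flow: once the simulator is fitted, `build_training_samples` can produce arbitrarily many (account, target video, predicted operation) samples offline.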
In an exemplary embodiment, the inputting the account information and the video information into a video operation prediction model to obtain the prediction operation information of the sample account on the video information includes:
extracting, from the video information, first video information on which the sample account has performed operations in sequence before a preset moment, and second video information on which the sample account performs an operation at the preset moment;
acquiring an account information feature code of the account information and a first video information feature code of the first video information;
inputting the account information feature code and the first video information feature code into an account state coding network in the video operation prediction model to obtain an account state code of the sample account at the preset moment;
inputting a second video information feature code of the second video information and the account status code into an operation prediction network in the video operation prediction model to obtain the prediction operation information of the sample account on the second video information at the preset moment.
In an exemplary embodiment, the inputting the account information feature code and the first video information feature code into an account status coding network in the video operation prediction model to obtain an account status code of the sample account at the preset time includes:
inputting the first video information feature code into a first network in the account state coding network to obtain the video state code at the preset moment;
and inputting the account information feature code and the video state code into a second network in the account state coding network to obtain the account state code of the sample account at the preset moment.
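As one concrete, purely illustrative reading of the two-network structure above, the sketch below models the first network as a recurrent fold over the first-video-information feature codes and the second network as a combination with the account information feature code. The real networks are learned; these update rules are assumptions.

```python
def video_state_code(first_video_codes, dim):
    """'First network' (illustrative): fold the sequence of operated-video
    feature codes into a single video state code at the preset moment."""
    state = [0.0] * dim
    for code in first_video_codes:
        # Exponential moving average over the operation sequence.
        state = [0.5 * s + 0.5 * x for s, x in zip(state, code)]
    return state

def account_state_code(account_code, first_video_codes):
    """'Second network' (illustrative): combine the account information
    feature code with the video state code."""
    state = video_state_code(first_video_codes, len(account_code))
    return [a + s for a, s in zip(account_code, state)]

# Account feature code plus two operated-video feature codes in sequence.
code = account_state_code([1.0, 1.0], [[2.0, 2.0], [4.0, 4.0]])
```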
In an exemplary embodiment, the inputting the second video information feature coding and the account status coding into an operation prediction network in the video operation prediction model to obtain the prediction operation information of the sample account on the second video information includes:
inputting the second video information feature code and the account state code into an operation prediction network in the video operation prediction model to obtain a plurality of operation behavior probabilities of the sample account on the second video information;
and according to preset weights corresponding to the operation behavior probabilities, weighting the operation behavior probabilities to obtain a target operation probability of the sample account on the second video information, wherein the target operation probability is correspondingly used as the prediction operation information of the sample account on the second video information at the preset moment.
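The weighting step can be written out directly. The behavior names and weight values below are illustrative assumptions; only the weighted-sum form is taken from the text above.

```python
# Assumed operation behaviors and preset weights (hypothetical values).
PRESET_WEIGHTS = {"click": 0.2, "like": 0.3, "follow": 0.3, "finish": 0.2}

def target_operation_probability(behavior_probs):
    """Weighted sum of the predicted operation behavior probabilities,
    used as the prediction operation information at the preset moment."""
    return sum(PRESET_WEIGHTS[b] * p for b, p in behavior_probs.items())

score = target_operation_probability(
    {"click": 0.9, "like": 0.5, "follow": 0.1, "finish": 0.7})
```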
In an exemplary embodiment, the prediction operation information of the sample account on the target video information includes the prediction operation information of the sample account on the target video information at each preset time;
the training the video push model to be trained according to the training sample data comprises the following steps:
acquiring target video information characteristic codes of the target video information;
inputting the account information feature codes and the target video information feature codes into an account state coding network in the video pushing model to be trained to obtain target account state codes of the sample accounts at all the preset moments;
inputting the target video information feature code and the target account state code into an operation prediction network in the video push model to be trained to obtain target prediction operation information of the sample account on the target video information at each preset moment;
inputting the target prediction operation information into a preset video push evaluation model to obtain an operation feedback value of the sample account on the target video information at each preset moment;
and repeatedly training the video push model to be trained and the preset video push evaluation model according to the target account state code, the prediction operation information of the sample account on the target video information at each preset moment, the target prediction operation information and the operation feedback value until the video push model to be trained and the preset video push evaluation model both meet the convergence condition.
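The joint training loop above resembles an actor-critic setup: the push model proposes operations, the evaluation model returns operation feedback values, and both are updated until a convergence condition holds. A heavily simplified, hypothetical sketch, with a single scalar parameter and a mean-seeking update standing in for the two networks:

```python
def train_until_converged(simulated_scores, lr=0.1, tol=1e-4, max_iters=500):
    """Repeat updates until the parameter change falls below `tol`
    (the 'convergence condition'). Returns the converged parameter."""
    theta = 0.0
    for _ in range(max_iters):
        # Stand-in for the operation feedback value: how far the push
        # model's output is from the simulator's predicted operations.
        grad = sum(theta - s for s in simulated_scores) / len(simulated_scores)
        new_theta = theta - lr * grad
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

theta = train_until_converged([0.2, 0.4, 0.6])
```

In this toy version the parameter settles at the mean of the simulated scores; in the disclosure, the analogous stopping rule is that both the push model and the evaluation model satisfy their convergence conditions.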
According to a second aspect of the embodiments of the present disclosure, there is provided a video push method, including:
acquiring account information of an account to be pushed;
inputting the account information of the account to be pushed into the trained video pushing model to obtain the pushed video information of the account to be pushed; the trained video push model is obtained according to the training method of the video push model;
and pushing the pushed video information to the account to be pushed.
In an exemplary embodiment, pushing the pushed video information to the account to be pushed includes:
arranging the pushed video information according to the order in which the trained video push model outputs the pushed video information, to obtain the arranged pushed video information;
and pushing the arranged pushed video information to the account to be pushed.
According to a third aspect of the embodiments of the present disclosure, there is provided a training apparatus for a video push model, including:
the information acquisition unit is configured to acquire account information of a sample account and actual operation information of the sample account on pushed video information;
an information prediction unit configured to perform input of the account information and the video information into a video operation prediction model, so as to obtain prediction operation information of the sample account on the video information;
a prediction model training unit configured to perform training of the video operation prediction model according to the prediction operation information and the actual operation information;
the sample data acquisition unit is configured to obtain, according to the trained video operation prediction model, prediction operation information of a plurality of sample accounts on target video information, and to use the prediction operation information as training sample data of a video push model to be trained;
and the push model training unit is configured to train the video push model to be trained according to the training sample data.
In an exemplary embodiment, the information prediction unit is further configured to extract, from the video information, first video information on which the sample account has performed operations in sequence before a preset moment, and second video information on which the sample account performs an operation at the preset moment; acquire an account information feature code of the account information and a first video information feature code of the first video information; input the account information feature code and the first video information feature code into an account state coding network in the video operation prediction model to obtain an account state code of the sample account at the preset moment; and input a second video information feature code of the second video information and the account state code into an operation prediction network in the video operation prediction model to obtain the prediction operation information of the sample account on the second video information at the preset moment.
In an exemplary embodiment, the information prediction unit is further configured to input the first video information feature code into a first network in the account state coding network to obtain the video state code at the preset moment; and to input the account information feature code and the video state code into a second network in the account state coding network to obtain the account state code of the sample account at the preset moment.
In an exemplary embodiment, the information prediction unit is further configured to input the second video information feature code and the account state code into the operation prediction network in the video operation prediction model to obtain a plurality of operation behavior probabilities of the sample account on the second video information; and, according to preset weights corresponding to the operation behavior probabilities, weight the operation behavior probabilities to obtain a target operation probability of the sample account on the second video information, where the target operation probability is used as the prediction operation information of the sample account on the second video information at the preset moment.
In an exemplary embodiment, the prediction operation information of the sample account on the target video information includes the prediction operation information of the sample account on the target video information at each preset time;
the push model training unit is further configured to perform target video information feature coding for acquiring the target video information; inputting the account information feature codes and the target video information feature codes into an account state coding network in the video pushing model to be trained to obtain target account state codes of the sample accounts at all the preset moments; inputting the target video information feature code and the target account state code into an operation prediction network in the video push model to be trained to obtain target prediction operation information of the sample account on the target video information at each preset moment; inputting the target prediction operation information into a preset video push evaluation model to obtain an operation feedback value of the sample account on the target video information at each preset moment; repeatedly training the to-be-trained video push model and the preset video push evaluation model according to the target account state code, the prediction operation information of the sample account on the target video information at each preset moment, the target prediction operation information and the operation feedback value until the to-be-trained video push model and the preset video push evaluation model both meet the convergence condition.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a video push apparatus including:
the account information acquisition unit is configured to acquire account information of an account to be pushed;
the video information acquisition unit is configured to input the account information of the account to be pushed into the trained video push model to obtain the pushed video information of the account to be pushed; the trained video push model is obtained according to the above training method of the video push model;
a video information pushing unit configured to perform pushing of the pushed video information to the account to be pushed.
In an exemplary embodiment, the video information pushing unit is further configured to perform arranging the pushed video information according to an order in which the trained video pushing model outputs the pushed video information, so as to obtain arranged pushed video information; and pushing the arranged pushed video information to the account to be pushed.
According to a fifth aspect of embodiments of the present disclosure, there is provided a server including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement a method of training a video push model as described in any embodiment of the first aspect and a method of video push as described in any embodiment of the second aspect.
According to a sixth aspect of embodiments of the present disclosure, there is provided a storage medium comprising: the instructions in the storage medium, when executed by the processor of the server, enable the server to perform the method of training a video push model as described in any of the embodiments of the first aspect and the method of video pushing as described in any of the embodiments of the second aspect.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product, the program product comprising a computer program, the computer program being stored in a readable storage medium, from which the at least one processor of the apparatus reads and executes the computer program, so that the apparatus performs the training method of the video push model described in any of the embodiments of the first aspect and the video push method described in any of the embodiments of the second aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
The account information of a sample account and the actual operation information of the sample account on pushed video information are acquired, and the account information and the video information are input into a video operation prediction model to obtain prediction operation information of the sample account on the video information; the video operation prediction model is then trained according to the prediction operation information and the actual operation information; finally, according to the trained video operation prediction model, prediction operation information of a plurality of sample accounts on target video information is obtained and used as sample data for training the video push model to be trained. In this way, the video push model is trained on the prediction operation information, for the target video information, output by the trained video operation prediction model: the trained video operation prediction model serves as a training sample simulator that can rapidly generate training sample data for the video push model, so that a large amount of online video operation sample data does not need to be acquired. This simplifies the training process of the video push model and improves its training efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a diagram illustrating an application environment of a training method of a video push model according to an exemplary embodiment.
FIG. 2 is a flowchart illustrating a method of training a video push model, according to an example embodiment.
Fig. 3 is a flowchart illustrating steps for obtaining prediction operation information of a sample account for video information according to an exemplary embodiment.
FIG. 4 is a diagram illustrating training of a video operation prediction model according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating the training steps of a video push model according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating a video push method according to an exemplary embodiment.
FIG. 7 is a block diagram illustrating a training apparatus for a video push model in accordance with an exemplary embodiment.
Fig. 8 is a block diagram illustrating a video push device according to an example embodiment.
Fig. 9 is an internal block diagram of a server according to an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the disclosure, as detailed in the appended claims.
The training method of the video push model provided by the present disclosure can be applied to the application environment shown in fig. 1. Referring to fig. 1, the application environment diagram includes a server 110, and the server 110 may be implemented by an independent server or a server cluster composed of a plurality of servers. In fig. 1, the server 110 is taken as an independent server for illustration, and referring to fig. 1, the server 110 obtains account information of a sample account and actual operation information of the sample account on pushed video information; inputting the account information and the video information into a video operation prediction model to obtain prediction operation information of the sample account on the video information; training a video operation prediction model according to the prediction operation information and the actual operation information; according to the trained video operation prediction model, obtaining prediction operation information of a plurality of sample accounts on target video information, and using the prediction operation information as training sample data of a video push model to be trained; and training the video push model to be trained according to the training sample data to obtain a trained video push model, and outputting video information corresponding to the account to be pushed through the trained video push model.
Fig. 2 is a flowchart illustrating a training method of a video push model according to an exemplary embodiment, where as shown in fig. 2, the training method of the video push model is used in the server 110 shown in fig. 1, and includes the following steps:
in step S210, account information of the sample account and actual operation information of the sample account on the pushed video information are acquired.
The account refers to a registered account of a video application in a terminal, such as a registered account of a short video application or a registered account of a video browsing application. The sample account is an authorized account to be processed and analyzed. The account information refers to information identifying the account, such as the age of the account, the gender of the account, the city where the account is located, the type of terminal used by the account, the network connection mode of the terminal used by the account, and the video operation behavior information of the account; the video operation behavior information of the account may be clicking, praising (liking), attention (following), long viewing (in particular, whether the video is played to completion), and the like.
The pushed video information may be short video information, micro-movie information, drama information, and the like, and has corresponding video characteristics, such as a video category, a score category in a video, video operation behavior information, and the like.
The actual operation information of the sample account on the pushed video information refers to the operations, such as clicking, praising, attention, and long viewing, that the sample account performs on the pushed video information. It may also refer to the actual operation information of the sample account on the pushed video information at each preset time: specifically, which operation the sample account performs on a certain video at the first preset time, which operation at the second preset time, and so on through the last preset time. For example, among the video information pushed to the sample account, the sample account clicks video A at the first preset time, praises video B at the second preset time, and pays attention to video N at the last preset time.
Specifically, the server acquires account information corresponding to an authorized account on the network based on a big data technology, and the account information is used as account information of the sample account; acquiring a video operation log of a sample account, and sampling video information pushed to the sample account and actual operation information of the sample account on the pushed video information from the video operation log of the sample account; therefore, the video operation prediction model can be trained subsequently according to the account information of the sample account and the actual operation information of the sample account on the pushed video information, so that the trained video operation prediction model can be obtained.
In step S220, the account information and the video information are input into the video operation prediction model, so as to obtain the prediction operation information of the sample account on the video information.
In step S230, a video operation prediction model is trained based on the prediction operation information and the actual operation information.
The video operation prediction model is a supervised learning network used for predicting the operation information of an account on video information. It mainly comprises two parts: the first part is an account state coding network, which encodes the state of the sample account to obtain an account state code; the second part is an operation prediction network, which calculates the prediction operation information of the sample account on the video information in the current account state, such as the click probability, like probability, attention probability, and long viewing probability.
Specifically, the server inputs the account information of the sample account and the video information pushed to the sample account into a video operation prediction model to obtain the prediction operation information of the sample account on the video information; obtaining a difference value between the prediction operation information of the sample account on the video information and the corresponding actual operation information, and determining a loss value of a video operation prediction model according to the difference value; for example, the difference value is used as a loss value of the video operation prediction model, or a loss value of the video operation prediction model is obtained through calculation by combining a cross entropy loss function according to the difference value; secondly, the server reversely trains the video operation prediction model according to the loss value until the training times of the video operation prediction model reach preset training times or until the network parameters of the video operation prediction model reach convergence; and if the training times of the video operation prediction model reach the preset training times or the network parameters of the video operation prediction model reach convergence, taking the current video operation prediction model as the trained video operation prediction model. Therefore, the method is beneficial to obtaining the prediction operation information of a plurality of sample accounts on the target video information through the trained video operation prediction model subsequently, and the online collection of a large amount of video operation sample data is not needed, so that the training process of the video push model is simplified, and the training efficiency of the video push model is improved.
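The train-until-convergence loop described above can be sketched as follows. This is a minimal illustration only: a single logistic unit stands in for the real video operation prediction model, and all feature dimensions, learning rate, and thresholds are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the video operation prediction model: one logistic unit mapping
# spliced (account, video) features to a click probability. Shapes are illustrative.
X = rng.normal(size=(64, 5))                          # spliced feature codes
true_w = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
y = (X @ true_w > 0).astype(float)                    # actual operation info (0/1)

w = np.zeros(5)
max_epochs, lr = 500, 0.5                             # preset training times
for epoch in range(max_epochs):
    p = 1.0 / (1.0 + np.exp(-X @ w))                  # prediction operation info
    grad = X.T @ (p - y) / len(y)                     # gradient of cross-entropy loss
    w -= lr * grad                                    # reverse (backward) training step
    if np.linalg.norm(grad) < 1e-3:                   # network parameters converged
        break

p_final = 1.0 / (1.0 + np.exp(-X @ w))
accuracy = float(np.mean((p_final > 0.5) == y))
```

The loop stops on either condition named in the text: the preset number of training iterations is exhausted, or the parameters stop changing (here approximated by a small gradient norm).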
In step S240, according to the trained video operation prediction model, obtaining prediction operation information of the target video information by multiple sample accounts, which is used as training sample data of the video push model to be trained. The video push model to be trained is a model capable of pushing video information to an account, the network structure of the model is basically consistent with that of a video operation prediction model, and the model also has an account state coding network and an operation prediction network.
Specifically, the server collects video information on the network or video information in the candidate set as target video information; inputting the account information of the sample account and the target video information into the trained video operation prediction model to obtain the prediction operation information of the sample account on the target video information; by analogy, the prediction operation information of the plurality of sample accounts on the target video information can be obtained, and the prediction operation information of the plurality of sample accounts on the target video information is used as training sample data of the video push model to be trained. Therefore, the method is beneficial to rapidly obtaining the prediction operation information of the target video information by the multiple sample accounts through the trained video operation prediction model, and further improves the training efficiency of the video push model.
Further, the server can also input the 0th video information (an all-zero vector) and the account information of the sample account into the account state coding network in the video push model to be trained to obtain the account state code at the first moment; input the account state code at the first moment and the video information in the candidate set into the operation prediction network in the video push model to be trained to obtain the probability that each piece of video information in the candidate set is selected by the sample account at the first moment; and take the video information with the maximum probability as the first target video information pushed to the sample account. The server then inputs the 0th video information, the first target video information, and the account information into the account state coding network to obtain the account state code at the second moment; inputs the account state code at the second moment and the video information in the candidate set into the operation prediction network to obtain the probability that each piece of video information in the candidate set is selected by the sample account at the second moment; and takes the video information with the maximum probability as the second target video information pushed to the sample account. By analogy, the target video information re-pushed to the sample account by the video push model to be trained can be obtained, and with reference to this method, the target video information re-pushed to a plurality of sample accounts can be obtained. Next, the account information of the sample account and the target video information re-pushed to the sample account are input into the trained video operation prediction model to obtain the prediction operation information of the sample account on the target video information; by analogy, the prediction operation information of the plurality of sample accounts on the re-pushed target video information can be obtained and used as training sample data of the video push model to be trained.
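The greedy, step-by-step selection described above can be sketched as follows. The two helper functions are hypothetical stand-ins for the account state coding network and the operation prediction network; their arithmetic is invented purely to make the control flow concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

def encode_state(prev_state, video_vec, account_vec):
    # Hypothetical stand-in for the account state coding network (LSTM + Dense).
    return np.tanh(prev_state + 0.5 * video_vec + 0.1 * account_vec)

def selection_prob(state, video_vec):
    # Hypothetical stand-in for the operation prediction network.
    return 1.0 / (1.0 + np.exp(-float(state @ video_vec)))

account = rng.normal(size=dim)
candidates = {name: rng.normal(size=dim) for name in ("A", "B", "C")}

# Step 0: the 0th video information is an all-zero vector.
state = encode_state(np.zeros(dim), np.zeros(dim), account)
pushed = []
for _ in range(2):
    probs = {n: selection_prob(state, v) for n, v in candidates.items()}
    best = max(probs, key=probs.get)          # video with the maximum probability
    pushed.append(best)
    state = encode_state(state, candidates[best], account)  # feed selection back
```

Each selected video is fed back into the state encoder, so the probability at step t depends on everything pushed at steps 0 through t-1.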
In step S250, the video push model to be trained is trained according to the training sample data.
Specifically, the server repeatedly trains the video push model to be trained according to training sample data to obtain a predicted loss value of the video push model to be trained on the training sample data; reversely training a video push model to be trained according to the predicted loss value until the video push model meets a convergence condition; and if the video pushing model meets the convergence condition, for example, the training times of the video pushing model reach the preset training times or the network parameters of the video pushing model reach convergence, taking the video pushing model as the trained video pushing model.
Further, after the trained video push model is obtained, the server may determine, in the manner described above for determining the target video information to be re-pushed to the sample account, the first video information to be pushed to the account to be pushed, then the second video information to be pushed to the account to be pushed, and so on, until the final video information to be pushed to the account to be pushed is determined. In this way, the influence of the first K-1 pieces of video information is comprehensively considered when determining the Kth piece of video information, which improves the accuracy of the determined video information and further improves the video push accuracy.
In the training method of the video push model, the prediction operation information of the sample account on the video information is obtained by acquiring the account information of the sample account and the actual operation information of the sample account on the pushed video information and inputting the account information and the video information into the video operation prediction model; then, training a video operation prediction model according to the prediction operation information and the actual operation information; finally, according to the trained video operation prediction model, obtaining the prediction operation information of a plurality of sample accounts on the target video information, using the prediction operation information as sample data of the video push model to be trained, and further training the video push model to be trained; the aim of training the video push model according to the prediction operation information of the target video information by a plurality of sample accounts output by the trained video operation prediction model is fulfilled; the trained video operation prediction model is used as a training sample simulator, training sample data of the video push model can be generated quickly, and a large amount of online video operation sample data does not need to be acquired, so that the training process of the video push model is simplified, and the training efficiency of the video push model is improved.
In an exemplary embodiment, as shown in fig. 3, in step S220, the account information and the video information are input into the video operation prediction model to obtain the prediction operation information of the sample account on the video information, which may be specifically implemented by the following steps:
in step S310, first video information in which the sample account has been operated sequentially before a preset time and second video information in which the sample account has been operated at the preset time are extracted from the video information.
The preset time refers to a time corresponding to video information operated by the sample account, for example, 16:15. The first video information is video information operated by the sample account before the preset time, and may be one or more pieces of video information; for example, the sample account clicks video information A, video information B, and video information C in sequence before the preset time. The second video information is video information operated by the sample account at the preset time; for example, the sample account clicks video information D at the preset time. It should be noted that the first video information carries the actual operation information, such as clicking and praising, of the sample account on the corresponding video information at each time before the preset time, and the second video information carries the actual operation information, such as clicking and praising, of the sample account on the second video information at the preset time.
Specifically, the server obtains operation sequence information of the sample account on the pushed video information according to actual operation information of the sample account on the pushed video information; according to the operation sequence information, video information of sample accounts which are sequentially operated before a preset moment is extracted from the pushed video information and is used as first video information; and extracting the video information of which the sample account is operated at the preset moment from the pushed video information as second video information.
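The extraction step can be sketched directly from an operation sequence. The log entries below are invented examples; only the before-versus-at-time split mirrors the text.

```python
# Operation sequence of a sample account on pushed video information, recovered
# from the video operation log: (time, video_id, operation). Entries are assumed.
operations = [
    (1, "A", "click"),
    (2, "B", "praise"),
    (3, "C", "click"),
    (4, "D", "follow"),
]
preset_time = 4

# First video information: operated in sequence before the preset time.
first_video_info = [op for op in operations if op[0] < preset_time]
# Second video information: operated at the preset time.
second_video_info = [op for op in operations if op[0] == preset_time]
```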
In step S320, the account information feature code of the account information and the first video information feature code of the first video information are obtained.
The account information feature coding refers to a low-dimensional feature vector which is subjected to compression coding and used for representing low-level semantics of account information, and the first video information feature coding is also a low-dimensional feature vector which is subjected to compression coding and used for representing low-level semantics of the first video information.
It should be noted that the first video information feature codes of the first video information include video information feature codes corresponding to video information that is operated by a sample account in sequence before a preset time; for example, the sample account sequentially clicks the video information a, the video information B, and the video information C before the preset time, and the first video information feature coding includes video information feature coding a, video information feature coding B, and video information feature coding C.
Specifically, the server acquires a preset feature coding instruction, respectively extracts feature information in the account information and feature information in the first video information according to the preset feature coding instruction, and codes the feature information in the account information and the feature information in the first video information to obtain an account information feature code of the account information and a first video information feature code of the first video information.
Further, the server can input the account information and the first video information into a pre-trained feature coding model, and output the account information feature coding of the account information and the first video information feature coding of the first video information through the feature coding model; the pre-trained feature coding model is a neural network model, such as a convolutional neural network model, capable of performing feature extraction and feature coding on account information and video information to obtain account information feature coding of the account information and video information feature coding of the video information.
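One common way to realize such a feature coding step is with embedding tables, where each categorical field is compressed into a low-dimensional vector. The field names, vocabulary sizes, and lookups below are assumptions for illustration, not the patent's actual coding model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embedding tables standing in for the feature coding model.
dim = 4
vocab_sizes = {"age_bucket": 8, "city": 100, "video_category": 50}
tables = {k: rng.normal(size=(n, dim)) for k, n in vocab_sizes.items()}

account_fields = {"age_bucket": 3, "city": 42}   # assumed field values
video_fields = {"video_category": 7}

# Concatenating per-field embeddings yields the low-dimensional feature codes.
account_feature_code = np.concatenate([tables[k][i] for k, i in account_fields.items()])
video_feature_code = np.concatenate([tables[k][i] for k, i in video_fields.items()])
```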
In step S330, the account information feature code and the first video information feature code are input to an account status coding network in the video operation prediction model, so as to obtain an account status code of the sample account at a preset time.
The account status code also refers to a low-dimensional feature vector of low-level semantics for representing the account status after compression coding. And the account state code of the sample account at the preset moment is used for representing the operation state of the sample account on the video information at the preset moment.
Specifically, the server inputs the account information characteristic code and the first video information characteristic code into an account state coding network in a video operation prediction model, and the first video information characteristic code is coded through the account state coding network to obtain a target characteristic code corresponding to the first video information characteristic code; splicing the account information feature code and the target feature code to obtain a spliced feature code; and carrying out full connection processing on the spliced feature codes to obtain the fully connected feature codes which are used as account state codes of the sample accounts at preset moments.
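The encode-splice-fully-connect flow above can be sketched as follows. The weights are random stand-ins for learned parameters, and the mean-plus-projection encoder is a simplification of the LSTM described later; dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d_vid, d_acc, d_state = 3, 2, 4

# Hypothetical weights; real networks would be learned, and the video encoder
# would be an LSTM rather than this single mean+projection step.
W_enc = rng.normal(size=(d_vid, d_state))
W_fc = rng.normal(size=(d_state + d_acc, d_state))

account_info_code = rng.normal(size=d_acc)
first_video_codes = rng.normal(size=(3, d_vid))   # A, B, C operated in sequence

# Encode the first video information feature codes into a target feature code.
target_code = np.tanh(first_video_codes.mean(axis=0) @ W_enc)
# Splice (concatenate) with the account information feature code, then fully connect.
spliced = np.concatenate([account_info_code, target_code])
account_state_code = np.tanh(spliced @ W_fc)      # account state code at preset time
```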
In step S340, the second video information feature code and the account status code of the second video information are input to the operation prediction network in the video operation prediction model, so as to obtain the predicted operation information of the sample account on the second video information at the preset time.
And the second video information feature coding is also a low-dimensional feature vector which is subjected to compression coding and used for representing low-level semantics of the second video information. It should be noted that the determination manner of the second video information characteristic code is consistent with the determination manner of the first video information characteristic code, and details are not repeated herein.
The prediction operation information of the sample account on the second video information at the preset time refers to an operation probability of the sample account on the second video information at the preset time, such as a click probability, a like probability, an attention probability, and the like of the sample account on the video information a at the preset time.
Specifically, the server acquires a second video information feature code of the second video information, inputs the second video information feature code of the second video information and an account state code of a sample account at a preset moment into an operation prediction network in a video operation prediction model, and performs splicing processing on the account state code and the second video information feature code through the operation prediction network to obtain a spliced feature code; and performing full-connection processing on the spliced feature codes to obtain the prediction operation probability of the sample account on the second video information at the preset time, and using the prediction operation probability as the prediction operation information of the sample account on the second video information at the preset time.
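The splice-and-fully-connect step of the operation prediction network can be sketched as follows, with random weights standing in for learned parameters and assumed dimensions.

```python
import numpy as np

rng = np.random.default_rng(3)
d_state, d_vid = 4, 3

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

account_state_code = rng.normal(size=d_state)     # state at the preset time
second_video_code = rng.normal(size=d_vid)

# Hypothetical fully connected layer producing four operation probabilities
# (click, praise, attention, long viewing); weights would be learned in practice.
W = rng.normal(size=(d_state + d_vid, 4))
spliced = np.concatenate([account_state_code, second_video_code])
operation_probs = sigmoid(spliced @ W)            # prediction operation information
```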
It should be noted that, with reference to this method, prediction operation information of the video information operated by the sample account at a plurality of preset times at the preset time can be obtained; for example, the sample account clicks the video information a at a first preset time, clicks the video information B at a second preset time, and clicks the video information C at a third preset time; then, referring to the above method, the prediction operation information of the sample account on the a video information at the first preset time, the prediction operation information on the B video information at the second preset time, and the prediction operation information on the C video information at the third preset time may be obtained.
Further, the server can obtain a loss value of the video operation prediction model according to actual operation information and prediction operation information of the sample account on the second video information at a preset time; reversely training a video operation prediction model according to the loss value until the video operation prediction model meets a preset convergence condition; and if the video operation prediction model meets the preset convergence condition, taking the current video operation prediction model as the trained video operation prediction model. For example, the server obtains a prediction loss value of the video operation prediction model at a preset moment based on a cross entropy loss function and by combining actual operation information and prediction operation information of the sample account on the second video information at the preset moment; adding the prediction loss values of the video operation prediction model at a plurality of preset moments to obtain the loss value of the video operation prediction model; and determining a network parameter updating gradient of the video operation prediction model according to the loss value, and updating the network parameter of the video operation prediction model according to the network parameter updating gradient until the video operation prediction model meets a preset convergence condition, for example, until the training times of the video operation prediction model reach the preset training times or the network parameter of the video operation prediction model reaches convergence.
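The per-time cross-entropy losses and their sum can be computed as follows; the three (actual, predicted) pairs are invented values for illustration.

```python
import math

def ce(y, p):
    # Binary cross-entropy between actual operation y (0/1) and predicted probability p.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# (actual operation, predicted probability) at each preset time; values assumed.
per_time = [(1, 0.7), (1, 0.9), (0, 0.2)]

# Loss of the video operation prediction model: sum of per-time prediction losses.
total_loss = sum(ce(y, p) for y, p in per_time)
```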
According to the technical scheme provided by the embodiment of the disclosure, the video operation prediction model is repeatedly trained, so that the prediction operation information output by the video operation prediction model is close to the real operation information of the sample account, the accuracy of the prediction operation information output by the video operation prediction model is further improved, the video operation prediction model which is trained subsequently can quickly generate training sample data of the video push model, and the video push model can be quickly trained.
In an exemplary embodiment, in step S330, the account information feature code and the first video information feature code are input to an account status coding network in the video operation prediction model, so as to obtain an account status code of the sample account at a preset time, which specifically includes: inputting the first video information characteristic code into a first network in the account state coding network to obtain a video state code at a preset moment; and inputting the account information characteristic code and the video state code into a second network in the account state coding network to obtain the account state code of the sample account at the preset moment.
The video state coding also refers to a low-dimensional feature vector used for representing low-level semantics of a video state after compression coding.
Referring to fig. 4, the first network in the account status coding network is an LSTM (Long Short-Term Memory) network for determining the video status code at each preset time. The video status code at each preset time is determined by the video information selected or operated by the sample account at the preset times before it; for example, the video status code at the Kth time is determined by the video information selected by the sample account at the 0th through (K-1)th times.
The second network in the account status coding network is a Dense network, that is, a fully connected network, configured to determine the account status code of the sample account at each preset time, shown as S1, S2, and so on in fig. 4.
For example, referring to fig. 4, assume that the preset time is the third preset time, preceded by the 0th, first, and second preset times; the first video information feature code then comprises the video information feature codes of the video information selected at the 0th, first, and second preset times. The server inputs the video information feature code (an all-zero vector) of the video information selected by the sample account at the 0th preset time (the 0th video information) into the first network (the LSTM network) in the account state coding network, and encodes it through the first network to obtain the video state code at the first preset time. It then inputs the video state code at the first preset time and the video information feature code of the video information selected by the sample account at the first preset time (the 1st video information) into the first network, and encodes them through the first network to obtain the video state code at the second preset time; likewise, it inputs the video state code at the second preset time and the video information feature code of the video information selected by the sample account at the second preset time (the 2nd video information) into the first network, and encodes them to obtain the video state code at the preset time. Finally, the user information feature code and the video state code at the preset time are spliced to obtain a spliced feature code, the spliced feature code is input into the second network in the account state coding network, and full connection processing is performed on it through the second network to obtain the account state code of the sample account at the preset time (such as S3).
It should be noted that, in the process of obtaining the account status code of the sample account at the preset time, the user information feature code and the video status code at the first preset time are spliced to obtain a spliced first feature code, the spliced first feature code is input to a second network in the account status code network, and the spliced first feature code is fully connected through the second network to obtain the account status code of the sample account at the first preset time (for example, S1); and splicing the user information characteristic code and the video state code at the second preset moment to obtain a spliced second characteristic code, inputting the spliced second characteristic code into a second network, and performing full connection processing on the spliced second characteristic code through the second network to obtain an account state code of the sample account at the second preset moment (such as S2), and so on to obtain account state codes of the sample account at each preset moment.
According to the technical scheme provided by the embodiment of the disclosure, the account state code of the sample account at the preset time is determined, so that the prediction operation information of the sample account on the second video information at the preset time can be determined according to the second video information feature code and the account state code of the sample account at the preset time.
In an exemplary embodiment, in step S340, inputting the second video information feature coding and the account status coding into an operation prediction network in a video operation prediction model, to obtain the prediction operation information of the sample account on the second video information at a preset time, specifically including: inputting the second video information characteristic code and the account state code into an operation prediction network in a video operation prediction model to obtain a plurality of operation behavior probabilities of the sample account on the second video information; and according to the preset weight corresponding to the operation behavior probabilities, weighting the operation behavior probabilities to obtain a target operation probability of the sample account on the second video information, and correspondingly using the target operation probability as the prediction operation information of the sample account on the second video information at the preset moment.
The plurality of operation behavior probabilities of the second video information refer to click probability, praise probability, attention probability, long viewing probability and the like; the preset weight corresponding to the operation behavior probability is preset, and may be adjusted according to an actual scene, which is not limited herein.
Specifically, the server splices a second video information feature code of second video information operated by the sample account at a preset time and an account state code of the sample account at the preset time to obtain a spliced feature code; inputting the spliced feature codes into an operation prediction network in the video operation prediction model, and performing full connection processing on the spliced feature codes through the operation prediction network to obtain the click probability, the praise probability, the attention probability and the long viewing probability of a sample account on second video information at a preset moment; respectively obtaining preset weights corresponding to the click probability, the praise probability, the attention probability and the long viewing probability, weighting the click probability, the praise probability, the attention probability and the long viewing probability according to the preset weights corresponding to the click probability, the praise probability, the attention probability and the long viewing probability to obtain the target operation probability of the sample account on the second video information at the preset moment, and correspondingly using the target operation probability as the prediction operation information of the sample account on the second video information at the preset moment.
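The weighting step reduces to a weighted sum over the four behavior probabilities. The probabilities and preset weights below are invented example values.

```python
# Predicted operation behavior probabilities of the sample account on the second
# video information at the preset time, and preset weights; values assumed.
probs = {"click": 0.8, "praise": 0.3, "attention": 0.1, "longview": 0.6}
weights = {"click": 0.4, "praise": 0.2, "attention": 0.1, "longview": 0.3}

# Weighted sum gives the target operation probability, used as the prediction
# operation information at the preset time.
target_operation_prob = sum(weights[k] * probs[k] for k in probs)
```

Here the result is 0.4*0.8 + 0.2*0.3 + 0.1*0.1 + 0.3*0.6 = 0.57; in practice the weights would be tuned per scene, as the text notes.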
According to the technical scheme provided by the embodiment of the disclosure, the prediction operation information of the sample account on the second video information at the preset time is obtained, so that the loss value of the video operation prediction model can be obtained according to the actual operation information and the prediction operation information of the sample account on the second video information at the preset time, and the video operation prediction model can be trained repeatedly according to the loss value, so that the trained video operation prediction model can be obtained.
In an exemplary embodiment, referring to fig. 4, the video operation prediction model may be trained by:
(1) The server collects data on the network to obtain sample data (u, V_i, Y_i^click, Y_i^like, Y_i^follow, Y_i^longview), 1 ≤ i ≤ N, where u represents the account features corresponding to the account information of the sample account and V_i represents the video features of the i-th video information. Y_i^click indicates the click situation of the sample account on the i-th video information: if the sample account has clicked the i-th video information, Y_i^click = 1; if the sample account has not clicked the i-th video information, Y_i^click = 0. Analogously, Y_i^like indicates the praise situation of the sample account for the i-th video information, Y_i^follow indicates the attention situation of the sample account for the i-th video information, and Y_i^longview indicates the long-viewing situation of the sample account for the i-th video information.
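A sample record of this form can be held in a simple structure; the field names below are an illustrative rendering of the notation above, not names used by the disclosure:

```python
from collections import namedtuple

# One training sample: user features u, video features V_i, and the four
# binary behavior labels Y_i^click, Y_i^like, Y_i^follow, Y_i^longview.
Sample = namedtuple("Sample", ["u", "v", "click", "like", "follow", "longview"])

# Illustrative values only: a user who clicked and long-viewed video i.
s = Sample(u=[0.1, 0.9], v=[0.3, 0.7, 0.2], click=1, like=0, follow=0, longview=1)
```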
(2) When predicting the result of selecting video information at step t, the video features of the video information selected at steps 0 to t−1 are input into an LSTM network to obtain the video state code at step t, where the 0-th video feature is an all-zero vector. The video state code at step t and the user features u are then spliced together and passed through a fully-connected network (e.g., a Dense layer) to obtain the user state code S_t at step t. Finally, the user state code S_t is spliced with each video feature in the candidate set, and through a fully-connected network the click probability Pctr_θ, praise probability Pltr_θ, attention probability Pwtr_θ, and long-viewing probability Plvtr_θ of user u for each video V_i at the current moment are obtained.
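The per-step flow above can be sketched as follows, assuming a plain tanh recurrent cell in place of the LSTM and a single logistic head in place of the four probability heads; all dimensions and weights are illustrative:

```python
import math
import random

random.seed(0)

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

D = 4  # feature dimension (illustrative)
# Random stand-in weights; a real model would learn these.
W_rec = [[random.uniform(-0.5, 0.5) for _ in range(2 * D)] for _ in range(D)]
W_state = [[random.uniform(-0.5, 0.5) for _ in range(2 * D)] for _ in range(D)]
w_ctr = [random.uniform(-0.5, 0.5) for _ in range(2 * D)]  # one head shown

def video_state(selected):
    """Roll the already-selected videos into a state; step 0 is all zeros."""
    h = [0.0] * D
    for v in selected:
        h = [math.tanh(dot(row, h + v)) for row in W_rec]  # recurrent step
    return h

def user_state(u, selected):
    """Splice video state with user features u, then a dense layer -> S_t."""
    h = video_state(selected)
    return [math.tanh(dot(row, h + u)) for row in W_state]

def click_prob(u, selected, candidate):
    """Splice S_t with one candidate's features, score with a logistic head."""
    s_t = user_state(u, selected)
    z = dot(w_ctr, s_t + candidate)
    return 1.0 / (1.0 + math.exp(-z))

u = [0.2, -0.1, 0.4, 0.0]
cand = [0.5, 0.1, -0.3, 0.2]
p = click_prob(u, [[0.1] * D], cand)  # probability for one candidate at step 1
```

In the disclosure, four such heads (click, praise, attention, long-viewing) share the same spliced input.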
(3) Defining an optimization objective of a video operation prediction model:
min_θ − Σ_{i=1}^{N} [ Y_i^click·log Pctr_θ(u, V_i) + (1 − Y_i^click)·log(1 − Pctr_θ(u, V_i)) + Y_i^like·log Pltr_θ(u, V_i) + (1 − Y_i^like)·log(1 − Pltr_θ(u, V_i)) + Y_i^follow·log Pwtr_θ(u, V_i) + (1 − Y_i^follow)·log(1 − Pwtr_θ(u, V_i)) + Y_i^longview·log Plvtr_θ(u, V_i) + (1 − Y_i^longview)·log(1 − Plvtr_θ(u, V_i)) ]
(4) The network parameter θ of the video operation prediction model is updated on the above optimization objective using a stochastic gradient descent algorithm until the objective reaches its minimum, thereby obtaining the trained video operation prediction model.
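Step (4)'s stochastic-gradient minimization can be illustrated on a single logistic head with made-up data; the learning rate, epoch count, and toy dataset are arbitrary choices for the sketch:

```python
import math
import random

random.seed(1)

def predict(theta, x):
    """Logistic prediction for one behavior head."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: the label is 1 exactly when the first feature is positive.
data = [([x1, random.uniform(-1, 1)], 1 if x1 > 0 else 0)
        for x1 in (random.uniform(-1, 1) for _ in range(200))]

def nll(theta):
    """Average negative log-likelihood (the cross-entropy objective)."""
    eps = 1e-12
    return -sum(y * math.log(predict(theta, x) + eps)
                + (1 - y) * math.log(1 - predict(theta, x) + eps)
                for x, y in data) / len(data)

theta = [0.0, 0.0]
lr = 0.5
before = nll(theta)
for _ in range(20):                    # epochs of stochastic gradient descent
    random.shuffle(data)
    for x, y in data:
        p = predict(theta, x)          # gradient of the NLL is (p - y) * x
        theta = [t - lr * (p - y) * xi for t, xi in zip(theta, x)]
after = nll(theta)
```

The objective after training is lower than at the all-zero initialization, mirroring the "update until the objective reaches its minimum" loop of step (4).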
In an exemplary embodiment, as shown in fig. 5, in step S250, training a video push model to be trained according to training sample data may specifically be implemented by the following steps:
in step S510, a target video information feature code corresponding to the target video information is obtained.
The specific implementation of obtaining the target video information feature code corresponding to the target video information refers to the specific implementation of obtaining the first video information feature code corresponding to the first video information, and is not described herein again. It should be noted that the training sample data refers to prediction operation information of a sample account on target video information; the prediction operation information of the sample account on the target video information specifically comprises the prediction operation information of the sample account on the target video information at each preset moment.
In step S520, the account information feature codes and the target video information feature codes are input into an account status coding network in the video push model to be trained, so as to obtain target account status codes of the sample accounts at each preset time.
The target account state code also refers to a low-dimensional feature vector which is subjected to compression coding and used for representing the low-level semantics of the account state. And the target account state code of the sample account at each preset moment is used for representing the operation state of the sample account on the target video information at each preset moment.
The specific implementation of step S520 refers to the specific implementation of step S330, and is not described herein again.
In step S530, the target video information feature code and the target account status code are input to the operation prediction network in the video push model to be trained, so as to obtain the target prediction operation information of the sample account on the target video information at each preset time.
The specific implementation of step S530 refers to the specific implementation of step S340, which is not described herein again.
In step S540, the target prediction operation information is input into a preset video push evaluation model, and an operation feedback value of the sample account on the target video information at each preset time is obtained.
The preset video push evaluation model is an evaluation model capable of outputting expected rewards obtained by a current video push strategy in a current account state, and the operation feedback values of the sample accounts on the target video information at each preset moment are expected rewards of the sample accounts at each preset moment.
In step S550, the video push model to be trained and the preset video push evaluation model are repeatedly trained according to the target account status code, the prediction operation information of the sample account on the target video information at each preset time, the target prediction operation information, and the operation feedback value until both the video push model to be trained and the preset video push evaluation model satisfy the convergence condition.
Specifically, based on the target account state code of the sample account at each preset time and the prediction operation information, target prediction operation information, and operation feedback value of the sample account on the target video information at each preset time, the server calculates, using a loss function, a loss value of the video push model to be trained and a loss value of the preset video push evaluation model; updates the network parameters of the video push model to be trained according to the loss value of the video push model to be trained; and updates the network parameters of the preset video push evaluation model according to the loss value of the preset video push evaluation model. This process is repeated until the network parameters of the video push model converge and the network parameters of the preset video push evaluation model converge, at which point the training ends.
According to the technical scheme provided by the embodiment of the disclosure, the video push model to be trained is repeatedly trained, so that the expected reward obtained by the video information output by the video push model can be maximized, and the push accuracy of the video information is further improved.
In an exemplary embodiment, the present disclosure may further train the video push model by using an Actor-Critic algorithm, which specifically includes the following contents:
(1) Based on the on-line request information (u, V_cand), a video list V′ is re-recommended using the video push model to be trained, where u represents the user information, V_cand represents the features of all candidate videos of this request, and V′ represents the video sequence recommended anew by the video push model. Since the feedback corresponding to the video information in the re-recommended video list V′ is not available on line, the trained video operation prediction model is called upon for prediction. Because the structure of the video push model is similar to the network structure of the video operation prediction model, the process of generating the recommended video list V′ likewise first determines the 1st video information, then determines the 2nd video information, and so on; therefore, in the prediction stage of the video push model, the prediction result of the video operation prediction model on the video information at each step can be added to the model features of the video push model, thereby enhancing the prediction capability of the model's reinforcement learning. After the prediction is finished, the collected reinforcement learning standard data has the format (S_t, V′_t, r_t, S_{t+1}, T), where S_t represents the current state, determined by the user information and the selected video information; V′_t represents the video features corresponding to the video information determined by the current video push model; r_t is the feedback predicted by the video operation prediction model, r_t = a·Pctr_t + b·Pltr_t + c·Plvtr_t + d·Pwtr_t, where a, b, c, and d are manually configured hyper-parameters; and T is the termination condition, here corresponding to whether the video is the last one.
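Assembling one trajectory of (S_t, V′_t, r_t, S_{t+1}, T) tuples with the weighted feedback r_t can be sketched as follows; the predicted probabilities and the hyper-parameters a, b, c, d are made-up values, and states are reduced to opaque labels:

```python
# Illustrative per-step predictions from a trained operation prediction
# model for a re-recommended list of three videos: (Pctr, Pltr, Plvtr, Pwtr).
predicted = [(0.5, 0.2, 0.4, 0.1),
             (0.3, 0.1, 0.6, 0.2),
             (0.7, 0.3, 0.2, 0.1)]
a, b, c, d = 1.0, 0.5, 0.8, 0.3       # manually configured hyper-parameters

transitions = []
state = "S0"                           # states stand in as opaque labels here
for t, (pctr, pltr, plvtr, pwtr) in enumerate(predicted):
    r_t = a * pctr + b * pltr + c * plvtr + d * pwtr   # weighted feedback
    next_state = f"S{t + 1}"
    terminal = (t == len(predicted) - 1)               # T: is this the last video?
    transitions.append((state, f"V'{t}", r_t, next_state, terminal))
    state = next_state
```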
(2) Updating network parameters of a video push model by using an Actor-Critic algorithm according to the sample data generated in the step (1):
(a) Updating policy network parameters:
θ ← θ + α · (r + γ·V_w(s′) − V_w(s)) · ∇_θ log π_θ(s, a)
(b) Updating the evaluation network parameters:
w ← w + α · (r + γ·V_w(s′) − V_w(s)) · ∇_w V_w(s)
(c) Here α is the learning rate, γ is the discount factor, s′ is the next state, s is the current state, V_w(s) is the expected reward obtainable by the current strategy in the current state, and π_θ(s, a) is the probability of selecting video a in state s under the current strategy.
(3) And (3) repeating the processes from the step (1) to the step (2) for a plurality of times until the network parameters of the video pushing model reach convergence.
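The two update rules in step (2) follow the standard one-step Actor-Critic scheme; a minimal tabular sketch with made-up transitions (not the networks of the disclosure) is:

```python
import math

n_actions = 2
alpha, gamma = 0.1, 0.9                      # learning rate and discount factor
V = {0: 0.0, 1: 0.0, 2: 0.0}                 # critic: expected reward per state
prefs = {0: [0.0, 0.0], 1: [0.0, 0.0], 2: [0.0, 0.0]}  # actor preferences

def policy(s):
    """Softmax over action preferences: pi_theta(s, .)."""
    m = max(prefs[s])
    exps = [math.exp(p - m) for p in prefs[s]]
    z = sum(exps)
    return [e / z for e in exps]

def update(s, a, r, s_next, terminal):
    """One-step TD actor-critic update for a single transition."""
    target = r if terminal else r + gamma * V[s_next]
    delta = target - V[s]                    # TD error
    V[s] += alpha * delta                    # critic (evaluation) update
    pi = policy(s)
    for act in range(n_actions):             # actor update: grad of log-softmax
        grad = (1.0 if act == a else 0.0) - pi[act]
        prefs[s][act] += alpha * delta * grad

for _ in range(200):
    update(0, 1, 1.0, 1, False)              # choosing "video 1" in state 0 pays off
    update(0, 0, 0.0, 1, False)              # choosing "video 0" does not
    update(1, 0, 0.0, 2, True)               # terminal transition

p0 = policy(0)
```

After repeated updates, the policy in state 0 comes to prefer the rewarded action and the critic's value estimate for state 0 turns positive, which is the behavior the two update formulas above are driving toward.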
According to the technical scheme provided by the embodiment of the disclosure, the video push model can be quickly trained, so that the training efficiency of the video push model is improved.
Fig. 6 is a flowchart illustrating a video push method according to an exemplary embodiment, where, as shown in fig. 6, the video push method is used in the server 110 shown in fig. 1, and includes the following steps:
in step S610, account information of the account to be pushed is acquired.
Specifically, the server obtains account information of a current login account of the terminal, and the account information is used as account information of an account to be pushed.
In step S620, inputting the account information of the account to be pushed into the trained video push model to obtain the push video information of the account to be pushed; and obtaining the trained video push model according to the training method of the video push model.
Specifically, the server inputs the video information feature code of the 0-th pushed video information (an all-zero vector) and the account information feature code of the account information of the account to be pushed into the trained video push model; encodes the video information feature code of the 0-th pushed video information through the trained video push model to obtain the video state code at the first moment; splices the account information feature code of the account to be pushed with the video state code at the first moment to obtain a spliced feature code; and performs full-connection processing on the spliced feature code to obtain the account state code at the first moment. The account state code at the first moment is then spliced with the video information feature code of each piece of video information in the candidate set, and full-connection processing is performed on each spliced feature code to obtain the probability that the account to be pushed selects each piece of video information in the candidate set at the first moment; the video information with the maximum probability is taken as the first pushed video information pushed to the account to be pushed. Next, the video state code at the first moment, the video information feature code of the first pushed video information, and the account information feature code of the account to be pushed are input into the trained video push model; the video state code at the first moment and the video information feature code of the first pushed video information are encoded through the trained video push model to obtain the video state code at the second moment; the account information feature code of the account to be pushed is spliced with the video state code at the second moment, and full-connection processing is performed on the spliced feature code to obtain the account state code at the second moment. The account state code at the second moment is spliced with the video information feature code of each piece of video information in the candidate set, and full-connection processing is performed on each spliced feature code to obtain the probability that the account to be pushed selects each piece of video information in the candidate set at the second moment; the video information with the maximum probability is taken as the second pushed video information pushed to the account to be pushed. By analogy, a plurality of pieces of pushed video information pushed to the account to be pushed can be obtained.
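The step-by-step selection loop — score every candidate against the current state, take the one with the maximum probability, fold it back into the state — can be sketched as below; the scoring and state-update rules are drastically simplified stand-ins for the splice-and-fully-connect operations described above:

```python
# Greedy sequential selection: at each step pick the candidate with the
# highest score, then fold the chosen video back into the running state.
def push_list(user, candidates, k):
    state = [0.0] * len(user)          # step-0 video state: all-zero vector
    remaining = list(candidates)
    chosen = []
    for _ in range(min(k, len(remaining))):
        def score(v):
            # Stand-in for splicing state with user features and the
            # candidate's features, then a fully-connected probability head.
            ctx = [s + u for s, u in zip(state, user)]
            return sum(c * vi for c, vi in zip(ctx, v))
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)          # each video is pushed at most once
        # Stand-in state update: running mean of chosen video features.
        state = [(s + vi) / 2.0 for s, vi in zip(state, best)]
    return chosen

user = [1.0, 0.0, 0.5]
cands = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.6, 0.2, 0.7]]
out = push_list(user, cands, 2)         # two pushed videos, in push order
```

Because each chosen video feeds into the state for the next step, the K-th selection depends on the first K−1 selections, which is the property the disclosure emphasizes.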
In step S630, the push video information is pushed to the account to be pushed.
Specifically, the server acquires a terminal identifier corresponding to the account to be pushed, pushes the pushed video information to a terminal corresponding to the terminal identifier according to a preset frequency, and displays the pushed video information through a terminal interface, so that the interest requirement of the account to be pushed, which is currently logged in by the terminal, is met, and accurate pushing of the video information is realized.
According to the video pushing method, through a trained video pushing model, first pushed video information pushed to an account to be pushed is determined, then second pushed video information pushed to the account to be pushed is determined, and by analogy, a plurality of pieces of pushed video information pushed to the account to be pushed can be determined; therefore, the influence of the first K-1 pieces of pushed video information is comprehensively considered when the Kth piece of pushed video information is determined, the accuracy of the determined pushed video information is improved, and the video pushing accuracy is further improved.
In an exemplary embodiment, in step S630, pushing the push video information to the account to be pushed specifically includes: arranging the push video information according to the sequence of outputting the push video information by the trained video push model to obtain the arranged push video information; and pushing the arranged pushed video information to an account to be pushed.
For example, the server outputs the push video information a, then outputs the push video information B, and finally outputs the push video information C, and then pushes the push video information a, the push video information B, and the push video information C to the account to be pushed according to the arrangement order of the push video information a, the push video information B, and the push video information C.
According to the technical scheme provided by the embodiment of the disclosure, the arranged pushed video information is pushed to the account to be pushed, so that the relation between the pushed video information and the video information can be considered comprehensively, the accurate pushing of the video information is realized, and the accuracy of the video pushing is further improved; meanwhile, the click rate of the video information is improved.
It should be understood that although the various steps in the flowcharts of fig. 2-3 and 5-6 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least a part of the steps in fig. 2-3 and 5-6 may include a plurality of sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
FIG. 7 is a block diagram illustrating a training apparatus for a video push model in accordance with an exemplary embodiment. Referring to fig. 7, the apparatus includes an information acquisition unit 710, an information prediction unit 720, a prediction model training unit 730, a sample data acquisition unit 740, and a push model training unit 750.
And an information acquisition unit 710 configured to perform acquisition of account information of the sample account and actual operation information of the sample account on the pushed video information.
And an information prediction unit 720, configured to perform inputting the account information and the video information into the video operation prediction model, resulting in prediction operation information of the sample account on the video information.
And a prediction model training unit 730 configured to perform training of the video operation prediction model according to the prediction operation information and the actual operation information.
The sample data obtaining unit 740 is configured to obtain, according to the trained video operation prediction model, the prediction operation information of the plurality of sample accounts on the target video information, as the training sample data of the video push model to be trained.
A push model training unit 750 configured to perform training of the video push model to be trained according to the training sample data.
In an exemplary embodiment, the information prediction unit 720 is further configured to extract, from the video information, first video information in which the sample account has been operated sequentially before a preset time and second video information in which the sample account has been operated at the preset time; acquiring account information characteristic codes of account information and first video information characteristic codes of first video information; inputting the account information characteristic code and the first video information characteristic code into an account state coding network in a video operation prediction model to obtain an account state code of a sample account at a preset moment; and inputting the second video information characteristic code and the account state code of the second video information into an operation prediction network in the video operation prediction model to obtain the prediction operation information of the sample account on the second video information at the preset moment.
In an exemplary embodiment, the information prediction unit 720 is further configured to perform inputting a first video information feature code into a first network of the account status coding networks, resulting in a video status code at a preset time; and inputting the account information characteristic code and the video state code into a second network in the account state coding network to obtain the account state code of the sample account at the preset moment.
In an exemplary embodiment, the information prediction unit 720 is further configured to perform an operation prediction network that inputs the second video information feature coding and the account status coding into the video operation prediction model, and obtain a plurality of operation behavior probabilities of the sample account on the second video information; and according to the preset weight corresponding to the operation behavior probabilities, performing weighting processing on the operation behavior probabilities to obtain the target operation probability of the sample account on the second video information, and correspondingly using the target operation probability as the predicted operation information of the sample account on the second video information at the preset moment.
In an exemplary embodiment, the prediction operation information of the sample account on the target video information includes the prediction operation information of the sample account on the target video information at each preset moment; the push model training unit 750 is further configured to obtain the target video information feature code of the target video information; input the account information feature code and the target video information feature code into the account state coding network in the video push model to be trained to obtain the target account state code of the sample account at each preset moment; input the target video information feature code and the target account state code into the operation prediction network in the video push model to be trained to obtain the target prediction operation information of the sample account on the target video information at each preset moment; input the target prediction operation information into the preset video push evaluation model to obtain the operation feedback value of the sample account on the target video information at each preset moment; and repeatedly train the video push model to be trained and the preset video push evaluation model according to the target account state code, the prediction operation information of the sample account on the target video information at each preset moment, the target prediction operation information, and the operation feedback value until both the video push model to be trained and the preset video push evaluation model satisfy the convergence condition.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Fig. 8 is a block diagram illustrating a video push device according to an example embodiment. Referring to fig. 8, the apparatus includes an account information acquiring unit 810, a video information acquiring unit 820, and a video information pushing unit 830.
An account information obtaining unit 810 configured to perform obtaining account information of an account to be pushed.
A video information obtaining unit 820 configured to perform inputting the account information of the account to be pushed into the trained video pushing model, so as to obtain the pushed video information of the account to be pushed; and obtaining the trained video push model according to the training method of the video push model.
And a video information pushing unit 830 configured to perform pushing of the pushed video information to the account to be pushed.
In an exemplary embodiment, the video information pushing unit 830 is further configured to perform arranging the pushed video information according to an order of outputting the pushed video information according to the trained video pushing model, so as to obtain arranged pushed video information; and pushing the arranged pushed video information to an account to be pushed.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 9 is a block diagram illustrating an apparatus 900 for performing the above-described video push model training method or video push method according to an exemplary embodiment. For example, device 900 may be a server. Referring to fig. 9, device 900 includes a processing component 920 that further includes one or more processors and memory resources, represented by memory 922, for storing instructions, such as applications, that are executable by processing component 920. The application programs stored in memory 922 may include one or more modules that each correspond to a set of instructions. Further, the processing component 920 is configured to execute instructions to perform the training method of the video push model or the video push method described above.
The device 900 may also include a power component 924 configured to perform power management of the device 900, a wired or wireless network interface 926 configured to connect the device 900 to a network, and an input/output (I/O) interface 928. The device 900 may operate based on an operating system stored in the memory 922, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a storage medium comprising instructions, such as the memory 922 comprising instructions, executable by a processor of the device 900 to perform the above-described method is also provided. The storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program product, which includes a computer program stored in a readable storage medium, from which at least one processor of a device reads and executes the computer program, so that the device performs the training method of a video push model or the video push method described in any embodiment of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A training method of a video push model is characterized by comprising the following steps:
acquiring account information of a sample account and actual operation information of the sample account on pushed video information;
inputting the account information and the video information into a video operation prediction model to obtain prediction operation information of the sample account on the video information;
training the video operation prediction model according to the prediction operation information and the actual operation information;
according to the trained video operation prediction model, obtaining prediction operation information of a plurality of sample accounts on target video information, and using the prediction operation information as training sample data of a video push model to be trained; the prediction operation information of the sample account on the target video information comprises the prediction operation information of the sample account on the target video information at each preset moment;
acquiring target video information characteristic codes of the target video information;
inputting the account information feature code of the account information and the target video information feature code into an account state coding network in the video push model to be trained to obtain a target account state code of the sample account at each preset moment;
inputting the target video information feature codes and the target account state codes into an operation prediction network in the video pushing model to be trained to obtain target prediction operation information of the sample accounts on the target video information at each preset moment;
inputting the target prediction operation information into a preset video push evaluation model to obtain an operation feedback value of the sample account on the target video information at each preset moment;
and repeatedly training the video push model to be trained and the preset video push evaluation model according to the target account state code, the prediction operation information of the sample account on the target video information at each preset moment, the target prediction operation information and the operation feedback value until the video push model to be trained and the preset video push evaluation model both meet the convergence condition.
2. The method for training the video push model according to claim 1, wherein the inputting the account information and the video information into a video operation prediction model to obtain the prediction operation information of the sample account on the video information comprises:
extracting first video information of the sample account which is operated in sequence before a preset time and second video information of the sample account which is operated at the preset time from the video information;
acquiring account information characteristic codes of the account information and first video information characteristic codes of the first video information;
inputting the account information feature code and the first video information feature code into an account state coding network in the video operation prediction model to obtain an account state code of the sample account at the preset moment;
and inputting a second video information characteristic code and the account state code of the second video information into an operation prediction network in the video operation prediction model to obtain the prediction operation information of the sample account on the second video information at the preset moment.
3. The method for training a video push model according to claim 2, wherein the inputting the account information feature code and the first video information feature code into an account status coding network in the video operation prediction model to obtain an account status code of the sample account at the preset time includes:
inputting the first video information feature code into a first network in the account state coding network to obtain the video state code at the preset moment;
and inputting the account information characteristic code and the video state code into a second network in the account state code network to obtain the account state code of the sample account at the preset moment.
4. The method for training a video push model according to claim 2, wherein the inputting the second video information feature coding and the account status coding into an operation prediction network in the video operation prediction model to obtain the prediction operation information of the sample account on the second video information at the preset time includes:
inputting the second video information feature code and the account state code into an operation prediction network in the video operation prediction model to obtain a plurality of operation behavior probabilities of the sample account on the second video information;
and according to preset weights corresponding to the operation behavior probabilities, weighting the operation behavior probabilities to obtain a target operation probability of the sample account on the second video information, wherein the target operation probability is correspondingly used as the prediction operation information of the sample account on the second video information at the preset moment.
5. A video push method, comprising:
acquiring account information of an account to be pushed;
inputting the account information of the account to be pushed into the trained video pushing model to obtain the pushed video information of the account to be pushed; the trained video push model is obtained according to the training method of the video push model of any one of claims 1 to 4;
and pushing the pushed video information to the account to be pushed.
6. The video push method according to claim 5, wherein the pushing the pushed video information to the account to be pushed comprises:
arranging the pushed video information according to the order in which the trained video push model outputs the pushed video information, to obtain arranged pushed video information;
and pushing the arranged pushed video information to the account to be pushed.
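Claim 6's ordering rule can be sketched as follows. One common reading, assumed here, is that the model emits scored candidate videos and the push order follows the scores; the `push_videos` helper, the score-based ordering, and the toy model are illustrative assumptions.

```python
def push_videos(account_info, model, top_k=3):
    """Sketch: score candidates with the trained push model and push
    them in the model's output order (here, descending score)."""
    scored = model(account_info)  # assumed shape: [(video_id, score), ...]
    ordered = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [video_id for video_id, _ in ordered[:top_k]]

# toy stand-in for the trained video push model
toy_model = lambda acc: [("v1", 0.2), ("v2", 0.9), ("v3", 0.5), ("v4", 0.7)]
print(push_videos({"account_id": "u42"}, toy_model))  # ['v2', 'v4', 'v3']
```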
7. An apparatus for training a video push model, comprising:
an information acquisition unit configured to acquire account information of a sample account and actual operation information of the sample account on pushed video information;
an information prediction unit configured to input the account information and the video information into a video operation prediction model to obtain prediction operation information of the sample account on the video information;
a prediction model training unit configured to train the video operation prediction model according to the prediction operation information and the actual operation information;
a sample data acquisition unit configured to obtain, using the trained video operation prediction model, prediction operation information of a plurality of sample accounts on target video information as training sample data of a video push model to be trained, wherein the prediction operation information of a sample account on the target video information comprises the prediction operation information of the sample account on the target video information at each preset time;
a push model training unit configured to: obtain a target video information feature code of the target video information; input the account information feature code of the account information and the target video information feature code into an account state coding network in the video push model to be trained to obtain a target account state code of the sample account at each preset time; input the target video information feature code and the target account state code into an operation prediction network in the video push model to be trained to obtain target prediction operation information of the sample account on the target video information at each preset time; input the target prediction operation information into a preset video push evaluation model to obtain an operation feedback value of the sample account on the target video information at each preset time; and repeatedly train the video push model to be trained and the preset video push evaluation model according to the target account state code, the prediction operation information of the sample account on the target video information at each preset time, the target prediction operation information, and the operation feedback value, until both the video push model to be trained and the preset video push evaluation model meet a convergence condition.
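The alternating training described in the push model training unit resembles an actor-critic loop: an evaluation model (critic) learns an operation feedback value for each account state, and the push model (actor) is updated against that feedback until both converge. The linear stand-in models, the synthetic reward, and the update rules below are illustrative assumptions only, not the patented training procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 5
actor_w = np.zeros(dim)   # stand-in for the video push model to be trained
critic_w = np.zeros(dim)  # stand-in for the preset video push evaluation model
lr = 0.05

def actor_action(state):
    # target prediction operation information, collapsed to one score
    return np.tanh(state @ actor_w)

def critic_value(state):
    # operation feedback value for a target account state code
    return state @ critic_w

for step in range(300):
    state = rng.standard_normal(dim)  # target account state code
    reward = float(state[0] > 0)      # synthetic feedback signal (assumed)
    # critic step: regress the feedback value toward the observed reward
    err = critic_value(state) - reward
    critic_w -= lr * err * state
    # actor step: move the action along the critic's advantage signal
    adv = reward - critic_value(state)
    actor_w += lr * adv * (1 - actor_action(state) ** 2) * state

print(np.all(np.isfinite(actor_w)), np.all(np.isfinite(critic_w)))
```

In this reading, "repeatedly train … until both meet a convergence condition" corresponds to running such alternating critic/actor updates until the parameter changes (or losses) fall below a threshold.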
8. The apparatus for training a video push model according to claim 7, wherein the information prediction unit is further configured to: extract, from the video information, first video information operated by the sample account in sequence before a preset time and second video information operated by the sample account at the preset time; acquire an account information feature code of the account information and a first video information feature code of the first video information; input the account information feature code and the first video information feature code into an account state coding network in the video operation prediction model to obtain an account state code of the sample account at the preset time; and input a second video information feature code of the second video information and the account state code into an operation prediction network in the video operation prediction model to obtain the prediction operation information of the sample account on the second video information at the preset time.
9. The apparatus for training a video push model according to claim 8, wherein the information prediction unit is further configured to: input the first video information feature code into a first network in the account state coding network to obtain a video state code at the preset time; and input the account information feature code and the video state code into a second network in the account state coding network to obtain the account state code of the sample account at the preset time.
10. The apparatus for training a video push model according to claim 8, wherein the information prediction unit is further configured to: input the second video information feature code and the account state code into the operation prediction network in the video operation prediction model to obtain a plurality of operation behavior probabilities of the sample account on the second video information; and weight the operation behavior probabilities according to preset weights corresponding to the operation behavior probabilities to obtain a target operation probability of the sample account on the second video information, the target operation probability serving as the prediction operation information of the sample account on the second video information at the preset time.
11. A video push apparatus, comprising:
an account information acquisition unit configured to acquire account information of an account to be pushed;
a video information acquisition unit configured to input the account information of the account to be pushed into a trained video push model to obtain pushed video information of the account to be pushed, wherein the trained video push model is obtained according to the method for training a video push model of any one of claims 1 to 4;
a video information pushing unit configured to perform pushing of the pushed video information to the account to be pushed.
12. The video push apparatus according to claim 11, wherein the video information pushing unit is further configured to: arrange the pushed video information according to the order in which the trained video push model outputs the pushed video information, to obtain arranged pushed video information; and push the arranged pushed video information to the account to be pushed.
13. A server, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 6.
14. A storage medium, wherein instructions in the storage medium, when executed by a processor of a server, enable the server to perform the method of any one of claims 1 to 6.
CN202010366374.0A 2020-04-30 2020-04-30 Training method and device of video push model, server and storage medium Active CN113596528B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010366374.0A CN113596528B (en) 2020-04-30 2020-04-30 Training method and device of video push model, server and storage medium

Publications (2)

Publication Number Publication Date
CN113596528A CN113596528A (en) 2021-11-02
CN113596528B true CN113596528B (en) 2022-10-04

Family

ID=78237493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010366374.0A Active CN113596528B (en) 2020-04-30 2020-04-30 Training method and device of video push model, server and storage medium

Country Status (1)

Country Link
CN (1) CN113596528B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107515909B (en) * 2017-08-11 2020-05-19 深圳市云网拜特科技有限公司 Video recommendation method and system
CN107911491B (en) * 2017-12-27 2019-09-27 Oppo广东移动通信有限公司 Information recommendation method, device and storage medium, server and mobile terminal
CN108763314B (en) * 2018-04-26 2021-01-19 深圳市腾讯计算机系统有限公司 Interest recommendation method, device, server and storage medium
CN109902849B (en) * 2018-06-20 2021-11-30 华为技术有限公司 User behavior prediction method and device, and behavior prediction model training method and device
CN109460512B (en) * 2018-10-25 2022-04-22 腾讯科技(北京)有限公司 Recommendation information processing method, device, equipment and storage medium
CN109598331A (en) * 2018-12-04 2019-04-09 北京芯盾时代科技有限公司 A kind of fraud identification model training method, fraud recognition methods and device
CN109858625A (en) * 2019-02-01 2019-06-07 北京奇艺世纪科技有限公司 Model training method and equipment, prediction technique and equipment, data processing equipment, medium
CN110427617B (en) * 2019-07-22 2020-09-08 阿里巴巴集团控股有限公司 Push information generation method and device
CN110688528B (en) * 2019-09-26 2023-04-07 抖音视界有限公司 Method, apparatus, electronic device, and medium for generating classification information of video
CN110704599B (en) * 2019-09-30 2022-05-17 支付宝(杭州)信息技术有限公司 Method and device for generating samples for prediction model and method and device for training prediction model

Similar Documents

Publication Publication Date Title
CN109104620B (en) Short video recommendation method and device and readable medium
CN108269254B (en) Image quality evaluation method and device
CN110766142A (en) Model generation method and device
CN109145828B (en) Method and apparatus for generating video category detection model
CN109214374B (en) Video classification method, device, server and computer-readable storage medium
CN111858973A (en) Multimedia event information detection method, device, server and storage medium
CN116229530A (en) Image processing method, device, storage medium and electronic equipment
CN111738766B (en) Data processing method and device for multimedia information and server
CN112182281B (en) Audio recommendation method, device and storage medium
CN115130232A (en) Method, device, apparatus, storage medium, and program product for predicting life of part
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
CN114528474A (en) Method and device for determining recommended object, electronic equipment and storage medium
CN116578925B (en) Behavior prediction method, device and storage medium based on feature images
CN113596528B (en) Training method and device of video push model, server and storage medium
CN113836388A (en) Information recommendation method and device, server and storage medium
CN113204699A (en) Information recommendation method and device, electronic equipment and storage medium
CN116028715A (en) Content recommendation method and device, storage medium and electronic equipment
CN115017362A (en) Data processing method, electronic device and storage medium
CN110502715B (en) Click probability prediction method and device
CN115878839A (en) Video recommendation method and device, computer equipment and computer program product
CN113297417A (en) Video pushing method and device, electronic equipment and storage medium
CN115496175A (en) Newly-built edge node access evaluation method and device, terminal equipment and product
CN113469204A (en) Data processing method, device, equipment and computer storage medium
CN112000888B (en) Information pushing method, device, server and storage medium
CN112925972B (en) Information pushing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant