CN116385070A

CN116385070A - Multi-target prediction method, system, equipment and storage medium for short video advertisement of E-commerce

Info

Publication number: CN116385070A
Application number: CN202310069380.3A
Authority: CN
Inventors: 陈恩红; 葛铁铮; 连德富; 刘奇; 周峙龙; 王诗瑶; 姜宇宁
Original assignee: University of Science and Technology of China USTC; Alibaba China Co Ltd
Current assignee: University of Science and Technology of China USTC; Alibaba China Co Ltd
Priority date: 2023-01-18
Filing date: 2023-01-18
Publication date: 2023-07-04
Anticipated expiration: 2043-01-18
Also published as: CN116385070B

Abstract

The invention discloses a multi-target prediction method, system, equipment and storage medium for e-commerce short video advertisements. On the one hand, the importance weights of different features can be adjusted according to the user's activity, which improves the recommendation effect for cold-start users. On the other hand, for the multiple target tasks, task-specific interest representations can be generated by modeling the various behavior sequences, the various target tasks and the complex interaction relations between them, and the prediction accuracy on the multiple task targets is improved through knowledge transfer among the target tasks, so that the recommendation effect is improved.

Description

Multi-target prediction method, system, equipment and storage medium for short video advertisement of E-commerce

Technical Field

The invention relates to the technical field of recommendation systems, in particular to a multi-target prediction method, a system, equipment and a storage medium for short video advertisements of electronic commerce.

Background

Short video is an emerging form of internet content that is widely used in e-commerce. An e-commerce short video is a condensed clip of 10 to 60 seconds that presents a commodity's appearance, use, quality, price and similar information. Owing to their novelty, content diversity and information richness, short video advertisements occupy a gradually growing share of traffic in e-commerce scenarios, but they still lag behind traditional image-and-text advertisements. Because traffic is limited, user behavior in the e-commerce short video advertisement scenario is sparse, which places higher demands on the predictive capability of the recommendation model. In recent years, user behavior sequence modeling has been widely studied in recommendation scenarios and has achieved satisfactory results in online serving environments. However, most existing user behavior sequence modeling focuses on a single task and a single behavior sequence, and is difficult to apply to the prediction of multiple target tasks.

In a short video advertising scenario, a user may exhibit many different behaviors, such as watching, clicking, entering a store, and converting (purchasing). The recommendation model needs to predict the probabilities of these behaviors for a given short video and integrate the predicted probabilities into a final recommendation decision. Meanwhile, the user's histories of the different behaviors can be aggregated into behavior sequences of different types. How to accurately predict multiple target tasks from the user's different types of behavior sequences and the supervision information of the different tasks, and thereby improve the recommendation effect, is a problem that urgently needs to be solved.

Disclosure of Invention

The invention aims to provide a multi-target prediction method, a system, equipment and a storage medium for short video advertisements of electronic commerce, which can relieve the problem of cold start of users, realize the prediction of multi-target tasks and further improve the recommendation effect.

The invention aims at realizing the following technical scheme:

a multi-target prediction method for short video advertisements of electronic commerce comprises the following steps:

for each user, generating a gating vector according to the user's activity (liveness), and using it to apply different weights to the user portrait features, the short video features and the context features to obtain weighted features, wherein the weighted features comprise: weighted user portrait features, weighted short video features and weighted context features;

for each target task among the multiple target tasks, dynamically generating task-related parameters with a hypernetwork from the corresponding target task vector and the type vector of each behavior sequence, and then modeling each behavior sequence of the user with the generated parameters and an attention mechanism to obtain a task-related behavior sequence interest representation;

for each target task, generating a behavior representation from the weighted features and the corresponding task-related behavior sequence interest representation, wherein a cascade relation exists among the target tasks; determining, according to the cascade relation, whether the behavior representation of each task needs to be updated; if no update is needed, directly using the behavior representation to predict the probability that the user performs the behavior corresponding to the target task; if an update is needed, using the updated behavior representation to predict that probability; and integrating the probabilities that the user performs the behaviors corresponding to the target tasks to generate a video recommendation result and recommend it to the user.

An e-commerce short video advertisement multi-objective prediction system, comprising:

a feature adaptation module based on user activity, configured to generate, for each user, a gating vector according to the user's activity (liveness), and to use it to apply different weights to the user portrait features, the short video features and the context features to obtain weighted features, wherein the weighted features comprise: weighted user portrait features, weighted short video features and weighted context features;

a task-related behavior sequence modeling module, configured to, for each target task among the multiple target tasks, dynamically generate task-related parameters with a hypernetwork from the corresponding target task vector and the type vector of each behavior sequence, and then model each behavior sequence of the user with the generated parameters and an attention mechanism to obtain a task-related behavior sequence interest representation;

an inter-task cascade knowledge transfer and recommendation module, configured to generate, for each target task, a behavior representation from the weighted features and the corresponding task-related behavior sequence interest representation, wherein a cascade relation exists among the target tasks; to determine, according to the cascade relation, whether the behavior representation of each task needs to be updated; if no update is needed, to directly use the behavior representation to predict the probability that the user performs the behavior corresponding to the target task; if an update is needed, to use the updated behavior representation to predict that probability; and to integrate the probabilities that the user performs the behaviors corresponding to the target tasks to generate a video recommendation result and recommend it to the user.

A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the aforementioned methods.

A readable storage medium storing a computer program which, when executed by a processor, implements the method described above.

According to the technical scheme provided by the invention, on the one hand, the importance weights of different features can be adjusted according to the user's activity, which improves the recommendation effect for cold-start users. On the other hand, for the multiple target tasks, task-specific interest representations can be generated by modeling the various behavior sequences, the various target tasks and the complex interaction relations between them, and the prediction accuracy on the multiple task targets is improved through knowledge transfer among the target tasks, so that the recommendation effect is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a frame diagram of a multi-objective prediction method for short video advertisements of electronic commerce provided by an embodiment of the invention;

FIG. 2 is a schematic diagram of task related behavior sequence modeling provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of an E-commerce short video advertisement multi-objective prediction system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a processing apparatus according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

The terms that may be used herein will first be described as follows:

the terms "comprises," "comprising," "includes," "including," "has," "having" or other similar referents are to be construed to cover a non-exclusive inclusion. For example: including a particular feature (e.g., a starting material, component, ingredient, carrier, formulation, material, dimension, part, means, mechanism, apparatus, step, procedure, method, reaction condition, processing condition, parameter, algorithm, signal, data, product or article of manufacture, etc.), should be construed as including not only a particular feature but also other features known in the art that are not explicitly recited.

The invention provides a method, a system, equipment and a storage medium for multi-target prediction of E-commerce short video advertisements. What is not described in detail in the embodiments of the present invention belongs to the prior art known to those skilled in the art. The specific conditions are not noted in the examples of the present invention and are carried out according to the conditions conventional in the art or suggested by the manufacturer.

Example 1

The embodiment of the invention provides a multi-target prediction method for short video advertisements of electronic commerce, which is based on an Embedding & MLP paradigm commonly used by a current recommendation system, and an overall framework of the method is shown in FIG. 1, and the method mainly comprises the following steps:

Step 1, for each user, generate a gating vector according to the user's activity (liveness), and use it to apply different weights to the user portrait features, the short video features and the context features to obtain weighted features, wherein the weighted features comprise: weighted user portrait features, weighted short video features and weighted context features.

As shown in the lower left corner of FIG. 1, this step generates a gating vector according to user activity so that the importance weights of different features can be adjusted per user. The starting point is that users with different activity have different data distributions, so the features that effectively represent their samples should also differ. Specifically, the gating vector (Gate Vector) can be predicted by a multi-layer perceptron (MLP) and acts on three types of features: the context features, the user portrait features and the short video features. Each type contains many finer features. For example, the user portrait features include user ID (user ID), gender, age (age), job (job) and education; the short video features include video ID (video ID), video genre (video tag), producer and duration; the context features are spatio-temporal features related to the user's current behavior (i.e., viewing the short video that corresponds to the short video features), such as the time (time) at which the user views the short video, the location and the viewing platform (pid). The dimension of the gating vector equals the total number of features across the three types: if the three types contain M features in total, the gating vector is M-dimensional, i.e., one weight is generated for each feature. Each weight is multiplied into the vector of its corresponding feature. The purpose is as follows: users with different activity (activity) have different behavior patterns. For example, an active user watches more short videos but the average viewing duration per video is shorter, whereas an inactive user watches fewer videos but the average viewing duration is longer. These differences in behavior patterns lead to different data distributions. If a user is active, the user ID feature can represent that user well; moreover, because the user watches many videos and has diverse interests, fine-grained features such as the video ID should be emphasized so that active users are characterized more finely. If a user is inactive, coarse-grained features such as gender, age, education and video style should be used instead, because an inactive user has few samples and the user ID feature provides far less information than it does for an active user.
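The gating step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the two-layer perceptron, the ReLU and sigmoid activations, the representation of user activity as a small vector, and all names and sizes (`mlp_gate`, `apply_gate`, `M`, `D`, `H`, `A`) are assumptions made for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp_gate(activity_vec, w_hidden, w_out):
    """Two-layer perceptron: activity vector -> M gate weights in (0, 1).
    The sigmoid output is an assumption; the text only says an MLP predicts the gate."""
    hidden = [max(0.0, sum(a * w for a, w in zip(activity_vec, row))) for row in w_hidden]
    return [sigmoid(sum(h * w for h, w in zip(hidden, row))) for row in w_out]

def apply_gate(gate, feature_vectors):
    """Multiply each of the M feature vectors by its own scalar gate weight."""
    return [[g * x for x in vec] for g, vec in zip(gate, feature_vectors)]

random.seed(0)
M, D, H, A = 5, 4, 8, 3  # features, feature-vector dim, hidden units, activity-vector dim
w_hidden = [[random.gauss(0.0, 0.5) for _ in range(A)] for _ in range(H)]
w_out = [[random.gauss(0.0, 0.5) for _ in range(H)] for _ in range(M)]
features = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(M)]
activity_vec = [0.2, -0.1, 0.7]  # hypothetical encoding of the user's activity

gate = mlp_gate(activity_vec, w_hidden, w_out)   # one weight per feature
weighted = apply_gate(gate, features)            # weighted features, same shapes as input
```

In a trained model the gate weights would be learned jointly with the rest of the network, so that fine-grained features receive larger weights for active users.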

The preferred embodiment is as follows:

1) Set a plurality of activity levels and arrange them from high to low, where a higher level corresponds to higher activity and a lower level to lower activity; the levels ranked first are selected as high-activity levels and the remaining levels as low-activity levels.

In the embodiment of the invention, the upper limit and the lower limit of different activity level ranges can be set according to the data analysis of the activity of all users.

In the embodiment of the invention, the number of the activity levels can be set according to actual conditions or experience.

2) Determine the activity level of the user according to the user's activity value.

In the embodiment of the invention, user activity refers to the number of videos the user has browsed in a past period of time. The user is assigned to a specific activity level according to this value, and whether the user belongs to a high-activity level or a low-activity level is then determined from the rank of that level.

3) If the user belongs to a high-activity level, the user has many interaction samples, and the weights of the fine-grained features in the generated gating vector are higher than those of the coarse-grained features; among the high-activity levels, the earlier a level is ranked, the higher the weights of the fine-grained features and the lower the weights of the coarse-grained features.

4) If the user belongs to a low-activity level, the user has few interaction samples, and the weights of the coarse-grained features in the generated gating vector are higher than those of the fine-grained features; among the low-activity levels, the later a level is ranked, the higher the weights of the coarse-grained features and the lower the weights of the fine-grained features.
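Items 1) and 2) above can be sketched as a simple bucketing of the browse count. The level boundaries and the number of levels treated as high activity are hypothetical values chosen for illustration; the text leaves both to data analysis or experience.

```python
from bisect import bisect_right

# Hypothetical lower bounds (videos browsed in the look-back window) of each level,
# in ascending order; level index 0 is the least active level.
LEVEL_LOWER_BOUNDS = [0, 5, 20, 50, 100]
NUM_HIGH_LEVELS = 2  # assumption: the two top-ranked levels count as high activity

def activity_level(videos_browsed):
    """Return the level index (0 = lowest activity) for a non-negative browse count."""
    return bisect_right(LEVEL_LOWER_BOUNDS, videos_browsed) - 1

def is_high_activity(level):
    """Levels are ranked from high to low; the first NUM_HIGH_LEVELS ranks are high."""
    rank_from_top = len(LEVEL_LOWER_BOUNDS) - 1 - level
    return rank_from_top < NUM_HIGH_LEVELS

level = activity_level(60)        # falls in the [50, 100) bucket
high = is_high_activity(level)
```

The resulting level (or its high/low flag) would then be the input from which the gating vector is generated.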

In embodiments of the present invention, the "granularity" of a feature indicates how much information the feature carries when describing a user or a video. For example, the user ID in the user portrait features identifies a specific user, and the video ID in the short video features identifies a specific video, so both are fine-grained; features such as gender and age can only indicate which type of user is involved, so they are coarse-grained. On this basis, the present invention classifies the user ID and the video ID as fine-grained features, and classifies the remaining user portrait features, the remaining short video features, and the context features as coarse-grained features.

As will be appreciated by those skilled in the art, the user ID is a unique identifier of the user, and the feature corresponding to the unique identifier is the user ID feature.

In the embodiment of the invention, a high-activity user (i.e., a user belonging to a high-activity level) has a large number of interaction samples in the short video advertisement scenario. Interaction samples refer to the user's historical behaviors in that scenario: each time the user swipes to a short video counts as one behavior and forms one interaction sample. The label of the interaction sample is generated according to whether the user watches the short video, clicks it, enters a store while watching, and converts after entering the store. The click behavior in the label is a set of behaviors that includes the time point at which the user watches the short video and actions such as favoriting, forwarding and commenting on the short video. For example, an active user, Zhang San, watches a mobile phone short video, clicks it and enters the store, but does not convert (purchase). This interaction sample can be described as (Zhang San, mobile phone short video, watch, click, enter store, not converted). The user ID feature of such a user can be learned well enough to characterize the user and should therefore be given a high weight; features such as age and gender are implicit in the user's interaction behavior, and hence already implicit in the user ID feature, so they are given a low weight to reduce information redundancy. For a low-activity user (i.e., a user belonging to a low-activity level), the user ID feature is difficult to learn effectively because there are few interaction samples, and using it directly would harm the prediction result, so it should be given a low weight; coarse-grained features such as age, gender and job are instead more important for learning about low-activity users and should be given a higher weight.

Step 2, for each target task among the multiple target tasks, dynamically generate task-related parameters with a hypernetwork from the corresponding target task vector and the type vector of each behavior sequence, and then model each behavior sequence of the user with the generated parameters and an attention mechanism to obtain a task-related behavior sequence interest representation.

Most existing multi-task learning models adopt the Mixture-of-Experts approach: a shared feature extraction module extracts a general representation vector for all tasks, and this vector is then fed into the separate sub-network of each task to predict the final target. However, because the shared representation is extracted without regard to the individual task, this way of processing information can lose information and affect the accuracy of the prediction results. The invention therefore provides a new way of processing information to obtain accurate prediction results.

As shown in the lower right corner of FIG. 1, the starting point of this step is that different tasks should extract different interest representations from the user's behavior sequences. A behavior sequence of the user is the sequence (video sequence 1/2/3/4) formed by arranging, in chronological order, the short videos on which the user performed the behavior corresponding to a target task during a past period of time. The multiple target tasks in the present invention include an effective viewing task, a click task, a store-entering task, a conversion task and so on, and the corresponding behaviors are, in order, effective viewing, clicking, store entering, conversion and so on. For the effective viewing task, the interest representation extracted from the behavior sequences only needs to reflect the user's broad interests, because user interests are diverse and effective viewing costs the user little. For the conversion task, by contrast, the extracted representation should reflect the user's consumption needs, because conversion involves a cost. The task-related behavior sequence interest representations of the above four target tasks are denoted Task-1 Interest, Task-2 Interest, Task-3 Interest and Task-4 Interest.

Specifically: the behavior sequence corresponding to the viewing behavior is the user's short video viewing sequence; the behavior sequence corresponding to the click behavior is the user's short video click sequence; the behavior sequence corresponding to the store-entering behavior is the user's store-entering short video sequence; and the behavior sequence corresponding to the conversion behavior is the user's converted short video sequence. Behavior sequences are used to describe the user: user portrait features are static and do not change over long periods, whereas the behavior sequences reflect the user's interests as they change in real time, so the recommendation system can use them to respond to the user's behavior in real time. For each target task, all behavior sequences need to be modeled, one behavior sequence interest vector is obtained from each behavior sequence, and together these interest vectors form the task-related behavior sequence interest representation. Taking the effective viewing task (Task-1) as an example, the user's short video viewing sequence, short video click sequence, store-entering short video sequence and converted short video sequence are each modeled to obtain four behavior sequence interest vectors for the effective viewing task, which together form the task-related behavior sequence interest representation of that task (Task-1 Interest).
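As an illustration of how the four behavior sequences could be assembled from a user's history, the sketch below groups logged events by behavior type and orders them by time. The event tuple layout, the behavior names and the `max_len` truncation are assumptions made for the example and are not specified in the text.

```python
from collections import defaultdict

BEHAVIORS = ("watch", "click", "enter_store", "convert")

def build_behavior_sequences(events, max_len=50):
    """Group (timestamp, video_id, behavior) events into one chronologically
    ordered video-ID sequence per behavior type, keeping the most recent
    max_len items (the truncation length is a hypothetical choice)."""
    sequences = defaultdict(list)
    for timestamp, video_id, behavior in sorted(events):
        sequences[behavior].append(video_id)
    return {b: sequences[b][-max_len:] for b in BEHAVIORS}

events = [
    (3, "v7", "watch"), (1, "v2", "watch"), (2, "v2", "click"),
    (4, "v7", "click"), (5, "v7", "enter_store"),
]
seqs = build_behavior_sequences(events)
```

Every target task then models all four sequences, so a task with few positive samples of its own (such as conversion) can still draw on the denser viewing and click sequences.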

In the embodiment of the invention, each target task corresponds to a unique ID, from which the corresponding target task vector (Task Embedding) can be obtained through a first embedding matrix (Embedding table); each target task vector is obtained through learning. Meanwhile, each target task has a corresponding behavior (Behavior), each behavior has a corresponding behavior sequence and a unique ID, and the type vector (Behavior Embedding) of the user's behavior sequence can be obtained through a second embedding matrix (Embedding table). The target task vector and the type vector of the behavior sequence are then input into a hypernetwork (Hypernetwork).

In the embodiment of the invention, each embedding matrix is a matrix of learnable parameters, which are part of the neural network parameters. Each row of the two embedding matrices is a vector of learnable parameters representing specific information. For example, in the first embedding matrix each row vector corresponds to one target task; the second embedding matrix is defined similarly, with each row vector corresponding to one behavior.
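A minimal sketch of the two embedding tables follows. The embedding dimension, the random initialisation, the ID assignment and the use of concatenation to form the hypernetwork input are assumptions for illustration; in a real model the rows would be updated by gradient descent.

```python
import random

random.seed(0)
EMBED_DIM = 4
TASK_IDS = {"watch": 0, "click": 1, "enter_store": 2, "convert": 3}
BEHAVIOR_IDS = dict(TASK_IDS)  # one behavior (and one sequence type) per task

def make_table(num_rows, dim):
    """Embedding table: one learnable row vector per ID (random initial values here)."""
    return [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_rows)]

task_table = make_table(len(TASK_IDS), EMBED_DIM)          # first embedding matrix
behavior_table = make_table(len(BEHAVIOR_IDS), EMBED_DIM)  # second embedding matrix

def lookup(table, row_id):
    return table[row_id]

task_vec = lookup(task_table, TASK_IDS["convert"])        # target task vector
type_vec = lookup(behavior_table, BEHAVIOR_IDS["click"])  # behavior sequence type vector
hyper_input = task_vec + type_vec  # list concatenation; combining this way is an assumption
```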

As shown in fig. 2, a schematic diagram of modeling the behavior sequence related to the task of the present invention is shown. The main process is as follows: inputting the target task vector and the type vector corresponding to each behavior sequence into a Hypernetwork network to obtain parameters related to the task; modeling each behavior sequence of the corresponding user respectively by combining the task related parameters and Attention mechanisms (Attention), obtaining interest vectors of each behavior sequence, and forming task related behavior sequence interest representations.

It will be understood by those skilled in the art that the Hypernetwork is an existing network module and can be regarded as a multi-layer perceptron. Its input is the target task vector together with the type vector of a behavior sequence, and its output is the parameters of the neural network sub-module that processes the behavior sequence (the FFN described later). The generated parameters are used to model the corresponding behavior sequence and obtain its interest vector. Thus, for each (task, behavior sequence type) pair, the target task vector and the type vector of the behavior sequence are input into the hypernetwork, the hypernetwork dynamically generates the task-related parameters, and, in combination with the attention mechanism, the interest vector that the current task should extract from the current behavior sequence is obtained. For each target task, the interest vectors of all behavior sequences form the behavior sequence interest representation, which is called the task-related behavior sequence interest representation because it is closely tied to the task type. Assuming there are S target tasks and S behaviors, and hence S behavior sequences, each target task obtains S interest vectors, and these S interest vectors form the task-related behavior sequence interest representation of that target task.
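The parameter generation described above can be sketched as follows. This is an illustrative reading only: the hypernetwork is taken to be a two-layer perceptron with a tanh hidden layer, and its flat output is reshaped into two weight matrices of a two-layer FFN. The sizes `E`, `D`, `F`, `H` and all function names are hypothetical.

```python
import math
import random

random.seed(0)
E, D, F, H = 4, 3, 5, 6  # embedding dim, model dim, FFN hidden dim, hypernetwork hidden dim

def rand_matrix(rows, cols, scale=0.3):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def reshape(flat, rows, cols):
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

# Hypernetwork weights, shared by every (task, behavior sequence type) pair.
hyper_w1 = rand_matrix(H, 2 * E)
hyper_w2 = rand_matrix(F * D + D * F, H)

def hypernetwork(task_vec, type_vec):
    """Map (task vector, type vector) to FFN parameters W1 (F x D) and W2 (D x F)."""
    hidden = [math.tanh(x) for x in matvec(hyper_w1, task_vec + type_vec)]
    flat = matvec(hyper_w2, hidden)
    return reshape(flat[:F * D], F, D), reshape(flat[F * D:], D, F)

def ffn(x, w1, w2):
    """Two-layer perceptron with ReLU that uses the generated parameters."""
    return matvec(w2, [max(0.0, h) for h in matvec(w1, x)])

S = 4  # number of target tasks and of behavior sequence types
task_vecs = rand_matrix(S, E)
type_vecs = rand_matrix(S, E)
params = {(t, b): hypernetwork(task_vecs[t], type_vecs[b])
          for t in range(S) for b in range(S)}
w1, w2 = params[(0, 1)]              # parameters for task 0 reading sequence type 1
out = ffn([0.5, -0.2, 0.1], w1, w2)
```

Only the hypernetwork weights are stored; the S × S parameter sets are produced on demand, which is where the parameter efficiency comes from.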

W₁ and W₂ in FIG. 2 are the task-related parameters obtained from the Hypernetwork. FIG. 2 shows an attention mechanism in two parts (the middle part and the right part). The attention mechanism is an existing scheme, mainly used to process the behavior sequence information of historical behaviors. Its idea is that, when forming its representation, each element in the sequence refers to its correlation with the other elements, and then aggregates information according to the correlation strength to obtain a new representation of itself. This function is mainly performed by the Multi-head Attention module in the figure; Add & Norm denotes the residual connection and layer normalization operations, and FFN is a two-layer perceptron that performs a nonlinear transformation. The parameters used by the two FFNs in FIG. 2 are the task-related parameters W₁ and W₂. The middle attention mechanism in FIG. 2 (the first part) is a self-attention mechanism whose Q, K and V are obtained by linear transformations of the user's behavior sequence; its output provides the K and V of the right attention mechanism (the second part), whose Q is obtained by a linear transformation of the target video (i.e., the short video corresponding to the short video features in step 1).
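The two-part attention described above can be sketched as follows. Several simplifications are assumptions: a single attention head replaces multi-head attention, layer normalization has no learned scale or shift, the residual in the second part is taken around the target-video vector, and both FFNs receive the same pair (`w1`, `w2`) as their two layers, since the text does not state how W₁ and W₂ are assigned to the two FFNs.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention with a single head."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

def add_norm(x, y, eps=1e-6):
    """Residual connection followed by layer normalization."""
    z = [a + b for a, b in zip(x, y)]
    mean = sum(z) / len(z)
    var = sum((a - mean) ** 2 for a in z) / len(z)
    return [(a - mean) / math.sqrt(var + eps) for a in z]

def ffn(x, w1, w2):
    return matvec(w2, [max(0.0, h) for h in matvec(w1, x)])

def interest_vector(seq, target, proj, w1, w2):
    """seq: item vectors of one behavior sequence; target: target-video vector;
    proj: shared projection matrices; w1, w2: task-related FFN parameters."""
    # Part 1: self-attention, Q, K and V all come from the behavior sequence.
    q = [matvec(proj["q1"], x) for x in seq]
    k = [matvec(proj["k1"], x) for x in seq]
    v = [matvec(proj["v1"], x) for x in seq]
    h = [add_norm(x, a) for x, a in zip(seq, attend(q, k, v))]
    h = [add_norm(x, ffn(x, w1, w2)) for x in h]
    # Part 2: Q comes from the target video, K and V from the output of part 1.
    tq = matvec(proj["q2"], target)
    keys = [matvec(proj["k2"], x) for x in h]
    vals = [matvec(proj["v2"], x) for x in h]
    z = add_norm(target, attend([tq], keys, vals)[0])
    return add_norm(z, ffn(z, w1, w2))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
proj = {name: identity for name in ("q1", "k1", "v1", "q2", "k2", "v2")}
seq = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.4, 0.4, 0.8]]
target = [0.3, 0.7, 0.2]
interest = interest_vector(seq, target, proj, identity, identity)
```

The projection matrices in `proj` play the role of the shared attention parameters, while `w1` and `w2` would come from the hypernetwork for the current (task, behavior sequence type) pair.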

In the task-related behavior sequence modeling, the complex interaction relations among different behaviors can be modeled. Specifically, because the parameters for processing the different behavior sequences are generated by the same hypernetwork, the hypernetwork can automatically adapt its parameter generation to the correlations among behaviors; in addition, the Multi-head Attention module in the attention mechanism is shared by all (task, behavior sequence type) pairs and can likewise model the complex interaction relations among different behaviors. Task-specific interest representations can also be generated, which helps obtain accurate prediction results. There are two further advantages: (1) parameter efficiency, since the same hypernetwork generates the parameters for all tasks, which avoids maintaining a separate set of parameters for each task; (2) the original parameters of the attention module are shared by the multiple tasks, which enables knowledge sharing among them.

Step 3, for each target task, generate a behavior representation from the weighted features and the corresponding task-related behavior sequence interest representation, wherein a cascade relation exists among the target tasks; determine, according to the cascade relation, whether the behavior representation of each task needs to be updated; if no update is needed, directly use the behavior representation to predict the probability that the user performs the behavior corresponding to the target task; if an update is needed, use the updated behavior representation to predict that probability; and integrate the probabilities that the user performs the behaviors corresponding to the target tasks to generate a video recommendation result and recommend it to the user.

In the embodiment of the invention, considering the cascade relation among the target tasks, inter-task cascade knowledge transfer is used to improve prediction accuracy. The reason is that, in the short video scenario, later target tasks such as store entering and conversion have few positive samples, so their sub-networks train poorly. Cascade knowledge transfer provides more information to the later target tasks and thereby improves their prediction accuracy. Specifically: for the first target task in the cascade, the behavior representation does not need to be updated, and the probability that the user performs the corresponding behavior can be predicted directly from its behavior representation. For every other target task, the behavior representation is updated: according to the cascade relation, the behavior representation of the predecessor target task is fused with the behavior representation of the successor target task to obtain an updated behavior representation, which is then used to predict the probability that the user performs the behavior corresponding to the successor target task. If the predecessor target task is not the first target task in the cascade, the representation used in the fusion is the predecessor's updated behavior representation. In this way, for each user and each short video, the probability that the user performs the behavior corresponding to each target task can be predicted.
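The cascade update described above can be sketched as follows. The text does not specify how two representations are fused, so the element-wise weighted sum in `fuse` and its `alpha` parameter are hypothetical stand-ins; only the cascade order and the use of the predecessor's already-updated representation follow the description.

```python
def fuse(prev_repr, cur_repr, alpha=0.5):
    """Hypothetical fusion: element-wise weighted sum of the predecessor's
    representation and the current task's own representation."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev_repr, cur_repr)]

def cascade_update(behavior_reprs):
    """behavior_reprs is ordered along the cascade (watch, click, enter store, convert).
    The first task keeps its representation; each later task fuses its own
    representation with the already-updated representation of its predecessor."""
    updated = [behavior_reprs[0]]
    for cur in behavior_reprs[1:]:
        updated.append(fuse(updated[-1], cur))
    return updated

reprs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
updated = cascade_update(reprs)
```

With this ordering, information from the data-rich viewing and click tasks reaches the conversion task through every intermediate step of the cascade.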

In the embodiment of the invention, each target task has a corresponding sub-network. The input of each sub-network is the weighted features and the behavior sequence interest representation related to the corresponding task; the sub-network generates the corresponding behavior representation, which is then used to predict the probability that the user performs the behavior corresponding to the target task.

For example, the sub-networks corresponding to the 4 target tasks can all be implemented as multi-layer perceptrons. In fig. 1, all 4 sub-networks are 3-layer perceptrons: the first layer takes as input the weighted features and the behavior sequence interest representation related to the corresponding task, and the second layer outputs the corresponding behavior representation. If the sub-network corresponds to the most primary target task, the behavior representation output by the second layer is passed to the third layer to obtain the probability that the user performs the behavior corresponding to the most primary target task. Otherwise, knowledge migration between tasks is first applied to the behavior representation output by the second layer, and the third layer then yields the probability that the user performs the behavior corresponding to the target task; the knowledge migration between tasks is the behavior representation updating process described above.
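As an illustration only, a per-task sub-network of this kind could be sketched as follows. The layer sizes, the ReLU and sigmoid activations, the random initialisation, the concatenation of the two inputs, and the names `TaskSubNetwork`, `represent` and `predict` are assumptions made for the sketch; the embodiment only specifies a 3-layer perceptron whose second layer outputs the behavior representation.

```python
import math
import random


def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class TaskSubNetwork:
    """Three-layer perceptron for one target task (hypothetical sizes)."""

    def __init__(self, n_in, n_hidden, n_repr, seed=0):
        rng = random.Random(seed)
        self.layer1 = init_layer(n_in, n_hidden, rng)
        self.layer2 = init_layer(n_hidden, n_repr, rng)
        self.layer3 = init_layer(n_repr, 1, rng)

    def represent(self, weighted_features, interest_repr):
        """Layers 1 and 2: map the concatenated inputs to a behavior representation."""
        x = list(weighted_features) + list(interest_repr)
        hidden = relu(dense(x, *self.layer1))
        return relu(dense(hidden, *self.layer2))

    def predict(self, behavior_repr):
        """Layer 3: map a (possibly updated) behavior representation to a probability."""
        return sigmoid(dense(behavior_repr, *self.layer3)[0])


# Usage: 3 weighted features plus a 2-dimensional interest representation.
net = TaskSubNetwork(n_in=5, n_hidden=8, n_repr=4)
behavior_repr = net.represent([0.2, 0.5, 0.1], [0.3, 0.7])
probability = net.predict(behavior_repr)
```

Keeping `represent` and `predict` separate mirrors the description: for a task that is not the most primary one, the representation can be updated between the second and third layers before the probability is computed.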

The interaction samples were introduced above; a positive sample is an interaction sample in which the user exhibits a positive behavior. For example, watching a video, clicking a video, entering a store while watching a video, and purchasing goods after entering the store are all positive behaviors, and the corresponding interaction samples are positive samples. Because the present invention models each target task as a binary classification task (the corresponding behavior is either executed or not executed), an interaction sample in which no positive behavior occurs is referred to as a negative sample.

As described above, the e-commerce short video advertisement scenario contains two behavior logic links. The cascade relation between the target tasks in the first behavior logic link is: the effective watching task, then the click task. The cascade relation between the target tasks in the second behavior logic link is: the effective watching task, then the store entering task, then the conversion task. That is, the most primary target task in both logic links is the effective watching task.

The most primary target task is the effective watching task, and its prediction result is the probability that the user performs the effective watching behavior (the effective watching rate). Taking the first behavior logic link as an example, the click task is the subsequent target task and the effective watching task is its precursor target task; the behavior representations of the two tasks are fused (by weighted summation) into a new behavior representation, which is used to predict the probability that the user performs the click behavior (the click rate). The store entering task and the conversion task are handled in the same way.
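The order in which representations are updated along the two logic links can be sketched as below. This is a minimal illustration, not the patented implementation: the task identifiers, the `PREDECESSOR` table and the function names are hypothetical, and `mean_fuse` is a plain average standing in for the learned weighted summation.

```python
# Predecessor of each target task in the two behavior logic links.
# The task identifiers are illustrative names chosen for this sketch.
PREDECESSOR = {
    "effective_watch": None,           # most primary task, never updated
    "click": "effective_watch",        # first logic link
    "store_enter": "effective_watch",  # second logic link
    "conversion": "store_enter",       # second logic link
}


def update_representations(reps, fuse):
    """Return updated behavior representations following the cascade relation.

    reps: dict mapping task -> behavior representation from that task's sub-network.
    fuse: callable (predecessor_repr, own_repr) -> updated representation.
    """
    updated = {}

    def resolve(task):
        if task not in updated:
            prev = PREDECESSOR[task]
            if prev is None:
                updated[task] = reps[task]
            else:
                # The predecessor's updated representation is the one that is fused.
                updated[task] = fuse(resolve(prev), reps[task])
        return updated[task]

    for task in PREDECESSOR:
        resolve(task)
    return updated


def mean_fuse(a, b):
    """Placeholder fusion (plain average), used only to make the demo concrete."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


reps = {
    "effective_watch": [0.0],
    "click": [2.0],
    "store_enter": [4.0],
    "conversion": [8.0],
}
updated = update_representations(reps, mean_fuse)
# updated["conversion"] is fused with the updated store_enter representation [2.0],
# not with the raw [4.0], giving [5.0].
```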

When fusing the behavior representations, the two behavior representations can be input into a two-layer perceptron, whose two-dimensional output vector serves as the weights of the two behavior representations; the weighted sum of the two behavior representations is then taken as the new behavior representation. The elements of the two-dimensional vector are non-negative and sum to 1.
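A minimal sketch of such a fusion gate is given below. The text only requires that the two weights be non-negative and sum to 1; using a softmax to enforce this is an assumption, as are the hidden size, the ReLU activation, the random initialisation and the name `FusionGate`.

```python
import math
import random


def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax; outputs are non-negative and sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


class FusionGate:
    """Two-layer perceptron producing two fusion weights (hypothetical sizes)."""

    def __init__(self, n_repr, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(2 * n_repr)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(2)]
        self.b2 = [0.0, 0.0]

    def weights(self, prev_repr, own_repr):
        """Return the two weights for the predecessor's and the task's own representation."""
        x = list(prev_repr) + list(own_repr)
        hidden = [max(0.0, v) for v in dense(x, self.w1, self.b1)]
        return softmax(dense(hidden, self.w2, self.b2))

    def fuse(self, prev_repr, own_repr):
        """Weighted sum of the two behavior representations."""
        a, b = self.weights(prev_repr, own_repr)
        return [a * p + b * o for p, o in zip(prev_repr, own_repr)]


gate = FusionGate(n_repr=3)
new_repr = gate.fuse([1.0, 0.0, 2.0], [0.0, 1.0, 2.0])
```

Because the weights form a convex combination, every element of the fused representation lies between the corresponding elements of the two inputs.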

Based on the above, the probability that the user performs the behavior corresponding to each target task can be predicted; these probabilities are combined to generate the video recommendation result, which is recommended to the user (this process is omitted in fig. 1). Specifically, a scaling factor (which can be set according to the actual situation or experience) can be set for the probability of each target task; each probability is multiplied by its scaling factor and the products are summed. The resulting value is the score of the user for the short video, which can be understood as the user's degree of preference for the short video, and whether to recommend the short video to the user can then be decided based on this score. For example, a threshold (which may be set according to actual conditions or experience) may be set, and a short video whose score is higher than the threshold is added to the video recommendation result; alternatively, the short videos can be sorted in descending order of score and the top-ranked short videos selected and added to the video recommendation result. Finally, the short videos in the video recommendation result are recommended to the user in order of score.
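The scoring and selection step can be sketched as follows. The scaling factors, threshold, video identifiers and probabilities below are made-up illustrative values, and the function names are hypothetical; the source leaves the factors and threshold to be set by experience.

```python
def score(probs, factors):
    """Weighted sum of per-task probabilities using per-task scaling factors."""
    return sum(factors[task] * p for task, p in probs.items())


def recommend(candidates, factors, threshold=None, top_k=None):
    """Return video ids ordered by descending score.

    candidates: dict mapping video id -> {task: predicted probability}.
    threshold:  if given, keep only videos whose score is higher than it.
    top_k:      if given, keep only the k highest-scoring videos.
    """
    scored = [(vid, score(probs, factors)) for vid, probs in candidates.items()]
    if threshold is not None:
        scored = [(vid, s) for vid, s in scored if s > threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if top_k is not None:
        scored = scored[:top_k]
    return [vid for vid, _ in scored]


# Illustrative values only.
factors = {"effective_watch": 1.0, "click": 2.0, "store_enter": 3.0, "conversion": 4.0}
candidates = {
    "v1": {"effective_watch": 0.5, "click": 0.25, "store_enter": 0.125, "conversion": 0.0},
    "v2": {"effective_watch": 0.25, "click": 0.0, "store_enter": 0.0, "conversion": 0.0},
    "v3": {"effective_watch": 0.5, "click": 0.5, "store_enter": 0.25, "conversion": 0.25},
}
print(recommend(candidates, factors, threshold=1.0))  # -> ['v3', 'v1']
print(recommend(candidates, factors, top_k=1))        # -> ['v3']
```

Here v1 scores 1.375, v2 scores 0.25 and v3 scores 3.25, so the threshold of 1.0 drops v2 and the remaining videos are returned in descending order of score.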

According to the scheme provided by the embodiment of the invention, the importance weights of different features are adaptively adjusted based on the user liveness features, which improves the recommendation effect for cold-start users; the complex interactions among the multiple behavior sequences and the multiple target tasks are modeled in a parameter-efficient manner, so that task-specific interest representations can be generated; and the data properties of the scenario itself are modeled explicitly to achieve effective knowledge migration between tasks, which improves the prediction accuracy on multiple targets and thereby further improves the recommendation effect for users.

From the description of the above embodiments, it will be apparent to those skilled in the art that the above embodiments may be implemented in software, or by means of software plus a necessary general-purpose hardware platform. With such understanding, the technical solutions of the foregoing embodiments may be embodied in a software product, which may be stored in a nonvolatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and which includes several instructions for causing a computer device (such as a personal computer, a server, or a network device) to perform the methods of the embodiments of the present invention.

Example two

The invention also provides a multi-target prediction system for e-commerce short video advertisements, which is implemented mainly based on the method provided by the foregoing embodiment. As shown in fig. 3, the system mainly comprises:

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the system is divided into different functional modules to perform all or part of the functions described above.

Example three

The present invention also provides a processing apparatus, as shown in fig. 4, which mainly includes: one or more processors; a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods provided by the foregoing embodiments.

Further, the processing device further comprises at least one input device and at least one output device; in the processing device, the processor, the memory, the input device and the output device are connected via a bus.

In the embodiment of the invention, the specific types of the memory, the input device and the output device are not limited; for example:

the input device can be a touch screen, an image acquisition device, a physical key or a mouse and the like;

the output device may be a display terminal;

the memory may be random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as disk memory.

Example four

The invention also provides a readable storage medium storing a computer program which, when executed by a processor, implements the method provided by the foregoing embodiments.

The readable storage medium according to the embodiment of the present invention may be provided as a computer readable storage medium in the aforementioned processing apparatus, for example, as a memory in the processing apparatus. The readable storage medium may be any of various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.

The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims

1. A multi-target prediction method for short video advertisements of electronic commerce is characterized by comprising the following steps:

2. The method of claim 1, wherein the generating a gating vector according to user liveness to apply different weights to user portrait features, short video features, and context features comprises:

setting a plurality of liveness levels and arranging the liveness levels from high to low, wherein a higher liveness level corresponds to a higher liveness and a lower liveness level corresponds to a lower liveness, the levels arranged at the front being taken as high-liveness levels and the remaining levels as low-liveness levels;

determining the liveness level of the user according to the liveness value of the user;

if the user belongs to a high-liveness level, the weight of the fine-grained features in the generated gating vector is higher than that of the coarse-grained features, and among the high-liveness levels, the earlier a level is arranged, the higher the weight of the fine-grained features and the lower the weight of the coarse-grained features;

if the user belongs to a low-liveness level, the weight of the coarse-grained features in the generated gating vector is higher than that of the fine-grained features, and among the low-liveness levels, the later a level is arranged, the higher the weight of the coarse-grained features and the lower the weight of the fine-grained features;

the dimension of the gating vector is the same as the total number of the user portrait features, the short video features and the context features; the user ID in the user portrait features and the video ID in the short video features are fine-grained features, and the remaining user portrait features, the remaining short video features and the context features are all coarse-grained features.

3. The multi-target prediction method for e-commerce short video advertisements according to claim 1, wherein dynamically generating task-related parameters by using a hypernetwork technique and an attention mechanism from the corresponding target task vector and the type vector corresponding to each behavior sequence, and then modeling each behavior sequence of the corresponding user with the generated parameters to obtain the task-related behavior sequence interest representation, comprises:

inputting the target task vector and the type vector corresponding to each behavior sequence into a hypernetwork to obtain the task-related parameters; and modeling each behavior sequence of the corresponding user by combining the task-related parameters with the attention mechanism to obtain an interest vector for each behavior sequence, the interest vectors forming the task-related behavior sequence interest representation.

4. The multi-target prediction method for e-commerce short video advertisements according to claim 3, wherein each target task corresponds to a unique ID, and the corresponding target task vector is obtained through a first embedding matrix; each target task has a corresponding behavior, each behavior has a corresponding behavior sequence and corresponds to a unique ID, and the type vector corresponding to each behavior sequence of the user is obtained through a second embedding matrix.

5. The multi-target prediction method for e-commerce short video advertisements according to claim 1, wherein the target tasks include an effective watching task, a click task, a store entering task and a conversion task, which correspond in sequence to the following behaviors: effective watching behavior, click behavior, store entering behavior and conversion behavior; each behavior has a corresponding behavior sequence, and each behavior sequence is formed by arranging, in chronological order, the short videos on which the user performed the corresponding behavior during the user's history.

6. The multi-target prediction method for e-commerce short video advertisements according to claim 1 or 5, wherein determining, according to the cascade relation, whether the behavior representation of the corresponding task needs to be updated, directly predicting the probability that the user performs the behavior corresponding to the target task by using the behavior representation if no update is needed, and predicting that probability by using the updated behavior representation if an update is needed, comprises:

for the most primary target task, the probability of the user to execute the corresponding behavior of the most primary target task is directly predicted by using the corresponding behavior representation without updating the behavior representation;

for other target tasks, updating behavior representations, fusing the behavior representations corresponding to the precursor target tasks with the behavior representations corresponding to the subsequent target tasks according to the cascade relation, obtaining updated behavior representations, and predicting the probability of executing the behavior corresponding to the subsequent target tasks by using the updated behavior representations; if the precursor target task is not the most primary target task, the behavior representation corresponding to the precursor target task is the updated behavior representation of the precursor target task when the behavior representation corresponding to the subsequent target task is fused.

7. The multi-target prediction method for e-commerce short video advertisements according to claim 6, wherein the target tasks form two behavior logic links; the cascade relation between the target tasks in the first behavior logic link is: the effective watching task, then the click task; the cascade relation between the target tasks in the second behavior logic link is: the effective watching task, then the store entering task, then the conversion task; and the most primary target task in both logic links is the effective watching task.

8. An e-commerce short video advertisement multi-target prediction system, which is characterized in that the system is realized based on the method of any one of claims 1 to 7, and comprises:

9. A processing apparatus, comprising: one or more processors; a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.

10. A readable storage medium storing a computer program, characterized in that the method according to any one of claims 1-7 is implemented when the computer program is executed by a processor.