CN114363671A - Multimedia resource pushing method, model training method, device and storage medium - Google Patents


Info

Publication number
CN114363671A
CN114363671A (application CN202111676174.6A)
Authority
CN
China
Prior art keywords
multimedia
model
original
pushing
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111676174.6A
Other languages
Chinese (zh)
Other versions
CN114363671B (en)
Inventor
廖一桥 (Liao Yiqiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111676174.6A
Publication of CN114363671A
Application granted
Publication of CN114363671B
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

A multimedia resource pushing method, a model training method, a device, and a storage medium are disclosed. The method comprises the following steps: acquiring feature information of an object to be processed, and generating original dimension feature data and newly added dimension feature data; inputting the original dimension feature data and the newly added dimension feature data into a multimedia push model; matching with multimedia resources in a multimedia resource library based on the features output by the original multimedia push model and the features output by the personalized model to obtain candidate multimedia resources; and determining the target multimedia resource to be pushed from the candidate multimedia resources. The embodiments provided by the disclosure effectively reduce the Matthew effect of the push model and improve both the accuracy and the processing efficiency with which the multimedia push model pushes multimedia resources.

Description

Multimedia resource pushing method, model training method, device and storage medium
Technical Field
The present disclosure relates to the field of computer data processing technologies, and in particular, to a multimedia resource pushing method, a model training method, an apparatus, and a storage medium.
Background
At present, in the training of short-video push models, the cold start of new users and new videos has a crucial influence on the ecology and user retention of the whole model system. However, because new users and new videos lack sufficient behavior data, few behavior samples are available during training, and the push model can learn about new users and new videos only from these limited samples. As a result, the push model tends to fit the behavior samples of established users, the Matthew effect becomes pronounced, and the push results are not accurate enough.
Disclosure of Invention
The present disclosure provides a multimedia resource pushing method, a model training method, an apparatus, and a storage medium, which can reduce the Matthew effect of a pushing model and improve the accuracy of pushing multimedia resources. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a multimedia asset pushing method, including:
acquiring feature information of an object to be processed, and generating original dimension feature data and newly added dimension feature data of the object to be processed according to the feature information;
inputting the original dimension characteristic data and the newly added dimension characteristic data into a multimedia pushing model, wherein the multimedia pushing model comprises an original multimedia pushing model and a personalized model, the original dimension characteristic data is processed through the original multimedia pushing model to obtain the characteristics output by the original multimedia pushing model, and the newly added dimension characteristic data is processed through the personalized model to obtain the characteristics output by the personalized model;
matching with multimedia resources in a multimedia resource library based on the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain candidate multimedia resources;
and determining the target multimedia resource to be pushed from the candidate multimedia resources.
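As an illustrative sketch of the four steps of the first aspect, the following Python snippet stands in for the two branches with random linear maps. All dimensions, the concatenation-based combination, and the dot-product top-k matching are assumptions chosen for illustration, not the patented implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the patent does not fix any sizes.
D_ORIG, D_NEW, D_EMB, N_RES = 16, 4, 8, 100

# Stand-ins for the two branches: the original push model processes the
# original dimension features, the personalized model the newly added ones.
W_orig = rng.normal(size=(D_ORIG, D_EMB))
W_pers = rng.normal(size=(D_NEW, D_EMB))

def push_candidates(x_orig, x_new, resource_embs, k=5):
    """Combine both branches' outputs and match against the resource library."""
    f_orig = x_orig @ W_orig                     # features from the original model
    f_pers = x_new @ W_pers                      # features from the personalized model
    combined = np.concatenate([f_orig, f_pers])  # one possible combination
    scores = resource_embs @ combined            # similarity to each resource
    return np.argsort(scores)[::-1][:k]          # indices of candidate resources

resources = rng.normal(size=(N_RES, 2 * D_EMB))
cands = push_candidates(rng.normal(size=D_ORIG), rng.normal(size=D_NEW), resources)
```

The target multimedia resource would then be chosen from `cands`, for example by a further ranking stage.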
Optionally, in the method, the candidate multimedia resources include a pushing value of a preset behavior, and the determining, from the candidate multimedia resources, a target multimedia resource to be pushed includes:
and determining a target multimedia resource to be pushed from the candidate multimedia resources according to the pushing value of the preset behavior.
Optionally, in the method, the candidate multimedia resources include a plurality of preset behaviors and push values corresponding to the preset behaviors, and determining, from the candidate multimedia resources, a target multimedia resource to be pushed according to the push value of the preset behavior includes:
determining a target preset behavior in the plurality of preset behaviors;
acquiring a push value of the target preset behavior in the candidate multimedia resource;
and determining the target multimedia resource to be pushed from the candidate multimedia resources according to the pushing value of the target preset behavior.
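Selecting the target resource by the push value of a target preset behavior can be illustrated in plain Python; the behavior names and push values below are invented for the example:

```python
# Each candidate resource carries push values for several preset behaviors.
candidates = [
    {"id": "v1", "push": {"click": 0.62, "like": 0.10, "follow": 0.02}},
    {"id": "v2", "push": {"click": 0.55, "like": 0.31, "follow": 0.04}},
    {"id": "v3", "push": {"click": 0.48, "like": 0.12, "follow": 0.09}},
]

def pick_target(cands, target_behavior, top_n=1):
    """Rank candidates by the push value of the chosen target preset behavior."""
    ranked = sorted(cands, key=lambda c: c["push"][target_behavior], reverse=True)
    return [c["id"] for c in ranked[:top_n]]

# pick_target(candidates, "like") -> ["v2"]
```

Choosing a different target behavior ("click", "follow") changes which candidate is pushed, which is the point of determining a target preset behavior first.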
Optionally, in the method, the object to be processed is a newly added account, the newly added account includes an account whose generation duration is shorter than a first duration, and/or the multimedia resources in the multimedia resource library include newly added multimedia resources processed by the multimedia push model, and the newly added multimedia resources include multimedia resources whose generation duration is shorter than a second duration.
Optionally, in the method, the matching, based on the features output by the original multimedia push model and the features output by the personalized model, with the multimedia resources in the multimedia resource library to obtain candidate multimedia resources includes:
combining the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain combined characteristics;
and matching with the multimedia resources in the multimedia resource library according to the combination characteristics to obtain candidate multimedia resources.
Optionally, in the method, the original multimedia push model is a multitask learning model, and the personalized model is a multilayer perceptron;
matching the characteristics output by the original multimedia pushing model and the characteristics output by the personalized model with multimedia resources in a multimedia resource library to obtain candidate multimedia resources, wherein the matching comprises the following steps:
respectively converting original dimension characteristic data of each subtask network in the multi-task learning model into first target characteristics with the same dimension as the newly added dimension characteristic data;
inputting the first target characteristics of each subtask network into the personalized model to obtain second target characteristics output by each subtask network;
respectively splicing the second target characteristics output by each subtask network with the characteristics output by the personalized model to obtain first combined characteristics of each subtask network;
inputting the first combined characteristic into each subtask network in the multi-task learning model to obtain a third target characteristic;
and matching with the multimedia resources in the multimedia resource library according to the third target characteristics to obtain candidate multimedia resources.
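A minimal numpy sketch of this multilayer-perceptron embodiment follows. The projection matrices, MLP weights, and subtask networks are random stand-ins, and the feature names in the comments map onto the first through third target features described above; none of the sizes are fixed by the patent:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

D_ORIG, D_NEW, D_OUT, N_TASKS = 12, 4, 6, 3

# Per-subtask projection of the original features to the new-feature dimension.
proj = [rng.normal(size=(D_ORIG, D_NEW)) for _ in range(N_TASKS)]
# A shared multilayer perceptron standing in for the personalized model.
mlp_w1, mlp_w2 = rng.normal(size=(D_NEW, 8)), rng.normal(size=(8, D_NEW))
mlp = lambda x: relu(x @ mlp_w1) @ mlp_w2
# Each subtask network maps the spliced feature to its own output.
task_w = [rng.normal(size=(2 * D_NEW, D_OUT)) for _ in range(N_TASKS)]

def forward(x_orig, x_new):
    pers_feat = mlp(x_new)                      # feature output by the personalized model
    outs = []
    for t in range(N_TASKS):
        first = x_orig @ proj[t]                # first target feature (same dim as x_new)
        second = mlp(first)                     # second target feature
        combined = np.concatenate([second, pers_feat])  # first combined feature
        outs.append(relu(combined @ task_w[t])) # third target feature
    return outs
```

Each subtask's third target feature would then be matched against the resource library.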
Optionally, in the method, the original multimedia push model is a multitask learning model, and the personalized model is a multilayer perceptron;
the processing the original dimension feature data through the original multimedia push model to obtain the features output by the original multimedia push model comprises: inputting the original dimension characteristic data into each subtask network in the multi-task learning model to obtain a fourth target characteristic output by each subtask network in the multi-task learning model;
matching the characteristics output by the original multimedia pushing model and the characteristics output by the personalized model with multimedia resources in a multimedia resource library to obtain candidate multimedia resources comprises the following steps:
respectively inputting original dimension characteristic data of each subtask network in the multi-task learning model into the personalized model to obtain a one-dimensional fifth target characteristic;
carrying out weighted summation on the one-dimensional fifth target feature and the fourth target feature to obtain second combined features output by all subtask networks in the multi-task learning model;
and matching with the multimedia resources in the multimedia resource library according to the second combination characteristics to obtain candidate multimedia resources.
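The weighted-summation embodiment can be sketched as follows. Here the personalized MLP is reduced to a single linear map producing the one-dimensional fifth target feature, and the mixing weights `alpha` and `beta` are illustrative values not fixed by the patent:

```python
import numpy as np

rng = np.random.default_rng(2)
D_ORIG, D_OUT, N_TASKS = 12, 6, 3

# Stand-ins for the subtask networks of the multi-task learning model.
task_w = [rng.normal(size=(D_ORIG, D_OUT)) for _ in range(N_TASKS)]
# The personalized model here collapses the features to a single scalar.
gate_w = rng.normal(size=(D_ORIG, 1))
alpha, beta = 0.8, 0.2  # illustrative mixing weights, assumptions only

def forward(x_orig):
    combined = []
    for t in range(N_TASKS):
        fourth = x_orig @ task_w[t]        # fourth target feature (per subtask)
        fifth = float(x_orig @ gate_w)     # one-dimensional fifth target feature
        # Weighted summation: the scalar fifth feature broadcasts over the vector.
        combined.append(alpha * fourth + beta * fifth)
    return combined                        # second combined features, one per subtask
```

The second combined features are then used for matching against the resource library.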
Optionally, in the method, the original multimedia push model is a multitask learning model including an expert network, the personalized model is a convolutional neural network, the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of the expert networks included in the multitask learning model;
the processing the original dimension feature data through the original multimedia push model to obtain the features output by the original multimedia push model comprises: inputting the original dimension characteristic data into each expert network in the multi-task learning model to obtain sixth target characteristics output by each expert network in the multi-task learning model;
matching, based on the characteristics output by the original multimedia pushing model and the characteristics output by the personalized model, with multimedia resources in a multimedia resource library to obtain candidate multimedia resources comprises:
combining the features output by the personalized model with the sixth target features output by each expert network respectively to obtain third combined features corresponding to each expert network;
inputting the third combined characteristic into a gating network in the multi-task learning model to obtain a seventh target characteristic;
and matching with the multimedia resources in the multimedia resource library according to the seventh target characteristic to obtain candidate multimedia resources.
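Because a 1-D convolution with kernel size 1 and `n_experts` input and output channels is equivalent to multiplying by an `n_experts x n_experts` channel-mixing matrix, this expert-network embodiment can be sketched compactly. The additive combination and the mean-pooled gate logits are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
D_ORIG, D_EXP, N_EXPERTS = 12, 6, 4

# Stand-ins for the expert networks of the multi-task learning model.
expert_w = [rng.normal(size=(D_ORIG, D_EXP)) for _ in range(N_EXPERTS)]
# Kernel-size-1 conv across expert channels, written as a mixing matrix.
conv1x1 = rng.normal(size=(N_EXPERTS, N_EXPERTS))

def forward(x_orig, x_new_stack):
    # Sixth target features: one row per expert network.
    sixth = np.stack([x_orig @ w for w in expert_w])   # (experts, D_EXP)
    pers = conv1x1 @ x_new_stack                       # personalized conv output
    third_combined = sixth + pers                      # one possible combination
    # Hypothetical gating network: softmax weights over the experts.
    logits = third_combined.mean(axis=1)
    gates = np.exp(logits - logits.max())
    gates = gates / gates.sum()
    seventh = gates @ third_combined                   # seventh target feature
    return seventh
```

The seventh target feature is then matched against the resource library.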
Optionally, in the method, the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network, and the number of input channels and the number of output channels in the convolutional neural network are respectively the same as the number of single-head attention models included in the multi-head attention model;
the processing the original dimension feature data through the original multimedia push model to obtain the features output by the original multimedia push model comprises: inputting the original dimension feature data into each single-head attention model in the multi-head attention model to obtain a seventh target feature output by each single-head attention network in the multi-head attention model;
matching, based on the characteristics output by the original multimedia pushing model and the characteristics output by the personalized model, with multimedia resources in a multimedia resource library to obtain candidate multimedia resources comprises:
combining the features output by the personalized model with the seventh target features output by each single-head attention network in the multi-head attention model respectively to obtain fourth combined features corresponding to each single-head attention network;
and matching with the multimedia resources in the multimedia resource library according to the fourth combination characteristic to obtain candidate multimedia resources.
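A sketch of the multi-head-attention embodiment follows. As before, the kernel-size-1 convolution across head channels is written as a head-mixing matrix, the additive per-head combination and the final concatenation are assumptions, and all dimensions are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
D, D_HEAD, N_HEADS = 12, 4, 3

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Per-head query/key/value projections of the original dimension features.
Wq = [rng.normal(size=(D, D_HEAD)) for _ in range(N_HEADS)]
Wk = [rng.normal(size=(D, D_HEAD)) for _ in range(N_HEADS)]
Wv = [rng.normal(size=(D, D_HEAD)) for _ in range(N_HEADS)]
conv1x1 = rng.normal(size=(N_HEADS, N_HEADS))  # kernel-size-1 conv across heads

def forward(X, new_stack):
    # X: (seq_len, D) original features; new_stack: (heads, D_HEAD) new features.
    heads = []
    for h in range(N_HEADS):
        q, k, v = X @ Wq[h], X @ Wk[h], X @ Wv[h]
        attn = softmax(q @ k.T / np.sqrt(D_HEAD))
        heads.append((attn @ v)[0])            # per-head target feature (first token)
    heads = np.stack(heads)                    # (heads, D_HEAD)
    pers = conv1x1 @ new_stack                 # personalized model output per head
    fourth_combined = heads + pers             # per-head combined features
    return fourth_combined.reshape(-1)         # concatenated for matching
```

The concatenated fourth combined features are then matched against the resource library.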
Optionally, in the method, when the original multimedia push model is a multitask learning model, different subtask networks in the multitask learning model use personalized models with different parameters to perform data processing.
According to a second aspect of the embodiments of the present disclosure, there is also provided a method for training a multimedia resource pushing model, including:
acquiring characteristic information of a training object, and generating an original training sample and a newly added training sample according to the characteristic information of the training object;
processing the newly added training sample through the personalized model to obtain the characteristics output by the personalized model, and processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model;
combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics;
under the condition that the training object comprises a newly added account, matching the combined features with multimedia resources in a multimedia resource library to obtain multimedia pushing information of the newly added account, wherein the newly added account comprises an account with a generation duration less than a first duration;
and comparing the multimedia pushing information of the newly added account with the characteristic information of the newly added account, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cutoff condition.
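The compare-and-update loop of the second aspect can be sketched with a toy logistic model. The synthetic features, labels, loss threshold, and learning rate below are all assumptions standing in for the patent's unspecified model, comparison result, and training cutoff condition:

```python
import numpy as np

rng = np.random.default_rng(5)
D, N = 8, 64

W = rng.normal(size=(D,)) * 0.1                    # stand-in model parameters
X = rng.normal(size=(N, D))                        # new-account training features
y = (X @ rng.normal(size=(D,)) > 0).astype(float)  # synthetic interaction labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

loss = np.inf
for step in range(500):                 # iterate until the cutoff condition holds
    p = sigmoid(X @ W)                  # predicted push information
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    if loss < 0.3:                      # hypothetical training cutoff condition
        break
    grad = X.T @ (p - y) / N            # gradient of the comparison result
    W -= 0.5 * grad                     # update the push-model parameters
```

The real method would compare full multimedia pushing information against the account's feature information, but the stop-when-the-comparison-is-good-enough structure is the same.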
Optionally, in the method, in a case that the training object includes a newly added multimedia resource, after obtaining the combined feature, the method further includes:
inputting the combined characteristics into the original multimedia pushing model, and matching the combined characteristics with the characteristics of multimedia resources to obtain multimedia pushing information of the newly added multimedia resources, wherein the newly added multimedia resources comprise multimedia resources with the generation time length less than a second time length;
and comparing the multimedia pushing information of the newly added multimedia resource with the characteristic information of the newly added multimedia resource, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cutoff condition.
Optionally, in the method, the original multimedia push model is a multitask learning model, and the personalized model is a multilayer perceptron;
the processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises: respectively converting original training samples of each subtask network in the multi-task learning model into first training characteristics with the same dimensionality as the newly added training samples of the personalized model;
the step of combining the features output by the personalized model and the features output by the original multimedia push model to obtain combined features comprises:
inputting the first training characteristics of each subtask network into the personalized model to obtain second training characteristics output by each subtask network;
and respectively splicing the second training characteristics output by each subtask network with the characteristics output by the personalized model to obtain the first training combination characteristics of each subtask network.
Optionally, in the method, the original multimedia push model is a multitask learning model, and the personalized model is a multilayer perceptron;
the processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises: respectively inputting original training samples of each subtask network in the multi-task learning model into the personalized model to obtain one-dimensional training output characteristics;
the step of combining the features output by the personalized model and the features output by the original multimedia push model to obtain combined features comprises: and performing weighted summation on the one-dimensional training output characteristics and the characteristics output by each sub-network model of the multi-task learning model to obtain second training combined characteristics.
Optionally, in the method, the original multimedia push model is a multitask learning model, and the personalized model is a convolutional neural network;
the processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises: training the multi-task learning model by using the original training samples to obtain third training characteristics of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks contained in the multi-task learning model;
the step of combining the features output by the personalized model and the features output by the original multimedia push model to obtain combined features comprises: combining the output characteristics of the personalized model with the third training characteristics of each expert network in the multi-task learning model respectively to obtain third training combination characteristics corresponding to each expert network;
and inputting the third training combination characteristic into a gating network in the multi-task learning model to obtain a fourth training combination characteristic.
Optionally, in the method, the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network, and the number of input channels and the number of output channels in the convolutional neural network are respectively the same as the number of single-head attention models included in the multi-head attention model;
the processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model comprises: inputting the original training samples into each single-head attention model in the multi-head attention model to obtain a fourth training feature output by each single-head attention network in the multi-head attention model;
the step of combining the features output by the personalized model and the features output by the original multimedia push model to obtain combined features comprises: and combining the features output by the personalized model with the fourth training features output by each single-head attention network in the multi-head attention model respectively to obtain fifth combined features corresponding to each single-head attention network.
Optionally, in the method, when the original multimedia push model is a multitask learning model, different subtask networks in the multitask learning model use personalized models with different parameters to perform data processing.
According to a third aspect of the embodiments of the present disclosure, there is also provided a multimedia asset pushing apparatus, including:
the system comprises a feature generation module, a feature extraction module and a feature extraction module, wherein the feature generation module is used for acquiring feature information of an object to be processed and generating original dimension feature data and newly added dimension feature data of the object to be processed according to the feature information;
the feature processing module is used for inputting the original dimension feature data and the newly added dimension feature data into a multimedia pushing model, the multimedia pushing model comprises an original multimedia pushing model and a personalized model, the original dimension feature data are processed through the original multimedia pushing model to obtain features output by the original multimedia pushing model, and the newly added dimension feature data are processed through the personalized model to obtain features output by the personalized model;
the combination matching module is used for matching with multimedia resources in a multimedia resource library based on the characteristics output by the original multimedia pushing model and the characteristics output by the personalized model to obtain candidate multimedia resources;
and the pushed resource determining module is used for determining the target multimedia resource to be pushed from the candidate multimedia resources.
Optionally, in the apparatus, the candidate multimedia resources include a pushing value of a preset behavior, and the determining, from the candidate multimedia resources, a target multimedia resource to be pushed includes:
and determining a target multimedia resource to be pushed from the candidate multimedia resources according to the pushing value of the preset behavior.
Optionally, in the apparatus, the candidate multimedia resources include a plurality of preset behaviors and push values corresponding to the preset behaviors, and determining, from the candidate multimedia resources, a target multimedia resource to be pushed according to the push value of the preset behavior includes:
determining a target preset behavior in the plurality of preset behaviors;
acquiring a push value of the target preset behavior in the candidate multimedia resource;
and determining the target multimedia resource to be pushed from the candidate multimedia resources according to the pushing value of the target preset behavior.
Optionally, in the apparatus, the object to be processed is a newly added account, the newly added account includes an account whose generation duration is less than a first duration, and/or the multimedia resources in the multimedia resource library include newly added multimedia resources processed by the multimedia push model, and the newly added multimedia resources include multimedia resources whose generation duration is less than a second duration.
Optionally, in the apparatus, the matching, based on the features output by the original multimedia push model and the features output by the personalized model, with the multimedia resources in a multimedia resource library to obtain candidate multimedia resources includes:
combining the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain combined characteristics;
and matching with the multimedia resources in the multimedia resource library according to the combination characteristics to obtain candidate multimedia resources.
Optionally, in the apparatus, when the original multimedia push model is a multitask learning model and the personalized model is a multilayer perceptron, the combination matching module includes:
the first target characteristic unit is used for respectively converting original dimension characteristic data of each subtask network in the multi-task learning model into first target characteristics with the same dimension as the newly added dimension characteristic data;
the second target characteristic unit is used for inputting the first target characteristics of each subtask network into the personalized model to obtain second target characteristics output by each subtask network;
the first combination unit is used for respectively splicing the second target characteristics output by each subtask network with the characteristics output by the personalized model to obtain the first combination characteristics of each subtask network;
the third target characteristic unit is used for inputting the first combined characteristic into each subtask network in the multi-task learning model to obtain a third target characteristic;
and the first matching unit is used for matching the multimedia resources in the multimedia resource library according to the third target characteristic to obtain candidate multimedia resources.
Optionally, in the apparatus, in a case that the original multimedia pushing model is a multitask learning model and the personalized model is a multilayer perceptron,
the feature processing module processes the original dimension feature data through the original multimedia push model to obtain features output by the original multimedia push model, and the feature processing module comprises: inputting the original dimension characteristic data into each subtask network in the multi-task learning model to obtain a fourth target characteristic output by each subtask network in the multi-task learning model;
the combination matching module includes:
a fifth target feature unit, configured to input original dimension feature data of each subtask network in the multi-task learning model into the personalized model, respectively, to obtain a one-dimensional fifth target feature;
the second combination unit is used for performing weighted summation on the one-dimensional fifth target feature and the fourth target feature to obtain second combination features output by all subtask networks in the multi-task learning model;
and the second matching unit is used for matching the multimedia resources in the multimedia resource library according to the second combination characteristic to obtain candidate multimedia resources.
Optionally, in the apparatus, when the original multimedia push model is a multitask learning model including an expert network, the personalized model is a convolutional neural network, the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of the expert networks included in the multitask learning model;
the feature processing module processes the original dimension feature data through the original multimedia push model to obtain features output by the original multimedia push model, and the feature processing module comprises: inputting the original dimension characteristic data into each expert network in the multi-task learning model to obtain sixth target characteristics output by each expert network in the multi-task learning model;
the combination matching module comprises:
the third combination unit is used for combining the characteristics output by the personalized model with the sixth target characteristics output by each expert network respectively to obtain third combination characteristics corresponding to each expert network;
a seventh target feature unit, configured to input the third combined feature into a gating network in the multitask learning model to obtain a seventh target feature;
and the third matching unit is used for matching the multimedia resources in the multimedia resource library according to the seventh target characteristic to obtain candidate multimedia resources.
Optionally, in the apparatus, when the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network, and the number of input channels and the number of output channels in the convolutional neural network are respectively the same as the number of single-head attention models included in the multi-head attention model,
the feature processing module processes the original dimension feature data through the original multimedia push model to obtain features output by the original multimedia push model, and the feature processing module comprises: inputting the original dimension feature data into each single-head attention model in the multi-head attention model to obtain a seventh target feature output by each single-head attention network in the multi-head attention model;
the combination matching module comprises:
a fourth combining unit, configured to combine features output by the personalized model with the seventh target features output by each single-head attention network in the multi-head attention model, respectively, to obtain fourth combined features corresponding to each single-head attention network;
and the fourth matching unit is used for matching the multimedia resources in the multimedia resource library according to the fourth combination characteristic to obtain candidate multimedia resources.
Optionally, in the apparatus, when the original multimedia push model is a multitask learning model, different subtask networks in the multitask learning model use personalized models with different parameters to perform data processing.
According to a fourth aspect of the embodiments of the present disclosure, there is also provided a training apparatus for a multimedia resource pushing model, where the multimedia resource pushing model includes an original multimedia pushing model and a personalized model, the apparatus includes:
the training sample generation module is used for acquiring the characteristic information of a training object and generating an original training sample and a newly added training sample according to the characteristic information of the training object;
the training sample processing module is used for processing the newly added training sample through the personalized model to obtain the characteristics output by the personalized model, and processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model;
the training feature processing module is used for combining the features output by the personalized model and the features output by the original multimedia pushing model to obtain combined features;
the account feature processing module is used for matching the combined features with multimedia resources in a multimedia resource library to obtain multimedia pushing information of the newly added account under the condition that the training object comprises the newly added account, wherein the newly added account comprises an account with a generation duration less than a first duration;
and the first parameter updating module is used for comparing the multimedia pushing information of the newly added account with the characteristic information of the newly added account, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cutoff condition.
Optionally, the apparatus further comprises:
the multimedia resource feature processing module is used for inputting the combined features into the original multimedia pushing model under the condition that the training object comprises the newly added multimedia resources, matching the combined features with the features of the multimedia resources to obtain multimedia pushing information of the newly added multimedia resources, wherein the newly added multimedia resources comprise the multimedia resources with the generation duration less than the second duration;
and the second parameter updating module is used for comparing the multimedia pushing information of the newly added multimedia resource with the characteristic information of the newly added multimedia resource and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cutoff condition.
Optionally, in the apparatus, in a case that the original multimedia push model is a multitask learning model and the personalized model is a multilayer perceptron,
the training sample processing module processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model includes: respectively converting original training samples of each subtask network in the multi-task learning model into first training characteristics with the same dimensionality as the newly added training samples of the personalized model;
the training feature processing module combines the features output by the personalized model and the features output by the original multimedia push model to obtain combined features, and the combined features comprise:
inputting the first training characteristics of each subtask network into the personalized model to obtain second training characteristics output by each subtask network;
and respectively splicing the second training characteristics output by each subtask network with the characteristics output by the personalized model to obtain the first training combination characteristics of each subtask network.
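The projection-then-splice combination described above can be illustrated with a minimal numpy sketch. This is not code from the patent; all dimensions, the two-layer perceptron standing in for the personalized model, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # Minimal two-layer perceptron standing in for the personalized model.
    h = np.maximum(x @ w1 + b1, 0.0)          # ReLU hidden layer
    return h @ w2 + b2

dim_new = 64                                   # assumed dimension of the added training sample
num_subtasks = 3
subtask_feats = [rng.standard_normal(128) for _ in range(num_subtasks)]

# Step 1: project each subtask's 128-d original training sample to the 64-d
# input dimension of the personalized model ("first training characteristics").
proj = rng.standard_normal((128, dim_new)) / np.sqrt(128)
first_feats = [f @ proj for f in subtask_feats]

# Step 2: run each first training characteristic through the perceptron
# ("second training characteristics").
w1 = rng.standard_normal((dim_new, 32)) * 0.1; b1 = np.zeros(32)
w2 = rng.standard_normal((32, dim_new)) * 0.1; b2 = np.zeros(dim_new)
second_feats = [mlp(f, w1, b1, w2, b2) for f in first_feats]

# Step 3: splice (concatenate) the personalized-model output with each
# subtask's second training characteristic ("first training combined features").
personalized_out = mlp(rng.standard_normal(dim_new), w1, b1, w2, b2)
combined = [np.concatenate([s, personalized_out]) for s in second_feats]
```

Each subtask ends up with a combined feature twice the width of the personalized model's output, which downstream layers of that subtask network would then consume.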
Optionally, in the apparatus, in a case that the original multimedia push model is a multitask learning model and the personalized model is a multilayer perceptron,
the training sample processing module processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model includes: respectively inputting original training samples of each subtask network in the multi-task learning model into the personalized model to obtain one-dimensional training output characteristics;
the training feature processing module combines the features output by the personalized model and the features output by the original multimedia push model to obtain combined features, and the combined features comprise: and performing weighted summation on the one-dimensional training output characteristics and the characteristics output by each sub-network model of the multi-task learning model to obtain second training combined characteristics.
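The weighted summation variant above amounts to using the personalized model's one-dimensional output as a per-subtask scalar gate. A minimal sketch, with all shapes assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
num_subtasks = 3

# One-dimensional output of the personalized model per subtask (a scalar weight).
one_dim_out = rng.uniform(0.0, 1.0, size=num_subtasks)

# Features emitted by each sub-network of the multi-task learning model.
sub_feats = rng.standard_normal((num_subtasks, 64))

# Weighted summation: scale each sub-network's features by its scalar and sum,
# yielding the "second training combined features".
second_combined = (one_dim_out[:, None] * sub_feats).sum(axis=0)
```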
Optionally, in the apparatus, in a case that the original multimedia pushing model is a multitask learning model and the personalized model is a convolutional neural network,
the training sample processing module processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model includes: training the multi-task learning model by using the original training sample to obtain third training characteristics of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of expert networks contained in the multi-task learning model;
the training feature processing module combines the features output by the personalized model and the features output by the original multimedia push model to obtain combined features, and the combined features comprise: combining the output characteristics of the personalized model with the third training characteristics of each expert network in the multi-task learning model respectively to obtain third training combination characteristics corresponding to each expert network;
and inputting the third training combination characteristic into a gating network in the multi-task learning model to obtain a fourth training combination characteristic.
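A kernel-size-1 one-dimensional convolution whose channel count equals the number of experts is just a learned mixing matrix across the expert axis. The sketch below illustrates the expert-mixing, combination, and gating steps with numpy; the element-wise addition used for "combining", the softmax gate, and all dimensions are illustrative assumptions, not the patent's prescribed operations.

```python
import numpy as np

rng = np.random.default_rng(2)
num_experts, feat_dim = 4, 32

# Third training characteristics: one feature vector per expert network.
expert_feats = rng.standard_normal((num_experts, feat_dim))

# A 1-D convolution with kernel size 1 over the feature axis, with
# num_experts input and output channels, reduces to a channel-mixing matrix.
conv_w = rng.standard_normal((num_experts, num_experts)) / num_experts
personalized_out = conv_w @ expert_feats      # shape: (num_experts, feat_dim)

# Combine the personalized output with each expert's features
# ("third training combined features"); element-wise sum as one possible choice.
third_combined = expert_feats + personalized_out

# Gating network: softmax weights over experts, producing the
# "fourth training combined features".
gate_logits = rng.standard_normal(num_experts)
gate = np.exp(gate_logits) / np.exp(gate_logits).sum()
fourth_combined = (gate[:, None] * third_combined).sum(axis=0)
```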
Optionally, in the apparatus, when the original multimedia push model is a multi-head attention model, the personalized model is a convolutional neural network, and the number of input channels and the number of output channels in the convolutional neural network are respectively the same as the number of single-head attention models included in the multi-head attention model,
the training sample processing module processing the original training sample through the original multimedia push model to obtain the characteristics output by the original multimedia push model includes: inputting the original training sample into each single-head attention model in the multi-head attention model to obtain a fourth training feature output by each single-head attention network in the multi-head attention model;
the training feature processing module combines the features output by the personalized model and the features output by the original multimedia push model to obtain combined features, and the combined features comprise: and combining the features output by the personalized model with the fourth training features output by each single-head attention network in the multi-head attention model respectively to obtain fifth combined features corresponding to each single-head attention network.
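The per-head combination can be sketched as follows. This is an illustrative numpy mock-up, not the patent's implementation: the head count, dimensions, and the element-wise sum used for "combining" are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
num_heads, d_k, seq = 2, 8, 4

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = rng.standard_normal((seq, d_k))
head_outputs = []
for _ in range(num_heads):
    wq, wk, wv = (rng.standard_normal((d_k, d_k)) * 0.1 for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(d_k))    # scaled dot-product attention
    head_outputs.append(attn @ v)             # fourth training features per head

# Personalized model: a kernel-size-1 convolution whose input/output channel
# counts equal the number of single-head attention models.
conv_w = rng.standard_normal((num_heads, num_heads)) / num_heads
stacked = np.stack(head_outputs)              # (num_heads, seq, d_k)
personalized = np.einsum('oh,hsd->osd', conv_w, stacked)

# Fifth combined features: combine per head (element-wise sum here).
fifth_combined = stacked + personalized
```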
Optionally, in the apparatus, when the original multimedia push model is a multitask learning model, different subtask networks in the multitask learning model use personalized models with different parameters to perform data processing.
In a fifth aspect of the disclosed embodiments, there is also provided a computer device, including:
at least one processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of the first and/or second aspects of the disclosure.
In a sixth aspect of the embodiments of the present disclosure, there is also provided a computer-readable storage medium, wherein instructions of the computer-readable storage medium, when executed by a processor of a computer device, enable the computer device to perform the method of any one of the first and/or second aspects of the present disclosure.
A seventh aspect of embodiments of the present disclosure further provides a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the first and/or second aspects of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in the scheme of the embodiment of the disclosure, a plurality of pieces of feature data with different dimensions can be generated based on the feature information of the object to be processed and used as the input of the multimedia push model. The original multimedia push model within the multimedia push model still processes the original dimension feature data generated based on the feature information, while another piece of newly added dimension feature data can also be generated from the same feature information and used as the input of the personalized model within the multimedia push model. Furthermore, the features output by the personalized model and the features output by the original multimedia push model can be combined and matched with the multimedia resources in the multimedia resource library to obtain candidate multimedia resources, from which the target multimedia resources to be pushed are then determined. The multimedia push model provided in the scheme of the embodiment of the disclosure fuses the personalized model into the original multimedia push model, and the feature data used by the personalized model is generated based on the original feature information, so that the feature information of a user is more fully utilized (for training or prediction) and the push effect of the whole multimedia push model is improved. Moreover, the personalized model and the original multimedia push model are combined and then used online; since the original multimedia push model still processes the original dimension feature data generated based on the feature information, the intrusion of the personalized model into the original model is greatly reduced, and the iteration difficulty of the model increases little or not at all.
By utilizing the multimedia push model provided by the disclosure to push multimedia resources, the problem that the prediction result of the multimedia push model always tends to learn the behavior of old users (the Matthew effect) is effectively reduced, and the accuracy of multimedia resource pushing is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a diagram illustrating an application environment of a multimedia asset push method according to an exemplary embodiment.
Fig. 2 is a flow chart illustrating a multimedia asset push method according to an exemplary embodiment.
Fig. 3 is a flow chart illustrating a multimedia asset push method according to an exemplary embodiment.
Fig. 4 is a flow chart illustrating a multimedia asset push method according to an exemplary embodiment.
Fig. 5 is a flow chart illustrating a multimedia asset push method according to an exemplary embodiment.
Fig. 6 is a flow chart illustrating a multimedia asset push method according to an exemplary embodiment.
Fig. 7 is a flowchart illustrating a multimedia asset push model training method according to an exemplary embodiment.
Fig. 8 is a flowchart illustrating a multimedia asset push model training method according to an exemplary embodiment.
Fig. 9 is a flowchart illustrating a multimedia asset push model training method according to an exemplary embodiment.
Fig. 10 is a flowchart illustrating a multimedia asset push model training method according to an exemplary embodiment.
Fig. 11 is a flowchart illustrating a multimedia asset push model training method according to an exemplary embodiment.
Fig. 12 is a flowchart illustrating a multimedia asset push model training method according to an exemplary embodiment.
Fig. 13 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment.
Fig. 14 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment.
Fig. 15 is a block diagram illustrating a multimedia asset pushing device according to an example embodiment.
Fig. 16 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment.
Fig. 17 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment.
FIG. 18 is a block diagram illustrating a training apparatus for a multimedia asset push model, according to an example embodiment.
FIG. 19 is a block diagram illustrating a training apparatus for a multimedia asset push model, according to an example embodiment.
FIG. 20 is a schematic block diagram illustrating an internal architecture of a computer device according to an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. For example, if the terms first, second, etc. are used to denote names, they do not denote any particular order.
It should be further noted that the information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) of the user or account or the object to be processed or the training object, etc. referred to in the present disclosure are information and data authorized by the user or sufficiently authorized by each party. The multimedia described in this disclosure may be a composite of multiple media, typically including one or more media forms of text, sound, images, video, animation, and so forth. The multimedia push model of the embodiment of the present disclosure may be used for video push, and for convenience of description, in the following embodiments, the embodiment of the present disclosure is described with video push as an application scenario, but the embodiment of the present disclosure is not limited to the application scenario of video push.
To reduce the Matthew effect of the model, some current schemes use a separate model for a new user or a new video to handle the cold start problem of the push model. The cold start problem described in the present disclosure may generally refer to business processing in situations where the amount of business data is small (it does not meet the requirements of normal business processing) or where there is no business data at all. For example, when a new user has just registered, there is no historical data of that user's watching, liking, favoriting, and other behaviors in the video application, so deciding which videos to push to the new user is a cold start. Similarly, describing or classifying a newly generated video accurately, and determining which users it is suitable to push to, may also be a cold start for the new video. At present, the cold start problem of a new user or a new video can be handled with a technical scheme based on meta-learning, but such a scheme usually only affects the initialization of the embedding (a way of converting discrete variables into continuous vectors), and it can be applied to the cold start of new users and new videos with a small amount of sample feature information or even without any data. However, in an actual scene, both a cold-started new user and a cold-started new video have sparse historical data. To guarantee the accuracy of the model, business personnel usually use as much of this historical data as possible as the sample feature information of the model whenever behavior data exists, but this approach cannot solve the problem that, with only a small amount of sample feature information, the model still tends to learn the behavior of old users.
Moreover, the scheme of independently deploying models for new users or new videos requires multiple sets of models to be deployed online at the same time, such as one set for new users (or new videos) and one set for old users, which demands more machine resources and raises the difficulty of iteration.
For the cold start problem of a new user or a new video, the embodiment scheme provided by the disclosure can construct a new-user and/or new-video model inside the original multimedia push model, so that each new user and/or each new video can be characterized by a separate personalized model. After the personalized model and the original multimedia push model are combined (the combination can be regarded as the multimedia resource push model), the combined model is used online, so that only one model needs to be deployed online and the iteration difficulty of the model increases little or not at all. Moreover, the parameters of the personalized model can be trained more fully based on the historical data of each user and video, and because the personalized model is fused into the original multimedia push model, it influences the output of the whole multimedia push model, which effectively reduces the problem that the final push model always tends to learn the behaviors of old users (the Matthew effect).
The multimedia asset pushing method provided by the present disclosure can be applied to the application environment shown in fig. 1. The server 120 may include a server for pushing multimedia resources, and may communicate with the terminal 110 to push multimedia resources to the terminal 110. Of course, the servers (including the computer devices described below) described in the embodiments of the present disclosure may be a single server, a server cluster, a distributed subsystem, a cloud processing platform, a server including a blockchain node, and a combination thereof.
The following describes an implementation scenario in which multimedia resource push processing is performed by a multimedia resource push model formed by fusing a personalized model for new users and/or new videos into an original multimedia push model. It is understood that for the new user, new video, old user, and old video described in the embodiments of the present disclosure, rules may be preset to determine whether a user is a new user or an old user and whether a video is a new video or an old video. For example, a user whose account registration time is less than one week before the training or prediction time of the multimedia push model may be defined as a new user. As described above, in the following embodiments, the scheme of the present disclosure is described with video as the multimedia resource application scene. Fig. 2 is a flowchart illustrating a multimedia asset pushing method according to an exemplary embodiment; as shown in fig. 2, the method may be implemented in the server 120 and may include the following steps.
In step S20, feature information of the object to be processed is obtained, and original dimension feature data and added dimension feature data of the object to be processed are generated according to the feature information.
The object to be processed may be an account to which the multimedia resource needs to be pushed currently. In the model training phase, the training object can be an account or a multimedia resource. When the object to be processed is an account, the characteristic information of the object to be processed may include account identification, gender, age, geographic location, hobbies, and the like. The original dimension characteristic data and the added dimension characteristic data can be generated according to the characteristic information of the object to be processed. In the embodiment of the present disclosure, the original dimension characteristic data and the newly added dimension characteristic data may both be generated based on the characteristic information, and are respectively used for data processing in different models in different multimedia push models.
The feature data added on the basis of the original dimension feature data can be called as added dimension feature data. The original dimension characteristic data and the newly added dimension characteristic data can be regarded as a plurality of pieces of characteristic data which are generated based on the same piece of characteristic information and are respectively used for different model processing. For example, the original dimension feature data of a video feature includes 128-dimensional feature data, and the 128-dimensional feature data may be newly added on the basis of the 128-dimensional feature data by using some preset transformation rules, and the newly added 128-dimensional feature data serves as new dimension feature data. In some embodiments of the present disclosure, the original dimension feature data and the added dimension feature data may have different feature dimension numbers, for example, the original dimension feature data is 128 dimensions, and the added dimension feature data may be 64 dimensions.
The original dimension characteristic data can be expanded in various ways to obtain the newly added dimension characteristic data, for example by using a GAN (generative adversarial network), or by using new data obtained through embedding learning based on the original dimension characteristic data as the added dimension characteristic data. Embedding can convert high-dimensional sparse original feature data into low-dimensional dense feature data, and the transformed embedding feature data can participate in the training of a neural network as original or newly added dimension feature data. For example, in one example, feature information Fea1 such as the identification, age, and geographical location of the user User_User1 is obtained, and the feature information Fea1 is converted by embedding to generate 128-dimensional feature data D1; this feature data D1 can be used as the original dimension feature data used by the original multimedia push model. In addition, based on the feature information of User_User1, another piece of 128-dimensional feature data D2 is generated by converting the feature information Fea1 through embedding; this feature data D2 can be used as the newly added dimension feature data of the personalized model. Of course, the above is only one example of generating the original dimension feature data and the added dimension feature data according to the feature information, and the specific processing may further include other processing steps or other processing manners such as data deformation, transformation, and combination.
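The two-embedding idea in the User_User1 example can be sketched as follows: two independent embedding tables over the same raw feature fields, one producing the original dimension feature data and one producing the added dimension feature data. This is an illustrative numpy mock-up; the field names, vocabulary sizes, and summation-based pooling are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
vocab = {"user_id": 10000, "age_bucket": 8, "region": 50}

# Two independent 128-d embedding tables over the same raw feature fields.
orig_tables = {f: rng.standard_normal((n, 128)) * 0.01 for f, n in vocab.items()}
new_tables = {f: rng.standard_normal((n, 128)) * 0.01 for f, n in vocab.items()}

def embed(tables, raw):
    # Sum the per-field embeddings into one dense vector.
    return sum(tables[f][idx] for f, idx in raw.items())

raw_fea1 = {"user_id": 42, "age_bucket": 3, "region": 7}   # stand-in for Fea1
feature_d1 = embed(orig_tables, raw_fea1)   # original dimension feature data (D1)
feature_d2 = embed(new_tables, raw_fea1)    # newly added dimension feature data (D2)
```

Because the two tables are trained (here, initialized) independently, the same raw feature information yields two distinct feature vectors, one for each sub-model.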
In step S22, the original dimension characteristic data and the newly added dimension characteristic data are input into a multimedia push model, where the multimedia push model includes an original multimedia push model and a personalized model, the original dimension characteristic data is processed by the original multimedia push model to obtain characteristics output by the original multimedia push model, and the newly added dimension characteristic data is processed by the personalized model to obtain characteristics output by the personalized model.
In this embodiment, a pre-built multimedia push model may be used, and the push model may be obtained by combining an original multimedia push model and a newly generated personalized model. The original multimedia push model may comprise a multimedia asset push model that has been previously existing or used, such as a model of currently pushing video to an old user. The original multimedia pushing model can be a multi-task model or a single-task model, such as an estimation model for judging the probability of clicking by a user.
The object to be processed may include a newly added account, where a newly added account is an account whose generation duration is less than a first duration; and/or the multimedia resources in the multimedia resource library may include newly added multimedia resources processed by the multimedia push model, where a newly added multimedia resource is a multimedia resource whose generation duration is less than a second duration. The original multimedia push model can be used for push processing of a new user (a newly added account) and training processing of a newly added multimedia resource (such as a new video): for a new user, the Matthew effect can be reduced, and multimedia resources better matching the characteristics of the new user can be obtained; for a newly added multimedia resource, more accurate multimedia push information can be produced for its characteristics. As a result, both new users and newly added multimedia resources are depicted more accurately, and the push accuracy of the whole multimedia resource system is improved. The newly added multimedia resources processed by the multimedia push model can be stored in the multimedia resource library, and the original dimension feature data generated according to the feature information of the object to be processed can be matched with the multimedia resources in the multimedia resource library to obtain the candidate multimedia resources matched by the original multimedia push model.
The personalized model may include a model selected for use by a new user or a new video in some embodiments of the present disclosure, and may be obtained by training a new training sample generated from feature information used by an original multimedia push model. In the model training stage, the trained user or video may be referred to as a training object, and the original training sample and the new training sample may be generated based on feature information of the training object. The feature information of the training object may also include other data information, such as behavior data of the user on the pushed video, such as praise, attention, watching duration, and the like. Whether the training object is a user or a video, generally, the acquired characteristic information is usually the real data information of the acquired user or video.
After the original dimension characteristic data and the newly added dimension characteristic data are obtained, the original dimension characteristic data can be processed through the original multimedia push model to obtain the characteristics output by the original multimedia push model, and the newly added dimension characteristic data can be processed through the personalized model to obtain the characteristics output by the personalized model.
In step S24, matching the multimedia resources in the multimedia resource library based on the features output by the original multimedia push model and the features output by the personalized model, so as to obtain candidate multimedia resources.
The characteristics output by the original multimedia push model and the characteristics output by the personalized model can be processed according to a model structure or other structures and parameter adjustment modes after the personalized model and the original push model are fused, for example, the characteristics output by the original multimedia push model and the characteristics output by the personalized model are combined into one or a group of new characteristics. And then, matching the multimedia resources based on the new characteristics to obtain candidate multimedia resources.
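One common way to realize "combine the two outputs, then match against the library" is to concatenate the two feature vectors and score each resource by inner-product similarity. This numpy sketch is an illustrative assumption, not the patent's prescribed matching rule; the dimensions, library size, and top-k cutoff are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(5)

orig_out = rng.standard_normal(64)   # features from the original push model
pers_out = rng.standard_normal(64)   # features from the personalized model

# Combine the two outputs into one query vector (concatenation here).
query = np.concatenate([orig_out, pers_out])

# Multimedia resource library: one 128-d feature vector per resource.
library = rng.standard_normal((1000, 128))

# Match by inner-product similarity and keep the top-k as candidate resources.
scores = library @ query
top_k = 5
candidates = np.argsort(scores)[::-1][:top_k]
```

In production such scoring would typically be served by an approximate nearest-neighbor index rather than a full matrix product.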
Therefore, when matching multimedia resources, the features output by the original multimedia push model are no longer used alone; the features output by the personalized model are fused in as well. By matching both against the multimedia resources in the multimedia resource library, candidate multimedia resources whose characteristics lean more toward the personalized output can be obtained, which effectively reduces the Matthew effect produced when the original multimedia push model performs matching according to the original dimension feature data alone, and makes the obtained candidate multimedia resources more accurate.
In step S26, a target multimedia resource to be pushed is determined from the candidate multimedia resources.
The candidate multimedia resources typically include a plurality of multimedia resources. In some solutions of this embodiment, the candidate multimedia resources may be directly pushed to the user as the target multimedia resources, or the target multimedia resources to be pushed may be determined from the candidate multimedia resources in other set manners, for example according to the output results of other models, or by further screening the candidate multimedia resources in other ways.
This embodiment can construct a new-user and new-video model (the personalized model) inside the old-user model (the original multimedia push model), so that each new user and each new video is characterized by a separate small personalized model. The personalized model and the original multimedia push model are combined to obtain the multimedia push model, and only this one multimedia push model needs to be deployed in the online application. The original dimension feature data and the newly added dimension feature data generated from the feature information of the object to be processed are input into the original multimedia push model and the personalized model respectively, and the features output by the two models are matched with the multimedia resources in the multimedia resource library to obtain candidate multimedia resources. This alleviates the Matthew effect in the whole multimedia push model, significantly reduces the resource consumption and computational processing difficulty of the multimedia push model, and improves the precision and processing efficiency of pushing multimedia resources.
In another implementation of the method provided by the present disclosure, the candidate multimedia resources include a pushing value of a preset behavior, and the determining a target multimedia resource to be pushed from the candidate multimedia resources includes:
s260: and determining a target multimedia resource to be pushed from the candidate multimedia resources according to the pushing value of the preset behavior.
The preset behaviors may include like, collect, share, and the like. Each preset behavior may be represented by a corresponding push value, which indicates a predicted value of the user performing the preset behavior on the multimedia resource and may be expressed in multiple ways, such as a probability value, a score, or a level. For example, the multitask pre-estimation scores output by the multimedia push model may be combined according to an ensemble sorting formula to obtain a total score for a certain preset behavior, and the total score may be used as the push value of that preset behavior. In this embodiment, when the multimedia push model matches a candidate multimedia resource, the push value of a certain preset behavior performed by the user on that resource may also be output at the same time. For example, suppose a candidate multimedia resource is the short video _V1, and the feature information of the user User_User2 includes a character type of "kindergarten" and an occupation of "student". If the push value that the multimedia push model outputs for the collect behavior of User_User2 on the short video _V1 is 85 points or 0.85, this indicates that the predicted collect score of User_User2 for the short video _V1 is 85 points, or that the probability of collecting it is 0.85. Similarly, if the push value matched for User_User2 on another short video _V2 is 80 points or 0.8, this indicates that the probability that User_User2 collects the short video _V2 is 0.8.
For the preset behavior of collect, the push value of the short video _V1 (0.85) is higher than that of the short video _V2 (0.8), so the short video _V1 has the higher priority. Therefore, in this embodiment, the target multimedia resource to be pushed can be determined from the candidate multimedia resources based on the push value of the preset behavior: a multimedia resource on which the user is more likely to perform the preset behavior, i.e., one with a higher push value for that behavior, is selected from the candidates. The determined target push resource is thus more likely to have the preset behavior performed on it by the user, which improves the accuracy of the target multimedia resources sent to the user, improves the user experience, makes the pushing of multimedia resources better match the expected push effect (for example, the user likes or collects the pushed video), and improves the cold-start effect for new users or new videos.
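Selection by push value can be sketched as follows; the candidate list, field names, and values are illustrative assumptions mirroring the _V1/_V2 example above, not part of the disclosed method.

```python
# Hypothetical candidate list: each entry carries a push value (here a
# probability) for the preset behavior "collect", as in the example above.
candidates = [
    {"id": "short_video_V1", "collect": 0.85},
    {"id": "short_video_V2", "collect": 0.80},
]

def pick_target(candidates, behavior):
    """Return the candidate whose push value for the behavior is highest."""
    return max(candidates, key=lambda c: c[behavior])

target = pick_target(candidates, "collect")
print(target["id"])  # short_video_V1
```

With these values the short video _V1 is selected, consistent with the priority ordering described above.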
In other implementations, the candidate multimedia resources may include a plurality of preset behaviors. Therefore, in another embodiment of the method provided by the present disclosure, the candidate multimedia resources include a plurality of preset behaviors and the push values corresponding to those preset behaviors, and determining the target multimedia resource to be pushed from the candidate multimedia resources according to the push values of the preset behaviors may include:
s2600: determining a target preset behavior in the plurality of preset behaviors;
s2602: acquiring a push value of the target preset behavior in the candidate multimedia resource;
s2604: and determining the target multimedia resource to be pushed from the candidate multimedia resources according to the pushing value of the target preset behavior.
In particular, a single one of the candidate multimedia resources may be associated with one or more preset behaviors, so the candidate set as a whole may cover one or more preset behaviors. For example, suppose 10 candidate multimedia resources are matched, of which eight correspond to the preset behaviors of like and collect and two correspond only to the preset behavior of collect, each preset behavior carrying a corresponding push value; in the candidate multimedia resource _V3, the push value of the like behavior might be 80 points and that of the collect behavior 85 points. Of course, some embodiments of the present disclosure do not exclude implementations in which some candidate multimedia resources carry no preset behavior. When determining the target multimedia resource to be pushed, a target preset behavior, i.e., the preset behavior the user is expected to perform, is first determined from the preset behaviors. There may be one or more target preset behaviors, selected from the preset behaviors included in the candidate multimedia resources; for example, the target preset behavior may be like, or like and collect. The target multimedia resource to be pushed is then determined from the candidate multimedia resources according to the push value of the target preset behavior. For example, after the target preset behavior is determined, the candidate multimedia resources that include the target preset behavior may be sorted in descending order of its push value, and a preset number of the top-ranked multimedia resources may be selected as the target multimedia resources.
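A rough sketch of steps s2600–s2604, assuming hypothetical candidate data in which each candidate maps its preset behaviors to push values (names and numbers are illustrative only):

```python
def select_targets(candidates, target_behavior, top_k):
    """Keep candidates that carry the target preset behavior, sort them by
    its push value in descending order, and return the top_k resource ids."""
    scored = [(c["values"][target_behavior], c["id"])
              for c in candidates if target_behavior in c["values"]]
    scored.sort(reverse=True)
    return [vid for _, vid in scored[:top_k]]

# Illustrative candidates mirroring the _V3 example above.
candidates = [
    {"id": "V3", "values": {"like": 0.80, "collect": 0.85}},
    {"id": "V4", "values": {"collect": 0.70}},
    {"id": "V5", "values": {"like": 0.90, "collect": 0.60}},
]
print(select_targets(candidates, "collect", top_k=2))  # ['V3', 'V4']
```

Candidates lacking the target preset behavior (here, V4 for "like") are simply skipped, matching the note above that some candidates may carry no given preset behavior.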
By means of this embodiment of the disclosure, the target multimedia resource can be determined according to the push value of the target preset behavior, so that the determined target multimedia resource better matches the preset behavior the user is expected to perform, the accuracy of pushing the target multimedia resource is improved, and the push better matches the expected push effect.
In other embodiments of the method of the present disclosure, the matching the features output based on the original multimedia push model and the features output based on the personalized model with multimedia resources in a multimedia resource library to obtain candidate multimedia resources includes:
combining the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain combined characteristics;
and matching with the multimedia resources in the multimedia resource library according to the combined features to obtain candidate multimedia resources.
The original dimension feature data is input into the original multimedia push model to obtain the features output by the original multimedia push model, and the newly added dimension feature data is input into the personalized model to obtain the features output by the personalized model. In this embodiment, the features output by the original multimedia push model and the features output by the personalized model may be combined to obtain combined features. Different combination modes can be set according to the type and structure of the original multimedia push model and/or the personalized model and the matching and/or push requirements of the multimedia resources. For example, the N-dimensional features output by the personalized model and the M-dimensional features output by the original multimedia push model may be spliced into (M + N)-dimensional combined features, or the N-dimensional features output by the personalized model may be used to further weight, normalize, or correct the features output by the original multimedia push model. The combined features thus fuse in the features output by the personalized model, which alleviates the Matthew effect in the original multimedia push model and can improve the push accuracy of the multimedia push model as a whole for new users.
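The two combination modes mentioned above can be sketched with NumPy; the dimensions M = 64 and N = 8 and the group-wise re-weighting scheme are illustrative assumptions, not specified by the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 64, 8
f_original = rng.random(M)   # features from the original push model
f_personal = rng.random(N)   # features from the personalized model

# Mode 1: splice into an (M + N)-dimensional combined feature.
combined = np.concatenate([f_original, f_personal])
assert combined.shape == (M + N,)

# Mode 2: use the N personalized values to re-weight the original features
# (here each personalized value scales one M/N-sized group; an assumption).
weights = np.repeat(f_personal, M // N)
reweighted = f_original * weights
assert reweighted.shape == (M,)
```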
The personalized model in the multimedia push model provided by the present disclosure may be of multiple types, for example an MLP (Multilayer Perceptron) model, a CNN (Convolutional Neural Network) model, an RNN (Recurrent Neural Network) model, or a Transformer model. If the personalized model is an MLP, a 216-dimensional personalized feature can be interpreted as the parameters of a three-layer MLP in which each layer has 8 input and 8 output channels, i.e., (8 × 8 + 8) × 3 = 216 parameters in total. If the personalized model is a CNN, a 72-dimensional personalized feature can be interpreted as the parameters of a one-dimensional convolutional layer whose input and output channel numbers are both 8 and whose convolution kernel size is 1, i.e., (8 × 1 + 1) × 8 = 72 parameters.
Fig. 3 is a flow diagram (partially shown) illustrating a method for multimedia asset push according to an example embodiment. In another embodiment of the method provided by the present disclosure, the multimedia push model may be obtained by using an MLP model as a personalized model, and combining the personalized model into an original multimedia push model. Specifically, the matching between the features output by the original multimedia pushing model and the features output by the personalized model and the multimedia resources in the multimedia resource library to obtain candidate multimedia resources may include:
s302: respectively converting original dimension characteristic data of each subtask network in the multi-task learning model into first target characteristics with the same dimension as the newly added dimension characteristic data;
s304: inputting the first target characteristics of each subtask network into the personalized model to obtain second target characteristics output by each subtask network;
s306: respectively splicing the second target characteristics output by each subtask network with the characteristics output by the personalized model to obtain first combined characteristics of each subtask network;
s308: inputting the first combined characteristic into each subtask network in the multi-task learning model to obtain a third target characteristic;
s310: and matching with the multimedia resources in the multimedia resource library according to the third target characteristics to obtain candidate multimedia resources.
The original multimedia push model in this embodiment may be a multitask learning model. A multitask model generally comprises a plurality of task networks, and learning the connections and differences between different tasks can improve the learning efficiency and quality of each task. The multitask learning framework can use a shared-bottom structure, in which different tasks share the bottom hidden layers, reducing the risk of overfitting. In this embodiment, the multimedia push model may use an MLP (Multilayer Perceptron) model as the personalized model, and the MLP model may be added to the towers (subtask networks) output by the multitask model. Each subtask network has corresponding input features and output features; the MLP corresponding to a subtask network is usually called a tower, and a multitask model with several tasks usually has that many towers. In this embodiment, the input features (original dimension feature data) of each subtask network in the multitask learning model may be converted into first target features with the same dimension as the input features (newly added dimension feature data) of the personalized model, and the first target features of each subtask network are then input into the personalized model to obtain the second target features of each subtask network. For example, if the personalized model is a three-layer MLP whose input and output features each have 8 dimensions, the personalized model has (8 × 8 + 8) × 3 = 216 parameters in total. The input features of each tower can then be transformed to 8 dimensions and fed through the three-layer MLP parameterized by the 216-dimensional personalized feature, yielding the second target features of each subtask network. The personalized model can be obtained by training with the newly added training samples.
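A minimal sketch of this parameterization, assuming a 216-dimensional personalized embedding is sliced into the weights and biases of a three-layer, 8-channel MLP; the ReLU activation is an assumption, as the disclosure does not name one.

```python
import numpy as np

def mlp_from_embedding(theta, channels=8, layers=3):
    """Slice a flat personalized embedding into (weight, bias) pairs for a
    small MLP whose every layer has `channels` inputs and outputs."""
    per_layer = channels * channels + channels      # 8*8 + 8 = 72
    assert theta.size == per_layer * layers         # 72 * 3 = 216
    params = []
    for i in range(layers):
        chunk = theta[i * per_layer:(i + 1) * per_layer]
        w = chunk[:channels * channels].reshape(channels, channels)
        b = chunk[channels * channels:]
        params.append((w, b))
    return params

def forward(params, x):
    for w, b in params:
        x = np.maximum(w @ x + b, 0.0)   # ReLU (assumed activation)
    return x

rng = np.random.default_rng(0)
theta = rng.random(216)   # trained per-user/per-video embedding
x = rng.random(8)         # tower input projected down to 8 dimensions
second_target = forward(mlp_from_embedding(theta), x)
assert second_target.shape == (8,)
```

Because the MLP's weights come from the per-user/per-video embedding rather than shared parameters, each new account or video effectively gets its own small model, which is the personalization mechanism described above.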
Here, the first target feature of each subtask network is input into the personalized model, and the second target feature output by each subtask network is obtained. Further, the second target characteristics output by each subtask network are spliced with the characteristics output by the personalized model respectively to obtain the first combined characteristics of each subtask network.
After the first combined feature is obtained, it is used as the new input feature of each subtask network; the subtask networks process it to produce the third target features, which are matched with the multimedia resources in the multimedia resource library to obtain candidate multimedia resources. In the implementation scenario described above, the 8-dimensional output features of the personalized model may be spliced with the features obtained after the three-layer MLP parameterized by the 216-dimensional feature to form the first combined features, which are then input into each tower of the original multimedia push model for further processing. The splicing and combining described in the embodiments of the present disclosure include, but are not limited to, end-to-end concatenation of feature data, element-wise operations (addition, multiplication, etc.) on feature data at corresponding positions, and feature vector operations. For example, the 8-dimensional features output by a tower may be spliced with the 8-dimensional features output by the personalized model to obtain 16-dimensional features for that tower.
This embodiment provides a multimedia resource pushing method in which the original multimedia push model is a multitask learning model and the personalized model is a multilayer perceptron. The features output by the personalized model are spliced and combined with the features output by each subtask network in the multitask learning model, so the resulting first combined features fuse in the features output by the personalized model, which effectively alleviates the Matthew effect in the original multimedia push model and can improve the push accuracy of the multimedia push model as a whole for new users. This embodiment provides a way of combining the MLP personalized model into each tower of the multitask model: the MLP personalized model can be effectively fused into the original multimedia push model, so that the parameters of the personalized model allow the feature information of a newly added account or newly added multimedia resource to be sufficiently trained, and the whole multimedia push model can incorporate the output features of the personalized model to reduce the Matthew effect, while the complexity of combining the personalized model with the original multimedia model, the resource requirements, and the computational complexity of the model are all reduced.
In another embodiment of the multimedia push model provided by the present disclosure, the input features of each tower can be transformed through a three-layer MLP model into a one-dimensional output feature. This one-dimensional feature can be directly combined with the original tower output by weighted summation to obtain the output result of the final push model. Fig. 4 is a flow chart illustrating a multimedia resource pushing method according to an exemplary embodiment. As shown in fig. 4, the original multimedia push model is a multitask learning model, and the personalized model is a multilayer perceptron;
s402: the processing the original dimension feature data through the original multimedia push model to obtain the features output by the original multimedia push model comprises:
inputting the original dimension characteristic data into each subtask network in the multi-task learning model to obtain a fourth target characteristic output by each subtask network in the multi-task learning model;
matching the characteristics output by the original multimedia pushing model and the characteristics output by the personalized model with multimedia resources in a multimedia resource library to obtain candidate multimedia resources comprises the following steps:
s404: respectively inputting original dimension characteristic data of each subtask network in the multi-task learning model into the personalized model to obtain a one-dimensional fifth target characteristic;
s406: carrying out weighted summation on the one-dimensional fifth target feature and the fourth target feature to obtain second combined features output by all subtask networks in the multi-task learning model;
s408: and matching with the multimedia resources in the multimedia resource library according to the second combination characteristics to obtain candidate multimedia resources.
This embodiment provides another way of combining the MLP personalized model into each tower of the multitask model: the MLP personalized model can be effectively fused into the original multimedia push model, so that the parameters of the personalized model allow the feature information of a newly added account or newly added multimedia resource to be sufficiently trained, and the whole multimedia push model can incorporate the output features of the personalized model to reduce the Matthew effect, while the complexity of combining the personalized model with the original multimedia model, the resource requirements, and the computational complexity of the model are all reduced.
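Steps s404–s406 reduce to a scalar-gated weighted sum; the mixing weight below is an assumed hyperparameter, not specified in the disclosure.

```python
import numpy as np

def combine(fourth_target, fifth_target, alpha=0.5):
    """Weighted sum of a tower's original output (fourth target feature)
    and the personalized model's one-dimensional output (fifth target).
    alpha is an assumed mixing weight."""
    return alpha * fourth_target + (1.0 - alpha) * fifth_target

tower_out = np.array([0.6])   # fourth target feature of one tower
personal_out = 0.9            # one-dimensional fifth target feature
second_combined = combine(tower_out, personal_out)
print(second_combined)        # [0.75]
```

In practice the weight could itself be learned per task or per tower; the point is only that the personalized model's scalar output shifts each tower's result toward the new account or new video.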
It should be noted that, in the embodiment of the present disclosure, the combination or fusion of the features or models involved in the application and training phase of the multimedia push model is not limited to the connection manner between the models, and may include the mutual operations of input data, output data, intermediate results, and the like between different models. As described above, the result of the weighted summation is used as the feature of the multimedia push model for matching the multimedia resource, which is also one of the implementation manners of obtaining the multimedia push model based on the combination of the original multimedia push model and the personalized model in some embodiments of the present disclosure.
The present disclosure also provides another embodiment of obtaining candidate multimedia resources by matching multimedia resources using a multimedia push model. In some embodiments, a CNN model is used as a personalized model in the multimedia push model. Fig. 5 is a flow diagram (partially shown) illustrating a method for multimedia asset push according to an example embodiment. As shown in fig. 5, the original multimedia push model is a multitask learning model including an expert network, the personalized model is a convolutional neural network, the convolutional neural network is a one-dimensional convolutional layer with a convolutional kernel size of 1, and the number of input channels and output channels of the convolutional neural network is the same as the number of the expert networks included in the multitask learning model;
s502: the processing the original dimension feature data through the original multimedia push model to obtain the features output by the original multimedia push model comprises: inputting the original dimension characteristic data into each expert network in the multi-task learning model to obtain sixth target characteristics output by each expert network in the multi-task learning model;
the matching the features output based on the original multimedia push model and the features output based on the personalized model with multimedia resources in a multimedia resource library to obtain candidate multimedia resources comprises:
s504: combining the features output by the personalized model with the sixth target features output by each expert network respectively to obtain third combined features corresponding to each expert network;
s506: inputting the third combined characteristic into a gating network in the multi-task learning model to obtain a seventh target characteristic;
s508: and matching with the multimedia resources in the multimedia resource library according to the seventh target characteristic to obtain candidate multimedia resources.
When the original multimedia push model is a multitask learning model, such as the Multi-gate Mixture-of-Experts (MMoE) model, and the personalized model is a convolutional neural network, the output features of the personalized model are combined with the sixth target features output by each expert network in the MMoE multitask learning model to obtain the third combined features corresponding to each expert network, where the convolutional neural network is a one-dimensional convolutional layer with a convolution kernel size of 1, and the numbers of input channels and output channels of the convolutional neural network are the same as the number of expert networks included in the MMoE multitask learning model;
A general multitask learning model usually shares the hidden layers close to the input layer as a whole. MMoE adapts to multitask learning by sharing a sub-model, the expert network, between tasks: it divides the shared bottom representation layer into multiple experts and sets gates so that different tasks can use the shared layer differently. Taking an 8-task model with an 8-expert MMoE as an example, in the conventional approach the outputs of the 8 experts are combined through a gating network. However, the parameters of the gating network are the same for different video IDs and different user IDs, so no personalization is achieved; the gating network's parameters tend to be better learned for old users with more behavior data, while new users with little behavior and new videos are neglected, producing a Matthew effect. In the scheme provided by the disclosure, the CNN personalized model has the same number of input and output channels, for example a one-dimensional convolutional layer with 8 input channels, 8 output channels, and a convolution kernel size of 1. Based on this convolutional layer, the output features of the eight experts can be combined in a personalized way with the features of the newly added account and/or newly added multimedia resource, and the combination is then processed by the gating network. Each newly added account and/or newly added multimedia resource thus has its own way of combining the experts, which on the one hand alleviates the Matthew effect and lets the newly added account and/or multimedia resource learn an expert combination suited to it.
On the other hand, through output fusion of a plurality of experts, information can be transmitted among the experts, and the situation that some of the experts are degraded into noise and zero output is avoided.
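A sketch of this personalized expert mixing, assuming a 72-dimensional per-item embedding interpreted as the kernel-size-1 convolution described above; names and the 16-dimensional expert output are illustrative.

```python
import numpy as np

def mix_experts(expert_outputs, theta, n=8):
    """expert_outputs: (n, d) matrix, one d-dimensional feature per expert.
    theta: 72-dim personalized embedding read as a 1-D conv with kernel
    size 1: an n x n mixing matrix plus n biases, (8*1 + 1) * 8 = 72."""
    assert theta.size == (n * 1 + 1) * n
    w = theta[:n * n].reshape(n, n)   # (out_channels, in_channels)
    b = theta[n * n:]
    # Kernel size 1 makes the conv a plain matrix multiply over the
    # expert (channel) axis, applied at every feature position.
    return w @ expert_outputs + b[:, None]

rng = np.random.default_rng(0)
experts = rng.random((8, 16))   # 8 experts, 16-dim outputs each
theta = rng.random(72)          # per-account/per-video embedding
mixed = mix_experts(experts, theta)
assert mixed.shape == (8, 16)
```

Because every row of the mixing matrix draws on all eight experts, information flows between them, which is the degradation-avoidance property noted above.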
The present disclosure also provides another implementation way of using the CNN model as a personalized model and adding the personalized model to the original multimedia push model to generate a multimedia push model for multimedia resource push. Fig. 6 is a flow diagram (partially shown) illustrating a method for multimedia asset push according to an example embodiment. As shown in fig. 6, in this embodiment, the original multimedia push model may be a multi-task learning model including a multi-head attention model, where the number of input channels and the number of output channels in the convolutional neural network are respectively the same as the number of single-head attention models included in the multi-head attention model;
s602: the processing the original dimension feature data through the original multimedia push model to obtain the features output by the original multimedia push model comprises: inputting the original dimension feature data into each single attention model in the multi-head attention model to obtain a seventh target feature output by each single attention network in the multi-head attention model;
the matching the features output based on the original multimedia push model and the features output based on the personalized model with multimedia resources in a multimedia resource library to obtain candidate multimedia resources comprises:
s604: combining the features output by the personalized model with the seventh features output by each single-head attention network in the multi-head attention model respectively to obtain fourth combined features corresponding to each single-head attention network;
s606: and matching with the multimedia resources in the multimedia resource library according to the fourth combination characteristic to obtain candidate multimedia resources.
In this embodiment of the present disclosure, the CNN model may be added to a multi-head attention model. Taking multi-head attention with 8 heads as an example, the attention weights obtained by the 8 heads are independent in the conventional approach, and in practice several heads often degenerate into sum pooling (a pooling operation in a neural network). In this scheme, the attention weights of different heads are combined by a one-dimensional convolutional layer with 8 input and 8 output channels, so that personalized multimedia resource pushing can be realized while attention degradation is effectively avoided, the Matthew effect is reduced, the processing complexity of the multimedia push model can be lowered, and the efficiency with which the multimedia push model pushes multimedia resources is improved.
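The head-mixing idea can be sketched as below; applying the kernel-size-1 convolution to the heads' pre-softmax attention scores is an assumption, as the disclosure does not pin down the exact insertion point.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mix_heads(attn_logits, w, b):
    """attn_logits: (heads, seq_len) per-head attention scores.
    w: (heads, heads) kernel-size-1 conv mixing matrix, b: (heads,) bias,
    so each account/video gets its own combination of the 8 heads."""
    mixed = w @ attn_logits + b[:, None]
    return softmax(mixed, axis=-1)

rng = np.random.default_rng(0)
logits = rng.random((8, 20))              # 8 heads over a 20-item sequence
w = rng.random((8, 8)); b = np.zeros(8)   # from a personalized embedding
attn = mix_heads(logits, w, b)
assert attn.shape == (8, 20)
assert np.allclose(attn.sum(axis=-1), 1.0)
```

Mixing before the softmax keeps each resulting head a valid attention distribution while coupling the heads, which is what prevents individual heads from collapsing into sum pooling.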
In the foregoing method embodiments, when the original multimedia push model is a multitask learning model, different subtask networks in the multitask learning model may perform data processing with personalized models that have different parameters; alternatively, different towers (subtask networks) in the multitask learning model may use personalized models with the same parameters. In a specific example scenario, only two personalized models, obtained from the video ID and the user ID respectively, may be used, or more personalized models obtained from other video and user feature parameters may be used. Each personalized model can be combined with the original multimedia push model to form a multimedia push model, such as a multimedia push model for new users, a multimedia push model for new videos, or one that targets new users and new videos simultaneously. When different subtask networks in the multitask learning model use personalized models with different parameters, each task has its own personalized feature output, which can further reduce the Matthew effect of the whole multimedia push model, realize personalized pushing of multimedia resources, and improve the user's multimedia pushing experience.
Based on the foregoing description of the embodiment of the multimedia resource pushing method, the present disclosure also provides a training method for a multimedia resource pushing model. Specifically, fig. 7 is a flowchart illustrating a method for training a multimedia asset push model according to an exemplary embodiment. As shown in fig. 7, a method for training a multimedia resource pushing model, where the multimedia resource pushing model includes an original multimedia pushing model and a personalized model, includes:
s702: acquiring characteristic information of a training object, and generating an original training sample and a newly added training sample according to the characteristic information of the training object;
s704: processing the newly added training sample through the personalized model to obtain the characteristics output by the personalized model, and processing the original training sample pair through the original multimedia push model to obtain the characteristics output by the original multimedia push model;
s706: combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics;
s708: under the condition that the training object comprises a newly added account, matching the combined features with multimedia resources in a multimedia resource library to obtain multimedia pushing information of the newly added account, wherein the newly added account comprises an account with the generation time length being less than the first time length;
s710: and comparing the multimedia pushing information of the newly added account with the characteristic information of the newly added account, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cutoff condition.
The multimedia resource pushing model obtained by the multimedia resource pushing model training method provided by the embodiment of the disclosure can be applied to implementation scenes comprising the multimedia resource pushing method. The method for training the multimedia resource pushing model mainly comprises the steps of building a personalized model through a newly added training sample, and training an original multimedia pushing model and the personalized model by using the newly added training sample and an original training sample, wherein the newly added training sample is generated based on characteristic information for generating the original training sample. And fusing the personalized model into the original multimedia push model, wherein the specific fusion can comprise the adjustment of the model structure and parameters, the different combination processing of the input and output characteristic data of each model and the like. The training object can comprise an account which is newly added and can also comprise a multimedia resource which is newly added. Generally, the feature information of the real training object, for example, a preset behavior acquired and obtained when a certain newly added account actually approves a certain video, may be used as the feature information of the newly added account as the training object. In the model training process, multimedia push information output by the model can be obtained, for example, the probability that the newly added account approves the video is predicted, then the multimedia push information output by the model is compared with the preset behavior actually generated by the newly added account, and then the network parameters of the whole multimedia push model are updated according to the comparison result.
In actual model training, user features, video features, and the user's historical behaviors (such as liking a certain pushed video, or the watch duration of certain pushed videos) may be taken as input, and a score for the user performing a certain behavior on a video, such as the probability of liking it, can be estimated. The estimated score is then compared with the user's actual behavior in the training sample, for example by computing a binary cross-entropy (BCE) loss, and the network parameters of the push model are updated accordingly to optimize the push model.
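As a concrete illustration of the comparison step, the following is a minimal sketch of the BCE loss mentioned above; the score and label values are invented for illustration, not taken from the patent:

```python
import numpy as np

def bce_loss(predicted, actual, eps=1e-7):
    """Binary cross-entropy between predicted like-probabilities
    and the behavior actually observed in the training samples."""
    p = np.clip(predicted, eps, 1 - eps)
    return float(np.mean(-(actual * np.log(p) + (1 - actual) * np.log(1 - p))))

# estimated like-probabilities for three (user, video) samples (illustrative values)
scores = np.array([0.9, 0.2, 0.7])
# labels: 1 = the user actually liked the video, 0 = did not
labels = np.array([1.0, 0.0, 1.0])

loss = bce_loss(scores, labels)  # the comparison result used to update parameters
```

A lower loss means the estimated scores agree better with the actual behaviors; an optimizer would use its gradient to update the push model's network parameters.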
According to the training method for the multimedia push model provided by the embodiments of the disclosure, a personalized model for new users can be constructed inside the original multimedia push model, so that each new user is characterized by a separate personalized model. After the personalized model and the original multimedia push model are combined (the combination can be regarded as the multimedia resource pushing model), the combined model is used online, so only one model needs to be deployed online and the iteration difficulty of the model increases little, if at all. Moreover, the parameters of the personalized model can be trained more fully on the feature information of each user, and because the personalized model is fused into the original multimedia push model, it influences the output of the whole multimedia push model, effectively reducing the tendency of the original multimedia push model to learn mainly the behaviors of old users (the Matthew effect).
Fig. 8 is a flowchart illustrating a method for training a multimedia asset push model according to an exemplary embodiment. As shown in fig. 8, in the case that the training object includes a newly added multimedia resource, after obtaining the combined feature, the method may further include:
s802: inputting the combined features into the original multimedia push model, and matching them with the features of multimedia resources to obtain multimedia push information of the newly added multimedia resource, where the newly added multimedia resource includes multimedia resources whose generation duration is less than a second duration;
s804: and comparing the multimedia pushing information of the newly added multimedia resource with the characteristic information of the newly added multimedia resource, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cutoff condition.
The training method for the multimedia push model provided by the embodiments of the disclosure can construct a personalized model for newly added multimedia (such as a new video) inside the original multimedia push model, so that each newly added multimedia resource is represented by a separate personalized model. After the personalized model and the original multimedia push model are combined (the combination can be regarded as the multimedia resource pushing model), the combined model is used online, so only one model needs to be deployed online and the iteration difficulty of the model increases little, if at all. Moreover, the parameters of the personalized model can be trained more fully on the feature information of each newly added multimedia resource, and because the personalized model is fused into the original multimedia push model, it influences the output of the whole multimedia push model, effectively reducing the Matthew effect of the original multimedia push model.
Fig. 9 is a flowchart (partially shown) illustrating a method for training a multimedia asset push model according to an exemplary embodiment. As shown in fig. 9, the original multimedia push model is a multitask learning model, and the personalized model is a multilayer perceptron;
s902: the processing of the original training samples through the original multimedia push model to obtain the features output by the original multimedia push model includes: converting the original training samples of each subtask network in the multi-task learning model into first training features with the same dimensionality as the newly added training samples of the personalized model;
the step of combining the features output by the personalized model and the features output by the original multimedia push model to obtain combined features comprises:
s904: inputting the first training characteristics of each subtask network into the personalized model to obtain second training characteristics output by each subtask network;
s906: and respectively splicing the second training characteristics output by each subtask network with the characteristics output by the personalized model to obtain the first training combination characteristics of each subtask network.
In the multimedia push model training process of this embodiment, the features output by the personalized model and the features output by the original multimedia push model are combined, and the network model is trained on the combined features, so the Matthew effect in the original multimedia push model is alleviated and the push accuracy of the whole multimedia push model for new users can be improved.
Fig. 10 is a flowchart (partially shown) illustrating a method for training a multimedia asset push model according to an exemplary embodiment. As shown in fig. 10, the original multimedia push model is a multitask learning model, and the personalized model is a multilayer perceptron;
s1002: the processing of the original training samples through the original multimedia push model to obtain the features output by the original multimedia push model includes: inputting the original training samples of each subtask network in the multi-task learning model into the personalized model to obtain one-dimensional training output features;
s1004: the step of combining the features output by the personalized model and the features output by the original multimedia push model to obtain combined features comprises: and performing weighted summation on the one-dimensional training output characteristics and the characteristics output by each sub-network model of the multi-task learning model to obtain second training combined characteristics.
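The weighted summation of s1002 and s1004 can be sketched under the same hedged assumptions; the scalar, the mixing weight, and all dimensions are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, num_subtasks = 8, 3

# s1002: one-dimensional training output feature from the personalized model
personal_scalar = 0.35          # illustrative value

# features output by each subtask (tower) of the multi-task learning model
tower_feats = [rng.normal(size=d) for _ in range(num_subtasks)]

# s1004: weighted summation of the one-dimensional output with each tower's features
alpha = 0.6                     # illustrative mixing weight
second_train_combined = [(1 - alpha) * f + alpha * personal_scalar for f in tower_feats]
```

Because the personalized contribution is a single scalar per tower, this variant adds very little computation to the multi-task model, which matches the reduced-complexity claim below.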
This embodiment provides another way of combining the MLP personalized model into each tower of the multi-task model. The MLP personalized model can be effectively fused into the original multimedia push model, so the parameters of the personalized model allow the feature information of a newly added account or newly added multimedia resource to be sufficiently trained, and the whole multimedia push model can process data in combination with the output features of the personalized model to reduce the Matthew effect. At the same time, the complexity of combining the personalized model with the original multimedia model is reduced, lowering both the resource requirements and the computational complexity of the model.
Fig. 11 is a flow diagram (partially shown) illustrating a method for training a multimedia asset push model according to an exemplary embodiment. As shown in fig. 11, the original multimedia push model is a multitask learning model, and the personalized model is a convolutional neural network;
s1102: the processing of the original training samples through the original multimedia push model to obtain the features output by the original multimedia push model includes: training the multi-task learning model with the original training samples to obtain third training features of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolution kernel size of 1, and the numbers of input channels and output channels of the convolutional neural network are the same as the number of expert networks contained in the multi-task learning model;
the step of combining the features output by the personalized model and the features output by the original multimedia push model to obtain combined features comprises:
s1104: combining the output characteristics of the personalized model with the third training characteristics of each expert network in the multi-task learning model respectively to obtain third training combination characteristics corresponding to each expert network;
s1106: and inputting the third training combination characteristic into a gating network in the multi-task learning model to obtain a fourth training combination characteristic.
In this embodiment, the original multimedia push model may be a multi-task learning model, such as an MMoE (Multi-gate Mixture-of-Experts) model, and the personalized model is a convolutional neural network. Based on the convolutional layer, the output features of multiple experts and the features of a newly added account and/or a newly added multimedia resource can be combined in a personalized way and then processed through the gating network.
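The expert-combining convolution and the gating network described above can be sketched as follows; because the kernel size is 1 and the channel counts equal the number of experts, the one-dimensional convolution reduces to an expert-mixing matrix (all sizes and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
num_experts, d = 4, 8

expert_feats = rng.normal(size=(num_experts, d))   # third training features, one row per expert
# 1-D conv, kernel size 1, in/out channels == num_experts -> a channel-mixing matrix
conv_w = rng.normal(size=(num_experts, num_experts))
third_combined = conv_w @ expert_feats             # third training combined features (s1104)

# gating network: softmax weights over experts, then a weighted sum of rows (s1106)
gate_logits = rng.normal(size=num_experts)
gate = np.exp(gate_logits) / np.exp(gate_logits).sum()
fourth_combined = gate @ third_combined            # fourth training combined feature
```

The mixing matrix lets every expert's output absorb the personalized signal before the gate aggregates them, which is the mechanism the embodiment relies on.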
Fig. 12 is a flowchart illustrating a method for training a multimedia asset push model according to an exemplary embodiment. As shown in fig. 12, the original multimedia push model is a multi-headed attention model, the personalized model is a convolutional neural network, and the number of input channels and the number of output channels in the convolutional neural network are respectively the same as the number of single-headed attention models included in the multi-headed attention model;
s1202: the processing of the original training samples through the original multimedia push model to obtain the features output by the original multimedia push model includes: inputting the original training samples into each single-head attention model in the multi-head attention model to obtain a fourth training feature output by each single-head attention network in the multi-head attention model;
s1204: the step of combining the features output by the personalized model and the features output by the original multimedia push model to obtain combined features comprises: and combining the features output by the personalized model with the fourth training features output by each single-head attention network in the multi-head attention model respectively to obtain fifth combined features corresponding to each single-head attention network.
In this embodiment of the disclosure, a CNN model may be added to a multi-head attention model. By combining the attention weights of the different heads through a one-dimensional convolutional layer whose numbers of input and output channels equal the number of single-head attention models contained in the multi-head attention model, personalized multimedia resource pushing can be realized without degrading the efficiency of the attention computation, thereby reducing the Matthew effect, lowering the processing complexity of the multimedia push model, and improving the efficiency of multimedia resource pushing.
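Analogously to the expert case, the head-mixing convolution in this embodiment can be sketched as a head-mixing matrix over the per-head outputs (sizes and random weights are illustrative, not the patented parameters):

```python
import numpy as np

rng = np.random.default_rng(3)
num_heads, d = 4, 8

head_feats = rng.normal(size=(num_heads, d))   # fourth training features, one row per head
# kernel-size-1 conv with in/out channels == num_heads acts as a head-mixing matrix
mix_w = rng.normal(size=(num_heads, num_heads))
fifth_combined = mix_w @ head_feats            # fifth combined features, one row per head
```

Each row of `fifth_combined` corresponds to one single-head attention network, now blended with the personalized signal carried by the convolution weights.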
As mentioned above, in another embodiment of the training method provided by the disclosure, when the original multimedia push model is a multi-task learning model, different subtask networks in the multi-task learning model use personalized models with different parameters for data processing. This gives each task its own personalized feature output, which can further reduce the Matthew effect of the whole multimedia push model, realize personalized multimedia resource pushing, and improve the user's experience of pushed multimedia resources.
For operations in the above training embodiments that are the same as or similar to those of the multimedia resource pushing method used in the online application, the specific manner of execution has been described in detail in the method embodiments and will not be elaborated here.
It is understood that the embodiments of the method described above are described in a progressive manner, and the same/similar parts of the embodiments are referred to each other, and each embodiment focuses on differences from the other embodiments. Reference may be made to the description of other method embodiments for relevant points.
It should be understood that, although the steps in the flowcharts referred to in the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily executed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the above description of the embodiments of the multimedia resource pushing method and the training method of the multimedia resource pushing model, the present disclosure also provides a multimedia resource pushing apparatus and a training apparatus for the multimedia resource pushing model. The apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients, etc. that use the methods described in the embodiments of this specification, in conjunction with any necessary hardware. Based on the same innovative concept, the embodiments of the present disclosure provide apparatuses in one or more embodiments as described below. Since the implementation scheme by which the apparatus solves the problem is similar to that of the method, the specific implementation of the apparatus in the embodiments of this specification may refer to the implementation of the foregoing method, and repeated details are not repeated. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatuses described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 13 is a block diagram illustrating a multimedia asset pushing device according to an exemplary embodiment. The apparatus may be the aforementioned server, or a module, component, device, unit, etc. integrated with the server. Referring specifically to fig. 13, the apparatus 100 may include:
the feature generation module 1302 may be configured to obtain feature information of an object to be processed, and generate original dimension feature data and newly added dimension feature data of the object to be processed according to the feature information;
a feature processing module 1304, configured to input the original dimension feature data and the newly added dimension feature data into a multimedia push model, where the multimedia push model includes an original multimedia push model and a personalized model, process the original dimension feature data through the original multimedia push model to obtain features output by the original multimedia push model, and process the newly added dimension feature data through the personalized model to obtain features output by the personalized model;
the combination matching module 1306 may be configured to match, based on the features output by the original multimedia push model and the features output by the personalized model, multimedia resources in a multimedia resource library to obtain candidate multimedia resources;
a pushed resource determining module 1308, configured to determine a target multimedia resource to be pushed from the candidate multimedia resources.
In another embodiment of the apparatus provided by the present disclosure, the candidate multimedia resources include a pushing value of a preset behavior, and the determining, from the candidate multimedia resources, a target multimedia resource to be pushed includes:
and determining a target multimedia resource to be pushed from the candidate multimedia resources according to the pushing value of the preset behavior.
In another embodiment of the apparatus provided by the present disclosure, the candidate multimedia resources include a plurality of preset behaviors and push values corresponding to the preset behaviors, and determining, from the candidate multimedia resources, a target multimedia resource to be pushed according to the push value of the preset behavior includes:
determining a target preset behavior in the plurality of preset behaviors;
acquiring a push value of the target preset behavior in the candidate multimedia resource;
and determining the target multimedia resource to be pushed from the candidate multimedia resources according to the pushing value of the target preset behavior.
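The selection by push value described above can be illustrated with a minimal sketch; the candidate list, identifiers, and values are invented for illustration:

```python
# each candidate carries a push value for the target preset behavior
# (e.g., a predicted like probability)
candidates = [
    {"id": "video_1", "push_value": 0.72},
    {"id": "video_2", "push_value": 0.91},
    {"id": "video_3", "push_value": 0.55},
]

# the target multimedia resource is the candidate with the highest push value
target = max(candidates, key=lambda c: c["push_value"])
```

In practice the top-k candidates might be pushed rather than a single maximum, but the ranking criterion is the same.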
In another embodiment of the apparatus provided by the present disclosure, the object to be processed is a newly added account, the newly added account includes an account whose generation duration is less than a first duration, and/or the multimedia resources in the multimedia resource library include newly added multimedia resources processed by the multimedia push model, and the newly added multimedia resources include multimedia resources whose generation duration is less than a second duration.
In another embodiment of the apparatus provided by the present disclosure, the matching, based on the features output by the original multimedia push model and the features output by the personalized model, with the multimedia resources in a multimedia resource library to obtain candidate multimedia resources includes:
combining the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain combined characteristics;
and matching with the multimedia resources in the multimedia resource library according to the combination characteristics to obtain candidate multimedia resources.
Fig. 14 is a block diagram (partially not shown) of a multimedia asset pushing apparatus according to an exemplary embodiment. Referring to fig. 14, in the case that the original multimedia push model is a multitask learning model and the personalized model is a multi-layer perceptron, the combination matching module 1306 may include:
a first target feature unit 1402, configured to convert original dimension feature data of each sub-task network in the multi-task learning model into first target features having the same dimension as the newly added dimension feature data, respectively;
a second target feature unit 1404, configured to input the first target feature of each subtask network into the personalized model, to obtain a second target feature output by each subtask network;
a first combining unit 1406, configured to splice the second target features output by each subtask network with the features output by the personalized model, respectively, to obtain first combined features of each subtask network;
a third target feature unit 1408, configured to input the first combined feature into each subtask network in the multi-task learning model to obtain a third target feature;
the first matching unit 1410 may be configured to match the multimedia resources in the multimedia resource library according to the third target feature, so as to obtain candidate multimedia resources.
Fig. 15 is a block diagram (partially not shown) of a multimedia asset pushing apparatus according to an exemplary embodiment. Referring to fig. 15, in the case that the original multimedia push model is a multitask learning model and the personalized model is a multilayer perceptron, the processing of the original dimension feature data by the feature processing module 1304 through the original multimedia push model to obtain the features output by the original multimedia push model includes: inputting the original dimension feature data into each subtask network in the multi-task learning model to obtain a fourth target feature output by each subtask network in the multi-task learning model;
the combination matching module 1306 includes:
a fifth target feature unit 1502, configured to input original dimension feature data of each subtask network in the multi-task learning model into the personalized model, respectively, to obtain a one-dimensional fifth target feature;
a second combining unit 1504, configured to perform weighted summation on the one-dimensional fifth target feature and the fourth target feature to obtain second combined features output by all subtask networks in the multi-task learning model;
the second matching unit 1506 may be configured to match the multimedia resources in the multimedia resource library according to the second combination feature, so as to obtain candidate multimedia resources.
Fig. 16 is a block diagram (partially not shown) of a multimedia asset pushing apparatus according to an exemplary embodiment. Referring to fig. 16, in the case that the original multimedia push model is a multitask learning model including expert networks, the personalized model is a convolutional neural network, namely a one-dimensional convolutional layer with a convolution kernel size of 1, and the numbers of input channels and output channels of the convolutional neural network are the same as the number of expert networks included in the multi-task learning model;
the feature processing module 1304 processes the original dimension feature data through the original multimedia push model, and obtaining features output by the original multimedia push model includes: inputting the original dimension characteristic data into each expert network in the multi-task learning model to obtain sixth target characteristics output by each expert network in the multi-task learning model;
the combination matching module 1306 includes:
a third combining unit 1602, configured to combine the features output by the personalized model with the sixth target features output by each of the expert networks, respectively, to obtain third combined features corresponding to each of the expert networks;
a seventh target feature unit 1604, configured to input the third combined feature into a gating network in the multitasking learning model, so as to obtain a seventh target feature;
the third matching unit 1606 may be configured to match the multimedia resource in the multimedia resource library according to the seventh target feature, so as to obtain a candidate multimedia resource.
Fig. 17 is a block diagram (partially not shown) of a multimedia asset pushing apparatus according to an exemplary embodiment. Referring to fig. 17, in the case that the original multimedia push model is a multi-headed attention model, the personalized model is a convolutional neural network, and the numbers of input channels and output channels of the convolutional neural network are respectively the same as the number of single-headed attention models included in the multi-headed attention model,
the feature processing module 1304 processes the original dimension feature data through the original multimedia push model, and obtaining the features output by the original multimedia push model includes: inputting the original dimension feature data into each single-head attention model in the multi-head attention model to obtain a seventh target feature output by each single-head attention network in the multi-head attention model;
the combination matching module 1306 includes:
a fourth combining unit 1702, configured to combine the features output by the personalized model with the seventh target features output by each single-headed attention network in the multi-headed attention model, respectively, to obtain fourth combined features corresponding to each single-headed attention network;
a fourth matching unit 1704, configured to match the multimedia resource in the multimedia resource library according to the fourth combined feature, so as to obtain a candidate multimedia resource.
In another embodiment of the apparatus provided by the present disclosure, in a case that the original multimedia push model is a multitask learning model, different subtask networks in the multitask learning model use personalized models with different parameters for data processing.
Fig. 18 is a block diagram of a training apparatus for a multimedia asset push model according to an exemplary embodiment. The apparatus may be the aforementioned server, or a module, component, device, unit, etc. integrated with the server. Referring specifically to fig. 18, the multimedia resource pushing model includes an original multimedia pushing model and a personalized model, and the apparatus 200 may include:
a training sample generation module 1802, configured to obtain feature information of a training object, and generate an original training sample and a new training sample according to the feature information of the training object;
a training sample processing module 1804, configured to process the newly added training sample through the personalized model to obtain features output by the personalized model, and process the original training samples through the original multimedia push model to obtain features output by the original multimedia push model;
a training feature processing module 1806, configured to combine features output by the personalized model with features output by the original multimedia push model to obtain combined features;
an account feature processing module 1808, configured to, when the training object includes a newly added account, match the combined feature with a multimedia resource in a multimedia resource library to obtain multimedia push information of the newly added account, where the newly added account includes an account whose generation duration is less than a first duration;
the first parameter updating module 1810 may be configured to compare the multimedia pushing information of the newly added account with the feature information of the newly added account, and update the parameter of the multimedia resource pushing model according to a comparison result until the comparison result meets a model training cutoff condition.
Fig. 19 is a block diagram of a training apparatus for a multimedia asset push model according to an exemplary embodiment. Referring to fig. 19, the apparatus 200 may further include:
the multimedia resource feature processing module 1902, configured to, when the training object includes a newly added multimedia resource, input the combined feature into the original multimedia push model, and match the combined feature with a feature of the multimedia resource to obtain multimedia push information of the newly added multimedia resource, where the newly added multimedia resource includes a multimedia resource whose generation duration is less than a second duration;
the second parameter updating module 1904 may be configured to compare the multimedia pushing information of the newly added multimedia resource with the feature information of the newly added multimedia resource, and update the parameter of the multimedia resource pushing model according to a comparison result until the comparison result meets a model training cutoff condition.
In another embodiment of the apparatus provided by the present disclosure, in a case that the original multimedia pushing model is a multitasking learning model and the personalized model is a multilayer perceptron;
the training sample processing module 1804 processes the original training samples through the original multimedia push model, and obtaining the features output by the original multimedia push model includes: converting the original training samples of each subtask network in the multi-task learning model into first training features with the same dimensionality as the newly added training samples of the personalized model;
the training feature processing module 1806 combines the features output by the personalized model and the features output by the original multimedia push model, and obtaining combined features includes:
inputting the first training characteristics of each subtask network into the personalized model to obtain second training characteristics output by each subtask network;
and respectively splicing the second training characteristics output by each subtask network with the characteristics output by the personalized model to obtain the first training combination characteristics of each subtask network.
In another embodiment of the apparatus provided by the present disclosure, in a case that the original multimedia pushing model is a multitasking learning model and the personalized model is a multilayer perceptron;
the training sample processing module 1804 processes the original training samples through the original multimedia push model, and obtaining the features output by the original multimedia push model includes: inputting the original training samples of each subtask network in the multi-task learning model into the personalized model to obtain one-dimensional training output features;
the training feature processing module 1806 combines the features output by the personalized model and the features output by the original multimedia push model, and obtaining combined features includes: and performing weighted summation on the one-dimensional training output characteristics and the characteristics output by each sub-network model of the multi-task learning model to obtain second training combined characteristics.
In another embodiment of the apparatus provided by the present disclosure, in case that the original multimedia pushing model is a multitasking learning model and the personalized model is a convolutional neural network,
the training sample processing module 1804 processes the original training samples through the original multimedia push model, and obtaining the features output by the original multimedia push model includes: training the multi-task learning model with the original training samples to obtain third training features of each expert network in the multi-task learning model; the convolutional neural network is a one-dimensional convolutional layer with a convolution kernel size of 1, and the numbers of input channels and output channels of the convolutional neural network are the same as the number of expert networks contained in the multi-task learning model;
the training feature processing module 1806 combines the features output by the personalized model and the features output by the original multimedia push model, and obtaining combined features includes: combining the output characteristics of the personalized model with the third training characteristics of each expert network in the multi-task learning model respectively to obtain third training combination characteristics corresponding to each expert network;
and inputting the third training combination characteristic into a gating network in the multi-task learning model to obtain a fourth training combination characteristic.
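A kernel-size-1 one-dimensional convolution is equivalent to a per-position linear map across the expert channels. The sketch below illustrates this equivalence together with the subsequent gated combination; the weight values, the gate values, and the use of addition as the combination operator are assumptions for illustration only:

```python
import numpy as np

def conv1d_k1(x, weight, bias):
    """Kernel-size-1 Conv1d: a per-position linear map across channels.
    x: (num_experts, d) -- one channel per expert network
    weight: (num_experts, num_experts), bias: (num_experts,)
    """
    return weight @ x + bias[:, None]

def gated_combine(expert_feats, personalized_out, gate_weights):
    """Combine each expert's third training feature with the personalized
    (conv) output, then mix the per-expert combinations with the gating
    network's weights to form a single fourth combined feature."""
    combined = expert_feats + personalized_out           # third training combined features
    return np.tensordot(gate_weights, combined, axes=1)  # fourth combined feature: (d,)

num_experts, d = 3, 4
experts = np.stack([np.full(d, float(i + 1)) for i in range(num_experts)])
W = np.eye(num_experts)             # hypothetical conv weights (identity for clarity)
b = np.zeros(num_experts)
conv_out = conv1d_k1(experts, W, b)
gates = np.array([0.2, 0.3, 0.5])   # assumed softmax-normalized gate weights
fused = gated_combine(experts, conv_out, gates)
```

Matching the channel count to the number of expert networks is what lets the convolution produce one personalized feature per expert, so the gating network can weigh experts exactly as in an ordinary mixture-of-experts layer.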
In another embodiment of the apparatus provided by the present disclosure, in a case that the original multimedia push model is a multi-head attention model and the personalized model is a convolutional neural network, where the numbers of input channels and output channels of the convolutional neural network are each the same as the number of single-head attention models included in the multi-head attention model,
the training sample processing module 1804 being configured to process the original training samples through the original multimedia push model to obtain the features output by the original multimedia push model includes: inputting the original training samples into each single-head attention model in the multi-head attention model to obtain a fourth training feature output by each single-head attention network in the multi-head attention model;
the training feature processing module 1806 combining the features output by the personalized model with the features output by the original multimedia push model to obtain the combined features includes: combining the features output by the personalized model with the fourth training feature output by each single-head attention network in the multi-head attention model respectively to obtain a fifth combined feature corresponding to each single-head attention network.
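The multi-head case follows the same per-channel pattern: one convolution channel per attention head. A minimal sketch, assuming addition as the combination operator (the patent does not fix the operator) and hypothetical head count and feature width:

```python
import numpy as np

def combine_head_features(head_feats, conv_out):
    """Combine each single-head attention output with the matching channel
    of the convolutional personalized model's output, yielding the fifth
    combined feature for each head.

    head_feats, conv_out: (num_heads, d)
    """
    return head_feats + conv_out

heads = np.arange(8.0).reshape(4, 2)   # 4 single-head outputs, width 2
conv = np.ones((4, 2)) * 0.1           # personalized conv output, one channel per head
fifth = combine_head_features(heads, conv)
```

Because the combination is element-wise per head, each head's downstream projection still sees a feature of its original width, so no other layer of the multi-head attention model needs to change.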
In another embodiment of the apparatus provided by the present disclosure, in a case that the original multimedia push model is a multi-task learning model, different subtask networks in the multi-task learning model use personalized models with different parameters for data processing.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
It is understood that the same/similar parts between the embodiments of the method and the apparatus in the present specification can be referred to each other, each embodiment focuses on the difference from the other embodiments, and the related points can be referred to the description of the other method embodiments.
FIG. 20 is a schematic block diagram illustrating the internal structure of a computer device S00 according to an exemplary embodiment. For example, the device S00 may be a server. Referring to FIG. 20, the device S00 includes a processing component S20, which further includes one or more processors, and memory resources represented by a memory S22 for storing instructions executable by the processing component S20, such as application programs. The application program stored in the memory S22 may include one or more modules, each corresponding to a set of instructions. Furthermore, the processing component S20 is configured to execute the instructions to perform the multimedia resource pushing method and/or the training method of the multimedia resource pushing model described above.
The device S00 may also include a power supply component S24 configured to perform power management of the device S00, a wired or wireless network interface S26 configured to connect the device S00 to a network, and an input-output (I/O) interface S28. The device S00 may operate based on an operating system stored in the memory S22, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory S22 comprising instructions, executable by the processor of the device S00 to perform the above multimedia resource pushing method and/or the training method of the multimedia resource pushing model is also provided. The storage medium may be a computer-readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, a graphene storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, which comprises instructions executable by a processor of the device S00 to perform the multimedia resource pushing method and/or the training method of the multimedia resource pushing model described above.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program class embodiment, since it is substantially similar to the method embodiment, the description is simple, and the relevant points can be referred to the partial description of the method embodiment.
It should be noted that the descriptions of the apparatus, the computer device, the server, and the like according to the method embodiments may also encompass other implementations; for specific implementations, reference may be made to the descriptions of the related method embodiments. Meanwhile, new embodiments formed by combining features of the method, apparatus, device, and server embodiments with one another still fall within the scope covered by the present disclosure, and details thereof are not repeated herein.
For convenience of description, the above devices are described as being divided into various modules by function. Of course, when implementing one or more embodiments of the present description, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or a module implementing a given function may be implemented by a combination of multiple sub-modules or sub-units. The above-described apparatus embodiments are merely illustrative; for example, the division into units is merely a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings or communication connections between the devices or units shown or described may be direct and/or indirect couplings/connections, and may be implemented through standard or customized interfaces, protocols, and the like, in electrical, mechanical, or other forms.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof.

Claims (10)

1. A multimedia resource pushing method is characterized by comprising the following steps:
acquiring feature information of an object to be processed, and generating original dimension feature data and newly added dimension feature data of the object to be processed according to the feature information;
inputting the original dimension characteristic data and the newly added dimension characteristic data into a multimedia pushing model, wherein the multimedia pushing model comprises an original multimedia pushing model and a personalized model, the original dimension characteristic data is processed through the original multimedia pushing model to obtain the characteristics output by the original multimedia pushing model, and the newly added dimension characteristic data is processed through the personalized model to obtain the characteristics output by the personalized model;
matching with multimedia resources in a multimedia resource library based on the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain candidate multimedia resources;
and determining the target multimedia resource to be pushed from the candidate multimedia resources.
2. The method according to claim 1, wherein the object to be processed is a newly added account, the newly added account comprising an account whose generation duration is less than a first duration; and/or the multimedia resources in the multimedia resource library include newly added multimedia resources processed by the multimedia push model, the newly added multimedia resources comprising multimedia resources whose generation duration is less than a second duration.
3. The method of claim 1, wherein the matching the characteristics output by the original multimedia push model and the characteristics output by the personalized model with multimedia resources in a multimedia resource library to obtain candidate multimedia resources comprises:
combining the characteristics output by the original multimedia push model and the characteristics output by the personalized model to obtain combined characteristics;
and matching with the multimedia resources in the multimedia resource library according to the combination characteristics to obtain candidate multimedia resources.
4. A training method for a multimedia resource pushing model is characterized in that the multimedia resource pushing model comprises an original multimedia pushing model and a personalized model, and the method comprises the following steps:
acquiring characteristic information of a training object, and generating an original training sample and a newly added training sample according to the characteristic information of the training object;
processing the newly added training sample through the personalized model to obtain the features output by the personalized model, and processing the original training samples through the original multimedia push model to obtain the features output by the original multimedia push model;
combining the characteristics output by the personalized model with the characteristics output by the original multimedia push model to obtain combined characteristics;
under the condition that the training object comprises a newly added account, matching the combined features with multimedia resources in a multimedia resource library to obtain multimedia pushing information of the newly added account, wherein the newly added account comprises an account whose generation duration is less than a first duration;
and comparing the multimedia pushing information of the newly added account with the characteristic information of the newly added account, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cutoff condition.
5. The method of claim 4, wherein in the case that the training object comprises a newly added multimedia resource, after obtaining the combined features, the method further comprises:
inputting the combined features into the original multimedia pushing model, and matching the combined features with the features of multimedia resources to obtain multimedia pushing information of the newly added multimedia resource, wherein the newly added multimedia resources comprise multimedia resources whose generation duration is less than a second duration;
and comparing the multimedia pushing information of the newly added multimedia resource with the characteristic information of the newly added multimedia resource, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cutoff condition.
6. A multimedia resource pushing apparatus, comprising:
the system comprises a feature generation module, a feature extraction module and a feature extraction module, wherein the feature generation module is used for acquiring feature information of an object to be processed and generating original dimension feature data and newly added dimension feature data of the object to be processed according to the feature information;
the feature processing module is used for inputting the original dimension feature data and the newly added dimension feature data into a multimedia pushing model, the multimedia pushing model comprises an original multimedia pushing model and a personalized model, the original dimension feature data are processed through the original multimedia pushing model to obtain features output by the original multimedia pushing model, and the newly added dimension feature data are processed through the personalized model to obtain features output by the personalized model;
the combination matching module is used for matching with multimedia resources in a multimedia resource library based on the characteristics output by the original multimedia pushing model and the characteristics output by the personalized model to obtain candidate multimedia resources;
and the pushed resource determining module is used for determining the target multimedia resource to be pushed from the candidate multimedia resources.
7. An apparatus for training a multimedia resource pushing model, wherein the multimedia resource pushing model comprises an original multimedia push model and a personalized model, the apparatus comprising:
the training sample generation module is used for acquiring the characteristic information of a training object and generating an original training sample and a newly added training sample according to the characteristic information of the training object;
the training sample processing module is used for processing the newly added training sample through the personalized model to obtain the features output by the personalized model, and processing the original training samples through the original multimedia push model to obtain the features output by the original multimedia push model;
the training feature processing module is used for combining the features output by the personalized model and the features output by the original multimedia pushing model to obtain combined features;
the account feature processing module is used for matching the combined features with multimedia resources in a multimedia resource library to obtain multimedia pushing information of the newly added account under the condition that the training object comprises a newly added account, wherein the newly added account comprises an account whose generation duration is less than a first duration;
and the first parameter updating module is used for comparing the multimedia pushing information of the newly added account with the characteristic information of the newly added account, and updating the parameters of the multimedia resource pushing model according to the comparison result until the comparison result meets the model training cutoff condition.
8. A computer device, comprising:
at least one processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the multimedia resource pushing method according to any one of claims 1 to 3 and/or to implement the training method of the multimedia resource pushing model according to any one of claims 4 to 5.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the multimedia resource pushing method according to any one of claims 1 to 3, and/or implement the training method of the multimedia resource pushing model according to any one of claims 4 to 5.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the multimedia resource pushing method according to any one of claims 1 to 3 and/or implements the training method of the multimedia resource pushing model according to any one of claims 4 to 5.
CN202111676174.6A 2021-12-31 2021-12-31 Multimedia resource pushing method, model training method, device and storage medium Active CN114363671B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111676174.6A CN114363671B (en) 2021-12-31 2021-12-31 Multimedia resource pushing method, model training method, device and storage medium

Publications (2)

Publication Number Publication Date
CN114363671A true CN114363671A (en) 2022-04-15
CN114363671B CN114363671B (en) 2024-03-19

Family

ID=81105564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111676174.6A Active CN114363671B (en) 2021-12-31 2021-12-31 Multimedia resource pushing method, model training method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114363671B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180101888A1 (en) * 2016-10-10 2018-04-12 International Business Machines Corporation Interactive Decision Support Based on Preferences Derived from User-Generated Content Sources
WO2018103595A1 (en) * 2016-12-08 2018-06-14 腾讯科技(深圳)有限公司 Authorization policy recommendation method and device, server, and storage medium
CN109871792A (en) * 2019-01-31 2019-06-11 清华大学 Pedestrian detection method and device
CN110992106A (en) * 2019-12-11 2020-04-10 上海风秩科技有限公司 Training data acquisition method and device, and model training method and device
CN111612093A (en) * 2020-05-29 2020-09-01 Oppo广东移动通信有限公司 Video classification method, video classification device, electronic equipment and storage medium
CN111738441A (en) * 2020-07-31 2020-10-02 支付宝(杭州)信息技术有限公司 Prediction model training method and device considering prediction precision and privacy protection
CN111949886A (en) * 2020-08-28 2020-11-17 腾讯科技(深圳)有限公司 Sample data generation method and related device for information recommendation
CN111967599A (en) * 2020-08-25 2020-11-20 百度在线网络技术(北京)有限公司 Method and device for training model, electronic equipment and readable storage medium
CN112231569A (en) * 2020-10-23 2021-01-15 中国平安人寿保险股份有限公司 News recommendation method and device, computer equipment and storage medium
KR20210089249A (en) * 2020-05-27 2021-07-15 바이두 온라인 네트웍 테크놀러지 (베이징) 캄파니 리미티드 Voice packet recommendation method, device, equipment and storage medium
CN113553448A (en) * 2021-07-30 2021-10-26 北京达佳互联信息技术有限公司 Recommendation model training method and device, electronic equipment and storage medium
WO2021217867A1 (en) * 2020-04-29 2021-11-04 平安科技(深圳)有限公司 Xgboost-based data classification method and apparatus, computer device, and storage medium
CN113689237A (en) * 2021-08-20 2021-11-23 北京达佳互联信息技术有限公司 Method and device for determining to-be-launched media resource and media resource processing model
CN113705683A (en) * 2021-08-30 2021-11-26 北京达佳互联信息技术有限公司 Recommendation model training method and device, electronic equipment and storage medium
CN113742567A (en) * 2020-05-29 2021-12-03 北京达佳互联信息技术有限公司 Multimedia resource recommendation method and device, electronic equipment and storage medium
CN113821654A (en) * 2021-06-30 2021-12-21 腾讯科技(深圳)有限公司 Multimedia data recommendation method and device, electronic equipment and storage medium
CN113836390A (en) * 2020-06-24 2021-12-24 北京达佳互联信息技术有限公司 Resource recommendation method and device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115842814A (en) * 2022-11-24 2023-03-24 广州鲁邦通物联网科技股份有限公司 Internet of things gateway data processing method and device based on instruction association pushing
CN115842814B (en) * 2022-11-24 2023-09-05 广州鲁邦通物联网科技股份有限公司 Internet of things gateway data processing method and device based on instruction association pushing

Also Published As

Publication number Publication date
CN114363671B (en) 2024-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant