CN116956183A - Multimedia resource recommendation method, model training method, device and storage medium

Publication number
CN116956183A
Authority
CN
China
Prior art keywords
candidate
sample
resource
information
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310645304.2A
Other languages
Chinese (zh)
Inventor
刘刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310645304.2A
Publication of CN116956183A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G06F16/437 Administration of user profiles, e.g. generation, initialisation, adaptation, distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a multimedia resource recommendation method, a model training method, a device and a storage medium, which can be applied to scenarios such as cloud technology, artificial intelligence, intelligent transportation and the Internet of Vehicles. The method comprises: acquiring target object attributes of a target object and candidate resource information of candidate multimedia resources; inputting the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model to perform feature extraction processing, obtaining candidate comprehensive features; inputting the candidate comprehensive features into a first task prediction network of the resource recommendation model to perform play information prediction, obtaining a play information prediction result; inputting the candidate comprehensive features into a second task prediction network of the resource recommendation model to perform feedback information prediction, obtaining a feedback information prediction result; and obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result. The application improves the recommendation accuracy of multimedia resources.

Description

Multimedia resource recommendation method, model training method, device and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular to a multimedia resource recommendation method, a model training method, a device, and a storage medium.
Background
With the rapid development of the internet, the threshold for content production has fallen and the volume of uploaded video is growing exponentially. The current distribution flow of short video content, from the start of an upload through a successful upload to consumption by users, is as follows. A video is shot with a terminal shooting tool and uploaded through the terminal. During upload the video is transcoded, the video file is normalized, and the meta-information of the video is stored, which improves the playback compatibility of the video on each platform. The video then undergoes manual review; in parallel with the manual review, machine algorithms extract auxiliary features of the content, such as multi-dimensional classifications and labels. Standardized manual labeling is then carried out on top of the machine processing, annotating the video with related information such as its various labels, so that the content is enabled in a standardized way. As one way of expressing content features, labels are more interpretable and easier to extend than alternatives such as a subject or a content vector, and they are widely used in scenarios such as recommendation and operations. Once enabled, the content forms the content pool of the recommendation engine. Finally, the recommendation engine recommends content based on the portrait features of the user, using recommendation algorithms such as collaborative recommendation, matrix decomposition, supervised learning models and deep learning models.
The recommendation algorithm is essentially an information processing logic, which processes information according to a certain logic after acquiring information of users and contents, and generates recommendation results.
In the related art, multimedia resources are generally recommended using the click-through rate of the user as the prediction index. Because this index covers only a single dimension, recommendation accuracy is difficult to guarantee.
Disclosure of Invention
The application provides a multimedia resource recommendation method, a model training method, a device and a storage medium, which can improve the recommendation accuracy of multimedia resources.
In one aspect, the present application provides a multimedia resource recommendation method, which includes:
acquiring target object attributes of a target object and candidate resource information of candidate multimedia resources;
inputting the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model to perform feature extraction processing to obtain candidate comprehensive features;
inputting the candidate comprehensive features into a first task prediction network of the resource recommendation model to perform play information prediction, obtaining a play information prediction result; the play information prediction result represents the play result corresponding to the candidate multimedia resource in the case that the target object interacts with the candidate multimedia resource;
inputting the candidate comprehensive features into a second task prediction network of the resource recommendation model to perform feedback information prediction, obtaining a feedback information prediction result; the feedback information prediction result represents feedback information of the target object for the candidate multimedia resource;
obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result; the target recommendation result characterizes the probability of recommending the candidate multimedia resource to the target object.
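The steps above can be sketched as a shared feature extractor feeding two prediction branches. The following is a minimal illustration only, not the patented model: the class name `TwoTaskRecommender`, the single hidden layer, the random initial weights and the equal-weight combination in `recommend_score` are all assumptions made for the example.

```python
import math
import random


def dense(x, weights, bias):
    """One fully connected layer: y_j = sum_i x_i * weights[i][j] + bias[j]."""
    return [sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
            for j in range(len(bias))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Hypothetical initializer: small random weights, zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


class TwoTaskRecommender:
    """Shared feature extraction followed by two task prediction branches."""

    def __init__(self, n_object, n_resource, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_object + n_resource, n_hidden, rng)
        self.play_head = init_layer(n_hidden, 1, rng)       # first task network
        self.feedback_head = init_layer(n_hidden, 1, rng)   # second task network

    def extract(self, object_attrs, resource_info):
        # Comprehensive features are computed once and reused by both branches.
        hidden = dense(object_attrs + resource_info, *self.shared)
        return [max(0.0, h) for h in hidden]  # ReLU

    def predict(self, object_attrs, resource_info):
        features = self.extract(object_attrs, resource_info)
        play = sigmoid(dense(features, *self.play_head)[0])
        feedback = sigmoid(dense(features, *self.feedback_head)[0])
        return play, feedback


def recommend_score(play, feedback):
    # Assumed combination rule: equal-weight average of the two predictions.
    return 0.5 * play + 0.5 * feedback


model = TwoTaskRecommender(n_object=3, n_resource=4)
play, feedback = model.predict([0.2, 0.7, 0.1], [1.0, 0.0, 0.3, 0.5])
score = recommend_score(play, feedback)
```

Because `extract` runs once per candidate, adding further task branches does not repeat the feature computation.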
In another aspect, a method for training a resource recommendation model is provided, where the method includes:
acquiring sample training information of a sample object; the sample training information comprises sample object attributes of the sample objects and sample resource information of sample multimedia resources interacted with the sample objects; the sample training information is marked with a sample playing information label and a sample feedback information label; the sample playing information tag represents a playing result corresponding to the sample multimedia resource under the interactive operation of the sample object, and the sample feedback information tag represents feedback information of the sample object for the sample multimedia resource;
inputting the sample training information into a preset comprehensive feature extraction network of a preset model (the model to be trained), and performing feature extraction processing to obtain sample comprehensive features;
inputting the sample comprehensive features into a first preset task prediction network of the preset model, and performing play information prediction to obtain a sample play information prediction result;
inputting the sample comprehensive features into a second preset task prediction network of the preset model to perform feedback information prediction, so as to obtain a sample feedback information prediction result;
and training the preset model based on the difference between the sample play information prediction result and the sample play information label and the difference between the sample feedback information prediction result and the sample feedback information label to obtain a resource recommendation model.
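The training objective described above combines two differences, one per task. The sketch below shows one plausible form of such a joint loss. The text does not state which loss functions are used, so the squared error for the play branch, the binary cross-entropy for the feedback branch, the per-task weights and the dictionary keys are all assumptions.

```python
import math


def squared_error(pred, label):
    """Difference measure assumed for the play information branch."""
    return (pred - label) ** 2


def binary_cross_entropy(pred, label, eps=1e-7):
    """Difference measure assumed for the feedback information branch."""
    p = min(max(pred, eps), 1.0 - eps)  # clip to keep log() finite
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def joint_loss(batch, play_weight=1.0, feedback_weight=1.0):
    """Average over the batch of the weighted sum of both task losses."""
    total = 0.0
    for sample in batch:
        total += play_weight * squared_error(
            sample["play_pred"], sample["play_label"])
        total += feedback_weight * binary_cross_entropy(
            sample["feedback_pred"], sample["feedback_label"])
    return total / len(batch)


good_batch = [{"play_pred": 0.8, "play_label": 0.8,
               "feedback_pred": 1.0, "feedback_label": 1.0}]
bad_batch = [{"play_pred": 0.1, "play_label": 0.8,
              "feedback_pred": 0.2, "feedback_label": 1.0}]
good_loss = joint_loss(good_batch)
bad_loss = joint_loss(bad_batch)
```

Minimizing this value with any gradient-based optimizer updates the shared feature extraction network from both label types at once.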
Another aspect provides a multimedia resource recommendation apparatus, the apparatus comprising:
the information acquisition module is used for acquiring target object attributes of the target objects and candidate resource information of the candidate multimedia resources;
the candidate feature determining module is used for inputting the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model to perform feature extraction processing to obtain candidate comprehensive features;
the play information prediction module is used for inputting the candidate comprehensive features into a first task prediction network of the resource recommendation model to perform play information prediction so as to obtain a play information prediction result; the play information prediction result represents the play result corresponding to the candidate multimedia resource in the case that the target object interacts with the candidate multimedia resource;
the feedback information prediction module is used for inputting the candidate comprehensive features into a second task prediction network of the resource recommendation model to perform feedback information prediction so as to obtain a feedback information prediction result; the feedback information prediction result represents feedback information of the target object for the candidate multimedia resource;
the recommendation result determining module is used for obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result; the target recommendation result characterizes the probability of recommending the candidate multimedia resource to the target object.
In an exemplary embodiment, the comprehensive feature extraction network includes an attribute feature extraction network, a resource feature extraction network, and a multi-head attention network, and the candidate feature determination module includes:
the target feature determining unit is used for inputting the target object attribute into the attribute feature extraction network and carrying out attribute feature extraction processing to obtain a target attribute feature;
the candidate resource feature determining unit is used for inputting the candidate resource information into the resource feature extraction network to perform resource feature extraction processing to obtain candidate resource features;
and the candidate comprehensive feature determining unit is used for inputting the target attribute feature and the candidate resource feature into the multi-head attention network to perform fusion processing so as to obtain the candidate comprehensive feature.
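The fusion of the target attribute feature and the candidate resource feature by a multi-head attention network can be illustrated as follows. This is a simplified sketch under stated assumptions: the two features are treated as a two-token sequence, each head attends over one slice of the feature dimension, there are no learned projection matrices, and the fused result is the mean over tokens. A real multi-head attention layer would add learned query, key, value and output projections.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(tokens):
    """Scaled dot-product self-attention over equal-length vectors."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        outputs.append([sum(w * value[i] for w, value in zip(weights, tokens))
                        for i in range(dim)])
    return outputs


def multi_head_fuse(attr_feature, resource_feature, num_heads=2):
    """Fuse two feature vectors; each head sees one slice of the dimensions."""
    dim = len(attr_feature)
    if len(resource_feature) != dim or dim % num_heads != 0:
        raise ValueError("features must share a length divisible by num_heads")
    step = dim // num_heads
    tokens = [attr_feature, resource_feature]
    head_outputs = []
    for h in range(num_heads):
        sliced = [t[h * step:(h + 1) * step] for t in tokens]
        head_outputs.append(attention_head(sliced))
    # Concatenate the heads back together for each token.
    fused_tokens = [sum((head[i] for head in head_outputs), [])
                    for i in range(len(tokens))]
    # Mean-pool over the two tokens to get one comprehensive feature vector.
    return [sum(column) / len(fused_tokens) for column in zip(*fused_tokens)]


fused = multi_head_fuse([1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0])
```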
In an exemplary embodiment, the information acquisition module includes:
the object attribute acquisition unit is used for acquiring the target object attribute of the target object;
an information flow analysis unit, configured to parse the candidate multimedia resource into a plurality of information flows;
the resource acquisition unit is used for acquiring candidate resource information corresponding to each information flow.
In an exemplary embodiment, the comprehensive feature extraction network further comprises a feature fusion network, the apparatus further comprising:
the fusion module is used for inputting the target attribute feature and the candidate resource features corresponding to each information flow into the feature fusion network, and carrying out fusion processing on the candidate resource features corresponding to each information flow and the target attribute feature to obtain candidate fusion features corresponding to each information flow;
the candidate comprehensive feature determining unit includes:
the attention determining subunit is used for inputting the candidate fusion features corresponding to each information flow into the multi-head attention network, and performing attention prediction processing to obtain an attention result corresponding to each candidate fusion feature;
and the feature determining subunit is used for obtaining the candidate comprehensive features based on the attention result corresponding to each candidate fusion feature.
In an exemplary embodiment, the resource feature extraction network includes a visual feature extraction sub-network, an audio feature extraction sub-network, a text feature extraction sub-network, and a resource feature fusion sub-network, and the candidate resource feature determination unit includes:
the visual characteristic determining subunit is used for inputting the candidate resource information into the visual characteristic extracting sub-network to extract visual characteristics so as to obtain candidate visual characteristics;
the audio feature determining subunit is used for inputting the candidate resource information into the audio feature extracting sub-network to extract the audio features so as to obtain candidate audio features;
a text feature determining subunit, configured to input the candidate resource information into the text feature extracting sub-network, and perform text feature extraction to obtain candidate text features;
and the feature fusion subunit is used for inputting the candidate visual features, the candidate audio features and the candidate text features into the resource feature fusion sub-network to perform feature fusion processing so as to obtain the candidate resource features.
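As a rough illustration of fusing the three modality features into one candidate resource feature, the sketch below L2-normalizes each modality and concatenates the results. The text does not specify how the resource feature fusion sub-network works, so normalize-and-concatenate is an assumed stand-in, and the function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_resource_features(visual, audio, text):
    """Assumed fusion: normalize each modality, then concatenate them in order."""
    fused = []
    for modality in (visual, audio, text):
        fused.extend(l2_normalize(modality))
    return fused


candidate_resource_feature = fuse_resource_features(
    visual=[3.0, 4.0], audio=[0.0, 2.0], text=[1.0])
```

Normalizing first keeps a modality with large raw values from dominating the fused vector.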
In an exemplary embodiment, there are at least two first task prediction networks and at least two second task prediction networks, and the play information prediction module includes:
the play information prediction unit is used for inputting the candidate comprehensive features into the at least two first task prediction networks to perform play information prediction so as to obtain play information prediction results corresponding to the at least two first task prediction networks respectively.
In an exemplary embodiment, the feedback information prediction module includes:
the feedback information prediction unit is used for inputting the candidate comprehensive features into the at least two second task prediction networks and performing feedback information prediction to obtain feedback information prediction results corresponding to the at least two second task prediction networks respectively.
In an exemplary embodiment, the recommendation result determining module includes:
the recommending unit is used for obtaining the target recommendation result based on the play information prediction result corresponding to each of the at least two first task prediction networks and the feedback information prediction result corresponding to each of the at least two second task prediction networks.
In an exemplary embodiment, the play information prediction unit includes:
the first prediction subunit is used for inputting the candidate comprehensive features into a play duration task prediction network to perform play duration prediction so as to obtain a play duration prediction result;
the second prediction subunit is used for inputting the candidate comprehensive features into a play completion degree task prediction network to perform play completion degree prediction so as to obtain a play completion degree prediction result;
and the result determining subunit is used for determining the play duration prediction result and the play completion degree prediction result as the play information prediction result.
In an exemplary embodiment, the feedback information prediction unit includes:
the fast-sliding rate prediction subunit is used for inputting the candidate comprehensive features into a fast-sliding rate task prediction network to perform fast-sliding rate prediction so as to obtain a fast-sliding rate prediction result; the fast-sliding rate prediction result characterizes how frequently the target object executes a preset interaction instruction on the candidate multimedia resource, the preset interaction instruction being an instruction whose interaction time is less than a preset threshold;
the sharing rate prediction subunit is used for inputting the candidate comprehensive features into a sharing rate task prediction network to perform sharing rate prediction so as to obtain a sharing rate prediction result;
the praise rate prediction subunit is used for inputting the candidate comprehensive features into a praise rate (like rate) task prediction network to perform praise rate prediction so as to obtain a praise rate prediction result;
the attention rate prediction subunit is used for inputting the candidate comprehensive features into an attention rate (follow rate) task prediction network to perform attention rate prediction so as to obtain an attention rate prediction result;
the comment rate prediction subunit is used for inputting the candidate comprehensive features into a comment rate task prediction network to perform comment rate prediction so as to obtain a comment rate prediction result;
and the prediction result determining subunit is configured to determine the fast-sliding rate prediction result, the sharing rate prediction result, the praise rate prediction result, the attention rate prediction result and the comment rate prediction result as the feedback information prediction result.
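The five feedback branches listed above all consume the same candidate comprehensive features. The sketch below shows that arrangement with one linear-plus-sigmoid head per task. The head structure, the task key names and the example weights are assumptions for illustration; the text does not describe the internal layers of each task prediction network.

```python
import math

# Hypothetical task keys mirroring the five feedback rates named in the text.
FEEDBACK_TASKS = ("fast_slide", "share", "praise", "attention", "comment")


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_feedback(features, heads):
    """Run every feedback head on the same shared feature vector.

    heads maps a task name to (weights, bias) for a single linear unit.
    """
    results = {}
    for task in FEEDBACK_TASKS:
        weights, bias = heads[task]
        logit = sum(w * f for w, f in zip(weights, features)) + bias
        results[task] = sigmoid(logit)
    return results


shared_features = [0.2, -0.1, 0.5]
heads = {task: ([0.0, 0.0, 0.0], 0.0) for task in FEEDBACK_TASKS}
heads["share"] = ([1.0, 0.0, 2.0], 0.5)  # example non-trivial head
feedback_prediction = predict_feedback(shared_features, heads)
```

The play duration and play completion degree branches could be added the same way, with a regression output in place of the sigmoid where the target is not a rate.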
In an exemplary embodiment, there are at least two candidate multimedia resources, and the recommendation result determining module includes:
the parameter determining unit is used for determining a recommendation parameter corresponding to each candidate multimedia resource based on the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource;
a recommendation result determining unit, configured to determine a target recommendation result corresponding to each candidate multimedia resource based on the recommendation parameter corresponding to each candidate multimedia resource.
In an exemplary embodiment, the apparatus further comprises:
the resource determining module is used for determining the multimedia resources to be recommended based on the target recommendation result corresponding to each candidate multimedia resource;
and the recommending module is used for recommending the multimedia resource to be recommended to the target object.
In an exemplary embodiment, the apparatus further comprises:
the weight determining module is used for acquiring a first weight corresponding to the play information prediction result and a second weight corresponding to the feedback information prediction result;
the parameter determination unit includes:
a first information determining subunit, configured to determine a product of a play information prediction result corresponding to each candidate multimedia resource and the first weight, to obtain a first information prediction result corresponding to each candidate multimedia resource;
a second information determining subunit, configured to determine a product of a feedback information prediction result corresponding to each candidate multimedia resource and the second weight, to obtain a second information prediction result corresponding to each candidate multimedia resource;
and the recommendation parameter determining subunit is used for determining recommendation parameters corresponding to each candidate multimedia resource based on the first information prediction result and the second information prediction result corresponding to each candidate multimedia resource.
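The weighting described above can be sketched as follows. The text says the recommendation parameter is determined from the two weighted results without stating how they are combined, so the sum used here is an assumption, as are the function names, the candidate identifiers and the example numbers.

```python
def recommendation_parameter(play_pred, feedback_pred, first_weight, second_weight):
    """Weight each prediction, then combine (assumed: add) the weighted results."""
    first_result = play_pred * first_weight
    second_result = feedback_pred * second_weight
    return first_result + second_result


def rank_candidates(predictions, first_weight, second_weight):
    """Return candidate ids ordered from highest to lowest recommendation parameter.

    predictions maps a candidate id to (play_pred, feedback_pred).
    """
    scores = {
        cid: recommendation_parameter(play, feedback, first_weight, second_weight)
        for cid, (play, feedback) in predictions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


predictions = {"video_a": (0.9, 0.1), "video_b": (0.4, 0.8)}
balanced = rank_candidates(predictions, first_weight=0.5, second_weight=0.5)
play_only = rank_candidates(predictions, first_weight=1.0, second_weight=0.0)
```

With equal weights the candidate with strong feedback ranks first; with all weight on play information the order reverses, which shows how the two weights steer the final recommendation.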
In an exemplary embodiment, the parameter determination unit includes:
the result input subunit is used for inputting the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource into a reinforcement learning model to obtain a benefit prediction result corresponding to each candidate multimedia resource;
a parameter determining subunit, configured to determine the benefit prediction result corresponding to each candidate multimedia resource as the recommendation parameter corresponding to each candidate multimedia resource;
a reinforcement learning model training module, configured to acquire a sample play information prediction result, a sample feedback information prediction result and a sample benefit label corresponding to a sample multimedia resource; input the sample play information prediction result, the sample feedback information prediction result and the sample benefit label into a preset model to obtain a sample benefit prediction result; and train the preset model based on the difference between the sample benefit label and the sample benefit prediction result to obtain the reinforcement learning model.
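The training step described above, fitting a model so that its benefit prediction matches a benefit label, can be illustrated with a deliberately simple stand-in. The sketch below trains a linear model by stochastic gradient descent on the squared difference between predicted and labeled benefit. It is not a reinforcement learning algorithm, and unlike the text it uses the label only as the training target rather than as a model input; the linear form, learning rate, epoch count and sample values are all assumptions.

```python
def predict_benefit(weights, bias, inputs):
    """Linear benefit estimate from [play_pred, feedback_pred]."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def train_benefit_model(samples, lr=0.1, epochs=1000):
    """Fit weights by SGD on the squared prediction/label difference.

    samples is a list of (inputs, benefit_label) pairs.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = predict_benefit(weights, bias, inputs) - label
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
            bias -= lr * error
    return weights, bias


# Synthetic labels generated as 0.7 * play + 0.3 * feedback, for illustration only.
raw_inputs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.2], [0.8, 0.9]]
samples = [(x, 0.7 * x[0] + 0.3 * x[1]) for x in raw_inputs]
weights, bias = train_benefit_model(samples)
benefit = predict_benefit(weights, bias, [0.6, 0.4])
```

The predicted benefit for each candidate would then serve as its recommendation parameter.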
In another aspect, a training device for a resource recommendation model is provided, where the device includes:
The sample information acquisition module is used for acquiring sample training information of a sample object; the sample training information comprises sample object attributes of the sample objects and sample resource information of sample multimedia resources interacted with the sample objects; the sample training information is marked with a sample playing information label and a sample feedback information label; the sample playing information tag represents a playing result corresponding to the sample multimedia resource under the interactive operation of the sample object, and the sample feedback information tag represents feedback information of the sample object for the sample multimedia resource;
the sample comprehensive feature determining module is used for inputting the sample training information into a preset comprehensive feature extraction network of a preset model (the model to be trained), and carrying out feature extraction processing to obtain sample comprehensive features;
the sample play information prediction module is used for inputting the sample comprehensive features into a first preset task prediction network of the preset model, and performing play information prediction to obtain a sample play information prediction result;
the sample feedback information prediction module is used for inputting the sample comprehensive features into a second preset task prediction network of the preset model to perform feedback information prediction so as to obtain a sample feedback information prediction result;
the model training module is used for training the preset model based on the difference between the sample play information prediction result and the sample play information label and the difference between the sample feedback information prediction result and the sample feedback information label to obtain a resource recommendation model.
Another aspect provides an electronic device comprising a processor and a memory, the memory storing at least one instruction or at least one program that is loaded and executed by the processor to implement the multimedia resource recommendation method described above.
Another aspect provides a computer storage medium storing at least one instruction or at least one program that is loaded and executed by a processor to implement the multimedia resource recommendation method described above.
Another aspect provides a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device implements the multimedia resource recommendation method described above.
The multimedia resource recommendation method, the model training method, the device and the storage medium provided by the application have the following technical effects:
The application acquires the target object attributes of the target object and the candidate resource information of the candidate multimedia resources; inputs the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model to perform feature extraction processing, obtaining candidate comprehensive features; inputs the candidate comprehensive features into a first task prediction network of the resource recommendation model to perform play information prediction, obtaining a play information prediction result, which represents the play result corresponding to the candidate multimedia resource in the case that the target object interacts with the candidate multimedia resource; inputs the candidate comprehensive features into a second task prediction network of the resource recommendation model to perform feedback information prediction, obtaining a feedback information prediction result, which represents feedback information of the target object for the candidate multimedia resource; and obtains a target recommendation result based on the play information prediction result and the feedback information prediction result, the target recommendation result characterizing the probability of recommending the candidate multimedia resource to the target object.
According to the application, a single comprehensive feature extraction network extracts the various features of the target object, which avoids constructing multiple feature extraction networks and saves the computation that repeated content feature extraction would require. The extracted features are then fed to the two task prediction network branches to obtain different prediction results, so that multiple prediction results for a multimedia resource can be combined to decide whether to recommend it to the target object, which improves the recommendation accuracy of multimedia resources.
Drawings
To illustrate more clearly the technical solutions and advantages of the embodiments of the present specification or of the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a multimedia resource recommendation system according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a multimedia resource recommendation method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a method for inputting the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model for feature extraction to obtain candidate comprehensive features according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of a method for determining candidate resource features according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a resource recommendation model according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an SE Net according to an embodiment of the present disclosure;
FIG. 7 is a flow chart of a method for inputting the candidate comprehensive features into at least two first task prediction networks for play information prediction to obtain play information prediction results corresponding to the at least two first task prediction networks according to an embodiment of the present disclosure;
FIG. 8 is a flow chart of a method for inputting the candidate comprehensive features into at least two second task prediction networks for feedback information prediction to obtain feedback information prediction results corresponding to the at least two second task prediction networks according to an embodiment of the present disclosure;
FIG. 9 is a flow chart of a method for obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result according to an embodiment of the present disclosure;
FIG. 10 is a flow chart of a method for determining a recommendation parameter corresponding to each candidate multimedia resource based on the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource according to an embodiment of the present disclosure;
FIG. 11 is a flow chart of a training method of a resource recommendation model according to an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of a multimedia resource recommendation device according to an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of a training device for a resource recommendation model according to an embodiment of the present disclosure;
FIG. 14 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
Detailed Description
The technical solutions of the embodiments of the present specification will be clearly and completely described below with reference to the drawings of the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
First, partial nouns or terms appearing in the course of the description of the embodiments of the specification are explained as follows:
MCN (Multi-Channel Network): a multi-channel network product form that aggregates PGC content and, with strong capital backing, ensures continuous content output, ultimately achieving stable commercial monetization.
PGC (Professional Generated Content): an internet term referring to professionally produced content (video websites) or expert-produced content (microblogs). It broadly denotes content personalization, diversified perspectives, and virtualized social relationships. Also known as PPC (Professionally-produced Content).
UGC (User Generated Content): original content created by users. It arose together with the Web 2.0 concept, whose main feature is the advocacy of personalization. It is not a specific service but a new way for users to use the internet, namely a shift from mainly downloading to both downloading and uploading.
PUGC (Professional User Generated Content): professional audio content that takes the form of UGC but whose production is relatively close to that of PGC.
CNN (Convolutional Neural Network): a feedforward neural network that involves convolution computation and has a deep structure; it is one of the representative algorithms of deep learning. It has representation learning capability and can perform shift-invariant classification of input information according to its hierarchical structure.
Feeds: also translated as message source, source material, feed, information supply, contribution, summary, source, news subscription, or web feed. A feed is a data format through which a website propagates its latest information to users, usually arranged along a timeline; the timeline is the most primitive and intuitive presentation of a feed. A prerequisite for a user to subscribe to a website is that the website provides a message source. Gathering feeds in one place is called aggregation, and the software used for aggregation is called an aggregator. For the end user, an aggregator is software dedicated to subscribing to websites, commonly also called an RSS reader, feed reader, news reader, etc.
Machine Learning (ML): a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Deep learning: the concept of deep learning originates from research on artificial neural networks. A multi-layer perceptron with multiple hidden layers is one deep learning structure. Deep learning combines low-level features to form more abstract high-level representations (attribute categories or features), thereby discovering distributed feature representations of data.
It will be appreciated that in the specific embodiments of the present application, related data such as user information is involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The scheme provided by the embodiment of the application relates to the technology of artificial intelligence such as machine learning, and the like, and is specifically described by the following embodiments:
Referring to FIG. 1, FIG. 1 is a schematic diagram of a multimedia resource recommendation system provided in the embodiment of the present disclosure. As shown in FIG. 1, the multimedia resource recommendation system may include at least a server 01 and a client 02.
Specifically, in the embodiment of the present disclosure, the server 01 may be an independently operating server, a distributed server, or a server cluster composed of a plurality of servers, and may also be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms. The server 01 may include a network communication unit, a processor, a memory, and the like. In particular, the server 01 may be configured to train the resource recommendation model and to predict whether to recommend candidate multimedia resources to the target object.
Specifically, in the embodiment of the present disclosure, the client 02 may include physical devices such as smart phones, desktop computers, tablet computers, notebook computers, digital assistants, smart wearable devices, smart speakers, vehicle-mounted terminals, and smart televisions, and may also include software running on such physical devices, for example, web pages or applications that service providers offer to users. Specifically, the client 02 may be configured to present the recommended multimedia resources.
FIG. 2 is a schematic flow chart of a multimedia resource recommendation method according to an embodiment of the present application. The present specification provides the method operation steps as described in the embodiments or flow charts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is only one of many possible execution orders and does not represent the only one. When implemented in a real system or server product, the methods illustrated in the embodiments or figures may be executed sequentially or in parallel (for example, in a parallel-processor or multithreaded environment). As shown in FIG. 2, the method may include:
S201: Obtain the target object attribute of the target object and the candidate resource information of the candidate multimedia resource.
In the embodiment of the present specification, the target object may be an end user, and the candidate multimedia resources may be one or more; candidate multimedia assets may include, but are not limited to, video, audio, etc. assets.
The target object attributes may include, but are not limited to:
(1) Demographic attributes, such as age, gender, education level, and region of residence;
(2) Object profile: including a long-term profile and a short-term profile. Long-term profile: relatively stable points of interest accumulated over a long period. Short-term profile: short-term interests accumulated from the user's behavior over the most recent N days (N is typically 14, i.e., two weeks). Points of interest are characterized by tag features and their weights, which represent how interested the user is in the tags carried by the multimedia information;
(3) Context features of the user when consuming content, such as the batch the user is in, the corresponding display position, the session identifier (SessionID), and the timestamp.
Wherein the candidate resource information of the candidate multimedia resource may include, but is not limited to, visual modality information, audio modality information, text modality information, etc. in the candidate multimedia resource.
In an exemplary embodiment, the obtaining the target object attribute of the target object and the candidate resource information of the candidate multimedia resource includes:
acquiring the target object attribute of the target object;
parsing the candidate multimedia resource into a plurality of information streams;
obtaining the candidate resource information corresponding to each information stream.
In the embodiment of the present specification, the candidate multimedia resource may be parsed into a plurality of information streams according to time, and an information stream may include images, text, and the like; when the candidate multimedia resource is a video, each information stream may include one or more video frames.
S203: Input the target object attribute and the candidate resource information into the comprehensive feature extraction network of the resource recommendation model for feature extraction to obtain candidate comprehensive features.
In this embodiment of the present disclosure, the comprehensive feature extraction network includes an attribute feature extraction network, a resource feature extraction network, and a multi-head attention network. As shown in FIG. 3, inputting the target object attribute and the candidate resource information into the comprehensive feature extraction network of the resource recommendation model for feature extraction to obtain the candidate comprehensive features includes:
S2031: Input the target object attribute into the attribute feature extraction network and perform attribute feature extraction to obtain the target attribute feature.
In this embodiment of the present disclosure, the attribute feature extraction network may be an encoder, which encodes the target object attribute using one-hot or multi-hot encoding to obtain the target attribute feature.
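The one-hot and multi-hot encodings mentioned above can be sketched as follows. This is a minimal illustration only: the vocabularies, the attribute names, and the choice to concatenate the per-attribute encodings are assumptions made for the example and are not specified in the text.

```python
# Hypothetical vocabularies; the specification does not list the actual values.
GENDER_VOCAB = ["female", "male", "unknown"]
INTEREST_TAG_VOCAB = ["sports", "music", "film", "games"]


def one_hot(value, vocab):
    """Encode a single-valued attribute as a vector with exactly one 1."""
    return [1.0 if item == value else 0.0 for item in vocab]


def multi_hot(values, vocab):
    """Encode a multi-valued attribute as a vector with a 1 per present value."""
    present = set(values)
    return [1.0 if item in present else 0.0 for item in vocab]


def encode_target_object(gender, interest_tags):
    """Concatenate the per-attribute encodings (an assumed combination rule)."""
    return one_hot(gender, GENDER_VOCAB) + multi_hot(interest_tags, INTEREST_TAG_VOCAB)


feature = encode_target_object("male", ["music", "games"])
```

A single-valued attribute such as gender maps naturally to one-hot, while a set-valued attribute such as interest tags maps to multi-hot.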
S2033: Input the candidate resource information into the resource feature extraction network and perform resource feature extraction to obtain candidate resource features.
In some embodiments, as shown in FIG. 4, the resource feature extraction network includes a visual feature extraction sub-network, an audio feature extraction sub-network, a text feature extraction sub-network, and a resource feature fusion sub-network, and inputting the candidate resource information into the resource feature extraction network and performing resource feature extraction to obtain the candidate resource features includes:
S20331: Input the candidate resource information into the visual feature extraction sub-network to extract visual features and obtain candidate visual features.
In the embodiment of the present disclosure, the candidate visual features are visual-modality features, whose main purpose is to provide a good feature representation of the video content. The embedding vector of the video content is a low-dimensional vector representing a video, and the "distance" between two embedding vectors represents how far apart the two videos are, from which the similarity of the videos can be computed; the video content embedding can be understood as a content-based "implicit" feature. The video content vector has two layers of meaning. First, representation learning: it is a low-dimensional dense feature, a one-dimensional array (for example, a video is encoded as 128 floats). Second, metric learning: the "distance" between two vectors represents the "similarity" of the two objects. The visual feature extraction sub-network may employ Video Swin Transformer, which is an improvement based on Swin Transformer; its basic structure is very close to that of Swin Transformer, with an additional frame (time) dimension in the model computation.
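As a small illustration of using the "distance" between two embedding vectors as a similarity measure, the sketch below compares toy embeddings. The text does not name a specific metric; cosine similarity is assumed here, and the 4-dimensional vectors are made-up stand-ins for real embeddings (for example, 128 floats).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embeddings (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Toy embeddings; values are illustrative only.
video_a = [0.9, 0.1, 0.0, 0.2]
video_b = [0.8, 0.2, 0.1, 0.1]
video_c = [0.0, 0.1, 0.9, 0.7]

sim_ab = cosine_similarity(video_a, video_b)  # videos with similar content
sim_ac = cosine_similarity(video_a, video_c)  # videos with dissimilar content
```

With these toy values, `video_a` is closer to `video_b` than to `video_c`, which is the property a content embedding is expected to provide.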
S20333: Input the candidate resource information into the audio feature extraction sub-network to extract audio features and obtain candidate audio features.
In the embodiment of the present specification, the candidate audio features are audio-modality features. Microsoft's pre-trained WavLM-base-plus (WavLM) is used as the feature extractor, i.e., the audio feature extraction sub-network. It was trained on 94,000 hours of English data in an unsupervised manner and achieved SOTA results on multiple speech-related datasets. It converts audio into feature vectors and has strong feature representation capability for scene-type sound events. After the audio modality is added, the accuracy of interaction feature prediction following semantic modeling of content such as action films and music is significantly improved.
S20335: Input the candidate resource information into the text feature extraction sub-network to extract text features and obtain candidate text features.
In the embodiment of the present disclosure, the candidate text features are text-modality features. The text modality includes the title, the tags, and the secondary category (content is generally classified down to a second level; this usually refers to the secondary category of the video content, which is generally either predicted by a dedicated classification model trained with supervised learning according to the category tree structure, or labeled manually). The text feature extraction sub-network may be lishes (a gart model pre-trained on a large-scale information-stream text corpus), which is used to extract semantic features from the information-stream text and to model the text.
S20337: Input the candidate visual features, the candidate audio features, and the candidate text features into the resource feature fusion sub-network for feature fusion to obtain the candidate resource features.
In the embodiment of the present disclosure, the candidate visual features, the candidate audio features, and the candidate text features may be input into the resource feature fusion sub-network and fused to obtain the candidate resource features. Feature fusion may include, but is not limited to, feature concatenation and the like.
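Feature concatenation, the one fusion method named above, can be sketched as follows. The per-modality dimensions (128, 64, 32) and the random values are assumptions for illustration; in practice the vectors would come from the visual, audio, and text sub-networks.

```python
import numpy as np


def fuse_by_concatenation(visual, audio, text):
    """Concatenate per-modality feature vectors into one candidate resource feature."""
    parts = [np.asarray(p, dtype=np.float32).ravel() for p in (visual, audio, text)]
    return np.concatenate(parts)


rng = np.random.default_rng(0)
candidate_visual = rng.normal(size=128)  # hypothetical dimension
candidate_audio = rng.normal(size=64)    # hypothetical dimension
candidate_text = rng.normal(size=32)     # hypothetical dimension

candidate_resource_feature = fuse_by_concatenation(
    candidate_visual, candidate_audio, candidate_text
)
```

Concatenation keeps every modality's information intact and leaves it to later layers to learn cross-modal interactions.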
S2035: Input the target attribute feature and the candidate resource features into the multi-head attention network for fusion to obtain the candidate comprehensive features.
In the present embodiment, the multi-head attention network (Multi-head Attention layer) performs sequence feature encoding using a self-attention mechanism, which is one kind of attention mechanism. Like conventional attention mechanisms, self-attention can focus more on the key information in the input, so the corresponding video content features can be extracted better. Self-attention can be regarded as the special case of Multi-head Attention in which the input data are the same.
The comprehensive feature extraction network further includes a feature fusion network, and the method further includes:
inputting the target attribute feature and the candidate resource feature corresponding to each information stream into the feature fusion network, and fusing the candidate resource feature corresponding to each information stream with the target attribute feature to obtain the candidate fusion feature corresponding to each information stream.
In this embodiment of the present disclosure, the inputting the target attribute feature and the candidate resource feature into the multi-head attention network, and performing a fusion process to obtain the candidate integrated feature includes:
inputting the candidate fusion feature corresponding to each information stream into the multi-head attention network and performing attention prediction to obtain the attention result corresponding to each candidate fusion feature;
In some embodiments, the multi-head attention network includes a plurality of head attention networks, and inputting the candidate fusion feature corresponding to each information stream into the multi-head attention network and performing attention prediction to obtain the attention result corresponding to each candidate fusion feature includes:
inputting the candidate fusion feature corresponding to each information stream into each head attention network and performing attention prediction to obtain the attention result corresponding to each candidate fusion feature.
obtaining the candidate comprehensive features based on the attention result corresponding to each candidate fusion feature.
In this embodiment of the present disclosure, the inputting the candidate fusion features corresponding to each information flow into each head attention network to perform attention prediction, to obtain an attention result corresponding to each candidate fusion feature includes:
inputting the candidate fusion feature corresponding to each information stream into the linear transformation layer of the multi-head attention network, and performing linear transformation on the candidate fusion feature to obtain the preset query matrix, preset key matrix, and preset value matrix corresponding to the candidate fusion feature;
inputting the preset query matrix, the preset key matrix, and the preset value matrix into the plurality of head networks of the multi-head attention network, determining, based on each head network, the weights corresponding to the preset query matrix, the preset key matrix, and the preset value matrix respectively, and outputting a weighted query matrix, a weighted key matrix, and a weighted value matrix;
In the embodiment of the present specification, the product of the preset query matrix and its corresponding weight in each head network may be calculated to obtain a first product, and the sum of the first products corresponding to the preset query matrix gives the weighted query matrix; the product of the preset key matrix and its corresponding weight in each head network may be calculated to obtain a second product, and the sum of the second products corresponding to the preset key matrix gives the weighted key matrix; and the product of the preset value matrix and its corresponding weight in each head network may be calculated to obtain a third product, and the sum of the third products corresponding to the preset value matrix gives the weighted value matrix.
inputting the weighted query matrix, weighted key matrix, and weighted value matrix corresponding to each head network into the concatenation layer of the multi-head attention network, and concatenating them to obtain the concatenated matrix corresponding to each head network;
inputting the concatenated matrix and the original matrix corresponding to each head network into the feature fusion layer of the multi-head attention network, and determining the product of the concatenated matrix and the original matrix corresponding to each head network to obtain the attention result corresponding to each head network, where the original matrix corresponding to each head network is the concatenation of the preset query matrix, preset key matrix, and preset value matrix corresponding to that head network; and obtaining the attention result corresponding to each information stream from the attention results corresponding to the head networks.
In the embodiment of the present specification, Multi-head Attention accepts three sequences: a preset query matrix (query), a preset key matrix (key), and a preset value matrix (value). The key and value sequences have the same length, while the query sequence length may differ from theirs. The output sequence length of Multi-head Attention equals the input query sequence length. Denote the query length by Lq and the key and value length by Lk. In addition, the feature lengths (the dimension dim of each element) of the input sequences query, key, and value may differ; denote the dim of the three sequences by Dq, Dk, and Dv, respectively. After these sequences are input into Multi-head Attention, the dim of the internal sequences may differ from Dq, Dk, and Dv; it is called the embedding dimension, denoted De, and the dim of the output sequence is also De. Multi-head Attention is formed by combining one or more parallel unit structures, each of which is called a head (a single head is here called One-head Attention; in the broad sense, when the number of heads is 1 it is still Multi-head Attention), so Multi-head Attention consists of several One-head Attention units. Suppose a Multi-head Attention has n heads and the weights of the i-th head are $W_i^Q$, $W_i^K$, and $W_i^V$; then:

$$\mathrm{head}_i = \mathrm{Attention}(q W_i^Q,\; k W_i^K,\; v W_i^V)$$

$$\mathrm{MultiHead}(q, k, v) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_n)\, W^O$$

The process is as follows: the q, k, and v matrices are input to each One-head Attention; the output matrices of the heads are concatenated along the feature (dim) dimension to obtain a new matrix; and the new matrix is multiplied by the matrix $W^O$ to obtain the output (in practice this may also be a fully connected (Linear) layer), where $W^O$ is the original input matrix and the output shape is still (Lq, De).
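The two formulas above can be sketched in NumPy as follows. The text does not define the Attention function itself, so the widely used scaled dot-product form is assumed here. The sizes, the per-head dimension `d_head = De / n`, and the randomly initialized matrices (including `w_o`, which stands in for $W^O$ as a plain matrix of shape (n·d_head, De)) are illustrative assumptions only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention (assumed form): softmax(q k^T / sqrt(d)) v."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def multi_head_attention(q, k, v, w_q, w_k, w_v, w_o):
    """q: (Lq, Dq), k: (Lk, Dk), v: (Lk, Dv); w_q, w_k, w_v are per-head lists."""
    heads = [
        attention(q @ wq_i, k @ wk_i, v @ wv_i)  # head_i, shape (Lq, d_head)
        for wq_i, wk_i, wv_i in zip(w_q, w_k, w_v)
    ]
    # Concatenate along the feature dimension, then multiply by W^O.
    return np.concatenate(heads, axis=-1) @ w_o


rng = np.random.default_rng(0)
Lq, Lk, Dq, Dk, Dv, De, n_heads = 3, 5, 8, 6, 7, 16, 4  # hypothetical sizes
d_head = De // n_heads
q = rng.normal(size=(Lq, Dq))
k = rng.normal(size=(Lk, Dk))
v = rng.normal(size=(Lk, Dv))
w_q = [rng.normal(size=(Dq, d_head)) for _ in range(n_heads)]
w_k = [rng.normal(size=(Dk, d_head)) for _ in range(n_heads)]
w_v = [rng.normal(size=(Dv, d_head)) for _ in range(n_heads)]
w_o = rng.normal(size=(n_heads * d_head, De))

out = multi_head_attention(q, k, v, w_q, w_k, w_v, w_o)
```

Note that the query length Lq differs from the key and value length Lk, and that Dq, Dk, and Dv all differ, yet the output keeps the shape (Lq, De), matching the description above.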
In some embodiments, the obtaining the candidate integrated feature based on the attention result corresponding to each candidate fusion feature includes:
inputting the attention result corresponding to each information stream into the compression excitation network (squeeze-and-excitation network, SE Net) of the resource recommendation model, and determining the weight of each attention result based on the compression excitation network; and determining the candidate comprehensive features based on the attention result corresponding to each information stream and the weight corresponding to each attention result.
In the embodiment of the present specification, the attention result corresponding to each information stream may be input into the compression excitation network (SE Net) of the resource recommendation model, and the weight of each attention result is determined based on the compression excitation network; finally, the product of each attention result and its corresponding weight is calculated to obtain a weighted fusion feature, and the candidate comprehensive features are determined from the plurality of weighted fusion features.
In an exemplary embodiment, FIG. 5 is a schematic structural diagram of the resource recommendation model. The candidate multimedia resource includes n information streams, each corresponding to one piece of candidate resource information: first candidate resource information, second candidate resource information, ..., n-th candidate resource information. The resource recommendation model includes a feature extraction layer, a feature fusion layer, a multi-head attention network, a compression excitation (squeeze-and-excitation) network, a fully connected layer, and a plurality of task prediction networks. The feature extraction layer extracts the features of the target object attribute and of each piece of candidate resource information; the feature of each piece of candidate resource information is then input into the feature fusion layer together with the feature of the target object attribute and fused, yielding a plurality of feature fusion results. Each feature fusion result is input into the multi-head attention network to predict an attention result; the compression excitation network predicts the weight of each attention result, giving the candidate comprehensive features of the candidate multimedia resource; these are processed by the fully connected layer and then input into the plurality of task prediction networks to obtain a plurality of task prediction results.
In an exemplary embodiment, FIG. 6 is a schematic structural diagram of an SE Net, which includes a feature embedding layer, an attention mechanism layer, and an attention embedding layer. SE Net is a deep learning model for image classification that introduces an attention mechanism (here it is applied to information fusion in the recommendation system). Its core goal is to adaptively learn the importance of each channel and thereby improve model performance: the importance of each feature channel is obtained automatically through learning, and according to that importance the useful features are enhanced and the features of little use to the current task are suppressed. The core idea is that the network learns the feature weights from the loss, so that effective feature maps receive large weights and ineffective or marginally effective feature maps receive small weights, which trains the model to a better result. SE Net adds nonlinearity and can better fit complex correlations among channels while greatly reducing the number of parameters and the amount of computation; its structure is very simple and easy to deploy, and it does not require introducing new functions or layers.
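A minimal sketch of the squeeze-and-excitation weighting applied to the per-stream attention results is given below. It treats each information stream as one "channel"; the mean-pooling squeeze, the two-layer ReLU/sigmoid excitation, the reduced width, and the random weights `w1` and `w2` follow the common SE design and are assumptions rather than details given in the text. How the weighted features are finally combined into the candidate comprehensive features is not specified, so the sketch stops at the weighted fusion features.

```python
import numpy as np


def squeeze_excite(attention_results, w1, w2):
    """attention_results: (n, D), one row per information stream."""
    squeezed = attention_results.mean(axis=1)        # squeeze: one scalar per stream, (n,)
    hidden = np.maximum(0.0, squeezed @ w1)          # excitation layer 1 with ReLU, (r,)
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # excitation layer 2 with sigmoid, (n,)
    weighted = attention_results * weights[:, None]  # scale each attention result
    return weights, weighted


rng = np.random.default_rng(0)
n_streams, dim, reduced = 6, 16, 3                   # hypothetical sizes
attention_results = rng.normal(size=(n_streams, dim))
w1 = rng.normal(size=(n_streams, reduced))           # stand-in for learned parameters
w2 = rng.normal(size=(reduced, n_streams))           # stand-in for learned parameters

weights, weighted_fusion_features = squeeze_excite(attention_results, w1, w2)
```

Because the weights come from a sigmoid, each stream is scaled by a factor between 0 and 1, so streams judged less useful are suppressed rather than removed.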
S205: Input the candidate comprehensive features into the first task prediction network of the resource recommendation model for play information prediction to obtain a play information prediction result; the play information prediction result characterizes the play result corresponding to the candidate multimedia resource in the case that the target object interacts with the candidate multimedia resource.
In this embodiment of the present disclosure, there are at least two first task prediction networks, and inputting the candidate comprehensive features into the first task prediction network of the resource recommendation model for play information prediction to obtain the play information prediction result includes:
inputting the candidate comprehensive features into the at least two first task prediction networks for play information prediction to obtain the play information prediction results corresponding to the at least two first task prediction networks.
In this embodiment of the present disclosure, there may be at least two first task prediction networks, where each first task prediction network corresponds to one play information prediction result, so that a plurality of play information prediction results may be obtained. The at least two first task prediction networks may include, but are not limited to, a play duration task prediction network and a play completion task prediction network, and the obtained play information prediction results may include a play duration prediction result and a play completion prediction result.
In this embodiment of the present disclosure, as shown in fig. 7, the inputting the candidate integrated feature into at least two first task prediction networks to perform play information prediction, to obtain play information prediction results corresponding to the at least two first task prediction networks, includes:
s2051: inputting the candidate comprehensive characteristics into a playing time task prediction network to predict the playing time so as to obtain a playing time prediction result;
in this embodiment of the present disclosure, the play duration task prediction network may be a full-connection layer (FC layer), and the play duration prediction result may be obtained by performing play duration prediction on the candidate comprehensive feature input through the play duration task prediction network. In the application process, the playing duration can be divided into a plurality of categories according to a time range, for example, 5-10 minutes is one category, 10-30 minutes is one category, and in the prediction process, the playing duration prediction result can represent the category corresponding to the playing duration. For example, the candidate multimedia resource may include n information flows, where each information flow corresponds to a candidate integrated feature, and the play duration prediction result may be represented by the following formula:
where n is the number of information flows and candidate comprehensive features, and playtime_i is the play duration.
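The division of play duration into time-range categories can be illustrated with a minimal sketch. The function name `duration_category` and the bucket edges are assumptions made for illustration; only the 5-10 minute and 10-30 minute ranges come from the description.

```python
from bisect import bisect_right

# Hypothetical bucket edges in minutes. Only the 5-10 and 10-30 minute
# ranges appear in the description; the outer buckets are illustrative.
DURATION_EDGES_MIN = [5, 10, 30]


def duration_category(play_minutes: float) -> int:
    """Map a play duration to the index of its time-range category.

    Category 0: under 5 min, 1: [5, 10), 2: [10, 30), 3: 30 min or more.
    """
    if play_minutes < 0:
        raise ValueError("play duration must be non-negative")
    return bisect_right(DURATION_EDGES_MIN, play_minutes)
```

A category index produced this way could serve as the class label that the play duration task prediction network is trained to predict.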
S2053: inputting the candidate comprehensive characteristics into a play completion task prediction network to predict the play completion, so as to obtain a play completion prediction result;
In this embodiment of the present disclosure, the play completion task prediction network may be a fully connected layer (FC layer); inputting the candidate comprehensive feature into the play completion task prediction network for play completion prediction yields the play completion prediction result. In application, the play completion degree may be divided into a plurality of categories by numerical range, for example a completion degree of 5-10% as one category and a completion degree of 10-30% as another; during prediction, the play completion prediction result may then represent the category corresponding to the play completion degree. For example, the candidate multimedia resource may include n information flows, each corresponding to one candidate comprehensive feature, and the play completion prediction result may be represented by the following formula:
where n is the number of information flows and candidate comprehensive features, playtime is the play completion degree, and cmpl_i is the weight corresponding to the play completion degree.
S2055: and determining the play duration prediction result and the play completion degree prediction result as the play information prediction result.
In the embodiment of the present disclosure, the play duration prediction result and the play completion degree prediction result may be both determined as the play information prediction result.
In the embodiment of the present specification, the play duration task prediction network and the play completion task prediction network may each correspond to a regression task. Predicting multiple kinds of play information through a plurality of first task prediction networks increases the diversity of the play information prediction results and thereby improves the accuracy of the recommendation result.
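Steps S2051 to S2055 can be sketched as two fully connected heads that read the same candidate comprehensive feature. This is an illustrative sketch only: the class `FCHead`, the function `predict_play_info`, the random initialization, and the use of a softmax over categories (rather than a regression output) are all assumptions, not details taken from the description.

```python
import math
import random
from typing import Dict, List, Sequence


class FCHead:
    """One fully connected layer followed by a softmax over categories."""

    def __init__(self, in_dim: int, num_classes: int, seed: int) -> None:
        rng = random.Random(seed)
        # Untrained, randomly initialized weights (illustration only).
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(num_classes)]
        self.bias = [0.0] * num_classes

    def __call__(self, feature: Sequence[float]) -> List[float]:
        logits = [sum(w * x for w, x in zip(row, feature)) + b
                  for row, b in zip(self.weights, self.bias)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


def predict_play_info(feature: Sequence[float],
                      duration_head: FCHead,
                      completion_head: FCHead) -> Dict[str, List[float]]:
    """Run both play-information heads on one shared feature (S2051-S2055)."""
    return {
        "play_duration": duration_head(feature),
        "play_completion": completion_head(feature),
    }
```

Because both heads consume the same feature vector, the feature extraction cost is paid once per candidate regardless of how many task heads are attached.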
S207: inputting the candidate comprehensive characteristics into a second task prediction network of the resource recommendation model to perform feedback information prediction to obtain a feedback information prediction result; and the feedback information prediction result represents the feedback information of the target object aiming at the candidate multimedia resource.
In this embodiment of the present disclosure, there are at least two second task prediction networks, and the inputting the candidate comprehensive features into the second task prediction network of the resource recommendation model to perform feedback information prediction, so as to obtain a feedback information prediction result, includes:
inputting the candidate comprehensive features into the at least two second task prediction networks to perform feedback information prediction, so as to obtain feedback information prediction results corresponding to the at least two second task prediction networks.
In this embodiment of the present disclosure, there may be at least two second task prediction networks, where each second task prediction network corresponds to one feedback information prediction result, so that a plurality of feedback information prediction results may be obtained. The at least two second task prediction networks may include, but are not limited to, a fast-slip rate task prediction network, a sharing rate task prediction network, a praise rate task prediction network, an attention rate task prediction network, and a comment rate task prediction network; the feedback information prediction results obtained may include a fast-slip rate prediction result, a sharing rate prediction result, a praise rate prediction result, an attention rate prediction result, and a comment rate prediction result.
In some embodiments, the tasks corresponding to the second task prediction network may be regression tasks or classification tasks, as shown in fig. 8, where the inputting the candidate comprehensive features into at least two second task prediction networks, performing feedback information prediction, and obtaining feedback information prediction results corresponding to the at least two second task prediction networks respectively, includes:
S2071: inputting the candidate comprehensive characteristics into a fast-slip rate task prediction network to perform fast-slip rate prediction to obtain a fast-slip rate prediction result; the fast slip rate prediction result represents the frequency of executing a preset interaction instruction by the target object aiming at the candidate multimedia resource; the preset interaction instruction is an instruction that the interaction time is smaller than a preset threshold value;
In the embodiment of the present disclosure, the fast-slip rate may reflect the degree of interest of the target object in the candidate multimedia resource. For example, if the fast-slip rate exceeds a preset threshold, it may be determined that the target object is not interested in the candidate multimedia resource; alternatively, the fast-slip rate may be divided into a plurality of levels, with different levels representing different degrees of interest of the target object in the candidate multimedia resource. If the task corresponding to the fast-slip rate task prediction network is a regression task, the fast-slip rate prediction result may be the probability that the target object triggers an operation such as a fast interaction or an exit on the page; for example, an interaction made after browsing the current page for only 1 second, 3 seconds or 5 seconds counts as a fast interaction. If the task corresponding to the fast-slip rate task prediction network is a classification task, the fast-slip rate prediction result is a classification result corresponding to the fast-slip rate; for example, the fast-slip rate may be the proportion of 1-second, 3-second or 5-second fast-slip behaviors within one session, or a ranked percentile, such as the top 10% or 20%, among all fast-slip events.
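The session-level proportion mentioned above can be computed as in the following minimal sketch. The function name `fast_slip_rate`, the dwell-time representation, and the default 3-second threshold are illustrative assumptions; the description only states that the interaction time is below a preset threshold.

```python
from typing import Sequence


def fast_slip_rate(dwell_seconds: Sequence[float],
                   threshold_s: float = 3.0) -> float:
    """Share of interactions in one session whose dwell time is below the threshold.

    dwell_seconds holds, for each resource shown in the session, how long the
    target object stayed before swiping away (hypothetical input format).
    """
    if not dwell_seconds:
        return 0.0
    fast = sum(1 for t in dwell_seconds if t < threshold_s)
    return fast / len(dwell_seconds)
```

For a session with dwell times of 1.0, 2.5, 10.0 and 40.0 seconds and a 3-second threshold, two of the four interactions are fast, giving a rate of 0.5.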
S2073: inputting the candidate comprehensive features into a sharing rate task prediction network to perform sharing rate prediction, so as to obtain a sharing rate prediction result;
in the embodiment of the present disclosure, the sharing rate is the probability that the target object shares the candidate multimedia resource; the sharing rate prediction result may also characterize the interest level of the target object in the candidate multimedia resources.
S2075: inputting the candidate comprehensive characteristics into a praise rate task prediction network, and predicting the praise rate to obtain a praise rate prediction result;
in the embodiment of the present specification, the praise rate is the probability that the target object praise the candidate multimedia resource; the praise rate prediction result can also characterize the interest degree of the target object in the candidate multimedia resources.
S2077: inputting the candidate comprehensive features into an attention rate task prediction network to perform attention rate prediction, so as to obtain an attention rate prediction result;
in the embodiment of the present specification, the attention rate is the probability that the target object focuses on the corresponding publisher of the candidate multimedia resource; the attention rate prediction result may also characterize the degree of interest of the target object in the candidate multimedia asset.
S2079: inputting the candidate comprehensive features into a comment rate task prediction network to perform comment rate prediction to obtain a comment rate prediction result;
In the embodiment of the specification, the comment rate is the probability that the target object issues positive comments or negative comments for the candidate multimedia resources; the comment rate prediction result may also characterize the interest level of the target object in the candidate multimedia asset.
S20711: and determining the fast slip rate prediction result, the sharing rate prediction result, the praise rate prediction result, the attention rate prediction result and the comment rate prediction result as the feedback information prediction result.
In the embodiment of the present specification, the feedback information prediction result may include, but is not limited to, the fast-slip rate prediction result, the sharing rate prediction result, the praise rate prediction result, the attention rate prediction result, and the comment rate prediction result.
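Steps S2071 to S20711 can be sketched as five independent heads over the shared candidate comprehensive feature, each producing one probability. The task keys, the single-layer sigmoid heads, and the function `predict_feedback` are assumptions for illustration; the description does not fix the head architecture.

```python
import math
from typing import Dict, Sequence

# Hypothetical task keys, one per second task prediction network.
FEEDBACK_TASKS = ("fast_slip", "share", "praise", "attention", "comment")


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_feedback(feature: Sequence[float],
                     head_weights: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Apply one linear head per feedback task and collect all results."""
    missing = [task for task in FEEDBACK_TASKS if task not in head_weights]
    if missing:
        raise KeyError(f"no head weights for tasks: {missing}")
    return {
        task: sigmoid(sum(w * x for w, x in zip(head_weights[task], feature)))
        for task in FEEDBACK_TASKS
    }
```

The returned dictionary plays the role of the feedback information prediction result, with one entry per second task prediction network.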
S209: obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result; the target recommendation result characterizes the probability of recommending the candidate multimedia resource to the target object.
In this embodiment of the present disclosure, the obtaining the target recommendation result based on the play information prediction result and the feedback information prediction result includes:
and obtaining the target recommendation result based on the play information prediction results corresponding to the at least two first task prediction networks and the feedback information prediction results corresponding to the at least two second task prediction networks.
In the embodiment of the present disclosure, a comprehensive prediction result may be determined from each play information prediction result and each feedback information prediction result, and the target recommendation result may be determined from the comprehensive prediction result. For example, if the comprehensive prediction result is smaller than a preset value, the target recommendation result indicates that the candidate multimedia resource is not recommended to the target object; if the comprehensive prediction result is greater than or equal to the preset value, the target recommendation result indicates that the candidate multimedia resource is recommended to the target object.
In this embodiment of the present disclosure, as shown in fig. 9, the obtaining the target recommendation result based on the play information prediction result and the feedback information prediction result includes:
S2091: determining recommendation parameters corresponding to each candidate multimedia resource based on the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource;
In the embodiment of the present disclosure, the recommendation parameter may represent the probability of recommending a candidate multimedia resource to the target object. The prediction result output by each network may be converted into a forward feedback result, and a weighted sum of the forward feedback results may be calculated to obtain the recommendation parameter.
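The conversion to forward feedback results followed by a weighted sum can be sketched as below. Treating the fast-slip rate as the only negative signal and flipping it with 1 - p are assumptions for illustration, as are the names `to_forward_feedback` and `recommendation_parameter`.

```python
from typing import Dict

# Signals where a higher predicted value suggests lower interest (assumption).
NEGATIVE_SIGNALS = {"fast_slip"}


def to_forward_feedback(predictions: Dict[str, float]) -> Dict[str, float]:
    """Flip negative signals so that a larger value always means more interest."""
    return {
        name: (1.0 - p if name in NEGATIVE_SIGNALS else p)
        for name, p in predictions.items()
    }


def recommendation_parameter(predictions: Dict[str, float],
                             weights: Dict[str, float]) -> float:
    """Weighted sum of the forward feedback results for one candidate."""
    forward = to_forward_feedback(predictions)
    return sum(weights[name] * value for name, value in forward.items())
```

With predictions of 0.2 for the fast-slip rate and 0.5 for the sharing rate, and a weight of 0.5 on each, the forward results are 0.8 and 0.5 and the recommendation parameter is 0.65.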
In an embodiment of the present disclosure, the method further includes:
acquiring a first weight corresponding to the play information prediction result and a second weight corresponding to the feedback information prediction result;
In this embodiment of the present disclosure, the first weight corresponding to the play information prediction result and the second weight corresponding to the feedback information prediction result may be preset, or may be determined according to a preset algorithm. The preset algorithm may include, but is not limited to, a pairwise method (Pairwise), a listwise method (Listwise), and the like.
In some embodiments, as shown in fig. 10, the determining the recommendation parameter corresponding to each candidate multimedia resource based on the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource includes:
S20911: determining the product of the play information prediction result corresponding to each candidate multimedia resource and the first weight to obtain a first information prediction result corresponding to each candidate multimedia resource;
S20913: determining the product of the feedback information prediction result corresponding to each candidate multimedia resource and the second weight to obtain a second information prediction result corresponding to each candidate multimedia resource;
S20915: and determining recommended parameters corresponding to each candidate multimedia resource based on the first information prediction result and the second information prediction result corresponding to each candidate multimedia resource.
In this embodiment of the present disclosure, after the first weight and the second weight are determined, a weighted sum of the prediction results may be calculated from the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource, so as to obtain the recommendation parameter corresponding to each candidate multimedia resource; resource recommendation is then performed according to the recommendation parameter.
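The three sub-steps above reduce to two scalings and a sum, as in this minimal sketch. Representing each prediction result as a list of scores and combining the two weighted results by addition are assumptions; the function name `weighted_recommendation_parameter` is hypothetical.

```python
from typing import Sequence


def weighted_recommendation_parameter(play_info: Sequence[float],
                                      feedback: Sequence[float],
                                      first_weight: float,
                                      second_weight: float) -> float:
    """Combine play and feedback predictions for one candidate resource."""
    # Scale the play information prediction results by the first weight.
    first_info = [first_weight * p for p in play_info]
    # Scale the feedback information prediction results by the second weight.
    second_info = [second_weight * f for f in feedback]
    # Combine both weighted results into one recommendation parameter
    # (summation is an assumption of this sketch).
    return sum(first_info) + sum(second_info)
```

For example, play predictions `[0.5, 0.5]` with a first weight of 2.0 and a feedback prediction `[0.25]` with a second weight of 4.0 give a recommendation parameter of 3.0.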
In some embodiments, a listwise method (Listwise) may be used to process the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource to obtain the recommendation parameters, and the plurality of candidate multimedia resources may be ranked by recommendation parameter, so that the multimedia resources to be recommended can be screened from the plurality of candidate multimedia resources according to the ranking result. For example, a preset number of candidate multimedia resources whose recommendation parameters are greater than a preset threshold may be used as the multimedia resources to be recommended.
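The ranking and screening step can be sketched as follows. This covers only the sort-and-filter stage applied once recommendation parameters exist, not a listwise learning-to-rank procedure; the function name `select_resources` and its parameters are illustrative assumptions.

```python
from typing import Dict, List


def select_resources(params: Dict[str, float],
                     threshold: float,
                     max_count: int) -> List[str]:
    """Rank candidates by recommendation parameter and keep the best ones.

    Keeps at most max_count candidates whose parameter exceeds threshold,
    ordered from highest to lowest parameter.
    """
    ranked = sorted(params.items(), key=lambda item: item[1], reverse=True)
    return [rid for rid, score in ranked if score > threshold][:max_count]
```

With parameters `{"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.8}`, a threshold of 0.5 and a preset number of 2, the selected resources are `"a"` and `"d"`.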
In some embodiments, the determining the recommendation parameter corresponding to each candidate multimedia resource based on the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource includes:
Inputting the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource into a reinforcement learning model to obtain a benefit prediction result corresponding to each candidate multimedia resource;
and determining the benefit prediction result corresponding to each candidate multimedia resource as the recommendation parameter corresponding to each candidate multimedia resource.
The training method of the reinforcement learning model comprises the following steps:
obtaining a sample play information prediction result, a sample feedback information prediction result and a sample benefit label corresponding to a sample multimedia resource;
inputting the sample play information prediction result, the sample feedback information prediction result and the sample benefit label into a preset model to obtain a sample benefit prediction result;
training the preset model based on the difference between the sample benefit label and the sample benefit prediction result to obtain the reinforcement learning model.
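Training on the difference between the benefit label and the benefit prediction can be illustrated with a deliberately simplified stand-in. The sketch below fits a linear model by stochastic gradient descent on squared error and uses the benefit label only as the training target; it is a supervised surrogate under these assumptions, not the reinforcement learning model itself, and the names `train_benefit_model` and `predict_benefit` are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict_benefit(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear benefit prediction; the last weight acts as the bias."""
    inputs = list(features) + [1.0]
    return sum(w * x for w, x in zip(weights, inputs))


def train_benefit_model(samples: Sequence[Tuple[Sequence[float], float]],
                        lr: float = 0.1,
                        epochs: int = 500) -> List[float]:
    """Fit weights by gradient descent on the squared prediction error.

    Each sample pairs the concatenated play and feedback prediction results
    with its benefit label (hypothetical data layout).
    """
    dim = len(samples[0][0])
    weights = [0.0] * (dim + 1)
    for _ in range(epochs):
        for features, label in samples:
            inputs = list(features) + [1.0]
            # Difference between the benefit prediction and the benefit label.
            error = predict_benefit(weights, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    return weights
```

After training, `predict_benefit` can supply the benefit prediction result that is used as the recommendation parameter for a candidate.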
In the embodiment of the present specification, reinforcement learning may guide the learning of the attention mechanism by learning an appropriate reward function, so as to maximize the cumulative reward over multiple objectives. Reinforcement learning may also help the recommendation system implement personalized long-term strategies that better adapt to the interests and behavior of the user. By integrating the prediction capabilities for different objectives, for example through reinforcement learning or through weight hyperparameters for the different objectives, a more comprehensive and accurate ranking result can be obtained, improving the overall effect of the recommendation system.
S2093: and determining a target recommendation result corresponding to each candidate multimedia resource based on the recommendation parameter corresponding to each candidate multimedia resource.
In the embodiment of the present disclosure, a target recommendation result of each candidate multimedia resource may be determined according to a recommendation parameter corresponding to each candidate multimedia resource; if the recommendation parameter is smaller than the preset value, the target recommendation result represents that candidate multimedia resources are not recommended to the target object; if the recommendation parameter is greater than or equal to the preset value, the target recommendation result represents that the candidate multimedia resource is recommended to the target object.
In an embodiment of the present disclosure, the method further includes:
determining the multimedia resources to be recommended based on the target recommendation result corresponding to each candidate multimedia resource;
recommending the multimedia resource to be recommended to the target object.
In the embodiment of the present disclosure, the multimedia resources to be recommended may be determined according to the target recommendation result corresponding to each candidate multimedia resource; and determining candidate multimedia resources, which are represented by the target recommendation result and are recommended to the target object, as multimedia resources to be recommended, so that the multimedia resources to be recommended are recommended to the target object, and in the recommendation process, the multimedia resources to be recommended can be sent to a terminal corresponding to the target object, so that the terminal displays the multimedia resources to be recommended.
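The thresholding in S2093 and the subsequent selection of resources to recommend can be sketched as below. Representing the target recommendation result as a boolean and the function names are assumptions made for illustration.

```python
from typing import Dict, List


def target_recommendation_results(params: Dict[str, float],
                                  preset_value: float) -> Dict[str, bool]:
    """True means the candidate is recommended to the target object."""
    return {rid: score >= preset_value for rid, score in params.items()}


def resources_to_recommend(results: Dict[str, bool]) -> List[str]:
    """Keep the candidates whose target recommendation result is positive."""
    return [rid for rid, recommended in results.items() if recommended]
```

The resulting list is what would be sent to the terminal corresponding to the target object for display.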
As can be seen from the technical solutions provided in the embodiments of the present specification, the target object attribute of the target object and the candidate resource information of the candidate multimedia resource are obtained; inputting the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model to perform feature extraction processing to obtain candidate comprehensive features; inputting the candidate comprehensive characteristics into a first task prediction network of a resource recommendation model to predict playing information, and obtaining a playing information prediction result; the play information prediction result represents a play result corresponding to the candidate multimedia resource under the condition that the target object interacts with the candidate multimedia resource; inputting the candidate comprehensive characteristics into a second task prediction network of the resource recommendation model to perform feedback information prediction to obtain a feedback information prediction result; the feedback information prediction result represents feedback information of the target object aiming at the candidate multimedia resources; obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result; the target recommendation result characterizes a probability of recommending candidate multimedia assets to the target object. 
According to the application, for various characteristics of the target object, the same comprehensive characteristic extraction network is adopted for characteristic extraction, so that the construction of a plurality of characteristic extraction networks is avoided, and the calculated amount in the repeated content characteristic extraction process is saved; and then, respectively predicting the extracted characteristics through the two branch task prediction networks to obtain different prediction results, so that a plurality of prediction results corresponding to the multimedia resources can be combined to determine whether to recommend the multimedia resources to the target object, and the recommendation accuracy of the multimedia resources can be improved.
The embodiment of the present disclosure further provides a training method of the resource recommendation model, as shown in fig. 11, where the method includes:
S1101: acquiring sample training information of a sample object; the sample training information comprises sample object attributes of the sample object and sample resource information of a sample multimedia resource with which the sample object has interacted; the sample training information is marked with a sample play information label and a sample feedback information label; the sample play information label represents a play result corresponding to the sample multimedia resource under the interactive operation of the sample object, and the sample feedback information label represents feedback information of the sample object for the sample multimedia resource;
S1103: inputting the sample training information into a preset comprehensive feature extraction network of a preset model to perform feature extraction processing, so as to obtain sample comprehensive features;
S1105: inputting the sample comprehensive features into a first preset task prediction network of the preset model to perform play information prediction, so as to obtain a sample play information prediction result;
S1107: inputting the sample comprehensive features into a second preset task prediction network of the preset model to perform feedback information prediction, so as to obtain a sample feedback information prediction result;
S1109: training the preset model based on the difference between the sample play information prediction result and the sample play information label and the difference between the sample feedback information prediction result and the sample feedback information label to obtain a resource recommendation model.
In this embodiment of the present disclosure, training the preset model based on the difference between the sample play information prediction result and the sample play information label and the difference between the sample feedback information prediction result and the sample feedback information label to obtain a resource recommendation model includes:
determining first loss data based on a difference between the sample play information prediction result and the sample play information tag;
determining second loss data based on a difference between the sample feedback information prediction result and the sample feedback information tag;
determining target loss data based on the first loss data and the second loss data;
training the preset model based on the target loss data to obtain a resource recommendation model.
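The combination of the two losses can be sketched as follows. The description does not specify the loss functions or how they are combined, so the mean squared error for play information, the binary cross-entropy for feedback information, and the weighted sum with coefficients `alpha` and `beta` are all assumptions.

```python
import math
from typing import Sequence


def mse_loss(preds: Sequence[float], labels: Sequence[float]) -> float:
    """First loss data: mean squared error on play information."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)


def bce_loss(probs: Sequence[float], labels: Sequence[float],
             eps: float = 1e-7) -> float:
    """Second loss data: binary cross-entropy on feedback information."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(probs)


def target_loss(first_loss: float, second_loss: float,
                alpha: float = 1.0, beta: float = 1.0) -> float:
    """Target loss data as a weighted sum of the two task losses."""
    return alpha * first_loss + beta * second_loss
```

Because both losses are back-propagated through the shared comprehensive feature extraction network, a single update step trains the shared network and both task heads together.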
In the embodiment of the present specification, the network structure of the preset integrated feature extraction network is the same as the structure of the integrated feature extraction network in the resource recommendation model. The network structure of the first preset task prediction network is the same as the structure of the first task prediction network in the resource recommendation model; the network structure of the second preset task prediction network is the same as the structure of the second task prediction network in the resource recommendation model.
The embodiment of the present disclosure further provides a multimedia resource recommendation device, as shown in fig. 12, where the device includes:
an information obtaining module 1210, configured to obtain a target object attribute of a target object and candidate resource information of a candidate multimedia resource;
a candidate feature determining module 1220, configured to input the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model to perform feature extraction processing, so as to obtain a candidate comprehensive feature;
the play information prediction module 1230 is configured to input the candidate integrated feature into the first task prediction network of the resource recommendation model, and perform play information prediction to obtain a play information prediction result; the play information prediction result represents a play result corresponding to the candidate multimedia resource under the condition that the target object interacts with the candidate multimedia resource;
a feedback information prediction module 1240, configured to input the candidate integrated feature into the second task prediction network of the resource recommendation model, and perform feedback information prediction to obtain a feedback information prediction result; the feedback information prediction result represents feedback information of the target object aiming at the candidate multimedia resource;
A recommendation result determining module 1250, configured to obtain a target recommendation result based on the play information prediction result and the feedback information prediction result; the target recommendation result characterizes a probability of recommending the candidate multimedia resource to the target object.
In an exemplary embodiment, the integrated feature extraction network includes an attribute feature extraction network, a resource feature extraction network, and a multi-head attention network, and the candidate feature determination module includes:
the target feature determining unit is used for inputting the target object attribute into the attribute feature extraction network and carrying out attribute feature extraction processing to obtain a target attribute feature;
a candidate resource feature determining unit, configured to input the candidate resource information into the resource feature extraction network, and perform resource feature extraction processing to obtain a candidate resource feature;
and the candidate comprehensive feature determining unit is used for inputting the target attribute feature and the candidate resource feature into the multi-head attention network to perform fusion processing so as to obtain the candidate comprehensive feature.
In an exemplary embodiment, the information obtaining module includes:
the object attribute acquisition unit is used for acquiring the target object attribute of the target object;
An information flow analysis unit for analyzing the candidate multimedia resources into a plurality of information flows;
the resource acquisition unit is used for acquiring candidate resource information corresponding to each information flow;
in an exemplary embodiment, the integrated feature extraction network further includes a feature fusion network, and the apparatus further includes:
the fusion module is used for inputting the target attribute characteristics and the candidate resource characteristics corresponding to each information flow into the characteristic fusion network, and carrying out fusion processing on the candidate resource characteristics corresponding to each information flow and the target attribute characteristics to obtain candidate fusion characteristics corresponding to each information flow;
the candidate integrated feature determination unit includes:
the attention determining subunit is used for inputting the candidate fusion characteristics corresponding to each information flow into the multi-head attention network, and performing attention prediction processing to obtain an attention result corresponding to each candidate fusion characteristic;
and the characteristic determining subunit is used for obtaining the candidate comprehensive characteristics based on the attention result corresponding to each candidate fusion characteristic.
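The attention step over per-flow fusion features can be illustrated with a simplified sketch. It uses scaled dot-product self-attention with identity projections (no learned query, key, value or output matrices), splits the feature dimension evenly across heads, and mean-pools the attention results into one candidate comprehensive feature; these simplifications and all function names are assumptions, not the described network.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(flows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention with identity projections."""
    dim = len(flows[0])
    scale = math.sqrt(dim)
    attended = []
    for query in flows:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in flows]
        weights = softmax(scores)
        attended.append([sum(w * value[j] for w, value in zip(weights, flows))
                         for j in range(dim)])
    return attended


def multi_head_attention(flows: Sequence[Sequence[float]],
                         num_heads: int) -> List[List[float]]:
    """Attend within each slice of the feature dimension, then concatenate."""
    dim = len(flows[0])
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    heads = [
        self_attention([list(f[h * head_dim:(h + 1) * head_dim]) for f in flows])
        for h in range(num_heads)
    ]
    return [[x for head in heads for x in head[i]] for i in range(len(flows))]


def candidate_comprehensive_feature(flows: Sequence[Sequence[float]],
                                    num_heads: int = 2) -> List[float]:
    """Mean-pool the attention results of all information flows."""
    attended = multi_head_attention(flows, num_heads)
    return [sum(column) / len(attended) for column in zip(*attended)]
```

In a trained model the projections would be learned parameters; the identity projections here only keep the example short and self-contained.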
In an exemplary embodiment, the resource feature extraction network includes a visual feature extraction sub-network, an audio feature extraction sub-network, a text feature extraction sub-network, and a resource feature fusion sub-network, and the candidate resource feature determining unit includes:
A visual feature determining subunit, configured to input the candidate resource information into the visual feature extracting sub-network, and perform visual feature extraction to obtain a candidate visual feature;
an audio feature determining subunit, configured to input the candidate resource information into the audio feature extracting sub-network, and perform audio feature extraction to obtain candidate audio features;
a text feature determining subunit, configured to input the candidate resource information into the text feature extracting sub-network, and perform text feature extraction to obtain candidate text features;
and the feature fusion subunit is used for inputting the candidate visual features, the candidate audio features and the candidate text features into the resource feature fusion sub-network for feature fusion processing to obtain the candidate resource features.
In an exemplary embodiment, there are at least two first task prediction networks and at least two second task prediction networks, and the play information prediction module includes:
the play information prediction unit is used for inputting the candidate comprehensive characteristics into at least two first task prediction networks to perform play information prediction so as to obtain play information prediction results corresponding to the at least two first task prediction networks respectively;
In an exemplary embodiment, the feedback information prediction module includes:
the feedback information prediction unit is used for inputting the candidate comprehensive characteristics into at least two second task prediction networks to perform feedback information prediction so as to obtain feedback information prediction results corresponding to the at least two second task prediction networks respectively;
in an exemplary embodiment, the recommendation result determining module includes:
and the recommending unit is used for obtaining the target recommending result based on the play information predicting result corresponding to each of the at least two first task predicting networks and the feedback information predicting result corresponding to each of the at least two second task predicting networks.
In an exemplary embodiment, the play information prediction unit includes:
the first prediction subunit is used for inputting the candidate comprehensive features into a play duration task prediction network to perform play duration prediction, so as to obtain a play duration prediction result;
the second prediction subunit is used for inputting the candidate comprehensive characteristics into a play completion task prediction network to predict the play completion degree so as to obtain a play completion degree prediction result;
and the result determining subunit is used for determining the play duration prediction result and the play completion degree prediction result as the play information prediction result.
In an exemplary embodiment, the feedback information prediction unit includes:
the fast-sliding rate prediction subunit is used for inputting the candidate comprehensive characteristics into a fast-sliding rate task prediction network to perform fast-sliding rate prediction so as to obtain a fast-sliding rate prediction result; the fast slip rate prediction result represents the frequency of executing a preset interaction instruction by the target object aiming at the candidate multimedia resource; the preset interaction instruction is an instruction that the interaction time is smaller than a preset threshold value;
the sharing rate prediction subunit is used for inputting the candidate comprehensive characteristics into a sharing rate task prediction network to perform sharing rate prediction so as to obtain a sharing rate prediction result;
a praise rate prediction subunit, configured to input the candidate comprehensive features into a praise rate task prediction network, and perform praise rate prediction to obtain a praise rate prediction result;
the attention rate prediction subunit is used for inputting the candidate comprehensive characteristics into an attention rate task prediction network to perform attention rate prediction so as to obtain attention rate prediction results;
the comment rate prediction subunit is used for inputting the candidate comprehensive features into a comment rate task prediction network to perform comment rate prediction so as to obtain a comment rate prediction result;
and a prediction result determining subunit configured to determine the fast-slip rate prediction result, the sharing rate prediction result, the praise rate prediction result, the attention rate prediction result, and the comment rate prediction result as the feedback information prediction result.
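For illustration only, the feedback information prediction unit described above may be sketched as a set of task heads that all read the same candidate comprehensive feature. The head structure (one linear layer followed by a sigmoid), the three-dimensional feature, and every weight value below are assumptions made for this sketch and are not specified by the embodiments.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical head parameters (weights, bias); in practice they would be learned.
HeadParams = Tuple[Sequence[float], float]

FEEDBACK_HEADS: Dict[str, HeadParams] = {
    "fast_slip_rate": ((0.5, -0.2, 0.1), 0.0),
    "sharing_rate": ((-0.3, 0.4, 0.2), -1.0),
    "praise_rate": ((0.1, 0.6, -0.1), -0.5),
    "attention_rate": ((0.2, 0.2, 0.2), -1.5),
    "comment_rate": ((-0.1, 0.3, 0.5), -2.0),
}


def sigmoid(x: float) -> float:
    """Map a real-valued score to a rate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_feedback(features: Sequence[float],
                     heads: Dict[str, HeadParams] = FEEDBACK_HEADS) -> Dict[str, float]:
    """Run every task head on the same candidate comprehensive feature."""
    result = {}
    for name, (weights, bias) in heads.items():
        score = sum(f * w for f, w in zip(features, weights)) + bias
        result[name] = sigmoid(score)
    return result


# The five per-task rates together form the feedback information prediction result.
feedback_prediction = predict_feedback([0.9, 0.1, 0.4])
```

Because each head only consumes the shared feature, a further feedback task could be added in this sketch by adding one entry to the dictionary.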
In an exemplary embodiment, the candidate multimedia resources are at least two, and the recommendation result determining module includes:
the parameter determining unit is used for determining recommended parameters corresponding to each candidate multimedia resource based on the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource;
a recommendation result determining unit, configured to determine a target recommendation result corresponding to each candidate multimedia resource based on a recommendation parameter corresponding to each candidate multimedia resource.
In an exemplary embodiment, the apparatus further includes:
the resource determining module is used for determining the multimedia resources to be recommended based on the target recommendation result corresponding to each candidate multimedia resource;
and the recommending module is used for recommending the multimedia resource to be recommended to the target object.
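One possible way to determine the multimedia resources to be recommended is to rank the candidates by their target recommendation results and keep the highest-ranked ones. The embodiments do not fix a selection rule, so the top-k rule, the function name `select_resources`, and the parameter `top_k` in the following sketch are assumptions.

```python
from typing import Dict, List


def select_resources(target_results: Dict[str, float], top_k: int = 2) -> List[str]:
    """Return the identifiers of the top_k candidates by target recommendation result."""
    ranked = sorted(target_results, key=lambda rid: target_results[rid], reverse=True)
    return ranked[:top_k]


# Hypothetical target recommendation results for three candidate multimedia resources.
to_recommend = select_resources({"video_a": 0.2, "video_b": 0.9, "video_c": 0.5})
```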
In an exemplary embodiment, the apparatus further includes:
the weight determining module is used for obtaining a first weight corresponding to the play information prediction result and a second weight corresponding to the feedback information prediction result;
the parameter determining unit includes:
a first information determining subunit, configured to determine a product of a play information prediction result corresponding to each candidate multimedia resource and the first weight, to obtain a first information prediction result corresponding to each candidate multimedia resource;
a second information determining subunit, configured to determine a product of the feedback information prediction result corresponding to each candidate multimedia resource and the second weight, to obtain a second information prediction result corresponding to each candidate multimedia resource;
and the recommendation parameter determining subunit is used for determining recommendation parameters corresponding to each candidate multimedia resource based on the first information prediction result and the second information prediction result corresponding to each candidate multimedia resource.
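The weighted computation performed by the parameter determining unit can be illustrated with a short sketch. The two products follow the text above; combining them by addition is an assumption, since the embodiments only state that the recommendation parameter is determined based on the first and second information prediction results. The scalar inputs and the function name are likewise illustrative.

```python
def recommendation_parameter(play_prediction: float,
                             feedback_prediction: float,
                             first_weight: float,
                             second_weight: float) -> float:
    """Weight each prediction result, then combine the two weighted results."""
    first_info = play_prediction * first_weight        # first information prediction result
    second_info = feedback_prediction * second_weight  # second information prediction result
    return first_info + second_info                    # assumption: combine by summation


# Hypothetical values: play prediction 0.8, feedback prediction 0.5, weights 0.6 and 0.4.
parameter = recommendation_parameter(0.8, 0.5, first_weight=0.6, second_weight=0.4)
```

With these values the two weighted results are 0.48 and 0.20, so the recommendation parameter is approximately 0.68.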
In an exemplary embodiment, the parameter determining unit includes:
a result input subunit, configured to input a play information prediction result and a feedback information prediction result corresponding to each candidate multimedia resource into a reinforcement learning model, so as to obtain a benefit prediction result corresponding to each candidate multimedia resource;
a parameter determining subunit, configured to determine, as a recommendation parameter corresponding to each candidate multimedia resource, the benefit prediction result corresponding to each candidate multimedia resource;
the apparatus further includes a reinforcement learning model training module, wherein the reinforcement learning model training module is used for acquiring a sample play information prediction result, a sample feedback information prediction result and a sample benefit label corresponding to the sample multimedia resource; inputting the sample play information prediction result, the sample feedback information prediction result and the sample benefit label into a preset model to obtain a sample benefit prediction result; and training the preset model based on the difference between the sample benefit label and the sample benefit prediction result to obtain the reinforcement learning model.
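The training step described above, in which a preset model is adjusted according to the difference between the sample benefit label and the sample benefit prediction result, can be sketched as follows. This sketch substitutes a plain linear model fitted by stochastic gradient descent for the reinforcement learning model, and it uses the label only to compute the difference; both choices, as well as the learning rate and epoch count, are assumptions and are not taken from the embodiments.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float, float]  # (play prediction, feedback prediction, benefit label)


def train_benefit_model(samples: Sequence[Sample],
                        lr: float = 0.1,
                        epochs: int = 500) -> Tuple[float, float, float]:
    """Fit benefit = w_play * play + w_fb * feedback + bias by reducing the squared difference."""
    w_play, w_fb, bias = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for play, feedback, label in samples:
            prediction = w_play * play + w_fb * feedback + bias
            diff = prediction - label  # difference between prediction and label
            w_play -= lr * diff * play
            w_fb -= lr * diff * feedback
            bias -= lr * diff
    return w_play, w_fb, bias


# Hypothetical samples whose labels follow benefit = 0.7 * play + 0.3 * feedback.
samples = [(1.0, 0.0, 0.7), (0.0, 1.0, 0.3), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]
w_play, w_fb, bias = train_benefit_model(samples)
```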
The apparatus embodiments described above and the method embodiments are based on the same inventive concept.
The embodiment of the present disclosure further provides a training device for a resource recommendation model, as shown in fig. 13, where the device includes:
a sample information obtaining module 1310, configured to obtain sample training information of a sample object; the sample training information comprises sample object attributes of the sample objects and sample resource information of sample multimedia resources interacted with the sample objects; the sample training information is marked with a sample playing information label and a sample feedback information label; the sample playing information tag represents a playing result corresponding to the sample multimedia resource under the interactive operation of the sample object, and the sample feedback information tag represents feedback information of the sample object for the sample multimedia resource;
the sample comprehensive feature determining module 1320 is configured to input the sample training information into a preset comprehensive feature extraction network of a model to be trained, and perform feature extraction processing to obtain sample comprehensive features;
the sample playing information prediction module 1330 is configured to input the sample comprehensive feature into a first preset task prediction network of the model to be trained, and perform playing information prediction to obtain a sample playing information prediction result;
the sample feedback information prediction module 1340 is configured to input the sample comprehensive features into a second preset task prediction network of the model to be trained, and perform feedback information prediction to obtain a sample feedback information prediction result;
the model training module 1350 is configured to train the model to be trained, based on the difference between the sample play information prediction result and the sample play information label and the difference between the sample feedback information prediction result and the sample feedback information label, to obtain a resource recommendation model.
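The training objective used by the model training module combines two differences, one per task branch. A minimal sketch of such a joint objective is given below; the squared error for the play information branch, the binary cross-entropy for the feedback information branch, and the unweighted sum are assumptions, since the embodiments do not name specific loss functions.

```python
import math


def squared_error(prediction: float, label: float) -> float:
    """Difference measure assumed for a regression target such as play completion degree."""
    return (prediction - label) ** 2


def binary_cross_entropy(prediction: float, label: float, eps: float = 1e-7) -> float:
    """Difference measure assumed for a binary feedback label such as shared / not shared."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def joint_loss(play_prediction: float, play_label: float,
               feedback_prediction: float, feedback_label: float) -> float:
    """Sum the play-branch and feedback-branch differences into one training objective."""
    return (squared_error(play_prediction, play_label)
            + binary_cross_entropy(feedback_prediction, feedback_label))


# Hypothetical sample: play completion predicted 0.5 (label 0.5), sharing predicted 0.5 (label 1).
loss = joint_loss(0.5, 0.5, 0.5, 1.0)
```

Minimizing one summed objective lets the gradients of both branches update the shared feature extraction network during training.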
The apparatus embodiments described above and the method embodiments are based on the same inventive concept.
The embodiments of this specification provide an electronic device, which includes a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the multimedia resource recommendation method provided in the method embodiments.
The embodiments of the application also provide a computer storage medium, which can be arranged in a terminal to store at least one instruction or at least one program related to the multimedia resource recommendation method in the method embodiments, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the multimedia resource recommendation method provided in the method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device implements the multimedia resource recommendation method provided by the above method embodiments.
Optionally, in the embodiments of this specification, the storage medium may be located in at least one network server among a plurality of network servers of a computer network. Optionally, in these embodiments, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
The memory according to the embodiments of the present disclosure may be used to store software programs and modules, and the processor executes the software programs and modules stored in the memory to perform various functional applications and data processing. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor.
The embodiments of the multimedia resource recommendation method provided in the embodiments of the present disclosure may be implemented in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking operation on a server as an example, fig. 14 is a block diagram of the hardware structure of a server for a multimedia resource recommendation method according to the embodiment of the present disclosure. As shown in fig. 14, the server 1400 may vary considerably in configuration or performance and may include one or more central processing units (CPUs) 1410 (the central processing unit 1410 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 1430 for storing data, and one or more storage media 1420 (e.g., one or more mass storage devices) storing applications 1423 or data 1422. The memory 1430 and the storage medium 1420 may be transitory or persistent storage. The program stored on the storage medium 1420 may include one or more modules, each of which may include a series of instruction operations on the server. Still further, the central processing unit 1410 may be configured to communicate with the storage medium 1420 and execute, on the server 1400, the series of instruction operations in the storage medium 1420. The server 1400 may also include one or more power supplies 1460, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1440, and/or one or more operating systems 1421, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The input/output interface 1440 may be used to receive or transmit data via a network. A specific example of the network may include a wireless network provided by a communication provider of the server 1400. In one example, the input/output interface 1440 includes a network adapter (Network Interface Controller, NIC) that may connect to other network devices through a base station to communicate with the internet. In one example, the input/output interface 1440 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 14 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, server 1400 may also include more or fewer components than shown in fig. 14, or have a different configuration than shown in fig. 14.
As can be seen from the embodiments of the method, apparatus, device or storage medium for recommending multimedia resources provided by the present application, the present application obtains the target object attribute of the target object and the candidate resource information of the candidate multimedia resources;
inputting the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model to perform feature extraction processing to obtain candidate comprehensive features; inputting the candidate comprehensive characteristics into a first task prediction network of a resource recommendation model to predict playing information, and obtaining a playing information prediction result; the play information prediction result represents a play result corresponding to the candidate multimedia resource under the condition that the target object interacts with the candidate multimedia resource; inputting the candidate comprehensive characteristics into a second task prediction network of the resource recommendation model to perform feedback information prediction to obtain a feedback information prediction result; the feedback information prediction result represents feedback information of the target object aiming at the candidate multimedia resources; obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result; the target recommendation result characterizes a probability of recommending candidate multimedia assets to the target object. 
According to the application, the various features associated with the target object are extracted by one shared comprehensive feature extraction network, which avoids building multiple feature extraction networks and saves the computation that repeated feature extraction would otherwise require; the extracted features are then fed to the two task prediction network branches to obtain different prediction results, so that multiple prediction results for a multimedia resource can be combined to determine whether to recommend it to the target object, which can improve the recommendation accuracy of multimedia resources.
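The overall flow summarized above, in which one shared feature extraction pass feeds two prediction branches, may be sketched as follows. The concatenation used as a stand-in for the comprehensive feature extraction network, the callable task heads, and the averaging used to obtain the target recommendation result are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Features = List[float]


def extract_comprehensive_features(object_attributes: Sequence[float],
                                   resource_info: Sequence[float]) -> Features:
    """Stand-in for the comprehensive feature extraction network (plain concatenation)."""
    return list(object_attributes) + list(resource_info)


def target_recommendation_result(object_attributes: Sequence[float],
                                 resource_info: Sequence[float],
                                 play_head: Callable[[Features], float],
                                 feedback_head: Callable[[Features], float]) -> float:
    """Extract features once, run both task branches on them, then combine the results."""
    features = extract_comprehensive_features(object_attributes, resource_info)
    play_result = play_head(features)          # first task prediction network
    feedback_result = feedback_head(features)  # second task prediction network
    return 0.5 * (play_result + feedback_result)  # assumption: simple average


# Hypothetical heads returning fixed values, used only to show the data flow.
result = target_recommendation_result([0.3, 0.7], [0.1, 0.9],
                                      play_head=lambda f: 0.8,
                                      feedback_head=lambda f: 0.4)
```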
It should be noted that the order of the embodiments in this specification is for description only and does not indicate the relative merits of the embodiments. The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, the apparatus, device, and storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing describes only preferred embodiments of the application and is not intended to limit the application; any modifications, equivalents, and alternatives falling within the spirit and scope of the application are intended to be included within the scope of the application.

Claims (15)

1. A method for recommending multimedia resources, the method comprising:
acquiring target object attributes of a target object and candidate resource information of candidate multimedia resources;
inputting the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model to perform feature extraction processing to obtain candidate comprehensive features;
inputting the candidate comprehensive characteristics into a first task prediction network of the resource recommendation model to predict playing information, and obtaining a playing information prediction result; the play information prediction result represents a play result corresponding to the candidate multimedia resource under the condition that the target object interacts with the candidate multimedia resource;
inputting the candidate comprehensive characteristics into a second task prediction network of the resource recommendation model to perform feedback information prediction to obtain a feedback information prediction result; the feedback information prediction result represents feedback information of the target object aiming at the candidate multimedia resource;
obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result; the target recommendation result characterizes a probability of recommending the candidate multimedia resource to the target object.
2. The method according to claim 1, wherein the integrated feature extraction network includes an attribute feature extraction network, a resource feature extraction network, and a multi-head attention network, the inputting the target object attribute and the candidate resource information into the integrated feature extraction network of the resource recommendation model performs feature extraction processing to obtain candidate integrated features, including:
inputting the target object attribute into the attribute feature extraction network, and carrying out attribute feature extraction processing to obtain a target attribute feature;
inputting the candidate resource information into the resource feature extraction network, and carrying out resource feature extraction processing to obtain candidate resource features;
and inputting the target attribute characteristics and the candidate resource characteristics into the multi-head attention network, and carrying out fusion processing to obtain the candidate comprehensive characteristics.
3. The method according to claim 2, wherein the obtaining the target object attribute of the target object and the candidate resource information of the candidate multimedia resource includes:
acquiring a target object attribute of a target object;
parsing the candidate multimedia asset into a plurality of information streams;
candidate resource information corresponding to each information flow is obtained;
the integrated feature extraction network further includes a feature fusion network, the method further comprising:
inputting the target attribute characteristics and the candidate resource characteristics corresponding to each information flow into the characteristic fusion network, and carrying out fusion processing on the candidate resource characteristics corresponding to each information flow and the target attribute characteristics to obtain candidate fusion characteristics corresponding to each information flow;
inputting the target attribute feature and the candidate resource feature into the multi-head attention network for fusion processing to obtain the candidate comprehensive feature, wherein the method comprises the following steps:
inputting the candidate fusion features corresponding to each information flow into the multi-head attention network, and performing attention prediction processing to obtain an attention result corresponding to each candidate fusion feature;
and obtaining the candidate comprehensive characteristics based on the attention result corresponding to each candidate fusion characteristic.
4. The method according to claim 2, wherein the resource feature extraction network includes a visual feature extraction sub-network, an audio feature extraction sub-network, a text feature extraction sub-network, and a resource feature fusion sub-network, the inputting the candidate resource information into the resource feature extraction network, performing a resource feature extraction process, and obtaining a candidate resource feature includes:
inputting the candidate resource information into the visual feature extraction sub-network, and extracting visual features to obtain candidate visual features;
inputting the candidate resource information into the audio feature extraction sub-network, and extracting audio features to obtain candidate audio features;
inputting the candidate resource information into the text feature extraction sub-network, and extracting text features to obtain candidate text features;
and inputting the candidate visual features, the candidate audio features and the candidate text features into the resource feature fusion sub-network to perform feature fusion processing to obtain the candidate resource features.
5. The method according to claim 1, wherein the number of the first task prediction networks is at least two, the number of the second task prediction networks is at least two, the inputting the candidate integrated feature into the first task prediction network of the resource recommendation model performs play information prediction to obtain a play information prediction result, and the method includes:
inputting the candidate comprehensive characteristics into at least two first task prediction networks, and predicting playing information to obtain playing information prediction results corresponding to the at least two first task prediction networks respectively;
the inputting the candidate comprehensive characteristics into a second task prediction network of the resource recommendation model to perform feedback information prediction to obtain a feedback information prediction result includes:
inputting the candidate comprehensive characteristics into at least two second task prediction networks, and predicting feedback information to obtain feedback information prediction results corresponding to the at least two second task prediction networks respectively;
the obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result includes:
and obtaining the target recommendation result based on the play information prediction results corresponding to the at least two first task prediction networks and the feedback information prediction results corresponding to the at least two second task prediction networks.
6. The method according to claim 5, wherein the inputting the candidate integrated feature into at least two first task prediction networks to perform play information prediction, to obtain play information prediction results corresponding to the at least two first task prediction networks, includes:
inputting the candidate comprehensive features into a playing time task prediction network to predict the playing time so as to obtain a playing time prediction result;
inputting the candidate comprehensive characteristics into a play completion task prediction network to predict the play completion degree, and obtaining a play completion degree prediction result;
and determining the play duration prediction result and the play completion degree prediction result as the play information prediction result.
7. The method according to claim 5, wherein inputting the candidate integrated features into at least two of the second task prediction networks to perform feedback information prediction, and obtaining feedback information prediction results corresponding to the at least two of the second task prediction networks respectively, includes:
inputting the candidate comprehensive characteristics into a fast-slip rate task prediction network to perform fast-slip rate prediction to obtain a fast-slip rate prediction result; the fast-slip rate prediction result characterizes the frequency with which the target object executes a preset interaction instruction for the candidate multimedia resource; the preset interaction instruction is an instruction whose interaction time is smaller than a preset threshold value;
inputting the candidate comprehensive features into a sharing rate task prediction network to perform sharing rate prediction, so as to obtain a sharing rate prediction result;
inputting the candidate comprehensive features into a praise rate task prediction network, and predicting the praise rate to obtain a praise rate prediction result;
inputting the candidate comprehensive features into an attention rate task prediction network, and predicting the attention rate to obtain an attention rate prediction result;
inputting the candidate comprehensive features into a comment rate task prediction network to perform comment rate prediction to obtain a comment rate prediction result;
and determining the fast-slip rate prediction result, the sharing rate prediction result, the praise rate prediction result, the attention rate prediction result and the comment rate prediction result as the feedback information prediction result.
8. The method of claim 1, wherein the candidate multimedia resources are at least two, the obtaining the target recommendation based on the play information prediction result and the feedback information prediction result comprises:
determining recommendation parameters corresponding to each candidate multimedia resource based on play information prediction results and feedback information prediction results corresponding to each candidate multimedia resource;
determining a target recommendation result corresponding to each candidate multimedia resource based on recommendation parameters corresponding to each candidate multimedia resource;
the method further comprises the steps of:
determining the multimedia resources to be recommended based on the target recommendation result corresponding to each candidate multimedia resource;
and recommending the multimedia resource to be recommended to the target object.
9. The method of claim 8, wherein the method further comprises:
acquiring a first weight corresponding to the play information prediction result and a second weight corresponding to the feedback information prediction result;
the determining, based on the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource, a recommendation parameter corresponding to each candidate multimedia resource includes:
determining the product of the play information prediction result corresponding to each candidate multimedia resource and the first weight to obtain a first information prediction result corresponding to each candidate multimedia resource;
determining the product of the feedback information prediction result corresponding to each candidate multimedia resource and the second weight to obtain a second information prediction result corresponding to each candidate multimedia resource;
and determining recommendation parameters corresponding to each candidate multimedia resource based on the first information prediction result and the second information prediction result corresponding to each candidate multimedia resource.
10. The method of claim 8, wherein determining the recommendation parameter for each candidate multimedia asset based on the play information prediction result and the feedback information prediction result for each candidate multimedia asset comprises:
inputting the play information prediction result and the feedback information prediction result corresponding to each candidate multimedia resource into a reinforcement learning model to obtain a benefit prediction result corresponding to each candidate multimedia resource;
determining the benefit prediction result corresponding to each candidate multimedia resource as the recommendation parameter corresponding to each candidate multimedia resource;
the training method of the reinforcement learning model comprises the following steps:
obtaining a sample play information prediction result, a sample feedback information prediction result and a sample benefit label corresponding to a sample multimedia resource;
inputting the sample play information prediction result, the sample feedback information prediction result and the sample benefit label into a preset model to obtain a sample benefit prediction result;
and training the preset model based on the difference between the sample benefit label and the sample benefit prediction result to obtain the reinforcement learning model.
11. A method for training a resource recommendation model, the method comprising:
acquiring sample training information of a sample object; the sample training information comprises sample object attributes of the sample objects and sample resource information of sample multimedia resources interacted with the sample objects; the sample training information is marked with a sample playing information label and a sample feedback information label; the sample playing information tag represents a playing result corresponding to the sample multimedia resource under the interactive operation of the sample object, and the sample feedback information tag represents feedback information of the sample object for the sample multimedia resource;
inputting the sample training information into a preset comprehensive feature extraction network of a model to be trained, and performing feature extraction processing to obtain sample comprehensive features;
inputting the sample comprehensive characteristics into a first preset task prediction network of the model to be trained, and predicting playing information to obtain a sample playing information prediction result;
inputting the sample comprehensive characteristics into a second preset task prediction network of the model to be trained to perform feedback information prediction, so as to obtain a sample feedback information prediction result;
and training the model to be trained based on the difference between the sample play information prediction result and the sample play information label and the difference between the sample feedback information prediction result and the sample feedback information label to obtain a resource recommendation model.
12. A multimedia asset recommendation device, the device comprising:
the information acquisition module is used for acquiring target object attributes of the target objects and candidate resource information of the candidate multimedia resources;
the candidate feature determining module is used for inputting the target object attribute and the candidate resource information into a comprehensive feature extraction network of a resource recommendation model to perform feature extraction processing to obtain candidate comprehensive features;
the play information prediction module is used for inputting the candidate comprehensive characteristics into a first task prediction network of the resource recommendation model to perform play information prediction so as to obtain a play information prediction result; the play information prediction result represents a play result corresponding to the candidate multimedia resource under the condition that the target object interacts with the candidate multimedia resource;
the feedback information prediction module is used for inputting the candidate comprehensive characteristics into a second task prediction network of the resource recommendation model to perform feedback information prediction so as to obtain a feedback information prediction result; the feedback information prediction result represents feedback information of the target object aiming at the candidate multimedia resource;
the recommendation result determining module is used for obtaining a target recommendation result based on the play information prediction result and the feedback information prediction result; the target recommendation result characterizes a probability of recommending the candidate multimedia asset to the target object.
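Claim 12 does not fix how the two prediction results are combined into the target recommendation result. One simple possibility, sketched here under stated assumptions, is a normalized weighted average of the two predicted probabilities followed by ranking. The function names `fuse_predictions` and `rank_candidates`, the equal default weights, and the example scores are hypothetical and are not taken from the patent.

```python
def fuse_predictions(p_play, p_feedback, w_play=0.5, w_feedback=0.5):
    """Combine two task predictions into one recommendation probability.

    ASSUMPTION: both inputs are probabilities in [0, 1] and the fusion is a
    normalized weighted average; the weights are illustrative only.
    """
    for p in (p_play, p_feedback):
        if not 0.0 <= p <= 1.0:
            raise ValueError("predictions must lie in [0, 1]")
    total = w_play + w_feedback
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_play * p_play + w_feedback * p_feedback) / total


def rank_candidates(predictions, top_k=2):
    """Rank candidates by fused score.

    predictions maps a candidate id to (play prediction, feedback prediction).
    Returns the top_k (candidate id, fused score) pairs, highest score first.
    """
    scored = [(cid, fuse_predictions(p_play, p_fb))
              for cid, (p_play, p_fb) in predictions.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical predictions for three candidate multimedia resources.
PREDICTIONS = {
    "video_a": (0.9, 0.2),
    "video_b": (0.6, 0.7),
    "video_c": (0.1, 0.1),
}
```

With equal weights, a candidate that scores moderately on both tasks (`video_b`) can outrank one that scores high on play but low on feedback (`video_a`); shifting the weights changes that trade-off.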
13. A training device for a resource recommendation model, the device comprising:
a sample information acquisition module, configured to acquire sample training information of a sample object; the sample training information comprises a sample object attribute of the sample object and sample resource information of a sample multimedia resource with which the sample object has interacted; the sample training information is labeled with a sample play information label and a sample feedback information label; the sample play information label represents a play result of the sample multimedia resource under the interactive operation of the sample object, and the sample feedback information label represents feedback information of the sample object for the sample multimedia resource;
a sample comprehensive feature determining module, configured to input the sample training information into a preset comprehensive feature extraction network of a model to be trained and perform feature extraction processing to obtain sample comprehensive features;
a sample play information prediction module, configured to input the sample comprehensive features into a first preset task prediction network of the model to be trained and perform play information prediction to obtain a sample play information prediction result;
a sample feedback information prediction module, configured to input the sample comprehensive features into a second preset task prediction network of the model to be trained and perform feedback information prediction to obtain a sample feedback information prediction result;
a model training module, configured to train the model to be trained based on the difference between the sample play information prediction result and the sample play information label and the difference between the sample feedback information prediction result and the sample feedback information label, to obtain a resource recommendation model.
14. An electronic device, comprising a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the multimedia resource recommendation method according to any one of claims 1-10 or the training method of the resource recommendation model according to claim 11.
15. A computer storage medium having stored therein at least one instruction or at least one program loaded and executed by a processor to implement the multimedia resource recommendation method of any one of claims 1-10 or the training method of the resource recommendation model of claim 11.
CN202310645304.2A 2023-06-01 2023-06-01 Multimedia resource recommendation method, model training method, device and storage medium Pending CN116956183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310645304.2A CN116956183A (en) 2023-06-01 2023-06-01 Multimedia resource recommendation method, model training method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310645304.2A CN116956183A (en) 2023-06-01 2023-06-01 Multimedia resource recommendation method, model training method, device and storage medium

Publications (1)

Publication Number Publication Date
CN116956183A (en) 2023-10-27

Family

ID=88441784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310645304.2A Pending CN116956183A (en) 2023-06-01 2023-06-01 Multimedia resource recommendation method, model training method, device and storage medium

Country Status (1)

Country Link
CN (1) CN116956183A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573925A (en) * 2024-01-15 2024-02-20 腾讯科技(深圳)有限公司 Method and device for determining predicted playing time, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111444428B (en) Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
CN111931062B (en) Training method and related device of information recommendation model
Sun et al. Conversational recommender system
CN111061946B (en) Method, device, electronic equipment and storage medium for recommending scenerized content
CN111859160B (en) Session sequence recommendation method and system based on graph neural network
CN111143684B (en) Artificial intelligence-based generalized model training method and device
CN111625715B (en) Information extraction method and device, electronic equipment and storage medium
CN111949886B (en) Sample data generation method and related device for information recommendation
CN112015928A (en) Information extraction method and device of multimedia resource, electronic equipment and storage medium
CN115659008A (en) Information pushing system and method for big data information feedback, electronic device and medium
US20230237093A1 (en) Video recommender system by knowledge based multi-modal graph neural networks
CN116956183A (en) Multimedia resource recommendation method, model training method, device and storage medium
CN114817692A (en) Method, device and equipment for determining recommended object and computer storage medium
CN116977701A (en) Video classification model training method, video classification method and device
CN116955591A (en) Recommendation language generation method, related device and medium for content recommendation
CN114595370A (en) Model training and sorting method and device, electronic equipment and storage medium
CN113076453A (en) Domain name classification method, device and computer readable storage medium
CN114741587A (en) Article recommendation method, device, medium and equipment
CN114417944B (en) Recognition model training method and device, and user abnormal behavior recognition method and device
CN117556149B (en) Resource pushing method, device, electronic equipment and storage medium
CN114996561B (en) Information recommendation method and device based on artificial intelligence
CN116955786A (en) Media content recommendation method, device, electronic equipment and storage medium
CN117216361A (en) Recommendation method, recommendation device, electronic equipment and computer readable storage medium
CN117216707A (en) Feature extraction model processing method, device, computer equipment and storage medium
CN117574920A (en) Training method and device for text prediction model and storage medium

Legal Events

Date Code Title Description
PB01 Publication