CN113742562A - Video recommendation method and device, electronic equipment and storage medium

Video recommendation method and device, electronic equipment and storage medium

Info

Publication number: CN113742562A (application CN202010461519.5A)
Authority: CN (China)
Prior art keywords: video, account, attribute information, sample, target
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113742562B (en)
Inventor: 白明
Current Assignee: Beijing Dajia Internet Information Technology Co Ltd
Original Assignee: Beijing Dajia Internet Information Technology Co Ltd
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202010461519.5A
Publication of CN113742562A; application granted and published as CN113742562B

Classifications

    • G06F16/9535: Search customisation based on user profiles and personalisation (G06F Electric digital data processing; G06F16/00 Information retrieval; G06F16/953 Querying, e.g. by the use of web search engines)
    • G06F16/735: Filtering based on additional data, e.g. user or group profiles (G06F16/70 Information retrieval of video data; G06F16/73 Querying)
    • G06N3/045: Combinations of networks (G06N Computing arrangements based on specific computational models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/08: Learning methods (G06N3/02 Neural networks)

Abstract

The disclosure relates to a video recommendation method, a video recommendation apparatus, an electronic device, and a storage medium. The method comprises the following steps: acquiring video attribute information of a target video; converting the video attribute information to obtain video characteristic information used for representing the target video; determining account attribute information of each candidate account in a candidate account set, the interactive video identifiers of the videos interacted with by the candidate account, and the interactive video attribute information of those videos; determining account characteristic information for representing the candidate account according to the candidate account, the account attribute information, the interactive video identifiers, and the interactive video attribute information; and determining a target account from the candidate account set according to the account characteristic information of each candidate account and the video characteristic information of the target video, and recommending the target video to the target account. In this way, a target account is determined for a new video from video data in existing interaction records, improving the accuracy of cold-start recommendation for new videos.

Description

Video recommendation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to data processing technologies, and in particular, to a video recommendation method and apparatus, an electronic device, and a storage medium.
Background
The emergence and popularization of the Internet have brought users vast amounts of information and satisfied users' demand for information in the information age. However, as the network has developed rapidly, the amount of information online has grown dramatically, so that a user facing massive information cannot obtain the part that is genuinely useful, and the efficiency of using information actually decreases; this is the problem of information overload. A recommendation system can alleviate the information-overload problem: it aims to recommend items a user likes based on existing information, thereby promoting interaction behaviors such as clicking and favoriting between the user and the items. A newly launched item, however, risks not being accepted by users due to the lack of user feedback.
For example, for a new video (e.g., a newly online video), the lack of user interaction and feedback makes accurate personalized recommendation difficult, compared with an old video (i.e., one with more user interactions and feedback).
In the related art, some recommendation systems recommend a new video directly to consumers who liked other videos by the same producer. The disadvantage is that a consumer does not necessarily like all of a producer's videos, and the same producer cannot guarantee that every video has the same quality and theme, which harms the consumer experience and reduces authors' willingness to produce on the platform. In addition, this approach lowers consumers' interest-discovery rate and tends to create information cocoons.
Disclosure of Invention
The disclosure provides a video recommendation method and device, which at least solve the problem of low accuracy in cold start recommendation of a new video in the related art. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a video recommendation method, including:
acquiring video attribute information of a target video, wherein the target video is a video without video interaction behavior on a current platform or a video with video interaction behavior not meeting preset conditions;
converting the video attribute information to obtain video characteristic information used for representing the target video;
determining account attribute information of each candidate account in a candidate account set, an interactive video identifier of a video interacted with the candidate account and interactive video attribute information of the video;
according to the candidate account, the account attribute information, the interactive video identification and the interactive video attribute information, determining account characteristic information for representing the candidate account;
and determining a target account from the candidate account set according to the account characteristic information of each candidate account and the video characteristic information of the target video, and recommending the target video to the target account.
Optionally, the step of performing conversion processing on the video attribute information to obtain video feature information representing the target video includes:
constructing a video attribute heterogeneous graph according to the target video and the video attribute information of the target video, wherein the video attribute heterogeneous graph comprises a target video identification node and a video attribute information node;
and inputting the video attribute heterogeneous graph into a trained first graph convolution neural network model, performing aggregation convolution operation on values of nodes of the same type in the first graph convolution neural network model to obtain a first feature vector, performing joint convolution operation on values of nodes of different types to obtain a second feature vector, performing dimension reduction processing on the first feature vector and the second feature vector, and outputting video feature information.
Optionally, the target video identification node is represented by a vector trained by the target video identification through an existing model;
the video attribute information node is represented by the video attribute information, or represented by a vector of the video attribute information after the training of an existing model, or represented by the video attribute information and the vector of the video attribute information after the training of the existing model in a combined manner.
Optionally, the first graph convolution neural network model is obtained by training in the following manner:
determining a plurality of sample videos, wherein the sample videos are popular videos with video interaction behaviors on a current platform;
acquiring sample video attribute information corresponding to each sample video;
constructing a sample video attribute heterogeneous graph according to the sample video and the corresponding sample video attribute information;
and learning the sample video attribute heterogeneous graph by adopting a preset graph convolution neural network learner to generate a first graph convolution neural network model.
Optionally, the step of determining account feature information for representing the candidate account according to the candidate account, the account attribute information, the interactive video identifier, and the interactive video attribute information includes:
constructing a video account attribute heterogeneous graph according to the candidate account, the account attribute information, the interactive video identifier and the interactive video attribute information, wherein the video account attribute heterogeneous graph comprises an interactive video identifier node, an interactive video attribute information node, a candidate account node and an account attribute information node;
and inputting the video account attribute heterogeneous graph to a trained second graph convolution neural network model, performing aggregation convolution operation on values of nodes of the same type in the second graph convolution neural network model to obtain a third feature vector, performing joint convolution operation on values of nodes of different types to obtain a fourth feature vector, performing dimension reduction processing on the third feature vector and the fourth feature vector, and outputting account feature information.
Optionally, the candidate account node is represented by a vector trained by the candidate account via an existing model;
the interactive video identification node is represented by a vector which is formed by training an interactive video identification through an existing model;
the account attribute information node is represented by the account attribute information, or represented by a vector of the account attribute information after the existing model training, or represented by the account attribute information and the vector of the account attribute information after the existing model training in a combined manner;
the interactive video attribute information node is represented by the interactive video attribute information, or represented by a vector of the interactive video attribute information trained by an existing model, or represented by the interactive video attribute information and a vector of the interactive video attribute information trained by the existing model in a combined manner.
Optionally, the second graph convolution neural network model is obtained by training in the following manner:
determining a plurality of sample videos, wherein the sample videos are popular videos with video interaction behaviors on a current platform;
acquiring sample video attribute information corresponding to each sample video;
determining an account with interactive behavior on the sample video as a sample account;
acquiring sample account attribute information corresponding to each sample account;
constructing a sample account video attribute heterogeneous graph according to the sample account, the sample account attribute information, the video identification of the sample video and the sample video attribute information;
and learning the sample account video attribute heterogeneous graph by adopting a preset graph convolution neural network learner to generate a second graph convolution neural network model.
Optionally, the step of determining a target account from the candidate account set according to the account feature information of each candidate account and the video feature information of the target video includes:
inputting the video characteristic information and the account characteristic information of each candidate account into a trained matching model, and acquiring matching scores of the video characteristic information and the account characteristic information of each candidate account output by the matching model;
and selecting one or more candidate accounts with matching scores ranked in the front as target accounts from the candidate account set.
According to a second aspect of the embodiments of the present disclosure, there is provided a video recommendation apparatus including:
the video attribute information acquisition unit is configured to acquire video attribute information of a target video, wherein the target video is a video without a video interaction behavior on a current platform or a video with a video interaction behavior not meeting a preset condition;
a video characteristic information determining unit configured to perform conversion processing on the video attribute information to obtain video characteristic information representing the target video;
the account characteristic information determining unit is configured to determine account attribute information of each candidate account in a candidate account set, an interactive video identifier of a video interacted with the candidate account and interactive video attribute information of the video; determining account characteristic information for representing the candidate account according to the candidate account, the account attribute information, the interactive video identification and the interactive video attribute information;
and the target account determining unit is configured to determine a target account from the candidate account set according to the account characteristic information of each candidate account and the video characteristic information of the target video, and recommend the target video to the target account.
Optionally, the video feature information determining unit includes:
the video attribute heterogeneous graph constructing subunit is configured to construct a video attribute heterogeneous graph according to the target video and the video attribute information of the target video, wherein the video attribute heterogeneous graph comprises a target video identification node and a video attribute information node;
the video feature information obtaining subunit is configured to input the video attribute heterogeneous graph to a trained first graph convolution neural network model, perform aggregation convolution operation on values of nodes of the same type in the first graph convolution neural network model to obtain a first feature vector, perform joint convolution operation on values of nodes of different types to obtain a second feature vector, perform dimension reduction processing on the first feature vector and the second feature vector, and output video feature information.
Optionally, the target video identification node is represented by a vector trained by the target video identification through an existing model;
the video attribute information node is represented by the video attribute information, or represented by a vector of the video attribute information after the training of an existing model, or represented by the video attribute information and the vector of the video attribute information after the training of the existing model in a combined manner.
Optionally, the apparatus further comprises a first model training unit configured to train the first graph convolution neural network model, comprising:
a sample video determining subunit configured to determine a plurality of sample videos, wherein the sample videos are popular videos with video interaction behaviors existing in a current platform;
the sample video attribute information acquisition subunit is configured to acquire sample video attribute information corresponding to each sample video;
the sample video attribute heterogeneous graph constructing subunit is configured to construct a sample video attribute heterogeneous graph according to the sample video and the corresponding sample video attribute information;
and the first learning subunit is configured to learn the sample video attribute heterogeneous graph by adopting a preset graph convolution neural network learner to generate a first graph convolution neural network model.
Optionally, the account characteristic information determining unit includes:
the video account attribute heterogeneous graph constructing subunit is configured to construct a video account attribute heterogeneous graph according to the candidate account, the account attribute information, the interactive video identifier and the interactive video attribute information, wherein the video account attribute heterogeneous graph comprises an interactive video identifier node, an interactive video attribute information node, a candidate account node and an account attribute information node;
the account characteristic information obtaining subunit is configured to input the video account attribute heterogeneous graph to a trained second graph convolution neural network model, perform aggregation convolution operation on values of nodes of the same type in the second graph convolution neural network model to obtain a third characteristic vector, perform joint convolution operation on values of nodes of different types to obtain a fourth characteristic vector, perform dimension reduction processing on the third characteristic vector and the fourth characteristic vector, and output account characteristic information.
Optionally, the candidate account node is represented by a vector trained by the candidate account via an existing model;
the interactive video identification node is represented by a vector which is formed by training an interactive video identification through an existing model;
the account attribute information node is represented by the account attribute information, or represented by a vector of the account attribute information after the existing model training, or represented by the account attribute information and the vector of the account attribute information after the existing model training in a combined manner;
the interactive video attribute information node is represented by the interactive video attribute information, or represented by a vector of the interactive video attribute information trained by an existing model, or represented by the interactive video attribute information and a vector of the interactive video attribute information trained by the existing model in a combined manner.
Optionally, the apparatus further comprises a second model training unit configured to train the second graph convolution neural network model, comprising:
a sample video determining subunit configured to determine a plurality of sample videos, wherein the sample videos are popular videos with video interaction behaviors existing in a current platform;
the sample video attribute information acquisition subunit is configured to acquire sample video attribute information corresponding to each sample video;
a sample account determination subunit configured to determine an account for which there is an interactive behavior with respect to the sample video as a sample account;
the sample account attribute information acquisition subunit is configured to acquire sample account attribute information corresponding to each sample account;
a sample account video attribute heterogeneous graph constructing subunit configured to construct a sample account video attribute heterogeneous graph according to the sample account, the sample account attribute information, the video identifier of the sample video, and the sample video attribute information;
and the second learning subunit is configured to learn the sample account video attribute heterogeneous graph by adopting a preset graph convolution neural network learner to generate a second graph convolution neural network model.
Optionally, the target account determination unit includes:
the matching score obtaining subunit is configured to input the video feature information and the account feature information of each candidate account into a trained matching model, and obtain a matching score between the video feature information output by the matching model and the account feature information of each candidate account;
and the target account selecting subunit is configured to select one or more candidate accounts with matching scores ranked in the front as the target accounts in the candidate account set.
According to a third aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the above-mentioned method.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer program product comprising executable program code, wherein the program code, when executed by the above-described apparatus, implements the above-described method.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in this embodiment, for a target video with no video interaction behavior on the current platform, or whose video interaction behavior does not meet a preset condition, video attribute information of the target video may be obtained and converted into video feature information used for representing the target video. Meanwhile, the account attribute information of each candidate account in the candidate account set, the interactive video identifiers of the videos interacted with by the candidate account, and the interactive video attribute information of those videos are determined, and the account feature information of the candidate account is determined from them. Then, according to the account feature information of each candidate account and the video feature information of the target video, the target account is determined from the candidate account set and the target video is recommended to it, so that a target account is determined for the new video from video data in existing interaction records, improving the accuracy of cold-start recommendation for the new video.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flow diagram illustrating a video recommendation method according to an example embodiment.
Fig. 2 is a flow diagram illustrating another video recommendation method in accordance with an example embodiment.
Fig. 3 is a video attribute heterogeneous diagram illustration shown in accordance with an example embodiment.
FIG. 4 is a diagram illustrating a dimension reduction process, according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating a method embodiment of first graph convolution neural network model generation, according to an exemplary embodiment.
FIG. 6 is a video account attribute heterogeneous graph illustration shown in accordance with an exemplary embodiment.
FIG. 7 is a flowchart illustrating a method embodiment of second graph convolution neural network model generation, according to an example embodiment.
Fig. 8 is a block diagram illustrating a video recommendation device according to an example embodiment.
FIG. 9 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with certain aspects of the present disclosure, as recited in the appended claims.
Fig. 1 is a flowchart illustrating a video recommendation method according to an exemplary embodiment. The method may be applied to a cold-start recommendation (CSR) scenario for a newly online target video on a video recommendation platform or a video playing platform (hereinafter, the platform), so as to improve the click-through rate of or attention to the target video on the platform. The platform may be connected with clients through a network, and may be implemented as a stand-alone server or as a server cluster composed of multiple servers. The terminal where a client is located may be a desktop terminal or a mobile terminal, and the mobile terminal may include at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The terminal may be used to play multimedia content such as video and audio.
As shown in fig. 1, the present embodiment includes the following steps.
In step S11, video attribute information of the target video is acquired.
As an example, the target video may be a video for which no video interaction behavior exists on the current platform, e.g., a newly online video (a new "photo," in the platform's terminology).
In other examples, the target video may also be a video whose existing video interaction behavior does not satisfy a preset condition on the current platform (e.g., fewer than 50 video interaction records).
For the target video, since there is no or only a small amount of interactive behavior data, in this embodiment, video attribute information of the target video may be obtained, and feature representation of the target video may be obtained according to the video attribute information. As an example, the video attribute information may include, but is not limited to, one or a combination of the following: video tag, text information of video (i.e. text information related to video, such as comment text data of video, voice text data of video, etc.), audio information, video cover representation, and all frames representation of video. The text information, the audio information, and the video information of the video may be referred to as multi-modal information of the video.
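For concreteness, the attribute bundle enumerated above can be pictured as a simple record. A minimal sketch follows; the field names and types are illustrative assumptions, since the disclosure enumerates the kinds of information but not a concrete schema.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class VideoAttributes:
        """Illustrative container for one video's attribute information.
        Field names are assumptions for exposition only."""
        video_id: str
        tags: List[str] = field(default_factory=list)            # video tag(s)
        comment_texts: List[str] = field(default_factory=list)   # text modality: comment text data
        speech_text: str = ""                                    # text modality: voice text data
        audio_embedding: List[float] = field(default_factory=list)        # audio modality
        cover_embedding: List[float] = field(default_factory=list)        # video cover representation
        frame_embeddings: List[List[float]] = field(default_factory=list) # all-frame representations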
In step S12, the video attribute information is subjected to conversion processing, and video feature information representing the target video is obtained.
In this step, since the target video has no video interaction behavior on the current platform or the existing video interaction behavior does not satisfy the preset condition, the video characteristic information can be obtained by obtaining the video attribute information and converting the video attribute information, and the video characteristic information can be used for representing the target video.
In one embodiment, the video attribute information may be converted into video attribute vectors, and all of the video attribute vectors may be used as video feature information.
In step S13, account attribute information of each candidate account in the candidate account set, an interactive video identifier of a video interacted with the candidate account, and interactive video attribute information of the video are determined.
In one embodiment, the set of candidate accounts (user set) may be a set of all accounts in the platform, including registered accounts and guests; or, the candidate account set may also be a set formed by all registered accounts in the platform, and the registered account may include a registered account with an interaction behavior record or a new registered account without an interaction behavior record; alternatively, in order to save computing resources, the candidate account set may also be a set of all registered accounts with interaction behavior records in the platform, or a set of registered accounts that are more active in a recent period of time, which is not limited in this embodiment.
It should be noted that the number of candidate accounts in the candidate account set may be determined according to actual requirements, which is not limited in this embodiment.
In this step, for each candidate account in the candidate account set, account attribute information of each candidate account, an interactive video identifier of a video interacted with the candidate account, and interactive video attribute information of the video may be obtained.
Illustratively, the account attribute information may include at least one or a combination of: point-of-interest (POI) information of the candidate account's location, device information of the device used by the candidate account (such as device model, device brand, device sub-brand, etc.), a list of applications installed on the device, and so on.
In one example, the POI information may include, but is not limited to: the city (City) of the candidate account, the city level (City_level) of that city, the province (Province_name) of that city, the community type (Community_type) of the candidate account, and so on. In one implementation, the POI information may be obtained by the client of the candidate account invoking the location function of the device on which the candidate account resides.
The device information of the device used by the candidate account, and information such as the list of installed applications (app_list), may likewise be obtained by the client of the candidate account calling the relevant device interfaces, for example by calling the GetAPPList() function to obtain the app_list.
In one implementation, the log records may be searched for interactive video identifications of the candidate accounts where the interactive behavior occurred. The interactive video attribute information corresponding to each interactive video identifier may include, but is not limited to: video tag, multimodal information of video, and the like. In one example, the multimodal information can include textual information (e.g., commentary text data for a video, voice text data for a video, etc.), video information (e.g., a video cover representation, all frame representations for a video, etc.), audio information, and the like.
In step S14, determining account feature information representing the candidate account according to the candidate account, the account attribute information, the interactive video identifier, and the interactive video attribute information.
In this step, the account characteristic information may be used to characterize the candidate account. In one embodiment, the candidate account, the account attribute information, the interactive video identifier, and the interactive video attribute information may be converted into vectors, and all the converted vectors may be used as the account feature information.
In step S15, a target account is determined from the candidate account set according to the account feature information of each candidate account and the video feature information of the target video, and the target video is recommended to the target account.
In an embodiment, a similarity algorithm may be used to calculate the degree of matching between the account feature information of each candidate account and the video feature information of the target video; this embodiment does not limit the specific similarity algorithm, which may be, for example, a cosine similarity algorithm or a Euclidean distance similarity algorithm.
In one example, the degree of matching may be expressed as a matching score: the greater the score, the closer the candidate account's account feature information is to the target video's video feature information, and the better the two match. Conversely, the smaller the score, the farther apart the account feature information and the video feature information are, and the worse the match. For example, if the matching score lies in the [0,1] interval, the closer the score is to 1, the better the match; the closer it is to 0, the worse the match.
A ranking operation may then be performed on the matching scores, and the top N candidate accounts with the highest scores taken as the target accounts, where N may be determined according to actual needs, which is not limited in this embodiment.
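As a minimal sketch of this matching-and-ranking step, assuming cosine similarity (one of the algorithms the text names) and NumPy arrays for the feature vectors; the rescaling into [0,1] is an assumption made to match the score interval described above:

    import numpy as np

    def matching_score(account_vec: np.ndarray, video_vec: np.ndarray) -> float:
        """Cosine similarity rescaled from [-1, 1] to [0, 1], so that
        scores near 1 mean the two representations match closely."""
        cos = float(np.dot(account_vec, video_vec) /
                    (np.linalg.norm(account_vec) * np.linalg.norm(video_vec) + 1e-12))
        return (cos + 1.0) / 2.0

    def top_n_accounts(account_vecs: np.ndarray, video_vec: np.ndarray, n: int):
        """Rank candidate accounts by matching score and keep the top N."""
        scores = np.array([matching_score(a, video_vec) for a in account_vecs])
        order = np.argsort(-scores)            # indices, descending by score
        return order[:n], scores[order[:n]]

    # Usage: 5 candidate accounts with 16-dim features; recommend to the top 2.
    accounts = np.random.rand(5, 16)
    video = np.random.rand(16)
    target_idx, target_scores = top_n_accounts(accounts, video, n=2)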
In this embodiment, for a target video with no video interaction behavior on the current platform, or whose video interaction behavior does not meet a preset condition, video attribute information of the target video may be obtained and converted into video feature information used for representing the target video. Meanwhile, the account attribute information of each candidate account in the candidate account set, the interactive video identifiers of the videos interacted with by the candidate account, and the interactive video attribute information of those videos are determined, and the account feature information of the candidate account is determined from them. Then, according to the account feature information of each candidate account and the video feature information of the target video, the target account is determined from the candidate account set and the target video is recommended to it. A target account is thus determined for the new video from video data in existing interaction records, which improves the accuracy of cold-start recommendation for the new video, ensures the recommendation effect, and can increase video producers' willingness to produce. Meanwhile, consumers' interest-discovery rate is improved and information cocoons are avoided.
Fig. 2 is a flow chart illustrating another video recommendation method according to an example embodiment, as shown in fig. 2, including the following steps.
In step S21, video attribute information of the target video is acquired.
The target video is a video with no video interaction behavior on the current platform or a video with video interaction behavior not meeting preset conditions.
Illustratively, the video attribute information may include, but is not limited to, one or a combination of: video tag, text information of the video (i.e., text information related to the video, such as comment text data of the video, voice text data of the video, etc.), audio information, video cover representation, and all-frame representations of the video.
In step S22, a video attribute heterogeneous graph is constructed from the target video and the video attribute information of the target video.
A video attribute heterogeneous graph (photo/photo-attribute graph) is a graph representation of the relationship between a target video and its video attribute information, and may include a target video identification node and a video attribute information node, where the target video identification node and the video attribute information node are of different (heterogeneous) types. As shown in the video attribute heterogeneous graph of fig. 3, a target video identification node P and a video attribute information node PA may be included. In fig. 3, circles with the same label represent the same semantics (e.g., two circles both labeled "P"), and circles with different labels represent different semantics (e.g., a circle labeled "P" versus one labeled "PA"). A graph whose construction involves nodes of several different semantics is defined as a heterogeneous graph.
In the video attribute heterogeneous graph, both first-order relationships (direct relationships: two circles in fig. 3 connected by a single edge) and higher-order relationships (indirect relationships: two circles connected through two or more edges) between different nodes can be represented. An indirect relationship means that two nodes are not directly connected but can be reached through a path of several edges; two nodes connectable through 2 edges are in a second-order relationship, through 3 edges a third-order relationship, and so on, while nodes that cannot be connected by any path bear no relationship of any order. In practice, first-order, second-order, and third-order relationships are generally sufficient.
In this embodiment, the target video identification node and the video attribute information node in the video attribute heterogeneous graph may adopt different representations as required. In one example, the target video identification node may be represented by a vector obtained by training the target video identification with an existing model, i.e., a trainable vector representation (learnable embedding) learned through another model. The video attribute information node may be represented by the video attribute information itself (pre-trained attribute representation), by a vector of the video attribute information trained with an existing model (sparse embedding), or by the two in combination (pre-trained attribute representation + sparse embedding).
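A minimal sketch of assembling such a photo-attribute heterogeneous graph, using plain adjacency lists rather than any particular graph library; the (type, id) node naming is an assumption for exposition:

    from collections import defaultdict

    def build_photo_attribute_graph(video_id: str, attributes: dict):
        """Two node types: one 'P' node for the target video identification and
        one 'PA' node per attribute, joined by first-order (direct) edges;
        higher-order relationships then arise through multi-edge paths."""
        p_node = ("P", video_id)
        nodes = {p_node: None}                       # node -> payload (representation)
        edges = defaultdict(list)
        for attr_name, attr_value in attributes.items():
            pa_node = ("PA", f"{attr_name}={attr_value}")
            nodes[pa_node] = attr_value
            edges[p_node].append(pa_node)            # undirected first-order edge
            edges[pa_node].append(p_node)
        return nodes, edges

    nodes, edges = build_photo_attribute_graph(
        "photo_001", {"tag": "cooking", "cover": "cover_embedding_001"})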
In step S23, the video attribute heterogeneous graph is input to a trained first graph convolution neural network model, in the first graph convolution neural network model, an aggregation convolution operation is performed on values of nodes of the same type to obtain a first feature vector, a joint convolution operation is performed on values of nodes of different types to obtain a second feature vector, and dimension reduction processing is performed on the first feature vector and the second feature vector to output video feature information.
In this step, after the video attribute heterogeneous graph (photo/photo-attribute graph) is constructed, it may be input to a trained first Graph Convolutional Network (GCN) model, and the first graph convolution neural network model performs a multi-layer convolution operation on the video attribute heterogeneous graph (i.e., runs the GCN over the photo-attribute graph), finally outputting the video feature information, thereby learning a feature representation of the target video.
In this embodiment, the GCN model is applied to a heterogeneous graph; compared with the homogeneous-graph case, the convolution operation in the GCN model may be improved as follows:
An aggregation convolution operation is performed on the values of nodes of the same type to obtain the first feature vector. For example, when performing a convolution operation over nodes of the same type, such as the video identification nodes in the video attribute heterogeneous graph, an aggregator function may be used for the convolution to obtain the first feature vector.
Illustratively, the aggregator function may include, but is not limited to, the following (a sketch of these aggregators follows the list):
GCN aggregator: an addition operation is performed over the node representations.
MEAN aggregator: an averaging operation is performed over the node representations.
LSTM aggregator: the nodes requiring the convolution operation are treated as a sequence, and representation learning is performed through an LSTM, which may be unidirectional or bidirectional.
MAX POOLING aggregator: a max-pooling operation is performed over each dimension of the node representations.
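The four aggregators can be sketched as follows. This is a hedged illustration in PyTorch: the disclosure names the operations but prescribes no implementation, so tensor shapes and the way the LSTM output is read off are assumptions.

    import torch
    import torch.nn as nn

    def gcn_aggregate(neighbor_vecs: torch.Tensor) -> torch.Tensor:
        """GCN aggregator: element-wise addition of same-type node representations."""
        return neighbor_vecs.sum(dim=0)

    def mean_aggregate(neighbor_vecs: torch.Tensor) -> torch.Tensor:
        """MEAN aggregator: element-wise average."""
        return neighbor_vecs.mean(dim=0)

    class LSTMAggregator(nn.Module):
        """LSTM aggregator: treat the nodes as a sequence and learn a
        representation through an LSTM (unidirectional or bidirectional)."""
        def __init__(self, dim: int, bidirectional: bool = False):
            super().__init__()
            self.lstm = nn.LSTM(dim, dim, batch_first=True,
                                bidirectional=bidirectional)
        def forward(self, neighbor_vecs: torch.Tensor) -> torch.Tensor:
            out, _ = self.lstm(neighbor_vecs.unsqueeze(0))  # (1, N, dim * dirs)
            return out[0, -1]                               # last step's output

    def max_pool_aggregate(neighbor_vecs: torch.Tensor) -> torch.Tensor:
        """MAX POOLING aggregator: per-dimension maximum over the nodes."""
        return neighbor_vecs.max(dim=0).values

    vecs = torch.randn(4, 8)  # 4 same-type nodes, 8-dim representations
    first_feature_vector = mean_aggregate(vecs)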
In addition, a joint convolution operation is performed on the values of nodes of different types in the video attribute heterogeneous graph to obtain the second feature vector. For example, the values may be concatenated according to a preset concatenation rule to construct a new vector as the second feature vector.
In practice, because there are many different types of nodes in the photo/photo-attribute graph, the dimensionality of the second feature vector obtained after the concatenation operation over different node types is too large (e.g., a vector of tens of thousands of dimensions). In this embodiment, a dimension-reduction operation may therefore be performed on the feature vectors obtained after the convolution, finally yielding a low-dimensional vector.
In one embodiment, the dimension reduction process may include a DAE (Denoising Auto Encoder) dimension reduction, in which a DAE model is used to denoise the first feature vector and the second feature vector.
In one example, as shown in the dimension-reduction diagram of fig. 4, the inputs of the DAE model are the first feature vector output by the same-type-node aggregator operation in the GCN convolution and the second feature vector output by the different-type-node concatenation operation. "Corrupting" denotes adding noise to the input; dimension compression of the input data is then performed through the encoder and decoder, and the output of the DAE model is the output of the hidden layer, namely the compressed feature information (encoded feature).
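A minimal denoising-autoencoder sketch matching the description of fig. 4: concatenate the two feature vectors, corrupt the input with noise, and compress through the encoder, keeping the hidden-layer output as the encoded feature. The layer sizes, the Gaussian noise, and the reconstruction path are assumptions:

    import torch
    import torch.nn as nn

    class DAE(nn.Module):
        """Denoising auto-encoder: corrupt -> encode -> decode. The hidden
        layer's output is the compressed (encoded) feature, as in fig. 4."""
        def __init__(self, in_dim: int, hidden_dim: int, noise_std: float = 0.1):
            super().__init__()
            self.noise_std = noise_std
            self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
            self.decoder = nn.Linear(hidden_dim, in_dim)

        def forward(self, x: torch.Tensor):
            corrupted = x + self.noise_std * torch.randn_like(x)  # "corrupting"
            encoded = self.encoder(corrupted)   # compressed feature information
            recon = self.decoder(encoded)       # used only by the training loss
            return encoded, recon

    # First feature vector (aggregation) and second feature vector (concatenation).
    first_vec, second_vec = torch.randn(64), torch.randn(10000)
    dae = DAE(in_dim=64 + 10000, hidden_dim=128)
    encoded_feature, reconstruction = dae(torch.cat([first_vec, second_vec]))

At training time the reconstruction would be compared with the clean input (e.g., by mean-squared error), while at inference only the encoded feature is kept.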
Fig. 5 is a flowchart illustrating an embodiment of a method for generating the first graph convolution neural network model according to an exemplary embodiment; as shown in fig. 5, it comprises the following steps.
In step S51, a plurality of sample videos are determined, wherein the sample videos are popular videos with video interaction behaviors existing in the current platform.
In this step, the sample video refers to a hot video (existing photo) with more video interaction behaviors in the current platform relative to the target video.
In step S52, sample video attribute information corresponding to each sample video is acquired.
Illustratively, similar to the video attribute information of the target video, the sample video attribute information may include, but is not limited to, one or a combination of: video tag, text information of video (i.e. text information related to video, such as comment text data of video, voice text data of video, etc.), audio information, video cover representation, and all frames representation of video.
In step S53, a sample video attribute heterogeneous graph is constructed from the sample video and the corresponding sample video attribute information.
The sample video attribute heterogeneous graph is similar to the video attribute heterogeneous graph described above, and is not described again here.
In step S54, a preset graph convolution neural network learner is used to learn the sample video attribute heterogeneous graph, so as to generate the first graph convolution neural network model.
In this step, in training the first graph convolution neural network model, a GCN model learner may be used to learn low-dimensional video feature information from the sample video attribute heterogeneous graph.
In step S24, account attribute information of each candidate account in the candidate account set, an interactive video identifier of a video interacted with the candidate account, and interactive video attribute information of the video are determined.
In step S25, a video account attribute heterogeneous graph is constructed according to the candidate account, the account attribute information, the interactive video identifier, and the interactive video attribute information.
The video account attribute heterogeneous graph is a graph representation of the accounts, the videos, and their interaction matrix as a whole. For example, as shown in fig. 6, the video account attribute heterogeneous graph (UAPA graph) may include an interactive video identification node P', an interactive video attribute information node PA', a candidate account node U', and an account attribute information node UA'. In the video account attribute heterogeneous graph, different nodes may be in a first-order relationship or a higher-order relationship (e.g., a second-order relationship, a third-order relationship, etc.).
In this embodiment, each node in the video account attribute heterogeneous graph can adopt different representation modes as required. In one example, the candidate account node may be represented by a vector trained by the candidate account via an existing model, that is, a learnable embedding.
The interactive video identification node can be represented by a vector trained by the interactive video identification through an existing model, namely, learnable embedding.
The account attribute information node may be represented by the account attribute information itself (pre-trained feature), by a vector of the account attribute information trained with an existing model (learnable embedding), or by the two in combination (pre-trained feature + learnable embedding).
The interactive video attribute information node may be represented by the interactive video attribute information itself (pre-trained feature), by a vector of the interactive video attribute information trained with an existing model (sparse embedding), or by the two in combination (pre-trained feature + sparse embedding).
In step S26, the video account attribute heterogeneous graph is input to a trained second graph convolution neural network model; in the second graph convolution neural network model, an aggregation convolution operation is performed on the values of nodes of the same type to obtain a third feature vector, a joint convolution operation is performed on the values of nodes of different types to obtain a fourth feature vector, and dimension-reduction processing is performed on the third and fourth feature vectors to output the account feature information.
In this step, after the video account attribute heterogeneous graph (UAPA graph) is constructed, it may be input to the trained second graph convolution neural network model, which performs a multi-layer convolution operation on the graph (i.e., GCN on the UAPA graph), finally outputting the account feature information, thereby learning feature representations of the candidate accounts.
It should be noted that the first and second graph convolution neural network models may be two separate GCN models, or may be integrated into one GCN model, which is not limited in this embodiment.
Fig. 7 is a flowchart illustrating an embodiment of a method for generating the second graph convolution neural network model according to an exemplary embodiment; as shown in fig. 7, it comprises the following steps.
In step S71, a plurality of sample videos are determined, wherein the sample videos are popular videos with video interaction behaviors existing in the current platform.
In step S72, sample video attribute information corresponding to each sample video is acquired.
In step S73, an account for which there is an interactive behavior for the sample video is determined as a sample account.
In this step, for each sample video, an account in which an interactive behavior occurs for the sample video may be acquired as a sample account. When the method is implemented, the account of the video interactive behavior can be acquired through the log record of each sample video.
In step S74, sample account attribute information corresponding to each sample account is acquired.
As one example, sample account attribute information may include, but is not limited to: POI information of the sample account, device information (such as device model, device brand, device sub-brand, and the like) of the device used by the sample account, a list of applications installed by the device, and the like.
In step S75, a sample account video attribute heterogeneous graph is constructed according to the sample account, the sample account attribute information, the video identifier of the sample video, and the sample video attribute information.
The representation and construction of the sample account video attribute heterogeneous graph are similar to those of the video account attribute heterogeneous graph; reference may be made to the description above, which is not repeated here.
In step S76, a preset graph convolution neural network learner is used to learn the sample account video attribute heterogeneous graph, so as to generate the second graph convolution neural network model.
In this step, in training the second graph convolution neural network model, a GCN model learner may be used to learn low-dimensional account feature information from the sample account video attribute heterogeneous graph.
In this embodiment, the second graph convolution neural network can thus be obtained by constructing a sample account video attribute heterogeneous graph and performing model training on it with the GCN algorithm.
In step S27, the video feature information and the account feature information of each candidate account are input to a trained matching model, and a matching score between the video feature information output by the matching model and the account feature information of each candidate account is obtained.
In this step, this embodiment may further include a matching model, which may be, for example, a deep neural network model.
The outputs of the first and second graph convolution neural network models may be used as inputs to the matching model. Specifically, the first graph convolution neural network model may feed the video feature information into the matching model, and the second graph convolution neural network model may feed each account's feature information into the matching model. After obtaining the account feature information of each candidate account and the video feature information, the matching model matches each account's feature information against the video feature information through multi-layer convolution operations and outputs a matching score list, which may include the matching score of each account's feature information with the video feature information.
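One way to realize such a matching model is a small feed-forward network over concatenated account and video features. The text commits only to "a deep neural network model", so this architecture, the sigmoid output into [0,1], and the feature dimensions are assumptions:

    import torch
    import torch.nn as nn

    class MatchingModel(nn.Module):
        """Scores (account feature, video feature) pairs into [0, 1]."""
        def __init__(self, account_dim: int, video_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(account_dim + video_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 1), nn.Sigmoid())

        def forward(self, account_feats: torch.Tensor,
                    video_feat: torch.Tensor) -> torch.Tensor:
            # Broadcast the single target video against every candidate account.
            v = video_feat.expand(account_feats.size(0), -1)
            return self.net(torch.cat([account_feats, v], dim=1)).squeeze(1)

    model = MatchingModel(account_dim=128, video_dim=128)
    scores = model(torch.randn(100, 128), torch.randn(1, 128))  # matching score list
    top_n = torch.topk(scores, k=10).indices                    # ranking: top-N accounts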
In step S28, one or more candidate accounts with matching scores ranked in the top are selected from the candidate account set as target accounts.
In an embodiment, a sorting operation (Ranking) may be performed on each matching score in the matching score list, and then the top N candidate accounts with the highest matching score are used as target accounts, and the target video is recommended to the target accounts, where N may be determined according to actual requirements, which is not limited by this embodiment.
In this embodiment, since the target video has no, or only a small amount of, video interaction behavior, the information related to it is only the video identifier and the video attribute information. Here, drawing on the idea of ZSL (zero-shot learning), the first and second graph convolution neural network models are trained using the video attribute information and interaction behavior data of existing videos that do have video interaction behavior on the platform. Then, the video feature information of the target video is obtained by applying the first graph convolution neural network model to the target video's video attribute heterogeneous graph, and the account feature information of each candidate account (i.e., the account representation space) is obtained by applying the second graph convolution neural network model to each candidate account's video account attribute heterogeneous graph; each account's feature information is matched against the video feature information through the matching model, the target account is determined according to the resulting matching scores, and the target video is recommended to the target account. Meanwhile, for consumers, recommending videos according to their historical video interaction records can improve their interest-discovery rate and avoid information cocoons.
Fig. 8 is a block diagram illustrating a video recommendation device according to an example embodiment. Referring to fig. 8, the apparatus includes a video attribute information acquisition unit 801, a video feature information determination unit 802, an account feature information determination unit 803, and a target account determination unit 804.
A video attribute information obtaining unit 801 configured to obtain video attribute information of a target video, where the target video is a video where no video interaction behavior exists on a current platform or a video where existing video interaction behavior does not meet a preset condition;
a video feature information determining unit 802, configured to perform conversion processing on the video attribute information to obtain video feature information representing the target video;
an account characteristic information determining unit 803, configured to determine account attribute information of each candidate account in a candidate account set, an interactive video identifier of a video interacted with the candidate account, and interactive video attribute information of the video; and to determine account characteristic information for representing the candidate account according to the candidate account, the account attribute information, the interactive video identification and the interactive video attribute information;
A target account determining unit 804, configured to determine a target account from the candidate account set according to the account feature information of each candidate account and the video feature information of the target video, and recommend the target video to the target account.
In an alternative embodiment, the video feature information determining unit 802 may include the following sub-units:
the video attribute heterogeneous graph constructing subunit is configured to construct a video attribute heterogeneous graph according to the target video and the video attribute information of the target video, wherein the video attribute heterogeneous graph comprises a target video identification node and a video attribute information node;
the video feature information obtaining subunit is configured to input the video attribute heterogeneous graph to a trained first graph convolution neural network model, perform aggregation convolution operation on values of nodes of the same type in the first graph convolution neural network model to obtain a first feature vector, perform joint convolution operation on values of nodes of different types to obtain a second feature vector, perform dimension reduction processing on the first feature vector and the second feature vector, and output video feature information.
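By way of illustration, the aggregation convolution, joint convolution, and dimension reduction described above can be sketched as follows; mean pooling and plain linear layers are assumptions, not the exact operators of the disclosure.

```python
import torch
import torch.nn as nn

class VideoHeteroGCN(nn.Module):
    """Sketch of the first graph convolution neural network model.
    Inputs are assumed to be dense features for the target video
    identification node and its attribute information nodes."""
    def __init__(self, dim: int = 64, out_dim: int = 32):
        super().__init__()
        self.same_type = nn.Linear(dim, dim)       # aggregation convolution
        self.cross_type = nn.Linear(2 * dim, dim)  # joint convolution
        self.reduce = nn.Linear(2 * dim, out_dim)  # dimension reduction

    def forward(self, video_node: torch.Tensor,
                attr_nodes: torch.Tensor) -> torch.Tensor:
        # First feature vector: aggregate values of same-type nodes.
        first = self.same_type(attr_nodes.mean(dim=0))
        # Second feature vector: jointly convolve different-type nodes.
        second = self.cross_type(torch.cat([video_node, first], dim=-1))
        # Reduce both vectors to the final video feature information.
        return self.reduce(torch.cat([first, second], dim=-1))
```

The second graph convolution neural network model would follow the same pattern, extended to the four node types of the video account attribute heterogeneous graph.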
In an alternative embodiment, the target video identification node is represented by a vector trained by the target video identification through an existing model;
the video attribute information node is represented by the video attribute information, or represented by a vector of the video attribute information after the training of an existing model, or represented by the video attribute information and the vector of the video attribute information after the training of the existing model in a combined manner.
In an alternative embodiment, the apparatus further comprises a first model training unit configured to train the first graph convolution neural network model, comprising:
a sample video determining subunit configured to determine a plurality of sample videos, wherein the sample videos are popular videos with video interaction behaviors existing on the current platform;
the sample video attribute information acquisition subunit is configured to acquire sample video attribute information corresponding to each sample video;
the sample video attribute heterogeneous graph constructing subunit is configured to construct a sample video attribute heterogeneous graph according to the sample video and the corresponding sample video attribute information;
and the first learning subunit is configured to learn the sample video attribute heterogeneous graph by adopting a preset graph convolution neural network learner to generate a first graph convolution neural network model.
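A minimal sketch of assembling this training input, assuming sample video attributes arrive as key-value pairs (the class and field names are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class VideoAttrHeteroGraph:
    """One node per sample video identifier, one node per attribute
    value, and an edge linking each video to each of its attributes."""
    video_nodes: set = field(default_factory=set)
    attr_nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_sample(self, video_id: str, attrs: dict) -> None:
        self.video_nodes.add(video_id)
        for name, value in attrs.items():
            attr_node = f"{name}={value}"
            self.attr_nodes.add(attr_node)
            self.edges.append((video_id, attr_node))

graph = VideoAttrHeteroGraph()
# Popular videos with interaction behavior serve as sample videos.
graph.add_sample("video_42", {"category": "sports", "duration": "short"})
graph.add_sample("video_43", {"category": "sports", "uploader": "u_7"})
# `graph` would then be handed to the graph convolution learner.
```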
In an optional implementation manner, the account characteristic information determining unit 803 may include the following sub-units:
the video account attribute heterogeneous graph constructing subunit is configured to construct a video account attribute heterogeneous graph according to the candidate account, the account attribute information, the interactive video identifier and the interactive video attribute information, wherein the video account attribute heterogeneous graph comprises an interactive video identifier node, an interactive video attribute information node, a candidate account node and an account attribute information node;
the account characteristic information obtaining subunit is configured to input the video account attribute heterogeneous graph to a trained second graph convolution neural network model, perform aggregation convolution operation on values of nodes of the same type in the second graph convolution neural network model to obtain a third characteristic vector, perform joint convolution operation on values of nodes of different types to obtain a fourth characteristic vector, perform dimension reduction processing on the third characteristic vector and the fourth characteristic vector, and output account characteristic information.
In an alternative embodiment, the candidate account node is represented by a vector trained by the candidate account via an existing model;
the interactive video identification node is represented by a vector which is formed by training an interactive video identification through an existing model;
the account attribute information node is represented by the account attribute information, or represented by a vector of the account attribute information after the existing model training, or represented by the account attribute information and the vector of the account attribute information after the existing model training in a combined manner;
the interactive video attribute information node is represented by the interactive video attribute information, or represented by a vector of the interactive video attribute information trained by an existing model, or represented by the interactive video attribute information and a vector of the interactive video attribute information trained by the existing model in a combined manner.
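The three representation options above may be made concrete with a small helper; the one-hot encoding and the pretrained embedding table are assumptions used purely for illustration:

```python
import torch

def attr_node_feature(value: str, vocab: dict,
                      pretrained: torch.Tensor,
                      mode: str = "combined") -> torch.Tensor:
    """Builds an attribute node's features one of three ways: the raw
    attribute ("raw"), a vector from an existing model ("pretrained"),
    or both concatenated ("combined")."""
    one_hot = torch.zeros(len(vocab))
    one_hot[vocab[value]] = 1.0
    emb = pretrained[vocab[value]]
    if mode == "raw":
        return one_hot
    if mode == "pretrained":
        return emb
    return torch.cat([one_hot, emb])

vocab = {"sports": 0, "music": 1}
pretrained = torch.randn(len(vocab), 16)  # stand-in for an existing model
feat = attr_node_feature("sports", vocab, pretrained)
```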
In an alternative embodiment, the apparatus further comprises a second model training unit configured to train the second graph convolution neural network model, comprising:
a sample video determining subunit configured to determine a plurality of sample videos, wherein the sample videos are popular videos with video interaction behaviors existing on the current platform;
the sample video attribute information acquisition subunit is configured to acquire sample video attribute information corresponding to each sample video;
a sample account determination subunit configured to determine an account for which there is an interactive behavior with respect to the sample video as a sample account;
the sample account attribute information acquisition subunit is configured to acquire sample account attribute information corresponding to each sample account;
a sample account video attribute heterogeneous graph constructing subunit configured to construct a sample account video attribute heterogeneous graph according to the sample account, the sample account attribute information, the video identifier of the sample video, and the sample video attribute information;
and the second learning subunit is configured to learn the sample account video attribute heterogeneous graph by adopting a preset graph convolution neural network learner to generate a second graph convolution neural network model.
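Relative to the training graph of the first model, the graph here carries two additional node types. A minimal illustration of its contents, with all names and values hypothetical:

```python
# Four node types of the sample account video attribute heterogeneous
# graph, plus the three kinds of edges connecting them.
sample_graph = {
    "account_nodes":      ["account_9"],
    "account_attr_nodes": ["region=east", "age=18-24"],
    "video_id_nodes":     ["video_42", "video_43"],
    "video_attr_nodes":   ["category=sports", "duration=short"],
    "edges": [
        ("account_9", "region=east"),     # account -> account attribute
        ("account_9", "video_42"),        # interaction behavior
        ("video_42", "category=sports"),  # video -> video attribute
    ],
}
```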
In an alternative embodiment, the target account determining unit 804 may include the following sub-units:
the matching score obtaining subunit is configured to input the video feature information and the account feature information of each candidate account into a trained matching model, and obtain a matching score between the video feature information output by the matching model and the account feature information of each candidate account;
and the target account selecting subunit is configured to select, from the candidate account set, one or more candidate accounts whose matching scores rank highest as the target accounts.
For the specific limitations of the video recommendation apparatus, reference may be made to the limitations of the video recommendation method above; details are not repeated here. The units of the above apparatus may be implemented in whole or in part by software, hardware, or combinations thereof. The units may be embedded, in hardware form, in or be independent of a processor of the computer device, or may be stored, in software form, in a memory of the computer device, so that the processor can invoke them and execute the operations corresponding to the units.
Fig. 9 is a schematic diagram of an electronic device, which may be a terminal or a server, according to an exemplary embodiment; its internal structure may be as shown in the figure. The electronic device comprises a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device comprises a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the nonvolatile storage medium. The network interface of the electronic device is used to connect and communicate with external terminals through a network. The computer program, when executed by the processor, implements the video recommendation method described above. The display screen of the electronic device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball, or a touch pad arranged on the housing of the electronic device, or an external keyboard, touch pad, or mouse.
Those skilled in the art will appreciate that the configuration shown in fig. 9 is a block diagram of only the portion of the configuration relevant to the present application and does not limit the electronic device to which the present application is applied; a particular electronic device may include more or fewer components than shown in the drawings, combine certain components, or have a different arrangement of components.
The present disclosure also provides a computer program product comprising computer program code which, when run by a computer, causes the computer to perform the video recommendation method described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
It should be noted that the account/user information involved in the present disclosure is collected, and subsequently processed and analyzed, only after authorization by the user/account.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for video recommendation, comprising:
acquiring video attribute information of a target video, wherein the target video is a video without video interaction behavior on a current platform or a video with video interaction behavior not meeting preset conditions;
converting the video attribute information to obtain video characteristic information used for representing the target video;
determining account attribute information of each candidate account in a candidate account set, an interactive video identifier of a video interacted with the candidate account and interactive video attribute information of the video;
according to the candidate account, the account attribute information, the interactive video identification and the interactive video attribute information, determining account characteristic information for representing the candidate account;
and determining a target account from the candidate account set according to the account characteristic information of each candidate account and the video characteristic information of the target video, and recommending the target video to the target account.
2. The video recommendation method according to claim 1, wherein said converting the video attribute information to obtain video feature information representing the target video comprises:
constructing a video attribute heterogeneous graph according to the target video and the video attribute information of the target video, wherein the video attribute heterogeneous graph comprises a target video identification node and a video attribute information node;
and inputting the video attribute heterogeneous graph into a trained first graph convolution neural network model, performing aggregation convolution operation on values of nodes of the same type in the first graph convolution neural network model to obtain a first feature vector, performing joint convolution operation on values of nodes of different types to obtain a second feature vector, performing dimension reduction processing on the first feature vector and the second feature vector, and outputting video feature information.
3. The method of claim 2, wherein the target video identification node is represented by a vector trained by the target video identification via an existing model;
the video attribute information node is represented by the video attribute information, or represented by a vector of the video attribute information after the training of an existing model, or represented by the video attribute information and the vector of the video attribute information after the training of the existing model in a combined manner.
4. The video recommendation method according to claim 2 or 3, wherein the first graph convolution neural network model is trained by:
determining a plurality of sample videos, wherein the sample videos are popular videos with video interaction behaviors on a current platform;
acquiring sample video attribute information corresponding to each sample video;
constructing a sample video attribute heterogeneous graph according to the sample video and the corresponding sample video attribute information;
and learning the sample video attribute heterogeneous graph by adopting a preset graph convolution neural network learner to generate a first graph convolution neural network model.
5. The video recommendation method according to claim 1, wherein the step of determining account feature information representing the candidate account according to the candidate account, the account attribute information, the interactive video identifier and the interactive video attribute information comprises:
constructing a video account attribute heterogeneous graph according to the candidate account, the account attribute information, the interactive video identifier and the interactive video attribute information, wherein the video account attribute heterogeneous graph comprises an interactive video identifier node, an interactive video attribute information node, a candidate account node and an account attribute information node;
and inputting the video account attribute heterogeneous graph to a trained second graph convolution neural network model, performing aggregation convolution operation on values of nodes of the same type in the second graph convolution neural network model to obtain a third feature vector, performing joint convolution operation on values of nodes of different types to obtain a fourth feature vector, performing dimension reduction processing on the third feature vector and the fourth feature vector, and outputting account feature information.
6. The method of claim 5, wherein the candidate account node is represented using a vector trained by the candidate account via an existing model;
the interactive video identification node is represented by a vector which is formed by training an interactive video identification through an existing model;
the account attribute information node is represented by the account attribute information, or represented by a vector of the account attribute information after the existing model training, or represented by the account attribute information and the vector of the account attribute information after the existing model training in a combined manner;
the interactive video attribute information node is represented by the interactive video attribute information, or represented by a vector of the interactive video attribute information trained by an existing model, or represented by the interactive video attribute information and a vector of the interactive video attribute information trained by the existing model in a combined manner.
7. The video recommendation method according to claim 5 or 6, wherein the second graph convolution neural network model is trained by:
determining a plurality of sample videos, wherein the sample videos are popular videos with video interaction behaviors on a current platform;
acquiring sample video attribute information corresponding to each sample video;
determining an account with interactive behavior on the sample video as a sample account;
acquiring sample account attribute information corresponding to each sample account;
constructing a sample account video attribute heterogeneous graph according to the sample account, the sample account attribute information, the video identification of the sample video and the sample video attribute information;
and learning the sample account video attribute heterogeneous graph by adopting a preset graph convolution neural network learner to generate a second graph convolution neural network model.
8. A video recommendation apparatus, comprising:
the video attribute information acquisition unit is configured to acquire video attribute information of a target video, wherein the target video is a video without a video interaction behavior on a current platform or a video with a video interaction behavior not meeting a preset condition;
a video characteristic information determining unit configured to perform conversion processing on the video attribute information to obtain video characteristic information representing the target video;
the account characteristic information determining unit is configured to determine account attribute information of each candidate account in a candidate account set, an interactive video identifier of a video interacted with the candidate account and interactive video attribute information of the video; determining account characteristic information for representing the candidate account according to the candidate account, the account attribute information, the interactive video identification and the interactive video attribute information;
and the target account determining unit is configured to determine a target account from the candidate account set according to the account characteristic information of each candidate account and the video characteristic information of the target video, and recommend the target video to the target account.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video recommendation method of any of claims 1 to 7.
10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a video recommendation method as recited in any one of claims 1-7.