Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
Referring to fig. 1, fig. 1 is a schematic diagram of an implementation environment according to the present application, where the implementation environment is a video recommendation system, and the video recommendation system includes a terminal 100 and a server 200.
The terminal 100 and the server 200 are connected in advance through a network, and data transmission, such as transmission of information required in a video recommendation process, is performed through the network. The network may be a wired network or a wireless network, without limitation.
The terminal 100 is provided with a video recommendation client. A plurality of videos are displayed on the homepage of the video recommendation client. When the user clicks a certain video on the homepage, the client jumps to a video page associated with that video, and the video page also displays the video tags of the video. When the user clicks a certain video tag, the video recommendation client jumps to a recommended video page associated with that video tag, in which the recommended videos associated with the clicked video tag are displayed; the user can click a recommended video to play it. The terminal 100 may be any electronic device capable of installing and running a video recommendation client, such as a smart phone, a tablet computer, a computer, and the like.
The server 200 is configured to provide data support for personalized video recommendations made in the video playback client. The server 200 may be a single server or may be a server cluster formed by a plurality of servers, which is not limited herein.
As described above, in existing video recommendation systems, the video recommendation process performed according to video tags has difficulty capturing the user's interests in different aspects, and it is difficult to ensure accuracy in personalized video recommendation. To solve this technical problem, the present application provides a video recommendation method on the one hand and a video recommendation device on the other hand, so as to realize accurate personalized video recommendation.
Referring to fig. 2, fig. 2 is a flowchart illustrating a video recommendation method applicable to the terminal 100 in the implementation environment shown in fig. 1 according to an exemplary embodiment. As shown in fig. 2, in an exemplary embodiment, the video recommendation method at least includes the following steps:
Step 110, in response to a trigger operation selecting a target video on the terminal homepage, the terminal jumps to a video page associated with the target video.
The terminal homepage is usually displayed with a video list, and the video list can be changed along with a page turning operation or a sliding operation triggered by a user so as to display more videos for the user.
When the user selects one of the videos in the video list by clicking or another operation, that video is determined to be the target video, and the terminal jumps to the video page associated with the target video. The video page associated with the target video is usually the playing page corresponding to the target video. When the terminal jumps to the corresponding playing page, the target video may be played automatically, played after a playing operation triggered by the user is detected, or played automatically a period of time after the terminal jumps to the playing page, which is not limited herein.
Step 120, obtaining a recommendation label for video recommendation of the target video, wherein the recommendation label is obtained by sorting candidate labels contained in the target video according to historical viewing behavior data of a user and screening the candidate labels according to a sorting result.
It should be noted first that the candidate tags of the target video are the video tags of the target video, which are known and describe the main content of the target video from different aspects and granularities; there are therefore typically a plurality of candidate tags.
The candidate tags of the target video may be all video tags of the target video, or may be partial video tags obtained by preliminarily screening all video tags of the target video in a certain manner, which is not limited herein.
The user's historical viewing behavior data is a record of the user's historical video-viewing behavior on the terminal. The videos watched by the user can therefore be obtained from the historical viewing behavior data, and personalized information such as the user's video-viewing interests can be mined from them.
Therefore, when the candidate tags contained in the target video are ranked according to the user's historical viewing behavior data and screened according to the ranking result, the resulting recommendation tags are a subset of the video tags contained in the target video that largely match personalized information such as the user's video-viewing interests.
In this embodiment, the recommendation tags are used to make video recommendations for the target video. The recommendation tags may be obtained by the terminal itself, that is, by the terminal ranking and screening the candidate tags contained in the target video according to the user's historical viewing behavior data, or they may be obtained by the terminal from the server, which is not limited herein.
Step 130, displaying the recommendation tags on the video page.
Step 140, in response to a trigger operation selecting a recommendation tag on the video page, jumping to the recommended video page associated with the recommendation tag.
The recommendation video pages associated with the recommendation tags are provided with a plurality of recommendation videos, and the recommendation videos all contain the recommendation tags which are triggered to be selected.
As described above, since the recommendation tags largely match the user's personalized information, such as the user's video-viewing interests, the recommended videos displayed on the recommended video page jumped to by the terminal match the user's interests; that is, the recommended videos displayed on the recommended video page are videos the user is interested in. Compared with existing video recommendation, which directly recommends videos containing the same video tags as the target video, the video recommendation process performed according to the recommendation tags fully considers the user's interests, and accurate personalized video recommendation can be achieved.
It should be further mentioned that, in this embodiment, by displaying the recommendation tags on the video page associated with the target video, the user may select the recommendation tag he or she is interested in to obtain recommended videos, so that the user's interests are further considered in the video recommendation process and the accuracy of video recommendation is further improved. In order to explain the video recommendation method provided in this embodiment more clearly, it is described below in an exemplary application scenario.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating a terminal interface according to an exemplary embodiment. Fig. 3a is a terminal homepage on which several videos are shown. When the user clicks "video 4" on the terminal homepage, the terminal jumps to the video page shown in fig. 3b, which displays the video title together with three recommendation tags. The recommendation tags are obtained by ranking all or part of the video tags contained in video 4 according to the historical viewing behavior data of the user corresponding to the terminal and selecting the three top-ranked video tags.
When the user clicks the "recommendation tag 2" in the video page shown in fig. 3b, the terminal jumps to the video recommendation page shown in fig. 3c, in which several recommended videos recommended for the video 4 according to the recommendation tag 2 are displayed, and all the recommended videos contain the recommendation tag 2 and are matched with the user interests.
It should be noted that, on the video page shown in fig. 3b, after the user clicks any recommendation tag, the terminal jumps to the video recommendation page corresponding to that recommendation tag, and the videos displayed on these video recommendation pages all match the user's interests, so that accurate personalized video recommendation can be realized.
Referring to fig. 4, fig. 4 is a flowchart of step 120 in the embodiment shown in fig. 2. In one embodiment, the step of obtaining recommendation tags for video recommendation for the target video may include the following steps:
at step 210, feature representations of candidate tags in the target video are determined.
The feature representation of the candidate tag means that the corresponding features and contents of the candidate tag are represented in the form of feature vectors. In one embodiment, a heterogeneous network is pre-configured for the video recommendation system, where the heterogeneous network contains feature representations of video tags of all videos in the video recommendation system, so that, for a target video to be recommended, the feature representation of each candidate tag can be directly determined according to the heterogeneous network.
Of course, the feature representation of the candidate tags of the target video may also be determined in other ways, for example, the video recommendation system generates a video tag for the video and correspondingly generates a feature representation of each video tag, so that the video tag and the feature representation of each video tag of the target video may be determined directly for the target video.
Step 220, obtaining video tags in the video watched by the user according to the historical watching behavior data of the user, and constructing user characteristic representation according to the video tags.
As described above, the user history viewing behavior data is a record of the history behavior of the user for video viewing on the terminal, and therefore, the video viewed by the user can be acquired from the user history viewing behavior data, and the video tag in the video viewed by the user can be further acquired.
Because video tags describe the main content of the video from different aspects and granularities, in conjunction with the user's historical viewing behavior, user interests hidden in the video can be mined from these video tags and user feature representations can be constructed therefrom, describing the user's video viewing interests through the constructed user feature representations. Video recommendation is performed based on video watching interests of users, namely, accuracy of personalized video recommendation can be effectively improved.
In step 230, a similarity between the user feature representation and the feature representation of each candidate tag, respectively, is calculated.
As described above, the user feature representation and the feature representation of the candidate tag are feature vector representations for the user information and the candidate tag information, respectively, and thus the calculation of the similarity between the user feature representation and the feature representation of each candidate tag in the present embodiment is essentially to calculate the similarity between the two feature vectors.
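The embodiment does not fix a particular similarity measure between the two feature vectors; as a hedged illustration only, cosine similarity is one common choice for this kind of comparison:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (norm_u * norm_v)

# Illustrative vectors only: a user representation and one candidate tag.
user_vec = [1.0, 0.0]
tag_vec = [1.0, 0.0]
sim = cosine_similarity(user_vec, tag_vec)  # identical directions -> 1.0
```

Any other vector similarity (e.g. inner product) would fit the same role in step 230.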
Because the user feature representation describes the video watching interests of the user, the feature representation of the candidate tag describes the characteristics and the content of the candidate tag, and therefore the similarity between the two feature representations reflects the matching degree between the video associated with the candidate tag and the interests of the user. The video associated with the candidate tag refers to the video containing the candidate tag.
The higher the similarity between the user feature representation and the feature representation of a candidate tag, the more interested the user is in the videos associated with that candidate tag, and the higher the accuracy of the video recommendation will be if those videos are recommended to the user.
Step 240, ranking the candidate tags according to the obtained similarities, and screening the candidate tags according to the ranking result to obtain the recommendation tags.
As described above, the higher the similarity obtained in step 230, the higher the accuracy of recommending the videos associated with the corresponding candidate tag of the target video. The candidate tags of the target video are therefore ranked according to the similarities obtained in step 230, and the recommendation tags are screened from the candidate tags according to the ranking result. Performing video recommendation for the target video according to the obtained recommendation tags can effectively improve the accuracy of personalized video recommendation.
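The ranking and screening of step 240 can be sketched as a simple top-k selection over the computed similarities; the tag names, scores, and the choice of k below are illustrative only:

```python
def top_k_tags(similarities, k=3):
    """Sort candidate tags by similarity (descending) and keep the top k."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return [tag for tag, _ in ranked[:k]]

# Illustrative similarity scores from step 230.
sims = {"cooking": 0.91, "travel": 0.40, "sports": 0.73, "news": 0.12}
recommended = top_k_tags(sims, k=3)  # ['cooking', 'sports', 'travel']
```

k = 3 matches the three recommendation tags shown in the fig. 3b example.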
On the video page shown in fig. 3b, video 4 is the target video for which video recommendation is to be performed. After the feature representations of the candidate tags in the target video are determined, the user feature representation is constructed according to the video tags of the videos watched by the user; the similarity between the user feature representation and the feature representation of each candidate tag is then calculated, the candidate tags are ranked according to the similarities, and the three top-ranked candidate tags are obtained and displayed as recommendation tags. The user can obtain recommended videos of interest by clicking any of the recommendation tags displayed in fig. 3b, so the video recommendation method provided by this embodiment can perform accurate personalized recommendation for the target video.
In prior-art implementations, by contrast, video tag ranking is applied in the process of video tag generation, but no video tag ranking is applied in the actual video recommendation. This embodiment aims to generate an accurate, personalized ordered list of video tags for a video based on its video tags, attract users to click the video tags and jump to the corresponding video pages, and thereby lead users to watch more videos related to those video tags, improving indexes of the overall video recommendation such as click-through rate, playing duration, and page-turning quantity.
Fig. 5 is a flow chart of step 220 in one embodiment of the embodiment shown in fig. 4. As shown in fig. 5, in an exemplary embodiment, obtaining a video tag in a video watched by a user according to user historical viewing behavior data, and constructing a user feature representation according to the video tag may include the steps of:
step 221, obtaining the video watched by the user and the video tag in the watched video from the user historical watching behavior data.
Step 222, determining the weight of the video tag according to the attribution relation between the video tag and the video watched by the user, the completeness of the video watched by the user and the watching time.
The attribution relation between the video tag and the video watched by the user refers to a corresponding relation between the video and the video tag, and the same video tag can also be attributed to different videos at the same time. For example, if video a and video B both contain video tag C, then video tag C is assigned to both video a and video B.
The completeness with which the user watched a video is the ratio between the playing duration and the total duration of the video; the higher the completeness, the more interested the user is in the video.
The watch time of a video refers to the natural time at which the user watched it, i.e., the historical playing time of the video. The historical playing time of a video can also highlight the user's short-term interests to some extent; for example, the user's interest in recently watched videos is typically greater than in videos watched long ago.
In the user's history of video watching, the user's degree of interest differs from video to video, and this degree of interest can be reflected by the completeness and watch time of each video the user watched. Based on the attribution relation between video tags and videos, the user's degree of interest in different video tags also differs, so different video tags should have different weights. Accordingly, the video tags that best reflect the user's interests can be screened according to their weights, and the user feature representation can be constructed more accurately from the screened video tags.
In one embodiment, the videos watched by the user, obtained from the user's historical viewing behavior data, are sorted by watch time into a video sequence {v_1, v_2, ..., v_m}, and the video tags of all videos in the sequence are de-duplicated to obtain a plurality of video tags. The weight of a video tag is calculated as:

w_i = Σ_{j=1}^{m} x_{ij} · complete_j · time_j

where i denotes the i-th of the obtained video tags, j denotes the j-th video in the video sequence, x_{ij} is 1 only if the i-th video tag is attributed to video v_j and is 0 otherwise, complete_j represents the completeness with which the user watched video v_j, and time_j represents the time decay factor of video v_j.
The time decay factor time_j is related to the time at which the user watched video v_j. As an example, the time decay factor is calculated as:

time_j = η · time_{j+1}

where η denotes a time factor, η ∈ (0, 1), and time_m = 1. It can be seen that the longer ago a video was watched, the smaller its time decay factor, which in turn reduces, to a certain extent, the weights of the video tags contained in that video, so that the user's short-term interests are highlighted.
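A minimal sketch of the tag weight calculation described above, combining watch completeness with the recursive time decay factors; the history data and the value of η are illustrative assumptions:

```python
def time_decay_factors(m, eta=0.9):
    """time_m = 1; each earlier video gets time_j = eta * time_{j+1}."""
    factors = [0.0] * m
    factors[m - 1] = 1.0
    for j in range(m - 2, -1, -1):
        factors[j] = eta * factors[j + 1]
    return factors

def tag_weights(videos, eta=0.9):
    """videos: list, oldest first, of (tags, completeness) pairs.
    weight_i = sum over videos j containing tag i of complete_j * time_j."""
    decay = time_decay_factors(len(videos), eta)
    weights = {}
    for j, (tags, completeness) in enumerate(videos):
        for tag in set(tags):  # x_ij contributes at most once per video
            weights[tag] = weights.get(tag, 0.0) + completeness * decay[j]
    return weights

# Illustrative viewing history: two videos, oldest first.
history = [(["cooking", "travel"], 0.8), (["cooking"], 1.0)]
w = tag_weights(history, eta=0.5)
# cooking: 0.8*0.5 + 1.0*1.0 = 1.4 ; travel: 0.8*0.5 = 0.4
```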
Step 223, screening the video tags according to the obtained weights to obtain a user tag set.
As described above, the weights of the video tags represent the user's interest in the videos watched, so the video tags can be screened according to the weights obtained in step 222 to obtain a user tag set. The video tags contained in the user tag set are those with larger weights, which reflect the user's interests, so the user feature representation can be constructed more accurately from them.
For example, the video tags whose weights are greater than a preset weight threshold may be selected as the user tag set; alternatively, the video tags may be ranked by weight, and the video tags within a specified rank may be selected as the user tag set, which is not limited herein.
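Both screening strategies just mentioned can be sketched as follows; the weights and the threshold value are illustrative:

```python
def screen_tags(weights, threshold=None, top_n=None):
    """Keep tags above a weight threshold, or the top-n tags by weight."""
    if threshold is not None:
        return {t for t, w in weights.items() if w > threshold}
    ranked = sorted(weights, key=weights.get, reverse=True)
    return set(ranked[:top_n])

weights = {"cooking": 1.4, "travel": 0.4, "news": 0.1}
by_threshold = screen_tags(weights, threshold=0.3)  # {'cooking', 'travel'}
by_rank = screen_tags(weights, top_n=1)             # {'cooking'}
```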
Step 224, constructing a user feature representation according to the feature representation and the weights of each video tag in the user tag set.
The feature representation of each video tag in the user tag set may be obtained from a pre-constructed heterogeneous network, or may be directly obtained by other means, which is not described herein.
For each video tag in the user tag set, a weight factor is calculated from the weights of the video tags, for example by normalization over the user tag set:

α_i = w_i / Σ_{t_j ∈ T_u} w_j

where T_u represents the user tag set, t_i ∈ T_u represents the i-th video tag in the user tag set T_u, and w_i represents the weight of t_i obtained in step 222.

The user feature representation can then be constructed from the weight factor and the feature representation of each video tag in the user tag set:

u = Σ_{t_i ∈ T_u} α_i · t_i

where t_i in the summation represents the feature representation of the i-th video tag in the user tag set T_u.
According to the above calculation, the user feature representation fuses the feature representation and the weight factor of each video tag in the user tag set. The weight factors of the video tags reflect the user's interests, and the feature representations of the video tags reflect the main content of the videos from different aspects and granularities, so the user feature representation fuses the user's various interests, improving accuracy in the personalized video recommendation process.
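A minimal sketch of constructing the user feature representation as a weighted combination of tag feature representations, assuming (as one possibility) that the weight factors are the normalized tag weights; the embeddings are illustrative:

```python
def user_feature(tag_set, weights, embeddings):
    """Weighted average of tag embeddings, with weight factors taken as
    the normalized tag weights (alpha_i = w_i / sum of weights)."""
    total = sum(weights[t] for t in tag_set)
    dim = len(next(iter(embeddings.values())))
    user_vec = [0.0] * dim
    for t in tag_set:
        alpha = weights[t] / total
        for d in range(dim):
            user_vec[d] += alpha * embeddings[t][d]
    return user_vec

tags = {"cooking", "travel"}
w = {"cooking": 3.0, "travel": 1.0}
emb = {"cooking": [1.0, 0.0], "travel": [0.0, 1.0]}
u = user_feature(tags, w, emb)  # [0.75, 0.25]
```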
In another exemplary embodiment, a heterogeneous network is also pre-built for the video recommendation system to facilitate determining the feature representation of each candidate tag in the target video directly from the built heterogeneous network, and determining the feature representation of each video tag in the user tag set.
As shown in fig. 6, in an exemplary embodiment, the method further includes the steps of:
in step 310, heterogeneous information related to the video in the video recommendation system is used as a node, and association relation between the heterogeneous information is used as an edge to construct a heterogeneous network.
It should be noted that, the video recommendation system described in this embodiment generally refers to any video system that supports video playing, and involves video recommendation in the process of playing video, so as to recommend relevant playing video for a user, for example, a video playing client installed in the terminal 100 in the implementation environment shown in fig. 1.
The videos in the video recommendation system include not only the target video to be subjected to video recommendation described in step 210, but also the videos watched by the user acquired according to the user history viewing behavior data described in step 220.
In one embodiment, the heterogeneous information related to the video in the video recommendation system includes the video published in the video recommendation system, a video tag of the video published by the video recommendation system, a media account of the published video, and a user group of the video recommendation system. Therefore, the heterogeneous network constructed by the embodiment comprises four different types of nodes, namely video nodes, label nodes, media nodes and user group nodes.
The video nodes are main nodes in the heterogeneous network, and other nodes in the heterogeneous network are obtained by expanding the video nodes. The tag nodes contain physical content of the video as well as probabilistic content that represents different potential interests of the user for the video from different aspects and granularities. Because the interaction frequency between the individual users is low, in order to alleviate the sparseness problem of the user related interactions in the heterogeneous network, the embodiment selects the user group as the node in the heterogeneous network, and does not select the individual user as the node.
The user group is obtained by clustering individual users in the video recommendation system, and in an exemplary embodiment, the individual users in the video recommendation system can be clustered according to the 3 conditions of the gender, age and position of the users to obtain different user groups of the video recommendation system.
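As an illustration only, clustering by the three conditions of gender, age, and location could be approximated by simple bucketing on those attributes; the attribute names, bucket width, and user records below are assumptions, not the system's actual clustering method:

```python
from collections import defaultdict

def group_users(users, age_bucket=10):
    """Bucket users into groups keyed by (gender, age range, location)."""
    groups = defaultdict(list)
    for user in users:
        key = (user["gender"], user["age"] // age_bucket, user["location"])
        groups[key].append(user["id"])
    return dict(groups)

users = [
    {"id": "u1", "gender": "F", "age": 24, "location": "Paris"},
    {"id": "u2", "gender": "F", "age": 27, "location": "Paris"},
    {"id": "u3", "gender": "M", "age": 41, "location": "Lyon"},
]
groups = group_users(users)
# ('F', 2, 'Paris') -> ['u1', 'u2'] ; ('M', 4, 'Lyon') -> ['u3']
```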
Correspondingly, the association relationships between the heterogeneous information include the associative play relationship between videos, the attribution relationship between videos and video tags, the publishing relationship between videos and media accounts, the effective viewing relationship between videos and user groups, and the common attribution relationship between video tags. These association relationships reflect the historical interactions between the heterogeneous information.
The association playing relationship between videos refers to that certain association exists between different videos in the video playing process, for example, certain two videos are videos which are continuously played, and the association playing relationship between the two videos reflects the preference behavior of a user to a certain extent. In one embodiment, the associative play relationship between videos may be determined according to the following steps:
and determining the video which is effectively watched in the video recommendation system according to the integrity degree of the video which is watched, sequencing the video which is effectively watched according to the time of the video which is watched, and determining that the two videos which are adjacently sequenced have an associated play relationship.
For example, videos in the video recommendation system that are watched for more than 30% of their total duration are determined to be effectively watched. These videos are sorted according to the time at which they were watched to obtain a sequence of effectively watched videos, so that any two adjacent videos in the sequence can be determined to have an associative play relationship, and a relationship edge should exist between the corresponding video nodes in the heterogeneous network.
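The construction of associative play edges described above can be sketched as follows; the event data is illustrative, and the 30% completeness threshold follows the example:

```python
def associative_play_edges(watch_events, min_completeness=0.3):
    """watch_events: list of (video_id, completeness, watch_time).
    Keep effectively watched videos (completeness above the threshold),
    sort them by watch time, and link each adjacent pair with an edge."""
    effective = [e for e in watch_events if e[1] > min_completeness]
    effective.sort(key=lambda e: e[2])
    return [(effective[j][0], effective[j + 1][0])
            for j in range(len(effective) - 1)]

events = [("v1", 0.9, 10), ("v2", 0.1, 20), ("v3", 0.5, 30), ("v4", 0.8, 25)]
edges = associative_play_edges(events)  # [('v1', 'v4'), ('v4', 'v3')]
```

Here v2 is dropped for falling below the completeness threshold, so no edge touches it.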
For any video in the video recommendation system, each video label contained in the video has a attribution relation with the video, so that video nodes corresponding to the video in the heterogeneous network have relation edges with label nodes corresponding to the video labels respectively.
Similarly, a media account number for publishing any video in the video recommendation system has a publishing relationship with the video published by the media account number, and a relationship edge should be provided between a corresponding media node and a video node in the heterogeneous network. If the video is effectively watched by a certain user group, the corresponding video node and the user group node in the heterogeneous network should also have a relation edge. If two video labels are commonly assigned to the same video, a relationship edge is also arranged between corresponding label nodes in the heterogeneous network.
In one embodiment, if the number of times the video is effectively watched by the user group within the preset time period exceeds a preset value, the video is effectively watched by the user group.
In another embodiment, the heterogeneous information used to construct the nodes in the heterogeneous network may further include more, such as text information contained in a video header, portrait information of a user, video categories of interest to the user, etc., to construct a more complex heterogeneous network, and this is not meant to limit the specific form of the heterogeneous information.
The heterogeneous network constructed in the embodiment can capture the interaction relationship among the video, the video tag, the user and the media in the video recommendation system, and the interaction relationship also contains rich user preference information, so that the characteristic representation of the video tag determined based on the heterogeneous network also carries the user preference information, and further contributes to the accuracy of the personalized video recommendation process.
Step 320, determining the characteristic representation of each node according to the characteristic information of each node and its neighbor nodes in the heterogeneous network.
A neighbor node refers to another node that has a relationship edge with a node in the heterogeneous network, which should have at least one neighbor node for any node in the heterogeneous network.
As described above, the interaction relationship between the nodes in the heterogeneous network also includes abundant user preference information, so the feature representation of each node determined according to the feature information of each node and its neighboring nodes in the heterogeneous network will be fused with the user preference information.
The feature representations of the candidate tags for the target video determined directly from the heterogeneous network in step 210 and the feature representations of the video tags in the user tag set determined directly from the heterogeneous network in step 224 will all contain user preference information, so that the personalized information of the user is considered to the greatest extent in the similarity ranking obtained based on these feature representations in step 240, thereby realizing accurate personalized video recommendation.
FIG. 7 is a flow chart of step 320 in one embodiment of the embodiment shown in FIG. 6. As shown in fig. 7, in an exemplary embodiment, determining the feature representation of each node according to the feature information of each node and its neighboring nodes in the heterogeneous network may include the following steps:
and 321, projecting each node in the heterogeneous network to the same semantic space, and obtaining the feature codes of each node on the semantic space.
The projection of each node in the heterogeneous network to the same semantic space means that semantic features of each node in the heterogeneous network are extracted through the same semantic model, so that feature codes of each node on the same semantic space are obtained.
The feature code of each node in the semantic space may be a one-hot (one coding mode) vector representation of each node, or may be a vector representation obtained by coding in other coding modes, which is not described herein.
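A minimal sketch of assigning one-hot feature codes to nodes over a shared index; the node identifiers are illustrative:

```python
def one_hot_codes(node_ids):
    """Assign each node a one-hot feature code over a shared sorted index."""
    index = {node: i for i, node in enumerate(sorted(node_ids))}
    codes = {}
    for node, i in index.items():
        vec = [0] * len(index)
        vec[i] = 1  # exactly one position is hot per node
        codes[node] = vec
    return codes

codes = one_hot_codes({"video_1", "tag_a", "media_x"})
# Sorted index order: media_x, tag_a, video_1.
```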
Step 322, obtaining the original feature vector of each node by concatenating the feature codes of the neighboring nodes of each node.
The feature codes of all nodes in the heterogeneous network carry semantic information of all nodes, so that the original feature vector obtained by serially connecting the feature codes of the neighbor nodes of any node in the heterogeneous network carries the semantic information of all the neighbor nodes.
In one embodiment, as shown in fig. 8, step 322 may specifically include the steps of:
step 3221, for each node in the heterogeneous network, the neighboring nodes need to be grouped according to the node type corresponding to the neighboring node of the node.
Firstly, it should be noted that, in this embodiment, a description will be given of a process for acquiring an original feature vector of one node in a heterogeneous network, and the process for acquiring original feature vectors of other nodes in the heterogeneous network is the same, which is not repeated in this embodiment.
The node types of the neighbor nodes correspond to the node types contained in the heterogeneous network; as previously described, in one embodiment the node types include video nodes, tag nodes, media nodes, and user group nodes.
Thus, for a node k in the heterogeneous network, the n neighbor nodes {h_1, h_2, ..., h_n} of node k may be determined, and the neighbor nodes corresponding to the same node type are divided into one group according to the node type corresponding to each neighbor node. In general, these neighbor nodes may be divided into a video node group, a tag node group, a media node group, and a user group node group.
Step 3222, concatenating the feature codes of the neighbor nodes in each group to obtain the concatenated code corresponding to each group.
Concatenating the feature codes of the neighbor nodes in each group means splicing together the feature codes of the neighbor nodes within each group, so as to obtain the concatenated code corresponding to each group.
The concatenated codes corresponding to the groups may be expressed as

{c_k^v, c_k^t, c_k^m, c_k^u}

where c_k^v denotes the concatenated code of the video nodes among the neighbor nodes, c_k^t denotes the concatenated code of the tag nodes among the neighbor nodes, c_k^m denotes the concatenated code of the media nodes among the neighbor nodes, and c_k^u denotes the concatenated code of the user group nodes among the neighbor nodes.
Step 3223, concatenating the concatenated codes corresponding to the groups to obtain the original feature vector of the node.
For a node k in the heterogeneous network, concatenating the concatenated codes corresponding to the groups yields the original feature vector, expressed as follows:

x_k = c_k^v ‖ c_k^t ‖ c_k^m ‖ c_k^u

where "‖" denotes the concatenation operation between codes.
It can be seen from the above that, in this embodiment, the neighbor nodes of each node in the heterogeneous network are grouped, the feature codes of the neighbor nodes in each group are then concatenated, and finally the concatenated codes of the groups are concatenated, so that an original feature vector carrying the semantic information of all the neighbor nodes is obtained.
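Steps 3221 to 3223 above can be sketched as follows; the node identifiers, type names, and type ordering are illustrative assumptions, not details fixed by the embodiment:

```python
import numpy as np

def original_feature_vector(neighbors, codes,
                            type_order=("video", "tag", "media", "user_group")):
    """Group node k's neighbors by node type, concatenate feature codes within
    each group, then concatenate the group codes into the original vector x_k."""
    groups = {t: [] for t in type_order}
    for node_id, node_type in neighbors:               # step 3221: group by type
        groups[node_type].append(codes[node_id])
    group_codes = [np.concatenate(groups[t])           # step 3222: within-group
                   for t in type_order if groups[t]]   # (skip empty groups)
    return np.concatenate(group_codes)                 # step 3223: across groups

# Example: node k has one video, one tag and one media neighbor.
codes = {"v1": np.array([1., 0.]), "t1": np.array([0., 1.]),
         "m1": np.array([1., 1.])}
x_k = original_feature_vector([("v1", "video"), ("t1", "tag"),
                               ("m1", "media")], codes)
```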
Step 323, aggregating the original feature vectors of the neighboring nodes of each node to obtain an aggregate feature vector of each node.
Aiming at each node in the heterogeneous network, the original feature vectors of the neighbor nodes are aggregated, and the obtained aggregated feature vectors contain information of mutual influence among different neighbor nodes under different node types.
In one embodiment, as shown in fig. 9, step 323 may include the steps of:
Step 3231, calculating, according to a preset first weight matrix, the attention of each node in the heterogeneous network to the neighbor nodes corresponding to each node type.
It should likewise be noted that this embodiment describes the process of acquiring the aggregate feature vector of one node in the heterogeneous network; the process of acquiring the aggregate feature vectors of the other nodes in the heterogeneous network is the same and is not described in detail here.
Still taking node k in the heterogeneous network as an example, the preset first weight matrix includes the weight vectors corresponding to the different node types and may be expressed as

W_att = {w_v, w_t, w_m, w_u}

where w_v denotes the weight vector corresponding to the video node type, w_t denotes the weight vector corresponding to the tag node type, w_m denotes the weight vector corresponding to the media node type, and w_u denotes the weight vector corresponding to the user group node type.
For one of the node types, such as the video node type, the weight vector w_v corresponding to the video node type is first used to compute the importance e_{k,i} of node k relative to the i-th neighbor node of the video node type (i.e., video v_i), with the calculation formula

e_{k,i} = w_v^T · (x_k ‖ x_{v_i})

where T denotes the transpose operation of the matrix, and x_k and x_{v_i} are the original feature vectors of node k and of video v_i, respectively.
The obtained importance values are then used to compute the attention a_{k,i} of node k to the i-th neighbor node of the video node type, with the calculation formula

a_{k,i} = exp(e_{k,i}) / Σ_{j=1}^{n} exp(e_{k,j})

which normalizes the importance values over the n neighbor nodes of the video node type.
according to the calculation process, the attention of each node in the heterogeneous network relative to the neighbor node corresponding to each node type can be calculated.
Step 3232, aggregating the attention corresponding to the same node type and the original feature vector of each neighboring node to obtain the type aggregation feature vector of each node under different node types.
Still taking node k in the heterogeneous network as an example, and assuming that node k has n neighbor nodes corresponding to the video node type, the process of aggregating the attention corresponding to each neighbor node of the video node type and the original feature vector of each neighbor node under the video node type can be expressed by the following formula:

h_k^v = Σ_{i=1}^{n} a_{k,i} · x_{v_i}
it can be seen that the type aggregation feature vector of the node k under the video node type can be obtained by respectively calculating the products of the attention and the original feature vector corresponding to each neighbor node in the video node type and then summing the products corresponding to all the neighbor nodes.
Similarly, the type aggregation feature vectors of the nodes in the heterogeneous network under different node types can be obtained through calculation.
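Step 3232 as described above, multiplying each neighbor's attention by its original feature vector and summing, can be sketched as:

```python
import numpy as np

def type_aggregate(attn, neighbor_vecs):
    """Type aggregation feature vector: attention-weighted sum of the
    neighbors' original feature vectors for one node type."""
    return sum(a * x for a, x in zip(attn, neighbor_vecs))

# Example: equal attention over two neighbors averages their vectors.
h = type_aggregate([0.5, 0.5], [np.array([0., 1.]), np.array([1., 0.])])
```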
Step 3233, aggregating the type aggregation feature vectors of the nodes under different node types according to the preset second weight matrix to obtain the aggregation feature vector of each node.
The aggregation process of the aggregate feature vector can also be expressed by the following formula:

h_k = ReLU( W_neigh · (h_k^v ‖ h_k^t ‖ h_k^m ‖ h_k^u) )

where h_k denotes the aggregate feature vector of node k, ReLU denotes the ReLU nonlinear activation function, W_neigh denotes the second weight matrix, h_k^v denotes the type aggregation feature vector of node k under the video node type, h_k^t denotes the type aggregation feature vector of node k under the tag node type, h_k^m denotes the type aggregation feature vector of node k under the media node type, and h_k^u denotes the type aggregation feature vector of node k under the user group node type.
The aggregate feature vectors of the other nodes in the heterogeneous network are obtained in the same manner, which is not described here in detail.
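Step 3233 can be sketched as follows; the vector dimensions and the example weight matrix are illustrative assumptions:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def aggregate_across_types(type_vecs, W_neigh):
    """Aggregate feature vector: concatenate the per-type aggregation
    feature vectors and apply the second weight matrix plus a ReLU."""
    concat = np.concatenate(type_vecs)   # [video ; tag ; media ; user group]
    return relu(W_neigh @ concat)

# Example: four 2-dim type vectors, W_neigh of shape (2, 8) picks the first
# two coordinates; the ReLU clips the negative one to zero.
tv = [np.array([1., -1.]), np.array([0.5, 0.5]), np.zeros(2), np.zeros(2)]
h_k = aggregate_across_types(tv, np.eye(2, 8))
```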
Step 324, obtaining a feature representation of each node by aggregating the aggregate feature vector and the original feature vector of each node.
In this method, by aggregating the aggregate feature vector and the original feature vector of each node in the heterogeneous network, the obtained feature representation of each node combines the feature information of the node itself with the feature information of its neighbor nodes, so that each node contains more comprehensive feature information and the obtained feature representation of each node is more accurate.
In one embodiment, as shown in FIG. 10, step 324 may include the steps of:
step 3241, performing a weighting operation on the original feature vectors of the nodes according to a preset third weight matrix to obtain weighted feature vectors of the nodes;
step 3242, obtaining a feature representation of each node by aggregating the weighted feature vectors and the aggregate feature vectors of each node.
Taking node k as an example, let the third weight matrix be denoted W_self and the original feature vector of node k be denoted x_k. The weighted feature vector z_k of node k is then calculated as

z_k = W_self · x_k

and the aggregation of the weighted feature vector z_k of node k with the aggregate feature vector h_k is calculated as

f_k = λ · z_k + (1 − λ) · h_k
the above process of feature representation of each node in the heterogeneous network is a machine learning process, λ is a super parameter, and it needs to be optimized in a learning process of feature representation of each node in the heterogeneous network continuously, so as to perform aggregation processing on weighted feature vectors and aggregated feature vectors of each node through the optimal super parameter, to obtain an optimal value of feature representation of each node.
In an exemplary embodiment, the loss function J employed by the learning process for the feature representation of each node in the heterogeneous network is as follows:

J = − Σ_i [ Σ_{j∈N_i} log σ(f_i^T · f_j) + Σ_{h∉N_i} log σ(−f_i^T · f_h) ]

where node i and node j in the heterogeneous network are neighbor nodes, node i and node h are non-neighbor nodes, f denotes the finally learned feature representation, N_i denotes the neighbor node set of node i, and σ(·) denotes the sigmoid activation function.
From this loss function it can be seen that the machine learning process makes the aggregated feature representation of node i similar to the feature representations of its neighbor nodes and different from the feature representations of its non-neighbor nodes.
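A single-pair sketch of this objective (a standard negative-sampling form; the exact form in the source is partly reconstructed) shows the pull toward a neighbor j and push away from a non-neighbor h:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pair_loss(f_i, f_j, f_h):
    """-log sigma(f_i . f_j) pulls neighbor j closer;
    -log sigma(-f_i . f_h) pushes non-neighbor h away."""
    return -np.log(sigmoid(f_i @ f_j)) - np.log(sigmoid(-f_i @ f_h))
```

The loss is lower when f_i aligns with its neighbor's representation than when it points away from it, which is exactly the similarity behavior described above.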
Therefore, the characteristic representation of each node in the heterogeneous network obtained according to the embodiment combines the characteristic information of the node and the characteristic information of the neighbor node, and the characteristic representation of each node also has richer user preference information, thereby being very beneficial to realizing accurate personalized video recommendation.
The video recommendation method provided in the present application will now be described in one specific implementation scenario.
As shown in FIG. 11, in an exemplary video recommendation scenario, video recommendation includes two processes, offline network construction and online tag ordering.
The offline network construction process constructs a heterogeneous network by taking the videos released in the video recommendation system, the video tags of the released videos, the media accounts releasing the videos, and the user groups obtained through clustering as nodes, and taking the associated play relationship between videos, the attribution relationship between videos and video tags, the release relationship between videos and media accounts, the effective viewing relationship between videos and user groups, and the common attribution relationship between video tags as edges. A machine learning model then extracts the feature representation of each node in the constructed heterogeneous network, so as to obtain a feature map corresponding to the heterogeneous network, where the feature map contains the feature representation corresponding to each node in the heterogeneous network. For the process by which the machine learning model extracts the feature representation of each node in the heterogeneous network, reference is made to the embodiments shown in fig. 7 to 10, which are not repeated here. Because the construction of the heterogeneous network and the acquisition of the feature map involve a huge amount of computation, both are performed offline so as to meet the real-time requirement of personalized video recommendation.
The online tag ranking process refers to the process of online video recommendation, in which a user feature representation is extracted for a user according to the user's historical viewing behavior data, the user feature representation containing information related to the user's interests. The similarity between the user feature representation and the feature representation of each candidate tag of the target video is then calculated, the candidate tags of the target video are ranked according to the obtained similarities, and the recommendation tags used for video recommendation for the target video are determined according to the ranking result. The feature representations of the video tags required in the online process can be obtained from the feature map of the heterogeneous network.
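The online ranking step can be sketched as follows, assuming cosine similarity between the user feature representation and each candidate tag's feature representation from the offline feature map (the tag names and vectors are illustrative):

```python
import numpy as np

def rank_tags(user_vec, tag_vecs):
    """Rank candidate tags by cosine similarity to the user representation."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = {tag: cos(user_vec, vec) for tag, vec in tag_vecs.items()}
    return sorted(sims, key=sims.get, reverse=True)  # most similar first

# Example: the tag whose vector points along the user's interests ranks first.
order = rank_tags(np.array([1., 0.]),
                  {"funny": np.array([0.9, 0.1]),
                   "news": np.array([0.1, 0.9])})
```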
It should be mentioned that, in the present application, the machine learning model extracts the feature representation of each node in the heterogeneous network using the methods described in the embodiments shown in fig. 7 to 10, which is an optimal implementation obtained through experiments.
In the experiments, the machine learning model performed video recommendation for the video recommendation system in a Random mode, a Deep Walk mode (one algorithm of graph neural networks), a Graph Sage mode (another algorithm of graph neural networks), and an HGAT+user mode (referring to the embodiments shown in fig. 7 to 10 of the present application), and the experimental data obtained through statistics are shown in tables 1 and 2.
Experimental mode    Click rate    Coefficient of inversion
Random               0.420         0.210
Deep Walk            0.593         0.330
Graph Sage           0.596         0.335
HGAT+user            0.632         0.354

TABLE 1
Table 1 shows the experimental data obtained by applying each of the above experimental modes to the video recommendation system offline; it can be seen that both the click rate and the coefficient of inversion in the HGAT+user mode are higher than those of the other modes.
Table 2 shows the experimental data obtained by applying each of the above experimental modes to the video recommendation system online; it can be seen that both the video tag click data and the video viewing data in the HGAT+user mode are optimal, so the video recommendation method provided by the present application can effectively improve indexes such as the click rate, play duration, and page turning amount of the overall video recommendation.
TABLE 2
Fig. 12 is a block diagram illustrating a video recommendation device suitable for use with the terminal 100 in the implementation environment of fig. 1, according to an exemplary embodiment. As shown in fig. 12, the apparatus includes a first page skip module 410, a recommended tag acquisition module 420, a recommended tag presentation module 430, and a second page skip module 440.
The first page skip module 410 is configured to jump to the video page associated with a target video in response to the selected target video being triggered on the terminal homepage.
The recommendation tag obtaining module 420 is configured to obtain recommendation tags for video recommendation for the target video, where the recommendation tags are obtained by sorting the candidate tags contained in the target video according to the historical viewing behavior data of the user and screening the candidate tags according to the sorting result.
The recommendation tag display module 430 is configured to display a recommendation tag on a video page.
The second page skip module 440 is configured to jump to the recommended video page associated with a recommended tag in response to the selected recommended tag being triggered on the video page.
In another exemplary embodiment, the recommended tag acquisition module 420 specifically includes a candidate tag feature determination module, a user feature representation construction module, a similarity calculation module, and a candidate tag sorting module.
The candidate tag feature determination module is used for determining feature representations of candidate tags in the target video.
The user characteristic representation construction module is used for acquiring video tags in videos watched by users according to the historical watching behavior data of the users and constructing user characteristic representations according to the video tags.
The similarity calculation module is used for calculating the similarity between the user characteristic representation and the characteristic representation of each candidate label.
And the candidate label sorting module is used for sorting the candidate labels according to the obtained similarity, and screening the candidate labels according to the sorting result to obtain recommended labels.
In another exemplary embodiment, the user feature representation construction module includes a video tag acquisition unit, a weight acquisition unit, a user tag set acquisition unit, and a feature construction unit.
The video tag acquisition unit is used for acquiring videos watched by the user and video tags in the watched videos from the historical watching behavior data of the user.
The weight acquisition unit is used for determining the weight of the video tag according to the attribution relation between the video tag and the video watched by the user, the completeness degree of the video watched by the user and the watching time.
The user tag set acquisition unit is used for screening the video tags according to the weights to obtain a user tag set.
The feature construction unit is used for constructing the user feature representation according to the feature representation and the weight of each video tag in the user tag set.
In another exemplary embodiment, the apparatus further comprises a heterogeneous network construction module and a node characteristic determination module.
The heterogeneous network construction module is used for constructing a heterogeneous network by taking heterogeneous information related to videos in the video recommendation system as nodes and taking association relations among the heterogeneous information as edges, wherein the heterogeneous information contains candidate tags and video tags.
The node characteristic determining module is used for determining characteristic representation of each node according to characteristic information of each node and neighbor nodes thereof in the heterogeneous network.
In another exemplary embodiment, heterogeneous information related to a video in a video recommendation system includes the video published in the video recommendation system, a video tag of the video published, a media account of the video published, and a user group in the video recommendation system, wherein the user group is obtained by clustering users in the video recommendation system.
In another exemplary embodiment, the association relationships between heterogeneous information include an associated play relationship between videos, an attribution relationship between videos and video tags, a release relationship between videos and media accounts, an effective viewing relationship between videos and user groups, and a common attribution relationship between video tags, where the associated play relationship between videos is determined as follows:
and determining the video which is effectively watched in the video recommendation system according to the integrity degree of the video which is watched, sequencing the video which is effectively watched according to the time of the video which is watched, and determining that the two videos which are adjacently sequenced have an associated play relationship.
In another exemplary embodiment, the node feature determination module includes a semantic space mapping unit, a feature encoding concatenation unit, an original feature vector concatenation unit, and a vector aggregation unit.
The semantic space mapping unit is used for projecting each node in the heterogeneous network to the same semantic space, and obtaining feature codes of each node on the semantic space.
The feature code serial unit is used for obtaining the original feature vector of each node through feature codes of neighbor nodes of each node in serial connection.
The original feature vector serial unit is used for aggregating the original feature vectors of the neighbor nodes of each node to obtain the aggregate feature vector of each node.
The vector aggregation unit is used for obtaining the characteristic representation of each node by aggregating the aggregate characteristic vector and the original characteristic vector of each node.
In another exemplary embodiment, the feature encoding concatenation unit includes a node grouping subunit, an intra-group feature concatenation subunit, and an out-group feature concatenation subunit.
The node grouping subunit is configured to group, for each node in the heterogeneous network, the neighboring nodes according to node types corresponding to the neighboring nodes of the node.
The intra-group feature tandem subunit is configured to obtain tandem codes corresponding to each group by tandem connection of feature codes of neighboring nodes in each group.
The out-of-group feature series subunit is used for connecting the corresponding series codes of each group in series to obtain the original feature vector of the node.
In another exemplary embodiment, the original feature vector concatenation unit includes an attention calculation subunit, a type aggregate feature vector acquisition subunit, and an aggregate feature vector acquisition subunit.
The attention calculating subunit is configured to calculate, according to a preset first weight matrix, attention of each node in the heterogeneous network to a neighboring node corresponding to each node type, where the first weight matrix includes weight vectors corresponding to each node type.
The type aggregation feature vector obtaining subunit is used for aggregating the attention corresponding to the same node type and the original feature vector of each neighbor node to obtain the type aggregation feature vector of each node under different node types.
The aggregation feature vector obtaining subunit is configured to aggregate type aggregation feature vectors of each node under different node types according to a preset second weight matrix, so as to obtain an aggregation feature vector of each node.
In another exemplary embodiment, the vector aggregation unit includes a weight operation subunit and a feature vector aggregation subunit.
The weighting operation subunit is configured to perform a weighting operation on the original feature vector of each node according to a preset third weight matrix, so as to obtain a weighted feature vector of each node.
The feature vector aggregation subunit is configured to obtain feature representations of each node by performing aggregation processing on the weighted feature vectors and the aggregated feature vectors of each node.
It should be noted that, the apparatus provided in the foregoing embodiments and the method provided in the foregoing embodiments belong to the same concept, and the specific manner in which each module and unit perform the operation has been described in detail in the method embodiments, which is not repeated herein.
Another aspect of the present application also provides an electronic device comprising a processor and a memory, wherein the memory has stored thereon computer readable instructions that when executed by the processor implement the video recommendation method as described above.
Referring to fig. 13, fig. 13 is a schematic diagram illustrating a hardware structure of an electronic device according to an exemplary embodiment.
It should be noted that the device is just one example adapted to the present application and should not be construed as providing any limitation to the scope of use of the present application. Nor should the device be construed as necessarily relying on or necessarily having one or more of the components of the exemplary electronic device shown in fig. 13.
The hardware structure of the device may vary widely depending on configuration or performance. As shown in fig. 13, the device includes: a power supply 510, an interface 530, at least one memory 550, and at least one central processing unit (CPU) 570.
Wherein the power supply 510 is configured to provide an operating voltage for each hardware device on the device.
The interface 530 includes at least one wired or wireless network interface 531, at least one serial-to-parallel interface 533, at least one input-output interface 535, and at least one USB interface 537, etc., for communicating with external devices.
The memory 550 serves as a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, where the stored resources include an operating system 551, application programs 553, and data 555, and the storage mode may be transient storage or permanent storage. The operating system 551 is used to manage and control each hardware device and the application programs 553 on the device, so as to implement the calculation and processing of the mass data 555 by the central processor 570; it may be Windows Server™, Mac OS X™, Unix™, Linux™, etc. The application programs 553 are computer programs that perform at least one particular task based on the operating system 551, and may include at least one module, each of which may include a series of computer readable instructions for the device.
The central processor 570 may include one or more of the above processors and is arranged to communicate with the memory 550 via a bus for computing and processing the mass data 555 in the memory 550.
As described in detail above, a video recommendation device embodying the present application will perform the video recommendation method described above by the central processor 570 reading a series of computer readable instructions stored in the memory 550.
Furthermore, the present application may also be implemented in hardware circuitry or in combination with software instructions, and thus, implementation of the present application is not limited to any specific hardware circuitry, software, or combination of the two.
Another aspect of the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a video recommendation method as described above. The computer-readable storage medium may be included in the electronic device described in the above embodiment or may exist alone without being incorporated in the electronic device.
The foregoing is merely a preferred exemplary embodiment of the present application and is not intended to limit the embodiments of the present application, and those skilled in the art may make various changes and modifications according to the main concept and spirit of the present application, so that the protection scope of the present application shall be subject to the protection scope of the claims.