CN114048349A - Method and device for recommending video cover and electronic equipment - Google Patents

Method and device for recommending video cover and electronic equipment

Info

Publication number
CN114048349A
Authority
CN
China
Prior art keywords
video, cover, video cover, covers, historical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111326938.9A
Other languages
Chinese (zh)
Inventor
李纯懿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Original Assignee
Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Application filed by Zhuo Erzhi Lian Wuhan Research Institute Co Ltd filed Critical Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Priority to CN202111326938.9A
Publication of CN114048349A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/74: Browsing; Visualisation therefor
    • G06F 16/75: Clustering; Classification
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/783: Retrieval using metadata automatically derived from the content

Abstract

The invention provides a method and an apparatus for recommending video covers, and an electronic device. The method comprises the following steps: extracting a plurality of historical video covers related to a target user, and determining the category to which each historical video cover belongs; training a preset classification model on the plurality of historical video covers to obtain a video cover classification model for the target user; determining the current interest point of the target user; and determining effective video covers according to the video cover classification model and recommending them to the target user. With the method, apparatus, and electronic device for recommending video covers, covers that interest the target user can be recommended to the target user, or a video's cover can be adjusted to one that interests the target user, so that the user can quickly locate videos of interest and the click-through rate on videos is improved.

Description

Method and device for recommending video cover and electronic equipment
Technical Field
The invention relates to the technical field of video recommendation, and in particular to a method and an apparatus for recommending video covers, an electronic device, and a computer-readable storage medium.
Background
Currently, videos on a video website or platform are displayed with corresponding video covers, which may be, for example, frames captured from the videos themselves. The video cover strongly influences whether a user clicks a video and how effective an advertisement video is. However, current video covers mainly serve a display role; they are not used to recommend suitable covers to the user. For example, videos whose covers a user may like are not recommended to the user, and video covers are not adaptively changed so that covers the user is interested in can be shown.
Disclosure of Invention
In order to solve the existing technical problems, embodiments of the present invention provide a method and an apparatus for recommending a video cover, an electronic device, and a computer-readable storage medium.
In a first aspect, an embodiment of the present invention provides a method for recommending a video cover, including:
extracting a plurality of historical video covers related to a target user, and determining the category of the historical video covers;
training a preset classification model according to a plurality of historical video covers, and determining a video cover classification model of the target user;
determining the current interest point of the target user according to the video information operated by the target user within a preset time period;
and determining an effective video cover according to the video cover classification model and recommending the effective video cover to the target user, wherein the effective video cover is a video cover belonging to the category corresponding to the current interest point.
In a second aspect, an embodiment of the present invention further provides an apparatus for recommending a video cover, including:
the preprocessing module is used for extracting a plurality of historical video covers related to a target user and determining the category to which each historical video cover belongs;
the training module is used for training a preset classification model according to a plurality of historical video covers and determining a video cover classification model of the target user;
the interest determining module is used for determining the current interest point of the target user according to the video information operated by the target user within a preset time period;
and the recommending module is used for determining an effective video cover according to the video cover classification model and recommending the effective video cover to the target user, wherein the effective video cover is a video cover of the category corresponding to the current interest point.
In a third aspect, an embodiment of the present invention provides an electronic device, including a bus, a transceiver, a memory, a processor, and a computer program stored on the memory and executable on the processor, where the transceiver, the memory, and the processor are connected via the bus, and the computer program, when executed by the processor, implements the steps of any one of the above methods for recommending a video cover.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the method for recommending video covers according to any one of the above items.
According to the method, the apparatus, the electronic device, and the computer-readable storage medium for recommending video covers provided by the embodiments of the present invention, a video cover classification model is trained for the target user based on the target user's previous historical video covers and their corresponding categories. A suitable video cover can then be recommended to the target user with this model: either a video whose cover interests the target user is recommended, or a video's cover is adjusted to one that interests the target user. The user can thus quickly locate videos of interest, and the click-through rate on videos is improved. In addition, setting one video cover classification model per group of target users reduces the training cost and improves the efficiency of updating and iterating the model.
Drawings
To illustrate the technical solutions in the embodiments or the background art of the present invention more clearly, the drawings needed for describing the embodiments or the background art are briefly introduced below.
FIG. 1 is a flow chart illustrating a method for recommending video covers in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a classification model in a method for recommending video covers according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an apparatus for recommending video covers according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device for executing a method for recommending a video cover according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings.
Fig. 1 is a flow chart illustrating a method for recommending a video cover according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step 101: a plurality of historical video covers related to the target user are extracted, and the category to which the historical video covers belong is determined.
In the embodiment of the invention, if a video cover needs to be recommended to a certain user, that user is taken as the target user. A video cover generally depends on a video, and "recommending a video cover" in this embodiment may mean recommending a video whose cover interests the target user, or adjusting a video's cover to one that interests the target user.
To recommend video covers to a target user, the target user needs to have interacted with videos, for example by watching or clicking them, so that a plurality of historical video covers of the target user can be determined. A historical video cover may be the cover of a video the target user previously watched, clicked, or rated (e.g., liked, shared, or commented on). Each historical video cover belongs to a corresponding category, so the historical video covers can be classified into the same or different categories. A category may be the category of the video the cover belongs to, such as movie, TV series, or variety show, or comedy, science fiction, documentary, and so on; alternatively, a label may be set in advance for each video's cover, and identical or similar labels grouped into one category. This embodiment does not limit the specific way categories are assigned to video covers. If the category of a historical video cover cannot currently be determined, that cover may be removed.
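As a concrete illustration of step 101, the following Python sketch shows one way the labeled set of historical covers might be assembled; the data structure and field names are hypothetical and not part of the invention.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HistoricalCover:
    image_path: str          # path to the cover image
    video_id: str
    interaction: str         # e.g. "watched", "clicked", "liked"
    category: Optional[str]  # e.g. "sci-fi"; None if it cannot be determined

def build_labeled_covers(covers: List[HistoricalCover]) -> List[HistoricalCover]:
    # Step 101: covers whose category cannot currently be determined
    # are removed, as described above.
    return [c for c in covers if c.category is not None]
```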
Step 102: and training a preset classification model according to a plurality of historical video covers, and determining the video cover classification model of the target user.
In the embodiment of the invention, once a plurality of historical video covers and their corresponding categories have been determined, the historical video covers can form a training set, from which a classification model capable of classifying video covers, i.e., a video cover classification model, is trained.
In addition, since a video website or platform generally has millions of users or more, setting a classification model for each individual target user would greatly increase the training cost. This embodiment therefore groups target users and determines a video cover classification model per group, which greatly reduces the training cost. In that case, step 102 "training a preset classification model according to a plurality of historical video covers to determine a video cover classification model of a target user" may include: determining the group to which the target user belongs, taking the video covers related to the other users of the same group as historical video covers as well, training the preset classification model on the plurality of historical video covers, and determining the video cover classification model of the group to which the target user belongs.
With the user's authorization, a user portrait can be determined for each user from personal information, viewing records, and other data, and cluster analysis based on the similarity of user portraits determines which users belong to the same group.
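The embodiment does not fix a clustering algorithm. Assuming user portraits are encoded as numeric vectors, a k-means grouping such as the following sketch would fit the description; the library choice and group count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_user_groups(portraits: np.ndarray, n_groups: int = 50) -> np.ndarray:
    """portraits: one row per user, built from personal information and
    viewing records (with authorization). Returns a group id per user;
    one video cover classification model is then trained per group."""
    kmeans = KMeans(n_clusters=n_groups, n_init=10, random_state=0)
    return kmeans.fit_predict(portraits)
```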
Step 103: and determining the current interest point of the target user according to the video information operated by the target user in the preset time period.
When a video cover needs to be recommended to the target user, a certain period, for example the last week or the last month, is taken as the preset time period. Based on the target user's operations on different videos within that period, the corresponding video information can be determined, and from it the target user's interest within the period, i.e., the current interest point. The video information describes the user's operations on different videos, such as clicking movie A or watching TV series B. The current interest point is the content the target user is interested in at the current time, and it is the same as or similar to a category of video covers; for example, if the target user watched a large number of science-fiction movies within the preset time period, the current interest point may be taken to be science-fiction movies.
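A minimal sketch of step 103, assuming the video information is available as (timestamp, category) pairs; the most frequent category within the window is taken as the current interest point. All names here are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Tuple

def current_interest_point(operations: Iterable[Tuple[datetime, str]],
                           window_days: int = 30) -> str:
    """operations: (timestamp, video category) pairs for videos the target
    user clicked, watched, or rated. Returns the dominant category within
    the preset time period, i.e. the current interest point."""
    cutoff = datetime.now() - timedelta(days=window_days)
    counts = Counter(cat for ts, cat in operations if ts >= cutoff)
    return counts.most_common(1)[0][0]  # assumes at least one operation
```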
Step 104: and determining effective video covers according to the video cover classification model, and recommending the effective video covers to the target user, wherein the effective video covers belong to the category corresponding to the current interest points.
In the embodiment of the invention, once the current interest point of the target user is determined, the category the target user is interested in is known, and video covers belonging to that category can be recommended to the target user as effective video covers. Specifically, where video covers have not yet been classified, the category of each video's cover can be determined with the video cover classification model (when different video cover classification models disagree, the category can be decided by majority voting), and videos whose covers belong to the category of interest are recommended to the target user, so that the cover the target user sees before opening a video belongs to the category of interest. Alternatively, multiple frames can be extracted automatically from a pending video and the category of each frame determined with the video cover classification model; if the pending video contains more than a preset number of frames belonging to the category corresponding to the current interest point, a video cover of that category is generated for the pending video, and the pending video is recommended to the target user. Where target users are grouped, the "video cover classification model" in step 104 is the model of the group to which the target user belongs.
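The second branch of step 104 (generating a cover for a pending video from its frames) could look like the following sketch, assuming `classify` wraps the trained video cover classification model; the threshold and helper names are hypothetical.

```python
from typing import Callable, List, Optional

def cover_from_frames(frames: List["Image"],
                      classify: Callable[["Image"], str],
                      interest_category: str,
                      min_hits: int = 3) -> Optional["Image"]:
    """Classify frames extracted from a pending video; if more than a
    preset number belong to the category of the current interest point,
    use one of them as the video's cover (and recommend the video)."""
    hits = [f for f in frames if classify(f) == interest_category]
    return hits[0] if len(hits) >= min_hits else None
```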
According to the method for recommending video covers, a video cover classification model is trained for the target user based on the target user's previous historical video covers and their corresponding categories, and the model is then used to recommend a suitable video cover to the target user: either a video whose cover interests the target user is recommended, or a video's cover is adjusted to one that interests the target user. The user can thus quickly locate videos of interest, and the click-through rate on videos is improved. In addition, setting one video cover classification model per group of target users reduces the training cost and improves the efficiency of updating and iterating the model.
Since the number of historical video covers associated with a target user is generally small, it may be difficult to train a classification model effectively on the historical video covers directly. This embodiment addresses the small-sample problem by generating training samples that each contain a plurality of historical video covers. On the basis of the foregoing embodiment, step 102 "training a preset classification model according to a plurality of historical video covers" includes:
step A1: generating a plurality of different training samples, the training samples including a reference set and a standard video cover; the reference group is a historical video cover set formed by selecting at least one historical video cover from each category, and the standard video cover is one of the historical video covers except the reference group.
In the embodiment of the present invention, let the number of categories of historical video covers related to the target user be N (unless stated otherwise, the number of categories is assumed to be N below). At least one historical video cover is selected from each category to form a set of historical video covers spanning the N categories; this set is referred to as a reference group. In general, to reduce the amount of processing, the number of historical video covers per category in the reference group may be set to 1. In addition, one further historical video cover outside the reference group is selected as the standard video cover, and the reference group and the standard video cover together form a training sample. Other training samples can be formed in the same way.
For example, suppose 100 historical video covers are extracted and classified into 5 categories (i.e., N = 5), with 20 covers per category; the 100 historical video covers can then be denoted a1, a2, …, a20, b1, b2, …, b20, c1, c2, …, c20, d1, d2, …, d20, e1, e2, …, e20. A reference group is formed by selecting one historical video cover from each of the 5 categories, for example a1, b1, c1, d1, e1, giving the reference group [a1, b1, c1, d1, e1]; one of the historical video covers other than a1, b1, c1, d1, e1 is then selected as the standard video cover, for example a2, forming the training sample {[a1, b1, c1, d1, e1]; a2}. Since there are many possible choices of reference group, this example yields 20 × 20 × 20 × 20 × 20 = 3,200,000 reference groups, and each reference group can be paired with many standard video covers, so the number of training samples is greatly increased.
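Since enumerating all 3.2 million reference groups is unnecessary in practice, a sketch that randomly samples (reference group, standard cover) pairs in the spirit of step A1 might look like this; the random sampling strategy is an assumption, while the structure of each sample follows the text.

```python
import random
from typing import Dict, List

def sample_training_pairs(covers_by_category: Dict[str, List[str]],
                          n_samples: int):
    """Yields (reference_group, standard_cover) pairs. The reference group
    holds one historical cover per category; the standard cover is any
    other historical cover, as in the {[a1, b1, c1, d1, e1]; a2} example."""
    for _ in range(n_samples):
        reference = {c: random.choice(vs) for c, vs in covers_by_category.items()}
        pool = [v for vs in covers_by_category.values() for v in vs
                if v not in reference.values()]
        yield list(reference.values()), random.choice(pool)
```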
Step A2: inputting the training samples into a classification model for training, wherein the classification model is used for identifying which category of historical video cover in the reference group belongs to the same category as the standard video cover.
In the embodiment of the invention, a training sample comprises a standard video cover and a reference group, the reference group comprises historical video covers of a plurality of categories, and the standard video cover belongs to the same category as the historical video cover of one of the categories in the reference group. In the above example, in the training sample {[a1, b1, c1, d1, e1]; a2}, the standard video cover a2 and the historical video cover a1 in the reference group belong to the same category. Accordingly, the classification model is used to identify which category of historical video cover in the reference group belongs to the same category as the standard video cover. Specifically, the standard video cover can be combined with the historical video cover of each category in the reference group to form N video cover combinations; training raises, as far as possible, the recognition probability of the combination whose two covers belong to the same category, and lowers the recognition probabilities of the combinations whose covers belong to different categories, so that the trained classification model (i.e., the video cover classification model) can recognize which category of historical video cover in the reference group belongs to the same category as the standard video cover.
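The text describes an N-way objective: raise the score of the combination whose covers share a category and lower the others. A simplified stand-in for that objective (the actual classification layer is the graph structure of steps A221 to A223 below) is cross-entropy over N similarity scores:

```python
import torch
import torch.nn.functional as F

def matching_loss(standard_feat: torch.Tensor,    # (D,) standard cover features
                  reference_feats: torch.Tensor,  # (N, D) one row per category
                  target_idx: int) -> torch.Tensor:
    """Treats the N similarities between the standard cover and each
    category's cover as logits; minimizing cross-entropy raises the
    probability of the true-category combination and lowers the rest."""
    logits = reference_feats @ standard_feat                       # (N,)
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([target_idx]))
```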
Correspondingly, when recognizing with the trained video cover classification model, a reference group can be preset, for example by selecting a higher-quality historical video cover from each category. The video cover to be recognized is taken as the standard video cover, the preset reference group and the video cover to be recognized are input to the classification model as one sample, and the classification model outputs which cover in the preset reference group belongs to the same category as the video cover to be recognized, which determines the category of the video cover to be recognized.
Optionally, referring to fig. 2, the classification model includes a feature extraction layer and a classification layer.
The step a2 "inputting training samples into the classification model for training" includes:
step A21: inputting the training samples into a classification model, and extracting a feature map matrix of each to-be-determined video cover in the training samples based on the feature extraction layer; the pending video cover is one of a standard video cover of the training sample and all historical video covers in the reference set.
In the embodiment of the invention, the feature extraction layer of the classification model is used to extract the feature map matrix of a video cover. Specifically, a training sample includes a standard video cover and a reference group containing historical video covers of N categories; in this embodiment, the video covers in a training sample (the standard video cover and the historical video covers in the reference group) are collectively referred to as pending video covers, to simplify the subsequent description of how the covers in a training sample are processed. For example, in the training sample {[a1, b1, c1, d1, e1]; a2}, each of the video covers a1, b1, c1, d1, e1, a2 may be referred to as a pending video cover.
A pending video cover is an image, so its features, i.e., its feature map matrix, can be extracted with the feature extraction layer. A training sample contains multiple pending video covers, and a feature map matrix is determined for each of them based on the feature extraction layer. The feature extraction layer may be a convolutional layer that extracts the features of the pending video cover by convolution, or it may have another structure. The feature map matrix is a feature set expressed in matrix form and may be a two-dimensional or three-dimensional matrix. For example, for a pending video cover of length L and width W, the feature map matrix may be L × W × D, where D is the depth of the feature map matrix.
Optionally, an attention mechanism may be applied to the feature map matrix to improve recognition. This embodiment introduces attention along two different dimensions of the feature map matrix. Specifically, step A21 "extracting the feature map matrix of each pending video cover in the training sample based on the feature extraction layer" includes steps A211 to A213.
Step A211: Extracting an initial feature map of the pending video cover, the initial feature map having D depths.
In the embodiment of the invention, the features of the pending video cover can first be extracted in a conventional way, for example through a convolutional layer, giving the initial feature map. The depth of the initial feature map is D; in general it has multiple depths, i.e., D is a positive integer not less than 1.
Referring to fig. 2, if the pending video cover is a color image in RGB format, it has three channels; if its length is L and its width is W, its format can be represented as L × W × 3. An initial feature map of the pending video cover is then extracted, for example with convolutional layers; its depth is D and its format is L × W × D. During training, the input of the feature extraction layer is a training sample.
Step A212: Pooling the features of each depth in the initial feature map to generate a first attention vector across the D depths, and generating an intermediate feature map from the initial feature map and the first attention vector.
In this embodiment of the present invention, the initial feature map has depth D, and each depth corresponds to an L × W feature. The feature corresponding to each depth is pooled, converting the L × W feature of each depth into, for example, an A × A or 1 × 1 feature; after all depths have been pooled, the first attention vector is obtained, whose format may be A × A × D, for example 1 × 1 × D.
Then, combining the initial feature map and the first attention vector yields an intermediate feature map of the pending video cover. For example, the product of the initial feature map and the first attention vector may be taken as the intermediate feature map. The intermediate feature map may have the same format as the initial feature map, for example L × W × D.
Step A213: Pooling each planar feature of the initial feature map along the depth dimension to generate a second attention vector, and generating the feature map matrix of the pending video cover from the intermediate feature map and the second attention vector.
In the embodiment of the present invention, planar features are features in a plane perpendicular to the depth direction: in the planar dimension their number is L × W, determined by the length L and width W of the initial feature map, and in the depth dimension their number is D, the depth of the initial feature map. Pooling the planar features along the depth dimension also generates an attention vector, the second attention vector. For example, if the format of the initial feature map is L × W × D, pooling each planar feature along the depth dimension gives a 1 × 1 × b feature, and combining the pooling results of all planar features gives the L × W × b second attention vector, where b may be 1.
Then, combining the intermediate feature map and the second attention vector yields the feature map matrix of the pending video cover. For example, the product of the intermediate feature map and the second attention vector may be taken as the feature map matrix. The feature map matrix may have the same format as the intermediate feature map (and the initial feature map), for example L × W × D.
By introducing attention vectors along two dimensions in this way, the feature map matrix of the pending video cover can be extracted more effectively, improving the recognition accuracy of the subsequent classification layer.
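A minimal PyTorch sketch of steps A211 to A213 follows: a small convolutional backbone stands in for the feature extraction layer (its sizes are illustrative assumptions), average pooling provides both attention vectors, and the two multiplications follow the examples above.

```python
import torch
import torch.nn as nn

class DualAttentionExtractor(nn.Module):
    def __init__(self, depth: int = 64):
        super().__init__()
        # Backbone producing the initial feature map of depth D (step A211).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, depth, 3, padding=1), nn.ReLU(),
            nn.Conv2d(depth, depth, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)                      # (B, D, L, W) initial map
        # Step A212: pool the L x W feature of each depth -> 1 x 1 x D vector.
        first = feat.mean(dim=(2, 3), keepdim=True)  # (B, D, 1, 1)
        inter = feat * first                         # intermediate feature map
        # Step A213: pool each planar feature along depth -> L x W x 1 vector.
        second = feat.mean(dim=1, keepdim=True)      # (B, 1, L, W)
        return inter * second                        # feature map matrix
```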
Step A22: Inputting the feature map matrix of the standard video cover of the training sample and the feature map matrices of the historical video covers corresponding to each category in the reference group into the classification layer, and identifying, based on the classification layer, which category of historical video cover in the reference group belongs to the same category as the standard video cover.
In the embodiment of the invention, determining the feature map matrix of each pending video cover in the training sample amounts to determining the feature map matrices of the standard video cover and of the historical video cover corresponding to each category in the reference group. These feature map matrices are then input to the classification layer, which identifies which category of historical video cover in the reference group belongs to the same category as the standard video cover. For example, the classification layer determines the similarity between the feature map matrix of the standard video cover and the feature map matrix of the historical video cover of each category in the reference group; training aims to increase, as far as possible, the similarity between the standard video cover and historical video covers of the same category, and to decrease its similarity with historical video covers of other categories. If a category of the reference group contains several historical video covers, the average of their feature map matrices can be used as the feature map matrix corresponding to that category.
Alternatively, the feature map matrix of the standard video cover could be input to the classification layer together with the feature map matrix of one category at a time, until all categories have been processed, to determine which category the standard video cover belongs to. This approach, however, essentially compares only two video covers at a time; 100 historical video covers yield at most 9,900 such combinations, which reduces the number of samples available for training the classification layer. Moreover, it cannot effectively compare the historical video covers of all categories in the reference group at once, so the classification effect is mediocre.
In the embodiment of the invention, all feature map matrices of a training sample instead form an undirected graph, and classification is performed on this undirected graph. Specifically, step A22 "inputting the feature map matrix of the standard video cover of the training sample and the feature map matrices of the historical video covers corresponding to each category in the reference group into the classification layer" includes:
step A221: and respectively combining the characteristic graph matrix of the standard video cover with the characteristic graph matrix of the historical video cover corresponding to each type in the reference group to generate N combined characteristics, wherein N is the number of the types of the historical video covers in the reference group.
In an embodiment of the invention, the training sample comprises a standard video cover and historical video covers of several categories, and the standard video cover can be combined with the historical video cover of each category. Specifically, combining the feature map matrix of the standard video cover with the feature map matrices of the historical video covers of each of the N categories in the reference group generates N combined features. For example, with the training sample {[a1, b1, c1, d1, e1]; a2}, the standard video cover a2 can be combined with the historical video cover a1 to form the combined feature (a1, a2); likewise, the other combined features (b1, a2), (c1, a2), (d1, a2), (e1, a2) can be formed. Each combined feature may be the concatenation of the two feature map matrices.
Step A222: and taking each combined feature as a vertex to form a complete undirected graph.
In the embodiment of the invention, taking each combined feature as a vertex forms a complete undirected graph, i.e., an undirected graph in which an edge exists between every two vertices. One representation of the complete undirected graph is shown in fig. 2, which illustrates the case N = 4: the white box represents the feature map matrix of the standard video cover, and the boxes in different gray levels represent the feature map matrices corresponding to the different categories.
Optionally, in the complete undirected graph, the edge between any two vertices carries a weight, namely the distance between the feature map matrices of the two corresponding categories of historical video covers in the reference group. Introducing edge weights helps the subsequent classification layer distinguish the feature map matrices of different categories.
Step A223: the complete undirected graph is input to a classification layer that determines the probability that a standard video cover in each vertex belongs to the same category as a corresponding historical video cover in the reference set.
In an embodiment of the invention, the undirected graph containing all combined features is input to the classification layer, based on which it can be determined which historical video cover in the reference group belongs to the same category as the standard video cover. The classification layer may specifically adopt a graph convolutional network structure, so that it outputs a probability for each vertex, representing the probability that the standard video cover and the historical video cover of the corresponding category belong to the same category.
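A sketch of steps A221 to A223, assuming flattened feature map matrices: each of the N concatenations is a vertex, a uniform dense adjacency stands in for the complete undirected graph (the edge weights described above are omitted for brevity), and one graph-convolution step precedes the per-vertex probability. Layer sizes and the single aggregation step are assumptions.

```python
import torch
import torch.nn as nn

class GraphClassificationLayer(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.gc = nn.Linear(2 * feat_dim, hidden)  # one graph-convolution step
        self.score = nn.Linear(hidden, 1)

    def forward(self, standard: torch.Tensor,            # (F,) standard cover
                references: torch.Tensor) -> torch.Tensor:  # (N, F) per category
        n = references.shape[0]
        # Step A221: combined feature = concatenation of the two matrices.
        verts = torch.cat([references,
                           standard.unsqueeze(0).expand(n, -1)], dim=1)  # (N, 2F)
        # Step A222: complete undirected graph -> every vertex sees every other.
        adj = torch.ones(n, n) / n
        # Step A223: aggregate over the graph, then score each vertex.
        mixed = torch.relu(self.gc(adj @ verts))
        return self.score(mixed).squeeze(-1).softmax(dim=0)  # per-vertex prob.
```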
Correspondingly, when recognizing with the trained video cover classification model, a reference group can be preset, for example by selecting a higher-quality historical video cover from each category. The video cover to be recognized is taken as the standard video cover, and the preset reference group and the video cover to be recognized are input to the classification model as one sample; the classification model builds the complete undirected graph containing the preset reference group and the video cover to be recognized, and then outputs which cover in the preset reference group belongs to the same category as the video cover to be recognized, which determines the category of the video cover to be recognized.
According to the method for recommending video covers, a video cover classification model is trained for the target user based on the target user's previous historical video covers and their corresponding categories, and the model is then used to recommend a suitable video cover: either a video whose cover interests the target user is recommended, or a video's cover is adjusted to one that interests the target user, so that the user can quickly locate videos of interest and the click-through rate on videos is improved. Setting one video cover classification model per group of target users reduces the training cost and improves the efficiency of updating and iterating the model. Generating training samples that contain a reference group and a standard video cover from the historical video covers expands the number of samples, which benefits model training, and introducing an attention mechanism into the feature map matrix improves recognition. Combining the feature map matrices of all pending video covers into a complete undirected graph increases the number of samples for the classification layer, and in subsequent recognition the differences between the feature map matrices of different categories allow more accurate classification.
The method for recommending video covers provided by the embodiment of the invention is described above in detail, and the method can also be implemented by a corresponding device.
Fig. 3 is a schematic structural diagram illustrating an apparatus for recommending video covers according to an embodiment of the present invention. As shown in fig. 3, the apparatus for recommending a video cover includes:
the preprocessing module 31 is used for extracting a plurality of historical video covers related to a target user and determining the categories of the historical video covers;
the training module 32 is configured to train a preset classification model according to a plurality of historical video covers, and determine a video cover classification model of the target user;
the interest determining module 33 is configured to determine a current interest point of the target user according to video information operated by the target user within a preset time period;
and the recommending module 34 is configured to determine an effective video cover according to the video cover classification model, and recommend the effective video cover to the target user, where the effective video cover is a video cover belonging to a category corresponding to the current interest point.
On the basis of the above embodiment, the training module 32 includes a sample generation unit and a training unit;
the sample generating unit is used for generating a plurality of different training samples, and the training samples comprise a reference group and a standard video cover; the reference group is a historical video cover set formed by selecting at least one historical video cover from each category, and the standard video cover is one of other historical video covers except the reference group;
the training unit is used for inputting the training samples into the classification model for training, and the classification model is used for identifying which category of historical video cover in the reference group belongs to the same category as the standard video cover.
On the basis of the embodiment, the classification model comprises a feature extraction layer and a classification layer;
the training unit inputs the training samples to the classification model for training, and comprises the following steps:
inputting the training samples into the classification model, and extracting a feature map matrix of each to-be-determined video cover in the training samples based on the feature extraction layer; the pending video cover is one of the standard video cover of the training sample and all historical video covers in the reference set;
inputting the feature map matrix of the standard video cover of the training sample and the feature map matrices of the historical video covers corresponding to each category in the reference group into the classification layer, and identifying, based on the classification layer, which category of historical video cover in the reference group belongs to the same category as the standard video cover.
On the basis of the above embodiment, the extracting, by the training unit, the feature map matrix of each to-be-determined video cover in the training sample based on the feature extraction layer includes:
extracting an initial feature map of the cover of the to-be-determined video, wherein the initial feature map has D depths;
pooling the features of each depth in the initial feature map respectively to generate a first attention vector of a D depth, and generating an intermediate feature map according to the initial feature map and the first attention vector;
pooling is conducted on each plane feature on the depth dimension in the initial feature map, a second attention vector is generated, and a feature map matrix of the cover of the undetermined video is generated according to the intermediate feature map and the second attention vector.
On the basis of the above embodiment, the inputting, by the training unit, the feature map matrix of the standard video cover of the training sample and the feature map matrix of the historical video cover corresponding to each category in the reference group to the classification layer includes:
respectively combining the feature map matrix of the standard video cover with the feature map matrix of the historical video cover corresponding to each category in the reference group to generate N combined features, wherein N is the number of the categories of the historical video covers in the reference group;
taking each combination characteristic as a vertex to form a complete undirected graph;
inputting the complete undirected graph to the classification layer, the classification layer for determining a probability that the standard video cover in each vertex belongs to the same category as the corresponding historical video cover in the reference set.
On the basis of the above embodiment, in the complete undirected graph, the edge between any two vertices carries a weight, namely the distance between the feature map matrices of the two corresponding categories of historical video covers in the reference group.
In addition, an embodiment of the present invention further provides an electronic device, which includes a bus, a transceiver, a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the transceiver, the memory, and the processor are connected via the bus, and when being executed by the processor, the computer program implements each process of the above-mentioned method for recommending a video cover, and can achieve the same technical effect, and is not described herein again to avoid repetition.
Specifically, referring to fig. 4, an embodiment of the present invention further provides an electronic device, which includes a bus 1110, a processor 1120, a transceiver 1130, a bus interface 1140, a memory 1150, and a user interface 1160.
In an embodiment of the present invention, the electronic device further includes: a computer program stored on the memory 1150 and executable on the processor 1120, the computer program when executed by the processor 1120 performs the processes of the above-described method embodiment of recommending a video cover.
A transceiver 1130 for receiving and transmitting data under the control of the processor 1120.
In embodiments of the invention in which a bus architecture (represented by bus 1110) is used, bus 1110 may include any number of interconnected buses and bridges, with bus 1110 connecting various circuits including one or more processors, represented by processor 1120, and memory, represented by memory 1150.
Bus 1110 represents one or more of any of several types of bus structures, including a memory bus and memory controller, a peripheral bus, an Accelerated Graphics Port (AGP), a processor, or a local bus using any of a variety of bus architectures. By way of example and not limitation, such architectures include: an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Processor 1120 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits in hardware or instructions in software in a processor. The processor described above includes: general purpose processors, Central Processing Units (CPUs), Network Processors (NPs), Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Complex Programmable Logic Devices (CPLDs), Programmable Logic Arrays (PLAs), Micro Control Units (MCUs) or other Programmable Logic devices, discrete gates, transistor Logic devices, discrete hardware components. The various methods, steps and logic blocks disclosed in embodiments of the present invention may be implemented or performed. For example, the processor may be a single core processor or a multi-core processor, which may be integrated on a single chip or located on multiple different chips.
Processor 1120 may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be directly performed by a hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor. The software modules may be located in a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), a register, and other readable storage media known in the art. The readable storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor.
The bus 1110 may also connect various other circuits such as peripherals, voltage regulators, or power management circuits to provide an interface between the bus 1110 and the transceiver 1130, as is well known in the art. Therefore, the embodiments of the present invention will not be further described.
The transceiver 1130 may be one element or may be multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. For example: the transceiver 1130 receives external data from other devices, and the transceiver 1130 transmits data processed by the processor 1120 to other devices. Depending on the nature of the computer system, a user interface 1160 may also be provided, such as: touch screen, physical keyboard, display, mouse, speaker, microphone, trackball, joystick, stylus.
It is to be appreciated that in embodiments of the invention, the memory 1150 may further include memory located remotely with respect to the processor 1120, which may be coupled to a server via a network. One or more portions of such networks may be an ad hoc network, an intranet, an extranet, a Virtual Private Network (VPN), a Local Area Network (LAN), a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), a Wireless Wide Area Network (WWAN), a Metropolitan Area Network (MAN), the Internet, a Public Switched Telephone Network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless fidelity (Wi-Fi) network, or a combination of two or more of the above. For example, the cellular telephone network and the wireless network may be a Global System for Mobile Communications (GSM) system, a Code Division Multiple Access (CDMA) system, a Worldwide Interoperability for Microwave Access (WiMAX) system, a General Packet Radio Service (GPRS) system, a Wideband Code Division Multiple Access (WCDMA) system, a Long Term Evolution (LTE) system, an LTE Frequency Division Duplex (FDD) system, an LTE Time Division Duplex (TDD) system, a Long Term Evolution-Advanced (LTE-A) system, a Universal Mobile Telecommunications System (UMTS), an enhanced Mobile Broadband (eMBB) system, a massive Machine Type Communication (mMTC) system, an Ultra Reliable Low Latency Communication (URLLC) system, or the like.
It is to be understood that the memory 1150 in embodiments of the present invention can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. Wherein the nonvolatile memory includes: Read-Only Memory (ROM), Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or Flash Memory.
The volatile memory includes: Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as: Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 1150 of the electronic device described in the embodiments of the invention includes, but is not limited to, the above and any other suitable types of memory.
In an embodiment of the present invention, memory 1150 stores the following elements of operating system 1151 and application programs 1152: an executable module, a data structure, or a subset thereof, or an expanded set thereof.
Specifically, the operating system 1151 includes various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks. Applications 1152 include various applications, such as a media player or a browser, for implementing various application services. A program implementing a method of an embodiment of the invention may be included in the application programs 1152. The application programs 1152 include: applets, objects, components, logic, data structures, and other computer-system-executable instructions that perform particular tasks or implement particular abstract data types.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the processes of the above-mentioned method for recommending video covers, and can achieve the same technical effects, and in order to avoid repetition, the details are not repeated here.
The computer-readable storage medium includes: permanent and non-permanent, removable and non-removable media may be tangible devices that retain and store instructions for use by an instruction execution apparatus. The computer-readable storage medium includes: electronic memory devices, magnetic memory devices, optical memory devices, electromagnetic memory devices, semiconductor memory devices, and any suitable combination of the foregoing. The computer-readable storage medium includes: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), non-volatile random access memory (NVRAM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape cartridge storage, magnetic tape disk storage or other magnetic storage devices, memory sticks, mechanically encoded devices (e.g., punched cards or raised structures in a groove having instructions recorded thereon), or any other non-transmission medium useful for storing information that may be accessed by a computing device. As defined in embodiments of the present invention, the computer-readable storage medium does not include transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses traveling through a fiber optic cable), or electrical signals transmitted through a wire.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to solve the problem to be solved by the embodiment of the invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be substantially or partially contributed by the prior art, or all or part of the technical solutions may be embodied in a software product stored in a storage medium and including instructions for causing a computer device (including a personal computer, a server, a data center, or other network devices) to execute all or part of the steps of the methods of the embodiments of the present invention. And the storage medium includes various media that can store the program code as listed in the foregoing.
In the description of the embodiments of the present invention, it should be apparent to those skilled in the art that the embodiments of the present invention can be embodied as methods, apparatuses, electronic devices, and computer-readable storage media. Thus, embodiments of the invention may take the form of entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software. Furthermore, embodiments of the invention may also take the form of a computer program product embodied in one or more computer-readable storage media having computer program code embodied therein.
The computer-readable storage media described above may be any combination of one or more computer-readable storage media. The computer-readable storage medium includes: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium include: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only Memory (ROM), an erasable programmable read-only Memory (EPROM), a Flash Memory, an optical fiber, a compact disc read-only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any combination thereof. In embodiments of the invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer program code embodied on the computer readable storage medium may be transmitted using any appropriate medium, including: wireless, wire, fiber optic cable, Radio Frequency (RF), or any suitable combination thereof.
Computer program code for carrying out operations of embodiments of the present invention may be written in assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, integrated circuit configuration data, or in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as C or similar programming languages. The computer program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer.
The method, the device and the electronic equipment according to embodiments of the present invention are described herein with reference to flowcharts and/or block diagrams.
It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions. These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing apparatus to function in a particular manner. Thus, the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The above description is only a specific implementation of the embodiments of the present invention, but the scope of the embodiments of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present invention, and all such changes or substitutions should be covered by the scope of the embodiments of the present invention. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of recommending video covers, comprising:
extracting a plurality of historical video covers related to a target user, and determining the category of the historical video covers;
training a preset classification model according to a plurality of historical video covers, and determining a video cover classification model of the target user;
determining the current interest point of the target user according to the video information operated by the target user within a preset time period;
and determining effective video covers according to the video cover classification model, and recommending the effective video covers to the target user, wherein the effective video covers belong to the video covers of the category corresponding to the current interest point.
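(Illustrative note, not part of the claims.) A minimal Python sketch of the claimed flow, using a nearest-centroid classifier as a stand-in for the trained video cover classification model; every name here, and the centroid classifier itself, is an assumption for illustration, not the patented implementation:

```python
# Illustrative sketch only; the nearest-centroid classifier and all names
# here are assumptions, not the patented method.
from collections import Counter
import numpy as np

def recommend_covers(history, recent_categories, candidates):
    """history: [(cover_vector, category)] of covers the user interacted with;
    recent_categories: categories of videos the user operated on within the
    preset time period; candidates: [(video_id, cover_vector)]."""
    # "Train" a per-user cover classification model: one centroid per category.
    centroids = {cat: np.mean([v for v, c in history if c == cat], axis=0)
                 for cat in {c for _, c in history}}

    def predict(vec):
        # Assign a cover to the category with the nearest centroid.
        return min(centroids, key=lambda c: np.linalg.norm(vec - centroids[c]))

    # Current interest point: the category dominating recent operations.
    interest = Counter(recent_categories).most_common(1)[0][0]

    # Effective covers: candidates whose predicted category matches the interest.
    return [vid for vid, vec in candidates if predict(vec) == interest]
```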
2. The method of claim 1, wherein the training a preset classification model according to a plurality of historical video covers comprises:
generating a plurality of different training samples, the training samples including a reference group and a standard video cover; the reference group is a set of historical video covers formed by selecting at least one historical video cover from each category, and the standard video cover is one of the historical video covers other than those in the reference group;
inputting the training samples into the classification model for training, wherein the classification model is used for identifying which category of historical video covers in the reference group belongs to the same category as the standard video cover.
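(Illustrative note.) One plausible way to generate the training samples claim 2 describes — a reference group holding at least one historical cover per category, plus one held-out standard cover; the function name and the uniform random sampling policy are assumptions:

```python
# Illustrative sketch; names and the sampling policy are assumptions.
import random

def make_training_sample(covers_by_category, per_class=1):
    """covers_by_category: {category: [cover, ...]} of historical covers."""
    reference, held_out = {}, []
    for cat, covers in covers_by_category.items():
        # At least one historical cover per category goes into the reference group.
        picked = random.sample(range(len(covers)), per_class)
        reference[cat] = [covers[i] for i in picked]
        held_out += [(covers[i], cat) for i in range(len(covers)) if i not in picked]
    # The standard video cover is one of the remaining (non-reference) covers.
    standard_cover, true_category = random.choice(held_out)
    return reference, standard_cover, true_category
```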
3. The method of claim 2, wherein the classification model comprises a feature extraction layer and a classification layer;
inputting the training samples into the classification model for training, wherein the training comprises:
inputting the training samples into the classification model, and extracting a feature map matrix of each to-be-determined video cover in the training samples based on the feature extraction layer, wherein the to-be-determined video cover is the standard video cover of the training sample or any one of the historical video covers in the reference group;
inputting the feature map matrix of the standard video cover of the training sample and the feature map matrix of the historical video cover corresponding to each category in the reference group into the classification layer, and identifying, based on the classification layer, which category of historical video covers in the reference group belongs to the same category as the standard video cover.
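(Illustrative note.) The wiring of claim 3 in sketch form; `extractor` and `classifier` stand in for the feature extraction layer and classification layer and are assumed callables, not the patent's modules:

```python
# Illustrative wiring only; `extractor` and `classifier` are assumed callables.
def forward_sample(extractor, classifier, reference, standard_cover):
    """reference: {category: [cover, ...]}; returns per-category scores."""
    # Feature map matrix of every to-be-determined cover in the sample:
    # the standard cover plus each reference-group cover.
    standard_feat = extractor(standard_cover)
    reference_feats = {cat: [extractor(c) for c in covers]
                       for cat, covers in reference.items()}
    # The classification layer identifies which category's reference
    # covers share a category with the standard cover.
    return classifier(standard_feat, reference_feats)
```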
4. The method of claim 3, wherein the extracting a feature map matrix of each to-be-determined video cover in the training samples based on the feature extraction layer comprises:
extracting an initial feature map of the to-be-determined video cover, wherein the initial feature map has D depths;
pooling the features of each depth in the initial feature map respectively to generate a first attention vector of dimension D, and generating an intermediate feature map according to the initial feature map and the first attention vector;
pooling each spatial position of the initial feature map along the depth dimension to generate a second attention vector, and generating the feature map matrix of the to-be-determined video cover according to the intermediate feature map and the second attention vector.
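(Illustrative note.) A PyTorch sketch of the two-stage attention of claim 4: a first, depth-wise attention vector from pooling each of the D depth slices, then a second, spatial attention from pooling the initial feature map along the depth dimension; the one-layer backbone and all layer sizes are assumptions:

```python
# Illustrative sketch only; the backbone and layer sizes are assumptions.
import torch
import torch.nn as nn

class AttentiveExtractor(nn.Module):
    def __init__(self, depth=64):
        super().__init__()
        # Stand-in backbone producing an initial feature map with D depths.
        self.backbone = nn.Conv2d(3, depth, kernel_size=3, padding=1)
        self.channel_fc = nn.Linear(depth, depth)
        self.spatial_conv = nn.Conv2d(1, 1, kernel_size=7, padding=3)

    def forward(self, cover):                # cover: (B, 3, H, W)
        feat = self.backbone(cover)          # initial feature map (B, D, H, W)
        # First attention vector: pool each depth slice, one weight per depth.
        a1 = torch.sigmoid(self.channel_fc(feat.mean(dim=(2, 3))))  # (B, D)
        inter = feat * a1[:, :, None, None]  # intermediate feature map
        # Second attention: pool every spatial position across the depth
        # dimension of the initial feature map.
        a2 = torch.sigmoid(self.spatial_conv(feat.mean(dim=1, keepdim=True)))
        return inter * a2                    # feature map matrix
```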
5. The method of claim 3, wherein the inputting the feature map matrix of the standard video cover of the training sample and the feature map matrix of the historical video cover corresponding to each category in the reference group to the classification layer comprises:
respectively combining the feature map matrix of the standard video cover with the feature map matrix of the historical video cover corresponding to each category in the reference group to generate N combined features, wherein N is the number of the categories of the historical video covers in the reference group;
taking each combined feature as a vertex to form a complete undirected graph;
inputting the complete undirected graph into the classification layer, the classification layer being used for determining, for each vertex, the probability that the standard video cover belongs to the same category as the corresponding historical video cover in the reference group.
6. The method of claim 5, wherein, in the complete undirected graph, an edge between any two vertices is provided with a weight, the weight being the distance between the feature map matrices of the historical video covers of the corresponding two categories in the reference group.
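(Illustrative note.) A sketch of the complete undirected graph of claims 5 and 6: N combined features as vertices and, on every edge, the distance between the two corresponding categories' reference feature matrices. The concatenation-based feature combination and the Euclidean distance are assumptions, and the classification layer that scores each vertex is omitted:

```python
# Illustrative sketch; the distance metric and feature combination are assumptions.
import itertools
import torch

def build_cover_graph(standard_feat, reference_feats):
    """standard_feat: feature map matrix of the standard cover (Tensor);
    reference_feats: {category: feature map matrix of that category's
    reference cover}."""
    cats = list(reference_feats)
    # One combined feature per category: the standard cover paired with that
    # category's reference cover; these are the N graph vertices.
    vertices = {c: torch.cat([standard_feat.flatten(),
                              reference_feats[c].flatten()]) for c in cats}
    # Complete undirected graph: an edge for every unordered vertex pair,
    # weighted by the distance between the two categories' reference features.
    edges = {(a, b): torch.dist(reference_feats[a].flatten(),
                                reference_feats[b].flatten()).item()
             for a, b in itertools.combinations(cats, 2)}
    return vertices, edges
```

A classification layer would then score each vertex and normalize the scores (e.g., with a softmax) into the per-category probabilities of claim 5.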
7. An apparatus for recommending video covers, comprising:
the system comprises a preprocessing module, a storage module and a display module, wherein the preprocessing module is used for extracting a plurality of historical video covers related to a target user and determining the categories of the historical video covers;
the training module is used for training a preset classification model according to a plurality of historical video covers and determining a video cover classification model of the target user;
the interest determining module is used for determining the current interest point of the target user according to the video information operated by the target user within a preset time period;
and the recommending module is used for determining effective video covers according to the video cover classification model and recommending the effective video covers to the target user, wherein the effective video covers belong to the video covers of the category corresponding to the current interest point.
8. The apparatus of claim 7, wherein the training module comprises a sample generation unit and a training unit;
the sample generating unit is used for generating a plurality of different training samples, the training samples including a reference group and a standard video cover; the reference group is a set of historical video covers formed by selecting at least one historical video cover from each category, and the standard video cover is one of the historical video covers other than those in the reference group;
the training unit is used for inputting the training samples into the classification model for training, wherein the classification model is used for identifying which category of historical video covers in the reference group belongs to the same category as the standard video cover.
9. An electronic device comprising a bus, a transceiver, a memory, a processor and a computer program stored on the memory and executable on the processor, the transceiver, the memory and the processor being connected via the bus, characterized in that the computer program, when executed by the processor, implements the steps of the method of recommending video covers according to any of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of recommending video covers according to any of claims 1 to 6.
CN202111326938.9A 2021-11-10 2021-11-10 Method and device for recommending video cover and electronic equipment Pending CN114048349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111326938.9A CN114048349A (en) 2021-11-10 2021-11-10 Method and device for recommending video cover and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111326938.9A CN114048349A (en) 2021-11-10 2021-11-10 Method and device for recommending video cover and electronic equipment

Publications (1)

Publication Number Publication Date
CN114048349A true CN114048349A (en) 2022-02-15

Family

ID=80208019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111326938.9A Pending CN114048349A (en) 2021-11-10 2021-11-10 Method and device for recommending video cover and electronic equipment

Country Status (1)

Country Link
CN (1) CN114048349A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115474084A (en) * 2022-08-10 2022-12-13 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for generating video cover image
CN115474084B (en) * 2022-08-10 2023-10-31 北京奇艺世纪科技有限公司 Method, device, equipment and storage medium for generating video cover image

Similar Documents

Publication Publication Date Title
CN111368685B (en) Method and device for identifying key points, readable medium and electronic equipment
CN108235116B (en) Feature propagation method and apparatus, electronic device, and medium
CN111950723A (en) Neural network model training method, image processing method, device and terminal equipment
CN112215171B (en) Target detection method, device, equipment and computer readable storage medium
CN110619334A (en) Portrait segmentation method based on deep learning, architecture and related device
CN113033677A (en) Video classification method and device, electronic equipment and storage medium
CN113570695B (en) Image generation method and device and electronic equipment
CN114048349A (en) Method and device for recommending video cover and electronic equipment
CN114494942A (en) Video classification method and device, storage medium and electronic equipment
CN110046571B (en) Method and device for identifying age
CN111626922A (en) Picture generation method and device, electronic equipment and computer readable storage medium
CN115223018A (en) Cooperative detection method and device for disguised object, electronic device and storage medium
CN114648712B (en) Video classification method, device, electronic equipment and computer readable storage medium
CN114913061A (en) Image processing method and device, storage medium and electronic equipment
CN113902838A (en) Animation generation method, animation generation device, storage medium and electronic equipment
CN116883708A (en) Image classification method, device, electronic equipment and storage medium
CN113033552A (en) Text recognition method and device and electronic equipment
CN113010728A (en) Song recommendation method, system, intelligent device and storage medium
CN113628122A (en) Image processing method, model training method, device and equipment
CN112418233A (en) Image processing method, image processing device, readable medium and electronic equipment
CN111708946A (en) Personalized movie recommendation method and device and electronic equipment
CN112785669B (en) Virtual image synthesis method, device, equipment and storage medium
CN111898658B (en) Image classification method and device and electronic equipment
CN114372974B (en) Image detection method, device, equipment and storage medium
CN114120364A (en) Image processing method, image classification method, device, medium, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination