CN108833938B - Method and apparatus for selecting video covers

Info

Publication number: CN108833938B
Application number: CN201810635762.7A
Authority: CN
Inventors: 王进波
Original assignee: Nanjing Shangwang Network Technology Co ltd
Current assignee: Nanjing Shangwang Network Technology Co.,Ltd.
Priority date: 2018-06-20
Filing date: 2018-06-20
Publication date: 2021-05-28
Anticipated expiration: 2038-06-20
Also published as: CN108833938A

Abstract

The embodiments of the application disclose a method and device for selecting a video cover. One embodiment of the method includes: acquiring an original video stream; extracting a key frame set from the original video stream; determining whether a key frame meeting a preset condition exists in the key frame set; and, in response to determining that key frames meeting the preset condition exist in the key frame set, selecting a key frame from among them as the video cover. This embodiment allows a video cover with high picture quality to be selected, which improves the user experience and, in turn, the video click rate.

Description

Method and apparatus for selecting video covers

Technical Field

The embodiments of the application relate to the technical field of computers, and in particular to a method and device for selecting video covers.

Background

With the continuous development of technology and society, a large and varied body of video has emerged and greatly enriched people's cultural life. To help users find videos more quickly and accurately, or to increase the click rate, a corresponding video cover is generally set for each video.

Currently, two video cover selection methods are in common use. In the first, the first frame of the video is selected as the video cover. However, some videos begin with full white or full black frames, so a full white or full black frame may end up as the video cover. In the second, a video frame at some moment is randomly selected as the video cover. However, the quality of randomly selected video frames is uneven, and a blurred video frame may be selected as the video cover.

Disclosure of Invention

The embodiments of the application provide a method and device for selecting a video cover.

In a first aspect, an embodiment of the present application provides a method for selecting a video cover, including: acquiring an original video stream; extracting a key frame set from the original video stream; determining whether a key frame meeting a preset condition exists in the key frame set; and, in response to determining that key frames meeting the preset condition exist in the key frame set, selecting a key frame from among them as the video cover.

In some embodiments, the preset condition includes at least one of: the variance of the pixel values of the pixels of the image is not less than a preset variance threshold, the pixel values of the pixels of the image meet preset distribution, and the mean value of the pixel values of the pixels of the image in the frequency domain is not less than a preset mean threshold.

In some embodiments, the preset condition includes that a variance of pixel values of pixel points of the image is not less than a preset variance threshold; and determining whether a key frame meeting a preset condition exists in the key frame set, including: calculating the variance of pixel values of pixel points of key frames in the key frame set; comparing the variance with a preset variance threshold value to obtain a first comparison result; and determining whether the key frames meeting the preset condition exist in the key frame set or not based on the first comparison result.

In some embodiments, the preset condition includes that pixel values of pixel points of the image satisfy a preset distribution; and determining whether a key frame meeting a preset condition exists in the key frame set, including: generating a pixel value distribution histogram of pixel points of a key frame in a key frame set as a first pixel value distribution histogram; performing statistical analysis on the first pixel value distribution histogram to obtain a first pixel value distribution condition; and determining whether key frames meeting preset conditions exist in the key frame set or not based on the first pixel value distribution condition.

In some embodiments, the preset condition includes that a mean value of pixel values of pixel points of the image in the frequency domain is not less than a preset mean value threshold; and determining whether a key frame meeting a preset condition exists in the key frame set, including: transforming the key frames in the key frame set from a space domain to a frequency domain by utilizing discrete cosine transform to obtain a key frame set of the frequency domain; calculating the mean value of pixel values of pixel points of key frames in the key frame set of the frequency domain as a first mean value; comparing the first average value with a preset average value threshold value to obtain a second comparison result; and determining whether the key frames meeting the preset condition exist in the key frame set or not based on the second comparison result.
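The frequency-domain check can be illustrated with a minimal sketch, assuming each key frame is available as a small 2-D list of luma values. The 2-D DCT-II is written out directly for readability rather than speed. The application does not state how the mean is taken over the transformed values, so averaging the absolute coefficient magnitudes is an assumption, and the names `dct2`, `frequency_mean`, and `passes_frequency_check` are hypothetical.

```python
import math


def dct2(block):
    """Unnormalised 2-D DCT-II of a rectangular list of lists."""
    rows, cols = len(block), len(block[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    total += (
                        block[x][y]
                        * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                        * math.cos(math.pi * (2 * y + 1) * v / (2 * cols))
                    )
            out[u][v] = total
    return out


def frequency_mean(block):
    """Mean absolute DCT coefficient (assumed reading of the 'mean value')."""
    coeffs = dct2(block)
    magnitudes = [abs(c) for row in coeffs for c in row]
    return sum(magnitudes) / len(magnitudes)


def passes_frequency_check(block, mean_threshold):
    """True when the frequency-domain mean is not less than the threshold."""
    return frequency_mean(block) >= mean_threshold
```

In this sketch the DC coefficient, which is proportional to average brightness, is included in the mean; an implementation that wants the check to respond only to detail might exclude it, but the application does not say either way.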

In some embodiments, the preset condition further includes that pixel values of pixel points of the image satisfy a preset distribution; and determining whether a key frame meeting a preset condition exists in the key frame set based on the first comparison result, wherein the determining comprises: if the first comparison result indicates that at least one key frame with the variance not less than the preset variance threshold exists in the key frame set, generating a first candidate video cover set based on the key frames with the variance not less than the preset variance threshold; generating a pixel value distribution histogram of pixel points of a first candidate video cover in a first candidate video cover set as a second pixel value distribution histogram; performing statistical analysis on the second pixel value distribution histogram to obtain a second pixel value distribution condition; and determining whether a first candidate video cover meeting a preset condition exists in the first candidate video cover set or not based on the second pixel value distribution condition.

In some embodiments, the preset condition further includes that a mean value of pixel values of pixel points of the image in the frequency domain is not less than a preset mean value threshold; and determining whether a first candidate video cover meeting a preset condition exists in the first candidate video cover set based on the second pixel value distribution condition, wherein the determining comprises the following steps: if the second pixel value distribution condition indicates that at least one frame of first candidate video covers meeting the preset distribution exists in the first candidate video cover set, generating a second candidate video cover set based on the at least one frame of first candidate video covers meeting the preset distribution; transforming a second candidate video cover in the second candidate video cover set from a space domain to a frequency domain by using discrete cosine transform to obtain a second candidate video cover set of the frequency domain; calculating the mean value of pixel values of pixels of a second candidate video cover in a second candidate video cover set of the frequency domain as a second mean value; comparing the second average value with a preset average value threshold value to obtain a third comparison result; and determining whether a second candidate video cover meeting a preset condition exists in the second candidate video cover set or not based on the third comparison result.

In some embodiments, determining whether there is a second candidate video cover satisfying a preset condition in the second candidate video cover set based on the third comparison result comprises: if the third comparison result indicates that at least one frame of second candidate video covers with the second mean value not smaller than the preset mean value threshold exist in the second candidate video cover set, determining that second candidate video covers meeting the preset condition exist in the second candidate video cover set, wherein the second candidate video covers meeting the preset condition comprise second candidate video covers with the second mean value not smaller than the preset mean value threshold; and if the third comparison result indicates that no second candidate video covers with the second mean value not smaller than the preset mean value threshold exist in the second candidate video cover set, determining that no second candidate video covers meeting the preset condition exist in the second candidate video cover set.
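The three-stage narrowing described in the preceding embodiments (variance, then pixel value distribution, then frequency-domain mean) can be sketched as a generic cascade. This is an illustrative outline only: the function name `cascade_filter` is hypothetical, and the individual checks are passed in as callables rather than implemented here.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def cascade_filter(
    key_frames: Sequence[Frame],
    checks: Sequence[Callable[[Frame], bool]],
) -> List[Frame]:
    """Apply each check in order, keeping only frames that pass all so far.

    After the first check the survivors play the role of the first candidate
    video cover set, after the second check the second candidate set, and so
    on. An empty result means no frame meets the preset condition.
    """
    candidates = list(key_frames)
    for check in checks:
        candidates = [frame for frame in candidates if check(frame)]
        if not candidates:
            break  # nothing left to test in later stages
    return candidates
```

Ordering the checks from cheapest to most expensive means that the costlier transform is run only on frames that have already passed the earlier stages.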

In some embodiments, before calculating the variance of the pixel values of the pixel points of the keyframes in the set of keyframes, the method further comprises: converting the RGB value of the pixel point of the key frame in the key frame set into YUV value; and taking the Y value in the YUV values of the pixel points of the key frames in the key frame set as the pixel value of the pixel point of the key frame in the key frame set.
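As a small illustration of this conversion, the sketch below computes only the Y (luma) component, since that is the value used afterwards. The application does not name a particular RGB-to-YUV matrix; the BT.601 luma weights used here are an assumption, and the function names are hypothetical.

```python
def rgb_to_luma(r, g, b):
    """Y component of YUV using BT.601 weights (assumed, not from the source)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def frame_to_luma(rgb_frame):
    """Map a 2-D list of (R, G, B) tuples to a 2-D list of Y values."""
    return [[rgb_to_luma(*pixel) for pixel in row] for row in rgb_frame]
```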

In some embodiments, obtaining an original video stream comprises: acquiring a video file; decapsulating the video file to obtain a video compressed stream; and decompressing the video compression stream to obtain an original video stream.

In some embodiments, the method further comprises: in response to determining that no key frame meeting the preset condition exists in the key frame set, determining whether a video frame meeting the preset condition exists in the original video stream; and, in response to determining that video frames meeting the preset condition exist in the original video stream, selecting a video frame from among them as the video cover.
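The fallback from key frames to all video frames can be sketched as follows. This is a minimal outline under stated assumptions: `select_cover` and `meets_condition` are hypothetical names, and returning the first qualifying frame is an arbitrary choice, since the application only says that a frame is selected from among those that qualify.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_cover(
    key_frames: Sequence[Frame],
    all_frames: Sequence[Frame],
    meets_condition: Callable[[Frame], bool],
) -> Optional[Frame]:
    """Try the key frame set first, then fall back to every video frame."""
    for pool in (key_frames, all_frames):
        qualifying = [frame for frame in pool if meets_condition(frame)]
        if qualifying:
            return qualifying[0]  # assumption: any qualifying frame will do
    return None  # no frame in the stream meets the preset condition
```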

In a second aspect, an embodiment of the present application provides an apparatus for selecting a video cover, including: an acquisition unit configured to acquire an original video stream; an extraction unit configured to extract a key frame set from the original video stream; a first determining unit configured to determine whether a key frame meeting a preset condition exists in the key frame set; and a first selecting unit configured to select, in response to determining that key frames meeting the preset condition exist in the key frame set, a key frame from among them as the video cover.

In some embodiments, the preset condition includes that a variance of pixel values of pixel points of the image is not less than a preset variance threshold; and the first determination unit includes: a first calculation subunit configured to calculate a variance of pixel values of pixel points of a key frame in the set of key frames; the first comparison subunit is configured to compare the variance with a preset variance threshold value to obtain a first comparison result; and the first determining subunit is configured to determine whether a key frame meeting a preset condition exists in the key frame set or not based on the first comparison result.

In some embodiments, the preset condition includes that pixel values of pixel points of the image satisfy a preset distribution; and the first determination unit includes: a generating subunit configured to generate a pixel value distribution histogram of pixel points of a key frame in the key frame set as a first pixel value distribution histogram; the analysis subunit is configured to perform statistical analysis on the first pixel value distribution histogram to obtain a first pixel value distribution condition; and the second determining subunit is configured to determine whether a key frame meeting a preset condition exists in the key frame set or not based on the first pixel value distribution condition.

In some embodiments, the preset condition includes that a mean value of pixel values of pixel points of the image in the frequency domain is not less than a preset mean value threshold; and the first determination unit includes: a transformation subunit configured to transform the key frames in the key frame set from a spatial domain to a frequency domain by using discrete cosine transform, resulting in a key frame set of the frequency domain; a second calculation subunit configured to calculate, as a first mean value, a mean value of pixel values of pixel points of the keyframes in the set of keyframes of the frequency domain; the second comparison subunit is configured to compare the first average value with a preset average value threshold value to obtain a second comparison result; and the third determining subunit is configured to determine whether the key frames meeting the preset condition exist in the key frame set or not based on the second comparison result.

In some embodiments, the preset condition further includes that pixel values of pixel points of the image satisfy a preset distribution; and the first determining subunit includes: a first generating module configured to generate a first candidate video cover set based on the key frames of which the at least one frame variance is not less than a preset variance threshold if the first comparison result indicates that at least one key frame of which the at least one frame variance is not less than the preset variance threshold exists in the key frame set; a second generation module configured to generate a pixel value distribution histogram of pixels of a first candidate video cover in the first set of candidate video covers as a second pixel value distribution histogram; the analysis module is configured to perform statistical analysis on the second pixel value distribution histogram to obtain a second pixel value distribution condition; and the determining module is configured to determine whether a first candidate video cover meeting a preset condition exists in the first candidate video cover set or not based on the second pixel value distribution condition.

In some embodiments, the preset condition further includes that a mean value of pixel values of pixel points of the image in the frequency domain is not less than a preset mean value threshold; and the determining module comprises: a generation submodule configured to generate a second candidate video cover set based on at least one frame of the first candidate video cover satisfying a preset distribution if the second pixel value distribution indicates that at least one frame of the first candidate video cover satisfying the preset distribution exists in the first candidate video cover set; a transform submodule configured to transform a second candidate video cover in the second candidate set of video covers from the spatial domain to the frequency domain using discrete cosine transform, resulting in a second candidate set of video covers for the frequency domain; a calculation submodule configured to calculate an average of pixel values of pixels of a second candidate video cover in a second candidate video cover set of the frequency domain as a second average; the comparison submodule is configured to compare the second average value with a preset average value threshold value to obtain a third comparison result; a determination sub-module configured to determine whether there is a second candidate video cover in the second candidate video cover set that satisfies a preset condition based on the third comparison result.

In some embodiments, the determination submodule is further configured to: if the third comparison result indicates that at least one frame of second candidate video covers with the second mean value not smaller than the preset mean value threshold exist in the second candidate video cover set, determining that second candidate video covers meeting the preset condition exist in the second candidate video cover set, wherein the second candidate video covers meeting the preset condition comprise second candidate video covers with the second mean value not smaller than the preset mean value threshold; and if the third comparison result indicates that no second candidate video covers with the second mean value not smaller than the preset mean value threshold exist in the second candidate video cover set, determining that no second candidate video covers meeting the preset condition exist in the second candidate video cover set.

In some embodiments, the first determination unit further comprises: a conversion subunit configured to convert RGB values of pixel points of the keyframes in the set of keyframes into YUV values; and taking the Y value in the YUV values of the pixel points of the key frames in the key frame set as the pixel value of the pixel point of the key frame in the key frame set.

In some embodiments, the obtaining unit comprises: an acquisition subunit configured to acquire a video file; the decapsulation subunit is configured to decapsulate the video file to obtain a video compressed stream; and the decompression sub-unit is configured to decompress the video compressed stream to obtain an original video stream.

In some embodiments, the apparatus for selecting a video cover further comprises: a second determining unit configured to determine, in response to determining that no key frame meeting the preset condition exists in the key frame set, whether a video frame meeting the preset condition exists in the original video stream; and a second selecting unit configured to select, in response to determining that video frames meeting the preset condition exist in the original video stream, a video frame from among them as the video cover.

In a third aspect, an embodiment of the present application provides a network device, where the network device includes: one or more processors; and a storage device on which one or more programs are stored; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.

In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect.

In the method and device for selecting a video cover provided by the embodiments of the application, a key frame set is first extracted from an original video stream; the key frame set is then analyzed to determine whether key frames meeting a preset condition exist; and finally, if such key frames exist, the video cover is selected from among them. Because key frames are intra-coded, they show little distortion and have relatively high picture quality. Selecting the video cover from key frames that meet the preset condition therefore yields a cover with high picture quality, which improves the user experience and, in turn, the video click rate.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for selecting video covers in accordance with the present application;

FIG. 3 is a flow diagram of yet another embodiment of a method for selecting video covers in accordance with the present application;

FIG. 4 is a schematic diagram of an application scenario of the method for selecting video covers provided in FIG. 3;

FIG. 5 is a flow diagram of another embodiment of a method for selecting video covers in accordance with the present application;

FIG. 6 is a block diagram of a computer system suitable for use in implementing a network device of an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 shows an exemplary system architecture 100 to which the method for selecting video covers of the present application may be applied.

As shown in fig. 1, the system architecture 100 may include a video storage device 101, a network 102, and a network device 103. Network 102 is the medium used to provide a communication link between video storage device 101 and network device 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The video storage device 101 may interact with a network device 103 over a network 102 to receive or transmit video and the like. The video storage device 101 may be an electronic device that stores a large amount of video, including but not limited to a smartphone, a tablet, a laptop, a desktop computer, a server, and so forth.

Network device 103 may be hardware or software that supports network connectivity and provides various network services. When the network device is hardware, it may be any of various electronic devices that support video cover selection, including but not limited to smartphones, tablets, laptop computers, desktop computers, servers, and the like. In this case, it may be implemented as a distributed group of multiple network devices or as a single network device. When the network device is software, it may be installed in the electronic devices listed above. In this case, it may be implemented as multiple pieces of software or software modules (for example, to provide a distributed service) or as a single piece of software or software module. No particular limitation is imposed here.

Network device 103 may provide various services. For example, the network device 103 may perform processing such as analysis on a video acquired from the video storage device 101, and generate a processing result (e.g., a video cover).

It should be noted that the method for selecting a video cover provided in the embodiment of the present application may be executed by the network device 103.

It should be understood that the number of video storage devices, networks, and network devices in fig. 1 is merely illustrative. There may be any number of video storage devices, networks, and network devices, as desired for an implementation. In the case where video is stored in the network device 103, the system architecture 100 may not provide the video storage device 101.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for selecting a video cover in accordance with the present application is shown. The method for selecting the video cover comprises the following steps:

Step 201, obtaining an original video stream.

In this embodiment, the execution subject of the method for selecting a video cover (e.g., the network device 103 shown in fig. 1) may obtain the original video stream. The original video stream may be an uncompressed, unencapsulated video stream.

In some optional implementations of the present embodiment, a video storage device (e.g., video storage device 101 shown in fig. 1) may store a large number of original video streams. In this case, the execution subject can acquire the original video stream from the video storage device through a wired or wireless connection.

In some optional implementations of the present embodiment, a large number of video files may be stored in the video storage device. In this case, the execution subject may first acquire a video file from the video storage device, then decapsulate the video file to obtain a compressed video stream, and finally decompress the compressed video stream to obtain the original video stream. The video file may be generated by compressing and encapsulating an original video stream. Currently, common video compression coding standards may include, but are not limited to, H.264, H.265, MPEG-2, MPEG-4, and the like. Common video encapsulation (container) formats may include, but are not limited to, MP4, AVI, FLV, MOV, and the like.

In some optional implementations of the present embodiment, the execution subject itself may store a large number of original video streams or video files. In this case, the execution subject may obtain the original video stream locally, or obtain a video file locally and then decapsulate and decompress it to obtain the original video stream.

Step 202, extracting a key frame set from the original video stream.

In this embodiment, the execution subject may extract the key frame set from the original video stream. The original video stream may comprise at least one video frame, where a video frame is a single picture in the original video stream. Video frames may include I frames, P frames, and B frames. An I frame, i.e., an intra-coded frame, is an independent frame that carries all of its own information and can be decoded without reference to other video frames. A P frame, i.e., a forward predictive coded frame, is predicted from a preceding P frame or I frame; its data is compressed according to the difference between the frame and that preceding frame. A B frame, i.e., a bidirectionally predictive coded frame, is compressed according to the differences between the current frame and both the preceding and following frames.

It should be noted that the original video stream is usually organized in units of sequences. A sequence is a segment of the original video stream that starts with an I frame and ends at the next I frame. The first frame of a sequence is called an IDR frame, which is a kind of I frame. The key frames in this embodiment generally refer to IDR frames, and the key frame set extracted from the original video stream may be an IDR frame set composed of at least one IDR frame.
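As a toy illustration of key frame extraction, the sketch below assumes that each decoded frame already carries a frame-type label. The `VideoFrame` class, its fields, and the string labels are hypothetical; real decoders expose frame types through their own interfaces, which are not modelled here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VideoFrame:
    index: int        # position of the frame in the stream
    frame_type: str   # assumed labels: "IDR", "I", "P" or "B"


def extract_key_frames(frames: Sequence[VideoFrame]) -> List[VideoFrame]:
    """Return the IDR frames, which this embodiment treats as key frames."""
    return [frame for frame in frames if frame.frame_type == "IDR"]
```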

Step 203, determining whether a key frame meeting a preset condition exists in the key frame set.

In this embodiment, the execution subject may analyze the key frames in the key frame set one by one and determine whether each key frame satisfies a preset condition. If at least one key frame in the key frame set satisfies the preset condition, it is determined that key frames meeting the preset condition exist in the key frame set, and the flow continues to step 204. If none of the key frames in the key frame set satisfies the preset condition, it is determined that no key frame meeting the preset condition exists in the key frame set. The preset condition may be any of various conditions set in advance. For example, the preset condition may include, but is not limited to, at least one of: the variance of the pixel values of the pixels of the image is not less than a preset variance threshold, the pixel values of the pixels of the image satisfy a preset distribution, the mean value of the pixel values of the pixels of the image in the frequency domain is not less than a preset mean threshold, and the like.

In some optional implementations of the present embodiment, the preset condition may include that the variance of the pixel values of the pixels of the image is not less than a preset variance threshold. In this case, the execution subject may first calculate the variance of the pixel values of the pixels of each key frame in the key frame set, then compare the variance with the preset variance threshold to obtain a first comparison result, and finally determine, based on the first comparison result, whether key frames meeting the preset condition exist in the key frame set. In practice, if the variance of the pixel values of an image equals zero, the image contains only one color, for example a full white frame or a full black frame. If the variance is small, the image contains few colors and its picture content is usually not rich. If the variance is large, the image may contain more colors and its picture content is usually rich. For example, for each key frame in the key frame set, the execution subject may calculate the variance of the pixel values of the pixels of that key frame and compare it with the preset variance threshold. If the variance is not less than the preset variance threshold, it is determined that a key frame meeting the preset condition exists in the key frame set, and that key frame is one of the key frames meeting the preset condition. If the variances of all key frames in the key frame set are smaller than the preset variance threshold, it is determined that no key frame meeting the preset condition exists in the key frame set. The preset variance threshold can be set as required. Generally, the larger the preset variance threshold, the fewer key frames are selected as meeting the preset condition and the higher the selection precision; the smaller the preset variance threshold, the more key frames are selected and the lower the selection precision.
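The variance check can be sketched in a few lines, assuming a key frame is given as a 2-D list of single-channel pixel values (for example luma). The function names and any threshold value passed in are hypothetical; the application leaves the threshold to be set as required.

```python
def pixel_variance(pixels):
    """Population variance of all pixel values in a 2-D list."""
    values = [value for row in pixels for value in row]
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values) / len(values)


def passes_variance_check(pixels, variance_threshold):
    """True when the variance is not less than the preset threshold."""
    return pixel_variance(pixels) >= variance_threshold
```

A single-color frame has variance zero and is rejected by any positive threshold, which is how this check screens out full white and full black frames.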

It should be noted that the execution body may directly use the key frames whose pixel-value variance is not less than the preset variance threshold as the key frames satisfying the preset condition, or may further analyze those key frames and select, from among them, the key frames that also satisfy other preset conditions (for example, the pixel values of the pixels of the image satisfy the preset distribution, and the mean value of the pixel values of the pixels of the image in the frequency domain is not less than the preset mean threshold) as the key frames satisfying the preset condition.

In some optional implementations of this embodiment, the preset condition may include that the pixel values of the pixels of the image satisfy a preset distribution. In this case, the execution body may first generate a pixel value distribution histogram of the pixels of each key frame in the key frame set as a first pixel value distribution histogram; then perform statistical analysis on the first pixel value distribution histogram to obtain a first pixel value distribution; and finally determine, based on the first pixel value distribution, whether a key frame satisfying the preset condition exists in the key frame set. In practice, the more groups the pixel value distribution histogram of an image has and the lower their frequencies are, the more color types exist in the image, and the richer the picture content usually is. Conversely, the fewer groups the histogram has and the higher their frequencies are, the fewer color types exist in the image, and the less rich the picture content usually is.
For example, for each key frame in the key frame set, the execution body may generate a pixel value distribution histogram of the pixels of the key frame and count the number of groups and the frequency in that histogram. If the number of groups is not less than a preset group number and the frequency is not greater than a preset frequency, it is determined that a key frame satisfying the preset condition exists in the key frame set, and that key frame is one of the key frames satisfying the preset condition. If, for all the key frames in the key frame set, the number of groups in the pixel value distribution histogram is smaller than the preset group number and the frequency is larger than the preset frequency, it is determined that no key frame satisfying the preset condition exists in the key frame set. The preset group number and the preset frequency can be set as required. Generally, the larger the preset group number and the smaller the preset frequency, the fewer key frames are selected as satisfying the preset condition and the higher the selection precision is; the smaller the preset group number and the larger the preset frequency, the more key frames are selected and the lower the selection precision is.
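
The histogram test described above can be illustrated with a minimal Python sketch. The text does not define how groups are formed or which frequency is compared, so the sketch assumes fixed-width bins (the hypothetical parameter `bin_width`), counts only non-empty bins as groups, and compares the highest bin count with the preset frequency. The function names and sample frames are illustrative only.

```python
from collections import Counter

def histogram_stats(pixels, bin_width=16):
    """Return (number of non-empty groups, highest group frequency).

    Assumption: a "group" is a fixed-width bin of pixel values and the
    "frequency" compared with the preset frequency is the largest bin count.
    `pixels` must be a non-empty flat list of integer pixel values.
    """
    counts = Counter(value // bin_width for value in pixels)
    return len(counts), max(counts.values())

def satisfies_preset_distribution(pixels, min_groups, max_frequency, bin_width=16):
    """Group count not less than min_groups and frequency not greater than max_frequency."""
    groups, peak = histogram_stats(pixels, bin_width)
    return groups >= min_groups and peak <= max_frequency

# Hypothetical 8-pixel frames given as flattened gray values.
flat_frame = [0, 0, 0, 0, 0, 0, 0, 0]
rich_frame = [0, 40, 80, 120, 160, 200, 240, 250]

print(satisfies_preset_distribution(flat_frame, min_groups=4, max_frequency=3))
print(satisfies_preset_distribution(rich_frame, min_groups=4, max_frequency=3))
```

Under these assumptions the single-color frame fails the test and the varied frame passes it; the thresholds 4 and 3 are arbitrary values chosen for the example.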

It should be noted that, the execution main body may directly use the keyframes with the pixel values of the pixels satisfying the preset distribution as the keyframes satisfying the preset conditions, or further analyze the keyframes with the pixel values of the pixels satisfying the preset distribution, and select the keyframes satisfying other preset conditions (for example, the variance of the pixel values of the pixels of the image is not less than the preset variance threshold, and the mean of the pixel values of the pixels of the image in the frequency domain is not less than the preset mean threshold) as the keyframes satisfying the preset conditions.

In some optional implementations of the present embodiment, the preset condition may include that the mean value of the pixel values of the pixels of the image in the frequency domain is not less than a preset mean threshold. In this case, the execution body may first transform the key frames in the key frame set from the spatial domain to the frequency domain by using the Discrete Cosine Transform (DCT), so as to obtain a key frame set in the frequency domain; then calculate the mean value of the pixel values of the pixels of each key frame in the frequency-domain key frame set as a first mean value; then compare the first mean value with the preset mean threshold to obtain a second comparison result; and finally determine, based on the second comparison result, whether a key frame satisfying the preset condition exists in the key frame set. In practice, the larger the mean value of the pixel values of the pixels of an image in the frequency domain, the clearer the picture; the smaller that mean value, the more blurred the picture. For example, for each key frame in the key frame set, the execution body may transform the key frame from the spatial domain to the frequency domain by using the discrete cosine transform, calculate the mean value of the pixel values of the pixels of the frequency-domain key frame, and compare that mean value with the preset mean threshold. If the mean value is not less than the preset mean threshold, it is determined that a key frame satisfying the preset condition exists in the key frame set, and that key frame is one of the key frames satisfying the preset condition.
If the mean values of the pixel values of the pixels of all the key frames in the frequency-domain key frame set are smaller than the preset mean threshold, it is determined that no key frame satisfying the preset condition exists in the key frame set. The preset mean threshold can be set as required. Generally, the larger the preset mean threshold is, the fewer key frames are selected as satisfying the preset condition and the higher the selection precision is; the smaller the preset mean threshold is, the more key frames are selected and the lower the selection precision is.
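
As a rough illustration of the frequency-domain test, the following Python sketch applies a naive orthonormal two-dimensional DCT-II to a small grayscale block and averages the result. The text does not say whether signed coefficients or magnitudes are averaged; the sketch assumes the mean of absolute coefficient values. The function names and the 4x4 test blocks are hypothetical, and the quadruple loop is only practical for tiny inputs.

```python
import math

def dct_2d(block):
    """Naive orthonormal 2-D DCT-II of a rectangular list-of-lists image."""
    rows, cols = len(block), len(block[0])

    def scale(k, n):
        # Orthonormal scaling factors of the DCT-II.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0.0
            for x in range(rows):
                for y in range(cols):
                    total += (block[x][y]
                              * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                              * math.cos(math.pi * (2 * y + 1) * v / (2 * cols)))
            out[u][v] = scale(u, rows) * scale(v, cols) * total
    return out

def frequency_domain_mean(block):
    """Assumption: average the absolute values of the DCT coefficients."""
    coeffs = dct_2d(block)
    values = [abs(c) for row in coeffs for c in row]
    return sum(values) / len(values)

# Hypothetical 4x4 blocks with the same average brightness.
flat_block = [[100] * 4 for _ in range(4)]
checker_block = [[200 if (x + y) % 2 == 0 else 0 for y in range(4)] for x in range(4)]

print(frequency_domain_mean(flat_block) < frequency_domain_mean(checker_block))
```

With this choice, a block containing sharp transitions yields a larger mean than a uniform block of equal brightness, which matches the relationship between the mean value and picture clarity stated above.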

It should be noted that, the execution main body may directly use the key frame whose mean value of the pixel values of the pixels in the frequency domain is not less than the preset mean threshold as the key frame satisfying the preset condition, or may further analyze the key frame whose mean value of the pixel values of the pixels in the frequency domain is not less than the preset mean threshold, and select the key frame satisfying other preset conditions (for example, the variance of the pixel values of the pixels in the image is not less than the preset variance threshold, and the pixel values of the pixels in the image satisfy the preset distribution) as the key frame satisfying the preset condition.

Step 204, selecting a key frame from the key frames meeting the preset condition as the video cover.

In this embodiment, in the case that a key frame satisfying the preset condition exists in the key frame set, the execution body may select a key frame from the key frames satisfying the preset condition as the video cover. For example, one key frame may be randomly selected from the key frames satisfying the preset condition as the video cover. The method for selecting a video cover provided by this embodiment of the application first extracts a key frame set from an original video stream; then analyzes the key frame set to determine whether a key frame satisfying the preset condition exists; and finally, if such a key frame exists, selects the video cover from the key frames satisfying the preset condition. Because key frames are coded with intra-frame coding, they have little distortion and relatively high picture quality. Selecting the video cover from the key frames satisfying the preset condition therefore yields a video cover with high picture quality, which improves user experience and, in turn, the video click rate.

With further reference to FIG. 3, a flow 300 of yet another embodiment of a method for selecting video covers in accordance with the present application is shown. The method for selecting the video cover comprises the following steps:

Step 301, obtaining an original video stream.

Step 302, extracting a key frame set from the original video stream.

In the present embodiment, the specific operations of step 301-.

Step 303, calculating the variance of the pixel values of the pixel points of the key frames in the key frame set.

In this embodiment, for each frame of the key frame set, the execution subject (e.g., the network device 103 shown in fig. 1) of the method for selecting a video cover may calculate the variance of the pixel values of the pixels of the key frame.

In practice, if the variance of the pixel values of the pixels of the key frame is equal to zero, it indicates that only one color exists in the key frame, and the key frame is a full white frame or a full black frame. If the variance of the pixel values of the pixels of the key frame is small, it indicates that there are few color types in the key frame, and the picture content is usually not rich. If the variance of the pixel values of the pixels of the key frame is large, it indicates that there may be more color types in the key frame, and the picture content is usually rich.

In some optional implementations of this embodiment, the execution body may first convert the RGB values of the pixels of the key frames in the key frame set into YUV values; then take the Y value of the YUV values of each pixel as the pixel value of that pixel; and then perform step 303. RGB is an industry color standard in which colors are obtained by varying and superimposing three color channels: red (R), green (G), and blue (B). YUV is a color coding method adopted by European television systems: "Y" represents brightness (luma), i.e., the gray scale value, while "U" and "V" represent chroma, which describes the color and saturation of an image and is used to specify the color of a pixel.
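
The conversion from RGB to the Y value can be sketched as follows. The text does not state which conversion coefficients are used; this example assumes the ITU-R BT.601 luma weights, and the function name `rgb_to_luma` is illustrative.

```python
def rgb_to_luma(r, g, b):
    """Y (luma) of an RGB pixel, assuming ITU-R BT.601 weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def frame_to_luma(rgb_pixels):
    """Map a flat list of (R, G, B) tuples to a flat list of Y values."""
    return [rgb_to_luma(r, g, b) for r, g, b in rgb_pixels]

# Hypothetical 3-pixel frame: pure red, pure green, pure blue.
luma_values = frame_to_luma([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
print(luma_values)
```

The resulting list of Y values can then be used as the pixel values whose variance is calculated in step 303.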

Step 304, comparing the variance with a preset variance threshold to obtain a first comparison result.

In this embodiment, the execution body may compare the variance of the pixel values of the pixels of each key frame in the key frame set with a preset variance threshold one by one, so as to obtain a first comparison result. The first comparison result may include key frames whose variance is not less than the preset variance threshold and/or key frames whose variance is less than the preset variance threshold. The preset variance threshold can be set as required. Generally, the larger the preset variance threshold is, the fewer key frames in the first comparison result have a variance not less than the preset variance threshold; the smaller the preset variance threshold is, the more key frames in the first comparison result have a variance not less than the preset variance threshold.

Step 305, if the first comparison result indicates that at least one key frame whose variance is not less than the preset variance threshold exists in the key frame set, generating a first candidate video cover set based on the key frames whose variance is not less than the preset variance threshold.

In this embodiment, in the case that the first comparison result indicates that at least one key frame whose variance is not less than the preset variance threshold exists in the key frame set, the execution body may take those key frames as first candidate video covers and generate the first candidate video cover set. The first candidate video covers in the first candidate video cover set generally have more color types, and their picture content is generally richer.
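
Steps 303 to 305 can be sketched in Python as follows, assuming each key frame is available as a flat list of pixel values (for example, Y values) and that the population variance is used. The function names, sample frames, and the threshold of 100.0 are hypothetical.

```python
def pixel_variance(pixels):
    """Population variance of a non-empty flat list of pixel values."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def first_candidate_covers(key_frames, variance_threshold):
    """Keep the key frames whose variance is not less than the threshold."""
    return [f for f in key_frames if pixel_variance(f) >= variance_threshold]

# Hypothetical 4-pixel key frames.
all_black = [0, 0, 0, 0]
all_white = [255, 255, 255, 255]
textured = [0, 255, 0, 255]

candidates = first_candidate_covers([all_black, all_white, textured],
                                    variance_threshold=100.0)
print(candidates)
```

The all-black and all-white frames have zero variance and are dropped, so only the textured frame enters the first candidate video cover set.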

Step 306, a pixel value distribution histogram of the pixel points of the first candidate video cover in the first candidate video cover set is generated as a second pixel value distribution histogram.

In this embodiment, for each frame of the first candidate video cover in the first candidate video cover set, the execution body may generate a pixel value distribution histogram of pixels of the first candidate video cover as the second pixel value distribution histogram. The pixel value distribution histogram can represent the pixel value interval of the corresponding group through the width of a rectangle, and the height of the rectangle represents the pixel point frequency of the corresponding group. Thus, the pixel value histogram not only can clearly display the frequency distribution of each group, but also can easily display the frequency difference among the groups.

In practice, the more groups the second pixel value distribution histogram corresponding to a first candidate video cover has and the lower their frequencies are, the more color types exist in the first candidate video cover, and the richer the picture content usually is. Conversely, the fewer groups the histogram has and the higher their frequencies are, the fewer color types exist in the first candidate video cover, and the less rich the picture content usually is.

Step 307, performing statistical analysis on the second pixel value distribution histogram to obtain a second pixel value distribution condition.

In this embodiment, for each frame of the first candidate video cover in the first candidate video cover set, the executing entity may perform statistical analysis on the second pixel value distribution histogram corresponding to the first candidate video cover, so as to obtain the pixel value distribution of the first candidate video cover as the second pixel value distribution corresponding to the first candidate video cover. For example, the pixel value distribution case may include the number of groups and the frequency of the pixel value distribution histogram. In this way, the execution main body may count the number of groups and the frequency count in the second pixel value distribution histogram corresponding to the first candidate video cover, and if the number of groups in the second pixel value distribution histogram corresponding to the first candidate video cover is not less than the preset number of groups and the frequency count is not greater than the preset frequency count, it is determined that the first candidate video cover satisfies the preset distribution.

Step 308, if the second pixel value distribution indicates that at least one first candidate video cover satisfying the preset distribution exists in the first candidate video cover set, generating a second candidate video cover set based on the at least one first candidate video cover satisfying the preset distribution.

In this embodiment, in a case where the second pixel value distribution indicates that there is at least one frame of the first candidate video cover satisfying the preset distribution in the first candidate video cover set, the execution main body may regard the at least one frame of the first candidate video cover satisfying the preset distribution as the at least one frame of the second candidate video cover, and generate the second candidate video cover set. The second candidate video covers in the second candidate video cover set have more color types and rich picture content.

Step 309, transforming the second candidate video covers in the second candidate video cover set from the spatial domain to the frequency domain by using discrete cosine transform, so as to obtain a second candidate video cover set of the frequency domain.

In this embodiment, for each frame of the second candidate video cover in the second set of candidate video covers, the execution body may transform the second candidate video cover from the spatial domain to the frequency domain using discrete cosine transform.

It should be noted that, the executing entity may also transform the second candidate video cover from the spatial domain to the frequency domain by using other transform methods such as fourier transform, and the specific transform method is not limited herein.

Step 310, calculating an average value of pixel values of pixels of a second candidate video cover in a second candidate video cover set of the frequency domain as a second average value.

In this embodiment, for each frame of the second candidate video cover in the second candidate video cover set of the frequency domain, the executing entity may calculate a mean value of pixel values of pixels of the second candidate video cover of the frequency domain as a second mean value corresponding to the second candidate video cover.

In practice, the larger the second average corresponding to the second candidate video cover, the clearer the picture of the second candidate video cover is. If the second average corresponding to the second candidate video cover is smaller, the picture of the second candidate video cover is more blurred.

Step 311, comparing the second average value with a preset average value threshold to obtain a third comparison result.

In this embodiment, the execution subject may compare a second average value corresponding to each frame of the second candidate video cover in the second candidate video cover set of the frequency domain with a preset average value threshold, so as to obtain a third comparison result. The third comparison result may include a second candidate video cover whose second average value is not less than the preset average value threshold and/or a second candidate video cover whose second average value is less than the preset average value threshold. The preset average value threshold value can be set according to the requirement. Generally, the larger the preset mean threshold is, the smaller the number of second candidate video covers of which the second mean value in the third comparison result is not less than the preset mean threshold is; the smaller the preset mean threshold is, the greater the number of second candidate video covers in which the second mean value in the third comparison result is not less than the preset mean threshold is.

Step 312, determining, based on the third comparison result, whether a second candidate video cover meeting the preset condition exists in the second candidate video cover set.

In this embodiment, the execution body may determine whether there is a second candidate video cover satisfying a preset condition in the second candidate video cover set based on the third comparison result. In the case that there is a second candidate video cover satisfying the preset condition in the second candidate video cover set, the step 313 is continuously executed.

In some optional implementations of this embodiment, in a case where the third comparison result indicates that at least one second candidate video cover whose second mean value is not less than the preset mean threshold exists in the second candidate video cover set, it is determined that a second candidate video cover satisfying the preset condition exists in the second candidate video cover set, where the second candidate video covers satisfying the preset condition include the second candidate video covers whose second mean value is not less than the preset mean threshold. In a case where the third comparison result indicates that no second candidate video cover whose second mean value is not less than the preset mean threshold exists in the second candidate video cover set, it is determined that no second candidate video cover satisfying the preset condition exists in the second candidate video cover set.

Step 313, selecting a second candidate video cover from the second candidate video covers meeting the preset condition as the video cover.

In this embodiment, in a case where there is a second candidate video cover satisfying the preset condition in the second candidate video cover set, the execution subject may select the second candidate video cover from the second candidate video covers satisfying the preset condition as the video cover. For example, a frame of the second candidate video cover may be randomly selected as the video cover from among the second candidate video covers satisfying the preset condition. The second candidate video cover meeting the preset conditions usually has rich picture contents and clear pictures.
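
The cascade of flow 300 (variance test, then distribution test, then frequency-domain test, then random selection) can be sketched as a chain of filters. In the following Python sketch the three tests are passed in as predicate functions, so the stand-in lambdas and the integer "frames" are purely hypothetical placeholders for the checks described in steps 303 to 312.

```python
import random

def select_cover(frames, filters, rng=random):
    """Apply each filter in order and pick a random survivor.

    Returns None as soon as a stage leaves no candidates.
    """
    candidates = list(frames)
    for keep in filters:
        candidates = [f for f in candidates if keep(f)]
        if not candidates:
            return None
    return rng.choice(candidates)

# Hypothetical stand-ins: frames are integers, predicates are toy checks.
frames = [1, 2, 3, 4, 5, 6]
filters = [lambda f: f > 2,        # stands in for the variance test
           lambda f: f % 2 == 0]   # stands in for the distribution test

cover = select_cover(frames, filters, rng=random.Random(0))
print(cover)
```

Ordering the filters from cheapest to most expensive means that the costlier checks, such as a transform to the frequency domain, only run on frames that passed the earlier stages.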

With continued reference to fig. 4, fig. 4 is a schematic diagram of an application scenario of the method for selecting a video cover provided in fig. 3. In the application scenario of fig. 4, the network device first obtains a locally stored documentary about pandas, and decapsulates and decompresses the documentary to obtain an original video stream of the documentary. The network device then extracts the IDR frame set from the original video stream, extracts from the IDR frame set at least one IDR frame whose pixel-value variance is not less than a preset variance threshold, and generates a first candidate video cover set. Next, the first candidate video covers whose pixel value distribution satisfies the preset distribution are extracted from the first candidate video cover set to generate a second candidate video cover set. Then, the second candidate video covers whose mean value of the pixel values of the pixels in the frequency domain is not less than a preset mean threshold are selected from the second candidate video cover set. Finally, one second candidate video cover is randomly selected, as the video cover, from the second candidate video covers whose mean value is not less than the preset mean threshold, and the video cover is added to the documentary. At this point, as shown at 401, the documentary with the added video cover can be displayed on the display screen of the network device.

As can be seen from fig. 3, compared with the embodiment shown in fig. 2, the flow 300 of the method for selecting a video cover in the present embodiment highlights the step of determining the keyframes in the set of keyframes that satisfy the predetermined condition. Therefore, the scheme described in the embodiment can select the video cover with rich picture content and clear picture.

With further reference to FIG. 5, a flow 500 of another embodiment of a method for selecting a video cover in accordance with the present application is shown. The method for selecting the video cover comprises the following steps:

Step 501, obtaining an original video stream.

Step 502, extracting a key frame set from an original video stream.

Step 503, determining whether there is a key frame meeting a preset condition in the key frame set.

In the present embodiment, the specific operations of steps 501-503 are substantially the same as the operations of steps 201-203 in the embodiment shown in fig. 2, and are not repeated herein.

Step 504, determining whether a video frame meeting a preset condition exists in the original video stream.

In this embodiment, in the case that there is no key frame satisfying the preset condition in the key frame set, an executing body (for example, the network device 103 shown in fig. 1) of the method for selecting a video cover may analyze each frame of video frame in the original video stream one by one, and determine whether there is a video frame satisfying the preset condition in the original video stream. And if at least one frame of video frame in the original video stream meets the preset condition, determining that the video frame meeting the preset condition exists in the original video stream. In this case, step 505 is continued. And if all the video frames in the original video stream do not meet the preset condition, determining that no video frame meeting the preset condition exists in the original video stream. The preset condition may be various conditions set in advance. For example, the preset conditions may include, but are not limited to, at least one of: the variance of the pixel values of the pixels of the image is not less than a preset variance threshold, the pixel values of the pixels of the image satisfy a preset distribution, the mean value of the pixel values of the pixels of the image in the frequency domain is not less than a preset mean threshold, and the like.

It should be noted that the specific operation of determining whether a video frame meeting the preset condition exists in the original video stream is substantially the same as the specific operation of determining whether a key frame meeting the preset condition exists in the key frame set, and details are not repeated herein.

Step 505, selecting a video frame from the video frames meeting the preset condition as the video cover.

In this embodiment, in the case where a video frame satisfying the preset condition exists in the original video stream, the execution body may select a video frame from the video frames satisfying the preset condition as the video cover. For example, one video frame can be randomly selected from the video frames satisfying the preset condition as the video cover.

As can be seen from fig. 5, compared to the embodiment corresponding to fig. 2, the process 500 of the method for selecting a video cover in this embodiment adds a step of selecting a video cover in a case that there is no key frame satisfying a preset condition in the key frame set. Therefore, in the solution described in this embodiment, a video cover with high picture quality can be selected even if there is no key frame satisfying the preset condition in the key frame set.
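
The fallback of flow 500 can be sketched in Python as follows. The predicate `meets_condition` stands in for whichever preset condition is configured, and the frame values are hypothetical; the sketch only shows the order in which key frames and ordinary video frames are examined.

```python
import random

def choose_cover(key_frames, all_frames, meets_condition, rng=random):
    """Prefer qualifying key frames; otherwise fall back to all video frames.

    Returns None when no frame at all meets the condition.
    """
    qualified = [f for f in key_frames if meets_condition(f)]
    if not qualified:
        # No key frame qualifies, so scan every frame of the stream.
        qualified = [f for f in all_frames if meets_condition(f)]
    return rng.choice(qualified) if qualified else None

# Hypothetical frames identified by a quality score; condition is score >= 5.
key_frames = [1, 2]
all_frames = [1, 2, 3, 7]

cover = choose_cover(key_frames, all_frames, lambda score: score >= 5)
print(cover)
```

Here neither key frame qualifies, so the function scans the full stream and returns the only qualifying frame.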

Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use in implementing a network device (e.g., network device 103 shown in FIG. 1) of an embodiment of the present application is shown. The network device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed in the storage section 608 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program, when executed by the Central Processing Unit (CPU) 601, performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, an extraction unit, a first determination unit, and a first selection unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquisition unit may also be described as a "unit acquiring an original video stream".

As another aspect, the present application also provides a computer-readable medium, which may be included in the network device described in the above embodiments, or may exist separately without being assembled into the network device. The computer-readable medium carries one or more programs which, when executed by the network device, cause the network device to: acquire an original video stream; extract a key frame set from the original video stream; determine whether a key frame meeting a preset condition exists in the key frame set; and in response to determining that a key frame meeting the preset condition exists in the key frame set, select a key frame from the key frames meeting the preset condition as a video cover.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A method for selecting a video cover, comprising:

acquiring an original video stream;

extracting a key frame set from the original video stream, wherein the key frame set is an IDR frame set composed of at least one frame of IDR frame;

determining whether a key frame meeting a preset condition exists in the key frame set;

in response to determining that a key frame meeting the preset condition exists in the key frame set, selecting a key frame from the key frames meeting the preset condition as a video cover;

the preset condition comprises that the mean value of pixel values of pixel points of the image in the frequency domain is not smaller than a preset mean value threshold value; and

the determining whether a key frame meeting a preset condition exists in the key frame set includes:

transforming the key frames in the key frame set from a space domain to a frequency domain by utilizing discrete cosine transform to obtain a key frame set of the frequency domain;

calculating the mean value of pixel values of pixel points of the key frames in the key frame set of the frequency domain as a first mean value;

comparing the first mean value with the preset mean value threshold to obtain a second comparison result;

and determining whether the key frames meeting the preset condition exist in the key frame set or not based on the second comparison result.
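By way of non-limiting illustration, the frequency-domain check recited in claim 1 may be sketched as follows. The sketch assumes that each key frame is a small rectangular grid of luminance values, uses a naive unnormalized DCT-II, and takes the mean of the absolute values of the coefficients (an assumption, since signed coefficients can cancel one another). The names dct_2d, frequency_mean and frames_meeting_mean_threshold are hypothetical and do not appear in the application.

```python
import math


def dct_2d(frame):
    """Naive, unnormalized 2D DCT-II of a rectangular list of rows."""
    h, w = len(frame), len(frame[0])
    coeffs = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0.0
            for y in range(h):
                for x in range(w):
                    total += (frame[y][x]
                              * math.cos(math.pi * (2 * y + 1) * u / (2 * h))
                              * math.cos(math.pi * (2 * x + 1) * v / (2 * w)))
            coeffs[u][v] = total
    return coeffs


def frequency_mean(frame):
    """Mean absolute DCT coefficient (assumed reading of the 'first mean value')."""
    values = [abs(c) for row in dct_2d(frame) for c in row]
    return sum(values) / len(values)


def frames_meeting_mean_threshold(key_frames, mean_threshold):
    """Keep the key frames whose frequency-domain mean is not below the threshold."""
    return [f for f in key_frames if frequency_mean(f) >= mean_threshold]


flat = [[100] * 4 for _ in range(4)]                       # uniform frame
checker = [[200 * ((x + y) % 2) for x in range(4)] for y in range(4)]  # detailed frame
kept = frames_meeting_mean_threshold([flat, checker], mean_threshold=150)
```

In this sketch the uniform frame concentrates all of its energy in the single DC coefficient, whereas the checkerboard frame also has large high-frequency coefficients, so only the latter passes the (arbitrarily chosen) threshold of 150.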

2. The method of claim 1, wherein the preset condition further comprises at least one of: the variance of the pixel values of the pixels of the image is not less than a preset variance threshold, and the pixel values of the pixels of the image meet preset distribution.

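3. The method according to claim 2, wherein the preset condition includes that the variance of pixel values of pixel points of the image is not less than a preset variance threshold; and

the determining whether a key frame meeting a preset condition exists in the key frame set includes: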

calculating the variance of the pixel values of the pixel points of the key frames in the key frame set;

comparing the variance with the preset variance threshold to obtain a first comparison result;

and determining whether key frames meeting the preset condition exist in the key frame set or not based on the first comparison result.
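By way of non-limiting illustration, the variance check of claim 3 may be sketched as follows, assuming each key frame is a list of rows of scalar pixel values and that the population variance is intended (the application does not state which estimator is used). The function names are hypothetical.

```python
from statistics import pvariance


def frame_variance(frame):
    """Population variance over every pixel value of one frame."""
    return pvariance([p for row in frame for p in row])


def frames_meeting_variance_threshold(key_frames, variance_threshold):
    """Keep the key frames whose variance is not below the preset threshold."""
    return [f for f in key_frames if frame_variance(f) >= variance_threshold]


black = [[0, 0], [0, 0]]            # a solid frame has zero variance
textured = [[0, 100], [200, 100]]   # varied content has high variance
kept = frames_meeting_variance_threshold([black, textured], variance_threshold=1000)
```

A solid black or solid white frame yields a variance of zero and is therefore rejected for any positive threshold, which addresses the first-frame problem noted in the background.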

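4. The method according to claim 2, wherein the preset condition includes that pixel values of pixel points of the image satisfy a preset distribution; and

the determining whether a key frame meeting a preset condition exists in the key frame set includes: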

generating a pixel value distribution histogram of pixel points of a key frame in the key frame set as a first pixel value distribution histogram;

performing statistical analysis on the first pixel value distribution histogram to obtain a first pixel value distribution condition;

and determining whether key frames meeting the preset condition exist in the key frame set or not based on the first pixel value distribution condition.
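By way of non-limiting illustration, the histogram analysis of claim 4 may be sketched as follows. The claims do not define the preset distribution; the sketch assumes, purely for illustration, that a frame satisfies it when no single intensity bin holds more than a given fraction of the pixels. The function names, the bin count and the max_bin_fraction parameter are hypothetical.

```python
def pixel_histogram(frame, bins=16, max_value=255):
    """Count pixel values of one frame into equal-width intensity bins."""
    hist = [0] * bins
    for row in frame:
        for p in row:
            index = min(p * bins // (max_value + 1), bins - 1)
            hist[index] += 1
    return hist


def satisfies_preset_distribution(frame, max_bin_fraction=0.9, bins=16):
    """Assumed rule: reject frames dominated by a single intensity bin."""
    hist = pixel_histogram(frame, bins)
    total = sum(hist)
    return total > 0 and max(hist) / total <= max_bin_fraction


white = [[255] * 4 for _ in range(4)]             # all pixels in the top bin
spread = [[0, 64, 128, 255] for _ in range(4)]    # pixels spread over four bins
```

Other statistical tests on the histogram (for example, limits on the share of very dark or very bright pixels) could be substituted without changing the overall flow.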

5. The method of claim 3, wherein the preset condition further comprises that pixel values of pixel points of the image satisfy a preset distribution; and

the determining whether a key frame meeting the preset condition exists in the key frame set based on the first comparison result includes:

if the first comparison result indicates that at least one key frame with the variance not smaller than the preset variance threshold exists in the key frame set, generating a first candidate video cover set based on the key frames with the variance not smaller than the preset variance threshold;

generating a pixel value distribution histogram of pixel points of a first candidate video cover in the first candidate video cover set as a second pixel value distribution histogram;

performing statistical analysis on the second pixel value distribution histogram to obtain a second pixel value distribution condition;

and determining whether a first candidate video cover meeting the preset condition exists in the first candidate video cover set or not based on the second pixel value distribution condition.

6. The method according to claim 5, wherein the preset condition further comprises that a mean value of pixel values of pixel points of the image in the frequency domain is not less than a preset mean threshold value; and

the determining whether there is a first candidate video cover meeting the preset condition in the first candidate video cover set based on the second pixel value distribution condition includes:

if the second pixel value distribution condition indicates that at least one frame of first candidate video cover meeting the preset distribution exists in the first candidate video cover set, generating a second candidate video cover set based on the at least one frame of first candidate video cover meeting the preset distribution;

transforming a second candidate video cover in the second candidate video cover set from a space domain to a frequency domain by using discrete cosine transform to obtain a second candidate video cover set of the frequency domain;

calculating the mean value of pixel values of pixels of a second candidate video cover in the second candidate video cover set of the frequency domain to serve as a second mean value;

comparing the second mean value with the preset mean value threshold to obtain a third comparison result;

determining whether a second candidate video cover meeting the preset condition exists in the second candidate video cover set based on the third comparison result.

7. The method of claim 6, wherein the determining whether a second candidate video cover meeting the preset condition exists in the second candidate video cover set based on the third comparison result comprises:

if the third comparison result indicates that at least one frame of second candidate video cover with a second mean value not smaller than the preset mean value threshold exists in the second candidate video cover set, determining that a second candidate video cover meeting the preset condition exists in the second candidate video cover set, wherein the second candidate video cover meeting the preset condition comprises a second candidate video cover with a second mean value not smaller than the preset mean value threshold;

and if the third comparison result indicates that no second candidate video cover with the second mean value not smaller than the preset mean value threshold exists in the second candidate video cover set, determining that no second candidate video cover meeting the preset condition exists in the second candidate video cover set.
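By way of non-limiting illustration, the staged narrowing of claims 5 to 7 (variance, then distribution, then frequency-domain mean) may be sketched as follows. The three measurements are passed in as caller-supplied functions so that the sketch shows only the order of the stages; cascade_filter and all parameter names are hypothetical, and the sample frames are stand-in records with precomputed statistics rather than real images.

```python
def cascade_filter(key_frames, variance_of, distribution_ok, frequency_mean_of,
                   variance_threshold, mean_threshold):
    """Return the second candidate video covers that pass all three stages."""
    # Stage 1: variance check produces the first candidate video cover set.
    first_candidates = [f for f in key_frames
                        if variance_of(f) >= variance_threshold]
    if not first_candidates:
        return []
    # Stage 2: distribution check produces the second candidate video cover set.
    second_candidates = [f for f in first_candidates if distribution_ok(f)]
    if not second_candidates:
        return []
    # Stage 3: frequency-domain mean check on the remaining candidates only.
    return [f for f in second_candidates
            if frequency_mean_of(f) >= mean_threshold]


frames = [
    {"name": "a", "var": 5, "dist": True, "fmean": 90},     # fails stage 1
    {"name": "b", "var": 500, "dist": False, "fmean": 90},  # fails stage 2
    {"name": "c", "var": 500, "dist": True, "fmean": 10},   # fails stage 3
    {"name": "d", "var": 500, "dist": True, "fmean": 90},   # passes all stages
]
accepted = cascade_filter(
    frames,
    variance_of=lambda f: f["var"],
    distribution_ok=lambda f: f["dist"],
    frequency_mean_of=lambda f: f["fmean"],
    variance_threshold=100,
    mean_threshold=50,
)
```

Ordering the stages this way means that the comparatively expensive discrete cosine transform is applied only to frames that have already passed the cheaper variance and histogram checks.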

8. The method of claim 3, wherein, before the calculating the variance of the pixel values of the pixel points of the key frames in the key frame set, the method further comprises:

converting the RGB value of the pixel point of the key frame in the key frame set into YUV value;

and taking the Y value in the YUV values of the pixel points of the key frames in the key frame set as the pixel value of the pixel point of the key frame in the key frame set.
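By way of non-limiting illustration, the conversion of claim 8 may be sketched as follows. The claims do not state which RGB-to-YUV matrix is used; the sketch assumes the ITU-R BT.601 luma weights and 8-bit RGB input, and the function names are hypothetical.

```python
def rgb_to_luma(r, g, b):
    """Y component from 8-bit RGB using BT.601 weights (an assumed choice)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def frame_to_luma(rgb_frame):
    """Replace each (R, G, B) pixel with its rounded Y value."""
    return [[round(rgb_to_luma(r, g, b)) for (r, g, b) in row]
            for row in rgb_frame]


rgb_frame = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 255, 0)],
]
luma_frame = frame_to_luma(rgb_frame)
```

The resulting single-channel frame can then be used as the pixel values for the variance calculation, so that the check reflects brightness structure rather than any one colour channel.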

9. The method of claim 1, wherein the acquiring an original video stream comprises:

acquiring a video file;

decapsulating the video file to obtain a video compressed stream;

and decompressing the video compressed stream to obtain the original video stream.

10. The method according to one of claims 1-9, wherein the method further comprises:

in response to determining that no key frame meeting the preset condition exists in the key frame set, determining whether a video frame meeting the preset condition exists in the original video stream;

and in response to determining that the video frames meeting the preset condition exist in the original video stream, selecting the video frames from the video frames meeting the preset condition as video covers.
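By way of non-limiting illustration, the fallback of claim 10 may be sketched as follows. The preset condition is supplied as a function, and the first qualifying frame is chosen as the cover; the claims do not specify how one frame is chosen among several qualifying frames, so that rule is an assumption. The name select_cover is hypothetical, and integers stand in for frames.

```python
def select_cover(key_frames, all_frames, meets_condition):
    """Prefer a qualifying key frame; otherwise search the whole stream."""
    candidates = [f for f in key_frames if meets_condition(f)]
    if not candidates:
        # No key frame qualifies: fall back to every frame of the original stream.
        candidates = [f for f in all_frames if meets_condition(f)]
    # Assumed selection rule: take the earliest qualifying frame, if any.
    return candidates[0] if candidates else None


def is_even(frame):
    """Stand-in for the preset condition, used only for this example."""
    return frame % 2 == 0


cover_from_key_frames = select_cover([6, 8], [1, 2, 3, 4, 6, 8], is_even)
cover_from_fallback = select_cover([1, 3], [1, 2, 3, 4], is_even)
no_cover = select_cover([1, 3], [1, 3, 5], is_even)
```

Searching the key frame set first keeps the common case cheap, since the full stream is examined only when no key frame qualifies.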

11. A network device, comprising:

one or more processors;

a storage device on which one or more programs are stored;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.

12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-10.