WO2020052084A1

WO2020052084A1 - Video cover selection method, device and computer-readable storage medium

Info

Publication number: WO2020052084A1
Application number: PCT/CN2018/117713
Authority: WO
Inventors: 黄凯; 王长虎
Original assignee: 北京字节跳动网络技术有限公司
Priority date: 2018-09-13
Filing date: 2018-11-27
Publication date: 2020-03-19
Also published as: CN109165301B; CN109165301A

Abstract

Disclosed are a video cover selection method, a video cover selection device, a video cover selection hardware device, and a computer-readable storage medium. The video cover selection method comprises: evaluating at least one image frame to be evaluated that is extracted from a video to be processed; and, according to the evaluation result, selecting an image frame that meets a preset cover condition from the at least one image frame to be evaluated as the cover of the video to be processed. In the embodiments of the present disclosure, at least one image frame to be evaluated extracted from the video to be processed is first evaluated, and an image frame that meets a preset cover condition is then selected, according to the evaluation result, from the at least one image frame to be evaluated as the cover of the video to be processed. A cover selected in this way is targeted, which is conducive to the promotion of the video.

Description

Video cover selection method, device and computer-readable storage medium

Cross reference

The present disclosure refers to a Chinese patent application filed on September 13, 2018, entitled "Video Cover Selection Method, Apparatus, and Computer-readable Storage Medium" with application number 201811069772.5, which is incorporated by reference in its entirety.

Technical field

The present disclosure relates to the technical field of information processing, and in particular, to a video cover selection method, device, and computer-readable storage medium.

Background art

In recent years, with the rapid development of multimedia technology and computer networks, the volume of digital video has been growing at an alarming rate. How to make a video stand out among many others and arouse people's interest has become a focus of attention for video makers.

In the prior art, the cover of a video is generally either generated automatically at random or specified by the author. A cover selected in this way is not targeted and is not conducive to the promotion of the video.

Summary of the Invention

The technical problem solved by the present disclosure is to provide a video cover selection method to at least partially solve the technical problem that the selected video cover is not targeted and is not conducive to the promotion of the video. In addition, a video cover selection device, a video cover selection hardware device, a computer-readable storage medium, and a video cover selection terminal are also provided.

To achieve the above objective, according to one aspect of the present disclosure, the following technical solutions are provided:

A video cover selection method includes:

Evaluating at least one image frame to be evaluated extracted from the video to be processed;

Selecting, according to the evaluation result, an image frame that satisfies a preset cover condition from the at least one image frame to be evaluated as the cover of the video to be processed.

Further, the step of evaluating at least one image frame to be evaluated extracted from the video to be processed includes:

Performing cluster analysis on the at least one image frame to be evaluated to obtain at least one cluster center;

For each cluster center, evaluate the image frame closest to the cluster center;

The step of selecting an image frame satisfying a preset cover condition from the at least one image frame to be evaluated as a cover of the video to be processed according to the evaluation result includes:

According to the evaluation result, an image frame that satisfies a preset cover condition is selected from the image frames closest to the cluster center as the cover of the video to be processed.

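Further, the step of evaluating at least one image frame to be evaluated extracted from the video to be processed includes:

Performing picture quality evaluation on each of the at least one image frame to be evaluated, the evaluation result being a picture quality level; and / or,

Performing click rate prediction on each of the at least one image frame to be evaluated, the evaluation result being a click rate.

Further, the step of evaluating at least one image frame to be evaluated extracted from the video to be processed further includes:

If the evaluation result includes both a picture quality level and a click rate, combining the picture quality level and the click rate and recalculating the evaluation result.

Further, the step of evaluating at least one image frame to be evaluated extracted from the video to be processed includes:

Inputting the at least one image frame to be evaluated into an image evaluation model trained in advance for evaluation, and using the output result of the image evaluation model as the evaluation result.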

Further, the image evaluation model includes: an image quality evaluation model and / or a click rate prediction model, the image quality evaluation model is used to output a picture quality level, and the click rate prediction model is used to output a click rate.

Further, the method further includes:

Pre-processing at least one image frame extracted from the video to be processed, and selecting an image frame that meets a preset standard as the image frame to be evaluated.

To achieve the above object, according to another aspect of the present disclosure, the following technical solutions are also provided:

A video cover selection device includes:

An image evaluation module, configured to evaluate at least one image frame to be evaluated extracted from the video to be processed;

A cover selection module is configured to select an image frame that satisfies a preset cover condition from the at least one image frame to be evaluated as a cover of the video to be processed according to the evaluation result.

Further, the image evaluation module is specifically configured to: perform cluster analysis on the at least one image frame to be evaluated to obtain at least one cluster center; and, for each cluster center, evaluate the image frame closest to the cluster center;

The cover selection module is specifically configured to: according to the evaluation result, select an image frame that meets a preset cover condition from the image frames closest to the cluster center as the cover of the video to be processed.

Further, the image evaluation module is specifically configured to: perform picture quality evaluation on each of the at least one image frame to be evaluated, the evaluation result being a picture quality level; and / or, perform click rate prediction on each of the at least one image frame to be evaluated, the evaluation result being a click rate.

Further, the image evaluation module is specifically configured to: if the evaluation result is a picture quality level and a click rate, merge the picture quality level and the click rate, and recalculate the evaluation result.

Further, the image evaluation module is specifically configured to: input the at least one image frame to be evaluated into an image evaluation model trained in advance for evaluation, and use an output result of the image evaluation model as the evaluation result.

Further, the device further includes:

The pre-processing module is configured to pre-process at least one image frame extracted from the video to be processed, and select an image frame that meets a preset standard as the image frame to be evaluated.

A video cover selection hardware device includes:

Memory for storing non-transitory computer-readable instructions; and

A processor, configured to run the computer-readable instructions, so that the processor, when executed, implements the steps described in any one of the foregoing technical solutions of the video cover selection method.

A computer-readable storage medium is configured to store non-transitory computer-readable instructions which, when executed by a computer, cause the computer to execute the steps described in any of the technical solutions of the video cover selection method described above.

A video cover selection terminal includes any of the above video cover selection devices.

Embodiments of the present disclosure provide a video cover selection method, a video cover selection device, a video cover selection hardware device, a computer-readable storage medium, and a video cover selection terminal. The video cover selection method includes evaluating at least one image frame to be evaluated extracted from the video to be processed, and selecting, according to the evaluation result, an image frame that satisfies a preset cover condition from the at least one image frame to be evaluated as the cover of the video to be processed. Because the image frames are evaluated first and the cover is then chosen according to the evaluation result, the cover selected in this way is targeted, which is conducive to the promotion of the video.

The above description is only an overview of the technical solutions of the present disclosure. In order that the technical means of the present disclosure can be understood more clearly and implemented in accordance with the contents of the description, and in order to make the above and other objects, features, and advantages of the present disclosure more obvious and understandable, preferred embodiments are described in detail below with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1a is a schematic flowchart of a video cover selection method according to an embodiment of the present disclosure;

FIG. 1b is a schematic flowchart of a video cover selection method according to another embodiment of the present disclosure;

FIG. 1c is a schematic flowchart of a video cover selection method according to another embodiment of the present disclosure;

FIG. 1d is a schematic flowchart of a video cover selection method according to another embodiment of the present disclosure;

FIG. 2a is a schematic structural diagram of a video cover selection device according to an embodiment of the present disclosure;

FIG. 3 is a schematic structural diagram of a video cover selection hardware device according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a video cover selection terminal according to an embodiment of the present disclosure.

Detailed description

The embodiments of the present disclosure are described below through specific examples. Those skilled in the art can easily understand other advantages and effects of the present disclosure from the content disclosed in this specification. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, not all of them. The present disclosure can also be implemented or applied through other specific implementations, and various details in this specification can also be modified or changed based on different viewpoints and applications without departing from the spirit of the present disclosure. It should be noted that, in the case of no conflict, the following embodiments and the features in the embodiments can be combined with each other. Based on the embodiments in the present disclosure, all other embodiments obtained by a person having ordinary skill in the art without making creative efforts fall within the protection scope of the present disclosure.

It should be noted that various aspects of the embodiments within the scope of the appended claims are described below. It should be apparent that aspects described herein may be embodied in a wide variety of forms and that any specific structure and / or function described herein is merely illustrative. Based on the present disclosure, those skilled in the art should understand that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, any number of the aspects set forth herein may be used to implement a device and / or a practice method. In addition, the apparatus and / or the method may be implemented using other structures and / or functionality than one or more of the aspects set forth herein.

It should also be noted that the illustrations provided in the following embodiments only illustrate the basic idea of the present disclosure in a schematic manner. The drawings show only the components related to the present disclosure and are not drawn according to the number, shape, and size of the components in actual implementation. In actual implementation, the type, quantity, and proportion of each component can be changed at will, and the component layout may be more complicated.

In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the described aspects may be practiced without these specific details.

In order to solve the technical problem that the selected video cover is not targeted and is not conducive to the promotion of the video, the embodiment of the present disclosure provides a video cover selection method. As shown in FIG. 1a, the video cover selection method mainly includes the following steps S1 to S2, wherein:

Step S1: Evaluate at least one image frame to be evaluated extracted from the video to be processed.

Specifically, several frames of images from the video to be processed may be extracted as the image frames to be evaluated, or all image frames in the video to be processed may be taken as the image frames to be evaluated.

Step S2: According to the evaluation result, an image frame satisfying a preset cover condition is selected from at least one image frame to be evaluated as a cover of the video to be processed.

The evaluation results include, but are not limited to, image quality levels and / or click-through rates.

The preset cover condition may be, but is not limited to, an image quality level meeting a preset requirement and / or an image click rate exceeding a preset click rate.

Specifically, when the evaluation result is a picture quality level, this step selects an image frame whose picture quality level meets the preset requirement, that is, a frame of relatively high quality, as the cover of the video to be processed, which helps attract users to watch. When the evaluation result is a click rate, this step selects an image frame whose click rate exceeds the preset click rate, that is, a frame with a relatively high click rate, as the cover of the video to be processed. A high click rate indicates that users are interested in the image frame, so such a cover also helps attract users to watch and facilitates the promotion of the video.

In this embodiment, at least one image frame to be evaluated extracted from the video to be processed is evaluated, and an image frame that satisfies a preset cover condition is selected, according to the evaluation result, from the at least one image frame to be evaluated as the cover of the video to be processed. The cover selected in this way is targeted, which is conducive to the promotion of the video.
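Steps S1 and S2 can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the function name `select_cover`, the `evaluate` callback, and the choice to express the preset cover condition as a minimum score `min_score` are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_cover(
    frames: Sequence[Frame],
    evaluate: Callable[[Frame], float],
    min_score: float,
) -> Optional[Frame]:
    """Return the best-scoring frame that meets the (assumed) cover condition.

    Step S1: every candidate frame is scored with `evaluate`.
    Step S2: frames whose score reaches `min_score` are kept and the
    highest-scoring one is returned; None is returned if no frame qualifies.
    """
    scored = [(evaluate(frame), index) for index, frame in enumerate(frames)]  # S1
    qualifying = [(score, index) for score, index in scored if score >= min_score]  # S2
    if not qualifying:
        return None
    # Highest score wins; on a tie the earlier frame is preferred.
    _, best_index = max(qualifying, key=lambda item: (item[0], -item[1]))
    return frames[best_index]


# Usage with hypothetical, hand-written scores.
scores = {"frame_0": 0.2, "frame_1": 0.9, "frame_2": 0.7}
print(select_cover(list(scores), lambda f: scores[f], min_score=0.5))  # frame_1
```

Returning `None` when no frame qualifies leaves the fallback policy (for example, an author-specified cover) to the caller, since the disclosure does not state one.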

In an optional embodiment, as shown in FIG. 1b, step S1 includes:

S11: Perform cluster analysis on at least one image frame to be evaluated extracted from the video to be processed to obtain at least one cluster center.

The applicable clustering algorithms include, but are not limited to, any of the following: K-Means clustering, mean-shift clustering, density-based clustering methods, agglomerative hierarchical clustering, graph community detection clustering, and the like.

Specifically, a plurality of image frames are first selected from the at least one image frame to be evaluated as initial cluster centers. The remaining image frames are then used as samples: the distance between each sample and each initial cluster center is calculated, and the samples nearest to a given initial cluster center are grouped into one class, yielding at least one class of image set. Then, for each class of image set, the cluster center is recalculated, and the process is iterated until a preset condition is met, so that at least one cluster center is obtained.

S12: For each cluster center, evaluate the image frame closest to the cluster center.

The distance between an image frame and a cluster center may be represented by, for example, the Euclidean distance or the cosine distance.

Accordingly, step S2 specifically includes:

According to the evaluation result, an image frame that satisfies a preset cover condition is selected from the image frames closest to the cluster centers as the cover of the video to be processed.

In this embodiment, at least one image frame to be evaluated is subjected to cluster analysis to obtain at least one cluster center. For each cluster center, the image frame closest to the cluster center is evaluated, and, according to the evaluation result, an image frame that satisfies the preset cover condition is selected from the image frames closest to the cluster centers as the cover of the video to be processed, so that the selected cover is targeted, which is conducive to the promotion of the video.
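The clustering variant (S11 and S12) can be sketched as follows, using K-Means as one of the listed algorithms. The sketch assumes that each image frame has already been reduced to a numeric feature vector; the disclosure does not say how frames are represented, so that representation, the function names, and the choice of the first `k` frames as initial centers are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; taken as 1.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def kmeans_representatives(features: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Run K-Means (S11) and return, for each cluster center, the index of
    the frame whose feature vector is closest to that center (S12)."""
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of frames")
    # Assumption: the first k frames serve as the initial cluster centers.
    centers = [list(features[i]) for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: attach every frame to its nearest center.
        clusters: List[List[int]] = [[] for _ in range(k)]
        for index, vec in enumerate(features):
            nearest = min(range(k), key=lambda c: euclidean(vec, centers[c]))
            clusters[nearest].append(index)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c, members in enumerate(clusters):
            if not members:
                new_centers.append(centers[c])  # empty cluster keeps its center
                continue
            dim = len(features[members[0]])
            new_centers.append(
                [sum(features[i][d] for i in members) / len(members) for d in range(dim)]
            )
        if new_centers == centers:  # stop condition: centers no longer move
            break
        centers = new_centers
    return [
        min(range(len(features)), key=lambda i: euclidean(features[i], center))
        for center in centers
    ]


# Six frames described by a single hypothetical feature value each.
print(kmeans_representatives([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]], k=2))  # [1, 4]
```

Only the returned representative frames then need to be evaluated in step S2, which reduces the number of model evaluations. `cosine_distance` is included as the alternative metric mentioned in the text; passing it instead of `euclidean` would require a corresponding change to the center update, which this sketch does not attempt.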

In an optional embodiment, as shown in FIG. 1c, step S1 specifically includes:

Separately perform picture quality assessment on at least one image frame to be evaluated extracted from the video to be processed, and the evaluation result is a picture quality level; and / or,

The click rate prediction is performed on at least one image frame to be evaluated extracted from the video to be processed, and the evaluation result is the click rate.

The picture quality level may be divided into four levels, for example A, B, C, and D, or may be determined based on image scoring results, for example with scores of 0-30 as one level, 30-60 as one level, 60-80 as one level, and 80-100 as one level. Different level division standards correspond to different cover conditions. For example, when the levels are A, B, C, and D, the preset cover condition may be that the image frame meets level A; when the levels are determined based on the scoring results, the preset cover condition may be that the image frame score is within 80-100.
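The level division can be illustrated with a small mapping function. The text presents the letter levels and the score bands as two separate examples; tying A to 80-100 and D to 0-30, and assigning each shared boundary value (30, 60, 80) to the higher band, are assumptions made here so that the sketch is well defined.

```python
def quality_level(score: float) -> str:
    """Map a 0-100 image score to one of four levels.

    Assumed mapping: 80-100 -> A, 60-80 -> B, 30-60 -> C, 0-30 -> D,
    with each boundary value belonging to the higher level.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be within 0-100")
    if score >= 80:
        return "A"
    if score >= 60:
        return "B"
    if score >= 30:
        return "C"
    return "D"


def meets_cover_condition(score: float) -> bool:
    """Example preset cover condition: the frame must reach level A."""
    return quality_level(score) == "A"


print(quality_level(85), quality_level(45), meets_cover_condition(79.9))  # A C False
```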

The click rate of a frame image is the percentage of clicks on the frame image to the total clicks on all frames in the video.
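As a worked illustration of this definition, the following sketch computes per-frame click rates from raw click counts. The function name and the dictionary-based input are hypothetical, and returning 0.0 for every frame when the video has no clicks at all is an assumption, since the text does not cover that case.

```python
from typing import Dict, Hashable


def click_rates(clicks_per_frame: Dict[Hashable, int]) -> Dict[Hashable, float]:
    """Return each frame's clicks as a percentage of all clicks in the video."""
    total = sum(clicks_per_frame.values())
    if total == 0:
        # Assumption: with no clicks recorded, every frame gets a rate of 0.
        return {frame: 0.0 for frame in clicks_per_frame}
    return {frame: 100.0 * clicks / total for frame, clicks in clicks_per_frame.items()}


print(click_rates({"frame_0": 30, "frame_1": 50, "frame_2": 20}))
# {'frame_0': 30.0, 'frame_1': 50.0, 'frame_2': 20.0}
```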

Further, step S1 specifically includes:

If the evaluation result is an image quality level and a click rate, the image quality level and the click rate are combined, and the evaluation result is recalculated.

Specifically, the weighted sum of the picture quality level and the click rate can be calculated according to the weight calculation method, and the weighted sum can be used as the evaluation result.
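A weighted sum of this kind might look as follows. The disclosure gives neither the weights nor a numeric encoding for the quality levels, so the values in `LEVEL_VALUES`, the default weights of 0.5, and the scaling of the click rate from a percentage to the range 0-1 are all illustrative assumptions.

```python
# Assumed numeric encoding of the four quality levels (not from the source).
LEVEL_VALUES = {"A": 1.0, "B": 0.75, "C": 0.5, "D": 0.25}


def combined_score(
    level: str,
    click_rate_percent: float,
    w_quality: float = 0.5,
    w_click: float = 0.5,
) -> float:
    """Weighted sum of a picture quality level and a click rate.

    The click rate is given as a percentage (0-100) and scaled to 0-1 so
    that both terms are on a comparable range before weighting.
    """
    if not 0.0 <= click_rate_percent <= 100.0:
        raise ValueError("click_rate_percent must be within 0-100")
    return w_quality * LEVEL_VALUES[level] + w_click * (click_rate_percent / 100.0)


print(combined_score("A", 50.0))  # 0.75
```

Putting both quantities on the same scale before weighting keeps one term from dominating purely because of its units; other normalizations would serve equally well.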

In this embodiment, at least one image frame to be evaluated is subjected to picture quality evaluation and / or click rate prediction, and, according to the picture quality level and / or click rate, an image frame that meets a preset cover condition is selected from the at least one image frame to be evaluated as the cover of the video to be processed. The cover selected in this way is targeted, which is conducive to the promotion of the video.

In an optional embodiment, as shown in FIG. 1d, step S1 specifically includes:

At least one image frame to be evaluated extracted from the video to be processed is input into an image evaluation model trained in advance for evaluation, and the output result of the image evaluation model is used as the evaluation result.

Specifically, the training of the image quality evaluation model includes the following steps: the quality levels of the image frames in videos are collected in advance; images of known picture quality levels are used as training samples and labeled according to their levels; and a deep learning classification algorithm is then used to train on the labeled training samples to obtain the image quality evaluation model. For the click rate prediction model, the click rates of the image frames in videos are collected in advance; images with known click rates are used as training samples and labeled according to their click rates; and a deep learning classification algorithm is then used to train on the labeled training samples to obtain the click rate prediction model.

The deep learning classification algorithms that can be used include, but are not limited to, any of the following: Naive Bayes algorithm, artificial neural network algorithm, genetic algorithm, K-Nearest Neighbor (KNN) classification algorithm, clustering algorithm, and the like.
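The train-then-predict workflow can be sketched with K-Nearest Neighbor classification, one of the algorithms listed above. This is a toy illustration under stated assumptions: each training image is assumed to be summarized by a short feature vector (the two features used below are hypothetical), and the class name `KNNQualityModel` and its `fit` and `predict` methods are invented for the example rather than taken from the disclosure.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, quality level label)


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KNNQualityModel:
    """Toy image quality evaluation model based on K-Nearest Neighbor voting."""

    def __init__(self, k: int = 3) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.samples: List[Sample] = []

    def fit(self, samples: List[Sample]) -> "KNNQualityModel":
        """'Training' for KNN simply stores the labeled samples."""
        if not samples:
            raise ValueError("at least one labeled sample is required")
        self.samples = list(samples)
        return self

    def predict(self, features: Sequence[float]) -> str:
        """Return the majority label among the k nearest training samples."""
        if not self.samples:
            raise RuntimeError("fit() must be called before predict()")
        nearest = sorted(self.samples, key=lambda s: _distance(s[0], features))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Hypothetical features per image: (sharpness, brightness), both scaled to 0-1.
training = [
    ([0.90, 0.90], "A"), ([0.80, 0.95], "A"), ([0.85, 0.80], "A"),
    ([0.10, 0.20], "D"), ([0.20, 0.10], "D"), ([0.15, 0.15], "D"),
]
model = KNNQualityModel(k=3).fit(training)
print(model.predict([0.88, 0.90]))  # A
```

A click rate prediction model could follow the same pattern with click rate buckets as labels; a production system would more likely use a neural network trained on the image pixels themselves.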

In this embodiment, at least one image frame to be evaluated is input into an image evaluation model trained in advance for evaluation, and the output result of the image evaluation model is used as the evaluation result. Based on the output result, an image frame that meets the preset cover condition is then selected from the at least one image frame to be evaluated as the cover of the video to be processed. The cover selected in this way is targeted, which is conducive to the promotion of the video.

In an optional embodiment, the method in this embodiment further includes:

Pre-processing at least one image frame extracted from the video to be processed, and selecting the image frames that meet a preset standard as the image frames to be evaluated.

The preset standard used for this screening may be, for example, image sharpness.

Specifically, by pre-processing the image frames, low-quality images, such as images with black edges or blurring, can be further filtered out.
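The pre-processing step can be sketched as a filter over grayscale frames. The disclosure names sharpness, black edges, and blurring but no specific measure; the variance of a 4-neighbour Laplacian response used below is a common sharpness heuristic swapped in for illustration, and the thresholds, function names, and the representation of a frame as a 2D list of 0-255 intensities are all assumptions.

```python
from typing import List

Image = List[List[float]]  # grayscale frame: rows of pixel intensities (0-255)


def laplacian_variance(image: Image) -> float:
    """Variance of the 4-neighbour Laplacian response over interior pixels.
    Low values suggest a flat or blurred image."""
    height, width = len(image), len(image[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3")
    responses = [
        image[y - 1][x] + image[y + 1][x] + image[y][x - 1] + image[y][x + 1]
        - 4 * image[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def has_black_edge(image: Image, threshold: float = 10.0) -> bool:
    """True if the first/last row or first/last column is entirely darker
    than `threshold` (an assumed cut-off)."""
    rows = [image[0], image[-1]]
    cols = [[row[0] for row in image], [row[-1] for row in image]]
    return any(all(pixel < threshold for pixel in line) for line in rows + cols)


def preprocess(frames: List[Image], min_sharpness: float) -> List[Image]:
    """Keep frames that have no black edge and are sharp enough."""
    return [
        frame for frame in frames
        if not has_black_edge(frame) and laplacian_variance(frame) >= min_sharpness
    ]


flat = [[128] * 4 for _ in range(4)]                                           # no detail
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)] for y in range(4)]  # sharp
letterboxed = [[0] * 4] + [[128] * 4 for _ in range(3)]                        # black top row
print(len(preprocess([flat, checker, letterboxed], min_sharpness=100.0)))  # 1
```

Only the checkerboard frame survives: the flat frame has zero Laplacian variance, and the letterboxed frame is rejected for its black top row.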

Those skilled in the art should understand that, on the basis of the foregoing embodiments, obvious modifications (for example, combining the listed modes) or equivalent replacements can also be performed.

Although the steps in the embodiments of the video cover selection method are described above in the order given, those skilled in the art should understand that the steps in the embodiments of the present disclosure are not necessarily performed in that order; they may also be performed in reverse order, in parallel, interleaved, or in other orders. Moreover, on the basis of the above steps, those skilled in the art may add other steps. These obvious variations or equivalent replacements should also be included in the protection scope of the present disclosure and are not described in detail here.

The following is a device embodiment of the present disclosure. The device embodiment of the present disclosure can be used to perform the steps implemented by the method embodiments of the present disclosure. For convenience of explanation, only the parts related to the embodiments of the present disclosure are shown. For specific technical details that are not disclosed here, refer to the method embodiments of the present disclosure.

In order to solve the technical problem that the selected video cover is not targeted and is not conducive to the promotion of the video, an embodiment of the present disclosure provides a video cover selection device. The device may perform the steps in the foregoing embodiment of the method for selecting a video cover. As shown in FIG. 2a, the device mainly includes: an image evaluation module 21 and a cover selection module 22; wherein, the image evaluation module 21 is configured to evaluate at least one image frame to be evaluated extracted from a video to be processed; a cover selection module 22 is used to select an image frame that satisfies a preset cover condition from at least one image frame to be evaluated as a cover of the video to be processed according to the evaluation result.

Specifically, the image evaluation module 21 may extract several frames from the video to be processed as the image frames to be evaluated, or may use all the image frames in the video to be processed as the image frames to be evaluated.

In this embodiment, the image evaluation module 21 evaluates at least one image frame to be evaluated extracted from the video to be processed, and the cover selection module 22 selects, according to the evaluation result, an image frame that satisfies a preset cover condition from the at least one image frame to be evaluated as the cover of the video to be processed, so the selected cover is targeted, which is conducive to the promotion of the video.

In an optional embodiment, based on FIG. 2a, the image evaluation module 21 is specifically configured to: perform cluster analysis on at least one image frame to be evaluated to obtain at least one cluster center; and, for each cluster center, evaluate the image frame closest to the cluster center;

The cover selection module 22 is specifically configured to: according to the evaluation result, select an image frame that satisfies a preset cover condition from the image frames closest to the cluster center as the cover of the video to be processed.

The clustering algorithms that can be used by the image evaluation module 21 include, but are not limited to, any of the following: K-Means clustering, mean-shift clustering, density-based clustering methods, agglomerative hierarchical clustering, graph community detection clustering, and the like.

In this embodiment, the image evaluation module 21 performs cluster analysis on at least one image frame to be evaluated to obtain at least one cluster center and, for each cluster center, evaluates the image frame closest to the cluster center. According to the evaluation result, the cover selection module 22 selects an image frame that satisfies the preset cover condition from the image frames closest to the cluster centers as the cover of the video to be processed. The selected cover is targeted and thus facilitates the promotion of the video.

In an optional embodiment, based on FIG. 2a, the image evaluation module 21 is specifically configured to: perform picture quality evaluation on each of the at least one image frame to be evaluated, the evaluation result being a picture quality level; and / or, perform click rate prediction on each of the at least one image frame to be evaluated, the evaluation result being a click rate.

Further, the image evaluation module 21 is specifically configured to: if the evaluation result is an image quality level and a click rate, merge the image quality level and the click rate, and recalculate the evaluation result.

Specifically, the image evaluation module 21 may calculate a weighted sum of a picture quality level and a click rate according to a weight calculation method, and use the weighted sum as an evaluation result.

In this embodiment, the image evaluation module 21 performs picture quality evaluation and / or click rate prediction on at least one image frame to be evaluated, and the cover selection module 22 selects, according to the picture quality level and / or click rate, an image frame that satisfies a preset cover condition from the at least one image frame to be evaluated as the cover of the video to be processed. The selected cover is targeted, which is beneficial to the promotion of the video.

In an optional embodiment, based on FIG. 2a, the image evaluation module 21 is specifically configured to: input at least one image frame to be evaluated into an image evaluation model trained in advance for evaluation, and use the output result of the image evaluation model as the evaluation result.

In this embodiment, the image evaluation module 21 inputs at least one image frame to be evaluated into a pre-trained image evaluation model for evaluation, and uses the output result of the image evaluation model as the evaluation result. Based on the output result, the cover selection module 22 selects an image frame that satisfies a preset cover condition from the at least one image frame to be evaluated as the cover of the video to be processed, so that the selected cover is targeted, which is beneficial to the promotion of the video.

In an optional embodiment, the device further includes a pre-processing module 23, wherein the pre-processing module 23 is configured to pre-process at least one image frame extracted from the video to be processed and select the image frames that meet a preset standard as the image frames to be evaluated.

Among them, the screening criterion may be image sharpness.

For detailed descriptions about the working principle and technical effects of the embodiment of the video cover selection device, reference may be made to the relevant description in the foregoing video cover selection method embodiment, and details are not described herein again.
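The module structure of the device can be sketched as three small classes wired together. The class names mirror modules 21, 22, and 23, but the interfaces (`run`, `select_cover`), the scoring callback, and the minimum-score form of the cover condition are assumptions made for illustration; the disclosure describes the modules only functionally.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Optional, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


@dataclass
class PreprocessingModule(Generic[Frame]):
    """Module 23: keeps only frames that meet a preset standard."""
    meets_standard: Callable[[Frame], bool]

    def run(self, frames: Sequence[Frame]) -> List[Frame]:
        return [frame for frame in frames if self.meets_standard(frame)]


@dataclass
class ImageEvaluationModule(Generic[Frame]):
    """Module 21: scores every frame to be evaluated."""
    evaluate: Callable[[Frame], float]

    def run(self, frames: Sequence[Frame]) -> List[Tuple[Frame, float]]:
        return [(frame, self.evaluate(frame)) for frame in frames]


@dataclass
class CoverSelectionModule(Generic[Frame]):
    """Module 22: picks the best frame whose score reaches `min_score`."""
    min_score: float

    def run(self, scored: Sequence[Tuple[Frame, float]]) -> Optional[Frame]:
        qualifying = [item for item in scored if item[1] >= self.min_score]
        return max(qualifying, key=lambda item: item[1])[0] if qualifying else None


@dataclass
class VideoCoverSelectionDevice(Generic[Frame]):
    evaluation: ImageEvaluationModule[Frame]
    selection: CoverSelectionModule[Frame]
    preprocessing: Optional[PreprocessingModule[Frame]] = None  # optional module 23

    def select_cover(self, frames: Sequence[Frame]) -> Optional[Frame]:
        candidates = self.preprocessing.run(frames) if self.preprocessing else list(frames)
        return self.selection.run(self.evaluation.run(candidates))


# Toy usage: "frames" are strings, the score is the string length, and
# frames containing "x" fail the hypothetical pre-processing standard.
device = VideoCoverSelectionDevice(
    evaluation=ImageEvaluationModule(evaluate=len),
    selection=CoverSelectionModule(min_score=3),
    preprocessing=PreprocessingModule(meets_standard=lambda f: "x" not in f),
)
print(device.select_cover(["ab", "abcd", "abcdex", "abc"]))  # abcd
```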

FIG. 3 is a hardware block diagram illustrating a video cover selection hardware device according to an embodiment of the present disclosure. As shown in FIG. 3, the video cover selection hardware device 30 according to an embodiment of the present disclosure includes a memory 31 and a processor 32.

The memory 31 is configured to store non-transitory computer-readable instructions. Specifically, the memory 31 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and / or non-volatile memory. The volatile memory may include, for example, a random access memory (RAM) and / or a cache memory. The non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, and the like.

The processor 32 may be a central processing unit (CPU) or another form of processing unit having data processing capabilities and / or instruction execution capabilities, and may control other components in the video cover selection hardware device 30 to perform desired functions. In an embodiment of the present disclosure, the processor 32 is configured to run the computer-readable instructions stored in the memory 31, so that the video cover selection hardware device 30 executes all or part of the steps of the video cover selection method of the foregoing embodiments of the present disclosure.

Those skilled in the art should understand that, in order to solve the technical problem of how to obtain a good user experience, this embodiment may also include well-known structures such as a communication bus and an interface. These well-known structures should also be included within the protection scope of the present disclosure.

For detailed descriptions of this embodiment, reference may be made to corresponding descriptions in the foregoing embodiments, and details are not described herein again.

FIG. 4 is a schematic diagram illustrating a computer-readable storage medium according to an embodiment of the present disclosure. As shown in FIG. 4, a computer-readable storage medium 40 according to an embodiment of the present disclosure stores non-transitory computer-readable instructions 41 thereon. When the non-transitory computer-readable instructions 41 are executed by a processor, all or part of the steps of the video cover selection method of the foregoing embodiments of the present disclosure are performed.

The computer-readable storage medium 40 includes, but is not limited to, optical storage media (for example, CD-ROM and DVD), magneto-optical storage media (for example, MO), magnetic storage media (for example, magnetic tape or removable hard disk), media with built-in rewritable non-volatile memory (for example, memory card), and media with built-in ROM (for example, ROM cartridge).

FIG. 5 is a schematic diagram illustrating a hardware structure of a terminal according to an embodiment of the present disclosure. As shown in FIG. 5, the video cover selection terminal 50 includes the foregoing video cover selection device embodiment.

The terminal may be implemented in various forms. The terminal in the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, an on-board terminal, an on-board display terminal, and an on-board electronic rear-view mirror, as well as fixed terminals such as a digital TV and a desktop computer.

As an equivalent alternative, the terminal may further include other components. As shown in FIG. 5, the video cover selection terminal 50 may include a power supply unit 51, a wireless communication unit 52, an A/V (audio/video) input unit 53, a user input unit 54, a sensing unit 55, an interface unit 56, a controller 57, an output unit 58, a memory 59, and so on. FIG. 5 illustrates a terminal having various components, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.

Among them, the wireless communication unit 52 allows radio communication between the terminal 50 and a wireless communication system or network. The A/V input unit 53 is used to receive audio or video signals. The user input unit 54 may generate key input data according to a command input by the user to control various operations of the terminal. The sensing unit 55 detects the current state of the terminal 50, the position of the terminal 50, the presence or absence of a user's touch input to the terminal 50, the orientation of the terminal 50, the acceleration or deceleration movement and direction of the terminal 50, and the like, and generates commands or signals for controlling the operation of the terminal 50. The interface unit 56 functions as an interface through which at least one external device can be connected to the terminal 50. The output unit 58 is configured to provide an output signal in a visual, audio, and/or tactile manner. The memory 59 may store software programs and the like for processing and control operations performed by the controller 57, or may temporarily store data that has been output or is to be output. The memory 59 may include at least one type of storage medium. Moreover, the terminal 50 may cooperate with a network storage device that performs the storage function of the memory 59 through a network connection. The controller 57 generally controls the overall operation of the terminal. In addition, the controller 57 may include a multimedia module for reproducing or playing back multimedia data. The controller 57 may perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as characters or images. The power supply unit 51 receives external power or internal power under the control of the controller 57 and provides the appropriate power required to operate each element and component.

Various embodiments of the video cover selection method proposed by the present disclosure may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For hardware implementation, various embodiments of the video cover selection method proposed in the present disclosure can be implemented by using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, and an electronic unit designed to perform the functions described herein. In some cases, various embodiments of the video cover selection method proposed in the present disclosure may be implemented in the controller 57. For software implementation, various embodiments of the video cover selection method proposed by the present disclosure can be implemented with a separate software module that allows at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 59 and executed by the controller 57.

The basic principles of the present disclosure have been described above in conjunction with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in this disclosure are merely examples and not limitations, and these advantages, benefits, effects, etc. cannot be considered as required for the various embodiments of the present disclosure. In addition, the specific details disclosed above are provided only for the purpose of example and ease of understanding, and are not limiting; the above details do not restrict the present disclosure to being implemented with those specific details.

The block diagrams of the components, apparatuses, devices, and systems involved in this disclosure are only illustrative examples and are not intended to require or imply that they must be connected, arranged, and configured in the manner shown in the block diagrams. As will be recognized by those skilled in the art, these components, apparatuses, devices, and systems can be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and can be used interchangeably with that phrase. As used herein, the terms "or" and "and" mean "and/or" and are used interchangeably with it unless the context clearly indicates otherwise. The term "such as" as used herein means "such as, but not limited to," and is used interchangeably with that phrase.

In addition, as used herein, an "or" used in an enumeration of items beginning with "at least one" indicates a disjunctive enumeration such that, for example, an enumeration of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.
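As an illustration only, the reading of "at least one of A, B, or C" given above corresponds to every non-empty combination of the listed items. The following sketch enumerates those combinations; the function name `at_least_one_of` is hypothetical and is not part of the disclosure.

```python
from itertools import combinations
from typing import List, Sequence


def at_least_one_of(items: Sequence[str]) -> List[str]:
    """Return every non-empty combination of the items, smallest first."""
    return [
        "".join(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


print(at_least_one_of(["A", "B", "C"]))
# ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
```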

It should also be noted that in the system and method of the present disclosure, each component or each step can be disassembled and / or recombined. These decompositions and / or recombinations should be regarded as equivalent solutions of the present disclosure.

Various changes, substitutions, and alterations to the techniques described herein can be made without departing from the techniques taught by the appended claims. Further, the scope of the claims of the present disclosure is not limited to the specific aspects of the processes, machines, manufacture, compositions of matter, means, methods, and actions described above. Processes, machines, manufacture, compositions of matter, means, methods, or actions that currently exist or are to be developed later, and that perform substantially the same functions or achieve substantially the same results as the corresponding aspects described herein, may be utilized. Accordingly, the appended claims include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or actions.

The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Accordingly, the disclosure is not intended to be limited to the aspects shown herein, but to the broadest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been given for the purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions and sub-combinations thereof.

Claims

A video cover selection method, comprising:

Evaluating at least one image frame to be evaluated that is extracted from a video to be processed; and

Selecting, according to an evaluation result, an image frame that satisfies a preset cover condition from the at least one image frame to be evaluated as a cover of the video to be processed.
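For illustration only, the two steps of this claim can be sketched as follows. The claim does not define the preset cover condition; the sketch assumes, hypothetically, that it means "the highest score that is at least a threshold". The names `select_cover`, `evaluate`, and `min_score` are illustrative and do not come from the disclosure.

```python
from typing import Callable, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_cover(
    frames: Sequence[Frame],
    evaluate: Callable[[Frame], float],
    min_score: float = 0.0,
) -> Optional[Frame]:
    """Score every candidate frame and return the best one that qualifies.

    Assumed cover condition: highest score among frames scoring at least
    min_score. Returns None when no frame qualifies.
    """
    best_frame: Optional[Frame] = None
    best_score: Optional[float] = None
    for frame in frames:
        score = evaluate(frame)  # step 1: evaluate the frame
        if score < min_score:
            continue
        if best_score is None or score > best_score:  # step 2: select
            best_frame, best_score = frame, score
    return best_frame


# Usage with made-up scores standing in for a real evaluator.
scores = {"frame_0": 0.31, "frame_1": 0.87, "frame_2": 0.55}
cover = select_cover(list(scores), scores.__getitem__, min_score=0.5)
```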
The method according to claim 1, wherein the step of evaluating at least one image frame to be evaluated extracted from the video to be processed comprises:

Performing cluster analysis on the at least one image frame to be evaluated to obtain at least one cluster center;

For each cluster center, evaluating the image frame closest to the cluster center;

And the step of selecting an image frame that satisfies a preset cover condition from the at least one image frame to be evaluated as a cover of the video to be processed according to the evaluation result comprises:

Selecting, according to the evaluation result, an image frame that satisfies the preset cover condition from the image frames closest to the cluster centers as the cover of the video to be processed.
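For illustration only, the clustering variant can be sketched as follows. The claim does not name a clustering algorithm or a frame representation; this sketch assumes that each frame is described by a numeric feature vector and uses a small k-means with deterministic initialisation. All function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain k-means; initial centers are k points spread evenly over the input."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    step = len(points) / k
    centers = [list(points[int(i * step)]) for i in range(k)]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: _dist2(p, centers[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def representatives(points: Sequence[Vector], centers: Sequence[Vector]) -> List[int]:
    """Index of the frame closest to each cluster center."""
    return [
        min(range(len(points)), key=lambda i: _dist2(points[i], center))
        for center in centers
    ]


def select_cover_from_clusters(
    features: Sequence[Vector], evaluate: Callable[[int], float], k: int
) -> int:
    """Evaluate only the representative frames and return the best index."""
    candidates = representatives(features, kmeans(features, k))
    return max(candidates, key=evaluate)
```

Only one frame per cluster is evaluated, so the number of evaluator calls is bounded by the number of clusters rather than by the number of extracted frames.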
The method according to claim 1, wherein the step of evaluating at least one image frame to be evaluated extracted from the video to be processed comprises:

Separately performing picture quality evaluation on the at least one image frame to be evaluated, the evaluation result being a picture quality level; and/or,

Separately performing click-through rate prediction on the at least one image frame to be evaluated, the evaluation result being a click-through rate.
The method according to claim 3, wherein the step of evaluating at least one image frame to be evaluated extracted from the video to be processed comprises:

If the evaluation result comprises both a picture quality level and a click-through rate, combining the picture quality level and the click-through rate and recalculating the evaluation result.
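For illustration only, one possible way to combine the two results is a weighted sum. The claim does not specify the combination rule; the sketch below assumes, hypothetically, that the picture quality level is an integer from 1 to `max_level` and that the click-through rate is a probability in [0, 1]. The function name and the default weight are illustrative.

```python
def combined_score(
    quality_level: int,
    click_rate: float,
    max_level: int = 5,
    quality_weight: float = 0.5,
) -> float:
    """Merge a picture quality level and a click-through rate into one score.

    Assumed rule: normalise the quality level to [0, 1], then take a
    weighted average with the click-through rate.
    """
    if not 1 <= quality_level <= max_level:
        raise ValueError("quality_level must be between 1 and max_level")
    if not 0.0 <= click_rate <= 1.0:
        raise ValueError("click_rate must be between 0 and 1")
    if not 0.0 <= quality_weight <= 1.0:
        raise ValueError("quality_weight must be between 0 and 1")
    quality = quality_level / max_level
    return quality_weight * quality + (1.0 - quality_weight) * click_rate
```

With the default weight, a frame of quality level 4 out of 5 and a predicted click-through rate of 0.2 scores the average of 0.8 and 0.2.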
The method according to claim 1, wherein the step of evaluating at least one image frame to be evaluated extracted from the video to be processed comprises:

Inputting the at least one image frame to be evaluated into a pre-trained image evaluation model for evaluation, and using an output result of the image evaluation model as the evaluation result.
The method according to claim 5, wherein the image evaluation model comprises: an image quality evaluation model and/or a click-through rate prediction model, the image quality evaluation model being used to output a picture quality level, and the click-through rate prediction model being used to output a click-through rate.
The method according to any one of claims 1-6, further comprising:

Pre-processing at least one image frame extracted from the video to be processed, and selecting an image frame that meets a preset standard as the image frame to be evaluated.
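For illustration only, the pre-processing step can be sketched as a filter over the extracted frames. The claim does not define the preset standard; the sketch assumes, hypothetically, that frames are grayscale pixel grids with values from 0 to 255 and that the standard rejects frames that are too dark, too bright, or nearly uniform. All names and thresholds are illustrative.

```python
from statistics import mean, pvariance
from typing import List, Sequence

GrayFrame = Sequence[Sequence[int]]  # rows of grayscale pixel values, 0-255


def meets_standard(
    frame: GrayFrame,
    min_brightness: float = 30.0,
    max_brightness: float = 225.0,
    min_variance: float = 100.0,
) -> bool:
    """Assumed standard: acceptable mean brightness and enough pixel variation."""
    pixels = [p for row in frame for p in row]
    if not pixels:
        return False
    brightness_ok = min_brightness <= mean(pixels) <= max_brightness
    return brightness_ok and pvariance(pixels) >= min_variance


def preprocess(frames: Sequence[GrayFrame]) -> List[GrayFrame]:
    """Keep only the frames that meet the assumed standard."""
    return [frame for frame in frames if meets_standard(frame)]


# Usage: a black frame and a flat gray frame are dropped, a textured one is kept.
black = [[0, 0], [0, 0]]
flat = [[128, 128], [128, 128]]
textured = [[50, 200], [200, 50]]
to_evaluate = preprocess([black, flat, textured])
```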
A video cover selection device, comprising:

An image evaluation module, configured to evaluate at least one image frame to be evaluated extracted from the video to be processed;

A cover selection module, configured to select, according to an evaluation result, an image frame that satisfies a preset cover condition from the at least one image frame to be evaluated as a cover of the video to be processed.
The device according to claim 8, wherein the image evaluation module is specifically configured to: perform cluster analysis on the at least one image frame to be evaluated to obtain at least one cluster center; and, for each cluster center, evaluate the image frame closest to the cluster center;

The cover selection module is specifically configured to: according to the evaluation result, select an image frame that meets a preset cover condition from the image frames closest to the cluster center as the cover of the video to be processed.
The device according to claim 8, wherein the image evaluation module is specifically configured to: separately perform picture quality evaluation on the at least one image frame to be evaluated, the evaluation result being a picture quality level; and/or separately perform click-through rate prediction on the at least one image frame to be evaluated, the evaluation result being a click-through rate.
The device according to claim 10, wherein the image evaluation module is specifically configured to: if the evaluation result comprises both a picture quality level and a click-through rate, combine the picture quality level and the click-through rate and recalculate the evaluation result.
The device according to claim 8, wherein the image evaluation module is specifically configured to: input the at least one image frame to be evaluated into a pre-trained image evaluation model for evaluation, and use an output result of the image evaluation model as the evaluation result.
The device according to claim 12, wherein the image evaluation model comprises: an image quality evaluation model and/or a click-through rate prediction model, the image quality evaluation model being used to output a picture quality level, and the click-through rate prediction model being used to output a click-through rate.
The device according to any one of claims 8-13, wherein the device further comprises:

A pre-processing module, configured to pre-process at least one image frame extracted from the video to be processed, and select an image frame that meets a preset standard as the image frame to be evaluated.
A video cover selection hardware device, comprising:

A memory, configured to store non-transitory computer-readable instructions; and

A processor, configured to run the computer-readable instructions, so that the processor, when executing the instructions, implements the video cover selection method according to any one of claims 1-7.
A computer-readable storage medium, configured to store non-transitory computer-readable instructions which, when executed by a computer, cause the computer to execute the video cover selection method according to any one of claims 1-7.