CN111984821A - Method and device for determining dynamic cover of video, storage medium and electronic equipment - Google Patents

Method and device for determining dynamic cover of video, storage medium and electronic equipment

Info

Publication number
CN111984821A
CN111984821A (application number CN202010575535.7A)
Authority
CN
China
Prior art keywords
preset
dynamic
video
target
covers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010575535.7A
Other languages
Chinese (zh)
Inventor
郑多如
彭冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hanhai Information Technology Shanghai Co Ltd
Original Assignee
Hanhai Information Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hanhai Information Technology Shanghai Co Ltd filed Critical Hanhai Information Technology Shanghai Co Ltd
Priority to CN202010575535.7A priority Critical patent/CN111984821A/en
Publication of CN111984821A publication Critical patent/CN111984821A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 — Information retrieval of video data
    • G06F16/74 — Browsing; Visualisation therefor
    • G06F16/745 — Browsing or visualising the internal structure of a single video sequence
    • G06F16/73 — Querying
    • G06F16/735 — Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a method, an apparatus, a storage medium, and an electronic device for determining a dynamic cover of a video. A plurality of preset dynamic covers of a target video are obtained, each comprising multiple frames of images extracted from the target video. The preset dynamic covers are fed into a pre-trained video click-rate estimation model to obtain an estimated click rate for each preset dynamic cover, and a target dynamic cover for the target video is then determined from the preset dynamic covers according to the estimated click rates. Because the key frame images forming the target dynamic cover are chosen according to video click rate, the portion of the video most likely to attract clicks is obtained, so using the target dynamic cover as the cover of the target video can significantly improve the video's user click rate.

Description

Method and device for determining dynamic cover of video, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of video cover selection, and in particular, to a method and an apparatus for determining a dynamic cover of a video, a storage medium, and an electronic device.
Background
In information-flow recommendation systems, videos are more entertaining and content-rich than pictures and text. Video key frames serve as a summary of a video, presenting its content in condensed form; extracting good key frames to use as a dynamic video cover attracts users to click, which in turn improves the click-through rate (CTR) of the information flow.
In the related art, target detection and behavior recognition based on deep learning models can produce a highly purposive set of video key frames to serve as a video's dynamic cover, for example by detecting human bodies or gestures in the video. However, because it is so narrowly targeted, this approach applies only to a limited range of scenes; moreover, training data for target detection and behavior recognition requires extensive manual labeling, which raises the cost of obtaining sample data.
Disclosure of Invention
An object of the present disclosure is to provide a method, an apparatus, a storage medium, and an electronic device for determining a dynamic cover of a video.
In a first aspect, the present disclosure provides a method of determining a dynamic cover of a video, the method comprising: acquiring a plurality of preset dynamic covers of a target video, wherein each preset dynamic cover comprises multiple frames of images extracted from the target video; feeding the plurality of preset dynamic covers into a pre-trained video click-rate estimation model to obtain an estimated click rate corresponding to each preset dynamic cover; and determining a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rates.
Optionally, determining the target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rates includes: selecting, from the plurality of preset dynamic covers, a preset number of preset dynamic covers with the highest estimated click rates as candidate dynamic covers; exposing each candidate dynamic cover online according to a preset exposure probability; for each candidate dynamic cover, acquiring its actual click rate once its exposure count reaches a first preset exposure-count threshold; and taking the candidate dynamic cover with the highest actual click rate as the target dynamic cover of the target video.
Optionally, after the candidate dynamic cover with the highest actual click rate is taken as the target dynamic cover of the target video, the method further includes: increasing the preset exposure probability corresponding to the target dynamic cover to a target exposure probability; and exposing the target dynamic cover online according to the target exposure probability.
Optionally, acquiring the plurality of preset dynamic covers of the target video includes: cutting the target video according to different preset time intervals to obtain a plurality of frame image sets, or cutting the target video according to different preset frame intervals to obtain a plurality of frame image sets, wherein each frame image set includes multiple frames of images; and determining a plurality of preset dynamic covers from the plurality of frame image sets, the frame image sets corresponding one to one with the preset dynamic covers.
Optionally, the video click-rate estimation model is trained as follows: acquiring a plurality of exposed dynamic covers that satisfy a preset exposure condition, together with the user click rate corresponding to each exposed dynamic cover; and performing model training with the exposed dynamic covers and their user click rates as training samples to obtain the video click-rate estimation model.
Optionally, the preset exposure condition includes: the exposure count being greater than or equal to a second preset exposure-count threshold; or the exposure duration being greater than or equal to a preset exposure-duration threshold.
In a second aspect, there is provided an apparatus for determining a dynamic cover of a video, the apparatus comprising: an acquisition module configured to acquire a plurality of preset dynamic covers of a target video, wherein each preset dynamic cover comprises multiple frames of images extracted from the target video; a first determining module configured to feed the plurality of preset dynamic covers into a pre-trained video click-rate estimation model to obtain the estimated click rate corresponding to each preset dynamic cover; and a second determining module configured to determine a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rates.
Optionally, the second determining module is configured to select, from the plurality of preset dynamic covers, a preset number of preset dynamic covers with the highest estimated click rates as candidate dynamic covers; expose each candidate dynamic cover online according to a preset exposure probability; for each candidate dynamic cover, acquire its actual click rate once its exposure count reaches a first preset exposure-count threshold; and take the candidate dynamic cover with the highest actual click rate as the target dynamic cover of the target video.
Optionally, the apparatus further comprises: a probability adjusting module configured to increase the preset exposure probability corresponding to the target dynamic cover to a target exposure probability; and an exposure module configured to expose the target dynamic cover online according to the target exposure probability.
Optionally, the acquisition module is configured to cut the target video according to different preset time intervals to obtain a plurality of frame image sets, or to cut the target video according to different preset frame intervals to obtain a plurality of frame image sets, where each frame image set includes multiple frames of images; and to determine a plurality of preset dynamic covers from the plurality of frame image sets, the frame image sets corresponding one to one with the preset dynamic covers.
Optionally, the video click-rate estimation model is trained as follows: acquiring a plurality of exposed dynamic covers that satisfy a preset exposure condition, together with the user click rate corresponding to each exposed dynamic cover; and performing model training with the exposed dynamic covers and their user click rates as training samples to obtain the video click-rate estimation model.
Optionally, the preset exposure condition includes: the exposure count being greater than or equal to a second preset exposure-count threshold; or the exposure duration being greater than or equal to a preset exposure-duration threshold.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method according to the first aspect of the disclosure.
In a fourth aspect, an electronic device is provided, comprising: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the method of the first aspect of the disclosure.
According to the technical scheme above, a plurality of preset dynamic covers of a target video are obtained, each comprising multiple frames of images extracted from the target video; the preset dynamic covers are fed into a pre-trained video click-rate estimation model to obtain an estimated click rate for each; and a target dynamic cover corresponding to the target video is determined from the preset dynamic covers according to the estimated click rates. Determining the target dynamic cover in this way obtains the portion of the video most likely to attract user clicks, so using the target dynamic cover as the cover of the target video can significantly improve the video's user click rate. In addition, compared with highly purposive target detection and behavior recognition, determining the target dynamic cover according to video click rate generalizes well and applies to a wider range of scenes.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart illustrating a first method of determining a dynamic cover of a video according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a second method of determining a dynamic cover of a video according to an exemplary embodiment;
FIG. 3 is a block diagram illustrating a first apparatus for determining a dynamic cover of a video according to an exemplary embodiment;
FIG. 4 is a block diagram illustrating a second apparatus for determining a dynamic cover of a video according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating the structure of an electronic device according to an exemplary embodiment.
Detailed Description
Specific embodiments of the present disclosure are described in detail below in connection with the accompanying drawings. It should be understood that the detailed description and specific examples are given by way of illustration and explanation only and do not limit the disclosure.
First, an application scene of the present disclosure is introduced. The disclosure is mainly applied to selecting a dynamic cover for a video in an information-flow recommendation system. In such a system, videos are more entertaining and content-rich than pictures and text; for example, videos account for nearly 20% of exposures in the information-flow recommendation system of one review application. Video key frames serve as a summary of a video, presenting its content in condensed form, and excellent key frames extracted as a dynamic video cover are more certain to attract users to click, which in turn improves the information-flow CTR.
In the related art, one approach selects video key frames as a video's dynamic cover according to a preset rule, for example taking seconds 1 to 3 of the video; but such a rule is not necessarily the optimal way to extract key frames, and potentially excellent key frames are lost. Another approach defines a mathematical key-frame extraction model based on topological potential and norms, computes a score for each key-frame set with the model, and uses the highest-scoring set as the video's dynamic cover; but video content is rich, and a mathematical model cannot possibly describe everything in a video, so this approach makes poor use of the rich information a video contains. A further related-art approach defines an information-content index for video key frames based on adaptive clustering and then selects a candidate key-frame set according to that index; the difficulty with this clustering approach is how to define the index, which is generally characterized by simple features such as hue, saturation, brightness, texture, and inter-frame similarity, features that cannot reflect the more abstract content of a video. In short, video key frames obtained by these approaches, when used as a video's dynamic cover, fail to reflect the rich information in the video and thus hurt the video's user click rate. To improve the click rate, the prior art also performs target detection and behavior recognition based on deep learning models to obtain a highly purposive key-frame set as the dynamic cover, for example by detecting human bodies or gestures in the video; this can improve the click rate to some extent, but because it is too strong in purposiveness its applicable scenes are narrower, and its training data can only be obtained through extensive manual labeling, which raises the cost of acquiring sample data.
To solve these problems, the present disclosure provides a method, an apparatus, a storage medium, and an electronic device for determining a dynamic cover of a video. First, a plurality of preset dynamic covers of a target video (i.e., the video whose dynamic cover is to be determined) are obtained, each comprising multiple frames of images extracted from the target video. The preset dynamic covers are then fed into a pre-trained video click-rate estimation model to obtain the estimated click rate for each preset dynamic cover, so that the target dynamic cover for the target video can be determined from the preset dynamic covers according to the estimated click rates. Because the target dynamic cover is determined according to video click rate, this method obtains the portion of the video that most attracts users to click, and using the target dynamic cover as the cover of the target video can significantly improve the video's user click rate. Moreover, compared with highly purposive target detection and behavior recognition, determining the target dynamic cover according to video click rate generalizes well and applies to a wider range of scenes, and the training labels require no manual annotation and are very easy to obtain, saving the cost of acquiring sample data.
Specific embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
FIG. 1 is a flow chart illustrating a method of determining a dynamic cover of a video according to an exemplary embodiment. As shown in FIG. 1, the method includes the following steps:
in step S101, a plurality of preset dynamic covers of a target video are obtained, where the preset dynamic covers include a plurality of frames of images extracted from the target video.
The target video is the video whose dynamic cover is to be determined, and each preset dynamic cover comprises multiple frames of images selected from the target video according to a preset rule.
In this step, the target video may be cut according to different preset time intervals to obtain a plurality of frame image sets; for example, the 0–2 second segment of the target video forms a first frame image set, the 1–3 second segment forms a second frame image set, the 2–4 second segment forms a third frame image set, and so on, until the target video is cut into a plurality of frame image sets. Alternatively, the target video may be cut according to different preset frame intervals; for example, frames 1–4 of the target video form a first frame image set, frames 2–5 form a second frame image set, frames 3–6 form a third frame image set, and so on. Each frame image set includes multiple frames of images, and a plurality of preset dynamic covers can then be determined from the frame image sets, with the frame image sets corresponding one to one to the preset dynamic covers.
In step S102, a plurality of preset dynamic covers are used as input of the pre-trained video click rate estimation model, and an estimated click rate corresponding to each preset dynamic cover is obtained.
The video click-rate estimation model may include an S3DG model for video modeling; S3DG is a variant of the S3D (separable 3D CNN) model with a gating structure added.
In step S103, a target dynamic cover corresponding to the target video is determined from the plurality of preset dynamic covers according to the estimated click rate.
The target dynamic cover may be the dynamic cover with the highest user click rate among the plurality of preset dynamic covers.
In this step, a preset number of preset dynamic covers with the highest estimated click rates can be selected from the plurality of preset dynamic covers as candidate dynamic covers. Each candidate dynamic cover is then exposed online according to a preset exposure probability; for each candidate dynamic cover, its actual click rate is acquired once its exposure count reaches a first preset exposure-count threshold; and the candidate dynamic cover with the highest actual click rate is taken as the target dynamic cover of the target video. The preset exposure probability represents how likely each candidate dynamic cover is to be exposed, and each candidate dynamic cover has its own corresponding preset exposure probability.
With this method, a plurality of preset dynamic covers of the target video are obtained, each comprising multiple frames of images extracted from the target video; the preset dynamic covers are fed into a pre-trained video click-rate estimation model to obtain the estimated click rate for each; and the target dynamic cover corresponding to the target video is determined from the preset dynamic covers according to the estimated click rates. By determining the target dynamic cover according to video click rate, the portion of the video most likely to attract user clicks is obtained.
FIG. 2 is a flow chart illustrating another method of determining a dynamic cover of a video according to an exemplary embodiment. As shown in FIG. 2, the method includes the following steps:
in step S201, the target video is cut according to different preset time intervals to obtain a plurality of frame image sets, or the target video is cut according to different preset frame intervals to obtain a plurality of frame image sets.
Wherein the frame image set comprises a plurality of frame images.
When cutting the target video according to different preset time intervals, the target video may be divided into different video segments, each segment being regarded as one frame image set; for example, the 0–2 second segment of the target video is the first frame image set, the 1–3 second segment is the second frame image set, the 2–4 second segment is the third frame image set, and so on, until the target video is cut into a plurality of frame image sets.
When cutting the target video according to different preset frame intervals, the target video may likewise be divided into different video segments, each regarded as one frame image set; for example, the segment corresponding to frames 1–4 of the target video forms the first frame image set, the segment corresponding to frames 2–5 forms the second frame image set, the segment corresponding to frames 3–6 forms the third frame image set, and so on, until the target video is cut into a plurality of frame image sets.
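The sliding-window cutting described above can be sketched as follows. This is only an illustrative sketch: the function names, window sizes, and strides are assumptions, not values given in the patent.

```python
def cut_by_time_interval(duration_s, window_s, stride_s):
    """Sliding time window over a video, e.g. seconds 0-2, 1-3, 2-4, ..."""
    segments, start = [], 0
    while start + window_s <= duration_s:
        segments.append((start, start + window_s))
        start += stride_s
    return segments

def cut_by_frame_interval(num_frames, window, stride):
    """Sliding frame window, e.g. frames 1-4, 2-5, 3-6 (0-indexed here)."""
    sets, start = [], 0
    while start + window <= num_frames:
        sets.append(list(range(start, start + window)))
        start += stride
    return sets
```

Each returned segment or frame list would correspond to one frame image set, and hence to one preset dynamic cover.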
In step S202, a plurality of preset dynamic covers are determined from the plurality of frame image sets.
Each preset dynamic cover comprises multiple frames of images selected from the target video according to a preset rule, and the frame image sets correspond one to one with the preset dynamic covers. For example, the frame image set corresponding to seconds 0–2 of the target video is the first preset dynamic cover, the set corresponding to seconds 1–3 is the second preset dynamic cover, the set corresponding to seconds 2–4 is the third preset dynamic cover, and so on, yielding a plurality of preset dynamic covers for the target video. This is only an example and does not limit the present disclosure.
In step S203, a plurality of preset dynamic covers are used as input of the pre-trained video click rate estimation model, so as to obtain the estimated click rate corresponding to each preset dynamic cover.
The video click-rate estimation model may include an S3DG model for video modeling; S3DG is a variant of the S3D model with a gating structure added.
It should be noted that the video click-rate estimation model yields the estimated click rate of a whole dynamic cover composed of multiple frames of images, so the target dynamic cover of the target video is determined with the estimated click rate as guidance. In this way, after the target video is exposed online with the target dynamic cover as its cover, the user click rate of the target video can be improved.
In addition, the video click-rate estimation model can be trained as follows. First, a plurality of exposed dynamic covers satisfying a preset exposure condition, together with the user click rate corresponding to each, are acquired, for example from the information-flow exposure database of the information-flow recommendation system. Then model training is performed with the exposed dynamic covers and user click rates as training samples to obtain the video click-rate estimation model. Specifically, the exposed dynamic covers serve as the input samples and the user click rates as the output samples (i.e., the training labels), and the loss function may be the MSE (mean squared error) function; for the training procedure itself, reference may be made to the related art, and details are not repeated here.
The preset exposure condition may include the exposure count being greater than or equal to a second preset exposure-count threshold, or the exposure duration being greater than or equal to a preset exposure-duration threshold. For example, exposed dynamic covers with at least 1000 exposures, or with an exposure duration of at least 5 days, may be selected as input samples for model training.
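The sample filtering and the MSE loss just described can be sketched as below. The dictionary keys and default thresholds are illustrative assumptions; the patent does not name a data schema.

```python
def filter_training_samples(samples, min_exposures=1000, min_days=5):
    """Keep exposed covers satisfying either preset exposure condition:
    enough exposures, or a long enough exposure duration."""
    return [s for s in samples
            if s["exposures"] >= min_exposures or s["exposure_days"] >= min_days]

def mse_loss(predicted, actual):
    """Mean squared error between predicted and observed click rates."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
```

The filtered covers would serve as input samples and their observed click rates as training labels, with `mse_loss` as the training objective.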
In step S204, a preset number of preset dynamic covers with the highest estimated click rates are selected from the preset dynamic covers as candidate dynamic covers.
After the candidate dynamic covers are screened out according to the estimated click rates, each candidate dynamic cover can enter the information-flow recommendation system as a video creative cover. An explore-and-exploit mechanism can then give each candidate dynamic cover sufficient exposure, with the covers competing to become the target dynamic cover, which may be the cover with the highest click rate among the preset dynamic covers. In this embodiment, the target dynamic cover can be determined by performing steps S205 to S207.
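The top-k screening of step S204 reduces to sorting by estimated click rate; a minimal sketch, with a hypothetical helper name, follows.

```python
def select_candidates(covers, estimated_ctr, k):
    """Return the k preset dynamic covers with the highest estimated click rates."""
    ranked = sorted(zip(covers, estimated_ctr), key=lambda pair: pair[1], reverse=True)
    return [cover for cover, _ in ranked[:k]]
```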
In step S205, each of the candidate dynamic covers is exposed on-line according to a preset exposure probability.
The preset exposure probability represents how likely each candidate dynamic cover is to be exposed, and each candidate dynamic cover has its own corresponding preset exposure probability.
In one possible implementation of this step, for each candidate dynamic cover, a probability interval corresponding to that cover may be determined from its preset exposure probability. Before each online exposure of the target video, a number between 0 and 1 is chosen at random, and the probability interval containing that number is taken as the target probability interval; according to the correspondence between probability intervals and candidate dynamic covers, the candidate dynamic cover corresponding to the target probability interval is then used as the cover for this exposure of the target video.
For example, suppose four candidate dynamic covers A, B, C, and D are determined after step S204, each with a preset exposure probability of 0.25. In one possible implementation, the probability interval for cover A is 0 to 0.25, for cover B 0.25 to 0.5, for cover C 0.5 to 0.75, and for cover D 0.75 to 1. Before the target video is exposed, a number between 0 and 1 is chosen at random: if it falls between 0 and 0.25, cover A is used for this exposure; between 0.25 and 0.5, cover B; between 0.5 and 0.75, cover C; and between 0.75 and 1, cover D. This is only an example and does not limit the present disclosure.
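The probability-interval sampling in the example above can be sketched as follows, assuming the per-cover probabilities sum to 1 (function names are hypothetical):

```python
import random

def build_intervals(cover_probs):
    """Map each cover's exposure probability to a half-open interval in [0, 1)."""
    intervals, lo = [], 0.0
    for cover, p in cover_probs:
        intervals.append((cover, lo, lo + p))
        lo += p
    return intervals

def pick_cover(intervals, rng=random.random):
    """Draw a number in [0, 1) and return the cover whose interval contains it."""
    r = rng()
    for cover, lo, hi in intervals:
        if lo <= r < hi:
            return cover
    return intervals[-1][0]  # guard against floating-point edge cases
```

Calling `pick_cover` before each exposure reproduces the selection rule of the example: a draw of 0.3 selects cover B, a draw of 0.8 selects cover D.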
In step S206, for each candidate dynamic cover, the actual click rate of that candidate dynamic cover when its number of exposures reaches the first preset exposure time threshold is obtained.
The first preset exposure time threshold may be set empirically as the number of exposures at which the actual click rate reaches convergence; for example, it may be set to 300.
In step S207, the candidate dynamic cover with the highest actual click rate is used as the target dynamic cover of the target video.
Continuing the example from step S205, the candidate dynamic covers are the four covers A, B, C, and D. Suppose that after step S206 is executed, the actual click rates when each cover's exposure count reaches 300 are 0.65 for cover A, 0.32 for cover B, 0.15 for cover C, and 0.05 for cover D. Candidate dynamic cover A, having the highest actual click rate, may then be determined to be the target dynamic cover.
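Selecting the target dynamic cover from the measured click rates is a simple arg-max; an illustrative sketch using the example figures above (the function name is hypothetical):

```python
def select_target_cover(actual_ctr):
    """Return the candidate dynamic cover whose measured
    click-through rate is highest (step S207)."""
    return max(actual_ctr, key=actual_ctr.get)

# actual click rates measured once each cover reached 300 exposures
ctr = {"A": 0.65, "B": 0.32, "C": 0.15, "D": 0.05}
target = select_target_cover(ctr)  # -> "A"
```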
In addition, in order to obtain the highest possible exposure gain after the target video is exposed and to improve its user click rate, in one possible implementation, after the candidate dynamic cover with the highest actual click rate is taken as the target dynamic cover of the target video, the preset exposure probability corresponding to the target dynamic cover may be increased to a target exposure probability, and the target dynamic cover is then exposed online with the target exposure probability. Correspondingly, the preset exposure probabilities of the other candidate dynamic covers (those other than the target dynamic cover) may be reduced, and those covers exposed online with the reduced preset exposure probabilities.
Specifically, the preset exposure probability corresponding to the target dynamic cover may be increased to the target exposure probability according to a preset probability adjustment policy, the preset exposure probabilities of the other candidate dynamic covers decreased, and the probability interval corresponding to each candidate dynamic cover adjusted accordingly. The preset probability adjustment policy may be set according to actual application requirements, which the present disclosure does not limit.
Illustratively, again take the four candidate dynamic covers A, B, C, and D, each with an initial preset exposure probability of 0.25. If candidate dynamic cover A is determined to be the target dynamic cover, its preset exposure probability may be increased from 0.25 to 0.7 (the target exposure probability), and the preset exposure probabilities of covers B, C, and D decreased from 0.25 to 0.1 each, so that cover A of the target video is exposed online with probability 0.7 and covers B, C, and D with probability 0.1. Accordingly, after the adjusted preset exposure probabilities are obtained, the probability interval of cover A is adjusted from 0 to 0.25 to become 0 to 0.7, that of cover B from 0.25 to 0.5 to become 0.7 to 0.8, that of cover C from 0.5 to 0.75 to become 0.8 to 0.9, and that of cover D from 0.75 to 1 to become 0.9 to 1. If the number randomly selected from 0 to 1 then falls between 0 and 0.7, cover A is selected as the cover for this exposure of the target video; if between 0.7 and 0.8, cover B; if between 0.8 and 0.9, cover C; and if between 0.9 and 1, cover D. In this way the target video can obtain the highest possible exposure gain after exposure, and its user click rate during online exposure can be increased. The above examples are only for illustration, and the present disclosure is not limited thereto.
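The probability adjustment and interval rebuilding described above can be sketched as follows (illustrative only; the helper names, and the even split of the remaining probability among the non-target covers, are assumptions, since the disclosure leaves the adjustment policy open):

```python
def boost_target(probs, target, target_prob):
    """Raise the target cover's exposure probability to target_prob and
    split the remaining probability mass evenly among the other covers."""
    others = [c for c in probs if c != target]
    leftover = (1.0 - target_prob) / len(others)
    return {c: (target_prob if c == target else leftover) for c in probs}

def to_intervals(probs):
    """Rebuild the cumulative probability intervals after adjustment."""
    intervals, lo = {}, 0.0
    for cover, p in probs.items():
        intervals[cover] = (lo, lo + p)
        lo += p
    return intervals

probs = boost_target({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}, "A", 0.7)
# probs["A"] == 0.7, the others are approximately 0.1 each
intervals = to_intervals(probs)
# cover A now occupies roughly the interval 0 to 0.7, B 0.7 to 0.8, etc.
```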
By adopting the method, a plurality of preset dynamic covers of the target video whose dynamic cover is to be determined are first obtained, each preset dynamic cover comprising a plurality of frames of images extracted from the target video. The preset dynamic covers are then taken as input to a pre-trained video click rate estimation model to obtain the estimated click rate corresponding to each preset dynamic cover, so that the target dynamic cover corresponding to the target video can be determined from the preset dynamic covers according to the estimated click rates. Because the target dynamic cover is determined according to video click rate, the method identifies the part of the video most likely to attract user clicks, and using that part as the cover of the target video can therefore significantly improve the user click rate of the target video. In addition, compared with strongly purpose-specific approaches such as target detection and behavior recognition, determining the target dynamic cover according to video click rate generalizes better and applies to a wider range of scenarios; moreover, the labeled sample data used for training is very easy to obtain and requires no manual annotation, which saves the cost of acquiring sample data.
Fig. 3 is a block diagram illustrating an apparatus for determining a dynamic cover of a video, according to an exemplary embodiment. As shown in fig. 3, the apparatus comprises:
an obtaining module 301, configured to obtain multiple preset dynamic covers of a target video, where the preset dynamic covers include multiple frames of images extracted from the target video;
a first determining module 302, configured to use a plurality of preset dynamic covers as an input of a pre-trained video click rate estimation model, so as to obtain an estimated click rate corresponding to each preset dynamic cover;
the second determining module 303 is configured to determine a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rate.
Optionally, the second determining module 303 is configured to select a preset number of the preset dynamic covers with the highest estimated click rate from a plurality of the preset dynamic covers as candidate dynamic covers; respectively carrying out on-line exposure on each candidate dynamic cover according to a preset exposure probability; aiming at each candidate dynamic cover, acquiring the actual click rate of the candidate dynamic cover when the exposure times reach a first preset exposure time threshold; and taking the candidate dynamic cover with the highest actual click rate as a target dynamic cover of the target video.
Optionally, fig. 4 is a block diagram of an apparatus for determining a video dynamic cover according to the embodiment shown in fig. 3, and as shown in fig. 4, the apparatus further includes:
a probability adjusting module 304, configured to increase the preset exposure probability corresponding to the target dynamic cover to a target exposure probability;
and an exposure module 305, configured to perform online exposure on the target dynamic cover according to the target exposure probability.
Optionally, the obtaining module 301 is configured to cut the target video according to different preset time intervals to obtain a plurality of frame image sets, or cut the target video according to different preset frame intervals to obtain a plurality of frame image sets, where each frame image set includes multiple frames of images; and determining a plurality of preset dynamic covers according to the frame image sets, wherein the frame image sets correspond to the preset dynamic covers one by one.
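The cutting of the target video at different preset intervals can be sketched in terms of frame indices as follows (illustrative only; the function name is hypothetical, and an actual implementation would decode frames with a video library such as OpenCV, where a preset frame interval would be used as the step directly):

```python
def frame_index_sets(total_frames, fps, preset_intervals_s):
    """Group frame indices into one frame image set per preset time
    interval: an interval of t seconds keeps one frame every t*fps
    frames, and each set corresponds to one preset dynamic cover."""
    sets = {}
    for interval in preset_intervals_s:
        step = max(1, int(interval * fps))
        sets[interval] = list(range(0, total_frames, step))
    return sets

# e.g. a 10-second clip at 25 fps, cut at 1 s and 2 s preset intervals
sets = frame_index_sets(250, 25, [1, 2])
# sets[1] holds frames 0, 25, 50, ...; sets[2] holds frames 0, 50, 100, ...
```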
Optionally, the video click rate prediction model is obtained by training in the following manner: acquiring a plurality of exposed dynamic covers meeting preset exposure conditions and user click rates corresponding to the exposed dynamic covers respectively; and performing model training by taking the exposed dynamic cover and the user click rate as training samples to obtain the video click rate estimation model.
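The training procedure can be sketched with a toy model as follows (illustrative only; the disclosure does not fix a model architecture, so a single-layer logistic regressor trained on hypothetical per-cover feature vectors stands in for the video click rate estimation model):

```python
import math

def train_ctr_model(samples, epochs=500, lr=0.1):
    """Fit a tiny logistic model mapping cover features to click rate.
    `samples` is a list of (feature_vector, observed_ctr) pairs gathered
    from exposed covers that satisfied the preset exposure condition."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted click rate
            g = p - y                        # gradient of the loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict_ctr(model, x):
    """Estimated click rate for a preset dynamic cover's features."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# hypothetical training pairs: one-dimensional features and observed CTRs
model = train_ctr_model([([1.0], 0.8), ([0.0], 0.2)])
```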
Optionally, the preset exposure condition includes: the number of exposures is greater than or equal to a second preset exposure time threshold; or the exposure duration is greater than or equal to a preset exposure duration threshold.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
By adopting the device, a plurality of preset dynamic covers of the target video are obtained, each comprising a plurality of frames of images extracted from the target video; the preset dynamic covers are taken as input to a pre-trained video click rate estimation model to obtain the estimated click rate corresponding to each preset dynamic cover; and the target dynamic cover corresponding to the target video is determined from the preset dynamic covers according to the estimated click rates. Determining the target dynamic cover according to video click rate in this way identifies the part of the video most likely to attract user clicks.
Fig. 5 is a block diagram illustrating an electronic device 500 in accordance with an example embodiment. As shown in fig. 5, the electronic device 500 may include: a processor 501 and a memory 502. The electronic device 500 may also include one or more of a multimedia component 503, an input/output (I/O) interface 504, and a communication component 505.
The processor 501 is configured to control the overall operation of the electronic device 500 to complete all or part of the steps in the above method for determining a dynamic cover of a video. The memory 502 is used to store various types of data to support operation at the electronic device 500, such as instructions for any application or method operating on the electronic device 500 and application-related data, such as contact data, messages, pictures, audio, and video. The Memory 502 may be implemented by any type of volatile or non-volatile Memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, or a magnetic or optical disk. The multimedia component 503 may include a screen and an audio component, where the screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may further be stored in the memory 502 or transmitted through the communication component 505. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 504 provides an interface between the processor 501 and other interface modules, such as a keyboard, mouse, or buttons; these buttons may be virtual buttons or physical buttons. The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of them, which is not limited herein; the corresponding communication component 505 may accordingly comprise a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic Device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components for performing the above-described method of determining a dynamic cover of a video.
In another exemplary embodiment, a computer readable storage medium is also provided, comprising program instructions which, when executed by a processor, implement the steps of the above method for determining a dynamic cover of a video. For example, the computer readable storage medium may be the memory 502 described above, which includes program instructions executable by the processor 501 of the electronic device 500 to perform the above method for determining a dynamic cover of a video.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned method of determining video dynamic covers when executed by the programmable apparatus.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner; to avoid unnecessary repetition, the possible combinations are not separately described in the present disclosure.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (10)

1. A method for determining a dynamic cover of a video, the method comprising:
acquiring a plurality of preset dynamic covers of a target video, wherein the preset dynamic covers comprise a plurality of frames of images extracted from the target video;
taking a plurality of preset dynamic covers as input of a pre-trained video click rate estimation model to obtain an estimated click rate corresponding to each preset dynamic cover;
and determining a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rate.
2. The method of claim 1, wherein the determining a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click through rate comprises:
selecting a preset number of preset dynamic covers with the highest estimated click rate from a plurality of preset dynamic covers as candidate dynamic covers;
respectively carrying out on-line exposure on each candidate dynamic cover according to a preset exposure probability;
aiming at each candidate dynamic cover, acquiring the actual click rate of the candidate dynamic cover when the exposure times reach a first preset exposure time threshold;
and taking the candidate dynamic cover with the highest actual click rate as a target dynamic cover of the target video.
3. The method of claim 2, wherein after the step of using the candidate dynamic cover with the highest actual click-through rate as the target dynamic cover of the target video, the method further comprises:
increasing the preset exposure probability corresponding to the target dynamic cover to a target exposure probability;
and carrying out on-line exposure on the target dynamic cover according to the target exposure probability.
4. The method of claim 1, wherein the obtaining the plurality of preset dynamic covers of the target video comprises:
cutting the target video according to different preset time intervals to obtain a plurality of frame image sets, or cutting the target video according to different preset frame intervals to obtain a plurality of frame image sets, wherein the frame image sets comprise a plurality of frame images;
and determining a plurality of preset dynamic covers according to the plurality of frame image sets, wherein the frame image sets correspond to the preset dynamic covers one to one.
5. The method of any one of claims 1 to 4, wherein the video click through rate prediction model is trained by:
acquiring a plurality of exposed dynamic covers meeting preset exposure conditions and user click rates corresponding to the exposed dynamic covers respectively;
and performing model training by taking the exposed dynamic cover and the user click rate as training samples to obtain the video click rate estimation model.
6. The method of claim 5, wherein the preset exposure condition comprises:
the number of exposures is greater than or equal to a second preset exposure time threshold; or,
the exposure duration is greater than or equal to a preset exposure duration threshold.
7. An apparatus for determining a dynamic cover of a video, the apparatus comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a plurality of preset dynamic covers of a target video, and the preset dynamic covers comprise a plurality of frames of images extracted from the target video;
the first determining module is used for taking a plurality of preset dynamic covers as the input of a pre-trained video click rate estimation model to obtain the estimation click rate corresponding to each preset dynamic cover;
and the second determining module is used for determining a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rate.
8. The apparatus of claim 7, wherein the second determining module is configured to select a preset number of the preset dynamic covers with the highest estimated click rate from a plurality of the preset dynamic covers as candidate dynamic covers; respectively carrying out on-line exposure on each candidate dynamic cover according to a preset exposure probability; aiming at each candidate dynamic cover, acquiring the actual click rate of the candidate dynamic cover when the exposure times reach a first preset exposure time threshold; and taking the candidate dynamic cover with the highest actual click rate as a target dynamic cover of the target video.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 6.
CN202010575535.7A 2020-06-22 2020-06-22 Method and device for determining dynamic cover of video, storage medium and electronic equipment Pending CN111984821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010575535.7A CN111984821A (en) 2020-06-22 2020-06-22 Method and device for determining dynamic cover of video, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111984821A true CN111984821A (en) 2020-11-24

Family

ID=73442262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010575535.7A Pending CN111984821A (en) 2020-06-22 2020-06-22 Method and device for determining dynamic cover of video, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111984821A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528075A (en) * 2020-12-02 2021-03-19 北京奇艺世纪科技有限公司 Video cover generation method and device
CN112689187A (en) * 2020-12-17 2021-04-20 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN112800276A (en) * 2021-01-20 2021-05-14 北京有竹居网络技术有限公司 Video cover determination method, device, medium and equipment
CN113157973A (en) * 2021-03-29 2021-07-23 广州市百果园信息技术有限公司 Method, device, equipment and medium for generating cover
CN113656642A (en) * 2021-08-20 2021-11-16 北京百度网讯科技有限公司 Cover image generation method, device, equipment, storage medium and program product
CN114996553A (en) * 2022-05-13 2022-09-02 阿里巴巴(中国)有限公司 Dynamic video cover generation method

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104244113A (en) * 2014-10-08 2014-12-24 中国科学院自动化研究所 Method for generating video abstract on basis of deep learning technology
CN107958030A (en) * 2017-11-17 2018-04-24 北京奇虎科技有限公司 Video front cover recommended models optimization method and device
CN107995536A (en) * 2017-11-28 2018-05-04 百度在线网络技术(北京)有限公司 A kind of method, apparatus, equipment and computer-readable storage medium for extracting video preview
CN108650524A (en) * 2018-05-23 2018-10-12 腾讯科技(深圳)有限公司 Video cover generation method, device, computer equipment and storage medium
CN109165301A (en) * 2018-09-13 2019-01-08 北京字节跳动网络技术有限公司 Video cover selection method, device and computer readable storage medium
US20190163336A1 (en) * 2017-11-28 2019-05-30 Baidu Online Network Technology (Beijing) Co., Ltd. Video displaying method and apparatus, device and computer storage medium
CN109862432A (en) * 2019-01-31 2019-06-07 厦门美图之家科技有限公司 Clicking rate prediction technique and device
CN110191357A (en) * 2019-06-28 2019-08-30 北京奇艺世纪科技有限公司 The excellent degree assessment of video clip, dynamic seal face generate method and device
CN110263213A (en) * 2019-05-22 2019-09-20 腾讯科技(深圳)有限公司 Video pushing method, device, computer equipment and storage medium
US20190303682A1 (en) * 2018-03-27 2019-10-03 International Business Machines Corporation Automatic video summary generation
CN110516749A (en) * 2019-08-29 2019-11-29 网易传媒科技(北京)有限公司 Model training method, method for processing video frequency, device, medium and calculating equipment
CN110798752A (en) * 2018-08-03 2020-02-14 北京京东尚科信息技术有限公司 Method and system for generating video summary
CN111277892A (en) * 2020-01-20 2020-06-12 北京百度网讯科技有限公司 Method, apparatus, server and medium for selecting video clip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI RUNZE: "Video Cover Extraction Algorithm Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, 15 August 2019 (2019-08-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528075A (en) * 2020-12-02 2021-03-19 北京奇艺世纪科技有限公司 Video cover generation method and device
CN112689187A (en) * 2020-12-17 2021-04-20 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN112800276A (en) * 2021-01-20 2021-05-14 北京有竹居网络技术有限公司 Video cover determination method, device, medium and equipment
CN112800276B (en) * 2021-01-20 2023-06-20 北京有竹居网络技术有限公司 Video cover determining method, device, medium and equipment
CN113157973A (en) * 2021-03-29 2021-07-23 广州市百果园信息技术有限公司 Method, device, equipment and medium for generating cover
CN113656642A (en) * 2021-08-20 2021-11-16 北京百度网讯科技有限公司 Cover image generation method, device, equipment, storage medium and program product
CN113656642B (en) * 2021-08-20 2024-05-28 北京百度网讯科技有限公司 Cover image generation method, device, apparatus, storage medium and program product
CN114996553A (en) * 2022-05-13 2022-09-02 阿里巴巴(中国)有限公司 Dynamic video cover generation method
WO2023217194A1 (en) * 2022-05-13 2023-11-16 阿里巴巴(中国)有限公司 Dynamic video cover generation method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination