CN111984821A - Method and device for determining dynamic cover of video, storage medium and electronic equipment - Google Patents

Method and device for determining dynamic cover of video, storage medium and electronic equipment

Info

Publication number
CN111984821A
CN111984821A (application number CN202010575535.7A)
Authority
CN
China
Prior art keywords
preset
dynamic
video
target
covers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010575535.7A
Other languages
Chinese (zh)
Inventor
郑多如
彭冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hanhai Information Technology Shanghai Co Ltd
Original Assignee
Hanhai Information Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hanhai Information Technology Shanghai Co Ltd filed Critical Hanhai Information Technology Shanghai Co Ltd
Priority to CN202010575535.7A priority Critical patent/CN111984821A/en
Publication of CN111984821A publication Critical patent/CN111984821A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 — Information retrieval of video data
    • G06F16/74 — Browsing; Visualisation therefor
    • G06F16/745 — Browsing or visualising the internal structure of a single video sequence
    • G06F16/73 — Querying
    • G06F16/735 — Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a method, an apparatus, a storage medium, and an electronic device for determining a dynamic cover of a video. A plurality of preset dynamic covers of a target video are obtained, each comprising multiple frames of images extracted from the target video. The preset dynamic covers are fed into a pre-trained video click-rate estimation model to obtain an estimated click rate for each preset dynamic cover, and a target dynamic cover for the target video is then determined from the preset dynamic covers according to the estimated click rates. Because the key frame images forming the target dynamic cover are chosen according to video click rate, the portion of the video most likely to attract clicks is obtained, so using the target dynamic cover as the cover of the target video can significantly improve the video's user click rate.

Description

Method and device for determining dynamic cover of video, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of video cover selection, and in particular, to a method and an apparatus for determining a dynamic cover of a video, a storage medium, and an electronic device.
Background
In information-flow recommendation systems, videos are more entertaining and content-rich than pictures and text. Video key frames serve as a summary of a video, presenting its content in condensed form; extracting good key frames to use as a dynamic video cover attracts users to click, which in turn improves the click-through rate (CTR) of the information flow.
In the related art, target detection and behavior recognition based on deep learning models can produce a highly purposive set of video key frames to serve as a video's dynamic cover, for example by detecting human bodies or gestures in the video. However, because it is so narrowly targeted, this approach applies only to a limited range of scenes; moreover, training data for target detection and behavior recognition requires extensive manual labeling, which raises the cost of obtaining sample data.
Disclosure of Invention
An object of the present disclosure is to provide a method, an apparatus, a storage medium, and an electronic device for determining a dynamic cover of a video.
In a first aspect, the present disclosure provides a method of determining a dynamic cover of a video, the method comprising: acquiring a plurality of preset dynamic covers of a target video, wherein each preset dynamic cover comprises multiple frames of images extracted from the target video; feeding the plurality of preset dynamic covers into a pre-trained video click-rate estimation model to obtain an estimated click rate corresponding to each preset dynamic cover; and determining a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rates.
Optionally, determining the target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rates includes: selecting, from the plurality of preset dynamic covers, a preset number of preset dynamic covers with the highest estimated click rates as candidate dynamic covers; exposing each candidate dynamic cover online according to a preset exposure probability; for each candidate dynamic cover, acquiring its actual click rate once its exposure count reaches a first preset exposure-count threshold; and taking the candidate dynamic cover with the highest actual click rate as the target dynamic cover of the target video.
Optionally, after the candidate dynamic cover with the highest actual click rate is taken as the target dynamic cover of the target video, the method further includes: increasing the preset exposure probability corresponding to the target dynamic cover to a target exposure probability; and exposing the target dynamic cover online according to the target exposure probability.
Optionally, acquiring the plurality of preset dynamic covers of the target video includes: cutting the target video according to different preset time intervals to obtain a plurality of frame image sets, or cutting the target video according to different preset frame intervals to obtain a plurality of frame image sets, wherein each frame image set includes multiple frames of images; and determining a plurality of preset dynamic covers from the plurality of frame image sets, the frame image sets corresponding one to one with the preset dynamic covers.
Optionally, the video click-rate estimation model is trained as follows: acquiring a plurality of exposed dynamic covers that satisfy a preset exposure condition, together with the user click rate corresponding to each exposed dynamic cover; and performing model training with the exposed dynamic covers and their user click rates as training samples to obtain the video click-rate estimation model.
Optionally, the preset exposure condition includes: the exposure count being greater than or equal to a second preset exposure-count threshold; or the exposure duration being greater than or equal to a preset exposure-duration threshold.
In a second aspect, there is provided an apparatus for determining a dynamic cover of a video, the apparatus comprising: an acquisition module configured to acquire a plurality of preset dynamic covers of a target video, wherein each preset dynamic cover comprises multiple frames of images extracted from the target video; a first determining module configured to feed the plurality of preset dynamic covers into a pre-trained video click-rate estimation model to obtain the estimated click rate corresponding to each preset dynamic cover; and a second determining module configured to determine a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rates.
Optionally, the second determining module is configured to select, from the plurality of preset dynamic covers, a preset number of preset dynamic covers with the highest estimated click rates as candidate dynamic covers; expose each candidate dynamic cover online according to a preset exposure probability; for each candidate dynamic cover, acquire its actual click rate once its exposure count reaches a first preset exposure-count threshold; and take the candidate dynamic cover with the highest actual click rate as the target dynamic cover of the target video.
Optionally, the apparatus further comprises: a probability adjusting module configured to increase the preset exposure probability corresponding to the target dynamic cover to a target exposure probability; and an exposure module configured to expose the target dynamic cover online according to the target exposure probability.
Optionally, the acquisition module is configured to cut the target video according to different preset time intervals to obtain a plurality of frame image sets, or to cut the target video according to different preset frame intervals to obtain a plurality of frame image sets, where each frame image set includes multiple frames of images; and to determine a plurality of preset dynamic covers from the plurality of frame image sets, the frame image sets corresponding one to one with the preset dynamic covers.
Optionally, the video click-rate estimation model is trained as follows: acquiring a plurality of exposed dynamic covers that satisfy a preset exposure condition, together with the user click rate corresponding to each exposed dynamic cover; and performing model training with the exposed dynamic covers and their user click rates as training samples to obtain the video click-rate estimation model.
Optionally, the preset exposure condition includes: the exposure count being greater than or equal to a second preset exposure-count threshold; or the exposure duration being greater than or equal to a preset exposure-duration threshold.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method according to the first aspect of the disclosure.
In a fourth aspect, an electronic device is provided, comprising: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the method of the first aspect of the disclosure.
According to the technical scheme above, a plurality of preset dynamic covers of a target video are obtained, each comprising multiple frames of images extracted from the target video; the preset dynamic covers are fed into a pre-trained video click-rate estimation model to obtain an estimated click rate for each; and a target dynamic cover corresponding to the target video is determined from the preset dynamic covers according to the estimated click rates. Determining the target dynamic cover in this way obtains the portion of the video most likely to attract user clicks, so using the target dynamic cover as the cover of the target video can significantly improve the video's user click rate. In addition, compared with highly purposive target detection and behavior recognition, determining the target dynamic cover according to video click rate generalizes well and applies to a wider range of scenes.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart illustrating a first method of determining a dynamic cover of a video according to an exemplary embodiment;
FIG. 2 is a flow chart illustrating a second method of determining a dynamic cover of a video according to an exemplary embodiment;
FIG. 3 is a block diagram illustrating a first apparatus for determining a dynamic cover of a video according to an exemplary embodiment;
FIG. 4 is a block diagram illustrating a second apparatus for determining a dynamic cover of a video according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating the structure of an electronic device according to an exemplary embodiment.
Detailed Description
Specific embodiments of the present disclosure are described in detail below in connection with the accompanying drawings. It should be understood that the detailed description and specific examples are given by way of illustration and explanation only and do not limit the disclosure.
First, an application scene of the present disclosure is introduced. The disclosure is mainly applied to selecting a dynamic cover for a video in an information-flow recommendation system. In such a system, videos are more entertaining and content-rich than pictures and text; for example, videos account for nearly 20% of exposures in the information-flow recommendation system of one review application. Video key frames serve as a summary of a video, presenting its content in condensed form, and excellent key frames extracted as a dynamic video cover are more certain to attract users to click, which in turn improves the information-flow CTR.
In the related art, one approach selects video key frames as a video's dynamic cover according to a preset rule, for example taking seconds 1 to 3 of the video; but such a rule is not necessarily the optimal way to extract key frames, and potentially excellent key frames are lost. Another approach defines a mathematical key-frame extraction model based on topological potential and norms, computes a score for each key-frame set with the model, and uses the highest-scoring set as the video's dynamic cover; but video content is rich, and a mathematical model cannot possibly describe everything in a video, so this approach makes poor use of the rich information a video contains. A further related-art approach defines an information-content index for video key frames based on adaptive clustering and then selects a candidate key-frame set according to that index; the difficulty with this clustering approach is how to define the index, which is generally characterized by simple features such as hue, saturation, brightness, texture, and inter-frame similarity, features that cannot reflect the more abstract content of a video. In short, video key frames obtained by these approaches, when used as a video's dynamic cover, fail to reflect the rich information in the video and thus hurt the video's user click rate. To improve the click rate, the prior art also performs target detection and behavior recognition based on deep learning models to obtain a highly purposive key-frame set as the dynamic cover, for example by detecting human bodies or gestures in the video; this can improve the click rate to some extent, but because it is too strong in purposiveness its applicable scenes are narrower, and its training data can only be obtained through extensive manual labeling, which raises the cost of acquiring sample data.
To solve these problems, the present disclosure provides a method, an apparatus, a storage medium, and an electronic device for determining a dynamic cover of a video. First, a plurality of preset dynamic covers of a target video (i.e., the video whose dynamic cover is to be determined) are obtained, each comprising multiple frames of images extracted from the target video. The preset dynamic covers are then fed into a pre-trained video click-rate estimation model to obtain the estimated click rate for each preset dynamic cover, so that the target dynamic cover for the target video can be determined from the preset dynamic covers according to the estimated click rates. Because the target dynamic cover is determined according to video click rate, this method obtains the portion of the video that most attracts users to click, and using the target dynamic cover as the cover of the target video can significantly improve the video's user click rate. Moreover, compared with highly purposive target detection and behavior recognition, determining the target dynamic cover according to video click rate generalizes well and applies to a wider range of scenes, and the training labels require no manual annotation and are very easy to obtain, saving the cost of acquiring sample data.
Specific embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
FIG. 1 is a flow chart illustrating a method of determining a dynamic cover of a video according to an exemplary embodiment. As shown in FIG. 1, the method includes the following steps:
in step S101, a plurality of preset dynamic covers of a target video are obtained, where the preset dynamic covers include a plurality of frames of images extracted from the target video.
The target video is the video whose dynamic cover is to be determined, and each preset dynamic cover comprises multiple frames of images selected from the target video according to a preset rule.
In this step, the target video may be cut according to different preset time intervals to obtain a plurality of frame image sets; for example, the 0–2 second segment of the target video forms a first frame image set, the 1–3 second segment forms a second frame image set, the 2–4 second segment forms a third frame image set, and so on, until the target video is cut into a plurality of frame image sets. Alternatively, the target video may be cut according to different preset frame intervals; for example, frames 1–4 of the target video form a first frame image set, frames 2–5 form a second frame image set, frames 3–6 form a third frame image set, and so on. Each frame image set includes multiple frames of images, and a plurality of preset dynamic covers can then be determined from the frame image sets, with the frame image sets corresponding one to one to the preset dynamic covers.
In step S102, a plurality of preset dynamic covers are used as input of the pre-trained video click rate estimation model, and an estimated click rate corresponding to each preset dynamic cover is obtained.
The video click-rate estimation model may include an S3DG model for video modeling; S3DG is a variant of the S3D (separable 3D CNN) model with a gating structure added.
In step S103, a target dynamic cover corresponding to the target video is determined from the plurality of preset dynamic covers according to the estimated click rate.
The target dynamic cover may be the dynamic cover with the highest user click rate among the plurality of preset dynamic covers.
In this step, a preset number of preset dynamic covers with the highest estimated click rates can be selected from the plurality of preset dynamic covers as candidate dynamic covers. Each candidate dynamic cover is then exposed online according to a preset exposure probability; for each candidate dynamic cover, its actual click rate is acquired once its exposure count reaches a first preset exposure-count threshold; and the candidate dynamic cover with the highest actual click rate is taken as the target dynamic cover of the target video. The preset exposure probability represents how likely each candidate dynamic cover is to be exposed, and each candidate dynamic cover has its own corresponding preset exposure probability.
With this method, a plurality of preset dynamic covers of the target video are obtained, each comprising multiple frames of images extracted from the target video; the preset dynamic covers are fed into a pre-trained video click-rate estimation model to obtain the estimated click rate for each; and the target dynamic cover corresponding to the target video is determined from the preset dynamic covers according to the estimated click rates. By determining the target dynamic cover according to video click rate, the portion of the video most likely to attract user clicks is obtained.
FIG. 2 is a flow chart illustrating another method of determining a dynamic cover of a video according to an exemplary embodiment. As shown in FIG. 2, the method includes the following steps:
in step S201, the target video is cut according to different preset time intervals to obtain a plurality of frame image sets, or the target video is cut according to different preset frame intervals to obtain a plurality of frame image sets.
Wherein the frame image set comprises a plurality of frame images.
When cutting the target video according to different preset time intervals, the target video may be divided into different video segments, each segment being regarded as one frame image set; for example, the 0–2 second segment of the target video is the first frame image set, the 1–3 second segment is the second frame image set, the 2–4 second segment is the third frame image set, and so on, until the target video is cut into a plurality of frame image sets.
When cutting the target video according to different preset frame intervals, the target video may likewise be divided into different video segments, each regarded as one frame image set; for example, the segment corresponding to frames 1–4 of the target video forms the first frame image set, the segment corresponding to frames 2–5 forms the second frame image set, the segment corresponding to frames 3–6 forms the third frame image set, and so on, until the target video is cut into a plurality of frame image sets.
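The sliding-window cutting described above can be sketched as follows. This is only an illustrative sketch: the function names, window sizes, and strides are assumptions, not values given in the patent.

```python
def cut_by_time_interval(duration_s, window_s, stride_s):
    """Sliding time window over a video, e.g. seconds 0-2, 1-3, 2-4, ..."""
    segments, start = [], 0
    while start + window_s <= duration_s:
        segments.append((start, start + window_s))
        start += stride_s
    return segments

def cut_by_frame_interval(num_frames, window, stride):
    """Sliding frame window, e.g. frames 1-4, 2-5, 3-6 (0-indexed here)."""
    sets, start = [], 0
    while start + window <= num_frames:
        sets.append(list(range(start, start + window)))
        start += stride
    return sets
```

Each returned segment or frame list would correspond to one frame image set, and hence to one preset dynamic cover.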
In step S202, a plurality of preset dynamic covers are determined from the plurality of frame image sets.
Each preset dynamic cover comprises multiple frames of images selected from the target video according to a preset rule, and the frame image sets correspond one to one with the preset dynamic covers. For example, the frame image set corresponding to seconds 0–2 of the target video is the first preset dynamic cover, the set corresponding to seconds 1–3 is the second preset dynamic cover, the set corresponding to seconds 2–4 is the third preset dynamic cover, and so on, yielding a plurality of preset dynamic covers for the target video. This is only an example and does not limit the present disclosure.
In step S203, a plurality of preset dynamic covers are used as input of the pre-trained video click rate estimation model, so as to obtain the estimated click rate corresponding to each preset dynamic cover.
The video click-rate estimation model may include an S3DG model for video modeling; S3DG is a variant of the S3D model with a gating structure added.
It should be noted that the video click-rate estimation model yields the estimated click rate of a whole dynamic cover composed of multiple frames of images, so the target dynamic cover of the target video is determined with the estimated click rate as guidance. In this way, after the target video is exposed online with the target dynamic cover as its cover, the user click rate of the target video can be improved.
In addition, the video click-rate estimation model can be trained as follows. First, a plurality of exposed dynamic covers satisfying a preset exposure condition, together with the user click rate corresponding to each, are acquired, for example from the information-flow exposure database of the information-flow recommendation system. Then model training is performed with the exposed dynamic covers and user click rates as training samples to obtain the video click-rate estimation model. Specifically, the exposed dynamic covers serve as the input samples and the user click rates as the output samples (i.e., the training labels), and the loss function may be the MSE (mean squared error) function; for the training procedure itself, reference may be made to the related art, and details are not repeated here.
The preset exposure condition may include the exposure count being greater than or equal to a second preset exposure-count threshold, or the exposure duration being greater than or equal to a preset exposure-duration threshold. For example, exposed dynamic covers with at least 1000 exposures, or with an exposure duration of at least 5 days, may be selected as input samples for model training.
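The sample filtering and the MSE loss just described can be sketched as below. The dictionary keys and default thresholds are illustrative assumptions; the patent does not name a data schema.

```python
def filter_training_samples(samples, min_exposures=1000, min_days=5):
    """Keep exposed covers satisfying either preset exposure condition:
    enough exposures, or a long enough exposure duration."""
    return [s for s in samples
            if s["exposures"] >= min_exposures or s["exposure_days"] >= min_days]

def mse_loss(predicted, actual):
    """Mean squared error between predicted and observed click rates."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)
```

The filtered covers would serve as input samples and their observed click rates as training labels, with `mse_loss` as the training objective.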
In step S204, a preset number of preset dynamic covers with the highest estimated click rates are selected from the preset dynamic covers as candidate dynamic covers.
After the candidate dynamic covers are screened out according to the estimated click rates, each candidate dynamic cover can enter the information-flow recommendation system as a video creative cover. An explore-and-exploit mechanism can then give each candidate dynamic cover sufficient exposure, with the covers competing to become the target dynamic cover, which may be the cover with the highest click rate among the preset dynamic covers. In this embodiment, the target dynamic cover can be determined by performing steps S205 to S207.
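The top-k screening of step S204 reduces to sorting by estimated click rate; a minimal sketch, with a hypothetical helper name, follows.

```python
def select_candidates(covers, estimated_ctr, k):
    """Return the k preset dynamic covers with the highest estimated click rates."""
    ranked = sorted(zip(covers, estimated_ctr), key=lambda pair: pair[1], reverse=True)
    return [cover for cover, _ in ranked[:k]]
```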
In step S205, each of the candidate dynamic covers is exposed on-line according to a preset exposure probability.
The preset exposure probability represents how likely each candidate dynamic cover is to be exposed, and each candidate dynamic cover has its own corresponding preset exposure probability.
In one possible implementation of this step, for each candidate dynamic cover, a probability interval corresponding to that cover may be determined from its preset exposure probability. Before each online exposure of the target video, a number between 0 and 1 is chosen at random, and the probability interval containing that number is taken as the target probability interval; according to the correspondence between probability intervals and candidate dynamic covers, the candidate dynamic cover corresponding to the target probability interval is then used as the cover for this exposure of the target video.
For example, suppose four candidate dynamic covers A, B, C, and D are determined after step S204, each with a preset exposure probability of 0.25. In one possible implementation, the probability interval for cover A is 0 to 0.25, for cover B 0.25 to 0.5, for cover C 0.5 to 0.75, and for cover D 0.75 to 1. Before the target video is exposed, a number between 0 and 1 is chosen at random: if it falls between 0 and 0.25, cover A is used for this exposure; between 0.25 and 0.5, cover B; between 0.5 and 0.75, cover C; and between 0.75 and 1, cover D. This is only an example and does not limit the present disclosure.
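The probability-interval sampling in the example above can be sketched as follows, assuming the per-cover probabilities sum to 1 (function names are hypothetical):

```python
import random

def build_intervals(cover_probs):
    """Map each cover's exposure probability to a half-open interval in [0, 1)."""
    intervals, lo = [], 0.0
    for cover, p in cover_probs:
        intervals.append((cover, lo, lo + p))
        lo += p
    return intervals

def pick_cover(intervals, rng=random.random):
    """Draw a number in [0, 1) and return the cover whose interval contains it."""
    r = rng()
    for cover, lo, hi in intervals:
        if lo <= r < hi:
            return cover
    return intervals[-1][0]  # guard against floating-point edge cases
```

Calling `pick_cover` before each exposure reproduces the selection rule of the example: a draw of 0.3 selects cover B, a draw of 0.8 selects cover D.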
In step S206, for each candidate dynamic cover, the actual click rate of that candidate dynamic cover when its number of exposures reaches the first preset exposure time threshold is obtained.
The first preset exposure time threshold may be set empirically as the number of exposures at which the actual click rate reaches convergence; for example, it may be set to 300.
In step S207, the candidate dynamic cover with the highest actual click rate is used as the target dynamic cover of the target video.
Continuing the example from step S205, the candidate dynamic covers are the four covers A, B, C, and D. Suppose that after step S206 is executed, the actual click rates when each cover's exposure count reaches 300 are 0.65 for cover A, 0.32 for cover B, 0.15 for cover C, and 0.05 for cover D. Candidate dynamic cover A, having the highest actual click rate, may then be determined to be the target dynamic cover.
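Selecting the target dynamic cover from the measured click rates is a simple arg-max; an illustrative sketch using the example figures above (the function name is hypothetical):

```python
def select_target_cover(actual_ctr):
    """Return the candidate dynamic cover whose measured
    click-through rate is highest (step S207)."""
    return max(actual_ctr, key=actual_ctr.get)

# actual click rates measured once each cover reached 300 exposures
ctr = {"A": 0.65, "B": 0.32, "C": 0.15, "D": 0.05}
target = select_target_cover(ctr)  # -> "A"
```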
In addition, in order to obtain the highest possible exposure gain after the target video is exposed and to improve its user click rate, in one possible implementation, after the candidate dynamic cover with the highest actual click rate is taken as the target dynamic cover of the target video, the preset exposure probability corresponding to the target dynamic cover may be increased to a target exposure probability, and the target dynamic cover is then exposed online with the target exposure probability. Correspondingly, the preset exposure probabilities of the other candidate dynamic covers (those other than the target dynamic cover) may be reduced, and those covers exposed online with the reduced preset exposure probabilities.
Specifically, the preset exposure probability corresponding to the target dynamic cover may be increased to the target exposure probability according to a preset probability adjustment policy, the preset exposure probabilities of the other candidate dynamic covers decreased, and the probability interval corresponding to each candidate dynamic cover adjusted accordingly. The preset probability adjustment policy may be set according to actual application requirements, which the present disclosure does not limit.
Illustratively, again take the four candidate dynamic covers A, B, C, and D, each with an initial preset exposure probability of 0.25. If candidate dynamic cover A is determined to be the target dynamic cover, its preset exposure probability may be increased from 0.25 to 0.7 (the target exposure probability), and the preset exposure probabilities of covers B, C, and D decreased from 0.25 to 0.1 each, so that cover A of the target video is exposed online with probability 0.7 and covers B, C, and D with probability 0.1. Accordingly, after the adjusted preset exposure probabilities are obtained, the probability interval of cover A is adjusted from 0 to 0.25 to become 0 to 0.7, that of cover B from 0.25 to 0.5 to become 0.7 to 0.8, that of cover C from 0.5 to 0.75 to become 0.8 to 0.9, and that of cover D from 0.75 to 1 to become 0.9 to 1. If the number randomly selected from 0 to 1 then falls between 0 and 0.7, cover A is selected as the cover for this exposure of the target video; if between 0.7 and 0.8, cover B; if between 0.8 and 0.9, cover C; and if between 0.9 and 1, cover D. In this way the target video can obtain the highest possible exposure gain after exposure, and its user click rate during online exposure can be increased. The above examples are only for illustration, and the present disclosure is not limited thereto.
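The probability adjustment and interval rebuilding described above can be sketched as follows (illustrative only; the helper names, and the even split of the remaining probability among the non-target covers, are assumptions, since the disclosure leaves the adjustment policy open):

```python
def boost_target(probs, target, target_prob):
    """Raise the target cover's exposure probability to target_prob and
    split the remaining probability mass evenly among the other covers."""
    others = [c for c in probs if c != target]
    leftover = (1.0 - target_prob) / len(others)
    return {c: (target_prob if c == target else leftover) for c in probs}

def to_intervals(probs):
    """Rebuild the cumulative probability intervals after adjustment."""
    intervals, lo = {}, 0.0
    for cover, p in probs.items():
        intervals[cover] = (lo, lo + p)
        lo += p
    return intervals

probs = boost_target({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}, "A", 0.7)
# probs["A"] == 0.7, the others are approximately 0.1 each
intervals = to_intervals(probs)
# cover A now occupies roughly the interval 0 to 0.7, B 0.7 to 0.8, etc.
```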
By adopting the method, a plurality of preset dynamic covers of the target video whose dynamic cover is to be determined are first obtained, each preset dynamic cover comprising a plurality of frames of images extracted from the target video. The preset dynamic covers are then taken as input to a pre-trained video click rate estimation model to obtain the estimated click rate corresponding to each preset dynamic cover, so that the target dynamic cover corresponding to the target video can be determined from the preset dynamic covers according to the estimated click rates. Because the target dynamic cover is determined according to video click rate, the method identifies the part of the video most likely to attract user clicks, and using that part as the cover of the target video can therefore significantly improve the user click rate of the target video. In addition, compared with strongly purpose-specific approaches such as target detection and behavior recognition, determining the target dynamic cover according to video click rate generalizes better and applies to a wider range of scenarios; moreover, the labeled sample data used for training is very easy to obtain and requires no manual annotation, which saves the cost of acquiring sample data.
Fig. 3 is a block diagram illustrating an apparatus for determining a dynamic cover of a video, according to an exemplary embodiment. As shown in fig. 3, the apparatus comprises:
an obtaining module 301, configured to obtain multiple preset dynamic covers of a target video, where the preset dynamic covers include multiple frames of images extracted from the target video;
a first determining module 302, configured to use a plurality of preset dynamic covers as an input of a pre-trained video click rate estimation model, so as to obtain an estimated click rate corresponding to each preset dynamic cover;
the second determining module 303 is configured to determine a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rate.
Optionally, the second determining module 303 is configured to select a preset number of the preset dynamic covers with the highest estimated click rate from a plurality of the preset dynamic covers as candidate dynamic covers; respectively carrying out on-line exposure on each candidate dynamic cover according to a preset exposure probability; aiming at each candidate dynamic cover, acquiring the actual click rate of the candidate dynamic cover when the exposure times reach a first preset exposure time threshold; and taking the candidate dynamic cover with the highest actual click rate as a target dynamic cover of the target video.
Optionally, fig. 4 is a block diagram of an apparatus for determining a video dynamic cover according to the embodiment shown in fig. 3, and as shown in fig. 4, the apparatus further includes:
a probability adjusting module 304, configured to increase the preset exposure probability corresponding to the target dynamic cover to a target exposure probability;
and an exposure module 305, configured to perform online exposure on the target dynamic cover according to the target exposure probability.
Optionally, the obtaining module 301 is configured to cut the target video according to different preset time intervals to obtain a plurality of frame image sets, or cut the target video according to different preset frame intervals to obtain a plurality of frame image sets, where each frame image set includes multiple frames of images; and determining a plurality of preset dynamic covers according to the frame image sets, wherein the frame image sets correspond to the preset dynamic covers one by one.
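The cutting of the target video at different preset intervals can be sketched in terms of frame indices as follows (illustrative only; the function name is hypothetical, and an actual implementation would decode frames with a video library such as OpenCV, where a preset frame interval would be used as the step directly):

```python
def frame_index_sets(total_frames, fps, preset_intervals_s):
    """Group frame indices into one frame image set per preset time
    interval: an interval of t seconds keeps one frame every t*fps
    frames, and each set corresponds to one preset dynamic cover."""
    sets = {}
    for interval in preset_intervals_s:
        step = max(1, int(interval * fps))
        sets[interval] = list(range(0, total_frames, step))
    return sets

# e.g. a 10-second clip at 25 fps, cut at 1 s and 2 s preset intervals
sets = frame_index_sets(250, 25, [1, 2])
# sets[1] holds frames 0, 25, 50, ...; sets[2] holds frames 0, 50, 100, ...
```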
Optionally, the video click rate prediction model is obtained by training in the following manner: acquiring a plurality of exposed dynamic covers meeting preset exposure conditions and user click rates corresponding to the exposed dynamic covers respectively; and performing model training by taking the exposed dynamic cover and the user click rate as training samples to obtain the video click rate estimation model.
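The training procedure can be sketched with a toy model as follows (illustrative only; the disclosure does not fix a model architecture, so a single-layer logistic regressor trained on hypothetical per-cover feature vectors stands in for the video click rate estimation model):

```python
import math

def train_ctr_model(samples, epochs=500, lr=0.1):
    """Fit a tiny logistic model mapping cover features to click rate.
    `samples` is a list of (feature_vector, observed_ctr) pairs gathered
    from exposed covers that satisfied the preset exposure condition."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted click rate
            g = p - y                        # gradient of the loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict_ctr(model, x):
    """Estimated click rate for a preset dynamic cover's features."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# hypothetical training pairs: one-dimensional features and observed CTRs
model = train_ctr_model([([1.0], 0.8), ([0.0], 0.2)])
```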
Optionally, the preset exposure condition includes: the number of exposures is greater than or equal to a second preset exposure time threshold; or the exposure duration is greater than or equal to a preset exposure duration threshold.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
By adopting the device, a plurality of preset dynamic covers of the target video are obtained, each comprising a plurality of frames of images extracted from the target video; the preset dynamic covers are taken as input to a pre-trained video click rate estimation model to obtain the estimated click rate corresponding to each preset dynamic cover; and the target dynamic cover corresponding to the target video is determined from the preset dynamic covers according to the estimated click rates. Determining the target dynamic cover according to video click rate in this way identifies the part of the video most likely to attract user clicks.
Fig. 5 is a block diagram illustrating an electronic device 500 in accordance with an example embodiment. As shown in fig. 5, the electronic device 500 may include: a processor 501 and a memory 502. The electronic device 500 may also include one or more of a multimedia component 503, an input/output (I/O) interface 504, and a communication component 505.
The processor 501 is configured to control the overall operation of the electronic device 500 to complete all or part of the steps in the above method for determining a dynamic cover of a video. The memory 502 is used to store various types of data to support operation at the electronic device 500, such as instructions for any application or method operating on the electronic device 500 and application-related data, such as contact data, messages, pictures, audio, and video. The Memory 502 may be implemented by any type of volatile or non-volatile Memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic Memory, flash Memory, or a magnetic or optical disk. The multimedia component 503 may include a screen and an audio component, where the screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; a received audio signal may further be stored in the memory 502 or transmitted through the communication component 505. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 504 provides an interface between the processor 501 and other interface modules, such as a keyboard, mouse, or buttons; these buttons may be virtual buttons or physical buttons. The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of them, which is not limited herein; the corresponding communication component 505 may accordingly comprise a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic Device 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components for performing the above-described method of determining a dynamic cover of a video.
In another exemplary embodiment, a computer readable storage medium is also provided, comprising program instructions which, when executed by a processor, implement the steps of the above method for determining a dynamic cover of a video. For example, the computer readable storage medium may be the memory 502 described above, which includes program instructions executable by the processor 501 of the electronic device 500 to perform the above method for determining a dynamic cover of a video.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned method of determining video dynamic covers when executed by the programmable apparatus.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner; to avoid unnecessary repetition, the possible combinations are not separately described in the present disclosure.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (10)

1. A method for determining a dynamic cover of a video, the method comprising:
acquiring a plurality of preset dynamic covers of a target video, wherein the preset dynamic covers comprise a plurality of frames of images extracted from the target video;
taking a plurality of preset dynamic covers as input of a pre-trained video click rate estimation model to obtain an estimated click rate corresponding to each preset dynamic cover;
and determining a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rate.
2. The method of claim 1, wherein the determining a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click through rate comprises:
selecting a preset number of preset dynamic covers with the highest estimated click rate from a plurality of preset dynamic covers as candidate dynamic covers;
respectively carrying out on-line exposure on each candidate dynamic cover according to a preset exposure probability;
aiming at each candidate dynamic cover, acquiring the actual click rate of the candidate dynamic cover when the exposure times reach a first preset exposure time threshold;
and taking the candidate dynamic cover with the highest actual click rate as a target dynamic cover of the target video.
3. The method of claim 2, wherein after the step of using the candidate dynamic cover with the highest actual click-through rate as the target dynamic cover of the target video, the method further comprises:
increasing the preset exposure probability corresponding to the target dynamic cover to a target exposure probability;
and carrying out on-line exposure on the target dynamic cover according to the target exposure probability.
4. The method of claim 1, wherein the obtaining the plurality of preset dynamic covers of the target video comprises:
cutting the target video according to different preset time intervals to obtain a plurality of frame image sets, or cutting the target video according to different preset frame intervals to obtain a plurality of frame image sets, wherein the frame image sets comprise a plurality of frame images;
and determining a plurality of preset dynamic covers according to the plurality of frame image sets, wherein the frame image sets correspond to the preset dynamic covers one to one.
5. The method of any one of claims 1 to 4, wherein the video click through rate prediction model is trained by:
acquiring a plurality of exposed dynamic covers meeting preset exposure conditions and user click rates corresponding to the exposed dynamic covers respectively;
and performing model training by taking the exposed dynamic cover and the user click rate as training samples to obtain the video click rate estimation model.
6. The method of claim 5, wherein the preset exposure condition comprises:
the number of exposures is greater than or equal to a second preset exposure time threshold; or,
the exposure duration is greater than or equal to a preset exposure duration threshold.
7. An apparatus for determining a dynamic cover of a video, the apparatus comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a plurality of preset dynamic covers of a target video, and the preset dynamic covers comprise a plurality of frames of images extracted from the target video;
the first determining module is used for taking a plurality of preset dynamic covers as the input of a pre-trained video click rate estimation model to obtain the estimation click rate corresponding to each preset dynamic cover;
and the second determining module is used for determining a target dynamic cover corresponding to the target video from the plurality of preset dynamic covers according to the estimated click rate.
8. The apparatus of claim 7, wherein the second determining module is configured to select a preset number of the preset dynamic covers with the highest estimated click rate from a plurality of the preset dynamic covers as candidate dynamic covers; respectively carrying out on-line exposure on each candidate dynamic cover according to a preset exposure probability; aiming at each candidate dynamic cover, acquiring the actual click rate of the candidate dynamic cover when the exposure times reach a first preset exposure time threshold; and taking the candidate dynamic cover with the highest actual click rate as a target dynamic cover of the target video.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 6.
CN202010575535.7A 2020-06-22 2020-06-22 Method and device for determining dynamic cover of video, storage medium and electronic equipment Pending CN111984821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010575535.7A CN111984821A (en) 2020-06-22 2020-06-22 Method and device for determining dynamic cover of video, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111984821A true CN111984821A (en) 2020-11-24

Family

ID=73442262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010575535.7A Pending CN111984821A (en) 2020-06-22 2020-06-22 Method and device for determining dynamic cover of video, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111984821A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528075A (en) * 2020-12-02 2021-03-19 北京奇艺世纪科技有限公司 Video cover generation method and device
CN112689187A (en) * 2020-12-17 2021-04-20 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN112800276A (en) * 2021-01-20 2021-05-14 北京有竹居网络技术有限公司 Video cover determination method, device, medium and equipment
CN113157973A (en) * 2021-03-29 2021-07-23 广州市百果园信息技术有限公司 Method, device, equipment and medium for generating cover
CN113656642A (en) * 2021-08-20 2021-11-16 北京百度网讯科技有限公司 Cover image generation method, device, equipment, storage medium and program product
CN114996553A (en) * 2022-05-13 2022-09-02 阿里巴巴(中国)有限公司 Dynamic video cover generation method

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104244113A (en) * 2014-10-08 2014-12-24 中国科学院自动化研究所 Method for generating video abstract on basis of deep learning technology
CN107958030A (en) * 2017-11-17 2018-04-24 北京奇虎科技有限公司 Video front cover recommended models optimization method and device
CN107995536A (en) * 2017-11-28 2018-05-04 百度在线网络技术(北京)有限公司 A kind of method, apparatus, equipment and computer-readable storage medium for extracting video preview
CN108650524A (en) * 2018-05-23 2018-10-12 腾讯科技(深圳)有限公司 Video cover generation method, device, computer equipment and storage medium
CN109165301A (en) * 2018-09-13 2019-01-08 北京字节跳动网络技术有限公司 Video cover selection method, device and computer readable storage medium
US20190163336A1 (en) * 2017-11-28 2019-05-30 Baidu Online Network Technology (Beijing) Co., Ltd. Video displaying method and apparatus, device and computer storage medium
CN109862432A (en) * 2019-01-31 2019-06-07 厦门美图之家科技有限公司 Clicking rate prediction technique and device
CN110191357A (en) * 2019-06-28 2019-08-30 北京奇艺世纪科技有限公司 The excellent degree assessment of video clip, dynamic seal face generate method and device
CN110263213A (en) * 2019-05-22 2019-09-20 腾讯科技(深圳)有限公司 Video pushing method, device, computer equipment and storage medium
US20190303682A1 (en) * 2018-03-27 2019-10-03 International Business Machines Corporation Automatic video summary generation
CN110516749A (en) * 2019-08-29 2019-11-29 网易传媒科技(北京)有限公司 Model training method, method for processing video frequency, device, medium and calculating equipment
CN110798752A (en) * 2018-08-03 2020-02-14 北京京东尚科信息技术有限公司 Method and system for generating video summary
CN111277892A (en) * 2020-01-20 2020-06-12 北京百度网讯科技有限公司 Method, apparatus, server and medium for selecting video clip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI RUNZE: "Video Cover Extraction Algorithm Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, 15 August 2019 (2019-08-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528075A (en) * 2020-12-02 2021-03-19 北京奇艺世纪科技有限公司 Video cover generation method and device
CN112689187A (en) * 2020-12-17 2021-04-20 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN112800276A (en) * 2021-01-20 2021-05-14 北京有竹居网络技术有限公司 Video cover determination method, device, medium and equipment
CN112800276B (en) * 2021-01-20 2023-06-20 北京有竹居网络技术有限公司 Video cover determining method, device, medium and equipment
CN113157973A (en) * 2021-03-29 2021-07-23 广州市百果园信息技术有限公司 Method, device, equipment and medium for generating cover
CN113656642A (en) * 2021-08-20 2021-11-16 北京百度网讯科技有限公司 Cover image generation method, device, equipment, storage medium and program product
CN113656642B (en) * 2021-08-20 2024-05-28 北京百度网讯科技有限公司 Cover image generation method, device, apparatus, storage medium and program product
CN114996553A (en) * 2022-05-13 2022-09-02 阿里巴巴(中国)有限公司 Dynamic video cover generation method
WO2023217194A1 (en) * 2022-05-13 2023-11-16 阿里巴巴(中国)有限公司 Dynamic video cover generation method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination