CN115086709A - Dynamic cover setting method and system - Google Patents

Dynamic cover setting method and system

Info

Publication number
CN115086709A
Authority
CN
China
Prior art keywords
video
video clip
frame
dynamic cover
bullet screen
Prior art date
Legal status
Pending
Application number
CN202110258999.XA
Other languages
Chinese (zh)
Inventor
时英选
Current Assignee
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd
Priority to CN202110258999.XA
Priority to PCT/CN2022/072819 (WO2022188563A1)
Publication of CN115086709A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/235 Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N 21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H04N 21/26603 Channel or content management for automatically generating descriptors from content, e.g. when it is not made available by its provider, using content analysis techniques
    • H04N 21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/4662 Learning process for intelligent management characterized by learning algorithms
    • H04N 21/4666 Learning process for intelligent management using neural networks, e.g. processing the feedback provided by the user
    • H04N 21/488 Data services, e.g. news ticker
    • H04N 21/4884 Data services for displaying subtitles
    • H04N 21/8549 Creating video summaries, e.g. movie trailer

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Embodiments of the present application provide a method for setting a dynamic cover, including: determining a target video clip from a video file; extracting the target video clip; and obtaining a dynamic cover picture of the video file from the target video clip. The embodiments have the following advantages. First, because a dynamic cover picture is adopted, its animated display produces a better, richer visual effect, improves viewing enjoyment and interest, attracts the attention of other users, and increases the click-through rate of the video file. Second, because the target video clip comes from the video file itself, the dynamic cover picture is strongly correlated with the video content; this improves the user's experience when browsing and selecting videos, prevents the user from mistakenly clicking on and watching video content that does not match expectations because of a cover that does not match the content, and avoids wasting data traffic.

Description

Dynamic cover setting method and system
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a dynamic cover setting method, a dynamic cover setting system, computer equipment and a computer readable storage medium.
Background
With the development of multimedia technology, network platforms such as bilibili have gradually developed a content production mode known as UGC (User Generated Content). In UGC, each user takes the initiative to present original content (e.g., video files) to other users through an internet platform. Because UGC turns every user into a content producer, a massive volume of videos can be produced rapidly, enriching people's cultural life. However, this huge volume also means that any individual user's video files are easily buried among the mass of videos. Therefore, when a user publishes a video file, a video cover is usually set for it so that other users can understand its content more intuitively and its click-through rate can be improved.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method, a system, a computer device, and a computer-readable storage medium for setting a dynamic cover, which are intended to solve the problems of poor user experience and low click-through rates associated with video covers in the prior art.
One aspect of an embodiment of the present application provides a method for dynamic cover setting, the method including determining a target video clip from a video file; and extracting the target video clip, and obtaining a dynamic cover picture of the video file according to the target video clip.
Optionally, the determining a target video segment from a video file includes: acquiring a plurality of barrages of the video file, wherein each barrage is associated with a time point on a time axis of the video file; acquiring bullet screen density distribution on the time axis according to the time point on the time axis associated with each bullet screen; screening out one or more video clips with the highest bullet screen density in the video files according to the bullet screen density distribution; and determining the one or more video clips or the one or more video clips carrying the barrage as the target video clip.
Optionally, the obtaining of the plurality of barrages of the video file includes: acquiring all barrages of the video file; and filtering a plurality of invalid bullet screens out of all the bullet screens according to the bullet screen content of each bullet screen, to obtain the plurality of bullet screens; wherein the plurality of invalid bullet screens include bullet screens whose content is irrelevant to the video content of the video file and/or irrelevant to the video pictures of the video file.
Optionally, determining the target video segment from the video file includes: dividing the video file into M video segments, wherein M is a positive integer greater than 1; performing quality scoring on each video clip; and determining the target video clip from the M video clips according to the quality scores of the video clips.
Optionally, the quality scoring of each video segment includes: according to the bullet screen characteristic information of each video clip and/or the frame characteristic information of each frame in each video clip, performing quality scoring on each video clip; wherein, the bullet screen characteristic information comprises bullet screen density.
Optionally, the quality scoring of each video segment includes: extracting frame feature information of each frame in the ith video clip, wherein i is more than or equal to 1 and less than or equal to M, and i is a positive integer; and according to the frame characteristic information of each frame in the ith video clip, carrying out quality scoring on the ith video clip.
Optionally, the quality scoring the ith video segment according to the frame feature information of each frame in the ith video segment includes: according to the picture characteristic information and the frame characteristic information of each frame, performing quality grading on the ith video clip; the picture characteristic information is characteristic information of a target static picture, and the target static picture comprises a static cover picture of the video file.
Optionally, the quality scoring the ith video segment according to the picture feature information and the frame feature information of each frame includes: sequentially inputting the frame characteristic information of each frame into an LSTM model according to the time sequence of the M frames to obtain M output vectors through the LSTM model, wherein the M output vectors correspond to the M frames one by one; performing convolution and pooling operation on a vector matrix formed by the M output vectors to obtain a first feature vector; obtaining a second feature vector according to the picture feature information; splicing the first feature vector and the second feature vector to obtain a spliced vector; and performing linear regression operation on the splicing vector to obtain a quality score corresponding to the ith video segment.
One aspect of an embodiment of the present application further provides a dynamic cover setting system, including: the determining module is used for determining a target video clip from a video file; and the setting module is used for extracting the target video clip and obtaining the dynamic cover picture of the video file according to the target video clip.
An aspect of the embodiments of the present application further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the dynamic cover setting method as described above.
An aspect of the embodiments of the present application further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the dynamic cover setting method as described above.
The dynamic cover setting method, system, device, and computer-readable storage medium provided by the embodiments of the present application can extract a key or highlight segment (namely, the target video clip) of a video file and obtain the dynamic cover picture from the target video clip, and therefore have the following advantages:
First: because a dynamic cover picture is adopted, its animated display produces a better, richer visual effect, improves viewing enjoyment and interest, attracts the attention of other users, and increases the click-through rate of the video file.
Second: because the target video clip comes from the video file itself, the dynamic cover picture is strongly correlated with the video content; this improves the user's experience when browsing and selecting videos, prevents the user from mistakenly clicking on and watching video content that does not match expectations because of a cover that does not match the content, and avoids wasting data traffic.
Drawings
FIG. 1 schematically illustrates an application environment diagram of a dynamic cover setting method according to an embodiment of the present application;
FIG. 2 is a flow chart that schematically illustrates a method for dynamic cover setting, in accordance with a first embodiment of the present application;
FIG. 3 is a flowchart illustrating sub-steps of step S200 in FIG. 2;
FIG. 4 is a flowchart illustrating another sub-step of step S300 in FIG. 3;
FIG. 5 is an exemplary diagram of implementation of bullet screen screening;
FIG. 6 is a flowchart illustrating another sub-step of step S200 in FIG. 2;
FIG. 7 is a flowchart illustrating sub-steps of step S602 in FIG. 6;
FIG. 8 is a flowchart illustrating another sub-step of step S602 in FIG. 6;
FIG. 9 is a flowchart illustrating another sub-step of step S702 in FIG. 7;
FIG. 10 is a flowchart illustrating sub-steps of step S900 of FIG. 9;
FIG. 11 is an exemplary diagram of identifying a target video segment through artificial intelligence;
FIG. 12 schematically illustrates a block diagram of a dynamic cover setting system according to a second embodiment of the present application; and
fig. 13 schematically shows a hardware architecture diagram of a computer device suitable for implementing the dynamic cover setting method according to the third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to illustrate the present application and not to limit it. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the descriptions referring to "first", "second", etc. in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that a person skilled in the art can realize the combination; when a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and falls outside the protection scope of the present application.
In the prior art, video covers have the following disadvantages:
First: they are all static displays, so the visual effect is poor and they look monotonous and dull;
Second: the video cover often does not match the video content, a situation that frequently occurs in submissions published with clickbait covers and titles ("cover parties" and "title parties").
The above drawbacks waste viewers' time, degrade the video-watching experience, and may lower the click-through rate of some video content.
The present application provides several embodiments to address the above-mentioned deficiencies, and reference is made in detail to the following.
In the description of the present application, it should be understood that the numerical references before the steps do not identify the order of performing the steps, but merely serve to facilitate the description of the present application and to distinguish each step, and therefore should not be construed as limiting the present application.
The following are the term explanations of the present application:
LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) that learns long-term dependencies by introducing a gate mechanism to control the flow and forgetting of features.
Density distribution, also called probability density distribution, describes how probable a random event is at each point. For example, for a uniform distribution, the density over a segment (a span of events) equals the probability assigned to that segment divided by the length of the segment.
The dynamic cover picture is a video clip comprising a plurality of frames.
A bullet screen is a caption that pops up while a video is watched over the network and moves across the picture in a preset direction. There is no fixed English term for it; it is commonly called a comment, danmaku, barrage, bullet screen, or bullet-screen comment. A bullet screen lets a viewer post a comment or reaction, but unlike the comment area shown below the player on an ordinary video-sharing website, it is displayed on the video picture itself in real time as a scrolling caption so that all viewers notice it. Some bullet screen systems use a scripting language to provide specific bullet screen forms, such as making a bullet screen appear or disappear at a specific position and controlling its scrolling speed and position. In addition, bullet screens fixed at the bottom or top of the picture can also serve as subtitles for videos that have none.
For example, each barrage may include fields such as the sender's id, the sending time, the associated time point on the video's time axis, the display form, and the comment content (these fields are presented in a table in the original figures).
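For illustration only, such a record could be modeled as a small data structure; the field names below are assumptions made for this sketch, not fields defined by the present application:

```python
from dataclasses import dataclass

@dataclass
class Danmaku:
    """One bullet-screen comment attached to a video (illustrative fields only)."""
    sender_id: str        # id of the user who sent the comment (the "bullet screen id")
    send_time: float      # wall-clock time the comment was posted (unix seconds)
    video_time: float     # playback position on the video time axis, in seconds
    mode: str             # display form, e.g. "scroll", "top", "bottom"
    text: str             # the comment content itself
```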
Fig. 1 schematically shows an application environment diagram according to an embodiment of the present application. As shown in fig. 1:
the provider network 2 may be connected to a plurality of mobile terminals 6 through a network 4. The provider network 2 may provide content services.
The content services may include content streaming services such as internet protocol video streaming services. Content streaming services may be configured to distribute content via various transmission techniques. The content service may be configured to provide content such as video, audio, text data, combinations thereof, and the like. The content may include content streams (e.g., video streams, audio streams, information streams), content files (e.g., video files, audio files, text files), and/or other data.
Provider network 2 may implement a barrage service configured to allow users to post and/or share comments associated with the content, i.e., barrages. A barrage is presented on the same screen as the content; for example, it may be displayed as an overlay over the content. A bullet screen may have an animation effect when displayed. For example, the barrage may scroll (e.g., right to left, left to right, top to bottom, bottom to top), and such animation effects may be implemented based on the transition property of CSS3 (Cascading Style Sheets 3).
Provider network 2 may be located at a data center, such as a single site, or distributed in different geographic locations (e.g., at multiple sites). The provider network 2 may provide services via one or more networks 4. The network 4 includes various network devices such as routers, switches, multiplexers, hubs, modems, bridges, repeaters, firewalls, proxy devices, and/or the like. The network 4 may include physical links, such as coaxial cable links, twisted pair cable links, fiber optic links, combinations thereof, and the like. The network 4 may include wireless links such as cellular links, satellite links, Wi-Fi links, and the like.
The provider network 2 may be configured to receive a plurality of messages. The plurality of messages may include a plurality of barrages associated with the content.
The provider network 2 may be configured to manage messages for various content items. Users may browse content and access different content items to view comments for particular content, such as comments posted by other users for that particular content. Comments from users associated with a particular content item may be output to other users viewing the particular content item. For example, all users accessing a content item (e.g., a video clip) may view comments associated with the content item. The input comment content may be output in real-time or near real-time.
Provider network 2 may be configured to process the multiple messages, e.g., through various processing operations such as message storage, message screening, and message pushing. Message storage stores the plurality of messages in a data store, such as a database. Message screening may include rejecting or marking messages that match the screening criteria. The filter criteria may specify terms and/or phrases such as profanity, hate speech, or vulgar language. The filter criteria may specify characters, such as symbols or fonts. The filter criteria may specify a language, a computer-readable code pattern, and the like.
Provider network 2 may perform natural language processing, topic recognition, pattern recognition, artificial intelligence, etc., to automatically determine characteristics of messages and/or group messages. As an example, frequently occurring phrases or patterns may be identified as topics. As another example, a database of topics associated with content may be maintained. Topics may include genre (e.g., action, drama, comedy), personalities (e.g., actor, actress, director), language, and the like. Messages may be grouped based on characteristics of the client device and/or the user sending the message. Demographics, interests, history, and/or the like may be stored for multiple users to determine potential groupings of messages. In other embodiments, the provider network 2 may also identify highlight segments, pictures, etc. in video files based on artificial intelligence.
The provider network 2 may be implemented by one or more computing nodes. One or more compute nodes may include virtualized compute instances. A virtualized compute instance may include an emulation of a virtual machine, such as a computer system, operating system, or server. A computing node may load a virtual machine based on a virtual image and/or other data defining the particular software (e.g., operating system, dedicated application, server) used for the emulation. As the demand for different types of processing services changes, different virtual machines may be loaded and/or terminated on one or more compute nodes. A hypervisor may be implemented to manage the use of different virtual machines on the same compute node.
A plurality of mobile terminals 6 may be configured to access the content and services of provider network 2. The plurality of mobile terminals 6 may include any type of electronic device, such as a mobile device, a tablet device, a laptop computer, a workstation, a virtual reality device, a gaming device, a set-top box, a digital streaming media device, a vehicle-mounted terminal, a smart television, and so forth.
The plurality of mobile terminals 6 may output (e.g., display, render, present) the content (video, etc.) to the user. In other embodiments, the mobile terminal 6 may also identify highlight segments in the video file based on artificial intelligence, etc.
In an exemplary embodiment, the provider network 2 (or the mobile terminal 6) may extract the highlight of the video file and use the highlight of the video file as its dynamic cover map to improve the user experience and increase the interest of the cover of the video file, thereby attracting the attention of other users and increasing the click rate of the video file.
In an exemplary embodiment, the provider network 2 may screen a high-quality video file from a large number of video files, extract a highlight of the high-quality video file, and use the highlight of the high-quality video file as a dynamic cover map thereof, thereby optimizing the experience of the user when browsing and selecting a video and improving the click rate of the high-quality video file.
The dynamic cover map setting scheme will be described below by way of several embodiments. The solution may be implemented by a computer device 1300, and the computer device 1300 may be the provider network 2 or a computing node thereof, or may be the mobile terminal 6.
Example one
Fig. 2 schematically shows a flowchart of a dynamic cover setting method according to a first embodiment of the present application.
As shown in FIG. 2, the dynamic cover setting method may include steps S200 to S202, in which:
step S200, determining a target video clip from the video file.
The video file may be a video manuscript in any of various video formats, such as the AVI (Audio Video Interleaved) format, the H.264/AVC (Advanced Video Coding) format, the H.265/HEVC (High Efficiency Video Coding) format, and the like.
The target video segment may be a highlight video segment in the video file.
In this embodiment, whether a video clip is a highlight video clip may be determined by the level of engagement (e.g., bullet-screen activity) of a large number of viewers, by artificial intelligence (e.g., a trained neural network model), or by other means.
Step S202, extracting the target video clip, and obtaining a dynamic cover page of the video file according to the target video clip.
When the target video clip is determined, the computer device 1300 may automatically clip the video file to obtain the target video clip, and use the target video clip as a material for making the dynamic cover picture.
First: the target video clip can be set directly as the dynamic cover picture.
Second: the target video clip may be processed, and the processed video content used as the dynamic cover picture. For example, the processing may add video rendering special effects (such as two-dimensional stickers), compose selected highlight frames, and so on.
Third: when the target video clip comprises a plurality of sub video clips from different time segments, the sub video clips need to be composited; alternatively, one or more of the sub video clips may be selected and composited, or a plurality of key frames may be extracted from the sub video clips and composited, and the composited video clip is used as the dynamic cover picture.
Several ways of obtaining the dynamic cover picture are listed above; it should be understood that they are not intended to limit the protection scope of the present application.
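As an illustrative sketch of the first way (directly using the extracted clip as the dynamic cover picture), the target clip could be cut out of the video file with the ffmpeg command-line tool; the paths, time values, and the choice of a silent, stream-copied MP4 below are assumptions made for this sketch, not requirements of this application:

```python
import subprocess

def extract_dynamic_cover(video_path: str, start: float, duration: float,
                          out_path: str = "cover.mp4") -> str:
    """Cut the target video clip [start, start + duration) out of the video file.

    The clip is copied without re-encoding and the audio track is dropped,
    since a dynamic cover is a short, silent, looping picture.
    """
    subprocess.run(
        ["ffmpeg", "-y",
         "-ss", str(start),        # seek to the start of the target clip
         "-t", str(duration),      # keep only the clip's duration
         "-i", video_path,
         "-an",                    # drop audio
         "-c:v", "copy",           # copy the video stream without re-encoding
         out_path],
        check=True,
    )
    return out_path

# Example: use a 6-second highlight starting at 83.5 s as the dynamic cover.
# extract_dynamic_cover("video_a.mp4", start=83.5, duration=6.0)
```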
The dynamic cover setting method provided by this embodiment can extract a key or highlight segment (namely, the target video clip) of the video file and obtain the dynamic cover picture from the target video clip, and therefore has the following advantages:
First: because a dynamic cover picture is adopted, its animated display produces a better, richer visual effect, improves viewing enjoyment and interest, attracts the attention of other users, and increases the click-through rate of the video file.
Second: because the target video clip comes from the video file itself, the dynamic cover picture is strongly correlated with the video content; this improves the user's experience when browsing and selecting videos, prevents the user from mistakenly clicking on and watching video content that does not match expectations because of a cover that does not match the content, and avoids wasting data traffic.
Several schemes for implementing step S200 are provided below:
Method 1:
Search for a highlight video clip (namely, the target video clip) in the video file based on the bullet screens.
In an exemplary embodiment, as shown in fig. 3, the step of determining the target video clip from the video file may include steps S300 to S306, wherein: step S300, acquiring a plurality of barrages of the video file, each barrage being associated with a time point on a time axis of the video file; step S302, acquiring the bullet screen density distribution on the time axis according to the time point associated with each bullet screen; step S304, screening out one or more video clips with the highest bullet screen density in the video file according to the bullet screen density distribution; and step S306, determining the one or more video clips, or the one or more video clips carrying their barrages, as the target video clip. The time axis may be represented by a progress bar. The applicant has found through research that the time intervals in which barrages are sent most intensively generally correspond to key nodes of the video file, and such a key node generally corresponds to a highlight clip, a key clip, or a clip that easily attracts a great deal of user attention. Accordingly, by analyzing the bullet screen density distribution, a target video clip that effectively attracts attention can be found accurately. In addition, when the video clip carrying its barrages is taken as the target video clip, the information richness of the cover and the user experience can be further improved.
In an exemplary embodiment, as shown in fig. 4, step S300 may include steps S400 to S402, wherein: step S400, acquiring all barrages of the video file; and step S402, filtering a plurality of invalid bullet screens out of all the bullet screens according to the bullet screen content of each bullet screen, to obtain the plurality of bullet screens; the plurality of invalid bullet screens include bullet screens whose content is irrelevant to the video content of the video file and/or irrelevant to the video pictures of the video file. This embodiment can improve the efficiency and accuracy of screening the one or more video clips based on the bullet screen density distribution.
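A minimal sketch of this filtering step is given below, reusing the illustrative bullet screen record from earlier; the keyword patterns and the rule that pure "111"/"222" interaction barrages are invalid are assumptions for illustration, not filtering rules mandated by this application:

```python
import re

# Assumed patterns for barrages unrelated to the video content or picture,
# e.g. advertisement markers or pure "111"/"222" interactions with the uploader.
AD_PATTERN = re.compile(r"广告|advert|promo", re.IGNORECASE)
INTERACTION_PATTERN = re.compile(r"^\s*(1+|2+)\s*$")

def filter_invalid(danmakus):
    """Return only the barrages whose content appears related to the video."""
    valid = []
    for d in danmakus:
        if AD_PATTERN.search(d.text) or INTERACTION_PATTERN.match(d.text):
            continue  # invalid bullet screen, drop it
        valid.append(d)
    return valid
```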
For ease of understanding, an example of operation is provided below in connection with FIG. 5:
First, all barrages of video file A at the current moment are obtained.
Second, the bullet screen filtering operation is executed according to the bullet screen content of each bullet screen.
For example: barrages that merely announce the time of an advertisement, or barrages such as "111" or "222" that are pure interactions with the up host (the content provider), and so on.
Third, the density distribution of the bullet screens on the time axis is analyzed, and the video clips with the highest bullet screen density are selected according to that distribution.
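The density analysis in the third step can be sketched as a simple histogram of barrage time points over fixed windows on the time axis; the window length and the number of selected clips below are illustrative assumptions:

```python
def densest_clips(danmakus, video_duration: float,
                  window: float = 10.0, top_k: int = 3):
    """Bucket barrage time points into fixed windows and return the
    (start, end) intervals of the top_k windows with the most barrages."""
    n_bins = max(1, int(video_duration // window) + 1)
    counts = [0] * n_bins
    for d in danmakus:
        counts[int(d.video_time // window)] += 1
    # Rank window indices by barrage count, highest density first.
    ranked = sorted(range(n_bins), key=lambda i: counts[i], reverse=True)
    return [(i * window, min((i + 1) * window, video_duration))
            for i in ranked[:top_k]]
```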
Method 2:
Search for a highlight video clip (namely, the target video clip) in the video file based on quality scores.
In an exemplary embodiment, as shown in fig. 6, the determining of the target video clip from the video file in step S200 may include steps S600 to S604, wherein: step S600, dividing the video file into M video clips, where M is a positive integer greater than 1; step S602, performing quality scoring on each video clip; and step S604, determining the target video clip from the M video clips according to the quality scores of the video clips. The quality scoring of each video clip can be implemented in various ways, for example:
(1) A non-artificial-intelligence approach, such as a weight-based evaluation:
In the following, taking video clip A as an example, some means of quality scoring for video clip A are described:
Example 1: according to evaluation dimensions associated with video clip A, such as the number of bullet screens, the bullet screen form, and the bullet screen user type, a weight coefficient is assigned to each evaluation dimension, and the quality score of video clip A is obtained through a weighted calculation.
For example: obtain the number of bullet screen senders whose user type is high-level, and multiply that number by a preset high weight coefficient. In this example, whether each bullet screen sender is a high-level user is determined from the bullet screen id of each bullet screen, and the higher the proportion of high-level users among all bullet screen senders, the higher the quality score.
Example 2: the following information in the time interval corresponding to video clip A is acquired: progress bar drag events (e.g., drag-in events that drag the progress bar into the time interval and drag-out events that drag it out of the time interval), etc.; a positive weight coefficient is configured for drag-in events and a negative weight coefficient for drag-out events, and the numbers of drag-in and drag-out events are multiplied by their respective weight coefficients to obtain the quality score of video clip A.
(2) An artificial intelligence mode:
The inventor has found that the degree of excitement or the density of the bullet screens is usually highly correlated with highlight video content in the same time interval, and may also be highly correlated with the content in the same time period. Thus, computer device 1300 may, to some extent, determine the target video clip based on the bullet screens or on the quality of the video clip itself.
In an exemplary embodiment, as shown in fig. 7, step S602 may also be implemented by the following step: step S700, performing quality scoring on each video clip according to the bullet screen feature information of each video clip and/or the frame feature information of each frame in each video clip; wherein the bullet screen feature information includes the bullet screen density. Of course, the bullet screen feature information may also include bullet screen content features and the like. In this embodiment, the highlight video clip (namely, the target video clip) in the video file can be located more accurately through the barrage information in each video clip, the frame feature information of each frame, or a combination of the two.
In an exemplary embodiment, in order to accurately locate a highlight video clip (namely, the target video clip) in the video file, as shown in fig. 8, step S602 may be implemented by: step S800, extracting frame feature information of each frame in the ith video clip, where 1 ≤ i ≤ M and i is a positive integer; and step S802, performing quality scoring on the ith video clip according to the frame feature information of each frame in the ith video clip. By way of example, computer device 1300 performs the following operations: extracting frame feature information, such as a feature vector, of each frame through a convolutional neural network or the like; and inputting the frame feature information of each frame into a trained quality scoring model, which outputs the quality score of the ith video clip. The quality scoring model may be based on various algorithms, such as the LSTM algorithm.
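As a sketch of the per-frame feature extraction described above (a CNN whose 256 kernels each contribute one component of a frame's feature vector), one possible PyTorch encoder is shown below; the exact kernel size and pooling are assumptions made for this sketch:

```python
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Map RGB frames to 256-dimensional feature vectors,
    one value per convolution kernel (illustrative architecture)."""
    def __init__(self, num_kernels: int = 256):
        super().__init__()
        self.conv = nn.Conv2d(3, num_kernels, kernel_size=7, stride=2, padding=3)
        self.pool = nn.AdaptiveAvgPool2d(1)   # collapse each feature map to one scalar

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 3, H, W) -> (batch, 256)
        return self.pool(torch.relu(self.conv(frames))).flatten(1)
```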
The inventors have found that a representative static cover picture is typically selected by the uploader (up host) when uploading a video file. Thus, computer device 1300 may, to some extent, refer to the static cover picture to determine the target video clip.
In an exemplary embodiment, as shown in fig. 9, step S702 can also be implemented by the following step: step S900, performing quality scoring on the ith video clip according to the picture feature information and the frame feature information of each frame; the picture feature information is feature information of a target static picture, and the target static picture includes the static cover picture of the video file. In this embodiment, by introducing the static cover picture, the highlight video clip (namely, the target video clip) in the video file can be located more accurately.
Step S804 can be implemented by various artificial intelligence models or artificial intelligence model combinations.
In an exemplary embodiment, as shown in fig. 10, step S900 may be implemented by: step S1000, sequentially inputting the frame feature information of each frame into an LSTM model according to the time sequence of the M frames, to obtain M output vectors through the LSTM model, where the M output vectors correspond to the M frames one to one; step S1002, performing convolution and pooling operations on the vector matrix formed by the M output vectors to obtain a first feature vector; step S1004, obtaining a second feature vector according to the picture feature information; step S1006, splicing the first feature vector and the second feature vector to obtain a spliced vector; and step S1008, performing a linear regression operation on the spliced vector to obtain the quality score corresponding to the ith video clip. In this embodiment, the relationships among frames can be learned through the LSTM model's ability to capture long-term dependencies, and combining the feature information of the static cover picture can further improve the accuracy of determining the target video clip.
For ease of understanding, an example of operation is provided below in connection with FIG. 11:
First, a convolution operation is performed on each frame (X_1, X_2, ..., X_M) in the ith video clip by a CNN (Convolutional Neural Network) model to obtain M feature vectors (i.e., the frame feature information).
As an example, the CNN model may include 256 convolution kernels. Taking frame X_1 as an example, each of the 256 convolution kernels performs a convolution operation on frame X_1 to generate the feature vector x_1 corresponding to frame X_1; the feature vector x_1 is a one-dimensional vector comprising 256 parameters, each parameter being the convolution result obtained by one of the convolution kernels operating on frame X_1. In this way, M feature vectors, namely x_1, x_2, ..., x_M, are obtained through the CNN model.
Second, according to the time sequence, the feature vectors x_1, x_2, ..., x_M are sequentially input into an LSTM model, which outputs M output vectors h_1, h_2, ..., h_M.
Taking the input vector x_t as an example, the working principle of the LSTM model is as follows:
Forget gate: f_t = σ(W_f · [x_t, h_{t-1}] + b_f)
Input gate: i_t = σ(W_i · [x_t, h_{t-1}] + b_i), q_t = tanh(W_q · [x_t, h_{t-1}] + b_q)
Output gate: o_t = σ(W_o · [x_t, h_{t-1}] + b_o), h_t = o_t · tanh(C_t)
where f_t decides whether the information C_{t-1} learned at time t-1 passes or partially passes; f_t ∈ [0, 1] is the selection weight of the node at time t over the cell memory at time t-1, W_f is the weight matrix of the forget gate, b_f is its bias term, h_{t-1} is the hidden-layer state of the node at time t-1, and the nonlinear function is σ(x) = 1/(1 + e^{-x});
i_t ∈ [0, 1] is the selection weight of the node at time t over the current node information and decides which information should be retained; b_i is the bias term of the input gate and W_i is its weight matrix;
q_t is the new candidate vector used to update the cell state; b_q is its bias term, W_q is the weight matrix of the information to be updated, and tanh is the hyperbolic tangent activation function;
o_t is one of the output vectors at time t; b_o is the bias term of the output gate, W_o is its weight matrix, and [x_t, h_{t-1}] denotes the concatenation of the vectors x_t and h_{t-1};
h_t is the other output vector (the hidden state vector) at time t;
C_t is the updated current cell state, C_t = f_t · C_{t-1} + i_t · q_t, where C_{t-1} is the previous cell state, f_t · C_{t-1} represents the old cell information filtered by the forget gate, and i_t · q_t represents the newly added information.
It should be noted that, in the present embodiment, various modified LSTM models may be used, and the LSTM model is only an example.
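To make the gate equations above concrete, the following is a minimal single-step sketch that follows them directly (in practice a fused implementation such as torch.nn.LSTM would be used; the weight shapes are assumptions):

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_q, W_o, b_f, b_i, b_q, b_o):
    """One LSTM time step following the forget/input/output gate equations."""
    z = torch.cat([x_t, h_prev], dim=-1)          # [x_t, h_{t-1}]
    f_t = torch.sigmoid(z @ W_f + b_f)            # forget gate
    i_t = torch.sigmoid(z @ W_i + b_i)            # input gate
    q_t = torch.tanh(z @ W_q + b_q)               # candidate cell state
    o_t = torch.sigmoid(z @ W_o + b_o)            # output gate
    c_t = f_t * c_prev + i_t * q_t                # updated cell state C_t
    h_t = o_t * torch.tanh(c_t)                   # hidden state h_t
    return h_t, c_t
```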
Third, the M output vectors h_1, h_2, ..., h_M form a vector matrix (an M×256 matrix), and Conv1d (one-dimensional convolution), Max Pool (pooling that takes the maximum value of each block), another Conv1d (one-dimensional convolution), and AVE Pool (pooling that takes the average value of each block) are applied to the vector matrix in sequence to obtain the first feature vector.
Fourth, the features of the target static picture are extracted by another CNN model to obtain a feature map (the picture feature information) corresponding to the target static picture, and the picture feature information is passed through two fully connected layers to obtain the second feature vector.
Fifth, the first feature vector and the second feature vector are spliced to obtain the spliced vector.
Sixth, a linear calculation is performed on the spliced vector through two fully connected layer operations, and the quality score of the ith video clip is obtained after Sigmoid processing; the Sigmoid function limits the quality score to between 0 and 1.
When the quality score of the ith video clip is greater than 0.85, for example, the ith video clip is considered a highlight video clip.
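Putting the six steps together, one possible PyTorch sketch of the scoring model (LSTM over per-frame features, one-dimensional convolution and pooling, fusion with the static cover picture's feature vector, and a Sigmoid-bounded score) is shown below; the layer sizes and kernel sizes are assumptions, not values specified by this application:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClipQualityScorer(nn.Module):
    def __init__(self, frame_dim: int = 256, cover_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTM(frame_dim, hidden, batch_first=True)
        self.conv1 = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.cover_fc = nn.Sequential(                      # two fully connected layers
            nn.Linear(cover_dim, 256), nn.ReLU(), nn.Linear(256, 128))
        self.head = nn.Sequential(                          # two fully connected layers
            nn.Linear(hidden + 128, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, frame_feats: torch.Tensor, cover_feat: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, M, frame_dim) per-frame vectors from the frame encoder
        # cover_feat:  (batch, cover_dim) feature of the static cover picture
        h, _ = self.lstm(frame_feats)                 # (batch, M, hidden): h_1 ... h_M
        x = h.transpose(1, 2)                         # (batch, hidden, M) vector matrix
        x = F.max_pool1d(F.relu(self.conv1(x)), 2)    # Conv1d + Max Pool
        x = F.relu(self.conv2(x))                     # second Conv1d
        first_vec = F.adaptive_avg_pool1d(x, 1).squeeze(-1)   # AVE Pool -> first feature vector
        second_vec = self.cover_fc(cover_feat)                # second feature vector
        spliced = torch.cat([first_vec, second_vec], dim=1)   # spliced vector
        return torch.sigmoid(self.head(spliced)).squeeze(-1)  # quality score in (0, 1)

# A clip whose score exceeds, e.g., 0.85 would be treated as a highlight clip.
```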
Example two
FIG. 12 schematically illustrates a block diagram of a dynamic cover setting system that may be partitioned into one or more program modules that are stored in a storage medium and executed by one or more processors to implement an embodiment of the present application, in accordance with a second embodiment of the present application. The program modules referred to in the embodiments of the present application refer to a series of computer program instruction segments that can perform specific functions, and the following description will specifically describe the functions of the program modules in the embodiments of the present application.
As shown in FIG. 12, the dynamic cover setting system 1200 may include a determination module 1210 and a setting module 1220, wherein:
a determining module 1210 configured to determine a target video segment from a video file.
The setting module 1220 is configured to extract the target video clip, and obtain a dynamic cover picture of the video file according to the target video clip.
Optionally, the determining module 1210 is further configured to: acquiring a plurality of barrages of the video file, wherein each barrage is associated with a time point on a time axis of the video file; acquiring bullet screen density distribution on the time axis according to the time point on the time axis associated with each bullet screen; screening out one or more video clips with the highest bullet screen density in the video files according to the bullet screen density distribution; and determining the one or more video clips or the one or more video clips carrying the barrage as the target video clip.
Optionally, the determining module 1210 is further configured to: acquire all barrages of the video file; and filter a plurality of invalid bullet screens out of all the bullet screens according to the bullet screen content of each bullet screen, to obtain the plurality of bullet screens; wherein the plurality of invalid bullet screens include bullet screens whose content is irrelevant to the video content of the video file and/or irrelevant to the video pictures of the video file.
Optionally, the determining module 1210 is further configured to: dividing the video file into M video segments, wherein M is a positive integer greater than 1; performing quality scoring on each video clip; and determining the target video clip from the M video clips according to the quality scores of the video clips.
Optionally, the determining module 1210 is further configured to: according to the bullet screen characteristic information of each video clip and/or the frame characteristic information of each frame in each video clip, performing quality scoring on each video clip; wherein, the bullet screen characteristic information comprises bullet screen density.
Optionally, the determining module 1210 is further configured to: extracting frame feature information of each frame in the ith video clip, wherein i is more than or equal to 1 and less than or equal to M, and i is a positive integer; and according to the frame characteristic information of each frame in the ith video clip, carrying out quality scoring on the ith video clip.
Optionally, the determining module 1210 is further configured to: according to the picture characteristic information and the frame characteristic information of each frame, performing quality grading on the ith video clip; the image feature information is feature information of a target static image, and the target static image comprises a static cover image of the video file.
Optionally, the determining module 1210 is further configured to: sequentially inputting the frame characteristic information of each frame into an LSTM model according to the time sequence of the M frames to obtain M output vectors through the LSTM model, wherein the M output vectors correspond to the M frames one by one; performing convolution and pooling operation on a vector matrix formed by the M output vectors to obtain a first feature vector; obtaining a second feature vector according to the picture feature information; splicing the first feature vector and the second feature vector to obtain a spliced vector; and performing linear regression operation on the splicing vector to obtain a quality score corresponding to the ith video clip.
EXAMPLE III
Fig. 13 schematically shows a hardware architecture diagram of a computer device 1300 suitable for implementing the dynamic cover setting method according to the third embodiment of the present application. In this embodiment, the computer device 1300 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it may be a smartphone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server or a server cluster composed of multiple servers). As shown in fig. 13, the computer device 1300 includes at least, but is not limited to: a memory 1310, a processor 1320, and a network interface 1330, which may be communicatively linked to each other via a system bus. Wherein:
the memory 1310 includes at least one type of computer-readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 1310 may be an internal storage module of the computer device 1300, such as a hard disk or memory of the computer device 1300. In other embodiments, the memory 1310 may also be an external storage device of the computer device 1300, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the computer device 1300. Of course, memory 1310 may also include both internal and external memory modules of computer device 1300. In this embodiment, the memory 1310 is generally used for storing an operating system and various types of application software installed in the computer device 1300, such as program codes of a dynamic cover setting method. In addition, the memory 1310 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 1320 may be, in some embodiments, a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 1320 is generally configured to control the overall operation of the computer device 1300, for example, to perform control and processing related to data interaction or communication of the computer device 1300. In this embodiment, the processor 1320 is used to run the program code stored in the memory 1310 or to process data.
The network interface 1330 may comprise a wireless network interface or a wired network interface, and the network interface 1330 is typically used to establish a communication link between the computer device 1300 and other computer devices. For example, the network interface 1330 is used to connect the computer device 1300 to an external terminal via a network, and to establish a data transmission channel and a communication link between the computer device 1300 and the external terminal. The network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It should be noted that Fig. 13 only shows a computer device having components 1310-1330, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the dynamic cover setting method stored in the memory 1310 may be further divided into one or more program modules, which are executed by one or more processors (the processor 1320 in this embodiment) to implement the embodiments of the present application.
EXAMPLE IV
Embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the dynamic cover setting method in the above embodiments.
In this embodiment, the computer-readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer-readable storage medium may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card (Flash Card) provided on the computer device. Of course, the computer-readable storage medium may also include both internal and external storage devices of the computer device. In this embodiment, the computer-readable storage medium is generally used for storing the operating system and various types of application software installed in the computer device, for example, the program code of the dynamic cover setting method in the embodiments. In addition, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present application described above may be implemented by a general-purpose computing device, and may be centralized on a single computing device or distributed across a network formed by a plurality of computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from the order described herein. Alternatively, they may be separately fabricated into individual integrated circuit modules, or a plurality of them may be fabricated into a single integrated circuit module. Thus, the embodiments of the present application are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. Any equivalent structural or process modification made using the contents of the specification and the drawings of the present application, or any direct or indirect application in other related technical fields, likewise falls within the scope of protection of the present application.

Claims (11)

1. A dynamic cover setting method, the method comprising:
determining a target video clip from a video file; and
extracting the target video clip, and obtaining a dynamic cover picture of the video file according to the target video clip.
2. The dynamic cover setting method of claim 1, wherein the determining a target video clip from a video file comprises:
acquiring a plurality of bullet screens of the video file, wherein each bullet screen is associated with a time point on a time axis of the video file;
acquiring a bullet screen density distribution on the time axis according to the time point on the time axis associated with each bullet screen;
screening out one or more video clips with the highest bullet screen density in the video file according to the bullet screen density distribution; and
determining the one or more video clips, or the one or more video clips carrying the bullet screens, as the target video clip.
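By way of a non-limiting illustration of claim 2, the bullet screen density distribution and the screening step could be implemented as sketched below; the one-second histogram bucket and the fixed clip length are assumptions made for illustration only.

    from collections import Counter

    def pick_densest_clips(bullet_times, video_duration, clip_length=5, top_k=1):
        """Screen out the window(s) on the time axis with the highest bullet screen density.

        bullet_times: time points (in seconds) on the video time axis, one per bullet screen.
        Returns a list of (start, end) windows sorted by descending bullet screen count.
        """
        # Bullet screen density distribution: bullet screen count per one-second bucket.
        density = Counter(int(t) for t in bullet_times)
        candidates = []
        for start in range(max(int(video_duration) - clip_length, 0) + 1):
            count = sum(density[s] for s in range(start, start + clip_length))
            candidates.append((count, start))
        candidates.sort(reverse=True)
        return [(start, start + clip_length) for _, start in candidates[:top_k]]

For example, with bullet screens at 12.1 s, 12.7 s, 13.0 s, and 40.2 s in a 60-second video, pick_densest_clips returns a single 5-second window covering the cluster around 12-13 s.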
3. The dynamic cover setting method of claim 1, wherein the acquiring a plurality of bullet screens of the video file comprises:
acquiring all bullet screens of the video file; and
filtering out a plurality of invalid bullet screens from all the bullet screens according to the bullet screen content of each of all the bullet screens, so as to obtain the plurality of bullet screens; wherein the plurality of invalid bullet screens comprise bullet screens whose bullet screen content is irrelevant to the video content of the video file and/or bullet screens whose bullet screen content is irrelevant to the video picture of the video file.
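Claim 3 leaves the concrete relevance test open; the sketch below stands in for it with a simple keyword blacklist, which is purely an assumption made for illustration — in practice a text-relevance model could be used instead.

    # Hypothetical examples of bullet screen content unrelated to the video content or picture.
    IRRELEVANT_PATTERNS = ("first", "checking in", "passing by")

    def filter_invalid_bullet_screens(bullet_screens):
        """Drop invalid bullet screens and keep the rest.

        bullet_screens: iterable of dicts such as {"time": 12.3, "content": "..."}.
        """
        valid = []
        for bullet in bullet_screens:
            content = bullet["content"].strip()
            if not content:
                continue
            if any(pattern in content.lower() for pattern in IRRELEVANT_PATTERNS):
                continue  # treated as an invalid bullet screen
            valid.append(bullet)
        return valid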
4. The dynamic cover setting method of claim 1, wherein determining the target video clip from the video file comprises:
dividing the video file into M video segments, wherein M is a positive integer greater than 1;
performing quality scoring on each video clip; and
determining the target video clip from the M video clips according to the quality score of each video clip.
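By way of a non-limiting illustration of claim 4, the segmentation and selection step could be sketched as the loop below; score_clip is a placeholder for whichever scoring scheme is used (bullet screen density, frame feature information, or both) and is an assumption of this sketch rather than something fixed by the claims.

    def choose_target_clip(video_duration, m, score_clip):
        """Divide the video into M segments, score each segment, and return the best one.

        score_clip: callable taking (start, end) in seconds and returning a quality score,
        e.g. a wrapper around the bullet screen density and/or the VideoClipScorer sketches above.
        """
        segment_length = video_duration / m
        segments = [(k * segment_length, (k + 1) * segment_length) for k in range(m)]
        scores = [score_clip(start, end) for start, end in segments]
        best_index = max(range(m), key=lambda k: scores[k])
        return segments[best_index]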
5. The dynamic cover setting method of claim 4, wherein the quality scoring of each video clip comprises:
performing quality scoring on each video clip according to the bullet screen feature information of each video clip and/or the frame feature information of each frame in each video clip; wherein the bullet screen feature information comprises bullet screen density.
6. The dynamic cover setting method of claim 4, wherein the quality scoring of each video clip comprises:
extracting frame feature information of each frame in the i-th video clip, where 1 ≤ i ≤ M and i is a positive integer; and
performing quality scoring on the i-th video clip according to the frame feature information of each frame in the i-th video clip.
7. The method of claim 6, wherein the performing quality scoring on the i-th video clip according to the frame feature information of each frame in the i-th video clip comprises:
performing quality scoring on the i-th video clip according to the picture feature information and the frame feature information of each frame;
wherein the picture feature information is feature information of a target static picture, and the target static picture comprises a static cover picture of the video file.
8. The method of claim 7, wherein the performing quality scoring on the i-th video clip according to the picture feature information and the frame feature information of each frame comprises:
sequentially inputting the frame feature information of each frame into an LSTM model according to the time sequence of the M frames, so as to obtain M output vectors through the LSTM model, wherein the M output vectors correspond to the M frames one by one;
performing convolution and pooling operation on a vector matrix formed by the M output vectors to obtain a first feature vector;
obtaining a second feature vector according to the picture feature information;
splicing the first feature vector and the second feature vector to obtain a spliced vector;
and performing a linear regression operation on the spliced vector to obtain a quality score corresponding to the i-th video clip.
9. A dynamic cover setting system, comprising:
a determining module, configured to determine a target video clip from a video file; and
a setting module, configured to extract the target video clip and obtain a dynamic cover picture of the video file according to the target video clip.
10. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor is adapted to carry out the steps of the dynamic cover setting method according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium having stored thereon a computer program executable by at least one processor, so as to cause the at least one processor to perform the steps of the dynamic cover setting method according to any one of claims 1 to 8.
CN202110258999.XA 2021-03-10 2021-03-10 Dynamic cover setting method and system Pending CN115086709A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110258999.XA CN115086709A (en) 2021-03-10 2021-03-10 Dynamic cover setting method and system
PCT/CN2022/072819 WO2022188563A1 (en) 2021-03-10 2022-01-19 Dynamic cover setting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110258999.XA CN115086709A (en) 2021-03-10 2021-03-10 Dynamic cover setting method and system

Publications (1)

Publication Number Publication Date
CN115086709A true CN115086709A (en) 2022-09-20

Family

ID=83226339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110258999.XA Pending CN115086709A (en) 2021-03-10 2021-03-10 Dynamic cover setting method and system

Country Status (2)

Country Link
CN (1) CN115086709A (en)
WO (1) WO2022188563A1 (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107707967A (en) * 2017-09-30 2018-02-16 咪咕视讯科技有限公司 The determination method, apparatus and computer-readable recording medium of a kind of video file front cover
WO2018040059A1 (en) * 2016-09-02 2018-03-08 Microsoft Technology Licensing, Llc Clip content categorization
CN107995535A (en) * 2017-11-28 2018-05-04 百度在线网络技术(北京)有限公司 A kind of method, apparatus, equipment and computer-readable storage medium for showing video
CN108595493A (en) * 2018-03-15 2018-09-28 腾讯科技(深圳)有限公司 Method for pushing and device, storage medium, the electronic device of media content
CN109286850A (en) * 2017-07-21 2019-01-29 Tcl集团股份有限公司 A kind of video labeling method and terminal based on barrage
CN109729435A (en) * 2017-10-27 2019-05-07 优酷网络技术(北京)有限公司 The extracting method and device of video clip
CN110191357A (en) * 2019-06-28 2019-08-30 北京奇艺世纪科技有限公司 The excellent degree assessment of video clip, dynamic seal face generate method and device
CN111225236A (en) * 2020-01-20 2020-06-02 北京百度网讯科技有限公司 Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN111277892A (en) * 2020-01-20 2020-06-12 北京百度网讯科技有限公司 Method, apparatus, server and medium for selecting video clip
CN111309951A (en) * 2020-01-23 2020-06-19 北京达佳互联信息技术有限公司 Advertisement words obtaining method and device, and storage medium
CN111767461A (en) * 2020-06-24 2020-10-13 北京奇艺世纪科技有限公司 Data processing method and device
CN111782603A (en) * 2020-06-29 2020-10-16 掌阅科技股份有限公司 Video book envelope display method, computing equipment and computer storage medium
CN112087665A (en) * 2020-09-17 2020-12-15 掌阅科技股份有限公司 Previewing method of live video, computing equipment and computer storage medium
CN112100442A (en) * 2020-11-13 2020-12-18 腾讯科技(深圳)有限公司 User tendency recognition method, device, equipment and storage medium
US20210004605A1 (en) * 2018-03-22 2021-01-07 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and System for Retrieving Video Temporal Segments
CN112423127A (en) * 2020-11-20 2021-02-26 上海哔哩哔哩科技有限公司 Video loading method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107071587A (en) * 2017-04-25 2017-08-18 腾讯科技(深圳)有限公司 The acquisition methods and device of video segment
CN108650524B (en) * 2018-05-23 2022-08-16 腾讯科技(深圳)有限公司 Video cover generation method and device, computer equipment and storage medium
CN110324662B (en) * 2019-06-28 2022-07-22 北京奇艺世纪科技有限公司 Video cover generation method and device
TW202103021A (en) * 2019-07-12 2021-01-16 大陸商信泰光學(深圳)有限公司 Method and system for editing video cover, and computer program product thereof
CN112069952A (en) * 2020-08-25 2020-12-11 北京小米松果电子有限公司 Video clip extraction method, video clip extraction device, and storage medium

Also Published As

Publication number Publication date
WO2022188563A1 (en) 2022-09-15

Similar Documents

Publication Publication Date Title
CN111143610B (en) Content recommendation method and device, electronic equipment and storage medium
US9866914B2 (en) Subscribable channel collections
US11061962B2 (en) Recommending and presenting comments relative to video frames
US20210342385A1 (en) Interactive method and system of bullet screen easter eggs
DE112016001594T5 (en) METHODS, SYSTEMS AND MEDIA FOR GOVERNING AND PRESENTING RELEVANT CONTENTS FOR A PARTICULAR VIDEO GAME
CN111258995B (en) Data processing method, device, storage medium and equipment
CN110166802B (en) Bullet screen processing method and device and storage medium
CN113742567B (en) Recommendation method and device for multimedia resources, electronic equipment and storage medium
CN114095749B (en) Recommendation and live interface display method, computer storage medium and program product
CN106201246A (en) Page layout processing method and processing device
CN111954087B (en) Method and device for intercepting images in video, storage medium and electronic equipment
US10853417B2 (en) Generating a platform-based representative image for a digital video
EP3989158A1 (en) Method, apparatus and device for video similarity detection
CN111552884A (en) Method and apparatus for content recommendation
CN111259245A (en) Work pushing method and device and storage medium
CN113111198B (en) Demonstration manuscript recommendation method based on collaborative filtering algorithm and related equipment
US11172271B1 (en) Automated generation of banner images
CN116012404A (en) Video image segmentation method, device, equipment and medium thereof
CN115086709A (en) Dynamic cover setting method and system
CN115114461A (en) Method and apparatus for recommending multimedia data, and computer-readable storage medium
CN112749333B (en) Resource searching method, device, computer equipment and storage medium
CN117150053A (en) Multimedia information recommendation model training method, recommendation method and device
CN115935049A (en) Recommendation processing method and device based on artificial intelligence and electronic equipment
CN113553505A (en) Video recommendation method and device and computing equipment
CN112711945B (en) Advertisement recalling method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination