CN111225236A - Method and device for generating video cover, electronic equipment and computer-readable storage medium


Info

Publication number
CN111225236A
CN111225236A (application CN202010065823.8A)
Authority
CN
China
Prior art keywords
video
segment
sequence
importance
slow
Prior art date
Legal status
Granted
Application number
CN202010065823.8A
Other languages
Chinese (zh)
Other versions
CN111225236B (en)
Inventor
林天威
李甫
何栋梁
孙昊
文石磊
丁二锐
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010065823.8A
Publication of CN111225236A
Application granted
Publication of CN111225236B
Legal status: Active
Anticipated expiration


Classifications

    • H04N21/23418 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • G06V20/47 Detecting features for summarising video content
    • H04N21/23424 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/234345 Reformatting operations of video signals for distribution or compliance with end-user requests, performed only on part of the stream, e.g. a region of the image or a time segment
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/440245 Reformatting operations of video signals for household redistribution, storage or real-time display, performed only on part of the stream, e.g. a region of the image or a time segment
    • H04N21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application discloses a method and a device for generating a video cover, an electronic device, and a computer-readable storage medium, and relates to the field of multimedia processing technology. The scheme adopted when generating the video cover is as follows: acquire a video to be processed and the video image sequence corresponding to it; acquire the importance score sequence corresponding to the video image sequence; select, according to the importance score sequence, a first segment corresponding to the highlight segment length from the video to be processed; select a second segment corresponding to the slow playing segment length from the first segment, and perform video slow playing processing on it to obtain a third segment; and sequentially splice the two video clips of the first segment that are adjacent to the second segment, before and after it, with the third segment, taking the splicing result as the video cover of the video to be processed. The application reduces the generation cost of the video cover and improves both its generation efficiency and its display effect.

Description

Method and device for generating video cover, electronic equipment and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a video cover, an electronic device, and a computer-readable storage medium in the field of multimedia processing technologies.
Background
In the prior art, a video cover is generally generated by manually editing the video and using a highlight segment as the cover. As a result, the cost of obtaining a video cover is high, the generation efficiency is low, and the generated cover may not correspond accurately to the highlight content in the video.
Disclosure of Invention
To solve this technical problem, the present application provides a method, an apparatus, an electronic device, and a computer-readable storage medium for generating a video cover, where the method includes: acquiring a video to be processed and the video image sequence corresponding to it; acquiring an importance score sequence corresponding to the video image sequence; selecting, according to the importance score sequence, a first segment corresponding to the highlight segment length from the video to be processed; selecting a second segment corresponding to the slow playing segment length from the first segment, and performing video slow playing processing on the second segment to obtain a third segment; and sequentially splicing the two video clips of the first segment that are adjacent to the second segment, before and after it, with the third segment, and taking the splicing result as the video cover of the video to be processed. The application can generate a high-quality highlight video clip without manual participation of the user, thereby reducing the generation cost of the video cover and improving its generation efficiency; it also plays the highlight content within the clip in slow motion, further improving the display effect of the video cover.
According to a preferred embodiment of the present application, obtaining the importance score sequence corresponding to the video image sequence comprises: extracting features of each frame image in the video image sequence to obtain a feature sequence; and inputting the feature sequence into a pre-trained importance detection network, then acquiring the importance score sequence corresponding to the video image sequence from the output of the network. Obtaining the importance score sequence through a pre-trained importance detection network improves the accuracy of the acquired scores.
According to a preferred embodiment of the present application, the importance detection network is obtained by pre-training in the following manner: acquiring videos and the category labeling result corresponding to each video as training samples; constructing a classification model comprising a recurrent neural network and a classification network; taking the feature sequence corresponding to each video as the input of the recurrent neural network, and acquiring the hidden-layer feature sequence and the importance score sequence output by the recurrent neural network; taking the matrix product of the hidden-layer feature sequence and the importance score sequence as the input of the classification network, and acquiring the output of the classification network; calculating a loss function of the classification model from that output and the category labeling result of each video; and adjusting the parameters of the recurrent neural network and the classification network according to the loss function until it converges, then extracting the recurrent neural network from the classification model as the importance detection network.
According to a preferred embodiment of the present application, selecting a first segment corresponding to the highlight segment length from the video to be processed according to the importance score sequence comprises: determining a first sliding window according to the highlight segment length; and sliding this window over the video image sequence, selecting the image sequence with the highest importance score as the first segment. This step allows the acquired first segment to correspond accurately to the highlight content in the video to be processed.
According to a preferred embodiment of the present application, selecting a second segment corresponding to the slow playing segment length from the first segment includes: determining a second sliding window according to the slow playing segment length; and sliding this window over the image sequence corresponding to the first segment, selecting the image sequence with the highest importance score as the second segment.
According to a preferred embodiment of the present application, the video slow playing processing of the second segment includes: performing image frame interpolation on the second segment according to a preset slow playing speed.
To solve the above technical problem, the present application further provides an apparatus for generating a video cover, the apparatus comprising: an acquisition unit, configured to acquire a video to be processed and the video image sequence corresponding to it; a processing unit, configured to acquire an importance score sequence corresponding to the video image sequence; a selecting unit, configured to select, according to the importance score sequence, a first segment corresponding to the highlight segment length from the video to be processed; a slow playing unit, configured to select a second segment corresponding to the slow playing segment length from the first segment, and obtain a third segment after performing video slow playing processing on the second segment; and a splicing unit, configured to sequentially splice the two video clips of the first segment that are adjacent to the second segment, before and after it, with the third segment, and take the splicing result as the video cover of the video to be processed.
According to a preferred embodiment of the present application, when acquiring the importance score sequence corresponding to the video image sequence, the processing unit specifically: extracts features of each frame image in the video image sequence to obtain a feature sequence; and inputs the feature sequence into a pre-trained importance detection network, acquiring the importance score sequence corresponding to the video image sequence from the output of the network.
According to a preferred embodiment of the present application, the apparatus further includes a training unit, configured to pre-train the importance detection network in the following manner: acquiring videos and the category labeling result corresponding to each video as training samples; constructing a classification model comprising a recurrent neural network and a classification network; taking the feature sequence corresponding to each video as the input of the recurrent neural network, and acquiring the hidden-layer feature sequence and the importance score sequence output by the recurrent neural network; taking the matrix product of the hidden-layer feature sequence and the importance score sequence as the input of the classification network, and acquiring the output of the classification network; calculating a loss function of the classification model from that output and the category labeling result of each video; and adjusting the parameters of the recurrent neural network and the classification network according to the loss function until it converges, then extracting the recurrent neural network from the classification model as the importance detection network.
According to a preferred embodiment of the present application, when selecting the first segment corresponding to the highlight segment length from the video to be processed according to the importance score sequence, the selecting unit specifically: determines a first sliding window according to the highlight segment length; and slides this window over the video image sequence, selecting the image sequence with the highest importance score as the first segment.
According to a preferred embodiment of the present application, when selecting a second segment corresponding to the slow playing segment length from the first segment, the slow playing unit specifically: determines a second sliding window according to the slow playing segment length; and slides this window over the image sequence corresponding to the first segment, selecting the image sequence with the highest importance score as the second segment.
According to a preferred embodiment of the present application, when performing video slow playing processing on the second segment, the slow playing unit specifically performs image frame interpolation on the second segment according to a preset slow playing speed.
One embodiment of the above application has the following advantages or benefits: it reduces the generation cost of the video cover, improves its generation efficiency, and improves its display effect. Because the first segment of the video to be processed is extracted according to the importance score sequence, the prior-art problem of obtaining highlight segments by manual editing is overcome, which reduces the generation cost of the video cover and improves its generation efficiency; moreover, the generated cover plays the highlight content within the highlight segment in slow motion, improving the display effect of the video cover.
Other effects of the above alternatives are described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flow chart of a method for generating a video cover according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of a process for training an importance detection network according to a second embodiment of the present application;
FIG. 3 is a schematic diagram of a process for acquiring a video cover of a video to be processed according to a third embodiment of the present application;
FIG. 4 is a block diagram of an apparatus for generating a video cover according to a fourth embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing a method of generating a video cover according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Fig. 1 is a flowchart of a method for generating a video cover according to an embodiment of the present application, as shown in fig. 1, the method includes:
in S101, a video to be processed and a video image sequence corresponding to the video to be processed are acquired.
In this step, a video to be processed is first acquired, and then the video image sequence corresponding to it is acquired. The video to be processed may be a video from the network or a video shot by the user on a terminal.
It can be understood that the video image sequence obtained in this step is the sequence of frame images of the video; obtaining such a sequence is known in the art and is not described here again.
In S102, a sequence of importance scores corresponding to the sequence of video images is obtained.
In this step, an importance score sequence corresponding to the video image sequence of step S101 is acquired. The importance score sequence contains an importance score for each frame image; the score represents the contribution of that frame to the classification of the video to be processed, and the higher the score, the greater the contribution.
The importance score of each frame therefore serves as a measure of how highlight-worthy that frame is: the higher a frame's importance score, the more highlight-worthy the frame.
Specifically, the importance score sequence corresponding to the video image sequence may be obtained as follows: extract features of each frame image in the video image sequence to obtain a feature sequence; then input the feature sequence into a pre-trained importance detection network and acquire the importance score sequence from the output of the network. That is, after a feature sequence of length T is input into the importance detection network, an importance score sequence of length T is obtained from it.
In this step, the image features of each frame may be extracted with an existing feature extractor, for example the feature extraction layers of a CNN classification model trained on the ImageNet dataset.
The importance detection network used in this step is trained in advance and, given the input image features, outputs for each frame an importance score indicating the frame's contribution to the classification of the video it belongs to.
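As a concrete illustration of this scoring step, the following is a minimal sketch assuming PyTorch/torchvision; the ImageNet-pretrained ResNet backbone and the importance_net argument stand in for the patent's unspecified feature extractor and importance detection network, and importance_net is assumed to map a (1, T, 512) feature sequence to a (1, T) score sequence.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Feature extractor: an ImageNet-pretrained CNN with its classifier removed,
# so each frame maps to a 512-d feature vector.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def importance_scores(frames, importance_net):
    """frames: list of T PIL images; returns a length-T importance score sequence."""
    with torch.no_grad():
        batch = torch.stack([preprocess(f) for f in frames])  # (T, 3, 224, 224)
        feats = backbone(batch).unsqueeze(0)                  # (1, T, 512) feature sequence
        scores = importance_net(feats)                        # (1, T) score sequence
    return scores.squeeze(0)
```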
Specifically, the importance detection network may be obtained by pre-training as follows: acquire videos and the category labeling result corresponding to each video as training samples; construct a classification model comprising a recurrent neural network and a classification network; take the feature sequence corresponding to each video as the input of the recurrent neural network, and acquire the hidden-layer feature sequence and the importance score sequence it outputs; take the matrix product of the hidden-layer feature sequence and the importance score sequence as the input of the classification network, and acquire the output of the classification network; calculate a loss function of the classification model from that output and the category labeling result of each video; and adjust the parameters of the recurrent neural network and the classification network according to the loss function until it converges, then extract the recurrent neural network from the classification model as the importance detection network.
It is understood that the recurrent neural network in this step may be a bidirectional recurrent neural network, and the classification network may be a two-layer fully-connected network. The present application does not limit the types of recurrent neural networks and classification networks.
Fig. 2 is a schematic diagram of the process of training the importance detection network in an embodiment of the present application. In Fig. 2, the feature sequence of the video image sequence is first extracted and input into the RNN (Recurrent Neural Network) of the classification model, and the importance score sequence and hidden-layer feature sequence output by the RNN are obtained; the result of their matrix multiplication is then input into the classification network of the classification model, which outputs the video classification result, here the category "sports (basketball)".
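A minimal training sketch of the scheme in Fig. 2 follows, assuming PyTorch, a bidirectional GRU as the recurrent neural network, a two-layer fully-connected classification network, and cross-entropy as the (otherwise unspecified) classification loss; deriving the per-frame importance scores through a separate linear head on the hidden states is likewise an assumption, not a detail fixed by the patent.

```python
import torch
import torch.nn as nn

class ImportanceClassifier(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, num_classes=10):
        super().__init__()
        # Recurrent neural network: outputs the hidden-layer feature sequence.
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        # Per-frame importance score (assumed linear head on the hidden states).
        self.score_head = nn.Linear(2 * hidden, 1)
        # Classification network: a two-layer fully-connected network.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_classes))

    def forward(self, feats):                    # feats: (B, T, feat_dim)
        hidden_seq, _ = self.rnn(feats)          # (B, T, 2*hidden)
        scores = torch.sigmoid(self.score_head(hidden_seq)).squeeze(-1)  # (B, T)
        # Matrix multiplication of the score sequence with the hidden-layer
        # feature sequence: a score-weighted video-level feature.
        pooled = torch.bmm(scores.unsqueeze(1), hidden_seq).squeeze(1)   # (B, 2*hidden)
        return self.classifier(pooled), scores

model = ImportanceClassifier()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(feats, labels):
    """feats: (B, T, feat_dim) feature sequences; labels: (B,) category labels."""
    logits, _ = model(feats)
    loss = loss_fn(logits, labels)   # compare with the category labeling result
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Once the loss converges, the recurrent part (here, the GRU together with its score head) is kept as the importance detection network and the classification network is discarded.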
In S103, according to the importance score sequence, a first segment corresponding to the length of the highlight segment is selected from the video to be processed.
In this step, according to the importance score sequence obtained in step S102, a first segment corresponding to the highlight segment length is selected from the video to be processed. That is, the first segment is the stretch of the video, of highlight-segment length, with the highest importance score; and because the importance score measures how highlight-worthy each frame is, the first segment is the most highlight-worthy stretch of that length in the video to be processed.
Specifically, the first segment corresponding to the highlight segment length may be selected from the video to be processed according to the importance score sequence as follows: determine a first sliding window according to the highlight segment length, where the window size corresponds to the number of images the first segment should contain; then slide this window over the video image sequence and select the image sequence with the highest importance score as the first segment.
The highlight segment length in this step is preset, for example 5 s, so that the first segment selected in this step is the most highlight-worthy 5 s stretch of the video to be processed.
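The sliding-window selection can be made concrete with a short sketch; the function name and the O(T) cumulative-sum formulation are illustrative, and the window whose importance scores sum highest is taken as the first segment.

```python
import numpy as np

def best_window(scores, window_len):
    """scores: length-T importance score sequence; returns (start, end) of the
    window_len-frame window with the highest total importance score."""
    scores = np.asarray(scores, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(scores)))      # prefix sums
    window_sums = csum[window_len:] - csum[:-window_len]   # every window total
    start = int(np.argmax(window_sums))
    return start, start + window_len

# E.g., a preset 5 s highlight segment at 25 fps is a 125-frame window:
# ts, te = best_window(importance_scores, 5 * 25)
```

The second sliding window of step S104 works the same way: the same function is applied to the score sub-sequence scores[ts:te], with the slow playing segment length as the window size.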
In S104, a second segment corresponding to the length of the slow-playing segment is selected from the first segments, and after the second segment is subjected to video slow-playing processing, a third segment is obtained.
In this step, a second segment corresponding to the slow playing segment length is first selected from the first segment obtained in step S103, video slow playing processing is performed on the selected second segment, and the processing result is taken as the third segment. That is, the third segment obtained in this step is a slow-played version of the stretch of the first segment, of slow-playing-segment length, with the highest importance score, i.e. the most highlight-worthy stretch of that length within the first segment.
Specifically, the second segment corresponding to the slow playing segment length may be selected from the first segment as follows: determine a second sliding window according to the slow playing segment length, where the window size corresponds to the number of images the second segment should contain; then slide this window over the video image sequence corresponding to the first segment and select the image sequence with the highest importance score as the second segment.
The slow playing segment length in this step is preset, for example 2 s, so that the second segment selected in this step is the most highlight-worthy 2 s stretch of the first segment. Preferably, the slow playing segment length is smaller than the highlight segment length.
In addition, the video slow playing processing of the second segment may be performed by image frame interpolation. The slow playing speed may also be set; for example, it may be set to 0.5x.
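As an illustration of the frame-interpolation step, here is a naive sketch using linear blending between neighbouring frames; real systems would more likely use optical-flow or learned interpolation, so treat this purely as a stand-in. A factor of 2 corresponds to the 0.5x slow playing speed mentioned above: the frame count doubles while the frame rate stays fixed.

```python
import numpy as np

def slow_play(frames, factor=2):
    """frames: list of HxWx3 uint8 arrays; factor=2 yields 0.5x playback speed."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        for k in range(1, factor):
            t = k / factor
            # Linearly blended in-between frame (naive interpolation).
            blended = (1 - t) * a.astype(np.float32) + t * b.astype(np.float32)
            out.append(blended.astype(np.uint8))
    out.append(frames[-1])
    return out
```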
In S105, sequentially splicing the two video clips that are adjacent to the second clip in the first clip and the third clip, and using the splicing result as a video cover of the video to be processed.
In this step, the two video clips of the first segment (obtained in step S103) that are adjacent to the second segment (obtained in step S104), before and after it, are sequentially spliced with the third segment obtained in step S104, and the splicing result is taken as the video cover of the video to be processed. That is, the video cover obtained in this step is a piece of video played in the form "original speed-slow speed-original speed".
Fig. 3 is a schematic diagram of the process of acquiring a video cover for a video to be processed in an embodiment of the present application. As shown in Fig. 3, a video image sequence is first obtained, and the feature sequence corresponding to it is input into the trained RNN to obtain the importance score sequence. Suppose the first segment of length T_highlight selected according to the importance score sequence is [t_s, t_e] (the segment within the dotted box in Fig. 3), and the second segment of length T_slow within it is [t'_s, t'_e]. Video slow playing processing is performed on the second segment [t'_s, t'_e] to obtain the third segment (the segment within the solid box in Fig. 3), which is then spliced with [t_s, t'_s] and [t'_e, t_e] to obtain a result video played at original speed-slow speed-original speed.
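Putting the pieces together, a sketch of the cover assembly of Fig. 3 might look as follows, reusing the slow_play interpolation sketch above; the function name and index conventions are illustrative. The first segment [ts, te) is split around the second segment [ss, se), which is replaced by its slow-played version.

```python
def build_cover(frames, ts, te, ss, se, factor=2):
    """frames: full frame list; [ts, te): first segment; [ss, se): second
    segment inside it. Returns the original-slow-original cover frames."""
    before = frames[ts:ss]                    # clip before the second segment
    slow = slow_play(frames[ss:se], factor)   # third segment (slow-played)
    after = frames[se:te]                     # clip after the second segment
    return before + slow + after
```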
Thus, with the acquired importance score sequence, the application can generate a high-quality highlight video clip for the video to be processed without manual participation of the user, which improves the generation efficiency of the video cover; moreover, the highlight content within the generated clip is played in slow motion, which improves the display effect of the generated video cover.
Fig. 4 is a block diagram of an apparatus for generating a video cover according to an embodiment of the present application. As shown in Fig. 4, the apparatus comprises: an acquisition unit 401, a processing unit 402, a selecting unit 403, a slow playing unit 404, a splicing unit 405, and a training unit 406.
An obtaining unit 401, configured to obtain a video to be processed and a video image sequence corresponding to the video to be processed.
The acquisition unit 401 first acquires a video to be processed and then acquires the video image sequence corresponding to it. The video to be processed may be a video from the network or a video shot by the user on a terminal.
It can be understood that the video image sequence acquired by the acquisition unit 401 is the sequence of frame images of the video; obtaining such a sequence is known in the art and is not described here again.
A processing unit 402, configured to obtain a sequence of importance scores corresponding to the sequence of video images.
The processing unit 402 acquires the importance score sequence corresponding to the video image sequence acquired by the acquisition unit 401. This sequence contains an importance score for each frame image; the score represents the contribution of that frame to the classification of the video to be processed, and the higher the score, the greater the contribution.
The importance score of each frame therefore serves as a measure of how highlight-worthy that frame is: the higher a frame's importance score, the more highlight-worthy the frame.
Specifically, the processing unit 402 may acquire the importance score sequence corresponding to the video image sequence as follows: extract features of each frame image in the video image sequence to obtain a feature sequence; then input the feature sequence into a pre-trained importance detection network and acquire the importance score sequence from the output of the network. That is, after inputting a feature sequence of length T into the importance detection network, the processing unit 402 acquires an importance score sequence of length T from it.
The processing unit 402 may extract the image features of each frame with an existing feature extractor, for example the feature extraction layers of a CNN classification model trained on the ImageNet dataset.
The importance detection network used by the processing unit 402 is trained in advance and, given the input image features, outputs for each frame an importance score indicating the frame's contribution to the classification of the video it belongs to.
The training unit 406 is configured to train in advance to obtain the importance detection network.
Specifically, when pre-training the importance detection network, the training unit 406 may proceed as follows: acquire videos and the category labeling result corresponding to each video as training samples; construct a classification model comprising a recurrent neural network and a classification network; take the feature sequence corresponding to each video as the input of the recurrent neural network, and acquire the hidden-layer feature sequence and the importance score sequence it outputs; take the matrix product of the hidden-layer feature sequence and the importance score sequence as the input of the classification network, and acquire the output of the classification network; calculate a loss function of the classification model from that output and the category labeling result of each video; and adjust the parameters of the recurrent neural network and the classification network according to the loss function until it converges, then extract the recurrent neural network from the classification model as the importance detection network.
It is understood that the recurrent neural network in the training unit 406 may be a bidirectional recurrent neural network, and the classification network may be a two-layer fully-connected network. The present application does not limit the types of recurrent neural networks and classification networks.
A selecting unit 403, configured to select, according to the importance score sequence, a first segment corresponding to the length of the highlight segment from the video to be processed.
The selecting unit 403 selects, according to the importance score sequence obtained by the processing unit 402, a first segment corresponding to the highlight segment length from the video to be processed. That is, the first segment selected by the selecting unit 403 is the stretch of the video to be processed, of highlight-segment length, with the highest importance score; and since the importance score measures how highlight-worthy each frame is, it is the most highlight-worthy stretch of that length in the video to be processed.
Specifically, when selecting the first segment corresponding to the highlight segment length from the video to be processed according to the importance score sequence, the selecting unit 403 may proceed as follows: determine a first sliding window according to the highlight segment length, where the window size corresponds to the number of images the first segment should contain; then slide this window over the video image sequence and select the image sequence with the highest importance score as the first segment.
The highlight segment length used by the selecting unit 403 is preset, for example 5 s, so that the first segment selected by the selecting unit 403 is the most highlight-worthy 5 s stretch of the video to be processed.
And a slow playing unit 404, configured to select a second segment corresponding to the length of the slow playing segment from the first segment, and obtain a third segment after performing video slow playing processing on the second segment.
The slow playing unit 404 first selects a second segment corresponding to the slow playing segment length from the first segment selected by the selecting unit 403, performs video slow playing processing on the selected second segment, and takes the processing result as the third segment. That is, the third segment obtained by the slow playing unit 404 is a slow-played version of the stretch of the first segment, of slow-playing-segment length, with the highest importance score, i.e. the most highlight-worthy stretch of that length within the first segment.
Specifically, when selecting the second segment corresponding to the slow playing segment length from the first segment, the slow playing unit 404 may proceed as follows: determine a second sliding window according to the slow playing segment length, where the window size corresponds to the number of images the second segment should contain; then slide this window over the video image sequence corresponding to the first segment and select the image sequence with the highest importance score as the second segment.
The slow playing segment length used by the slow playing unit 404 is preset, for example 2 s, so that the second segment selected by the slow playing unit 404 is the most highlight-worthy 2 s stretch of the first segment. Preferably, the slow playing segment length is smaller than the highlight segment length.
In addition, the slow playing unit 404 may perform the video slow playing processing on the second segment by image frame interpolation, and may also set the slow playing speed of the video slow playing.
A splicing unit 405, configured to splice two video clips that are adjacent to the second clip in the first clip and the third clip in sequence, and use a splicing result as a video cover of the to-be-processed video.
The splicing unit 405 sequentially splices the two video clips of the first segment (obtained by the selecting unit 403) that are adjacent to the second segment (obtained by the slow playing unit 404), before and after it, with the third segment obtained by the slow playing unit 404, and takes the splicing result as the video cover of the video to be processed. That is, the video cover obtained by the splicing unit 405 is a piece of video played in the form "original speed-slow speed-original speed".
Fig. 5 is a block diagram of an electronic device for generating a video cover according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 501, memory 502, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 501 is taken as an example.
Memory 502 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of generating a video cover provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of generating a video cover provided by the present application.
The memory 502, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for generating video covers in the embodiments of the present application (for example, the acquisition unit 401, the processing unit 402, the selecting unit 403, the slow playing unit 404, the splicing unit 405, and the training unit 406 shown in fig. 4). The processor 501 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 502, that is, implements the method of generating a video cover in the above method embodiments.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the electronic device that generates the video cover, and the like. Further, the memory 502 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 502 may optionally include memory located remotely from processor 501, which may be connected to an electronic device that generates a video cover over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of generating a video cover may further include: an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or other means, and fig. 5 illustrates the connection by a bus as an example.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device that generates the video cover; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, a high-quality highlight video clip can be generated without manual participation of the user, thereby reducing the generation cost of the video cover and improving its generation efficiency; in addition, the application plays the highlight content within the highlight clip in slow motion, further improving the display effect of the video cover.
It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; as long as the desired results of the technical solutions disclosed in the present application can be achieved, no limitation is imposed herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method of generating a video cover, the method comprising:
acquiring a video to be processed and a video image sequence corresponding to the video to be processed;
acquiring an importance score sequence corresponding to the video image sequence;
selecting a first segment corresponding to the length of a highlight segment from the video to be processed according to the importance score sequence;
selecting a second segment corresponding to the length of the slow playing segment from the first segment, and performing video slow playing processing on the second segment to obtain a third segment;
and sequentially splicing two video clips which are adjacent to the second clip in front and back of the first clip with the third clip, and taking the splicing result as a video cover of the video to be processed.
2. The method of claim 1, wherein obtaining the sequence of importance scores corresponding to the sequence of video images comprises:
extracting the characteristics of each frame of image in the video image sequence to obtain a characteristic sequence;
and inputting the characteristic sequence into an importance detection network obtained by pre-training, and acquiring an importance score sequence corresponding to the video image sequence according to an output result of the importance detection network.
3. The method of claim 2, wherein the importance detection network is pre-trained by:
acquiring each video and a category marking result corresponding to each video as a training sample;
constructing a classification model containing a recurrent neural network and a classification network;
taking the characteristic sequence corresponding to each video as the input of a recurrent neural network, and acquiring a hidden layer characteristic sequence and an importance score sequence output by the recurrent neural network;
taking a matrix multiplication result of the hidden layer characteristic sequence and the importance score sequence as the input of the classification network, and acquiring the output result of the classification network;
calculating a loss function of the classification model according to the output result and the class marking result corresponding to each video;
and adjusting parameters in the recurrent neural network and the classification network according to the loss function until the obtained loss function is converged, and extracting the recurrent neural network in the classification model as the importance detection network.
4. The method of claim 1, wherein selecting, from the video to be processed according to the importance score sequence, the first segment corresponding to the highlight segment length comprises:
determining a first sliding window according to the highlight segment length;
and sliding the first sliding window over the video image sequence, and selecting the image sequence with the highest importance score as the first segment.
5. The method of claim 1, wherein selecting, from the first segment, the second segment corresponding to the slow-play segment length comprises:
determining a second sliding window according to the slow-play segment length;
and sliding the second sliding window over the image sequence corresponding to the first segment, and selecting the image sequence with the highest importance score as the second segment.
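Claims 4 and 5 describe the same operation at two scales: slide a fixed-length window and keep the position where the importance scores sum highest. With prefix sums this is a single pass; the helper below is an illustrative sketch, not wording from the patent.

```python
import numpy as np

def best_window(scores, win):
    """Return [start, end) of the fixed-length window whose total
    importance score is highest, computed via prefix sums."""
    prefix = np.concatenate(([0.0], np.cumsum(scores)))
    window_sums = prefix[win:] - prefix[:-win]   # one total per window position
    start = int(np.argmax(window_sums))
    return start, start + win
```

The first segment would use the highlight-segment length over the whole score sequence; the second segment would apply the same helper, with the slow-play length, to the scores of the first segment.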
6. The method of claim 1, wherein performing video slow-play processing on the second segment comprises:
performing image interpolation on the second segment according to a preset slow-play speed.
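One simple reading of this step is to synthesize in-between frames at a fixed ratio. The sketch below blends neighbouring frames linearly purely for illustration; a production system would more plausibly use optical-flow or learned frame interpolation, which the claim leaves open.

```python
import numpy as np

def slow_play(frames, factor=2):
    """Stretch a clip by inserting factor - 1 interpolated frames between
    each pair of neighbours (linear blending as a stand-in interpolator)."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        out.append(a)
        for k in range(1, factor):
            t = k / factor
            blend = (1 - t) * a.astype(np.float32) + t * b.astype(np.float32)
            out.append(blend.astype(a.dtype))
    out.append(frames[-1])
    return out                                   # (len(frames)-1)*factor + 1 frames
```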
7. An apparatus for generating a video cover, the apparatus comprising:
an acquisition unit configured to acquire a video to be processed and a video image sequence corresponding to the video to be processed;
a processing unit configured to acquire an importance score sequence corresponding to the video image sequence;
a selecting unit configured to select, from the video to be processed according to the importance score sequence, a first segment corresponding to a highlight segment length;
a slow-play unit configured to select, from the first segment, a second segment corresponding to a slow-play segment length, and to obtain a third segment by performing video slow-play processing on the second segment;
and a splicing unit configured to splice, in order, the two sub-segments of the first segment that immediately precede and follow the second segment with the third segment, and to take the splicing result as the video cover of the video to be processed.
8. The apparatus of claim 7, wherein the processing unit, when acquiring the importance score sequence corresponding to the video image sequence, specifically performs:
extracting features from each frame of the video image sequence to obtain a feature sequence;
and inputting the feature sequence into a pre-trained importance detection network, and acquiring the importance score sequence corresponding to the video image sequence according to an output result of the importance detection network.
9. The apparatus of claim 8, further comprising a training unit configured to pre-train the importance detection network by:
acquiring videos and the category annotation result corresponding to each video as training samples;
constructing a classification model comprising a recurrent neural network and a classification network;
taking the feature sequence corresponding to each video as the input of the recurrent neural network, and acquiring the hidden-layer feature sequence and the importance score sequence output by the recurrent neural network;
taking the matrix product of the hidden-layer feature sequence and the importance score sequence as the input of the classification network, and acquiring the output result of the classification network;
calculating a loss function of the classification model according to the output result and the category annotation result corresponding to each video;
and adjusting parameters of the recurrent neural network and the classification network according to the loss function until the loss function converges, and extracting the recurrent neural network from the classification model as the importance detection network.
10. The apparatus of claim 7, wherein the selecting unit, when selecting the first segment corresponding to the highlight segment length from the video to be processed according to the importance score sequence, specifically performs:
determining a first sliding window according to the highlight segment length;
and sliding the first sliding window over the video image sequence, and selecting the image sequence with the highest importance score as the first segment.
11. The apparatus of claim 7, wherein the slow-play unit, when selecting the second segment corresponding to the slow-play segment length from the first segment, specifically performs:
determining a second sliding window according to the slow-play segment length;
and sliding the second sliding window over the image sequence corresponding to the first segment, and selecting the image sequence with the highest importance score as the second segment.
12. The apparatus of claim 7, wherein the slow-play unit, when performing video slow-play processing on the second segment, specifically performs:
performing image interpolation on the second segment according to a preset slow-play speed.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN202010065823.8A 2020-01-20 2020-01-20 Method and device for generating video cover, electronic equipment and computer-readable storage medium Active CN111225236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010065823.8A CN111225236B (en) 2020-01-20 2020-01-20 Method and device for generating video cover, electronic equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN111225236A true CN111225236A (en) 2020-06-02
CN111225236B CN111225236B (en) 2022-03-25

Family

ID=70828303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010065823.8A Active CN111225236B (en) 2020-01-20 2020-01-20 Method and device for generating video cover, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN111225236B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108141645A (en) * 2015-10-20 2018-06-08 Microsoft Technology Licensing LLC Video highlight detection with pairwise deep ranking
CN105469065A (en) * 2015-12-07 2016-04-06 Institute of Automation, Chinese Academy of Sciences Recurrent neural network-based discrete emotion recognition method
US20170220854A1 (en) * 2016-01-29 2017-08-03 Conduent Business Services, Llc Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action
CN108280757A (en) * 2017-02-13 2018-07-13 Tencent Technology (Shenzhen) Co Ltd User credit evaluation method and apparatus
CN107995536A (en) * 2017-11-28 2018-05-04 Baidu Online Network Technology (Beijing) Co Ltd Method, apparatus, device and computer-readable storage medium for extracting a video preview
CN108377417A (en) * 2018-01-17 2018-08-07 Baidu Online Network Technology (Beijing) Co Ltd Video review method, apparatus, computer device and storage medium
CN109121021A (en) * 2018-09-28 2019-01-01 Beijing Zhoutong Technology Co Ltd Video highlights generation method and apparatus, electronic device and storage medium
CN109522450A (en) * 2018-11-29 2019-03-26 Tencent Technology (Shenzhen) Co Ltd Video classification method and server
CN109902601A (en) * 2019-02-14 2019-06-18 Wuhan University Video object detection method combining convolutional networks and recurrent networks
CN109919087A (en) * 2019-03-06 2019-06-21 Tencent Technology (Shenzhen) Co Ltd Video classification method, and model training method and apparatus
CN110191357A (en) * 2019-06-28 2019-08-30 Beijing QIYI Century Science and Technology Co Ltd Video clip highlight degree assessment and dynamic cover generation method and apparatus
CN110602546A (en) * 2019-09-06 2019-12-20 Guangdong Oppo Mobile Telecommunications Corp Ltd Video generation method, terminal and computer-readable storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810782A (en) * 2020-06-12 2021-12-17 Alibaba Group Holding Ltd Video processing method and device, server and electronic device
CN111726685A (en) * 2020-06-28 2020-09-29 Baidu Online Network Technology (Beijing) Co Ltd Video processing method, video processing device, electronic equipment and medium
JP2022037876A (ja) * 2020-08-25 2022-03-09 Beijing Xiaomi Pinecone Electronics Co Ltd Video clip extraction method, video clip extraction device, and storage medium
JP7292325B2 (ja) 2020-08-25 2023-06-16 Beijing Xiaomi Pinecone Electronics Co Ltd Video clip extraction method, video clip extraction device and storage medium
CN112468743A (en) * 2020-11-09 2021-03-09 Hongzhunda Technology (Shanghai) Co Ltd Method, device, medium and electronic equipment for displaying hotspot change process
CN113849088A (en) * 2020-11-16 2021-12-28 Alibaba Group Holding Ltd Target picture determining method and device
WO2022188563A1 (zh) * 2021-03-10 2022-09-15 Shanghai Bilibili Technology Co Ltd Dynamic cover setting method and system
CN115086709A (zh) * 2021-03-10 2022-09-20 Shanghai Bilibili Technology Co Ltd Dynamic cover setting method and system

Also Published As

Publication number Publication date
CN111225236B (en) 2022-03-25

Similar Documents

Publication Publication Date Title
CN111225236B (en) Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN110933487B (en) Method, device and equipment for generating click video and storage medium
CN112131988B (en) Method, apparatus, device and computer storage medium for determining virtual character lip shape
EP3902280A1 (en) Short video generation method and platform, electronic device, and storage medium
EP3890294B1 (en) Method and apparatus for extracting hotspot segment from video
CN104866275B (en) Method and device for acquiring image information
CN109726712A (en) Character recognition method, device and storage medium, server
CN111770375B (en) Video processing method and device, electronic equipment and storage medium
JP7223056B2 (en) Image screening method, device, electronic device and storage medium
CN111935502A (en) Video processing method, video processing device, electronic equipment and storage medium
CN111031373A (en) Video playing method and device, electronic equipment and computer readable storage medium
CN111726682A (en) Video clip generation method, device, equipment and computer storage medium
JP7267379B2 (en) Image processing method, pre-trained model training method, device and electronic equipment
CN111970560B (en) Video acquisition method and device, electronic equipment and storage medium
US11615140B2 (en) Method and apparatus for detecting temporal action of video, electronic device and storage medium
CN111669647B (en) Real-time video processing method, device and equipment and storage medium
CN111949820B (en) Video associated interest point processing method and device and electronic equipment
CN111797801B (en) Method and apparatus for video scene analysis
CN112100530B (en) Webpage classification method and device, electronic equipment and storage medium
CN111638787A (en) Method and device for displaying information
CN111309200A (en) Method, device, equipment and storage medium for determining extended reading content
CN113542802B (en) Video transition method and device
US20220328076A1 (en) Method and apparatus of playing video, electronic device, and storage medium
CN111970559B (en) Video acquisition method and device, electronic equipment and storage medium
CN111611415B (en) Picture display method, server, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant