CN110121105B - Clip video generation method and device - Google Patents

Clip video generation method and device

Info

Publication number
CN110121105B
CN110121105B (application CN201810119047.8A)
Authority
CN
China
Prior art keywords
video
alternative
shot
shooting object
clipped
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810119047.8A
Other languages
Chinese (zh)
Other versions
CN110121105A (en)
Inventor
狄杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Youku Culture Technology Beijing Co ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd
Priority to CN201810119047.8A
Publication of CN110121105A
Application granted
Publication of CN110121105B
Status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages
    • H04N21/8113Monomedia components thereof involving special audio data, e.g. different tracks for different languages comprising music, e.g. song in MP3 format
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278Subtitling

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present disclosure relates to a clip video generation method and apparatus, the method comprising: performing image recognition on an original video, and determining the shooting objects in the original video according to the recognition result; determining the frames in which each shooting object appears in the original video; capturing an alternative video of each shooting object from the original video according to the determined frames; and obtaining the shooting object clip video of the original video according to the alternative videos of the shooting objects. A shooting object clip video generated by this method can quickly and objectively represent the original video.

Description

Clip video generation method and device
Technical Field
The present disclosure relates to the field of video production, and in particular, to a method and an apparatus for generating a clip video.
Background
In order to promote the content of a video, a clip video focusing on particular shooting objects is generally required for video production. When, for example, a new movie or television series is about to be released, early promotion is usually performed using a trailer (clip video) of the movie or series. At present, when such a clip is needed (for example, a clip of a particular performer in a certain video), a worker manually browses the cast list, manually locates the segments in which each shooting object appears, and then uses a local editing tool to manually splice the performance segments of each shooting object in the film. For massive video libraries, this approach is obviously inefficient. Moreover, segments of more famous stars tend to be clipped at greater length, which introduces more subjective human factors.
Disclosure of Invention
In view of this, the present disclosure provides a clip video generating method and apparatus, so as to solve the problem that a manually generated clip video of a shooting object is inefficient to produce and subjective.
According to an aspect of the present disclosure, there is provided a clip video generating method including:
carrying out image recognition on an original video, and determining a shooting object in the original video according to a recognition result;
determining frames of the shot objects in the original video according to the determined shot objects;
according to the determined frames of the shot objects, capturing alternative videos of the shot objects from the original video;
and obtaining the shooting object clip video of the original video according to the alternative video of each shooting object.
In one possible implementation manner, intercepting the alternative video of each of the photographic subjects in the original video includes:
and intercepting alternative videos of the shot objects in the original video, wherein the alternative videos of the shot objects comprise frames in which the shot objects appear.
In one possible implementation manner, obtaining a subject clip video of the original video according to the candidate video of each subject includes:
determining a video to be edited of each shooting object in the alternative videos of each shooting object according to the display weight of the shooting object in the alternative videos of the shooting object;
and obtaining the shooting object clipping video of the original video according to the video to be clipped of each shooting object.
In one possible implementation manner, determining a video to be edited of each photographic subject in the alternative videos of the photographic subject according to a display weight of the photographic subject in the alternative videos of the photographic subject includes:
determining display weight of an image of a shooting object in a frame picture of an alternative video of the shooting object;
and determining the video to be edited of each shooting object in the alternative videos of each shooting object according to the display weight and the weight threshold.
In one possible implementation manner, determining a video to be edited of each photographic subject in the alternative videos of the photographic subject according to a display weight of the photographic subject in the alternative videos of the photographic subject includes:
and determining the video to be edited of each shooting object in the alternative videos of each shooting object according to the display weight and definition of the shooting object in the alternative videos of the shooting object.
In one possible implementation, the method further includes:
and adding editing information into the shot object clipped video to obtain the shot object edited video of the original video, wherein the editing information comprises one or any combination of background music information, image-text special effect information and subtitle text information.
In one possible implementation, the method further includes:
and obtaining a shot object clip video of the shot object according to the alternative video of the shot object intercepted from the plurality of original videos.
According to another aspect of the present disclosure, there is provided a clip video generating apparatus including:
the shot object identification module is used for carrying out image identification on the original video and determining a shot object in the original video according to an identification result;
the shooting object frame determining module is used for determining frames of all the shooting objects in the original video according to the determined shooting objects;
the alternative video acquisition module is used for intercepting alternative videos of the shot objects in the original video according to the determined frames of the shot objects;
and the clip video acquisition module is used for obtaining the clipped video of the shooting object of the original video according to the alternative video of each shooting object.
In one possible implementation manner, the alternative video obtaining module includes:
and the alternative video acquisition sub-module is used for intercepting alternative videos of the shot objects in the original video, wherein the alternative videos of the shot objects comprise frames in which the shot objects appear.
In one possible implementation, the clip video obtaining module includes:
the to-be-clipped video acquisition sub-module is used for determining to-be-clipped videos of the shot objects in the alternative videos of the shot objects according to the display weights of the shot objects in the alternative videos of the shot objects;
and the clip video acquisition sub-module is used for obtaining the clip video of the shooting object of the original video according to the video to be clipped of each shooting object.
In a possible implementation manner, the to-be-clipped video obtaining sub-module includes:
the weight determining submodule is used for determining the display weight of the image of the shot object in the frame picture of the alternative video of the shot object;
and the first video to be clipped submodule is used for determining the video to be clipped of each shooting object in the alternative videos of each shooting object according to the display weight and the weight threshold value.
In a possible implementation manner, the to-be-clipped video obtaining sub-module includes:
and the second video sub-module to be clipped is used for determining the video to be clipped of each shooting object in the alternative videos of each shooting object according to the display weight and the definition of the shooting object in the alternative videos of the shooting object.
In one possible implementation, the apparatus further includes:
and the editing information module is used for adding editing information into the shot object clipped video to obtain the shot object edited video of the original video, wherein the editing information comprises one or any combination of background music information, image-text special effect information and subtitle text information.
In one possible implementation, the apparatus further includes:
and the shot object clip video acquisition module is used for acquiring the shot object clip video of the shot object according to the alternative video of the shot object intercepted from the plurality of original videos.
According to another aspect of the present disclosure, there is provided a clip video generating apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the above clip video generation method.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps described in the above-described clip video generation method.
In the method, after image recognition is carried out on an original video, shooting objects in the original video are determined, frames of the shooting objects and alternative videos of the shooting objects are obtained according to the determined shooting objects, and finally the shooting object clip video of the original video is obtained by using the alternative videos of the shooting objects. Therefore, the main contents played by each shooting object can be completely and accurately obtained from the original video, and the requirement of acquiring the contents played by any shooting object from the original video is met.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a clip video generation method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a clip video generation method according to another embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a clip video generation method according to another embodiment of the present disclosure;
FIG. 4 shows a flow diagram of a clip video generation method according to another embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a clip video generation method according to another embodiment of the present disclosure;
FIG. 6 shows a flow diagram of a clip video generation method according to another embodiment of the present disclosure;
FIG. 7 shows a flow diagram of a clip video generation method according to another embodiment of the present disclosure;
fig. 8 illustrates an application example diagram of a clip video generation method according to an embodiment of the present disclosure;
fig. 9 shows a block diagram of a clip video generation apparatus according to an embodiment of the present disclosure;
fig. 10 shows a block diagram of a clip video generation apparatus according to another embodiment of the present disclosure;
fig. 11 is a block diagram illustrating an apparatus for clip video generation according to an example embodiment.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 illustrates a flowchart of a clip video generation method according to an embodiment of the present disclosure, as illustrated in fig. 1, the clip video generation method including:
and step S10, carrying out image recognition on the original video, and determining the shooting object in the original video according to the recognition result.
The photographic subject includes a person, an animal, and the like in the original video. Image recognition can identify each photographic subject based on the subject's feature information. After an image or video stream containing a photographic subject is acquired by a camera or video camera, the subject in the image is detected and the face is tracked using artificial intelligence techniques, and the detected subject is then identified. Image recognition includes portrait recognition, facial recognition, and the like. For example, a neural network model based on image recognition can quickly and accurately identify the participating photographic subjects in a video.
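Step S10 can be sketched as follows. This is an illustrative Python sketch, not taken from the patent: it assumes per-frame recognition labels have already been produced by some portrait/face recognition model (not shown), and merely aggregates them into the set of shooting objects, filtering out spurious single-frame detections.

```python
from collections import Counter

def determine_subjects(frame_detections, min_frames=3):
    """Aggregate per-frame recognition results into the set of shooting
    objects that reliably appear in the original video.

    frame_detections: list of lists of subject labels, one inner list per
    frame, as produced by a recognition model. A subject is kept only if it
    is recognized in at least `min_frames` frames, which filters out
    spurious single-frame detections.
    """
    counts = Counter(label for frame in frame_detections for label in frame)
    return {label for label, n in counts.items() if n >= min_frames}
```

For example, a subject seen in only two frames of a long video would be dropped with `min_frames=3`; the threshold is an assumed tuning parameter, not part of the disclosure.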
Step S20, determining, in the original video, a frame in which each of the photographic subjects appears, based on the determined photographic subjects.
In one possible implementation, when determining the frames in which each photographic subject appears according to the recognized subjects, either one frame or a plurality of frames may be determined in each video segment in which the subject appears continuously.
For example, in one original video, the photographic subject a appears in the video segment 1, the video segment 2, and the video segment 3, respectively. One frame (or five frames) of the photographic subject a in the video segment 1 is determined as the frame 1 in which the photographic subject a appears; one frame (or five frames) in the video segment 2 is determined as a frame 2 in which the photographic subject a appears; one frame (or five frames) in the video segment 3 is determined as the frame 3 in which the photographic subject a appears.
Further, according to the result of image recognition, the proportion of the image of the photographic subject in the whole frame image is calculated in one frame image in which the photographic subject appears, and when the calculated proportion is larger than a set threshold value, the frame image is determined as the frame in which the photographic subject appears.
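The proportion test just described can be sketched in Python. This is a minimal illustration under the assumption that the recognition step supplies a bounding box for the subject's image in the frame; the threshold value is arbitrary, not from the patent.

```python
def is_appearance_frame(subject_box, frame_width, frame_height, threshold=0.2):
    """Decide whether a frame counts as a frame in which the subject
    appears: the subject's bounding box must occupy more than `threshold`
    of the whole frame area. subject_box is (x, y, w, h) in pixels."""
    x, y, w, h = subject_box
    proportion = (w * h) / (frame_width * frame_height)
    return proportion > threshold
```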
And step S30, capturing alternative videos of the respective photographic objects from the original video according to the determined frames where the respective photographic objects appear.
In a possible implementation manner, the extracted alternative video of each photographic subject can embody main content of each photographic subject showing in the original video. Alternative videos of different durations may be extracted depending on the purpose of the clipping. For example, a candidate video of 10 seconds length including a frame in which each of the photographic subjects appears is extracted, or a candidate video of 1 second length including a frame in which each of the photographic subjects appears is extracted.
Since there are a plurality of frames in which each subject appears extracted from one original video, there are a plurality of candidate videos of the subject extracted from the frames in which each subject appears. For example, the extraction time period of the candidate video of the photographic subject a is 20 seconds. And extracting an alternative video 1 of the photographic subject A according to the frame 1 of the occurrence of the photographic subject A, wherein the alternative video 1 is a video clip with the time range of 00:01:29-00:01:49 in the original video. And extracting an alternative video 2 of the photographic subject A according to the frame 2 in which the photographic subject A appears, wherein the alternative video 2 is a video clip with the time range of 00:15:19-00:15:39 in the original video. And extracting an alternative video 3 of the photographic subject A according to the frame 3 in which the photographic subject A appears, wherein the alternative video 3 is a video clip with the time range of 00:28:29-00:28:49 in the original video.
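The extraction of fixed-length alternative videos from appearance frames can be sketched as follows, assuming each appearance frame has already been mapped to a timestamp in seconds (e.g. 00:01:29 is 89 s). The function and its clamping behavior are illustrative, not specified by the patent.

```python
def candidate_windows(appearance_times, duration=20.0, video_length=None):
    """Turn each timestamp (seconds) at which the subject appears into a
    candidate-video window of fixed `duration`, starting at the appearance
    frame and clamped to the end of the video if `video_length` is given."""
    windows = []
    for t in appearance_times:
        end = t + duration
        if video_length is not None:
            end = min(end, video_length)
        windows.append((t, end))
    return windows
```

With `duration=20.0`, an appearance at 89 s yields the window (89.0, 109.0), matching the 00:01:29-00:01:49 example above.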
And step S40, obtaining the clipped video of the original video according to the alternative video of each shooting object.
In one possible implementation, editing is the process of selecting, dividing, decomposing, and assembling video segments to obtain a complete edited video. Because the alternative videos of the shooting objects embody the main content that each shooting object presents in the original video, the shooting object clip video formed by clipping these alternative videos can completely and accurately embody that content. For example, after the alternative video 1, alternative video 2, and alternative video 3 of the photographic subject A are clipped, a clip video of subject A with a duration of 1 minute is generated. Similarly, when the original video further includes the photographic subjects B and C, the alternative videos of subjects A, B, and C are clipped together to obtain the shooting object clip video of the original video.
In this embodiment, after image recognition is performed on an original video, a shooting object in the original video is determined, a frame where each shooting object appears and a candidate video of each shooting object are obtained according to the determined shooting object, and finally a shooting object clip video of the original video is obtained by using the candidate video of each shooting object. The alternative video of each shooting object, including the frame of each shooting object, can embody the main contents of each shooting object in the original video, so the edited video of the shooting object can embody the original video rapidly and objectively.
Fig. 2 shows a flowchart of a clip video generation method according to another embodiment of the present disclosure, and as shown in fig. 2, differs from the above-described embodiment in that step S30 in the method includes:
and step S31, intercepting alternative videos of each shooting object in the original video, wherein the alternative videos of the shooting objects comprise frames in which the shooting objects appear.
When the alternative videos are captured from the original video, alternative videos of the same set duration can be captured for different shooting objects. Alternatively, according to the importance of each shooting object in the original video, alternative videos of different durations can be captured for different shooting objects. For example, if subject A plays a leading role and subject B plays a supporting role, an alternative video with a set duration of 20 seconds is captured for subject A and one with a set duration of 10 seconds for subject B. The set duration may also be shorter when the alternative video is located in the first third of the original video, and longer when it is located in the middle or last third. When the actual duration of the frames in which the shooting object appears is less than the set duration, the actual frames of the shooting object are used as the alternative video.
If the set duration is N seconds, an alternative video of the shooting object with a duration of N seconds is extracted from the original video, and it can be captured in several ways. For example, with the frame in which the shooting object appears as the starting frame of the alternative video, an N-second video segment is extracted from the original video. Alternatively, with that frame as the end frame of the alternative video, an N-second video segment is extracted. Alternatively, with that frame as the center of the alternative video, N/2 seconds on each side of the frame are extracted from the original video as the alternative video.
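The three interception strategies above can be sketched in one small function. This is an illustrative implementation under assumed names; times are in seconds, and the result is clamped to the video bounds.

```python
def clip_bounds(t, n, mode="start", video_length=None):
    """Compute the candidate-video time range around an appearance frame
    at time t (seconds), for a set length of n seconds, under the three
    strategies described above: the appearance frame as the start frame,
    as the end frame, or as the center of the alternative video."""
    if mode == "start":
        lo, hi = t, t + n
    elif mode == "end":
        lo, hi = t - n, t
    elif mode == "center":
        lo, hi = t - n / 2, t + n / 2
    else:
        raise ValueError(f"unknown mode: {mode}")
    lo = max(lo, 0.0)                  # clamp to the start of the video
    if video_length is not None:
        hi = min(hi, video_length)     # clamp to the end of the video
    return lo, hi
```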
Fig. 3 shows a flowchart of a clip video generation method according to another embodiment of the present disclosure, and as shown in fig. 3, differs from the above-described embodiment in that step S40 in the method includes:
step S41, determining a video to be clipped of each of the photographic subjects in the candidate videos of each of the photographic subjects according to the display weight of the photographic subject in the candidate videos of the photographic subject.
And step S42, obtaining the clipped video of the shooting object of the original video according to the video to be clipped of each shooting object.
In one possible implementation, multiple subjects are typically present in a shot scene of the original video, with only one or two primary subjects, and other subjects present may be unrelated to the scene content. Therefore, in the candidate video where the photographic subject is located, the degree of contribution of the photographic subject can be represented by the display weight of each photographic subject in the scene. In actual use, it is preferable to select an alternative video in which the photographic subject appears alone in the frame image, and to select an alternative video in which the image of the photographic subject is dominant in the frame image. For example, it is preferable to capture alternative videos in which the proportion of the image of the subject in the frame image is more than fifty percent. The display weight may be adjusted according to the requirements of the clip. And determining the video to be clipped of each shooting object according to the display weight of the shooting object, so that the content in the video to be clipped is the content really related to the shooting object, and the final clipping result is more accurate.
Fig. 4 shows a flowchart of a clip video generation method according to another embodiment of the present disclosure, and as shown in fig. 4, the difference from the above-described embodiment is that step S41 in the method includes:
in step S411, a display weight of the image of the photographic subject in the frame of the candidate video of the photographic subject is determined.
Step S412, determining the video to be clipped of each shooting object in the alternative video of each shooting object according to the display weight and the weight threshold.
In one possible implementation manner, the display weight of the photographic subject can be obtained by calculating the weight proportion of the image of the photographic subject in the image of the candidate video. And then, comparing the calculated weight proportion with a set weight threshold, and determining the candidate video of the shooting object with the weight proportion larger than the weight threshold as the video to be edited of the shooting object. For example, the weight ratio of the photographic subject a in the candidate video 1 is 0.8, the weight ratio in the candidate video 2 is 0.5, the weight ratio in the candidate video 3 is 0.7, and the weight threshold is 0.6. And determining the alternative video 1 and the alternative video 3 as the video to be edited of the shooting object A because the weight proportion of the shooting object A in the alternative video 1 and the alternative video 3 is greater than the weight threshold value.
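The threshold comparison in this step reduces to a simple filter. The sketch below is illustrative (the dict-based representation is an assumption, not from the patent) and reproduces the worked example: weights 0.8, 0.5, and 0.7 against a threshold of 0.6.

```python
def videos_to_clip(weight_by_candidate, weight_threshold):
    """Select the alternative videos whose display-weight proportion for
    the shooting object exceeds the weight threshold; these become the
    videos to be clipped for that subject."""
    return [name for name, w in weight_by_candidate.items()
            if w > weight_threshold]

weights = {"alternative video 1": 0.8,
           "alternative video 2": 0.5,
           "alternative video 3": 0.7}
selected = videos_to_clip(weights, 0.6)
# alternative videos 1 and 3 pass the 0.6 threshold; video 2 does not
```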
Fig. 5 shows a flowchart of a clip video generation method according to another embodiment of the present disclosure, and as shown in fig. 5, differs from the above-described embodiment in that the method step S41 further includes:
step S413, determining a video to be clipped of each of the photographic subjects in the alternative videos of each of the photographic subjects according to the display weight and definition of the photographic subjects in the alternative videos of the photographic subjects.
When the videos to be clipped of a shooting object are determined according to the display weight, the definition (sharpness) of the subject in the alternative videos is also considered: alternative videos whose definition does not meet the set standard are removed before the videos to be clipped of the shooting object are assembled, so that the resulting videos to be clipped have high image quality and a good clipping effect.
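Combining both criteria can be sketched as a joint filter. The representation is assumed: each candidate carries a display weight and a definition score computed elsewhere (one common sharpness measure is the variance of the image Laplacian, though the patent does not specify one).

```python
def filter_by_weight_and_definition(candidates, weight_threshold,
                                    definition_threshold):
    """candidates: list of (name, display_weight, definition_score)
    tuples. A candidate becomes a video to be clipped only if it passes
    both the display-weight threshold and the definition standard."""
    return [name for name, w, d in candidates
            if w > weight_threshold and d >= definition_threshold]
```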
Fig. 6 shows a flowchart of a clip video generation method according to another embodiment of the present disclosure. As shown in fig. 6, this embodiment differs from the embodiment above in that the method further includes:
Step S50: add editing information to the shot object clipped video to obtain the shot object edited video of the original video, where the editing information includes one of, or any combination of, background music information, image-text special effect information, and subtitle text information.
The editing information may also be any other information that needs to be added to the clipped video at a later stage in an actual application scenario; this is not limited in the present disclosure. Adding editing information such as background music, image-text special effects, and subtitles gives the shot object clipped video a better viewing effect. The editing information may also include a watermark in the form of text or an image.
Depending on the purpose of the clipping, all of the alternative videos of the shooting objects may be clipped, or only a selected subset of them.
In one possible implementation, to improve the efficiency of adding editing information, templates for generating editing information are provided. Different video effects can be achieved by selecting different templates according to different requirements. The editing information may be extracted from the original video according to a template, or entered manually according to the template.
For example, the subtitle text added to the clipped video via a template may be text extracted from the subtitles of the original video that correspond to the selected frame segments, or content entered by an editor. Using various templates, the text added to the clipped video can be bolded, enlarged, given special effects, and so on.
As another example, the background music added to the clipped video via a template may be a music piece extracted from the opening, the ending, or other related audio of the original video, or other audio recorded or selected by an editor.
As a further example, a template may apply special-effect processing to the clipped video, such as enlarging a character's face, or apply effects such as color changes and shattering to text added to the clipped video.
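The template idea above can be sketched as a small data structure whose fields map onto the kinds of editing information described. Every field name and default below is hypothetical — the disclosure does not specify a template format, and the actual rendering of music, effects, and subtitles is out of scope.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical editing-information template; field names are invented.
@dataclass
class EditTemplate:
    background_music: Optional[str] = None            # audio extracted or chosen by an editor
    subtitle_style: dict = field(default_factory=lambda: {"bold": True, "scale": 1.2})
    effects: List[str] = field(default_factory=list)  # e.g. ["enlarge_face", "shatter_text"]

def apply_template(clip_metadata: dict, template: EditTemplate) -> dict:
    """Return a copy of the clip's metadata with the template's editing
    information attached; actual media processing is not modeled here."""
    edited = dict(clip_metadata)
    if template.background_music is not None:
        edited["background_music"] = template.background_music
    edited["subtitle_style"] = dict(template.subtitle_style)
    edited["effects"] = list(template.effects)
    return edited

template = EditTemplate(background_music="opening_theme.mp3",
                        effects=["enlarge_face"])
edited = apply_template({"subject": "star_1"}, template)
print(edited["background_music"])  # opening_theme.mp3
```

Swapping in a different `EditTemplate` instance is what "selecting a different template" would amount to in this sketch.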
Fig. 7 shows a flowchart of a clip video generation method according to another embodiment of the present disclosure. As shown in fig. 7, this embodiment differs from the embodiments above in that the method further includes:
Step S60: obtain a shot object clipped video of a shooting object according to the alternative videos of that shooting object intercepted from a plurality of original videos.
In one possible implementation, when one shooting object appears in a plurality of original videos, the alternative videos of that shooting object intercepted from the plurality of original videos may be clipped together into a single shot object clipped video. For example, shooting object A appears in both original video 1 and original video 2; alternative videos a and b of shooting object A are intercepted from original video 1, and alternative videos c, d, and e are intercepted from original video 2. Alternative videos a, b, c, d, and e are then clipped together to obtain the shot object clipped video of shooting object A.
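Step S60 reduces to gathering one shooting object's alternative videos across all original videos before clipping. The data layout below is invented for illustration; the IDs mirror the a–e example.

```python
# Sketch of step S60: collect the alternative videos of one shooting object
# that were intercepted from several original videos, so they can be
# clipped together into a single shot object clipped video.

def collect_alternatives(originals, shooting_object):
    """originals maps original-video ID -> {shooting object: [alternative-video IDs]}."""
    collected = []
    for per_object in originals.values():
        collected.extend(per_object.get(shooting_object, []))
    return collected

# Shooting object A from the example: clips a-b from original 1, c-e from original 2.
originals = {
    "original_video_1": {"A": ["alt_a", "alt_b"]},
    "original_video_2": {"A": ["alt_c", "alt_d", "alt_e"], "B": ["alt_f"]},
}
print(collect_alternatives(originals, "A"))
# ['alt_a', 'alt_b', 'alt_c', 'alt_d', 'alt_e'] -- clipped together into A's video
```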
Application example:
Fig. 8 illustrates a flowchart of a clip video generation method according to another embodiment of the present disclosure. As illustrated in fig. 8, the method includes:
Step 1: using image recognition, a plurality of shooting objects, namely star 1, star 2, star 3, and star 4, are recognized in the original video on the left, and the frames in which each shooting object appears are determined. The alternative videos of each shooting object are then intercepted from the original video according to those frames. As shown in fig. 8, the second column from the left shows the alternative videos of star 1, star 2, star 3, and star 4, respectively.
Step 2: determine the video to be clipped of each shooting object from its alternative videos, for example according to the display weight of the shooting object.
Step 3: clip automatically according to the video to be clipped of each shooting object, to obtain the shot object clipped video of the original video.
Step 4: add editing information such as background music, image-text special effects, and subtitle files while producing the shot object clipped video of the original video, to obtain the final film.
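The four steps above can be strung together as a toy pipeline. Image recognition (step 1) is replaced here by a pre-built lookup table, and clipping (step 3) by string concatenation; both are stand-ins, and all names are invented.

```python
# Toy end-to-end sketch of the four application-example steps.

def clip_video_pipeline(alternatives_by_object, weight_threshold=0.6):
    results = {}
    for shooting_object, alternatives in alternatives_by_object.items():
        # Step 2: keep alternatives whose display weight exceeds the threshold.
        to_clip = [alt["id"] for alt in alternatives
                   if alt["display_weight"] > weight_threshold]
        # Step 3: "clip" the selected alternatives together (stand-in).
        clipped = "+".join(to_clip)
        # Step 4: attach editing information to the clipped video.
        results[shooting_object] = {
            "video": clipped,
            "editing_info": ["background_music", "text_effects", "subtitles"],
        }
    return results

# Step 1 output, assumed already produced by image recognition:
recognized = {
    "star_1": [{"id": "s1_clip1", "display_weight": 0.8},
               {"id": "s1_clip2", "display_weight": 0.4}],
    "star_2": [{"id": "s2_clip1", "display_weight": 0.7}],
}
films = clip_video_pipeline(recognized)
print(films["star_1"]["video"])  # s1_clip1
```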
Fig. 9 illustrates a block diagram of a clip video generating apparatus according to an embodiment of the present disclosure, as illustrated in fig. 9, the apparatus including:
a shot object identification module 41, configured to perform image identification on an original video, and determine a shot object in the original video according to an identification result;
a shooting object frame determining module 42, configured to determine, according to the determined shooting objects, frames in the original video in which the shooting objects appear;
an alternative video obtaining module 43, configured to intercept, in the original video, an alternative video of each of the shot objects according to the determined frame where each of the shot objects appears;
and the clip video acquisition module 44 is configured to obtain a subject clip video of the original video according to the alternative video of each subject.
Fig. 10 shows a block diagram of a clip video generating apparatus according to another embodiment of the present disclosure, as shown in fig. 10, in one possible implementation, the alternative video obtaining module 43 includes:
and the alternative video acquisition submodule 431 is used for intercepting alternative videos of all the shot objects with set time length in the original video, wherein the alternative videos of the shot objects comprise frames where the shot objects appear.
In one possible implementation, the clip video capture module 44 includes:
the to-be-clipped video acquisition sub-module 441 is configured to determine, according to the display weight of the photographic subject in the alternative videos of the photographic subject, a to-be-clipped video of each photographic subject in the alternative videos of each photographic subject;
the clipped video obtaining sub-module 442 is configured to obtain a clipped video of the shooting object of the original video according to the video to be clipped of each shooting object.
In a possible implementation manner, the to-be-clipped video obtaining sub-module 441 includes:
the weight determining submodule is used for determining the display weight of the image of the shot object in the frame picture of the alternative video of the shot object;
and the first video to be clipped submodule is used for determining the video to be clipped of each shooting object in the alternative videos of each shooting object according to the display weight and the weight threshold value.
In a possible implementation manner, the to-be-clipped video obtaining sub-module 441 includes:
and the second video sub-module to be clipped is used for determining the video to be clipped of each shooting object in the alternative videos of each shooting object according to the display weight and the definition of the shooting object in the alternative videos of the shooting object.
In one possible implementation, the apparatus further includes:
and an editing information module 45, configured to add editing information to the shot object clip video to obtain a shot object editing video of the original video, where the editing information includes one or any combination of background music information, text-text special effect information, and subtitle text information.
In one possible implementation, the apparatus further includes:
and a subject clip video acquiring module 46, configured to obtain a subject clip video of the subject according to the candidate video of the subject captured in the plurality of original videos.
Fig. 11 is a block diagram illustrating an apparatus 1900 for clip video generation according to an example embodiment. For example, the apparatus 1900 may be provided as a server. Referring to FIG. 11, the device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by the processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The device 1900 may also include a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the apparatus 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can execute the computer-readable program instructions and thereby implement aspects of the present disclosure, by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (14)

1. A clip video generation method, the method comprising:
carrying out image recognition on an original video, and determining a shooting object in the original video according to a recognition result;
determining frames of the shot objects in the original video according to the determined shot objects;
according to the determined frames of the shot objects, capturing alternative videos of the shot objects from the original video;
obtaining a shot object clip video of the original video according to the alternative videos of the shot objects, wherein the steps comprise: determining a video to be edited of each shooting object in the alternative videos of each shooting object according to the display weight of the shooting object in the alternative videos of the shooting object; and obtaining the clipped video of the shooting object of the original video according to the video to be clipped of each shooting object, wherein the display weight is used for representing the weight proportion of the image of the shooting object in the image of the alternative video.
2. The method of claim 1, wherein intercepting the alternative video for each of the subjects in the original video comprises:
and intercepting alternative videos of the shot objects in the original video, wherein the alternative videos of the shot objects comprise frames in which the shot objects appear.
3. The method of claim 1, wherein determining the video to be edited of each photographic subject in the alternative videos of each photographic subject according to the display weight of the photographic subject in the alternative videos of the photographic subject comprises:
determining display weight of an image of a shooting object in a frame picture of an alternative video of the shooting object;
and determining the video to be edited of each shooting object in the alternative videos of each shooting object according to the display weight and the weight threshold.
4. The method of claim 1, wherein determining the video to be edited of each photographic subject in the alternative videos of each photographic subject according to the display weight of the photographic subject in the alternative videos of the photographic subject comprises:
and determining the video to be edited of each shooting object in the alternative videos of each shooting object according to the display weight and definition of the shooting object in the alternative videos of the shooting object.
5. The method of claim 1, further comprising:
and adding editing information into the shot object clipped video to obtain the shot object edited video of the original video, wherein the editing information comprises one or any combination of background music information, image-text special effect information and subtitle text information.
6. The method of claim 1, further comprising:
and obtaining a shot object clip video of the shot object according to the alternative video of the shot object intercepted from the plurality of original videos.
7. A clip video generation apparatus, characterized in that the apparatus comprises:
the shot object identification module is used for carrying out image identification on the original video and determining a shot object in the original video according to an identification result;
the shooting object frame determining module is used for determining frames of all the shooting objects in the original video according to the determined shooting objects;
the alternative video acquisition module is used for intercepting alternative videos of the shot objects in the original video according to the determined frames of the shot objects;
a clip video acquisition module, configured to obtain a subject clip video of the original video according to the candidate video of each subject, where the clip video acquisition module includes: the to-be-clipped video acquisition sub-module is used for determining to-be-clipped videos of the shot objects in the alternative videos of the shot objects according to the display weights of the shot objects in the alternative videos of the shot objects; and the clip video acquisition sub-module is used for obtaining the clip video of the shooting object of the original video according to the video to be clipped of each shooting object, wherein the display weight is used for representing the weight proportion of the image of the shooting object in the image of the alternative video.
8. The apparatus of claim 7, wherein the alternative video acquisition module comprises:
and the alternative video acquisition sub-module is used for intercepting alternative videos of the shot objects in the original video, wherein the alternative videos of the shot objects comprise frames in which the shot objects appear.
9. The apparatus of claim 7, wherein the to-be-clipped video acquisition sub-module comprises:
the weight determining submodule is used for determining the display weight of the image of the shot object in the frame picture of the alternative video of the shot object;
and the first video to be clipped submodule is used for determining the video to be clipped of each shooting object in the alternative videos of each shooting object according to the display weight and the weight threshold value.
10. The apparatus of claim 7, wherein the to-be-clipped video acquisition sub-module comprises:
and the second video sub-module to be clipped is used for determining the video to be clipped of each shooting object in the alternative videos of each shooting object according to the display weight and the definition of the shooting object in the alternative videos of the shooting object.
11. The apparatus of claim 7, further comprising:
and the editing information module is used for adding editing information into the shot object clipped video to obtain the shot object edited video of the original video, wherein the editing information comprises one or any combination of background music information, image-text special effect information and subtitle text information.
12. The apparatus of claim 7, further comprising:
and the shot object clip video acquisition module is used for acquiring the shot object clip video of the shot object according to the alternative video of the shot object intercepted from the plurality of original videos.
13. A clip video generation apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: performing the method of any one of claims 1 to 6.
14. A non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any of claims 1 to 6.
CN201810119047.8A 2018-02-06 2018-02-06 Clip video generation method and device Active CN110121105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810119047.8A CN110121105B (en) 2018-02-06 2018-02-06 Clip video generation method and device


Publications (2)

Publication Number Publication Date
CN110121105A CN110121105A (en) 2019-08-13
CN110121105B true CN110121105B (en) 2022-04-29

Family

ID=67519974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810119047.8A Active CN110121105B (en) 2018-02-06 2018-02-06 Clip video generation method and device

Country Status (1)

Country Link
CN (1) CN110121105B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110545408B (en) * 2019-09-06 2021-01-26 苏州凌犀物联网技术有限公司 Intelligent manufacturing display system and method based on intelligent service platform
CN110855904B (en) * 2019-11-26 2021-10-01 Oppo广东移动通信有限公司 Video processing method, electronic device and storage medium
CN110996112A (en) * 2019-12-05 2020-04-10 成都市喜爱科技有限公司 Video editing method, device, server and storage medium
CN112135046B (en) * 2020-09-23 2022-06-28 维沃移动通信有限公司 Video shooting method, video shooting device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103535023A (en) * 2011-05-18 2014-01-22 高智83基金会有限责任公司 Video summary including a particular person
CN104796781A (en) * 2015-03-31 2015-07-22 小米科技有限责任公司 Video clip extraction method and device
CN105493512A (en) * 2014-12-14 2016-04-13 深圳市大疆创新科技有限公司 Video processing method, video processing device and display device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113766161B (en) * 2014-12-14 2023-06-20 深圳市大疆创新科技有限公司 Video processing method and system



Similar Documents

Publication Publication Date Title
CN109803180B (en) Video preview generation method and device, computer equipment and storage medium
CN112954450B (en) Video processing method and device, electronic equipment and storage medium
CN109218629B (en) Video generation method, storage medium and device
US10657379B2 (en) Method and system for using semantic-segmentation for automatically generating effects and transitions in video productions
WO2019042341A1 (en) Video editing method and device
CN111464833A (en) Target image generation method, target image generation device, medium, and electronic apparatus
CN110832583A (en) System and method for generating a summary storyboard from a plurality of image frames
CN110121104A (en) Video clipping method and device
US10897658B1 (en) Techniques for annotating media content
CN114222196A (en) Method and device for generating short video of plot commentary and electronic equipment
US20150035835A1 (en) Enhanced video description
CN108960130B (en) Intelligent video file processing method and device
CN117851639A (en) Video processing method, device, electronic equipment and storage medium
CN110582021B (en) Information processing method and device, electronic equipment and storage medium
CN111881734A (en) Method and device for automatically intercepting target video
CN113691835B (en) Video implantation method, device, equipment and computer readable storage medium
CN110312171B (en) Video clip extraction method and device
US11490170B2 (en) Method for processing video, electronic device, and storage medium
CN117197308A (en) Digital person driving method, digital person driving apparatus, and storage medium
CN108924588B (en) Subtitle display method and device
CN114299428A (en) Cross-media video character recognition method and system
JP2014229092A (en) Image processing device, image processing method and program therefor
KR20160038375A (en) Contents creation apparatus and method for operating the contents creation apparatus
CN109151523B (en) Multimedia content acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200521

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: Alibaba (China) Co.,Ltd.

Address before: 200241 room 1162, building 555, Dongchuan Road, Shanghai, Minhang District

Applicant before: SHANGHAI QUAN TOODOU CULTURAL COMMUNICATION Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240628

Address after: 101400 Room 201, 9 Fengxiang East Street, Yangsong Town, Huairou District, Beijing

Patentee after: Youku Culture Technology (Beijing) Co.,Ltd.

Country or region after: China

Address before: 310052 room 508, 5th floor, building 4, No. 699 Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee before: Alibaba (China) Co.,Ltd.

Country or region before: China