CN114401440A - Video clipping and clipping model generation method, apparatus, device, program, and medium - Google Patents

Video clipping and clipping model generation method, apparatus, device, program, and medium

Info

Publication number
CN114401440A
CN114401440A (application number CN202111530280.3A)
Authority
CN
China
Prior art keywords
target object
image
video
target
clip
Prior art date
Legal status
Pending (the status is an assumption, not a legal conclusion)
Application number
CN202111530280.3A
Other languages
Chinese (zh)
Inventor
洪嘉慧 (Hong Jiahui)
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111530280.3A
Publication of CN114401440A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present disclosure relates to a video clipping and clipping model generation method, apparatus, device, program, and medium. The method includes: acquiring a target object image set, the target object image set comprising at least one first region image of a target object; acquiring a video to be clipped, and obtaining, according to the at least one first region image, a plurality of target video segments in the video to be clipped that contain the target object; and, in response to a change in the image of the target object within a target video segment, generating a second region image of the target object and merging it into the target object image set. With the method and apparatus, a plurality of target video segments containing the target object can be clipped from the video to be clipped by automatically referring to the target object image set, which shortens the time consumed by video clipping and improves its operation efficiency.

Description

Video clipping and clipping model generation method, apparatus, device, program, and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular to a video clipping and clipping model generation method, apparatus, device, program, and medium.
Background
In the related art, when a user needs to clip out all video segments containing a target person from a video, the user must watch the video from beginning to end, manually mark the start frame and the end frame of each video segment in which the target person appears, and then clip each video segment based on its marked start and end frames. This way of clipping all video segments containing a target person is time-consuming and operationally inefficient.
Disclosure of Invention
The present disclosure provides a video clipping and clipping model generation method, apparatus, device, program, and medium, which at least solve the problem that, in the related art, clipping all video segments containing a target person from a video is time-consuming and operationally inefficient. The technical scheme of the disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, there is provided a video clipping and clipping model generation method, including:
acquiring a target object image set, wherein the target object image set comprises at least one first region image of a target object;
acquiring a video to be clipped, and obtaining, according to the at least one first region image, a plurality of target video segments in the video to be clipped that contain the target object;
in response to a change in an image of the target object within a target video segment, generating a second region image of the target object and merging the second region image into the target object image set.
Optionally, the at least one first region image corresponds to at least one angle or at least one morphology of the target object.
Optionally, the acquiring the target object image set includes:
and searching the target object image set based on the identification of the target object in a pre-established appearance database.
Optionally, the acquiring the target object image set includes:
acquiring at least one to-be-processed image input by a user;
identifying an object contained in each image to be processed;
determining the same target object contained in each image to be processed from the objects contained in each image to be processed;
and extracting at least one first area image of the target object from the at least one image to be processed.
Optionally, after extracting at least one first region image of the target object from the at least one image to be processed, the method further includes:
and in response to the modification operation of the user on any area image in the at least one first area image, modifying the any area image.
Optionally, after extracting at least one first region image of the target object from the at least one image to be processed, the method further includes:
determining a third area image of other angles of the target object based on the extracted at least one first area image of the target object, wherein the other angles are angles except for the angle corresponding to the at least one first area image;
merging the third region image into the target object image set.
Optionally, after extracting at least one first region image of the target object from the at least one image to be processed, the method further includes:
in response to a sharing operation on the at least one first region image, acquiring an identifier of the target object input by the user;
uploading the at least one first region image to an appearance database in association with the identifier of the target object.
Optionally, the acquiring the target object image set includes:
acquiring a fourth area image of a target object at a preset angle;
taking the fourth area image as a tracking target, and performing tracking processing on the video to be clipped to determine at least one video frame containing the target object in the video to be clipped;
at least one first region image of the target object is extracted from the at least one video frame.
Optionally, after obtaining a number of target video segments containing the target object in the video to be clipped according to the at least one first region image, the method further includes:
and outputting the target video clip.
Optionally, after outputting the target video segment, the method further comprises:
and in response to a user's deletion operation on any one of the target video segments, deleting the any one of the target video segments.
Optionally, the obtaining, according to the at least one first region image, a plurality of target video segments including the target object in the video to be clipped includes:
and inputting the at least one first area image and the video to be clipped into a pre-trained clipping model to obtain a plurality of target video segments containing the target object in the video to be clipped.
Optionally, after deleting any of the target video segments, the method further includes:
and performing optimization training on the clipping model based on the target video segment of which any video segment is deleted.
Optionally, the obtaining, according to the at least one first region image, a plurality of target video segments including the target object in the video to be clipped includes:
if no target video segment containing the target object is identified in the video to be clipped according to the at least one first region image, outputting prompt information for prompting the user that no such target video segment was identified.
Optionally, the generating a second region image of the target object includes:
generating a second region image of a target object within the target video segment in response to a manual matting operation on an image of the target object.
According to a second aspect of the embodiments of the present disclosure, there is provided a video clip and clip model generation apparatus including:
an acquisition unit configured to perform acquiring a target object image set, the target object image set comprising at least one first region image of a target object;
a clipping unit configured to acquire a video to be clipped and obtain, according to the at least one first region image, a plurality of target video segments in the video to be clipped that contain the target object;
a merging unit configured to perform generating a second region image of the target object in response to a change in an image of the target object within the target video segment, and merging the second region image into the target object image set.
Optionally, the at least one first region image corresponds to at least one angle or at least one morphology of the target object.
Optionally, the obtaining unit is configured to perform:
and searching the target object image set based on the identification of the target object in a pre-established appearance database.
Optionally, the obtaining unit is configured to perform:
acquiring at least one to-be-processed image input by a user;
identifying an object contained in each image to be processed;
determining the same target object contained in each image to be processed from the objects contained in each image to be processed;
and extracting at least one first area image of the target object from the at least one image to be processed.
Optionally, the apparatus further comprises:
a modification unit configured to modify any region image of the at least one first region image in response to a user's modification operation on that region image.
Optionally, the merging unit is further configured to perform:
determining a third area image of other angles of the target object based on the extracted at least one first area image of the target object, wherein the other angles are angles except for the angle corresponding to the at least one first area image;
merging the third region image into the target object image set.
Optionally, the apparatus further comprises a sharing unit;
the sharing unit is configured to perform:
in response to a sharing operation on the at least one first region image, acquiring an identifier of the target object input by the user;
uploading the at least one first region image to an appearance database in association with the identifier of the target object.
Optionally, the obtaining unit is configured to perform:
acquiring a fourth area image of a target object at a preset angle;
taking the fourth area image as a tracking target, and performing tracking processing on the video to be clipped to determine at least one video frame containing the target object in the video to be clipped;
at least one first region image of the target object is extracted from the at least one video frame.
Optionally, the apparatus further comprises:
an output unit configured to perform outputting the target video segment.
Optionally, the apparatus further comprises:
a deleting unit configured to delete any one of the target video segments in response to a user's deletion operation on that target video segment.
Optionally, the clipping unit is configured to perform:
and inputting the at least one first area image and the video to be clipped into a pre-trained clipping model to obtain a plurality of target video segments containing the target object in the video to be clipped.
Optionally, the apparatus further comprises:
a training unit configured to perform optimization training on the clipping model based on the target video segments that remain after the deletion.
Optionally, the clipping unit is configured to perform:
if the target video segment of the target object contained in the video to be clipped is not identified according to the at least one first area image, outputting prompt information, wherein the prompt information is used for prompting a user that the target video segment of the target object contained in the video to be clipped is not identified.
Optionally, the merging unit is configured to perform:
generating a second region image of a target object within the target video segment in response to a manual matting operation on an image of the target object.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video clipping and clipping model generation method provided by the first aspect of the embodiments of the present disclosure.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video clipping and clipping model generation method provided by the first aspect of the embodiments of the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the video clipping and clipping model generation method provided by the first aspect of the embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
by adopting the method and the device, a plurality of target video segments including the target object can be clipped from the video to be clipped by automatically referring to the target object image set of the target object. Therefore, the situation that the user sees the end from the beginning of the video to be edited all the time can be avoided, the situation that the user manually marks the starting frames and the ending frames of all the video segments of the target object to be displayed to edit the video is avoided, and further, the time consumption of video editing can be shortened and the operation efficiency of the video editing can be improved by adopting the video editing method provided by the embodiment of the disclosure.
In addition, since the target video segment is composed of a plurality of video frames containing the target object, the second region image of the target object can be extracted from the target video segment. Since the angle or shape of the target object in each video frame of the target video segment is continuously changed, the second area image is also directed to a different angle or a different shape of the target object. The second region image may be incorporated into the target object image set as a supplement to the target object image set. Therefore, when other videos to be clipped are clipped based on the regional images in the target object image set, the appearance information of the target object is more sufficient, and the accuracy of the video clipping can be further improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a video clipping and clipping model generation method in accordance with an exemplary embodiment;
FIG. 2 is an interface diagram illustrating an application for intelligently clipping videos in accordance with an illustrative embodiment;
FIG. 3 is an interface diagram illustrating an application for intelligently clipping videos in accordance with an illustrative embodiment;
FIG. 4 is an interface diagram illustrating an application for intelligently clipping videos in accordance with an illustrative embodiment;
FIG. 5 is an interface diagram illustrating an application for intelligently clipping videos in accordance with an illustrative embodiment;
FIG. 6 is an interface diagram illustrating an application for intelligently clipping videos in accordance with an illustrative embodiment;
FIG. 7 is a block diagram of a video clipping and clipping model generation apparatus shown in accordance with an exemplary embodiment;
FIG. 8 is a block diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating a video clipping and clipping model generation method according to an exemplary embodiment; the method may be implemented in an electronic device. As shown in FIG. 1, the method may include the following steps.
In step S11, a target object image set is acquired, the target object image set including at least one first region image of the target object.
In step S12, a video to be clipped is acquired, and a plurality of target video segments containing the target object in the video to be clipped are obtained according to the at least one first region image.
In step S13, in response to a change in the image of the target object within the target video segment, a second region image of the target object is generated and incorporated into the target object image set.
In the disclosed embodiment, the target object may be an object having a relatively fixed shape such as a person, an animal, an object, a scene, or the like. The first region image may be an image obtained by cutting out a region corresponding to the target object from an entire image including the target object.
Optionally, the at least one first region image corresponds to at least one angle or at least one morphology of the target object.
When an entire image containing the target object is captured by an image capturing device, images of the target object from different angles can be obtained by adjusting the relative angle between the target object and the device, and the first region images extracted from these images then correspond to different angles of the target object. Alternatively, the target object can be posed in various postures, images of the target object in different forms can be captured, and the first region images extracted from them then correspond to the different forms of the target object.
Various ways of acquiring the target object image set provided by the embodiments of the present disclosure will be explained below.
Alternatively, acquiring the target object image set may be implemented as: searching a pre-established appearance database for the target object image set based on an identifier of the target object.
It should be noted that the appearance database stores object image sets corresponding to a plurality of objects, which may be uploaded by different users; that is, the object image sets in the appearance database may be supplemented through User Generated Content (UGC) sharing. Since each object image set is stored in the appearance database in association with its object's identifier, when the target object image set of a certain target object is needed, the database can be searched using the identifier of the target object. The appearance database can be established on a server, and the user can download the retrieved target object image set to a terminal for local use.
For example, a user may search the appearance database for the name of a well-known person A. If an object image set corresponding to person A is stored in the appearance database, the database outputs it, and the user can directly download it for local use.
Optionally, the appearance database supports user modification of the object image sets stored in it: a user may delete some object image sets or modify the identifiers of some objects.
Acquiring the target object image set from the appearance database saves the user the effort of creating the set manually and greatly improves operation efficiency.
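For ease of understanding, the following Python sketch illustrates one plausible shape for such an appearance database; the disclosure does not specify an implementation, so the class name AppearanceDatabase, its methods, and the use of file paths to represent region images are assumptions of this sketch only.

```python
# Minimal sketch of the appearance-database lookup and UGC sharing described
# above. All names here are illustrative assumptions, not the disclosed API.
from typing import Dict, List, Optional


class AppearanceDatabase:
    """Maps an object identifier to its object image set (file paths here)."""

    def __init__(self) -> None:
        self._sets: Dict[str, List[str]] = {}

    def upload(self, object_id: str, region_images: List[str]) -> None:
        # Store region images in association with the object's identifier,
        # appending to any set shared earlier by other users (UGC supplement).
        self._sets.setdefault(object_id, []).extend(region_images)

    def search(self, object_id: str) -> Optional[List[str]]:
        # Return the target object image set for the identifier, or None so
        # the caller can fall back to building the set locally.
        return self._sets.get(object_id)


db = AppearanceDatabase()
db.upload("person_A", ["person_A_front.png", "person_A_left.png"])
image_set = db.search("person_A")  # downloadable to a terminal for local use
```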
Alternatively, the process of acquiring the target object image set may be implemented as: acquiring at least one to-be-processed image input by a user; identifying an object contained in each image to be processed; determining the same target object contained in each image to be processed from the objects contained in each image to be processed; at least one first region image of the target object is extracted from the at least one image to be processed.
In an embodiment of the present disclosure, if the user does not want to use a target object image set from the appearance database, or cannot find one there, the at least one first region image of the target object can be created locally.
First, the user can select several images to be processed that contain the target object from a local image library; these may show the target object from multiple angles. Since an image to be processed includes regions other than the target object, such as a background region that could interfere with subsequent steps, the first region image of the target object needs to be extracted from it.
To improve automation, the first region images of the target object in the different images to be processed can be extracted by automatic matting. During automatic matting, the terminal automatically identifies the same object contained in each image to be processed and takes it as the target object, without the user specifying which object to extract. This can be implemented as: identifying the objects contained in each image to be processed, and then determining, among them, the same target object contained in every image to be processed.
Specifically, determining the same target object contained in each image to be processed can be implemented as: calculating the similarity between objects contained in different images to be processed, and taking as the target object the object whose cross-image similarity exceeds a preset threshold.
For example, given an image to be processed A and an image to be processed B, it may be recognized that image A contains person 1 and person 2, while image B contains person 2 and person 3. The target object contained in both images is then person 2.
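The following is a hedged sketch of this same-object determination: object crops from two images are compared pairwise by embedding similarity against a preset threshold. The disclosure does not name a recognition model, so embed_object() is a stand-in for whatever feature extractor the terminal actually uses.

```python
# Sketch: determine the same target object across two images to be processed
# by thresholding pairwise similarity, as in the person-2 example above.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def find_common_object(objects_a, objects_b, embed_object, threshold=0.8):
    """objects_a / objects_b: detected object crops from two images.
    embed_object: callable mapping a crop to a feature vector (assumed).
    Returns the first pair judged to be the same target object, or None."""
    for obj_a in objects_a:
        for obj_b in objects_b:
            sim = cosine_similarity(embed_object(obj_a), embed_object(obj_b))
            if sim > threshold:
                return obj_a, obj_b  # e.g. person 2 appearing in both images
    return None
```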
Since the at least one first region image is obtained by automatic matting, the result may contain some errors. To eliminate them, optionally, after extracting at least one first region image of the target object from the at least one image to be processed, the method provided by the embodiments of the present disclosure may further include: in response to a user's modification operation on any region image of the at least one first region image, modifying that region image.
In practice, after automatic matting, the terminal can show the matting result to the user; if the user is unsatisfied with a particular result, it can be adjusted manually. For example, the terminal highlights the automatically matted first region image of the target object, and the user can drag the outer edge of the highlighted portion to adjust the matted region.
Notably, if first region images of the target object can be obtained from sufficiently many angles, the appearance information of the target object is more complete, which benefits the subsequent steps. Accordingly, after extracting at least one first region image of the target object from the at least one image to be processed, the method provided by the embodiments of the present disclosure may further include: determining third region images of the target object at other angles based on the extracted at least one first region image, the other angles being angles other than those corresponding to the at least one first region image; and merging the third region images into the target object image set.
In this scheme, third region images of the target object at other angles can be generated automatically from the extracted first region images, supplementing and enriching the original appearance information of the target object.
In practical application, a three-dimensional model of the target object may be established based on the extracted at least one first region image of the target object, and then third region images of other angles of the target object may be generated based on the three-dimensional model.
Alternatively, after the at least one first region image of the target object is produced, it may be added to the target object image set, and the set may then be uploaded to the appearance database as a supplement, so that other users can download it for direct use. Accordingly, the method provided by the embodiments of the present disclosure may further include: in response to a sharing operation on the at least one first region image, acquiring an identifier of the target object input by the user; and uploading the at least one first region image to the appearance database in association with the identifier of the target object.
In the embodiments of the present disclosure, a video to be clipped corresponding to the target object may also be obtained. For example, if the target object is a well-known actor, the video to be clipped may be a film or television work featuring that actor, a live recording of an event they attended, or the like.
In some application scenarios, the user cannot obtain first region images of the target object from a sufficient number of angles and only has a fourth region image of the target object at a preset angle. For example, the user may only have an image of the front of a certain person's face, from which only a fourth region image of that front view can be extracted. Because the appearance information is then insufficient, and third region images at other angles are difficult to derive from a single fourth region image at one preset angle, another way of obtaining sufficient appearance information is needed. In this case, the appearance information of the target object can be supplemented from the video to be clipped itself.
Based on this, optionally, the process of acquiring the target object image set may be implemented as: acquiring a fourth area image of a target object at a preset angle; taking the fourth area image as a tracking target, and tracking the video to be clipped to determine at least one video frame containing the target object in the video to be clipped; at least one first region image of the target object is extracted from at least one video frame.
It can be understood that some video segments in the video to be clipped contain video frames corresponding to the target object, and therefore, the fourth area image of the preset angle of the target object can be used as a tracking target to track the video to be clipped, so that at least one video frame containing the target object in the video to be clipped can be determined, and then at least one first area image of the target object can be extracted from the at least one video frame.
For example, a fourth region image of the front of a certain person may be used as the tracking target, and tracking may find the first video segment in which the target object appears in the video to be clipped, composed of 5 consecutive video frames. These 5 frames can be extracted and automatically matted to obtain 5 first region images of the target object, one from each frame.
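As an illustration of the tracking step, the sketch below scores each frame of the video to be clipped against the fourth region image. The disclosure does not specify the tracking algorithm; OpenCV template matching is used here purely as an assumed, simple stand-in.

```python
# Sketch: use the fourth region image (a front view at a preset angle) as the
# tracking target and collect the video frames likely containing the target.
import cv2


def frames_containing_target(video_path: str, template_path: str,
                             score_thresh: float = 0.7):
    template = cv2.imread(template_path)
    cap = cv2.VideoCapture(video_path)
    hits, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Normalized cross-correlation: a high peak suggests the target
        # object appears somewhere in this frame.
        result = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, _ = cv2.minMaxLoc(result)
        if max_val >= score_thresh:
            hits.append(idx)  # candidate frame containing the target object
        idx += 1
    cap.release()
    return hits
```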
In any of the above ways, at least one first region image of the target object can be acquired. Once the at least one first region image and the video to be clipped have been acquired, a plurality of target video segments containing the target object can be obtained from the video to be clipped according to the at least one first region image.
Optionally, this can be implemented as: inputting the at least one first region image and the video to be clipped into a pre-trained clipping model to obtain a plurality of target video segments in the video to be clipped that contain the target object.
By means of the clipping model, several video segments containing the target object can be clipped out of the video to be clipped automatically, with reference to the at least one first region image of the target object. The user therefore no longer needs to watch the video to be clipped from beginning to end, nor to manually mark the start and end frames of every video segment in which the target object appears, so the clipping approach provided by the embodiments of the present disclosure improves the operation efficiency of video clipping.
If, as described above, third region images of the target object have been supplemented in addition to the at least one first region image, the first region images, the third region images, and the video to be clipped can be input into the clipping model together to obtain the target video segments containing the target object. The more comprehensively the input region images cover the angles of the target object, the more accurately the clipping model clips the target video segments containing it.
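The disclosure does not describe the clipping model's internals, but one plausible post-processing step, sketched below as an assumption, is to turn per-frame detections into (start, end) segments, which is exactly what manual marking of start and end frames used to produce.

```python
# Sketch: group runs of consecutive frames containing the target object into
# target video segments. hit_frames is assumed sorted and non-empty (the
# empty case corresponds to the prompt-information branch described later).
def frames_to_segments(hit_frames, max_gap=0):
    """Returns [(start_frame, end_frame), ...] target video segments."""
    segments = []
    start = prev = hit_frames[0]
    for f in hit_frames[1:]:
        if f - prev > max_gap + 1:      # gap too large: close current segment
            segments.append((start, prev))
            start = f
        prev = f
    segments.append((start, prev))
    return segments


print(frames_to_segments([3, 4, 5, 6, 7, 20, 21, 22]))  # [(3, 7), (20, 22)]
```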
After the target video segments containing the target object have been identified in the video to be clipped, second region images of the target object can be generated in response to changes of the target object's image within the target video segments, and merged into the target object image set.
It can be understood that a plurality of target video segments containing the target object can be clipped automatically from the video to be clipped based on the first region images. Since a target video segment is composed of a plurality of video frames containing the target object, second region images of the target object can be extracted from it. Because the angle or form of the target object changes continuously across these frames, the second region images capture different angles or forms of the target object and can be merged into the target object image set as a supplement. Consequently, when other videos to be clipped are subsequently clipped based on the region images in the target object image set, the appearance information of the target object is more complete, which further improves clipping accuracy.
Alternatively, the process of generating the second region image of the target object may be implemented as: in response to a manual matting operation on an image of a target object within a target video segment, a second region image of the target object is generated.
Since regions other than the target object exist in each video frame of the target video segment, the second region image of the target object can be extracted from each frame; specifically, the user can manually matte out the second region image of the target object from each video frame of the target video segment.
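A minimal sketch of this manual matting step follows, assuming the user's matting operation yields a boolean mask over the frame; the mask representation and the non-empty-mask assumption are choices of this sketch, not of the disclosure.

```python
# Sketch: generate a second region image from a manual matting operation and
# merge it into the target object image set as a supplement.
import numpy as np


def second_region_image(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """frame: H x W x 3 video frame; mask: H x W boolean matte drawn by the
    user (assumed non-empty). Returns the cropped target-object region."""
    ys, xs = np.where(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    cropped = frame[y0:y1, x0:x1].copy()
    # Zero out background pixels so only the target object region remains.
    cropped[~mask[y0:y1, x0:x1]] = 0
    return cropped


target_object_image_set = []  # the set being supplemented
# target_object_image_set.append(second_region_image(frame, mask))
```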
After the target video segments containing the target object have been identified in the video to be clipped, they can also be output, for example by presenting them to the user. Optionally, the user may label the recognition result. Labeling can be implemented as: in response to a user's deletion operation on any one of the target video segments, deleting that target video segment.
For example, suppose a movie jointly features characters 1, 2, 3, …, n, and the user needs to clip all target video segments of the movie containing character 2. If the clipping model outputs 5 video segments, these 5 segments can be presented to the user. If video segments 1, 3, 4, and 5 contain character 2 while video segment 2 contains only character 3 and character 2 never appears, video segment 2 is an erroneous output, and the user may delete it from the 5 output segments.
If the video to be clipped contains no video segment of the target object, the clipping result is empty. Optionally, obtaining the target video segments containing the target object according to the at least one first region image may also be implemented as: if no target video segment containing the target object is identified in the video to be clipped according to the at least one first region image, outputting prompt information for prompting the user that no such target video segment was identified.
Optionally, to improve the accuracy with which the clipping model intelligently clips videos, the clipping model may be optimization-trained based on the target video segments that remain after the user's deletions.
It can be understood that the user's manual annotation of the model's output yields correctly labeled target video segments containing the target object, which can be used as positive samples to optimization-train the clipping model and thereby improve its clipping accuracy.
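As an illustration only, the sketch below turns the user's annotation into training samples: kept segments become positives and deleted segments negatives. The fine_tune() hook is hypothetical; the disclosure only states that annotated segments are used for optimization training.

```python
# Sketch: build training data for optimizing the clipping model from the
# user's manual annotation (deletion) of the model's output segments.
def build_training_samples(output_segments, deleted_segments):
    deleted = set(deleted_segments)
    return [(seg, 0 if seg in deleted else 1) for seg in output_segments]


samples = build_training_samples(
    output_segments=[(10, 55), (80, 95), (120, 200), (230, 260), (300, 340)],
    deleted_segments=[(80, 95)],  # the segment where character 2 never appears
)
# clipping_model.fine_tune(samples)  # hypothetical optimization-training hook
```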
For ease of understanding, a specific implementation of the video clipping and clipping model generation method provided above is exemplified in the following application scenario.
Suppose the user needs to clip video segments containing person 1 from the video to be clipped. FIG. 2 is an interface schematic of an application for intelligently clipping videos. The interface includes a preview window in which the video to be clipped can be played, with a playing progress bar below it for checking playback progress, as well as fast-forward, rewind, and pause controls for controlling playback of the video to be clipped.
The interface shown in FIG. 2 also provides a search dialog in which the user can enter the name of an object whose region images need to be downloaded from the appearance database. Below the search dialog, an operation control for importing images from an album may be provided, near which preview images and identifiers of some popular objects can be shown for direct selection.
In response to selection of this control, as shown in FIG. 3, an album list may pop up containing thumbnails of a plurality of images, from which the user can select the images to be processed that need to be imported. In the example shown in FIG. 3, the user has selected 4 photographs of person 1 at angles A, B, C, and D as the images to be processed. After selecting the images to import, the user can click a completion button at the bottom of the interface.
After the user clicks the completion button, the interface jumps to that shown in FIG. 4 and enters the automatic matting step. A preview window shows the image to be processed that currently needs matting. Below the preview window is an intelligent matting control; when the user selects it, the current image to be processed is matted automatically. Beside it is a modification control; when the user selects it, the matting result can be adjusted on top of the intelligent matting.
After intelligent matting and any modification of the matting result are complete, the intelligent video clipping application can clip the video segments containing person 1 from the video to be clipped, based on the finally confirmed region images of person 1 at angles A, B, C, and D. The application can also show the intelligent clipping result in the interface shown in FIG. 5. In the example of FIG. 5, 4 video segments were clipped from the video to be clipped; person 1 does not appear in the 2nd segment, so the user can delete it, and the interface jumps to FIG. 6.
It should be noted that the interfaces shown in FIG. 5 and FIG. 6 may also provide a control for adding the current frame. With it, the user can select the currently playing video frame of the video to be clipped, matte it, and add it to the appearance information of person 1, improving the accuracy of intelligent clipping. In addition, a modification control may be provided, with which the user can modify the input region images of person 1 at different angles, further improving clipping accuracy.
With the method and apparatus, a plurality of target video segments containing the target object can be clipped from the video to be clipped by automatically referring to the target object image set of the target object. The user therefore no longer needs to watch the video to be clipped from beginning to end, nor to manually mark the start and end frames of every video segment in which the target object appears. Accordingly, the video clipping method provided by the embodiments of the present disclosure shortens the time consumed by video clipping and improves its operation efficiency.
In addition, since a target video segment is composed of a plurality of video frames containing the target object, second region images of the target object can be extracted from the target video segment. Because the angle or form of the target object changes continuously across the video frames of the target video segment, the second region images capture different angles or forms of the target object. They can be merged into the target object image set as a supplement, so that when other videos to be clipped are later clipped based on the region images in the set, the appearance information of the target object is more complete, which further improves the accuracy of video clipping.
FIG. 7 is a block diagram illustrating a video clipping and clipping model generation apparatus in accordance with an exemplary embodiment. Referring to FIG. 7, the apparatus includes:
an acquisition unit 71 configured to perform acquiring a target object image set, the target object image set comprising at least one first region image of a target object;
a clipping unit 72 configured to acquire a video to be clipped and obtain, according to the at least one first region image, a plurality of target video segments in the video to be clipped that contain the target object;
a merging unit 73 configured to perform generating a second region image of the target object in response to a change in an image of the target object within the target video segment, and merging the second region image into the target object image set.
Optionally, the at least one first region image corresponds to at least one angle or at least one morphology of the target object.
Optionally, the obtaining unit 71 is configured to perform:
searching a pre-established appearance database for the target object image set based on an identifier of the target object.
Optionally, the obtaining unit 71 is configured to perform:
acquiring at least one to-be-processed image input by a user;
identifying an object contained in each image to be processed;
determining the same target object contained in each image to be processed from the objects contained in each image to be processed;
and extracting at least one first area image of the target object from the at least one image to be processed.
Optionally, the apparatus further comprises:
a modification unit configured to modify any region image of the at least one first region image in response to a user's modification operation on that region image.
Optionally, the merging unit 73 is further configured to perform:
determining a third area image of other angles of the target object based on the extracted at least one first area image of the target object, wherein the other angles are angles except for the angle corresponding to the at least one first area image;
merging the third region image into the target object image set.
Optionally, the apparatus further comprises a sharing unit;
the sharing unit is configured to perform:
in response to a sharing operation on the at least one first region image, acquiring an identifier of the target object input by the user;
uploading the at least one first region image to an appearance database in association with the identifier of the target object.
Optionally, the obtaining unit 71 is configured to perform:
acquiring a fourth area image of a target object at a preset angle;
taking the fourth area image as a tracking target, and performing tracking processing on the video to be clipped to determine at least one video frame containing the target object in the video to be clipped;
at least one first region image of the target object is extracted from the at least one video frame.
Optionally, the apparatus further comprises:
an output unit configured to perform outputting the target video segment.
Optionally, the apparatus further comprises:
a deleting unit configured to delete any one of the target video segments in response to a user's deletion operation on that target video segment.
Optionally, the clipping unit 72 is configured to perform:
and inputting the at least one first area image and the video to be clipped into a pre-trained clipping model to obtain a plurality of target video segments containing the target object in the video to be clipped.
Optionally, the apparatus further comprises:
a training unit configured to perform optimization training on the clipping model based on the target video segments that remain after the deletion.
Optionally, the clipping unit 72 is configured to perform:
if the target video segment of the target object contained in the video to be clipped is not identified according to the at least one first area image, outputting prompt information, wherein the prompt information is used for prompting a user that the target video segment of the target object contained in the video to be clipped is not identified.
Optionally, the merging unit 73 is configured to perform:
generating a second region image of a target object within the target video segment in response to a manual matting operation on an image of the target object.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In one possible design, the structure of the video clipping and clipping model generation apparatus shown in FIG. 7 may be implemented as an electronic device, which, as shown in FIG. 8, may include: a processor 91 and a memory 92. The memory 92 stores executable code which, when executed by the processor 91, causes the processor 91 to implement at least the video clipping and clipping model generation method provided in the foregoing embodiments of FIGS. 1 to 6.
Optionally, the electronic device may further include a communication interface 93 for communicating with other devices.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory 92 comprising instructions, executable by the processor 91 of the electronic device to perform the method described above is also provided. Alternatively, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, there is also provided a computer program product comprising computer programs/instructions which, when executed by the processor 91, implement the video clipping and clipping model generation methods provided in the foregoing embodiments illustrated in FIGS. 1 to 6.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A video clipping and clipping model generation method, comprising:
acquiring a target object image set, wherein the target object image set comprises at least one first area image of a target object;
acquiring a video to be clipped, and obtaining, according to the at least one first region image, a plurality of target video segments in the video to be clipped that contain the target object;
in response to a change in an image of a target object within the target video segment, generating a second region image of the target object and merging the second region image into the set of target object images.
2. The method of claim 1, wherein the at least one first region image corresponds to at least one angle or at least one morphology of the target object.
3. The method of claim 1, wherein said obtaining a target object image set comprises:
searching a pre-established appearance database for the target object image set based on an identifier of the target object.
4. The method of claim 1, wherein said obtaining a target object image set comprises:
acquiring at least one to-be-processed image input by a user;
identifying an object contained in each image to be processed;
determining the same target object contained in each image to be processed from the objects contained in each image to be processed;
and extracting at least one first area image of the target object from the at least one image to be processed.
5. The method according to claim 4, wherein after extracting at least one first region image of the target object from the at least one image to be processed, the method further comprises:
in response to a user's modification operation on any region image of the at least one first region image, modifying that region image.
6. The method according to claim 4, wherein after extracting at least one first region image of the target object from the at least one image to be processed, the method further comprises:
determining a third area image of other angles of the target object based on the extracted at least one first area image of the target object, wherein the other angles are angles except for the angle corresponding to the at least one first area image;
merging the third region image into the target object image set.
7. A video clipping and clipping model generation apparatus, comprising:
an acquisition unit configured to perform acquiring a target object image set, the target object image set comprising at least one first region image of a target object;
a clipping unit configured to acquire a video to be clipped and obtain, according to the at least one first region image, a plurality of target video segments in the video to be clipped that contain the target object;
a merging unit configured to perform generating a second region image of the target object in response to a change in an image of the target object within the target video segment, and merging the second region image into the target object image set.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the video clipping and clipping model generation method of any of claims 1-6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video clipping and clipping model generation method of any of claims 1-6.
10. A computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the video clipping and clipping model generation method of any of claims 1-6.
CN202111530280.3A 2021-12-14 2021-12-14 Video clipping and clipping model generation method, apparatus, device, program, and medium Pending CN114401440A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111530280.3A CN114401440A (en) 2021-12-14 2021-12-14 Video clipping and clipping model generation method, apparatus, device, program, and medium


Publications (1)

Publication Number Publication Date
CN114401440A (en) 2022-04-26

Family

ID=81227386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111530280.3A Pending CN114401440A (en) 2021-12-14 2021-12-14 Video clipping and clipping model generation method, apparatus, device, program, and medium

Country Status (1)

Country Link
CN (1) CN114401440A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007121111A (en) * 2005-10-27 2007-05-17 Mitsubishi Heavy Ind Ltd Target identifying technique using synthetic aperture radar image and device therof
CN106534967A (en) * 2016-10-25 2017-03-22 司马大大(北京)智能系统有限公司 Video editing method and device
CN110691202A (en) * 2019-08-28 2020-01-14 咪咕文化科技有限公司 Video editing method, device and computer storage medium
CN111460219A (en) * 2020-04-01 2020-07-28 百度在线网络技术(北京)有限公司 Video processing method and device and short video platform
CN111476059A (en) * 2019-01-23 2020-07-31 北京奇虎科技有限公司 Target detection method and device, computer equipment and storage medium
CN111586474A (en) * 2020-05-21 2020-08-25 口碑(上海)信息技术有限公司 Live video processing method and device
US20200410241A1 (en) * 2019-06-28 2020-12-31 Nvidia Corporation Unsupervised classification of gameplay video using machine learning models
JP2021039740A (en) * 2019-09-02 2021-03-11 株式会社Nttドコモ Pedestrian re-identification device and method
WO2021056450A1 (en) * 2019-09-27 2021-04-01 深圳市汇顶科技股份有限公司 Method for updating image template, device, and storage medium
CN112801004A (en) * 2021-02-05 2021-05-14 网易(杭州)网络有限公司 Method, device and equipment for screening video clips and storage medium
CN112800805A (en) * 2019-10-28 2021-05-14 上海哔哩哔哩科技有限公司 Video editing method, system, computer device and computer storage medium
CN113286173A (en) * 2021-05-19 2021-08-20 北京沃东天骏信息技术有限公司 Video editing method and device
CN113709384A (en) * 2021-03-04 2021-11-26 腾讯科技(深圳)有限公司 Video editing method based on deep learning, related equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination