CN114125474A - Video clip generation method and device and electronic equipment

Info

Publication number: CN114125474A
Application number: CN202010892789.1A
Authority: CN (China)
Prior art keywords: video, information, meta, annotation, generating
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 杨文强, 赵普, 吴鹏冲
Current Assignee: Ximei Technology Beijing Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Ximei Technology Beijing Co ltd
Application filed by Ximei Technology Beijing Co ltd
Priority to CN202010892789.1A; publication of CN114125474A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21: Server components or server architectures
    • H04N21/218: Source of audio or video content, e.g. local disk arrays
    • H04N21/2187: Live feed
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/432: Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N21/4325: Content retrieval operation from a local storage medium, e.g. hard-disk, by playing back content from the storage medium
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456: Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the disclosure discloses a video clip generation method, a video clip generation device, electronic equipment and a computer-readable storage medium. The video clip generation method comprises the following steps: acquiring a first video; receiving an annotation instruction to annotate a meta-object of the first video to obtain annotation information of the meta-object, wherein the meta-object represents an annotation object of the first video content; acquiring the combined information of the annotation information, wherein the combined information of the annotation information corresponds to an event in the video; and generating at least one video clip from the first video according to the combination information, wherein the at least one video clip contains the event. By combining the annotation information, the method solves the technical problem in the prior art that obtaining specific video content requires a large amount of manual editing and searching.

Description

Video clip generation method and device and electronic equipment
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a method and an apparatus for generating a video clip, an electronic device, and a computer-readable storage medium.
Background
In recent years, with the rapid development of the Internet and multimedia technology, the amount of multimedia information data has increased dramatically. Digital video, as a main carrier of multimedia information, is widely used in many aspects of life. A large number of videos bring convenience to daily life and allow information to be acquired in richer forms.
In some scenes, a user needs to obtain specific content in a video. For example, when a surveillance video captures a person performing certain specific actions, the user has to watch the video from beginning to end to obtain the relevant video clip. Similarly, for sports match videos, coaches often use the match video to guide players on tactics, and referees review the match through the match video and analyze their refereeing performance throughout the game. At present, when a coach or a referee reviews a video to analyze a match and finds a passage that needs detailed analysis, the relevant video segments are cut out with common tools on the market, such as various video editing tools, and stored on a storage device for the coach, the referee or the players to use. The current solution has a number of drawbacks: 1. a great deal of manual work is required to clip the required video segments; 2. the required video clips cannot be obtained quickly; 3. the clipped videos are relatively fixed, and after the same video is clipped into multiple video segments, the required storage space increases because the video content may be duplicated.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In order to solve the technical problems in the prior art, the embodiments of the present disclosure provide the following technical solutions.
In a first aspect, an embodiment of the present disclosure provides a method for generating a video segment, including:
acquiring a first video;
receiving an annotation instruction to annotate a meta-object of the first video to obtain annotation information of the meta-object, wherein the meta-object represents an annotation object of the first video content;
acquiring the combined information of the annotation information, wherein the combined information of the annotation information corresponds to an event in the video;
and generating at least one video clip from the first video according to the combination information, wherein the at least one video clip contains the event.
In a second aspect, an embodiment of the present disclosure provides a video segment generating apparatus, including:
the video acquisition module is used for acquiring a first video;
the annotation information acquisition module is used for receiving an annotation instruction to annotate the meta-object of the first video to obtain annotation information of the meta-object, wherein the meta-object represents an annotation object of the first video content;
the combined information acquisition module is used for acquiring the combined information of the annotation information, wherein the combined information of the annotation information corresponds to an event in the video;
and the video clip generating module is used for generating at least one video clip from the first video according to the combination information, wherein the at least one video clip comprises the event.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the preceding first aspects.
In a fourth aspect, the present disclosure provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method of any one of the foregoing first aspects.
The embodiment of the disclosure discloses a video clip generation method, a video clip generation device, electronic equipment and a computer-readable storage medium. The video clip generation method comprises the following steps: acquiring a first video; receiving an annotation instruction to annotate a meta-object of the first video to obtain annotation information of the meta-object, wherein the meta-object represents an annotation object of the first video content; acquiring the combined information of the annotation information, wherein the combined information of the annotation information corresponds to an event in the video; and generating at least one video clip from the first video according to the combination information, wherein the at least one video clip contains the event. By combining the annotation information, the method solves the technical problem in the prior art that obtaining specific video content requires a large amount of manual editing and searching.
The foregoing is a summary of the present disclosure; the present disclosure may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic flow chart of a video segment generating method according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart illustrating the process of acquiring annotation information in the video segment generating method according to the embodiment of the disclosure;
fig. 3 is a schematic flowchart of further acquiring annotation information in the video segment generating method according to the embodiment of the disclosure;
fig. 4 is a schematic flowchart of dividing an active video segment and an inactive video segment in a video segment generating method according to an embodiment of the present disclosure;
fig. 5 is another schematic flow chart of dividing an active video segment and an inactive video segment in the video segment generating method provided by the embodiment of the present disclosure;
fig. 6 is a schematic flowchart of acquiring combination information in a video segment generating method according to an embodiment of the present disclosure;
fig. 7 is another schematic flow chart of acquiring combination information in a video segment generating method according to an embodiment of the present disclosure;
fig. 8 is a schematic flowchart of generating a video segment in a video segment generating method provided by an embodiment of the present disclosure;
fig. 9 is a schematic view of a specific application scenario of a video segment generation method according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of an embodiment of a video segment generating apparatus according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart of an embodiment of a video segment generating method provided in this disclosure, where the video segment generating method provided in this embodiment may be executed by a video segment generating apparatus, and the video segment generating apparatus may be implemented as software, or implemented as a combination of software and hardware, and the video segment generating apparatus may be integrated in a certain device in a video segment generating system, such as a video segment generating terminal device. As shown in fig. 1, the method comprises the steps of:
step S101, acquiring a first video;
in the present disclosure, the first video may be any video. Typically as a video file stored in local or network storage. The first video may be represented by one video file or there may be multiple video files representing the same content.
Alternatively, the video clip generation method in the present disclosure may be executed by software or an application running on a terminal, and the software or application provides a human-computer interaction interface to acquire the first video. Illustratively, the human-computer interaction interface includes an import button; when the import button is triggered, a first interface is displayed to receive a local storage address or a network storage address of the video file, and when the local storage address or the network storage address is received, the corresponding first video is obtained from that local or network storage location. Illustratively, the first video is a football match video, which may be a recording of a broadcast signal or multiple videos captured at multiple angles from multiple camera positions at the match venue.
Optionally, the acquiring the first video further includes acquiring a video through an image sensor. In this optional embodiment, the first video may be a video file captured and saved by an image sensor, such as a segment of video recorded by a user through a camera of the terminal device or a real-time video captured by the user through the camera of the terminal device.
It can be understood that the above-mentioned method for acquiring the first video and the specific form of the first video are both examples, and do not limit the present disclosure, and any other method for acquiring the first video in a video form may be applied to the technical solution of the present disclosure, and is not described herein again.
Step S102, receiving a labeling instruction to label a meta-object of the first video to obtain labeling information of the meta-object, wherein the meta-object represents a labeling object of the first video content;
in the present disclosure, a meta-object is included in the first video, the meta-object representing an annotatable object in the first video content. Such as objects, people, sounds in the video. Illustratively, the first video is a video of a soccer game, and the meta-objects include one or more of a team, a team member's clothing, a court, a referee, a soccer ball, and the like.
The annotation instruction may be received from a human-computer interaction interface or from an annotation model. Illustratively, a human-computer interaction interface is provided, and through options or input fields on the human-computer interaction interface, a user can define and annotate the meta-objects in the video; illustratively, a pre-trained annotation model is provided that can annotate meta-objects of a predetermined type. The basic annotation objects in the first video are annotated through the annotation instruction to obtain the annotation information of the video.
Optionally, the step S102 includes:
step S201, acquiring a plurality of meta-objects in the first video;
step S202, receiving a labeling instruction to label the attributes of the plurality of meta-objects to obtain labeling information of the attributes of the plurality of meta-objects.
In this alternative embodiment, a plurality of meta-objects in the first video are obtained first. The meta-objects may be preset; for example, in a football match video scene, the meta-objects may be preset as teams, players, referees, the court, etc., and provided to the user as options on a human-computer interaction interface to receive the user's selection signal. Alternatively, a plurality of meta-objects in the first video are identified through a recognition model and presented to the user for annotation. Then, an annotation instruction is received to annotate the attributes of the plurality of meta-objects to obtain annotation information of the attributes of the plurality of meta-objects, where the attributes of a meta-object include features describing the meta-object and the like. For example, if the meta-object is a team, its attributes are the team name, the team's attacking direction, and so on; if the meta-object is a player, its attributes are the player's jersey color, the player's name, and so on; if the meta-object is the court, its attributes are the size of the court, the boundary positions, the goal positions, the penalty area positions, and so on.
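For illustration only, the following Python sketch shows one possible way such attribute annotations might be represented; the class, field names and the football-specific attribute values are assumptions of this example and are not prescribed by the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class MetaObjectAnnotation:
    """Annotation information for one meta-object of the first video (illustrative only)."""
    object_type: str                     # e.g. "team", "player", "court", "referee", "ball"
    attributes: dict = field(default_factory=dict)

# Hypothetical attribute annotations for a football match video
annotations = [
    MetaObjectAnnotation("team",   {"name": "Team A", "attack_direction": "left_to_right"}),
    MetaObjectAnnotation("player", {"name": "Player 9", "jersey_color": "red", "team": "Team A"}),
    MetaObjectAnnotation("court",  {"size": "105x68m", "goal_positions": ["left", "right"]}),
]
```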
It can be understood that the above method for obtaining the meta-object and obtaining the annotation information of the meta-object is only an example, and does not limit the present disclosure, and in practical applications, for videos in different scenes, the meta-object may be different, and details are not described here.
Optionally, after the step S202, the method further includes:
step S301, dividing the first video into an active video segment and an inactive video segment according to the plurality of meta-objects;
step S302, performing frame extraction on the effective video segment to obtain a plurality of video frames;
step S303, identifying the activity information and the time information of a plurality of meta-objects in each of the plurality of video frames;
step S304, the activity information and the time information are used as the labeling information of the meta-object.
Optionally, the step S301 includes:
step S401, dividing video segments corresponding to video frames in which the number of first meta-objects in the first video is less than a first threshold into invalid video segments;
step S402, dividing video segments corresponding to video frames in which the number of the first meta-objects in the first video is greater than or equal to a first threshold into effective video segments.
Illustratively, the first video is a football match video and the first meta-object is a player. In a football match, the game may be paused or interrupted for various reasons, and the players then leave the pitch to rest on the sideline. Therefore, whether the match is in a normally ongoing state can be judged by counting the number of players within the bounds of the court: if the number of players within the court is less than the first threshold, the match is in a paused or interrupted state, and the corresponding video segment is an invalid video segment; this segment can be removed from the first video and is not used when video clips are generated subsequently. If the number of players within the court is greater than or equal to the first threshold, the corresponding video segment is a valid video segment. In this way, the amount of data for subsequently generated video clips can be reduced, and the first video can be automatically trimmed according to the invalid and valid video segments, keeping only the valid video segments to reduce the size of the first video.
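A minimal Python sketch of this threshold-based division is given below; the per-frame object counts are assumed to come from an upstream detector, and the function name and data layout are illustrative assumptions rather than the disclosure's implementation.

```python
def split_by_object_count(frame_object_counts, first_threshold):
    """Split frame indices into valid/invalid segments by the number of first
    meta-objects (e.g. players on the pitch) counted in each frame.
    frame_object_counts: non-empty list of per-frame counts.
    Returns two lists of (start_frame, end_frame) tuples. Illustrative sketch only."""
    valid, invalid = [], []
    start = 0
    current_valid = frame_object_counts[0] >= first_threshold
    for i, count in enumerate(frame_object_counts[1:], start=1):
        is_valid = count >= first_threshold
        if is_valid != current_valid:                       # segment boundary
            (valid if current_valid else invalid).append((start, i - 1))
            start, current_valid = i, is_valid
    (valid if current_valid else invalid).append((start, len(frame_object_counts) - 1))
    return valid, invalid
```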
Optionally, the step S301 includes:
step S501, identifying a second object in the first video;
step S502, determining the occurrence frequency of the second object in the first video;
step S503, the video segment between the second object appearing at the n-th time and the n + 1-th time is taken as an invalid video segment;
step S504, the video segment which is not between the second object appearing at the nth time and the (n + 1) th time is taken as an effective video segment; wherein n is an odd number greater than 0 or n is an even number greater than or equal to 0.
Optionally, the second object is a meta-object indicating whether the video content is valid. In one example, the first video is a football match video and the second object is the referee's whistle; the referee's whistle in the first video needs to be recognized first, and a whistle generally indicates a foul in the game, the end of the game, or the like. The second object may be identified by using a pre-trained second object recognition model, or by direct annotation by the user, which is not described herein again. The second object may also comprise a plurality of types, such as a foul whistle, an end-of-game whistle, etc.
Determining a number of times a second object appears in the first video after identifying the second object; illustratively, each time the second object occurs, the time of occurrence of the second object or the video frame sequence number is stored by an array, with the array index indicating the number of times the second object occurs.
Then, the video segment between the n-th and (n+1)-th occurrences of the second object is taken as an invalid video segment, and a video segment that is not between the n-th and (n+1)-th occurrences of the second object is taken as a valid video segment, wherein n is an odd number greater than 0 or n is an even number greater than or equal to 0. It can be understood that the value of n is related to the counting rule: if counting starts from 1, n is an odd number greater than 0; if counting starts from 0, n is an even number greater than or equal to 0. Illustratively, if counting starts from 1, the game pauses when the first whistle is recognized and continues when the second whistle is recognized, so the video segment between the first whistle and the second whistle is divided into an invalid video segment. Thus, the video segment between the 1st and 2nd whistles, the video segment between the 3rd and 4th whistles, and so on, are all divided into invalid video segments, and the other video segments are valid video segments. It is to be understood that this division into valid and invalid video segments depends on the purpose of the video segmentation; if the match video is used for refereeing analysis, the division is exactly the opposite of the above, and the video segment between the n-th and (n+1)-th occurrences of the second object is taken as the valid video segment. In practical applications, valid and invalid video segments can be divided flexibly according to actual needs to obtain the video segments related to the actual target.
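The following Python sketch illustrates this interval-based division for the counting-from-1 case (n odd); the input format, a sorted list of whistle frame indices, is an assumption of the example.

```python
def split_by_second_object(whistle_frames, total_frames):
    """Given the frame indices at which the second object (e.g. a referee whistle)
    occurs, mark the segments between the 1st and 2nd, 3rd and 4th, ... occurrences
    as invalid and everything else as valid. Illustrative sketch only."""
    invalid = []
    for k in range(0, len(whistle_frames) - 1, 2):           # pairs (1st,2nd), (3rd,4th), ...
        invalid.append((whistle_frames[k], whistle_frames[k + 1]))
    # valid segments are the complement of the invalid intervals
    valid, cursor = [], 0
    for start, end in invalid:
        if cursor < start:
            valid.append((cursor, start - 1))
        cursor = end + 1
    if cursor <= total_frames - 1:
        valid.append((cursor, total_frames - 1))
    return valid, invalid

# Example: whistles at frames 100, 250, 400, 600 in a 1000-frame video
# -> invalid: [(100, 250), (400, 600)], valid: [(0, 99), (251, 399), (601, 999)]
```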
It can be understood that the above method for dividing the active video segment and the inactive video segment can be used in a mixed manner, that is, the division is performed by the above two dividing methods, so that the division is more accurate, and further description is omitted here.
After the valid video segments and the invalid video segments are obtained, the annotation in steps S302-S304 can be performed only on the valid video segments to reduce the annotation workload.
In step S302, frames are extracted from the valid video segment to obtain a plurality of video frames. Because a video comprises a large number of video frames, when the video is long, the workload of annotating the meta-objects in every video frame is huge; since the video content is continuous from frame to frame, only the meta-objects in some of the frames need to be annotated to reduce the amount of computation. Optionally, the frame extraction manner may be random frame extraction, fixed-interval frame extraction, or key-frame extraction, which is not described herein again.
In steps S303 and S304, activity information and time information of a plurality of meta-objects in the extracted video frames are identified, and the activity information and the time information are taken as annotation information of the meta-objects. The activity information can be identified by a pre-trained activity recognition model. Illustratively, the first video is a football match video, and the activity recognition model can recognize ball touches, passes, shots, tackles and the like by the players; the time of an action can be recorded through the sequence number of the video frame or the video time, and the team of the player touching the ball, the player's personal information and the like can be obtained through the basic annotation information of the meta-objects, such as the player information. Position information of a player can also be identified, so that the player's position and the time at that position are annotated, and so on. The activity recognition model can also recognize the referee's decisions, for example recognizing a decision and its time through the referee's whistle, red and yellow cards and/or the referee's actions. The activity recognition model can likewise recognize the position of the football to judge whether the ball is out of bounds, enters the penalty area or enters the goal, so that the state of the football can be annotated. In steps S303 and S304, certain important actions of the meta-objects in the first video and the times at which those actions occur are identified and annotated as annotation information of the meta-objects.
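As a rough illustration of steps S302-S304, the sketch below extracts frames from a valid segment at a fixed interval and records activity and time information as annotation information; the recognize_activities function stands in for a pre-trained activity recognition model and is purely hypothetical.

```python
def annotate_valid_segment(video_frames, segment, interval, recognize_activities):
    """Extract frames from a valid segment at a fixed interval and record the
    recognized activity information together with the time information (here the
    frame index) as annotation information of the meta-objects.
    `recognize_activities(frame)` is assumed to return a list of dicts such as
    {"object": "Player 9", "action": "pass"}. Illustrative sketch only."""
    start, end = segment
    annotation_info = []
    for idx in range(start, end + 1, interval):              # fixed-interval frame extraction
        for activity in recognize_activities(video_frames[idx]):
            annotation_info.append({
                "meta_object": activity["object"],           # e.g. "Player 9"
                "label": activity["action"],                 # e.g. "pass", "shot", "tackle"
                "frame_index": idx,                          # time information
            })
    return annotation_info
```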
Through the above steps, basic information and/or basic activities of the objects in the video content are annotated; this basic information and these basic actions can express events in the video content and are used subsequently when generating video clips.
It should be understood that the specific examples in the above embodiments are only examples, and do not limit the disclosure, and therefore, the detailed description is omitted here.
Step S103, acquiring the combined information of the annotation information, wherein the combined information of the annotation information corresponds to an event in the video;
optionally, the step S103 includes:
step S601, receiving at least one selection instruction of the annotation information;
step S602, generating the combination information of the labeling information according to the selection instruction.
Optionally, in step S601, a selection instruction of at least one piece of annotation information is received through a human-computer interface; illustratively, the selection instruction of the at least one piece of labeling information includes selection instructions of a ball touching action, a ball passing action and a shooting action, and the specific number of the ball touching actions and the specific number of the ball passing actions can also be configured through a human-computer interaction interface.
In step S602, the combination information of the annotation information is generated according to the selection instructions. Illustratively, the combination information is generated according to the order of the plurality of selection instructions; for example, if the order of the selection instructions is ball touch, pass, ball touch, pass, shot, the combination information is {ball touch, pass, ball touch, pass, shot}. This combination information relates to a goal event in the video in which the ball is touched and passed a certain number of times before the shot, so a specific event occurring in the video can be accurately expressed through the combination information of the annotation information. The events include, but are not limited to, an attack, defense, wing attack, central attack, goal, foul, and the like; more precise event types, such as an individual dribbling goal, a goal involving multiple players, a left-wing attack, a right-wing attack, a defensive counterattack and the like, can also be expressed through combinations of annotation information, which are not described herein again.
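A minimal sketch of assembling combination information from an ordered sequence of selection instructions follows; the (label, count) instruction format is an assumption of this example.

```python
def build_combination_info(selection_instructions):
    """Assemble combination information from selection instructions received in order.
    Each instruction is assumed to be a (label, count) pair, e.g. ("pass", 2)."""
    combination = []
    for label, count in selection_instructions:
        combination.extend([label] * count)
    return combination

# The sequence from the example above:
combo = build_combination_info([("touch", 1), ("pass", 1), ("touch", 1),
                                ("pass", 1), ("shot", 1)])
# combo == ["touch", "pass", "touch", "pass", "shot"]
```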
Optionally, the step S103 includes:
step S701, receiving event information;
step S702, analyzing the characteristics of the event information;
step S703, generating the combination information of the annotation information according to the characteristics of the event information.
Optionally, the event information in step S701 is a preset event type; exemplary event information includes a goal, a yellow card, a red card, a ball steal, and the like. Each event type is composed of different event information features; for example, the goal feature includes annotation information such as pass and goal, or ball touch and goal, or the goal feature includes the goal, the 5 seconds before the goal, the 5 seconds after the goal, and the like. In step S702, the features of the event information are analyzed to obtain one or more pieces of annotation information. In step S703, the combination information of the annotation information is generated according to the one or more pieces of annotation information; the generated combination information may include multiple combinations, that is, each of the multiple combinations of the annotation information may be one piece of combination information. Alternatively, before step S701, combination information corresponding to different events may be preset; in that case, only a selection instruction of an event needs to be received in step S701, that is, the event information is a selection instruction of an event type, and the corresponding combination information is then obtained according to the event type.
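The preset mapping from event types to combination information could, for instance, look like the following sketch; the event names and combinations shown are hypothetical examples, not values defined by the disclosure.

```python
# Hypothetical preset mapping from event types to combination information;
# each event type may correspond to several alternative combinations.
EVENT_COMBINATIONS = {
    "goal":  [["pass", "shot", "goal"], ["touch", "shot", "goal"]],
    "foul":  [["tackle", "whistle"]],
    "steal": [["opponent_touch", "touch"]],
}

def combinations_for_event(event_type):
    """Return the preset combination information for a selected event type."""
    return EVENT_COMBINATIONS.get(event_type, [])
```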
It can be understood that the above-mentioned manner of obtaining the combined information of the annotation information is only an example, and does not form a limitation to the present disclosure, and any other manner of obtaining the combined information of the annotation information can be applied to the technical solution of the present disclosure, and is not described herein again.
Step S104, generating at least one video clip from the first video according to the combination information, wherein the at least one video clip comprises the event.
In this step, a video clip is generated from the first video based on the combination information obtained in step S103.
Optionally, the step S104 includes:
step S801, at least one group of annotation information which accords with the combination mode of the annotation information in the combination information in the first video is searched according to the combination information;
step S802, generating at least one video clip according to the plurality of video frames in which the set of annotation information is located.
Optionally, in step S801, the combination information includes the annotation information and the order of the annotation information. For example, the combination information indicates that team A scores a goal after two passes. When searching the first video for a group of annotation information that matches the combination pattern of the annotation information in the combination information, the search can be performed in reverse: the goal annotation is found first, then the search moves backwards in time from the goal annotation to find passes by team A, and the two pass annotations closest in time to the goal annotation are found; these three pieces of annotation information then form at least one group of annotation information matching the combination pattern of the annotation information in the combination information.
Then, in step S802, all video frames between the video frames of the three pieces of annotation information, or the video corresponding to the time between the time points of the three pieces of annotation information, is taken as the at least one video clip. Multiple groups of annotation information matching the combination pattern of the annotation information in the combination information may be determined in step S801, in which case multiple video clips are correspondingly generated in step S802.
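The reverse search described above could be sketched as follows; the annotation record layout, a time-ordered list of dicts with "label" and "frame_index" keys, is an assumption of this example, and a real implementation would also match team and player attributes.

```python
def find_matching_groups(annotation_info, combination):
    """Reverse search: for each occurrence of the last label in the combination
    (e.g. 'goal'), walk backwards through the time-ordered annotation information
    and pick the nearest preceding annotations matching the remaining labels.
    Illustrative sketch only."""
    groups = []
    for i, ann in enumerate(annotation_info):
        if ann["label"] != combination[-1]:
            continue
        group, j = [ann], i - 1
        for wanted in reversed(combination[:-1]):
            while j >= 0 and annotation_info[j]["label"] != wanted:
                j -= 1
            if j < 0:
                break
            group.append(annotation_info[j])
            j -= 1
        if len(group) == len(combination):
            groups.append(list(reversed(group)))      # restore chronological order
    return groups
```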
Optionally, the step S802 includes:
acquiring a plurality of video frames where the group of marking information is located;
determining a first video frame and a last video frame of the plurality of video frames;
determining a first frame of the video clip according to the first video frame;
determining a last frame of the video segment according to the last video frame;
and generating the video clip according to the head frame and the tail frame.
In the above steps, the plurality of video frames in which a group of annotation information is located are determined. In the above example, the annotation information of the first pass is in video frame a, the annotation information of the second pass is in video frame b, and the annotation information of the goal is in video frame c; according to the sequence numbers of the video frames, it can be determined that the first of these three video frames is video frame a and the last is video frame c. Then, the first video frame can be directly taken as the first frame of the video clip and the last video frame as the last frame of the video clip; or the first frame of the video clip is obtained by moving a preset number of frames or a preset duration before the first video frame, and the last frame of the video clip is obtained by moving a preset number of frames or a preset duration after the last video frame; the video frames between the first frame and the last frame form the video clip. For example, the video frames between video frame a and video frame c may be used directly as the video clip; or the first frame may be the frame 5 seconds before video frame a, the last frame may be the frame 5 seconds after video frame c, and the video frames between the first frame and the last frame are used as the video clip.
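A small sketch of determining the clip's first and last frames with an optional preset padding (here 5 seconds) is shown below; the field names are carried over from the earlier illustrative sketches and are assumptions of this example.

```python
def clip_range(group, fps, total_frames, pad_seconds=5):
    """Determine the first and last frame of a video clip from a matched group of
    annotation information, padding by a preset duration before the first and after
    the last annotated frame. Illustrative sketch only."""
    frames = [ann["frame_index"] for ann in group]
    first_frame = max(0, min(frames) - pad_seconds * fps)
    last_frame = min(total_frames - 1, max(frames) + pad_seconds * fps)
    return first_frame, last_frame
```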
Optionally, the step S104 includes:
determining a start position and an end position of the at least one video segment in the first video according to the combination information.
In this alternative embodiment, the video clip is not stored as a separate video file; only its start position and end position are stored. As in the above embodiments, the start and end time points of the video clip or the sequence numbers of its start and end video frames may be recorded, so that when the video clip needs to be obtained it can be located in the first video and played. Since only the start and end positions of the video clip are stored, storage space can be greatly saved. Illustratively, the first video can also be stored on a server, so that only one copy of the first video needs to be kept; generating a video clip then only requires generating its start and end positions locally. When the user needs to obtain the video clip, the start and end positions can be transmitted to the server, and the server transmits the video clip to the user's terminal device as a video stream and plays it.
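For illustration, the clip index and playback request might be represented as follows; the server.stream call and all field names are assumptions of this sketch, not an API defined by the disclosure.

```python
# Instead of writing a new video file per clip, only the positions are stored;
# the server can later stream the corresponding portion of the first video.
clip_index = [
    {"video_id": "first_video_001", "start_frame": 1200, "end_frame": 1650},
    {"video_id": "first_video_001", "start_frame": 8400, "end_frame": 8900},
]

def request_clip(server, entry):
    """Ask the server (hypothetical interface) to stream the clip identified
    only by its start and end positions within the first video."""
    return server.stream(entry["video_id"], entry["start_frame"], entry["end_frame"])
```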
Fig. 9 is a schematic diagram of a practical application scenario of the embodiment of the present disclosure. As shown in fig. 9, the video clip generation method is executed by software, and the software includes the interface 900 shown in fig. 9, which includes a first video display area 901, an annotation instruction generation area 902, and a video clip generation instruction area 903. A football match video to be analyzed by the user is displayed in area 901; the user can then select a meta-object in the match video in area 901 with the mouse, select the type of information to be annotated in area 902, and perform the annotation to add annotation information to the selected meta-object. After the annotation is completed, a corresponding instruction is generated through the video clip generation instruction button in area 903 to search the match video for qualifying video clips. In this way, events in various videos can be composed from pieces of annotation information, and the video clips containing those events can be found through the annotation information. With the technical solution of the present disclosure, there is no need to manually clip the video or drag through it to search for the relevant video segments, which greatly reduces the workload.
The embodiment of the disclosure discloses a video clip generation method, which comprises the following steps: acquiring a first video; receiving an annotation instruction to annotate a meta-object of the first video to obtain annotation information of the meta-object, wherein the meta-object represents an annotation object of the first video content; acquiring the combined information of the annotation information, wherein the combined information of the annotation information corresponds to an event in the video; and generating at least one video clip from the first video according to the combination information, wherein the at least one video clip contains the event. By combining the annotation information, the method solves the technical problem in the prior art that obtaining specific video content requires a large amount of manual editing and searching.
In the above, although the steps in the above method embodiments are described in the above sequence, it should be clear to those skilled in the art that the steps in the embodiments of the present disclosure are not necessarily performed in the above sequence, and may also be performed in other sequences such as reverse, parallel, and cross, and further, on the basis of the above steps, other steps may also be added by those skilled in the art, and these obvious modifications or equivalents should also be included in the protection scope of the present disclosure, and are not described herein again.
Fig. 10 is a schematic structural diagram of an embodiment of a video clip generating apparatus provided in an embodiment of the present disclosure. As shown in fig. 10, the apparatus 1000 includes: a video acquisition module 1001, an annotation information acquisition module 1002, a combination information acquisition module 1003, and a video clip generation module 1004. Wherein:
a video acquisition module 1001 configured to acquire a first video;
a tagging information obtaining module 1002, configured to receive a tagging instruction to tag a meta-object of the first video to obtain tagging information of the meta-object, where the meta-object represents a tagged object of first video content;
a combined information obtaining module 1003, configured to obtain combined information of the annotation information, where the combined information of the annotation information corresponds to an event in a video;
a video clip generating module 1004, configured to generate at least one video clip from the first video according to the combination information, where the at least one video clip includes the event.
Further, the annotation information acquisition module 1002 is further configured to:
acquiring a plurality of meta-objects in the first video;
and receiving a labeling instruction to label the attributes of the plurality of meta-objects to obtain labeling information of the attributes of the plurality of meta-objects.
Further, the annotation information acquisition module 1002 is further configured to:
dividing the first video into an active video segment and an inactive video segment according to the plurality of meta-objects;
extracting frames from the effective video segment to obtain a plurality of video frames;
identifying activity information and temporal information for a plurality of meta-objects in each of the plurality of video frames;
and taking the activity information and the time information as the labeling information of the meta-object.
Further, the annotation information acquisition module 1002 is further configured to:
dividing video segments corresponding to video frames of which the number of first meta-objects in the first video is less than a first threshold into invalid video segments;
and dividing video segments corresponding to the video frames of which the number of the first meta-objects in the first video is greater than or equal to a first threshold into effective video segments.
Further, the annotation information acquisition module 1002 is further configured to:
identifying a second object in the first video;
determining a number of occurrences of the second object in the first video;
taking the video segment between the n-th and (n+1)-th occurrences of the second object as an invalid video segment;
taking a video segment not between the second object appearing n-th and n + 1-th times as an active video segment; wherein n is an odd number greater than 0 or n is an even number greater than or equal to 0.
Further, the combination information obtaining module 1003 is further configured to:
receiving a selection instruction of at least one piece of labeling information;
and generating the combined information of the labeling information according to the selection instruction.
Further, the combination information obtaining module 1003 is further configured to:
receiving event information;
analyzing the characteristics of the event information;
and generating the combined information of the labeling information according to the characteristics of the event information.
Further, the video segment generating module 1004 is further configured to:
searching at least one group of annotation information which accords with the combination mode of the annotation information in the combination information in the first video according to the combination information;
and generating at least one video segment according to the plurality of video frames in which the group of the annotation information is positioned.
Further, the video segment generating module 1004 is further configured to:
acquiring a plurality of video frames where the group of marking information is located;
determining a first video frame and a last video frame of the plurality of video frames;
determining a first frame of the video clip according to the first video frame;
determining a last frame of the video segment according to the last video frame;
and generating the video clip according to the head frame and the tail frame.
Further, the video segment generating module 1004 is further configured to:
determining a start position and an end position of the at least one video segment in the first video according to the combination information.
The apparatus shown in fig. 10 can perform the method of the embodiment shown in fig. 1-8, and the detailed description of this embodiment can refer to the related description of the embodiment shown in fig. 1-8. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1 to 8, and are not described herein again.
Referring now to FIG. 11, shown is a schematic diagram of an electronic device 1100 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 11, the electronic device 1100 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1101 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)1102 or a program loaded from a storage means 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are also stored. The processing device 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
Generally, the following devices may be connected to the I/O interface 1105: input devices 1106 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 1107 including, for example, Liquid Crystal Displays (LCDs), speakers, vibrators, and the like; storage devices 1108, including, for example, magnetic tape, hard disk, etc.; and a communication device 1109. The communication means 1109 may allow the electronic device 1100 to communicate wirelessly or wiredly with other devices to exchange data. While fig. 11 illustrates an electronic device 1100 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 1109, or installed from the storage device 1108, or installed from the ROM 1102. The computer program, when executed by the processing device 1101, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the method of any of the embodiments.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing description is only a description of the preferred embodiments of the disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example technical solutions formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.

Claims (13)

1. A method for generating a video clip, comprising:
acquiring a first video;
receiving an annotation instruction to annotate a meta-object of the first video to obtain annotation information of the meta-object, wherein the meta-object represents an annotation object of the first video content;
acquiring combination information of the annotation information, wherein the combination information corresponds to an event in the video;
and generating at least one video clip from the first video according to the combination information, wherein the at least one video clip contains the event.
2. The video clip generation method of claim 1, wherein said receiving an annotation instruction to annotate a meta-object of the first video to obtain annotation information of the meta-object comprises:
acquiring a plurality of meta-objects in the first video;
and receiving an annotation instruction to annotate attributes of the plurality of meta-objects to obtain annotation information of the attributes of the plurality of meta-objects.
3. The video clip generation method of claim 2, wherein the method further comprises:
dividing the first video into an active video segment and an inactive video segment according to the plurality of meta-objects;
extracting frames from the active video segment to obtain a plurality of video frames;
identifying activity information and time information of a plurality of meta-objects in each of the plurality of video frames;
and taking the activity information and the time information as the annotation information of the meta-objects.
4. The video clip generation method of claim 3, wherein said dividing said first video into active video segments and inactive video segments based on said plurality of meta-objects comprises:
dividing video segments corresponding to video frames of the first video in which the number of first meta-objects is less than a first threshold into inactive video segments;
and dividing video segments corresponding to video frames of the first video in which the number of first meta-objects is greater than or equal to the first threshold into active video segments.
5. The video clip generation method of claim 3, wherein said dividing said first video into active video segments and inactive video segments based on said plurality of meta-objects comprises:
identifying a second object in the first video;
determining a number of occurrences of the second object in the first video;
taking a video segment between the n-th and (n+1)-th appearances of the second object as an inactive video segment;
and taking a video segment not between the n-th and (n+1)-th appearances of the second object as an active video segment, wherein n is an odd number greater than 0 or an even number greater than or equal to 0.
6. The video clip generation method of claim 1, wherein said acquiring combination information of the annotation information comprises:
receiving a selection instruction for at least one piece of annotation information;
and generating the combination information of the annotation information according to the selection instruction.
7. The video clip generation method of claim 1, wherein said acquiring combination information of the annotation information comprises:
receiving event information;
analyzing the characteristics of the event information;
and generating the combination information of the annotation information according to the characteristics of the event information.
8. The video clip generation method of claim 1, wherein said generating at least one video clip from said first video according to said combination information comprises:
searching the first video, according to the combination information, for at least one group of annotation information that conforms to the combination of annotation information specified in the combination information;
and generating at least one video clip according to the plurality of video frames in which the group of annotation information is located.
9. The video clip generation method of claim 8, wherein said generating at least one video clip according to the plurality of video frames in which the group of annotation information is located comprises:
acquiring the plurality of video frames in which the group of annotation information is located;
determining a first video frame and a last video frame of the plurality of video frames;
determining a head frame of the video clip according to the first video frame;
determining a tail frame of the video clip according to the last video frame;
and generating the video clip according to the head frame and the tail frame.
10. The video clip generation method of claim 1, wherein said generating at least one video clip from said first video according to said combination information comprises:
determining a start position and an end position of the at least one video clip in the first video according to the combination information.
11. A video clip generation apparatus, comprising:
the video acquisition module is used for acquiring a first video;
the annotation information acquisition module is used for receiving an annotation instruction to annotate the meta-object of the first video to obtain annotation information of the meta-object, wherein the meta-object represents an annotation object of the first video content;
the combination information acquisition module is used for acquiring combination information of the annotation information, wherein the combination information corresponds to an event in the video;
and the video clip generating module is used for generating at least one video clip from the first video according to the combination information, wherein the at least one video clip contains the event.
12. An electronic device, comprising:
a memory for storing computer readable instructions; and
a processor for executing the computer-readable instructions, which when executed, cause the electronic device to implement the method of any of claims 1-10.
13. A non-transitory computer readable storage medium storing computer readable instructions which, when executed by a computer, cause the computer to perform the method of any one of claims 1-10.
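The claimed method steps above can be made concrete with a few short, non-authoritative sketches. The Python snippets below give one possible reading of individual claims; every function name, data shape, and example value in them is a hypothetical assumption, and none of them reproduces the patent's actual implementation. Claim 1, for example, can be read as matching a set of required annotation labels (the combination information) against per-frame annotation information and cutting clips over the frames where the whole set is present:

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    index: int
    annotations: set = field(default_factory=set)  # annotation information of meta-objects in this frame

def generate_clips(frames, combination):
    """Return (start_index, end_index) pairs for maximal runs of frames whose
    annotations contain every label in `combination` (the event)."""
    clips, start = [], None
    for frame in frames:
        if combination <= frame.annotations:        # the event's annotation pattern is present
            if start is None:
                start = frame.index
        elif start is not None:
            clips.append((start, frame.index - 1))  # close the clip at the previous frame
            start = None
    if start is not None:                           # event runs to the end of the video
        clips.append((start, frames[-1].index))
    return clips

# Usage: the event "player_A takes a shot" is the combination {"player_A", "shot"}.
frames = [Frame(i, {"player_A", "shot"} if 10 <= i <= 24 else {"player_A"})
          for i in range(40)]
print(generate_clips(frames, {"player_A", "shot"}))  # [(10, 24)]
```

Here the event is simply the co-occurrence of two labels; in practice the labels would come from the annotation instructions of claim 1.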
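Claim 3 samples frames from the active video segments and records activity and time information for the meta-objects in each sampled frame. Below is a minimal sketch using OpenCV only for decoding, assuming the active segments are already known as frame-index ranges; the video path, the sampling step, and the idea of running a detector on each yielded frame are assumptions rather than details from the claims:

```python
import cv2  # pip install opencv-python

def sample_active_frames(video_path, active_segments, step=30):
    """Yield (frame_index, timestamp_seconds, frame_image) for every `step`-th
    frame inside the given active segments; the timestamp serves as the time
    information attached to that frame's annotation."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0         # fall back if FPS is unknown
    try:
        for start, end in active_segments:
            for idx in range(start, end + 1, step):
                cap.set(cv2.CAP_PROP_POS_FRAMES, idx)   # seek to the frame
                ok, frame = cap.read()
                if ok:
                    yield idx, idx / fps, frame
    finally:
        cap.release()

# Activity information would then come from running an object detector on each
# yielded frame, e.g.:
#   for idx, t, img in sample_active_frames("match.mp4", [(0, 900)]):
#       annotations[idx] = detect_meta_objects(img)   # hypothetical detector
```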
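Claim 4 splits the first video by comparing, frame by frame, the number of first meta-objects against a first threshold. The sketch below assumes the per-frame counts have already been produced by a detector and simply groups consecutive frames into active and inactive runs:

```python
def split_by_count(object_counts, first_threshold):
    """Group consecutive frame indices into (is_active, start, end) runs:
    frames with fewer than `first_threshold` first meta-objects are inactive,
    the rest are active."""
    segments = []
    for i, count in enumerate(object_counts):
        active = count >= first_threshold
        if segments and segments[-1][0] == active:
            segments[-1] = (active, segments[-1][1], i)   # extend the current run
        else:
            segments.append((active, i, i))               # start a new run
    return segments

# Nine frames of detected-player counts, with a threshold of 5 players on screen.
counts = [0, 1, 5, 6, 6, 2, 0, 7, 8]
print(split_by_count(counts, first_threshold=5))
# [(False, 0, 1), (True, 2, 4), (False, 5, 6), (True, 7, 8)]
```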
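Claim 5 instead uses appearances of a second object as alternating markers: the stretch between the n-th and (n+1)-th appearance is inactive for one parity of n and active otherwise. The sketch assumes the frame indices of those appearances are already known, uses a replay logo purely as an illustrative second object, and omits the n = 0 case (video start up to the first appearance) that the even-parity reading of the claim would add:

```python
def inactive_ranges(occurrence_frames, n_is_odd=True):
    """Return (start_frame, end_frame) ranges lying between the n-th and
    (n+1)-th appearances of the second object, for the chosen parity of n
    (1-based occurrence numbering)."""
    ranges = []
    for n in range(1, len(occurrence_frames)):
        matches_parity = (n % 2 == 1) if n_is_odd else (n % 2 == 0)
        if matches_parity:
            ranges.append((occurrence_frames[n - 1], occurrence_frames[n]))
    return ranges

# Suppose a replay logo appears at frames 100, 250, 400 and 620.
print(inactive_ranges([100, 250, 400, 620]))          # [(100, 250), (400, 620)]
print(inactive_ranges([100, 250, 400, 620], False))   # [(250, 400)]
```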
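Claim 7 derives the combination information from features of received event information without fixing how the analysis is done. One plausible, purely illustrative reading is keyword matching against a vocabulary of annotation labels; the vocabulary and the function below are assumptions:

```python
# Hypothetical vocabulary of annotation labels used when annotating the video.
ANNOTATION_VOCABULARY = {"goal", "penalty", "corner", "save", "foul"}

def combination_from_event(event_text):
    """Pick out known annotation labels mentioned in a free-text event
    description and use them as the combination information."""
    words = {word.strip(".,!?").lower() for word in event_text.split()}
    return words & ANNOTATION_VOCABULARY

print(combination_from_event("Show every penalty and the resulting goal."))
# e.g. {'penalty', 'goal'}  (set order may vary)
```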
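Claims 8 and 9 locate the video frames that carry a matching group of annotation information and bound the clip by the first and last of those frames. A minimal sketch, again assuming per-frame sets of annotation labels; how the head and tail frames might be padded or snapped to shot boundaries is left open by the claims and not modeled here:

```python
def clip_bounds(frame_annotations, combination):
    """frame_annotations maps frame index -> set of annotation labels.
    Return (head_frame, tail_frame) spanning every frame that contains the
    full combination, or None if the combination never occurs."""
    matching = [idx for idx, labels in frame_annotations.items() if combination <= labels]
    if not matching:
        return None
    return min(matching), max(matching)

annotations = {
    120: {"goalkeeper", "save"},
    121: {"goalkeeper", "save", "crowd"},
    122: {"goalkeeper"},
    180: {"goalkeeper", "save"},
}
print(clip_bounds(annotations, {"goalkeeper", "save"}))  # (120, 180)
```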
CN202010892789.1A 2020-08-31 2020-08-31 Video clip generation method and device and electronic equipment Pending CN114125474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010892789.1A CN114125474A (en) 2020-08-31 2020-08-31 Video clip generation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010892789.1A CN114125474A (en) 2020-08-31 2020-08-31 Video clip generation method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114125474A true CN114125474A (en) 2022-03-01

Family

ID=80359872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010892789.1A Pending CN114125474A (en) 2020-08-31 2020-08-31 Video clip generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114125474A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000054441A1 (en) * 1999-03-08 2000-09-14 France Telecom Method for controlling the quality of a digital audio signal broadcast with an audio-visual programme
US20070250313A1 (en) * 2006-04-25 2007-10-25 Jiun-Fu Chen Systems and methods for analyzing video content
BRPI0805799A2 (en) * 2008-12-18 2010-09-14 Moraes Ana Maria Xavier De electronic whistle for sports competitions
CN110996138A (en) * 2019-12-17 2020-04-10 腾讯科技(深圳)有限公司 Video annotation method, device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000054441A1 (en) * 1999-03-08 2000-09-14 France Telecom Method for controlling the quality of a digital audio signal broadcast with an audio-visual programme
US20070250313A1 (en) * 2006-04-25 2007-10-25 Jiun-Fu Chen Systems and methods for analyzing video content
BRPI0805799A2 (en) * 2008-12-18 2010-09-14 Moraes Ana Maria Xavier De electronic whistle for sports competitions
CN110996138A (en) * 2019-12-17 2020-04-10 腾讯科技(深圳)有限公司 Video annotation method, device and storage medium

Similar Documents

Publication Publication Date Title
CN109640129B (en) Video recommendation method and device, client device, server and storage medium
CN111445902B (en) Data collection method, device, storage medium and electronic equipment
CN109474850B (en) Motion pixel video special effect adding method and device, terminal equipment and storage medium
US12001478B2 (en) Video-based interaction implementation method and apparatus, device and medium
EP4148597A1 (en) Search result display method and apparatus, readable medium, and electronic device
CN110942031A (en) Game picture abnormity detection method and device, electronic equipment and storage medium
CN111800646A (en) Method, device, medium and electronic equipment for monitoring teaching effect
CN112163102B (en) Search content matching method and device, electronic equipment and storage medium
CN113515998A (en) Video data processing method and device and readable storage medium
CN112380929A (en) Highlight segment obtaining method and device, electronic equipment and storage medium
CN113515997A (en) Video data processing method and device and readable storage medium
CN111259225A (en) New media information display method and device, electronic equipment and computer readable medium
CN106878773B (en) Electronic device, video processing method and apparatus, and storage medium
CN111246273B (en) Video delivery method and device, electronic equipment and computer readable medium
CN110348367B (en) Video classification method, video processing device, mobile terminal and medium
CN113011169A (en) Conference summary processing method, device, equipment and medium
CN112507884A (en) Live content detection method and device, readable medium and electronic equipment
CN112287771A (en) Method, apparatus, server and medium for detecting video event
CN112183388A (en) Image processing method, apparatus, device and medium
CN114125474A (en) Video clip generation method and device and electronic equipment
CN114245229B (en) Short video production method, device, equipment and storage medium
CN111709342B (en) Subtitle segmentation method, device, equipment and storage medium
CN112163104B (en) Method, device, electronic equipment and storage medium for searching target content
CN111221951B (en) Text processing method and device
CN114302231A (en) Video processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination