CN113747238A - Video clip intercepting method and system, computer equipment and readable storage medium - Google Patents

Video clip intercepting method and system, computer equipment and readable storage medium

Info

Publication number
CN113747238A
Authority
CN
China
Prior art keywords
video
frame
continuous
frame number
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111031248.0A
Other languages
Chinese (zh)
Inventor
包英泽
冯富森
舒科
卢景熙
Current Assignee
Beijing Tiaoyue Intelligent Technology Co ltd
Original Assignee
Beijing Tiaoyue Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Tiaoyue Intelligent Technology Co ltd filed Critical Beijing Tiaoyue Intelligent Technology Co ltd
Priority to CN202111031248.0A
Publication of CN113747238A
Legal status: Pending


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Abstract

The invention relates to a video clip intercepting method and system, computer equipment and a computer-readable storage medium. First, a gesture detection model is applied to the video frames to detect gestures, yielding the frame numbers to be processed (those containing a gesture), from which the continuous video frame groups are obtained. Each continuous video frame group is then updated by filling in the missing frame numbers, which yields the video clips. The gesture video clips are thus obtained. The overall design is efficient and simple, each captured gesture video clip is accurately timed in preparation for synthesizing a long video, and the working efficiency of video capture is effectively improved.

Description

Video clip intercepting method and system, computer equipment and readable storage medium
Technical Field
The invention relates to a video clip intercepting method and system, computer equipment and a computer-readable storage medium, and belongs to the technical field of video clip processing.
Background
There is wide demand for video content generation today, and an existing approach composes longer video content from many short video segments (e.g., 0.5 seconds each). For example, a video of an anchor talking can be composed from a series of short clips of anchor actions (for example, the anchor stretching, the anchor sitting down, the anchor gesturing). However, to ensure the continuity and visual quality of the splicing, the short videos must be accurately timed, and obtaining a large number of precisely timed short segments is a very labor-intensive process. The invention therefore provides a method for automatically intercepting short video clips from existing long video material so that long videos can be synthesized from them.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a video clip intercepting method and system, computer equipment and a computer-readable storage medium that automatically intercept short video clips based on a gesture detection model, obtain precisely timed short video clips, and thereby prepare for synthesizing long videos.
To solve this technical problem, the invention adopts the following technical scheme: a video clip intercepting method for obtaining each video clip containing a gesture action in a target video, comprising the following steps:
Step A, for each video frame in the target video, apply a gesture detection model to perform gesture detection on the video frame; if the number of hands in the video frame is greater than 0, detect the gesture type, among the preset gesture types, corresponding to the gesture in the video frame, and record the frame number of the video frame as a frame number to be processed; if the number of hands in the video frame equals 0, leave the video frame unprocessed. After gesture detection has been completed for all video frames in the target video, proceed to step B.
Step B, in ascending order of the frame numbers to be processed, form the continuous video frame groups from the frame numbers to be processed, and then proceed to step C.
Step C, for each continuous video frame group, according to the frame numbers of the video frames in the target video, fill in the frame numbers missing between the frame numbers to be processed in the group, thereby updating each continuous video frame group, and then proceed to step D.
Step D, for each continuous video frame group, form a video clip from the video frames corresponding to the frame numbers in the group, in sequence, as a video clip corresponding to the target video; after obtaining every video clip corresponding to the target video, proceed to step E.
Step E, for each video clip, count the number of video frames corresponding to each gesture type in the clip and select the gesture type with the most video frames as the gesture type of the clip; the video clips corresponding to the target video and their gesture types are thus obtained.
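As a minimal illustration of the majority vote in step E (assuming the per-frame gesture labels produced by the detection model of step A are available as plain strings, which is an assumption of this sketch):

```python
from collections import Counter

def clip_gesture_type(frame_gestures):
    """Step E: pick the gesture type that labels the most frames in a clip."""
    return Counter(frame_gestures).most_common(1)[0][0]

# A clip whose frames were mostly labeled "wave" is labeled "wave" overall.
labels = ["wave", "wave", "point", "wave", "none"]
print(clip_gesture_type(labels))  # -> wave
```

The same vote is applied independently to every clip obtained in step D.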
As a preferred technical scheme of the invention, the method further comprises a step F, which is entered after step E has been executed.
Step F, for each video clip, adjust the size of the foreground person in the clip so that the foreground persons in all video clips have the same size, and update each video clip corresponding to the target video.
As a preferred embodiment of the present invention, step F comprises the following steps F1 to F2.
Step F1, for each video frame in each video clip, obtain the face frame surrounding the face region in the frame; if the scale difference between the face frames in adjacent video frames exceeds a preset face scale threshold, determine that the adjacent video clips need to be scaled, and proceed to step F2.
Step F2, for each video clip that needs to be scaled, apply the scaling ratio between the size of the face frame in the first video frame of the clip and the size of the face frame in the last video frame of the preceding adjacent clip, and scale each video frame in the clip by that ratio. After this has been done for every clip that needs scaling, the foreground persons in the video clips have the same size, and each video clip corresponding to the target video is updated.
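A minimal sketch of the decision in steps F1 and F2, assuming face-frame height is used as the scale measure and an illustrative threshold of 10% (both are assumptions; the patent only specifies that the threshold is preset):

```python
def needs_rescale(prev_face_h, cur_face_h, scale_threshold=0.1):
    """Step F1: flag adjacent frames whose face-frame scale differs too much.
    The 0.1 threshold is illustrative; the patent only says 'preset'."""
    return abs(cur_face_h - prev_face_h) / prev_face_h > scale_threshold

def scale_ratio(prev_last_face_h, cur_first_face_h):
    """Step F2: ratio that rescales the current clip so the face size in its
    first frame matches the face size at the end of the previous clip."""
    return prev_last_face_h / cur_first_face_h

print(needs_rescale(100, 130))  # -> True  (scale jumped by 30%)
print(needs_rescale(100, 105))  # -> False (within threshold)
```

With `scale_ratio(130, 100)` the current clip would be enlarged by a factor of 1.3 so its starting face matches the previous clip's ending face.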
As a preferred technical scheme of the invention, in step F2, each video frame of each video clip that needs to be scaled is scaled, by the ratio between the size of the face frame in the first video frame of the clip and the size of the face frame in the last video frame of the preceding adjacent clip, according to the following steps F2-1 to F2-4.
Step F2-1, finely segment the foreground: segment the foreground person of the video frame to form a mask, and scale the mask by the scaling ratio using cv2.resize() in OpenCV to obtain the scaled mask.
Step F2-2, to align the midpoint of the bottom edge of the video frame foreground with the midpoint of the bottom edge of the mask, multiply the position coordinate of the foreground's bottom-edge midpoint by the scaling ratio to obtain the position coordinate of the mask's bottom-edge midpoint, and take the deviation between the two coordinates as the position coordinate to be processed.
Step F2-3, if the scaling ratio is greater than 1, take the position coordinate to be processed as the upper-left corner point, crop the mask to the size of the video frame, and update the mask; if the scaling ratio is less than 1, create a blank frame of the same size as the video frame, take the coordinate on the blank frame corresponding to the position to be processed as the starting point, paste the reduced mask onto the blank frame, and update the mask.
Step F2-4, superimpose and fuse the mask with the blank background according to transparency, completing the scaling of the video frame.
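Steps F2-1 to F2-3 can be sketched with NumPy alone. The nearest-neighbour `nn_resize` below is a simplified stand-in for `cv2.resize()`, and the bottom-edge-midpoint bookkeeping follows steps F2-2/F2-3; the transparency fusion of step F2-4 is omitted. This is an illustrative reading of the steps, not the patented implementation:

```python
import numpy as np

def nn_resize(img, ratio):
    """Nearest-neighbour stand-in for cv2.resize (step F2-1)."""
    h, w = img.shape[:2]
    nh, nw = int(round(h * ratio)), int(round(w * ratio))
    ys = (np.arange(nh) / ratio).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / ratio).astype(int).clip(0, w - 1)
    return img[ys][:, xs]

def rescale_foreground(mask, ratio, fg_cx):
    """Steps F2-2/F2-3: rescale the foreground mask while keeping the
    midpoint of its bottom edge (fg_cx, H) fixed in the frame."""
    h, w = mask.shape[:2]
    scaled = nn_resize(mask, ratio)
    # Deviation between scaled and original bottom-edge midpoints (F2-2).
    dx = int(round(fg_cx * ratio)) - fg_cx
    dy = scaled.shape[0] - h
    out = np.zeros_like(mask)
    if ratio >= 1:
        # F2-3, ratio > 1: crop with (dx, dy) as the upper-left corner.
        out[:, :] = scaled[dy:dy + h, dx:dx + w]
    else:
        # F2-3, ratio < 1: paste the reduced mask onto a blank frame.
        sh, sw = scaled.shape[:2]
        out[-dy:-dy + sh, -dx:-dx + sw] = scaled
    return out

demo = np.zeros((8, 8), dtype=int)
demo[4:8, 2:6] = 1  # a 4x4 foreground block resting on the bottom edge
shrunk = rescale_foreground(demo, 0.5, fg_cx=4)
grown = rescale_foreground(demo, 2.0, fg_cx=4)
```

After shrinking, the foreground still touches the bottom edge of the frame, which is the point of the bottom-midpoint alignment in step F2-2.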
As a preferred embodiment of the present invention, step B comprises the following steps B1 to B6.
Step B1, arrange the frame numbers to be processed in ascending order, initialize a parameter n to 1, and proceed to step B2.
Step B2, obtain the difference between the nth frame number and the (n+1)th frame number and judge whether the difference is smaller than a preset frame number difference threshold; if so, determine that the nth to (n+1)th frame numbers belong to one continuous video frame group and proceed to step B3; otherwise, determine that a discontinuity lies between the nth and (n+1)th frame numbers.
Step B3, if n = 1, proceed to step B6; if n > 1, proceed to step B4.
Step B4, judge whether a discontinuity exists in the upstream direction of the continuous video frame group formed by the nth to (n+1)th frame numbers; if so, proceed to step B5; otherwise, form one continuous video frame group from each of the 1st to (n-1)th frame numbers through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then proceed to step B5.
Step B5, judge whether the frame number on the downstream side of the discontinuity adjacent upstream of the continuous video frame group formed by the nth to (n+1)th frame numbers is the nth frame number; if so, proceed to step B6; otherwise, for that upstream-adjacent discontinuity, form one continuous video frame group from each frame number between its downstream-side frame number and the (n-1)th frame number through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then proceed to step B6.
Step B6, judge whether n+1 equals the number of frame numbers to be processed; if so, the continuous video frame groups corresponding to all frame numbers to be processed have been obtained; otherwise, assign n+1 to n and return to step B2.
As a preferred technical scheme of the invention: the preset frame number difference threshold in step B2 is equal to 5.
As a preferred technical scheme of the invention, the method further comprises a step BC, which is entered after step B has been executed; after step BC has been executed, step C is entered.
Step BC, for each continuous video frame group, fill in the buffer video frames corresponding to the group according to the following cases, update the group, and then proceed to step C.
Case 1. If the first frame number in the continuous video frame group is 1, then according to the frame numbers of the video frames in the target video, append the preset number of buffer frame numbers adjacent to the last frame number of the group in the downstream direction of the target video, i.e. fill in the buffer video frames corresponding to the group, and update the group.
In case 1, if the number of frame numbers adjacent downstream of the last frame number of the group in the target video is less than the preset buffer frame number, append all frame numbers adjacent downstream of the last frame number instead, i.e. fill in the buffer video frames corresponding to the group, and update the group.
Case 2. If the last frame number in the continuous video frame group is the maximum frame number of the video frames in the target video, then according to the frame numbers of the video frames in the target video, prepend the preset number of buffer frame numbers adjacent to the first frame number of the group in the upstream direction of the target video, i.e. fill in the buffer video frames corresponding to the group, and update the group.
In case 2, if the number of frame numbers adjacent upstream of the first frame number of the group in the target video is less than the preset buffer frame number, prepend all frame numbers adjacent upstream of the first frame number instead, i.e. fill in the buffer video frames corresponding to the group, and update the group.
Case 3. If the first frame number in the continuous video frame group is greater than 1 and the last frame number is less than the maximum frame number of the video frames in the target video, then according to the frame numbers of the video frames in the target video, prepend the preset number of buffer frame numbers adjacent to the first frame number in the upstream direction and append the preset number of buffer frame numbers adjacent to the last frame number in the downstream direction, i.e. fill in the buffer video frames corresponding to the group, and update the group.
In case 3, if the number of frame numbers adjacent downstream of the last frame number is less than the preset buffer frame number, append all frame numbers adjacent downstream of the last frame number instead; if the number of frame numbers adjacent upstream of the first frame number is less than the preset buffer frame number, prepend all frame numbers adjacent upstream of the first frame number instead; i.e. fill in the buffer video frames corresponding to the group, and update the group.
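A simplified reading of step BC's three cases, with the "fewer frames available than the buffer" sub-cases handled by clamping to the valid frame range. For brevity the sketch returns the full padded range, which also anticipates the gap filling of step C; the buffer of 2 frames follows the value given later in the embodiment (the claim only says "preset"):

```python
def pad_with_buffer(group, max_frame, buffer=2):
    """Step BC, cases 1-3: extend a continuous video frame group by up to
    `buffer` frame numbers on each side that exist inside the target video
    (frames are numbered 1..max_frame)."""
    first, last = group[0], group[-1]
    # Case 1: the group starts at frame 1, so only the tail is padded.
    start = first if first == 1 else max(1, first - buffer)
    # Case 2: the group ends at the last frame, so only the head is padded.
    end = last if last == max_frame else min(max_frame, last + buffer)
    # Case 3 (both sides interior) pads both ends via the two max/min clamps.
    return list(range(start, end + 1))

print(pad_with_buffer([1, 2, 3], max_frame=100))    # -> [1, 2, 3, 4, 5]
print(pad_with_buffer([98, 99, 100], max_frame=100))  # -> [96, 97, 98, 99, 100]
print(pad_with_buffer([50, 51], max_frame=100))     # -> [48, 49, 50, 51, 52, 53]
```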
Correspondingly, the invention further provides a system implementing the video clip intercepting method, which automatically intercepts short video clips based on a gesture detection model, obtains precisely timed short video clips, and prepares for synthesizing long videos.
To this end, the invention adopts the following technical scheme: a video clip intercepting system comprising a gesture video frame detection module, a continuous video frame group generation module, a missing frame supplementing module, a video clip construction module, a video clip acquisition module and a video frame scaling module.
The gesture video frame detection module applies a gesture detection model to each video frame in the target video to perform gesture detection; if the number of hands in the video frame is greater than 0, it detects the gesture type, among the preset gesture types, corresponding to the gesture in the frame, and records the frame number of the video frame as a frame number to be processed; if the number of hands equals 0, the frame is not processed.
The continuous video frame group generation module forms the continuous video frame groups from the frame numbers to be processed, in ascending order.
The missing frame supplementing module, for each continuous video frame group, fills in the frame numbers missing between the frame numbers to be processed in the group according to the frame numbers of the video frames in the target video, thereby updating each continuous video frame group.
The video clip construction module, for each continuous video frame group, forms a video clip from the video frames corresponding to the frame numbers in the group, in sequence, as a video clip corresponding to the target video, thereby obtaining each video clip corresponding to the target video.
The video clip acquisition module, for each video clip, counts the number of video frames corresponding to each gesture type in the clip and selects the gesture type with the most video frames as the gesture type of the clip, thereby obtaining the video clips corresponding to the target video together with their gesture types.
The video frame scaling module adjusts the size of the foreground person in each video clip so that the foreground persons in all clips have the same size, and updates each video clip corresponding to the target video.
Correspondingly, the invention provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the steps of the method according to one of claims 1 to 7 are implemented when the computer program is executed by the processor.
Correspondingly, the invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to one of claims 1 to 7.
Compared with the prior art, the video clip intercepting method adopting the above technical scheme has the following technical effects:
(1) The video clip intercepting method designed by the invention adopts a novel design: first, a gesture detection model is applied to the video frames to detect gestures, yielding the frame numbers to be processed (those containing a gesture), from which the continuous video frame groups are obtained; each continuous video frame group is then updated by filling in the missing frame numbers, which yields the video clips. The gesture video clips are thus obtained. The overall design is efficient and simple, each captured gesture video clip is accurately timed in preparation for synthesizing a long video, and the working efficiency of video capture is effectively improved.
Drawings
FIG. 1 is a flow chart of the video clip intercepting method of the present invention;
FIG. 2 is a schematic diagram of enlargement by the face frame size in step F in an application embodiment of the video clip intercepting method of the present invention;
FIG. 3 is a schematic diagram of reduction by the face frame size in step F in an application embodiment of the video clip intercepting method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. It should further be noted that, for convenience of description, the drawings show only the structures related to the present invention rather than all structures.
The invention designs a video clip intercepting method for acquiring each video clip containing a gesture action in a target video; in practical application, as shown in fig. 1, the following steps A to E are executed.
Step A, for each video frame in the target video, apply a gesture detection model to perform gesture detection on the video frame; if the number of hands in the video frame is greater than 0, detect the gesture type, among the 18 preset gesture types, corresponding to the gesture in the video frame, and record the frame number of the video frame as a frame number to be processed; if the number of hands in the video frame equals 0, leave the video frame unprocessed. After gesture detection has been completed for all video frames in the target video, proceed to step B.
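Step A reduces to a filtering pass over the frames. The sketch below uses a hypothetical `detect_gesture` callable standing in for the gesture detection model (assumed to return one of the 18 preset gesture types, or None when no hand is found); both the helper and the toy detector are illustrative assumptions:

```python
def collect_gesture_frames(frames, detect_gesture):
    """Step A: keep the frame numbers (1-based) of frames containing a hand."""
    to_process = {}
    for number, frame in enumerate(frames, start=1):
        gesture = detect_gesture(frame)   # one of 18 preset types, or None
        if gesture is not None:           # hand count > 0
            to_process[number] = gesture
    return to_process

# Toy stand-in detector: a "frame" here is just a pre-assigned label.
fake_detector = lambda frame: frame
print(collect_gesture_frames(["wave", None, None, "point"], fake_detector))
# -> {1: 'wave', 4: 'point'}
```

The keys of the returned mapping are the frame numbers to be processed handed to step B; the values are the per-frame gesture labels later tallied in step E.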
Step B, in ascending order of the frame numbers to be processed, execute the following steps B1 to B6 to form the continuous video frame groups from the frame numbers to be processed, and then proceed to step BC.
Step B1, arrange the frame numbers to be processed in ascending order, initialize the parameter n to 1, and proceed to step B2.
Step B2, obtain the difference between the nth frame number and the (n+1)th frame number and judge whether the difference is smaller than the preset frame number difference threshold; if so, determine that the nth to (n+1)th frame numbers belong to one continuous video frame group and proceed to step B3; otherwise, determine that a discontinuity lies between the nth and (n+1)th frame numbers. In practical application, the preset frame number difference threshold is designed to equal 5; that is, the difference between the nth and (n+1)th frame numbers is obtained and it is judged whether the difference is less than 5.
Step B3, if n = 1, proceed to step B6; if n > 1, proceed to step B4.
Step B4, judge whether a discontinuity exists in the upstream direction of the continuous video frame group formed by the nth to (n+1)th frame numbers; if so, proceed to step B5; otherwise, form one continuous video frame group from each of the 1st to (n-1)th frame numbers through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then proceed to step B5.
Step B5, judge whether the frame number on the downstream side of the discontinuity adjacent upstream of the continuous video frame group formed by the nth to (n+1)th frame numbers is the nth frame number; if so, proceed to step B6; otherwise, for that upstream-adjacent discontinuity, form one continuous video frame group from each frame number between its downstream-side frame number and the (n-1)th frame number through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then proceed to step B6.
Step B6, judge whether n+1 equals the number of frame numbers to be processed; if so, the continuous video frame groups corresponding to all frame numbers to be processed have been obtained; otherwise, assign n+1 to n and return to step B2.
Step BC, for each continuous video frame group, fill in the buffer video frames corresponding to the group according to the following cases, update the group, and then proceed to step C.
Case 1. If the first frame number in the continuous video frame group is 1, then according to the frame numbers of the video frames in the target video, append the preset number of buffer frame numbers adjacent to the last frame number of the group in the downstream direction of the target video, i.e. fill in the buffer video frames corresponding to the group, and update the group. In case 1, if the number of frame numbers adjacent downstream of the last frame number is less than the preset buffer frame number, append all frame numbers adjacent downstream of the last frame number instead, i.e. fill in the buffer video frames corresponding to the group, and update the group.
Case 2. If the last frame number in the continuous video frame group is the maximum frame number of the video frames in the target video, then according to the frame numbers of the video frames in the target video, prepend the preset number of buffer frame numbers adjacent to the first frame number of the group in the upstream direction of the target video, i.e. fill in the buffer video frames corresponding to the group, and update the group. In case 2, if the number of frame numbers adjacent upstream of the first frame number is less than the preset buffer frame number, prepend all frame numbers adjacent upstream of the first frame number instead, i.e. fill in the buffer video frames corresponding to the group, and update the group.
Case 3. If the first frame number in the continuous video frame group is greater than 1 and the last frame number is less than the maximum frame number of the video frames in the target video, then according to the frame numbers of the video frames in the target video, prepend the preset number of buffer frame numbers adjacent to the first frame number in the upstream direction and append the preset number of buffer frame numbers adjacent to the last frame number in the downstream direction, i.e. fill in the buffer video frames corresponding to the group, and update the group. In case 3, if the number of frame numbers adjacent downstream of the last frame number is less than the preset buffer frame number, append all frame numbers adjacent downstream of the last frame number instead; if the number of frame numbers adjacent upstream of the first frame number is less than the preset buffer frame number, prepend all frame numbers adjacent upstream of the first frame number instead; i.e. fill in the buffer video frames corresponding to the group, and update the group.
The preset buffer frame number in each of the above cases is specifically set to 2.
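The three buffer-padding cases above can be sketched as follows (function and variable names are my own, not from the patent); clamping the padded range to the valid frame numbers 1 to max_frame handles all three cases uniformly:

```python
def pad_buffer(group, max_frame, buffer=2):
    """Extend a continuous frame-number group by up to `buffer` frame
    numbers on each side, clamped to the valid range [1, max_frame]."""
    first, last = group[0], group[-1]
    start = max(1, first - buffer)        # upstream padding, truncated at frame 1
    end = min(max_frame, last + buffer)   # downstream padding, truncated at the video end
    return list(range(start, end + 1))
```

Case 1 (first frame number equal to 1) then pads only downstream, case 2 (last frame number at the end of the video) only upstream, and a neighbourhood shorter than the preset buffer frame number is truncated automatically by the clamp.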
Step C. For each continuous video frame group, supplement the missing frame numbers between the to-be-processed frame numbers in the group according to the frame numbers of the video frames in the target video, and update the group; after every group has been updated, enter step D.
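Assuming each group is held as its sorted to-be-processed frame numbers, step C reduces to completing the group into a gap-free range (a minimal sketch; the name is illustrative):

```python
def fill_missing(group):
    """Step C: supplement the missing frame numbers between the
    to-be-processed frame numbers of a continuous video frame group."""
    return list(range(group[0], group[-1] + 1))
```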
Step D. For each continuous video frame group, form a video clip from the video frames corresponding to the frame numbers in the group, in order, and take it as a video clip corresponding to the target video; after every video clip corresponding to the target video has been obtained, enter step E.
Step E. For each video clip, count the number of video frames corresponding to each gesture type in the clip, and select the gesture type with the most video frames as the clip's gesture type; once the gesture type of every clip has been obtained, that is, once the video clips corresponding to the target video and their gesture types have been obtained, enter step F.
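The majority vote of step E can be sketched with a counter over the per-frame gesture labels (an illustrative sketch; the patent does not prescribe an implementation, and the labels shown are hypothetical):

```python
from collections import Counter

def clip_gesture(frame_gestures):
    """Step E: pick the gesture type that appears in the most video
    frames of a clip; `frame_gestures` holds one label per frame."""
    return Counter(frame_gestures).most_common(1)[0][0]
```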
Step F. For each video clip, adjust the size of the foreground person so that the foreground persons in the clips are the same size, and update the video clips corresponding to the target video.
In practical applications, step F is carried out as the following steps F1 and F2.
Step F1. For each video frame in each video clip, apply face_detection to obtain a face frame enclosing the face region in the frame; if the difference in face-frame scale between adjacent video frames exceeds a preset face scale threshold, judge that those frames need to be scaled, and then enter step F2.
Step F2. For each video clip that needs scaling, apply the scaling ratio between the size of the face frame in the first video frame of the clip and the size of the face frame in the last video frame of the preceding adjacent clip, and scale every video frame in the clip by that ratio; after this operation has been performed on every clip that needs scaling, the foreground persons in the clips are the same size, and the video clips corresponding to the target video are updated.
In practical applications, fig. 2 illustrates the face-frame enlargement performed in step F of this embodiment, and fig. 3 illustrates the corresponding face-frame reduction.
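The boundary check of step F1 and the ratio of step F2 can be sketched as follows, assuming the face-frame "size" is a single scalar such as the box height; the threshold value and all names here are assumptions, not taken from the patent:

```python
def needs_scaling(prev_size, cur_size, threshold=0.2):
    """Step F1 (sketch): flag a clip boundary where the relative change
    in face-frame size exceeds the threshold (threshold is assumed)."""
    return abs(prev_size - cur_size) / prev_size > threshold

def scaling_ratio(prev_last_face, cur_first_face):
    """Step F2: ratio between the face-frame size at the end of the
    previous clip and at the start of the current clip."""
    return prev_last_face / cur_first_face
```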
Specifically, in step F2, the following steps F2-1 to F2-4 are designed and executed for each video frame in each video clip that needs scaling, applying the scaling ratio between the face-frame size in the first video frame of the clip and the face-frame size in the last video frame of the preceding adjacent clip, and scaling each video frame in the clip by that ratio.
Step F2-1. Finely segment the foreground to obtain the foreground person of the video frame as a mask, then scale the mask by the scaling ratio using cv2.resize() from OpenCV to obtain the scaled mask.
Step F2-2. To keep the midpoint of the video frame foreground's bottom edge aligned with the midpoint of the mask's bottom edge, multiply the position coordinate of the foreground's bottom-edge midpoint by the scaling ratio to obtain the mask's bottom-edge midpoint coordinate, and take the offset between the two coordinates as the to-be-processed position coordinate.
Step F2-3. If the scaling ratio is greater than 1, crop the mask to the size of the video frame with the to-be-processed position coordinate as the top-left corner, and update the mask; if the scaling ratio is less than 1, create a blank frame of the same size as the video frame, paste the reduced mask onto the blank frame starting from the position corresponding to the to-be-processed coordinate, and update the mask.
Step F2-4. Blend the mask with the blank background according to transparency, completing the scaling of the video frame.
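Steps F2-1 through F2-3 can be sketched in NumPy as follows. A nearest-neighbour resize stands in for the cv2.resize() call named in step F2-1 so the sketch is self-contained, the transparency blending of step F2-4 is omitted, and all names are illustrative:

```python
import numpy as np

def resize_nn(img, scale):
    """Nearest-neighbour stand-in for cv2.resize(), NumPy only."""
    h, w = img.shape[:2]
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    ys = np.arange(nh) * h // nh
    xs = np.arange(nw) * w // nw
    return img[ys][:, xs]

def rescale_person(mask, scale):
    """Steps F2-1..F2-3 (sketch): scale the foreground mask, then keep the
    bottom-edge midpoint aligned by cropping back to the frame size
    (scale > 1) or pasting into a blank frame (scale <= 1)."""
    h, w = mask.shape[:2]
    scaled = resize_nn(mask, scale)
    sh, sw = scaled.shape[:2]
    # top-left anchor that keeps the bottom-edge midpoints aligned
    x0, y0 = (sw - w) // 2, sh - h
    if scale > 1:  # crop the enlarged mask back to the frame size
        return scaled[max(0, y0):max(0, y0) + h, max(0, x0):max(0, x0) + w]
    out = np.zeros((h, w), dtype=mask.dtype)  # blank frame, original size
    c = (w - sw) // 2
    out[h - sh:, c:c + sw] = scaled           # bottom-centred paste
    return out
```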
Correspondingly, in practical application, a system implementing the video clip intercepting method is designed, comprising a gesture video frame detection module, a continuous video frame group generation module, a missing frame supplementing module, a video clip construction module, a video clip acquisition module and a video frame scaling module.
The gesture video frame detection module applies a gesture detection model to each video frame of the target video to perform gesture detection; if the number of hands in the frame is greater than 0, it detects which of the preset gesture types the gesture in the frame corresponds to and takes the frame's number as a to-be-processed frame number; if the number of hands in the frame equals 0, the frame is not processed.
The continuous video frame group generation module forms the continuous video frame groups from the to-be-processed frame numbers, taken in ascending order.
The missing frame supplementing module supplements, for each continuous video frame group, the missing frame numbers between the to-be-processed frame numbers in the group according to the frame numbers of the video frames in the target video, and updates each group.
The video clip construction module forms, for each continuous video frame group, a video clip from the video frames corresponding to the frame numbers in the group, in order, as a video clip corresponding to the target video, thereby obtaining the video clips corresponding to the target video.
The video clip acquisition module counts, for each video clip, the number of video frames corresponding to each gesture type in the clip and selects the gesture type with the most frames as the clip's gesture type, thereby obtaining the video clips corresponding to the target video and their gesture types.
The video frame scaling module adjusts the size of the foreground person in each video clip so that the foreground persons in the clips are the same size, and updates the video clips corresponding to the target video.
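The continuous-group generation performed by the second module (detailed as steps B1 to B6 in claim 5) amounts to splitting the sorted to-be-processed frame numbers wherever the gap to the next number reaches the threshold. A minimal sketch, using the threshold of 5 given in claim 6 (function and variable names are my own):

```python
def group_frames(frame_numbers, gap_threshold=5):
    """Step B (sketch): sort the to-be-processed frame numbers and start
    a new continuous group wherever the frame-number difference to the
    next number is not smaller than the threshold."""
    if not frame_numbers:
        return []
    nums = sorted(frame_numbers)
    groups = [[nums[0]]]
    for n in nums[1:]:
        if n - groups[-1][-1] < gap_threshold:
            groups[-1].append(n)   # still within the same continuous group
        else:
            groups.append([n])     # discontinuous interval: new group
    return groups
```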
In practical implementation, the method runs on a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the method of any one of claims 1 to 7 are implemented.
Correspondingly, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
The video clip intercepting method of this technical scheme adopts a novel design: first, a gesture detection model performs gesture detection on the video frames to obtain the to-be-processed frame numbers containing gestures, and from them the continuous video frame groups; the groups are then updated by supplementing the missing frame numbers, yielding the video clips. The gesture video clips are thus obtained. The overall method is efficient and simple, each gesture clip has accurate timing, the clips are ready for synthesis into a long video, and the efficiency of video capture is effectively improved.
It will be apparent to those skilled in the art that various modifications and variations can be made to the above embodiments of the present invention without departing from the spirit of the invention.

Claims (10)

1. A video clip intercepting method for obtaining the video clips of a target video that contain gesture actions, characterized by comprising the following steps:
step A, for each video frame in the target video, apply a gesture detection model to perform gesture detection on the frame; if the number of hands in the frame is greater than 0, detect which of the preset gesture types the gesture in the frame corresponds to and take the frame's number as a to-be-processed frame number; if the number of hands in the frame equals 0, do not process the frame; after gesture detection has been completed for all video frames in the target video, enter step B;
step B, form the continuous video frame groups from the to-be-processed frame numbers, taken in ascending order, and then enter step C;
step C, for each continuous video frame group, supplement the missing frame numbers between the to-be-processed frame numbers in the group according to the frame numbers of the video frames in the target video, update the group, and after every group has been updated, enter step D;
step D, for each continuous video frame group, form a video clip from the video frames corresponding to the frame numbers in the group, in order, as a video clip corresponding to the target video; after every video clip corresponding to the target video has been obtained, enter step E;
step E, for each video clip, count the number of video frames corresponding to each gesture type in the clip and select the gesture type with the most video frames as the clip's gesture type, thereby obtaining the video clips corresponding to the target video and their gesture types.
2. The video clip intercepting method according to claim 1, further comprising a step F, entered after step E is executed;
step F, for each video clip, adjust the size of the foreground person so that the foreground persons in the clips are the same size, and update the video clips corresponding to the target video.
3. The video clip intercepting method according to claim 2, wherein step F comprises the following steps F1 and F2;
step F1, for each video frame in each video clip, obtain a face frame enclosing the face region in the frame; if the difference in face-frame scale between adjacent video frames exceeds a preset face scale threshold, judge that those frames need to be scaled, and then enter step F2;
step F2, for each video clip that needs scaling, apply the scaling ratio between the size of the face frame in the first video frame of the clip and the size of the face frame in the last video frame of the preceding adjacent clip, and scale every video frame in the clip by that ratio; after this operation has been performed on every clip that needs scaling, the foreground persons in the clips are the same size, and the video clips corresponding to the target video are updated.
4. The video clip intercepting method according to claim 3, wherein in step F2, for each video frame in each video clip that needs scaling, the scaling ratio between the face-frame size in the first video frame of the clip and the face-frame size in the last video frame of the preceding adjacent clip is applied, and each video frame in the clip is scaled by that ratio, according to the following steps F2-1 to F2-4;
step F2-1, finely segment the foreground to obtain the foreground person of the video frame as a mask, then scale the mask by the scaling ratio using cv2.resize() from OpenCV to obtain the scaled mask;
step F2-2, to keep the midpoint of the video frame foreground's bottom edge aligned with the midpoint of the mask's bottom edge, multiply the position coordinate of the foreground's bottom-edge midpoint by the scaling ratio to obtain the mask's bottom-edge midpoint coordinate, and take the offset between the two coordinates as the to-be-processed position coordinate;
step F2-3, if the scaling ratio is greater than 1, crop the mask to the size of the video frame with the to-be-processed position coordinate as the top-left corner, and update the mask; if the scaling ratio is less than 1, create a blank frame of the same size as the video frame, paste the reduced mask onto the blank frame starting from the position corresponding to the to-be-processed coordinate, and update the mask;
step F2-4, blend the mask with the blank background according to transparency, completing the scaling of the video frame.
5. The video clip intercepting method according to claim 1, wherein step B comprises the following steps B1 to B6:
step B1, based on the ascending order of the to-be-processed frame numbers, initialize a parameter n to 1 and enter step B2;
step B2, obtain the frame number difference between the nth and (n+1)th to-be-processed frame numbers and judge whether it is smaller than a preset frame number difference threshold; if so, judge that the nth through (n+1)th frame numbers form a continuous video frame group and enter step B3; otherwise, judge that a discontinuous interval lies between the nth and (n+1)th frame numbers;
step B3, if n = 1, enter step B6; if n > 1, enter step B4;
step B4, judge whether a discontinuous interval exists in the upstream direction of the continuous video frame group formed by the nth through (n+1)th frame numbers; if so, enter step B5; otherwise, form a continuous video frame group from the 1st through (n+1)th frame numbers, thereby obtaining the continuous video frame groups, and then enter step B5;
step B5, judge whether the frame number on the downstream side of the discontinuous interval adjacent to the group in the upstream direction is the nth frame number; if so, enter step B6; otherwise, form a continuous video frame group from the frame number on the downstream side of that discontinuous interval through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then enter step B6;
step B6, judge whether n+1 equals the number of to-be-processed frame numbers; if so, the continuous video frame groups corresponding to all to-be-processed frame numbers have been obtained; otherwise, assign n+1 to n and return to step B2.
6. The video clip intercepting method according to claim 5, wherein the preset frame number difference threshold in step B2 equals 5.
7. The video clip intercepting method according to claim 1, further comprising a step BC, entered after step B is executed; after step BC is executed, step C is entered;
step BC, for each continuous video frame group, pad the buffer video frames corresponding to the group according to the following cases, update the group, and then enter step C;
case 1, if the first frame number in the continuous video frame group is 1, then, according to the frame numbers of the video frames in the target video, pad the group with the preset buffer frame number of frame numbers adjacent to its last frame number in the downstream direction of the target video, that is, supplement the buffer video frames corresponding to the group, and update the group;
in case 1, if fewer than the preset buffer frame number of frame numbers are adjacent to the last frame number in the downstream direction, pad the group with all frame numbers adjacent to the last frame number in the downstream direction of the target video, that is, supplement the buffer video frames corresponding to the group, and update the group;
case 2, if the last frame number in the continuous video frame group is the maximum frame number among the video frames of the target video, then, according to the frame numbers of the video frames in the target video, pad the group with the preset buffer frame number of frame numbers adjacent to its first frame number in the upstream direction of the target video, that is, supplement the buffer video frames corresponding to the group, and update the group;
in case 2, if fewer than the preset buffer frame number of frame numbers are adjacent to the first frame number in the upstream direction, pad the group with all frame numbers adjacent to the first frame number in the upstream direction of the target video, that is, supplement the buffer video frames corresponding to the group, and update the group;
case 3, if the first frame number in the continuous video frame group is greater than 1 and the last frame number is less than the maximum frame number among the video frames of the target video, then, according to the frame numbers of the video frames in the target video, pad the group with the preset buffer frame number of frame numbers adjacent to its first frame number in the upstream direction and with the preset buffer frame number of frame numbers adjacent to its last frame number in the downstream direction, that is, supplement the buffer video frames corresponding to the group, and update the group;
in case 3, if fewer than the preset buffer frame number of frame numbers are adjacent to the last frame number in the downstream direction, pad the group with all frame numbers adjacent to the last frame number in the downstream direction; if fewer than the preset buffer frame number of frame numbers are adjacent to the first frame number in the upstream direction, pad the group with all frame numbers adjacent to the first frame number in the upstream direction; that is, supplement the buffer video frames corresponding to the group, and update the group.
8. A system for performing the video clip intercepting method of any one of claims 1 to 7, characterized in that the system comprises a gesture video frame detection module, a continuous video frame group generation module, a missing frame supplementing module, a video clip construction module, a video clip acquisition module and a video frame scaling module;
the gesture video frame detection module applies a gesture detection model to each video frame of the target video to perform gesture detection; if the number of hands in the frame is greater than 0, it detects which of the preset gesture types the gesture in the frame corresponds to and takes the frame's number as a to-be-processed frame number; if the number of hands in the frame equals 0, the frame is not processed;
the continuous video frame group generation module forms the continuous video frame groups from the to-be-processed frame numbers, taken in ascending order;
the missing frame supplementing module supplements, for each continuous video frame group, the missing frame numbers between the to-be-processed frame numbers in the group according to the frame numbers of the video frames in the target video, and updates each group;
the video clip construction module forms, for each continuous video frame group, a video clip from the video frames corresponding to the frame numbers in the group, in order, as a video clip corresponding to the target video, thereby obtaining the video clips corresponding to the target video;
the video clip acquisition module counts, for each video clip, the number of video frames corresponding to each gesture type in the clip and selects the gesture type with the most frames as the clip's gesture type, thereby obtaining the video clips corresponding to the target video and their gesture types;
the video frame scaling module adjusts the size of the foreground person in each video clip so that the foreground persons in the clips are the same size, and updates the video clips corresponding to the target video.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202111031248.0A 2021-09-03 2021-09-03 Video clip intercepting method and system, computer equipment and readable storage medium Pending CN113747238A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111031248.0A CN113747238A (en) 2021-09-03 2021-09-03 Video clip intercepting method and system, computer equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN113747238A true CN113747238A (en) 2021-12-03

Family

ID=78735421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111031248.0A Pending CN113747238A (en) 2021-09-03 2021-09-03 Video clip intercepting method and system, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113747238A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140347263A1 (en) * 2013-05-23 2014-11-27 Fastvdo Llc Motion-Assisted Visual Language For Human Computer Interfaces
US20180088679A1 (en) * 2013-05-23 2018-03-29 Fastvdo Llc Motion-Assisted Visual Language for Human Computer Interfaces
CN110650368A (en) * 2019-09-25 2020-01-03 新东方教育科技集团有限公司 Video processing method and device and electronic equipment
CN113301385A (en) * 2021-05-21 2021-08-24 北京大米科技有限公司 Video data processing method and device, electronic equipment and readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination