CN113747238A - Video clip intercepting method and system, computer equipment and readable storage medium - Google Patents

Video clip intercepting method and system, computer equipment and readable storage medium

Info

Publication number
CN113747238A
Authority
CN
China
Prior art keywords
video
frame
continuous
frame number
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111031248.0A
Other languages
Chinese (zh)
Inventor
包英泽
冯富森
舒科
卢景熙
Current Assignee
Beijing Tiaoyue Intelligent Technology Co ltd
Original Assignee
Beijing Tiaoyue Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Tiaoyue Intelligent Technology Co ltd filed Critical Beijing Tiaoyue Intelligent Technology Co ltd
Priority to CN202111031248.0A
Publication of CN113747238A
Legal status: Pending


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Abstract

The invention relates to a video clip intercepting method and system, computer equipment and a computer-readable storage medium. First, a gesture detection model is applied to the video frames to detect gestures, yielding the frame numbers to be processed (those containing a gesture), from which the continuous video frame groups are obtained. Each continuous video frame group is then updated by filling in the missing frame numbers, which yields the video clips. The gesture video clips are thus obtained. The overall design is efficient and simple, each captured gesture video clip is accurately timed in preparation for synthesizing a long video, and the working efficiency of video capture is effectively improved.

Description

Video clip intercepting method and system, computer equipment and readable storage medium
Technical Field
The invention relates to a video clip intercepting method and system, computer equipment and a computer-readable storage medium, and belongs to the technical field of video clip processing.
Background
There is wide demand for video content generation today, and an existing approach composes longer video content from many short video segments (e.g., 0.5 seconds each). For example, a video of an anchor talking can be composed from a series of short clips of anchor actions (for example, the anchor stretching, the anchor sitting down, the anchor gesturing). However, to ensure the continuity and visual quality of the splicing, the short videos must be accurately timed, and obtaining a large number of precisely timed short segments is a very labor-intensive process. The invention therefore provides a method for automatically intercepting short video clips from existing long video material so that long videos can be synthesized from them.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a video clip intercepting method and system, computer equipment and a computer-readable storage medium that automatically intercept short video clips based on a gesture detection model, obtain precisely timed short video clips, and thereby prepare for synthesizing long videos.
To solve this technical problem, the invention adopts the following technical scheme: a video clip intercepting method for obtaining each video clip containing a gesture action in a target video, comprising the following steps:
Step A, for each video frame in the target video, apply a gesture detection model to perform gesture detection on the video frame; if the number of hands in the video frame is greater than 0, detect the gesture type, among the preset gesture types, corresponding to the gesture in the video frame, and record the frame number of the video frame as a frame number to be processed; if the number of hands in the video frame equals 0, leave the video frame unprocessed. After gesture detection has been completed for all video frames in the target video, proceed to step B.
Step B, in ascending order of the frame numbers to be processed, form the continuous video frame groups from the frame numbers to be processed, and then proceed to step C.
Step C, for each continuous video frame group, according to the frame numbers of the video frames in the target video, fill in the frame numbers missing between the frame numbers to be processed in the group, thereby updating each continuous video frame group, and then proceed to step D.
Step D, for each continuous video frame group, form a video clip from the video frames corresponding to the frame numbers in the group, in sequence, as a video clip corresponding to the target video; after obtaining every video clip corresponding to the target video, proceed to step E.
Step E, for each video clip, count the number of video frames corresponding to each gesture type in the clip and select the gesture type with the most video frames as the gesture type of the clip; the video clips corresponding to the target video and their gesture types are thus obtained.
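As a minimal illustration of the majority vote in step E (assuming the per-frame gesture labels produced by the detection model of step A are available as plain strings, which is an assumption of this sketch):

```python
from collections import Counter

def clip_gesture_type(frame_gestures):
    """Step E: pick the gesture type that labels the most frames in a clip."""
    return Counter(frame_gestures).most_common(1)[0][0]

# A clip whose frames were mostly labeled "wave" is labeled "wave" overall.
labels = ["wave", "wave", "point", "wave", "none"]
print(clip_gesture_type(labels))  # -> wave
```

The same vote is applied independently to every clip obtained in step D.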
As a preferred technical scheme of the invention, the method further comprises a step F, which is entered after step E has been executed.
Step F, for each video clip, adjust the size of the foreground person in the clip so that the foreground persons in all video clips have the same size, and update each video clip corresponding to the target video.
As a preferred embodiment of the present invention, step F comprises the following steps F1 to F2.
Step F1, for each video frame in each video clip, obtain the face frame surrounding the face region in the frame; if the scale difference between the face frames in adjacent video frames exceeds a preset face scale threshold, determine that the adjacent video clips need to be scaled, and proceed to step F2.
Step F2, for each video clip that needs to be scaled, apply the scaling ratio between the size of the face frame in the first video frame of the clip and the size of the face frame in the last video frame of the preceding adjacent clip, and scale each video frame in the clip by that ratio. After this has been done for every clip that needs scaling, the foreground persons in the video clips have the same size, and each video clip corresponding to the target video is updated.
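A minimal sketch of the decision in steps F1 and F2, assuming face-frame height is used as the scale measure and an illustrative threshold of 10% (both are assumptions; the patent only specifies that the threshold is preset):

```python
def needs_rescale(prev_face_h, cur_face_h, scale_threshold=0.1):
    """Step F1: flag adjacent frames whose face-frame scale differs too much.
    The 0.1 threshold is illustrative; the patent only says 'preset'."""
    return abs(cur_face_h - prev_face_h) / prev_face_h > scale_threshold

def scale_ratio(prev_last_face_h, cur_first_face_h):
    """Step F2: ratio that rescales the current clip so the face size in its
    first frame matches the face size at the end of the previous clip."""
    return prev_last_face_h / cur_first_face_h

print(needs_rescale(100, 130))  # -> True  (scale jumped by 30%)
print(needs_rescale(100, 105))  # -> False (within threshold)
```

With `scale_ratio(130, 100)` the current clip would be enlarged by a factor of 1.3 so its starting face matches the previous clip's ending face.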
As a preferred technical scheme of the invention, in step F2, each video frame of each video clip that needs to be scaled is scaled, by the ratio between the size of the face frame in the first video frame of the clip and the size of the face frame in the last video frame of the preceding adjacent clip, according to the following steps F2-1 to F2-4.
Step F2-1, finely segment the foreground: segment the foreground person of the video frame to form a mask, and scale the mask by the scaling ratio using cv2.resize() in OpenCV to obtain the scaled mask.
Step F2-2, to align the midpoint of the bottom edge of the video frame foreground with the midpoint of the bottom edge of the mask, multiply the position coordinate of the foreground's bottom-edge midpoint by the scaling ratio to obtain the position coordinate of the mask's bottom-edge midpoint, and take the deviation between the two coordinates as the position coordinate to be processed.
Step F2-3, if the scaling ratio is greater than 1, take the position coordinate to be processed as the upper-left corner point, crop the mask to the size of the video frame, and update the mask; if the scaling ratio is less than 1, create a blank frame of the same size as the video frame, take the coordinate on the blank frame corresponding to the position to be processed as the starting point, paste the reduced mask onto the blank frame, and update the mask.
Step F2-4, superimpose and fuse the mask with the blank background according to transparency, completing the scaling of the video frame.
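Steps F2-1 to F2-3 can be sketched with NumPy alone. The nearest-neighbour `nn_resize` below is a simplified stand-in for `cv2.resize()`, and the bottom-edge-midpoint bookkeeping follows steps F2-2/F2-3; the transparency fusion of step F2-4 is omitted. This is an illustrative reading of the steps, not the patented implementation:

```python
import numpy as np

def nn_resize(img, ratio):
    """Nearest-neighbour stand-in for cv2.resize (step F2-1)."""
    h, w = img.shape[:2]
    nh, nw = int(round(h * ratio)), int(round(w * ratio))
    ys = (np.arange(nh) / ratio).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / ratio).astype(int).clip(0, w - 1)
    return img[ys][:, xs]

def rescale_foreground(mask, ratio, fg_cx):
    """Steps F2-2/F2-3: rescale the foreground mask while keeping the
    midpoint of its bottom edge (fg_cx, H) fixed in the frame."""
    h, w = mask.shape[:2]
    scaled = nn_resize(mask, ratio)
    # Deviation between scaled and original bottom-edge midpoints (F2-2).
    dx = int(round(fg_cx * ratio)) - fg_cx
    dy = scaled.shape[0] - h
    out = np.zeros_like(mask)
    if ratio >= 1:
        # F2-3, ratio > 1: crop with (dx, dy) as the upper-left corner.
        out[:, :] = scaled[dy:dy + h, dx:dx + w]
    else:
        # F2-3, ratio < 1: paste the reduced mask onto a blank frame.
        sh, sw = scaled.shape[:2]
        out[-dy:-dy + sh, -dx:-dx + sw] = scaled
    return out

demo = np.zeros((8, 8), dtype=int)
demo[4:8, 2:6] = 1  # a 4x4 foreground block resting on the bottom edge
shrunk = rescale_foreground(demo, 0.5, fg_cx=4)
grown = rescale_foreground(demo, 2.0, fg_cx=4)
```

After shrinking, the foreground still touches the bottom edge of the frame, which is the point of the bottom-midpoint alignment in step F2-2.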
As a preferred embodiment of the present invention, step B comprises the following steps B1 to B6.
Step B1, arrange the frame numbers to be processed in ascending order, initialize a parameter n to 1, and proceed to step B2.
Step B2, obtain the difference between the nth frame number and the (n+1)th frame number and judge whether the difference is smaller than a preset frame number difference threshold; if so, determine that the nth to (n+1)th frame numbers belong to one continuous video frame group and proceed to step B3; otherwise, determine that a discontinuity lies between the nth and (n+1)th frame numbers.
Step B3, if n = 1, proceed to step B6; if n > 1, proceed to step B4.
Step B4, judge whether a discontinuity exists in the upstream direction of the continuous video frame group formed by the nth to (n+1)th frame numbers; if so, proceed to step B5; otherwise, form one continuous video frame group from each of the 1st to (n-1)th frame numbers through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then proceed to step B5.
Step B5, judge whether the frame number on the downstream side of the discontinuity adjacent upstream of the continuous video frame group formed by the nth to (n+1)th frame numbers is the nth frame number; if so, proceed to step B6; otherwise, for that upstream-adjacent discontinuity, form one continuous video frame group from each frame number between its downstream-side frame number and the (n-1)th frame number through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then proceed to step B6.
Step B6, judge whether n+1 equals the number of frame numbers to be processed; if so, the continuous video frame groups corresponding to all frame numbers to be processed have been obtained; otherwise, assign n+1 to n and return to step B2.
As a preferred technical scheme of the invention: the preset frame number difference threshold in step B2 is equal to 5.
As a preferred technical scheme of the invention, the method further comprises a step BC, which is entered after step B has been executed; after step BC has been executed, step C is entered.
Step BC, for each continuous video frame group, fill in the buffer video frames corresponding to the group according to the following cases, update the group, and then proceed to step C.
Case 1. If the first frame number in the continuous video frame group is 1, then according to the frame numbers of the video frames in the target video, append the preset number of buffer frame numbers adjacent to the last frame number of the group in the downstream direction of the target video, i.e. fill in the buffer video frames corresponding to the group, and update the group.
In case 1, if the number of frame numbers adjacent downstream of the last frame number of the group in the target video is less than the preset buffer frame number, append all frame numbers adjacent downstream of the last frame number instead, i.e. fill in the buffer video frames corresponding to the group, and update the group.
Case 2. If the last frame number in the continuous video frame group is the maximum frame number of the video frames in the target video, then according to the frame numbers of the video frames in the target video, prepend the preset number of buffer frame numbers adjacent to the first frame number of the group in the upstream direction of the target video, i.e. fill in the buffer video frames corresponding to the group, and update the group.
In case 2, if the number of frame numbers adjacent upstream of the first frame number of the group in the target video is less than the preset buffer frame number, prepend all frame numbers adjacent upstream of the first frame number instead, i.e. fill in the buffer video frames corresponding to the group, and update the group.
Case 3. If the first frame number in the continuous video frame group is greater than 1 and the last frame number is less than the maximum frame number of the video frames in the target video, then according to the frame numbers of the video frames in the target video, prepend the preset number of buffer frame numbers adjacent to the first frame number in the upstream direction and append the preset number of buffer frame numbers adjacent to the last frame number in the downstream direction, i.e. fill in the buffer video frames corresponding to the group, and update the group.
In case 3, if the number of frame numbers adjacent downstream of the last frame number is less than the preset buffer frame number, append all frame numbers adjacent downstream of the last frame number instead; if the number of frame numbers adjacent upstream of the first frame number is less than the preset buffer frame number, prepend all frame numbers adjacent upstream of the first frame number instead; i.e. fill in the buffer video frames corresponding to the group, and update the group.
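A simplified reading of step BC's three cases, with the "fewer frames available than the buffer" sub-cases handled by clamping to the valid frame range. For brevity the sketch returns the full padded range, which also anticipates the gap filling of step C; the buffer of 2 frames follows the value given later in the embodiment (the claim only says "preset"):

```python
def pad_with_buffer(group, max_frame, buffer=2):
    """Step BC, cases 1-3: extend a continuous video frame group by up to
    `buffer` frame numbers on each side that exist inside the target video
    (frames are numbered 1..max_frame)."""
    first, last = group[0], group[-1]
    # Case 1: the group starts at frame 1, so only the tail is padded.
    start = first if first == 1 else max(1, first - buffer)
    # Case 2: the group ends at the last frame, so only the head is padded.
    end = last if last == max_frame else min(max_frame, last + buffer)
    # Case 3 (both sides interior) pads both ends via the two max/min clamps.
    return list(range(start, end + 1))

print(pad_with_buffer([1, 2, 3], max_frame=100))    # -> [1, 2, 3, 4, 5]
print(pad_with_buffer([98, 99, 100], max_frame=100))  # -> [96, 97, 98, 99, 100]
print(pad_with_buffer([50, 51], max_frame=100))     # -> [48, 49, 50, 51, 52, 53]
```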
Correspondingly, the invention further provides a system implementing the video clip intercepting method, which automatically intercepts short video clips based on a gesture detection model, obtains precisely timed short video clips, and prepares for synthesizing long videos.
To this end, the invention adopts the following technical scheme: a video clip intercepting system comprising a gesture video frame detection module, a continuous video frame group generation module, a missing frame supplementing module, a video clip construction module, a video clip acquisition module and a video frame scaling module.
The gesture video frame detection module applies a gesture detection model to each video frame in the target video to perform gesture detection; if the number of hands in the video frame is greater than 0, it detects the gesture type, among the preset gesture types, corresponding to the gesture in the frame, and records the frame number of the video frame as a frame number to be processed; if the number of hands equals 0, the frame is not processed.
The continuous video frame group generation module forms the continuous video frame groups from the frame numbers to be processed, in ascending order.
The missing frame supplementing module, for each continuous video frame group, fills in the frame numbers missing between the frame numbers to be processed in the group according to the frame numbers of the video frames in the target video, thereby updating each continuous video frame group.
The video clip construction module, for each continuous video frame group, forms a video clip from the video frames corresponding to the frame numbers in the group, in sequence, as a video clip corresponding to the target video, thereby obtaining each video clip corresponding to the target video.
The video clip acquisition module, for each video clip, counts the number of video frames corresponding to each gesture type in the clip and selects the gesture type with the most video frames as the gesture type of the clip, thereby obtaining the video clips corresponding to the target video together with their gesture types.
The video frame scaling module adjusts the size of the foreground person in each video clip so that the foreground persons in all clips have the same size, and updates each video clip corresponding to the target video.
Correspondingly, the invention provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the steps of the method according to one of claims 1 to 7 are implemented when the computer program is executed by the processor.
Correspondingly, the invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to one of claims 1 to 7.
Compared with the prior art, the video clip intercepting method adopting the above technical scheme has the following technical effects:
(1) The video clip intercepting method designed by the invention adopts a novel design: first, a gesture detection model is applied to the video frames to detect gestures, yielding the frame numbers to be processed (those containing a gesture), from which the continuous video frame groups are obtained; each continuous video frame group is then updated by filling in the missing frame numbers, which yields the video clips. The gesture video clips are thus obtained. The overall design is efficient and simple, each captured gesture video clip is accurately timed in preparation for synthesizing a long video, and the working efficiency of video capture is effectively improved.
Drawings
FIG. 1 is a flow chart of the video clip intercepting method of the present invention;
FIG. 2 is a schematic diagram of enlargement by the face frame size in step F in an application embodiment of the video clip intercepting method of the present invention;
FIG. 3 is a schematic diagram of reduction by the face frame size in step F in an application embodiment of the video clip intercepting method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. It should further be noted that, for convenience of description, the drawings show only the structures related to the present invention rather than all structures.
The invention designs a video clip intercepting method for acquiring each video clip containing a gesture action in a target video; in practical application, as shown in fig. 1, the following steps A to E are executed.
Step A, for each video frame in the target video, apply a gesture detection model to perform gesture detection on the video frame; if the number of hands in the video frame is greater than 0, detect the gesture type, among the 18 preset gesture types, corresponding to the gesture in the video frame, and record the frame number of the video frame as a frame number to be processed; if the number of hands in the video frame equals 0, leave the video frame unprocessed. After gesture detection has been completed for all video frames in the target video, proceed to step B.
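Step A reduces to a filtering pass over the frames. The sketch below uses a hypothetical `detect_gesture` callable standing in for the gesture detection model (assumed to return one of the 18 preset gesture types, or None when no hand is found); both the helper and the toy detector are illustrative assumptions:

```python
def collect_gesture_frames(frames, detect_gesture):
    """Step A: keep the frame numbers (1-based) of frames containing a hand."""
    to_process = {}
    for number, frame in enumerate(frames, start=1):
        gesture = detect_gesture(frame)   # one of 18 preset types, or None
        if gesture is not None:           # hand count > 0
            to_process[number] = gesture
    return to_process

# Toy stand-in detector: a "frame" here is just a pre-assigned label.
fake_detector = lambda frame: frame
print(collect_gesture_frames(["wave", None, None, "point"], fake_detector))
# -> {1: 'wave', 4: 'point'}
```

The keys of the returned mapping are the frame numbers to be processed handed to step B; the values are the per-frame gesture labels later tallied in step E.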
Step B, in ascending order of the frame numbers to be processed, execute the following steps B1 to B6 to form the continuous video frame groups from the frame numbers to be processed, and then proceed to step BC.
Step B1, arrange the frame numbers to be processed in ascending order, initialize the parameter n to 1, and proceed to step B2.
Step B2, obtain the difference between the nth frame number and the (n+1)th frame number and judge whether the difference is smaller than the preset frame number difference threshold; if so, determine that the nth to (n+1)th frame numbers belong to one continuous video frame group and proceed to step B3; otherwise, determine that a discontinuity lies between the nth and (n+1)th frame numbers. In practical application, the preset frame number difference threshold is designed to equal 5; that is, the difference between the nth and (n+1)th frame numbers is obtained and it is judged whether the difference is less than 5.
Step B3, if n = 1, proceed to step B6; if n > 1, proceed to step B4.
Step B4, judge whether a discontinuity exists in the upstream direction of the continuous video frame group formed by the nth to (n+1)th frame numbers; if so, proceed to step B5; otherwise, form one continuous video frame group from each of the 1st to (n-1)th frame numbers through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then proceed to step B5.
Step B5, judge whether the frame number on the downstream side of the discontinuity adjacent upstream of the continuous video frame group formed by the nth to (n+1)th frame numbers is the nth frame number; if so, proceed to step B6; otherwise, for that upstream-adjacent discontinuity, form one continuous video frame group from each frame number between its downstream-side frame number and the (n-1)th frame number through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then proceed to step B6.
Step B6, judge whether n+1 equals the number of frame numbers to be processed; if so, the continuous video frame groups corresponding to all frame numbers to be processed have been obtained; otherwise, assign n+1 to n and return to step B2.
Step BC, for each continuous video frame group, fill in the buffer video frames corresponding to the group according to the following cases, update the group, and then proceed to step C.
Case 1. If the first frame number in the continuous video frame group is 1, then according to the frame numbers of the video frames in the target video, append the preset number of buffer frame numbers adjacent to the last frame number of the group in the downstream direction of the target video, i.e. fill in the buffer video frames corresponding to the group, and update the group. In case 1, if the number of frame numbers adjacent downstream of the last frame number is less than the preset buffer frame number, append all frame numbers adjacent downstream of the last frame number instead, i.e. fill in the buffer video frames corresponding to the group, and update the group.
Case 2. If the last frame number in the continuous video frame group is the maximum frame number of the video frames in the target video, then according to the frame numbers of the video frames in the target video, prepend the preset number of buffer frame numbers adjacent to the first frame number of the group in the upstream direction of the target video, i.e. fill in the buffer video frames corresponding to the group, and update the group. In case 2, if the number of frame numbers adjacent upstream of the first frame number is less than the preset buffer frame number, prepend all frame numbers adjacent upstream of the first frame number instead, i.e. fill in the buffer video frames corresponding to the group, and update the group.
Case 3. If the first frame number in the continuous video frame group is greater than 1 and the last frame number is less than the maximum frame number of the video frames in the target video, then according to the frame numbers of the video frames in the target video, prepend the preset number of buffer frame numbers adjacent to the first frame number in the upstream direction and append the preset number of buffer frame numbers adjacent to the last frame number in the downstream direction, i.e. fill in the buffer video frames corresponding to the group, and update the group. In case 3, if the number of frame numbers adjacent downstream of the last frame number is less than the preset buffer frame number, append all frame numbers adjacent downstream of the last frame number instead; if the number of frame numbers adjacent upstream of the first frame number is less than the preset buffer frame number, prepend all frame numbers adjacent upstream of the first frame number instead; i.e. fill in the buffer video frames corresponding to the group, and update the group.
The preset buffer frame number in each of the above cases is specifically set to 2.
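The three buffer-padding cases above can be sketched as follows (function and variable names are my own, not from the patent); clamping the padded range to the valid frame numbers 1 to max_frame handles all three cases uniformly:

```python
def pad_buffer(group, max_frame, buffer=2):
    """Extend a continuous frame-number group by up to `buffer` frame
    numbers on each side, clamped to the valid range [1, max_frame]."""
    first, last = group[0], group[-1]
    start = max(1, first - buffer)        # upstream padding, truncated at frame 1
    end = min(max_frame, last + buffer)   # downstream padding, truncated at the video end
    return list(range(start, end + 1))
```

Case 1 (first frame number equal to 1) then pads only downstream, case 2 (last frame number at the end of the video) only upstream, and a neighbourhood shorter than the preset buffer frame number is truncated automatically by the clamp.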
Step C. For each continuous video frame group, supplement the missing frame numbers between the to-be-processed frame numbers in the group according to the frame numbers of the video frames in the target video, and update the group; after every group has been updated, enter step D.
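Assuming each group is held as its sorted to-be-processed frame numbers, step C reduces to completing the group into a gap-free range (a minimal sketch; the name is illustrative):

```python
def fill_missing(group):
    """Step C: supplement the missing frame numbers between the
    to-be-processed frame numbers of a continuous video frame group."""
    return list(range(group[0], group[-1] + 1))
```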
Step D. For each continuous video frame group, form a video clip from the video frames corresponding to the frame numbers in the group, in order, and take it as a video clip corresponding to the target video; after every video clip corresponding to the target video has been obtained, enter step E.
Step E. For each video clip, count the number of video frames corresponding to each gesture type in the clip, and select the gesture type with the most video frames as the clip's gesture type; once the gesture type of every clip has been obtained, that is, once the video clips corresponding to the target video and their gesture types have been obtained, enter step F.
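The majority vote of step E can be sketched with a counter over the per-frame gesture labels (an illustrative sketch; the patent does not prescribe an implementation, and the labels shown are hypothetical):

```python
from collections import Counter

def clip_gesture(frame_gestures):
    """Step E: pick the gesture type that appears in the most video
    frames of a clip; `frame_gestures` holds one label per frame."""
    return Counter(frame_gestures).most_common(1)[0][0]
```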
Step F. For each video clip, adjust the size of the foreground person so that the foreground persons in the clips are the same size, and update the video clips corresponding to the target video.
In practical applications, step F is carried out as the following steps F1 and F2.
Step F1. For each video frame in each video clip, apply face_detection to obtain a face frame enclosing the face region in the frame; if the difference in face-frame scale between adjacent video frames exceeds a preset face scale threshold, judge that those frames need to be scaled, and then enter step F2.
Step F2. For each video clip that needs scaling, apply the scaling ratio between the size of the face frame in the first video frame of the clip and the size of the face frame in the last video frame of the preceding adjacent clip, and scale every video frame in the clip by that ratio; after this operation has been performed on every clip that needs scaling, the foreground persons in the clips are the same size, and the video clips corresponding to the target video are updated.
In practical applications, fig. 2 illustrates the face-frame enlargement performed in step F of this embodiment, and fig. 3 illustrates the corresponding face-frame reduction.
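The boundary check of step F1 and the ratio of step F2 can be sketched as follows, assuming the face-frame "size" is a single scalar such as the box height; the threshold value and all names here are assumptions, not taken from the patent:

```python
def needs_scaling(prev_size, cur_size, threshold=0.2):
    """Step F1 (sketch): flag a clip boundary where the relative change
    in face-frame size exceeds the threshold (threshold is assumed)."""
    return abs(prev_size - cur_size) / prev_size > threshold

def scaling_ratio(prev_last_face, cur_first_face):
    """Step F2: ratio between the face-frame size at the end of the
    previous clip and at the start of the current clip."""
    return prev_last_face / cur_first_face
```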
Specifically, in step F2, the following steps F2-1 to F2-4 are designed and executed for each video frame in each video clip that needs scaling, applying the scaling ratio between the face-frame size in the first video frame of the clip and the face-frame size in the last video frame of the preceding adjacent clip, and scaling each video frame in the clip by that ratio.
Step F2-1. Finely segment the foreground to obtain the foreground person of the video frame as a mask, then scale the mask by the scaling ratio using cv2.resize() from OpenCV to obtain the scaled mask.
Step F2-2. To keep the midpoint of the video frame foreground's bottom edge aligned with the midpoint of the mask's bottom edge, multiply the position coordinate of the foreground's bottom-edge midpoint by the scaling ratio to obtain the mask's bottom-edge midpoint coordinate, and take the offset between the two coordinates as the to-be-processed position coordinate.
Step F2-3. If the scaling ratio is greater than 1, crop the mask to the size of the video frame with the to-be-processed position coordinate as the top-left corner, and update the mask; if the scaling ratio is less than 1, create a blank frame of the same size as the video frame, paste the reduced mask onto the blank frame starting from the position corresponding to the to-be-processed coordinate, and update the mask.
Step F2-4. Blend the mask with the blank background according to transparency, completing the scaling of the video frame.
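Steps F2-1 through F2-3 can be sketched in NumPy as follows. A nearest-neighbour resize stands in for the cv2.resize() call named in step F2-1 so the sketch is self-contained, the transparency blending of step F2-4 is omitted, and all names are illustrative:

```python
import numpy as np

def resize_nn(img, scale):
    """Nearest-neighbour stand-in for cv2.resize(), NumPy only."""
    h, w = img.shape[:2]
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    ys = np.arange(nh) * h // nh
    xs = np.arange(nw) * w // nw
    return img[ys][:, xs]

def rescale_person(mask, scale):
    """Steps F2-1..F2-3 (sketch): scale the foreground mask, then keep the
    bottom-edge midpoint aligned by cropping back to the frame size
    (scale > 1) or pasting into a blank frame (scale <= 1)."""
    h, w = mask.shape[:2]
    scaled = resize_nn(mask, scale)
    sh, sw = scaled.shape[:2]
    # top-left anchor that keeps the bottom-edge midpoints aligned
    x0, y0 = (sw - w) // 2, sh - h
    if scale > 1:  # crop the enlarged mask back to the frame size
        return scaled[max(0, y0):max(0, y0) + h, max(0, x0):max(0, x0) + w]
    out = np.zeros((h, w), dtype=mask.dtype)  # blank frame, original size
    c = (w - sw) // 2
    out[h - sh:, c:c + sw] = scaled           # bottom-centred paste
    return out
```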
Correspondingly, in practical application, a system implementing the video clip intercepting method is designed, comprising a gesture video frame detection module, a continuous video frame group generation module, a missing frame supplementing module, a video clip construction module, a video clip acquisition module and a video frame scaling module.
The gesture video frame detection module applies a gesture detection model to each video frame of the target video to perform gesture detection; if the number of hands in the frame is greater than 0, it detects which of the preset gesture types the gesture in the frame corresponds to and takes the frame's number as a to-be-processed frame number; if the number of hands in the frame equals 0, the frame is not processed.
The continuous video frame group generation module forms the continuous video frame groups from the to-be-processed frame numbers, taken in ascending order.
The missing frame supplementing module supplements, for each continuous video frame group, the missing frame numbers between the to-be-processed frame numbers in the group according to the frame numbers of the video frames in the target video, and updates each group.
The video clip construction module forms, for each continuous video frame group, a video clip from the video frames corresponding to the frame numbers in the group, in order, as a video clip corresponding to the target video, thereby obtaining the video clips corresponding to the target video.
The video clip acquisition module counts, for each video clip, the number of video frames corresponding to each gesture type in the clip and selects the gesture type with the most frames as the clip's gesture type, thereby obtaining the video clips corresponding to the target video and their gesture types.
The video frame scaling module adjusts the size of the foreground person in each video clip so that the foreground persons in the clips are the same size, and updates the video clips corresponding to the target video.
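The continuous-group generation performed by the second module (detailed as steps B1 to B6 in claim 5) amounts to splitting the sorted to-be-processed frame numbers wherever the gap to the next number reaches the threshold. A minimal sketch, using the threshold of 5 given in claim 6 (function and variable names are my own):

```python
def group_frames(frame_numbers, gap_threshold=5):
    """Step B (sketch): sort the to-be-processed frame numbers and start
    a new continuous group wherever the frame-number difference to the
    next number is not smaller than the threshold."""
    if not frame_numbers:
        return []
    nums = sorted(frame_numbers)
    groups = [[nums[0]]]
    for n in nums[1:]:
        if n - groups[-1][-1] < gap_threshold:
            groups[-1].append(n)   # still within the same continuous group
        else:
            groups.append([n])     # discontinuous interval: new group
    return groups
```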
In practical implementation, the method runs on a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the method of any one of claims 1 to 7 are implemented.
Correspondingly, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
The video clip intercepting method of this technical scheme adopts a novel design: first, a gesture detection model performs gesture detection on the video frames to obtain the to-be-processed frame numbers containing gestures, and from them the continuous video frame groups; the groups are then updated by supplementing the missing frame numbers, yielding the video clips. The gesture video clips are thus obtained. The overall method is efficient and simple, each gesture clip has accurate timing, the clips are ready for synthesis into a long video, and the efficiency of video capture is effectively improved.
It will be apparent to those skilled in the art that various modifications and variations can be made to the above embodiments of the present invention without departing from the spirit of the invention.

Claims (10)

1. A video clip intercepting method for obtaining the video clips of a target video that contain gesture actions, characterized by comprising the following steps:
step A, for each video frame in the target video, apply a gesture detection model to perform gesture detection on the frame; if the number of hands in the frame is greater than 0, detect which of the preset gesture types the gesture in the frame corresponds to and take the frame's number as a to-be-processed frame number; if the number of hands in the frame equals 0, do not process the frame; after gesture detection has been completed for all video frames in the target video, enter step B;
step B, form the continuous video frame groups from the to-be-processed frame numbers, taken in ascending order, and then enter step C;
step C, for each continuous video frame group, supplement the missing frame numbers between the to-be-processed frame numbers in the group according to the frame numbers of the video frames in the target video, update the group, and after every group has been updated, enter step D;
step D, for each continuous video frame group, form a video clip from the video frames corresponding to the frame numbers in the group, in order, as a video clip corresponding to the target video; after every video clip corresponding to the target video has been obtained, enter step E;
step E, for each video clip, count the number of video frames corresponding to each gesture type in the clip and select the gesture type with the most video frames as the clip's gesture type, thereby obtaining the video clips corresponding to the target video and their gesture types.
2. The video clip intercepting method according to claim 1, further comprising a step F, entered after step E is executed;
step F, for each video clip, adjust the size of the foreground person so that the foreground persons in the clips are the same size, and update the video clips corresponding to the target video.
3. The video clip intercepting method according to claim 2, wherein step F comprises the following steps F1 and F2;
step F1, for each video frame in each video clip, obtain a face frame enclosing the face region in the frame; if the difference in face-frame scale between adjacent video frames exceeds a preset face scale threshold, judge that those frames need to be scaled, and then enter step F2;
step F2, for each video clip that needs scaling, apply the scaling ratio between the size of the face frame in the first video frame of the clip and the size of the face frame in the last video frame of the preceding adjacent clip, and scale every video frame in the clip by that ratio; after this operation has been performed on every clip that needs scaling, the foreground persons in the clips are the same size, and the video clips corresponding to the target video are updated.
4. The video clip intercepting method according to claim 3, wherein in step F2, for each video frame in each video clip that needs scaling, the scaling ratio between the face-frame size in the first video frame of the clip and the face-frame size in the last video frame of the preceding adjacent clip is applied, and each video frame in the clip is scaled by that ratio, according to the following steps F2-1 to F2-4;
step F2-1, finely segment the foreground to obtain the foreground person of the video frame as a mask, then scale the mask by the scaling ratio using cv2.resize() from OpenCV to obtain the scaled mask;
step F2-2, to keep the midpoint of the video frame foreground's bottom edge aligned with the midpoint of the mask's bottom edge, multiply the position coordinate of the foreground's bottom-edge midpoint by the scaling ratio to obtain the mask's bottom-edge midpoint coordinate, and take the offset between the two coordinates as the to-be-processed position coordinate;
step F2-3, if the scaling ratio is greater than 1, crop the mask to the size of the video frame with the to-be-processed position coordinate as the top-left corner, and update the mask; if the scaling ratio is less than 1, create a blank frame of the same size as the video frame, paste the reduced mask onto the blank frame starting from the position corresponding to the to-be-processed coordinate, and update the mask;
step F2-4, blend the mask with the blank background according to transparency, completing the scaling of the video frame.
5. The video clip intercepting method according to claim 1, wherein step B comprises the following steps B1 to B6:
step B1, based on the ascending order of the to-be-processed frame numbers, initialize a parameter n to 1 and enter step B2;
step B2, obtain the frame number difference between the nth and (n+1)th to-be-processed frame numbers and judge whether it is smaller than a preset frame number difference threshold; if so, judge that the nth through (n+1)th frame numbers form a continuous video frame group and enter step B3; otherwise, judge that a discontinuous interval lies between the nth and (n+1)th frame numbers;
step B3, if n = 1, enter step B6; if n > 1, enter step B4;
step B4, judge whether a discontinuous interval exists in the upstream direction of the continuous video frame group formed by the nth through (n+1)th frame numbers; if so, enter step B5; otherwise, form a continuous video frame group from the 1st through (n+1)th frame numbers, thereby obtaining the continuous video frame groups, and then enter step B5;
step B5, judge whether the frame number on the downstream side of the discontinuous interval adjacent to the group in the upstream direction is the nth frame number; if so, enter step B6; otherwise, form a continuous video frame group from the frame number on the downstream side of that discontinuous interval through the (n+1)th frame number, thereby obtaining the continuous video frame groups, and then enter step B6;
step B6, judge whether n+1 equals the number of to-be-processed frame numbers; if so, the continuous video frame groups corresponding to all to-be-processed frame numbers have been obtained; otherwise, assign n+1 to n and return to step B2.
6. The video clip intercepting method according to claim 5, wherein the preset frame number difference threshold in step B2 equals 5.
7. The video clip intercepting method according to claim 1, further comprising a step BC, entered after step B is executed; after step BC is executed, step C is entered;
step BC, for each continuous video frame group, pad the buffer video frames corresponding to the group according to the following cases, update the group, and then enter step C;
case 1, if the first frame number in the continuous video frame group is 1, then, according to the frame numbers of the video frames in the target video, pad the group with the preset buffer frame number of frame numbers adjacent to its last frame number in the downstream direction of the target video, that is, supplement the buffer video frames corresponding to the group, and update the group;
in case 1, if fewer than the preset buffer frame number of frame numbers are adjacent to the last frame number in the downstream direction, pad the group with all frame numbers adjacent to the last frame number in the downstream direction of the target video, that is, supplement the buffer video frames corresponding to the group, and update the group;
case 2, if the last frame number in the continuous video frame group is the maximum frame number among the video frames of the target video, then, according to the frame numbers of the video frames in the target video, pad the group with the preset buffer frame number of frame numbers adjacent to its first frame number in the upstream direction of the target video, that is, supplement the buffer video frames corresponding to the group, and update the group;
in case 2, if fewer than the preset buffer frame number of frame numbers are adjacent to the first frame number in the upstream direction, pad the group with all frame numbers adjacent to the first frame number in the upstream direction of the target video, that is, supplement the buffer video frames corresponding to the group, and update the group;
case 3, if the first frame number in the continuous video frame group is greater than 1 and the last frame number is less than the maximum frame number among the video frames of the target video, then, according to the frame numbers of the video frames in the target video, pad the group with the preset buffer frame number of frame numbers adjacent to its first frame number in the upstream direction and with the preset buffer frame number of frame numbers adjacent to its last frame number in the downstream direction, that is, supplement the buffer video frames corresponding to the group, and update the group;
in case 3, if fewer than the preset buffer frame number of frame numbers are adjacent to the last frame number in the downstream direction, pad the group with all frame numbers adjacent to the last frame number in the downstream direction; if fewer than the preset buffer frame number of frame numbers are adjacent to the first frame number in the upstream direction, pad the group with all frame numbers adjacent to the first frame number in the upstream direction; that is, supplement the buffer video frames corresponding to the group, and update the group.
8. A system for performing the video clip intercepting method of any one of claims 1 to 7, characterized in that the system comprises a gesture video frame detection module, a continuous video frame group generation module, a missing frame supplementing module, a video clip construction module, a video clip acquisition module and a video frame scaling module;
the gesture video frame detection module applies a gesture detection model to each video frame of the target video to perform gesture detection; if the number of hands in the frame is greater than 0, it detects which of the preset gesture types the gesture in the frame corresponds to and takes the frame's number as a to-be-processed frame number; if the number of hands in the frame equals 0, the frame is not processed;
the continuous video frame group generation module forms the continuous video frame groups from the to-be-processed frame numbers, taken in ascending order;
the missing frame supplementing module supplements, for each continuous video frame group, the missing frame numbers between the to-be-processed frame numbers in the group according to the frame numbers of the video frames in the target video, and updates each group;
the video clip construction module forms, for each continuous video frame group, a video clip from the video frames corresponding to the frame numbers in the group, in order, as a video clip corresponding to the target video, thereby obtaining the video clips corresponding to the target video;
the video clip acquisition module counts, for each video clip, the number of video frames corresponding to each gesture type in the clip and selects the gesture type with the most frames as the clip's gesture type, thereby obtaining the video clips corresponding to the target video and their gesture types;
the video frame scaling module adjusts the size of the foreground person in each video clip so that the foreground persons in the clips are the same size, and updates the video clips corresponding to the target video.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202111031248.0A 2021-09-03 2021-09-03 Video clip intercepting method and system, computer equipment and readable storage medium Pending CN113747238A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111031248.0A CN113747238A (en) 2021-09-03 2021-09-03 Video clip intercepting method and system, computer equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN113747238A true CN113747238A (en) 2021-12-03

Family

ID=78735421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111031248.0A Pending CN113747238A (en) 2021-09-03 2021-09-03 Video clip intercepting method and system, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113747238A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140347263A1 (en) * 2013-05-23 2014-11-27 Fastvdo Llc Motion-Assisted Visual Language For Human Computer Interfaces
US20180088679A1 (en) * 2013-05-23 2018-03-29 Fastvdo Llc Motion-Assisted Visual Language for Human Computer Interfaces
CN110650368A (en) * 2019-09-25 2020-01-03 新东方教育科技集团有限公司 Video processing method and device and electronic equipment
CN113301385A (en) * 2021-05-21 2021-08-24 北京大米科技有限公司 Video data processing method and device, electronic equipment and readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination